R Libraries: GGPLOT2 – Grammar of Layered plots for Data Visualization

Introduction

Not everyone is a fan of a matrix of numbers. It is hard to make sense of a two-dimensional matrix, but once we have a visual plot of those numbers we can start to see the relationships between them. That is the beauty of visualization, and ggplot2 is the R library that enables it. ggplot2 gives us immense power to customize our plots, not only in their dimensions but also in colors, shapes, layers, and more. By the end of this read, you will understand how layered graphs actually work and will have got your hands dirty with them, giving you a direction to create your own art! ggplot2 is a mighty library to explore, so today we will limit ourselves to layered plotting for data visualization.

“The simple graph has brought more information to the data analyst’s mind than any other device.” — John Tukey

Theory

ggplot2 is a library that works with layers: to add plot features, you keep adding layers with a visualization in mind. To better understand this concept, refer to the image below.

Three different layers of plot

As we see above, a single plot is made of three different layers. The first layer is the aesthetics, or axes, which are just the x and y plot coordinates. The second layer consists of two random, unscaled scatter plots. The third and final layer shows the two curves fitted to the scatter plots of layer two (color coded: the green line fits the green scatter points in layer 2 and the red line fits the red scatter points in layer 2).

The very first layer defined has to be the aesthetics, i.e. the x and y coordinates of the plot. After that, layers can be added in any order, keeping in mind that each new layer is drawn on top of the ones before it. In our case, let’s observe how layer 2 comes over layer 1, and how layer 3 comes over layers 1 and 2 to form the complete plot. Refer to the two images below.

Demonstration of layer 2 over layer 1
Demonstration of layer 3 over layer 1 and 2, forming complete plot

So, this is the grammar, or the science, of making plots with the ggplot2 library in R. Once you know it, you can get creative and customize plots according to your needs or imagination. Next we will see the basics of plotting these layers in R and demonstrate how minor changes to the layers can make the plots more visual and understandable.

“The greatest value of a picture is when it forces us to notice what we never expected to see.” — John Tukey

Plots

We will be using the ‘mpg’ (miles per gallon) data set to demonstrate this tutorial. It ships with ggplot2, which is part of the tidyverse, so we do not need to load a copy of the data into the R environment; we only need to load the required libraries so that it can be accessed. Below is the listing.

Listing for loading the required libraries in R
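The original listing is not reproduced here, so the following is only a minimal sketch of what loading the libraries could look like. It assumes ggplot2 is installed; loading the whole tidyverse would attach ggplot2 as well.

```r
# Attach ggplot2, which also makes the built-in mpg data set available.
# library(tidyverse) would work too, since it attaches ggplot2 for us.
library(ggplot2)

# Peek at the first rows; displ, hwy, class and drv are the columns
# used throughout this tutorial.
head(mpg)
```

No data file needs to be read in: mpg becomes available as soon as the library is attached.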

Two Layer plots

Now that our environment is ready, we will see how a two layer plot is formed. Below is the listing, with comments for better understanding, followed by the image showing the resulting plot.

Listing for plotting 2 layer plot
Two layer plot formed
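A two layer plot of this kind can be sketched as follows. This is a reconstruction under the assumption that the aesthetics are set in ggplot() and the points are added with geom_point(); the original listing may have been written differently.

```r
library(ggplot2)

# Layer 1: the aesthetics, mapping displ to the x axis and hwy to the y axis
# Layer 2: geom_point() draws one point per car at those coordinates
two_layer <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

print(two_layer)
```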

We can clearly see that the first layer in the listing sets x = displ and y = hwy, which appear as the axes of the plot above. The second layer, geom_point, plots a point at the x and y coordinates of every car.

Now, let’s take it to the next level. The above is a cool graph, but not a very informative one: it is two-dimensional and only gives us readings for displ vs hwy. Let’s make it more informative by adding a third dimension, ‘size’, to the plot. Below is the listing, followed by the plot.

Listing for including size in two layer plot
Two layer plot with size = class
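As a rough sketch of the size version (again assuming the aesthetics live in ggplot(), which the original listing may not have done), class is simply added to the aesthetic mapping:

```r
library(ggplot2)

# Same two layers as before, with class mapped to the size of each point.
# ggplot2 warns that using size for a discrete variable is not advised,
# but it still draws the plot.
size_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, size = class)) +
  geom_point()

print(size_plot)
```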

Now, by mapping class to size, we can better interpret the displ vs hwy readings for the different types of cars. Not a fan of sizes? You can also map class to color to beautify the plot. Below is the listing:

Listing for including color in two layer plot
Two layer plot with color = class
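The color version could look like the sketch below, under the same assumption about where the aesthetics are set. The last line is an addition of mine, included only to show that ggplot2 treats both spellings as the same aesthetic.

```r
library(ggplot2)

# Map class to the color of each point instead of its size
color_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point()

print(color_plot)

# aes() stores the American and British spellings under the same name
same_spelling <- identical(names(aes(color = class)), names(aes(colour = class)))
```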

This makes better sense as a visual classification. If you are a true Englishman, you may replace “color = class” with “colour = class” and you will end up with the same plot. This is for all UK followers. As I mentioned before, once you have the complete idea of layered plots, you can play around and customize plots to individual requirements. If your boss likes the points to be classified by transparency, use alpha = class, or if they like the points to be classified by different shapes, use shape = class. The drawback is that ggplot2 uses only 6 shapes by default, so shape is the best option when you have 6 or fewer classes. Below are the listings.

Listing for including alpha in two layer plot
Two layer plot with alpha = class
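A minimal sketch of the alpha version, assuming the same layout as the earlier plots:

```r
library(ggplot2)

# Map class to the transparency (alpha) of each point.
# ggplot2 warns that using alpha for a discrete variable is not advised,
# but the plot is still drawn.
alpha_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, alpha = class)) +
  geom_point()

print(alpha_plot)
```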

As mentioned before, shapes default to 6 classes. ggplot2 will not give you an error when there are more; it issues a warning and leaves the extra classes unplotted. So our seventh class, i.e. suv, gets no shape: its points are missing from the graph, and in the legend on the right hand side of the plot suv is left with a blank key. This is a very important point to keep in mind when plotting with shape = class.

Listing for including shape in two layer plot
Two layer plot with shape = class
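The shape version can be sketched as below. The second plot is not from the original post: it is one possible workaround, using scale_shape_manual() with seven shape codes picked purely for illustration.

```r
library(ggplot2)

# Map class to the shape of each point. mpg has 7 classes but the default
# shape palette has 6 shapes, so ggplot2 warns and the suv points are not drawn.
shape_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, shape = class)) +
  geom_point()

print(shape_plot)

# Possible workaround (my assumption, not the author's listing):
# supply one shape code per class by hand so every class is drawn.
shape_plot_all <- shape_plot + scale_shape_manual(values = 0:6)

print(shape_plot_all)
```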

Plotting with Grids

It is often good practice to split a plot into a grid of panels, where each panel shows a subset of the data. This is possible by adding a function named facet_wrap. Below is the listing showing the addition of facet_wrap to the ggplot call, followed by the plot demonstrating what exactly facet_wrap does.

Listing showing addition of facet_wrap in ggplot
facet_wrap plot
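A sketch of the faceted plot, assuming the panels are split by class and arranged with nrow = 2 as the text describes:

```r
library(ggplot2)

# Start from the basic two layer plot and add facet_wrap():
# ~ class creates one panel per class, nrow = 2 lays the panels out in 2 rows
facet_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class, nrow = 2)

print(facet_plot)
```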

The above shows displ plotted against hwy for each type of car (or class), with each panel representing one class. This is a better way of visualizing at a ground level. As there are 7 classes of cars in the data we are analyzing, we get 7 plots arranged across 2 rows, as we indicated with nrow = 2.

Three Layer Plots

Finally, as we come to the end of this tutorial, I will show, as promised at the start, how to plot a three layer plot. Plots can be n-layered, but we are limiting ourselves today to three layers. Just as with two layers, we simply add (+) one extra layer that we want to plot. Below is the listing for plotting three layer plots.

Three layer plot listing
Three layer plot
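A three layer plot could be sketched like this. I am assuming the third layer is geom_smooth() with its default settings, since that is the function named later in the text; the original listing may have used other options.

```r
library(ggplot2)

# Layer 1: aesthetics (displ on x, hwy on y)
# Layer 2: the scatter points
# Layer 3: a smoothed line fitted through the same data
three_layer <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()

print(three_layer)
```

Because both geoms inherit the mapping from ggplot(), the x and y aesthetics only need to be written once.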

As mentioned before, once you know the science behind the layers, you can play around with ggplot2 to create plots the way you want or like. An example is shown below with its listing.

Listing for creating different three layered plot
Plot formed
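One way to get a plot like this is sketched below. It assumes the points stay uncolored and only the smoothing layer maps drv to color; the original listing may have colored the points as well.

```r
library(ggplot2)

# The basic two layer plot, plus a smoothing layer with drv mapped to color.
# Mapping color = drv inside geom_smooth() fits one line per drivetrain type.
drv_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(mapping = aes(color = drv))

print(drv_plot)
```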

As we see above, on top of the two layer plot that we used for most of this tutorial, I have now added a third layer that fits lines by drv, i.e. the drivetrain of a car. It is color coded: there are three types of drv, shown as three different geom_smooth lines.

Conclusion

  1. We learnt how to plot layered plots using the ggplot2 library in R
  2. We can use different sizes, shapes, alpha, color, etc. to differentiate classes and make more visual plots
  3. We also understood how to make plots more informative so that they tell us more about the data and its relationships
  4. Lastly, we saw three layer plots, which give us a lot of information about the mpg data
  5. Plots help us better understand the data, which might not be possible by just viewing matrices of numbers
  6. Humans are more visual than numerical; we retain a visual plot in memory better than a csv file
  7. One can create and customize visuals as per one’s requirements or vision in ggplot2

END NOTE

It’s been a long time since The Datum’s last blog. We really appreciate the support received during this time from across 75 countries. We hope everyone out there learns something from investing time in this; that’s our only aim. So, if you learnt something new and useful today, please like, share and subscribe to The Datum for regular updates on Data Science and Machine Learning. We will continue with the Library Series, as we are better equipped to build algorithms in R once we know the libraries and their source code.
