-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathR_utility_belt_ggplot.Rmd
223 lines (165 loc) · 7.56 KB
/
R_utility_belt_ggplot.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
---
title: 'R Utility Belt: ggplot2 - To boldly plot what no one has plotted before!!'
author: "Erica Acton"
date: "April 12, 2017"
output: md_document
---
##**Grammar of graphics:** a language to be able to communicate about what we are plotting programatically
![(Figure borrowed from Paul Hiemstra)](./ggplot2.png)
#The Grammar
* The *aesthetics* are what you can see. For example, the data being plotted or represented by a shape or colour.
This data could be presented in multiple ways. Data is mapped to aesthetics. A plot can have multiple layers (for example,
a scatterplot with a regression line).
* Lines, bars, and points are examples of *geometric objects (geoms)* that could be used to present the data.
* The data units need to be converted to physical units in order to be displayed. This uses *scaling* and a *coordinate system* (position adjustment for where is our pixel going). Other *statistical transformations* can also occur, such as aggregating data, jittering, density estimates, a boxplot and binning.
* *Facetting* allows us to plot subsets of the data.
This grammar allows very specific control over your plot and the ability to change features (easily). Plots can be scaled and made
pretty enough for publication quality images.
###Let's build a plot by adding these components one by one to see how this works.
I have tried to include some things which I used to end up googling ALL THE TIME.
**Note: This dataset is fake - any resemblance to animals living or otherwise is purely coincidental.**
```{r, suppressPackageStartupMessages()}
library(ggplot2)
```
```{r}
#Code for creating furry_dataset.csv in in the repo under furry_dataset.R
fdat <- read.csv("furry_dataset.csv")
#to start - we need our data - note that a blank graphic is produced - we have
#not said what we are plotting or how we are plotting it
ggplot(fdat)
str(ggplot(fdat))
#list of 9 - data, layers, scales, mapping, theme, coordinates, facet, plot_env,
#labels
#luckily there are some defaults so we don't have to specify everything
#in practice, people omit mapping = , but it might help you to understand what
#we are doing
#we are choosing the data we are plotting - the data can be scaled and the axes
#appear
ggplot(fdat, mapping = aes(x=weight, y=fitbit))
#we have now chosen the geometric object to plot our data with
ggplot(fdat, aes(x=weight, y=fitbit)) + geom_line()
#we can add a statistical transformation to look at the data on a different
#scale
ggplot(fdat, aes(x=weight, y=fitbit)) +
geom_line() +
scale_y_log10()
#we can subset our data per animal by facetting
ggplot(fdat, aes(x=weight, y=fitbit)) +
geom_line() +
scale_y_log10() +
facet_wrap(~animal)
#since the animals have quite a range of different weights, we could modify the
#x-axis to be specific for each animal
ggplot(fdat, aes(x=weight, y=fitbit)) +
geom_line() +
scale_y_log10() +
facet_wrap(~animal, scales="free_x")
#when we look at the data without the log transformation the wolverine is an
#outlier. Wolverines are crazy, climb moutains, and being are tracked by
#ultrarunners. They may be an outlier compared to our GP and sloth.
ggplot(fdat, aes(x=weight, y=fitbit)) +
geom_line() +
facet_wrap(~animal, scales="free_x")
#The data can be subsetted earlier, or in the function call. If you are using
#the same mapping base frequently, you can store it as a variable to reduce
#redundancy.
p <- ggplot(subset(fdat, animal != "wolverine"), aes(x=weight, y=fitbit))
p + geom_line() +
facet_wrap(~animal, scales="free_x")
#we could add a regression line - which is linear due to my use of a balanced
#normal distribution. You can use other methods such as loess.
p + geom_line() +
stat_smooth(method=lm) +
facet_wrap(~animal, scales="free_x")
```
###Let's explore different types of plots, and learn how to customize and make things prettier.
```{r, warning=FALSE}
#scatterplot
ggplot(fdat, aes(x=weight, y=iq, color=animal, shape=job)) + geom_point()
#stripplot
ggplot(fdat, aes(x=job, y=iq, color=animal, shape=job)) + geom_point()
#use jitter for overlapping data
ggplot(fdat, aes(x=job, y=iq, color=animal)) + geom_jitter()
#density
#to customize aesthetics, we can color by a factor variable
#alter transparency with alpha
#change titles and axis labels
#remove legends
#change the defalut theme and tailor the current theme
#specify colours manually
ggplot(fdat, aes(x=iq, fill=animal, alpha=0.3)) +
geom_density() +
ggtitle("IQ of Furry Creatures") +
xlab("Intelligence Quotient") +
# guides(fill=FALSE) +
scale_alpha(guide="none") +
theme_bw() +
theme(title = element_text(face="bold", size=16), legend.title = element_blank(), legend.position = "left") +
scale_fill_manual(values=c("purple", "cornflowerblue", "grey", "yellow", "orange", "brown"))
#histogram
#the default binwidth is 30 and has NOTHING to do with your data -- so CHANGE
#IT!
ggplot(fdat, aes(x=weight, fill=animal, alpha=0.3)) +
geom_histogram(binwidth = 50)
#bar plot
#this commented code throws an error - this is a common error so I included it.
#ggplot(fdat, aes(x=animal, y=weight, fill=job)) +
# geom_bar()
#need to specify identity - this is a stacked bar graph
ggplot(fdat, aes(x=animal, y=weight, fill=job)) +
geom_bar(stat="identity")
#to position bars side-by-side instead
#and to rotate x-axis labels to be vertical
#try a different theme
ggplot(fdat, aes(x=job, y=iq, fill=animal)) +
geom_bar(stat="identity", position="dodge") +
theme_classic() +
theme(axis.text.x=element_text(angle=90))
#to flip the alignment
ggplot(fdat, aes(x=job, y=iq, fill=animal)) +
geom_bar(stat="identity", position="dodge") +
theme_classic() +
coord_flip()
#circular plots
#the default coordinate system is cartesian coordinates
#need to switch to polar coordinates to make a circle
#note that this is a bar graph with polar coordinates - coxcomb plot
#customize the major gridlines
ggplot(fdat, aes(x=job, y=iq, fill=animal)) +
geom_bar(stat="identity", position="dodge") +
coord_polar() +
theme(panel.grid.major = element_line(color="cornflowerblue"))
#removing the dodge position gives back the stacked 'bar graph'
ggplot(fdat, aes(x=job, y=iq, fill=animal)) +
geom_bar(stat="identity") +
coord_polar() +
theme(panel.grid.major = element_line(color="cornflowerblue"))
#classic venn diagram, theta is the variable to map the angle
#width = 1 wraps to a full circle
ggplot(fdat, aes(x="", y=iq, fill=animal, alpha=0.1)) +
geom_bar(stat="identity", width=1) +
coord_polar(theta="y") +
theme(panel.grid.major = element_line(color="cornflowerblue"))
#bullseye
ggplot(fdat, aes(x="", y=iq, fill=animal, alpha=0.1)) +
geom_bar(stat="identity", width=1) +
coord_polar(theta="x") +
theme(panel.grid.major = element_line(color="cornflowerblue"))
#bubble plots
ggplot(fdat,aes(x = animal, y = fitbit, fill=animal)) +
scale_y_log10(limits=c(1,1000000)) +
geom_jitter(aes(size = sqrt(fitbit/pi)), pch = 21, show.legend = FALSE) +
scale_size_continuous(range=c(1,30)) +
facet_wrap(~ animal) +
theme_dark() +
xlab(NULL) +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
```
##Further resources:
Reading: Wickham, Hadley. (2010). A Layered Grammar of Graphics. Journal of Computational and Statistical Graphics.
Reading: Wilkinson, L. (2005), The Grammar of Graphics (2nd ed.). Statistics and Computing, New York: Springer. [14,18]
Resource: http://www.cookbook-r.com/Graphs/
Tutorial: https://github.com/jennybc/ggplot2-tutorial
Reading: Tufte, Edward R. The Visual Display of Quantitative Information.
Tutorial: http://stcorp.nl/R_course/tutorial_ggplot2.html
Resource: http://ggplot2.tidyverse.org/reference/theme.html