Introduction to R
Introduction to basic statistical use of the R programming language and the RStudio Desktop integrated development environment.
R provides a single environment for conducting population genetic analyses. It is no longer necessary to switch between data formats and operating systems to run a series of analyses, as was the case until now. In addition, R provides graphing capabilities that are ready for use in publications with only a little extra effort. But first, let’s install R, an integrated development environment for R, and the packages we will use in the population analyses.
Installing R
Download and install the R statistical computing and graphing environment. This works cross-platform on Windows, OS X and Linux operating systems.
Download and install the free, open source edition of the RStudio Desktop integrated development environment (IDE) that we recommend.
Installing the required packages
This primer relies on several R packages. Use the following script to install them together with their dependencies:
install.packages(c("poppr", "mmod", "magrittr", "treemap"), repos = "http://cran.rstudio.com", dependencies = TRUE)
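Before running the installer, it can help to check which of these packages are already present. The following is a minimal sketch using only base R; the variable names pkgs, installed and missing_pkgs are our own, and the install call is left as a comment so that nothing is downloaded unless you choose to run it.

```r
# Packages named in the install script above
pkgs <- c("poppr", "mmod", "magrittr", "treemap")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need to be installed
missing_pkgs <- pkgs[!installed]
print(installed)

# To install only what is missing, uncomment the next line:
# install.packages(missing_pkgs, repos = "http://cran.rstudio.com", dependencies = TRUE)
```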
Once you’ve installed poppr, you can invoke (i.e., load) it by typing or copying and pasting:
library("poppr")
## Loading required package: adegenet
## Loading required package: ade4
##
## /// adegenet 2.0.1 is loaded ////////////
##
## > overview: '?adegenet'
## > tutorials/doc/questions: 'adegenetWeb()'
## > bug reports/feature resquests: adegenetIssues()
##
##
## This is poppr version 2.0.2. To get started, type package?poppr
## OMP parallel support: available
This will load poppr and all dependent packages, such as adegenet and ade4. You can tell that loading worked from the startup messages printed to your console.
Congratulations. You should now be all set for using R. Loading data and conducting your first analysis will be the topic of the next chapter. But before we go there, let’s provide a few useful resources.
A quick introduction to R using RStudio
Next, let’s review some of the basic features and functions of R. To start R, open the RStudio application from your programs folder or start menu. This will initialize your R session. To exit R, simply close the RStudio application.
Note that R is a case sensitive language!
Let’s get comfortable with R by submitting the following command on the command line (where R prompts you with a > in the lower left RStudio window pane). It retrieves the current working directory on your machine:
getwd() # this command will print the current working directory
Note that the symbol ‘#’ is used to add comments to your code, so you only need to type getwd() after the “>”.
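If you need to work in a different folder, setwd() changes the working directory. A small sketch follows. It uses R's temporary directory as a stand-in for a project folder, an assumption made only so that the example runs on any machine.

```r
old_dir <- getwd()   # remember where we started
setwd(tempdir())     # switch to a temporary folder (stand-in for your project folder)
getwd()              # prints the new working directory
setwd(old_dir)       # return to the original directory
```

Storing the original directory first makes it easy to switch back once you are done.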
Our primer is heavily based on the poppr and adegenet packages. To get help on any of their functions, type a question mark before the function name, as in:
?mlg # open the R documentation of the function mlg()
To quit R you can either use the RStudio > Quit pull-down menu command or press Cmd + Q (OS X) or Ctrl + Q (PC).
Using magrittr
Various chapters throughout this primer will have the symbol %>% in the code. This is called a “pipe” operator and it allows code to be more readable by stringing together commands from left to right. Here’s a short description of these “pipes” with cats. When reading code, it can be thought of as equivalent to saying “and then”. For example, if you have three consecutive steps to a process, you would write this in English as:
Take your data and then do step one, and then do step two, and then do step three.
In R code with magrittr, assuming that each step is a function, it might be written as:
result <- data %>% step_one() %>% step_two() %>% step_three()
Below are two examples of how code can be improved with magrittr. More details about magrittr can be found in the magrittr package documentation.
Consider a fake example:
Adapted from Hadley Wickham. Based on the children’s song, Little bunny foo foo.
foo_foo <- little_bunny()
bop_on(scoop_up(hop_through(foo_foo, forest), field_mouse), head)
# VS
foo_foo %>%
  hop_through(forest) %>%
  scoop_up(field_mouse) %>%
  bop_on(head)
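The bunny example is not meant to be run, since none of its functions exist. A runnable toy version of the same comparison is sketched here with base R functions and made-up numbers; it assumes only that magrittr is installed.

```r
library("magrittr")

x <- c(1, 4, 9, 16)

# Nested calls are read from the inside out
nested <- round(mean(sqrt(x)), 1)

# The piped version is read from left to right: take x, then sqrt, then mean, then round
piped <- x %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)  # TRUE: both give 2.5
```

The pipe passes the value on its left as the first argument of the function on its right, which is why round(1) here means round(value, 1).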
Now for a real example:
We will use the Phytophthora infestans microsatellite data from North and South America. Let’s calculate allelic diversity per population after clone-correction. This information can be found in our chapters on Population strata and clone correction and Locus based statistics.
library("poppr")
library("magrittr")
data(Pinf)
# Compare the traditional R script
allelic_diversity <- lapply(seppop(clonecorrect(Pinf, strata = ~Continent/Country)),
                            FUN = locus_table, info = FALSE)
# versus the magrittr piping:
allelic_diversity <- Pinf %>%
  clonecorrect(strata = ~Continent/Country) %>% # clone censor by continent and country.
  seppop() %>% # Separate populations (by continent)
  lapply(FUN = locus_table, info = FALSE) # Apply the function locus_table to both populations
To observe the results, type allelic_diversity into the console after each statement.
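The real example follows a split-then-apply pattern: seppop() produces a list with one element per population, and lapply() runs the same function on each element. As a rough base R analogy that needs no genetic data, the sketch below uses split() on a small, entirely hypothetical data frame; the column names and values are invented for illustration.

```r
# Toy data (hypothetical): allele counts for six samples from two populations
samples <- data.frame(
  population = c("North", "North", "North", "South", "South", "South"),
  alleles    = c(4, 6, 5, 2, 3, 4)
)

# Split into a list with one element per population (the role seppop() plays above)
by_pop <- split(samples$alleles, samples$population)

# Apply the same function to every element of the list
mean_alleles <- lapply(by_pop, FUN = mean)

mean_alleles$North  # 5
mean_alleles$South  # 3
```

As with allelic_diversity, the result is a named list, so individual populations can be extracted by name with $ or [[ ]].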
The %>% operator is thus useful when your analysis consists of many small steps. It makes your code more readable and easier to reproduce.
Packages and getting help
One way that R shines above other languages for analysis is the fact that R packages in CRAN are all documented. Help files are written in HTML and give the user a brief overview of:
- the purpose of a function,
- the parameters it takes,
- the output it yields,
- and some examples demonstrating its usage.
To see all of the help topics in a package, you can simply type:
help(package = "poppr") # Get help for a package.
help(amova) # Get help for the amova function.
?amova # same as above.
??multilocus # Search for functions that have the keyword multilocus.
Some packages include vignettes, which come in different forms such as introductions, tutorials, or reference cards, often in PDF format. You can look at a list of vignettes in all packages by typing:
browseVignettes() # see vignettes from all packages
browseVignettes(package = 'poppr') # see vignettes from a specific package.
and to look at a specific vignette you can type:
vignette('poppr_manual')
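Two base R helpers complement the help pages when you only half remember a name or want a quick look at a function's parameters. This is a small sketch that uses read.csv() from the utils package as its example, chosen only because it ships with every R installation.

```r
# apropos() lists objects on the search path whose names match a pattern
matches <- apropos("^read\\.csv")
print(matches)

# args() prints the parameters of a function without opening its help page
args(read.csv)
```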
Next, consider browsing Appendix 3 on “Introduction to R” if you are not yet familiar with R and RStudio. Otherwise, you are now ready to think about formatting and loading population genetic data into R.
We assume you installed the needed resources mentioned in chapter 2 on getting ready to use R.
After installing both R and RStudio, please proceed to open RStudio. RStudio should look something like:
The top left panel is a text box where you can write a script. The bottom left panel is the console (or command line) where you directly execute commands and R writes the results; for example, type x <- 1 and hit return, then type x and hit return again, and watch the console:
Note: the “>” symbol is R’s prompt for you to type something in the console.
> x <- 1
> x
[1] 1
You just assigned the value 1 to the variable x with the assignment operator <-. The second command asks R to show the value assigned to x. Finally, the top right panel shows variables loaded into the environment and the bottom right panel is useful for loading files, plotting graphs, installing packages and getting help. Feel free to explore all the various tabs or elements of each panel.
Assignment, data types and operations
Let’s learn a few simple facts about R:
R uses # to add comments to code:
> # Add 1 + 1 and write to console
> 1 + 1
[1] 2
This is very useful when developing reusable code or when you want to remember a few months later what you actually did.
R is case sensitive, so x and X are two different variables. Try typing x <- 5 and X <- x + 1 into your console and then look at x and X, respectively:
> x <- 5
> X <- x + 1
> x
[1] 5
> X
[1] 6
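Case sensitivity applies to function names as well as variables. The sketch below calls a deliberately misspelled Mean() (a name that does not exist in base R) and uses tryCatch() to capture the resulting error message so that the script keeps running.

```r
values <- c(1, 2, 3)
mean(values)   # works: mean() is a base R function

# Mean() with a capital M does not exist, so R raises an error;
# tryCatch() captures the error message instead of stopping the script
msg <- tryCatch(Mean(values), error = function(e) conditionMessage(e))
print(msg)
```

If a function you expect to exist cannot be found, checking the capitalization of its name is a good first step.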
Next, let’s assign numbers to a vector:
> x <- c(1, 2, 3, 4, 5)
> x
[1] 1 2 3 4 5
You can see several important traits of R. The c() command (which stands for combine) can be used to make a vector of data. We can overwrite the variable x with new content, and in doing so change what it holds (e.g., a single number becomes a vector). To compound things further, a variable can just as well hold strings; here we store them in a new variable a:
> a <- c("Paris", "Tokyo", "Seattle")
> a
[1] "Paris" "Tokyo" "Seattle"
Note that we use quotes to denote the string data type containing text.
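To check which data type a vector holds, you can ask R directly. The following sketch recreates the two vectors from above and inspects them with class() and length(); the variable mixed is our own addition to show what happens when numbers and strings are combined.

```r
x <- c(1, 2, 3, 4, 5)
a <- c("Paris", "Tokyo", "Seattle")

class(x)    # "numeric"
class(a)    # "character"
length(a)   # 3

# Arithmetic on a numeric vector is applied element by element
x * 2       # 2 4 6 8 10

# Mixing numbers and strings in c() converts everything to character
mixed <- c(1, "Paris")
class(mixed)  # "character"
```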
To access individual elements of a vector, in this case the second element, we execute:
> a[2]
[1] "Tokyo"
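Square brackets accept more than a single position. A few common variations are sketched below using the same vectors as above; note that R counts positions starting from 1.

```r
a <- c("Paris", "Tokyo", "Seattle")
x <- c(1, 2, 3, 4, 5)

a[c(1, 3)]     # first and third elements: "Paris" "Seattle"
a[-1]          # everything except the first: "Tokyo" "Seattle"
a[length(a)]   # last element: "Seattle"
x[x > 3]       # elements that satisfy a condition: 4 5
```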
Console tricks: Code completion & command history
RStudio has a very useful feature called code completion: pressing the Tab key completes the full name of an object. For example, type hel and hit Tab, and you will see several functions pop up, from which you can select help().
This also works inside a function call to find function arguments. Type help( and hit Tab to select arguments for the help function.
RStudio records your command history. Thus you can scroll up or down the history of executed commands using the Up or Down arrows. Make sure your cursor is in the console and try to re-execute previous commands.
Useful resources to learn R
Introductory
- Swirl is a very well thought out R package that teaches you interactively.
- Code School Try R is a nice interactive tutorial.
- Quick R
- R reference card
- A very nice, short introduction to R
Advanced
- Advanced R by Hadley Wickham
Books:
- R in a Nutshell
- R cookbook is a nice quick reference and tutorial for general R use.
- ggplot2 book is a useful reference if you want to customize graphs for publication.