
Intro to R and RStudio

I. Downloading R and RStudio



R is a free software environment for statistical computing and graphics. It compiles and runs on a
wide variety of UNIX platforms, Windows and MacOS. The software is command driven (as
opposed to point and click), which gives users a high degree of control over graphs and
computations. RStudio is a powerful user interface for R. Moreover, hundreds of free R packages
are available, each customized for a specific analysis or task.

You need to install R and then RStudio in that order.

To install R go through the following steps:
1. Go to http://www.r-project.org/ (or simply type R in google, and click on the first link!).
2. On the Getting Started box click on download R.
3. Choose a CRAN Mirror site to download R from (e.g., USA: http://cran.stat.ucla.edu/).
4. Depending on your operating system click on one of Download R for Linux, Download R
for MacOS X or Download R for Windows.
5. If you selected Download R for Windows, then
Click on base
Select Download R and follow the prompts. If you save the installer, you need to
run it to install R on your computer.

If you selected Download R for MacOS X, then
Under Files click on R-3.0.1.pkg and follow the prompts. (Note: this is only if you
have MacOS X 10.6 (Snow Leopard) and higher. If you have an older version, talk to the
instructor for further instructions).
Once downloaded, R will appear as an icon on your Launchpad.

At this point you have installed R, and by clicking on the R icon you can open the program. The
R program is usable as is. However, a more user-friendly interface is provided by RStudio,
which we will use.

Follow the steps below to download RStudio:
1. Go to http://www.rstudio.com/.
2. Click on the icon Download now.
3. Click on the icon Download RStudio Desktop.
4. Select the appropriate link for the type of operating system that you are using (Windows or
Mac), and download the program.
5. Run the downloaded program and install it on your computer.
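
Once both programs are installed, a quick check that RStudio has found the R installation can be run in the Console. This is only a minimal sketch; the version string printed will depend on the release that was installed.

```r
# Print the version of R that this session is running
R.version.string

# A trivial calculation to confirm that the Console responds
1 + 1
```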




If you click on the RStudio icon you should see the user interface:




The window on the left-hand side is called the Console. In this window we write a command and
press Enter to run it; we refer to it as the Command Window. We are mainly going to write commands
in a fourth window (not shown in the figure above) called the R Script editor. The top right
window has two tabs, one for Workspace and another for History. The bottom right window has
the Files, Plots, Packages, and Help tabs. In the subsequent sections, we will
explain the functionality of each of these components, as needed.

II. The R Script Editor
To open the editor, click File in the bar at the top, then New, then R Script. We are mainly going
to write and run commands in the editor because 1) errors made in the syntax of a
command are easily fixed, 2) we can run multiple lines of code at once, and 3) the code can be
saved easily to a file to be used later if needed.


III. Reading data into RStudio

We will often need to import data from the data sets you saved on your flash drive. To do so,
first open the data set in Excel and, within Excel, save it as a csv file. To
import this file, click Import Dataset in the top right window of RStudio. You will
have two choices: From Text File or From Web URL. Select From Text File. A
navigation window will pop up; navigate to the location of your file and
click on its name. Once you click Open, a menu will appear. Make sure that the option Yes
is selected for Heading. Then click Import.
For example, let's say that we want to import a data set called cheese. Once you click the
Import button, the following lines will appear in the Console.

> cheese <- read.csv("~/Documents/Math 338 Fa13/PC_Excel/Excel/cheese.csv")
> View(cheese)

This means that RStudio has executed the command read.csv to read the data, and the data
frame is now imported under the name cheese. Note that the name is not cheese.csv, because the
output is assigned to the name cheese (see cheese <- read.csv). RStudio then executes the
command View, which displays the data in the top left panel.
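
The import can also be typed by hand instead of using the Import Dataset button. The sketch below assumes a small made-up file: the name cheese_demo, the column names, and the values are hypothetical, and the file is written to a temporary location so the example runs on any computer.

```r
# Hypothetical data written to a temporary csv file (not the course data set)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("brand,price", "cheddar,4.5", "gouda,6.0", "brie,7.25"), csv_path)

# header = TRUE (the default for read.csv) treats the first row as column names
cheese_demo <- read.csv(csv_path, header = TRUE)

str(cheese_demo)     # structure: 3 observations of 2 variables
head(cheese_demo)    # first rows of the data frame
```

Replacing csv_path with the path to your own csv file should give the same kind of data frame as the menu-driven import.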

In order to use the variable names in the data set directly, you need to attach the data set by
issuing the command attach(cheese). Hence every time you import a data set, issue the command
attach(dataname).

> attach(cheese)
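
A column can also be reached without attaching, using the $ operator. The following is a small sketch with a hypothetical data frame (cheese_demo and its columns are made up for illustration) that compares the two approaches.

```r
# Hypothetical data frame standing in for an imported data set
cheese_demo <- data.frame(brand = c("cheddar", "gouda", "brie"),
                          price = c(4.5, 6.0, 7.25))

# Without attach(): name the data frame, then the column
mean_dollar <- mean(cheese_demo$price)

# With attach(): the column name can be used on its own
attach(cheese_demo)
mean_attached <- mean(price)
detach(cheese_demo)    # remove the data frame from the search path when done

all.equal(mean_dollar, mean_attached)    # both approaches give the same mean
```

The $ form is longer to type, but it always makes clear which data set a variable comes from, which helps when several data sets are open at once.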

IV. Simple math operations
We can simply type numbers with arithmetic operations and run this code to get the result of the
operations.
> 2+5
[1] 7
> sqrt(16)
[1] 4

> 2^4*5
[1] 80

> 3/40
[1] 0.075
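
The result of 2^4*5 depends on the order in which R applies the operators: exponentiation comes before multiplication. The short sketch below illustrates this and shows how parentheses change the order.

```r
2^4 * 5      # ^ is applied first: 16 * 5 = 80
2^(4 * 5)    # parentheses force the product first: 2^20 = 1048576
-2^2         # ^ is applied before the minus sign: -(2^2) = -4
(-2)^2       # parentheses make the base negative: 4
```

When in doubt, adding parentheses costs nothing and makes the intended order explicit.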


V. Assignments

In many instances, we need to preserve a value of a variable to use later. In this case, we assign
the value to a variable name, and this value will be preserved in the memory of the computer for
later referral and use.
Generally speaking, there are three basic forms of stored data. The first is the atom, a
single value such as one number. Assigning it is straightforward: all we need is to
use <- or = to assign the value to a name. In the following, > refers to the
prompt that R shows in the Console once you run the code from the editor.

(a) Atoms:
> sam=2

> sam
[1] 2

> sam+sam
[1] 4

> (2*sam*2)/2
[1] 4

> sam^(1/3)
[1] 1.259921

> sqrt(sam)
[1] 1.414214

> abs(-sam)
[1] 2
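
The two assignment forms can be compared directly. This is a minimal sketch; the names x and y are placeholders chosen for illustration.

```r
x <- 2    # arrow form
y = 2     # equals form

identical(x, y)    # TRUE: both names now hold the same value

# Reassigning a name overwrites the old value
x <- x + 10
x                  # 12
```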

Characters

Characters are quantities that we don't perform arithmetic operations with. Like numbers, they
can be assigned to variables. However, characters are always written within quotation marks, either
single or double. The value assigned to the variable is exactly what is typed between the
quotation marks. For example, one can add as many spaces as desired anywhere within the quotes, and
the spaces will be part of the character.

> class="math 338"

> class
[1] "math 338"
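
The claim that spaces are part of a character value can be checked with nchar, which counts characters. In this sketch the name course is used instead of class, because class is also the name of a built-in R function; the names course, padded, and result are hypothetical.

```r
course <- "math 338"
padded <- "math   338"    # two extra spaces between the words

nchar(course)             # 8 characters, counting the space
nchar(padded)             # 10 characters
course == padded          # FALSE: the two values are different

# Arithmetic on a character value is an error, even if it looks like a number
result <- tryCatch("338" + 1, error = function(e) "not numeric")
result                    # "not numeric"
```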


The second form is the vector. In this form, we assign a name to an array of numbers.
This is done with the command c, which stands for concatenation. We can then
call any member of the vector, replace a member with a new value, or
perform various arithmetic operations on the whole vector, as shown below.
(b) Vectors

> class.age=c(35,35,36,37,37,38,38,39,40.5,43,44,44.5,50,19)

> class.age
[1] 35.0 35.0 36.0 37.0 37.0 38.0 38.0 39.0 40.5 43.0 44.0 44.5
[13] 50.0 19.0

> class.age[3]
[1] 36

> class.age[1:5]
[1] 35 35 36 37 37

> sort(class.age)
[1] 19.0 35.0 35.0 36.0 37.0 37.0 38.0 38.0 39.0 40.5 43.0 44.0
[13] 44.5 50.0

> class.age[-5]
[1] 35.0 35.0 36.0 37.0 38.0 38.0 39.0 40.5 43.0 44.0 44.5 50.0
[13] 19.0

> class.age[-c(2,7)]
[1] 35.0 36.0 37.0 37.0 38.0 39.0 40.5 43.0 44.0 44.5 50.0 19.0

> class.age*2
[1] 70 70 72 74 74 76 76 78 81 86 88 89 100 38

> sqrt(class.age)
[1] 5.916080 5.916080 6.000000 6.082763 6.082763 6.164414 6.164414
[8] 6.244998 6.363961 6.557439 6.633250 6.670832 7.071068 4.358899

> class.age^(-1)
[1] 0.02857143 0.02857143 0.02777778 0.02702703 0.02702703
[6] 0.02631579 0.02631579 0.02564103 0.02469136 0.02325581
[11] 0.02272727 0.02247191 0.02000000 0.05263158

> class.age*class.age
[1] 1225.00 1225.00 1296.00 1369.00 1369.00 1444.00 1444.00
[8] 1521.00 1640.25 1849.00 1936.00 1980.25 2500.00 361.00

> class.age^2
[1] 1225.00 1225.00 1296.00 1369.00 1369.00 1444.00 1444.00
[8] 1521.00 1640.25 1849.00 1936.00 1980.25 2500.00 361.00

> class.age=(class.age)/2

> class.age
[1] 17.50 17.50 18.00 18.50 18.50 19.00 19.00 19.50 20.25 21.50
[11] 22.00 22.25 25.00 9.50

> class.age=class.age*2
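
Replacing a member of a vector uses the same square-bracket notation as calling it. The sketch below uses a short hypothetical vector named ages so that the effect of each step is easy to follow.

```r
ages <- c(35, 35, 36, 37, 37)    # hypothetical short vector

ages[3] <- 40                    # replace the third member
ages                             # 35 35 40 37 37

ages[c(1, 2)] <- 34              # replace several members at once
ages                             # 34 34 40 37 37

ages[ages > 36]                  # keep only members larger than 36: 40 37 37
```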

Often it is useful to create a placeholder vector of a given length, filled with zeros. Here is the way this is done:
> hi=numeric(10)

> hi
[1] 0 0 0 0 0 0 0 0 0 0
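
Note that numeric(10) produces ten zeros rather than a vector with no members. The following sketch contrasts the two cases and shows how a slot is filled afterwards; the names empty and slots are hypothetical.

```r
empty <- numeric(0)    # a vector with no members at all
length(empty)          # 0

slots <- numeric(3)    # three slots, each starting at 0
slots[2] <- 7.5        # fill the second slot
slots                  # 0.0 7.5 0.0
```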

Vectors do not have to be numerical. We can create a vector of characters:
> hi=c("hello","whasup","longday")
> hi
[1] "hello" "whasup" "longday"

Later, it becomes useful to ask R the length of a vector:
> length(class.age)
[1] 14

Finally, the third form of storing data is the matrix. The command is matrix.
First we supply the data, and then we tell R the dimensions of
the matrix. For example, we can put an array of 12 numbers into a
matrix with 3 rows and 4 columns. We demonstrate all of these below.

(c) Matrices
> sam = matrix(nrow=3,ncol=4)
> sam
[,1] [,2] [,3] [,4]
[1,] NA NA NA NA
[2,] NA NA NA NA
[3,] NA NA NA NA

> sam = matrix(c(1,2,3,4,5,6,7,8,9,10,11,12),nrow=3,byrow=T)
> sam
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12

> sam<-matrix(c(1,2,3,4,5,6,7,8,9,10,11,12),nrow=3,byrow=F)
> sam
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12

> sally =c(1,2,3,4,5,6,7,8,9,10,11,12)

> sam=matrix(sally,nrow=3,byrow=T)

> sam
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12

> v1=c(1,2,3,4)
> v2=c(5,6,7,8)
> v3=c(9,10,11,12)

> sam=matrix(c(v1,v2,v3),nrow=3,byrow=T)
> sam
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12

> sam[1,]
[1] 1 2 3 4

> sam[,2]
[1] 2 6 10

> sam[1,3]
[1] 3

> sam[3,]=v2
> sam
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 5 6 7 8

> sam[1,]=log(v1)
> sam
[,1] [,2] [,3] [,4]
[1,] 0 0.6931472 1.098612 1.386294
[2,] 5 6.0000000 7.000000 8.000000
[3,] 5 6.0000000 7.000000 8.000000
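
The dimensions of a matrix can be checked after it is built. This is a small sketch with a hypothetical 3 by 3 matrix named nine, built from the numbers 1 to 9.

```r
nine <- matrix(1:9, nrow = 3, ncol = 3)    # filled column by column by default

dim(nine)        # 3 3 (rows, then columns)
nrow(nine)       # 3
ncol(nine)       # 3

nine[2, ]        # second row: 2 5 8
nine[2, ] * 2    # arithmetic on one row: 4 10 16
t(nine)          # transpose: rows and columns swapped
```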




VI. Lists
R provides a powerful additional storage structure called list. The importance of a list is
that we can store objects of different natures, such as matrices, vectors, or atoms, in a
single object, and then call different parts of that object separately.
Let's assume that we would like to store the following three objects in a list object called A:

> s1=3
> s2=seq(1,10,2)
> s3=matrix(c(1:9),nrow=3)

> s1
[1] 3

> s2
[1] 1 3 5 7 9

> s3
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9

> A=list(s1,s2,s3)
> A
[[1]]
[1] 3

[[2]]
[1] 1 3 5 7 9

[[3]]
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9

> A[[1]]
[1] 3

> A[[2]]
[1] 1 3 5 7 9

> A[[3]]
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
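
The parts of a list can also be given names, so that they are called by name rather than by position. The sketch below assumes hypothetical names (B, count, odds, grid) for the same three kinds of objects.

```r
B <- list(count = 3,
          odds  = seq(1, 10, 2),
          grid  = matrix(1:9, nrow = 3))

names(B)        # "count" "odds" "grid"
B$odds          # 1 3 5 7 9
B[["count"]]    # 3, the same part called with double brackets
B$grid[2, 3]    # 8: row 2, column 3 of the stored matrix
length(B)       # 3 parts in the list
```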

VII. The for loop
Often it becomes necessary to repeat certain calculations a number of times.
This is done in R using a simple command called "for". Here are some examples:

> for(i in 1:3)
+ {
+ print("sam")
+ }
[1] "sam"
[1] "sam"
[1] "sam"

> A=matrix(c(1,2,3,4,5,6,7,8,9),nrow=3,byrow=T)
> for(i in 1:3)
+ print(A)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9

or:

> for(i in 1:3)
+ { print(A)}

[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9

> for(i in 1:3)
+ {
+ print(A[i,])
+ }
[1] 1 2 3
[1] 4 5 6
[1] 7 8 9
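
A loop is usually more useful when each pass stores a result instead of only printing. The following sketch, with the hypothetical name row.totals, saves the sum of each row of the matrix A in a placeholder vector.

```r
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE)
row.totals <- numeric(3)            # placeholder for the three results

for (i in 1:3) {
  row.totals[i] <- sum(A[i, ])      # store the sum of row i
}

row.totals                          # 6 15 24
```

For this particular calculation the built-in function rowSums(A) gives the same answer without a loop; the loop form is shown because it extends to calculations that have no ready-made command.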


VIII. Functions
The most common and useful functions are the library functions, that is, commands that have
already been written. For example, mean and sd are commands that calculate the average and the
standard deviation, respectively, of an object such as a vector. Here are a couple of examples:

> class.age=c(35,35,36,37,37,38,38,39,40.5,43,44,44.5,50,19)

> mean(class.age)
[1] 38.28571

> median(class.age)
[1] 38

> var(class.age)
[1] 49.1044

> sd(class.age)
[1] 7.007453

> summary(class.age)
Min. 1st Qu. Median Mean 3rd Qu. Max.
19.00 36.25 38.00 38.29 42.38 50.00

> sort(class.age)
[1] 19.0 35.0 35.0 36.0 37.0 37.0 38.0 38.0 39.0 40.5 43.0 44.0
[13] 44.5 50.0
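
Besides the library functions, we can write our own with the command function. As an illustration only, the sketch below defines a hypothetical function my.sd that computes the sample standard deviation from its definition and compares it with the built-in sd.

```r
# Hypothetical helper: square root of the sum of squared deviations over (n - 1)
my.sd <- function(x) {
  n <- length(x)
  sqrt(sum((x - mean(x))^2) / (n - 1))
}

class.age <- c(35, 35, 36, 37, 37, 38, 38, 39, 40.5, 43, 44, 44.5, 50, 19)

my.sd(class.age)                              # agrees with sd(class.age)
all.equal(my.sd(class.age), sd(class.age))    # TRUE
```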
