Kenneth R. Szulczyk
Basic Programming in R
Copyright 2015 by Kenneth R. Szulczyk
All rights reserved
The Basics
I am assuming the readers are familiar with basic statistics and linear algebra. I do not teach
you any profound empirical techniques. Instead, I give you a comprehensive overview of
programming in R. After finishing this book, you should be able to use R and tailor it for your
own use.
R is open-source software for mathematics and statistics. Researchers can download and use
the software for free. You can download the software from:
http://cran.r-project.org/
Researchers can use R in two ways. They can enter commands directly into the console, or
write a program and run the program in the console. I show the console below:
I have heard someone created a Graphical User Interface for R, where users execute
commands via pull-down menus, but I have not found it yet.
Using the console, enter the command, 2 + 2. The greater than sign indicates R is waiting for a
command. Any text, commands, and equations in red indicate commands one can enter
directly into R while blue indicates the output.
2 + 2
4
R calculates the answer. R stores numbers in vectors and matrices, so the answer is a vector of
length 1 (or simply a scalar). The brackets, [ ], always hold an index, and [1] means the line of
output starts with the first element.
[1] 4
Note: R remembers all the variables and subroutines created in the console. Once I finish a
program that seems to work, I close R and re-open it to wipe its memory. Then I check if the
program still works.
 [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
[19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
[37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
[55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
[73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
[91]  91  92  93  94  95  96  97  98  99 100
I will create 100 observations of white noise with a mean of zero and standard deviation (sd) of
1.
noise <- rnorm(n=100, mean=0, sd=1)
I want to see the plot of the white noise. The argument main sets the chart's title while xlab and
ylab define the labels for the axes. The argument pch=20 selects the symbol drawn at each point,
here a small solid dot. The name pch stands for plotting character.
plot(trend, noise, pch=20, main="White Noise", xlab="Trend",
ylab="Noise")
You can right click on the graph and copy it as a bitmap and paste it into a Word document.
Let's say you only wanted the first 50 observations of noise. I create a new vector called
noise.2 and copy the first 50 observations from the noise vector. The 1:50 inside the brackets
selects elements 1 through 50.
noise.2 <- noise[1:50]
noise.2
The partial output:
[1] -1.226444898 -0.689033125  1.940228241  0.187140317 -0.083116309
The remaining rows, [6] through [46], hold the other 45 random draws.
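Here is a minimal sketch of the same indexing idea. The seed value and the extra vectors last.ten and positive are my own assumptions for illustration; your random numbers will differ from the ones printed in this book.

```r
set.seed(42)                         # arbitrary seed, assumed here so the run is repeatable
noise <- rnorm(n = 100, mean = 0, sd = 1)

noise.2  <- noise[1:50]              # first 50 observations
last.ten <- noise[91:100]            # observations 91 through 100 (hypothetical name)
positive <- noise[noise > 0]         # only the positive draws (hypothetical name)

print(length(noise.2))               # 50
print(length(last.ten))              # 10
print(head(positive))
```

A logical test inside the brackets, such as noise > 0, keeps only the elements where the test is true.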
Observations  Model        Make   MPG  Class
1             2.5TL/3.2TL  Acura  25   Compact
2             2.5TL/3.2TL  Acura  24   Compact
3             3.5RL        Acura  25   Mid-Size
4             2.3CL/3.0CL  Acura  28   Sub-Compact
Note: You must be consistent when using upper and lower case letters. For example, R views
the variables MPG, mpg, and Mpg as three different variables. If you wrote a program and
created a variable MPG, and further in the code you assign a value to mpg, R will create a
new variable called mpg. If you try to read mpg before it exists, R reports that it cannot find
the object. You must be consistent with your names and labels.
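The short sketch below illustrates the point. The four MPG values come from the sample rows of the dataset; halving them is just an arbitrary operation I chose to make the two variables differ.

```r
MPG <- c(25, 24, 25, 28)     # miles per gallon for the first four cars
print(exists("MPG"))         # TRUE: MPG was just created

mpg <- MPG / 2               # assigning to mpg creates a second, separate variable
print(identical(MPG, mpg))   # FALSE: the two names hold different values
print(MPG)
print(mpg)
```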
I will create the variables I want to use. Remember, I used capital letters for miles per gallon
(MPG) in the original dataset. The dataset is an object and the $ allows a user to access specific
information from this object. In our case, the $ selects one variable, or column, of the dataset.
mpg       <- dataset$MPG
eng.size  <- dataset$Engine
cylinders <- dataset$Cylinders
I want to create a dummy variable for the transmission. The variable trans equals one for an
automatic transmission and zero for a manual transmission. A single equal sign sets a variable
equal to a value, while a double equal sign, ==, means a comparison.
trans <- as.numeric(dataset$Transmission == "L" )
Take a look at trans:
trans
The output is a vector of 376 zeros and ones, one value per car. R prints 37 values per row
and labels the rows [1], [38], [75], and so on through [371].
Similarly, I create a separate dummy variable for compact cars. Note, all categories for
Compact must be spelled the same with matching upper and lower case letters.
compact <- as.numeric(dataset$Class == "Compact" )
I can easily create a variable for only engine sizes greater than 2 liters. The first command
returns a 1 if the engine size exceeds 2 and a 0 if false. Since eng.size is a vector, the large.eng
will also be a vector.
large.eng <- as.numeric(eng.size > 2 )
Then I multiply large.eng by eng.size to get a vector of large engine sizes. The variable will
only have engine sizes greater than 2. Smaller engines are transformed into zeros. The asterisk,
*, means multiply. For vectors, R multiplies the first element of one vector by the first element
of the second vector. Then the second elements, the third elements, and so on.
large.eng <- large.eng * eng.size
The Output
 [1] 2.5 3.2 3.5 3.0 0.0 0.0 0.0 3.0 3.2 0.0 0.0 2.8 2.8 0.0 0.0 2.8 2.8 2.8
[19] 2.8 3.7 4.2 2.8 2.8 2.8 2.8 0.0 0.0 2.8 2.8 4.4 4.4 4.4 5.4 4.4 2.5 2.5
[37] 2.8 2.8 0.0 0.0 2.5 2.5 2.8 2.8 3.2 3.2 0.0 0.0 2.8 2.8 2.4 3.1 3.8 3.8
[55] 3.8 3.1 3.8 3.8 3.8 4.6 3.0 4.6 4.6 0.0 0.0 0.0 3.1 3.8 2.4 3.1 3.8 3.8
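The dummy variable and the element-by-element product can be tried on a tiny example. The five engine sizes and transmission codes below are hypothetical values I made up in the style of the dataset.

```r
eng.size     <- c(2.5, 3.2, 3.5, 3.0, 1.8)       # hypothetical engine sizes in liters
transmission <- c("L", "M", "L", "L", "M")       # hypothetical transmission codes

# The comparison returns TRUE or FALSE, and as.numeric turns that into 1 or 0
trans <- as.numeric(transmission == "L")

# The asterisk multiplies element by element, so small engines become zero
large.eng <- as.numeric(eng.size > 2) * eng.size

print(trans)       # 1 0 1 1 0
print(large.eng)   # 2.5 3.2 3.5 3.0 0.0
```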
Directory
R sets the Documents folder as the default working directory in Windows 7. Please copy your
programs and spreadsheets to the Documents folder, so R can access them there.
If you are not sure which folders sit in R's current working directory, then run this program.
list.dirs <- function(path=".", pattern=NULL, all.dirs=FALSE,
                      full.names=FALSE, ignore.case=FALSE) {
  # List everything in the path, then keep only the entries that are folders
  all <- list.files(path, pattern, all.dirs,
                    full.names, recursive=FALSE, ignore.case)
  all[file.info(all)$isdir]
}
This creates a subroutine, list.dirs, that returns the folders in a directory. The one-line
command below lists the folders in the current directory without the subroutine.
dir()[file.info(dir())$isdir]
If you need to change the directory, then type this command to change the working directory.
Unfortunately, if you close R, you have to enter this command again to set the directory.
setwd("C:/Users/kenneth/Documents/")
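A quicker way to check the working directory is sketched below. I use tempdir() only as a stand-in for a folder of your own, and recent versions of R also ship a built-in list.dirs function, which is the one called here.

```r
old.dir <- getwd()                          # the current working directory
print(old.dir)

setwd(tempdir())                            # tempdir() stands in for your own folder
print(getwd())                              # confirm the change
print(list.files())                         # files and folders R can see here
print(list.dirs(".", recursive = FALSE))    # folders only, using base R's list.dirs

setwd(old.dir)                              # go back to where we started
```

Saving the old directory first makes it easy to restore the session to its starting point.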
Basic Statistics
I can use many commands to get descriptive statistics. Remember, this dataset has many
categorical variables.
summary(dataset)
The partial output:
  Observations         Model            Make           MPG
 Min.   :  1.00   Passat  :  6   BMW       : 25   Min.   :13.00
 1st Qu.: 94.75   Accord  :  5   Mitsubishi: 24   1st Qu.:26.00
 Median :188.50   Cavalier:  5   Volkswagen: 22   Median :29.00
 Mean   :188.50   Mustang :  5   Chevrolet : 21   Mean   :29.48
 3rd Qu.:282.25   S70     :  5   Ford      : 20   3rd Qu.:32.00
 Max.   :376.00   Sunfire :  5   Toyota    : 18   Max.   :50.00
                  (Other) :345   (Other)   :246
Remember, we created new variables. If I want the summary statistics for the variables I
created, then I use cbind to combine the vectors into a matrix. The c in cbind refers to
columns, so the matrix x will have three columns. We also have the command rbind, which
combines vectors as rows.
x <- cbind(mpg, eng.size, cylinders)
Then I use the summary to get the descriptive statistics.
summary(x)
The output:
      mpg           eng.size       cylinders
 Min.   :13.00   Min.   :1.000   Min.   : 3.000
 1st Qu.:26.00   1st Qu.:2.000   1st Qu.: 4.000
 Median :29.00   Median :2.400   Median : 4.000
 Mean   :29.48   Mean   :2.645   Mean   : 5.191
 3rd Qu.:32.00   3rd Qu.:3.000   3rd Qu.: 6.000
 Max.   :50.00   Max.   :6.000   Max.   :12.000
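The difference between cbind and rbind is easiest to see on two short vectors. The vectors a and b below are made up for illustration.

```r
a <- c(1, 2, 3)
b <- c(10, 20, 30)

by.col <- cbind(a, b)     # each vector becomes a column: 3 rows, 2 columns
by.row <- rbind(a, b)     # each vector becomes a row: 2 rows, 3 columns

print(dim(by.col))        # 3 2
print(dim(by.row))        # 2 3
print(summary(by.col))    # summary works column by column on a matrix
```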
I want to draw a box plot of the variables. R lays out the boxplot horizontally and plots
the axes. I set the title to Boxplot of the Data.
boxplot(x, horizontal=TRUE, axes=TRUE, main="Boxplot of the Data")
I want to calculate the correlations on my data. I redefine my x matrix to add more variables.
x <- cbind(mpg, compact, trans, eng.size, cylinders)
x refers to the matrix of observations. The command cor calculates the correlations, while
use determines how R handles missing observations. With pairwise.complete.obs, R calculates
each correlation from the observations that are complete for that pair of variables. Finally,
method determines which correlation to use: pearson, spearman, or kendall.
corr.x <- cor(x, use="pairwise.complete.obs", method="kendall")
corr.x
The output:
                 mpg     compact       trans   eng.size  cylinders
mpg        1.0000000  0.16313790 -0.25502462 -0.5787260 -0.6199851
compact    0.1631379  1.00000000 -0.03024894 -0.1447983 -0.1352761
trans     -0.2550246 -0.03024894  1.00000000  0.2306456  0.2411497
eng.size  -0.5787260 -0.14479830  0.23064557  1.0000000  0.7877670
cylinders -0.6199851 -0.13527609  0.24114966  0.7877670  1.0000000
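To see what the use option does, here is a small sketch with one missing value. The vectors u, v, and w are made-up data, not part of the car dataset.

```r
u <- c(1, 2, 3, 4, 5, 6)
v <- c(2, 4, 6, 8, 10, NA)      # one missing observation
w <- c(6, 5, 4, 3, 2, 1)
m <- cbind(u, v, w)

r.default <- cor(m)                                   # any pair involving v returns NA
r.pair    <- cor(m, use = "pairwise.complete.obs")    # Pearson on the complete pairs
r.kendall <- cor(m, use = "pairwise.complete.obs", method = "kendall")

print(r.default)
print(r.pair)
print(r.kendall)
```

In this example v rises with u and w falls with u, so the pairwise correlations are 1 and -1 once the missing pair is dropped.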
I will calculate something a little more complicated - canonical correlation. I create two
matrices x and y.
x <- cbind(mpg, compact)
y <- cbind(trans, eng.size, cylinders)
We use the command, cancor(x, y) to calculate the canonical correlation. I store the information
into an object called, cxy.
cxy <- cancor(x, y)
Print out the output
cxy
The output
$cor
[1] 0.6874188 0.1056315

$xcoef
               [,1]         [,2]
mpg     -0.01040752  0.002188889
compact -0.00724475 -0.120616344

$ycoef
                [,1]        [,2]        [,3]
trans     0.02686013 -0.04290648 -0.09737402
eng.size  0.02195106  0.12405721 -0.03980611
cylinders 0.01754001 -0.06947293  0.03930493

$xcenter
       mpg    compact
29.4760638  0.2473404

$ycenter
    trans  eng.size cylinders
0.6276596 2.6452128 5.1914894
Did you notice the $ signs? Each one marks a component stored in the object cxy, and I can pull
those values out. I want to use the second correlation in a calculation. The 2 refers to the
second number under cor.
correlation.2 <- cxy$cor[2]
correlation.2
[1] 0.1056315
Similarly, I need the third column from the $ycoef. We can index any matrix by using [row,
column]. The row is blank, so R will copy all the rows. The 3 indicates we only want the third
column.
vector <- cxy$ycoef[,3]
vector
The output
      trans    eng.size   cylinders
-0.09737402 -0.03980611  0.03930493
Instead, I want the number in the second row, first column. I store the number under element.
element <- cxy$ycoef[2,1]
element
eng.size
0.02195106
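The same indexing rules apply to any list and matrix. Below is a minimal sketch on a made-up object; the name obj and the numbers in it are hypothetical and only mimic the shape of cxy.

```r
obj <- list(
  cor  = c(0.69, 0.11),
  coef = matrix(1:6, nrow = 3, ncol = 2,
                dimnames = list(c("trans", "eng.size", "cylinders"), NULL))
)

second.cor <- obj$cor[2]              # second element of the vector cor
column.2   <- obj$coef[, 2]           # blank row index: every row of column 2
element    <- obj$coef[2, 1]          # row 2, column 1
by.name    <- obj$coef["eng.size", 1] # the same element, selected by row name

print(second.cor)
print(column.2)
print(element)
print(by.name)
```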
Linear Regression
With linear regression, we estimate a dependent variable, $y_i$, with one or more explanatory
variables. Refer to the equation below:
$$y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \cdots + \beta_k x_{k,i} + \varepsilon_i$$
We define the variables as:
• i represents one observation. If we have time series data, then we switch i to t.
• The dependent variable, $y_i$
  o We try to explain or predict $y_i$ based on the x variables
• The independent variables, $x_{j,i}$
  o We assume these variables are fixed and constant
• $\varepsilon_i$ represents the white noise process, assumed to be normal with a mean of zero
  and constant variance.
• We need to estimate the parameters
  o The intercept, $\beta_0$
  o $\beta_1, \beta_2, \ldots, \beta_k$ are the slopes.
I have data for 376 cars in 1998, or 376 observations. I believe the following relationships hold:
• A car's petrol consumption depends on the explanatory variables.
  o $y_i$ is measured in miles per gallon, or mpg.
• The explanatory variables
  o Compact cars should use less petrol than regular cars
    ▪ Dummy variable
    ▪ One if the car is compact, zero otherwise.
  o Cars with automatic transmissions use more petrol than sticks.
    ▪ Dummy variable
  o Larger engines use more petrol than smaller engines.
  o An engine with more cylinders uses more petrol.
$$y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \beta_3 x_{3,i} + \beta_4 x_{4,i} + \varepsilon_i$$
In R, we use lm as the command for linear regression. I store all the results in the object fit.
Fit is not a simple variable. Fit is an object containing many pieces of information. In this
case, data=dataset is redundant. I could drop this argument because I already created the
vectors mpg, compact, trans, eng.size, and cylinders in R.
fit <- lm(mpg ~ compact + trans + eng.size + cylinders, data=dataset)
I could type in fit and R will only show the coefficients. If I want to see the statistics, then I
type the command summary:
summary(fit)
Call:
lm(formula = mpg ~ compact + trans + eng.size + cylinders, data = dataset)

Residuals:
    Min      1Q  Median      3Q     Max
-6.9325 -2.2384 -0.2697  1.7869 17.1587

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  40.1067     0.6928  57.892  < 2e-16 ***
compact       0.5901     0.4347   1.357  0.17547
trans        -1.7810     0.3930  -4.532 7.90e-06 ***
eng.size     -1.2785     0.4767  -2.682  0.00764 **
cylinders    -1.2091     0.2932  -4.124 4.59e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
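If you do not have the car dataset at hand, you can practice with simulated data. The sketch below is only an illustration: the seed, the sample size, and the true coefficients 30, -2, and -1.5 are assumptions I picked, and the names sim, fit.sim, and coef.table are hypothetical.

```r
set.seed(1)                                 # arbitrary seed for repeatability
n  <- 200
x1 <- rnorm(n)                              # a continuous explanatory variable
d  <- as.numeric(runif(n) > 0.5)            # a dummy variable
y  <- 30 - 2 * x1 - 1.5 * d + rnorm(n)      # true coefficients: 30, -2, -1.5

sim     <- data.frame(y, x1, d)
fit.sim <- lm(y ~ x1 + d, data = sim)

print(coef(fit.sim))                        # estimated intercept and slopes

# The coefficient table from summary() is a matrix, so it can be indexed
coef.table <- summary(fit.sim)$coefficients
print(coef.table[, "Std. Error"])
print(coef.table[, "Pr(>|t|)"])
```

Pulling the table out of summary() lets a program reuse the standard errors and p-values instead of reading them off the screen.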
We can check whether the residuals are normally distributed. I will extract the standardized
residuals from the object, fit. Standardized means the program divides each residual by an
estimate of its standard deviation, so all residuals share a common scale. We use the command
below:
resid.standard <- rstandard(fit)
The command, fit$rstandard, does not work in this case because rstandard is a function and
not a component stored in fit.
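I create a QQ Plot by using the command, qqnorm. Then I add labels and a title to make it look
nice. The horizontal axis of a QQ plot shows the quantiles of the normal distribution, so I label
it that way.
qqnorm(resid.standard, ylab="Standardized Residuals",
xlab="Theoretical Quantiles", main="Gas Consumption")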
Then I add the line.
qqline(resid.standard)
If the residuals are normally distributed, the points should fall close to the line.
My data has outliers. I can estimate a Median Regression, a special case of Quantile
Regression. If you remember your statistics, an outlier represents an extreme point or
observation, the exception to the rule. When you calculate an average, outliers pull it away
from the typical value. On the other hand, the median is another measure of the center. The
median is the value in the middle, and it is not sensitive to outliers.
For quantile regression, I need to install the package, quantreg.
install.packages("quantreg")
You can also install it using the Packages menu. R should also install SparseM because quantreg
relies on that package for its calculations.
Note: You are assuming two things when you download and use someone else's package.
1. You assume the code works correctly and calculates what it is supposed to calculate.
2. You enter the correct parameters into the function when you use it.
lower bd  upper bd
37.00000  38.79854
-0.21725   1.00000
-1.17795  -0.30138
-1.12246   0.00000
-1.50000  -0.99398
I must re-estimate the regular regression because I want to compare linear regression and
quantile regression.
fit <- lm(mpg ~ eng.size, data=dataset)
I want to see what our estimates look like. I plot the data.
plot(mpg ~ eng.size, data=dataset)
abline(fit, lty="dashed", col="blue")
abline(fit.median)
abline draws a straight line with intercept a and slope b. When I give it a fitted model, it
takes both numbers from the model's coefficients. The argument lty specifies the line type. I
wrote dashed, but its numerical code equals 2. The argument col refers to the color. Then we
add a legend, where bty="n" removes the border from the legend.
legend("topright", inset=0.05, bty="n",
legend = c("Least Squares Fit", "Median Fit"),
lty = c(2, 1),
col = c("blue", "black")
)
The c() means combine the elements into a vector and is not the same as cbind. The line type,
lty, equals 2 for the dashed line and 1 for the solid line, in the same order as the labels.
Note: The c() command lets us cheat in R. Many arguments of R functions accept only one
object. Thus, we can use c() to combine several values into one vector. Then we can pass the
combined vector as a single argument into an R function.
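The sketch below contrasts c() with cbind() on the same two numbers. The variable names v, m, and mixed are my own, chosen only for illustration.

```r
v <- c(2, 1)               # a plain vector with two elements
m <- cbind(2, 1)           # a matrix with one row and two columns

print(length(v))           # 2
print(dim(v))              # NULL: a vector has no dimensions
print(dim(m))              # 1 2

mixed <- c("blue", 2)      # c() keeps one type, so the number becomes text
print(mixed)
print(class(mixed))        # "character"
```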
Next, we calculate the autocorrelation and partial autocorrelation plots to guide which ARIMA
model we should estimate. Of course, I create a new plot window. Otherwise, R will draw over
my first time series plot. The command win.graph is available only on Windows. On other
systems, dev.new() opens a new plot window.
win.graph(width = 6, height = 6, pointsize = 10)
We can use the par(mfrow) command to combine multiple plots onto the same graph. The
c(2,1) refers to two rows and one column, or c(rows, columns). Remember, the c() means
combine the elements, and it differs from the command, cbind().
par(mfrow=c(2,1))
acf(sunspots, 30, main="Sunspots")
pacf(sunspots, 30, main="Sunspots")
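I fit an AutoRegressive Integrated Moving Average (ARIMA) model to the data. The ARIMA is
difficult to explain, so let's assume our data does not have the Integrated part. That leaves an
ARMA, which is defined below:
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}$$
Here p is the number of autoregressive terms and q is the number of moving average terms.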
We set up the estimation below. The c(1,0,1) means c(number of autoregressive terms, order of
integration, number of moving average terms).
fit.arima <- arima(sunspots, order=c(1,0,1))
Summary does not work for ARIMA, so just enter fit.arima to get the coefficients and standard
errors.
Call:
arima(x = sunspots, order = c(1, 0, 1))

Coefficients:
         ar1      ma1  intercept
      0.9787  -0.4522    52.0854
s.e.  0.0039   0.0191     7.1406

aic =

          ma1  intercept
      -0.6154    52.0989
       0.0250     7.9316

aic =
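The pieces of an arima fit can also be pulled out one at a time. This is a sketch that assumes the series is the monthly sunspots dataset that ships with R; the object name fit.sketch is hypothetical.

```r
# sunspots is a monthly time series in R's built-in datasets package
fit.sketch <- arima(sunspots, order = c(1, 0, 1))

print(coef(fit.sketch))                    # ar1, ma1, and intercept
print(sqrt(diag(fit.sketch$var.coef)))     # standard errors of the coefficients
print(fit.sketch$loglik)                   # log likelihood
print(fit.sketch$aic)                      # Akaike information criterion
```

The standard errors are the square roots of the diagonal of var.coef, the estimated covariance matrix of the coefficients.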
Scientists at NASA claim the number of sunspots has an 11-year cycle. Unfortunately, R will
not let me estimate a seasonal ARIMA(1,0,0) with an 11-year seasonal component because, with
monthly data, that means a 132-month cycle. However, I found an annual cycle, and we can
estimate an ARIMA(1,0,1) with a seasonal ARIMA(1,0,0) that has a period of 12 months.
fit.arima.season <- arima(sunspots, order=c(1,0,1), seasonal =
list(order = c(1, 0, 0), period = 12))
Call:
arima(x = sunspots, order = c(1, 0, 1), seasonal = list(order = c(1, 0, 0),
    period = 12))

Coefficients:
         ar1      ma1    sar1  intercept
      0.9777  -0.4585  0.0446    52.0724
s.e.  0.0040   0.0194  0.0182     7.0849

log likelihood = -13356.46,  aic =
How did I know about the frequencies? I wrote a program in R that calculates the
periodogram from the residuals. A periodogram transforms a variable from the time domain
to a frequency domain. Just run the program to see the output.
source("periodogram.R")
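To show the idea of a periodogram without my program, here is a minimal sketch that uses R's built-in fft function instead of explicit loops. It is not the code in periodogram.R. The series is a made-up sine wave with exactly 10 cycles in 100 points, and all variable names are assumptions for this example.

```r
n          <- 100
time.index <- 0:(n - 1)
x          <- sin(2 * pi * 10 * time.index / n)   # made-up series: 10 cycles in 100 points

power     <- Mod(fft(x))^2 / n       # squared magnitude at each Fourier frequency
frequency <- (0:(n - 1)) / n         # frequency in cycles per observation

half <- 2:(n / 2)                    # skip frequency zero and keep the positive half
peak <- frequency[half][which.max(power[half])]
print(peak)                          # 0.1, or one cycle every 10 observations
```

The largest value of the periodogram sits at the frequency of the dominant cycle, which is how a plot of power against frequency reveals seasonality.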
Writing Programs
The Car and Sunspot Program
R uses a scripting language. We can write programs in plain text files using Notepad.
However, we save the file with the extension .R and not .txt.
We did many calculations for the car data. I organized everything into a program, which is
available in the Appendix. I added comments using the # mark and displayed my results using
the print function. Everything else is the same.
Look at the code. I wanted a blank line to separate the output, so I placed the command,
cat("\n"). Cat stands for concatenate while \n is a newline character. I named the file, car.R,
and can run it by:
source("car.R")
Similarly, I wrote an R program to analyze the sunspot data. To run the program, type in:
source("arima.R")
Note: The R scripting language allows users to write sloppy code. Imagine you return a year
later and try to figure out what you wrote. Thus, these rules come in handy.
1. Get into the habit of reviewing your code and simplifying it.
2. Use # to include comments in your code.
3. Print out your variables and calculations to verify them.
Simple Program
Programmers use a loop to repeat a process. R offers a programmer three methods to
construct a loop (for, while, and repeat), but I show only one. The body of a loop starts and
ends with a curly bracket. In the loop below, the index i starts at 1 and ends at 100.
Here is a quirk with R. The print function displays only one object at a time. So I create an x
variable that contains my quote and add the index number to it by using c(). Because c() builds
a vector of one type, R converts the index number to text.
for(i in 1:100) {
x <- c("Goodbye cold, cruel world",i)
print(x)
}
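If you prefer the quote and the index on one line as a single string, paste() is an alternative. The sketch below is my own variation on the loop above, shortened to three passes.

```r
for (i in 1:3) {
  x <- paste("Goodbye cold, cruel world", i)   # paste() joins the pieces into one string
  print(x)
}

# cat() writes several items at once without the [1] index or the quotes
cat("Goodbye cold, cruel world", 4, "\n")
```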
Amortization Table
I used R to calculate an amortization table. I created a subroutine to calculate the monthly
payment. We would not do this in real practice because the program only calls the subroutine
once. We normally use subroutines to compute repeated calculations.
The subroutine comes first in your program. In the program, I utilize the subroutine to
calculate the monthly payment. My variable is payment and I named the subroutine
loan.payment. I pass the variables principal, interest, years, and number into the subroutine.
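$$\text{payment} = \frac{\text{principal} \times \text{rate}}{1 - (1 + \text{rate})^{-\text{years} \times \text{number}}}$$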
I calculate the payment. I used plenty of parentheses to guarantee the formula calculates the
periodic payment correctly.
payment <- principal*rate / (1 - ((1+rate)^(years*number*(-1))))
Once the subroutine has finished, it returns the payment to the main program. Variables
created inside a subroutine are local, so R does not remember them after the subroutine
finishes. Also, don't forget the closing curly bracket.
return(payment)
}
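Here is the subroutine as one self-contained sketch, together with a call that uses the loan from this chapter. The variable local.rate.visible is a hypothetical name I added to show that rate stays inside the subroutine.

```r
loan.payment <- function(principal, interest, years, number) {
  rate    <- interest / number                       # periodic interest rate
  payment <- principal * rate / (1 - (1 + rate)^(-years * number))
  return(payment)
}

payment <- loan.payment(principal = 150000, interest = 0.08, years = 30, number = 12)
print(round(payment, 3))                             # 1100.647

# rate was created inside the subroutine, so it does not exist out here
local.rate.visible <- exists("rate", inherits = FALSE)
print(local.rate.visible)
```

Naming the arguments in the call protects against passing them in the wrong order.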
The main program begins executing. I define the parameters of the loan. The loan is $150,000.
The bank charges 8% annual interest rate. The borrower makes 12 payments per year and
repays the loan in 30 years. Below, I define the parameters.
principal <- 150000
interest  <- 0.08
number    <- 12
years     <- 30
I need to calculate the total number of payments, which equals the number of payments per
year times the number of years.
n <- number*years
I create a new variable called balance. For each periodic payment, the borrower pays the
interest and the remainder of the payment reduces the loan balance.
balance <- principal
I record this information into the matrix by indexing specific elements. The loop chooses the
row, i, while the second number determines the column. Please do not forget the closing
bracket, }, that comes at the loop's end.
table.amort[i,1] <- i
table.amort[i,2] <- interest.payment
table.amort[i,3] <- payment
table.amort[i,4] <- balance
}
R tends to write numbers in scientific notation, so I used the command below to get rid of
scientific notation.
options(scipen=999)
If you run a program, you must use the print command to display values. Otherwise, R will
not show the values.
print(table.amort)
I show a sample of the output below:
     Payment    Interest  Payment                Balance
[1,]       1 1000.000000 1100.647 149899.35313918092288
[2,]       2  999.329021 1100.647 149798.03529928970966
[3,]       3  998.653569 1100.647 149696.04200713254977
compact    0.5901217
trans     -1.7809514
eng.size  -1.2784526
cylinders -1.2090972
Did you notice? This should be a vector but R defines it as a matrix. The 1 identifies the first
column.
I want to predict the y values while using the x and the estimated $\hat{\beta}$. The hats mean
I estimated the parameters from the data. We have seen this before, remember
$y = X\beta + \varepsilon$. The random noise is missing, and I added some hats.
$$\hat{y} = X\hat{\beta}$$
predict <- x %*% betas
I want to calculate the residuals, or the errors, so I took this equation and solved for
$\hat{\varepsilon}$.
$$\hat{\varepsilon} = y - X\hat{\beta}$$
residuals <- y - predict
I need the number of explanatory variables including the intercept.
k <- length(betas)
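I calculate the variance of the least squares residuals. All I do is take each error, or residual,
and square it. Then I sum over all the squared errors and divide by the degrees of freedom,
n - k.
$$\hat{\sigma}^2 = \frac{\hat{\varepsilon}^{\top}\hat{\varepsilon}}{n-k} = \frac{(y - X\hat{\beta})^{\top}(y - X\hat{\beta})}{n-k} = \frac{1}{n-k}\sum_{i=1}^{n}\hat{\varepsilon}_i^{2}$$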
sigma <-
sigma
[,1]
[1,] 12.823
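The matrix steps can be checked against lm on a small simulated example. This sketch is my own reconstruction from the formulas, not the original program: the data, the seed, and the names res and check are assumptions.

```r
set.seed(7)                                    # arbitrary seed for repeatability
n  <- 50
x1 <- rnorm(n)
y  <- 3 + 2 * x1 + rnorm(n)                    # made-up data

x     <- cbind(intercept = 1, x1)              # design matrix with a column of ones
betas <- solve(t(x) %*% x) %*% t(x) %*% y      # least squares estimates
res   <- y - x %*% betas                       # residuals
k     <- length(betas)                         # number of estimated parameters
sigma <- (t(res) %*% res) / (n - k)            # 1 x 1 matrix with the error variance

check <- lm(y ~ x1)                            # the same regression with lm
print(all.equal(as.numeric(betas), as.numeric(coef(check))))
print(all.equal(as.numeric(sigma), summary(check)$sigma^2))
```

Both comparisons should print TRUE, which confirms that lm divides the sum of squared residuals by n - k.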
I solve for the covariance matrix of the estimated coefficients. The square roots of its diagonal
elements are the standard errors. The equation is below:
$$\widehat{\mathrm{Cov}}(\hat{\beta}) = \hat{\sigma}^2 (X^{\top}X)^{-1}$$
R has another quirk. The sigma is a matrix with dimensions 1 x 1. Thus, we have a single
number, or scalar. However, R does not allow a 1 x 1 matrix to be multiplied by a larger matrix
with the asterisk. So we must use the command, as.numeric, to convert the matrix into a
number, or scalar.
cov <- as.numeric(sigma)*solve(x.x)
cov
            intercept     compact        trans    eng.size    cylinders
intercept  0.47995567 -0.07416303 -0.042483365  0.05716483 -0.106338746
compact   -0.07416303  0.18899524 -0.001004130  0.02518794 -0.007431460
trans     -0.04248337 -0.00100413  0.154447230 -0.01112530 -0.004773151
eng.size   0.05716483  0.02518794 -0.011125304  0.22720238 -0.126632363
cylinders -0.10633875 -0.00743146 -0.004773151 -0.12663236  0.085937247
89.89684 78.94843 45.24267 23.43552

$vectors
            [,1]       [,2]       [,3]        [,4]       [,5]
[1,] -0.15460189 0.26034516 -0.1553690  0.25717147  0.9044567
[2,] -0.03567982 0.57873778 -0.6674799 -0.43788487 -0.1628402
[3,] -0.10275274 0.72261583  0.6754002 -0.02836281 -0.1014808
                           [,1]
intercept -0.000000000003183231
compact   -0.000000000002728484
trans     -0.000000000003865352
eng.size  -0.000000000016370905
cylinders -0.000000000030922820
Technically, this vector should equal zero. The tiny nonzero values come from rounding errors
in the computer's floating-point arithmetic.
Choleski Factorization
We can easily calculate a Choleski factorization of A. Choleski factorization is similar to a
matrix square root. In algebra, $\sqrt{a}\,\sqrt{a} = a$. For matrices, can we find a square
matrix R so that:
$$R^{\top} R = A$$
The command is:
R <- chol(A)
          intercept  compact      trans  eng.size  cylinders
intercept  19.39072 4.796109 12.1707707 51.292579 100.666714
compact     0.00000 8.366441 -0.2835543 -3.060416  -3.801916
trans       0.00000 0.000000  9.3697352  4.190299   6.695009
eng.size    0.00000 0.000000  0.0000000 17.770949  26.186285
cylinders   0.00000 0.000000  0.0000000  0.000000  12.215298
Verify the Choleski factorization by multiplying $R^{\top}R$. Remember, we must use matrix
multiplication, %*%.
t(R)%*%R
          intercept compact  trans eng.size cylinders
intercept     376.0    93.0  236.0   994.60    1952.0
compact        93.0    93.0   56.0   220.40     451.0
trans         236.0    56.0  236.0   664.40    1289.0
eng.size      994.6   220.4  664.4  2973.66    5668.5
cylinders    1952.0   451.0 1289.0  5668.50   11028.0
It does indeed equal the A matrix.
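The same check fits in a few lines on a small matrix. The 2 x 2 matrix below is a made-up symmetric, positive definite example, and the names A.small and R.small are hypothetical.

```r
A.small <- matrix(c(4, 2,
                    2, 3), nrow = 2, byrow = TRUE)   # symmetric and positive definite

R.small <- chol(A.small)                 # upper triangular factor
print(R.small)
print(t(R.small) %*% R.small)            # reproduces A.small

# all.equal compares within a small tolerance, which absorbs rounding errors
print(all.equal(t(R.small) %*% R.small, A.small))
```

Using all.equal instead of == avoids false alarms from floating-point rounding.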
The Periodogram
############################################################
#
# This program calculates the periodogram on the residuals
# Ken Szulczyk
#
############################################################
# reads the variable residuals
n = length(residuals)
response <- matrix(0, n, 1)
frequency <- matrix(0, n, 1)
for(i in 1:n) {
  # This loop calculates a particular frequency
  # We will look at only positive frequencies
  w <- i/(2*n)               # Frequency value
  real <- matrix(0, n, 1)    # Set real vector to zero
  imag <- matrix(0, n, 1)    # Set imaginary vector to zero
  frequency[i,1] <- w        # Place frequency value in x matrix
  for(t in 1:n) {
    # This loop calculates the frequency response of the time series
    real[t,1] <- cos(w*t)
    imag[t,1] <- sin(w*t)
  }
Amortization Table
####################################################
# Create a subroutine to calculate a loan payment
####################################################
loan.payment <- function(principal, interest, years, number) {
  rate <- interest/number
  payment <- principal*rate / (1 - ((1+rate)^(years*number*(-1))))
  return(payment)
}
####################################################
# Main Program
####################################################
# Define the loan parameters. Principal is the amount of the loan.
# Interest is the annual interest rate. Number is the number of
# payments per year, and years is the total number of years for the loan.
principal <- 150000
interest  <- 0.08
number    <- 12
years     <- 30
# number of payments
n <- number*years
balance <- principal
# Call the subroutine to get the periodic payment
# (this line is missing from the listing and is assumed from the text)
payment <- loan.payment(principal, interest, years, number)
# Create the matrix with one row per payment and four columns
# (also assumed, because the loop below needs the matrix to exist)
table.amort <- matrix(0, n, 4)
for(i in 1:n) {
  interest.payment <- balance*interest/number
  balance <- balance - payment + interest.payment
  table.amort[i,1] <- i
  table.amort[i,2] <- interest.payment
  table.amort[i,3] <- payment
  table.amort[i,4] <- balance
}
# Option suppresses scientific notation
options(scipen=999)
# Print the matrix
print(table.amort)