Chapter 4 R Programming Basics

Before we get started, you will need to know the basics of matrix manipulation in the R programming language:

  • Matrices are generally entered as a single vector, which R then breaks into rows and columns in the way that you specify (with nrow/ncol). By default, R fills a matrix from a vector down the columns. To fill across the rows instead, use the byrow=TRUE option. This is only relevant if you’re entering matrices from scratch.
Y=matrix(c(1,2,3,4),nrow=2,ncol=2)
Y
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
X=matrix(c(1,2,3,4),nrow=2,ncol=2,byrow=TRUE)
X
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
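The fill order is easier to see with a non-square matrix. The following is a minimal sketch; the names v, M_col and M_row are ours and chosen only for illustration. When only nrow is given, R infers the number of columns from the length of the vector.

```r
v = 1:6
# Default: fill down the columns
M_col = matrix(v, nrow=2)
M_col
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
# byrow=TRUE: fill across the rows
M_row = matrix(v, nrow=2, byrow=TRUE)
M_row
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
```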
  • The standard multiplication symbol, ‘*’, will give unexpected results if you are looking for matrix multiplication: ‘*’ multiplies matrices elementwise. To do matrix multiplication, use the operator ‘%*%’.
X*X
##      [,1] [,2]
## [1,]    1    4
## [2,]    9   16
X%*%X
##      [,1] [,2]
## [1,]    7   10
## [2,]   15   22
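Unlike ‘*’, the ‘%*%’ operator requires the inner dimensions to agree. As a small sketch (the matrices P and Q are hypothetical examples, not part of the data used elsewhere in this chapter):

```r
P = matrix(1:6, nrow=2)   # 2 x 3
Q = matrix(1:6, nrow=3)   # 3 x 2
# (2 x 3) times (3 x 2) gives a 2 x 2 matrix
P%*%Q
##      [,1] [,2]
## [1,]   22   49
## [2,]   28   64
# (3 x 2) times (2 x 3) gives a 3 x 3 matrix
dim(Q%*%P)
## [1] 3 3
```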
  • To transpose a matrix or a vector \(\mathbf{X}\), use the function t(X).
t(X)
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
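It can help to check what t() does to a plain vector, which has no dimensions of its own. A brief sketch, using a made-up vector v:

```r
v = c(1,2,3)
# A plain vector has no dim attribute
dim(v)
## NULL
# t() treats it as a column and returns a 1 x 3 row matrix
dim(t(v))
## [1] 1 3
# Transposing again gives a 3 x 1 column matrix
dim(t(t(v)))
## [1] 3 1
```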
  • R indexes vectors and matrices starting with \(i=1\) (as opposed to \(i=0\) in Python).
  • X[i,j] gives element \(\mathbf{X}_{ij}\). You can alter individual elements this way.
X[2,1]
## [1] 3
X[2,1]=100
X
##      [,1] [,2]
## [1,]    1    2
## [2,]  100    4
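Leaving an index blank selects an entire row or column, a pattern used in later examples such as A[1:5, ]. The sketch below uses a fresh matrix M (our name) so that it runs on its own:

```r
M = matrix(c(1,2,3,4),nrow=2,ncol=2,byrow=TRUE)
# First row (returned as a plain vector)
M[1, ]
## [1] 1 2
# Second column (also returned as a plain vector)
M[ ,2]
## [1] 2 4
# drop=FALSE keeps the result as a 2 x 1 matrix
dim(M[ ,2,drop=FALSE])
## [1] 2 1
```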
  • To create a vector of all ones, \(\mathbf{e}\), use the rep() function.
e=rep(1,5)
e
## [1] 1 1 1 1 1
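One use of the ones vector is to write sums and means as inner products. As an illustrative sketch with a made-up vector x, the inner product of e and x divided by n agrees with mean(x):

```r
x = c(2,4,6,8,10)
n = length(x)
e = rep(1,n)
# Inner product with the ones vector sums the entries; the result is a 1 x 1 matrix
(e%*%x)/n
##      [,1]
## [1,]    6
mean(x)
## [1] 6
```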
  • To compute the mean of a vector, use the mean() function. To compute the column means of a matrix (or data frame), use the colMeans() function. You can also use the apply() function, which you will need if you want column standard deviations (sd() function). apply(X,dim,function) applies the specified function to the specified dimension dim (1 for rows, 2 for columns) of the matrix or data frame X.
# Start by generating random ~N(0,1) data:
A=replicate(2,rnorm(5))
colMeans(A)
## [1] -0.4884781  0.2465562
# (Why aren't the means close to zero?)
A=replicate(2,rnorm(100))
colMeans(A)
## [1] -0.14709807  0.05484491
#LawOfLargeNumbers.

apply(A,2,sd)
## [1] 0.9951114 0.9658601
# To apply a "homemade function" you must create it as a function
# Here we apply a sum of squares function for the first 5 rows of A:
apply(A[1:5, ],1,function(x) x%*%x)
## [1] 1.7102525 1.0398961 4.1784246 3.9187167 0.5713711
# Here we center the data by subtracting the mean vector:
B=apply(A,2,function(x) x-mean(x))
colMeans(B)
## [1]  1.804112e-18 -1.713907e-17
# R prints tiny floating-point values rather than exact zeros. These means are
# zero to machine precision: their absolute values fall below the machine
# epsilon, which R stores in the internal variable .Machine$double.eps
abs(colMeans(B)) < .Machine$double.eps
## [1] TRUE TRUE
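Two base R helpers are worth knowing for this kind of check. The sketch below assumes an arbitrary seed (set only so the example is reproducible) and compares the apply() centering above with scale(), using all.equal(), which tests equality up to a small numerical tolerance.

```r
set.seed(1)  # arbitrary seed, only for reproducibility of this sketch
A = replicate(2,rnorm(100))
# Center with apply(), as above
B = apply(A,2,function(x) x-mean(x))
# Center with scale(); scale=FALSE means "do not divide by the standard deviation"
B2 = scale(A,center=TRUE,scale=FALSE)
# all.equal() treats differences near machine precision as equality
all.equal(colMeans(B),c(0,0))
## [1] TRUE
# The two centered matrices agree up to rounding error
max(abs(B-B2)) < 1e-12
## [1] TRUE
```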
  • To invert a matrix, use the solve() command.
Xinv=solve(X)
X%*%Xinv
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
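When given a second argument, solve() solves a linear system directly rather than returning the inverse. A minimal sketch, assuming a hypothetical right-hand side b and a fresh copy of the original matrix (called M here):

```r
M = matrix(c(1,2,3,4),nrow=2,ncol=2,byrow=TRUE)
b = c(5,6)      # hypothetical right-hand side
# Solve M z = b without forming the inverse explicitly
z = solve(M,b)
z
## [1] -4.0  4.5
# Check: M z should reproduce b (up to rounding error)
all.equal(drop(M%*%z),b)
## [1] TRUE
```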
  • To determine the size of a matrix, use the dim() function. The result is a vector with two values: dim(X)[1] gives the number of rows and dim(X)[2] gives the number of columns. You can label the rows or columns of a matrix using the rownames() or colnames() functions.
dim(A)
## [1] 100   2
nrows=dim(A)[1]
ncols=dim(A)[2]
colnames(A)=c("This","That")
A[1:5, ]
##            This       That
## [1,] -1.2985084  0.1553331
## [2,]  0.9521460 -0.3651220
## [3,]  1.8559421  0.8566817
## [4,] -1.8959629 -0.5692463
## [5,]  0.4465415  0.6098949
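The shortcuts nrow() and ncol() return the same values as dim(), and rownames() works just like colnames(). A small sketch with a made-up matrix M and made-up labels:

```r
M = matrix(1:6,nrow=3,ncol=2)
nrow(M)   # same as dim(M)[1]
## [1] 3
ncol(M)   # same as dim(M)[2]
## [1] 2
rownames(M)=c("a","b","c")
colnames(M)=c("This","That")
# Once labeled, elements can be indexed by name
M["b","That"]
## [1] 5
```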
  • Most arithmetic functions you apply to a vector act elementwise. In R, x^2 will be a vector containing the squares of the elements of \(\mathbf{x}\). You can add a column to a matrix (or a data frame) by using the cbind() function.
# Add a column containing the square of the second column
A=cbind(A,A[ ,2]^2)
colnames(A)
## [1] "This" "That" ""
colnames(A)[3]="That Squared"
colnames(A)
## [1] "This"         "That"         "That Squared"
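The renaming step can be avoided by naming the new column inside the call to cbind(). A sketch using a small made-up matrix M:

```r
M = matrix(1:4,nrow=2)
colnames(M)=c("This","That")
# A named argument to cbind() becomes the name of the new column
M = cbind(M,"That Squared"=M[ ,"That"]^2)
colnames(M)
## [1] "This"         "That"         "That Squared"
M[ ,"That Squared"]
## [1]  9 16
```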
  • You can compute vector norms using the norm() function. Unfortunately, the default norm is not the \(2\)-norm (it should be!), so we must specify type="2" as the second argument to the function.
x=c(1,1,1)
y=c(1,0,0)
norm(x,type="2")
## [1] 1.732051
# It's actually fewer characters to work from the equivalent definition:
sqrt(x%*%x)
##          [,1]
## [1,] 1.732051
norm(y,type="2")
## [1] 1
norm(x-y,type="2")
## [1] 1.414214
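Note that sqrt(x%*%x) returns a 1 x 1 matrix rather than a plain number. If a plain number is wanted, one option (a sketch, not the only way) is to sum the squared elements directly, or to strip the matrix dimensions with drop():

```r
x=c(1,1,1)
y=c(1,0,0)
# Elementwise square, then sum: the result is a plain number
sqrt(sum(x^2))
## [1] 1.732051
sqrt(sum((x-y)^2))
## [1] 1.414214
# drop() removes the 1 x 1 matrix dimensions
drop(sqrt(x%*%x))
## [1] 1.732051
```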

You’ll learn many additional R techniques throughout this course, but our strategy in this text will be to pick them up as we go rather than trying to memorize them all at the start.