Review of Chapter 8

Assignment and data types:

Use the assignment operator <- (Alt + - in RStudio); the equal sign = works as well, but <- is the conventional choice in R code.

a <- 5 # create a variable called a and assign 5 to it
print(a)

To determine the type of a variable, use class(a):

class(a)
a <- "ciao" # note dynamic typing
class(a)

Basic data types in R:

character (strings)
numeric (real numbers)
integer (integer numbers)
complex (complex numbers)
logical (TRUE, FALSE)
factor (categorical values)
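A minimal sketch showing one value of each basic type and what class() reports for it. The variable names are illustrative. Note that a plain number such as 5 is numeric; the L suffix is needed to get an integer.

```r
# One example value per basic data type; class() reports the type
x_chr  <- "ciao"
x_num  <- 3.14
x_int  <- 5L                              # the L suffix makes an integer literal
x_cplx <- 2 + 3i
x_lgl  <- TRUE
x_fct  <- factor(c("low", "high", "low"))  # categorical values
sapply(list(x_chr, x_num, x_int, x_cplx, x_lgl, x_fct), class)
# → "character" "numeric" "integer" "complex" "logical" "factor"
class(5)        # → "numeric" (not "integer")
levels(x_fct)   # categories, sorted alphabetically: "high" "low"
```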

Operators

+ - * / ^ work as expected
x %% y modulus
x %/% y integer division
x %in% y test for membership
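A few quick evaluations of the operators listed above. The numbers are arbitrary; the one subtlety worth knowing is that %% follows the sign of the divisor for negative operands.

```r
17 %% 5                   # → 2 (remainder)
17 %/% 5                  # → 3 (integer division)
-7 %% 3                   # → 2 (the result has the sign of the divisor)
3 %in% c(1, 2, 3)         # → TRUE
c(1, 4) %in% c(1, 2, 3)   # → TRUE FALSE (tested element by element)
2 ^ 10                    # → 1024
```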

Data structures

Vectors

Contain a one-dimensional array of values of the same type:

v <- c(1, 2, 3, 4) # combine
# R starts counting at 1 (different from Python)
v[1]
v[1:2]
v[-1] # without the first element (different from Python)
v[c(1, 3)] # non-adjacent
v[c(TRUE, FALSE, TRUE, TRUE)] # using logical
v[v %in% c(1, 3)] # using membership
v[v < 3] # using logical condition
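Because a vector holds values of a single type, combining different types silently converts (coerces) them to a common type. A short sketch with illustrative values; the last line uses the fact that TRUE counts as 1 to count how many elements satisfy a condition.

```r
v_mixed <- c(1, "a", TRUE)
class(v_mixed)   # → "character": all values were converted to strings
v_mixed          # → "1" "a" "TRUE"
v_num <- c(1, TRUE, FALSE)
v_num            # → 1 1 0 (logicals become numbers)
v <- c(1, 2, 3, 4)
sum(v < 3)       # → 2: number of elements smaller than 3
```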

You can operate on whole vectors:

sum(v)
prod(v)
v ^ 2
sqrt(v)

Length of a vector:

length(v)

Matrices and arrays

Two- or multi-dimensional arrays containing data of the same type:

m <- matrix(data = 1:9, nrow = 3, ncol = 3, byrow = TRUE)
m
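To see what byrow = TRUE does, compare it with the default (byrow = FALSE), which fills the matrix column by column. The variable names are illustrative.

```r
m_row <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)  # fill row by row
m_col <- matrix(1:9, nrow = 3, ncol = 3)                # default: fill column by column
m_row[1, ]                  # → 1 2 3
m_col[1, ]                  # → 1 4 7
identical(m_col, t(m_row))  # → TRUE: one is the transpose of the other
m_row[2, 3]                 # → 6 (row 2, column 3)
dim(m_row)                  # → 3 3
```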

Accessing rows/cols:

m[1,] # first row
m[,1] # first col
# note that R drops dimensions automatically
is.matrix(m[1,])
# to prevent it
is.matrix(m[1, , drop = FALSE])

Multidimensional:

aa <- array(data = 1:27, dim = c(3,3,3))
aa
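Elements of a multidimensional array are indexed with one position per dimension. Since array() fills the first dimension fastest, element [i, j, k] of this 3x3x3 array equals i + 3(j - 1) + 9(k - 1).

```r
aa <- array(data = 1:27, dim = c(3, 3, 3))
aa[1, 2, 3]       # → 22: row 1, column 2, layer 3
aa[, , 1]         # first 3x3 layer, returned as a matrix
dim(aa[, , 1])    # → 3 3 (the third dimension is dropped)
```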

Lists

Collection of objects indexed by position or name:

ll <- list(v = c(1,2,3), n = c("a", "b", "c"))
ll
ll$v # $ like . in Python
ll$n
ll[[1]] # note the double brackets
ll[[2]][c(1,3)]
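The difference between single and double brackets is what they return: [ ] gives back a smaller list, while [[ ]] (like $) extracts the element itself. A brief sketch; the element name new is illustrative.

```r
ll <- list(v = c(1, 2, 3), n = c("a", "b", "c"))
class(ll[1])                 # → "list": single brackets return a sub-list
class(ll[[1]])               # → "numeric": double brackets return the element
identical(ll[["v"]], ll$v)   # → TRUE: [[ ]] also accepts names
names(ll)                    # → "v" "n"
ll$new <- TRUE               # add an element by name
length(ll)                   # → 3
```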

Data frames

Possibly the most used data structure in R; data frames store spreadsheet-like data:

df <- read.csv("Goldberg2010_data.csv",
               stringsAsFactors = FALSE, # before R 4.0.0, strings were read as factors (categorical values) by default
               quote = "")
# first few rows; tail(df) for the last few
head(df)
# structure
str(df)
# extract column
df$Species[1:2]
df[,"Species"][1:2]
df[1:2, 1]
# extract row by index
df[3:4,]
# extract row using logical operators
df[df$Species == "Acnistus_arborescens",]
df[df$Status == 2,]
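If Goldberg2010_data.csv is not at hand, the same operations can be tried on a tiny data frame built in memory. The species names and status values below are made up for illustration; only the column names mirror the ones used above.

```r
# Hypothetical toy data with the same column names as above
df_toy <- data.frame(Species = c("sp_A", "sp_B", "sp_C"),
                     Status  = c(2, 1, 2),
                     stringsAsFactors = FALSE)
dim(df_toy)                          # → 3 2 (rows, columns)
df_toy[df_toy$Status == 2, ]         # rows 1 and 3, all columns
df_toy$Species[df_toy$Status == 2]   # → "sp_A" "sp_C"
subset(df_toy, Status == 2, select = Species)  # same rows, Species column only
```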

Reading and writing data

For delimited text files, use read.table (whitespace-separated by default), read.csv (comma-separated), or read.csv2 (semicolon-separated). The matching functions write.table, write.csv, and write.csv2 write such files.

Important options:

reading: stringsAsFactors = FALSE reads strings as character instead of factors (the default since R 4.0.0)
writing: row.names = FALSE does not write the row names (by default, the row numbers)
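A round trip through a temporary file shows why row.names = FALSE matters: without it, write.csv adds an unnamed column of row numbers, which read.csv brings back as an extra column called X. The file and data here are throwaway examples.

```r
tmp <- tempfile(fileext = ".csv")
df_out <- data.frame(id = 1:3, name = c("x", "y", "z"))
write.csv(df_out, tmp, row.names = FALSE)   # only the real columns are written
df_in <- read.csv(tmp, stringsAsFactors = FALSE)
names(df_in)          # → "id" "name"
write.csv(df_out, tmp)                      # row names written this time
names(read.csv(tmp))  # → "X" "id" "name": the row numbers became a column
```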

Conditional branching

if (condition) {
  # this is executed when the condition is true
} else {
  # this when the condition is false
}
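A concrete instance of the template, with an arbitrary value of x. For a whole vector, ifelse() applies the same test element by element, since if only looks at a single TRUE/FALSE value.

```r
x <- 7
if (x %% 2 == 0) {
  parity <- "even"
} else {
  parity <- "odd"
}
parity   # → "odd"
# vectorized alternative for a whole vector
ifelse(c(1, 2, 3) %% 2 == 0, "even", "odd")   # → "odd" "even" "odd"
```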

Looping

for loop:

for (i in a_vector_or_list){
  do_something(i)
}

Example:

for (i in 2:10){
  print(c(i, i * (i - 1) / 2 ))
}

while loop:

while (a_condition_is_true){
  do_something()
  # update condition!
}
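A small concrete while loop, with illustrative numbers: halve n until it drops below 1, counting the steps. The update of n inside the body is what eventually makes the condition false.

```r
n <- 100
steps <- 0
while (n >= 1) {
  n <- n / 2           # update: n shrinks every iteration, so the loop ends
  steps <- steps + 1
}
steps   # → 7 (100 -> 50 -> 25 -> 12.5 -> 6.25 -> 3.125 -> 1.5625 -> 0.78125)
```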

User-defined functions

Anatomy:

my_func <- function(arg1 = "default_value", arg2){
  # ...
  # body of the function
  # ...
  # return statement
  return(my_result)
}
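A filled-in version of the anatomy, using the formula from the for-loop example above. The function name n_pairs and the verbose argument are illustrative choices; k * (k - 1) / 2 is the number of pairs among k items, the same value as choose(k, 2).

```r
# Hypothetical example: number of pairs among k items
n_pairs <- function(k, verbose = FALSE) {
  result <- k * (k - 1) / 2
  if (verbose) {
    print(c(k, result))   # optional progress output
  }
  return(result)
}
n_pairs(4)                    # → 6
n_pairs(2:5)                  # → 1 3 6 10 (works on whole vectors)
n_pairs(k = 3, verbose = TRUE)
```

Arguments can be given by position or by name, and arguments with a default (here verbose) can be omitted.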

Warmup exercise: TED Talks

For our warmup, we are going to use a spreadsheet with information on 992 TED talks. The data were adapted from:

Kinnaird, Katherine M. and John Laudun. 2018. TED Talks Data Set.

Plot a histogram of the number of views. Is the distribution approximately log-normal?
Transform the duration to seconds

Hint: Look here
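One possible approach, sketched under an assumption: that durations are stored as text like "mm:ss" or "hh:mm:ss" (check the actual column before relying on this). The helper to_seconds and the sample values are hypothetical.

```r
# Hypothetical helper: convert "mm:ss" or "hh:mm:ss" strings to seconds
to_seconds <- function(s) {
  p <- as.numeric(strsplit(s, ":")[[1]])   # e.g. "12:05" -> c(12, 5)
  sum(p * 60 ^ rev(seq_along(p) - 1))      # weights ..., 3600, 60, 1
}
duration <- c("12:05", "3:30", "18:00")    # made-up example values
seconds <- sapply(duration, to_seconds, USE.NAMES = FALSE)
seconds                # → 725 210 1080
to_seconds("1:02:03")  # → 3723
```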

Plot duration in seconds vs. log number of views: does duration correlate with views?
Count the number of days since publication, and plot against log views

Hint: Look here
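A sketch for the days since publication, assuming the dates can be parsed by as.Date (here ISO "YYYY-MM-DD"; another layout would need a format string such as format = "%m/%d/%Y"). The dates are made up, and a fixed reference day keeps the result reproducible; Sys.Date() would give today instead.

```r
published <- as.Date(c("2018-12-31", "2019-01-01"))   # hypothetical dates
reference_day <- as.Date("2019-01-15")                # fixed for reproducibility
published_days <- as.numeric(reference_day - published)
published_days   # → 15 14
```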

Find the top 10 tags
For each of the top tags, add a column to the data frame specifying whether the tag is present

Hint: you could use the function grepl
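A sketch of the counting and grepl steps, assuming (hypothetically) that each talk's tags sit in one comma-separated string. The data below are made up, and only the top 2 tags are kept to fit the toy example. Keep in mind that grepl matches substrings, so a short tag can also match inside a longer one.

```r
# Hypothetical tags column: one comma-separated string per talk
tags <- c("science,technology", "culture,technology", "technology,science")
all_tags <- trimws(unlist(strsplit(tags, ",")))
tag_counts <- sort(table(all_tags), decreasing = TRUE)
top_tags <- names(tag_counts)[1:2]   # use 1:10 on the real data
talks <- data.frame(tags = tags, stringsAsFactors = FALSE)
for (tg in top_tags) {
  # new logical column named after the tag: TRUE if the tag text appears
  talks[[tg]] <- grepl(tg, talks$tags, fixed = TRUE)
}
top_tags   # → "technology" "science"
```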

Build a linear model with
- Response variable = log(views)
- Explanatory variables = published_days, seconds, technology, science, culture, etc.
- Which tags significantly increase views?

Hint: Look here
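The shape of the model can be tried on simulated data before using the real spreadsheet. Everything below is simulated (only technology is given a true effect), so the estimates say nothing about actual TED talks; the column names follow the exercise.

```r
set.seed(42)
n <- 200
talks_sim <- data.frame(
  published_days = runif(n, 0, 3000),
  seconds        = runif(n, 300, 1200),
  technology     = rbinom(n, 1, 0.3) == 1,   # logical tag columns
  science        = rbinom(n, 1, 0.3) == 1
)
# simulated response: only technology has a real effect here
talks_sim$views <- exp(13 + 0.5 * talks_sim$technology + rnorm(n, sd = 0.5))
fit <- lm(log(views) ~ published_days + seconds + technology + science,
          data = talks_sim)
summary(fit)                          # estimates and Pr(>|t|) for each term
coef(summary(fit))["technologyTRUE", ]  # logical predictors get a "TRUE" suffix
```

For the last question, anova(fit_small, fit_large) compares two nested lm fits (for example, with and without the speaker terms) with an F test.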

Add to the model the effect of the top 10 speakers by number of talks. Does this improve the fit?

Here’s a possible solution

CSB_2019

Lecture notes and exercises for Computing Skills for Biologists, Winter 2019