Fundamentals of Biological Data Analysis
Course material for Fundamentals of Biological Data Analysis, BIOS 26318, fall 2023
Organization of the class
Learning goals
- R tools for visualizing and analyzing data
- exploration of tidyverse
- dplyr, tidyr, and readr for data wrangling and organization
- ggplot2 for visualization
- specific packages and functions for statistical analysis
- Theory to perform statistical inference
- assumptions of different methods
- hypothesis testing
- estimation of parameters
- model building and selection
- Avoiding common errors
- when (not) to use a statistical method
- sneaky paradoxes
- phantom effects
- Work on your own data
- analyze data
- produce graphics
- write up a report
- present to class
Approach
- Mix of theory and practice
- Apply what you’re learning to your own data
Materials
Week 0
R refresher @ref(refresher)
Week 1
- Using ggplot2 to produce publication-ready figures
- Review of probability
Week 2
- Data wrangling in tidyverse
- Probability distributions
Week 3
- Hypothesis testing
- Likelihood
Week 4
- Linear algebra primer
- Linear models
Week 5
- Analysis of variance
- Model selection
Week 6
- Principal Component Analysis and SVD
- Multidimensional scaling and Clustering
Week 7
- Generalized Linear Models
- Machine Learning and cross validation
Week 8
- Monte Carlo and bootstrap
- Modeling time-series data
Week 9
Thanksgiving break
Week 10
- Student presentations 1
- Student presentations 2
Acknowledgements
Thanks to Zach Miller for TAing the first iteration of the class and for contributing materials and comments; Julia Smith for TAing the second iteration; Cassie Manrique for TAing the third iteration; and Amatullah Mir for TAing this year. Development of the class was partially supported by the Burroughs Wellcome Fund through the program “Quantitative and statistical thinking in the life sciences” (Stefano Allesina, PI).