Fundamentals of Biological Data Analysis

Course material for Fundamentals of Biological Data Analysis, BIOS 26318, fall 2023
Author

Stefano Allesina and Dmitry Kondrashov

Organization of the class

Learning goals

  • R tools for visualizing and analyzing data
    • exploration of tidyverse
    • dplyr, tidyr and readr for data wrangling and organization
    • ggplot2 for visualization
    • specific packages and functions for statistical analysis
  • Theory to perform statistical inference
    • assumptions of different methods
    • hypothesis testing
    • estimation of parameters
    • model building and selection
  • Avoiding common errors
    • when (not) to use a statistical method
    • sneaky paradoxes
    • phantom effects
  • Work on your own data
    • analyze data
    • produce graphics
    • write up a report
    • present to class

Approach

  • Mix of theory and practice
  • Apply what you’re learning to your own data

Materials

Week 0

  • R refresher

Week 1

  • Using ggplot2 to produce publication-ready figures
  • Review of probability

Week 2

  • Data wrangling in tidyverse
  • Probability distributions

Week 3

  • Hypothesis testing
  • Likelihood

Week 4

  • Linear algebra primer
  • Linear models

Week 5

  • Analysis of variance
  • Model selection

Week 6

  • Principal Component Analysis and SVD
  • Multidimensional scaling and clustering

Week 7

  • Generalized Linear Models
  • Machine Learning and cross-validation

Week 8

  • Monte Carlo and bootstrap
  • Modeling time-series data

Week 9

Thanksgiving break

Week 10

  • Student presentations 1
  • Student presentations 2

Acknowledgements

We thank Zach Miller for TAing the first iteration of the class and for contributing materials and comments; Julia Smith for TAing the second iteration; Cassie Manrique for TAing the third iteration; and Amatullah Mir for TAing this year. Development of the class was partially supported by the Burroughs Wellcome Fund through the program “Quantitative and statistical thinking in the life sciences” (Stefano Allesina, PI).