CSB_2019

Lecture notes and exercises for Computing Skills for Biologists, Winter 2019


ECEV 32000 Summaries and Warmup Exercises

General goals for the class

  • Learn how to automate analysis of biological data
  • Program in different languages
  • Integrate different tools into coherent “pipelines”
  • Find out about tools that you didn’t consider before
  • Jumpstart the learning of anything computer-related

Method

  • Lecture notes/book
  • Exercises in class and homework
  • You need to integrate these tools into your daily work to be effective
  • “Showcase” more than “Teach”
  • Give practical examples
  • Brief review of the chapter(s) for the week
  • Warmup exercises
  • Working as a group to solve the longer exercises at the end of the chapter

Final project

  • Take a boring (or complicated) task and automate it
  • Good examples: automatically process data from experimental apparatus; image analysis; automated reporting of experimental results; downloading and organizing data from websites; …
  • A great example and associated code
  • You can collaborate (but tackle a larger problem, and document who did what)
  • Remember to include: a) Report; b) Code; c) Working example. Ideally, just send me the link to a public repo.
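To give a sense of scale, here is a minimal sketch of one of the ideas above (automated reporting of experimental results). It is only an illustration, not a template for the project: the column names `treatment` and `value`, the function names, and the data are all made up.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def summarize(csv_text):
    """Return {treatment: mean value} from CSV text.

    Assumes two hypothetical columns, 'treatment' and 'value'.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["treatment"]].append(float(row["value"]))
    return {name: mean(values) for name, values in groups.items()}


def to_markdown(summary):
    """Format the summary as a small Markdown table, sorted by treatment."""
    lines = ["| treatment | mean value |", "|---|---|"]
    for name in sorted(summary):
        lines.append(f"| {name} | {summary[name]:.2f} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up data standing in for the output of an instrument
    example = "treatment,value\ncontrol,1.0\ncontrol,3.0\ndrug,4.0\ndrug,6.0\n"
    print(to_markdown(summarize(example)))
```

A real project would read the files produced by your own apparatus and would likely combine this with other tools (for example, a shell script to collect the files and R to plot the results).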

Before we start

  • Open a terminal (Ctrl + Alt + T in Ubuntu)
  • Clone the repository for the book
git clone https://github.com/CSB-book/CSB.git
  • Clone the repository for the exercises and summaries
git clone https://github.com/StefanoAllesina/CSB_2019.git

Week 1: UNIX Shell

Week 2: Basic Programming in Python

Week 3: Advanced Programming in Python

Week 4: Regular expressions

Week 5: Scientific Programming

Week 6: Python wrapup

Week 7: Programming in R

Week 8: Data wrangling and visualization

Week 9: R wrapup

Week 10: Relational databases

Final project

  • Please send your final project by the end of the quarter (March 23rd)
  • Ideally, send me a link to a public repo, or invite StefanoAllesina (GitHub) or AllesinaLab (Bitbucket) to join a private repo
  • Include data, code and a report
  • If the data are too large, include just what is sufficient to see how the project works
  • The code should be well-written; ideally, combine several tools
  • The report can be written in Markdown or LaTeX (or any other system, but these are good because they make it easy to include well-formatted code). RStudio can produce RMarkdown Notebooks; Jupyter notebooks can be exported to Markdown easily.
  • The report should briefly state the problem, give a sense of the strategy used for the solution, and include some details on the development of the code.

Conclusion

I hope the class helped build your confidence when it comes to computing. You should feel empowered: with a little time and dedication, you can crack almost any computational problem.

The real power lies in creatively combining the different tools we’ve seen in class: maybe some shell scripting can save you time when wrestling with a large data set, which can then be parsed and analyzed in Python and plotted in R.

The only way to become a strong programmer is to program a lot. Force yourself to tackle new problems, ask for help, and learn from your mistakes and from others. It is a never-ending journey, but a rewarding one.

If you can automate something, do it.

There are many other topics that you might find interesting. See the last chapter for some suggestions.

If you are about to embark on a difficult project, and you need a sounding board or some advice, come see me and I’ll buy you an espresso.

Thanks for the great time in class, and happy computing! Stefano