Computational Biology and Bioinformatics

Course Description

This course is designed to provide computer scientists with a comprehensive introduction to the field of computational biology. The course will cover the application of computational techniques to modern research challenges in biology, discussing both foundational algorithms and recently developed methods. The biological background needed to understand these methods will be provided. The primary focus will be the analysis of genomic data, including genome assembly, genome annotation, sequence alignment, phylogeny construction, mutation effect prediction, population genetics, and genotype-phenotype association studies. We will also cover gene expression analysis (RNA-seq and single-cell RNA-seq) and protein structure analysis and prediction. Throughout the course, we will emphasize the unique challenges of working with biological data. Through lectures and hands-on programming problem sets, students will develop the skills needed to tackle computational challenges in biology.

Syllabus


Organization

Schedule

This schedule is subject to change. Please check back frequently.

Week Topics
Week 0 Syllabus Discussion, Introduction, Lecture 0: Biological sequences as information
Week 1 Sequence alignment: basics and modern solutions. Needleman-Wunsch and Smith-Waterman algorithms, BLAST, evolutionary interpretation of sequence alignment
Week 2 DNA sequencing technology, read mapping and variant calling, Burrows-Wheeler transform
Week 3 De novo genome assembly, overlap graphs, de Bruijn graphs, long read sequencing technology
Week 4 Genome annotation, Markov chains, hidden Markov models for genome annotation, Viterbi algorithm
Week 5 Phylogenetics, continuous-time Markov models, Jukes-Cantor substitution model, gene trees versus species trees, outlook on molecular phylogenetics
Week 6 Population genetics, mutation and selection, genetic drift, tests for selection, dN/dS ratio, linkage disequilibrium
Week 7 Association Studies, controlling for population structure, multiple comparisons problem, heritability, interpretation of GWAS, ethical considerations
Week 8 Mutation effect prediction in proteins, deep mutational scan experiments, clinical variant data, classic models for mutation effect prediction, modern ML solutions for mutation effect prediction
Week 9 Mutation effect prediction in non-coding regions, experimental analysis of the function of non-coding regions, modern ML solutions for annotating non-coding regions
Week 10 Gene Expression Analysis, RNA sequencing experiments, inferring transcript abundance, single-cell RNA sequencing, correcting for sparsity, cell type inference
Week 11 Protein Structure Prediction, Levinthal’s paradox, homology modeling, evolution-based inference, AlphaFold
Week 12 Special topics
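As an informal illustration of the Week 1 material (not taken from the official course materials), here is a minimal sketch of the Needleman-Wunsch dynamic-programming recurrence for global alignment. It assumes a simple linear gap penalty and hypothetical default scores (match = 1, mismatch = -1, gap = -1). It returns only the optimal score, not the alignment itself.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score of a and b with a linear gap penalty.

    Scoring values are illustrative assumptions, not course-specified parameters.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning the prefixes a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        dp[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either align a[i-1] with b[j-1], or put a gap in one sequence
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GCATGCU"))
```

Smith-Waterman local alignment uses the same table, with two changes: each cell is floored at zero, and the answer is the maximum over all cells rather than the bottom-right entry.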