Machine learning on biological sequence data

Course Description

A seminar in which students read, present, and discuss research papers on recent and advanced topics in computational biology, specifically machine learning models fit to biological sequence data (proteins and DNA). This semester, the seminar will primarily cover the following topics: foundation models of DNA and protein sequences (including transformer-based models), predicting the effects of mutations, predicting protein structure (including AlphaFold), and supervised vs. unsupervised learning on sequences. Students are expected to read up to two papers per week. For one or more sessions during the semester, each student is expected to give a summary presentation and lead the discussion of the papers. Students should have taken COMPSCI 690U, Computational Biology and Bioinformatics, or have a comparable background. 1 credit.

Syllabus

download

Organization

  • Instructor: Anna G. Green
  • Course number: COMPSCI 692X
  • Meeting day/time: Monday, 2:30-3:30pm
  • Location: CS 140
  • TA: None
  • Office hours: by appointment
  • E-mail: annagreen@umass.edu