Nicholas Monath

  PhD Student
  University of Massachusetts Amherst


I am a PhD student in computer science at the University of Massachusetts Amherst advised by Professor Andrew McCallum. I am a member of the Information Extraction and Synthesis Laboratory.

In my PhD, I have had the great pleasure of collaborating with Ari Kobren (Oracle), Manzil Zaheer (Google), Avinava Dubey, Amr Ahmed (Google), Akshay Krishnamurthy (MSR), Sunil Mohan (CZI), Ivana Williams (CZI), Sebastian Macaluso (NYU, Physics), Kyle Cranmer (NYU, Physics), Pat Flaherty, Rajarshi Das (UMass), Rico Angell (UMass), Nishant Yadav (UMass), and more (see papers below)!

I am grateful for the opportunity to have interned at: Google (Winter 2018-2019, 2019-2020) with Amr Ahmed, Manzil Zaheer and Avinava Dubey; CZI (Summer 2019) with Ivana Williams; IBM (Summer 2018) with Michael Glass and Alfio Gliozzo; and Amazon (Summers 2016, 2017) with Shankar Ananthakrishnan and Bo Xiao.

I received my B.S. in computer science and mathematics from Brandeis University in 2013, where I worked with Professors James Storer and Antonella Di Lillo.

Research Interests

I work on machine learning and natural language processing. In particular, I am most interested in scalable clustering, incremental/online clustering, and entity discovery and linking.

Discovering Clusters in Continuously Arriving Data. Often we are faced with settings in which new data arrives all of the time. Performing clustering in these settings requires both the discovery of new (emerging) clusters and the reconsideration of past clustering decisions in the presence of newly arrived data. We have been designing algorithms that use hierarchically structured clusterings to represent uncertainty in this incremental setting and use tree rearrangements to efficiently reconsider past clustering decisions. Related publications: KDD 2019, KDD 2017.

Entity Discovery. Entities are mentioned ambiguously in both structured (e.g., databases) and unstructured (e.g., natural language) data. These entities may belong to many different (possibly overlapping) types. Resolving the ambiguity present in these entity mentions and/or discovering their entity types is necessary for many tasks, such as the automatic construction of knowledge-bases and question answering. This task is especially difficult when the catalog of entities or attributes is not known in advance. This last problem, often known as “entity discovery,” is one of clustering, with the special challenges that it needs to operate on incrementally arriving data, at large scale, ideally with non-greedy behavior. Related publications: NAACL 2021, Findings of EMNLP 2020, KDD 2019, ACL 2019.

Modeling Abstractions for Continual / Life-long Learning. Clusters provide summaries of data points. Hierarchical and DAG-structured clustering can provide abstractions at multiple granularities. Recent work in continual and few-shot learning has shown that we can use clusters in place of individual points to generalize more effectively to new classes as well as to prevent catastrophic forgetting. Related publications: Findings of EMNLP 2020.

Relaxing Tree Structures. We are interested in alternatives to standard hierarchical clustering representations to facilitate end-to-end optimization, richer representations of uncertainty, and scalability. We have proposed continuous representations of trees in the unit ball as well as DAG-structured alternatives to trees. Related publications: AISTATS 2021, KDD 2019.


KDCOVID - April 2020 - We have released KDCOVID. KDCOVID retrieves papers by measuring similarity between queries and sentences in the full text of papers in the CORD19 corpus. KDCOVID highlights entities linked to knowledge-bases and drug-gene-disease associations. Developed by Manzil Zaheer, Nicholas Monath, Shehzaad Dhuliawala, Taamannae Taabassum, Rajarshi Das, Bhuwan Dhingra, and Andrew McCallum. [Kaggle submission] [code on github]

Sets & Partitions - December 2019 - I co-organized The First Workshop on Sets and Partitions, which was held as a part of the NeurIPS 2019 conference, with Manzil Zaheer, Ari Kobren, Junier Oliva, Barnabás Póczos, Ruslan Salakhutdinov, and Andrew McCallum. The workshop was focused on models for tasks with set-based inputs/outputs as well as models of partitions and novel clustering methodology. [Workshop Site]


  1. Sunil Mohan, Rico Angell, Nicholas Monath, Andrew McCallum. Low Resource Recognition and Linking of Biomedical Concepts from a Large Ontology. 2021. [arxiv]

Conference Publications

  1. Nicholas Monath, Avinava Dubey, Guru Guruganesh, Manzil Zaheer, Amr Ahmed, Andrew McCallum, Gokhan Mergen, Marc Najork, Mert Terzihan, Bryon Tjanaka, Yuan Wang, Yuchen Wu. Scalable Bottom-Up Hierarchical Clustering. To Appear in KDD. 2021. [arxiv]

  2. Craig Greenberg*, Sebastian Macaluso*, Nicholas Monath*, Avinava Dubey, Patrick Flaherty, Manzil Zaheer, Amr Ahmed, Kyle Cranmer, Andrew McCallum. Exact and Approximate Hierarchical Clustering with A*. To Appear in UAI. 2021. (* equal contribution). [arxiv]

  3. Raghuveer Thirukovalluru, Nicholas Monath, Kumar Shridhar, Manzil Zaheer, Mrinmaya Sachan and Andrew McCallum. Scaling Within Document Coreference to Long Texts. To Appear in Findings of ACL, 2021.

  4. Rico Angell, Nicholas Monath, Sunil Mohan, Nishant Yadav, Andrew McCallum. Clustering-based Inference for Zero-Shot Biomedical Entity Linking. To Appear in NAACL, 2021. [arxiv]

  5. Nicholas Monath, Manzil Zaheer, Avinava Dubey, Amr Ahmed, Andrew McCallum. DAG-Structured Clustering by Nearest Neighbors. AISTATS, 2021. [paper]

  6. Craig S. Greenberg*, Sebastian Macaluso*, Nicholas Monath, Ji-Ah Lee, Patrick Flaherty, Kyle Cranmer, Andrew McGregor, Andrew McCallum. Data Structures & Algorithms for Exact Inference in Hierarchical Clustering. AISTATS, 2021. [arxiv]

  7. Rajarshi Das, Ameya Godbole, Nicholas Monath, Manzil Zaheer, Andrew McCallum. Probabilistic Case-based Reasoning in Knowledge Bases. Findings of EMNLP 2020. [arxiv] [code]

  8. Dung Thai, Zhiyang Xu, Nicholas Monath, Boris Veytsman, Andrew McCallum. Using BibTeX to Automatically Generate Labeled Data for Citation Field Extraction. AKBC. 2020 [pdf]

  9. Derek Tam, Nicholas Monath, Ari Kobren, Andrew McCallum. Predicting Institution Hierarchies with Set-based Models. AKBC. 2020. [pdf]

  10. Derek Tam, Nicholas Monath, Ari Kobren, Aaron Traylor, Rajarshi Das, Andrew McCallum. Optimal Transport-based Alignment of Learned Character Representations for String Similarity. ACL. 2019. [arxiv] [pdf] [code + data]

  11. Nicholas Monath*, Ari Kobren*, Akshay Krishnamurthy, Michael Glass, Andrew McCallum. Scalable Hierarchical Clustering via Tree Grafting. KDD. 2019 (Oral presentation) (* Equal Contribution). [pdf] [arxiv] [code]

  12. Nicholas Monath, Manzil Zaheer, Daniel Silva, Andrew McCallum, Amr Ahmed. Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space. KDD. 2019. [pdf] [code]

  13. Nishant Yadav, Ari Kobren, Nicholas Monath, Andrew McCallum. Supervised Hierarchical Clustering with Exponential Linkage. ICML. 2019. [arxiv] [pdf] [code]

  14. Ari Kobren, Nicholas Monath, Andrew McCallum. Integrating User Feedback under Identity Uncertainty in Knowledge Base Construction. AKBC, 2019. [pdf]

  15. Craig Greenberg, Nicholas Monath, Ari Kobren, Patrick Flaherty, Andrew McGregor, Andrew McCallum. Compact Representation of Uncertainty In Clustering. NeurIPS 2018. [pdf]

  16. Bo Xiao, Nicholas Monath, Shankar Ananthakrishnan, Abishek Ravi. Play Duration based User-Entity Affinity Modeling in Spoken Dialog System. Interspeech 2018. [pdf]

  17. Ari Kobren*, Nicholas Monath*, Akshay Krishnamurthy, and Andrew McCallum. A Hierarchical Algorithm for Extreme Clustering. KDD. 2017. (* equal contribution). (Oral Presentation). [pdf] [code] [talk] [promo video]

Workshop & Other Publications

  1. Ethan Shen, Maria Brbic, Nicholas Monath, Jiaqi Zhai, Manzil Zaheer, Jure Leskovec. Model-Agnostic Graph Regularization for Few-Shot Learning. NeurIPS Workshop on Meta-Learning, 2020. [arxiv]

  2. Nicholas Monath*, Ari Kobren*, Akshay Krishnamurthy, Andrew McCallum. Gradient-based Hierarchical Clustering. NIPS Workshop on Discrete Structures in Machine Learning. 2017. (Oral Presentation). [pdf]

  3. Ari Kobren, Nicholas Monath, Andrew McCallum. Entity-centric Attribute Feedback for Interactive Knowledge Bases. NIPS Workshop on Automated Knowledge Base Construction. 2017. [pdf]

  4. Aaron Traylor*, Nicholas Monath*, Rajarshi Das, Andrew McCallum. Learning String Alignments for Entity Aliases. NIPS Workshop on Automated Knowledge Base Construction. 2017. [pdf] [code]

  5. Haw-Shiuan Chang, Abdurrahman Munir, Ao Liu, Johnny Tian-Zheng Wei, Aaron Traylor, Ajay Nagesh, Nicholas Monath, Patrick Verga, Emma Strubell, and Andrew McCallum. Extracting Multilingual Relations under Limited Resources: TAC 2016 Cold-Start KB construction and Slot-Filling using Compositional Universal Schema. NIST TAC KBP Workshop 2016. Notebook version [pdf]

  6. Benjamin Roth, Nicholas Monath, David Belanger, Emma Strubell, Patrick Verga and Andrew McCallum. Building Knowledge Bases with Universal Schema: Cold Start and Slot-Filling Approaches. TAC KBP 2015 Workshop. [pdf]

  7. Nicholas Monath and Andrew McCallum. Discriminative Hierarchical Coreference for Inventor Disambiguation. PatentsView Inventor Disambiguation Technical Workshop. September 2015 [slides] [code]

  8. Mykel J. Kochenderfer and Nicholas Monath. Data Compression of Optimal Value Functions for Markov Decision Processes. Data Compression Conference. Snowbird, Utah. 2013.


1st place. Inventor Disambiguation Challenge. PatentsView Inventor Disambiguation Technical Workshop. September 2015. [link] [slides] [code]. Our inventor name disambiguation system was integrated into the USPTO PatentsView website.


University of Massachusetts Amherst.
Started MS (only) Fall 2013
Entered PhD Fall 2015

Brandeis University. BS 2013. Computer Science and Mathematics.
Graduated Summa Cum Laude.
Phi Beta Kappa.
Highest Honors for Undergraduate Thesis. Michtom Prize for Academic Excellence in Computer Science.


Scalable Hierarchical Clustering with Tree Grafting. KDD 2019. [slides]

Optimal Transport-based Alignment of Learned Character Representations for String Similarity. ACL 2019. [slides]

A Hierarchical Algorithm for Extreme Clustering. KDD 2017. [poster] [slides from UMass Data Science Symposium]

Discriminative Hierarchical Coreference for Inventor Disambiguation. PatentsView Inventor Disambiguation Technical Workshop. 2015. [slides]. (Our inventor name disambiguation system received 1st place and is a part of the USPTO PatentsView website).