Elsewhere

  • I have succumbed to peer pressure and am now on Twitter.
  • I have an occasionally updated blog on computational social science and machine learning!
  • If I'm not working or playing roller derby, I'm either eating or thinking about food.

Biography

In fall 2010, Hanna Wallach started as an assistant professor in the Department of Computer Science at the University of Massachusetts Amherst. She is one of five core faculty members involved in UMass's new Computational Social Science Initiative. Prior to this, Hanna was a senior postdoctoral research associate, also at UMass, where she developed statistical machine learning techniques for analyzing complex data regarding communication and collaboration within scientific and technological innovation communities.

Hanna's Ph.D. work, undertaken at the University of Cambridge, introduced new methods for statistically modeling text using structured topic models: models that automatically infer semantic information from unstructured text and information about document structure, ranging from sentence structure to inter-document relationships. Hanna holds an M.Sc. from the University of Edinburgh, where she specialized in neural computing and learning from data, and was awarded the University of Edinburgh's 2001/2002 prize for Best M.Sc. Student in Cognitive Science. Hanna received her B.A. from the University of Cambridge Computer Laboratory in 2001. Her undergraduate project, "Visual Representation of Computer-Aided Design Constraints," won the award for the best computer science student in the 2001 U.K. Science Engineering and Technology Awards.

In addition to her many papers on statistical machine learning techniques for analyzing structured and unstructured data, Hanna's tutorial on conditional random fields is extremely widely cited and used in machine learning courses around the world. Her recent work (with Ryan Prescott Adams and Zoubin Ghahramani) on infinite belief networks won the best paper award at AISTATS 2010.

As well as her research, Hanna works to promote and support women's involvement in computing. In 2006, she co-founded an annual workshop for women in machine learning, in order to give female faculty, research scientists, postdoctoral researchers, and graduate students an opportunity to meet, exchange research ideas, and build mentoring and networking relationships. In her not-so-spare time, Hanna is a member of Pioneer Valley Roller Derby, where she is better known as Logistic Aggression.

Research Interests

Hanna's primary research goal is to develop new mathematical models and computational tools for analyzing vast quantities of structured and unstructured data in order to identify and answer scientific questions about complex social processes. She studies a wide range of social processes, including those that underlie free and open source software development, scientific collaboration, and the U.S. political system. To this end, she works on techniques for aggregating and representing large amounts of information from data sources with disparate emphases, methods for analyzing relational and social network data, efficient algorithms for inference, and robust methods for reasoning under uncertainty. Hanna's research contributes to machine learning, Bayesian statistics, and, in collaboration with social scientists, to the nascent field of computational social science.

(For a more detailed description of Hanna's lab and current research projects, see here.)

Students

Aaron Schein (1st year M.S. student) is interested in developing machine learning methods for political discourse analysis. He also works for the MITRE Corporation as a computational linguist.

Jingyi Guo (2nd year Ph.D. student) is currently developing statistical models of file transfer networks and Internet Relay Chat (IRC) data. She also works with Brian Levine in digital forensics.

Juston Moore (2nd year M.S./Ph.D. student) is interested in the spatial and temporal dynamics of information exchange in networks, with applications including anomaly and changepoint detection.

Grants

Co-PI with Marc Liberatore and Brian Levine (UMass Amherst), Thomas Kerle (FVTC ICAC), and Janice Wolak (University of New Hampshire), Office of Juvenile Justice and Delinquency Prevention (OJJDP) FY 2011 Child Protection Research Program. "RoundUp Predictive Tool (RPT) Project." 2011–2014.

PI with Andrew McCallum and David Jensen (UMass Amherst), Raytheon BBN Technologies (prime to IARPA). "Foresight and Understanding from Scientific Exposition (FUSE)." 2011–2016.

Co-PI with Jennifer Wortman Vaughan (UCLA) on behalf of The Women in Machine Learning (WiML) Executive Board, NSF IIS #1037002. "Workshop for Women in Machine Learning." 2010–2012.

PI with Andrew McCallum (UMass Amherst) and Fiona Murray (MIT), NSF SBE (SciSIP) #0965436. "New Methods to Enhance Our Understanding of the Diversity of Science." 2010–2013.

Theses

"Structured Topic Models for Language." Ph.D. thesis, University of Cambridge, 2008.

"Efficient Training of Conditional Random Fields." M.Sc. thesis, University of Edinburgh, 2002.

"Visual Representation of CAD Constraints." B.A. thesis, University of Cambridge, 2001.

Publications

Kriste Krstovski, David Smith, Hanna Wallach and Andrew McGregor. "Efficient Nearest Neighbor Search in the Probability Simplex." To appear in Proceedings of the Fourth International Conference on the Theory of Information Retrieval (ICTIR 2013), Copenhagen, Denmark, 2013.

Peter Krafft, Juston Moore, Bruce Desmarais and Hanna Wallach. "Topic-Partitioned Multinetwork Embeddings." In Advances in Neural Information Processing Systems Twenty-Five, 2012. [pdf]

Peter Krafft, Juston Moore, Hanna Wallach, Bruce Desmarais and James ben-Aaron. "Topic-Specific Communication Patterns from Email Data." In New Directions in Analyzing Text as Data, 2012.

Peter Krafft, Juston Moore, Hanna Wallach, Bruce Desmarais and James ben-Aaron. "Topic-Specific Communication Patterns in Email Data." In Workshop on Information in Networks, 2012.

Anton Bakalov, Andrew McCallum, Hanna Wallach and David Mimno. "Topic Models for Taxonomies." In Proceedings of the 12th ACM/IEEE-CS Joint Conference on Digital Libraries, 2012. [pdf]

Peter Krafft, Juston Moore, Hanna Wallach, Bruce Desmarais and James ben-Aaron. "Modeling Government Email Networks." In 5th Annual Political Networks Conference, 2012.

Justin Grimmer, Rachel Shorey, Hanna Wallach and Frances Zlotnik. "A Class of Semiparametric Topic Models for Political Texts." In 70th Midwest Political Science Association Conference, 2012.

Alexandre Passos, Hanna Wallach and Andrew McCallum. "Correlations and Anticorrelations in LDA Inference." In Proceedings of the 2011 Workshop on Challenges in Learning Hierarchical Models: Transfer Learning and Optimization (held in conjunction with NIPS), 2011. [pdf]

David Mimno, Hanna Wallach, Edmund Talley, Miriam Leenders and Andrew McCallum. "Optimizing Semantic Coherence in Topic Models." In Proceedings of the 2011 EMNLP Conference, 2011. [pdf]

Edmund Talley, David Newman, Bruce Herr II, Hanna Wallach, Gully Burns, Miriam Leenders and Andrew McCallum. "A Database of National Institutes of Health (NIH) Research Using Machine Learned Categories and Graphically Clustered Grant Awards." In Nature Methods, 2011. [html]

Rachel Shorey, Hanna Wallach and Bruce Desmarais. "Toward a Framework for the Large Scale Textual and Contextual Analysis of Government Information Declassification Patterns." Presented at the 2nd Annual Text as Data Conference, Northwestern University, Illinois, 2011.

Hanna Wallach, Shane Jensen, Lee Dicker and Katherine Heller. "An Alternative Prior Process for Nonparametric Bayesian Clustering." In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010), Sardinia, Italy, 2010. [pdf]

Ryan Prescott Adams, Hanna Wallach and Zoubin Ghahramani. "Learning the Structure of Deep, Sparse Graphical Models." In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010), Sardinia, Italy, 2010. [pdf]

Hanna Wallach, David Mimno and Andrew McCallum. "Rethinking LDA: Why Priors Matter." In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems, 2009. [pdf]

David Mimno, Hanna Wallach, Jason Naradowsky, David Smith and Andrew McCallum. "Polylingual Topic Models." In Proceedings of the 2009 EMNLP Conference, Singapore, 2009. [pdf]

Hanna Wallach, Iain Murray, Ruslan Salakhutdinov and David Mimno. "Evaluation Methods for Topic Models." In Proceedings of the 26th International Conference on Machine Learning, 2009. [pdf]

David Mimno and Hanna Wallach. "Computational Papyrology." Presented at Media in Transition 6: Stone and Papyrus, Storage and Transmission, Cambridge, Massachusetts, 2009. [abstract]

Hanna Wallach, Iain Murray, Ruslan Salakhutdinov and David Mimno. "Evaluation Methods for Topic Models." Presented at the Learning Workshop (Snowbird), Clearwater, Florida, 2009.

David Mimno, Hanna Wallach, Limin Yao, Jason Naradowsky and Andrew McCallum. "Polylingual Topic Models." Presented at the Learning Workshop (Snowbird), Clearwater, Florida, 2009.

David Mimno, Hanna Wallach and Andrew McCallum. "Gibbs Sampling for Logistic Normal Topic Models with Graph-Based Priors." In Proceedings of the 2008 Workshop on Analyzing Graphs: Theory and Applications (held in conjunction with NIPS), Whistler, Canada, 2008. [pdf]

Hanna Wallach, Charles Sutton and Andrew McCallum. "Bayesian Modeling of Dependency Trees Using Hierarchical Pitman-Yor Priors." In Proceedings of the Workshop on Prior Knowledge for Text and Language Processing (held in conjunction with ICML/UAI/COLT), pp. 15–20. Helsinki, Finland, 2008. [pdf]

Mark Dredze, Hanna Wallach, Danny Puller, Tova Brooks, Josh Carroll, Joshua Magarick, John Blitzer and Fernando Pereira. "Intelligent Email: Aiding Users with AI." In Proceedings of the 23rd Conference on Artificial Intelligence (NECTAR Track), pp. 1524–1527. Chicago, Illinois, U.S., 2008.

Mark Dredze and Hanna Wallach. "User Models for Email Activity Management." In Proceedings of the 5th International Workshop on Ubiquitous User Modeling. Gran Canaria, Spain, 2008.

Mark Dredze, Hanna Wallach, Danny Puller and Fernando Pereira. "Generating Summary Keywords for Emails Using Topics." In Proceedings of the 2008 International Conference on Intelligent User Interfaces (IUI 2008), pp. 199–206. Gran Canaria, Spain, 2008.

David Mimno, Hanna Wallach and Andrew McCallum. "Community-based Link Prediction with Text." In Proceedings of the NIPS Statistical Network Modeling Workshop. Whistler, Canada, 2007.

Hanna Wallach. "Topic Modeling: Beyond Bag-of-Words." In Proceedings of the 23rd International Conference on Machine Learning, pp. 977–984. Pittsburgh, Pennsylvania, U.S., 2006. [pdf]

Hanna Wallach. "Topic Modeling: Beyond Bag-of-Words." In Proceedings of the 1st Annual North East Student Colloquium on Artificial Intelligence (NESCAI 2006), Ithaca, New York, U.S., 2006.

Hanna Wallach. "Topic Modeling: Beyond Bag-of-Words." In Proceedings of the NIPS Workshop on Bayesian Methods for Natural Language Processing, Whistler, Canada, 2005.

Hanna Wallach. "Efficient Training of Conditional Random Fields." In Proceedings of the 6th Annual Computational Linguistics U.K. Research Colloquium (CLUK 6), Edinburgh, U.K., 2003.

Alan Blackwell and Hanna Wallach. "Diagrammatic Integration of Abstract Operations into Software Work Contexts." In Diagrammatic Representation and Inference, edited by Mary Hegarty, Bernd Meyer and N. Hari Narayanan, pp. 191–205, Springer-Verlag, London, U.K., 2002.

Technical Reports

Hanna Wallach. "Conditional Random Fields: An Introduction." Technical Report MS-CIS-04-21. Department of Computer and Information Science, University of Pennsylvania, 2004. [pdf]

Slides

"Machine Learning for Complex Social Processes." MSR New England, July 2013. [pdf]

"Textual Analysis of Government Declassification Patterns." Declassification Engine, May 2013. [pdf]

"Machine Learning for Complex Social Processes." NEML, May 2013. [pdf]

"Transparency and Topic Models." DataGotham, Sept. 2012. [pdf]

"Machine Learning, Predictive Text, and Topic Models." UMass Amherst, Oct. 2011. [pdf]

"Computer Science @ UMass." UMass Amherst Women in Engineering Career Day, Oct. 2011. [pdf]

"Statistical Topic Models for Science and Innovation Policy." JSM, Jul. 2011. [pdf]

"Women in Free/Open Source Software Development." Johns Hopkins, Apr. 2011. [pdf]

"Statistical Topic Models for Computational Social Science." Johns Hopkins, Apr. 2011. [pdf]

"Statistical Topic Models for Computational Social Science." University of Chicago, Mar. 2011. [pdf]

"Statistical Topic Models for Computational Social Science." Mount Holyoke, Feb. 2011. [pdf]

"Statistical Topic Models for Studying Collaborative Processes." UMass Amherst, Jan. 2011. [pdf]

"NIPS 2010 Workshop Summary." NIPS, Dec. 2010. [commentary | pdf]

"Statistical Topic Models for Science and Innovation Policy." Williams College, Nov. 2010. [pdf]

"Statistical Topic Models for Science and Innovation Policy." UMass Lowell, Nov. 2010. [pdf]

"Some Stuff About My Lab." UMass Amherst, Sept. 2010. [pdf]

"Statistical Machine Learning Analysis of Debian Mailing Lists." DebConf 10, Aug. 2010. [pdf]

"Text Analysis for Science and Innovation Policy." New Directions in Text Analysis, May 2010. [pdf]

"Learning the Structure of Deep, Sparse Graphical Models." AISTATS, May 2010. [pdf]

"Women in Free/Open Source Software Development." Politics of Open Source, May 2010. [pdf]

"Statistical Models for Science and Innovation Policy." UMass Amherst, Apr. 2010. [pdf]

"Topic Models: Priors, Stop Words and Languages." School of Informatics, University of Edinburgh, Jan. 2010; Division of Applied Mathematics, Brown University, Mar. 2010. [pdf]

"Topic Modeling." Workshop on Applications for Topic Models (held at NIPS), 2009. [pdf]

"Polylingual Topic Models." EMNLP Conference, 2009. [pdf]

"Evaluation Methods for Topic Models." The Learning Workshop, 2009; ICML, 2009. [pdf]

"Bayesian Models for Dependency Parsing Using Pitman-Yor Priors." Workshop on Prior Knowledge for Text and Language Processing (held in conjunction with ICML/UAI/COLT), 2008; Workshop on Unsupervised Latent Variable Models (held at NIPS), 2008. [pdf]

"Machine Learning, Predictive Text, and Topic Models." University of Baltimore, 2007. [pdf]

"Generating Summary Keywords for Emails Using Topics." WiML Workshop, 2007. [pdf]

"Dasher: Information-Efficient Text Entry." Grace Hopper Conference, 2006. [pdf]

"Topic Modeling: Beyond Bag-of-Words." Workshop on Bayesian Methods in NLP (held at NIPS), 2005; Gatsby Machine Learning Journal Club, Gatsby Computational Neuroscience Unit, University College London, 2006; NESCAI, 2006; ICML, 2006; Machine Learning and Friends Lunch, Department of Computer Science, University of Massachusetts Amherst, 2006. [pdf]

"Women in Free and Open Source Software Development: Findings from FLOSSPOLS." Free and Open Source Software Developers' European Meeting (FOSDEM), 2006. [pdf]

"Women in Free and Open Source Software Development." Women in Computing Lecture Series, Department of Computer and Information Science, University of Pennsylvania, 2005. [pdf]

"Debian New Maintainer Process: History and Aims." DebConf 5, 2005. [pdf]

"The Debian Women Project." Free and Open Source Software Developers' European Meeting (FOSDEM), 2005; Women@CL Lunch Talk Series, Computer Laboratory, University of Cambridge, 2005; Libre Software Meeting/Rencontres Mondiales du Logiciel Libre (LSM/RMLL), 2005. [pdf]

"Introduction to Gaussian Process Regression." University of Pennsylvania, 2005. [pdf]

Posters

"An Alternative Prior Process for Nonparametric Bayesian Clustering." AISTATS, 2010. [pdf]

"Cluster-Based Topic Modeling." WiML Workshop, 2008. [pdf]

"Topic Modeling: Beyond Bag-of-Words." ICML, 2006. [pdf]

Workshops

Topic Models: Computation, Application, and Evaluation (NIPS 2013 workshop)

Computational Social Science and the Wisdom of Crowds (NIPS 2011 workshop)

Computational Social Science and the Wisdom of Crowds (NIPS 2010 workshop)

Applications for Topic Models: Text and Beyond (NIPS 2009 workshop)

Annual Workshop for Women in Machine Learning

Contact Details

Department of Computer Science
University of Massachusetts Amherst
140 Governors Drive
Amherst, MA 01003, United States

Telephone: 1.413.545.0330