I have spent several semesters as a research intern/student researcher at Google Brain in Mountain View (Fall 2022, Winter 2021-22 & Spring 2022, Winter 2020-21 & Spring 2021), Google Brain in New York (Summer 2022), Google Research in Mountain View (Summer & Fall 2021, Summer & Fall 2020), and Microsoft Research in Montreal (Summer 2019).
Research
Along with my advisor, I work on problems at the intersection of natural language processing (NLP) and deep learning.
My research interests center broadly on transfer learning, unsupervised learning, and semi-supervised learning methods that exploit large volumes of unlabeled data, or beneficial relationships among tasks, to improve performance on tasks with limited or no labeled data. Currently, I am excited about prompt-based learning methods that use text prompts and/or "soft prompts" (sequences of tunable tokens) to condition a frozen language model to perform different tasks; a minimal sketch of this idea follows the topic list below.
Topics in this vein that I work on include:
in-context learning
prompt tuning
zero-shot/few-shot learning
parameter-efficient transfer learning
exploring and predicting transferability between NLP tasks
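Below is a minimal, hypothetical PyTorch sketch of the soft-prompt idea: the only trainable parameters are a short sequence of prompt embeddings prepended to the input, while the pre-trained language model stays frozen (the frozen_lm, prompt_length, and embed_dim names are illustrative; any model accepting an inputs_embeds argument would do).

import torch
import torch.nn as nn

class SoftPromptModel(nn.Module):
    """Hypothetical sketch: a tunable soft prompt conditioning a frozen language model."""
    def __init__(self, frozen_lm, prompt_length=20, embed_dim=768):
        super().__init__()
        self.frozen_lm = frozen_lm
        for p in self.frozen_lm.parameters():
            p.requires_grad = False  # keep the pre-trained model frozen
        # the soft prompt: a short sequence of tunable token embeddings
        self.soft_prompt = nn.Parameter(torch.randn(prompt_length, embed_dim) * 0.02)

    def forward(self, input_embeds):  # input_embeds: [batch, seq_len, embed_dim]
        batch_size = input_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch_size, -1, -1)
        # prepend the soft prompt to the (embedded) input and run the frozen model
        return self.frozen_lm(inputs_embeds=torch.cat([prompt, input_embeds], dim=1))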
SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer Tu Vu, Brian Lester, Noah Constant, Rami Al-Rfou, Daniel Cer ACL 2022
One of the three projects highlighted in the Headlines of Google AI’s Natural Language Accelerated Newsletter, Q1, 2022
There has been growing interest in parameter-efficient methods to apply pre-trained language models to downstream tasks. Building on the Prompt Tuning approach of Lester et al. (2021), which learns task-specific soft prompts to condition a frozen pre-trained model to perform different tasks, we propose a novel prompt-based transfer learning approach called SPoT: Soft Prompt Transfer. SPoT first learns a prompt on one or more source tasks and then uses it to initialize the prompt for a target task. We show that SPoT significantly boosts the performance of Prompt Tuning across many tasks. More remarkably, across all model sizes, SPoT matches or outperforms standard Model Tuning (which fine-tunes all model parameters) on the SuperGLUE benchmark, while using up to 27,000x fewer task-specific parameters. To understand where SPoT is most effective, we conduct a large-scale study on task transferability with 26 NLP tasks in 160 combinations, and demonstrate that many tasks can benefit each other via prompt transfer. Finally, we propose an efficient retrieval approach that interprets task prompts as task embeddings to identify similar tasks and predict the most transferable source tasks for a novel target task.
@inproceedings{vu-etal-2022-spot,
title = "{SP}o{T}: Better Frozen Model Adaptation through Soft Prompt Transfer",
author = "Vu, Tu and
Lester, Brian and
Constant, Noah and
Al-Rfou{'}, Rami and
Cer, Daniel",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-long.346",
doi = "10.18653/v1/2022.acl-long.346",
pages = "5039--5059",
}
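A hypothetical sketch of the two SPoT stages (not the released implementation; train_prompt_tuning and the prompt dictionaries are placeholders): a prompt learned on one or more source tasks initializes the target-task prompt, and task prompts double as task embeddings for retrieving promising source tasks.

import torch.nn.functional as F

def spot_transfer(frozen_lm, source_data, target_data, train_prompt_tuning):
    # train_prompt_tuning(model, data, init=None) -> learned prompt tensor (placeholder API)
    source_prompt = train_prompt_tuning(frozen_lm, source_data)              # stage 1
    return train_prompt_tuning(frozen_lm, target_data, init=source_prompt)   # stage 2

def rank_source_tasks(target_prompt, source_prompts):
    # interpret task prompts as task embeddings and rank source tasks by cosine similarity
    scores = {name: F.cosine_similarity(target_prompt.flatten(), p.flatten(), dim=0).item()
              for name, p in source_prompts.items()}
    return sorted(scores, key=scores.get, reverse=True)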
Exploring and Predicting Transferability across NLP Tasks Tu Vu, Tong Wang, Tsendsuren Munkhdalai, Alessandro Sordoni, Adam Trischler, Andrew Mattarella-Micke, Subhransu Maji, Mohit Iyyer EMNLP 2020
Taught/discussed in several NLP courses, including COMP790-101 by Colin Raffel at UNC Chapel Hill, CS 685 by Mohit Iyyer at UMass Amherst, ECE 594 by Suma Bhat at UIUC
Recent advances in NLP demonstrate the effectiveness of training large-scale language models and transferring them to downstream tasks. Can fine-tuning these models on tasks other than language modeling further improve performance? In this paper, we conduct an extensive study of the transferability between 33 NLP tasks across three broad classes of problems (text classification, question answering, and sequence labeling). Our results show that transfer learning is more beneficial than previously thought, especially when target task data is scarce, and can improve performance even when the source task is small or differs substantially from the target task (e.g., part-of-speech tagging transfers well to the DROP QA dataset). We also develop task embeddings that can be used to predict the most transferable source tasks for a given target task, and we validate their effectiveness in experiments controlled for source and target data size. Overall, our experiments reveal that factors such as source data size, task and domain similarity, and task complexity all play a role in determining transferability.
@inproceedings{vu-etal-2020-exploring,
title = "Exploring and Predicting Transferability across {NLP} Tasks",
author = "Vu, Tu and
Wang, Tong and
Munkhdalai, Tsendsuren and
Sordoni, Alessandro and
Trischler, Adam and
Mattarella-Micke, Andrew and
Maji, Subhransu and
Iyyer, Mohit",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.emnlp-main.635",
pages = "7882--7926",
}
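The task-embedding idea above can be illustrated with a small, hypothetical sketch (the encode function and the embedding recipe are placeholders, not the exact embeddings used in the paper): embed each task from its examples, then rank candidate source tasks by cosine similarity to the target task.

import numpy as np

def task_embedding(encode, examples):
    # encode(text) -> fixed-size vector; placeholder for a pre-trained sentence/task encoder
    return np.stack([encode(x) for x in examples]).mean(axis=0)

def predict_transferable_sources(target_emb, source_embs, top_k=3):
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    ranked = sorted(source_embs, key=lambda name: cosine(target_emb, source_embs[name]),
                    reverse=True)
    return ranked[:top_k]  # most promising source tasks for the target task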
STraTA: Self-Training with Task Augmentation for Better Few-shot Learning Tu Vu, Minh-Thang Luong, Quoc Le, Grady Simon, Mohit Iyyer EMNLP 2021
Despite their recent successes in tackling many NLP tasks, large-scale pre-trained language models do not perform as well in few-shot settings where only a handful of training examples are available. To address this shortcoming, we propose STraTA, which stands for Self-Training with Task Augmentation, an approach that builds on two key ideas for effective leverage of unlabeled data. First, STraTA uses task augmentation, a novel technique that synthesizes a large amount of data for auxiliary-task fine-tuning from target-task unlabeled texts. Second, STraTA performs self-training by further fine-tuning the strong base model created by task augmentation on a broad distribution of pseudo-labeled data. Our experiments demonstrate that STraTA can substantially improve sample efficiency across 12 few-shot benchmarks. Remarkably, on the SST-2 sentiment dataset, STraTA, with only 8 training examples per class, achieves comparable results to standard fine-tuning with 67K training examples. Our analyses reveal that task augmentation and self-training are both complementary and independently effective.
@inproceedings{vu-etal-2021-strata,
title = "{ST}ra{TA}: Self-Training with Task Augmentation for Better Few-shot Learning",
author = "Vu, Tu and
Luong, Minh-Thang and
Le, Quoc and
Simon, Grady and
Iyyer, Mohit",
booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2021",
address = "Online and Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.emnlp-main.462",
doi = "10.18653/v1/2021.emnlp-main.462",
pages = "5715--5731",
}
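For illustration, here is a minimal, hypothetical self-training loop in the spirit of the second ingredient above (fine_tune and predict_proba are placeholder routines; the released STraTA code differs in the details):

def self_train(base_model, labeled_data, unlabeled_texts, fine_tune, predict_proba,
               num_rounds=3, threshold=0.9):
    # base_model: e.g. the strong model produced by task augmentation
    model, train_set = base_model, list(labeled_data)
    for _ in range(num_rounds):
        model = fine_tune(model, train_set)            # placeholder training routine
        pseudo_labeled = []
        for text in unlabeled_texts:
            probs = predict_proba(model, text)         # dict: label -> probability
            label, confidence = max(probs.items(), key=lambda kv: kv[1])
            if confidence >= threshold:                # keep only confident pseudo-labels
                pseudo_labeled.append((text, label))
        train_set = list(labeled_data) + pseudo_labeled
    return model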
Overcoming Catastrophic Forgetting in Zero-Shot Cross-Lingual Generation Tu Vu, Aditya Barua, Brian Lester, Daniel Cer, Mohit Iyyer, Noah Constant EMNLP 2022
In this paper, we explore the challenging problem of performing a generative task in a target language when labeled data is only available in English, using summarization as a case study. We assume a strict setting with no access to parallel data or machine translation and find that common transfer learning approaches struggle in this setting, as a generative multilingual model fine-tuned purely on English catastrophically forgets how to generate non-English. Given the recent rise of parameter-efficient adaptation techniques, we conduct the first investigation into how one such method, prompt tuning (Lester et al., 2021), can overcome catastrophic forgetting to enable zero-shot cross-lingual generation. Our experiments show that parameter-efficient prompt tuning provides gains over standard fine-tuning when transferring between less-related languages, e.g., from English to Thai. However, a significant gap still remains between these methods and fully-supervised baselines. To improve cross-lingual transfer further, we explore several approaches, including: (1) mixing in unlabeled multilingual data, and (2) explicitly factoring prompts into recombinable language and task components. Our approaches can provide further quality gains, suggesting that robust zero-shot cross-lingual generation is within reach.
@inproceedings{vu-etal-2022-overcoming,
title = "Overcoming Catastrophic Forgetting in Zero-Shot Cross-Lingual Generation",
author = "Vu, Tu and
Barua, Aditya and
Lester, Brian and
Cer, Daniel and
Iyyer, Mohit and
Constant, Noah",
booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "",
doi = "",
pages = "",
}
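A hypothetical sketch of the prompt-factoring idea in approach (2) above (illustrative only; the tensor shapes and dictionaries are assumptions): language and task components are learned separately as soft prompts and recombined at inference time, e.g. pairing a summarization prompt trained on English data with a Thai language prompt.

import torch

def build_prompt(language_prompts, task_prompts, language, task):
    # each component is a [length, embed_dim] tensor of soft prompt embeddings
    return torch.cat([language_prompts[language], task_prompts[task]], dim=0)

# zero-shot cross-lingual generation: swap only the language component
# prompt_en = build_prompt(language_prompts, task_prompts, "en", "summarization")
# prompt_th = build_prompt(language_prompts, task_prompts, "th", "summarization")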
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung Won Chung, Yi Tay, Denny Zhou, Quoc V. Le, Barret Zoph, Jason Wei, Adam Roberts ICML 2023
Google Research's Blog: The Flan Collection: Advancing open source methods for instruction tuning
The current best “open-source” large language models for both few-shot prompting and fine-tuning // Hugging Face's models
We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022 (Chung et al., 2022). Through careful ablation studies on the Flan Collection of tasks and methods, we tease apart the effect of design decisions which enable Flan-T5 to outperform prior work by 3-17%+ across evaluation settings. We find task balancing and enrichment techniques are overlooked but critical to effective instruction tuning, and in particular, training with mixed prompt settings (zero-shot, few-shot, and chain-of-thought) actually yields stronger (2%+) performance in all settings. In further experiments, we show Flan-T5 requires less finetuning to converge higher and faster than T5 on single downstream tasks, motivating instruction-tuned models as more computationally-efficient starting checkpoints for new tasks. Finally, to accelerate research on instruction tuning, we make the Flan 2022 collection of datasets, templates, and methods publicly available at this https URL.
@article{longpre2023flan,
title={The Flan Collection: Designing Data and Methods for Effective Instruction Tuning},
author={Longpre, Shayne and Hou, Le and Vu, Tu and Webson, Albert and Chung, Hyung Won and Tay, Yi and Zhou, Denny and Le, Quoc V. and Zoph, Barret and Wei, Jason and Roberts, Adam},
journal={arXiv preprint arXiv:2301.13688},
url = "https://arxiv.org/abs/2301.13688",
year={2023}
}
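The mixed-prompt-settings finding above can be illustrated with a small, hypothetical sketch of how a training mixture might interleave zero-shot and few-shot formats (field names, templates, and proportions are made up, not the Flan Collection's actual templates):

import random

def format_zero_shot(ex):
    return f"{ex['instruction']}\n{ex['input']}", ex["target"]

def format_few_shot(ex, demos):
    shots = "\n\n".join(f"{d['input']}\n{d['target']}" for d in demos)
    return f"{ex['instruction']}\n{shots}\n\n{ex['input']}", ex["target"]

def build_mixture(examples, demo_pool, zero_shot_fraction=0.5):
    data = []
    for ex in examples:
        if random.random() < zero_shot_fraction:
            data.append(format_zero_shot(ex))
        else:
            demos = random.sample(demo_pool, k=min(3, len(demo_pool)))
            data.append(format_few_shot(ex, demos))
    return data  # (input, target) pairs mixing prompt settings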
All Recent Preprints/Publications
Flan-MoE: Scaling Instruction-Finetuned Language Models with Sparse Mixture of Experts
Sheng Shen, Le Hou, Yanqi Zhou, Nan Du, Shayne Longpre, Jason Wei, Hyung Won Chung, Barret Zoph, William Fedus, Xinyun Chen, Tu Vu, Yuexin Wu, Wuyang Chen, Albert Webson, Yunxuan Li, Vincent Zhao, Hongkun Yu, Kurt Keutzer, Trevor Darrell, Denny Zhou arXiv 2023
The explosive growth of language models and their applications have led to an increased demand for efficient and scalable methods. In this paper, we introduce Flan-MoE, a set of Instruction-Finetuned Sparse Mixture-of-Expert (MoE) models. We show that naively finetuning MoE models on a task-specific dataset (in other words, no instruction-finetuning) often yields worse performance compared to dense models of the same computational complexity. However, our Flan-MoE outperforms dense models under multiple experiment settings: instruction-finetuning only and instruction-finetuning followed by task-specific finetuning. This shows that instruction-finetuning is an essential stage for MoE models. Specifically, our largest model, Flan-MoE-32B, surpasses the performance of Flan-PaLM-62B on four benchmarks, while utilizing only one-third of the FLOPs. The success of Flan-MoE encourages rethinking the design of large-scale, high-performance language models, under the setting of task-agnostic learning.
@article{shen2023flanmoe,
title={Flan-MoE: Scaling Instruction-Finetuned Language Models with Sparse Mixture of Experts},
author={Sheng Shen and Le Hou and Yanqi Zhou and Nan Du and Shayne Longpre and Jason Wei and Hyung Won Chung and Barret Zoph and William Fedus and Xinyun Chen and Tu Vu and Yuexin Wu and Wuyang Chen and Albert Webson and Yunxuan Li and Vincent Zhao and Hongkun Yu and Kurt Keutzer and Trevor Darrell and Denny Zhou},
journal={arXiv preprint arXiv:2305.14705},
url = "https://arxiv.org/abs/2305.14705",
year={2023}
}
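For readers unfamiliar with sparse MoE layers, here is a compact, hypothetical PyTorch sketch of a top-k routed mixture-of-experts feed-forward block (illustrative of the general idea only, not the Flan-MoE architecture or its load-balancing details):

import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, dim, hidden, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)])
        self.top_k = top_k

    def forward(self, x):  # x: [num_tokens, dim]
        gates = torch.softmax(self.router(x), dim=-1)
        weights, indices = gates.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):  # each token is routed only to its top-k experts
            for e, expert in enumerate(self.experts):
                mask = indices[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out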
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung Won Chung, Yi Tay, Denny Zhou, Quoc V. Le, Barret Zoph, Jason Wei, Adam Roberts ICML 2023
We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022 (Chung et al., 2022). Through careful ablation studies on the Flan Collection of tasks and methods, we tease apart the effect of design decisions which enable Flan-T5 to outperform prior work by 3-17%+ across evaluation settings. We find task balancing and enrichment techniques are overlooked but critical to effective instruction tuning, and in particular, training with mixed prompt settings (zero-shot, few-shot, and chain-of-thought) actually yields stronger (2%+) performance in all settings. In further experiments, we show Flan-T5 requires less finetuning to converge higher and faster than T5 on single downstream tasks, motivating instruction-tuned models as more computationally-efficient starting checkpoints for new tasks. Finally, to accelerate research on instruction tuning, we make the Flan 2022 collection of datasets, templates, and methods publicly available at this https URL.
@article{longpre2023flan,
title={The Flan Collection: Designing Data and Methods for Effective Instruction Tuning},
author={Longpre, Shayne and Hou, Le and Vu, Tu and Webson, Albert and Chung, Hyung Won and Tay, Yi and Zhou, Denny and Le, Quoc V. and Zoph, Barret and Wei, Jason and Roberts, Adam},
journal={arXiv preprint arXiv:2301.13688},
url = "https://arxiv.org/abs/2301.13688",
year={2023}
}
Dialect-robust Evaluation of Generated Text
Jiao Sun, Thibault Sellam, Elizabeth Clark, Tu Vu, Timothy Dozat, Dan Garrette, Aditya Siddhant, Jacob Eisenstein, Sebastian Gehrmann ACL 2023
Evaluation metrics that are not robust to dialect variation make it impossible to tell how well systems perform for many groups of users, and can even penalize systems for producing text in lower-resource dialects. However, currently, there exists no way to quantify how metrics respond to change in the dialect of a generated utterance. We thus formalize dialect robustness and dialect awareness as goals for NLG evaluation metrics. We introduce a suite of methods and corresponding statistical tests one can use to assess metrics in light of the two goals. Applying the suite to current state-of-the-art metrics, we demonstrate that they are not dialect-robust and that semantic perturbations frequently lead to smaller decreases in a metric than the introduction of dialect features. As a first step to overcome this limitation, we propose a training schema, NANO, which introduces regional and language information to the pretraining process of a metric. We demonstrate that NANO provides a size-efficient way for models to improve the dialect robustness while simultaneously improving their performance on the standard metric benchmark.
@article{sun-etal-2022-dialect,
title = "Dialect-robust Evaluation of Generated Text",
author= "Sun, Jiao and
Sellam, Thibault and
Clark, Elizabeth and
Vu, Tu and
Dozat, Timothy and
Garrette, Dan and
Siddhant, Aditya and
Eisenstein, Jacob and
Gehrmann, Sebastian},
journal={arXiv preprint arXiv:2211.00922},
url = "https://arxiv.org/abs/2211.00922",
year={2022}
}
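A minimal, hypothetical sketch of the robustness check described above (the metric and perturbation functions are placeholders): a dialect-robust metric should lose more points for a semantic perturbation than for a rewrite into another dialect.

def dialect_robustness_gap(metric, reference, output, dialect_rewrite, semantic_perturbation):
    # metric(reference, candidate) -> score, higher is better (e.g. a learned NLG metric)
    base = metric(reference, output)
    dialect_drop = base - metric(reference, dialect_rewrite(output))
    semantic_drop = base - metric(reference, semantic_perturbation(output))
    # positive gap: the metric penalizes meaning changes more than dialect features
    return semantic_drop - dialect_drop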
Overcoming Catastrophic Forgetting in Zero-Shot Cross-Lingual Generation Tu Vu, Aditya Barua, Brian Lester, Daniel Cer, Mohit Iyyer, Noah Constant EMNLP 2022
In this paper, we explore the challenging problem of performing a generative task in a target language when labeled data is only available in English, using summarization as a case study. We assume a strict setting with no access to parallel data or machine translation and find that common transfer learning approaches struggle in this setting, as a generative multilingual model fine-tuned purely on English catastrophically forgets how to generate non-English. Given the recent rise of parameter-efficient adaptation techniques, we conduct the first investigation into how one such method, prompt tuning (Lester et al., 2021), can overcome catastrophic forgetting to enable zero-shot cross-lingual generation. Our experiments show that parameter-efficient prompt tuning provides gains over standard fine-tuning when transferring between less-related languages, e.g., from English to Thai. However, a significant gap still remains between these methods and fully-supervised baselines. To improve cross-lingual transfer further, we explore several approaches, including: (1) mixing in unlabeled multilingual data, and (2) explicitly factoring prompts into recombinable language and task components. Our approaches can provide further quality gains, suggesting that robust zero-shot cross-lingual generation is within reach.
@inproceedings{vu-etal-2022-overcoming,
title = "Overcoming Catastrophic Forgetting in Zero-Shot Cross-Lingual Generation",
author = "Vu, Tu and
Barua, Aditya and
Lester, Brian and
Cer, Daniel and
Iyyer, Mohit and
Constant, Noah",
booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "",
doi = "",
pages = "",
}
SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer Tu Vu, Brian Lester, Noah Constant, Rami Al-Rfou, Daniel Cer ACL 2022
There has been growing interest in parameter-efficient methods to apply pre-trained language models to downstream tasks. Building on the Prompt Tuning approach of Lester et al. (2021), which learns task-specific soft prompts to condition a frozen pre-trained model to perform different tasks, we propose a novel prompt-based transfer learning approach called SPoT: Soft Prompt Transfer. SPoT first learns a prompt on one or more source tasks and then uses it to initialize the prompt for a target task. We show that SPoT significantly boosts the performance of Prompt Tuning across many tasks. More remarkably, across all model sizes, SPoT matches or outperforms standard Model Tuning (which fine-tunes all model parameters) on the SuperGLUE benchmark, while using up to 27,000x fewer task-specific parameters. To understand where SPoT is most effective, we conduct a large-scale study on task transferability with 26 NLP tasks in 160 combinations, and demonstrate that many tasks can benefit each other via prompt transfer. Finally, we propose an efficient retrieval approach that interprets task prompts as task embeddings to identify similar tasks and predict the most transferable source tasks for a novel target task.
@inproceedings{vu-etal-2022-spot,
title = "{SP}o{T}: Better Frozen Model Adaptation through Soft Prompt Transfer",
author = "Vu, Tu and
Lester, Brian and
Constant, Noah and
Al-Rfou{'}, Rami and
Cer, Daniel",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-long.346",
doi = "10.18653/v1/2022.acl-long.346",
pages = "5039--5059",
}
STraTA: Self-Training with Task Augmentation for Better Few-shot Learning Tu Vu, Minh-Thang Luong, Quoc Le, Grady Simon, Mohit Iyyer EMNLP 2021
Despite their recent successes in tackling many NLP tasks, large-scale pre-trained language models do not perform as well in few-shot settings where only a handful of training examples are available. To address this shortcoming, we propose STraTA, which stands for Self-Training with Task Augmentation, an approach that builds on two key ideas for effective leverage of unlabeled data. First, STraTA uses task augmentation, a novel technique that synthesizes a large amount of data for auxiliary-task fine-tuning from target-task unlabeled texts. Second, STraTA performs self-training by further fine-tuning the strong base model created by task augmentation on a broad distribution of pseudo-labeled data. Our experiments demonstrate that STraTA can substantially improve sample efficiency across 12 few-shot benchmarks. Remarkably, on the SST-2 sentiment dataset, STraTA, with only 8 training examples per class, achieves comparable results to standard fine-tuning with 67K training examples. Our analyses reveal that task augmentation and self-training are both complementary and independently effective.
@inproceedings{vu-etal-2021-strata,
title = "{ST}ra{TA}: Self-Training with Task Augmentation for Better Few-shot Learning",
author = "Vu, Tu and
Luong, Minh-Thang and
Le, Quoc and
Simon, Grady and
Iyyer, Mohit",
booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2021",
address = "Online and Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.emnlp-main.462",
doi = "10.18653/v1/2021.emnlp-main.462",
pages = "5715--5731",
}
Exploring and Predicting Transferability across NLP Tasks. Tu Vu, Tong Wang, Tsendsuren Munkhdalai, Alessandro Sordoni, Adam Trischler, Andrew Mattarella-Micke, Subhransu Maji, Mohit Iyyer EMNLP 2020
Recent advances in NLP demonstrate the effectiveness of training large-scale language models and transferring them to downstream tasks. Can fine-tuning these models on tasks other than language modeling further improve performance? In this paper, we conduct an extensive study of the transferability between 33 NLP tasks across three broad classes of problems (text classification, question answering, and sequence labeling). Our results show that transfer learning is more beneficial than previously thought, especially when target task data is scarce, and can improve performance even when the source task is small or differs substantially from the target task (e.g., part-of-speech tagging transfers well to the DROP QA dataset). We also develop task embeddings that can be used to predict the most transferable source tasks for a given target task, and we validate their effectiveness in experiments controlled for source and target data size. Overall, our experiments reveal that factors such as source data size, task and domain similarity, and task complexity all play a role in determining transferability.
@inproceedings{vu-etal-2020-exploring,
title = "Exploring and Predicting Transferability across {NLP} Tasks",
author = "Vu, Tu and
Wang, Tong and
Munkhdalai, Tsendsuren and
Sordoni, Alessandro and
Trischler, Adam and
Mattarella-Micke, Andrew and
Maji, Subhransu and
Iyyer, Mohit",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.emnlp-main.635",
pages = "7882--7926",
}
Encouraging Paragraph Embeddings to Remember Sentence Identity Improves Classification Tu Vu, Mohit Iyyer ACL 2019
While paragraph embedding models are remarkably effective for downstream classification tasks, what they learn and encode into a single vector remains opaque. In this paper, we investigate a state-of-the-art paragraph embedding method proposed by Zhang et al. (2017) and discover that it cannot reliably tell whether a given sentence occurs in the input paragraph or not. We formulate a sentence content task to probe for this basic linguistic property and find that even a much simpler bag-of-words method has no trouble solving it. This result motivates us to replace the reconstruction-based objective of Zhang et al. (2017) with our sentence content probe objective in a semi-supervised setting. Despite its simplicity, our objective improves over paragraph reconstruction in terms of (1) downstream classification accuracies on benchmark datasets, (2) faster training, and (3) better generalization ability.
@inproceedings{vu-iyyer-2019-encouraging,
title = "Encouraging Paragraph Embeddings to Remember Sentence Identity Improves Classification",
author = "Vu, Tu and
Iyyer, Mohit",
booktitle = "Proceedings of the 57th Conference of the Association for Computational Linguistics",
month = jul,
year = "2019",
address = "Florence, Italy",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/P19-1638",
pages = "6331--6338"
}
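The sentence content probe above amounts to a binary prediction of whether a sentence occurs in a paragraph; a hypothetical bag-of-words sketch (the vocabulary and the downstream linear classifier are assumptions) looks like this:

import numpy as np

def bag_of_words(text, vocab):
    vec = np.zeros(len(vocab))
    for token in text.lower().split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    return vec

def probe_features(paragraph, sentence, vocab):
    # concatenate the two encodings; a linear classifier on top predicts
    # whether the sentence occurs in the paragraph
    return np.concatenate([bag_of_words(paragraph, vocab), bag_of_words(sentence, vocab)])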
Integrating Multiplicative Features into Supervised Distributional Methods for Lexical Entailment Tu Vu, Vered Shwartz *SEM 2018
Supervised distributional methods are applied successfully in lexical entailment, but recent work questioned whether these methods actually learn a relation between two words. Specifically, Levy et al. (2015) claimed that linear classifiers learn only separate properties of each word. We suggest a cheap and easy way to boost the performance of these methods by integrating multiplicative features into commonly used representations. We provide an extensive evaluation with different classifiers and evaluation setups, and suggest a suitable evaluation setup for the task, eliminating biases existing in previous ones.
@inproceedings{vu-shwartz-2018-integrating,
title = "Integrating Multiplicative Features into Supervised Distributional Methods for Lexical Entailment",
author = "Vu, Tu and
Shwartz, Vered",
booktitle = "Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics",
month = jun,
year = "2018",
address = "New Orleans, Louisiana",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/S18-2020",
pages = "160--166",
}
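A one-function, hypothetical sketch of the multiplicative features above (the exact feature set in the paper may differ): alongside the usual concatenation of the two word vectors, the element-wise product adds an explicit interaction between the candidate pair.

import numpy as np

def pair_features(u, v):
    # u, v: word embeddings for the candidate (hyponym, hypernym) pair;
    # concatenation alone lets a linear classifier rely on properties of each word separately,
    # while u * v injects a multiplicative interaction between the two words
    return np.concatenate([u, v, u * v])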
Sentence Simplification with Memory-Augmented Neural Networks. Tu Vu, Baotian Hu, Tsendsuren Munkhdalai, Hong Yu NAACL-HLT 2018
Sentence simplification aims to simplify the content and structure of complex sentences, and thus make them easier to interpret for human readers, and easier to process for downstream NLP applications. Recent advances in neural machine translation have paved the way for novel approaches to the task. In this paper, we adapt an architecture with augmented memory capacities called Neural Semantic Encoders (Munkhdalai and Yu, 2017) for sentence simplification. Our experiments demonstrate the effectiveness of our approach on different simplification datasets, both in terms of automatic evaluation measures and human judgments.
@inproceedings{vu-etal-2018-sentence,
title = "Sentence Simplification with Memory-Augmented Neural Networks",
author = "Vu, Tu and
Hu, Baotian and
Munkhdalai, Tsendsuren and
Yu, Hong",
booktitle = "Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)",
month = jun,
year = "2018",
address = "New Orleans, Louisiana",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/N18-2013",
pages = "79--85"
}
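As a rough, hypothetical illustration of the memory-augmented idea (loosely in the spirit of Neural Semantic Encoders, not the paper's exact model): each step reads from a memory of token representations via attention, composes the read with the current input, and writes the result back to the attended slots.

import torch
import torch.nn as nn

class MemoryStep(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.compose = nn.Linear(2 * dim, dim)

    def forward(self, x_t, memory):  # x_t: [dim], memory: [mem_size, dim]
        attn = torch.softmax(memory @ x_t, dim=0)   # read: attend over memory slots
        read = attn @ memory                        # weighted sum of slots -> [dim]
        written = torch.tanh(self.compose(torch.cat([x_t, read])))  # compose read with input
        # write: update slots in proportion to how strongly they were read
        memory = (1 - attn).unsqueeze(-1) * memory + attn.unsqueeze(-1) * written
        return written, memory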
Patents
* Original inventor
Frozen Model Adaptation through Soft Prompt Transfer Tu Vu*, Brian Lester, Noah Constant, Rami Al-Rfou, Daniel Cer U.S. Patent Application Serial No. 17/863,840
Task Augmentation and Self-training for Improved Few-shot Learning
Minh-Thang Luong, Tu Vu*, Quoc V. Le, Grady Simon U.S. Patent Application Serial No. 17/826,690
Awards
Champion Prize (ranked 1st out of 600 contestants nationally), National Mathematical Olympiad, Vietnam, 2010
First Runner-up Prize (ranked 2nd out of 64 teams internationally), International Programming Contest, Japan, 2011 // Press (Vietnamese)
Honorable Mention, ACM-ICPC International Collegiate Programming Contest, Asia Regional Contest, 2010
Outstanding Young Talent of the Capital City (one of the top 100 most outstanding young talents selected from a wide range of fields), Hanoi, Vietnam, 2010
Outstanding Academic and Co-curricular Achievements (class rank: 1st out of 100, highest distinction), Vietnam National University, Hanoi, 2009 – 2013
A number of prizes in National/International Olympiads in both Mathematics and Informatics, 2006 – 2013
A number of academic scholarships for undergraduate students, 2009 – 2013