Chongjie Zhang
MIT CSAIL
32 Vassar St
Cambridge, MA 02139

Office: 32-D310
Email: chongjie@csail.mit.edu
Research Statement

Robots are being introduced into manufacturing processes for reasons of efficiency and safety. Current success largely remains in low-mix mass production, where robots are pre-configured to repeatedly execute one task. Much broader applications require robots capable of performing multiple tasks to support flexible automation. Efficient multi-robot coordination and human-robot collaboration will be indispensable for achieving tasks of diverse complexity in a flexible environment. My research vision is to enable such multi-agent collaboration to generate capabilities and flexibility beyond what individual robots can provide and to leverage the relative strengths of humans and robots.

To realize this vision, I am generating novel, interdisciplinary advances in multi-agent collaboration, drawing on expertise in artificial intelligence, operations research, game theory, and systems science. I have developed planning and learning methods to enable agents (1) to safely and efficiently collaborate in tightly-constrained environments, (2) to learn and co-adapt their collaboration policies under uncertainty, and (3) to self-organize and employ hierarchical control to scale up collaboration. To take multi-agent collaboration to the next level, I aim to develop robots that (1) collaboratively manipulate objects with other robots and humans in close proximity, (2) naturally work with humans to infer goals and construct plans, and (3) collaboratively learn and share skills to enable persistent autonomy.

Planning and Decision-Making

Flexible manufacturing requires the careful choreography of human and robotic agents to support safe and efficient coordinated work. In collaboration with Boeing Research and Technology, I am developing the technology that enables scalable and robust multi-agent coordination under tight constraints and allows automated systems to competently operate under uncertainty.

Task assignment and scheduling are crucial for multi-agent coordination. Tasks must be allocated among agents and scheduled to meet temporal constraints and spatial restrictions on agent proximity. Integrating mathematical programming with heuristic search, I have developed a multi-abstraction search approach (MASA) for optimizing such spatio-temporal planning problems [1]. MASA constructs a hierarchy of abstract problems and incrementally refines its solution, backtracking over abstractions when a refinement proves infeasible. MASA scales to problems more than five times larger than those handled by previous approaches, produces solutions that are robust to small disturbances during task execution, and supports fast replanning in response to unexpected events and situations.
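
As a sketch of the underlying search pattern, the following toy example refines a coarse task assignment into a concrete schedule and backtracks to a coarser choice when refinement fails; the two-level problem and all names are illustrative inventions, not the actual MASA formulation of [1].

```python
# Minimal sketch of search over a hierarchy of abstractions with
# backtracking; the toy two-level problem is illustrative only.
from itertools import product

class AssignLevel:
    """Coarse level: assign each of two tasks to one of two robots."""
    def refine(self, partial):
        for a0, a1 in product(("r1", "r2"), repeat=2):
            yield {"assign": {"t0": a0, "t1": a1}}

class ScheduleLevel:
    """Concrete level: pick start times; a robot cannot do two tasks at once."""
    def refine(self, partial):
        assign = partial["assign"]
        for s0, s1 in product(range(3), repeat=2):
            if assign["t0"] == assign["t1"] and s0 == s1:
                continue                          # conflict: prune this branch
            yield {**partial, "start": {"t0": s0, "t1": s1}}

def search(levels, partial=None, depth=0):
    """Refine level by level; return a concrete solution or None."""
    if depth == len(levels):
        return partial                            # fully refined: success
    for candidate in levels[depth].refine(partial):
        solution = search(levels, candidate, depth + 1)
        if solution is not None:
            return solution
    return None                                   # backtrack over the abstraction

print(search([AssignLevel(), ScheduleLevel()]))
```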

The actual performance of a robot’s task plan depends on kinematic motion plans for achieving its tasks. I have developed an integrated approach that co-optimizes high-level task planning with lower-level motion planning [2]. This approach introduces symbolic action planning for achieving each task to reduce the complexity of geometric motion planning. It dynamically improves the task plan based on the existence and cost of feasible motion plans. I increase the efficiency of plan updates through the use of a novel incremental algorithm, enabling dynamic plan execution.
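
A minimal sketch of such an interaction loop appears below, assuming hypothetical task-planner and motion-planner interfaces (best_plan, solve, and a trajectory cost attribute); it conveys the feedback structure between the two levels, not the incremental algorithm of [2].

```python
# Hedged sketch of a task/motion co-optimization loop: the task planner
# proposes its best symbolic plan, the motion planner checks geometric
# feasibility and reports true costs, and infeasible actions are pruned.
def plan(task_planner, motion_planner, max_rounds=100):
    blocked, costs = set(), {}
    for _ in range(max_rounds):
        task_plan = task_planner.best_plan(excluding=blocked, action_costs=costs)
        if task_plan is None:
            return None                           # no feasible task plan remains
        trajectories, infeasible = [], None
        for action in task_plan:
            traj = motion_planner.solve(action)   # geometric subproblem
            if traj is None:
                infeasible = action               # motion-infeasible step
                break
            costs[action] = traj.cost             # feed the true cost back up
            trajectories.append(traj)
        if infeasible is None:
            return task_plan, trajectories        # every step is executable
        blocked.add(infeasible)                   # prune, then replan
    return None
```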

Flexible manufacturing environments are inherently uncertain. It is especially critical for high-level decisions (e.g., how to place robots and human workers on a production line) to deal competently with uncertainty, as they have a significant impact on system performance. To enable this competency, I exploited a multi-agent Markov decision process model and proposed a new type of solution criterion, called fairness, that addresses limitations of existing criteria (i.e., the utilitarian criterion and Markov perfect equilibrium) [3]. This criterion aims to maximize the throughput of a production line by fairly allocating robots and human workers to its work cells. I have developed a scalable game-theoretic approach for computing an optimal fairness policy [4]. These efforts are the first work on fairness in multi-agent sequential decision-making under uncertainty and provide theoretical foundations for further studies. While motivated by manufacturing, my innovations have broader applications wherever fairness is expected, such as electricity distribution in smart grids, resource allocation in cloud computing, and traffic light control.
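
For contrast with the utilitarian criterion, one common egalitarian (maximin) objective can be written as follows; this is an illustrative formulation only, and the precise fairness criteria and their computation are developed in [3, 4].

```latex
% Illustrative maximin objective over joint policies \pi (not necessarily
% the exact criterion of [3, 4]): maximize the worst-off agent's return.
\[
  \pi^{*} \in \arg\max_{\pi} \; \min_{i \in \{1, \dots, n\}}
  \mathbb{E}^{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, r_i(s_t, a_t) \right]
\]
% The utilitarian criterion instead maximizes \sum_i of these expectations,
% which can leave individual agents (here, work cells) arbitrarily starved.
```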

Learning and Co-adaptation

Learning is indispensable for enabling robots to effectively operate in an unknown (or partially specified) environment. Due to the complexity of the environment and the absence of sufficient data, it is often infeasible to build a complete and accurate model for planning. I am developing novel algorithms that allow robots to efficiently learn to coordinate with each other and humans in an uncertain, dynamic environment.

In a multi-agent system, an agent’s learning is complicated by the presence of other agents that are concurrently acting, adapting, and altering the environment; this setting is referred to as multi-agent learning. The convergence and optimality guarantees of single-agent reinforcement learning do not hold in multi-agent settings. To address this challenge, I developed a gradient-based algorithm that allows agents to concurrently learn to coordinate their actions [5]. This algorithm accounts for the learning effects of other agents through the observed changes in rewards. I empirically demonstrated that this distributed learning method effectively adapts multi-agent collaboration to the unknown dynamics of the environment, outperforming the traditional centralized best-fit algorithm on online task allocation problems. Furthermore, inspired by human interactions, where we often anticipate other people’s behaviors, I introduced the concept of policy prediction and augmented the basic gradient-based learning algorithm [6]. The resulting algorithm achieves two theoretical properties: best-response learning and convergence. Specifically, this algorithm allows an agent to learn the optimal policy for interacting with other agents if they use fixed strategies. I also formally proved that this multi-agent learning algorithm converges to a Nash equilibrium in two-action, two-agent games.
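
Below is a minimal sketch of gradient ascent with policy prediction in a two-agent, two-action matrix game: each agent extrapolates the other’s policy along that agent’s gradient before computing its own update. The matching-pennies payoffs, step size, and prediction length are illustrative choices, not values from [6].

```python
# Hedged sketch of gradient ascent with policy prediction in a 2x2 game.
import numpy as np

R1 = np.array([[1., -1.], [-1., 1.]])    # matching pennies, row player
R2 = -R1                                 # zero-sum: column player's payoffs

def grad(R, x, y):
    """d/dx of [x, 1-x] R [y, 1-y]^T, the row player's expected payoff."""
    return y * (R[0, 0] - R[1, 0]) + (1 - y) * (R[0, 1] - R[1, 1])

x, y = 0.9, 0.2                          # P(action 0) for each agent
eta, gamma = 0.01, 0.5                   # step size, prediction length
for _ in range(20000):
    # Policy prediction: each agent extrapolates the other's policy along
    # the other's gradient, then updates against that predicted policy.
    y_pred = np.clip(y + gamma * grad(R2.T, y, x), 0, 1)
    x_pred = np.clip(x + gamma * grad(R1, x, y), 0, 1)
    x = np.clip(x + eta * grad(R1, x, y_pred), 0, 1)
    y = np.clip(y + eta * grad(R2.T, y, x_pred), 0, 1)

print(x, y)  # approaches the mixed Nash equilibrium (0.5, 0.5)
```

Without the prediction step, plain gradient ascent orbits the equilibrium of this game instead of converging; extrapolating the opponent first damps the cycle.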

Many time-critical domains require multi-agent teams to respond reliably and quickly to unforeseen events and situations. My collaborators and I designed a computational learning model that enables a human-robot team to co-develop joint strategies for performing novel tasks requiring coordination [7]. The joint strategies are learned through perturbation training, a human team-training strategy in which the team practices variations of a given task in order to generalize to new variants of that task. Our Adaptive Perturbation Training (AdaPT) algorithm is a hybrid of transfer learning and reinforcement learning techniques that learns quickly and robustly for new task variants. We empirically validated the benefits of AdaPT through both simulations and human subject experiments.
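
A minimal sketch of the general transfer-plus-RL pattern appears below: Q-tables learned on perturbed source tasks form a library, a new variant is warm-started from the best-evaluating member, and learning then continues. The toy line-world task and all constants are illustrative; this conveys the flavor of the approach, not the AdaPT algorithm itself [7].

```python
# Hedged sketch of transfer-plus-RL across task perturbations.
import random

N, ACTIONS = 7, (-1, +1)                 # states 0..6, move left/right

def step_fn(goal):
    """A tiny episodic task: reach `goal` on a line; -0.01 per move."""
    def step(s, a):
        s2 = min(N - 1, max(0, s + ACTIONS[a]))
        return (None, 1.0) if s2 == goal else (s2, -0.01)
    return step

def q_learn(step, q=None, episodes=300, alpha=0.2, gamma=0.95, eps=0.2):
    q = q or [[0.0, 0.0] for _ in range(N)]
    for _ in range(episodes):
        s = 0
        for _ in range(50):              # episode step limit
            a = random.randrange(2) if random.random() < eps \
                else max((0, 1), key=lambda a: q[s][a])
            s2, r = step(s, a)
            boot = 0.0 if s2 is None else max(q[s2])
            q[s][a] += alpha * (r + gamma * boot - q[s][a])
            if s2 is None:
                break
            s = s2
    return q

def ret(step, q):
    """Greedy return of policy q on one episode of `step`."""
    s, total = 0, 0.0
    for _ in range(50):
        s2, r = step(s, max((0, 1), key=lambda a: q[s][a]))
        total += r
        if s2 is None:
            break
        s = s2
    return total

# Library of Q-tables trained on perturbed source tasks (different goals).
library = {g: q_learn(step_fn(g)) for g in (2, 4, 6)}

# New variant: warm-start from the best-evaluating source, keep learning.
new_task = step_fn(5)
best = max(library.values(), key=lambda q: ret(new_task, q))
q_new = q_learn(new_task, q=[row[:] for row in best], episodes=100)
print(ret(new_task, q_new))              # near-optimal on the new variant
```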

Organization and Communication

Scalability is a key challenge for multi-agent coordination, especially under uncertainty. Traditional multi-agent learning suffers from slow convergence, poor performance, and even divergence in large systems. To achieve both system efficiency and scalability, I am developing a new organizational paradigm that employs supervisory control to coordinate multi-agent learning and co-evolves the control hierarchy with the decision policies of the learning agents.

Inspired by organizational theory, I developed a low-overhead, heuristic-based supervisory control framework for efficiently organizing and coordinating a large group of learning agents [8–10]. This novel framework exploits non-local information to dynamically coordinate and shape the learning processes of individual agents while still allowing agents to react autonomously to local feedback. Agents can dynamically and automatically form a nearly-decomposable hierarchy based on their interactions, in which each group of learning agents is coordinated by one supervisor [6]. I designed both horizontal and vertical communication protocols, and developed strategies for learning agents to integrate supervisory information [9]. Supervisors employ heuristics to generate the supervisory information used to coordinate their subordinates. Empirical results demonstrated that this framework dramatically improves the speed, likelihood, and quality of convergence of multi-agent learning in large, complex systems. This framework provides the first practical multi-agent learning paradigm that effectively scales to thousands of agents.

I formalized this organizational paradigm and designed a domain-independent control framework for coordinating and scaling up multi-agent learning [11, 12]. This framework provides a performance guarantee for a rich class of multi-agent decision-making problems. Specifically, I formulated the dynamic coordination of agents’ learning processes as a sequence of constraint optimization problems, and exploited message-passing algorithms to automatically solve them. I empirically demonstrated that, even in problems where optimality is not guaranteed, this coordinated learning approach increases the learning performance by more than 100%. I generalized this framework by enabling supervisory agents to learn when and with whom to coordinate [13]. This generalized framework reduces the communication cost for coordination by one order of magnitude without significant performance loss (e.g., less than 5%).
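
For intuition, here is a hedged sketch of max-sum message passing (max-product in log space) on a chain-structured constraint optimization problem, where it is exact; the utilities are arbitrary, and this illustrates the message-passing machinery rather than the specific formulations of [11, 12].

```python
# Hedged sketch of max-sum message passing on a chain of binary variables.
import numpy as np

# Pairwise utilities theta[i][a, b] between consecutive binary variables.
theta = [np.array([[2., 0.], [0., 1.]]),    # utility over (x0, x1)
         np.array([[0., 3.], [1., 0.]])]    # utility over (x1, x2)

# Forward messages: m[b] = best total utility of x0..xi given x_i = b.
m, back = np.zeros(2), []
for t in theta:
    scores = m[:, None] + t                 # (value of x_i, value of x_{i+1})
    back.append(scores.argmax(axis=0))      # best predecessor per value
    m = scores.max(axis=0)

# Backward pass: decode the maximizing assignment from the messages.
x = [int(m.argmax())]
for bp in reversed(back):
    x.append(int(bp[x[-1]]))
assignment = list(reversed(x))
print(assignment, m.max())                  # e.g., [0, 0, 1] with utility 5.0
```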

Future Plan

In addition to my current research, I will build a research program around the following three themes, working towards realizing the vision of effective and efficient multi-agent collaboration:

Collaborative manipulation. I will develop robots capable of working with other agents in close proximity and performing collaborative manipulation tasks. Effective collaborative manipulation leverages the best skills of humans (e.g., complex perception, failure recovery, and domain expertise) and robots (e.g., strength, speed, precision, and dexterity) and enables a team of low-cost, simple robots to efficiently achieve diverse complex tasks. To increase efficiency, I will develop models to enable simultaneous and supportive collaboration in addition to conventional sequential collaboration. To ensure safety, I will design co-adaptation methods that enable robots to quickly and reliably avoid collisions in a dynamic human environment. I will also study human behavior models and develop methods for robots to generate behaviors that are readable, understandable, and predictable for humans.

Human-robot collaborative planning. My goal is to enable autonomous agents to actively engage humans in the planning process. A key requirement for robots fluently working with humans is the ability to infer goal information and planning constraints embedded in natural language instructions. Robots will automatically detect conflicting or ambiguous goals and constraints, and proactively ask for clarification and help (e.g., narrowing down the plan search) when necessary. Robots excel at planning and scheduling with complex constraints and performing multi-objective optimization. However, it is possible that a robot, acting and communicating based on optimal reasoning, may seem unintuitive to the human teammate and possibly degrade team performance. It is critical for robots to learn and integrate humans’ preferences during the planning process. I will investigate a general framework for enabling this natural human-robot collaboration, integrating planning, machine learning, natural language processing, and common-sense reasoning.

Collaborative learning. I will develop robots that collaboratively learn and share skills. I believe knowledge is created within a population whose members actively interact, sharing experiences and taking on asymmetric roles. Collaborative learning extends skill acquisition beyond the capability of individual robots. Robots can concurrently learn the same task and then combine their learned skills, or each take responsibility for a specific subtask and then coordinate their respective parts. Efficient collaborative learning requires robots capable of skill abstraction, characterization, and combination. Robots can collaboratively learn skills through exploration or from demonstration. Collaborative learning is critical for enabling persistent robot autonomy in unstructured environments. I will study hierarchical skill representation and develop a collaborative learning framework to enable skill discovery, transfer, and combination.

Summary

My goal is to develop autonomous robots that effectively and efficiently collaborate with each other and humans to achieve a diverse range of complex tasks in unstructured and dynamic environments. I am developing technology enabling large-scale multi-agent collaboration in flexible environments, with or without a complete environment model. In the future, I will also design collaborative planning and learning methods to enable physical, natural, and persistent multi-agent interaction.

My research is in accord with the agendas of many funding agencies. I will apply for funding from NSF, ONR, NASA, NIH, and DARPA, which provide growing opportunities in robotics and autonomy (e.g., the National Robotics Initiative (NRI), NSF’s Cyber-Human Systems program, ONR’s Human-Robot Interaction program, and ONR’s Science of Autonomy program). In 2011, I wrote an NSF proposal based on my dissertation, which was awarded a $450,000 grant. As a post-doc at MIT, I have been working with Boeing and understand the needs of industry sponsors. In the future, I will also seek corporate sponsorship for my academic research.

References

[1]   Chongjie Zhang and Julie A. Shah. Co-optimizing multi-agent placement with task assignment and scheduling. In International Joint Conference on Artificial Intelligence (IJCAI’16), 2016.

[2]   Chongjie Zhang and Julie A. Shah. Multi-level optimization from motion planning to task planning. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’16), 2016.

[3]   Chongjie Zhang and Julie A. Shah. On fairness in decision-making under uncertainty: Definitions, computation, and comparison. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI’15), 2015.

[4]   Chongjie Zhang and Julie A. Shah. Fairness in multi-agent sequential decision-making. In Advances in Neural Information Processing Systems (NIPS’14), 2014.

[5]   Chongjie Zhang, Victor R. Lesser, and Prashant J. Shenoy. A multi-agent learning approach to online distributed resource allocation. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI’09), 2009.

[6]   Chongjie Zhang and Victor R. Lesser. Multi-agent learning with policy prediction. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI’10), 2010.

[7]   Ramya Ramakrishnan, Chongjie Zhang, and Julie A. Shah. Perturbation training for human-robot teams. Submitted to the Journal of Artificial Intelligence Research, 2016. Under review.

[8]   Chongjie Zhang, Sherief Abdallah, and Victor R. Lesser. Integrating organizational control into multi-agent learning. In Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS’09), 2009.

[9]   Xiangbin Zhu, Chongjie Zhang, and Victor R. Lesser. Combining dynamic reward shaping and action shaping for coordinating multi-agent learning. In 2013 IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT’13), 2013.

[10]   Dan Garant, Bruno Castro da Silva, Victor R. Lesser, and Chongjie Zhang. Concurrent and incremental transfer learning in a network of reinforcement learning agents. Submitted to the International Joint Conference on Artificial Intelligence (IJCAI’16), 2016. Under review.

[11]   Chongjie Zhang and Victor R. Lesser. Coordinated multi-agent reinforcement learning in networked distributed POMDPs. In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI’11), 2011.

[12]   Duc Thien Nguyen, William Yeoh, Hoong Chuin Lau, Shlomo Zilberstein, and Chongjie Zhang. Decentralized multi-agent reinforcement learning in average-reward dynamic DCOPs. In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI’14), 2014.

[13]   Chongjie Zhang and Victor R. Lesser. Coordinating multi-agent reinforcement learning with limited communication. In Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems (AAMAS’13), 2013.
