Eric Xing

My Group CV

President

Mohamed bin Zayed University of Artificial Intelligence

University Professor

MBZUAI - Machine Learning [bio]

CMU - Machine Learning Department

Language Technology Institute & Computer Science Department

School of Computer Science

Carnegie Mellon University

Research synopsis

My principal research interests lie in the development of machine learning and statistical methodology, and large-scale computational systems and architectures, for solving problems involving automated learning, reasoning, and decision-making in high-dimensional, multimodal, and dynamic possible worlds in artificial, biological, and social systems.

In recent years, I have been focusing on building large language models, world/agent models, and foundation models for biology.

Recent activities:

Lately (2025), we have launched:

AIDO (AI-Driven Digital Organism), a system of multiscale foundation models for predicting, simulating, and programming biology at all levels.
PAN-World, a true world model that builds an action-conditioned simulation for counterfactual reasoning, foresight, planning, and safe action.
K2/K2-Think, a leading open-source LLM/system that delivers frontier capabilities and advanced AI reasoning.

Current Ph.D. students and postdocs:

Past students and postdocs:

Amr Ahmed, Research Scientist, Google
Maruan Al-Shedivat, Principal Research Scientist, Genesis Therapeutics
Bryan Aragam, Assistant Professor, University of Chicago
Sangkeun Choe, Engineer, Anthropic
Ross Curtis, Software Engineer, AncestryDNA
Wei Dai, Research Engineer, Apple
Kumar Avinava Dubey, Research Scientist, Google
Jacob Eisenstein, Assistant Professor, Georgia Institute of Technology
Wenjie Fu, Director of Engineering, Meta
Xuchen Gong, PhD Student, UChicago
Anuj Goyal, Software Engineer, LinkedIn
Steve Hanneke, Assistant Professor, Purdue University
Kamisetty Hetunandan, CTO, Xaira
Qirong Ho, Assistant Professor, MBZUAI
Judie Howrylak, Assistant Professor, Penn State University
Zhiting Hu, Assistant Professor, UC San Diego
Gunhee Kim, Associate Professor, Seoul National University
Jin Kyu Kim, Research Scientist, Meta Reality Lab
Abhimanu Kumar, Senior Research Engineer, LinkedIn
Seyoung Kim, Associate Professor, Carnegie Mellon University
Mladen Kolar, Professor, University of Southern California
Lisa Lee, Research Scientist, Google
Seunghak Lee, Research Scientist, Meta
Ben Lengerich, Assistant Professor, University of Wisconsin-Madison
Xiaodan Liang, Associate Professor, Sun Yat-sen University
Andre Martins, Associate Professor, Priberam Labs and Instituto Superior Tecnico
Micol Marchetti-Bowick, Principal Software Engineer, Aurora
Willie Neiswanger, Assistant Professor, University of Southern California
Ankur Parikh, Staff Research Scientist, Google; Adjunct Assistant Professor, NYU
Kriti Punyakanok, Research Scientist, Google
Aurick Qiao, Thinking Machines Lab
Pradipta Ray, Research Scientist, University of Texas at Dallas
Mrinmaya Sachan, Assistant Professor, ETH Zurich
Suyash Shringarpure, Senior Scientist, Statistical Genetics at 23andMe
Kyung-Ah Sohn, Professor, Ajou University
Le Song, Professor, MBZUAI
Chong Wang, Research Scientist, Google
Jinliang Wei, Engineer, Google
Sinead Williamson, Assistant Professor, University of Texas Austin
Andrew Wilson, Professor, NYU
Haoyi Wang, Assistant Professor, UIUC
Hongyi Wang, Director of Infrastructure, GenBio AI; Assistant Professor, Rutgers
Pengtao Xie, Associate Professor, UC San Diego
Junming Yin, Assistant Professor, University of Arizona
Yaoliang Yu, Assistant Professor, University of Waterloo
Bin Zhao, Entrepreneur
Hao Zhang, Assistant Professor, UC San Diego
Xun Zheng, Research Scientist, Uber
Bing Zhao, Research Scientist, SRI
Jun Zhu, Professor, Tsinghua University

Services:

Board Member, The International Machine Learning Society.
Program Committee Chair, ICML 2014.
General Chair, ICML 2019.
Action Editor/Associate Editor: JASA, AOAS, JMLR, MLJ, and PAMI.
I am a member of the DARPA Information Science and Technology (ISAT) Advisory Group.
I serve on the NIH Bio-Data Management and Analysis (BDMA) Study Section.

Professional short bio

Eric P. Xing is the President of the Mohamed bin Zayed University of Artificial Intelligence and a Professor of Computer Science at Carnegie Mellon University. He completed his undergraduate study at Tsinghua University, and holds a PhD in Molecular Biology and Biochemistry from Rutgers University and a PhD in Computer Science from the University of California, Berkeley. His main research interests are the development of machine learning and statistical methodology, and large-scale computational systems and architectures, for solving problems involving automated learning, reasoning, and decision-making in high-dimensional, multimodal, and dynamic possible worlds in artificial, biological, and social systems. Prof. Xing currently serves or has served in the following roles: associate editor of the Journal of the American Statistical Association (JASA), Annals of Applied Statistics (AOAS), IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), and PLoS Computational Biology; action editor of the Machine Learning Journal (MLJ) and Journal of Machine Learning Research (JMLR); member of the United States Department of Defense Advanced Research Projects Agency (DARPA) Information Science and Technology (ISAT) advisory group. He is a recipient of the National Science Foundation (NSF) Career Award, the Alfred P. Sloan Research Fellowship, the United States Air Force Office of Scientific Research Young Investigator Award, the IBM Open Collaborative Research Faculty Award, as well as several best paper awards. Prof. Xing is a board member of the International Machine Learning Society; he has served as the Program Chair (2014) and General Chair (2019) of the International Conference on Machine Learning (ICML); he is also the Associate Department Head of the Machine Learning Department and founding director of the Center for Machine Learning and Health at Carnegie Mellon University; he is a Fellow of AAAI, ACM, ASA, IEEE, and IMS.

A little more about me

I was born in Shanghai, China and spent my childhood there. After completing a B.S. degree with a major in Physics and a minor in Biology at Tsinghua University, Beijing, I came to the United States and studied the genetic mechanisms of human carcinogenesis at Rutgers University, New Jersey, under Professor Chung S. Yang, and obtained my first Ph.D. in Molecular Biology and Biochemistry. Not totally satisfied with the extent and nature of the understanding of biological phenomena I could reach via purely experimental approaches, I moved on and turned to statistical machine learning, and completed a second Ph.D. in Computer Science at U.C. Berkeley, under Professors Michael Jordan, Richard Karp, and Stuart Russell. I joined the faculty of CS@CMU in 2004, where I have been directing the SAILING Lab, whose research spans a broad spectrum of topics ranging from theoretical foundations to real-world applications in machine learning, distributed systems, computer vision, natural language processing, and computational biology. I was awarded early tenure in 2011 (two years ahead of clock), and was named a full professor in 2014. In 2016, I founded Petuum Inc. to pursue standardization and industrialization of general-purpose AI platforms and building blocks. Petuum was recognized as a Technology Pioneer by the World Economic Forum in 2018.

Research overview:

In the SAILING Lab, we emphasize advancing foundational and applied artificial intelligence through innovative approaches in AI4Bio, artificial general intelligence, and large language models. Our work spans the development of multiscale biological models, genome and protein analysis, and clinical AI applications, alongside creating scalable and efficient frameworks for complex biological systems. We are committed to exploring next-generation AGI through world models for simulative and goal-oriented reasoning, addressing gaps in embodied, social, and strategic intelligence. Our efforts in large language models focus on open-source innovation, cost-efficient scaling, and integration as components in broader AI systems.

ARE being studied:

The following themes ARE being studied in my group:

AI for Biology (AI4Bio): with emphasis on multiscale foundation models, genome and transcriptome analysis, protein modeling, clinical applications, and scalable multimodal frameworks to address complex biological challenges and enable actionable insights across scales. Of particular interest are:
1. Multi-scale foundation models for simulating and predicting biological systems across scales
2. Genome and transcriptome analysis using hierarchical latent spaces to model DNA, RNA, and protein functions
3. Protein modeling systems for understanding regulatory logic and enabling functional design
4. Clinical and healthcare applications addressing disease heterogeneity and advancing personalized treatment
5. Multi-scale biological models connecting molecular mechanisms to organism-level phenomena
6. Scalable, efficient frameworks integrating multimodal biological data for complex analysis
Representative work: AIDO (AI-Driven Digital Organism)
Artificial General Intelligence (AGI): with emphasis on developing world and agent models to simulate real-world reasoning, address embodied and social reasoning gaps, and explore the philosophical foundations of intelligence and agency. Of particular interest are:
1. World models simulating real-world possibilities for physical, social, and biological reasoning
2. Agent models incorporating planning, belief systems, goal-oriented behaviour, and environmental interaction
3. Addressing gaps in embodied reasoning, social dynamics, and strategic decision-making
4. Philosophical exploration of intelligence and agency through reasoning frameworks
Representative work: PAN-World
Large Language Models (LLMs): with emphasis on open-source development, cost-efficient scaling, transparency, domain-specific adaptations, and integration of LLMs as components in broader reasoning and action systems. Of particular interest are:
1. Open-source initiatives like K2/K2-Think, Jais 70B, and Vicuna as collaborative academic efforts
2. Scaling and optimization strategies to reduce costs, improve speed, and enhance eco-friendliness
3. Transparency and reproducibility through LLM360, promoting open access and academic engagement
4. Domain-specific adaptations enhancing contextual reasoning and addressing specialized tasks
5. Integration of LLMs as components within broader frameworks for world and agent modeling
Representative work: K2/K2-Think

HAVE been studied:

The following themes HAVE been studied in my group:

Core Machine Learning: represented by:
1. Theory and algorithms for learning time/space varying-coefficient models with evolving structures or sample-specific (personalized) structures
2. Meta ML and trustworthy ML for generalizable and adversary-robust algorithms
3. The "standard equation" for ML: building a unifying framework for various ML paradigms via standardized forms of loss, model, and solver.
4. Theory and algorithms for learning sparse structured input/output models and multi-task models in ultra high-dimensional space
5. Nonparametric Bayesian methods, infinite mixture models, algorithms and applications of Bayesian nonparametrics for data mining and object/topic/event tracking in open, evolving possible worlds
6. Nonparametric graphical models, RKHS embedding and spectrum algorithm for general graph models
7. Distributed and online algorithms for optimization, approximate inference, and Monte Carlo sampling on large-scale data and models
System Architecture and Strategies for Large Scale ML: with emphasis on developing general-purpose systems for machine learning on massive data with massive models on industrial-scale multicore and distributed systems. Of particular interest are:
1. Design and implementation of representations and systems for composable ML parallelism
2. Global and local protocols for adaptive scheduling in multi-tenant multi-job distributed ML
3. Theoretical analysis of distributed ML system behaviors
4. Automated model learning and tuning via neural architectural search (NAS) and hyperparameter optimization (HPO)
Healthcare and Medical Applications: with emphasis on developing algorithms and solutions that address problems of practical clinical, medical, and biological concerns. Of particular interest are:
1. Robot radiologist: reasoning on radiological images, clinical case report generation, medical training image generation
2. EHR-based patient modeling and prediction, ICD coding
3. Sample specific models for panomic-microenvironment interactions in cancer development or cell differentiation via joint analysis of genomic, proteomic, cytogenetic and pathway signaling data
4. Statistical inference on genetic fingerprints, pedigrees, and their associations to diseases and other complex traits; application to clinical diagnosis and forensic analysis
Information and Intelligent Systems: with emphasis on developing web-scale, multi-core, and on-line machine learning systems for social media, computer vision, and HCI applications. Of particular interest are:
1. Multi-view latent space models, topic models, sparse coding methods for image/text/relational information retrieval
2. Evolving structure, stable metrics, and prediction for large-scale dynamic social networks; goal-driven network design/modification/optimization
3. Web-scale image understanding, search, annotation, and retrieval; photo storyline; analysis of video and multimedia
4. User modeling and personalization, computational advertising, and temporal analysis based on image, text, and activities

I am teaching:

I am teaching Graduate Introduction to Machine Learning (10701) again in Fall 2020, with Professor Ziv Bar-Joseph.
I have been teaching Probabilistic Graphical Models (10708), an advanced graduate course on theory, algorithm, and application for multivariate modeling, inference, and deep learning since 2005 at CMU. All the past versions are available here.
Video lectures of Probabilistic Graphical Models (10708): 2014, 2019, 2020.
In Fall 2014, I taught Advanced Machine Learning (10715), a newly created required course for ML Ph.D. students, with Prof. Barnabas Poczos.
I regularly teach Graduate Machine Learning (10701), which is a general Ph.D.-level intro. ML for CMU students from all majors.

I have taught:

I taught Machine Learning (10601) in Fall 2013 with Professor William Cohen.
I taught Probabilistic Graphical Models (10708) in Spring 2013.
Previously I co-taught Machine Learning (10701) with Prof. Aarti Singh in Fall 2012;
and I taught Computational Genomics (10810) in Spring 2009.
The Dragon Star Lectures: Advanced Machine Learning, @ Peking/Tsinghua Univ., Beijing, Summer 2009.

Research and development:

On June 11th, 2020, we launched the Petuum ML open source consortium that brings our research and development at Petuum Inc. and CMU Sailing Lab on Distributed ML (e.g., AutoDist, AdaptDL), Automated ML (e.g., Dragonfly, ProBO), and Composable ML (e.g., Texar, Forte) implemented across PyTorch and TensorFlow under a unified umbrella.
On December 25th, 2013, we made an initial open-source release of Petuum, a new framework for distributed machine learning with massive data, big models, and a wide spectrum of algorithms. Updates on Petuum are released every three months. The latest release (version 1.1) was made in July, 2015.

Sabbatical and leave:

I am currently on partial leave from CMU to serve as President at the Mohamed bin Zayed University of Artificial Intelligence.
I was on sabbatical from 2018 to 2019 as the CEO and Chief Scientist of Petuum Inc. Currently I serve as the Executive Chairman of its Board.
I was on sabbatical from 2010 to 2011 as a visiting professor at Department of Statistics, Stanford University.
I was also a visiting professor during 2010-2011 at Facebook, working on a variety of projects on social media.

Talks and tutorials:

From Learning, to Meta-Learning, to "Lego-Learning" -- A pathway toward autonomous AI [video] [slides], CMU AI Seminar, 2022.
It is time for deep learning to understand its expense bills [video], KDD Deep Learning Day 2021.
Learning-to-learn through Model-based Optimization: HPO, NAS, and Distributed Systems [video], ACL 2021 workshop on Meta Learning and Its Applications to Natural Language Processing.
A Data-Centric View for Composable Natural Language Processing [video1] [video2], ICML 2021 Machine Learning for Data Workshop.
Thoughts and Efforts on AI Meeting Production [video], Jeffrey L. Elman Distinguished Lecture Series, Halicioglu Data Science Inst., UC San Diego, 2021.
Simplifying and Automating Parallel Machine Learning via a Programmable and Composable Parallel ML System [slides] [video], Tutorial, AAAI 2021.
From Performance-oriented AI to Production- and Industrial-AI [video], Michigan Institute for Data Science, 2020.
A Blueprint of Standardized and Composable Machine Learning, [slides] [video], Institute for Advanced Study, Princeton, 2020.
Compositionality in Machine Learning, [slides] [video], Open Data Science Conference (ODSC) West 2019.
A Civil Engineering Perspective on Artificial Intelligence From Petuum [slides], Distinguished Lectures in Computational Innovation, Columbia University, 2018.
A Statistical Machine Learning Perspective of Deep Learning: Algorithm, Theory, and Scalable Computing [slides], tutorial at the International Summer School on Deep Learning, Genova, Italy, 2018.
Standardized Tests as benchmarks for Artificial Intelligence [slides], tutorial at EMNLP, Melbourne, Australia, 2018.
PetuumMed: algorithms and system for EHR-based medical decision support [slides], MIT, 2018.
System and Algorithm Co-Design, Theory and Practice, for Distributed Machine Learning [slides], [video], at the Simons Institute for the Theory of Computing, Berkeley, 2017.
Strategies & Principles for Distributed Machine Learning [slides], [video], Allen Institute for AI, 2016.
The Machine Learning Behind Reading and Comprehension [slides], Summit of Language and AI, China, 2016.
A New Look at the System, Algorithm and Theory Foundations of Distributed Machine Learning [slides], tutorial with Dr. Qirong Ho at the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2015).
Big ML Software for Modern ML Algorithms [slides], tutorial with Dr. Qirong Ho at the 2014 IEEE International Conference on Big Data (IEEE BigData 2014).
Topic Models, Latent Space Models, Sparse Coding, and All That: A systematic understanding of probabilistic semantic extraction in large corpus [slides], tutorial at the 50th Annual Meeting of the Association for Computational Linguistics (ACL 2012).
Modern Statistical Methods for Genetic Association Study: Structured Genome-Transcriptome-Phenome Association Analysis [slides], tutorial with Dr. Seyoung Kim, at the Nineteenth International Conference on Intelligent Systems for Molecular Biology (ISMB 2011).

Some earlier talks:

I gave an invited talk on "On Learning Sparse Structured Input-Output Models" [slides] at the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP 2012).
I gave a tutorial on "Topic Models, Latent Space Models, Sparse Coding, and All That: A systematic understanding of probabilistic semantic extraction in large corpus" [slides] at the 50th Annual Meeting of the Association for Computational Linguistics (ACL 2012).
With Dr. Seyoung Kim, I gave a tutorial on "Modern Statistical Methods for Genetic Association Study: Structured Genome-Transcriptome-Phenome Association Analysis" [slides] at the Nineteenth International Conference on Intelligent Systems for Molecular Biology (ISMB 2011).
I gave a keynote talk on "Sparsity and Learning Large Scale Models" [slides] at the 2011 CVPR Workshop on Large Scale Learning for Vision.
I gave a keynote talk on "Dynamic Network Analysis: Model, Algorithm, Theory, and Application" [slides] at the Eighth Workshop on Mining and Learning with Graphs, 2010.
I gave a keynote talk on "Genome-Phenome Association Analysis of Complex Diseases - a Structured Sparse Regression Approach" [slides] at the Tenth Annual International Workshop on Bioinformatics and Systems Biology, 2010.
I gave a keynote talk on "Jointly Maximum Margin and Maximum Entropy Learning of Graphical Models" [slides] at the NIPS 2009 Workshop on "APPROXIMATE LEARNING OF LARGE SCALE GRAPHICAL MODELS: THEORY AND APPLICATIONS".
I gave a keynote talk on "Time Varying Graphical Models: reverse engineering and analyzing rewiring networks" [slides] at the NIPS 2009 Mini-Symposium on Machine Learning in Computational Biology.
I gave a keynote talk on "Recent Advances in Learning Sparse Structured Input/Output Model: Models, Algorithms, and Applications" at the NIPS 2008 Workshop on "Structured Input, Structured Output".
I gave a talk on "Time-Varying Networks: Reconstructing Temporally/Spatially Rewiring Gene Interactions" at the 2008 RECOMB Regulatory Genomics workshop.

I co-organized NIPS 2012 Workshop on "Spectral Learning".
I co-organized ICML 2011 Workshop on "Structured Sparsity: Learning and Inference".
I co-organized NIPS 2008 Workshop on "Analyzing Graphs: Theories and Applications".
I co-organized ICML 2007 Workshop on Learning in Structured Output Spaces.
I co-organized NIPS 2007 Workshop on Statistical Models of Networks.
I gave a keynote talk on "Graphical models and algorithms for integrative bioinformatics" at the 6th annual Graybill Conference.
I gave a keynote talk on "Probabilistic graphical models: theory, algorithm, and application" at ICMLA 07.

Eric P. Xing
President and University Professor

Mohamed bin Zayed University of Artificial Intelligence

Masdar City, Abu Dhabi

United Arab Emirates

Phone: +971 2 811 3333

Email: info@mbzuai.ac.ae
