Pengxiang Li
I am a second-year PhD student at Beijing Institute of Technology (BIT), advised by Dr. Yuwei Wu and Dr. Yunde
Jia.
I am also a member of the joint PhD program ('TONG Program') with the Beijing Institute for General
Artificial Intelligence (BIGAI), where I am grateful to be advised by Dr. Qing Li and Dr.
Zhi Gao.
Previously, I received my Bachelor's degree in Computer Science and Technology from BIT in 2021.
My research interests lie in Vision and Language, non-Euclidean representation learning, and 3D
vision.
Specifically, I am interested in building feedback-refinement systems for multi-modal models.
Email  / 
Github
|
|
[2025.01] One paper on Multimodal Agent Tuning was accepted by ICLR 2025 as a Spotlight.
[2024.10] One paper on feedback learning in VLMs was accepted by NeurIPS 2024.
[2024.09] One journal paper on Stereo Matching was accepted by T-CSVT.
|
|
Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage
Zhi Gao*, Bofei Zhang*, Pengxiang Li*, Xiaojian Ma, Yue Fan, Tao Yuan, Yuwei Wuβ, Yunde Jia, Song-Chun Zhu, Qing Liβ
ICLR, 2025 Spotlight
[Arxiv]
[Website]
 
We propose T3-Agent, a multi-modal agent tuned with the MM-Traj dataset for better tool-usage reasoning, boosting VLM performance by 20% on benchmarks.
|
|
Task-oriented Sequential Grounding in 3D Scenes
Zhuofan Zhang, Ziyu Zhu, Pengxiang Li, Tengyu Liu, Xiaojian Ma, Yixin Chen, Baoxiong Jia, Siyuan Huang, Qing Li
Preprint, 2024
[Arxiv]
[Website]
[Code]
[Dataset]
[Demo]
[YouTube]
 
We propose a new task, Task-oriented Sequential Grounding in 3D scenes, and introduce SG3D, a large-scale dataset with 22,346 tasks and 112,236 steps in 4,895 real-world 3D scenes.
|
|
FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models
Pengxiang Li*, Zhi Gao*, Bofei Zhang*, Tao Yuan, Yuwei Wu, Mehrtash Harandi, Yunde Jia, Song-Chun Zhu, Qing Li
NeurIPS, 2024
[Arxiv]
[Website]
[Code]
[Dataset]
[Model]
[YouTube]
 
A feedback-refinement dataset with 1.1M multi-turn conversations, which empowers VLMs to refine their responses based on given feedback.
|
|
Inter-Scale Similarity Guided Cost Aggregation for Stereo Matching
Pengxiang Li, Chengtang Yao, Yunde Jia, Yuwei Wu
Accepted by T-CSVT
[Paper]
 
A plug-and-play module for inter-scale similarity guided cost
aggregation that adaptively recovers details in fine-grained areas for stereo matching.
|
|
Hyperbolic Learning: Theory and Applications
Pengxiang Li, Peilin Yu, Yangkai Xue, Yuwei Wu, Zhi Gao
Tutorial, 2023
[Slide]
 
A tutorial exploring the theoretical underpinnings and applications of hyperbolic learning, highlighting its advantages in modeling hierarchical data across diverse downstream fields.
|
|
A Decomposition Model for Stereo Matching
Chengtang Yao, Yunde Jia, Huijun Di*, Pengxiang Li, Yuwei Wu
CVPR, 2021
[Paper]
[Code]
[Supp]
 
A decomposition model for
stereo matching that addresses the excessive growth
in computational cost (time and memory) as the resolution increases.
|
 |
Beijing Institute for General Artificial Intelligence (BIGAI), China
2024.02 - Now
Joint training PhD student
Advisors: Dr. Qing Li and Dr. Zhi Gao
|
 |
Beijing Institute of Technology, China
Master student 2021.09 - 2023.07
PhD student 2023.09 - Now
Advisors: Dr. Yuwei Wu and Dr. Yunde Jia
|
 |
Beijing Institute of Technology, China
2017.08 - 2021.06
Undergraduate Student
Advisor:
Dr. Xian-Ling Mao
|
|