Publications

* Equal contribution, ✉ Corresponding author

2025

  1. Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning
    Pengxiang Li*, Zhi Gao*, Bofei Zhang, Yapeng Mi, Xiaojian Ma, Chenrui Shi, Tao Yuan, Yuwei Wu, Yunde Jia, Song-Chun Zhu, and Qing Li
    arXiv preprint arXiv:2504.21561, 2025
  2. Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage
    Zhi Gao*, Bofei Zhang*, Pengxiang Li*, Xiaojian Ma, Tao Yuan, Yue Fan, Yuwei Wu, Yunde Jia, Song-Chun Zhu, and Qing Li
    International Conference on Learning Representations (ICLR), 2025 (Spotlight)

2024

  1. Task-oriented Sequential Grounding in 3D Scenes
    Zhuofan Zhang, Ziyu Zhu, Pengxiang Li, Tengyu Liu, Xiaojian Ma, Yixin Chen, Baoxiong Jia, Siyuan Huang, and Qing Li
    arXiv preprint arXiv:2408.04034, 2024
  2. FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models
    Neural Information Processing Systems: Datasets and Benchmarks (NeurIPS D&B), 2024
  3. Inter-Scale Similarity Guided Cost Aggregation for Stereo Matching
    Pengxiang Li, Chengtang Yao, Yuwei Wu, and Yunde Jia
    IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2024

2023

  1. Hyperbolic Learning: Theory and Applications
    Pengxiang Li, Peilin Yu, Yangkai Xue, Yuwei Wu, and Zhi Gao
    2023

2021

  1. A Decomposition Model for Stereo Matching
    Chengtang Yao, Yunde Jia, Huijun Di, Pengxiang Li, and Yuwei Wu
    The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021