Xinyu Zhang 张鑫语

PhD candidate at Rutgers building robot policies grounded in 2D/3D visual world understanding.

I am a fifth-year PhD candidate in Computer Science at Rutgers University, advised by Prof. Abdeslam Boularias.

Before joining Rutgers, I worked as a machine learning engineer at Microsoft and Face++; most recently, I interned at Meta Reality Labs.

I earned my Master’s at UC San Diego under Prof. Ken Kreutz-Delgado and my Bachelor’s at the University of Science and Technology of China.

Research

I have published eight first-author papers in top-tier robotics and vision venues (CVPR, RSS, RA-L, CoRL, and IROS), spanning robot manipulation policy learning, world modeling, computer vision, 3D reconstruction, and video generation. My focus is bringing 2D/3D visual world understanding to robot policies.

Publications

2026

Residual Latent Action — figure

Learning Visual Feature-based World Models via Residual Latent Action

Xinyu Zhang, Zhengtong Xu, Yutian Tao, Yeping Wang, Yu She, Abdeslam Boularias

Under review, 2026

2025

Autoregressive Action Sequence Learning for Robotic Manipulation

Xinyu Zhang, Yuhan Liu, Haonan Chang, Liam Schramm, Abdeslam Boularias

IEEE Robotics and Automation Letters (RA-L) 2025

TL;DR
  • Robot actions can be treated as a language, but unlike word tokens they are heterogeneous and often continuous.
  • We propose a chunking causal transformer to adapt autoregressive models for robot actions.
  • A universal architecture that sets a new state of the art on Push-T, ALOHA, and RLBench.

2024

Detect Everything with Few Examples

Xinyu Zhang, Yuhan Liu, Yuting Wang, Abdeslam Boularias

Conference on Robot Learning (CoRL) 2024

TL;DR
  • Existing work mixes representation learning with detection.
  • We don’t learn representations — we focus on how to use existing pretrained ones.
  • Detect by propagating ROIs through the attention map.

Scaling Manipulation Learning with Visual Kinematic Chain Prediction

Xinyu Zhang, Yuhan Liu, Haonan Chang, Abdeslam Boularias

Conference on Robot Learning (CoRL) 2024

TL;DR
  • How can we learn a single policy across diverse environments?
  • Use a universal, visually grounded, analytically determined action space.
  • That is, the visual projection of the robot kinematic structure.
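The projection step behind this action space can be sketched as follows; the function name and the pinhole-camera setup are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def project_kinematic_chain(joints_3d, K, T_world_to_cam):
    """Project 3D joint positions of a robot's kinematic chain into the image.

    joints_3d: (J, 3) joint positions in the world frame (from forward kinematics).
    K: (3, 3) camera intrinsics; T_world_to_cam: (4, 4) extrinsics.
    Returns (J, 2) pixel coordinates: a visually grounded representation of the chain.
    """
    # Lift to homogeneous coordinates and move into the camera frame.
    homo = np.hstack([joints_3d, np.ones((len(joints_3d), 1))])
    cam = (T_world_to_cam @ homo.T).T[:, :3]
    # Pinhole projection, then perspective divide.
    px = (K @ cam.T).T
    return px[:, :2] / px[:, 2:3]
```

Tracing the projected 2D chain gives a single action representation that transfers across robots, since it depends only on the camera model and the kinematic structure.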

Diffusion-based Affordance Prediction for Multi-modality Storage

Haonan Chang, Kowndinya Boyalakuntla, Yuhan Liu, Xinyu Zhang, Liam Schramm, Abdeslam Boularias

International Conference on Intelligent Robots and Systems (IROS) 2024

2023

Optical flow segmentation — figure

Optical Flow boosts Unsupervised Localization and Segmentation

Xinyu Zhang, Abdeslam Boularias

International Conference on Intelligent Robots and Systems (IROS) 2023

TL;DR
  • Make DINO features more object-aware.
  • Use optical flow as a regularizer — similar local flow yields similar local features.
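One way to read the bullet above as a loss, where the function names and the soft-affinity formulation are my illustrative assumptions rather than the paper's exact objective:

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def flow_consistency_loss(features, flow):
    """Penalize patch pairs whose flows agree but whose features disagree.

    features: (N, D) per-patch features (e.g. DINO patch tokens).
    flow:     (N, 2) mean optical flow per patch.
    """
    # Cosine similarity between patch features.
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    feat_sim = f @ f.T
    # Flow affinity: nearby flows get high weight.
    dist = np.linalg.norm(flow[:, None] - flow[None, :], axis=-1)
    target = softmax(-dist)
    pred = softmax(feat_sim)
    # Cross-entropy pulls the feature affinity toward the flow affinity.
    return -(target * np.log(pred + 1e-8)).sum(axis=1).mean()
```

Used as a regularizer alongside the main objective, this pushes patches that move together to share features, making them more object-aware.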

2022

Structured subnetworks — figure

Learning Low-Precision Structured Subnetworks Using Joint Layerwise Channel Pruning and Uniform Quantization

Xinyu Zhang, Ian Colbert, Srinjoy Das

MDPI Applied Sciences 2022

TL;DR
  • Prune layers in topological order, not all at once.
  • Neuron importance depends heavily on the sparsity of previous layers.
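A toy sketch of the idea for a purely sequential network; the L1 channel scoring and the `keep_ratio` interface are illustrative assumptions, not the paper's method:

```python
import numpy as np

def prune_layerwise(weights, keep_ratio=0.5):
    """Prune channels layer by layer in topological (input-to-output) order.

    weights: list of (out_ch, in_ch) weight matrices of a sequential network.
    After pruning layer i's output channels, layer i+1's input channels are
    reduced to match, so layer i+1's importance scores are computed on the
    already-sparsified network rather than the original dense one.
    """
    kept_in = np.arange(weights[0].shape[1])  # keep all network inputs
    pruned = []
    for w in weights:
        w = w[:, kept_in]                      # drop inputs removed upstream
        score = np.abs(w).sum(axis=1)          # L1 importance per output channel
        k = max(1, int(round(keep_ratio * w.shape[0])))
        kept_out = np.argsort(score)[-k:]      # keep the top-k channels
        kept_out.sort()
        pruned.append(w[kept_out])
        kept_in = kept_out
    return pruned
```

Scoring each layer only after its predecessors are pruned is the point: a channel that looked important in the dense network may feed mostly from inputs that no longer exist.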

2020

Diversity transfer network — figure

Diversity Transfer Network for Few-Shot Learning

Mengting Chen, Yuxin Fang, Xinggang Wang, Heng Luo, Yifeng Geng, Xinyu Zhang, Chang Huang, Wenyu Liu, Bo Wang

AAAI Conference on Artificial Intelligence 2020