Baifeng Shi

I am a Ph.D. student advised by Prof. Trevor Darrell at UC Berkeley. Previously, I graduated from Peking University with a B.S. degree in computer science.

I build generalist models for vision and robotics.

Email  /  Google Scholar  /  GitHub  /  CV  /  WeChat

Selected Publications
When Do We Not Need Larger Vision Models?
Baifeng Shi, Ziyang Wu, Maolin Mao, Xin Wang, Trevor Darrell
ECCV, 2024
abstract / pdf / code

We find that smaller vision models (e.g., ViT-B or ViT-L) run at multiple image scales usually outperform larger models (e.g., ViT-H or ViT-G).
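
The recipe behind this finding is simple to sketch. Below is a hedged, self-contained illustration of multi-scale feature extraction with a small frozen encoder: resize the image to several scales, split larger scales into crops of the encoder's native input size, pool the crop features, and concatenate across scales. The `encoder` here is a dummy stand-in, not the paper's actual model or code.

```python
import torch
import torch.nn.functional as F

def multi_scale_features(encoder, image, base=224, scales=(1, 2)):
    """Run a small encoder at multiple image scales and concatenate features.

    `encoder` is any stand-in that maps (B, 3, base, base) -> (B, D);
    this is an illustrative sketch, not the paper's implementation.
    """
    feats = []
    for s in scales:
        resized = F.interpolate(image, size=(base * s, base * s),
                                mode="bilinear", align_corners=False)
        # Split the resized image into s*s non-overlapping crops of the
        # encoder's native input size, encode each crop, then average-pool.
        crops = resized.unfold(2, base, base).unfold(3, base, base)
        crops = crops.permute(0, 2, 3, 1, 4, 5).reshape(-1, 3, base, base)
        crop_feats = encoder(crops)                          # (B*s*s, D)
        crop_feats = crop_feats.view(image.shape[0], s * s, -1).mean(dim=1)
        feats.append(crop_feats)
    return torch.cat(feats, dim=-1)                          # (B, D * len(scales))

# Toy usage with a dummy encoder standing in for a frozen ViT.
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.LazyLinear(64))
x = torch.randn(2, 3, 224, 224)
print(multi_scale_features(encoder, x).shape)  # torch.Size([2, 128])
```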

Humanoid Locomotion as Next Token Prediction
Ilija Radosavovic, Bike Zhang, Baifeng Shi, Jathushan Rajasegaran, Sarthak Kamat, Trevor Darrell, Koushil Sreenath, Jitendra Malik
NeurIPS, 2024
Spotlight
abstract / pdf / website

We formulate humanoid locomotion as a next token prediction problem. This enables learning to walk from in-the-wild data such as YouTube videos.
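
As a rough, hypothetical illustration of the formulation (not the paper's architecture), each timestep's observations and actions can be interleaved as tokens in one sequence, with a causal transformer trained to predict the next token:

```python
import torch
import torch.nn as nn

class CausalSensorimotorModel(nn.Module):
    """Toy causal transformer over interleaved observation/action tokens.

    A hedged sketch of the 'locomotion as next token prediction' framing;
    the dimensions and tokenization here are made up for illustration.
    """
    def __init__(self, token_dim=32, n_layers=2, n_heads=4, max_len=64):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, max_len, token_dim))
        layer = nn.TransformerEncoderLayer(token_dim, n_heads,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(token_dim, token_dim)

    def forward(self, tokens):                     # (B, T, token_dim)
        T = tokens.shape[1]
        # Causal mask so each token only attends to the past.
        mask = nn.Transformer.generate_square_subsequent_mask(T)
        h = self.backbone(tokens + self.pos[:, :T], mask=mask)
        return self.head(h)                        # prediction of token t+1

model = CausalSensorimotorModel()
traj = torch.randn(8, 16, 32)                      # interleaved obs/action tokens
pred = model(traj)
loss = nn.functional.mse_loss(pred[:, :-1], traj[:, 1:])  # next-token regression
```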

Robot Learning with Sensorimotor Pre-training
Ilija Radosavovic, Baifeng Shi, Letian Fu, Ken Goldberg, Trevor Darrell*, Jitendra Malik*
CoRL, 2023
Oral Presentation
abstract / pdf / website

We improve imitation learning by first pre-training a transformer with masked autoencoding (MAE) on sensorimotor sequences.
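
In spirit, the objective is a masked autoencoder applied to trajectory tokens. The sketch below is a hedged illustration with made-up names and dimensions, not the paper's code: mask a random subset of sensorimotor tokens, reconstruct them, and score the reconstruction.

```python
import torch
import torch.nn as nn

def masked_reconstruction_loss(model, tokens, mask_ratio=0.75):
    """MAE-style objective on a token sequence: mask, reconstruct, score.

    `model` maps (B, T, D) -> (B, T, D); it is an illustrative stand-in
    for an encoder-decoder over camera/proprioception/action tokens.
    """
    B, T, D = tokens.shape
    mask = torch.rand(B, T) < mask_ratio           # True = token is masked
    corrupted = tokens.clone()
    corrupted[mask] = 0.0                          # replace with a mask value
    recon = model(corrupted)
    # Only the masked positions contribute to the loss, as in MAE.
    return nn.functional.mse_loss(recon[mask], tokens[mask])

# Toy usage with a linear stand-in model.
model = nn.Linear(16, 16)
tokens = torch.randn(4, 10, 16)
print(masked_reconstruction_loss(model, tokens).item())
```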

TOAST: Transfer Learning via Attention Steering
Baifeng Shi, Siyu Gai, Trevor Darrell, Xin Wang
preprint, 2023
abstract / pdf / code / Zhihu

We find that previous transfer learning methods (e.g., fine-tuning, LoRA, prompt tuning) fail to focus the model's attention on features relevant to the downstream task. We show that refocusing the model's attention on task-relevant features with top-down attention substantially improves downstream performance.

Top-Down Visual Attention from Analysis by Synthesis
Baifeng Shi, Trevor Darrell, Xin Wang
CVPR, 2023
Conference Highlight
website / abstract / pdf / code / Zhihu

We build ViTs with top-down attention, i.e., the ability to steer attention toward specific objects given a prompt.
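
A toy way to convey the idea (purely illustrative; not the paper's analysis-by-synthesis architecture) is to gate patch tokens by their similarity to a prompt embedding before further processing:

```python
import torch
import torch.nn.functional as F

def topdown_reweight(tokens, prompt):
    """Toy top-down gating: scale each token by its similarity to a prompt.

    A hedged sketch of 'steering attention with a prompt'; the real model
    uses a feedback decoder pass rather than this simple gating.
    """
    # Scaled dot-product similarity between each patch token and the prompt.
    sim = F.softmax(tokens @ prompt / tokens.shape[-1] ** 0.5, dim=1)  # (B, T)
    return tokens * sim.unsqueeze(-1)              # emphasize prompt-relevant tokens

tokens = torch.randn(2, 49, 64)   # patch tokens from a ViT
prompt = torch.randn(64)          # task/object prompt embedding
steered = topdown_reweight(tokens, prompt)
```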

Invited Talks

[Jun 2024]   Scaling Up Visual Pre-Training: What’s Next?, AI Tea Talk Singapore

[Apr 2024]   Scaling Up Visual Pre-Training: What’s Next?, VGG group, University of Oxford   [slides]

[Mar 2024]   Scaling Up Visual Pre-Training: What’s Next?, Prof. Yi Ma's group, UC Berkeley

[Oct 2023]   Principles and Applications of Bottom-Up and Top-Down Visual Attention, Peking University   [slides]

[Jun 2023]   Principles and Applications of Bottom-Up and Top-Down Visual Attention, TechBeat