Baifeng Shi
I am a Ph.D. student advised by Prof. Trevor Darrell at UC Berkeley. Previously, I graduated from Peking University with a B.S. degree in computer science.
I build generalist models for vision and robotics.
Email / Google Scholar / Github / CV / WeChat
When Do We Not Need Larger Vision Models?
Baifeng Shi, Ziyang Wu, Maolin Mao, Xin Wang, Trevor Darrell
ECCV, 2024
abstract / pdf / code
We find that smaller vision models (e.g., ViT-B or ViT-L) run on multiple image scales usually outperform larger models (e.g., ViT-H, ViT-G).
Humanoid Locomotion as Next Token Prediction
Ilija Radosavovic, Bike Zhang, Baifeng Shi, Jathushan Rajasegaran, Sarthak Kamat, Trevor Darrell, Koushil Sreenath, Jitendra Malik
NeurIPS, 2024
Spotlight
abstract / pdf / website
We formulate humanoid locomotion as a next token prediction problem. This enables learning to walk from in-the-wild data such as YouTube videos.
Robot Learning with Sensorimotor Pre-training
Ilija Radosavovic, Baifeng Shi, Letian Fu, Ken Goldberg, Trevor Darrell*, Jitendra Malik*
CoRL, 2023
Oral Presentation
abstract / pdf / website
We make imitation learning easier with masked autoencoder (MAE) pre-training on sensorimotor sequences.
TOAST: Transfer Learning via Attention Steering
Baifeng Shi, Siyu Gai, Trevor Darrell, Xin Wang
preprint, 2023
abstract / pdf / code / Zhihu
We find that previous transfer learning methods (e.g., fine-tuning, LoRA, prompt tuning) fail to focus the model's attention on features relevant to the downstream task. We show that refocusing the model's attention on task-relevant features via top-down attention substantially improves downstream performance.
Top-Down Visual Attention from Analysis by Synthesis
Baifeng Shi, Trevor Darrell, Xin Wang
CVPR, 2023
Conference highlight
website / abstract / pdf / code / Zhihu
We build ViTs with top-down attention, i.e., the ability to steer their attention to specific objects when given a prompt.
[Jun 2024] Scaling Up Visual Pre-Training: What’s Next?, AI Tea Talk Singapore
[Apr 2024] Scaling Up Visual Pre-Training: What’s Next?, VGG group, University of Oxford [slides]
[Mar 2024] Scaling Up Visual Pre-Training: What’s Next?, Prof. Yi Ma's group, UC Berkeley
[Oct 2023] Principles and Applications of Bottom-Up and Top-Down Visual Attention, Peking University [slides]
[Jun 2023] Principles and Applications of Bottom-Up and Top-Down Visual Attention, TechBeat