Ego-centric Human Pretraining Improves In-Domain Performance: Compared to the specialist ACT baselines, generalist models (EgoVLA and EgoVLA-NoPretrain) perform substantially better on both short- and long-horizon tasks. This is likely because specialist models must simultaneously learn low-level manipulation and long-horizon planning from scratch, whereas generalist models leverage shared low-level skills across tasks.
Additionally, although EgoVLA is pretrained with a unified action space, it cannot be directly deployed for manipulation without further fine-tuning on a moderate amount of robot data. Future work may explore improving zero-shot transferability through more embodiment-agnostic pretraining.
In summary, the training flow is as follows:
We use action chunks of H = 30 (roughly one second of motion) and set the smoothing parameter to 0.8.
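The text does not spell out how the smoothing parameter combines overlapping action chunks. Below is a minimal sketch of one common interpretation, exponential temporal ensembling over overlapping chunks: each past chunk's proposal for the current step is down-weighted by `ALPHA ** age`. The function name, the weighting rule, and the chunk bookkeeping are illustrative assumptions, not the exact implementation used here.

```python
import numpy as np

H = 30       # action chunk horizon (~1 second of actions)
ALPHA = 0.8  # smoothing parameter (assumed exponential down-weighting per step of age)

def temporal_ensemble(chunks):
    """Combine overlapping action chunks into one smoothed action (illustrative).

    chunks: list of (age, actions) pairs, where `age` is how many control
    steps ago the chunk was predicted and `actions` has shape (H, action_dim).
    A chunk predicted `age` steps ago proposes actions[age] for the current
    step; older proposals receive exponentially smaller weight ALPHA**age.
    """
    weights, proposals = [], []
    for age, actions in chunks:
        if age < H:  # chunk still covers the current timestep
            weights.append(ALPHA ** age)
            proposals.append(actions[age])
    weights = np.array(weights)
    weights /= weights.sum()  # normalize weights to sum to 1
    # Weighted average of all valid proposals for the current step
    return (weights[:, None] * np.array(proposals)).sum(axis=0)
```

With ALPHA = 0.8, a chunk predicted one step ago contributes 80% as much as the newest chunk, so execution tracks fresh predictions while damping chunk-to-chunk jitter.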