In detail
- MolmoMotion (model): takes an RGB observation, a set of 3D query points, and an action description as input; outputs predicted future 3D point trajectories
- MolmoMotion‑1M: 1.16 million videos with paired 3D point trajectories and action descriptions
- PointMotionBench: human‑validated benchmark containing 2.7k video clips to measure object‑centric 3D motion forecasting accuracy
- Code, model weights, data and a technical report are published (Hugging Face/GitHub/project page)
Why it matters
Predictive 3D motion models are useful for robotics planning and controllable video generation. Public models plus a large labeled dataset lower the barrier to applied R&D and make standardized evaluation easier.
For you
Evaluate MolmoMotion on representative tasks (robot grasping, trajectory‑conditioned video) and use PointMotionBench to quantify improvements before changing live planners or generators.
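
To make "quantify improvements" concrete, here is a minimal sketch of how predicted and ground-truth 3D point trajectories could be compared. It uses average and final displacement error (ADE/FDE), which are common trajectory-forecasting metrics; the summary above does not say which metric PointMotionBench reports, so treat this as an assumption. The function name `displacement_errors` and the nested-list trajectory layout are illustrative and are not taken from the MolmoMotion code.

```python
import math
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]
# One trajectory: positions of a single query point over the future time steps.
Trajectory = Sequence[Point3D]


def displacement_errors(
    pred: Sequence[Trajectory], truth: Sequence[Trajectory]
) -> Tuple[float, float]:
    """Return (ADE, FDE) averaged over all query points.

    ADE: mean Euclidean distance over every future step.
    FDE: Euclidean distance at the last future step.
    Assumed metric choice; not confirmed as PointMotionBench's own.
    """
    if not pred or len(pred) != len(truth):
        raise ValueError("pred and truth need the same non-zero number of points")

    ade_sum = 0.0
    fde_sum = 0.0
    for pred_traj, true_traj in zip(pred, truth):
        if not pred_traj or len(pred_traj) != len(true_traj):
            raise ValueError("each trajectory pair needs the same non-zero length")
        # Per-step 3D distance between prediction and ground truth.
        dists = [math.dist(p, t) for p, t in zip(pred_traj, true_traj)]
        ade_sum += sum(dists) / len(dists)
        fde_sum += dists[-1]

    n_points = len(pred)
    return ade_sum / n_points, fde_sum / n_points


# Toy data: one query point tracked over two future steps.
truth = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
pred = [[(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]]
ade, fde = displacement_errors(pred, truth)
print(ade, fde)  # 1.0 2.0
```

Comparing these numbers for a baseline and a candidate model on the same clips gives a simple before/after check; the benchmark's official evaluation script, if one is provided, should take precedence.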