
MolmoMotion: Language‑guided 3D motion forecasting plus 1.16M‑video dataset

AllenAI (via Hugging Face) releases MolmoMotion, a model that predicts future 3D point trajectories from an RGB frame, queried 3D points and a text action prompt, together with the MolmoMotion‑1M dataset and the PointMotionBench benchmark.

In detail

  • Model: MolmoMotion — inputs: RGB observation, set of 3D query points, action description; output: predicted future 3D point trajectories
  • MolmoMotion‑1M: 1.16 million videos with paired 3D point trajectories and action descriptions
  • PointMotionBench: human‑validated benchmark containing 2.7k video clips to measure object‑centric 3D motion forecasting accuracy
  • Code, model weights, data and a technical report are published (Hugging Face/GitHub/project page)
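As a rough illustration of the input/output contract described above, the sketch below models one forecasting request and its result as plain Python data classes. All names here (`MotionQuery`, `MotionForecast`, `constant_position_baseline`) are hypothetical and are not taken from the released code; the actual MolmoMotion API, tensor layouts and coordinate conventions may differ, so treat this as a mental model only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class MotionQuery:
    """Hypothetical container for one forecasting request (illustrative names)."""
    rgb_frame: List[List[Tuple[int, int, int]]]  # H x W pixels, each (R, G, B)
    query_points: List[Point3D]                  # N 3D points whose motion is requested
    action_text: str                             # e.g. "push the mug to the left"


@dataclass
class MotionForecast:
    """Hypothetical result: one future trajectory per query point."""
    trajectories: List[List[Point3D]]            # N trajectories, each with T positions


def constant_position_baseline(query: MotionQuery, horizon: int) -> MotionForecast:
    """Trivial stand-in predictor (not the model): every point stays put for `horizon` steps."""
    return MotionForecast([[p] * horizon for p in query.query_points])
```

A stand-in like `constant_position_baseline` can serve as a sanity-check lower bound when comparing a learned forecaster against "nothing moves".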

Why it matters

Predictive 3D motion models matter for robotics planning and controllable video generation; public models plus a large labeled dataset lower the barrier for applied R&D and standardized evaluation.

For you

Evaluate MolmoMotion on representative tasks (robot grasping, trajectory‑conditioned video) and use PointMotionBench to quantify improvements before changing live planners or generators.
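One simple way to quantify forecasting accuracy before and after a change is the average displacement error: the mean Euclidean distance between predicted and ground-truth 3D positions over all points and time steps. The summary does not say which metrics PointMotionBench reports, so the sketch below is an assumed, generic metric with a hypothetical function name, not the benchmark's official scoring.

```python
import math
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]
Trajectory = Sequence[Point3D]


def average_displacement_error(
    pred: Sequence[Trajectory], truth: Sequence[Trajectory]
) -> float:
    """Mean Euclidean distance between matching predicted and true 3D positions."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must contain the same number of trajectories")
    total, count = 0.0, 0
    for p_traj, t_traj in zip(pred, truth):
        if len(p_traj) != len(t_traj):
            raise ValueError("matching trajectories must have the same length")
        for p, t in zip(p_traj, t_traj):
            total += math.dist(p, t)  # Euclidean distance in 3D
            count += 1
    if count == 0:
        raise ValueError("no points to compare")
    return total / count
```

Computing the same number for the current planner or generator and for the candidate replacement, on the same clips, gives a like-for-like comparison; lower is better.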

