"A person walks forward, moving from left to right, making a single large step in the middle."
"A person walks straight ahead for a few steps, breaks into a running jump, lands and continues to walk."
"A person drops their arms and walks left to the chair to sit down."
"A person dances in a simple waltz box pattern."
"A person walks forward knocks on a door then turns around and walks away."
"A person swings the hands and warms up the left and right arms."
"A person does jumping jacks."
"A person appears to slam or throw something down with their left hand."
We qualitatively compare our method with MoMask, T2M-GPT, MLD, and MDM. Our approach generates motion
that follows the prompt more precisely. For example, in the first case, both MLD and MoMask fail to capture the
detail "breaks into a running jump". The second case tests the ability to handle long prompts: MDM and
MLD omit actions or add unnecessary turns, and MoMask and T2M-GPT fail to maintain the "walk straight"
instruction. In the third case, which involves less intense movement, our method again generates more
precise motion than the baselines.