
feat: replace action tokenizer with windowed attention #16

Open
imitation-alpha wants to merge 1 commit into AlmondGod:main from imitation-alpha:feature/action-tokenizer-window-attention

Conversation

@imitation-alpha

Summary

This PR replaces the "mean pool + concat" mechanism in the LatentActionsEncoder with a "length-2 windowed attention + mean" mechanism. This change aims to better capture temporal dependencies between adjacent frames during action tokenization.

Changes

  • Modified models/latent_actions.py:
    • Imported SpatialAttention from models.st_transformer.
    • Updated LatentActionsEncoder to use SpatialAttention on concatenated windows of current and next frames.
    • Removed the old mean pooling and concatenation logic.
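The "length-2 windowed attention + mean" idea described above can be sketched roughly as follows. This is an illustrative stand-in, not the actual `models/latent_actions.py` code: the class name `WindowedActionPool` is hypothetical, and it uses plain `nn.MultiheadAttention` in place of the repo's `SpatialAttention`.

```python
import torch
import torch.nn as nn

class WindowedActionPool(nn.Module):
    """Illustrative sketch: attend over a concatenated (frame_t, frame_t+1)
    token window, then mean-pool to get one latent action per transition."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Stand-in for the repo's SpatialAttention from models.st_transformer.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, N, D) -- batch, time, spatial tokens, embed dim
        B, T, N, D = frames.shape
        cur, nxt = frames[:, :-1], frames[:, 1:]      # adjacent frame pairs
        window = torch.cat([cur, nxt], dim=2)          # (B, T-1, 2N, D)
        window = window.reshape(B * (T - 1), 2 * N, D)
        out, _ = self.attn(window, window, window)     # attention over the length-2 window
        out = out.mean(dim=1)                          # mean over window tokens
        return out.reshape(B, T - 1, D)                # one latent action per transition
```

Compared with "mean pool + concat", attending over the joint window lets tokens from the current and next frame interact before pooling, which is the temporal-dependency benefit the summary claims.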

Verification

  • Verified the implementation with a synthetic test script (scripts/verify_latent_actions.py - deleted after verification).
  • Confirmed that the model processes input frames and produces output actions with the correct dimensions.
  • Loss calculation works as expected.
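The kind of synthetic check the (now-deleted) `scripts/verify_latent_actions.py` performed might look like the sketch below. The encoder here is a hypothetical `StandInEncoder` built from `nn.Linear`, not the repo's `LatentActionsEncoder`; the dimensions and the dummy loss are illustrative.

```python
import torch
import torch.nn as nn

def verify_encoder(encoder: nn.Module, frame_dim: int, action_dim: int) -> None:
    frames = torch.randn(2, 6, frame_dim)       # (batch, time, features)
    actions = encoder(frames)                   # expect one action per transition
    assert actions.shape == (2, 5, action_dim), actions.shape
    loss = actions.pow(2).mean()                # dummy loss to exercise backward()
    loss.backward()                             # confirm gradients flow
    assert all(p.grad is not None for p in encoder.parameters())

class StandInEncoder(nn.Module):
    """Hypothetical minimal encoder: concat adjacent frames, project."""
    def __init__(self, frame_dim: int, action_dim: int):
        super().__init__()
        self.proj = nn.Linear(2 * frame_dim, action_dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        pairs = torch.cat([frames[:, :-1], frames[:, 1:]], dim=-1)
        return self.proj(pairs)

verify_encoder(StandInEncoder(16, 8), frame_dim=16, action_dim=8)
```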

Notes

  • This is a breaking change for LatentActionsEncoder checkpoints.

@imitation-alpha imitation-alpha force-pushed the feature/action-tokenizer-window-attention branch from 93ed906 to 05765b0 Compare November 29, 2025 06:09
@AlmondGod
Owner

this looks great! can you train a working world model to confirm the impact of the change?

@NewJerseyStyle

Sorry to interrupt. I am not an expert, but I am curious whether there are "KPIs" to monitor to evaluate the impact of a change?
For example:

  • How to confirm it does not get worse: monitor the number of steps needed to converge?
  • How to confirm it gets better: monitor the model's loss?

@AlmondGod
Owner

> Sorry to interrupt. I am not an expert, but I am curious whether there are "KPIs" to monitor to evaluate the impact of a change? For example:
>
>   • How to confirm it does not get worse: monitor the number of steps needed to converge?
>   • How to confirm it gets better: monitor the model's loss?

yes, I'll add a readme PR section specifying the necessary criteria
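The two KPIs discussed above (steps to converge, and final loss) could be computed from a training loss history with something like this sketch; the function name and threshold are illustrative, not from the repo:

```python
def convergence_kpis(loss_history, threshold=0.1):
    """Hypothetical helper: summarize a run's loss curve into the two KPIs
    discussed above -- steps to converge (does the change slow training?)
    and final loss (does it improve quality?)."""
    steps_to_converge = next(
        (i for i, loss in enumerate(loss_history) if loss < threshold), None
    )
    return {"steps_to_converge": steps_to_converge, "final_loss": loss_history[-1]}
```

Comparing these numbers between a baseline run and a run with the windowed-attention encoder would give a concrete before/after signal.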
