I ran into an issue while working with another STE project, DiffSTU.
If the resolution of the masked word is high and the target word (the word I want to generate) has the same length as the source word, everything is OK. But when the target word is longer or shorter than the source word (mainly longer), I only see artifacts and poor generation quality.
Have you seen this behavior in your model, or can it handle such cases somehow?
Thanks.