logits = self.lm_head(outputs.hidden_states[early_exit_layer])
I think you should apply the model.norm layer to hidden_states[early_exit_layer], because only the last hidden state has model.norm applied. See:
hidden_states = self.norm(hidden_states)
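For concreteness, a minimal sketch of the suggested change, assuming a LLaMA-style HuggingFace setup where the final RMSNorm is reachable as self.model.norm (the exact attribute path may differ in this repo):

```python
# Sketch only: `self.model.norm` is an assumption for wherever the final
# RMSNorm lives; adjust to the actual attribute used in this codebase.
hidden = outputs.hidden_states[early_exit_layer]  # intermediate-layer hidden states
hidden = self.model.norm(hidden)                  # apply the final norm, as is done for the last layer
logits = self.lm_head(hidden)                     # project the normed states to vocabulary logits
```

This mirrors what happens for the last layer, where the norm is applied before lm_head, so the early-exit logits are computed on the same scale.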