Reward Kernel Formulation - incorrect in docs #737

This was discussed and confirmed in [https://discord.com/channels/698080905209577513/702060196222205962/1468594043851374604]


Assume the grid has only one redispatchable generator with a maximum ramp up/down of +5/-5. According to the docs, the action space is then [-5, 5]   [https://grid2op.readthedocs.io/en/latest/mdp.html#modeling-sequential-decisions]

Suppose the agent outputs an action like [6.0] (this can happen if the agent is a neural network). In that case, the way the reward kernel behaves, in MDP terms, doesn't match the notation.

The action is processed and the "do-nothing action" is performed instead.
But some reward functions available in Grid2Op can still give a "-1.0" reward signal, because the agent asked for an illegal action.
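
To make the mismatch concrete, here is a minimal toy sketch. It is not Grid2Op code: `toy_step`, `StepResult`, the `0.0` encoding of do-nothing, and the placeholder reward of `0.0` for legal actions are all assumptions made for illustration. It only mimics the behavior described above, where an illegal request is replaced by do-nothing but still penalized.

```python
from dataclasses import dataclass

RAMP_MIN, RAMP_MAX = -5.0, 5.0  # action space [-5, 5] from the example above
DO_NOTHING = 0.0                # assumed encoding of the do-nothing action


@dataclass
class StepResult:
    applied_action: float  # what the grid actually receives
    reward: float          # what the agent observes


def toy_step(requested: float) -> StepResult:
    """Hypothetical stand-in for one environment step (not Grid2Op code)."""
    is_illegal = not (RAMP_MIN <= requested <= RAMP_MAX)
    # An illegal request is replaced by the do-nothing action
    applied = DO_NOTHING if is_illegal else requested
    # Toy reward: -1.0 for an illegal request, placeholder 0.0 otherwise
    reward = -1.0 if is_illegal else 0.0
    return StepResult(applied, reward)


illegal = toy_step(6.0)  # out-of-range request
genuine = toy_step(0.0)  # real do-nothing request
print(illegal)
print(genuine)
```

Both calls apply the same action to the grid, yet the rewards differ, so in this toy model the reward cannot be a function of the state and the applied action alone.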

In the reward kernel notation, which one is the action "a"?
- Case A: the illegal, out-of-action-space [6.0],
- Case B: the "do-nothing action" (which replaces the [6.0]).

I see a contradiction here:
- If it's Case A, the reward kernel is processing an action (or action vector) that doesn't belong to the action space of the environment (outside [-5, 5]). Is that acceptable?
- If it's Case B, it doesn't make sense either, because the kernel now returns -1.0 for a do-nothing action. That would not happen if the agent had sent the real do-nothing action in the first place (it would not be treated as illegal, hence no -1.0). So something outside the reward kernel must be deciding the -1.0 illegal penalty?
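
The two cases can be written out as follows. The symbols are assumptions chosen for this sketch and may not match the docs exactly: $\mathcal{S}$ and $\mathcal{A} = [-5, 5]$ are the state and action spaces, $R$ is the reward kernel, $\tilde{a}_t = 6.0$ is the agent's request, and $a^{\varnothing}$ is the do-nothing action.

```latex
% Case A: the kernel is evaluated on the raw request, outside its domain
r_t = R(s_t, \tilde{a}_t), \qquad \tilde{a}_t = 6.0 \notin \mathcal{A}
\quad\Rightarrow\quad (s_t, \tilde{a}_t) \notin \mathcal{S} \times \mathcal{A}

% Case B: the kernel is evaluated on the substituted do-nothing action
r_t = R(s_t, a^{\varnothing}) = -1
\quad\text{(illegal request)}, \qquad
R(s_t, a^{\varnothing}) \neq -1
\quad\text{(genuine do-nothing)}
```

In Case B the same pair $(s_t, a^{\varnothing})$ would have to map to two different rewards, so $R$ cannot be a function of $(s_t, a_t)$ alone.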

## Possible solution
The reward kernel should be a function that also takes some flags from the environment (like "is_ambiguous", "is_illegal", etc.), which are not in the current formulation.

So we might have:

`final_action, is_ambiguous, is_illegal = translate_action(action_vector_from_agent)`

`RewardKernel(s, final_action, is_ambiguous, is_illegal)`

For an illegal or ambiguous request, `final_action` is usually the do-nothing action.
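
A minimal runnable sketch of this proposal, assuming the single-generator example with action space [-5, 5]. `translate_action` and `reward_kernel` are the hypothetical names from the pseudocode, not existing Grid2Op functions. The ambiguity rule (wrong vector length) and the placeholder reward for legal actions are also assumptions made only for illustration.

```python
from typing import List, NamedTuple

RAMP_MIN, RAMP_MAX = -5.0, 5.0  # action space [-5, 5] from the example
DO_NOTHING = [0.0]              # assumed encoding of the do-nothing action


class Translated(NamedTuple):
    final_action: List[float]
    is_ambiguous: bool
    is_illegal: bool


def translate_action(action_vector_from_agent: List[float]) -> Translated:
    """Map the raw agent output to an in-space action plus status flags."""
    # Assumed rule: a vector of the wrong length is ambiguous
    is_ambiguous = len(action_vector_from_agent) != 1
    # A well-formed vector is illegal if it exceeds the ramp limits
    is_illegal = (not is_ambiguous) and not (
        RAMP_MIN <= action_vector_from_agent[0] <= RAMP_MAX
    )
    if is_ambiguous or is_illegal:
        return Translated(list(DO_NOTHING), is_ambiguous, is_illegal)
    return Translated(list(action_vector_from_agent), False, False)


def reward_kernel(state, final_action, is_ambiguous, is_illegal) -> float:
    """Reward as an explicit function of the action and the flags."""
    if is_ambiguous or is_illegal:
        return -1.0
    # Placeholder reward (state is unused here): prefer small redispatch
    return 1.0 - abs(final_action[0]) / RAMP_MAX


for raw in ([6.0], [0.0], [2.5]):
    result = translate_action(raw)
    print(raw, result, reward_kernel(None, *result))
```

With the flags as explicit inputs, the illegal [6.0] and the genuine [0.0] both yield the do-nothing `final_action`, but the kernel can still tell them apart without evaluating anything outside the action space.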
