Reward Kernel Formulation - incorrect in docs #737

@hung311204

Description

This was discussed and confirmed in [https://discord.com/channels/698080905209577513/702060196222205962/1468594043851374604]

Assume the grid has only one redispatchable generator, with a maximum ramp up/down of +5/-5. According to the docs, the action space is then [-5, 5]: [https://grid2op.readthedocs.io/en/latest/mdp.html#modeling-sequential-decisions]

If the agent outputs an action like [6.0] (which can happen if the agent is a neural network), the way the reward kernel behaves does not match the MDP notation.

The reward kernel will process the action, and a "do-nothing" action is performed instead.
However, some reward functions available in Grid2Op can still give a "-1.0" reward signal because the agent asked for an illegal action.

In the notation of the reward kernel, is the action "a":

  • Case A: the illegal, out-of-action-space action [6.0], or
  • Case B: the "do-nothing" action (which replaces [6.0])?

I see a contradiction here:
If it's Case A, the reward kernel is processing an action (or action vector) that doesn't belong to the action space of the environment (it is outside [-5, 5]). Is that acceptable?
If it's Case B, it doesn't make sense either, because the kernel now returns -1.0 for a do-nothing action, which would not happen if the agent had chosen the real do-nothing action from the start (it would not be treated as illegal, hence no -1.0). So does that mean something outside the reward kernel decides the -1.0 penalty for illegal actions?
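
To make the contradiction concrete, here is a minimal toy sketch in plain Python. It does not use Grid2Op: `ToyEnv`, `MAX_RAMP`, `DO_NOTHING` and the 0.0 reward for legal actions are hypothetical stand-ins I made up to mimic the behaviour described above.

```python
from dataclasses import dataclass

MAX_RAMP = 5.0    # assumed ramp limit, so the action space is [-5, 5]
DO_NOTHING = 0.0  # assumed encoding of the do-nothing action


@dataclass
class ToyEnv:
    """Toy stand-in for the environment (hypothetical, not Grid2Op code)."""
    setpoint: float = 0.0

    def step(self, requested: float):
        is_illegal = abs(requested) > MAX_RAMP
        # An illegal request is replaced by do-nothing before the transition.
        executed = DO_NOTHING if is_illegal else requested
        self.setpoint += executed
        # The penalty depends on the request, not on the executed action.
        reward = -1.0 if is_illegal else 0.0
        return self.setpoint, executed, reward


state_a, executed_a, reward_a = ToyEnv().step(6.0)         # out of range
state_b, executed_b, reward_b = ToyEnv().step(DO_NOTHING)  # real do-nothing

print(executed_a == executed_b, state_a == state_b)  # True True
print(reward_a, reward_b)                            # -1.0 0.0
```

Both runs execute the same action and reach the same state, yet the rewards differ, so in this toy the reward cannot be a function of the state and the executed action alone.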

Possible solution

The reward kernel should be a function that also takes some flags from the environment (such as "is_ambiguous", "is_illegal", etc.), which are not part of the current formulation.

So we might have:

final_action, is_ambiguous, is_illegal = translate_action(action_vector_from_agent)

RewardKernel(s, final_action, is_ambiguous, is_illegal)

When the requested action is illegal or ambiguous, final_action would usually be the do-nothing action.
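
A rough runnable sketch of that idea, again in plain Python and not tied to the Grid2Op API. The bodies of `translate_action` and `reward_kernel`, the `Translated` tuple, the "wrong vector length means ambiguous" rule and the reward formula for legal actions are all assumptions for illustration only.

```python
from typing import NamedTuple, Sequence

MAX_RAMP = 5.0    # assumed ramp limit, so the action space is [-5, 5]
DO_NOTHING = 0.0  # assumed encoding of the do-nothing action


class Translated(NamedTuple):
    final_action: float
    is_ambiguous: bool
    is_illegal: bool


def translate_action(action_vector_from_agent: Sequence[float]) -> Translated:
    """Map the raw agent output to an in-space action plus flags."""
    # Hypothetical rule: a vector of the wrong length counts as ambiguous.
    if len(action_vector_from_agent) != 1:
        return Translated(DO_NOTHING, True, False)
    value = float(action_vector_from_agent[0])
    is_illegal = abs(value) > MAX_RAMP
    # Illegal requests fall back to do-nothing, but the flag is kept.
    return Translated(DO_NOTHING if is_illegal else value, False, is_illegal)


def reward_kernel(s, final_action, is_ambiguous, is_illegal) -> float:
    """Reward that sees the flags explicitly (this toy ignores the state s)."""
    if is_ambiguous or is_illegal:
        return -1.0
    # Hypothetical reward for legal actions: smaller redispatch is better.
    return 1.0 - abs(final_action) / MAX_RAMP


s = None  # placeholder state, unused by this toy reward
for raw in ([6.0], [0.0], [2.5]):
    final_action, is_ambiguous, is_illegal = translate_action(raw)
    print(raw, final_action, reward_kernel(s, final_action, is_ambiguous, is_illegal))
```

With the flags as explicit inputs, [6.0] and [0.0] both map to the do-nothing action but can receive different rewards without the kernel ever being evaluated on an out-of-space action.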

Metadata

Labels: documentation (Improvements or additions to documentation)
