BaseReward¶

class grid2op.Reward.BaseReward[source]¶
Base class from which all rewards used in the Grid2Op framework should derive.

In reinforcement learning, a reward is a signal sent by the grid2op.Environment to the grid2op.BaseAgent, indicating how well this agent performs. One of the goals of Reinforcement Learning is to maximize the (discounted) sum of (expected) rewards over time.
reward_min¶
The minimum reward a grid2op.BaseAgent can get by performing the worst possible grid2op.BaseAction in the worst possible scenario.

Type: float
reward_max¶
The maximum reward a grid2op.BaseAgent can get by performing the best possible grid2op.BaseAction in the best possible scenario.

Type: float
abstractmethod __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]¶
Method called to compute the reward.

Parameters:
    action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent.
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.
    has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception thrown when the action has been implemented in the environment.
    is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)?
    is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception? In this case it has been overridden by "do nothing" by the environment.
    is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception? In this case it has been overridden by "do nothing" by the environment.

Returns:
    res – The reward associated with the input parameters.

Return type:
    float
abstractmethod __init__()[source]¶
Initializes BaseReward.reward_min and BaseReward.reward_max.
get_range()[source]¶
Shorthand to retrieve both the minimum and maximum possible rewards in one command.

It is not recommended to override this function.

Returns:
    reward_min (float) – The minimum reward, see BaseReward.reward_min.
    reward_max (float) – The maximum reward, see BaseReward.reward_max.
initialize(env)[source]¶
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

Parameters:
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.

Return type:
    None
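To make the interface above concrete, here is a minimal sketch of a custom reward deriving from BaseReward. It only relies on the attributes and methods documented in this class; the class name and the chosen bounds are illustrative, not part of grid2op.

```python
from grid2op.Reward import BaseReward


class StepSurvivedReward(BaseReward):
    """Illustrative reward: reward_max for every step survived, reward_min otherwise."""

    def __init__(self):
        BaseReward.__init__(self)
        # Bounds advertised through BaseReward.reward_min / BaseReward.reward_max
        self.reward_min = 0.0
        self.reward_max = 1.0

    def initialize(self, env):
        # Nothing environment-dependent is needed here; attributes that require a
        # valid environment (e.g. the number of power lines) would be set in this method.
        pass

    def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
        if has_error or is_illegal or is_ambiguous:
            # The action failed or was replaced by "do nothing": lowest reward.
            return self.reward_min
        return self.reward_max
```

The class itself (not an instance) is what gets handed to the environment, for instance through the RewardHelper documented further down this page.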
class grid2op.Reward.ConstantReward[source]¶
Most basic implementation of reward: everything has the same value.

Note that this BaseReward subtype is not useful at all, neither to train a BaseAgent nor to assess its performance.
__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]¶
Method called to compute the reward.

Parameters:
    action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent.
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.
    has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception thrown when the action has been implemented in the environment.
    is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)?
    is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception? In this case it has been overridden by "do nothing" by the environment.
    is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception? In this case it has been overridden by "do nothing" by the environment.

Returns:
    res – The reward associated with the input parameters.

Return type:
    float
__init__()[source]¶
Initializes BaseReward.reward_min and BaseReward.reward_max.
class grid2op.Reward.EconomicReward[source]¶
This reward computes the marginal cost of the power grid. Since RL is about maximising a reward while we want to minimize the cost, this class also ensures that:

- the reward is positive if there is no game over, no error, etc.
- the reward is inversely proportional to the cost of the grid (the higher the reward, the lower the economic cost).
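A hedged usage sketch follows. The dataset name and the reward_class keyword of grid2op.make are assumptions about the installed grid2op version and available data; adapt them to your setup.

```python
import grid2op
from grid2op.Reward import EconomicReward

# Dataset name and the reward_class keyword are assumptions about the local
# installation; adapt them to your grid2op version and available datasets.
env = grid2op.make("l2rpn_case14_sandbox", reward_class=EconomicReward)

obs = env.reset()
do_nothing = env.action_space({})  # an empty dictionary builds a "do nothing" action
obs, reward, done, info = env.step(do_nothing)

# Higher reward <=> lower economic cost; the minimum reward on game over / error.
print(reward)
```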
__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]¶
Method called to compute the reward.

Parameters:
    action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent.
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.
    has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception thrown when the action has been implemented in the environment.
    is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)?
    is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception? In this case it has been overridden by "do nothing" by the environment.
    is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception? In this case it has been overridden by "do nothing" by the environment.

Returns:
    res – The reward associated with the input parameters.

Return type:
    float
__init__()[source]¶
Initializes BaseReward.reward_min and BaseReward.reward_max.
initialize(env)[source]¶
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

Parameters:
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.

Return type:
    None
class grid2op.Reward.FlatReward(per_timestep=1)[source]¶
This reward returns a fixed number (if there is no error) or 0 if there is an error.
__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]¶
Method called to compute the reward.

Parameters:
    action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent.
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.
    has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception thrown when the action has been implemented in the environment.
    is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)?
    is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception? In this case it has been overridden by "do nothing" by the environment.
    is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception? In this case it has been overridden by "do nothing" by the environment.

Returns:
    res – The reward associated with the input parameters.

Return type:
    float
__init__(per_timestep=1)[source]¶
Initializes BaseReward.reward_min and BaseReward.reward_max.
class grid2op.Reward.IncreasingFlatReward(per_timestep=1)[source]¶
This reward just counts the number of time steps the agent has successfully managed to perform.

It adds a constant reward for each time step successfully handled.
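To make the difference with FlatReward concrete, here is an illustrative re-implementation of the behaviour described above. It is a sketch based only on this description, not on the actual grid2op source; the class name and bound choices are hypothetical. FlatReward returns the same per_timestep value at every surviving step, while IncreasingFlatReward returns per_timestep times the number of steps survived so far.

```python
from grid2op.Reward import BaseReward


class IncreasingFlatSketch(BaseReward):
    """Sketch of the behaviour described above; names and details are illustrative."""

    def __init__(self, per_timestep=1):
        BaseReward.__init__(self)
        self.per_timestep = per_timestep
        self.nb_step_ok = 0             # number of time steps handled so far
        self.reward_min = 0
        self.reward_max = float("inf")  # grows with the length of the episode

    def initialize(self, env):
        self.nb_step_ok = 0             # restart the count for a new environment

    def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
        if has_error:
            return self.reward_min
        self.nb_step_ok += 1
        # FlatReward would return self.per_timestep here, regardless of nb_step_ok.
        return self.per_timestep * self.nb_step_ok
```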
__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]¶
Method called to compute the reward.

Parameters:
    action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent.
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.
    has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception thrown when the action has been implemented in the environment.
    is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)?
    is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception? In this case it has been overridden by "do nothing" by the environment.
    is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception? In this case it has been overridden by "do nothing" by the environment.

Returns:
    res – The reward associated with the input parameters.

Return type:
    float
__init__(per_timestep=1)[source]¶
Initializes BaseReward.reward_min and BaseReward.reward_max.
initialize(env)[source]¶
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

Parameters:
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.

Return type:
    None
class grid2op.Reward.L2RPNReward[source]¶
This is the historical BaseReward used for the Learning To Run a Power Network competition.

See L2RPN for more information.
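If the installed grid2op version supports them, this reward can also be monitored alongside whichever reward the environment is trained with. The other_rewards keyword of grid2op.make, the info["rewards"] entry and the dataset name used below are all assumptions about the installed version and may not exist in older releases.

```python
import grid2op
from grid2op.Reward import L2RPNReward, FlatReward

# Dataset name, the reward_class / other_rewards keywords and the info["rewards"]
# entry are assumptions about the installed grid2op version.
env = grid2op.make(
    "l2rpn_case14_sandbox",
    reward_class=FlatReward,               # reward actually returned by env.step
    other_rewards={"l2rpn": L2RPNReward},  # extra rewards, only reported in `info`
)

obs = env.reset()
obs, reward, done, info = env.step(env.action_space({}))
print(reward)                    # FlatReward value
print(info["rewards"]["l2rpn"])  # L2RPNReward value for the same step
```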
__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]¶
Method called to compute the reward.

Parameters:
    action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent.
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.
    has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception thrown when the action has been implemented in the environment.
    is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)?
    is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception? In this case it has been overridden by "do nothing" by the environment.
    is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception? In this case it has been overridden by "do nothing" by the environment.

Returns:
    res – The reward associated with the input parameters.

Return type:
    float
__init__()[source]¶
Initializes BaseReward.reward_min and BaseReward.reward_max.
initialize(env)[source]¶
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

Parameters:
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.

Return type:
    None
class grid2op.Reward.RedispReward(alpha_redisph=5.0)[source]¶
This reward can be used for environments where redispatching is available. It assigns a cost to redispatching actions and additionally penalizes the losses on the grid.
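Since the environment expects a reward type rather than an instance (see RewardHelper.rewardClass below), one way to use a non-default alpha_redisph is to wrap it in a small subclass. This is only a sketch; the wrapper class name is hypothetical.

```python
from grid2op.Reward import RedispReward


class MyRedispReward(RedispReward):
    """Hypothetical wrapper fixing a non-default alpha_redisph, so that the
    class itself (and not an instance) can be handed to the environment."""

    def __init__(self):
        RedispReward.__init__(self, alpha_redisph=10.0)
```

MyRedispReward can then be passed wherever a reward class is expected, for example as the rewardClass of the RewardHelper documented below.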
__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]¶
Method called to compute the reward.

Parameters:
    action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent.
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.
    has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception thrown when the action has been implemented in the environment.
    is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)?
    is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception? In this case it has been overridden by "do nothing" by the environment.
    is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception? In this case it has been overridden by "do nothing" by the environment.

Returns:
    res – The reward associated with the input parameters.

Return type:
    float
__init__(alpha_redisph=5.0)[source]¶
Initializes BaseReward.reward_min and BaseReward.reward_max.
initialize(env)[source]¶
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

Parameters:
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.

Return type:
    None
class grid2op.Reward.RewardHelper(rewardClass=<class 'grid2op.Reward.ConstantReward.ConstantReward'>)[source]¶
This class aims at making the handling of reward classes more automatic for the grid2op.Environment.

It is not recommended to derive from or modify this class. If a different reward needs to be used, it is recommended to build another object of this class and change the RewardHelper.rewardClass attribute.
rewardClass¶
Type of reward that will be used by this helper. Note that the type (and not an instance / object of that type) must be given here. It defaults to ConstantReward.

Type: type
template_reward¶
An object of class RewardHelper.rewardClass used to compute the rewards.
__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]¶
Gives the reward that follows the execution of the grid2op.BaseAction.BaseAction action in the grid2op.Environment.Environment env.

Parameters:
    action (grid2op.Action.Action) – The action performed by the BaseAgent.
    env (grid2op.Environment.Environment) – The current environment.
    has_error (bool) – Did the action cause an error, such as a diverging power flow (True: the action caused an error)?
    is_done (bool) – Is the game over (True: the game is over)?
    is_illegal (bool) – Is the action legal or not (True: the action was illegal)? See grid2op.Exceptions.IllegalAction for more information.
    is_ambiguous (bool) – Is the action ambiguous or not (True: the action was ambiguous)? See grid2op.Exceptions.AmbiguousAction for more information.
__init__(rewardClass=<class 'grid2op.Reward.ConstantReward.ConstantReward'>)[source]¶
Initialize self. See help(type(self)) for accurate signature.
__weakref__¶
List of weak references to the object (if defined).
initialize(env)[source]¶
This function initializes the template_reward with the environment. It is used especially for using RewardHelper.range().

Parameters:
    env (grid2op.Environment.Environment) – The currently used environment.
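A hedged end-to-end sketch of the helper follows, using only the names documented on this page. The dataset name is an assumption about the locally available data and, in practice, the environment manages its own RewardHelper internally.

```python
import grid2op
from grid2op.Reward import RewardHelper, FlatReward

env = grid2op.make("l2rpn_case14_sandbox")     # dataset name is an assumption

helper = RewardHelper(rewardClass=FlatReward)  # a reward type, not an instance
helper.initialize(env)                         # sets up template_reward for this env

do_nothing = env.action_space({})
# Arguments follow the __call__ signature documented above.
reward = helper(do_nothing, env, has_error=False, is_done=False,
                is_illegal=False, is_ambiguous=False)
print(reward, helper.range())
```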