BaseReward¶

class grid2op.Reward.BaseReward[source]¶
Base class from which all rewards used in the Grid2Op framework should derive.

In reinforcement learning, a reward is a signal sent by the grid2op.Environment to the grid2op.BaseAgent, indicating how well this agent performs. One of the goals of Reinforcement Learning is to maximize the (discounted) sum of (expected) rewards over time.
reward_min¶
The minimum reward a grid2op.BaseAgent can get by performing the worst possible grid2op.BaseAction in the worst possible scenario.

Type: float
reward_max¶
The maximum reward a grid2op.BaseAgent can get by performing the best possible grid2op.BaseAction in the best possible scenario.

Type: float
abstractmethod __call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]¶
Method called to compute the reward.

Parameters:
    action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent.
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.
    has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception thrown when the action has been implemented in the environment.
    is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)?
    is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception? In this case it has been overridden by "do nothing" by the environment.
    is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception? In this case it has been overridden by "do nothing" by the environment.

Returns:
    res – The reward associated with the input parameters.

Return type:
    float
abstractmethod __init__()[source]¶
Initializes BaseReward.reward_min and BaseReward.reward_max.
get_range()[source]¶
Shorthand to retrieve both the minimum and maximum possible rewards in one command.

It is not recommended to override this function.

Returns:
    reward_min (float) – The minimum reward, see BaseReward.reward_min.
    reward_max (float) – The maximum reward, see BaseReward.reward_max.
initialize(env)[source]¶
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

Parameters:
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.

Return type:
    None
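To make the interface above concrete, here is a minimal sketch of a custom reward deriving from BaseReward. It only relies on the attributes and methods documented in this class; the class name and the chosen bounds are illustrative, not part of grid2op.

```python
from grid2op.Reward import BaseReward


class StepSurvivedReward(BaseReward):
    """Illustrative reward: reward_max for every step survived, reward_min otherwise."""

    def __init__(self):
        BaseReward.__init__(self)
        # Bounds advertised through BaseReward.reward_min / BaseReward.reward_max
        self.reward_min = 0.0
        self.reward_max = 1.0

    def initialize(self, env):
        # Nothing environment-dependent is needed here; attributes that require a
        # valid environment (e.g. the number of power lines) would be set in this method.
        pass

    def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
        if has_error or is_illegal or is_ambiguous:
            # The action failed or was replaced by "do nothing": lowest reward.
            return self.reward_min
        return self.reward_max
```

The class itself (not an instance) is what gets handed to the environment, for instance through the RewardHelper documented further down this page.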
class grid2op.Reward.ConstantReward[source]¶
Most basic implementation of reward: everything has the same value.

Note that this BaseReward subtype is not useful at all, neither to train a BaseAgent nor to assess its performance.
__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]¶
Method called to compute the reward.

Parameters:
    action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent.
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.
    has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception thrown when the action has been implemented in the environment.
    is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)?
    is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception? In this case it has been overridden by "do nothing" by the environment.
    is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception? In this case it has been overridden by "do nothing" by the environment.

Returns:
    res – The reward associated with the input parameters.

Return type:
    float
__init__()[source]¶
Initializes BaseReward.reward_min and BaseReward.reward_max.
class grid2op.Reward.EconomicReward[source]¶
This reward computes the marginal cost of the power grid. Since RL is about maximising a reward while we want to minimize the cost, this class also ensures that:

- the reward is positive if there is no game over, no error, etc.
- the reward is inversely proportional to the cost of the grid (the higher the reward, the lower the economic cost).
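A hedged usage sketch follows. The dataset name and the reward_class keyword of grid2op.make are assumptions about the installed grid2op version and available data; adapt them to your setup.

```python
import grid2op
from grid2op.Reward import EconomicReward

# Dataset name and the reward_class keyword are assumptions about the local
# installation; adapt them to your grid2op version and available datasets.
env = grid2op.make("l2rpn_case14_sandbox", reward_class=EconomicReward)

obs = env.reset()
do_nothing = env.action_space({})  # an empty dictionary builds a "do nothing" action
obs, reward, done, info = env.step(do_nothing)

# Higher reward <=> lower economic cost; the minimum reward on game over / error.
print(reward)
```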
__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]¶
Method called to compute the reward.

Parameters:
    action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent.
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.
    has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception thrown when the action has been implemented in the environment.
    is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)?
    is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception? In this case it has been overridden by "do nothing" by the environment.
    is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception? In this case it has been overridden by "do nothing" by the environment.

Returns:
    res – The reward associated with the input parameters.

Return type:
    float
__init__()[source]¶
Initializes BaseReward.reward_min and BaseReward.reward_max.
initialize(env)[source]¶
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

Parameters:
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.

Return type:
    None
class grid2op.Reward.FlatReward(per_timestep=1)[source]¶
This reward returns a fixed number (if there is no error) or 0 if there is an error.
__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]¶
Method called to compute the reward.

Parameters:
    action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent.
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.
    has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception thrown when the action has been implemented in the environment.
    is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)?
    is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception? In this case it has been overridden by "do nothing" by the environment.
    is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception? In this case it has been overridden by "do nothing" by the environment.

Returns:
    res – The reward associated with the input parameters.

Return type:
    float
__init__(per_timestep=1)[source]¶
Initializes BaseReward.reward_min and BaseReward.reward_max.
class grid2op.Reward.IncreasingFlatReward(per_timestep=1)[source]¶
This reward just counts the number of time steps the agent has successfully managed to perform.

It adds a constant reward for each time step successfully handled.
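To make the difference with FlatReward concrete, here is an illustrative re-implementation of the behaviour described above. It is a sketch based only on this description, not on the actual grid2op source; the class name and bound choices are hypothetical. FlatReward returns the same per_timestep value at every surviving step, while IncreasingFlatReward returns per_timestep times the number of steps survived so far.

```python
from grid2op.Reward import BaseReward


class IncreasingFlatSketch(BaseReward):
    """Sketch of the behaviour described above; names and details are illustrative."""

    def __init__(self, per_timestep=1):
        BaseReward.__init__(self)
        self.per_timestep = per_timestep
        self.nb_step_ok = 0             # number of time steps handled so far
        self.reward_min = 0
        self.reward_max = float("inf")  # grows with the length of the episode

    def initialize(self, env):
        self.nb_step_ok = 0             # restart the count for a new environment

    def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
        if has_error:
            return self.reward_min
        self.nb_step_ok += 1
        # FlatReward would return self.per_timestep here, regardless of nb_step_ok.
        return self.per_timestep * self.nb_step_ok
```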
__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]¶
Method called to compute the reward.

Parameters:
    action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent.
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.
    has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception thrown when the action has been implemented in the environment.
    is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)?
    is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception? In this case it has been overridden by "do nothing" by the environment.
    is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception? In this case it has been overridden by "do nothing" by the environment.

Returns:
    res – The reward associated with the input parameters.

Return type:
    float
__init__(per_timestep=1)[source]¶
Initializes BaseReward.reward_min and BaseReward.reward_max.
initialize(env)[source]¶
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

Parameters:
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.

Return type:
    None
class grid2op.Reward.L2RPNReward[source]¶
This is the historical BaseReward used for the Learning To Run a Power Network competition.

See L2RPN for more information.
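If the installed grid2op version supports them, this reward can also be monitored alongside whichever reward the environment is trained with. The other_rewards keyword of grid2op.make, the info["rewards"] entry and the dataset name used below are all assumptions about the installed version and may not exist in older releases.

```python
import grid2op
from grid2op.Reward import L2RPNReward, FlatReward

# Dataset name, the reward_class / other_rewards keywords and the info["rewards"]
# entry are assumptions about the installed grid2op version.
env = grid2op.make(
    "l2rpn_case14_sandbox",
    reward_class=FlatReward,               # reward actually returned by env.step
    other_rewards={"l2rpn": L2RPNReward},  # extra rewards, only reported in `info`
)

obs = env.reset()
obs, reward, done, info = env.step(env.action_space({}))
print(reward)                    # FlatReward value
print(info["rewards"]["l2rpn"])  # L2RPNReward value for the same step
```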
__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]¶
Method called to compute the reward.

Parameters:
    action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent.
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.
    has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception thrown when the action has been implemented in the environment.
    is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)?
    is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception? In this case it has been overridden by "do nothing" by the environment.
    is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception? In this case it has been overridden by "do nothing" by the environment.

Returns:
    res – The reward associated with the input parameters.

Return type:
    float
__init__()[source]¶
Initializes BaseReward.reward_min and BaseReward.reward_max.
initialize(env)[source]¶
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

Parameters:
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.

Return type:
    None
class grid2op.Reward.RedispReward(alpha_redisph=5.0)[source]¶
This reward can be used for environments where redispatching is available. It assigns a cost to redispatching actions and additionally penalizes the losses on the grid.
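Since the environment expects a reward type rather than an instance (see RewardHelper.rewardClass below), one way to use a non-default alpha_redisph is to wrap it in a small subclass. This is only a sketch; the wrapper class name is hypothetical.

```python
from grid2op.Reward import RedispReward


class MyRedispReward(RedispReward):
    """Hypothetical wrapper fixing a non-default alpha_redisph, so that the
    class itself (and not an instance) can be handed to the environment."""

    def __init__(self):
        RedispReward.__init__(self, alpha_redisph=10.0)
```

MyRedispReward can then be passed wherever a reward class is expected, for example as the rewardClass of the RewardHelper documented below.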
__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]¶
Method called to compute the reward.

Parameters:
    action (grid2op.Action.Action) – BaseAction that has been submitted by the grid2op.BaseAgent.
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.
    has_error (bool) – Has there been an error, for example a grid2op.DivergingPowerFlow exception thrown when the action has been implemented in the environment.
    is_done (bool) – Is the episode over (either because the agent has reached the end, or because there has been a game over)?
    is_illegal (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.IllegalAction exception? In this case it has been overridden by "do nothing" by the environment.
    is_ambiguous (bool) – Has the action submitted by the BaseAgent raised a grid2op.Exceptions.AmbiguousAction exception? In this case it has been overridden by "do nothing" by the environment.

Returns:
    res – The reward associated with the input parameters.

Return type:
    float
__init__(alpha_redisph=5.0)[source]¶
Initializes BaseReward.reward_min and BaseReward.reward_max.
initialize(env)[source]¶
If BaseReward.reward_min, BaseReward.reward_max or other custom attributes require a valid grid2op.Environment.Environment to be initialized, this should be done in this method.

Parameters:
    env (grid2op.Environment.Environment) – An environment instance, properly initialized.

Return type:
    None
class grid2op.Reward.RewardHelper(rewardClass=<class 'grid2op.Reward.ConstantReward.ConstantReward'>)[source]¶
This class aims at making the handling of reward classes more automatic for the grid2op.Environment.

It is not recommended to derive from or modify this class. If a different reward needs to be used, it is recommended to build another object of this class and change the RewardHelper.rewardClass attribute.
rewardClass¶
Type of reward that will be used by this helper. Note that the type (and not an instance / object of that type) must be given here. It defaults to ConstantReward.

Type: type
template_reward¶
An object of class RewardHelper.rewardClass used to compute the rewards.
__call__(action, env, has_error, is_done, is_illegal, is_ambiguous)[source]¶
Gives the reward that follows the execution of the grid2op.BaseAction.BaseAction action in the grid2op.Environment.Environment env.

Parameters:
    action (grid2op.Action.Action) – The action performed by the BaseAgent.
    env (grid2op.Environment.Environment) – The current environment.
    has_error (bool) – Did the action cause an error, such as a diverging power flow (True: the action caused an error)?
    is_done (bool) – Is the game over (True: the game is over)?
    is_illegal (bool) – Is the action legal or not (True: the action was illegal)? See grid2op.Exceptions.IllegalAction for more information.
    is_ambiguous (bool) – Is the action ambiguous or not (True: the action was ambiguous)? See grid2op.Exceptions.AmbiguousAction for more information.
__init__(rewardClass=<class 'grid2op.Reward.ConstantReward.ConstantReward'>)[source]¶
Initialize self. See help(type(self)) for accurate signature.
__weakref__¶
List of weak references to the object (if defined).
initialize(env)[source]¶
This function initializes the template_reward with the environment. It is used especially for using RewardHelper.range().

Parameters:
    env (grid2op.Environment.Environment) – The currently used environment.
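A hedged end-to-end sketch of the helper follows, using only the names documented on this page. The dataset name is an assumption about the locally available data and, in practice, the environment manages its own RewardHelper internally.

```python
import grid2op
from grid2op.Reward import RewardHelper, FlatReward

env = grid2op.make("l2rpn_case14_sandbox")     # dataset name is an assumption

helper = RewardHelper(rewardClass=FlatReward)  # a reward type, not an instance
helper.initialize(env)                         # sets up template_reward for this env

do_nothing = env.action_space({})
# Arguments follow the __call__ signature documented above.
reward = helper(do_nothing, env, has_error=False, is_done=False,
                is_illegal=False, is_ambiguous=False)
print(reward, helper.range())
```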