BaseAgent

class grid2op.Agent.AgentWithConverter(action_space, action_space_converter=None, **kwargs_converter)

Compared to a regular BaseAgent, these types of agents are able to deal with different representations of grid2op.BaseAction.BaseAction and grid2op.BaseObservation.BaseObservation.

As any other agent, an AgentWithConverter implements the BaseAgent.act() method, but it works slightly differently. It receives in this method an observation, as an object (ie an instance of grid2op.BaseObservation). This object can then be converted to any other object with the method AgentWithConverter.convert_obs().

Then, this transformed_observation is passed to the method AgentWithConverter.my_act(), which must be defined for each agent. This function outputs an encoded_act, which can be whatever you want it to be.

Finally, the encoded_act is decoded into a proper action, an object of class grid2op.BaseAction.BaseAction, thanks to the method AgentWithConverter.convert_act().

This allows, for example, to represent actions as integers, which makes it easier to train standard discrete control algorithms, such as those used to solve Atari games.
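The observation → convert_obs → my_act → convert_act → action flow can be sketched without grid2op. In the sketch below, DummyObs and the dictionary returned by convert_act are hypothetical stand-ins for grid2op objects, used only to illustrate the control flow:

```python
import numpy as np

# "DummyObs" is a hypothetical stand-in for a grid2op observation.
class DummyObs:
    def __init__(self):
        self.load_p = np.array([1.0, 2.0])
        self.rho = np.array([0.5, 0.7])

class SketchAgentWithConverter:
    def convert_obs(self, observation):
        # observation object -> flat vector a model can consume
        return np.concatenate((observation.load_p, observation.rho))

    def my_act(self, transformed_observation, reward, done=False):
        # toy "policy": pick the index of the largest feature as the encoded action
        return int(np.argmax(transformed_observation))

    def convert_act(self, encoded_act):
        # integer id -> a dictionary-like action (hypothetical representation)
        return {"set_line_status": encoded_act}

    def act(self, observation, reward, done=False):
        transformed = self.convert_obs(observation)
        encoded = self.my_act(transformed, reward, done)
        return self.convert_act(encoded)

agent = SketchAgentWithConverter()
action = agent.act(DummyObs(), reward=0.0, done=False)
```

The three overridable hooks (convert_obs, my_act, convert_act) are the only pieces a concrete agent needs to supply; act() just chains them.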
NB It is possible either to define AgentWithConverter.convert_obs() and AgentWithConverter.convert_act(), or to define a grid2op.Converters.Converter and feed it to the action_space_converter parameter used to initialise the class. The second option is preferred, as AgentWithConverter.action_space will then directly be this converter. Such an agent will really behave as if the actions are encoded the way it wants.
Examples

For example, imagine a BaseAgent that uses a neural network to take its decisions.

Suppose also that, after some feature engineering, it is best for the neural network to use only the load active values (grid2op.BaseObservation.BaseObservation.load_p) and the sum of the relative flows (grid2op.BaseObservation.BaseObservation.rho) with the active flows (grid2op.BaseObservation.BaseObservation.p_or) [NB such an agent would not make sense a priori, but who knows].

Suppose that this neural network can be accessed through a class AwesomeNN (not available…) that can predict some actions. It can be loaded with the “load” method and make predictions with the “predict” method.

For the sake of the example, we will suppose that this agent only predicts powerline statuses (so 0 or 1) that are represented as a vector. So we need to take extra care to convert this vector from a numpy array to a valid action.

This is done below:
import numpy as np
import grid2op
import AwesomeNN  # this does not exist!

# create a simple environment
env = grid2op.make()

# define the class above
class AgentCustomObservation(AgentWithConverter):
    def __init__(self, action_space, path):
        AgentWithConverter.__init__(self, action_space)
        self.my_neural_network = AwesomeNN()
        self.my_neural_network.load(path)

    def convert_obs(self, observation):
        # convert the observation
        return np.concatenate((observation.load_p,
                               observation.rho + observation.p_or))

    def convert_act(self, encoded_act):
        # convert back the action, output from the NN "self.my_neural_network",
        # to a valid action
        act = self.action_space({"set_status": encoded_act})
        return act

    def my_act(self, transformed_observation, reward, done=False):
        act_predicted = self.my_neural_network.predict(transformed_observation)
        return act_predicted

# make the agent that behaves as expected
my_agent = AgentCustomObservation(action_space=env.action_space, path=".")
# this agent is perfectly working :-) You can use it as any other agent.
action_space_converter

    The converter that is used to represent the BaseAgent action space. Might be set to None if not initialized.

    Type:
        grid2op.Converters.Converter

init_action_space

    The initial action space. This corresponds to the action space of the grid2op.Environment.Environment.

action_space

    If a converter is used, then this action space is this converter. The agent will behave as if the action space is directly encoded the way it wants.

    Type:
        grid2op.Converters.ActionSpace
act(observation, reward, done=False)

    Standard method of an AgentWithConverter. There is no need to overload this function.

    Parameters:
        observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment
        reward (float) – The current reward. This is the reward obtained by the previous action
        done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

    Returns:
        res – The action chosen by the bot / controller / agent.
convert_act(encoded_act)

    This function converts an “encoded action”, which can be of any type, into a valid action that can be ingested by the environment.

    Parameters:
        encoded_act (object) – Anything that represents an action.

    Returns:
        act – A valid action, represented as a class instance, that corresponds to the encoded action given as input.

    Return type:
        grid2op.BaseAction.BaseAction
convert_obs(observation)

    This function converts the observation, which is an object of class grid2op.BaseObservation.BaseObservation, into a representation understandable by the BaseAgent.

    For example, an agent could want to look only at the relative flows grid2op.BaseObservation.BaseObservation.rho to take its decision. This is possible by overloading this method.

    This method can also be used to scale the observation such that each component has, for example, mean 0 and variance 1.

    Parameters:
        observation (grid2op.Observation.Observation) – Initial observation received by the agent in the BaseAgent.act() method.

    Returns:
        res – Anything that will be used by the BaseAgent to take decisions.

    Return type:
        object
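Scaling inside convert_obs can be sketched as follows. This is a minimal, self-contained illustration; the per-component mean and standard deviation values are hypothetical and would normally be estimated from recorded episodes:

```python
import numpy as np

# Hypothetical per-component statistics, e.g. estimated from past episodes.
OBS_MEAN = np.array([10.0, 0.5, 0.5])
OBS_STD = np.array([5.0, 0.25, 0.25])

def convert_obs(observation_vector):
    # scale each component to (approximately) mean 0 and variance 1
    return (observation_vector - OBS_MEAN) / OBS_STD

scaled = convert_obs(np.array([15.0, 0.5, 0.25]))
```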
abstractmethod my_act(transformed_observation, reward, done=False)

    This method must be overridden if this class is used: it is an “abstract” method. It is the method to implement for anyone who wants to make an agent that handles different kinds of actions and observations.

    Parameters:
        transformed_observation (object) – Anything that will be used to create an action. This is the result of the call to AgentWithConverter.convert_obs(). This is likely a numpy array.
        reward (float) – The current reward. This is the reward obtained by the previous action
        done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

    Returns:
        res – A representation of an action in any possible format. This action will then be ingested and formatted into a valid action with the AgentWithConverter.convert_act() method.

    Return type:
        object
class grid2op.Agent.BaseAgent(action_space)

This class represents the base class of a BaseAgent. All bots / controllers / agents used in the Grid2Op simulator should derive from this class.

To work properly, it is advised to create a BaseAgent after the grid2op.Environment has been created, and to reuse the grid2op.Environment.Environment.action_space to build the BaseAgent.

action_space

    It represents the action space, ie a tool that can serve to create valid actions. Note that a valid action can be illegal or ambiguous, and so lead to a “game over” or to an error. But at least it will have a proper size.

abstractmethod act(observation, reward, done=False)

    This is the main method of a BaseAgent. Given the current observation and the current reward (ie the reward that the environment sent to the agent after the previous action was implemented), it returns the next action to take.

    Parameters:
        observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment
        reward (float) – The current reward. This is the reward obtained by the previous action
        done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

    Returns:
        res – The action chosen by the bot / controller / agent.
class grid2op.Agent.DoNothingAgent(action_space)

This is the most basic BaseAgent. It is purely passive and does absolutely nothing.

act(observation, reward, done=False)

    As better explained in the documentation of grid2op.BaseAction.update() or grid2op.BaseAction.ActionSpace.__call__(), the preferred way to make an object of type action is to call grid2op.BaseAction.ActionSpace.__call__() with the dictionary representing the action. In this case, the action is “do nothing” and it is represented by the empty dictionary.

    Parameters:
        observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment
        reward (float) – The current reward. This is the reward obtained by the previous action
        done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

    Returns:
        res – The action chosen by the bot / controller / agent.
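The do-nothing pattern boils down to calling the action space with an empty dictionary. In the sketch below, the callable action_space is a hypothetical stand-in for grid2op.BaseAction.ActionSpace.__call__():

```python
# Hypothetical stand-in for an action space: calling it with a dict
# builds an action; an empty dict means "do nothing".
def action_space(act_dict):
    return {"description": "do nothing" if not act_dict else "modify grid",
            "content": dict(act_dict)}

class DoNothingSketch:
    def act(self, observation, reward, done=False):
        # the empty dict is the do-nothing action
        return action_space({})

agent = DoNothingSketch()
res = agent.act(observation=None, reward=0.0)
```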
class grid2op.Agent.GreedyAgent(action_space, action_space_converter=None)

This is a class of “greedy BaseAgent”. Greedy agents all execute the same kind of algorithm to take an action:

    They simulate (grid2op.BaseObservation.simulate()) all actions in a given set
    They take the action that maximises the simulated reward among all these actions

To ease the creation of such agents, we created this abstract class (objects of this class cannot be created). Two examples of such greedy agents are provided with PowerLineSwitch and TopologyGreedy.

abstractmethod _get_tested_action(observation)

    Returns the list of all the candidate actions.

    From this list, the one that achieves the best “simulated reward” is used.

    Parameters:
        observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment

    Returns:
        res – A list of all candidate grid2op.BaseAction.BaseAction

    Return type:
        list

act(observation, reward, done=False)

    By definition, all “greedy” agents act the same way. The only thing that can differentiate multiple such agents is the set of actions that are tested.

    These actions are defined in the method _get_tested_action(). This act() method implements the greedy logic: take the action that maximises the instantaneous reward on the simulated action.

    Parameters:
        observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment
        reward (float) – The current reward. This is the reward obtained by the previous action
        done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

    Returns:
        res – The action chosen by the bot / controller / agent.
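The greedy logic can be sketched independently of grid2op. Here, simulate is a hypothetical stand-in for grid2op.BaseObservation.simulate(), and the string action names are placeholders for real actions:

```python
# Hypothetical stand-in for observation.simulate(action): returns a
# simulated reward for each candidate action.
def simulate(action):
    simulated_rewards = {"do_nothing": 0.1,
                         "disconnect_line_0": 0.7,
                         "disconnect_line_1": 0.3}
    return simulated_rewards[action]

def greedy_act(tested_actions):
    # simulate every candidate and keep the one with the best reward
    best_action, best_reward = None, float("-inf")
    for action in tested_actions:
        reward = simulate(action)
        if reward > best_reward:
            best_action, best_reward = action, reward
    return best_action

chosen = greedy_act(["do_nothing", "disconnect_line_0", "disconnect_line_1"])
```

Concrete subclasses only change what _get_tested_action() returns; the selection loop itself never varies.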
class grid2op.Agent.MLAgent(action_space, action_space_converter=<class 'grid2op.Converter.ToVect.ToVect'>, **kwargs_converter)

This agent handles only vectors. Its “my_act” function returns the “do nothing” action, so it needs to be overridden.

In this class, “my_act” is expected to return a vector that can be directly converted into a valid action.

convert_from_vect(act)

    Helper to convert an action, represented as a numpy array, into a grid2op.BaseAction instance.

    Parameters:
        act (numpy.ndarray) – An action represented as a vector, to be cast into a grid2op.BaseAction.BaseAction instance.

    Returns:
        res – The act parameter converted into a proper grid2op.BaseAction.BaseAction object.

my_act(transformed_observation, reward, done=False)

    This method must be overridden if this class is used: it is an “abstract” method. It is the method to implement for anyone who wants to make an agent that handles different kinds of actions and observations.

    Parameters:
        transformed_observation (object) – Anything that will be used to create an action. This is the result of the call to AgentWithConverter.convert_obs(). This is likely a numpy array.
        reward (float) – The current reward. This is the reward obtained by the previous action
        done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

    Returns:
        res – A representation of an action in any possible format. This action will then be ingested and formatted into a valid action with the AgentWithConverter.convert_act() method.

    Return type:
        object
class grid2op.Agent.OneChangeThenNothing(action_space, action_space_converter=None)

This is a specific kind of BaseAgent. It performs one BaseAction (possibly non-empty) at the first time step and then does nothing.

This class is an abstract class and cannot be instantiated (ie no object of this class can be created). It must be overridden and the method OneChangeThenNothing._get_dict_act() must be defined. Basically, it must know what action to do.

abstractmethod _get_dict_act()

    Function that needs to be overridden to indicate which action to perform.

    Returns:
        res – A dictionary that can be converted into a valid grid2op.BaseAction.BaseAction. See the help of grid2op.BaseAction.ActionSpace.__call__() for more information.

    Return type:
        dict
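The “one change then nothing” pattern can be sketched without grid2op. The base class below and the action dictionaries it returns are hypothetical stand-ins for grid2op objects:

```python
# Sketch of the "one change then nothing" pattern. The base class and
# the action dictionaries are hypothetical stand-ins for grid2op objects.
class OneChangeThenNothingSketch:
    def __init__(self):
        self.has_acted = False

    def _get_dict_act(self):
        # must be overridden: which single action to perform
        raise NotImplementedError

    def act(self, observation, reward, done=False):
        if self.has_acted:
            return {}  # empty dict = do nothing from now on
        self.has_acted = True
        return self._get_dict_act()

class DisconnectFirstLine(OneChangeThenNothingSketch):
    def _get_dict_act(self):
        # hypothetical action dictionary: disconnect powerline 0
        return {"set_line_status": [(0, -1)]}

agent = DisconnectFirstLine()
first = agent.act(None, 0.0)   # the one change
second = agent.act(None, 0.0)  # do nothing afterwards
```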
act(observation, reward, done=False)

    This is the main method of a BaseAgent. Given the current observation and the current reward (ie the reward that the environment sent to the agent after the previous action was implemented), it returns the next action to take.

    Parameters:
        observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment
        reward (float) – The current reward. This is the reward obtained by the previous action
        done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

    Returns:
        res – The action chosen by the bot / controller / agent.
class
grid2op.Agent.
PowerLineSwitch
(action_space)[source]¶ This is a
GreedyAgent
example, which will attempt to disconnect powerlines.It will choose among:
doing nothing
disconnecting one powerline
which action that will maximize the reward. All powerlines are tested.
-
_get_tested_action
(observation)[source]¶ Returns the list of all the candidate actions.
From this list, the one that achieve the best “simulated reward” is used.
- Parameters
observation (
grid2op.Observation.Observation
) – The current observation of thegrid2op.Environment.Environment
- Returns
res – A list of all candidate
grid2op.BaseAction.BaseAction
- Return type
list
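Building this candidate set (do nothing, plus one disconnection per powerline) can be sketched as follows; the dictionary action representation is a hypothetical stand-in, not the real grid2op encoding:

```python
# Hypothetical candidate enumeration for a grid with n_line powerlines:
# the do-nothing action plus one action disconnecting each line.
def get_tested_actions(n_line):
    candidates = [{}]  # empty dict = do nothing
    for line_id in range(n_line):
        candidates.append({"set_line_status": [(line_id, -1)]})
    return candidates

actions = get_tested_actions(3)
```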
class grid2op.Agent.RandomAgent(action_space, action_space_converter=<class 'grid2op.Converter.IdToAct.IdToAct'>, **kwargs_converter)

This agent acts randomly on the powergrid. It uses grid2op.Converters.IdToAct to compute all the possible actions available for the environment, and then chooses a random one among all of them.

my_act(transformed_observation, reward, done=False)

    This method must be overridden if this class is used: it is an “abstract” method. It is the method to implement for anyone who wants to make an agent that handles different kinds of actions and observations.

    Parameters:
        transformed_observation (object) – Anything that will be used to create an action. This is the result of the call to AgentWithConverter.convert_obs(). This is likely a numpy array.
        reward (float) – The current reward. This is the reward obtained by the previous action
        done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

    Returns:
        res – A representation of an action in any possible format. This action will then be ingested and formatted into a valid action with the AgentWithConverter.convert_act() method.

    Return type:
        object
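The random-choice logic can be sketched as follows; the integer action ids are a hypothetical stand-in for what an IdToAct-style converter would enumerate:

```python
import random

# Hypothetical list of all encoded actions, as an IdToAct-style converter
# would enumerate them: each action is identified by an integer id.
ALL_ACTION_IDS = list(range(10))

def my_act(transformed_observation, reward, done=False):
    # ignore the observation entirely and pick a random action id
    return random.choice(ALL_ACTION_IDS)

encoded = my_act(None, 0.0)
```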
class grid2op.Agent.TopologyGreedy(action_space, action_space_converter=None)

This is a GreedyAgent example, which will attempt to reconfigure the substations’ connectivity.

It will choose among:

    doing nothing
    changing the topology of one substation

_get_tested_action(observation)

    Returns the list of all the candidate actions.

    From this list, the one that achieves the best “simulated reward” is used.

    Parameters:
        observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment

    Returns:
        res – A list of all candidate grid2op.BaseAction.BaseAction

    Return type:
        list