Reinforcement Learning: Let’s Play Around in OpenAI Gym

Reinforcement learning is a branch of machine learning concerned with taking the actions that maximize a reward. Developers design a reward scheme that rewards desired behaviors and penalizes undesired ones. These long-term goals help keep the agent from stalling on lesser, short-term goals. Over time, the agent learns to avoid the penalties and seek the rewards. This learning method has been adopted in artificial intelligence (AI) as a way of directing learning through rewards and penalties rather than labeled examples.

While many associate reinforcement learning only with artificial intelligence research, it has many real-world applications, including gaming, resource management, personalized recommendations, and robotics. There are two main types of reinforcement: positive and negative. Positive reinforcement occurs when an event, produced by a particular behavior, increases the strength and frequency of that behavior; in other words, it has a positive effect on the behavior. Negative reinforcement is the strengthening of a behavior because a negative condition is stopped or avoided.

OpenAI Gym is a toolkit where people can develop and test learning agents. It focuses on, and is best suited for, reinforcement learning agents. The fundamental building block of OpenAI Gym is the Env class, a Python class that implements a simulator for the environment you want to train your agent in. Gym ships with many environments, such as Mountain Car, Lunar Lander, and many more.

The agent interacts with the environment through methods of the Env class. Two of the most important are:

    1. Reset: This function resets the environment to its initial state, and returns the observation of the environment corresponding to the initial state.
    2. Step: This function takes an action as input and applies it to the environment, which causes the environment to transition to a new state. The step function returns four things:
      • Observation: The observation of the state of the environment.
      • Reward: The reward that you can get from the environment after executing the action that was given as the input to the step function.
      • Done: Whether the episode has been terminated. If true, you may need to end the simulation or reset the environment to restart the episode.
      • Info: This provides additional information depending on the environment, such as the number of lives left, or general information that may be conducive to debugging.
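To make the reset/step contract concrete, here is a minimal, hand-rolled environment that follows the same interface. The class, its states, and its reward scheme are all illustrative inventions for this sketch, not part of Gym itself:

```python
# A toy environment exposing Gym's reset/step interface.
# The agent walks along positions 0..4; the episode ends at position 4.
class ToyWalkEnv:
    def __init__(self):
        self.position = 0

    def reset(self):
        # Return the environment to its initial state and give back
        # the observation corresponding to that state.
        self.position = 0
        return self.position

    def step(self, action):
        # action: +1 (move right) or -1 (move left)
        self.position = min(4, max(0, self.position + action))
        done = self.position >= 4           # episode terminates at the goal
        reward = 1.0 if done else -0.1      # step penalty, bonus at the goal
        info = {"position": self.position}  # extra debugging information
        return self.position, reward, done, info


env = ToyWalkEnv()
obs = env.reset()
while True:
    obs, reward, done, info = env.step(+1)  # always move right
    if done:
        break
print(obs)  # 4
```

The four values returned by step map directly onto the observation, reward, done, and info items described above.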

To get started with OpenAI Gym, a few requirements must be met first:

  • Python 3.5+
  • Pip: pip is required whether you are installing from source or from a package index.

To install OpenAI Gym, run “pip install gym” in the terminal; you can also install it with conda. Gym offers many environments, but in this article we will focus on the mountain car. The objective is to get the vehicle up a mountain. The car sits on a one-dimensional track, positioned between two “mountains”. The goal is to drive up the mountain on the right; however, the car’s engine is not strong enough to scale the mountain in a single pass, so the only way to succeed is to drive back and forth to build up momentum.
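Assuming the gym package is installed, the Mountain Car environment can be created and inspected as below. This sketch is written against the classic Gym API described above; gym 0.26 and later return an (observation, info) tuple from reset, which the snippet allows for:

```python
import gym

env = gym.make("MountainCar-v0")

# The observation is a 2-element vector [position, velocity]; the action
# space has three discrete actions: push left, do nothing, push right.
print(env.observation_space)
print(env.action_space)

# Older gym versions return just the observation from reset();
# gym >= 0.26 returns an (observation, info) tuple.
result = env.reset()
obs = result[0] if isinstance(result, tuple) else result
print(obs)  # the car starts near the valley floor with zero velocity
```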

By using a random agent as shown below, we can attempt the goal once; we now just need to make it better by adding loops. One basic template for the loop would be:

  1. For n times
  2.   While the goal is not achieved
  3.     take_action()
  4.     take_step()
  5.   End of while
  6. End of for
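The template above can be sketched as a runnable random agent. Again this hedges on the Gym version: older releases return four values from step, while gym 0.26 and later return five (with done split into terminated and truncated). Note that the environment’s built-in time limit ends each episode even when the random agent never reaches the goal:

```python
import gym

env = gym.make("MountainCar-v0")

n_episodes = 3  # "For n times" in the template

for episode in range(n_episodes):
    result = env.reset()
    obs = result[0] if isinstance(result, tuple) else result

    done = False
    total_reward = 0.0
    while not done:                         # "While the goal is not achieved"
        action = env.action_space.sample()  # take_action(): pick at random
        out = env.step(action)              # take_step()
        if len(out) == 5:                   # gym >= 0.26
            obs, reward, terminated, truncated, info = out
            done = terminated or truncated
        else:                               # classic gym: four return values
            obs, reward, done, info = out
        total_reward += reward

    print(f"episode {episode}: total reward {total_reward}")
```

Sampling from env.action_space gives the random agent; replacing that line with a learned policy is where the actual reinforcement learning begins.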