Introduction to Reinforcement Learning (Part-1)
What is Reinforcement Learning?
Reinforcement learning is a branch of machine learning in which an agent learns from experience. If you do not know what an agent is, for now think of it as a human child interacting with its environment. The agent keeps performing a task until it perfects the art of performing it. Just as neural networks were designed to loosely imitate how the human brain computes, many other methods in machine learning are borrowed from humans and their psychology.
The word 'reinforcement' is closely associated with the psychologist B.F. Skinner, for whom reinforcement refers to anything that increases the likelihood that a response will occur. A childhood example makes the idea concrete. As kids we looked at fire through our eyes (our sensors), receiving visual information about the environment, one parameter of its state. Wanting to find out how it tastes and whether we can eat it, and having no idea what fire is or how it feels, our curiosity drove us to explore the environment, and we took the action of touching it with our hand. We all know how that feels and what reward we received for taking that action, and we are wise enough not to take it again.
B.F. Skinner
This is the first post in the series Introduction to Reinforcement Learning. Over the series we will learn about the history of reinforcement learning and its key terms, then discuss the state value function, the action value function, the n-armed bandit problem, MDPs, Q-learning, SARSA, Expected SARSA, and Double Q-learning, and solve games using these methods. We will use OpenAI Gym as the environment provider and Python as the coding language; the sketch below gives a first taste of both.
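Here is a minimal sketch of one episode with a purely random agent. It assumes the classic Gym API (gym < 0.26); newer versions return extra values from reset() and step().

    import gym

    # Minimal agent-environment loop with a random agent (classic Gym API)
    env = gym.make('CartPole-v1')    # any Gym environment would do
    state = env.reset()              # start a new episode
    done = False
    total_reward = 0.0
    while not done:
        action = env.action_space.sample()             # act randomly for now
        state, reward, done, info = env.step(action)   # environment responds
        total_reward += reward
    print('Episode return:', total_reward)
    env.close()

Later posts in the series will replace the random action choice with learned behaviour.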
Key Terms
Some of the basic concepts of reinforcement learning:
Environment:
The environment is everything except the agent, although the boundary is somewhat up to the creator to decide: a robot can consider its outer shell part of the agent and the room as the environment, or consider only its microcontroller as the agent and its outer shell as part of the environment.
Agent:
An agent is someone or something that senses the environment and takes actions on the basis of a policy. The agent has the ability to make decisions and create changes in the environment.
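As a sketch of this interface (the class and method names are illustrative, not from any particular library), an agent can be as simple as:

    class Agent:
        def __init__(self, policy):
            self.policy = policy   # the behaviour rule, defined next

        def act(self, state):
            # sense the current state and choose an action per the policy
            return self.policy(state)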
Policy:
A policy defines the learning agent's way of behaving at a given time. Roughly speaking, a policy is a mapping from perceived states of the environment to actions to be taken when in those states. It corresponds to what in psychology would be called a set of stimulus–response rules or associations. In some cases the policy may be a simple function or lookup table, whereas in others it may involve extensive computation such as a search process. The policy is the core of a reinforcement learning agent in the sense that it alone is sufficient to determine behavior. In general, policies may be stochastic, specifying probabilities for each action.
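For instance, a policy can be a simple lookup over a table of action values, optionally made stochastic so the agent sometimes explores. A sketch, where Q is a hypothetical table with Q[state][action] holding an estimated value:

    import random

    def greedy_policy(Q, state):
        return max(Q[state], key=Q[state].get)     # best-known action

    def epsilon_greedy_policy(Q, state, epsilon=0.1):
        if random.random() < epsilon:
            return random.choice(list(Q[state]))   # explore at random
        return greedy_policy(Q, state)             # otherwise exploit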
Reward signal:
A reward signal defines the goal of a reinforcement learning problem. On each time step, the environment sends to the reinforcement learning agent a single number called the reward. The agent's sole objective is to maximize the total reward it receives over the long run. The reward signal thus defines what are the good and bad events for the agent. In a biological system, we might think of rewards as analogous to the experiences of pleasure or pain. They are the immediate and defining features of the problem faced by the agent. The reward signal is the primary basis for altering the policy; if an action selected by the policy is followed by low reward, then the policy may be changed to select some other action in that situation in the future. In general, reward signals may be stochastic functions of the state of the environment and the actions taken.
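"Total reward over the long run" is usually formalized as the return: the sum of rewards, commonly discounted by a factor gamma so that nearer rewards count more. A small sketch of that standard convention (the discounting itself is a general RL idea, not specific to this post):

    def discounted_return(rewards, gamma=0.99):
        # G = r0 + gamma*r1 + gamma^2*r2 + ..., computed backwards
        G = 0.0
        for r in reversed(rewards):
            G = r + gamma * G
        return G

    print(discounted_return([1.0, 1.0, 1.0]))  # 1 + 0.99 + 0.9801 = 2.9701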
Value Function:
Whereas the reward signal indicates what is good in an immediate sense, a value function specifies what is good in the long run. Roughly speaking, the value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state. Whereas rewards determine the immediate, intrinsic desirability of environmental states, values indicate the long-term desirability of states after taking into account the states that are likely to follow and the rewards available in those states. For example, a state might always yield a low immediate reward but still have a high value because it is regularly followed by other states that yield high rewards.
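One simple way to estimate such values is Monte Carlo averaging: run episodes, compute the return observed after each visit to a state, and average. A sketch (every-visit variant), assuming episodes is a list of (state, reward) trajectories:

    from collections import defaultdict

    def estimate_state_values(episodes, gamma=0.99):
        returns = defaultdict(list)
        for episode in episodes:                     # episode: [(state, reward), ...]
            G = 0.0
            for state, reward in reversed(episode):  # walk backwards, accumulating return
                G = reward + gamma * G
                returns[state].append(G)
        # the value estimate is the average return observed from each state
        return {s: sum(gs) / len(gs) for s, gs in returns.items()}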
Action Value Function:
It is the expected reward an agent hopes to accumulate when it is in a given state and takes a particular action. The action value function, also known as the Q-value, is a value for a state-action pair.
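In the simplest tabular form, the Q-value is literally a table indexed by state and action. A toy sketch using the fire example from earlier (the states, actions, and numbers are made up for illustration):

    from collections import defaultdict

    Q = defaultdict(dict)
    Q['near_fire']['touch'] = -10.0      # painful: large negative reward
    Q['near_fire']['step_back'] = 1.0    # safe: small positive reward

    best_action = max(Q['near_fire'], key=Q['near_fire'].get)
    print(best_action)                   # 'step_back'

Methods like Q-learning and SARSA, covered later in this series, are ways of learning such a table from experience.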