Reference: UCL Course on RL
Lecture 1: Introduction to Reinforcement Learning
Features of reinforcement learning
- no supervisor, only a reward signal
- feedback is delayed, not instantaneous
- time really matters (sequential, non-i.i.d. data)
- agent’s actions affect the subsequent data
Reward Hypothesis
All goals can be described by the maximisation of expected cumulative reward.
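Concretely (in the notation used later in the same course), the return G_t is the total discounted reward from time step t, with discount factor gamma in [0, 1], and the goal is to maximise its expected value:

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}
```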
environment state and agent state
- Environment state: whatever data the environment uses to pick the next observation/reward (the environment's private representation)
- Agent state: whatever data the agent uses to pick the next action (the agent's internal representation)
Information state
- An information state (a.k.a. Markov state) contains all useful information from the history
- An information state is Markov: the future is independent of the past given the present
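As stated on the lecture slide, a state S_t is Markov if and only if:

```latex
\mathbb{P}[S_{t+1} \mid S_t] = \mathbb{P}[S_{t+1} \mid S_1, \dots, S_t]
```

Once the state is known, the history may be thrown away.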
Fully observable environments vs. partially observable environments
- Full observability: the agent directly observes the environment state, so agent state = environment state
- Partial observability: the agent observes the environment only indirectly, so agent state != environment state
Major Components of an RL Agent
- Policy: the agent's behaviour function (a map from state to action)
- Value function: how good each state and/or action is
- Model: the agent's representation of the environment
    - A model predicts what the environment will do next (see the definitions after this list)
    - P predicts the next state
    - R predicts the next (immediate) reward
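In the course's notation, for states s, s' and action a, the policy pi, the state-value function v_pi, and the two model components P (state transitions) and R (rewards) are:

```latex
\pi(a \mid s) = \mathbb{P}[A_t = a \mid S_t = s]
v_\pi(s) = \mathbb{E}_\pi\left[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \mid S_t = s\right]
\mathcal{P}^a_{ss'} = \mathbb{P}[S_{t+1} = s' \mid S_t = s, A_t = a]
\mathcal{R}^a_s = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]
```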
Maze Example
The agent may have an internal model of the environment
Dynamics: how actions change the state
Rewards: how much reward from each state
The model may be imperfect
Categorizing RL agents
- Value based: value function, no (explicit) policy
- Policy based: policy, no value function
- Actor critic: both policy and value function
- Model free: policy and/or value function, no model
- Model based: policy and/or value function, plus a model
Learning and Planning
Two fundamental problems in sequential decision making (contrasted in the sketch after this list)
- Reinforcement learning
- The environment is initially unknown
- The agent interacts with the environment
- The agent improves its policy
- Planning
- A model of the environment is known
- The agent performs computation with its model
- The agent improves its policy
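A minimal sketch of the contrast on a tiny hypothetical deterministic MDP (all names here, such as `transitions`, `rewards`, `planning_sweep`, and `q_learning_step`, are illustrative, not from the course): planning improves the policy by pure computation on a known model (one value-iteration sweep), while learning improves estimates from sampled interaction (a Q-learning update).

```python
import random

# A tiny hypothetical MDP: 2 states, 2 actions (purely illustrative).
STATES, ACTIONS = [0, 1], [0, 1]
GAMMA = 0.9
# Model: transitions[s][a] -> next state, rewards[s][a] -> reward.
transitions = {0: {0: 0, 1: 1}, 1: {0: 0, 1: 1}}
rewards     = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}

# Planning: the model is known, so the agent improves its policy by
# computation alone (one sweep of value iteration, no interaction).
def planning_sweep(V):
    return {s: max(rewards[s][a] + GAMMA * V[transitions[s][a]]
                   for a in ACTIONS)
            for s in STATES}

# Learning: the environment is initially unknown; the agent interacts,
# observes (s, a, r, s'), and improves its estimates (Q-learning update).
# The dicts above play the role of the environment responding to actions.
def q_learning_step(Q, s, alpha=0.1):
    a = random.choice(ACTIONS)                    # act (here: explore)
    r, s2 = rewards[s][a], transitions[s][a]      # environment responds
    Q[(s, a)] += alpha * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
                          - Q[(s, a)])
    return s2

V = {s: 0.0 for s in STATES}
for _ in range(50):
    V = planning_sweep(V)

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
s = 0
for _ in range(5000):
    s = q_learning_step(Q, s)
```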
Exploration and Exploitation
- Reinforcement learning is like trial-and-error learning
- The agent should discover a good policy
    - From its experience of the environment
    - Without losing too much reward along the way
- Exploration finds more information about the environment; exploitation exploits known information to maximise reward (see the sketch after this list)
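Epsilon-greedy action selection is one standard way to trade off the two: with probability epsilon take a random action (explore), otherwise take the action with the highest current value estimate (exploit). A minimal sketch, assuming a tabular `Q` keyed by (state, action) pairs (an illustrative setup, not from the lecture):

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """With probability epsilon explore (random action);
    otherwise exploit the current value estimates."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])
```

Larger epsilon means more exploration; epsilon is often decayed over time so the agent exploits more as its estimates improve.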
Prediction and Control
- Prediction: evaluate the future, given a policy
- Control: optimise the future, i.e. find the best policy (formalised below)
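In value-function terms: prediction estimates the value function v_pi of a given policy, while control searches for the optimal value function and policy:

```latex
\text{Prediction:}\quad v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s]
\qquad
\text{Control:}\quad v_*(s) = \max_\pi v_\pi(s)
```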