RL1 Introduction to RL

Reference: David Silver's UCL Course on RL

Lecture 1: Introduction to Reinforcement Learning

Features of reinforcement learning

  1. There is no supervisor, only a reward signal
  2. Feedback is delayed, not instantaneous
  3. Time really matters (sequential, non-i.i.d. data)
  4. The agent's actions affect the subsequent data it receives

Reward Hypothesis
All goals can be described by the maximisation of expected cumulative reward.
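One standard way to make "expected cumulative reward" precise (conventional notation from the course, not written out in these notes) is the discounted return:

```latex
% Return G_t: total discounted reward from time step t onwards,
% with discount factor \gamma \in [0, 1]
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}
```

The reward hypothesis says every goal can be phrased as maximising the expectation of this quantity.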

environment state and agent state

environment state: the environment's private representation, i.e. whatever data the environment uses to pick the next observation and reward
agent state: the agent's internal representation, i.e. whatever data the agent uses to pick the next action

Information state
An information state (a.k.a. Markov state) contains all useful information from the history, and is Markov by construction.
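The defining property (standard definition, stated here for completeness): a state S_t is Markov if and only if the future is independent of the past given the present:

```latex
% Markov property: S_t captures everything relevant from the history
\mathbb{P}[S_{t+1} \mid S_t] = \mathbb{P}[S_{t+1} \mid S_1, \dots, S_t]
```

In other words, the state is a sufficient statistic of the history: once the state is known, the history may be thrown away.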

Fully Observable Environments and Partially Observable Environments

Fully observable: the agent directly observes the environment state, so agent state = environment state = information state; formally, this is a Markov decision process (MDP). Partially observable: the agent only observes the environment indirectly (a POMDP), so it must construct its own state representation, e.g. the complete history or a belief over environment states.

Major Components of an RL Agent

  1. Policy: the agent's behaviour function, a map from state to action
  2. Value function: how good each state (and/or action) is, in terms of expected future reward
  3. Model: the agent's representation of the environment
    1. A model predicts what the environment will do next
    2. P predicts the next state (the dynamics)
    3. R predicts the next (immediate) reward
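In conventional notation (assumed here; the original notes only name the components), the three components can be written as:

```latex
% Policy: a map from states to actions (stochastic case)
\pi(a \mid s) = \mathbb{P}[A_t = a \mid S_t = s]

% Value function: expected discounted future reward under policy \pi
v_\pi(s) = \mathbb{E}_\pi\!\left[ R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \,\middle|\, S_t = s \right]

% Model: state-transition dynamics P and reward function R
\mathcal{P}^a_{ss'} = \mathbb{P}[S_{t+1} = s' \mid S_t = s, A_t = a], \qquad
\mathcal{R}^a_s = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]
```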

Maze Example
The agent may have an internal model of the environment:

  • Dynamics: how actions change the state
  • Rewards: how much reward comes from each state

The model may be imperfect (a concrete sketch of such a model follows).
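A minimal sketch of what such an internal model might look like for a deterministic maze (the state encoding, class name, and method names are assumptions for illustration, not from the lecture):

```python
# Hypothetical tabular model of a deterministic maze.
# States are (row, col) cells; actions are compass moves.
ACTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}

class TabularModel:
    """Learned estimates of the dynamics (P) and rewards (R)."""

    def __init__(self):
        self.transitions = {}  # (state, action) -> estimated next state
        self.rewards = {}      # (state, action) -> estimated reward

    def update(self, state, action, next_state, reward):
        """Record one observed transition (last observation wins)."""
        self.transitions[(state, action)] = next_state
        self.rewards[(state, action)] = reward

    def predict(self, state, action):
        """Predicted next state and reward; None where the model has no data."""
        key = (state, action)
        return self.transitions.get(key), self.rewards.get(key)
```

Cells the agent has never visited have no entries at all, which is one concrete sense in which the model can be imperfect.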

Categorizing RL agents

  • Value based: a value function, no explicit policy (the policy is implicit, e.g. greedy)
  • Policy based: an explicit policy, no value function
  • Actor-Critic: both a policy and a value function

  • Model free: a policy and/or value function, but no model
  • Model based: a policy and/or value function, plus a model of the environment

Learning and Planning

Two fundamental problems in sequential decision making

  1. Reinforcement learning
    • The environment is initially unknown
    • The agent interacts with the environment
    • The agent improves its policy
  2. Planning
    • A model of the environment is known
    • The agent performs computation with its model
    • The agent improves its policy
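One way to see the contrast is the following sketch (the `env.step`, `model.predict`, and `agent` interfaces are invented for illustration, not from the course):

```python
def learning_step(agent, env, state):
    """Reinforcement learning: the environment is initially unknown,
    so the agent improves its policy from real interaction."""
    action = agent.act(state)
    next_state, reward = env.step(action)              # real experience
    agent.improve(state, action, reward, next_state)
    return next_state

def planning_step(agent, model, state):
    """Planning: the model is known, so the agent improves its policy
    purely by computing with the model, without touching the environment."""
    action = agent.act(state)
    next_state, reward = model.predict(state, action)  # simulated experience
    agent.improve(state, action, reward, next_state)
    return next_state
```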

Exploration and Exploitation

  1. Reinforcement learning is like trial-and-error learning
  2. The agent should discover a good policy from its experience of the environment, without losing too much reward along the way
  3. Exploration finds more information about the environment; exploitation exploits known information to maximise reward (see the ε-greedy sketch below)
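A classic technique for balancing the two is ε-greedy action selection (a standard method, not spelled out at this point in the lecture; the dict-of-values interface is my own choice):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Choose an action from a dict {action: estimated value}.

    With probability epsilon, explore (random action, to gather more
    information about the environment); otherwise exploit (best-known
    action, to avoid losing too much reward along the way).
    """
    if random.random() < epsilon:
        return random.choice(list(q_values))   # explore
    return max(q_values, key=q_values.get)     # exploit
```

For example, `epsilon_greedy({"left": 0.2, "right": 0.5})` usually returns `"right"` but still occasionally explores `"left"`.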

Prediction and Control

  1. Prediction: evaluate the future, given a fixed policy
  2. Control: optimise the future by finding the best policy (formalised below)
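In value-function terms (standard notation, as assumed above): prediction computes the value of a given policy, while control searches for the best policy and its value:

```latex
% Prediction: evaluate a fixed policy \pi
v_\pi(s) = \mathbb{E}_\pi[ G_t \mid S_t = s ]

% Control: find the optimal value function and a policy that attains it
v_*(s) = \max_\pi v_\pi(s), \qquad \pi_* \in \operatorname*{argmax}_\pi v_\pi
```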