Describe two ways to force a Q-learning agent to explore.
  • The ε-greedy strategy is to select the greedy action (one that maximizes Q[s,a]) with probability 1 − ε and to select a random action with probability ε, where 0 ≤ ε ≤ 1.
  • An alternative is "optimism in the face of uncertainty": initialize the Q-function to values that encourage exploration. If the Q-values are initialized to high values, unexplored state–action pairs will look good, so even a greedy policy will tend to try them until their estimates are driven down toward their true values.
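
Both strategies can be sketched in a few lines. The following is a minimal illustration, not from the source: `Q` is assumed to be a mapping from (state, action) pairs to value estimates, and the optimistic default of 10.0 is an arbitrary placeholder.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """With probability epsilon pick a random action; otherwise pick
    a greedy action (breaking ties among maximizers at random)."""
    if random.random() < epsilon:
        return random.choice(actions)
    best = max(Q[(state, a)] for a in actions)
    return random.choice([a for a in actions if Q[(state, a)] == best])

def optimistic_Q(optimistic_value=10.0):
    """Optimism in the face of uncertainty: every unvisited
    (state, action) pair defaults to a high value, so greedy
    selection is drawn toward unexplored pairs."""
    return defaultdict(lambda: optimistic_value)
```

For example, if Q[(s,a)] has been updated down to 1.0 but Q[(s,b)] still holds its optimistic default of 10.0, a purely greedy agent (ε = 0) will choose b, i.e. it explores without any explicit randomness.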
