Describe two ways to force a Q-learning agent to explore.
- The ε-greedy strategy is to select a greedy action
(one that maximizes Q[s,a]) with probability 1 − ε and
to select a random action with probability ε, where 0 ≤ ε ≤ 1.
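
A minimal sketch of ε-greedy action selection, under some assumptions not in the original: the Q-function is stored as a dictionary keyed by (state, action) pairs, ties among maximizing actions are broken uniformly at random, and the names epsilon_greedy, Q, actions, and rng are illustrative only.

```python
import random

def epsilon_greedy(Q, state, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one.

    Q is assumed to be a dict mapping (state, action) to a value.
    """
    if rng.random() < epsilon:
        # Explore: choose uniformly among all actions.
        return rng.choice(actions)
    # Exploit: choose an action that maximizes Q[s,a],
    # breaking ties uniformly at random.
    best = max(Q[(state, a)] for a in actions)
    return rng.choice([a for a in actions if Q[(state, a)] == best])
```

With ε = 0 this reduces to purely greedy selection, and with ε = 1 every action is chosen at random, which matches the range 0 ≤ ε ≤ 1.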
- An alternative is "optimism in the face of uncertainty": initialize the Q-function to values that encourage
exploration. If the Q-values are initialized to high values, the
unexplored areas will look good, so that a greedy search will tend