Run an experiment for each of these parameter settings and record the total reward received.
  1. SARSA, α=0.1, γ=0.5, Greedy Exploit = 80%, and initial Q-value = 0: 1152181
  2. Q-learning, α=0.1, γ=0.5, Greedy Exploit = 80%, and initial Q-value = 0: 641052
  3. SARSA, α=0.1, γ=0.5, Greedy Exploit = 100%, and initial Q-value = 20: 2664318
  4. Q-learning, α=0.1, γ=0.5, Greedy Exploit = 100%, and initial Q-value = 20: 2664659
  5. SARSA, α=0.1, γ=0.1, Greedy Exploit = 80%, and initial Q-value = 0: 225610
  6. SARSA, α=0.9, γ=0.5, Greedy Exploit = 80%, and initial Q-value = 0: 246605
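
The settings above differ only in the update rule, α, γ, the greedy-exploit rate, and the initial Q-value. As a minimal sketch of how those parameters might enter the two algorithms (the helper names, the dictionary-based Q-table, and the reading of "Greedy Exploit" as the probability of taking the current best action are assumptions for illustration, not the code used to produce the totals above):

```python
import random

def make_q(states, actions, initial_q=0.0):
    # Hypothetical Q-table: every (state, action) pair starts at initial_q
    # (0 in settings 1, 2, 5, 6; an optimistic 20 in settings 3 and 4).
    return {(s, a): initial_q for s in states for a in actions}

def choose_action(q, state, actions, greedy_exploit=0.8, rng=random):
    # Assumed meaning of "Greedy Exploit": with this probability take a
    # highest-valued action (ties broken at random), otherwise pick uniformly.
    if rng.random() < greedy_exploit:
        best = max(q[(state, a)] for a in actions)
        return rng.choice([a for a in actions if q[(state, a)] == best])
    return rng.choice(actions)

def sarsa_update(q, s, a, r, s2, a2, alpha, gamma):
    # On-policy: bootstrap from the action a2 actually chosen in s2.
    q[(s, a)] += alpha * (r + gamma * q[(s2, a2)] - q[(s, a)])

def q_learning_update(q, s, a, r, s2, actions, alpha, gamma):
    # Off-policy: bootstrap from the best action available in s2.
    best_next = max(q[(s2, b)] for b in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
```

With Greedy Exploit = 100% the action taken in the next state is always a greedy one, so the SARSA and Q-learning targets coincide apart from tie-breaking, which is consistent with settings 3 and 4 yielding nearly identical totals.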
