Run an experiment for each of these parameter settings and record the total reward received.
  1. α=0.1, γ=0.5, Greedy Exploit=80%, and initial Q-value=0: 641052
  2. α=0.1, γ=0.5, Greedy Exploit=80%, and initial Q-value=20: 648575
  3. α=0.1, γ=0.5, Greedy Exploit=100%, and initial Q-value=0: -90
  4. α=0.1, γ=0.5, Greedy Exploit=100%, and initial Q-value=20: 2664659
  5. α=0.1, γ=0.1, Greedy Exploit=80%, and initial Q-value=0: -123678
  6. α=0.9, γ=0.5, Greedy Exploit=80%, and initial Q-value=0: 227693
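The following minimal tabular Q-learning sketch in Python shows roughly what the four settings control. It assumes "Greedy Exploit=80%" means the agent takes its greedy action 80% of the time and a uniformly random action otherwise (ε-greedy with ε = 0.2). It also assumes the initial Q-value is the starting entry for every state-action pair. The corridor environment and the names `step`, `run_experiment`, `N_CELLS` and `SETTINGS` are hypothetical stand-ins, not the world that produced the totals above.

```python
import random
from typing import Dict, Tuple

# Hypothetical toy environment (an assumption, not the assignment's world):
# a 1-D corridor of N_CELLS cells. The agent starts in cell 0; reaching the
# last cell pays +10 and restarts the episode; every other step costs -1.
N_CELLS = 6
ACTIONS = (-1, +1)  # move left, move right


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Apply an action and return (next_state, reward, episode_done)."""
    nxt = min(max(state + action, 0), N_CELLS - 1)
    if nxt == N_CELLS - 1:
        return nxt, 10.0, True
    return nxt, -1.0, False


def run_experiment(alpha: float, gamma: float, exploit_prob: float,
                   init_q: float, steps: int = 10_000, seed: int = 0) -> float:
    """Run tabular Q-learning for a fixed number of steps; return total reward."""
    rng = random.Random(seed)
    # Every Q(s, a) starts at init_q. A large value is "optimistic": a purely
    # greedy agent keeps trying actions whose estimates have not yet dropped.
    q: Dict[Tuple[int, int], float] = {
        (s, a): init_q for s in range(N_CELLS) for a in ACTIONS
    }
    state, total = 0, 0.0
    for _ in range(steps):
        if rng.random() < exploit_prob:
            # Exploit: highest-valued action, ties broken at random.
            best = max(q[(state, a)] for a in ACTIONS)
            action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
        else:
            # Explore: uniformly random action.
            action = rng.choice(ACTIONS)
        nxt, reward, done = step(state, action)
        total += reward
        # Terminal transitions have no future value to bootstrap from.
        future = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
        state = 0 if done else nxt
    return total


# (alpha, gamma, exploit_prob, init_q), in the same order as the list above.
SETTINGS = [
    (0.1, 0.5, 0.8, 0.0),
    (0.1, 0.5, 0.8, 20.0),
    (0.1, 0.5, 1.0, 0.0),
    (0.1, 0.5, 1.0, 20.0),
    (0.1, 0.1, 0.8, 0.0),
    (0.9, 0.5, 0.8, 0.0),
]

if __name__ == "__main__":
    for i, (a, g, p, q0) in enumerate(SETTINGS, start=1):
        print(f"{i}. alpha={a}, gamma={g}, exploit={p:.0%}, Q0={q0}: "
              f"{run_experiment(a, g, p, q0):.0f}")
```

Running the file prints one total per setting, in list order. The environment differs from the one behind the recorded results, so compare settings within this sketch rather than against the totals listed above.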
