AIspace

Run an experiment for each of these parameter settings and record the total reward received.

SARSA, α=0.1, γ=0.5, Greedy Exploit = 80%, and initial Q-value = 0: 1152181
Q-learning, α=0.1, γ=0.5, Greedy Exploit = 80%, and initial Q-value = 0: 641052
SARSA, α=0.1, γ=0.5, Greedy Exploit = 100%, and initial Q-value = 20: 2664318
Q-learning, α=0.1, γ=0.5, Greedy Exploit = 100%, and initial Q-value = 20: 2664659
SARSA, α=0.1, γ=0.1, Greedy Exploit = 80%, and initial Q-value = 0: 225610
SARSA, α=0.9, γ=0.5, Greedy Exploit = 80%, and initial Q-value = 0: 246605
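
For reference, the settings above differ only in the update rule (SARSA or Q-learning), the step size α, the discount γ, the exploitation rate, and the initial Q-value. The following is a minimal sketch of the standard textbook forms of these rules, not the AIspace applet's own code. The names make_q, choose_action, sarsa_update and q_learning_update are hypothetical, and "Greedy Exploit = 80%" is assumed to mean that the best-valued action is chosen with probability 0.8 and a random action otherwise.

```python
import random
from collections import defaultdict


def make_q(initial_value=0.0):
    """Q-table whose unseen (state, action) entries start at initial_value."""
    return defaultdict(lambda: initial_value)


def choose_action(q, state, actions, greedy_exploit=0.8, rng=random):
    """Pick a best-valued action with probability greedy_exploit, else a random one.

    Assumption: this is how the "Greedy Exploit" percentage is interpreted.
    """
    if rng.random() < greedy_exploit:
        best = max(q[(state, a)] for a in actions)
        return rng.choice([a for a in actions if q[(state, a)] == best])
    return rng.choice(actions)


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.5):
    """On-policy: the target uses the action a_next that will actually be taken."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


def q_learning_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.5):
    """Off-policy: the target uses the best action value in the next state."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])
```

With Greedy Exploit = 100% the action taken next is always a best-valued one, so the two targets coincide whenever the best action is unique. The first four settings can be read with that in mind: the two algorithms' totals are nearly identical at 100% and far apart at 80%.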