What is the major difference between policies generated by SARSA and Q-learning?
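  • SARSA tries to avoid s2, while Q-learning does not. SARSA is on-policy, so its value estimates include the cost of the exploratory actions it actually takes (for example under ε-greedy). As a result, it learns a more cautious policy that keeps away from dangerous regions such as s2. Q-learning is off-policy: it learns the values of the greedy policy regardless of how it explores, so its learned policy may pass right next to s2 and can fall in while exploring.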
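The difference comes from a single term in the update rule, as the minimal sketch below shows. It is illustrative only: the states `"start"` and `"edge"`, the action `"fall"` (standing in for a step into a hazardous state like s2), and the rewards, `alpha`, and `gamma` are hypothetical values chosen for the example. SARSA bootstraps from the action the ε-greedy behaviour policy actually picked next. Q-learning bootstraps from the best action, so a risky exploratory step does not lower its estimate.

```python
from typing import Dict, Hashable, Iterable, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def sarsa_update(q: QTable, s, a, r: float, s_next, a_next,
                 alpha: float, gamma: float) -> float:
    """On-policy TD update: uses Q(s', a') for the action a' actually taken next."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


def q_learning_update(q: QTable, s, a, r: float, s_next, actions: Iterable,
                      alpha: float, gamma: float) -> float:
    """Off-policy TD update: uses max_b Q(s', b), ignoring the action actually taken."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


# Hypothetical setup: moving "right" from "start" reaches "edge", next to the hazard.
# At "edge", "forward" is good, but "fall" (the hazard, like s2) is very costly.
initial = {("start", "right"): 0.0, ("edge", "forward"): 10.0, ("edge", "fall"): -100.0}
alpha, gamma = 0.5, 0.9

# Suppose epsilon-greedy exploration happened to choose "fall" at "edge".
sarsa_value = sarsa_update(dict(initial), "start", "right", -1.0, "edge", "fall",
                           alpha, gamma)
q_learning_value = q_learning_update(dict(initial), "start", "right", -1.0, "edge",
                                     ["forward", "fall"], alpha, gamma)

print(sarsa_value)       # SARSA: pulled down by the exploratory fall
print(q_learning_value)  # Q-learning: unaffected, uses the greedy "forward" value
```

Here SARSA's estimate for heading toward the edge drops to -45.5, while Q-learning's rises to 4.0. Repeated over many episodes, this is why SARSA's greedy policy steers away from the dangerous region and Q-learning's does not.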
