Policy Iteration

Information

Primary software used: Jupyter Notebook
Course: Policy Iteration
Primary subject: AI & ML
Secondary subject: Machine Learning
Level: Intermediate
Last updated: November 11, 2024

Policy Iteration

Policy iteration is an algorithm for finding the optimal policy of a Markov Decision Process (MDP) by iteratively improving an initial policy. It alternates between two steps: policy evaluation, which computes the value of each state under the current policy, and policy improvement, which updates the policy to select, in each state, the action that maximizes the expected value. The two steps repeat until the policy no longer changes, at which point it is optimal.
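The evaluate-then-improve loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the downloadable notebook: the two-state MDP, the function name `policy_iteration`, and the parameters `gamma`, `theta`, and `max_iter` are all assumptions made for the example.

```python
def policy_iteration(P, R, gamma=0.9, theta=1e-10, max_iter=1000):
    """Sketch of policy iteration for a small, finite MDP.

    P[s][a][t] is the probability of moving from state s to state t
    under action a; R[s][a] is the expected immediate reward.
    (This data layout is an assumption for the example.)
    """
    n_states = len(R)
    n_actions = len(R[0])
    policy = [0] * n_states   # start from an arbitrary policy
    V = [0.0] * n_states      # state-value estimates

    def q_value(s, a):
        # Expected value of taking action a in state s, then following V.
        return R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_states))

    for _ in range(max_iter):
        # Policy evaluation: sweep until the values stop changing.
        while True:
            delta = 0.0
            for s in range(n_states):
                v_new = q_value(s, policy[s])
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < theta:
                break

        # Policy improvement: act greedily with respect to V.
        stable = True
        for s in range(n_states):
            best = max(range(n_actions), key=lambda a: q_value(s, a))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:  # no action changed, so the policy has stabilized
            break
    return policy, V


# Hypothetical two-state MDP: action 0 stays put, action 1 switches state.
# Staying in state 1 pays a reward of 1; everything else pays 0.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0: stay, switch
    [[0.0, 1.0], [1.0, 0.0]],  # from state 1: stay, switch
]
R = [[0.0, 0.0], [1.0, 0.0]]

policy, V = policy_iteration(P, R)
print(policy)  # [1, 0]: switch out of state 0, stay in state 1
print(V)       # approximately [9.0, 10.0]
```

The evaluation step here uses repeated sweeps rather than solving the linear system directly; either choice fits the description above, and the iterative form keeps the sketch free of external dependencies.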

For this tutorial, you need Python, Jupyter Notebook, and some common libraries, including scikit-learn, installed. Please see the following tutorial for more information.

Download the Jupyter notebook here to follow along with the tutorial.

Download MDPPolicyIteration_PYscript_01 (ZIP, 155 KB)

Video Overview