Prerequisites: Q-Learning.
The derivations below use the symbols defined in the prerequisite article.
The Q-learning technique is based on the Bellman Equation:

$$v(s) = \mathbb{E}\left[R_{t+1} + \gamma\, v(S_{t+1})\right]$$

where,
- $\mathbb{E}$ : expectation
- $t+1$ : the next time step
- $\gamma$ : discount factor
Rephrasing the above equation in terms of the Q-value:

$$q_{\pi}(s, a) = \mathbb{E}\left[R_{t+1} + \gamma\, q_{\pi}(S_{t+1}, A_{t+1})\right]$$

The optimal Q-value is given by

$$q_{*}(s, a) = \mathbb{E}\left[R_{t+1} + \gamma \max_{a'} q_{*}(S_{t+1}, a')\right]$$
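To make the recursion concrete, here is a minimal Python sketch that repeatedly applies the optimal backup $q_{*}(s, a) = \mathbb{E}[R_{t+1} + \gamma \max_{a'} q_{*}(S_{t+1}, a')]$ to a toy two-state MDP. The MDP, the transition table `P`, and all constants are illustrative assumptions, not part of the article.

```python
GAMMA = 0.9  # discount factor

# Hypothetical toy MDP: P[(s, a)] -> list of (probability, next_state, reward)
P = {
    ("s0", "stay"): [(1.0, "s0", 0.0)],
    ("s0", "go"):   [(1.0, "s1", 1.0)],
    ("s1", "stay"): [(1.0, "s1", 2.0)],
    ("s1", "go"):   [(1.0, "s0", 0.0)],
}
ACTIONS = ["stay", "go"]

# Current estimate of q*(s, a), initialized to zero
Q = {sa: 0.0 for sa in P}

def backup(s, a):
    """One application of q*(s, a) = E[r + gamma * max_a' q*(s', a')]."""
    return sum(p * (r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS))
               for p, s2, r in P[(s, a)])

# Sweep the backup until the values stop changing (Q-value iteration)
for _ in range(200):
    Q = {sa: backup(*sa) for sa in P}

print({sa: round(v, 2) for sa, v in Q.items()})
# e.g. q*("s1", "stay") converges to 2 / (1 - 0.9) = 20.0
```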
Policy Iteration: It is the process of determining the optimal policy for the model, and it consists of the following two steps:
- Policy Evaluation: This step estimates the long-term value function V under the greedy policy obtained from the last Policy Improvement step.
- Policy Improvement: This step updates the policy with the action that maximizes V for each state. The two steps are repeated until the policy converges; a runnable sketch of the full loop is given after the steps below.
Steps Involved:
- Initialization:
  V(s) = any random real number
  π(s) = any a ∈ A(s), chosen arbitrarily
- Policy Evaluation:
  while (true) {
      Δ = 0
      for each s in S {
          v = V(s)
          V(s) = Σ_{s', r} p(s', r | s, π(s)) [r + γ V(s')]
          Δ = max(Δ, |v − V(s)|)
      }
      if (Δ < θ) break   // θ is a small positive threshold
  }
- Policy Improvement:
  policy_stable = true
  for each s in S {
      old_action = π(s)
      π(s) = argmax_a Σ_{s', r} p(s', r | s, a) [r + γ V(s')]
      if (old_action ≠ π(s)) policy_stable = false
  }
  if (policy_stable) return V, π
  else go back to Policy Evaluation
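Putting Policy Evaluation and Policy Improvement together, the following is a minimal, self-contained Python sketch of Policy Iteration on a hypothetical five-state random-walk MDP. The MDP, the threshold θ = 1e-8, and all other constants are illustrative assumptions.

```python
import random

GAMMA, THETA = 0.9, 1e-8
STATES = list(range(5))        # states 0..4; state 4 is terminal
ACTIONS = [-1, +1]             # step left or right

def transitions(s, a):
    """Toy deterministic dynamics: list of (prob, next_state, reward)."""
    s2 = max(0, min(4, s + a))
    return [(1.0, s2, 1.0 if s2 == 4 else 0.0)]  # reward only at the goal

# Initialization: V(s) = random real number, pi(s) = arbitrary action
V = {s: random.random() for s in STATES}
V[4] = 0.0                     # terminal state is pinned to value 0
pi = {s: random.choice(ACTIONS) for s in STATES[:-1]}

while True:
    # Policy Evaluation: sweep until the value function stops changing
    while True:
        delta = 0.0
        for s in STATES[:-1]:
            v = V[s]
            V[s] = sum(p * (r + GAMMA * V[s2])
                       for p, s2, r in transitions(s, pi[s]))
            delta = max(delta, abs(v - V[s]))
        if delta < THETA:
            break
    # Policy Improvement: act greedily with respect to the current V
    policy_stable = True
    for s in STATES[:-1]:
        old_action = pi[s]
        pi[s] = max(ACTIONS, key=lambda a: sum(
            p * (r + GAMMA * V[s2]) for p, s2, r in transitions(s, a)))
        if old_action != pi[s]:
            policy_stable = False
    if policy_stable:
        break                  # pi is greedy w.r.t. its own value: optimal

print("V:", {s: round(V[s], 3) for s in STATES})
print("pi:", pi)               # every non-terminal state should choose +1
```

The two-level loop mirrors the steps above: the inner while is Policy Evaluation, the greedy sweep is Policy Improvement, and policy_stable implements the convergence check.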
Value Iteration: This process updates the function V according to the Optimal Bellman Equation:

$$v_{k+1}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma\, v_k(s')\right]$$
Working Steps:
- Initialization: Initialize the array V with arbitrary random real values.
- Computing the optimal value:
  while (true) {
      Δ = 0
      for each s in S {
          v = V(s)
          V(s) = max_a Σ_{s', r} p(s', r | s, a) [r + γ V(s')]
          Δ = max(Δ, |v − V(s)|)
      }
      if (Δ < θ) break
  }
  return the deterministic policy π(s) = argmax_a Σ_{s', r} p(s', r | s, a) [r + γ V(s')]
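Below is a matching Python sketch of Value Iteration on the same hypothetical random-walk MDP used above; it folds the max over actions directly into the value backup, so no explicit policy is kept until the end (again, the MDP and constants are illustrative assumptions).

```python
GAMMA, THETA = 0.9, 1e-8
STATES = list(range(5))        # states 0..4; state 4 is terminal
ACTIONS = [-1, +1]

def transitions(s, a):
    """Toy deterministic dynamics: list of (prob, next_state, reward)."""
    s2 = max(0, min(4, s + a))
    return [(1.0, s2, 1.0 if s2 == 4 else 0.0)]

def q(s, a, V):
    """Expected return of taking a in s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in transitions(s, a))

# Initialization: arbitrary values (terminal pinned to 0)
V = {s: 0.0 for s in STATES}

# Computing the optimal value: apply the optimal Bellman backup until the
# largest change in a full sweep falls below the threshold theta
while True:
    delta = 0.0
    for s in STATES[:-1]:
        v = V[s]
        V[s] = max(q(s, a, V) for a in ACTIONS)
        delta = max(delta, abs(v - V[s]))
    if delta < THETA:
        break

# Extract the deterministic greedy policy from the converged V
pi = {s: max(ACTIONS, key=lambda a: q(s, a, V)) for s in STATES[:-1]}
print("V*:", {s: round(V[s], 3) for s in STATES})
print("pi*:", pi)
```

Compared with Policy Iteration, Value Iteration avoids the nested evaluation loop at the cost of a max over actions in every backup, which usually makes each sweep slightly more expensive but the overall loop simpler.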