Doubt about belief update #465

Manavvora · 2023-02-28T20:33:43Z

Hi,

I am using the BasicPOMCP algorithm to solve a POMDP that I created with the QuickPOMDP interface, and I have a question about the belief update.

I have defined my transition and observation functions as follows:

                transition = function(s,a)
                    if a == a1
                        return #some distribution over next states
                    elseif a == a2
                        return #some distribution over next states
                    elseif a == a3
                        return #some distribution over next states
                    end
                end,

                observation = function(s,a,sp)
                    if a == a1
                        return Deterministic(sp)
                    elseif a == a2
                        return Deterministic(sp)
                    elseif a == a3
                        return Deterministic(e)
                    end
                end,

So if I take action a1 or a2, the observation equals the exact value of the next state, and if I take action a3, I get an error observation 'e' that is not part of the state space. For example, if my state space is {0,1,2,3}, then e could be 4.

My question is: For a1 and a2, will the belief also be deterministic(sp)? I ask because, in theory, the optimal policy should not take action a1 twice in a row, since it already knows the state it is in, but the policy I get from POMCP still does. For now I have created a selective_actions function and added an if condition to prevent the same. But is there a better work-around for this?

zsunberg (Maintainer) · 2023-03-10T23:41:27Z

Hey @Manavvora, sorry this slipped through the cracks and I forgot to answer.

> For a1 and a2, will the belief also be deterministic(sp)?

Yes! pdf(new_belief, sp) will be 1.0 (as long as sp has a nonzero probability given the transition matrix). This is mathematically equivalent to Deterministic(sp), but it may not be represented by a Deterministic object.
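This follows from the standard discrete Bayes belief update. As a sketch, with notation assumed here rather than taken from the thread ($T$ for the transition model, $Z$ for the observation model, $b$ for the previous belief):

```latex
b'(s') = \frac{Z(o \mid a, s') \sum_{s} T(s' \mid s, a)\, b(s)}
              {\sum_{s''} Z(o \mid a, s'') \sum_{s} T(s'' \mid s, a)\, b(s)}

% For a1 and a2 the observation is Deterministic(sp), so Z(o | a, s') = 1 if s' = o, else 0:
b'(s') = \begin{cases} 1 & \text{if } s' = o \\ 0 & \text{otherwise} \end{cases}
\qquad \text{provided } \sum_{s} T(o \mid s, a)\, b(s) > 0

% For a3 the observation is always e, so Z(e | a3, s') = 1 for every s' and cancels:
b'(s') = \sum_{s} T(s' \mid s, a_3)\, b(s)
```

In other words, a1 and a2 collapse the belief onto the observed state, while a3 only pushes the previous belief through the transition model without adding information.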

I would encourage you to check this for yourself, by running something like this:

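using POMDPs, POMDPTools  # update, pdf, @gen; DiscreteUpdater comes from POMDPTools

up = DiscreteUpdater(pomdp_model)  # assumed updater; `belief` must be a belief type it accepts
s = rand(belief)
sp, o = @gen(:sp, :o)(pomdp_model, s, a1)
new_belief = update(up, belief, a1, o)  # POMDPs.update takes the updater as its first argument
@assert pdf(new_belief, sp) == 1.0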

> For now I have created a selective_actions function and added an if condition to prevent the same. But is there a better work-around for this?

If you know that an action is never in the optimal policy, using a belief-dependent action space as you have described will help the solver, because it will not have to waste computation trying that action. But I don't think this will have a huge effect on performance; there are many other factors that have a much more significant effect. Are you encountering poor solver performance?
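For illustration, a belief-dependent action set of the kind discussed above might look like the following minimal sketch. It is plain Julia and does not use the POMDPs.jl API: the belief is a hypothetical `Dict` from state to probability, the actions are hypothetical symbols, and `is_certain` is a helper invented for this example. Only the name `selective_actions` comes from the question.

```julia
# Hypothetical stand-ins: actions are symbols, and a belief is a Dict mapping
# each state to its probability. A real solver would use its own belief type.
const ALL_ACTIONS = [:a1, :a2, :a3]

# True when the belief puts (numerically) all of its mass on a single state.
is_certain(b::Dict; tol=1e-9) = maximum(values(b)) >= 1 - tol

# Belief-dependent action set: drop :a1 once the state is already known,
# mirroring the "if condition" described in the question.
function selective_actions(b::Dict)
    return is_certain(b) ? filter(a -> a != :a1, ALL_ACTIONS) : ALL_ACTIONS
end

selective_actions(Dict(0 => 1.0, 1 => 0.0))  # [:a2, :a3]
selective_actions(Dict(0 => 0.5, 1 => 0.5))  # [:a1, :a2, :a3]
```

How such a function is hooked into a particular solver depends on that solver's interface, which this sketch does not assume.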

Manavvora Mar 11, 2023
Author

Thank you for the reply @zsunberg! No, I’m not encountering poor solver performance. Thanks a lot for your help.
