Effective Algorithms for POMDP Optimization

Algorithms for POMDP Presented by Alp Sardağ

Monahan Enumeration Phase • Generate all vectors: number of generated vectors = |A| · M^|Θ|, where M is the number of vectors from the previous step and Θ is the set of observations.
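
The enumeration step can be sketched in a few lines. This is a minimal illustration, not Monahan's original formulation: the tiny two-state model (`T`, `O`, `R`), the discount factor, and the function name `enumerate_vectors` are all assumptions made for the example. It builds one new vector for every action and every way of assigning a previous-step vector to each observation.

```python
from itertools import product

def enumerate_vectors(prev_vectors, T, O, R, discount):
    """Exhaustively build one new vector per (action, choice of previous vectors).

    T[a][s][s2] : probability of moving from state s to s2 under action a
    O[a][s2][z] : probability of observing z after reaching s2 via action a
    R[a][s]     : immediate reward for action a in state s
    (All three tables are hypothetical inputs for this sketch.)
    """
    n_actions, n_states, n_obs = len(R), len(R[0]), len(O[0][0])
    new_vectors = []
    for a in range(n_actions):
        # choice[z] is the previous-step vector used after observation z
        for choice in product(prev_vectors, repeat=n_obs):
            vec = tuple(
                R[a][s] + discount * sum(
                    T[a][s][s2] * O[a][s2][z] * choice[z][s2]
                    for z in range(n_obs)
                    for s2 in range(n_states)
                )
                for s in range(n_states)
            )
            new_vectors.append((a, vec))
    return new_vectors

# Hypothetical model: 2 states, 2 actions, 2 observations.
T = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.5, 0.5]]]
O = [[[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.5], [0.5, 0.5]]]
R = [[1.0, 0.0], [0.0, 1.0]]
prev = [(0.0, 0.0), (1.0, 2.0)]

vectors = enumerate_vectors(prev, T, O, R, discount=0.95)
print(len(vectors))  # 2 actions * 2 vectors ** 2 observations = 8
```

The count grows exponentially in the number of observations, which is why the reduction phase matters.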

Monahan Reduction Phase • All vectors can be kept: • Each time, maximize over all vectors. • This carries a lot of excess baggage. • The number of vectors in the next step will be even larger. • An LP is used to trim away useless vectors.
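
A small sketch of what "maximize over all vectors" means, assuming beliefs and vectors are plain sequences of numbers (the function name `value` is illustrative). It also shows why a useless vector is only baggage: adding it does not change the value anywhere.

```python
def value(belief, vectors):
    """Value of a belief state: the best dot product over all vectors."""
    return max(sum(b * g for b, g in zip(belief, vec)) for vec in vectors)

useful = [(1.0, 0.0), (0.0, 1.0)]
with_baggage = useful + [(0.2, 0.2)]  # never the maximizer at any belief

print(value([0.5, 0.5], useful))        # 0.5
print(value([0.5, 0.5], with_baggage))  # 0.5
```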

Monahan Reduction Phase • For a vector to be useful, there must be at least one belief point at which it gives a larger value than every other vector.
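
One common way to state this usefulness test as a linear program is sketched below. It is not necessarily the exact form used on the slides; the symbols Γ (the current vector set), γ (the vector being tested), π (a belief state) and δ (a margin variable) are notation assumed for this sketch.

```latex
\begin{aligned}
\max_{\pi,\,\delta}\quad & \delta \\
\text{s.t.}\quad & \pi \cdot \gamma - \pi \cdot \gamma' \ge \delta
      \qquad \forall\, \gamma' \in \Gamma \setminus \{\gamma\}, \\
& \sum_i \pi_i = 1, \qquad \pi_i \ge 0 \quad \forall i .
\end{aligned}
```

Under this formulation, γ is useful exactly when the optimal δ is strictly positive, because the maximizing π is then a belief point where γ beats every other vector.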

Monahan Algorithm

Monahan’s LP Complication • Formulate the LP and check for:

Eagle’s Variant of Monahan • The optimization occurs in the enumeration phase. • If, in the enumeration process, a vector’s components are completely dominated by another vector’s components, discard it. • Generate γji(t); if the following condition holds: • Discard γji(t). • The same check can be applied to see whether a new vector dominates any previously enumerated vector.
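
The component-wise domination check can be sketched as follows. The helper names `dominates` and `add_with_domination_check` are made up for this example, and vectors are assumed to be tuples of numbers. The check runs as each vector is enumerated, in both directions.

```python
def dominates(u, v):
    """True if u is at least as large as v in every component."""
    return all(ui >= vi for ui, vi in zip(u, v))

def add_with_domination_check(kept, candidate):
    """Return the kept list after considering one newly enumerated vector."""
    if any(dominates(k, candidate) for k in kept):
        return kept  # candidate is dominated: discard it
    # candidate survives: drop any earlier vector it dominates
    return [k for k in kept if not dominates(candidate, k)] + [candidate]

kept = []
for vec in [(1, 2), (0, 1), (2, 0), (3, 3)]:
    kept = add_with_domination_check(kept, vec)
print(kept)  # [(3, 3)]
```

This test needs no LP, but it only removes vectors that are dominated by a single other vector, so it can leave useless vectors that an LP-based check would still catch.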

Sondik’s One-Pass Algorithm • Find the proper set of belief states to plug into the formula below to get all necessary vectors: • The algorithm is guaranteed to visit a finite number of regions. • The union of these regions is the entire belief space.

Sondik’s One-Pass Algorithm • Simplified version of Sondik’s algorithm:

Sondik’s One-Pass Algorithm • How do we define a region around this belief state in which that vector is guaranteed to be the true linear portion of the value function? • Construct a series of constraints; when all are satisfied, the region is found. • Then go to step (5).

Sondik’s One-Pass Algorithm • The condition: γ*(t), generated at π, must stay larger than every other γ^a(t) as π varies: • Variations in π can cause changes in γ^a(t). • A new constraint is needed to ensure the components of γ^a(t) stay the same.

Sondik’s One-Pass Algorithm • What affects γ*(t) and γ^a(t)? • To ensure that no part of the function changes, these constraints exist for every combination of action a and observation θ.

Sondik’s One-Pass Algorithm • Constraints restrict belief states to lie on the belief state space simplex:
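
The simplex constraints are simply that every component is non-negative and the components sum to one. A minimal check, assuming a belief is a list of floats and using an illustrative tolerance:

```python
def on_simplex(belief, tol=1e-9):
    """True if belief is a valid probability distribution over states."""
    return all(p >= -tol for p in belief) and abs(sum(belief) - 1.0) <= tol

print(on_simplex([0.2, 0.3, 0.5]))  # True
print(on_simplex([0.6, 0.6]))       # False: sums to more than 1
print(on_simplex([1.2, -0.2]))      # False: negative component
```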

Sondik’s One-Pass Algorithm • Each constraint defines a region consisting of all the points on one side of a line:
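
As a sketch of a single such constraint, assume it has the vector-comparison form "the chosen vector is at least as good as one competitor". The names `gamma_star` and `gamma_other` are illustrative, and the other kinds of constraints in the algorithm are not covered here.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def satisfies(belief, gamma_star, gamma_other):
    """True if belief lies on the side where gamma_star is at least as good."""
    return dot(belief, gamma_star) >= dot(belief, gamma_other)

gamma_star, gamma_other = (1.0, 0.0), (0.0, 1.0)
print(satisfies([0.8, 0.2], gamma_star, gamma_other))  # True
print(satisfies([0.2, 0.8], gamma_star, gamma_other))  # False
```

A region is then the set of beliefs that pass this test for every constraint at once.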

Sondik’s One-Pass Algorithm • The LP constraints at step (4):

Sondik’s One-Pass Algorithm • In step (5), find belief states guaranteed not to be in the region defined in step (4). • With the new point, proceed exactly as in step (4). • The algorithm continues until a complete partition of the belief space is found.

Sondik’s One-Pass Algorithm • To find points in the neighboring regions, points lying on the edges of the region defined by the constraints are used:

Sondik’s One-Pass Algorithm • Which constraints are binding? • For each constraint, change its inequality into an equality. • Solve this LP. • If the LP has a solution, the constraint is binding; a non-binding constraint cannot pass through the region defined by all the other constraints.

Cheng’s Relaxed Region • Same as Sondik’s One-Pass algorithm, except that each region is specified with fewer constraints. • Defines regions that are typically larger than the actual vectors’ regions.

Cheng’s Relaxed Region • Set of constraints for the relaxed regions of Cheng:

Cheng’s Relaxed Region • Corners found with interior algorithm:

Cheng’s Linear Support • The algorithm defines an approximate value function over the entire belief space. • Refine this approximation until it reaches the optimal value function.

Cheng’s Linear Support • Difference between two algorithms:

Cheng’s Linear Support • Initialize a search list with the extreme points of the belief simplex (e.g. [1,0,0,...], [0,1,0,...]) and an empty set of vectors. • For each of these points, the true γ(t) vector is calculated and added to the set of vectors.
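
The initialization can be sketched as below; `simplex_corners` is a name chosen for this example, and computing the true vector at each point is left out because it depends on the model.

```python
def simplex_corners(n_states):
    """Extreme points of the belief simplex: one unit vector per state."""
    return [
        tuple(1.0 if i == j else 0.0 for j in range(n_states))
        for i in range(n_states)
    ]

search_list = simplex_corners(3)
vectors = set()  # filled in as true vectors are computed at each point
print(search_list)  # [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
```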

Cheng’s Linear Support • Since both the true value function and the approximation are PWLC (piecewise linear and convex), the largest difference must occur at a corner point. • Cheng then finds all the corner points of the regions induced by the approximation. • Disregard the corner points seen before and add those not seen before to the search list. • Pick a point from the search list and generate its vector. If it differs from every vector in the approximation, add it to the approximation set. • Repeat the whole procedure with the new approximation.
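
The loop above can be sketched for the two-state case, where a belief is a single number p (the probability of the first state) and corner points are just breakpoints on the interval [0, 1]. Everything here is an assumption for illustration: the real algorithm computes the true vector at a point with a dynamic-programming step, whereas this sketch stands in a hypothetical oracle `true_vector_at` that picks from a fixed hidden set `TRUE`.

```python
def value(p, vec):
    # Belief is (p, 1 - p) over two states.
    return p * vec[0] + (1 - p) * vec[1]

def corner_points(vectors, tol=1e-9):
    """Breakpoints of the upper surface of `vectors`, plus both simplex ends."""
    corners = {0.0, 1.0}
    for i, u in enumerate(vectors):
        for w in vectors[i + 1:]:
            denom = (u[0] - u[1]) - (w[0] - w[1])
            if abs(denom) < tol:
                continue  # parallel lines never cross
            p = (w[1] - u[1]) / denom
            if 0.0 <= p <= 1.0:
                top = max(value(p, v) for v in vectors)
                if abs(value(p, u) - top) < tol:  # crossing lies on the surface
                    corners.add(round(p, 9))
    return corners

def linear_support(true_vector_at):
    approx, seen, todo = [], set(), [0.0, 1.0]
    while todo:
        p = todo.pop()
        if p in seen:
            continue
        seen.add(p)
        vec = true_vector_at(p)
        if vec not in approx:
            approx.append(vec)
            # new approximation: look for corner points not visited yet
            todo.extend(c for c in corner_points(approx) if c not in seen)
    return approx

# Hypothetical stand-in for the exact per-point computation.
TRUE = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
def true_vector_at(p):
    return max(TRUE, key=lambda v: value(p, v))

result = linear_support(true_vector_at)
print(len(result))  # 3
```

The search stops when no unseen corner point produces a new vector, at which point the approximation in this toy setup matches the hidden set.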

Cheng’s Linear Support
