My Account Log in

1 option

Partially observed Markov decision processes : filtering, learning and controlled sensing / Vikram Krishnamurthy.

Cambridge eBooks: Frontlist 2025 Available online

View online
Format:
Book
Author/Creator:
Krishnamurthy, V. (Vikram), author.
Language:
English
Subjects (All):
Markov processes--Textbooks.
Markov processes.
Stochastic processes--Textbooks.
Stochastic processes.
Physical Description:
1 online resource (xv, 635 pages) : digital, PDF file(s).
Edition:
Second edition..
Place of Publication:
Cambridge : Cambridge University Press, 2025.
Summary:
Covering formulation, algorithms and structural results and linking theory to real-world applications in controlled sensing (including social learning, adaptive radars and sequential detection), this book focuses on the conceptual foundations of partially observed Markov decision processes (POMDPs). It emphasizes structural results in stochastic dynamic programming, enabling graduate students and researchers in engineering, operations research, and economics to understand the underlying unifying themes without getting weighed down by mathematical technicalities. In light of major advances in machine learning over the past decade, this edition includes a new Part V on inverse reinforcement learning as well as a new chapter on non-parametric Bayesian inference (for Dirichlet processes and Gaussian processes), variational Bayes and conformal prediction.
Contents:
Cover
Half-title
Epigraph
Title page
Imprints page
Contents
Preface to Revised Edition
Notation
1 Introduction
1.1 Part I. Stochastic Models and Bayesian Inference
1.2 Part II. POMDPs. Models, Algorithms and Applications
1.3 Part III. POMDP Structural Results
1.4 Part IV. Stochastic Gradient Algorithms and Reinforcement Learning
1.5 Part V. Inverse Reinforcement Learning
1.6 Examples of Controlled Sensing
Part I Stochastic Models and Bayesian Inference
2 Stochastic State Space Models
2.1 Stochastic State Space Model
2.2 Optimal Prediction: Chapman-Kolmogorov Equation
2.3 Example 1. Linear Gaussian State Space Model
2.4 Example 2. Finite-State Hidden Markov Model (HMM)
2.5 Example 3. Jump Markov Linear Systems (JMLS)
2.6 Modeling Maneuvering Target in Clutter
2.7 Geometric Ergodicity of HMM Predictor: Dobrushin's Coefficient
2.8 Complements and Sources
Appendix 2.A Wasserstein Distance
Appendix 2.B Coupling Inequality
Appendix 2.C Proof of Theorem 2.8 and Lemma 2.9
3 Optimal Filtering
3.1 Optimal State Estimation
3.2 Conditional Expectation and Bregman Divergence
3.3 Optimal Filtering, Prediction and Smoothing Formulas
3.4 Kalman Filter
3.5 Hidden Markov Model (HMM) Filter
3.6 Examples. Markov Modulated Time Series, Out of Sequence Measurements and Reciprocal Processes
3.7 Geometric Ergodicity of HMM Filter
3.8 Suboptimal Filters
3.9 Particle Filter
3.10 Example: Jump Markov Linear Systems (JMLS)
3.11 Complements and Sources
Appendix 3.A Max-Plus Algebra Interpretation of Viterbi Algorithm
Appendix 3.B Proof of Lemma 3.9 on page 57
Appendix 3.C Proof of Lemma 3.10 on page 57
Appendix 3.D Proof of Theorem 3.12 on page 58
4 Algorithms for Maximum Likelihood Parameter Estimation.
4.1 Maximum Likelihood Estimation Criterion
4.2 MLE of Partially Observed Models
4.3 Expectation Maximization (EM) Algorithm
4.4 Forward-Only Filter-Based EM Algorithms
4.5 Method-of-Moments Estimator for HMMs
4.6 Complements and Sources
Appendix 4.A Minorization Maximization Algorithm
Appendix 4.B Variants of EM
Appendix 4.C Consistency of the MLE
5 Multiagent Sensing: Social Learning and Data Incest
5.1 Multiagent Bayesian Social Learning
5.2 Information Cascades and Constrained Social Learning
5.3 Risk Averse and Rational Inattention Based Social Learning
5.4 Slow Social Learning with Interacting Kalman Filters
5.5 Data Incest in Online Reputation Systems
5.6 Fair Online Reputation System
5.7 Complements and Sources
Appendix 5.A Polling Social Networks and Friendship Paradox
Appendix 5.B Proofs of Theorems
6 Nonparametric Bayesian Inference
6.1 Conjugate Priors, Bayesian Models and Clustering
6.2 Dirichlet Random Variables and Bayesian Inference
6.3 Mixture Distributions and Random Mixing Measures
6.4 Dirichlet Processes and Bayesian Inference
6.5 Dirichlet Process Mixture and Bayesian Inference
6.6 Gaussian Process Regression
6.7 Variational Bayesian Inference
6.8 Variational Autoencoder
6.9 Conformal Prediction
6.10 Complements and Sources
Appendix 6.A Exchangeable Random Variables
Appendix 6.B Supervised Bayes Classifiers
Part II POMDPs. Models, Algorithms and Applications
7 Fully Observed Markov Decision Processes
7.1 Finite-State Finite Horizon MDP
7.2 Bellman's Stochastic Dynamic Programming Algorithm
7.3 Continuous-State MDP
7.4 Infinite Horizon Discounted Cost MDP
7.5 Infinite Horizon Average Cost MDP
7.6 Average Cost Constrained Markov Decision Process
7.7 Entropy-Regularized Markov Decision Process.
7.8 Distributionally Robust MDP
7.9 Inverse Optimal Control
7.10 Complements and Sources
Appendix 7.A Time-Inconsistent Markov Decision Process
Appendix 7.B Proof of Theorem 7.2 on page 169
Appendix 7.C Proof of Theorem 7.8 on page 178
8 Partially Observed Markov Decision Processes
8.1 Finite Horizon POMDP
8.2 Belief State Formulation of POMDP
8.3 Stochastic Dynamic Programming for POMDP
8.4 Numerical Examples of POMDPs
8.5 Finite Horizon POMDP. Finite-Dimensional Controller
8.6 Exact Algorithms for Finite Horizon POMDPs
8.7 Suboptimal Algorithms for Finite Horizon POMDPs
8.8 Infinite Horizon Discounted Cost POMDP
8.9 Example: Optimal Search for a Markovian Moving Target
8.10 Complements and Sources
9 POMDPs in Controlled Sensing and Sensor Scheduling
9.1 Introduction
9.2 State and Sensor Control for State Space Models
9.3 Example 1: Linear Gaussian Control with Sensor Scheduling
9.4 Example 2: POMDPs in Controlled Sensing
9.5 Example 3. Multiagent Controlled Sensing with Social Learning
9.6 Risk Averse MDPs and POMDPs
9.7 Notes and Sources
Appendix 9.A Convexity of Negative Conditional Entropy
Appendix 9.B Proof of Theorems
Part III POMDP Structural Results
10 Structural Results for Markov Decision Processes
10.1 Submodularity and Supermodularity
10.2 First-Order Stochastic (FoS) Dominance
10.3 Monotone Optimal Policies for MDPs
10.4 Trading Off Decreasing Costs with Submodular Costs
10.5 How Does the Optimal Cost Depend on Transition Matrix?
10.6 Integer-Concavity and Multimodularity of Value Function
10.7 Algorithms for Monotone Policies - Exploiting Sparsity
10.8 Example: Transmission Scheduling in Markov Correlated Channel
10.9 Complements and Sources
Appendix 10.A Interval Dominance Structural Results for MDPs.
Appendix 10.B Slepian and Sudakov-Fernique Inequalities
Appendix 10.C Proofs of Theorems
11 Structural Results for Optimal Filters
11.1 Monotone Likelihood Ratio (MLR) Stochastic Order
11.2 Total Positivity of Order 2 (TP2)
11.3 Copositive Ordering of Stochastic Matrices
11.4 Monotone Properties of Optimal Filter (F1)-(F4)
11.5 Convex Stochastic Dominance of Optimal Filter using Blackwell Dominance
11.6 Examples of Blackwell Dominance in Filtering and Localization
11.7 Reduced-Complexity HMM Filtering with Sample-Path Bounds
11.8 MLR Monotonicity of Optimal Filter w.r.t. Observation Probabilities
11.9 Complements and Sources
Appendix 11.A Strassen's Theorem
Appendix 11.B Totally Positive of Order .. (TP..) Kernels
Appendix 11.C Proofs of Theorems
12 Monotonicity of Value Function for POMDPs
12.1 POMDP Model and Assumptions
12.2 Main Result: Monotone Value Function in Belief State
12.3 Example 1: Monotone Policies for Two-State POMDPs
12.4 Example 2: POMDP Multi-armed Bandits Structural Results
12.5 Complements and Sources
Appendix 12.A Neyman-Pearson Detector
Appendix 12.B Proof of Theorem 12.3 on page 317
13 Structural Results for Stopping-Time POMDPs
13.1 Introduction
13.2 Stopping-Time POMDP Model and Value Iteration Bounds
13.3 Convexity of Stopping Set
13.4 Monotone Optimal Policy for Stopping-Time POMDP
13.5 Undiscounted Stopping-Time POMDPs
13.6 Optimal Policy is Characterized by Threshold Curve
13.7 Explicit Stopping Set with One-Step Lookahead Property
13.8 Characterization of Optimal Linear Threshold Policy
13.9 Example. Partially Observed Machine Replacement
13.10 Multiple Stopping-Time POMDP. Interactive Advertising
13.11 Multivariate Stopping-Time POMDPs
13.12 Radar Scheduling with Mutual Information Cost.
13.13 Complements and Sources
Appendix 13.A Lattices and Submodularity
Appendix 13.B MLR Dominance on Lines. Additional Properties
Appendix 13.C Proof of Theorem 13.15 on page 344
Appendix 13.D Proof of Theorem 13.21 on page 359
Appendix 13.E Proof of Theorem 13.24 on page 366
14 Stopping-Time POMDPs for Quickest Detection
14.1 Example 1: Quickest Detection. Phase-Distributed Change Time and Variance Penalty
14.2 Example 2: Quickest Transient Detection
14.3 Example 3: Risk Sensitive Quickest Detection with Exponential Delay Penalty
14.4 Example 4: Quickest Detection with Multiagent Social Learning
14.5 Example 5: Constrained Social Optimum and Quickest Time Herding
14.6 Example 6: Optimal Pricing and Controlled Information Fusion
14.7 Example 7: Quickest Detection with Controlled Sampling
14.8 Complements and Sources
Appendix 14.A Proof of Theorem 14.5 on page 389
15 Myopic Policy Bounds for POMDPs and Sensitivity to Model Parameters
15.1 Discounted Cost POMDP
15.2 Myopic Policies using Copositive Dominance: Insight
15.3 Myopic Bounds for Optimal Policy using Copositive Dominance
15.4 Optimizing Myopic Policy Bounds to Match Optimal Policy
15.5 Controlled Sensing POMDPs with Quadratic Costs
15.6 Myopic Policy Bounds using Blackwell Dominance
15.7 Combining Blackwell and Copositive Dominance
15.8 How Does Optimal POMDP Cost Vary with State and Observation Dynamics?
15.9 Sensitivity of POMDP to Misspecified Model and Policy
15.10 Complements and Sources
Appendix 15.A Proof of Theorem 15.10 on page 417
Appendix 15.B Proof of Theorem 15.11 on page 418
Part IV Stochastic Gradient Algorithms and Reinforcement Learning
16 Stochastic Optimization and Gradient Estimation
16.1 Stochastic Gradient Algorithm
16.2 How Long to Simulate a Markov Chain?.
16.3 Gradient Estimators for Markov Processes - Overview.
Notes:
Title from publisher's bibliographic system (viewed on 16 May 2025).
Description based on publisher supplied metadata and other sources.
ISBN:
1-009-44941-9
1-009-44944-3
OCLC:
1574120651

The Penn Libraries is committed to describing library materials using current, accurate, and responsible language. If you discover outdated or inaccurate language, please fill out this feedback form to report it and suggest alternative language.

Find

Home Release notes

My Account

Shelf Request an item Bookmarks Fines and fees Settings

Guides

Using the Find catalog Using Articles+ Using your account