1 option
Partially observed Markov decision processes : filtering, learning and controlled sensing / Vikram Krishnamurthy.
- Format:
- Book
- Author/Creator:
- Krishnamurthy, V. (Vikram), author.
- Language:
- English
- Subjects (All):
- Markov processes--Textbooks.
- Markov processes.
- Stochastic processes--Textbooks.
- Stochastic processes.
- Physical Description:
- 1 online resource (xv, 635 pages) : digital, PDF file(s).
- Edition:
- Second edition..
- Place of Publication:
- Cambridge : Cambridge University Press, 2025.
- Summary:
- Covering formulation, algorithms and structural results and linking theory to real-world applications in controlled sensing (including social learning, adaptive radars and sequential detection), this book focuses on the conceptual foundations of partially observed Markov decision processes (POMDPs). It emphasizes structural results in stochastic dynamic programming, enabling graduate students and researchers in engineering, operations research, and economics to understand the underlying unifying themes without getting weighed down by mathematical technicalities. In light of major advances in machine learning over the past decade, this edition includes a new Part V on inverse reinforcement learning as well as a new chapter on non-parametric Bayesian inference (for Dirichlet processes and Gaussian processes), variational Bayes and conformal prediction.
- Contents:
- Cover
- Half-title
- Epigraph
- Title page
- Imprints page
- Contents
- Preface to Revised Edition
- Notation
- 1 Introduction
- 1.1 Part I. Stochastic Models and Bayesian Inference
- 1.2 Part II. POMDPs. Models, Algorithms and Applications
- 1.3 Part III. POMDP Structural Results
- 1.4 Part IV. Stochastic Gradient Algorithms and Reinforcement Learning
- 1.5 Part V. Inverse Reinforcement Learning
- 1.6 Examples of Controlled Sensing
- Part I Stochastic Models and Bayesian Inference
- 2 Stochastic State Space Models
- 2.1 Stochastic State Space Model
- 2.2 Optimal Prediction: Chapman-Kolmogorov Equation
- 2.3 Example 1. Linear Gaussian State Space Model
- 2.4 Example 2. Finite-State Hidden Markov Model (HMM)
- 2.5 Example 3. Jump Markov Linear Systems (JMLS)
- 2.6 Modeling Maneuvering Target in Clutter
- 2.7 Geometric Ergodicity of HMM Predictor: Dobrushin's Coefficient
- 2.8 Complements and Sources
- Appendix 2.A Wasserstein Distance
- Appendix 2.B Coupling Inequality
- Appendix 2.C Proof of Theorem 2.8 and Lemma 2.9
- 3 Optimal Filtering
- 3.1 Optimal State Estimation
- 3.2 Conditional Expectation and Bregman Divergence
- 3.3 Optimal Filtering, Prediction and Smoothing Formulas
- 3.4 Kalman Filter
- 3.5 Hidden Markov Model (HMM) Filter
- 3.6 Examples. Markov Modulated Time Series, Out of Sequence Measurements and Reciprocal Processes
- 3.7 Geometric Ergodicity of HMM Filter
- 3.8 Suboptimal Filters
- 3.9 Particle Filter
- 3.10 Example: Jump Markov Linear Systems (JMLS)
- 3.11 Complements and Sources
- Appendix 3.A Max-Plus Algebra Interpretation of Viterbi Algorithm
- Appendix 3.B Proof of Lemma 3.9 on page 57
- Appendix 3.C Proof of Lemma 3.10 on page 57
- Appendix 3.D Proof of Theorem 3.12 on page 58
- 4 Algorithms for Maximum Likelihood Parameter Estimation.
- 4.1 Maximum Likelihood Estimation Criterion
- 4.2 MLE of Partially Observed Models
- 4.3 Expectation Maximization (EM) Algorithm
- 4.4 Forward-Only Filter-Based EM Algorithms
- 4.5 Method-of-Moments Estimator for HMMs
- 4.6 Complements and Sources
- Appendix 4.A Minorization Maximization Algorithm
- Appendix 4.B Variants of EM
- Appendix 4.C Consistency of the MLE
- 5 Multiagent Sensing: Social Learning and Data Incest
- 5.1 Multiagent Bayesian Social Learning
- 5.2 Information Cascades and Constrained Social Learning
- 5.3 Risk Averse and Rational Inattention Based Social Learning
- 5.4 Slow Social Learning with Interacting Kalman Filters
- 5.5 Data Incest in Online Reputation Systems
- 5.6 Fair Online Reputation System
- 5.7 Complements and Sources
- Appendix 5.A Polling Social Networks and Friendship Paradox
- Appendix 5.B Proofs of Theorems
- 6 Nonparametric Bayesian Inference
- 6.1 Conjugate Priors, Bayesian Models and Clustering
- 6.2 Dirichlet Random Variables and Bayesian Inference
- 6.3 Mixture Distributions and Random Mixing Measures
- 6.4 Dirichlet Processes and Bayesian Inference
- 6.5 Dirichlet Process Mixture and Bayesian Inference
- 6.6 Gaussian Process Regression
- 6.7 Variational Bayesian Inference
- 6.8 Variational Autoencoder
- 6.9 Conformal Prediction
- 6.10 Complements and Sources
- Appendix 6.A Exchangeable Random Variables
- Appendix 6.B Supervised Bayes Classifiers
- Part II POMDPs. Models, Algorithms and Applications
- 7 Fully Observed Markov Decision Processes
- 7.1 Finite-State Finite Horizon MDP
- 7.2 Bellman's Stochastic Dynamic Programming Algorithm
- 7.3 Continuous-State MDP
- 7.4 Infinite Horizon Discounted Cost MDP
- 7.5 Infinite Horizon Average Cost MDP
- 7.6 Average Cost Constrained Markov Decision Process
- 7.7 Entropy-Regularized Markov Decision Process.
- 7.8 Distributionally Robust MDP
- 7.9 Inverse Optimal Control
- 7.10 Complements and Sources
- Appendix 7.A Time-Inconsistent Markov Decision Process
- Appendix 7.B Proof of Theorem 7.2 on page 169
- Appendix 7.C Proof of Theorem 7.8 on page 178
- 8 Partially Observed Markov Decision Processes
- 8.1 Finite Horizon POMDP
- 8.2 Belief State Formulation of POMDP
- 8.3 Stochastic Dynamic Programming for POMDP
- 8.4 Numerical Examples of POMDPs
- 8.5 Finite Horizon POMDP. Finite-Dimensional Controller
- 8.6 Exact Algorithms for Finite Horizon POMDPs
- 8.7 Suboptimal Algorithms for Finite Horizon POMDPs
- 8.8 Infinite Horizon Discounted Cost POMDP
- 8.9 Example: Optimal Search for a Markovian Moving Target
- 8.10 Complements and Sources
- 9 POMDPs in Controlled Sensing and Sensor Scheduling
- 9.1 Introduction
- 9.2 State and Sensor Control for State Space Models
- 9.3 Example 1: Linear Gaussian Control with Sensor Scheduling
- 9.4 Example 2: POMDPs in Controlled Sensing
- 9.5 Example 3. Multiagent Controlled Sensing with Social Learning
- 9.6 Risk Averse MDPs and POMDPs
- 9.7 Notes and Sources
- Appendix 9.A Convexity of Negative Conditional Entropy
- Appendix 9.B Proof of Theorems
- Part III POMDP Structural Results
- 10 Structural Results for Markov Decision Processes
- 10.1 Submodularity and Supermodularity
- 10.2 First-Order Stochastic (FoS) Dominance
- 10.3 Monotone Optimal Policies for MDPs
- 10.4 Trading Off Decreasing Costs with Submodular Costs
- 10.5 How Does the Optimal Cost Depend on Transition Matrix?
- 10.6 Integer-Concavity and Multimodularity of Value Function
- 10.7 Algorithms for Monotone Policies - Exploiting Sparsity
- 10.8 Example: Transmission Scheduling in Markov Correlated Channel
- 10.9 Complements and Sources
- Appendix 10.A Interval Dominance Structural Results for MDPs.
- Appendix 10.B Slepian and Sudakov-Fernique Inequalities
- Appendix 10.C Proofs of Theorems
- 11 Structural Results for Optimal Filters
- 11.1 Monotone Likelihood Ratio (MLR) Stochastic Order
- 11.2 Total Positivity of Order 2 (TP2)
- 11.3 Copositive Ordering of Stochastic Matrices
- 11.4 Monotone Properties of Optimal Filter (F1)-(F4)
- 11.5 Convex Stochastic Dominance of Optimal Filter using Blackwell Dominance
- 11.6 Examples of Blackwell Dominance in Filtering and Localization
- 11.7 Reduced-Complexity HMM Filtering with Sample-Path Bounds
- 11.8 MLR Monotonicity of Optimal Filter w.r.t. Observation Probabilities
- 11.9 Complements and Sources
- Appendix 11.A Strassen's Theorem
- Appendix 11.B Totally Positive of Order .. (TP..) Kernels
- Appendix 11.C Proofs of Theorems
- 12 Monotonicity of Value Function for POMDPs
- 12.1 POMDP Model and Assumptions
- 12.2 Main Result: Monotone Value Function in Belief State
- 12.3 Example 1: Monotone Policies for Two-State POMDPs
- 12.4 Example 2: POMDP Multi-armed Bandits Structural Results
- 12.5 Complements and Sources
- Appendix 12.A Neyman-Pearson Detector
- Appendix 12.B Proof of Theorem 12.3 on page 317
- 13 Structural Results for Stopping-Time POMDPs
- 13.1 Introduction
- 13.2 Stopping-Time POMDP Model and Value Iteration Bounds
- 13.3 Convexity of Stopping Set
- 13.4 Monotone Optimal Policy for Stopping-Time POMDP
- 13.5 Undiscounted Stopping-Time POMDPs
- 13.6 Optimal Policy is Characterized by Threshold Curve
- 13.7 Explicit Stopping Set with One-Step Lookahead Property
- 13.8 Characterization of Optimal Linear Threshold Policy
- 13.9 Example. Partially Observed Machine Replacement
- 13.10 Multiple Stopping-Time POMDP. Interactive Advertising
- 13.11 Multivariate Stopping-Time POMDPs
- 13.12 Radar Scheduling with Mutual Information Cost.
- 13.13 Complements and Sources
- Appendix 13.A Lattices and Submodularity
- Appendix 13.B MLR Dominance on Lines. Additional Properties
- Appendix 13.C Proof of Theorem 13.15 on page 344
- Appendix 13.D Proof of Theorem 13.21 on page 359
- Appendix 13.E Proof of Theorem 13.24 on page 366
- 14 Stopping-Time POMDPs for Quickest Detection
- 14.1 Example 1: Quickest Detection. Phase-Distributed Change Time and Variance Penalty
- 14.2 Example 2: Quickest Transient Detection
- 14.3 Example 3: Risk Sensitive Quickest Detection with Exponential Delay Penalty
- 14.4 Example 4: Quickest Detection with Multiagent Social Learning
- 14.5 Example 5: Constrained Social Optimum and Quickest Time Herding
- 14.6 Example 6: Optimal Pricing and Controlled Information Fusion
- 14.7 Example 7: Quickest Detection with Controlled Sampling
- 14.8 Complements and Sources
- Appendix 14.A Proof of Theorem 14.5 on page 389
- 15 Myopic Policy Bounds for POMDPs and Sensitivity to Model Parameters
- 15.1 Discounted Cost POMDP
- 15.2 Myopic Policies using Copositive Dominance: Insight
- 15.3 Myopic Bounds for Optimal Policy using Copositive Dominance
- 15.4 Optimizing Myopic Policy Bounds to Match Optimal Policy
- 15.5 Controlled Sensing POMDPs with Quadratic Costs
- 15.6 Myopic Policy Bounds using Blackwell Dominance
- 15.7 Combining Blackwell and Copositive Dominance
- 15.8 How Does Optimal POMDP Cost Vary with State and Observation Dynamics?
- 15.9 Sensitivity of POMDP to Misspecified Model and Policy
- 15.10 Complements and Sources
- Appendix 15.A Proof of Theorem 15.10 on page 417
- Appendix 15.B Proof of Theorem 15.11 on page 418
- Part IV Stochastic Gradient Algorithms and Reinforcement Learning
- 16 Stochastic Optimization and Gradient Estimation
- 16.1 Stochastic Gradient Algorithm
- 16.2 How Long to Simulate a Markov Chain?.
- 16.3 Gradient Estimators for Markov Processes - Overview.
- Notes:
- Title from publisher's bibliographic system (viewed on 16 May 2025).
- Description based on publisher supplied metadata and other sources.
- ISBN:
- 1-009-44941-9
- 1-009-44944-3
- OCLC:
- 1574120651
The Penn Libraries is committed to describing library materials using current, accurate, and responsible language. If you discover outdated or inaccurate language, please fill out this feedback form to report it and suggest alternative language.