
Sequential Action and Beliefs — POMDP at the Center of DSGE

A Markovian classification of DSGE models that treats Partially Observable MDPs (POMDPs) as the general case of linear rational-expectations models, and derives equilibrium as a fixed point of a mapping from observables to beliefs.

Sapiens Q 6 min read

Background

Dynamic Stochastic General Equilibrium (DSGE) models share a recursive Markov structure, but almost all of them assume agents observe the true state — capital, technology, output gap — perfectly. Kim borrows a taxonomy from AI and control theory to make this assumption visible:

| | state transition exogenous | state transition endogenous |
|---|---|---|
| state observable | Markov Chain (MC) | Markov Decision Process (MDP): Lucas 1978, Blanchard & Gali 2007 |
| state hidden | Hidden Markov Model (HMM): Kydland & Prescott 1982, Lorenzoni 2009 | Partially Observable MDP (POMDP): Sargent 1991, Svensson & Woodford 2003/04 |
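The two-by-two taxonomy can be read as a lookup on two yes/no questions. The snippet below is only an illustrative sketch; the names `Environment` and `classify` are hypothetical and do not come from the paper.

```python
from typing import NamedTuple


class Environment(NamedTuple):
    """Two binary features that define the taxonomy (hypothetical names)."""
    state_observable: bool       # does the agent see the true state?
    transition_endogenous: bool  # do the agent's actions move the state?


def classify(env: Environment) -> str:
    """Return the model class for a given environment."""
    table = {
        (True, False): "MC",      # observable state, exogenous transition
        (True, True): "MDP",      # observable state, endogenous transition
        (False, False): "HMM",    # hidden state, exogenous transition
        (False, True): "POMDP",   # hidden state, endogenous transition
    }
    return table[(env.state_observable, env.transition_endogenous)]


# A central banker who cannot see the output gap she helps shape:
print(classify(Environment(state_observable=False, transition_endogenous=True)))
```

Making the state observable or the transition exogenous moves the environment to a neighboring cell, which is how the special cases discussed later arise.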

The paper argues the POMDP cell is where post-2007 macro actually lives: “trouble in capital valuation” (Gertler & Karadi 2009, Curdia 2008), signal-extraction from endogenous indicators, and central bankers who cannot see the output gap they themselves help shape.

Core Idea

Under POMDP, estimation and control no longer separate — the agent’s action changes the very state she is trying to infer. The paper follows the AI/POMDP tradition (Sondik 1971, Kaelbling et al. 1998): treat the agent’s beliefs as her inner state, and solve for an action rule on that inner state plus a belief-update rule consistent with it.

For a linear rational-expectations system with stacked state $\theta_t = [k_t\ z_t]'$ and observation channel $y_t = -B_{yy}^{-1}B_{yc}c_t - B_{yy}^{-1}B_{y\theta}\theta_t$, the equilibrium is characterized by

$$\underbrace{y_t}_{\text{what we observe}} = \underbrace{B_{\tilde{y}c} H_{c\theta}\, \theta_{t|t}}_{\text{what we believe}} + \underbrace{B_{\tilde{y}\theta}\, \theta_t}_{\text{what truly is}}.$$

An equilibrium is a fixed point of the operator that reverses this relation, mapping observables back to beliefs in a Bayesian direction. Sargent's (1991) fixed point is between perceived and actual laws of motion; Kim's fixed point is instantaneous, between observations and beliefs, which relaxes the requirement that predictable parts and residuals coincide.

Method

3.1 Solution step 1 — optimal action as a function of beliefs

Conjecture $c_t = H_{c\theta}\, \theta_{t|t}$ (certainty-equivalent on point beliefs). Substitute into the optimality and transition equations, take conditional expectations, and match undetermined coefficients. Whenever the corresponding MDP problem has a Blanchard-Kahn / Klein solution, the same $H_{c\theta}$ satisfies the POMDP quadratic, so the action rule survives the loss of observability.
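A scalar toy case may make the matching step concrete. The system here is hypothetical and not taken from the paper: a transition $k_{t+1} = a k_t + b c_t$ and an optimality condition $E_t c_{t+1} = d c_t + e k_t$. Conjecturing $c_t = h\, k_{t|t}$ and taking conditional expectations gives $h(a + bh)\,k_{t|t} = (dh + e)\,k_{t|t}$, so $h$ solves $b h^2 + (a - d)h - e = 0$. The same quadratic arises when $k_t$ is observed, which is the scalar analogue of the claim above. The sketch picks the root that keeps the closed loop stable.

```python
import math


def stable_feedback(a: float, b: float, d: float, e: float) -> float:
    """Solve b*h**2 + (a - d)*h - e = 0 and return the root with |a + b*h| < 1.

    Toy model (an assumption for illustration only):
        k' = a*k + b*c,   E[c'] = d*c + e*k,   conjecture c = h * belief(k).
    """
    disc = (a - d) ** 2 + 4.0 * b * e
    if b == 0.0 or disc < 0.0:
        raise ValueError("no real quadratic to solve for these coefficients")
    roots = [(-(a - d) + sign * math.sqrt(disc)) / (2.0 * b) for sign in (1.0, -1.0)]
    # Keep only roots whose closed-loop coefficient a + b*h is inside the unit circle.
    stable = [h for h in roots if abs(a + b * h) < 1.0]
    if len(stable) != 1:
        raise ValueError("no unique stable root for this toy system")
    return stable[0]


h = stable_feedback(a=1.1, b=-1.0, d=1.0, e=-0.06)
print(h, 1.1 - 1.0 * h)  # feedback coefficient and closed-loop coefficient
```

With these illustrative numbers the two roots are about 0.3 and -0.2; only the first gives a closed-loop coefficient below one in absolute value, mirroring the root-counting logic of Blanchard-Kahn in one dimension.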

3.2 Solution step 2 — optimal sequential beliefs

Take the action rule as given and minimize the mean squared error of beliefs. Conjecture a Kalman-style update $\theta_{t|t} - \theta_{t|t-1} = \tilde{M}_t (y_t - y_{t|t-1})$, derive $\tilde{M}_t$ from MSE minimization, and require closed-loop consistency $\tilde{M}_t = M_t$. The result is a modified Kalman recursion:

$$\begin{aligned} \theta_{t+1|t} &= (B_{\tilde\theta c} H_{c\theta} + B_{\tilde\theta\theta})\, \theta_{t|t} \\ \theta_{t|t} &= \theta_{t|t-1} + M_t (y_t - y_{t|t-1}) \\ \hat\Sigma_{t+1} &= B_{\tilde\theta\theta} \Sigma_t B_{\tilde\theta\theta}' + H_{\theta\omega}\Sigma_\omega H_{\theta\omega}' \\ \Sigma_t &= (I - M_t G_t')\, \hat\Sigma_t. \end{aligned}$$

The modification relative to standard Kalman: the gain $M_t$ is coupled to the optimal action coefficients $H_{c\theta}$ through $G_t' = \{I + B_{\tilde{y}c} H_{c\theta}\,\mathbf{m}_t\} B_{\tilde{y}\theta}$, so inference and control no longer decouple.
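A short derivation may help show where this coupling comes from. It is a sketch under one assumption not stated in this summary: that $\mathbf{m}_t$ is the matrix mapping the state-driven part of the signal into the belief revision, $\theta_{t|t} - \theta_{t|t-1} = \mathbf{m}_t B_{\tilde{y}\theta}(\theta_t - \theta_{t|t-1})$. The shorthands $A$ and $B$ are introduced here for readability only.

```latex
\begin{aligned}
y_t - y_{t|t-1}
  &= A\,(\theta_{t|t} - \theta_{t|t-1}) + B\,(\theta_t - \theta_{t|t-1}),
  \qquad A \equiv B_{\tilde{y}c} H_{c\theta},\quad B \equiv B_{\tilde{y}\theta} \\
  &= (I + A\,\mathbf{m}_t)\,B\,(\theta_t - \theta_{t|t-1})
   \;=\; G_t'\,(\theta_t - \theta_{t|t-1}), \\
\theta_{t|t} - \theta_{t|t-1}
  &= M_t\,(y_t - y_{t|t-1})
  \;\Longrightarrow\;
  \mathbf{m}_t = (I - M_t A)^{-1} M_t
  \quad \text{(when the inverse exists)}.
\end{aligned}
```

The first line subtracts the prior forecast $y_{t|t-1} = (A + B)\,\theta_{t|t-1}$ from the observation equation. Under this reading, setting $A = 0$ (actions do not enter the signal) gives $G_t' = B_{\tilde{y}\theta}$ and $\mathbf{m}_t = M_t$, which is the standard Kalman case.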

3.3 MDP and HMM as special cases

  • If $r_y = r_\theta$ with $H_{y\theta}$ full-rank, the state is instantaneously invertible from $y_t$ and POMDP collapses to MDP.
  • If only the exogenous shocks are hidden, $k_t = k_{t|t}$, and the recursion reduces to the standard Kalman filter of HMM.

So the POMDP solution is the unifier, not a separate object.
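For reference, the HMM special case can be sketched as the textbook Kalman covariance recursion, iterated until the prior covariance stops changing. This is the standard filter, not the paper's modified recursion. The names `A`, `G`, `Q`, `R` and the numbers in the usage lines are assumptions for illustration, and the measurement-noise term `R` is an addition that the recursion above does not include (set it to zero to match).

```python
import numpy as np


def kalman_step(Sigma_hat, A, G, Q, R):
    """One covariance step for theta' = A theta + w,  y = G' theta + v.

    Sigma_hat is the prior covariance; returns (gain M, posterior Sigma,
    next prior Sigma_hat). Q = Var(w), R = Var(v).
    """
    S = G.T @ Sigma_hat @ G + R                    # innovation covariance
    M = Sigma_hat @ G @ np.linalg.inv(S)           # Kalman gain
    Sigma = (np.eye(A.shape[0]) - M @ G.T) @ Sigma_hat   # Sigma = (I - M G') Sigma_hat
    Sigma_hat_next = A @ Sigma @ A.T + Q           # propagate to next period
    return M, Sigma, Sigma_hat_next


def iterate_to_convergence(Sigma_hat0, A, G, Q, R, tol=1e-7, max_iter=10_000):
    """Iterate the recursion until the prior covariance moves by less than tol."""
    Sigma_hat = Sigma_hat0
    for n in range(1, max_iter + 1):
        M, Sigma, nxt = kalman_step(Sigma_hat, A, G, Q, R)
        if np.max(np.abs(nxt - Sigma_hat)) < tol:
            return M, Sigma, nxt, n
        Sigma_hat = nxt
    raise RuntimeError("covariance recursion did not converge")


# Hypothetical example: persistent and less persistent components, only their sum observed.
A = np.diag([0.99, 0.5])
G = np.array([[1.0], [1.0]])
Q = 0.01 * np.eye(2)
R = np.zeros((1, 1))
M, Sigma, Sigma_hat, n_iter = iterate_to_convergence(np.eye(2), A, G, Q, R)
print(n_iter, M.ravel())
```

When `G` is square and invertible and `R` is zero, the posterior covariance is zero after a single update, which is the sense in which full observability collapses the filter to the MDP case.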

Experiments

Kim calibrates the textbook Neoclassical growth model at quarterly frequency ($\alpha=0.36$, $\beta=0.99$, $\delta=0.025$, $\rho=0.99$) and builds three variants: MDP, HMM (transitory + permanent technology, observable capital), and POMDP (also hidden capital). Three findings are worth carrying forward:

  1. Same action coefficients, different state vector. $H_{c\theta}$ numerically coincides across the three variants: the controls are identical functions, but of the true state under MDP, of beliefs about technology under HMM, and of beliefs about everything under POMDP.
  2. Convergence is dramatically slower under POMDP. Reaching $10^{-7}$ on the covariance takes 12 iterations under HMM but 221 under POMDP with the same initial condition. At convergence, the two filters agree (output surprise is attributed 61.6 % to permanent, 38.4 % to transitory technology), but the transient path is very different.
  3. Perception shock as a proxy for the pre-convergence regime. A 1 % positive mis-valuation of capital (the agent thinks she is richer than she is) triggers a consumption spree → capital decumulation → persistent undervaluation of permanent technology. Consumption takes more than 100 periods to return to steady state without any further shocks.
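The attribution split in finding 2 can be read as a signal-extraction weight. As a minimal sketch, assume the surprise is the sum of a permanent and a transitory component observed without noise; the least-squares gain then assigns each component a share that depends only on the prior covariance, and the shares sum to one. The inputs below are hypothetical and are not the paper's calibration.

```python
def attribution_shares(var_perm: float, var_trans: float, cov: float = 0.0):
    """Shares of a surprise in (perm + trans) attributed to each component.

    Uses the least-squares gain Sigma g / (g' Sigma g) with g = (1, 1)',
    where Sigma is the prior covariance of (perm, trans).
    """
    total = var_perm + 2.0 * cov + var_trans   # prior variance of the observed sum
    if total <= 0.0:
        raise ValueError("prior variance of the observed sum must be positive")
    return (var_perm + cov) / total, (var_trans + cov) / total


perm_share, trans_share = attribution_shares(var_perm=3.0, var_trans=1.0)
print(perm_share, trans_share)  # 0.75 0.25
```

A larger prior variance on the permanent component shifts more of any given surprise onto it, which is why the split depends on how far the filter is from its converged covariance.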

The comparison is internal and non-empirical — no baselines, no data, no seeds — so read the numbers as properties of the solution algorithm, not empirical claims.

Limitations

  • Linearity and Gaussianity are load-bearing. Weitzman (2007) and Geweke (2001) show CRRA-plus-normal can hide equity-premium-like puzzles; the same caution applies to any POMDP impulse-response built on this assumption.
  • Convergence of $\hat\Sigma_t$ is assumed. When the structural environment changes (regime switch, learning, structural break), the steady-state filter is a bad approximation for exactly the periods that matter most.
  • Partially-observable controls (trembling hand) are explicitly out of scope.
  • Only the linear-quadratic-Gaussian POMDP is tackled analytically; non-linear POMDPs in the AI sense (belief-space planning over continuous non-Gaussian beliefs) remain numerical.

References

  • Original paper: Kim, S.-H. (2012). Sequential Action and Beliefs Under Partially Observable DSGE Environments. Computational Economics 40(3), 219–244.
  • Foundational POMDP: Sondik, E. (1971). The optimal control of partially observable Markov decision processes. PhD thesis, Stanford.
  • Fixed-point-of-perceived-laws-of-motion ancestor: Sargent, T. (1991). Equilibrium with signal extraction from endogenous variables. JEDC 15(2).
  • Related filtering work: Svensson & Woodford (2003, 2004); Baxter, Graham & Wright (2011).