4 No-arbitrage models of the term structure of risk-free yields
4.1 Introduction
Risk-free yields are the yields-to-maturity associated with bonds that carry no default and/or liquidity risks. Bonds issued by sovereign entities with top credit quality are usually considered to be risk-free. Because of margin-call mechanisms, swap rates are also used as risk-free benchmarks (Darrell Duffie and Stein 2015) (see Subsection 6.2.1). (Credit and liquidity risks are covered in Section 7.)
An important share of the term-structure literature pertains to the modelling of risk-free yields. Some models explicitly involve macroeconomic factors; this can be done in a reduced-form way (e.g., Ang and Piazzesi (2003)) or in a more structural way, in the context of the equilibrium term-structure approaches (e.g., Bansal and Shaliastovich (2013), see Example 3.8) or even in the context of dynamic stochastic general equilibrium models (e.g., Hördahl, Tristani, and Vestin (2008), Dew-Becker (2014)). Many studies feature purely latent factors, with no explicit macroeconomic interpretation (e.g., Darrell Duffie and Singleton (1997), Joslin, Singleton, and Zhu (2011)). The latter are sometimes called yield-only models.
In this section, we consider the term structure of zero-coupon yields, the basic objects from which one can price any stream of fixed payoffs settled at fixed future dates. These yields are usually not directly observed in the market and therefore have to be constructed first; Appendix 4.5 briefly presents approaches used for that purpose.
Under regularity and no-arbitrage assumptions, the price of a risk-free zero-coupon bond, and the associated yield-to-maturity, satisfy (see Eq. (2.3)): \[\begin{eqnarray} B_{t,h} &=& \mathbb{E}_t (\mathcal{M}_{t,t+h})\nonumber\\ &=& \mathbb{E}_t (\mathcal{M}_{t,t+1} \times \dots \times \mathcal{M}_{t+h-1,t+h})\nonumber\\ &=& \mathbb{E}^{\mathbb{Q}}_t \exp(-i_t -i_{t+1}-\dots-i_{t+h-1})\nonumber\\ i_{t,h} &=& - \frac{1}{h} \log B_{t,h}, \tag{4.1} \end{eqnarray}\] where \(\mathcal{M}_{t,t+h}\) is the (strictly positive) stochastic discount factor between dates \(t\) and \(t+h\), and where the risk-neutral measure \(\mathbb{Q}\) is defined with respect to the physical measure \(\mathbb{P}\) by means of the Radon-Nikodym derivative \(\mathcal{M}_{t,t+1}\big/\mathbb{E}_t(\mathcal{M}_{t,t+1})\).
Term structure models are often used to extract term premiums from observed yields-to-maturity. Term premiums are those components of yields that would not exist if investors were not risk-averse (see, e.g., Subsection 3.1.6).
If agents were not risk averse, i.e., under the Expectation Hypothesis (EH), we would have \(\mathcal{M}_{t,t+1} = \exp(- i_t)\) and \(\mathbb{P} \equiv \mathbb{Q}\); \(B_{t,h}\) would then be equal to: \[\begin{equation} B^{EH}_{t,h} := \mathbb{E}_t \exp(-i_{t}-i_{t+1}-\dots-i_{t+h-1}).\tag{4.2} \end{equation}\] The (counterfactual) maturity-\(h\) yield-to-maturity would then be: \[\begin{eqnarray} i^{EH}_{t,h} &=& -\frac{1}{h}\log \left( \mathbb{E}_t \exp(-i_t-\dots-i_{t+h-1})\right)\nonumber\\ &\approx& \frac{1}{h}\mathbb{E}_t(i_t + \dots + i_{t+h-1}).\tag{4.3} \end{eqnarray}\]
Using the previous notations, the term premium is usually defined as the difference between the observed yield and its EH counterpart: \[\begin{eqnarray} TP_{t,h} &=& \underbrace{- \frac{1}{h} \log \mathbb{E}^{\mathbb{Q}}_t \exp(-i_{t}-\dots-i_{t+h-1})}_{=i_{t,h}} \nonumber \\ && - \underbrace{\left(- \frac{1}{h} \log \mathbb{E}_t \exp(-i_{t}-\dots-i_{t+h-1})\right)}_{=i^{EH}_{t,h}}.\tag{4.4} \end{eqnarray}\]
A term premium is a specific type of risk premium (see Def. 4.1), in the sense that it reflects the difference between the price of an asset (a bond here, \(B_{t,h}\)) and the counterfactual price we would observe if agents were not risk averse (\(B_{t,h}^{EH}\), here).
Economically, what accounts for term premiums? To address this in a comprehensive way, one needs to resort to structural approaches (see Section 3). A short reply is that a long-term bond is not “risk-free” as it is exposed to the interest-rate risk; and that investors are averse to this risk, hence the term premiums. Take a two-period bond; using (4.6), we have: \[\begin{equation} B_{t,2} = \underbrace{\mathbb{E}_t(\exp(-i_t-i_{t+1}))}_{= B^{EH}_{t,2}} + \mathbb{C}ov_t(\mathcal{M}_{t,t+1},\underbrace{\exp(-i_{t+1})}_{=B_{t+1,1}}).\tag{4.5} \end{equation}\] Therefore \(B_{t,2} \ne B_{t,2}^{EH}\) as soon as \(\mathcal{M}_{t,t+1}\) and \(i_{t+1}\) are conditionally correlated, which is likely to be the case (see, e.g., the CCAPM case in Subsection 3.1.6).
In the following subsections, we describe statistical approaches that have been used to estimate term premiums. These approaches are usually silent about the drivers of the term premiums. Eq. (4.5) provides some insights regarding the sources of fluctuations of term premiums: interest rate volatility, SDF volatility (that may depend on risk aversion, see, e.g., the Epstein-Zin case in (3.28)), and the correlation between interest rates and the SDF.
Under EH, investors are willing to buy a maturity-\(h\) bond as long as its expected return is—up to Jensen’s inequality—equal to the average of future short-term rates. (Hence the definition of \(i^{EH}_{t,h}\), see Eq. (4.3).) The fact that \(TP_{t,h}>0\) (say) means that investors are willing to buy the maturity-\(h\) bond only if its return is, on average, higher than expected future short-term rates; it corresponds to a situation where investors consider that long-term bonds tend to lose value in bad states of the world (i.e., states of high marginal utility) and want to be compensated for that.
Definition 4.1 (Risk premium, in the general case) According to (2.1), under the absence of arbitrage opportunities, the price of any asset \(j\) satisfies: \[\begin{equation} p_{jt} = \mathbb{E}_t(\mathcal{M}_{t,t+1} p_{j,t+1}),\tag{4.6} \end{equation}\] or, equivalently, \[ p_{jt} = \exp(-i_t)\mathbb{E}^{\mathbb{Q}}_t(p_{j,t+1}), \] where the risk-neutral measure \(\mathbb{Q}\) is defined with respect to the physical measure \(\mathbb{P}\) by means of the Radon-Nikodym derivative \(\mathcal{M}_{t,t+1}\big/\mathbb{E}_t(\mathcal{M}_{t,t+1})\).
Eq. (4.6) rewrites: \[ p_{jt} = \mathbb{E}_t(\mathcal{M}_{t,t+1})\mathbb{E}_t( p_{j,t+1}) + \mathbb{C}ov_t(\mathcal{M}_{t,t+1}, p_{j,t+1}) \] or \[\begin{equation} p_{jt} = \underbrace{\exp(-i_t)\mathbb{E}_t( p_{j,t+1})}_{=p^{EH}_{jt}} + \underbrace{\mathbb{C}ov_t(\mathcal{M}_{t,t+1}, p_{j,t+1})}_{\mbox{Risk premium}}.\tag{4.7} \end{equation}\] If investors were not risk-averse, then we would have \(p_{jt} = p^{EH}_{jt}\). The SDF is high (resp. low) in bad (resp. good) states of the world (states of high marginal utility in the equilibrium approach). Hence, we have \(p_{jt}< p^{EH}_{jt}\) if asset \(j\) tends to pay less in bad states of the world (i.e., if \(\mathbb{C}ov_t(\mathcal{M}_{t,t+1}, p_{j,t+1})<0\)).
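The decomposition (4.7) can be checked on a toy example. The sketch below assumes two equally likely date-\(t+1\) states; the probabilities, SDF values and payoffs are purely illustrative numbers, not estimates, and the helper `E` is a name introduced here.

```r
# Two equally likely states at date t+1: a "bad" and a "good" state.
# All numbers below are illustrative assumptions.
prob   <- c(bad = 0.5, good = 0.5)
M      <- c(bad = 1.10, good = 0.80)   # SDF: high in the bad state
payoff <- c(bad = 0.90, good = 1.10)   # asset pays less in the bad state

E <- function(x) sum(prob * x)         # conditional expectation E_t(.)

i_t   <- -log(E(M))                    # short rate, from E_t(M) = exp(-i_t)
price <- E(M * payoff)                 # Eq. (4.6)
p_EH  <- exp(-i_t) * E(payoff)         # price absent risk aversion
rp    <- E(M * payoff) - E(M) * E(payoff)   # Cov_t(M, payoff)

print(c(price = price, p_EH = p_EH, risk_premium = rp))
```

Because the asset pays less in the state where the SDF is high, the covariance is negative and the price lies below its risk-neutral counterpart, as stated in the definition.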
Example 4.1 (Testing the Expectation Hypothesis) Several studies have employed regression analysis to test the Expectation Hypothesis, e.g., Campbell and Shiller (1991) and J. H. Cochrane and Piazzesi (2005). Let us focus on the former study. Campbell and Shiller (1991) note that, if agents were risk-neutral, we would have: \[ B_{t,n} = \exp(-m\, i_{t,m})\mathbb{E}_t(B_{t+m,n-m}). \] This would imply: \[ -n i_{t,n} = -m\, i_{t,m} + \log \mathbb{E}_t[\exp(-(n-m) i_{t+m,n-m})]. \] Up to Jensen’s inequality, we get: \[ -n i_{t,n} \approx -m\, i_{t,m} -(n-m)\, \mathbb{E}_t(i_{t+m,n-m}). \] After rearranging: \[\begin{equation} \mathbb{E}_t(i_{t+m,n-m}-i_{t,n}) \approx \frac{m}{n-m}\left(i_{t,n}-i_{t,m}\right).\tag{4.8} \end{equation}\]
Hence, if the Expectation Hypothesis holds, we should obtain a unit slope coefficient when regressing \(i_{t+m,n-m}-i_{t,n}\) on \(\frac{m}{n-m}(i_{t,n}-i_{t,m})\). This is usually rejected by the data.
Before turning to the no-arbitrage term structure models, note that (4.4) implies that term premium estimates are directly available if we have good approximations of expected future short-term rates (i.e., of \(i_{t,h}^{EH}\)). Some surveys offer such proxies. This is for instance the case of the Philadelphia Fed SPF, which provides expectations of the short-term rate over the next 10 years (see Figure 4.1). However, there is only one release per year for this survey.
In the same spirit, Figure 4.2 compares the 10-year breakeven rate of inflation (\(i_{t,h}- r_{t,h}\)), also called inflation compensation, with the expected annualized inflation over the next 10 years. The difference between the line and the dots is the inflation risk premium, defined as: \[ IRP_{t,h} = i_{t,h} - r_{t,h} - \mathbb{E}_t(\pi_{t,t+h}), \] where \(\pi_{t,t+h}\) denotes the annualized inflation rate between dates \(t\) and \(t+h\).
Figure 4.2 shows that the inflation risk premium has often been negative in recent years, suggesting a perceived importance of demand shocks in the economy.
4.2 The Affine Case
4.2.1 Affine yields
In this subsection, we consider the case where the state vector \(w_t\) is affine under both \(\mathbb{P}\) and \(\mathbb{Q}\) (see Def. 1.1). If the nominal short-term rate is affine in \(w_t\), i.e., if \(i_t = \omega_0 + \omega'_1 w_t\), then (see (2.3)): \[\begin{eqnarray} B_{t,h} &=& \mathbb{E}^{\mathbb{Q}}_t \exp (-i_{t}-\dots-i_{t+h-1}) \nonumber\\ &=& \exp(-h\omega_0 - \omega'_1 w_t) \color{blue}{\mathbb{E}^{\mathbb{Q}}_t \exp (- \omega'_1 w_{t+1}-\dots- \omega'_1 w_{t+h-1})}.\tag{4.9} \end{eqnarray}\] The (blue) expectation can be computed using the recursive equations of Proposition 1.5 (see Example 1.18), leading to: \[\begin{equation} i_{t,h}= - \frac{1}{h} \log B_{t,h} = a_h'w_t + b_h.\tag{4.10} \end{equation}\] Similarly, (4.2) leads to: \[\begin{equation} i^{EH}_{t,h} = {a^{EH}_h}'w_t + b^{EH}_h.\tag{4.11} \end{equation}\] Moreover, if inflation is also affine in \(w_t\), i.e., if \(\pi_{t} = \bar\omega_0 + \bar\omega'_1 w_t\), then real yields are given by: \[\begin{eqnarray*} \mathcal{B}_{t,h} &=& \mathbb{E}^{\mathbb{Q}}_t \exp(-i_{t}-\dots-i_{t+h-1}+\pi_{t+1}+\dots+\pi_{t+h}) \end{eqnarray*}\] (see Example 1.20) which also leads to: \[\begin{equation} r_{t,h} = - \frac{1}{h} \log \mathcal{B}_{t,h} = \bar{a}_h'w_t + \bar{b}_h.\tag{4.12} \end{equation}\] Eqs. (4.10) and (4.11) imply that term premiums are affine in \(w_t\) (see Eq. (4.4)). Specifically: \[ TP_{t,h} = i_{t,h} - i^{EH}_{t,h} = b_h - b_h^{EH} + (a_h - a_h^{EH})'w_t. \]
The same approach can be employed to compute real term premiums (\(r_{t,h} - r^{EH}_{t,h}\)), and inflation term premiums (\((i_{t,h}-r_{t,h}) - (i^{EH}_{t,h}-r^{EH}_{t,h})\)).
Expected excess returns resulting from holding zero-coupon bonds are also affine in \(w_t\). Indeed, since \(B_{t,h} = \exp(-h[a_h'w_t + b_h])\) (see Eq. (4.10)), holding a maturity-\(h\) zero-coupon bond for one period provides the following expected gross return: \[ \mathbb{E}_t\left(\frac{B_{t+1,h-1}}{B_{t,h}}\right) = \mathbb{E}_t\left(\exp(h b_h - (h-1) b_{h-1} + h a_h'w_{t} - (h-1) a_{h-1}'w_{t+1})\right), \] which is clearly exponential affine in \(w_t\) if \(w_t\) is an affine process. Therefore, the log expected excess return, that is: \[ \log \mathbb{E}_t\left(\frac{B_{t+1,h-1}}{B_{t,h}}\right) - i_t \] is also affine in \(w_t\). This is exploited in the estimation approach proposed by Adrian, Crump, and Moench (2013).
Moreover, conditional expectations of future interest rates (real or nominal) and of term premiums are also affine in \(w_t\). In particular: \[\begin{equation} \mathbb{E}_t[i_{t+k,h}] = \mathbb{E}_t[{a_h}'w_{t+k} + b_h] = {a_h}'\mathbb{E}_t(w_{t+k}) + b_h,\tag{4.13} \end{equation}\] and \(\mathbb{E}_t(w_{t+k})\) is affine in \(w_t\) (see Eq. (1.15)). This can notably be used at the estimation stage, if one wants to fit survey data (see Subsection 5.4).
Similarly, conditional variances of future interest rates (real or nominal) and of term premiums are affine in \(w_t\). In particular: \[\begin{equation} \mathbb{V}ar_t[i_{t+k,h}] = \mathbb{V}ar_t[{a_h}'w_{t+k} + b_h] = {a_h}'\mathbb{V}ar_t(w_{t+k})a_h,\tag{4.14} \end{equation}\] where the components of \(\mathbb{V}ar_t(w_{t+k})\) (and therefore \(\mathbb{V}ar_t[i_{t+k,h}]\)) are affine in \(w_t\) (see Eq. (1.16)). This can also be used at the estimation stage, if one wants to fit (proxies of) conditional variances (Alain Monfort et al. 2017).
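To illustrate (4.13) and (4.14), the sketch below considers the special case where \(w_t\) follows a Gaussian VAR(1), as in Subsection 4.3; this is an assumption made for the example, since the text covers general affine processes. The function name `var_moments`, the parameter values and the loadings `a_h` and `b_h` are hypothetical.

```r
# Conditional mean and variance of w_{t+k} given w_t, assuming the Gaussian
# VAR(1): w_{t+1} = mu + Phi w_t + Sigma^{1/2} eps_{t+1}.
var_moments <- function(w, mu, Phi, Sigma, k) {
  m <- w
  V <- matrix(0, length(w), length(w))
  for (j in seq_len(k)) {
    m <- mu + as.numeric(Phi %*% m)     # E_t(w_{t+j})
    V <- Phi %*% V %*% t(Phi) + Sigma   # Var_t(w_{t+j})
  }
  list(mean = m, var = V)
}

# Hypothetical parameters and yield loadings (i_{t,h} = a_h' w_t + b_h).
mu    <- c(0.001, 0)
Phi   <- matrix(c(0.95, 0, 0.02, 0.80), 2, 2)
Sigma <- diag(c(1e-6, 4e-6))
w     <- c(0.01, -0.002)
a_h   <- c(0.8, 0.3)
b_h   <- 0.004

mom <- var_moments(w, mu, Phi, Sigma, k = 12)
exp_yield <- sum(a_h * mom$mean) + b_h                # Eq. (4.13)
var_yield <- as.numeric(t(a_h) %*% mom$var %*% a_h)   # Eq. (4.14)
print(c(expectation = exp_yield, variance = var_yield))
```

In this Gaussian case the conditional variance does not depend on \(w_t\); it is affine in \(w_t\) with zero slope. Processes with state-dependent volatility would give a non-trivial dependence.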
4.2.2 Maximum Sharpe ratio
In an affine model, the maximum Sharpe ratio is easily computed. This was noted early on by Duffee (2010) for the Gaussian model; Christian Gourieroux et al. (2021) and Pallara and Renne (2023) use it in more sophisticated affine models.
Let us derive the maximum Sharpe ratio in the context of a general affine framework. Eq. (4.7) implies that \[ \mathbb{E}_t\underbrace{\left(\frac{p_{j,t+1}}{p_{j,t}} - \exp(i_t)\right)}_{=xs_{j,t+1},\mbox{ excess return}} = - \exp(i_t) \mathbb{C}ov_t\left(\mathcal{M}_{t,t+1},\frac{p_{j,t+1}}{p_{j,t}}\right), \] and, using \(|\mathbb{C}ov(X,Y)| \le \sqrt{\mathbb{V}ar(X)\mathbb{V}ar(Y)}\), we get the Hansen and Jagannathan (1991) bound: \[\begin{equation} \underbrace{\frac{\mathbb{E}_t(xs_{j,t+1})}{\sqrt{\mathbb{V}ar_t(xs_{j,t+1})}}}_{\mbox{Sharpe ratio}} \le \underbrace{\frac{\sqrt{\mathbb{V}ar_t(\mathcal{M}_{t,t+1})}}{\mathbb{E}_t(\mathcal{M}_{t,t+1})}}_{\mbox{Maximum Sharpe ratio}}. \end{equation}\]
If the SDF is given by \(\mathcal{M}_{t,t+1} = \exp[-i_{t}+\alpha'_tw_{t+1}-\psi_t(\alpha_t)]\) (Eq. (2.4)), and using that \(\mathbb{E}_t(\mathcal{M}_{t,t+1}^2)=\exp(-2i_t+\psi_t(2\alpha_t)-2\psi_t(\alpha_t))\) we get: \[ \mbox{Maximum Sharpe ratio} = \sqrt{\exp(\psi_t(2\alpha_t)-2\psi_t(\alpha_t)) - 1}, \] where \(\psi_t\) denotes the conditional log-Laplace transform of the state vector \(w_t\).
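As an illustration, consider the Gaussian case of Subsection 4.3, where \(\psi_t(u) = u'\mu + u'\Phi w_t + \frac{1}{2}u'\Sigma u\). The terms that are linear in \(u\) cancel in \(\psi_t(2\alpha_t)-2\psi_t(\alpha_t)\), which leaves \(\alpha_t'\Sigma\alpha_t\), so that the maximum Sharpe ratio is \(\sqrt{\exp(\alpha_t'\Sigma\alpha_t)-1}\). The sketch below checks this numerically; all parameter values are hypothetical.

```r
# Conditional log-Laplace transform of a Gaussian VAR(1), Eq. (4.16).
psi <- function(u, w, mu, Phi, Sigma) {
  as.numeric(t(u) %*% (mu + Phi %*% w) + 0.5 * t(u) %*% Sigma %*% u)
}

# Hypothetical parameter values, for illustration only.
mu    <- c(0.1, 0.0)
Phi   <- matrix(c(0.9, 0.0, 0.1, 0.8), 2, 2)
Sigma <- matrix(c(0.04, 0.01, 0.01, 0.09), 2, 2)
w     <- c(1.0, -0.5)
alpha <- c(-0.5, 0.3)   # prices of risk at date t

# General affine formula for the maximum Sharpe ratio ...
max_sr_general <- sqrt(exp(psi(2 * alpha, w, mu, Phi, Sigma) -
                             2 * psi(alpha, w, mu, Phi, Sigma)) - 1)
# ... and its Gaussian simplification.
max_sr_gaussian <- sqrt(exp(as.numeric(t(alpha) %*% Sigma %*% alpha)) - 1)

print(c(general = max_sr_general, gaussian = max_sr_gaussian))
```

In the Gaussian model, the maximum Sharpe ratio therefore varies over time only through the prices of risk \(\alpha_t\).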
4.3 Gaussian Affine Term Structure Model
The Gaussian Affine Term Structure Model (GATSM) is a workhorse model, widely used in academic and economic-policy circles. In a GATSM, \(w_t\) follows a Gaussian vector autoregressive model, and is therefore affine under \(\mathbb{P}\). The SDF is exponential affine in \(w_t\), which implies that process \(w_t\) is also affine under \(\mathbb{Q}\) (shown below). Since the components of \(w_t\) are valued in \(\mathbb{R}\), one can easily introduce macro-factors among the state variables.
Let us be more specific. The state vector \(w_t\) follows: \[\begin{equation} w_{t+1} = \mu + \Phi w_{t} + \Sigma^{1/2} \varepsilon_{t+1}, \mbox{ where } \varepsilon_{t} \sim i.i.d. \mathcal{N}(0,Id).\tag{4.15} \end{equation}\] (The fact that we consider a VAR(1) process is without loss of generality since a VAR(p) admits a VAR(1) companion representation.)
This implies the following log-Laplace transform for \(w_t\) (see Example 1.6): \[\begin{equation} \psi_t(u) = \log \mathbb{E}_t(\exp(u'w_{t+1})|\underline{w_t}) = \color{blue}{u'\mu + u'\Phi w_t + \frac{1}{2}u'\Sigma u}.\tag{4.16} \end{equation}\] Using the notations of (2.4), in standard Gaussian ATSM, the SDF is defined as: \[ \mathcal{M}_{t,t+1} = \exp(- i_t + \alpha_t'w_{t+1} - \psi_t(\alpha_t)), \mbox{ with } \alpha_t = \alpha_0 + \alpha_1'w_t, \] where \(\alpha_t\) is the vector of prices of risk, that may be state-dependent. (Typically, the present framework could be the reduced-form representation of a structural asset pricing model where risk aversion is time-varying; for instance \(\gamma\) in (3.10) in the CCAPM context.)
Note that, by definition of the log-Laplace transform \(\psi_t\), this SDF satisfies \(\mathbb{E}_t(\mathcal{M}_{t,t+1})=\exp(-i_t)\).
If one wants to derive the risk-neutral dynamics of \(w_t\), one can use (2.8) of Subsection 2.2, which gives: \[\begin{eqnarray*} \psi_t^{\mathbb{Q}}(u) &=& \psi_t(u + \alpha_t) - \psi_t(\alpha_t)\\ &=& (u + \alpha_t)'\mu + (u + \alpha_t)'\Phi w_t + \frac{1}{2}(u + \alpha_t)'\Sigma(u + \alpha_t) \\ && - \left(\alpha_t'\mu + \alpha_t'\Phi w_t + \frac{1}{2}\alpha_t'\Sigma\alpha_t\right) \\ &=& \color{blue}{u' \left(\mu + \Sigma \alpha_0 \right) + u'(\Phi + \Sigma \alpha_1')w_t + \frac{1}{2}u'\Sigma u}. \end{eqnarray*}\] This is the log-Laplace transform of a Gaussian VAR process (see (4.16) and Example 1.6). By identification, it follows that the \(\mathbb{Q}\)-dynamics of \(w_t\) is: \[ w_{t+1} = \mu + \Sigma \alpha_0 + (\Phi + \Sigma \alpha_1') w_{t} + \Sigma^{1/2} \varepsilon^*_{t+1}, \mbox{ where } \varepsilon^*_{t} \sim i.i.d. \mathcal{N}^{\mathbb{Q}}(0,Id), \] or \[\begin{equation} w_{t+1} = \mu^{\mathbb{Q}} + \Phi^{\mathbb{Q}} w_{t} + \Sigma^{1/2} \varepsilon^*_{t+1},\tag{4.17} \end{equation}\] where \[ \boxed{\mu^{\mathbb{Q}} = \mu + \Sigma \alpha_0,\quad \mbox{and} \quad\Phi^{\mathbb{Q}}=\Phi + \Sigma \alpha_1'.} \] With affine specifications of the nominal short-term rate (\(i_{t} = \omega_0 + \omega'_1 w_t\)) and of the inflation rate (\(\pi_{t} = \bar\omega_0 + \bar\omega'_1 w_t\)), we obtain affine formulas for nominal and real yields of any maturity (Eqs. (4.10) and (4.12)).
Again, knowing the \(\mathbb{Q}\)-dynamics allows one to compute bond prices quickly, by using Proposition 1.5 to compute the conditional expectation appearing in (4.9) (see Example 1.18). This is what is used in the prominent examples below.
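To make the mechanics concrete, here is a minimal sketch of the yield recursion in the Gaussian case. It does not reproduce Proposition 1.5; it uses the one-period pricing equation \(B_{t,h} = \exp(-i_t)\mathbb{E}^{\mathbb{Q}}_t(B_{t+1,h-1})\) together with the Gaussian log-Laplace transform (4.16). The function name `affine_loadings` and all parameter values are hypothetical. Running the same recursion with the \(\mathbb{P}\)-parameters gives \(i^{EH}_{t,h}\), hence the term premium.

```r
# Yield loadings of a Gaussian affine model: i_{t,h} = a_h' w_t + b_h.
# With log B_{t,h} = F_h + G_h' w_t, one gets (F_0 = 0, G_0 = 0):
#   F_h = F_{h-1} - omega0 + G_{h-1}' mu + 0.5 G_{h-1}' Sigma G_{h-1},
#   G_h = -omega1 + Phi' G_{h-1}.
affine_loadings <- function(omega0, omega1, mu, Phi, Sigma, H) {
  n  <- length(omega1)
  a  <- matrix(0, H, n)   # row h holds a_h'
  b  <- numeric(H)        # entry h holds b_h
  Gh <- rep(0, n)
  Fh <- 0
  for (h in 1:H) {
    Fh <- Fh - omega0 + sum(Gh * mu) + 0.5 * sum(Gh * (Sigma %*% Gh))
    Gh <- -omega1 + as.numeric(t(Phi) %*% Gh)
    a[h, ] <- -Gh / h
    b[h]   <- -Fh / h
  }
  list(a = a, b = b)
}

# Hypothetical two-factor parameterization (rates per period, in decimals).
mu     <- c(0.001, 0)
Phi    <- matrix(c(0.95, 0, 0.02, 0.80), 2, 2)
Sigma  <- diag(c(1e-6, 4e-6))
alpha0 <- c(5, 0)              # constant part of the prices of risk
alpha1 <- matrix(0, 2, 2)      # no state dependence in this example
omega0 <- 0
omega1 <- c(1, 0)              # short rate i_t = w_{1,t}

muQ  <- as.numeric(mu + Sigma %*% alpha0)   # boxed formulas below (4.17)
PhiQ <- Phi + Sigma %*% t(alpha1)

H  <- 40
Q  <- affine_loadings(omega0, omega1, muQ, PhiQ, Sigma, H)   # yields
P  <- affine_loadings(omega0, omega1, mu,  Phi,  Sigma, H)   # EH yields
w  <- c(0.01, 0.002)
yields <- as.numeric(Q$b + Q$a %*% w)   # Eq. (4.10)
eh     <- as.numeric(P$b + P$a %*% w)   # Eq. (4.11)
tp     <- yields - eh                   # term premiums
print(round(cbind(h = c(1, 10, 40), tp = tp[c(1, 10, 40)]), 6))
```

By construction, the one-period term premium is zero, since both recursions start from the same short rate.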
Example 4.2 (Kim and Wright (2005)) The model proposed by D. H. Kim and Wright (2005) is a three-factor yield-only model (no macro variables, except inflation in one variant of the model), where the short-term rate reads \(i_t = \omega_0 + \omega_{1,1} w_{1,t} +\omega_{1,2} w_{2,t} +\omega_{1,3} w_{3,t}\).
The model is estimated by Kalman filter (see Subsection 5.2). The state-space model (Def. 5.1) includes survey-based variables; that is, some of the measurement equations exploit (4.13) (see also Subsection 5.4).
Outputs are regularly updated by the Federal Reserve Board.
The survey data consist of monthly data on the 6-month- and 12-month-ahead forecasts of the three-month T-Bill yield from Blue Chip Financial Forecasts, and of semiannual data on the average expected three-month T-Bill yield from 6 to 11 years ahead.
Example 4.3 (Ang and Piazzesi (2003)) Ang and Piazzesi (2003) exploit the GATSM framework to develop a model mixing latent and macroeconomic variables. The setup is also of the form of (4.15), except that the VAR features several lags. In their model, \(w_t = [f^{o}_{1,t},f^{o}_{2,t},f^{u}_{1,t},f^{u}_{2,t},f^{u}_{3,t}]'\) where:
- \(f^{o}_{1,t}\) is the first Principal Component of a set of 3 price indexes (growth rates)
- \(f^{o}_{2,t}\) is the first Principal Component of a set of 4 real activity proxies (HELP, EMPLOY, IP, UE).
- \(f^{u}_{i,t}\) are unobserved, or latent, factors.
The nominal short-term rate follows a Taylor rule, and latent factors are estimated via inversion techniques (Subsection 5.3).
Example 4.4 (Joslin, Priebsch and Singleton (2014)) Joslin, Priebsch, and Singleton (2014) first note that affine models in which the short-term rate is affine in macro factors imply that macro factors are spanned by the yield curve (consider Eq. (4.10), where \(w_t\) is a set of macroeconomic factors): macro factors should be perfectly explained by yields of different maturities. They then show that this is not the case in the data: regressing macro factors on yields provides \(R^2\) values that are far from one.
They propose a model where macro factors are unspanned by the yield curve, but can still help predict yields. In their model, \(w_t = [\mathcal{P}_t',M_t']'\), where \(\mathcal{P}_t\) are yield factors (\(\approx\) principal components) and \(M_t\) are macro factors. The model is as follows: \[\begin{eqnarray*} i_t &=& \omega_{0} + \omega_{\mathcal{P}}'\mathcal{P}_t \\ \left[\begin{array}{c}\mathcal{P}_t \\ M_t \end{array}\right] &=& \left[\begin{array}{cc}\Phi_{\mathcal{P}\mathcal{P}}&\Phi_{\mathcal{P}M} \\ \Phi_{M\mathcal{P}}&\Phi_{MM} \end{array}\right] \left[\begin{array}{c}\mathcal{P}_{t-1} \\ M_{t-1} \end{array}\right] + \Sigma \varepsilon_t \\ \left[\begin{array}{c}\mathcal{P}_t \\ M_t \end{array}\right] &=& \mu + \left[\begin{array}{cc}\Phi^{\mathbb{Q}}_{\mathcal{P}\mathcal{P}}&{\color{red}0} \\ \Phi^{\mathbb{Q}}_{M\mathcal{P}}&\Phi^{\mathbb{Q}}_{MM} \end{array}\right] \left[\begin{array}{c}\mathcal{P}_{t-1} \\ M_{t-1} \end{array}\right] + \Sigma \varepsilon^{\mathbb{Q}}_t, \end{eqnarray*}\] where \(\varepsilon_t\) and \(\varepsilon^{\mathbb{Q}}_t\) are \(\mathcal{N}(0,Id)\) under \(\mathbb{P}\) and \(\mathbb{Q}\), respectively.
\(M_t\) does not Granger-cause \(\mathcal{P}_t\) under \(\mathbb{Q}\). In other words, \(\mathcal{P}_t\) is exogenous under \(\mathbb{Q}\). And since \(i_t\) is affine in \(\mathcal{P}_t\) only (the loadings on \(M_t\) are null), it follows that the yield of any maturity \(i_{t,h}\) is affine in \(\mathcal{P}_t\) only (null loadings on \(M_t\)). However, since \(M_t\) does Granger-cause \(\mathcal{P}_t\) under \(\mathbb{P}\), the macro-shocks have dynamic effects on the yield curve. In other words, this model reconciles the facts that:
- the level, slope and curvature explain the bulk of the variations of the yield curve,
- macro factors affect yields,
- the vector spaces spanned by yields on the one hand, and macroeconomic factors on the other hand, do not coincide.
Example 4.5 (Ang, Boivin, Dong and Loo-Kung (2011)) Ang et al. (2011) propose a macro-finance model based on a quadratic framework. The short-term rate follows a Taylor rule with time-varying parameters: \[ i_t = \omega_0 + a_t g_t + b_t \pi_t, \] where \(x_t=(g_t,\pi_t,a_t,b_t)'\) follows a Gaussian VAR. This is the context described in Example 1.7. The previous equation shows that \(i_t\) is affine in \(w_t = (x_t',vec(x_t x_t')')'\). Specifically: \[ i_t = \omega_0 + \omega_1'w_t, \] with \(\omega_1 = (v',vec(V)')'\), where \[ v = \left[ \begin{array}{c} 0\\ 0\\ 0\\ 0 \end{array} \right] \quad \mbox{and} \quad V = \left[ \begin{array}{cccc} 0 & 0& 1/2&0\\ 0& 0& 0&1/2\\ 1/2& 0& 0&0\\ 0&1/2 &0 &0 \end{array} \right]. \]
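The following lines check numerically that the affine representation in the augmented vector \(w_t\) reproduces the Taylor rule with time-varying coefficients. The values of \(x_t\) and \(\omega_0\) are hypothetical.

```r
# Hypothetical values of x_t = (g_t, pi_t, a_t, b_t)' and of omega_0.
x      <- c(g = 1.5, pi = 2.0, a = 0.5, b = 1.5)
omega0 <- 1

# v and V as given in Example 4.5.
v <- rep(0, 4)
V <- matrix(0, 4, 4)
V[1, 3] <- V[3, 1] <- 0.5   # picks up a_t * g_t
V[2, 4] <- V[4, 2] <- 0.5   # picks up b_t * pi_t

w      <- c(x, as.vector(x %o% x))   # w_t = (x_t', vec(x_t x_t')')'
omega1 <- c(v, as.vector(V))         # omega_1 = (v', vec(V)')'

i_affine <- omega0 + sum(omega1 * w)
i_taylor <- omega0 + x[["a"]] * x[["g"]] + x[["b"]] * x[["pi"]]
print(c(affine = i_affine, taylor = i_taylor))
```

The identity used here is \(vec(V)'vec(x_t x_t') = x_t'Vx_t\), which equals \(a_t g_t + b_t \pi_t\) for this choice of \(V\).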
4.4 Non-Negative Affine Term Structure Model
In the presence of physical currency, and absent arbitrage opportunities and storage costs for cash, nominal interest rates should be nonnegative. Many standard models (e.g., Gaussian ATSMs) are inconsistent with nonnegative nominal yields. The period of extremely low interest rates challenged these models. Against this backdrop, approaches have been developed to accommodate zero (or effective) lower bounds. We provide two examples; only the second is an affine model.
4.4.1 The shadow-rate approach
The shadow-rate model is originally due to Black (1995). In this model, the short-term rate is given by: \[\begin{equation} i_t = \max(s_t,\underline{i}) = \underline{i} + \max(s_t-\underline{i},0),\tag{4.18} \end{equation}\] where \(s_t\) is the shadow short-term interest rate and \(\underline{i}\) is the effective lower bound (\(\le 0\)). While \(s_t\) can take any real value, the short-term rate cannot fall below \(\underline{i}\) under (4.18). In shadow-rate models, the shadow rate \(s_t\) is usually a linear combination of a vector \(w_t\) that follows a Gaussian auto-regressive model. While \(s_t\) is a linear combination of components of an affine process, this is not the case for \(i_t\). As a result, pricing formulas are not available in closed form. Approximation formulas have been proposed by, e.g., Krippner (2013), Priebsch (2013), Wu and Xia (2016).
Let us describe the latter approach (Wu and Xia 2016). As in Subsection 4.3, the SDF is defined as: \[ \mathcal{M}_{t,t+1} = \exp(- i_t + \alpha_t'w_{t+1} - \psi_t(\alpha_t)), \mbox{ where } \alpha_t = \alpha_0 + \alpha_1'w_t, \] (this is Eq. (2.4)), but the short-term rate \(i_t\) is given by \(i_t = \max(s_t,\underline{i})\) (Eq. (4.18)), with \[ s_t = \delta_0 + \delta_1' w_t. \] The shadow rate \(s_t\) is part of a state vector \(w_t\) that follows a Gaussian VAR, as in (4.15) (or \(s_t\) can be a linear combination of components of \(w_t\)). As before, the \(\mathbb{Q}\)-dynamics of \(w_t\) is also a Gaussian VAR, as in (4.17). (Indeed, using the previous specification of the SDF, the non-linear \(i_t\) term vanishes in the Radon-Nikodym derivatives; see Eq. (2.5).)
The price of a nominal zero-coupon bond is still given by (2.3), that is: \[ B_{t,h} = \mathbb{E}^{\mathbb{Q}}_t \exp(-i_t-i_{t+1}-\dots-i_{t+h-1}); \] but this price is not exponential affine in \(s_t\) because of the max operator relating \(i_t\) to \(s_t\) (see Eq. (4.18)). As a result, we can no longer exploit the tractable calculation of the multi-horizon Laplace transform of \(w_t\) to price this bond.
The approach proposed by Wu and Xia (2016) is based on an approximation to forward rates. Using the results of Subsection 6.2.2, we have (Eq. (6.6)): \[ f_{t,n-1,n} = n i_{t,n} - (n-1) i_{t,n-1}, \] for \(n>0\) (and using \(i_{t,0}=0\), i.e., \(f_{t,0,1}=i_t\)). Equivalently, for \(h>0\): \[ i_{t,h} = \frac{1}{h}(f_{t,0,1}+f_{t,1,2}+\dots+f_{t,h-1,h}). \]
The approximation of Wu and Xia (2016) consists in finding approximations of the forward rates \(f_{t,n-1,n}\) (denoted by \(\tilde{f}_{t,n-1,n}\), say) and using them in the previous equation to get: \[\begin{equation} i_{t,h} \approx \frac{1}{h}\left(\tilde{f}_{t,0,1}+\tilde{f}_{t,1,2}+\dots+\tilde{f}_{t,h-1,h}\right).\tag{4.19} \end{equation}\]
Using that, for any random variable \(Z\), we have \(\log(\mathbb{E}[e^Z]) \approx \mathbb{E}[Z] + \frac{1}{2} \mathbb{V}ar[Z]\) (based on a second-order Taylor expansion of the log-Laplace transform), Wu and Xia (2016) further show that: \[\begin{eqnarray} f_{t,n,n+1} &=& -\log\left(\mathbb{E}_t^{\mathbb{Q}}\left(e^{-\sum_{j=0}^n i_{t+j}}\right)\right) + \log\left(\mathbb{E}_t^{\mathbb{Q}}\left(e^{-\sum_{j=0}^{n-1} i_{t+j}}\right)\right)\\ &\approx& \mathbb{E}_t^{\mathbb{Q}}[i_{t+n}] - \frac{1}{2}\left(\mathbb{V}ar_t^{\mathbb{Q}}\left(\sum_{j=0}^n i_{t+j}\right)-\mathbb{V}ar_t^{\mathbb{Q}}\left(\sum_{j=0}^{n-1} i_{t+j}\right)\right). \end{eqnarray}\]
The expectation can be computed analytically. Using (4.18), it is indeed of the form \(\underline{i} + \mathbb{E}_t^{\mathbb{Q}}[ \max(s_{t+n}-\underline{i},0)]\), where \[ s_{t+n}-\underline{i}|\underline{w_t} \sim \mathcal{N}\left(\bar{a}_n+b_n'w_t- \underline{i},(\sigma_n^{\mathbb{Q}})^2\right), \] with \[ \bar{a}_n = \delta_0 + \delta_1'\left(\sum_{j=0}^{n-1} \left[\Phi^{\mathbb{Q}}\right]^j\right)\mu^{\mathbb{Q}}, \quad \mbox{and} \quad b_n' = \delta_1'\left(\Phi^{\mathbb{Q}}\right)^n, \] and \[ (\sigma_n^{\mathbb{Q}})^2 := \mathbb{V}ar^{\mathbb{Q}}_t\left(s_{t+n}\right)= \sum_{j=0}^{n-1} \delta_1'\left[\Phi^{\mathbb{Q}}\right]^j\Sigma \left(\left[\Phi^{\mathbb{Q}}\right]^j\right)'\delta_1, \] where \(\Sigma\) is the conditional covariance matrix of \(w_{t+1}\) (see (4.17)).
Hence, using standard results on the truncated normal distribution (see Figure 4.8), they obtain: \[ \mathbb{E}_t^{\mathbb{Q}}[i_{t+n}] = \underline{i} + \sigma_n^{\mathbb{Q}}g\left(\frac{\bar{a}_n + b_n'w_t - \underline{i}}{\sigma_n^{\mathbb{Q}}}\right), \] where \(g(x)= x\Phi(x)+\phi(x)\), \(\Phi\) and \(\phi\) being the c.d.f. and p.d.f. of the standard normal distribution, respectively (this \(\Phi\) is not to be confused with the autoregressive matrix \(\Phi^{\mathbb{Q}}\)).
They also show that \[ \frac{1}{2}\left(\mathbb{V}ar_t^{\mathbb{Q}}\left(\sum_{j=0}^n i_{t+j}\right)-\mathbb{V}ar_t^{\mathbb{Q}}\left(\sum_{j=0}^{n-1} i_{t+j}\right)\right) \approx \Phi\left(\frac{\bar{a}_n + b_n'w_t - \underline{i}}{\sigma_n^{\mathbb{Q}}}\right)\times(\bar{a}_n - a_n), \] where \[ a_n = \bar{a}_n - \frac{1}{2}\delta_1'\left(\sum_{j=0}^{n-1} \left[\Phi^{\mathbb{Q}}\right]^j\right)\Sigma \left(\sum_{j=0}^{n-1} \left[\Phi^{\mathbb{Q}}\right]^j\right)'\delta_1. \] They finally obtain: \[ \boxed{f_{t,n,n+1} \approx \tilde{f}_{t,n,n+1} = \underline{i} + \sigma_n^{\mathbb{Q}}g\left(\frac{a_n + b_n'w_t - \underline{i}}{\sigma_n^{\mathbb{Q}}}\right),} \] which is used in (4.19) to obtain an approximation to \(i_{t,h}\).
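The approximation can be coded in a few lines of base R. The sketch below is not the implementation of any package: the function name `wx_forward` and all parameter values are hypothetical, and it assumes that the conditional covariance matrix of \(w_{t+1}\) is \(\Sigma\), as in (4.17). It returns \(\tilde{f}_{t,n,n+1}\) for a given state vector, and the yields then follow from (4.19).

```r
# Approximate forward rate f_{t,n,n+1} in a shadow-rate model.
# Assumes Q-dynamics w_{t+1} = muQ + PhiQ w_t + Sigma^{1/2} eps*_{t+1}.
wx_forward <- function(w, delta0, delta1, muQ, PhiQ, Sigma, i_low, n) {
  if (n == 0) return(max(delta0 + sum(delta1 * w), i_low))   # f_{t,0,1} = i_t
  k <- length(delta1)
  S <- matrix(0, k, k)   # sum_{j=0}^{n-1} PhiQ^j
  V <- matrix(0, k, k)   # sum_{j=0}^{n-1} PhiQ^j Sigma (PhiQ^j)'
  P <- diag(k)           # PhiQ^j, starting at j = 0
  for (j in seq_len(n)) {
    S <- S + P
    V <- V + P %*% Sigma %*% t(P)
    P <- P %*% PhiQ
  }                      # on exit, P = PhiQ^n
  a_bar <- delta0 + sum(delta1 * (S %*% muQ))
  a_n   <- a_bar - 0.5 * sum(delta1 * (S %*% Sigma %*% t(S) %*% delta1))
  b_n   <- as.numeric(t(P) %*% delta1)            # b_n' = delta1' PhiQ^n
  sig   <- sqrt(sum(delta1 * (V %*% delta1)))     # std. dev. of s_{t+n}
  g     <- function(x) x * pnorm(x) + dnorm(x)
  i_low + sig * g((a_n + sum(b_n * w) - i_low) / sig)
}

# Hypothetical two-factor example (rates per period, in decimals).
PhiQ   <- 0.97 * diag(2)
muQ    <- c(0, 0)
Sigma  <- 1e-6 * diag(2)
delta0 <- 0
delta1 <- c(1, 1)
i_low  <- 0
w      <- c(-0.010, -0.005)   # shadow rate s_t = -1.5%: the bound binds

H   <- 60
fwd <- sapply(0:(H - 1), function(n)
  wx_forward(w, delta0, delta1, muQ, PhiQ, Sigma, i_low, n))
yld <- cumsum(fwd) / (1:H)    # Eq. (4.19)
print(round(cbind(h = c(1, 12, 60), yield = yld[c(1, 12, 60)]), 6))
```

When the shadow rate is far above the bound, \(g(x) \approx x\) and the formula collapses to the Gaussian affine forward rate \(a_n + b_n'w_t\).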
In the following code, we simulate a shadow-rate path and compute the term structure of the approximated forward rates \(f_{t,n,n+1}\) and the resulting nominal rates \(i_{t,h}\) for the date indicated by the vertical grey line in the upper plot.
library(TSModels)
# Specify model:
n <- 2 # number of factors
rho <- matrix(0, n, n) # autoregressive matrix of the factors
diag(rho) <- .97
mu <- matrix(0, n, 1)
Sigma <- diag(n)
delta.0 <- 0; delta.1 <- rep(.01, n) # shadow rate: s = delta.0 + delta.1'X
r.bar <- 0 # r = max(s,r.bar) [i.e., r.bar=0 in standard model]
Model <- list(rho = rho, mu = mu, Sigma = Sigma,
              delta.0 = delta.0, delta.1 = delta.1, r.bar = r.bar)
# Simulate model and compute shadow rate:
X <- simul.var(Model, nb.sim = 200) # simulated path
s <- delta.0 + X %*% delta.1
# Compute yields:
res <- compute.price.WX(Model, X, max.H = 100)
# Prepare plots:
par(plt = c(.1, .95, .2, .75))
par(mfrow = c(2, 1))
plot(s, type = "l", xlab = "time", ylab = "", lwd = 2, main = "(a) Shadow rate")
t0 <- 50 # date of the plotted curves (alternative: which(s == min(s)))
abline(v = t0, col = "dark grey", lwd = 2, lty = 3)
plot(res$vec.f[t0, ], type = "l", xlab = "maturity", ylab = "",
     lwd = 2, main = "(b) yields and forward rates")
lines(res$vec.y[t0, ], col = "red", lwd = 2)
legend("topright",
       c("forward rates", "yields to maturity"), lwd = c(2), lty = 1,
       col = c("black", "red"), bg = "white")
4.4.2 The auto-regressive gamma approach
Alain Monfort et al. (2017) introduce an affine framework where the short-term rate can stay at zero for a prolonged period of time and with a stochastic lift-off probability.
Under \(\mathbb{P}\) and \(\mathbb{Q}\), the state vector \(w_t\) follows a multivariate auto-regressive gamma (VARG) process, a multivariate extension of Example 1.8. Conditionally on \(\underline{w_t}\), the \(n\) components of \(w_{t+1}\) are independent and distributed as follows: \[\begin{equation} \frac{w_{i,t+1}}{\mu_i} \sim \gamma(\nu_i+z_{i,t}) \quad \mbox{where} \quad z_{i,t} \sim {\mathcal P} \left( \alpha_i + \beta_i' w_t \right).\tag{4.20} \end{equation}\] If \(\mu = (\mu_1,\dots,\mu_n)'\), \(\alpha = (\alpha_1,\dots,\alpha_n)'\), \(\nu = (\nu_1,\dots,\nu_n)'\) and \(\beta = (\beta_1,\dots,\beta_n)\), then the conditional Laplace transform of \(w_{t+1}\), \(\varphi_t(u) = \mathbb{E}_t[\exp(u'w_{t+1})]\), is: \[\begin{eqnarray*} \varphi_t(u) &=& \exp\left[\left(\frac{u \odot \mu}{1 - u \odot \mu}\right)'\beta' w_t \right.\\ && \left. + \alpha'\left(\frac{u \odot \mu}{1 - u \odot \mu}\right) - \nu'\log(1 - u \odot \mu)\right], \end{eqnarray*}\] where \(\odot\) denotes the element-by-element multiplication and where, with abuse of notation, the division and log operators work element-by-element when applied to vectors.
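The formula above is easy to code. The sketch below (function name `varg_laplace` and parameter values are hypothetical) evaluates the conditional Laplace transform and uses it to recover the conditional mean implied by (4.20), \(\mathbb{E}_t(w_{i,t+1}) = \mu_i(\nu_i + \alpha_i + \beta_i'w_t)\), as the gradient of \(\varphi_t\) at \(u=0\).

```r
# Conditional Laplace transform of the VARG process (4.20):
# phi_t(u) = E_t[exp(u' w_{t+1})], valid for u * mu < 1 (elementwise).
# beta is the n x n matrix whose i-th column is beta_i.
varg_laplace <- function(u, w, mu, alpha, nu, beta) {
  r <- (u * mu) / (1 - u * mu)
  exp(sum(r * as.numeric(t(beta) %*% w)) + sum(alpha * r) -
        sum(nu * log(1 - u * mu)))
}

# Hypothetical two-factor parameterization.
mu    <- c(0.5, 0.2)
alpha <- c(0.1, 0.3)
nu    <- c(0, 1.5)
beta  <- matrix(c(0.6, 0.1, 0.2, 0.5), 2, 2)   # columns: beta_1, beta_2
w     <- c(1, 2)

# Conditional mean implied by (4.20) ...
cond_mean <- mu * (nu + alpha + as.numeric(t(beta) %*% w))
# ... compared with a central finite difference of phi_t at u = 0.
eps <- 1e-6
num_mean <- sapply(1:2, function(i) {
  e <- replace(numeric(2), i, eps)
  (varg_laplace(e, w, mu, alpha, nu, beta) -
     varg_laplace(-e, w, mu, alpha, nu, beta)) / (2 * eps)
})
print(rbind(closed_form = cond_mean, numerical = num_mean))
```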
In their baseline model, Alain Monfort et al. (2017) use four factors. They set \(\nu_1 = \nu_2 = 0\), implying that \(w_{1,t}\) and \(w_{2,t}\) can stay at zero (see Example 1.8). The short-term rate \(i_t\) is posited to be an affine combination of \(w_{1,t}\) and \(w_{2,t}\), that is: \[ i_t = \omega'w_t = \omega_{1} w_{1,t} + \omega_{2} w_{2,t}, \] hence, it can stay at zero.
Factors \(w_{3,t}\) and \(w_{4,t}\) Granger-cause \(w_{1,t}\) and \(w_{2,t}\), thereby causing \(i_t\). As a result, for \(h \ge 2\), \(i_{t,h}\) is a non-zero combination of the four components of \(w_t\).
For the same reason, when \(i_t=0\), the lift-off probability depends on \(w_{3,t}\) and \(w_{4,t}\). The framework offers closed-form solutions for lift-off probabilities. Indeed, using Lemma 1.2: \[ \mathbb{P}_t(\alpha'w_{t+h}=0) = \lim_{u \rightarrow -\infty} \varphi_{t,h}(0,\dots,0,u\alpha), \] where \(\varphi_{t,h}\) is the multi-horizon Laplace transform defined in (1.8), which can be computed using Proposition 1.5. We have: \[\begin{equation} \left\{ \begin{array}{l} \mathbb{P}_t(i_{t+h}>0) = 1 - \lim_{u \rightarrow -\infty} \varphi_{t,h}(0,\dots,0,u\omega) \\ \\ \mathbb{P}_t(i_{t+1}=0,\dots,i_{t+h}=0) = \lim_{u \rightarrow -\infty} \varphi_{t,h}(u\omega,\dots,u\omega,u\omega) \equiv p_{h}\\ \\ \mathbb{P}_t(i_{t+1}=0,\dots,i_{t+h-1}=0,i_{t+h}>0) = p_{h-1} - p_h. \end{array} \right. \end{equation}\] Other lift-off probabilities, of the type \(\mathbb{P}_t[i_{t+h,k}>threshold]\), can be derived from (1.18).
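For the one-period horizon, the limit can be checked directly, because \(\varphi_{t,1}\) is the conditional Laplace transform of the VARG process. With \(\nu_1=\nu_2=0\), (4.20) implies \(\mathbb{P}_t(i_{t+1}=0) = \exp\left(-\sum_{i=1}^{2}(\alpha_i+\beta_i'w_t)\right)\), that is, the probability that both Poisson draws are zero. The sketch below uses hypothetical parameter values for a four-factor model; longer horizons require the recursions of Proposition 1.5, which are not reproduced here.

```r
# Conditional Laplace transform of the VARG process (4.20);
# beta is the matrix whose i-th column is beta_i.
varg_laplace <- function(u, w, mu, alpha, nu, beta) {
  r <- (u * mu) / (1 - u * mu)
  exp(sum(r * as.numeric(t(beta) %*% w)) + sum(alpha * r) -
        sum(nu * log(1 - u * mu)))
}

# Hypothetical four-factor parameterization with nu_1 = nu_2 = 0.
mu    <- c(0.5, 0.4, 0.2, 0.3)
alpha <- c(0.0, 0.0, 0.3, 0.2)
nu    <- c(0, 0, 1.0, 1.5)
beta  <- matrix(0, 4, 4)
beta[, 1] <- c(0.5, 0.0, 0.3, 0.0)   # w_3 Granger-causes w_1
beta[, 2] <- c(0.0, 0.5, 0.0, 0.3)   # w_4 Granger-causes w_2
beta[, 3] <- c(0.0, 0.0, 0.7, 0.0)
beta[, 4] <- c(0.0, 0.0, 0.0, 0.7)
omega <- c(0.01, 0.01, 0, 0)         # i_t = omega' w_t
w     <- c(0, 0, 0.8, 0.4)           # current state: i_t = 0

# P_t(i_{t+1} = 0): numerical limit (large negative u) vs. closed form.
p_zero_limit <- varg_laplace(-1e10 * omega, w, mu, alpha, nu, beta)
drivers      <- omega > 0
p_zero_exact <- exp(-sum(alpha[drivers] +
                           as.numeric(t(beta) %*% w)[drivers]))
print(c(limit = p_zero_limit, closed_form = p_zero_exact,
        lift_off_next_period = 1 - p_zero_exact))
```

The lift-off probability depends on \(w_{3,t}\) and \(w_{4,t}\) only, since \(w_{1,t}=w_{2,t}=0\) when the short-term rate is at zero.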
Alain Monfort et al. (2017) estimate this model by means of Kalman filtering techniques (see Subsection 5.2.3). Observed variables include (levels of) yields, as well as survey-based forecasts of yields (see Subsection 5.4) and (e-GARCH-based) proxies of conditional variances (see Eq. (1.16)).
4.5 Appendix – Constructing the yield curve
Interest rates are observed on financial markets. There are two broad categories of interest rates: those associated with bonds and those associated with derivatives (swaps, see Subsection 6.2.1). For each category, there are, in turn, different types of yield curves:
- Bonds: yields-to-maturity, par yields, zero-coupon yields. Besides, even for a given issuer, there may be different yield curves if this issuer has issued different types of bonds in the past (e.g., nominal, inflation-indexed, green).
- Swaps: there is one swap curve per reference rate. The spread between to swap rates with same maturity (tenor), but different reference rate (EONIA and EURIBOR3M, say) is called basis.
4.5.1 Bond-based yield curves
The basic yield curve is the zero-coupon yield curve. It is the one used in most formulas in this course (under the notations \(i_{t,h}\) and \(r_{t,h}\)). It is also the most convenient because, with it at hand, we can easily price any stream of (fixed) payoffs promised by the issuer. Consider, for instance, the stream of payoffs \(\{W_{t+h_1},W_{t+h_2},\dots,W_{t+h_n}\}\) that will be paid on dates \(t+h_1,\dots,t+h_n\) by a given issuer. If the (continuously-compounded) zero-coupon yields of this issuer are the \(i_{t,h}\)’s, then the price of this stream of payoffs is: \[ P_t = \sum_{j=1}^n \exp(-h_j i_{t,h_j})W_{t+h_j}. \] By definition, the yield-to-maturity associated with this asset is the interest rate \(\tilde{i}_{t}(W_{t+h_1},W_{t+h_2},\dots,W_{t+h_n})\) that satisfies: \[ P_t = \sum_{j=1}^n \exp\left[-h_j \tilde{i}_{t}(W_{t+h_1},W_{t+h_2},\dots,W_{t+h_n})\right]W_{t+h_j}. \] Note that this yield-to-maturity is specific to the stream of payoffs (hence the notation \(\tilde{i}_{t}(W_{t+h_1},W_{t+h_2},\dots,W_{t+h_n})\)).
A particular case is that of coupon bonds, which are the most common type of bond issued by debtors. Let us denote by \(B_{t,h}(c)\) the date-\(t\) price of a bond of maturity \(h\), with unit principal, that pays the coupon \(c\) in each period (a simplification made for expositional purposes). This price should satisfy: \[\begin{equation} B_{t,h}(c) = \exp(-hi_{t,h}) + \sum_{j=1}^h \exp\left[-j i_{t,j}\right]c.\tag{4.21} \end{equation}\] Denote by \(\tilde{i}_{t,h}(c)\) its yield-to-maturity. We have: \[ B_{t,h}(c) = \exp(-h\tilde{i}_{t,h}(c)) + \sum_{j=1}^h \exp\left[-j \tilde{i}_{t,h}(c)\right]c. \]
Since most debtors issue coupon bonds, zero-coupon yields are not directly observed. Hence, before using the models presented above, one first needs to compute zero-coupon yields based on the observed prices of coupon bonds. This is a nontrivial task.
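The two pricing formulas above can be illustrated with a short numerical sketch. It prices a stream of payoffs from given zero-coupon yields, then recovers the yield-to-maturity by bisection. The function names, the bracketing interval, and the numerical values are assumptions made for the illustration, and bisection is only one of several root-finding methods that would work here.

```python
import math

def price_stream(maturities, payoffs, zc_yields):
    """P_t = sum_j exp(-h_j * i_{t,h_j}) * W_{t+h_j}."""
    return sum(math.exp(-h * y) * w
               for h, y, w in zip(maturities, zc_yields, payoffs))

def yield_to_maturity(price, maturities, payoffs, lo=-0.5, hi=1.0, tol=1e-12):
    """Single rate that reprices the stream, found by bisection.
    Assumes positive payoffs, so the present value decreases with the rate."""
    def pv(rate):
        return price_stream(maturities, payoffs, [rate] * len(maturities))
    if not pv(hi) <= price <= pv(lo):
        raise ValueError("yield-to-maturity lies outside the search bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pv(mid) > price:   # rate too low: present value still too high
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical 3-period coupon bond with unit principal, as in (4.21).
maturities = [1, 2, 3]
zc_yields = [0.020, 0.025, 0.030]   # hypothetical zero-coupon yields
coupon = 0.04
payoffs = [coupon, coupon, 1.0 + coupon]

bond_price = price_stream(maturities, payoffs, zc_yields)
ytm = yield_to_maturity(bond_price, maturities, payoffs)
```

With positive payoffs, the resulting yield-to-maturity lies between the lowest and the highest zero-coupon yields used in the pricing, and it coincides with the zero-coupon yield when the curve is flat.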
Most researchers and analysts rely on zero-coupon yields computed by data providers or by other researchers (e.g., Gürkaynak, Sack, and Wright (2007), see Example 4.6). To obtain these yield curves, one proceeds as follows, for each considered date \(t\):
- Collect the prices of bonds traded on the market on this date: \(B_{t,h_1}(c_1),\dots,B_{t,h_N}(c_N)\). According to (4.21), each of these prices is a function of zero-coupon yields.
- Choose a parametric form for the zero-coupon yield curve: \(h \rightarrow f(\theta_t,h)\) (say), where \(\theta_t\) is the vector of parameters that will characterize the zero-coupon yield curve on date \(t\). This function can be, for instance, of the C. R. Nelson and Siegel (1987) or Svensson (1994) type.22
- Look for the vector \(\theta_t\) that minimizes a distance between observed and fitted prices. For instance, using squared pricing errors in the distance function: \[ \theta_t = \underset{\theta}{\mbox{argmin}}\; \sum_{j=1}^N w_j(h_j)\left(B_{t,h_j}(c_j) - \left[\exp(-h_j f(\theta,h_j)) + \sum_{k=1}^{h_j} \exp\left[-k f(\theta,k)\right]c_j\right]\right)^2, \] where the \(w_j\) are weights (that may, e.g., depend on the maturities \(h_j\)).
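The objective function of the last step can be sketched as follows. The sketch assumes the usual Nelson-Siegel parameterization, \(f(\theta,h) = \beta_0 + \beta_1 \frac{1-e^{-h/\tau}}{h/\tau} + \beta_2\left(\frac{1-e^{-h/\tau}}{h/\tau} - e^{-h/\tau}\right)\) with \(\theta = (\beta_0,\beta_1,\beta_2,\tau)\), integer maturities, and per-period coupons as in (4.21). Function names and numerical values are hypothetical, and the bond prices are simulated from the model rather than taken from market data. The minimization itself is left to any generic numerical optimizer.

```python
import math

def nelson_siegel(theta, h):
    """Zero-coupon yield f(theta, h) with theta = (beta0, beta1, beta2, tau)."""
    beta0, beta1, beta2, tau = theta
    x = h / tau
    slope = -math.expm1(-x) / x          # (1 - exp(-x)) / x, computed stably
    return beta0 + beta1 * slope + beta2 * (slope - math.exp(-x))

def fitted_price(theta, h, c):
    """Model price of a bond of integer maturity h and coupon c, as in (4.21)."""
    coupons = sum(math.exp(-k * nelson_siegel(theta, k)) * c
                  for k in range(1, h + 1))
    return math.exp(-h * nelson_siegel(theta, h)) + coupons

def pricing_loss(theta, bonds, weights=None):
    """Weighted sum of squared pricing errors; bonds = [(price, h, c), ...]."""
    if weights is None:
        weights = [1.0] * len(bonds)
    return sum(w * (p - fitted_price(theta, h, c)) ** 2
               for w, (p, h, c) in zip(weights, bonds))

# Simulated "observed" prices, generated from hypothetical true parameters.
true_theta = (0.04, -0.02, 0.01, 2.0)
bond_specs = [(2, 0.01), (5, 0.03), (10, 0.04), (20, 0.05)]   # (h_j, c_j)
bonds = [(fitted_price(true_theta, h, c), h, c) for h, c in bond_specs]

loss_at_truth = pricing_loss(true_theta, bonds)
loss_perturbed = pricing_loss((0.045, -0.02, 0.01, 2.0), bonds)
```

In this parameterization, \(\beta_0\) is the long-maturity limit of the yield curve and \(\beta_0 + \beta_1\) is its short-maturity limit, which gives a quick sanity check on fitted parameters.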
Example 4.6 (Gürkaynak, Sack, and Wright (2007)) Gürkaynak, Sack, and Wright (2007) develop a methodology to compute zero-coupon yield curves. Their estimation period starts in 1961. Their database is updated and available on the Federal Reserve Board website. Figure 4.11 represents the residual maturities of the bonds used for the estimation at the different dates of the sample. Figure 4.12 shows the decomposition of the zero-coupon yield curve for a specific date, using the Svensson (1994) parametric function. Figure 4.13 compares, for one date, fitted and observed yields. This figure also shows the par yield curve; the par yield of maturity \(h\) is defined as the coupon rate of a bond of maturity \(h\) that would trade at par (i.e., with a price equal to 100% of its face value). That is, the par yield of maturity \(h\) (\(i^p_{t,h}\), say) satisfies: \[\begin{equation} 1 = \exp(-h i_{t,h}) + \sum_{j=1}^h \exp\left[-j i_{t,j}\right]i^p_{t,h}.\tag{4.22} \end{equation}\]
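Since (4.22) is linear in the par yield, it can be solved explicitly once the zero-coupon yields are known: \[ i^p_{t,h} = \frac{1 - \exp(-h i_{t,h})}{\sum_{j=1}^h \exp\left[-j i_{t,j}\right]}. \] As a sanity check, consider a flat zero-coupon curve, with \(i_{t,j} = i\) for all \(j\). The denominator is then a geometric sum, \(\sum_{j=1}^h \exp(-ji) = \exp(-i)\frac{1-\exp(-hi)}{1-\exp(-i)}\), so that: \[ i^p_{t,h} = \frac{1-\exp(-i)}{\exp(-i)} = \exp(i) - 1, \] for every maturity \(h\). In that case, the par yield is simply the annually-compounded counterpart of the continuously-compounded yield \(i\).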
4.5.2 Swap-based yield curves
It is easy to obtain the maturity structure of swaps, since these instruments are “constant maturity” objects. In other words, every day, you can find a swap with a maturity of 1, 2,…, 10 years (exactly). This contrasts with bonds, whose residual maturity is rarely a whole number of years.
Note, however, that swap yields are usually not zero-coupon yields. They are, instead, comparable to par yields, as defined in (4.22) (see also the remark at the end of Subsection 6.2.1). To obtain zero-coupon yields from the par yield curve, one can employ an approach called bootstrapping. To simplify the exposition, we consider the yearly frequency. The bootstrapping approach operates recursively across maturities, using (4.22):
- Consider the one-period maturity. We have: \[ 1 = \exp(-i_{t,1})(1 + i_{t,1}^p), \] which gives \(i_{t,1} = \log(1+i_{t,1}^p)\).
- Next, for maturity 2, we have: \[ 1 = \exp(-2i_{t,2}) + \exp(-i_{t,1})i_{t,2}^p + \exp(-2i_{t,2})i_{t,2}^p, \] which gives \(i_{t,2} = \frac{1}{2}\log\left(\frac{1 + i_{t,2}^p}{1 - \exp(-i_{t,1})i_{t,2}^p}\right)\).
- Using (4.22) for \(h=3\) leads to a simple equation that can be solved to get \(i_{t,3}\) as a function of \(i_{t,3}^p\), \(i_{t,1}\), and \(i_{t,2}\).
- Proceed in the same way for \(h=4\)…
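The recursion above can be sketched in a few lines. At each maturity \(h\), (4.22) is rearranged as \(\exp(-h i_{t,h}) = \big(1 - i^p_{t,h}\sum_{j<h}\exp(-j i_{t,j})\big)\big/(1 + i^p_{t,h})\), where the sum only involves yields that have already been computed. The function names and the par curve used below are hypothetical, and the sketch assumes that a par yield is available for every yearly maturity up to the longest one.

```python
import math

def bootstrap_zero_yields(par_yields):
    """Zero-coupon yields from par yields; par_yields[h-1] has maturity h."""
    zero_yields = []
    annuity = 0.0   # sum of exp(-j * i_{t,j}) over maturities already solved
    for h, par in enumerate(par_yields, start=1):
        discount = (1.0 - par * annuity) / (1.0 + par)   # exp(-h * i_{t,h})
        if discount <= 0.0:
            raise ValueError("par curve implies a non-positive discount factor")
        zero_yields.append(-math.log(discount) / h)
        annuity += discount
    return zero_yields

def par_yields_from_zero(zero_yields):
    """Inverse mapping, obtained by solving (4.22) for the par yield."""
    par_yields, annuity = [], 0.0
    for h, y in enumerate(zero_yields, start=1):
        discount = math.exp(-h * y)
        annuity += discount
        par_yields.append((1.0 - discount) / annuity)
    return par_yields

par_curve = [0.020, 0.024, 0.027, 0.029]   # hypothetical par (swap) yields
zero_curve = bootstrap_zero_yields(par_curve)
round_trip = par_yields_from_zero(zero_curve)
```

Mapping the bootstrapped zero-coupon yields back to par yields should reproduce the input curve, which provides a simple consistency check. In practice, par yields are not quoted for every maturity, so missing maturities have to be interpolated before the recursion can be applied.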