diff --git a/lectures/_toc.yml b/lectures/_toc.yml
index de2b0e31..04d3b875 100644
--- a/lectures/_toc.yml
+++ b/lectures/_toc.yml
@@ -46,6 +46,7 @@ parts:
- caption: Stochastic Dynamics
numbered: true
chapters:
+ - file: ar1_processes
- file: markov_chains_I
- file: markov_chains_II
- file: time_series_with_matrices
diff --git a/lectures/ar1_processes.md b/lectures/ar1_processes.md
new file mode 100644
index 00000000..1d156147
--- /dev/null
+++ b/lectures/ar1_processes.md
@@ -0,0 +1,611 @@
+---
+jupytext:
+ text_representation:
+ extension: .md
+ format_name: myst
+kernelspec:
+ display_name: Python 3
+ language: python
+ name: python3
+---
+
+(ar1)=
+# AR(1) Processes
+
+```{admonition} Migrated lecture
+:class: warning
+
+This lecture has moved from our [Intermediate Quantitative Economics with Python](https://python.quantecon.org/intro.html) lecture series and is now a part of [A First Course in Quantitative Economics](https://intro.quantecon.org/intro.html).
+```
+
+```{index} single: Autoregressive processes
+```
+
+```{contents} Contents
+:depth: 2
+```
+
+## Overview
+
+In this lecture we are going to study a very simple class of stochastic
+models called AR(1) processes.
+
+These simple models are used again and again in economic research to represent the dynamics of series such as
+
+* labor income
+* dividends
+* productivity, etc.
+
+AR(1) processes can take negative values but are easily converted into positive processes when necessary by a transformation such as exponentiation.
+
+We are going to study AR(1) processes partly because they are useful and
+partly because they help us understand important concepts.
+
+Let's start with some imports:
+
+```{code-cell} ipython
+import numpy as np
+%matplotlib inline
+import matplotlib.pyplot as plt
+plt.rcParams["figure.figsize"] = (11, 5)  # set default figure size
+```
+
+## The AR(1) Model
+
+The **AR(1) model** (autoregressive model of order 1) takes the form
+
+```{math}
+:label: can_ar1
+
+X_{t+1} = a X_t + b + c W_{t+1}
+```
+
+where $a, b, c$ are scalar-valued parameters.
+
+This law of motion generates a time series $\{ X_t\}$ as soon as we
+specify an initial condition $X_0$.
+
+The sequence $\{ X_t\}$ is called the **state process** and its state space is $\mathbb R$.
+
+To make things even simpler, we will assume that
+
+* the process $\{ W_t \}$ is IID and standard normal,
+* the initial condition $X_0$ is drawn from the normal distribution $N(\mu_0, v_0)$ and
+* the initial condition $X_0$ is independent of $\{ W_t \}$.
+
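+Before turning to distributions, it may help to see what a typical realization of {eq}`can_ar1` looks like. Here is a minimal simulation sketch, using illustrative parameter values (the same values reappear below):
+
+```{code-cell} python3
+a, b, c = 0.9, 0.1, 0.5            # illustrative parameter values
+T = 100
+np.random.seed(1)
+
+X = np.empty(T + 1)
+X[0] = -3.0                        # an arbitrary initial condition
+for t in range(T):
+    X[t+1] = a * X[t] + b + c * np.random.randn()
+
+fig, ax = plt.subplots()
+ax.plot(X)
+ax.set_xlabel("$t$")
+ax.set_ylabel("$X_t$")
+plt.show()
+```
+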
+### Moving Average Representation
+
+Iterating backwards from time $t$, we obtain
+
+$$
+X_t = a X_{t-1} + b + c W_t
+ = a^2 X_{t-2} + a b + a c W_{t-1} + b + c W_t
+ = \cdots
+$$
+
+If we work all the way back to time zero, we get
+
+```{math}
+:label: ar1_ma
+
+X_t = a^t X_0 + b \sum_{j=0}^{t-1} a^j +
+ c \sum_{j=0}^{t-1} a^j W_{t-j}
+```
+
+Equation {eq}`ar1_ma` shows that $X_t$ is a well defined random variable, the value of which depends on
+
+* the parameters,
+* the initial condition $X_0$ and
+* the shocks $W_1, \ldots, W_t$ from time $t=1$ to the present.
+
+Throughout, the symbol $\psi_t$ will be used to refer to the
+density of this random variable $X_t$.
+
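+As a quick sanity check on {eq}`ar1_ma`, the following sketch (again with illustrative parameters and a fixed draw of shocks) confirms that iterating the recursion and evaluating the closed form give the same value:
+
+```{code-cell} python3
+a, b, c = 0.9, 0.1, 0.5            # illustrative values, as above
+T = 25
+np.random.seed(42)
+X0 = -3.0
+W = np.random.randn(T + 1)         # only W[1], ..., W[T] are used
+
+# Iterate X_{t+1} = a X_t + b + c W_{t+1} forward to time T
+X = X0
+for t in range(1, T + 1):
+    X = a * X + b + c * W[t]
+
+# Evaluate the closed form at t = T
+j = np.arange(T)
+X_closed = a**T * X0 + b * np.sum(a**j) + c * np.sum(a**j * W[T - j])
+
+print(np.allclose(X, X_closed))    # True: the two computations agree
+```
+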
+### Distribution Dynamics
+
+One of the nice things about this model is that it's so easy to trace out the sequence of distributions $\{ \psi_t \}$ corresponding to the time
+series $\{ X_t\}$.
+
+To see this, we first note that $X_t$ is normally distributed for each $t$.
+
+This is immediate from {eq}`ar1_ma`, since linear combinations of independent
+normal random variables are normal.
+
+Given that $X_t$ is normally distributed, we will know the full distribution
+$\psi_t$ if we can pin down its first two moments.
+
+Let $\mu_t$ and $v_t$ denote the mean and variance
+of $X_t$ respectively.
+
+We can pin down these values from {eq}`ar1_ma` or we can use the following
+recursive expressions:
+
+```{math}
+:label: dyn_tm
+
+\mu_{t+1} = a \mu_t + b
+\quad \text{and} \quad
+v_{t+1} = a^2 v_t + c^2
+```
+
+These expressions are obtained from {eq}`can_ar1` by taking, respectively, the expectation and variance of both sides of the equality.
+
+In calculating the second expression, we are using the fact that $X_t$
+and $W_{t+1}$ are independent.
+
+(This follows from our assumptions and {eq}`ar1_ma`.)
+
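+For instance, spelling out the variance step: since $X_t$ and $W_{t+1}$ are independent and $\mathbb V(W_{t+1}) = 1$,
+
+$$
+v_{t+1}
+= \mathbb V(a X_t + b + c W_{t+1})
+= a^2 \mathbb V(X_t) + c^2 \mathbb V(W_{t+1})
+= a^2 v_t + c^2
+$$
+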
+Given the dynamics in {eq}`dyn_tm` and initial conditions $\mu_0,
+v_0$, we obtain $\mu_t, v_t$ and hence
+
+$$
+\psi_t = N(\mu_t, v_t)
+$$
+
+The following code uses these facts to track the sequence of marginal
+distributions $\{ \psi_t \}$.
+
+The parameters are
+
+```{code-cell} python3
+a, b, c = 0.9, 0.1, 0.5
+
+mu, v = -3.0, 0.6 # initial conditions mu_0, v_0
+```
+
+Here's the sequence of distributions:
+
+```{code-cell} python3
+from scipy.stats import norm
+
+sim_length = 10
+grid = np.linspace(-5, 7, 120)
+
+fig, ax = plt.subplots()
+
+for t in range(sim_length):
+ mu = a * mu + b
+ v = a**2 * v + c**2
+ ax.plot(grid, norm.pdf(grid, loc=mu, scale=np.sqrt(v)),
+            label=fr"$\psi_{t}$",
+ alpha=0.7)
+
+ax.legend(bbox_to_anchor=[1.05, 1], loc=2, borderaxespad=1)
+
+plt.show()
+```
+
+## Stationarity and Asymptotic Stability
+
+Notice that, in the figure above, the sequence $\{ \psi_t \}$ seems to be converging to a limiting distribution.
+
+This is even clearer if we project forward further into the future:
+
+```{code-cell} python3
+def plot_density_seq(ax, mu_0=-3.0, v_0=0.6, sim_length=60):
+ mu, v = mu_0, v_0
+ for t in range(sim_length):
+ mu = a * mu + b
+ v = a**2 * v + c**2
+ ax.plot(grid,
+ norm.pdf(grid, loc=mu, scale=np.sqrt(v)),
+ alpha=0.5)
+
+fig, ax = plt.subplots()
+plot_density_seq(ax)
+plt.show()
+```
+
+Moreover, the limit does not depend on the initial condition.
+
+For example, this alternative density sequence also converges to the same limit.
+
+```{code-cell} python3
+fig, ax = plt.subplots()
+plot_density_seq(ax, mu_0=3.0)
+plt.show()
+```
+
+In fact it's easy to show that such convergence will occur, regardless of the initial condition, whenever $|a| < 1$.
+
+To see this, we just have to look at the dynamics of the first two moments, as
+given in {eq}`dyn_tm`.
+
+When $|a| < 1$, these sequences converge to the respective limits
+
+```{math}
+:label: mu_sig_star
+
+\mu^* := \frac{b}{1-a}
+\quad \text{and} \quad
+v^* := \frac{c^2}{1 - a^2}
+```
+
+(These limits are the fixed points of the two maps in {eq}`dyn_tm`; see our lecture on one dimensional dynamics for background on deterministic convergence.)
+
+Hence
+
+```{math}
+:label: ar1_psi_star
+
+\psi_t \to \psi^* = N(\mu^*, v^*)
+\quad \text{as }
+t \to \infty
+```
+
+We can confirm this is valid for the sequence above using the following code.
+
+```{code-cell} python3
+fig, ax = plt.subplots()
+plot_density_seq(ax, mu_0=3.0)
+
+mu_star = b / (1 - a)
+std_star = np.sqrt(c**2 / (1 - a**2)) # square root of v_star
+psi_star = norm.pdf(grid, loc=mu_star, scale=std_star)
+ax.plot(grid, psi_star, 'k-', lw=2, label=r"$\psi^*$")
+ax.legend()
+
+plt.show()
+```
+
+As claimed, the sequence $\{ \psi_t \}$ converges to $\psi^*$.
+
+### Stationary Distributions
+
+A stationary distribution is a distribution that is a fixed
+point of the update rule for distributions.
+
+In other words, if $\psi_t$ is stationary, then $\psi_{t+j} =
+\psi_t$ for all $j$ in $\mathbb N$.
+
+A different way to put this, specialized to the current setting, is as follows: a
+density $\psi$ on $\mathbb R$ is **stationary** for the AR(1) process if
+
+$$
+X_t \sim \psi
+\quad \implies \quad
+a X_t + b + c W_{t+1} \sim \psi
+$$
+
+The distribution $\psi^*$ in {eq}`ar1_psi_star` has this property ---
+checking this is an exercise.
+
+(Of course, we are assuming that $|a| < 1$ so that $\psi^*$ is
+well defined.)
+
+In fact, it can be shown that no other distribution on $\mathbb R$ has this property.
+
+Thus, when $|a| < 1$, the AR(1) model has exactly one stationary density and that density is given by $\psi^*$.
+
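+While the algebra is left as the exercise just mentioned, here is a quick Monte Carlo sanity check of the fixed point property, reusing `mu_star` and `std_star` from above:
+
+```{code-cell} python3
+n = 1_000_000
+X = mu_star + std_star * np.random.randn(n)   # draws X_t ~ ψ*
+X_next = a * X + b + c * np.random.randn(n)   # apply the update once
+
+print(X_next.mean(), mu_star)     # sample mean vs μ*
+print(X_next.var(), std_star**2)  # sample variance vs v*
+```
+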
+## Ergodicity
+
+The concept of ergodicity is used in different ways by different authors.
+
+One way to understand it in the present setting is that a version of the Law
+of Large Numbers is valid for $\{X_t\}$, even though it is not IID.
+
+In particular, averages over time series converge to expectations under the
+stationary distribution.
+
+Indeed, it can be proved that, whenever $|a| < 1$, we have
+
+```{math}
+:label: ar1_ergo
+
+\frac{1}{m} \sum_{t = 1}^m h(X_t) \to
+\int h(x) \psi^*(x) dx
+ \quad \text{as } m \to \infty
+```
+
+whenever the integral on the right hand side is finite and well defined.
+
+Notes:
+
+* In {eq}`ar1_ergo`, convergence holds with probability one.
+* The textbook by {cite}`MeynTweedie2009` is a classic reference on ergodicity.
+
+For example, if we consider the identity function $h(x) = x$, we get
+
+$$
+\frac{1}{m} \sum_{t = 1}^m X_t \to
+\int x \psi^*(x) dx
+ \quad \text{as } m \to \infty
+$$
+
+In other words, the time series sample mean converges to the mean of the
+stationary distribution.
+
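+To illustrate, here is a sketch that computes the sample mean of one long simulated series and compares it with $\mu^*$ (exact agreement is not expected at finite $m$):
+
+```{code-cell} python3
+m = 100_000
+X = mu_star        # the starting point is asymptotically irrelevant
+sample_sum = 0.0
+for t in range(m):
+    X = a * X + b + c * np.random.randn()
+    sample_sum += X
+
+print(sample_sum / m, mu_star)   # sample mean vs stationary mean
+```
+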
+As will become clear over the next few lectures, ergodicity is a very
+important concept for statistics and simulation.
+
+## Exercises
+
+```{exercise}
+:label: ar1p_ex1
+
+Let $k$ be a natural number.
+
+The $k$-th central moment of a random variable is defined as
+
+$$
+M_k := \mathbb E [ (X - \mathbb E X )^k ]
+$$
+
+When that random variable is $N(\mu, \sigma^2)$, it is known that
+
+$$
+M_k =
+\begin{cases}
+ 0 & \text{ if } k \text{ is odd} \\
+ \sigma^k (k-1)!! & \text{ if } k \text{ is even}
+\end{cases}
+$$
+
+Here $n!!$ is the double factorial.
+
+According to {eq}`ar1_ergo`, we should have, for any $k \in \mathbb N$,
+
+$$
+\frac{1}{m} \sum_{t = 1}^m
+ (X_t - \mu^* )^k
+ \approx M_k
+$$
+
+when $m$ is large.
+
+Confirm this by simulation at a range of $k$ using the default parameters from the lecture.
+```
+
+
+```{solution-start} ar1p_ex1
+:class: dropdown
+```
+
+Here is one solution:
+
+```{code-cell} python3
+from numba import njit
+from scipy.special import factorial2
+
+@njit
+def sample_moments_ar1(k, m=100_000, mu_0=0.0, sigma_0=1.0, seed=1234):
+    # Time-series average of (X_t - μ*)^k along one simulated path
+ np.random.seed(seed)
+ sample_sum = 0.0
+ x = mu_0 + sigma_0 * np.random.randn()
+ for t in range(m):
+ sample_sum += (x - mu_star)**k
+ x = a * x + b + c * np.random.randn()
+ return sample_sum / m
+
+def true_moments_ar1(k):
+ if k % 2 == 0:
+ return std_star**k * factorial2(k - 1)
+ else:
+ return 0
+
+k_vals = np.arange(6) + 1
+# Use float arrays: np.empty_like(k_vals) would be integer-valued
+# and silently truncate the computed moments
+sample_moments = np.empty(len(k_vals))
+true_moments = np.empty(len(k_vals))
+
+for k_idx, k in enumerate(k_vals):
+ sample_moments[k_idx] = sample_moments_ar1(k)
+ true_moments[k_idx] = true_moments_ar1(k)
+
+fig, ax = plt.subplots()
+ax.plot(k_vals, true_moments, label="true moments")
+ax.plot(k_vals, sample_moments, label="sample moments")
+ax.legend()
+
+plt.show()
+```
+
+```{solution-end}
+```
+
+
+```{exercise}
+:label: ar1p_ex2
+
+Write your own version of a one dimensional [kernel density
+estimator](https://en.wikipedia.org/wiki/Kernel_density_estimation),
+which estimates a density from a sample.
+
+Write it as a class that takes the data $X$ and bandwidth
+$h$ when initialized and provides a method $f$ such that
+
+$$
+f(x) = \frac{1}{hn} \sum_{i=1}^n
+K \left( \frac{x-X_i}{h} \right)
+$$
+
+For $K$ use the Gaussian kernel ($K$ is the standard normal
+density).
+
+Write the class so that the bandwidth defaults to Silverman’s rule (see
+the “rule of thumb” discussion on [this
+page](https://en.wikipedia.org/wiki/Kernel_density_estimation)). Test
+the class you have written by going through the steps
+
+1. simulate data $X_1, \ldots, X_n$ from distribution $\phi$
+1. plot the kernel density estimate over a suitable range
+1. plot the density of $\phi$ on the same figure
+
+for distributions $\phi$ of the following types
+
+- [beta
+ distribution](https://en.wikipedia.org/wiki/Beta_distribution)
+ with $\alpha = \beta = 2$
+- [beta
+ distribution](https://en.wikipedia.org/wiki/Beta_distribution)
+ with $\alpha = 2$ and $\beta = 5$
+- [beta
+ distribution](https://en.wikipedia.org/wiki/Beta_distribution)
+ with $\alpha = \beta = 0.5$
+
+Use $n=500$.
+
+Make a comment on your results. (Do you think this is a good estimator
+of these distributions?)
+```
+
+
+```{solution-start} ar1p_ex2
+:class: dropdown
+```
+
+Here is one solution:
+
+```{code-cell} ipython3
+K = norm.pdf  # the standard normal density serves as the kernel
+
+class KDE:
+
+ def __init__(self, x_data, h=None):
+
+        if h is None:
+            # Silverman's rule of thumb for the default bandwidth
+            s = x_data.std()    # sample standard deviation
+            n = len(x_data)
+            h = 1.06 * s * n**(-1/5)
+ self.h = h
+ self.x_data = x_data
+
+ def f(self, x):
+ if np.isscalar(x):
+ return K((x - self.x_data) / self.h).mean() * (1/self.h)
+ else:
+ y = np.empty_like(x)
+ for i, x_val in enumerate(x):
+ y[i] = K((x_val - self.x_data) / self.h).mean() * (1/self.h)
+ return y
+```
+
+```{code-cell} ipython3
+def plot_kde(ϕ, x_min=-0.2, x_max=1.2):
+ x_data = ϕ.rvs(n)
+ kde = KDE(x_data)
+
+    x_grid = np.linspace(x_min, x_max, 100)
+ fig, ax = plt.subplots()
+ ax.plot(x_grid, kde.f(x_grid), label="estimate")
+ ax.plot(x_grid, ϕ.pdf(x_grid), label="true density")
+ ax.legend()
+ plt.show()
+```
+
+```{code-cell} ipython3
+from scipy.stats import beta
+
+n = 500
+parameter_pairs = (2, 2), (2, 5), (0.5, 0.5)
+for α, β in parameter_pairs:
+ plot_kde(beta(α, β))
+```
+
+We see that the kernel density estimator is effective when the underlying
+distribution is smooth but less so otherwise.
+
+```{solution-end}
+```
+
+
+```{exercise}
+:label: ar1p_ex3
+
+In the lecture we discussed the following fact: for the AR(1) process
+
+$$
+X_{t+1} = a X_t + b + c W_{t+1}
+$$
+
+with $\{ W_t \}$ iid and standard normal,
+
+$$
+\psi_t = N(\mu, s^2) \implies \psi_{t+1}
+= N(a \mu + b, a^2 s^2 + c^2)
+$$
+
+Confirm this, at least approximately, by simulation. Let
+
+- $a = 0.9$
+- $b = 0.0$
+- $c = 0.1$
+- $\mu = -3$
+- $s = 0.2$
+
+First, plot $\psi_t$ and $\psi_{t+1}$ using the true
+distributions described above.
+
+Second, plot $\psi_{t+1}$ on the same figure (in a different
+color) as follows:
+
+1. Generate $n$ draws of $X_t$ from the $N(\mu, s^2)$
+ distribution
+1. Update them all using the rule
+ $X_{t+1} = a X_t + b + c W_{t+1}$
+1. Use the resulting sample of $X_{t+1}$ values to produce a
+ density estimate via kernel density estimation.
+
+Try this for $n=2000$ and confirm that the
+simulation-based estimate of $\psi_{t+1}$ does converge to the
+theoretical distribution.
+```
+
+```{solution-start} ar1p_ex3
+:class: dropdown
+```
+
+Here is our solution:
+
+```{code-cell} ipython3
+a = 0.9
+b = 0.0
+c = 0.1
+μ = -3
+s = 0.2
+```
+
+```{code-cell} ipython3
+μ_next = a * μ + b
+s_next = np.sqrt(a**2 * s**2 + c**2)
+```
+
+```{code-cell} ipython3
+ψ = norm(μ, s)                  # ψ_t = N(μ, s²)
+ψ_next = norm(μ_next, s_next)   # ψ_{t+1} = N(aμ + b, a²s² + c²)
+```
+
+```{code-cell} ipython3
+n = 2000
+x_draws = ψ.rvs(n)
+x_draws_next = a * x_draws + b + c * np.random.randn(n)
+kde = KDE(x_draws_next)
+
+x_grid = np.linspace(μ - 1, μ + 1, 100)
+fig, ax = plt.subplots()
+
+ax.plot(x_grid, ψ.pdf(x_grid), label=r"$\psi_t$")
+ax.plot(x_grid, ψ_next.pdf(x_grid), label=r"$\psi_{t+1}$")
+ax.plot(x_grid, kde.f(x_grid), label=r"estimate of $\psi_{t+1}$")
+
+ax.legend()
+plt.show()
+```
+
+The simulated distribution approximately coincides with the theoretical
+distribution, as predicted.
+
+```{solution-end}
+```