October 6, 2023
Course: Finite Sample and Asymptotic Estimators in Causal Inference
A first way to understand Inverse Propensity Weighting (IPW) is through the oracle IPW estimator. Suppose that we somehow found a way to predict perfectly, for every \(x \in \mathcal{X}\), the propensity score \(e(x)\). Then we can define the oracle Inverse Propensity Weighting estimator as: \[\begin{equation*} \hat{\tau}_{\text{\tiny IPW}}^{*} = \frac{1}{n} \sum_{i=1}^{n}\left(\frac{T_{i} Y_{i}}{e\left(X_{i}\right)}-\frac{\left(1-T_{i}\right) Y_{i}}{1-e\left(X_{i}\right)}\right). \end{equation*}\] One can show that \[\begin{equation*} \mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^{n}\frac{T_{i} Y_{i}}{e\left(X_{i}\right)}\right] = \mathbb{E}[Y_{i}^{(1)}]. \end{equation*}\]
Similarly, one can show that
\[\begin{equation*} \mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^{n}\left(\frac{(1-T_{i}) Y_{i}}{1-e\left(X_{i}\right)} \right)\right] = \mathbb{E}[Y_{i}^{(0)} ], \end{equation*}\]
such that \(\mathbb{E}[\hat{\tau}_{\text {\tiny IPW }}^{*}] = \mathbb{E}[Y_{i}^{(1)}] - \mathbb{E}[Y_{i}^{(0)}] = \tau\), i.e., the oracle IPW estimator is unbiased.
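As a quick sanity check, here is a small Monte Carlo sketch of this unbiasedness. The data-generating process below (uniform covariate, linear propensity score, true ATE of 2) is a hypothetical toy model chosen for illustration, not one from the course:

```python
import numpy as np

rng = np.random.default_rng(0)
TAU = 2.0  # true ATE in this toy model (an assumption for illustration)

def simulate(n, rng):
    """Toy observational data where the true propensity score is known."""
    X = rng.uniform(size=n)
    e = 0.3 + 0.4 * X                  # true propensity, bounded in [0.3, 0.7]
    T = rng.binomial(1, e)
    Y1 = TAU + X + rng.normal(size=n)  # potential outcome under treatment
    Y0 = X + rng.normal(size=n)        # potential outcome under control
    Y = T * Y1 + (1 - T) * Y0          # observed outcome (consistency)
    return X, e, T, Y

def oracle_ipw(Y, T, e):
    """Oracle IPW estimate, using the true propensity score e(X)."""
    return np.mean(T * Y / e - (1 - T) * Y / (1 - e))

estimates = []
for _ in range(2000):
    _, e, T, Y = simulate(500, rng)
    estimates.append(oracle_ipw(Y, T, e))
print(np.mean(estimates))  # close to TAU = 2
```

Averaging the estimator over many replications recovers the true effect, as the unbiasedness result predicts.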
In addition, the oracle IPW estimator is consistent and asymptotically normal, that is,
\[\sqrt{n}\left(\hat{\tau}_{\text{\tiny IPW}}^* - \tau \right) \stackrel{d}{\rightarrow} \mathcal{N}\left(0, V_{\text{\tiny IPW}}^*\right),\]
where \[V_{\text{\tiny IPW}}^{*} = \mathbb{E}\left[\frac{\left( Y^{(0)} \right)^2}{1-e(X)} + \frac{\left( Y^{(1)} \right)^2}{e(X)}\right] - \tau^2.\]
We first compute the finite-sample variance of the oracle IPW estimator: \[\begin{align*} \operatorname{Var}\left[\hat{\tau}_{\text{\tiny IPW}}^*\right] &= \operatorname{Var}\left[ \frac{1}{n} \sum_{i=1}^{n}\left(\frac{T_{i} Y_{i}}{e\left(X_{i}\right)}-\frac{\left(1-T_{i}\right) Y_{i}}{1-e\left(X_{i}\right)}\right) \right] \\ &= \frac{1}{n^2} \operatorname{Var}\left[ \sum_{i=1}^{n}\left(\frac{T_{i} Y_{i}}{e\left(X_{i}\right)}-\frac{\left(1-T_{i}\right) Y_{i}}{1-e\left(X_{i}\right)}\right) \right] && \operatorname{Var}[aZ] = a^2\operatorname{Var}[Z] \\ &= \frac{1}{n} \operatorname{Var}\left[ \frac{T Y}{e\left(X\right)}-\frac{\left(1-T\right) Y}{1-e\left(X\right)} \right] && \text{iid} \\ &= \frac{1}{n} \mathbb{E}\left[ \left(\frac{(1-T) Y}{1-e(X)}-\mathbb{E}[Y^{(0)}]\right)^{2} + \left(\frac{T Y}{e(X)}-\mathbb{E}[Y^{(1)}]\right)^{2} \right] \\ &\quad- \frac{2}{n} \mathbb{E}\left[ \left(\frac{(1-T) Y}{1-e(X)}-\mathbb{E}[Y^{(0)}]\right)\left(\frac{T Y}{e(X)}-\mathbb{E}[Y^{(1)}]\right) \right] \\ &= \frac{1}{n} \left( \mathbb{E}\left[ \left(\frac{(1-T) Y}{1-e(X)}\right)^{2} \right] - \mathbb{E}[Y^{(0)}]^2 \right)+ \frac{1}{n} \left(\mathbb{E}\left[ \left(\frac{T Y}{e(X)}\right)^{2} \right] - \mathbb{E}[Y^{(1)}]^2\right) \\ & \quad \quad - \frac{2}{n} \left(\underbrace{\mathbb{E}\left[ \frac{T (1-T) Y^2}{e(X)(1-e(X))}\right] }_{=0,\ \text{since } T(1-T)=0} - \underbrace{\left(\mathbb{E}[Y^{(1)}]\mathbb{E}\left[\frac{(1-T) Y}{1-e(X)}\right]+\mathbb{E}[Y^{(0)}]\mathbb{E}\left[\frac{T Y}{e(X)}\right] -\mathbb{E}[Y^{(1)}]\mathbb{E}[Y^{(0)}]\right)}_{=\mathbb{E}[Y^{(1)}]\mathbb{E}[Y^{(0)}]} \right) \\ &= \frac{1}{n} \left(\mathbb{E}\left[ \left(\frac{(1-T) Y}{1-e(X)}\right)^{2} \right] + \mathbb{E}\left[ \left(\frac{T Y}{e(X)}\right)^{2} \right] - \mathbb{E}[Y^{(1)}]^2 - \mathbb{E}[Y^{(0)}]^2 + 2\, \mathbb{E}[Y^{(0)}]\mathbb{E}[Y^{(1)}] \right)\\ &= \frac{1}{n} \left(\mathbb{E}\left[ \left(\frac{(1-T) Y}{1-e(X)}\right)^{2} \right] + \mathbb{E}\left[ \left(\frac{T Y}{e(X)}\right)^{2} \right] - \left(\mathbb{E}[Y^{(1)}] - \mathbb{E}[Y^{(0)}] \right)^2\right). \end{align*}\] We can further simplify this expression by noting that
\[\begin{align*} \mathbb{E}\left[ \left( \frac{T Y}{e(X)} \right)^2 \right] &=\mathbb{E}\left[\left( \frac{T Y^{(1)}}{e\left(X\right)} \right)^2\right] && \text{Consistency} \\ &=\mathbb{E}\left[\mathbb{1}_{\left\{T=1\right\}} \left(\frac{ Y^{(1)}}{e\left(X\right)} \right)^2\right] && \text{$T$ is binary} \\ &=\mathbb{E}\left[\mathbb{E}\left[\mathbb{1}_{\left\{T=1\right\}} \left(\frac{ Y^{(1)}}{e\left(X\right)} \right)^2 \mid X\right]\right] \\ &=\mathbb{E}\left[\frac{1}{e(X)^2}\mathbb{E}\left[\mathbb{1}_{\left\{T=1\right\}} \left( Y^{(1)} \right)^2 \mid X\right]\right] \\ &=\mathbb{E}\left[\frac{\left( Y^{(1)} \right)^2}{e(X)^2}\mathbb{E}\left[\mathbb{1}_{\left\{T=1\right\}} \mid X\right]\right] &&\text{Unconfoundedness} \\ &=\mathbb{E}\left[\frac{\left( Y^{(1)} \right)^2}{e(X)^2}e(X)\right] &&\text{Definition of $e(X)$} \\ &=\mathbb{E}\left[\frac{\left( Y^{(1)} \right)^2}{e(X)}\right]. \end{align*}\]
Similarly, \[\mathbb{E}\left[ \left( \frac{(1-T)Y}{1-e(X)} \right)^2 \right] = \mathbb{E}\left[\frac{\left( Y^{(0)} \right)^2}{1-e(X)}\right]. \]
Therefore, we recover the variance of the oracle IPW estimator \(\hat{\tau}_{\text{\tiny IPW}}^{*}\):
\[ \operatorname{Var}\left[\hat{\tau}_{\text{\tiny IPW}}^*\right] = \frac{1}{n} \left( \mathbb{E}\left[\frac{\left( Y^{(0)} \right)^2}{1-e(X)} + \frac{\left( Y^{(1)} \right)^2}{e(X)}\right] - \tau^2 \right) = \frac{V_{\text{\tiny IPW}}^*}{n}. \]
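This variance formula can be checked numerically. The sketch below uses a hypothetical toy model (uniform covariate, linear propensity score, true ATE of 2, all illustrative assumptions) and compares the empirical variance of the oracle IPW estimator across replications to the plug-in value of \(V_{\text{\tiny IPW}}^*/n\):

```python
import numpy as np

rng = np.random.default_rng(1)
TAU, N, REPS = 2.0, 500, 2000  # toy ATE, sample size, replications

def simulate(n, rng):
    """Toy data: known propensity score, true ATE equal to TAU."""
    X = rng.uniform(size=n)
    e = 0.3 + 0.4 * X  # true propensity, bounded away from 0 and 1
    T = rng.binomial(1, e)
    Y = T * (TAU + X) + (1 - T) * X + rng.normal(size=n)
    return e, T, Y

estimates = []
for _ in range(REPS):
    e, T, Y = simulate(N, rng)
    estimates.append(np.mean(T * Y / e - (1 - T) * Y / (1 - e)))

# Plug-in estimate of V* = E[(Y0)^2/(1-e) + (Y1)^2/e] - tau^2, using the
# identified forms E[T Y^2 / e^2] and E[(1-T) Y^2 / (1-e)^2] on a large sample
e, T, Y = simulate(100_000, rng)
V_star = np.mean(T * Y**2 / e**2 + (1 - T) * Y**2 / (1 - e)**2) - TAU**2
print(np.var(estimates), V_star / N)  # the two should roughly agree
```

The Monte Carlo variance of \(\hat{\tau}_{\text{\tiny IPW}}^*\) matches \(V_{\text{\tiny IPW}}^*/n\) up to simulation noise.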
Since this variance converges to \(0\) as the sample size \(n\) grows and the estimator is unbiased, \(\hat \tau_{\text{\tiny IPW}}^*\) converges to \(\tau\) in \(L^2\), and hence in probability: the oracle IPW estimator is consistent.
Another proof of consistency: The consistency of the oracle IPW estimator can be shown directly with the weak law of large numbers. Denoting \(Z_i = \frac{T_{i}Y_{i}}{e(X_{i})} - \frac{(1-T_{i})Y_{i}}{1-e(X_{i})}\), and considering the iid sequence \(Z_1, Z_2, \dots, Z_n\) with finite mean \(\mathbb{E}[Z_i] = \tau\), the weak law of large numbers gives \[\begin{equation*} \bar{Z}_n = \hat{\tau}_{\text{\tiny IPW}}^* \stackrel{p}{\longrightarrow} \tau \quad \text { as } n \rightarrow \infty. \end{equation*}\]
This ensures the consistency of the oracle IPW estimator.
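Consistency can be seen empirically: the estimation error of the oracle IPW estimator shrinks as \(n\) grows. The sketch below uses a hypothetical toy model (uniform covariate, linear propensity score, true ATE of 2, all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
TAU = 2.0  # true ATE in this toy model

def oracle_ipw_mae(n, reps, rng):
    """Mean absolute error of the oracle IPW estimator at sample size n."""
    errs = []
    for _ in range(reps):
        X = rng.uniform(size=n)
        e = 0.3 + 0.4 * X  # true propensity, bounded away from 0 and 1
        T = rng.binomial(1, e)
        Y = T * (TAU + X) + (1 - T) * X + rng.normal(size=n)
        tau_hat = np.mean(T * Y / e - (1 - T) * Y / (1 - e))
        errs.append(abs(tau_hat - TAU))
    return np.mean(errs)

mae_small = oracle_ipw_mae(100, 300, rng)
mae_large = oracle_ipw_mae(10_000, 300, rng)
print(mae_small, mae_large)  # the error shrinks markedly as n grows
```

Multiplying \(n\) by 100 shrinks the typical error by roughly a factor of 10, consistent with the \(\sqrt{n}\)-rate.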
Asymptotic normality: Denote again \(Z_i = \frac{T_{i}Y_{i}}{e(X_{i})} - \frac{(1-T_{i})Y_{i}}{1-e(X_{i})}\), an iid sequence \(\left\{Z_{1}, Z_{2}, \ldots, Z_{n}\right\}\) with mean \(\tau\) and variance \(V_{\text{\tiny IPW}}^*:= \mathbb{E}\left[\frac{\left( Y^{(0)} \right)^2}{1-e(X)} + \frac{\left( Y^{(1)} \right)^2}{e(X)}\right] - \tau^2\). The central limit theorem then ensures that the sample average \(\bar{Z}_{n}\), which is exactly the oracle IPW estimator, is asymptotically normal with mean \(\tau\) and variance \(\frac{V_{\text{\tiny IPW}}^*}{n}\). We denote this
\[\begin{equation*} \sqrt{n}\left(\hat{\tau}_{\text{\tiny IPW}}^* - \tau \right) \stackrel{d}{\rightarrow} \mathcal{N}\left( 0,V_{\text{\tiny IPW}}^* \right) \end{equation*}\] Note that another proof is possible using M-estimation theory.
Furthermore, we can show that if our estimate of the propensity score satisfies a certain convergence condition, then, under additional assumptions, the Inverse Propensity Weighting estimator converges to its oracle counterpart.
where in \((*)\) we used the fact that, since \(\sup _{x \in \mathcal{X}}|e(x)-\hat{e}(x)|= \mathcal{O}_{P}\left(a_{n}\right)\), there exists a large enough \(n\) such that \(\frac{\eta}{2} \leq \hat{e}(X_{i}) \leq 1- \frac{\eta}{2}\), which yields \(|\hat{\tau}_{\text{\tiny IPW}} - \hat{\tau}_{\text{\tiny IPW}}^*| \leq \frac{2M}{\eta^2} \max_{1 \leq i \leq n} |e(X_{i})-\hat{e}(X_{i})|.\)
Therefore, we have that \[|\hat{\tau}_{\text{\tiny IPW}} - \hat{\tau}_{\text{\tiny IPW}}^*| = \mathcal{O}_p\left(\frac{a_n M}{\eta^2}\right).\]
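The convergence of the plug-in IPW estimator to its oracle can be illustrated numerically. The sketch below uses a hypothetical toy model and, as a stand-in for a generic nonparametric estimator \(\hat e\), a simple binned (histogram) estimate of the propensity score; both choices are illustrative assumptions, not the course's:

```python
import numpy as np

rng = np.random.default_rng(3)
TAU, N, K = 2.0, 20_000, 20  # toy ATE, sample size, number of bins

X = rng.uniform(size=N)
e = 0.3 + 0.4 * X  # true propensity score, bounded in [0.3, 0.7]
T = rng.binomial(1, e)
Y = T * (TAU + X) + (1 - T) * X + rng.normal(size=N)

# Binned propensity estimate: average treatment rate within each bin of X
bin_idx = np.minimum((X * K).astype(int), K - 1)
bin_rate = np.array([T[bin_idx == k].mean() for k in range(K)])
e_hat = np.clip(bin_rate[bin_idx], 0.05, 0.95)  # overlap-style truncation

tau_oracle = np.mean(T * Y / e - (1 - T) * Y / (1 - e))
tau_ipw = np.mean(T * Y / e_hat - (1 - T) * Y / (1 - e_hat))
print(abs(tau_ipw - tau_oracle))  # small: the plug-in tracks the oracle
```

As \(\hat e\) approaches \(e\) uniformly, the plug-in estimate stays within the \(\mathcal{O}_p(a_n M / \eta^2)\) band around the oracle estimate.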
Hence, using both previous propositions, we can now state the main theorem for the IPW estimator.
The IPW estimator \(\hat{\tau}_{\text{\tiny IPW}}\) is consistent and asymptotically normal, that is,
\[\sqrt{n}\left(\hat{\tau}_{\text{\tiny IPW}} - \tau \right) \stackrel{d}{\rightarrow} \mathcal{N}\left(0, V_{\text{\tiny IPW}}^*\right),\]
where \[V_{\text{\tiny IPW}}^{*} = \mathbb{E}\left[\frac{\left( Y^{(0)} \right)^2}{1-e(X)} + \frac{\left( Y^{(1)} \right)^2}{e(X)}\right] - \tau^2.\]
Furthermore, if \(\left(a_{n}\right)\) converges to \(0\) at a \(\sqrt{n}\)-rate, we can show that \(\hat{\tau}_{\text{\tiny IPW}}\) is \(\sqrt{n}\)-consistent.
\[\begin{align*} |\hat{\tau}_{\text{\tiny IPW}} - \tau| &= |\hat{\tau}_{\text{\tiny IPW}} - \hat{\tau}_{\text{\tiny IPW}}^* + \hat{\tau}_{\text{\tiny IPW}}^* - \tau| \\ &\leq |\hat{\tau}_{\text{\tiny IPW}}- \hat{\tau}_{\text{\tiny IPW}}^*| + |\hat{\tau}_{\text{\tiny IPW}}^* - \tau|. \end{align*}\]
We already showed that the oracle IPW estimator is consistent and asymptotically normally distributed. We also showed that \(|\hat{\tau}_{\text{\tiny IPW}} - \hat{\tau}_{\text{\tiny IPW}}^*| = \mathcal{O}_p\left(\frac{a_nM}{\eta^2}\right)\). Therefore, using the triangle inequality above, we get that \(\hat{\tau}_{\text{\tiny IPW}}\) is consistent.
Furthermore, if we assume that \(\left(a_{n}\right)\) converges to \(0\) at a \(\sqrt{n}\)-rate, we can write:
\[\begin{equation*} \sqrt{n} (\hat \tau_{\text{\tiny IPW}} - \tau) = \underbrace{\sqrt{n} (\hat \tau_{\text{\tiny IPW}} - \hat\tau_{\text{\tiny IPW}} ^*)}_\textrm{$\stackrel{p}{\longrightarrow} 0$} + \underbrace{\sqrt{n}(\hat\tau_{\text{\tiny IPW}} ^* - \tau)}_\textrm{$\stackrel{d}{\rightarrow} \mathcal{N}\left(0, V_{\text{\tiny IPW}}^* \right)$}, \end{equation*}\] and we can conclude by Slutsky's theorem that \(\hat{\tau}_{\text{\tiny IPW}}\) is \(\sqrt{n}\)-consistent with the same asymptotic variance as \(\hat\tau_{\text{\tiny IPW}}^*\).
Therefore, if we can estimate the propensity score for all individuals, then we can use the IPW estimator to compute the ATE. However, the IPW estimator has several problems. In particular, it is not invariant to translation of the outcome. For example, if we change \(Y\) to \(Y + c\) with \(c\) a constant, then we get: \[\begin{align*} \hat{\tau}_{\text{\tiny IPW}}^{bis} &= \frac{1}{n} \sum_{i=1}^{n}\left(\frac{T_{i} (Y_{i} + c)}{\hat{e}\left(X_{i}\right)}-\frac{\left(1-T_{i}\right) (Y_{i}+c)}{1-\hat{e}\left(X_{i}\right)}\right)\\ &= \hat{\tau}_{\text{\tiny IPW}} + \frac{c}{n}\sum_{i=1}^n \left(\frac{T_i}{\hat{e}\left(X_{i}\right)} - \frac{1-T_i}{1-\hat{e}\left(X_{i}\right)}\right). \end{align*}\] With the true propensity score, \(\frac{1}{n}\sum_{i=1}^n \left(\frac{T_i}{e\left(X_{i}\right)} - \frac{1-T_i}{1-e\left(X_{i}\right)}\right)\) converges to \(0\); however, in general it is not equal to \(0\) in finite samples. Since adding a constant to every outcome should not change the average causal effect, this dependence on \(c\) makes the estimator unreasonable.
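This non-invariance is easy to exhibit numerically. The sketch below uses a hypothetical toy model with the true propensity score (an illustrative simplification), and shows that shifting every outcome by \(c\) shifts the IPW estimate by exactly \(c\) times the (nonzero) average weight difference:

```python
import numpy as np

rng = np.random.default_rng(4)
n, c = 500, 10.0  # sample size and outcome shift (illustrative values)

X = rng.uniform(size=n)
e = 0.3 + 0.4 * X  # true propensity score, used for simplicity
T = rng.binomial(1, e)
Y = T * (2.0 + X) + (1 - T) * X + rng.normal(size=n)

def ipw(Y, T, e):
    """Unnormalized IPW estimator."""
    return np.mean(T * Y / e - (1 - T) * Y / (1 - e))

# Average weight difference: converges to 0 but is nonzero in finite samples
w_bar = np.mean(T / e - (1 - T) / (1 - e))
print(ipw(Y + c, T, e) - ipw(Y, T, e))  # equals c * w_bar, not 0
```

The gap between the two estimates is exactly \(c\) times the empirical average of \(\frac{T_i}{e(X_i)} - \frac{1-T_i}{1-e(X_i)}\), which is almost never zero in a finite sample.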
A simple fix to the problem is to normalize the weights of our IPW estimator (often called the normalized, or Hájek, IPW estimator): \[\begin{equation*} \hat{\tau}_{\text{\tiny IPW, norm}} = \sum_{i=1}^{n}\frac{T_{i} Y_{i}/\hat{e}(X_{i})}{\sum_{j=1}^{n} T_{j}/\hat{e}(X_{j})} - \sum_{i=1}^{n}\frac{(1-T_{i}) Y_{i}/(1-\hat{e}(X_{i}))}{\sum_{j=1}^{n} (1-T_{j})/(1-\hat{e}(X_{j}))}. \end{equation*}\] The weights in each arm now sum to one, so adding a constant \(c\) to every outcome shifts both weighted averages by \(c\) and leaves the estimator unchanged. Its oracle version satisfies
\[\begin{equation*} \mathbb{E}[\hat{\tau}_{\text {\tiny IPW, norm }}^{*}] = \tau. \end{equation*}\]
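The translation invariance of the normalized estimator can be verified directly. The sketch below uses the same kind of hypothetical toy model as before (true propensity score, illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(5)
n, c = 500, 10.0  # sample size and outcome shift (illustrative values)

X = rng.uniform(size=n)
e = 0.3 + 0.4 * X  # true propensity score, used for simplicity
T = rng.binomial(1, e)
Y = T * (2.0 + X) + (1 - T) * X + rng.normal(size=n)

def ipw_norm(Y, T, e):
    """Normalized IPW: the weights in each arm sum to one."""
    w1, w0 = T / e, (1 - T) / (1 - e)
    return np.sum(w1 * Y) / np.sum(w1) - np.sum(w0 * Y) / np.sum(w0)

# Shifting every outcome by c shifts both weighted averages by c,
# so their difference (the normalized estimate) is unchanged
print(ipw_norm(Y + c, T, e) - ipw_norm(Y, T, e))  # 0 up to float rounding
```

Unlike the unnormalized IPW estimator, the normalized version responds to an outcome shift exactly as the true ATE does: not at all.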