4  RCT estimators

Now that we have defined the framework and the assumptions under which we are working, we can build estimators of the average treatment effect.

4.1 Horvitz-Thompson estimator

Definition 4.1: Horvitz-Thompson estimator

The Horvitz-Thompson estimator is denoted \(\hat{\tau}_{HT,n}\) and defined as, \[\hat{\tau}_{HT,n} = \frac{1}{n}\sum_{i=1}^n \left( \frac{T_iY_i}{e} - \frac{(1-T_i)Y_i}{1-e} \right) \] where \(e\) is:

  • the probability of being treated under a Bernoulli trial
  • the proportion of treated units under a Completely randomized trial

This estimator was introduced by Daniel G. Horvitz and Donovan J. Thompson in 1952. It relies on inverse probability weighting (a recurring technique in estimator design), which they applied to account for different proportions of observations within strata of a target population.
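As a quick illustration, the estimator can be written in a few lines of Python (a minimal sketch; the function name and the toy data are ours, not from the text):

```python
import numpy as np

def horvitz_thompson(y, t, e):
    """Horvitz-Thompson estimate of the average treatment effect.

    y: observed outcomes, t: binary treatment indicators,
    e: probability of being treated (Bernoulli trial) or
       proportion of treated units (Completely randomized trial).
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    return np.mean(t * y / e - (1 - t) * y / (1 - e))

# Toy example with e = 0.5: every unit is weighted by 1/e = 2.
tau_hat = horvitz_thompson([1, 2, 3, 4], [1, 0, 1, 0], 0.5)
print(tau_hat)  # -1.0
```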

Proposition 4.1: Unbiasedness and consistency of \(\hat{\tau}_{HT,n}\) under a Bernoulli trial
Under a Bernoulli trial we have: \[\mathbb{E}_{\mathcal{B}}[\hat{\tau}_{HT,n}]=\tau\] and its variance satisfies, for all n, \[n \mathbb{Var}_{\mathcal{B}}[\hat{\tau}_{HT,n}] = \mathbb{E}\left[ \frac{(Y^{(1)})^2}{\pi}\right] + \mathbb{E}\left[ \frac{(Y^{(0)})^2}{1-\pi}\right] - \tau^2 := V_{HT} \]

Proof
Bias \[\begin{align*} \mathbb{E}_{\mathcal{B}}[\hat{\tau}_{HT,n}] &= \frac{\mathbb{E}_{\mathcal{B}}[TY^{(1)}]}{e} - \frac{\mathbb{E}_{\mathcal{B}}[(1-T)Y^{(0)}]}{1-e} && \text{Linearity, I.I.D and SUTVA} \\ &= \frac{\mathbb{E}_{\mathcal{B}}[T]\mathbb{E}_{\mathcal{B}}[Y^{(1)}]}{e} - \frac{\mathbb{E}_{\mathcal{B}}[(1-T)]\mathbb{E}_{\mathcal{B}}[Y^{(0)}]}{1-e} && \text{Randomization} \\ &= \frac{\pi \mathbb{E}_{\mathcal{B}}[Y^{(1)}]}{e} - \frac{(1-\pi)\mathbb{E}_{\mathcal{B}}[Y^{(0)}]}{1-e} && \text{Def of $\pi$ in Bernoulli design} \\ &= \tau && \text{e := $\pi$ in Bernoulli design} \end{align*}\]

Variance \[\begin{align*} \mathbb{Var}_{\mathcal{B}}[\hat{\tau}_{HT,n}] &= \mathbb{Var}_{\mathcal{B}}\left[\frac{1}{n}\sum_{i=1}^n \frac{T_iY_i}{e} - \frac{(1-T_i)Y_i}{1-e}\right] \\ &=\frac{1}{n^2} \mathbb{Var}_{\mathcal{B}}\left[\sum_{i=1}^n \frac{T_iY^{(1)}_i}{e} - \frac{(1-T_i)Y^{(0)}_i}{1-e}\right] && \text{SUTVA} \\ &= \frac{1}{n} \mathbb{Var}_{\mathcal{B}}\left[\frac{TY^{(1)}}{e} - \frac{(1-T)Y^{(0)}}{1-e}\right] && \text{I.I.D} \end{align*}\] Then, \[\mathbb{Var}_{\mathcal{B}}[\hat{\tau}_{HT,n}] = \frac{1}{n}\left(\mathbb{Var}_{\mathcal{B}}\left[\frac{TY^{(1)}}{e}\right] + \mathbb{Var}_{\mathcal{B}}\left[\frac{(1-T)Y^{(0)}}{1-e}\right] -2\text{Cov}_{\mathcal{B}}\left[\frac{TY^{(1)}}{e}, \frac{(1-T)Y^{(0)}}{1-e}\right]\right)\] The first two terms can be simplified, noting that \[\begin{align*} \mathbb{E}_{\mathcal{B}}\left[\left(\frac{TY^{(1)}}{e}\right)^2\right] &= \mathbb{E}_{\mathcal{B}}\left[\mathbb{1}_{T=1}\left(\frac{Y^{(1)}}{e}\right)^2\right] && \text{$T$ is binary} \\ &= \mathbb{E}_{\mathcal{B}}\left[\left(\frac{Y^{(1)}}{e}\right)^2\right] \mathbb{E}_{\mathcal{B}}\left[\mathbb{1}_{T=1}\right] && \text{Randomization of trial} \\ &= \mathbb{E}_{\mathcal{B}}\left[\frac{(Y^{(1)})^2}{e}\right] && \text{Definition of $e$} \end{align*}\] Similarly, \[\mathbb{E}_{\mathcal{B}}\left[\left(\frac{(1-T)Y^{(0)}}{1-e}\right)^2\right] = \mathbb{E}_{\mathcal{B}}\left[\frac{(Y^{(0)})^2}{1-e}\right]\] So, \[\begin{align*} \mathbb{Var}_{\mathcal{B}}\left[\frac{TY^{(1)}}{e}\right] &= \mathbb{E}_{\mathcal{B}}\left[\left(\frac{TY^{(1)}}{e}\right)^2\right] -\mathbb{E}_{\mathcal{B}}\left[\frac{TY^{(1)}}{e}\right]^2 \\ &= \mathbb{E}_{\mathcal{B}}\left[\frac{(Y^{(1)})^2}{e}\right] - \mathbb{E}_{\mathcal{B}}\left[Y^{(1)}\right]^2 \end{align*}\] Similarly, \[\mathbb{Var}_{\mathcal{B}}\left[\frac{(1-T)Y^{(0)}}{1-e}\right] = \mathbb{E}_{\mathcal{B}}\left[\frac{(Y^{(0)})^2}{1-e}\right] - \mathbb{E}_{\mathcal{B}}\left[Y^{(0)}\right]^2 \] The third covariance term can also be
decomposed, so that, \[\begin{align*} \text{Cov}_{\mathcal{B}}\left[\frac{TY^{(1)}}{e}, \frac{(1-T)Y^{(0)}}{1-e}\right] &= \mathbb{E}_{\mathcal{B}}\left[\left(\frac{TY^{(1)}}{e}- \mathbb{E}_{\mathcal{B}}[Y^{(1)}]\right)\left(\frac{(1-T)Y^{(0)}}{1-e}- \mathbb{E}_{\mathcal{B}}[Y^{(0)}]\right)\right] \\ &= \mathbb{E}_{\mathcal{B}}\left[\underbrace{\frac{TY^{(1)}}{e}\frac{(1-T)Y^{(0)}}{1-e}}_{=0}\right]-\mathbb{E}_{\mathcal{B}}[Y^{(1)}]\mathbb{E}_{\mathcal{B}}[Y^{(0)}] \end{align*}\] Finally, \[n \mathbb{Var}_{\mathcal{B}}[\hat{\tau}_{HT,n}] = \mathbb{E}\left[ \frac{(Y^{(1)})^2}{\pi}\right] + \mathbb{E}\left[ \frac{(Y^{(0)})^2}{1-\pi}\right] - \tau^2 := V_{HT} \]
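Proposition 4.1 can be checked by Monte Carlo simulation. Below is a sketch under an illustrative Gaussian potential-outcome model (all parameter values and names are ours, chosen only for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, pi, reps = 2000, 0.3, 2000
mu1, mu0, sigma = 2.0, 0.5, 1.0       # illustrative potential-outcome model
tau = mu1 - mu0

est = np.empty(reps)
for r in range(reps):
    t = rng.binomial(1, pi, size=n)   # Bernoulli design: T_i ~ B(pi)
    y1 = mu1 + sigma * rng.standard_normal(n)
    y0 = mu0 + sigma * rng.standard_normal(n)
    y = t * y1 + (1 - t) * y0         # consistency / SUTVA
    est[r] = np.mean(t * y / pi - (1 - t) * y / (1 - pi))

# Theoretical V_HT = E[(Y1)^2]/pi + E[(Y0)^2]/(1-pi) - tau^2
v_ht = (mu1**2 + sigma**2) / pi + (mu0**2 + sigma**2) / (1 - pi) - tau**2
print(est.mean(), tau)        # empirical mean close to tau
print(n * est.var(), v_ht)    # n * empirical variance close to V_HT
```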

Proposition 4.2: Unbiasedness and consistency of \(\hat{\tau}_{HT,n}\) under a Completely randomized trial
Under a Completely randomized trial we have: \[\mathbb{E}_{\mathcal{C}}[\hat{\tau}_{HT,n}]=\tau\] and its variance satisfies, for all n, \[\mathbb{Var}_{\mathcal{C}}[\hat{\tau}_{HT,n}] = \frac{\mathbb{Var}\left[Y^{(1)}\right]}{n_1} + \frac{\mathbb{Var}\left[Y^{(0)}\right]}{n_0}\]

where \(n_1= \sum_{i=1}^n T_i\) and \(n_0= \sum_{i=1}^n (1- T_i)\).

Proof
Bias \[\begin{align*} \mathbb{E}_{\mathcal{C}}[\hat{\tau}_{HT,n}] &= \frac{1}{n}\sum_{i=1}^n\frac{\mathbb{E}_{\mathcal{C}}[T_iY^{(1)}_i]}{e} - \frac{\mathbb{E}_{\mathcal{C}}[(1-T_i)Y^{(0)}_i]}{1-e} && \text{Linearity and SUTVA} \\ &= \frac{1}{n}\sum_{i=1}^n\frac{\mathbb{E}_{\mathcal{C}}[T_i]\mathbb{E}_{\mathcal{C}}[Y^{(1)}_i]}{e} - \frac{\mathbb{E}_{\mathcal{C}}[(1-T_i)]\mathbb{E}_{\mathcal{C}}[Y^{(0)}_i]}{1-e} && \text{Randomization} \\ &= \frac{1}{n}\sum_{i=1}^n\frac{\frac{n_1}{n} \mathbb{E}_{\mathcal{C}}[Y^{(1)}_i]}{\frac{n_1}{n}} - \frac{(1-\frac{n_1}{n})\mathbb{E}_{\mathcal{C}}[Y^{(0)}_i]}{1-\frac{n_1}{n}} && \text{Def of $e$ in a Completely randomized design} \\ &= \frac{\frac{n_1}{n} \mathbb{E}_{\mathcal{C}}[Y^{(1)}]}{\frac{n_1}{n}} - \frac{(1-\frac{n_1}{n})\mathbb{E}_{\mathcal{C}}[Y^{(0)}]}{1-\frac{n_1}{n}} && \text{I.I.D of $Y^{(1)}_i$ and $Y^{(0)}_i$} \\ &= \tau && \text{Def of $\tau$} \end{align*}\] Variance

Using the law of total variance, and conditioning on the treatment assignment vector \(T\), one has \[\begin{align*} \mathbb{Var}_{\mathcal{C}}\left[\hat{\tau}_{HT,n}\right] &= \mathbb{E}_{\mathcal{C}}\left[\mathbb{Var}_{\mathcal{C}}\left[\hat{\tau}_{HT,n}|T\right]\right] + \mathbb{Var}_{\mathcal{C}}\left[\mathbb{E}_{\mathcal{C}}\left[\hat{\tau}_{HT,n}|T\right]\right] \end{align*}\] We first start with the second term of the equation: \[\begin{align*} \mathbb{E}_{\mathcal{C}}\left[\hat{\tau}_{HT,n}|T\right] &= \frac{1}{n}\sum_{i=1}^n\mathbb{E}_{\mathcal{C}}\left[\frac{T_iY_i^{(1)}}{e} - \frac{(1-T_i)Y_i^{(0)}}{1-e}|T\right] && \text{Using linearity and Consistency} \\ &= \frac{1}{n} \sum_{i=1}^n \frac{T_i}{e} \mathbb{E}_{\mathcal{C}}\left[Y_i^{(1)}|T\right] - \frac{(1-T_i)}{1-e} \mathbb{E}_{\mathcal{C}}\left[Y_i^{(0)}|T\right] && \text{$T_i$ is measurable with respect to $T$}\\ &= \frac{1}{n}\sum_{i=1}^n\frac{T_i}{e}\mathbb{E}_{\mathcal{C}}\left[Y_i^{(1)}\right] - \frac{(1-T_i)}{1-e}\mathbb{E}_{\mathcal{C}}\left[Y_i^{(0)}\right] && \text{Exchangeability}\\ &= \left(\frac{1}{n}\sum_{i=1}^n T_i \right) \frac{\mathbb{E}\left[Y_i^{(1)}\right]}{e} - \left(\frac{1}{n}\sum_{i=1}^n 1-T_i \right) \frac{\mathbb{E}\left[Y_i^{(0)}\right]}{1-e} && \text{the expectation of $Y$ is independent of design} \\ &= \left(\frac{n_1}{n} \right) \frac{\mathbb{E}\left[Y_i^{(1)}\right]}{\frac{n_1}{n}} - \left(\frac{n-n_1}{n}\right) \frac{\mathbb{E}\left[Y_i^{(0)}\right]}{\frac{n-n_1}{n}} && \text{def. of $e$ in a Completely randomized trial} \\ &= \tau \\ \end{align*}\] Therefore, we have that \(\mathbb{Var}_{\mathcal{C}}\left[\mathbb{E}_{\mathcal{C}}\left[\hat{\tau}_{HT,n}|T\right]\right] = 0\) since \(\tau\) is constant.

Now, we look at the first term:

\[\begin{align*} \mathbb{Var}_{\mathcal{C}}\left[\hat{\tau}_{HT,n}|T\right] &= \frac{1}{n^2} \mathbb{Var}_{\mathcal{C}}\left[\sum_{i=1}^n \frac{T_iY_i^{(1)}}{e} - \frac{(1-T_i)Y_i^{(0)}}{1-e}|T\right] &&\text{Consistency}\\ &= \frac{1}{n^2} \left( \mathbb{Var}_{\mathcal{C}}\left[\sum_{i=1}^n \frac{T_iY_i^{(1)}}{e}|T\right] + \mathbb{Var}_{\mathcal{C}}\left[\sum_{i=1}^n \frac{(1-T_i)Y_i^{(0)}}{1-e}|T\right] \right) \\ &- \frac{2}{n^2} \mathbb{Cov}_{\mathcal{C}}\left[\sum_{i=1}^n \frac{T_iY_i^{(1)}}{e} ; \sum_{j=1}^n\frac{(1-T_j)Y_j^{(0)}}{1-e}|T\right] \end{align*}\]

We first focus on the first term:

\[\begin{align*} \mathbb{Var}_{\mathcal{C}}\left[\sum_{i=1}^n \frac{T_iY_i^{(1)}}{e}|T\right] &= \mathbb{Var}_{\mathcal{C}}\left[\sum_{T_i=1} \frac{Y_i^{(1)}}{e}|T\right] \\ &= \mathbb{Var}_{\mathcal{C}}\left[\sum_{i=1}^{n_1} \frac{Y_i^{(1)}}{e}|T\right] &&\text{Since the $Y_i$ are i.i.d.\ and independent of the $T_i$}\\ &= \mathbb{Var}_{\mathcal{C}}\left[\sum_{i=1}^{n_1} \frac{Y_i^{(1)}}{e}|n_1\right] \\ &= \frac{n_1}{e^2} \mathbb{Var}[Y_i^{(1)}] \end{align*}\]

Similarly, we also have:

\[\begin{align*} \mathbb{Var}_{\mathcal{C}}\left[\sum_{i=1}^n \frac{(1-T_i)Y_i^{(0)}}{1-e}|T\right] = \frac{n_0}{(1-e)^2} \mathbb{Var}[Y_i^{(0)}] \end{align*}\]

and for the third term we have:

\[\begin{align*} \mathbb{Cov}_{\mathcal{C}}\left[\sum_{i=1}^n \frac{T_iY_i^{(1)}}{e} ; \sum_{j=1}^n\frac{(1-T_j)Y_j^{(0)}}{1-e}|T\right] &= \sum_{i=1}^n \sum_{j=1}^n \frac{T_i(1-T_j)}{e(1-e)} \mathbb{Cov}_{\mathcal{C}}\left[ Y_i; Y_j|T\right] && \text{linearity and Consistency}\\ &= \sum_{i=1}^n \sum_{j=1}^n \frac{T_i(1-T_j)}{e(1-e)} \mathbb{Cov}_{\mathcal{C}}\left[ Y_i; Y_j\right] && \text{$Y_i$ are independent from $T$}\\ &= \sum_{i=1}^n \sum_{j=1}^n \frac{T_i(1-T_j)}{e(1-e)} \delta_{ij} \mathbb{Var}(Y_i) && \text{$Y_i$ are independent}\\ &= 0 \end{align*}\]

Therefore we have that,

\[\begin{align*} \mathbb{E}_{\mathcal{C}}\left[\mathbb{Var}_{\mathcal{C}}\left[\hat{\tau}_{HT,n}|T\right]\right] &= \mathbb{E}_{\mathcal{C}}\left[\frac{1}{n^2} \left(\frac{n_1}{e^2} \mathbb{Var}[Y_i^{(1)}] + \frac{n_0}{(1-e)^2} \mathbb{Var}[Y_i^{(0)}]\right)\right]\\ &= \frac{\mathbb{Var}\left[Y^{(1)}\right]}{n_1} + \frac{\mathbb{Var}\left[Y^{(0)}\right]}{n_0} \end{align*}\]
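Proposition 4.2 can also be checked by simulation, drawing exactly \(n_1\) treated units per replication (a sketch; the Gaussian outcome model and all parameter values are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n, n1, reps = 1000, 400, 3000
n0 = n - n1
mu1, mu0, s1, s0 = 1.0, 0.0, 1.0, 2.0   # illustrative means and std devs
e = n1 / n

est = np.empty(reps)
for r in range(reps):
    t = np.zeros(n)
    t[rng.choice(n, size=n1, replace=False)] = 1.0   # completely randomized design
    y1 = mu1 + s1 * rng.standard_normal(n)
    y0 = mu0 + s0 * rng.standard_normal(n)
    y = t * y1 + (1 - t) * y0
    est[r] = np.mean(t * y / e - (1 - t) * y / (1 - e))

v_theory = s1**2 / n1 + s0**2 / n0       # Var[Y1]/n1 + Var[Y0]/n0
print(est.mean(), mu1 - mu0)             # close to tau
print(est.var(), v_theory)               # close to the theoretical variance
```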

4.2 Difference-in-means - Neyman estimator

Definition 4.2: Difference-in-means - Neyman estimator
The Difference-in-means or Neyman estimator is denoted \(\hat{\tau}_{DM,n}\) and defined as, \[\hat{\tau}_{DM,n} = \frac{1}{n_1}\sum_{T_i=1}Y_i -\frac{1}{n_0}\sum_{T_i=0}Y_i\]

The Difference-in-Means estimator can be viewed as a variant of the Horvitz-Thompson estimator in which the probability of being treated \(e\) (or propensity score) is estimated, that is,

\[\hat{\tau}_{DM,n} = \frac{1}{n} \sum_{i=1}^n \left( \frac{T_iY_i}{\hat{e}} - \frac{(1-T_i)Y_i}{1-\hat{e}} \right) \]

where \(\hat{e} = \frac{1}{n}\sum_{i=1}^n T_i\).
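This algebraic equivalence between the difference-in-means and the Horvitz-Thompson form with \(\hat{e}\) plugged in can be verified on a toy sample (the data below are ours, for illustration only):

```python
import numpy as np

y = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])   # toy outcomes
t = np.array([1, 0, 1, 0, 0, 1])               # toy treatment assignment

# Difference in means: mean of treated minus mean of controls.
dm = y[t == 1].mean() - y[t == 0].mean()

# Horvitz-Thompson form with the estimated propensity e_hat.
e_hat = t.mean()
ht = np.mean(t * y / e_hat - (1 - t) * y / (1 - e_hat))

print(dm, ht)  # both approximately 3.0
```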

Proposition 4.3: Asymptotic unbiasedness and consistency of \(\hat{\tau}_{DM,n}\) under a Bernoulli design
Under a Bernoulli design we have for all n, \[ \mathbb{E}_{\mathcal{B}}[\hat{\tau}_{DM,n}]=\tau + \pi^n \mathbb{E}[Y_i^{(0)}] - (1-\pi)^n \mathbb{E}[Y_i^{(1)}] \] \[n \mathbb{Var}_{\mathcal{B}}[\hat{\tau}_{DM,n}] = \mathbb{E}_{\mathcal{B}}\left[\frac{\mathbb{1}_{\hat{e}>0}}{\hat{e}}\right]\mathbb{Var}[Y^{(1)}] + \mathbb{E}_{\mathcal{B}}\left[\frac{\mathbb{1}_{1-\hat{e}>0}}{1-\hat{e}}\right]\mathbb{Var}[Y^{(0)}] + \mathcal{O}(\max{(\pi, 1-\pi)}^n)\] Asymptotically, the difference-in-means estimator is unbiased: \[\lim_{n\to \infty}\mathbb{E}_{\mathcal{B}}[\hat{\tau}_{DM,n}]=\tau \] and its large sample variance satisfies, \[\lim_{n\to \infty} n \mathbb{Var}_{\mathcal{B}}[\hat{\tau}_{DM,n}] = \frac{\mathbb{Var}[Y^{(1)}]}{\pi} + \frac{\mathbb{Var}[Y^{(0)}]}{1-\pi} := V_{DM, \infty}\]

Proof
Bias

One can use the law of total expectation, conditioning on the treatment assignment vector denoted \(\mathbf{T}\), \[\begin{align*} \mathbb{E}_{\mathcal{B}}[\hat{\tau}_{DM,n}] &= \mathbb{E}_{\mathcal{B}}[\mathbb{E}_{\mathcal{B}}[\hat{\tau}_{DM,n}|\mathbf{T}]]\\ &= \mathbb{E}_{\mathcal{B}}\left[\frac{\frac{1}{n}\sum_{i=1}^n{T_i}}{\frac{1}{n}\sum_{i=1}^n{T_i}}\mathbb{E}[Y_i^{(1)}|\mathbf{T}] - \frac{\frac{1}{n}\sum_{i=1}^n{(1-T_i)}}{\frac{1}{n}\sum_{i=1}^n{(1-T_i)}}\mathbb{E}[Y_i^{(0)}|\mathbf{T}]\right] \\ &=\mathbb{E}_{\mathcal{B}}\left[\frac{\frac{1}{n}\sum_{i=1}^n{T_i}}{\frac{1}{n}\sum_{i=1}^n{T_i}}\mathbb{E}[Y_i^{(1)}] - \frac{\frac{1}{n}\sum_{i=1}^n{(1-T_i)}}{\frac{1}{n}\sum_{i=1}^n{(1-T_i)}}\mathbb{E}[Y_i^{(0)}]\right] && \{Y_i^{(0)}, Y_i^{(1)}\} \perp\mkern-9.5mu\perp T_i \\ &= \mathbb{E}_{\mathcal{B}}\left[\mathbb{1}_{\sum_{i=1}^n{T_i}>0}\mathbb{E}[Y_i^{(1)}] - \mathbb{1}_{\sum_{i=1}^n{1-T_i}>0}\mathbb{E}[Y_i^{(0)}]\right]\\ &= \mathbb{E}[Y_i^{(1)}] \mathbb{E}_{\mathcal{B}}[\mathbb{1}_{\sum_{i=1}^n{T_i}>0}] - \mathbb{E}[Y_i^{(0)}] \mathbb{E}_{\mathcal{B}}[\mathbb{1}_{\sum_{i=1}^n{(1-T_i)}>0}] && \{Y_i^{(0)}, Y_i^{(1)}\} \perp\mkern-9.5mu\perp T_i \\ &= \mathbb{E}[Y_i^{(1)}] (1-(1-\pi)^n) - \mathbb{E}[Y_i^{(0)}] (1-(\pi)^n)\\ &= \tau - (1-\pi)^n\mathbb{E}[Y_i^{(1)}] + (\pi)^n\mathbb{E}[Y_i^{(0)}] \end{align*}\] where the second row uses linearity of expectation and the conditioning on \(\mathbf{T}\). To summarize, the difference-in-means estimator has a finite sample bias, \[\mathbb{E}_{\mathcal{B}}[\hat{\tau}_{DM,n}] = \tau - (1-\pi)^n\mathbb{E}[Y_i^{(1)}] + \pi^n\mathbb{E}[Y_i^{(0)}]\]
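The finite-sample bias formula can be verified exactly for small \(n\) by enumerating every treatment assignment. The sketch below is ours: the means `a` and `b` stand in for \(\mathbb{E}[Y^{(1)}]\) and \(\mathbb{E}[Y^{(0)}]\), and an empty group contributes 0, matching the indicator convention used in the derivation above.

```python
from itertools import product

n, pi = 4, 0.3
a, b = 2.0, 0.5   # stand-ins for E[Y^(1)] and E[Y^(0)]

# E[tau_hat_DM] = sum over assignments of P(T) * (1_{n1>0} a - 1_{n0>0} b)
exact = 0.0
for t in product([0, 1], repeat=n):
    n1 = sum(t)
    p = pi**n1 * (1 - pi)**(n - n1)
    exact += p * ((a if n1 > 0 else 0.0) - (b if n1 < n else 0.0))

formula = (a - b) - (1 - pi)**n * a + pi**n * b
print(exact, formula)  # the two values agree; both differ from tau = a - b
```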

Variance

Using the law of total variance, and conditioning on the treatment assignment vector \(\mathbf{T}\), one has \[\begin{align*} \operatorname{Var}_{\mathcal{B}}\left[ \hat{\tau}_{\text{\tiny DM}} \right] &= \operatorname{Var}_{\mathcal{B}}\left[\mathbb{E}_{\mathcal{B}}\left[ \hat{\tau}_{\text{\tiny DM}} \mid \mathbf{T}\right] \right] + \mathbb{E}_{\mathcal{B}}\left[ \operatorname{Var}_{\mathcal{B}}\left[ \hat{\tau}_{\text{\tiny DM}} \mid \mathbf{T} \right] \right]. \end{align*}\]

Recall from derivations about the bias that,

\[\begin{align*} \mathbb{E}_{\mathcal{B}}\left[ \hat{\tau}_{\text{\tiny DM}} \mid \mathbf{T}\right] &= \mathbb{1}_{\sum_{i=1}^n T_i > 0} \mathbb{E}\left[ Y_i^{(1)}\right] - \mathbb{1}_{\sum_{i=1}^n 1-T_i > 0} \mathbb{E}_{\mathcal{B}}\left[ Y_i^{(0)}\right]. \end{align*}\]

Here, one has,

\[\begin{align*} \operatorname{Var}_{\mathcal{B}}\left[\mathbb{E}_{\mathcal{B}}\left[ \hat{\tau}_{\text{\tiny DM}} \mid \mathbf{T}\right] \right] &= \operatorname{Var}_{\mathcal{B}}\left[ \mathbb{1}_{\sum_{i=1}^n T_i > 0} \mathbb{E}\left[ Y_i^{(1)}\right] - \mathbb{1}_{\sum_{i=1}^n 1-T_i > 0} \mathbb{E}\left[ Y_i^{(0)}\right] \right]\\ &= \mathbb{E}\left[ Y_i^{(1)}\right]^2 \operatorname{Var}_{\mathcal{B}}\left[\mathbb{1}_{\sum_{i=1}^n T_i > 0} \right] + \mathbb{E}\left[ Y_i^{(0)}\right]^2 \operatorname{Var}_{\mathcal{B}}\left[\mathbb{1}_{\sum_{i=1}^n 1-T_i > 0} \right] \\ &\qquad -2 \mathbb{E}\left[ Y_i^{(1)}\right] \mathbb{E}\left[ Y_i^{(0)}\right] \operatorname{Cov}_{\mathcal{B}}\left[\mathbb{1}_{\sum_{i=1}^n T_i > 0} , \mathbb{1}_{\sum_{i=1}^n 1-T_i > 0} \right]. \end{align*}\]

Besides,

\[\begin{align*} \operatorname{Var}_{\mathcal{B}}\left[\mathbb{1}_{\sum_{i=1}^n T_i > 0} \right] &= \mathbb{E}_{\mathcal{B}}\left[\mathbb{1}_{\sum_{i=1}^n T_i > 0} ^2 \right] - \mathbb{E}_{\mathcal{B}}\left[\mathbb{1}_{\sum_{i=1}^n T_i > 0} \right]^2 \\ &= (1-\pi)^n\left( 1- (1-\pi)^n\right), \end{align*}\]

and similarly,

\[\begin{align*} \operatorname{Var}_{\mathcal{B}}\left[\mathbb{1}_{\sum_{i=1}^n 1-T_i > 0} \right] &= \pi^n\left( 1- \pi^n\right). \end{align*}\]

On the other hand,

\[\begin{align*} \operatorname{Cov}_{\mathcal{B}}\left[\mathbb{1}_{\sum_{i=1}^n T_i > 0} , \mathbb{1}_{\sum_{i=1}^n 1-T_i > 0} \right] &= \mathbb{E}_{\mathcal{B}}\left[\mathbb{1}_{\sum_{i=1}^n T_i > 0} \mathbb{1}_{\sum_{i=1}^n 1-T_i > 0} \right] - \left(1- (1-\pi)^n \right)\left(1-\pi^n\right) \\ &= 1-(1-\pi)^n-\pi^n - \left(1- \pi^n - (1-\pi)^n + \pi^n(1-\pi)^n \right)\\ &= -\pi^n(1-\pi)^n, \end{align*}\]

such that,

\[\begin{align*} \operatorname{Var}_{\mathcal{B}}\left[\mathbb{E}_{\mathcal{B}}\left[ \hat{\tau}_{\text{\tiny DM}} \mid \mathbf{T}\right] \right] &= \mathbb{E}\left[ Y_i^{(1)}\right]^2 (1-\pi)^n\left( 1- (1-\pi)^n\right)+ \mathbb{E}\left[ Y_i^{(0)}\right]^2 \pi^n\left( 1- \pi^n\right)+2 \mathbb{E}\left[ Y_i^{(1)}\right] \mathbb{E}\left[ Y_i^{(0)}\right] \pi^n(1-\pi)^n \\ & \leq \mathbb{E}\left[ Y_i^{(1)}\right]^2 (1-\pi)^n + \mathbb{E}\left[ Y_i^{(0)}\right]^2 \pi^n + 2\left|\mathbb{E}\left[ Y_i^{(1)}\right]\right| \left|\mathbb{E}\left[ Y_i^{(0)}\right]\right| \pi^n(1-\pi)^n \\ & \leq \left( \left|\mathbb{E}\left[ Y^{(1)}\right]\right| + \left|\mathbb{E}\left[ Y^{(0)}\right]\right|\right)^2 \max(\pi, 1 - \pi)^n. \end{align*}\]

Now,

\[\begin{align*} \operatorname{Var}_{\mathcal{B}}\left[ \hat{\tau}_{\text{\tiny DM}} \mid \mathbf{T} \right] &= \operatorname{Var}_{\mathcal{B}}\left[ \frac{1}{n} \sum_{i=1}^n \left( \frac{T_i Y_i^{(1)}}{\hat \pi} - \frac{(1-T_i)Y_i^{(0)} }{1-\hat \pi} \right) \mid \mathbf{T} \right] \\ &= \frac{1}{n^2} \sum_{i=1}^n \operatorname{Var}_{\mathcal{B}}\left[ \frac{T_i Y_i^{(1)}}{\hat \pi} - \frac{(1-T_i)Y_i^{(0)} }{1-\hat \pi} \mid \mathbf{T} \right] && \text{independence of the $Y_i$}\\ &= \frac{1}{n^2} \sum_{i=1}^n \left( \operatorname{Var}_{\mathcal{B}}\left[ \frac{T_i Y_i^{(1)}}{\hat \pi} \mid \mathbf{T}\right] + \operatorname{Var}_{\mathcal{B}}\left[ \frac{(1-T_i)Y_i^{(0)} }{1-\hat \pi} \mid \mathbf{T}\right] - 2 \operatorname{Cov}_{\mathcal{B}}\left[ \frac{T_i Y_i^{(1)}}{\hat \pi} , \frac{(1-T_i)Y_i^{(0)} }{1-\hat \pi} \mid \mathbf{T} \right]\right). \end{align*}\]

Now, developing the covariance term, it is possible to show that,

\[\begin{align*} \operatorname{Cov}_{\mathcal{B}}\left[ \frac{T_i Y_i^{(1)}}{\hat \pi} , \frac{(1-T_i)Y_i^{(0)} }{1-\hat \pi} \mid \mathbf{T} \right] &= - \mathbb{E}_{\mathcal{B}}\left[ \frac{(1-T_i)Y_i^{(0)} }{1-\hat \pi} \mid \mathbf{T} \right] \mathbb{E}_{\mathcal{B}}\left[ \frac{T_iY_i^{(1)} }{\hat \pi} \mid \mathbf{T} \right] \\ &= - \frac{(1-T_i)\mathbb{E}_{\mathcal{B}}\left[Y_i^{(0)} \mid \mathbf{T} \right] }{1-\hat \pi} \frac{T_i\mathbb{E}_{\mathcal{B}}\left[ Y_i^{(1)} \mid \mathbf{T} \right] }{\hat \pi} && \text{Linearity and conditioned on $\mathbf{T}$} \\ &= 0. && \text{$T_i(1-T_i)=0$} \end{align*}\]

Now, also using linearity of expectation, and the fact that we conditioned on \(\mathbf{T}\), one has

\[\begin{align*} \operatorname{Var}_{\mathcal{B}}\left[ \hat{\tau}_{\text{\tiny DM}} \mid \mathbf{T} \right] &= \frac{1}{n^2} \sum_{i=1}^n \left( \left( \frac{T_i}{\hat \pi}\right)^2 \operatorname{Var}_{\mathcal{B}}\left[Y_i^{(1)} \mid \mathbf{T}\right] + \left( \frac{1-T_i}{1- \hat \pi}\right)^2 \operatorname{Var}_{\mathcal{B}}\left[Y_i^{(0)} \mid \mathbf{T}\right] \right) \\ &= \frac{1}{n^2} \sum_{i=1}^n \left( \left( \frac{T_i}{\hat \pi}\right)^2 \operatorname{Var}\left[Y_i^{(1)}\right] + \left( \frac{1-T_i}{1- \hat \pi}\right)^2 \operatorname{Var}\left[Y_i^{(0)}\right] \right), && \text{using $\{Y_i^{(1)}, Y_i^{(0)} \} \perp\mkern-9.5mu\perp \mathbf{T} $.} \end{align*}\]

Taking the expecation of the previous term leads to,

\[\begin{align*} \mathbb{E}_{\mathcal{B}}\left[ \operatorname{Var}_{\mathcal{B}}\left[ \hat{\tau}_{\text{\tiny DM}} \mid \mathbf{T} \right] \right]&= \mathbb{E}\left[ \frac{1}{n^2} \sum_{i=1}^n \left( \left( \frac{T_i}{\hat \pi}\right)^2 \operatorname{Var}\left[Y_i^{(1)}\right] + \left( \frac{1-T_i}{1- \hat \pi}\right)^2 \operatorname{Var}\left[Y_i^{(0)}\right] \right)\right] \\ &= \frac{1}{n} \left(\mathbb{E}\left[ \left( \frac{T_i}{\hat \pi}\right)^2\right] \operatorname{Var}\left[Y_i^{(1)}\right] + \mathbb{E}\left[ \left( \frac{1-T_i}{1-\hat \pi}\right)^2\right] \operatorname{Var}\left[Y_i^{(0)}\right] \right), && \text{by linearity and exchangeability of the $T_i$.} \end{align*}\]

Note that,

\[\begin{align*} \mathbb{E}_{\mathcal{B}}\left[ \left( \frac{T_i}{\hat \pi}\right)^2\right] &= \mathbb{E}_{\mathcal{B}}\left[ \frac{T_i}{\left(\hat \pi\right)^2}\right] \\ &= \frac{1}{n}\left( \mathbb{E}_{\mathcal{B}}\left[ \frac{T_1}{\hat \pi^2}\right] + \mathbb{E}_{\mathcal{B}}\left[ \frac{T_2}{\hat \pi^2}\right] + \dots + \mathbb{E}_{\mathcal{B}}\left[ \frac{T_n}{\hat \pi^2}\right] \right) \\ &= \mathbb{E}_{\mathcal{B}}\left[ \frac{\hat \pi}{\hat \pi^2}\right] \\ &= \mathbb{E}_{\mathcal{B}}\left[ \frac{\mathbb{1}_{\hat \pi > 0} }{\hat \pi}\right], \end{align*}\]

so that

\[\begin{align*} \mathbb{E}_{\mathcal{B}}\left[ \operatorname{Var}_{\mathcal{B}}\left[ \hat{\tau}_{\text{\tiny DM}} \mid \mathbf{T} \right] \right] &= \frac{1}{n} \left( \mathbb{E}_{\mathcal{B}}\left[ \frac{\mathbb{1}_{\hat \pi > 0} }{\hat \pi}\right] \operatorname{Var}\left[Y_i^{(1)}\right] + \mathbb{E}_{\mathcal{B}}\left[ \frac{\mathbb{1}_{(1-\hat \pi) > 0} }{1-\hat \pi}\right] \operatorname{Var}\left[Y_i^{(0)}\right] \right). \end{align*}\]

Coming back to the law of total variance, one has,

\[\begin{align*} \operatorname{Var}_{\mathcal{B}}\left[ \hat{\tau}_{\text{\tiny DM}} \right] &= \operatorname{Var}_{\mathcal{B}}\left[\mathbb{E}_{\mathcal{B}}\left[ \hat{\tau}_{\text{\tiny DM}} \mid \mathbf{T}\right] \right] + \mathbb{E}_{\mathcal{B}}\left[ \operatorname{Var}_{\mathcal{B}}\left[ \hat{\tau}_{\text{\tiny DM}} \mid \mathbf{T} \right] \right] \\ &= \mathbb{E}\left[ Y_i^{(1)}\right]^2 (1-\pi)^n\left( 1- (1-\pi)^n\right) + \mathbb{E}\left[ Y_i^{(0)}\right]^2 \pi^n\left( 1- \pi^n\right) + 2\mathbb{E}\left[ Y_i^{(1)}\right] \mathbb{E}\left[ Y_i^{(0)}\right] \pi^n(1-\pi)^n \\ & \qquad + \frac{1}{n} \left( \mathbb{E}_{\mathcal{B}}\left[ \frac{\mathbb{1}_{\hat \pi > 0} }{\hat \pi}\right] \operatorname{Var}\left[Y_i^{(1)}\right] + \mathbb{E}_{\mathcal{B}}\left[ \frac{\mathbb{1}_{(1-\hat \pi) > 0} }{1-\hat \pi}\right] \operatorname{Var}\left[Y_i^{(0)}\right] \right) \end{align*}\]

In particular, for any sample size,

\[\begin{align*} \operatorname{Var}_{\mathcal{B}}\left[ \hat{\tau}_{\text{\tiny DM}} \right] &= \frac{1}{n} \left( \mathbb{E}_{\mathcal{B}}\left[ \frac{\mathbb{1}_{\hat \pi > 0} }{\hat \pi}\right] \operatorname{Var}\left[Y_i^{(1)}\right] + \mathbb{E}_{\mathcal{B}}\left[ \frac{\mathbb{1}_{(1-\hat \pi) > 0} }{1-\hat \pi}\right] \operatorname{Var}\left[Y_i^{(0)}\right] \right) + \mathcal{O}\left(\max(\pi, 1-\pi)^n \right), \end{align*}\]

and more particularly, \[\begin{align*} \lim_{n\to\infty} n \operatorname{Var}_{\mathcal{B}}\left[ \hat{\tau}_{\text{\tiny DM}} \right] &= \frac{ \operatorname{Var}\left[ Y^{(1)}\right] }{\pi}+ \frac{ \operatorname{Var}\left[ Y^{(0)}\right] }{1-\pi}:= V_{ \text{\tiny DM}, \infty}. \end{align*}\]
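The inverse moment \(\mathbb{E}_{\mathcal{B}}[\mathbb{1}_{\hat\pi>0}/\hat\pi]\) appearing in this variance can be computed exactly, since \(n\hat\pi\) follows a Binomial\((n,\pi)\) distribution, and it converges to \(1/\pi\). A short sketch (the function name is ours):

```python
from math import comb

def inv_pihat_moment(n, pi):
    """E[ 1_{pi_hat > 0} / pi_hat ] where n * pi_hat ~ Binomial(n, pi)."""
    return sum((n / k) * comb(n, k) * pi**k * (1 - pi)**(n - k)
               for k in range(1, n + 1))

# For pi = 0.5, the values approach 1/pi = 2 as n grows.
print([round(inv_pihat_moment(n, 0.5), 3) for n in (1, 10, 50, 200)])
```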

Proposition 4.4: Unbiasedness and consistency of \(\hat{\tau}_{DM,n}\) under a Completely randomized trial
Under a Completely randomized design we have, for all n, \[ \mathbb{E}_{\mathcal{C}}[\hat{\tau}_{DM,n}]=\tau \] and its variance satisfies, for all n, \[\mathbb{Var}_{\mathcal{C}}[\hat{\tau}_{DM,n}] = \frac{\mathbb{Var}\left[Y^{(1)}\right]}{n_1} + \frac{\mathbb{Var}\left[Y^{(0)}\right]}{n_0}\]

where \(n_1= \sum_{i=1}^n T_i\) and \(n_0= \sum_{i=1}^n (1- T_i)\).

Proof
Bias \[\begin{align*} \mathbb{E}_{\mathcal{C}}[\hat{\tau}_{DM,n}] &= \frac{1}{n_1}\sum_{i=1}^n\mathbb{E}_{\mathcal{C}}[T_iY^{(1)}_i] - \frac{1}{n_0}\sum_{i=1}^n\mathbb{E}_{\mathcal{C}}[(1-T_i)Y^{(0)}_i] && \text{Linearity and SUTVA} \\ &= \frac{1}{n_1}\sum_{i=1}^n\mathbb{E}_{\mathcal{C}}[T_i]\mathbb{E}_{\mathcal{C}}[Y^{(1)}_i] - \frac{1}{n_0}\sum_{i=1}^n\mathbb{E}_{\mathcal{C}}[(1-T_i)]\mathbb{E}_{\mathcal{C}}[Y^{(0)}_i] && \text{Randomization} \\ &= \frac{1}{n_1}\sum_{i=1}^n \frac{n_1}{n}\mathbb{E}_{\mathcal{C}}[Y^{(1)}_i] - \frac{1}{n_0}\sum_{i=1}^n\frac{n_0}{n}\mathbb{E}_{\mathcal{C}}[Y^{(0)}_i] && \text{Completely randomized trial} \\ &= \tau && \text{Linearity} \end{align*}\]

Variance

For the variance, we refer to the proof for the Horvitz-Thompson estimator under a Completely randomized trial, since in that case the Neyman estimator is simply a reformulation of the Horvitz-Thompson estimator.

Counter-intuitively, estimating \(\pi\) lowers the variance. Even if the true probability is \(\pi = 0.5\), the realized treatment proportion in the sample can differ (e.g., \(\hat{\pi} = 0.48\)), and using \(\hat{\pi}\) rather than \(\pi\) leads to a smaller large-sample variance because the estimator adjusts to the exact observed proportion of treated units in the trial. Comparing the two variances makes this phenomenon explicit.
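This phenomenon can also be seen in a short simulation comparing the empirical variances of the two estimators under a Bernoulli design (a sketch; the outcome model and all parameter values are illustrative choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n, pi, reps = 500, 0.5, 4000
mu1, mu0 = 3.0, 1.0   # large outcome means amplify the HT/DM gap

ht = np.empty(reps)
dm = np.empty(reps)
for r in range(reps):
    t = rng.binomial(1, pi, size=n)
    while t.sum() in (0, n):   # re-draw degenerate assignments (prob ~ 2**-499)
        t = rng.binomial(1, pi, size=n)
    y = t * (mu1 + rng.standard_normal(n)) + (1 - t) * (mu0 + rng.standard_normal(n))
    ht[r] = np.mean(t * y / pi - (1 - t) * y / (1 - pi))   # true pi
    dm[r] = y[t == 1].mean() - y[t == 0].mean()            # estimated pi

print(ht.var(), dm.var())  # the difference-in-means variance is markedly smaller
```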

Proposition 4.5: Variance inequality between \(\hat{\tau}_{HT,n}\) and \(\hat{\tau}_{DM,n}\) under a Bernoulli design
Asymptotically, under a Bernoulli design, the difference-in-means estimator has a smaller variance than the Horvitz-Thompson estimator, \[V_{DM, \infty} = V_{HT} - \left( \sqrt{\frac{1-\pi}{\pi}} \mathbb{E}[Y^{(1)}] + \sqrt{\frac{\pi}{1-\pi}}\mathbb{E}[Y^{(0)}] \right)^2 \leq V_{HT}\]

Proof
We have: \[\begin{align*} V_{HT} &= \mathbb{E}\left[ \frac{(Y^{(1)})^2}{\pi}\right] + \mathbb{E}\left[ \frac{(Y^{(0)})^2}{1-\pi}\right] - \tau^2 \\ &= \frac{1}{\pi} \left(\mathbb{Var}\left[Y^{(1)}\right] + \mathbb{E}\left[Y^{(1)}\right]^2\right) + \frac{1}{1-\pi} \left(\mathbb{Var}\left[(Y^{(0)})\right] + \mathbb{E}\left[Y^{(0)}\right]^2\right) - \tau^2 \\ &= V_{DM, \infty} + \frac{1}{\pi} \mathbb{E}\left[Y^{(1)}\right]^2 + \frac{1}{1-\pi}\mathbb{E}\left[(Y^{(0)})\right]^2 - \tau^2 \\ &= V_{DM, \infty} + \left( \frac{1}{\pi} - 1 \right)\mathbb{E}\left[Y^{(1)}\right]^2 + \left( \frac{1}{1-\pi} - 1 \right)\mathbb{E}\left[Y^{(0)}\right]^2 + 2\mathbb{E}\left[Y^{(0)}\right]\mathbb{E}\left[Y^{(1)}\right]\\ &= V_{DM, \infty} + \left( \sqrt{\frac{1-\pi}{\pi}} \mathbb{E}[Y^{(1)}] + \sqrt{\frac{\pi}{1-\pi}}\mathbb{E}[Y^{(0)}] \right)^2 \end{align*}\]
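A quick numerical check of this identity (the moment values below are illustrative, not from the text):

```python
pi = 0.3
m1, m0 = 2.0, 0.5   # E[Y^(1)], E[Y^(0)]
v1, v0 = 1.0, 1.5   # Var[Y^(1)], Var[Y^(0)]
tau = m1 - m0

# V_HT = E[(Y1)^2]/pi + E[(Y0)^2]/(1-pi) - tau^2, with E[Y^2] = Var + mean^2
v_ht = (v1 + m1**2) / pi + (v0 + m0**2) / (1 - pi) - tau**2
v_dm = v1 / pi + v0 / (1 - pi)
gap = ((1 - pi) / pi) ** 0.5 * m1 + (pi / (1 - pi)) ** 0.5 * m0

print(v_ht, v_dm + gap**2)  # equal: V_HT = V_DM_inf + gap^2 >= V_DM_inf
```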

To conclude, we summarize all of these properties in a table:

  • Expectation (Bernoulli trial): Horvitz-Thompson: \(\tau\); Difference-in-means: \(\tau + \pi^n \mathbb{E}[Y_i^{(0)}] - (1-\pi)^n \mathbb{E}[Y_i^{(1)}]\)
  • Expectation (Completely randomized trial): Horvitz-Thompson: \(\tau\); Difference-in-means: \(\tau\)
  • Variance (Bernoulli trial): Horvitz-Thompson: \(\frac{1}{n}\left(\mathbb{E}\left[ \frac{(Y^{(1)})^2}{\pi}\right] + \mathbb{E}\left[ \frac{(Y^{(0)})^2}{1-\pi}\right] - \tau^2\right)\); Difference-in-means: \(\frac{1}{n}\left( \mathbb{E}_{\mathcal{B}}\left[\frac{\mathbb{1}_{\hat{e}>0}}{\hat{e}}\right]\mathbb{Var}[Y^{(1)}] + \mathbb{E}_{\mathcal{B}}\left[\frac{\mathbb{1}_{1-\hat{e}>0}}{1-\hat{e}}\right]\mathbb{Var}[Y^{(0)}]\right) + \mathcal{O}(\max{(\pi, 1-\pi)}^n)\)
  • Variance (Completely randomized trial): both estimators: \(\frac{\mathbb{Var}[Y^{(1)}]}{n_1} + \frac{\mathbb{Var}[Y^{(0)}]}{n_0}\)