LinearMixedModel

class hail.stats.LinearMixedModel

Class representing a linear mixed model.
Danger
This functionality is experimental. It may not be tested as well as other parts of Hail and the interface is subject to change.
LinearMixedModel represents a linear model of the form

\[y \sim \mathrm{N}(X \beta, \, \sigma^2 K + \tau^2 I)\]

where

- \(\mathrm{N}\) is an \(n\)-dimensional normal distribution,
- \(y\) is a known vector of \(n\) observations,
- \(X\) is a known \(n \times p\) design matrix for \(p\) fixed effects,
- \(K\) is a known \(n \times n\) positive semidefinite kernel,
- \(I\) is the \(n \times n\) identity matrix,
- \(\beta\) is a \(p\)-parameter vector of fixed effects,
- \(\sigma^2\) is the variance parameter on \(K\),
- \(\tau^2\) is the variance parameter on \(I\).
In particular, the residuals for the \(i^\mathit{th}\) and \(j^\mathit{th}\) observations have covariance \(\sigma^2 K_{ij}\) for \(i \neq j\).
This model is equivalent to a mixed model of the form

\[y = X \beta + Z u + \epsilon\]

by setting \(K = ZZ^T\) where

- \(Z\) is a known \(n \times r\) design matrix for \(r\) random effects,
- \(u\) is an \(r\)-vector of random effects drawn from \(\mathrm{N}(0, \sigma^2 I)\),
- \(\epsilon\) is an \(n\)-vector of random errors drawn from \(\mathrm{N}(0, \tau^2 I)\).
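This equivalence can be made concrete in a short sketch. The following is plain NumPy, not the Hail API; it only checks that the linear kernel \(K = ZZ^T\) built from a random-effects design matrix is symmetric and positive semidefinite, as the model requires.

```python
import numpy as np

# Illustrative sketch (plain NumPy, not the Hail API): with
# u ~ N(0, sigma^2 I_r) and eps ~ N(0, tau^2 I_n), the covariance of
# y = X beta + Z u + eps is  sigma^2 Z Z^T + tau^2 I_n,  so K = Z Z^T.
rng = np.random.default_rng(42)
n, r = 6, 3
Z = rng.standard_normal((n, r))

K = Z @ Z.T  # linear kernel induced by the random effects

# K is symmetric positive semidefinite by construction,
# with rank at most r.
assert np.allclose(K, K.T)
assert np.all(np.linalg.eigvalsh(K) > -1e-10)
assert np.linalg.matrix_rank(K) <= r
```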
However, LinearMixedModel does not itself realize \(K\) as a linear kernel with respect to random effects, nor does it take \(K\) explicitly as input. Rather, via the eigendecomposition \(K = U S U^T\), the class leverages a third, decorrelated form of the model

\[Py \sim \mathrm{N}(PX \beta, \, \sigma^2 (\gamma S + I))\]

where

- \(P = U^T: \mathbb{R}^n \rightarrow \mathbb{R}^n\) is an orthonormal transformation that decorrelates the observations. The rows of \(P\) are an eigenbasis for \(K\).
- \(S\) is the \(n \times n\) diagonal matrix of corresponding eigenvalues.
- \(\gamma = \frac{\sigma^2}{\tau^2}\) is the ratio of variance parameters.
Hence, the triple \((Py, PX, S)\) determines the probability of the observations for any choice of model parameters, and is therefore sufficient for inference. This triple, with \(S\) encoded as a vector, is the default ("full-rank") initialization of the class.
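The decorrelation above can be sketched in plain NumPy (not the Hail API): eigendecompose \(K\), set \(P = U^T\), and verify that \(P\) is orthonormal and diagonalizes \(K\).

```python
import numpy as np

# Sketch: eigendecompose K = U S U^T and set P = U^T. Then
# P K P^T = S is diagonal, so the rotated observations P y are
# uncorrelated under the model, with variances sigma^2 (gamma S + I).
rng = np.random.default_rng(0)
n = 5
Z = rng.standard_normal((n, n))
K = Z @ Z.T

S, U = np.linalg.eigh(K)      # eigh returns ascending eigenvalues
S, U = S[::-1], U[:, ::-1]    # reorder to descending, matching the docs
P = U.T

# P is orthonormal and decorrelates K.
assert np.allclose(P @ P.T, np.eye(n))
assert np.allclose(P @ K @ P.T, np.diag(S))
```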
LinearMixedModel also provides an efficient strategy to fit the model above with \(K\) replaced by its rank-\(r\) approximation \(K_r = P_r^T S_r P_r\) where

- \(P_r: \mathbb{R}^n \rightarrow \mathbb{R}^r\) has orthonormal rows consisting of the top \(r\) eigenvectors of \(K\),
- \(S_r\) is the \(r \times r\) diagonal matrix of corresponding non-zero eigenvalues.

For this low-rank model, the quintuple \((P_r y, P_r X, S_r, y, X)\) is similarly sufficient for inference and corresponds to the "low-rank" initialization of the class. Morally, \(y\) and \(X\) are required for low-rank inference because the diagonal \(\gamma S + I\) is always full-rank.
If \(K\) actually has rank \(r\), then \(K = K_r\) and the low-rank and full-rank models are equivalent. Hence low-rank inference provides a more efficient, equally exact algorithm for fitting the full-rank model. This situation arises, for example, when \(K\) is the linear kernel of a mixed model with fewer random effects than observations.

Even when \(K\) has full rank, using a lower-rank approximation may be an effective form of regularization, in addition to boosting computational efficiency.
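The exactness claim for rank-deficient \(K\) is easy to verify numerically. A plain NumPy sketch (not the Hail API), with \(K = ZZ^T\) of rank \(r < n\):

```python
import numpy as np

# Sketch: when K has rank r < n (e.g. K = Z Z^T with Z of shape (n, r)),
# the top-r truncation K_r = P_r^T S_r P_r recovers K exactly, so
# low-rank and full-rank inference coincide.
rng = np.random.default_rng(1)
n, r = 6, 2
Z = rng.standard_normal((n, r))
K = Z @ Z.T                     # rank at most r

S, U = np.linalg.eigh(K)
S, U = S[::-1], U[:, ::-1]      # descending eigenvalues
P_r, S_r = U[:, :r].T, S[:r]    # top-r eigenpairs

K_r = P_r.T @ np.diag(S_r) @ P_r
assert np.allclose(K, K_r)      # the truncation is exact here
```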
Initialization
The class may be initialized directly or with one of two methods:

- from_kinship() takes \(y\), \(X\), and \(K\) as ndarrays. The model is always full-rank.
- from_random_effects() takes \(y\) and \(X\) as ndarrays and \(Z\) as an ndarray or block matrix. The model is full-rank if and only if \(n \leq m\).
Direct full-rank initialization takes \(Py\), \(PX\), and \(S\) as ndarrays. The following class attributes are set:

Attribute  Type     Value
low_rank   bool     False
n          int      Number of observations \(n\)
f          int      Number of fixed effects \(p\)
r          int      Effective number of random effects, must equal \(n\)
py         ndarray  Rotated response vector \(P y\) with shape \((n)\)
px         ndarray  Rotated design matrix \(P X\) with shape \((n, p)\)
s          ndarray  Eigenvalues vector \(S\) of \(K\) with shape \((n)\)
p_path     str      Path at which \(P\) is stored as a block matrix
Direct low-rank initialization takes \(P_r y\), \(P_r X\), \(S_r\), \(y\), and \(X\) as ndarrays. The following class attributes are set:

Attribute  Type     Value
low_rank   bool     True
n          int      Number of observations \(n\)
f          int      Number of fixed effects \(p\)
r          int      Effective number of random effects, must be less than \(n\)
py         ndarray  Projected response vector \(P_r y\) with shape \((r)\)
px         ndarray  Projected design matrix \(P_r X\) with shape \((r, p)\)
s          ndarray  Eigenvalues vector \(S_r\) of \(K_r\) with shape \((r)\)
y          ndarray  Response vector with shape \((n)\)
x          ndarray  Design matrix with shape \((n, p)\)
p_path     str      Path at which \(P\) is stored as a block matrix
Fitting the model
fit() uses restricted maximum likelihood (REML) to estimate \((\beta, \sigma^2, \tau^2)\). This is done by numerical optimization of the univariate function compute_neg_log_reml(), which itself optimizes REML constrained to a fixed ratio of variance parameters. Each evaluation of compute_neg_log_reml() has computational complexity

\[\mathit{O}(rp^2 + p^3).\]

fit() adds the following attributes at this estimate.

Attribute            Type     Value
beta                 ndarray  \(\beta\)
sigma_sq             float    \(\sigma^2\)
tau_sq               float    \(\tau^2\)
gamma                float    \(\gamma = \frac{\sigma^2}{\tau^2}\)
log_gamma            float    \(\log{\gamma}\)
h_sq                 float    \(\mathit{h}^2 = \frac{\sigma^2}{\sigma^2 + \tau^2}\)
h_sq_standard_error  float    asymptotic estimate of the \(\mathit{h}^2\) standard error
Testing alternative models
The model is also equivalent to its augmentation

\[y \sim \mathrm{N}\left(x_\star\beta_\star + X \beta, \, \sigma^2 K + \tau^2 I\right)\]

by an additional covariate of interest \(x_\star\) under the null hypothesis that the corresponding fixed effect parameter \(\beta_\star\) is zero. Similarly to initialization, full-rank testing of the alternative hypothesis \(\beta_\star \neq 0\) requires \(P x_\star\), whereas low-rank testing requires \(P_r x_\star\) and \(x_\star\).
After running fit() to fit the null model, one can test each of a collection of alternatives using either of two implementations of the likelihood ratio test:

- fit_alternatives_numpy() takes one or two ndarrays. It is a pure Python method that evaluates alternatives serially on the leader (master).
- fit_alternatives() takes one or two paths to block matrices. It evaluates alternatives in parallel on the workers.

Per alternative, both have computational complexity

\[\mathit{O}(rp + p^3).\]

Parameters

- py (numpy.ndarray) – Projected response vector \(P_r y\) with shape \((r)\).
- px (numpy.ndarray) – Projected design matrix \(P_r X\) with shape \((r, p)\).
- s (numpy.ndarray) – Eigenvalues vector \(S\) with shape \((r)\).
- y (numpy.ndarray, optional) – Response vector with shape \((n)\). Include for low-rank inference.
- x (numpy.ndarray, optional) – Design matrix with shape \((n, p)\). Include for low-rank inference.
- p_path (str, optional) – Path at which \(P\) has been stored as a block matrix.
Attributes

Methods

- compute_neg_log_reml() – Compute negative log REML constrained to a fixed value of \(\log{\gamma}\).
- fit() – Find the triple \((\beta, \sigma^2, \tau^2)\) maximizing REML.
- fit_alternatives() – Fit and test alternative model for each augmented design matrix in parallel.
- fit_alternatives_numpy() – Fit and test alternative model for each augmented design matrix.
- from_kinship() – Initializes a model from \(y\), \(X\), and \(K\).
- from_random_effects() – Initializes a model from \(y\), \(X\), and \(Z\).
- h_sq_normalized_lkhd() – Estimate the normalized likelihood of \(\mathit{h}^2\) over the discrete grid of percentiles.

compute_neg_log_reml(log_gamma, return_parameters=False)

Compute negative log REML constrained to a fixed value of \(\log{\gamma}\).
This function computes the triple \((\beta, \sigma^2, \tau^2)\) with \(\gamma = \frac{\sigma^2}{\tau^2}\) at which the restricted likelihood is maximized and returns the negative of the restricted log likelihood at these parameters (shifted by the constant defined below).
The implementation has complexity \(\mathit{O}(rp^2 + p^3)\) and is inspired by FaST linear mixed models for genome-wide association studies (2011).

The formulae follow from Bayesian Inference for Variance Components Using Only Error Contrasts (1974). Harville derives that for fixed covariance \(V\), the restricted likelihood of the variance parameter \(V\) in the model

\[y \sim \mathrm{N}(X \beta, \, V)\]

is given by

\[(2\pi)^{-\frac{1}{2}(n - p)} \det(X^T X)^\frac{1}{2} \det(V)^{-\frac{1}{2}} \det(X^T V^{-1} X)^{-\frac{1}{2}} e^{-\frac{1}{2}(y - X\hat\beta)^T V^{-1}(y - X\hat\beta)}\]

with

\[\hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} y.\]

In our case, the variance is

\[V = \sigma^2 K + \tau^2 I = \sigma^2 (K + \gamma^{-1} I)\]

which is determined up to scale by any fixed value of the ratio \(\gamma\). So for input \(\log \gamma\), the negative restricted log likelihood is minimized at \((\hat\beta, \hat\sigma^2)\) with \(\hat\beta\) as above and

\[\hat\sigma^2 = \frac{1}{n - p}(y - X\hat\beta)^T (K + \gamma^{-1} I)^{-1}(y - X\hat\beta).\]

For \(\hat V\) at this \((\hat\beta, \hat\sigma^2, \gamma)\), the exponent in the likelihood reduces to \(-\frac{1}{2}(n - p)\), so the negative restricted log likelihood may be expressed as

\[\frac{1}{2}\left(\log \det(\hat V) + \log\det(X^T \hat V^{-1} X)\right) + C\]

where

\[C = \frac{1}{2}\left(n - p + (n - p)\log(2\pi) - \log\det(X^T X)\right)\]

only depends on \(X\).
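As a hedged illustration, the closed forms above can be transcribed directly in plain NumPy. The class itself works in the rotated basis for an \(\mathit{O}(rp^2 + p^3)\) evaluation; this direct \(\mathit{O}(n^3)\) version (the function name neg_log_reml is ours, not Hail's) exists only to make the formulas concrete.

```python
import numpy as np

def neg_log_reml(log_gamma, y, X, K):
    """Direct O(n^3) transcription of the REML formulas above, for
    exposition only; the constant C is omitted, as in the real method."""
    n, p = X.shape
    gamma = np.exp(log_gamma)
    V0 = K + np.eye(n) / gamma                 # V up to the scale sigma^2
    V0_inv = np.linalg.inv(V0)
    beta = np.linalg.solve(X.T @ V0_inv @ X, X.T @ V0_inv @ y)
    resid = y - X @ beta
    sigma2 = resid @ V0_inv @ resid / (n - p)  # REML estimate at this gamma
    _, logdet_V = np.linalg.slogdet(sigma2 * V0)
    _, logdet_xvx = np.linalg.slogdet(X.T @ (V0_inv / sigma2) @ X)
    return 0.5 * (logdet_V + logdet_xvx)

# Tiny example: the value is finite for any choice of log(gamma).
rng = np.random.default_rng(0)
n, p = 8, 2
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
Z = rng.standard_normal((n, 3))
K = Z @ Z.T
y = rng.standard_normal(n)
vals = [neg_log_reml(lg, y, X, K) for lg in (-1.0, 0.0, 1.0)]
```

fit() then amounts to a one-dimensional search over log_gamma minimizing this function.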
compute_neg_log_reml() returns the value of the first term, omitting the constant term.

Parameters

- log_gamma (float) – Value of \(\log{\gamma}\).
- return_parameters – If True, also return \(\beta\), \(\sigma^2\), and \(\tau^2\).

Returns

float or (float, numpy.ndarray, float, float) – If return_parameters is False, returns (shifted) negative log REML. Otherwise, returns (shifted) negative log REML, \(\beta\), \(\sigma^2\), and \(\tau^2\).

fit(log_gamma=None, bounds=(-8.0, 8.0), tol=1e-08, maxiter=500)

Find the triple \((\beta, \sigma^2, \tau^2)\) maximizing REML.

This method sets the attributes beta, sigma_sq, tau_sq, gamma, log_gamma, h_sq, and h_sq_standard_error as described in the top-level class documentation.

If log_gamma is provided, fit() finds the REML solution with \(\log{\gamma}\) constrained to this value. In this case, h_sq_standard_error is None since h_sq is not estimated.

Otherwise, fit() searches for the value of \(\log{\gamma}\) that minimizes compute_neg_log_reml(), and also sets the attribute optimize_result of type scipy.optimize.OptimizeResult.

Parameters

- log_gamma (float, optional) – If provided, the solution is constrained to have this value of \(\log{\gamma}\).
- bounds (float, float) – Lower and upper bounds for \(\log{\gamma}\).
- tol (float) – Absolute tolerance for optimizing \(\log{\gamma}\).
- maxiter (float) – Maximum number of iterations for optimizing \(\log{\gamma}\).

fit_alternatives(pa_t_path, a_t_path=None, partition_size=None)

Fit and test alternative model for each augmented design matrix in parallel.

Notes

The alternative model is fit using REML constrained to the value of \(\gamma\) set by fit().

The likelihood ratio test of fixed effect parameter \(\beta_\star\) uses (non-restricted) maximum likelihood:

\[\chi^2 = 2 \log\left(\frac{\max_{\beta_\star, \beta, \sigma^2} \mathrm{N}\left(y \mid x_\star \beta_\star + X \beta; \, \sigma^2(K + \gamma^{-1} I)\right)}{\max_{\beta, \sigma^2} \mathrm{N}\left(y \mid x_\star \cdot 0 + X \beta; \, \sigma^2(K + \gamma^{-1} I)\right)}\right)\]

The p-value is given by the tail probability under a chi-squared distribution with one degree of freedom.
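A minimal plain NumPy sketch of this test statistic, assuming \(\gamma\) has already been fixed (as after fit()). The helper names max_log_lkhd and lrt_p_value are ours, not Hail's; the 1-df chi-squared tail probability is computed with math.erfc.

```python
import math
import numpy as np

def max_log_lkhd(y, X, V0):
    """Log likelihood of y ~ N(X beta, sigma^2 V0), maximized over beta and
    sigma^2, with the covariance structure V0 = K + I / gamma held fixed."""
    n = len(y)
    V0_inv = np.linalg.inv(V0)
    beta = np.linalg.solve(X.T @ V0_inv @ X, X.T @ V0_inv @ y)
    resid = y - X @ beta
    sigma2 = resid @ V0_inv @ resid / n        # MLE (not REML) of sigma^2
    _, logdet = np.linalg.slogdet(sigma2 * V0)
    return -0.5 * (n * math.log(2 * math.pi) + logdet + n)

def lrt_p_value(y, X, x_star, V0):
    """Likelihood ratio test of beta_star = 0 for one augmenting covariate."""
    X_alt = np.column_stack([x_star, X])
    chi_sq = 2 * (max_log_lkhd(y, X_alt, V0) - max_log_lkhd(y, X, V0))
    # tail of chi-squared with 1 df: P(chi2_1 > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(max(chi_sq, 0.0) / 2))

# Demo with a trivial kernel and gamma = 1.
rng = np.random.default_rng(0)
n = 10
X = np.ones((n, 1))
x_star = rng.standard_normal(n)
V0 = np.eye(n) + np.eye(n)                     # K = I, gamma = 1
y = rng.standard_normal(n)
p = lrt_p_value(y, X, x_star, V0)
```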
The resulting table has the following fields:

Field     Type     Value
idx       int64    Index of augmented design matrix.
beta      float64  \(\beta_\star\)
sigma_sq  float64  \(\sigma^2\)
chi_sq    float64  \(\chi^2\)
p_value   float64  p-value

\((P_r A)^T\) and \(A^T\) (if given) must have the same number of rows (augmentations). These rows are grouped into partitions for parallel processing. The number of partitions equals the ceiling of n_rows / partition_size, and should be at least the number of cores to make use of all cores. By default, there is one partition per row of blocks in \((P_r A)^T\). Setting the partition size to an exact (rather than approximate) divisor or multiple of the block size reduces superfluous shuffling of data.

The number of columns in each block matrix must be less than \(2^{31}\).
Warning

The block matrices must be stored in row-major format, as results from BlockMatrix.write() with force_row_major=True and from BlockMatrix.write_from_entry_expr(). Otherwise, this method will produce an error message.

Parameters

- pa_t_path (str) – Path to block matrix \((P_r A)^T\) with shape \((m, r)\). Each row is a projected augmentation \(P_r x_\star\) of \(P_r X\).
- a_t_path (str, optional) – Path to block matrix \(A^T\) with shape \((m, n)\). Each row is an augmentation \(x_\star\) of \(X\). Include for low-rank inference.
- partition_size (int, optional) – Number of rows to process per partition. Default given by block size of \((P_r A)^T\).

Returns

Table – Table of results for each augmented design matrix.

fit_alternatives_numpy(pa, a=None, return_pandas=False)

Fit and test alternative model for each augmented design matrix.

Notes

This Python-only implementation runs serially on the leader (master). See the scalable implementation fit_alternatives() for documentation of the returned table.

Parameters

- pa (numpy.ndarray) – Projected matrix \(P_r A\) of alternatives with shape \((r, m)\). Each column is a projected augmentation \(P_r x_\star\) of \(P_r X\).
- a (numpy.ndarray, optional) – Matrix \(A\) of alternatives with shape \((n, m)\). Each column is an augmentation \(x_\star\) of \(X\). Required for low-rank inference.
- return_pandas (bool) – If true, return pandas dataframe. If false, return Hail table.

Returns

Table or pandas.DataFrame – Table of results for each augmented design matrix.

classmethod from_kinship(y, x, k, p_path=None, overwrite=False)

Initializes a model from \(y\), \(X\), and \(K\).

Examples

>>> from hail.stats import LinearMixedModel
>>> y = np.array([0.0, 1.0, 8.0, 9.0])
>>> x = np.array([[1.0, 0.0],
...               [1.0, 2.0],
...               [1.0, 1.0],
...               [1.0, 4.0]])
>>> k = np.array([[ 1.        , -0.8727875 ,  0.96397335,  0.94512946],
...               [-0.8727875 ,  1.        , -0.93036112, -0.97320323],
...               [ 0.96397335, -0.93036112,  1.        ,  0.98294169],
...               [ 0.94512946, -0.97320323,  0.98294169,  1.        ]])
>>> model, p = LinearMixedModel.from_kinship(y, x, k)
>>> model.fit()
>>> model.h_sq
0.2525148830695317

>>> model.s
array([3.83501295, 0.13540343, 0.02454114, 0.00504248])

Truncate to a rank \(r=2\) model:

>>> r = 2
>>> s_r = model.s[:r]
>>> p_r = p[:r, :]
>>> model_r = LinearMixedModel(p_r @ y, p_r @ x, s_r, y, x)
>>> model_r.fit()
>>> model_r.h_sq
0.25193197591429695
Notes

This method eigendecomposes \(K = P^T S P\) on the leader (master) and returns LinearMixedModel(p @ y, p @ x, s) and p.

The performance of eigendecomposition depends critically on the number of leader (master) cores and the NumPy / SciPy configuration, viewable with np.show_config(). For Intel machines, we recommend installing the MKL package for Anaconda.

k must be positive semidefinite; symmetry is not checked as only the lower triangle is used.
Parameters

- y (numpy.ndarray) – \(n\) vector of observations.
- x (numpy.ndarray) – \(n \times p\) matrix of fixed effects.
- k (numpy.ndarray) – \(n \times n\) positive semidefinite kernel \(K\).
- p_path (str, optional) – Path at which to write \(P\) as a block matrix.
- overwrite (bool) – If True, overwrite an existing file at p_path.

Returns

- model (LinearMixedModel) – Model constructed from \(y\), \(X\), and \(K\).
- p (numpy.ndarray) – Matrix \(P\) whose rows are the eigenvectors of \(K\).

classmethod from_random_effects(y, x, z, p_path=None, overwrite=False, max_condition_number=1e-10, complexity_bound=8192)

Initializes a model from \(y\), \(X\), and \(Z\).

Examples

>>> from hail.stats import LinearMixedModel
>>> y = np.array([0.0, 1.0, 8.0, 9.0])
>>> x = np.array([[1.0, 0.0],
...               [1.0, 2.0],
...               [1.0, 1.0],
...               [1.0, 4.0]])
>>> z = np.array([[0.0, 0.0, 1.0],
...               [0.0, 1.0, 2.0],
...               [1.0, 2.0, 4.0],
...               [2.0, 4.0, 8.0]])
>>> model, p = LinearMixedModel.from_random_effects(y, x, z)
>>> model.fit()
>>> model.h_sq
0.38205307244271675
Notes

If \(n \leq m\), the returned model is full rank.

If \(n > m\), the returned model is low rank. In this case only, eigenvalues less than or equal to max_condition_number times the top eigenvalue are dropped from \(S\), with the corresponding eigenvectors dropped from \(P\). This guards against precision loss on left eigenvectors computed via the right gramian \(Z^T Z\) in BlockMatrix.svd().

In either case, one can truncate to a rank \(r\) model as follows. If p is an ndarray:

>>> p_r = p[:r, :]
>>> s_r = model.s[:r]
>>> model_r = LinearMixedModel(p_r @ y, p_r @ x, s_r, y, x)

If p is a block matrix:

>>> p[:r, :].write(p_r_path)
>>> p_r = BlockMatrix.read(p_r_path)
>>> s_r = model.s[:r]
>>> model_r = LinearMixedModel(p_r @ y, p_r @ x, s_r, y, x, p_r_path)
This method applies no standardization to z.

Warning

If z is a block matrix, then ideally z should be the result of directly reading from disk (and possibly a transpose). This is most critical if \(n > m\), because in this case multiplication by z will result in all preceding transformations being repeated n / block_size times, as explained in BlockMatrix.

At least one dimension must be less than or equal to 46300. See the warning in BlockMatrix.svd() for performance considerations.

Parameters
- y (numpy.ndarray) – \(n\) vector of observations \(y\).
- x (numpy.ndarray) – \(n \times p\) matrix of fixed effects \(X\).
- z (numpy.ndarray or BlockMatrix) – \(n \times m\) matrix of random effects \(Z\).
- p_path (str, optional) – Path at which to write \(P\) as a block matrix. Required if z is a block matrix.
- overwrite (bool) – If True, overwrite an existing file at p_path.
- max_condition_number (float) – Maximum condition number. Must be greater than 1e-16.
- complexity_bound (int) – Complexity bound for BlockMatrix.svd() when z is a block matrix.

Returns

- model (LinearMixedModel) – Model constructed from \(y\), \(X\), and \(Z\).
- p (numpy.ndarray or BlockMatrix) – Matrix \(P\) whose rows are the eigenvectors of \(K\). The type is block matrix if z is a block matrix and BlockMatrix.svd() of z returns \(U\) as a block matrix.

h_sq_normalized_lkhd()

Estimate the normalized likelihood of \(\mathit{h}^2\) over the discrete grid of percentiles.

Examples

Plot the estimated normalized likelihood function:

>>> import matplotlib.pyplot as plt
>>> plt.plot(range(101), model.h_sq_normalized_lkhd())
Notes

This method may be used to visualize the approximate posterior on \(\mathit{h}^2\) under a flat prior.

The resulting ndarray a has length 101 with a[i] equal to the maximum likelihood over all \(\beta\) and \(\sigma^2\) with \(\mathit{h}^2\) constrained to i / 100. The values for 1 <= i <= 99 are normalized to sum to 1, and a[0] and a[100] are set to nan.

Returns

numpy.ndarray of float – Normalized likelihood values for \(\mathit{h}^2\).