GENIE Model =========== The ``GENIE`` class implements the Gene-ENvironment Interaction Estimator. It extends the ``Base`` class to estimate gene-environment interaction effects in genomic data. Model Types ----------- The GENIE model supports three types of analysis: - ``G``: Basic genetic effects only - ``G+GxE``: Genetic effects plus gene-environment interactions - ``G+GxE+NxE``: Full model with noise-environment interactions Variance Components ------------------- The model estimates multiple variance components: - σ²g: Genetic variance - σ²gxe: Gene-environment interaction variance - σ²nxe: Noise-environment interaction variance - σ²e: Environmental variance Class Inheritance --------------- .. code-block:: python class GENIE(Base): """Gene-ENvironment Interaction Estimator.""" Initialization -------------- To initialize the GENIE model, you should also specify the environment file and the genie model to use. You must set the ``self.num_env`` and ``self.env`` member variables, which should be read from the environment file. You must set the ``self.num_gen_env_bin`` as ``self.num_bin * self.num_env``. .. py:method:: __init__(env_file: str, genie_model: str, **kwargs) Initializes the GENIE model with environment and model type specifications. :param str env_file: Path to the environment file containing environment variables :param str genie_model: Type of GENIE model to use ("G", "G+GxE", or "G+GxE+NxE") :param **kwargs: Additional arguments passed to the parent Base class .. code-block:: python def __init__( self, env_file: str, genie_model: str, **kwargs ): super().__init__(**kwargs) self.num_env, self.env = read_env_file(env_file) self.env = self.env[:, np.newaxis] self.num_gen_env_bin = self.num_bin * self.num_env self.genie_model = genie_model Key Methods ---------- .. py:method:: get_num_estimates() Returns the number of variance components based on model type: .. code-block:: python def get_num_estimates(self): if self.genie_model == "G": return self.num_bin elif self.genie_model == "G+GxE": return self.num_bin + self.num_gen_env_bin elif self.genie_model == "G+GxE+NxE": return self.num_bin + self.num_gen_env_bin + self.num_env else: raise ValueError("Unsupported GENIE genie_model type") .. py:method:: get_M_last_row() Specifies the last row of the M matrix based on the model type: .. code-block:: python def get_M_last_row(self): if self.genie_model == "G": return self.len_bin elif self.genie_model == "G+GxE": return np.concatenate((self.len_bin, self.len_bin * self.num_env)) elif self.genie_model == "G+GxE+NxE": return np.concatenate((self.len_bin, self.len_bin * self.num_env, [1] * self.num_env)) else: raise ValueError("Unsupported GENIE genie_model type") .. py:method:: pre_compute_jackknife_bin(j, all_gen) Pre-computes statistics for each jackknife sample: :param int j: Jackknife sample index :param list all_gen: List of genotype matrices for each bin .. code-block:: python def pre_compute_jackknife_bin(self, j, all_gen): for k, X_kj in enumerate(all_gen): # Process genetic effects X_kj = self.standardize_geno(X_kj) self.M[j][k] = self.M[self.num_jack][k] - X_kj.shape[1] for b in range(self.num_random_vec): self.XXz[k, j, b, :] = self._compute_XXz(b, X_kj) if self.use_cov: self.UXXz[k, j, b, :] = self._compute_UXXz(self.XXz[k][j][b]) self.XXUz[k, j, b, :] = self._compute_XXUz(b, X_kj) self.yXXy[k][j] = self._compute_yXXy(X_kj, self.pheno) # Process GxE effects if needed if self.model_type in ["G+GxE", "G+GxE+NxE"]: X_kj_gxe = self._compute_gxe_effects(X_kj) self.M[j][k + self.num_bin] = self.M[self.num_jack][k + self.num_bin] - X_kj_gxe.shape[1] for b in range(self.num_random_vec): self.XXz[k + self.num_bin, j, b, :] = self._compute_XXz(b, X_kj_gxe) if self.use_cov: self.UXXz[k + self.num_bin, j, b, :] = self._compute_UXXz(self.XXz[k + self.num_bin][j][b]) self.XXUz[k + self.num_bin, j, b, :] = self._compute_XXUz(b, X_kj_gxe) self.yXXy[k + self.num_bin][j] = self._compute_yXXy(X_kj_gxe, self.pheno) # Process NxE effects if needed if self.model_type == "G+GxE+NxE": X_kj_nxe = self._compute_nxe_effects(X_kj) self.M[j][k + 2 * self.num_bin] = self.M[self.num_jack][k + 2 * self.num_bin] - X_kj_nxe.shape[1] for b in range(self.num_random_vec): self.XXz[k + 2 * self.num_bin, j, b, :] = self._compute_XXz(b, X_kj_nxe) if self.use_cov: self.UXXz[k + 2 * self.num_bin, j, b, :] = self._compute_UXXz(self.XXz[k + 2 * self.num_bin][j][b]) self.XXUz[k + 2 * self.num_bin, j, b, :] = self._compute_XXUz(b, X_kj_nxe) self.yXXy[k + 2 * self.num_bin][j] = self._compute_yXXy(X_kj_nxe, self.pheno) .. py:method:: b_trace_calculation(k, j, b_idx) Calculates trace terms for estimation: :param int k: Bin index :param int j: Jackknife sample index :param int b_idx: Random vector index :return: Trace value .. code-block:: python def b_trace_calculation(self, k, j, b_idx): # Trace for the interaction terms if k >= self.num_bin: # Actual trace calculation M_k = self.M[j][k] B1 = self.XXz[k][b_idx] b_trk = np.sum(B1 * self.all_zb.T) / (self.num_random_vec * M_k) else: # Trace can be directly calculated as self.num_indv since the genotype is standardized b_trk = self.num_indv return b_trk .. py:method:: run(method) Runs the complete GENIE analysis: :param str method: Estimation method ("lstsq" or "QR") :returns: Dictionary containing: - sigma_ests_total: Estimated variance components - sig_errs: Standard errors of variance components - h2_total: Heritability estimates - h2_errs: Standard errors of heritability - enrichment_total: Enrichment scores - enrichment_errs: Standard errors of enrichment Other Methods to Override the Base Class --------------------------------------- In addition to the methods above, the ``GENIE`` class also overrides the following methods from the ``Base`` class: .. py:method:: estimate(self, method) This method is overridden because for GENIE, the heritability should be computed with the traces. Thus, the adjusted sigma estimated based on the traces is also returned. .. code-block:: python def estimate(self, method: str = "lstsq"): # ... existing code from the base class ... sigma_ests_adj = [] for j in range(self.num_jack): # ... existing code from the base class ... # Adjust the estimate by the effect of traces for heritability calculation sigma_est_adj = [] for i in range(len(sigma_est)): sigma_est_adj.append(sigma_est[i] * T[i, self.num_estimates]) # ... existing code from the base class ... # Also include the adjusted sigma_ests_adj sigma_ests_adj = np.array(sigma_ests_adj) sigma_est_jackknife_adj, sigma_ests_total_adj = sigma_ests_adj[:-1, :], sigma_ests_adj[-1, :] sigma_ests_adj.append(sigma_est_adj) return sigma_est_jackknife, sigma_ests_total, sigma_est_jackknife_adj, sigma_ests_total_adj .. py:method:: compute_h2_nonoverlapping(self, sigma_est_jackknife, sigma_ests_total): Since the heritability is computed based on the traces and the heritability is computed separately for G and GxE, we also need to override the method to compute the heritability. A future work is to also override the method to compute the heritability for the overlapping cases. .. py:method:: compute_enrichment(self, h2_jackknife, h2_total): The enrichment is computed separately for G only based on the original GENIE algorithm. Thus, we also override the enrichment computation method. Usage Example ------------ .. code-block:: python from pyrhe.models import GENIE # Initialize model genie_model = GENIE( genie_model="G+GxE+NxE", geno_file="path/to/genotype", annot_file="path/to/annotation", pheno_file="path/to/phenotype", cov_file="path/to/covariate", env_file="path/to/environment", num_bins=10, num_jack=100, num_random_vec=10, num_workers=5, ... ) # Run analysis results = genie_model() # Access results # The outputs are automatically logged in the output file. # In addition, you can also access the results: print(results) print(results['sigma_ests_total']) # The results are stored in a dictionary. The keys are: # - sigma_ests_total: Estimated variance components # - sig_errs: Standard errors of variance components # - h2_total: Heritability estimates # - h2_errs: Standard errors of heritability # - enrichment_total: Enrichment scores # - enrichment_errs: Standard errors of enrichment