Systems (epi)genetics to study the basis of complex traits and diseases
For most biological processes, organisms must respond to extrinsic cues, while maintaining essential gene expression programmes. Although studied extensively in single cells, it is still unclear how variation is controlled in multicellular organisms. Here, we used a machine‐learning approach to identify genomic features that are predictive of genes with high versus low variation in their expression across individuals, using bulk data to remove stochastic cell‐to‐cell variation. Using embryonic gene expression across 75 Drosophila isogenic lines, we identify features predictive of expression variation (controlling for expression level), many of which are promoter‐related. Genes with low variation fall into two classes reflecting different mechanisms to maintain robust expression, while genes with high variation seem to lack both types of stabilizing mechanisms. Applying this framework to humans revealed similar predictive features, indicating that promoter architecture is an ancient mechanism to control expression variation. Remarkably, expression variation features could also partially predict differential expression after diverse perturbations in both Drosophila and humans. Differential gene expression signatures may therefore be partially explained by genetically encoded gene‐specific features, unrelated to the studied treatment.
Code is available here
Gene-specific feature table for Drosophila (01_master_table_final_fly_EV02.csv)
Collection of all features per gene in Drosophila. Feature names are explained in Feature details Drosophila (02_features_info_fly_EV01.csv).
Feature details Drosophila (02_features_info_fly_EV01.csv). Features used to predict expression level and variation in Drosophila.
Important features Drosophila (03_important_features_fly_EV04.csv). Feature importance scores (from Boruta) and correlations with predicted variables. Only features important in at least one prediction are included. NA indicates a feature that is non-significant in the corresponding prediction. Columns:
Gene-specific feature table human (05_aggregted_expression_human_EV17.csv). Collection of gene-specific features for human genes (gene_id) as a comma-separated file. Feature details are explained in 06_Feature_details_human_EV11.csv.
Tissue-specific expression level and variation human genes (04_all_variations_human_EV10.csv). Gene- and tissue-specific expression data and several gene annotations for human genes. Column names for tissue-specific expression contain the tissue name. NA indicates that a gene is not expressed (or did not pass filtering criteria) in the corresponding tissue.
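Because NA marks genes that are not expressed (or were filtered out) in a tissue, a first look at this table might count how many tissues each gene has a value in. The following is a minimal sketch, not the authors' code. The identifier column name (gene_id is documented only for the aggregated table), the tissue column names and the sample values are all made up for illustration and should be replaced with the real header of the file.

```python
import csv
import io


def count_expressed_tissues(rows, id_column, tissue_columns, na_token="NA"):
    """Return {gene: number of tissue columns holding a value other than NA}."""
    counts = {}
    for row in rows:
        counts[row[id_column]] = sum(
            1
            for col in tissue_columns
            if row[col].strip() not in ("", na_token)
        )
    return counts


# Tiny made-up stand-in for the real file; column names and values are
# hypothetical and only illustrate the NA convention described above.
sample = io.StringIO(
    "gene_id,Lung_mean,Liver_mean\n"
    "geneA,5.2,NA\n"
    "geneB,NA,NA\n"
    "geneC,3.1,4.4\n"
)
rows = list(csv.DictReader(sample))
expressed = count_expressed_tissues(rows, "gene_id", ["Lung_mean", "Liver_mean"])
print(expressed)
```

To apply this to the real table, open the CSV file instead of the in-memory sample and pass the tissue-specific column names taken from its header. The tissue columns are passed explicitly because the file also contains gene annotation columns.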
Feature details human (06_Feature_details_human_EV11.csv). Explanation of all features used to predict expression level and variation in human.
Important features human (07_important_features_human_EV18.csv). Median feature importance scores (from Boruta) and correlations with predicted variables (median_importance and correlation_with_responce columns). Feature importance and correlations are reported for aggregated expression level and variation (response column).
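One way to explore this table is to rank features by median_importance separately for each value of the response column. The sketch below assumes that the feature names live in a column called feature and that non-significant entries may appear as NA. Both are assumptions, as are the sample rows and response labels, which are invented for illustration. Only the column names median_importance, correlation_with_responce and response come from the description above.

```python
import csv
import io
from collections import defaultdict


def top_features(rows, n=2, feature_column="feature", na_token="NA"):
    """Group rows by 'response' and keep the n features with the highest
    'median_importance' in each group. Rows with NA importance are skipped."""
    by_response = defaultdict(list)
    for row in rows:
        importance = row["median_importance"].strip()
        if importance in ("", na_token):
            continue
        by_response[row["response"]].append(
            (
                row[feature_column],
                float(importance),
                row["correlation_with_responce"],
            )
        )
    return {
        response: sorted(items, key=lambda item: item[1], reverse=True)[:n]
        for response, items in by_response.items()
    }


# Made-up rows: 'feature', the response labels and all values are hypothetical.
sample = io.StringIO(
    "feature,response,median_importance,correlation_with_responce\n"
    "feat_a,variation,12.5,-0.30\n"
    "feat_b,variation,20.1,0.45\n"
    "feat_c,variation,NA,NA\n"
    "feat_a,level,8.0,0.10\n"
)
top = top_features(list(csv.DictReader(sample)))
for response, items in top.items():
    print(response, [name for name, _, _ in items])
```

The column name correlation_with_responce is spelled as it is given in the file description, so it should match the header without renaming.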