The EMBO|EMBL Symposium: Multiomics to Mechanisms: Challenges in Data Integration took place virtually on 15–17 September 2021. With over 400 participants, this was the largest edition of the conference since the series began in 2017. A total of 96 posters were presented virtually, and we are excited to share the research of the three best poster prize winners.
Bacteria need to adapt to changes in their environment in order to survive. Transcription factors (TFs) bind metabolites that signal such changes and in turn alter gene expression. Escherichia coli has the best-characterized transcriptional regulatory network, comprising 300 predicted TFs, of which ~75% have a metabolite-binding domain. However, the binding partners of only 95 TFs have been identified, because common in vitro identification methods are low-throughput. Here, we combined metabolomics and gene expression data obtained in vivo across several growth conditions to identify TF-metabolite interactions of four TFs without a known binding partner: CdaR, CsgD, FlhDC and GadX. We validated our method by accurately predicting the known binding partners of ArgR, TyrR and CysB, three highly studied TFs. The in vivo nature of our approach can not only identify new TF-metabolite interactions but also provide insight into which of them are most functionally relevant.
Calls to apply (multi-)omics data in toxicology have become prominent in recent years, since omics experiments are intended to generate comprehensive information on molecular changes in cells and tissues more quickly, more accurately, and with fewer resources than ever before. The associated hopes explicitly include a reduction in live animal testing and an increase in the number of substances that can be tested. Here, multi-omics data are essential to comprehensively infer mechanistic knowledge on molecular response pathways and subsequently guide and aid chemical risk assessment. However, currently available multi-omics pathway enrichment methods have several limitations that hamper their application in computational toxicology, e.g., inadequate enrichment methods, missing support for time- and concentration-resolved data, and restrictions on the pathway sources. Most approaches integrate data sequentially and thereby completely ignore the connections between different omics layers. With ToPaFC, we present the first step towards a consistent and simultaneous multi-omics-based pathway enrichment that addresses those obstacles and explicitly takes the underlying pathway topology into account. Right now, we can deal with up to eight different pathway databases and two omics layers (transcriptomics with metabolomics, or proteomics with metabolomics). The pathway topology is reflected in two different ways: i) the importance of a node (omics feature) is measured based on its connections and its relative localization within the pathway and ii) the influence of each node on the network is specified by the weight of its outgoing edges, whether they are inhibiting, neutral, or activating. With this integration of edge information along the pathway, our method inherently accounts for consistent molecular changes of the features.
The derived node‑centered pathway representation is combined with measured multi‑omics features to calculate a topology‑based pathway fold change that accounts for consistent changes within the molecular response.
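To make the idea of a topology-weighted pathway fold change more concrete, here is a minimal Python sketch. It is not the ToPaFC implementation: the abstract does not give the formulas, so the scoring below is an assumption chosen for illustration. Node importance is approximated by the number of connections, and node influence by the sign of its summed outgoing edges (+1 activating, 0 neutral, -1 inhibiting). The function name, node names, and values are all hypothetical.

```python
from collections import defaultdict


def topology_fold_change(edges, log2fc):
    """Illustrative topology-weighted pathway fold change (an assumption,
    not the ToPaFC formula).

    edges:  list of (source, target, sign) with sign in {-1, 0, +1}
    log2fc: dict mapping a node (omics feature) to its measured log2 fold change
    """
    degree = defaultdict(int)    # importance proxy: number of connections
    out_sign = defaultdict(int)  # summed sign of outgoing edges
    for source, target, sign in edges:
        degree[source] += 1
        degree[target] += 1
        out_sign[source] += sign

    weighted_sum = 0.0
    weight_total = 0.0
    for node, fold_change in log2fc.items():
        importance = degree[node]
        if importance == 0:
            continue  # measured feature is not part of this pathway
        # A node whose outgoing edges are net inhibiting contributes with
        # flipped sign, so a drop in an inhibitor counts as activation.
        influence = -1 if out_sign[node] < 0 else 1
        weighted_sum += importance * influence * fold_change
        weight_total += importance

    return weighted_sum / weight_total if weight_total else 0.0


# Hypothetical toy pathway: one activating gene, one inhibiting gene,
# and one metabolite they both act on.
example_edges = [("geneA", "metX", +1), ("geneB", "metX", -1)]
example_log2fc = {"geneA": 1.0, "geneB": -1.0, "metX": 2.0}

print(topology_fold_change(example_edges, example_log2fc))
```

In this toy case the activator goes up, the inhibitor goes down, and the metabolite goes up, so all three features point in the same direction and the score is positive. A real method would also need to handle node position within the pathway, which this sketch ignores.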
The advance of laser-microdissection technologies coupled with proteomics enables unprecedented insights into tissue proteomes. However, the limited availability of patient materials, together with the high-dimensional output of proteomics, necessitates data integration across studies to safeguard the reliability of the results. We microdissected morphologically benign and neoplastic pancreas and surrounding stromal areas from 14 patients with early pancreatic ductal adenocarcinoma and analyzed their protein compositions with nLC-MS/MS. The results indicated downregulated digestive functions in the malignant exocrine tissue and lower metabolic activity in the stroma vs. exocrine pancreas. Intriguingly, the majority of the proteins most significantly associated with survival originated from the morphologically benign exocrine regions, suggesting that these areas may harbor early, predisposing changes. To scrutinize this idea, we compared their proteomes to proteomics data of 12 healthy control pancreatic samples obtained from publications. The protein identification and quantification pipeline, starting from the raw mass spectrometer files, was standardized to minimize variation introduced by search engines or protein sequence databases. Altogether, we identified 7,099 proteins in 67 samples involving 5 tissue types from 2 experiments and 5 batches. We investigated two independent strategies for rendering the values comparable. First, batch effects within experiments were corrected for with ComBat and the abundances across experiments were aligned with housekeeping protein normalization. However, this approach required full observations, removing over 90% of the identified proteins from the analysis. Hence, our second approach involved applying Group Factor Analysis to directly extract factors that reveal relationships between the tissue types in our study without compromising the protein coverage.
These approaches not only showed that our main results are independent of the data analysis pipeline but also implicated changes in the mRNA splicing machinery as important players in pancreatic cancer. By surveying 165 patients from The Cancer Genome Atlas we revealed that increased transcriptional complexity indeed associates with poor survival in this disease.
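The first strategy described above loses most proteins because it needs a value for every protein in every sample. The short Python sketch below illustrates that effect together with a simple form of housekeeping normalization. It is only an illustration under stated assumptions: the authors' actual procedure is not given in the abstract, ComBat batch correction is not reproduced here, and the function names, the choice of ACTB and GAPDH as housekeeping proteins, and all abundance values are hypothetical.

```python
from statistics import median


def housekeeping_normalize(sample, housekeeping):
    """Shift one sample's log2 abundances so that the median of its
    housekeeping proteins becomes zero (an assumed, simple normalization).

    sample: dict mapping protein -> log2 abundance, or None if not quantified
    """
    reference = [sample[p] for p in housekeeping if sample.get(p) is not None]
    if not reference:
        raise ValueError("no housekeeping protein quantified in this sample")
    shift = median(reference)
    return {p: (v - shift if v is not None else None) for p, v in sample.items()}


def complete_cases(samples):
    """Return the proteins quantified in every sample (full observations)."""
    if not samples:
        return []
    quantified = (
        {protein for protein, value in s.items() if value is not None}
        for s in samples
    )
    return sorted(set.intersection(*quantified))


# Hypothetical log2 abundances from two experiments on different scales.
example_samples = [
    {"ACTB": 10.0, "GAPDH": 12.0, "PRSS1": 15.0, "CPA1": None},
    {"ACTB": 11.0, "GAPDH": 13.0, "PRSS1": 14.0, "CPA1": 9.0},
]
example_housekeeping = ["ACTB", "GAPDH"]

normalized = [
    housekeeping_normalize(s, example_housekeeping) for s in example_samples
]
shared = complete_cases(normalized)
print(shared)
```

Here CPA1 is dropped because one sample lacks it. With thousands of proteins and dozens of samples from different batches, the same rule removes far more, which is the motivation the abstract gives for switching to a factor-based approach that tolerates missing values.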