For a positive culture change in life science research
The essential requirements for data at EMBL are:
Open data sharing is at the heart of academic research and an essential prerequisite for the transparency, reproducibility and trustworthiness of research results. Open data sharing is also in the interest of research funders, including society at large through taxes, because re-evaluation and reuse of data stimulate new research and development. An important precondition for data sharing is good data management. There should be an uninterrupted and auditable chain of information from reagents to raw data to publication. The aim is that anybody should be able to follow this chain, not only the people who were involved in the work.
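To make the idea of an auditable chain more concrete, here is a minimal Python sketch. It assumes that each step (reagent, raw data, publication) is described by a small record and that consecutive records are linked by SHA-256 digests, so a later, unrecorded edit to an earlier step becomes detectable. The record fields, step labels and example values are hypothetical; this is an illustration of the principle, not an EMBL tool or format.

```python
import copy
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ProvenanceRecord:
    """One link in the chain, e.g. a reagent, a raw data file or a published figure."""
    step: str                          # hypothetical step label, e.g. "reagent" or "raw_data"
    description: str
    parent_hash: Optional[str] = None  # digest of the previous record; None for the first link
    attributes: dict = field(default_factory=dict)

    def digest(self) -> str:
        # Canonical JSON (sorted keys) so identical content always yields the same SHA-256 digest
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def build_chain(steps: List[Tuple[str, str, dict]]) -> List[ProvenanceRecord]:
    """Create records in order, each one storing the digest of its predecessor."""
    chain: List[ProvenanceRecord] = []
    parent: Optional[str] = None
    for step, description, attributes in steps:
        record = ProvenanceRecord(step, description, parent, attributes)
        chain.append(record)
        parent = record.digest()
    return chain


def verify_chain(chain: List[ProvenanceRecord]) -> bool:
    """True if the first record has no parent and every later record matches its predecessor's digest."""
    if chain and chain[0].parent_hash is not None:
        return False
    return all(cur.parent_hash == prev.digest() for prev, cur in zip(chain, chain[1:]))


# Usage with made-up example entries
chain = build_chain([
    ("reagent", "antibody batch", {"lot": "EXAMPLE-001"}),
    ("raw_data", "confocal image stack", {"file": "plate1_well_A1.tif"}),
    ("publication", "source data for a figure panel", {}),
])
intact = verify_chain(chain)  # True: nothing has been altered

tampered = copy.deepcopy(chain)
tampered[0].attributes["lot"] = "EXAMPLE-002"  # silently edit an upstream record
still_valid = verify_chain(tampered)  # False: the second record no longer matches the first record's digest
```

The design choice here is that verification needs only the records themselves, so anyone outside the original team can check the chain.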
There is no single definition of how much data is enough to ensure experimental reproducibility; this will vary by experiment type and domain. These data guidelines are intended to provide practical guidance to EMBL researchers on how to achieve the objectives of the EMBL Open Science policy with respect to data management. They are organised as a step-by-step guide to taking the appropriate action at each stage, from the generation of data through to its deposition in a trusted public repository.
Starting a new project
Establishing a chain of information long after the experiments have been concluded is lengthy and error-prone, and experience shows that critical information is lost when people leave EMBL.
Therefore, before starting a new project, you should take some time to establish a Data Management Plan (DMP) that considers the lifecycle of your data, following the EMBL Data Management Checklist. The steps described below should then be followed from the beginning of your project as a practical implementation of your DMP. One goal of this implementation should be to capture your data and metadata in a machine-readable form as early as possible, ideally at the moment the data are generated.
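As one illustration of capturing metadata in machine-readable form at the moment data are generated, the sketch below writes a JSON "sidecar" file next to a new data file, recording a checksum, the file size, a UTC timestamp and any fields the experimenter supplies. The `.meta.json` naming convention and the field names (`instrument`, `operator`, `sample_id`) are assumptions for illustration, not an EMBL standard; the fields you actually need should come from your DMP and the target repository.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum the file in 1 MiB chunks so large raw data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecar(data_file: Path, **metadata: str) -> Path:
    """Write '<name>.meta.json' next to the data file (a hypothetical naming convention)."""
    record = {
        "file": data_file.name,
        "sha256": sha256_of(data_file),
        "size_bytes": data_file.stat().st_size,
        "recorded_utc": datetime.now(timezone.utc).isoformat(),
        **metadata,  # free-form fields supplied by the experimenter
    }
    sidecar = data_file.with_name(data_file.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return sidecar


# Usage: pretend an instrument has just written a raw data file
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "run_0001.raw"
    raw.write_bytes(b"example instrument output")
    sidecar_path = write_sidecar(
        raw, instrument="example-microscope", operator="jdoe", sample_id="S-42"
    )
    meta = json.loads(sidecar_path.read_text(encoding="utf-8"))
    sidecar_name = sidecar_path.name
```

Because the checksum is recorded when the file is created, it can later be compared with the copy deposited in a public repository to confirm that the data were not altered along the way.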
Writing a Data Management Plan
EMBL requires that all projects have a data management plan. This includes any work that is funded by grants, is part of a PhD or postdoctoral project, is intended to support a scientific article, or is part of a collaborative effort. By default, EMBL researchers should use the EMBL Data Management Plan Template.
Please also check with your funder or project partners for any additional requirements, while ensuring that EMBL's data management requirements are still satisfied.
Exploratory work is not expected to have a DMP. Nevertheless, its data must still be managed. Should it become evident that the work will lead to something beyond exploration, well-managed data provide a strong foundation for future work, so please apply the following guidelines as early as possible once the work starts leading somewhere.
To produce open data that are as reusable as possible (FAIR: Findable, Accessible, Interoperable and Reusable), you should adhere to best practices of data production and processing EARLY in experimental design, ideally at the moment of production, and NOT at the point of publishing a research article (see the section on Data Management Plans). This includes the following:
We expect that, as data management services are developed and aligned with the Data Science Theme, better practices will be encouraged and will become easier to implement. The following is a selected list of tools that can already support you in effectively registering, managing and documenting your data:
If you cannot use any of these tools, please refer to the accompanying recommendations on how to manage your data effectively.
As stated in the EMBL Open Science policy, EMBL expects, as a minimum, all data underlying research articles to be made public and to adhere to the FAIR principles.
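Before deposition, it can help to check mechanically that a dataset's metadata record is at least complete. The following sketch assumes a flat dictionary of metadata and a hypothetical set of required fields. It reports which fields are missing or blank and which FAIR principle each one loosely relates to. Passing this check does not make a dataset FAIR, and the target repository's own metadata requirements take precedence.

```python
from typing import Dict, List

# Hypothetical minimal field set; each entry notes the FAIR principle it loosely relates to.
REQUIRED_FIELDS: Dict[str, str] = {
    "identifier": "F1: globally unique and persistent identifier, e.g. a DOI or accession",
    "title": "F2: described with rich metadata",
    "description": "F2: described with rich metadata",
    "repository": "F4: registered or indexed in a searchable resource",
    "license": "R1.1: clear and accessible data usage license",
}


def missing_fields(metadata: Dict[str, object]) -> List[str]:
    """List required fields that are absent, None or blank, with the principle each one serves."""
    problems = []
    for name, principle in REQUIRED_FIELDS.items():
        value = metadata.get(name)
        if value is None or not str(value).strip():
            problems.append(f"{name} ({principle})")
    return problems


# Usage with a made-up draft record: identifier is blank, description and repository are absent
draft = {"title": "Example imaging dataset", "license": "CC-BY-4.0", "identifier": "  "}
report = missing_fields(draft)
for problem in report:
    print("missing:", problem)
```

Keeping the required fields in one dictionary makes it easy to align the check with whatever a specific repository or DMP demands.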
This guideline was written by R. Lueck, J. Klemeier, J. Marquez, J. McEntyre, U. Sarkans, J.-K. Hériché, A. Kreshuk for the EMBL Open Science Implementation Guidelines.