How can I choose my research data?

Böker / CC BY 4.0

Research data should be kept for up to 10 years as part of good scientific practice. A growing number of third-party funders also expect information on where the collected data are stored. However, it is neither technically nor economically feasible to store all the data collected during a research project. Because resources are increasingly scarce, infrastructure institutions have to choose which research data are archived and in what form, and research data repositories increasingly face the same dilemma. Scientists therefore need to appraise their data once the project is completed; this appraisal forms the basis for deciding which data should or must be archived.

Such an objective data evaluation is based on the following criteria:

Which data and for how long?

In general, the decision of what to keep depends on the priorities of the data creator, i.e. how valuable the data are for further use or re-use, taking into account the costs of preparing them for long-term use.[1]

Is the data 'good' enough? In other words: is there enough up-to-date information about the data, e.g. in a DMP (data management plan): what do the data describe, how and why were they recorded, and how were they processed? The quality of the data and their reusability are derived from this.

Selection of the appropriate data types for reuse

1. Primary data (data source): data that was originally collected or created

2. Compiled data sets: data that have been extracted or derived from your own or third-party data sources

3. Referenced data: data that has been processed from a subset of the primary data in order to further pursue the analysis or to draw conclusions from it.

However, this decision must also take legal, regulatory or political compliance issues into account. This mainly depends on whether and under what conditions the data can be publicly accessible or whether access has to be restricted.

Data retention guidelines

Step 1: What data must be kept?

  • Research data policy: Data retention is specified in the research data policy
  • Journal Policy: Article was submitted to a journal that requires data availability
  • Guidelines: Disciplinary ordinances (e.g. research protocol) or other provisions (funding guidelines) require storage
  • Legal or contractual reasons: data have commercial value or should be registered as a patent; Contractual terms or conditions require data to be retained
  • Personal data: data usage requires ethical approval, consent agreement or declaration of consent. Can data security be guaranteed by a security standard (e.g. ISO27001) and data protection by anonymizing the data?
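The Step 1 checks can be sketched as a simple record per data set. This is only an illustrative sketch: the `Dataset` class, its field names and the two helper functions are hypothetical and do not come from the DCC checklist. It treats the first four bullets as reasons to retain data and the personal-data bullet as a flag that protection measures still need to be reviewed.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical record of the Step 1 answers for one data set."""
    name: str
    required_by_policy: bool = False      # research data policy
    required_by_journal: bool = False     # journal requires data availability
    required_by_guidelines: bool = False  # disciplinary or funding guidelines
    legal_or_contractual: bool = False    # patent, commercial value, contract
    personal_data: bool = False           # contains personal data
    protection_in_place: bool = False     # e.g. anonymized, security standard


def retention_reasons(ds: Dataset) -> list[str]:
    """Return every Step 1 reason that obliges us to keep the data set."""
    checks = [
        (ds.required_by_policy, "research data policy"),
        (ds.required_by_journal, "journal policy"),
        (ds.required_by_guidelines, "guidelines"),
        (ds.legal_or_contractual, "legal or contractual reasons"),
    ]
    return [label for flag, label in checks if flag]


def needs_protection_review(ds: Dataset) -> bool:
    """Personal data without documented protection must be reviewed first."""
    return ds.personal_data and not ds.protection_in_place


survey = Dataset("survey", required_by_journal=True, personal_data=True)
print(retention_reasons(survey))
print(needs_protection_review(survey))
```

A data set with an empty list of reasons is not automatically discarded; it simply moves on to the value-based questions in the following steps.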

Step 2: What purposes can the data fulfill beyond the actual research context? How relevant is the research data for possible reuse?

  • Verification: Enable others to follow the process that led to the published results and possibly to reproduce or verify them
  • Further analysis: Increase the possibilities for further analyses, e.g. through new methods or integration with other sources for meta-analysis (new collaborations or third-party analyses)
  • Community Resource Development: Publish a data resource with value to a well-known group of users, e.g. a reference data set, method test bench or domain database
  • Build Academic Reputation: Data that is discoverable has greater visibility, which can increase citation rates for published results
  • Other Publications: The publication of a data article will contribute to scientific communication and discussion about data management or reuse in your field
  • Learning & Teaching: Embed data in a learning / teaching or public engagement resource to improve its interactivity; engage users in learning or participating in research
  • Private use: The data will be easier to find in the coming years so that they can be used for other potential applications

Step 3: Which data should be kept?

  • Data quality: Adequate quality in terms of completeness, sample size, accuracy, validity, reliability, representativeness
  • Integration potential: Can the data be matched to standardized terms / conditions used in other research areas, e.g. geographical locations or time periods? Does the professional community recommend sharing the data?
  • Re-use potential: How likely is demand? Is the data in a format that does not require license fees or proprietary software / hardware to be reused, or is the proprietary software / hardware widespread?
  • Legal framework: Has the data been classified according to its sensitivity, and is it free of data protection requirements, contractual restrictions, license or copyright provisions that restrict public access and reuse?
  • Reputation: Is the data produced by a research group that has been rated highly for the originality, relevance and diligence of its previous research?
  • Attractiveness: Could the data attract wide interest, e.g. by relating to a milestone finding, a meaningful new research process or an international policy and social issue?
  • Reproducibility: How difficult is it to reproduce this data? Difficult, expensive or even impossible (example: observations)?
  • Unique: is this the only and most complete copy of the data? At risk: is the data somewhere that cannot guarantee long-term storage?

Step 4: Which data and information are needed for reuse?

  • Other publications: Referenced data with additional documentation (metadata)
  • Learning & teaching: Samples of original data and compiled data including analysis steps
  • Verification: Referenced data including analysis steps
  • Further analysis: All original data including the software that was used to collect the data
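The mapping in Step 4 lends itself to a lookup table. The following is a minimal sketch; the dictionary `REQUIRED_FOR_REUSE` and the function `items_to_keep` are hypothetical names introduced for illustration, and the entries simply restate the four bullets above.

```python
# Hypothetical lookup table restating Step 4: reuse purpose -> what to keep.
REQUIRED_FOR_REUSE = {
    "other publications": ["referenced data", "additional documentation (metadata)"],
    "learning & teaching": ["samples of original data", "compiled data", "analysis steps"],
    "verification": ["referenced data", "analysis steps"],
    "further analysis": ["all original data", "software used to collect the data"],
}


def items_to_keep(purposes):
    """Collect the items needed for all chosen purposes, without duplicates."""
    needed = []
    for purpose in purposes:
        for item in REQUIRED_FOR_REUSE[purpose.lower()]:
            if item not in needed:
                needed.append(item)
    return needed


print(items_to_keep(["Verification", "Other publications"]))
```

Combining several purposes shows where they overlap: referenced data, for example, serves both verification and further publications and only needs to be prepared once.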

Step 5: Has the cost been weighed?

  • Preparation costs: costs incurred both during the research process and in preparation for archiving
  • Storage costs: Separate costs for storage and maintenance after the research period
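As a rough illustration of how the two cost types add up, the sketch below sums a one-off preparation cost and a recurring storage cost. The function name, the flat yearly rate and the example figures are assumptions made for illustration only; the 10-year default follows the retention period mentioned at the start of this article, and real costing tools consider many more factors.

```python
def archiving_cost(preparation_cost, storage_cost_per_year, years=10):
    """Total cost = one-off preparation + yearly storage over the retention period.

    Assumes a flat yearly storage rate, which is a simplification.
    """
    return preparation_cost + storage_cost_per_year * years


# Hypothetical figures: 500 for preparation, 20 per year for storage.
print(archiving_cost(500, 20))
```

Comparing this total with the expected reuse value from Steps 2 and 3 gives a first indication of whether keeping a data set is justified.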

Data selection for long-term storage / archiving

Data evaluation / selection at a glance

  1. Potential reuse - What goals could the data be used to achieve?
  2. Are there any conflicts of interest (policies or copyrights / data protection rights) that need to be considered?
  3. Which data could have long-term value and should therefore be kept?
  4. Weighing the costs - what data management costs have already been incurred and thus add to their value? Will there be more data and is it affordable to manage? Are there any additional funds to cover these costs?[2]

An evaluation list that summarizes the potential reuse purposes and the associated data preparation measures for storage (or the justification for not keeping the data) makes it possible not only to determine the costs incurred but also to decide whether external advice is necessary, e.g. on how budget deficits can or should be dealt with. Instructions can be found on the DCC website.

References

  1. DCC (2014). 'Five steps to decide what data to keep: a checklist for appraising research data v.1'. Edinburgh: Digital Curation Centre.
  2. UK Data Archive (2015). Data management costing tool and checklist.
