The NIH Data Management and Sharing Policy

A guide on the NIH's DMS policy effective January 25, 2023.

What is the new policy?

NIH Data Management and Sharing Policy
Effective January 25, 2023

Effective January 25, 2023, the NIH Data Management and Sharing (DMS) Policy applies to all research that meets these criteria:

  • Is funded or conducted in whole or in part by NIH
  • Results in the generation of scientific data

Previously, the NIH only required grants with $500,000 per year or more in direct costs to provide a brief explanation of how and when data resulting from the grant would be shared.

Beginning January 25, 2023, ALL grant applications or renewals that generate Scientific Data must include a robust and detailed plan for managing and sharing data during the entire funded period. This includes information on data storage, access policies/procedures, preservation, metadata standards, distribution approaches, and more. You must provide this information in a data management and sharing plan (DMSP). The DMSP is similar to what other funders call a data management plan (DMP).

Policy Requirements:

Applicants will be required to submit a two-page data management and sharing plan and to comply with that plan.

  1. Submission of an official Data Management and Sharing (DMS) Plan as part of all applications for funding beginning January 25, 2023.
  2. Compliance with the DMS Plan approved by the funding NIH Institute, Center, or Office. The approved plan becomes a part of the terms and conditions of the grant. Compliance will be monitored at regular reporting intervals by the funding NIH Institute, Center, or Office and may factor into future funding decisions.

 

What Data Should Be Shared?

What is Scientific Data?

The NIH defines scientific data as: data commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings, regardless of whether the data are used to support scholarly publications.

Under the 2023 DMS policy, researchers are expected to share:

  • Adequate data to validate and replicate study findings
  • Data resulting from the study but not necessarily supporting a publication
  • Null findings that do not result in publication

The policy does not require researchers to share data per se but expects them to maximize their data sharing.

Justifiable ethical, legal, and technical factors for limiting sharing include:

  • Informed consent does not permit, or limits the scope of, sharing or use
  • Privacy or safety of research participants would be compromised and available protections are insufficient
  • Explicit federal, state, local, or Tribal law, regulation, or policy prohibits disclosure
  • Restrictions are imposed by existing or anticipated agreements with other parties

What does not need to be shared?

Scientific data does not include:

  • Laboratory notebooks
  • Preliminary analyses
  • Completed case report forms
  • Drafts of scientific papers
  • Plans for future research
  • Peer reviews
  • Communications with colleagues
  • Physical objects such as laboratory specimens

Elements of an NIH Data Management Plan

If you plan to generate scientific data, you must submit a Data Management and Sharing Plan to the funding NIH ICO as part of the Budget Justification section of your application for extramural awards. 

Your plan should be two pages or fewer and must include:

Data Type: Briefly describe the scientific data to be managed, preserved, and shared.

Related Tools, Software and/or Code: An indication of whether specialized tools are needed to access or manipulate shared scientific data to support replication or reuse, and name(s) of the needed tool(s) and software. If applicable, specify how needed tools can be accessed, (e.g., open source and freely available, generally available for a fee in the marketplace, available only from the research team) and, if known, whether such tools are likely to remain available for as long as the scientific data remain available.

Standards: An indication of what standards will be applied to the scientific data and associated metadata (i.e., data formats, data dictionaries, data identifiers, definitions, unique identifiers, and other data documentation). While many scientific fields have developed and adopted common data standards, others have not. In such cases, the Plan may indicate that no consensus data standards exist for the scientific data and metadata to be generated, preserved, and shared.

Data Preservation, Access, and Associated Timelines: Plans and timelines for data preservation and access, including: the name of the repository(ies) where scientific data and metadata arising from the project will be archived, how the scientific data will be findable and identifiable, when the scientific data will be made available to other users (i.e., the larger research community, institutions, and/or the broader public) and for how long. 

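Access, Distribution, or Reuse Considerations: NIH expects that in drafting Plans, researchers maximize the appropriate sharing of scientific data generated from NIH-funded or conducted research, consistent with privacy, security, informed consent, and proprietary issues. Describe any applicable factors affecting subsequent access, distribution, or reuse of scientific data related to informed consent; privacy and confidentiality protections; whether access to scientific data derived from humans will be controlled; restrictions imposed by laws, regulations, policies, or agreements; and any other considerations that may limit the extent of data sharing.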

Oversight of Data Management and Sharing: Indicate how compliance with the Plan will be monitored and managed, frequency of oversight, and by whom (e.g., titles, roles).

See Supplemental Information to the NIH Policy for Data Management and Sharing: Elements of an NIH Data Management and Sharing Plan for a detailed description of these Elements. For additional resources, refer to How to Get Started Writing a DMP.

 

Data Types/Description

Include the following in your plan:

  1. Describe the data you are collecting
    • Type - what is it? How was it created?
    • Formats - in what format is it going to be preserved?
    • Amount - how much data? how large are the samples?
    • Level of aggregation - is it individual level data or aggregated?
  2. Describe the data you will be sharing
    • Specifically include rationale for any data that was generated that will not be shared for ethical, legal, or technical reasons
  3. Describe metadata and documentation which will be included to facilitate interpretation
    • Metadata is any documentation that will help others understand the data such as protocols, codebooks, data dictionaries, data collection instruments
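
As one illustration of the metadata described above, a data dictionary can be as simple as a small table with one row per variable. The Python sketch below writes such a table as CSV text. The column names (`variable`, `label`, `type`, `units`, `allowed_values`) and the example variables are assumptions chosen for illustration, not a format required by NIH.

```python
import csv
import io

# Hypothetical column layout for a data dictionary; adapt it to your field's standards.
COLUMNS = ["variable", "label", "type", "units", "allowed_values"]


def data_dictionary_csv(entries):
    """Return CSV text with one row per variable described in `entries`."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        # Missing keys are written as empty cells rather than raising an error.
        writer.writerow({col: entry.get(col, "") for col in COLUMNS})
    return buffer.getvalue()


# Illustrative entries only; these variables are not from any real study.
example = [
    {"variable": "age", "label": "Age at enrollment", "type": "integer", "units": "years"},
    {"variable": "sex", "label": "Sex recorded at intake", "type": "category", "allowed_values": "F|M|U"},
]

print(data_dictionary_csv(example))
```

Keeping the dictionary in a plain .csv file means it can be deposited alongside the data and opened without special software.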

WHY? Data without context loses its power and objectivity. By comprehensively describing your data, you are ensuring that the complete picture of your research is communicated, and that any derivative work resulting from your research remains academically honest. Additionally, by being asked to think about how your data will be generated, described, and structured before any data is collected, you are indicating a commitment to robust research and data practices. You will also be saving yourself time and effort, as well as avoiding headaches, by knowing exactly what data you are generating, where it is, and how to access and use it.

Details

  • Indicate where the data is being generated or pulled from. Are you collecting data from an instrument, survey, or electronic health record? Will you be aggregating multiple datasets together, or is your data the result of only one set of observations? 
  • Indicate what types of data are being generated. Is the result of your research a spreadsheet? Are you capturing images or video? 
  • Indicate which data will be shared, if any. Not all data generated as a result of your project need to be shared as part of your DMSP. The following types of data are not required to be shared per the NIH DMS Policy:
    • Data that are not necessary for or of sufficient quality to validate and replicate the research findings
    • Laboratory notebooks
    • Preliminary analyses that are not necessary for or of sufficient quality to validate and replicate the research findings
    • Completed case report forms
    • Drafts of scientific papers
    • Plans for future research
    • Peer reviews
    • Communications with colleagues
    • Physical objects, such as laboratory specimens

Level of Data Processing

  • Indicate the level of data processing/aggregation. There are several phases of data collection and analysis, and you have some say over what, if anything, is made available. Below are some options when it comes to what you can share:
    • Raw: as collected from data source 
    • Processed: cleaned and organized for analysis, de-identified if applicable 
    • Summarized: data used to generate figures and tables 

Restrictions on Data

  • State any restrictions on data based on 'Protected Health Information' (PHI), IRB approval, data usage agreements, or any other justifiable factor. If the data cannot be de-identified and still be usable, indicate that in this section. You will be able to go into more detail in later sections. 

Amount of Data

  • State the number of observations generated or used, even if using data from a public dataset. If the size or number of files in the dataset is significant, include that here. 

File Formats

  • When possible, choose open, non-proprietary formats. These formats will allow anyone to access and view your data, regardless of most software restrictions. Common preferred file formats are listed below, and an exhaustive list is maintained by The National Archives. There is always the possibility that your data will not be able to be made available in any of these formats, but all efforts should be made to find one that works for you. If not possible, indicate that here.
    • Images: .tiff, .png, .jpg, .bmp 
    • Text: .txt, .pdf 
    • Tabular data: .csv, .pdf 

File naming

  • Indicate the naming convention for the files you are sharing, or indicate where it can be found, in this section. Having a file naming convention not only makes it easier for others to find and use your data, but having a robust naming plan in place prior to conducting your research will help you stay organized and on-task. Below are some helpful tips when deciding how you will name your files:

    • Check for field-specific standards.  

    • For dates use: YYYYMMDD; for datetimes use YYYYMMDDThhmm (24 hour time) 

    • Do not include spaces; use ‘-’ or ‘_’ as separators if necessary 

    • Use versioning; file_v1.csv or file_v01.csv 

    • Include README file (see below) to explain naming conventions and any abbreviations 

    • Example: 20220922_NHDS_export_v01.csv 
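
The tips above can be sketched as a small Python helper that builds a file name and checks whether an existing name follows the convention. The function names (`build_filename`, `follows_convention`) and the regular expression are assumptions made for this illustration; a field-specific standard, where one exists, should take priority.

```python
import re
from datetime import date

# Assumed pattern for this sketch: YYYYMMDD[Thhmm]_<part>_..._vNN.<ext>
# It checks the shape of the name only, not whether the date is a real calendar date.
NAME_PATTERN = re.compile(r"\d{8}(T\d{4})?(_[A-Za-z0-9-]+)+_v\d{2}\.[A-Za-z0-9]+")


def build_filename(day, parts, version, ext):
    """Join a date, descriptive parts, and a zero-padded version into one name."""
    for part in parts:
        # Reject spaces and other special characters; '-' is allowed inside a part.
        if not re.fullmatch(r"[A-Za-z0-9-]+", part):
            raise ValueError(f"invalid name part: {part!r}")
    stem = "_".join([day.strftime("%Y%m%d"), *parts, f"v{version:02d}"])
    return f"{stem}.{ext}"


def follows_convention(name):
    """Return True if `name` matches the assumed pattern above."""
    return NAME_PATTERN.fullmatch(name) is not None


print(build_filename(date(2022, 9, 22), ["NHDS", "export"], 1, "csv"))
# 20220922_NHDS_export_v01.csv
print(follows_convention("2022-09-22 NHDS export.csv"))
# False
```

Generating names from one function, rather than typing them by hand, keeps dates and version numbers consistent across a project.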

Documentation

  • All good documentation begins with a README file. In general, this is a detailed listing of data formats, structures, and naming conventions. You will want to indicate in this section whether or not you will be including a README file, and doing so can save you some time and effort when constructing your DMSP. Cornell University has a fantastic guide on how to construct a quality README and Arizona provides a nice template, but in general you will want to include the following at a minimum: 
    • Contact information 
    • File structure, including naming conventions and versioning nomenclature
    • File formats for each data type 
    • Codes (if applicable)
  • If applicable, any data collection instruments, such as surveys or extraction tools, or review protocols should be indicated in this section. 
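
As a rough illustration of the minimum README contents listed above, the Python sketch below assembles a plain-text README from those items. The function name `render_readme`, the section titles, and the placeholder values are assumptions for this example; they are not taken from the Cornell guide or the Arizona template.

```python
def render_readme(contact, file_structure, file_formats, codes=None):
    """Build README text covering the minimum items: contact, structure, formats, codes."""
    sections = [
        ("Contact information", contact),
        ("File structure, naming conventions, and versioning", file_structure),
        ("File formats", file_formats),
    ]
    if codes:
        # Only include a codes section when the dataset uses coded values.
        sections.append(("Codes", codes))
    blocks = [f"{title}\n{'-' * len(title)}\n{body.strip()}" for title, body in sections]
    return "\n\n".join(blocks) + "\n"


# Placeholder values for illustration only.
readme_text = render_readme(
    contact="Principal investigator name, role, and email go here.",
    file_structure="Files are named YYYYMMDD_project_description_vNN.ext.",
    file_formats="Tabular data: .csv; images: .tiff",
    codes="sex: F = female, M = male, U = unknown",
)
print(readme_text)
```

Saving the result as a .txt file keeps the README itself in an open, non-proprietary format.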

​​​​

Related Tools, Software and/or Code

Include the following in your plan:

Indicate whether specialized tools are needed to access or manipulate shared scientific data to support replication or reuse, and name(s) of the needed tool(s) and software. If applicable, specify how needed tools can be accessed. 

In order to ensure your data can be used in the future, either by you or another researcher, it is important to explicitly list any and all research tools, whether widely available or custom-built, that were used during data collection and analysis. In an ideal scenario, everything should be listed in this section that would allow a user to take your data and reproduce your results following the same general workflow. Obviously this is not always feasible, but there should be an attempt made to make your data analysis as reproducible as possible.

Data Tools

  • Indicate any tools or software necessary to access or manipulate data. This should include the following information, if applicable:
    • Which statistical package or program was used to manipulate the data, along with the version of the software that was used and any packages, scripts, or settings that were used or developed during the course of the study, as well as how users can access the software
    • Whether there were any custom workflows or pipelines developed as part of the study necessary to analyze or process the data, and how users can access them
    • Whether there were any executable programs or macros written as part of the study necessary to analyze or process the data, as well as how users can access the code
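
For analyses done in Python, one way to capture the software and package versions mentioned above is to record them programmatically. The sketch below is a minimal example under that assumption; the function name `describe_environment` and the package names passed to it are hypothetical, and other statistical packages have their own ways of reporting versions.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages):
    """Return the Python version and the installed version of each named package."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record packages that are not installed as None instead of failing.
            versions[name] = None
    return {"python": platform.python_version(), "packages": versions}


# Package names here are examples; list the ones your analysis actually uses.
print(json.dumps(describe_environment(["numpy", "pandas"]), indent=2))
```

The resulting JSON can be saved next to the analysis scripts so that others can see which versions produced the shared results.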

Standards

Include the following in your plan:

Describe the standards, if any, that will be applied to the scientific data and associated metadata (i.e., data formats, data dictionaries, data identifiers, definitions, unique identifiers, and other data documentation).

WHY? One of the driving forces in enabling data sharing and reuse is interoperability, or how your dataset can be combined with other datasets to enhance discovery. In order for that to occur, similar data need to be described using similar metadata and using, if possible, similar data standards. Not every field uses standardized data formats yet, but every effort should be made to reconcile your data and/or metadata with known standards if possible.

Metadata standards describe at a high level how datasets will be structured and organized. There are a number of commonly used metadata standards.

Data standards describe in detail how the data itself will be captured and described. This is often field-specific, and your field might not have an established data standard. If applicable, choose a data standard established in your field.

Data Preservation, Access, and Associated Timelines

Include the following in your plan:

Give plans and timelines for data preservation and access, including:

  • The name of the repository(ies) where scientific data and metadata arising from the project will be archived. See Selecting a Data Repository for information on selecting an appropriate repository.
  • How the scientific data will be findable and identifiable, i.e., via a persistent unique identifier or other standard indexing tools.
  • When the scientific data will be made available to other users and for how long. Identify any differences in timelines for different subsets of scientific data to be shared.

WHY? In order to ensure data can effectively be shared and reused, there needs to be a plan for when and how it will be shared. This section allows you to explain in detail your plan to make your data available, if applicable. 

Data Repositories

  • A data repository is a type of large database built specifically to store data. It differs from other storage options in that it is typically geared towards capturing the more specific information, supplied at deposit, that is necessary for data discovery and storage. Data repositories also typically offer a variety of access privileges that data owners can enforce on users attempting to access their data.
  • There are a large number of possible data repositories available for you to deposit your data into, and choosing one can come down to a variety of factors such as affiliation with a funding agency section, possible limitations related to your data, or simply personal preference. The NIH has compiled a list of desirable characteristics to consider when selecting a repository, and below are some additional options to look at, in order of decreasing recommendation of use for complying with the NIH Policy.
    • Option 1: Use accepted repositories in field. If not known, browse NIH Repositories for Sharing Scientific Data to find an applicable repository (NOTE: This should always be your first option. Most datasets affiliated with NIH funding should be able to be deposited in a data repository supported by their funding Institute or Center)
    • Option 2: Use re3data to check for potential field-specific repositories not funded by NIH
    • Option 3: Use an acceptable generalist repository. These are repositories that don't focus on any particular field of study, but are reputable, safe places to store your data. 
    • Option 4: If you can't find an appropriate repository using any of the other options, or if you will not be sharing your data due to data restrictions, you can archive your data in the WCM Institutional Data Repository for Research (WIDRR) (NOTE: This should only be done as a last resort if sharing your data. WIDRR is currently not set up to allow sharing to outside entities. As a result, if you can share your data, putting it in WIDRR does not make you compliant with the NIH DMS Policy)

Data Identifiers

  • Indicate whether persistent identifiers will be made available by the repository of choice. The most common identifier used to link to datasets is the DOI, or 'Digital Object Identifier'. Not every repository will generate a DOI for each dataset, but all repositories should generate some form of unique identifier. 
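
A DOI is usually cited as a link through the doi.org resolver. The Python sketch below shows one way to turn a DOI string into such a link; the function name `doi_to_url` is an assumption for this example, and the DOI used is a made-up placeholder rather than a real dataset.

```python
from urllib.parse import quote


def doi_to_url(doi):
    """Turn a bare DOI, a 'doi:' string, or a doi.org link into a resolver URL."""
    cleaned = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    if not cleaned.startswith("10."):
        # DOIs begin with the directory indicator "10."
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping '/'.
    return "https://doi.org/" + quote(cleaned, safe="/")


# Placeholder DOI for illustration only.
print(doi_to_url("doi:10.1234/example.5678"))
# https://doi.org/10.1234/example.5678
```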

Data Availability Timeline

  • Indicate when data will initially be available, how long it will be available for, and any milestones that could trigger a data sharing event. The WCM Data Retention Policy dictates that data be made available within three years of closeout of project or upon publication, and that it is available for at least six years, with an additional six years if self-cited. The NIH dictates that data be made available at the end of the performance period or upon publication, so there is a good bit of overlap between the two policies when it comes to when to start sharing. In summary:
    • Start sharing: three years from closeout of project, end of the grant performance period, or publication of funded research paper
    • Stop sharing: at least six years, plus an additional six years if self-cited
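
The summary above can be turned into approximate dates with a short Python sketch. It rests on two assumptions that the policies themselves should be checked against: that sharing must start by the earliest of the three triggers, and that the six-year (or twelve-year, if self-cited) period is counted from that start date. The function names and example dates are hypothetical.

```python
from datetime import date


def add_years(day, years):
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        # Feb 29 does not exist in the target year.
        return day.replace(year=day.year + years, day=28)


def sharing_window(closeout, performance_end, publication=None, self_cited=False):
    """Return (start_by, keep_until) under the assumptions stated above."""
    candidates = [add_years(closeout, 3), performance_end]
    if publication is not None:
        candidates.append(publication)
    start_by = min(candidates)  # assumption: the earliest trigger governs
    keep_years = 12 if self_cited else 6  # six years, plus six more if self-cited
    return start_by, add_years(start_by, keep_years)


# Example dates are illustrative only.
start_by, keep_until = sharing_window(
    closeout=date(2027, 6, 30),
    performance_end=date(2027, 3, 31),
    publication=date(2026, 11, 15),
)
print(start_by, keep_until)
# 2026-11-15 2032-11-15
```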

Data Requests

  • Indicate the process/workflow by which a user will access the data. For NIH or generalist repositories, data will typically be accessed or requested directly by the user, with minimal or no intervention necessary from you. Each data repository will handle this differently, however, so be sure to inform yourself about the individual repository policies.

Access, Distribution, or Reuse Considerations

Include the following in your plan:

Describe any applicable factors affecting subsequent access, distribution, or reuse of scientific data related to:

  • Informed consent
  • Privacy and confidentiality protections consistent with applicable federal, Tribal, state, and local laws, regulations, and policies
  • Whether access to scientific data derived from humans will be controlled 
  • Any restrictions imposed by federal, Tribal, or state laws, regulations, or policies, or existing or anticipated agreements
  • Any other considerations that may limit the extent of data sharing. Any potential limitations on subsequent data use should be communicated to the individuals or entities (for example, data repository managers) that will preserve and share the scientific data. The NIH ICO will assess whether an applicant’s DMS plan appropriately considers and describes these factors. 

WHY? Specifying how you will share your data with non-group members after the project is completed is the very reason a DMSP is required. If the data are sensitive (because of privacy concerns, for instance) and public access is inappropriate, this is where that gets addressed. It is your opportunity to provide justification for not sharing or restricting access to your data. The NIH DMS Policy allows for limits on data sharing and reuse, but they need to be explicitly specified in this section.

Restrictions to Data Access

  • Indicate any restrictions to data and provide rationale and justification for your decision. This can include any of the following:
    • Not sharing the data because it conflicts with the Institutional Review Board approval or informed consent process, both of which take precedence over the NIH DMS Policy
    • Restricting the sharing of data because the dataset contains protected health information (PHI) and the data can't be de-identified without losing scientific utility
    • The data being contingent on a 'Data Usage Agreement' that does not provide for data sharing

Oversight of Data Management and Sharing

Include the following in your plan:

Indicate how compliance with the DMSP will be monitored and managed.

WHY? In order to ensure that your research will be handled responsibly throughout the duration of the study and beyond, explain how the responsibilities regarding the management and sharing of your data will be delegated. This should include time allocations, project management of technical aspects, training requirements, and contributions of non-project staff, with names and titles of individuals named where possible. Remember that those responsible for long-term decisions about your data will likely be the custodians of the repository/archive you choose to store your data. While the costs associated with your research (and the results of your research) must be specified in the Budget Justification portion of the proposal, you may want to reiterate who will be responsible for funding the management of your data. Much of this information should also be present in your README, but this section allows you to provide more context to the reviewer at the time of your proposal.
 

Roles and Responsibilities 

  • Indicate who is responsible for which roles in managing your data and monitoring compliance with the DMSP. DataONE maintains a list of possible roles and responsibilities for an ideal DMSP. However, every research group is different and might not need people for each role. At minimum, each group conducting NIH-funded research requiring a DMSP should have at least one person assigned to the following roles:
    • Data Collector: the person(s) responsible for either collecting the data or overseeing those who are collecting the data, or both
    • Data Analyzer: the person(s) conducting any data processing or statistical analysis
    • Project Manager: the person(s) responsible for overseeing and monitoring the entire study, in addition to monitoring compliance with the DMSP. 

 

Compliance

NIH will monitor compliance with Plans over the course of the funding period during regular reporting intervals (e.g., at the time of annual Research Performance Progress Reports (RPPRs)). Steps include:

  • Approved DMS Plan becomes a Term and Condition of Award
  • Grantee reports progress of approved DMS Plan in RPPR
  • NIH reviews compliance annually

Failure to comply with DMS Plans may result in the NIH ICO adding special Terms and Conditions of Award or terminating the award. If award recipients are not compliant with Plans at the end of the award, noncompliance may be factored into future funding decisions.

For contracts, noncompliance with the DMS Plan will be handled in accordance with the terms and conditions of the contract and applicable Federal Acquisition Regulation (FAR).