GDEX Dataset Appraisal and Ingestion Workflow


This document explains the complete workflow that the Geoscience Data Exchange (GDEX) uses to appraise, accept, and ingest submitted datasets. It outlines how GDEX communicates with submitters, evaluates dataset suitability, manages data transfer and integrity checks, verifies metadata, and confirms curation levels, formats, rights, and ownership.


If a dataset is accepted, the dataset specialist works with the submitter to finalize metadata, citations, and the DOI before publishing the dataset.

If a dataset is rejected, the submitter may receive guidance on alternative repositories, and the dataset may be re-evaluated in the future.

Upon Submission

GDEX Data Engineering and Curation Section (DECS) staff are notified as soon as a dataset submission form is submitted. If GDEX DECS staff or the manager have any questions about the dataset, the manager or a designated DECS representative will contact the dataset submitter through the Jira ticket created for the submission. The dataset submitter should send all responses, additional information, and feedback to GDEX via Jira.

All submissions are divided into two groups: those from UCAR staff and those from non-UCAR submitters. The submission system determines UCAR affiliation from the submitter's email domain (e.g., ucar.edu; Figure 1.1). GDEX staff conduct a mandatory screening of all non-UCAR dataset submissions to determine whether the dataset falls within the scope of the GDEX repository (Figure 1.2), as defined in the GDEX Terms and Conditions.
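The email-domain check described above can be sketched as follows. This is a minimal illustration, not the submission system's actual code: the function names are hypothetical, and it assumes that subdomains (such as a division-level address ending in ".ucar.edu") count as UCAR, which this document does not state.

```python
def is_ucar_affiliated(email: str, domains: tuple = ("ucar.edu",)) -> bool:
    """Return True if the email's domain is a listed UCAR domain or a subdomain of one.

    Assumption (hypothetical): subdomains such as "dept.ucar.edu" are treated as UCAR.
    """
    # Take everything after the last "@"; reject strings without a local part or domain.
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        return False
    domain = domain.lower().rstrip(".")
    # Exact match, or a true subdomain (".ucar.edu" suffix), so "notucar.edu" is rejected.
    return any(domain == d or domain.endswith("." + d) for d in domains)


def screening_route(email: str) -> str:
    """Hypothetical routing label: non-UCAR submissions require scope screening."""
    return "ucar" if is_ucar_affiliated(email) else "non-ucar: scope screening required"
```

Matching on the full domain (or a dot-prefixed suffix) rather than a plain substring avoids treating look-alike domains as UCAR addresses.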

GDEX welcomes submission requests for datasets that primarily support weather and climate research. Datasets are considered on a case-by-case basis, subject to available resources (personnel time and infrastructure capacity), relevance to existing GDEX data collections, and importance to the core weather and climate research sponsored by NSF.

Upon Repository Determination

All submissions are automatically assigned to a repository based on dataset characteristics (e.g., data size and whether access to NCAR HPC is needed; Figure 1.3). Larger datasets and datasets that require access to NCAR HPC are deposited in the GDEX repository; all others are ingested into the NSF/NCAR Zenodo Community Repository. Once GDEX staff have approved a dataset, assignment and notification typically take about five minutes.
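The assignment rule above can be expressed as a small decision function. This is a hedged sketch: the `Submission` fields and `assign_repository` name are hypothetical, and because this document does not state the actual size cutoff, the threshold is left as a parameter rather than guessed.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    size_bytes: int       # total size of all data files in the submission
    needs_ncar_hpc: bool  # whether the data must be accessible from NCAR HPC


def assign_repository(sub: Submission, size_threshold_bytes: int) -> str:
    """Route large or HPC-dependent datasets to GDEX and all others to Zenodo.

    size_threshold_bytes is a hypothetical parameter; the real GDEX cutoff
    is not given in this document.
    """
    if sub.needs_ncar_hpc or sub.size_bytes >= size_threshold_bytes:
        return "GDEX"
    return "NSF/NCAR Zenodo Community"
```

For example, with an arbitrary illustrative threshold of `100 * 1024**3` bytes, a small dataset that needs HPC access still routes to GDEX, because either condition alone is sufficient.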

Upon Rejection

If a non-UCAR dataset submission is rejected for ingestion into GDEX, the GDEX DECS team will, whenever possible, recommend alternative archives or repositories. If a rejected dataset is not subsequently deposited in another archive or repository, GDEX may re-evaluate it and revise its decision, depending on the original reason for rejection. However, the GDEX DECS team cannot guarantee re-evaluation within a specific time frame.

Upon Acceptance

After a dataset has been accepted for ingestion into GDEX, a GDEX DECS dataset specialist will contact the dataset submitter and the dataset contact listed on the Dataset Submission Form to complete the following tasks:


Confirmation of rights/terms, conditions for use/collaboration, and ownership:

For non-UCAR submissions, the GDEX DECS dataset specialist will verify the submitted rights, terms of use, collaboration conditions, and ownership with the dataset submitter or contact. A Data Deposit Agreement (DDA) must be signed before dataset ingestion.

Submission of data files:

The method for transferring the data files, including any relevant supplementary files, may vary slightly depending on the file structure, format, and/or size of the data. The responsible DECS dataset specialist will work with the data submitter to determine the best transfer method. The data transfer workflow typically proceeds as follows:

  • The data submitter is asked to host the files to be transferred on a remote server and to provide the DECS dataset specialist with a manifest listing every file to be transferred and its MD5 checksum. Because the details depend on the systems and transfer protocol involved, the data submitter will work with the DECS dataset specialist to determine the best structure for the manifest file and the appropriate way to compute the MD5 checksums.
  • The DECS dataset specialist will use FTP, HTTP(S), or GridFTP to transfer the data files from the submitter's server to the DECS server, following the manifest.
  • Once a data file has been transferred to the DECS server, its MD5 checksum is computed and compared with the submitter-provided value to verify data integrity.
  • The manifest list is used to verify that the complete file set has been transferred from the submitter's server to the DECS server.
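The manifest steps above can be sketched in Python. This is an illustrative example only: the actual manifest layout is agreed case by case with the dataset specialist, and here it is assumed to follow the "checksum, two spaces, relative path" layout that GNU md5sum prints. All function names are hypothetical.

```python
import hashlib
from pathlib import Path

CHUNK = 1 << 20  # read in 1 MiB chunks so large files never need to fit in memory


def md5_of(path: Path) -> str:
    """Compute the hex MD5 digest of one file."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def write_manifest(root: Path, manifest: Path) -> None:
    """Submitter side: one '<md5>  <relative path>' line per file under root.

    Keep the manifest outside root so it does not list itself.
    """
    lines = [
        f"{md5_of(p)}  {p.relative_to(root).as_posix()}"
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]
    manifest.write_text("\n".join(lines) + "\n")


def read_manifest(manifest: Path) -> dict[str, str]:
    """Parse manifest lines into {relative path: expected md5}."""
    entries = {}
    for line in manifest.read_text().splitlines():
        if line.strip():
            digest, name = line.split("  ", 1)
            entries[name] = digest.lower()
    return entries


def verify_transfer(dest: Path, manifest: Path) -> dict[str, list[str]]:
    """Receiver side: report files that are missing (completeness) or
    whose MD5 differs from the manifest (integrity)."""
    report = {"missing": [], "mismatched": []}
    for name, digest in read_manifest(manifest).items():
        path = dest / name
        if not path.is_file():
            report["missing"].append(name)
        elif md5_of(path) != digest:
            report["mismatched"].append(name)
    return report
```

A transfer is complete and intact only when both lists in the report are empty; any file listed should be transferred again from the submitter's server.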

Please note: (1) dataset submitters cannot upload files to GDEX directly; (2) to prevent potential data loss, submitters should keep a copy of their data files on their local server until the dataset creation process is complete; (3) during the submission process, the GDEX DECS team may request clarification of file naming conventions or directory structures to ensure proper dataset organization and ingestion.

Metadata record collaboration and verification for GDEX landing page:

  • Submitters complete the dataset submission form via the DATAHELP portal (accessible from the data submission webpage), and GDEX uses this submission as the basis for all communication with the dataset specialist. After receiving the repository determination letter, submitters complete a metadata collection form as requested by the dataset specialist. The GDEX DECS dataset specialist may also contact the dataset submitter or designated dataset contact to verify or collect additional information.
  • If a metadata record or other descriptive documents already exist for the dataset, please inform the dataset specialist, as this information may assist with the metadata creation process.
  • To populate the dataset metadata, the dataset specialist must enter a minimum set of required metadata fields. These fields were selected so that they can be mapped to metadata schemas that are commonly recognized and supported by GDEX's scientific community, which positions GDEX to support long-term preservation.
    • Dataset collection-level metadata is maintained in a native GDEX schema based on ISO representations (e.g., ISO 8601 for dates and times) and uses Global Change Master Directory (GCMD) controlled-vocabulary keywords.
    • Tools are provided to map the native GDEX metadata into community-standard schemas according to the relevant specifications, including: DataCite; GCMD Directory Interchange Format (DIF); Dublin Core; Federal Geographic Data Committee (FGDC); International Organization for Standardization (ISO) 19139 and ISO 19115-3; and JSON-LD structured data.
    • For examples of the standard metadata schemas that GDEX provides, see the “Metadata Record” menu at the bottom of an example dataset homepage.
    • Additionally, all of the listed metadata schemas, plus the THREDDS schema, can be accessed through the GDEX Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) web service.
  • Dataset collection-level content metadata, derived from “file-level” metadata harvested during data file archival (see “About Data File Content Metadata” in the Dataset Maintenance Guide), is populated into the dataset collection metadata once files have been archived into the collection. Content metadata is automatically updated as additional files are archived in the collection over time. For an example of a summary metadata product derived from file-level metadata, see the “detailed metadata” summary on an example dataset homepage.
  • Any changes to GDEX dataset metadata are tracked and preserved for provenance purposes as described in GDEX Dataset Change Management Strategies.
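Harvesting through an OAI-PMH web service means issuing HTTP GET requests with a `verb` argument. The sketch below builds such request URLs using the standard OAI-PMH 2.0 verbs and arguments. The base URL is a hypothetical placeholder, since the actual GDEX endpoint is not given here, and the metadata prefixes GDEX exposes for each schema should be discovered with `ListMetadataFormats` rather than assumed.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; substitute the actual GDEX OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

# Arguments accepted by each OAI-PMH 2.0 verb.
ALLOWED = {
    "Identify": set(),
    "ListMetadataFormats": {"identifier"},
    "ListSets": {"resumptionToken"},
    "ListIdentifiers": {"metadataPrefix", "from", "until", "set", "resumptionToken"},
    "ListRecords": {"metadataPrefix", "from", "until", "set", "resumptionToken"},
    "GetRecord": {"identifier", "metadataPrefix"},
}


def oai_request(verb: str, **args: str) -> str:
    """Build an OAI-PMH GET request URL after basic argument checks."""
    if verb not in ALLOWED:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    extra = set(args) - ALLOWED[verb]
    if extra:
        raise ValueError(f"{verb} does not accept: {sorted(extra)}")
    if verb == "GetRecord" and set(args) != ALLOWED["GetRecord"]:
        raise ValueError("GetRecord requires identifier and metadataPrefix")
    if verb in ("ListIdentifiers", "ListRecords"):
        # resumptionToken is exclusive; otherwise metadataPrefix is required.
        if "resumptionToken" in args and len(args) > 1:
            raise ValueError("resumptionToken must be the only argument")
        if "resumptionToken" not in args and "metadataPrefix" not in args:
            raise ValueError(f"{verb} requires metadataPrefix")
    query = urlencode({"verb": verb, **args})
    return f"{BASE_URL}?{query}"
```

Because `from` is a Python keyword, a date-bounded harvest passes it through a dict, e.g. `oai_request("ListRecords", metadataPrefix="oai_dc", **{"from": "2024-01-01"})`. The `oai_dc` (Dublin Core) prefix is the one every OAI-PMH repository must support.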

Collaboration and verification of data file format compliance, completeness, and clarity:

  • The GDEX DECS dataset specialist will scan all dataset files with GDEX's gatherxml tool to assess adherence to the agreed-upon format specification and file completeness. In addition to validating adherence to the data format and conventions, the dataset specialist will run random checks on the data files by plotting sample fields to make sure the data values are physically reasonable.
  • If issues are found with data file format adherence, file completeness, or the data values themselves, the dataset specialist will work iteratively with the data submitter to resolve them before the full dataset is archived.
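The random value checks above are done by plotting sample fields. As a complementary illustration only (not GDEX's procedure, and not part of gatherxml), the sketch below swaps in an automated range check: it randomly samples values from each field and flags any that fall outside plausibility bounds. The variable names and bounds are hypothetical; real bounds depend on the variable, units, and dataset.

```python
import random
from typing import Mapping, Optional, Sequence

# Hypothetical plausibility bounds for illustration only.
BOUNDS = {
    "air_temperature_K": (150.0, 350.0),
    "relative_humidity_pct": (0.0, 100.0),
}


def sample_out_of_range(
    fields: Mapping[str, Sequence[float]],
    bounds: Mapping[str, tuple] = BOUNDS,
    n_samples: int = 100,
    seed: Optional[int] = None,
) -> dict:
    """Randomly sample up to n_samples values per bounded field and return
    those outside the bounds. NaN fails both comparisons, so it is flagged too."""
    rng = random.Random(seed)
    flagged = {}
    for name, values in fields.items():
        if name not in bounds or len(values) == 0:
            continue  # no bounds defined for this field, or nothing to check
        lo, hi = bounds[name]
        k = min(n_samples, len(values))
        bad = [v for v in rng.sample(list(values), k) if not lo <= v <= hi]
        if bad:
            flagged[name] = bad
    return flagged
```

An empty result means only that no sampled value was implausible; a plot remains the better tool for spotting spatial artifacts that stay within bounds.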

Collaboration on data curation and related transformations or restructuring:

  • The GDEX DECS dataset specialist will work with the data submitter to agree upon the appropriate curation level to be used as described in Description of GDEX Dataset Collection Curation Levels. If any data restructuring or transformation is needed, it will be agreed upon during this step.
  • If applicable, data transformation workflow steps will be documented by the DECS dataset specialist and provided under the documentation tab of the dataset collection. Additional details can be found in GDEX Dataset Change Management Strategies.
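One way to keep transformation steps documented in order is an append-only log with one JSON record per step. The sketch below is a hypothetical illustration: the record fields and file layout are assumptions, not the format GDEX uses on its documentation tab. Timestamps are written in ISO 8601 form with an explicit UTC offset.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class TransformationStep:
    description: str    # what was done, in plain language
    tool: str           # software used, with version if known
    inputs: list[str]   # files read by this step
    outputs: list[str]  # files written by this step
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat(timespec="seconds")
    )


def append_step(log_path: str, step: TransformationStep) -> None:
    """Append one step as a JSON line; earlier entries are never rewritten."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(step)) + "\n")


def read_steps(log_path: str) -> list[dict]:
    """Return all logged steps in the order they were recorded."""
    with open(log_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Appending rather than rewriting means the log itself preserves the order of changes, which fits the provenance goal described above.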

Creation of dataset citation:

  • GDEX supports transparent data sharing and recognition of data contributions through dataset citation. A citation is created for each GDEX dataset and can be formatted in any of the following styles:
    • American Geophysical Union (AGU)
    • American Meteorological Society (AMS)
    • Copernicus Publications
    • DataCite
    • Federation of Earth Science Information Partners (ESIP)
    • Geoscience Data Journal
  • A GDEX DECS dataset specialist will work with the dataset submitter/dataset contact to confirm the information required to construct the citation (e.g., authors, title, and affiliated institutions).
    • Please note that once the data files are ingested and the metadata record/dataset landing page are complete, the GDEX DECS dataset specialist will confirm the dataset information again with the dataset submitter/dataset contact before registering the dataset to obtain its official digital object identifier (DOI). This DOI will be included in the dataset citation.
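The citation elements confirmed above (authors, year, title, publisher, DOI) can be assembled programmatically. The sketch below uses a generic layout for illustration; it is not an exact rendering of AGU, AMS, DataCite, or any other listed style, and all names and values in it are hypothetical. The DOI check is deliberately simplified: DOI names begin with "10.", a numeric registrant code, and "/", and resolve at https://doi.org/.

```python
import re
from dataclasses import dataclass

# Simplified DOI check: "10." + 4-9 digit registrant code + "/" + non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


@dataclass
class CitationInfo:
    authors: list[str]  # in citation order
    year: int
    title: str
    publisher: str
    doi: str


def join_authors(authors: list[str]) -> str:
    """Join names as 'A', 'A and B', or 'A, B, and C'."""
    if not authors:
        raise ValueError("at least one author is required")
    if len(authors) == 1:
        return authors[0]
    if len(authors) == 2:
        return f"{authors[0]} and {authors[1]}"
    return ", ".join(authors[:-1]) + ", and " + authors[-1]


def format_citation(c: CitationInfo) -> str:
    """Assemble a generic data citation ending in a resolvable DOI URL."""
    if not DOI_PATTERN.match(c.doi):
        raise ValueError(f"not a DOI of the form 10.NNNN/suffix: {c.doi!r}")
    return (f"{join_authors(c.authors)} ({c.year}). {c.title}. "
            f"{c.publisher}. https://doi.org/{c.doi}")
```

Validating the DOI before formatting catches common slips, such as a stray "doi:" prefix, before a citation is published.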