[Figure: GDEX flowchart]

Upon submission:
The GDEX DECS manager is notified as soon as a Dataset Submission Form has been submitted.  If the GDEX DECS manager has any questions regarding the dataset, the manager or a designated DECS representative will contact the dataset submitter by email, using the address associated with the submitter's registered GDEX account.  The dataset submitter should send all additional information and feedback to the GDEX by email.  The emails exchanged might be added to the dataset information in order to record the provenance of the dataset.

Upon acceptance:
After the dataset has been accepted for ingest with the GDEX, a GDEX DECS dataset specialist (DS) will contact the dataset submitter and dataset contact provided on the Dataset Submission Form to complete the following tasks:

  • Submission of the actual data files:
    • The method for transferring the data files, including any supplementary files that are relevant to the data files, could vary slightly depending on the file structure, format, and/or size of the data.  The responsible DECS dataset specialist will work with the data submitter to determine the best method for data transfer. The data transfer workflow typically proceeds according to the following steps:
      • The data submitter is asked to host the data files to be transferred on a remote server and provide the DECS dataset specialist with a manifest that includes the complete list of files to be transferred and the MD5 checksum for each file. As there can be certain nuances depending on the type of systems and transfer protocol used, the data submitter will work with the DECS dataset specialist to determine the best structure for the manifest file, and the appropriate method to compute the MD5 checksums for that specific use case.
      • The DECS dataset specialist will use FTP, HTTP(S), or GridFTP to transfer the data files from the submitter's server to the DECS server according to the manifest details.
      • Once a data file has been transferred to the DECS server, its MD5 checksum is computed and compared against the submitter-provided MD5 value to verify data integrity.
      • The manifest list is used to verify that the complete file set has been transferred from the submitter's server to the DECS server.
      • File upload to the GDEX is not an option for data submitters.
    • The dataset submitter should maintain a copy of the data files on their local server until the dataset creation process is complete, to avoid the chance of data loss.
    • The GDEX DECS team might ask for clarification of the file naming conventions/structures.
  • Collaboration, verification, and confirmation of metadata record, including the information that will be used for the dataset's landing page under the GDEX website:
    • The Dataset Submission Form will be used as the basis for creating the metadata record within GDEX.  However, the GDEX DECS dataset specialist (DS) might also contact the dataset submitter/dataset contact to confirm additional information.
    • If an existing metadata record or any additional descriptive documents have been created previously for the dataset, please inform the DS, as this information may assist with the dataset metadata creation process.
    • To populate the dataset metadata, the DS must enter a minimum set of required metadata fields as highlighted in the “Metadata fields” section of the metadata manager tool. The minimum set of required metadata fields has been selected to be able to reflect and map to metadata schemas that are commonly recognized and supported by the GDEX's scientific community. By doing so, the GDEX is well positioned to support long-term preservation.
      • Dataset collection level metadata is maintained in a native GDEX schema based on ISO representations (e.g. ISO 8601) and leverages Global Change Master Directory (GCMD) controlled vocabulary keywords.
        • Tools are provided to map the native GDEX metadata into community-standard schemas according to the relevant standard specifications, including: DataCite; GCMD Directory Interchange Format (DIF); Dublin Core; Federal Geographic Data Committee (FGDC); International Organization for Standardization (ISO) 19139 and ISO 19115-3; and JSON-LD Structured Data.
        • Please find an example of the available standard metadata schemas provided by the GDEX by reviewing the “Metadata Record” menu found at the bottom of an example dataset homepage.
        • Additionally, all of the listed metadata schemas, plus the THREDDS schema, can be accessed through the GDEX Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) web service.
    • Dataset collection level content metadata, derived from “file level” metadata harvested during data file archival (see “About Data File Content Metadata”), is populated into the dataset collection metadata once files have been archived into the dataset collection. Content metadata is automatically updated as additional files are archived in a dataset collection over time. For an example of a summary metadata product derived from “file level” metadata, please see the “detailed metadata” summary found on an example dataset homepage.
    • Any changes to GDEX dataset metadata are tracked and preserved for provenance purposes as described in GDEX Dataset Change Management Strategies.
  • Collaboration, verification, and confirmation of agreed upon data file format adherence, completeness and understandability:
    • The GDEX DECS dataset specialist (DS) will scan all dataset files with the GDEX's gatherxml tool to assess adherence to the agreed upon format specification and file completeness.  In addition to validating adherence to data format and convention, the DS will run random checks on the data files by plotting sample fields to make sure the data values are physically reasonable.
    • If issues are discovered with data file format adherence, completeness of data files, or the data values themselves, the DS will work iteratively with the data submitter to fix the issues before the full dataset is archived.
  • Collaboration on data curation level and any related data transformations or restructuring:
    • The GDEX DECS dataset specialist will work with the data submitter to agree upon the appropriate curation level to be used as described in Description of GDEX Dataset Collection Curation Levels. If any data restructuring or transformation is needed, it will be agreed upon during this step.
    • If applicable, data transformation workflow steps will be documented by the DECS dataset specialist, and provided under the documentation tab of the dataset collection. Additional details can be found in GDEX Dataset Change Management Strategies.
  • Creation of dataset citation:
    • GDEX supports transparent data sharing and recognition of data contribution through dataset citation.  As such, a citation is created for each GDEX dataset and can be reconfigured to meet the following formats:
      • American Geophysical Union (AGU)
      • American Meteorological Society (AMS)
      • DataCite
      • Copernicus Publications
      • Federation of Earth Science Information Partners (ESIP)
      • Geoscience Data Journal
    • A GDEX DECS dataset specialist will work with the dataset submitter/dataset contact to confirm information required to construct the appropriate citation (e.g. authors, title, and affiliated institutions).
      • Please note that once the data files are ingested and the metadata record/dataset landing page have been completed, the GDEX DECS dataset specialist will confirm the dataset information again with the dataset submitter/dataset contact before registering the dataset to acquire the official digital object identifier (DOI). This DOI will be included as part of the dataset citation.
  • Confirmation of rights/terms, conditions for use/collaboration, and ownership:
    • Before registering the dataset to acquire the official DOI, the GDEX DECS dataset specialist will also verify with the dataset submitter/dataset contact the rights/terms, conditions for use/collaboration, and ownership information that was submitted via the Dataset Submission Form.
      • Any modifications should be discussed and confirmed at this time.
  • Release of public announcement of dataset:
    • Once the dataset has been ingested completely with the GDEX and the dataset's metadata record and the landing page have been finalized, the dataset will be announced publicly via GDEX social media. 
    • The dataset submitter/dataset contact is encouraged to collaborate with the GDEX DECS dataset specialist to create the dataset's public announcement.
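
The manifest and MD5 checks described under "Submission of the actual data files" can be illustrated with a short Python sketch. This is not the GDEX DECS tooling: the manifest layout (one "<md5>  <relative path>" pair per line, as produced by the common md5sum utility) and the function names below are assumptions made for illustration only, and the actual manifest structure is agreed upon with the DECS dataset specialist.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text):
    """Parse manifest lines of the ASSUMED form '<md5>  <relative path>'.

    Blank lines and lines starting with '#' are skipped.
    Returns a dict mapping relative path -> lowercase MD5 hex string.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        checksum, name = line.split(None, 1)
        entries[name] = checksum.lower()
    return entries


def verify_transfer(manifest, data_dir):
    """Check transferred files in data_dir against the manifest.

    Returns (missing, mismatched): files listed in the manifest but not
    present, and files whose computed MD5 differs from the listed value.
    """
    missing, mismatched = [], []
    for name, expected in sorted(manifest.items()):
        path = Path(data_dir) / name
        if not path.is_file():
            missing.append(name)        # completeness check
        elif md5_of_file(path) != expected:
            mismatched.append(name)     # integrity check
    return missing, mismatched
```

A submitter could run the same functions locally before handing over the manifest: an empty `missing` list indicates the file set is complete, and an empty `mismatched` list indicates every listed checksum matches the file on disk.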

Upon rejection:
If the dataset has been rejected for ingest with GDEX, the GDEX DECS team will ensure that submitted dataset information remains available for access by the Dataset Submission Form provider.  Additionally, whenever possible, the GDEX DECS team will assist in providing recommendations regarding alternative archive/repositories for depositing the dataset. 

Additional details can be found in the Research Data Archive Dataset Ingest to Dissemination Workflow Overview.

Frequently Asked Questions:

  • If my dataset has been rejected for ingest with the GDEX and I do not deposit the dataset with another archive/repository, will the GDEX consider my dataset at a later time?
    • Depending on the original reason for not accepting the dataset, it is possible for the GDEX to re-evaluate the dataset and update its previous decision.  However, the GDEX DECS team cannot currently guarantee re-evaluation within a specific time frame.