Data Security/Resilience Overview

Data Preservation

  • The Computational and Information Systems Laboratory (CISL) maintains strategies to ensure access to and availability of data holdings over both the three- to five-year term and the long term. These strategies include the activities shown below:
    • 24x7 monitoring of hardware and software for warnings and errors.
    • Support to create and maintain disaster recovery data copies, as described in the Disaster Recovery section below.
    • Proactive replacement of marginal tapes.
    • Data migrations from older media to new media.
    • Periodic replacement of older hardware.
  • A unique copy of each archived data file is maintained separately on disk and on tape media storage. For datasets lacking a copy at a trusted alternative repository, such as the Copernicus Climate Data Store, a third copy is maintained in the disaster recovery tape pool and a fourth copy is distributed to geographically separated third-party providers.
  • GDEX Web Server Disk directories that host datasets’ metadata are backed up daily to the Quasar tape system and synchronized across geographically distributed Kubernetes storage clusters hosted at both the NCAR-Wyoming Supercomputing Center (NWSC) and the NCAR Mesa Lab (NCAR-ML) facility in Boulder, CO. This provides a backup copy that is geographically separated from the servers hosted at the NWSC.
  • GDEX Metadata Databases, which include repository inventory information and file-level metadata, are replicated in PostgreSQL database containers running in geographically separated Kubernetes clusters hosted at the NWSC and NCAR-ML. Additionally, the full databases are backed up from the NWSC servers to the Quasar tape system described above.
  • Both databases are backed up to S3-compatible object storage for off-site backup and recovery.
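As an illustrative sketch only (the actual CISL backup tooling is not described here), a daily dump-and-upload job along these lines could produce the off-site database copies; the database name, file paths, and bucket layout below are hypothetical:

```python
"""Illustrative sketch only: a daily PostgreSQL dump destined for
S3-compatible object storage. Names and paths are hypothetical."""
import datetime


def dump_command(dbname: str, outfile: str) -> list[str]:
    # pg_dump's custom format (--format=custom) is compressed and
    # restorable with pg_restore; one self-contained archive per database.
    return ["pg_dump", "--format=custom", f"--file={outfile}", dbname]


def object_key(dbname: str, when: datetime.date) -> str:
    # One object per database per day keeps a browsable history,
    # e.g. backups/gdex_metadata/2024-05-01.dump
    return f"backups/{dbname}/{when.isoformat()}.dump"
```

The resulting dump file would then be uploaded to the S3-compatible endpoint (for example with an S3 client's `put_object` call) and written to tape by the regular Quasar backup job.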

Data Security

  • Checksums are computed on all archived data files, i.e., the files stored on the GDEX Dataset Collection Disk and the Quasar tape backup system. The checksum information is recorded in the GDEX Metadata Database and is used to verify file integrity. Historical checksums are verified during the following operations:
    • When files are restored from tape to disk for any operation.
    • Weekly random checks of a sample of disk and tape files. If a checksum mismatch is discovered on any stored file, a report is created and a staff member manually verifies that a second copy of the file on tape or disk matches the recorded checksum; that copy can then be used to replace the corrupted one. If needed, data may also be downloaded from an alternate trusted repository to replace a corrupted data file.
  • Data are protected from being overwritten by unauthorized parties through the use of Portable Operating System Interface (POSIX) file user and group settings on GDEX Dataset Collection Disk and Globus user permission settings on Quasar.
  • The GDEX requires its users to register in order to access GDEX datasets. Users are required to authenticate on CISL High-Performance Computing (HPC) systems or through the GDEX web server to get read access permission to all archived data files stored on GDEX Dataset Collection Disk. Users other than GDEX staff are allowed read-only access in all cases. Authentication logs are monitored to detect repeated unsuccessful authentication attempts.
  • Operating systems and related software are updated to the most recent versions monthly, or more frequently as needed, to ensure that security patches are current.
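The fixity checks described above can be sketched as follows. This is an illustrative example, not the actual GDEX code, and the choice of MD5 as the checksum algorithm is an assumption:

```python
"""Illustrative sketch (not the actual GDEX tooling) of fixity checking:
compute a file's checksum and compare it to the value recorded at archive
time, as done on restores from tape and during weekly spot checks."""
import hashlib
from pathlib import Path


def file_checksum(path: Path, algo: str = "md5") -> str:
    # Read in 1 MiB chunks so large archive files are never
    # loaded into memory all at once.
    h = hashlib.new(algo)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, recorded: str) -> bool:
    # A mismatch would trigger a report and a manual comparison
    # against the second copy on tape or disk.
    return file_checksum(path) == recorded
```

In the workflow described above, a `False` result from `verify` is what prompts the report and the manual check of the second copy.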

Risk Management and Resiliency

  • CISL staff, including those who manage the GDEX, participate in institutionally led Enterprise Risk Management (ERM) and Risk Assessment exercises to identify and manage risks on a regular basis. Additional information on UCAR’s ERM office is provided at UCAR-Enterprise-Risk-Management. A description of UCAR’s ERM framework is provided at UCAR Enterprise Risk Management Framework.
  • Dataset metadata, documents, and software are backed up daily to the Quasar tape system and synchronized across geographically distributed Kubernetes storage clusters hosted at both the NWSC and NCAR-ML. This provides a backup copy that is geographically separated from the servers hosted at the NWSC. Using this strategy, four independent copies of this information always exist:
    • One copy on NWSC hosted GDEX Web Server Disk
    • One copy on NCAR-ML hosted GDEX Web Server Disk
    • One copy on NWSC hosted Quasar Tape
    • One copy sent to a third-party, geographically distributed provider
  • Database replication across physical hosts ensures that two physically separated, nearly real-time copies of the GDEX Metadata Databases exist at all times. Additionally, the full databases are backed up from the NWSC servers to the Quasar tape system described above. Using this strategy, four independent copies of this information always exist:
    • One copy on production NWSC hosted GDEX Metadata Database
    • One copy on the NCAR-ML hosted replica GDEX Metadata Database
    • One copy on NWSC hosted Quasar tape storage
    • One copy in S3-compatible object storage
  • Additional Backup Procedures:
    • Weekly volume snapshots are taken of the NCAR-ML database
    • All database binary logs are backed up once per week to Quasar tape storage.
  • An in-house machine learning tool is in place to detect and alert on anomalous file-removal patterns.
  • Proactive relationships with colleagues, vendors, and industry are maintained to track technology trends.
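The in-house detection tool itself is not described here. As a purely illustrative baseline, even a simple statistical rule can flag a burst of file removals; the real system presumably uses a richer learned model, and the threshold rule below is an assumption for illustration only:

```python
"""Illustrative baseline only (not the in-house ML tool): flag an hour
whose file-removal count far exceeds the historical norm."""
import statistics


def is_anomalous(history: list[int], current: int, k: float = 3.0) -> bool:
    # history: past hourly deletion counts for a user or host.
    # current: this hour's deletion count.
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    # Floor the spread at 1.0 to avoid alerting on zero-variance histories.
    return current > mean + k * max(stdev, 1.0)
```

An alert from a rule like this would be a signal for staff review rather than automated action, since legitimate bulk cleanups also produce deletion bursts.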

Maintaining Storage Media

  • CISL monitors tapes for warnings and errors; if a tape is found to be marginal, its data are copied to new media.
  • Quasar tape media verification is enabled and monitored by CISL.
  • CISL upgrades to the latest tape technologies every three to five years and migrates data from older tapes to the new tapes.
    • Furthermore, because the value of the GDEX is recognized as a priority for NCAR, GDEX datasets are stored on a dedicated set of tapes, making it straightforward to systematically migrate them to new technology.

Disaster Recovery

  • The reference copy for all GDEX’s archived data files is stored on disk and tape media storage at NWSC. For datasets that have no copy at a trusted alternative repository, a third copy is maintained in the disaster recovery tape pool and stored in a fireproof safe at NWSC, and a fourth copy is distributed to geographically separated third-party providers.
    • If either the disk or tape media storage experiences a major failure, the copy hosted on the unimpacted storage system can be used to repopulate the other system once it is restored. A detailed plan specifies the process for recovering data lost on disk systems from the copies hosted on the Quasar tape system.
    • If there is a major disaster at the NWSC facility that disables both the disk and tape media data copies, irreplaceable data will be recovered from disaster recovery tape media hosted in the fireproof safe at NWSC, and from trusted alternate repositories, such as the Copernicus Climate Data Store, which are tracked in GDEX dataset metadata. GDEX dataset metadata would be restored from the backup copies hosted on NCAR-ML storage by CISL staff.
  • All metadata, databases, documentation, and software associated with the GDEX’s datasets can be restored from either the geographically separated replicated kubernetes copy or the Quasar tape copy by CISL staff when needed.
  • Support contracts are maintained with IBM that enable CISL HPCD to send tapes with unreadable data blocks to the vendor for recovery.

  • Support contracts are maintained with DDN that provide a high level of support for the GDEX Data Collection Disk. The vendor can be called upon to repair disk-related hardware failures and restore file system contents when needed.