Toward a Scientific Discovery Engine for Weather and Climate Data: A Visual Analytics Workbench for Embedding-Based Exploration

d041308
DOI: 10.5065/12ZJ-ZZ25
 
Abstract:

Weather and climate science is producing increasingly large, high-dimensional datasets from numerical simulations, Earth system models, and AI-based weather and climate models. Embedding-based representations can make these data searchable through similarity search and analog retrieval, but nearest neighbors in latent space are not automatically scientifically meaningful. Researchers need tools to inspect how embeddings organize meteorological data, compare representation models, develop retrieval strategies, and verify results against physical evidence. We present an open-source visual analytics workbench for inspectable, configurable, and scalable embedding-based search over weather and climate data. The system links embedding experiments to source data, metadata, spatial context, model configurations, and retrieval parameters, allowing users to explore latent spaces, construct global or localized queries, and inspect retrieved analogs through meteorological views. We demonstrate the workbench through tropical-cyclone retrieval using ERA5-derived embeddings and IBTrACS metadata, and we evaluate its out-of-core retrieval backend to show that large embedding collections can be searched beyond in-memory limits on commodity workstation hardware.
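
The idea of out-of-core analog retrieval mentioned in the abstract can be illustrated with a minimal sketch. This is not the workbench's actual backend; it only shows one common way to search an embedding collection that does not fit in memory: memory-map the file and scan it in fixed-size chunks while keeping a running top-k. The function names, the raw float32 file layout, and the chunk size are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np


def topk_cosine_out_of_core(path, n_rows, dim, query, k=5, chunk_rows=1024):
    """Return (indices, scores) of the k rows most cosine-similar to `query`.

    Hypothetical helper: assumes `path` holds a raw, row-major float32
    matrix of shape (n_rows, dim). The matrix is read through np.memmap in
    chunks, so only about `chunk_rows` vectors are touched at a time.
    """
    emb = np.memmap(path, dtype=np.float32, mode="r", shape=(n_rows, dim))
    q = query.astype(np.float32)
    q = q / np.linalg.norm(q)

    best_idx = np.empty(0, dtype=np.int64)
    best_score = np.empty(0, dtype=np.float32)
    for start in range(0, n_rows, chunk_rows):
        chunk = np.asarray(emb[start:start + chunk_rows])
        norms = np.linalg.norm(chunk, axis=1)
        norms[norms == 0] = 1.0  # avoid division by zero for empty vectors
        scores = (chunk @ q) / norms
        idx = np.arange(start, start + len(chunk), dtype=np.int64)

        # Merge this chunk's candidates with the running best and keep k.
        all_idx = np.concatenate([best_idx, idx])
        all_score = np.concatenate([best_score, scores])
        keep = np.argsort(-all_score)[:k]
        best_idx, best_score = all_idx[keep], all_score[keep]
    return best_idx, best_score


def demo():
    """Write synthetic embeddings to disk and query with one stored row."""
    rng = np.random.default_rng(0)
    n_rows, dim = 5000, 16
    data = rng.standard_normal((n_rows, dim)).astype(np.float32)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "embeddings.f32")
        data.tofile(path)
        return topk_cosine_out_of_core(path, n_rows, dim, data[42], k=3)
```

Calling demo() queries the synthetic collection with row 42, so that row should come back as its own best match. A brute-force scan like this is exact but linear in collection size; a production system would more plausibly pair it with an approximate index, and any retrieved analog would still need to be checked against the physical fields, as the abstract stresses.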

Variables:
Tropical Cyclones
Data Types:
Model Simulation
Data Contributors:
UCAR/NCAR
National Center for Atmospheric Research, University Corporation for Atmospheric Research
Total Volume:
0.0 MB (Entire dataset)
Data Formats:
Binary (see dataset documentation)
Metadata Record:
Data License: