Original purpose and application
The Global Historical Climatology Network Daily (GHCNd) dataset is crucial for understanding and monitoring climate patterns and changes over time, and it supports three main uses. First, long-term climate monitoring: GHCNd provides a long-term record of daily climate observations from thousands of weather stations worldwide, which lets researchers analyze trends, variability, and changes in temperature, precipitation, and other climatic variables over decades or even centuries. Second, climate change research: the dataset is a foundational resource for studying historical climate patterns, assessing how they are changing over time, and evaluating the impacts of human activities on the climate system. Third, weather forecasting and modeling: forecasting models and climate simulations rely on historical observations such as those in GHCNd to validate their accuracy and to project future climate scenarios.
History, standards, and format
Climate data collection began in 1861, and the most recent version was published on March 17th, 2024. The dataset is still updated each weekend.
The formal establishment of the GHCNd as a unified dataset occurred in the late 20th century. The National Climatic Data Center (NCDC), now part of the National Centers for Environmental Information (NCEI) within the National Oceanic and Atmospheric Administration (NOAA), played a significant role in consolidating and standardizing the collection of daily climate observations from weather stations globally.
The structure of the GHCNd dataset is designed to organize and standardize a wide range of climate variables collected from weather stations, including temperature, precipitation, wind speed, humidity, and atmospheric pressure. This structured format enables researchers to efficiently analyze and compare climate data across different locations and periods.
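As a rough illustration of that structure, the sketch below parses one record in the fixed-width `.dly` layout described in NOAA's GHCNd readme: an 11-character station ID, year, month, a 4-character element code, and then 31 daily groups of a value plus three flags. The function name `parse_dly_line`, the `DailyValue` type, and the sample record (including its station ID) are made up for illustration, and the column positions should be checked against the current readme before relying on them.

```python
from typing import List, NamedTuple, Optional


class DailyValue(NamedTuple):
    day: int
    value: Optional[int]  # raw stored integer; None when the slot is missing
    mflag: str            # measurement flag
    qflag: str            # quality flag; blank means no quality check failed
    sflag: str            # source flag


def parse_dly_line(line: str) -> dict:
    """Split one fixed-width .dly record into a header and 31 daily slots."""
    # Pad short lines so every daily slot can be sliced safely.
    line = line.rstrip("\n").ljust(269)
    days: List[DailyValue] = []
    for i in range(31):
        start = 21 + i * 8  # each day occupies 8 characters after the header
        raw = line[start:start + 5].strip()
        # -9999 (or a blank slot) marks a missing value.
        value = None if raw in ("", "-9999") else int(raw)
        days.append(
            DailyValue(i + 1, value, line[start + 5], line[start + 6], line[start + 7])
        )
    return {
        "station_id": line[0:11],
        "year": int(line[11:15]),
        "month": int(line[15:17]),
        "element": line[17:21],
        "days": days,
    }


# Made-up record: one TMAX reading on day 1, the other 30 days missing.
sample_line = "XXW00000001" + "2023" + "07" + "TMAX" + "  289  S" + "-9999   " * 30
record = parse_dly_line(sample_line)
```

Temperature elements such as TMAX are stored in tenths of a degree Celsius, so the sample value 289 would correspond to 28.9 °C.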
The GHCNd dataset is based on existing standards and formats developed by international organizations such as the World Meteorological Organization (WMO) and the Global Climate Observing System (GCOS). These standards ensure consistency and interoperability among different climate datasets collected and maintained by various agencies worldwide.
Organizational context
Although the data is checked by an automated algorithm, NCEI acknowledges that the algorithm is not perfect and can produce false positives and false negatives (1-2% of flagged data is flagged incorrectly). NCEI also notes that it does not adjust the data for historical biases, such as changes in observing practices. The stations that supply GHCNd, which NCEI does not operate itself, were not designed to meet climate-monitoring standards but to meet the demands of agriculture, hydrology, weather forecasting, and aviation, among other needs. On its page, NCEI encourages users to assess the data for potential systematic bias before using it.
Workflow
GHCNd is an integrated dataset: it collects its data from different sources around the world, drawing on 100,000 land surface stations in 180 countries. GHCNd receives data daily and runs it through a series of quality checks. The first check weeds out stations whose identity is unknown or unreliable. A station must be “identified with a name, latitude, and longitude” in order to be considered for GHCNd. It must also have a certain amount of data for at least one GHCNd element, and its data can’t be more than 50% identical to that of another station. New stations are logged and kept separate from older ones. The data is then “mingled” through automated processing every day, using a hierarchical system that tries to retain as much data as possible without comparing or combining data from places with different characteristics. Stations that have received “the greatest amount of scrutiny” are relied on more heavily, while fully automated stations are weighted less. The data also passes through a system that checks for obvious errors such as impossible or missing dates or invalid characters in certain fields. When new data passes this screening, it is added to GHCNd.
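The screening rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not NCEI's actual implementation: the `Station` class, the function names, the default `MIN_OBSERVATIONS` threshold (the guide only says "a certain amount"), and the way "50% identical" is measured (share of a station's observations that match another station on the same date) are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Tuple

DateKey = Tuple[int, int, int]  # (year, month, day)
MIN_OBSERVATIONS = 100          # hypothetical threshold, not an NCEI figure


@dataclass
class Station:
    name: str
    latitude: Optional[float]
    longitude: Optional[float]
    values: Dict[DateKey, int] = field(default_factory=dict)


def has_identity(station: Station) -> bool:
    """A station needs a name and a plausible latitude and longitude."""
    if not station.name.strip():
        return False
    if station.latitude is None or station.longitude is None:
        return False
    return -90 <= station.latitude <= 90 and -180 <= station.longitude <= 180


def identical_fraction(a: Station, b: Station) -> float:
    """Share of a's observations that b reports with the same value on the same date."""
    if not a.values:
        return 0.0
    same = sum(1 for key, value in a.values.items() if b.values.get(key) == value)
    return same / len(a.values)


def is_valid_date(key: DateKey) -> bool:
    """Reject impossible dates such as February 30."""
    try:
        date(*key)
    except ValueError:
        return False
    return True


def is_eligible(candidate: Station, existing: List[Station],
                min_observations: int = MIN_OBSERVATIONS) -> bool:
    """Apply the identity, minimum-data, and duplicate checks in order."""
    if not has_identity(candidate):
        return False
    if len(candidate.values) < min_observations:
        return False
    return all(identical_fraction(candidate, other) <= 0.5 for other in existing)
```

Keeping each rule in its own small function makes it easy to see which check rejected a station and to swap in the real thresholds once they are known.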
Exploratory Visualization/s of the Data
For this graphic, I use the Cooling Degree Days data: the running total of monthly cooling degree days through the end of the most recent month, from 2018 to 2023. Each month is summed to produce a season-to-date total.
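The season-to-date total described above is a running sum that resets at the start of each season. The sketch below shows one way to compute it; the function name `season_to_date`, the January season start, and the sample numbers are assumptions for illustration, not values from the dataset.

```python
from typing import Dict, List, Optional, Tuple


def season_to_date(monthly: List[Tuple[int, int, float]],
                   season_start_month: int = 1) -> Dict[Tuple[int, int], float]:
    """Turn (year, month, cooling degree days) rows into running season totals."""
    totals: Dict[Tuple[int, int], float] = {}
    running = 0.0
    current_season: Optional[int] = None
    for year, month, cdd in sorted(monthly):
        # Months before the start month belong to the season that began last year.
        season = year if month >= season_start_month else year - 1
        if season != current_season:
            running = 0.0  # a new season starts from zero
            current_season = season
        running += cdd
        totals[(year, month)] = running
    return totals


# Made-up monthly values, deliberately out of order.
example = season_to_date([
    (2023, 7, 250.0),
    (2022, 11, 5.0),
    (2023, 6, 120.0),
    (2022, 12, 0.0),
    (2023, 1, 0.0),
])
```

Deriving the season from the year and month, rather than resetting only when the start month appears, keeps the totals correct even if the first month of a season is missing from the input.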
EMNT is the extreme minimum temperature for Bedford Hanscom Field. It shows the lowest temperature from 2018 to 2024 in degrees Celsius.
EMXT is the extreme maximum temperature for Bedford Hanscom Field. It shows the highest temperature of the year from 2018 to 2024 in degrees Fahrenheit.
Showing the same place through different variables helps illustrate how its weather and climate are changing over time.
Things to know about the data, including limitations
One limitation of this dataset is that it does not cover all 193 UN member states and the two UN observer states (the Holy See and the State of Palestine). GHCNd is still extensive, though, as it includes 180 of these states and territories. Additionally, in its maintainers' own words, GHCNd is "continually reprocessed," meaning that changes or additions can occur in the data archives. As a result, individual data values or their quality status may change before or after you retrieve the data. These changes are rare, though, and they reflect ongoing improvements to the dataset.
Other Stories, Reports and Outputs from this data
How have others made use of this data? Provide hyperlinks to applications, stories, reports or other ways we can see the data in action.
Bulleted list of links (from NOAA's registry of Open Data from GHCNd on AWS)
Supplementary Information
A list of links that include any metadata (data dictionary, field values, or codebook) that is available. Provide a link to the original data set. Include any other links to help give additional context to this dataset or the organization that collects it.
Bulleted list of links
Authors of this Data User Guide
National Centers for Environmental Information
Menne, M.J., I. Durre, R.S. Vose, B.E. Gleason, and T.G. Houston
Durre, I., M.J. Menne, B.E. Gleason, T.G. Houston, and R.S. Vose
Durre, I., M.J. Menne, and R.S. Vose
Source Log
Include names & contact information for three sources that you interviewed about this data set. Note that your sources don't all have to be from the organization that collects the data. For example, if you are writing a user guide for Boston's spreadsheet of food inspections, you could interview restaurant owners who have had their businesses inspected by the city. So think creatively about who might have insight into different parts of the data.
Name | Phone number | Email address |
Matthew J. Menne | | |
Scott Stephens | | |
Imke Durre | | |