

GLOBE Observer Data Quality: Updating Quality Flags for a Fresh Assessment


About GLOBE and GLOBE Observer

Since its founding in 1995, the Global Learning and Observations to Benefit the Environment (GLOBE) Program has resulted in the collection of more than 230 million observations of the atmosphere, biosphere, hydrosphere, and pedosphere. This astonishing rate of data collection increased with the development of GLOBE Observer, the official app of the GLOBE Program, which allows anyone in GLOBE countries to submit observations of clouds, mosquito habitats, land cover, trees, and eclipse conditions.

GLOBE data is publicly available and has been leveraged in numerous scientific publications (for example, Amos et al., 2020).

The GLOBE database fits the parameters of “big data,” a large, complex set of data. As with most types of big data, it is prudent to conduct quality checks on the data before using it, because high-quality input is conducive to quality output in scientific analyses. Building on the work of Amos et al. (2020), NASA intern Jessica Mo helped generate and implement quality checks for data submitted through GLOBE Observer’s citizen science protocols. Read on to learn what Jessica discovered through her analysis.

GLOBE Data: What It Contains - Jessica Mo

In general, GLOBE Observer submissions include latitude, longitude, date and time. Observations also include observation-dependent geospatial data, such as information on snow, ice, or standing water, whether there are leaves on trees, and precipitation. Each protocol includes data unique to the protocol. For example, Clouds observations include information on cloud cover, sky color, and the cloud types present. Land Cover observations may include land cover classification. Tree Height observations include information on tree height as well as snow, ice, or standing water, whether there are leaves on trees, and precipitation. Finally, Mosquito Habitat Mapper observations include information on water sources, egg and larvae counts, and whether pupae and adults are present. All four observation types also ask observers to take photos of their surroundings.



Potential Data Concerns

As part of my internship with GLOBE and the GLOBE Observer team, I designed quality flags to help identify false or inaccurate observations. We had the following data concerns in mind:

  • Observations submitted as a test
  • Typos or misclicks in manually entered data
  • Measurements that are likely to be inaccurate
  • Fields omitted or entered with “dummy” data
  • Repeated data fields or photos


Flags

I implemented several new quality flags for GLOBE data. Flagging the data does not remove observations from the GLOBE dataset, or condemn the data as invalid or falsely reported; rather, flagging the data is a marker that the observation may warrant a closer look. Flagged observations can mark an interesting geospatial phenomenon, a data entry/reporting issue, or other concerns.

Trends in flagged observations also provide insights into the implications of the way that the GLOBE Observer app and online reporting form are designed. These insights may inform future design choices that ensure the robustness of GLOBE data reporting.

The new flags include:

  • Latitude and/or longitude of the observation is an integer (e.g., 0.0000000)
  • Latitude and longitude of the observation exactly match each other
  • The observation may be over a lake
  • 8 or more of the 10 cloud types are marked as present
  • Duplicate photos within the same observation
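As an illustration only, three of the checks listed above could be sketched in Python as follows. The function names, the two-letter codes borrowed from Table 1, and the representation of cloud types as a dictionary of booleans are assumptions made for this example; they are not taken from the actual GLOBE processing code.

```python
from typing import Dict, List


def location_flags(lat: float, lon: float) -> List[str]:
    """Return location-related flag codes for one observation (hypothetical helper)."""
    flags = []
    # LZ: latitude and/or longitude has no fractional part (e.g. 0.0000000).
    if float(lat).is_integer() or float(lon).is_integer():
        flags.append("LZ")
    # LM: latitude and longitude are exactly the same value.
    if lat == lon:
        flags.append("LM")
    return flags


def cloud_type_flag(cloud_types: Dict[str, bool], threshold: int = 8) -> List[str]:
    """CT: raised when `threshold` or more cloud types are marked as present."""
    present = sum(1 for is_present in cloud_types.values() if is_present)
    return ["CT"] if present >= threshold else []
```

A flagged observation is only marked for a closer look: a whole-number coordinate can be a placeholder value, but it can also be a genuine location.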

As part of the process of creating these quality flags, I also wrote documentation for future processors of GLOBE data.

Below are summary figures for quality flags on GLOBE data for May 2023.

Figure 1. Geographic distribution of GLOBE Observer observations in May 2023.

My analysis of GLOBE data from May 2023 uncovered several interesting patterns. GLOBE Observer submissions were clustered in the United States and Europe. In absolute numbers, the most common potential error across all GLOBE Observer data was a location that may be over the ocean (LW in Figure 2). However, the percentage of observations flagged, that is, the number of flagged observations divided by the total number of observations that could raise that flag, told another story. In relative terms, the most common potential error was an invalid count of mosquito larvae. Most of the observations flagged for an invalid larvae count, however, were ones in which the volunteer chose not to count larvae, which is an optional step. The flag should therefore be modified to ignore a blank in that field, since a blank is a valid response.
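A minimal sketch of the two ideas in this paragraph, under stated assumptions: the larvae count is assumed to arrive as raw text (or as nothing at all when the step is skipped), and the helper names are hypothetical. The first function shows one way the larvae flag could treat a blank as valid; the second computes the relative rate described above.

```python
from typing import List, Optional


def larvae_count_flags(raw: Optional[str]) -> List[str]:
    """Flag a larvae count, treating a blank field as a valid (skipped) step."""
    if raw is None or raw.strip() == "":
        return []  # counting larvae is optional, so a blank raises no flag
    try:
        count = int(raw)
    except ValueError:
        return ["MI"]  # present but not a whole number
    if not 0 <= count <= 199:
        return ["MR"]  # outside the expected range from Table 1
    return []


def percent_flagged(n_flagged: int, n_eligible: int) -> float:
    """Flagged observations as a percentage of those able to raise the flag."""
    if n_eligible == 0:
        return 0.0
    return 100.0 * n_flagged / n_eligible
```

The denominator is the number of observations that could raise the flag, not the total number of observations, which is why a flag on a less common protocol can rank low in absolute numbers and high in percentage terms.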



Table 1. Definition of Quality Flags

Flag Name   Flag Definition
CI          Cloud cover is invalid
CM          Cloud cover is coded as missing
CT          8/10 cloud types or more are reported as present
CX          Cloud cover attribute is missing
DF          Date/time of measurement is in the future
DI          Date/time of measurement is invalid
DO          Date/time of measurement is before 1995
DX          Date/time of measurement attribute is missing
DZ          Date/time of measurement is at midnight UTC
EI          Elevation is not valid (not a number)
EM          Elevation is coded as missing
ER          Elevation is outside of expected range (-300 m to 6000 m)
EX          Elevation attribute is missing
LI          Location is not a valid latitude/longitude pair
LL          Location may be over a lake
LM          Latitude and longitude exactly match
LW          Location may be over ocean
LZ          Latitude and/or longitude are integers
MI          Mosquito larvae count is invalid
MR          Mosquito larvae count outside of expected range (0 to 199)
NR          Contrail count outside of expected range (0 to 19)
OC          Obscuration reported but cloud types also reported
OP          Spray reported possibly over land
OR          More than two obscurations reported
OX          Obscured cover reported but obscuration type missing
PD          Duplicate photo within the same observation
TI          Tree height is invalid (not a number)
TM          Tree height is coded as missing
TR          Tree height outside of expected range (0 m to 199 m)
TX          Tree height attribute is missing
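Several definitions in Table 1 are simple range or date comparisons. The sketch below shows how the ER, DF, DO, and DZ checks might look in Python. It assumes elevations in meters and timezone-aware timestamps; the function names and the optional `now` parameter are hypothetical and are included only to make the example testable.

```python
from datetime import datetime, time, timezone
from typing import List, Optional


def elevation_flags(elevation_m: float) -> List[str]:
    """ER: elevation outside the expected range of -300 m to 6000 m."""
    return [] if -300 <= elevation_m <= 6000 else ["ER"]


def datetime_flags(measured_at: datetime, now: Optional[datetime] = None) -> List[str]:
    """Date/time flags for a timezone-aware measurement timestamp."""
    now = now or datetime.now(timezone.utc)
    utc = measured_at.astimezone(timezone.utc)  # compare everything in UTC
    flags = []
    if utc > now:
        flags.append("DF")  # measurement is in the future
    if utc.year < 1995:
        flags.append("DO")  # measurement predates the founding of GLOBE
    if utc.time() == time(0, 0):
        flags.append("DZ")  # exactly midnight UTC, possibly a default value
    return flags
```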


Figure 2. Absolute number of GLOBE Observer observations flagged in May 2023.


Figure 3. Percentage of GLOBE Observer data flagged for each quality category.


Future Directions

In the future, I would recommend testing different reference datasets for flagging observations over water bodies. To flag whether an observation was potentially taken over a body of saltwater or freshwater, we detect whether the latitude and longitude of the observation fall within ocean or lake geocoordinates. However, ocean boundaries may shift as sea levels rise due to climate change, and the surface areas of lakes may likewise change over time. As a result, it may be helpful for researchers to be able to use different continent or lake datasets based on when their observations of interest were taken.
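One common way to test whether a coordinate falls inside a boundary is the ray-casting point-in-polygon test, sketched below. This is an assumption about how such a check could work, not a description of the method used for the GLOBE flags. The polygon format (a list of longitude/latitude pairs) and the function names are hypothetical, and the sketch treats coordinates as planar, so it ignores polygons that cross the antimeridian.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(lon: float, lat: float, polygon: Sequence[Point]) -> bool:
    """Ray casting: count how many edges a ray extending east from the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside


def lake_flag(lon: float, lat: float, lakes: Sequence[Sequence[Point]]) -> List[str]:
    """LL: raised when the point falls inside any lake polygon in the dataset."""
    return ["LL"] if any(point_in_polygon(lon, lat, p) for p in lakes) else []
```

Because the boundary dataset is passed in as an argument, a researcher could supply whichever set of lake outlines best matches the period in which the observations were taken.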

I would also like to reduce the processing time for detecting duplicate photos within and across observations. This will make data processing more efficient for future end users.
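One possible approach to faster duplicate detection, offered here as a sketch rather than as the method currently in use, is to hash each photo once and compare the hashes instead of comparing images pairwise. The function names and the input shapes (raw photo bytes keyed by a hypothetical observation ID) are assumptions, and this technique only catches byte-identical files, not resized or re-encoded copies.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def duplicate_photo_flag(photos: List[bytes]) -> List[str]:
    """PD: raised when two photos in one observation have identical bytes."""
    seen = set()
    for data in photos:
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            return ["PD"]
        seen.add(digest)
    return []


def photos_shared_across(observations: Dict[str, List[bytes]]) -> Dict[str, List[str]]:
    """Map each photo hash to the observation IDs that share it (two or more)."""
    owners = defaultdict(set)
    for obs_id, photos in observations.items():
        for data in photos:
            owners[hashlib.sha256(data).hexdigest()].add(obs_id)
    return {digest: sorted(ids) for digest, ids in owners.items() if len(ids) > 1}
```

Each photo is read and hashed once, so the work grows linearly with the number of photos rather than with the number of photo pairs.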



Conclusion

GLOBE is arguably the most comprehensive citizen science database for geospatial observations. It is thanks to the collective efforts of people across the world that a program like GLOBE is possible. As with most big data, it is prudent to conduct data quality checks when leveraging GLOBE data for scientific purposes. These quality checks may reveal erroneously reported data as well as interesting geospatial phenomena that can be corroborated with satellite data.

The majority of GLOBE observations are not flagged. For those that are, it is important to consider how each quality check applies to the research question at hand. Ultimately, the usage of quality flags depends on the person using the data. Some researchers may choose to disregard a flag if it is irrelevant to their scientific inquiries. For example, a researcher may ignore the “observation may be over a lake” flag if they deem observations taken over lakes acceptable. On the other hand, another researcher may be particularly interested in observations with the “observation may be over a lake” flag if they are investigating the impact of freshwater bodies on cloud geophysics or changes in lake extent and land cover.

Flags provide helpful information that may not be immediately obvious. Given the sheer size of the GLOBE database, having a computational process to mark observations for specific traits is much faster than manually checking observations.

Overall, implementing these flags allows us to more easily detect potentially interesting and/or dubious data. The sheer volume of said data speaks to the power of citizen science. Conducting quality assurance and quality control measures on GLOBE data allows us to put the hard work of GLOBE Observers to valuable use.



About the author

Jessica Mo was a summer 2023 NASA intern on the GLOBE team. Jessica worked to increase the quality of GLOBE data used for NASA research efforts. She was mentored by Kristen Weaver, Deputy Coordinator for GLOBE Observer; Holli Kohl, GLOBE Observer Project Lead; and Agnes Conaty, Ph.D., Senior Research Scientist and Science Lead. Jessica would like to thank her mentors for the opportunity to contribute to global citizen science efforts and for their support.

