Storing and Sharing Data

Environmental Data Initiative Wants to Make Storing and Sharing Data Second Nature
by Adam Hinterthuer

Data scientists and computer programmers from across the country met to discuss building better software at an EDI “Hackathon.” Photo: Colin Smith

Over the course of their career, a productive ecological scientist will publish dozens of scientific papers and pile up mountains of data to get those results. And all too often, says CFL senior scientist Corinna Gries, all of that data is stuck on that single scientist’s computer – doomed to disappear when they retire. Every day, countless datasets that could help us learn more about the world around us are simply lost.

To try to fix that problem, Gries is looking to the stars. Or, rather, to the people who study the stars.

“What we are focusing on is really the culture change of convincing people to publish their data and share their data, because in ecology we are still so far behind. There’s still a lot of data that people just aren’t willing to share,” says Gries. The hope, she says, is to be more like astronomers. “They are sort of the glowing goal of where everybody would like to be. They have data standards and they share. Nobody is even asked if they want to share [data] or not, it just is how it works.”

The importance of sharing and archiving data is two-fold. First, rather than using only their own limited datasets from a few study sites, scientists can ask and answer questions about the natural world on much longer timescales and at regional or even global scales. Second, as new technologies or new issues emerge, we will have historical data archived that can help answer questions we’re not even thinking about yet.

And that’s where the Environmental Data Initiative (EDI) comes in. Founded in 2016, the EDI is a collaboration between University of Wisconsin-Madison and University of New Mexico Long Term Ecological Research (LTER) projects. Funded by the National Science Foundation (NSF), the LTER program is made up of more than two dozen research sites conducting long-term monitoring and research on different ecosystems across North America. Since the 1980s, the NSF has mandated that LTER sites manage and archive their data – resulting in a huge cache of information over the decades.

Researchers at the Wisconsin and New Mexico sites created EDI as a way to take data sharing and archiving mainstream in the ecological sciences. While it was initially a way to bring diverse LTER datasets together, Gries, who is a principal investigator (PI) for the project, says that they now work with researchers whether they are funded by the NSF or not.

“It is very, very important to support the ecological researcher in learning how to publish their data and help that process along,” she says. “So we have professional data scientists to support [them].”

Colin Smith is one of these scientists. Part of the EDI’s data curation team, Smith helps scientists submit their datasets to the EDI archives, writes data management software, and leads trainings and workshops. Smith says he’s seeing a lot of enthusiasm for the EDI.

“People are quickly becoming open to this idea,” he says. “It’s a new generation of scientists coming up that are used to doing synthesis science with other people’s data. They are also growing up in a culture where data is a valued research product and they’re going to get credit for it.”

People are also seeing the EDI as a way to have not only their published papers but also their collected datasets outlive their careers, he says.

The EDI, Smith says, turns ecological datasets into a “living body of knowledge” that can be accessed and reexamined by future generations of scientists as they continue the important work of helping us “discover how the world works and how to adapt and live within it.”