Rising to the Challenge of Improving Access to Our Data
By: Bruce Stein
We are particularly pleased to have the Colorado Natural Heritage Program facilitating this workshop because this NSF project is very much the outgrowth of a process initiated by the CNHP. In 2000 CNHP was the recipient of an NSF planning grant that funded a workshop to begin working towards the establishment of a decentralized and distributed internet data delivery system for the network of natural heritage programs.
That workshop was held at a time when NatureServe was just becoming independent of The Nature Conservancy, and its theme of working together as a network to provide improved access to data was very consistent with the strategic directions for the new organization. Indeed, NatureServe is committed at the highest levels to improving access to its own data, and to finding ways of improving access to data held by its natural heritage member programs.
The way in which people relate to information has fundamentally changed with the emergence of the internet. Increasingly there is an expectation that information is easily available online. Indeed, studies show that immediate access trumps other considerations (e.g., quality) in the way that many people access information.
Our challenge then is to create a mechanism by which we can both empower our local data custodians, and deliver better data products to end users. Doing so will entail addressing both technological issues and sociological issues. In many ways, the technology issues are the more straightforward. However, taking advantage of web services and a distributed database architecture that ensures that control over data remains with data custodians should also help resolve the sociological issues.
Evolving International Context
The international biodiversity community has moved strongly over the past few years towards an emphasis on open access to data. Indeed, the emergence of an entire field of "biodiversity informatics" is based on an increasing recognition of the value of biodiversity data for many purposes. A variety of national and international initiatives focusing on biodiversity informatics are developing, and NatureServe is actively engaged with most of these. Among the most important are:
Global Biodiversity Information Facility (GBIF): Growing out of an OECD "megascience" forum, GBIF is a worldwide effort to improve access to primary specimen and observational data. The GBIF memorandum of understanding calls for free and open access to biodiversity data, although it also emphasizes the rights of data providers. NatureServe is a member of GBIF, and will be posting a view of its data over the GBIF portal in the near future.
Taxonomic Databases Working Group (TDWG): The focus of TDWG has shifted over the past several years towards creating standards needed for interoperability among biodiversity data providers. In particular, this standards body has been developing XML schemas designed to facilitate data sharing, and is serving as the de facto standards-setting body for GBIF. NatureServe has recently re-engaged with TDWG as an institutional member, and is now co-convening an observational data working group.
Conservation Commons: A new effort being led by IUCN and other members of the conservation community seeks to improve access to conservation-related data resources. Again, this effort encourages open access to data, but emphasizes respect for the rights of data owners and providers. NatureServe is a signatory to this new mechanism.
Current Situation within NatureServe
As part of NatureServe's strategic and business planning it has become very clear that there is a need to distinguish between data and value-added services. Increasingly the organization is moving towards creating open access to data resources, while shifting cost recovery and funding towards provision of value-added services. Already, almost all of the data resources that NatureServe itself owns and manages are freely and openly available through its web site and linked web products, NatureServe Explorer (www.natureserve.org/explorer) and InfoNatura (www.natureserve.org/infonatura).
NatureServe and its member programs have also made major progress in developing an aggregated element occurrence (EO) data set capable of meeting multi-state and multi-national data needs. The basis for sharing this precise locality data has been the Data Sharing Agreements (DSAs), which have recently been renegotiated with most programs, and which lay out the terms under which EO data can be pooled and provided to third parties. Although the first generation of DSAs took a least-common-denominator approach (that is, the only level of data openly available was the level acceptable to all programs), this second generation of DSAs encourages programs to provide open access at the finest level that they are comfortable with.
Most new DSAs now provide permission to allow open access at a spatial resolution of the USGS topo quad. This is roughly comparable to the 10km grid that serves as the default open access resolution of the British "National Biodiversity Network."
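To make the idea of a coarser open-access resolution concrete, the following is a minimal sketch of how a precise locality might be generalized to a map-quad-sized cell before release. It assumes the 7.5-minute quadrangle series (the text above does not say which quad series the DSAs use), and the function name and example coordinates are purely illustrative.

```python
import math

# Assumption: 7.5-minute quadrangles, i.e. cells of 0.125 degrees on a side.
QUAD_SIZE_DEG = 7.5 / 60.0


def quad_cell(lat: float, lon: float) -> tuple[float, float]:
    """Return the southwest corner of the quad-sized cell containing a point.

    Only this corner (not the precise point) would be exposed at the
    open-access level, so the true location is hidden within the cell.
    """
    sw_lat = math.floor(lat / QUAD_SIZE_DEG) * QUAD_SIZE_DEG
    sw_lon = math.floor(lon / QUAD_SIZE_DEG) * QUAD_SIZE_DEG
    return (sw_lat, sw_lon)


# Two nearby points fall in the same cell and become indistinguishable.
print(quad_cell(39.7392, -104.9903))
print(quad_cell(39.7000, -104.9000))
```

At mid-latitudes a cell of this size spans very roughly 11 to 14 km on a side, which is why it is broadly comparable to a 10 km grid.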
National Science Foundation Grant
The current project was funded on its third submission to NSF, and reflects the importance that the research community places on gaining better access to the locality data developed and held by the NatureServe network. I should also emphasize the importance of the many letters of support we received from heritage programs, which indicated the commitment of network members to participating in this internet data delivery system.
A bottom line expectation for the NSF grant is that using this type of web services approach will provide greatly improved access to these data resources to the research community, and enable our data to contribute to the types of scientific discovery and understanding that NSF promotes. However, it is also our expectation that this improved access will significantly contribute to the application of these data resources to conservation and sustainable development goals.
Lori will be saying more about the details of the project, but at the highest level, it is our intent to create a technology framework that will give the user an easy-to-use and seamless interface for querying the detailed geospatial data held collectively by NatureServe and its natural heritage program members. Although our ultimate goal is to implement a system that draws this data together on the fly from distributed locations, as a first phase of the project we will be creating an enterprise geodatabase against which queries will draw. An authentication/certification layer will allow differential levels of access to that data. The levels of access will be defined by the data provider, thus enabling data providers to retain control over access rights even when the data is physically resident on the enterprise geodatabase.
NatureServe is involved in a number of other projects relating in some way to technology development or improved data access. These include:
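The provider-defined access levels described above can be sketched as follows. This is only an illustration under stated assumptions, not the project's design: the tier names, the `PROVIDER_POLICY` table, the provider identifiers, and the record fields are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class AccessLevel(IntEnum):
    """Hypothetical access tiers; higher values see more detail."""
    PUBLIC = 0
    REGISTERED = 1
    TRUSTED = 2


@dataclass(frozen=True)
class Occurrence:
    provider: str   # data custodian, e.g. a heritage program
    species: str
    quad: str       # generalized location (map quad name)
    lat: float      # precise locality
    lon: float


# Each provider sets the minimum tier needed to see precise coordinates.
# These entries are invented for illustration.
PROVIDER_POLICY = {
    "program_a": AccessLevel.REGISTERED,
    "program_b": AccessLevel.TRUSTED,
}


def visible_fields(occ: Occurrence, user_level: AccessLevel) -> dict:
    """Return the fields of one record that a user at user_level may see."""
    # Providers with no stated policy default to the most restrictive tier.
    required = PROVIDER_POLICY.get(occ.provider, AccessLevel.TRUSTED)
    record = {"provider": occ.provider, "species": occ.species, "quad": occ.quad}
    if user_level >= required:
        record["lat"] = occ.lat
        record["lon"] = occ.lon
    return record
```

The point of the sketch is that the policy table belongs to the providers: the central store holds the full record, but what any query returns is decided by the custodian's own setting.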
Moore Foundation Technology Grant: NatureServe recently received a $1.5 million grant from the Gordon and Betty Moore Foundation entitled "Aligning Biodiversity Software with User Needs." This grant will allow us to better understand the software products that currently exist and the needs and market for such products as a way of identifying where new opportunities for software development and support may exist.
EPA Environmental Information Exchange: A recent grant to the State of Delaware will enable NatureServe, together with Delaware, Illinois, and Washington, to develop some web services approaches to connecting natural heritage data into the EPA's Environmental Information Exchange. To date that exchange network has focused more on data of a regulatory nature. As part of the NSF project, we hope to take advantage of some of the authentication and access security systems and approaches that have been developed by EPA.
High Level Project Assumptions
- By providing broader access to our data resources we will be able to dramatically increase the conservation impact of our work.
- Providing broader access to our data will enable NatureServe and its member programs to more fully integrate with and take advantage of other national and international biodiversity data efforts.
- Significant new revenue streams should be available to NatureServe and member programs with greater access to and appreciation for our data.
- The web-based data delivery system should reduce the time and cost to member programs of responding to data requests.
- By improving synchronization between NatureServe's central database and member program databases we will be improving the currency and quality of our data.
- Local data custodians will continue to maintain control over terms of access to their data sets.
- Software and approaches developed through this project for serving data centrally will also be suitable for local web service applications.
- Implementing an online delivery system will likely require an evolution of our current data sharing arrangements.
This material is based upon work supported by the National Science Foundation under Grant No. 0345400. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.