Internet Data Delivery Workshop

Leadership Conference 2004
"Charting a Path Forward"
November 17, 2004

National Science Foundation

Summary


Key Take-home Messages

Key take-home messages from the NatureServe IDD Workshop include:

  1. The biggest issues surrounding development of a distributed data delivery system are social in nature.
  2. The distinction between web services and the traditional web environment needs to be clarified so that member programs and other partners understand that most concerns can be addressed technologically.  The development and documentation of mis-use cases that address data providers' fears would further that understanding.
  3. Discussions revealed general validation for distributing quad-level resolution data at the default public access level, although more discussion is needed about which specific attributes would be delivered at each scale of spatial data.
  4. Discussions indicated a general level of comfort relative to providing data to academic researchers; however, access would ultimately be dependent on the user and reason for the data request.
  5. Discussions confirmed that revenue replacement is an important issue for member programs.  Additionally, Federal government clients need predictability in budgets, so subscription rates that provide a rational, stable cost for data license and access for a given year need to be established, rather than charging per use.
  6. Member programs and other partners are definitely concerned about security and FOIA requests.  As a result, some users don't want physical possession of the data, but are interested in services surrounding the data in addition to the data itself.
  7. In general, there was validation that, due to the wide range in policies of the data providers, the user registration process will likely be more manual than automated.  NatureServe should try to follow the NBN example (see Appendix C): leave that responsibility with the data providers, while NatureServe develops the security structure and provides administration tools that allow the data providers to establish and maintain users' access to their data.

Summary of Focus Group Discussion Topics

Requirements/comments on Criteria for User Authorization Requirements:

  • Information required to authenticate a data requestor (person or software application) would, at a minimum, include 7 key questions:
    • Identity of requestor?
    • Organization affiliation?
    • Purpose of data request?
    • Who will use the data?
    • Geographic extent of request?
    • How long is access to data needed?
    • What data format is desired?
  • Define NatureServe's policy on privacy of user data that is collected and communicate that to the users
  • Keep privacy requirements of the user in mind and define under what circumstances this information needs to be revealed
  • Balance the need to keep registration process simple, so as not to discourage or deter users, with the need to collect appropriate information to determine authentication and to provide feedback statistics to NatureServe and member programs on how data are being used
  • Define the conditions under which central authentication will not work, i.e., when a request must be passed back to the relevant member programs
  • Set up a standardized process by which the requestor is presented with information on who to contact if the result of the authentication process is denial of access
  • Define what constitutes a legitimate request or use of the data - local FOIA or open record act laws are very different among member programs
  • Develop "mis-use" cases to illustrate that issues of concern on the part of member programs and partners surrounding user authorization, privacy, and security can be addressed technologically
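
The seven key questions listed above could be captured as a simple registration record. The following is only a minimal sketch under assumptions: the class name `AccessRequest`, its field names, and the `missing_fields` helper are hypothetical illustrations, not part of any NatureServe system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class AccessRequest:
    """Hypothetical record holding answers to the seven key questions."""
    requestor: str            # identity of requestor (person or software application)
    organization: str         # organization affiliation
    purpose: str              # purpose of data request
    data_users: List[str]     # who will use the data
    geographic_extent: str    # geographic extent of request
    access_until: date        # how long access to the data is needed
    data_format: str          # desired data format

    def missing_fields(self) -> List[str]:
        """Return the names of questions that were left unanswered."""
        return [name for name, value in vars(self).items() if not value]
```

A record with unanswered questions could then be routed to a manual review step rather than rejected outright, which keeps the registration process simple for the user.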

Comments:

  • It will be important for member programs to maintain good relationships with local agencies so that these agencies may be part of the process for developing IDD

Requirements/comments on the proposed Data Access Matrix

Requirements:

  • Add additional user type, Data Contributor, to the data access matrix
  • Eliminate the "None" security category because it is virtually the same as "Register Only"
  • Incorporate functionality into the matrix in addition to user type, security level, and scale/data resolution
  • Consider the implications of splitting out user types into subgroups due to the potential effect on the reputation of the NatureServe network for providing data in an unbiased way
    • Do NatureServe and/or member programs have the legal right to withhold data from certain users based on risk? The consensus was that NatureServe can, but some member programs cannot.  This may translate into an advantage for data to be served through NatureServe rather than via member programs.
  • Communicate initial findings from this workgroup to member programs in order to redesign an appropriate access matrix to include functionality or type of access at each level, e.g., read only, printable, downloadable, editable (see alternative access_functionality matrix)
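
An access matrix that combines user type, security level, data resolution, and functionality could be sketched as a lookup table. This is an illustration only: the rows below, the "manual approval" security level, and the assumption that the functions form a strict hierarchy (read only < printable < downloadable < editable) are hypothetical and do not reproduce the matrix discussed at the workshop.

```python
from enum import IntEnum


class Function(IntEnum):
    """Type of access, ordered from least to most permissive (an assumption)."""
    READ_ONLY = 1
    PRINTABLE = 2
    DOWNLOADABLE = 3
    EDITABLE = 4


# (user type, security level) -> (finest resolution, highest function).
# All rows are hypothetical examples.
ACCESS_MATRIX = {
    ("general public", "register only"): ("quad", Function.READ_ONLY),
    ("academic researcher", "manual approval"): ("precise", Function.DOWNLOADABLE),
    ("data contributor", "manual approval"): ("precise", Function.EDITABLE),
}


def allowed(user_type, security_level, resolution, function):
    """Return True if the matrix grants this function at this resolution."""
    entry = ACCESS_MATRIX.get((user_type, security_level))
    if entry is None:
        return False  # unknown combinations get no access
    finest, highest = entry
    # Quad-level data is always at least as coarse as what any row permits.
    resolution_ok = resolution == "quad" or finest == "precise"
    return resolution_ok and function <= highest
```

Keeping the matrix as data rather than logic would let each data provider maintain its own rows through administration tools.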

Requirements/comments on General IDD Site Functionality

Requirements:

  • Perform a network-wide analysis of all web-based systems and determine which functionalities, authentication procedures, etc. are preferable and working well
  • Define the relationship between the proposed IDD system and local web-based systems
  • Site should be focused on spatial capabilities
  • Define a minimum set of attributes to be provided in data sets
  • Need to ensure that the system is set up so that a user is provided access only to the scale of data that is appropriate for the intended use
    • A "hit or miss" functionality similar to that provided on the Pennsylvania website may be appropriate for some users.  Some users would not know how to properly use precise data.
  • There was general consensus that access to precise data would need to be authenticated manually and then provided over the web for a period of time, with the understanding (and advantage) that users would gain access to refreshed data throughout the time of the subscription
  • Member programs expressed a desire to restrict federal agencies' access to precise data to occurrences within the agencies' own land boundaries; data for areas outside federal land boundaries would be fuzzed/generalized at some scale
    • Federal agencies recommended that they might be granted access to precise data through a 'hit or miss' system, whereby a user designates an area and the system responds with whether or not there is an element of concern within that specified area.
    • The group discussed the need for the IDD system to have appropriate boundaries available in order to provide agencies with precise data for their lands.  It was noted that accurate and complete national boundary layers are difficult to obtain.
  • Define how IDD system will handle single jurisdictional vs. multi-jurisdictional requests
    • All participants present agreed that the IDD system should have the functionality to handle single jurisdictional requests (one state or area within a state) for member programs that are comfortable with their data being provided by NatureServe
    • Users would need to be notified about what different levels of access are available from different states without further permission before proceeding with a multi-jurisdictional request
  • Federal agencies expressed concern that some requests could become too complex if the system does not provide access to the same level of data in every state
  • Member programs emphasized that retaining ownership of the data is very important
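
The 'hit or miss' idea described above can be sketched as a query that reports only whether any record falls inside the user's area, without revealing locations. This is a minimal sketch assuming occurrences are stored as (longitude, latitude) points and the user's area is a simple bounding box; the function name and parameters are hypothetical, and a real system would likely use polygon geometry.

```python
def hit_or_miss(occurrences, min_lon, min_lat, max_lon, max_lat):
    """Return True if any occurrence lies inside the box, and nothing else.

    `occurrences` is an iterable of (lon, lat) pairs. Only a yes/no answer
    is returned, so precise locations are never exposed to the caller.
    """
    return any(
        min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
        for lon, lat in occurrences
    )
```

Because the response is a single yes/no value, a user who gets a "hit" would still need to contact the data provider for details.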

Comments:

  • Most programs seem comfortable with data generalized to USGS 7.5min quad level being available to the public
  • Member programs expressed concern regarding data currency and mis-interpretation of data
  • Federal agencies are interested in habitat data in addition to Element Occurrence data
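
Generalizing a precise location to quad level can be illustrated by snapping coordinates to a 7.5-minute grid. The sketch below assumes quad cells align to multiples of 7.5 arc-minutes (0.125 degrees) of longitude and latitude; the function name is hypothetical, and a production system would more likely join against an official quadrangle index.

```python
import math

QUAD_SIZE_DEG = 7.5 / 60  # 7.5 arc-minutes expressed in degrees (0.125)


def generalize_to_quad(lon, lat):
    """Return the southwest corner of the 7.5-minute cell containing the point."""
    return (
        math.floor(lon / QUAD_SIZE_DEG) * QUAD_SIZE_DEG,
        math.floor(lat / QUAD_SIZE_DEG) * QUAD_SIZE_DEG,
    )
```

Any two points in the same cell map to the same corner, so the public view shows only which quad contains a record.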

Requirements/comments on Protecting Sensitive Data and FOIA Issues

Requirements:

  • Define what will be considered sensitive data types, e.g. specific elements, individual EOs, precision, private landowner data, data contributor permissions
  • Consider that some land owners would be more restrictive on how data can be used compared to other data in the state/province - therefore, the system would have to accommodate differences in data provision scales within a state/province
  • Utilize GIS functionality to protect sensitive data - i.e., restrict zoom levels for general public users

Comments:

  • Many federal agencies prefer, and find it advantageous, to access data on-line to avoid FOIA requests and other liability issues with downloading data onto agency computers
  • One government agency representative indicated that he arranged pre-confidentiality agreements with all NHPs in his region that have not been challenged to date

Requirements/comments on Sustainability Models

Requirements:

  • NSF has an expectation that data will be provided for free to academic researchers
  • Investigate use of data subscription fees (current internet data model) for user types other than academic researchers
  • Explore ways to deliver some or all data for free and charge for value-added services, i.e., custom data products, help desk interpretation, environmental review, etc.
  • Estimates of revenue replacement needs were between $30k and $100k per year per NHP or CDC

Comments:

  • Data are valuable and many member programs currently charge for-profit users for data
  • NatureServe feels that by making data available for free, it would increase the NatureServe network profile and funding opportunities (i.e., Grateful Dead model)
  • Member programs would like to be able to provide all data for free and some programs already provide data to state and/or federal agencies for free (charge for processing the data, not the data itself)
  • However, charging for data attributes commercial value to the data, which may be advantageous in a FOIA challenge (In one instance, Arizona had to release data because they did not charge for it.  The ruling seemed to imply that if they had charged for data, then the data would have been protected under the commercial value exemption.)
  • There was general agreement that "for-profit", commercial users of data should be charged a fee (However, sometimes public agencies subcontract to private consultants, so this distinction can be problematic at times.)
  • Possible scenario: provide coarse-level (quad) data for free and charge a low subscription fee at precise scale; enroll subscribers as members of NatureServe and send them promotional material and donation requests
  • Possible scenario: provide data for free to agencies who financially support the development of the IDD service
  • One government agency representative stated that his agency's view is that they have a responsibility to provide support or pay for data products.  If NatureServe can implement and show the value of IDD, his agency would most likely be willing to pay subscription fees or ask for line-item appropriation to support access to data through IDD.

Workshop Follow-up

Concepts from the focus group discussions will be used to create "user stories" or scenarios to facilitate and prioritize future development of use cases, which will in turn drive system requirements for internet data delivery (see Appendix D).

NatureServe has established a Technology Working Group - Internet Data Delivery Focus Team.  The focus team currently consists of representatives from five member programs who have developed on-line data delivery systems.  This team will meet intermittently for the duration of the NatureServe IDD Project.

  • California, Tom Lupo
  • Montana, Allan Cox
  • Oregon, Jimmy Kagan
  • Pennsylvania, Mike Bialousz
  • Wisconsin, Jeff Shaw

Full proceedings from the NatureServe IDD Workshop and associated resources will be posted to a website hosted by the Colorado Natural Heritage Program in January 2005.  Notice that the website is available for review will be disseminated to the network via NatureServe listservs.

Additionally, Melissa Landon will present the results of this IDD Workshop at a scientific meeting TBD by 31-DEC-2005.



This material is based upon work supported by the National Science Foundation under Grant No. 0345400.  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.