%META:TOPICINFO{author="AnnieSimpson" date="1186199871" format="1.1" version="1.1"}% %META:TOPICPARENT{name="DevelopmentDocsGISIN"}%
1. Introduction
2. GIS Web Site
3. Registry
4. Provider Toolkit
5. Consumer Toolkit
6. Protocol
Appendix A - Issues
This document describes the requirements of the Global Invasive Species Information Network (GISIN) system. The goal of this system is to allow users of the World Wide Web to access the large amount of data available on invasive species much more easily than is possible today. Currently, accessing data on invasive species consists primarily of searching through Google, searching the scientific literature to discover individuals who have data, or relying on word of mouth. Once the data is found, it can be time-consuming to filter it down to exactly what one is looking for and then convert it into a desired format.
The requirements are being set based on feedback from the user community. Features are musts unless followed by 'Need' or 'Want'. Needs and wants indicate features that will not be implemented unless time allows. Please see the associated definitions and introduction documents for background information.
In general, the system must:
GISIN will support providers that have data in a relational or flat-file database as long as the database contains filtering capability (e.g. a “WHERE” clause in SQL or the equivalent).
GISIN will not support providers with data in text files, HTML files, or non-digital media. These providers should examine the GISIN web registry for a Data Common they can add their data to.
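The filtering requirement above can be illustrated with a minimal sketch of how a provider might turn request filters into a parameterized SQL "WHERE" clause. The table layout, column names, parameter names, and sample rows are all hypothetical and are not taken from the GISIN protocol; SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical mapping from request parameter names to database columns.
# Neither the parameter names nor the table layout come from the GISIN spec.
FILTER_COLUMNS = {"genus": "genus", "country": "country_code"}


def build_query(filters):
    """Turn request filters into a parameterized SELECT with a WHERE clause."""
    clauses, values = [], []
    for name, value in filters.items():
        if name not in FILTER_COLUMNS:
            raise ValueError(f"unsupported filter: {name}")
        clauses.append(f"{FILTER_COLUMNS[name]} = ?")
        values.append(value)
    sql = "SELECT genus, species, country_code FROM checklist"
    if clauses:
        # All filters are combined with AND.
        sql += " WHERE " + " AND ".join(clauses)
    return sql, values


def search(conn, filters):
    """Run the filtered query and return all matching rows."""
    sql, values = build_query(filters)
    return conn.execute(sql, values).fetchall()


def demo_connection():
    """In-memory database standing in for a provider's existing data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE checklist (genus TEXT, species TEXT, country_code TEXT)"
    )
    # Illustrative placeholder rows, not real GISIN data.
    conn.executemany(
        "INSERT INTO checklist VALUES (?, ?, ?)",
        [
            ("Tamarix", "ramosissima", "US"),
            ("Tamarix", "chinensis", "NZ"),
            ("Bromus", "tectorum", "US"),
        ],
    )
    return conn
```

Calling `search(demo_connection(), {"genus": "Tamarix", "country": "US"})` returns only the rows that satisfy both filters. A provider whose data cannot be filtered this way would fall into the unsupported category described above.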
The system must allow the creation of:
To provide the functionality described herein, the system will include:
These components are detailed in the following sections.
The system operates in a client-server, request-response model. A client (a consumer) makes a request for information to a server (a provider). The server then returns a response based on the parameters within the request. The server may also return an error if appropriate. The type of data in the response may vary based on the type of request. Responses will typically be in Extensible Markup Language (XML) but may also include images in the future.
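As a rough sketch of this request-response model, the code below builds a request URL from a request type and its parameters, then parses either an XML response or an XML error. The parameter name `request`, the element names `response`, `record`, `error`, and `message`, and the example host are assumptions made for illustration; the actual names would be defined by the protocol specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_request_url(base_url, request_type, **params):
    """Encode a request type and its filter parameters as an HTTP GET URL."""
    query = {"request": request_type, **params}
    # Sort the parameters so the same request always yields the same URL.
    return base_url + "?" + urlencode(sorted(query.items()))


def parse_response(xml_text):
    """Return a list of record dicts, or raise if the provider sent an error."""
    root = ET.fromstring(xml_text)
    if root.tag == "error":
        raise RuntimeError(root.findtext("message", default="unknown error"))
    return [
        {child.tag: child.text for child in record}
        for record in root.findall("record")
    ]


# Hypothetical provider replies, used here instead of a live server.
SAMPLE_RESPONSE = """<response>
  <record><genus>Tamarix</genus><country>US</country></record>
</response>"""

SAMPLE_ERROR = "<error><message>unknown request type</message></error>"
```

A consumer would send the URL from `build_request_url` to the provider and pass the body of the reply to `parse_response`; the sample strings stand in for that reply so the sketch runs without a network connection.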
The following is a summary set of use cases that must be supported by the system and its communication protocol. These use cases represent only a small subset of the system's functionality. The remainder of the requirements document defines the additional features.
Example: Which species are invasive to New Zealand?
Example: In which countries is Tamarix considered invasive?
Example: Get the checklists for the genus Tamarix worldwide
Example: Get the list of URLs that contain profile information on Tamarix
Example: Get the occurrence locations for Tamarix in the United States
The following types of data will be shared in the first version of the system. Please see the definitions document for more details on these types.
Checklists are the highest priority for sharing followed by profile URLs, occurrences and then profiles.
The following data types are expected to be added in the near future:
The following data types are of interest but are not critical:
Able to complete the following search within 1 second:
Able to complete the following search within 1 minute:
The users of GISIN fall into two large categories, end-users and implementers. The end-users are individuals using a web browser to find information on invasive species. Implementers are the people who are building or associated with the builders of the system including the members of GISIN.
The end-users of GISIN will be as varied a group of computer users as can be imagined. Anyone interested in invasive species, from young school children to experienced scientists, resource managers, and politicians, may be accessing the system. A user base this broad will not have the technical expertise to understand the limitations of the available technology and will expect the system to perform as other available web sites do. Examples of these web sites include Google, Yahoo, Wikipedia, and a large number of commercial web sites. This implies that the system must be very easy to use, flexible, and fast even with very large data sets.
The implementers have been surveyed and were found to have a wide variety of needs and limited time to spend on technical issues (add link to survey results). This means the system must be as easy to implement as possible, well documented, and easy to monitor for quality and performance problems.
The different types of implementers (providers, consumers, data commons, and portals) will also have a variety of needs. Providers will have a variety of technical abilities and existing systems. The quantity and variety of data will vary from small data sets on one species to data sets with millions of entries for a large number of species. Consumers will include implementers caching data, producing maps, and summarizing data. This will require the system to perform as fast as possible and have flexibility in obtaining data. Data commons are a special data provider that will contain large quantities of data from a variety of end-users. Portals integrate multiple providers to aid users in searching across multiple providers. Requirements from various types of implementers are reflected in the remainder of this document.
For all providers and especially data commons, it is a must that the system maintains metadata on the source of the original data. Below are some examples of different types of users that the system is required to support.
Examples:
Examples:
The following are just a few examples of invasive species databases that desire to be online. See GISIN for the complete list.
Below are the target dates for development of the system:
GISIN is largely a grass-roots organization coordinated by GISIN and reliant upon the participation of a large number of individuals from a diverse group of organizations from across the world. The creation of the individual components is largely the responsibility of the organizations hosting the web servers.
Organizations have graciously provided support for individuals to attend meetings and review documents. Below is the only funding available specifically for GISIN development. The most significant problem is that there is no funding available for support and updates.
The following roles need to be identified to complete and support the system:
End-users will need to have a quality experience for GISIN to be successful. This means the web sites must be accessible, response times quick, results understandable, and data accurate.
Transmission stability is affected by the Internet and the provider's hardware, software, and database. Complexity and size of the transfers will also affect stability. While the system should be as stable as any other web-service-based system, we are setting the following criteria.
Documentation must be available on the World Wide Web and readable by individuals with an appropriate background in web services. The documentation will initially be available only in English.
Below are the target tolerances for data within the system:
At introduction:
Within 5 years of release:
A web site will be available with end-user and technical documentation and access to the registry. The web site will also contain a showcase for products created using the GISIN and tools for managing the system.
End-user documentation will include an introduction to the GISIN, how to use the registry, and how to use the portal to find information on invasive species.
Technical documentation will include how to obtain and install the toolkits and specifications for the protocol. The documentation for the protocol and schema must be freely available and be very easy to create providers from. It should also be easy to create consumers and portals.
The web site will contain a registry; its requirements are given in section 3.
TBD
The management tools will be available through a password-protected section of the web site and will include:
The registry will contain a list of providers, consumers, and portals with URLs for their web sites. For providers it will also include which types of data they contain and statistics on the number of species and areas of interest. The registry will follow an approach similar to DiGIR, where there are organizations and each organization can have multiple data sets.
The registry must have a data sharing agreement and track who has agreed to it.
The registry will include the following fields for each organization:
For each web service within each organization we will have:
Below is a list of the features that are required for each of the database tables mentioned above.
Search/Browse for web services by:
The provider toolkit will be available on a set of mirrored servers on the web and will make it easy for most providers to add their data to the system. The toolkit will contain:
The documentation will only be available in English initially.
The toolkit has the following general requirements:
The bulk of the remainder of this section documents the characteristics of the systems that must be supported to allow our providers to implement the protocol.
The toolkit will support the following operating systems:
The toolkit will support the following web servers:
The toolkit will be supported on the following web development frameworks:
The toolkit will be supported on the following programming languages:
DiGIR is the most pervasive of the biological data exchange standards. The GISIN toolkit could be thought of as standing on the shoulders of the DiGIR toolkit and taking the next step in ease of use for implementers. This includes:
The provider toolkit will need to allow providers the following scope:
Can have the following limitations:
A consumer Toolkit may be needed to make it easy to access information in the system, make requests, and parse responses. It is yet to be determined whether it is required.
This section provides the requirements for the protocol used to communicate data on invasive species between computers. The protocol must satisfy the applicable requirements from the material above as well as the additional requirements in this section.
The protocol will have the following general requirements:
The last two items pretty much force the protocol to operate using HTTP through port 80. This is the only method of communication that is typically available for web servers, as most other ports are dedicated or blocked by firewalls.
As a global system, GISIN must allow users to provide and obtain textual information in various languages. However, most providers will not have information available in multiple forms. To allow providers to operate in their own language and to allow consumers to ingest and then provide translated versions of text, the following strategy will be used.
This issue only applies to language specific transfers which are discouraged in favor of “coded” transfers.
Taxa will be identified by standard Scientific Name (i.e. Kingdom, Genus, Species, Subspecies, Variety). Date and author may be added for specific taxon concepts.
The protocol will not support requesting taxa by common name.
Locations can be provided either by coordinates (points, polygons, and bounding boxes) or by textual “names”. Coordinates will be geographic coordinates in the WGS84 or HARN datums. Names will be ISO where available, and from other standards otherwise. If a language-specific name is used it will be transferred with its language. If a name is specified to a location, its location must accompany it.
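The two ways of giving a location can be sketched with a pair of small data types. This is only an illustration under stated assumptions: the class and field names are hypothetical, coordinates are taken to be decimal degrees, and boxes that cross the antimeridian are not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BoundingBox:
    """Geographic bounding box in decimal degrees (datum assumed to be WGS84)."""
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon, lat):
        """True if the point lies inside the box or on its edge."""
        return self.west <= lon <= self.east and self.south <= lat <= self.north


@dataclass(frozen=True)
class NamedLocation:
    """A location given by name, either from a standard or in a stated language."""
    name: str
    standard: Optional[str] = None   # hypothetical label for the naming standard
    language: Optional[str] = None   # language tag for language-specific names

    def __post_init__(self):
        # A name that is not from a standard is language-specific,
        # so it must be transferred with its language.
        if self.standard is None and self.language is None:
            raise ValueError("a language-specific name must carry its language")
```

For example, `BoundingBox(-125.0, 24.0, -66.0, 50.0).contains(-105.0, 40.0)` is true, and `NamedLocation("Estados Unidos")` is rejected until a language such as `"es"` is supplied.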
All parameters in requests that filter the data will be ANDed together. In the future an OR may be provided for certain parameters by concatenating them with commas. Data searches requiring Boolean OR operators can be executed with multiple requests or by requesting a more general set of data and then filtering the data to obtain just the desired data.
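The following sketch shows one way a consumer might obtain an OR result from a protocol that only ANDs its filters: issue one request per alternative and merge the results by record identifier. The functions, the `id` field, and the sample records are hypothetical; `run_request` merely stands in for a real request to a provider.

```python
def matches(record, filters):
    """A record matches only if every filter parameter matches (logical AND)."""
    return all(record.get(key) == value for key, value in filters.items())


def run_request(records, filters):
    """Stand-in for one request to a provider holding `records`."""
    return [record for record in records if matches(record, filters)]


def run_or_requests(records, filter_sets):
    """Emulate OR by issuing one ANDed request per filter set and merging by id."""
    merged = {}
    for filters in filter_sets:
        for record in run_request(records, filters):
            # Keep the first copy of each record so duplicates are dropped.
            merged.setdefault(record["id"], record)
    return list(merged.values())


# Illustrative placeholder records, not real GISIN data.
RECORDS = [
    {"id": "a1", "genus": "Tamarix", "country": "US"},
    {"id": "a2", "genus": "Tamarix", "country": "NZ"},
    {"id": "a3", "genus": "Bromus", "country": "US"},
]
```

Here `run_request(RECORDS, {"genus": "Tamarix", "country": "US"})` yields only record `a1`, while `run_or_requests(RECORDS, [{"genus": "Tamarix"}, {"country": "US"}])` yields all three records with `a1` appearing once.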
For each of the supported data types, the protocol must allow requests to be made for:
Records can be requested either in blocks, given a start number and a number of rows, or as the entire set of available records.
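A minimal sketch of block-wise retrieval follows. It assumes a zero-based start number and uses hypothetical function and parameter names; an in-memory list stands in for the provider's record set.

```python
def fetch_block(records, start=0, rows=None):
    """Return `rows` records beginning at `start`, or all remaining if rows is None."""
    if start < 0 or (rows is not None and rows < 0):
        raise ValueError("start and rows must be non-negative")
    end = None if rows is None else start + rows
    return records[start:end]


def iter_blocks(records, rows):
    """Yield successive blocks until the provider has no more records."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    start = 0
    while True:
        block = fetch_block(records, start, rows)
        if not block:
            return
        yield block
        start += len(block)
```

With five records and a block size of two, `iter_blocks` yields blocks of two, two, and one record; calling `fetch_block` without `rows` returns the entire set.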
To let consumers minimize updates to their databases, records can be filtered based on CreationDate and LastModifiedDate. Each record must contain a unique identifier so that consumers can determine when a record has been updated and prevent duplication.
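As an illustration of how a caching consumer might use these fields, the sketch below filters records by date on the provider side and merges them into a cache keyed by identifier on the consumer side. The field names CreationDate and LastModifiedDate come from the text above; the `id` key, the function names, and the use of an in-memory dictionary as the cache are assumptions.

```python
from datetime import datetime


def changed_since(records, since):
    """Provider side: keep records created or modified after `since`."""
    return [
        record for record in records
        if record["LastModifiedDate"] > since or record["CreationDate"] > since
    ]


def apply_updates(cache, updates):
    """Consumer side: insert new records and overwrite updated ones, keyed by id."""
    for record in updates:
        # The unique identifier prevents the same record being stored twice.
        cache[record["id"]] = record
    return cache
```

A consumer would remember the time of its last successful synchronization, request only the records that changed since then, and pass them to `apply_updates`.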
Below are requirements within each data type.
Filter by:
Please reference the BioStatus spreadsheet (or the current protocol specification) for the latest information on the BioStatus fields.
Filter by:
Filter by:
Responses will be returned as XML unless images are requested. For images, the client will specify an image format for the response.
To show information within a portal each request should return metadata including a URL for additional information, a URL for a logo, and a human readable title.
The following sections document the tags that can be returned in a response for a given data type. This section documents which tags the protocol must allow for; which tags are required to be returned will be defined in the protocol specification.
Each element needs to contain a globally unique identifier (GUID).
These are from the TDWG meeting; are they still required?
All fields from filters plus:
All fields from filters plus:
A.1 How do we represent spatial accuracy?
- Added accuracy to coordinates
A.2 How to represent taxonomic identification accuracy?
- ?
A.3 Should we provide a portal interface that gives lists of profiles by species?
- Requires a search mechanism that returns list of URLs
- Search by: Taxon, Fields available, Keywords
- Google-like search engine for profiles
A.4 Is GISIN providing a portal or just a list of available portals?
- A portal
A.5 How to get folks to add data when it affects trade?
A.6 Will we include taxon concept IDs?
A.7 Will we use UDDI for the registry?
A.8 We do not have the resources to provide a DiGIR-like installation for all 3 major languages.
- My proposal is to provide examples in all 3 languages and, since PHP is the most common and most portable language, provide an easy install in PHP.
A.9 How do we obtain globally unique identifiers?
A.10 How do we monitor quality and resolve problems with data?