---+!! %TOPIC%
%META:TOPICINFO{author="BobMorris" date="1167427705" format="1.1" version="1.18"}%
%META:TOPICPARENT{name="WebHome"}%
---+ CHARTER for the "TDWG Structured Descriptive Data" group
*As of 22. Nov. we have been asked to revise the Charter to follow a new template. In the coming days, we will reorganize our charter into the following template:*
*1. Convener*
* Gregor Hagedorn, Biologische Bundesanstalt für Land und Forstwirtschaft, Königin-Luise-Str. 19, 14195 Berlin, Germany. Email: name [at] bba.de, replace "name" with "g.hagedorn"
* (The convener is the principal point of contact for group members or external people interested in collaboration. The convener is responsible for reporting to TDWG and the TDWG Executive Committee on the group’s activities.)
*2. Core members of the group are:*
* Gregor Hagedorn (BBA, Germany),
* Robert Morris (UMass, Boston, USA),
* Kevin Thiele (University of Queensland, Australia),
* Bryan Heidorn (University of Illinois, USA)
* (Core members could explain to an outsider the purpose, justification and current activities of the group. Any core member could substitute for the Convener when required.)
* (Further contributors are acknowledged under [[#SddBackground]["Background"]], below.)
*3. Motivation*
<!-- Commented out:
* TEMPLATE QUESTION: Why is this group needed? What is its niche?
* TEMPLATE QUESTION: Motivation should differentiate the group's activities from other TDWG groups and from groups in other standards organizations.
* TEMPLATE QUESTION: Any historical reasoning should be placed in the 'History' field.
-->
In taxonomy, descriptive data take a number of very different forms. Natural-language descriptions are semi-structured, semi-formalised descriptions of a taxon, or occasionally of an individual specimen. They may be simple, short and written in plain language, as in a popular field guide, or long, highly formal and reliant on specialised terminology, as in a taxonomic monograph or other treatment. Dichotomous keys are specialised identification tools comprising fragments of descriptive data arranged in couplets that form a branching tree. Each fragment (traditionally called a "lead") is usually a short natural-language description. Trees with more than two leads per node are colloquially included in this notion, but are more properly called "polytomous keys". Coded descriptions comprise highly structured data used in computer identification and analysis programs such as Lucid ([[http://www.lucidcentral.org][www.lucidcentral.org]]), DELTA and a suite of phylogenetic analysis programs such as PAUP. Raw data descriptions (Box 1.2.4) usually comprise repeated measurements of parts of individual specimens, and are the basis from which the more abstracted natural-language and coded descriptions are derived. Few taxonomists consistently record and archive their raw data in a standardised format.
The goal of the SDD standard is to allow capture, transport, caching and archiving of descriptive data in all the forms mentioned above, using a platform- and application-independent, international standard. Such a standard is crucial to enabling lossless exchange of data between existing and future software platforms, including identification, data-mining and analysis tools, and federated databases.
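The key structure described above (leads grouped at nodes, forming a branching tree) can be illustrated with a minimal Python sketch. This is not the SDD schema or any SDD tool: the class names =Lead= and =KeyNode=, the function =identify=, and the example taxa and lead texts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Lead:
    """One lead: a short natural-language statement plus where it points."""
    text: str
    taxon: Optional[str] = None             # set when the lead ends at a taxon
    next_node: Optional["KeyNode"] = None   # set when the lead continues to another node


@dataclass
class KeyNode:
    """A node of the key: two leads form a classic couplet, more make it polytomous."""
    leads: List[Lead] = field(default_factory=list)


def identify(node: KeyNode, choose: Callable[[List[str]], int]) -> str:
    """Walk the key from `node`; `choose` picks a lead index at each node."""
    while True:
        lead = node.leads[choose([l.text for l in node.leads])]
        if lead.taxon is not None:
            return lead.taxon
        if lead.next_node is None:
            raise ValueError("lead has neither a taxon nor a following node")
        node = lead.next_node


# Hypothetical two-couplet key, invented purely for illustration.
couplet2 = KeyNode([
    Lead("Petals white", taxon="Species A"),
    Lead("Petals yellow", taxon="Species B"),
])
couplet1 = KeyNode([
    Lead("Leaves opposite", next_node=couplet2),
    Lead("Leaves alternate", taxon="Species C"),
])

answers = iter([0, 1])  # scripted choices: first lead, then second lead
result = identify(couplet1, lambda texts: next(answers))
print(result)
```

Because a node holds a list of leads rather than exactly two, the same sketch covers both dichotomous and polytomous keys.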
Scope of SDD: SDD documents may be used to express descriptions of biological taxa, specimens, and non-biological objects or classes.
* SDD documents may include all or some of the following:
   * terminologies (e. g., characters and states, modifiers, or character trees with higher concepts)
   * character ontologies (currently through character trees, but more fundamental ontologies are planned for the future)
   * structured (coded) data, typically such as is found in databases or taxon-by-character matrices
   * sample data (e. g., measurements)
   * unstructured natural language data
   * natural language data with markup
   * dichotomous or polytomous keys
   * resources associated with descriptions (e. g., images, references, links)
* SDD is currently not designed to accommodate:
   * molecular sequence and other genetic data (although these may be considered in future versions)
   * occurrence and specimen data and representations of these (e. g., distribution maps)
   * complex ecological data such as models and ecological observations
   * organism interaction data like host-parasite, plant-pollinator, predator-prey
   * nomenclatural and formal systematic (rank) information
Some developers of SDD-compliant software treat organism interactions as character data, or use other, non-fully interoperable methods (compare TaxonHierarchyNotReferencedAnywhere).
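To make three of the in-scope content types more concrete (terminologies, coded taxon-by-character data, and sample data), the following Python sketch shows one possible in-memory arrangement. It is an assumption-laden illustration, not the SDD data model: the character names, taxa, measurements and the helper functions =check_matrix= and =summarise= are all hypothetical.

```python
from statistics import mean

# Terminology: characters and their permitted states (all names hypothetical).
characters = {
    "leaf_arrangement": ["opposite", "alternate"],
    "petal_colour": ["white", "yellow"],
}

# Structured (coded) data: a taxon-by-character matrix.
# Each cell holds the set of states recorded for that taxon.
matrix = {
    "Species A": {"leaf_arrangement": {"opposite"}, "petal_colour": {"white"}},
    "Species B": {"leaf_arrangement": {"opposite"}, "petal_colour": {"yellow"}},
    "Species C": {"leaf_arrangement": {"alternate"}, "petal_colour": {"white", "yellow"}},
}

# Sample data: repeated raw measurements (leaf length in mm) per specimen.
samples = {
    "specimen-1": [41.0, 43.5, 39.5],
    "specimen-2": [52.0, 48.0],
}


def check_matrix(matrix, characters):
    """Return (taxon, character, state) triples that use undefined terminology."""
    problems = []
    for taxon, cells in matrix.items():
        for char, states in cells.items():
            for state in sorted(states):
                if state not in characters.get(char, []):
                    problems.append((taxon, char, state))
    return problems


def summarise(values):
    """Abstract raw measurements into a (min, mean, max) summary."""
    return (min(values), mean(values), max(values))


problems = check_matrix(matrix, characters)
summary = summarise(samples["specimen-2"])
print(problems)
print(summary)
```

Keeping the terminology separate from the coded data is what allows a check such as =check_matrix= to detect states that were never defined.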
*4. Goals, Outputs and Outcomes*
<!-- Commented out:
* TEMPLATE QUESTION: Interest Groups should have a few annual goals.
* TEMPLATE QUESTION: Task groups require specific deliverables by nominated dates
* TEMPLATE QUESTION: Deliverables are outputs that can be readily used by people outside the group.
* TEMPLATE QUESTION: Meetings and discussions are not deliverables. Reports, software and standards are deliverables.
* TEMPLATE QUESTION: Each deliverable should be accompanied by a timescale and list of dependencies.
* TEMPLATE QUESTION: Outputs are hosted and supported by the parent Interest Group
-->
The goal of the group is to develop standard computer-based mechanisms for expressing and transferring descriptive information about biological organisms or taxa (as well as similar entities such as diseases), including terminologies, ontologies, descriptions, identification tools and associated resources.
A central goal of the Interest Group is to maintain a repository containing the current schema and its evolution history, sample documents, and selected open-source tools that support the use of SDD. This is currently a Subversion repository, whose current location can always be found on the SDD Wiki.
The repository will be reviewed semi-annually by the Core Group for its currency and relevance to the goals of the SDD standard.
The goal of the SDD standard is to allow capture, transport, caching and archiving of descriptive data, using a platform- and application-independent, international standard. Such a standard is crucial to enabling lossless porting of data between existing and future software platforms including identification, data-mining and analysis tools, and federated databases.
* The SDD Standard:
   * provides a flexible, platform-independent data structure for the capture and storage of taxonomic descriptions
   * provides data structures to support multi-entry keys (interactive, matrix-based keys) as well as authored polytomous identification keys (traditional keys)
   * comprises a superset of the data requirements of all known programs managing descriptive data
   * provides extensions beyond existing programs where data requirements can be predicted
   * is readily extensible to account for future developments and data requirements
   * is human-readable (although it is assumed that in almost all cases standard descriptions will be machine-generated and processed)
   * is XML-based, and provides a schema for the validation of documents and for use with schema compilers such as XMLBeans to generate schema-based SDD tools.
* It facilitates:
   * lossless porting of data between standard-aware applications
   * achievable progressive markup of legacy descriptions, particularly natural-language descriptions
   * comparability and combinability of alternate descriptions of any one taxon
   * efficient reusable descriptions serving multiple purposes
   * archiving and sharing of raw and processed data
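The difference between a multi-entry (interactive, matrix-based) key and an authored key is that the user may score characters in any order, and taxa incompatible with the observations are eliminated. A minimal Python sketch of that idea follows. It is not taken from any SDD-compliant program: the function =remaining_taxa=, the matrix contents, and the rule that an unscored character never eliminates a taxon are all assumptions made for illustration.

```python
def remaining_taxa(matrix, observations):
    """Return taxa whose coded states are compatible with every observation.

    A taxon is kept when it shares at least one state with each observed
    character. Assumption: a character not scored for a taxon never
    eliminates that taxon.
    """
    kept = []
    for taxon, cells in matrix.items():
        compatible = all(
            char not in cells or bool(cells[char] & states)
            for char, states in observations.items()
        )
        if compatible:
            kept.append(taxon)
    return kept


# Hypothetical taxon-by-character matrix, invented purely for illustration.
matrix = {
    "Species A": {"leaf_arrangement": {"opposite"}, "petal_colour": {"white"}},
    "Species B": {"leaf_arrangement": {"opposite"}, "petal_colour": {"yellow"}},
    "Species C": {"leaf_arrangement": {"alternate"}, "petal_colour": {"white", "yellow"}},
    "Species D": {"leaf_arrangement": {"alternate"}},  # petal colour not scored
}

# The user scores characters in any order; each answer narrows the candidates.
step1 = remaining_taxa(matrix, {"petal_colour": {"yellow"}})
step2 = remaining_taxa(
    matrix,
    {"petal_colour": {"yellow"}, "leaf_arrangement": {"alternate"}},
)
print(step1)
print(step2)
```

Real identification tools typically add refinements such as error tolerance and ranking of the most useful remaining characters, which this sketch omits.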
*5. Strategy*
The most effective strategy since 2001 has been found to be face-to-face [[MeetingMinutes][meetings]]. These meetings help the group to focus and to take into account uncertainties about the way forward that cannot be resolved by logical argument alone. As funding allows, the Core Group will meet at least once between annual TDWG meetings, and will hold an open meeting at each TDWG meeting. Members of the Core Group will share their experiences of implementing SDD-compliant software and will promote, as widely as possible, open-source software that complies with the standard.
The use of wikis has similarly been found to be an effective strategy for documentation, and the use of source code repositories an effective way to manage the evolution of the formal representation of the standard.
The Core Group manages its development with a source code control system, presently the Subversion repository mentioned above. When a point of stability is reached, a Release Candidate is frozen. Development continues through a series of Release Candidates until a version is deemed acceptable for release. Releases are copied to the wiki for public comment and ultimate submission to TDWG for approval.
*6. Becoming Involved*
<!-- Commented out:
* TEMPLATE QUESTION: Required for Interest Groups and recommended for Task Groups.
* TEMPLATE QUESTION: How could an individual reading this help the group?
* TEMPLATE QUESTION: What skills are currently in need
* TEMPLATE QUESTION: Who should be contacted?
-->
All interested parties are particularly encouraged to comment on the SDD Schema and its supplementary documentation and sample documents, and to contribute further to these collections at the SDD wiki. Membership in the group is open to any interested parties, and anyone implementing SDD-compliant software becomes a de facto member of the core group if they wish to participate actively in the development of the schema and its document sets. SDD development and understanding rest on two skills: experience with the representation technologies involved and experience with taxonomic identification and description tools. Usually it is most productive if there is substantial experience in one of these areas and at least slight experience in the other.
The core SDD group is considering defining a subset of the current schema, "SDD Lite", with the particular goal of producing a representation in RDF of the main concerns of SDD, in furtherance of TDWG's goal to have RDF representations of its major ontologically related standards. Hence, the SDD Interest Group especially seeks people who are interested in, and have suitable experience with, the use of Semantic Web technologies for describing taxa.
*7. History and context*
Versions:
* Current Standard: Version 1 (endorsed by TDWG October 2005)<br/>
* Most Recent Version: [[Version 1dot1][SDD Version 1.1]] (April 28, 2006)<br/>
* Working Draft Version: Version 1.2
#SddBackground History and background
TDWG endorsed the DELTA (Descriptive Language for Taxonomy) format as a standard for the representation of taxonomic descriptions in the 1980s. The SDD subgroup was established in 1998 as a subgroup of the Taxonomic Databases Working Group (TDWG, www.tdwg.org) of the International Union of Biological Sciences (IUBS), in response to the recognition that a program-independent, non-proprietary standard based on current data interchange techniques was needed.
The subgroup has met many times since 1998, and conducted discussions by [[http://www.diversitycampus.net/Projects/TDWG-SDD//SDD-EmailList.html][email list]] and wiki pages. It has considered the needs of a wide variety of existing programs that manage, produce and consume biological descriptions, as well as incorporating new ideas that may be implemented in the future.
The SDD subgroup began discussing issues and scoping the standard through an email discussion group established in November 1999 (see the [[http://www.diversitycampus.net/Projects/TDWG-SDD//SDD-EmailList.html][SDD email list archives]]). This resulted in broad participation but, because of an extremely wide spectrum of expectations and approaches, the discussion did not make substantial progress or converge.
The most effective strategy since 2001 has been found to be face-to-face [[MeetingMinutes][meetings]]. These meetings help the group to focus and to take into account uncertainties about the way forward that cannot be resolved by logical argument alone.
The major meetings so far were: Canberra, Nov. 2001; Sao Paulo, Oct. 2002; Paris, Feb. 2003; Lisbon, October 2003; Berlin, May 2004; Christchurch, Oct. 2004; St. Petersburg, Sept. 2005; and Berlin, April 2006. Over 60 people contributed to these discussions. The help, criticism and energy of Jacob Asiedu, Nicolas Bailly, Damian Barnier, Donald Hobern, Trevor Patterson, Guillaume Rousse, and Steve Shattuck are especially acknowledged.
Descriptive data, unlike specimen databases and name services, usually reside in many dispersed and independent data files. With few exceptions these are not provided by large organisations.
Descriptive data range from unstructured natural language to highly structured (coded) data. Each dataset typically has an independent ontology. SDD has been designed to accommodate the current complexity, but also provide means for further (voluntary) standardizations of ontologies.
The SDD (Structured Descriptive Data) XML schema defines a method to encode descriptive data in biology and other subjects. The primary goal of the design is to increase knowledge, and the availability of knowledge, about the diversity of life on earth. However, it may be used in many other areas (including medicine, pathology, archeology, anthropology), wherever objects or classes of objects are described for later reidentification. It is hoped that this standard will reach general acceptance and become a successor to existing standards like DELTA or NEXUS.
We expect the development of SDD to make a valuable contribution to the future development of structured online monographs or species pages that include descriptive data as well as other biodiversity data.
*8. Summary*
<!-- Commented out:
* TEMPLATE QUESTION: A concise (<500 word) public summary of the group for use on the TDWG web site and in general literature about TDWG.
-->
The Interest Group for Structured Descriptive Data (SDD) designs, proposes, and maintains standards for the representation of descriptive data about taxa and specimens. These standards are used particularly in support of interoperability and exchange mechanisms for software packages and web services handling descriptive data (e. g., "species banks" and interactive identification). The principal target audience comprises developers of software, databases and web sites supporting the identification and description of organisms and taxa in the field or the laboratory. In turn, the audience for such software tools might be scientists (including taxonomists and systematists as well as ecologists or people in conservation work), practitioners (including quarantine officers and workers in disease control), educators (primary as well as secondary level), or decision makers concerned with precise descriptions of the biota of the planet.
*9. Resources*
* The primary home address for communications of this group is: *[[http://wiki.tdwg.org/twiki/bin/view/SDD/WebHome]]*
* Current services using SDD are:
   * [[http://www.lucidcentral.org/][Lucid]],
   * [[http://efg.cs.umb.edu/][UMASS-Boston Electronic Field Guide Project (EFG)]],
   * [[http://www.identifylife.org/][IdentifyLife]],
   * [[http://iris.biosci.ohio-state.edu/hymenoptera/][Hymenoptera On-Line Database]] and
   * [[http://www.isrl.uiuc.edu/~openkey/][OpenKey]]
*10. OUTPUTS and OUTCOMES (and timeframe):*
The principal products of the SDD group are the SDD standard (SDD 1.0, endorsed by TDWG in 2005) and the discussion framework captured on the [[SDD.WebHome][SDD Wiki]].
SDD 1.1 is nearing completion at the end of 2006 and will be proposed for adoption in 2007.
Investigation of, and a draft proposal for, an RDF representation of a subset of SDD will be completed in 2007.
SDD is under investigation by some members of the Core Group as a vehicle for incremental markup of taxonomic treatments in digitized legacy systematics literature.