wiki-archive/twiki/temp-gjr/BDI/SDD/Primer/SddIntroduction.txt

336 lines
16 KiB
Plaintext

%META:TOPICINFO{author="KevinThiele" date="1158293634" format="1.1" version="1.19"}%
%META:TOPICPARENT{name="SddContents"}%
---++SDD Part 0: Introduction and Primer to the SDD Standard
---+++Abstract
SDD Part 0 is a non-normative introduction to the [[http://www.tdwg.org][Taxonomic Databases Working Group][Name:_blank]] SDD (Structure of Descriptive Data) Standard. Its intention is to provide a background, introduction and primer to the SDD Standard, with examples. Since the SDD Standard is a work-in-progress, this document will be updated from time to time.
---+++Status of this document and version history
See [[DocumentStatus][DocumentStatus]]
---++1.0 Introduction
---+++1.1 Background to the TDWG-SDD Subgroup
In September 1998 the [[http://www.tdwg.org][Taxonomic Databases Working Group ][Name:_blank]] (TDWG) of the [[http://www.iubs.org][International Union of Biological Sciences][Name:_blank]] (IUBS) established the Structure of Descriptive Data (SDD) subgroup. TDWG’s role is to facilitate and manage the development of international standards in the taxonomic domain. The SDD subgroup was established to develop an international XML-based standard for capturing and managing descriptive data for organisms.
Development of the SDD standard was initiated in response to recognition that the existing standard previously endorsed by TDWG – the [[http://delta-intkey.com/][DELTA][Name:_blank]] data standard developed at CSIRO in Canberra from 1971 and adopted by TDWG as a descriptive data standard in 1991 – had become inadequate [[SddFaq][(FAQ: Why not continue to use DELTA?)]].
The SDD subgroup began discussing and scoping a standard through an email discussion group in November 1999 (see the [[http://listserv.nhm.ku.edu/archives/tdwg-sdd.html][SDD email list archives][Name:_blank]]). Considerable progress has been made at face-to-face meetings amongst a small group of core contributors, in Nov. 2001 (Canberra), Oct. 2002 (Sao Paulo), Feb. 2003 (Paris), October 2003 (Lisbon), May 2004 (Berlin) and Oct. 2004 (Christchurch).
---+++1.2 The nature of descriptive data in taxonomy
In taxonomy, descriptive data takes a number of very different forms.
Natural-language descriptions (Box 1.2.1) are semi-structured, semi-formalised descriptions of a taxon (or occasionally of an individual specimen). They may be simple, short and written in plain language (if used for a popular field guide), or long, highly formal and using specialised terminology when used in a taxonomic monograph or other treatment.
----++++Box 1.2.1 - A typical natural language description
<table bgcolor="#ddddff" cellspacing="2" cellpadding="2" border="1" width="80%">
<tr>
<td>
<p align="left">
<b>Red Knot (Calidris canutus)</b><br>
Stout wader with bill same length as head, crown unstreaked, narrow white bar
in wing, pale rump with grey barring, shortish olive legs. Non-breeding:
grey above with narrow pale edging to feathers, pale eyebrow, smudged sides
to neck with faint spotting. Juvenile: feathers of back edged white with
dark subterminal bar, breast more heavily spotted pale buff and flanks
barred, crown faintly streaked. Breeding: rufous underparts, feathers of
back rufous patterned with black. Voice: 'knut-knut', `nyui , high-pitched `toowit-wit'.</p>
<p align="right">
from Slater, P., Slater, P. &amp; Slater, R. (2001) The Slater Field Guide to
Australian Birds&nbsp; (Reed New Holland: Sydney)</p>
</td>
</tr>
</table>
Dichotomous keys (Box 1.2.2) are specialised identification tools comprising fragments of descriptive data arranged in couplets forming a branching tree. Each fragment (lead) comprises a small (occasionally verbose) natural-language description.
----++++Box 1.2.2 - A simple dichotomous key
<div>
<table bgcolor="#ddddff" border="1" cellpadding="5" style="border-collapse: collapse" bordercolor="#111111" width="80%" id="table8">
<tr>
<td style="border-left-style: solid; border-left-width: 1px; border-right-style: none; border-right-width: medium; border-top-style: solid; border-top-width: 1px; border-bottom-style: none; border-bottom-width: medium" colspan="3">
<b>Key to Australian skinks in the genus <i>Ctenotus</i></b></td>
</tr>
<tr>
<td width="50" style="border-left-style: solid; border-left-width: 1px; border-right-style: none; border-right-width: medium; border-top-style: solid; border-top-width: 1px; border-bottom-style: none; border-bottom-width: medium">1</td>
<td style="border-left-style: none; border-left-width: medium; border-right-style: none; border-right-width: medium; border-top-style: solid; border-top-width: 1px; border-bottom-style: none; border-bottom-width: medium">Dark upper lateral zone with one or more distinct series of
pale spots or blotches along the body</td>
<td width="114" valign="bottom" align="right" style="border-left-style: none; border-left-width: medium; border-right-style: solid; border-right-width: 1px; border-top-style: solid; border-top-width: 1px; border-bottom-style: none; border-bottom-width: medium">2</td>
</tr>
<tr>
<td width="50" style="border-left-style: solid; border-left-width: 1px; border-right-style: none; border-right-width: medium; border-top-style: none; border-top-width: medium; border-bottom-style: none; border-bottom-width: medium">1a</td>
<td style="border-style: none; border-width: medium">Dark upper lateral zone obscurely mottled or uniform with at
most a few pale spots anteriorly</td>
<td width="114" valign="bottom" align="right" style="border-left-style: none; border-left-width: medium; border-right-style: solid; border-right-width: 1px; border-top-style: none; border-top-width: medium; border-bottom-style: none; border-bottom-width: medium">3</td>
</tr>
<tr>
<td width="50" style="border-left-style: solid; border-left-width: 1px; border-right-style: none; border-right-width: medium; border-top-style: none; border-top-width: medium; border-bottom-style: none; border-bottom-width: medium">2</td>
<td style="border-style: none; border-width: medium">Fewer than 25 lamellae under the fourth toe; supralabials 7-8 (usually 7); prefrontals separated</td>
<td width="114" valign="bottom" align="right" style="border-left-style: none; border-left-width: medium; border-right-style: solid; border-right-width: 1px; border-top-style: none; border-top-width: medium; border-bottom-style: none; border-bottom-width: medium">
<i>C. arcanus</i></td>
</tr>
<tr>
<td width="50" style="border-left-style: solid; border-left-width: 1px; border-right-style: none; border-right-width: medium; border-top-style: none; border-top-width: medium; border-bottom-style: none; border-bottom-width: medium">2a</td>
<td style="border-style: none; border-width: medium">More than 25 lamellae under the fourth toe; supralabials 8-9
(usually 8); prefrontals usually in contact</td>
<td width="114" valign="bottom" align="right" style="border-left-style: none; border-left-width: medium; border-right-style: solid; border-right-width: 1px; border-top-style: none; border-top-width: medium; border-bottom-style: none; border-bottom-width: medium">
<i>C. alleni</i></td>
</tr>
<tr>
<td width="50" style="border-left-style: solid; border-left-width: 1px; border-right-style: none; border-right-width: medium; border-top-style: none; border-top-width: medium; border-bottom-style: none; border-bottom-width: medium">3</td>
<td style="border-style: none; border-width: medium">Pale mid-lateral stripe passes over the hindlimb to continue
along the tail </td>
<td width="114" valign="bottom" align="right" style="border-left-style: none; border-left-width: medium; border-right-style: solid; border-right-width: 1px; border-top-style: none; border-top-width: medium; border-bottom-style: none; border-bottom-width: medium">
<i>C. inornatus</i></td>
</tr>
<tr>
<td width="50" style="border-left-style: solid; border-left-width: 1px; border-right-style: none; border-right-width: medium; border-top-style: none; border-top-width: medium; border-bottom-style: solid; border-bottom-width: 1px">3a</td>
<td style="border-left-style: none; border-left-width: medium; border-right-style: none; border-right-width: medium; border-top-style: none; border-top-width: medium; border-bottom-style: solid; border-bottom-width: 1px">Pale mid-lateral stripe extends to groin, then continues
along the front edge of the hindlimb</td>
<td width="114" valign="bottom" align="right" style="border-left-style: none; border-left-width: medium; border-right-style: solid; border-right-width: 1px; border-top-style: none; border-top-width: medium; border-bottom-style: solid; border-bottom-width: 1px">
<i>C. coggeri</i></td>
</tr>
</table>
</div>
Coded descriptions (Box 1.2.3) comprise highly structured data used in computer identification and analysis programs such as Lucid ([[http://www.lucidcentral.org][www.lucidcentral.org]]) , DELTA ([[http://www.delta-intkey.com][www.delta-intkey.com]]) and phylogenetic analysis programs such as PAUP ([[http://www.paup.csit.fsu.edu][www.paup.csit.fsu.edu]]).
----++++Box 1.2.3 - Simple examples of coded descriptions
<div>
<table bgcolor="#ddddff" border="0" cellpadding="5" cellspacing="5" style="border-collapse: collapse" bordercolor="#111111" width="80%" id="table9">
<tr>
<td width="50%">
<div>
Lucid Interchange Format (LIF) file</div>
#Lucid Interchange Format File v. 2.1<br>
<br>
[..Character List..]<br>
Distribution by region<br>
&nbsp; Tropical North<br>
&nbsp; Subtropical and Temperate East and South<br>
&nbsp; South West<br>
&nbsp; Arid &amp; Semi-arid (Central)<br>
&nbsp; Island Territories<br>
General habit<br>
&nbsp; tree<br>
&nbsp; shrub<br>
&nbsp; climber (woody or herbaceous)<br>
&nbsp; herb<br>
&nbsp; grass- or sedge-like plant<br>
Seasonal longevity<br>
&nbsp; annual, biennial or ephemeral<br>
&nbsp; perennial<br>
<br>
[..Taxon List..]<br>
Acanthaceae<br>
Aceraceae<br>
Actinidiaceae<br>
Agavaceae<br>
Aizoaceae<br>
Akaniaceae<br>
Alangiaceae<br>
Alismataceae<br>
Aloaceae<br>
Alseuosmiaceae<br>
<br>
[..Main Data (txs)..]<br>
101101111111<br>
100100000101<br>
101000000010<br>
011110111111<br>
101111111111<br>
100100000011<br>
101101000011<br>
011111011111<br>
011100100111<br>
101100000010</div>
</td>
<td width="50%">
<div>
DELTA file</div>
*SHOW: Gentianella - character list. Last revised 16 April 1997.<br>
<br>
*CHARACTER LIST <br>
<br>
#1. plants/<br>
1. monocarpic/<br>
2. polycarpic/<br>
<br>
#2. &lt;plants lifecycle&gt;/<br>
1. annual/<br>
2. biennial/<br>
3. perennial/<br>
<br>
#3. height in flower/<br>
&lt;&gt; cm/<br>
<br>
#4. caudex/<br>
1. unbranched/<br>
2. branched/<br>
<br>
*ITEM DESCRIPTIONS <br>
<br>
# Gentianella amabilis/<br>
1,2 2,3 3,3-13 4,1<br>
<br>
# Gentianella antarctica/<br>
1,1 2,1&lt;Godley 1982&gt; 3,1.6-22.0&lt;Godley 1982&gt; 4,1<br>
<br>
# Gentianella antipoda/<br>
1,1&lt;Godley 1982&gt; 2,2 3,3.5-9.8-24 4,1/2&lt;depends on size of plant&gt;<br>
<br>
# Gentianella astonii/<br>
1,2 2,3 3,15 4,2<br>
<br>
# Gentianella cerina/<br>
1,2 2,3 3,9-17 4,1/2<br>
<br>
#Gentianella concinna/<br>
1,1 2,1 3,2.7-15.0 4,1<br>
&nbsp;</div>
</td>
<td width="50%">&nbsp;<p>&nbsp;</td>
</tr>
</table>
</div>
Raw data descriptions (Box 1.2.4) usually comprise repeated measurements of parts of individual specimens, and are the basis from which the more abstracted descriptions in natural language and coded descriptions are derived. Few taxonomists consistently record and archive their raw data in a standardised format.
----++++Box 1.2.4 - Example of raw (specimen) descriptive data
<div>
<table bgcolor="#ddddff" border="1" width="80%" cellspacing="0" id="table10" cellpadding="0">
<tr>
<td rowspan="2" align="center">Specimen</td>
<td colspan="5" align="center">Spore length</td>
<td colspan="5" align="center">Spore width</td>
<td width="163" rowspan="2" align="center">Spore colour</td>
</tr>
<tr>
<td width="25" align="center">1</td>
<td width="25" align="center">2</td>
<td width="25" align="center">3</td>
<td width="25" align="center">4</td>
<td width="25" align="center">5</td>
<td width="25" align="center">1</td>
<td width="25" align="center">2</td>
<td width="25" align="center">3</td>
<td width="25" align="center">4</td>
<td width="25" align="center">5</td>
</tr>
<tr>
<td align="center">TJM45337</td>
<td width="25" align="center">12</td>
<td width="25" align="center">13</td>
<td width="25" align="center">12</td>
<td width="25" align="center">15</td>
<td width="25" align="center">11</td>
<td width="25" align="center">8</td>
<td width="25" align="center">8</td>
<td width="25" align="center">7</td>
<td width="25" align="center">6</td>
<td width="25" align="center">6</td>
<td width="163" align="center">brown</td>
</tr>
<tr>
<td align="center">TLM33466</td>
<td width="25" align="center">15</td>
<td width="25" align="center">18</td>
<td width="25" align="center">17</td>
<td width="25" align="center">17</td>
<td width="25" align="center">15</td>
<td width="25" align="center">10</td>
<td width="25" align="center">8</td>
<td width="25" align="center">9</td>
<td width="25" align="center">9</td>
<td width="25" align="center">10</td>
<td width="163" align="center">yellow</td>
</tr>
</table>
</div>
---+++1.3 Goals of SDD
The goal of the SDD standard is to allow capture, transport, caching and archiving of descriptive data in all the forms shown above, using a platform- and application-independent, international standard. Such a standard is crucial to enabling lossless porting of data between existing and future software platforms including identification, data-mining and analysis tools, and federated databases.
---++++The SDD Standard:
* provides a flexible, platform-independent data structure for the capture and storage of taxonomic descriptions
* comprises a superset of data requirements of all existing programs
* provides extension beyond existing programs where data requirements can be predicted
* is readily extensible to account for future developments and data requirements
* is human-readable (although it is assumed that in almost all cases standard descriptions will be machine-generated and processed)
* is XML-based, and provides a schema for validation of documents.
---++++It facilitates:
* lossless porting of data between standard-aware applications
* achievable progressive markup of legacy descriptions, particularly natural-language descriptions
* comparability and, if possible, combinability of alternate descriptions of any one taxon
* efficient multi-tasking of descriptions (one description serving alternate purposes)
* archiving and sharing of raw and processed data
---++++SDD documents may include all or some of the following:
* Terminologies (characters, states, modifiers, char. trees, higher concepts)
* Structured (coded) data
* Sample data (e. g., measurements)
* Unstructured natural language data
* Natural language data with markup
* Dichotomous or polytomous keys
* Resources associated with descriptions (e. g., images, references, links)
---+++SDD is currently not designed to accommodate:
* Molecular sequence and other genetic data (future plans exist)
* Occurrence and specimen data (e. g., distribution maps)
* Complex ecological data such as models and ecological observations
* Organism interactions (host-parasite, plant-pollinator, predator-prey, etc.)
* Nomenclatural and formal systematic (rank) information
---+++1.4 SDD Streams
This Primer is structured into several streams, each describing how SDD can help you with a specific task. For each stream, the Primer describes the core elements of SDD used to capture information related to the task, and provides examples extending from simple to complex.
The four main tasks (uses) of SDD are as follows:
* [[CodedData][Using SDD for coded summary descriptions]]
* [[CodedSampleDescription][Using SDD for coded raw (sample) descriptions]]
* [[NaturalLanguageDescriptions][Using SDD for natural language descriptions]]
* [[DichotomousKeys][Using SDD for dichotomous (or polytomous) identification keys]]
Choose a link above to enter one of the SDD streams.
In addition to the streams, a number of SDD elements are common to all streams. Choose a link below to find out how to:
* [[TechnicalMetadata][Describe metadata for a providing application]]
* [[DatasetMetadata][Describe metadata for an SDD dataset]]
* [[SddMedia][Attach media (images etc) to SDD items]]
-- Main.KevinThiele - 07 Jul 2006