head 1.13; access; symbols; locks; strict; comment @# @; 1.13 date 2009.11.25.04.19.27; author GarryJolleyRogers; state Exp; branches; next 1.12; 1.12 date 2009.11.25.03.14.35; author GarryJolleyRogers; state Exp; branches; next 1.11; 1.11 date 2009.11.20.02.45.27; author LeeBelbin; state Exp; branches; next 1.10; 1.10 date 2008.03.12.22.31.52; author GregorHagedorn; state Exp; branches; next 1.9; 1.9 date 2007.03.06.17.30.00; author TWikiGuest; state Exp; branches; next 1.8; 1.8 date 2006.05.10.09.13.56; author GregorHagedorn; state Exp; branches; next 1.7; 1.7 date 2006.05.09.14.01.45; author KehanHarman; state Exp; branches; next 1.6; 1.6 date 2005.06.14.10.26.13; author RobBuis; state Exp; branches; next 1.5; 1.5 date 2004.12.10.20.51.27; author JenniferForman; state Exp; branches; next 1.4; 1.4 date 2004.12.07.18.25.10; author JenniferForman; state Exp; branches; next 1.3; 1.3 date 2004.11.25.02.44.30; author JenniferForman; state Exp; branches; next 1.2; 1.2 date 2004.11.25.00.27.48; author JenniferForman; state Exp; branches; next 1.1; 1.1 date 2004.10.17.01.05.00; author KevinThiele; state Exp; branches; next ; desc @none @ 1.13 log @none @ text @%META:TOPICINFO{author="GarryJolleyRogers" date="1259122767" format="1.1" version="1.13"}% %META:TOPICPARENT{name="BDI.SDD_"}% This page has been replaced by http://wiki.tdwg.org/twiki/bin/view/BDI.SDD/Primer/BDI.SDD %META:TOPICMOVED{by="GregorHagedorn" date="1147251845" from="SDD.SddPrimer" to="SDD.PrimerHome"}% @ 1.12 log @none @ text @d1 1 a1 1 %META:TOPICINFO{author="GarryJolleyRogers" date="1259118875" format="1.1" version="1.12"}% d3 1 a3 1 This page has been replaced by http://wiki.tdwg.org/twiki/bin/view/BDI.SDD_/Primer/BDI.SDD_ @ 1.11 log @none @ text @d1 3 a3 3 %META:TOPICINFO{author="LeeBelbin" date="1258685127" format="1.1" reprev="1.11" version="1.11"}% %META:TOPICPARENT{name="BDI.SDD"}% This page has been replaced by http://wiki.tdwg.org/twiki/bin/view/BDI.SDD/Primer/BDI.SDD @ 1.10 log @none @ text 
@d1 3 a3 3 %META:TOPICINFO{author="GregorHagedorn" date="1205361112" format="1.1" version="1.10"}% %META:TOPICPARENT{name="SDD.WebHome"}% This page has been replaced by http://wiki.tdwg.org/twiki/bin/view/SDD/Primer/WebHome @ 1.9 log @Added topic name via script @ text @d1 1 a1 3 ---+!! %TOPIC% %META:TOPICINFO{author="GregorHagedorn" date="1147252436" format="1.1" version="1.8"}% d3 1 a3 582 ---+SDD Part 0: Introduction and Primer to the SDD Standard *NOTE: this version is referring to an older version of SDD and is currently being reworked. Please come back later!* Gregor: I have renamed this page to !PrimerHome and propose to trim it down to a shorter size and branch out to multiple pages, all starting with "Primer". Examples: PrimerIntroduction, PrimerCodedDescriptions or PrimerForCodedDescriptions, PrimerNaturalLanguageDescriptions PrimerForNaturalLanguageDescriptions, PrimerIdentificationKeys or PrimerForDichotomousKeys, etc. Also, when reworking, please try to avoid html. Html is legal and ok if truly needed (e.g. for complex tables), but otherwise makes future edits and commenting difficult. For the examples, please use rather than encoding the greater/less than characters. ---+++Abstract SDD Part 0 is a non-normative introduction to the Taxonomic Databases Working Group SDD (Structure of Descriptive Data) Standard. Its intention is to provide a background, introduction and primer to the SDD Standard, with examples. Since the SDD Standard is a work-in-progress, this document will be updated from time to time. ---+++Status of this document Version: 17 Oct. 2004 Edited: [[Kevin.Thiele][Kevin Thiele]] (Centre for Biological Information Technology, University of Queensland), with financial support from the Gordon and Betty Moore Foundation (www.moore.org). 
To contribute to the discussion on the SDD Standard and to comment on this document, please use this Wiki or the MailingList.

---+++Relationship between SDD and other TDWG standards

TDWG maintains and is developing other standards that relate to the SDD standard, including the Access to Biological Collections Data (ABCD) and Taxonomic Concept Names standards. TDWG is developing a common, shared base schema for SDD and these related schemas, called UBIF.

---+++SDD Version History

Version 0.9 beta: released for comment December 2003.
Version 1.0 beta: released August 2004.

---++1.0 Introduction

---+++1.1 Background to the TDWG-SDD Subgroup

In September 1998 the Taxonomic Databases Working Group (TDWG) of the International Union of Biological Sciences (IUBS) established the Structure of Descriptive Data (SDD) subgroup. TDWG’s role is to facilitate and manage the development of international standards in the taxonomic domain. The SDD subgroup was established to develop an international XML-based standard for capturing and managing descriptive data for organisms.

Development of the SDD standard was initiated in response to recognition that the existing standard previously endorsed by TDWG – the DELTA data standard developed at CSIRO in Canberra from 1971 and adopted by TDWG as a descriptive data standard in 1991 – had become inadequate (FAQ: Why not continue to use DELTA?).

The SDD subgroup began discussing and scoping a standard through an email discussion group in November 1999 (see the SDD email list archives). Considerable progress has been made at face-to-face meetings amongst a small group of core contributors, in Nov. 2001 (Canberra), Oct. 2002 (Sao Paulo), Feb. 2003 (Paris), October 2003 (Lisbon), May 2004 (Berlin) and Oct. 2004 (Christchurch).

---+++1.2 The nature of descriptive data in taxonomy

In taxonomy, descriptive data takes a number of very different forms.

Natural-language descriptions (Box 1.2.1) are semi-structured, semi-formalised descriptions of a taxon (or occasionally of an individual specimen). They may be simple, short and written in plain language (if used for a popular field guide), or long, highly formal and using specialised terminology when used in a taxonomic monograph or other treatment.
Box 1.2.1 - Typical natural language descriptions

Red Knot (Calidris canutus)
Stout wader with bill same length as head, crown unstreaked, narrow white bar in wing, pale rump with grey barring, shortish olive legs. Non-breeding: grey above with narrow pale edging to feathers, pale eyebrow, smudged sides to neck with faint spotting. Juvenile: feathers of back edged white with dark subterminal bar, breast more heavily spotted pale buff and flanks barred, crown faintly streaked. Breeding: rufous underparts, feathers of back rufous patterned with black. Voice: 'knut-knut', 'nyui', high-pitched 'toowit-wit'.

from Slater, P., Slater, P. & Slater, R. (2001) The Slater Field Guide to Australian Birds (Reed New Holland: Sydney)

Discaria pubescens (Brongn.) Druce
Rigid, spreading shrub to c. 1 m high and wide; stems glabrous. Leaves soon deciduous, c. oblong, to 10 mm long, 3 mm wide, obtuse or minutely mucronate within an apical notch, margins minutely toothed, surfaces glabrous or a few hairs present near tip; stipules dark reddish-brown, c. 1 mm long, often shallowly joined around the node, pubescent on inner face; spines stout, 1.5-4 cm long. Flowers white, solitary or in few-flowered axillary cymes, sometimes congested on short apical shoots; pedicels 2-3 mm long; hypanthium c. 1.5 mm long; sepals somewhat spreading, 1-1.5 mm long; petals attached at throat of hypanthium, c. 1 mm long; stamens subequal to and weakly hooded by petals; disc prominent, lining base of hypanthium, obscurely 5-angled; style minute. Capsule prominently 3-lobed, 4-5 mm diam., the valves separating incompletely at maturity and splitting dorsally and medially.

from Walsh, N.G. (1999) Rhamnaceae, in N.G.Walsh & T.J.Entwisle, Flora of Victoria Volume 4, Dicotyledons, Cornaceae to Asteraceae (Inkata Press: Melbourne)

 

Dichotomous keys (Box 1.2.2) are specialised identification tools comprising fragments of descriptive data arranged in couplets forming a branching tree. Each fragment (lead) comprises a small (occasionally verbose) natural-language description.

Box 1.2.2 - Typical dichotomous keys

Key to Ascomycete genera  

Ascus unitunicate
 
    Clypeus present around ostiolar neck  
        Clypeus poorly developed Glomerella
        Clypeus well developed Phyllachora
    Clypeus lacking  
        Ascus widest in middle Physalospora
        Ascus clavate or cylindrical Glomerella
Ascus bitunicate  
    Ascostroma uniloculate Guignardia
    Ascostroma multiloculate Botryosphaeria

1 Dark upper lateral zone with one or more distinct series of pale spots or blotches along the body 2
1a Dark upper lateral zone obscurely mottled or uniform with at most a few pale spots anteriorly 3
2 Fewer than 25 lamellae under the fourth toe; supralabials 7-8 (usually 7); prefrontals separated C. arcanus
2a More than 25 lamellae under the fourth toe; supralabials 8-9 (usually 8); prefrontals usually in contact C. alleni
3 Pale mid-lateral stripe passes over the hindlimb to continue along the tail C. inornatus
3a Pale mid-lateral stripe extends to groin, then continues along the front edge of the hindlimb C. coggeri
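
The branching structure of a numbered key such as the second one above can be sketched as a small lookup table. This is an illustrative Python sketch only: the `KEY` dictionary, the shortened lead texts and the `identify` function are hypothetical names introduced here, and nothing in it is part of SDD or of any existing key software.

```python
# Hypothetical in-memory form of the numbered key above.
# Each couplet number maps to its two leads; every lead ends either in
# another couplet number (int) or in a taxon name (str).
KEY = {
    1: [
        ("Dark upper lateral zone with distinct series of pale spots or blotches", 2),
        ("Dark upper lateral zone obscurely mottled or uniform", 3),
    ],
    2: [
        ("Fewer than 25 lamellae under the fourth toe", "C. arcanus"),
        ("More than 25 lamellae under the fourth toe", "C. alleni"),
    ],
    3: [
        ("Pale mid-lateral stripe passes over the hindlimb", "C. inornatus"),
        ("Pale mid-lateral stripe extends to groin", "C. coggeri"),
    ],
}


def identify(key, choices, start=1):
    """Follow one lead per couplet (0 = first lead, 1 = second lead)."""
    node = start
    for choice in choices:
        _lead_text, node = key[node][choice]
        if isinstance(node, str):
            # A string destination is a taxon name: identification is complete.
            return node
    raise ValueError("choices ended before a taxon was reached")


if __name__ == "__main__":
    # Second lead of couplet 1, then first lead of couplet 3.
    print(identify(KEY, [1, 0]))
```

Storing the destination of each lead as either a couplet number or a taxon name keeps the tree shape explicit, which is the property a key representation has to preserve.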

 

Coded descriptions (Box 1.2.3) comprise highly structured data used in computer identification and analysis programs such as Lucid (www.lucidcentral.org) and DELTA (delta-intkey.com), and in phylogenetic analysis programs such as PAUP (http://paup.csit.fsu.edu/).

Box 1.2.3 - Simple examples of coded descriptions
Lucid Interchange Format (LIF) file

#Lucid Interchange Format File v. 2.1

[..Character List..]
Distribution by region
  Tropical North
  Subtropical and Temperate East and South
  South West
  Arid & Semi-arid (Central)
  Island Territories
General habit
  tree
  shrub
  climber (woody or herbaceous)
  herb
  grass- or sedge-like plant
Seasonal longevity
  annual, biennial or ephemeral
  perennial

[..Taxon List..]
Acanthaceae
Aceraceae
Actinidiaceae
Agavaceae
Aizoaceae
Akaniaceae
Alangiaceae
Alismataceae
Aloaceae
Alseuosmiaceae

[..Main Data (txs)..]
101101111111
100100000101
101000000010
011110111111
101111111111
100100000011
101101000011
011111011111
011100100111
101100000010
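
One possible reading of the LIF excerpt above is sketched below in Python. It rests on assumptions that this document does not confirm: that each row of the main data block belongs to one taxon, in taxon-list order, and that each digit scores one state, in character-list order, with `1` meaning the state is scored. The counts are consistent with this reading (5 + 5 + 2 = 12 states, 12 digits per row, 10 taxa, 10 rows), but it is not a description of the actual Lucid format. The function name `decode_row` is hypothetical.

```python
# Character list copied from the LIF excerpt above (character -> states).
CHARACTERS = {
    "Distribution by region": [
        "Tropical North",
        "Subtropical and Temperate East and South",
        "South West",
        "Arid & Semi-arid (Central)",
        "Island Territories",
    ],
    "General habit": [
        "tree",
        "shrub",
        "climber (woody or herbaceous)",
        "herb",
        "grass- or sedge-like plant",
    ],
    "Seasonal longevity": ["annual, biennial or ephemeral", "perennial"],
}


def decode_row(row, characters):
    """Split one row of digits into the states flagged '1' for each character.

    ASSUMPTION: one digit per state, in character-list order.
    """
    total_states = sum(len(states) for states in characters.values())
    if len(row) != total_states:
        raise ValueError(f"expected {total_states} digits, got {len(row)}")
    decoded = {}
    position = 0
    for name, states in characters.items():
        flags = row[position:position + len(states)]
        decoded[name] = [s for s, flag in zip(states, flags) if flag == "1"]
        position += len(states)
    return decoded


if __name__ == "__main__":
    # First data row, paired with the first taxon (Acanthaceae) under the assumption above.
    for character, states in decode_row("101101111111", CHARACTERS).items():
        print(character, "->", states)
```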

DELTA file

*SHOW: Gentianella - character list. Last revised 16 April 1997.

*CHARACTER LIST

#1. plants/
1. monocarpic/
2. polycarpic/

#2. <plants lifecycle>/
1. annual/
2. biennial/
3. perennial/

#3. height in flower/
<> cm/

#4. caudex/
1. unbranched/
2. branched/

*ITEM DESCRIPTIONS

# Gentianella amabilis/
1,2 2,3 3,3-13 4,1

# Gentianella antarctica/
1,1 2,1<Godley 1982> 3,1.6-22.0<Godley 1982> 4,1

# Gentianella antipoda/
1,1<Godley 1982> 2,2 3,3.5-9.8-24 4,1/2<depends on size of plant>

# Gentianella astonii/
1,2 2,3 3,15 4,2

# Gentianella cerina/
1,2 2,3 3,9-17 4,1/2

# Gentianella concinna/
1,1 2,1 3,2.7-15.0 4,1
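
The attribute lines in the item descriptions above follow a visible pattern: a character number, a comma, a value, and an optional comment in angle brackets. A minimal Python sketch that splits such a line is given below. It covers only the subset of the notation shown in this box, not the full DELTA format, and the function name `parse_attributes` is hypothetical.

```python
import re

# character number, comma, value (no spaces or '<'), optional <comment>
ATTRIBUTE = re.compile(r"(\d+),([^\s<]+)(?:<([^>]*)>)?")


def parse_attributes(line):
    """Return {character number: (raw value, comment or None)} for one item line."""
    attributes = {}
    for match in ATTRIBUTE.finditer(line):
        number, value, comment = match.groups()
        attributes[int(number)] = (value, comment)
    return attributes


if __name__ == "__main__":
    line = "1,1<Godley 1982> 2,2 3,3.5-9.8-24 4,1/2<depends on size of plant>"
    for number, (value, comment) in parse_attributes(line).items():
        print(number, value, comment)
```

Values are kept as raw strings here (for example `1/2` or `3.5-9.8-24`) because interpreting them requires knowing the type of each character, which the sketch does not attempt.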
 

 

 

 

Raw data descriptions (Box 1.2.4) usually comprise repeated measurements of parts of individual specimens, and are the basis from which the more abstracted descriptions in natural language and coded descriptions are derived. Few taxonomists consistently record and archive their raw data in a standardised format.

Box 1.2.4 - Example of raw (specimen) descriptive data

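| *Specimen* | *Spore length* ||||| *Spore width* ||||| *Spore colour* |
| | 1 | 2 | 3 | 4 | 5 | 1 | 2 | 3 | 4 | 5 | |
| TJM45337 | 12 | 13 | 12 | 15 | 11 | 8 | 8 | 7 | 6 | 6 | brown |
| TLM33466 | 15 | 18 | 17 | 17 | 15 | 10 | 8 | 9 | 9 | 10 | yellow |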
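
As a hedged illustration of how more abstracted descriptions can be derived from raw data, the Python sketch below reduces the repeated measurements in Box 1.2.4 to a minimum, mean and maximum per specimen. The `RAW` dictionary and the `summarise` function are hypothetical names, and the choice of these three statistics is an assumption made for the example, not something the SDD standard prescribes.

```python
# Raw measurements copied from Box 1.2.4 (no units are given in the source).
RAW = {
    "TJM45337": {"Spore length": [12, 13, 12, 15, 11], "Spore width": [8, 8, 7, 6, 6]},
    "TLM33466": {"Spore length": [15, 18, 17, 17, 15], "Spore width": [10, 8, 9, 9, 10]},
}


def summarise(values):
    """Return (minimum, mean, maximum) of a list of repeated measurements."""
    if not values:
        raise ValueError("no measurements to summarise")
    return min(values), sum(values) / len(values), max(values)


if __name__ == "__main__":
    for specimen, characters in RAW.items():
        for character, values in characters.items():
            low, mean, high = summarise(values)
            print(f"{specimen} {character}: {low}-{mean:.1f}-{high}")
```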

 

---+++1.3 Goals of SDD

The goal of the SDD standard is to allow capture, transport, caching and archiving of descriptive data in all the forms shown above, using a platform- and application-independent, international standard. Such a standard is crucial to enabling lossless porting of data between existing and future software platforms including identification, data-mining and analysis tools, and federated databases.

The SDD Standard:

It facilitates:

---++2.0 Basic structure of a simple SDD instance document

The simplest possible description comprises a single descriptive statement about an organism, taxon or object. An example of such a description is given in Box 2.0.1, and its SDD representation in Example 2.0.1.

Box 2.0.1 - A simple description

Viola hederacea Labill.
Leaves simple

 

Example 2.0.1 - Description in Box 2.0.1 represented in SDD

<?xml version='1.0' encoding='UTF-8'?>
<Datasets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.tdwg.org/2004/UBIF" xsi:schemaLocation="http://www.tdwg.org/2004/UBIF SDD.xsd">
  <Dataset>
    <Derivation datetime='2004-10-17T06:50:13'>
      <Generator name='By Hand' version='1'/>
    </Derivation>
    <ExternalDataInterface>
      <ClassNames>
        <ClassName id='1'>
          <Label>
            <Representation language='en'>
              <Text>Viola hederacea</Text>
            </Representation>
          </Label>
        </ClassName>
      </ClassNames>
      <Agents>
        <Agent id='1'>
          <Label>
            <Representation language='en'>
              <Text>A. Botanist</Text>
            </Representation>
          </Label>
        </Agent>
      </Agents>
    </ExternalDataInterface>
    <Metadata>
      <Description>
        <Representation language='en'>
          <Title>Descriptive statement for a Viola</Title>
        </Representation>
      </Description>
    </Metadata>
    <DescriptiveData>
      <Terminology>
        <Characters>
          <CategoricalCharacter id='1'>
            <Label>
              <Representation language='en'>
                <Text>Leaf complexity</Text>
              </Representation>
            </Label>
            <States>
              <StateDefinition id='1'>
                <Label>
                  <Representation language='en'>
                    <Text>simple</Text>
                  </Representation>
                </Label>
              </StateDefinition>
            </States>
          </CategoricalCharacter>
        </Characters>
      </Terminology>
      <CodedDescriptions>
        <CodedDescription id='0'>
          <Header>
            <ClassName ref='1'/>
          </Header>
          <SummaryData>
            <Categorical ref='1'>
              <State ref='1'/>
            </Categorical>
          </SummaryData>
        </CodedDescription>
      </CodedDescriptions>
    </DescriptiveData>
  </Dataset>
</Datasets>

In the SDD document in Example 2.0.1, data are wrapped in a <Dataset> element. Several datasets may be wrapped in a single SDD document, in the <Datasets> container element.

The <Derivation> element provides information about the way in which the data were created, including the date and time at which the data were generated, and the application or other method by which the document was created.

The <ExternalDataInterface> element is used to wrap data that may be provided by an external web service (in this case, the data are internal to the document). In this element, the name of the taxon (Viola hederacea) is provided in the <ClassNames> element, and the name of the author of the document in the <Agents> element.

Metadata for the project that provided the data is given in the <Metadata> element. In this case, only a title for the data set is provided.

The description is provided in the <DescriptiveData> element, using a character and state (character = Leaf complexity; state = simple) defined in the <Terminology> element. The <CodedDescription> element contains the description itself, using references to identify the taxon (class), character and state being described.
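
To make the reference mechanism concrete, the Python sketch below reads a trimmed copy of Example 2.0.1 and resolves each `ref` attribute back to the label of the element that carries the matching `id`. It uses only the standard-library `xml.etree.ElementTree` module and does not validate against the SDD schema. The function names `label` and `read_statements` are hypothetical, and the sketch assumes that state ids are looked up within their own character, which the example is consistent with but does not prove.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the root element of Example 2.0.1.
NS = {"u": "http://www.tdwg.org/2004/UBIF"}

# Trimmed copy of Example 2.0.1 (Derivation, Agents and Metadata omitted).
SDD_XML = """<Datasets xmlns="http://www.tdwg.org/2004/UBIF">
  <Dataset>
    <ExternalDataInterface>
      <ClassNames>
        <ClassName id='1'><Label><Representation language='en'>
          <Text>Viola hederacea</Text></Representation></Label></ClassName>
      </ClassNames>
    </ExternalDataInterface>
    <DescriptiveData>
      <Terminology>
        <Characters>
          <CategoricalCharacter id='1'>
            <Label><Representation language='en'>
              <Text>Leaf complexity</Text></Representation></Label>
            <States>
              <StateDefinition id='1'><Label><Representation language='en'>
                <Text>simple</Text></Representation></Label></StateDefinition>
            </States>
          </CategoricalCharacter>
        </Characters>
      </Terminology>
      <CodedDescriptions>
        <CodedDescription id='0'>
          <Header><ClassName ref='1'/></Header>
          <SummaryData><Categorical ref='1'><State ref='1'/></Categorical></SummaryData>
        </CodedDescription>
      </CodedDescriptions>
    </DescriptiveData>
  </Dataset>
</Datasets>"""


def label(element):
    """Return the text of an element's Label/Representation/Text child."""
    return element.find("u:Label/u:Representation/u:Text", NS).text.strip()


def read_statements(xml_text):
    """Return (taxon, character, state) tuples with all refs resolved to labels."""
    root = ET.fromstring(xml_text)
    statements = []
    for dataset in root.findall("u:Dataset", NS):
        taxa = {c.get("id"): label(c) for c in dataset.iterfind(".//u:ClassNames/u:ClassName", NS)}
        characters = {}
        for ch in dataset.iterfind(".//u:Characters/u:CategoricalCharacter", NS):
            states = {s.get("id"): label(s) for s in ch.iterfind("u:States/u:StateDefinition", NS)}
            characters[ch.get("id")] = (label(ch), states)
        for desc in dataset.iterfind(".//u:CodedDescription", NS):
            taxon = taxa[desc.find("u:Header/u:ClassName", NS).get("ref")]
            for cat in desc.iterfind("u:SummaryData/u:Categorical", NS):
                char_label, states = characters[cat.get("ref")]
                for state in cat.iterfind("u:State", NS):
                    statements.append((taxon, char_label, states[state.get("ref")]))
    return statements


if __name__ == "__main__":
    print(read_statements(SDD_XML))
```

Separating terminology (definitions with `id`) from descriptions (uses with `ref`) is what allows many coded descriptions to share one character list.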

FAQ: Why are SDD documents so verbose and complex?

---++3.0 Beyond the simple instance...

Example 2.0.1 describes only the simplest of SDD structures. To go further, the Primer provides several pathways or streams, depending on what you wish to use SDD for. On each stream, the Primer introduces the basic concepts first, then branches to more complex examples.

Before entering the first stream, you should understand the <Derivation> and <ExternalDataInterface> elements.

For more information on the relationships between the SDD and UBIF schemas, read the topic SDD and UBIF Schemas.

The streams are:
     Using SDD for coded data
     Using SDD for natural language descriptions
     Using SDD for dichotomous keys
     Using SDD for raw observation data

KRT Last Edit: 16 Jan 04

@ 1.8 log @none @ text @d1 2 @ 1.7 log @none @ text @d1 1 a1 1 %META:TOPICINFO{author="KehanHarman" date="1147183305" format="1.1" version="1.7"}% d3 1 a3 1 d5 1 a5 7 SDD Primer - Home d7 1 a7 1 d9 21 a29 13
%SDDLOGOIMG%

Index

SDD Part 0: Introduction and Primer to the SDD Standard

a30 29

Abstract

SDD Part 0 is a non-normative introduction to the Taxonomic Databases Working Group SDD (Structure of Descriptive Data) Standard. Its intention is to provide a background, introduction and primer to the SDD Standard, with examples. Since the SDD Standard is a work-in-progress, this document will be updated from time to time.

Status of this document

Version: 17 Oct. 2004

Edited: Kevin Thiele (Centre for Biological Information Technology, University of Queensland), with financial support from the Gordon and Betty Moore Foundation (www.moore.org).

Complete documentation of the SDD Schema is available on the SDD web site.

To contribute to the discussion on the SDD Standard and to comment on this document, please join the SDD Wiki or SDD discussion list.

Relationship between SDD and other TDWG standards

TDWG maintains and is developing other standards that relate to the SDD standard, including the Access to Biological Collections Data (ABCD) and Taxonomic Concept Names standards. TDWG is developing a common, shared base schema for SDD and these related schemas, called UBIF.

SDD Version History

d35 19 a53 34

1.0 Introduction

1.1 Background to the TDWG-SDD Subgroup

In September 1998 the Taxonomic Databases Working Group (TDWG) of the International Union of Biological Sciences (IUBS) established the Structure of Descriptive Data (SDD) subgroup. TDWG’s role is to facilitate and manage the development of international standards in the taxonomic domain. The SDD subgroup was established to develop an international XML-based standard for capturing and managing descriptive data for organisms.

Development of the SDD standard was initiated in response to recognition that the existing standard previously endorsed by TDWG – the DELTA data standard developed at CSIRO in Canberra from 1971 and adopted by TDWG as a descriptive data standard in 1991 – had become inadequate (FAQ: Why not continue to use DELTA?).

The SDD subgroup began discussing and scoping a standard through an email discussion group in November 1999 (see the SDD email list archives). Considerable progress has been made at face-to-face meetings amongst a small group of core contributors, in Nov. 2001 (Canberra), Oct. 2002 (Sao Paulo), Feb. 2003 (Paris), October 2003 (Lisbon), May 2004 (Berlin) and Oct. 2004 (Christchurch).

1.2 The nature of descriptive data in taxonomy

In taxonomy, descriptive data takes a number of very different forms.

Natural-language descriptions (Box 1.2.1) are semi-structured, semi-formalised descriptions of a taxon (or occasionally of an individual specimen). They may be simple, short and written in plain language (if used for a popular field guide), or long, highly formal and using specialised terminology when used in a taxonomic monograph or other treatment.

a584 2 d586 1 a586 1 %META:TOPICMOVED{by="JenniferForman" date="1102711827" from="SDDPrimer.SddPrimer" to="SDD.SddPrimer"}% @ 1.6 log @none @ text @d1 1 a1 1 %META:TOPICINFO{author="RobBuis" date="1118744773" format="1.0" version="1.6"}% d17 4 a20 4 %SDDLOGOIMG%

Index d23 1 a23 1 d25 1 a25 1 d68 1 a68 1 (SDD) subgroup. TDWG’s role is to facilitate and manage the development of d74 2 a75 2 the existing standard previously endorsed by TDWG – the DELTA data standard developed d77 1 a77 1 standard in 1991 – had become inadequate (FAQ: d102 6 a107 6 in wing, pale rump with grey barring, shortish olive legs. Non-breeding: grey above with narrow pale edging to feathers, pale eyebrow, smudged sides to neck with faint spotting. Juvenile: feathers of back edged white with dark subterminal bar, breast more heavily spotted pale buff and flanks barred, crown faintly streaked. Breeding: rufous underparts, feathers of back rufous patterned with black. Voice: 'knut-knut', `nyui , high-pitched `toowit-wit'.

d139 113 a251 113
Key to Ascomycete genera  

Ascus unitunicate
 
    Clypeus present around ostiolar neck  
        Clypeus poorly developed Glomerella
        Clypeus well developed Phyllachora
    Clypeus lacking  
        Ascus widest in middle Physalospora
        Ascus clavate or cylindrical Glomerella
Ascus bitunicate  
    Ascostroma uniloculate Guignardia
    Ascostroma multiloculate Botryosphaeria

1 Dark upper lateral zone with one or more distinct series of pale spots or blotches along the body 2
1a Dark upper lateral zone obscurely mottled or uniform with at most a few pale spots anteriorly 3
2 Fewer than 25 lamellae under the fourth toe; supralabials 7-8 (usually 7); prefrontals separated C. arcanus
2a More than 25 lamellae under the fourth toe; supralabials 8-9 (usually 8); prefrontals usually in contact C. alleni
3 Pale mid-lateral stripe passes over the hindlimb to continue along the tail C. inornatus
3a Pale mid-lateral stripe extends to groin, then continues along the front edge of the hindlimb C. coggeri

d256 2 a257 3 (hyperlink) and a suite of phylogenetic analysis programs such as PAUP (hyperlink).

d263 1 a263 1 d315 2 a316 2 d365 2 a366 2  

  d381 1 a381 1 data d384 50 a433 50
Specimen Spore length Spore width Spore colour
1 2 3 4 5 1 2 3 4 5
TJM45337 12 13 12 15 11 8 8 7 6 6 brown
TLM33466 15 18 17 17 15 10 8 9 9 10 yellow

d449 1 a449 1

  • provides a flexible, platform-independent data structure for the capture d451 2 a452 2
  • comprises a superset of data requirements of all existing programs
  • provides extension beyond existing programs where data requirements can be d454 1 a454 1
  • is readily extensible to account for future developments and data d456 1 a456 1
  • is human-readable (although it is assumed that in almost all cases d459 1 a459 1
  • is XML-based, and provides a schema for validation of documents.
  • d472 1 a472 1
  • archiving and sharing of raw and processed data
  • d498 64 a561 64 <Derivation datetime='2004-10-17T06:50:13'> <Generator name='By Hand' version='1'/> </Derivation> <ExternalDataInterface> <ClassNames> <ClassName id='1'> <Label> <Representation language='en'> <Text>Viola hederacea</Text> </Representation> </Label> </ClassName> </ClassNames> <Agents> <Agent id='1'> <Label> <Representation language='en'> <Text>A. Botanist</Text> </Representation> </Label> </Agent> </Agents> </ExternalDataInterface> <Metadata> <Description> <Representation language='en'> <Title>Descriptive statement for a Viola</Title> </Representation> </Description> </Metadata> <DescriptiveData> <Terminology> <Characters> <CategoricalCharacter id='1'> <Label> <Representation language='en'> <Text>Leaf complexity</Text> </Representation> </Label> <States> <StateDefinition id='1'> <Label> <Representation language='en'> <Text>simple</Text> </Representation> </Label> </StateDefinition> </States> </CategoricalCharacter> </Characters> </Terminology> <CodedDescriptions> <CodedDescription id='0'> <Header> <ClassName ref='1'/> </Header> <SummaryData> <Categorical ref='1'> <State ref='1'/> </Categorical> </SummaryData> </CodedDescription> </CodedDescriptions> </DescriptiveData> d605 8 a612 8 further, the Primer provides several pathways or streams, depending on what you wish to use SDD for. On each stream, the Primer will introduce the basic concepts first, then branch to more complex examples.

    Before entering the first stream, you should understand the
    <Derivation> and <ExternalDataInterface> elements.

    d615 1 a615 1 SDD and UBIF Schemas.

    d629 1 @ 1.5 log @none @ text @d1 1 a1 1 %META:TOPICINFO{author="JenniferForman" date="1102711887" format="1.0" version="1.5"}% d254 1 a254 1  

    Coded descriptions (Box 1.2.2) comprise highly structured data used in d260 1 a260 1

    Box 1.2.2 - Simple examples of coded descriptions
    d374 1 a374 1  

    Raw data descriptions (Box 1.2.3) usually comprise repeated d381 1 a381 1

    Box 1.2.3 - Example of raw (specimen) descriptive @ 1.4 log @none @ text @d1 1 a1 1 %META:TOPICINFO{author="JenniferForman" date="1102443910" format="1.0" version="1.4"}% a10 1 d630 1 a630 1 %META:TOPICMOVED{by="JenniferForman" date="1102443846" from="SDD.SddPrimer" to="SDDPrimer.SddPrimer"}% @ 1.3 log @none @ text @d1 2 a2 2 %META:TOPICINFO{author="JenniferForman" date="1101350670" format="1.0" version="1.3"}% %META:TOPICPARENT{name="WebHome"}% d46 1 a46 1 SDD Wiki or d55 1 a55 1 related schemas, called d631 1 @ 1.2 log @none @ text @d1 1 a1 1 %META:TOPICINFO{author="JenniferForman" date="1101342468" format="1.0" version="1.2"}% d16 1 a16 1 @ 1.1 log @none @ text @d1 1 a1 1 %META:TOPICINFO{author="KevinThiele" date="1097975100" format="1.0" version="1.1"}% d3 626 a628 626 SDD Primer - Home

    Index

    SDD Part 0: Introduction and Primer to the SDD Standard

    Abstract

    SDD Part 0 is a non-normative introduction to the Taxonomic Databases Working Group SDD (Structure of Descriptive Data) Standard. Its intention is to provide a background, introduction and primer to the SDD Standard, with examples. Since the SDD Standard is a work-in-progress, this document will be updated from time to time.

    Status of this document

    Version: 17 Oct. 2004

    Edited: Kevin Thiele (Centre for Biological Information Technology, University of Queensland), with financial support from the Gordon and Betty Moore Foundation (www.moore.org).

    Complete documentation of the SDD Schema is available on the SDD web site.

    To contribute to the discussion on the SDD Standard and to comment on this document, please join the SDD Wiki or SDD discussion list.

    Relationship between SDD and other TDWG standards

    TDWG maintains and is developing other standards that relate to the SDD standard, including the Access to Biological Collections Data (ABCD) and Taxonomic Concept Names standards. TDWG is developing a common, shared base schema for SDD and these related schemas, called UBIF.

    SDD Version History

    Version 0.9 beta: released for comment December 2003.
    Version 1.0 beta: released August 2004

    1.0 Introduction

    1.1 Background to the TDWG-SDD Subgroup

    In September 1998 the Taxonomic Databases Working Group (TDWG) of the International Union of Biological Sciences (IUBS) established the Structure of Descriptive Data (SDD) subgroup. TDWG’s role is to facilitate and manage the development of international standards in the taxonomic domain. The SDD subgroup was established to develop an international XML-based standard for capturing and managing descriptive data for organisms.

    Development of the SDD standard was initiated in response to recognition that the existing standard previously endorsed by TDWG – the DELTA data standard developed at CSIRO in Canberra from 1971 and adopted by TDWG as a descriptive data standard in 1991 – had become inadequate (FAQ: Why not continue to use DELTA?).

    The SDD subgroup began discussing and scoping a standard through an email discussion group in November 1999 (see the SDD email list archives). Considerable progress has been made at face-to-face meetings amongst a small group of core contributors, in Nov. 2001 (Canberra), Oct. 2002 (Sao Paulo), Feb. 2003 (Paris), October 2003 (Lisbon), May 2004 (Berlin) and Oct. 2004 (Christchurch).

    1.2 The nature of descriptive data in taxonomy

    In taxonomy, descriptive data takes a number of very different forms.

    Natural-language descriptions (Box 1.2.1) are semi-structured, semi-formalised descriptions of a taxon (or occasionally of an individual specimen). They may be simple, short and written in plain language (if used for a popular field guide), or long, highly formal and using specialised terminology when used in a taxonomic monograph or other treatment.

    Box 1.2.1 - Typical natural language descriptions

    Red Knot (Calidris canutus)
    Stout wader with bill same length as head, crown unstreaked, narrow white bar in wing, pale rump with grey barring, shortish olive legs. Non-breeding: grey above with narrow pale edging to feathers, pale eyebrow, smudged sides to neck with faint spotting. Juvenile: feathers of back edged white with dark subterminal bar, breast more heavily spotted pale buff and flanks barred, crown faintly streaked. Breeding: rufous underparts, feathers of back rufous patterned with black. Voice: 'knut-knut', 'nyui', high-pitched 'toowit-wit'.

    from Slater, P., Slater, P. & Slater, R. (2001) The Slater Field Guide to Australian Birds  (Reed New Holland: Sydney)

    Discaria pubescens (Brongn.) Druce
    Rigid, spreading shrub to c. 1 m high and wide; stems glabrous. Leaves soon deciduous, c. oblong, to 10 mm long, 3 mm wide, obtuse or minutely mucronate within an apical notch, margins minutely toothed, surfaces glabrous or a few hairs present near tip; stipules dark reddish-brown, c. 1 mm long, often shallowly joined around the node, pubescent on inner face; spines stout, 1.5-4 cm long. Flowers white, solitary or in few-flowered axillary cymes, sometimes congested on short apical shoots; pedicels 2-3 mm long; hypanthium c. 1.5 mm long; sepals somewhat spreading, 1-1.5 mm long; petals attached at throat of hypanthium, c. 1 mm long; stamens subequal to and weakly hooded by petals; disc prominent, lining base of hypanthium, obscurely 5-angled; style minute. Capsule prominently 3-lobed, 4-5 mm diam., the valves separating incompletely at maturity and splitting dorsally and medially.

    from Walsh, N.G. (1999) Rhamnaceae, in N.G.Walsh & T.J.Entwisle, Flora of Victoria Volume 4, Dicotyledons, Cornaceae to Asteraceae (Inkata Press: Melbourne)

     

    Dichotomous keys (Box 1.2.2) are specialised identification tools comprising fragments of descriptive data arranged in couplets forming a branching tree. Each fragment (lead) comprises a small (occasionally verbose) natural-language description.

    Box 1.2.2 - Typical dichotomous keys

    Key to Ascomycete genera  

    Ascus unitunicate
     
        Clypeus present around ostiolar neck  
            Clypeus poorly developed Glomerella
            Clypeus well developed Phyllachora
        Clypeus lacking  
            Ascus widest in middle Physalospora
            Ascus clavate or cylindrical Glomerella
    Ascus bitunicate  
        Ascostroma uniloculate Guignardia
        Ascostroma multiloculate Botryosphaeria

    1 Dark upper lateral zone with one or more distinct series of pale spots or blotches along the body 2
    1a Dark upper lateral zone obscurely mottled or uniform with at most a few pale spots anteriorly 3
    2 Fewer than 25 lamellae under the fourth toe; supralabials 7-8 (usually 7); prefrontals separated C. arcanus
    2a More than 25 lamellae under the fourth toe; supralabials 8-9 (usually 8); prefrontals usually in contact C. alleni
    3 Pale mid-lateral stripe passes over the hindlimb to continue along the tail C. inornatus
    3a Pale mid-lateral stripe extends to groin, then continues along the front edge of the hindlimb C. coggeri

     

    Coded descriptions (Box 1.2.2) comprise highly structured data used in computer identification and analysis programs such as Lucid (www.lucidcentral.org), DELTA (hyperlink) and a suite of phylogenetic analysis programs such as PAUP (hyperlink).

    Box 1.2.2 - Simple examples of coded descriptions
    Lucid Interchange Format (LIF) file

    #Lucid Interchange Format File v. 2.1

    [..Character List..]
    Distribution by region
      Tropical North
      Subtropical and Temperate East and South
      South West
      Arid & Semi-arid (Central)
      Island Territories
    General habit
      tree
      shrub
      climber (woody or herbaceous)
      herb
      grass- or sedge-like plant
    Seasonal longevity
      annual, biennial or ephemeral
      perennial

    [..Taxon List..]
    Acanthaceae
    Aceraceae
    Actinidiaceae
    Agavaceae
    Aizoaceae
    Akaniaceae
    Alangiaceae
    Alismataceae
    Aloaceae
    Alseuosmiaceae

    [..Main Data (txs)..]
    101101111111
    100100000101
    101000000010
    011110111111
    101111111111
    100100000011
    101101000011
    011111011111
    011100100111
    101100000010
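
    How the score matrix relates to the two lists can be sketched in Python. This is a reading of the example only, not a description of the LIF specification: it assumes that each row of the main data belongs to one taxon (in taxon-list order), that each digit scores one state (in character-list order), and that 1 means the state is scored for the taxon. The names CHARACTERS, TAXA, MAIN_DATA and decode_row are hypothetical, and only the first three taxa are shown.

```python
# Character list and the first three taxa and rows, copied from the LIF example.
CHARACTERS = {
    "Distribution by region": [
        "Tropical North",
        "Subtropical and Temperate East and South",
        "South West",
        "Arid & Semi-arid (Central)",
        "Island Territories",
    ],
    "General habit": [
        "tree",
        "shrub",
        "climber (woody or herbaceous)",
        "herb",
        "grass- or sedge-like plant",
    ],
    "Seasonal longevity": ["annual, biennial or ephemeral", "perennial"],
}
TAXA = ["Acanthaceae", "Aceraceae", "Actinidiaceae"]
MAIN_DATA = ["101101111111", "100100000101", "101000000010"]


def decode_row(row, characters):
    """Map one row of 0/1 scores onto states, character by character.

    Assumption: digits follow the order of the states in the character list.
    """
    total = sum(len(states) for states in characters.values())
    if len(row) != total:
        raise ValueError(f"expected {total} scores, got {len(row)}")
    decoded = {}
    position = 0
    for name, states in characters.items():
        scores = row[position:position + len(states)]
        decoded[name] = [s for s, flag in zip(states, scores) if flag == "1"]
        position += len(states)
    return decoded


descriptions = {t: decode_row(r, CHARACTERS) for t, r in zip(TAXA, MAIN_DATA)}
for taxon, description in descriptions.items():
    print(taxon, description)
```

    The length check reflects the one property the example does show: 12 states in the character list and 12 digits in every row.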

    DELTA file

    *SHOW: Gentianella - character list. Last revised 16 April 1997.

    *CHARACTER LIST

    #1. plants/
    1. monocarpic/
    2. polycarpic/

    #2. <plants lifecycle>/
    1. annual/
    2. biennial/
    3. perennial/

    #3. height in flower/
    <> cm/

    #4. caudex/
    1. unbranched/
    2. branched/

    *ITEM DESCRIPTIONS

    # Gentianella amabilis/
    1,2 2,3 3,3-13 4,1

    # Gentianella antarctica/
    1,1 2,1<Godley 1982> 3,1.6-22.0<Godley 1982> 4,1

    # Gentianella antipoda/
    1,1<Godley 1982> 2,2 3,3.5-9.8-24 4,1/2<depends on size of plant>

    # Gentianella astonii/
    1,2 2,3 3,15 4,2

    # Gentianella cerina/
    1,2 2,3 3,9-17 4,1/2

    # Gentianella concinna/
    1,1 2,1 3,2.7-15.0 4,1
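
    The attribute lines in the DELTA item descriptions above can be taken apart with a short Python sketch. It covers only the constructs visible in this example (character number, comma, value, optional comment in angle brackets, and "/" between alternative states) and is not a full DELTA parser; the names ATTRIBUTE, parse_item and split_states are hypothetical.

```python
import re

# One attribute: character number, comma, value, optional <comment>.
# The lookbehind makes sure the number starts a whitespace-separated token.
ATTRIBUTE = re.compile(r"(?<!\S)(\d+),([^\s<]+)(?:<([^>]*)>)?")


def parse_item(attributes):
    """Return (character number, raw value, comment or None) tuples."""
    return [
        (int(m.group(1)), m.group(2), m.group(3))
        for m in ATTRIBUTE.finditer(attributes)
    ]


def split_states(value):
    """Split a multistate value such as '1/2' into state numbers."""
    return [int(part) for part in value.split("/")]


# Attribute line of Gentianella antipoda, copied from the example above.
antipoda = parse_item(
    "1,1<Godley 1982> 2,2 3,3.5-9.8-24 4,1/2<depends on size of plant>"
)
for number, value, comment in antipoda:
    print(number, value, comment)
```

    Numeric values such as 3.5-9.8-24 are left as raw text here, because how to interpret the parts of a range is a decision for the consuming application.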
     

     

     

     

    Raw data descriptions (Box 1.2.3) usually comprise repeated measurements of parts of individual specimens, and are the basis from which the more abstracted descriptions in natural language and coded descriptions are derived. Few taxonomists consistently record and archive their raw data in a standardised format.

    Box 1.2.3 - Example of raw (specimen) descriptive data

    Specimen   Spore length (measurements 1-5)   Spore width (measurements 1-5)   Spore colour
    TJM45337   12 13 12 15 11                    8 8 7 6 6                        brown
    TLM33466   15 18 17 17 15                    10 8 9 9 10                      yellow
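
    The step from raw measurements to a more abstracted description can be sketched in Python. The values are copied from the box above; the names RAW and summarise are hypothetical, and a minimum-mean-maximum summary is only one of several ways such data might be condensed.

```python
from statistics import mean

# Repeated measurements per specimen, copied from the raw data example.
RAW = {
    "TJM45337": {
        "length": [12, 13, 12, 15, 11],
        "width": [8, 8, 7, 6, 6],
        "colour": "brown",
    },
    "TLM33466": {
        "length": [15, 18, 17, 17, 15],
        "width": [10, 8, 9, 9, 10],
        "colour": "yellow",
    },
}


def summarise(values):
    """Condense repeated measurements into minimum, mean and maximum."""
    return {"min": min(values), "mean": round(mean(values), 1), "max": max(values)}


summary = {
    specimen: {
        "length": summarise(data["length"]),
        "width": summarise(data["width"]),
        "colour": data["colour"],
    }
    for specimen, data in RAW.items()
}
for specimen, data in summary.items():
    print(specimen, data)
```

    A summary of this kind is what typically ends up in a coded or natural-language description, which is why the individual measurements are lost unless the raw data are archived as well.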

     

    1.3 Goals of SDD

    The goal of the SDD standard is to allow capture, transport, caching and archiving of descriptive data in all the forms shown above, using a platform- and application-independent, international standard. Such a standard is crucial to enabling lossless porting of data between existing and future software platforms including identification, data-mining and analysis tools, and federated databases.

    The SDD Standard:

    It facilitates:

    2.0 Basic structure of a simple SDD instance document

    The simplest possible description comprises a single descriptive statement about an organism, taxon or object. An example of such a description is given in Box 2.0.1, and its SDD representation in Example 2.0.1.

    Box 2.0.1 - A simple description

    Viola hederacea Labill.
    Leaves simple

     

    Example 2.0.1 - Description in Box 2.0.1 represented in SDD

    <?xml version='1.0' encoding='UTF-8'?>
    <Datasets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.tdwg.org/2004/UBIF" xsi:schemaLocation="http://www.tdwg.org/2004/UBIF SDD.xsd">
      <Dataset>
    	 <Derivation datetime='2004-10-17T06:50:13'>
    		<Generator name='By Hand' version='1'/>
    	 </Derivation>
    	 <ExternalDataInterface>
    		<ClassNames>
    		  <ClassName id='1'>
    			 <Label>
    				<Representation language='en'>
    				  <Text>Viola hederacea</Text>
    				</Representation>
    			 </Label>
    		  </ClassName>
    		</ClassNames>
    		<Agents>
    		  <Agent id='1'>
    			 <Label>
    				<Representation language='en'>
    				  <Text>A. Botanist</Text>
    				</Representation>
    			 </Label>
    		  </Agent>
    		</Agents>
    	 </ExternalDataInterface>
    	 <Metadata>
    		<Description>
    		  <Representation language='en'>
    			 <Title>Descriptive statement for a Viola</Title>
    		  </Representation>
    		</Description>
    	 </Metadata>
    	 <DescriptiveData>
    		<Terminology>
    		  <Characters>
    			 <CategoricalCharacter id='1'>
    				<Label>
    				  <Representation language='en'>
    					 <Text>Leaf complexity</Text>
    				  </Representation>
    				</Label>
    				<States>
    				  <StateDefinition id='1'>
    					 <Label>
    						<Representation language='en'>
    						  <Text>simple</Text>
    						</Representation>
    					 </Label>
    				  </StateDefinition>
    				</States>
    			 </CategoricalCharacter>
    		  </Characters>
    		</Terminology>
    		<CodedDescriptions>
    		  <CodedDescription id='0'>
    			 <Header>
    				<ClassName ref='1'/>
    			 </Header>
    			 <SummaryData>
    				<Categorical ref='1'>
    				  <State ref='1'/>
    				</Categorical>
    			 </SummaryData>
    		  </CodedDescription>
    		</CodedDescriptions>
    	 </DescriptiveData>
      </Dataset>
    </Datasets>
    

    In the SDD document in Example 2.0.1, data are wrapped in a <Dataset> element. Several datasets may be wrapped in a single SDD document, inside the <Datasets> container element.

    The <Derivation> element records how the data were created: the date and time at which they were generated (the datetime attribute) and the application or other method that created the document (the <Generator> element).

    The <ExternalDataInterface> element wraps data that may be provided by an external web service (in this case, the data are internal to the document). In this element, the name of the taxon (Viola hederacea) is provided in the <ClassNames> element, and the name of the author of the document in the <Agents> element.

    Metadata for the project that provided the data is given in the <Metadata> element. In this case, only a title for the data set is provided.

    The description is provided in the <DescriptiveData> element, using a character and state (character = Leaf complexity; state = simple) defined in the <Terminology> element. The <CodedDescription> element contains the description itself, using references to identify the taxon (class), character and state being described.
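
    The way these references resolve can be sketched with Python's standard xml.etree.ElementTree module. The embedded document is a trimmed copy of Example 2.0.1 (derivation, agents and metadata omitted); the names NS, SDD_DOCUMENT, label and resolve are hypothetical, and the sketch assumes that state ids are looked up within the character they belong to.

```python
import xml.etree.ElementTree as ET

NS = {"u": "http://www.tdwg.org/2004/UBIF"}

# Trimmed copy of Example 2.0.1; only the parts needed to resolve references.
SDD_DOCUMENT = """
<Datasets xmlns="http://www.tdwg.org/2004/UBIF">
  <Dataset>
    <ExternalDataInterface>
      <ClassNames>
        <ClassName id="1">
          <Label><Representation language="en"><Text>Viola hederacea</Text></Representation></Label>
        </ClassName>
      </ClassNames>
    </ExternalDataInterface>
    <DescriptiveData>
      <Terminology>
        <Characters>
          <CategoricalCharacter id="1">
            <Label><Representation language="en"><Text>Leaf complexity</Text></Representation></Label>
            <States>
              <StateDefinition id="1">
                <Label><Representation language="en"><Text>simple</Text></Representation></Label>
              </StateDefinition>
            </States>
          </CategoricalCharacter>
        </Characters>
      </Terminology>
      <CodedDescriptions>
        <CodedDescription id="0">
          <Header><ClassName ref="1"/></Header>
          <SummaryData>
            <Categorical ref="1"><State ref="1"/></Categorical>
          </SummaryData>
        </CodedDescription>
      </CodedDescriptions>
    </DescriptiveData>
  </Dataset>
</Datasets>
"""


def label(element):
    """Return the text of the element's own label."""
    return element.find("u:Label/u:Representation/u:Text", NS).text


def resolve(xml_text):
    """Turn each coded statement into (taxon, character, state) labels."""
    root = ET.fromstring(xml_text)
    taxa = {e.get("id"): label(e)
            for e in root.iterfind(".//u:ClassNames/u:ClassName", NS)}
    characters = {e.get("id"): e
                  for e in root.iterfind(".//u:CategoricalCharacter", NS)}
    statements = []
    for description in root.iterfind(".//u:CodedDescription", NS):
        taxon = taxa[description.find("u:Header/u:ClassName", NS).get("ref")]
        for scored in description.iterfind("u:SummaryData/u:Categorical", NS):
            character = characters[scored.get("ref")]
            # Assumption: state ids are scoped to their character.
            states = {s.get("id"): label(s)
                      for s in character.iterfind("u:States/u:StateDefinition", NS)}
            for state in scored.iterfind("u:State", NS):
                statements.append(
                    (taxon, label(character), states[state.get("ref")]))
    return statements


print(resolve(SDD_DOCUMENT))
```

    Resolving the three ref attributes gives back the statement from Box 2.0.1: the taxon, the character and the state, each defined once and referenced by id.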

    FAQ: Why are SDD documents so verbose and complex?

    3.0 Beyond the simple instance...

    Example 2.0.1 describes only the simplest of SDD structures. To go further, the Primer provides several pathways or streams, depending on what you wish to use SDD for. In each stream, the Primer introduces the basic concepts first, then branches to more complex examples.

    Before entering the first stream, you should understand the <Derivation> and <ExternalDataInterface> elements.

    For more information on the relationships between the SDD and UBIF schemas, read the topic SDD and UBIF Schemas.

    The streams are:
         Using SDD for coded data
         Using SDD for natural language descriptions
         Using SDD for dichotomous keys
         Using SDD for raw observation data

    KRT Last Edit: 16 Jan 04

    @