<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-us" lang="en-us">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15" />
<link rel="stylesheet" type="text/css" href="sdd1.css" />
</head>
<body bgcolor="#ffffff">
<a name="PageTop"></a>
<p />
<h2><a name="SDD_Part_0_Introduction_and_Prim"></a> SDD Part 0: Introduction and Primer to the SDD Standard </h2>
<p />
<h3><a name="2_3_Natural_language_description"></a> 2.3 Natural language descriptions. </h3>
<h3><a name="2_3_1_Traditional_natural_langua"></a> 2.3.1 Traditional natural language descriptions. </h3>
<p />
Natural language descriptions (Box 2.3.1) are semi-structured, semi-formalised descriptions of a taxon (or occasionally of an individual specimen). They may be simple, short and written in plain language (as in a popular field guide), or long, highly formal and full of specialised terminology (as in a taxonomic monograph or other treatment).
<p />
<h4><a name="Box_2_3_1_Typical_natural_langua"></a> Box 2.3.1 - Typical natural language descriptions </h4>
<p />
<table bgcolor="#ddddff" cellspacing="2" cellpadding="2" border="1">
<tr>
<td>
<p align="left">
<b>Red Knot (Calidris canutus)</b><br />
Stout wader with bill same length as head, crown unstreaked, narrow white bar
in wing, pale rump with grey barring, shortish olive legs. Non-breeding:
grey above with narrow pale edging to feathers, pale eyebrow, smudged sides
to neck with faint spotting. Juvenile: feathers of back edged white with
dark subterminal bar, breast more heavily spotted pale buff and flanks
barred, crown faintly streaked. Breeding: rufous underparts, feathers of
back rufous patterned with black. Voice: 'knut-knut', 'nyui', high-pitched 'toowit-wit'.</p>
<p align="right">
from Slater, P., Slater, P. &amp; Slater, R. (2001) The Slater Field Guide to
Australian Birds&nbsp; (Reed New Holland: Sydney)</p>
<p>
<b>Discaria pubescens (Brongn.) Druce</b><br />
Rigid, spreading shrub to c. 1 m high and wide; stems glabrous. Leaves soon
deciduous, c. oblong, to 10 mm long, 3 mm wide, obtuse or minutely mucronate
within an apical notch, margins minutely toothed, surfaces glabrous or a few
hairs present near tip; stipules dark reddish-brown, c. 1 mm long, often
shallowly joined around the node, pubescent on inner face; spines stout, 1.5-4
cm long. Flowers white, solitary or in few-flowered axillary cymes, sometimes
congested on short apical shoots; pedicels 2-3 mm long; hypanthium c. 1.5 mm
long; sepals somewhat spreading, 1-1.5 mm long; petals attached at throat of
hypanthium, c. 1 mm long; stamens subequal to and weakly hooded by petals;
disc prominent, lining base of hypanthium, obscurely 5-angled; style minute.
Capsule prominently 3-lobed, 4-5 mm diam., the valves separating incompletely
at maturity and splitting dorsally and medially.</p>
<p align="right">
from Walsh, N.G. (1999) Rhamnaceae, in N.G.Walsh &amp; T.J.Entwisle,
Flora of Victoria Volume 4, Dicotyledons, Cornaceae to Asteraceae (Inkata
Press: Melbourne)</p></td>
</tr>
</table>
<p />
SDD supports two ways of producing natural language descriptions:
<p /> <ul>
<li> Descriptions may be produced elsewhere and simply stored within an SDD instance document. These are termed "authored natural language descriptions".
</li> <li> Descriptions may be generated from data and text snippets sourced from within the SDD instance document. These are termed "marked up natural language descriptions".
</li></ul>
<p />
<p />
<h3><a name="2_3_2_Authored_natural_language"></a><a name="2_3_2_Authored_natural_language_"></a> 2.3.2 Authored natural language descriptions. </h3>
<p />
Authored natural language descriptions are descriptions written by hand, either directly within an application or imported into one; they include legacy descriptions sourced from existing publications. Within SDD, "authored" descriptions may never be overwritten by a natural language reporting process, whereas "generated" descriptions may be updated. Both "authored" and "generated" descriptions may contain markup (data supplied from a coded data source), but this is not required. All natural language descriptions are nested within the &lt;NaturalLanguageDescriptions&gt; element within <a href="sdddatasets.html">&lt;Dataset&gt;</a>.
<p />
A natural language description requires only two essential items: the names of the taxa being described, and the descriptions themselves.
<p />
A simple SDD instance document for natural language descriptions has the basic structure shown below and in Example 2.3.2.
<p />
<img alt="authoreddescriptions.gif" src="http://wiki.tdwg.org/twiki/static/_publish/authoreddescriptions.gif" />
<p />
<h4><a name="Example_2_3_2_Anchored_natural_l"></a> Example 2.3.2 - Authored natural language descriptions </h4>
<p />
<table bgcolor="#ddddff" border="0" width="100%" cellpadding="5" cellspacing="5" style="border-collapse: collapse" bordercolor="#111111">
<p />
<tr>
<td>
<p />
<pre>
&#60;NaturalLanguageDescriptions&#62;
  &#60;NaturalLanguageDescription id&#61;&#34;nat1&#34;&#62;
    &#60;Representation&#62;
      &#60;Label&#62;Acalypha L.&#60;/Label&#62;
    &#60;/Representation&#62;
    &#60;Scope&#62;
      &#60;TaxonName ref&#61;&#34;t1&#34;/&#62;
    &#60;/Scope&#62;
    &#60;NaturalLanguageData&#62;
      &#60;Text&#62;Herbs, shrubs; or trees, monoecious or rarely dioecious.
      Leaves alternate, margins usually dentate or crenate. Flowers small,
      males and females in separate axillary spikes or females solitary in
      separate axils or one or more at or near base of male spikes; male
      flowers clustered in axillary spikes with small bract under each cluster,
      perianth of 4 segments, glands absent, stamens 8 or rarely 8-16 inserted
      on a raised central receptacle, filaments free; female flowers 1-4
      together within a leafy bract, bracts solitary or in spikes,
      perianth of 3 segments, rarely 4, styles distinct, finely branched.
      Fruits capsules.&#60;/Text&#62;
    &#60;/NaturalLanguageData&#62;
  &#60;/NaturalLanguageDescription&#62;
  &#60;NaturalLanguageDescription id&#61;&#34;nat2&#34;&#62;
    &#60;Representation&#62;
      &#60;Label&#62;Acalypha australis L.&#60;/Label&#62;
    &#60;/Representation&#62;
    &#60;NaturalLanguageData&#62;
      &#60;Text&#62;Herb up to 30 cm tall, stems and leaves often pink or red.
      Leaves with petioles 1-2 cm long; blades ovate or subrhomboid, apex
      acuminate, base acute or obtuse, margin serrulate-crenate, 2-6 cm X 1-3.5 cm.
      Spikes short, 1-3 per axil, peduncles ca 0.5-1 cm long or longer; bracts up
      to ca 1.5 cm long.&#60;/Text&#62;
    &#60;/NaturalLanguageData&#62;
  &#60;/NaturalLanguageDescription&#62;
&#60;/NaturalLanguageDescriptions&#62;
</pre>
<p />
</td>
</tr>
<p />
</table>
<p />
For more information on defining taxon names using the &lt;TaxonNames&gt; element, see the topic <a href="taxonnames.html">Defining taxon names</a>.
<p />
Note that taxa can also be arranged into hierarchies. See the topic <a href="taxonhierarchies.html">Defining taxon hierarchies</a> for more information.
<p />
The &lt;Representation&gt; element provides a label for the description. This may be useful if the instance document includes multiple descriptions for different purposes.
<p />
&lt;Scope&gt; describes the taxon or set of taxa to which the description applies.
<p />
The &lt;NaturalLanguageData&gt; element contains the text of the natural language description.
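<p />
The roles of the three elements described above can be illustrated with a short Python sketch that builds an abbreviated version of Example 2.3.2 and reads it back. This is only an illustration under stated assumptions: the helper names add_description and read_descriptions are hypothetical, the description texts are shortened, and the sketch ignores XML namespaces, which a complete SDD instance document may declare and which would then have to be supplied to the find calls.
<p />

```python
import xml.etree.ElementTree as ET


def add_description(parent, desc_id, label, text, taxon_refs=()):
    """Append one NaturalLanguageDescription element to parent (hypothetical helper)."""
    desc = ET.SubElement(parent, "NaturalLanguageDescription", id=desc_id)
    representation = ET.SubElement(desc, "Representation")
    ET.SubElement(representation, "Label").text = label
    if taxon_refs:
        # Scope lists the taxa to which the description applies.
        scope = ET.SubElement(desc, "Scope")
        for ref in taxon_refs:
            ET.SubElement(scope, "TaxonName", ref=ref)
    data = ET.SubElement(desc, "NaturalLanguageData")
    ET.SubElement(data, "Text").text = text
    return desc


def read_descriptions(root):
    """Return one dict per NaturalLanguageDescription under root (hypothetical helper)."""
    results = []
    for desc in root.findall("NaturalLanguageDescription"):
        text = desc.findtext("NaturalLanguageData/Text", default="")
        results.append({
            "id": desc.get("id"),
            "label": desc.findtext("Representation/Label"),
            "taxon_refs": [t.get("ref") for t in desc.findall("Scope/TaxonName")],
            "text": " ".join(text.split()),  # collapse line breaks and runs of spaces
        })
    return results


# Rebuild an abbreviated version of Example 2.3.2 (texts shortened).
root = ET.Element("NaturalLanguageDescriptions")
add_description(root, "nat1", "Acalypha L.",
                "Herbs, shrubs; or trees, monoecious or rarely dioecious.",
                taxon_refs=["t1"])
add_description(root, "nat2", "Acalypha australis L.",
                "Herb up to 30 cm tall, stems and leaves often pink or red.")

# Serialise and parse again, as a consumer of the document would.
xml_text = ET.tostring(root, encoding="unicode")
descriptions = read_descriptions(ET.fromstring(xml_text))
for d in descriptions:
    print(d["id"], d["label"], d["taxon_refs"])
```

<p />
As in Example 2.3.2, the second description carries no &lt;Scope&gt;, so its list of taxon references comes back empty. A consuming application could use a check of this kind to flag descriptions that are not yet linked to a taxon name.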
<p />
<h3><a name="2_3_3_Marked_up_natural_language"></a> 2.3.3 Marked up natural language descriptions. </h3>
<p />
<img alt="naturallanguagemarkup.gif" src="http://wiki.tdwg.org/twiki/static/_publish/naturallanguagemarkup.gif" />
<p />
Marking up natural language descriptions allows matrix data to be rendered as natural language text, and allows character and state names to be reworded for inclusion in that text. As with the descriptions in section 2.3.2, "authored" descriptions may never be overwritten by a natural language reporting process, whereas "generated" descriptions may be updated. Both "authored" and "generated" descriptions may have markup, but do not need to. The SDD standard can store descriptions with partial markup, resulting from any mixture of automatic markup by a processor and manual markup.
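<p />
The behaviour of a natural language reporting process can be sketched in Python as follows. This is a minimal illustration, not the mechanism defined by the SDD schema: the coded data layout, the wording table, the "kind" field and the function names are all hypothetical and were chosen only to show the two rules stated above, namely that character and state names may be reworded for prose and that authored text is never overwritten.
<p />

```python
# Hypothetical coded data for one taxon: character name mapped to state names.
CODED = {
    "habit": ["herb"],
    "leaf margin": ["serrulate", "crenate"],
}

# Hypothetical wording overrides: how a character name should read in prose.
# An empty string means the state stands alone in the sentence.
WORDING = {
    "habit": "",
    "leaf margin": "leaf margins",
}


def generate_text(coded, wording):
    """Build one short sentence per character from coded character/state data."""
    sentences = []
    for character, states in coded.items():
        if not states:
            continue  # nothing scored for this character
        phrase = wording.get(character, character)
        sentence = (phrase + " " + " or ".join(states)).strip()
        sentences.append(sentence[0].upper() + sentence[1:] + ".")
    return " ".join(sentences)


def update_description(description, coded, wording):
    """Regenerate the text of a description unless it is authored."""
    if description["kind"] == "authored":
        return description  # authored text is never overwritten
    updated = dict(description)
    updated["text"] = generate_text(coded, wording)
    return updated


authored = {"kind": "authored", "text": "Written by hand."}
generated = {"kind": "generated", "text": "Stale text from an earlier run."}

print(update_description(authored, CODED, WORDING)["text"])
print(update_description(generated, CODED, WORDING)["text"])
```

<p />
Keeping the wording table separate from the coded data means the same matrix can feed both an identification tool, which uses the raw character and state names, and a prose report, which uses the reworded forms.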
<p />
<p />
-- DonovanSharp - 01 Jun 2006
<p />
<p />
</body></html>