<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
|||
|
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
|
|||
|
<head>
|
|||
|
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
|
|||
|
<title>Darwin Core Text Guidelines</title>
|
|||
|
<link rel="schema.DwC" href="http://rs.tdwg.org/dwc/"/>
|
|||
|
<meta name="DC.title" content="Darwin Core Text Guidelines"/>
|
|||
|
<meta name="DC.description" content="Guidelines for describing Darwin Core data in fielded text files by means of an XML metafile."/>
|
|||
|
<meta name="DC.subject" content="biodiversity, standards"/>
|
|||
|
<meta name="DC.creator" content="Darwin Core Task Group"/>
|
|||
|
<meta name="DC.contributor" content="John Wieczorek (MVZ) <tuco@berkeley.edu>"/>
|
|||
|
<meta name="DC.contributor" content="Markus Döring (GBIF) <mdoering@gbif.org>"/>
|
|||
|
<meta name="DC.contributor" content="Renato De Giovanni (CRIA) <renato@cria.org.br>"/>
|
|||
|
<meta name="DC.contributor" content="Tim Robertson (GBIF) <trobertson@gbif.org>"/>
|
|||
|
<meta name="DC.contributor" content="Dave Vieglais (KUNHM) <vieglais@ku.edu>"/>
|
|||
|
<meta name="DC.contributor" content="Stan Blum (CAS) <sblum@calacademy.org>"/>
|
|||
|
<meta name="DC.modified" content="2009-02-12"/>
|
|||
|
<meta name="DC.dateAccepted" content="2009-02-12"/>
|
|||
|
<meta name="DC.format" content="text/html"/>
|
|||
|
<meta name="DC.identifier" content="http://rs.tdwg.org/dwc/terms/xsd/guide/2009-02-12"/>
|
|||
|
<meta name="DC.publisher" content="Biodiversity Information Standards TDWG"/>
|
|||
|
<meta name="DC.rights" content=""/>
|
|||
|
<meta name="DC.accessRights" content="public"/>
|
|||
|
<meta name="DC.bibliographicCitation" content="Darwin Core Text Guidelines. 2009"/>
|
|||
|
<meta name="DC.hasPart" content="http://rs.tdwg.org/dwc/xsd/tdwg_simpledarwincore.xsd"/>
|
|||
|
<meta name="DC.isReplacedBy" content=""/>
|
|||
|
<meta name="DC.replaces" content=""/>
|
|||
|
<meta name="DC.language" content="en"/>
|
|||
|
<link rel="meta" href="http://www.tdwg.org/"/>
|
|||
|
<link rel="stylesheet" href="../../../DarwinCore_files/default.css" type="text/css"/>
|
|||
|
<script src="../../../DarwinCore_files/default.js" type="text/javascript"></script>
|
|||
|
</HEAD>
|
|||
|
<BODY>
|
|||
|
<DIV class="header">
|
|||
|
|
|||
|
<TABLE width="100%" cellspacing="0" cellpadding="0" bgcolor="#617394">
|
|||
|
<TBODY><TR>
|
|||
|
<TD width="70"><A href="http://www.tdwg.org"><IMG src="../../../DarwinCore_files/TDWGlogo_Twiki.gif" width="150" height="70" alt="Biodiversity Information Standards (TDWG) logo"></A></TD>
|
|||
|
<TD width="100%" height="70" align="right" valign="top">
|
|||
|
</TD></TR></TBODY></TABLE>
</DIV>
|
|||
|
|
|||
|
<H1>Darwin Core Text Guidelines</H1>
|
|||
|
<P>
|
|||
|
<TABLE cellspacing="0" class="docinfo">
|
|||
|
<TBODY>
|
|||
|
<TR>
|
|||
|
<TH>Title:</TH>
|
|||
|
<TD>Darwin Core Text Guidelines</TD>
|
|||
|
</TR>
|
|||
|
<TR>
|
|||
|
<TH>Date Issued:</TH>
|
|||
|
<TD>2009-02-12</TD>
|
|||
|
</TR>
|
|||
|
<TR>
|
|||
|
<TH>Abstract:</TH>
|
|||
|
<TD>Guidelines for describing Darwin Core data in fielded text files by means of an XML metafile.</TD>
|
|||
|
</TR>
|
|||
|
<TR>
|
|||
|
<TH>Contributors:</TH>
|
|||
|
<TD>John Wieczorek (MVZ)<tuco@berkeley.edu>, Markus Döring (GBIF)<mdoering@gbif.org>, Renato De Giovanni (CRIA)<renato@cria.org.br>, Tim Robertson (GBIF)<trobertson@gbif.org>, Dave Vieglais (KUNHM)<vieglais@ku.edu>, Stan Blum (CAS)<sblum@calacademy.org></TD>
|
|||
|
</TR>
|
|||
|
<TR>
|
|||
|
<TH>Legal:</TH>
|
|||
|
<TD>This document is governed by the standard legal, copyright, licensing provisions and disclaimers issued by the Taxonomic Databases Working Group.</TD>
|
|||
|
</TR>
|
|||
|
<TR>
|
|||
|
<TH>Part of TDWG Standard:</TH>
|
|||
|
<TD>***URL to DwC Standard*** goes here</TD>
|
|||
|
</TR>
|
|||
|
<TR>
|
|||
|
<TH>Creator:</TH>
|
|||
|
<TD>Darwin Core Task Group</TD>
|
|||
|
</TR>
|
|||
|
<TR>
|
|||
|
<TH>Identifier:</TH>
|
|||
|
<TD><A href="http://rs.tdwg.org/dwc/terms/xsd/guide/2009-02-12/">http://rs.tdwg.org/dwc/terms/xsd/guide/2009-02-12/</A></TD>
|
|||
|
</TR>
|
|||
|
<TR><TH>Latest Version:</TH>
|
|||
|
<TD><A href="http://rs.tdwg.org/dwc/terms/xsd/guide/">http://rs.tdwg.org/dwc/terms/xsd/guide/</A></TD>
|
|||
|
</TR>
|
|||
|
<TR>
|
|||
|
<TH>Replaces:</TH>
|
|||
|
<TD>Not applicable</TD>
|
|||
|
</TR>
|
|||
|
<TR><TH>Replaced By:</TH>
|
|||
|
<TD>Not applicable</TD>
|
|||
|
</TR>
|
|||
|
<TR>
|
|||
|
<TH>Translations:</TH>
|
|||
|
<TD><A href="http://rs.tdwg.org/dwc/translations/">http://rs.tdwg.org/dwc/translations/</A></TD>
|
|||
|
</TR>
|
|||
|
<TR>
|
|||
|
<TH>Document Status:</TH>
|
|||
|
<TD>This is a TDWG Request for Comment.</TD>
|
|||
|
</TR>
|
|||
|
</TBODY></TABLE>
|
|||
|
|
|||
|
<H2>Table of Contents</H2>
|
|||
|
<P>
|
|||
|
<TABLE width="95%" border="0" align="center"><TBODY>
|
|||
|
<TR><TD width="100%">1. <a href="#introduction">Introduction</a></TD></TR>
|
|||
|
<TR><TD width="100%">2. <a href="#references">References</a></TD></TR>
|
|||
|
<TR><TD width="100%">3. <a href="#terminology">Terminology</a></TD></TR>
|
|||
|
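<TR><TD width="100%">4. Metafile content description</TD></TR>
<TR><TD width="100%">5. General implementation guidelines</TD></TR>
<TR><TD width="100%">6. Database exporting examples</TD></TR>
<TR><TD width="100%">7. Guidelines for consumers</TD></TR>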
|
|||
|
</TBODY></TABLE>
|
|||
|
|
|||
|
<A name="introduction" id="introduction"></A>
|
|||
|
<H2>1. Introduction</H2>
|
|||
|
<P>
|
|||
|
This document provides guidelines for describing Darwin Core data that resides in <em>fielded text</em> files (e.g. comma-separated values or tab-delimited files) by means of an XML metafile.<br/>
|
|||
|
<img src="images/usage.png"></img><br/>
|
|||
|
</P>
|
|||
|
|
|||
|
<h3>1.1 XML versus <EM>Fielded Text</EM></h3>
|
|||
|
<p>
|
|||
|
Many resources exist on the web describing the advantages of XML (<a href="http://en.wikipedia.org/wiki/XML">http://en.wikipedia.org/wiki/XML</a>) over less structured content such as <em>fielded text</em>.
|
|||
|
These guidelines <b>do not</b> promote the use of <EM>Fielded Text</EM> over XML for data files, but rather provide recommendations for how to handle such data files when necessary.
|
|||
|
<br/>
|
|||
|
Two such scenarios might be:
|
|||
|
<ul>
|
|||
|
<li>The transfer of large numbers of Darwin Core <i>simple</i> records from one database to another.
|
|||
|
Typically databases are very efficient at producing and consuming (e.g.) <em>Tab file</em> output.</li>
|
|||
|
<li>The description of legacy data existing in a <em>fielded text</em> format, such that it might be automatically understood and loaded into another system.
|
|||
|
It could be that this system would then re-serve the data in another format such as XML.</li>
|
|||
|
</ul>
|
|||
|
</p>
|
|||
|
|
|||
|
<h3>1.2 Existing Solution</h3>
|
|||
|
<p>
|
|||
|
Proposed standards exist for similar XML metafiles to describe <EM>fielded text</EM> files, such as the <a href="http://www.fieldedtext.org/">FieldedText</a> standard. The FieldedText standard aims to offer description of any
|
|||
|
<EM>fielded text</EM> file including all possible permutations of content. While beneficial to the publisher, this flexibility provides significant challenges to the consumer due to the diverse options that may exist.
|
|||
|
</p>
|
|||
|
|
|||
|
<h3>1.3 Example Metafile Content</h3>
|
|||
|
A simple comma-separated values data file of the following form:
|
|||
|
<PRE class="example">
|
|||
|
ID,ScientificName,IndividualCount
|
|||
|
123,"Cryptantha gypsophila Reveal & C.R. Broome",12
|
|||
|
124,"Buxbaumia piperi",2
|
|||
|
</PRE>
|
|||
|
can be described with the following illustrative Darwin Core metafile (namespaces omitted for brevity):
|
|||
|
<PRE class="example">
|
|||
|
<archive fileRoot="http://data.gbif.org/download/">
|
|||
|
<file
|
|||
|
rowType="http://rs.tdwg.org/dwc/terms/text/DarwinRecord"
|
|||
|
location="specimens.csv"
|
|||
|
fieldsTerminatedBy=","
ignoreHeaderLines="1">
|
|||
|
<field index="0" term="http://rs.tdwg.org/dwc/terms/CatalogNumber" type="xs:integer"/>
|
|||
|
<field index="1" term="http://rs.tdwg.org/dwc/terms/ScientificName" type="xs:string"/>
|
|||
|
<field index="2" term="http://rs.tdwg.org/dwc/terms/IndividualCount" type="xs:integer"/>
|
|||
|
<!-- A constant value has no index, but applies to all rows -->
|
|||
|
<field term="http://rs.tdwg.org/dwc/terms/DatasetID" type="xs:string" default="urn:lsid:tim.lsid.tdwg.org:collections:1"/>
|
|||
|
</file>
|
|||
|
</archive>
|
|||
|
</pre>
|
|||
|
</p>
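<p>As a rough illustration of how a consumer might apply such a metafile, the following Python sketch reads the example above using only the standard library. The function name <code>read_records</code> and the inline strings are assumptions made for this illustration and are not part of these guidelines. The <code>fieldsTerminatedBy</code> attribute is set explicitly because the data is comma separated, and declared field types are ignored here.</p>

```python
import csv
import io
import xml.etree.ElementTree as ET

# Illustrative copies of the example metafile and data file (namespaces omitted).
METAFILE = """
<archive fileRoot="http://data.gbif.org/download/">
  <file rowType="http://rs.tdwg.org/dwc/terms/text/DarwinRecord"
        location="specimens.csv" fieldsTerminatedBy="," ignoreHeaderLines="1">
    <field index="0" term="http://rs.tdwg.org/dwc/terms/CatalogNumber" type="xs:integer"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/ScientificName" type="xs:string"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/IndividualCount" type="xs:integer"/>
    <field term="http://rs.tdwg.org/dwc/terms/DatasetID" type="xs:string"
           default="urn:lsid:tim.lsid.tdwg.org:collections:1"/>
  </file>
</archive>
"""

DATA = '''ID,ScientificName,IndividualCount
123,"Cryptantha gypsophila Reveal & C.R. Broome",12
124,"Buxbaumia piperi",2
'''


def read_records(metafile_xml, data_text):
    """Return one dict per data row, keyed by term URI (hypothetical helper)."""
    file_el = ET.fromstring(metafile_xml).find("file")
    delimiter = file_el.get("fieldsTerminatedBy", "\t")  # documented default is a tab
    skip = int(file_el.get("ignoreHeaderLines", "0"))
    fields = file_el.findall("field")
    rows = list(csv.reader(io.StringIO(data_text), delimiter=delimiter))[skip:]
    records = []
    for row in rows:
        record = {}
        for field in fields:
            index = field.get("index")
            if index is None:
                # No index: the default acts as a constant for every row.
                value = field.get("default")
            else:
                # Fall back to the default when the cell is empty.
                value = row[int(index)] or field.get("default")
            record[field.get("term")] = value
        records.append(record)
    return records


records = read_records(METAFILE, DATA)
```

<p>Each record is keyed by term URI, so the constant <code>DatasetID</code> appears in every record even though the data file has no such column.</p>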
|
|||
|
|
|||
|
|
|||
|
<A name="references" id="references"></A>
|
|||
|
<H2>2. References</H2>
|
|||
|
<TABLE width="95%" border="0" align="center">
|
|||
|
<TBODY>
|
|||
|
|
|||
|
<TR>
|
|||
|
<TD width="10%"><A name="DCTERMS">[DCTERMS]</A></TD>
|
|||
|
<TD width="40%"><A href="http://dublincore.org/documents/dcmi-terms/">http://dublincore.org/documents/dcmi-terms/</A></TD>
|
|||
|
<TD width="50%">Dublin Core Metadata terms.</TD>
|
|||
|
</TR>
|
|||
|
|
|||
|
<TR>
|
|||
|
<TD width="10%"><A name="FIELDEDTEXT"></A>[FIELDEDTEXT]</TD>
|
|||
|
<TD width="40%"><A href="http://www.fieldedtext.org/">http://www.fieldedtext.org/</A></TD>
|
|||
|
<TD width="50%">Fielded Text proposed standard.</TD>
|
|||
|
</TR>
|
|||
|
|
|||
|
<TR>
|
|||
|
<TD width="10%"><A name="HISTORY">[HISTORY]</A></TD>
|
|||
|
<TD width="40%"><A href="../../history/index.htm">http://rs.tdwg.org/dwc/terms/history/</A></TD>
|
|||
|
<TD width="50%">Complete historical reference to Darwin Core terms.</TD>
|
|||
|
</TR>
|
|||
|
|
|||
|
<TR>
|
|||
|
<TD width="10%"><A name="NAMESPACEPOLICY">[NAMESPACEPOLICY]</A></TD>
|
|||
|
<TD width="40%"><A href="../../namespace/index.htm">http://rs.tdwg.org/dwc/terms/namespace/</A></TD>
|
|||
|
<TD width="50%">Policy governing Darwin Core terms.</TD>
|
|||
|
</TR>
|
|||
|
|
|||
|
<TR>
|
|||
|
<TD width="10%"><A name="TERMS">[TERMS]</A></TD>
|
|||
|
<TD width="40%"><A href="../../index.htm">http://rs.tdwg.org/dwc/terms/</A></TD>
|
|||
|
<TD width="50%">Quick reference to recommended Darwin Core terms.</TD>
|
|||
|
</TR>
|
|||
|
|
|||
|
<TR>
|
|||
|
<TD width="10%"><A name="TEXTSCHEMA">[TEXTSCHEMA]</A></TD>
|
|||
|
<TD width="40%"><A href="../../../text/tdwg_dwc_text.xsd">http://rs.tdwg.org/dwc/terms/xsd/tdwg_dwc_text.xsd</A></TD>
|
|||
|
<TD width="50%">Simple Darwin Core Text schema.</TD>
|
|||
|
</TR>
|
|||
|
|
|||
|
<TR>
|
|||
|
<TD width="10%"><A name="VERSIONS"></A>[VERSIONS]</TD>
|
|||
|
<TD width="40%"><A href="../../history/versions/index.htm">http://rs.tdwg.org/dwc/terms/history/versions/</A></TD>
|
|||
|
<TD width="50%">Reference for mapping historical Darwin Core terms to the current recommended terms.</TD>
|
|||
|
</TR>
|
|||
|
|
|||
|
<TR>
|
|||
|
<TD width="10%"><A name="XML"></A>[XML]</TD>
|
|||
|
<TD width="40%"><A href="http://www.w3.org/XML/">http://www.w3.org/XML/</A></TD>
|
|||
|
<TD width="50%">Reference site for the Extensible Markup Language (XML).</TD>
|
|||
|
</TR>
|
|||
|
|
|||
|
</TBODY></TABLE>
|
|||
|
|
|||
|
<A name="terminology" id="terminology"></A>
|
|||
|
<H2>3. Terminology</H2>
|
|||
|
<DL>
|
|||
|
<DT><EM>Fielded Text</EM></DT>
|
|||
|
<DD><EM>Fielded Text</EM> refers to a format that structures a flat text file into rows and columns; examples include comma-separated values (<EM>CSV</EM>) and tab-delimited files (<EM>Tab file</EM>).</DD>
|
|||
|
</DL>
|
|||
|
|
|||
|
<H2>4. Metafile content description</H2>
|
|||
|
<p>
|
|||
|
The metafile schema is available at <a href="../../../text/tdwg_dwc_text.xsd">tdwg_dwc_text.xsd</a>.
|
|||
|
</p>
|
|||
|
<h3>4.1 The <archive> element</h3>
|
|||
|
<p>
|
|||
|
<table class="border">
|
|||
|
<thead>
|
|||
|
<caption>Attributes</caption>
|
|||
|
<th>Attribute</th>
|
|||
|
<th>Description</th>
|
|||
|
<th>Required</th>
|
|||
|
<th>Default</th>
|
|||
|
</thead>
|
|||
|
<tbody>
|
|||
|
<tr>
|
|||
|
<td class=""><em>fileRoot</em></td>
|
|||
|
<td>A fully qualified Uniform Resource Locator (URL) defining the root location of the data files being described. The location must be publicly accessible.
Valid examples of the format include <i>http://data.gbif.org/collections/</i>, <i>ftp://ftp.gbif.org/public/</i> and <i>http://data.gbif.org/webservices/export?id=</i>. This value will be concatenated
with the location of the <a href="#fileTag-location"><file></a> and therefore should contain any necessary trailing characters such as / or ?.</td>
|
|||
|
<td>✓</td>
|
|||
|
<td/>
|
|||
|
</tr>
|
|||
|
</tbody>
|
|||
|
</table>
|
|||
|
<table class="border">
|
|||
|
<thead>
|
|||
|
<caption>Elements</caption>
|
|||
|
<th>Element</th>
|
|||
|
<th>Description</th>
|
|||
|
</thead>
|
|||
|
<tbody>
|
|||
|
<tr>
|
|||
|
<td class=""><a href="#fileTag"><file></a></td>
|
|||
|
<td>An <archive> will contain one or more <a href="#fileTag"><file></a> elements, each representing an individual file being described.</td>
|
|||
|
</tr>
|
|||
|
</tbody>
|
|||
|
</table>
|
|||
|
</p>
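<p>Because the data file address is formed by plain concatenation, a consumer should not insert a separator or apply relative URL resolution. A minimal Python sketch follows; the helper name <code>data_file_url</code> and the location value <code>123</code> are hypothetical.</p>

```python
from urllib.parse import urljoin


def data_file_url(file_root: str, location: str) -> str:
    """Concatenate fileRoot and location exactly as given (hypothetical helper)."""
    return file_root + location


# A root ending in "/" behaves like a directory.
print(data_file_url("http://data.gbif.org/collections/", "dwc-data.txt"))

# A root ending in "?id=" keeps its query string, which relative URL
# resolution with urljoin() would discard.
root = "http://data.gbif.org/webservices/export?id="
print(data_file_url(root, "123"))
print(urljoin(root, "123") == data_file_url(root, "123"))  # False
```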
|
|||
|
<h3><a name="fileTag">4.2 The <file> element</a></h3>
|
|||
|
<p>
|
|||
|
<table class="border">
|
|||
|
<thead>
|
|||
|
<caption>Attributes</caption>
|
|||
|
<th>Attribute</th>
|
|||
|
<th>Description</th>
|
|||
|
<th>Required</th>
|
|||
|
<th>Default</th>
|
|||
|
</thead>
|
|||
|
<tbody>
|
|||
|
<tr>
|
|||
|
<td class=""><a name="fileTag-location"><em>location</em></a></td>
|
|||
|
<td>Specifies the location of the file relative to the fileRoot (e.g. dwc-data.txt).</td>
|
|||
|
<td>✓</td>
|
|||
|
<td/>
|
|||
|
</tr>
|
|||
|
<tr>
|
|||
|
<td class=""><em>fieldsTerminatedBy</em></td>
|
|||
|
<td>Specifies the delimiter between fields. Typical values might be "," or "\t" for CSV or Tab files respectively.</td>
|
|||
|
<td/>
|
|||
|
<td>\t</td>
|
|||
|
</tr>
|
|||
|
<tr>
|
|||
|
<td class=""><em>linesTerminatedBy</em></td>
|
|||
|
<td>Specifies the row separator character.</td>
|
|||
|
<td/>
|
|||
|
<td>\n</td>
|
|||
|
</tr>
|
|||
|
<tr>
|
|||
|
<td class=""><em>compression</em></td>
|
|||
|
<td>Specifies the compression used for the file. May be omitted or specified as one of:
|
|||
|
<dl>
|
|||
|
<dt>GZIP</dt>
|
|||
|
<dd>Data file is compressed as GZIP</dd>
|
|||
|
<dt>ZIP</dt>
|
|||
|
<dd>Data file is compressed as ZIP (e.g. using PKZIP, WinZip, StuffIt, etc.)</dd>
</dl>
</td>
|
|||
|
<td/>
|
|||
|
<td/>
|
|||
|
</tr>
|
|||
|
<tr>
|
|||
|
<td class=""><em>encoding</em></td>
|
|||
|
<td>Specifies the encoding for the data file. One of:
|
|||
|
<dl>
|
|||
|
<dt>UTF-8</dt>
|
|||
|
<dd>8-bit Unicode Transformation Format</dd>
|
|||
|
<dt>UTF-16</dt>
|
|||
|
<dd>16-bit Unicode Transformation Format</dd>
|
|||
|
<dt>ISO-8859-1</dt>
|
|||
|
<dd>Commonly known as Latin-1 and a common default of Microsoft Windows based operating systems</dd>
|
|||
|
<dt>windows-1252</dt>
|
|||
|
<dd>Commonly known as WinLatin and a common default of legacy versions of Microsoft Windows based operating systems</dd>
|
|||
|
</dl>
|
|||
|
</td>
|
|||
|
<td/>
|
|||
|
<td>ISO-8859-1</td>
|
|||
|
</tr>
|
|||
|
<tr>
|
|||
|
<td class=""><em>ignoreHeaderLines</em></td>
|
|||
|
<td>Specifies the number of lines to ignore from the beginning of the file. This can be used to skip column headings or preamble comments, for example.</td>
|
|||
|
<td/>
|
|||
|
<td>0</td>
|
|||
|
</tr>
|
|||
|
<tr>
|
|||
|
<td class=""><em>rowType</em></td>
|
|||
|
<td>
|
|||
|
A Uniform Resource Identifier (URI) for the term identifying the class of data represented by each row.
|
|||
|
See <a href="../../index.htm">Darwin Core Terms</a> definitions. Additional classes may be referenced by URI and defined outside the Darwin Core specification.
|
|||
|
For convenience, the classes defined by Darwin Core are listed below:
|
|||
|
<dl>
|
|||
|
<dt>Simple Darwin Core</dt>
|
|||
|
<dd>http://rs.tdwg.org/dwc/terms/text/DarwinRecord</dd>
|
|||
|
<dt>Dataset</dt>
|
|||
|
<dd>http://rs.tdwg.org/dwc/terms/Dataset</dd>
|
|||
|
<dt>Sample</dt>
|
|||
|
<dd>http://rs.tdwg.org/dwc/terms/Sample</dd>
|
|||
|
<dt>SamplingEvent</dt>
|
|||
|
<dd>http://rs.tdwg.org/dwc/terms/SamplingEvent</dd>
|
|||
|
<dt>SamplingLocation</dt>
|
|||
|
<dd>http://rs.tdwg.org/dwc/terms/SamplingLocation</dd>
|
|||
|
<dt>Identification</dt>
|
|||
|
<dd>http://rs.tdwg.org/dwc/terms/Identification</dd>
|
|||
|
<dt>Taxon</dt>
|
|||
|
<dd>http://rs.tdwg.org/dwc/terms/Taxon</dd>
|
|||
|
<dt>RelatedResource</dt>
|
|||
|
<dd>http://rs.tdwg.org/dwc/terms/RelatedResource</dd>
|
|||
|
<dt>SampleAttribute</dt>
|
|||
|
<dd>http://rs.tdwg.org/dwc/terms/SampleAttribute</dd>
|
|||
|
<dt>EventAttribute</dt>
|
|||
|
<dd>http://rs.tdwg.org/dwc/terms/EventAttribute</dd>
|
|||
|
</dl>
|
|||
|
|
|||
|
</td>
|
|||
|
<td>✓</td>
|
|||
|
<td/>
|
|||
|
</tr>
|
|||
|
<tr>
|
|||
|
<td class=""><em>dateFormat</em></td>
|
|||
|
<td>When verbatim dates are used, this attribute can indicate the format in which they are represented. It is recommended to use the date, dateTime and time field types wherever possible, but where verbatim dates are required, a format may be specified here.
This should be considered a 'hint' for consumers. It is recommended that consumers support, as a minimum, the combinations of DD, MM and YYYY with the separators / and -. Examples are given:
|
|||
|
<dl>
|
|||
|
<dt>DDMMYYYY</dt>
|
|||
|
<dd>E.g. for dates in format 21121978</dd>
|
|||
|
<dt>DD-MM-YYYY</dt>
|
|||
|
<dd>E.g. for dates in format 21-12-1978</dd>
|
|||
|
<dt>MMDDYYYY</dt>
|
|||
|
<dd>E.g. for dates in format 12211978</dd>
|
|||
|
<dt>MM-DD-YYYY</dt>
|
|||
|
<dd>E.g. for dates in format 12-21-1978</dd>
|
|||
|
<dt>YYYYMMDD</dt>
|
|||
|
<dd>E.g. for dates in format 19781221</dd>
|
|||
|
</dl>
|
|||
|
</td>
|
|||
|
<td/>
|
|||
|
<td></td>
|
|||
|
</tr>
|
|||
|
|
|||
|
</tbody>
|
|||
|
</table>
|
|||
|
<table class="border">
|
|||
|
<thead>
|
|||
|
<caption>Elements</caption>
|
|||
|
<th>Element</th>
|
|||
|
<th>Description</th>
|
|||
|
</thead>
|
|||
|
<tbody>
|
|||
|
<tr>
|
|||
|
<td class=""><a href="#fieldTag"><field></a></td>
|
|||
|
<td>A <file> will contain one or more <a href="#fieldTag"><field></a> elements, each representing a 'column' in the row</td>
|
|||
|
</tr>
|
|||
|
</tbody>
|
|||
|
</table>
|
|||
|
</p>
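<p>The attributes above can be combined when opening a data file. The following Python sketch is one possible reading, under stated assumptions: the function name <code>open_rows</code> is hypothetical, a ZIP archive is assumed to hold the data file as its first member, and line endings are left to Python's <code>csv</code> module rather than driven by <code>linesTerminatedBy</code>.</p>

```python
import csv
import gzip
import io
import zipfile


def open_rows(raw, compression=None, encoding="ISO-8859-1",
              fields_terminated_by="\t", ignore_header_lines=0):
    """Decode raw file bytes into a list of rows (hypothetical helper).

    The keyword defaults mirror the defaults listed in the attribute table.
    """
    if compression == "GZIP":
        raw = gzip.decompress(raw)
    elif compression == "ZIP":
        with zipfile.ZipFile(io.BytesIO(raw)) as archive:
            # Assumption: the data file is the first member of the archive.
            raw = archive.read(archive.namelist()[0])
    text = raw.decode(encoding)
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=fields_terminated_by)
    # Assumption: header lines contain no quoted line breaks, so skipping
    # parsed rows is equivalent to skipping physical lines.
    return list(reader)[ignore_header_lines:]


payload = gzip.compress("id\tname\n1\tD\u00f6ring\n".encode("ISO-8859-1"))
rows = open_rows(payload, compression="GZIP", ignore_header_lines=1)
```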
|
|||
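<p>A consumer might translate a <code>dateFormat</code> hint into a parsing pattern as sketched below in Python. The helper name <code>parse_verbatim_date</code> is hypothetical, and only the DD, MM and YYYY tokens with literal separators are handled.</p>

```python
from datetime import date, datetime


def parse_verbatim_date(value: str, date_format: str) -> date:
    """Parse a verbatim date using a dateFormat hint such as DD-MM-YYYY."""
    pattern = (date_format.replace("YYYY", "%Y")
                          .replace("MM", "%m")
                          .replace("DD", "%d"))
    return datetime.strptime(value, pattern).date()


# Each of the documented example formats yields the same calendar date.
same_day = [
    parse_verbatim_date("21121978", "DDMMYYYY"),
    parse_verbatim_date("21-12-1978", "DD-MM-YYYY"),
    parse_verbatim_date("12211978", "MMDDYYYY"),
    parse_verbatim_date("12-21-1978", "MM-DD-YYYY"),
    parse_verbatim_date("19781221", "YYYYMMDD"),
]
```

<p>Since the format is only a hint, a consumer would probably want to catch <code>ValueError</code> and keep the verbatim string when parsing fails.</p>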
|
<h3><a name="fieldTag">4.3 The <field> element</a></h3>
|
|||
|
<p>
|
|||
|
<table class="border">
|
|||
|
<thead>
|
|||
|
<caption>Attributes</caption>
|
|||
|
<th>Attribute</th>
|
|||
|
<th>Description</th>
|
|||
|
<th>Required</th>
|
|||
|
<th>Default</th>
|
|||
|
</thead>
|
|||
|
<tbody>
|
|||
|
<tr>
|
|||
|
<td class=""><em>index</em></td>
|
|||
|
<td>Specifies the column index within the row. The first column is column 0, the second is column 1, etc.
If no column index is specified, the term and the default may be used to define a constant value for all rows.</td>
|
|||
|
<td/>
|
|||
|
<td/>
|
|||
|
</tr>
|
|||
|
<tr>
|
|||
|
<td class=""><em>term</em></td>
|
|||
|
<td>A Uniform Resource Identifier (URI) for the term identifying the property of data represented by this field.
|
|||
|
For example, a scientific name would be http://rs.tdwg.org/dwc/terms/ScientificName.
|
|||
|
Terms outside of the Darwin Core specification may be used, such as those from the Dublin Core Metadata Initiative.
|
|||
|
</td>
|
|||
|
<td>✓</td>
|
|||
|
<td/>
|
|||
|
</tr>
|
|||
|
<tr>
|
|||
|
<td class=""><em>type</em></td>
|
|||
|
<td>Specifies the type of the content represented in the column. The following values are supported.
|
|||
|
<dl>
|
|||
|
<dt>string</dt>
|
|||
|
<dd>Represents a sequence of characters, and should be used where no other type is appropriate</dd>
|
|||
|
<dt>integer</dt>
|
|||
|
<dd>Represents a whole numeric value (e.g. 123)</dd>
|
|||
|
<dt>decimal</dt>
|
|||
|
<dd>Represents a decimal value (e.g. 10.34). Decimal point must be represented by the character . otherwise the field must be declared as a string type</dd>
|
|||
|
<dt>dateTime</dt>
|
|||
|
<dd>Represents the combination of a date and time, in the format [-]CCYY-MM-DDThh:mm:ss[Z|(+|-)hh:mm]. Valid values include 2001-10-26T21:32:52, 2001-10-26T21:32:52+02:00, 2001-10-26T19:32:52Z, 2001-10-26T19:32:52+00:00, -2001-10-26T21:32:52, and 2001-10-26T21:32:52.12679. Where this format cannot be used, the string type must be declared</dd>
|
|||
|
<dt>date</dt>
|
|||
|
<dd>Represents a date in the format [-]CCYY-MM-DD[Z|(+|-)hh:mm]. Valid values include 2001-10-26, 2001-10-26+02:00, 2001-10-26Z, 2001-10-26+00:00, -2001-10-26, and -20000-04-01. Where this format cannot be used, the string type must be declared</dd>
|
|||
|
<dt>time</dt>
|
|||
|
<dd>Represents a time in the format hh:mm:ss[Z|(+|-)hh:mm]. Valid values include 21:32:52, 21:32:52+02:00, 19:32:52Z, 19:32:52+00:00, and 21:32:52.12679. Where this format cannot be used, the string type must be declared</dd>
|
|||
|
</dl>
|
|||
|
See section 5.2 for field type guidelines.</td>
|
|||
|
<td/>
|
|||
|
<td>string</td>
|
|||
|
</tr>
|
|||
|
<tr>
|
|||
|
<td class=""><em>format</em></td>
|
|||
|
<td>TODO - finish decision on format</td>
|
|||
|
<td/>
|
|||
|
<td/>
|
|||
|
</tr>
|
|||
|
<tr>
|
|||
|
<td class=""><em>default</em></td>
|
|||
|
<td>Used to optionally specify a default value should there not be one supplied in any given row. If no index is supplied, this can be used to define a constant applicable to all rows.</td>
|
|||
|
<td/>
|
|||
|
<td/>
|
|||
|
</tr>
|
|||
|
</tbody>
|
|||
|
</table>
|
|||
|
</p>
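<p>The declared type tells a consumer how to interpret the text of a column. The Python sketch below shows one possible conversion, under stated assumptions: the helper name <code>convert</code> is hypothetical, an empty value is treated as missing, only plain CCYY-MM-DD dates are parsed, and dateTime and time values are passed through as strings.</p>

```python
from datetime import date
from decimal import Decimal, InvalidOperation


def convert(value: str, declared_type: str = "string"):
    """Convert a text value according to its declared type (hypothetical helper)."""
    kind = declared_type.split(":")[-1]  # accept "xs:integer" as well as "integer"
    if value == "":
        return None  # assumption: an empty cell means no value was supplied
    if kind == "integer":
        return int(value)
    if kind == "decimal":
        # Decimal() rejects "10,34", matching the rule that the decimal
        # point must be the "." character.
        return Decimal(value)
    if kind == "date":
        # Assumption: only the plain CCYY-MM-DD form, without a time zone.
        return date.fromisoformat(value)
    return value  # string, and any type this sketch does not interpret


count = convert("123", "xs:integer")
latitude = convert("10.34", "decimal")
collected = convert("2001-10-26", "date")
try:
    convert("10,34", "decimal")
    comma_rejected = False
except InvalidOperation:
    comma_rejected = True
```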
|
|||
|
<H2>5. General implementation guidelines</H2>
|
|||
|
<H3>5.1 Single and multiple data files</H3>
|
|||
|
<h4>5.1.1 Single data file</h4>
|
|||
|
In its simplest usage, a single data file can be described.
|
|||
|
Specifically the file location, the row type and the field mapping are provided.
|
|||
|
<br/>
|
|||
|
<img src="images/singleDataFile.png">
|
|||
|
<br/>
|
|||
|
<pre class="example">
|
|||
|
<!-- Namespaces omitted for example -->
|
|||
|
<archive fileRoot="http://mydata.org/">
|
|||
|
<file rowType="http://rs.tdwg.org/dwc/terms/text/DarwinRecord"
|
|||
|
location="specimens.txt">
|
|||
|
<field index="0"
|
|||
|
term="http://rs.tdwg.org/dwc/terms/CatalogNumber"
|
|||
|
type="xs:integer"/>
|
|||
|
<field index="1"
|
|||
|
term="http://rs.tdwg.org/dwc/terms/ScientificName"
|
|||
|
type="xs:string"/>
|
|||
|
</file>
|
|||
|
</archive>
|
|||
|
</pre>
|
|||
|
|
|||
|
<h4>5.1.2 Multiple unrelated data files</h4>
|
|||
|
Multiple files containing no inter-file relationships may be described with a single metafile.
|
|||
|
The files must reside at the same 'root' location. A typical example for this usage might be multiple dataset files each with a common format.
|
|||
|
<br/>
|
|||
|
<img src="images/unrelatedDataFiles.png">
|
|||
|
<br/>
|
|||
|
<pre class="example">
|
|||
|
<!-- Namespaces omitted for example -->
|
|||
|
<archive fileRoot="http://mydata.org/">
|
|||
|
<file rowType="http://rs.tdwg.org/dwc/terms/text/DarwinRecord"
|
|||
|
location="aves.txt">
|
|||
|
<!-- field definitions omitted for example -->
|
|||
|
</file>
|
|||
|
<file rowType="http://rs.tdwg.org/dwc/terms/text/DarwinRecord"
|
|||
|
location="lepidoptera.txt">
|
|||
|
<!-- field definitions omitted for example -->
|
|||
|
</file>
|
|||
|
</archive>
|
|||
|
</pre>
|
|||
|
|
|||
|
<h4>5.1.3 Multiple related data files</h4>
|
|||
|
When the content of one data file relates to another data file, a relationship can be expressed in the metafile using the <relationships> element.
|
|||
|
In database terminology, this is equivalent to defining a foreign key constraint from one table to another.
|
|||
|
However, whereas a database can enforce this relationship, <em>fielded text</em> files cannot. The following guidelines are recommended:<br/>
|
|||
|
<ul>
|
|||
|
<li>The fields on either end of a relationship must be of the same type (e.g. xs:integer)</li>
|
|||
|
<li>To indicate that a row is not related, the field must be left empty. Placeholder values such as 0, -1, \N or NULL must not be used for this purpose.</li>
|
|||
|
<li>The data provider must ensure that data has integrity - that the target of a relationship does indeed exist</li>
|
|||
|
</ul>
|
|||
|
Therefore care must be taken by the data provider that the relationship expressed is indeed valid, and that the data integrity is not broken.
|
|||
|
<br/>
|
|||
|
<img src="images/relatedDataFiles.png">
|
|||
|
<br/>
|
|||
|
<pre class="example">
|
|||
|
<!-- Namespaces omitted for example -->
|
|||
|
<archive fileRoot="http://mydata.org/">
|
|||
|
<file rowType="http://rs.tdwg.org/dwc/terms/Sample"
|
|||
|
location="specimens.txt">
|
|||
|
<field index="0" term="http://rs.tdwg.org/dwc/terms/CatalogNumber"/>
|
|||
|
<field index="1" term="http://rs.tdwg.org/dwc/terms/IndividualCount"/>
|
|||
|
</file>
|
|||
|
|
|||
|
<file rowType="http://rs.tdwg.org/dwc/terms/Identification"
|
|||
|
location="identifications.txt">
|
|||
|
<field index="0" term="http://rs.tdwg.org/dwc/terms/IdentificationID"/>
|
|||
|
<field index="1" term="http://rs.tdwg.org/dwc/terms/IdentifiedBy"/>
|
|||
|
<field index="2" term="http://rs.tdwg.org/dwc/terms/CatalogNumber"/>
|
|||
|
<field index="3" term="http://rs.tdwg.org/dwc/terms/ScientificName"/>
|
|||
|
</file>
|
|||
|
|
|||
|
<relationships>
|
|||
|
<relationship>
|
|||
|
<file location="specimens.txt" fieldIndex="0"/>
|
|||
|
<file location="identifications.txt" fieldIndex="2"/>
|
|||
|
</relationship>
|
|||
|
</relationships>
|
|||
|
</archive>
|
|||
|
</pre>
|
|||
|
<br/>
|
|||
|
<p>
|
|||
|
<b>Note:</b><br/>
|
|||
|
Although feasible, it is <b>not</b> recommended to express a relationship from one file to itself.
|
|||
|
This recommendation is made since no description of the relationship type may be expressed.
|
|||
|
</p>
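<p>Before publishing related files, a data provider could verify the relationship with a check along the following lines. This Python sketch assumes the rows are already parsed into lists of strings; the function name <code>find_orphans</code> and the sample values are hypothetical.</p>

```python
def find_orphans(target_rows, target_index, source_rows, source_index):
    """Return source rows whose key has no matching row in the target file."""
    keys = {row[target_index] for row in target_rows if row[target_index] != ""}
    return [
        row for row in source_rows
        # An empty value means "not related" and is never an orphan.
        if row[source_index] != "" and row[source_index] not in keys
    ]


specimens = [["1001", "3"], ["1002", "1"]]
identifications = [
    ["1", "A. Collector", "1001", "Buxbaumia piperi"],
    ["2", "A. Collector", "", "Buxbaumia piperi"],      # not related
    ["3", "A. Collector", "9999", "Buxbaumia piperi"],  # target does not exist
]

# Mirrors the relationship above: specimens field 0, identifications field 2.
orphans = find_orphans(specimens, 0, identifications, 2)
```

<p>Any row returned points at a target that does not exist and would need to be corrected before the files are published.</p>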
|
|||
|
|
|||
|
<H3>5.2 Field Type Guidelines</H3>
|
|||
|
<p>
|
|||
|
Most terms should be typed as "string" with the exception of the following terms, which are listed with proposed types:
|
|||
|
</p>
|
|||
|
<table class="border">
|
|||
|
<thead>
|
|||
|
<caption>Non-string term mappings</caption>
|
|||
|
<th>Term</th>
|
|||
|
<th>Recommended Types</th>
|
|||
|
<th>Comments</th>
|
|||
|
</thead>
|
|||
|
<tbody>
|
|||
|
<tr><td>http://rs.tdwg.org/dwc/terms/DateIdentified</td><td>dateTime, date, string</td><td></td></tr>
|
|||
|
<tr><td>http://rs.tdwg.org/dwc/terms/EarliestDateCollected</td><td>dateTime, date, string</td><td></td></tr>
|
|||
|
<tr><td>http://rs.tdwg.org/dwc/terms/EventAttributeDeterminedDate</td><td>dateTime, date, string</td><td></td></tr>
|
|||
|
<tr><td>http://rs.tdwg.org/dwc/terms/LatestDateCollected</td><td>dateTime, date, string</td><td></td></tr>
|
|||
|
<tr><td>http://rs.tdwg.org/dwc/terms/SampleAttributeDeterminedDate</td><td>dateTime, date, string</td><td></td></tr>
|
|||
|
<tr><td>http://rs.tdwg.org/dwc/terms/VerbatimCollectingDate</td><td>dateTime, date, string</td><td></td></tr>
|
|||
|
<tr><td>http://rs.tdwg.org/dwc/terms/CoordinatePrecision</td><td>decimal, integer, string</td><td></td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/CoordinateUncertaintyInMeters</td><td>decimal, integer, string</td><td></td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/DistanceAboveSurfaceInMetersMaximum</td><td>decimal, integer, string</td><td></td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/DistanceAboveSurfaceInMetersMinimum</td><td>decimal, integer, string</td><td></td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/EventAttributeAccuracy</td><td>decimal, integer, string</td><td></td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/EventAttributeValue</td><td>decimal, integer, string</td><td></td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/MaximumDepthInMeters</td><td>decimal, integer, string</td><td></td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/MaximumElevationInMeters</td><td>decimal, integer, string</td><td></td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/MinimumDepthInMeters</td><td>decimal, integer, string</td><td></td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/MinimumElevationInMeters</td><td>decimal, integer, string</td><td></td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/SampleAttributeAccuracy</td><td>decimal, integer, string</td><td></td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/SampleAttributeValue</td><td>decimal, integer, string</td><td></td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/VerbatimDepth</td><td>decimal, integer, string</td><td></td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/DecimalLatitude</td><td>decimal, string</td><td></td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/DecimalLongitude</td><td>decimal, string</td><td></td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/CatalogNumberNumeric</td><td>integer</td><td></td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/DayOfMonth</td><td>integer, string</td><td>using 1 as 1st of the month</td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/EndDayOfYear</td><td>integer, string</td><td></td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/IndividualCount</td><td>integer, string</td><td></td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/MonthOfYear</td><td>integer, string</td><td>using 1 as January</td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/PointRadiusSpatialFit</td><td>integer, string</td><td></td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/StartDayOfYear</td><td>integer, string</td><td>using 1 as January 1st</td></tr>
<tr><td>http://rs.tdwg.org/dwc/terms/YearSampled</td><td>integer, string</td><td>in the format CCYY e.g. 2001</td></tr>
|
|||
|
<tr><td>http://rs.tdwg.org/dwc/terms/EndTimeOfDay</td><td>time, string</td><td></td></tr>
|
|||
|
<tr><td>http://rs.tdwg.org/dwc/terms/StartTimeOfDay</td><td>time, string</td><td></td></tr>
|
|||
|
</tbody>
|
|||
|
</table>
|
|||
|
|
|||
|
<H2>6. Database exporting examples</H2>
|
|||
|
<H3>6.1 MySQL</H3>
|
|||
|
Using the <code>SELECT ... INTO OUTFILE</code> statement it is very easy to produce <em>fielded text</em> from MySQL.<br/>
The encoding of the resulting file will depend on the server variables and collations used, which might need to be modified before the operation.
It is worth noting that MySQL will represent NULL values as \N by default, and therefore the IFNULL() function should be used to replace them with empty values.
|
|||
|
<pre class="example">
|
|||
|
SELECT
|
|||
|
IFNULL(id, ''), IFNULL(scientific_name, ''), IFNULL(count,'')
|
|||
|
INTO OUTFILE '/tmp/dwc.txt'
|
|||
|
FIELDS TERMINATED BY ','
|
|||
|
OPTIONALLY ENCLOSED BY '"'
|
|||
|
LINES TERMINATED BY '\n'
|
|||
|
FROM
|
|||
|
dwc;
|
|||
|
</pre>
|
|||
|
|
|||
|
<h2>7. Guidelines for consumers</h2>
|
|||
|
It is beyond the scope of these guidelines to specify how a consumer must deal with related data. However, the following procedure is recommended for a database import:
|
|||
|
<ul>
|
|||
|
<li>Create tables for each described file with no constraints</li>
|
|||
|
<li>Import file content into temporary tables</li>
|
|||
|
<li>Check data integrity by testing the expressed join</li>
|
|||
|
<li>Copy data into tables enforcing the relationship, or add constraint to newly created tables</li>
|
|||
|
</ul>
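<p>The procedure above might look as follows for the specimen and identification example, sketched in Python with an in-memory SQLite database. The table names, column names and the function <code>import_related</code> are assumptions made for this illustration; a real import would derive them from the metafile.</p>

```python
import sqlite3


def import_related(specimens, identifications):
    """Stage the rows, test the expressed join, then load constrained tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    # 1. Create one unconstrained staging table per described file.
    conn.execute("CREATE TABLE tmp_specimen (catalog_number TEXT, individual_count TEXT)")
    conn.execute(
        "CREATE TABLE tmp_identification (identification_id TEXT, identified_by TEXT,"
        " catalog_number TEXT, scientific_name TEXT)"
    )
    # 2. Import the file content; an empty key means "not related" and becomes NULL.
    conn.executemany("INSERT INTO tmp_specimen VALUES (?, ?)", specimens)
    conn.executemany("INSERT INTO tmp_identification VALUES (?, ?, ?, ?)", identifications)
    conn.execute("UPDATE tmp_identification SET catalog_number = NULL WHERE catalog_number = ''")
    # 3. Check data integrity by testing the expressed join.
    orphans = conn.execute(
        "SELECT i.identification_id FROM tmp_identification AS i"
        " LEFT JOIN tmp_specimen AS s ON s.catalog_number = i.catalog_number"
        " WHERE i.catalog_number IS NOT NULL AND s.catalog_number IS NULL"
    ).fetchall()
    # 4. Copy into tables that enforce the relationship, only if the check passed.
    if not orphans:
        conn.execute(
            "CREATE TABLE specimen (catalog_number TEXT PRIMARY KEY, individual_count INTEGER)"
        )
        conn.execute(
            "CREATE TABLE identification (identification_id TEXT PRIMARY KEY, identified_by TEXT,"
            " catalog_number TEXT REFERENCES specimen (catalog_number), scientific_name TEXT)"
        )
        conn.execute("INSERT INTO specimen SELECT * FROM tmp_specimen")
        conn.execute("INSERT INTO identification SELECT * FROM tmp_identification")
        conn.commit()
    return conn, orphans


conn, orphans = import_related(
    [("1001", "3"), ("1002", "1")],
    [("1", "A. Collector", "1001", "Buxbaumia piperi"),
     ("2", "A. Collector", "", "Buxbaumia piperi")],
)
```

<p>When the check finds orphaned rows, the sketch leaves the staging tables in place and returns the offending identifiers so that they can be reported back to the data provider.</p>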
|
|||
|
|
|||
|
<!-- Footer -->
|
|||
|
<hr>
|
|||
|
<p><a rel="license" href="http://creativecommons.org/licenses/by/3.0/us/">
|
|||
|
<img alt="Creative Commons License" id="creative_commons_icon" src="http://i.creativecommons.org/l/by/3.0/88x31.png" /></a>
|
|||
|
Copyright 2009 - Biodiversity Information Standards - TDWG - <a href="http://www.tdwg.org/about-tdwg/contact-us/">Contact Us</a><br/>
|
|||
|
<p>Except where otherwise noted, content on this site is licensed under a
|
|||
|
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/us/"> Creative Commons
|
|||
|
Attribution 3.0 United States License</a>.</p>
|
|||
|
|
|||
|
</BODY></HTML>
|