Title: | Darwin Core Text Guidelines |
---|---|
Date Issued: | 2009-02-12 |
Abstract: | Guidelines for the implementation of Darwin Core in XML. |
Contributors: | John Wieczorek (MVZ) |
Legal: | This document is governed by the standard legal, copyright, licensing provisions and disclaimers issued by the Taxonomic Databases Working Group. |
Part of TDWG Standard: | ***URL to DwC Standard*** goes here |
Creator: | Darwin Core Task Group |
Identifier: | http://rs.tdwg.org/dwc/terms/xsd/guide/2009-02-12/ |
Latest Version: | http://rs.tdwg.org/dwc/terms/xsd/guide/ |
Replaces: | Not applicable |
Replaced By: | Not applicable |
Translations: | http://rs.tdwg.org/dwc/translations/ |
Document Status: | This is a TDWG Request for Comment. |
1. Introduction |
2. References |
3. Terminology |
4. General implementation recommendations |
This document provides guidelines for the description of Darwin Core data residing in fielded text files (e.g. comma-separated values, tab-delimited files, etc.)
by means of an XML metafile.
Many resources exist on the web describing the advantages of XML (http://en.wikipedia.org/wiki/XML) over less structured content such as fielded text.
These guidelines do not promote the use of fielded text over XML for data files, but rather provide recommendations for how to handle such data files when necessary.
2 such scenarios might be
Proposed standards exist for similar XML metafiles to describe fielded text files, such as the FieldedText standard. The FieldedText standard aims to describe any fielded text file, including all possible permutations of content. While beneficial to the publisher, this flexibility poses significant challenges to the consumer because of the diverse options that may exist.
    ID,ScientificName,IndividualCount
    123,"Cryptantha gypsophila Reveal & C.R. Broome",12
    124,"Buxbaumia piperi",2

can be described with the following illustrative Darwin Core metafile (namespaces omitted for example):
    <archive fileRoot="http://data.gbif.org/download/">
      <file rowType="http://rs.tdwg.org/dwc/text/DarwinRecord" location="specimens.csv" ignoreHeaderLines="1">
        <field index="0" term="http://rs.tdwg.org/dwc/terms/CatalogNumber" type="xs:integer"/>
        <field index="1" term="http://rs.tdwg.org/dwc/terms/ScientificName" type="xs:string"/>
        <field index="2" term="http://rs.tdwg.org/dwc/terms/IndividualCount" type="xs:integer"/>
        <!-- A constant value has no index, but applies to all rows -->
        <field term="http://rs.tdwg.org/dwc/terms/DatasetID" type="xs:string" default="urn:lsid:tim.lsid.tdwg.org:collections:1"/>
      </file>
    </archive>
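As a rough illustration of how a consumer might apply such a metafile, the sketch below parses the example with Python's standard library and maps each data row to a dictionary keyed by term URI. The helper name `read_records` and the inlined sample strings are assumptions made for illustration. Namespaces are omitted as in the example, declared types are ignored, and the comma delimiter is set explicitly because the example metafile does not declare one.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Metafile and data from the example above, inlined so the sketch is self-contained.
METAFILE = """
<archive fileRoot="http://data.gbif.org/download/">
  <file rowType="http://rs.tdwg.org/dwc/text/DarwinRecord"
        location="specimens.csv" ignoreHeaderLines="1">
    <field index="0" term="http://rs.tdwg.org/dwc/terms/CatalogNumber" type="xs:integer"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/ScientificName" type="xs:string"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/IndividualCount" type="xs:integer"/>
    <field term="http://rs.tdwg.org/dwc/terms/DatasetID" type="xs:string"
           default="urn:lsid:tim.lsid.tdwg.org:collections:1"/>
  </file>
</archive>
"""

DATA = '''ID,ScientificName,IndividualCount
123,"Cryptantha gypsophila Reveal & C.R. Broome",12
124,"Buxbaumia piperi",2
'''


def read_records(metafile_xml, data_text):
    """Hypothetical helper: yield one {term URI: value} dict per data row."""
    file_el = ET.fromstring(metafile_xml).find("file")
    skip = int(file_el.get("ignoreHeaderLines", "0"))
    fields = file_el.findall("field")
    # Assumption: the data is comma separated, so the delimiter is given explicitly.
    rows = list(csv.reader(io.StringIO(data_text), delimiter=","))
    for row in rows[skip:]:
        record = {}
        for field in fields:
            index = field.get("index")
            if index is None:
                # No index: the default acts as a constant for every row.
                record[field.get("term")] = field.get("default")
            else:
                record[field.get("term")] = row[int(index)]
        yield record


records = list(read_records(METAFILE, DATA))
```

Every value is kept as a string here; converting values to their declared types is a separate step.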
[DCTERMS] | http://dublincore.org/documents/dcmi-terms/ | Dublin Core Metadata terms. |
[FIELDEDTEXT] | http://www.fieldedtext.org/ | Fielded Text proposed standard. |
[HISTORY] | http://rs.tdwg.org/dwc/terms/history/ | Complete historical reference to Darwin Core terms. |
[NAMESPACEPOLICY] | http://rs.tdwg.org/dwc/terms/namespace/ | Policy governing Darwin Core terms. |
[TERMS] | http://rs.tdwg.org/dwc/terms/ | Quick reference to recommended Darwin Core terms. |
[TEXTSCHEMA] | http://rs.tdwg.org/dwc/terms/xsd/tdwg_dwc_text.xsd | Simple Darwin Core Text schema. |
[VERSIONS] | http://rs.tdwg.org/dwc/terms/history/versions/ | Reference for mapping historical Darwin Core terms to the current recommended terms. |
[XML] | http://www.w3.org/XML/ | Reference site for the Extensible Markup Language (XML). |
The metafile schema is available at tdwg_dwc_text.xsd.
Attribute | Description | Required | Default |
---|---|---|---|
fileRoot | Contains a qualified Uniform Resource Locator (URL) defining the root location of the data files being described, which must be publicly accessible. Valid examples of the format include http://data.gbif.org/collections/, ftp://ftp.gbif.org/public/ and http://data.gbif.org/webservices/export?id=. This value will be concatenated with the location of the <file> and therefore should contain any necessary trailing characters such as / or ?. | ✓ |
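The concatenation rule for fileRoot can be sketched as follows. The helper name `resolve_location` and the location values are made up for illustration. The point is that the two strings are simply joined, which is why the root must carry its own trailing `/` or `?id=`.

```python
from urllib.parse import urljoin


def resolve_location(file_root, location):
    """Hypothetical helper: plain string concatenation, as the description states."""
    return file_root + location


urls = [
    resolve_location("http://data.gbif.org/collections/", "dwc-data.txt"),
    resolve_location("ftp://ftp.gbif.org/public/", "dwc-data.txt"),
    # "42" is an invented location value for the web service style root.
    resolve_location("http://data.gbif.org/webservices/export?id=", "42"),
]

# For contrast: generic relative-URL resolution discards the query string of
# the base, so it does not produce the concatenated form described above.
resolved_by_urljoin = urljoin("http://data.gbif.org/webservices/export?id=", "42")
```

A consumer that used standard relative-URL resolution instead would therefore fetch a different address for the web service case.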
Element | Description |
---|---|
<file> | An <archive> will contain one or more <file> elements, each representing an individual file being described. |
Attribute | Description | Required | Default |
---|---|---|---|
location | Specifies the location of the file relative to the fileRoot - e.g. dwc-data.txt | ✓ | |
fieldsTerminatedBy | Specifies the delimiter between fields. Typical values might be "," or "\t" for CSV or tab-delimited files respectively. | | \t |
linesTerminatedBy | Specifies the row separator character. | | \n |
compression | Specifies the compression used for the file. May be omitted or specified as one of:
| ||
encoding | Specifies the encoding for the data file. One of:
|
ISO-8859-1 | |
ignoreHeaderLines | Specifies the number of lines to ignore from the beginning of the file. This can be used to skip column headings or preamble comments, for example. | | 0 |
rowType |
A Uniform Resource Identifier (URI) for the term identifying the class of data represented by each row.
See Darwin Core Terms definitions. Additional classes may be referenced by URI and defined outside the Darwin Core specification.
For convenience the classes defined by Darwin Core are listed below:
|
✓ | |
dateFormat | When verbatim dates are used, this field can be used to indicate the format represented. It is recommended to use the date, dateTime and time field types wherever possible, but where verbatim dates are required, a format may be specified here.
This should be considered a 'hint' for consumers. It is recommended that consumers support the minimum combinations of DD MM and YYYY with the separators / and -. Examples are given:
|
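The defaults in the table above can be read as follows: a consumer that finds no attribute on a `<file>` element would decode the file as ISO-8859-1, split rows on `\n` and fields on `\t`, and skip no header lines. The sketch below is a minimal illustration under stated assumptions: the helper name `split_rows` is hypothetical, attribute values are assumed to have been converted to the actual characters already, and quoted fields and compression are not handled.

```python
# Defaults taken from the attribute table above.
DEFAULTS = {
    "fieldsTerminatedBy": "\t",
    "linesTerminatedBy": "\n",
    "encoding": "ISO-8859-1",
    "ignoreHeaderLines": "0",
}


def split_rows(raw_bytes, file_attrs):
    """Hypothetical helper: decode and split a data file per its <file> attributes."""
    attrs = {**DEFAULTS, **file_attrs}
    text = raw_bytes.decode(attrs["encoding"])
    lines = text.split(attrs["linesTerminatedBy"])
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece left after a trailing terminator
    lines = lines[int(attrs["ignoreHeaderLines"]):]
    return [line.split(attrs["fieldsTerminatedBy"]) for line in lines]


# Tab-delimited sample with one heading line; only ignoreHeaderLines is declared.
sample = "id\tname\n1\tBuxbaumia piperi\n".encode("ISO-8859-1")
rows = split_rows(sample, {"ignoreHeaderLines": "1"})
```

A production reader would more likely delegate to a CSV library so that enclosed fields containing the delimiter are handled correctly.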
Element | Description |
---|---|
<field> | A <file> will contain one or more <field> elements, each representing a 'column' in the row |
Attribute | Description | Required | Default |
---|---|---|---|
index | Specifies the column index from the row. The first column is column 0, the second column 1 etc. If no column index is specified, then the term and the default may be used to define a constant value for all rows | ||
term | A Uniform Resource Identifier (URI) for the term identifying the property of data represented by this field. For example, a scientific name would be http://rs.tdwg.org/dwc/terms/ScientificName. Terms outside of the Darwin Core specification may be used, such as those from the Dublin Core Metadata Initiative. | ✓ | |
type | Specifies the type of the content represented in the column. The following values are supported.
|
string | |
format | TODO - finish decision on format | ||
default | Used to optionally specify a default value should there not be one supplied in any given row. If no index is supplied, this can be used to define a constant applicable to all rows. |
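The interplay between index and default described in the table can be sketched as a small resolution function. The name `field_value` is hypothetical, and treating an empty string as "no value supplied" is an assumption of this sketch, not something the guidelines state.

```python
def field_value(row, index, default):
    """Hypothetical helper: resolve one <field> for one row.

    index is None for a constant field; default is None when none is declared.
    """
    if index is None:
        # No index: the default is a constant applicable to all rows.
        return default
    value = row[index] if index < len(row) else ""
    # Assumption: an empty string counts as "no value supplied".
    return value if value != "" else default


row = ["123", ""]
catalog_number = field_value(row, 0, None)
individual_count = field_value(row, 1, "1")
dataset_id = field_value(row, None, "urn:lsid:tim.lsid.tdwg.org:collections:1")
```

Here `catalog_number` comes from the row, `individual_count` falls back to its default because the column is empty, and `dataset_id` is the constant.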
    <!-- Namespaces omitted for example -->
    <archive fileRoot="http://mydata.org/">
      <file rowType="http://rs.tdwg.org/dwc/terms/text/DarwinRecord" location="specimens.txt">
        <field index="0" term="http://rs.tdwg.org/dwc/terms/CatalogNumber" type="xs:integer"/>
        <field index="1" term="http://rs.tdwg.org/dwc/terms/ScientificName" type="xs:string"/>
      </file>
    </archive>

    <!-- Namespaces omitted for example -->
    <archive fileRoot="http://mydata.org/">
      <file rowType="http://rs.tdwg.org/dwc/text/DarwinRecord" location="aves.txt">
        <!-- field definitions omitted for example -->
      </file>
      <file rowType="http://rs.tdwg.org/dwc/text/DarwinRecord" location="lepidoptera.txt">
        <!-- field definitions omitted for example -->
      </file>
    </archive>

    <!-- Namespaces omitted for example -->
    <archive fileRoot="http://mydata.org/">
      <file rowType="http://rs.tdwg.org/dwc/terms/Sample" location="specimens.txt">
        <field index="0" term="http://rs.tdwg.org/dwc/terms/CatalogNumber"/>
        <field index="1" term="http://rs.tdwg.org/dwc/terms/IndividualCount"/>
      </file>
      <file rowType="http://rs.tdwg.org/dwc/terms/Identification" location="identifications.txt">
        <field index="0" term="http://rs.tdwg.org/dwc/terms/IdentificationID"/>
        <field index="1" term="http://rs.tdwg.org/dwc/terms/IdentifiedBy"/>
        <field index="2" term="http://rs.tdwg.org/dwc/terms/CatalogNumber"/>
        <field index="3" term="http://rs.tdwg.org/dwc/terms/ScientificName"/>
      </file>
      <relationships>
        <relationship>
          <file location="specimens.txt" fieldIndex="0"/>
          <file location="identifications.txt" fieldIndex="2"/>
        </relationship>
      </relationships>
    </archive>

Note:
Although feasible, it is not recommended to express a relationship from one file to itself.
This recommendation is made since no description of the relationship type may be expressed.
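A consumer might interpret the relationship in the example above as a join between column 0 of specimens.txt and column 2 of identifications.txt. The sketch below shows one way to do that in memory. The helper name `join_on` and all sample rows are invented for illustration, and treating the relationship as a one-to-many lookup is an assumption of this sketch.

```python
from collections import defaultdict

# Invented sample rows laid out like the two files in the example:
# specimens.txt: CatalogNumber, IndividualCount
specimens = [["1001", "3"], ["1002", "1"]]
# identifications.txt: IdentificationID, IdentifiedBy, CatalogNumber, ScientificName
identifications = [
    ["id-1", "A. Smith", "1001", "Buxbaumia piperi"],
    ["id-2", "B. Jones", "1001", "Buxbaumia aphylla"],
    ["id-3", "A. Smith", "1002", "Cryptantha gypsophila"],
]


def join_on(left_rows, left_index, right_rows, right_index):
    """Hypothetical helper: pair each left row with the right rows sharing its key."""
    by_key = defaultdict(list)
    for row in right_rows:
        by_key[row[right_index]].append(row)
    return [(left, by_key.get(left[left_index], [])) for left in left_rows]


# fieldIndex="0" in specimens.txt relates to fieldIndex="2" in identifications.txt.
joined = join_on(specimens, 0, identifications, 2)
```

Each entry of `joined` holds a specimen row together with the list of identification rows that reference its catalog number.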
Most terms should be typed as "string" with the exception of the following terms, which are listed with proposed types:
Term | Recommended Types | Comments |
---|---|---|
http://rs.tdwg.org/dwc/terms/DateIdentified | dateTime, date, string | |
http://rs.tdwg.org/dwc/terms/EarliestDateCollected | dateTime, date, string | |
http://rs.tdwg.org/dwc/terms/EventAttributeDeterminedDate | dateTime, date, string | |
http://rs.tdwg.org/dwc/terms/LatestDateCollected | dateTime, date, string | |
http://rs.tdwg.org/dwc/terms/SampleAttributeDeterminedDate | dateTime, date, string | |
http://rs.tdwg.org/dwc/terms/VerbatimCollectingDate | dateTime, date, string | |
http://rs.tdwg.org/dwc/terms/CoordinatePrecision | decimal, int, string | |
http://rs.tdwg.org/dwc/terms/CoordinateUncertaintyInMeters | decimal, int, string | |
http://rs.tdwg.org/dwc/terms/DistanceAboveSurfaceInMetersMaximum | decimal, int, string | |
http://rs.tdwg.org/dwc/terms/DistanceAboveSurfaceInMetersMinimum | decimal, int, string | |
http://rs.tdwg.org/dwc/terms/EventAttributeAccuracy | decimal, int, string | |
http://rs.tdwg.org/dwc/terms/EventAttributeValue | decimal, int, string | |
http://rs.tdwg.org/dwc/terms/MaximumDepthInMeters | decimal, int, string | |
http://rs.tdwg.org/dwc/terms/MaximumElevationInMeters | decimal, int, string | |
http://rs.tdwg.org/dwc/terms/MinimumDepthInMeters | decimal, int, string | |
http://rs.tdwg.org/dwc/terms/MinimumElevationInMeters | decimal, int, string | |
http://rs.tdwg.org/dwc/terms/SampleAttributeAccuracy | decimal, int, string | |
http://rs.tdwg.org/dwc/terms/SampleAttributeValue | decimal, int, string | |
http://rs.tdwg.org/dwc/terms/VerbatimDepth | decimal, int, string | |
http://rs.tdwg.org/dwc/terms/DecimalLatitude | decimal, string | |
http://rs.tdwg.org/dwc/terms/DecimalLongitude | decimal, string | |
http://rs.tdwg.org/dwc/terms/CatalogNumberNumeric | int | |
http://rs.tdwg.org/dwc/terms/DayOfMonth | int, string | using 1 as 1st of the month |
http://rs.tdwg.org/dwc/terms/EndDayOfYear | int, string | |
http://rs.tdwg.org/dwc/terms/IndividualCount | int, string | |
http://rs.tdwg.org/dwc/terms/MonthOfYear | int, string | using 1 as January |
http://rs.tdwg.org/dwc/terms/PointRadiusSpatialFit | int, string | |
http://rs.tdwg.org/dwc/terms/StartDayOfYear | int, string | using 1 as January 1st |
http://rs.tdwg.org/dwc/terms/YearSampled | int, string | in the format CCYY e.g. 2001 |
http://rs.tdwg.org/dwc/terms/EndTimeOfDay | time, string | |
http://rs.tdwg.org/dwc/terms/StartTimeOfDay | time, string |
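Since each term in the table lists several acceptable types ending in string, one plausible reading is that a consumer tries the more specific types first and falls back to the raw string. The sketch below illustrates that idea. The function name `coerce`, the use of ISO 8601 parsing, and the order in which the caller lists the types are all assumptions of this sketch and are not prescribed by the guidelines.

```python
from datetime import date, datetime, time
from decimal import Decimal, InvalidOperation


def parse_datetime(value):
    # Require the "T" separator so that date-only values fall through to "date".
    if "T" not in value:
        raise ValueError("not a dateTime: " + value)
    return datetime.fromisoformat(value)


PARSERS = {
    "dateTime": parse_datetime,
    "date": date.fromisoformat,
    "time": time.fromisoformat,
    "decimal": Decimal,
    "int": int,
    "string": str,
}


def coerce(value, types):
    """Hypothetical helper: return value parsed as the first type that accepts it."""
    for name in types:
        try:
            return PARSERS[name](value)
        except (ValueError, InvalidOperation):
            continue
    return value  # nothing matched: keep the verbatim string


count = coerce("12", ("int", "string"))
latitude = coerce("37.87", ("decimal", "string"))
identified = coerce("2009-02-12", ("dateTime", "date", "string"))
verbatim = coerce("spring 2009", ("dateTime", "date", "string"))
```

With this approach a value such as "spring 2009" is preserved unchanged rather than rejected, which matches the presence of string as the last option for every term.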
Using the select into outfile command, it is very easy to produce fielded text from MySQL:

    SELECT IFNULL(id, ''), IFNULL(scientific_name, ''), IFNULL(count, '')
    INTO OUTFILE '/tmp/dwc.txt'
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    FROM dwc;

Copyright 2009 - Biodiversity Information Standards - TDWG
Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 3.0 United States License.