mirror of https://github.com/tdwg/dwc.git
commit
686fb33dd8
|
@ -0,0 +1,194 @@
|
|||
# Simple Darwin Core
|
||||
|
||||
Title
|
||||
: Simple Darwin Core
|
||||
|
||||
Date version issued
|
||||
: 2015-06-02
|
||||
|
||||
Date created
|
||||
: 2009-04-21
|
||||
|
||||
Part of TDWG Standard
|
||||
: <http://www.tdwg.org/standards/450/>
|
||||
|
||||
This version
|
||||
: <http://rs.tdwg.org/dwc/terms/simple/2014-11-08>
|
||||
|
||||
Latest Version
|
||||
: <http://rs.tdwg.org/dwc/terms/simple/>
|
||||
|
||||
Previous version
|
||||
: <http://rs.tdwg.org/dwc/terms/simple/2013-10-22>
|
||||
|
||||
Replaced by
|
||||
: <http://rs.tdwg.org/dwc/terms/simple/2021-07-15>
|
||||
|
||||
Abstract
|
||||
: This document is a reference for the Simple Darwin Core standard.
|
||||
|
||||
Contributors
|
||||
: John Wieczorek (MVZ), Markus Döring (GBIF), Renato De Giovanni (CRIA), Tim Robertson (GBIF), Dave Vieglais (KUNHM)
|
||||
|
||||
Creator
|
||||
: Darwin Core Task Group
|
||||
|
||||
Bibliographic citation
|
||||
: Darwin Core Task Group. 2014. Simple Darwin Core. Biodiversity Information Standards (TDWG). <http://rs.tdwg.org/dwc/terms/simple/2014-11-08>
|
||||
|
||||
## 1 Introduction
|
||||
|
||||
Simple Darwin Core is a predefined subset of the terms that have common use across a wide variety of biodiversity applications. The terms used in Simple Darwin Core are those that are found at the cross-section of taxonomic names, places, and events that document biological occurrences on the planet. The two driving principles are simplicity and flexibility.
|
||||
|
||||
### 1.1 Status of the content of this document
|
||||
|
||||
All sections of this document are normative, except for examples, which are explicitly marked as non-normative.
|
||||
|
||||
## 2 Audience
|
||||
|
||||
This document is targeted toward those who want to share biodiversity information using the simplest methods and structure: Simple Darwin Core. It explains the uses and limitations of this structure and how to expand upon it.
|
||||
|
||||
## 3 What makes it simple?
|
||||
|
||||
Simple Darwin Core is simple in that it assumes (and allows) no structure beyond the concept of rows and columns, which might be thought of as attributes and their values, or fields and records. The words field and record will be used throughout the rest of the document to refer to the two dimensions of the Simple Darwin Core structure. Think of the term names as the field names. In other words, a Simple Darwin Core record could be captured in a spreadsheet or in a single database table.
|
||||
|
||||
## 4 What makes it flexible?
|
||||
|
||||
Simple Darwin Core has minimal restrictions on which fields are required (none). You might argue that there should be more required fields, that there isn't anything useful you can do without them. That is partially true. A record with no fields in it wouldn't be very interesting, but there is a difference between requiring that there be a field in a record and requiring that a particular field be in all records. By having no required field restriction, Simple Darwin Core can be used to share any meaningful combination of fields - for example, to share "just names", or "just places", or observations of individuals detected in the wild at a given place and time following a method (an occurrence). This flexibility promotes the reuse of the terms and sharing mechanisms for a wide variety of services.
|
||||
|
||||
## 5 Are there any rules?
|
||||
|
||||
There are just a few general guiding principles on how to make the best use of Simple Darwin Core:
|
||||
|
||||
1. Any Darwin Core term name can be used as a field name.
|
||||
2. No field name may be repeated in a record.
|
||||
3. Do not use a _Class_ ([`Occurrence`](http://rs.tdwg.org/dwc/terms/Occurrence), [`Organism`](http://rs.tdwg.org/dwc/terms/Organism), [`MaterialSample`](http://rs.tdwg.org/dwc/terms/MaterialSample), [`LivingSpecimen`](http://rs.tdwg.org/dwc/terms/LivingSpecimen), [`PreservedSpecimen`](http://rs.tdwg.org/dwc/terms/PreservedSpecimen), [`FossilSpecimen`](http://rs.tdwg.org/dwc/terms/FossilSpecimen), [`Event`](http://rs.tdwg.org/dwc/terms/Event), [`HumanObservation`](http://rs.tdwg.org/dwc/terms/HumanObservation), [`MachineObservation`](http://rs.tdwg.org/dwc/terms/MachineObservation), [`Location`](http://rs.tdwg.org/dwc/terms/Location), [`GeologicalContext`](http://rs.tdwg.org/dwc/terms/GeologicalContext), [`Identification`](http://rs.tdwg.org/dwc/terms/Identification), [`Taxon`](http://rs.tdwg.org/dwc/terms/Taxon)) as a field.
|
||||
4. Provide data in as many fields as you can.
|
||||
5. Use the [`dcterms:type`](http://rs.tdwg.org/dwc/terms/dcterms:type) field to provide the name of the Dublin Core type class (`PhysicalObject`, `StillImage`, `MovingImage`, `Sound`, `Text`) that the record represents.
|
||||
6. Use the [`basisOfRecord`](http://rs.tdwg.org/dwc/terms/basisOfRecord) field to provide the name of the most specific Darwin Core class (`LivingSpecimen`, `PreservedSpecimen`, `FossilSpecimen`, `MaterialSample`, `HumanObservation`, `MachineObservation`, `Event`, `Occurrence`, `Taxon`, `Identification`, `Organism`, `Location`, `GeologicalContext`, `MeasurementOrFact`, `ResourceRelationship`) the record represents.
|
||||
7. Populate fields with data that match the definition of the field.
|
||||
8. Use the controlled vocabulary for the values of fields that recommend them.
|
||||
9. If data are withheld, use [`informationWithheld`](http://rs.tdwg.org/dwc/terms/informationWithheld) to say so.
|
||||
10. If data are shared in lower quality than the original, use [`dataGeneralizations`](http://rs.tdwg.org/dwc/terms/dataGeneralizations) to say so.
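Rules 2 and 3 are mechanical enough to check in code. The following is a minimal sketch in Python, not part of the standard: the helper name `check_field_names` is hypothetical, and the class list is copied from rule 3 above.

```python
from collections import Counter

# Class names listed in rule 3; they are not to be used as field names.
DWC_CLASS_NAMES = {
    "Occurrence", "Organism", "MaterialSample", "LivingSpecimen",
    "PreservedSpecimen", "FossilSpecimen", "Event", "HumanObservation",
    "MachineObservation", "Location", "GeologicalContext",
    "Identification", "Taxon",
}


def check_field_names(field_names):
    """Return a list of problems found in a header row.

    Hypothetical helper: it covers only rule 2 (no repeated field names)
    and rule 3 (no class names used as fields).
    """
    problems = []
    counts = Counter(field_names)
    for name, count in counts.items():
        if count > 1:
            problems.append(f"field name repeated {count} times: {name}")
    for name in field_names:
        if name in DWC_CLASS_NAMES:
            problems.append(f"class name used as a field: {name}")
    return problems


header = ["scientificName", "Taxon", "country", "scientificName"]
for problem in check_field_names(header):
    print(problem)
```

The remaining rules concern the meaning of the data rather than the shape of the header, so a check like this cannot cover them.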
|
||||
|
||||
Every field in Simple Darwin Core may appear either once or not at all in a single record - otherwise how could you distinguish one [`scientificName`](http://rs.tdwg.org/dwc/terms/scientificName) field from another one? Think of a database table. It will not allow you to have the same name for two different fields. Because of this design restriction (lack of flexibility for the sake of simplicity), the auxiliary fields from the [`MeasurementOrFact`](http://rs.tdwg.org/dwc/terms/MeasurementOrFact) and [`ResourceRelationship`](http://rs.tdwg.org/dwc/terms/ResourceRelationship) classes are of somewhat limited utility here - you could only share one `MeasurementOrFact` and one `ResourceRelationship` per record. You might argue then that there is no way to share information that requires related structures, such as a history of identifications of a specimen. That is mostly true. The only recourse within Simple Darwin Core is to force the data into one of the catch all "list" terms such as [`recordedBy`](http://rs.tdwg.org/dwc/terms/recordedBy), [`preparations`](http://rs.tdwg.org/dwc/terms/preparations), [`otherCatalogNumbers`](http://rs.tdwg.org/dwc/terms/otherCatalogNumbers), [`associatedMedia`](http://rs.tdwg.org/dwc/terms/associatedMedia), [`associatedReferences`](http://rs.tdwg.org/dwc/terms/associatedReferences), [`associatedSequences`](http://rs.tdwg.org/dwc/terms/associatedSequences), [`associatedTaxa`](http://rs.tdwg.org/dwc/terms/associatedTaxa), [`associatedOccurrences`](http://rs.tdwg.org/dwc/terms/associatedOccurrences), [`associatedOrganisms`](http://rs.tdwg.org/dwc/terms/associatedOrganisms), [`previousIdentifications`](http://rs.tdwg.org/dwc/terms/previousIdentifications), [`higherGeography`](http://rs.tdwg.org/dwc/terms/higherGeography), [`georeferencedBy`](http://rs.tdwg.org/dwc/terms/georeferencedBy), [`georeferenceSources`](http://rs.tdwg.org/dwc/terms/georeferenceSources), [`identifiedBy`](http://rs.tdwg.org/dwc/terms/identifiedBy), 
[`identificationReferences`](http://rs.tdwg.org/dwc/terms/identificationReferences), and [`higherClassification`](http://rs.tdwg.org/dwc/terms/higherClassification).
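As an illustration of the "list" terms, the sketch below collapses several values into one field and splits them again. It assumes a space, vertical bar, space delimiter (` | `); that choice is an assumption here, so check the comments of the term you are populating. The helper names and the collector names are hypothetical.

```python
# Assumed delimiter: space, vertical bar, space. Confirm against the
# comments of the specific term before relying on it.
LIST_DELIMITER = " | "


def join_values(values):
    """Collapse several values into one string for a single "list" field."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    return LIST_DELIMITER.join(cleaned)


def split_values(field_value):
    """Recover the individual values from a delimited "list" field."""
    return [part.strip() for part in field_value.split("|") if part.strip()]


record = {
    "scientificName": "Ctenomys sociabilis",
    # Placeholder names, for illustration only.
    "recordedBy": join_values(["A. Collector", "B. Collector"]),
}
print(record["recordedBy"])
print(split_values(record["recordedBy"]))
```

A value that itself contains a vertical bar would not survive the round trip, which is one more reason these fields are a workaround rather than a substitute for related structures.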
|
||||
|
||||
There is a difference between having data in a field and requiring that field to have a value from among a legal set of values. Darwin Core is simple in that it has minimal restrictions on the contents of fields. The term comments give recommendations about the use of controlled vocabularies and how to structure content wherever appropriate. Data contributors are encouraged to follow these recommendations as well as possible. You might argue that having no restrictions will promote "dirty" data (data of low quality or dubious value). Consider the simple axiom "It's not what you have, but what you do with it that matters." If data restrictions were in place at the fundamental level, then a record having any non-compliant data in any of its fields could not be shared via the standard. Not only would there be a dearth of shared data in that case (or an unused standard), but also there would be no way to use the standard to build shared data cleaning tools to actually improve the situation, nor to use data services to look up alternative representations (language translations, for example) to serve a broader audience. The rest is up to how the records will be used - in other words, it is up to applications to enforce further restrictions if appropriate, and it is up to the stakeholders of those applications to decide what the restrictions will be for the purpose the application is trying to serve.
|
||||
|
||||
## 6 How do I use Simple Darwin Core?
|
||||
|
||||
Darwin Core is simple in that data "complying with" Simple Darwin Core can be easily shared in a variety of ways, including, but not limited to, text files and XML documents. Equivalent ways of sharing the same data are described in the sections [Simple Darwin Core as Text](#61-simple-darwin-core-as-text) and [Simple Darwin Core as XML](#62-simple-darwin-core-as-xml).
|
||||
|
||||
What you need to do as a contributor of data via Simple Darwin Core depends on the requirements of the ones who are going to consume those data. For example, if you have a collaborator who wants to share data via Simple Darwin Core, then it may be sufficient to create a spreadsheet that contains column headers matching as many of the Darwin Core term names as you are both interested in sharing - just to be sure you both understand the meaning of the fields you share, and therefore hopefully something about their content. You might create a table in a database using Simple Darwin Core as a model (if it met all of your needs), and then connect that database with services for sharing via the web. You might use that same database (or spreadsheet) to export a comma-separated value (CSV) file for upload into a hosted service that could serve the data on your behalf. Or you might use that same file to upload into a service that would allow you to add value (such as a georeference) or quality (with a data cleaning tool), or to see your data in the context of other shared data.
|
||||
|
||||
### 6.1 Simple Darwin Core as text
|
||||
|
||||
The [Text guide](../text/) describes how to construct and format a text file using a simplified subset of the [Fielded Text](http://www.fieldedtext.org/) specification, which allows the contributor to describe the contents of a text file, or set of text files (related or not) through a separate configuration file (called a metafile). The metafile allows the contributor to communicate the structure of the content of the file or files and any relationships between them. Though it is good practice to describe a Simple Darwin Core file with such a metafile, it isn't strictly necessary if the file follows the CSV file specification and the first line of the file contains the field names. A `Fielded Text` metafile for any text file based on Simple Darwin Core can be created by customizing the [example metafile](../text/example_text_simpledwc_complete.xml), which includes references to all Darwin Core terms. Refer to the comments in the file itself as well as the metafile specification in the [Text guide](../text/) for more information.
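For the metafile-free case described above, a CSV file whose first line holds the field names can be read with any standard CSV library. The sketch below uses Python's `csv` module on an in-memory file; the rows are illustrative only, reusing names from the examples in this document.

```python
import csv
import io

# Illustrative data only: the first line holds Darwin Core term names.
csv_text = (
    "scientificName,taxonRank,kingdom\n"
    "Ctenomys sociabilis,species,Animalia\n"
    "Centropyge flavicauda,species,Animalia\n"
)

# DictReader takes the field names from the first line of the file.
reader = csv.DictReader(io.StringIO(csv_text))
records = list(reader)

print(reader.fieldnames)
for record in records:
    print(record["scientificName"], "-", record["taxonRank"])
```

Each record comes back as a mapping from term name to value, which matches the fields-and-records view of Simple Darwin Core.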
|
||||
|
||||
### 6.2 Simple Darwin Core as XML
|
||||
|
||||
The [XML guide](../xml/) describes how to construct XML schemas to share data based on Darwin Core terms. Looking at the [Simple Darwin Core XML Schema](../xml/tdwg_dwc_simple.xsd) using the XML guide as a reference you will be able to see that the schema supports the notion of a `SimpleDarwinRecord`, which is just a grouping of up to one of each of the Darwin Core terms that are `Properties` (not `Classes`).
|
||||
|
||||
#### 6.2.1 Example of Simple Darwin Core as XML (non-normative)
|
||||
|
||||
The following example shows a `SimpleDarwinRecordSet` containing one `SimpleDarwinRecord` for a `Taxon`:
|
||||
|
||||
```xml
<?xml version="1.0" encoding="UTF-8"?>
<SimpleDarwinRecordSet
    xmlns="http://rs.tdwg.org/dwc/xsd/simpledarwincore/"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://rs.tdwg.org/dwc/xsd/simpledarwincore/ http://rs.tdwg.org/dwc/xsd/tdwg_dwc_simple.xsd">
  <SimpleDarwinRecord>
    <dc:modified>2006-05-04T18:13:51.0Z</dc:modified>
    <dc:language>en</dc:language>
    <dwc:basisOfRecord>Taxon</dwc:basisOfRecord>
    <dwc:scientificNameID>http://research.calacademy.org/research/ichthyology/catalog/fishcatget.asp?spid=53548</dwc:scientificNameID>
    <dwc:acceptedNameUsageID>http://research.calacademy.org/research/ichthyology/catalog/fishcatget.asp?spid=22010</dwc:acceptedNameUsageID>
    <dwc:originalNameUsageID>http://research.calacademy.org/research/ichthyology/catalog/fishcatget.asp?spid=53548</dwc:originalNameUsageID>
    <dwc:nameAccordingToID>http://research.calacademy.org/research/ichthyology/catalog/getref.asp?id=22764</dwc:nameAccordingToID>
    <dwc:namePublishedInID>http://research.calacademy.org/research/ichthyology/catalog/getref.asp?id=671</dwc:namePublishedInID>
    <dwc:scientificName>Centropyge flavicauda Fraser-Brunner 1933</dwc:scientificName>
    <dwc:acceptedNameUsage>Centropyge fisheri (Snyder 1904)</dwc:acceptedNameUsage>
    <dwc:parentNameUsage>Centropyge Kaup, 1860</dwc:parentNameUsage>
    <dwc:originalNameUsage>Centropyge flavicauda Fraser-Brunner 1933</dwc:originalNameUsage>
    <dwc:nameAccordingTo>Allen, G.R. 1980. Butterfly and angelfishes of the world. Volume II. Mergus Publishers. Pp. 149-352.</dwc:nameAccordingTo>
    <dwc:namePublishedIn>Fraser-Brunner, A. 1933. A revision of the chaetodont fishes of the subfamily Pomacanthinae. Proceedings of the General Meetings for Scientific Business of the Zoological Society of London 1933 (pt 3, no.30): 543-599, Pl. 1.</dwc:namePublishedIn>
    <dwc:higherClassification>Animalia;Chordata;Vertebrata;Osteichthyes;Actinopterygii;Neopterygii;Teleostei;Acanthopterygii;Perciformes;Percoidei;Pomacanthidae;Centropyge</dwc:higherClassification>
    <dwc:kingdom>Animalia</dwc:kingdom>
    <dwc:phylum>Chordata</dwc:phylum>
    <dwc:class>Osteichthyes</dwc:class>
    <dwc:order>Perciformes</dwc:order>
    <dwc:family>Pomacanthidae</dwc:family>
    <dwc:genus>Centropyge</dwc:genus>
    <dwc:specificEpithet>flavicauda</dwc:specificEpithet>
    <dwc:scientificNameAuthorship>Fraser-Brunner 1933</dwc:scientificNameAuthorship>
    <dwc:taxonRank>species</dwc:taxonRank>
    <dwc:nomenclaturalCode>ICZN</dwc:nomenclaturalCode>
    <dwc:taxonomicStatus>accepted</dwc:taxonomicStatus>
  </SimpleDarwinRecord>
</SimpleDarwinRecordSet>
```
|
||||
|
||||
The `SimpleDarwinRecord` acts as a `Class` in implementation, because all of the terms are properties of it. The Simple Darwin Core schema has just one other level of structure, the `SimpleDarwinRecordSet`, which is a grouping of one or more `SimpleDarwinRecords`. The `SimpleDarwinRecordSet` acts as a `Class` to define a data set during implementation.
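The two-level structure (a record set containing records, each holding at most one of each term) can be read back into plain records with a few lines of code. The following is a sketch using Python's standard `xml.etree.ElementTree`; the function name `records_as_dicts` is hypothetical, and the embedded XML is a shortened version of the example above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Default namespace of the Simple Darwin Core schema, as used in the example.
SDC_NS = "http://rs.tdwg.org/dwc/xsd/simpledarwincore/"

xml_text = """<?xml version="1.0" encoding="UTF-8"?>
<SimpleDarwinRecordSet
    xmlns="http://rs.tdwg.org/dwc/xsd/simpledarwincore/"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <SimpleDarwinRecord>
    <dwc:basisOfRecord>Taxon</dwc:basisOfRecord>
    <dwc:scientificName>Centropyge flavicauda Fraser-Brunner 1933</dwc:scientificName>
    <dwc:taxonRank>species</dwc:taxonRank>
  </SimpleDarwinRecord>
</SimpleDarwinRecordSet>
"""


def records_as_dicts(xml_bytes):
    """Turn each SimpleDarwinRecord into a dict of term name -> value."""
    root = ET.fromstring(xml_bytes)
    records = []
    for rec in root.findall(f"{{{SDC_NS}}}SimpleDarwinRecord"):
        # Strip the "{namespace}" prefix that ElementTree puts on tags.
        names = [child.tag.split("}", 1)[-1] for child in rec]
        repeated = [n for n, c in Counter(names).items() if c > 1]
        if repeated:
            raise ValueError(f"terms repeated in one record: {repeated}")
        records.append(
            {n: (child.text or "").strip() for n, child in zip(names, rec)}
        )
    return records


print(records_as_dicts(xml_text.encode("utf-8")))
```

Dropping the namespace keeps the sketch short, but it would conflate terms that share a local name across namespaces; a fuller implementation would keep the qualified names.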
|
||||
|
||||
## 7 Doing more with Simple Darwin Core
|
||||
|
||||
Sooner or later you may want to share more information than Simple Darwin Core seems to allow. For example, you and your colleagues might decide that it would be useful to have a standard way to exchange additional information relevant to questions in Conservation. How would you do it?
|
||||
|
||||
One way would be to try to "overload" existing terms by using them to hold information other than what was intended based on the definition of the terms. Please don't do this. If an existing term has close to the same meaning as one you want to use, but just doesn't quite fit because of the way the definition is worded, it would be better to request an amendment to the term definition so that it will be clear for your community how to use it. You can request such a change by submitting an issue in the [Darwin Core repository](https://github.com/tdwg/dwc).
|
||||
|
||||
### 7.1 Structured content using dynamicProperties
|
||||
|
||||
Another way to get more out of Darwin Core without adding a term is to "payload" the [`dynamicProperties`](http://rs.tdwg.org/dwc/terms/dynamicProperties) term with structured content, as shown in the example below, using JavaScript Object Notation (JSON). This is perfectly legal, since it doesn't compromise the meaning of the term. One weakness of payloading data in this way is that the content lacks stable or well-defined semantics. It is also highly recommended to flatten the content into a single string with no non-printing characters (such as line feeds) to facilitate use in the widest variety of data sharing contexts. Still, this might be a reasonable way to at least allow you to share all of your data, even if people might have trouble using it reliably.
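One way to produce such a flattened string is sketched below in Python. The keys `iucnStatus` and `distribution` are illustrative and carry no defined semantics; the approach (serialize with `json.dumps`, then escape for XML) is one reasonable option, not a prescribed method.

```python
import json
from xml.sax.saxutils import escape

# Illustrative extra properties; the key names are not Darwin Core terms.
extra = {"iucnStatus": "vulnerable", "distribution": "Neuquén, Argentina"}

# json.dumps without indent emits a single line; control characters inside
# string values (such as line feeds) are written as escape sequences.
payload = json.dumps(extra, ensure_ascii=False)

# Escape &, < and > so the string is safe as XML element content.
element = f"<dwc:dynamicProperties>{escape(payload)}</dwc:dynamicProperties>"
print(element)

# A consumer can recover the structure with json.loads.
print(json.loads(payload) == extra)
```

When the target is a CSV file rather than XML, the same `payload` string can be written as a field value and the CSV writer will handle quoting.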
|
||||
|
||||
#### 7.1.1 Example of structured JSON content within XML (non-normative)
|
||||
|
||||
```xml
<?xml version="1.0" encoding="UTF-8"?>
<SimpleDarwinRecordSet
    xmlns="http://rs.tdwg.org/dwc/xsd/simpledarwincore/"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://rs.tdwg.org/dwc/xsd/simpledarwincore/ http://rs.tdwg.org/dwc/xsd/tdwg_dwc_simple.xsd">
  <SimpleDarwinRecord>
    <dc:modified>2009-02-12T12:43:31</dc:modified>
    <dc:language>en</dc:language>
    <dwc:basisOfRecord>Taxon</dwc:basisOfRecord>
    <dwc:scientificName>Ctenomys sociabilis</dwc:scientificName>
    <dwc:acceptedNameUsage>Ctenomys sociabilis Pearson and Christie, 1985</dwc:acceptedNameUsage>
    <dwc:parentNameUsage>Ctenomys Blainville, 1826</dwc:parentNameUsage>
    <dwc:higherClassification>Animalia; Chordata; Vertebrata; Mammalia; Theria; Eutheria; Rodentia; Hystricognatha; Hystricognathi; Ctenomyidae; Ctenomyini; Ctenomys</dwc:higherClassification>
    <dwc:kingdom>Animalia</dwc:kingdom>
    <dwc:phylum>Chordata</dwc:phylum>
    <dwc:class>Mammalia</dwc:class>
    <dwc:order>Rodentia</dwc:order>
    <dwc:family>Ctenomyidae</dwc:family>
    <dwc:genus>Ctenomys</dwc:genus>
    <dwc:specificEpithet>sociabilis</dwc:specificEpithet>
    <dwc:taxonRank>species</dwc:taxonRank>
    <dwc:scientificNameAuthorship>Pearson and Christie, 1985</dwc:scientificNameAuthorship>
    <dwc:nomenclaturalCode>ICZN</dwc:nomenclaturalCode>
    <dwc:namePublishedIn>Pearson O. P., and M. I. Christie. 1985. Historia Natural, 5(37):388</dwc:namePublishedIn>
    <dwc:taxonomicStatus>valid</dwc:taxonomicStatus>
    <dwc:dynamicProperties>{"iucnStatus":"vulnerable", "distribution":"Neuquén, Argentina"}</dwc:dynamicProperties>
  </SimpleDarwinRecord>
</SimpleDarwinRecordSet>
```
|
||||
|
||||
### 7.2 Extending Darwin Core by adding terms
|
||||
|
||||
If you were using just CSV text files to exchange information, then you might be tempted to just add the new fields to the files. This approach suffers most of the same problems as payloading - no one aside from those with whom you communicated would know what those new fields were or how to use them. Sharing in this way via XML would be an even bigger problem, because the [Simple Darwin Core XML Schema](../xml/tdwg_dwc_simple.xsd) defines the terms that it supports and the new fields would not correspond with any terms understood by the schema. In other words, the XML with your fields in it would not be a valid Simple Darwin Core XML document.
|
||||
|
||||
So, if you really need to extend the capabilities of Darwin Core, the best first step is to follow the standards process to add the terms you need. See the [Contributing guide](https://github.com/tdwg/dwc/blob/master/.github/CONTRIBUTING.md) to understand how to suggest a new term.
|
||||
|
||||
## 8 Going beyond Simple Darwin Core
|
||||
|
||||
For cases where rich data require rich (non-simple) structure, Simple Darwin Core alone is not suitable. When sharing information via [Fielded Text](http://www.fieldedtext.org/), the solution is to use Simple Darwin Core as a core record with one or more associated extensions for the additional information. See the [Text guide](../text/) for an explanation and examples.
|
||||
|
||||
When sharing information via [XML](http://www.w3.org/XML/), the solution is to use a richer structure such as the Access to Biological Collections Data schema ([ABCD](https://github.com/tdwg/abcd)), the [Generic Darwin Core](../xml/tdwg_dwcterms.xsd) schema, or another schema built from Darwin Core terms to suit the use of the data in a particular context. See the [XML guide](../xml/) for examples and references to model schemas.
|
|
@ -4,7 +4,7 @@ Title
|
|||
: Simple Darwin Core
|
||||
|
||||
Date version issued
|
||||
: 2015-06-02
|
||||
: 2021-07-15
|
||||
|
||||
Date created
|
||||
: 2009-04-21
|
||||
|
@ -13,13 +13,13 @@ Part of TDWG Standard
|
|||
: <http://www.tdwg.org/standards/450/>
|
||||
|
||||
This version
|
||||
: <http://rs.tdwg.org/dwc/terms/simple/2014-11-08>
|
||||
: <http://rs.tdwg.org/dwc/terms/simple/2021-07-15>
|
||||
|
||||
Latest Version
|
||||
: <http://rs.tdwg.org/dwc/terms/simple/>
|
||||
|
||||
Previous version
|
||||
: <http://rs.tdwg.org/dwc/terms/simple/2013-10-22>
|
||||
: <http://rs.tdwg.org/dwc/terms/simple/2014-11-08>
|
||||
|
||||
Abstract
|
||||
: This document is a reference for the Simple Darwin Core standard.
|
||||
|
@ -31,7 +31,7 @@ Creator
|
|||
: Darwin Core Task Group
|
||||
|
||||
Bibliographic citation
|
||||
: Darwin Core Task Group. 2009. Simple Darwin Core. Biodiversity Information Standards (TDWG). <http://rs.tdwg.org/dwc/terms/simple/>
|
||||
: Darwin Core Maintenance Group. 2021. Simple Darwin Core. Biodiversity Information Standards (TDWG). <http://rs.tdwg.org/dwc/terms/simple/2021-07-15>
|
||||
|
||||
## 1 Introduction
|
||||
|
||||
|
@ -39,7 +39,11 @@ Simple Darwin Core is a predefined subset of the terms that have common use acro
|
|||
|
||||
### 1.1 Status of the content of this document
|
||||
|
||||
All sections of this document are normative, except for examples, which are explicitly marked as non-normative.
|
||||
All sections of this document are non-normative (explanatory), except for Section 5.
|
||||
|
||||
#### 1.1.1 RFC 2119 key words
|
||||
|
||||
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC 2119](https://tools.ietf.org/html/rfc2119).
|
||||
|
||||
## 2 Audience
|
||||
|
||||
|
@ -51,24 +55,24 @@ Simple Darwin Core is simple in that it assumes (and allows) no structure beyond
|
|||
|
||||
## 4 What makes it flexible?
|
||||
|
||||
Simple Darwin Core has minimal restrictions on which fields are required (none). You might argue that there should be more required fields, that there isn't anything useful you can do without them. That is partially true. A record with no fields in it wouldn't be very interesting, but there is a difference between requiring that there be a field in a record and requiring that a particular field be in all records. By having no required field restriction, Simple Darwin Core can be used to share any meaningful combination of fields - for example, to share "just names", or "just places", or observations of individuals detected in the wild at a given place and time following a method (an occurrence). This flexibility promotes the reuse of the terms and sharing mechanisms for a wide variety of services.
|
||||
Simple Darwin Core has minimal restrictions on which fields are mandatory (none). You might argue that there should be more mandatory fields, that there isn't anything useful you can do without them. That is partially true. A record with no fields in it wouldn't be very interesting, but there is a difference between requiring that there be a field in a record and requiring that a particular field be in all records. By having no mandatory field restriction, Simple Darwin Core can be used to share any meaningful combination of fields - for example, to share "just names", or "just places", or observations of individuals detected in the wild at a given place and time following a method (an occurrence). This flexibility promotes the reuse of the terms and sharing mechanisms for a wide variety of services.
|
||||
|
||||
## 5 Are there any rules?
|
||||
## 5 Are there any rules? (Normative)
|
||||
|
||||
There are just a few general guiding principles on how to make the best use of Simple Darwin Core:
|
||||
|
||||
1. Any Darwin Core term name can be used as a field name.
|
||||
2. No field name may be repeated in a record.
|
||||
3. Do not use a _Class_ ([`Occurrence`](http://rs.tdwg.org/dwc/terms/Occurrence), [`Organism`](http://rs.tdwg.org/dwc/terms/Organism), [`MaterialSample`](http://rs.tdwg.org/dwc/terms/MaterialSample), [`LivingSpecimen`](http://rs.tdwg.org/dwc/terms/LivingSpecimen), [`PreservedSpecimen`](http://rs.tdwg.org/dwc/terms/PreservedSpecimen), [`FossilSpecimen`](http://rs.tdwg.org/dwc/terms/FossilSpecimen), [`Event`](http://rs.tdwg.org/dwc/terms/Event), [`HumanObservation`](http://rs.tdwg.org/dwc/terms/HumanObservation), [`MachineObservation`](http://rs.tdwg.org/dwc/terms/MachineObservation), [`Location`](http://rs.tdwg.org/dwc/terms/Location), [`GeologicalContext`](http://rs.tdwg.org/dwc/terms/GeologicalContext), [`Identification`](http://rs.tdwg.org/dwc/terms/Identification), [`Taxon`](http://rs.tdwg.org/dwc/terms/Taxon)) as a field.
|
||||
4. Provide data in as many fields as you can.
|
||||
5. Use the [`dcterms:type`](http://rs.tdwg.org/dwc/terms/dcterms:type) field to provide the name of the Dublin Core type class (`PhysicalObject`, `StillImage`, `MovingImage`, `Sound`, `Text`) that the record represents.
|
||||
6. Use the [`basisOfRecord`](http://rs.tdwg.org/dwc/terms/basisOfRecord) field to provide the name of the most specific Darwin Core class (`LivingSpecimen`, `PreservedSpecimen`, `FossilSpecimen`, `MaterialSample`, `HumanObservation`, `MachineObservation`, `Event`, `Occurrence`, `Taxon`, `Identification`, `Organism`, `Location`, `GeologicalContext`, `MeasurementOrFact`, `ResourceRelationship`) the record represents.
|
||||
7. Populate fields with data that match the definition of the field.
|
||||
8. Use the controlled vocabulary for the values of fields that recommend them.
|
||||
9. If data are withheld, use [`informationWithheld`](http://rs.tdwg.org/dwc/terms/informationWithheld) to say so.
|
||||
10. If data are shared in lower quality than the original, use [`dataGeneralizations`](http://rs.tdwg.org/dwc/terms/dataGeneralizations) to say so.
|
||||
2. A field name MUST NOT be repeated in a record.
|
||||
3. Class names (e.g., `Occurrence`, `Organism`) MUST NOT be used as field names.
|
||||
4. Data SHOULD be provided in as many fields as possible.
|
||||
5. The [`dc:type`](http://purl.org/dc/elements/1.1/type) field SHOULD be populated with the name of the most appropriate Dublin Core type class (`PhysicalObject`, `StillImage`, `MovingImage`, `Sound`, `Text`) the record represents.
|
||||
6. The [`basisOfRecord`](http://rs.tdwg.org/dwc/terms/basisOfRecord) SHOULD be populated with the name of the most specific Darwin Core class ([`LivingSpecimen`](http://rs.tdwg.org/dwc/terms/LivingSpecimen), [`PreservedSpecimen`](http://rs.tdwg.org/dwc/terms/PreservedSpecimen), [`FossilSpecimen`](http://rs.tdwg.org/dwc/terms/FossilSpecimen), [`MaterialSample`](http://rs.tdwg.org/dwc/terms/MaterialSample), [`HumanObservation`](http://rs.tdwg.org/dwc/terms/HumanObservation), [`MachineObservation`](http://rs.tdwg.org/dwc/terms/MachineObservation), [`MaterialCitation`](http://rs.tdwg.org/dwc/terms/MaterialCitation), [`Event`](http://rs.tdwg.org/dwc/terms/Event), [`Occurrence`](http://rs.tdwg.org/dwc/terms/Occurrence), [`Taxon`](http://rs.tdwg.org/dwc/terms/Taxon), [`Organism`](http://rs.tdwg.org/dwc/terms/Organism), [`Location`](http://purl.org/dc/terms/Location), [`GeologicalContext`](http://rs.tdwg.org/dwc/terms/GeologicalContext)) the record represents.
|
||||
7. Fields SHOULD be populated with data that match the definition of the field.
|
||||
8. Values from a recommended controlled vocabulary SHOULD be used for the values of a field that recommends it.
|
||||
9. If data are withheld, the field [`informationWithheld`](http://rs.tdwg.org/dwc/terms/informationWithheld) SHOULD be populated to say so.
|
||||
10. If data are shared in lower quality than the original, the field [`dataGeneralizations`](http://rs.tdwg.org/dwc/terms/dataGeneralizations) SHOULD be populated to say so.
|
||||
|
||||
Every field in Simple Darwin Core may appear either once or not at all in a single record - otherwise how could you distinguish one [`scientificName`](http://rs.tdwg.org/dwc/terms/scientificName) field from another one? Think of a database table. It will not allow you to have the same name for two different fields. Because of this design restriction (lack of flexibility for the sake of simplicity), the auxiliary fields from the [`MeasurementOrFact`](http://rs.tdwg.org/dwc/terms/MeasurementOrFact) and [`ResourceRelationship`](http://rs.tdwg.org/dwc/terms/ResourceRelationship) classes are of somewhat limited utility here - you could only share one `MeasurementOrFact` and one `ResourceRelationship` per record. You might argue then that there is no way to share information that requires related structures, such as a history of identifications of a specimen. That is mostly true. The only recourse within Simple Darwin Core is to force the data into one of the catch all "list" terms such as [`recordedBy`](http://rs.tdwg.org/dwc/terms/recordedBy), [`preparations`](http://rs.tdwg.org/dwc/terms/preparations), [`otherCatalogNumbers`](http://rs.tdwg.org/dwc/terms/otherCatalogNumbers), [`associatedMedia`](http://rs.tdwg.org/dwc/terms/associatedMedia), [`associatedReferences`](http://rs.tdwg.org/dwc/terms/associatedReferences), [`associatedSequences`](http://rs.tdwg.org/dwc/terms/associatedSequences), [`associatedTaxa`](http://rs.tdwg.org/dwc/terms/associatedTaxa), [`associatedOccurrences`](http://rs.tdwg.org/dwc/terms/associatedOccurrences), [`associatedOrganisms`](http://rs.tdwg.org/dwc/terms/associatedOrganisms), [`previousIdentifications`](http://rs.tdwg.org/dwc/terms/previousIdentifications), [`higherGeography`](http://rs.tdwg.org/dwc/terms/higherGeography), [`georeferencedBy`](http://rs.tdwg.org/dwc/terms/georeferencedBy), [`georeferenceSources`](http://rs.tdwg.org/dwc/terms/georeferenceSources), [`identifiedBy`](http://rs.tdwg.org/dwc/terms/identifiedBy), 
[`identificationReferences`](http://rs.tdwg.org/dwc/terms/identificationReferences), and [`higherClassification`](http://rs.tdwg.org/dwc/terms/higherClassification).
|
||||
Every field in Simple Darwin Core MAY appear either once or not at all in a single record - otherwise how could you distinguish one [`scientificName`](http://rs.tdwg.org/dwc/terms/scientificName) field from another one? Think of a database table. It will not allow you to have the same name for two different fields. Because of this design restriction (lack of flexibility for the sake of simplicity), the auxiliary fields from the [`MeasurementOrFact`](http://rs.tdwg.org/dwc/terms/MeasurementOrFact) and [`ResourceRelationship`](http://rs.tdwg.org/dwc/terms/ResourceRelationship) classes are of somewhat limited utility here - you could only share one `MeasurementOrFact` and one `ResourceRelationship` per record. You might argue then that there is no way to share information that requires related structures, such as a history of identifications of a specimen. That is mostly true. The only recourse within Simple Darwin Core is to force the data into one of the catch all "list" terms such as [`recordedBy`](http://rs.tdwg.org/dwc/terms/recordedBy), [`preparations`](http://rs.tdwg.org/dwc/terms/preparations), [`otherCatalogNumbers`](http://rs.tdwg.org/dwc/terms/otherCatalogNumbers), [`associatedMedia`](http://rs.tdwg.org/dwc/terms/associatedMedia), [`associatedReferences`](http://rs.tdwg.org/dwc/terms/associatedReferences), [`associatedSequences`](http://rs.tdwg.org/dwc/terms/associatedSequences), [`associatedTaxa`](http://rs.tdwg.org/dwc/terms/associatedTaxa), [`associatedOccurrences`](http://rs.tdwg.org/dwc/terms/associatedOccurrences), [`associatedOrganisms`](http://rs.tdwg.org/dwc/terms/associatedOrganisms), [`previousIdentifications`](http://rs.tdwg.org/dwc/terms/previousIdentifications), [`higherGeography`](http://rs.tdwg.org/dwc/terms/higherGeography), [`georeferencedBy`](http://rs.tdwg.org/dwc/terms/georeferencedBy), [`georeferenceSources`](http://rs.tdwg.org/dwc/terms/georeferenceSources), [`identifiedBy`](http://rs.tdwg.org/dwc/terms/identifiedBy), 
[`identificationReferences`](http://rs.tdwg.org/dwc/terms/identificationReferences), and [`higherClassification`](http://rs.tdwg.org/dwc/terms/higherClassification).
|
||||
|
||||
There is a difference between having data in a field and requiring that field to have a value from among a legal set of values. Darwin Core is simple in that it has minimal restrictions on the contents of fields. The term comments give recommendations about the use of controlled vocabularies and how to structure content wherever appropriate. Data contributors are encouraged to follow these recommendations as well as possible. You might argue that having no restrictions will promote "dirty" data (data of low quality or dubious value). Consider the simple axiom "It's not what you have, but what you do with it that matters." If data restrictions were in place at the fundamental level, then a record having any non-compliant data in any of its fields could not be shared via the standard. Not only would there be a dearth of shared data in that case (or an unused standard), but also there would be no way to use the standard to build shared data cleaning tools to actually improve the situation, nor to use data services to look up alternative representations (language translations, for example) to serve a broader audience. The rest is up to how the records will be used - in other words, it is up to applications to enforce further restrictions if appropriate, and it is up to the stakeholders of those applications to decide what the restrictions will be for the purpose the application is trying to serve.
|
||||
|
||||
|
@ -86,7 +90,7 @@ The [Text guide](../text/) describes how to construct and format a text file usi
|
|||
|
||||
The [XML guide](../xml/) describes how to construct XML schemas to share data based on Darwin Core terms. Looking at the [Simple Darwin Core XML Schema](../xml/tdwg_dwc_simple.xsd) using the XML guide as a reference you will be able to see that the schema supports the notion of a `SimpleDarwinRecord`, which is just a grouping of up to one of each of the Darwin Core terms that are `Properties` (not `Classes`).
|
||||
|
||||
#### 6.2.1 Example of Simple Darwin Core as XML (non-normative)
|
||||
#### 6.2.1 Example of Simple Darwin Core as XML
|
||||
|
||||
The following example shows a `SimpleDarwinRecordSet` containing one `SimpleDarwinRecord` for a `Taxon`:
|
||||
|
||||
|
@ -141,9 +145,9 @@ One way would be to try to "overload" existing terms by using them to hold infor
|
|||
|
||||
### 7.1 Structured content using dynamicProperties
|
||||
|
||||
Another way to get more out of Darwin Core without adding a term is to "payload" the [`dynamicProperties`](http://rs.tdwg.org/dwc/terms/dynamicProperties) term with structured content, as shown in the example below, using JavaScript Object Notation (JSON). This is perfectly legal, since it doesn't compromise the meaning of the term. One of the weaknesses of payloading data in this way is that it is subject to a lack of stable or well-defined semantics. Also, it is highly recommended to flatten the content into a single string with no non-printing characters (such as line feeds) to facilitate use in the widest variety of data sharing contexts. Still, this might be a reasonable way to at least allow you to share all of your data, even if there might be problems with people using it reliably.
|
||||
Another way to get more out of Darwin Core without adding a term is to "payload" the [`dynamicProperties`](http://rs.tdwg.org/dwc/terms/dynamicProperties) term with structured content, as shown in the example below, using JavaScript Object Notation (JSON). This is perfectly legal, since it doesn't compromise the meaning of the term. One of the weaknesses of payloading data in this way is that it is subject to a lack of stable or well-defined semantics. Also, it is strongly suggested to flatten the content into a single string with no non-printing characters (such as line feeds) to facilitate use in the widest variety of data sharing contexts. Still, this might be a reasonable way to at least allow you to share all of your data, even if there might be problems with people using it reliably.
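
The flattening advice above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the property names and values are hypothetical, and the standard does not prescribe any particular tool for producing the string.

```python
import json

# Hypothetical measurements to carry in dynamicProperties.
properties = {"tragusLengthInMeters": 0.014, "weightInGrams": 120}

# Compact separators keep the JSON on a single line with no padding;
# json.dumps escapes any line feeds that occur inside string values.
flattened = json.dumps(properties, separators=(",", ":"), sort_keys=True)

print(flattened)
```

The resulting single-line string can be placed in a `dynamicProperties` field and parsed back with `json.loads` by a consumer that knows to expect JSON.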
|
||||
|
||||
#### 7.1.1 Example of structured JSON content within XML (non-normative)
|
||||
#### 7.1.1 Example of structured JSON content within XML
|
||||
|
||||
```xml
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
|
|
|
@ -0,0 +1,233 @@
|
|||
# Darwin Core text guide
|
||||
|
||||
Title
|
||||
: Darwin Core text guide
|
||||
|
||||
Date version issued
|
||||
: 2020-09-05
|
||||
|
||||
Date created
|
||||
: 2009-02-12
|
||||
|
||||
Part of TDWG Standard
|
||||
: <http://www.tdwg.org/standards/450/>
|
||||
|
||||
This version
|
||||
: <http://rs.tdwg.org/dwc/terms/guides/text/2020-09-05>
|
||||
|
||||
Latest version
|
||||
: <http://rs.tdwg.org/dwc/terms/guides/text/>
|
||||
|
||||
Previous version
|
||||
: <http://rs.tdwg.org/dwc/terms/guides/text/2014-11-08>
|
||||
|
||||
Replaced by
|
||||
: <http://rs.tdwg.org/dwc/terms/guides/text/2021-07-15>
|
||||
|
||||
Abstract
|
||||
: Guidelines for implementing Darwin Core in Text files.
|
||||
|
||||
Contributors
|
||||
: Tim Robertson (GBIF), Markus Döring (GBIF), John Wieczorek (MVZ), Renato De Giovanni (CRIA), Dave Vieglais (KUNHM)
|
||||
|
||||
Creator
|
||||
: Darwin Core Task Group
|
||||
|
||||
Bibliographic citation
|
||||
: Darwin Core Maintenance Group. 2020. Darwin Core text guide. Biodiversity Information Standards (TDWG). <http://rs.tdwg.org/dwc/terms/guides/text/2020-09-05>
|
||||
|
||||
## 1 Introduction
|
||||
|
||||
This document provides guidelines for formatting and sharing [Darwin Core terms](http://rs.tdwg.org/dwc/terms) in _fielded text_ formats, such as one or more comma separated value (CSV) files. Data conforming to the [Simple Darwin Core](../simple/) (CSV format and having the first row include Darwin Core standard term names) can be shared in a single file, while a non-standard text file can be understood using an [XML](http://www.w3.org/XML/) metafile to describe its contents and formatting.
|
||||
|
||||
![Usage](usage.png)
|
||||
|
||||
More complex structure can be shared in multiple related files. The description of content and relationships between files can be achieved using the metafile. This guideline makes recommendations for the simple case of a _core_ file, upon which Darwin Core _records_ are based, and _extensions_ that are linked to records in that core file. Specifically, extension records have a _many-to-one_ relationship with records in the core file. For example, a core file might contain specimen records, with one specimen per row in the file, while an extension file contains one or more identifications for those specimens, with one identification per row in the extension file, and with an identifier to the specimen for each identification row. This example would allow many identifications to be associated with each specimen.
|
||||
|
||||
### 1.1 Status of the content of this document
|
||||
|
||||
All sections of this document are normative, except for examples, whose sections are marked as non-normative.
|
||||
|
||||
### 1.2 Simple example metafile content (non-normative)
|
||||
|
||||
A simple comma separated values (CSV) data file with the following content:
|
||||
|
||||
```csv
|
||||
ID,Species,Count
|
||||
123,"Cryptantha gypsophila Reveal & C.R. Broome",12
|
||||
124,"Buxbaumia piperi",2
|
||||
```
|
||||
|
||||
can be described with the following Darwin Core metafile:
|
||||
|
||||
```xml
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<archive xmlns="http://rs.tdwg.org/dwc/text/"
|
||||
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
||||
xmlns:xs="http://www.w3.org/2001/XMLSchema"
|
||||
xsi:schemaLocation="http://rs.tdwg.org/dwc/text/ http://rs.tdwg.org/dwc/text/tdwg_dwc_text.xsd">
|
||||
<core rowType="http://rs.tdwg.org/dwc/xsd/simpledarwincore/SimpleDarwinRecord" ignoreHeaderLines="1">
|
||||
<files>
|
||||
<location>http://data.gbif.org/download/specimens.csv</location>
|
||||
</files>
|
||||
<field index="0" term="http://rs.tdwg.org/dwc/terms/catalogNumber" />
|
||||
<field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName" />
|
||||
<field index="2" term="http://rs.tdwg.org/dwc/terms/individualCount" />
|
||||
<!-- A constant value has no index, but applies to all rows -->
|
||||
<field term="http://rs.tdwg.org/dwc/terms/datasetID" default="urn:lsid:tim.lsid.tdwg.org:collections:1"/>
|
||||
</core>
|
||||
</archive>
|
||||
```
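
As a rough sketch of how a consumer might read such a metafile, the following Python uses the standard library's `xml.etree.ElementTree` to separate indexed columns from constant (default-only) fields. The function name `read_core_fields` is hypothetical, and the embedded metafile is a trimmed copy of the example above.

```python
import xml.etree.ElementTree as ET

# Namespace used by the metafile example above.
DWC_TEXT_NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

METAFILE = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/xsd/simpledarwincore/SimpleDarwinRecord" ignoreHeaderLines="1">
    <files>
      <location>http://data.gbif.org/download/specimens.csv</location>
    </files>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/catalogNumber" />
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName" />
    <field index="2" term="http://rs.tdwg.org/dwc/terms/individualCount" />
    <field term="http://rs.tdwg.org/dwc/terms/datasetID" default="urn:lsid:tim.lsid.tdwg.org:collections:1"/>
  </core>
</archive>
"""


def read_core_fields(metafile_xml: bytes):
    """Return (columns, constants) for the <core> element of a metafile.

    columns maps a zero-based column index to a term URI;
    constants maps a term URI to the default applied to every row.
    """
    root = ET.fromstring(metafile_xml)
    core = root.find("dwc:core", DWC_TEXT_NS)
    columns, constants = {}, {}
    for field in core.findall("dwc:field", DWC_TEXT_NS):
        term = field.get("term")
        index = field.get("index")
        if index is None:
            # No index: the default is a constant for all rows.
            constants[term] = field.get("default")
        else:
            columns[int(index)] = term
    return columns, constants


columns, constants = read_core_fields(METAFILE.encode("utf-8"))
print(columns)
print(constants)
```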
|
||||
|
||||
These same data could be understood without the metafile if the first row of the CSV file contained the term names:
|
||||
|
||||
```csv
|
||||
type,institutionCode,collectionCode,catalogNumber,scientificName,individualCount,datasetID
|
||||
PhysicalObject,ANSP,PH,123,"Cryptantha gypsophila Reveal & C.R. Broome",12,urn:lsid:tim.lsid.tdwg.org:collections:1
|
||||
PhysicalObject,ANSP,PH,124,"Buxbaumia piperi",2,urn:lsid:tim.lsid.tdwg.org:collections:1
|
||||
```
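
A minimal sketch of reading such a self-describing file, assuming Python and its standard `csv` module (the guide itself does not mandate any tooling). Because the first row holds the term names, each row can be read directly into a mapping keyed by term name.

```python
import csv
import io

# The same content as the CSV example above, held in memory for illustration.
CSV_TEXT = '''type,institutionCode,collectionCode,catalogNumber,scientificName,individualCount,datasetID
PhysicalObject,ANSP,PH,123,"Cryptantha gypsophila Reveal & C.R. Broome",12,urn:lsid:tim.lsid.tdwg.org:collections:1
PhysicalObject,ANSP,PH,124,"Buxbaumia piperi",2,urn:lsid:tim.lsid.tdwg.org:collections:1
'''

# DictReader takes the field names from the header row.
records = list(csv.DictReader(io.StringIO(CSV_TEXT)))

for record in records:
    print(record["catalogNumber"], record["scientificName"])
```

All values arrive as strings; any conversion (for example of `individualCount` to an integer) is left to the consuming application.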
|
||||
|
||||
### 1.3 XML versus fielded text
|
||||
|
||||
Many resources exist on the web describing the advantages of Extensible Markup Language [XML](http://www.w3.org/XML/) over less structured content such as _fielded text_. The Darwin Core text guide (this document) is not meant to promote the use of fielded text over XML for data exchange, but rather to provide recommendations for how to handle such data files when necessary.
|
||||
|
||||
Two scenarios that might benefit from the use of fielded text are:
|
||||
|
||||
- The transfer of large numbers of Darwin Core records and related data from one database to another. Typically databases are very efficient at exporting and importing comma separated text files.
|
||||
- The description of legacy data existing in a fielded text format, such that it might be automatically understood and loaded into another system. It could be that this system would then serve the data in another format such as XML.
|
||||
|
||||
## 2 Metafile content
|
||||
|
||||
The [text metafile schema](tdwg_dwc_text.xsd) provides technical details for the structure of a metafile by defining the elements and attributes required to describe the contents and relationships between text files. These elements and attributes, with descriptions and specifications for their use in a metafile, are described in the following table.
|
||||
|
||||
### 2.1 The `<archive>` element
|
||||
|
||||
The `<archive>` element is the container for the list of related files (one core and zero or more extensions). The `<archive>` element has just one attribute, `metadata`.
|
||||
|
||||
#### 2.1.1 Attributes
|
||||
|
||||
Attribute | Description | Required | Default
|
||||
--- | --- | --- | ---
|
||||
`metadata` | Contains a qualified Uniform Resource Locator (URL) defining the location of a metadata description of the entire archive. The format of the metadata is not prescribed, but a standardized format such as Ecological Metadata Language (EML), Federal Geographic Data Committee (FGDC), or ISO 19115 family is recommended. | no |
|
||||
|
||||
#### 2.1.2 Elements
|
||||
|
||||
Element | Description
|
||||
--- | ---
|
||||
`<core>` | An `<archive>` must contain exactly one `<core>` element, representing the data entity (the actual file and its column header mappings to Darwin Core terms) upon which records are based. If extensions are being used, each record in the core data must have a unique identifier. The field for this identifier must be specified in an explicit `<id>` field in order to associate extension records with the core record.
|
||||
`<extension>` | An `<archive>` may define zero or more `<extension>` elements, each representing an individual extension entity directly related to the core. In addition to the general file attributes described below, every extension entity must have an explicit `<coreid>` field to relate the extension record to a row in the core entity. The extension itself does not have to have a unique ID field and many rows can point to the same core record.
|
||||
|
||||
### 2.2 The `<core>` or `<extension>` element
|
||||
|
||||
#### 2.2.1 Attributes
|
||||
|
||||
Attribute | Description | Required | Default
|
||||
--- | --- | --- | ---
|
||||
`rowType` | A Uniform Resource Identifier (URI) for the term identifying the class of data represented by each row, for example, <http://rs.tdwg.org/dwc/terms/Occurrence> for Occurrence records or <http://rs.tdwg.org/dwc/terms/Taxon> for Taxon records. Additional classes may be referenced by URI and defined outside the Darwin Core specification. The row type is required. For convenience the URIs for classes defined by the Darwin Core are: `Occurrence`: <http://rs.tdwg.org/dwc/terms/Occurrence>, `Event`: <http://rs.tdwg.org/dwc/terms/Event>, `Location`: <http://purl.org/dc/terms/Location>, `GeologicalContext`: <http://rs.tdwg.org/dwc/terms/GeologicalContext>, `Identification`: <http://rs.tdwg.org/dwc/terms/Identification>, `Taxon`: <http://rs.tdwg.org/dwc/terms/Taxon>, `ResourceRelationship`: <http://rs.tdwg.org/dwc/terms/ResourceRelationship>, `MeasurementOrFact`: <http://rs.tdwg.org/dwc/terms/MeasurementOrFact> | yes |
|
||||
`fieldsTerminatedBy` | Specifies the delimiter between fields. Typical values might be `,` or `\t` for CSV or Tab files respectively. | no | `,`
|
||||
`linesTerminatedBy` | Specifies the row separator character. | no | `\n`
|
||||
`fieldsEnclosedBy` | Specifies the character used to enclose (mark the start and end of) each field. CSV files frequently use the double quote character (`"`), which is the default value if none is explicitly provided. Note that a comma separated value file that has commas within the content of any field must have an enclosing character. | no | `"`
|
||||
`encoding` | Specifies the [character encoding](http://en.wikipedia.org/wiki/Character_encoding) for the data file. The encoding is extremely important, but often ignored. The most frequently used encodings are: `UTF-8`: 8-bit Unicode Transformation Format, `UTF-16`: 16-bit Unicode Transformation Format, `ISO-8859-1`: commonly known as "Latin-1" and a common default on systems configured for a single western European language, `Windows-1252`: commonly known as "WinLatin" and a common default of legacy versions of Microsoft Windows based operating systems. | no | `UTF-8`
|
||||
`ignoreHeaderLines` | Specifies the number of lines to ignore from the beginning of the file. This can be used to skip column headings or preamble comments, for example. | no | `0`
|
||||
`dateFormat` | When verbatim dates are consistent in format, this field can be used to indicate the format represented. It is recommended to use the date, dateTime and time for field formats wherever possible, but where verbatim dates are required, a format may be specified here. This should be considered a 'hint' for consumers. It is recommended that consumers support the minimum combinations of `DD` `MM` and `YYYY` with the separators `/` and `-`. Examples: `DDMMYYYY`: for dates of the form 21121978, `DD-MM-YYYY`: for dates of the form 21-12-1978, `MMDDYYYY`: for dates of the form 12211978, `MM-DD-YYYY`: for dates of the form 12-21-1978, `YYYYMMDD`: for dates of the form 19781221. | no | `YYYY-MM-DD`
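
As an illustration of how a consumer might honour the `dateFormat` hint, the sketch below translates the `DD`, `MM` and `YYYY` tokens into Python `strptime` directives. The helper names are hypothetical, and only the minimum token set mentioned in the table is handled.

```python
from datetime import datetime

# Tokens from the dateFormat attribute and their strptime equivalents.
_TOKENS = (("YYYY", "%Y"), ("MM", "%m"), ("DD", "%d"))


def to_strptime(date_format: str) -> str:
    """Translate a dateFormat value such as 'DD-MM-YYYY' to '%d-%m-%Y'."""
    pattern = date_format
    for token, directive in _TOKENS:
        pattern = pattern.replace(token, directive)
    return pattern


def parse_verbatim_date(value: str, date_format: str = "YYYY-MM-DD"):
    """Parse a verbatim date using the metafile's dateFormat hint."""
    return datetime.strptime(value, to_strptime(date_format)).date()


print(parse_verbatim_date("21121978", "DDMMYYYY"))
print(parse_verbatim_date("12-21-1978", "MM-DD-YYYY"))
print(parse_verbatim_date("19781221", "YYYYMMDD"))
```

Since the attribute is only a hint, a real consumer would probably want to catch `ValueError` and fall back to keeping the verbatim string.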
|
||||
|
||||
#### 2.2.2 Elements
|
||||
|
||||
Element | Description
|
||||
--- | ---
|
||||
`<files>` | A `<core>` or `<extension>` element must contain one `<files>` element to locate the data being described.
|
||||
`<id>` | If extensions are being used, the `<core>` must contain an `<id>` element that indicates the identifier for a record.
|
||||
`<coreid>` | If extensions are being used, the `<extension>` element must contain a `<coreid>` element that indicates the column in the extension file that contains the core record identifier (the matching `<id>` in the core file).
|
||||
`<field>` | A `<core>` or `<extension>` element must contain one or more `<field>` elements, each representing a 'column' in the row.
|
||||
|
||||
### 2.3 `<files>` element
|
||||
|
||||
The `<files>` element must contain one or more `<location>` elements, each defining where a file resides. Each core or extension entity can be composed from one or more files. If an entity has data in more than one file, use the `<location>` element multiple times, once for each file that makes up the entity.
|
||||
|
||||
#### 2.3.1 Elements
|
||||
|
||||
Element | Description
|
||||
--- | ---
|
||||
`<location>` | Specifies the location of the file being described, which may take either of the following forms: 1) a web accessible URL such as `http://www.gbif.org/data/specimen.csv` or `ftp://ftp.gbif.org/tim/specimen.txt`, 2) a filepath relative to the location of the metafile such as `specimen.txt`, `./specimen.txt`, `data/specimen.txt`.
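
The two forms of `<location>` can be resolved uniformly against the metafile's own address. The sketch below assumes Python's `urllib.parse.urljoin`; the metafile URL `http://example.org/archive/meta.xml` is a made-up placeholder.

```python
from urllib.parse import urljoin

# Hypothetical address of the metafile itself.
METAFILE_URL = "http://example.org/archive/meta.xml"


def resolve_location(metafile_url: str, location: str) -> str:
    """Resolve a <location> value against the metafile's URL.

    Absolute URLs are returned unchanged; relative paths are
    interpreted relative to the directory holding the metafile.
    """
    return urljoin(metafile_url, location)


for location in (
    "http://www.gbif.org/data/specimen.csv",
    "ftp://ftp.gbif.org/tim/specimen.txt",
    "specimen.txt",
    "./specimen.txt",
    "data/specimen.txt",
):
    print(location, "->", resolve_location(METAFILE_URL, location))
```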
|
||||
|
||||
### 2.4 The `<field>` element
|
||||
|
||||
The field element is used to specify the location and content of data within a file. There must be one field element for every term being shared for the entity, whether explicitly or through the use of a default value for all rows in the file.
|
||||
|
||||
#### 2.4.1 Attributes
|
||||
|
||||
Attribute | Description | Required | Default
|
||||
--- | --- | --- | ---
|
||||
`index` | Specifies the position of the column in the row. The first column has an index of 0, the second column 1, etc. If no column index is specified, then the term and the default may be used to define a constant value for all rows. | no |
|
||||
`term` | A Uniform Resource Identifier (URI) for the term represented by this field. For example, a field containing the scientific name would have `term="http://rs.tdwg.org/dwc/terms/scientificName"`. Terms outside of the Darwin Core specification may be used, such as those from the Dublin Core Metadata Initiative, for example, `dcterms:modified` would be `term="http://purl.org/dc/terms/modified"`. | yes |
|
||||
`default` | Specifies the value to use if one is not supplied for the field in a given row. If no index is supplied, the default can be used to define a constant for all rows for a field that is not in the data file. | no |
|
||||
`vocabulary` | A Uniform Resource Identifier (URI) for a vocabulary that the source values for this field are based on. The URI ideally should resolve to some machine readable definition like SKOS, RDF or at least some simple text or html file often found for ISO or RFC standards. For example <http://rs.gbif.org/vocabulary/gbif/nomenclatural_code.xml>, <http://www.ietf.org/rfc/rfc3066.txt> or <http://www.iso.org/iso/list-en1-semic-3.txt>. | no |
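
The interplay of `index` and `default` can be sketched as follows. This is one plausible reading, written in Python with hypothetical names (`build_record`, and fields represented as plain dictionaries): a value taken from the row wins, an empty or missing value falls back to the default, and a field with no index becomes a constant.

```python
def build_record(row, fields):
    """Build a term -> value mapping for one row of a data file.

    row is a list of column values; fields is a list of dictionaries
    with a "term" key and optional "index" and "default" keys.
    """
    record = {}
    for field in fields:
        index = field.get("index")
        value = ""
        if index is not None and index < len(row):
            value = row[index]
        if value == "":
            # Nothing supplied in this row (or no column at all): use the default.
            value = field.get("default", "")
        record[field["term"]] = value
    return record


fields = [
    {"index": 0, "term": "http://rs.tdwg.org/dwc/terms/catalogNumber"},
    {"index": 1, "term": "http://rs.tdwg.org/dwc/terms/scientificName"},
    {"index": 2, "term": "http://rs.tdwg.org/dwc/terms/individualCount", "default": "1"},
    {"term": "http://rs.tdwg.org/dwc/terms/datasetID",
     "default": "urn:lsid:tim.lsid.tdwg.org:collections:1"},
]

record = build_record(["124", "Buxbaumia piperi", ""], fields)
print(record)
```

The default of `1` for `individualCount` is invented purely to show the fallback; it is not part of the guide's examples.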
|
||||
|
||||
## 3 Implementation guide
|
||||
|
||||
### 3.1 Extension example (non-normative)
|
||||
|
||||
The following example illustrates the use of extensions. In this example there are three files in the archive, all of which are located in the same directory as the metafile. The whales.txt file acts as a core file of Taxon records. The whales.txt file is extended by two other files, types.csv and distribution.csv. The types.csv file contains records of a type specified in an external definition at <http://rs.gbif.org/terms/1.0/Types> and consists of Dublin Core and Darwin Core terms, while the distribution.csv file contains records of a type specified at <http://rs.gbif.org/terms/1.0/Distribution> and consists of Darwin Core terms plus an additional term for threatStatus. Both extension files are related to the core file through the `<coreid>` column in each extension, which holds the `taxonID` of the core record. Presumably, this archive contains information about whale species, type specimen records for those species, and lists of countries and the threat status for those species.
|
||||
|
||||
![Extension](extension.png)
|
||||
|
||||
```xml
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<archive xmlns="http://rs.tdwg.org/dwc/text/"
|
||||
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
||||
xmlns:xs="http://www.w3.org/2001/XMLSchema"
|
||||
xsi:schemaLocation="http://rs.tdwg.org/dwc/text/ http://rs.tdwg.org/dwc/text/tdwg_dwc_text.xsd">
|
||||
|
||||
<core encoding="UTF-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Taxon">
|
||||
<files>
|
||||
<location>whales.txt</location>
|
||||
</files>
|
||||
<id index="0" />
|
||||
<field index="0" term="http://rs.tdwg.org/dwc/terms/taxonID" />
|
||||
<field index="1" term="http://purl.org/dc/terms/modified" />
|
||||
<field index="2" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
|
||||
<field index="3" term="http://rs.tdwg.org/dwc/terms/acceptedNameUsageID"/>
|
||||
<field index="4" term="http://rs.tdwg.org/dwc/terms/parentNameUsageID"/>
|
||||
<field index="5" term="http://rs.tdwg.org/dwc/terms/originalNameUsageID"/>
|
||||
</core>
|
||||
|
||||
<extension encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\n" fieldsEnclosedBy='"' ignoreHeaderLines="1" rowType="http://rs.gbif.org/terms/1.0/Types">
|
||||
<files>
|
||||
<location>types.csv</location>
|
||||
</files>
|
||||
<coreid index="0" />
|
||||
<field index="1" term="http://purl.org/dc/terms/bibliographicCitation"/>
|
||||
<field index="2" term="http://rs.tdwg.org/dwc/terms/catalogNumber"/>
|
||||
<field index="3" term="http://rs.tdwg.org/dwc/terms/collectionCode"/>
|
||||
<field index="4" term="http://rs.tdwg.org/dwc/terms/institutionCode"/>
|
||||
<field index="5" term="http://rs.tdwg.org/dwc/terms/typeStatus"/>
|
||||
</extension>
|
||||
|
||||
<extension encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\n" fieldsEnclosedBy='"' ignoreHeaderLines="1" rowType="http://rs.gbif.org/terms/1.0/Distribution">
|
||||
<files>
|
||||
<location>distribution.csv</location>
|
||||
</files>
|
||||
<coreid index="0" />
|
||||
<field index="1" term="http://rs.tdwg.org/dwc/terms/countryCode"/>
|
||||
<field index="2" term="http://rs.gbif.org/terms/1.0/threatStatus"/>
|
||||
<field index="3" term="http://rs.tdwg.org/dwc/terms/occurrenceStatus"/>
|
||||
</extension>
|
||||
</archive>
|
||||
```
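
To make the many-to-one relationship concrete, here is a small Python sketch that groups extension rows under the core record they point to. The function name and all row values are invented for illustration; only the column positions of `<id>` and `<coreid>` (both index 0) come from the metafile above.

```python
from collections import defaultdict


def group_extension_rows(core_rows, extension_rows, id_index=0, coreid_index=0):
    """Map each core identifier to the list of extension rows that point at it."""
    by_core_id = defaultdict(list)
    for row in extension_rows:
        by_core_id[row[coreid_index]].append(row)
    # Core records without extension rows get an empty list.
    return {row[id_index]: by_core_id[row[id_index]] for row in core_rows}


# Invented sample rows: taxonID, modified, scientificName.
whales = [
    ["t1", "2021-01-01", "Balaenoptera musculus"],
    ["t2", "2021-01-01", "Physeter macrocephalus"],
]
# Invented sample rows: coreid, countryCode, threatStatus, occurrenceStatus.
distribution = [
    ["t1", "CA", "EN", "present"],
    ["t1", "US", "EN", "present"],
]

linked = group_extension_rows(whales, distribution)
print(linked)
```

Note that the extension rows need no identifier of their own; several of them can share the same `coreid`, as the two rows for `t1` do here.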
|
||||
|
||||
## 4 Database example (non-normative)
|
||||
|
||||
### 4.1 MySQL
|
||||
|
||||
It is very easy to produce fielded text using the `SELECT ... INTO OUTFILE` statement in MySQL. The encoding of the resulting file will depend on the server variables and collations used, and might need to be modified before the operation is done. Note that MySQL will export `NULL` values as `\N` by default. Use the `IFNULL()` function as shown in the following example to avoid this.
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
IFNULL(id, ''), IFNULL(scientific_name, ''), IFNULL(count,'')
|
||||
INTO outfile '/tmp/dwc.txt'
|
||||
FIELDS TERMINATED BY ','
|
||||
OPTIONALLY ENCLOSED BY '"'
|
||||
LINES TERMINATED BY '\n'
|
||||
FROM
|
||||
dwc;
|
||||
```
|
|
@ -4,7 +4,7 @@ Title
|
|||
: Darwin Core text guide
|
||||
|
||||
Date version issued
|
||||
: 2020-09-05
|
||||
: 2021-07-15
|
||||
|
||||
Date created
|
||||
: 2009-02-12
|
||||
|
@ -13,13 +13,13 @@ Part of TDWG Standard
|
|||
: <http://www.tdwg.org/standards/450/>
|
||||
|
||||
This version
|
||||
: <http://rs.tdwg.org/dwc/terms/guides/text/2020-09-05>
|
||||
: <http://rs.tdwg.org/dwc/terms/guides/text/2021-07-15>
|
||||
|
||||
Latest version
|
||||
: <http://rs.tdwg.org/dwc/terms/guides/text/>
|
||||
|
||||
Previous version
|
||||
: <http://rs.tdwg.org/dwc/terms/guides/text/2014-11-08>
|
||||
: <http://rs.tdwg.org/dwc/terms/guides/text/2020-09-05>
|
||||
|
||||
Abstract
|
||||
: Guidelines for implementing Darwin Core in Text files.
|
||||
|
@ -31,7 +31,7 @@ Creator
|
|||
: Darwin Core Task Group
|
||||
|
||||
Bibliographic citation
|
||||
: Darwin Core Task Group. 2009. Darwin Core text guide. Biodiversity Information Standards (TDWG). <http://rs.tdwg.org/dwc/terms/guides/text/>
|
||||
: Darwin Core Maintenance Group. 2021. Darwin Core text guide. Biodiversity Information Standards (TDWG). <http://rs.tdwg.org/dwc/terms/guides/text/2021-07-15>
|
||||
|
||||
## 1 Introduction
|
||||
|
||||
|
@ -45,6 +45,10 @@ More complex structure can be shared in multiple related files. The description
|
|||
|
||||
All sections of this document are normative, except for examples, whose sections are marked as non-normative.
|
||||
|
||||
#### 1.1.1 RFC 2119 key words
|
||||
|
||||
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC 2119](https://tools.ietf.org/html/rfc2119).
|
||||
|
||||
### 1.2 Simple example metafile content (non-normative)
|
||||
|
||||
A simple comma separated values (CSV) data file with the following content:
|
||||
|
@ -95,24 +99,24 @@ Two scenarios that might benefit from the use of fielded text are:
|
|||
|
||||
## 2 Metafile content
|
||||
|
||||
The [text metafile schema](tdwg_dwc_text.xsd) provides technical details for the structure of a metafile by defining the elements and attributes required to describe the contents and relationships between text files. These elements and attributes, with descriptions and specifications for their use in a metafile, are described in the following table.
|
||||
The [text metafile schema](tdwg_dwc_text.xsd) provides technical details for the structure of a metafile by defining the elements and attributes necessary to describe the contents and relationships between text files. These elements and attributes, with descriptions and specifications for their use in a metafile, are described in the following table.
|
||||
|
||||
### 2.1 The `<archive>` element
|
||||
|
||||
The `<archive>` element is the container for the list of related files (one core and zero or more extensions). The `<archive>` element has just one attribute, `metadata`.
|
||||
The `<archive>` element is the container for the list of related files (one core and zero or more extensions). The `<archive>` element MAY have one attribute, `metadata`.
|
||||
|
||||
#### 2.1.1 Attributes
|
||||
|
||||
Attribute | Description | Required | Default
|
||||
--- | --- | --- | ---
|
||||
`metadata` | Contains a qualified Uniform Resource Locator (URL) defining the location of a metadata description of the entire archive. The format of the metadata is not prescribed, but a standardized format such as Ecological Metadata Language (EML), Federal Geographic Data Committee (FGDC), or ISO 19115 family is recommended. | no |
|
||||
`metadata` | If used, the value MUST be a qualified Uniform Resource Locator (URL) defining the location of a metadata description of the entire archive. The format of the metadata is not prescribed, but a standardized format such as Ecological Metadata Language (EML), Federal Geographic Data Committee (FGDC), or ISO 19115 family is RECOMMENDED. | no |
|
||||
|
||||
#### 2.1.2 Elements
|
||||
|
||||
Element | Description
|
||||
--- | ---
|
||||
`<core>` | An `<archive>` must contain exactly one `<core>` element, representing the data entity (the actual file and its column header mappings to Darwin Core terms) upon which records are based. If extensions are being used, each record in the core data must have a unique identifier. The field for this identifier must be specified in an explicit `<id>` field in order to associate extension records with the core record.
|
||||
`<extension>` | An `<archive>` may define zero or more `<extension>` elements, each representing an individual extension entity directly related to the core. In addition to the general file attributes described below, every extension entity must have an explicit `<coreid>` field to relate the extension record to a row in the core entity. The extension itself does not have to have a unique ID field and many rows can point to the same core record.
|
||||
`<core>` | An `<archive>` MUST contain exactly one `<core>` element, representing the data entity (the actual file and its column header mappings to Darwin Core terms) upon which records are based. If extensions are being used, each record in the core data MUST have a unique identifier. The field for this identifier MUST be specified in an explicit `<id>` field in order to associate extension records with the core record.
|
||||
`<extension>` | An `<archive>` MAY define zero or more `<extension>` elements, each representing an individual extension entity directly related to the core. In addition to the general file attributes described below, every extension entity MUST have an explicit `<coreid>` field to relate the extension record to a row in the core entity. The extension itself does not have to have a unique ID field and many rows can point to the same core record.
|
||||
|
||||
### 2.2 The `<core>` or `<extension>` element
|
||||
|
||||
|
@ -120,45 +124,45 @@ Element | Description
|
|||
|
||||
Attribute | Description | Required | Default
|
||||
--- | --- | --- | ---
|
||||
`rowType` | A Uniform Resource Identifier (URI) for the term identifying the class of data represented by each row, for example, <http://rs.tdwg.org/dwc/terms/Occurrence> for Occurrence records or <http://rs.tdwg.org/dwc/terms/Taxon> for Taxon records. Additional classes may be referenced by URI and defined outside the Darwin Core specification. The row type is required. For convenience the URIs for classes defined by the Darwin Core are: `Occurrence`: <http://rs.tdwg.org/dwc/terms/Occurrence>, `Event`: <http://rs.tdwg.org/dwc/terms/Event>, `Location`: <http://purl.org/dc/terms/Location>, `GeologicalContext`: <http://rs.tdwg.org/dwc/terms/GeologicalContext>, `Identification`: <http://rs.tdwg.org/dwc/terms/Identification>, `Taxon`: <http://rs.tdwg.org/dwc/terms/Taxon>, `ResourceRelationship`: <http://rs.tdwg.org/dwc/terms/ResourceRelationship>, `MeasurementOrFact`: <http://rs.tdwg.org/dwc/terms/MeasurementOrFact> | yes |
|
||||
`fieldsTerminatedBy` | Specifies the delimiter between fields. Typical values might be `,` or `\t` for CSV or Tab files respectively. | no | `,`
|
||||
`rowType` | MUST be a Uniform Resource Identifier (URI) for the term identifying the class of data represented by each row, for example, <http://rs.tdwg.org/dwc/terms/Occurrence> for Occurrence records or <http://rs.tdwg.org/dwc/terms/Taxon> for Taxon records. Additional classes MAY be defined outside the Darwin Core specification if denoted by a URI. The row type is REQUIRED. For convenience the URIs for classes defined by the Darwin Core are: `Occurrence`: <http://rs.tdwg.org/dwc/terms/Occurrence>, `Event`: <http://rs.tdwg.org/dwc/terms/Event>, `Location`: <http://purl.org/dc/terms/Location>, `GeologicalContext`: <http://rs.tdwg.org/dwc/terms/GeologicalContext>, `Identification`: <http://rs.tdwg.org/dwc/terms/Identification>, `Taxon`: <http://rs.tdwg.org/dwc/terms/Taxon>, `ResourceRelationship`: <http://rs.tdwg.org/dwc/terms/ResourceRelationship>, `MeasurementOrFact`: <http://rs.tdwg.org/dwc/terms/MeasurementOrFact> | yes |
|
||||
`fieldsTerminatedBy` | Specifies the delimiter between fields. Typical values MAY be `,` or `\t` for CSV or Tab files respectively. | no | `,`
|
||||
`linesTerminatedBy` | Specifies the row separator character. | no | `\n`
|
||||
`fieldsEnclosedBy` | Specifies the character used to enclose (mark the start and end of) each field. CSV files frequently use the double quote character (`"`), which is the default value if none is explicitly provided. Note that a comma separated value file that has commas within the content of any field must have an enclosing character. | no | `"`
|
||||
`fieldsEnclosedBy` | Specifies the character used to enclose (mark the start and end of) each field. CSV files frequently use the double quote character (`"`), which is the default value if none is explicitly provided. Note that a comma separated value file that has commas within the content of any field MUST have an enclosing character. | no | `"`
|
||||
`encoding` | Specifies the [character encoding](http://en.wikipedia.org/wiki/Character_encoding) for the data file. The encoding is extremely important, but often ignored. The most frequently used encodings are: `UTF-8`: 8-bit Unicode Transformation Format, `UTF-16`: 16-bit Unicode Transformation Format, `ISO-8859-1`: commonly known as "Latin-1" and a common default on systems configured for a single western European language, `Windows-1252`: commonly known as "WinLatin" and a common default of legacy versions of Microsoft Windows based operating systems. | no | `UTF-8`
|
||||
`ignoreHeaderLines` | Specifies the number of lines to ignore from the beginning of the file. This can be used to skip column headings or preamble comments, for example. | no | `0`
|
||||
`dateFormat` | When verbatim dates are consistent in format, this field can be used to indicate the format represented. It is recommended to use the date, dateTime and time for field formats wherever possible, but where verbatim dates are required, a format may be specified here. This should be considered a 'hint' for consumers. It is recommended that consumers support the minimum combinations of `DD` `MM` and `YYYY` with the separators `/` and `-`. Examples: `DDMMYYYY`: for dates of the form 21121978, `DD-MM-YYYY`: for dates of the form 21-12-1978, `MMDDYYYY`: for dates of the form 12211978, `MM-DD-YYYY`: for dates of the form 12-21-1978, `YYYYMMDD`: for dates of the form 19781221. | no | `YYYY-MM-DD`
|
||||
`ignoreHeaderLines` | Specifies the number of lines to ignore from the beginning of the file. This MAY be used to skip column headings or preamble comments, for example. | no | `0`
|
||||
`dateFormat` | When verbatim dates are consistent in format, this field MAY be used to indicate the format represented. It is RECOMMENDED to use the date, dateTime and time for field formats wherever possible, but where verbatim dates are required, a format MAY be specified here. This should be considered a 'hint' for consumers. It is RECOMMENDED that consumers support the minimum combinations of `DD` `MM` and `YYYY` with the separators `/` and `-`. Examples: `DDMMYYYY`: for dates of the form 21121978, `DD-MM-YYYY`: for dates of the form 21-12-1978, `MMDDYYYY`: for dates of the form 12211978, `MM-DD-YYYY`: for dates of the form 12-21-1978, `YYYYMMDD`: for dates of the form 19781221. | no | `YYYY-MM-DD`
|
||||
|
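As a non-normative illustration, the file-level attributes above map closely onto a delimited-text reader. The Python sketch below assumes the attribute values have already been read into a plain dictionary; the helper names `to_strptime`, `parse_verbatim_date` and `read_rows` are hypothetical and not part of the standard. It also assumes that a tab delimiter may be written in the metafile as the two characters `\t`, and that only the `DD`, `MM` and `YYYY` tokens need to be supported.

```python
import csv
import io
from datetime import datetime

# Assumed mapping from dateFormat tokens to strptime directives.
_TOKENS = (("YYYY", "%Y"), ("MM", "%m"), ("DD", "%d"))


def to_strptime(date_format):
    """Translate a dateFormat hint such as 'DD-MM-YYYY' into a strptime pattern."""
    pattern = date_format
    for token, directive in _TOKENS:
        pattern = pattern.replace(token, directive)
    return pattern


def parse_verbatim_date(value, date_format="YYYY-MM-DD"):
    """Parse a verbatim date string using the dateFormat hint."""
    return datetime.strptime(value, to_strptime(date_format)).date()


def read_rows(text, attrs):
    """Read rows from delimited text using metafile-style attribute values."""
    delimiter = attrs.get("fieldsTerminatedBy", ",")
    if delimiter == "\\t":  # assumption: tab spelled as backslash + t
        delimiter = "\t"
    options = {"delimiter": delimiter}
    quotechar = attrs.get("fieldsEnclosedBy", '"')
    if quotechar:
        options["quotechar"] = quotechar
    else:
        # No enclosing character: switch quoting off entirely.
        options["quoting"] = csv.QUOTE_NONE
    skip = int(attrs.get("ignoreHeaderLines", 0))
    reader = csv.reader(io.StringIO(text), **options)
    return [row for i, row in enumerate(reader) if i >= skip]
```

With `ignoreHeaderLines` set to `1`, the heading row is skipped, and a field such as `"Bariloche, Ruta 40"` stays intact because the enclosing character protects the embedded comma.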
||||
#### 2.2.2 Elements
|
||||
|
||||
Element | Description
|
||||
--- | ---
|
||||
`<files>` | A `<core>` or `<extension>` element must contain one `<files>` element to locate the data being described.
|
||||
`<id>` | If extensions are being used, the `<core>` element must contain an `<id>` element that indicates the identifier for a record.
|
||||
`<coreid>` | If extensions are being used, the `<extension>` element must contain a `<coreid>` element that indicates the column in the extension file that contains the core record identifier (the matching `<id>` in the core file).
|
||||
`<field>` | A `<core>` or `<extension>` element must contain one or more `<field>` elements, each representing a 'column' in the row.
|
||||
`<files>` | A `<core>` or `<extension>` element MUST contain one `<files>` element to locate the data being described.
|
||||
`<id>` | If extensions are being used, the `<core>` element MUST contain an `<id>` element that indicates the identifier for a record.
|
||||
`<coreid>` | If extensions are being used, the `<extension>` element MUST contain a `<coreid>` element that indicates the column in the extension file that contains the core record identifier (the matching `<id>` in the core file).
|
||||
`<field>` | A `<core>` or `<extension>` element MUST contain one or more `<field>` elements, each representing a 'column' in the row.
|
||||
|
||||
### 2.3 `<files>` element
|
||||
|
||||
The `<files>` element must contain one or more `<location>` elements, each defining where a file resides. Each core or extension entity can be composed from one or more files. If an entity has data in more than one file, use the `<location>` element multiple times, once for each file that makes up the entity.
|
||||
The `<files>` element MUST contain one or more `<location>` elements, each defining where a file resides. Each core or extension entity can be composed from one or more files. If an entity has data in more than one file, use the `<location>` element multiple times, once for each file that makes up the entity.
|
||||
|
||||
#### 2.3.1 Elements
|
||||
|
||||
Element | Description
|
||||
--- | ---
|
||||
`<location>` | Specifies the location of the file being described, which may take either of the following forms: 1) a web accessible URL such as `http://www.gbif.org/data/specimen.csv` or `ftp://ftp.gbif.org/tim/specimen.txt`, 2) a filepath relative to the location of the metafile such as `specimen.txt`, `./specimen.txt`, `data/specimen.txt`.
|
||||
`<location>` | Specifies the location of the file being described, which MUST take one of the following forms: 1) a web accessible URL such as `http://www.gbif.org/data/specimen.csv` or `ftp://ftp.gbif.org/tim/specimen.txt`, 2) a filepath relative to the location of the metafile such as `specimen.txt`, `./specimen.txt`, `data/specimen.txt`.
|
||||
|
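The two forms of `<location>` can be handled uniformly by resolving relative file paths against the location of the metafile. The following Python sketch is non-normative: the function name `resolve_location` and the `example.org` metafile URL are hypothetical, and the check for the `http`, `https` and `ftp` schemes is an assumption about what counts as a web accessible URL.

```python
from urllib.parse import urljoin, urlparse

# Assumed set of schemes treated as "web accessible".
WEB_SCHEMES = ("http", "https", "ftp")


def resolve_location(location, metafile_url):
    """Return an absolute URL for the text content of a <location> element."""
    if urlparse(location).scheme in WEB_SCHEMES:
        # Form 1: already a web accessible URL, use it unchanged.
        return location
    # Form 2: a file path relative to the location of the metafile.
    return urljoin(metafile_url, location)
```

For a metafile at `http://example.org/archive/meta.xml`, both `specimen.txt` and `./specimen.txt` resolve to `http://example.org/archive/specimen.txt`.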
||||
### 2.4 The `<field>` element
|
||||
|
||||
The field element is used to specify the location and content of data within a file. There must be one field element for every term being shared for the entity, whether explicitly or through the use of a default value for all rows in the file.
|
||||
The field element is used to specify the location and content of data within a file. There MUST be one field element for every term being shared for the entity, whether explicitly or through the use of a default value for all rows in the file.
|
||||
|
||||
#### 2.4.1 Attributes
|
||||
|
||||
Attribute | Description | Required | Default
|
||||
--- | --- | --- | ---
|
||||
`index` | Specifies the position of the column in the row. The first column has an index of 0, the second column 1, etc. If no column index is specified, then the term and the default may be used to define a constant value for all rows. | no |
|
||||
`term` | A Uniform Resource Identifier (URI) for the term represented by this field. For example, a field containing the scientific name would have `term="http://rs.tdwg.org/dwc/terms/scientificName"`. Terms outside of the Darwin Core specification may be used, such as those from the Dublin Core Metadata Initiative, for example, `dcterms:modified` would be `term="http://purl.org/dc/terms/modified"`. | yes |
|
||||
`default` | Specifies value to use if one is not supplied for the field in a given row. If no index is supplied, the default can be used to define a constant for all rows for a field that is not in the data file. | no |
|
||||
`vocabulary` | A Uniform Resource Identifier (URI) for a vocabulary that the source values for this field are based on. The URI ideally should resolve to some machine readable definition like SKOS, RDF or at least some simple text or html file often found for ISO or RFC standards. For example <http://rs.gbif.org/vocabulary/gbif/nomenclatural_code.xml>, <http://www.ietf.org/rfc/rfc3066.txt> or <http://www.iso.org/iso/list-en1-semic-3.txt>. | no |
|
||||
`index` | Specifies the position of the column in the row. The first column has an index of 0, the second column 1, etc. If no column index is specified, then the term and the default MAY be used to define a constant value for all rows. | no |
|
||||
`term` | MUST be a Uniform Resource Identifier (URI) for the term represented by this field. For example, a field containing the scientific name would have `term="http://rs.tdwg.org/dwc/terms/scientificName"`. Terms outside of the Darwin Core specification MAY be used, such as those from the Dublin Core Metadata Initiative, for example, `dcterms:modified` would be `term="http://purl.org/dc/terms/modified"`. | yes |
|
||||
`default` | Specifies value to use if one is not supplied for the field in a given row. If no index is supplied, the default MAY be used to define a constant for all rows for a field that is not in the data file. | no |
|
||||
`vocabulary` | When present, MUST be a Uniform Resource Identifier (URI) for a vocabulary that the source values for this field are based on. The URI ideally should resolve to some machine readable definition like SKOS, RDF or at least some simple text or html file often found for ISO or RFC standards. For example <http://rs.gbif.org/vocabulary/gbif/nomenclatural_code.xml>, <http://www.ietf.org/rfc/rfc3066.txt> or <http://www.iso.org/iso/list-en1-semic-3.txt>. | no |
|
||||
|
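The interplay of `index`, `term` and `default` can be sketched in a few lines of Python. This is a non-normative illustration under stated assumptions: the `<core>` fragment below omits the metafile namespace and the `<files>` element for brevity, the function name `apply_fields` is hypothetical, and an empty string in a row is treated as "not supplied", which is one possible reading of the `default` attribute.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment used only for this illustration.
FIELDS_XML = """
<core>
  <field index="0" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  <field index="1" term="http://rs.tdwg.org/dwc/terms/country" default="Argentina"/>
  <field term="http://rs.tdwg.org/dwc/terms/basisOfRecord" default="PreservedSpecimen"/>
</core>
"""


def apply_fields(row, core):
    """Map one data row (a list of strings) to a {term URI: value} dictionary."""
    record = {}
    for field in core.findall("field"):
        index = field.get("index")
        value = None
        if index is not None and int(index) < len(row):
            value = row[int(index)]
        if value is None or value == "":
            # No index, or no value in this row: fall back to the default.
            value = field.get("default")
        if value is not None:
            record[field.get("term")] = value
    return record


core = ET.fromstring(FIELDS_XML)
record = apply_fields(["Ctenomys sociabilis", ""], core)
```

Here `basisOfRecord` has no `index`, so its default acts as a constant for every row, while `country` falls back to its default only when the column is empty.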
||||
## 3 Implementation guide
|
||||
|
||||
|
|
|
@ -0,0 +1,339 @@
|
|||
# Darwin Core XML guide
|
||||
|
||||
Title
|
||||
: Darwin Core XML guide
|
||||
|
||||
Date version issued
|
||||
: 2015-06-02
|
||||
|
||||
Date created
|
||||
: 2009-02-12
|
||||
|
||||
Part of TDWG Standard
|
||||
: <http://www.tdwg.org/standards/450/>
|
||||
|
||||
This version
|
||||
: <http://rs.tdwg.org/dwc/terms/guides/xml/2014-11-08>
|
||||
|
||||
Latest version
|
||||
: <http://rs.tdwg.org/dwc/terms/guides/xml/>
|
||||
|
||||
Previous version
|
||||
: <http://rs.tdwg.org/dwc/terms/guides/xml/2010-05-23>
|
||||
|
||||
Replaced by
|
||||
: <http://rs.tdwg.org/dwc/terms/guides/xml/2021-07-15>
|
||||
|
||||
Abstract
|
||||
: Guidelines for the implementation of Darwin Core in XML.
|
||||
|
||||
Contributors
|
||||
: John Wieczorek (MVZ), Markus Döring (GBIF), Renato De Giovanni (CRIA), Tim Robertson (GBIF), Dave Vieglais (KUNHM)
|
||||
|
||||
Creator
|
||||
: Darwin Core Task Group
|
||||
|
||||
Bibliographic citation
|
||||
: Darwin Core Task Group. 2014. Darwin Core XML guide. Biodiversity Information Standards (TDWG). <http://rs.tdwg.org/dwc/terms/guides/xml/2014-11-08>
|
||||
|
||||
## 1 Introduction
|
||||
|
||||
This document provides guidelines for implementing application schemas based on [Darwin Core terms](../../terms/) using [XML](http://www.w3.org/XML/). The underlying metadata model is described (in a syntax neutral way), followed by some specific guidelines for XML implementations. Some guidance on the use of non-Darwin Core terms is also provided.
|
||||
|
||||
This document does not provide guidelines for encoding Darwin Core in RDF/XML. Nor does it take a position on the relative merits of encoding metadata in "plain" XML rather than RDF/XML. This document provides guidelines in those cases where RDF/XML is not considered appropriate.
|
||||
|
||||
### 1.1 Status of the content of this document
|
||||
|
||||
All sections of this document are normative, except for sections that are explicitly marked as non-normative.
|
||||
|
||||
### 1.2 Audience
|
||||
|
||||
This document is targeted toward those who wish to use or construct application schemas using Darwin Core terms in XML. It includes explanations of existing schemas such as [Simple Darwin Core](../simple/) and how to build new schemas to meet specific models of information.
|
||||
|
||||
## 2 Implementation guide
|
||||
|
||||
### 2.1 XML schema
|
||||
|
||||
Implementors should base their XML applications on [XML Schemas](http://www.w3.org/XML/Schema) rather than _XML DTDs_. Approaches based on _XML Schemas_ are more flexible and are more easily re-used within other XML applications.
|
||||
|
||||
### 2.2 XML namespaces
|
||||
|
||||
Implementors should use [XML Namespaces](http://www.w3.org/TR/1999/REC-xml-names-19990114/) to uniquely identify elements. Darwin Core namespaces are defined in the [Darwin Core Namespace Policy](../../namespace/), while Dublin Core namespaces are defined in the [DCMI Namespace Recommendation](http://dublincore.org/documents/dcmi-namespace/).
|
||||
|
||||
### 2.3 Abstract model
|
||||
|
||||
The Darwin Core follows the [Dublin Core Metadata Initiative Abstract Model](http://dublincore.org/documents/abstract-model/) except that the Darwin Core _record_ is roughly equivalent to the Dublin Core _resource_.
|
||||
|
||||
- Darwin Core terms are either `classes` or `properties`.
|
||||
- Each `property` has at most one `class` as its domain (describes no more than one `class`).
|
||||
- A `Darwin Core record` is made up of zero or more `classes` and one or more `properties` with their associated `values`.
|
||||
- Each `value` is a literal string.
|
||||
- The `values` of `properties` within a `Darwin Core record` describe that record.
|
||||
- A `Darwin Core record` must include all required `properties`, if any, and their associated `values`.
|
||||
|
||||
### 2.4 Properties and values
|
||||
|
||||
Darwin Core follows the guidelines for expressing [Dublin Core metadata using XML](http://dublincore.org/documents/dc-xml/) except in that Darwin Core implementors should encode `properties` as XML elements and `values` as the content of those elements instead of having each property contain a value representation and its associated value. The name of the XML element should be an XML qualified name (QName), which associates the value given in the `Term name` attribute in the [Darwin Core Terms](../../terms/) recommendation with the appropriate namespace name. For example, use:
|
||||
|
||||
```xml
|
||||
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
|
||||
targetNamespace="http://rs.tdwg.org/dwc/terms/"
|
||||
xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
|
||||
...
|
||||
<dwc:basisOfRecord>HumanObservation</dwc:basisOfRecord>
|
||||
```
|
||||
|
||||
rather than:
|
||||
|
||||
```xml
|
||||
<dwc:basisOfRecord value="HumanObservation"/>
|
||||
```
|
||||
|
||||
### 2.5 Null values
|
||||
|
||||
Elements for which the value is null should be omitted from the document or explicitly coded using the attribute `xsi:nil="true"`.
|
||||
|
||||
```xml
|
||||
<dwc:locality xsi:nil="true"/>
|
||||
```
|
||||
|
||||
Do not use an empty string - an element with no content:
|
||||
|
||||
```xml
|
||||
<dwc:locality></dwc:locality>
|
||||
```
|
||||
|
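A consumer can treat an omitted element and an element carrying `xsi:nil="true"` in the same way. The following non-normative Python sketch uses the standard library `xml.etree.ElementTree`; the helper names `is_null` and `value_of`, and the wrapper element `record`, are hypothetical. Accepting `1` as well as `true` follows the lexical forms of the XML Schema boolean type.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
DWC = "http://rs.tdwg.org/dwc/terms/"

DOC = f"""<record xmlns:dwc="{DWC}" xmlns:xsi="{XSI}">
  <dwc:locality xsi:nil="true"/>
  <dwc:country>Argentina</dwc:country>
</record>"""


def is_null(element):
    """True if the element is absent or explicitly marked with xsi:nil."""
    if element is None:
        return True
    return element.get(f"{{{XSI}}}nil") in ("true", "1")


def value_of(record, term_name):
    """Return the value of a Darwin Core property, or None when it is null."""
    element = record.find(f"{{{DWC}}}{term_name}")
    return None if is_null(element) else element.text


root = ET.fromstring(DOC)
```

In this sketch `value_of(root, "locality")` and `value_of(root, "county")` both yield `None`, one because of `xsi:nil` and the other because the element is omitted.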
||||
### 2.6 Simple Darwin Core
|
||||
|
||||
[Simple Darwin Core](tdwg_dwc_simple.xsd) most closely models the "flat" nature of many data sets. It is a ready-made schema for sharing information with no structure beyond properties of a _record_ (equivalent to fields in a table, or columns in a spreadsheet). It is meant to accommodate all properties except those that require further structure to be meaningful (auxiliary terms in the classes [ResourceRelationship](http://rs.tdwg.org/dwc/terms/ResourceRelationship) and [MeasurementOrFact](http://rs.tdwg.org/dwc/terms/MeasurementOrFact)). The schema has no required terms and no term is repeated within a given _record_. Refer to [Simple Darwin Core](../simple/) for the rationale behind this schema.
|
||||
|
||||
The term [`dcterms:type`](http://rs.tdwg.org/dwc/terms/dcterms:type) (which is controlled by the [Dublin Core Type Vocabulary](http://dublincore.org/documents/dcmi-type-vocabulary/)), gives the basic category of object (`PhysicalObject`, `StillImage`, `MovingImage`, `Sound`, `Text`) the record is about. The term [`basisOfRecord`](http://rs.tdwg.org/dwc/terms/basisOfRecord), which has a controlled vocabulary distinct from that of `dcterms:type`, shows the name of the Darwin Core class (e.g., [`LivingSpecimen`](http://rs.tdwg.org/dwc/terms/LivingSpecimen), [`PreservedSpecimen`](http://rs.tdwg.org/dwc/terms/PreservedSpecimen), [`FossilSpecimen`](http://rs.tdwg.org/dwc/terms/FossilSpecimen), [`HumanObservation`](http://rs.tdwg.org/dwc/terms/HumanObservation), [`MachineObservation`](http://rs.tdwg.org/dwc/terms/MachineObservation), [`Taxon`](http://rs.tdwg.org/dwc/terms/Taxon)) the record is about.
|
||||
|
||||
#### 2.6.1 Simple Darwin Core example (non-normative)
|
||||
|
||||
Following is a brief example of an XML document for a single specimen complying with the [Simple Darwin Core Schema](tdwg_dwc_simple.xsd). The [Simple Darwin Core XML example document](example_simple.xml) (if this link shows a blank page in your browser, use the View Source option to see the XML document) shows detail for a single record having a more complete set of elements.
|
||||
|
||||
```xml
|
||||
<?xml version="1.0"?>
|
||||
<dwr:SimpleDarwinRecordSet
|
||||
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
||||
xsi:schemaLocation="http://rs.tdwg.org/dwc/xsd/simpledarwincore/ http://rs.tdwg.org/dwc/xsd/tdwg_dwc_simple.xsd"
|
||||
xmlns:dcterms="http://purl.org/dc/terms/"
|
||||
xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
|
||||
xmlns:dwr="http://rs.tdwg.org/dwc/xsd/simpledarwincore/">
|
||||
<dwr:SimpleDarwinRecord>
|
||||
<dcterms:type>PhysicalObject</dcterms:type>
|
||||
<dcterms:modified>2009-02-12T12:43:31</dcterms:modified>
|
||||
<dcterms:rightsHolder>Museum of Vertebrate Zoology</dcterms:rightsHolder>
|
||||
<dcterms:rights>Creative Commons License</dcterms:rights>
|
||||
<dwc:institutionCode>MVZ</dwc:institutionCode>
|
||||
<dwc:collectionCode>Mammals</dwc:collectionCode>
|
||||
<dwc:occurrenceID>urn:catalog:MVZ:Mammals:14523</dwc:occurrenceID>
|
||||
<dwc:basisOfRecord>PreservedSpecimen</dwc:basisOfRecord>
|
||||
<dwc:country>Argentina</dwc:country>
|
||||
<dwc:countryCode>AR</dwc:countryCode>
|
||||
<dwc:stateProvince>Neuquén</dwc:stateProvince>
|
||||
<dwc:locality>25 km al NNE de Bariloche por Ruta 40 (=237)</dwc:locality>
|
||||
</dwr:SimpleDarwinRecord>
|
||||
</dwr:SimpleDarwinRecordSet>
|
||||
```
|
||||
|
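Because a Simple Darwin Core record has no structure beyond its properties, it can be read directly into a flat dictionary. The following non-normative Python sketch shows one way to do this with the standard library; the function name `records_from_xml` and the choice of `prefix:name` keys are assumptions made for illustration, and the sample document is shortened from the example above.

```python
import xml.etree.ElementTree as ET

DWR = "http://rs.tdwg.org/dwc/xsd/simpledarwincore/"
PREFIXES = {
    "http://purl.org/dc/terms/": "dcterms",
    "http://rs.tdwg.org/dwc/terms/": "dwc",
}

SAMPLE = f"""<dwr:SimpleDarwinRecordSet
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
    xmlns:dwr="{DWR}">
  <dwr:SimpleDarwinRecord>
    <dcterms:type>PhysicalObject</dcterms:type>
    <dwc:institutionCode>MVZ</dwc:institutionCode>
    <dwc:countryCode>AR</dwc:countryCode>
  </dwr:SimpleDarwinRecord>
</dwr:SimpleDarwinRecordSet>"""


def records_from_xml(xml_text):
    """Return one {prefix:name: value} dictionary per SimpleDarwinRecord."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall(f"{{{DWR}}}SimpleDarwinRecord"):
        row = {}
        for child in rec:
            if child.tag.startswith("{"):
                # ElementTree spells qualified names as "{namespace}local".
                namespace, _, local = child.tag[1:].partition("}")
                key = f"{PREFIXES.get(namespace, namespace)}:{local}"
            else:
                key = child.tag
            row[key] = child.text
        records.append(row)
    return records


rows = records_from_xml(SAMPLE)
```

Each resulting dictionary corresponds to one row of a table, with one column per term, which mirrors the "flat" model described in this section.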
||||
### 2.7 Classes and containment
|
||||
|
||||
Many Darwin Core terms (`properties`) are defined as being associated with another term (a `class`). For example, [`scientificName`](http://rs.tdwg.org/dwc/terms/scientificName) and [`Taxon`](http://rs.tdwg.org/dwc/terms/Taxon) are both Darwin Core terms, but `scientificName` is a property associated with the `Taxon` class. When constructing schemas that take advantage of classes in structures, implementors are encouraged to maintain the property/class relationships defined by the terms whenever possible (refer to the `Class` attribute of the term as given in the [Quick Reference Guide](../../terms/) or the attribute `dwcattributes:organizedInClass` in the term declaration in the [`dcterms.rdf`](../rdf/dcterms.rdf) file). To promote reuse, Darwin Core provides a set of XML schemas to use as the basis of additional schemas:
|
||||
|
||||
- [Terms XML Schema](tdwg_dwcterms.xsd) - property term definitions as typed global elements and named groups for all terms for a given class to be referenced. The schema makes use of substitution groups `anyClass`, `anyProperty`, `anyIdentifier` and `anyXYZTerm` for each class, e.g. `anyTaxonTerm`. This is the schema upon which the [Simple Darwin Core XML Schema](tdwg_dwc_simple.xsd) is based.
|
||||
- [Class Terms XML Schema](tdwg_dwc_class_terms.xsd) - class term definitions as typed global elements with subelements referencing all corresponding property terms via their substitution group.
|
||||
|
||||
It is encouraged to use classes in a normalized way to avoid deep nesting. A [Darwin Core Tools and Applications page](https://github.com/tdwg/dwc-documentation/blob/master/documentation/resources.md) has been created as an index to example schemas for the purpose of community discussions and development. An [XML schema](tdwg_dwc_classes.xsd) is provided to freely mix any Darwin Core Class in a global list and allow them to reference each other using the respective class identifier terms.
|
||||
|
||||
#### 2.7.1 Normalized classes examples (non-normative)
|
||||
|
||||
Following is an example of using normalized classes to represent two related specimen occurrences (one of which has had a second identification) at one location following this class-based schema. Note that you can reuse the location definition here by referring to it via locationID:
|
||||
|
||||
```xml
|
||||
<?xml version="1.0"?>
|
||||
<dwr:DarwinRecordSet
|
||||
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
||||
xsi:schemaLocation="http://rs.tdwg.org/dwc/dwcrecord/ http://rs.tdwg.org/dwc/xsd/tdwg_dwc_classes.xsd"
|
||||
xmlns:dcterms="http://purl.org/dc/terms/"
|
||||
xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
|
||||
xmlns:dwr="http://rs.tdwg.org/dwc/dwcrecord/">
|
||||
<dcterms:Location>
|
||||
<dwc:locationID>http://guid.mvz.org/sites/arg/127</dwc:locationID>
|
||||
<dwc:country>Argentina</dwc:country>
|
||||
<dwc:countryCode>AR</dwc:countryCode>
|
||||
<dwc:stateProvince>Neuquén</dwc:stateProvince>
|
||||
<dwc:locality>25 km al NNE de Bariloche por Ruta 40 (=237)</dwc:locality>
|
||||
</dcterms:Location>
|
||||
<dwc:Occurrence>
|
||||
<dcterms:type>PhysicalObject</dcterms:type>
|
||||
<dcterms:modified>2009-02-12T12:43:31</dcterms:modified>
|
||||
<dwc:institutionCode>MVZ</dwc:institutionCode>
|
||||
<dwc:collectionCode>Mammals</dwc:collectionCode>
|
||||
<dwc:occurrenceID>urn:catalog:MVZ:Mammals:14523</dwc:occurrenceID>
|
||||
<dwc:basisOfRecord>PreservedSpecimen</dwc:basisOfRecord>
|
||||
<dwc:locationID>http://guid.mvz.org/sites/arg/127</dwc:locationID>
|
||||
</dwc:Occurrence>
|
||||
<dwc:Identification>
|
||||
<dwc:identificationID>http://guid.mvz.org/identifications/23459</dwc:identificationID>
|
||||
<dwc:identifiedBy>Richard Sage</dwc:identifiedBy>
|
||||
<dwc:dateIdentified>2000</dwc:dateIdentified>
|
||||
<dwc:identificationQualifier>sp.</dwc:identificationQualifier>
|
||||
<dwc:occurrenceID>urn:catalog:MVZ:Mammals:14523</dwc:occurrenceID>
|
||||
<dwc:taxonID>urn:lsid:catalogueoflife.org:taxon:d79c11aa-29c1-102b-9a4a-00304854f820:col20120721</dwc:taxonID>
|
||||
</dwc:Identification>
|
||||
<dwc:Taxon>
|
||||
<dwc:taxonID>urn:lsid:catalogueoflife.org:taxon:d79c11aa-29c1-102b-9a4a-00304854f820:col20120721</dwc:taxonID>
|
||||
<dwc:scientificName>Ctenomys</dwc:scientificName>
|
||||
<dwc:taxonRank>genus</dwc:taxonRank>
|
||||
<dwc:nomenclaturalCode>ICZN</dwc:nomenclaturalCode>
|
||||
<dwc:genus>Ctenomys</dwc:genus>
|
||||
</dwc:Taxon>
|
||||
<dwc:Identification>
|
||||
<dwc:identificationID>http://guid.mvz.org/identifications/94752</dwc:identificationID>
|
||||
<dwc:identifiedBy>James L Patton</dwc:identifiedBy>
|
||||
<dwc:dateIdentified>2001-09-14</dwc:dateIdentified>
|
||||
<dwc:occurrenceID>urn:catalog:MVZ:Mammals:14523</dwc:occurrenceID>
|
||||
<dwc:taxonID>urn:lsid:catalogueoflife.org:taxon:df0a797c-29c1-102b-9a4a-00304854f820:col20120721</dwc:taxonID>
|
||||
</dwc:Identification>
|
||||
<dwc:Taxon>
|
||||
<dwc:taxonID>urn:lsid:catalogueoflife.org:taxon:df0a797c-29c1-102b-9a4a-00304854f820:col20120721</dwc:taxonID>
|
||||
<dwc:parentNameUsageID>urn:lsid:catalogueoflife.org:taxon:d79c11aa-29c1-102b-9a4a-00304854f820:col20120721</dwc:parentNameUsageID>
|
||||
<dwc:scientificName>Ctenomys sociabilis</dwc:scientificName>
|
||||
<dwc:scientificNameAuthorship>Pearson and Christie, 1985</dwc:scientificNameAuthorship>
|
||||
<dwc:taxonRank>species</dwc:taxonRank>
|
||||
<dwc:nomenclaturalCode>ICZN</dwc:nomenclaturalCode>
|
||||
<dwc:higherClassification>Animalia; Chordata; Vertebrata; Mammalia; Theria; Eutheria; Rodentia; Hystricognatha; Hystricognathi; Ctenomyidae; Ctenomyini; Ctenomys</dwc:higherClassification>
|
||||
<dwc:kingdom>Animalia</dwc:kingdom>
|
||||
<dwc:phylum>Chordata</dwc:phylum>
|
||||
<dwc:class>Mammalia</dwc:class>
|
||||
<dwc:order>Rodentia</dwc:order>
|
||||
<dwc:family>Ctenomyidae</dwc:family>
|
||||
<dwc:genus>Ctenomys</dwc:genus>
|
||||
<dwc:specificEpithet>sociabilis</dwc:specificEpithet>
|
||||
</dwc:Taxon>
|
||||
<dwc:Occurrence>
|
||||
<dcterms:type>PhysicalObject</dcterms:type>
|
||||
<dcterms:modified>2009-02-12T12:43:31</dcterms:modified>
|
||||
<dwc:institutionCode>MVZ</dwc:institutionCode>
|
||||
<dwc:collectionCode>Mammals</dwc:collectionCode>
|
||||
<dwc:occurrenceID>urn:catalog:MVZ:Mammals:14524</dwc:occurrenceID>
|
||||
<dwc:basisOfRecord>PreservedSpecimen</dwc:basisOfRecord>
|
||||
<dwc:locationID>http://guid.mvz.org/sites/arg/127</dwc:locationID>
|
||||
</dwc:Occurrence>
|
||||
<dwc:Identification>
|
||||
<dwc:identificationID>http://guid.mvz.org/identifications/94753</dwc:identificationID>
|
||||
<dwc:identifiedBy>James L Patton</dwc:identifiedBy>
|
||||
<dwc:dateIdentified>2001-09-14</dwc:dateIdentified>
|
||||
<dwc:occurrenceID>urn:catalog:MVZ:Mammals:14524</dwc:occurrenceID>
|
||||
<dwc:taxonID>urn:lsid:catalogueoflife.org:taxon:df0a797c-29c1-102b-9a4a-00304854f820:col20120721</dwc:taxonID>
|
||||
</dwc:Identification>
|
||||
<dwc:ResourceRelationship>
|
||||
<dwc:resourceRelationshipID>http://guid.mvz.org/relations/23423</dwc:resourceRelationshipID>
|
||||
<dwc:resourceID>urn:catalog:MVZ:Mammals:14523</dwc:resourceID>
|
||||
<dwc:relatedResourceID>urn:catalog:MVZ:Mammals:14524</dwc:relatedResourceID>
|
||||
<dwc:relationshipOfResource>offspring of</dwc:relationshipOfResource>
|
||||
</dwc:ResourceRelationship>
|
||||
<dwc:ResourceRelationship>
|
||||
<dwc:resourceRelationshipID>http://guid.mvz.org/relations/23424</dwc:resourceRelationshipID>
|
||||
<dwc:resourceID>urn:catalog:MVZ:Mammals:14524</dwc:resourceID>
|
||||
<dwc:relatedResourceID>urn:catalog:MVZ:Mammals:14523</dwc:relatedResourceID>
|
||||
<dwc:relationshipOfResource>mother of</dwc:relationshipOfResource>
|
||||
</dwc:ResourceRelationship>
|
||||
</dwr:DarwinRecordSet>
|
||||
```
|
||||
|
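In a normalized document like the one above, a consumer follows identifier terms to join classes back together. The Python sketch below is non-normative and shows the idea for `locationID` only; the helper names `index_by_id` and `localities_of_occurrences` are hypothetical, and the sample is a shortened version of the example.

```python
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"
DCTERMS = "http://purl.org/dc/terms/"

SAMPLE = f"""<dwr:DarwinRecordSet
    xmlns:dcterms="{DCTERMS}"
    xmlns:dwc="{DWC}"
    xmlns:dwr="http://rs.tdwg.org/dwc/dwcrecord/">
  <dcterms:Location>
    <dwc:locationID>http://guid.mvz.org/sites/arg/127</dwc:locationID>
    <dwc:locality>25 km al NNE de Bariloche por Ruta 40 (=237)</dwc:locality>
  </dcterms:Location>
  <dwc:Occurrence>
    <dwc:occurrenceID>urn:catalog:MVZ:Mammals:14523</dwc:occurrenceID>
    <dwc:locationID>http://guid.mvz.org/sites/arg/127</dwc:locationID>
  </dwc:Occurrence>
  <dwc:Occurrence>
    <dwc:occurrenceID>urn:catalog:MVZ:Mammals:14524</dwc:occurrenceID>
    <dwc:locationID>http://guid.mvz.org/sites/arg/127</dwc:locationID>
  </dwc:Occurrence>
</dwr:DarwinRecordSet>"""


def index_by_id(root, class_tag, id_term):
    """Index the elements of one class by the value of their identifier term."""
    index = {}
    for element in root.findall(class_tag):
        key = element.findtext(f"{{{DWC}}}{id_term}")
        if key is not None:
            index[key] = element
    return index


def localities_of_occurrences(xml_text):
    """Map each occurrenceID to the locality of the Location it refers to."""
    root = ET.fromstring(xml_text)
    locations = index_by_id(root, f"{{{DCTERMS}}}Location", "locationID")
    result = {}
    for occurrence in root.findall(f"{{{DWC}}}Occurrence"):
        occurrence_id = occurrence.findtext(f"{{{DWC}}}occurrenceID")
        location = locations.get(occurrence.findtext(f"{{{DWC}}}locationID"))
        result[occurrence_id] = (
            location.findtext(f"{{{DWC}}}locality") if location is not None else None
        )
    return result


localities = localities_of_occurrences(SAMPLE)
```

The same indexing approach could be applied to `taxonID`, `occurrenceID` or `eventID`; an occurrence whose `locationID` has no matching `Location` simply maps to `None` in this sketch.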
||||
Here is a different example demonstrating area count observations for events on two different days at one location. Note that we omit the Identification class here, as there is no identification-related data, and link via the `taxonID` directly:
|
||||
|
||||
```xml
|
||||
<?xml version="1.0"?>
|
||||
<dwr:DarwinRecordSet
|
||||
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
||||
xsi:schemaLocation="http://rs.tdwg.org/dwc/dwcrecord/ http://rs.tdwg.org/dwc/xsd/tdwg_dwc_classes.xsd"
|
||||
xmlns:dcterms="http://purl.org/dc/terms/"
|
||||
xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
|
||||
xmlns:dwr="http://rs.tdwg.org/dwc/dwcrecord/">
|
||||
<dcterms:Location>
|
||||
<dwc:locationID>http://guid.mvz.org/sites/arg/127</dwc:locationID>
|
||||
<dwc:country>Argentina</dwc:country>
|
||||
<dwc:countryCode>AR</dwc:countryCode>
|
||||
<dwc:stateProvince>Neuquén</dwc:stateProvince>
|
||||
<dwc:locality>Valle Limay, Estancia Rincon Grande, 48 ha area with centroid at this point</dwc:locality>
|
||||
<dwc:decimalLatitude>-40.97467</dwc:decimalLatitude>
|
||||
<dwc:decimalLongitude>-71.0734</dwc:decimalLongitude>
|
||||
<dwc:geodeticDatum>WGS84</dwc:geodeticDatum>
|
||||
<dwc:coordinateUncertaintyInMeters>200</dwc:coordinateUncertaintyInMeters>
|
||||
</dcterms:Location>
|
||||
<dwc:Event>
|
||||
<dwc:eventID>http://guid.mvz.org/events/2006/11/26/17</dwc:eventID>
|
||||
<dwc:samplingProtocol>area count</dwc:samplingProtocol>
|
||||
<dwc:eventDate>2006-11-26</dwc:eventDate>
|
||||
<dwc:locationID>http://guid.mvz.org/sites/arg/127</dwc:locationID>
|
||||
</dwc:Event>
|
||||
<dwc:Occurrence>
|
||||
<dwc:occurrenceID>urn:catalog:AUDCLO:EBIRD:OBS64515288</dwc:occurrenceID>
|
||||
<dcterms:type>Event</dcterms:type>
|
||||
<dcterms:modified>2009-02-17T07:33:04Z</dcterms:modified>
|
||||
<dwc:institutionCode>AUDCLO</dwc:institutionCode>
|
||||
<dwc:collectionCode>EBIRD</dwc:collectionCode>
|
||||
<dwc:basisOfRecord>HumanObservation</dwc:basisOfRecord>
|
||||
<dwc:individualCount>2</dwc:individualCount>
|
||||
<dwc:eventID>http://guid.mvz.org/events/2006/11/26/17</dwc:eventID>
|
||||
<dwc:taxonID>urn:lsid:catalogueoflife.org:taxon:f000ee00-29c1-102b-9a4a-00304854f820:col20120721</dwc:taxonID>
|
||||
</dwc:Occurrence>
|
||||
<dwc:Taxon>
|
||||
<dwc:taxonID>urn:lsid:catalogueoflife.org:taxon:f000ee00-29c1-102b-9a4a-00304854f820:col20120721</dwc:taxonID>
|
||||
<dwc:scientificName>Anthus hellmayri Hartert, 1909</dwc:scientificName>
|
||||
<dwc:class>Aves</dwc:class>
|
||||
<dwc:genus>Anthus</dwc:genus>
|
||||
<dwc:specificEpithet>hellmayri</dwc:specificEpithet>
|
||||
</dwc:Taxon>
|
||||
<dwc:Occurrence>
|
||||
<dwc:occurrenceID>urn:catalog:AUDCLO:EBIRD:OBS64515286</dwc:occurrenceID>
|
||||
<dcterms:type>Event</dcterms:type>
|
||||
<dcterms:modified>2009-02-17T07:33:04Z</dcterms:modified>
|
||||
<dwc:institutionCode>AUDCLO</dwc:institutionCode>
|
||||
<dwc:collectionCode>EBIRD</dwc:collectionCode>
|
||||
<dwc:basisOfRecord>HumanObservation</dwc:basisOfRecord>
|
||||
<dwc:individualCount>1</dwc:individualCount>
|
||||
<dwc:eventID>http://guid.mvz.org/events/2006/11/26/17</dwc:eventID>
|
||||
<dwc:taxonID>urn:lsid:catalogueoflife.org:taxon:f000e838-29c1-102b-9a4a-00304854f820:col20120721</dwc:taxonID>
|
||||
</dwc:Occurrence>
|
||||
<dwc:Taxon>
|
||||
<dwc:taxonID>urn:lsid:catalogueoflife.org:taxon:f000e838-29c1-102b-9a4a-00304854f820:col20120721</dwc:taxonID>
|
||||
<dwc:scientificName>Anthus correndera Vieillot, 1818</dwc:scientificName>
|
||||
<dwc:class>Aves</dwc:class>
|
||||
<dwc:genus>Anthus</dwc:genus>
|
||||
<dwc:specificEpithet>correndera</dwc:specificEpithet>
|
||||
</dwc:Taxon>
|
||||
<dwc:Event>
|
||||
<dwc:eventID>http://guid.mvz.org/events/2006/11/27/6</dwc:eventID>
|
||||
<dwc:samplingProtocol>area count</dwc:samplingProtocol>
|
||||
<dwc:eventDate>2006-11-27</dwc:eventDate>
|
||||
<dwc:locationID>http://guid.mvz.org/sites/arg/127</dwc:locationID>
|
||||
</dwc:Event>
|
||||
<dwc:Occurrence>
|
||||
<dwc:occurrenceID>urn:catalog:AUDCLO:EBIRD:OBS64515333</dwc:occurrenceID>
|
||||
<dcterms:type>Event</dcterms:type>
|
||||
<dcterms:modified>2009-02-17T07:33:04Z</dcterms:modified>
|
||||
<dwc:institutionCode>AUDCLO</dwc:institutionCode>
|
||||
<dwc:collectionCode>EBIRD</dwc:collectionCode>
|
||||
<dwc:basisOfRecord>HumanObservation</dwc:basisOfRecord>
|
||||
<dwc:individualCount>1</dwc:individualCount>
|
||||
<dwc:eventID>http://guid.mvz.org/events/2006/11/27/6</dwc:eventID>
|
||||
<dwc:taxonID>urn:lsid:catalogueoflife.org:taxon:f000ee00-29c1-102b-9a4a-00304854f820:col20120721</dwc:taxonID>
|
||||
</dwc:Occurrence>
|
||||
<dwc:Occurrence>
|
||||
<dwc:occurrenceID>urn:catalog:AUDCLO:EBIRD:OBS64515331</dwc:occurrenceID>
|
||||
<dcterms:type>Event</dcterms:type>
|
||||
<dcterms:modified>2009-02-17T07:33:04Z</dcterms:modified>
|
||||
<dwc:institutionCode>AUDCLO</dwc:institutionCode>
|
||||
<dwc:collectionCode>EBIRD</dwc:collectionCode>
|
||||
<dwc:basisOfRecord>HumanObservation</dwc:basisOfRecord>
|
||||
<dwc:individualCount>2</dwc:individualCount>
|
||||
<dwc:eventID>http://guid.mvz.org/events/2006/11/27/6</dwc:eventID>
|
||||
<dwc:taxonID>urn:lsid:catalogueoflife.org:taxon:f000ee00-29c1-102b-9a4a-00304854f820:col20120721</dwc:taxonID>
|
||||
</dwc:Occurrence>
|
||||
</dwr:DarwinRecordSet>
|
||||
```
|
|
@ -4,7 +4,7 @@ Title
|
|||
: Darwin Core XML guide
|
||||
|
||||
Date version issued
|
||||
: 2015-06-02
|
||||
: 2021-07-15
|
||||
|
||||
Date created
|
||||
: 2009-02-12
|
||||
|
@ -13,13 +13,13 @@ Part of TDWG Standard
|
|||
: <http://www.tdwg.org/standards/450/>
|
||||
|
||||
This version
|
||||
: <http://rs.tdwg.org/dwc/terms/guides/xml/2014-11-08>
|
||||
: <http://rs.tdwg.org/dwc/terms/guides/xml/2021-07-15>
|
||||
|
||||
Latest version
|
||||
: <http://rs.tdwg.org/dwc/terms/guides/xml/>
|
||||
|
||||
Previous version
|
||||
: <http://rs.tdwg.org/dwc/terms/guides/xml/2010-05-23>
|
||||
: <http://rs.tdwg.org/dwc/terms/guides/xml/2014-11-08>
|
||||
|
||||
Abstract
|
||||
: Guidelines for the implementation of Darwin Core in XML.
|
||||
|
@ -31,18 +31,22 @@ Creator
|
|||
: Darwin Core Task Group
|
||||
|
||||
Bibliographic citation
|
||||
: Darwin Core Task Group. 2009. Darwin Core XML guide. Biodiversity Information Standards (TDWG). <http://rs.tdwg.org/dwc/terms/guides/xml/>
|
||||
: Darwin Core Maintenance Group. 2021. Darwin Core XML guide. Biodiversity Information Standards (TDWG). <http://rs.tdwg.org/dwc/terms/guides/xml/2021-07-15>
|
||||
|
||||
## 1 Introduction
|
||||
|
||||
This document provides guidelines for implementing application schemas based on [Darwin Core terms](../../terms/) using [XML](http://www.w3.org/XML/). The underlying metadata model is described (in a syntax neutral way), followed by some specific guidelines for XML implementations. Some guidance on the use of non-Darwin Core terms is also provided.
|
||||
|
||||
This document does not provide guidelines for encoding Darwin Core in RDF/XML. Nor does it take a position on the relative merits of encoding metadata in "plain" XML rather than RDF/XML. This document provides guidelines in those cases where RDF/XML is not considered appropriate.
|
||||
This document does not provide guidelines for encoding Darwin Core in RDF/XML. Nor does it take a position on the relative merits of encoding metadata in "plain" XML rather than RDF/XML. This document provides guidelines in those cases where RDF/XML is not considered appropriate. For information about implementing Darwin Core as RDF, see the Darwin Core RDF Guide, <http://rs.tdwg.org/dwc/terms/guides/rdf/>.
|
||||
|
||||
### 1.1 Status of the content of this document
|
||||
|
||||
All sections of this document are normative, except for sections that are explicitly marked as non-normative.
|
||||
|
||||
#### 1.1.1 RFC 2119 key words
|
||||
|
||||
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC 2119](https://tools.ietf.org/html/rfc2119).
|
||||
|
||||
### 1.2 Audience
|
||||
|
||||
This document is targeted toward those who wish to use or construct application schemas using Darwin Core terms in XML. It includes explanations of existing schemas such as [Simple Darwin Core](../simple/) and how to build new schemas to meet specific models of information.
|
||||
|
@ -51,11 +55,11 @@ This document is targeted toward those who wish to use or construct application
|
|||
|
||||
### 2.1 XML schema
|
||||
|
||||
Implementors should base their XML applications on [XML Schemas](http://www.w3.org/XML/Schema) rather than _XML DTDs_. Approaches based on _XML Schemas_ are more flexible and are more easily re-used within other XML applications.
|
||||
Implementors SHOULD base their XML applications on [XML Schemas](http://www.w3.org/XML/Schema) rather than _XML DTDs_. Approaches based on _XML Schemas_ are more flexible and are more easily re-used within other XML applications.
|
||||
|
||||
### 2.2 XML namespaces
|
||||
|
||||
Implementors should use [XML Namespaces](http://www.w3.org/TR/1999/REC-xml-names-19990114/) to uniquely identify elements. Darwin Core namespaces are defined in the [Darwin Core Namespace Policy](../../namespace/), while Dublin Core namespaces are defined in the [DCMI Namespace Recommendation](http://dublincore.org/documents/dcmi-namespace/).
|
||||
Implementors SHOULD use [XML Namespaces](http://www.w3.org/TR/1999/REC-xml-names-19990114/) to uniquely identify elements. Darwin Core namespaces are defined in the [Darwin Core Namespace Policy](../../namespace/), while Dublin Core namespaces are defined in the [DCMI Namespace Recommendation](http://dublincore.org/documents/dcmi-namespace/).
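As an illustration of namespace-qualified element names, the following is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The `record` container element and the `qname` helper are hypothetical names introduced only for this example; the namespace URIs are the published Darwin Core and Dublin Core terms namespaces.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for Darwin Core terms and Dublin Core terms.
DWC_NS = "http://rs.tdwg.org/dwc/terms/"
DCTERMS_NS = "http://purl.org/dc/terms/"

# Register the conventional prefixes so the serialized XML uses dwc: and dcterms:.
ET.register_namespace("dwc", DWC_NS)
ET.register_namespace("dcterms", DCTERMS_NS)


def qname(namespace: str, term_name: str) -> str:
    """Return the ElementTree '{namespace}local-name' form of a qualified name."""
    return f"{{{namespace}}}{term_name}"


# "record" is a hypothetical, un-namespaced container used only for illustration.
record = ET.Element("record")
ET.SubElement(record, qname(DWC_NS, "scientificName")).text = "Puma concolor"
ET.SubElement(record, qname(DCTERMS_NS, "type")).text = "PhysicalObject"

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Each element is identified by its namespace URI plus local name, so `dwc:scientificName` and `dcterms:type` cannot collide with similarly named elements from other vocabularies.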
|
||||
|
||||
### 2.3 Abstract model
|
||||
|
||||
|
@ -66,11 +70,11 @@ The Darwin Core follows the [Dublin Core Metadata Initiative Abstract Model](htt
|
|||
- A `Darwin Core record` is made up of zero or more `classes` and one or more `properties` with their associated `values`.
|
||||
- Each `value` is a literal string.
|
||||
- The `values` of `properties` within a `Darwin Core record` describe that record.
|
||||
- A `Darwin Core record` must include all required `properties`, if any, and their associated `values`.
|
||||
- A `Darwin Core record` MUST include all required `properties`, if any, and their associated `values`.
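
The rules in the list above can be sketched in a syntax-neutral way before any XML is written. The following Python sketch models a record as a mapping from property names to literal string values. The function names and the `REQUIRED` list are assumptions made for illustration; an application schema would define its own required properties, if any.

```python
from typing import Dict, Iterable, List


def missing_required(record: Dict[str, object], required: Iterable[str]) -> List[str]:
    """Return the required property names that are absent from the record."""
    return [name for name in required if name not in record]


def non_literal_properties(record: Dict[str, object]) -> List[str]:
    """Return the property names whose value is not a literal string."""
    return [name for name, value in record.items() if not isinstance(value, str)]


# Hypothetical required property list for an imagined application schema.
REQUIRED = ["dwc:occurrenceID"]

record = {
    "dwc:scientificName": "Puma concolor",
    "dwc:basisOfRecord": "PreservedSpecimen",
}

print(missing_required(record, REQUIRED))
print(non_literal_properties(record))
```

Here the record consists only of properties with string values, and the check reports any required property that the record lacks.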
|
||||
|
||||
### 2.4 Properties and values
|
||||
|
||||
Darwin Core follows the guidelines for expressing [Dublin Core metadata using XML](http://dublincore.org/documents/dc-xml/) except in that Darwin Core implementors should encode `properties` as XML elements and `values` as the content of those elements instead of having each property contain a value representation and its associated value. The name of the XML element should be an XML qualified name (QName), which associates the value given in the `Term name` attribute in the [Darwin Core Terms](../../terms/) recommendation with the appropriate namespace name. For example, use:
|
||||
Darwin Core follows the guidelines for expressing [Dublin Core metadata using XML](http://dublincore.org/documents/dc-xml/) except in that Darwin Core implementors MUST encode `properties` as XML elements and `values` as the content of those elements instead of having each property contain a value representation and its associated value. The name of the XML element SHOULD be an XML qualified name (QName), which associates the value given in the `Term name` attribute in the [Darwin Core Terms](../../terms/) recommendation with the appropriate namespace name. For example, use:
|
||||
|
||||
```xml
|
||||
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
|
||||
|
@ -88,13 +92,13 @@ rather than:
|
|||
|
||||
### 2.5 Null values
|
||||
|
||||
Elements for which the value is null should be omitted from the document or explicitly coded using the attribute `xsi:nil="true"`.
|
||||
Elements for which the value is null SHOULD be omitted from the document or OPTIONALLY be explicitly coded using the attribute `xsi:nil="true"`.
|
||||
|
||||
```xml
|
||||
<dwc:locality xsi:nil="true"/>
|
||||
```
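
A writer that follows the null-value guidance in this section might look like the sketch below, which uses Python's standard `xml.etree.ElementTree` module. The `build_record` function, its `explicit_nil` flag, and the `record` container element are hypothetical names for illustration. Treating an empty string in the input as null is an assumption of this sketch, made so that an element with no content is never emitted.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

DWC_NS = "http://rs.tdwg.org/dwc/terms/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("dwc", DWC_NS)
ET.register_namespace("xsi", XSI_NS)


def build_record(values: Dict[str, Optional[str]], explicit_nil: bool = False) -> ET.Element:
    """Build a flat record, omitting null values or marking them with xsi:nil."""
    record = ET.Element("record")  # hypothetical container element
    for term, value in values.items():
        # Assumption: None and "" are both treated as null in this sketch.
        if value is None or value == "":
            if explicit_nil:
                element = ET.SubElement(record, f"{{{DWC_NS}}}{term}")
                element.set(f"{{{XSI_NS}}}nil", "true")
            continue  # never emit an element that merely has no content
        ET.SubElement(record, f"{{{DWC_NS}}}{term}").text = value
    return record


data = {"locality": None, "country": "Chile"}
omitted = ET.tostring(build_record(data), encoding="unicode")
nilled = ET.tostring(build_record(data, explicit_nil=True), encoding="unicode")
print(omitted)
print(nilled)
```

With the default settings the null `locality` is simply left out; with `explicit_nil=True` it is written as an element carrying `xsi:nil="true"`.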
|
||||
|
||||
Do not use an empty string - an element with no content:
|
||||
Implementors MUST NOT use an empty string - an element with no content:
|
||||
|
||||
```xml
|
||||
<dwc:locality></dwc:locality>
|
||||
|
@ -102,7 +106,7 @@ Do not use an empty string - an element with no content:
|
|||
|
||||
### 2.6 Simple Darwin Core
|
||||
|
||||
[Simple Darwin Core](tdwg_dwc_simple.xsd) most closely models the "flat" nature of many data sets. It is a ready-made schema for sharing information with no structure beyond properties of a _record_ (equivalent to fields in a table, or columns in a spreadsheet). It is meant to accommodate all properties except those that require further structure to be meaningful (auxiliary terms in the classes [ResourceRelationship](http://rs.tdwg.org/dwc/terms/ResourceRelationship) and [MeasurementOrFact](http://rs.tdwg.org/dwc/terms/MeasurementOrFact)). The schema has no required terms and no term is repeated within a given _record_. Refer to [Simple Darwin Core](../simple/) for the rationale behind this schema.
|
||||
[Simple Darwin Core](tdwg_dwc_simple.xsd) most closely models the "flat" nature of many data sets. It is a ready-made schema for sharing information with no structure beyond properties of a _record_ (equivalent to fields in a table, or columns in a spreadsheet). It is meant to accommodate all properties except those that require further structure to be meaningful (auxiliary terms in the classes [ResourceRelationship](http://rs.tdwg.org/dwc/terms/ResourceRelationship) and [MeasurementOrFact](http://rs.tdwg.org/dwc/terms/MeasurementOrFact)). The schema has no required terms and terms SHOULD NOT be repeated within a given _record_. Refer to [Simple Darwin Core](../simple/) for the rationale behind this schema.
|
||||
|
||||
The term [`dcterms:type`](http://rs.tdwg.org/dwc/terms/dcterms:type) (which is controlled by the [Dublin Core Type Vocabulary](http://dublincore.org/documents/dcmi-type-vocabulary/)), gives the basic category of object (`PhysicalObject`, `StillImage`, `MovingImage`, `Sound`, `Text`) the record is about. The term [`basisOfRecord`](http://rs.tdwg.org/dwc/terms/basisOfRecord), which has a controlled vocabulary distinct from that of `dcterms:type`, shows the name of the Darwin Core class (e.g., [`LivingSpecimen`](http://rs.tdwg.org/dwc/terms/LivingSpecimen), [`PreservedSpecimen`](http://rs.tdwg.org/dwc/terms/PreservedSpecimen), [`FossilSpecimen`](http://rs.tdwg.org/dwc/terms/FossilSpecimen), [`HumanObservation`](http://rs.tdwg.org/dwc/terms/HumanObservation), [`MachineObservation`](http://rs.tdwg.org/dwc/terms/MachineObservation), [`Taxon`](http://rs.tdwg.org/dwc/terms/Taxon)) the record is about.
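
Because a flat record should not repeat a term, a consumer may want to flag repeats before processing. The following is a minimal sketch using Python's standard library. The `record` container element, the sample values, and the `repeated_terms` function are hypothetical and are not part of the Simple Darwin Core schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List

# Hypothetical flat record; dwc:country is repeated on purpose.
SAMPLE = """<record xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
        xmlns:dcterms="http://purl.org/dc/terms/">
  <dcterms:type>PhysicalObject</dcterms:type>
  <dwc:basisOfRecord>PreservedSpecimen</dwc:basisOfRecord>
  <dwc:country>Chile</dwc:country>
  <dwc:country>Argentina</dwc:country>
</record>"""


def repeated_terms(record: ET.Element) -> List[str]:
    """Return the qualified names of child elements that occur more than once."""
    counts = Counter(child.tag for child in record)
    return sorted(tag for tag, count in counts.items() if count > 1)


record = ET.fromstring(SAMPLE)
repeated = repeated_terms(record)
print(repeated)
```

ElementTree reports each tag in `{namespace}local-name` form, so the comparison is made on the full qualified name and not on the prefix chosen in the document.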
|
||||
|
||||
|
@ -142,7 +146,7 @@ Many Darwin Core terms (`properties`) are defined as being associated with anoth
|
|||
- [Terms XML Schema](tdwg_dwcterms.xsd) - property term definitions as typed global elements and named groups for all terms for a given class to be referenced. The schema makes use of substitution groups `anyClass`, `anyProperty`, `anyIdentifier` and `anyXYZTerm` for each class, e.g. `anyTaxonTerm`. This is the schema upon which the [Simple Darwin Core XML Schema](tdwg_dwc_simple.xsd) is based.
|
||||
- [Class Terms XML Schema](tdwg_dwc_class_terms.xsd) - class term definitions as typed global elements with subelements referencing all corresponding property terms via their substitution group.
|
||||
|
||||
It is encouraged to use classes in a normalized way to avoid deep nesting. A [Darwin Core Tools and Applications page](https://github.com/tdwg/dwc-documentation/blob/master/documentation/resources.md) has been created as an index to example schemas for the purpose of community discussions and development. An [XML schema](tdwg_dwc_classes.xsd) is provided to freely mix any Darwin Core Class in a global list and allow them to reference each other using the respective class identifier terms.
|
||||
It is RECOMMENDED to use classes in a normalized way to avoid deep nesting. A [Darwin Core Tools and Applications page](https://github.com/tdwg/dwc-documentation/blob/master/documentation/resources.md) has been created as an index to example schemas for the purpose of community discussions and development. An [XML schema](tdwg_dwc_classes.xsd) is provided to freely mix any Darwin Core Class in a global list and allow them to reference each other using the respective class identifier terms.
|
||||
|
||||
#### 2.7.1 Normalized classes examples (non-normative)
|
||||
|
||||
|
|