Simple Darwin Core
Audience: This document is targeted toward those who want to share biodiversity information using the simplest methods and structure - the Simple Darwin Core. It explains the uses and limitations of this structure and how to expand upon it.
There is a difference between having data in a field and requiring that field to have a value from among a legal set of values. The Darwin Core is simple in that it has minimal restrictions on the contents of fields. The term comments give recommendations about the use of controlled vocabularies and how to structure content wherever appropriate. Data contributors are encouraged to follow these recommendations as well as possible. You might argue that having no restrictions will promote "dirty" data (data of low quality or dubious value). Consider the simple axiom "It's not what you have, but what you do with it that matters." If data restrictions were in place at the fundamental level, then a record having any non-compliant data in any of its fields could not be shared via the standard. Not only would there be a dearth of shared data in that case (or an unused standard), but also there would be no way to use the standard to build shared data cleaning tools to actually improve the situation, nor to use data services to look up alternative representations (language translations, for example) to serve a broader audience. The rest is up to how the records will be used - in other words, it is up to applications to enforce further restrictions if appropriate, and it is up to the stakeholders of those applications to decide what the restrictions will be for the purpose the application is trying to serve.
What you need to do as a contributor of data via the Simple Darwin Core depends on the requirements of the ones who are going to consume those data. For example, if you have a collaborator who wants to share data via the Simple Darwin Core, then it may be sufficient to create a spreadsheet that contains column headers matching as many of the Darwin Core term names as you are both interested in sharing - just to be sure you both understand the meaning of the fields you share, and therefore hopefully something about their content. You might create a table in a database using the Simple Darwin Core as a model (if it met all of your needs), and then connect that database with services for sharing via the web. You might use that same database (or spreadsheet) to export a comma-separated value (CSV) file for upload into a hosted service that could serve the data on your behalf. Or you might use that same file to upload into a service that would allow you to add value (such as a georeference) or quality (with a data cleaning tool), or to see your data in the context of other shared data.
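As a minimal sketch of the spreadsheet or CSV route described above, the following Python snippet writes one record to CSV text whose column headers are Darwin Core term names. The choice of columns, the `FIELDNAMES` list, and the `to_csv` helper are assumptions made for illustration; the values come from the Ctenomys sociabilis record used elsewhere in this document.

```python
import csv
import io

# Column headers are Darwin Core term names. Which terms to include is up to
# the collaborators; this particular selection is only an illustration.
FIELDNAMES = ["scientificName", "kingdom", "family", "genus",
              "specificEpithet", "taxonRank"]

records = [
    {
        "scientificName": "Ctenomys sociabilis",
        "kingdom": "Animalia",
        "family": "Ctenomyidae",
        "genus": "Ctenomys",
        "specificEpithet": "sociabilis",
        "taxonRank": "species",
    },
]


def to_csv(rows, fieldnames=FIELDNAMES):
    """Return the rows as CSV text with one header line of term names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(records)
print(csv_text)
```

Because the headers match term names, a collaborator or a hosted service can interpret each column from the term definitions rather than from a private agreement.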
<?xml version="1.0" encoding="UTF-8"?>
<SimpleDarwinRecordSet
    xmlns="http://rs.tdwg.org/dwc/xsd/simpledarwincore/"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://rs.tdwg.org/dwc/xsd/simpledarwincore/ http://rs.tdwg.org/dwc/xsd/tdwg_dwc_simple.xsd">
  <SimpleDarwinRecord>
    <dc:modified>2006-05-04T18:13:51.0Z</dc:modified>
    <dc:language>en</dc:language>
    <dwc:basisOfRecord>Taxon</dwc:basisOfRecord>
    <dwc:scientificNameID>http://research.calacademy.org/research/ichthyology/catalog/fishcatget.asp?spid=53548</dwc:scientificNameID>
    <dwc:acceptedNameUsageID>http://research.calacademy.org/research/ichthyology/catalog/fishcatget.asp?spid=22010</dwc:acceptedNameUsageID>
    <dwc:originalNameUsageID>http://research.calacademy.org/research/ichthyology/catalog/fishcatget.asp?spid=53548</dwc:originalNameUsageID>
    <dwc:nameAccordingToID>http://research.calacademy.org/research/ichthyology/catalog/getref.asp?id=22764</dwc:nameAccordingToID>
    <dwc:namePublishedInID>http://research.calacademy.org/research/ichthyology/catalog/getref.asp?id=671</dwc:namePublishedInID>
    <dwc:scientificName>Centropyge flavicauda Fraser-Brunner 1933</dwc:scientificName>
    <dwc:acceptedNameUsage>Centropyge fisheri (Snyder 1904)</dwc:acceptedNameUsage>
    <dwc:parentNameUsage>Centropyge Kaup, 1860</dwc:parentNameUsage>
    <dwc:originalNameUsage>Centropyge flavicauda Fraser-Brunner 1933</dwc:originalNameUsage>
    <dwc:nameAccordingTo>Allen, G.R. 1980. Butterfly and angelfishes of the world. Volume II. Mergus Publishers. Pp. 149-352.</dwc:nameAccordingTo>
    <dwc:namePublishedIn>Fraser-Brunner, A. 1933. A revision of the chaetodont fishes of the subfamily Pomacanthinae. Proceedings of the General Meetings for Scientific Business of the Zoological Society of London 1933 (pt 3, no.30): 543-599, Pl. 1.</dwc:namePublishedIn>
    <dwc:higherClassification>Animalia;Chordata;Vertebrata;Osteichthyes;Actinopterygii;Neopterygii;Teleostei;Acanthopterygii;Perciformes; Percoidei;Pomacanthidae;Centropyge</dwc:higherClassification>
    <dwc:kingdom>Animalia</dwc:kingdom>
    <dwc:phylum>Chordata</dwc:phylum>
    <dwc:class>Osteichthyes</dwc:class>
    <dwc:order>Perciformes</dwc:order>
    <dwc:family>Pomacanthidae</dwc:family>
    <dwc:genus>Centropyge</dwc:genus>
    <dwc:specificEpithet>flavicauda</dwc:specificEpithet>
    <dwc:scientificNameAuthorship>Fraser-Brunner 1933</dwc:scientificNameAuthorship>
    <dwc:taxonRank>species</dwc:taxonRank>
    <dwc:nomenclaturalCode>ICZN</dwc:nomenclaturalCode>
    <dwc:taxonomicStatus>accepted</dwc:taxonomicStatus>
  </SimpleDarwinRecord>
</SimpleDarwinRecordSet>
The SimpleDarwinRecord acts as a Class in implementation, because all of the terms are properties of it. The Simple Darwin Core schema has just one other level of structure, the SimpleDarwinRecordSet, which is a grouping of one or more SimpleDarwinRecords. The SimpleDarwinRecordSet acts as a Class to define a data set during implementation.
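The two-level structure described above (a record set containing records, each record holding terms as flat properties) can be sketched by reading a record set with Python's standard `xml.etree.ElementTree` module. The shortened XML below reuses the namespaces from the example above; the helper names `local_name` and `read_record_set` are hypothetical, and dropping the namespace from each term name is a simplification that would merge any `dc:` and `dwc:` terms that happen to share a local name.

```python
import xml.etree.ElementTree as ET

# Default namespace of the Simple Darwin Core XML schema, as in the example.
SDC = "http://rs.tdwg.org/dwc/xsd/simpledarwincore/"

# A shortened record set for illustration (not a complete record).
XML_TEXT = """<SimpleDarwinRecordSet
    xmlns="http://rs.tdwg.org/dwc/xsd/simpledarwincore/"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <SimpleDarwinRecord>
    <dc:language>en</dc:language>
    <dwc:basisOfRecord>Taxon</dwc:basisOfRecord>
    <dwc:genus>Centropyge</dwc:genus>
    <dwc:specificEpithet>flavicauda</dwc:specificEpithet>
  </SimpleDarwinRecord>
</SimpleDarwinRecordSet>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_record_set(xml_text):
    """Return one flat dict of term name -> value per SimpleDarwinRecord."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall(f"{{{SDC}}}SimpleDarwinRecord"):
        # Every child element of a record is a term; there is no deeper nesting.
        records.append({local_name(term.tag): term.text for term in record})
    return records


records = read_record_set(XML_TEXT)
print(records)
```

Each record maps directly onto a flat dictionary, which is the sense in which the SimpleDarwinRecord behaves like a class whose properties are the terms.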
One way would be to try to "overload" existing terms by using them to hold information other than what was intended based on the definition of the terms. Please don't do this. If an existing term has close to the same meaning as one you want to use, but just doesn't quite fit because of the way the definition is worded, it would be better to request an amendment to the term definition so that it will be clear for your community how to use it. You can request such a change by submitting an issue in the Darwin Core Project [DWC-PROJECT].
Another way to get more out of the Darwin Core without adding a term is to "payload" the dynamicProperties term with structured content, as shown in the example below, using JavaScript Object Notation (JSON). This is perfectly legal, since it doesn't compromise the meaning of the term. One weakness of payloading data in this way is that the content lacks stable or well-defined semantics. Also, it is highly recommended to flatten the content into a single string with no non-printing characters (such as line feeds) to facilitate use in the widest variety of data sharing contexts. Still, this might be a reasonable way to at least allow you to share all of your data, even if people might have problems using it reliably.
<?xml version="1.0" encoding="UTF-8"?>
<SimpleDarwinRecordSet
    xmlns="http://rs.tdwg.org/dwc/xsd/simpledarwincore/"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://rs.tdwg.org/dwc/xsd/simpledarwincore/ http://rs.tdwg.org/dwc/xsd/tdwg_dwc_simple.xsd">
  <SimpleDarwinRecord>
    <dc:modified>2009-02-12T12:43:31</dc:modified>
    <dc:language>en</dc:language>
    <dwc:basisOfRecord>Taxon</dwc:basisOfRecord>
    <dwc:scientificName>Ctenomys sociabilis</dwc:scientificName>
    <dwc:acceptedNameUsage>Ctenomys sociabilis Pearson and Christie, 1985</dwc:acceptedNameUsage>
    <dwc:parentNameUsage>Ctenomys Blainville, 1826</dwc:parentNameUsage>
    <dwc:higherClassification>Animalia; Chordata; Vertebrata; Mammalia; Theria; Eutheria; Rodentia; Hystricognatha; Hystricognathi; Ctenomyidae; Ctenomyini; Ctenomys</dwc:higherClassification>
    <dwc:kingdom>Animalia</dwc:kingdom>
    <dwc:phylum>Chordata</dwc:phylum>
    <dwc:class>Mammalia</dwc:class>
    <dwc:order>Rodentia</dwc:order>
    <dwc:family>Ctenomyidae</dwc:family>
    <dwc:genus>Ctenomys</dwc:genus>
    <dwc:specificEpithet>sociabilis</dwc:specificEpithet>
    <dwc:taxonRank>species</dwc:taxonRank>
    <dwc:scientificNameAuthorship>Pearson and Christie, 1985</dwc:scientificNameAuthorship>
    <dwc:nomenclaturalCode>ICZN</dwc:nomenclaturalCode>
    <dwc:namePublishedIn>Pearson O. P., and M. I. Christie. 1985. Historia Natural, 5(37):388</dwc:namePublishedIn>
    <dwc:taxonomicStatus>valid</dwc:taxonomicStatus>
    <dwc:dynamicProperties>{"iucnStatus":"vulnerable", "distribution":"Neuquén, Argentina"}</dwc:dynamicProperties>
  </SimpleDarwinRecord>
</SimpleDarwinRecordSet>
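One possible way to produce the flattened, single-line string recommended for dynamicProperties is sketched below in Python. The `flatten_properties` helper is hypothetical, and the property names `iucnStatus` and `distribution` are simply those from the example above; they carry no agreed semantics.

```python
import json


def flatten_properties(properties):
    """Serialize a dict to a single-line JSON string for dynamicProperties."""
    # json.dumps emits no line breaks unless 'indent' is set, and it escapes
    # ASCII control characters inside values (a line feed becomes the two
    # characters backslash and n). ensure_ascii=False keeps characters such
    # as "é" readable instead of turning them into \u escapes.
    text = json.dumps(properties, ensure_ascii=False)
    # Guard against any remaining non-printing characters, for example
    # Unicode line separators, which json.dumps leaves untouched here.
    if not text.isprintable():
        raise ValueError("payload contains non-printing characters")
    return text


payload = flatten_properties(
    {"iucnStatus": "vulnerable", "distribution": "Neuquén, Argentina"}
)
# A consumer that knows the content is JSON can recover the structure.
restored = json.loads(payload)
print(payload)
```

When the string is written into an XML document, characters such as `&` and `<` still need the usual XML escaping; that step is outside this sketch.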
If you were using just CSV text files to exchange information, then you might be tempted to just add the new fields to the files. This approach suffers most of the same problems as payloading - no one aside from those with whom you communicated would know what those new fields were or how to use them. Sharing in this way via XML would be an even bigger problem, because the Simple Darwin Core XML Schema [SIMPLEXMLSCHEMA] defines the terms that it supports and the new fields would not correspond with any terms understood by the schema. In other words, the XML with your fields in it would not be a valid Simple Darwin Core XML document.
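A recipient of such a file could detect the added fields with a check along the following lines. This is only a sketch: `KNOWN_TERMS` is an illustrative subset of term names taken from the examples in this document, not the complete Darwin Core term list, and `unrecognized_columns` is a hypothetical helper.

```python
# Illustrative subset of Darwin Core term names (NOT the full term list).
KNOWN_TERMS = {
    "basisOfRecord", "scientificName", "kingdom", "phylum", "class",
    "order", "family", "genus", "specificEpithet", "taxonRank",
    "dynamicProperties",
}


def unrecognized_columns(header, known_terms=KNOWN_TERMS):
    """Return the column names that do not match any known term name."""
    return [name for name in header if name not in known_terms]


# "iucnStatus" stands in for a field someone added on their own.
header = ["scientificName", "family", "iucnStatus"]
extra = unrecognized_columns(header)
print(extra)
```

The check can flag the unknown column, but it cannot say what the column means, which is the problem described above.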
So, if you really need to extend the capabilities of Darwin Core, the best first step is to follow the standards process to add the terms you need. The mechanisms for pursuing this are explained in the Darwin Core Namespace Policy [NAMESPACEPOLICY]. The process will help to assure that the new terms are well conceived, that they don't conflict with existing terms, and that they are properly defined in the broader context of biological diversity information.
For cases where rich data require rich (non-simple) structure, the Simple Darwin Core alone is not suitable. When sharing information via fielded text [FIELDEDTEXT], the solution is to use the Simple Darwin Core as a core record with one or more associated extensions for the additional information. See the Darwin Core Text Guide [TEXTGUIDE] for an explanation and examples.
When sharing information via XML [XML], the solution is to use a richer structure such as the Access to Biological Collections Data schema [ABCD], the Generic Darwin Core [GENERICXMLSCHEMA], or another schema built from the Darwin Core terms to suit the use of the data in a particular context. See the Darwin Core XML Guide [XMLGUIDE] for examples and references to model schemas.
Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 4.0 International License.
Copyright 2011-2014 Biodiversity Information Standards (TDWG)