diff --git a/docs/text/index.md b/docs/text/index.md
index c8de673..a55e7a8 100644
--- a/docs/text/index.md
+++ b/docs/text/index.md
@@ -120,13 +120,13 @@ Element | Description
 Attribute | Description | Required | Default
 --- | --- | --- | ---
-`rowType` | A Unified Resource Identifier (URI) for the term identifying the class of data represented by each row, for example, http://rs.tdwg.org/dwc/terms/Occurrence for Occurrence records or http://rs.tdwg.org/dwc/terms/Taxon for Taxon records. Additional classes may be referenced by URI and defined outside the Darwin Core specification. The row type is required. For convenience the URIs for classes defined by the Darwin Core are listed below:
Occurrence
http://rs.tdwg.org/dwc/terms/Occurrence
Event
http://rs.tdwg.org/dwc/terms/Event
Location
http://purl.org/dc/terms/Location
GeologicalContext
http://purl.org/dc/terms/GeologicalContext
Identification
http://rs.tdwg.org/dwc/terms/Identification
Taxon
http://rs.tdwg.org/dwc/terms/Taxon
ResourceRelationship
http://rs.tdwg.org/dwc/terms/ResourceRelationship
MeasurementOrFact
http://rs.tdwg.org/dwc/terms/MeasurementOrFact
| yes |
+`rowType` | A Unified Resource Identifier (URI) for the term identifying the class of data represented by each row, for example, <http://rs.tdwg.org/dwc/terms/Occurrence> for Occurrence records or <http://rs.tdwg.org/dwc/terms/Taxon> for Taxon records. Additional classes may be referenced by URI and defined outside the Darwin Core specification. The row type is required. For convenience the URIs for classes defined by the Darwin Core are: `Occurrence`: <http://rs.tdwg.org/dwc/terms/Occurrence>, `Event`: <http://rs.tdwg.org/dwc/terms/Event>, `Location`: <http://purl.org/dc/terms/Location>, `GeologicalContext`: <http://purl.org/dc/terms/GeologicalContext>, `Identification`: <http://rs.tdwg.org/dwc/terms/Identification>, `Taxon`: <http://rs.tdwg.org/dwc/terms/Taxon>, `ResourceRelationship`: <http://rs.tdwg.org/dwc/terms/ResourceRelationship>, `MeasurementOrFact`: <http://rs.tdwg.org/dwc/terms/MeasurementOrFact> | yes |
 `fieldsTerminatedBy` | Specifies the delimiter between fields. Typical values might be `,` or `\t` for CSV or Tab files respectively. | no | `,`
 `linesTerminatedBy` | Specifies the row separator character. | no | `\n`
 `fieldsEnclosedBy` | Specifies the character used to enclose (mark the start and end of) each field. CSV files frequently use the double quote character (`"`), but the default is no enclosing character. Note that a comma separated value file that has commas within the content of any field must have an enclosing character. | no | `"`
-`encoding` | Specifies the [character encoding](http://en.wikipedia.org/wiki/Character_encoding) for the data file. The encoding is extremely important, but often ignored. The most frequently used encodings are:
UTF-8
8-bit Unicode Transformation Format.
UTF-16
16-bit Unicode Transformation Format.
ISO-8859-1
Commonly known as Latin-1 and a common default on systems configured for a single western European language.
Windows-1252
Commonly known as WinLatin and a common default of legacy versions of Microsoft Windows based operating systems.
| no | `UTF-8`
+`encoding` | Specifies the [character encoding](http://en.wikipedia.org/wiki/Character_encoding) for the data file. The encoding is extremely important, but often ignored. The most frequently used encodings are: `UTF-8`: 8-bit Unicode Transformation Format, `UTF-16`: 16-bit Unicode Transformation Format, `ISO-8859-1`: commonly known as "Latin-1" and a common default on systems configured for a single western European language, `Windows-1252`: commonly known as "WinLatin" and a common default of legacy versions of Microsoft Windows based operating systems. | no | `UTF-8`
 `ignoreHeaderLines` | Specifies the number lines to ignore from the beginning of the file. This can be used to ignore files with column headings or preamble comments for example. | no | `0`
-`dateFormat` | When verbatim dates are consistent in format, this field can be used to indicate the format represented. It is recommended to use the date, dateTime and time for field formats wherever possible, but where verbatim dates are required, a format may be specified here. This should be considered a 'hint' for consumers. It is recommended that consumers support the minimum combinations of `DD` `MM` and `YYYY` with the separators `/` and `-`. Examples:
DDMMYYYY
For dates of the form 21121978
DD-MM-YYYY
For dates of the form 21-12-1978
MMDDYYYY
For dates of the form 12211978
MM-DD-YYYY
For dates of the form 12-21-1978
YYYYMMDD
For dates of the form 19781221
| no | `YYYY-MM-DD`
+`dateFormat` | When verbatim dates are consistent in format, this field can be used to indicate the format represented. It is recommended to use the date, dateTime and time for field formats wherever possible, but where verbatim dates are required, a format may be specified here. This should be considered a 'hint' for consumers. It is recommended that consumers support the minimum combinations of `DD` `MM` and `YYYY` with the separators `/` and `-`. Examples: `DDMMYYYY`: for dates of the form 21121978, `DD-MM-YYYY`: for dates of the form 21-12-1978, `MMDDYYYY`: for dates of the form 12211978, `MM-DD-YYYY`: for dates of the form 12-21-1978, `YYYYMMDD`: for dates of the form 19781221. | no | `YYYY-MM-DD`
 #### 2.2.2 Elements
@@ -145,7 +145,7 @@ The files element must contain one or more elements, each defining wh
 Element | Description
 --- | ---
-`` | Specifies the location of the file being described, which may take either of the following forms:
  • A web accessible URL such as `http://www.gbif.org/data/specimen.csv` or `ftp://ftp.gbif.org/tim/specimen.txt`.
  • A filepath relative to the location of the metafile such as `specimen.txt`, `./specimen.txt`, `data/specimen.txt`.
+`` | Specifies the location of the file being described, which may take either of the following forms: 1) a web accessible URL such as `http://www.gbif.org/data/specimen.csv` or `ftp://ftp.gbif.org/tim/specimen.txt`, 2) a filepath relative to the location of the metafile such as `specimen.txt`, `./specimen.txt`, `data/specimen.txt`.
 ### 2.4 The `` element
@@ -158,13 +158,13 @@ Attribute | Description | Required | Default
 `index` | Specifies the position of the column in the row. The first column has an index of 0, the second column 1, etc. If no column index is specified, then the term and the default may be used to define a constant value for all rows. | no |
 `term` | A Unified Resource Identifier (URI) for the term represented by this field. For example, a field containing the scientific name would have `term="http://rs.tdwg.org/dwc/terms/scientificName"`. Terms outside of the Darwin Core specification may be used, such as those from the Dublin Core Metadata Initative, for example, `dcterms:modified` would be `term="http://purl.org/dc/terms/modified"`. | yes |
 `default` | Specifies value to use if one is not supplied for the field in a given row. If no index is supplied, the default can be used to define a constant for all rows for a field that is not in the data file. | no |
-`vocabulary` | A Unified Resource Identifier (URI) for a vocabulary that the source values for this field are based on. The URI ideally should resolve to some machine readable definition like SKOS, RDF or at least some simple text or html file often found for ISO or RFC standards. For example http://rs.gbif.org/vocabulary/gbif/nomenclatural_code.xml, http://www.ietf.org/rfc/rfc3066.txt or http://www.iso.org/iso/list-en1-semic-3.txt. | no |
+`vocabulary` | A Unified Resource Identifier (URI) for a vocabulary that the source values for this field are based on. The URI ideally should resolve to some machine readable definition like SKOS, RDF or at least some simple text or html file often found for ISO or RFC standards. For example <http://rs.gbif.org/vocabulary/gbif/nomenclatural_code.xml>, <http://www.ietf.org/rfc/rfc3066.txt> or <http://www.iso.org/iso/list-en1-semic-3.txt>. | no |
 ## 3 Implementation guide
 ### 3.1 Extension example (non-normative)
-The following example illustrates the use of extensions. In this example there are three files in the archive, all of which are located in the same directory as the metafile. The whales.txt file acts as a core file of Taxon records. The whales.txt file is extended by two other files, types.txt and distribution.txt. The types.txt file contains records of a type specified in an external definition at http://http://rs.gbif.org/terms/1.0/Types and consists of Dublin Core and Darwin Core terms, while the distribution.txt file contains records of a type specified at http://http://rs.gbif.org/terms/1.0/Distribution and consists of Darwin Core terms plus an additional term for threatStatus. Both extension files are related to the core file by the taxonNameID fields. Presumably, this archive contains information about whale species, type specimen records for those species, and lists of countries and the threat status for those species.
+The following example illustrates the use of extensions. In this example there are three files in the archive, all of which are located in the same directory as the metafile. The whales.txt file acts as a core file of Taxon records. The whales.txt file is extended by two other files, types.txt and distribution.txt. The types.txt file contains records of a type specified in an external definition at <http://rs.gbif.org/terms/1.0/Types> and consists of Dublin Core and Darwin Core terms, while the distribution.txt file contains records of a type specified at <http://rs.gbif.org/terms/1.0/Distribution> and consists of Darwin Core terms plus an additional term for threatStatus. Both extension files are related to the core file by the taxonNameID fields. Presumably, this archive contains information about whale species, type specimen records for those species, and lists of countries and the threat status for those species.
 ![Extension](extension.png)