head 1.40;
access;
symbols;
locks; strict;
comment @# @;
1.40
date 2007.06.09.18.07.55; author WalterBerendsohn; state Exp;
branches;
next 1.39;
1.39
date 2007.04.13.10.35.46; author RicardoPereira; state Exp;
branches;
next 1.38;
1.38
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.37;
1.37
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.36;
1.36
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.35;
1.35
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.34;
1.34
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.33;
1.33
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.32;
1.32
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.31;
1.31
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.30;
1.30
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.29;
1.29
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.28;
1.28
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.27;
1.27
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.26;
1.26
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.25;
1.25
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.24;
1.24
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.23;
1.23
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.22;
1.22
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.21;
1.21
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.20;
1.20
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.19;
1.19
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.18;
1.18
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.17;
1.17
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.16;
1.16
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.15;
1.15
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.14;
1.14
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.13;
1.13
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.12;
1.12
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.11;
1.11
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.10;
1.10
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.9;
1.9
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.8;
1.8
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.7;
1.7
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.6;
1.6
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.5;
1.5
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.4;
1.4
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.3;
1.3
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.2;
1.2
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next 1.1;
1.1
date 2007.01.09.00.00.00; author MoinMoin; state Exp;
branches;
next ;
desc
@Initial revision
@
1.40
log
@none
@
text
@%META:TOPICINFO{author="WalterBerendsohn" date="1181412475" format="1.1" reprev="1.40" version="1.40"}%
---+ An Introduction to the ABCD Schema v2.0
This document provides an introduction to the proposed TDWG standard XML schema for Access to Biological Collection Data. The proposed standard itself is the XML schema, in its version 2.06, available on the World Wide Web under the URL
. http://www.bgbm.org/TDWG/CODATA/Schema/ABCD_2.06/ABCD_2.06.XSD.
This document is not part of that standard.
---++ Background
Biodiversity collections exist in different scientific sub-disciplines:
* Preserved collections, such as those in museums and herbaria
* Living collections, like botanical and zoological gardens, aquaria, seed banks, microbial strain and tissue collections
* Data collections, from surveys of objects in the field, such as observations, floristic and faunistic mapping and inventories
* DNA samples produced by molecular biologists, natural substance samples produced by pharmacists, etc.
Research conducted since the beginning of the 1990ies has revealed that all these collections have most of their attributes in common, although the terminology used to describe them may differ substantially.
These collections represent an immense knowledge base on global biodiversity. The objects contained in collections can be a physical resource of great value for research and industry. Preserved objects represent the historical perspective, providing a falsifiable source of information. For example, specimens of fish from a particular area can be examined to determine mercury levels at the time of collection. Associated field and research notes contain detailed data on the locality, time, and often the appearance of organisms.
It is estimated that between 2 and 3 billion objects exist in natural history collections alone. Currently, this knowledge base is largely under-utilized, because it is highly distributed, heterogeneous, and the complex scientific nature obstructs efficient information retrieval.
Databasing and networking is now seen as the key to unlocking the value of biological collections for science, government, education, the public, and businesses, operating in the environmental sector, including land management; in biotechnology or in biodiversity research. International collaboration on the standardization of information models and standard data used in collection databases can enhance the efficiency of this process.
---+++ Purpose
ABCD - Access to Biological Collections Data - Schema is a common data specification for biological collection units, including living and preserved specimens, along with field observations that did not produce voucher specimens. It is intended to support the exchange and integration of detailed primary collection and observation data.
All of the world's biological collections contain a number of data items including specimen specific (e.g. taxon, altitude, sex) and collection specific (e.g. holding institution) elements. The set of elements used varies from collection to collection. ABCD provides a reconciled set of element names and their definition for scientists and curators to use. It is not expected (or even possible) for any collection to use more than a fraction of the elements defined in the standard.
A design goal of the data specification was to be both comprehensive and general, to include a broad array of concepts that might be available in a collection database, but to mandate only the bare minimum of elements required to make the specification functional. ABCD deliberately does not cover taxonomic data, such as synonymy, other than the use of names in identifications. Likewise, taxon-related information, such as distribution range, indicator values, etc., is also not included. The elements and concepts that are used provide as much compatibility as is possible with other standards in the field of biological collection data, such as HISPID, Darwin Core, and others.
The data specification is cast as an XML schema.
---+++ Design principles
ABCD was designed with the following principles in mind:
1. Full coverage approach: ABCD is comprehensive and therefore complex. It explicitly aims to define the semantics of all elements, in order to:
* Provide a unified approach for the natural history collection community
* Accept detailed information, where available
* Develop a proto-ontology as a first step towards a collection ontology
2. Polymorphism: Variable atomisation allows provision of data in different degrees of detail and standardisation, in order to:
* Accept data from a wide variety of sources
* Enable data integration
3. (Almost) no internal referencing: A single-root document without relational structures that use IDs - to make processing easier and faster.
4. Extensible Slots: Extensions are not meant for individualised adaptations of the schema, but instead to allow:
* Fast community support in case of missing elements, before integration into a subsequent version
* Inclusion of third-party-schemas (or parts thereof), in order to prevent duplication of developments in other communities (e.g. geographical data)
5. Flexible containers: Element-element or element-attribute couples for category-value pairs allow freely defined and repeatable data fields (e.g., higher taxa, measurements, morphological features). In addition, there is often provision for free-text data where it is impractical to provide atomised data.
6. No recursive structures
7. Machine-readable annotations: Structured element annotations will permit their evaluation by program tools (e.g. a semantic search by the Configuration Assistant)
8. Language support: Language can be be made explicit for most text elements
9. Typing: The use of complex types and the deposition in a common type library allows type-sharing with other communities (e.g. Structure of Descriptive Data [SDD])
---+++ History
The ABCD content definition effort is based on data modelling and database development projects which were carried on throughout the 1990ies (see http://www.bgbm.org/TDWG/acc/Referenc.htm for some results). TDWG with its Subgroup on Biological Collection Data provided a forum for developers and efforts to standardise collection data. Some exchange standards developed by the community were spearheading developments, such as the ITF format for botanical garden records and the Australian HISPID standard for herbarium records. However, data standardisation became a pressing issue only when the Internet matured and global data access to distributed heterogeneous resources became possible.
Development of the ABCD content definition started after the 2000 meeting of the Taxonomic Databases Working Group (TDWG) in Frankfurt/Main, where the decision was made to specify both a protocol and a data structure to enable interoperability of the numerous heterogeneous biological collection databases then available. As a consequence, the TDWG/CODATA subgroup on Access to Biological Collection Data (ABCD) was established, with one sub-section working on search and retrieval protocols, and a second working on a specification for biological collection data (the content development group).
Protocol development resulted in a limited and non-hierarchical set of data elements, named the Darwin Core (!DwC, see http://www.tdwg.org/activities/darwincore/), as a workable specification to be used with the !DiGIR protocol near-term, whilst the ABCD content development resulted in a comprehensive and highly structured standard for data about objects in biological collections, which was in turn picked up by the developers of the !BioCASe protocol.
An early achievement of the working group had been to bring together existing networks on specimen information to discuss common access, namely !ENHSIN, !ITIS, !ITIS-CA, !REMIB, !Species Analyst, !speciesLink, and the Virtual Australian Herbarium. In addition, the TDWG subgroup was recognised as a CODATA Working Group for 2001/2002. The discussion that had been started during the TDWG meeting in Frankfurt (2000) was picked up in a sequence of workshops:
* First workshop, Santa Barbara (June 2001)
* Second workshop, Sydney (November 2001)
* Informal meeting, Sydney (March 2002)
* Editorial meeting, Singapore (December 2002)
* Third workshop, Indaiatuba, Brazil (October 2002)
* Fourth workshop, Oeiras, Portugal (October 2003)
* Fifth workshop, Tervuren, Belgium (July 2006)
The first workshop in Santa Barbara produced an XML DTD, using a combination of top-down conceptualisation (and organisation) and bottom-up use of existing relevant specifications, such as the TDWG endorsed standards, !HISPID and !ITF.
In preparation for the second workshop in Sydney, this DTD was transformed into an XML schema and extended by elements from the !BioCISE information model and the British !NBN/Recorder model. For part of this meeting, the option of using the Gathering Event rather than the Collection Unit as the root concept of the hierarchical data structure was discussed, since observations data is usually organized by place and time first and then by taxon.
Nevertheless, the decision was made to stay with the structure that uses Unit as the root concept for two reasons:
* The goal at that time was to achieve clarity, universality, completeness, and simplicity in the semantics of the standard
* Collection databases implemented as flat data structures (of which there are many) will not easily be able to export a hierarchical dataset with a normalized gathering event as the root concept and therefore will be restrained from participating in a federation based on this alternative.
By the time of the informal meeting in Sydney, March 2002, the European !BioCASE project had started. Its schema definition group (The Natural History Museum in London and the Botanic Garden and Botanical Museum Berlin-Dahlem) was to provide a collection-level schema (!BioCASE only) and a unit-level schema (!CODATA/TDWG and !BioCASE), so that this group was able to dedicate personnel resources to the schema definition process.
The priority was to develop a consensus about which elements should be included. The annotation tag was structured to hold metadata about each element and a schema-viewer, developed in Berlin, was established to allow XML non-specialists to browse the schema and view the annotations in a structured way. The documentation gathered here was later transferred to the ABCD Wiki site and used to annotate the individual "!ABCDConcepts".
The ENHSIN and BioCASE projects drove the process during 2001/2002, providing drafts that were discussed during TDWG and other meetings and which were exposed in a Request for Comment process. An editorial meeting sponsored by the Global Biodiversity Information Facility (GBIF) held in December 2002 in Singapore, led to the version that has been used in the first round of reference implementations (v. 1.2).
In 2002 the ABCD Schema was accepted by GBIF. A protocol supporting ABCD was provided by the !BioCASE reference implementation in 2003, and in October, GBIF decided to integrate the BioCASE network into the nascent GBIF network along with the DiGIR protocol and !DarwinCore. The primary difference between the two standards is that DiGIR handles only flat schemas, such as !DarwinCore, whereas the !BioCASe protocol can handle structured schemas, such as ABCD. !DarwinCore is ideal for resource discovery purposes in particular, while ABCD records hold the additional data that may be required by researchers once a selection has been made.
Recognising the importance of the developments, CODATA decided to raise the status of the group to the level of a CODATA Task Group for 2003/2004 (renewed for 2005/2006).
At the fourth Workshop in October 2003, a major point of discussion was the need for more guidance at the user-interface as well as at provider level. The very broad coverage of ABCD leaves it to the user to determine how to map their data. The structure looks too complicated for the average user and should thus be hidden from them. It will be the task of programmers to reassemble the different uses of the structure into a presentation layer that supports user requirements.
A reference portal implementation was constructed by the Paris BioCASE team, work that has been picked up by the SYNTHESYS project. The Berlin team has implemented preliminary interfaces as an intermediate measure. GBIF supported a project to produce a configuration assistant in a generic interface to map between database schemas and federation schemas such as ABCD, so that providers get recommendations on how to map elements, on preferred points for searches, what to do if an element is empty and so forth.
In September 2005, ABCD version 2.06 was ratified as a TDWG standard under the than valid TDWG procedures.
The 6th workshop in Tervuren, 2006, provided a charter for the subgroup and discussed the future of the standard (see http://www.tdwg.org/uploads/media/ABCDJuly2006Report_01.pdf ).
During the TDWG meeting in St. Louis in September 2006, a joint meeting with the !DarwinCore and Observations groups led to the formation of a new TDWG Interest Group for Observation and Specimen Records. DwC, Observations and ABCD now form Task Groups within the OSR Interest Group.
---+++ Future versions
Some further changes have become necessary and a new minor upgrade will be released in 2007 (ABCD version 2.06c, see http://www.tdwg.org/proceedings/article/view/62 for an explanation of the upgrade policy). This version will be put forth under the new procedural rules of TDWG. Because its inherent mechanisms for extensions allows for preliminary accommodation of new elements, we think that v. 2.06 can be kept stable and in use for a reasonable time period. We expect that the next major version will form part of a system of biodiversity data standards, with common modules across several TDWG standards, e.g. for metadata, images, agents, and bibliography.
-----
---++ Top Level Structure
The ABCD schema is highly structured in order to manage the large quantity of data that a record may contain.
The top level of the schema is arranged as follows:
. DATASETS
. DATASET
. - GUID AbcdIntroduction Contacts (technical and content) AbcdIntroduction Other providers - Metadata - Units (Observations and Specimens)
From this it can be seen that an XML document based on ABCD may contain records from several datasets, each of which is treated separately. Each dataset has a Globally Unique Identifier (GUID) along with information about who may be contacted for further details, for the content of the dataset and for technical information.
There are then two major groups, one holding metadata about the entire dataset and the other holding the actual data records.
The Metadata section holds information about an entire dataset and has the following structure:
. METADATA
. - Description - Icon URI - Scope (Geo-ecological and Taxonomic) - Version - Revision data (Creator, Contributors, Creation and Modification dates) - Owners - Intellectual Property Rights (IPR) statements
The second major section, called UNITS, holds all the records selected and exported from the original dataset, each one of which is a UNIT. This is by far the largest component of ABCD and has the following high-level structure:
. UNITS
. UNIT
. Here we can distinguish several areas. Most of these do not show up in the actual XML hierarchy, because ABCD 2.06 avoids using container elements that serve only to group items together: - Unit-level metadata - Record basis and Kind Of Unit - Identifications - Collection domain-specific data - Unit relationships (Associations and Assemblages) AbcdIntroduction Named collections and surveys - Gathering event and site characteristics - Measurements and Facts - Unit extension area
---++ The ABCD v2.06 Element Groups
For the purpose of this documentation, the data items (XML elements and attributes) are classified according to their content and arranged in groups. On the ABCD documentation Wiki (http://ww3.bgbm.org/abcddocs/), a list of the nearly 1200 concepts in ABCD is provided, with individual documentation pages for each of them that include the classification according to the groups and subgroups outlined below.
---+++ Metadata
The Metadata items record information such as who created the dataset, record, or linked object (e.g. an image), from what source and on which date, along with the Intellectual Property Rights (IPR) and other statements that govern the usage of the data. It also includes technical data items, such as "IDs" needed to access the data.
Items belonging to this group may change in future versions in order to harmonise with the suite of standards that are being developed through TDWG. Areas that are common to more than one standard, such as metadata, will be adjusted so that they are the same in each. The framework for this is currently under development as UBIF - the Unified Biosciences Information Framework.
The are five subgroups for metadata:
. __ Identifiers __
These are the names or codes that identify data objects or physical objects. Codes include record identifiers or keys, institution codes, accession numbers and collector's field numbers. Names include named collections, but for personal or corporate names see under Agents.
Identifiers include:
Globally unique identifiers (GUID) for datasets and for individual unit records, currently optional. Discussions underway at TDWG indicate that this may be an LSID (Life Science Identifier), a development from the bioinformatics domain, which would have the benefit of linking collection, observation and sequence data.
Apart from the GUID, each unit record contains four identifier elements, three of which are mandatory. These are:
* an identification code for the source institution
* an identification code for the data source that is unique within the source institution
* an identification code for the unit record within the data source
In the interim, a GUID can be synthesised from the hierarchy of these three mandatory elements. Optionally, if the unit ID is alphanumeric, the numeric part can be separately placed in the unit ID numeric element for sorting purposes. The Record Basis element provides an indication of what the unit record describes, such as a preserved specimen or a multimedia object. A short list of preferred terms is available for this and for several other elements. Using a term from such a list provides a degree of consistency that makes subset retrieval considerably more accurate. Preferred terms here include PreservedSpecimen, LivingSpecimen, FossilSpecimen, OtherSpecimen, HumanObservation, MachineObservation, DrawingOrPhotograph, and MultimediaObject. Note that these categories should also be used if the data were copied from a publication, a fact that can be indicated using the element Source Reference. The Kind of Unit element descibes the part(s) of organism or class of materials represented by the unit, such as whole organism, DNA, fruit and so forth. For consistency, terms should be chosen from the short list of preferred terms.
Further identifiers in the schema include Other providers (referencing an ID in the UDDI registry), Multimedia object ID (e.g. for images), Collector's field number, Observation unit identifier, Unit assemblage ID, Named collection or survey including the unit, and Specimen loan identifier.
. __ Record Basis and Kind of Unit __
The Record Basis element provides an indication of what the unit record describes, such as a preserved specimen or a multimedia object. A short list of preferred terms is available for this and for several other elements. Using a term from such a list provides a degree of consistency that makes subset retrieval considerably more accurate. Preferred terms here include PreservedSpecimen, LivingSpecimen, FossilSpecimen, OtherSpecimen, HumanObservation, MachineObservation, DrawingOrPhotograph, and MultimediaObject. Note that these categories should also be used if the data were copied from a publication, a fact that can be indicated using the element Source Reference.
The Kind of Unit element describes the part(s) of the organism or class of materials represented by the unit, such as whole organism, DNA, fruit, etc. For consistency, terms should be chosen from the short list of preferred terms.
. __ IPR, Versioning, Edit History and other Statements __
Intellectual Property Rights (IPR) play an increasingly important part in the use of shared data and it is important that full use is made of the capabilities of ABCD to record statements on copyright, licensing, terms of use, disclaimers, acknowledgements and citations. It is equally important that proper accreditation is given to data sources.
Versioning refers to the numbering and date of the dataset version, which may be used in citations and to determine the currency of the data. The creator of a dataset or record and the date of creation is recorded and never changes. Provision is also made to identify the date of the most recent edit and the name of the editor, again as an indicator of data currency.
. __ Language and Character Sets __
Many elements have an attribute that can indicate the language of the text that they hold. This is valuable for sorting and searching purposes. Data should be provided in either UTF-8 or UTF-16 encodings of Unicode, which are both valid for use with XML.
. __ UDDI Registry Items __
UDDI (Universal Description, Discovery, and Integration) is a platform-independent, XML-based directory that enable businesses worldwide to list themselves and their services on the Internet. The Global Biodiversity Information Facility currently uses a UDDI registry. The relevant ABCD elements are Technical Contacts and Content Contacts.
---+++ Identification Event
A unit may be identified by one or more identification events. The Identification Event has two main parts, being the identifications themselves and a free text identification history. For every individual identification event the data include the date; the method; references and verification details. The identifier may be a person or an organisation. A flag can be used to indicate a preferred identification where several events in the history of the specimen took place. Likewise, a negative identification can be flagged as such, in addition to one of the identifications that is used to indicate where a specimen is stored (a useful feature e.g. for mixed samples or for type specimens). If there are several identifications of a mixed sample, the individual role may be indicated (e.g. as parasite or host). The outcome of the identification event is an identification result.
---+++ Identification Result
. __ Non-taxonomic Result __
ABCD has a provision to use the schema for the identification of non-biological materials together with the taxonomic identification of an organism. This is used, inter alia, to describe a specimen consisting of a substratum and the organism, e.g. a certain rock type and a lichen crust on that rock. Currently there is only a single text element (Material identified) to accommodate this type of data. However, especially for collections from geo-sciences, ABCD will be expanded to include full cover of the "taxonomy" used in these fields. For the time being, a temporary Extension can be made using the respective element typed as xs:any.
. __ Taxonomic Result __
This is the structure provided for the taxon identified as the result of an identification event.
ABCD considers the classification of the taxon identified in higher taxa not to be within the domain of the collection schema. Nevertheless, for convenience a higher taxon element is included. In contrast to Darwin Core, ABCD handles higher taxa through a repeatable element pair, one for name and one for rank.
The "Full Scientific Name String" element is one of the few mandatory elements in ABCD (if a taxonomic identification is provided at all). This holds the concatenated scientific name, preferably formed in accordance with a named Code of Nomenclature. It should thus be a monomial, bionomial, or trinomial plus author(s) or author team(s) and, where relevant, the year. It could also hold the name of a cultivar or cultivar group, or of a hybrid formula.
If the name does not conform to a Code of Nomenclature, e.g. a common name, provision is made for it to be recorded as an informal name.
In addition to the Full Scientific Name element described above, a structure is provided for recording names in a fully atomised form, adapted to each of the four main naming conventions (Codes) - for botanical, zoological, bacterial and viral names. The structure is completed with elements for an identification qualifier, where doubts may exist about the accuracy of the identification, and an element for a name addendum such as "sensu lato".
Both, the Full scientific name and the atomised structure are also used for the typified name in the section of the schema treating nomenclatural type designations (see under Specimen Collections below).
---+++ Collection Domain-specific Items
Most of the data handled by ABCD are common to all the subject domains, both in collections and observations. However, there are some data that are very specific to certain domains, such as the morphotype of a lichen. This section of ABCD provides a place for such data so that specialists may easily identify which subsections are relevant to their data and which are not.
This section is also used to accommodate domain specific standard data, some of which may be characterised as legacy data but which is still provided or used in specialised networks.
ABCD can be extended into new domains by the creation of an additional domain-specific section. Temporary additions should make use of the Unit Extension feature.
. __ Observation Records __
Most data about observation records are under the Gathering group of elements, since observers work with place as priority rather than taxon. However, there is a place here for numbers or other registration marks which may be associated with an observation record as the equivalent of the Accession Number used with specimen collections.
Observation records indicating the absence of an organism from a site are accommodated by means of a record with a negative identification.
. __ Specimen Collections __
Data on ownership of the physical specimen (as opposed to the data record), including ownership history, acquisition and accession can be placed here, along with information on the type status of the specimen, preparation technique and details of any markings and labels.
The type section of the schema provides information on the status and kind of nomenclatural type, but also allows full documentation of the verification process of the type status.
In addition, each collection has a set of elements for holding data that are specific to the content of the collection. As mentioned under Collection Domain-specific Items above, ABCD version 2.06 provides containers for the following specialisms: Microbial Genetic Resources (a.k.a. Culture Collections), Mycological (including Lichenological) Collections, Herbaria, Botanic Gardens, Plant Genetic Resources, Zoological Collections and Palaeontological Collections.
---+++ Measurements and Facts
The main difference between measurements and facts is that measurements are numeric whilst facts are textual. These are treated generically, rather than providing an individual place for everything that could be measured, with one or two exceptions that are noted later. The atomised version of measurements/facts captures essentials such as what is being measured, by what method, using which units and so on, as well as the actual value or value range. A free-text alternative is available if the data are not easily available in atomised form.
Measurements and Facts appear at several places in ABCD:
. __Gathering-related Measurements and Facts__
These are the measurements or facts taken at the collection locality at the time of collection, such as water or air temperature, slope, weather conditions etc. Separate elements are available for Altitude, Depth and Height, due to the complex relationships between these. Biotype measurements or facts allow linking of all biotope-related measurement to the site description.
. __ Unit-related Measurements and Facts __
These relate directly to the Unit which is the subject of the record.
. __ Molecular Sequence Data __
A container is provided for sequence data, thus offering the ability to link sequence data back to the specimen from which the sequenced molecule was derived. Links to public repositories such as GeneBank as well as to unpublished material are accommodated.
. __ Stage, Age and Sex __
The final subgroup covers stages (such as egg, larval or adult), age and sex.
---+++ Multimedia and References
Pointers to additional material that relates to the unit may be placed in the Multimedia and References group. Elements are available for the URI of either a "raw" file or a rendered product, such as an HTML or JavaScript resource. The relationship between the resource and the unit in this record can be recorded in a context element. Further elements are available for recording technical data, especially for digital images. The subgroups are:
. __ Multimedia __
Photographs, diagrams, sound files and other types of electronic resources.
. __ Bibliographic References __ Literature can be referenced in several places in the schema. As mentioned before, the data for the entire unit record may have been extracted from the literature, but it is also possible to record instances in which the specimen was cited in the literature. The key and/or description used in an identification event may be referenced there, as could be an identification taken from literature. All measurements and facts can be related to a publication, including molecular sequences. Finally, the nomenclatural reference to the original description(s) of a taxon are recorded within the type designation section.
ABCD uses a very simple structure for bibliographic references, which may be changed to a more elaborate design at a later stage.
. __ Record URI __
This relates to the Web address of the page where the original of this particular record in its database can be found, rather than the address of the whole dataset, which is available in the Metadata group of elements under metadata description representation.
---+++ Agents
The Agents group of elements contains information about the persons and organisations that are associated with collections and observations and their roles. This is an example of a re-usable set of elements that occur in several places within ABCD. Contact details, such as address, telephone number and email, may be recorded here with the permission of the subject.
---+++ Gathering
The Gathering group of elements provides places for a comprehensive set of data about the event and place of collection or observation, including agents, date and georeference coordinate systems. Provision is made for the use of GML (Geographic Markup Language), WMS (Web Map Server) and WFS (Web Feature Server) data, based on the standards promoted by the Open GIS Consortium.
Additional elements can hold details of permits, methods, projects, site images, site-specific measurements, biotope, synecology and stratigraphy, along with others.
---+++ Unit Relations: Associations and Assemblages
The relationships between units can be recorded in this group of elements.
. __ Associations __
Associations are the relationships of this unit with other units in ABCD conformant datasets, using the institution ID, database ID and unit ID triplet for the record within the database. The type of association can be recorded, such as host and parasite, predator and prey etc.
This may also be useful in linking the records for several preparations from the same specimen, such as when a zoological specimen yields skeleton and tissue preparations.
. __ Assemblages __
A unit assemblage describes symmetric relationships between several units, such as herds and flocks or several fossils embedded in a rock. A common identifier links the members of the assemblage.
---+++ Unit Extensions
The Unit Extension is a temporary home to accommodate urgent inter-version additions to the ABCD schema. For example, if a specific community (e.g. culture collections) discover that there are elements missing in the current version of ABCD, they may communicate that to the group responsible for schema development. If it is necessary to move rapidly, for example due to project pressures, these elements may be added to the current version as an extension schema until the best placement for them has been decided.
---+++ Other
The final group is Other, which contains data that does not fit anywhere else, such as Notes. Notes may contain any text that is relevant to this unit that cannot be placed elsewhere within the record.
@
1.39
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="RicardoPereira" date="1176460546" format="1.1" version="1.39"}%
d8 2
d11 14
d69 1
a69 1
Biodiversity collections exist in different scientific sub-disciplines:
d71 1
a71 13
* Preserved collections, such as those in museums and herbaria
* Living collections, like botanical and zoological gardens, aquaria, seed banks, microbial strain and tissue collections
* Data collections, from surveys of objects in the field, such as observations, floristic and faunistic mapping and inventories
* Sequences produced by molecular biologists
Research conducted over the past decade has revealed that all these collections have most of their attributes in common, although the terminology used to describe them may differ substantially.
These collections represent an immense knowledge base on global biodiversity. The objects contained in collections can be a physical resource of great value for research and industry. Preserved objects represent the historical perspective, providing a falsifiable source of information. For example, specimens of fish from a particular area can be examined to determine mercury levels at the time of collection. Associated field and research notes contain detailed data on the locality, time, and often the appearance of organisms.
It is estimated that between 2 and 3 billion objects exist in natural history collections alone. Currently, this knowledge base is largely under-utilized, because it is highly distributed, heterogeneous, and the complex scientific nature obstructs efficient information retrieval.
Databasing and networking is now seen as the key to unlocking the value of biological collections for science, government, education, the public, and businesses, operating in the environmental sector, including land management; in biotechnology or in biodiversity research. International collaboration on the standardization of information models and standard data used in collection databases can enhance the efficiency of this process.
Development of the ABCD content definition started after the 2000 meeting of the*Taxonomic Databases Working Group (TDWG)* in Frankfurt/Main, where the decision was made to specify both a protocol and a data structure to enable interoperability of the numerous heterogeneous biological collection databases then available. As a consequence, the TDWG/CODATA subgroup on*Access to Biological Collection Data (ABCD)* was established, with one sub-section working on the search and retrieval protocol, *DiGIR*, and a second working on a comprehensive specification for biological collection data (the ABCD data standard).
d73 1
a73 1
Protocol development resulted in a limited and non-hierarchical set of data elements, named the*Darwin Core (DwC)*, as a workable specification to be used near-term, whilst the ABCD specification resulted in a comprehensive and highly structured standard for data about objects in biological collections.
d75 1
a75 1
An early achievement of the working group had been to bring together existing networks on specimen information to discuss common access, namely ENHSIN, ITIS, ITIS-CA, REMIB, Species Analyst, speciesLink, and the Virtual Australian Herbarium. In addition, the TDWG subgroup was recognised as a CODATA Working Group for 2001/2002. The discussion that had been started during the TDWG meeting in Frankfurt (2000) was picked up in a sequence of workshops:
d80 1
d83 4
a86 1
The first workshop in Santa Barbara produced an XML DTD, using a combination of top-down conceptualisation (and organisation) and bottom-up use of existing relevant specifications, such as the TDWG endorsed standards, *HISPID* and *ITF*.
d88 1
a88 1
In preparation for the second workshop in Sydney, this DTD was transformed into an XML schema and extended by elements from the*BioCISE information model* and the British*NBN/Recorder model.* For part of this meeting, the option of using the Gathering Event rather than the Collection Unit as the root concept of the hierarchical data structure was discussed, since observations data is usually organized by place and time first and then by taxon.
d92 9
a100 3
1 The goal at that time was to achieve clarity, universality, completeness, and simplicity in the semantics of the standard
1 Collection databases implemented as flat data structures (of which there are many) will not easily be able to export a hierarchical dataset with a normalized gathering event as the root concept and therefore will be restrained from participating in a federation based on this alternative.
By the time of the informal meeting in Sydney, March 2002, the European*BioCASE* project had started. Its schema definition group (The Natural History Museum in London and the Botanic Garden and Botanical Museum Berlin-Dahlem) was to provide a collection-level schema (BioCASE only) and a Unit level schema (CODATA/TDWG and BioCASE), so that this group was able to dedicate personnel resources to the schema definition process.
d102 1
a102 1
The priority was to develop a consensus about which elements should be included. The annotation tag was structured to hold metadata about each element and a schema-viewer, developed in Berlin, was established to allow XML non-specialists to browse the schema and view the annotations in a structured way. The documentation gathered here was later transferred to the ABCD Wiki site and used to annotate the individual �ABCDConcepts�.
d104 1
a104 1
The ENHSIN and BioCASE projects drove the process during 2001/2002, providing drafts that were discussed during TDWG and other meetings and which were exposed in a Request for Comment process. An editorial meeting sponsored by the*Global Biodiversity Information Facility (GBIF)* held in December 2001 in Singapore, led to the version that has been used in the first round of reference implementations (v. 1.2).
d106 1
a106 1
In 2002 the ABCD Schema was accepted by GBIF. A protocol supporting ABCD was provided by the BioCASE reference implementation in 2003, and in October, GBIF decided to integrate the BioCASE network into the nascent GBIF network along with the DiGIR protocol and Darwin Core. The primary difference between the two standards is that DiGIR handles only flat schemas, such as Darwin Core, whereas the BioCASe protocol can handle structured schemas, such as ABCD. Darwin Core is ideal for resource discovery purposes in particular, while ABCD records hold the additional data that may be required by researchers once a selection has been made.
d108 1
a108 1
Recognising the importance of the developments, CODATA decided to raise the status of the group to the level of a ' 'CODATA Task Group* for 2003/2004 (renewed for 2005/2006). *
d110 1
a110 1
*At the fourth Workshop in October 2003, a major point of discussion was the need for more guidance at the user-interface as well as at provider level. The very broad coverage of ABCD leaves it to the user to determine how to map their data. The structure looks too complicated for the average user and should thus be hidden from them. It will be the task of programmers to reassemble the different uses of the structure into a presentation layer that supports user requirements. *
d112 1
a112 1
*A reference portal implementation was constructed by the Paris BioCASE team, work that has been picked up by the SYNTHESYS project. The Berlin team has implemented preliminary interfaces as an intermediate measure. GBIF supported a project to produce a configuration assistant in a generic interface to map between database schemas and federation schemas such as ABCD, so that providers get recommendations on how to map elements, on preferred points for searches, what to do if an element is empty and so forth. *
d115 1
a115 1
ABCD version 2.06 is a proposed TDWG standard, which has been recommended for ratification by the annual TDWG meeting in September 2005. If ratified by the TDWG membership, this will be the version that GBIF will promote for use globally. If further changes become necessary, they will also be proposed through TDWG as new versions. However, because its inherent mechanisms for extensions allows for preliminary accommodation of new elements, we think that v. 2.06 can be kept stable and in use for a reasonable time period. We expect that the next major version will form part of a system of biodiversity data standards, with common modules across several TDWG standards, e.g. for metadata, images, agents, and bibliography.
@
1.38
log
@Revision 38
@
text
@d1 261
a261 260
---+ An Introduction to the ABCD Schema v2.0
This document provides an introduction to the proposed TDWG standard XML schema for Access to Biological Collection Data. The proposed standard itself is the XML schema, in its version 2.06, available on the World Wide Web under the URL
. http://www.bgbm.org/TDWG/CODATA/Schema/ABCD_2.06/ABCD_2.06.XSD.
This document is not part of that standard.
---++ Background
---+++ Purpose
ABCD - Access to Biological Collections Data - Schema is a common data specification for biological collection units, including living and preserved specimens, along with field observations that did not produce voucher specimens. It is intended to support the exchange and integration of detailed primary collection and observation data.
All of the world's biological collections contain a number of data items including specimen specific (e.g. taxon, altitude, sex) and collection specific (e.g. holding institution) elements. The set of elements used varies from collection to collection. ABCD provides a reconciled set of element names and their definition for scientists and curators to use. It is not expected (or even possible) for any collection to use more than a fraction of the elements defined in the standard.
A design goal of the data specification was to be both comprehensive and general, to include a broad array of concepts that might be available in a collection database, but to mandate only the bare minimum of elements required to make the specification functional. ABCD deliberately does not cover taxonomic data, such as synonymy, other than the use of names in identifications. Likewise, taxon-related information, such as distribution range, indicator values, etc., is also not included. The elements and concepts that are used provide as much compatibility as is possible with other standards in the field of biological collection data, such as HISPID, Darwin Core, and others.
The data specification is cast as an XML schema.
---+++ Design principles
ABCD was designed with the following principles in mind:
ABCD was designed with the following principles in mind:
1 Full coverage approach: ABCD is comprehensive and therefore complex. It explicitly aims to define the semantics of all elements, in order to:
* Provide a unified approach for the natural history collection community
* Accept detailed information, where available
* Develop a proto-ontology as a first step towards a collection ontology
2. Polymorphism: Variable atomisation allows provision of data in different degrees of detail and standardisation, in order to:
* Accept data from a wide variety of sources
* Enable data integration
3. (Almost) no internal referencing: A single-root document without relational structures that use IDs - to make processing easier and faster.
4. Extensible Slots: Extensions are not meant for individualised adaptations of the schema, but instead to allow:
* Fast community support in case of missing elements, before integration into a subsequent version
* Inclusion of third-party-schemas (or parts thereof), in order to prevent duplication of developments in other communities (e.g. geographical data)
5. Flexible containers: Element-element or element-attribute couples for category-value pairs allow freely defined and repeatable data fields (e.g., higher taxa, measurements, morphological features). In addition, there is often provision for free-text data where it is impractical to provide atomised data.
6. No recursive structures
7. Machine-readable annotations: Structured element annotations will permit their evaluation by program tools (e.g. a semantic search by the Configuration Assistant)
8. Language support: Language can be be made explicit for most text elements
9. Typing: The use of complex types and the deposition in a common type library allows type-sharing with other communities (e.g. Structure of Descriptive Data [SDD])
---+++ History
Biodiversity collections exist in different scientific sub-disciplines:
* Preserved collections, such as those in museums and herbaria
* Living collections, like botanical and zoological gardens, aquaria, seed banks, microbial strain and tissue collections
* Data collections, from surveys of objects in the field, such as observations, floristic and faunistic mapping and inventories
* Sequences produced by molecular biologists
Research conducted over the past decade has revealed that all these collections have most of their attributes in common, although the terminology used to describe them may differ substantially.
These collections represent an immense knowledge base on global biodiversity. The objects contained in collections can be a physical resource of great value for research and industry. Preserved objects represent the historical perspective, providing a falsifiable source of information. For example, specimens of fish from a particular area can be examined to determine mercury levels at the time of collection. Associated field and research notes contain detailed data on the locality, time, and often the appearance of organisms.
It is estimated that between 2 and 3 billion objects exist in natural history collections alone. Currently, this knowledge base is largely under-utilized, because it is highly distributed, heterogeneous, and the complex scientific nature obstructs efficient information retrieval.
Databasing and networking is now seen as the key to unlocking the value of biological collections for science, government, education, the public, and businesses, operating in the environmental sector, including land management; in biotechnology or in biodiversity research. International collaboration on the standardization of information models and standard data used in collection databases can enhance the efficiency of this process.
Development of the ABCD content definition started after the 2000 meeting of the*Taxonomic Databases Working Group (TDWG)* in Frankfurt/Main, where the decision was made to specify both a protocol and a data structure to enable interoperability of the numerous heterogeneous biological collection databases then available. As a consequence, the TDWG/CODATA subgroup on*Access to Biological Collection Data (ABCD)* was established, with one sub-section working on the search and retrieval protocol, *DiGIR*, and a second working on a comprehensive specification for biological collection data (the ABCD data standard).
Protocol development resulted in a limited and non-hierarchical set of data elements, named the*Darwin Core (DwC)*, as a workable specification to be used near-term, whilst the ABCD specification resulted in a comprehensive and highly structured standard for data about objects in biological collections.
An early achievement of the working group had been to bring together existing networks on specimen information to discuss common access, namely ENHSIN, ITIS, ITIS-CA, REMIB, Species Analyst, speciesLink, and the Virtual Australian Herbarium. In addition, the TDWG subgroup was recognised as a CODATA Working Group for 2001/2002. The discussion that had been started during the TDWG meeting in Frankfurt (2000) was picked up in a sequence of workshops:
* First workshop, Santa Barbara (June 2001)
* Second workshop, Sydney (November 2001)
* Informal meeting, Sydney (March 2002)
* Third workshop, Indaiatuba, Brazil (October 2002)
* Fourth workshop, Oeiras, Portugal (October 2003)
The first workshop in Santa Barbara produced an XML DTD, using a combination of top-down conceptualisation (and organisation) and bottom-up use of existing relevant specifications, such as the TDWG endorsed standards, *HISPID* and *ITF*.
In preparation for the second workshop in Sydney, this DTD was transformed into an XML schema and extended by elements from the*BioCISE information model* and the British*NBN/Recorder model.* For part of this meeting, the option of using the Gathering Event rather than the Collection Unit as the root concept of the hierarchical data structure was discussed, since observations data is usually organized by place and time first and then by taxon.
Nevertheless, the decision was made to stay with the structure that uses Unit as the root concept for two reasons:
1 The goal at that time was to achieve clarity, universality, completeness, and simplicity in the semantics of the standard
1 Collection databases implemented as flat data structures (of which there are many) will not easily be able to export a hierarchical dataset with a normalized gathering event as the root concept and therefore will be restrained from participating in a federation based on this alternative.
By the time of the informal meeting in Sydney, March 2002, the European*BioCASE* project had started. Its schema definition group (The Natural History Museum in London and the Botanic Garden and Botanical Museum Berlin-Dahlem) was to provide a collection-level schema (BioCASE only) and a Unit level schema (CODATA/TDWG and BioCASE), so that this group was able to dedicate personnel resources to the schema definition process.
The priority was to develop a consensus about which elements should be included. The annotation tag was structured to hold metadata about each element and a schema-viewer, developed in Berlin, was established to allow XML non-specialists to browse the schema and view the annotations in a structured way. The documentation gathered here was later transferred to the ABCD Wiki site and used to annotate the individual “ABCDConcepts”.
The ENHSIN and BioCASE projects drove the process during 2001/2002, providing drafts that were discussed during TDWG and other meetings and which were exposed in a Request for Comment process. An editorial meeting sponsored by the*Global Biodiversity Information Facility (GBIF)* held in December 2001 in Singapore, led to the version that has been used in the first round of reference implementations (v. 1.2).
In 2002 the ABCD Schema was accepted by GBIF. A protocol supporting ABCD was provided by the BioCASE reference implementation in 2003, and in October, GBIF decided to integrate the BioCASE network into the nascent GBIF network along with the DiGIR protocol and Darwin Core. The primary difference between the two standards is that DiGIR handles only flat schemas, such as Darwin Core, whereas the BioCASe protocol can handle structured schemas, such as ABCD. Darwin Core is ideal for resource discovery purposes in particular, while ABCD records hold the additional data that may be required by researchers once a selection has been made.
Recognising the importance of the developments, CODATA decided to raise the status of the group to the level of a ' 'CODATA Task Group* for 2003/2004 (renewed for 2005/2006). *
*At the fourth Workshop in October 2003, a major point of discussion was the need for more guidance at the user-interface as well as at provider level. The very broad coverage of ABCD leaves it to the user to determine how to map their data. The structure looks too complicated for the average user and should thus be hidden from them. It will be the task of programmers to reassemble the different uses of the structure into a presentation layer that supports user requirements. *
*A reference portal implementation was constructed by the Paris BioCASE team, work that has been picked up by the SYNTHESYS project. The Berlin team has implemented preliminary interfaces as an intermediate measure. GBIF supported a project to produce a configuration assistant in a generic interface to map between database schemas and federation schemas such as ABCD, so that providers get recommendations on how to map elements, on preferred points for searches, what to do if an element is empty and so forth. *
---+++ Future versions
ABCD version 2.06 is a proposed TDWG standard, which has been recommended for ratification by the annual TDWG meeting in September 2005. If ratified by the TDWG membership, this will be the version that GBIF will promote for use globally. If further changes become necessary, they will also be proposed through TDWG as new versions. However, because its inherent mechanisms for extensions allows for preliminary accommodation of new elements, we think that v. 2.06 can be kept stable and in use for a reasonable time period. We expect that the next major version will form part of a system of biodiversity data standards, with common modules across several TDWG standards, e.g. for metadata, images, agents, and bibliography.
-----
---++ Top Level Structure
The ABCD schema is highly structured in order to manage the large quantity of data that a record may contain.
The top level of the schema is arranged as follows:
. DATASETS
. DATASET
. - GUID AbcdIntroduction Contacts (technical and content) AbcdIntroduction Other providers - Metadata - Units (Observations and Specimens)
From this it can be seen that an XML document based on ABCD may contain records from several datasets, each of which is treated separately. Each dataset has a Globally Unique Identifier (GUID) along with information about who may be contacted for further details, for the content of the dataset and for technical information.
There are then two major groups, one holding metadata about the entire dataset and the other holding the actual data records.
The Metadata section holds information about an entire dataset and has the following structure:
. METADATA
. - Description - Icon URI - Scope (Geo-ecological and Taxonomic) - Version - Revision data (Creator, Contributors, Creation and Modification dates) - Owners - Intellectual Property Rights (IPR) statements
The second major section, called UNITS, holds all the records selected and exported from the original dataset, each one of which is a UNIT. This is by far the largest component of ABCD and has the following high-level structure:
. UNITS
. UNIT
. Here we can distinguish several areas. Most of these do not show up in the actual XML hierarchy, because ABCD 2.06 avoids using container elements that serve only to group items together: - Unit-level metadata - Record basis and Kind Of Unit - Identifications - Collection domain-specific data - Unit relationships (Associations and Assemblages) AbcdIntroduction Named collections and surveys - Gathering event and site characteristics - Measurements and Facts - Unit extension area
---++ The ABCD v2.06 Element Groups
For the purpose of this documentation, the data items (XML elements and attributes) are classified according to their content and arranged in groups. On the ABCD documentation Wiki (http://ww3.bgbm.org/abcddocs/), a list of the nearly 1200 concepts in ABCD is provided, with individual documentation pages for each of them that include the classification according to the groups and subgroups outlined below.
---+++ Metadata
The Metadata items record information such as who created the dataset, record, or linked object (e.g. an image), from what source and on which date, along with the Intellectual Property Rights (IPR) and other statements that govern the usage of the data. It also includes technical data items, such as "IDs" needed to access the data.
Items belonging to this group may change in future versions in order to harmonise with the suite of standards that are being developed through TDWG. Areas that are common to more than one standard, such as metadata, will be adjusted so that they are the same in each. The framework for this is currently under development as UBIF - the Unified Biosciences Information Framework.
The are five subgroups for metadata:
. __ Identifiers __
These are the names or codes that identify data objects or physical objects. Codes include record identifiers or keys, institution codes, accession numbers and collector's field numbers. Names include named collections, but for personal or corporate names see under Agents.
Identifiers include:
Globally unique identifiers (GUID) for datasets and for individual unit records, currently optional. Discussions underway at TDWG indicate that this may be an LSID (Life Science Identifier), a development from the bioinformatics domain, which would have the benefit of linking collection, observation and sequence data.
Apart from the GUID, each unit record contains four identifier elements, three of which are mandatory. These are:
* an identification code for the source institution
* an identification code for the data source that is unique within the source institution
* an identification code for the unit record within the data source
In the interim, a GUID can be synthesised from the hierarchy of these three mandatory elements. Optionally, if the unit ID is alphanumeric, the numeric part can be separately placed in the unit ID numeric element for sorting purposes. The Record Basis element provides an indication of what the unit record describes, such as a preserved specimen or a multimedia object. A short list of preferred terms is available for this and for several other elements. Using a term from such a list provides a degree of consistency that makes subset retrieval considerably more accurate. Preferred terms here include PreservedSpecimen, LivingSpecimen, FossilSpecimen, OtherSpecimen, HumanObservation, MachineObservation, DrawingOrPhotograph, and MultimediaObject. Note that these categories should also be used if the data were copied from a publication, a fact that can be indicated using the element Source Reference. The Kind of Unit element descibes the part(s) of organism or class of materials represented by the unit, such as whole organism, DNA, fruit and so forth. For consistency, terms should be chosen from the short list of preferred terms.
Further identifiers in the schema include Other providers (referencing an ID in the UDDI registry), Multimedia object ID (e.g. for images), Collector's field number, Observation unit identifier, Unit assemblage ID, Named collection or survey including the unit, and Specimen loan identifier.
. __ Record Basis and Kind of Unit __
The Record Basis element provides an indication of what the unit record describes, such as a preserved specimen or a multimedia object. A short list of preferred terms is available for this and for several other elements. Using a term from such a list provides a degree of consistency that makes subset retrieval considerably more accurate. Preferred terms here include PreservedSpecimen, LivingSpecimen, FossilSpecimen, OtherSpecimen, HumanObservation, MachineObservation, DrawingOrPhotograph, and MultimediaObject. Note that these categories should also be used if the data were copied from a publication, a fact that can be indicated using the element Source Reference.
The Kind of Unit element describes the part(s) of the organism or class of materials represented by the unit, such as whole organism, DNA, fruit, etc. For consistency, terms should be chosen from the short list of preferred terms.
. __ IPR, Versioning, Edit History and other Statements __
Intellectual Property Rights (IPR) play an increasingly important part in the use of shared data and it is important that full use is made of the capabilities of ABCD to record statements on copyright, licensing, terms of use, disclaimers, acknowledgements and citations. It is equally important that proper accreditation is given to data sources.
Versioning refers to the numbering and date of the dataset version, which may be used in citations and to determine the currency of the data. The creator of a dataset or record and the date of creation is recorded and never changes. Provision is also made to identify the date of the most recent edit and the name of the editor, again as an indicator of data currency.
. __ Language and Character Sets __
Many elements have an attribute that can indicate the language of the text that they hold. This is valuable for sorting and searching purposes. Data should be provided in either UTF-8 or UTF-16 encodings of Unicode, which are both valid for use with XML.
. __ UDDI Registry Items __
UDDI (Universal Description, Discovery, and Integration) is a platform-independent, XML-based directory that enable businesses worldwide to list themselves and their services on the Internet. The Global Biodiversity Information Facility currently uses a UDDI registry. The relevant ABCD elements are Technical Contacts and Content Contacts.
---+++ Identification Event
A unit may be identified by one or more identification events. The Identification Event has two main parts, being the identifications themselves and a free text identification history. For every individual identification event the data include the date; the method; references and verification details. The identifier may be a person or an organisation. A flag can be used to indicate a preferred identification where several events in the history of the specimen took place. Likewise, a negative identification can be flagged as such, in addition to one of the identifications that is used to indicate where a specimen is stored (a useful feature e.g. for mixed samples or for type specimens). If there are several identifications of a mixed sample, the individual role may be indicated (e.g. as parasite or host). The outcome of the identification event is an identification result.
---+++ Identification Result
. __ Non-taxonomic Result __
ABCD has a provision to use the schema for the identification of non-biological materials together with the taxonomic identification of an organism. This is used, inter alia, to describe a specimen consisting of a substratum and the organism, e.g. a certain rock type and a lichen crust on that rock. Currently there is only a single text element (Material identified) to accommodate this type of data. However, especially for collections from geo-sciences, ABCD will be expanded to include full cover of the "taxonomy" used in these fields. For the time being, a temporary Extension can be made using the respective element typed as xs:any.
. __ Taxonomic Result __
This is the structure provided for the taxon identified as the result of an identification event.
ABCD considers the classification of the taxon identified in higher taxa not to be within the domain of the collection schema. Nevertheless, for convenience a higher taxon element is included. In contrast to Darwin Core, ABCD handles higher taxa through a repeatable element pair, one for name and one for rank.
The "Full Scientific Name String" element is one of the few mandatory elements in ABCD (if a taxonomic identification is provided at all). This holds the concatenated scientific name, preferably formed in accordance with a named Code of Nomenclature. It should thus be a monomial, bionomial, or trinomial plus author(s) or author team(s) and, where relevant, the year. It could also hold the name of a cultivar or cultivar group, or of a hybrid formula.
If the name does not conform to a Code of Nomenclature, e.g. a common name, provision is made for it to be recorded as an informal name.
In addition to the Full Scientific Name element described above, a structure is provided for recording names in a fully atomised form, adapted to each of the four main naming conventions (Codes) - for botanical, zoological, bacterial and viral names. The structure is completed with elements for an identification qualifier, where doubts may exist about the accuracy of the identification, and an element for a name addendum such as "sensu lato".
Both, the Full scientific name and the atomised structure are also used for the typified name in the section of the schema treating nomenclatural type designations (see under Specimen Collections below).
---+++ Collection Domain-specific Items
Most of the data handled by ABCD are common to all the subject domains, both in collections and observations. However, there are some data that are very specific to certain domains, such as the morphotype of a lichen. This section of ABCD provides a place for such data so that specialists may easily identify which subsections are relevant to their data and which are not.
This section is also used to accommodate domain specific standard data, some of which may be characterised as legacy data but which is still provided or used in specialised networks.
ABCD can be extended into new domains by the creation of an additional domain-specific section. Temporary additions should make use of the Unit Extension feature.
. __ Observation Records __
Most data about observation records are under the Gathering group of elements, since observers work with place as priority rather than taxon. However, there is a place here for numbers or other registration marks which may be associated with an observation record as the equivalent of the Accession Number used with specimen collections.
Observation records indicating the absence of an organism from a site are accommodated by means of a record with a negative identification.
. __ Specimen Collections __
Data on ownership of the physical specimen (as opposed to the data record), including ownership history, acquisition and accession can be placed here, along with information on the type status of the specimen, preparation technique and details of any markings and labels.
The type section of the schema provides information on the status and kind of nomenclatural type, but also allows full documentation of the verification process of the type status.
In addition, each collection has a set of elements for holding data that are specific to the content of the collection. As mentioned under Collection Domain-specific Items above, ABCD version 2.06 provides containers for the following specialisms: Microbial Genetic Resources (a.k.a. Culture Collections), Mycological (including Lichenological) Collections, Herbaria, Botanic Gardens, Plant Genetic Resources, Zoological Collections and Palaeontological Collections.
---+++ Measurements and Facts
The main difference between measurements and facts is that measurements are numeric whilst facts are textual. These are treated generically, rather than providing an individual place for everything that could be measured, with one or two exceptions that are noted later. The atomised version of measurements/facts captures essentials such as what is being measured, by what method, using which units and so on, as well as the actual value or value range. A free-text alternative is available if the data are not easily available in atomised form.
Measurements and Facts appear at several places in ABCD:
. __Gathering-related Measurements and Facts__
These are the measurements or facts taken at the collection locality at the time of collection, such as water or air temperature, slope, weather conditions etc. Separate elements are available for Altitude, Depth and Height, due to the complex relationships between these. Biotype measurements or facts allow linking of all biotope-related measurement to the site description.
. __ Unit-related Measurements and Facts __
These relate directly to the Unit which is the subject of the record.
. __ Molecular Sequence Data __
A container is provided for sequence data, thus offering the ability to link sequence data back to the specimen from which the sequenced molecule was derived. Links to public repositories such as GeneBank as well as to unpublished material are accommodated.
. __ Stage, Age and Sex __
The final subgroup covers stages (such as egg, larval or adult), age and sex.
---+++ Multimedia and References
Pointers to additional material that relates to the unit may be placed in the Multimedia and References group. Elements are available for the URI of either a "raw" file or a rendered product, such as an HTML or JavaScript resource. The relationship between the resource and the unit in this record can be recorded in a context element. Further elements are available for recording technical data, especially for digital images. The subgroups are:
. __ Multimedia __
Photographs, diagrams, sound files and other types of electronic resources.
. __ Bibliographic References __ Literature can be referenced in several places in the schema. As mentioned before, the data for the entire unit record may have been extracted from the literature, but it is also possible to record instances in which the specimen was cited in the literature. The key and/or description used in an identification event may be referenced there, as could be an identification taken from literature. All measurements and facts can be related to a publication, including molecular sequences. Finally, the nomenclatural reference to the original description(s) of a taxon are recorded within the type designation section.
ABCD uses a very simple structure for bibliographic references, which may be changed to a more elaborate design at a later stage.
. __ Record URI __
This relates to the Web address of the page where the original of this particular record in its database can be found, rather than the address of the whole dataset, which is available in the Metadata group of elements under metadata description representation.
---+++ Agents
The Agents group of elements contains information about the persons and organisations that are associated with collections and observations and their roles. This is an example of a re-usable set of elements that occur in several places within ABCD. Contact details, such as address, telephone number and email, may be recorded here with the permission of the subject.
---+++ Gathering
The Gathering group of elements provides places for a comprehensive set of data about the event and place of collection or observation, including agents, date and georeference coordinate systems. Provision is made for the use of GML (Geographic Markup Language), WMS (Web Map Server) and WFS (Web Feature Server) data, based on the standards promoted by the Open GIS Consortium.
Additional elements can hold details of permits, methods, projects, site images, site-specific measurements, biotope, synecology and stratigraphy, along with others.
---+++ Unit Relations: Associations and Assemblages
The relationships between units can be recorded in this group of elements.
. __ Associations __
Associations are the relationships of this unit with other units in ABCD conformant datasets, using the institution ID, database ID and unit ID triplet for the record within the database. The type of association can be recorded, such as host and parasite, predator and prey etc.
This may also be useful in linking the records for several preparations from the same specimen, such as when a zoological specimen yields skeleton and tissue preparations.
. __ Assemblages __
A unit assemblage describes symmetric relationships between several units, such as herds and flocks or several fossils embedded in a rock. A common identifier links the members of the assemblage.
---+++ Unit Extensions
The Unit Extension is a temporary home to accommodate urgent inter-version additions to the ABCD schema. For example, if a specific community (e.g. culture collections) discover that there are elements missing in the current version of ABCD, they may communicate that to the group responsible for schema development. If it is necessary to move rapidly, for example due to project pressures, these elements may be added to the current version as an extension schema until the best placement for them has been decided.
---+++ Other
The final group is Other, which contains data that does not fit anywhere else, such as Notes. Notes may contain any text that is relevant to this unit that cannot be placed elsewhere within the record.
@
1.37
log
@Revision 37
@
text
@a1 1
d4 2
a5 4
http://www.bgbm.org/TDWG/CODATA/Schema/ABCD_2.06/ABCD_2.06.XSD.
This document is not part of that standard.
d8 1
a9 3
http://ww3.bgbm.org/abcddocs/AbcdConcept1437
d11 1
a11 1
ABCD - Access to Biological Collections Data - Schema is a common data specification for biological collection units, including living and preserved specimens, along with field observations that did not produce voucher specimens. It is intended to support the exchange and integration of detailed primary collection and observation data.
d13 1
a13 1
All of the world's biological collections contain a number of data items including specimen specific (e.g. taxon, altitude, sex) and collection specific (e.g. holding institution) elements. The set of elements used varies from collection to collection. ABCD provides a reconciled set of element names and their definition for scientists and curators to use. It is not expected (or even possible) for any collection to use more than a fraction of the elements defined in the standard.
d15 1
a15 1
A design goal of the data specification was to be both comprehensive and general, to include a broad array of concepts that might be available in a collection database, but to mandate only the bare minimum of elements required to make the specification functional. ABCD deliberately does not cover taxonomic data, such as synonymy, other than the use of names in identifications. Likewise, taxon-related information, such as distribution range, indicator values, etc., is also not included. The elements and concepts that are used provide as much compatibility as is possible with other standards in the field of biological collection data, such as HISPID, Darwin Core, and others.
d17 1
a17 1
The data specification is cast as an XML schema.
d20 1
d22 1
a22 1
ABCD was designed with the following principles in mind:
d24 1
a24 1
ABCD was designed with the following principles in mind:
d26 1
a26 4
1 Full coverage approach: ABCD is comprehensive and therefore complex. It explicitly aims to define the semantics of all elements, in order to:
* Provide a unified approach for the natural history collection community
* Accept detailed information, where available
* Develop a proto-ontology as a first step towards a collection ontology
d28 4
a31 3
2. Polymorphism: Variable atomisation allows provision of data in different degrees of detail and standardisation, in order to:
* Accept data from a wide variety of sources
* Enable data integration
d33 3
a35 1
3. (Almost) no internal referencing: A single-root document without relational structures that use IDs - to make processing easier and faster.
d37 1
a37 3
4. Extensible Slots: Extensions are not meant for individualised adaptations of the schema, but instead to allow:
* Fast community support in case of missing elements, before integration into a subsequent version
* Inclusion of third-party-schemas (or parts thereof), in order to prevent duplication of developments in other communities (e.g. geographical data)
d39 3
a41 1
5. Flexible containers: Element-element or element-attribute couples for category-value pairs allow freely defined and repeatable data fields (e.g., higher taxa, measurements, morphological features). In addition, there is often provision for free-text data where it is impractical to provide atomised data.
d43 1
a43 1
6. No recursive structures
d45 1
a45 1
7. Machine-readable annotations: Structured element annotations will permit their evaluation by program tools (e.g. a semantic search by the Configuration Assistant)
d47 1
a47 1
8. Language support: Language can be be made explicit for most text elements
d49 1
a49 1
9. Typing: The use of complex types and the deposition in a common type library allows type-sharing with other communities (e.g. Structure of Descriptive Data [SDD])
d52 1
a52 1
Biodiversity collections exist in different scientific sub-disciplines:
d54 5
a58 4
* Preserved collections, such as those in museums and herbaria
* Living collections, like botanical and zoological gardens, aquaria, seed banks, microbial strain and tissue collections
* Data collections, from surveys of objects in the field, such as observations, floristic and faunistic mapping and inventories
* Sequences produced by molecular biologists
d60 1
a60 1
Research conducted over the past decade has revealed that all these collections have most of their attributes in common, although the terminology used to describe them may differ substantially.
d62 1
a62 1
These collections represent an immense knowledge base on global biodiversity. The objects contained in collections can be a physical resource of great value for research and industry. Preserved objects represent the historical perspective, providing a falsifiable source of information. For example, specimens of fish from a particular area can be examined to determine mercury levels at the time of collection. Associated field and research notes contain detailed data on the locality, time, and often the appearance of organisms.
d64 1
a64 1
It is estimated that between 2 and 3 billion objects exist in natural history collections alone. Currently, this knowledge base is largely under-utilized, because it is highly distributed, heterogeneous, and the complex scientific nature obstructs efficient information retrieval.
d66 1
a66 1
Databasing and networking is now seen as the key to unlocking the value of biological collections for science, government, education, the public, and businesses, operating in the environmental sector, including land management; in biotechnology or in biodiversity research. International collaboration on the standardization of information models and standard data used in collection databases can enhance the efficiency of this process.
d68 1
a68 3
Development of the ABCD content definition started after the 2000 meeting of the*Taxonomic Databases Working Group (TDWG)* in Frankfurt/Main, where the decision was made to specify both a protocol and a data structure to enable interoperability of the numerous heterogeneous biological collection databases then available. As a consequence, the TDWG/CODATA subgroup on*Access to Biological Collection Data (ABCD)* was established, with one sub-section working on the search and retrieval protocol, *DiGIR*, and a second working on a comprehensive specification for biological collection data (the ABCD data standard).
Protocol development resulted in a limited and non-hierarchical set of data elements, named the*Darwin Core (DwC)*, as a workable specification to be used near-term, whilst the ABCD specification resulted in a comprehensive and highly structured standard for data about objects in biological collections.
d72 4
a75 4
* First workshop, Santa Barbara (June 2001)
* Second workshop, Sydney (November 2001)
* Informal meeting, Sydney (March 2002)
* Third workshop, Indaiatuba, Brazil (October 2002)
d77 1
d79 1
a79 3
The first workshop in Santa Barbara produced an XML DTD, using a combination of top-down conceptualisation (and organisation) and bottom-up use of existing relevant specifications, such as the TDWG endorsed standards, *HISPID* and *ITF*.
In preparation for the second workshop in Sydney, this DTD was transformed into an XML schema and extended by elements from the*BioCISE information model* and the British*NBN/Recorder model.* For part of this meeting, the option of using the Gathering Event rather than the Collection Unit as the root concept of the hierarchical data structure was discussed, since observations data is usually organized by place and time first and then by taxon.
a81 2
1 The goal at that time was to achieve clarity, universality, completeness, and simplicity in the semantics of the standard
2. Collection databases implemented as flat data structures (of which there are many) will not easily be able to export a hierarchical dataset with a normalized gathering event as the root concept and therefore will be restrained from participating in a federation based on this alternative.
d83 3
a85 2
By the time of the informal meeting in Sydney, March 2002, the European*BioCASE* project had started. Its schema definition group (The Natural History Museum in London and the Botanic Garden and Botanical
Museum Berlin-Dahlem) was to provide a collection-level schema (BioCASE only) and a Unit level schema (CODATA/TDWG and BioCASE), so that this group was able to dedicate personnel resources to the schema definition process.
d89 1
a89 1
The ENHSIN and BioCASE projects drove the process during 2001/2002, providing drafts that were discussed during TDWG and other meetings and which were exposed in a Request for Comment process. An editorial meeting sponsored by the*Global Biodiversity Information Facility (GBIF)* held in December 2001 in Singapore, led to the version that has been used in the first round of reference implementations (v. 1.2).
d91 1
a91 1
In 2002 the ABCD Schema was accepted by GBIF. A protocol supporting ABCD was provided by the BioCASE reference implementation in 2003, and in October, GBIF decided to integrate the BioCASE network into the nascent GBIF network along with the DiGIR protocol and Darwin Core. The primary difference between the two standards is that DiGIR handles only flat schemas, such as Darwin Core, whereas the BioCASe protocol can handle structured schemas, such as ABCD. Darwin Core is ideal for resource discovery purposes in particular, while ABCD records hold the additional data that may be required by researchers once a selection has been made.
d93 1
a93 1
Recognising the importance of the developments, CODATA decided to raise the status of the group to the level of a ' 'CODATA Task Group* for 2003/2004 (renewed for 2005/2006).
d95 1
a95 1
At the fourth Workshop in October 2003, a major point of discussion was the need for more guidance at the user-interface as well as at provider level. The very broad coverage of ABCD leaves it to the user to determine how to map their data. The structure looks too complicated for the average user and should thus be hidden from them. It will be the task of programmers to reassemble the different uses of the structure into a presentation layer that supports user requirements.
d97 1
a97 1
A reference portal implementation was constructed by the Paris BioCASE team, work that has been picked up by the SYNTHESYS project. The Berlin team has implemented preliminary interfaces as an intermediate measure. GBIF supported a project to produce a configuration assistant in a generic interface to map between database schemas and federation schemas such as ABCD, so that providers get recommendations on how to map elements, on preferred points for searches, what to do if an element is empty and so forth.
a99 1
a104 1
d109 4
a112 11
DATASETS
DATASET
- GUID AbcdIntroduction Contacts (technical and content) AbcdIntroduction Other providers
- Metadata
- Units (Observations and Specimens)
From this it can be seen that an XML document based on ABCD may contain records from several datasets, each of which is treated separately. Each dataset has a Globally Unique Identifier (GUID) along with information about who may be contacted for further details, for the content of the dataset and for technical information.
d118 2
a119 16
METADATA
- Description
- Icon URI
- Scope (Geo-ecological and Taxonomic)
- Version
- Revision data (Creator, Contributors, Creation and Modification dates)
- Owners
- Intellectual Property Rights (IPR) statements
d122 3
a124 22
UNITS
UNIT
Here we can distinguish several areas. Most of these do not show up in the actual XML hierarchy, because ABCD 2.06 avoids using container elements that serve only to group items together:
- Unit-level metadata
- Record basis and Kind Of Unit
- Identifications
- Collection domain-specific data
- Unit relationships (Associations and Assemblages) AbcdIntroduction Named collections and surveys
- Gathering event and site characteristics
- Measurements and Facts
- Unit extension area
d126 1
a126 2
For the purpose of this documentation, the data items (XML elements and attributes) are classified according to their content and arranged in groups. On the ABCD documentation Wiki (http://ww3.bgbm.org/abcddocs/), a list of the nearly 1200 concepts in ABCD is provided, with individual documentation pages for each of them that include the classification according to the groups and subgroups outlined below.
d129 1
d131 1
a131 8
The Metadata items record information such as who created the dataset, record, or linked object (e.g. an image), from what source and on which date, along with the Intellectual Property Rights (IPR) and other statements that govern the usage of the data. It also includes technical data items, such as "IDs" needed to access the data.
Items belonging to this group may change in future versions in order to harmonise with the suite of standards that are being developed through TDWG. Areas that are common to more than one standard, such as metadata, will be adjusted so that they are the same in each. The framework for this is currently under development as UBIF - the Unified Biosciences Information Framework.
The are five subgroups for metadata:
__ Identifiers __
d133 1
a133 1
These are the names or codes that identify data objects or physical objects. Codes include record identifiers or keys, institution codes, accession numbers and collector's field numbers. Names include named collections, but for personal or corporate names see under Agents.
d135 2
a136 1
Identifiers include:
d138 1
a138 1
Globally unique identifiers (GUID) for datasets and for individual unit records, currently optional. Discussions underway at TDWG indicate that this may be an LSID (Life Science Identifier), a development from the bioinformatics domain, which would have the benefit of linking collection, observation and sequence data.
d140 1
a140 4
Apart from the GUID, each unit record contains four identifier elements, three of which are mandatory. These are:
* an identification code for the source institution
* an identification code for the data source that is unique within the source institution
* an identification code for the unit record within the data source
d142 1
a142 2
In the interim, a GUID can be synthesised from the hierarchy of these three mandatory elements. Optionally, if the unit ID is alphanumeric, the numeric part can be separately placed in the unit ID numeric element for sorting purposes. The Record Basis element provides an indication of what the unit record describes, such as a preserved specimen or a multimedia object. A short list of preferred terms is available for this and for several other elements. Using a term from such a list provides a degree of consistency that makes subset retrieval considerably more accurate. Preferred terms here include PreservedSpecimen, LivingSpecimen, FossilSpecimen, OtherSpecimen, HumanObservation, MachineObservation, DrawingOrPhotograph, and MultimediaObject. Note that these categories should also be used if the data were copied from a publication, a fact that can be indicated using the element Source Reference.
The Kind of Unit element descibes the part(s) of organism or class of materials represented by the unit, such as whole organism, DNA, fruit and so forth. For consistency, terms should be chosen from the short list of preferred terms.
d144 4
a147 1
Further identifiers in the schema include Other providers (referencing an ID in the UDDI registry), Multimedia object ID (e.g. for images), Collector's field number, Observation unit identifier, Unit assemblage ID, Named collection or survey including the unit, and Specimen loan identifier.
d149 1
a149 1
__ Record Basis and Kind of Unit __
d151 2
a152 1
The Record Basis element provides an indication of what the unit record describes, such as a preserved specimen or a multimedia object. A short list of preferred terms is available for this and for several other elements. Using a term from such a list provides a degree of consistency that makes subset retrieval considerably more accurate. Preferred terms here include PreservedSpecimen, LivingSpecimen, FossilSpecimen, OtherSpecimen, HumanObservation, MachineObservation, DrawingOrPhotograph, and MultimediaObject. Note that these categories should also be used if the data were copied from a publication, a fact that can be indicated using the element Source Reference.
d154 1
a154 1
The Kind of Unit element describes the part(s) of the organism or class of materials represented by the unit, such as whole organism, DNA, fruit, etc. For consistency, terms should be chosen from the short list of preferred terms.
d156 2
a157 1
__ IPR, Versioning, Edit History and other Statements __
d159 1
a159 5
Intellectual Property Rights (IPR) play an increasingly important part in the use of shared data and it is important that full use is made of the capabilities of ABCD to record statements on copyright, licensing, terms of use, disclaimers, acknowledgements and citations. It is equally important that proper accreditation is given to data sources.
Versioning refers to the numbering and date of the dataset version, which may be used in citations and to determine the currency of the data. The creator of a dataset or record and the date of creation is recorded and never changes. Provision is also made to identify the date of the most recent edit and the name of the editor, again as an indicator of data currency.
__ Language and Character Sets __
d161 1
d164 2
a165 3
__ UDDI Registry Items __
UDDI (Universal Description, Discovery, and Integration) is a platform-independent, XML-based directory that enable businesses worldwide to list themselves and their services on the Internet. The Global Biodiversity Information Facility currently uses a UDDI registry. The relevant ABCD elements are Technical Contacts and Content Contacts.
d168 1
a168 2
A unit may be identified by one or more identification events. The Identification Event has two main parts, being the identifications themselves and a free text identification history. For every individual identification event the data include the date; the method; references and verification details. The identifier may be a person or an organisation. A flag can be used to indicate a preferred identification where several events in the history of the specimen took place. Likewise, a negative identification can be flagged as such, in addition to one of the identifications that is used to indicate where a specimen is stored (a useful feature e.g. for mixed samples or for type specimens). If there are several identifications of a mixed sample, the individual role may be indicated (e.g. as parasite or host). The outcome of the identification event is an identification result.
d171 1
a171 3
__ Non-taxonomic Result __
d174 2
a175 1
__ Taxonomic Result __
d177 1
a177 1
This is the structure provided for the taxon identified as the result of an identification event.
d179 1
a179 1
ABCD considers the classification of the taxon identified in higher taxa not to be within the domain of the collection schema. Nevertheless, for convenience a higher taxon element is included. In contrast to Darwin Core, ABCD handles higher taxa through a repeatable element pair, one for name and one for rank.
d181 1
a181 1
The "Full Scientific Name String" element is one of the few mandatory elements in ABCD (if a taxonomic identification is provided at all). This holds the concatenated scientific name, preferably formed in accordance with a named Code of Nomenclature. It should thus be a monomial, bionomial, or trinomial plus author(s) or author team(s) and, where relevant, the year. It could also hold the name of a cultivar or cultivar group, or of a hybrid formula.
d183 1
a183 1
If the name does not conform to a Code of Nomenclature, e.g. a common name, provision is made for it to be recorded as an informal name.
d185 1
a185 4
In addition to the Full Scientific Name element described above, a structure is provided for recording names in a fully atomised form, adapted to each of the four main naming conventions (Codes) - for botanical, zoological, bacterial and viral names. The structure is completed with elements for an identification qualifier, where doubts may exist about the accuracy of the identification, and an element for a name addendum such as "sensu lato".
Both, the Full scientific name and the atomised structure are also used for the typified name in the section of the schema treating nomenclatural type designations (see under Specimen Collections below).
d188 1
d190 1
a190 3
Most of the data handled by ABCD are common to all the subject domains, both in collections and observations. However, there are some data that are very specific to certain domains, such as the morphotype of a lichen. This section of ABCD provides a place for such data so that specialists may easily identify which subsections are relevant to their data and which are not.
This section is also used to accommodate domain specific standard data, some of which may be characterised as legacy data but which is still provided or used in specialised networks.
d192 1
a192 3
ABCD can be extended into new domains by the creation of an additional domain-specific section. Temporary additions should make use of the Unit Extension feature.
__ Observation Records __
d194 1
d197 1
a197 3
Observation records indicating the absence of an organism from a site are accommodated by means of a record with a negative identification.
__ Specimen Collections __
d199 2
a200 1
Data on ownership of the physical specimen (as opposed to the data record), including ownership history, acquisition and accession can be placed here, along with information on the type status of the specimen, preparation technique and details of any markings and labels.
d202 1
a202 1
The type section of the schema provides information on the status and kind of nomenclatural type, but also allows full documentation of the verification process of the type status.
d204 1
a204 1
In addition, each collection has a set of elements for holding data that are specific to the content of the collection. As mentioned under Collection Domain-specific Items above, ABCD version 2.06 provides containers for the following specialisms: Microbial Genetic Resources (a.k.a. Culture Collections), Mycological (including Lichenological) Collections, Herbaria, Botanic Gardens, Plant Genetic Resources, Zoological Collections and Palaeontological Collections.
d207 1
d209 1
a209 5
The main difference between measurements and facts is that measurements are numeric whilst facts are textual. These are treated generically, rather than providing an individual place for everything that could be measured, with one or two exceptions that are noted later. The atomised version of measurements/facts captures essentials such as what is being measured, by what method, using which units and so on, as well as the actual value or value range. A free-text alternative is available if the data are not easily available in atomised form.
Measurements and Facts appear at several places in ABCD:
__Gathering-related Measurements and Facts__
d211 2
a212 1
These are the measurements or facts taken at the collection locality at the time of collection, such as water or air temperature, slope, weather conditions etc. Separate elements are available for Altitude, Depth and Height, due to the complex relationships between these. Biotype measurements or facts allow linking of all biotope-related measurement to the site description.
d214 2
a215 1
__ Unit-related Measurements and Facts __
d217 2
a218 7
These relate directly to the Unit which is the subject of the record.
__ Molecular Sequence Data __
A container is provided for sequence data, thus offering the ability to link sequence data back to the specimen from which the sequenced molecule was derived. Links to public repositories such as GeneBank as well as to unpublished material are accommodated.
__ Stage, Age and Sex __
d220 1
a223 1
d226 1
a226 2
__ Multimedia __
d229 2
a230 7
__ Bibliographic References __
Literature can be referenced in several places in the schema. As mentioned before, the data for the entire unit record may have been extracted from the literature, but it is also possible to record instances in which the specimen was cited in the literature. The key and/or description used in an identification event may be referenced there, as could be an identification taken from literature. All measurements and facts can be related to a publication, including molecular sequences. Finally, the nomenclatural reference to the original description(s) of a taxon are recorded within the type designation section.
ABCD uses a very simple structure for bibliographic references, which may be changed to a more elaborate design at a later stage.
__ Record URI __
d232 1
a235 1
d239 1
d241 1
a241 3
The Gathering group of elements provides places for a comprehensive set of data about the event and place of collection or observation, including agents, date and georeference coordinate systems. Provision is made for the use of GML (Geographic Markup Language), WMS (Web Map Server) and WFS (Web Feature Server) data, based on the standards promoted by the Open GIS Consortium.
Additional elements can hold details of permits, methods, projects, site images, site-specific measurements, biotope, synecology and stratigraphy, along with others.
d244 1
d246 1
a246 4
The relationships between units can be recorded in this group of elements.
__ Associations __
d251 1
a251 2
__ Assemblages __
d255 1
d257 1
a257 2
The Unit Extension is a temporary home to accommodate urgent inter-version additions to the ABCD schema. For example, if a specific community (e.g. culture collections) discover that there are elements missing in the current version of ABCD, they may communicate that to the group responsible for schema development. If it is necessary to move rapidly, for example due to project pressures, these elements may be added to the current version as an extension schema until the best placement for them has been decided.
a259 1
@
1.36
log
@Revision 36
@
text
@d20 1
a20 1
A design goal of the data specification was to be both comprehensive and general, to include a broad array of concepts that might be available in a collection database, but to mandate only the bare minimum of elements required to make the specification functional. ABCD deliberately does not cover taxonomic data, such as synonymy, other than the use of names in identifications. Likewise, taxon-related information, such as distribution range, indicator values, etc., is also nABCD version 2 is a proposed TDWG standard, which has been recommended for ratification by the annual TDWG meeting in September 2005. If ratified by the TDWG membership, this will be the version that GBIF will promote for use globally. If further changes become necessary, they will also be proposed through TDWG as new versions. However, because its inherent mechanisms for extensions allows for preliminary accomodation of new elements, we think that v. 2.06 can be kept stable and in use for a reasonable time period. We expect that the next major version will form part of a system of biodiversity data standards, with common modules across several TDWG standards, e.g. for metadata, images, agents, and bibliography.ABCD version 2 is a proposed TDWG standard, which has been recommended for ratification by the annual TDWG meeting in September 2005. If ratified by the TDWG membership, this will be the version that GBIF will promote for use globally. If further changes become necessary, they will also be proposed through TDWG as new versions. However, because its inherent mechanisms for extensions allows for preliminary accomodation of new elements, we think that v. 2.06 can be kept stable and in use for a reasonable time period. We expect that the next major version will form part of a system of biodiversity data standards, with common modules across several TDWG standards, e.g. for metadata, images, agents, and bibliography.ot included. The elements and concepts that are used provide as much compatibility as is possible with other standards in the field of biological collection data, such as HISPID, Darwin Core, and others.
@
1.35
log
@Revision 35
@
text
@d12 3
d24 1
@
1.34
log
@Revision 34
@
text
@d332 1
@
1.33
log
@Revision 33
@
text
@a198 1
a200 2
d203 1
a203 1
The Record Basis element provides an indication of what the unit record describes, such as a preserved specimen or a multimedia object. A short list of preferred terms is available for this and for several other elements. Using a term from such a list provides a degree of consistency that makes subset retrieval considerably more accurate. Preferred terms here include PreservedSpecimen, LivingSpecimen, FossileSpecimen, OtherSpecimen, HumanObservation, MachineObservation, DrawingOrPhotograph, and MultimediaObject. Note that these categories should also be used if the data were copied from a publication, a fact that can be indicated using the element Source Reference.
d205 1
a205 1
The Kind of Unit element descibes the part(s) of organism or class of materials represented by the unit, such as whole organism, DNA, fruit and so forth. For consistency, terms should be chosen from the short list of preferred terms.
d209 1
a209 1
Intellectual Property Rights (IPR) play an increasingly important part in the use of shared data and it is important that full use is made of the capabilities of ABCD to record statements on copyright, licencing, terms of use, disclaimers, acknowledgements and citations. It is equally important that proper accreditation is given to data sources.
d211 1
a211 1
Versioning refers to the numbering and date of the dataset version, which may be used in citations and to determine the currency of the data. The creator of a dataset or record and the date of creation is recorded and never changes. Provision is also made to identify the date of the most recent edit and the name of the editor, again as an indicator of data currency.
d219 1
a219 1
UDDI (Universal Description, Discovery, and Integration) is a platform-independent, XML-based directory that enable businesses worldwide to list themselves and their services on the Internet. The Global Biodiversity Information Facility currently uses a UDDI registry. The relevant ABCD elements are Technical Contacts and Content Contacts.
d223 1
a223 1
A unit may be identified by one or more identification events. The Identification Event has two main parts, being the identifications themselves and a free text identification history. For every individual identification event the data include the date; the method; references and verification details. The identifier may be a person or an organisation. A flag can be used to indicate a preferred identification where several events in the history of the specimen took place. Likewise, a negative identification can be flagged as such, same as one of the identifications that is used to indicate where a specimen is stored (a useful feature e.g. for mixed samples or for type specimens). If there are several identifications of a mixed sample, the individual role may be indicated (e.g. as parasite or host). The outcome of the identification event is an indentification result.
d229 1
a229 1
ABCD has a provision to use the schema for the identification of non-biological materials together with the taxonomic identification of an organism. This is used, inter alia, to describe a specimen consisting of a substrate and the organism, e.g. a certain rock type and a lichen crust on that rock. Currently there is only a single text element (Material identified) to accommodate this type of data. However, expecially for collections from geo-sciences, ABCD will be expanded to include full cover of the "taxonomy" used in these natural history sciences. For the time being, a temporary Extension can be made using the respective element typed as xs:any.
d237 1
a237 1
The "Full Scientific Name String" element is one of the few mandatory elements in ABCD (if a taxonomic identification is provided at all). This holds the concatenated scientific name, preferably formed in accordance with a named Code of Nomenclature. It should thus be a monomial, bionomial, or trinomial plus author(s) or author team(s) and, where relevant, the year. It could also hold the name of a cultivar or cultivar group, or of a hybrid formula.
d241 1
a241 1
In addition to the Full Scientific Name element described above, a structure is provided for recording names in a fully atomised form, adapted to each of the four main naming conventions (Codes) - for botanical, zoological, bacterial and viral names. The structure is completed with elements for an identification qualifier, where doubts may exist about the accuracy of the identification, and an element for a name addendum such as "sensu lato".
d243 2
a244 1
Both, Full scientific name and the atomised structure are also used for the typified name in the section of the schema treating nomenclatural type designations (see under Specimen Collections below).
d248 1
a248 1
Most of the data handled by ABCD is common to all the subject domains, both in collections and observations. However, there is some data that is very specific to certain domains, such as the morphotype of a lichen. This section of ABCD provides a place for such data so that specialists may easily identify which subsections are relevant to their data and which are not.
d250 1
a250 1
This section is also used to accommmodate domain specific standard data, some of which may be characterised as legacy data but which is still provided or used in specialised networks.
d252 1
a252 1
ABCD can be extended into new domains by the creation of an additional domain-specific section. Temporary additions should make use of the Unit Extension feature.
d256 3
a258 1
Most data about observation records is under the Gathering group of elements, since observers work with place as priority rather than taxon. However, there is a place here for numbers or other registration marks which may be associated with an observation record as the equivalent of the Accession Number used with specimen collections.
d262 1
a262 1
Data on ownership of the physical specimen (as opposed to the data record), including ownership history, acquisition and accession can be placed here, along with information on the type status of the specimen, preparation technique and details of any markings and labels.
d264 1
a264 1
The type section of the schema provides information on the status and kind of nomenclatural type, but also allows full documentation of the verification process of the type status.
d266 1
a266 1
In addition, each collection has a set of elements for holding data that are specific to the content of the collection. As mentioned under Collection Domain-specific Items above, ABCD version 2.0 provides containers for the following specialisms: Culture Collections, Mycological and Lichenological Collections, Herbaria, Botanic Gardens, Plant Genetic Resources, Zoological Collections and Palaeontological Collections.
d270 1
a270 1
The main difference between measurements and facts is that measurements are numeric whilst facts are textual. These are treated generically, rather than providing an individual place for everything that could be measured, with one or two exceptions that are noted later. The atomised version of measurements/facts captures essentials such as what is being measured, by what method, using which units and so on, as well as the actual value or value range. A free-text alternative is available if the data is not easily available in atomised form.
d276 1
a276 1
These are the measurements or facts taken at the collection locality at the time of collection, such as water or air temperature, slope, weather conditions etc. Separate elements are available for Altitude, Depth and Height, due to the complex relationships between these. Biotype measurents or facts allow to link all biotope-related measurement to the site description.
d280 1
a280 1
All other measurements and facts relate directly to the Unit which is the subject of the record.
d284 1
a284 1
A container is provided for sequence data, thus providing the ability to link bioinformatics data back to the specimen from which the DNA was derived. ABCD thus unites data from three worlds - those of collections, observations and molecular biology.
d292 1
a292 1
Pointers to additional material that relates to the unit may be placed in the Multimedia and References group. Elements are available for the URI of either a "raw" file or a rendered product, such as an HTML or Javascript resource. The relationship between the resource and the unit in this record can be recorded in a context element. Further elements are available for recording technical data, expecially for digital images. The subgoups are:
d300 1
a300 1
Literature can be referenced in several places in the schema. As mentioned before, the data for the entire unit record may have been extracted from literature, but it is also possible to record instances in which the specimen was cited in literature. The key and/or description used in an identification event my be referenced there, same as an identification taken from literature. All measurments and facts can be related to a publication, including molecular sequences. Finally, the nomenclatural reference to the original description(s) of a taxon are recorded within the type designation section.
d310 1
a310 1
The Agents group of elements contains information about the persons and organisations that are associated with collections and observations and their roles. This is an example of a re-usable set of elements that occurs in several places within ABCD. Contact details, such as address, telephone number and email, may be recorded here with the permission of the subject.
d314 1
a314 1
The Gathering group of elements provides places for a comprehensive set of data about the event and place of collection or observation, including agents, date and georeference coordinate systems. Provision is made for the use of GML (Geographic Markup Language), WMS (Web Map Server) and WFS (Web Feature Server) data, based on the standards promoted by the Open GIS Consortium.
d334 1
a334 1
The Unit Extension is a temporary home to accommodate urgent inter-version additions to the ABCD schema. For example, if a specific community (e.g. culture collections) discover that there are elements missing in the current version of ABCD, they may communicate that to the group responsible for schema development. If the it is necessary to move rapidly, for example due to project pressures, these elements may be added to the current version as an extension schema until the best placement for them has been decided.
d338 1
a338 5
The final group is Other, which contains data that does not fit anywhere else.
__ Notes __
Notes may contain any text that is relevant to this unit that cannot be placed elsewhere within the record.
@
1.32
log
@Revision 32
@
text
@d17 1
a17 1
A design goal of the data specification was to be both comprehensive and general, to include a broad array of concepts that might be available in a collection database, but to mandate only the bare minimum of elements required to make the specification functional. ABCD deliberately does not cover taxonomic data, such as synonymy, other than the use of names in identifications. Likewise, taxon-related information, such as distribution range, indicator values, etc., is also not included. The elements and concepts that are used provide as much compatibility as is possible with other standards in the field of biological collection data, such as HISPID, Darwin Core, and others.
d26 1
a26 1
1 Full coverage approach:ABCD is comprehensive and therefore complex. It explicitly aims to define the semantics of all elements, in order to:
d67 1
a67 1
Development of the ABCD content definition started after the 2000 meeting of the*Taxonomic Databases Working Group (TDWG)* in Frankfurt/Main, where the decision was made to specify both a protocol and a data structure to enable interoperability of the numerous heterogenous biological collection databases then available. As a consequence, the TDWG/CODATA subgroup on*Access to Biological Collection Data (ABCD)* was established, with one sub-section working on the search and retrieval protocol, *DiGIR*, and a second working on a comprehensive specification for biological collection data (the ABCD data standard).
d76 1
a76 1
* Third workshop, Indaiatuba, Brasil (October 2002)
d94 1
a94 1
In 2002 the ABCD Schema was accepted by GBIF. A protocol supporting ABCD was provided by the BioCASE reference implementation in 2003, and in October, GBIF decided to integrate the BioCASE network into the nascent GBIF network along with the DiGIR protocol and Darwin Core. The primary difference between the two standards is that DiGIR handles only flat schemas, such as Darwin Core, whereas the BioCASe protocol can handle structured schemas, such as ABCD. Darwin Core is ideal for resource disovery purposes in particular, while ABCD records hold the additional data that may be required by researchers once a selection has been made.
d104 1
a104 1
ABCD version 2 is a proposed TDWG standard, which has been recommended for ratification by the annual TDWG meeting in September 2005. If ratified by the TDWG membership, this will be the version that GBIF will promote for use globally. If further changes become necessary, they will also be proposed through TDWG as new versions. However, because its inherent mechanisms for extensions allows for preliminary accomodation of new elements, we think that v. 2.06 can be kept stable and in use for a reasonable time period. We expect that the next major version will form part of a system of biodiversity data standards, with common modules across several TDWG standards, e.g. for metadata, images, agents, and bibliography.
d152 1
a152 1
Here we can distinguish several areas, but most of these do not show up in the actual structure, because ABCD 2.0 avoids using container elements that serve only to group items together:
d170 1
a170 1
---++ The ABCD v2.0 Element Groups
d178 3
a180 1
Items belonging to this group may change in future versions in order to harmonise with the suite of standards that is being developed through TDWG. Areas that are common to more than one standard, such as metadata, will be adjusted so that they are the same in each. The framework for this is currently under development as UBIF - the Unified Biosciences Information Framework.
a181 1
The are five subgroups for metadata:
d185 1
a185 5
Identifiers are the names or codes that identify data objects or physical objects. Codes include record identifiers or keys, institution codes, accession numbers and collector's field numbers. Names include named collections, but for personal or corporate names see under Agents. Identifiers include:
* a (currently optional) globally unique identifier (GUID). Discussions underway at TDWG indicate that this may be an LSID (Life Science Identifier), a development from the bioinformatics domain, which would have the benefit of linking collection, observation and sequence data.
Apart from the GUID, each unit record contains four identifier elements, three of which are mandatory. These are:
d187 1
a187 1
* an identification code for the source insitution
d189 1
a189 1
* an identification code for the data source that is unique within the source institution
d191 4
a194 1
* an identification code for the unit record within the data source
d196 2
a197 1
In the interim, a GUID can be synthesised from the hierarchy of these three mandatory elements.
a198 1
* Optionally, if the unit ID is alphanumeric, the numeric part can be separately placed in the unit ID numeric element for sorting purposes.
d204 1
a204 1
__ Basis and Kind __
@
1.31
log
@Revision 31
@
text
@d71 1
a71 1
An early achievement of the working group had been to bring together existing networks on specimen information to discuss common access, namely ENHSIN, ITIS, ITIS-CA, REMIB, Species Analyst, speciesLink, and the Virtual Australian Herbarium. The discussion that had been started during the TDWG meeting in Frankfurt (2000) was picked up in a sequence of workshops:
d87 1
a87 1
By the time of the informal meeting in Sydney, March 2002, the European*BioCASE* project had started. Its schema definition group (The Natural History Museum in London and the Botanic Garden und Botanical
d90 1
a90 1
The priority was to develop a consensus about which elements should be included. The annotation tag was structured to hold metadata about each element and a schema-viewer, developed in Berlin, was established to allow XML non-specialists to browse the schema and view the annotations in a structured way.
d92 1
a92 1
The ENHSIN and BioCASE projects drove the process during 2001/2002, providing drafts that were discussed during TDWG and other meetings and which were exposed in a Request for Comment process. An editorial meeting sponsored by the*Global Biodiversity Information Facility (GBIF)* held in December 6-9 in Singapore, led to the version currently used in reference implementations (v. 1.2).
d94 3
a96 1
In 2002 the ABCD Schema was accepted by GBIF. A protocol supporting ABCD was provided by the BioCASE reference implementation in 2003, and in October, GBIF decided to integrate the BioCASE network into the nascent GBIF network along with the DiGIR protocol and Darwin Core. The primary difference between the two standards is that DiGIR handles only flat schemas, such as Darwin Core, whereas the BioCASe protocol can handle structured schemas, such as ABCD. Darwin Core is ideal for resource disovery purposes in particular, while ABCD records hold the additional data that may be required by researchers once a selection has been made.
d104 1
a104 3
ABCD version 2 is a proposed TDWG standard, which will be voted on at the annual meeting in September 2005. If accepted, this will be the version that GBIF will promote for use globally. If further changes become necessary, they will also be proposed through TDWG and result in a version increment.
The main changes are likely to be extensions for use in new domains, refinement of domain-specific elements and support for the modularisation of TDWG standards. The modularisation process is for element groups that are common across several of the TDWG standards, such as in metadata, images and agents.
@
1.30
log
@Revision 30
@
text
@d24 5
a28 4
1 Full coverage approach
ABCD is comprehensive and therefore complex. It explicitly aims to define the semantics of all elements, in order to:
* Provide a unified approach for the natural history collection community
* Accept detailed information, where available
d31 1
a31 2
2. Polymorphism
Variable atomisation allows provision of data in different degrees of detail and standardisation, in order to:
d35 1
a35 2
3. (Almost) no internal referencing
A single-root document without relational structures that use IDs - to make processing easier and faster.
d37 1
a37 2
4. Extensible
Slots for extensions are not meant for individualised adaptations of the schema, but instead to allow:
d41 1
a41 4
5. Flexible containers
Element-element or element-attribute couples for category-value pairs allow freely defined and repeatable data fields (e.g., higher taxa, measurements, morphological features). In addition, there is often provision for free-text data where it is impractical to provide atomised data.
6. No recursive structures
d43 1
a43 2
7. Machine-readable annotations
Structured element annotations will permit their evaluation by program tools (e.g. a semantic search by the Configuration Assistant)
d45 1
a45 2
8. Language support
Language can be be made explicit for most text elements.
d47 1
a47 2
9. Typing
The use of complex types and the deposition in a common type library allows type-sharing with other communities (e.g. Structure of Descriptive Data (SDD))
d49 1
d55 4
a58 4
* Living collections, like botanical and zoological gardens, aquaria, seed banks, microbial strain cultures and tissue collections
* Data collections, from surveys of objects in the field, such as observations, floristic and faunistic mapping and inventories
* Sequences produced by molecular biologists
d63 1
a63 1
It is estimated that between 2 and 3 billion objects exist in natural history collections alone. Currently, this knowledge base is largely under-utilized, because it is highly distributed, heterogeneous, and complex scientific nature obstructs efficient information retrieval.
d67 1
a67 1
Development of the ABCD content definition started after the 2000 meeting of the*Taxonomic Databases Working Group (TDWG)* in Frankfurt/Main, where the decision was made to specify both a protocol and a data structure to enable interoperability of the numerous heterogenous biological collection databases. As a consequence, the TDWG/CODATA subgroup on*Access to Biological Collection Data (ABCD)* was established, with one sub-section working on the search and retrieval protocol, *DiGIR*, and a second working on a comprehensive specification for biological collection data (the ABCD data standard).
d74 2
a75 2
* Second workshop, Sydney ( November 2001)
* Informal meeting Sydney (March 2002)
d77 1
a77 1
* Fourth workshop in Oeiras, Portugal (October 2003)
d79 1
a79 1
The first workshop in Santa Barbara produced an XML DTD, using a combination of top-down conceptualisation (and organisation) and bottom-up use of existing relevant specifications, such as the*BioCISE* information model, and the TDWG endorsed standards, *HISPID* and *ITF*.
d81 1
a81 1
In preparation for the second workshop in Sydney, this DTD was transformed into an XML schema and extended by elements from the BioCISE information model and the British NBN/Recorder model. For part of this meeting, the option of using the Gathering Event rather than the Collection Unit as the root concept of the hierarchical data structure was discussed, since observations data is usually organized by place and time first and then by taxon.
d83 1
a83 1
Nevertheless, the decision was made to stay with the structure that uses Collection Unit as the root concept for two reasons:
@
1.29
log
@Revision 29
@
text
@d3 7
@
1.28
log
@Revision 28
@
text
@d8 1
a8 2
All of the world's biological collections contain a number of data items including specimen specific (e.g. taxon, date, altitude, sex) and collection specific (e.g. holding institution) elements. The set of elements used varies from collection to collection and thThis section may change in future versions in order to harmonise with the other data standards in the suite of standards that is being developed through TDWG. Areas that are common to more than one standard, such as metadata, will be adjusted so that they are the same in each. The framework for this is currently under deA container allowing to linking any biotope-related measurement to the site descriptvelopment as UBIF - the Unified Biosciences Information Framework.This section may change in future versions in order to harmonise with the other data standards in the suite of standards that is being developed through TDWG. Areas that are common to more than one standard, such as metadata, will be adjusted so that they are the same in each. The framework for this is currently under development as UBIF - the Unified Biosciences Information Framework.ere is no widely adopted standard for a common set of elements.
ABCD provides a reconciled set of element names and their definition for scientists and curators to use. It is not expected (or even possible) for any collection to use more than a fraction of the elements defined in the standard.
a13 1
@
1.27
log
@Revision 27
@
text
@d1 1
a1 1
---+ A Brief Introduction to the ABCD Schema v2.0
d6 1
a6 1
ABCD - Access to Biological Collections Data - is a common data specification for biological collection units, including living and preserved specimens, along with field observations that did not produce voucher specimens. It is intended to support the exchange and integration of detailed primary collection and observation data.
@
1.26
log
@Revision 26
@
text
@a0 8
$$WALTER$$ I have added in the top-level structure section, based on the slides you sent from Brussels. I think that it is valuable to have this section, but there is a bit of a mismatch in that structure and the one taken from the AbcdSchemaGroups on this wiki. For example, Record Basis and Kind is shown as part of the Metadata Group but the slide shows it (more correctly) as part of the Unit Group. Should it be moved?
$$WALTER$$ I now consider this is a finished first draft and await your comments.
Neil
-----
d316 1
a316 1
The Gathering group of elements provides places for a comprehensive set of data about the place of collection or observation, including agents, date and georeference coordinate systems. Provision is made for the use of GML (Geographic Markup Language), WMS (Web Map Server) and WFS (Web Feature Server) data, based on the standards promoted by the Open GIS Consortium.
a344 4
-----
---++ Example records
@
1.25
log
@Revision 25
@
text
@d16 1
a16 1
All of the world's biological collections contain a number of data items including specimen specific (e.g. taxon, date, altitude, sex) and collection specific (e.g. holding institution) elements. The set of elements used varies from collection to collection and thThis section may change in future versions in order to harmonise with the other data standards in the suite of standards that is being developed through TDWG. Areas that are common to more than one standard, such as metadata, will be adjusted so that they are the same in each. The framework for this is currently under development as UBIF - the Unified Biosciences Information Framework.This section may change in future versions in order to harmonise with the other data standards in the suite of standards that is being developed through TDWG. Areas that are common to more than one standard, such as metadata, will be adjusted so that they are the same in each. The framework for this is currently under development as UBIF - the Unified Biosciences Information Framework.ere is no widely adopted standard for a common set of elements.
d216 1
a216 1
The Record Basis element provides an indication of what the unit record describes, such as a preserved specimen or a multimedia object. A short list of preferred terms is available for this and for several other elements. Using a term from such a list provides a degree of consistency that makes subset retrieval considerably more accurate. Preferred terms here include PreservedSpecimen, LivingSpecimen, FossileSpecimen, OtherSpecimen, HumanObservation, MachineObservation, DrawingOrPhotograph, MultimediaObject. Note that these categories should also be used if the data were copied from a publication, a fact that can be indicated using the element Source Reference.
d256 2
d274 2
d306 1
a306 1
Photographs, diagrams, sound files and other types of resource.
d310 3
a312 1
The link through to the literature is important, particularly the original description(s) of the unit and may be recorded here.
@
1.24
log
@Revision 24
@
text
@d236 1
a236 1
A unit may be identified by one or more identification events. The Identification Event has two main parts, being the identifications themselves and a free text identification history. For every individual identification event the data include the date; the method; references and verification details. The identifier may be a person or an organisation. A flag can be used to indicate a preferred identification where several events in the history of the specimen took place. Likewise, a negative identification can be flagged as such, same as one of the identifications that is used to indicate where a specimen is stored (a useful feature e.g. for mixed samples or for type specimens). The outcome of the identification event is an indentification result.
d270 1
a270 1
Data on ownership, including ownership history, acquisition and accession can be placed here, along with information on the type status of the specimen, preparation technique and details of any markings.
d272 1
a272 1
In addition, each collection has a set of elements for holding data that is specific to the content of the collection. ABCD version 2.0 provides containers for the following specialisms: Culture Collections, Mycological and Lichenological Collections, Herbaria, Botanic Gardens, Plant Genetic Resources, Zoological Collections and Palaeontological Collections.
d276 1
a276 1
The main difference between measurements and facts is that measurements are numeric whilst facts are textual. Measurements are treated generically, rather than providing an individual place for everything that could be measured, with one or two exceptions that are noted later. The atomised version of measurements captures essentials such as what is being measured, by what method, using which units and so on, as well as the actual value or value range. A free-text alternative is available if the data is not easily available in atomised form, or is a statement of fact.
d278 1
a278 1
Measurements and Facts appear at two places in ABCD:
d282 1
a282 1
These are the measurements or facts taken at the collection locality at the time of collection, such as water or air temperature, slope, weather conditions etc. Separate elements are available for Altitude, Depth and Height, due to the complex relationships between these.
@
1.23
log
@Revision 23
@
text
@d210 1
a210 1
Further identifiers in the schema include Multimedia object ID (e.g. for images), Collector's field number, Observation unit identifier, Unit assemblage ID, Named collection or survey including the unit, and Specimen loan identifier.
d236 1
a236 1
The Identification Event has two main parts, being the identification itself and a free text identification history. The identification data includes the date; various flags; the method; references and verification details. The identifier may be a person or an organisation.
d240 1
a240 1
In contrast to Darwin Core, ABCD handles higher taxa through a repeatable element pair, one for name and one for rank.
d242 1
a242 1
The "Full Scientific Name String" element is one of the few mandatory elements in ABCD. This holds the concatenated scientific name, preferably formed in accordance with a named Code of Nomenclature. It should thus be a monomial, bionomial, or trinomial plus author(s) or author team(s) and, where relevant, the year. It could also hold the name of a cultivar or cultivar group.
d244 1
a244 1
If the name does not conform to a Code of Nomenclature, provision is made for it to be recorded as an informal name.
d246 1
a246 1
__ Taxonomic Result __
d248 1
d250 1
a250 1
In addition to the Full Scientific Name element described above, a structure is provided for recording names in a fully atomised form, adapted to each of the four main naming conventions - for botanical, zoological, bacterial and viral names. The structure is completed with elements for an identification qualifier, where doubts may exist about the accuracy of the identification, and a name addendum such as "sensu lato".
d252 1
a252 1
__ Non-taxonomic Result __
d254 1
a254 1
The Material Identified element may contain a description of the substrate used for a micro-organism culture, the rock in which a fossil was preserved or other materials that are not part of the life science domain.
d260 3
a262 1
This is one of the features of ABCD that make it extensible into new domains, by the creation of an additional domain-specific section.
@
1.22
log
@Revision 22
@
text
@d216 1
a216 3
The Record Basis element provides an indication of what the unit record describes, such as a preserved specimen or a multimedia object. A short list of preferred terms is available for this and for several other elements. Using a term from such a list provides a degree of consistency that makes subset retrieval considerably more accurate.
If the record is based on a publication, the the element Source Reference serves this purpose.
d232 1
a232 1
UDDI (Universal Description, Discovery, and Integration) is a platform-independent, XML-based directory that enable businesses worldwide to list themselves and their services on the Internet. The relevant ABCD elements are Technical Contacts, Content Contacts and Other Sources.
@
1.21
log
@Revision 21
@
text
@d210 4
@
1.20
log
@Revision 20
@
text
@a199 2
In the interim, a GUID can be synthesised from the hierarchy of the three mandatory elements:
d206 2
@
1.19
log
@Revision 19
@
text
@d16 1
a16 1
All of the world's biological collections contain a number of data items including specimen specific (e.g. taxon, date, altitude, sex) and collection specific (e.g. holding institution) elements. The set of elements used varies from collection to collection and there is no widely adopted standard for a common set of elements.
d194 1
a194 1
Identifiers are the names or codes that identify data objects or physical objects. Codes include record identifiers or keys, institution codes, accession numbers and collector's field numbers. Names include named collections, but for personal or corporate names see under Agents.
d196 1
a196 1
Each unit record contains five identifier elements, three of which are mandatory. These are:
d198 1
a198 1
* a (currently optional) globally unique identifier (GUID). Discussions underway at TDWG indicate that this may be an LSID (Life Science Identifier), a development from the bioinformatics domain, which would have the benefit of linking collection, observation and sequence data.
@
1.18
log
@Revision 18
@
text
@d120 1
a120 1
The ABCD schema is highly structured in order to manage the large quantity of data that a record may contain. The data elements are arranged in logical groups, each of which is outlined below.
d130 1
a130 1
- Meatdata
d134 1
a134 1
From this it can be seen that a package may contain records from several datasets, each of which is treated separately within the package. Each dataset has a Globally Unique Identifier (GUID) along with information about who may be contacted for further details, for the content of the dataset and for technical information.
d136 1
a136 1
There are then two major groups, one holding metadata about the dataset and the other holding the actual data records.
d138 1
a138 1
The Metadata Group holds information about an entire dataset and has the following structure:
d156 1
a156 1
The second major group, called UNITS, holds all the records selected and exported from the original dataset, each one of which is a UNIT. This is by far the largest component of ABCD and has the following high-level structure:
d162 3
a164 1
- Identifiers AbcdIntroduction Editor AbcdIntroduction Owner AbcdIntroduction IPR (where different from that of the whole dataset) AbcdIntroduction Content contact AbcdIntroduction References
d176 1
a176 1
- Measurements and Facts AbcdIntroduction Age AbcdIntroduction Sex AbcdIntroduction Sequence data AbcdIntroduction Notes AbcdIntroduction Record URI
d182 2
d186 1
a186 1
The Metadata group of elements holds data about the entire package, consisting of one or more datasets. Here is recorded information such as who created the dataset, from what source and on which date, along with the Intellectual Property Rights (IPR) and other statements that govern the usage of the data.
d188 1
a188 1
This section may change in future versions in order to harmonise with the other data standards in the suite of standards that is being developed through TDWG. Areas that are common to more than one standard, such as metadata, will be adjusted so that they are the same in each. The framework for this is currently under development as UBIF - the Unified Biosciences Information Framework.
@
1.17
log
@Revision 17
@
text
@a2 2
$$WALTER$$ I sent by email my first attempt at mapping OBIS to ABCD2 - did you receive it ok?
d38 1
a38 1
3. No internal referencing
d52 1
a52 1
Structured element annotations permit their evaluation by program tools (e.g. a semantic search by the Configuration Assistant)
d55 1
a55 1
Language can be be made explicit for most text elements. [[A][similar approach towards scripts is in preparation.]]
d57 1
a57 1
9. Strong typing
d75 1
a75 1
Databasing and networking is now seen as the key to unlocking the value of biological collections for science, government, education, the public, and businesses, operating in the environmental sector, including land management; in biotechnology or in biodiversity research. Efforts to network the resources exist, but there is little transfer of technology and co-ordination on a global level. International collaboration on the standardization of information models and standard data used in collection databases can enhance the efficiency of this process.
d89 1
a89 1
The first workshop in Santa Barbara produced an XML DTD, using a combination of top-down conceptualisation (and organisation) and bottom-up use of existing relevant specifications, such as the*BioCISE* information model, and the TDWG endorsed standard, *HISPID*.
d95 1
a95 1
2. Collection databases implemented as flat data structures (of which there are many) will not easily be able to export a hierarchical dataset with a normalized gathering event as the root concept and therefore will not be able to participate in a federation based on this alternative.
d97 2
a98 2
By the time of the informal meeting in Sydney, March 2002, the European*BioCASE* project had started. Its schema definition group (The Natural History Museum in London and the Botanischer Garten und Botanisches
Museum, Berlin-Dahlem) was to provide a collection-level schema (BioCASE only) and a Unit level schema (CODATA/TDWG and BioCASE), so that this group was able to dedicate personnel resources to the schema definition process.
d102 1
a102 1
The ENHSIN and BioCASE projects drove the process during 2001/2002, providing drafts that were discussed during TDWG and other meetings and which were exposed in a Request for Comment process. An editorial meeting sponsored by the*Global Biodiversity Information Facility (GBIF)* held in December 6-9 in Singapore, led to the version currently used in reference implementations.
d108 1
a108 1
A reference portal implementation is under construction by the Paris BioCASE team. The Berlin team has implemented preliminary interfaces as an intermediate measure. Providers, on the other hand, need recommendations on how to map elements, on preferred points for searches, what to do if an element is empty and so forth. GBIF currently supports a project to produce a configuration assistant in a generic interface to map between database schemas and federation schemas such as ABCD.
d112 1
a112 1
ABCD version 2.0 is a proposed TDWG standard, which will be voted on at the annual meeting in September 2005. If accepted, this will be the version that GBIF will promote for use globally. If further changes become necessary, they will also be proposed through TDWG and result in a version increment.
@
1.16
log
@Revision 16
@
text
@d1 1
a1 1
$$WALTER$$ I would appreciate an indication of how comprehensive the information should be under each of the headings. My view is that, as an introduction, it should just give an indication of the purpose of the element groups and that detailed information should be held elsewhere, derived from Yael's work Is that correct?
d3 1
a3 3
I hope all is well with you and that you enjoyed your trip,
Neil
d5 1
a5 3
$$WALTER$$ I have added in the top-level structure section, based on the slides you sent from Brussels. I think that it is valuable to have this section, but there is a bit of a mismatch in that structure and the one taken from the ABCDSchemaGroups on this wiki. For example, Record Basis and Kind is shown as part of the Metadata Group but the slide shows it (more correctly) as part of the Unit Group. Should it be moved?
$$WALTER$$ I sent by email my first attempt at mapping OBIS to ABCD2 - did you receive it ok?
d40 1
a40 1
3. Avoid referencing
d51 1
a51 1
6. No recursive structures/hierarchies
a62 1
d69 1
d99 2
a100 1
By the time of the informal meeting in Sydney, March 2002, the European*BioCASE* project had started. Its schema definition group (NHM and BGBM) was to provide a collection-level schema (BioCASE only) and a Unit level schema (CODATA/TDWG and BioCASE), so that this group was able to dedicate personnel resources to the schema definition process.
d106 1
a106 1
In 2002 the ABCD Schema was accepted by GBIF. A protocol supporting ABCD was provided by the BioCASE reference implementation in 2003, and in October, GBIF decided to integrate the BioCASE network into the nascent GBIF network along with the DiGIR protocol and Darwin Core. The primary difference between the two standards is that DiGIR handles only flat schemas, such as Darwin Core, whereas the BioCASe protocol can handle structured schemas, such as ABCD.
d108 1
a108 1
At the fourth Workshop in October 2003, a major point of discussion was the need for more guidance, on user-interface as well as provider level: the very broad coverage of ABCD leaves it to the user to determine how to map their data. The structure looks too complicated for the average user and should thus be hidden from them. It will be the task of programmers to reassemble the different uses of the structure into a presentation layer that supports user requirements.
d116 2
d140 1
a140 1
The Metadata Group also holds information about the dataset as a whole and has the following structure:
d192 1
a192 1
Identifiers are the names or codes that identify data objects or physical objects. Codes include record identifiers or keys, institution codes, accession numbers and collector's field number. Names include named collections, but for personal or corporate names see under Agents.
d210 1
a210 1
The Record Basis element provides an indication of what the unit record describes, such as a preserved specimen or a multimedia object. A short list of preferred terms is availble for this and several other field. Using a term from such a list provides a degree of consistency that makes subset retrieval considerably more accurate.
d214 1
a214 1
The Kind of Unit element descibes the part(s) of organism or class of materials represented by the unit, such as whole organism, DNA, fruit and so forth. Terms should be chosen from the short list of preferred terms for consistency.
d220 1
a220 1
Versioning refers to the numbering and date of the dataset version, which may be used in citations and to determine currency of the data. The creator of a dataset or record and the date of creation is recorded and never changes. Provision is also made to identify the date of the most recent edit and the name of the editor again as an indicator of data currency.
d224 1
a224 2
$$POINTS$$ _Language attributes valuable for sorting & searching purposes. The UTF-8 and UTF-16 encodings of Unicode are both valid for XML. If your data is not in
UTF-8 or UTF-16, XML requires that you identify your character set._
d232 1
a232 1
The Identification Event has two main parts, the identification itself and a free text identification history. The identification data includes the date, various flags, the method, references and verification details. The identifier may be a person or an organisation.
d236 3
a238 1
In contrast to Darwin Core, ABCD handles higher taxa through a repeatable structure with a pair of elements, one for name and one for rank. The "Full Scientific Name String" element is one of the few mandatory elements in ABCD. This holds the concatenated scientific name, preferably formed in accordance with a named Code of Nomenclature. It should thus be a monomial, bionomial, or trinomial plus author(s) or author team(s) and, where relevant, the year. It could also hold the name of a cultivar or cultivar group.
d244 1
d263 1
a263 1
Data on ownership, including ownership history, acquisition and accession can be placed here, along with information on the type status of the specimen, markings and preparation technique.
d265 1
a265 1
In addition, each type of collection has a set of elements for holding data that is specific. ABCD version 2.0 provides containers for the following specialisms: Culture Collections, Mycological and Lichenological Collections, Herbaria, Botanic Gardens, Plant Genetic Resources, Zoological Collections and Palaeontological Collections.
d269 1
a269 1
The main difference between measurements and facts is that measurements are numeric whilst facts are textual. Measurements are treated generically, rather than providing a place everything that could be measured, with one or two exceptions that are noted later. The atomised version of measurements captures essentials such as what is being measured, by what method, using which units and so on, as well as the actual value or value range. A free-text alternative is available if the data is not easily available in atomised form, or is a statement of fact.
d275 1
a275 1
These are the measurements or facts taken at the collection locality at the time of collection, such as (water-) temperature, slope, weather conditions etc. Separate elements are available for Altitude, Depth and Height, due to the complex relationships between these.
d279 1
a279 1
All other measurements and facts relate directly to the Unit which is the subject of the record and are placed here.
d281 1
a281 1
__ Moleular Sequence Data __
d283 1
a283 1
A container is provided for sequence data, thus providing the ability to tie bioinformatics data back to the specimen from which the DNA was derived. ABCD thus unites data from three worlds - those of collections, observations and molecular biology.
d287 1
a287 1
The final subgroup covers stages, such as egg, larval or adult, age and sex.
d303 1
a303 1
This relates to the Web address of the page where the original of this particular record in its database can be found, rather than the address of the whole dataset which is available in the Metadata group of elements under metadata description representation.
d311 1
a311 1
The Gathering group of elements provides places for a comprehensive set of data about the place of collection or observation, including agents, date and georeference coordinate systems. Provision is made for the use of GML (Geographic Markup Language), WMS (Web Map Server) and WFS (Web Feature Server) data, based on the standards propmoted by the Open GIS Consortium.
d321 1
a321 1
Associations are the relationships of this unit with other units in ABCD conformant datasets, using the institution ID, database ID and identifier for the record within the database. The type of association can be recorded, such as host and parasite, predator and prey etc.
@
1.15
log
@Revision 15
@
text
@d1 1
a1 1
$$WALTER$$ I don't really consider any of the pages to be near finished, but you are of course welcome to take what is here with that proviso.
d3 1
a3 1
I would appreciate an indication of how comprehensive the information should be under each of the headings. My view is that, as an introduction, it should just give an indication of the purpose of the element groups and that detailed information should be held elsewhere, derived from Yael's work Is that correct?
d5 1
a5 1
Current task is to trawl through the Comments Commented document for additional points that are inportant for the Introduction document.
d7 1
a7 1
This week, I will also do the mapping to OBIS and polish the text in this Introduction on a intermittent basis.
d9 1
a9 1
I hope all is well with you and that you enjoyed your trip,
d121 61
a181 1
---++ Element Groups
@
1.14
log
@Revision 14
@
text
@d1 1
a1 1
$$WALTER$$ I don't really consider any of the pages to be near finished, but you are of course welcome to take what is there with that proviso. I was out of the Museum again yesterday, but have completed preliminary text for all but two of the group/subgroups.
d5 1
a5 1
Next week, I will do the mapping to OBIS and polish the text in this Introduction.
d7 3
a9 1
I hope all is well with you and that you enjoy your trip,
d125 5
a129 1
The Metadata group of elements holds data about the entire package, consisting of one or more datasets. Here is recorded information such as who created the dataset, from what source and on which date, along with the Intellectual Property Rights (IPR) and other statements that govern the usage of the data. The are five subgroups for metadata:
d135 14
d178 1
a178 1
In contrast to Darwin Core, ABCD handles higher taxa through a repeatable structure with a pair of elements, one for name and one for rank. The "Full Scientific Name String" element is one of the few mandatory elements in ABCD. This holds the concatenated scientific name, preferably formed in accordance with a named Code of Nomenclature. It should thus be a monomial, bionomial, or trinomial plus author(s) or author team(s) and - where relevant - the year. It could also hold the name of a cultivar or cultivar group.
d184 1
a184 1
In addition to the Full Scientific Name element described above, a structure is provided for recording names in a fully atomised form, adapted to each of the four main naming conventions - for botanical, zoological, bacterial and viral names. The structure is completed with elements for an identification qualifier, where doubts may exist, and a name addendum such as "sensu lato".
d188 1
a188 1
$$POINTS$$ _Material Identified ?? _
d262 2
d266 1
a266 1
A unit assemblage describes symmetric relationships between several units, such as herds and flocks or several fossils embedded in a rock.
@
1.13
log
@Revision 13
@
text
@d1 1
a1 1
$$NEIL$$ I'll be in Brussels from Monday on, meeting all today. Can you give me those pages you consider +/- finished, so that I can copy them and take them with me? I don't know how good the Internet connection at the meeting will be.
d3 7
a9 1
Walter
d172 1
a172 3
$$POINTS$$ _Extensibility for specific domains. To handle data items that are unique within a domain. List existing domain coverage _
__ Specimen Collections __
d174 1
a174 1
$$POINTS$$ _ _
d178 1
a178 1
$$POINTS$$ _ _
d180 1
a180 1
__ Culture Collections __
d182 1
a182 1
$$POINTS$$ _ _
d184 1
a184 1
__ Mycological and Lichenological Collections __
d186 1
a186 7
$$POINTS$$ _ _
__ Herbaria __
$$POINTS$$ _ _
__ Botanic Gardens __
d188 1
a188 1
$$POINTS$$ _ _
d190 1
a190 1
__ Plant Genetic Resources __
d192 1
a192 1
$$POINTS$$ _ _
d194 1
a194 15
__ Zoological Collections __
$$POINTS$$ _ _
__ Palaeontological Collections __
$$POINTS$$ _ _
---+++ Measurements and Facts
$$POINTS$$ _Generic treatment of measurements. Facts for text-based rather than numeric-based detail. _
__ Gathering-related Measurements and Facts __
$$POINTS$$ _ "Site Measurements and Facts". Altitude AbcdIntroduction depth AbcdIntroduction height avaiable separately _
d198 1
a198 1
$$POINTS$$ _All others - examples for clarification. Poosibly include "Age, stage and sex" - see below. _
d202 1
a202 1
$$POINTS$$ _Container for sequence data - tying the bioinformatics data back to the specimen from which it was derived. ABCD thus unites data from 3 worlds - collections AbcdIntroduction observations AbcdIntroduction molecular. _
d206 1
a206 2
$$Neil$$ _Should this be moved into "Unit-related Measurements and Facts"? It seems a bit odd on its own. _
Well, I think we can decide this later, but to move it we would have to introduce two subheadings under the group (for those covered by the general type and for these specific ones), which would look odd as well. Zoologists and Mycologists are quite particular about this point, so I'd opt to keep it separate for the time being.
d210 1
a210 1
$$POINTS$$ _Additional material relating to the unit_
d214 1
a214 1
$$POINTS$$ _Photographs, diagrams, sound files ... _
d218 1
a218 1
$$POINTS$$ _Link through to the literature, particularly the original description(s) of the unit. _
d222 1
a222 1
$$POINTS$$ _Web address of the page where more information on this particular record (not on the whole dataset) can be found. _
d226 1
a226 1
$$POINTS$$ _ Persons, organisations and roles. Example of a re-usable Type _
d230 3
a232 1
$$POINTS$$ _Agents, date and place. Coordinate systems. Provision for use of GML AbcdIntroduction WMS and WFS data. _
d236 1
a236 1
$$POINTS$$ _Realtionships _
d240 1
a240 1
$$POINTS$$ _Relationships of this unit with other units in ABCD conformant datasets. Give examples _
d244 1
a244 1
$$POINTS$$ _A unit assemblage handles symmetric relationships between several units. Give examples _
d248 1
a248 1
$$POINTS$$ _The extension is temporary and serves only to accommodate urgent inter-version additions to the unit-object schema [[in][future to the the various schemas/objects used in ABCD]]. For example, if a specific community (e.g. culture collections) discover that there are elements missing in the current version of ABCD (as they did for 1.2), they communicate that to the group responsible for schema development, where it will be discussed if these elements are already accommodated in the schema, if not, if they can and should be generalized and where they should end up in the schema. IF the culture collection community needs to move rapidly (e.g. due to project pressures), they can add these elements to the current version as an extension schema under a structure that follows SDD's CustomExtensions element (CustomExtensions-CustomExtension-Any ##other). In future modular versions of ABCD, such a structure should exist in the root of every object/schema. _
d252 1
a252 1
$$POINTS$$ _Data that does not fit anywhere else. _
d256 1
a256 1
$$POINTS$$ _Free-text notes. _
@
1.12
log
@Revision 12
@
text
@d222 2
a223 1
$$WALTER$$ _Should this be moved into "Unit-related Measurements and Facts"? It seems a bit odd on its own. _
@
1.11
log
@Revision 11
@
text
@d1 1
a1 1
$$WALTER$$ I have made a start on the content of the Introduction [[7th][April]]. I have replaced the URLs in the "Background" with the actual text and done some minimal editing on that text. More may be necessary depending on how comprehensive you wish this section to be.
d3 1
a3 5
I have added the new subgroups under "Collection Domain-specific Items".
I have also added some scope notes for the group and subgroup headings. I suggest that we keep these in place for correspondence between us at group/subgroup level. The actual text can be filled in underneath each.
Neil
@
1.10
log
@Revision 10
@
text
@a120 2
$$POINTS$$ _Data about the content of the package, including keywords on scope, creators, sources and IPR _
d125 1
a125 1
$$POINTS$$ _Codes or fields that serve to identify data objects or physical objects, including record identifiers, field and accession ID's and Codes, unique IDs, named collections, collector's field number ... _
d129 3
a131 1
$$POINTS$$ _The Record Basis element provides an indication of what the unit record describes, such as ... Note that the element SourceReference provides for the case that the record is based on a publication. _
d133 1
a133 1
_ Kind of Unit element descibes the part(s) of organism or class of materials represented by this unit._
d137 1
a137 1
$$POINTS$$ _Importance of IPR and proper accreditation. Good provision for declarations about copyright, licencing, terms of use, disclaimers, acknowledgements and citations._
d139 1
a139 1
_Versioning = numbering and date of the dataset version, for citations and to determine currency of the data. Identification of the data creators and date of last modification_
d148 1
a148 1
$$POINTS$$ _Brief description of what UDDI does. Relevant elements are Technical Contacts, Content Contacts and Other Sources _
d152 1
a152 1
$$POINTS$$ _Identification and Identification History._
d154 1
a154 1
_Date, flags, method, references and verification. Person or organisation and role. _
d156 1
a156 1
---+++ Identification Result
d158 1
a158 1
$$POINTS$$ _Handling of higher taxa, scientific names and informal names. _
d162 1
a162 1
$$POINTS$$ _ Detail on scientific names - atomised and by domain _
@
1.9
log
@Revision 9
@
text
@d1 3
a3 1
$$WALTER$$ I have made a start on the content of the Introduction. I have replaced the URLs in the "Background" with the actual text and done some minimal editing on that text. More may be necessary depending on how comprehensive you wish this section to be.
a6 4
I am not in the Museum tomorrow, but I may have the opportunity to do some further work on this at home over the weekend. I cannot promise this, I'm afraid, since I'm going to Cambridge for my parents Diamond (60th) wedding anniversary. I will be able to devote more time to this next week for sure.
Hope everything else is going ok.
d113 1
a113 1
There should be some text on this somewhere (in the Comments Commented section perhaps?)
d123 3
a125 1
---++++ Identifiers
d129 1
a129 1
---++++ Basis and Kind
d135 1
a135 1
---++++ IPR, Versioning, Edit History and other Statements
d137 1
a137 1
$$POINTS$$ _Improtance of IPR and proper accreditation. Good provision for declarations about copyright, licencing, terms of use, disclaimers, acknowledgements and citations._
d141 1
a141 1
---++++ Language and Character Sets
d146 1
a146 1
---++++ UDDI Registry Items
d160 1
a160 1
---++++ Taxonomic Result
d164 1
a164 1
---++++ Non-taxonomic Result
d172 36
d212 1
a212 1
---++++ Gathering-related Measurements and Facts
d216 1
a216 1
---++++ Unit-related Measurements and Facts
d220 1
a220 1
---++++ Moleular Sequence Data
d224 1
a224 1
---++++ Stage, Age and Sex
d232 1
a232 1
---++++ Multimedia
d236 1
a236 1
---++++ Bibliographic References
d240 1
a240 1
---++++ Record URI
d256 1
a256 1
---++++ Associations
d260 1
a260 1
---++++ Assemblages
d272 1
a272 1
---++++ Notes
@
1.8
log
@Revision 8
@
text
@d1 12
d127 1
a127 1
$$POINTS$$ _Codes or fields that serve to identify data objects or physical objects, including record identifiers, field and accession ID's and Codes, unique IDs, named collections, collector's field number _
d131 1
a131 1
$$POINTS$$ _The Record Basis element provides an indication of what the unit record describes, such as ... Note that the element SourceReference provides for the case that the record is based on a publication.
d133 1
a133 1
Kind of Unit element descibes the part(s) of organism or class of materials represented by this unit._
d137 1
a137 1
$$POINTS$$ _Improtance of IPR and proper accreditation. Good provision for declarations about copyright, licencing, terms of use, disclaimers, acknowledgements and citations.
d139 1
a139 1
Versioning = numbering and date of the dataset version, for citations and to determine currency of the data. Identification of the data creators and date of last modification_
d152 3
a154 1
$$POINTS$$
d158 1
a158 1
$$POINTS$$
d162 1
a162 1
$$POINTS$$
d166 1
a166 1
$$POINTS$$
d170 1
a170 1
$$POINTS$$
d174 1
a174 1
$$POINTS$$
d178 1
a178 1
$$POINTS$$
d182 1
a182 1
$$POINTS$$
d186 1
a186 1
$$POINTS$$
d190 1
a190 1
$$POINTS$$
d192 1
a192 1
---+++ Multiedia and References
d194 1
a194 1
$$POINTS$$
d198 1
a198 1
$$POINTS$$
d202 1
a202 1
$$POINTS$$
d206 1
a206 1
$$POINTS$$
d210 1
a210 1
$$POINTS$$
d214 1
a214 1
$$POINTS$$
d218 1
a218 1
$$POINTS$$
d222 1
a222 1
$$POINTS$$
d226 1
a226 1
$$POINTS$$
d230 1
a230 1
$$POINTS$$
d234 1
a234 1
$$POINTS$$
d238 1
a238 1
$$POINTS$$
@
1.7
log
@Revision 7
@
text
@d6 1
a6 1
ABCD is a common data specification for biological collection units, including living and preserved specimens, along with field observations that did not produce voucher specimens. It is intended to support the exchange and integration of detailed primary collection and observation data.
d39 1
a39 2
Element-element or element-attribute couples for category-value pairs allow:
* Freely defined and repeatable data fields (e.g., higher taxa, measurements, morphological features)
d111 2
d115 2
d119 4
d125 4
d131 3
d136 2
d140 2
d144 2
d148 2
d152 2
d156 2
d160 2
d164 2
d168 2
d172 2
d176 2
d180 2
d184 2
d188 2
d192 2
d196 2
d200 2
d204 2
d208 2
d212 2
d216 2
d220 2
d224 2
@
1.6
log
@Revision 6
@
text
@d6 4
a9 1
ABCD is a common data specification for biological collection units, including living and preserved specimens, along with field observations that did not produce voucher specimens. It is intended to support the exchange and integration of detailed primary collection and observation data. All of the world's biological collections contain a number of data items including specimen specific (e.g. taxon, date, altitude, sex) and collection specific (e.g. holding institution) elements. The set of elements used varies from collection to collection and there is no widely adopted standard for a common sets of elements. ABCD provides a reconciled set of element names and their definition for scientists and curators to use. It is not expected (or even possible) for any collection to use more than a fraction of the elements defined in the standard.
a20 1
a26 1
a34 1
a39 1
d51 1
a51 1
The use of complex types and the deposition in a common type library allows type-sharing with other communities (e.g. SDD)
d70 1
a70 1
Development of the ABCD content definition started after the 2000 meeting of the*Taxonomic Databases Working Group* (TDWG) in Frankfurt/Main, where the decision was made to specify both a protocol and a data structure to enable interoperability of the numerous heterogenous biological collection databases. As a consequence, the TDWG/CODATA subgroup on*Access to Biological Collection Data* (ABCD) was established, with one sub-section working on the search and retrieval protocol (DiGIR) and a second working on a comprehensive specification for biological collection data (the ABCD data standard).
d72 1
a72 1
Protocol development resulted in a limited and non-hierarchical set of data elements, named the*Darwin Core*, as a workable specification to be used near-term, whilst the ABCD specification resulted in a comprehensive and highly structured standard for data about objects in biological collections.
d82 1
a82 1
The first workshop in Santa Barbara produced an XML DTD, using a combination of top-down conceptualisation (and organisation) and bottom-up use of existing relevant specifications, such as the BioCISE information model, and the TDWG endorsed standard, *HISPID*.
d86 3
a88 3
Nevertheless, the decision was made to stay with the structure that uses Collection Unit as the root concept for two reasons:
1 The goal at that time was to achieve clarity, universality, completeness, and simplicity in the semantics of the standard
2. Collection databases implemented as flat data structures (of which there are many) will not easily be able to export a hierarchical dataset with a normalized gathering event as the root concept and therefore will not be able to participate in a federation based on this alternative.
d92 1
a92 1
The priority was to develop a consensus about which elements should be included. The annotation tag was structured to hold metadata about each element and a schema-viewer (13), developed in Berlin, was established to allow XML non-specialists to browse the schema and view the annotations in a structured way.
d94 1
a94 1
The ENHSIN and BioCASE projects drove the process during 2001/2002, providing drafts that were discussed during TDWG and other meetings and which were exposed in a Request for Comment process. An editorial meeting sponsored by GBIF (Dec. 6-9 in Singapore) led to the version currently used in reference implementations.
d96 1
a96 1
In 2002 the ABCD Schema was accepted by*GBIF*. A protocol supporting ABCD was provided by the BioCASE reference implementation in 2003, and in October, GBIF decided to integrate the BioCASE network into the nascent GBIF network along with the*DiGIR* protocol and Darwin Core. The primary difference between the two standards is that DiGIR handles only flat schemas, such as Darwin Core, whereas the BioCASe protocol can handle structured schemas, such as ABCD.
d106 2
d166 2
@
1.5
log
@Revision 5
@
text
@d57 45
a101 1
See http://www.bgbm.org/TDWG/CODATA/ABCD-Evolution.htm
@
1.4
log
@Revision 4
@
text
@d14 41
a54 1
See http://www.bgbm.org/TDWG/CODATA/ABCD-DesignPrinciples.htm
@
1.3
log
@Revision 3
@
text
@d6 6
a11 1
See http://www.bgbm.org/TDWG/CODATA/ABCD-Purpose.htm
@
1.2
log
@Revision 2
@
text
@d1 4
d15 1
d17 60
@
1.1
log
@Initial revision
@
text
@d2 1
d5 1
d8 1
d11 1
a11 2
@