---++ GUID-1 Workshop Report

---+++ Introduction

The Taxonomic Databases Working Group (TDWG) and the Global Biodiversity Information Facility (GBIF) completed their first Workshop on Globally Unique Identifiers for Biodiversity Informatics (GUID-1) at the National Evolutionary Synthesis Center (NESCent), Durham, NC, USA on Feb 1-3, 2006. See [[GUID1Minutes]] for a complete set of materials presented during the workshop. Download the [[wiki.gbif.org/guidwiki/images/GUID-1Report.pdf][report in PDF format]].

---+++ Motivation

A GUID framework is foundational to systems interoperability in biodiversity informatics. It meets the need for a universally adopted system for assigning and recognizing identifiers in the domain. A GUID framework will help to manage and cross-link the many different types of entities that are manipulated analytically in biodiversity informatics and will improve interoperability with related life sciences domains, such as bioinformatics and ecology.

---+++ The Group

The workshop delegates were a representative cross-section of domain experts from around the world (see [[GUID1Participants]]).

---+++ Goals

The goals of the workshop were to:
   * Discuss the requirements for globally unique identifiers for biodiversity informatics
   * Select an optimal GUID technology (LSID, DOI, Handles or other)
   * Begin to identify key parameters for implementing an effective system
   * Investigate the use of an RDF-based metadata architecture for GUIDs
   * Form working groups to address key identified issues before the GUID-2 workshop

---+++ Outcomes

   * Life Science Identifiers (LSIDs) seem the most appropriate GUID strategy in biodiversity informatics.
   * The use of LSIDs does not preclude the use of other technologies where appropriate.
   * LSID authorities must use the Domain Name System (DNS) to support identifier resolution. (The LSID specification allows for other resolution mechanisms, but DNS is currently the only mechanism in use.)
   * Although it is not possible to prevent multiple data providers from issuing alternate identifiers that resolve to the same data record, the community should develop processes and tools to coordinate the issuing of single identifiers for some classes of data (e.g. taxon names).
   * Metadata should be provided as RDF serialized as XML and should exploit existing vocabularies such as Dublin Core wherever these are in wide use.
   * The LSID getData method should be used only where it is possible and appropriate to return an unchanging series of bytes. In other cases only the LSID getMetadata method should be used. (This reflects the use of the terms "data" and "metadata" in connection with LSIDs.)

---+++ Justifications

The main criteria leading to the selection of LSID technology were:
   * The cost model of DOI. That technology is predicated on the idea that a revenue stream can be constructed for the identified objects, typically sufficient to defray the cost. This is not the case for most, if not all, of the objects that are likely to be identified in our systems.
   * The more dynamic nature of LSIDs, which do not require prior registration of every individual identifier before use.
   * The open nature of the LSID protocol and software stack, and the ease of implementing LSIDs on different platforms.

---+++ Technology Comparison

The group compared the GUID technologies according to the following criteria:

*Opacity*: Is the identifier free from embedded semantic information? Opacity was identified as a possibly important criterion in that genuinely opaque identifiers could not be used to make false inferences about the object represented by a GUID. Handles, DOIs and LSIDs all include similar levels of embedded information.

*Governance*: Is there a body that monitors the assignment of identifiers? DOI has a more formal governance model for identifiers than the other standards.
Assignment of identifiers is a more strongly contractual matter and all identifier assignment and access is mediated through the DOI registration infrastructure. Several use cases for GUIDs in biodiversity informatics require more dynamic assignment and resolution paths.

*Guaranteed persistent*: Is there any guarantee that identifiers will remain resolvable other than the commitments made by the assigning authority (commitments which must be made regardless of which technology is adopted)? The central DOI infrastructure holds the registered identifiers and makes some commitments to host orphaned data.

*Registration of assigning organisations*: Must institutions register before being permitted to issue identifiers? Issuing authorities for Handles and DOIs are registered centrally. LSID resolvers must be registered in DNS but do not need to be identified to a central LSID authority.

*Registration of identifiers*: Must institutions register each identifier before use? DOIs are only resolvable if they are known to the central authority.

*Metadata*: Do the identifiers have a standard association with metadata? Both DOIs and LSIDs have mechanisms to provide access to metadata.

*Resolvable*: Does the identifier include a mechanism to retrieve the associated metadata and data? Handles, DOIs and LSIDs are all resolvable in this way.

*Globally unique*: Is there a commitment that the identifier will uniquely identify a single object? Handles, DOIs and LSIDs all involve commitments to global uniqueness.

*Relocatable*: Can an organisation's identifiers be transferred for a different organisation to resolve (e.g. upon closure of the issuing institution)? An assigner of Handles, DOIs or LSIDs can pass responsibility for resolution to another resolver organisation.

*Individually relocatable*: Can individual identifiers be transferred for a different organisation to resolve? Individual DOIs may be assigned to other organisations to resolve. This is not possible with LSIDs.

*Open architecture*: Does TDWG have the ability to take over ownership of the standard and software if others stop supporting it? Handle and DOI are both based on proprietary technologies. LSID is based on a more open strategy.

*Affordable*: Is the technology affordable for TDWG, GBIF and its partners? TDWG partners together expect to assign many millions of GUIDs and have no model to fund the cost of DOIs. The cost of licensing Handle technology is unclear. LSIDs will involve costs in development of processes and infrastructure, but TDWG has more control over the process.

---+++ Summary Technology Comparison

The following table includes catalogue numbers and taxon names for comparison as these are examples of identifiers currently in use for data integration in biodiversity informatics.
| *Criterion* | *Catalogue numbers* | *Taxon names* | *Handle* | *DOI* | *LSID* |
| Opaque | +/- | - | - | - | - |
| Governance | +/- | + | - | + | - |
| Guaranteed persistent | N/A | N/A | - | + | - |
| Registration of assigning organisations | - | - | + | + | - |
| Registration of identifiers | +/- | +/- | - | + | - |
| Metadata | - | - | - | + | + |
| Resolvable | - | - | + | + | + |
| Globally unique | - | - | + | + | + |
| Relocatable | - | - | + | + | + |
| Individually relocatable | - | - | - | + | - |
| Open architecture | - | - | - | - | + |
| Affordable | + | + | ? | - | + |
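
As a rough illustration of the DNS-based resolution path that this report describes for LSIDs, the sketch below splits an LSID into its parts and derives the DNS name a client might query to locate the authority's resolver. The =urn:lsid:authority:namespace:object[:revision]= layout and the =_lsid._tcp= SRV label reflect our reading of the LSID specification and are not stated in this report; the example identifier, the =Lsid= type and the function names are hypothetical.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Parts of an LSID (hypothetical helper type for this sketch)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split urn:lsid:authority:namespace:object[:revision] into its parts.

    Assumption: the colon-separated layout above; no further validation
    of the individual components is attempted.
    """
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(authority, namespace, object_id, revision)


def srv_query_name(lsid: Lsid) -> str:
    """Build the DNS SRV name for the LSID's authority.

    Assumption: resolvers are advertised under the _lsid._tcp label.
    DNS names are case-insensitive, so the authority is lower-cased.
    """
    return f"_lsid._tcp.{lsid.authority.lower()}"


if __name__ == "__main__":
    example = parse_lsid("urn:lsid:example.org:names:12345")
    print(example)
    print(srv_query_name(example))
```

Only the authority part is needed to find a resolver, which is consistent with the comparison above: an LSID authority is registered in DNS rather than with a central body, and individual identifiers need no prior registration.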