wiki-archive/twiki/data/GUID/WhatDoWeMeanByGUID.txt

---+++ Topic 1: What do we mean by "GUID"?

----
DonaldHobern on mailing list:

The most fundamental thing that we need to establish as we consider a GUID implementation is a definition for "GUID" in this context.  We have been using a number of terms to describe the identifiers we need (unique, resolvable, persistent, etc.).  

I've been spending some time following up on Rod Page's recommendation that we consider the use of Archival Resource Keys (ARK) from the California Digital Library (see http://wiki.gbif.org/guidwiki/wikka.php?wakka=ARK).  The CDL web site includes an excellent overview of this GUID model, which also serves as a good introduction to the issues involved.  I would urge you all to read this document (it's only nine pages long!):

http://www.cdlib.org/inside/diglib/ark/arkcdl.pdf

This document arrives at the following problem definition for persistent, actionable identifiers:

   1.  The goal: long-term _actionable_ identifiers. 
      a) _Requirement: that identifiers deliver you to objects (where feasible)._
      b) _Requirement: that identifiers deliver you to object metadata._
      c) _Desirable: each object should wear its own identifier._
      d) _Requirement: that identifiers deliver you to statements of commitment._
   2.  The problem: URLs break _for some objects (that is, associations between URLs and objects are not maintained), and we have no way to tell which ones will or won't break._
   3.  Why URLs break: because objects are moved, removed, and replaced - _completely normal activities - and the provider in each case demonstrates insufficient commitment to update indirection tables, or to plan identifier assignment carefully. Persistence is in the mission of few organizations._
   4.  Conventional hypothesis: use indirect names (PURLs, URNs, Handles) instead of URLs; what worked for DNS should work for digital object references.  _Wrong. Indirection is spectacularly successful and elegant in DNS, but it's a side issue in the provision of digital object persistence._
 
This document clearly identifies issues around provider service commitments as the key problem that needs solving.  The construction of ARKs seeks to address this in a couple of ways.  It separates the role of Name Assigning Authority (i.e. who initially assigns the identifier) from that of the Name Mapping Authority (i.e. who is able to map the identifier to the data object at any particular time).  It also defines a simple standard relationship between three things: the data object, the metadata for the object, and a commitment statement from the provider as to what aspects of persistence are guaranteed.
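As a rough illustration of the separation described above, the sketch below models an ARK as a core identifier (assigning authority number plus name) that can be attached to any mapping authority host. It assumes the general ARK shape =[http://NMA-host/]ark:/NAAN/Name= and the convention that appending =?= requests metadata and =??= requests the provider's commitment statement. Please check both against the CDL document before relying on them. The host names, the NAAN =12345= and the name =x54xz321= are made-up values, and the =Ark= class and =parse_ark= function are hypothetical helpers, not part of any ARK library.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed general shape: [http://NMA-host/]ark:/NAAN/Name
ARK_PATTERN = re.compile(
    r"^(?:https?://(?P<nma>[^/]+)/)?ark:/(?P<naan>[^/]+)/(?P<name>[^?]+)$"
)


@dataclass(frozen=True)
class Ark:
    naan: str                  # Name Assigning Authority Number: who minted the identifier
    name: str                  # the name that authority assigned to the object
    nma: Optional[str] = None  # Name Mapping Authority host: who resolves it right now

    @property
    def core(self) -> str:
        """The part of the identifier that stays fixed when the resolver changes."""
        return f"ark:/{self.naan}/{self.name}"

    def with_nma(self, host: str) -> "Ark":
        """Same identifier, served by a different mapping authority."""
        return Ark(self.naan, self.name, host)

    def object_url(self) -> str:
        if self.nma is None:
            raise ValueError("no Name Mapping Authority set for " + self.core)
        return f"http://{self.nma}/{self.core}"

    def metadata_url(self) -> str:
        # Assumption: a single trailing '?' asks for the object's metadata.
        return self.object_url() + "?"

    def commitment_url(self) -> str:
        # Assumption: a double trailing '??' asks for the commitment statement.
        return self.object_url() + "??"


def parse_ark(text: str) -> Ark:
    """Split an ARK string into its assigning and mapping parts."""
    match = ARK_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not an ARK: {text!r}")
    return Ark(naan=match["naan"], name=match["name"], nma=match["nma"])


if __name__ == "__main__":
    original = parse_ark("http://example.org/ark:/12345/x54xz321")
    moved = original.with_nma("mirror.example.net")
    print(original.core == moved.core)  # the identifier survives the move
    print(moved.object_url())
    print(moved.metadata_url())
    print(moved.commitment_url())
```

The point of the sketch is that only the host part changes when responsibility for resolution moves; the object, its metadata and the commitment statement are all reached from the same core identifier.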

ARK is a technology that we have not really considered up to this point.  My question for discussion is what, if anything, is missing or wrong about the problem definition provided in this document?  If we agree that it provides a crisp definition of what we need, that in itself will be a major step forward.

----

*Summary of Discussion from Mailing List:*

The main concern of the GUID group members seems to be the definition of which data units should get identifiers. The issue has also been raised in the discussions of [[GUIDsForCollectionsAndSpecimens][Topic 2 - Collections and Specimens]] and [[GUIDsForTaxonNamesAndTaxonConcepts][Topic 3 - Taxonomic Names and Concepts]].

Several members of the group believe that, to be able to cross-link elements in the network in a meaningful manner, we need to define carefully which classes of data objects get GUIDs, and some believe that these rules should be enforced to some extent.

The group acknowledges that these discussions are independent of the development of the underlying GUID infrastructure. They are, however, very important for eliciting the requirements for that infrastructure.

Donald Hobern suggested developing a shared ontology to identify biological object classes, and having each GUID-assigned object point to the class it belongs to in the ontology. See [[IdentifiersAtMOBOTDonaldsReply][Donald's complete message]] for more information. That solution is initially proposed for collections and specimens. The group is discussing other approaches to solve the issue for taxonomic names and concepts.

Other issues that are more directly related to Topic 1 were raised as well:

   * Separation between Assigning and Resolving Authorities
   * Who generates identifiers
   * Data and metadata versioning and persistence
   * GUIDs representing physical or digital objects (actually an issue for Topic 2)

Those issues have not been thoroughly discussed, though.

---+++++ Categories
CategoryTopics
Archive entire twiki setup from owl as of today 2016-02-24 10:46:01 +00:00