%META:TOPICINFO{author="RicardoPereira" date="1173790253" format="1.1" version="1.66"}%
---+ GUID-2 Report
---++++ *Second International Workshop on Globally Unique Identifiers for Biodiversity Informatics (GUID-2)*
_10th - 12th June 2006_
_e-Science Institute, Edinburgh_
----
---++++ 1. Background
The Taxonomic Databases Working Group (TDWG, http://www.tdwg.org/) and the Global Biodiversity Information Facility (GBIF, http://www.gbif.org) held the second of two workshops to address the need for a system of globally unique identifiers (GUIDs) for use in biodiversity informatics. The first workshop was held at NESCent in North Carolina in February 2006 and recommended the adoption of Life Science Identifiers (LSIDs) for this purpose (see: [[GUID1Report]]). Subsequent activity has focussed on the development of prototypes, the integration of LSIDs with key data sets, and on-line discussion of the main infrastructure and policy implications of using LSIDs.
----
---++++ 2. Summary
The second GUID workshop was held at the e-Science Institute (e-SI) in Edinburgh, Scotland, 10-12 June 2006. The meeting re-iterated the conclusions from the first workshop and developed recommendations and priorities for the use of LSIDs, including:
* Work first with the nomenclators, particularly IPNI, Index Fungorum and ZooBank, to associate LSIDs with these reference lists of scientific names, recognising their foundational place in biodiversity informatics.
* Update software tools for LSID resolution and develop packaged installers.
* Support integration of LSID functionality into TDWG data provider packages.
* Develop documentation and publicity materials for managers, biologists and IT specialists.
* Approach external software projects to encourage the support of LSIDs.
----
---++++ 3. Goals
1. To review the outcomes of the first GUID meeting and subsequent developments.
1. To confirm the choice of LSIDs as persistent identifiers (GUIDs) for TDWG functional data types as will be defined in the TDWG Core Ontology (currently under development, [[http://wiki.tdwg.org/twiki/bin/view/TAG/TDWGOntology]]).
1. To discuss the governance of GUID allocation.
1. To identify issues and discuss the options for ensuring long term availability and archival strategy for GUID objects (or their data sets).
1. To identify gaps and weaknesses in current GUID server and resolving software and to list the actions required to address them.
1. To identify ways to promote the benefits of using GUIDs to the taxonomic, biological research and biodiversity information communities.
1. To identify the technical and 'sociological' support required to assist potential users to start serving GUIDs.
1. To review and make recommendations on the incorporation of GUIDs and RDF into TDWG standards.
----
---++++ 4. Discussion
* The meeting identified the importance of promoting the use of LSIDs to the broader community. This goal evolved into the formation of an outreach group [[GUIDOutreach]] to identify priorities for accelerating the uptake of LSIDs within the broader community associated with TDWG.
* A discussion on the relative merits of using HTTP URIs (e.g. PURLs or URLs) rather than LSIDs was motivated by the current lack of support for LSID resolution in client software and browsers, in contrast to the ubiquitous ability to resolve URLs. The use of DNS as the current resolution mechanism for LSIDs makes them appear similar to regular URLs, but with this client-side impediment. Having considered these issues, the meeting concluded that the selection of LSIDs made at the GUID 1 meeting remained sound because:
* URLs are bound to a single transport protocol (HTTP). It would not, therefore, be possible to separate the resolution mechanism from the identification mechanism in future.
* LSIDs are names and abstract the resolution mechanism, and that mechanism may change in the future.
* LSIDs provide a mechanism for binding data and metadata that would have to be invented for the use with URLs.
* The LSID specification recommends RDF as a metadata format. This would have to be mandated with the use of URLs.
* Socially, people do not trust URLs to be permanent because of the 'link rot' they have encountered on the hypertext web. LSIDs are more likely to gain acceptance for use in publications etc (as is the case with DOI).
* Software development - Discussion of the availability and functionality of both provider and client software resulted in a 'break-out' group that produced prioritised recommendations for filling the gaps, improving usability and extending functionality including how to get client software to resolve URNs (see [[GUID2WgSoftwareDevelopment]])
* What gets an LSID? - The recommendation is that the architecture requires that an object identified by an LSID should be 'typed'. Therefore anything that can be explicitly typed, preferably with the TDWG ontology, could be named with an LSID. The depth of granularity is subject to what the provider wishes to manage. We recognised that we needed an explicit policy statement about LSIDs being attributed to abstract vs descriptive objects.
* Metadata annotation and linkout. FAN can be useful for notification but cannot be effectively used to create back-links to external annotations, because of management issues. A possible solution would be to create a central service where agents could register any annotations that would be available to anyone in the community. If providers are willing to expose their systems to web crawlers, normal discovery methods (e.g. through Google) would also locate distributed references to GUIDs.
* There was insufficient evidence from case studies to provide definitive rules about versioning. There is a need for a TDWG policy statement backed with examples. See: VersioningOfLSIDs
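
The point above about LSIDs being names that abstract the resolution mechanism can be sketched in code. The following is only a minimal illustration, assuming the usual =urn:lsid:authority:namespace:object[:revision]= layout and the =_lsid._tcp= DNS SRV naming convention for locating an authority; the LSID shown (=urn:lsid:example.org:names:12345=) and the helper names are hypothetical and do not refer to any TDWG software.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """The parts of an LSID; none of them says how the LSID is resolved."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    # The 'urn' and 'lsid' prefixes are matched case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty component: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(authority, namespace, object_id, revision)


def srv_query_name(lsid: Lsid) -> str:
    """DNS SRV name a client could look up to find the authority (assumed convention)."""
    return f"_lsid._tcp.{lsid.authority}"


# Hypothetical identifier, used for illustration only.
example = parse_lsid("urn:lsid:example.org:names:12345")
print(example)
print(srv_query_name(example))
```

Only =srv_query_name= knows anything about DNS. If the resolution mechanism changed, that function would be replaced while the identifiers themselves, and anything that cites them, would stay the same.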
----
---++++ 5. Outcomes
---+++++ Recommendations
A number of specific, mainly technical recommendations were made by the end of the conference:
1. LSIDs will be used as unique identifiers.
1. HTTP GET will be the default binding for getMetadata calls.
1. RDF will be the default metadata response format.
1. The TDWG ontology (and other well-known vocabularies) should be used wherever possible.
1. A decision on the use of non-RDF formats as accepted_types for metadata is deferred.
1. There should be a central SRV hosting service (see CentralSrvService).
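
As a rough illustration of recommendations 2 and 3, the sketch below builds an HTTP GET request URL for a getMetadata call and reads a small RDF/XML response. It is not taken from any TDWG or IBM software: the endpoint URL, the =lsid= query parameter name, the LSID itself and the sample RDF document are all assumptions made for the example. In practice a client would discover the metadata endpoint from the authority rather than hard-code it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical RDF/XML response; a real authority decides its own vocabulary.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="urn:lsid:example.org:names:12345">
    <dc:title>Example name record</dc:title>
  </rdf:Description>
</rdf:RDF>"""


def metadata_url(endpoint: str, lsid: str) -> str:
    """Build a GET URL for a getMetadata call (parameter name 'lsid' is assumed)."""
    return f"{endpoint}?{urlencode({'lsid': lsid})}"


def titles_by_subject(rdf_xml: str) -> dict:
    """Map each rdf:about subject in the response to its dc:title, if present."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        about = description.get(f"{{{RDF_NS}}}about")
        title = description.findtext(f"{{{DC_NS}}}title")
        if about is not None and title is not None:
            result[about] = title
    return result


# Hypothetical endpoint and LSID, for illustration only.
print(metadata_url("http://example.org/authority/metadata",
                   "urn:lsid:example.org:names:12345"))
print(titles_by_subject(SAMPLE_RDF))
```

The example deliberately avoids any network call, so it shows only the shape of the request and the response handling.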
---+++++ Actions
In addition to technical recommendations a number of actions were identified and allocated, with a commitment to reporting back on progress at the TDWG annual meeting in October 2006.
* Lee to write a 1-page document on the case for LSIDs aimed at senior administrators/managers by the end of June 2006
* Rich to develop a document/presentation on the case for LSIDs to biologists for presentation at TDWG 2006 (August 2006)
* Ricardo, with assistance from Rod and Ben to evaluate software resources collected during GUID-2 and identify gaps in requirements and in the associated documentation (August 2006)
* Roger to evaluate the status of the documentation necessary to promote the uptake of LSIDs within our community (August 2006)
* A narrative explanation for the use of LSIDs and RDF in the biodiversity infrastructure to be written by Roger (July 2006).
* Stable informative material is to be loaded from the Wiki to the new TDWG website
* An applicability statement and guidelines are needed for biodiversity organisations serving LSIDs
* Ricardo and Roger to attempt to produce a packaged installer and improved documentation for setting up LSID server software (October 2006).
* Ben to approach IBM to improve access to and use of their LSID resolver plugins (with support from Ricardo)
* Encourage development of 'good news' demonstrations through ant and fish communities
* Review tasks involved and costs of setting up the host for a central DNS server service for LSIDs
----
---++++ 6. Conclusions
* There has been considerable discussion about the utility of a URN vs URL strategy. The meeting concluded that our community should implement LSIDs. The experience of the prototypers suggests that implementation of LSID Authorities has been straightforward.
* Priority actions identified by the meeting should be completed within the next three months to achieve significant progress that must be reported to TDWG 2006.
* The server and client software and associated documentation must be improved to encourage uptake of LSIDs. Key improvements are: i) simplified installation of LSID server software ii) native or 'plug-in' support for LSIDs in semantic web tools, ontology software and browsers.
* Promotional documentation targeted separately at managers, biologists and IT specialists is urgently required.
* Implementation of LSID Authorities at a few key nomenclators and two operational examples with a few associated collections is a priority.
* There is a sound case for establishing a central DNS server service for LSIDs, which will be further assessed and costed.
----
---+++ Annexes
---+++++ Presentations
The meeting included presentations, break-out group discussions and plenary discussions. The presentations given were:
1. [[%ATTACHURL%/hobern_GUID2Day1Goals.ppt][Introduction and overview]] _Donald Hobern_
1. [[%ATTACHURL%/Hyam_TDWG_GUID-2_arch_03.ppt][Summary of architectural recommendations from TAG]] _Roger Hyam_
1. [[%ATTACHURL%/Rod-Page-TDWG-GUID2.ppt][LSID Tester]] _Roderic Page_
1. [[%ATTACHURL%/Perry_LSIDAndDiGIR2.ppt][LSID resolver for specimens in DiGIR2]] _Steve Perry_
1. [[%ATTACHURL%/GUID2Morris.ppt][Image LSID resolver prototypes]] _Bob Morris_
1. [[%ATTACHURL%/Richards_IndexFungorumSetup.ppt][Index Fungorum LSID authority setup]] _Kevin Richards_
1. [[%ATTACHURL%/Richards_LSIDFrameworkPort.ppt][Port of LSID framework to .NET]] _Kevin Richards_
1. [[http://www.nesc.ac.uk/talks/682/guid-2.ppt][LSID resolution in SEEK]] _Jessie Kennedy_
1. [[%ATTACHURL%/Pereira_GUID2_LSID_SW_Analysis_v3.ppt][Existing software components and gap analysis]] _Ricardo Pereira_
1. [[http://www.nesc.ac.uk/talks/682/Core%20Ontology.ppt][Core Ontology]] _Jessie Kennedy_
1. [[%ATTACHURL%/hobern_AdditionalComponentsOfAnLSIDNetwork.ppt][Additional components of an LSID network]] _Donald Hobern_
1. [[%ATTACHURL%/PublicationBank.ppt][Publication Bank]] _Robert Huber_
See also the [[http://www.nesc.ac.uk/action/esi/contribution.cfm?Title=682][list of presentations]] at the eSI website.
----
---++++ Attendees
* Damian Barnier (University of Queensland)
* Lee Belbin (TDWG)
* Stan Blum (California Academy of Sciences)
* Charles Copp (EIM, Technical Writer)
* Simon Coppard (NHM, London)
* Rob Gales (University of Kansas)
* Sally Hinchcliffe (RBG, Kew)
* Donald Hobern (GBIF)
* Robert Huber (Marum - Zentrum fur marine Umweltwissenschaften)
* Roger Hyam (TDWG)
* Jessie Kennedy (Napier University, Edinburgh)
* Chuck Miller (Missouri Botanical Garden)
* Bob Morris (University of Massachusetts, Boston)
* Tom Orrell (ITIS, Smithsonian Institution)
* Roderic Page (University of Glasgow)
* Ricardo Pereira (TDWG)
* Steve Perry (University of Kansas)
* Andrew Polaszek (NHM, London)
* Rich Pyle (Bishop Museum, Hawaii)
* Kevin Richards (Landcare Research NZ Ltd)
* Tim Robertson (GBIF)
* Vince Smith (NHM)
* Ben Szekeley (IBM)
* Dave Vieglais (University of Kansas)
---++ Comments
------
%ICON{bubble}% Posted by Main.BobMorris - 2006-06-14 10:47:49
Unless I misread the spec, the statement in the report "The LSID specification recommends RDF as a metadata format." appears to me to be somewhere between wrong and wishful thinking. The specification provides for the accepted metadata format(s) to be passed as an argument to getMetadata(...). By virtue of mentioning only these two, the spec seems to identify RDF and XMI as popular metadata formats. ("At the time of writing, there are no formally standardized media types for the more popular metadata formats.
Specifically, RDF and XMI do not have media types registered with IANA."). By virtue of the fact that all examples are in RDF, one might conclude that the specification recommends this, but that is an inference, not a fact as best I can see.
A more accurate report of the GUID2 meeting might be "Despite the fact that unfamiliarity with RDF and lack of browser support were identified as impediments to adoption, TDWG will recommend it as the default metadata format because [whatever]".
In any case, whether or not RDF is a good metadata format for LSID metadata, even concluding that the spec means to suggest that RDF and XMI are popular for LSID is probably disingenuous, since there is in fact very little deployment of LSIDs to date.
My own reading of the meeting is that exactly three attendees (Roger, Rod, and Donald) were wildly enthusiastic about RDF and that everybody else was basically merely accepting of it as at least worth a try. Comments here may dispute that reading.
------
%ICON{bubble}% Posted by Main.BobMorris - 2006-06-14 11:28:05
[The following might be considered a whole new direction and so be taken as irrelevant noise, but ...]. XMI, also mentioned in the LSID spec as a popular metadata expression, is, roughly, a standardized XML serialization of UML. As a consequence, the (IMO appropriate) TAG decision to represent the TDWG ontology in UML (rather than RDF) would, with readily(?) available(?) UML tools, provide an XMI expression of the TDWG ontology. Consequently, many of the GUID goals that mean to relate the GUID metadata to the TDWG ontology would actually be better served if TDWG LSID metadata had XMI as the preferred representation.
------
%ICON{bubble}% Posted by Main.BobMorris - 2006-06-14 12:13:47
Other than the previous, I find the report a complete, accurate, and useful summary of a successful and well-organized meeting. Also, it is \extremely/ helpful to have it appear so quickly. Thanks.
------
%ICON{bubble}% Posted by Main.DonaldHobern - 2006-06-14 17:28:12
Thanks, Bob, for the comments. My only counter-comment would be that there were a few more people beyond those you list who expressed (at least to me) that they did like RDF as a representation of TDWG (meta-)data.
------
%ICON{bubble}% Posted by Main.ChuckMiller - 2006-06-20 07:03:43
Although I see some of the positive attributes of RDF, I remain unclear as to exactly what it would be used for other than a formatted metadata response. If the intent is to actually append multiple RDF responses from a multi-LSID query result and then query/infer over those combined RDF triples, then I see the benefit of RDF. But, if not, then it just sounds like yet another formatted XML document to me, perhaps with a richer vocabulary. So, I remain in the "accepting as worth a try" camp that Bob describes.
Before we can sell the RDF format, I think we need better real-world prototype solutions that involve the RDF part, not just the LSID locator. The word I got from the BHL meeting was that everyone was aghast when they heard that RDF was being chosen as the metadata format. I expect that to be a common reaction until some plain and clear examples are shown.
------