wiki-archive/twiki/data/GUID/AbfLsidMeetingExecSummary.txt

65 lines
12 KiB
Plaintext

%META:TOPICINFO{author="KevinRichards" date="1178063745" format="1.1" version="1.6"}%
%META:TOPICPARENT{name="AustralasianBiodiversityFederationLsidPolicy"}%
---++ The importance of Life Science Identifiers (LSIDs) for the biodiversity community
---+++ Executive summary
The World Wide Web revolutionized the way in which we broadcast and access digital information. The next revolution will come from new technologies that allow us to synthesize, manage and integrate the web’s vast quantities of information - the so-called semantic web. These technologies will evolve the Web from an electronic notice board into a truly connected, dynamic and flexible knowledge collaboration.
Globally Unique Identifiers (GUIDs) are a critical building block in this new revolution. GUIDs are small, standardised tags attached to digital objects. Database records, documents, images, names, or any other object that will be electronically shared may be uniquely identified and described using GUIDs. Such tagged objects may then be integrated and combined with other information to bring new insights and allow new knowledge discovery. GUIDs may also function as calling-home cards - a GUID on a digital object can be used to find, attribute and identify its original owner, no matter where the object travels on the web.
One type of GUID – Life Science Identifiers (LSIDs) – have been chosen as an agreed standard in the global biodiversity community, supported by the Global Biodiversity Information Facility and other global and Australian biodiversity peak bodies. LSIDs are decentralized, collaborative and free. Individual institutions - the custodians of data - manage the deployment of LSIDs for their own shareable data assets rather than relying on a centralized issuing authority. This provides LSIDs with much-needed flexibility in the fast-evolving web.
The risks of supporting LSID technologies are inherently low. The only costs involved in deploying an LSID service are the time necessary for a data manager to establish the service (typically days to weeks). LSIDs are simple and lightweight, and promote rather than impede normal data management workflow.
By contrast, the risks of not supporting LSID technologies are high. Institutions that fail to deploy LSID services will become increasingly disconnected from the emerging web of knowledge, and will be unable to share their data effectively with the world and to share the world's data for better decision support.
A joint meeting of information professionals from the combined Australasian herbarium and museum communities has recommended the adoption and deployment of LSIDs by all major Australian biological collections and their host organizations and institutions. The recommendation is endorsed by the Global Biodiversity Information Facility and Biodiversity Information Standards (TDWG) organization. Your institution is urged to support their deployment using the attached implementation plan and strategy.
---+++ Background
Life Science Identifiers (LSIDs) are small, lightweight, globally unique digital tags that can be attached to any digital object. Objects that carry an LSID can be uniquely identified and attributed, even when the object is shared, merged into other objects, or moved from its local context. Three properties of LSIDs contribute to their flexibility and utility.
---++++ 1. Think global, then everything’s local
Databases use unique identifiers to manage records. For example, specimen records in a specimen database are often identified using accession numbers, and names databases generally assign nameIDs for each name. The uniqueness of the identifier allows a each record in the database to be unambiguously identified – clearly important in managing, using and maintaining the data.
Identifiers are however almost always local to the particular database in which they are assigned. If data from two or more databases are combined in some way, uniqueness of the combined identifiers cannot be guaranteed. The outcome of this is that it may then no longer be possible to unambiguously refer to any given record, and all records will need to be cumbersomely renumbered after which many broken processes and links will need to be fixed.
Imagine if every database record in every database in the world had an identifier that could be guaranteed to be globally unique. Then when two databases are merged or share data it would be immediately possible to use the newly accessible records with no possibility of identity clashes or ambiguity.
LSIDs provide just such a way of tagging records in databases with globally unique identifiers. An LSID is a string of text of the form =urn:lsid:authority:namespace:identifier=. An example would be =urn:lsid:herbarium.PERTH.lsid.org.au:specimen:02344759=
If the authority (=herbarium.PERTH.lsid.org.au=) is a unique address, and the authority can guarantee that the record =02344759= in its specimen table is unique, then the LSID is globally unique and can identify that specific record.
LSIDs may be applied to any type of digital object that may at some time be shared, not just records in a database. LSIDs in exactly the same form can be applied to specimen records, names, descriptions, characters and character states, documents, images, spreadsheets, phylogenies – to any digital object of any kind.
Applying LSIDs to objects is cost-free, and LSIDs are assigned by custodians of data with no requirement for a centralized issuing authority. For these reasons, LSIDs have been adopted by the international biodiversity community as the principal system of globally unique identifiers for use in the life sciences domain. LSIDs are seen as an enabling technology for the next generation of web applications, processes and operations.
---++++ 2. Have calling card, will travel
LSIDs are more than just unique identifiers for records and other digital objects. They also act as calling-home cards for the objects they are attached to. This means that objects with LSIDs can never get lost on the Internet, and can always be ascribed back to their custodian or owner using standard protocols.
Consider a database (the client) which aggregates records from several source databases. The owner of the client database may need to query the source databases at intervals for updates to their records. To do this the client would need to maintain systems for identifying each record in its source database, and for querying each source database for the updates. The updating would almost certainly be a cumbersome and expensive operation.
If, however, the records carry LSIDs and the source databases establish simple resolving services, a straightforward mechanism for updates can be established. Part of an LSID is the address (e.g. =herbarium.PERTH.lsid.org.au=) of the authority which maintains the resolving service of the source database. Free web tools are available which will accept an LSID and find from this address the source’s LSID resolver. The tool sends the LSID as a query to the source, which recovers from it the pointer to the original record in its database (e.g. =specimen:02344759=). The resolving service then returns information about the original record in a standard format. The returned information will normally include essential (meta-)data about the record, and this can be used by the client to update its copy of the record.
If all records carry LSIDs, one process attached to the client’s database can be used to update records from all sources. In addition, one process at the source databases can be used to supply updates for all clients. Substantial time and cost savings are available at source and client ends using LSID technology.
---++++ 3. Carry meaning, not just data
Over time, the ability of LSIDs to recover information about digital objects from their custodians will establish the true power of LSIDs, and play a part in the evolution of the World Wide Web into the Semantic Web – a flexible and intelligent web of knowledge. The key to the power of LSIDs is that the information returned when the LSID of a digital object is queried can be made meaningful to machines as well as humans.
Consider, as an example, Google Images. When this was new a few years ago it was considered pretty cool. But it is simply an early and somewhat primitive example of a data aggregator that is suffering from the lack of LSIDs.
Google Images is powered by web robots which trawl the web for image objects embedded in web pages. When an image is found, the robot returns to Google a thumbnail of the image and an extract of the html page text that surrounds it. From this text, Google Images makes a guess at the meaning of the image – is it an image of Copacabana Beach or of a funnel-web spider? The thumbnail image, a link to the original image and the inferred meaning is then databased ready for querying. The weak link is the inference part – these days any query using Google Images will return some images that correctly match the query but many are wrong – a funnel-web spider image returned from a query about Copacabana Beach is ample evidence of failed inference.
If images are progressively tagged with LSIDs, it will become possible to build vastly more accurate inference engines. If the funnel-web spider image is tagged with an LSID it will be possible to directly query the original custodian of the image to ask for information about it, the metadata. Such a query will return tagged, machine-readable, information using standardized and well-structured formats. One tag may say “This image is of an organism” while another may say “The name of the organism is Atrax robustus”. Immediately, an inference engine like Google Images will be able to accurately identify the image, because it has real information from the custodian of the image rather than simply the context of the image on its page.
A system of LSIDs becomes more powerful when LSIDs point to other LSIDs. For example, if the name of the funnel-web spider changes, inference becomes more difficult; it will be hard for a machine to determine that the name has changed and what it has changed to. In the above example, however, if the “name” tag of the funnel-web spider image said “the name of this organism can be found at =urn:lsid:museum.NSW.lsid.org.au:name:117858= then the current name can be retrieved by the same process, through a query to another authority. In this way, whenever the name changes, the image will be automatically referred to its correct name rather than to an out-of-date name.
Tools have already been built to collate species information across multiple databases based on unique identifiers. For example, inference about the names of a taxa has been built to discover unknown Genbank sequences for that taxa.
Simple examples like these show the power that LSIDs are bringing to the World Wide Web. The global biodiversity community has a real and immediate need for LSID technology; indeed, the success of initiatives such as the Atlas of Living Australia and the ePedia of Life depend on LSIDs being used extensively by our community’s information systems. It’s probable that the next generation will use LSIDs in reasoning and inference engines to create information structures that can hardly be imagined today.
All Australian herbaria and museums have been asked to implement LSIDs as quickly as possible with available resources. Early benefits expected to flow include more effective management of specimen records between institutions, better handling of taxon names and concepts, and more (and more flexible) electronic floras, faunas and identification keys. The average client may not see the LSIDs in the background, but their presence will ensure that available knowledge is accessible.
-- Main.KevinThiele - 04 Apr 2007