wiki-archive/twiki/data/UBIF/GuidLinking.txt

---+!! %TOPIC%

%META:TOPICINFO{author="GregorHagedorn" date="1147083758" format="1.1" version="1.3"}%
%META:TOPICPARENT{name="WebHome"}%
This topic collects some comments on the "Link" part of the UBIF ObsoleteTopicProxyDataModel.

<h3>1. DOI</h3>

I was thinking that the DOI as proposed for UBIF Link is not a URN, but if written as "doi:10.xxxx/123" 
it would be a valid URN. Dave Thau points out this is not the case. "All urns start with urn:NID (http://www.ietf.org/rfc/rfc2141.txt) - in the case of LSIDs it's urn:lsid.  The doi: is a doi uri and that's about it.  There's something in the DOI documentation about how they COULD be a urn like this urn:doi:10.23/121 but that would mess up their automagical mapping to an http address and they don't like urns: http://www.doi.org/handbook_2000/enumeration.html#2.9.2

---

<h3>2. LSID</h3>

There is no doubt that a full LSID looks like: "urn:lsid:lsid.gbif.net:www.fishbase.org:1029". However, in the UBIF schema I proposed to also have a non-redundant LSID-base, stripping off the constant "urn:lsid:".

I wrote in email: "your comment points out that this is not intuitive, especially given that the tag name is LSID. You think it should be left redundant (typing it as a special kind of URN that needs extra parsing to use), or is there a better name for the tag containing only the non-redundant information?"

Dave Thau answered: "I don't think it makes sense to lop off the urn:lsid part of the lsid.  I think that's a critical part of the LSID.  If I had a directory of files, SDD files and others, and I wanted to grep for LSIDs to resolve, I'd grep on urn:lsid because that's what defines an lsid. I think you should keep the LSID element, but type it so that it has the urn:lsid part in it.  Otherwise it's not a valid LSID."

Gregor then wrote: "Not contradicting - just questioning: you are thinking essentially about untyped data. The xml data are typed however, so grep to me makes no sense. For example The xml date type has no "date:" prefix, so you know without looking for schema types what it is.

As a scientist we are trained to avoid redundancy. If I write a publication and put the same information (e.g. scientific unit) after every measurement in a table, the editors will comment: Sorry, it belongs in the table header. Similarly, If a table column always has the same data (which sometime occurs, if this is required to define the context)  they will say: Sorry, move this into a table footnote.

This is no argument whatsoever against what computer scientists consider intuitive - which may be more important here.
"

Dave Thau answers: "I can see that.  But in the case you mention above, the word date: is not PART of the date itself.  So you're not altering the date by leaving it out.  On the contrary, the urn:lsid IS part of the LSID and it's not a valid LSID without the urn:lsid part in there.  It seems odd to me to violate the LSID specification in this case.

Am I correct in thinking that with the current setup, an xml savvy tool which knew about urns but was not specifically designed to process SDD documents would have to know that "to find out more about the thing in the <LSID> tag, note that it's of the LSID type, and append a urn:lsid to the front of the element's contents and then resolve it?"   I'd be please to know that xml tools are already that clever.

If you leave the urn:lsid part in there, then a generic tool (which understood about urns) could realize, "Hey!  That's an LSID, it says so right there!  I can resolve it!"  I'm not sure this handy-dandy discovery would work without the urn:lsid in there."


---

There are more issues disscussed in the annotations on the LSID and DOI simple types in the schema. If you can contribute, please cite the relevant part here and discuss! Thanks!

---

<h3>More than one ID or linking mechanism in parallel?</h3>

Gregor: Is duplication of URL and LSID/DOI in parallel desirable or not? 

Dave Thau: "I think it's sensible.  The LSID system permits parallel resolution
mechanisms - http, soap, and ftp for starters, and I like the idea of
providing multiple access points."

Gregor: Currently a single LSID or DOI, and multiple URLs are possible. It would probably be possible to reduce this to a single of each. Markus D<>ring is implicitly requesting this, he prefers to have three optional attributes instead of an unbounded collection of elements. As far as I understand, the main argument is not architectural, but one of intuition and acceptance. He believes only a very compact notation (which then also looks more similar to the well known xhtml "a href" construct) will find acceptance. Any comments?