wiki-archive/twiki/data/SDD/RDFandSDD.txt

50 lines
9.1 KiB
Plaintext
Raw Permalink Blame History

%META:TOPICINFO{author="GarryJolleyRogers" date="1259118875" format="1.1" version="1.11"}%
%META:TOPICPARENT{name="WebHome"}%
---+!! %TOPIC%
Something new upfront DataChallengeRDF
----
<h2>Below some older discussions:</h2>
I'm quite confused from the meeting reports whether there was some argument accepted that LSID metadata in RDF should represent the \content/ of the current concerns of TDWG, including TCS, DC, ABCD, SDD, and the impending new groups, or merely \describe/ the databases against which answers are rendered in those content standards. For example, if a taxon concept is given an LSID, is the metadata returned expected to be a replacement for the current XML constrained by TCS? RDF certainly can encode a taxon concept and address the relations it encodes, but I'm unaware of applications of LSID metadata of objects in a database where the datum is encoded, though in many cases RDF could rationally make a claim to do so. I agree with Sally:Where's the robust, widely accepted killer app?
See my take on Rod's take on !McCool's take on the failure of RDF. Basically I agree with Rod in theory but not in reality. You can see it in my response in his blog. I cited !McCool because he is the most qualified I have every seen of the many who write that RDF is still a technology in waiting. One blogger claims that an MIT RDF adherent couldn't give a single instance of content in RDF other than constructed examples. http://www.zacker.org/the-battle-for-the-semantic-web-rdf-vs-xml. In 1998, that paragon of scholarship Byte magazine, predicted that RDF would be the most important application of XML. http://www.byte.com/art/9803/sec5/art4.htm A host of more explicit predictions for RDF in that article remain (realistic) promises.
The problem with RDF is that enterprise-class applications still seem to be a somewhat realistic promise, but a promise nevertheless. XML Schema, for all its warts and expressive power weaker than RDF and RDFS, is the foundation of quite a bit of robust application software. There's even a public XML Schema from Microsoft that constrains what happens when you select &#65533;Save as XML&#65533; in Excel, (to which, I am given to understand, Open Office also conforms). Lately I've learned that if a technology supports spreadsheets, biologists will say &#65533;lead me to it&#65533;, or at least will not run away from it.
If the proposal is just to encode LSID metadata about databases for which exchange documents are being generated, my feeling is that this is somewhere between harmless and useful, mainly for discovery and baby machine reasoning during discovery. If the proposal is to re-encode ABCD, TCS, DC and SDD in RDF, it is somewhere between harmless and harmless, but VERY time consuming for a community of volunteers for not much near-term gain. In the former case, I see little need to have distinct efforts for each subgroup. I suspect that this is principally a re-expression in RDF of some, or even most, of the concerns of UBIF, and that it is independent of any subgroup. If one accepts that the exchange standards remain in XML-Schema, with all its warts, weaker expressive power and weaker extensibility than RDF, then a maintenance problem arises: there are two sets of infrastructures and tools to be supported: those for XML and those for RDF. It may be that, in this scenario, the discovery applications of RDF are sufficiently isolated from the content issues that most people won't be impacted by LSID metadata represented in RDF, except to whatever extent they find what they are looking for more easily. So it may not be a big deal. Or it may be.
For some concerns, notably those of TCS, there is already a very mature ontology language (which, however, has a somewhat shaky representation in RDF as well as its more mature representations). This is UMLS, the Unified Medical Language System a 25 year old effort whose killer apps are used regularly and invisibly by everybody who visits the U.S. National Library of Medicine. One effort by Neil Sarkar at AMNH has found that about 60% of the Torrey Bueno Glossary of Entomology is already encoded in UMLS where the ontological concerns of entomologists intersect those of medical researchers. Neil has organized a session next week at the AAAS meeting in St. Louis. Bryan Heidorn will give a talk about SDD.
People at the Sydney 2001/2001bis TDWG/GBIF meetings may see some irony here if they recall that I argued then for RDF as the representation language for SDD and was beaten down on the grounds that it was too complex and too untried. I found this a strange argument because we turned to another complex and untried technology, XML Schema, but made it work. However, what alarms me is that XML Schema caught on, and RDF remains controversial for its original goals. So now I'm on the other side, at least for a few more years. That said, it's always been my lab's intention to use RDF \externally/ to SDD to compare controlled vocabularies (Terminologies, in SDD). As I reported in St. Petersburg, we use XML databinding frameworks to generate SDD import/export programs and I expect that these actually will automate the process of generating RDF too. I don't know what are the corresponding tools for generating RDF marshalling and unmarshalling code. If there aren't any, then I am really convinced that RDF hasn't yet arrived. [Marshalling is the process of translating the serialization language, say RDF, into objects in your programming language; unmarshalling the reverse. It is desirable for many reasons to have such code not written by hand, but rather to be generated with only defined in the serialization constraint language (RDFS?) as input ].
Finally - although there is nothing inherently wrong with this - all new users of a new representation language for solving any information delivery problems typically make na<6E>ve use of the language. This happened in TDWG with XML Schema and it will happen in TDWG in RDF. Whatever you think your required time to provide good RDF generation, add 100% for rewriting it in \good/ RDF after you finally understand RDF from your first system.
-- Main.BobMorris - 13 Feb 2006
I am concerned about the tools that are available. I cannot reason how good RDF is, but I know that although in SDD we did struggle a bit with the shortcomings of xml schema, the real problems where conceptual in the area of biological or logical reasoning, where to draw the line, how to keep things moderately simply and still working etc. This has nothing to do with the technology used, but a lack of high level editors/graphical presentation tools can make it impossible to make any progress on these issues. For xml schema such tools are available, and they matured over the 8 years I am working on SDD to a stage where they are almost really useful.
Bob talks about lack of time. A number of people now understand xml schema. I do not understand RDF and using at the moment seems to be an order of magnitude more difficult to me. But then I have not tried it for several years, as I did schema. So:
* We need a discussion about this, and not only from the consumers point (I wish people would deliver me the data with all information in it), but also from the creators point both for data and schemata. Where is that discussion going on? The information I received was the the "GUID meeting has decided that things need to move to RDF".
* How are we going to get together the domain knowledge of biologists/bioinformaticists and the expert knowledge of ontology/rdf people? Are we biologists getting some help?
* Any hints about really good and readable tutorials? Any high-level tools for rdf?
* What "perceptually" worries me is that I see RDF in my limited knowledge only in combination with lowest-common-denominator (or "coarse cover info") applications like the ~DublinCore vocabulary or RSS. In my analysis these are not very useful when it comes to exchanging scientific information (rather than google it...). ~DublinCore is a very good example: excellent to aggregate information and search it with modified free-text search, but useless to exchange structure information. Widely used data exchange schema that do care for data quality (e.g. GML, MODS) seem to be using other, w3c-schema based methods. Perhaps RDF is indeed a better way to do it, but what Bob describes is that people in practice are not using it for such purposes. Should we?
-- Main.GregorHagedorn - 14 Feb 2006
Two things that will arise rather quickly when trying to apply RDF are well noted in the RDF community:
1. There are no standards for user-defined data types
1. Numeric ranges are difficult (impossible?) to represent, or at least in any standardized ways.
People have begun to work on this though. See http://protege.stanford.edu/plugins/owl/xsp.html and http://protege.stanford.edu/plugins/owl/xsp.html
We have thought about trying to define an SDD subset that would be reasonably expressible in OWL. My guess is that restricting to
categorical characters has a chance, and possibly there is an audience for that.
-- Main.BobMorris - 15 March 2007
%META:TOPICMOVED{by="BobMorris" date="1144533189" from="SDD.RDFConsideredHarmful" to="SDD.RDFandSDD"}%