398 lines
19 KiB
Plaintext
398 lines
19 KiB
Plaintext
|
head 1.6;
|
||
|
access;
|
||
|
symbols;
|
||
|
locks; strict;
|
||
|
comment @# @;
|
||
|
|
||
|
|
||
|
1.6
|
||
|
date 2009.11.09.22.11.31; author BobMorris; state Exp;
|
||
|
branches;
|
||
|
next 1.5;
|
||
|
|
||
|
1.5
|
||
|
date 2009.11.05.18.06.46; author BobMorris; state Exp;
|
||
|
branches;
|
||
|
next 1.4;
|
||
|
|
||
|
1.4
|
||
|
date 2007.07.20.10.04.09; author BobMorris; state Exp;
|
||
|
branches;
|
||
|
next 1.3;
|
||
|
|
||
|
1.3
|
||
|
date 2007.07.12.13.19.18; author GregorHagedorn; state Exp;
|
||
|
branches;
|
||
|
next 1.2;
|
||
|
|
||
|
1.2
|
||
|
date 2007.05.14.13.50.30; author BobMorris; state Exp;
|
||
|
branches;
|
||
|
next 1.1;
|
||
|
|
||
|
1.1
|
||
|
date 2007.05.09.12.00.20; author BobMorris; state Exp;
|
||
|
branches;
|
||
|
next ;
|
||
|
|
||
|
|
||
|
desc
|
||
|
@none
|
||
|
@
|
||
|
|
||
|
|
||
|
1.6
|
||
|
log
|
||
|
@none
|
||
|
@
|
||
|
text
|
||
|
@%META:TOPICINFO{author="BobMorris" date="1257804691" format="1.1" reprev="1.6" version="1.6"}%
|
||
|
%META:TOPICPARENT{name="WebHome"}%
|
||
|
If an issue here gets large, start an separate topic and link it here.
|
||
|
|
||
|
See also [[PlaziFinalReport]] for issues identified by the [[PlaziEOLProject]]
|
||
|
|
||
|
* [[Concept of Descriptions]] - What are descriptions? Is Behavior or DNA sequence not a description? But physiology, anatomy, chemistry, morphology are?
|
||
|
|
||
|
------
|
||
|
|
||
|
* Recommended Instance Form: Should an instance document be valid OWL?
|
||
|
|
||
|
I think yes. It makes it easier to write robust applications. On the other hand, it needs usable tools... -- Main.BobMorris - 09 May 2007
|
||
|
|
||
|
------
|
||
|
* Rendering Instance Documents: What are the recommended tools for rendering instance documents? Does this depend on the above? -- Main.BobMorris - 14 May 2007
|
||
|
------
|
||
|
* Repeated targets of the same hasInformation: Suppose an info item, say Range, has multiple values, e.g. representing that the range of a certain species includes Germany, France, and Belgium. Does this require three hasInformation triples (I suppose the answer is yes) and how do we guarantee that they are all talking about the same thing (I suppose the answer is use owl:sameAs). -- Main.BobMorris - 20 Jul 2007
|
||
|
|
||
|
---++Issues and Questions from PlaziFinalReport
|
||
|
Below we discuss 12 issues that Plazi faced during the development, including what we did about them and what recommendations we have based on our experiences. Some of these issues are not about SPM per-se except to the extent we found ambiguities or silence in SPM. Some may be addressed in the recent GBIF draft recommendations on the (<i>Cryer, et. al 2009 Adoption of Persistent Identifiers for Biodiversity Informatics</i>, http://imsgbif.gbif.org/File/retrieve.php?PATH=4&FILE=2efc20187e6ad3dd828bbeadaa1040e6&FILENAME=LGTGReportDraft.pdf&TYPE=application/pdf)<br>
|
||
|
|
||
|
*Issue 1*. Validation of the RDF to ensure RDF being produced was valid. This was accomplished by testing against the web-based [[http://www.w3.org/RDF/Validator/ !W3C RDF validator]]. We found this a particularly useful tool since it yields easy to understand representations of the RDF triples generated. By contrast, easy as the XML form of RDF is for humans to read, it is not always easy to understand from it whether one or another RDF predicate is being correctly or appropriately used.
|
||
|
|
||
|
_Conclusions and Recommendations about Issue1:_
|
||
|
|
||
|
Best practices for ontology annotation should be developed, perhaps with particular attention to documenting predicates.<br>
|
||
|
|
||
|
*Issue 2.*Even for valid RDF there was still a question as to whether it was valid OWL RDF, or whether OWL RDF was a goal.
|
||
|
|
||
|
No clear goals have been set and documented by GBIF or TDWG about reasoning on SPM, or other TDWG ontology vocabularies. It is generally accepted that the OWL Full dialect of OWL promotes data integration robustly in the sense that OWL Full has enough expressiveness to give integrators confidence in semantic equivalences or near equivalences in their mappings between one vocabulary and another. However, the OWL DL (Description Logics) dialect of OWL promotes tractable reasoning computation, making it easier to determine, e.g. whether a pair of vocabularies are logically inconsistent with one another, or whether data violates some quality control axioms that an application might wish to enforce. SPM invokes quite a bit of the current TDWG ontology, with the consequence that SPM is OWL FULL but not DL, because some of the TDWG ontology is not.
|
||
|
|
||
|
The "Open World" assumption for RDF is presently frequently cited as the slogan "AAA" (Anyone can say Anything Anywhere). One consequences is that misuses of ontology constructs can inadvertently pass into instances (by instance generation code), without discovery merely by RDF validation. This can happen if known applications do not fail on the misuse because it addresses issues the application ignores, or because particular consequences are harmless (e.g. because they return empty resource URI's and so are about nothing). One such SPM instance generation error was discovered only at the time of this writing in trying to understand why the Manchester !WonderWeb OWL validator ( http://www.mygrid.org.uk/OWL/Validator) was asserting that _!TaxonConcept_ was being used as both an RDF Class and an RDF Property. That is forbidden in OWL DL, but not OWL Full, for which the SPM instances were valid OWL. No such invalidity appeared in either the SPM ontology or the TDWG Ontology. The problem proved to be that the Plazi XSLT was generating incorrect RDF for the _hasRelationship_ object property of _TaxonConcept_ where it expected a Relationship object, which is one of the low level classes in the TDWG ontology.
|
||
|
We were intending to model not only what taxa were associated with the
|
||
|
_!TaxonConcept_ being described (as supported by SPM
|
||
|
_!InfoItems_), but also what those associations are. (The SPM
|
||
|
annotations give predator-prey relationships as an example.) The
|
||
|
result was that the instance document used _TaxonConcept_ as both a
|
||
|
Property and a Class, and this forces the instance document into OWL
|
||
|
Full. Moreover, the underlying set of kinds of taxonomic relationships
|
||
|
available to _tc:hasRelationship_ is presently defined by an
|
||
|
enumeration that arose historically from a set of concerns of
|
||
|
taxonomists, largely about the nomenclatural issues surrounding
|
||
|
taxonomic revisions. This is nowhere near broad enough to cover the
|
||
|
kinds of _Associations_ envisioned in SPM, which includes such things
|
||
|
as predator/prey and other ecological relationships. Pending future
|
||
|
additions to !TaxonX, the underlying schema representing the documents
|
||
|
from which we extract SPM-based knowledge, we are no longer attempting
|
||
|
to output _tc:hasRelationship_.
|
||
|
|
||
|
_Conclusions and Recommendations about Issue 2:_
|
||
|
|
||
|
(1) The SPM concept _associatedTaxon_ is underspecified. It does not
|
||
|
provide a robust mechanism for specifying the nature of the
|
||
|
association. It is possibly that this can be remedied with a robust
|
||
|
appeal to _tc:hasRelationship_, although that presently has overly
|
||
|
narrow range.
|
||
|
|
||
|
(2) Clear goals for reasoning support for SPM should be elucidated.
|
||
|
|
||
|
*Issue 3a.* Some vocabulary items in SPMI lacked definition or guidance
|
||
|
for their use. For example, the SPMI ontology defines a set of
|
||
|
sublasses of the SPM _InfoItem_ class, of which one or more instances
|
||
|
is given for an SPM object using the _hasInformation_ property of
|
||
|
SPM. One such type of _InfoItem_ is the _Description. _But this term
|
||
|
is rather broadly used in biology. In systematics literature it is
|
||
|
ambiguous whether the concept should apply to the entire section
|
||
|
designated as the taxonomic treatment of a taxon in the article, or
|
||
|
should refer only to the morphological description section. By
|
||
|
practice or by nomenclatural codes, the morphological description
|
||
|
section serves, strictly speaking, only to determine which specimens
|
||
|
are circumscribed by that morphological description. We addressed this
|
||
|
ambiguity with a user-settable parameter in the stylesheet which
|
||
|
determines which of these is extracted. We offer a service parameter
|
||
|
that allows the client to determine whether they wish a narrowly(
|
||
|
i.e. morphology only) or broadly defined description.
|
||
|
|
||
|
*Issue 3b.* Insufficient SPMI concepts. Anyone providing data in SPM
|
||
|
faces a potential mismatch between domain concepts and those SPMI
|
||
|
classes they select to represent the domain classes. SPM can
|
||
|
address this by adding more types of _InfoItems_, but this will tend
|
||
|
to increase the complexity in creating and processing SPM. Conversely,
|
||
|
SPM could decrease the number of concepts and heighten ambiguity. For
|
||
|
example, we found no way to signal the important "Materials Examined"
|
||
|
section of typical systems papers. This might make it difficult to
|
||
|
mine our service for occurrence records.
|
||
|
|
||
|
*Issue 3c.* Potentially overlapping SPMI classes. There are three
|
||
|
different concepts in SPMI about description. These are the
|
||
|
_InfoItem_ subclasses _Description_, _GeneralDescription,_ and
|
||
|
_DiagnosticDescription._ Lacking definitions it is impossible to
|
||
|
determine what relations these have to one another.
|
||
|
|
||
|
_Conclusions and Recommendations about Issue 3: _
|
||
|
|
||
|
(1) There should be more guidance about the semantics of
|
||
|
_InfoItem_. Right now, they are little more than concept names. By
|
||
|
virtue of having no substructure other than what is inherited from
|
||
|
class _InfoItem_, these concepts are able to express little more than
|
||
|
the taxonomic concerns modeled by the class _TaxonConcept_, which are
|
||
|
probably of little importance for many of the subclasses of
|
||
|
_InfoItem_.
|
||
|
|
||
|
(2) Consideration should be given to major ontological elucidation of
|
||
|
the substructures of the InfoItem subclasses, with particular
|
||
|
attention to existing relevant ontologies.
|
||
|
|
||
|
*Issue 4.* Should text extracted from publications permit or require
|
||
|
markup? At the moment, we offer the choice as a runtime parameter, to
|
||
|
signify whether the service should return plain text or XHTML. Current
|
||
|
use for by EOL chooses the XHTML in order to render paragraph
|
||
|
boundaries faithfully to the original literature.
|
||
|
|
||
|
_Conclusions and Recommendations about Issue 4:_
|
||
|
|
||
|
We have no recommendation beyond leaving the issue as a service parameter.
|
||
|
|
||
|
*Issue 5.* How to handle statements of Intellectual Property
|
||
|
Rights. Taxonomic treatment data is in the public domain and not
|
||
|
copyrightable. EOL's practices required a Creative Commons license,
|
||
|
but such licenses (or any license) applies only to copyrightable
|
||
|
material. We insert an RDF statment a statement that the material has
|
||
|
no copyright restrictions:
|
||
|
|
||
|
<font face="Courier New"><<span class=start-tag>dcterms:rights</span><span class=attribute-name> xmlns:dcterms</span>=<span class=attribute-value>"http://dublincore.org/2008/01/14/dcterms.rdf#"</span>></font>No known copyright restrictions.<font face="Courier New">.</<span class=end-tag>dcterms:rights</span>>
|
||
|
</font>
|
||
|
We discussed whether more clarity is required about attribution of
|
||
|
non-copyrightable material. Should there be both a text statement
|
||
|
and a machine processable indication that the material is in the
|
||
|
public domain because it is not copyrightable? How should consumers
|
||
|
be warned that the non-copyrightable material is extracted from
|
||
|
copyrighted material which still requires attribution. The issues
|
||
|
are laid out in Agosti and Egloff (2009:
|
||
|
(http://www.biomedcentral.com/1756-0500/2/53). The current solution
|
||
|
to be adopted by EoL is to output the text mentioned above in our
|
||
|
dc:rights term.
|
||
|
|
||
|
*Issue 6. * Completeness and adequacy of data provided. It's unclear
|
||
|
how much detail the data provider should offer a data
|
||
|
recipient. For example, it may be evident to a human that the object
|
||
|
"Donisthorpe, H. S. J. K." of the _tpcit:authorship_ predicate is
|
||
|
the name of a person, that "Donisthorpe" is a surname, etc. This
|
||
|
semantics may be available through an ontology but not be of
|
||
|
interest if the recipient has no need of machine reasoning or even
|
||
|
integrating across authors. It's difficult to know at what point
|
||
|
enough information has been provided satisfy the data recipient's
|
||
|
purposes. We serve whatever data we found that is expressible in
|
||
|
the vocabularies commonly in use in TDWG applications.
|
||
|
|
||
|
_Conclusions and Recommendations about Issue 6:_
|
||
|
|
||
|
Educate consumers to the possibility that implict information can be
|
||
|
inferred by machine reasoning over the applicable ontologies, and
|
||
|
applications that don't do this can only have access to the explictly
|
||
|
asserted relationships.
|
||
|
|
||
|
*Issue 7.* Open World Issues. The Open World assumption (now often
|
||
|
described as the AAA slogan: Anybody can say Anything Anywhere )
|
||
|
means that some issues cannot be addressed by the data being served.
|
||
|
AAA means that everything is unknown unless explicitly known. Should
|
||
|
"unknown" be signaled in some cases? For example, a taxonomic
|
||
|
description might be extracted from something whose author is
|
||
|
unknown. Normally RDF would simply be silent on this point, but it
|
||
|
may be important to distinguish that a piece of data is important but
|
||
|
simply unknown. There is a risk in assigning "unknown" to something
|
||
|
which in fact is possibly somewhere known. That risk is that future
|
||
|
semantic data integration with data contradicting the "unknown"
|
||
|
semantics will then be logically inconsistent. Unfortunately, in the
|
||
|
First Order Logic that underlies RDF reasoning, if there is one
|
||
|
contradiction in a set of assertions, it can be proved that every
|
||
|
assertion is both true and false. This is not nice.
|
||
|
|
||
|
_Conclusions and Recommendations about Issue 7:_
|
||
|
|
||
|
Best practices should be established about unknown data. Probably the
|
||
|
community needs to be educated about AAA. A possible best practice
|
||
|
is to use RDF annotations when signifying "unknown" is desired.
|
||
|
These can be read by machines (and humans) but do not participate in
|
||
|
semantic analysis.
|
||
|
|
||
|
*Issue 8.* Updates: It is unclear how to handle URI's assigned to
|
||
|
different versions of the same SPM record. Should a URI resolve one
|
||
|
record regardless of what information is in it, or should each
|
||
|
version have it's own URI. Like most data providers, we largely
|
||
|
ignore this issue, although we do embed an XML comment with a service
|
||
|
timestamp on it.
|
||
|
|
||
|
_Conclusions and Recommendations about Issue 8:_
|
||
|
This is probably a general problem for RDF and should be the subject
|
||
|
of a uniform best practice. There is a recent GBIF workgroup report
|
||
|
on the subject. (Cryer et al. 2009)
|
||
|
|
||
|
|
||
|
*Issue 9.* Strings or URIs: As a data provider we sometimes faced the
|
||
|
choice of providing a URI or a string value for much of the data. In
|
||
|
principle, a URI should be sufficient but in practice it is helpful
|
||
|
to have both e.g., for scientific names. In the absence of guidance
|
||
|
from the data consumer it is impossible to know what is necessary or
|
||
|
sufficient. Other examples that SPM does not directly address, and
|
||
|
for which there seem to be no authorities presently recommended,
|
||
|
include URIs for taxonomies, ranks within those taxonomies, authors,
|
||
|
journals, articles, etc. Some of the issue is addressed by SPM's
|
||
|
provision of both _hasContent_ and _hasValue_ properties. The former
|
||
|
provides strings, and the latter provides objects from the TDWG
|
||
|
Ontology class _definedTerm_. The only case in which we might have
|
||
|
been able to use _definedTerm _would be to build some application
|
||
|
that attempts to place the publication's taxonomic rank in some
|
||
|
named taxonomy. We deemed that outside of the scope of this work,
|
||
|
particularly since a client might choose to ignore it and use their
|
||
|
own preferred taxonomy.
|
||
|
|
||
|
Elsewhere, we provide both strings and URIs where the publication is
|
||
|
unambiguous. See for example, the element _spm:aboutTaxon_ in the
|
||
|
first table above. For its target _!TaxonConcept_, we provide a
|
||
|
URI-identified rdf:about as required, but _!TaxonConcept _also has
|
||
|
an element _nameString_ with which we provide a string that should
|
||
|
correspond to a scientific name. An integrating provider such as EOL
|
||
|
possibly would choose to ignore the URI and base their integration
|
||
|
on the name string.
|
||
|
|
||
|
_Conclusions and Recommendations about Issue 9:_
|
||
|
|
||
|
Unless a consumer has specified preferences, whenever possible include
|
||
|
both string and URI values. It may be that best practices need to be
|
||
|
established for doing this in ways specific to SPM, or even to
|
||
|
individual SPMI _!InfoItems_.
|
||
|
|
||
|
*Issue 10:* Multiple identifiers: resources may have multiple ids in
|
||
|
multiple GUID schemes associated with them.
|
||
|
|
||
|
_Conclusions and Recommendations about Issue 10:_
|
||
|
|
||
|
SPM should specify means to associate multiple ids with the same
|
||
|
resource. It may be that owl:sameAs is adequate, but use cases
|
||
|
should be developed and the semantics of owl:sameAs examined to see
|
||
|
if it satisfies them. This may be in the scope of (Cryer et
|
||
|
al. 2009)
|
||
|
|
||
|
|
||
|
*Issue 11:* It is unclear how the data provider is to explain the
|
||
|
intended meaning behind possibly ambiguous sets of statements. For
|
||
|
example- A taxon name string may be provided twice with different
|
||
|
languages, for example English or Latin. In this case it's to be
|
||
|
understood that the name can be in either Latin or English but
|
||
|
depending on the consuming applications' reasoning -the first may be
|
||
|
taken as the primary, the second as the second. But the generated
|
||
|
RDF would usually be order independent, making it difficult to
|
||
|
track.
|
||
|
|
||
|
_Conclusions and Recommendations about Issue 11:_
|
||
|
|
||
|
SPM should specify mechanisms and practices that allow a provider to
|
||
|
signify relationships among alternatives. rdf:List may not be
|
||
|
adequate if statements appear independently of one another (for
|
||
|
example, after data integration).
|
||
|
|
||
|
|
||
|
*Issue 12:* Lack of Metadata about the served SPM: We found no clear
|
||
|
way to document within the SPM file how the SPM itself was
|
||
|
produced. We resorted to XML comments, but it is unclear whether
|
||
|
some standard RDF annotation mechanism might be better. Of special
|
||
|
importance might be provenance of the SPM, including original
|
||
|
source, changes, versions, etc.
|
||
|
|
||
|
_Conclusions and Recommendations about Issue 12:_
|
||
|
|
||
|
There should be best practices established for annotating service
|
||
|
output, and it should be examined whether SPM has any specific
|
||
|
needs.
|
||
|
@
|
||
|
|
||
|
|
||
|
1.5
|
||
|
log
|
||
|
@none
|
||
|
@
|
||
|
text
|
||
|
@d1 1
|
||
|
a1 1
|
||
|
%META:TOPICINFO{author="BobMorris" date="1257444406" format="1.1" version="1.5"}%
|
||
|
d18 252
|
||
|
a269 1
|
||
|
* Repeated targets of the same hasInformation: Suppose an info item, say Range, has multiple values, e.g. representing that the range of a certain species includes Germany, France, and Belgium. Does this require three hasInformation triples (I suppose the answer is yes) and how do we guarantee that they are all talking about the same thing (I suppose the answer is use owl:sameAs). -- Main.BobMorris - 20 Jul 2007@
|
||
|
|
||
|
|
||
|
1.4
|
||
|
log
|
||
|
@none
|
||
|
@
|
||
|
text
|
||
|
@d1 1
|
||
|
a1 1
|
||
|
%META:TOPICINFO{author="BobMorris" date="1184925849" format="1.1" reprev="1.4" version="1.4"}%
|
||
|
d5 2
|
||
|
d18 1
|
||
|
a18 1
|
||
|
* Repeated targets of the same hasInformation: Suppose an info item, say Range, has multiple values, e.g. representing that the range of a certain species includes Germany, France, and Belgium. Does this require three hasInformation triples (I suppose the answer is yes) and how do we guarantee that they are all talking about the same thing (I suppose the answer is use owl:sameAs). -- Main.BobMorris - 20 Jul 2007
|
||
|
@
|
||
|
|
||
|
|
||
|
1.3
|
||
|
log
|
||
|
@none
|
||
|
@
|
||
|
text
|
||
|
@d1 1
|
||
|
a1 1
|
||
|
%META:TOPICINFO{author="GregorHagedorn" date="1184246358" format="1.1" version="1.3"}%
|
||
|
d14 3
|
||
|
a16 1
|
||
|
* Rendering Instance Documents: What are the recommended tools for rendering instance documents? Does this depend on the above? -- Main.BobMorris - 14 May 2007@
|
||
|
|
||
|
|
||
|
1.2
|
||
|
log
|
||
|
@none
|
||
|
@
|
||
|
text
|
||
|
@d1 1
|
||
|
a1 1
|
||
|
%META:TOPICINFO{author="BobMorris" date="1179150630" format="1.1" version="1.2"}%
|
||
|
d5 1
|
||
|
a5 2
|
||
|
---
|
||
|
---
|
||
|
d7 3
|
||
|
a9 1
|
||
|
* RecommendedInstanceForm: Should an instance document be valid OWL?
|
||
|
d13 2
|
||
|
a14 4
|
||
|
|
||
|
---
|
||
|
---
|
||
|
* RenderingInstanceDocuments: What are the recommended tools for rendering instance documents? Does this depend on the above? -- Main.BobMorris - 14 May 2007@
|
||
|
|
||
|
|
||
|
1.1
|
||
|
log
|
||
|
@none
|
||
|
@
|
||
|
text
|
||
|
@d1 1
|
||
|
a1 1
|
||
|
%META:TOPICINFO{author="BobMorris" date="1178712020" format="1.1" reprev="1.1" version="1.1"}%
|
||
|
d8 1
|
||
|
a8 1
|
||
|
* Recommended form: Should an instance document be valid OWL?
|
||
|
d15 1
|
||
|
@
|