wiki-archive/twiki/data/InvasiveSpecies/SampleDefinitions.txt

88 lines
8.4 KiB
Plaintext

%META:TOPICINFO{author="BobMorris" date="1187669695" format="1.1" version="1.3"}%
%META:TOPICPARENT{name="TerminologyWorkingGroup"}%
---+++Some Sample Definitions
(by Kevin Thiele)
%TABLE{tablerules="all"}%
|*Source*| | | |K.R. Thiele||||||||||| |CBD| |Thomson 1991 (from GISP)| |GISP|
|*Term*| | | |Alien| |Invasive| |Introduced| |Native| |Adventive| |Cryptogenic| |Alien| |Invasive| |Invasive|
|Origin| | | | | | | | | | | | | | | | | | | | |
| | |Outside polygon of interest| |Yes| |Yes| |Yes| | | |Yes| | | |Yes| |Yes| |Yes|
| | |Inside polygon of interest| | | | | | | |Yes| | | | | | | | | | |
| | |Not relevant| | | | | | | | | | | | | | | | | | |
| | |Not known| | | | | | | | | | | |Yes| | | | | | |
|Human agent in range change?| | | | | | | | | | | | | | | | | | | | |
| | |Yes| |Yes| |Yes| | | | | |Yes| | | | | | | |Yes|
| | |No| | | | | | | | | | | | | | | | | | |
| | |Not relevant| | | | | |Yes| | | | | | | |Yes| |Yes| | |
| | |Not known| | | | | | | | | | | |Yes| | | | | | |
| | |Not applicable| | | | | |Yes| | | | | | | | | | | | |
|Impact| | | | | | | | | | | | | | | | | | | | |
| | |Has negative impact (at least sometimes)| | | | | | | | | | | | | | | | | | |
| | |Does not have negative impact| | | | | | | | | | | | | | | | | | |
| | |Not relevant| |Yes| |Yes| |Yes| |Yes| |Yes| |Yes| |Yes| |Yes| |Yes|
| | |Not known| | | | | | | | | | | | | | | | | | |
|Establishment| | | | | | | | | | | | | | | | | | | | |
| | |Established| | | |Yes| | | |Yes| | | |Yes| | | |Yes| |Yes|
| | |Not established| | | | | | | | | | | | | | | | | | |
| | |Not relevant| |Yes| | | |Yes| | | |Yes| | | |Yes| | | | |
| | |Not known| | | | | | | | | | | | | | | | | | |
|Geographic Range| | | | | | | | | | | | | | | | | | | | |
| | |Widespread and/or spreading| | | |Yes| | | | | | | | | | | |Yes| |Yes|
| | |Not widespread, not spreading| | | | | | | | | |Yes| | | | | | | | |
| | |Not relevant| |Yes| | | |Yes| |Yes| | | |Yes| |Yes| | | | |
| | |Not known| | | | | | | | | | | | | | | | | | |
|Ecological Range| | | | | | | | | | | | | | | | | | | | |
| | |Restricted to human-modified habitats| | | | | | | | | | | | | | | | | | |
| | |Occurs in natural habitats| | | |Yes| | | | | | | | | | | | | |Yes|
| | |Not relevant| |Yes| | | |Yes| |Yes| |Yes| |Yes| |Yes| |Yes| | |
| | |Not known| | | | | | | | | | | | | | | | | | |
---
---++ Commentary
Main.BobMorris wrote this commentary on Kevin's scheme in email to Main.AnnieSimpson and Main.KevinThiele.
Kevin's spreadsheet is a representation of the first part of an _ontology_, namely the terminology. It's enough for humans, but machines
usually need a little more to make good use of the ontology in deciding whether two databases use of a term are comparable for various purposes.
Nevertheless, specialists could and should start with it because it is well thought out, easy to understand, and _at least_ as much as will
ultimately be needed to resolve terminological differences between databases. Note that there are in fact four ontologies illustrated here: "K.R. Thiele", "CBD" , "Thomson 1991 (from GISP)" and "GISP". The point of Kevin's scheme is to make them comparable, as much as possible.
His spreadsheet expresses "concepts" (e.g. 'invasive', 'alien', etc.), by telling the values they have for a small set of _properties_ ('Origin',
'Impact', ...). For each property, there are a small set of possible _values_ to be used in these descriptions. Kevin's hope, which is probably
very reasonable, is that with enough discussion, the discussants will agree that a small list of properties and values are expressive enough
that most data providers can express the list of concepts coded in their own databases using just those properties and values the community
agrees upon. Thus,rather than requiring the community of data providers (and consumers) to agree on a a single set of concepts, they instead
agree on a set of terms that are used to _express_ concepts. In turn, application writers can decide for themselves how to rationalize comparisons of
those concepts. I might write an application that enshrines a notion like "If two concepts agree on the majority of their property-value
pairs, the program will treat them as the same". You might write an application that is more conservative and enshrines a requirement that
two concepts must agree on 80% of their property-value pairs before considering them the same concept. Depending on the ultimate
representation of these property-value lists, more sophisticated machine reasoning is possible than just conclusions of the form "For my purposes
I can assume that CBD 'invasive' is equivalent to Thiele 'alien'+'invasive'".
I understand that Kevin believes his property-value terms are a start, not an end, for discussion among the specialists and other interested
parties, but regardless of this there are a few technical issues about how and where to represent such ontologies that the biologists shouldn't
take a position about. In particular, there are arguments about whether the property-value terms belong in the schema or in some external document, and
whether particular concept lists (the "definitions part of a database") belong in the data to be exchanged or in external documents. Those
debates should not hinder specialists from discussing Kevin's notions and initially representing them in a spreadsheet as he has. The most
important hidden technical issue is the extent to which proliferating terms also proliferates computation time unacceptably. What one desires is that if
the terminology is increased N-fold, the time that applications need to compare data from different databases increases at most N-fold. Alas, it
is easy in the ontology game for proliferation of terminology to become exponential in computation. For example, 10 times as many terms and
concepts could lead not to 10 times as much computation, but 2 to the 10th power---1024 times as much computation. So there will always be a
tension between adequacy of expression and small size---the two central disiderata of ontologies. There is a little bit of mathematical ontology
theory that allows one to tell whether you are in this exponential situation, but I have the impression that for the few interesting
ontologies coded in other ecology projects, the situation is, sadly, exponential and the advice is to try to limit terminology size if possible.
The good news about all this is that it is a well explored and very current topic (it is presently high on the radar of the TDWG
Architecture Group (TAG), and also of the GBIF Data Access and Data Integration (DADI) group, which are debating various standards to adopt
for ontology. The bad news is that ontology tools are very sophisticated, but not very suitable for other than those who make a
living crafting ontologies. The challenge to the GISIN technical group will be to make a representation that is nimble enough to get somethng
going in a short timeframe, yet limber enough to slip out and replace if TDWG decides on standards for ontologies. I'm pretty sure this is
possible if we find that some large fraction of the target databases can get along with ontologies that are, oh, say, significantly smaller than
the databases themselves, and significantly fewer in number than the number of databases!
In summary, my opinion is that initially, people should not be debating the concepts ("Invasive", "alien", ...) yet , but rather whether the primitive properties and values Kevin provides are powerful enough to express most of the concepts represented by the databases whose data would form part of GISIN.
An interesting document for that discussion might also be the [[http://www.mnla.com/pdf/invasive/MIPAG_final_050325_rev.pdf][Massachusetts Criteria for Evaluating Non-Native Plant Species for Invasiveness]] pointed out by Main.JenniferForman, who will shortly post some other things that comprise what amount to ontologies.
-- Main.BobMorris - 15 Mar 2006