---+!! %TOPIC%
%META:TOPICINFO{author="GregorHagedorn" date="1100280420" format="1.0" version="1.13"}%
%META:TOPICPARENT{name="LinneanCore"}%
(Of course only an *attempt* to disentangle some issues of name, concept, hierarchy! Please discuss, add, and feel free to insert short comments immediately into the text!):
<h2>Multiple names for a taxon concept</h2>
Scientific organism name-strings (with or without nomenclatural or concept citation) and taxon concepts (in the sense of conveying a concept in the real world, i.e. through a circumscription by means of character description or listing the classified objects) have an n&nbsp;:&nbsp;m relationship: the same scientific organism name may in practice have multiple circumscriptions, and the same circumscription may have multiple scientific organism names.
* Main.JessieKennedy - 12 Nov 2004: I agree with this statement given your definition of taxon concept - however this is not the definition TCS uses as taxon concept (perhaps we were wrong sticking with the name taxon concept when we seem to use it differently to others). You are relating name to circumscription - which if you look in TCS you will find that's what we have, but we don't treat circumscription as an object in its own right nor name as an object in its own right, for the simple reason that in the real world of people we can't talk about names without implying a circumscription (meaning) of some sort and we can't communicate about circumscriptions without using some sort of name - so they need each other - hence the definition of a taxon concept in TCS being the name+circumscription (definition).
When using "scientific organism name" here I refer to a nomenclatural object, the identity of which is defined by nomenclatural rules and for which many name-string-variants may exist. Note that the nomenclatural rules may be by convention rather than by ICBN/ICZN etc. codes. For example, forma specialis or race is not governed by any code, yet <i>"Genus species f. sp. capsici"</i>, <i>"Genus species forma spec. capsici"</i>, and even <i>"Genus species f. sp. capsicum"</i>, or "race 1A", "race 1 A", "r. 1a", "Rasse 1a" can all be recognized as referring to the same name object, respectively.
* JMS: Relationship between name string and the target object is n : m, but name as relationship between name string and the target object is unique if this 'usage' is referred to in a context where the name string was used as a name. -- JMS 26 Oct 2004
* JMS: Suppose a ceremony, where the President of the federation says "I name this spaceship as Enterprise". Here the President makes a relationship between the object (the spaceship) and the name string ("Enterprise"). Name is this relationship (the spaceship-"Enterprise") made under the naming action. The spaceship may have another name, e.g. NCC-1701. This is another relationship between the spaceship and another name string ("NCC-1701"). They are different names of the same spaceship because the name string parts are different. The President may own his space-fuel company called the Enterprise. This company Enterprise shares the same name string with the spaceship, but it is a different name because the object part of the name (as relationship) is different from the spaceship's name (as relationship). The President's use of the name string "Enterprise" designates a different target depending on context. If he's reading the Financial Times, he would mean the company. -- 27 Oct 2004
* Gregor: thanks; the concept of name-string-object-relation is very helpful. I do find your redefinition of the term "name" confusing. This may be a grammar question and different in Japanese. For German and English I feel that if I say "my name is Gregor Hagedorn" I express the relation between me and an instance of the class "names". I do not say "My name-string is ...". I do not object to "name-string", but I think to avoid confusion with what I consider "normal language use", one should disambiguate the other term as well. Perhaps "named object", "name string", "naming relation" may make arguments clearer. -- 28 Oct 2004
* JMS: Your explanation shows that the word name is degenerated, rather than name as a class. I agree with you that we had better avoid the term name by using clearer terms. This is the reason why we used the terms name usage or name record to distinguish it from name string, and why I often ask what you mean by the word name. LinneanCore covers Linnean-ish name strings and their variants, such as typographical errors. TCS covers the naming relation and also non-Linnean-ish names. Named object would be covered by both TCS and ABCD (and SDD?); specimen ID is another name string type. Is this the right understanding? -- 28 Oct 2004
* Richard Pyle: I tried to address this point in my earlier email message to the group. The word "name", without qualification, should be banned from these discussions. I propose some definitions at LinneanCoreDefinitions. - 29 Oct 2004
* Main.JessieKennedy - 12 Nov 2004: On Gregor's example above: do you mean that different abbreviations of e.g. f., forma etc. in two names are what you mean by a concept having many names? I don't think that treating different variations of these terms as being different names is useful. Surely amongst those who know we could devise simple look ups for possible representations of these terms but agree one of them as the standard? Do the codes specify one as the preferred way? If so then I see no harm in always presenting names using the preferred abbreviation of the term.
* Gregor: The codes provide a number of canonicalizations, and in the case of the rank connecting terms you are referring to, automatic (machine) canonicalization is quite possible. In other cases it is really tough and not a question of "surely we could". The grammatical changes require detailed understanding of Latin and Greek grammar, and I find myself sometimes guessing that this must be a grammatical correction and then it turns out it is indeed a different name. However, what I am really referring to are the nomenclatural relations like comb. nov., nom. nov., etc. that do not change the concept at all. Recognizing this must be data driven (see examples further below detailing this). -- 28 Oct 2004
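The machine canonicalization of rank connecting terms mentioned above could be sketched roughly as follows. This is only a minimal illustration in Python: the lookup table RANK_VARIANTS and the function canonicalize_rank_terms are assumptions made for this example and are not taken from any nomenclatural code or from LinneanCore. Epithets are deliberately left untouched, because grammatical variants such as capsici/capsicum cannot be resolved by a simple lookup.

```python
import re

# Hypothetical lookup table: one agreed form per rank-connecting term,
# each with some variant spellings. The entries are illustrative only.
RANK_VARIANTS = {
    "f. sp.": ["forma specialis", "forma spec.", "f. spec.", "f.sp."],
    "subsp.": ["subspecies", "ssp."],
    "var.": ["varietas"],
    "f.": ["forma"],
}

# Map every variant spelling to its agreed form.
_AGREED = {
    variant: agreed
    for agreed, variants in RANK_VARIANTS.items()
    for variant in variants
}

# Longest variants first, so that "forma spec." wins over plain "forma".
# The lookarounds keep the pattern from matching inside longer tokens.
_PATTERN = re.compile(
    r"(?<![\w.])(?:"
    + "|".join(re.escape(v) for v in sorted(_AGREED, key=len, reverse=True))
    + r")(?!\w)"
)


def canonicalize_rank_terms(name_string: str) -> str:
    """Replace known rank-term variants; genus and epithets stay as given."""
    collapsed = " ".join(name_string.split())  # normalize whitespace first
    return _PATTERN.sub(lambda m: _AGREED[m.group(0)], collapsed)


if __name__ == "__main__":
    for variant in ("Genus species f. sp. capsici",
                    "Genus species forma spec. capsici",
                    "Genus  species f.sp. capsici"):
        print(canonicalize_rank_terms(variant))
```

A lookup like this only covers the rank terms. Recognizing that two different name strings are linked by comb. nov. or nom. nov. still needs data, as argued above.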
Why is this so?
The following is largely influenced by my botanical background, but in principle I believe most applies to zoology as well.
1. Spelling issues, founded or unfounded. Some go back to the original orthography. Is it <i>'Evonymus'</i> or <i>'Euonymus'</i>? <i>Haplospora</i> or <i>Aplospora</i>? Sometimes grammatical corrections are unfounded: <i>Capnodium citricola</i> <nop>McAlpine is correct, <i>Capnodium citricolum</i> <nop>McAlp. wrong (...cola = growing-on never changes with genus).
* Main.JessieKennedy - 12 Nov 2004: surely the way it was first published (original orthography elsewhere on the wiki) is the name of the original concept as it was described, defined and named (for better or worse) and should therefore always be recorded as such. Any subsequent corrections to that name would generate a new concept (name change therefore new concept according to TCS) which would be according to someone but would have a relationship to the original for the definition unless the according to person was also redefining the circumscription at the same time.
* Gregor: I think the term "surely" can be deceptive. No, this is surely not so in botany. Zoology is close to what you assume, but the canonicalization rules in botany require authoritative grammar and spelling corrections of a name. -- 28 Oct 2004
2. Author abbreviations. Although attempts to standardize author abbreviations exist for some taxonomic domains, applying these standards is very labor intensive, and they may often cause more confusion when used to guess the correct abbreviation for non-standard shorthand abbreviations found in the data sources (specimens, literature, etc.).
* Main.JessieKennedy - 12 Nov 2004: Author abbreviations could, with some work, be treated like variations in abbreviations of sp., subsp. etc. with a preferred variation taken. Re-writing a name Bellis perennis L. or Bellis perennis Linn. isn't different in my opinion or not in any important way if we know all the different variations for Linnaeus. (As an aside - when working on Prometheus we recognised this and tried to build in our system (not in the data) a new type applicable to authors so they would get a single GUID but have different name representations to simplify the issue of matching on names). So, I believe we should concentrate on original concepts and thereby names, i.e. those as used when taxa were defined first. This clearly doesn't deal with all the mis-spelled references to taxon names used in identification or in the general literature - but I think this is a secondary problem that could be more easily dealt with if we sorted out what the taxonomists meant first rather than trying to resolve the meaning and relationship between what anyone might've called a taxon or organism.
* Gregor: You would be correct if there were a 1:n relation between authors and abbreviations, but in fact it is n:m, i.e. non-canonical abbreviations are ambiguous and can be resolved only in the context of a name. I entirely agree with you that this is the issue of name-variants, just like spelling variants, but I disagree that the solution *for existing data* (not for the new biology using GUIDs and starting from 0) is trivial. From my own work I am pretty certain we need data to deal with this. -- 28 Oct 2004
3. A minor issue is that in the case of infraspecific taxa, the taxon author may be given for both the species and the infraspecific taxon. The canonical recommended form in botany is to give authors only for the lowest ranking infraspecific taxon.
4. A scientific name is often a mixture of hierarchy information and name information. This is one of the reasons, why multiple names for the same taxon concept exist. Indications of hierarchy exist in three places:
a) between genus and species:<br />
<i>Cortinarius olidus</i> J.E. Lange<br />
= <i>Cortinarius (Phlegmacium) olidus</i> J.E. Lange<br />
= <i>Cortinarius</i> (subgen. <i>Phlegmacium</i>, sect. <i>Elastici</i>) <i>olidus</i> J.E. Lange
b) between species and lowest infraspecific rank accepted by the code:<br />
<i>Saxifraga aizoon subf. surculosa</i> Engl. &amp; Irmsch.<br /> (citing only the subforma is the recommended canonical form of a botanical name)<br />
= <i>Saxifraga aizoon var. aizoon subvar. brevifolia f. multicaulis subf. surculosa</i> Engl. &amp; Irmsch.
c) at the genus level itself<br />
<i>Microbotryum violaceum</i> (Pers.) G. Deml &amp; Oberw.<br />
= <i>Ustilago violacea</i> (Pers.) Roussel
5. The above does not preclude that a scientific name is cited in a form (using "secundum" or "sensu") that explicitly indicates a referred taxon concept. But even this is not necessarily unique, since concept citations are often given in a highly abbreviated form defined by historical context (s. str., s. lat., s. latissimo, etc.).
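Cases a) and b) of point 4 can in principle be reduced by string handling alone, which the following minimal Python sketch tries to show. The function strip_hierarchy and the set INFRASPECIFIC_RANKS are hypothetical names introduced for this example. The sketch assumes that the name string is given without authors (an author abbreviation such as "f." would otherwise be mistaken for a rank term) and that rank terms are already in one agreed spelling.

```python
import re

# Assumed spellings of the infraspecific rank terms (illustrative only).
INFRASPECIFIC_RANKS = {"subsp.", "var.", "subvar.", "f.", "subf."}


def strip_hierarchy(name: str) -> str:
    """Reduce an author-less name string to its identity-carrying parts."""
    # a) Drop a parenthetical placed directly after the genus,
    #    e.g. "(Phlegmacium)" or "(subgen. Phlegmacium, sect. Elastici)".
    name = re.sub(r"^(\S+)\s*\([^)]*\)\s*", r"\1 ", name.strip())
    tokens = name.split()

    # b) Keep genus and species plus only the lowest rank term
    #    and the epithet that follows it.
    rank_positions = [
        i for i, token in enumerate(tokens)
        if i >= 2 and token in INFRASPECIFIC_RANKS
    ]
    if rank_positions:
        lowest = rank_positions[-1]
        tokens = tokens[:2] + tokens[lowest:lowest + 2]
    return " ".join(tokens)


if __name__ == "__main__":
    print(strip_hierarchy("Cortinarius (Phlegmacium) olidus"))
    print(strip_hierarchy(
        "Saxifraga aizoon var. aizoon subvar. brevifolia "
        "f. multicaulis subf. surculosa"
    ))
```

Case c) is different: nothing in the strings <i>Microbotryum violaceum</i> and <i>Ustilago violacea</i> shows that they share a type, so that link has to come from recorded nomenclatural relations rather than from parsing.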
<h2>Name hierarchy and name identity</h2>
Point 4 above addresses the issue of hierarchy information. Linnean scientific names are a mixture ("entanglement") of expressions of hierarchy and nomenclatural-object-identity. The circumscription of a taxon concept is always specific to the lowest rank. However, some name parts/elements confer both hierarchy and identity. Those only conferring hierarchy (subgenus etc.) are redundant in a canonical name, those also conferring identity are not. Genus names must be globally unique within each nomenclatural code, but e.g. subgenus names are only locally unique, requiring the genus for identity (the name must be a "combination"; see ICBN 6.7, e.g. "Arytera sect. Mischarytera"). Similarly, infraspecific epithets are only unique within a species. They are, however, not restricted to their immediate infraspecific hierarchy, i.e. a variety in a subspecies must be unique for the entire species, not just for the subspecies.
Hierarchical information does *not* change the circumscription for the purpose of identification or comparing property data about taxa. It does change the "concept" in a sense where the term concept implies the entire hierarchy. Since only few use cases make use of specific hierarchies for individual taxa (as opposed to the importance of a hierarchy in general, which may however be applied to taxon concepts defined elsewhere) - I think these issues should be separated.
* JMS: Hierarchy may affect the organisation of a circumscription, as in an identification key. Does it mean a change of concept? If the circumscription is complete, i.e. it enables distinction of the target taxon from everything else in the world, then the circumscription can be hierarchy independent. If it is somewhat differential, it depends on hierarchy. -- 26 Oct 2004
* Gregor: Good point. I am thinking a bit too much in terms of SDD, where the description independently defines a "description inheritance" as part of descriptive data. However, is it not that the hierarchy informs only about the incompleteness of a differential diagnosis? Moving the species into a new genus requires someone applying the base-concept to understand that the original differential circumscription has to be interpreted in the context of the original genus. I still believe that the recombination does not change the taxon concept itself. That is, the number of instance objects (specimens) classified as belonging to the concept class should be identical before and after the recombination. -- Gregor - 26 Oct 2004
* JMS: Conservation of number of specimens in recombination is unnecessary. Specimens may be added to the class safely if they satisfy the circumscription. Suppose moving <i>Aus xus</i> to <i>Bus xus</i>. Circumscription of the <i>xus</i> can be represented by a set of character states, or attributes, Attr(the <i>xus</i>) = {...complete list of attributes...}. If reassignment of <i>xus</i> does not change the taxon, it can be represented by different subsetting of the Attr(the <i>xus</i>): Attr(the <i>xus</i>) = {Attr(<i>Aus</i>), and the remainder...} = {Attr(<i>Bus</i>), and the remainder...}. These remainders are different in the two subsettings, of course. Although I'm not confident that all recombinations satisfy this requirement, this separation between hierarchy and (complete) circumscription seems reasonable. It is better to avoid the word concept because it may imply something embedded in hierarchy, as Formal Concept. -- 27 Oct 2004
* Gregor: I think what you say does not disagree with what I say, do you agree? I meant "specimen number" not in a Prometheus circumscription sense, but as a means of testing a concept against all specimens in the world, evaluated at the same time. -- 28 Oct 2004
* JMS: I'm not confident yet, but I have no objection to your thought, except conservation of specimens (or specimen numbers) even in the non-Prometheus sense. It is a TCS issue, not LinneanCore. -- 28 Oct 2004
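The subsetting argument in this thread can be made concrete with plain sets. The following Python lines are only a sketch: the attribute labels c1 to c5 and the function differential_part are invented for illustration, and real character data would of course be far richer.

```python
def differential_part(complete: frozenset, genus_attrs: frozenset) -> frozenset:
    """Return what a differential diagnosis must still state within a genus."""
    if not genus_attrs <= complete:
        raise ValueError("genus attributes must be part of the complete set")
    return complete - genus_attrs


# Hypothetical complete circumscription of the species "xus".
ATTR_XUS = frozenset({"c1", "c2", "c3", "c4", "c5"})
# Hypothetical attribute sets of the two genera.
ATTR_AUS = frozenset({"c1", "c2"})
ATTR_BUS = frozenset({"c2", "c3"})

in_aus = differential_part(ATTR_XUS, ATTR_AUS)  # diagnosis as "Aus xus"
in_bus = differential_part(ATTR_XUS, ATTR_BUS)  # diagnosis as "Bus xus"

# The differential parts differ, yet both rebuild the same complete set.
print(sorted(in_aus), sorted(in_bus))
print((ATTR_AUS | in_aus) == ATTR_XUS == (ATTR_BUS | in_bus))
```

Under these assumptions the recombination changes only the differential part of the description, while the complete circumscription stays the same.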
<h2>Some thoughts</h2>
Name strings may be:<br/>
1. with/without authors, <br/>
2. with/without year of publication<br/>
3. with/without concept suffix<br/>
4. in zoology: with/without indication of the protonym genus (when only the epithet is given as a name).
* Richard Pyle: In zoology, there is always an "indication" of a genus placement, and there is usually an indication of whether the placement is the same as the Protonym genus placement (i.e., whether or not the authorship is enclosed within parentheses). The epithet is almost never given as a stand-alone name unless it is very obviously a subheading underneath an unambiguous genus heading. - 29 Oct 2004
Names without authors are frequently found in works addressing the general public. Resolving them to a unique scientific name requires knowledge of a historical and geographical publication context. In some groups this may be relatively easy, if data on the usage period (= years) of homonyms (same genus and species, but different authors) are available. However, datasets providing this information are not known to me.
* Main.JessieKennedy - 12 Nov 2004: resolving them to a unique scientific name and even concept is difficult but as I was saying above it is secondary to resolving the taxonomic literature and names associated with concepts. I could imagine eventually getting to such a position but only if we get the genuine taxonomic information sorted to act as a reference point first.
The most common practice in scientific publications is the use of the scientific name with authors and/or year of publication. It is highly desirable to be able to find unique names. Many of the LinneanCoreUseCases ultimately depend on the desire to compare different data sets with each other.
* Main.JessieKennedy - 12 Nov 2004: I agree - but I guess you are now thinking of trying to resolve these names to those recorded in data sets or non-taxonomic literature (I mean literature where the author is not trying to specifically define some taxa)
* Gregor: That is correct - and I believe this is the really relevant side for LinneanCore. There would be no funding for GBIF if people did not want to *use* data. Almost all molecular data, morphological descriptions, geographical distribution and specimen data, i.e. all that we talk about in GBIF, is not a taxonomic definition of a new taxon concept, but name usage. -- 28 Oct 2004
Taxon concepts are very desirable in many cases. The professional practice of referring to concepts in publications in a uniquely resolvable way should also be encouraged. However, many cases exist where each identification creates its own taxon concept.
* Main.JessieKennedy - 12 Nov 2004 that's one way to look at it.
Only a subset of taxon concept applications is currently operationally feasible. Whenever I identify a plant pathogen, I routinely use 3-4 publication sources with keys, indexes, descriptions. The resulting name is a result of my attempt to understand and reconcile the concepts I find in this literature. Citing the sources is helpful, but introduces a fuzzy statement that my new concept is somehow, and in an operationally intractable way, related to these other concepts.
* Main.JessieKennedy - 12 Nov 2004: I find it hard to believe that you couldn't say whether or not at the end of the day your identification adheres to all or some of the definitions for each of the concepts you used as your references - yes it might be more work but... It might be that you think it fits one exactly, it is similar to another etc. Now what you need is a mechanism to capture the degree of reliability in your identification. Isn't the observation group going to look at that? I'm sure they will need to specify something like this when they start thinking about it, i.e. whether or not they think they have an unquestionable identification to some concept or a loose identification in some way. So I don't think it's so much a concept issue but an identification issue.
* No, I cannot say what you assume, and I believe neither can anybody working in difficult taxonomic groups (i.e. not in something vastly overpublished like vascular plants of the British Isles or Germany...). The problem is that you are not using multiple sets of real, well defined taxon concepts; rather, the information is extremely fragmented and not well summarized/worked through. One reference informs me about the host species (but has no other information than the name), another has several illustrations and perhaps a key with 2 or 3 characters, but no complete descriptions. One publication describes only the herbarium characters, but all I have is a living culture, which only approximately behaves like the fungus on the plant itself. In another publication it is actually the reverse!
Resolving unique scientific names to a taxon concept in retrospect is often very difficult and extremely labor intensive. In the projects that I am involved in, I see no hope of ever finding the resources to attempt this. Resources to cross reference taxon concepts among taxonomic publications may be found, but property or spatial/temporal observation data are usually separate from these treatments - and I believe they are the real interest in using a GBIF name service.
* Richard Pyle: I don't think this group (<nop>LinneanCore) should get bogged down in these concept-specific issues. Leave that to the TCS group. - 29 Oct 2004
* Gregor: I agree; I elaborated too much in the above. My argument is that, without denying the importance of adequately dealing with taxon concepts, practical problems exist, and hence the need for mechanisms to at least partially solve problems, which is where I see LinneanCore as an option. - 30 Oct 2004
<h2>Some conclusions from me for the design of <nop>LinneanCore</h2>
1. Separate canonical name information from hierarchy opinion
* For ranks covered by the nomenclatural codes the code gives a good indication of canonicalization
* Where hierarchy information is required for identity, the hierarchy changes must be traced by means other than canonicalization. As a consequence, I propose to accept rank-status changes and genus combinations as separate <nop>LinneanCore objects, and to trace their dependency explicitly using basionym (or similar homotypic) relationships as part of a nomenclatural core.
* author names and spelling problems are best resolved by referring to nomenclators by a GUID (globally unique identifier) like an LSID
* *The issue of below-code name parts and the best way to make this comparable is open and I have no good solution*. Only some of these names are covered by current nomenclators. Perhaps a new breed of "nomenclators" providing GUIDs for these is necessary.
2. Separate taxonomic opinion from the much more reliable nomenclatural relations.
* When collaborating, it is much more feasible to agree on nomenclatural opinion than on concepts.
* Main.JessieKennedy - 12 Nov 2004: why the nominal concept was introduced into TCS.
* Multiple concept synonymies can share a common nomenclator. I believe this is not possible with the proposed TCS. (Ensuing discussion points moved to LinneanCoreTCSInteraction)
* Main.JessieKennedy - 12 Nov 2004: don't understand - will check the other discussion....
3. Things like biostatus, vernacular names etc. should be urgently developed, but as separate modules to the scientific name component.
-- Main.GregorHagedorn - 26 Oct 2004
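Conclusion 1 (combinations as separate objects, linked through basionym relationships and identified by GUIDs) might look roughly like the following in code. This is a hedged sketch, not a proposal for the actual schema: the class NameObject, its fields, the function homotypic_group and the urn:example identifiers are all invented here, and the names <i>Aus xus</i> / <i>Bus xus</i> are the placeholder names from the discussion above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NameObject:
    """One nomenclatural object; a new combination is a separate object."""
    guid: str                             # hypothetical identifier, e.g. an LSID
    canonical_name: str                   # canonical name string without hierarchy
    basionym_guid: Optional[str] = None   # None: this name is itself a basionym


def homotypic_group(names: List[NameObject], guid: str) -> List[str]:
    """Return all canonical names that share a basionym with the given name."""
    by_guid = {name.guid: name for name in names}
    root = by_guid[guid].basionym_guid or guid
    return sorted(
        name.canonical_name
        for name in names
        if (name.basionym_guid or name.guid) == root
    )


# Placeholder data: "Bus xus" is a recombination of the basionym "Aus xus".
NAMES = [
    NameObject("urn:example:name:1", "Aus xus"),
    NameObject("urn:example:name:2", "Bus xus", basionym_guid="urn:example:name:1"),
    NameObject("urn:example:name:3", "Aus yus"),
]

print(homotypic_group(NAMES, "urn:example:name:2"))
```

Because the link is purely nomenclatural, several taxonomic opinions (concept synonymies) could refer to the same GUIDs without having to agree with each other, which is the separation asked for in conclusion 2.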