wiki-archive/twiki/data/UBIF/LinneanCoreDomain.txt

---+!! %TOPIC%

%META:TOPICINFO{author="GregorHagedorn" date="1099397520" format="1.0" version="1.9"}%
%META:TOPICPARENT{name="LinneanCore"}%
I think this is one of the most fundamental points of discussion that we need to resolve: What is the "unit" or "basis of record" of a <nop>LinneanCore instance?  Simply answering, "a scientific name" just doesn't cut it. One way to think of this is, if an LC "name" were to receive a GUID, then what "unit" instances would receive their own GUID?

These are some examples of what I mean by "Unit" (see LinneanCoreDefinitions for elaborations, as linked below):
	* [[LinneanCoreDefinitions#ProtonymDefinition][Protonyms]] (Basionyms)
	* [[LinneanCoreDefinitions#NewCombinationDefinition][New Combinations]]
	* [[LinneanCoreDefinitions#NameStringDefinition][Name-strings]] that include infrageneric (superspecific) epithets (= intended to express hierarchical opinion)
	* [[LinneanCoreDefinitions#NameStringDefinition][Name-strings]] that include more than one infraspecific epithet (= intended to express hierarchical opinion)
	* [[LinneanCoreDefinitions#VariantSpellingDefinition][Variant Spellings]]
	* Variants of [[LinneanCoreDefinitions#NameStringWithAuthorshipDefinition][Name-string with authorship]]
	* Individual [[LinneanCoreDefinitions#NameUsageDefinition][Name-usage]] instances

We would all agree that [[LinneanCoreDefinitions#ProtonymDefinition][protonym]] instances should be represented by their own unique LC instance (= GUID). We would also all agree that there wouldn't be a new LC GUID for every single [[LinneanCoreDefinitions#NameUsageDefinition][name-usage]] instance.

So, the question is, which of the others should be represented by a unique LC GUID, vs. represented as either secondary information tagged on to a "legitimate" LC instance and/or represented in TCS outside the LC subsection.

#NewCombination
<h3>*New Combinations*</h3>

I think most of us would agree that [[LinneanCoreDefinitions#NewCombinationDefinition][New Combinations]] (which constitute discrete botanical "names") should be thought of as unique LC instances, separate from the [[LinneanCoreDefinitions#ProtonymDefinition][Protonyms]] upon which they are based (i. e., a [[LinneanCoreDefinitions#NewCombinationDefinition][New Combination]] would receive its own GUID).  The ICZN Code does not regulate these, but it wouldn't be too hard in most cases to represent the "first use of a new combination" as a functional equivalent to the ICBN-regulated [[LinneanCoreDefinitions#NewCombinationDefinition][new combinations]].
	* Gregor: I agree that this should be in the domain. Note that in the botanical world, although the first combination receives special status, later different combinations may constitute homonymic combinations, and may be interesting to trace. In some cases later combinations (in ignorance of the earlier) have been widely cited with the later author. Thus, somewhat conveniently, both in botany and zoology multiple combinations per protonym are in the domain. -- 2004-10-30

The LC schema would then need to track at least two publication instances: the one for the [[LinneanCoreDefinitions#ProtonymDefinition][protonym]], and a separate one for the [[LinneanCoreDefinitions#NewCombinationDefinition][New Combination]]. In the case of [[LinneanCoreDefinitions#ProtonymDefinition][protonym]] instances, I assume that both sets of publication elements would be populated with identical data?  Or would the [[LinneanCoreDefinitions#NewCombinationDefinition][New Combination]] publication elements be left empty for Protonym instances?
	* Gregor: I think the comb. nov. or nomen nov. should refer to the [[LinneanCoreDefinitions#ProtonymDefinition][protonym]] by id. If instead we think we have to flatten the schema and include rather than refer to the [[LinneanCoreDefinitions#ProtonymDefinition][protonym]] data, then we do have two publications. However, if the object instance describes a [[LinneanCoreDefinitions#ProtonymDefinition][protonym]] (basionym) itself, the combination elements should be missing rather than duplicated. (I still have trouble with the term protonym; what I say is really true for basionym, whereas I know that you consider a nom. nov. recombination as a [[LinneanCoreDefinitions#ProtonymDefinition][protonym]]. In this case another referred or included protonym exists, even though the LC object instance is in zoology considered a [[LinneanCoreDefinitions#ProtonymDefinition][protonym]] - but not in botany a basionym.)
	* Richard: See additional discussion at the [[LCProtonymDiscussion][protonym discussion]] section -- 31 Oct 2004


#InfragenericInfraspecific
<h3>Name-strings with infrageneric epithets &amp; Name-strings with multiple infraspecific epithets</h3>

The issues surrounding both of these categories of names are essentially identical, so I have lumped them together here.  My comments below RE: infrageneric names could also be applied to additional (more than one) infraspecific epithets.

In Christchurch, we discussed the issue of [[LinneanCoreDefinitions#NameStringDefinition][name-strings]] that included infrageneric epithets, and whether they should be thought of as distinct LC instances, or different applications of the same LC instance.  For example, consider: <i>Anthias hawaiiensis</i> and <i>Anthias</i> (<i>Pseudanthias</i>) <i>hawaiiensis</i>. Would these represent different LC instances? Or different usages of the same LC instance?  The same question framed a different way: If a TCS instance involved an "<nop>AccordingTo" publication that used the [[LinneanCoreDefinitions#NameStringDefinition][name-string]] "<i>Anthias </i>(<i>Pseudanthias</i>)<i> hawaiiensis</i>", which of the following should we assume:
	1 There will be separate LC instances for each of these [[LinneanCoreDefinitions#NameStringDefinition][name-strings]], so the TCS instance will simply point to the appropriate one?
	2 There will only be one LC instance for this "name" (<i>Anthias hawaiiensis</i>), and the fact that the <nop>AccordingTo publication inserted the subgeneric epithet "<i>Pseudanthias</i>" will be recorded in a separate TCS schema element (e.g., "<nop>NameAsUsed") that is outside of the LC schema, but adjacent to the LC pointer within TCS? (i. e., it's treated as a [[LinneanCoreDefinitions#NameUsageDefinition][Name-usage]] anchored to the TCS instance, rather than the LC instance)
	3 The fact that the <nop>AccordingTo publication placed the species "<i>hawaiiensis</i>" within the subgenus "<i>Pseudanthias</i>" will not be recorded anywhere in the TCS instance, but instead would be derived from TCS "Relationships" -- i.e., the TCS instance for <i>Anthias hawaiiensis</i> would be mapped as "included in" a separate TCS instance for <i>Anthias </i>(<i>Pseudanthias</i>)?

My vote is for one of the first two options.  If the first option, then LC would essentially be expanding the definition of "[[LinneanCoreDefinitions#NewCombinationDefinition][New Combinations]]" to include more than just ICBN-Code defined "[[LinneanCoreDefinitions#NewCombinationDefinition][New Combinations]]" -- but also different infrageneric combinations and different combinations of multiple infraspecific epithets.  It seems to me that if Zoological names can "artificially" create [[LinneanCoreDefinitions#NewCombinationDefinition][New Combinations]] instances, then the Botanical code can similarly "artificially" create non-code-governed "[[LinneanCoreDefinitions#NewCombinationDefinition][New Combinations]]" instances that involve only variations of infrageneric placement and/or multiple infraspecific epithets.

If the second option (which I slightly prefer), then there will *need* to be an additional element within TCS to capture the specific "<nop>NameAsUsed", outside of the LC subset schema.
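
As a rough illustration of the second option, the Python sketch below resolves a verbatim name-string to a single LC instance and keeps the string as published in a separate "<nop>NameAsUsed" field on the concept record. The class, the GUID value, the lookup table and the rule of dropping a parenthesised subgenus are all assumptions made for illustration; none of them is part of LC or TCS. The sketch also assumes the input is a name-string without authorship, since a parenthesised author would otherwise be mistaken for a subgenus.

```python
import re
from dataclasses import dataclass

# Matches an infrageneric epithet written in parentheses, e.g. " (Pseudanthias)".
# Assumption: the input is a name-string WITHOUT authorship.
_SUBGENUS = re.compile(r"\s*\([A-Z][a-z]+\)")


def strip_subgenus(name_as_used: str) -> str:
    """Return the name-string with any parenthesised subgenus removed."""
    return " ".join(_SUBGENUS.sub("", name_as_used).split())


@dataclass(frozen=True)
class ConceptRecord:
    """Hypothetical stand-in for a TCS instance (not the real TCS schema)."""
    lc_guid: str        # pointer to the single LC instance
    according_to: str   # the AccordingTo publication
    name_as_used: str   # verbatim string, stored outside the LC subset


# Hypothetical LC index: one GUID per name, regardless of subgeneric placement.
LC_INDEX = {"Anthias hawaiiensis": "lc:0001"}


def make_concept(name_as_used: str, according_to: str) -> ConceptRecord:
    """Point the concept at the shared LC instance but keep the verbatim name."""
    key = strip_subgenus(name_as_used)
    return ConceptRecord(LC_INDEX[key], according_to, name_as_used)
```

Under this arrangement both "Anthias hawaiiensis" and "Anthias (Pseudanthias) hawaiiensis" point at the same LC GUID, and the subgeneric placement survives only in the usage-level field.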

	* Gregor: I strongly oppose the first option. I believe it violates the principle that a concept can have multiple name-strings that do not change the concept. Since different opinions about phylogeny above the concept circumscription do not change the circumscription ("the arrangement of shapes in other shapes does not change the circumscription shape"), mixing issues of hierarchy concepts and circumscription concepts is not desirable. This is the point I try to make in LinneanCoreDisentangle.
	* Richard: How does it violate this principle any more than the botanical tradition that "<em>Anthias hawaiiensis</em>" is a different "Name" (=different LC instance) from "<em>Pseudanthias hawaiiensis</em>" violates the same principle?  In the botanical view, these two Name-strings would represent different LC instances.  In the zoological view, they would be the same "Name" instance, merely representing different hierarchy concepts.  Why is it O.K. to treat these as two separate instances, but not treat "<em>Anthias</em> (<em>Pseudanthias</em>) <em>hawaiiensis</em>" as a third instance?  I'm not arguing in favor of the first option -- I'm just trying to understand your logic. -- 31 Oct 2004
	* Gregor: I guess I have to admit that this is botanical "logic". Botany attempts no canonicalization in the genus-recombination case, or in the case of recombination into a different rank (elevating from subspecies to species rank). It does provide rules for "canonicalization" of names in the case of epithet spelling changes, different phrases expressing rank connector strings, and "redundant" infrageneric or infraspecific ranks. I think it ties in with the botanical concept of authorship priority - for recombinations changing the canonical name there is a special, first, publication. For other hierarchy expressions there is none. However, I agree with Rich that for zoologists, for whom the genus recombination is no different from a subgenus recombination, this seems odd. So perhaps it does make sense to treat additional name-strings (with/without author) similarly to genus or rank-level recombinations. I have become quite uncertain now! At least forget the "principle" stuff I said!
	* Gregor (cont.): Option 2 or 3 are fine for me, I believe this is a TCS issue. However, I propose a fourth option to be added to the list by Rich:
	4 Either as part of LC, or as an extension of it for the purposes of GBIF ECAT data providers, expressions of hierarchy may be provided separately from the atomized canonical name. The issue is closely related to the issue of "CurrentAcceptedName" (not yet discussed here). <br/> Hierarchy expressions include those commonly expressed as part of the name-string (e.&nbsp;g. infrageneric epithets like subgenus, multiple infraspecific epithets) and those not part of name strings (e.&nbsp;g. order, family, infrageneric section). I propose to define a "CurrentHierarchy" container, containing a list of rank-name-author tuples. No ordering requirements are made, but the list may be reordered according to rank by processing applications. <br/> <br/>In contrast to the "CanonicalName", which allows identity comparisons, the purpose of "CurrentHierarchy" information is to capture and preserve the hierarchy opinion. It is not intended to record the *original* hierarchy (such as in first publication of a protonym, or a combination), for which separate <nop>OriginalAuthorsTaxonname and <nop>OriginalAuthorsHierarchy or similar text fields are intended (as James pointed out, it may be relevant to understand aspects of hierarchy of original name-usage, e. g. where a differential diagnosis is given).<br/><br/>The main reason for asking for this is to let data suppliers to ECAT provide their hierarchy and current accepted name information in a way that can be collected and aggregated without the need for conflict resolution. It is a practical request that ultimately could end up in TCS. It is easier to supply, and allows "data mining" applications to obtain hierarchy models - and perhaps interpret them as TCS data where possible. In the context of GBIF it can be expected that GBIF will use a primary consensus hierarchy at least down to the family level, and in most groups down to the genus level. However, infrageneric/specific hierarchy can practically only be obtained from individual data providers. -- Gregor Hagedorn -- 2004-10-30
	* Richard: I would like to see this proposal in the context of ECAT, rather than in the LC schema itself.  I don't think the "Core" part of "LinneanCore" should provide elements for opinion-based hierarchy information. -- 31 Oct 2004
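
For readers who want something concrete, the "<nop>CurrentHierarchy" container proposed in option 4 might be sketched in Python as an unordered list of rank-name-author tuples that a processing application sorts by rank. This is only a sketch: the class name, the field names and the short rank ordering below are assumptions, and authors are left empty rather than guessed.

```python
from typing import List, NamedTuple


class HierarchyEntry(NamedTuple):
    """One rank-name-author tuple (field names are illustrative)."""
    rank: str
    name: str
    author: str = ""


# Hypothetical rank ordering, highest rank first; a real list would be longer.
RANK_ORDER = ["order", "family", "genus", "subgenus", "section",
              "species", "subspecies", "variety"]


def sort_by_rank(entries: List[HierarchyEntry]) -> List[HierarchyEntry]:
    """Reorder entries from highest to lowest rank; unknown ranks go last."""
    position = {rank: i for i, rank in enumerate(RANK_ORDER)}
    return sorted(entries, key=lambda e: position.get(e.rank, len(RANK_ORDER)))


# Suppliers may send the tuples in any order.
current_hierarchy = [
    HierarchyEntry("species", "hawaiiensis"),
    HierarchyEntry("subgenus", "Pseudanthias"),
    HierarchyEntry("genus", "Anthias"),
]
ordered = sort_by_rank(current_hierarchy)
```

Because the container places no ordering requirement on suppliers, the ordering lives entirely in the consuming application, which is why the rank table is kept separate from the data.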

#VariantSpelling
<h3>*Variant Spellings*</h3>

In many ways, the issues with [[LinneanCoreDefinitions#VariantSpellingDefinition][Variant Spellings]] are the same as those described in the previous section on infrageneric names and multiple infraspecific epithets.  One critical question here would be: are ALL variant spellings handled the same way?  Or should there be a distinction between variants that are clear examples of a [[LinneanCoreDefinitions#LapsusDefinition][Lapsus]], vs. "founded" spelling changes? (See Gregor's comments about half-way down the LinneanCoreDisentangle page, and some additional comments at LinneanCoreDefinitions.)
	* Gregor: I agree on the problems distinguishing between lapsus and founded spelling. In practice, I find it less important though. The only special spellings to me are the original, and the most recently corrected one. If an erroneous (assumed to be "founded"!) orthographical correction occurred in between, the difference from a lapsus is not great. Importantly, I believe these name-strings should *not* be LC records, i.&nbsp;e. not receive their own GUIDs. The string itself is a sufficient identifier for them. Instead, they should be provided for as a list of name-strings under the heading "<nop>NameVariants" as part of the LC record for which they define a name variant. I believe the practical utility of such name variants is very great. I am uncertain whether to support atomization for them or not. I tend to think not. However, the elements in the list may be tuples of "NameVariantString", "UsageCitation", "Annotation"? Another approach may be to provide for variants in each of the atomized elements, i.e. separately for Genus, epithet, etc. -- Gregor Hagedorn -- 2004-10-30
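
The "<nop>NameVariants" list suggested above could look roughly like the following Python sketch, in which variants carry no GUID of their own and are reachable only through the LC record they belong to. The three tuple fields follow the names floated in the comment; the record class, the GUID value and the misspelled example string are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NameVariant:
    """A non-atomized variant; the string itself is its only identifier."""
    name_variant_string: str
    usage_citation: str = ""
    annotation: str = ""


@dataclass
class LCRecord:
    """Hypothetical LC record: only this level receives a GUID."""
    guid: str
    canonical_name: str
    name_variants: List[NameVariant] = field(default_factory=list)


def build_lookup(records: List[LCRecord]) -> Dict[str, str]:
    """Map canonical names and all their variants to the owning record's GUID."""
    lookup: Dict[str, str] = {}
    for rec in records:
        lookup[rec.canonical_name] = rec.guid
        for variant in rec.name_variants:
            # A canonical name already indexed is never overwritten by a variant.
            lookup.setdefault(variant.name_variant_string, rec.guid)
    return lookup


record = LCRecord(
    guid="lc:0001",  # hypothetical GUID
    canonical_name="Anthias hawaiiensis",
    name_variants=[NameVariant("Anthias hawaiensis",  # invented misspelling
                               annotation="lapsus (illustrative)")],
)
lookup = build_lookup([record])
```

An index of this kind is what would make the variants useful for searching and cross-referencing name usage without turning each one into an LC object.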

#VariantAuthorship
<h3>Variants of *Name-string plus authorship*</h3>

Personally, I do *NOT* think LC should consider authorship as part of the "name" per se, and thus variations of authorship should not be tracked via separate LC instances.  Does anyone think that variations of authorship as they appear in the context of different [[LinneanCoreDefinitions#NameUsageDefinition][name-usages]] need to be tracked robustly (i. e., in a structured way)?  Or is this only human-readable information, probably tied to a [[LinneanCoreDefinitions#NameUsageDefinition][name-usage]] instance (rather than an LC instance)?
	* Gregor: I am confused. Do you propose that for homonyms only a single LC record should exist? I think probably not, but then I do not see why variants in the author spelling are different from the variant spellings above - other than that more variants exist. However, for any use case I can think of (esp. for comparing name usage instances to obtain a "fuzzy set" of things that may be comparable), I believe epithet and author variants are equally important. -- Gregor Hagedorn -- 2004-10-30
	* Richard: In the context of subsequent discussions, I see now why my wording above is confusing.  All I meant was that it is not the job of LC to track variants of authorships attributed to the same (non-homonymous) Name-string.  For example, the following items should not be given separate LC instances, simply because they may have appeared in the literature:<br/>* <em>Pseudanthias hawaiiensis</em> (Randall) Hoover<br/>* <em>Pseudanthias hawaiiensis</em> (Randal) Hoover<br/>* <em>Pseudanthias hawaiiensis</em> (J.E.Randall) Hoover<br/>* <em>Pseudanthias hawaiiensis</em> (Randall) Randall<br/>* <em>Pseudanthias hawaiiensis</em> (Randall, 1979)<br/>* <em>Pseudanthias hawaiiensis</em> Randall<br/>Some of these items are clearly "incorrect" (Lapsus); some are correct within one Code-context, but not in another.  The point is, they should not all receive separate LC instances.  Sorry for my originally confusing language. -- 31 Oct 2004
	* Gregor: This is good progress; I am starting to think more clearly myself. I agree, Name variants are not LC objects in the sense that they should receive a GUID. I do continue to think that <nop>NameVariants should somehow be collected and transferred. This is pretty worthless for nomenclaturists, but extremely useful for people who try to use, index, and cross-reference name usage. I also maintain that I see no difference in principle between name-string without-author and name-string with-author name variants; I would treat them analogously. -- Gregor Hagedorn -- 2004-11-01
	* Richard: I agree that we need to somehow track <nop>NameVariants in a way that they are cleanly searchable along with non-variant names.  My personal feeling is that they do not belong within the LC structure.  I prefer to think of them more as "Usage"-based information.  As such, the logical place in my mind to store them is in the <nop>NameSimple element of TCS; but some might argue that not every <nop>NameVariant can be anchored to a TCS Concept instance -- so there may be problems with this approach. -- 02 Nov 2004
	* Richard: I would still like to establish separate short-hand terms for "Name-string without-author" and "Name-string with-author", because in my view, they are different things, used for different purposes (I would rather search on "Name-string without-author", but display "Name-string with-author").  Personally, I don't see a need for establishing a single element in the schema that contains both Name-string and authorship information -- I would rather see only separate nomenclatural and authority elements that are concatenated as needed for presentation purposes.  As such, it would be convenient to have one term to describe a "Name-string without-author" separately from "Name-string with-author".  I guess we can always use these two four-word terms in our discussions, but that seems cumbersome. -- 02 Nov 2004
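
The idea of keeping nomenclatural and authorship elements separate, and concatenating them only for presentation, can be illustrated with a small Python sketch. The class and method names are invented for this example and do not correspond to any LC schema element; the example name is taken from the list of authorship variants above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomizedName:
    """Illustrative atomized name: name elements and authorship stored apart."""
    genus: str
    species_epithet: str
    authorship: str = ""

    def name_string(self) -> str:
        """'Name-string without-author': the form one would search on."""
        return f"{self.genus} {self.species_epithet}"

    def name_string_with_author(self) -> str:
        """'Name-string with-author': assembled on demand for display."""
        return f"{self.name_string()} {self.authorship}".strip()


name = AtomizedName("Pseudanthias", "hawaiiensis", "(Randall) Hoover")
search_key = name.name_string()
display = name.name_string_with_author()
```

Since the with-author form is derived rather than stored, variation in how the authorship is written never changes the search key.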

---

Are there any other categories of "names" that I have missed?

-- Main.RichardPyle - 29 Oct 2004
---

I would like to add the following categories of names to the discussion:

	* Names that include a concept suffix: "s. lat.", "secundum author", "sensu author",  "non author", "pro parte"
		* Richard: If you mean these should each be treated as separate LC "Name" instances (independent of TCS), then I strongly disagree -- at least for the "Core" part of LC. Perhaps as an extension; but I would rather spend more time fixing whatever needs fixing in TCS, than develop temporary extensions to LC that will be replaced by TCS anyway.
		* Gregor: People wanting to share their checklist data with GBIF, consisting of name-strings-with-or-without-author-with-or-without-concept-suffix then have no means to transfer atomized data? -- 2004-11-01
		* Richard: Yes they do:
			* "name-strings-with-author-without-concept" -- populate Name-string elements and authorship elements of LC
			* "name-strings-without-author-without-concept" -- populate Name-string elements of LC but leave authorship elements of LC empty
			* "name-strings-with-or-without-author-with-concept-suffix" -- Populate appropriate fields of TCS, with Name-string elements and authorship elements of LC embedded within the LC portion of the TCS instance
		* Gregor (cont.): Checklists are relatively simple data structures, and as far as I see TCS does not attempt to cover the check-list use case (I would say rightly so, but am I wrong?). -- 2004-11-01
		* Richard:  I *think* this is *exactly* the sort of thing that "Nominal"-type TCS instances were intended for: names without implied concepts. (I assume by "checklists" you mean nomenclatural checklists with no implied concept; if a concept is implied, then the names are proper TCS concepts and should use the checklist itself as the "AccordingTo" publication.) -- 02 Nov 2004
		* Gregor (cont.): I do see a need for a name-atomization-canonicalization structure. I see no mechanism in TCS of handling this (for existing, already published or digitized data; TCS certainly can create equivalent and more specific relationships for new data in the TCS framework). -- Gregor Hagedorn -- 2004-11-01
		* Richard: Agreed!  TCS will use LC for its name-atomization-canonicalization structure. But I think search performance will be improved if there is a single element (either within LC, or outside of LC [<nop>SimpleName in TCS]) that has the full concatenated canonical Name-string.
	* Names that include rank categories not governed by the applicable code. Examples are race, breed, forma specialis, pathovar, serovar, candidatus, etc. In essence, in Christchurch we discussed including them - however I see two problems: a) although desirable, generating a canonical name is more difficult; b) these names may have to be added in addition to a code-governed infraspecific rank. One question I have: is there a good summary term to apply to these rank categories? Non-code-constrained-ranks or infracoderanks sound silly to me.
		* Richard: I don't know the term to use (if you come up with one, please propose it on the LinneanCoreDefinitions page); but I agree we need to think about these.  I wonder:  how are they different from names above the rank of family (also not governed by the Codes)?  My initial feeling is that the higher-rank names are more "real" than the "subinfraspecific" names you described above -- but I can't think of a logical reason behind this feeling. -- 01 Nov 2004 [but still Halloween here in Hawaii]
		* Gregor: I think they are different in that they confer circumscription information, not hierarchy information. If I read "forma spec. soandso" in a publication, it informs me about the class (taxon) concept and the width of this concept. If I read "Microbotryum violaceum (Microbotryaceae)" or "Microbotryum violaceum (Ustilaginaceae)" or "Ustilago violaceum (Ustilaginaceae)" I only learn about other opinions the author holds, but it does not change the circumscription of the class concept to which the author appends his or her information about this taxon. -- I think the perspective is important: are you informing about the concepts, or are you informing about organism properties including distribution or molecular data? If I study a gene, the forma specialis is *more* real than the family. If I study systematics, it may be less real, in the sense of ill-defined, difficult to recognize, not relevant for the large picture. -- Gregor Hagedorn -- 2004-11-01
		* Richard: By "real", I meant "real" in a purely Linnean nomenclatural sense.  Most taxonomists would agree that names at the rank of Order, Class, etc. are "legitimate" scientific names (even though they are not covered by Codes); whereas many might consider the infrasubspecific designations to be "illegitimate" usage of Linnean nomenclature.  I'm not sure I fully understand your difference between hierarchy and circumscription here.  In my mind, only the terminal (lowest-rank) unit of a Name-string denotes circumscription; and all higher-rank units reflect hierarchy.  Except for limits stated in the Codes, the logic seems to apply equally to any multi-unit Name-string, whether it is "Genusname speciesname subspeciesname racename", or "Classname: Ordername: Familyname: Genusname". -- 02 Nov 2004.
		* Gregor: I basically agree with you. There is "entanglement" built into Linnean names. Without claiming that my thoughts are crystal-clear (I consider myself a "botanical proxy" for those better qualified experts not participating in the discussion!), my thinking is that there is hierarchy and name-identity. I agree with circumscription = lowest rank. However, some name parts/elements confer both hierarchy and identity. Those only conferring hierarchy (subgenus etc.) are redundant in a canonical name; those also conferring identity are not. Only the Genus must be globally unique within each nomenclatural code. To express a subgenus, you must use a combination (ICBN 6.7) like "Arytera sect. Mischarytera" -- 2004-11-02

	* A minor point, but I am especially uncertain about this: some historic names have infraspecific rank connecting terms like beta, gamma that frequently are interpreted as "var" or "ssp.". Is it sufficient to treat these under <nop>OriginalAuthorsSpelling?

-- Gregor Hagedorn - 30 Oct. 2004
---