wiki-archive/twiki/data/UBIF/LCNameUnitDiscussion.txt

---+!! %TOPIC%

%META:TOPICINFO{author="GregorHagedorn" date="1100040360" format="1.0" version="1.11"}%
%META:TOPICPARENT{name="LinneanCoreDefinitions"}%
Discussion on [[LinneanCoreDefinitions#NameUnit][Name-unit]]

	* Gregor: a multi-part *what*? Name-string? Name-string-with-authors? I think you mean words or terms in a name string, but I do not understand the definition. Do you mean "Aceraceae: Acer rubrum spp. rubrum L." has 6, 5, 4, or 3 units? In general, I propose to use "part" instead of unit: *Name-part*: A part of a multi-part Name-string-with-authors, delimited by whitespace, except in the case of authors, where each name constitutes a part, and in the case of rank-connecting terms composed of multiple words ("fm. spec."). What about the sensu suffixes? -- 30. Oct. 2004
		* Richard: I meant nomenclatural units of a multi-part scientific *Name-string* (sensu me) -- *not* authorship components.  In other words, the canonical names. -- 30 Oct 2004
		* Richard: "Aceraceae: <em>Acer rubrum</em> spp. <em>rubrum</em> L." can be broken into two separate *Name-strings*: "Aceraceae" and "<em>Acer rubrum</em> spp. <em>rubrum</em>" (note omission of "L.").  I consider these as separate *Name-strings* because none of the Codes would consider "Aceraceae" as part of the same "name" as <em>Acer rubrum</em> spp. <em>rubrum</em> (perhaps the definition of *Name-unit* should explicitly state that it only refers to units (parts) of *Rank* Genus or lower; and only to *Rank* Genus in cases where the *Rank* of the complete *Name-string* is below Genus).  Thus, the *Name-string* "Aceraceae" would not be considered a *Name-unit*, because it is not part of a multi-part *Name-string*. The *Name-string* "<em>Acer rubrum</em> spp. <em>rubrum</em>" has exactly three *Name-units*: <em>Acer</em> (*Rank*=Genus); <em>rubrum</em> (*Rank*=Species); <em>rubrum</em> (*Rank*=subspecies). -- 30 Oct 2004
		* Richard: I originally also wrote it as *Name-part*, but later decided that "unit" was a better qualifier.  I could be convinced to go back to *Name-part*. -- 30 Oct 2004
		* Richard: What do you mean by "sensu suffixes"?  Do you mean like: "<em>Acer rubrum</em> spp. <em>rubrum</em> L. *sensu* Pyle"?  If so, then I would say that the "sensu" suffix is absolutely not a *Name-unit*, nor would it be part of the *Name-string*.  It's not even part of a "name" per se -- rather, it's part of a *Concept-string* (i. e., a matter for TCS to discuss). -- 30 Oct 2004
		* Gregor: I completely agree that we are not discussing concepts here, but the definitions discussed are more general and should remain usable. I see no reason why a concept should not have a name-literal and a name-string. "Concept-string" would imply it expresses a concept, but I believe we are referring to a string of characters that is naming a concept in a way that it can be compared. I think the definition of Name-string should cover that. In any case, I think the definitions should cover the parts not covered by LC, so we can express what we are talking about. -- 31. Oct. 2004
		* Richard: I guess it all boils down to the tediously old question, "what is a name?".  I still see a clear distinction between units of the character string: "<em>Pseudanthias ventralis</em> subsp. <em>hawaiiensis</em> (Randall) Hoover" that are part of a Linnean scientific name, and units that are part of the authorship of that name. Thus, I would feel most comfortable with "Name-string" for the first three units plus the "subsp." indicator, and something like "NameAuthor-string" for the "Name-string" with the addition of "(Randall) Hoover" [I actually see three subparts of the "NameAuthor-string": "Name-string", "ProtonymAuthor", and "CombinationAuthor".]  Perhaps the assemblage of "<em>Pseudanthias ventralis</em> subsp. <em>hawaiiensis</em> (Randall) Hoover sensu Pyle" could be referred to as "NameAuthorConcept-string", with the addition of the subpart "ConceptAuthor".  Perhaps it would be better if all uses of "Author" in this paragraph were replaced by "Citation" (to accommodate the addition of Year values)?  In any case, I completely agree that the definitions should extend beyond LC, as you suggest. -- 02 Nov 2004
		* Gregor: I think "Citation" is better than authors. "NomenclaturalCitation" and "ConceptCitation" make more sense to me than "authors". However, I do not think that "s. str." or "p. p." are adequately covered by the latter. Also, I think a separate "UsageCitation" may be helpful. Strictly speaking, any usage can be considered a new concept, but I believe this is operationally not very useful. -- 02 Nov 2004
		* Gregor: I think I should make clear that I have no problems with "name" itself being restricted to just the name without authors. That is the way it is used in the codes as well. However, "Name-string" implies a usage context to me, so in my logic I never think about it without some form of disambiguation. Moreover, on the name-usage side (which contains the data I am interested in) a wild mixture of names with and without authors exists. Some publications make a statement in the preface and use names without authors from then on; others give the author at the first usage and later use the name without it, etc. I think that explains why, to my mind, the normal name-usage-string should be a superset of pure names and names with various forms of disambiguation. My unease, and my reason for discussing this at length, is that I personally always misread the term name-string; I find it deceptive if applied in a name-usage context, and I feel it is only there that abstracting to a string matters. -- 02 Nov 2004
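
To make the terms in this thread concrete, here is a minimal sketch of Richard's reading: a Name-string without authorship is split into Name-units, and a NameAuthor-string is split into the subparts he proposes. The function names, the rank-term table, and the capitalisation heuristic for finding where authorship begins are assumptions made for illustration only; they are not part of LC and would fail on many real names (multi-word rank terms such as "fm. spec.", for example, are not handled).

```python
import re

# Rank-connecting terms assumed for this sketch (illustrative, not exhaustive).
RANK_TERMS = {
    "subsp.": "Subspecies",
    "ssp.": "Subspecies",
    "spp.": "Subspecies",  # spelling used in the discussion above
    "var.": "Variety",
}


def name_units(name_string):
    """Split a Name-string (no authors) into (unit, rank) pairs.

    Following Richard's reading, a single-part name such as "Aceraceae"
    yields no Name-units because it is not a multi-part Name-string.
    """
    units = []
    pending_rank = None
    for word in name_string.split():
        if word in RANK_TERMS:
            pending_rank = RANK_TERMS[word]  # applies to the next epithet
            continue
        if not units:
            rank = "Genus"
        elif pending_rank:
            rank = pending_rank
        elif len(units) == 1:
            rank = "Species"
        else:
            rank = "Unknown"  # no rank term given; the sketch does not guess
        units.append((word, rank))
        pending_rank = None
    return units if len(units) > 1 else []


def split_name_author(text):
    """Split a string into the subparts proposed in the discussion above.

    Heuristic (an assumption of this sketch): authorship starts at the first
    word after the genus that is neither lowercase nor a rank term.
    """
    head, _, concept_author = text.partition(" sensu ")
    words = head.split()
    i = 1
    while i < len(words) and (words[i] in RANK_TERMS or words[i][0].islower()):
        i += 1
    authorship = " ".join(words[i:])
    match = re.match(r"\((?P<proto>[^)]*)\)\s*(?P<comb>.*)", authorship)
    if match:
        protonym, combination = match.group("proto"), match.group("comb")
    else:
        protonym, combination = authorship, ""
    return {
        "Name-string": " ".join(words[:i]),
        "ProtonymAuthor": protonym,
        "CombinationAuthor": combination,
        "ConceptAuthor": concept_author,
    }


if __name__ == "__main__":
    print(name_units("Acer rubrum spp. rubrum"))
    print(split_name_author(
        "Pseudanthias ventralis subsp. hawaiiensis (Randall) Hoover sensu Pyle"))
```

Under these assumptions "Acer rubrum spp. rubrum" gives three Name-units (Genus, Species, Subspecies), and the "sensu Pyle" suffix ends up outside the Name-string, in line with Richard's view. Gregor's *Name-part* proposal would instead also count the rank term and each author name as parts.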

---
	* JMS: I propose to use "name-token" instead of "name-unit" because token is more straightforward (in computer jargon?).  Unit is already used in ABCD in a different sense.  Name-morpheme is another candidate, but it would not allow a hyphen inside. -- 09 Nov 2004
		* Gregor: I agree that the term "unit" is overloaded and in our context somewhat preoccupied by ABCD. However, perhaps the same is true for token? XML Schema defines: "token represents tokenized strings. The 'value space' of token is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters, that have no leading or trailing spaces (#x20) and that have no internal sequences of two or more spaces. The 'lexical space' of token is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters, that have no leading or trailing spaces (#x20) and that have no internal sequences of two or more spaces." -- 09 Nov 2004
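
The XML Schema definition quoted above can be checked mechanically. The sketch below tests whether a string lies in the value space of token as quoted; the function name `is_xsd_token` is hypothetical and the check covers only the three conditions in the quotation. One way to read Gregor's remark is that a whole multi-word name is a single token in this sense, so "name-token" may not suggest a single word of a name to readers familiar with XML Schema.

```python
def is_xsd_token(s):
    """Return True if s satisfies the token conditions quoted above."""
    # No carriage return (#xD), line feed (#xA) or tab (#x9) characters.
    if any(c in s for c in "\r\n\t"):
        return False
    # No leading or trailing spaces (#x20).
    if s != s.strip(" "):
        return False
    # No internal sequences of two or more spaces.
    return "  " not in s


if __name__ == "__main__":
    for candidate in ["Acer", "Acer rubrum", "Acer  rubrum", " Acer"]:
        print(repr(candidate), is_xsd_token(candidate))
```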
%META:TOPICMOVED{by="GregorHagedorn" date="1099217815" from="UBIF.NameUnit" to="UBIF.LCNameUnitDiscussion"}%