head 1.16; access; symbols; locks; strict; comment @# @; 1.16 date 2009.11.25.03.14.38; author GarryJolleyRogers; state Exp; branches; next 1.15; 1.15 date 2009.11.20.02.45.30; author LeeBelbin; state Exp; branches; next 1.14; 1.14 date 2007.03.06.17.30.00; author TWikiGuest; state Exp; branches; next 1.13; 1.13 date 2006.05.04.11.25.52; author GregorHagedorn; state Exp; branches; next 1.12; 1.12 date 2005.03.21.22.22.20; author JenniferForman; state Exp; branches; next 1.11; 1.11 date 2004.06.21.11.29.59; author GregorHagedorn; state Exp; branches; next 1.10; 1.10 date 2004.06.11.09.17.30; author GregorHagedorn; state Exp; branches; next 1.9; 1.9 date 2004.05.28.14.29.49; author GregorHagedorn; state Exp; branches; next 1.8; 1.8 date 2004.05.25.11.06.48; author GregorHagedorn; state Exp; branches; next 1.7; 1.7 date 2004.05.24.10.55.00; author GregorHagedorn; state Exp; branches; next 1.6; 1.6 date 2004.05.11.13.28.00; author GregorHagedorn; state Exp; branches; next 1.5; 1.5 date 2004.05.03.09.04.33; author GregorHagedorn; state Exp; branches; next 1.4; 1.4 date 2004.05.01.23.45.00; author GregorHagedorn; state Exp; branches; next 1.3; 1.3 date 2004.04.30.12.23.01; author BobMorris; state Exp; branches; next 1.2; 1.2 date 2004.04.27.16.58.02; author BryanHeidorn; state Exp; branches; next 1.1; 1.1 date 2004.04.26.19.09.00; author GregorHagedorn; state Exp; branches; next ; desc @none @ 1.16 log @none @ text @%META:TOPICINFO{author="GarryJolleyRogers" date="1259118878" format="1.1" version="1.16"}% %META:TOPICPARENT{name="ClosedTopicSchemaDiscussionSDD09"}% ---+!! %TOPIC% This topic is an attempt to find a general solution for TheProblemOfSex. I am still struggling with it and very much hope that you can help by commenting, including your feelings about the options. I am not sure whether this should be on the agenda for Berlin, but in the longer run I believe we need a solution. 
It has not been discussed at any meeting so far (although see the related GeographicalRestrictions), perhaps because the inclusion of diagnostic keys only occurred in Lisbon and this really precipitates this issue. Main.GregorHagedorn --- %GREEN%I believe SecondaryClassifiersProposal addresses many of the issues. Please also look there. -- Main.BobMorris - 30 Apr 2004 %ENDCOLOR% --- I suggest that you preferably look at the Word document (attached as zip file at the end). The Word document is in tracking mode, so you can write directly into it, and upload the file including the comments. I am pasting the converted and manually edited text of the document here as well if you want to add your comments here. However, I am not sure everything converted OK. I did this as a test, but I believe conversion is far too labor-intensive to be feasible for future WIKI discussions on such documents. ---
[… we had quite a few discussions about sex and stage on the WIKI (e. g. http://wiki.cs.umb.edu/twiki/bin/view/SDD/TheProblemOfSex). I believe this is one of the solvable things and that it should now be solved in SDD …!
Currently I have worked through the beginning several times, but it is getting rough towards the end, when we come to conclusions and proposals. I am still undecided what the best strategy is, and I hope that you can help me with some comments, including your feelings on this…]
When objects are classified to the most specific level recognized in the class hierarchy (in biology = species, subspecies, or variety), their descriptions are still not necessarily identical. Some differences are due to random effects in the individual history of an object; others, however, are systematically repeatable (and, in biology, genetically coded). The most important types of genetically coded intra-class variation are polymorphisms and systematic changes occurring during developmental or life-cycle stages of an object (Table 1).
Table 1. Examples for classification systems and sources of intra-class-variation in biology and the study of musical instruments
| Classification system | Biological organisms | Musical instruments |
| Phylogenetic / Inherited (→ multiple characteristics are linked) | Evolutionary history / taxonomic classification (e. g., order/family/genus) | Craftsmanship, technological, or industrial traditions of instrument creation |
| Operational (arbitrarily based on a single characteristic) | Tree/shrub/herb, water vs. land plants | Sachs-Hornbostel (idiophones, membranophones, chordophones, aerophones, electrophones) |

| Source of further variation | Biological organisms | Musical instruments |
| Individual history | | |
| a) chance effects | Scarring of skin, mutilations | Scratching, discoloration |
| b) systematic responses | Phenotypic responses like flowering time or variable shape to maximize resource utilization | Response to humidity or submerging in water |
| c) essential and | Developmental stages: e. g., egg/embryo, larva, adult; Life-cycle stages: e. g., gametophyte, sporophyte | Phases in the construction of an instrument |
| Genetic polymorphism | Sexes or blood types (= multiple alleles for a gene present within populations) | Perhaps: decorative styles stretching across multiple instrument types and traditions |
A characteristic that is variable within a class is not necessarily uninformative for diagnostic purposes. If a plant has both red and white flowers and other plant species have yellow, blue, red, or white colors, specifying the flower color of an object reduces the set of remaining classes in identification. A description "flowers red or white" is a meaningful part of a diagnostic class description.
However, certain kinds of polymorphisms change highly systematically. A description "sex male" is meaningful for an object, but "sex female or male" is not a meaningful part of a class description, since the two sexes by definition occur together. Similarly, recording the presence of life stages may or may not be meaningful, depending on the taxonomic scope and whether all classes have a larval and an adult stage. This problem of character "saturation" (= all potential character states present) can be automatically detected if a character has been recorded either for all classes or for a sufficient sample of objects. It normally does not require the recording of additional information.
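The automatic detection of saturation mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory layout (class name → character → set of recorded states) and a hypothetical function name `saturated_characters`; neither is part of SDD or DELTA. A character is reported only if it has been recorded for all classes and every class shows all potential states.

```python
from typing import Dict, Set

# Hypothetical layout: class name -> character -> set of recorded states.
ClassDescriptions = Dict[str, Dict[str, Set[str]]]


def saturated_characters(descriptions: ClassDescriptions,
                         all_states: Dict[str, Set[str]]) -> Set[str]:
    """Return characters for which every class shows every potential state."""
    result = set()
    for character, states in all_states.items():
        recorded = [d.get(character) for d in descriptions.values()]
        if not recorded or any(r is None for r in recorded):
            continue  # not recorded for all classes: no decision possible
        if all(r == states for r in recorded):
            result.add(character)
    return result


# Illustrative data only (invented for this sketch).
descriptions = {
    "Species A": {"sex": {"male", "female"}, "flower color": {"red", "white"}},
    "Species B": {"sex": {"male", "female"}, "flower color": {"yellow"}},
}
all_states = {
    "sex": {"male", "female"},
    "flower color": {"red", "white", "yellow", "blue"},
}
saturated = saturated_characters(descriptions, all_states)
```

In this invented data set "sex" is saturated, whereas "flower color" remains diagnostically useful because the classes differ in their state sets.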
Another problem specific to intra-class variation is, however, more difficult to solve. Some of the characteristics already mentioned form an operational classification system. In biology these secondary systems are independent of the primary system of taxonomic names. The most frequently encountered examples of such "secondary classifiers" are designations of sex (male/female), generation (e. g., spring/summer), and life cycle or development stage (e. g., larva, adult). The values of such classifiers are not directly observable characters, but rather typify sets of correlated character expressions. Objects with different classifier values will have moderately or strongly different descriptions.
If for a secondary classifier like "sex" the object descriptions differ only in expected characteristics (the sex organs), the values of the classifiers are suppressed in class/taxon descriptions. Other weakly correlated characteristics (e. g., males being a little smaller than females) will be presented as a generalized description (e. g., as a size range including both sexes). However, when many diagnostically relevant characteristics (e. g., wing pattern of butterflies or bird plumage) or characteristics that are unexpected (for someone having some experience with a taxonomic group) differ between sexes, separate descriptions will be prepared. This case, where more than the sexual organs differ between sexes, is called "sexual dimorphism".
Again, however, the values will not be part of the description, but will be used to group or structure the descriptions. Depending on the amount of differences, the grouping may precede the class/taxon name, be a subheading within class/taxon descriptions, or be only an annotation at individual descriptive statements (Table 2). Furthermore, if different sexes or life cycle stages are keyed out separately in diagnostic keys, the classifier values are usually added to the name that is keyed out.
Table 2. Examples for different presentations of sex and life cycle stage classifiers.
| Stage grouping preceding class/taxon name | Stage as subheading within description | Sex as annotation within description |
| Larval descriptions Colias alfacariensis Ribbe 1905 Colias crocea (Geoffroy, 1785) Adult butterfly Colias alfacariensis Ribbe 1905 Colias crocea (Geoffroy, 1785) | Colias alfacariensis Ribbe 1905 Distribution: … Common characteristics: … Larva: … Adult (imago): … | Colias alfacariensis Ribbe 1905 … Larva: Size … body green, … Adult (imago): … Size …, wings white (females) or clouded yellow (males) |
Storing the information about classifiers as character data is satisfactory for object descriptions, but not for class descriptions. Although sets of correlated characters can be detected algorithmically, it is very difficult or even impossible to detect which of the correlated "characters" are truly observable characters, and which "characters" summarize and generalize sets of character correlations.
Before proposing an information model for secondary classifiers like sex, generation, or stages, it must first be decided whether it is appropriate to generalize these to a single concept. As a first step, definitions of the most important classifier concepts in biology will be discussed.
Many organisms have a breeding system involving multiple mating types as a mechanism to improve outcrossing (= prevent or reduce inbreeding). Mating types may be classified as sex and as morphological or physiological self-incompatibility systems. Note that instead of using "mating type" as a generalized term (i. e., including sex), many authors use it when referring to compatibility types (this statement is based on a pers. study using internet search mechanisms). The reason for the latter usage mainly seems to be that authors work on taxonomic groups that do not show a differentiation into sexes (e. g., yeasts).
In biological usage, sex is defined as the sum of morphological and behavioral features that distinguish organisms on the basis of their reproductive function (EB 2001, CED 1992). The concept of sex is limited to two different sexes ("male", "female"); however, the combination "hermaphrodite" (a single organism being both male and female) and the absence of sex may also be considered states. In contrast, the number of compatibility types differs strongly among organism groups, as do the names used for individual types (e. g., "+"/"–", "A"/"a"/"alpha", "b1"/"b2"/"b3"). All mating types are usually genetically determined (an exception is, e. g., the marine worm Bonellia with environmental sex-determination, EB 2001).
In many animals, either sex is the only mating type, or sex and self-incompatibility system are always correlated. The difference between the two concepts can, e. g., be seen in plants like Nicotiana that are sexual hermaphrodites in having both anthers and gynoecium in each individual, but have a physiological self-incompatibility system to prevent inbreeding. Similarly, fungi may produce differentiated male and female organs on the same thallus but remain self-incompatible (heterothallic) due to a separate physiological self-incompatibility system. Most fungi or algae have no morphologically identifiable sex system and are classified only according to their self-incompatibility system (which is often only called "mating type").
An example of a morphological self-incompatibility system (= heteromorphy) is the heterostyly in plants (e. g., in Primula species: distyly or in Lythrum salicaria and Eichhornia: tristyly). This mechanism is independent of the sex system, but closely linked with a physiological incompatibility system where present (Richards 1986).
The term generation is used relatively consistently in biology and involves a cycle of reproduction. Although different generations are often genetically different (especially after sexual reproduction), this is not a necessity. Reproduction may be vegetative (e. g., parts of a plant break off, are dispersed, and root again forming the next generation). In single-celled organisms generation and cell division are synonymous. The essential definition of "generation" thus denotes a reduction to a dispersal or persistence stage and the consequential regrowth of the full organism.
Life cycle or developmental stages always denote an aspect of temporal development. Life cycle may be defined as "the series of changes in the life of an organism, including reproduction" (EB 2001: dictionary). Two kinds of life cycles exist (EB 2001: "life cycle"):
Within a single generation developmental stages (or phases) may occur. These may either partition a continuous variation (e. g., embryo, baby, youth, and adult) or may relate to distinct structural changes (e. g., egg, larva, pupa, imago in holometabolous insects). The term "life cycle stage" is often used as a synonym of developmental stage (which conforms to the dictionary definition cited above). This causes no problem in organisms that complete their life cycle in a single generation, but appears unfortunate in organisms having a multigenerational life cycle but also developmental stages.
In the case of multigenerational life cycles, the term "life cycle stage" is dominant over the use of "generation". For example, in the red alga Polysiphonia the haploid generation (gametophyte) is differentiated into male and female individuals, whereas the following two diploid generations (carposporophyte, tetrasporophyte) are not sexually differentiated. All three generations are considered "life cycle stages". The practical use of "generation" as a classifier concept is thus restricted to organisms with a single-generational life cycle. Examples are the spring and summer generations of some butterflies that are markedly differently colored, e. g., "Araschnia levana gen. vern." versus "A. levana gen. aest." (seasonal dimorphism).
A special problem is the dikaryotization of many basidiomycetes. After the sexual partners have fused, the new nucleus divides and propagates itself through an existing cellular structure (the previously monokaryotic hyphae). It is unclear whether this should be considered a generation because of the genetic change, a life cycle stage because of the change in ploidy, or a developmental stage.
This dikaryotization is also involved in the life cycle of the rust fungi, which is a good practical "data challenge" for modeling the classifiers. The entire life cycle of many rust fungi (e. g., Puccinia graminis) includes five different spore types (pycnospores, aeciospores, uredospores, teliospores, basidiospores). Each spore type has to be described separately and thus needs a classifier to distinguish the descriptions. The spore types relate to two full generations (1. pycnospores + aeciospores, 2. uredospores + teliospores) plus one reduced generation (the basidiospore-producing phragmobasidium after germination of teliospore). The first generation is initially monokaryotic, but is later dikaryotized in a sexual process in which the pycnospores function as gametes. It then produces dikaryotic aeciospores that create the second generation on an alternative host plant. In this second generation, the uredospores create new infections that are second generation individuals indistinguishable from those created by aeciospores. Thus, a secondary epidemic life cycle exists in addition to the complete life cycle involving the other spore stages. Thus, in rust fungi the dominant classifier concept involves aspects of developmental stages, generations, and sex.
[@@@@It is an important point to find further cases to be able to decide on an appropriate generalization term for these "classifier concepts". The best strategy to find additional cases is to imagine what groupings within a taxon might be keyed out separately in keys. I can imagine that keys may also key out morphological variants that have no taxonomic rank. Can anybody provide an example?]
Other concepts that exhibit similar classification or grouping properties in descriptive data are:
● Social insects such as ants, bees, termites, and wasps have morphologically differentiated individuals belonging to different castes (queen, workers, soldiers, etc.). The castes are a polymorphism between generations which cannot be treated as life cycle stages, because most individuals are sterile and die without progeny. Instead, they may be viewed as polymorphic generations. The individual differences are caused by responses to nutrition during early development, i. e., to environmental factors. In contrast to the seasonal dimorphism, however, the frequencies of individuals in a population are largely under genetic control because the environment itself is controlled by the behavior of the population.
● Descriptions may be based on living or dead material. Many characteristics can only be observed when living (e. g., in Orbilia, see Baral 1992).
● Descriptions may be based on different preservation methods, such as drying or ethanol conservation.
The various classifier concepts discussed above all describe why multiple classes of descriptions may exist within the most specific class defined in the primary classification system. It seems advisable for a descriptive data information model to provide a generalized mechanism rather than individually treating specific classification systems like sex, life cycle stages, etc.
● The number of secondary classification systems is relatively large
● The model would become specific to biological descriptions
● The individual classification systems may be interrelated in complex ways as has been shown in the example of the castes of social insects or the spore stages of rust fungi.
No existing generalized term for the classifier concepts discussed could be found. An internet search for a generalized name for at least sex, generation, and life cycle stages was unsuccessful. The following definition is therefore proposed:
[@@@@Request to reviewers: Please inform me, if you know of a discussion of this problem!]
Secondary classifiers = a classification that may be required in addition to the primary class names (which may in biology be taxon names or non-taxonomic names like disease names). Secondary classifiers provide an opportunity to add further naming dimensions to the descriptions. They are, however, not necessarily nested within the primary class names. Multiple secondary classifiers (and for a single classifier concept, multiple values = states) can be added to each class name reference.
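To make the definition more concrete, the following is a minimal sketch of a class name reference that carries secondary classifiers. The names `ClassReference`, `classifiers`, and `label` are hypothetical and only illustrate the idea; they are not taken from the SDD schema. Each classifier concept maps to one or more values, which covers the debated case of multiple values for a single concept.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ClassReference:
    """A primary class name plus optional secondary classifier values."""
    class_name: str
    # classifier concept (e.g. "sex", "stage") -> one or more values (states)
    classifiers: Dict[str, Set[str]] = field(default_factory=dict)

    def label(self) -> str:
        """Human-readable label; classifiers are listed in sorted order."""
        parts = []
        for concept in sorted(self.classifiers):
            values = ", ".join(sorted(self.classifiers[concept]))
            parts.append(f"{concept}: {values}")
        if not parts:
            return self.class_name
        return f"{self.class_name} [{'; '.join(parts)}]"


# The primary name stays untouched; classifiers are separate dimensions.
female_adult = ClassReference("Colias crocea",
                              {"stage": {"adult"}, "sex": {"female"}})
plain = ClassReference("Colias crocea")
```

Because the classifier values are kept apart from `class_name`, the same reference type can be used with an empty classifier set in the class hierarchy and with values in descriptions or keys.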
[@@@@ I am not entirely happy and find "secondary classifiers" not truly intuitive. It is the best I could come up with!
Also note: the last point is debatable. It will occur primarily if descriptions are generalized. For example, the descriptions of the second, third, and fourth instar may be so similar, that they are joined in a single generalized description. This would, however, be a non-persistent report. It is unclear whether such data would also have to be recorded. Please comment!]
Annotated collection of other candidate terms [@@@@Please comment or add!]:
   * "Classifiers" alone is too general.
   * "Non-taxonomic classifiers" is inappropriate, since the primary class names may already be non-taxonomic, as in disease names (also, SDD aims to create a general model applicable without reference to biological terminology).
   * "Determinants/classification determinants"?
   * "Description classifiers" – perhaps more intuitive than "secondary"?
   * "Phenotypical classifiers" would be confusing, since phenotypic is usually considered an antonym to genotypic. Classifier concepts may be phenotypic (environmental sex determination), genotypic (genotypic sex determination), or ontogenetic (development stages).

A confusing aspect of classifiers is that – although the values do not contribute to the class descriptions – the existence of values or their frequency is part of the descriptive knowledge expressed in descriptive databases. The frequency of males and females is a property of classes/taxa (e. g., in social hymenoptera), and different classes/taxa may have different development or life cycle stages (e. g., reduced forms of the full heteroecious rust life cycle, or neoteny in animals). Such information may perhaps be considered separate characters, i. e., distinct from a "secondary classifier" mechanism.
In theory, the frequency of sexes could be calculated from descriptions that have a male/female sex classification. In practice this will not be possible, since the sampling of descriptions in a database (and of specimens in a collection) is highly non-random. Although presence/absence suffers less from sampling bias, complete and systematic bias (e. g., the database contains only adults) is not infrequent. Thus, classifier-related characters will normally have to be recorded independently from the data recorded in some kind of classifier mechanism.
Note that some classifier-related characters are often omitted from descriptions optimized for identification, because they are inconvenient to study (e. g., requiring observation over prolonged periods or population sampling). This is, however, no unique property of classifier-related characters. In the SDD model, the convenience of a character for identification purposes is separately recorded ("rated", compare section @@@@). Furthermore, some classifier-related characters are quite convenient, e. g. "sex status" with the states "monoclinous (having male and female organs in the same flower)" and "diclinous (in different flowers)".
On the other hand, classifier-related characters have an influence on classifiers. If a "life cycle type" character of plants has the states "annual, biennial, perennial", a possible life cycle stage "plant in the second year" is inapplicable for "annual". Similarly, if "heterostyly" has the states "monostylous, bistylous, tristylous", and a related heterostyly classifier the values "short, medium, long style", the entire classifier would not be applicable for heterostyly = monostylous, and only the values "short" and "long style" would be applicable for heterostyly = bistylous.
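The dependency between a classifier-related character and the values of a classifier can be sketched as a simple lookup. The rule table `HETEROSTYLY_RULES` and the function `applicable_values` are hypothetical names introduced for illustration; the states and values are those of the heterostyly example above.

```python
from typing import Dict, Set

# Hypothetical rule table: state of the "heterostyly" character ->
# values of the heterostyly classifier that remain applicable.
HETEROSTYLY_RULES: Dict[str, Set[str]] = {
    "monostylous": set(),  # the entire classifier is inapplicable
    "bistylous": {"short style", "long style"},
    "tristylous": {"short style", "medium style", "long style"},
}


def applicable_values(character_states: Set[str],
                      rules: Dict[str, Set[str]]) -> Set[str]:
    """Union of the classifier values allowed by any recorded state."""
    allowed: Set[str] = set()
    for state in character_states:
        allowed |= rules.get(state, set())
    return allowed


bistylous_values = applicable_values({"bistylous"}, HETEROSTYLY_RULES)
monostylous_values = applicable_values({"monostylous"}, HETEROSTYLY_RULES)
```

An empty result signals that the classifier as a whole does not apply to the class, which an application could use to hide the classifier during data entry.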
The special properties of sex, generation, life cycle stages are not discussed in the CSIRO DELTA or
Secondary classifiers like sex and life cycle stages may be considered part of the class name, i. e., nested within the taxonomic hierarchy. In applications based on the DELTA information model, the item names for larvae and adults of the monarch butterfly may be "Danaus plexippus (larvae)" and "Danaus plexippus (imago)". Some databases may even treat them explicitly as "pseudo-ranks" (see Bob Morris' comment in http://wiki.cs.umb.edu/twiki/bin/view/SDD/TheProblemOfSex).
If added to the item name, DELTA applications will not be able to distinguish the added classifier information from an infraspecific taxon. An advantage of this method is that it allows using the "variant item" mechanism: In addition to a main item description, additional descriptions containing only those characters that differ from the main item may be added as variant items in DELTA. This can be used to simplify the recording of those parts of the descriptions that differ according to classifier values. However, since the variant item mechanism is limited to a single hierarchical level, it is not possible to treat sexes of infraspecific items using this mechanism (or the mechanism is not available for infraspecific taxa).
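The idea behind variant items can be sketched as an overlay of the variant on the main item. This is only an illustration of the principle described above, not the DELTA file format or the behavior of any specific DELTA program; the function name `resolve_variant` and the character values are invented.

```python
from typing import Dict


def resolve_variant(main_item: Dict[str, str],
                    variant_item: Dict[str, str]) -> Dict[str, str]:
    """Full description of a variant: main item, overridden where the
    variant records a differing character."""
    full = dict(main_item)     # copy, so the main item stays unchanged
    full.update(variant_item)  # variant values take precedence
    return full


# Invented example values: only the differing character is recorded
# for the male variant.
main_item = {"size": "medium", "wing color": "white"}
male_variant = {"wing color": "clouded yellow"}
male_full = resolve_variant(main_item, male_variant)
```

The sketch also shows the limitation noted above: with a single overlay level there is no place for a second, nested variant such as the sexes of an infraspecific item.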
Figure 1. Treating sex as an infraspecific taxon works well on the side of descriptions, but requires adding two new "pseudo-taxa" to each taxon, both in the list of class (= taxon) names (which is referenced by descriptions) and in the class hierarchy.
In a system like DELTA that implements the name of a description as an unconstrained string, adding sex or stage information to the name is a feasible solution. However, if the class names of descriptions are formalized and handled through references to a formal list of class names (which in SDD provides only local proxy objects that in turn reference external nomenclatural databases), this approach soon becomes highly undesirable (Fig. 1). The following major problems can be identified:
● For each identifiable class additional dependent classes for each sex or life cycle stage must be introduced. Furthermore, it is possible to identify the sex and stage of a butterfly as female imago, but the taxon only to family level. If classifiers are handled as additional ranks of the taxonomic hierarchy, male/female and larva/imago "pseudo-taxa" would have to be added to higher taxa as well as to species or infraspecific taxa to allow such identifications.
● These additional "pseudo-classes" would also have to be added to the class hierarchy definition. This may be an automatic process, but formally the information that "has to be expressed. As humans we consider the fact that "Danaus plexippus (larvae)" can be generalized to "Danaus plexippus" automatic, but it involves a parsing of the string and semantic knowledge that allows us to distinguish between a classifier "(larvae)" and a taxonomic author name in the same position.
● The class name would become language specific. Taxonomic classes would require different names in German and English (this problem is not entirely specific to classifiers, it is generally present if diseases instead of taxa are described).[@@@@in fact I believe SDD 0.9 has a problem here, see Wiki topic LanguageSpecificClassNames (http://wiki.cs.umb.edu/twiki/bin/view/SDD/LanguageSpecificClassNames)!]
● The taxonomic hierarchy is naturally nested. Classifiers act as separate dimensions independent of this hierarchy (Figs. 2 and 3). Although in general any single dimension that is independent of a hierarchy may also be viewed as nested within the hierarchy, in the presence of more than one classifier arbitrary nesting will have to be made (Fig. 4).
Furthermore, the classifier dimensions may or may not be dependent (Fig. 5):
Another problem is that for reporting, the classifiers may have a higher grouping priority than the entire class hierarchy (e. g., for caterpillar and butterfly stages separate descriptions and diagnostic keys are presented, compare Table 2, p. 1). Although it is possible that software may support this, it is an operation unnatural for hierarchical arrangements and is not required for the naturally nested taxonomic hierarchy.
One possible solution would be to handle classifiers in an unconstrained string introduced in addition to the formal class name reference. This would avoid many problems noted above, but would not allow any classifier specific processing like producing generalized descriptions for sex, but not for stage.
Figure 4. Sex and stage arbitrarily nested inside the taxonomic hierarchy. Males and females of different taxa or stages are assumed to have no relation or similarity.
Figure 5. The dimensions of sex and life cycle stages may be dependent and nested (top; e. g. red algae) or independent (bottom; e. g. butterflies) of each other.
Secondary classifiers may be considered normal characters (as "shape" or "color"). This approach is probably rarely found in DELTA data sets (but compare the section "Classifier-related characters", above). However, Prometheus II (
Using normal characters to express classifier information has the advantage that applications have no additional implementation tasks because existing mechanisms are used. However, it has serious problems in that:
● Secondary classifiers are important factors when aggregating specimen data, or generalizing multiple taxon descriptions to higher taxon descriptions. If the aggregation/generalization algorithm can test which observations belong to secondary classifiers like sex or stage, it could make rule-based decisions whether to ignore sex or stage differences, or whether to create separate descriptions for them.
● The solution does not work for guided keys (e. g., larvae and adults of the monarch butterfly are keyed out in separate places in a single key, or in separate keys).
The first problem could be solved by defining an additional flag for certain state sets, indicating which define secondary classifiers. However, no satisfactory solution seems to exist for the problem of dealing with guided keys.
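The flag proposed for the first problem can be sketched as follows. The set `CLASSIFIER_CHARACTERS` stands in for such a flag and `group_by_classifiers` is a hypothetical helper; both are assumptions for illustration. Once the flagged characters are known, object descriptions can be grouped by their classifier values before any generalization takes place.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical flag: characters whose state sets define secondary classifiers.
CLASSIFIER_CHARACTERS = {"sex", "stage"}

ClassifierKey = Tuple[Tuple[str, str], ...]


def group_by_classifiers(
        objects: List[Dict[str, str]]) -> Dict[ClassifierKey, List[Dict[str, str]]]:
    """Group object descriptions by the values of flagged characters.

    The flagged characters are removed from the grouped descriptions,
    so that only observable characters remain to be generalized.
    """
    groups: Dict[ClassifierKey, List[Dict[str, str]]] = defaultdict(list)
    for obj in objects:
        key = tuple(sorted((c, v) for c, v in obj.items()
                           if c in CLASSIFIER_CHARACTERS))
        rest = {c: v for c, v in obj.items()
                if c not in CLASSIFIER_CHARACTERS}
        groups[key].append(rest)
    return dict(groups)


# Invented object descriptions.
objects = [
    {"sex": "male", "wing color": "clouded yellow"},
    {"sex": "female", "wing color": "white"},
    {"sex": "male", "wing color": "yellow"},
]
groups = group_by_classifiers(objects)
```

A rule-based generalization step could then decide per classifier whether to merge the groups again or to keep them as separate descriptions. The sketch does nothing for guided keys, where no character data are available.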
This solution is usually used in cases where the descriptions of different stages are drastically different, perhaps even structurally different (e. g., caterpillar and butterfly). An entirely separate set of characters is prepared for each stage (Fig. 6). Because of the fundamental differences, only a limited number of characters (overall size, DNA) are truly duplicated. For these characters no generalization analysis is possible without adding additional information to the terminology.
An extreme case, where almost all characters are duplicated, is the description of the life cycle and spore stages of rust fungi (Fig. 7). The abstraction of the spore stages is highly desirable here, both for analytical and for identification purposes. During identification, several rust spore stages are difficult or impossible to distinguish based on their morphology alone.
Figure 6. Character × description matrix where development stages are expressed through separate sets of characters.
Figure 7. Character × description matrix where spore stages of rust fungi are expressed through separate sets of characters. Each set is assumed to contain similar characters (length, width, shape, septation, wall thickness, surface ornamentation, etc.) that are specialized only through the spore stage they describe. One generalization dimension abstracts from objects to a class description. However, another desirable generalization dimension shown below the main matrix would generalize to a "generalized spore". The arrows show the generalization only for the first character in each set. The class description in the lower matrix combines both generalizations.
The introduction of a separate "secondary classifier" mechanism which is proposed for SDD is very similar to using normal characters to express secondary classifiers. The classifier characters are analogous to normal characters, but used in a separated context. This allows them to be treated differently when generalizing descriptive information (objects to class, classes to higher classes). Furthermore, the independent mechanism allows them to be added to the diagnostic keys as well.
The introduction of explicit secondary classifiers does not prevent the existence of classifier-specific character groups. However, they will only be necessary where structures or properties apply only to a certain sex or stage. In contrast to the character set model described in the previous section (Figs. 6 and 7), the existence of a classifier does not force the duplication of characters (Fig. 8).
Figure 8. Character × description matrix where development stages are designated using a secondary classifier mechanism. Some characters are applicable only to certain stages, but other characters are common to different stages. The generalization algorithm providing the class descriptions from object descriptions has detected that the common characters for different stages are strongly different. Thus, separate, stage-specific class descriptions have been prepared.
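The decision described in the caption of Fig. 8 can be sketched under a strong simplification: two stage descriptions are treated as "strongly different" if some character recorded for both has no state in common. This criterion, and the names `strongly_different` and `generalize`, are assumptions made for illustration; the source does not define the actual algorithm.

```python
from typing import Dict, Set

Description = Dict[str, Set[str]]  # character -> observed states


def strongly_different(a: Description, b: Description) -> bool:
    """True if some character recorded in both has no state in common."""
    shared = a.keys() & b.keys()
    return any(a[c].isdisjoint(b[c]) for c in shared)


def generalize(by_stage: Dict[str, Description]) -> Dict[str, Description]:
    """Keep stage-specific descriptions if any two stages differ strongly;
    otherwise merge all stages into one generalized description."""
    stages = list(by_stage)
    for i, first in enumerate(stages):
        for second in stages[i + 1:]:
            if strongly_different(by_stage[first], by_stage[second]):
                return dict(by_stage)
    merged: Description = {}
    for description in by_stage.values():
        for character, states in description.items():
            merged.setdefault(character, set()).update(states)
    return {"all stages": merged}


# Invented data: common character "body color" differs between stages.
separate = generalize({
    "larva": {"body color": {"green"}},
    "adult": {"body color": {"white"}, "wing color": {"white"}},
})
# Invented data: overlapping size ranges are merged.
merged = generalize({
    "male": {"size": {"small", "medium"}},
    "female": {"size": {"medium", "large"}},
})
```

Characters that apply to one stage only (here "wing color") never trigger a split by themselves, since only characters common to both stages are compared.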
[@@must be further expanded!@@]
Problem: Whether object identification should be with or without classifiers needs discussion! In biology a collected object may have multiple stages (e. g., on a single herbarium sheet). These may or may not be described together. Is it meaningful to have classifiers at all at the object identification? Currently they are preliminarily added there, but I wonder whether they should not be removed!
What is the implicit assumption for a definition of Object? It seems reasonable to define it not as a specimen, but as an individual genetic unit on a preservation unit like a herbarium sheet. If that is so, do we still need multiple values for a single secondary classifier concept?
Problem: Classifier information in keys may be specific to a class reference, which would be well handled by a classifier mechanism that is added to class references. However, equivalent information may apply to the entire key (which may only deal with larval stages of insects). Although this will be clear to humans from reading the key label, it would be highly desirable to also provide a machine-readable definition. Adding classifier information also to entire keys further complicates the model and evaluation of data. Is this avoidable in some other way of handling classifier information?
Class references are thought to need an additional secondary classifier mechanism in descriptions and diagnostic keys and they may be desirable in the identification of objects (= specimens; Fig. 9). Class references remain without classifiers in the definition of class (= taxonomic) hierarchy and class synonyms (Fig. 10).
Figure 9. Visualization of objects with class references that require an additional secondary classifier mechanism (sex, life cycle, or developmental stage).
Figure 10. Only the class references in the class hierarchy (in biology = taxonomic hierarchy) and for the definition of synonyms do not require the secondary classifier mechanism (compare Fig. 9).
[@@@@ The basic options are probably:
a) free-form field at the taxon result object in a key (which would have to be manually translated into each language)
b) specialized generalized "micro-description" facility for classifiers alone, at each keyed out or described taxon object
c) use normal characters and add some flagging to allow detection of classifiers; plus provide a "micro-description" facility in the keys (where no character data are normally available)
d) and provide a generalized ontology at the general concept/character state facility to recognize sex and stages (i. e. similar to proposal 2, above)?]
Secondary classifiers like sex and stage are handled by a specialized mechanism that is designed as an extension to standard class name references. The standard class reference type is used to model taxonomic hierarchy and synonymizations (which inside the description model only mirror data from external nomenclature or taxonomy providers). The extended class reference type provides an additional sequence of secondary classifier values. These are references to concept states defined at concept nodes. They do not refer to character states! The extended class reference type is used to (see Fig. 9):
TODO: Add new concept tree type "secondary classification concepts"
Question: which modifiers would be necessary at secondary classifiers?
---
(End of document pasted directly. Also a zipped [[%ATTACHURL%/D20_TempSecClassDraft3.zip][RTF version that includes all figures]] is provided.)

-- Gregor Hagedorn - 26. April 2004

Polymorphisms, due to sexual differences, life cycle, developmental stages and other factors, frequently appear to be a problem, because we wish to assign the organisms to the same taxonomic category while the descriptions of the members of the category may vary widely. This conflict can be attributed to the interrelationship between the function or purpose of the taxonomic category and the description. The taxonomic category can lead to a description, or a description may lead to the conclusion that an item belongs to a particular taxonomic category. This relationship, however, does not mean that the taxonomic category and the descriptions are synonymous or in most other aspects equivalent. The description is simply a list of observable attributes. Groups of these attributes may appear only within individual “phases” of an organism while always being associated with the taxonomic category. All characteristics from any phase are “true” of the taxonomic category, but we wish to organize them according to these correlated characters as well as make explicit the existence of the phases. While this may not help at any point in time for an identification, it will help over time.

Advantages of “phase” representation type: Each of the three representations “Stage grouping preceding taxonomic name”, “Stage as subheading within a description”, and “Sex/stage as annotation within a description” has communicative advantages and disadvantages. I believe that, in the end, for ease of writing, authors of descriptions will continue to view the organization of these attributes in all three ways and more. “Stage grouping preceding taxonomic name” has the advantage of allowing one to focus on the attributes of that one stage.
After all, a person encountering one individual of a species will only encounter it in one of its “phases”. This allows a reader to pick a “phase” or “stage” first, to pick which key to use. For example, go to the “larval descriptions” part of the key if you have a larva and not an adult. “Stage as subheading within a description” presumes the inclusion of attributes from other “phases” or “stages”, providing a more complete picture of a population over time. This representation also allows a reader to more easily compare “stages.” “Sex/stage as annotation within a description” is best for cases where most attributes are shared among the “stages” and only a few are dimorphic.

Transformational equivalence: The choice we make for the data structures is in part determined by an evaluation of the transformational equivalence of the representations. All other things being equal, we should choose the representation that can be mechanically transformed into all three frameworks above. It is the job of the application to make the transformations for the user. The current SDD framework and many other TDWG standards are centered on the taxonomic category. They are organized around these concepts. This makes it difficult to represent the first type, “Stage grouping preceding taxonomic name”, in SDD, as Gregor points out with the introduction of “pseudo-taxa” to address the issue of two sexes under the same taxon.

[… we had quite a few discussions about sex and stage on the WIKI (e. g. http://wiki.cs.umb.edu/twiki/bin/view/BDI.SDD/TheProblemOfSex. I believe this is one of the solvable things and that it should now be solved in BDI.SDD …!
d296 1 a296 1 * "Non-taxonomic classifiers" is inappropriate, the primary class names may already be non-taxonomic, as in disease names (also BDI.SDD aims to create a general model applicable without reference to biological terminology). d308 1 a308 1Note that some classifier-related characters are often omitted from descriptions optimized for identification, because they are inconvenient to study (e. g., requiring observation over prolonged periods or population sampling). This is, however, no unique property of classifier-related characters. In the BDI.SDD model, the convenience of a character for identification purposes is separately recorded ("rated", compare section @@@@). Furthermore, some classifier-related characters are quite convenient, e. g. "sex status" with the states "monoclinous (having male and female organs in the same flower)" and "diclinous (in different flowers)".
d318 1 a318 1Secondary classifiers like sex and life cycle stages may be considered part of the class name, i. e nested within the taxonomic hierarchy. In applications based on the DELTA information model, the item names for larvae and adults of the monarch butterfly may be "Danaus plexippus (larvae)" and "Danaus plexippus (imago)". Some databases may even treat them explicitly as "pseudo-ranks" (see Bob Morris' comment in http://wiki.cs.umb.edu/twiki/bin/view/BDI.SDD/TheProblemOfSex).
d326 1 a326 1In a system like DELTA that implements the name of description as an unconstrained string, adding sex or stage information to the name is a feasible solution. However, if the class names of descriptions are formalized and handled through references to a formal list of class names (which in BDI.SDD provides only local proxy objects, that again reference external nomenclatural databases), this approach soon becomes highly undesirable (Fig. 1). The following major problems can be identified:
d332 1 a332 1● The class name would become language specific. Taxonomic classes would require different names in German and English (this problem is not entirely specific to classifiers, it is generally present if diseases instead of taxa are described).[@@@@in fact I believe BDI.SDD 0.9 has a problem here, see Wiki topic LanguageSpecificClassNames (http://wiki.cs.umb.edu/twiki/bin/view/BDI.SDD/LanguageSpecificClassNames)!]
d418 1 a418 1The introduction of a separate "secondary classifier" mechanism which is proposed for BDI.SDD is very similar to using normal characters to express secondary classifiers. The classifier characters are analogous to normal characters, but used in a separated context. This allows them to be treated differently when generalizing descriptive information (objects to class, classes to higher classes). Furthermore, the independent mechanism allows them to be added to the diagnostic keys as well.
d464 1 a464 1[… we had quite a few discussions about sex and stage on the WIKI (e. g. http://wiki.cs.umb.edu/twiki/bin/view/SDD/TheProblemOfSex. I believe this is one of the solvable things and that it should now be solved in SDD …!
d138 1 a138 1 e. g., egg/embryo, larva, adult;● Social insects such as ants, bees, termites, and wasps have morphologically differentiated individuals belonging to different castes (queen, workers, soldiers, etc.). The castes are a polymorphism between generations which cannot be treated as life cycle stages, because most individuals are sterile and die without progeny. Instead, they may be viewed as polymorphic generations. The individual differences are caused by responses to nutrition during early development, i. e. to environmental factors. In contrast to the seasonal dimorphism, however, the frequencies of individuals in a population are largely under genetical control because the environment itself is controlled by the behavior of the population.
d268 1 a268 1● Descriptions may be based on living or dead material. Many characteristics can only be observed when living (e. g., in Orbilia, see Baral 1992).
d270 1 a270 1● Descriptions based of different preservation methods, such as drying or ethanol conservation.
d277 1 a277 1● The number of secondary classification systems is relatively large
d279 1 a279 1● The model would become specific to biological descriptions
d281 1 a281 1● The individual classification systems may be interrelated in complex ways as has been shown in the example of the castes of social insects or the spore stages of rust fungi.
d295 5 a299 5 * "Classifiers" alone is too general. * "Non-taxonomic classifiers" is inappropriate, the primary class names may already be non-taxonomic, as in disease names (also SDD aims to create a general model applicable without reference to biological terminology). * "Determinants/classification determinants"? * "Description classifiers" – perhaps more intuitive than "secondary"? * "Phenotypical classifiers" would be confusing, since phenotypic is usually considered and antonym to genotypic. Classifier concepts may be phenotypic (environmental sex determination), genotypic (genotypic sex determination), or ontogenetic (development stages). d308 1 a308 1Note that some classifier-related characters are often omitted from descriptions optimized for identification, because they are inconvenient to study (e. g., requiring observation over prolonged periods or population sampling). This is, however, no unique property of classifier-related characters. In the SDD model, the convenience of a character for identification purposes is separately recorded ("rated", compare section @@@@). Furthermore, some classifier-related characters are quite convenient, e. g. "sex status" with the states "monoclinous (having male and female organs in the same flower)" and "diclinous (in different flowers)".
d318 1 a318 1Secondary classifiers like sex and life cycle stages may be considered part of the class name, i. e nested within the taxonomic hierarchy. In applications based on the DELTA information model, the item names for larvae and adults of the monarch butterfly may be "Danaus plexippus (larvae)" and "Danaus plexippus (imago)". Some databases may even treat them explicitly as "pseudo-ranks" (see Bob Morris' comment in http://wiki.cs.umb.edu/twiki/bin/view/SDD/TheProblemOfSex).
d326 1 a326 1In a system like DELTA that implements the name of description as an unconstrained string, adding sex or stage information to the name is a feasible solution. However, if the class names of descriptions are formalized and handled through references to a formal list of class names (which in SDD provides only local proxy objects, that again reference external nomenclatural databases), this approach soon becomes highly undesirable (Fig. 1). The following major problems can be identified:
d328 1 a328 1● For each identifiable class additional dependent classes for each sex or life cycle stage must be introduced. Furthermore, it is possible to identify the sex and stage of a butterfly as female imago, but the taxon only to family level. If classifiers are handled as additional ranks of the taxonomic hierarchy, male/female and larva/imago "pseudo-taxa" would have to be added to higher taxa as well as to species or infraspecific taxa to allow such identifications.
d330 1 a330 1● These additional "pseudo-classes" would also have to be added to the class hierarchy definition. This may be an automatic process, but formally the information that "has to be expressed. As humans we consider the fact that "Danaus plexippus (larvae)" can be generalized to "Danaus plexippus" automatic, but it involves a parsing of the string and semantic knowledge that allows us to distinguish between a classifier "(larvae)" and a taxonomic author name is the same position.
d332 1 a332 1● The class name would become language specific. Taxonomic classes would require different names in German and English (this problem is not entirely specific to classifiers, it is generally present if diseases instead of taxa are described).[@@@@in fact I believe SDD 0.9 has a problem here, see Wiki topic LanguageSpecificClassNames (http://wiki.cs.umb.edu/twiki/bin/view/SDD/LanguageSpecificClassNames)!]
d334 1 a334 1● The taxonomic hierarchy is naturally nested. Classifiers act as separate dimensions independent of this hierarchy (Figs. 2 and 3). Although in general any single dimension that is independent of a hierarchy may also be viewed as nested within the hierarchy, in the presence of more than one classifier arbitrary nesting will have to be made (Fig. 4).
d398 1 a398 1● Secondary classifiers are important factors when aggregating specimen data, or generalizing multiple taxon descriptions to higher taxon descriptions. If the aggregation/generalization algorithm can test which observations belong to secondary classifiers like sex or stage, it could make rule-based decisions whether to ignore sex or stage differences, or whether to create separate descriptions for them.
d400 1 a400 1● The solution does not work for guided keys (e. g., larvae and adults of the monarch butterfly are keyed out in separate places in a single key, or in separate keys).
d418 1 a418 1The introduction of a separate "secondary classifier" mechanism which is proposed for SDD is very similar to using normal characters to express secondary classifiers. The classifier characters are analogous to normal characters, but used in a separated context. This allows them to be treated differently when generalizing descriptive information (objects to class, classes to higher classes). Furthermore, the independent mechanism allows them to be added to the diagnostic keys as well.
d464 1 a464 1[… we had quite a few discussions about sex and stage on the WIKI (e. g. http://efgblade.cs.umb.edu/twiki/bin/view/SDD/TheProblemOfSex. I believe this is one of the solvable things and that it should now be solved in SDD …!
Currently I have worked through the beginning several times, but it is getting rough towards the end, when we come to conclusions and proposals. I am still undecided what the best strategy is, and I hope that you can help me with some comments, including your feelings on this…]
When objects are classified to the most specific level recognized in the class hierarchy (in biology = species, subspecies, or variety), their descriptions are still not necessarily identical. Some differences are due to random effects in the individual history of an object, others are however systematically repeatable (and, in biology, genetically coded). The most important types of genetically coded intra-class variation are polymorphisms and systematic changes occurring during developmental or life-cycle stages of an object (Table 1).
Table 1. Examples for classification systems and sources of intra-class-variation in biology and the study of musical instruments
| *Classification system* | *Biological organisms* | *Musical instruments* |
| Phylogenetic / Inherited (→ multiple characteristics are linked) | Evolutionary history / taxonomic classification (e. g., order/family/genus) | Craftsmanship, technological, or industrial traditions of instrument creation |
| Operational (arbitrarily based on a single characteristic) | Tree/shrub/herb, water vs. land plants | Sachs-Hornbostel (idiophones, membranophones, chordophones, aerophones, electrophones) |

| *Source of further variation* | *Biological organisms* | *Musical instruments* |
| Individual history | | |
| a) chance effects | Scarring of skin, mutilations | Scratching, discoloration |
| b) systematic responses | Phenotypic responses like flowering time or variable shape to maximize resource utilization | Response to humidity or submerging in water |
| c) essential and | Developmental stages: e. g., egg/embryo, larva, adult; Life-cycle stages: e. g., gametophyte, sporophyte | Phases in the construction of an instrument |
| Genetic polymorphism | Sexes or blood types (= multiple alleles for a gene present within populations) | Perhaps: decorative styles stretching across multiple instrument types and traditions |
A characteristic that is variable within a class is not necessarily uninformative for diagnostic purposes. If a plant has both red and white flowers and other plant species have yellow, blue, red, or white colors, specifying the flower color of an object reduces the set of remaining classes in identification. A description "flowers red or white" is a meaningful part of a diagnostic class description.
However, certain kinds of polymorphisms change highly systematically. A description "sex male" is meaningful for an object, but "sex female or male" is not a meaningful part of a class description, since the two sexes by definition occur together. Similarly, recording the presence of life stages may or may not be meaningful, depending on the taxonomic scope and whether all classes have a larval and an adult stage. This problem of character "saturation" (= all potential character states present) can be automatically detected if a character has been recorded either for all classes or for a sufficient sample of objects. It normally does not require the recording of additional information.
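The automatic detection of "saturation" mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `is_saturated`, the dictionary layout, and the species labels "species A" and "species B" are hypothetical and not part of SDD.

```python
def is_saturated(possible_states, class_descriptions):
    """Return True if every class shows every possible state of a character.

    possible_states: iterable of all states defined for the character.
    class_descriptions: dict mapping class name -> set of recorded states.
    (Both structures are assumptions made for this sketch.)
    """
    possible = set(possible_states)
    if not class_descriptions:
        return False  # nothing recorded, so nothing can be concluded
    return all(set(states) >= possible for states in class_descriptions.values())


# "sex female or male" holds for every class, so it carries no diagnostic value.
sex = {
    "Colias crocea": {"male", "female"},
    "Colias alfacariensis": {"male", "female"},
}
# "flowers red or white" still separates classes, so it is not saturated.
flower_colour = {
    "species A": {"red", "white"},  # hypothetical class names
    "species B": {"yellow"},
}

print(is_saturated({"male", "female"}, sex))
print(is_saturated({"red", "white", "yellow", "blue"}, flower_colour))
```

A real implementation would probably also need the "sufficient sample of objects" criterion from the text, which this sketch does not attempt.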
Another problem specific to intra-class variation is, however, more difficult to solve. Some of the characteristics already mentioned form an operational classification system. In biology these secondary systems are independent of the primary system of taxonomic names. The most frequently encountered examples of such "secondary classifiers" are designations of sex (male/female), generation (e. g., spring/summer), and life cycle or development stage (e. g., larva, adult). The values of such classifiers are not directly observable characters, but rather typify sets of correlated character expressions. Objects with different classifier values will have moderately or strongly different descriptions.
If for a secondary classifier like "sex" the object descriptions differ only in expected characteristics (the sex organs), the values of the classifiers are suppressed in class/taxon descriptions. Other weakly correlated characteristics (e. g., males being a little smaller than females) will be presented as a generalized description (e. g., as a size range including both sexes). However, when many diagnostically relevant characteristics (e. g., wing pattern of butterflies or bird plumage) or unexpected characteristics (unexpected, that is, for someone having some experience with a taxonomic group) differ between sexes, separate descriptions will be prepared. This case, where more than the sexual organs differ between sexes, is called "sexual dimorphism".
Again, however, the values will not be part of the description, but will be used to group or structure the descriptions. Depending on the amount of differences, the grouping may precede the class/taxon name, be a subheading within class/taxon descriptions, or be only an annotation at individual descriptive statements (Table 2). Furthermore, if different sexes or life cycle stages are keyed out separately in diagnostic keys, the classifier values are usually added to the name that is keyed out.
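The decision described above, whether differences between classifier values are folded into one generalized description or lead to separate descriptions, could be expressed as a simple rule. The following is a minimal sketch, not a proposal for the actual algorithm: the function `split_or_merge`, the record layout, the `threshold` of 0.3, and the beetle data are assumptions made for illustration.

```python
def split_or_merge(objects, classifier, expected=(), threshold=0.3):
    """Group object descriptions by a classifier and decide whether to split.

    objects: list of dicts with "classifiers" and "characters" entries.
    expected: characters expected to differ (e.g. the sex organs); they are
        ignored in the comparison and suppressed in a merged description.
    threshold: share of differing characters above which separate
        descriptions are kept (an arbitrary value for this sketch).
    """
    by_value = {}
    for obj in objects:
        value = obj["classifiers"][classifier]
        states = by_value.setdefault(value, {})
        for character, state in obj["characters"].items():
            states.setdefault(character, set()).add(state)

    compared = {c for states in by_value.values() for c in states} - set(expected)
    # A character recorded for only one classifier value also counts as differing.
    differing = [
        c for c in compared
        if len({frozenset(states.get(c, ())) for states in by_value.values()}) > 1
    ]
    share = len(differing) / len(compared) if compared else 0.0
    if share > threshold:
        return by_value  # one description per classifier value

    merged = {}
    for states in by_value.values():
        for character, observed in states.items():
            if character not in expected:
                merged.setdefault(character, set()).update(observed)
    return {None: merged}  # a single generalized description


butterflies = [
    {"classifiers": {"sex": "male"},
     "characters": {"sex organs": "male", "wing colour": "clouded yellow", "antennae": "clubbed"}},
    {"classifiers": {"sex": "female"},
     "characters": {"sex organs": "female", "wing colour": "white", "antennae": "clubbed"}},
]
beetles = [  # hypothetical data: only the expected characters differ
    {"classifiers": {"sex": "male"},
     "characters": {"sex organs": "male", "elytra": "black", "legs": "brown"}},
    {"classifiers": {"sex": "female"},
     "characters": {"sex organs": "female", "elytra": "black", "legs": "brown"}},
]

print(sorted(split_or_merge(butterflies, "sex", expected=("sex organs",))))
print(split_or_merge(beetles, "sex", expected=("sex organs",)))
```

Such a rule only works if the algorithm can tell classifier values apart from ordinary characters, which is not possible when classifiers are stored as plain character data.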
Table 2. Examples for different presentations of sex and life cycle stage classifiers.
| *Stage grouping preceding class/taxon name* | *Stage as subheading within description* | *Sex as annotation within description* |
| Larval descriptions <br /> Colias alfacariensis Ribbe 1905 <br /> Colias crocea (Geoffroy, 1785) <br /> Adult butterfly <br /> Colias alfacariensis Ribbe 1905 <br /> Colias crocea (Geoffroy, 1785) | Colias alfacariensis Ribbe 1905 <br /> Distribution: … <br /> Common characteristics: … <br /> Larva: … <br /> Adult (imago): … | Colias alfacariensis Ribbe 1905 <br /> … <br /> Larva: Size … body green, … <br /> Adult (imago): … Size …, wings white (females) or clouded yellow (males) |
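If the classifier values are stored separately from the description text, the first two presentations in Table 2 become two groupings of the same records. The sketch below illustrates this under the assumption of a flat (taxon, stage, text) record; the function `group` and the record layout are hypothetical, and the "…" entries are placeholders, as in Table 2.

```python
records = [
    # (class/taxon name, stage classifier value, descriptive text)
    ("Colias alfacariensis", "Larva", "body green"),
    ("Colias alfacariensis", "Adult (imago)", "wings white (females) or clouded yellow (males)"),
    ("Colias crocea", "Larva", "…"),
    ("Colias crocea", "Adult (imago)", "…"),
]


def group(records, outer, inner):
    """Render records as headings (field `outer`) with labelled entries (field `inner`)."""
    grouped = {}
    for record in records:
        grouped.setdefault(record[outer], []).append((record[inner], record[2]))
    lines = []
    for heading, entries in grouped.items():
        lines.append(heading)
        for label, text in entries:
            lines.append(f"  {label}: {text}")
    return lines


stage_first = group(records, outer=1, inner=0)  # stage grouping precedes the taxon name
taxon_first = group(records, outer=0, inner=1)  # stage as subheading within the description

print("\n".join(stage_first))
print("\n".join(taxon_first))
```

The third presentation (sex as annotation within a statement) would need classifier values attached to individual statements rather than to whole descriptions, which this flat record does not model.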
Storing the information about classifiers as character data is satisfactory for object descriptions, but not for class descriptions. Although sets of correlated characters can be detected algorithmically, it is very difficult or even impossible to detect which of the correlated "characters" are truly observable characters, and which "characters" summarize and generalize sets of character correlations.
Before proposing an information model for secondary classifiers like sex, generation, or stages, it must first be decided whether it is appropriate to generalize these to a single concept. As a first step, definitions of the most important classifier concepts in biology will be discussed.
Many organisms have a breeding system involving multiple mating types as a mechanism to improve outcrossing (= prevent or reduce inbreeding). Mating types may be classified as sex and as morphological or physiological self-incompatibility systems. Note that instead of using "mating type" as a generalized term (i. e. including sex), many authors use it when referring to compatibility types (this statement is based on a personal study using internet search mechanisms). The reason for the latter usage mainly seems to be that authors work on taxonomic groups that do not show a differentiation into sexes (e. g., yeasts).
In biological usage, sex is defined as the sum of morphological and behavioral features that distinguish organisms on the basis of their reproductive function (EB 2001, CED 1992). The concept of sex is limited to two different sexes ("male", "female"); however, the combination "hermaphrodite" (a single organism being both male and female) and the absence of sex may also be considered states. In contrast, the number of compatibility types differs strongly among organism groups, as do the names used for individual types (e. g., "+"/"–", "A"/"a"/"alpha", "b1"/"b2"/"b3"). All mating types are usually genetically determined (an exception is, e. g., the marine worm Bonellia with environmental sex-determination, EB 2001).
In many animals, either sex is the only mating type, or sex and self-incompatibility system are always correlated. The difference between the two concepts can, e. g., be seen in plants like Nicotiana that are sexual hermaphrodites in having both anthers and gynoecium in each individual, but have a physiological self-incompatibility system to prevent inbreeding. Similarly, fungi may produce differentiated male and female organs on the same thallus but remain self-incompatible (heterothallic) due to a separate physiological self-incompatibility system. Most fungi or algae have no morphologically identifiable sex system and are classified only according to their self-incompatibility system (which is often only called "mating type").
An example of a morphological self-incompatibility system (= heteromorphy) is the heterostyly in plants (e. g., in Primula species: distyly or in Lythrum salicaria and Eichhornia: tristyly). This mechanism is independent of the sex system, but closely linked with a physiological incompatibility system where present (Richards 1986).
The term generation is used relatively consistently in biology and involves a cycle of reproduction. Although different generations are often genetically different (especially after sexual reproduction), this is not a necessity. Reproduction may be vegetative (e. g., parts of a plant break off, are dispersed, and root again forming the next generation). In single-celled organisms generation and cell division are synonymous. The essential definition of "generation" thus denotes a reduction to a dispersal or persistence stage and the consequential regrowth of the full organism.
Life cycle or developmental stages always denote an aspect of temporal development. Life cycle may be defined as "the series of changes in the life of an organism, including reproduction" (EB 2001: dictionary). Two kinds of life cycles exist (EB 2001: "life cycle"):
Within a single generation developmental stages (or phases) may occur. These may either partition a continuous variation (e. g., embryo, baby, youth, and adult) or may relate to distinct structural changes (e. g., egg, larva, pupa, imago in holometabolic insects). The term "life cycle stage" is often used as a synonym of developmental stage (which conforms to the dictionary definition cited above). This causes no problem in organisms that complete their life cycle in a single generation, but appears unfortunate in organisms having a multigenerational life cycle but also developmental stages.
In the case of multigenerational life cycles, the term "life cycle stage" is dominant over the use of "generation". For example, in the red alga Polysiphonia the haploid generation (gametophyte) is differentiated into male and female individuals, whereas the following two diploid generations (carposporophyte, tetrasporophyte) are not sexually differentiated. All three generations are considered "life cycle stages". The practical use of "generation" as a classifier concept is thus restricted to organisms with a single-generational life cycle. An example is provided by the spring and summer generations of some butterflies that are markedly differently colored, e. g. "Araschnia levana gen. vern." versus "A. levana gen. aest." (seasonal dimorphism).
A special problem is the dikaryotization of many basidiomycetes. After the sexual partners have fused, the new nucleus divides and propagates itself through an existing cellular structure (the previously monokaryotic hyphae). It is unclear whether this should be considered a generation because of the genetic change, a life cycle stage because of the change in ploidy, or a developmental stage.
This dikaryotization is also involved in the life cycle of the rust fungi, which is a good practical "data challenge" for modeling the classifiers. The entire life cycle of many rust fungi (e. g., Puccinia graminis) includes five different spore types (pycnospores, aeciospores, uredospores, teliospores, basidiospores). Each spore type has to be described separately and thus needs a classifier to distinguish the descriptions. The spore types relate to two full generations (1. pycnospores + aeciospores, 2. uredospores + teliospores) plus one reduced generation (the basidiospore-producing phragmobasidium after germination of teliospore). The first generation is initially monokaryotic, but is later dikaryotized in a sexual process in which the pycnospores function as gametes. It then produces dikaryotic aeciospores that create the second generation on an alternative host plant. In this second generation, the uredospores create new infections that are second generation individuals indistinguishable from those created by aeciospores. Thus, a secondary epidemic life cycle exists in addition to the complete life cycle involving the other spore stages. Thus, in rust fungi the dominant classifier concept involves aspects of developmental stages, generations, and sex.
[@@@@It is an important point to find further cases to be able to decide on an appropriate generalization term for these "classifier concepts". The best strategy to find additional cases is to imagine what groupings within a taxon might be keyed out separately in keys. I can imagine that keys may also key out morphological variants that have no taxonomic rank. Can anybody provide an example?]
Other concepts that exhibit similar classification or grouping properties in descriptive data are:
● Social insects such as ants, bees, termites, and wasps have morphologically differentiated individuals belonging to different castes (queen, workers, soldiers, etc.). The castes are a polymorphism between generations which cannot be treated as life cycle stages, because most individuals are sterile and die without progeny. Instead, they may be viewed as polymorphic generations. The individual differences are caused by responses to nutrition during early development, i. e. to environmental factors. In contrast to the seasonal dimorphism, however, the frequencies of individuals in a population are largely under genetical control because the environment itself is controlled by the behavior of the population.
● Descriptions may be based on living or dead material. Many characteristics can only be observed when living (e. g., in Orbilia, see Baral 1992).
● Descriptions may be based on different preservation methods, such as drying or ethanol conservation.
The various classifier concepts discussed above all describe why multiple classes of descriptions may exist within the most specific class defined in the primary classification system. It seems advisable for a descriptive data information model to provide a generalized mechanism rather than individually treating specific classification systems like sex, life cycle stages, etc.
● The number of secondary classification systems is relatively large
● The model would become specific to biological descriptions
● The individual classification systems may be interrelated in complex ways as has been shown in the example of the castes of social insects or the spore stages of rust fungi.
No existing generalized term for the classifier concepts discussed above could be found. An internet search for a generalized name covering at least sex, generation, and life cycle stages was unsuccessful. The following definition is therefore proposed:
[@@@@Request to reviewers: Please inform me, if you know of a discussion of this problem!]
Secondary classifiers = a classification that may be required in addition to the primary class names (which may in biology be taxon names or non-taxonomic names like disease names). Secondary classifiers provide an opportunity to add further naming dimensions to the descriptions. They are, however, not necessarily nested within the primary class names. Multiple secondary classifiers (and for a single classifier concept, multiple values = states) can be added to each class name reference.
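To make the definition more concrete, the following sketch shows one way a class name reference could carry several secondary classifiers, each with one or more values. It is an illustration only and not the SDD schema: the type names `ClassReference` and `ExtendedClassReference`, the methods `add` and `without_classifiers`, and the concept labels "stage" and "sex" are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class ClassReference:
    """Plain reference to a primary class name (taxon name, disease name, ...)."""
    name: str


@dataclass
class ExtendedClassReference:
    """Class reference plus secondary classifier values (hypothetical structure)."""
    class_ref: ClassReference
    # classifier concept -> set of values (states) for that concept
    classifiers: Dict[str, FrozenSet[str]] = field(default_factory=dict)

    def add(self, concept: str, *values: str) -> None:
        """Add one or more values for a classifier concept."""
        existing = self.classifiers.get(concept, frozenset())
        self.classifiers[concept] = existing | frozenset(values)

    def without_classifiers(self) -> ClassReference:
        """Generalize to the primary class name without any string parsing."""
        return self.class_ref


ref = ExtendedClassReference(ClassReference("Danaus plexippus"))
ref.add("stage", "larva")
ref.add("sex", "female")
ref.add("stage", "pupa")  # a second value for the same classifier concept

print(ref.classifiers)
print(ref.without_classifiers())
```

Because the classifiers are kept apart from the name, generalizing to the primary class does not require parsing a string such as "Danaus plexippus (larvae)", and the classifier concepts stay independent of each other rather than being nested.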
[@@@@ I am not entirely happy and find "secondary classifiers" not truly intuitive. It is the best I could come up with!
Also note: the last point is debatable. It will occur primarily if descriptions are generalized. For example, the descriptions of the second, third, and fourth instar may be so similar that they are joined in a single generalized description. This would, however, be a non-persistent report. It is unclear whether such data would also have to be recorded. Please comment!]
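The proposed definition can be pictured as a small data structure. The following is only a minimal sketch under assumptions, not the SDD schema: the names =ClassifierValue=, =ClassReference= and =values_for= are hypothetical, and the concept names "sex" and "stage" are merely examples. It shows a class name reference carrying zero or more secondary classifier values, including several values for a single classifier concept.

```python
from typing import NamedTuple, Tuple


class ClassifierValue(NamedTuple):
    """One value (state) of one secondary classifier concept (hypothetical type)."""
    concept: str  # e. g. "sex" or "stage"
    state: str    # e. g. "female" or "larva"


class ClassReference(NamedTuple):
    """A primary class name plus optional secondary classifier values (hypothetical type)."""
    class_name: str
    classifiers: Tuple[ClassifierValue, ...] = ()


def values_for(ref, concept):
    """Return all states recorded for one classifier concept, in recording order."""
    return [value.state for value in ref.classifiers if value.concept == concept]


# A plain reference, as it would be used in the class hierarchy.
plain = ClassReference("Danaus plexippus")

# A reference with a single classifier value.
larva = ClassReference("Danaus plexippus", (ClassifierValue("stage", "larva"),))

# A generalized description joining two instars of one sex:
# multiple values for one concept plus a second concept.
generalized = ClassReference(
    "Danaus plexippus",
    (
        ClassifierValue("stage", "second instar"),
        ClassifierValue("stage", "third instar"),
        ClassifierValue("sex", "female"),
    ),
)
```

The primary class name stays identical in all three references, so the classifier values form additional dimensions rather than new class names.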
Annotated collection of other candidate terms [@@@@Please comment or add!]:
   * "Classifiers" alone is too general.
   * "Non-taxonomic classifiers" is inappropriate, the primary class names may already be non-taxonomic, as in disease names (also SDD aims to create a general model applicable without reference to biological terminology).
   * "Determinants/classification determinants"?
   * "Description classifiers" – perhaps more intuitive than "secondary"?
   * "Phenotypical classifiers" would be confusing, since phenotypic is usually considered an antonym to genotypic. Classifier concepts may be phenotypic (environmental sex determination), genotypic (genotypic sex determination), or ontogenetic (development stages).

A confusing aspect of classifiers is that – although the values do not contribute to the class descriptions – the existence of values or their frequency is part of the descriptive knowledge expressed in descriptive databases. The frequency of males and females is a property of classes/taxa (e. g., in social hymenoptera), and different classes/taxa may have different development or life cycle stages (e. g., reduced forms of the full heteroecious rust life cycle, or neoteny in animals). Such information may perhaps be considered separate characters, i. e. distinct from a "secondary classifier" mechanism.
In theory, the frequency of sexes could be calculated from descriptions that have a male/female sex classification. In practice this will not be possible, since the sampling of descriptions in a database (and of specimens in a collection) is highly non-random. Although presence/absence suffers less from sampling bias, complete and systematic bias (e. g., the database contains only adults) is not infrequent. Thus, classifier-related characters will normally have to be recorded independently from the data recorded in some kind of classifier mechanism.
Note that some classifier-related characters are often omitted from descriptions optimized for identification, because they are inconvenient to study (e. g., requiring observation over prolonged periods or population sampling). This is, however, not a property unique to classifier-related characters. In the SDD model, the convenience of a character for identification purposes is separately recorded ("rated", compare section @@@@). Furthermore, some classifier-related characters are quite convenient, e. g. "sex status" with the states "monoclinous (having male and female organs in the same flower)" and "diclinous (in different flowers)".
On the other hand, classifier-related characters have an influence on classifiers. If a "life cycle type" character of plants has the states "annual, biennial, perennial", a possible life cycle stage "plant in the second year" is inapplicable for "annual". Similarly, if "heterostyly" has the states "monostylous, bistylous, tristylous", and a related heterostyly classifier the values "short, medium, long style", the entire classifier would not be applicable for heterostyly = monostylous, and only the values "short" and "long style" would be applicable for heterostyly = bistylous.
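The heterostyly example can be written down as a simple applicability rule. This is a sketch under assumptions: the rule table and the function names are hypothetical, and the value labels "short style", "medium style" and "long style" are spelled out from the abbreviated list in the text.

```python
# Hypothetical rule table: for each state of the "heterostyly" character,
# the values of the heterostyly classifier that remain applicable.
APPLICABLE_STYLE_VALUES = {
    "monostylous": set(),  # the entire classifier is inapplicable
    "bistylous": {"short style", "long style"},
    "tristylous": {"short style", "medium style", "long style"},
}


def applicable_values(heterostyly_state):
    """Return the classifier values allowed for a given character state."""
    return APPLICABLE_STYLE_VALUES[heterostyly_state]


def is_applicable(heterostyly_state, classifier_value):
    """Check whether one classifier value may be used with the character state."""
    return classifier_value in applicable_values(heterostyly_state)
```

An application could use such a table to reject a description that records "medium style" for a class scored as bistylous.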
The special properties of sex, generation, life cycle stages are not discussed in the CSIRO DELTA or
Secondary classifiers like sex and life cycle stages may be considered part of the class name, i. e. nested within the taxonomic hierarchy. In applications based on the DELTA information model, the item names for larvae and adults of the monarch butterfly may be "Danaus plexippus (larvae)" and "Danaus plexippus (imago)". Some databases may even treat them explicitly as "pseudo-ranks" (see Bob Morris' comment in http://efgblade.cs.umb.edu/twiki/bin/view/SDD/TheProblemOfSex).
If added to the item name, DELTA applications will not be able to distinguish the added classifier information from an infraspecific taxon. An advantage of this method is that it allows using the "variant item" mechanism: In addition to a main item description, additional descriptions containing only those characters that differ from the main item may be added as variant items in DELTA. This can be used to simplify the recording of those parts of the descriptions that differ according to classifier values. However, since the variant item mechanism is limited to a single hierarchical level, it is not possible to treat sexes of infraspecific items using this mechanism (or the mechanism is not available for infraspecific taxa).
Figure 1. Treating sex as an infraspecific taxon works well on the side of descriptions, but requires adding two new "pseudo-taxa" to each taxon, both in the list of class (= taxon) names (which is referenced by descriptions) and in the class hierarchy.
In a system like DELTA that implements the name of a description as an unconstrained string, adding sex or stage information to the name is a feasible solution. However, if the class names of descriptions are formalized and handled through references to a formal list of class names (which in SDD provides only local proxy objects that in turn reference external nomenclatural databases), this approach soon becomes highly undesirable (Fig. 1). The following major problems can be identified:
● For each identifiable class additional dependent classes for each sex or life cycle stage must be introduced. Furthermore, it is possible to identify the sex and stage of a butterfly as female imago, but the taxon only to family level. If classifiers are handled as additional ranks of the taxonomic hierarchy, male/female and larva/imago "pseudo-taxa" would have to be added to higher taxa as well as to species or infraspecific taxa to allow such identifications.
● These additional "pseudo-classes" would also have to be added to the class hierarchy definition. This may be an automatic process, but formally the information that "has to be expressed. As humans we consider the fact that "Danaus plexippus (larvae)" can be generalized to "Danaus plexippus" automatic, but it involves a parsing of the string and semantic knowledge that allows us to distinguish between a classifier "(larvae)" and a taxonomic author name in the same position.
● The class name would become language specific. Taxonomic classes would require different names in German and English (this problem is not entirely specific to classifiers, it is generally present if diseases instead of taxa are described).[@@@@in fact I believe SDD 0.9 has a problem here, see Wiki topic LanguageSpecificClassNames (http://efgblade.cs.umb.edu/twiki/bin/view/SDD/LanguageSpecificClassNames)!]
● The taxonomic hierarchy is naturally nested. Classifiers act as separate dimensions independent of this hierarchy (Figs. 2 and 3). Although in general any single dimension that is independent of a hierarchy may also be viewed as nested within the hierarchy, in the presence of more than one classifier arbitrary nesting will have to be made (Fig. 4).
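The growth of the class list described in the points above can be illustrated by enumerating the "pseudo-class" names that nesting would require. This is a sketch under assumptions: the function name, the three example classes and the two classifier dimensions are chosen for illustration only, and only complete combinations of all classifiers are counted.

```python
from itertools import product


def pseudo_class_names(class_names, classifier_dimensions):
    """Enumerate the name strings needed if every classifier combination
    is nested below every class in the class list."""
    names = []
    for name in class_names:
        for combination in product(*classifier_dimensions.values()):
            names.append(f"{name} ({', '.join(combination)})")
    return names


# Higher taxa need pseudo-classes as well, because sex and stage may be
# identifiable where the taxon is known only to family or genus level.
classes = ["Nymphalidae", "Danaus", "Danaus plexippus"]
dimensions = {"sex": ["male", "female"], "stage": ["larva", "imago"]}

names = pseudo_class_names(classes, dimensions)
```

With three classes, two sexes and two stages, twelve additional names result. Partial combinations (sex known, stage unknown) would add more, and the order "sex, stage" inside the parentheses is exactly the arbitrary nesting decision mentioned in the last point.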
Furthermore, the classifier dimensions may or may not be dependent (Fig. 5):
Another problem is that for reporting, the classifiers may have a higher grouping priority than the entire class hierarchy (e. g., for caterpillar and butterfly stages separate descriptions and diagnostic keys are presented, compare Table 2, p. 1). Although it is possible that software may support this, it is an operation unnatural for hierarchical arrangements and is not required for the naturally nested taxonomic hierarchy.
One possible solution would be to handle classifiers in an unconstrained string introduced in addition to the formal class name reference. This would avoid many problems noted above, but would not allow any classifier specific processing like producing generalized descriptions for sex, but not for stage.
Figure 4. Sex and stage arbitrarily nested inside the taxonomic hierarchy. Males and females of different taxa or stages are assumed to have no relation or similarity.

Figure 5. The dimensions of sex and life cycle stages may be dependent and nested (top; e. g. red algae) or independent (bottom; e. g. butterflies) of each other.
Secondary classifiers may be considered normal characters (as "shape" or "color"). This approach is probably rarely found in DELTA data sets (but compare the section "Classifier-related characters", above). However, Prometheus II (
Using normal characters to express classifier information has the advantage that applications have no additional implementation tasks because existing mechanisms are used. However, it has serious problems in that:
● Secondary classifiers are important factors when aggregating specimen data, or generalizing multiple taxon descriptions to higher taxon descriptions. If the aggregation/generalization algorithm can test which observations belong to secondary classifiers like sex or stage, it could make rule-based decisions whether to ignore sex or stage differences, or whether to create separate descriptions for them.
● The solution does not work for guided keys (e. g., larvae and adults of the monarch butterfly are keyed out in separate places in a single key, or in separate keys).
The first problem could be solved by defining an additional flag for certain state sets, indicating which define secondary classifiers. However, no satisfactory solution seems to exist for the problem of dealing with guided keys.
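The flag suggested for the first problem might look as follows. This is a sketch under assumptions: the character list, the flag name =is_classifier= and both functions are hypothetical and not part of DELTA or SDD. It shows how an aggregation step could separate classifier observations from ordinary observations before deciding how to group them.

```python
# Hypothetical terminology: each character carries a flag telling whether
# its state set defines a secondary classifier.
CHARACTERS = {
    "sex": {"is_classifier": True},
    "stage": {"is_classifier": True},
    "body length": {"is_classifier": False},
    "color": {"is_classifier": False},
}


def split_record(record, characters=CHARACTERS):
    """Split one object description into its classifier and descriptive parts."""
    classifier_part = {}
    descriptive_part = {}
    for character, value in record.items():
        if characters[character]["is_classifier"]:
            classifier_part[character] = value
        else:
            descriptive_part[character] = value
    return classifier_part, descriptive_part


def group_by(records, classifier):
    """Group the descriptive parts of all records by one classifier's value.
    Records without a value for the classifier are collected under None."""
    groups = {}
    for record in records:
        classifier_part, descriptive_part = split_record(record)
        groups.setdefault(classifier_part.get(classifier), []).append(descriptive_part)
    return groups
```

A rule-based aggregation could then call =group_by(records, "stage")= to create separate descriptions per stage, or ignore the classifier part altogether when sex or stage differences are to be disregarded. The flag does nothing for guided keys, where no character data are available.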
This solution is usually used in cases where the descriptions of different stages are drastically different, perhaps the stages are even structurally different (e. g. caterpillar and butterfly). An entirely separate set of characters is prepared for each stage (Fig. 6). Because of the fundamental differences, only a limited number of characters (overall size, DNA) are truly duplicated. For these characters no generalization analysis is possible without adding additional information to the terminology.
An extreme case, where almost all characters are duplicated, is the description of the life cycle and spore stages of rust fungi (Fig. 7). The abstraction of the spore stages is highly desirable here, both for analytical and for identification purposes. During identification, several rust spore stages are difficult or impossible to distinguish based on their morphology alone.
Figure 6. Character × description matrix where development stages are expressed through separate sets of characters.
Figure 7. Character × description matrix where spore stages of rust fungi are expressed through separate sets of characters. Each set is assumed to contain similar characters (length, width, shape, septation, wall thickness, surface ornamentation, etc.) that are specialized only through the spore stage they describe. One generalization dimension abstracts from objects to a class description. However, another desirable generalization dimension shown below the main matrix would generalize to a "generalized spore". The arrows show the generalization only for the first character in each set. The class description in the lower matrix combines both generalizations.
The introduction of a separate "secondary classifier" mechanism, which is proposed for SDD, is very similar to using normal characters to express secondary classifiers. The classifier characters are analogous to normal characters, but used in a separate context. This allows them to be treated differently when generalizing descriptive information (objects to class, classes to higher classes). Furthermore, the independent mechanism allows them to be added to the diagnostic keys as well.
The introduction of explicit secondary classifiers does not prevent the existence of classifier-specific character groups. However, they will only be necessary where structures or properties apply only to a certain sex or stage. In contrast to the character set model described in the previous section (Figs. 6 and 7), the existence of classifiers does not force the duplication of characters (Fig. 8).
Figure 8. Character × description matrix where development stages are designated using a secondary classifier mechanism. Some characters are applicable only to certain stages, but other characters are common to different stages. The generalization algorithm providing the class descriptions from object descriptions has detected that the common characters for different stages are strongly different. Thus, separate, stage-specific class descriptions have been prepared.
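The decision described in the caption of Fig. 8 could be sketched as follows. This is only an illustration under assumptions: the draft does not specify a generalization algorithm, so the Jaccard overlap of state sets, the threshold of 0.5 and the function names are all hypothetical choices.

```python
def jaccard(first, second):
    """Overlap of two state sets, 1.0 for identical sets, 0.0 for disjoint ones."""
    union = first | second
    return len(first & second) / len(union) if union else 1.0


def generalize(objects, threshold=0.5):
    """objects: list of (stage, {character: state}) pairs.
    Returns {stage: {character: set of states}} if any character shared by
    two stages differs strongly, otherwise one merged description under None."""
    by_stage = {}
    for stage, scores in objects:
        description = by_stage.setdefault(stage, {})
        for character, state in scores.items():
            description.setdefault(character, set()).add(state)

    stages = list(by_stage)
    for index, first in enumerate(stages):
        for second in stages[index + 1:]:
            # Only characters common to both stages can be compared.
            shared = by_stage[first].keys() & by_stage[second].keys()
            for character in shared:
                overlap = jaccard(by_stage[first][character], by_stage[second][character])
                if overlap < threshold:
                    return by_stage  # keep stage-specific class descriptions

    merged = {}
    for description in by_stage.values():
        for character, states in description.items():
            merged.setdefault(character, set()).update(states)
    return {None: merged}
```

Characters applicable to only one stage never force a split in this sketch. Only the common characters are compared, which is what an explicit classifier mechanism makes possible without duplicating characters.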
[@@must be further expanded!@@]
Problem: Whether object identification should be with or without classifiers needs discussion! In biology a collected object may have multiple stages (e. g. on a single herbarium sheet). These may or may not be described together. Is it meaningful to have classifiers at all at the object identification? They are currently added there provisionally, but I wonder whether they should not be removed!
What is the implicit assumption for a definition of Object? It seems reasonable to define it not as a specimen, but as an individual genetic unit on a preservation unit like a herbarium sheet. If that is so, do we still need multiple values for a single secondary classifier concept?
Problem: Classifier information in keys may be specific to a class reference, which would be well handled by a classifier mechanism that is added to class references. However, equivalent information may apply to the entire key (which may only deal with larval stages of insects). Although this will be clear to humans from reading the key label, it would be highly desirable to also provide a machine-readable definition. Adding classifier information also to entire keys further complicates the model and evaluation of data. Is this avoidable in some other way of handling classifier information?
Class references are thought to need an additional secondary classifier mechanism in descriptions and diagnostic keys and they may be desirable in the identification of objects (= specimens; Fig. 9). Class references remain without classifiers in the definition of class (= taxonomic) hierarchy and class synonyms (Fig. 10).
Figure 9. Visualization of objects with class references that require an additional secondary classifier mechanism (sex, life cycle, or developmental stage).
Figure 10. Only the class references in the class hierarchy (in biology = taxonomic hierarchy) and for the definition of synonyms do not require the secondary classifier mechanism (compare Fig. 9).
[@@@@ The basic options are probably:
a) free-form field at the taxon result object in a key (which would have to be manually translated into each language)
b) specialized generalized "micro-description" facility for classifiers alone, at each keyed out or described taxon object
c) use normal characters and add some flagging to allow detection of classifiers; plus provide a "micro-description" facility in the keys (where no character data are normally available)
d) and provide a generalized ontology at the general concept/character state facility to recognize sex and stages (i. e. similar to proposal 2, above)?]
Secondary classifiers like sex and stage are handled by a specialized mechanism that is designed as an extension to standard class name references. The standard class reference type is used to model taxonomic hierarchy and synonymizations (which inside the description model only mirror data from external nomenclature or taxonomy providers). The extended class reference type provides an additional sequence of secondary classifier values. These are references to concept states defined at concept nodes. They do not refer to character states! The extended class reference type is used to (see Fig. 9):
TODO: Add new concept tree type "secondary classification concepts"
Question: which modifiers would be necessary at secondary classifiers?
---
(End of document pasted directly. Also a zipped [[%ATTACHURL%/D20_TempSecClassDraft3.zip][RTF version that includes all figures]] is provided.) -- Gregor Hagedorn - 26. April 2004

Polymorphisms, due to sexual differences, life cycle, developmental stages and other factors, frequently appear to be a problem, because we wish to assign the organisms to the same taxonomic category while the descriptions of the members of the category may vary widely. This conflict can be attributed to the interrelationship between the function or purpose of the taxonomic category and the description. The taxonomic category can lead to a description, or a description may lead to the conclusion that an item belongs to a particular taxonomic category. This relationship, however, does not mean that the taxonomic category and the descriptions are synonymous or in most other aspects equivalent. The description is simply a list of observable attributes. Groups of these attributes may appear only within individual “phases” of an organism while always being associated with the taxonomic category. All characteristics from any phase are “true” of the taxonomic category, but we wish to organize them according to these correlated characters as well as make explicit the existence of the phases. While this may not help at any point in time for an identification, it will help over time.

Advantages of “phase” representation type:

Each of the three representations “Stage grouping preceding taxonomic name”, “Stage as subheading within a description”, and “Sex/stage as annotation within a description” has communicative advantages and disadvantages. I believe that, in the end, for ease of writing, authors of descriptions will continue to view the organization of these attributes in all three ways and more.

“Stage grouping preceding taxonomic name” has the advantage of allowing one to focus on the attributes of that one stage. After all, a person encountering one individual of a species will only encounter it in one of its “phases”. This allows a reader to pick a “phase” or “stage” first, to pick which key to use. For example, go to the “larval descriptions” part of the key if you have a larva and not an adult.

“Stage as subheading within a description” presumes the inclusion of attributes from other “phases” or “stages”, providing a more complete picture of a population over time. This representation also allows a reader to more easily compare “stages.”

“Sex/stage as annotation within a description” is best for cases where most attributes are shared among the “stages” and only a few are dimorphic.

Transformational equivalence:

The choice we make for the data structures is in part determined by an evaluation of the transformational equivalence of the representations. All other things being equal, we should choose the representation that can be mechanically transformed into all three frameworks above. It is the job of the application to make the transformations for the user. The current SDD framework and many other TDWG standards are taxonomic category-centric. They are organized around these concepts. This makes it difficult to represent the first type, “Stage grouping preceding taxonomic name”, in SDD, as Gregor points out with the introduction of “pseudo-taxa” to address the issue of two sexes under the same taxon.
● The class name would become language specific. Taxonomic classes would require different names in German and English (this problem is not entirely specific to classifiers, it is generally present if diseases instead of taxa are described).[@@@@in fact I believe SDD 0.9 has a problem here, see Wiki topic AudienceSpecificClassNames (http://efgblade.cs.umb.edu/twiki/bin/view/SDD/AudienceSpecificClassNames)!]
@ 1.8 log @none @ text @d1 1 a1 1 %META:TOPICINFO{author="GregorHagedorn" date="1085483208" format="1.0" version="1.8"}% d5 1 a5 1 I am not sure whether this should be on the agenda for Berlin, but in the longer run I believe we need a solution. It has not been discussed at any meeting so far, perhaps because the inclusion of diagnostic keys only occurred in Lisbon and this really precipitates this issue. Main.GregorHagedorn @ 1.7 log @none @ text @d1 2 a2 2 %META:TOPICINFO{author="GregorHagedorn" date="1085396100" format="1.0" version="1.7"}% %META:TOPICPARENT{name="TheProblemOfSex"}% @ 1.6 log @none @ text @d1 1 a1 1 %META:TOPICINFO{author="GregorHagedorn" date="1084282080" format="1.0" version="1.6"}% d293 5 a297 9● "Classifiers" alone is too general.
● "Non-taxonomic classifiers" is inappropriate, the primary class names may already be non-taxonomic, as in disease names (also SDD aims to create a general model applicable without reference to biological terminology).
● "Determinants/classification determinants"?
● "Description classifiers" – perhaps more intuitive than "secondary"?
● "Phenotypical classifiers" would be confusing, since phenotypic is usually considered and antonym to genotypic. Classifier concepts may be phenotypic (environmental sex determination), genotypic (genotypic sex determination), or ontogenetic (development stages).
d496 4 a499 4 Transformational equivalence: The choice we make in for the data structures is in part determined by an evaluation of the transformational equivalence of the representations. All other things being equal, we should choose the representation that can be mechanically transformed into all three frameworks above. It is the job of the application to make the transformations for the user. The current SDD framework and many other TDWG standards are taxonomic category –centric. They are organized around these concepts. This makes it difficult to represent the first type “Stage grouping proceeding taxonomic name”, in SDD as Gregor points out with the introduction of “pseudo-taxa” to address the issue of two sexes under the same taxa. Species 1 Species 1 (female) Species 2 (male) d507 1 a507 1 %META:FILEATTACHMENT{name="D20_TempSecClassDraft3.zip" attr="" comment="zipped rtf file, including the missing figures " date="1083005151" path="C:\Data\Desktop\DESCR\D20_TempSecClassDraft3.zip" size="210309" user="GregorHagedorn" version="1.1"}% @ 1.5 log @none @ text @d1 1 a1 1 %META:TOPICINFO{author="GregorHagedorn" date="1083575073" format="1.0" version="1.5"}% d512 11 a522 11 %META:FILEATTACHMENT{name="d20_tempsecclassdraft3_image001.gif" attr="" comment="" date="1083005220" path="C:\WINGH\Temp\d20_tempsecclassdraft3_image001.gif" size="4449" user="GregorHagedorn" version="1.1"}% %META:FILEATTACHMENT{name="d20_tempsecclassdraft3_image002.gif" attr="" comment="" date="1083005286" path="C:\WINGH\Temp\d20_tempsecclassdraft3_image002.gif" size="1865" user="GregorHagedorn" version="1.1"}% %META:FILEATTACHMENT{name="d20_tempsecclassdraft3_image003.gif" attr="" comment="" date="1083005322" path="C:\WINGH\Temp\d20_tempsecclassdraft3_image003.gif" size="2176" user="GregorHagedorn" version="1.1"}% %META:FILEATTACHMENT{name="d20_tempsecclassdraft3_image004.gif" attr="" comment="" date="1083005347" path="C:\WINGH\Temp\d20_tempsecclassdraft3_image004.gif" size="2960" 
user="GregorHagedorn" version="1.1"}% %META:FILEATTACHMENT{name="d20_tempsecclassdraft3_image005.gif" attr="" comment="" date="1083005363" path="C:\WINGH\Temp\d20_tempsecclassdraft3_image005.gif" size="1367" user="GregorHagedorn" version="1.1"}% %META:FILEATTACHMENT{name="d20_tempsecclassdraft3_image006.gif" attr="" comment="" date="1083005379" path="C:\WINGH\Temp\d20_tempsecclassdraft3_image006.gif" size="1382" user="GregorHagedorn" version="1.1"}% %META:FILEATTACHMENT{name="d20_tempsecclassdraft3_image007.gif" attr="" comment="" date="1083005397" path="C:\WINGH\Temp\d20_tempsecclassdraft3_image007.gif" size="4011" user="GregorHagedorn" version="1.1"}% %META:FILEATTACHMENT{name="d20_tempsecclassdraft3_image008.gif" attr="" comment="" date="1083005410" path="C:\WINGH\Temp\d20_tempsecclassdraft3_image008.gif" size="7930" user="GregorHagedorn" version="1.1"}% %META:FILEATTACHMENT{name="d20_tempsecclassdraft3_image009.gif" attr="" comment="" date="1083005430" path="C:\WINGH\Temp\d20_tempsecclassdraft3_image009.gif" size="5050" user="GregorHagedorn" version="1.1"}% %META:FILEATTACHMENT{name="d20_tempsecclassdraft3_image011.gif" attr="" comment="" date="1083005478" path="C:\WINGH\Temp\d20_tempsecclassdraft3_image011.gif" size="3015" user="GregorHagedorn" version="1.1"}% %META:FILEATTACHMENT{name="d20_tempsecclassdraft3_image010.gif" attr="" comment="" date="1083005511" path="C:\WINGH\Temp\d20_tempsecclassdraft3_image010.gif" size="7724" user="GregorHagedorn" version="1.1"}% @ 1.4 log @none @ text @d1 1 a1 1 %META:TOPICINFO{author="GregorHagedorn" date="1083455100" format="1.0" version="1.4"}% d161 1 a161 1A characteristic that is variable within a class is not necessarily uninformative for diagnostic purposes. If a plant has both red and white flowers [@@@@as in medit. Ranunculus xxx, search species name!] and other plant species have yellow, blue, red, or white colors, specifying the flower color of an object reduces the set of remaining classes in identification. 
A description "flowers red or white" is a meaningful part of a diagnostic class description.
@ 1.3 log @none @ text @d1 1 a1 1 %META:TOPICINFO{author="BobMorris" date="1083327781" format="1.0" version="1.3"}% d7 1 a7 1 %GREEN%I believe SecondaryClassifiersProposal addresses many of the issues. Please comment there. -- Main.BobMorris - 30 Apr 2004 %ENDCOLOR% d396 1 a396 1Secondary classifiers may be considered normal characters (as "shape" or "color"). This approach is probably rarely found in DELTA data sets (but compare the section "Classifier-related characters", above). However, Prometheus II (McDonald & al., submitted)[@@check!] explicitly considers sex and life cycle as "qualitative description elements", i. e. as properties of structures = as UM or OM character in the sense used by DELTA. [@@currently the actual source is from the