head 1.4; access; symbols; locks; strict; comment @# @; 1.4 date 2009.11.25.03.14.33; author GarryJolleyRogers; state Exp; branches; next 1.3; 1.3 date 2009.11.20.02.45.24; author LeeBelbin; state Exp; branches; next 1.2; 1.2 date 2007.03.06.17.30.00; author TWikiGuest; state Exp; branches; next 1.1; 1.1 date 2006.05.10.15.05.40; author GregorHagedorn; state Exp; branches; next ; desc @none @ 1.4 log @none @ text @%META:TOPICINFO{author="GarryJolleyRogers" date="1259118873" format="1.1" version="1.4"}%
%META:TOPICPARENT{name="DELTAandBDI.SDD"}%
---+!! %TOPIC%

---++Synopsis

This document attempts to describe in words how DELTA data sets may be converted for use by BDI.SDD_ compatible applications. First, the conversion of the DELTA core directives is discussed in full; next, a list of advanced DELTA directives is presented with annotations as to whether or how they can be evaluated in BDI.SDD_.

*This document is currently incomplete and frequently refers to an outdated version; it needs revision!*

See also "Comparison of sections and subsections with DELTA features" for a useful overview and the general [[DELTAandBDI.SDD][overview of DELTA related material]].

---++Files to import

Some DELTA compatible applications store all terminological and descriptive data in a single file (Pankey, optionally DeltaAccess). However, CSIRO DELTA (the most widely used DELTA application) uses multiple files. The core files are SPECS, CHARS, and ITEMS. In addition, valuable information is often found in the CNOTES, CIMAGES, KIMAGES, TIMAGES, TOINT, TOKEY, and TONAT files. Some files are created for specific purposes (TOINT = translation into the CSIRO IntKey binary format, TOKEY = creation of printable keys, and TONAT = creation of natural language descriptions) and use the remaining files through an include file mechanism. As a result, the information in CSIRO DELTA files may overlap.
For example, TOINT and TOKEY may each define different character reliabilities or include/exclude character directives.

In general the TOINT file (if present) is a good starting point to resolve all DELTA include commands ("*Input File") and import the data. A better strategy, however, is to also look for further include files and directives in other files (especially TOKEY and TONAT) and add any new directives to the BDI.SDD_ conversion process. Thus, the directives in TOINT (and files referenced therein) have priority over possibly differing information in TOKEY and TONAT.

Note: since CSIRO DELTA usually uses fixed file names, the BDI.SDD_ project name can either be obtained from the (rarely used) *Heading or *Registration Heading directives (see Advanced conversion concepts further below), or the name of the folder containing these files can be used.

---++Simple export

This section deals primarily with the task of exporting the most basic DELTA directives, *Character List (= *Character Descriptions), *Character Types and *Item Descriptions, into a BDI.SDD_ document. These directives are relatively complex and map to a number of BDI.SDD_ data elements. Less detailed information about additional DELTA directives can be found in the next chapter "Advanced conversion concepts".

---+++Terminology: Character definition

The *Character List (= *Character Descriptions) directive contains both the primary definition of the character and the states. A character definition could look like this (color markup added for the purpose of the discussion):

#1. Stem <presence>/
  1. absent/
  2. present/

The definition is delimited by two symbols (formatted in red above): # indicates a new character, and the slash ('/') marks the end of a text element. The slash is only a DELTA delimiter if it is followed by a blank or an end-of-line character. For example, in "#99. Length/width ratio/ " only the last slash is considered a delimiter. In general, new line characters are not significant in DELTA and blanks can be used instead. The following code is identical to the example above (see footnote 1):

#1. Stem <presence>/ 1. absent/ 2. present/

The character definition (blue) contains the character number and the character label (which includes a definition of the natural language wording). Character numbers start with '#' to distinguish them from state numbers. Similarly, the state definition (green) contains a state number and a state label plus wording.

*Character or state numbers:* In DELTA the number in front of characters and states serves both as a unique identifier to reference a character or state from within item descriptions, and as a definition of the sequence in which characters or states are displayed or reported. As a consequence, new characters or states can only be added at the end; inserting them elsewhere requires a revision of all item descriptions using this terminology. Also, it is not possible to have different character arrangements for different reporting purposes. In BDI.SDD_, the key values present at the character and state definitions only define a unique identifier. The state sequence is defined through the sequence of states in the XML data stream, whereas the character sequence is independently defined through inclusion of characters into trees (which are called Concept trees in BDI.SDD_). The sequence of characters in BDI.SDD_ Terminology/Characters should not be considered informative! To export characters and states to BDI.SDD_, the DELTA numbers can be used directly as BDI.SDD_ key values.
However, to preserve the sequence of characters, a Terminology/ConceptTrees/ConceptTree must also be generated in addition to Terminology/Characters; see "Terminology: Character sequence and hierarchy" below.

*Character or state labels and wordings:* DELTA interlaces the label and a wording for natural language generation in a single text string. To obtain the BDI.SDD_ Label/Text, one can either replace the DELTA "comment" markers (<>) with parentheses ("(" and ")") or omit them altogether. Some DELTA character definitions are designed such that the best label is obtained by omitting the comment markers ("Stem <presence>" would become "Stem presence"). A DELTA conversion utility could display a preview of characters and ask whether to convert comments in labels with parentheses or without. Otherwise, the generally acceptable conversion would be to place the comments in parentheses ("Stem <presence>" becomes "Stem (presence)"). This applies to both characters and states: in the example above, "absent" could also be "absent <or contracted and invisible>". The BDI.SDD_ Label/Wording/TextBefore (for characters) or Label/Wording/Text (for states) is obtained by stripping any comment from the DELTA character or state name.
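
The label and wording rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `delta_label_to_sdd` and the `keep_parentheses` switch are hypothetical, and the regular expression handles only non-nested comments (inner comments are governed by *Omit Inner Comments and are not treated here).

```python
import re

# Matches one non-nested DELTA comment such as "<presence>".
COMMENT = re.compile(r"<([^<>]*)>")


def _collapse_blanks(text):
    """Collapse runs of blanks left behind by removed comments."""
    return " ".join(text.split())


def delta_label_to_sdd(delta_text, keep_parentheses=True):
    """Return (label, wording) for a DELTA character or state name.

    label:   comment markers replaced by parentheses, or simply omitted.
    wording: the name with all comments stripped.
    """
    if keep_parentheses:
        label = COMMENT.sub(r"(\1)", delta_text)
    else:
        label = COMMENT.sub(r"\1", delta_text)
    wording = COMMENT.sub("", delta_text)
    return _collapse_blanks(label), _collapse_blanks(wording)


print(delta_label_to_sdd("Stem <presence>"))
print(delta_label_to_sdd("Stem <presence>", keep_parentheses=False))
```

A conversion utility could call the function twice per character, once with each setting, to build the preview mentioned above.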

The DELTA strategy of interlacing different text representations has the advantage that whenever a change is made, both representations are changed together. It provides for compact, albeit somewhat difficult to read definitions. In English it works quite well to produce natural language descriptions, and in many languages acceptable compromises can be found. However, the mixed representation does put severe constraints on which natural language texts can be produced (or conversely, how readable the character or state labels in interactive use are). Finding good compromises between these purposes requires a significant amount of experience from the biologist starting with structured descriptions. The BDI.SDD_ standard has therefore decided to provide separate data elements for the label and the natural language wording.

*Inner comments:* The character or state label may contain comments within comments. How these are to be interpreted is in part defined by the *Omit Inner Comments directive (see below).

*Units:* The character definition of a numerical character in DELTA could read: "#2. Stem <height>/ cm/". The part after the character label (shown in brown) is intended for a unit of measurement. If the plant height is measured only as the maximum (see the *Omit Lower For Characters directive), DELTA may produce "Stem up to 150 cm". However, some DELTA programs also use the unit element for categorical characters. In this case it defines a text that is output after the list of states. To convert DELTA data to BDI.SDD_, the suggested rule is: for numerical characters write the unit text to Numerical/MeasurementUnit; for categorical or even text characters write it to Label/Representation/Wording/TextAfter (both within Characters/Character).

---+++Terminology: Character sequence and hierarchy

DELTA provides a limited character hierarchy in the form of the *Character Headings and *Item Subheadings directives. Since a BDI.SDD_ concept tree has to be created anyway to store the character sequence, one may as well also evaluate these frequently used DELTA directives.

Always create a concept tree of the type "UserDefinedHierarchy" and the role "GeneralDefault". If *Character Headings is present, the tree should be based on the headings as nodes, and the sequence of characters in the DELTA *Character List as leaves. If *Character Headings is absent, the characters are listed directly in the root node.

If both the *Character Headings and *Item Subheadings directives are present, a second tree should be created, again using the headings from *Item Subheadings and the sequence from the *Character List. Both trees are of the user-defined type, but now the roles GeneralDefault and TerminologyEditorView should be added to the first, and only the role NaturalLanguageReporting to the second.
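
The tree-building rules of this section can be sketched as follows. This is a rough illustration, not a definitive implementation: the dictionary layout, the function names `build_concept_tree` and `build_trees`, and the representation of headings as a mapping from the first character number to the heading text are all assumptions made for this example. It also assumes that a heading applies to all following characters until the next heading starts.

```python
def build_concept_tree(char_numbers, headings=None, roles=("GeneralDefault",)):
    """Return a nested dict describing one user-defined concept tree.

    char_numbers: character numbers in *Character List order.
    headings:     optional {first_char_number: heading_text} (assumed layout).
    """
    tree = {"type": "UserDefinedHierarchy", "roles": list(roles), "nodes": []}
    if not headings:
        # No headings: characters are listed directly in the root node.
        tree["nodes"] = [{"character": n} for n in char_numbers]
        return tree
    current = None
    for n in char_numbers:
        if n in headings:
            # A new heading node starts at this character.
            current = {"heading": headings[n], "characters": []}
            tree["nodes"].append(current)
        if current is None:
            # Characters before the first heading stay in the root.
            tree["nodes"].append({"character": n})
        else:
            current["characters"].append(n)
    return tree


def build_trees(char_numbers, char_headings=None, item_subheadings=None):
    """Create one tree, or two if both heading directives are present."""
    if char_headings and item_subheadings:
        return [
            build_concept_tree(char_numbers, char_headings,
                               ("GeneralDefault", "TerminologyEditorView")),
            build_concept_tree(char_numbers, item_subheadings,
                               ("NaturalLanguageReporting",)),
        ]
    # Otherwise a single GeneralDefault tree; item subheadings alone are
    # ignored here, which is a choice of this sketch.
    return [build_concept_tree(char_numbers, char_headings)]


for tree in build_trees([1, 2, 3], {1: "Stem", 3: "Leaf"}, {1: "Vegetative"}):
    print(tree)
```

Because the character sequence is always taken from the *Character List, both trees keep the DELTA order even though BDI.SDD_ itself attaches no meaning to the order in Terminology/Characters.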
---+++INCOMPLETE, MORE TO COME!

Add: discussion of text character @@@@@@@@

Add: unit (i. e. a second text following the first character label) in the case of categorical character types (OM or UM): Add to label plus use as TextAfter

---++Advanced conversion concepts

Comments inside the item descriptions can be analyzed and compared with lists of known frequency (usually, rarely, etc.), probability (probably, perhaps) or other modifiers (strongly, at the tip, etc.). In the future such lists should be downloadable as multilingual templates that help in setting up new projects. If a modifier is recognized, the DELTA comment would be translated into a modifier instead of being converted into a Note. This process is employed, for example, by DeltaAccess to support modifiers despite the limitations of the DELTA format.

Based on the list of directives shown in "Complexity of BDI.SDD_ versus DELTA", the following table annotates advanced DELTA directives as to how they map to concepts in BDI.SDD_. *Currently this is just a first start that needs to be worked out further! Also note that many special directives are not yet supported in BDI.SDD_. The reason for this is that the general structures of BDI.SDD_ should become stable first. We believe that the special problems dealt with in these directives can easily be integrated into the BDI.SDD_ structure later.*
| *Absolute Error | Not yet provided for in BDI.SDD_, needs discussion! |
| *Add Characters | An item-specific *Include Characters directive. Not supported in BDI.SDD_. |
| *Alternate Comma | (Should be covered by Wordings defined inside the ConceptTrees. Current BDI.SDD_ proposal needs testing!) |
| *Applicable Characters | ConceptTrees/ConceptTrees/Concept/DependencyRules |
| *Character Descriptions | (DELTA core directive, synonym of *Character List, discussed above under Simple export) |
| *Character For Output Files | Not supported in BDI.SDD_. Applications may define rules for creating document names based on taxon names or abbreviations, or they may provide external data structures for such functionality. It should not, however, be stored as a descriptive character. |
| *Characters For Synonymy | Synonymy is provided for in the taxon tree: Entities/ClassHierarchies/ClassHierarchy//Nodes/Node/Synonyms |
| *Character For Taxon Images | Delete this character from the character list and use the data inside this character for MediaResources inside CodedDescriptions. Compare *Character Images. |
| *Character For Taxon Names | BDI.SDD_ discusses whether an Abbreviation element should be added to Entities/Classes/Class. Background: In DELTA the normal taxon name is directly provided in the *Item Descriptions directive. However, for certain reports or exports abbreviated taxon names are required, which can be provided as text in the character identified in this directive. |
| *Character Headings | Export to a user-defined BDI.SDD_ concept tree (Terminology/ConceptTrees, Type=UserDefinedHierarchy). Add the Role "TerminologyEditorView" to the tree. Background: This defines a single level of headings, used when the character list is reported. |
| *Character Images | BDI.SDD_ distinguishes between images intended for selecting a character or state, and images supporting the definition of the character in general. In most cases these images will be useful as Label/Representation/Selector media resources. However, one should remember that DELTA provides no images for states. Instead, the character images are overlaid using a proprietary mechanism similar to the HTML usemap/map/area hotspot mechanism. No similar hotspot mechanism is defined in BDI.SDD_ (it would conflict with the desired multi-resolution support for media resources). Therefore, manual editing of the images to extract state-specific selectors will usually be necessary to port DELTA identification keys containing images. |
| *Character Keyword Images | Not supported in BDI.SDD_. Recommended action: import the referenced images as media resources into the Resources section, but do not further link these images to other objects. Importing them will simplify later manual restructuring of a project. Background: Intkey-specific directive that "allows selection of character keywords from image screens (instead of from text screens)". The DELTA User Guide does not define what a character keyword is, but this probably refers to the use of the *Define Characters directive. |
| *Character List | (DELTA core directive, discussed above under Simple export) |
| *Character Notes | Create a BDI.SDD_ Terminology/GlossaryEntry and link this to the character. |
| *Character Reliabilities | Planned for (discussed in Brazil 2002), but not yet introduced into the schema. |
| *Character Types | (DELTA core directive, discussed above under Simple export) |
| *Character Weights | (Synonym of *Character Reliabilities, see above) |
| *Chinese Format | If special algorithmic support is required in applications, they can detect the fact that the current output is for Chinese from the language (lang) attribute in the audience definition. However, we do need a Chinese speaker to test whether the BDI.SDD_ wording proposal works for Chinese or not. |
| *Comment | The comment directive itself is in practice often used to express document-wide information like authorship, copyright, version, revision status or dates, for which DELTA provides no specific directives. Although automatic extraction of this information is difficult, it may be useful to display the content of all comment directives when a user is asked to supply required BDI.SDD_ project definition information. This directive should not be confused with the comments in <> signs that can be added at several places inside DELTA directives. |
| *Decimal Places | Currently under discussion in BDI.SDD_; proposals have been made but are not yet fixed. Decimal places in DELTA are character-specific, whereas in BDI.SDD_ they have to be specific to the statistical measure (mean, standard deviation, and sample size values usually use different numbers of decimal places). |
| *Define Characters | Export to a user-defined BDI.SDD_ concept tree (Terminology/ConceptTrees, Type=UserDefinedHierarchy). Add the Role "InteractiveIdentification" to the tree. Background: This is an Intkey-specific heading definition, combining groups of characters into a named character group. Whereas in the other two heading directives in DELTA (*Character Headings, *Item Subheadings) each character can only be a member of a single heading group, here a character can be a member of several of these definitions. |
| *Define Names | Export to a non-taxonomic (flag!) class hierarchy. Multiple class hierarchies and the non-taxonomic flag were newly introduced after version 0.9 and need discussion! |
| *Define Taxa | See *Define Names above. |
| *Dependent Characters | ConceptTrees/ConceptTrees/Concept/DependencyRules (a DELTA synonym for *Inapplicable Characters) |
| *Emphasize Characters | Wording (text before/after and delimiter) information in BDI.SDD_ may contain formatting marks. Problem: currently it is assumed that each Wording is itself balanced. To achieve emphasis of character content, one would have to define <em> in text before and </em> in text after. This is currently not possible in BDI.SDD_, needs discussion! |
| *Emphasize Features | (See *Emphasize Characters above) |
| *Exclude Characters | Subset definitions for characters are defined |
| *Exclude Items | BDI.SDD_ provides for a freely definable hierarchy of taxon definitions, see ClassHierarchy. Export or reporting of BDI.SDD_ data may be limited to higher taxa in this hierarchy. If desired, a class hierarchy may be (ab)used to define unrelated, non-taxonomic groupings of taxa. A separate |
| *Heading | The project title (first line of title, compare *Registration Subheading) |
| *Image Directory | Many projects will store all images relative to a specific folder (directory) on a web server or in a file system. However, to support those projects that need to access images or other resources from multiple locations, BDI.SDD_ also needs other mechanisms. It would be possible to provide a common root URL in the project definition, which could be left empty if multiple roots are used. The disadvantage would be that consumers of BDI.SDD_ data would have to follow this logic, combining each URL to a full URL. Currently this is not implemented in the BDI.SDD_ model. Instead, each media resource URL is expected to be complete. Nevertheless, applications can easily analyze the start of the URLs used and automatically extract a root URL from these, truncating the individual URLs to relative values. This works without loss of data between applications. |
| *Implicit Values | #we need discussion here# |
| *Inapplicable Characters | ConceptTrees/ConceptTrees/Concept/DependencyRules (a DELTA synonym for *Dependent Characters) |
| *Include Characters | (Compare *Exclude Characters above) |
| *Include Items | (Compare *Exclude Items above) |
| *Item Abundances | Discussed in Brazil 2002. Conclusion: not to be supported in BDI.SDD_. |
| *Item Descriptions | (DELTA core directive, discussed above under Simple export) |
| *Item Headings | #usage of this directive not yet understood, needs study# |
| *Item Subheadings | Export to a user-defined BDI.SDD_ concept tree (Terminology/ConceptTrees, Type=UserDefinedHierarchy). Add the Role "NaturalLanguageReporting" to the tree. Background: This defines a single level of headings, used when natural language descriptions are generated. |
| *Item Weights | (Synonym of *Item Abundances with a different scaling factor) |
| *Key Character List | DELTA provides a separate list of character states that the normal character states can be mapped to (using the DELTA *Key States directive). BDI.SDD_ currently provides mapping, but only within the general set, i. e. the states that numeric or categorical data are mapped to can also be scored directly. The intent is to allow recording data that only exist in mapped form (e. g. numerical data only available as a categorized class histogram, which is rather frequently found in monographic treatments). It is unclear whether this will work in practice and #we need discussion here#! |
| *Key States | Terminology/Characters/Categorical&#124;Numerical/Mappings |
| *Link Characters | (Should be covered by Wordings defined inside the ConceptTrees. Current BDI.SDD_ proposal needs testing!) |
| *Mandatory Characters | Project-wide definitions of mandatory characters (= scoring is required in each item description) are useful only in small to medium-sized projects. Most larger collaborative projects span a taxonomic diversity where no characters are mandatory for all taxa. BDI.SDD_ therefore discusses mechanisms to define characters as mandatory in parts of the taxonomic hierarchy. A possible method to do this is to add a new coding status value that, if defined at a higher taxon, expresses that all lower taxa are expected to be scored. No formal proposal to do this is yet available in BDI.SDD_ version 0.9. |
| *New Files At Items | Not supported in BDI.SDD_; should be handled algorithmically in the application or with application-specific data. |
| *New Paragraphs At Characters | (Should be covered by Wordings defined inside the ConceptTrees. Current BDI.SDD_ proposal needs testing!) Paragraphs do not work yet! |
| *Nonautomatic Controlling Characters | Related to character dependency. #usage of this directive not yet understood, needs study# |
| *Omit Final Comma | (Should be covered by Wordings defined inside the ConceptTrees. Current BDI.SDD_ proposal needs testing!) |
| *Omit Inapplicables | #usage of this directive not yet understood, needs study# |
| *Omit Inner Comments | In the DELTA system, this indicates that comments inside comments should be ignored for the purpose of generating descriptions. Example: "15,1<rarely <5%>>/3 41,2<<?need to examine fresher material>>" would become "15,1<rarely>/3 41,2". Recommendation for converting DELTA to BDI.SDD_: If the directive is present, inner comments should be imported as BDI.SDD_ annotations; if it is absent, inner comments should simply be left in place (in the label, not in the wording text), the <> being replaced by (). |
| *Omit Lower For Characters | Not yet handled in BDI.SDD_. Indicates that even if lower values (e. g. of height) may occasionally be present in the data matrix, they should be omitted from natural language descriptions. A range 0-150 is then output as "up to 150". More precisely: the values omitted are the lower extreme and normal values of a range, and the central value. Note that DELTA does not support the opposite, reporting only a lower range or extreme (DeltaAccess supports both, e. g. to create "fruiting body with at least 10 setae"). |
| *Omit Or For Characters | (Should be covered by Wordings defined inside the ConceptTrees. Current BDI.SDD_ proposal needs testing!) Whether this is covered or not depends on the undecided use of a global Vocabulary list in BDI.SDD_! |
| *Omit Period For Characters | (Should be covered by Wordings defined inside the ConceptTrees. Current BDI.SDD_ proposal needs testing!) |
| *Omit Space Before Units | |
| *Percent Error | Not yet provided for in BDI.SDD_, needs discussion! |
| *Registration Heading | (Synonym of the *Heading directive, deprecated but may still be in use) |
| *Registration Subheading | Still in use, defines a subheading for an entire project, often containing a version number or version date. Should be displayed together with *Comment for human consumption when filling in the BDI.SDD_ project definition. |
| *Replace Semicolon By Comma | (Should be covered by Wordings defined inside the ConceptTrees. Current BDI.SDD_ proposal needs testing!) |
| *Scale Characters | Not yet provided for in BDI.SDD_, needs discussion! |
| *Startup Images | Depending on the project, this may map to Icon in ProjectDefinition |
| *Taxon Images | CodedDescriptions/MediaResources |
| *Taxon Keyword Images | Not supported in BDI.SDD_; compare *Character Keyword Images. |
| *Taxon Links | Full-text document resources for taxa, e. g. species pages |
| *Use Controlling Characters First | #usage of this directive not yet understood, needs study# |
| *Vocabulary | (Currently under discussion in BDI.SDD_) |
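
The effect of *Omit Inner Comments described in the table above can be illustrated with a small Python sketch. It only reproduces the DELTA-side behaviour (ignoring comments inside comments); importing the removed inner comments as BDI.SDD_ annotations is not shown. The function name `omit_inner_comments` is hypothetical, and dropping an outer comment that becomes empty is inferred from the example given for this directive.

```python
def omit_inner_comments(text):
    """Drop DELTA comments nested inside other comments.

    An outer comment that is left empty afterwards is dropped as well.
    """
    out = []         # characters kept so far
    depth = 0        # current nesting level of <...>
    outer_start = 0  # index in `out` where the current outer comment began
    for ch in text:
        if ch == "<":
            depth += 1
            if depth == 1:
                outer_start = len(out)
                out.append(ch)
        elif ch == ">":
            if depth == 0:
                out.append(ch)  # stray '>' outside any comment: keep it
                continue
            if depth == 1:
                # Closing an outer comment: trim it, drop it if empty.
                body = "".join(out[outer_start + 1:]).strip()
                del out[outer_start:]
                if body:
                    out.extend("<" + body + ">")
            depth -= 1
        elif depth <= 1:
            out.append(ch)  # text outside comments or in an outer comment
    return "".join(out)


print(omit_inner_comments(
    "15,1<rarely <5%>>/3 41,2<<?need to examine fresher material>>"))
```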
---++References

M. J. Dallwitz and T. A. Paine 1999. Definition of the DELTA format. [Distributed as MS Word file with the CSIRO DELTA editor for Windows, version 1.3.0.8]

1 New line characters in DELTA: When importing DELTA data, the best strategy is to replace any kind of new line (0A, 0D) and tab (09) character with a blank, and to replace multiple blanks with a single blank. Note, however, that when producing DELTA, new line characters may occur only in certain places (between directives and within text, but not within a directive name), and that the line length accepted by some DELTA applications (including CSIRO CONFOR) is limited to a value somewhere below 100 characters per line. This is not defined in the DELTA standard description (Dallwitz & Paine 1999), but is a limitation of existing applications.
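
The import strategy in this footnote, combined with the slash delimiter rule from "Terminology: Character definition", can be sketched as follows. This is a minimal illustration; both function names are hypothetical, and the sketch does not enforce the line length limit that applies when producing DELTA.

```python
import re


def normalize_delta_whitespace(text):
    """Replace new line (0A, 0D) and tab (09) characters with blanks,
    then collapse multiple blanks into a single blank."""
    text = re.sub(r"[\n\r\t]", " ", text)
    return re.sub(r" {2,}", " ", text)


def split_on_delimiting_slashes(text):
    """Split at slashes that are followed by a blank or the end of the text.

    Slashes inside a word, as in "Length/width", are not delimiters.
    """
    parts = re.split(r"/(?= |$)", normalize_delta_whitespace(text))
    return [part.strip() for part in parts if part.strip()]


multi_line = "#1. Stem <presence>/\n  1. absent/\n  2. present/"
print(normalize_delta_whitespace(multi_line))
print(split_on_delimiting_slashes("#99. Length/width ratio/ "))
```

Normalizing first means the delimiter test only has to look for a blank or the end of the text, since every end-of-line character has already been turned into a blank.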

*See also the overview [[DELTAandBDI.SDD][DELTA and BDI.SDD_]].* -- Main.GregorHagedorn - 10 May 2006 (First published outside of Wiki on 2003-12-11 by Gregor Hagedorn) DELTA example files: * [[%ATTACHURL%/SDD_DELTA_Exp_CD.txt][SDD_DELTA_Exp_CD.txt]] * [[%ATTACHURL%/SDD_DELTA_Exp_Jur.txt][SDD_DELTA_Exp_Jur.txt]] %META:FILEATTACHMENT{name="SDD_DELTA_Exp_CD.txt" attr="h" autoattached="1" comment="" date="1147272192" path="SDD_DELTA_Exp_CD.txt" size="28979" user="Main.GregorHagedorn" version="1"}% %META:FILEATTACHMENT{name="SDD_DELTA_Exp_Jur.txt" attr="h" autoattached="1" comment="" date="1147272244" path="SDD_DELTA_Exp_Jur.txt" size="4794" user="Main.GregorHagedorn" version="1"}% %META:TOPICMOVED{by="GregorHagedorn" date="1147273184" from="SDD.DELTAtoSDD" to="SDD.DeltaToSDD"}% @ 1.3 log @none @ text @d1 1 a1 1 %META:TOPICINFO{author="LeeBelbin" date="1258685124" format="1.1" reprev="1.3" version="1.3"}% d7 1 a7 1 This document attempts to verbally describe how DELTA data sets may be converted for use by BDI.SDD compatible applications. First the conversion of the DELTA core directives is fully discussed, next a list of advanced DELTA directives is presented with annotations as to whether or how they can be evaluated in BDI.SDD. d17 1 a17 1 In general the TOINT file (if present) is a good starting point to resolve all DELTA include commands ("*Input File") and import the data. A better strategy is, however, to also look for further include files and directives in other files (esp. TOKEY and TONAT) and add any new directives to the BDI.SDD conversion process. Thus, the directives in TOINT (and files referenced therein) have priority over possibly differing information in TOKEY and TONAT. 
d19 1 a19 1 Note: since CSIRO DELTA usually uses fixed file names, the BDI.SDD project name can either be obtained from the (rarely used) *Heading or *Registration Heading directives (see Advanced conversion concepts further below), or the name of the folder containing these files can be used. d24 1 a24 1 This sections deals primarily with the task of exporting the most basic DELTA directives, *Character List (= *Character Descriptions), *Character Types and *Item Descriptions into an BDI.SDD document. These directives are relatively complex and map to a number of BDI.SDD data elements. Less detailed information about additional DELTA directives can be found in the next chapter "Advanced conversion concepts". d41 1 a41 1 *Character or state numbers:* In DELTA the number in front of characters and states serves both as a unique identifier to reference a character or state from within item descriptions, and as a definition of the sequence in which characters or states are displayed or reported. As a consequence, new character or states can only be added at the end, inserting them requires a revision of all item descriptions using this terminology. Also, it is not possible to have different character arrangements for different reporting purposes. In BDI.SDD, the key values present at the characters and state definitions only define a unique identifier. The state sequence is defined through the sequence of states in the xml data stream, and whereas the character sequence is independently defined through inclusion of characters into trees (which are called Concept trees in BDI.SDD). The sequence of characters in BDI.SDD Terminology/Characters should not be considered informative! To export characters and states to BDI.SDD, the DELTA numbers can be used directly as BDI.SDD key values. 
However, to preserve the sequence of characters, in addition to Terminology/Characters, also a Terminology/ConceptTrees/ConceptTree must be generated, see Terminology: Character sequence and hierarchy below. d43 1 a43 1 *Character or state labels and wordings:* DELTA interlaces the label and a wording for natural language generation in a single text string. To obtain the BDI.SDD Label/Text, one can either replace the DELTA "comment" markers (<>) with parenthesis ("(" and ")") or omit them altogether. Some DELTA character definitions are designed such that the best label is obtained by omitting the comment markers ("Stem <presence>" would become "Stem presence". A DELTA conversion utility could display a preview of characters and ask whether to convert comments in labels with parentheses or without. Otherwise, the generally acceptable conversion would be to place the comments in parentheses ("Stem <presence>" becomes "Stem (presence)"). This applies to both character and states: In the example above, "absent" could also be "absent <or contracted and invisible>". The BDI.SDD Label/Wording/TextBefore (for characters) or Label/Wording/Text (for states) is obtained by stripping any comment from the DELTA character or state name. d45 1 a45 1

The DELTA strategy of interlacing different text representations has the advantage that whenever a change is made, both representations are changed together. It provides for compact, albeit somewhat difficult to read definitions. In English it works quite well to produce natural language descriptions, and in many languages acceptable compromises can be found. However, the mixed representation does put severe constraints on which natural language texts can be produced (or conversely, how readable the character or state labels in interactive use are). Finding good compromises between these purposes requires a significant amount of experience from the biologist starting with structured descriptions. The BDI.SDD standard has therefore decided to provide separate data elements for the label and the natural language wording.

d49 1 a49 1 *Units:* The character definition of a numerical character in DELTA could read: "#2. Stem <height>/ cm/". The part after the character label that is shown in brown is intended for a unit of measurement. If the plant height is measured only as the maximum (see *Omit Lower For Characters directive), DELTA may produce "Stem up to 150 cm". However, some DELTA programs use the unit element also for categorical characters. In this case it defines a text that is output after the list of states. To convert DELTA data to BDI.SDD the suggested rule is: for numerical characters write the unit text to Numerical/MeasurementUnit, for categorical or even text characters write it to Label/Representation/Wording/TextAfter (both within Characters/Character). d53 1 a53 1 DELTA provides a limited character hierarchy in the form of the *Character Headings and *Item Subheadings directives. Since an BDI.SDD concept tree has to be created anyways to store the character sequence, one may as well also evaluate these frequently used DELTA directives. d75 1 a75 1 Based on the list of directives shown in "Complexity of BDI.SDD versus DELTA", the following table annotates advanced DELTA directives as to how they map to concepts in BDI.SDD. *Currently this is just a first start, this needs to be further worked out! Also note that many special directives are not yet supported in BDI.SDD. The reason for this is that first the general structures of BDI.SDD should become stable. We believe that the special problems dealt with in these directives can later easily be integrated into the BDI.SDD structure.* d78 3 a80 3 *Absolute ErrorNot yet provided for in BDI.SDD, needs discussion! *Add Characters= item specific Include Characters directive. Not supported in BDI.SDD. *Alternate Comma(Should be covered by Wordings defined inside the ConceptTrees. Current BDI.SDD proposal needs testing!) d83 1 a83 1 *Character For Output FilesNot supported in BDI.SDD. 
Applications may define rules how to create document names based on taxon names or abbreviations, or they may provide external data structures for such a functionality. It should not, however, be stored as a descriptive character. d86 4 a89 4 *Character For Taxon NamesBDI.SDD discusses whether a Abbreviation element should be added to Entities/Classes/Class. Background: In DELTA the normal taxon name is directly provided in the Item Description directive. However, for certain reports of exports abbreviated taxon names are required, which can be provided as text in the character identified in this directive. *Character HeadingsExport to a user-defined BDI.SDD concept tree (Terminology/ConceptTrees, Type=UserDefinedHierarchy). Add the Role "TerminologyEditorView" to the tree. Background: This defines a single level of headings, used when the character list is reported. *Character ImagesBDI.SDD distinguishes between images intended for selecting a character or state, and images supporting the definition of the character in general. In most cases these images will be useful as Label/Representation/Selector media resources. However, one should remember that DELTA provides no images for states. Instead, the Character images are overlayed using a proprietory mechanism similar to the html usemap/map/area hotspot mechanism. No similar hotspot mechanism is defined in BDI.SDD (it would conflict with the desired multi-resolution support for media resources). Therefore, usually manual editing of the images to extract state-specific selectors will be necessary to port DELTA identification keys containing images. *Character Keyword ImagesNot supported in BDI.SDD. Recommended action: import the referenced images as media resources into the Resources section, but do not further link these images to other objects. Importing them will simplify later manual restructuring of a project. 
Background: Intkey specific directive to "allows selection of character keywords from image screens (instead of from text screens)". The DELTA User Guide does not define what a character keyword is, but probably this refers to the use of the "*Define Characters directive". d91 1 a91 1 *Character NotesCreate an BDI.SDD Terminology/GlossaryEntry and link this to the character d95 4 a98 4 *Chinese FormatIf special algorithmic support is required in applications, they can detect the fact that the current output is for chinese from the language (lang) attribute in the audience definition. However, we do need a Chinese speaker to test whether the BDI.SDD wording proposal works for Chinese or not. *CommentThe comment directive itself is in practice often used to express document-wide information like authorship, copyright, version, revision status or dates, for which DELTA provides no specific directives. Although automatic extraction of this information is difficult, it may be useful to display the content of all comment directives when a user is asked to supply required BDI.SDD project definition information. This directive should not be confused with the comments in <> signs that can be added at several places inside DELTA directives. *Decimal PlacesCurrently under discussion in BDI.SDD, proposal are made but not yet fixed. Decimal places in DELTA are character specific, whereas in BDI.SDD they have to be specific to the statistical measure (mean, standard deviation, and sample size value usually use different number of decimal places). *Define CharactersExport to a user-defined BDI.SDD concept tree (Terminology/ConceptTrees, Type=UserDefinedHierarchy). Add the Role "InteractiveIdentification" to the tree. Background: This is an Intkey-specific heading definition, combining groups of characters into a named character group. 
Whereas in the other two heading directives in DELTA (Character Headings, Item Subheadings) each character can only be a member of a single heading group, character can be a member in multiple of these definitions. d102 1 a102 1 *Emphasize CharactersWording (text before/after and delimiter) information in BDI.SDD may contain formatting marks. Problem: However, currently it is assumed that each Wording itself is balanced. To achieve emphasis of character content, one would have to define <em> in text before and </em> in text after. This is currently not possible in BDI.SDD, needs discussion! d105 1 a105 1 *Exclude ItemsBDI.SDD provides for a freely definable hierarchy of taxon definitions, see ClassHierarchy. Export or reporting of BDI.SDD data may be limited to higher taxa in this hierarchy. If desired, a class hierarchy may be (ab)used to define unrelated, non-taxonomic groupings of taxa. A separate d107 1 a107 1 *Image DirectoryMany projects will store all images relative to a specific folder (directory) on a web server or in file system. However, to support those projects that need to access images or other resources from multiple locations, BDI.SDD also needs other mechanisms. It would be possible to provide a common-root-URL in the project definition, which could be left empty, if multiple roots are used. The disadvantage would be that consumers of BDI.SDD data would have to follow this logic, combining each URL to a full URL. Currently this is not implemented in the BDI.SDD model. Instead each media resource URL is expected to be complete. Nevertheless, applications can easily analyze the start of the used URLs and automatically extract a root-URL from these, truncating the individual URLs to relative values. This works without loss of data between applications. d112 1 a112 1 *Item AbundancesDiscussed in Brazil 2002. 
Conclusion: not to be supported in BDI.SDD d115 1 a115 1 *Item SubheadingsExport to a user-defined BDI.SDD concept tree (Terminology/ConceptTrees, Type=UserDefinedHierarchy). Add the Role "NaturalLanguageReporting" to the tree. Background: This defines a single level of headings, used when natural language descriptions are generated. d117 1 a117 1 *Key Character ListDELTA provides a separate list of character states that the normal character states can be mapped to (using the DELTA *Key States directive). BDI.SDD currently provides mapping, but only within the general set, i. e. the states that numeric or categorical data are mapped to can also be scored directly. The intent is to allow recording data that only exist in mapped form (e. g. numerical data only available as a categorized class histogram, which is rather frequently found in monographic treatments. It is unclear whether this will work in practice and #we need discussion here#! d119 4 a122 4 *Link Characters(Should be covered by Wordings defined inside the ConceptTrees. Current BDI.SDD proposal needs testing!) *Mandatory CharactersProject-wide definitions of mandatory characters (= scoring is required in each item description) are useful only in small to medium sized project. Most larger collaborative projects span a taxonomic diversity, where no characters are mandatory for all taxa. BDI.SDD therefore discusses mechanisms to define characters as mandatory in parts of the taxonomic hierarchy. A possible method to do this is to add a new coding status value that, if defined at a higher taxon, expresses that all lower taxa are expected to be scored. No formal proposal to do this is yet available in BDI.SDD version 0.9. *New Files At ItemsNot supported in BDI.SDD; should be handled algorithmically in the application or with application-specific data. *New Paragraphs At Characters(Should be covered by Wordings defined inside the ConceptTrees. Current BDI.SDD proposal needs testing!) 
Paragraphs currently do not work yet! d124 1 a124 1 *Omit Final Comma(Should be covered by Wordings defined inside the ConceptTrees. Current BDI.SDD proposal needs testing!) d126 4 a129 4 *Omit Inner CommentsIn the DELTA system, this indicates that comments inside comments should be ignored for the purpose of generating descriptions. Example: "15,1<rarely <5%>>/3 41,2<<?need to examine fresher material>>" would become "15,1<rarely>/3 41,2". Recommendation for converting DELTA to BDI.SDD: If the directive is present, inner comments should be imported as BDI.SDD annotations, if it is absent inner comments should be simple left in place (in the label, not in the wording text), the <> being replaced by (). *Omit Lower For CharactersNot yet handled in BDI.SDD. Indicates that even if lower values (e. g. of height) may occasionally be present in the data matrix, they should be omitted from natural language descriptions. A range 0-150 is then output as up to 150. More precisely: The values omitted are the lower extreme and normal values of a range, and the central value. Note that DELTA does not support the opposite, only reporting a lower range or extreme (DeltaAccess supports both, e. g. to create "fruiting body with at least 10 setae"). *Omit Or For Characters(Should be covered by Wordings defined inside the ConceptTrees. Current BDI.SDD proposal needs testing!) Whether this is covered or not depends on the undecided use of a global Vocabulary list in BDI.SDD! *Omit Period For Characters(Should be covered by Wordings defined inside the ConceptTrees. Current BDI.SDD proposal needs testing!) d131 1 a131 1 *Percent ErrorNot yet provided for in BDI.SDD, needs discussion! d133 3 a135 3 *Registration SubheadingStill in use, defines a subheading for an entire project, often containing a version number or version date. Should be displayed together with *Comments for human consumption when filling in the BDI.SDD project definition. 
*Replace Semicolon By Comma(Should be covered by Wordings defined inside the ConceptTrees. Current BDI.SDD proposal needs testing!) *Scale CharactersNot yet provided for in BDI.SDD, needs discussion! d138 1 a138 1 *Taxon Keyword ImagesNot supported in BDI.SDD; compare *Character Keyword Images. d141 1 a141 1 *Vocabulary(currently under discussion in BDI.SDD) d152 1 a152 1 *See also the overview [[DELTAandBDI.SDD][DELTA and BDI.SDD]].* @ 1.2 log @Added topic name via script @ text @d1 2 a4 2 %META:TOPICINFO{author="GregorHagedorn" date="1147273540" format="1.1" version="1.1"}% %META:TOPICPARENT{name="DELTAandSDD"}% d7 1 a7 1 This document attempts to verbally describe how DELTA data sets may be converted for use by SDD compatible applications. First the conversion of the DELTA core directives is fully discussed, next a list of advanced DELTA directives is presented with annotations as to whether or how they can be evaluated in SDD. d11 1 a11 1 See also "Comparison of sections and subsections with DELTA features" for a useful overview and the general [[DELTAandSDD][overview of DELTA related material]]. d17 1 a17 1 In general the TOINT file (if present) is a good starting point to resolve all DELTA include commands ("*Input File") and import the data. A better strategy is, however, to also look for further include files and directives in other files (esp. TOKEY and TONAT) and add any new directives to the SDD conversion process. Thus, the directives in TOINT (and files referenced therein) have priority over possibly differing information in TOKEY and TONAT. d19 1 a19 1 Note: since CSIRO DELTA usually uses fixed file names, the SDD project name can either be obtained from the (rarely used) *Heading or *Registration Heading directives (see Advanced conversion concepts further below), or the name of the folder containing these files can be used. 
d24 1 a24 1 This section deals primarily with the task of exporting the most basic DELTA directives, *Character List (= *Character Descriptions), *Character Types and *Item Descriptions into an SDD document. These directives are relatively complex and map to a number of SDD data elements. Less detailed information about additional DELTA directives can be found in the next chapter "Advanced conversion concepts". d41 1 a41 1 *Character or state numbers:* In DELTA the number in front of characters and states serves both as a unique identifier to reference a character or state from within item descriptions, and as a definition of the sequence in which characters or states are displayed or reported. As a consequence, new characters or states can only be added at the end; inserting them elsewhere requires a revision of all item descriptions using this terminology. Also, it is not possible to have different character arrangements for different reporting purposes. In SDD, the key values present at the character and state definitions only define a unique identifier. The state sequence is defined through the sequence of states in the XML data stream, whereas the character sequence is independently defined through inclusion of characters into trees (which are called Concept trees in SDD). The sequence of characters in SDD Terminology/Characters should not be considered informative! To export characters and states to SDD, the DELTA numbers can be used directly as SDD key values. However, to preserve the sequence of characters, a Terminology/ConceptTrees/ConceptTree must also be generated in addition to Terminology/Characters; see Terminology: Character sequence and hierarchy below. d43 1 a43 1 *Character or state labels and wordings:* DELTA interlaces the label and a wording for natural language generation in a single text string. To obtain the SDD Label/Text, one can either replace the DELTA "comment" markers (<>) with parentheses ("(" and ")") or omit them altogether.
Some DELTA character definitions are designed such that the best label is obtained by omitting the comment markers ("Stem <presence>" would become "Stem presence"). A DELTA conversion utility could display a preview of the characters and ask whether to convert comments in labels with or without parentheses. Otherwise, the generally acceptable conversion would be to place the comments in parentheses ("Stem <presence>" becomes "Stem (presence)"). This applies to both characters and states: In the example above, "absent" could also be "absent <or contracted and invisible>". The SDD Label/Wording/TextBefore (for characters) or Label/Wording/Text (for states) is obtained by stripping any comment from the DELTA character or state name. d45 1 a45 1

The DELTA strategy of interlacing different text representations has the advantage that whenever a change is made, both representations are changed together. It provides for compact, albeit somewhat difficult to read, definitions. In English it works quite well to produce natural language descriptions, and in many languages acceptable compromises can be found. However, the mixed representation does put severe constraints on which natural language texts can be produced (or conversely, on how readable the character or state labels are in interactive use). Finding good compromises between these purposes requires a significant amount of experience, which a biologist just starting with structured descriptions usually lacks. The SDD standard therefore provides separate data elements for the label and the natural language wording.

d49 1 a49 1 *Units:* The character definition of a numerical character in DELTA could read: "#2. Stem <height>/ cm/". The part after the character label that is shown in brown is intended for a unit of measurement. If the plant height is measured only as the maximum (see *Omit Lower For Characters directive), DELTA may produce "Stem up to 150 cm". However, some DELTA programs use the unit element also for categorical characters. In this case it defines a text that is output after the list of states. To convert DELTA data to SDD the suggested rule is: for numerical characters write the unit text to Numerical/MeasurementUnit, for categorical or even text characters write it to Label/Representation/Wording/TextAfter (both within Characters/Character). d53 1 a53 1 DELTA provides a limited character hierarchy in the form of the *Character Headings and *Item Subheadings directives. Since an SDD concept tree has to be created anyways to store the character sequence, one may as well also evaluate these frequently used DELTA directives. d75 1 a75 1 Based on the list of directives shown in "Complexity of SDD versus DELTA", the following table annotates advanced DELTA directives as to how they map to concepts in SDD. *Currently this is just a first start, this needs to be further worked out! Also note that many special directives are not yet supported in SDD. The reason for this is that first the general structures of SDD should become stable. We believe that the special problems dealt with in these directives can later easily be integrated into the SDD structure.* d78 3 a80 3 *Absolute ErrorNot yet provided for in SDD, needs discussion! *Add Characters= item specific Include Characters directive. Not supported in SDD. *Alternate Comma(Should be covered by Wordings defined inside the ConceptTrees. Current SDD proposal needs testing!) d83 1 a83 1 *Character For Output FilesNot supported in SDD. 
Applications may define rules how to create document names based on taxon names or abbreviations, or they may provide external data structures for such a functionality. It should not, however, be stored as a descriptive character. d86 4 a89 4 *Character For Taxon NamesSDD discusses whether a Abbreviation element should be added to Entities/Classes/Class. Background: In DELTA the normal taxon name is directly provided in the Item Description directive. However, for certain reports of exports abbreviated taxon names are required, which can be provided as text in the character identified in this directive. *Character HeadingsExport to a user-defined SDD concept tree (Terminology/ConceptTrees, Type=UserDefinedHierarchy). Add the Role "TerminologyEditorView" to the tree. Background: This defines a single level of headings, used when the character list is reported. *Character ImagesSDD distinguishes between images intended for selecting a character or state, and images supporting the definition of the character in general. In most cases these images will be useful as Label/Representation/Selector media resources. However, one should remember that DELTA provides no images for states. Instead, the Character images are overlayed using a proprietory mechanism similar to the html usemap/map/area hotspot mechanism. No similar hotspot mechanism is defined in SDD (it would conflict with the desired multi-resolution support for media resources). Therefore, usually manual editing of the images to extract state-specific selectors will be necessary to port DELTA identification keys containing images. *Character Keyword ImagesNot supported in SDD. Recommended action: import the referenced images as media resources into the Resources section, but do not further link these images to other objects. Importing them will simplify later manual restructuring of a project. 
Background: Intkey specific directive to "allows selection of character keywords from image screens (instead of from text screens)". The DELTA User Guide does not define what a character keyword is, but probably this refers to the use of the "*Define Characters directive". d91 1 a91 1 *Character NotesCreate an SDD Terminology/GlossaryEntry and link this to the character d95 4 a98 4 *Chinese FormatIf special algorithmic support is required in applications, they can detect the fact that the current output is for chinese from the language (lang) attribute in the audience definition. However, we do need a Chinese speaker to test whether the SDD wording proposal works for Chinese or not. *CommentThe comment directive itself is in practice often used to express document-wide information like authorship, copyright, version, revision status or dates, for which DELTA provides no specific directives. Although automatic extraction of this information is difficult, it may be useful to display the content of all comment directives when a user is asked to supply required SDD project definition information. This directive should not be confused with the comments in <> signs that can be added at several places inside DELTA directives. *Decimal PlacesCurrently under discussion in SDD, proposal are made but not yet fixed. Decimal places in DELTA are character specific, whereas in SDD they have to be specific to the statistical measure (mean, standard deviation, and sample size value usually use different number of decimal places). *Define CharactersExport to a user-defined SDD concept tree (Terminology/ConceptTrees, Type=UserDefinedHierarchy). Add the Role "InteractiveIdentification" to the tree. Background: This is an Intkey-specific heading definition, combining groups of characters into a named character group. 
Whereas in the other two heading directives in DELTA (Character Headings, Item Subheadings) each character can only be a member of a single heading group, character can be a member in multiple of these definitions. d102 1 a102 1 *Emphasize CharactersWording (text before/after and delimiter) information in SDD may contain formatting marks. Problem: However, currently it is assumed that each Wording itself is balanced. To achieve emphasis of character content, one would have to define <em> in text before and </em> in text after. This is currently not possible in SDD, needs discussion! d105 1 a105 1 *Exclude ItemsSDD provides for a freely definable hierarchy of taxon definitions, see ClassHierarchy. Export or reporting of SDD data may be limited to higher taxa in this hierarchy. If desired, a class hierarchy may be (ab)used to define unrelated, non-taxonomic groupings of taxa. A separate d107 1 a107 1 *Image DirectoryMany projects will store all images relative to a specific folder (directory) on a web server or in file system. However, to support those projects that need to access images or other resources from multiple locations, SDD also needs other mechanisms. It would be possible to provide a common-root-URL in the project definition, which could be left empty, if multiple roots are used. The disadvantage would be that consumers of SDD data would have to follow this logic, combining each URL to a full URL. Currently this is not implemented in the SDD model. Instead each media resource URL is expected to be complete. Nevertheless, applications can easily analyze the start of the used URLs and automatically extract a root-URL from these, truncating the individual URLs to relative values. This works without loss of data between applications. d112 1 a112 1 *Item AbundancesDiscussed in Brazil 2002. Conclusion: not to be supported in SDD d115 1 a115 1 *Item SubheadingsExport to a user-defined SDD concept tree (Terminology/ConceptTrees, Type=UserDefinedHierarchy). 
Add the Role "NaturalLanguageReporting" to the tree. Background: This defines a single level of headings, used when natural language descriptions are generated. d117 1 a117 1 *Key Character ListDELTA provides a separate list of character states that the normal character states can be mapped to (using the DELTA *Key States directive). SDD currently provides mapping, but only within the general set, i. e. the states that numeric or categorical data are mapped to can also be scored directly. The intent is to allow recording data that only exist in mapped form (e. g. numerical data only available as a categorized class histogram, which is rather frequently found in monographic treatments. It is unclear whether this will work in practice and #we need discussion here#! d119 4 a122 4 *Link Characters(Should be covered by Wordings defined inside the ConceptTrees. Current SDD proposal needs testing!) *Mandatory CharactersProject-wide definitions of mandatory characters (= scoring is required in each item description) are useful only in small to medium sized project. Most larger collaborative projects span a taxonomic diversity, where no characters are mandatory for all taxa. SDD therefore discusses mechanisms to define characters as mandatory in parts of the taxonomic hierarchy. A possible method to do this is to add a new coding status value that, if defined at a higher taxon, expresses that all lower taxa are expected to be scored. No formal proposal to do this is yet available in SDD version 0.9. *New Files At ItemsNot supported in SDD; should be handled algorithmically in the application or with application-specific data. *New Paragraphs At Characters(Should be covered by Wordings defined inside the ConceptTrees. Current SDD proposal needs testing!) Paragraphs currently do not work yet! d124 1 a124 1 *Omit Final Comma(Should be covered by Wordings defined inside the ConceptTrees. Current SDD proposal needs testing!) 
d126 4 a129 4 *Omit Inner CommentsIn the DELTA system, this indicates that comments inside comments should be ignored for the purpose of generating descriptions. Example: "15,1<rarely <5%>>/3 41,2<<?need to examine fresher material>>" would become "15,1<rarely>/3 41,2". Recommendation for converting DELTA to SDD: If the directive is present, inner comments should be imported as SDD annotations, if it is absent inner comments should be simple left in place (in the label, not in the wording text), the <> being replaced by (). *Omit Lower For CharactersNot yet handled in SDD. Indicates that even if lower values (e. g. of height) may occasionally be present in the data matrix, they should be omitted from natural language descriptions. A range 0-150 is then output as up to 150. More precisely: The values omitted are the lower extreme and normal values of a range, and the central value. Note that DELTA does not support the opposite, only reporting a lower range or extreme (DeltaAccess supports both, e. g. to create "fruiting body with at least 10 setae"). *Omit Or For Characters(Should be covered by Wordings defined inside the ConceptTrees. Current SDD proposal needs testing!) Whether this is covered or not depends on the undecided use of a global Vocabulary list in SDD! *Omit Period For Characters(Should be covered by Wordings defined inside the ConceptTrees. Current SDD proposal needs testing!) d131 1 a131 1 *Percent ErrorNot yet provided for in SDD, needs discussion! d133 3 a135 3 *Registration SubheadingStill in use, defines a subheading for an entire project, often containing a version number or version date. Should be displayed together with *Comments for human consumption when filling in the SDD project definition. *Replace Semicolon By Comma(Should be covered by Wordings defined inside the ConceptTrees. Current SDD proposal needs testing!) *Scale CharactersNot yet provided for in SDD, needs discussion! 
d138 1 a138 1 *Taxon Keyword ImagesNot supported in SDD; compare *Character Keyword Images. d141 1 a141 1 *Vocabulary(currently under discussion in SDD) d152 1 a152 1 *See also the overview [[DELTAandSDD][DELTA and SDD]].* @ 1.1 log @none @ text @d1 2 @