In this topic, he and his principal programmer Patrick will discuss questions about or problems with BDI.SDD_. See http://uio.mbl.edu/services/key.html for general information, and http://uio.mbl.edu/BDI.SDD_/player.php for the ongoing work on making it BDI.SDD_ compatible.

-- Main.GregorHagedorn - 07 Jun 2004

Note (Gregor): We are currently rewriting the generalities of the Lucid LIF to BDI.SDD_ conversion in a new topic: ConvertingLIF2BDI.SDD!

---

A version of the X:ID Player is available online that uses XML structured according to the XML Schema defined by BDI.SDD_. Gregor Hagedorn supplied us with a current version of the BDI.SDD_ schema, and we have X:ID running off this structure. It can be found at http://uio.mbl.edu/BDI.SDD_/player.php . This version of the X:ID Player only runs the key to Atlantic Tunas, transformed by our default stylesheet. Clicking the "XML" option in the functions bar lets you view the BDI.SDD_ compliant XML and the source code of the default stylesheet used to transform the XML.

I have attached an example of the XML ([[%ATTACHURL%/XIDwithBDI.SDD.xml][XIDwithBDI.SDD.xml]]) that is now created by the X:ID Player. It is meant to comply with the latest version of BDI.SDD_ as supplied to us by Gregor. I was still unclear about the structure and content of the "<nop>CodedDescriptions" section of BDI.SDD_. It seems that this is where the matrix information of the Lucid-style .LIF file should go. For example, here is the matrix section of our .LIF for Atlantic tunas:

[..Main Data (txs)..]%BR%
6100000100001100100010000011111%BR%
6010000010001010010010000101110%BR%
6001000100001001010010000131110%BR%
6001000100010100001001000101100%BR%
6001000001001010100000101000000%BR%
6000100100001010000100100101000%BR%
6000010100001010001000310101100%BR%
6000001000101010000100100101000%BR%

This matrix scores taxa by rows and states by columns. So, it seems you would like, for each taxon, the values of each state in their respective character.
If this is not how you intended the <nop>CodedDescriptions section to look, could you post a longer example of this section, possibly 15 or more lines, so that I can change it to your standards?

---

I just attempted to load a rather large LIF (1070 states X 540 taxa) into the BDI.SDD_ version of X:ID with the above declaration of the matrix information. The XML file that it created was 4.5 megabytes and over 185,000 lines long, and that was only after 30 seconds, when the process timed out. Only the scores of 54 of 540 taxa had been written in this manner, meaning the XML file, if complete, would be more than 9 megabytes and would take more than 60 seconds to create. This is important for each of us to keep in mind when using a verbose language such as BDI.SDD_ to describe large data sets.

Perhaps the X:ID data can be presented without the behind-the-scenes information, such as the matrix information itself. Consider the key above (1070 X 540), where there are 577,800 values in the matrix. To represent the data in a detailed language such as XML, where every value gets its own line, the <nop>CodedDescriptions section that lists all the matrix data will have to be at least 577,800 lines long, not to mention the extra space for tags. The same information is represented in a matrix that is only 540 lines long, each line 1070 characters wide. This is something to consider when using BDI.SDD_ to verbosely describe huge amounts of information.

-- Main.PatrickLeary - 07 Jun 2004

---

BDI.SDD_ is certainly verbose, but not that much. In a way, XML is meant to be verbose so that it is extensible and self-documenting. There is even some unnecessary redundancy between state and character refs: the character refs are redundant, but this was explicitly requested to simplify life for character-based processors.
However: In BDI.SDD_ only positive statements are made; the absence of a state is implicit, provided that the character has been scored at all. The value you are using in your example:<br /> &lt;State ref="2"&gt;<br /> &lt;Value&gt;1&lt;/Value&gt;<br /> &lt;/State&gt;<br /> is not BDI.SDD_. You can provide values only for numeric measurement data, not for categorical states.

If I understand LIF correctly, 0 indicates absence, 1 presence, and other values are used to code frequency or uncertainty. So step 1 would be to provide a translation of values other than 1 to BDI.SDD_ (these facts about states would be expressed as frequency and certainty modifiers). So in the conversion, drop all 0-valued states, drop the value from those with 1, and add the equivalent BDI.SDD_ modifiers for those greater than 1. I will try to help with the modifier question.

Also, you can minimize the space needed for the state refs within each character by not indenting them.

By the way: Your work provides a very valuable LIF to BDI.SDD_ converter that has its own utility, separately from X:ID!

-- Main.GregorHagedorn - 07 Jun 2004

---

Separately: is there any reason why you use xidisopen="F" xidlast="F" xidimage="F" xidmetric="F" instead of defining your extensions in a namespace, like xid:isopen="F" xid:last="F" xid:image="F" xid:metric="F", as I proposed in the example file? Did the namespace extension not work for you? I am asking because the question of whether attribute extensions from other namespaces should be allowed in BDI.SDD_ needs to be answered. I tried to add that experimentally in the special XID-modified BDI.SDD_ version. Did I get it wrong?

-- Main.GregorHagedorn - 07 Jun 2004

---

It would be helpful if the Schema to which this complies were provided here also. In general, possibly BDI.SDD_ should require that a schema be named in the XML PI (including enough to guarantee that the correct schema version is deducible for validity checking).
-- Main.BobMorris - 07 Jun 2004

---

I have attached the BDI.SDD_ Schema and example XML that were given to me by Gregor. They have been cut down in size for use specifically with X:ID: [[%ATTACHURL%/SDD_for_XID.zip][SDD_for_XID.zip]].

I tried to enter attributes with an xid namespace, for example xid:isopen, but our XML parser crashed at this point and said it did not recognize the namespace. I have never used different namespaces in XML files, so I'm not sure whether there is some way to explicitly define a new namespace somewhere in the file. So, I temporarily renamed the attributes by removing the colon, for example xidisopen.

As far as the <nop>CodedDescriptions section goes, I think I now understand you correctly, but let me verify. Instead of listing the Lucid value for the state under a taxon, it is implied that a state is scored as present/pass for a taxon if there is a reference tag under the corresponding taxon? And the absence of a reference tag under a taxon is understood to mean that the state is scored as absent/fail for that taxon? [Yes, Gregor]

You were correct in your assumption about Lucid scoring. Lucid scores taxa under the following scheme:%BR%
0-absent%BR%
1-present%BR%
2-unknown%BR%
3-rare%BR%
4-present but may be misinterpreted as absent%BR%
5-absent but may be misinterpreted as present%BR%
6-metric data

So, values are to be used only for metric data? [Yes] In a Lucid-style key, metric data is not entered as a single value; instead there are four parameters: extreme upper, normal upper, normal lower, and extreme lower values. So, for one metric state/taxon pair, there are four metric values to be entered.

   * [In BDI.SDD_ you define any number of statistical measures, including these. The Lucid ones are equivalent to "Maximum value", "Undefined upper limit (legacy data)", "Undefined lower limit (legacy data)" and "Minimum value". 
The general stuff is defined in <nop>GeneralDeclarations (see the example file), and then, in a character, the measures are defined in <nop>Character/Numerical/StatisticalMeasures. There a key is defined that is to be used in coded descriptions, and a keyref that points to the general semantics in <nop>GeneralDeclarations/UnivariateStatisticalMeasures. You can copy the definition you need from the example file (the general one, not the simplified XID one). However, there are two problems here: a) I have removed numerics support in the XID schema to simplify getting into BDI.SDD_ for you. So when you want to use numerics, you have to use the full version. b) The current BDI.SDD_ version is undecided about how exactly this should be supported, so expect limited changes here.] -- Main.GregorHagedorn - 07 Jun 2004

Finally, if I am correct in my current understanding of the <nop>CodedDescriptions section of BDI.SDD_, in that a state is listed if present and not listed if absent, then there are still size issues to be considered with large data sets. In my above example of the 1070 X 540 matrix, let's say a quarter of all scores are 1-present and the rest are 0-absent: there are still more than 120,000 lines of data in the <nop>CodedDescriptions section alone. This will result in an XML file of at least 5 megabytes. Just something to consider down the road.

-- Main.PatrickLeary - 07 Jun 2004

---

So to map LIF to BDI.SDD_, we need:

   * A way to express that a state is unknown. Note that in BDI.SDD_ a feature <nop>CodingStatus exists, which defines whether a property of an object is unknown (see e.g. <nop>GeneralDeclarations in the BDI.SDD_ example files: <nop>CodingStatus key="2" debugkey="Unknown"). However, this differs from the fact that a specific state is unknown. The way to express this in BDI.SDD_ is to say it is "perhaps" this state. Example: the Lucid statement "flower elliptic, unknown whether ovate" is in BDI.SDD_ interpreted as "flower elliptic, perhaps ovate". 
     Thus we first need to define in Terminology/Modifiers a Certainty modifier for "perhaps" (see Modifier key="41" in the SDD_tech.xml example file). Example for the application of this in a description:
<verbatim>
<CodedDescription key="101"><Header><ClassName ref="1"/></Header>
  <CodedData>
    <Character ref="1">
      <State ref="1"><Certainty ref="41"/></State>
    </Character>
  </CodedData>
</CodedDescription>
</verbatim>
   * A frequency modifier "rare", to be defined in Terminology/Modifiers, which would be added for states scored as 3 (see the BDI.SDD_ example file, Modifier key="22"). Example for the application of this in a description:
<verbatim>
<CodedDescription key="101"><Header><ClassName ref="1"/></Header>
  <CodedData>
    <Character ref="1">
      <State ref="1"><Frequency ref="22"/></State>
    </Character>
  </CodedData>
</CodedDescription>
</verbatim>
   * A certainty modifier with the special flag Specification/IsTrueByMisinterpretation set to true (see Modifier key="42"). This is to be used for Lucid value 5. Example:
<verbatim>
<CodedDescription key="101"><Header><ClassName ref="1"/></Header>
  <CodedData>
    <Character ref="1">
      <State ref="1"><Certainty ref="42"/></State>
    </Character>
  </CodedData>
</CodedDescription>
</verbatim>
   * There is no equivalent to Lucid value 4 in BDI.SDD_. The fact that something is present but can be erroneously overlooked is considered so general that it is not part of data coding, but instead should be part of the reasoning of the query or identification method. <em>(If anybody disagrees on this, and can provide examples or scenarios, it would be an urgent issue to fix!)</em>
   * Appropriate statistical measures for the metric data. (See the discussion above!)

The size issue is known. It is common to most XML data. BDI.SDD_ does not make any assumptions that processors will work natively on BDI.SDD_. An identification tool may easily read the BDI.SDD_ data, process it, throw away those parts it is not interested in, and store the rest in a matrix view.
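The score mapping discussed above (drop 0, a bare state ref for 1, modifiers for 2, 3 and 5) could be sketched in Python roughly as follows. This is only an illustration and not part of X:ID or BDI.SDD_: the function names are hypothetical, the modifier keys 41, 22 and 42 are simply taken from the examples in this topic, and writing Lucid value 4 as plain presence is an assumption, since the thread states there is no BDI.SDD_ equivalent. Metric scores (6) are skipped here because they need statistical measures instead.

```python
import xml.etree.ElementTree as ET

# Modifier keys as used in the examples of this topic (assumed to be defined
# in Terminology/Modifiers of the data set being written).
CERTAINTY_PERHAPS = "41"          # Lucid 2: unknown, read as "perhaps"
FREQUENCY_RARE = "22"             # Lucid 3: rare
CERTAINTY_MISINTERPRETED = "42"   # Lucid 5: absent, may be misread as present


def state_element(state_key, score):
    """Return a <State> element for one Lucid score, or None if nothing is written."""
    if score in (0, 6):
        # 0: absence is implicit; 6: metric data is handled separately.
        return None
    state = ET.Element("State", ref=str(state_key))
    if score == 2:
        ET.SubElement(state, "Certainty", ref=CERTAINTY_PERHAPS)
    elif score == 3:
        ET.SubElement(state, "Frequency", ref=FREQUENCY_RARE)
    elif score == 5:
        ET.SubElement(state, "Certainty", ref=CERTAINTY_MISINTERPRETED)
    # Scores 1 and 4 (assumption for 4) become a bare state ref without a value.
    return state


def character_element(char_key, scored_states):
    """Build a <Character> element from (state_key, lucid_score) pairs."""
    character = ET.Element("Character", ref=str(char_key))
    for state_key, score in scored_states:
        state = state_element(state_key, score)
        if state is not None:
            character.append(state)
    return character
```

For example, `character_element(1, [(1, 1), (2, 0), (3, 3)])` yields a Character element with a bare ref to state 1 and a ref to state 3 carrying the frequency modifier; state 2 is omitted entirely.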
I believe it will be very valuable when you have created a large file, and it would be good if this file could be shared for testing purposes. However, to me the issue seems to be whether to define an extensible format based on XML or a non-extensible one optimized for a specific purpose, like LIF. Any suggestions about options for BDI.SDD_ are appreciated!

Regarding extensibility: Note that descriptions may be associated with images or documents. The "file" element you add in the <nop>ClassName for the first taxon should in BDI.SDD_ rather be a <nop>MediaResource ref in the description (we consider illustrations of the entire taxon part of the description, not part of the taxon name). <nop>MediaResource can occur in the <nop>CodedDescription itself (after the Header) or in a specific character (if the image only applies to that character). After defining a <nop>MediaResource in <nop>ExternalDataInterface:

<verbatim>
<MediaResource key="125">
  <Label><Representation language="en"><Text>Melampsora evonymi-caprearum</Text></Representation></Label>
  <!-- Label is required, but if the source provides no separate title or description of a resource, the url may be used here as well -->
  <ObjectLink><URL>www.xxx.org/img/Melampsora_evonymi-caprearum.png</URL></ObjectLink>
  <Type>Image</Type>
  <Caption><!-- Caption is optional -->
    <Representation language="fr"><Text><i>Melampsora evonymi-caprearum</i> Kleb., stade II sur <i>Salix caprea</i> L.</Text></Representation>
    <Representation language="de"><Text><i>Melampsora evonymi-caprearum</i> Kleb., Sporenstadium II auf <i>Salix caprea</i> L.</Text></Representation>
    <Representation language="en"><Text><i>Melampsora evonymi-caprearum</i> Kleb., spore stage II on <i>Salix caprea</i> L.</Text></Representation>
  </Caption>
</MediaResource>
</verbatim>

you can refer to such resources for the entire description like:

<verbatim>
<CodedDescription key="101">
  <Header><ClassName ref="1"/></Header>
  <MediaResources><MediaResource ref="123"/><MediaResource 
ref="124"/></MediaResources>
...
</verbatim>

-- Main.GregorHagedorn - 7/8.

Thank you for the last() hint; I can't believe that I overlooked that in the first place. I changed xidopen in the State declaration to xidchosen, which I think is a bit clearer and avoids the problem of having two identical attributes serving different functions (though I realize it does not eliminate the problem that the attribute is there to begin with). For the time being, since we have no link between X:ID and other name sources, I also eliminated the xidsource attribute.

For State images, I left a short, empty <Icon/> tag in the Representation section for the states. Is this incorrect according to SDD? The reason for this, and for the xidimage tag in the Class definitions, is that we initially were trying to avoid listing out all the image titles and URLs. On our server side, we do not need this information for the functioning of X:ID, so we left it out to save time and space.

   * For states you can use <nop>MediaResources, which is present inside Label/Representation/, in sequence after Text. There is also an Icon (type is mediaresourceref) between Text and <nop>MediaResources. For characters we don't have that directly at the moment; you would have to provide it in the Labels of a <nop>ConceptTree (a tree or list that defines an arrangement of characters). -- Main.GregorHagedorn - 9. Jun 2004

   * This should go into the <nop>ObjectLinks inside the <nop>ClassName proxy. These are explicitly designed to define relations to outside data sources. Yours would be a good test of whether you can find a way to express these relations in a generalized way (LSIDs?). -- Main.GregorHagedorn - 9. Jun 2004

   * xidisopen does determine whether a character is open, and it also determines whether a state has been selected. I agree that a different attribute could be used for state selection, and I'll talk with Dave about that issue.

Also, for the time being, I changed all Lucid-style scores of "4-present but may be misinterpreted as absent" to simply present, since the fact that the feature itself is present seems to me the most relevant point. The fact that it can be misinterpreted could perhaps be another type of <nop>CodingStatus.
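On the namespace question raised earlier in this topic (the parser rejecting xid:isopen): a namespace-aware parser normally accepts a prefix only if it is bound to a URI by an xmlns:xid declaration on the element itself or on one of its ancestors. The Python sketch below illustrates this with the standard library parser. The URI http://example.org/xid and the States/State element names are placeholders chosen for the illustration; they are not taken from the BDI.SDD_ schema or from X:ID.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; any URI under the control of X:ID would do.
XID_NS = "http://example.org/xid"

# The prefix is bound once, on an enclosing element, via xmlns:xid.
DECLARED = (
    f'<States xmlns:xid="{XID_NS}">'
    '<State key="1" xid:isopen="F" xid:last="F"/>'
    "</States>"
)

# Same content without the xmlns:xid declaration.
UNDECLARED = '<States><State key="1" xid:isopen="F" xid:last="F"/></States>'


def parses(xml_text):
    """Return True if xml_text is accepted by the namespace-aware parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def read_isopen(xml_text):
    """Read the xid:isopen attribute of the first State element."""
    state = ET.fromstring(xml_text).find("State")
    # ElementTree addresses namespaced attributes as {uri}localname.
    return state.get(f"{{{XID_NS}}}isopen")
```

With these definitions, `parses(DECLARED)` succeeds while `parses(UNDECLARED)` fails because the prefix is unbound, which matches the kind of error described above. Whether the schema then permits such foreign attributes is a separate validation question.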