wiki-archive/twiki/temp-gjr/BDI/SDD/AmnhLegacyLiteratureProject...

268 lines
8.8 KiB
Plaintext
Raw Permalink Normal View History

head 1.5;
access;
symbols;
locks; strict;
comment @# @;
1.5
date 2009.11.25.03.14.30; author GarryJolleyRogers; state Exp;
branches;
next 1.4;
1.4
date 2009.11.20.02.45.21; author LeeBelbin; state Exp;
branches;
next 1.3;
1.3
date 2007.03.06.17.30.00; author TWikiGuest; state Exp;
branches;
next 1.2;
1.2
date 2005.01.06.18.48.30; author DrewKoning; state Exp;
branches;
next 1.1;
1.1
date 2005.01.06.16.21.52; author BobMorris; state Exp;
branches;
next ;
desc
@none
@
1.5
log
@none
@
text
@%META:TOPICINFO{author="GarryJolleyRogers" date="1259118870" format="1.1" version="1.5"}%
---+!! %TOPIC%
* [[%ATTACHURL%/NovitatesTreatment_1_0.xml][NovitatesTreatment_1_0.xml]]: Natural Language Coding of a taxonomic treatment in the AMNH legacy literature project
This is my first attempt at an BDI.SDD_ coding of the Novitates ant
treatment of Baikurus casei from the digitized version of
http://research.amnh.org/informatics/ants/pdf/N3208.pdf the initial
digitization of which is
http://research.amnh.org/informatics/ants/xml_lite/N3208.xml
See also other representations of this particular treatment at
http://research.amnh.org/informatics/ants/
Especially see Ants.WebHome
I've done the treatment entirely with the
<nop>NaturalLanguageDescriptions ("NLD") of
BDI.SDD_. This kind of coding is intended exactly for this purpose, and is
designed to encode the text at any level of conceptual granularity
desired. In particular, it is the aim of NLD to support subsequent
refinement of the markup to subsequently refined concepts. For
example, in the document [[%ATTACHURL%/NovitatesTreatment_1_0.xml][NovitatesTreatment_1_0.xml]], I use an BDI.SDD_
Terminology with two <nop>ConceptTrees. The first <nop>ConceptTree is a (rather flat) tree of BDI.SDD_
whose <nop>ConceptTreeType is <nop>PartsCompositionHierarchy. It has Concepts
named HEAD, ALITRUNK, and GASTER, corresponding to the named parts in
the treatment. The second tree is of type <nop>MethodHierarchy and has
entries labelled MATERIALS EXAMINED, ETYMOLOGY, and DISCUSSION. (These
probably do not really correspond to the intent of a <nop>MethodHierarchy
in BDI.SDD_, which is more designed to detail how data was
gathered. However, it is interesting to note that the
<nop>ConceptTrees could quite easily be reorganized, retyped, and
renamed (i.e. relabeled) in the Terminology without having to recode
the <nop>NaturalLanguageDescription. That's because in the Description are
only references to the keys that uniquely specify the Concept and
these would survive such a reorganization in most cases. (There are
uniqueness constraints on those keys that would have to be maintained
in such a reorganization.)
The essential point of BDI.SDD_ NLD markup is that subsequent agents acting
on the document with a refined Terminology could mark it up
further. For example, the HEAD section could be refined to separate
markup of the ocelli, the eyes, etc., all the way to detailed
character/state based markup that would be as searchable with XQuery
as had it derived from a database in the first place. Accomplishing
that markup is either expert handwork, or the subject of research in natural
language processing of morphological characters such as that in Bryan
Heidorn's lab.
What's missing in this coding. It's important to the AMNH work to also
include certain automatically marked document characteristics,
including page boundaries, figure locations, etc. I haven't tried to
capture that here in part because there have not yet been any attempts
to use BDI.SDD_ markup as fragments, e.g. via namespaces, of something
else. This doesn't seem difficult in principle, although the heavy
dependence of BDI.SDD_ on key/keyref mechanisms could make validation
complicated in such a scenario. Particularly unclear to me are the
consequences of the (common) case where these document objects come
within an informational object, such as the text of the use of a
Concept in a <nop>NaturalLanguageDescription. So I have just ignored
them in [[%ATTACHURL%/NovitatesTreatment_1_0.xml][NovitatesTreatment_1_0.xml]]. By design, BDI.SDD_ has only a single
top-level element (<nop>DescriptiveData) which might make it annoying
to use BDI.SDD_ objects embedded in something other than an BDI.SDD_ document,
which is what I have coded here.
Less obscure is that I have omitted the (well-defined) use of
mechanisms for describing special objects like tables, figures, and
images. Normally in BDI.SDD_ these might be treated as part of the
<nop>ExternalDataInterface except that they are embedded in the
document in this case. So possibly they are just another kind of
Concept, though I solicit BDI.SDD_'ers opionion on whether that makes
sense. This might be made moot by suitable mechanisms for embedding
BDI.SDD_ elements in other kinds of documents. Possibly there is a
difficult schema integration problem here.
Finally note two things:
1. [[%ATTACHURL%/NovitatesTreatment_1_0.xml][NovitatesTreatment_1_0.xml]] has had the debugRef tool applied so
that it is easier to see in situ what the Concepts used in
descriptions are since the descriptions just
use references, much akin to foreign keys in a traditional
database. The tool inserts a heuristically chosen text (usually a
mandatory label) from the referenced object---here a Concept---at the
point of its reference.
2. My experience with BDI.SDD_, both in the design and use, has
been with <nop>CodedDescriptions. I hope someone who has attempted NLDs
will comment on how to improve what I've done here.
-- Main.BobMorris - 06 Jan 2005
-- Main.BobMorris - 06 Jan 2005
%META:FILEATTACHMENT{name="NovitatesTreatment_1_0.xml" attr="" comment="Natural Language Coding of a taxonomic treatment in the AMNH legacy literature project" date="1105037309" path="NovitatesTreatment_1_0.xml" size="8744" user="DrewKoning" version="1.2"}%
@
1.4
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="LeeBelbin" date="1258685121" format="1.1" reprev="1.4" version="1.4"}%
d7 1
a7 1
This is my first attempt at an BDI.SDD coding of the Novitates ant
d19 1
a19 1
BDI.SDD. This kind of coding is intended exactly for this purpose, and is
d23 2
a24 2
example, in the document [[%ATTACHURL%/NovitatesTreatment_1_0.xml][NovitatesTreatment_1_0.xml]], I use an BDI.SDD
Terminology with two <nop>ConceptTrees. The first <nop>ConceptTree is a (rather flat) tree of BDI.SDD
d30 1
a30 1
in BDI.SDD, which is more designed to detail how data was
d40 1
a40 1
The essential point of BDI.SDD NLD markup is that subsequent agents acting
d54 1
a54 1
to use BDI.SDD markup as fragments, e.g. via namespaces, of something
d56 1
a56 1
dependence of BDI.SDD on key/keyref mechanisms could make validation
d61 1
a61 1
them in [[%ATTACHURL%/NovitatesTreatment_1_0.xml][NovitatesTreatment_1_0.xml]]. By design, BDI.SDD has only a single
d63 1
a63 1
to use BDI.SDD objects embedded in something other than an BDI.SDD document,
d68 1
a68 1
images. Normally in BDI.SDD these might be treated as part of the
d71 1
a71 1
Concept, though I solicit BDI.SDD'ers opionion on whether that makes
d73 1
a73 1
BDI.SDD elements in other kinds of documents. Possibly there is a
d86 1
a86 1
2. My experience with BDI.SDD, both in the design and use, has
@
1.3
log
@Added topic name via script
@
text
@d1 1
a3 1
%META:TOPICINFO{author="DrewKoning" date="1105037310" format="1.0" version="1.2"}%
d7 1
a7 1
This is my first attempt at an SDD coding of the Novitates ant
d19 1
a19 1
SDD. This kind of coding is intended exactly for this purpose, and is
d23 2
a24 2
example, in the document [[%ATTACHURL%/NovitatesTreatment_1_0.xml][NovitatesTreatment_1_0.xml]], I use an SDD
Terminology with two <nop>ConceptTrees. The first <nop>ConceptTree is a (rather flat) tree of SDD
d30 1
a30 1
in SDD, which is more designed to detail how data was
d40 1
a40 1
The essential point of SDD NLD markup is that subsequent agents acting
d54 1
a54 1
to use SDD markup as fragments, e.g. via namespaces, of something
d56 1
a56 1
dependence of SDD on key/keyref mechanisms could make validation
d61 1
a61 1
them in [[%ATTACHURL%/NovitatesTreatment_1_0.xml][NovitatesTreatment_1_0.xml]]. By design, SDD has only a single
d63 1
a63 1
to use SDD objects embedded in something other than an SDD document,
d68 1
a68 1
images. Normally in SDD these might be treated as part of the
d71 1
a71 1
Concept, though I solicit SDD'ers opionion on whether that makes
d73 1
a73 1
SDD elements in other kinds of documents. Possibly there is a
d86 1
a86 1
2. My experience with SDD, both in the design and use, has
d93 1
a93 1
@
1.2
log
@none
@
text
@d1 2
@
1.1
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="BobMorris" date="1105028512" format="1.0" version="1.1"}%
d93 1
a93 1
%META:FILEATTACHMENT{name="NovitatesTreatment_1_0.xml" attr="" comment="Natural Language Coding of a taxonomic treatment in the AMNH legacy literature project" date="1105028309" path="NovitatesTreatment_1_0.xml" size="8733" user="BobMorris" version="1.1"}%
@