wiki-archive/twiki/data/UBIF/SchemaDiscussion.txt,v

head 1.39;
access;
symbols;
locks; strict;
comment @# @;
1.39
date 2009.11.25.03.14.42; author GarryJolleyRogers; state Exp;
branches;
next 1.38;
1.38
date 2009.11.20.02.35.37; author LeeBelbin; state Exp;
branches;
next 1.37;
1.37
date 2007.03.06.17.30.00; author TWikiGuest; state Exp;
branches;
next 1.36;
1.36
date 2006.05.10.08.57.52; author GregorHagedorn; state Exp;
branches;
next 1.35;
1.35
date 2006.05.08.10.42.50; author GregorHagedorn; state Exp;
branches;
next 1.34;
1.34
date 2005.05.12.04.46.04; author BobMorris; state Exp;
branches;
next 1.33;
1.33
date 2004.08.25.10.51.18; author GregorHagedorn; state Exp;
branches;
next 1.32;
1.32
date 2004.08.13.10.38.30; author GregorHagedorn; state Exp;
branches;
next 1.31;
1.31
date 2004.08.09.15.34.43; author GregorHagedorn; state Exp;
branches;
next 1.30;
1.30
date 2004.07.21.14.42.11; author GregorHagedorn; state Exp;
branches;
next 1.29;
1.29
date 2004.07.19.08.07.00; author GregorHagedorn; state Exp;
branches;
next 1.28;
1.28
date 2004.07.16.10.40.03; author GregorHagedorn; state Exp;
branches;
next 1.27;
1.27
date 2004.07.15.20.25.46; author GregorHagedorn; state Exp;
branches;
next 1.26;
1.26
date 2004.07.15.18.22.00; author GregorHagedorn; state Exp;
branches;
next 1.25;
1.25
date 2004.07.09.11.09.00; author GregorHagedorn; state Exp;
branches;
next 1.24;
1.24
date 2004.06.18.13.35.00; author GregorHagedorn; state Exp;
branches;
next 1.23;
1.23
date 2004.06.15.20.03.42; author RenatoDeGiovanni; state Exp;
branches;
next 1.22;
1.22
date 2004.06.15.10.26.11; author GregorHagedorn; state Exp;
branches;
next 1.21;
1.21
date 2004.06.11.10.14.00; author GregorHagedorn; state Exp;
branches;
next 1.20;
1.20
date 2004.06.11.09.08.11; author GregorHagedorn; state Exp;
branches;
next 1.19;
1.19
date 2004.06.10.08.39.03; author GregorHagedorn; state Exp;
branches;
next 1.18;
1.18
date 2004.06.09.10.10.00; author GregorHagedorn; state Exp;
branches;
next 1.17;
1.17
date 2004.06.02.13.41.00; author GregorHagedorn; state Exp;
branches;
next 1.16;
1.16
date 2004.06.01.17.10.00; author GregorHagedorn; state Exp;
branches;
next 1.15;
1.15
date 2004.06.01.08.41.00; author GregorHagedorn; state Exp;
branches;
next 1.14;
1.14
date 2004.05.30.22.13.22; author GregorHagedorn; state Exp;
branches;
next 1.13;
1.13
date 2004.05.28.12.15.00; author GregorHagedorn; state Exp;
branches;
next 1.12;
1.12
date 2004.05.25.11.04.37; author GregorHagedorn; state Exp;
branches;
next 1.11;
1.11
date 2004.05.25.05.45.07; author GregorHagedorn; state Exp;
branches;
next 1.10;
1.10
date 2004.05.24.10.53.00; author GregorHagedorn; state Exp;
branches;
next 1.9;
1.9
date 2004.05.24.07.54.28; author GregorHagedorn; state Exp;
branches;
next 1.8;
1.8
date 2004.05.22.20.40.42; author DonaldHobern; state Exp;
branches;
next 1.7;
1.7
date 2004.05.22.05.17.20; author DonaldHobern; state Exp;
branches;
next 1.6;
1.6
date 2004.05.21.17.25.00; author GregorHagedorn; state Exp;
branches;
next 1.5;
1.5
date 2004.05.21.12.36.00; author GregorHagedorn; state Exp;
branches;
next 1.4;
1.4
date 2004.05.21.12.30.33; author BobMorris; state Exp;
branches;
next 1.3;
1.3
date 2004.05.21.08.07.33; author DonaldHobern; state Exp;
branches;
next 1.2;
1.2
date 2004.04.19.16.27.49; author GregorHagedorn; state Exp;
branches;
next 1.1;
1.1
date 2004.04.18.13.43.52; author WalterBerendsohn; state Exp;
branches;
next ;
desc
@none
@
1.39
log
@none
@
text
@%META:TOPICINFO{author="GarryJolleyRogers" date="1259118882" format="1.1" version="1.39"}%
%META:TOPICPARENT{name="WebHome"}%
---+!! %TOPIC%
I propose the following features; please discuss specifics under the following topics:
* TopLevelStructure
* DerivationHistory
* ContentMetadata (metadata part)
* ObsoleteTopicProxyDataModel (other name: perhaps <nop>"ExternalDataInterfaces" section...)
* GloballyUniqueIDs (DOI, LSID, etc.)
* EnumeratedValues
Please also comment on the specific topics:
* !NEW UBIFDesignRequirements NEW!
* Name for the project/data set metadata part: ContentMetadataSearchingName
* ExtendLanguageWithNeutralAndUnknown
* CompositeDateTypeTypes
* ProtocolIssues
* PublicationRepresentation
(Resolved topics:)
* Name for proxy data section: ObsoleteProxyListsInAllTdwgGbifStandards (Use <nop>"ExternalDataInterface")
* ResolvedTopicCommonSchemaSearchingName (name for the common infrastructure parts should be UBIF)
* ResolvedTopicDatasetCapitalization (use Dataset instead of <nop>DataSet!)
* ResolvedTopicAttributesLowerOrUpperCase (use all lower case attributes)
-- [[Main.GregorHagedorn][Gregor Hagedorn]] - 19. July 2004
---
(The current schema can be found at BDI.SDD_.CurrentSchemaVersion)
%META:TOPICMOVED{by="GregorHagedorn" date="1089914416" from="SDD.UnifiedBioInfoFramework" to="UBIF.SchemaDiscussion"}%
@
1.38
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="LeeBelbin" date="1258684537" format="1.1" reprev="1.38" version="1.38"}%
d31 1
a31 1
(The current schema can be found at BDI.SDD.CurrentSchemaVersion)
@
1.37
log
@Added topic name via script
@
text
@d1 2
a4 2
%META:TOPICINFO{author="GregorHagedorn" date="1147251472" format="1.1" version="1.36"}%
%META:TOPICPARENT{name="WebHome"}%
d31 1
a31 1
(The current schema can be found at SDD.CurrentSchemaVersion)
@
1.36
log
@none
@
text
@d1 2
@
1.35
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1147084970" format="1.1" version="1.35"}%
d9 1
a9 1
* EnumerationTypes
@
1.34
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="BobMorris" date="1115873164" format="1.0" version="1.34"}%
d4 6
a9 6
* TopLevelStructure
* DerivationHistory
* ContentMetadata (metadata part)
* ProxyDataModel (other name: perhaps <nop>"ExternalDataInterfaces" section...)
* GloballyUniqueIDs (DOI, LSID, etc.)
* EnumerationTypes
d12 6
a17 6
* !NEW UBIFDesignRequirements NEW!
* Name for the project/data set metadata part: ContentMetadataSearchingName
* ExtendLanguageWithNeutralAndUnknown
* CompositeDateTypeTypes
* ProtocolIssues
* PublicationRepresentation
d20 4
a23 4
* Name for proxy data section: UseProxyListsInAllTdwgGbifStandards (Use <nop>"ExternalDataInterface")
* ResolvedTopicCommonSchemaSearchingName (name for the common infrastructure parts should be UBIF)
* ResolvedTopicDatasetCapitalization (use Dataset instead of <nop>DataSet!)
* ResolvedTopicAttributesLowerOrUpperCase (use all lower case attributes)
d31 1
@
1.33
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1093431078" format="1.0" version="1.33"}%
d3 28
a30 26
I propose the following features; please discuss specifics under the following topics:
* TopLevelStructure
* DerivationHistory
* ContentMetadata (metadata part)
* ProxyDataModel (other name: perhaps <nop>"ExternalDataInterfaces" section...)
* GloballyUniqueIDs (DOI, LSID, etc.)
* EnumerationTypes
Please also comment on the specific topics:
* !NEW UBIFDesignRequirements NEW!
* Name for the project/data set metadata part: ContentMetadataSearchingName
* ExtendLanguageWithNeutralAndUnknown
* CompositeDateTypeTypes
* ProtocolIssues
(Resolved topics:)
* Name for proxy data section: UseProxyListsInAllTdwgGbifStandards (Use <nop>"ExternalDataInterface")
* ResolvedTopicCommonSchemaSearchingName (name for the common infrastructure parts should be UBIF)
* ResolvedTopicDatasetCapitalization (use Dataset instead of <nop>DataSet!)
* ResolvedTopicAttributesLowerOrUpperCase (use all lower case attributes)
-- [[Main.GregorHagedorn][Gregor Hagedorn]] - 19. July 2004
---
(The current schema can be found at SDD.CurrentSchemaVersion)
@
1.32
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1092393510" format="1.0" version="1.32"}%
d9 1
a9 1
* EnumerationsTypes
@
1.31
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1092065683" format="1.0" version="1.31"}%
d6 1
a6 1
* SourceMetadata (metadata part)
d13 1
a13 1
* Name for the project/data set metadata part: SourceMetadataSearchingName
@
1.30
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1090420931" format="1.0" version="1.30"}%
d16 1
@
1.29
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1090224420" format="1.0" version="1.29"}%
d27 1
@
1.28
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1089974403" format="1.0" version="1.28"}%
a2 18
This topic is dedicated to the discussion of UBIF (Unified Bioscience Information Framework), a proposal to define a common foundation for several TDWG/GBIF standards like SDD, ABCD or <nop>TaxonNames (see also the [[http://tdwg.napier.ac.uk/phpwiki/index.php/HomePage][Taxonomic Concept Transfer Schema WIKI]]).
It may be helpful if below we could brainstorm which features are considered most important - perhaps with indications of priority - for overarching design patterns across all TDWG schemata. I believe this may be useful even if you are not already intimately acquainted with past SDD or ABCD structures! -- [[Main.GregorHagedorn][Gregor Hagedorn]] - 19 Apr 2004
I think this first question is still valid. It potentially goes beyond what I currently propose! -- [[Main.GregorHagedorn][Gregor Hagedorn]] - 2. June 2004
---
<h2>Overview over UBIF functionality</h2>
It will be desirable to have a common foundation schema for data exchange and integration across knowledge domains. It would be designed for biological data, but applicable to other knowledge areas as well. Its main features would be:
* A top-level structure consisting of a Datasets collection containing independent Dataset objects. The collection is purposely semantically neutral; relations between Dataset objects have to be discovered by the data consumer or are assumed to be implicit in the protocol requesting the data. [Please discuss in TopLevelStructure]
* Derivation metadata that support tracing and debugging the online transformation history of the data. They provide important technical information about access providers and the path of potentially multiple portals involved. [Please discuss in DerivationHistory]
* Source metadata describing the data collection from which the dataset was derived. The dataset may represent the entire data collection or it may be filtered, normalized, or enriched with secondary information. A dataset is never an aggregation of multiple data collection sources with different authorship, copyright, or other IPR; these are assumed to be delivered as separate datasets. Note: Derivation and project metadata together provide all necessary information for UDDI support. [Please discuss in SourceMetadata]
* External data interface (EDI) providing a standard mechanism to link to external data providers for knowledge domains outside of the scope of the current dataset. This includes a collection of supported object linking mechanisms involving globally unique identifiers and resolving mechanisms. Proxy objects can replace links in cases where a specific object is (perhaps not yet) available in an external data source, and they cache a minimalized data interface on the assumption that access is asynchronous, slow, or may be temporarily unavailable. Furthermore, these cached data provide semantic information to human readers, preserving the semantics of a link even if it has become permanently broken. [Please discuss in ProxyDataModel, see also UseProxyListsInAllTdwgGbifStandards which contains a diagram!]
-- [[Main.GregorHagedorn][Gregor Hagedorn]] - 2. June 2004
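The proxy-object idea in the last bullet could be sketched roughly as follows. This is only an illustration under stated assumptions, not part of the proposal: the class =ProxyObject=, the function =display_label=, the resolver callback, and the example LSID are all hypothetical names. The sketch shows a link that carries a cached label, so that a reader still sees something meaningful when the external provider is slow or unreachable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProxyObject:
    """Cached minimal stand-in for an externally held object (hypothetical)."""
    guid: str   # globally unique identifier, e.g. an LSID or DOI
    label: str  # cached human-readable label that survives a broken link


def display_label(proxy: ProxyObject,
                  resolver: Callable[[str], Optional[str]]) -> str:
    """Prefer the provider's current label; fall back to the cached one."""
    try:
        fresh = resolver(proxy.guid)
    except OSError:
        # Provider slow or unreachable: treat it like an unresolved link.
        fresh = None
    return fresh if fresh is not None else proxy.label


def offline_resolver(guid: str) -> Optional[str]:
    """Stand-in resolver that simulates an unavailable external provider."""
    raise ConnectionError("provider unavailable")


# The identifier below is invented for illustration only.
proxy = ProxyObject(guid="urn:lsid:example.org:names:1", label="Ithomia patilla")
print(display_label(proxy, offline_resolver))
```

With a working resolver the fresh label would be shown instead; the cached label is only the fallback.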
---
d9 1
d23 1
a23 3
PS. Please review/comment on/correct the text given above under "Overview over UBIF functionality". I cannot do much more without feedback or contributions...
-- [[Main.GregorHagedorn][Gregor Hagedorn]] - 18. June 2004
@
1.27
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1089923146" format="1.0" version="1.27"}%
d29 1
a29 1
* !NEW SDD.UBIFDesignRequirements NEW!
@
1.26
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1089915720" format="1.0" version="1.26"}%
d26 1
a26 1
* SDD.GloballyUniqueIDs (DOI, LSID, etc.)
@
1.25
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1089371340" format="1.0" version="1.25"}%
d26 1
a26 1
* GloballyUniqueIDs (DOI, LSID, etc.)
d29 1
a29 1
* !NEW UBIFDesignRequirements NEW!
d32 1
d46 1
a46 1
%META:TOPICMOVED{by="GregorHagedorn" date="1086948948" from="SDD.OverarchingPatternsForTdwgSchemata" to="SDD.UnifiedBioInfoFramework"}%
@
1.24
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1087565700" format="1.0" version="1.24"}%
d26 1
d43 2
a44 2
---
@
1.23
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="RenatoDeGiovanni" date="1087329821" format="1.0" version="1.23"}%
d3 1
a3 1
This topic is dedicated to the discussion of UBIF (Unified Bioscience Information Framework), a proposal to define a common foundation for several TDWG/GBIF standards like SDD, ABCD or <nop>TaxonNames.
d10 11
a25 1
* ...
d27 2
a28 1
Please also comment on the specific minor topics:
d38 1
a38 13
-- [[Main.GregorHagedorn][Gregor Hagedorn]] - 1 June 2004
---
Here is an attempt to give an overview over the common schema, as I currently view it (but without having received much review...):
It will be desirable to have a common foundation schema for data exchange and integration across knowledge domains. It would be designed for biological data, but applicable to other knowledge areas as well. Its main features would be:
* A top-level structure consisting of a Datasets collection containing independent Dataset objects. The collection is purposely semantically neutral; relations between Dataset objects have to be discovered by the data consumer or are assumed to be implicit in the protocol requesting the data. [Please discuss in TopLevelStructure]
* Derivation metadata that support tracing and debugging the online transformation history of the data. They provide important technical information about access providers and the path of potentially multiple portals involved. [Please discuss in DerivationHistory]
* Source metadata describing the data collection from which the dataset was derived. The dataset may represent the entire data collection or it may be filtered, normalized, or enriched with secondary information. A dataset is never an aggregation of multiple data collection sources with different authorship, copyright, or other IPR; these are assumed to be delivered as separate datasets. Note: Derivation and project metadata together provide all necessary information for UDDI support. [Please discuss in SourceMetadata]
* External data interface (EDI) providing a standard mechanism to link to external data providers for knowledge domains outside of the scope of the current dataset. This includes a collection of supported object linking mechanisms involving globally unique identifiers and resolving mechanisms. Proxy objects can replace links in cases where a specific object is (perhaps not yet) available in an external data source, and they cache a minimalized data interface on the assumption that access is asynchronous, slow, or may be temporarily unavailable. Furthermore, these cached data provide semantic information to human readers, preserving the semantics of a link even if it has become permanently broken. [Please discuss in ProxyDataModel, see also UseProxyListsInAllTdwgGbifStandards which contains a diagram!]
-- [[Main.GregorHagedorn][Gregor Hagedorn]] - 2. June 2004
d40 1
a40 1
Note: I tried to design the relation between common and specific (SDD, ABCD, <nop>TaxonNames) schemata as two namespaces but ran into a DesignWith2NamespacesDoesNotValidate problem!
d42 2
a43 1
-- [[Main.GregorHagedorn][Gregor Hagedorn]] - 9. June 2004
@
1.22
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1087295171" format="1.0" version="1.22"}%
d3 1
a3 1
This topic is dedicated to the discussion of UBIF (Unified Bioscience Information Framework), a proposal to define a common foundation for several TDWG/GBIF standards like SDD, ABCD or <nop>TaxonNames.
d37 1
a37 1
* External data interface (EDI) providing a standard mechanism to link to external data providers for knowledge domains outside of the scope of the current dataset. This includes a collection of supported object linking mechanisms involving globally unique identifiers and resolving mechanisms. Proxy objects can replace links in cases where a specific object is (perhaps not yet) available in an external data source, and they cache a minimalized data interface on the assumption that access is asynchronous, slow, or may be temporarily unavailable. Furthermore, these cached data provide semantic information to human readers, preserving the semantics of a link even if it has become permanently broken. [Please discuss in ProxyDataModel, see also UseProxyListsInAllTdwgGbifStandards which contains a diagram!]
@
1.21
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1086948840" format="1.0" version="1.21"}%
d13 1
a13 1
* ProjectDefinition (metadata part)
d36 1
a36 1
* Source metadata describing the data collection from which the dataset was derived. The dataset may represent the entire dataset or it may be filtered, normalized, or enriched with secondary information. A dataset is never an aggregation of multiple data collection sources with different authorship, copyright, or other IPR; these are assumed to be delivered as separate datasets. Note: Derivation and project metadata together provide all necessary information for UDDI support. [Please discuss in ProjectDefinition]
@
1.20
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1086944891" format="1.0" version="1.20"}%
d3 3
a5 1
As a basis for the discussion of overarching issues that are applicable to several TDWG standards, it may be helpful if below we could brainstorm which features are considered most important - perhaps with indications of priority - for overarching design patterns across all TDWG schemata. I believe this may be useful even if you are not already intimately acquainted with past SDD or ABCD structures! The SDD schema is under CurrentSchemaVersion. -- [[Main.GregorHagedorn][Gregor Hagedorn]] - 19 Apr 2004
a17 1
* Name for the common infrastructure parts: ResolvedTopicCommonSchemaSearchingName
a18 1
* Name for proxy data section: UseProxyListsInAllTdwgGbifStandards
d22 2
d36 1
a36 1
* Project metadata describing the data collection from which the dataset was derived. The dataset may represent the entire dataset or it may be filtered, normalized, or enriched with secondary information. A dataset is never an aggregation of multiple data collection sources with different authorship, copyright, or other IPR; these are assumed to be delivered as separate datasets. Note: Derivation and project metadata together provide all necessary information for UDDI support. [Please discuss in ProjectDefinition]
d44 1
@
1.19
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1086856743" format="1.0" version="1.19"}%
d16 1
a16 1
* Name for the common infrastructure parts: CommonInfrastructureSearchingName
@
1.18
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1086775800" format="1.0" version="1.18"}%
d17 1
a17 1
* Name for the project/data set metadata part: DatasetMetadataSearchingName
@
1.17
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1086183660" format="1.0" version="1.17"}%
d37 5
a41 2
-- [[Main.GregorHagedorn][Gregor Hagedorn]] - 2. June 2004
@
1.16
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1086109800" format="1.0" version="1.16"}%
d3 1
a3 5
Overarching issues applicable to several TDWG standards
As a basis for the discussion, it may be helpful if below we could brainstorm which features are considered most important - perhaps with indications of priority - for overarching design patterns across all TDWG schemata. I believe this may be useful even if you are not already intimately acquainted with past SDD or ABCD structures! The SDD schema is under CurrentSchemaVersion.
-- [[Main.GregorHagedorn][Gregor Hagedorn]] - 19 Apr 2004
d5 1
d8 1
a8 2
I am trying to split this into the following topics:
d11 2
a12 1
* ProxyDataModel
d21 1
a21 1
Resolved topics:
d28 11
@
1.15
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1086079260" format="1.0" version="1.15"}%
d5 1
a5 1
As a basis for the discussion, it may be helpful if below we could brainstorm which features are considered most important - perhaps with indications of priority - for overarching design patterns across all TDWG schemata. I believe this may be useful even if you are not already intimately acquainted with past SDD or ABCD structures!
d25 2
a26 2
* ResolvedTopicDatasetCapitalization (use Dataset instead of <nop>DataSet?)
* ResolvedTopicAttributesLowerOrUpperCase
d28 1
a28 1
-- [[Main.GregorHagedorn][Gregor Hagedorn]] - 28 May 2004
@
1.14
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1085955202" format="1.0" version="1.14"}%
d19 2
a20 1
* Name for CommonInfrastructureSearchingName
d30 1
a30 2
---
@
1.13
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1085746500" format="1.0" version="1.13"}%
d19 1
a19 1
* Name for CommonInfrastructureNameForStandard
@
1.12
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1085483077" format="1.0" version="1.12"}%
d7 3
a9 1
-- Gregor Hagedorn - 19 Apr 2004
d20 1
a20 1
* DatasetCapitalization (use Dataset instead of <nop>DataSet?)
a21 1
* AttributesLowerOrUpperCase
d23 7
a29 1
-- Gregor Hagedorn - 24 Apr 2004
@
1.11
log
@none
@
text
@d1 2
a2 2
%META:TOPICINFO{author="GregorHagedorn" date="1085463907" format="1.0" version="1.11"}%
%META:TOPICPARENT{name="SDD2004Berlin"}%
d22 2
a23 1
-- Gregor Hagedorn - 24 Apr 2004
@
1.10
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1085395980" format="1.0" version="1.10"}%
d20 1
@
1.9
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1085385268" format="1.0" version="1.9"}%
d9 13
a21 155
Following from the meeting in Berlin, here is my attempt to document what we are trying to achieve at the top level of the TDWG schemata. Our goal is to bring all of these standards to the point at which the document structure is as follows (using "Header", "Body" and "XxxMetadata" as placeholder names for now). This is an example which could possibly be valid but for which I have no use case at this point, including an SDD and an ABCD data set in the same document:
<verbatim>
<DataSets>
  <DataSet>
    <Header> <!-- Contains metadata elements common to all standards -->
      <TransformationHistory/>
      <ProjectMetadata/>
    </Header>
    <Body> <!-- Contains all schema specific metadata and data elements -->
      <SDDMetadata/> <!-- Whatever SDD metadata remains from Header -->
      <GeneralDeclarations/>
      <Entities/>
      <Resources/>
      <Terminology/>
      <Descriptions/>
    </Body>
  </DataSet>
  <DataSet>
    <Header> <!-- Contains metadata elements common to all standards -->
      <TransformationHistory/>
      <ProjectMetadata/>
    </Header>
    <Body> <!-- Contains all schema specific metadata and data elements -->
      <ABCDMetadata/> <!-- Whatever ABCD metadata remains from Header -->
      <Units/>
    </Body>
  </DataSet>
</DataSets>
</verbatim>
The exact selection of top-level elements to be included in the "Header" (and those which would remain standard-specific) is clearly undefined at this point (and there could be a future migration of extra elements into the common element). The Body element could be a type which gets extended or a substitution group according to what seems most manageable. Header is clearly superfluous but may help to clarify the structure.
Does this match with what others are expecting? Do we need to separate this off now as a separate TDWG activity?
-- Main.DonaldHobern - 21 May 2004
---
1. I would prefer to keep it here and invite everybody to join the WIKI. This discussion is working. I added the topic to our top-level topic (SDD.WebHome).
2. I prefer to omit the Header and Body sections. I think little is gained by them. Furthermore, <nop>GeneralDeclarations and Resources (i.e. the <nop>ProxyData for references to Agents, Media, Geography, Publication databases) should be shared data types (used by multiple data standards) as well. However, they are not really "Header" information. So the distinction between metadata header and data body is NOT congruent with the distinction between common/shared data types and specific types.
<verbatim>
<DataSets>
  <DataSet>
    <TransformationHistory/>
    <ProjectMetadata/>
    <SDDMetadata/> <!-- Whatever SDD metadata remains from Header -->
    <GeneralDeclarations/>
    <Entities/>
    <Resources/>
    <Terminology/>
    <Descriptions/>
  </DataSet>
  <DataSet>
    <TransformationHistory/>
    <ProjectDefinition/>
    <ABCDMetadata/>
    <Units/>
  </DataSet>
</DataSets>
</verbatim>
-- Gregor Hagedorn - 21 May 2004
---
To Donald Hobern's question: Does this match with what others are expecting? Do we need to separate this off now as a separate TDWG activity? I reply: yes and probably. I also wonder whether in future we might even need something like this:
<verbatim>
<gbif:Envelope xmlns="gbif" xmlns:gbif="http://gbif.org/ns/2004-001">
  <Envelope-TransformationHistory>
  <tdwg:DataSets xmlns="tdwg" xmlns:tdwg="blahblah">
    <DataSets-TransformationHistory>
    <DataSet>
      <Header>
        <DataSet-TransformationHistory>
        ...
      </Header>
      <Body>
        ...
    </DataSet>
    ...
  </tdwg:DataSets>
  <somebodyElse:DataSets xmlns:somebodyElse="...">
    ...
  </somebodyElse:DataSets>
</gbif:Envelope>
</verbatim>
This may be overkill, but it recognizes these issues, which may or may not be of concern (perhaps because they sometimes duplicate concerns of messaging-level layers, e.g. SOAP):
* simple service aggregation, e.g. response to "Send me everything you can discover about Ithomia patilla"
* tracking forwarding through agents that don't modify content in any deep way (e.g. "I normalized the address from 'Bob Morris at UMASS-Boston' to 'urn:email:ram@@cs.umb.edu' and used 'urn:serviceAddress:smtp.cs.umb.edu' to deliver it.")
* memorializing wholesale <nop>DataSets removal, e.g. "I have removed 12 <tdwg:DataSets> objects"
-- Bob Morris - 21 May 2004
---
Overkill... Your three-level structure with transformations on each level (Envelope, Datasets, Dataset) is not the end. You can have a common transformation on envelopes obtained from different sources. So the only solution would be to have a tree of Dataset objects, each node of which can define a transformation.
I think it is sufficient that, if you receive two Datasets collections with two Dataset objects each (A, B, C, D), you output one Datasets collection containing A, B, C, and D and add your transformation to each of their histories. Note that if different datasets come from different sources or have different formats (1 ABCD, 2 SDD, 1 <nop>TaxonNames), different software agents may derive them. See the following two images:<br />
<img src="%ATTACHURLPATH%/TransformationsCombined.png" width="482" height="258" alt="Combined -&gt; tree" /><br />
(= this is Bob's DataSets-TransformationHistory)<br />
<img src="%ATTACHURLPATH%/TransformationsSeparate.png" width="482" height="237" alt="Separate -&gt; remains flat" /><br />
(= here all transformations are reduced to atomic DataSet-Transformations)<br />
I think you can do all your three bullets just with the simple solution.
See also the topics DerivationHistory and DatasetCapitalization!
-- Gregor Hagedorn - 21 May 2004
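The flat merge described above (two incoming collections, one outgoing collection, the merging agent recorded in every history) might look roughly like the following sketch. It is an illustration only: the child element =Transformation=, its =agent= attribute, the =id= attributes, and the agent name ="portal-x"= are assumptions made for the example and are not defined anywhere in this discussion.

```python
import xml.etree.ElementTree as ET


def merge_datasets(collections, agent):
    """Flatten several DataSets collections into one and record `agent`
    in the TransformationHistory of every DataSet (hypothetical layout)."""
    merged = ET.Element("DataSets")
    for collection in collections:
        for dataset in collection.findall("DataSet"):
            history = dataset.find("TransformationHistory")
            if history is None:
                # Assumption: a missing history is created on the fly.
                history = ET.Element("TransformationHistory")
                dataset.insert(0, history)
            # "Transformation" and its "agent" attribute are invented names.
            ET.SubElement(history, "Transformation", {"agent": agent})
            merged.append(dataset)
    return merged


first = ET.fromstring(
    "<DataSets>"
    "<DataSet id='A'><TransformationHistory/></DataSet>"
    "<DataSet id='B'><TransformationHistory/></DataSet>"
    "</DataSets>"
)
second = ET.fromstring(
    "<DataSets>"
    "<DataSet id='C'><TransformationHistory/></DataSet>"
    "<DataSet id='D'><TransformationHistory/></DataSet>"
    "</DataSets>"
)
merged = merge_datasets([first, second], agent="portal-x")
print([dataset.get("id") for dataset in merged])
```

The result stays flat: no history is kept on the collection itself, only on the individual DataSet objects.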
I agree with Gregor that we can simply merge different <nop>DataSets objects into one when we need to. They have no other purpose and have no metadata of their own (that is in each of the <nop>DataSet objects).
I would also be happy for us to lose the Header envelope from my model, but I am not so sure about the Body. It provides a simple place to do other things like adding constraints for the data elements in the <nop>DataSet. We could extend the <nop>DataSet itself but this feels cleaner to me (perhaps just a matter of choice).
-- Main.DonaldHobern - 22 May 2004
Donald is right, the constraints (especially identity constraints if key/keyref linking is used) are a problem. There are no problems with Derivations or Transformations, they are constraint free. However, depending on the design, there may already be a constraint involving the globally unique <nop>ProjectName/ID in <nop>ProjectMetadata. This occurs if a combined globally unique key is constructed from <nop>Project/Dataset ID + local object keys -- we may choose to fully change all local keys instead, making them immediately global, so that no constraint would have to point into <nop>ProjectMetadata.
However, in the current SDD design, the <nop>ProjectMetadata uses Resource objects, e.g. to define the geographical, taxonomic, and publication scope, <nop>ProjectIcon, or the agents owning or editing the entire project datasets. These are keyrefs pointing to Resources/Geography, <nop>Entities/ClassNames, Resources/Publications, <nop>Resources/MediaResources and Resources/Agents.
I hope that the different TDWG/GBIF standard groups can agree on a common Proxy-Model to refer to the reciprocal and external data. If so the <nop>ProxyData could become part of the header, in which case it would be easy to have:
<verbatim>
<DataSets>
  <DataSet><!-- define xs:key to external resources/proxy data here -->
    <TransformationHistory/>
    <ProjectMetadata/>
    <GeneralDeclarations/><!-- This may or may not be here -->
    <ProxyData/><!-- including the elements currently in SDD.Entities and SDD.Resources -->
    <SDD><!-- define xs:key to SDD-specifics like states, glossary, etc. here -->
      <SDDMetadata/> <!-- Actually, at the moment we have none, only SDD.ConfigurationData -->
      <Terminology/>
      <CodedDescriptions/>
      <IdentificationKeys/>
    </SDD>
  </DataSet>
  <DataSet>
    <TransformationHistory/>
    <ProjectDefinition/>
    <ABCD>
      <Metadata/>
      <Units/>
    </ABCD>
  </DataSet>
</DataSets>
</verbatim>
The design really depends on a discussion on the common use of a ProxyDataModel (which does not necessarily mean the currently proposed SDD.model). However, we are lacking a discussion here! At the moment I know that Walter Berendsohn is reluctant to adopt it because it makes the model more complicated, and I have not yet had any feedback from Jessie Kennedy on it... Also, this is really a topic the GBIF DADI Science Subcommittee should comment upon. Anybody else's insight into the general architectural implications of defining such a generalized interface between knowledge and data domains, or into the specifics of the proposed linking mechanisms, is just as welcome! (Please use the ProxyDataModel topic for this discussion.)
-- Gregor Hagedorn - 24 May 2004
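To make the scoping of identity constraints concrete, here is a rough sketch of a check that treats keys as local to one DataSet, in the way an xs:key defined on the DataSet element would. It does not use XML Schema itself, and the attribute names =key= and =ref= as well as the sample elements =Agent=, =Description=, and =Unit= are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET


def dangling_refs(dataset):
    """Return ref values in one DataSet that match no key in its ProxyData."""
    # Keys are collected per DataSet, mirroring a key scoped to that element.
    keys = {proxy.get("key") for proxy in dataset.findall("ProxyData/*")}
    return [
        element.get("ref")
        for element in dataset.iter()
        if element.get("ref") is not None and element.get("ref") not in keys
    ]


document = ET.fromstring(
    "<DataSets>"
    "<DataSet>"
    "<ProxyData><Agent key='a1'/></ProxyData>"
    "<SDD><CodedDescriptions><Description ref='a1'/></CodedDescriptions></SDD>"
    "</DataSet>"
    "<DataSet>"
    "<ProxyData><Agent key='a2'/></ProxyData>"
    "<ABCD><Units><Unit ref='a1'/></Units></ABCD>"
    "</DataSet>"
    "</DataSets>"
)
for index, dataset in enumerate(document.findall("DataSet")):
    print(index, dangling_refs(dataset))
```

In this sample the second DataSet refers to a key that exists only in the first one, so the reference is reported as dangling; this is the behaviour one would expect if keys are never shared across DataSet objects.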
%META:FILEATTACHMENT{name="TransformationsCombined.png" attr="h" comment="" date="1085161946" path="C:\Data\Desktop\TransformationsCombined.png" size="7316" user="GregorHagedorn" version="1.1"}%
%META:FILEATTACHMENT{name="TransformationsSeparate.png" attr="h" comment="" date="1085162000" path="C:\Data\Desktop\TransformationsSeparate.png" size="8452" user="GregorHagedorn" version="1.1"}%
@
1.8
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="DonaldHobern" date="1085258442" format="1.0" version="1.8"}%
d15 1
a15 1
<ProjectDefinition/>
d29 1
a29 1
<ProjectDefinition/>
d55 1
a55 1
<ProjectDefinition/>
d108 1
a108 1
Overkill! Your 3 level structure with transformations on each of these (Envelope, Datasets, Dataset) is not the end. You can have a common transformation on envelopes obtained from different sources. So the only solution would be to have a tree of Dataset objects, each node of which can define a transformation.
d112 1
d114 1
d121 1
a121 1
I agree with Gregor that we can simply merge different <nop>DataSets objects into one when we need to. They have no other purpose and have no metadata of their own (that is in each of the <nop>DataSet objects).
d127 35
@
1.7
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="DonaldHobern" date="1085203040" format="1.0" version="1.7"}%
d119 1
a119 1
I agree with Gregor that we can simply merge different DataSets objects into one when we need to. They have no other purpose and have no metadata of their own (that is in each of the DataSet objects).
d121 3
a123 1
I would also be happy for us to lose the Header envelope from my model, but I am not so sure about the Body. It provides a simple place to do other things like adding constraints for the data elements in the DataSet. We could extend the DataSet itself but this feels cleaner to me (perhaps just a matter of choice).
a124 2
-- Main.DonaldHobern - 22 May 2004
@
1.6
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1085160300" format="1.0" version="1.6"}%
d119 6
@
1.5
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1085142960" format="1.0" version="1.5"}%
d75 1
a75 2
To the Donald Hobern question: Does this match with what others are expecting? Do we need to separate this off now as a separate TDWG activity?
I reply: yes and probably. I also wonder if in future we might even need something like this:
d84 1
d106 15
a120 2
---
@
1.4
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="BobMorris" date="1085142633" format="1.0" version="1.4"}%
d3 1
a3 1
I would prefer to discuss the over-arching topics ("Overarching issues applicable to several TDWG standards" and "Federated usage scenarios") on a single day. I think this merits an entire day, without presentations. The explicit aim for the Metadata part should be to make recommendations for all emerging TDWG/GBIF standards.
d5 1
a5 5
-- Main.WalterBerendsohn - 18 Apr 2004
I agree. The federation issues depend to some extent on a basic understanding of SDD, but the problematic questions can probably be discussed with a minimum understanding of SDD (or assuming SDD = DELTA). I changed the [[SDD2004Berlin][Agenda for review meeting in Berlin]] accordingly.
(As a basis for the discussion, it may be helpful if below we could provide a brainstorm of which features are considered most important - perhaps with indications of priority - for overarching design patterns across all TDWG schemata. I believe it may even be useful here if you are not already intimately acquainted with past SDD or ABCD structures!)
a19 1
<Terminology/>
d22 1
a45 1
To the Main.DonaldHobern - 21 May 2004 question: Does this match with what others are expecting? Do we need to separate this off now as a separate TDWG activity?
d47 30
a76 1
I reply: yes and probably. I also wonder if in future we might even need something like this:
d99 1
a99 7
This may be overkill, but it recognizes these issues, which may or may not be of concern (perhaps because they sometimes duplicate concerns of messaging-level (e.g. SOAP) layers):
* simple service aggregation, e.g. response to "Send me everything you can discover about _Ithomia patilla_"
* tracking forwarding through agents that don't modify content in any deep way (e.g. "I normalized the address from 'Bob Morris at UMASS-Boston' to 'urn:email:ram@@cs.umb.edu' and used 'urn:serviceAddress:smtp.cs.umb.edu' to deliver it.")
* memorializing wholesale <nop>DataSets removal, e.g. "I have removed 12 &lt;tdwg:DataSets> objects"
d101 7
@
1.3
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="DonaldHobern" date="1085126853" format="1.0" version="1.3"}%
d47 36
a82 2
-- Main.DonaldHobern - 21 May 2004
@
1.2
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="GregorHagedorn" date="1082392069" format="1.0" version="1.2"}%
d13 29
a41 1
---
d43 6
@
1.1
log
@none
@
text
@d1 1
a1 1
%META:TOPICINFO{author="WalterBerendsohn" date="1082295831" format="1.0" version="1.1"}%
d3 12
a14 1
I would prefer to discuss the over-arching topics ("Overarching issues applicable to several TDWG standards" and "Federated usage scenarios") on a single day. I think this merits an entire day, without presentations. The explicit aim for the Metadata part should be to make recommendations for all emerging TDWG/GBIF standards. -- Main.WalterBerendsohn - 18 Apr 2004
@