%META:TOPICINFO{author="AnnieSimpson" date="1142378227" format="1.0" version="1.1"}%
%META:TOPICPARENT{name="TechnicalImplementationWorkingGroup"}%
TECH GROUP
2006-02-22 - Day 2
Recommendations indexed by priority number
Michael
Structure issue
Taxon/Location
Location/Taxon
Location + Taxon + Relations
What to do with it?
Checklist Structure
Bob (BM): This is an XSLT issue.
Markus (MD): XSLT is fast
Option 1 (Taxon/Location):
<verbatim>
<taxon id='T1'>
  <location id='L1'/>
  <location id='L2'/>
</taxon>
</verbatim>
Option 2 (Location/Taxon):
<verbatim>
<location id='L1'>
  <taxon id='T1'/>
  <taxon id='T2'/>
</location>
</verbatim>
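BM's point that this is an XSLT issue can be illustrated with a minimal Python sketch that uses ElementTree instead of XSLT to regroup an Option 2 (Location/Taxon) document as Option 1 (Taxon/Location). It assumes a hypothetical checklist root element and taxon/location elements identified by an id attribute; a real schema would differ.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def location_to_taxon(xml_text: str) -> str:
    """Regroup a Location/Taxon checklist (Option 2) as Taxon/Location (Option 1).

    Assumes a hypothetical <checklist> root whose <location id=...> children
    contain <taxon id=...> elements.
    """
    root = ET.fromstring(xml_text)
    # taxon id -> ids of the locations where it occurs, in document order
    locations_by_taxon = defaultdict(list)
    for location in root.findall("location"):
        loc_id = location.get("id")
        for taxon in location.findall("taxon"):
            seen = locations_by_taxon[taxon.get("id")]
            if loc_id not in seen:  # do not repeat a location under one taxon
                seen.append(loc_id)

    out = ET.Element("checklist")
    for taxon_id, loc_ids in locations_by_taxon.items():
        taxon_el = ET.SubElement(out, "taxon", id=taxon_id)
        for loc_id in loc_ids:
            ET.SubElement(taxon_el, "location", id=loc_id)
    return ET.tostring(out, encoding="unicode")
```

For a document where L1 lists T1 and T2 and L2 lists T1, the result has T1 with locations L1 and L2, and T2 with location L1.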
| | *Advantages* | *Drawbacks* |
| 1. Taxon/Location | Natural choice, human-readable, SD's final choice (after trials), matches fact sheets | Repetition |
| 2. Location/Taxon | Matches checklists, matches the source, can be converted to Option 1 with XSLT | Repetition, locations are harder to manage |
| 3. Relational | Smaller size, matches database structure, more flexibility | Keys must be managed, requires key generation (data is not normalized), needs more application tools, input is difficult to process |
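The key-generation drawback of Option 3 can be sketched as follows: denormalized (taxon, location) rows, as they might arrive from a spreadsheet, are split into two keyed tables plus a link table. This is a hypothetical Python sketch; the T1/L1 key format mirrors the sample XML, and names are matched only after trimming whitespace, so spelling variants would still produce duplicate keys.

```python
def normalize(rows):
    """Split denormalized (taxon name, location name) rows into keyed tables.

    Returns (taxa, locations, links): two name -> generated-key dicts and a
    list of (taxon key, location key) pairs without duplicates.
    """
    taxa: dict[str, str] = {}
    locations: dict[str, str] = {}
    links: list[tuple[str, str]] = []
    for taxon_name, location_name in rows:
        # setdefault stores a new key only the first time a name is seen
        t_key = taxa.setdefault(taxon_name.strip(), f"T{len(taxa) + 1}")
        l_key = locations.setdefault(location_name.strip(), f"L{len(locations) + 1}")
        if (t_key, l_key) not in links:
            links.append((t_key, l_key))
    return taxa, locations, links


rows = [
    ("Rattus rattus", "Hawaii"),
    ("Rattus rattus ", "Guam"),  # trailing space: same taxon after trimming
    ("Sus scrofa", "Hawaii"),
]
taxa, locations, links = normalize(rows)
print(taxa)       # {'Rattus rattus': 'T1', 'Sus scrofa': 'T2'}
print(locations)  # {'Hawaii': 'L1', 'Guam': 'L2'}
print(links)      # [('T1', 'L1'), ('T1', 'L2'), ('T2', 'L1')]
```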
*MB: Would Expertise fit as a top-level element?*
*BM: It is likely to have keys/IDs*
Hannu (HS): The data provider could map the data into the desired structure
What would be the impact on the checklist?
Shawn (SD): Keep it relational
Fact sheet = about one group of species + data from the provider
This is only weakly location-dependent
Fact sheets can vary a lot
HS: The GBIF data repository tool can be used to process your spreadsheet data.
BM: Tools almost exist
DS: Not easy to normalize the actual data
Risk of duplication / Matching problems
BM: If we accept spreadsheets, we must handle:
- indentation
- multiple items in the same cell (concatenated items)
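Both spreadsheet problems BM lists can be handled in a small parser. This Python sketch assumes a hypothetical two-level layout exported as CSV, with indentation surviving the export as leading spaces: an unindented first cell starts a parent (for example a location), an indented first cell is a child (for example a taxon) of the last parent, and the second cell may hold several items joined by semicolons. Real sheets would need conventions agreed with providers.

```python
import csv
import io
import re

# Assumed delimiter for several items concatenated in one cell.
ITEM_SEPARATOR = re.compile(r"\s*;\s*")


def split_cell(cell: str) -> list[str]:
    """Split a cell such as 'black rat; ship rat' into separate items."""
    return [item for item in ITEM_SEPARATOR.split(cell.strip()) if item]


def parse_indented(text: str) -> list[tuple[str, str, list[str]]]:
    """Return (parent, child, items) triples from an indented two-level sheet."""
    triples = []
    parent = None
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip():
            continue  # skip blank rows
        first = row[0]
        if first == first.lstrip():
            parent = first.strip()  # unindented: start a new parent
            continue
        if parent is None:
            raise ValueError(f"indented row before any parent: {first!r}")
        items = split_cell(row[1]) if len(row) > 1 else []
        triples.append((parent, first.strip(), items))
    return triples
```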
MD: Option 1 is more human-readable
Wouter (WA): How do you extend the model if you choose Option 1?
SD: Define a unique identification process
Location can be anything, which is difficult ==> Species (Option 1)
BM/SD: Scientific name is not a good key ==> need better identification
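One possible form of the unique identification process SD asks for is to mint identifiers from a normalized name plus its authority, so that formatting variants of the same name map to one key. This Python sketch uses uuid5 with a hypothetical namespace URL. It does not remove BM and SD's concern: synonyms and misapplied names would still get different or wrong keys, so a curated name lookup would remain necessary.

```python
import uuid

# Hypothetical namespace; any fixed UUID works, as long as it never changes.
TAXON_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/taxon-ids")


def normalize_name(scientific_name: str, authority: str = "") -> str:
    """Collapse whitespace and case so formatting variants compare equal."""
    return " ".join(f"{scientific_name} {authority}".split()).casefold()


def taxon_id(scientific_name: str, authority: str = "") -> str:
    """Deterministic identifier: the same normalized input always gives the same id."""
    return str(uuid.uuid5(TAXON_NAMESPACE, normalize_name(scientific_name, authority)))
```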
HS: It will take 10 more years to get a good naming system
HS: People will need to design a back-end database
BM: look at exercises ==> both Taxon (=1) & Location (=3)
IAS experts work in a specific location
MD: Favours Option 2.
_Taxon specific could be managed worldwide_
BM: Extensibility?
Problem if the XML uses IDREF references
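The IDREF concern can be made concrete: once documents are extended, split or merged, a reference may no longer point at an element that exists. This small Python check lists such dangling references. The attribute names id and ref are assumptions; ElementTree does not know which attributes a DTD or schema declares as ID/IDREF, so they have to be given explicitly.

```python
import xml.etree.ElementTree as ET


def dangling_refs(xml_text: str, id_attr: str = "id", ref_attr: str = "ref") -> list[str]:
    """Return the sorted reference values that match no id in the same document."""
    root = ET.fromstring(xml_text)
    ids = {el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None}
    refs = {el.get(ref_attr) for el in root.iter() if el.get(ref_attr) is not None}
    return sorted(refs - ids)
```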
ODM: What will we do with the data?
SD/MD/HS/BM: portals can be built
WA: Problem using spreadsheets / people mess them up
HS: Give users a tool to generate correct XML
BM: Excel file with comments & VBS = Solution
HS: Repository tool to validate input spreadsheets
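A validating repository tool, as HS suggests, could start with checks like these before converting a spreadsheet to XML. This Python sketch validates a CSV export; the column names and the biostatus vocabulary are hypothetical placeholders, not an agreed list, and the row numbers assume no cell spans several lines.

```python
import csv
import io

REQUIRED_COLUMNS = ("scientific_name", "location")            # hypothetical
ALLOWED_BIOSTATUS = {"", "native", "introduced", "invasive"}  # hypothetical


def validate(text: str) -> list[str]:
    """Return human-readable problems found in a CSV export (empty list = OK)."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]
    errors = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column in REQUIRED_COLUMNS:
            if not (row[column] or "").strip():
                errors.append(f"row {line_no}: empty {column}")
        status = (row.get("biostatus") or "").strip().lower()
        if status not in ALLOWED_BIOSTATUS:
            errors.append(f"row {line_no}: unknown biostatus {status!r}")
    return errors
```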
---+++1. Strategy
MB: This is 100% technical
BM: Is there any restriction in the checklist?
MB: Use scientific names or common names + synonyms
BM: There is variation in the use of names
Best practices manuals
MD: ABCD is different: Higher taxon
MB: This is better. Let's use the ABCD taxonomic type
Add synonyms to the schema
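Adding synonyms and common names implies a taxon record that can be found by any of its names. This is a hypothetical Python data structure, not the ABCD taxonomic type itself; because a common name can be shared by several taxa, a lookup returns a list rather than a single taxon.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Taxon:
    scientific_name: str
    higher_taxon: Optional[str] = None  # e.g. a family name
    synonyms: list[str] = field(default_factory=list)
    common_names: list[str] = field(default_factory=list)


class NameIndex:
    """Find taxa by scientific name, synonym or common name (case-insensitive)."""

    def __init__(self, taxa: list[Taxon]):
        self._by_name: dict[str, list[Taxon]] = defaultdict(list)
        for taxon in taxa:
            names = [taxon.scientific_name, *taxon.synonyms, *taxon.common_names]
            # a set, so each taxon is indexed once per distinct normalized name
            for key in {n.strip().casefold() for n in names}:
                self._by_name[key].append(taxon)

    def resolve(self, name: str) -> list[Taxon]:
        # .get avoids inserting empty entries into the defaultdict
        return list(self._by_name.get(name.strip().casefold(), []))
```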
MB/BM: Biostatus could refer to the world
MD: Can we add DateOfLastObservation (for extinct species)?
MB: Please refer to ProjectOrCaseStudy
BM: We should not specialize any of the types. Leave them optional
BM: biostatus & optional facts
but biostatus is part of facts. Not a good idea
MD: biostatus is generic?
BM: We need to rearrange facts/biostatus ==> HS: later
---+++Defer 1, 2 & 3
SD: Strategy is done
_HS: Need to look at use cases_
_BM: Need scenarios (several hundred)_
---+++DC/ABCD Extensions
BM: No more primary data
HS: Providers are keen to share primary data
SD/MD/HS: Some special data
HS: How do we support this additional unique information?
_MD: It could be recorded as a fact_
ABCD/EGS (Extended for GeoScientists)
_MD: Extension to a fact_
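Recording unusual provider data as a fact amounts to a generic typed record. In this hypothetical Python sketch, biostatus is simply one fact type among others, and DateOfLastObservation (raised by MD) is shown as another; the fact types, fields and values are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Fact:
    fact_type: str                  # e.g. "biostatus" or a provider-defined type
    value: str
    location: Optional[str] = None  # None when the fact is not location-dependent
    source: Optional[str] = None    # which provider supplied it


@dataclass
class SpeciesRecord:
    scientific_name: str
    facts: list[Fact] = field(default_factory=list)

    def facts_of(self, fact_type: str) -> list[Fact]:
        """All facts of one type; unknown types simply give an empty list."""
        return [f for f in self.facts if f.fact_type == fact_type]


# Hypothetical values, for illustration only.
record = SpeciesRecord("Boiga irregularis", [
    Fact("biostatus", "invasive", location="Guam"),
    Fact("DateOfLastObservation", "2005-11-30", location="Guam", source="provider A"),
])
```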
---+++Architecture
MD: List of supported operations (TAPIR):
- Get metadata about the host
- Capabilities (more technical description of information/functions available)
- Inventory operations (paging): list of distinct values / optional count/group by
- Search (paging):
- Get full / subset of schema
- (HS) Statistics: could be interesting
- Log request: proposal
- Ping
- Read-only: only accept annotations
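These operations travel as plain HTTP GET requests. A minimal Python sketch of building such request URLs follows. The endpoint and concept identifier are hypothetical, and the parameter names (op, concept, start, limit) are assumed to follow TAPIR's key-value-pair encoding; they should be checked against the TAPIR specification before use.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://provider.example.org/tapir"  # hypothetical data provider


def tapir_url(op: str, **params) -> str:
    """Build a GET URL for one operation; parameters set to None are omitted."""
    query = {"op": op, **{k: v for k, v in params.items() if v is not None}}
    return f"{ENDPOINT}?{urlencode(query)}"


def inventory_urls(concept: str, total: int, page_size: int) -> list[str]:
    """Page URLs for an inventory whose total size is already known
    (for example from an earlier request that asked for a count)."""
    return [
        tapir_url("inventory", concept=concept, start=start, limit=page_size)
        for start in range(0, total, page_size)
    ]


def query_of(url: str) -> dict[str, list[str]]:
    """Decode a URL's query string, e.g. to inspect or log a request."""
    return parse_qs(urlsplit(url).query)
```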
There is existing software
DIGIR does not support our schema
HS: "BioCase" protocol
MD: TAPIR is a messaging protocol
BM: Document data quality
SD: If I need to share a database
With DIGIR/BioCase/TAPIR (HS: about an hour setup time)
- Install PHP or other (ZOPE, Perl, XML, No ASP)
- Install DIGIR / provider software
- Define metadata for provider
- Map local DB (R/W) to DC schema (internal mapping)
- Register to GBIF UDDI (GISIN will be a new thematic network)
- GBIF requests a confirmation from the node manager before indexing
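The internal-mapping step is, in essence, a configuration table from schema concepts to local columns. In this Python sketch, sqlite3 stands in for the provider's database, and the table, column and concept names are hypothetical (the concepts are merely written in Darwin Core style).

```python
import sqlite3

# Hypothetical mapping: schema concept -> column in the provider's own table.
MAPPING = {
    "ScientificName": "sp_name",
    "Locality": "site",
    "Biostatus": "status",
}


def fetch_mapped(conn: sqlite3.Connection, table: str, mapping: dict[str, str],
                 limit: int = 100, offset: int = 0) -> list[dict[str, object]]:
    """Read local rows and return them keyed by schema concept names.

    Table and column names must come from trusted configuration, never from
    a request, because identifiers cannot be passed as SQL parameters.
    """
    concepts = list(mapping)
    columns = ", ".join(f'"{mapping[c]}"' for c in concepts)
    sql = f'SELECT {columns} FROM "{table}" ORDER BY 1 LIMIT ? OFFSET ?'
    rows = conn.execute(sql, (limit, offset)).fetchall()
    return [dict(zip(concepts, row)) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species_obs (sp_name TEXT, site TEXT, status TEXT)")
conn.executemany("INSERT INTO species_obs VALUES (?, ?, ?)", [
    ("Sus scrofa", "Hawaii", "introduced"),
    ("Boiga irregularis", "Guam", "invasive"),
])
records = fetch_mapped(conn, "species_obs", MAPPING)
```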
WA/MD: ODBC does not solve the differences in SQL syntax (e.g. GROUP BY problems)
BM: someone will implement it under SQL Server
- ODBC to spreadsheet/Access option
- MD: Easy
- SD: Are you sure? What about MS-SQL, MS-Access?
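The dialect problem WA/MD raise is easy to demonstrate. GROUP BY is the example in the notes; paging, which the inventory and search operations need, is another, since the same page of rows is written differently per database. This Python sketch only generates SQL text; the dialect labels are arbitrary, and the SQL Server form requires SQL Server 2012 or later (older versions need other workarounds).

```python
def paged_query(dialect: str, columns: list[str], table: str, order_by: str,
                limit: int, offset: int = 0) -> str:
    """Return SQL for one page of rows in the given (illustrative) dialect."""
    cols = ", ".join(columns)
    base = f"SELECT {cols} FROM {table} ORDER BY {order_by}"
    limit, offset = int(limit), int(offset)  # only integers reach the SQL text
    if dialect in ("sqlite", "mysql", "postgresql"):
        return f"{base} LIMIT {limit} OFFSET {offset}"
    if dialect == "sqlserver":  # OFFSET/FETCH needs SQL Server 2012+ and ORDER BY
        return f"{base} OFFSET {offset} ROWS FETCH NEXT {limit} ROWS ONLY"
    if dialect == "access":
        if offset:
            raise NotImplementedError("MS Access has TOP but no OFFSET clause")
        return f"SELECT TOP {limit} {cols} FROM {table} ORDER BY {order_by}"
    raise ValueError(f"unknown dialect: {dialect}")
```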
ODM: Could we port TAPIR to SOAP?
MD: requires work
MD: SOAP does not bring a significant advantage
HS/MD: TAPIR is another type of web service (HTTP GET)
SD: We could think about simple Get Request/Response exchanges
- Request information on a species
- Parameters: Species, Location, ...
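SD's simple GET request/response exchange could look like this minimal WSGI application built on Python's standard library: a request such as ?species=Sus+scrofa&location=Hawaii returns the matching records as XML. The in-memory records, parameter names and response elements are hypothetical; a real provider would query its database.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

# Hypothetical records standing in for a provider's database.
RECORDS = [
    {"species": "Sus scrofa", "location": "Hawaii", "biostatus": "introduced"},
    {"species": "Boiga irregularis", "location": "Guam", "biostatus": "invasive"},
]


def app(environ, start_response):
    """WSGI app: GET ?species=...&location=... -> matching records as XML."""
    if environ.get("REQUEST_METHOD", "GET") != "GET":
        start_response("405 Method Not Allowed",
                       [("Allow", "GET"), ("Content-Type", "text/plain")])
        return [b"Only GET is supported\n"]
    params = parse_qs(environ.get("QUERY_STRING", ""))
    wanted = {key: params[key][0] for key in ("species", "location") if key in params}

    root = ET.Element("response")
    for record in RECORDS:
        # A record matches when every supplied parameter equals its field.
        if all(record[key] == value for key, value in wanted.items()):
            ET.SubElement(root, "record", dict(record))
    body = ET.tostring(root, encoding="utf-8")  # bytes, with an XML declaration
    start_response("200 OK", [("Content-Type", "application/xml; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call(query_string: str) -> tuple[str, bytes]:
    """Exercise the app in-process, without starting a server."""
    environ = {"QUERY_STRING": query_string}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured = []
    body = b"".join(app(environ, lambda status, headers: captured.append(status)))
    return captured[0], body
```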
---+++TAPIR vs SOAP
_BM: DIGIR/TAPIR is the recommended approach_
GBIF has addressed complex issues (caching/indexing)
*BM: TAPIR should be the transport mechanism*
*Keep it open*
*Adapt to MS world/SOAP ==> more development required*
SD: Options to implement a data provider
MD: Different ways to harvest data to populate GBIF central cache
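One of the simplest harvesting strategies is a full paged harvest into a central cache. In this Python sketch, fetch_page is a hypothetical stand-in for whatever protocol request (for example a paged search) a real harvester would send; the loop stops when a page comes back shorter than requested.

```python
from typing import Callable

Record = dict[str, object]


def harvest(fetch_page: Callable[[int, int], list[Record]],
            page_size: int = 100) -> dict[object, Record]:
    """Copy every record from a provider into a cache keyed by record id."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    cache: dict[object, Record] = {}
    start = 0
    while True:
        page = fetch_page(start, page_size)  # (start, limit) -> list of records
        for record in page:
            cache[record["id"]] = record  # a re-harvest overwrites stale copies
        if len(page) < page_size:         # short (or empty) page: nothing left
            return cache
        start += page_size


provider_data = [{"id": f"R{i}", "n": i} for i in range(250)]  # fake provider
calls = []


def fake_fetch(start, limit):
    calls.append(start)
    return provider_data[start:start + limit]


cache = harvest(fake_fetch, page_size=100)
```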
HS: ToR aim at an open architecture for GISIN
In practice, e.g. Malika in Morocco.
Most developing countries will not implement a sophisticated data provider
_GISIN will be a heterogeneous network_
*There should be an easy implementation for each popular platform.*
-- Main.AnnieSimpson - 14 Mar 2006