%META:TOPICINFO{author="MichaelBrowne" date="1142989345" format="1.0" version="1.3"}%
%META:TOPICPARENT{name="AgadirMeeting"}%
Report for Checklist Working Group

The Checklist Working Group discussed the elements to be included in the schema. During the breakout sessions, discussion concentrated on reported data; fact sheets were therefore not discussed, but the group suggests that GISIN needs to keep a way to include them in the future.

The Checklist Working Group identified important concepts that consistently appear in checklists and should be supported by the schema (a sketch combining several of them appears after this list).

1. Species name

The species name (genus, species, infraspecific categories, author and date) is required as a string (more discussion is needed on whether to use a single string or separate elements). A synonym element should be included in the schema.

2. Location

See the location elements in the current schema. Recommendation: decimal degrees are preferred for geographical coordinates. This could be an annotation to the geographic coordinates field.

3. Biostatus

See the biostatus elements in the current schema and the recommendations from the TerminologyWorkingGroup.

4. Impacts

See the impact elements in the current schema and the recommendations from the TerminologyWorkingGroup.

5. Population dynamic data

The population dynamics elements in the current schema may be revised in consultation with the TerminologyWorkingGroup to include the following elements:
   * spatial extent;
   * spatial density;
   * means of spread;
   * rate of spread;
   * population rate of change;
   * contact Rob Emery for additional elements.

6. Introduction and/or observation dates

The introduction and/or observation dates in the current schema may be revised to include the following elements:
   * dates of successive introductions;
   * date of first observation.

7. Pathways and dispersal

8. Use

See the use element in the current schema.

9. Management

See the management elements in the current schema.

10. Hosts

See the hosts elements in the current schema.

11. Referencing/documenting information sources

Use objective criteria (e.g. peer-reviewed literature, expert interviews) for the reliability/validity statement, together with the date of record or date last updated.

_Issues that the breakout group did not have time to discuss include:_

12. Should there be a description container (e.g. measurements)?

13. Habitat
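A minimal sketch of how several of these concepts might combine in a single checklist record. All element and attribute names here are hypothetical illustrations, not elements of the current schema, and the values are invented examples.

<verbatim>
<!-- Hypothetical checklist record combining concepts 1-6 and 11 above -->
<checklistRecord>
  <!-- 1. Species name: full string, plus a synonym element -->
  <speciesName>Eichhornia crassipes (Mart.) Solms</speciesName>
  <synonym>Pontederia crassipes Mart.</synonym>
  <!-- 2. Location: decimal degrees, as recommended -->
  <location latitude="30.5000" longitude="-9.6000" coordinateFormat="decimalDegrees"/>
  <!-- 3./4. Biostatus and impacts, pending TerminologyWorkingGroup recommendations -->
  <biostatus>invasive</biostatus>
  <!-- 6. Introduction and/or observation dates -->
  <dateOfFirstObservation>1990</dateOfFirstObservation>
  <!-- 11. Referencing/documenting information sources -->
  <source reliability="peer-reviewed literature" lastUpdated="2006-03-14"/>
</checklistRecord>
</verbatim>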
-- Main.AnnieSimpson - 14 Mar 2006

---

Discussion about robust time-stamping of IAS data:

MichaelBrowne
In order to sell GISIN to donors, we should emphasize the role GISIN can play in providing the data that will underpin the IAS indicators being developed to monitor progress on global biodiversity targets. We should pay particular attention to time-stamping of data in the schema, so that a snapshot of, for example, the spatial extent of all known aquatic IAS in 2006 can be compared to a snapshot of their spatial extent in 2010. Other data types potentially useful for IAS indicators include:
   * money being spent on eradication/control;
   * ecosystem response to eradication/control activities;
   * pressure indicators, e.g.:
      * number of border intercepts/incursions;
      * number of new organisms introduced per annum;
      * number of IAS by country;
      * number of countries with active control programmes.
Good time-stamping of IAS data is also vital to modelling.

AnnieSimpson
Your comment about the ability to consult GISIN taking time into account is very important. Along with how invasive species arrive (pathways), when they arrive (a time stamp) will be vital to modelling and to measures of management success. Do you have suggestions on how to keep these parameters in the fore? E.g., should a date stamp be a required element? (I think the answer is no, but then we need to create filters to mask data without timestamps.)

JerryCooper
I think you need to consider carefully what exactly you are time-stamping. Do you want to use the data to create a time sequence for the acquisition of the information about the distribution of organisms, or the time data embedded within that information? I suggest the community wants the latter and not the former. That is: is it the date stamp associated with the original collection/observation of the primary data record (already done via DC/ABCD), is it the date of publication of a compiled list (already done via the literature reference data), or is it the date that the information was acquired via a GISIN indexer? Only the first is really useful for any meaningful time-series analysis of spread; the second is marginally useful, and the last is simply bookkeeping.

MichaelBrowne
Yes, we must consider carefully what we are time-stamping, and yes, it is the time data embedded within information that we are interested in for time-series analysis. As you say (for an IAS indicator of spatial extent), embedded time data can potentially be:
   * date of observation (DC/ABCD);
   * date of first observation (sometimes stated explicitly in the IAS community, but derived in DC/ABCD data);
   * date of subsequent observations (DC/ABCD);
   * date of introduction (often stated explicitly in the IAS community);
   * date of publication of a literature reference (e.g. date of publication of a compiled list);
   * ...there may be other dates of interest?
I guess the IAS indicators people could focus on seeking global DC/ABCD observation data for a few flagship IAS to establish their 2006 distributions, and then do it again in 2010, but any increased spread revealed would likely be partially due to increased availability of data. It is also likely that the 2006 global distribution of an IAS according to the literature is far more globally comprehensive than its 2006 distribution according to DC/ABCD observation data. Even though the time data embedded within the literature is less robust, this kind of data is probably the best available for global assessments.

AnnieSimpson
We've lived through the dangers of creating an ambiguous date stamp in metadata cataloguing tools here at NBII. I agree that it is essential to distinguish what kind of date any record has, and that 'date record acquired' is practically useless. But do we really need to distinguish between date of first observation (derived in DC/ABCD data) and date of introduction (not supported by existing standards)? Michael, you ask about more date types of interest, but I feel that (in the interest of keeping the cataloguing effective and streamlined) we should not dig too deeply for more possibilities, but rather (if possible) map any new types to the ones you mention here.
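One way to keep the date kinds discussed above distinct would be a single date element qualified by type. This is only a sketch; the element name and type vocabulary are hypothetical, not part of the draft schema or of DC/ABCD.

<verbatim>
<!-- Hypothetical typed date elements; the "type" values are illustrative only -->
<date type="observation">2004-11-03</date>
<date type="firstObservation">1998</date>
<date type="introduction">1995</date>
<date type="literaturePublication">2002</date>
</verbatim>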
BobMorris
Obviously I am neutral, by reason of incompetence, on the question of _which_ elements have dates associated with them. However, the question of "date record acquired" and related issues is not as simple as it might appear. There are several reasons for this, the main one being cache maintenance.

In general, I urge that data providers always offer some form of "guaranteed good until" date (i.e. a promise not to update records before that date) so that applications (whether indexers or individual end-user applications) can have confidence that their current version of the record actually reflects what the data provider has. Most scientists tend to resist this on the principle that when a datum is known to need change, they want it changed immediately in order to have the most up-to-date science available. I find that an odd view from people who are willing to tolerate months, if not years, for paper publications to appear, and who justify hoarding data so that the data gatherer can analyze it without being "scooped".

Worst of all, what the absence of a "good until" date really induces is that an application which cares whether it has up-to-date information is motivated to put more traffic against the original data server instead of against caches and indexers. Since service is likely to be faster from those secondary services, most applications will use them. In turn, the consequence is that most applications will be ignorant of the fact that the currency of their data is not, in fact, under the control of the policy of the original provider, but rather under the control of the policy of the secondary, which may be capricious, useless, undocumented, irregular, ill-advised, or all of the above. And the client applications have no way to tell, nor any way to tell which record is best if two apparently similar records acquired from different secondaries should differ. (The later one is not automatically the better, in the case that yet a third, unknown, revision is lurking at the original provider---which might even have rolled the record back to the earlier one!)

IAS is an interesting special case, since EDRR [early detection and rapid response] puts a high value on new occurrence records. But I don't know if checklists are the appropriate place to address this.
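One possible shape for such a promise, sketched as an attribute on a provider response. The goodUntil name is hypothetical; neither TAPIR nor DC/ABCD defines this mechanism here.

<verbatim>
<!-- Hypothetical response envelope carrying a cache-validity promise:
     the provider commits not to change the enclosed records before goodUntil,
     so indexers and caches can serve them with confidence until that date -->
<providerResponse goodUntil="2006-06-30T00:00:00Z">
  <record id="..."> ... </record>
</providerResponse>
</verbatim>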
---

Priorities for Breakout Groups

| *Rank* | *Group* | *Item* | *Topic* | *Lead* |
| 1 | C/T | 12 | STRATEGY | SHAWN |
| 2 | C/T | 15 | STRUCTURE | HANNU |
| 3 | C/T | 4 | CORE | BOB |
| 4 | C/T | 26 | DC/ABCD EXTENSIONS | TECH |
| 5 | | 24 | MESSAGES | ODM |
| 6 | TECH | 11 | DOC-EXCHANGE | BOB |
| 7 | TECH | 13 | RECORDS | MARKUS |
| 8 | TECH | 10 | INTEGRITY | BOB |
| 9 | TECH | 19 | MAPPING | WOUTER |
| 10 | | 21 | EXCLUSION | MICHAEL |
| 11 | C/T | 20 | SYNONYMS | WOUTER |
| 12 | C/T | 23 | TAXO/COMMON NAME | NAIMA |
| 13 | CHK | 17 | CORRECTION | |
| 14 | CHK | 14 | DATA USE / OWNERSHIP | MARKUS |
| 15 | TECH | 9 | EXTENSIBILITY | BOB |
| 16 | TECH | 8 | XML-ENUM | BOB |
| 17 | TECH | 7 | STRONG TYPING | BOB |
| 18 | TECH | 6 | DOC-INSTANCE | BOB |
| 19 | TECH | 5 | DOC SIZE | BOB |
| 50 | MGT | 22 | DISSEMINATION | NAIMA |
| 98 | | (16) | UPDATE | CHRISTINE |
| 99 | | (25) | LIFEFORM | BOB |
| 99 | TECH | 1 | OUTSOURCE | KEVIN |
| 99 | CHK | 2 | USE | MICHAEL |
| 99 | CHK | 3 | NAMES | HANNU |
| 99 | | (18) | TIME REQUIRED | MICHAEL |

1. OUTSOURCE - KEVIN
Leave information out when and where possible - outsource data (e.g. common names don't belong). Future data services are potential sources for common names and other descriptive data.

2. USE - MICHAEL
Michael: Ref. the I3N Argentina database (common names for species X - to be shared in the future?). How much of the data in each database should be included in the schema as part of the overall core global dataset? 50% of visitors to FishBase access it via common names. But with regard to the GISIN schema - who will use it, and for what? It will be most useful to educated users who are familiar with the issue and with the data accessible via the schema. GISIN will not be useful, e.g., to managers looking for management information - they need summary information or guidance on species management. The majority of GISIN users may be more interested in a GUID than in a common name as an access point. But a list of all common names of species X would also be useful, and maybe only obtainable from GISIN. Is this a job for a third party?

3. NAMES - HANNU
Hannu: GISIN may be the first place where information about a species in a country is recorded - therefore the common name is important. As a registry of organisms in new places, new common names may also be recorded for the first time in GISIN. Kevin: perhaps outsource this in the future.

4. CORE - BOB
A small schema consisting of a 'core' group of elements with various extensions might be more effective. Elements specific to taxonomic groups would be extensions of the core. Keep the core stable. Olivier: the core consists mostly of references to other information items (have as many references as possible).

5. DOC SIZE - BOB
Reduce instance document size. The schema is an XML document, but it doesn't look like the documents that are produced from these databases; the schema is a set of rules that define those documents. (checklist)

6. DOC-INSTANCE - BOB
Find improvements to the schema that increase instance-document integrity, so that a document can't break as it moves around the network and goes through different processing applications. (technical) Instance documents whose integrity can be verified by existing software.

7. STRONG TYPING - BOB
Strong typing: there are too many string types. E.g. the long list of impact types needs to be moved into schema mechanisms so that processing software doesn't need to understand the commentary. Most of the things that are in the commentary right now need to be agreed on. Enumerations that are comments in the present schema need to be turned into actual enumerations (see the sketch after this list). Markus: if enumerations are forced, then you limit extensibility.

8. XML-ENUM - BOB
This is a weakness of XML Schema. (Technical - and made recommendations to the other groups.) If an element is going to remain the same, it can be turned into an enumeration, e.g. realm.

9. EXTENSIBILITY - BOB
Improve the extensibility of the schema. (Technical)

10. INTEGRITY - BOB
Improve the integrity of the schema. There are some known weaknesses in XML. (Technical)

11. DOC-EXCHANGE - BOB
Instance document exchange (transport). GBIF mechanisms are already in place (TAPIR). It would be good if these documents moved using established protocols. (Technical)

[LISTEN - BOB] Listen to the other groups' needs in order to accomplish these things effectively.

12. WHERE TO GO - SHAWN (TECHNICAL)
Scope of the schema - where are we going with this project?
1) Extensibility of the core for serving key audience needs.
2) Three main things: search for a species / individual queries for that species; once the species is found, obtaining information about it in a general way; and collection locations for that species (for mapping, prediction, etc.).
3) Number of records returned (TECHNICAL): e.g. find all species present in a bay = around 10,000 records returned (with collection data from everywhere, not just for that species in that bay). Possible solution: allow users to find species by pathway and location, then allow users to narrow the search from the resulting species list. This defines the GUI but also schema functionality.
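A minimal XML Schema sketch of the enumeration idea in items 7 and 8, using the realm example. The type names and value list are hypothetical; the union with a pattern-restricted string is one common way to address Markus's extensibility concern, not a decision of the group.

<verbatim>
<!-- Hypothetical enumerated type for a "realm" element (values illustrative) -->
<xs:simpleType name="RealmEnum">
  <xs:restriction base="xs:string">
    <xs:enumeration value="terrestrial"/>
    <xs:enumeration value="freshwater"/>
    <xs:enumeration value="marine"/>
  </xs:restriction>
</xs:simpleType>

<!-- One way to preserve extensibility: accept either a listed value
     or an explicitly flagged extension value such as "other:brackish" -->
<xs:simpleType name="RealmType">
  <xs:union memberTypes="RealmEnum">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="other:.+"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:union>
</xs:simpleType>
</verbatim>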
Search and filter applications will take care of this. Do you want filtering to be part of the schema development, or handled at the application level? The discussion on item 21: primary data can be referred to with other tools and may not necessarily be part of the schema. These primary data can be accessed with a specific program. Ask the technical people why primary data is being eliminated, and ask for clarification about how to access data through GISIN. Does GISIN have a list of databases so that data can be found? Data such as collection data are important, but this type of data can be accessed through GBIF.

13. RECORDS - MARKUS
Markus: Is it reasonable to expect a return of this many collection records? Michael: There are roughly 20,000 known invasive species in the world (Rod Randall, Global Compendium of Weeds); some are environmental weeds, others agricultural weeds. Adding this to the information in the GISD gives as many as 30,000 known IAS across all taxa; 40% of the species in the GISD are not plants (e.g. mammals). That is 20-30,000 known species, each with 30-40 global distribution records, mostly at country level. This excludes the collection and observation data described by Shawn. GBIF is a potential third-party source for collection and observation data, but it lacks the native/alien/invasive flagging/definition.

14. DATA USE / OWNERSHIP - MARKUS (TECHNICAL & CHECKLIST)
Would GISIN take the whole dataset, or only the information for those species tagged as introduced? FishBase has information on 29,000 species, but only 600+ have been recorded as being moved around the world/introduced. There is information available for species that are not necessarily flagged as invasive elsewhere in the database. Re: the utility of secondary datasets - all of the information in the database is from a record source (e.g. an expert on a family). We would like the source reference to be flagged to give credit. Would there be a general reference set for the whole GISIN schema? Would those references stay within the databases? Hannu: that is part of the metadata; it's up to the metadata provider to make sure that ownership data is present. Kevin: there are several levels of ownership, though (e.g. owner of the database, owner of a particular datum) - how do we track this? Michael: provide assignation of a reference to every single element in the schema; however, this may be impractical to implement. Christine: this issue is associated with the reliability/validation of the data elements.

15. STRUCTURE - HANNU (TOP PRIORITY)
Adapt the schema to capture the data structure of the checklist elements. Genus plus species together in the same box, or separate? A synonym container? It may be referred to by a TCS GUID. Should there be a description container? Hannu: ref. the UK National Biodiversity Network. How do we motivate GISIN providers to share their data? Michael: one motivation may be the ability to credit the information source. Is the problem only the level of work required to present that level of referencing information, or is there another problem? Hannu: no, it's only the hard work. References can be included for the sake of correctness or for giving credit. Make institution-level metadata inheritable down to the element level (see the sketch after item 16). (CHECKLIST GROUP)

[16. UPDATE - CHRISTINE] (provider responsibility)
Christine: Will there be an option for experts to enter and modify data that's already there but is incorrect? Michael: Corrections should be submitted to the provider.
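A sketch of the element-level attribution idea from items 14 and 15: a record-level default source that individual elements inherit unless they override it. All names and the source-reference convention are hypothetical.

<verbatim>
<!-- Hypothetical: source metadata declared once at record level is inherited
     by child elements unless an element carries its own source attribute -->
<speciesRecord source="ref:GISD-2006">
  <!-- inherits ref:GISD-2006 -->
  <biostatus>invasive</biostatus>
  <!-- overrides the record-level source -->
  <dateOfIntroduction source="ref:Randall-2002">1995</dateOfIntroduction>
</speciesRecord>
</verbatim>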
17. CORRECTION
Most checklists are of 'exotic' species that are not necessarily identified as 'invasive'. Metadata element: what are the criteria for including species on your list? Or a biostatus element based on 'impacts'?

18. TIME REQUIRED - MICHAEL
Michael: Tools will be given to providers to make data accessible to GISIN. What effort (how much time) would a provider have to go to in order to prepare data for GISIN? Hannu: Using the exercise example, it could be done in one morning. Markus: Half a day, but it depends on how closely the database resembles the structure of the schema. Differing terminology, for example, is not a big issue; but if you need to separate taxonomic information and taxonomic references to fit them into the schema, that would be more difficult and take more time. Converting a specimen/collection database to suit the GISIN (taxon) database is a larger effort.

19. MAPPING - WOUTER (TECHNICAL)
Some people will design or modify their database based on the schema. Create a mapping tool where providers could see and compare their fields with the schema fields/elements. There is such a tool in MS Access to help map your database against other databases.

20. SYNONYMS - WOUTER (CHECKLIST & TECHNICAL)
Tracing synonymies and common names. Related to validation: similar species (look-alikes), and a list of experts.

21. EXCLUSION - MICHAEL
Michael: Should provider tools allow providers to keep certain data/elements out of GISIN? Hannu: Yes. Allow providers to filter what they are providing.

22. DISSEMINATION - NAIMA (TECHNICAL)
Once the schema is finalized, what will be the way of disseminating it to other countries so that it can be used effectively? Annie: Within the GISIN secretariat there will be a person responsible for determining this (an "implementer"), along with incentives.

23. TAXO/COMMON NAME - NAIMA
Search by taxonomic name and by common name.

24. MESSAGES - ODM
The schema is for information exchange (a request/response mechanism). We should think about the messages that will be exchanged. We may need to break the schema into different chunks according to the types of requests that might be submitted or messages exchanged (a sketch appears at the end of this topic). Wouter commented on being able to clearly identify a species independently of the information we have on that species. We should consider how the schema is going to be used as we develop it. Know your users and their needs. Allow basic and more advanced queries.

25. LIFEFORM - BOB
Is it intended that LifeForm have an attribute added to it with values constrained to the list in the annotation?

-- BobMorris - 03 Jan 2006

26. DC EXTENSIONS - TECH
Do we need Darwin Core/ABCD extensions to the schema for primary data?

-- Main.AnnieSimpson - 14 Mar 2006
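A minimal sketch of the request/response pairing mentioned in item 24, combined with the pathway-and-location search from item 12. The message names, fields, and values are hypothetical and are not drawn from TAPIR or the draft schema.

<verbatim>
<!-- Hypothetical request: find species by pathway and location -->
<speciesSearchRequest>
  <pathway>ballast water</pathway>
  <country>MA</country>
</speciesSearchRequest>

<!-- Hypothetical response: a species list the user can then narrow down -->
<speciesSearchResponse>
  <species guid="urn:lsid:example.org:taxon:12345" name="Carcinus maenas"/>
</speciesSearchResponse>
</verbatim>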