wiki-archive/twiki/data/InvasiveSpecies/GisinRequirements.txt

%META:TOPICINFO{author="RenatoDeGiovanni" date="1251753994" format="1.1" version="1.8"}%
%META:TOPICPARENT{name="WebHome"}%
---++Discussion Page for GISIN's Requrement as they relate to TDWG
This topic stems from discussions between Annie Simpson, Jim Graham and Donald Hobern.

---+++Donald's Seeding requirements outline
   1. Data models. GISIN has defined its data models (see note below*), but we should take these as inputs into TDWG's consideration of a unified ontology for biodiversity data, to make sure that all your elements have a natural place in the TDWG standards and that it is possible for the GISIN data to be encoded using TDWG standards (even if the element names are not always the same as yours).
   1. Data exchange. GISIN shares the requirement with all other projects for simplified data sharing approaches (by "data sharing", I am referring here to the publishing side of the equation. Integration is addressed below).  We should have in place
      a. Simple, robust TAPIR tools with easy installation and configuration for those who do want to handle sharing their own data through a their own queriable interface.
      a. Clear explanations of how to share data using the TDWG (and GISIN) data standards as static files (tab-delimited, CSV or XML files representing a whole data set).  This should be something which users can do without special tools and without expert knowledge.  In effect, it should include instructions of the form, 'For data sets held as Excel spreadsheets, select "File" -> "Save As..." and choose "Tab-delimited" as the file format'.
      a. Clear explanations of how to associate metadata with a TAPIR endpoint or a static file.
      a. Clear explanations of how to use LSIDs within data sets, including static files.
      a. Tools which can be hosted by GISIN, ALA and others to make LSIDs from TAPIR endpoints and static files resolvable without the data sharer having to do anything.
   1. Data integration - reusable tools for GISIN and others to build searchable caches and portals based on a set of data providers using the above data models and data exchange approaches.

-- Main.LeeBelbin - 25 Feb 2009

It's great to see everything available at http://wiki.tdwg.org/twiki/bin/view/InvasiveSpecies/WebHome. It was good to reacquaint myself with the huge version of IASPS that Jerry Cooper, Bob Morris and I developed. 

Thanks to Lee for getting the discussion going on Donald's &#8216;Seeding Requirements&#8217;. To me, the landscape ahead is confusing so I'm hoping that this discussion will  focus our thinking and contribute to the success of the next GISIN meeting:
1.       RE: Data models. Some GISIN data models (e.g. SpeciesStatus, Dispersal, Impacts, Management) don&#8217;t have a natural place in the TDWG standards and must be considered as extensions, right?
2.       RE: Data exchange. Hosting tools to make LSIDs resolvable sounds very interesting. Where do we get the universal LSIDs from, and when?
3.       RE: Data integration. I like the requirement for tools to build searchable caches and portals based on a set of data providers &#8211; that was key part of the original vision for GISIN.
A key requirement has been to offer simple solutions (e.g. the exchange of text files) in order to maximise opportunities for the IAS community to provide data, as well as php and asp toolkits for advanced data providers. How do we access (and integrate) data mediated via these different approches?

-- Main.MichaelBrowne - 27 Feb 2009

*Note from Annie - 3 of 6 defined GISIN data models have been implemented: Species Status, Species Resource URL, and Occurrence. Management Status, Impact Status, and Dispersal Status are expected to be implemented during a GISIN standards working group pre-meeting to e-Biosphere. A 7th and perhaps final model, Citations, which will likely be straight from Dublin Core, will hopefully be both defined and implemented then as well.

-- Main.AnnieSimpson - 18 Mar 2009

Ref. Donald's Seeding requirements outline, point 2.

I need more explanation of subpoints c, d, e: "Clear explanations of how to associate metadata with a TAPIR endpoint or a static file."; "Clear explanations of how to use LSIDs within data sets, including static files."; "Tools which can be hosted by GISIN, ALA and others to make LSIDs from TAPIR endpoints and static files resolvable without the data sharer having to do anything."

What is a 'TAPIR endpoint'? 

What do you mean by 'how to use LSIDs within datasets'? How to refer to them, write them, generate them?

And how can an LSID be resolvable without the data sharer having to do anything? THAT would be a good thing, but I cannot visualize it.
<blockquote>
Anybody with sufficient technical resources can issue an LSID, and can offer resolution services for that LSID, whether they are the holder of the data or not. Normally, one would expect that this is done with the consent of the data provider, but this is not a technical requirement. I imagine that Lee has in mind that the LSID issuance and resolution service be delegated by the provider to a third party, such as GBIF or GISIN, in a way that requires that delegated party  need access to nothing more than the data itself. For highest utility, that would probably entail some "metastandards" be developed that would help insure that the LSID "delegee" be doing something consistent across all data providers, but these are unlikely to need further involvement of the data contributor. 

Here's an analogy: Imagine you wish to be part of a distributed book lending service--- a sort of interlibrary loan service, except that patrons come directly to you to borrow the book. But you don't want on your own to issue or provide the library catalog data that tells exactly what the book is, where a patron should come to get it, whether it is checked out, and other such metadata. Further, your neighbors want to participate by having some of their books shelved in your house and have nothing to do with the whole lending process. In general, because ISBN's (the GUID) and catalog data are nowadays not issued by the book owner but by someone else unrelated to the book owner, there is very little for you to do except let people in the front door. As a book lender, you probably are motivated to do something that allows people to easily find the book once in---maybe maintain a shelf map indexed by ISBN, but your neighbor who owns some of the books doesn't do anything more than hand them to you for your lending service. To complete the analogy one thing is required: to uniquely identify the copy of the book available for lending in a way that can make its presence and identity known widely. If as the lending library you don't want to do this on your own, you mainly need to let an outside party in to look at your shelf map. They can, according to a scheme of their own choosing, make a globally unique identifier for your copy of the book, cature location data from your shelf map, and put this all n into a wonderful interlibrary loan catalog that will involve you not one whit. If that outside party is wise, they will choose a standardized unique identifier scheme whereby the identifier itself makes clear where patrons should go to find the interlibrary loan catalog itself, and thereby locate your copy of the book and information about it.   In fact, you don't even need a good shelf map if you are willing to let patrons wander around in your database---oops I mean your house---looking for the book. They mainly need to be able to know that, when they take a book off your shelf it corresponds to the one whose GUID they got from the interlibrary loan catalog. As it happens, any guid scheme corresponding to the [[http://www.ietf.org/rfc/rfc3406.txt][ IETF  URN specification]] and related specs meet these finding aid requirements.  LSID is one such URN scheme. 

-- Main.BobMorris - 26 Mar 2009
</blockquote>


-- Main.AnnieSimpson - 18 Mar 2009

The link to the 'Protocol Issues List' at http://www.niiss.org/cwis438/websites/GISINDirectory/Tech/Issue_List.php?Type=2&WebSiteID=4 will be useful for those working on implementation of ManagementStatus, ImpactStatus, and DispersalStatus during the GISIN standards working group pre-meeting to e-Biosphere. The Protocol Issues List outlines issues discussed and resolved during past GISIN meetings, and includes alternatives to the current versions of these 3 data models.

-- Main.MichaelBrowne - 26 Mar 2009

I have posted the following message to the TDWG list:

Dear TDWG Members and Friends,

Those of you who attended the conference in Perth last year will remember Annie Simpson's explanation of how hard it still is for a group like the Global Invasive Species Information Network (GISIN) to take TDWG's work and apply it directly to solve data integration issues in their community.

As I have previously suggested, I believe we should look closely at GISIN's requirements and make sure that those in a similar situation have all the tools and documentation required to start connecting data easily in ways that are compatible with TDWG standards.  We have most of the pieces in place.  We just need to organise and document them better.

Invasive species are of great significance to many countries.  This means that addressing GISIN's requirements is directly relevant to a large number of TDWG participants.  However their scenario is also a template for data sharing in many other biodiversity-related projects.  They need to aggregate information from many sources to populate several inter-related data models (BioStatus, Occurrences, ProfileURLs, ImpactStatus, ManagementStatus and DispersalStatus).  Several of these are very close to existing data models based on TDWG standards (for example, the Occurrence model can be considered to be extended Darwin Core).  Others are examples of the kind of community-specific data which all biodiversity projects need to share.

TDWG's goal should be to ensure that GISIN and similar groups can follow a simple set of instructions (a "cook book") and use standard tools to build a network of this kind, and that they can do so in a way which ensures that related communities can also benefit from the data they share.  For example, if GISIN members use the GISIN toolkit to share data, we ought to make sure that no further steps are required for GBIF, OBIS and others to integrate GISIN Occurrence data into their own indexes of species occurrences, or for EOL, ALA and others to integrate GISIN ImpactStatus, ManagementStatus and DispersalStatus data into their species profiles.

I am therefore writing this email as an appeal for TDWG members to step forward and help to complete the work we have started.  I am setting a challenge for us to do whatever still needs to be done to our standards, tools and data sharing recommendations so that we can produce such a cook book by the end of 2009.  I don't believe this is an unrealistic goal.  We can start work now on identifying and resolving issues.  We can plan activities at the 2009 TDWG conference in November to resolve any issues still outstanding at that point and to hold hackathon activities to fill any gaps in our tool set.  This work also aligns well with the e-Biosphere conference's plan to develop a roadmap for biodiversity informatics for the next 10 years.

We have added a page to the TDWG Invasive Species wiki to seed discussion of what we need to do - http://wiki.tdwg.org/twiki/bin/view/InvasiveSpecies/GisinRequirements.  Please take the time to look at this page (and at the information on GISIN's requirements at http://wiki.tdwg.org/twiki/bin/view/InvasiveSpecies/WebHome and http://www.niiss.org/cwis438/websites/GISINDirectory/Tech/ProtocolSpecification.php).  Append comments and suggestions on what else we may need to do to the wiki, or simply reply to this message.  In particular, if you are interested in being involved with the relevant TDWG Interest Groups in addressing some of the listed requirements, please contact me or the relevant Interest Group leaders.

I believe this is the most significant thing that TDWG can do right now to support the work of biodiversity informatics worldwide.  Please consider making your own contribution to make it happen.

Best wishes,

Donald

-- Main.DonaldHobern - 25 Mar 2009

Annie,

A 'TAPIR endpoint' or 'access point' is simply the address (URL) of a TAPIR service.

-- Main.RenatoDeGiovanni - 31 Aug 2009
Archive entire twiki setup from owl as of today 2016-02-24 10:46:01 +00:00			`%META:TOPICINFO{author="RenatoDeGiovanni" date="1251753994" format="1.1" version="1.8"}%`
			`%META:TOPICPARENT{name="WebHome"}%`
			`---++Discussion Page for GISIN's Requrement as they relate to TDWG`
			`This topic stems from discussions between Annie Simpson, Jim Graham and Donald Hobern.`

			`---+++Donald's Seeding requirements outline`
			`1. Data models. GISIN has defined its data models (see note below*), but we should take these as inputs into TDWG's consideration of a unified ontology for biodiversity data, to make sure that all your elements have a natural place in the TDWG standards and that it is possible for the GISIN data to be encoded using TDWG standards (even if the element names are not always the same as yours).`
			`1. Data exchange. GISIN shares the requirement with all other projects for simplified data sharing approaches (by "data sharing", I am referring here to the publishing side of the equation. Integration is addressed below). We should have in place`
			`a. Simple, robust TAPIR tools with easy installation and configuration for those who do want to handle sharing their own data through a their own queriable interface.`
			`a. Clear explanations of how to share data using the TDWG (and GISIN) data standards as static files (tab-delimited, CSV or XML files representing a whole data set). This should be something which users can do without special tools and without expert knowledge. In effect, it should include instructions of the form, 'For data sets held as Excel spreadsheets, select "File" -> "Save As..." and choose "Tab-delimited" as the file format'.`
			`a. Clear explanations of how to associate metadata with a TAPIR endpoint or a static file.`
			`a. Clear explanations of how to use LSIDs within data sets, including static files.`
			`a. Tools which can be hosted by GISIN, ALA and others to make LSIDs from TAPIR endpoints and static files resolvable without the data sharer having to do anything.`
			`1. Data integration - reusable tools for GISIN and others to build searchable caches and portals based on a set of data providers using the above data models and data exchange approaches.`

			`-- Main.LeeBelbin - 25 Feb 2009`

			`It's great to see everything available at http://wiki.tdwg.org/twiki/bin/view/InvasiveSpecies/WebHome. It was good to reacquaint myself with the huge version of IASPS that Jerry Cooper, Bob Morris and I developed.`

			`Thanks to Lee for getting the discussion going on Donald's ‘Seeding Requirements’. To me, the landscape ahead is confusing so I'm hoping that this discussion will focus our thinking and contribute to the success of the next GISIN meeting:`
			`1. RE: Data models. Some GISIN data models (e.g. SpeciesStatus, Dispersal, Impacts, Management) don’t have a natural place in the TDWG standards and must be considered as extensions, right?`
			`2. RE: Data exchange. Hosting tools to make LSIDs resolvable sounds very interesting. Where do we get the universal LSIDs from, and when?`
			`3. RE: Data integration. I like the requirement for tools to build searchable caches and portals based on a set of data providers – that was key part of the original vision for GISIN.`
			`A key requirement has been to offer simple solutions (e.g. the exchange of text files) in order to maximise opportunities for the IAS community to provide data, as well as php and asp toolkits for advanced data providers. How do we access (and integrate) data mediated via these different approches?`

			`-- Main.MichaelBrowne - 27 Feb 2009`

			`*Note from Annie - 3 of 6 defined GISIN data models have been implemented: Species Status, Species Resource URL, and Occurrence. Management Status, Impact Status, and Dispersal Status are expected to be implemented during a GISIN standards working group pre-meeting to e-Biosphere. A 7th and perhaps final model, Citations, which will likely be straight from Dublin Core, will hopefully be both defined and implemented then as well.`

			`-- Main.AnnieSimpson - 18 Mar 2009`

			`Ref. Donald's Seeding requirements outline, point 2.`

			`I need more explanation of subpoints c, d, e: "Clear explanations of how to associate metadata with a TAPIR endpoint or a static file."; "Clear explanations of how to use LSIDs within data sets, including static files."; "Tools which can be hosted by GISIN, ALA and others to make LSIDs from TAPIR endpoints and static files resolvable without the data sharer having to do anything."`

			`What is a 'TAPIR endpoint'?`

			`What do you mean by 'how to use LSIDs within datasets'? How to refer to them, write them, generate them?`

			`And how can an LSID be resolvable without the data sharer having to do anything? THAT would be a good thing, but I cannot visualize it.`
			`<blockquote>`
			Anybody with sufficient technical resources can issue an LSID, and can offer resolution services for that LSID, whether they are the holder of the data or not. Normally, one would expect that this is done with the consent of the data provider, but this is not a technical requirement. I imagine that Lee has in mind that the LSID issuance and resolution service be delegated by the provider to a third party, such as GBIF or GISIN, in a way that requires that delegated party need access to nothing more than the data itself. For highest utility, that would probably entail some "metastandards" be developed that would help insure that the LSID "delegee" be doing something consistent across all data providers, but these are unlikely to need further involvement of the data contributor.

			Here's an analogy: Imagine you wish to be part of a distributed book lending service--- a sort of interlibrary loan service, except that patrons come directly to you to borrow the book. But you don't want on your own to issue or provide the library catalog data that tells exactly what the book is, where a patron should come to get it, whether it is checked out, and other such metadata. Further, your neighbors want to participate by having some of their books shelved in your house and have nothing to do with the whole lending process. In general, because ISBN's (the GUID) and catalog data are nowadays not issued by the book owner but by someone else unrelated to the book owner, there is very little for you to do except let people in the front door. As a book lender, you probably are motivated to do something that allows people to easily find the book once in---maybe maintain a shelf map indexed by ISBN, but your neighbor who owns some of the books doesn't do anything more than hand them to you for your lending service. To complete the analogy one thing is required: to uniquely identify the copy of the book available for lending in a way that can make its presence and identity known widely. If as the lending library you don't want to do this on your own, you mainly need to let an outside party in to look at your shelf map. They can, according to a scheme of their own choosing, make a globally unique identifier for your copy of the book, cature location data from your shelf map, and put this all n into a wonderful interlibrary loan catalog that will involve you not one whit. If that outside party is wise, they will choose a standardized unique identifier scheme whereby the identifier itself makes clear where patrons should go to find the interlibrary loan catalog itself, and thereby locate your copy of the book and information about it. In fact, you don't even need a good shelf map if you are willing to let patrons wander around in your database---oops I mean your house---looking for the book. They mainly need to be able to know that, when they take a book off your shelf it corresponds to the one whose GUID they got from the interlibrary loan catalog. As it happens, any guid scheme corresponding to the [[http://www.ietf.org/rfc/rfc3406.txt][ IETF URN specification]] and related specs meet these finding aid requirements. LSID is one such URN scheme.

			`-- Main.BobMorris - 26 Mar 2009`
			`</blockquote>`


			`-- Main.AnnieSimpson - 18 Mar 2009`

			`The link to the 'Protocol Issues List' at http://www.niiss.org/cwis438/websites/GISINDirectory/Tech/Issue_List.php?Type=2&WebSiteID=4 will be useful for those working on implementation of ManagementStatus, ImpactStatus, and DispersalStatus during the GISIN standards working group pre-meeting to e-Biosphere. The Protocol Issues List outlines issues discussed and resolved during past GISIN meetings, and includes alternatives to the current versions of these 3 data models.`

			`-- Main.MichaelBrowne - 26 Mar 2009`

			`I have posted the following message to the TDWG list:`

			`Dear TDWG Members and Friends,`

			`Those of you who attended the conference in Perth last year will remember Annie Simpson's explanation of how hard it still is for a group like the Global Invasive Species Information Network (GISIN) to take TDWG's work and apply it directly to solve data integration issues in their community.`

			`As I have previously suggested, I believe we should look closely at GISIN's requirements and make sure that those in a similar situation have all the tools and documentation required to start connecting data easily in ways that are compatible with TDWG standards. We have most of the pieces in place. We just need to organise and document them better.`

			Invasive species are of great significance to many countries. This means that addressing GISIN's requirements is directly relevant to a large number of TDWG participants. However their scenario is also a template for data sharing in many other biodiversity-related projects. They need to aggregate information from many sources to populate several inter-related data models (BioStatus, Occurrences, ProfileURLs, ImpactStatus, ManagementStatus and DispersalStatus). Several of these are very close to existing data models based on TDWG standards (for example, the Occurrence model can be considered to be extended Darwin Core). Others are examples of the kind of community-specific data which all biodiversity projects need to share.

			TDWG's goal should be to ensure that GISIN and similar groups can follow a simple set of instructions (a "cook book") and use standard tools to build a network of this kind, and that they can do so in a way which ensures that related communities can also benefit from the data they share. For example, if GISIN members use the GISIN toolkit to share data, we ought to make sure that no further steps are required for GBIF, OBIS and others to integrate GISIN Occurrence data into their own indexes of species occurrences, or for EOL, ALA and others to integrate GISIN ImpactStatus, ManagementStatus and DispersalStatus data into their species profiles.

			I am therefore writing this email as an appeal for TDWG members to step forward and help to complete the work we have started. I am setting a challenge for us to do whatever still needs to be done to our standards, tools and data sharing recommendations so that we can produce such a cook book by the end of 2009. I don't believe this is an unrealistic goal. We can start work now on identifying and resolving issues. We can plan activities at the 2009 TDWG conference in November to resolve any issues still outstanding at that point and to hold hackathon activities to fill any gaps in our tool set. This work also aligns well with the e-Biosphere conference's plan to develop a roadmap for biodiversity informatics for the next 10 years.

			We have added a page to the TDWG Invasive Species wiki to seed discussion of what we need to do - http://wiki.tdwg.org/twiki/bin/view/InvasiveSpecies/GisinRequirements. Please take the time to look at this page (and at the information on GISIN's requirements at http://wiki.tdwg.org/twiki/bin/view/InvasiveSpecies/WebHome and http://www.niiss.org/cwis438/websites/GISINDirectory/Tech/ProtocolSpecification.php). Append comments and suggestions on what else we may need to do to the wiki, or simply reply to this message. In particular, if you are interested in being involved with the relevant TDWG Interest Groups in addressing some of the listed requirements, please contact me or the relevant Interest Group leaders.

			`I believe this is the most significant thing that TDWG can do right now to support the work of biodiversity informatics worldwide. Please consider making your own contribution to make it happen.`

			`Best wishes,`

			`Donald`

			`-- Main.DonaldHobern - 25 Mar 2009`

			`Annie,`

			`A 'TAPIR endpoint' or 'access point' is simply the address (URL) of a TAPIR service.`

			`-- Main.RenatoDeGiovanni - 31 Aug 2009`