wiki-archive/twiki/data/TAPIR/DatasourceServiceProposal.txt

---++++ General strategy

Concentrate only on a datasource service protocol. Try to get the best from previous proposals, solve remaining inconsistencies, and also incorporate some new suggestions.

---++++ Files
   * *Protocol Schema* :

---+++++ Types of services

The 3 types of services identified (DatasourceServices, ProviderServices and MessageBrokerServices) are similar in many aspects, but they are not the same. Using the same protocol schema to validate messages from all services would certainly make the schema more complicated and probably less restrictive (allowing many unreal combinations of elements). Using the same specification to describe the all services would probably lack the desired clarity and simplicity. Ideally we would need 3 separate specifications and 3 separate protocols, but at the present moment it may not be possible to produce all of them.

This proposal tries to address only the most important service: the DatasourceService.

Additional reasons for NOT addressing ProviderService now:

   * The capability to query more than one datasource at the same time is a new feature (not directly related to the integration of DiGIR and BioCASE, and not necessarily easy to implement).
   * The capability of being a discovery service for local datasources is only being used by DiGIR and could be easily substituted by a UDDI service (either by creating new themes in the GBIF UDDI repository, or by installing a  free and open source repository like [[http://ws.apache.org/juddi/][jUDDI]]).

Additional reasons for NOT addressing MessageBrokerService now:

   * Although the DiGIR protocol has some elements especifically related to that service (e.g. "responseWrapper" element and multiple destinations), it is not validating all messages exchanged between clients and "portal" engine, and it seems the protocol is not officialy supposed to specify these types of messages. BioCASE on its turn uses a java API for that purpose. Making the protocol compliant with MessageBrokers would not be a trivial task, and probably other more generic protocols would better support query distribution.

---++++ Details

   * Address only DatasourceServices (resources) in the protocol.

   * Use the same approach regarding AccessPoints:
   * Only resources will have an access point (provider software access point will not be covered by the protocol).
   * Resources will accept all kind of requests.

   * Eliminate DifferencesInHeaderInformation by (see DatasourceHeaderProposalOne):

   * Removing the optional attribute "resource" in both "source" and "destination" elements (affects DiGIR).
   * Removing the "destination" element (affects both).
   * Allowing multiple "source" elements to be able to track each address in the process, and incorporating the "sendTime" element as an attribute of "source". However, intermediaries don't need to stamp a source element - it's an optional behavior (affects both - related to new BioCASE proposal).
   * Making the address ("accesspoint") an attribute of "source", not its content. In responses from a resource, the address should be the exact accesspoint of the resource (affects both).
   * Including an optional "software" element with attributes "name" and "version" inside the "source" element. Software could have many occurrences (affects both).
   * remove the "type" element and only use the immediate element after "header" to determine the type of requests/responses.

   * Eliminate DifferencesInRequestTypes by:
   * Including a MetadataRequest DatasourceServiceProposal Response (affects BioCASE).
   * Defining the metadata elements in a separate schema (affects both).
   * Including a CapabilitiesRequest DatasourceServiceProposal Response (affects DiGIR).
   * Including a PingRequest DatasourceServiceProposal Response (affects both).

   * Use the same metadata response initially based on DiGIR but with some changes: (see DatasourceMetadataProposalTwo)
   * Resource renamed to datasource.
   * Included accessPoint in the datasource.
   * Removed the implementation element (duplicated).
   * Moved elements minQueryTermLength, maxSearchResponseRecords, maxInventoryResponseRecords to the CapabilitiesRequest.
   * Removed recordBasis and recordIdentifier elements.
   * Changed useRestrictions to a generic "rights" element (equivalent to previously suggested IPR and to a Dublin Core element). Accepts xml:lang attribute.
   * Provider metadata is now a datasource relatedEntity.
   * Datasource name renamed to label and accepts xml:lang attribute.
   * Included generic multiple element typeOfContent. Networks should agree on a controlled vocabulary. (equivalent to Dublin Core "type" element).
   * Keywords element accepts xml:lang and has maxOccurs unbounded.
   * Citation is an element (with maxOccurs unbounded and xml:lang).
   * Included "language" element related to the datasource content (from Dublin Core).
   * Conceptual schema needs an associated schema location in its content.
   * Datasource has a list of related entities.
   * Each entity needs a GUID.
   * Each entity accepts a list of names (xml:lang), a list of roles (with controlled vocabulary defined by networks), a logo url, a description (xml:lang), an acronym and related information links.
   * If values of related entities come from a local cache, we recommend a diagnostics warning in responses.

   * Use the same CapabilitiesRequest and Response by: (see DatasourceCapabilitiesProposalOne)
   * Including a list of conceptual schemas being used (and all mapped concepts).
   * Including a list of request methods supported.
   * Including a list of local settings (minQueryTermLength and a generic maxResponseSize substituting maxSearchResponseRecords and maxInventoryResponseRecords).
   * Optionally including a more detailed list of software in the response.
   * Including supported operators separated by categories.

   * Eliminate DifferencesInConceptualBinding by: (see ConceptualBindingProposalOne)
   * Referencing concepts through simple xpaths (affects DiGIR).
   * Allowing concepts from different conceptual schemas to be present in the same request by prefixing the path (affects mainly BioCASE).

   * Use the same InventoryRequest and Response: (see DatasourceInventoryProposalOne)
   * Allowing filters (affects BioCASE).
   * Allowing counts, both partial and total (affects BioCASE).
   * Using the name "inventory" for this type of request (affects BioCASE).
   * Allowing more than one concept in inventory/scan requests (affects both).
   * Optionally request sorting of result list. Order by sequence of requested concepts if multiple.

   * Eliminate DifferencesWithOperators by:
   * adding a new unary logical operator called "not" (affects DiGIR).
   * adding a new unary comparative operator called "isNull" (affects DiGIR).
   * dropping the operators "orNot", "andNot" and "notEquals" (affects both).
   * dropping the operator "isNotNull" (affects BioCASE).

   * Additional enhacements to operators:
   * allow comparative operators to compare any 2 expressions, which can be made of literals (values) or concepts. This would allow to compare 2 concepts also instead of the current restriction to compare a concept with a literal only.
   * allow functions.
   * change maxOccurs of LOPs to unbounded (affects both).

   * Use the same way to call providers:
   * Using a single POST or GET parameter called "request" which contains either the XML message or an URL pointing to the XML message.

   * Changes in filters:
   * allow search requests without filters (affects both).
   * evaluate to false if concept is not mapped but compared.
   * always insert info/warn diagnostic if unmapped concept encountered in request.

   * Use the same error handling strategy:
   * use common Error Codes and prefixes for classification of codes.

   * How should we deal with NULL values when producing responses? NULL is non existing informaion and the element containing a NULL value should therefore be dropped and not be part of the response.

   * Use the same SearchRequest by:
   * ...