head 1.3; access; symbols; locks; strict; comment @# @; 1.3 date 2007.02.27.23.54.39; author RenatoDeGiovanni; state Exp; branches; next 1.2; 1.2 date 2007.02.15.12.51.12; author JoseCuadra; state Exp; branches; next 1.1; 1.1 date 2007.02.15.11.34.59; author JoseCuadra; state Exp; branches; next ; desc @none @ 1.3 log @none @ text @%META:TOPICINFO{author="RenatoDeGiovanni" date="1172620479" format="1.1" version="1.3"}% %META:TOPICPARENT{name="TapirWorkshop2007"}% ---++ TAPIR Harvester Proposal ---+++ Proposed class diagram for harvester * ClassDiagram1.png:
ClassDiagram1.png ---+++ Explanation of classes * RequestManager: * Load LocalConfiguration in RequestManager * Obtains Provider access points (URLs) from ProviderLookup * Create instance of ThreadPool with as many general Threads as specified by local configuration * Loop through Providers * Perform MetaDataRequest and store response in Provider * Harvester Only – If date of Last Update is not after the last time Harvester gathered information from this access point, then Continue to next Provider * Harvester Only – If GMT is not equal to (in their indexingPreferences) Provider startTime ± harvestStartTimeOffset (also consider frequency in indexingPreferences), then Continue to next Provider * Perform CapabilitiesRequest and store response in Provider * According to Provider Capabilities and requestType, get KVP for URL from RequestConfiguration * Perform SearchRequest * Check SearchSummary: While (Start index+Page Size) < totalMatched, repeat SearchRequest on next page and Send to ResponseManager in a thread-safe environment * Send response to ResponseManager in a thread-safe environment * PerformRequest(String requestType, String url, int pageNum, int pageSize) * Get request URL for specified requestType (MetaData, Capabilities, Search or Inventory) * Adjust KVP values in URL for paging according to pageNum and pageSize (requestType Search only) * Create instance of CommunicationProcess with URL * Check for availableThreads in the ThreadPool * If Available: Add this CommunicationProcess to the ThreadPool * Otherwise: Create new Thread outside of ThreadPool and execute * Returns XML response * ProviderLookup: * Returns Provider access points (URLs) from flat file or from central UDDI * Provider: * Stores data relevant to each individual AccessPoint * RequestConfiguration (capabilities): * According to capabilities, retrieve and return KVP from configuration file (yet to be defined) * Request (Configuration): * Return URL from specified function * DigirRequest: * 
Extends general Request for functions specific to DiGIR * TapirRequest: * Extends general Request for functions specific to TAPIR * Return URL of KVP parameters from specified function (MetaData, Capabilities, Search or Inventory) according to Configuration * MetaDataRequest(String accessPoint): * If MetaData for this particular accessPoint is stored in local cache and is not expired, return. Otherwise, perform MetaDataRequest * CapabilitiesRequest(String accessPoint): * If Capabilities for this particular accessPoint is stored in local cache and is not expired, return. Otherwise, perform CapabilitiesRequest * searchRequest(String accessPoint) * inventoryRequest(String accessPoint) * BioCaseRequest: * Extends general Request for functions specific to BioCASE * ThreadPool: * Initialized to a certain number of threads according to local configuration * ResponseManager: * Handles response according to local configuration * Portals could display result with XSLT * Harvesters could cache data in local DB * Create RSS feed * Add data to OAI environment * Unmarshal response and create specific Response type * Response: * DigirResponse * TapirResponse * BioCaseResponse ---+++ Proposed execution flow * Load LocalConfiguration * Create RequestManager, Initialize ThreadPool with runnable objects (HTTPThreads) based on local configuration * RequestManager does ProviderLookup * (Harvester Only - RequestManager performs MetadataRequest to get date of last update) * Loop through providers, perform CapabilitiesRequest, or load capabilities from local cache * (Harvester Only – if there have been no changes since last harvest, move to next provider) * See steps 7-10 * Create specific request URL and configure according to capabilities using RequestConfiguration * RequestManager adjusts KVP parameters according to paging size and page number * RequestManager passes URL to an HTTPThread in the ThreadPool * HTTPThread connects to the provider * HTTPThread gets back XML and initializes 
the Response object * Response is passed to the ResponseManager in a thread-safe environment * Repeat steps 7-11 until all pages have been accessed according to local specifications * ResponseManager deals with the response according to local specifications (Harvester, Portal, …) * Process response with XSLT * Cache data * Create RSS feed * Add to OAI environment %META:FILEATTACHMENT{name="ClassDiagram1.png" attachment="ClassDiagram1.png" attr="" comment="" date="1171537240" path="ClassDiagram1.png" size="12559" stream="ClassDiagram1.png" user="Main.JoseCuadra" version="1"}% @ 1.2 log @none @ text @d1 1 a1 1 %META:TOPICINFO{author="JoseCuadra" date="1171543872" format="1.1" reprev="1.2" version="1.2"}% d13 67 a79 73 * RequestManager: * Load LocalConfiguration in RequestManager * Obtains Provider access points (URLs) ProviderLookup * Create instance of ThreadPool with as many general Threads as specified by local configuration * Loop through Providers * Perform MetaDataRequest and store response in Provider * Harvester Only – If date of Last Update is not after the last time Harvester gathered information from this access point, then Continue to next Provider * Harvester Only – If GMT is not equal to (in their indexingPreferences) Provider startTime ± harvestStartTimeOffset (also consider frequency in indexingPreferences), then Continue to next Provider * Perform CapabilitiesRequest and store response in Provider * According to Provider Capabilities and requestType, get KVP for URL from RequestConfigutation * Perform SearchRequest * Check SearchSummary: While (Start index+Page Size) < totalMatched, repeat SearchRequest on next page and Send to ResponseManager in a thread-safe environment * Send response to ResponseManager in a thread-safe environment * PerformRequest(String requestType, String url, int pageNum, int pageSize) * Get request URL for specified requestType (MetaData, Capabilities, Search or Inventory) * Adjust KVP values inURL for paging according to 
pageNum and pageSize (requestType Search only) * Create instance of CommunicationProcess with URL * Check for availableThreads in the ThreadPool * If Available: Add this CommunicationProcess to the ThreadPool * Otherwise: Create new Thread outside of ThreadPool and execute * Returns XML response * ProviderLookup: * Returns Provider access points (URLs) from flat file or from central UDDI * Provider: * Stores data relevant to each individual AccessPoint * RequestConfiguration (capabilities): * According to capabilities, retrieve and return KVP from configuration file (yet to be defined) * Request (Configuration): * Return URL from specified function * DigirRequest: * Extends general Request for functions specific to DiGIR * TapirRequest: * Extends general Request for functions specific to TAPIR * Return URL of KVP parameters from specified function (MetaData, Capabilities, Search or Inventory) according to Configuration * MetaDataRequest(String accessPoint): * If MetaData for this particular accessPoint is stored in local cache and is not expired, return. Otherwise, perform MetaDataRequest * CapabilitiesRequest(String accessPoint): * If Capabilities for this particular accessPoint is stored in local cache and is not expired, return. 
Otherwise, perform CapabilitiesRequest * searchRequest(String accessPoint) * inventoryRequest(String accessPoint) BioCaseRequest: * Extends general Request for functions specific to BioCASE ThreadPool: * Initialized to a certain number of threads according to local configuration ResponseManager: * Handles response according to local configuration * Portals could display result with XSLT * Harvesters could cache data in local DB * Create RSS feed * Add data to OAI environment * UnMarshall response and create specific Response type Response: DigirResponse: TapirResponse: BioCaseResponse: a91 1 a108 1 @ 1.1 log @none @ text @d1 1 a1 1 %META:TOPICINFO{author="JoseCuadra" date="1171539299" format="1.1" reprev="1.1" version="1.1"}% d89 1 a89 1 * Load LocalConfiguration d91 1 a91 1 * Create RequestManager, Initialize ThreadPool with runnable objects (HTTPTheads) based on local configuration d93 1 a93 1 * RequestManager does ProviderLookup d95 1 a95 1 * (Harvester Only - RequestManager performs MetadataRequest to get date of last update) d97 1 a97 1 * Loop through providers, performs CapabilitiesRequest, or load capabilities from local cache * (Harvester Only – if there have been no changes since last harvest, move to next provider) d99 1 a99 1 * See steps 7-10 d101 1 a101 1 * Create specific request URL and configure according to capabilities using RequestConfiguration d103 1 a103 1 * RequestManager adjusts KVP parameters according to paging size and page number d105 1 a105 1 * RequestManager passes URL to an HTTPThread in the ThreadPool d107 1 a107 1 * HTTPThread connects to the provider d109 1 a109 1 * HTTPThread gets back XML and initializes the Response object d111 1 a111 1 * Response is passed to the ResponseManager in a thread-safe environment d113 1 a113 1 * Repeat steps 7-11 until all pages have been accessed according to local specifications d115 1 a115 1 * ResponseManager deals with the response according to local specifications (Harvester, Portal, …) d117 4 
a120 4 * Process response with XSLT * Cache data * Create RSS feed * Add to OAI environment @