%META:TOPICINFO{author="MatthewDougherty" date="1229051575" format="1.1" reprev="1.1" version="1.1"}%

---++ *Communiqué to the Taxonomic Data Working Group imaging group wiki, Global Biodiversity Information Facility*

The *Consortium for the Management of Experimental Data in Structural Biology (MEDSBIO)* met at Brookhaven National Laboratory on May 22, 2008 for a workshop on Raw Image Formats in Structural Biology.

Represented were individuals from the x-ray, electron, and optical communities, who discussed the distinct impediments and approaches particular to their scientific imagery. It was noted that each community has solved its imaging problems independently, resulting in a multitude of incompatible image formats. It was also recognized that there is a significant opportunity to assert shared interests and coordinate strategies that would lead these groups toward a common scientific image framework.

The universal concerns are (1) the unmanageable growth in pixels, driven by improving digital acquisition resolutions, faster acquisition speeds, and expanding user expectations; (2) the inflexibility of current image file formats in adapting to future imaging modalities; and (3) the absence of a standardized, archive-ready scientific image model into which all of these multi-modal pixels could be integrated.

Pixel datasets are becoming extremely large and consequently more difficult for researchers to interact with, because outdated file-format designs perform poorly in computation and storage. Although homogeneous pixels constitute nearly the entire image, the heterogeneous community-specific metadata is also growing, driven by the requirement to better define and incorporate supplemental observations throughout the research pipeline.

The rapid evolution of scientific instrumentation is producing a constellation of imaging modalities that offer new and exciting ways to visualize biological processes; unfortunately, existing image formats lack the extensible designs needed to assimilate new types of imagery, resulting in the recurrent practice of creating new, incompatible image formats for each new pixel modality and its metadata. What is commonly absent is an n-dimensional scientific pixel model supported by a high-performance computational storage tool with long-term maintenance and stability; in its absence, images are becoming increasingly impenetrable to visualization and analysis. As instrumentation improves and data rates increase, a tension arises between the need for high efficiency in upstream data collection and the computational demands of the formats best suited to long-term data management. There is considerable doubt that a "one-size-fits-all" single-format approach is workable, which argues for data frameworks built from multiple interoperable formats rather than single rigid formats.

This conundrum is further compounded by the fact that downstream communities, such as visualization software developers and repositories, have limited or no involvement in the design of these image formats. Yet it is these downstream communities who frequently bear the burden of long-term image management and who attest to the veracity and provenance of the imagery. There is a critical need to engage the perspectives of downstream users, such as biologists, archivists, and computer, visualization, and library scientists, in the requirements and design of this scientific image model. For complex multi-modal integration, it is essential that the pixel and metadata frameworks created by the acquisition and end-user communities be at least minimally coordinated, so as to avoid data-space collisions and to allow for methods of provenance.

In a related issue, the relevance of the DICOM clinical imaging format must be understood and strategically addressed. Although this is less of a concern for the beamline communities, given their limited involvement with clinical imagery, DICOM’s current plans include the imaging modalities of electron and optical microscopy, which raises questions about interoperability between the clinical and non-clinical biomedical research communities.

An important coincidence noted at the meeting is that each MEDSBIO community is independently prototyping image formats using the Hierarchical Data Format (HDF). Developed by the National Center for Supercomputing Applications (NCSA), HDF is a data storage standard that currently holds the vast majority of atmospheric, oceanic, climate, and earth-observation imagery. Establishing a common image strategy using HDF presents a significant opportunity to coordinate many scientific communities’ imaging designs, resulting in a unified, scalable, high-performance, multi-modal, n-dimensional, open-source specification capable of encapsulating community-specific metadata in concert with a common pixel model.
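
To make the encapsulation concrete, the sketch below (Python with the h5py binding to HDF5) stores a single n-dimensional pixel dataset under a common model while keeping each community's metadata in its own group. Every group, dataset, and attribute name here is hypothetical, illustrating the layout rather than any agreed specification.

<verbatim>
# Sketch of one possible HDF5 layout (Python, h5py): a common
# n-dimensional pixel model plus per-community metadata groups.
# All names below are hypothetical, not an agreed specification.
import h5py
import numpy as np

with h5py.File("specimen_stack.h5", "w") as f:
    # Common pixel model: a single chunked, compressed n-D dataset.
    pixels = f.create_dataset(
        "pixels/intensity",
        data=np.zeros((16, 512, 512), dtype=np.uint16),  # z, y, x
        chunks=(1, 256, 256),
        compression="gzip",
    )
    pixels.attrs["dimension_order"] = "zyx"
    pixels.attrs["voxel_size_um"] = (0.5, 0.1, 0.1)

    # Each community's metadata lives in its own group (namespace),
    # so extensions from different communities never collide.
    em = f.create_group("metadata/electron_microscopy")
    em.attrs["accelerating_voltage_kv"] = 300.0

    om = f.create_group("metadata/optical_microscopy")
    om.attrs["objective_na"] = 1.4
</verbatim>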

Also discussed was the vexing issue of _“social inertia preventing adoption”_, attributed to the ubiquity and variety of image formats routinely accessed by the wide range of existing research software and commercial image packages. It is nearly impossible to convince researchers to abandon their operational data formats while simultaneously persuading software developers to invest time coding for new, hypothetical image formats with uncertain futures. A new, advanced image format must therefore be able to present images in the old, familiar file formats, requiring no change to existing application software, while also exposing them through a scalable image-model API that software developers, researchers, and vendors can adopt when they are ready to make the technical transition to the advanced, high-performance design.

Presented at the MEDSBIO workshop was a prototype described as HDF-VFS. When an HDF file is mounted as a virtual file system, a hyperspace is created between (1) the operating-system process space that manages mass storage and (2) the user-space applications requesting access to the image files on that storage. This hyperspace enables the intriguing possibility of systematically intercepting, re-organizing, and transparently converting images through community-specific modules. This allows for two crucial facilities: the ability to efficiently manage pixels in a variety of image formats through a common pixel model, and the ability to transparently present those images in a variety of formats, in a manner that all existing software applications can immediately utilize.
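
The following read-only sketch suggests how such a virtual file system might be assembled from existing open-source pieces, here Python with the fusepy and h5py libraries: each HDF5 group appears as a directory and each dataset as a plain file of raw pixel bytes, so unmodified applications can browse the container. The community-specific conversion modules described above (for example, presenting a dataset as TIFF or JPEG) would slot into the read path; they are omitted here, and none of this reflects the actual HDF-VFS prototype internals.

<verbatim>
# Read-only sketch of the HDF-VFS idea using FUSE (via fusepy) and
# h5py. Groups become directories; datasets become raw-byte files.
import errno
import stat
import sys

import h5py
from fuse import FUSE, FuseOSError, Operations


class HDFVfs(Operations):
    def __init__(self, hdf_path):
        self.h5 = h5py.File(hdf_path, "r")

    def _node(self, path):
        if path == "/":
            return self.h5
        try:
            return self.h5[path.lstrip("/")]
        except KeyError:
            raise FuseOSError(errno.ENOENT)

    def getattr(self, path, fh=None):
        node = self._node(path)
        if isinstance(node, h5py.Dataset):
            size = node.size * node.dtype.itemsize
            return {"st_mode": stat.S_IFREG | 0o444,
                    "st_nlink": 1, "st_size": size}
        return {"st_mode": stat.S_IFDIR | 0o755, "st_nlink": 2}

    def readdir(self, path, fh):
        return [".", ".."] + list(self._node(path).keys())

    def read(self, path, size, offset, fh):
        # Naive: materializes the whole dataset on every call; a real
        # module would read only the needed chunks, or convert formats.
        raw = self._node(path)[...].tobytes()
        return raw[offset:offset + size]


if __name__ == "__main__":
    # Usage: python hdf_vfs.py image.h5 /mnt/images
    FUSE(HDFVfs(sys.argv[1]), sys.argv[2], foreground=True, ro=True)
</verbatim>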

It is typical research practice to rely on the computer’s hierarchical file system, using folder and file names to manage and organize image files. This familiar organizational framework can provide subtle cues for guiding HDF-VFS in transparently converting and archiving imagery into a common image model. Additionally, the community-specific modules within the hyperspace could automatically assign LSIDs to JPEG images as they are copied into HDF-VFS; or, drawing on community-specific metadata and folder hierarchies, the creation of images could automatically generate NISO Metadata for Images in XML Schema (MIX) records. This methodology could also become a vehicle for operationally implementing the pending JPEG 2000 Part 10 (JP3D, lossless spatial standard) in a coherent and reliable manner, or for coordinating MPEG-4 imagery with MPEG-7 metadata.
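
A minimal sketch of that idea follows: a hypothetical module derives metadata from the familiar folder hierarchy and mints an LSID as a file enters the virtual file system. The path convention (lab/instrument/specimen/file), the LSID authority "example.org", and the output record layout are all invented for illustration; a production module would emit a schema-valid NISO MIX record rather than this plain dictionary.

<verbatim>
# Sketch: turn folder/file-name cues into a provisional metadata
# record and a freshly minted LSID. All conventions are hypothetical.
import pathlib
import uuid


def describe(path_in_vfs):
    """Derive metadata from the assumed lab/instrument/specimen layout."""
    p = pathlib.PurePosixPath(path_in_vfs)
    lab, instrument, specimen = p.parts[:3]
    return {
        # Hypothetical LSID: urn:lsid:<authority>:<namespace>:<id>
        "lsid": "urn:lsid:example.org:images:%s" % uuid.uuid4(),
        "lab": lab,
        "instrument": instrument,
        "specimen": specimen,
        "source_file": p.name,
    }


print(describe("loci/confocal-2/zebrafish-embryo/img_0001.jpg"))
</verbatim>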

The enabling issues of funding and implementation should lead to the development of an open standard for the entire scientific community, suitable for bioinformatics. A frequent attitude is that this is a technical issue to be resolved independently by each sub-community’s researchers and vendors, and that it will somehow evolve into a generic standard for scientific end users and archives. Unlike other standards activities, such as MPEG for digital video and DICOM for clinical imagery, whose underlying constituencies obliged vendors to fund and produce such specifications for reasons of economics and reliability, the overall scientific community is hard pressed to find vendors who see it as their core business mission to take on the whole problem at the trench level. The tactical difficulty is that the image pipelines created by the scientific communities are vast, parochial, and loosely organized; as such, the scientific imaging communities are unable to press vendors into forming and financing such a wide consensus. Because the implications are so profound and strategic, failure to establish a scalable, n-dimensional scientific image standard, one realized as efficient, interoperable, and open public specifications, would promote a less than optimal research environment and a less certain future capability for data repositories and researchers.

In summary, the advanced image acquisition communities are developing new image modalities while accumulating and disseminating prodigious amounts of biological images. Consequently, new and coherent image format designs are needed, and are being independently developed, for reasons of performance, interoperability, and archiving. There is a rare opportunity to coordinate the various scientific image acquisition formats into an advanced image standard and, at the same time, to obtain a wider set of requirements defined by the downstream communities. It behooves the acquisition community to solicit cooperation and guidance outside its own ranks, for it is the downstream communities who will be the ultimate beneficiaries of the pixels and their evolving metadata.

Matthew Dougherty [[mailto:matthewd@bcm.edu]]

Herbert Bernstein [[mailto:yaya@dowling.edu]]

Kevin Eliceiri [[mailto:eliceiri@wisc.edu]]

The MEDSBIO meeting minutes can be found at:
http://www.medsbio.org/meetings/BNL_May08_imgCIF_Workshop_Report.html

-- Main.MatthewDougherty - 12 Dec 2008