An advanced IT platform for molecular Biodiversity studies
One of the main goals of our project is to design and set up an advanced informatics platform that makes a large body of data and tools for molecular Biodiversity research available to the researchers on our team. To this end we designed a service-oriented platform that will provide, through both the web services standard and the REST paradigm, a number of basic and advanced tools concerning:
• Data integration and mining: a number of procedures and components have been designed to implement an integrated data and knowledge base that provides our researchers with the most current, complete and useful repository from which to extract information valuable for our research areas. It covers both the Biodiversity and the clinical domains.
• Analytics: Text mining and, more generally, tools that extract organized information from unstructured information are a fundamental need in all modern disciplines where huge amounts of data are available. We are developing new analytics algorithms to extract organized information from unstructured sources (in our case both text and images).
• Semantics: Through semantic approaches we are trying to label (e.g. by adopting the LSID standard), catalogue and systematize every piece of information in our integrated knowledge base.
• Semantic search: To extract information and knowledge efficiently from a source as large as our integrated database, an effective search engine is indispensable. We are currently designing an advanced query engine that will allow users to search our repositories by applying complex search criteria involving entities and the relations among them.
• Bioinformatic Algorithms: A number of publicly available bioinformatic algorithms and a number of new ones developed by our teams have been integrated within the platform and made available to our researchers.
All these components, appropriately aggregated and integrated, combine to build advanced workplaces, applications and solutions focused on accomplishing specific research tasks.
The following picture shows a high-level component view of the bioinformatic platform for Biodiversity we are setting up.
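As an illustration of the LSID labelling mentioned under Semantics, the following is a minimal sketch of how an LSID string could be split into its parts. It relies only on the general LSID layout (`urn:lsid:authority:namespace:object`, with an optional revision); the `Lsid` type, the `parse_lsid` function and the example identifier are hypothetical and are not part of the platform.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Components of a Life Science Identifier (hypothetical helper type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.strip().split(":")
    # The 'urn:lsid' prefix is compared case-insensitively.
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty component: {text!r}")
    # The revision is optional; an empty trailing field is treated as absent.
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(authority, namespace, object_id, revision)


# Illustrative identifier only; 'example.org' and 'specimens' are made up.
label = parse_lsid("urn:lsid:example.org:specimens:12345")
```

Keeping the authority and namespace as separate fields makes it straightforward to catalogue items by the organization and collection that issued them.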
An integrated repository for molecular Biodiversity studies
The core component of the MBLab IT platform is the integrated repository, which aggregates data and information from all the studies MBLab is carrying out. The repository is conceived as the container for both the data produced within our labs and the public data useful for Biodiversity studies. It stems from our effort to provide our researchers with data and knowledge bases in which relevant information is integrated either physically (data warehousing) or virtually (data federation), so that it can serve as the main information source for our research activities.
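The difference between physical and virtual integration can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the MBLab implementation: the `IntegratedRepository` class, its method names and the `public_db` source are all hypothetical. Warehoused records are copied into a local store, while federated records stay in their original source and are fetched on demand.

```python
from typing import Callable, Dict, Optional

Record = Dict[str, str]
Fetcher = Callable[[str], Optional[Record]]


class IntegratedRepository:
    """Toy repository mixing warehoused and federated records (hypothetical)."""

    def __init__(self, remote_sources: Dict[str, Fetcher]) -> None:
        self._warehouse: Dict[str, Record] = {}   # physically integrated data
        self._remote_sources = remote_sources     # virtually integrated data

    def load(self, record_id: str, record: Record) -> None:
        """Data warehousing: copy the record into the local store."""
        self._warehouse[record_id] = dict(record)

    def get(self, record_id: str) -> Optional[Record]:
        """Return a local copy if present, otherwise ask each remote source."""
        if record_id in self._warehouse:
            return self._warehouse[record_id]
        for name, fetch in self._remote_sources.items():
            record = fetch(record_id)
            if record is not None:
                # Data federation: the record is fetched at query time
                # and tagged with the source it came from.
                return {**record, "source": name}
        return None


# Stand-in for a public database; a real source would be a remote service.
def fake_public_db(record_id: str) -> Optional[Record]:
    return {"id": record_id} if record_id == "P1" else None


repo = IntegratedRepository({"public_db": fake_public_db})
repo.load("L1", {"id": "L1"})
```

In this sketch the caller uses the same `get` call for both kinds of data, which is the property that lets a single repository act as the main information source.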
The entity model of this integrated repository is the result of our study of the state of the art in data representation in the Biodiversity and Bioinformatics domains. In particular, it benefits from the results achieved in projects such as Chado, Sequence Ontology, BOLD and others. It has been physically implemented as a relational schema that enables optimized data retrieval. A high-level view of the entity model is shown in the picture below.
This data model is supported by a number of ontologies, both public (e.g. Gene Ontology, Sequence Ontology, etc.) and private, that reduce the ambiguity typical of the clinical and bioinformatic domains and enhance the precision with which concepts and relations are characterized.
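One way ontologies can reduce ambiguity in a relational schema is to type every record with a controlled term and to resolve synonyms to that term at query time. The sketch below shows the idea with SQLite. The table names, the synonym "gene locus" and the feature labels are assumptions made for illustration and do not reproduce the actual MBLab schema; only the Sequence Ontology accession `SO:0000704` for "gene" is taken from the public ontology.

```python
import sqlite3

# Hypothetical miniature schema: features are typed by ontology terms.
SCHEMA = """
CREATE TABLE term (
    term_id   INTEGER PRIMARY KEY,
    accession TEXT NOT NULL UNIQUE,
    name      TEXT NOT NULL
);
CREATE TABLE term_synonym (
    synonym TEXT NOT NULL,
    term_id INTEGER NOT NULL REFERENCES term(term_id)
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    label      TEXT NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES term(term_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO term VALUES (1, 'SO:0000704', 'gene')")
    # Illustrative synonym, not taken from the ontology itself.
    conn.execute("INSERT INTO term_synonym VALUES ('gene locus', 1)")
    conn.executemany(
        "INSERT INTO feature (label, type_id) VALUES (?, ?)",
        [("feat_A", 1), ("feat_B", 1)],
    )
    return conn


def features_of_type(conn: sqlite3.Connection, user_term: str) -> list:
    """Return feature labels whose type matches a term name or synonym."""
    rows = conn.execute(
        """
        SELECT DISTINCT f.label
        FROM feature AS f
        JOIN term AS t ON t.term_id = f.type_id
        LEFT JOIN term_synonym AS s ON s.term_id = t.term_id
        WHERE lower(t.name) = lower(?) OR lower(s.synonym) = lower(?)
        ORDER BY f.label
        """,
        (user_term, user_term),
    ).fetchall()
    return [label for (label,) in rows]


conn = build_demo_db()
```

Because both the canonical name and its synonyms resolve to the same `term_id`, two users who phrase a query differently retrieve the same set of features.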