Translational bioinformatics
Translational bioinformatics (TBI) is an emerging field in the study of health informatics, focused on the convergence of molecular bioinformatics, biostatistics, statistical genetics and clinical informatics. It applies informatics methodology to the increasing amount of biomedical and genomic data to formulate knowledge and medical tools that can be used by scientists, clinicians, and patients. It also involves applying biomedical research to improve human health through the use of computer-based information systems. TBI employs data mining and the analysis of biomedical information to generate clinical knowledge for application. Clinical knowledge includes finding similarities in patient populations and interpreting biological information to suggest therapies and predict health outcomes.
History
Translational bioinformatics is a relatively young field within translational research. Google Trends indicates that use of the term "bioinformatics" has decreased since the mid-1990s, when it was suggested as a transformative approach to biomedical research. The term was coined, however, close to ten years earlier. TBI was then presented as a means to facilitate data organization, accessibility and improved interpretation of the available biomedical research. It was considered a decision support tool that could bring into decision-making processes biomedical information that would otherwise have been omitted due to the nature of human memory and thinking patterns.
Initially, the focus of TBI was on ontology and vocabulary designs for searching the mass data stores. However, this effort was largely unsuccessful, as preliminary attempts at automation resulted in misinformation. TBI needed to develop a baseline for cross-referencing data with higher-order algorithms in order to link data, structures and functions in networks. This went hand in hand with a focus on developing curricula for graduate-level programs and on capitalizing, for funding purposes, on the growing public acknowledgement of the potential opportunity in TBI.
When the first draft of the human genome was completed in the early 2000s, TBI continued to grow and demonstrate prominence as a means to bridge biological findings with clinical informatics, expanding opportunities in both biology and healthcare. Expression profiling, text mining for trends analysis, population-based data mining providing biomedical insights, and ontology development have been explored, defined and established as important contributions to TBI. Achievements of the field that have been used for knowledge discovery include linking clinical records to genomics data, linking drugs with ancestry, whole genome sequencing for a group with a common disease, and semantics in literature mining. There has been discussion of cooperative efforts to create cross-jurisdictional strategies for TBI, particularly in Europe. The past decade has also seen the development of personalized medicine and data sharing in pharmacogenomics. These accomplishments have solidified public interest, generated funds for investment in training and further curriculum development, increased demand for skilled personnel in the field and pushed ongoing TBI research and development.
Benefits and opportunities
At present, TBI research spans multiple disciplines; however, the application of TBI in clinical settings remains limited. Currently, it is partially deployed in drug development, regulatory review, and clinical medicine. The opportunity for application of TBI is much broader, as medical journals increasingly mention the term "informatics" and discuss bioinformatics-related topics. TBI research draws on four main areas of discourse: clinical genomics, genomic medicine, pharmacogenomics, and genetic epidemiology. There are increasing numbers of conferences and forums focused on TBI to create opportunities for knowledge sharing and field development. General topics that appear in recent conferences include: (1) personal genomics and genomic infrastructure, (2) drug and gene research for adverse events, interactions and repurposing of drugs, (3) biomarkers and phenotype representation, (4) sequencing, science and systems medicine, (5) computational and analytical methodologies for TBI, and (6) application of bridging genetic research and clinical practice.
With the help of bioinformaticians, biologists are able to analyze complex data, set up websites for experimental measurements, facilitate sharing of the measurements, and correlate findings to clinical outcomes. Translational bioinformaticians studying a particular disease would have more sample data regarding that disease than an individual biologist studying it alone.
Since the completion of the human genome, new projects are now attempting to systematically analyze all the gene alterations in a disease like cancer rather than focusing on a few genes at a time. In the future, large-scale data will be integrated from different sources in order to extract functional information. The availability of a large number of human genomes will allow for statistical mining of their relation to lifestyles, drug interactions, and other factors. Translational bioinformatics is therefore transforming the search for disease genes and is becoming a crucial component of other areas of medical research including pharmacogenomics.
In a study evaluating the computational and economic characteristics of cloud computing in performing a large-scale data integration and analysis of genomic medicine, cloud-based analysis had similar cost and performance in comparison to a local computational cluster. This suggests that cloud-computing technologies might be a valuable and economical technology for facilitating large-scale translational research in genomic medicine.
Storage
Vast amounts of bioinformatics data are currently available and continue to increase. For instance, the GenBank database, funded by the National Institutes of Health (NIH), currently holds 82 billion nucleotides in 78 million sequences from 270,000 species. The equivalent of GenBank for gene expression microarrays, known as the Gene Expression Omnibus (GEO), has over 183,000 samples from 7,200 experiments, and this number doubles or triples each year. The European Bioinformatics Institute (EBI) has a similar database called ArrayExpress, which has over 100,000 samples from over 3,000 experiments. Altogether, TBI has access to more than a quarter million microarray samples at present.
To extract relevant data from large data sets, TBI employs various methods such as data consolidation, data federation, and data warehousing. In the data consolidation approach, data are extracted from various sources and centralized in a single database. This approach enables standardization of heterogeneous data and helps address issues of interoperability and compatibility among data sets. However, users of this method often encounter difficulties in updating their databases, as it is based on a single data model. In contrast, the data federation approach links databases together and extracts data on a regular basis, then combines the data for queries. The benefit of this approach is that it enables the user to access real-time data on a single portal. Its limitation is that the data collected may not always be synchronized, as they are derived from multiple sources. Data warehousing provides a single unified platform for data curation. It integrates data from multiple sources into a common format, and is typically used in bioscience exclusively for decision support purposes.
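The contrast between consolidation and federation can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two "labs", their field names, and the expression values are invented, and real systems would sit on top of databases rather than in-memory lists. Consolidation copies normalized records into one central store at extraction time, while federation leaves records in place and normalizes them only when a query arrives.

```python
import math


def normalize_lab_a(rec):
    """Map a hypothetical lab A record (log2 units) to the shared model."""
    return {"gene": rec["gene"], "log2_expr": rec["expr_log2"], "source": "lab_a"}


def normalize_lab_b(rec):
    """Map a hypothetical lab B record (linear units) to the shared model."""
    return {
        "gene": rec["symbol"],
        "log2_expr": math.log2(rec["expr_linear"]),
        "source": "lab_b",
    }


def consolidate(lab_a, lab_b):
    """Consolidation: extract every record once into a single central store."""
    return [normalize_lab_a(r) for r in lab_a] + [normalize_lab_b(r) for r in lab_b]


def federated_query(lab_a, lab_b, gene):
    """Federation: leave data at the sources and query them at request time."""
    hits = [normalize_lab_a(r) for r in lab_a if r["gene"] == gene]
    hits += [normalize_lab_b(r) for r in lab_b if r["symbol"] == gene]
    return hits
```

In this sketch, a record added to a source after `consolidate` has run is missing from the central store until the extraction is repeated, whereas `federated_query` sees it immediately. The price of federation is that every query depends on each source being reachable and mutually consistent at that moment.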
Analytics
Analytic techniques serve to translate biological data from high-throughput techniques into clinically relevant information. Currently, numerous software tools and methodologies for querying data exist, and this number continues to grow as more studies are conducted and published in bioinformatics journals such as Genome Biology, BMC Bioinformatics, BMC Genomics, and Bioinformatics. To ascertain the best analytical technique, tools such as Weka have been created to sift through the array of software and select the most appropriate technique, abstracting away the need to know a specific methodology.
Integration
Data integration involves developing methods that use biological information for the clinical setting. Integrating data empowers clinicians with tools for data access, knowledge discovery, and decision support. Data integration serves to utilize the wealth of information available in bioinformatics to improve patient health and safety. An example of data integration is the use of decision support systems (DSS) based on translational bioinformatics. DSS used in this regard identify correlations in patient electronic medical records (EMR) and other clinical information systems to assist clinicians in their diagnoses.
Companies are now able to provide whole human genome sequencing and analysis as a simple outsourced service. Second- and third-generation versions of sequencing systems are planned to increase the number of genomes per day, per instrument, to 80. According to Cliff Reid, the CEO of Complete Genomics, the total market for whole human genome sequencing around the world increased five-fold during 2009 and 2010, and was estimated to be 15,000 genomes for 2011. Furthermore, he maintained that the company would still be able to make a profit if the price were to fall to $1,000 per genome. The company is also working on process improvements to bring down the internal cost to around $100 per genome, excluding sample-prep and labor costs.
According to the National Human Genome Research Institute (NHGRI), the cost of sequencing an entire genome decreased significantly, from over $95 million in 2001 to $7,666 in January 2012. Similarly, the cost of determining one megabase (a million bases) decreased from over $5,000 in 2001 to $0.09 in 2012. In 2008, sequencing centers transitioned from Sanger-based (dideoxy chain termination) sequencing to 'second-generation' (or 'next-generation') DNA sequencing technologies, which caused a significant drop in sequencing costs.
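To put the NHGRI figures in proportion, the following Python sketch computes the fold reduction implied by each pair of numbers quoted above. The helper name `fold_reduction` is introduced here purely for illustration. Because the 2001 figures are stated as "over" $95 million and "over" $5,000, the results should be read as lower bounds rather than exact values.

```python
def fold_reduction(start_cost, end_cost):
    """Return how many times cheaper end_cost is than start_cost."""
    return start_cost / end_cost


# Figures as quoted in the text; the 2001 values are lower bounds ("over ...").
genome_fold = fold_reduction(95_000_000, 7_666)  # cost per genome, 2001 vs. January 2012
megabase_fold = fold_reduction(5_000, 0.09)      # cost per megabase, 2001 vs. 2012

print(f"Per-genome cost fell at least {genome_fold:,.0f}-fold")
print(f"Per-megabase cost fell at least {megabase_fold:,.0f}-fold")
```

On these numbers, the per-genome cost fell by a factor of roughly twelve thousand and the per-megabase cost by a factor of more than fifty thousand over about a decade.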
Future directions
TBI has the potential to play a significant role in medicine; however, many challenges still remain. The overarching goal for TBI is to "develop informatics approaches for linking across traditionally disparate data and knowledge sources enabling both the generation and testing of new hypotheses". Current applications of TBI face challenges due to a lack of standards, which results in diverse data collection methodologies. Furthermore, analytic and storage capabilities are hindered by the large volumes of data present in current research. This problem is projected to grow with personal genomics, which will create an even greater accumulation of data.
Challenges also exist in the research of drugs and biomarkers, genomic medicine, protein design, metagenomics, infectious disease discovery, data curation, literature mining, and workflow development. Continued belief in the opportunity and benefits of TBI justifies further funding for infrastructure, intellectual property protection and accessibility policies.
Available funding for TBI has increased in the past decade. The demand for translational bioinformatics research is due in part to growth in numerous areas of bioinformatics and health informatics and in part to popular support for projects like the Human Genome Project. This growth and influx of funding have enabled the industry to produce assets such as a repository of gene expression data and genomic-scale data while also making progress towards the concept of creating a $1,000 genome and completing the Human Genome Project. It is believed by some that TBI will cause a cultural shift in the way scientific and clinical information is processed within the pharmaceutical industry, regulatory agencies, and clinical practice. It is also seen as a means to shift clinical trial designs away from case studies and towards EMR analysis.
Leaders in the field have presented numerous predictions regarding the direction TBI is taking and should take. A collection of predictions is as follows:
1. Lesko (2012) states that a strategy must be put in place in the European Union to bridge the gap between academia and industry in the following ways (directly quoted):
   - Validate and publish informatics data and technology models to accepted standards in order to facilitate adoption,
   - Transform electronic health records to make them more accessible and interoperable,
   - Encourage information sharing, engage regulatory agencies, and
   - Encourage increasing financial support to grow and develop TBI
2. Altman (2011), at the 2011 AMIA Summit on TBI, predicts that:
   - Cloud computing will contribute to major biomedical discovery.
   - Informatics applications to stem cell science will increase
   - Immune genomics will emerge as powerful data
   - Flow cytometry informatics will grow
   - Molecular & expression data will combine for drug repurposing
   - Exome sequencing will persist longer than expected
   - Progress in interpreting non-coding DNA variations
3. Sarkar, Butte, Lussier, Tarczy-Hornoch & Ohno-Machado (2011) state that the future of TBI must establish a way to manage the large amount of available data and look to integrate findings from projects such as the eMERGE (Electronic Medical Records and Genomics) project funded by NIH, the Personal Genome Project, the Exome Project, the Million Veteran Program and the 1000 Genomes Project.
"In an information-rich world, the wealth of information means a dearth of something else—a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that it might consume." (Herbert Simon,1971).
Associations, conferences and journals
Below is a list of existing associations, conferences and journals that are specific to TBI. This list is by no means all-inclusive and should be expanded as others are discovered.