IBM Reference Architecture for Genomics: Speed, Scale, Smarts

An IBM Redpaper publication

Published 20 May 2015

IBM Form #: REDP-5210-00
(24 pages)

Authors: Frank Lee, Ph.D.


Genomic medicine promises to revolutionize biomedical research and clinical care. By investigating the human genome in the context of biological pathways, drug interactions, and environmental factors, it is now possible for genomic scientists and clinicians to identify individuals at risk of disease, provide early diagnoses based on biomarkers, and recommend effective treatments.

However, the field of genomics has been caught in a flood of data as huge amounts of information are generated by next-generation sequencers and rapidly evolving analytical platforms such as high-performance computing clusters.

This data must be quickly stored, analyzed, shared, and archived, but many genome, cancer, and medical research institutions and pharmaceutical companies are now generating so much data that it can no longer be processed in a timely manner, stored properly, or even transmitted over regular communication lines. Often they resort to disk drives and shipping companies to transfer raw data to external computing centers for processing and storage, which slows access to and analysis of the data.

In addition to scale and speed, it is also important for all the genomics information to be linked based on data models and taxonomies, and to be annotated with machine or human knowledge. This smart data can then be factored into the equation when dealing with genomic, clinical, and environmental data, and be made available to a common analytical platform.

To address the challenging needs for speed, scale, and smarts for genomic medicine, an IBM® end-to-end reference architecture has been created that defines the most critical capabilities for genomics computing: Data management (Datahub), workload orchestration (Orchestrator), and enterprise access (AppCenter).

The IBM Reference Architecture for genomics can be deployed with various infrastructure and informatics technologies. IBM has also been working with a growing ecosystem of customers and partners to enrich the portfolio of solutions and products that can be mapped into the architecture.

This IBM Redpaper™ publication describes the following topics:

  • Overview of IBM Reference Architecture for Genomics
  • Datahub for data management
  • Orchestrator for workload management
  • AppCenter for managing user interface

This paper is targeted toward technical professionals (scientists, consultants, IT architects, and IT specialists) responsible for creating and providing life sciences solutions.

Table of contents

Overview of IBM Reference Architecture for Genomics
Datahub for data management
Orchestrator for workload management
AppCenter for managing user interface