SlideShare una empresa de Scribd logo
1 de 6
• Cognizant 20-20 Insights




Better Target Identification and
Validation Through an Integrative
Analysis of Biological Data
   Executive Summary                                      poses a peculiar challenge. In these fields, rela-
                                                          tionships among data sets are rarely simple and
   Pharmaceutical target identification and valida-
                                                          often are not apparent without deeper investi-
   tion today is an exercise in complex data mining.
                                                          gation. So geologists study aerial photography,
   The amount, breadth and depth of biological data
                                                          satellite images, rock analysis and seismographic
   available for such mining is increasing exponen-
                                                          data to attempt to locate oil basins; meteorolo-
   tially, signaling both opportunities and challenges
                                                          gists examine ocean currents and surface tem-
   for the biopharma industry.
                                                          peratures, barometric pressure, polar ice cover
   More data should lead to more insights and better      and more to predict climate conditions.
   decisions. However, the sheer volume of available
                                                          The drug discovery process similarly requires
   data is overwhelming. Further, biological data
                                                          assimilation and analysis of seemingly discon-
   findings must be considered in the context in which
                                                          nected data sources with the goal of gaining
   they were discovered and in light of their inter-
                                                          insights for forming a hypothesis to validate
   actions and/or dependencies on other data sets
                                                          through experimentation.
   and conditions. Integrating a wide variety of data
   sets with such understanding of their contexts is      In the initial steps of drug discovery, comprised
   a major logistical hurdle for biologists.              of target identification and validation, knowledge
                                                          of disease biology is crucial for picking the right
   To fully capitalize on the rich biological data
                                                          targets. Arguably the biggest breakthrough
   sources available today, scientists require a tech-
                                                          to that end was the completion of the human
   nological platform that eases data integration and
                                                          genome sequence in 2000.
   comparison across diverse types and sets of data
   regardless of their sources. The complexity of the     However, the genome sequence is static; it does
   technology supporting this platform should be          not reveal the dynamic role of the targets in a
   hidden while it offers an easy, streamlined means      variety of cellular circumstances. Today, this
   of interpreting results.                               data is available through genomics technologies
                                                          like microarrays, which can be used to measure
   Dynamic Context and Meaningful
                                                          thousands of mRNAs or DNA or proteins at
   Biological Insights                                    the same time. Now scientists have data at a
   In predictive fields such as oil exploration, compu-   molecular level along multiple cellular/molecular
   tational finance, or climatology, data abundance       dimensions that can help them understand the




   cognizant 20-20 insights | august 2011
dynamic roles of targets in normal and diseased            millions of data points with varied meanings.
          biological processes.                                      Each of these data types is distinct in terms
                                                                     of its format, analysis and interpretation. For
          The data volumes in public “omics”* data reposi-           example, the normalization of CGH data needs
          tories, like the Gene Expression Omnibus (GEO),            to be handled very differently from that of
          have grown exponentially in the decade since               gene expression data.
          the human genome was sequenced. A snapshot
          of the type of data sets in GEO reveals that           •   Context as critical as data. The context of
          omics data sets can be of varying types and that           the experimental data is extremely important
          microarray technology can be used to generate              in the biological world. To generate quality
          functionally diverse data sets.                            inferences and hypotheses, scientists need a
                                                                     thorough understanding of how the experi-
                             Omics technological advances            ments were performed. This context must be
      Locked within          like gene expression microar-           preserved while integrating data for analysis.
these huge volumes           rays and aCGH1 offering deeper
                                                                 •   Multiple public and proprietary data sources.
     of data about a         revelations about siRNAs2 and
                                                                     A significant number of public repositories hold
                             miRNAs3 have enabled us to
    cell’s DNA, RNA          produce diverse and mammoth
                                                                     valuable information about the experimental
                                                                     findings of various facets of cell biology. In
   and proteins and          volumes of data. In fact millions
                                                                     addition, every biopharma organization has
   the relationships         of data points can be produced
                                                                     its own internal data sources and maintains
                             for a whole genome single run
    among them are           and aggregated to even larger
                                                                     specific research findings. The types of data
                                                                     and the sheer divergence of available data
the information and          quantities in a very short time.
                                                                     sources and the variations in data formats
   insights required         Concomitant with the explosion          among these entities make it a formidable
     to successfully         of data is the need for spe-            task for scientists to maneuver through the
                                                                     data maze. Each of the data sources captured
         identify and        cialized data skills. Target
                                                                     contains different levels of detail, so compari-
                             identification has become
 validate high-value         a data mining exercise. The             sons can be challenging.
therapeutic targets.         current challenge is how one        •   Lack of universally accepted standards.
                             can understand the dynamic              The lack of common standards for data sets
          context of the data, and analyze and interpret it          results in the uncontrolled proliferation of data
          to gain meaningful biological insights. It’s chal-         as each research group describes the context
          lenging for the lead biologist working on a target         and the results of experiments in its own way.
          to interpret and integrate the vast amount of              Though standards such as Minimum Informa-
          numeric data available in his decision-making              tion About Microarray Experiments (MIAME)
          process. This leads to underuse or improper use            have been developed, multiple variants of
          of high throughput data sets.                              MIAME have evolved to describe the complex
                                                                     biological microarray experiments. These dis-
          The Challenges of Mining Today’s                           parities mean scientists must manually look
          Trove of Biological Data                                   into multiple databases and correlate data
          Locked within these huge volumes of data about             among them to validate the observations — a
          a cell’s DNA, RNA and proteins and the rela-               time-consuming, tiresome task.
          tionships among them are the information and
          insights required to successfully identify and
                                                                 •   Need for complex visualizations. Visualiza-
                                                                     tion plays a key role in finding the patterns in
          validate high-value therapeutic targets. Yet the
                                                                     the underlying data. The visualization tools
          very nature of the data and how it is uncovered
                                                                     for omics data have evolved from simple
          create further challenges to using it effectively.
                                                                     heatmaps to networks overlaying omics data
          These challenges include:
                                                                     and are continuing to evolve into three and
          •   Multiple data types from multiple high-                four dimensions incorporating time series
              throughput     technologies. High-through-             information. Researchers use various tools
              put technologies have evolved to measure               like Spotfire, R packages, Matlab, Excel and
              different molecular entities in a cell, such as        Cytoscape for visualizations. Typically, no
              DNA, RNA, protein, methylation status, etc.            single tool provides all the types of visualiza-
              Each high-throughput experiment results in             tions required by an omics researcher. Often,




                                  cognizant 20-20 insights       2
researchers must develop custom visual-               hypothesis generation even as it makes those
    izations to address their specific research           hypotheses more relevant.
    questions.
                                                          We believe that an effective data integration
•   Fast-evolving domain technologies. The                solution enables productivity gains of at least
    microarray technology that revolutionized             20%. Some of the key solution components that
    biological data generation is now common,             will drive such productivity gains are:
    and several other high-throughput technolo-
    gies, like next-generation sequencing, are            •   Faster insight generation.
    evolving that further push the boundaries of              The platform should inte- We believe that
    data generation. As they do so, the challenge             grate various public data an effective data
    of effectively mining this data to establish              sources and have the ability
    useful biological insights becomes increasingly           to combine those with com-
                                                                                            integration solution
    critical to address.                                      mercial and private data. It enables productivity
                                                              should help propagate bio- gains of at least 20%.
•   Lack of a common platform. In many modern
    pharmaceutical      research     organizations,           logical insights using any
    integrated access to all the data needed to               of the common identifiers and be easy to use,
    advance a discovery project is often impossible           enabling scientists to more easily connect the
    to achieve because it is spread across a variety          data to reach insights.
    of databases and visualization and analysis           •   More revelation of disease mechanisms.
    tools. Thus, the scientists maintain their own            Integrating different types of data should
    personal copies of the data, typically in the             help researchers investigate and understand
    form of Excel spreadsheets, an inefficient                disease mechanisms more thoroughly,
    solution.                                                 which assists in building the target/disease
                                                              association.
•   Annotated integration. Integration in the
    usual sense is assimilating the parts into the        •   Maximize utilization of     Oncology has long
    whole. However, this traditional notion of inte-          high throughput data
    gration is not possible with biological data. It is       sets. By providing easy
                                                                                          been a challenging
    much harder to numerically integrate experi-              access to diverse high-     domain because of its
    mental values in such a way that the resulting            throughput data sets,       complex biology. Our
    number represents a biologically meaningful               including those avail-
    phenomenon.                                               able within a company’s
                                                                                          goal with Inventus is
    A more pragmatic approach is to integrate                 research groups and         to help analyze the
    the results from each experiment after                    systems, an integration     data and answer key
                                                              platform would maximize
    analyzing them separately. This approach
                                                              the return on invest-
                                                                                          questions around
    can be extended to any molecular data type.
    For example, tissue microarray data can be                ment on generating such     annotation for
    analyzed independently by the pathologist,                data sets.                  additional identifiers,
    and the resulting inference of over-expression        •   Platform-independent        gene ontology,
    or under-expression of the protein of interest            and extensible. The solu-
    can be easily compared or integrated with the
                                                                                          mutation, expression
                                                              tion should be domain
    over-expression or under-expression inference             platform-independent and    and pathway within
    obtained from an RNA microarray experiment.               be extended to open plat-   the solution through
                                                              forms like caBIG or any
Making Information and Insights                                                           data propagation.
                                                              proprietary platform.
Obvious: An Integrated Solution
Researchers and scientists clearly need an intel-         The Cognizant Approach: Inventus
ligent, intuitive solution to data collation and cor-     We are creating a prototype for a biological data
relation so they can spend the majority of their          integration platform using oncology as a thera-
time interpreting complex biological relationships        peutic area. Oncology has long been a challeng-
and their impact on target identification and             ing domain because of its complex biology. Our
validation. A single technology platform enabling         goal with Inventus is to help analyze the data
a workflow that lets scientists look at different         and answer key questions around annotation for
biological data types in context and to quickly           additional identifiers, gene ontology, mutation,
analyze their relationships would streamline              expression and pathway within the solution



                         cognizant 20-20 insights         3
Inventus Analysis Workflow


                                                                                   Validate by Wetlab
                                                                                      Experiments
                                                                                   Design & Conduct




                                                                       Study                            Identify Gene
                                                                     Pathway                              of Interest
                                                                     Commons                                 NCBI




                                                                                    Study Mutation
                                                                                     & Expression
                                                                                   COSMIC BioGPS




                                   Figure 1


                                   through data propagation. The typical analysis                            To demonstrate the utility of Inventus, we queried
                                   workflow is depicted in Figure 1.                                         for gene TP53, a well-studied tumor suppressor
                                                                                                             gene. The default EntrezGene plug-in displays
                                   As the next step, we investigated existing open                           the basic functional information about TP53. The
                                   integration platforms like Life Science Grid (LSG)                        mutation plug-in displays mutation data for TP53
                                   and cancer Biomedical Informatics Grid (caBIG).                           from COSMIC summarizing the mutations studied/
                                   We built Inventus using LSG because it provided                           identified in various cancer types (see Figure 2).
                                   the necessary minimal information technology                              The scientist is quickly able to understand that
                                   framework. We developed the following plug-ins                            TP53 is frequently mutated in a wide variety of
                                   for Inventus:                                                             cancers. The BioGPS plug-in displays the gene
                                                                                                             expression data from the Novartis gene expression
                                   •   Web services-based pathway data from
                                                                                                             atlas and shows TP53 to be under-expressed (see
                                       Pathway Commons.
                                                                                                             Figure 3). The above data quickly highlights to the
                • Access to mutation data from
ept:
Investigation
of
Gene
TP53 COSMIC.                                                                      scientist that TP53 could be involved in cancer.
                                   •   Web services-based gene annotation informa-
                                       tion from NCBI EntrezGene.                                            The pathway plug-in from Pathway Commons
tation
                                                                                                             displays all the pathways that TP53 participates
ws the general known
                                   •   Portal-based data access to BioGPS.org for
                                                                                                             (see Figure 4). One can observe that TP53 is
P53 like aliases, functional
                                       gene expression.
                                                                                                             involved in cell cycle pathways. Launching the
somal location etc
                                   •   Cytoscape for visualizing pathways.                                   pathway in Cytoscape allows the scientist to
                                                                                                             further understand the interacting partners
derstands that TP53 is a
                                   

                                   Mutation                       Gene Expression
                                     Proof
of
Concept:
Investigation
of
Gene
TP53
                                                  Step 3: Gene Expression
                                              •       Action
ws the mutations studied in                          •      Scientist views the gene expression data
ata sources like COSMIC.                             from public data sources like Novartis Gene
                                                     Atlas.

tices that TP53 is frequently                 •       Inference
cancer types.                                        •     Scientist notices that TP53 could be lowly
                                                     expressed in cancer tissues.
                                              •




                                                  Step 4: Pathways
                                   Figure 2 •         Action                                                 Figure 3
s                   Confidential
                                                     •      Scientist views the pathways that TP53 is
                                                     involved from public pathway data sources like
                                                     Pathway Commons.
                                                                 cognizant 20-20 insights                     4
                                              •       Inference
                                                     •      Scientist notices that TP53 is involved in
                                                     various cell cycle pathways.

        

           Integration
with
Other
Tools
        Integration
with
Other
Tools
    Pathway                                   Cytoscape View
                               View Pathways in Cytoscape
                            View Pathways in Cytoscape




    Scientist analyses the the pathwayscytoscape to understand the the
         Scientist analyses pathways inFigure 5
    Figure 4                              in cytoscape to understand
         interacting partners of TP53.
    interacting partners of TP53.
    of TP53 and use network analysis tools within                      Seizing Target Identification
    Cytoscape to better understand TP53’s role in                      Opportunities, Better Understanding
    pathways and disease (see Figure 5). In cases                      Disease Mechanism
    of a protein like TP53, pathway analysis plays
                                                                       We are committed to helping pharmaceutical
    an ©2010, Cognizant Technology in the discovery process
      | important roleTechnology Solutions
           | ©2010, Cognizant      Solutions                Confidential
                                                    Confidential
                                                                       companies overcome the challenges posed by
    because tumor suppressors have been difficult
                                                                       vast and growing quantities of biological data to
    to reactivate using external intervention. Such
                                                                       seize the opportunities contained within that data.
    proteins may not pose as targets themselves but
                                                                       Drawing on our deep biopharma expertise and
    the downstream or upstream interacting partners
                                                                       experience, we take the initiative in developing
    could be potential targets.
                                                                       solutions to industry issues.
    The above example provides a glimpse of how
                                                                 To learn more about our solutions for integrat-
    the solution is useful to discovery scientists, who
                                                                 ing biological data for true insight generation and
    analyze high-throughput data sets and develop
                                                                 target identification and validation, contact us at
    a list with genes of interest to prioritize for
                                                                 inquiry@cognizant.com.
    additional individual exploration.




    Footnotes
    *
        refers to the study of biological systems; includes genomics (DNA), proteomics (Proteins),
        metabolomics (Small Molecules) etc.
    1
        aCGH — array comparative genomic hybridization
    2
        siRNA — small inhibitory RNA
    3
        miRNA — micro RNA



    References
    NCBI GEO: archive for functional genomics data sets — 10 years on. Tanya Barrett et al Nucleic Acids
    Research Advance Access published November 21, 2010

    LSG Sourceforge: http://sourceforge.net/projects/lsg/

    The Cancer Biomedical Informatics Grid (caBIG Fenstermacher D, Street C, McSherry T, Nayak V, Overby
    C, Feldman M. . Conf Proc IEEE Eng Med Biol Soc. 2005;1:743.




                             cognizant 20-20 insights             5
cPath: open source software for collecting, storing and querying biological pathways. Cerami EG et al
BMC Bioinformatics. 2006 Nov 13;7:497.

COSMIC (the Catalogue of Somatic Mutations in Cancer): a resource to investigate acquired mutations in
human cancer. Forbes SA et al Nucleic Acids Res. 2010 Jan;38 (Database issue)

BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources.
Chunlei Wu et al. Genome Biology 2009 Nov; 10(11)

Cytoscape: a community-based framework for network modeling. Killcoyne S et al Methods Mol Biol.
2009;563:219-39.

http://www.mged.org/Workgroups/MIAME/miame_2.0.html




About the Authors
G.D Mahesh Kumar is a Consulting Manager in Cognizant’s Discovery Informatics Center of Excellence
located in Sweden. He has over seven years of experience working in drug discovery in the pharma and
bioinformatics industries. Mahesh’s core skills include bioinformatics, biological data management and
research projects management. He has presented at international conferences and currently manages
projects and client relationships with European pharma clients. Mahesh holds a master’s degree in bio-
technology from Goa University, Goa, and has been a Project Management Institute member since 2006.
He can be reached at Maheshkumar.g.d@cognizant.com.

Raghuraman Krishnamurthy is a Chief Architect in Cognizant’s Life Sciences Business Unit’s Technology
Consulting Group. Raghu’s core skills include enterprise architecture, data, SOA, mobility application
and convergence of technologies. He has worked with several major pharmaceuticals in envisioning and
leading transformational initiatives, has presented papers at numerous conferences and was recently
named a senior member of the prestigious Association of Computing Machinery (ACM). Raghu holds a
master’s degree from the Indian Institute of Technology, Mumbai, and is a TOGAF-certified enterprise
architect. He can be reached at Raghuraman.krishnamurthy2@cognizant.com.

Sowmyanarayan Srinivasan heads Cognizant’s Discovery Informatics Center of Excellence. He has spent
over a decade focusing on building business solutions across the spectrum of discovery informatics.
Sowmya has worked with leading biopharma organizations to design solutions and consult on their
transformation initiatives. He has also helped to establish the Bangalore/India chapter of EPPIC Global
(Enterprising Pharmaceutical Professionals from the Indian subcontinent) in early 2000. Sowmya holds
a bachelor’s degree in engineering and master’s degree in business administration. He can be reached
at Sowmyanarayan.srinivasan@cognizant.com.




About Cognizant
Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process out-
sourcing services, dedicated to helping the world’s leading companies build stronger businesses. Headquartered in
Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry
and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 50
delivery centers worldwide and approximately 118,000 employees as of June 30, 2011, Cognizant is a member of the
NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and
fastest growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant.



                                         World Headquarters                  European Headquarters                 India Operations Headquarters
                                         500 Frank W. Burr Blvd.             1 Kingdom Street                      #5/535, Old Mahabalipuram Road
                                         Teaneck, NJ 07666 USA               Paddington Central                    Okkiyam Pettai, Thoraipakkam
                                         Phone: +1 201 801 0233              London W2 6BD                         Chennai, 600 096 India
                                         Fax: +1 201 801 0243                Phone: +44 (0) 20 7297 7600           Phone: +91 (0) 44 4209 6000
                                         Toll Free: +1 888 937 3277          Fax: +44 (0) 20 7121 0102             Fax: +91 (0) 44 4209 6060
                                         Email: inquiry@cognizant.com        Email: infouk@cognizant.com           Email: inquiryindia@cognizant.com


© Copyright 2011, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is
subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.

Más contenido relacionado

Más de Cognizant

Intuition Engineered
Intuition EngineeredIntuition Engineered
Intuition EngineeredCognizant
 
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...Cognizant
 
Enhancing Desirability: Five Considerations for Winning Digital Initiatives
Enhancing Desirability: Five Considerations for Winning Digital InitiativesEnhancing Desirability: Five Considerations for Winning Digital Initiatives
Enhancing Desirability: Five Considerations for Winning Digital InitiativesCognizant
 
The Work Ahead in Manufacturing: Fulfilling the Agility Mandate
The Work Ahead in Manufacturing: Fulfilling the Agility MandateThe Work Ahead in Manufacturing: Fulfilling the Agility Mandate
The Work Ahead in Manufacturing: Fulfilling the Agility MandateCognizant
 
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...The Work Ahead in Higher Education: Repaving the Road for the Employees of To...
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...Cognizant
 
Engineering the Next-Gen Digital Claims Organisation for Australian General I...
Engineering the Next-Gen Digital Claims Organisation for Australian General I...Engineering the Next-Gen Digital Claims Organisation for Australian General I...
Engineering the Next-Gen Digital Claims Organisation for Australian General I...Cognizant
 
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...Cognizant
 
Green Rush: The Economic Imperative for Sustainability
Green Rush: The Economic Imperative for SustainabilityGreen Rush: The Economic Imperative for Sustainability
Green Rush: The Economic Imperative for SustainabilityCognizant
 
Policy Administration Modernization: Four Paths for Insurers
Policy Administration Modernization: Four Paths for InsurersPolicy Administration Modernization: Four Paths for Insurers
Policy Administration Modernization: Four Paths for InsurersCognizant
 
The Work Ahead in Utilities: Powering a Sustainable Future with Digital
The Work Ahead in Utilities: Powering a Sustainable Future with DigitalThe Work Ahead in Utilities: Powering a Sustainable Future with Digital
The Work Ahead in Utilities: Powering a Sustainable Future with DigitalCognizant
 
AI in Media & Entertainment: Starting the Journey to Value
AI in Media & Entertainment: Starting the Journey to ValueAI in Media & Entertainment: Starting the Journey to Value
AI in Media & Entertainment: Starting the Journey to ValueCognizant
 
Operations Workforce Management: A Data-Informed, Digital-First Approach
Operations Workforce Management: A Data-Informed, Digital-First ApproachOperations Workforce Management: A Data-Informed, Digital-First Approach
Operations Workforce Management: A Data-Informed, Digital-First ApproachCognizant
 
Five Priorities for Quality Engineering When Taking Banking to the Cloud
Five Priorities for Quality Engineering When Taking Banking to the CloudFive Priorities for Quality Engineering When Taking Banking to the Cloud
Five Priorities for Quality Engineering When Taking Banking to the CloudCognizant
 
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining Focused
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining FocusedGetting Ahead With AI: How APAC Companies Replicate Success by Remaining Focused
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining FocusedCognizant
 
Crafting the Utility of the Future
Crafting the Utility of the FutureCrafting the Utility of the Future
Crafting the Utility of the FutureCognizant
 
Utilities Can Ramp Up CX with a Customer Data Platform
Utilities Can Ramp Up CX with a Customer Data PlatformUtilities Can Ramp Up CX with a Customer Data Platform
Utilities Can Ramp Up CX with a Customer Data PlatformCognizant
 
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...Cognizant
 
The Timeline of Next
The Timeline of NextThe Timeline of Next
The Timeline of NextCognizant
 
Realising Digital’s Full Potential in the Value Chain
Realising Digital’s Full Potential in the Value ChainRealising Digital’s Full Potential in the Value Chain
Realising Digital’s Full Potential in the Value ChainCognizant
 
The Work Ahead in M&E: Scaling a Three-Dimensional Chessboard
The Work Ahead in M&E: Scaling a Three-Dimensional ChessboardThe Work Ahead in M&E: Scaling a Three-Dimensional Chessboard
The Work Ahead in M&E: Scaling a Three-Dimensional ChessboardCognizant
 

Más de Cognizant (20)

Intuition Engineered
Intuition EngineeredIntuition Engineered
Intuition Engineered
 
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
 
Enhancing Desirability: Five Considerations for Winning Digital Initiatives
Enhancing Desirability: Five Considerations for Winning Digital InitiativesEnhancing Desirability: Five Considerations for Winning Digital Initiatives
Enhancing Desirability: Five Considerations for Winning Digital Initiatives
 
The Work Ahead in Manufacturing: Fulfilling the Agility Mandate
The Work Ahead in Manufacturing: Fulfilling the Agility MandateThe Work Ahead in Manufacturing: Fulfilling the Agility Mandate
The Work Ahead in Manufacturing: Fulfilling the Agility Mandate
 
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...The Work Ahead in Higher Education: Repaving the Road for the Employees of To...
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...
 
Engineering the Next-Gen Digital Claims Organisation for Australian General I...
Engineering the Next-Gen Digital Claims Organisation for Australian General I...Engineering the Next-Gen Digital Claims Organisation for Australian General I...
Engineering the Next-Gen Digital Claims Organisation for Australian General I...
 
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...
 
Green Rush: The Economic Imperative for Sustainability
Green Rush: The Economic Imperative for SustainabilityGreen Rush: The Economic Imperative for Sustainability
Green Rush: The Economic Imperative for Sustainability
 
Policy Administration Modernization: Four Paths for Insurers
Policy Administration Modernization: Four Paths for InsurersPolicy Administration Modernization: Four Paths for Insurers
Policy Administration Modernization: Four Paths for Insurers
 
The Work Ahead in Utilities: Powering a Sustainable Future with Digital
The Work Ahead in Utilities: Powering a Sustainable Future with DigitalThe Work Ahead in Utilities: Powering a Sustainable Future with Digital
The Work Ahead in Utilities: Powering a Sustainable Future with Digital
 
AI in Media & Entertainment: Starting the Journey to Value
AI in Media & Entertainment: Starting the Journey to ValueAI in Media & Entertainment: Starting the Journey to Value
AI in Media & Entertainment: Starting the Journey to Value
 
Operations Workforce Management: A Data-Informed, Digital-First Approach
Operations Workforce Management: A Data-Informed, Digital-First ApproachOperations Workforce Management: A Data-Informed, Digital-First Approach
Operations Workforce Management: A Data-Informed, Digital-First Approach
 
Five Priorities for Quality Engineering When Taking Banking to the Cloud
Five Priorities for Quality Engineering When Taking Banking to the CloudFive Priorities for Quality Engineering When Taking Banking to the Cloud
Five Priorities for Quality Engineering When Taking Banking to the Cloud
 
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining Focused
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining FocusedGetting Ahead With AI: How APAC Companies Replicate Success by Remaining Focused
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining Focused
 
Crafting the Utility of the Future
Crafting the Utility of the FutureCrafting the Utility of the Future
Crafting the Utility of the Future
 
Utilities Can Ramp Up CX with a Customer Data Platform
Utilities Can Ramp Up CX with a Customer Data PlatformUtilities Can Ramp Up CX with a Customer Data Platform
Utilities Can Ramp Up CX with a Customer Data Platform
 
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
 
The Timeline of Next
The Timeline of NextThe Timeline of Next
The Timeline of Next
 
Realising Digital’s Full Potential in the Value Chain
Realising Digital’s Full Potential in the Value ChainRealising Digital’s Full Potential in the Value Chain
Realising Digital’s Full Potential in the Value Chain
 
The Work Ahead in M&E: Scaling a Three-Dimensional Chessboard
The Work Ahead in M&E: Scaling a Three-Dimensional ChessboardThe Work Ahead in M&E: Scaling a Three-Dimensional Chessboard
The Work Ahead in M&E: Scaling a Three-Dimensional Chessboard
 

Último

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Último (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Better Target Identification and Validation Through an Integrative Analysis of Biological Data

  • 1. • Cognizant 20-20 Insights Better Target Identification and Validation Through an Integrative Analysis of Biological Data Executive Summary poses a peculiar challenge. In these fields, rela- tionships among data sets are rarely simple and Pharmaceutical target identification and valida- often are not apparent without deeper investi- tion today is an exercise in complex data mining. gation. So geologists study aerial photography, The amount, breadth and depth of biological data satellite images, rock analysis and seismographic available for such mining is increasing exponen- data to attempt to locate oil basins; meteorolo- tially, signaling both opportunities and challenges gists examine ocean currents and surface tem- for the biopharma industry. peratures, barometric pressure, polar ice cover More data should lead to more insights and better and more to predict climate conditions. decisions. However, the sheer volume of available The drug discovery process similarly requires data is overwhelming. Further, biological data assimilation and analysis of seemingly discon- findings must be considered in the context in which nected data sources with the goal of gaining they were discovered and in light of their inter- insights for forming a hypothesis to validate actions and/or dependencies on other data sets through experimentation. and conditions. Integrating a wide variety of data sets with such understanding of their contexts is In the initial steps of drug discovery, comprised a major logistical hurdle for biologists. of target identification and validation, knowledge of disease biology is crucial for picking the right To fully capitalize on the rich biological data targets. Arguably the biggest breakthrough sources available today, scientists require a tech- to that end was the completion of the human nological platform that eases data integration and genome sequence in 2000. comparison across diverse types and sets of data regardless of their sources. The complexity of the However, the genome sequence is static; it does technology supporting this platform should be not reveal the dynamic role of the targets in a hidden while it offers an easy, streamlined means variety of cellular circumstances. Today, this of interpreting results. data is available through genomics technologies like microarrays, which can be used to measure Dynamic Context and Meaningful thousands of mRNAs or DNA or proteins at Biological Insights the same time. Now scientists have data at a In predictive fields such as oil exploration, compu- molecular level along multiple cellular/molecular tational finance, or climatology, data abundance dimensions that can help them understand the cognizant 20-20 insights | august 2011
  • 2. dynamic roles of targets in normal and diseased millions of data points with varied meanings. biological processes. Each of these data types is distinct in terms of its format, analysis and interpretation. For The data volumes in public “omics”* data reposi- example, the normalization of CGH data needs tories, like the Gene Expression Omnibus (GEO), to be handled very differently from that of have grown exponentially in the decade since gene expression data. the human genome was sequenced. A snapshot of the type of data sets in GEO reveals that • Context as critical as data. The context of omics data sets can be of varying types and that the experimental data is extremely important microarray technology can be used to generate in the biological world. To generate quality functionally diverse data sets. inferences and hypotheses, scientists need a thorough understanding of how the experi- Omics technological advances ments were performed. This context must be Locked within like gene expression microar- preserved while integrating data for analysis. these huge volumes rays and aCGH1 offering deeper • Multiple public and proprietary data sources. of data about a revelations about siRNAs2 and A significant number of public repositories hold miRNAs3 have enabled us to cell’s DNA, RNA produce diverse and mammoth valuable information about the experimental findings of various facets of cell biology. In and proteins and volumes of data. In fact millions addition, every biopharma organization has the relationships of data points can be produced its own internal data sources and maintains for a whole genome single run among them are and aggregated to even larger specific research findings. The types of data and the sheer divergence of available data the information and quantities in a very short time. sources and the variations in data formats insights required Concomitant with the explosion among these entities make it a formidable to successfully of data is the need for spe- task for scientists to maneuver through the data maze. Each of the data sources captured identify and cialized data skills. Target contains different levels of detail, so compari- identification has become validate high-value a data mining exercise. The sons can be challenging. therapeutic targets. current challenge is how one • Lack of universally accepted standards. can understand the dynamic The lack of common standards for data sets context of the data, and analyze and interpret it results in the uncontrolled proliferation of data to gain meaningful biological insights. It’s chal- as each research group describes the context lenging for the lead biologist working on a target and the results of experiments in its own way. to interpret and integrate the vast amount of Though standards such as Minimum Informa- numeric data available in his decision-making tion About Microarray Experiments (MIAME) process. This leads to underuse or improper use have been developed, multiple variants of of high throughput data sets. MIAME have evolved to describe the complex biological microarray experiments. These dis- The Challenges of Mining Today’s parities mean scientists must manually look Trove of Biological Data into multiple databases and correlate data Locked within these huge volumes of data about among them to validate the observations — a a cell’s DNA, RNA and proteins and the rela- time-consuming, tiresome task. tionships among them are the information and insights required to successfully identify and • Need for complex visualizations. Visualiza- tion plays a key role in finding the patterns in validate high-value therapeutic targets. Yet the the underlying data. The visualization tools very nature of the data and how it is uncovered for omics data have evolved from simple create further challenges to using it effectively. heatmaps to networks overlaying omics data These challenges include: and are continuing to evolve into three and • Multiple data types from multiple high- four dimensions incorporating time series throughput technologies. High-through- information. Researchers use various tools put technologies have evolved to measure like Spotfire, R packages, Matlab, Excel and different molecular entities in a cell, such as Cytoscape for visualizations. Typically, no DNA, RNA, protein, methylation status, etc. single tool provides all the types of visualiza- Each high-throughput experiment results in tions required by an omics researcher. Often, cognizant 20-20 insights 2
  • 3. researchers must develop custom visual- hypothesis generation even as it makes those izations to address their specific research hypotheses more relevant. questions. We believe that an effective data integration • Fast-evolving domain technologies. The solution enables productivity gains of at least microarray technology that revolutionized 20%. Some of the key solution components that biological data generation is now common, will drive such productivity gains are: and several other high-throughput technolo- gies, like next-generation sequencing, are • Faster insight generation. evolving that further push the boundaries of The platform should inte- We believe that data generation. As they do so, the challenge grate various public data an effective data of effectively mining this data to establish sources and have the ability useful biological insights becomes increasingly to combine those with com- integration solution critical to address. mercial and private data. It enables productivity should help propagate bio- gains of at least 20%. • Lack of a common platform. In many modern pharmaceutical research organizations, logical insights using any integrated access to all the data needed to of the common identifiers and be easy to use, advance a discovery project is often impossible enabling scientists to more easily connect the to achieve because it is spread across a variety data to reach insights. of databases and visualization and analysis • More revelation of disease mechanisms. tools. Thus, the scientists maintain their own Integrating different types of data should personal copies of the data, typically in the help researchers investigate and understand form of Excel spreadsheets, an inefficient disease mechanisms more thoroughly, solution. which assists in building the target/disease association. • Annotated integration. Integration in the usual sense is assimilating the parts into the • Maximize utilization of Oncology has long whole. However, this traditional notion of inte- high throughput data gration is not possible with biological data. It is sets. By providing easy been a challenging much harder to numerically integrate experi- access to diverse high- domain because of its mental values in such a way that the resulting throughput data sets, complex biology. Our number represents a biologically meaningful including those avail- phenomenon. able within a company’s goal with Inventus is A more pragmatic approach is to integrate research groups and to help analyze the the results from each experiment after systems, an integration data and answer key platform would maximize analyzing them separately. This approach the return on invest- questions around can be extended to any molecular data type. For example, tissue microarray data can be ment on generating such annotation for analyzed independently by the pathologist, data sets. additional identifiers, and the resulting inference of over-expression • Platform-independent gene ontology, or under-expression of the protein of interest and extensible. The solu- can be easily compared or integrated with the mutation, expression tion should be domain over-expression or under-expression inference platform-independent and and pathway within obtained from an RNA microarray experiment. be extended to open plat- the solution through forms like caBIG or any Making Information and Insights data propagation. proprietary platform. Obvious: An Integrated Solution Researchers and scientists clearly need an intel- The Cognizant Approach: Inventus ligent, intuitive solution to data collation and cor- We are creating a prototype for a biological data relation so they can spend the majority of their integration platform using oncology as a thera- time interpreting complex biological relationships peutic area. Oncology has long been a challeng- and their impact on target identification and ing domain because of its complex biology. Our validation. A single technology platform enabling goal with Inventus is to help analyze the data a workflow that lets scientists look at different and answer key questions around annotation for biological data types in context and to quickly additional identifiers, gene ontology, mutation, analyze their relationships would streamline expression and pathway within the solution cognizant 20-20 insights 3
  • 4. Inventus Analysis Workflow Validate by Wetlab Experiments Design & Conduct Study Identify Gene Pathway of Interest Commons NCBI Study Mutation & Expression COSMIC BioGPS Figure 1 through data propagation. The typical analysis To demonstrate the utility of Inventus, we queried workflow is depicted in Figure 1. for gene TP53, a well-studied tumor suppressor gene. The default EntrezGene plug-in displays As the next step, we investigated existing open the basic functional information about TP53. The integration platforms like Life Science Grid (LSG) mutation plug-in displays mutation data for TP53 and cancer Biomedical Informatics Grid (caBIG). from COSMIC summarizing the mutations studied/ We built Inventus using LSG because it provided identified in various cancer types (see Figure 2). the necessary minimal information technology The scientist is quickly able to understand that framework. We developed the following plug-ins TP53 is frequently mutated in a wide variety of for Inventus: cancers. The BioGPS plug-in displays the gene expression data from the Novartis gene expression • Web services-based pathway data from atlas and shows TP53 to be under-expressed (see Pathway Commons. Figure 3). The above data quickly highlights to the • Access to mutation data from ept:
Investigation
of
Gene
TP53 COSMIC. scientist that TP53 could be involved in cancer. • Web services-based gene annotation informa- tion from NCBI EntrezGene. The pathway plug-in from Pathway Commons tation displays all the pathways that TP53 participates ws the general known • Portal-based data access to BioGPS.org for (see Figure 4). One can observe that TP53 is P53 like aliases, functional gene expression. involved in cell cycle pathways. Launching the somal location etc • Cytoscape for visualizing pathways. pathway in Cytoscape allows the scientist to further understand the interacting partners derstands that TP53 is a 
 Mutation Gene Expression Proof
of
Concept:
Investigation
of
Gene
TP53 Step 3: Gene Expression • Action ws the mutations studied in • Scientist views the gene expression data ata sources like COSMIC. from public data sources like Novartis Gene Atlas. tices that TP53 is frequently • Inference cancer types. • Scientist notices that TP53 could be lowly expressed in cancer tissues. • Step 4: Pathways Figure 2 • Action Figure 3 s Confidential • Scientist views the pathways that TP53 is involved from public pathway data sources like Pathway Commons. cognizant 20-20 insights 4 • Inference • Scientist notices that TP53 is involved in various cell cycle pathways.
  • 5. 
 Integration
with
Other
Tools Integration
with
Other
Tools Pathway Cytoscape View View Pathways in Cytoscape View Pathways in Cytoscape Scientist analyses the the pathwayscytoscape to understand the the Scientist analyses pathways inFigure 5 Figure 4 in cytoscape to understand interacting partners of TP53. interacting partners of TP53. of TP53 and use network analysis tools within Seizing Target Identification Cytoscape to better understand TP53’s role in Opportunities, Better Understanding pathways and disease (see Figure 5). In cases Disease Mechanism of a protein like TP53, pathway analysis plays We are committed to helping pharmaceutical an ©2010, Cognizant Technology in the discovery process | important roleTechnology Solutions | ©2010, Cognizant Solutions Confidential Confidential companies overcome the challenges posed by because tumor suppressors have been difficult vast and growing quantities of biological data to to reactivate using external intervention. Such seize the opportunities contained within that data. proteins may not pose as targets themselves but Drawing on our deep biopharma expertise and the downstream or upstream interacting partners experience, we take the initiative in developing could be potential targets. solutions to industry issues. The above example provides a glimpse of how To learn more about our solutions for integrat- the solution is useful to discovery scientists, who ing biological data for true insight generation and analyze high-throughput data sets and develop target identification and validation, contact us at a list with genes of interest to prioritize for inquiry@cognizant.com. additional individual exploration. Footnotes * refers to the study of biological systems; includes genomics (DNA), proteomics (Proteins), metabolomics (Small Molecules) etc. 1 aCGH — array comparative genomic hybridization 2 siRNA — small inhibitory RNA 3 miRNA — micro RNA References NCBI GEO: archive for functional genomics data sets — 10 years on. Tanya Barrett et al Nucleic Acids Research Advance Access published November 21, 2010 LSG Sourceforge: http://sourceforge.net/projects/lsg/ The Cancer Biomedical Informatics Grid (caBIG Fenstermacher D, Street C, McSherry T, Nayak V, Overby C, Feldman M. . Conf Proc IEEE Eng Med Biol Soc. 2005;1:743. cognizant 20-20 insights 5
  • 6. cPath: open source software for collecting, storing and querying biological pathways. Cerami EG et al BMC Bioinformatics. 2006 Nov 13;7:497. COSMIC (the Catalogue of Somatic Mutations in Cancer): a resource to investigate acquired mutations in human cancer. Forbes SA et al Nucleic Acids Res. 2010 Jan;38 (Database issue) BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources. Chunlei Wu et al. Genome Biology 2009 Nov; 10(11) Cytoscape: a community-based framework for network modeling. Killcoyne S et al Methods Mol Biol. 2009;563:219-39. http://www.mged.org/Workgroups/MIAME/miame_2.0.html About the Authors G.D Mahesh Kumar is a Consulting Manager in Cognizant’s Discovery Informatics Center of Excellence located in Sweden. He has over seven years of experience working in drug discovery in the pharma and bioinformatics industries. Mahesh’s core skills include bioinformatics, biological data management and research projects management. He has presented at international conferences and currently manages projects and client relationships with European pharma clients. Mahesh holds a master’s degree in bio- technology from Goa University, Goa, and has been a Project Management Institute member since 2006. He can be reached at Maheshkumar.g.d@cognizant.com. Raghuraman Krishnamurthy is a Chief Architect in Cognizant’s Life Sciences Business Unit’s Technology Consulting Group. Raghu’s core skills include enterprise architecture, data, SOA, mobility application and convergence of technologies. He has worked with several major pharmaceuticals in envisioning and leading transformational initiatives, has presented papers at numerous conferences and was recently named a senior member of the prestigious Association of Computing Machinery (ACM). Raghu holds a master’s degree from the Indian Institute of Technology, Mumbai, and is a TOGAF-certified enterprise architect. He can be reached at Raghuraman.krishnamurthy2@cognizant.com. Sowmyanarayan Srinivasan heads Cognizant’s Discovery Informatics Center of Excellence. He has spent over a decade focusing on building business solutions across the spectrum of discovery informatics. Sowmya has worked with leading biopharma organizations to design solutions and consult on their transformation initiatives. He has also helped to establish the Bangalore/India chapter of EPPIC Global (Enterprising Pharmaceutical Professionals from the Indian subcontinent) in early 2000. Sowmya holds a bachelor’s degree in engineering and master’s degree in business administration. He can be reached at Sowmyanarayan.srinivasan@cognizant.com. About Cognizant Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process out- sourcing services, dedicated to helping the world’s leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 50 delivery centers worldwide and approximately 118,000 employees as of June 30, 2011, Cognizant is a member of the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and fastest growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant. World Headquarters European Headquarters India Operations Headquarters 500 Frank W. Burr Blvd. 1 Kingdom Street #5/535, Old Mahabalipuram Road Teaneck, NJ 07666 USA Paddington Central Okkiyam Pettai, Thoraipakkam Phone: +1 201 801 0233 London W2 6BD Chennai, 600 096 India Fax: +1 201 801 0243 Phone: +44 (0) 20 7297 7600 Phone: +91 (0) 44 4209 6000 Toll Free: +1 888 937 3277 Fax: +44 (0) 20 7121 0102 Fax: +91 (0) 44 4209 6060 Email: inquiry@cognizant.com Email: infouk@cognizant.com Email: inquiryindia@cognizant.com © Copyright 2011, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.