How does a century-old company, one that used to consider data integration to be placing the binding on a book, keep up with younger, nimbler companies? You ever-evolve! You must always be adapting, and you MUST change your environment before it changes you!
This is the success story of Chemical Abstracts Service, a 108-year-old organization that is the world’s authority for curating and classifying chemical information.
From Print to the Cloud and Beyond: The Story of a Century Old Company and its Resiliency to Ever-Evolve
1. From Print to the Cloud and Beyond
The Story of a Century Old Company and its Resiliency to Ever-Evolve
2. Agenda
CAS Overview
CAS - In the Beginning… There was Print
CAS - The Age of Silos
CAS - IBM Integration. To the Cloud… and Beyond
Future Considerations
CAS is a division of the American Chemical Society. Copyright 2015 American Chemical Society. All rights reserved.
4. CAS helps scientists around the world benefit from the published work of their colleagues by monitoring, abstracting and indexing the world's chemistry-related literature
CAS has been supporting scientists for more than 100 years
Since 1907, CAS’s objective has been to find, collect, and organize all publicly disclosed chemical substance information
5. CAS helps scientists around the world benefit from the published work of their colleagues
CAS scientists monitor, abstract and index the world's chemistry-related literature
Proprietary, standardized indexing in CAS databases ensures consistent, comprehensive search results.
[Diagram] Processes: Source Selection; Document Indexing; Reaction Indexing; Markush Indexing; Authority Processing
[Diagram] Databases: CAplus℠; CAS REGISTRY℠; CASREACT®; MARPAT®; CHEMCATS®; CHEMLIST®; CIN®
6. CAS products and services make it faster and easier for scientists to find the information they need for their research
CAS Registry Numbers® uniquely identify each chemical substance without the ambiguity of multiple naming conventions
STN® combines industry-leading search and retrieval with unique and comprehensive content
SciFinder® offers a one-stop shop experience with flexible search and discover options based on user input and workflow
Science IP®, the CAS information search service, provides fast, comprehensive and accurate searches of the world’s scientific and technical literature
CAS Registry Number 58-08-2
CAFFEINE!
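The Registry Number on this slide can be checked by hand. A CAS Registry Number ends in a check digit: each digit before it is multiplied by its position counted from the right (1, 2, 3, and so on), and the sum modulo 10 must equal the final digit. The Python sketch below illustrates that rule; the function name is made up for this example and is not part of any CAS product.

```python
import re


def is_valid_cas_rn(rn: str) -> bool:
    """Return True if rn is well-formed and its check digit matches."""
    # Format: 2 to 7 digits, hyphen, 2 digits, hyphen, 1 check digit.
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", rn)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the rightmost body digit by 1, the next by 2, and so on.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check_digit


print(is_valid_cas_rn("58-08-2"))  # caffeine: 8*1 + 0*2 + 8*3 + 5*4 = 52 -> True
print(is_valid_cas_rn("58-08-3"))  # same digits, wrong check digit -> False
```

The check digit only catches transcription errors; it does not tell you whether a number has actually been assigned to a substance.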
8. CAS Timeline
108 Years of Progress (and Counting)
9. CAS End-To-End Architecture
“In the Beginning… There was Print”
Data Ingestion
Data Transformation
Data Validation
Data Curation
Data Integration
Data Presentation
10. “CAS Knows Jack”
Jack and Friends Beside Printed Chemical Abstracts
12. CAS End-To-End Architecture
“The Age of Silos”
Data Ingestion
Data Transformation
Data Validation
Data Normalization
Data Persistence
Data Ingestion
Data Transformation
Data Validation
Data Curation
Data Integration
Data Persistence
Data Transformation
Data Validation
Data Integration
Data Presentation
13. Silo Challenges
Multiple Data Ingestion Points
– In some cases, the same data is being ingested twice
Multiple Views of the Data
– Each silo must perform complex transformations to its specific view
– Editorial manufactures normalized data based on a print model
– Product Development wants de-normalized, complete data
– Content Delivery has a mixed view of the data
Multiple Vocabulary Conventions
– Differing data definitions cause confusion across silos
No Unified, Authority Data Store
– Each silo has its own copy of the data in its own specific vocabulary
14. Editorial Legacy Systems
Many disparate databases used to store relational data
– Becomes difficult to maintain and support
Multiple database technologies used
– No unified platform
Challenges to support legacy systems
– Some legacy technologies are no longer supported
– Succession planning for legacy system support is difficult
– Special IT used so that legacy code would not need to be touched
15. Content Delivery Systems
Data was transformed into one common data model to bridge the gap between the Editorial and Product views
– One common schema model was complex and unwieldy
– Common model contained “unnecessary” complexities
– Common model did not align with Product Development’s specifications
16. Product Development Systems
Product Development must code for “unnecessary” complexities
Data not completely de-normalized
– Additional development necessary to compile data
17. Silo Challenges
18. By the Numbers
Thousands of journals ingested per day
– Approximately 1 TB of data per week
Over 100 other data feeds ingested per day
Over 1.2 million messages processed per day
– Synced up with product data daily in less than 10 minutes
Over 6 TB of compiled data created per day
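To put these daily volumes in perspective, they can be converted into average rates. The sketch below assumes decimal terabytes and a load spread evenly over 24 hours; both are assumptions made for the arithmetic, and real traffic is likely burstier.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12  # assumption: decimal terabytes


def per_second(per_day: float) -> float:
    """Average per-second rate for a quantity given per day."""
    return per_day / SECONDS_PER_DAY


# Figures taken from the slide above.
messages_per_second = per_second(1_200_000)
compiled_mb_per_second = per_second(6 * BYTES_PER_TB) / 10**6
journal_gb_per_day = 1 * BYTES_PER_TB / 7 / 10**9

print(f"{messages_per_second:.1f} messages/s")        # 13.9 messages/s
print(f"{compiled_mb_per_second:.1f} MB/s compiled")  # 69.4 MB/s compiled
print(f"{journal_gb_per_day:.1f} GB/day of journals") # 142.9 GB/day of journals
```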
19. What is an Architect to Do?
20. Unify…Integrate…Simplify
Unify: Data; Processes; Transformations; Data Ingestion
Integrate: Disparate Systems; Services; Applications; and Data Consumers
Simplify the Architecture!!!
21. Request For Proposal
Create RFP
• Create technology selection team
• Identify key requirements (based on architecture and tech stack governance)
• Assign weights
• Create RFP document and scorecard spreadsheet
Engage vendors
• Send RFP document to prospective vendors
• Hold clarification meetings with vendor teams
• Vendors send RFP response documents
• Vendors present their solutions and answer questions
Score-driven selection
• Selection team members score vendor solutions
• Aggregate scores
• Select vendor with best aggregate score (judgement required)
• Bake-off if winner is too close to call
Validate selection
• Run proof-of-concept and/or proof-of-technology and/or pilot project as needed
• Negotiate contract
• Adjust as needed
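The score-driven selection step can be sketched in a few lines of Python. Everything concrete here is hypothetical: the vendor names, criteria, weights, scores and the bake-off margin are invented for illustration, and the actual CAS scorecard is not shown in the deck.

```python
def aggregate_scores(weights, scorecards):
    """Average each vendor's weighted score across all team members."""
    totals = {}
    for vendor, member_scores in scorecards.items():
        weighted = [
            sum(weights[criterion] * score for criterion, score in scores.items())
            for scores in member_scores
        ]
        totals[vendor] = sum(weighted) / len(weighted)
    return totals


def select_vendor(totals, bake_off_margin):
    """Return the top vendor, or None when a bake-off is needed."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    (best, best_score), (_, runner_up_score) = ranked[0], ranked[1]
    if best_score - runner_up_score < bake_off_margin:
        return None  # winner is too close to call
    return best


# Hypothetical weights and 1-5 scores from two selection team members.
weights = {"data_integration": 5, "guaranteed_delivery": 4, "support": 3}
scorecards = {
    "Vendor A": [
        {"data_integration": 4, "guaranteed_delivery": 5, "support": 3},
        {"data_integration": 5, "guaranteed_delivery": 4, "support": 4},
    ],
    "Vendor B": [
        {"data_integration": 3, "guaranteed_delivery": 4, "support": 4},
        {"data_integration": 4, "guaranteed_delivery": 3, "support": 3},
    ],
}

totals = aggregate_scores(weights, scorecards)
print(totals)                      # {'Vendor A': 51.0, 'Vendor B': 42.0}
print(select_vendor(totals, 2.0))  # Vendor A
```

A `None` result corresponds to the bake-off bullet: the numbers alone do not decide, and judgement is still required.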
22. Requirements
Data Integration
Durable Message Bus with Guaranteed Delivery
Any-to-Any Connectivity
Architectural Flexibility
Excellent Support
A Proven Solution
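“Durable message bus with guaranteed delivery” generally means that a message is persisted before the sender is told it was accepted, and that it stays queued until a consumer confirms it. The toy Python sketch below illustrates that idea with SQLite standing in for the bus. The class and method names are hypothetical, and this is not the API of IBM MQ or any other product.

```python
import os
import sqlite3
import tempfile


class DurableQueue:
    """Toy at-least-once queue persisted in SQLite (illustrative only)."""

    def __init__(self, path: str):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "body TEXT NOT NULL, "
            "acked INTEGER NOT NULL DEFAULT 0)"
        )
        self.db.commit()

    def put(self, body: str) -> int:
        # Commit before returning so an accepted message survives a restart.
        cursor = self.db.execute("INSERT INTO messages (body) VALUES (?)", (body,))
        self.db.commit()
        return cursor.lastrowid

    def peek(self):
        # Oldest unacknowledged message, or None. It stays queued until ack().
        return self.db.execute(
            "SELECT id, body FROM messages WHERE acked = 0 ORDER BY id LIMIT 1"
        ).fetchone()

    def ack(self, message_id: int) -> None:
        self.db.execute("UPDATE messages SET acked = 1 WHERE id = ?", (message_id,))
        self.db.commit()


# Usage: the message is still there after the queue is closed and reopened.
path = os.path.join(tempfile.mkdtemp(), "queue.db")
queue = DurableQueue(path)
queue.put("journal-batch-1")
queue.db.close()  # simulate a process restart

queue = DurableQueue(path)
message_id, body = queue.peek()
print(body)            # journal-batch-1
queue.ack(message_id)  # only now is the message considered delivered
print(queue.peek())    # None
```

Because a message is only retired on `ack`, a consumer that crashes mid-processing will see it again, which is why consumers of at-least-once queues usually need to tolerate duplicates.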
24. Unify…Integrate…Simplify
Data Curation
Data Ingestion
Data Transformation
Data Validation
Data Normalization
Data Integration
Data Transformation
Data Validation
Data Integration
Data Presentation
Data Persistence
Data Flow Orchestration
26. To the Cloud… and Beyond!
Off-Prem Processing
Bursting Capabilities
Data Center Relief
Co-Location Capabilities
New Mobile Applications
Service Unification
Service Management
Service Integration
28. Questions
29. Connect with CAS:
Joseph Sapp
Lead Enterprise Application Architect
jsapp@cas.org
www.linkedin.com/in/joesapp
Editor's Notes
Like any success story, CAS’ story has a beginning… it has chapters where challenges were faced… necessary evolutions… and a bright future!
CAS is a division of the American Chemical Society. It was founded in 1907 with two goals:
First, American scientists recognized the need to participate globally in scientific information exchange. With today’s pace of research, this need is felt by all scientists around the world.
Second—and this goal sounds like it could have been written today!—scientists simply did not have time to read all of the current literature in their field.
So, the Chemical Abstracts Service was formed and pledged to address these two problems. In 1907, they published the first Chemical Abstracts. It contained 502 abstracts from scientific papers and patents. Today, more than 1.5 million abstracts with scientific indexing are added to the electronic database every year. Our extensive coverage ensures that your scientists are getting the most comprehensive and timely information to support their research efforts.
The ACS’ mission to “improve people’s lives through the transforming power of chemistry” is still alive today as CAS’ scientists continue to organize the scientific literature so that scientists around the world can find the research they need and use it to make discoveries that improve all of our lives.
The CAS Product and Content operations division has a mission to select a broad spectrum of chemical and related documents and report new substances, methods of synthesis and novel information contained in those documents. Every step in the editorial process enhances what is contained in the original document. Our scientists enhance titles, add abstracts, apply standardized precise indexing, and extract substances for registration or Markush indexing.
CAS creates the CAplus, CAS REGISTRY, CASREACT and MARPAT databases by selecting and analyzing documents that report new or novel chemical findings and reporting these findings through enhanced titles and abstracts, controlled subject index entries, and substance indexing. This controlled indexing enhances the retrieval and understanding of the original research publication.
CAS staff also create three additional databases to further support the work of researchers. CHEMCATS allows researchers to find commercially available chemicals, pricing and supplier contact information. CHEMLIST allows scientists to find whether a substance is regulated and by what agency. Chemical Industry Notes provides access to current business news.
You can access the CAS databases by using one of our online services: SciFinder or STN; or by asking Science IP to run a search for you. Both STN and Science IP also access many other technical databases. Your CAS Account Consultant can help you decide which product will best meet your needs.
In 1965, CAS introduced the CAS Registry Number, which is the industry classification standard for chemical substances, structures and biosequences. In 1966, CAS expanded its media to include microform and magnetic tape. This was only 14 years after IBM introduced its first magnetic tape device, the IBM 7-track. Chemical Abstracts offered an online product back in 1980! That was 2 years before the TCP/IP protocol was created and a decade before the first commercial Internet Service Providers came out. In 1998, CAS began receiving data in electronic format. In 2009, CAS registered its 50 millionth substance! CAS registered 10 million more in a span of just 2 years! In 2013, CAS printed its last hard copy volume. But when you are a century-old company, lulls can occur in your technologies, and you face the quandary of moving toward newer technologies while still having to maintain and support your older ones. This is the case not only for technologies and products, but also for the architecture itself. Even though we do not print any more, our architecture is still based on print and must, once again, evolve.
In the Beginning there was Print… There are certain basic functions that CAS performs and has always performed. They are Data Ingestion, Data Transformation, Data Validation and Curation, and Data Integration and Presentation. Back when CAS was print-based, these functions were done quite differently. Data Ingestion used to be dropping journals off at a truck dock. Transformation occurred when the pages were torn out of the journals and handed out for curation. Data validation and curation were done by scientists similar to those who curate today; back then, however, it was done by writing on the journal pages manually. Data Integration and Presentation were completed when the binding was placed on the book. We have come a very long way as a company, and as our industry has evolved, CAS has needed to find more innovative ways of performing these same basic functions.
As CAS began to evolve from a print world to a digital world, there were times when the architecture found itself in a transition period, accommodating both the old world of print and the new world of digital, online content. I like to refer to this as “The Age of Silos”.
During the years of print, the Editorial area of Chemical Abstracts was responsible for everything, from data ingestion, to data curation and integration, and finally presentation of the final print product. As CAS evolved to a digital environment and began to consume more and more content, other areas of the architecture were created to spread responsibilities around. A Content Delivery area was created to build search and display files for end consumers. Later, a Product Processing and Delivery area was created to better serve ever more demanding search and display capabilities. Because of this step-by-step evolution, CAS finds itself in “The Age of Silos”, where each silo of the architecture is responsible not only for its existing responsibilities, but also for the responsibilities of its past legacy. This has caused its share of challenges.
Read from Slides
As I mentioned before, because of its beginnings, each area of the architecture faces its own challenges. During the evolution from print, <read slides>
<read slides>
<read slides>
Accommodating both the legacy and new architectures caused each area to run in its own silo, with its own processes, its own “unnecessary” complexities, and its own architecture. The type of data that CAS produces is highly complex: human genome sequences that, if printed out, might run the length of this wall; chemical names that give relational databases fits. The data is not policy information; it is complicated by nature. So CAS is familiar with handling necessary complexities. It is the “unnecessary” complexities that we need to avoid.
This is no small problem when you consider the volumes of data that CAS processes.
EVOLVE! This sounds great! But where do you begin?
CAS ran a Request For Proposal (RFP) process and sent the RFP out to four companies. The companies presented their cases, and each was then evaluated using a quantitative scoring approach. When it was all said and done, CAS selected IBM to help drive its solution.
So why was IBM chosen? This brings us to CAS’ latest evolution.
As I mentioned, this new evolution of the architecture is intended to unify CAS and allow the “tools in the CAS toolbox”, if you will, to perform the tasks that they were intended to perform. DataPower can now be used to unify everything from Data Ingestion to Data Transformation to Data Integration, allowing Editorial to produce the high-quality data that makes CAS not only the industry leader but also the industry standard, and to do what it does very well: curate data. IIB and MQ give CAS the flexibility it needs to manage and coordinate data flow orchestration and persistence, while Product Processing and Delivery does what it does very well: process and deliver services! All of this results in a more flexible architecture and higher speed to market.
With newfound flexibility in data, process and service integration, CAS can now prepare to evolve in an ever-changing world. Some of these new capabilities include off-prem processing, the extension of mobile applications, and service unification. Once the new architecture is fully in place, CAS can begin to think of other future considerations.
IBM offers a plethora of solutions that may be beneficial to CAS in the future as we are able to expand our views. IBM solutions that have been useful for others include:
IBM BlueMix powered by Cloud Foundry
IBM API Manager
IBM PureApp
IBM Watson