This document provides a summary of a presentation on government big data. It introduces the speakers and discusses topics like on-premises vs hosted services, the Recovery Accountability and Transparency Board's (RATB) cloud services, and EMC Isilon's scale-out storage solutions for big data in the federal sector. Specific solutions mentioned include RATB websites, their logical system architecture in the cloud, data governance, analytics as a service, and examples of federal agencies using Isilon for healthcare, life sciences, and other big data applications.
2. Today’s Speakers
Steve Ressler
Founder and President
GovLoop
Marina Martin
Entrepreneur-in-Residence & Head of the Education Data Initiative
U.S. Department of Education
Gary Newgaard
Director of Federal Solutions
EMC Isilon
Shawn Kingsberry
CIO
Recovery Accountability and Transparency Board
3. Housekeeping
o Twitter Hash Tag: #gltrain
o If you would like to submit a question, just look for the "Ask a
question" console. The presenters will field your questions at the
end.
o If you have any technical difficulties during the training, click on the
Help button located below the slide window.
o We will be e-mailing you a link to the archived version of this
training, so you can view it again or share it with a colleague, and
a GovLoop training certificate.
16. February 22, 2013
On Premises or Hosted Service
Public Transparency Website
Fraud Analytics as a Service
Big Data as a Service
RATB Cloud Services
High Level Technical Briefing
18. RATB Logical System Diagram
Logical RATB System Design Capabilities
• Public and Private Cloud providing separate
and distinct websites running off of a
common software, system, and data
warehouse infrastructure
• Elasticity to support millions of concurrent
users
• Content and Design team to support layout
and design requirements
• Secured access to sensitive data providing
virtual desktop as a service.
• Data automation providing scheduled
retrieval of required data sets.
• Risk framework providing streamlined
matching against risk databases.
• Link analysis systems and highly skilled
analysts.
• Partners with key industry companies
providing rapid development level
integration services.
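The "risk framework" capability above is described only at a high level. As a purely illustrative sketch (not the RATB implementation), matching funding recipients against several risk lists could be approximated with normalized-name lookups. The function names, list names, and sample entities below are all hypothetical.

```python
import re


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())


def match_against_risk_lists(recipients, risk_lists):
    """Return {recipient: [risk list names]} for recipients found on any list."""
    # Build one lookup table from normalized entity name to the lists naming it.
    index = {}
    for list_name, entries in risk_lists.items():
        for entry in entries:
            index.setdefault(normalize(entry), []).append(list_name)

    hits = {}
    for recipient in recipients:
        found = index.get(normalize(recipient))
        if found:
            hits[recipient] = sorted(found)
    return hits


# Hypothetical sample data, not drawn from any real risk database.
risk_lists = {
    "excluded_parties": ["Acme Builders, Inc.", "Northwind Paving LLC"],
    "delinquent_debt": ["ACME BUILDERS INC"],
}
recipients = ["Acme Builders Inc", "Contoso University"]

flagged = match_against_risk_lists(recipients, risk_lists)
print(flagged)
```

Exact matching on normalized names is the simplest possible approach; a production system would more likely rely on stable identifiers or fuzzy matching, which this sketch does not attempt.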
19. RATB High Level Technologies
Social Media
Web Infrastructure
Visualization, Analysis,
and Reporting
Data Layer
Infrastructure
Disclaimer of Endorsement:
Reference herein to any specific commercial products, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States
Government. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government, and shall not be used for advertising or product endorsement purposes.
Note: There are extensive products in this “infrastructure layer.” These are the key components.
A more comprehensive list can be made available by request.
22. Advanced Analytics Cloud
What is FederalAccountability.gov?
• The portal allows Federal agencies and
Inspectors General to review and evaluate
the risk assessment of entities, companies,
and universities receiving Federal funds.
HIGHLIGHTS
• Deployed Security: FIPS 140-2
• Infrastructure: Secured Private Cloud
U.S. Department of Defense
U.S. Environmental Protection Agency, OIG
U.S. Department of Education, OIG
U.S. Department of Justice, OIG / Civil Division
U.S. Department of Homeland Security, OIG
U.S. Army
National Science Foundation, OIG
U.S. Social Security Administration, OIG
U.S. Department of Agriculture
U.S. Census Bureau
Corporation for National and Community Service, OIG
U.S. Department of Commerce, OIG
U.S. Department of the Interior
U.S. Department of Labor
U.S. Department of Health and Human Services
U.S. Department of Housing and Urban Development, OIG
Executive Office for United States Attorneys
RATB Cloud Service Customers
23. Advanced Analytics Cloud
Desktop and Analytics As A Service
[Architecture diagram. Labeled components: Structured and Unstructured Data ETL; uReveal; ESRI ArcGIS Server; Oracle ENDECA; ENDECA Store; FastAlert; Palantir; Persistence Engine; Accountability Scorecard; SQL Server 2008; HANA In-Memory Computing; Single Sign-On Identity and Access Management; Security Layer (Netwitness, Archer, Juniper SSL VPN…); VMware View VDI; Analysts, Investigators; Stakeholders Request For Assistance]
24. Cloud Hub Categorization
PEOPLE PROCESS TECHNOLOGY
MODULAR LAYERS
SINGLE PANE OF GLASS
Note: The specific details behind the RATB Cloud Hub Categorization can be provided by request.
Note to Presenter: Use this presentation to introduce prospects and customers to the benefits of Isilon scale-out NAS. According to recent IBM estimates, the amount of unstructured data is growing exponentially and is projected to exceed 5 exabytes every 10 minutes by the end of 2013. Likewise, a rapidly growing number of people believe that, over the next few years, harnessing this “big data” and transforming it into knowledge will lead to better decisions, which will in turn transform society and our federal government. Current estimates from Deltek, a market research, analysis, and reporting firm, project that demand for such big data solutions will grow from $4.9B in FY2012 to $7.2B by 2017, and EMC Isilon is well situated to address this market demand.
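To put the Deltek projection in annual terms, the short calculation below derives the total and compound annual growth implied by the two figures quoted above. It assumes five annual compounding periods between FY2012 and 2017, which the note does not state explicitly; the function name is illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    return (end_value / start_value) ** (1 / years) - 1


start_busd = 4.9       # FY2012 demand in $B, as quoted from Deltek
end_busd = 7.2         # projected 2017 demand in $B, as quoted from Deltek
years = 2017 - 2012    # assumption: five annual compounding periods

total_growth = end_busd / start_busd - 1
annual_growth = cagr(start_busd, end_busd, years)

print(f"Total growth: {total_growth:.1%}")
print(f"Implied annual growth: {annual_growth:.1%}")
```

Under that assumption, the projection works out to roughly 47 percent total growth, or about 8 percent per year.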
We’ll start with a look at Big Data in the enterprise.
Note to Presenter: View in Slide Show mode for animation. We hear a lot about Big Data, but sometimes the definition isn’t clear. Here is a useful definition of Big Data from Wikipedia: Big Data is data that challenges the capabilities of a system to capture, manage, and process it within a tolerable elapsed time. In the context of today’s presentation, the two key attributes that we’ll be discussing are the volume of data and the composition of the data. In terms of “volume,” we’ll focus on the multi-terabyte to multi-petabyte range. And for “composition,” we’ll focus primarily on unstructured, file-based data. In this context, Big Data includes audio, video, graphics, images, and enterprise file data sets such as office files, home directories, VMDKs, and large-scale file archives. Isilon supports all kinds of unstructured and file-based data.
This slide shows the rapid growth of data stored by organizations as projected by IDC in its 2011 White Paper. This phenomenon is affecting organizations in virtually all industry segments and a wide range of enterprise use cases. In addition to the rapid growth of data, this research also shows a remarkable shift in the composition of data being generated and stored: the shift from traditional “block-based” data to unstructured, file-based data. Note to Presenter: Click now in Slide Show mode for animation. IDC projects that by 2013, 80 percent of all incremental storage capacity required by organizations will be for file-based data. Note to Presenter: IDC White Paper can be found here: http://h18006.www1.hp.com/storage/pdfs/4AA3-4945ENW.pdf
The rapid growth of Big Data is a major concern for many CIOs. In fact, a recent Gartner study showed that data growth, system performance, and scalability are the top concerns of CIOs. Note to Presenter: Click now in Slide Show mode for animation. According to a recent ESG (Enterprise Strategy Group) study, these concerns are driving most organizations (more than 80 percent) to adopt scale-out storage for their Big Data needs. With this as background, let’s now look at Isilon scale-out storage. Note to Presenter: Gartner report can be found here: http://www.gartner.com/id=1456135 (abstract only). ESG report can be found here: http://www.esg-global.com/market-reports/scale-out-nas-driving-value-for-rapidly-growing-file-based-storage-environments/
These are prime examples of data-intensive industries where Isilon storage systems have been proven to deliver significant customer benefits:
• Medical Imaging
• Gene Sequencing
• Seismic Exploration in the Oil & Gas industry
• Video & Graphics (Media & Entertainment)
• Satellite Images
• Product Development
Companies in these industries have been at the leading edge because large-scale files and unstructured data (Big Data) have caused these firms to adopt innovative storage approaches and embrace Isilon.
As you can see here, nearly all Federal agencies are doing or planning projects in Big Data, with the Life Sciences and Medical/Healthcare segments topping the list over Defense and Energy. In the recent past, many federal agencies and top executive leaders have gone on record to identify the need for, or importance of, “Big Data.” For example, the President’s Council of Advisors on Science and Technology has indicated that “every Federal agency needs to have a ‘big data’ strategy because of the exploding volume of data.” In March of 2012, the White House announced a $200 million Big Data Initiative to create tools to improve scientific research by making sense of the huge amounts of data now available. The investment was made to improve the technologies for getting insight from complex and large sets of digital data. According to John Holdren, director of the White House Office of Science and Technology Policy, the Big Data initiative “promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education, and national security.” At the Nov 2012 C5ISR Conference, Chris Miller, Executive Director for the Navy Space and Naval Warfare Systems Command (SPAWAR), noted that “Big Data is the #1 enabling technology challenge.” As a final testimonial on the importance of big data, in Jan 2013 the Defense Information Systems Agency (DISA), which provides the strategic vision and steers the course of IT for the Department of Defense (DoD) and much of the White House, identified “Big Data” as one of its top eight focus areas within its Strategic Plan for 2013-2018, and one of only four technologies that it judges to be “essential to modern warfare.” In the current U.S. Federal business sector, the drive for operational and cost efficiencies rules the day.
EMC Isilon is well positioned to usher in this new era of fiscal frugality and responsibility while delivering “best in class” capacity and performance for the Network Attached Storage (NAS) market. While it may seem unrealistic to anticipate new federal spending while such dramatic cost reductions are underway, there is tremendous potential for Information Technology (IT) to achieve the desired efficiencies while increasing overall operational effectiveness. To translate the $5B big data opportunity into a more comprehensible target business opportunity, Deltek research cites an example: in FY 2013, each agency is directed to spend at least 30 percent less on travel expenses than in FY 2010 and to maintain this reduced spending level each year through FY 2016. The Office of Management and Budget (OMB) has directed agencies to use the near-term savings to improve the transparency and accountability of federal spending, and much of this reinvestment of savings is to be channeled into IT as a means to achieve the desired objectives. Therefore, as a small but relevant example, budgets are expected to grow in the Federal IT initiatives that support the travel management challenge described above. Specifically, according to GovWin, the Federal Enterprise Architecture’s Business Reference Model shows that spending on Travel Management systems will grow from $52 million in FY2012 to $63 million in FY2013. This represents an increase of roughly 21 percent during a period of otherwise weak overall growth, or even anticipated reductions. Furthermore, research shows that new spending on IT systems, referred to in civilian agencies as Development/Modernization/Enhancement (DME) or in DoD circles as “modernization through sustainment,” is growing at about 30% of overall spending, across the entire federal government and throughout most federal agencies.
Therefore, according to Deltek, the potential is believed to be high for technology application and solution sales opportunities in the civilian and military/defense market segments, for both new development activities and ongoing sustainment. So if or when you hear the terms DME or “modernization through sustainment,” consider them a flag for the type of opportunity where EMC Isilon solutions will apply. Where are these opportunities for big data likely to occur? The GovWin IQ Report 2012 gives the number of big data initiatives and funded projects by agency, as shown below. As you can see, when it comes to big data, the sales opportunities span all major federal agencies, and the technology application opportunities span business and operational missions from healthcare and life sciences (the largest of the big data opportunities identified) through Intelligence, Surveillance and Reconnaissance (ISR), Infrastructure, Video Surveillance, and more. Here again, EMC Isilon is practically the perfect solution to address the efficiency and effectiveness objectives of these agencies.
The Canadian Department of National Defence and CSE are already using Isilon, as are dozens of Federal organizations in the United States.
Summary slide: combined, these EMC Isilon qualities provide a “Never Refresh Again Architecture” that simplifies management of your Big Data healthcare needs.
Isilon storage systems are extremely easy to use. This “simple to manage” approach translates into significant cost savings for you. A recent IDC white paper details Isilon’s cost advantages for enterprise environments. As shown in the graphic on the left, IDC investigated the relative amount of time needed by IT professionals to perform a wide range of data and storage management functions (listed on the y-axis) for Isilon as well as traditional storage systems. Isilon storage is easier to manage and requires less time. The study showed that with Isilon scale-out NAS, enterprises were able to increase IT productivity by 48 percent and thereby reduce OpEx (operating expenditures). Note to Presenter: IDC White Paper can be found here: http://simple.isilon.com/doc-viewer/1807/idc-quantifying-the-business-benefits-of-scale-out-network-attached-storage-solutions.pdf?gid=kyacua0laWyaTN&utm_campaign=www&utm_medium=doctab-f&utm_source=onefs_ap
The IDC study also found that as a result of Isilon storage systems’ unmatched efficiency (over 80 percent storage utilization), organizations were able to reduce CapEx (capital expenditures) significantly. With the reduced CapEx and increase in IT productivity, enterprise customers were able to reduce their overall storage costs by 40 percent with Isilon scale-out NAS (compared to traditional storage systems). Note to Presenter: IDC White Paper can be found here: http://simple.isilon.com/doc-viewer/1807/idc-quantifying-the-business-benefits-of-scale-out-network-attached-storage-solutions.pdf
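To illustrate why storage utilization drives CapEx, the sketch below computes how much raw capacity must be purchased to hold a given amount of data at different utilization rates. The 80 percent figure comes from the note above. The 400 TB data set and the 50 percent baseline utilization are hypothetical values chosen only for illustration and are not taken from the IDC study.

```python
def raw_capacity_needed(usable_tb: float, utilization: float) -> float:
    """Raw capacity (TB) to purchase so that `usable_tb` fits at a utilization rate."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return usable_tb / utilization


usable_tb = 400.0             # hypothetical amount of data to store
scale_out_utilization = 0.80  # "over 80 percent", per the note above
baseline_utilization = 0.50   # assumption: illustrative traditional-storage figure

scale_out_raw = raw_capacity_needed(usable_tb, scale_out_utilization)
baseline_raw = raw_capacity_needed(usable_tb, baseline_utilization)
raw_saved_fraction = 1 - scale_out_raw / baseline_raw

print(f"Raw capacity at 80% utilization: {scale_out_raw:.0f} TB")
print(f"Raw capacity at 50% utilization: {baseline_raw:.0f} TB")
print(f"Reduction in raw capacity purchased: {raw_saved_fraction:.1%}")
```

This covers only the capacity side. The 40 percent overall cost reduction quoted above also reflects productivity gains and is not reproduced by this calculation.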
Note to Presenter: View in Slide Show mode for animation. This slide provides an overview of the Isilon scale-out NAS architecture: Starting at the Client/Application layer, the Isilon NAS architecture supports a wide range of operating system environments, as shown here. At the Ethernet level, the Isilon OneFS operating system supports key industry-standard protocols, including NFS, CIFS, and HDFS (Hadoop Distributed File System), and provides you with great interoperability for business applications as well as your data analytics activities. OneFS is a single file system/single volume architecture, which makes it extremely easy to manage, regardless of the number of nodes in the storage cluster. Isilon storage systems scale from a minimum of three nodes up to 144 nodes, all of which are connected with an InfiniBand communications layer.
Note to Presenter: View in Slide Show mode for animation. OneFS can truly scale from as small as 18 TB to over 20 PB in a single file system, which eliminates the issues with conventional scale-up file systems that only scale to 16 TB. With Isilon, nodes can be added to the file system and be ready to use in minutes, versus a traditional file system that can take hours to install, configure, and provision. Isilon has the fastest performance and capacity scaling in the industry.
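The scaling figures in the last two notes can be sanity-checked with simple arithmetic. The sketch below uses only the stated limits (3 to 144 nodes, 18 TB to over 20 PB) and assumes decimal units (1 PB = 1000 TB). The notes do not say which conventional file system the 16 TB ceiling refers to; one common origin of such a limit is 32-bit block addressing with 4 KiB blocks, shown here as an assumption.

```python
MIN_NODES, MAX_NODES = 3, 144   # cluster size range stated in the notes
TB_PER_PB = 1000                # assumption: decimal units


def validate_node_count(nodes: int) -> int:
    """Reject cluster sizes outside the range stated in the notes."""
    if not MIN_NODES <= nodes <= MAX_NODES:
        raise ValueError(f"cluster must have {MIN_NODES} to {MAX_NODES} nodes")
    return nodes


min_capacity_tb = 18                # smallest single file system stated
max_capacity_tb = 20 * TB_PER_PB    # "over 20 PB", so this is a lower bound

# Average capacity per node implied at each end of the range.
tb_per_node_smallest = min_capacity_tb / validate_node_count(MIN_NODES)
tb_per_node_largest = max_capacity_tb / validate_node_count(MAX_NODES)

# Assumed origin of a 16 TB ceiling: 2**32 addressable blocks of 4 KiB each.
scale_up_limit_tib = (2**32 * 4096) / 2**40

print(f"Implied per-node capacity, smallest cluster: {tb_per_node_smallest:.1f} TB")
print(f"Implied per-node capacity, largest cluster: {tb_per_node_largest:.1f} TB")
print(f"32-bit block addressing limit: {scale_up_limit_tib:.0f} TiB")
```

The per-node values are averages implied by the quoted totals, not specifications of any particular node model.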