2. AWS Government, Education, and Nonprofit Symposium
Washington, DC | June 25-26, 2015
Redshift, Big Data and Predictive Modeling
Lige Hensley, CTO - Ivy Tech Community College
Who we are
• 866,000 active users
• Students around the globe
• 138 buildings
• 24,000 PCs
• 2,000 wireless access points
• Over 100,000 network nodes
• 3,000 tablet devices
• 2,600 routers & switches
• 60,000 course sections a year
• 23,000,000 emails daily
• 24/7/365 operations
• 10 TB data downloaded from Internet daily
• 10,000 smart phones supported
• 700 TB of server data
• 7,000 VOIP phones
• 1,100 servers
• 1,200 software applications
• Generate over 100,000,000 rows of data per day
• IT staff of 165
Our challenge…
• Existing reporting environment inadequate for our size
• Business need to better understand our data
• Improve student success rates
• Very limited budget
What we wanted
• Flexibility to handle our myriad of data sources and business needs
• Scalability to meet our growing demands
• Performance to match our need for quick answers to data questions
What we found
• Worked with various “big data” vendors to find an affordable solution
• All “solutions” started at 7-figures and went up
• Nothing we found met our requirements
What we did
• Adopted Redshift & Pentaho as our solution
• Data feeds from numerous internal systems, in near real-time in some cases
• Keep all transactions, not just snapshots
• Allow connections from other tools such as Tableau, SPSS and Mathematica
What we can do now
• Deliver timely data to users quickly, on any device
• Analyze data between transactions
• Run machine learning tools against our data for predictive modeling and reporting
How it helps
• Using historical data and pattern recognition, identify students who need assistance far earlier than ever before
• Analysis can be run as early as 2 weeks into the term while maintaining accuracy
• Accuracy of the prediction models has risen from 62% two years ago to 81% currently
Other benefits
• Allows for quicker identification of fraudulent registrations
• Dramatically cheaper than the alternatives
• Pattern analysis identifies poorly designed course materials
Cornell University
Our Journey To The Cloud
Bob Carozzoni
Enterprise Cloud Strategist
Which of these are officially part of Cornell’s mission statement?
❏ Research
❏ Education
❏ Outreach
❏ Information Technology [X]
IT STRATEGY
Rebalance IT spend:
● less non-mission aligned
● more directly mission aligned
Cloud as Opportunity!
Cornell Cloud Advisory Service
A competency center for cloud adoption
FOUR YEARS OF AGGRESSIVE CLOUD ADOPTION
Was it really that easy?
Understanding the needs of key stakeholders
IT Leadership in the “Post Enterprise World”?
[Diagram labels: Administrative IT Staff; End-user; Business leads; Cloud Vendor]
See: Educause Paper, Tracy Schroeder
If no one follows,
are we leading?
Slow central IT ...
… means no central IT
Be on the train, or under it.
If you can’t take risks...
… end users will do it for you
You need to be more flexible than your SaaS provider
Don’t fight redundancy and
don’t fight redundancy.
Lead in a new way...
• Partnering
• Inspiring
• Coaching
• Brokering
• Enabling
Resistance from within
google: “smarter every day backwards bike”
PRINCIPLES FOR LEADING IN THE CLOUD ERA
1. #1 Strategy - be tactical
2. See with your customer’s eyes
3. Build relationships
4. Influence trumps control
5. Small is the new big
6. Question everything
Cloudification = Transformation
What is cloudification?
• Partnership with campus IT units
• Refactoring for the most effective use of cloud technologies and containerization vs. lift and shift
• Central IT must be the expert that campus wants to come to for help
• Enable, not enforce
• Understanding that if IaaS isn’t better with us, campus will make the move without us
• Allow campus technologists to focus on unit differentiators; central IT can help with the utilities
Feature Delivery trend
[Chart: feature delivery, datacenter features vs. cloud features]
traditional infrastructure
Immutable Infrastructure
Quote by Michael Bryzek, the CTO and co-founder of GILT Group
Build your system using building blocks or containers
When you find a flaw or need to make a change
Start over to completely build the system you need
Resistance to change
FURTHER READING
http://www.educause.edu/library/resources/cloud-strategy-higher-education-building-common-solution
https://www.educause.edu/members/robert-carozzoni
Thank You.
This presentation will be loaded to SlideShare the week following the Symposium.
http://www.slideshare.net/AmazonWebServices
Editor's notes
The key point is that we are big. We have a lot of systems and even more data. Every ten days we generate another BILLION rows of data and almost all of that data is student related in some way.
The existing environment was very slow, with many reports taking over 40 minutes to run. It processed 1M records in 8 hours and only worked with our ERP. Someone who can only ask 12 questions a day does not make for a very nimble organization.
Since our existing solution only worked with our ERP, we needed something that works for the entire business.
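The "12 questions a day" figure is consistent with the 40-minute report time if one assumes an 8-hour working day with reports run back to back. That working-day length is an assumption made here; the notes only give the report time and the daily count. A quick check, which also shows the average rate implied by 1M records in 8 hours:

```python
# Back-of-the-envelope check of the figures quoted in the note above.
# Assumption (not stated in the source): an 8-hour working day in which
# reports are run one after another.
WORKDAY_HOURS = 8       # assumed
REPORT_MINUTES = 40     # "over 40 minutes" per report, per the notes


def questions_per_day(workday_hours: int, report_minutes: int) -> int:
    """Number of sequential reports that fit into one working day."""
    return (workday_hours * 60) // report_minutes


def records_per_second(records: int, hours: float) -> float:
    """Average throughput when `records` rows take `hours` to process."""
    return records / (hours * 3600)


if __name__ == "__main__":
    print(questions_per_day(WORKDAY_HOURS, REPORT_MINUTES))   # 12
    print(round(records_per_second(1_000_000, 8), 1))         # 34.7
```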
As with every school out there, our goal has always been to improve the success of our students and our existing system was a liability in that effort.
And while we’re quite large, we’re also extremely cost conscious. We had to make this work on a sub-7-figure budget.
Ivy Tech is somewhat unique in the higher-ed space. We have the three V’s of data (velocity, volume and variety) in spades.
Our existing system had to be scaled vertically. If you wanted it to go faster, you had to buy a faster server. We needed something that went horizontal, meaning that if we wanted it to go faster or handle more data, we just add more servers.
And this is one point that cannot be overlooked. For an organization to get actual value from their data, you have to be able to get an answer quickly. If you have to wait 20 minutes for every query, you stop asking “what if” questions because it’s just too painful. Our goal was to go from our existing 40+ minute report runs to under 10 seconds for any of our “canned” reports. Custom queries, of course, can take longer depending on the question being asked.
There are several key players in this space that have some pretty impressive solutions.
Unfortunately, they all know how impressive their offerings are and they charge accordingly.
We also found some issues with what I’ll call “affordable scalability”. All the solutions had the ability to scale, but the price tag for this was, to be honest, just ridiculous. So needless to say, nothing met our needs.
There was a lot that went into this decision, but we selected Redshift and Pentaho. The “pay as you go” model is, for us, the best possible pricing model (next to free anyway). Pentaho performs well, is user friendly and works well across platforms.
We call our Redshift and Pentaho stack “NewT”. It stands for “new thing”. Obviously we’re very creative when it comes to naming our projects.
We have data flowing into NewT from 7 disparate systems with several more in the works. If the source system can support it, we can load that data in near-real-time. We can now query all of this holistically and the insights gained from that are eye-opening.
We keep all transactions, not just daily “loads” as in many environments.
We don’t limit access to just Pentaho, although that’s what most of our data consumers use. We also allow other tools for more specialized analysis.
We can now compare current data to last week, month or year in a few seconds…or run any report for that matter, on everything from a PC to an iPhone.
Looking at data on this level gives us new insights into our organization. While this adds a tremendous amount of data to the system, being able to process it quickly is teaching us lots of new things.
This is probably the most exciting aspect of our project. We can now churn through billions and billions of rows of data to predict what will happen next.
There are lots of paths to success and failure for our students. These paths have hundreds of variables. We can now identify the paths and the variables.
Every day we run a report that takes each student’s current activity and behavior through the prediction models and ends up with a list of “students of concern”. These are the students that are very likely to fail their course given their current behavior, and we know this very early.
While we know that things happen and our model will never be 100% accurate, we are getting better. One important note to all of this: we never look at a student’s grades in a course to make these predictions. Grades are not a feature we consider in our analysis.
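The notes do not describe the prediction model itself, so the following is only an illustrative sketch of the daily "students of concern" step: score each student from behavioral features and list those at or above a risk threshold. The feature names, weights, and threshold are all hypothetical. The one constraint taken from the notes is that grades are never used as a feature.

```python
import math
from dataclasses import dataclass


# Hypothetical behavioral features; grades are deliberately absent.
@dataclass(frozen=True)
class Activity:
    student_id: str
    logins_last_week: int
    assignments_missed: int
    days_since_last_login: int


# Hypothetical hand-set weights for a logistic score. A real model would
# be trained on historical data instead.
WEIGHTS = {
    "logins_last_week": -0.4,
    "assignments_missed": 0.9,
    "days_since_last_login": 0.3,
}
BIAS = -1.0
THRESHOLD = 0.7  # assumed cut-off for a "student of concern"


def risk(a: Activity) -> float:
    """Logistic risk score in (0, 1), computed from behavior only."""
    z = BIAS + sum(w * getattr(a, name) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def students_of_concern(activities: list[Activity]) -> list[str]:
    """IDs of students at or above the threshold, highest risk first."""
    scored = [(risk(a), a.student_id) for a in activities]
    flagged = [pair for pair in scored if pair[0] >= THRESHOLD]
    return [sid for _, sid in sorted(flagged, reverse=True)]


if __name__ == "__main__":
    today = [
        Activity("s1", logins_last_week=6, assignments_missed=0,
                 days_since_last_login=0),
        Activity("s2", logins_last_week=0, assignments_missed=3,
                 days_since_last_login=9),
    ]
    print(students_of_concern(today))
```

Keeping the feature list in one place makes it easy to audit that nothing grade-related has crept into the inputs.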
One unintended benefit is the ability to quickly identify patterns of fraud. We’ve found that the patterns of fraud can be subtle, but you can see them when you look just right.
Other solutions wanted 7-figures just to begin “playing” with their technology. We don’t have 7-figures into this project to date.
We’ve also used pattern analysis to find course content that may have been confusing to students based on their usage patterns. We literally have millions of assets in our learning management system. Looking at usage patterns of students with that content has led to finding and rebuilding some pieces in order to make it more understandable and easier to use.
How the Cloudification service came to be on the Cornell campus.
The Kuali development team, which supports community source applications, has worked with Kuali in our on-prem implementation, with Kuali in AWS through the Kuali Foundation (which does all new development in the cloud), and with other schools. Being part of community source means that you contribute to new development and also help those implementing it. Developers from Cornell have been part of troubleshooting problems on Kuali applications across many different schools and configs. This creates an environment in which developers have technical curiosity and the ability to learn new things quickly.
In the fall, the Kuali team had a pilot project to move Kuali applications to the cloud. We sat down, scoped out the project and sent the developers off to figure it out. Two weeks later I had 3 developers who were so excited about cloud technologies, the breadth of services, and the speed at which they could develop new things. The excitement was awesome! Exactly what someone leading a group of technologists wants to see from team members. As we continued with the pilot, others on campus started hearing about it, and campus units were contacting me asking if they could work with us. We found that because the technology was new to them, they were interested in a partnership with Central IT. This seemed like an opportunity too good to pass up, so we created the Cloudification service.
There are two trends we point to when we are asked why we would move to the cloud. Technologists want to hear about the features, the services and how this can help them be better developers. The feature delivery trend is great to show these folks. There is no way we can keep up with the feature delivery rate in our on-premises data centers. Every week there are new services available in the cloud. Every time we identify a gap, we report it and generally find it is on the roadmap already.
The other trend, which is more relevant to management and those who care about financials, is the cost trend. We will continue to see the costs of our local datacenters slowly rising as we have to replace hardware and keep many layers of infrastructure updated. The trend for cloud providers has been for costs to go down rather than up. There are people who will say that they have ‘done the math’ and have proven that moving to the cloud will not save us money. This could be true if we chose to do no refactoring of our applications to take advantage of the automation that exists in the cloud, but even if it is true today, the trend tells us that the move is still a good financial decision. The cost of the datacenter will continue to slowly go up while the cost of cloud computing will continue to go down at a fairly rapid pace.
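The cost-trend argument can be made concrete with a small projection. The starting costs and annual rates below are invented for illustration and are not Cornell figures. The sketch only shows how a slowly rising cost and a falling cost cross over even when the cloud starts out more expensive.

```python
from typing import Optional


def first_cheaper_year(datacenter_cost: float, cloud_cost: float,
                       datacenter_growth: float, cloud_decline: float,
                       horizon: int = 30) -> Optional[int]:
    """Return the first year (0 = today) in which the cloud costs less
    than the datacenter, or None if that never happens within `horizon`."""
    for year in range(horizon + 1):
        if cloud_cost < datacenter_cost:
            return year
        datacenter_cost *= 1 + datacenter_growth   # slow yearly rise
        cloud_cost *= 1 - cloud_decline            # faster yearly decline
    return None


if __name__ == "__main__":
    # Hypothetical inputs: cloud is 30% more expensive today, datacenter
    # costs rise 3% a year, cloud costs fall 10% a year.
    print(first_cheaper_year(100.0, 130.0, 0.03, 0.10))
```

With these made-up inputs the cloud becomes the cheaper option in year 2, even though "the math" at year 0 favors the datacenter.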
We in higher ed are very comfortable with traditional infrastructure support processes. We create servers, we name them and we take care of them, we patch them, upgrade them and fix them when they are broken.
There is a new paradigm for infrastructure support called immutable infrastructure; you may also hear it called infrastructure as code. This is a quote by Michael Bryzek, the CTO and co-founder of GILT Group, an online shopping service on steroids. If you google the term immutable infrastructure, you will find dozens of presentations by Michael Bryzek on this new paradigm, how he has gained huge efficiencies and how to make it work.
So what does immutable infrastructure really mean?
You don’t patch or fix your server. You throw it away. You have no server persistence, you are able to build a new environment from the OS all the way up the application stack in a matter of minutes. You design your entire development pipeline based on this immutable server concept.
Building what you need, when you need it and only for the timeframe for which it is needed.
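The "throw it away and rebuild" idea can be sketched in a few lines. This is a toy model, not any real provisioning API: `ServerImage`, `Server`, and `Fleet` are hypothetical names standing in for whatever tooling actually builds and launches instances. The point it illustrates is that a change produces a new server from a new image, and the old server is discarded rather than patched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ServerImage:
    """Immutable description of a server: OS plus application stack."""
    os_version: str
    app_version: str


@dataclass(frozen=True)
class Server:
    instance_id: int
    image: ServerImage


class Fleet:
    """Hypothetical stand-in for the tooling that launches instances."""

    def __init__(self) -> None:
        self._next_id = 1
        self.running: list[Server] = []

    def launch(self, image: ServerImage) -> Server:
        server = Server(self._next_id, image)
        self._next_id += 1
        self.running.append(server)
        return server

    def roll_out(self, old: Server, new_image: ServerImage) -> Server:
        """Apply a change by building a fresh server and discarding the old."""
        new = self.launch(new_image)
        self.running.remove(old)   # the old server is thrown away, not patched
        return new


if __name__ == "__main__":
    fleet = Fleet()
    v1 = fleet.launch(ServerImage(os_version="os-1", app_version="app-1.0"))
    # A flaw is found in the OS: derive a new image instead of patching v1.
    fixed_image = replace(v1.image, os_version="os-2")
    v2 = fleet.roll_out(v1, fixed_image)
    print(v2)
    print(fleet.running)
```

Marking the dataclasses as frozen means an in-place "patch" such as `v1.image.os_version = "os-2"` raises an error, which mirrors the rule that running servers are never modified.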
The biggest challenge to all of this change and opportunity to do things better, faster and smarter is helping people see their path forward. We in Higher Ed have staff members who have worked for our institutions for many, many years, in some cases doing exactly the same work. Change is hard for everyone, but it is important. I use this analogy of the Frogger game to show that the log you are on today is not necessarily the best, safest, most stable place for you to be forever. In order to stay relevant, it is important to understand new technology direction. The good news is, I have watched people feel resistant and push back, then take the first step towards understanding the opportunities introduced by this new paradigm. I have watched people go from terrified and resistant to excited and innovative in their varying fields of expertise. Not everyone deals with change at the same speed or in the same way, but I do think it is important to say that this is one of the major challenges, and it is critical to help our staff find their path forward.