Data Science Salon: Applying Machine Learning to Modernize Business Processes

  1. Applying Machine Learning to Modernize Business Processes Matt Madden, Alteryx
  2. © 2017 Alteryx, Inc. Alteryx Analytics Platform for the Enterprise: Discover, Share, Deploy, Manage, Prep, Analyze, Blend, Model. Data & Analytics Culture. Community.
  3. © 2017 Alteryx, Inc. CODE-FREE ANALYTICS for the citizen data scientist. Data Science for the Masses • Broad range of preconfigured predictive models • Complete toolset for spatial analytics • Leverage encrypted models from data scientists. All-Purpose Data Workbench • Drag-and-drop UI for workflow creation • Prep, blend and analyze for most any use case • 250+ tools for wide array of data work • Simple yet sophisticated tool configuration • Global search for community support
  4. © 2017 Alteryx, Inc. CODE-FRIENDLY ANALYTICS for the trained data scientist. High Performance for Big Data • Analytic compute in the Alteryx Engine for scale and performance • Additional In-DB platform support • Spark. Breadth of Algorithmic Support through API • Enhanced R tool • Python and Scala support • Guide to creating R-based Alteryx tools
  5. © 2017 Alteryx, Inc. The only quick-to-implement, self-service data analytics platform that allows data scientists & citizen users alike to break the barriers to insight, so everyone can experience the thrill of getting to the answer faster. Like No Other. Data Prep & Blending is the Foundation for All Levels of Analytics.
  6. © 2017 Alteryx, Inc. Making Data Science ACTIONABLE: The Last Mile of Analytics
  7. © 2017 Alteryx, Inc. Making Data Science ACTIONABLE is Challenging • Lack of understanding of the benefits • Lack of trust in effectiveness • Technical complexity to operationalize • No way to measure ROI
  8. © 2017 Alteryx, Inc. Model deployment methods: Reports, Interactive Dashboards, Real-time Applications
  9. © 2017 Alteryx, Inc. Model deployment methods: Reports, Interactive Dashboards, Real-time Applications
  10. © 2017 Alteryx, Inc. Model deployment methods: Reports, Interactive Dashboards, Real-time Applications
  11. © 2017 Alteryx, Inc. Data-Driven Apps: Oscar Health Insurance (Insurance), Uber (Transportation & Logistics), TurboTax (Accounting)
  12. © 2017 Alteryx, Inc. http://www.informationweek.com/big-data/big-data-analytics/big-data-success-remains-elusive-study/d/d-id/1318891 Data science value chain: apps that reach customers and front-line employees operationally are more valuable than static reports.
  13. © 2017 Alteryx, Inc. PROBLEM: DATA SCIENCE TEAMS FACE A MYRIAD OF CHALLENGES AT EVERY STEP OF THE WAY. Business Problem → Evaluate Available Data → Request Data Access from IT → Request Compute Resources from IT → Negotiate with IT for Requested Resources → Wait for Resources to be Provisioned → Install Languages & Tools → Configure Connectivity, Access, & Security → RAM/CPU Availability, Scalability, Monitoring → Request Network Config Change → Request to Install Another Package → Model Building → Compose a PowerPoint to Share Results → Edit Team Wiki to Document Your Work → Negotiate with Product on Model Deployment Timeline → Wait for Engineering to Implement the Model → Test Newly Implemented Model to Ensure Valid Results → Request Modifications to the Model due to Unexpected Results → Release the Model to Production → Document Release Notes and Deployment Steps → Prepare for Change Management
  14. © 2017 Alteryx, Inc. Model Production: Today's Reality. Deployments take 6 to 9 months; cost to deploy 1 model runs in excess of $250,000; 13% of models make it to production. Data Scientists build a model in R or Python → Developers painstakingly rewrite models into other languages → Your Applications: write more custom code to integrate → Your Customers: customer benefits
  15. Download a FREE Trial: alteryx.com/trial © 2017 Alteryx, Inc. | Confidential. PROMOTE (Code Free & Code Friendly) architecture: MASTER NODE, WORKER NODES; PREDICTIVE MODELS from DATA SCIENCE & ANALYTICS TEAMS; MAKE PREDICTIONS for FRONT-LINE EMPLOYEES, MARKETERS, CUSTOMERS, APPLICATIONS
  16. © 2017 Alteryx, Inc. Alteryx Promote ❶ Deploy Code-Free: Utilize the code-free environment of Alteryx Designer to build and deploy models ❷ Deploy Code-Friendly: Deliver R and Python models immediately via a standard REST API, without recoding ❸ Manage: Evaluate and test models before they are put into production, and manage users and model versions ❹ Monitor: Understand the effectiveness and stability of the models
  17. © 2017 Alteryx, Inc. SOLUTION: DATA SCIENTISTS NEED A WAY TO MANAGE THEIR PROJECTS FROM END TO END. Business Problem → Evaluate Available Data → Request Data Access from IT → Request Compute Resources from IT → Negotiate with IT for Requested Resources → Wait for Resources to be Provisioned → Install Languages & Tools → Configure Connectivity, Access, & Security → RAM/CPU Availability, Scalability, Monitoring → Request Network Config Change → Request to Install Another Package → Model Building → Compose a PowerPoint to Share Results → Edit Team Wiki to Document Your Work → Negotiate with Product on Model Deployment Timeline → Wait for Engineering to Implement the Model → Test Newly Implemented Model to Ensure Valid Results → Request Modifications to the Model due to Unexpected Results → Release the Model to Production → Document Release Notes and Deployment Steps → Prepare for Change Management
  18. © 2017 Alteryx, Inc. Recommender Systems. Use Case • Tendril deploys and retrains predictive models that forecast consumer behavior and purchase decisions. Results • 2x faster development cycles • 4x faster time to market • $350,000 saved per year in engineering costs
  19. © 2017 Alteryx, Inc. Credit Scoring. Use Case • Scoring and decision-making models, credit underwriting, fraud detection, marketing. Results • 8% increase in approval rates • 15% decrease in loss rates • 17% increase in overall margins
  20. © 2017 Alteryx, Inc. Dynamic Pricing. Use Case • Turo deploys the data science team's dynamic pricing model and recommender engine into their web and mobile app. Results • Removed Turo's data science team's dependence on other engineering teams, so they can realize the value of their work almost immediately (in less than a day).
  21. © 2017 Alteryx, Inc. Fraud Detection. Use Case • VIA SMS Group deploys fraud detection and credit profiling models. Results • 200x faster model retraining • 5% decrease in manual application reviews • 13% better model performance (lift in ROC curve)
  22. Thank You. Matt Madden mmadden@alteryx.com

Editor's notes

  1. Data Science for the Masses: a TurboTax-like interview interface. We have just announced the acquisition of Yhat, and will be bringing you a new solution that will help you operationalize models through a REST API. Another area of focus is performance.
  2. The Alteryx Engine was built to handle gigabytes of data; the new Engine is built for terabytes. When you have petabytes, we need to leverage the power of Spark and process data where it lives. We are also adding additional algorithmic support: we already support R, and are adding a brand-new Python SDK that will ultimately enable a Python tool for Alteryx Designer.
  3. So we have always believed data prep and blending would be a core competency of any analytic system, because it is the foundation for any type of analytic. For the past few years, the industry was perhaps a bit over-focused on the new interactive visualization paradigm, which is awesome because it increases engagement and insight, but it didn't really move us beyond descriptive analytics. Now, folks are setting their sights higher. Organizations are starting to understand that business value is going to come from the more sophisticated analytics that are just now making their way into the mainstream. And so the question becomes how, or more specifically, who? The jump to these higher-value analytics is throwing the issue of the analytic talent shortage into sharp focus.
  4. We talk to a lot of organizations, and one thing we find is that there is a lot of consistency in their challenges. There is an inordinate amount of difficulty in extracting value from analytics, particularly advanced analytics, but, as you can probably attest, across the board from business intelligence all the way through PhD-level methodologies and teams. We boil all of the complexities and challenges that data science teams face down into two big categories. One is human: organizations tend to have a really difficult time understanding what's possible and what are appropriate questions to ask of the data in the first place. The second is technical: this stuff is very complicated. The data prep and blending problem is impossibly nuanced and incredibly complex; there will never be a solution that can magically clean data. You all know this. The same complexity and situational nuance applies to the machine learning problems that data scientists deal with every day.
  5. So if we focus on the latter, the technology problem, it really comes down to truly extracting value from analytics. Building machine learning models is one way to do so, but to truly extract that value, it comes down to model deployment. The two main methods that companies tend to think about most when they talk about investing in predictive analytics are offline. One is reports. If you take a PhD in astrophysics who is now working in industry, this individual may deliver to the executive team charts and prose that describe a business phenomenon and why it happens. The end deliverable is academic in spirit: there's no application, just academic-style research applied to whatever the business phenomenon is.
  6. The second is interactive dashboards, which require that the model be fit and that scores be generated for new records, usually on a nightly basis, though it can be at other intervals. For example, you might want to estimate which leads to send direct mail to and which ones are poor-quality leads that you should throw away. This would be another batch, or offline, application.
  7. Where Alteryx is now focusing is on near real-time applications. Alteryx has been very capable for quite some time at batch predictive analytics, but there was no real way to take a model that you've built in Designer and incorporate the business logic that the model is designed to operate under into, say, an iPhone app. We feel we are filling that gap and helping data scientists limit the challenges they face.
  8. Why do we even care about this in general? Well, table stakes today: pretty much every software application that any of us uses on our phones or on the web is designed to be hyper-personalized. This is certainly the case in the consumer world, but it's growing more and more common even internally, with products that companies are building for their front-line employees.
  9. Not every team is going to have a very large or robust data engineering team. It's not uncommon for the LinkedIns and Netflixes, the great machine learning companies out there, to have one engineer for every data scientist. This is not a scalable team structure for the vast majority of companies. When we think about the model deployment problem in particular, we always think about when data, and data science projects in general, become valuable. Consider raw data versus clean data, and the notoriously difficult set of problems that go into taking messy, unclean data and converting it into clean data suitable for a dashboard in Tableau or for building models in the first place: there isn't a great deal of value created there, certainly not in comparison to the great expense that goes into doing that work. On the flip side, the higher you get up the data science value chain, the more value is realized, with the archetypal example being super-sticky, machine-learning-powered consumer apps like Netflix that rely entirely on machine learning and that all of our mothers can use without ever knowing anything about the support vector regression or clustering algorithms that go into this stuff. The rub in building a data science team is that even though the majority of the work takes place in the data prep and blending phase, the value may not be created until much farther downstream.
  10. Along the way there's a disastrous list of inefficiencies that data teams go through. You already know this. You have to jockey and broker for IT resources. In some cases you have no idea that a project you've been working hard on has already been completed, perhaps even several times, by other people in other offices. A lot of the time there are compliance and regulatory constraints that cost you many weeks of lag time between when you've made a request to IT for access to a data asset and when you actually get it. This becomes a very expensive process for virtually every company.
  11. Most organisations today have a real disconnect between their data science teams and their developers or engineers. Data scientists will typically be working with a variety of statistical libraries in either R or Python. They'll be creating and evaluating their models in those languages, and once finished they'll pretty much hand over that code 'as-is' to a group of developers whose job is to translate those scripts into working code across the enterprise, and then get that code into a shape where it can be made available to your customers or users. And here's the problem: the enterprise doesn't always speak R or Python. It's talking Java, or .NET, or PHP, or JavaScript. In short, we have a major language-barrier problem. Rewriting an optimised statistical model from R for a Java Virtual Machine? That's a lot of costly rework. What if Java doesn't have a neat way to handle all of the clever statistical tricks your data scientist has incorporated into the model? Then your developers have really got their work cut out for them. We could try a halfway house and get the data scientists to convert the model into PMML (Predictive Model Markup Language) and throw THAT over the wall. Fine for some situations, but PMML doesn't cover all the rich capabilities of the R and Python libraries your data scientists are using. It's a partial solution, at best. And in many cases, according to TDWI, model deployments can take around 6 to 9 months. What's even scarier is that we worked with one customer that said the typical cost to deploy one model was more than $250K. In fact, because of situations like these, only 13% of data scientists in a Rexer Analytics survey said their models actually get deployed into production. That language barrier, that re-engineering effort, prevents the quality work that your data science teams are producing from making an impact on your business.
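To picture that PMML halfway house, here is a minimal sketch of exporting a scikit-learn model to PMML so a Java scoring engine such as JPMML can evaluate it. This assumes the sklearn2pmml package (which itself shells out to a Java converter, so a Java runtime must be installed); the model and file name are illustrative only, not anything from the deck.

# Minimal sketch: exporting a Python model to PMML so the enterprise's
# Java stack can score it without a hand rewrite. Assumes the sklearn2pmml
# package is installed; the model and file name are hypothetical.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

X, y = load_iris(return_X_y=True)

# PMMLPipeline mirrors scikit-learn's Pipeline but captures the metadata
# the PMML document needs (field names, model schema).
pipeline = PMMLPipeline([("classifier", LogisticRegression(max_iter=1000))])
pipeline.fit(X, y)

# Write a PMML document that a JPMML-based scorer can load directly.
sklearn2pmml(pipeline, "model.pmml")

As the note says, this only works for models whose techniques PMML can express; anything outside that subset still forces a rewrite.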
  12. We decided to think about a better way. Instead of taking code that a data science team implemented in R, or implemented as a workflow in Designer, and rewriting it painstakingly line by line in Java or .NET or whatever the production, user-facing app is implemented in, there should be a cheaper way to take the models you've built and integrate them into an iPhone app, a Salesforce plug-in, or a website. That's precisely what we have built. From a user's perspective, the way it works is you'll have a new deployment tool, which can take as input any predictive tool (currently the vast majority are R-based, but we are working aggressively on Python as well) and deploy any arbitrary business logic that you've come up with in Designer across a number of servers in a cluster. These are then exposed as standards-compliant, highly scalable web services: APIs that any developer in the world will immediately understand how to use. So instead of saying to your development team, "Look, I built this business logic that can estimate the likelihood that somebody is going to abandon their shopping cart, and I'd really like to display a pop-up that says 'checkout now' if the probability of shopping cart abandonment exceeds 90%," and having that engineer receive that spec, open up the guts of their website, and write the code that corresponds to the workflow you've already built in Designer, you just give them one line of code and they will be able to hit your API and invoke the business logic that you have constructed in an Alteryx workflow. This gives you back the ability to make changes dynamically as well. For example, let's say you come up with a new feature, a new user behaviour that is highly indicative that someone is likely or unlikely to abandon their shopping cart. You wouldn't even have to talk to your developer: you just make the change to the model and click rerun in Designer, and the model will update itself. So let's take a look at how Alteryx Promote fits into this vision. Your code-free or code-friendly models are developed in the language of choice for your analysts and data science teams, deployed through a simple process into the Promote cluster, and instantly available through a clean API endpoint. This API now becomes the easiest possible way to get your model into the hands of your end-users, your marketing teams, your customers, and your applications. No rewriting, no performance bottlenecks: simply your analytic model available, in real-time, wherever you need it.
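To make the "one line of code" idea concrete, here is a minimal sketch of what calling such a scoring endpoint could look like from Python. The endpoint URL, payload fields, and response shape are hypothetical illustrations, not the actual Promote API.

# Minimal sketch: invoking a deployed model over REST from any application.
# The URL, feature names, and response fields are hypothetical; a real
# deployment would use the address and auth scheme of your own instance.
import requests

features = {"items_in_cart": 3, "minutes_on_page": 12, "returning_user": True}

resp = requests.post(
    "https://promote.example.com/models/cart-abandonment/predict",
    json=features,
    timeout=2,  # real-time callers should fail fast rather than block the UI
)
resp.raise_for_status()

# Show the "checkout now" pop-up only for high-risk sessions.
if resp.json().get("abandonment_probability", 0) > 0.9:
    print("Trigger checkout-now pop-up")

Because the application depends only on the endpoint's contract, the data scientist can retrain or swap the model behind it without the developer changing anything, which is exactly the dynamic-update point above.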
  13. Using the Alteryx platform, we've got years of experience helping customers deploy analytical and modelling workflows to production environments without the use of code. When analysts use our predictive suite, it's easy to drop a Score tool into a workflow to receive the predicted insight from a deployed model; again, no code required. We're also introducing enhancements to the platform that are going to make it much, much easier for data scientists working in R or Python to join this party. With a bare minimum of code, your data teams will be able to deploy models in their native language straight into the Alteryx platform, ready for use. Where we had blockers around managing updates to production models in legacy software, we now find ourselves working with a platform that makes it positively EASY to deploy models many times every hour. Build the deployment process into your existing Alteryx scheduling process or DevOps scripts to start providing continuous integration for your data science teams. When it's this easy to deploy, it completely transforms your ability to test new models before they enter production. This low cost of experimentation allows you to innovate with data science in ways that just weren't possible before. And how about managing your models from a simple web portal? Our platform can be configured so that model predictions are available for testing and review at any time. We can set up models on a champion/challenger basis so that new models can be gracefully promoted while older ones are retired. It's all about making it easier for you to get more value out of the analytic effort you're putting in. We've regularly seen customers rapidly shift from 1 or 2 production models to literally hundreds: all under version control, all under a review-based deployment workflow for governance, and all integrated and embedded throughout the customer's business processes.
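One way to picture the champion/challenger idea: route a small share of scoring traffic to the candidate model while the incumbent keeps serving the rest. A minimal sketch follows, in which the two endpoints and the 10% split are hypothetical, not part of the product described above.

# Minimal sketch of champion/challenger routing: send a small fraction of
# requests to the challenger and compare outcomes before promoting it.
# Endpoints and the 10% split are illustrative assumptions.
import random
import requests

CHAMPION = "https://promote.example.com/models/credit-score-v1/predict"
CHALLENGER = "https://promote.example.com/models/credit-score-v2/predict"

def score(features: dict) -> dict:
    # Route roughly 10% of live traffic to the challenger for online
    # evaluation; once its metrics hold up, it is gracefully promoted.
    url = CHALLENGER if random.random() < 0.10 else CHAMPION
    resp = requests.post(url, json=features, timeout=2)
    resp.raise_for_status()
    return resp.json()

In practice the routing decision would live behind the serving layer itself, but the sketch shows why graceful promotion and retirement require no change in the calling application.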
  14. That's the concept, and the benefits are pretty staggering. It typically cuts down the development time for implementing a real-time predictive model; the cost and time savings can be 10x or more.
  15. This is a company in Boulder, Colorado. They help energy providers, like utility companies and power plants, target homes for clean-tech products. For example, they would look at the energy usage in a certain geography and then drill down to the right leads at the right time with the right product and the right price. They also provide heads-up analytics in the form of all kinds of web and mobile apps, both for the utility companies and for the consumers those utility companies work with. The numbers and the time savings are there. Data scientists at energy software company Tendril opted for a hybrid approach that combines both collaborative and content-based filtering. Tendril provides analytics and consumer solutions to energy suppliers, including which energy products consumers would most likely consider. "We use Support Vector Regression models to predict household energy consumption to provide our clients with in-depth, personalized information about their customers," explains Mark Gately.
  16. Another customer is a European lender called Ferratum Bank. They operate in about 30 countries, and the decision-making tasks that go into making a direct-to-consumer loan through a UI like this are numerous. I think they have, on average, a dozen predictive models per country that have to get invoked, and all of those predictive models are designed in R or within Alteryx workflows using the R tools. Each day we're dealing with millions and millions of euros' worth of loans being assessed and delivered, all in real time. The first general-purpose credit scoring algorithm, now known as the FICO score, was introduced in 1989. The FICO score is still one of the most widely used models in the United States today, though peer-to-peer and direct lending organizations have focused on developing new techniques over the past few years. These new machine learning models and algorithms capture innovative factors and relationships that traditional loan scorecards couldn't, like how applicants manage monthly cash flow or whether friends or community members would endorse the applicant. One such company is Ferratum Bank, a pioneer in financial technology and mobile consumer lending since 2005. "We developed complex statistical and machine learning models to enable smarter lending decisions," explains Scott Donnelly, Director of Business Lending at Ferratum Bank. "By getting creative with our approach and adopting innovative technologies, we've been able to reinvent how both consumers and businesses obtain loans. This has allowed us to reach prospective customers that in the past may have been overlooked by traditional banking institutions."
  17. Turo offers car owners and travelers a service to share vehicles by renting them. Use Case: They built a model that could suggest a reasonable price based on all the data they had at their disposal. "The algorithm evaluates a few straightforward variables like car make, model, and year, as well as some less obvious factors like demand and competition, both intra- and extra-Turo. Our data science team works predominantly in Python but our engineering team develops in Java and JavaScript, so there was no clear path to production. I actually came across Yhat's platform at a data science meetup in San Francisco. I thought the idea was really interesting. What stood out to me was the vision of separating the development tracks, uncoupling data science modeling from the user-facing product." Turo uses ScienceOps to deploy the data science team's dynamic pricing model and recommender engine into their web and mobile app. Results: Yhat removed Turo's data science team's dependence on other engineering teams so that they can realize the value of their work almost immediately.
  18. Founded in 2008, VIA SMS Group developed an entirely new approach to issuing quick loans using predictive analytics and machine learning. VIA SMS Group uses advanced algorithms to assess whether an applicant is fraudulent and, if not, what his or her credit profile looks like, in order to determine whether or not to underwrite a loan. VIA SMS Group writes the decision algorithms in the R programming language. Before using ScienceOps, VIA SMS had to rewrite these algorithms from R into the server-side language PHP in order to deploy models into their web and mobile apps. "We were basically limited to linear or logistic regression models and decision trees, since anything we wrote had to be implemented in PHP. It also took a lot of time to make even minor modifications, like adding new predictors, because we were handing off models to the dev team. Updates could easily take a few days per model since they were dependent on coordination between teams and manual input. Rewrites were prone to typos and syntactical errors, since the languages handle data types differently."