Slide deck from the Dreamforce 2014 talk "Building Machine Learning Systems with Apex". Includes links to the GitHub code repository and contact details for the speakers.
1. Building Machine Learning Systems in
Apex
Jen Wyher
Technical Architect
@jenwyher
Paul Battisson
Technical Architect
@pbattisson
2. Safe Harbor
Safe harbor statement under the Private Securities Litigation Reform Act of 1995:
This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize
or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the
forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any
projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding
strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or
technology developments and customer contracts or use of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for
our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate
of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with
completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability
to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our
limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential
factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year
and in our quarterly report on Form 10-Q for the most recent fiscal quarter. These documents and others containing important disclosures are
available on the SEC Filings section of the Investor Information section of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and
may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are
currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
5. Mavens Consulting
• Preferred Life Sciences implementation
partner for salesforce.com and Veeva
• 60+ consultants located across North America
and Europe
• 12 Mavens in attendance at #Dreamforce14,
speaking in 7 different technical sessions
11. Why Apex?
• Governor limits make it hard to run long
or large jobs with Apex
• Showing the power of the platform
12. K-Means Clustering
• Account targeting
• Medical diagnosis aid
• Data segmentation
“given a group of m different
data points derive k clusters
of related items”
13. The Algorithm
• Initialize K centroids
• Assign each training example to its
“nearest” centroid
• Reset each centroid to the mean of all
its assigned examples
• Repeat until the centroids stop moving
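The four steps above can be sketched in a few lines. This is a minimal Python illustration, not the Apex code from the talk: the function name `k_means`, the `max_iterations` and `seed` parameters, and the choice of Euclidean distance for "nearest" are all assumptions made for the example.

```python
import math
import random


def k_means(points, k, max_iterations=100, seed=0):
    """Cluster points (tuples of floats) into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: initialize K centroids by sampling K of the data points.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iterations):
        # Step 2: assign each example to its nearest centroid
        # (Euclidean distance is an assumption of this sketch).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 3: reset each centroid to the mean of its assigned examples.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # Step 4: stop once no centroid has moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels
```

Calling `k_means(points, 2)` on a list of coordinate tuples returns the final centroids and, for each point, the index of the cluster it was assigned to.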
18. The Need For Speed
• Chained Batches
– Batches creating batches
• Speedier loops
– Remove around 90% of CPU time
– See http://goo.gl/mR5GZe
• JSON serialize/deserialize and attachments
– Quick and effective way of storing data
– Attachments have a much larger limit (around 10x the amount of data)
• Running totals (stateful batch)
– Saves repeated loops
• JavaScript Remoting for charting
– Loading so many attachments exhausts the heap
– Use remoting to load attachments for display asynchronously
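The "running totals" idea can be illustrated as follows. This is a hedged Python sketch rather than Apex: each batch folds its points into per-centroid sums and counts, so the new centroids fall out at the end without another loop over the data. The helper names (`new_state`, `process_batch`, `finish`), the sample data, and the JSON round-trip between batches are assumptions for illustration only.

```python
import json
import math


def new_state(k, dimensions):
    # Running totals per centroid: a per-attribute sum plus a count.
    return {"sums": [[0.0] * dimensions for _ in range(k)], "counts": [0] * k}


def process_batch(state, batch, centroids):
    """Fold one batch of points into the running totals."""
    for point in batch:
        nearest = min(
            range(len(centroids)), key=lambda c: math.dist(point, centroids[c])
        )
        state["counts"][nearest] += 1
        for axis, value in enumerate(point):
            state["sums"][nearest][axis] += value
    return state


def finish(state, centroids):
    """Turn the totals into new centroids without another pass over the data."""
    return [
        [total / count for total in sums] if count else list(old)
        for sums, count, old in zip(state["sums"], state["counts"], centroids)
    ]


centroids = [[0.0, 0.0], [10.0, 10.0]]
batches = [
    [[0.0, 1.0], [1.0, 0.0]],
    [[10.0, 11.0], [11.0, 10.0]],
    [[1.0, 1.0], [9.0, 9.0]],
]
state = new_state(k=2, dimensions=2)
for batch in batches:
    state = process_batch(state, batch, centroids)
    # Round-trip through JSON to mimic persisting state between batches.
    state = json.loads(json.dumps(state))
new_centroids = finish(state, centroids)
```

Because only sums and counts are carried forward, the state stays small no matter how many records each batch processes.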
19. Future Ideas
• Recommendation Engines
– Content
– Products/services
• Neural Networks
– Lots of number processing
– Chaining will be key
• Real time sites recommendations
– Think Amazon recommendations
JOE
And if you’re not familiar with Mavens,
we are the preferred healthcare and life sciences partner for salesforce.com and Veeva.
We have about 60 consultants located across North America and Europe,
and we have 12 Mavens in attendance at #Dreamforce14, speaking in 7 different technical sessions.
And we’re feeling a little extra pride right now: earlier today we won the Salesforce partner innovation award for marketing, so I figured I’d toot our horn a bit.
Paul - EOS = 2 mins
Paul - EOS = 3
Paul - EOS = 4
Paul - EOS = 6
Paul - EOS = 8
Jen - EOS = 12
Simple but still lots of iterations and calculations
K-Means Clustering Algorithm.
How does the algorithm work?
Feed it a large data set (e.g. your sales data).
The end result of the calculations is a predefined number (k) of clusters of the provided data set, each grouping the data points with the most similarities.
Account Targeting
Customized marketing plan.
A cardiologist vs. a neurologist
Medical Diagnosis Aid
A patient could have a tumor based on their profile; probability of having certain diseases.
Data Segmentation
Cluster into groups to analyze
Jen - EOS = 14
The diagram shows two dimensions, but imagine that you can do this analysis for 5, 10, or even 100 attributes per observation.
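To make the jump from 2 to many attributes concrete: if "nearest" is taken to mean Euclidean distance (an assumption, since the slides do not name a metric), the same distance calculation works unchanged for any number of attributes. A small Python sketch with made-up observations:

```python
import math

# Two observations described by 2 attributes each.
d2 = math.dist((1.0, 2.0), (4.0, 6.0))

# The same call with 5 attributes per observation.
a = (1.0, 0.0, 2.0, 0.0, 3.0)
b = (1.0, 2.0, 2.0, 1.0, 5.0)
d5 = math.dist(a, b)
```

Only the length of each tuple changes; the clustering steps themselves stay the same.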
Jen - EOS = 18
Saving too much data
Our calculations were running long
...and actually, our goal at this point was just to be able to analyze 11 data points.
Chained Batches, Winter ‘13.
Start a Batch Apex job from within another Batch Apex job.
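The chaining pattern can be mimicked outside the platform with a simple queue. The following Python sketch is only an analogy: `IterationJob`, `run`, and the fixed iteration cap are hypothetical names and choices for this example, and the in-memory queue stands in for the platform's job scheduler. In Apex the successor would typically be started with a `Database.executeBatch` call from the batch's `finish` method.

```python
from collections import deque


class IterationJob:
    """Hypothetical stand-in for one batch job that runs a single k-means pass."""

    def __init__(self, iteration, max_iterations):
        self.iteration = iteration
        self.max_iterations = max_iterations

    def execute(self, log):
        # Placeholder for the real work of one pass over the data.
        log.append(f"iteration {self.iteration}")

    def finish(self, queue):
        # Chaining: the finishing job enqueues its own successor.
        if self.iteration < self.max_iterations:
            queue.append(IterationJob(self.iteration + 1, self.max_iterations))


def run(first_job):
    """Drain the queue, letting each job schedule the next one."""
    queue, log = deque([first_job]), []
    while queue:
        job = queue.popleft()
        job.execute(log)
        job.finish(queue)
    return log


log = run(IterationJob(1, 3))
```

A real implementation would stop chaining when the centroids converge rather than after a fixed number of iterations.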
Purist, and use matrix operations.