I contributed the featured article in the June 2018 newsletter: Structure and Complexity - Algorithms, Data, and User Experience. In it, I untangle the link between data and algorithms and how it can limit the design options available to us.
ML Times: Mainframe Machine Learning Initiative - June 2018 newsletter
Gradient Boosting Algorithm:
eXtreme Gradient Boosting (XGBoost) is an implementation of the gradient boosting framework. Gradient boosting is a machine learning technique used for building predictive tree-based models. Boosting is an ensemble technique in which new models are added to correct the errors made by existing models; models are added sequentially until no further improvement can be made. In gradient boosting, each new model is trained to predict the residuals (errors) of the prior models, and the models' outputs are then added together to make the final prediction.
Featured CA Product using the Gradient Boosting Algorithm: PERSONA MAPPING
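The residual-fitting loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not XGBoost itself: the decision-stump weak learner, the toy `xs`/`ys` data, the learning rate, and the number of rounds are all assumptions chosen for demonstration.

```python
# Minimal sketch of gradient boosting with squared-error loss:
# each new model is fit to the residuals of the ensemble so far,
# and predictions are the sum of all models' outputs.

def fit_stump(xs, residuals):
    """Find the one-split decision stump that best fits the residuals (least squares)."""
    best = None
    for split in xs:
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, split, lmean, rmean)
    _, split, lmean, rmean = best
    return lambda x: lmean if x <= split else rmean

def gradient_boost(xs, ys, n_rounds=10, lr=0.5):
    base = sum(ys) / len(ys)              # initial model: just the mean
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]   # errors of ensemble so far
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + sum(lr * s(x) for s in stumps)

# Hypothetical toy data: two rough clusters of target values.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.2]
model = gradient_boost(xs, ys)
```

Because each stump is fit to the current residuals, every round can only reduce the training error; production libraries such as XGBoost build on this same core loop with deeper trees, regularization, and many performance refinements.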
UPCOMING EVENTS
• Walk through a simple use case - ML Workshop
• June guest speaker session - TBD
ML TIMES
MAINFRAME MACHINE LEARNING INITIATIVE
JUNE 4 | ISSUE 3
IN THIS ISSUE
• Featured Product
• Musings
• Featured Article
Computer-Aided Detection (CAD) software can spot 52% of breast cancer cells up to a year before patients are diagnosed.
QUESTIONS? CONTRIBUTIONS?
Sai Gujja: Saiswetha.gujja@ca.com
Pooja Uppalapati: Pooja.uppalapati@ca.com
Persona Mapping processes raw text into personas that the sales team uses in the Contact Intelligence product to effectively target customers with profiles (personas) such as Architect, VP, and Product Management. To predict personas from contact titles, the team initially built a manually mapped dataset of around 11K contacts, which was split into train (7.7K) and test (3.3K) sets. The unstructured text was then converted into a structured format using a bag-of-words model to build a Document Term Matrix, and the Naïve Bayes algorithm was applied to this matrix for text classification. K-fold cross-validation was used to check model stability and gave a train accuracy of around 65%. To improve on this, we adopted an ensemble technique known as gradient boosting, in which all the predictors are combined with weights chosen to minimize the loss function. This algorithm now runs in the Azure big data pipeline, where 250K contacts are mapped to personas without any human intervention.
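The bag-of-words plus Naïve Bayes step described above (before the gradient boosting upgrade) can be illustrated with a toy sketch. Everything here is an assumption for demonstration purposes: the job titles, persona labels, and helper functions are invented and are not the team's actual data or code.

```python
import math
from collections import Counter, defaultdict

# Toy sketch: classify contact titles into personas with a
# bag-of-words representation and multinomial Naive Bayes.

# Hypothetical training data (title, persona label).
train = [
    ("chief software architect", "Architect"),
    ("enterprise solutions architect", "Architect"),
    ("vp of engineering", "VP"),
    ("senior vp sales", "VP"),
    ("product manager", "Product Management"),
    ("director of product management", "Product Management"),
]

def tokenize(title):
    return title.lower().split()

def fit_naive_bayes(data):
    class_counts = Counter(label for _, label in data)
    word_counts = defaultdict(Counter)        # per-class bag-of-words counts
    vocab = set()
    for title, label in data:
        for w in tokenize(title):
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab, len(data)

def predict(model, title):
    class_counts, word_counts, vocab, n = model
    best_label, best_score = None, float("-inf")
    for label, c in class_counts.items():
        score = math.log(c / n)               # log prior for the class
        total = sum(word_counts[label].values())
        for w in tokenize(title):
            # Laplace smoothing so unseen words don't zero out a class.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = fit_naive_bayes(train)
```

In the real pipeline the Document Term Matrix would be built over the full 11K-contact vocabulary and the Naïve Bayes predictions would then feed into the gradient boosting ensemble; this sketch only shows the core text-classification idea.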
I attended the AI Experience conference in Chicago and got a glimpse of how organizations are using automated machine learning tools to make accurate predictions. The conference also placed a lot of emphasis on having all key personas involved throughout the product development process, from inception to customer satisfaction. I'll elaborate a little on both below.
Automated Machine Learning Tools - What are they? What purpose do they serve?
Building the right model and providing analysis for the business for a given use case requires a lot of data preparation and running multiple models to find the right fit. Depending on the use case and the amount of data to analyze, this can take data scientists days, if not months.
Automated machine learning tools are next-generation tools that are disrupting the AI market, reducing the time data scientists spend running algorithms and freeing more time for what matters most: analyzing the outcomes.
DataRobot is one such company leading the pack in automated machine learning tools. We heard from large companies such as United Airlines and TD Ameritrade about how they are using DataRobot's automated machine learning tool and how it is helping them get to market faster.
Key personas should be involved throughout the process. What does that mean?
To build the right product and achieve the expected outcome, involving all the key personas is important. It is key to create personas, assign key characteristics to each persona, and make sure every persona is represented. You can think of personas as roles; for example, a Data Scientist, a Business Analyst, a Regulatory Officer, a Technology Expert, and a Business Expert can form a good team representing all the personas.
The product you are developing might need a different set of personas, and that is absolutely fine. Time and effort need to be spent identifying these personas and keeping them represented throughout the process. That is the key.
All in all, the conference gave me good exposure to how the AI market is evolving and the direction the industry is heading.
FEATURED PRODUCT - PERSONA MAPPING
For further questions, please reach out to Jilan, Asheesh, or Karthik from the GIS Data Science team.
MUSINGS
Journal of a Machine Learning Enthusiast
Contributor: Pooja Uppalapati
Contributor: Jilan Dudekula
FEATURED ARTICLE
MACHINE LEARNING, ARTIFICIAL INTELLIGENCE AND BEYOND
Structure and Complexity - Algorithms, Data and User Experience
Contributor: Leslie A. McFarlin
As creators of products and users ourselves, we all grasp that user experience impacts a product’s success. Products working
quickly and efficiently are lauded by users and industry publications, while those requiring complicated interactions and
performing slowly receive backlash. Most of what impacts the user experience seems either plainly visible, like screen designs,
or easy to quantify, like steps in a wizard. However, sometimes the tool driving the technology affects the user experience.
Machine learning algorithms are one such tool.
Focusing only on algorithms, though, limits the discussion. Algorithms refine data for use elsewhere in a product or for users
performing tasks. Therefore, it helps to clarify the link between data, algorithms, and user experience when creating machine
learning products. This link demonstrates that the user experience of a machine learning product is impacted at certain points,
and in specific ways, fairly consistently. This article introduces how data and algorithms impact the user experience, and where
those impacts can occur.
Where Algorithms, Data, and User Experience Collide
Very generally, an algorithm is a collection of steps outlined to address a class of problem. Therefore, designing user experiences for machine learning products requires an understanding of three points:
● The problem machine learning intends to resolve
● The type of algorithm being applied to the problem
● What the final solution should look like for users
The reason for the first two points is obvious: Fitting algorithms to problems enables product and user efficiency, something
that can only result from understanding the problem and the tools. Understanding what the final solution should look like
improves usability. Depending on the product, improved usability might mean anything from easy-to-consume and actionable
output to no user action needed at all.
Circling back to the first two points of understanding what problem machine learning intends to resolve and which type of
algorithm to apply brings the discussion to data. What does the data look like? How much data might there be? How much work
must an algorithm do to the data to generate usable output for users? These points can make a difference when deciding
among interaction patterns for inputting data into the algorithm, presenting its output, and any additional features needed to
help manage user expectations around a product’s performance.
Linking data, algorithms, and user experience as this article does merits an exploration of a single common variable:
Complexity. Data offers complexity, and algorithms might be complex in response. A key challenge in the user experience of
machine learning products, therefore, is mitigating complexity.
The Ripple Effect of Complexity
Complexity varies across domains and problems, and it can manifest in the quantity and quality of data related to a problem.
Complicated problems might produce large volumes of data, unstructured data, or multivariate data. Data quantity and quality
might impact how the machine learning algorithm receives input, in turn limiting the applicable interaction patterns. The type
of algorithm selected to address the problem, the steps comprising it, and the actions defined at each step within it partly
result from the data’s features. Large amounts of data could mean a long wait for an algorithm to satisfactorily analyze;
similarly, large algorithms and those with intricate equations in their steps could mean a long wait for the algorithms to
complete their analysis. Additional features should be designed into the UI to manage where product performance might clash
with user expectations due to data features.
Post-analysis, the impact of complexity continues. Most importantly, algorithm output quality will vary according to the input
quality. Because neither user experience specialists nor engineers can control the input quality, they must create safeguards
for when data quality affects information usability. Some applications might require supplemental information to explain the
output, and some might provide recommendations for maximizing output value. Limitations at this point arise around
applicable interactions for retrieving the machine learning output and presentation of content, such as the amount, complexity,
and format (for example, whether the output is textual or graphical).
Thoughts on Managing the User Experience
For machine learning products, laying out the information above lets us identify where and how considerations around data and
algorithms impact design. For many machine learning products, such impacts occur primarily around inputting data into the
algorithm and displaying the algorithm’s output. In some circumstances, a product might need additional features designed to
help manage user expectations around performance. The interconnectedness of data and algorithms, and the similar effects
they have on the user experience of machine learning products, highlights the need for a partnership among UX resources,
engineering, and data scientists. Such a partnership would encourage knowledge sharing to support an understanding of the problem to solve, the algorithm selected to solve it, and what the final solution must look like for the user, ultimately leading to a better experience for our users.