Slides of a lecture delivered at the First Process Mining Summer School in Aachen, Germany, July 2022.
This lecture introduces techniques in the area of "task mining" with an emphasis on Robotic Process Mining. Robotic Process Mining (RPM) is a family of techniques to discover repetitive routines that can be automated using Robotic Process Automation (RPA) technology, by analyzing interactions between
one or more workers and one or more software applications, during the performance of one or more tasks in a business process. In general, RPM techniques take as input logs of User Interactions (UI logs). These UI logs are recorded while workers interact with one or more applications, typically desktop applications. Based on these logs, RPM techniques produce specifications of one or more routines that can be automated using RPA or related tools.
Robotic Process Mining
1. Applications of process mining to robotic process automation: Robotic Process Mining
Marlon Dumas
University of Tartu and Apromore
Process Mining Summer School, Aachen, 4-8 July 2022
Research Funded by the European Research Council (PIX project) and the Australian Research Council
With Volodymyr Leno, Marcello La Rosa, Artem Polyvyanyy, and Fabrizio Maggi
2. Scope of Process Mining
[Figure: data from an Enterprise System goes through ETL into process mining, which produces: process model, comparative variant analysis reports, conformance reports, process performance measurements, data-driven simulations, process predictions.]
4. Processes, Tasks, User Interactions
[Figure: a loan origination process instance made up of Task1: Check Application, Task2: Check Background, Task3: Assess Application, Task4: Underwrite, Task5: Approve Offer. Below the tasks, the worker's interactions with IT systems are shown as user actions in Adobe Reader, Salesforce, Outlook and SAP, together with actions marked Private. A zoom on Task3: Assess Application shows interactions with Adobe Reader, Salesforce, Outlook and the Lending Backend. Labels on the events: Loan ID, Worker ID.]
6. No Case ID(s) – End of the world?
• To make UI logs useful, we need to:
  • Segment them: group events into task instances
  • Ideally also: link task instances to process instances (case IDs)
• Segmentation approaches:
  • Delimiter-based segmentation: If we know that every task instance (of a task type) starts when a user clicks button “Create PO” and finishes with a click on a “Submit” button, we can segment at each occurrence of these events.
  • Resource-time windowing: If an event log tells us that Worker123 performed task instance t0023 between 2022-07-08 10:15-10:20, then all events of this worker during this time window are linked to t0023
• In resource-time windowing, the link from UI event to task instance also gives us a case ID (e.g. if an event is linked to t0023, then it is linked to its case ID)
• Otherwise, we can sometimes find a case ID by looking at the payload of the UI events (e.g. the Loan ID appears as a text field in some UI events)
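The two segmentation approaches on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the event fields (`worker`, `time`, `label`), the label strings and the case ID `Loan-001` are hypothetical and do not come from any particular task mining tool.

```python
from datetime import datetime

def segment_by_delimiters(events, start_label, end_label):
    """Group events into task instances that run from a start event to an end event."""
    segments, current = [], None
    for ev in events:
        if ev["label"] == start_label:
            current = [ev]              # a new task instance begins (an unfinished one is dropped)
        elif current is not None:
            current.append(ev)
            if ev["label"] == end_label:
                segments.append(current)
                current = None
    return segments

def link_by_resource_time(ui_events, task_instances):
    """Link each UI event to the task instance (and case ID) of the same worker and time window."""
    linked = []
    for ev in ui_events:
        for t in task_instances:
            if ev["worker"] == t["worker"] and t["start"] <= ev["time"] <= t["end"]:
                linked.append({**ev, "task": t["task"], "case": t["case"]})
                break
    return linked

# Hypothetical UI log of one worker
ui_log = [
    {"worker": "Worker123", "time": datetime(2022, 7, 8, 10, 16), "label": "click:CreatePO"},
    {"worker": "Worker123", "time": datetime(2022, 7, 8, 10, 17), "label": "edit:Amount"},
    {"worker": "Worker123", "time": datetime(2022, 7, 8, 10, 19), "label": "click:Submit"},
    {"worker": "Worker123", "time": datetime(2022, 7, 8, 10, 30), "label": "click:CreatePO"},
    {"worker": "Worker123", "time": datetime(2022, 7, 8, 10, 31), "label": "click:Submit"},
]
# Hypothetical entry from the enterprise system's event log
task_log = [
    {"task": "t0023", "case": "Loan-001", "worker": "Worker123",
     "start": datetime(2022, 7, 8, 10, 15), "end": datetime(2022, 7, 8, 10, 20)},
]

segments = segment_by_delimiters(ui_log, "click:CreatePO", "click:Submit")
linked = link_by_resource_time(ui_log, task_log)
```

Here `segments` holds two task instances, and `linked` holds the three UI events that fall inside the 10:15-10:20 window, each carrying the task instance and its case ID.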
7. Analyzing Task UI Logs with Process Mining
• Task Mining probes are
deployed on user workstations to
gather user interaction (UI) logs
on performed tasks using
screenshot + image processing
OR native GUI libraries.
• Raw UI data pushed to a Data
Processing Server.
• The Task Mining configuration module
allows analysts to provide input for data
processing, e.g.
• defining task boundaries for
segmentation
• specifying the granularity (screen-level or task-level)
• tagging sensitive information
• …
• The Task Mining data processing
engine pre-processes the raw UI data
using the configuration.
Processed UI logs are fed into the
process mining tool.
From here, this data can be used
to discover the underlying routines
inside each task, analyze
performance and compliance at the
sub-task level, analyze worker
performance, etc.
[Figure: User Workstations → Raw UI data → Data Processing Server → Processed UI data → Process Mining Tool]
10. Task mining vs Process mining
Scope
  Process Mining: Full end-to-end processes
  Task Mining: Individual tasks and how they are done
Objective
  Process Mining: Optimizing against process performance indicators
  Task Mining: Optimizing the performance of individual tasks
Source of data
  Process Mining: Event logs generated by enterprise systems, e.g. SAP, Salesforce, ServiceNow…
  Task Mining: User interaction logs obtained by recording worker activity via desktop or Web applications, e.g. MS Outlook or Adobe Reader
Correlation
  Process Mining: Single case ID or inter-related case IDs (e.g. Loan Application ID, PO ID, Invoice ID)
  Task Mining: No direct link to a “case”; instead, a reference to the recorded worker
11. Task Mining: Use Cases (according to Gartner)
Use cases: Task Automation (Robotic Process Mining), Workforce Optimization, Task Improvement
Common pain points:
• Unknown root causes of process workarounds and deviations
• Task workflow non-compliance
• No task visibility: most effective paths may be overlooked
• Tasks amenable to automation (e.g. via RPA) are hard to identify
• Slow automation development
• Inaccurate ROI assessment (not grounded on real data)
• Low workforce productivity
• Incorrectly set KPIs, no benchmark
• High variance in task executions
• Best practices not known
• Poor training & knowledge base
• Bad employee experience
Marc Kerremans and Tushar Srivastava. Discover the differences and use cases of process mining
versus task mining. Research Note G00723821, Gartner, April 2020.
12. Robotic Process Automation (RPA)
Robotic Process Automation: an emerging technology that allows organizations to automate repetitive clerical work by executing scripts (RPA bots) that encode sequences of fine-grained interactions with Web and desktop applications
[Images: attended automation; unattended automation]
From: https://www.reliableplant.com/Read/31352/human-robot-collaboration
http://www.cirriusimpact.com/robotic-process-automation-rpa/
14. Why Robotic Process Automation?
Error rates reduction
Cycle time reduction
Flow standardization (consistency)
Cost efficiency
[Image from Adobe Stock]
15. Classical RPA Analysis and Development
[Figure: users (employees) interact with information systems, which produce an event log. Process mining (discovery, conformance, enhancement) turns the event log into a process model. Routine analysis relies on interviews, workshops and observation, and is followed by development of the RPA script (bot).]
− Time-consuming
− Error-prone
− Difficult to maintain
16. RPA with Robotic Process Mining
[Figure: users (employees) interact with information systems, which produce an event log used by process mining (discovery, conformance, enhancement) to obtain a process model. In addition, the interaction is captured by recording into a UI log. Labels on the routine path: Analysis, Automated Discovery, Routine specification, Synthesis, Compilation, Development, RPA script (bot).]
Shortened time-frames
Data-driven
Objective
18. Robotic Process Mining: Research Questions
1. Given a user interaction log, how to identify routines that can be potentially automated via an RPA tool? How to reliably assess the “automatability” of a routine?
2. Given a set of (automatable) routines, how to prioritize these routines to maximize the benefits/ROI of RPA investments?
3. Given a routine, how to discover an executable specification that can be executed by an RPA bot?
4. Given a collection of operative RPA bots, how to monitor their performance and assess the realized benefits of an RPA initiative? How to adjust and evolve RPA bots for maximum benefit realization?
20. Identification of Candidate Routines
[Figure: pipeline from UI Log to Candidate routines.
Segmentation: Preprocessing and normalization → Control-flow graph construction → Back edges detection → Segments identification.
Routines identification: Candidates discovery → Candidate selection.]
21. Preprocessing and normalization
UI parameters:
• Data parameters (unique value for each trace): copied content, cell value, field value
• Context parameters (same value for all traces): field name, button label, spreadsheet
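A minimal sketch of what normalization could look like, assuming that it amounts to keeping the action type and the context parameters of each UI and discarding the values of data parameters (the slide only lists the two parameter categories). The key names such as `field_name` and `field_value` are hypothetical.

```python
# Assumed split of parameter names into the two categories from the slide
DATA_PARAMETERS = {"copied_content", "cell_value", "field_value"}
CONTEXT_PARAMETERS = {"field_name", "button_label", "spreadsheet"}

def normalize(event):
    """Return a hashable key: action type plus context parameters; data values are ignored."""
    context = tuple(sorted((k, v) for k, v in event.items() if k in CONTEXT_PARAMETERS))
    return (event["action"], context)

# Two edits of the same field with different values map to the same normalized UI
e1 = {"action": "editField", "field_name": "Phone", "field_value": "043-512-4834"}
e2 = {"action": "editField", "field_name": "Phone", "field_value": "039-689-9324"}
e3 = {"action": "editField", "field_name": "Email", "field_value": "a@example.org"}
```

With this key, `e1` and `e2` are treated as the same UI across traces, while `e3` remains distinct because its context parameter differs.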
26. [Figure: control-flow graph of the UI log with back edges highlighted; labels: Source nodes, Target nodes, Segment 1, Segment 2.]
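The idea of cutting a UI log at back edges can be sketched as follows. This is a simplified illustration, not the method from the lecture: it assumes the control-flow graph is a directly-follows graph over normalized UIs, and it detects back edges with a plain depth-first search. The function names and the example log are hypothetical.

```python
def build_graph(log):
    """Directly-follows graph: one edge a -> b for each pair of consecutive UIs."""
    graph = {log[0]: []} if log else {}
    for a, b in zip(log, log[1:]):
        graph.setdefault(a, [])
        graph.setdefault(b, [])
        if b not in graph[a]:
            graph[a].append(b)
    return graph

def find_back_edges(graph, root):
    """An edge is a back edge if its target is still on the DFS stack."""
    back, state = set(), {}          # state: 1 = on the stack, 2 = finished
    def visit(node):
        state[node] = 1
        for nxt in graph[node]:
            if state.get(nxt) == 1:
                back.add((node, nxt))
            elif nxt not in state:
                visit(nxt)
        state[node] = 2
    visit(root)
    return back

def cut_into_segments(log, back_edges):
    """Start a new segment whenever two consecutive UIs form a back edge."""
    segments, current = [], [log[0]]
    for a, b in zip(log, log[1:]):
        if (a, b) in back_edges:
            segments.append(current)
            current = []
        current.append(b)
    segments.append(current)
    return segments

# Hypothetical normalized log in which the same task is performed twice
log = ["open", "copy", "paste", "save", "open", "copy", "paste", "save"]
graph = build_graph(log)
back_edges = find_back_edges(graph, log[0])
segments = cut_into_segments(log, back_edges)
```

In this example the only back edge goes from source node `save` to target node `open`, and the log is cut into two segments of four UIs each.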
27. A routine is a frequent (gapped) sequence of user interactions.
Good candidates for automation are time-consuming routines with a large number of executions and error-prone manual labor.
Segments:
<U1, Uy, U2, U3, Ux, U4, Uz>
<U1, Uy, U2, Ux, U3, Uz, U4>
<U1, Ux, Uz, U2, U3, U4>
Frequent patterns:
Pattern1: {U1, U2, U3, U4}
Pattern2: {U1, Ux, U4}
Pattern3: {U1, Uy, U2, U3, U4}
One UI can only belong to one routine!
28. Solution:
Discover frequent patterns as usual
Rank them according to a certain metric (e.g., length, frequency, coverage)
Select the best pattern and remove its occurrences from the segments
Repeat the procedure until no more frequent patterns are left
Example:
<U1, Uy, U2, U3, Ux, U4, Uz>
<U1, Uy, U2, Ux, U3, Uz, U4>
<U1, Ux, Uz, U2, U3, U4>
Pattern1: {U1, U2, U3, U4}
Remaining after removing Pattern1:
<Uy, Ux, Uz>
<Uy, Ux, Uz>
<Ux, Uz>
Pattern2: {Ux, Uz}
Remaining after removing Pattern2:
<Uy>
<Uy>
<>
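The select-and-remove loop can be illustrated with a small Python sketch. It makes several assumptions that are not in the slides: candidate patterns are enumerated by brute force (only feasible for toy segments), a pattern is frequent if it occurs in at least `min_support` segments, and patterns are ranked by support times length as a stand-in for coverage.

```python
from itertools import combinations

def is_subsequence(pattern, seq):
    """True if pattern occurs in seq in order, with gaps allowed."""
    it = iter(seq)
    return all(p in it for p in pattern)

def candidates(segments, min_len=2):
    """Brute-force enumeration of all gapped subsequences of every segment."""
    found = set()
    for seg in segments:
        for n in range(min_len, len(seg) + 1):
            found.update(combinations(seg, n))
    return found

def remove_occurrence(pattern, seq):
    """Remove the leftmost occurrence of pattern from seq, if there is one."""
    if not is_subsequence(pattern, seq):
        return list(seq)
    rest, j = [], 0
    for x in seq:
        if j < len(pattern) and x == pattern[j]:
            j += 1
        else:
            rest.append(x)
    return rest

def select_routines(segments, min_support):
    segments = [list(s) for s in segments]
    routines = []
    while True:
        scored = []
        for p in candidates(segments):
            support = sum(is_subsequence(p, s) for s in segments)
            if support >= min_support:
                scored.append((support * len(p), p))   # assumed ranking metric
        if not scored:
            return routines, segments
        best = max(scored)[1]
        routines.append(best)
        segments = [remove_occurrence(best, s) for s in segments]

segments = [
    ["U1", "Uy", "U2", "U3", "Ux", "U4", "Uz"],
    ["U1", "Uy", "U2", "Ux", "U3", "Uz", "U4"],
    ["U1", "Ux", "Uz", "U2", "U3", "U4"],
]
routines, leftovers = select_routines(segments, min_support=3)
```

On the three segments from the slide this selects `(U1, U2, U3, U4)` first and `(Ux, Uz)` second, leaving `[Uy]`, `[Uy]` and an empty segment.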
30. Evaluation results. Unsupervised recording logs
Scholarship allocation process in the University of Melbourne, 2 workers

Log             # Discovered segments   # Identified routines   # Routine variants   Execution time (sec.)
Scholarship1    35                      2                       5                    41.686
Scholarship2*   3                       0                       0                    426.319

* The second log describes more complex and unstructured behavior
31. Executable routines discovery & synthesis
A routine is automatable if every UI in the routine can be deterministically executed based on input data, or data produced by previous UIs.
A routine specification is a representation of an automatable routine that can be executed by an RPA tool.
The Routine Automatability Index (RAI) is the degree of automatability of a routine, computed as the ratio of automatable UIs to all UIs in the routine.
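As a small worked example of the RAI definition, the sketch below assumes that each UI of a routine has already been labeled as automatable or not; how that label is obtained is outside the scope of the snippet, and the function name is hypothetical.

```python
def routine_automatability_index(automatable_flags):
    """RAI = number of automatable UIs divided by the total number of UIs in the routine."""
    if not automatable_flags:
        raise ValueError("a routine must contain at least one UI")
    return sum(1 for flag in automatable_flags if flag) / len(automatable_flags)

# Hypothetical routine with four UIs, three of which can be executed deterministically
rai = routine_automatability_index([True, True, False, True])
```

For this routine the index is 3/4 = 0.75; a fully automatable routine has an RAI of 1.0.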
34. Discovering data transformations by example
Key ideas:
Synthesize one transformation per output field and use UI log to discover input-to-output data-flows
Discover patterns in the input values and discover one transformation per input pattern
35. Examples extraction: Overview
For each routine instance:
Collect last edits of all target application elements
Identify corresponding sources and their values
Create input-output transformation examples (Input, Output, Source, Target)
Last edit → Target, Output
Corresponding read → Source, Input
t = (
Input = “+61 043 512 4834”,
Output = “043-512-4834”,
Source = “D3”,
Target = “Phone”
)
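The extraction steps can be sketched as below. The event schema (`action`, `element`, `value`) is hypothetical, and pairing each edit with the most recent preceding read is a simplifying assumption made for this illustration; it is not claimed to be how the actual approach identifies sources.

```python
def extract_examples(routine_instance):
    """Build one (Input, Output, Source, Target) example per edited target element."""
    last_read = None
    examples = {}                      # target element -> example built from its last edit
    for ev in routine_instance:
        if ev["action"] == "read":
            last_read = ev
        elif ev["action"] == "edit" and last_read is not None:
            examples[ev["element"]] = {
                "Input": last_read["value"],
                "Output": ev["value"],
                "Source": last_read["element"],
                "Target": ev["element"],
            }
    return list(examples.values())

# Hypothetical routine instance: a spreadsheet cell is read, then a form field is edited twice
instance = [
    {"action": "read", "element": "D3", "value": "+61 043 512 4834"},
    {"action": "edit", "element": "Phone", "value": "043-512"},
    {"action": "edit", "element": "Phone", "value": "043-512-4834"},
]
examples = extract_examples(instance)
```

Only the last edit of `Phone` is kept, which yields the same example `t` as on the slide.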
36. Examples extraction. Heterogeneous data
+61 (039) 689 9324
+61 (039) 689-9324
+61 039 689-9324
61.039.689.9324
+61 039 689 9324
039-689-9324
039.689.9324
039-689-9324
No single data transformation program
Solution
Identify patterns by applying tokenization
Group transformation examples with the same pattern together
Discover transformation program for each group
37. Examples extraction. Tokenization
Example
99 Beacon Rd, Port Melbourne, VIC 3207, Australia
Digits: <d>+
Alphabetic characters: <a>+
Special characters (remain unchanged)
<d>+ <a>+ <a>+, <a>+ <a>+, <a>+ <d>+, <a>+
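A short sketch of tokenization and pattern-based grouping, assuming that runs of digits become `<d>+`, runs of ASCII letters become `<a>+`, and every other character is kept as is. The function names are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(value):
    """Replace digit runs with <d>+ and letter runs with <a>+; other characters stay unchanged."""
    return re.sub(
        r"[0-9]+|[A-Za-z]+",
        lambda m: "<d>+" if m.group()[0].isdigit() else "<a>+",
        value,
    )

def group_by_pattern(values):
    """Group input values that share the same token pattern."""
    groups = defaultdict(list)
    for v in values:
        groups[tokenize(v)].append(v)
    return dict(groups)

address_pattern = tokenize("99 Beacon Rd, Port Melbourne, VIC 3207, Australia")
groups = group_by_pattern([
    "+61 039 689 9324",
    "+61 035 341 2938",
    "61.039.689.9324",
])
```

The address from the slide maps to `<d>+ <a>+ <a>+, <a>+ <a>+, <a>+ <d>+, <a>+`, and the three phone numbers fall into two groups, so one transformation program can be searched for per group.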
38. Transformation discovery. Syntactic transformations
Input:
+61 039 689 9324
+61 035 341 2938
+61 079 149 3015
Output:
039 689 9324
035 341 2938
079 149 3015
[Figure: search tree over intermediate tables, with edges labeled by candidate operations: split_first(0, ' '), split(0, ' '), drop(0, ' '), join(0, ' ')]
Program synthesis as a search problem;
Heuristic search based on A* algorithm;
Cost function is based on the number of data manipulations;
Deals with string and table manipulations.
Implemented in the Foofah toolset
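To make the "program synthesis as a search problem" idea concrete, here is a toy A* search over three table operators. It is a sketch under assumptions, not the Foofah implementation or its API: the operators `split_first`, `drop` and `join` are simplified re-implementations written for this example, the cost is the number of operations applied so far, and the heuristic is simply 0 for the goal table and 1 otherwise.

```python
import heapq
from itertools import count

def split_first(table, col, sep=" "):
    """Split column col at the first separator into two columns."""
    rows = []
    for row in table:
        if sep not in row[col]:
            return None
        left, right = row[col].split(sep, 1)
        rows.append(row[:col] + (left, right) + row[col + 1:])
    return tuple(rows)

def drop(table, col):
    """Remove column col (never the only remaining column)."""
    if len(table[0]) <= 1:
        return None
    return tuple(row[:col] + row[col + 1:] for row in table)

def join(table, col, sep=" "):
    """Merge column col with the column to its right."""
    if col + 1 >= len(table[0]):
        return None
    return tuple(row[:col] + (row[col] + sep + row[col + 1],) + row[col + 2:] for row in table)

OPERATORS = [("split_first", split_first), ("drop", drop), ("join", join)]

def synthesize(inputs, outputs, max_steps=4):
    start = tuple((v,) for v in inputs)
    goal = tuple((v,) for v in outputs)
    def h(table):                       # every non-goal table needs at least one more step
        return 0 if table == goal else 1
    tie = count()                       # tie-breaker so tables are never compared by the heap
    frontier = [(h(start), next(tie), start, [])]
    seen = {start}
    while frontier:
        _, _, table, program = heapq.heappop(frontier)
        if table == goal:
            return program
        if len(program) >= max_steps:
            continue
        for name, op in OPERATORS:
            for col in range(len(table[0])):
                nxt = op(table, col)
                if nxt is None or nxt in seen:
                    continue
                seen.add(nxt)
                cost = len(program) + 1
                heapq.heappush(frontier, (cost + h(nxt), next(tie), nxt, program + [f"{name}({col})"]))
    return None

program = synthesize(
    ["+61 039 689 9324", "+61 035 341 2938", "+61 079 149 3015"],
    ["039 689 9324", "035 341 2938", "079 149 3015"],
)
```

For the phone numbers on the slide the search returns the two-step program `split_first(0)` followed by `drop(0)`: split off the country code, then drop it.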
40. Transformation discovery. Semantic transformations
Searching for functional dependencies;
Transformations in the form of substitution mapping schemes
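A substitution mapping can be sketched as a lookup table that is accepted only if the examples are consistent with a functional dependency from input to output, i.e. the same input never maps to two different outputs. The function name and the country-code examples are hypothetical.

```python
def discover_substitution_mapping(examples):
    """Return an input -> output mapping, or None if some input has two different outputs."""
    mapping = {}
    for inp, out in examples:
        if mapping.setdefault(inp, out) != out:
            return None              # the functional dependency input -> output is violated
    return mapping

consistent = discover_substitution_mapping([
    ("Australia", "AU"), ("Estonia", "EE"), ("Australia", "AU"),
])
conflicting = discover_substitution_mapping([
    ("Australia", "AU"), ("Australia", "AUS"),
])
```

The first call yields a two-entry mapping; the second returns `None` because `Australia` appears with two different outputs.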
45. Open Challenges
Candidate routines discovery:
Discovering routines in the presence of multi-tasking and/or frequent worker distractions (the routine occurrences may overlap)
Discovering routines that are often performed in a piece-wise manner
Executable routines discovery:
Discovering automatable routines where the data transfer between fields is NOT explicitly recorded in the UI log (“copy typing”)
How to discover automatable routines with complex conditional behavior?
How to discover semi-automatable routine specifications (for unattended RPA)?
Strategic alignment, governance, people & culture:
Monitoring of RPA performance & acceptance
Compliance verification & monitoring
46. UI Log Recording
V. Leno, A. Polyvyanyy, M. La Rosa, M. Dumas, & F. M. Maggi. Action Logger: Enabling Process Mining for Robotic
Process Automation. In BPM Demonstration Track, 2019, pp. 124-128.
UI Log Segmentation
J. Shen, L. Li, and T. G. Dietterich, Real-time detection of task switches of desktop users. IJCAI 2007, pp. 2868–2873.
A. Bosco, A. Augusto, M. Dumas, M. La Rosa, and G. Fortino, “Discovering automatable routines from user interaction
logs,” in BPM Forum’2019. Springer.
G. Tello, G. Gianini, R. Mizouni, and E. Damiani, “Machine learning-based framework for log-lifting in business process
mining applications,” in BPM’2019, Springer.
Simone Agostinelli, Francesco Leotta, Andrea Marrella: Interactive Segmentation of User Interface Logs. ICSOC 2021:
65-80
References
47. Candidate Routine Identification
A. Jimenez-Ramirez, H. A. Reijers, I. Barba, and C. Del Valle, “A method to improve the early stages of the robotic
process automation lifecycle,” in CAiSE’2019, Springer, pp. 446–461
D. Choi, H. R’bigui, and C. Cho, “Candidate digital tasks selection methodology for automation with robotic process
automation,” Sustainability 13(16):8980, 2021.
V. Leno, A. Augusto, M. Dumas, M. La Rosa, F. M. Maggi, & A. Polyvyanyy. Identifying candidate routines for Robotic
Process Automation from unsegmented UI logs. In ICPM’2020, pp. 153-160, IEEE.
J. Gao, S. J. van Zelst, X. Lu, W.M.P. van der Aalst: Automated Robotic Process Automation: A Self-Learning Approach.
OTM Conferences 2019, Springer, pp. 95-112
Synthesis of Executable Routine Specifications
V. Leno, M. Dumas, M. La Rosa, F. M. Maggi, & A. Polyvyanyy (2020). Automated Discovery of Data Transformations
for Robotic Process Automation. AAAI Workshop on Intelligent Process Automation (IPA), 2020.
S. Agostinelli, M. Lupia, A. Marrella, M. Mecella: Automated Generation of Executable RPA Scripts from User Interface
Logs. BPM Blockchain and RPA Forum 2020, Springer, pp. 116-131.
R. Dong, Z. Huang, I. Iong Lam, Y. Chen, X. Wang . WebRobot: Web Robotic Process Automation using Interactive
Programming-by-Demonstration. PLDI’2022.
References
48. End-to-End Robotic Process Mining
V. Leno, A. Polyvyanyy, M. Dumas, M. La Rosa, & F. M. Maggi (2020). Robotic Process Mining: Vision and Challenges.
Business and Information Systems Engineering, pp. 1-14, Springer.
V. Leno, S. Deviatykh, A. Polyvyanyy, M. La Rosa, M. Dumas, & F. M. Maggi. Robidium: Automated Synthesis of Robotic
Process Automation Scripts from UI logs. In BPM Demonstration Track, 2020, pp. 102-106.
Simone Agostinelli, Marco Lupia, Andrea Marrella, Massimo Mecella: SmartRPA: A Tool to Reactively Synthesize
Software Robots from User Interface Logs. CAiSE Forum 2021, Springer, pp. 137-145.
V. Leno, A. Augusto, M. Dumas, M. La Rosa, F. M. Maggi & A. Polyvyanyy. Discovering executable routine specifications
from user interaction logs. Information Systems 107: 101916, 2022.
References
Editor's notes
Dashboards and maps/BPMN models can be used to identify bottlenecks in the user routines (long waiting times, long actions, rework, distractions etc) as well as identify best user practices
No “process” automation but “task” automation
Not “physical” robots but “software” robots
Correct segments discovered for all artificial logs
For most supervised recording logs LED is less than 0.1
Execution time does not exceed 4 seconds
We search for all inputs that “contributed” to the final value of a modified field
Optimization 1 cannot deal with heterogeneous data (values have different formats).
It also fails to discover transformation when the output values are ambiguous (e.g. two transformation examples have the same output value).