How to "Effectively" "Test" Your Chatbot
Soumya Mukherjee
Director QA, DevOps & AIML
Apty.IO
How are we doing our QA today?
• The bot is a black box for testers
• Testing in most organizations is largely manual, covering:
  • Conversational flow
  • Small talk
  • Fallback checks
  • Integrations
• Automation is limited to the UI and API layers
• Testing mostly reuses the same data the model was trained on
• Models are trained by engineers and are not monitored by QA
• Analytics tools for monitoring exist, but QA needs technical expertise to use them
• Result: the bot breaks more than 90% of the time, and no one can predict when. Most failures end in fallback, and once the bot is stuck it stays stuck
What are the issues in QA?
• Bots keep evolving, so continuously creating stories is a problem
• No tool manages story coverage
• Training data may not correspond to new stories, or vice versa (a mismatch); most organizations keep training on the same data
• Most automation tools offer only record and playback (my stories are already written; the question is how to port them)
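The mismatch between stories and training data can be checked mechanically. The sketch below is only an illustration: it assumes stories and training examples have already been loaded into plain dictionaries, and the function name `coverage_gaps` and the sample intents are hypothetical, not part of any tool named in the talk.

```python
def coverage_gaps(story_intents, training_examples):
    """Compare intents used in stories with intents that have training data."""
    # Every intent that appears in at least one story.
    in_stories = {intent for story in story_intents.values() for intent in story}
    # Every intent that has at least one training example.
    in_training = {name for name, examples in training_examples.items() if examples}
    return {
        "untrained": sorted(in_stories - in_training),  # in a story, no examples
        "unused": sorted(in_training - in_stories),     # has examples, no story
    }


# Hypothetical data: story name -> ordered intents, intent -> example utterances.
stories = {
    "pay_fee": ["greet", "ask_fee", "confirm"],
    "new_refund_flow": ["greet", "ask_refund"],
}
training = {
    "greet": ["hi", "hello"],
    "ask_fee": ["what is the bus fee"],
    "confirm": ["yes"],
    "goodbye": ["bye"],
}

gaps = coverage_gaps(stories, training)
print(gaps)  # {'untrained': ['ask_refund'], 'unused': ['goodbye']}
```

Running a check like this whenever a story is added would flag a new story whose intents were never trained before the bot reaches testers.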
What are the issues in QA? (continued)
• No unified, centralized dashboard where QA can check the following (everything is scattered):
  • Intent matching
  • Entity testing: slot identification
  • Entity testing: entity validation
  • Confidence score
  • Confusion matrix along with precision, recall and F1-score
• No easy way to reset a failed bot
• Bot versioning is a mess, which makes A/B testing difficult
• Multilingual bot QA is a challenge (you have to make 2 separate bots)
• A high confidence score can also be a problem: if the same data appears under multiple intents, the bot always predicts the intent with the highest confidence score, which may be incorrect
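The last point, the same data appearing under several intents, can be caught before training. This is a minimal sketch under stated assumptions: training data is a dictionary of intent name to example utterances, the classifier returns a ranked list of (intent, confidence) pairs, and the names `shared_examples` and `flag_ambiguous` and both thresholds are illustrative choices rather than values from the talk.

```python
from collections import defaultdict


def shared_examples(training_examples):
    """Return utterances that appear under more than one intent."""
    seen = defaultdict(set)
    for intent, examples in training_examples.items():
        for text in examples:
            # Normalise so that case and stray spaces do not hide a clash.
            seen[text.strip().lower()].add(intent)
    return {text: sorted(names) for text, names in seen.items() if len(names) > 1}


def flag_ambiguous(ranking, min_confidence=0.6, min_margin=0.15):
    """Return a reason string if a prediction looks unreliable, else None."""
    ordered = sorted(ranking, key=lambda item: item[1], reverse=True)
    top_name, top_score = ordered[0]
    if top_score < min_confidence:
        return f"low confidence for {top_name}"
    if len(ordered) > 1 and top_score - ordered[1][1] < min_margin:
        return f"{top_name} too close to {ordered[1][0]}"
    return None


training = {
    "bus_fee": ["How much is the fee?", "bus fee"],
    "tuition_fee": ["how much is the fee?", "tuition fee"],
}
clashes = shared_examples(training)
print(clashes)  # {'how much is the fee?': ['bus_fee', 'tuition_fee']}

print(flag_ambiguous([("bus_fee", 0.71), ("tuition_fee", 0.68)]))
print(flag_ambiguous([("greet", 0.97), ("goodbye", 0.01)]))
```

The margin check complements the duplicate check: a top score that barely beats the runner-up is a hint that two intents overlap even when no utterance is literally shared.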
How to make sure your bot never breaks?
How to make your tests effective?
• Create scenarios for the happy path, contextual questions, digressions, domain-specific questions and stateless conversations
• Map the proper entities for common scenarios (for example, bus fee vs. tuition fee); the flow should change with the entities in the stories
• Automated tests should consume all stories and run them every time as part of regression testing
• Visualize story coverage
• For manual testing, use a bot emulation product (such as Rasa X or Botfront)
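A regression run that consumes every story could look roughly like the sketch below. It does not use any real framework's story format or API: stories are plain dictionaries of (user text, expected bot action) turns, and `toy_bot` is a hypothetical stand-in for a call to the deployed bot. Each story starts with a fresh state, so one stuck conversation cannot contaminate the next.

```python
def run_story(bot, story):
    """Replay one story and return (turn_index, expected, actual) mismatches."""
    failures = []
    state = {}  # fresh conversation state per story, i.e. the bot is reset
    for index, (user_text, expected_action) in enumerate(story["turns"]):
        actual = bot(user_text, state)
        if actual != expected_action:
            failures.append((index, expected_action, actual))
    return failures


def run_regression(bot, stories):
    """Run all stories and report only the ones that failed."""
    report = {}
    for story in stories:
        failures = run_story(bot, story)
        if failures:
            report[story["name"]] = failures
    return report


def toy_bot(user_text, state):
    """Hypothetical rule-based bot standing in for the system under test."""
    text = user_text.lower()
    if "fee" in text:
        state["topic"] = "fee"
        return "utter_ask_fee_type"
    if text in {"bus", "tuition"} and state.get("topic") == "fee":
        return f"utter_{text}_fee"
    return "utter_fallback"


stories = [
    {"name": "happy_path",
     "turns": [("What is the fee?", "utter_ask_fee_type"),
               ("bus", "utter_bus_fee")]},
    {"name": "digression",
     "turns": [("What is the fee?", "utter_ask_fee_type"),
               ("what's the weather", "utter_out_of_scope"),
               ("tuition", "utter_tuition_fee")]},
]

report = run_regression(toy_bot, stories)
print(report)  # {'digression': [(1, 'utter_out_of_scope', 'utter_fallback')]}
```

Here the digression story fails at its second turn because the toy bot falls back instead of handling the out-of-scope question, which is the kind of break the scenario list above is meant to expose.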
How to make your tests effective? (continued)
• Central dashboard including:
  • Confusion matrix, precision, recall and F1-score
  • Cumulative accuracy profile
  • Cross-validation results
• Perform exhaustive testing (bot resiliency) and integration checks across platforms and webhooks
• Perform fault-tolerance testing through performance testing (bot response, session management) and security testing (API interaction, typing speed check, punctuation, typos)
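The dashboard metrics named above can be derived from nothing more than paired lists of expected and predicted intents. The following is a plain-Python sketch; the function names and the sample labels are illustrative assumptions, and a real pipeline would more likely take these numbers from its NLU evaluation tooling.

```python
from collections import Counter


def confusion_counts(expected, predicted):
    """Count how often each (expected, predicted) intent pair occurs."""
    return Counter(zip(expected, predicted))


def per_intent_scores(expected, predicted):
    """Compute precision, recall and F1 for every intent label."""
    matrix = confusion_counts(expected, predicted)
    labels = sorted(set(expected) | set(predicted))
    scores = {}
    for label in labels:
        tp = matrix[(label, label)]
        # Predicted as this label but expected something else.
        fp = sum(matrix[(other, label)] for other in labels if other != label)
        # Expected this label but predicted something else.
        fn = sum(matrix[(label, other)] for other in labels if other != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


expected = ["bus_fee", "bus_fee", "tuition_fee", "tuition_fee", "greet"]
predicted = ["bus_fee", "tuition_fee", "tuition_fee", "tuition_fee", "greet"]

scores = per_intent_scores(expected, predicted)
for label, values in scores.items():
    print(label, {name: round(value, 2) for name, value in values.items()})
```

In this sample one bus fee question is misread as a tuition fee question, so `bus_fee` keeps perfect precision but loses recall, while `tuition_fee` keeps perfect recall but loses precision.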
Other KPIs to track
• Activity Volume
• Bounce rate
• Retention rate
• Open sessions count
• Session times (conversation length)
• Goal completion rate
• User feedback (sentiment)
• Fallback rate (confusion rate, reset rate and human takeover rate)
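Several of these KPIs can be computed from per-session records. The sketch below assumes a hypothetical log format (one dictionary per session with the fields shown) and assumed definitions, for example a bounce is a session with at most one user turn; neither the field names nor the definitions come from the talk.

```python
def kpi_summary(sessions):
    """Aggregate a few conversation KPIs from per-session records."""
    if not sessions:
        raise ValueError("at least one session is required")
    total = len(sessions)
    turns = sum(s["turns"] for s in sessions)
    fallbacks = sum(s["fallbacks"] for s in sessions)
    return {
        # Share of sessions in which the user reached their goal.
        "goal_completion_rate": sum(s["goal_completed"] for s in sessions) / total,
        # Assumed definition: a session with at most one user turn is a bounce.
        "bounce_rate": sum(s["turns"] <= 1 for s in sessions) / total,
        # Fallback responses as a share of all user turns.
        "fallback_rate": fallbacks / turns if turns else 0.0,
        # Share of sessions escalated to a human agent.
        "human_takeover_rate": sum(s["handed_to_human"] for s in sessions) / total,
        # Conversation length measured in user turns.
        "avg_turns": turns / total,
    }


# Hypothetical session log.
sessions = [
    {"turns": 6, "fallbacks": 1, "goal_completed": True, "handed_to_human": False},
    {"turns": 1, "fallbacks": 0, "goal_completed": False, "handed_to_human": False},
    {"turns": 8, "fallbacks": 3, "goal_completed": False, "handed_to_human": True},
    {"turns": 5, "fallbacks": 0, "goal_completed": True, "handed_to_human": False},
]

summary = kpi_summary(sessions)
print(summary)
```

For this sample the summary reports a goal completion rate of 0.5, a bounce rate of 0.25, a fallback rate of 0.2, a human takeover rate of 0.25 and an average of 5 turns per session.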
Thanks
@QASoumya
Linkedin.com/in/mukherjeesoumya

🐬 The future of MySQL is Postgres 🐘
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 

How to Effectively Test Your Chatbot | Rasa Summit

  • 1. How to “Effectively” “Test” Your Chatbot. Soumya Mukherjee, Director QA, DevOps & AIML, Apty.IO
  • 2. How are we doing our QA today?
      • Testing is a black box for testers
      • Most testing in organizations is manual:
          • Conversational flow testing
          • Small talk
          • Fallback checks
          • Integrations
      • Automation is done on the UI and API layers
      • Testing is mostly done on the same data used for training
      • Models are trained by engineers and are not monitored by QA
      • Analytics tools for monitoring exist, but QA needs technical expertise to use them
      • Result: the bot breaks more than 90% of the time (no one understands when it will break); most failures hit the fallback and get stuck, and once the bot is stuck it stays stuck
  • 3. What are the issues in QA?
      • Bots keep evolving, so continuous story creation is a problem
      • No tool manages story coverage
      • Training data may not correspond to new stories, or vice versa (a mismatch); most organizations keep training on the same data
      • Most automation tools offer record and playback (my stories are already written; the question is how to port them)
  • 4. What are the issues in QA?
      • No unified, centralized dashboard where QA can check the following (everything is scattered):
          • Intent matching
          • Entity testing: slot identification
          • Entity testing: entity validation
          • Confidence score
          • Confusion matrix along with precision, recall, and F1-score
      • No easy way to reset a failed bot
      • Bot versioning is a mess, which makes A/B testing difficult
      • Multilingual bot QA is a challenge (you have to make 2 separate bots)
      • A high confidence score is also a problem, because the bot will keep predicting the same thing (if multiple intents share the same data, it predicts the one with the highest confidence score, which may be incorrect)
      • How to make sure your bot never breaks?
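The confusion matrix and precision/recall/F1 items above can be computed from nothing more than paired lists of expected and predicted intents. The following is a minimal sketch, not the speaker's tooling: the function name `intent_report` and the sample intent labels are hypothetical, and it assumes the predictions have already been collected from the bot.

```python
from collections import Counter


def intent_report(expected, predicted):
    """Return (confusion, metrics) for paired lists of intent labels.

    confusion maps (expected_label, predicted_label) -> count.
    metrics maps label -> {"precision", "recall", "f1"}.
    """
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")

    confusion = Counter(zip(expected, predicted))
    labels = sorted(set(expected) | set(predicted))
    metrics = {}
    for label in labels:
        tp = confusion[(label, label)]
        # Predicted as this label, but another label was expected.
        fp = sum(confusion[(other, label)] for other in labels if other != label)
        # This label was expected, but another label was predicted.
        fn = sum(confusion[(label, other)] for other in labels if other != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return confusion, metrics


# Hypothetical test run: one "bus_fee" utterance was misread as "tuition_fee".
expected = ["bus_fee", "bus_fee", "tuition_fee", "tuition_fee", "greet"]
predicted = ["bus_fee", "tuition_fee", "tuition_fee", "tuition_fee", "greet"]
confusion, metrics = intent_report(expected, predicted)

for label, scores in metrics.items():
    print(label, {name: round(value, 2) for name, value in scores.items()})
```

Running this on held-out utterances rather than the training data is what makes the numbers meaningful; on the training data they will look better than the bot really is.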
  • 5. How to make your test effective?
      • Create scenarios for the happy path, contextual questions, digressions, domain-specific questions, and stateless conversations
      • Map proper entities for common scenarios (for example, bus fee, tuition fee); the flow should change with the entities in the stories
      • Automated tests should consume all stories and run them every time as part of regression testing
      • Story coverage visualization
      • For manual testing, use a bot emulation product (like Rasa X or Botfront)
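Story coverage can be approximated by comparing the intents the bot was trained on with the intents the test stories actually exercise. The sketch below is an assumption-laden illustration: `story_coverage`, the intent names, and the plain-dict story representation are all hypothetical, and it does not parse any real Rasa file format.

```python
def story_coverage(known_intents, stories):
    """Compare trained intents against the intents used in test stories.

    known_intents: iterable of intent names present in the training data.
    stories: mapping of story name -> ordered list of intent names.
    """
    known = set(known_intents)
    used = {intent for steps in stories.values() for intent in steps}
    covered = known & used
    return {
        "coverage": len(covered) / len(known) if known else 0.0,
        # Trained intents that no story exercises.
        "uncovered_intents": sorted(known - used),
        # Intents referenced by stories but missing from the training data.
        "unknown_intents": sorted(used - known),
    }


# Hypothetical data for illustration only.
known_intents = ["greet", "ask_bus_fee", "ask_tuition_fee", "goodbye"]
stories = {
    "happy_path": ["greet", "ask_bus_fee", "goodbye"],
    "digression": ["greet", "ask_refund"],
}
report = story_coverage(known_intents, stories)
print(report)
```

The two lists in the report correspond to the two directions of the training data versus story mismatch: intents with no story, and stories that rely on intents the model never saw.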
  • 6. How to make your test effective?
      • Central dashboarding, including:
          • Confusion matrix, precision, recall, and F1-score
          • Cumulative accuracy profile
          • Cross-validation results
      • Perform exhaustive testing (bot resiliency) and integration checks across platforms and webhooks
      • Perform fault tolerance testing through performance testing (bot response, session management) and security testing (API interaction, typing speed check, punctuation, typos)
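One way to probe the punctuation and typo cases listed above is to generate perturbed copies of each test utterance and check whether the predicted intent stays the same. This is a rough sketch under stated assumptions: `typo_variants`, `unstable_variants`, and the keyword-based `toy_classify` are hypothetical stand-ins, and in practice `classify` would call the real bot.

```python
import random
import string


def typo_variants(text, count=3, seed=0):
    """Return up to `count` perturbed copies of `text` (for count >= 1).

    Perturbations: punctuation removed, two adjacent characters swapped,
    or one character dropped. A fixed seed keeps test runs repeatable.
    """
    rng = random.Random(seed)
    variants = []
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    if no_punct != text:
        variants.append(no_punct)
    if len(text) < 2:
        return variants
    for _ in range(count * 10):  # bounded number of attempts
        if len(variants) >= count:
            break
        i = rng.randrange(len(text) - 1)
        if rng.random() < 0.5:
            # Swap the characters at positions i and i + 1.
            candidate = text[:i] + text[i + 1] + text[i] + text[i + 2:]
        else:
            # Drop the character at position i.
            candidate = text[:i] + text[i + 1:]
        if candidate != text and candidate not in variants:
            variants.append(candidate)
    return variants


def unstable_variants(classify, text, variants):
    """Return the variants whose predicted intent differs from the original's."""
    baseline = classify(text)
    return [variant for variant in variants if classify(variant) != baseline]


def toy_classify(utterance):
    """Stand-in classifier for illustration; replace with a call to the bot."""
    return "ask_bus_fee" if "bus" in utterance.lower() else "fallback"


text = "What is the bus fee?"
variants = typo_variants(text, count=3, seed=1)
unstable = unstable_variants(toy_classify, text, variants)
print(variants)
print("Variants that changed the intent:", unstable)
```

Any variant reported as unstable is a candidate for new training data or for a regression test.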
  • 7. Other KPIs to track
      • Activity volume
      • Bounce rate
      • Retention rate
      • Open sessions count
      • Session times (conversation length)
      • Goal completion rate
      • User feedback (sentiments)
      • Fallback rate (confusion rate, reset rate, and human takeover rate)
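Several of these KPIs can be derived from conversation logs. The sketch below assumes a hypothetical log schema (each session has an `intents` list plus `goal_completed` and `handed_to_human` flags) and one possible definition of each rate; the slides do not define them, so adjust the formulas to match your own analytics.

```python
def conversation_kpis(sessions, fallback_intent="fallback"):
    """Compute a few KPIs from session logs.

    Each session is a dict with (assumed schema):
      "intents": list of predicted intents, one per user turn
      "goal_completed": bool
      "handed_to_human": bool
    """
    session_count = len(sessions)
    total_turns = sum(len(s["intents"]) for s in sessions)
    fallback_turns = sum(s["intents"].count(fallback_intent) for s in sessions)
    completed = sum(1 for s in sessions if s["goal_completed"])
    handed_over = sum(1 for s in sessions if s["handed_to_human"])
    return {
        # Share of user turns that ended in the fallback intent.
        "fallback_rate": fallback_turns / total_turns if total_turns else 0.0,
        "goal_completion_rate": completed / session_count if session_count else 0.0,
        "human_takeover_rate": handed_over / session_count if session_count else 0.0,
        # Conversation length measured in user turns.
        "avg_turns_per_session": total_turns / session_count if session_count else 0.0,
    }


# Hypothetical logs for illustration only.
sessions = [
    {"intents": ["greet", "ask_bus_fee", "goodbye"],
     "goal_completed": True, "handed_to_human": False},
    {"intents": ["greet", "fallback", "fallback"],
     "goal_completed": False, "handed_to_human": True},
    {"intents": ["ask_tuition_fee", "fallback"],
     "goal_completed": True, "handed_to_human": False},
]
kpis = conversation_kpis(sessions)
print(kpis)
```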