cwh.consulting
Artificial Intelligence in
Real Time Communications
(AI in RTC)
RTC Korea
1 November 2018
A blog for WebRTC developers
webrtcHacks.com
@webrtcHacks
AI & RTC blog
cogint.ai
@cogintai
WebRTC and ML for Developer Event
November 16, 2018 in San Francisco
krankygeek.com
About Me
Chad Hart
Analyst & Product Consultant
https://cwh.consulting
@chadwallacehart
chad@cwh.consulting
AI in RTC Research Study
• Authors
• Chad Hart – cwh.consulting
• Tsahi Levent-Levi - BlogGeek.me
• Methodology
• 40+ 1-on-1 vendor interviews
• ~100 respondent web survey
• Analysis of 126 companies & all major products
• Output: 147-page report
What is AI in RTC?
[Figure: “RTC” + AI illustration. Image source: pixabay.com/en/a-i-ai-anatomy-2729782]
AI in RTC use case categories
speech analytics
voicebots
RTC optimization
computer vision
Image source:
pixabay.com/en/a-i-ai-anatomy-2729782
Speech Analytics
• Call center agent monitoring
• Transcription
• Translation
• Agent coaching
• Customer engagement
Promise:
machine transcription at human levels
Source: Google I/O 2017 keynote
Reality:
transcription quality is often not so great
Machine transcription:
My name is a chat heart of you might be familiar with Dave from a brand or if you are, a web or to see people I've done about five years, I'm or so a of an independent analyst. So I'm mostly do park management strategy type. For a product, marketing.
Actual transcription:
My name is Chad Hart. You might be familiar with me from a brand -- if you are WebRTC people; I've done webrtcHacks now for about five years or so. Outside of webrtcHacks, I have been an independent analyst. I mostly do product management and strategy type work and product marketing.
https://www.nojitter.com/post/240173958/when-speech-analytics-makes-gibberish-useful
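One common way to put a number on "not so great" is word error rate (WER): the word-level edit distance between the machine output and a human reference, divided by the reference length. The sketch below is illustrative only and is not taken from the talk; the function name and the short sample strings are assumptions, and punctuation handling is deliberately left out.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sample: the opening words of the two transcripts, punctuation removed.
wer = word_error_rate("my name is chad hart", "my name is a chat heart")
print(f"WER: {wer:.2f}")
```

A WER of 0 means the two transcripts match word for word; values can exceed 1 when the machine output contains many inserted words.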
Sources of transcription errors called out on the slide:
• Non-standard spelling
• Industry jargon
• Speech disfluencies
• US-English language assumption
Higher-level speech analytics
• Perfect transcription is not needed to provide useful analysis.
• Higher-level speech analytics systems look for patterns in speech.
• These patterns can be matched to business outcomes, such as whether a caller ended up purchasing or gave a good customer satisfaction score.
• There are often meaningful patterns beyond the words that were spoken, like how fast each party was speaking, or how often the agent talked compared to the customer.
• There is also a lot of work going into looking at caller emotion and sentiment.
Source: CallMiner
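Two of the patterns mentioned above, speaking rate and agent-versus-customer talk share, can be computed without an accurate transcript. The following is a minimal sketch that assumes the call has already been split into speaker-labelled segments with timestamps; the `Segment` type, its field names, and the sample data are hypothetical and do not come from CallMiner or any specific product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str   # e.g. "agent" or "customer" (hypothetical labels)
    start: float   # seconds from the start of the call
    end: float     # seconds from the start of the call
    text: str      # machine transcript for this segment


def speech_metrics(segments):
    """Return talk share and words per minute for each speaker."""
    talk_time = defaultdict(float)
    word_count = defaultdict(int)
    for seg in segments:
        talk_time[seg.speaker] += seg.end - seg.start
        word_count[seg.speaker] += len(seg.text.split())

    total = sum(talk_time.values())
    metrics = {}
    for speaker, seconds in talk_time.items():
        metrics[speaker] = {
            "talk_share": seconds / total if total else 0.0,
            "words_per_minute": word_count[speaker] / (seconds / 60) if seconds else 0.0,
        }
    return metrics


# Hypothetical call: the agent speaks for 30 s, the customer for 10 s.
call = [
    Segment("agent", 0.0, 30.0, " ".join(["word"] * 60)),
    Segment("customer", 30.0, 40.0, " ".join(["word"] * 10)),
]
print(speech_metrics(call))
```

Because only word counts and timings are used, a misrecognized word still counts as a word, which is one reason these metrics tolerate imperfect transcription.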
Voicebots – Smart Speakers & Assistants
• IVR replacement
• Starting meetings
• In-call assistance
• Another area we examined was voicebots.
• These include smart speakers like the Google Home, which was recently made available in South Korea, and AI assistants like Bixby or Siri.
• Building a voicebot is complex. You not only need to transcribe the speech and run natural language understanding on it, as in speech analytics, but you also need to generate speech and handle interactivity with the customer in real time.
• There is very broad interest in using these voicebots.
• Every telephony device maker is interested in adding a voice user interface to their products, and this is a natural fit since people already “talk” to these devices.
• Typical conference room equipment is already set up with microphone arrays to capture good-quality audio with minimal noise from a variety of locations throughout the room.
• However, most companies are just starting to figure out how to use them in their products.
Flattening the IVR:
humans don’t speak in menus
https://cogint.ai/dialogflow-phone-bot/
[Diagram: over time, a traditional IVR menu steps the caller through nested Menu and DTMF layers before reaching a Response, while a voicebot maps a single Utterance directly to one of many Intent and Response pairs.]
10 potential responses in an IVR menu hierarchy vs. a voicebot
• One major area where voicebots will have an impact is IVRs.
• Traditional IVRs were designed for DTMF input and are usually set up with multiple levels of menus.
• Because people cannot remember more than a few menu options at a time, you cannot put too many options in each menu.
• As a result, fitting many options requires a complex menu with many layers.
• Users hate this because these menus are difficult to navigate and take too long.
• Voicebots help to flatten the IVR into just a few layers.
• Rather than navigating a complex menu, users can just say what they want and use natural language to get the information they need.
• This is good for call centers too, because users are more likely to stay in the IVR instead of immediately dropping out to an operator.
https://cogint.ai/dialogflow-phone-bot/
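The contrast between a layered DTMF menu and a flat intent lookup can be sketched in a few lines. Everything here is hypothetical: the menu layout, the intent names, and the keyword sets are invented for illustration, and the keyword matcher is only a stand-in for a real natural language understanding service such as the Dialogflow bot described in the linked post.

```python
# Traditional IVR: the caller walks a tree one DTMF digit at a time.
IVR_MENU = {
    "1": {  # billing submenu
        "1": "check_balance",
        "2": "pay_bill",
    },
    "2": {  # support submenu
        "1": "reset_password",
        "2": "talk_to_agent",
    },
}


def navigate_menu(menu, digits):
    """Follow a sequence of DTMF digits; return the response reached, or None."""
    node = menu
    for digit in digits:
        if not isinstance(node, dict) or digit not in node:
            return None
        node = node[digit]
    return node if isinstance(node, str) else None


# Voicebot: one utterance maps straight to an intent, with no layers to walk.
INTENT_KEYWORDS = {
    "check_balance": {"balance", "owe"},
    "pay_bill": {"pay", "payment"},
    "reset_password": {"password", "reset"},
    "talk_to_agent": {"agent", "human", "operator"},
}


def match_intent(utterance):
    """Pick the intent with the most keyword hits (toy stand-in for real NLU)."""
    words = set(utterance.lower().split())
    best_intent, best_hits = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_intent, best_hits = intent, hits
    return best_intent


print(navigate_menu(IVR_MENU, "12"))                # two menu layers, two key presses
print(match_intent("I would like to pay my bill"))  # one utterance, one step
```

Adding an eleventh response to the tree means finding room in some menu layer; adding it to the flat map is one more entry, which is the flattening effect described above.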
New voicebots: consumer ⇨ business
[Chart: Notable consumer voicebot market milestones. Source: Kranky Geek Research, krankygeek.com/research]
New voicebot technology threatens IVRs
[Chart: ability to offload human tasks (y-axis) over time (x-axis), with “today” marked]
Computer vision
• Funny hats
• Face detection
• Gestures
• Object detection
• Emotion analysis
Object detection over WebRTC with TensorFlow
Blog post:
https://webrtchacks.com/webrtc-cv-tensorflow/
Demo video: https://youtu.be/vzTXW0hGINM
• Using open source libraries and existing work, it is relatively simple to set up your own server and process real-time video, even without a PhD in computer vision.
• Here is an example of a server I set up to do real-time analysis of a WebRTC stream.
Object detection over WebRTC with TensorFlow – example architecture
https://webrtchacks.com/webrtc-cv-tensorflow/
[Diagram: the browser (index.html, local.js, objDetect.js) GETs web assets from a Flask server and sends it a POST with each image; the server runs TensorFlow Object Detection and returns the object details.]
• This is just a very basic example that uses an HTTP POST to send several images per second to a cloud-based server for processing.
• As you saw in the video, there can be a little bit of lag.
• Using a GPU-accelerated server, or even something like Google’s TPUs, which were specifically designed to accelerate heavy machine learning graphs, would have helped.
• But ultimately, streaming high-quality images to a server will always have its limits.
• Wouldn’t it be nice if you could do the heavy processing locally with hardware acceleration, just like you can hardware accelerate codecs like H.264?
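The server side of the POST-an-image pattern can be sketched roughly as follows. This is not the code from the blog post: to stay dependency-free it swaps the Flask server for Python's standard-library `http.server`, and `detect_objects` is a hypothetical stub standing in for the TensorFlow Object Detection call. The response fields are likewise assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def detect_objects(image_bytes: bytes) -> list:
    """Hypothetical stand-in for a TensorFlow Object Detection call.

    A real detector would decode the image and run a model here.
    """
    return [{"name": "placeholder", "score": 0.0, "bytes_received": len(image_bytes)}]


class ImageHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the raw image bytes the browser posted.
        length = int(self.headers.get("Content-Length", 0))
        image_bytes = self.rfile.read(length)

        # Return the detected objects as JSON.
        body = json.dumps(detect_objects(image_bytes)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet; each frame would otherwise log a line


def make_server(port: int = 0) -> HTTPServer:
    """Create the server; port 0 asks the OS for any free port."""
    return HTTPServer(("127.0.0.1", port), ImageHandler)


# To run it: make_server(5000).serve_forever()
```

Every frame pays for an upload, an inference, and a download before the browser can draw anything, which is where the lag mentioned above comes from.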
ML processing moving to the edge,
with faster, local processing
• That’s exactly what you can do with some new chipsets from vendors like Intel.
• This is an example of a kit from Google called the AIY Vision Kit that includes the Intel Movidius processor.
• The Movidius is designed to run deep neural networks locally and is especially well-suited to low-power computer vision applications.
• This kit runs on a tiny, single-core Raspberry Pi Zero with only 512MB of RAM.
• Google used to sell just the Vision Bonnet add-on, the part that carries the chip, for $45. Now you can buy the complete kit with the Raspberry Pi for $90 in the US.
• Note that Amazon also has a computer vision kit it calls DeepLens. That runs on something more like an Intel NUC mini-PC and costs $250.
https://webrtchacks.com/aiy-vision-kit-uv4l-web-server/
Improvements with edge hardware (demonstration)
• Let’s look at this in action.
• This all runs locally on the Pi.
• So in this case, I am doing the computer vision processing locally while sending the stream and annotations to the remote side.
Blog post:
https://webrtchacks.com/aiy-vision-kit-uv4l-web-server
Video:
https://youtu.be/h0O18R1rI9U
Fun use cases with native mobile libraries
• With new native mobile libraries like Apple’s Core ML and Google’s ML Kit, adding computer vision to a mobile app is relatively simple.
• Some of the engineers at Houseparty wrote a blog post demonstrating how to do smile detection.
• Similar libraries are available that detect facial boundaries and let you put hats, sunglasses, beards, and other silly masks on people. I am sure you have seen some of these!
• Similar techniques can be used in a business context to blur out backgrounds for remote workers who call into a video conference.
https://webrtchacks.com/ml-kit-smile-detection/
ML Kit CPU consumption: high frame rates are not practical (without special hardware)
[Chart: CPU usage (%) for different frame rates processed by ML Kit]
https://webrtchacks.com/ml-kit-smile-detection/
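A common way to keep CPU usage in check is to run the detector on only a subset of the captured frames. The sketch below shows that idea in a language-neutral way; it is not ML Kit code (ML Kit is a native mobile library), and the function name and the frame rates used are assumptions for illustration.

```python
def select_frames(frame_indices, capture_fps: float, target_fps: float) -> list:
    """Return the frame indices to pass to the detector.

    Every Nth frame is kept, where N is the capture rate divided by the
    target processing rate, rounded, and never less than 1.
    """
    step = max(1, round(capture_fps / target_fps))
    return [i for i in frame_indices if i % step == 0]


# Hypothetical numbers: one second of 30 fps video, detector limited to 5 fps.
print(select_frames(range(30), capture_fps=30, target_fps=5))
```

The video itself still flows at the full capture rate; only the ML step is thinned out, so results such as a smile flag update a few times per second instead of on every frame.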
Resource consumption
MLKit is small compared to WebRTC
https://webrtchacks.com/ml-kit-smile-detection/
WebRTC CV is coming to the browser
https://w3c.github.io/webrtc-nv-use-cases/#funnyhats
This is from a W3C document examining use cases for the next version of WebRTC.
RTC optimization
• Noise suppression
• Echo cancellation
• Error correction
• Route optimization
Mozilla RNNoise – real time, low-power noise suppression with
deep learning
• One example is a research project from Mozilla that uses deep learning to provide better real-time noise suppression.
• It is designed for low-power devices and does not require any specialized hardware.
• We do not have time now, but you can go to the link below and try some demos.
• Unfortunately, this was just a research project, but it gives you some idea of what could be done in this and other areas.
https://people.xiph.org/~jm/demo/rnnoise/
Special discount for RTC Korea
Use code RTC-KOREA until November 7 for $1000.00 off
Purchase at krankygeek.com/research or email me
cwh.consulting
Questions?

 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

AI in RTC - RTC Korea 2018

  • 1. cwh.consulting Artificial Intelligence in Real Time Communications (AI in RTC) RTC Korea 1 November 2018
  • 2. cwh.consulting A blog for WebRTC developers webrtcHacks.com @webrtcHacks AI & RTC blog cogint.ai @cogintai WebRTC and ML for Developer Event November 16, 2018 in San Francisco krankygeek.com About Me Chad Hart Analyst & Product Consultant https://cwh.consulting @chadwallacehart chad@cwh.consulting
  • 3. cwh.consulting AI in RTC Research Study • Authors • Chad Hart – cwh.consulting • Tsahi Levent-Levi - BlogGeek.me • Methodology • 40+ 1-on-1 vendor interviews • ~100 respondent web survey • Analysis of 126 companies & all major products • Output: 147-page report
  • 5. cwh.consulting AI in RTC use case categories speech analytics voicebots RTC optimization computer vision Image source: pixabay.com/en/a-i-ai-anatomy-2729782
  • 6. cwh.consulting • Call center agent monitoring • Transcription • Translation • Agent coaching • Customer engagement Speech Analytics
  • 7. cwh.consulting Promise: machine transcription at human levels Source: Google I/O 2017 keynote
  • 8. cwh.consulting Reality: transcription quality is often not so great. Machine Transcription: "My name is a chat heart of you might be familiar with Dave from a brand or if you are, a web or to see people I've done about five years, I'm or so a of an independent analyst. So I'm mostly do park management strategy type. For a product, marketing." Actual Transcription: "My name is Chad Hart. You might be familiar with me from a brand -- if you are WebRTC people; I've done webrtcHacks now for about five years or so. Outside of webrtcHacks, I have been an independent analyst. I mostly do product management and strategy type work and product marketing." https://www.nojitter.com/post/240173958/when-speech-analytics-makes-gibberish-useful
  • 9. cwh.consulting Reality: transcription quality is often not so great. Machine Transcription: "My name is a chat heart of you might be familiar with Dave from a brand or if you are, a web or to see people I've done about five years, I'm or so a of an independent analyst. So I'm mostly do park management strategy type. For a product, marketing." Actual Transcription: "My name is Chad Hart. You might be familiar with me from a brand -- if you are WebRTC people; I've done webrtcHacks now for about five years or so. Outside of webrtcHacks, I have been an independent analyst. I mostly do product management and strategy type work and product marketing." Callouts: non-standard spelling; industry jargon; speech disfluencies; US-English language assumption. https://www.nojitter.com/post/240173958/when-speech-analytics-makes-gibberish-useful
  • 10. cwh.consulting Higher-level speech analytics • Perfect transcription is not needed to provide useful analysis. • Higher-level speech analytics systems look for patterns in speech. • These patterns can be matched to business outcomes, such as did a caller end up purchasing or did they give a good customer satisfaction score. • There are often meaningful patterns beyond the words that were spoken – like how fast each party was speaking, or how often the agent talked compared to the customer. • There is also a lot of work going into looking at caller emotion and sentiment. Source: CallMiner
  • 11. cwh.consulting • IVR replacement • Starting meetings • In-call assistance Voicebots – Smart Speakers & Assistants
  • 12. cwh.consulting Voicebots – Smart Speakers & Assistants • Another area we examined was voicebots. • These are smart speakers like the Google Home, which was recently made available in South Korea, and AI assistants like Bixby or Siri. • Building a voicebot is complex. You not only need to transcribe the speech and run some natural language understanding on it, as in speech analytics, but you also need to generate speech and deal with interactivity with the customer in real time. • There is very broad interest in using these voicebots. • Every telephony device maker is interested in adding a voice user interface to their products – and this is a natural fit since people “talk” to these devices already. • Typical conference room equipment is already set up with microphone arrays to capture good quality audio with minimal noise from a variety of locations throughout the room. • However, most companies are just starting to figure out how to use them in their products.
  • 13. cwh.consulting Flattening the IVR: humans don’t speak in menus https://cogint.ai/dialogflow-phone-bot/ [Diagram: a Traditional IVR Menu (nested levels of Menu, DTMF, and Response) compared with a Voicebot (one Utterance mapped to an Intent and Response), plotted against time.] 10 potential responses in an IVR menu hierarchy vs. a voicebot
  • 14. cwh.consulting Flattening the IVR: humans don’t speak in menus • One major area where voicebots will have an impact is in IVRs. • Traditional IVRs were designed for DTMF input and are usually set up with multiple levels of menus. • Because people cannot remember more than a few menu options at a time, you cannot put too many options in each menu. • As a result, to fit many options, you need to have a complex menu with many layers. • Users hate this because these menus are difficult to navigate and take too long. • Voicebots help to flatten the IVR into just a few layers. • Rather than navigating a complex menu, users can just say what they want and use natural language to get the information they need. • This is good for call centers too because users are more likely to stay in the IVR instead of immediately dropping out to an operator. https://cogint.ai/dialogflow-phone-bot/
  • 15. cwh.consulting New voicebots: consumer ⇨ business [Chart: Notable Consumer Voicebot Market Milestones. Source: Kranky Geek Research, krankygeek.com/research] Notable voicebot milestones
  • 16. cwh.consulting New voicebot technology threatens IVRs [Chart: Ability to offload human tasks vs. Time, with a marker at "today"]
  • 17. cwh.consulting • Funny hats • Face detection • Gestures • Object detection • Emotion analysis Computer vision
  • 18. cwh.consulting Object detection over WebRTC with TensorFlow Blog post: https://webrtchacks.com/webrtc-cv-tensorflow/ Demo video: https://youtu.be/vzTXW0hGINM • Using open source libraries and existing work, it is relatively simple to set up your own server and process real time video, even without a PhD in computer vision. • Here is an example of a server I set up to do real time analysis of a WebRTC stream.
  • 19. cwh.consulting Object detection over WebRTC with TensorFlow – example architecture https://webrtchacks.com/webrtc-cv-tensorflow/ [Diagram: the Browser (index.html, local.js, objDetect.js) GETs web assets from a Flask Server and POSTs images to it; the server runs TensorFlow Object Detection and returns object details.] • This is just a very basic example that uses an HTTP POST to send several images per second to a cloud-based server for processing. • As you saw in the video, there can be a little bit of lag. • Using a GPU-accelerated server, or even something like Google’s TPUs that were specifically designed to accelerate heavy machine learning graphs, would have helped. • But ultimately, streaming high-quality images to a server will always have its limits. • Wouldn’t it be nice if you could do the heavy processing locally with hardware acceleration, just like you can hardware accelerate codecs like H.264?
  • 20. cwh.consulting ML processing moving to the edge, with faster, local processing • That’s exactly what you can do with some new chipsets from vendors like Intel. • This is an example of a kit from Google called the AIY Vision Kit that includes the Intel Movidius processor. • The Movidius is designed to run deep neural networks locally and is especially well-suited to low-power computer vision applications. • This kit runs on a tiny, single-core Raspberry Pi Zero with only 512MB of RAM. • Google used to sell just the vision bonnet add-on part of the chip for $45. Now you can buy the complete kit with the Raspberry Pi for $90 in the US. • Note that Amazon also has a computer vision kit it calls DeepLens. That runs on something more like an Intel NUC mini-PC and costs $250.
  • 21. cwh.consulting ML processing moving to the edge, with faster, local processing https://webrtchacks.com/aiy-vision-kit-uv4l-web-server/
  • 22. cwh.consulting Improvements with edge hardware (demonstration) • Let’s look at this in action. • This all runs locally on the Pi. • So in this case, I am doing the computer vision processing locally while sending the stream and annotations to the remote side. Blog post: https://webrtchacks.com/aiy-vision-kit-uv4l-web-server Video: https://youtu.be/h0O18R1rI9U
  • 23. cwh.consulting Fun use cases with native mobile libraries • With new native mobile libraries like Apple’s CoreML and Google’s ML Kit, running computer vision on the device is relatively simple. • Some of the engineers at Houseparty wrote a blog post demonstrating how to do smile detection. • Similar libraries are available that detect facial boundaries and let you put hats, sunglasses, beards, and other silly masks on people – I am sure you have seen some of these! • Similar techniques can be used in a business context to blur out backgrounds for remote workers who call into a video conference. https://webrtchacks.com/ml-kit-smile-detection/
  • 24. cwh.consulting ML Kit CPU consumption: high framerates are not practical (without special hardware) [Chart: CPU usage (%) for different framerates processed by ML Kit] https://webrtchacks.com/ml-kit-smile-detection/
  • 25. cwh.consulting Resource consumption of ML Kit is small compared to WebRTC https://webrtchacks.com/ml-kit-smile-detection/
  • 26. cwh.consulting WebRTC CV is coming to the browser https://w3c.github.io/webrtc-nv-use-cases/#funnyhats* (This is from a W3C document examining use cases for the next version of WebRTC.)
  • 27. cwh.consulting RTC optimization • Noise suppression • Echo cancellation • Error correction • Route optimization
  • 28. cwh.consulting Mozilla RNNoise – real time, low-power noise suppression with deep learning • One example is a research project from Mozilla that uses Deep Learning to provide better real-time noise suppression. • This is designed for lower power devices and does not require any specialized hardware. • We do not have time now, but you can go to that link and try some demos. • Unfortunately this was just a research project, but it gives you some idea of what could be done in this and other areas. https://people.xiph.org/~jm/demo/rnnoise/
  • 29. cwh.consulting Special discount for RTC Korea: use code RTC-KOREA until November 7 for $1000.00 off. Purchase at krankygeek.com/research or email me.
  • 31. cwh.consulting A blog for WebRTC developers webrtcHacks.com @webrtcHacks AI & RTC blog cogint.ai @cogintai WebRTC and ML for Developer Event November 16, 2018 in San Francisco krankygeek.com About Me Chad Hart Analyst & Product Consultant https://cwh.consulting @chadwallacehart chad@cwh.consulting
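
Slides 13 and 14 contrast a layered DTMF menu with a voicebot that maps one utterance directly to an intent. The difference can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: the menu tree, the intent names, and the responses are invented, and the keyword matcher only stands in for a real natural language understanding engine such as Dialogflow. It is not how any vendor's product works.

```python
import re

# Hypothetical two-level DTMF menu, invented for illustration only.
IVR_MENU = {
    "options": {
        "1": {"options": {
            "1": {"response": "Reading out your balance."},
            "2": {"response": "Connecting you to payments."},
        }},
        "2": {"options": {
            "1": {"response": "Connecting you to internet support."},
            "2": {"response": "Connecting you to phone support."},
        }},
    },
}

# Hypothetical flat intent table: intent name -> (trigger keywords, response).
INTENTS = {
    "check_balance": ({"balance", "owe"}, "Reading out your balance."),
    "pay_bill": ({"pay", "payment"}, "Connecting you to payments."),
    "internet_support": ({"internet", "wifi"}, "Connecting you to internet support."),
    "phone_support": ({"phone", "dial"}, "Connecting you to phone support."),
}


def navigate_ivr(menu, digits):
    """Walk the menu one DTMF digit at a time.

    Returns (response, prompts_heard); response is None if the digits
    do not lead to a leaf of the menu tree.
    """
    node = menu
    prompts = 0
    for digit in digits:
        prompts += 1  # the caller sits through one menu prompt per level
        node = node["options"].get(digit)
        if node is None:
            return None, prompts
        if "response" in node:
            return node["response"], prompts
    return None, prompts


def match_intent(utterance):
    """Toy stand-in for NLU: pick the intent sharing the most keywords."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    best, best_score = None, 0
    for name, (keywords, response) in INTENTS.items():
        score = len(words & keywords)
        if score > best_score:
            best, best_score = (name, response), score
    return best


if __name__ == "__main__":
    print(navigate_ivr(IVR_MENU, "12"))           # two menu levels deep
    print(match_intent("I want to pay my bill"))  # one utterance, no menus
```

In the menu version the number of prompts a caller hears grows with the depth of the tree, while the intent table stays one level deep however many intents are added. That is the flattening the slides describe.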

Editor's notes

  1. As a quick background, my name is Chad Hart. I am an analyst and consultant focused on real time communications products and services. Some of you may be familiar with webrtcHacks, a blog I have run since 2013 that aims to provide useful content for WebRTC developers. I also recently launched a blog to specifically explore topics related to AI, machine learning, and RTC. You can check that out at cogint.ai. Lastly, I also help to run the Kranky Geek series of events with the help of Google and other sponsors like Intel, Nexmo and Agora. We hold an event every year in San Francisco. This year we will also be focusing on AI in RTC topics, with many great talks from companies like Facebook, Microsoft, IBM and many more.
  2. The AI in RTC topic has been a major focus of mine. I recently came off a long-term project where I ran a new product incubator group that launched a speech analytics service inside a telco. I could see that speech analytics and other machine-learning based technologies were starting to intersect with real time communications. To understand this better I teamed up with Tsahi Levent-Levi of BlogGeek.me, another WebRTC analyst many of you know, to write a research report on this topic. We covered more than 125 vendors, ran an industry survey, and had 1-on-1 conversations with 40 vendors.
  3. So what is AI in RTC? I am not talking about science fiction robots making phone calls. I am going to talk about how modern machine learning techniques can be used to improve and expand real time communications.
  4. We saw 4 major categories of use cases: speech analytics, voicebots, computer vision, and using machine learning (ML) to optimize lower-level RTC protocols and networks.
  5. By far the most common use case was speech analytics. There is a broad range of use cases here, from providing transcription on conference calls to providing real time agent coaching based on what the customer is saying in the call center.
  6. Speech transcription, also known as ASR or speech-to-text (STT), has improved a lot over the past couple of years thanks to deep learning techniques. Many vendors now claim they are at human levels of accuracy.
  7. The reality is that transcription still has a number of challenges. The example here shows a transcription where I was introducing myself. As you can see – the machine transcription did not do such a great job.
  8. This specific example is probably worse than average, but not uncommon. The first major challenge is getting languages and dialects correct. I am sure that this is a big struggle for this audience as you deal with STT technologies made outside of Korea. I am lucky that English, and particularly American English, is by far the best supported language. Many vendors also have support for many dialects of English, such as British, Australian, and Indian accents. You will find much more limited support for Korean. I do not think I have seen any major international vendor support specific Korean dialects. Fortunately this is improving and newer algorithms require less training data, so it is becoming easier to build support for new languages. Non-standard spellings and specific industry jargon that does not appear in the dictionary, like “WebRTC”, are also a challenge. Most systems now have techniques that let you specify a custom vocabulary to correct these.
  9. It is also important to note that perfect transcription is not needed to provide useful analysis. Higher-level speech analytics systems look for patterns in speech. These patterns can be matched to business outcomes, such as did a caller end up purchasing or did they give a good customer satisfaction score. There are often meaningful patterns beyond the words that were spoken – like how fast each party was speaking, or how often the agent talked compared to the customer. There is also a lot of work going into looking at caller emotion and sentiment.
  10. Another area we examined was voicebots. These are smart speakers like the Google Home, which was recently made available in South Korea (https://voicebot.ai/2018/09/11/google-home-arriving-in-south-korean-on-september-18-pre-orders-start-today/), and AI assistants like Bixby or Siri. Building a voicebot is complex. You not only need to transcribe the speech and run some natural language understanding on it, as in speech analytics, but you also need to generate speech and deal with interactivity with the customer in real time. There is very broad interest in using these voicebots. Every telephony device maker is interested in adding a voice user interface to their products – and this is a natural fit since people “talk” to these devices already. Typical conference room equipment is already set up with microphone arrays to capture good quality audio with minimal noise from a variety of locations throughout the room. However, most companies are just starting to figure out how to use them in their products.
  12. One major area where voicebots will have an impact is in IVRs. Traditional IVRs were designed for DTMF input and are usually set up with multiple levels of menus. Because people cannot remember more than a few menu options at a time, you cannot put too many options in each menu. As a result, to fit many options, you need to have a complex menu with many layers. Users hate this because these menus are difficult to navigate and take too long. Voicebots help to flatten the IVR into just a few layers. Rather than navigating a complex menu, users can just say what they want and use natural language to get the information they need. This is good for call centers too because users are more likely to stay in the IVR instead of immediately dropping out to an operator.
  14. Actually, many advanced IVR systems, like those sold by Nuance, Aspect, and Genesys, already have natural language inputs and responses. One big change here is the growth of the consumer voicebot market. As this technology has matured, these solutions are now being targeted at business telephony use cases, not just consumers. For example, IBM launched a voice gateway option for its Watson assistant. Amazon is integrating its natural language engine, called Lex, into Amazon Connect, its contact center solution. Microsoft’s language processing platform is called LUIS, and it has a bot-builder framework that can use this to integrate into consumer Skype and Skype for Business. Just this summer, Google launched its contact center AI initiative, where it has partnered with many major communications providers and vendors. As part of this solution, Google is looking to penetrate call centers by using Dialogflow, its natural language understanding engine, along with other tools that help agents answer questions more quickly.
  15. Existing IVR technology that incorporates natural language tends to be very expensive. Big vendors like Amazon, Google, and Microsoft are adapting technologies they built for the much larger consumer market and applying them to business use cases at much lower costs, often with better performance. One of Google’s customers, Marks and Spencer, commented that they were able to save the equivalent of 100 full-time employees using this technology across their call center.
  16. The last area I would like to discuss is computer vision. This domain already has a lot of usage in consumer applications and is just starting to find some business use cases. There are many application areas, including counting people, identifying faces, using gestures for controls, and even augmented reality.
  17. Using open source libraries and existing work, it is relatively simple to set up your own server and process real time video, even without a PhD in computer vision. Here is an example of a server I set up to do real time analysis of a WebRTC stream.
  18. This is just a very basic example that uses an HTTP POST to send several images per second to a cloud-based server for processing. As you saw in the video, there can be a little bit of lag. Using a GPU-accelerated server, or even something like Google’s TPUs that were specifically designed to accelerate heavy machine learning graphs, would have helped. But ultimately, streaming high-quality images to a server will always have its limits. Wouldn’t it be nice if you could do the heavy processing locally with hardware acceleration, just like you can hardware accelerate codecs like H.264?
  19. That’s exactly what you can do with some new chipsets from vendors like Intel. This is an example of a kit from Google called the AIY Vision Kit that includes the Intel Movidius processor. The Movidius is designed to run deep neural networks locally and is especially well-suited to low-power computer vision applications. This kit runs on a tiny, single-core Raspberry Pi Zero with only 512MB of RAM. Google used to sell just the vision bonnet add-on part of the chip for $45. Now you can buy the complete kit with the Raspberry Pi for $90 in the US. Note that Amazon also has a computer vision kit it calls DeepLens. That runs on something more like an Intel NUC mini-PC and costs $250.
  21. Let’s look at this in action. This all runs locally on the Pi. So in this case, I am doing the computer vision processing locally while sending the stream and annotations to the remote side.
  22. With new native mobile libraries like Apple’s CoreML and Google’s ML Kit, running computer vision on the device is relatively simple. Some of the engineers at Houseparty wrote a blog post demonstrating how to do smile detection. Similar libraries are available that detect facial boundaries and let you put hats, sunglasses, beards, and other silly masks on people – I am sure you have seen some of these! Similar techniques can be used in a business context to blur out backgrounds for remote workers who call into a video conference.
  23. The last area is RTC optimization. There are many opportunities to use machine learning to improve bandwidth estimation and echo cancellation, and to perform better error correction. We were very surprised that there has been relatively little investment made here.
  24. One example is a research project from Mozilla that uses Deep Learning to provide better real-time noise suppression. This is designed for lower power devices and does not require any specialized hardware. We do not have time now, but you can go to that link and try some demos. It is pretty neat. Unfortunately this was just a research project, but it gives you some idea of what could be done in this and other areas.
  25. Before I take questions, I did want to mention we have a special discount code for RTC Korea attendees. If you are interested in seeing our full 147-page report, you can use that code for a big discount.
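
Note 9 mentions patterns beyond the spoken words, such as how fast each party speaks and how much the agent talks compared to the customer. The sketch below shows how such measures might be derived from a diarized transcript. The segment format (speaker, start, end, text) and the sample call are assumptions made up for this example, not the output of any particular speech analytics product.

```python
from collections import defaultdict

# Hypothetical diarized transcript: (speaker, start_seconds, end_seconds, text).
SEGMENTS = [
    ("agent", 0.0, 6.0, "Thanks for calling, how can I help you today"),
    ("customer", 6.0, 9.0, "My internet keeps dropping"),
    ("agent", 9.0, 21.0,
     "I am sorry to hear that let me run a quick line test for you right now"),
    ("customer", 21.0, 24.0, "Okay thank you"),
]


def call_metrics(segments):
    """Return each speaker's share of talk time and speaking rate."""
    talk_time = defaultdict(float)
    word_count = defaultdict(int)
    for speaker, start, end, text in segments:
        if end <= start:
            continue  # ignore empty or malformed segments
        talk_time[speaker] += end - start
        word_count[speaker] += len(text.split())
    total = sum(talk_time.values())
    return {
        speaker: {
            "talk_share": talk_time[speaker] / total,
            "words_per_minute": word_count[speaker] / talk_time[speaker] * 60,
        }
        for speaker in talk_time
    }


if __name__ == "__main__":
    for speaker, metrics in call_metrics(SEGMENTS).items():
        print(speaker, metrics)
```

Neither measure depends on every word being transcribed correctly, which is why this kind of analysis can still be useful when the transcript itself is poor. Matching the measures to business outcomes, as the note describes, would be a separate modelling step that this sketch does not cover.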