M3Conference

Developing with Speech and Voice Recognition in Mobile Apps
Nick Landry, MVP
App Artisan
Nokia Developer Champion & Ambassador
activenick@mobility42.com
@ActiveNick – www.mobility42.com

talk2me
Who is ActiveNick?
• App Artisan – Mobile Development Consultant – Mobility42
• Microsoft MVP: Windows Phone Development
• Mobile Publisher – Big Bald Apps: http://www.bigbaldapps.com
• Nokia Developer Champion and Ambassador
• Speaker. Blogger. Author. Tweeter. Gamer

• 20+ Years of Professional Experience
• Specialties:
• Mobile Development
• Location Intelligence & Geospatial Systems
• Data Visualization, HPC, Cloud
• Mobile Game Development
• Blog: www.ActiveNick.net

• Twitter: @ActiveNick
Agenda
• Speech on Windows Phone 8
• Speech synthesis
• Controlling applications using speech
• Voice command definition files
• Building conversations
• Selecting application entry points
• Simple speech input
• Speech input and grammars
• Using Grammar Lists
Speech on Windows Phone 8
Windows Phone Speech Support
• Windows Phone 7.x had voice support built into the operating system
• Programs and phone features could be started by voice commands, e.g. “Start MyApp”
• Incoming SMS messages could be read to the user
• The user could compose and send SMS messages
• Windows Phone 8 builds on this to allow applications to make use of speech
• Applications can speak messages using the Speech Synthesis feature
• Applications can be started and given commands
• Applications can accept commands using voice input
• Speech recognition with the built-in dictation and web search grammars requires an internet connection, but Speech Synthesis does not
Speech Synthesis
Enabling Speech Synthesis
• If an application wishes to use speech output, the ID_CAP_SPEECH_RECOGNITION capability must be enabled in WMAppManifest.xml
• The application should also add a using directive for the Synthesis namespace:
using Windows.Phone.Speech.Synthesis;
Simple Speech
async void CheeseLiker()
{
    SpeechSynthesizer synth = new SpeechSynthesizer();
    await synth.SpeakTextAsync("I like cheese.");
}

• The SpeechSynthesizer class provides a simple way to produce speech
• The SpeakTextAsync method speaks the content of the string using the default voice

• SpeakTextAsync is asynchronous, so the method that awaits it must be marked with the async modifier
• Speech output does not require a network connection
Selecting a language
// Query for a voice that speaks French (requires using System.Linq).
var frenchVoice = (from voice in InstalledVoices.All
                   where voice.Language == "fr-FR"
                   select voice).FirstOrDefault();
// Set the voice as identified by the query, if a French voice is installed.
if (frenchVoice != null)
{
    synth.SetVoice(frenchVoice);
}

• The default speaking voice is selected automatically from the locale set for the phone
• The InstalledVoices class provides a list of all the voices available on the phone

• The above code selects a French voice
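If no exact fr-FR voice is installed, the query above finds nothing. A minimal sketch of a fallback policy, written as a pure function over language tags so it can be tested off the phone; the VoicePicker class is hypothetical, and the tags are assumed to come from the Language property of each entry in InstalledVoices.All:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: picks the best installed voice language for a requested tag.
public static class VoicePicker
{
    // Returns an exact match (e.g. "fr-FR"), otherwise the first tag that shares
    // the primary language subtag (e.g. "fr-CA" for "fr-FR"), otherwise null so
    // the caller can keep the default voice chosen from the phone's locale.
    public static string PickVoiceLanguage(IEnumerable<string> installed, string requested)
    {
        var tags = installed.ToList();

        string exact = tags.FirstOrDefault(
            t => string.Equals(t, requested, StringComparison.OrdinalIgnoreCase));
        if (exact != null)
            return exact;

        string primary = requested.Split('-')[0];
        return tags.FirstOrDefault(
            t => string.Equals(t.Split('-')[0], primary, StringComparison.OrdinalIgnoreCase));
    }
}
```

On the phone, the returned tag can be matched back against InstalledVoices.All and passed to SetVoice; a null result means keeping the default voice.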
Demo 1: Speech Synthesis and Voice Selection
talk2me - http://bit.ly/wpt2m
Speech Synthesis Markup Language
<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0"
       xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <p> Your <say-as interpret-as="ordinal">1st</say-as> request was for
      <say-as interpret-as="cardinal">1</say-as> room on
      <say-as interpret-as="date" format="mdy">10/19/2010</say-as>,
      arriving at <say-as interpret-as="time" format="hms12">12:35pm</say-as>.
  </p>
</speak>
• You can use Speech Synthesis Markup Language (SSML) to control the spoken output
• Change the voice, pitch, rate, volume, pronunciation and other characteristics
• Also allows the inclusion of audio files into the spoken output
• You can also use the Speech synthesizer to speak the contents of a file
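The SSML example above is static. When the spoken text comes from user data, building the markup with System.Xml.Linq escapes characters such as & and < automatically. A sketch under that assumption; SsmlBuilder and BuildBooking are hypothetical names, and on the phone the resulting string would be passed to SpeechSynthesizer.SpeakSsmlAsync:

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper that builds SSML for a booking confirmation.
public static class SsmlBuilder
{
    static readonly XNamespace Ssml = "http://www.w3.org/2001/10/synthesis";

    // XElement escapes any special characters in guestName, so arbitrary
    // user-supplied text can be embedded safely.
    public static string BuildBooking(string guestName, int rooms, string date)
    {
        var speak = new XElement(Ssml + "speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", "en-US"),
            new XElement(Ssml + "p",
                $"{guestName}, you requested ",
                new XElement(Ssml + "say-as",
                    new XAttribute("interpret-as", "cardinal"),
                    rooms),
                rooms == 1 ? " room on " : " rooms on ",
                new XElement(Ssml + "say-as",
                    new XAttribute("interpret-as", "date"),
                    new XAttribute("format", "mdy"),
                    date),
                "."));

        return speak.ToString(SaveOptions.DisableFormatting);
    }
}
```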
Controlling Applications using Voice Commands
Application Launching using Voice Commands
• The Voice Command feature of Windows Phone 7 allowed users to start applications
• In Windows Phone 8 the feature has been expanded to allow the user to request data from the application in the start command
• The data allows a particular application page to be selected when the program starts, and can also pass request information to that page
• To start using Voice Commands you must create a Voice Command Definition (VCD) file that defines all the spoken commands
• The application then calls a method to register the words and phrases the first time it is run
The Fortune Teller Program
• The Fortune Teller program will tell your future
• You can ask it questions and it will display replies
• It could also speak them
• Some of the spoken commands activate different pages of the application and others are processed by the application when it starts running
The Voice Command Definition (VCD) file
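<?xml version="1.0" encoding="utf-8"?>
<VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.0">
  <!-- Name is the key used with VoiceCommandService.InstalledCommandSets -->
  <CommandSet xml:lang="en-US" Name="en-US">
    <CommandPrefix> Fortune Teller </CommandPrefix>
    <Example> Will I find money </Example>
    <Command Name="showMoney">
      <Example> Will I find money </Example>
      <ListenFor> [Will I find] {futureMoney} </ListenFor>
      <Feedback> Showing {futureMoney} </Feedback>
      <Navigate Target="/money.xaml"/>
    </Command>
    <PhraseList Label="futureMoney">
      <Item> money </Item>
      <Item> riches </Item>
      <Item> gold </Item>
    </PhraseList>
  </CommandSet>
</VoiceCommands>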
• This is the “money” question: “Fortune Teller Will I find money”
• <CommandPrefix> is the phrase the user says to trigger the command; all of the Fortune Teller commands start with this phrase
• The first <Example> is example text that will be displayed by the help for this app as an example of the commands the app supports
• The Name attribute of <Command> is the command name; the application can obtain it from the URL when it starts
• The <Example> inside <Command> is the example for this specific command
• <ListenFor> is the trigger phrase for this command. It can be a sequence of words (words in square brackets are optional, and {futureMoney} refers to the phrase list), and the user must prefix it with the words “Fortune Teller”
• <PhraseList> is the phrase list for the command: the user can say any of the words in it to match this command, the application can determine which phrase was used, and the application can change the phrase list dynamically
• <Feedback> is the spoken feedback from the command; it inserts the phrase list item used to activate the command
• The Target of <Navigate> is the URL of the page activated by the command; commands can go to different pages, or all go to MainPage.xaml if required
• The <Item> elements are the phrases that can be used at the end of the command. The application can modify the phrase list of a command dynamically, so it could, for example, give movie times for films by name
Installing a Voice Command Definition (VCD) file
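using Windows.Phone.Speech.VoiceCommands;

async void setupVoiceCommands()
{
    // Registers the command sets in VCDCommands.xml (added to the project as Content).
    await VoiceCommandService.InstallCommandSetsFromFileAsync(
        new Uri("ms-appx:///VCDCommands.xml", UriKind.RelativeOrAbsolute));
}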

• The VCD file can be loaded from the application or from any URI
• In this case it is just a file that has been added to the project and marked as Content
• The VCD can also be changed by the application when it is running
• The voice commands for an application are loaded into the voice command service when
the application runs
• The application must run at least once to configure the voice commands
Launching Your App With a Voice Command
• If the user now presses and holds the Windows button and says “Fortune Teller, will I find gold?”, the phone displays “Showing gold”
• It then launches your app and navigates to the page associated with this command, which is /money.xaml
• The query string passed to the page looks like this:
"/money.xaml?voiceCommandName=showMoney&futureMoney=gold&reco=Fortune%20Teller%20Will%20I%20find%20gold"

• voiceCommandName: the command name
• futureMoney: the phrase list name, with the recognized phrase as its value
• reco: the whole phrase as it was recognized
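NavigationContext.QueryString already decodes these values for you, but a stand-alone parser makes the format concrete and lets page logic be tested off the device. A minimal sketch; VoiceQuery is a hypothetical helper that assumes the parameters are %-encoded as in the example above:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that splits a navigation URI's query string.
public static class VoiceQuery
{
    // Turns "/money.xaml?voiceCommandName=...&futureMoney=gold&reco=..." into
    // decoded key/value pairs, similar to what NavigationContext.QueryString holds.
    public static Dictionary<string, string> Parse(string navigationUri)
    {
        var result = new Dictionary<string, string>();
        int q = navigationUri.IndexOf('?');
        if (q < 0)
            return result; // no query string: the app was not launched by voice

        foreach (string pair in navigationUri.Substring(q + 1).Split('&'))
        {
            if (pair.Length == 0)
                continue;
            int eq = pair.IndexOf('=');
            string key = eq < 0 ? pair : pair.Substring(0, eq);
            string value = eq < 0 ? "" : pair.Substring(eq + 1);
            // %20 and other escapes are decoded; '+' is left as-is.
            result[Uri.UnescapeDataString(key)] = Uri.UnescapeDataString(value);
        }
        return result;
    }
}
```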
Handling Voice Commands
protected override void OnNavigatedTo(NavigationEventArgs e)
{
    base.OnNavigatedTo(e);
    // Only handle the command when the page is first opened, not on back navigation.
    if (e.NavigationMode == System.Windows.Navigation.NavigationMode.New) {
        if (NavigationContext.QueryString.ContainsKey("voiceCommandName")) {
            string command = NavigationContext.QueryString["voiceCommandName"];
            switch (command) {
                case "tellJoke":
                    messageTextBlock.Text = "Insert really funny joke here";
                    break;
                // Add cases for other commands.
                default:
                    messageTextBlock.Text = "Sorry, what you said makes no sense.";
                    break;
            }
        }
    }
}

• This code runs in the OnNavigatedTo method of a target page
• Can also check for the voice command phrase that was used
Identifying phrases
<PhraseList Label="futureMoney">
<Item> money </Item>
<Item> riches </Item>
<Item> gold </Item>
</PhraseList>

string moneyPhrase = NavigationContext.QueryString["futureMoney"];
• The navigation context can be queried to determine the phrase used to trigger the navigation
• In this case the program reads which item of the futureMoney phrase list (“money”, “riches” or “gold”) the user said
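Combining the command name with the phrase list value, the page logic can be factored into a function over IDictionary<string, string>, the type of NavigationContext.QueryString, so it can be unit tested. A sketch; FortuneReplies and its reply texts are illustrative and not part of the sample app:

```csharp
using System.Collections.Generic;

// Hypothetical reply logic for the Fortune Teller page.
public static class FortuneReplies
{
    // Takes the page's query string (NavigationContext.QueryString on the phone,
    // any IDictionary<string, string> in a test) and returns the text to display.
    public static string Reply(IDictionary<string, string> query)
    {
        string command;
        if (!query.TryGetValue("voiceCommandName", out command))
        {
            return null; // the app was launched normally, not by a voice command
        }

        switch (command)
        {
            case "showMoney":
            {
                // futureMoney holds the phrase list item the user actually said.
                string phrase;
                if (query.TryGetValue("futureMoney", out phrase) && phrase.Length > 0)
                {
                    return "Yes, you will find " + phrase + ".";
                }
                return "The spirits are silent about your fortune.";
            }
            default:
                return "Sorry, what you said makes no sense.";
        }
    }
}
```

In OnNavigatedTo, the page could then set messageTextBlock.Text from FortuneReplies.Reply(NavigationContext.QueryString) whenever the result is not null.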
Demo 2: Fortune Teller
Modifying the phrase list
// The key is the Name attribute of the <CommandSet> element in the VCD file.
VoiceCommandSet fortuneVcs = VoiceCommandService.InstalledCommandSets["en-US"];

await fortuneVcs.UpdatePhraseListAsync("futureMoney",
    new string[] { "money", "cash", "millions", "piles of dough" });

• An application can modify a phrase list when it is running
• However, it cannot add new commands
• This would allow a program to implement behaviours such as:
“Movie Planner tell me showings for Batman”
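When a phrase list is rebuilt from live data, such as the film titles in the Movie Planner scenario, it is worth cleaning the list before calling UpdatePhraseListAsync. A sketch; PhraseListHelper is hypothetical, and the trim and de-duplication rules are assumptions rather than documented requirements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for preparing a dynamic phrase list.
public static class PhraseListHelper
{
    // Trims whitespace, drops null or empty entries and removes duplicates that
    // differ only by case, keeping the first spelling seen.
    public static string[] Normalize(IEnumerable<string> titles)
    {
        return titles
            .Where(t => t != null)
            .Select(t => t.Trim())
            .Where(t => t.Length > 0)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToArray();
    }
}
```

The result is a string[], so it can be passed to UpdatePhraseListAsync in place of the literal array.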
Simple Speech Input
Recognizing Free Speech
• A Windows Phone application can recognize words and phrases and pass them to your program
• From my experiments it seems quite reliable
• Note that a network connection is required for this feature if you use the generic dictation grammar
• Your application can just use the speech string directly
• The standard “Listening” interface is displayed over your application
Simple Speech Recognition
SpeechRecognizerUI recoWithUI;

private async void ListenButton_Click(object sender, RoutedEventArgs e)
{
    this.recoWithUI = new SpeechRecognizerUI();
    SpeechRecognitionUIResult recoResult =
        await recoWithUI.RecognizeWithUIAsync();
    // Only use the text if recognition completed successfully.
    if (recoResult.ResultStatus == SpeechRecognitionUIStatus.Succeeded)
        MessageBox.Show(string.Format("You said {0}.",
            recoResult.RecognitionResult.Text));
}
• The above method checks for a successful response
• By default the system uses the language settings on the Phone
Customizing Speech Recognition
• InitialSilenceTimeout
  • The time that the speech recognizer will wait until it hears speech
  • The default setting is 5 seconds
• BabbleTimeout
  • The time that the speech recognizer will keep listening while it hears only background noise
  • The default setting is 0 seconds (the feature is not activated)
• EndSilenceTimeout
  • The time interval during which the speech recognizer will wait before finalizing the recognition operation
  • The default setting is 150 milliseconds
Customizing Speech Recognition
recoWithUI.Settings.ReadoutEnabled = false;   // don't read the recognized text back aloud
recoWithUI.Settings.ShowConfirmation = false; // don't show the "Heard you say" screen

recoWithUI.Recognizer.Settings.InitialSilenceTimeout = TimeSpan.FromSeconds(6.0);
recoWithUI.Recognizer.Settings.BabbleTimeout = TimeSpan.FromSeconds(4.0);
recoWithUI.Recognizer.Settings.EndSilenceTimeout = TimeSpan.FromSeconds(1.2);

• A program can also select whether the recognizer reads the recognized text back aloud and whether it shows a "Heard you say" confirmation screen with the result
• The code above also sets timeout values
Handling Errors
recoWithUI.Recognizer.AudioProblemOccurred += Recognizer_AudioProblemOccurred;
recoWithUI.Recognizer.AudioCaptureStateChanged +=
    Recognizer_AudioCaptureStateChanged;
...
void Recognizer_AudioProblemOccurred(SpeechRecognizer sender,
    SpeechAudioProblemOccurredEventArgs args)
{
    MessageBox.Show("Please speak more clearly");
}

• An application can bind to events which indicate problems with the audio input
• There is also an event fired when the state of the capture changes
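Rather than showing one generic message, the handler can tailor it to the reported problem. A sketch that assumes the problem is reported through args.Problem as a SpeechRecognitionAudioProblem value whose member names include TooNoisy, TooQuiet, TooLoud, TooFast, TooSlow and NoSignal; verify these names against the SDK. AudioProblemMessages is a hypothetical helper keyed on the enum member's name:

```csharp
using System.Collections.Generic;

// Hypothetical helper mapping audio problem names to user-facing messages.
public static class AudioProblemMessages
{
    // Keys assume the member names of the WP8 SpeechRecognitionAudioProblem enum.
    static readonly Dictionary<string, string> Messages = new Dictionary<string, string>
    {
        { "TooNoisy", "It's too noisy here. Try somewhere quieter." },
        { "TooLoud",  "Please speak a little more softly." },
        { "TooQuiet", "Please speak up." },
        { "TooFast",  "Please speak more slowly." },
        { "TooSlow",  "Please speak a little faster." },
        { "NoSignal", "I can't hear anything. Check the microphone." },
    };

    // Falls back to the original generic message for anything not listed.
    public static string For(string problem)
    {
        string message;
        return Messages.TryGetValue(problem, out message)
            ? message
            : "Please speak more clearly";
    }
}
```

The handler body would then become MessageBox.Show(AudioProblemMessages.For(args.Problem.ToString())).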
Using Grammars
Grammars and Speech input
• The simple speech recognition we have seen so far uses the “Short Dictation” grammar, which just captures the text and returns it to the application
• You can add your own grammars that will structure the conversation between the user and the application
• Grammars can be created using the Speech Recognition Grammar Specification (SRGS) Version 1.0 and stored as XML files loaded when the application runs
• This is a little complex, but worth the effort if you want to create applications with rich language interaction with the user
• If the application just needs to identify particular commands, you can use a grammar list to achieve this
• Custom grammars can be handled on the client without any network access
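For a sense of what an SRGS 1.0 grammar looks like, here is a sketch that emits a word list as a grammar whose root rule matches exactly one of the words. SrgsBuilder is a hypothetical helper; the generated file could be added to the project as Content and loaded with Grammars.AddGrammarFromUri and an ms-appx:/// URI:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper that writes a minimal SRGS 1.0 "one-of" grammar.
public static class SrgsBuilder
{
    static readonly XNamespace Srgs = "http://www.w3.org/2001/06/grammar";

    // The public root rule matches exactly one of the supplied words.
    public static XDocument OneOf(string ruleId, IEnumerable<string> words)
    {
        return new XDocument(
            new XElement(Srgs + "grammar",
                new XAttribute("version", "1.0"),
                new XAttribute(XNamespace.Xml + "lang", "en-US"),
                new XAttribute("root", ruleId),
                new XElement(Srgs + "rule",
                    new XAttribute("id", ruleId),
                    new XAttribute("scope", "public"),
                    new XElement(Srgs + "one-of",
                        words.Select(w => new XElement(Srgs + "item", w))))));
    }
}
```

A grammar list is simpler when all you need is a flat set of words; hand-written SRGS becomes worthwhile once rules reference other rules or carry optional parts.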
Using Grammar Lists
string[] strengthNames = { "weak", "mild", "medium", "strong", "english" };
recoWithUI.Recognizer.Grammars.AddGrammarFromList("cheeseStrength",
    strengthNames);

• To create a Grammar List an application defines an array of strings that form the words in the list
• The Grammar can then be added to the recognizer and given a name
• Multiple grammar lists can be added to a recognizer
• The recognizer will now recognize any of the words in the lists that have been supplied
Enabling and Disabling Grammar Lists
recoWithUI.Settings.ListenText = "How strong do you like your cheese?";
recoWithUI.Recognizer.Grammars["cheeseStrength"].Enabled = true;
SpeechRecognitionUIResult recoResult = await recoWithUI.RecognizeWithUIAsync();

• An application can enable or disable particular grammars before a recognition action
• It is also possible to set relative weightings of grammar lists
• The text displayed as part of the listen operation can also be set, as shown above
Determining the confidence in the result
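SpeechRecognitionUIResult recoResult = await recoWithUI.RecognizeWithUIAsync();

// Check the status first: RecognitionResult is only meaningful on success.
if (recoResult.ResultStatus == SpeechRecognitionUIStatus.Succeeded &&
    recoResult.RecognitionResult.TextConfidence == SpeechRecognitionConfidence.High)
{
    // select cheese based on strength value
}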

• An application can determine the confidence that the speech system has in the result that
was obtained
• Result values are High, Medium, Low, Rejected
Matching Multiple Grammars
var alternatives = recoResult.RecognitionResult.GetAlternates(3);

• If the spoken input matches multiple grammars, a program can obtain a list of the alternative results using recoResult.RecognitionResult.GetAlternates
• The list is supplied in order of confidence
• The application can then determine the best fit from the context of the voice request
• This list is also provided if the request used a more complex grammar
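Choosing the best fit from context can be as simple as taking the highest-confidence alternate that is valid at this point in the conversation. A sketch; AlternatePicker is hypothetical, and its input is assumed to be the Text of each result returned by GetAlternates, kept in the order returned:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for choosing among recognition alternates.
public static class AlternatePicker
{
    // Walks the alternates in confidence order and returns the first one that is
    // a valid answer in the current context, or null if none of them is.
    public static string FirstInContext(IEnumerable<string> alternatesByConfidence,
                                        ISet<string> expectedInContext)
    {
        return alternatesByConfidence.FirstOrDefault(a => expectedInContext.Contains(a));
    }
}
```

Building the context set with StringComparer.OrdinalIgnoreCase makes the match insensitive to capitalization in the recognized text.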
Profanity
• Words that are recognised as profanities are not displayed in the response from a
recognizer command
• The speech system will also not repeat them
• They are enclosed in <Profanity> </Profanity> when supplied to the program that
receives the speech data
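If the application displays recognized text, it may prefer to mask profanities rather than show the tags. A sketch assuming the tags are spelled exactly <Profanity> and </Profanity> as described above; ProfanityFilter is a hypothetical helper:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper that masks tagged profanities in recognized text.
public static class ProfanityFilter
{
    // Lazy match so each tagged word is handled separately.
    static readonly Regex Tagged = new Regex(
        "<Profanity>(.*?)</Profanity>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Replaces each tagged word with asterisks of the same length.
    public static string Mask(string recognizedText)
    {
        return Tagged.Replace(recognizedText,
            m => new string('*', m.Groups[1].Value.Length));
    }
}
```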
Summary
• Applications in Windows Phone 8 can use speech generation and recognition to interact with users
• Applications can produce speech output from text files, which can be marked up with Speech Synthesis Markup Language (SSML) to include sound files
• Applications can be started and provided with initial commands by registering a Voice Command Definition file with Windows Phone
• The commands can be picked up when a page is loaded, or the commands can specify a particular page to load
• An application can modify the phrase part of a command to change the activation commands
• Applications can recognise speech using complex grammars or simple word lists

Summary and Next Steps…
1. Get Ready to Become a Windows Phone Developer
• Download the SDK at dev.windowsphone.com
• Explore the Microsoft samples and start building apps in Visual Studio
2. Learn More About Windows Phone Development via Official Microsoft Videos
• Windows Phone 8 Jump Start Training: http://bit.ly/wp8jump
• Windows Phone 8 Dev for Absolute Beginners: http://bit.ly/wp8devAB
3. Check Out Additional Learning Resources
• Pluralsight WP Training: www.pluralsight.com/training/Courses#windows-phone
• Nokia Developer: www.developer.nokia.com
4. Download Additional Resources & Become an Expert
• Download the Windows Phone Toolkit: phone.codeplex.com
• Nokia Developer Offers: http://bit.ly/nokiadevoffers
Windows Phone Resources
• Windows Phone Developer Blog: blogs.windows.com/windows_phone/b/wpdev
• Windows Phone Consumer Blog: blogs.windows.com/windows_phone/b/windowsphone

• Nokia WP Wiki: www.developer.nokia.com/Community/Wiki/Category:Windows_Phone
• Nokia Dvlup Challenges & Rewards: www.dvlup.com
• Nokia Conversations Blog: http://conversations.nokia.com
• Microsoft App Studio: http://apps.windowsstore.com
• Nick Landry’s Blog: ActiveNick.net

• Windows Phone Developer Magazine (online): http://flip.it/95YFG
• GeekChamp (WP & Win8 dev): www.geekchamp.com
• Windows Phone Central (News): www.wpcentral.com
Thank You!
Slides and demos will be posted on SlideShare (see links below)
Let me know how you liked this session. Your feedback is important and appreciated.
Blog: www.ActiveNick.net
Twitter: @ActiveNick
Email: activenick@mobility42.com
Mobile Apps: www.bigbaldapps.com
LinkedIn: www.linkedin.com/in/activenick
Website: www.mobility42.com
Slideshare: www.slideshare.net/ActiveNick

 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

Developing with Speech and Voice Recognition in Mobile Apps

  • 1. M3Conference
    Developing with Speech and Voice Recognition in Mobile Apps
    Nick Landry, MVP
    App Artisan
    Nokia Developer Champion & Ambassador
    activenick@mobility42.com
    @ActiveNick – www.mobility42.com
    talk2me
  • 2. Who is ActiveNick?
    • App Artisan – Mobile Development Consultant – Mobility42
    • Microsoft MVP: Windows Phone Development
    • Mobile Publisher – Big Bald Apps: http://www.bigbaldapps.com
    • Nokia Developer Champion and Ambassador
    • Speaker. Blogger. Author. Tweeter. Gamer
    • 20+ Years of Professional Experience
    • Specialties:
      • Mobile Development
      • Location Intelligence & Geospatial Systems
      • Data Visualization, HPC, Cloud
      • Mobile Game Development
    • Blog: www.ActiveNick.net
    • Twitter: @ActiveNick
  • 3. Agenda
    • Speech on Windows Phone 8
    • Speech synthesis
    • Controlling applications using speech
    • Voice command definition files
    • Building conversations
    • Selecting application entry points
    • Simple speech input
    • Speech input and grammars
    • Using Grammar Lists
  • 5. Windows Phone Speech Support
    • Windows Phone 7.x had voice support built into the operating system
      • Programs and phone features could be started by voice commands, e.g. “Start MyApp”
      • Incoming SMS messages could be read to the user
      • The user could compose and send SMS messages
    • Windows Phone 8 builds on this to allow applications to make use of speech
      • Applications can speak messages using the Speech Synthesis feature
      • Applications can be started and given commands
      • Applications can accept commands using voice input
    • Speech recognition with the built-in dictation grammar requires an internet connection, but Speech Synthesis does not
  • 7. Enabling Speech Synthesis
    • If an application wishes to use speech output, the ID_CAP_SPEECH_RECOGNITION capability must be enabled in WMAppManifest.xml
    • The application can also add a using directive for the Synthesis namespace:
      using Windows.Phone.Speech.Synthesis;
  • 8. Simple Speech
    async void CheeseLiker()
    {
        SpeechSynthesizer synth = new SpeechSynthesizer();
        await synth.SpeakTextAsync("I like cheese.");
    }
    • The SpeechSynthesizer class provides a simple way to produce speech
    • The SpeakTextAsync method speaks the content of the string using the default voice
    • SpeakTextAsync is asynchronous, so any method that awaits it must be marked with the async modifier
    • Speech output does not require a network connection
  • 9. Selecting a language
    // Query for a voice that speaks French (requires using System.Linq;).
    var frenchVoices = from voice in InstalledVoices.All
                       where voice.Language == "fr-FR"
                       select voice;

    // Set the voice as identified by the query, if one is installed.
    VoiceInformation frenchVoice = frenchVoices.FirstOrDefault();
    if (frenchVoice != null)
    {
        synth.SetVoice(frenchVoice);
    }
    • The default speaking voice is selected automatically from the locale set for the phone
    • The InstalledVoices class provides a list of all the voices available on the phone
    • The above code selects a French voice, and leaves the default voice in place if none is installed
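The French-voice lookup can be made more forgiving when the exact locale is missing. The sketch below is plain .NET with a hypothetical VoiceLanguagePicker class (not part of the Windows Phone SDK); it works on language tags only, and the fallback order (exact tag, then same base language, then a default) is an assumption about what an app might want rather than platform behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the Windows Phone SDK): chooses a language
// tag from the installed voices, falling back from an exact match to a voice
// with the same base language, and finally to a caller-supplied default.
public static class VoiceLanguagePicker
{
    public static string Pick(IEnumerable<string> installedTags, string wanted, string fallback)
    {
        List<string> tags = installedTags.ToList();

        // 1. Exact match, ignoring case ("fr-FR" matches "FR-fr").
        string exact = tags.FirstOrDefault(
            t => string.Equals(t, wanted, StringComparison.OrdinalIgnoreCase));
        if (exact != null)
        {
            return exact;
        }

        // 2. Same base language, e.g. "fr-FR" when "fr-CA" was requested.
        string prefix = wanted.Split('-')[0] + "-";
        string sameLanguage = tags.FirstOrDefault(
            t => t.StartsWith(prefix, StringComparison.OrdinalIgnoreCase));
        if (sameLanguage != null)
        {
            return sameLanguage;
        }

        // 3. Nothing suitable is installed.
        return fallback;
    }
}
```

On the phone, the tags could come from InstalledVoices.All.Select(v => v.Language), and the VoiceInformation whose Language equals the returned tag would then be passed to synth.SetVoice.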
  • 10. Demo 1: Speech Synthesis and Voice Selection
    talk2me - http://bit.ly/wpt2m
  • 11. Speech Synthesis Markup Language
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
      <p>
        Your <say-as interpret-as="ordinal">1st</say-as> request was for
        <say-as interpret-as="cardinal">1</say-as> room on
        <say-as interpret-as="date" format="mdy">10/19/2010</say-as>,
        arriving at <say-as interpret-as="time" format="hms12">12:35pm</say-as>.
      </p>
    </speak>
    • You can use Speech Synthesis Markup Language (SSML) to control the spoken output
    • Change the voice, pitch, rate, volume, pronunciation and other characteristics
    • SSML also allows the inclusion of audio files in the spoken output
    • You can also use the speech synthesizer to speak the contents of an SSML file
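Rather than hand-writing SSML strings, an app could generate them from its data. A minimal sketch, assuming only System.Xml.Linq and a hypothetical SsmlBuilder class, that rebuilds the booking sentence above; the resulting string would then be handed to the synthesizer's SpeakSsmlAsync method. Building the markup with XElement means characters such as & or < in inserted values are escaped automatically.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

// Hypothetical helper: builds the booking sentence from the SSML example above.
public static class SsmlBuilder
{
    static readonly XNamespace Ns = "http://www.w3.org/2001/10/synthesis";

    public static string Booking(int requestNumber, int rooms, DateTime arrival)
    {
        CultureInfo inv = CultureInfo.InvariantCulture;
        var speak = new XElement(Ns + "speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", "en-US"),
            new XElement(Ns + "p",
                "Your ", SayAs("ordinal", null, Ordinal(requestNumber)),
                " request was for ", SayAs("cardinal", null, rooms.ToString(inv)),
                rooms == 1 ? " room on " : " rooms on ",
                SayAs("date", "mdy", arrival.ToString("MM/dd/yyyy", inv)),
                ", arriving at ",
                // 12-hour time such as "12:35pm"
                SayAs("time", "hms12", arrival.ToString("h:mmtt", inv).ToLowerInvariant()),
                "."));

        // No pretty-printing, so no extra whitespace is added inside <p>.
        return speak.ToString(SaveOptions.DisableFormatting);
    }

    // Produces <say-as interpret-as="..." [format="..."]>text</say-as>
    static XElement SayAs(string interpretAs, string format, string text)
    {
        var element = new XElement(Ns + "say-as", new XAttribute("interpret-as", interpretAs));
        if (format != null)
        {
            element.Add(new XAttribute("format", format));
        }
        element.Add(text);
        return element;
    }

    // 1 -> "1st", 2 -> "2nd", 11 -> "11th", 23 -> "23rd"
    static string Ordinal(int n)
    {
        int lastTwo = n % 100;
        if (lastTwo >= 11 && lastTwo <= 13) return n + "th";
        switch (n % 10)
        {
            case 1: return n + "st";
            case 2: return n + "nd";
            case 3: return n + "rd";
            default: return n + "th";
        }
    }
}
```

Calling SsmlBuilder.Booking(1, 1, new DateTime(2010, 10, 19, 12, 35, 0)) produces the same say-as values as the example on the slide.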
  • 13. Application Launching using Voice Commands
    • The Voice Command feature of Windows Phone 7 allowed users to start applications
    • In Windows Phone 8 the feature has been expanded to allow the user to request data from the application in the start command
    • The data allows a particular application page to be selected when the program starts, and can also pass request information to that page
    • To start using Voice Commands, you must create a Voice Command Definition (VCD) file that defines all the spoken commands
    • The application then calls a method to register the words and phrases the first time it is run
  • 14. The Fortune Teller Program
    • The Fortune Teller program will tell your future
    • You can ask it questions and it will display replies
    • It could also speak them
    • Some of the spoken commands activate different pages of the application, and others are processed by the application when it starts running
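  • 15. The Voice Command Definition (VCD) file
    <?xml version="1.0" encoding="utf-8"?>
    <VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.0">
      <CommandSet xml:lang="en-US" Name="en-US">
        <CommandPrefix> Fortune Teller </CommandPrefix>
        <Example> Will I find money </Example>
        <Command Name="showMoney">
          <Example> Will I find money </Example>
          <ListenFor> [Will I find] {futureMoney} </ListenFor>
          <Feedback> Showing {futureMoney} </Feedback>
          <Navigate Target="/money.xaml"/>
        </Command>
        <PhraseList Label="futureMoney">
          <Item> money </Item>
          <Item> riches </Item>
          <Item> gold </Item>
        </PhraseList>
      </CommandSet>
    </VoiceCommands>
    • This is the “money” question: “Fortune Teller, will I find money?”
    • The CommandSet's Name attribute is the key used to find the installed command set in code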
  • 16. The Voice Command Definition (VCD) file: <CommandPrefix>
    • <CommandPrefix> Fortune Teller </CommandPrefix> is the phrase the user says to trigger the command
    • All of the Fortune Teller commands start with this phrase
  • 17. The Voice Command Definition (VCD) file: <Example> for the app
    • The first <Example> Will I find money </Example>, directly after the prefix, is example text displayed by the help for this app as an example of the commands the app supports
  • 18. The Voice Command Definition (VCD) file: <Command Name="showMoney">
    • The Name attribute is the command name
    • The application can obtain it from the navigation URL (the voiceCommandName parameter) when it starts
  • 19. The Voice Command Definition (VCD) file: <Example> for a command
    • The <Example> inside <Command> is the example for this specific command
  • 20. The Voice Command Definition (VCD) file: <ListenFor>
    • <ListenFor> [Will I find] {futureMoney} </ListenFor> is the trigger phrase for this command
    • It can be a sequence of words; words in square brackets, such as [Will I find], are optional
    • The user must prefix this sequence with the words “Fortune Teller”
  • 21. The Voice Command Definition (VCD) file: {futureMoney}
    • {futureMoney} in the trigger phrase refers to the phrase list for the command
    • The user can say any of the words in the phrase list to match this command
    • The application can determine which phrase was used
    • The phrase list can be changed by the application dynamically
  • 22. The Voice Command Definition (VCD) file: <Feedback>
    • <Feedback> Showing {futureMoney} </Feedback> is the spoken feedback from the command
    • The feedback inserts the phrase list item used to activate the command, e.g. “Showing gold”
  • 23. The Voice Command Definition (VCD) file: <Navigate>
    • <Navigate Target="/money.xaml"/> gives the URL of the page to be activated by the command
    • Commands can go to different pages, or all go to MainPage.xaml if required
  • 24. The Voice Command Definition (VCD) file: <PhraseList>
    • The <Item> entries of <PhraseList Label="futureMoney"> (money, riches, gold) are the phrases that can be used at the end of the command
    • The application can modify the phrase list of a command dynamically
    • This could be used, for example, to give movie times for films by name
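To check which utterances a ListenFor pattern will accept, it can help to expand it by hand. This hypothetical helper is plain .NET and handles only the two conventions used in the Fortune Teller VCD, [optional words] and {phrase list} references; it ignores the rest of the VCD syntax, and the CommandPrefix (“Fortune Teller”) is not included even though the user must still say it first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: lists every phrase a <ListenFor> pattern accepts.
// Only two conventions are handled: [words] are optional and {label}
// stands for each item of the PhraseList with that Label.
public static class ListenForExpander
{
    public static List<string> Expand(string pattern, IDictionary<string, string[]> phraseLists)
    {
        // Tokens are "[...]", "{...}" or a run of ordinary words.
        IEnumerable<string> tokens = Regex.Matches(pattern, @"\[[^\]]*\]|\{[^}]*\}|[^\[\{]+")
            .Cast<Match>()
            .Select(m => m.Value.Trim())
            .Where(t => t.Length > 0);

        var phrases = new List<string> { "" };
        foreach (string token in tokens)
        {
            string[] choices;
            if (token.StartsWith("["))
                choices = new[] { "", Inner(token) };   // optional: with or without
            else if (token.StartsWith("{"))
                choices = phraseLists[Inner(token)];    // one choice per list item
            else
                choices = new[] { token };              // required words

            // Combine every phrase built so far with every choice for this token.
            phrases = phrases.SelectMany(p => choices.Select(c => Join(p, c))).ToList();
        }
        return phrases;
    }

    // Strips the surrounding [] or {} from a token.
    static string Inner(string token)
    {
        return token.Substring(1, token.Length - 2).Trim();
    }

    // Joins two fragments with a single space, skipping empty ones.
    static string Join(string left, string right)
    {
        return string.Join(" ", new[] { left, right.Trim() }.Where(s => s.Length > 0));
    }
}
```

For the futureMoney list this yields six phrases: “money”, “riches” and “gold”, each with and without “Will I find” in front.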
  • 25. Installing a Voice Command Definition (VCD) file
    // Requires: using Windows.Phone.Speech.VoiceCommands;
    async void setupVoiceCommands()
    {
        await VoiceCommandService.InstallCommandSetsFromFileAsync(
            new Uri("ms-appx:///VCDCommands.xml", UriKind.RelativeOrAbsolute));
    }
    • The VCD file can be loaded from the application or from any URI
    • In this case it is just a file that has been added to the project and marked as Content
    • The VCD can also be changed by the application while it is running
    • The voice commands for an application are loaded into the voice command service when the application runs
    • The application must therefore run at least once to configure the voice commands
  • 26. Launching Your App With a Voice Command
    • If the user now presses and holds the Windows button and says “Fortune Teller, will I find gold?”, the phone displays “Showing gold”
    • It then launches your app and navigates to the page associated with this command, which is /money.xaml
    • The query string passed to the page looks like this:
      "/?voiceCommandName=showMoney&futureMoney=gold&reco=Fortune%20Teller%20Will%20I%20find%20gold"
      • voiceCommandName: the command name
      • futureMoney: the phrase list name, whose value is the recognized phrase
      • reco: the whole phrase as it was recognized
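The query string can also be decoded off the phone, for example in a unit test of the page logic. A minimal plain-.NET sketch with a hypothetical VoiceQuery.Parse method, assuming percent-encoding as in the example above; on the device NavigationContext.QueryString already provides these values, so this is only a testing aid.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: decodes the query part of a voice-command navigation
// URL into key/value pairs (percent-encoded values such as %20 are decoded).
public static class VoiceQuery
{
    public static Dictionary<string, string> Parse(string navigationUrl)
    {
        var result = new Dictionary<string, string>();
        int q = navigationUrl.IndexOf('?');
        if (q < 0)
        {
            return result;   // no query string at all
        }

        foreach (string pair in navigationUrl.Substring(q + 1).Split('&'))
        {
            if (pair.Length == 0) continue;
            int eq = pair.IndexOf('=');
            string key = eq < 0 ? pair : pair.Substring(0, eq);
            string value = eq < 0 ? "" : pair.Substring(eq + 1);
            result[Uri.UnescapeDataString(key)] = Uri.UnescapeDataString(value);
        }
        return result;
    }
}
```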
  • 27. Handling Voice Commands
    protected override void OnNavigatedTo(System.Windows.Navigation.NavigationEventArgs e)
    {
        base.OnNavigatedTo(e);
        if (e.NavigationMode == System.Windows.Navigation.NavigationMode.New)
        {
            if (NavigationContext.QueryString.ContainsKey("voiceCommandName"))
            {
                string command = NavigationContext.QueryString["voiceCommandName"];
                switch (command)
                {
                    case "tellJoke":
                        messageTextBlock.Text = "Insert really funny joke here";
                        break;
                    // Add cases for other commands.
                    default:
                        messageTextBlock.Text = "Sorry, what you said makes no sense.";
                        break;
                }
            }
        }
    }
    • This code runs in the OnNavigatedTo method of a target page
    • It can also check for the voice command phrase that was used
  • 28. Identifying phrases
    <PhraseList Label="futureMoney">
      <Item> money </Item>
      <Item> riches </Item>
      <Item> gold </Item>
    </PhraseList>

    string moneyPhrase = NavigationContext.QueryString["futureMoney"];
    • The navigation context can be queried to determine the phrase used to trigger the navigation
    • In this case the program finds out which of the futureMoney phrases (money, riches or gold) was used in the “money” question
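The decision logic from OnNavigatedTo can be pulled out of the page so it can be tested without a phone. A sketch, assuming a hypothetical FortuneResponder class with placeholder reply text; the command and phrase list names are the ones from the Fortune Teller VCD. On the page it would be called with NavigationContext.QueryString, and its result assigned to messageTextBlock.Text when it is not null.

```csharp
using System.Collections.Generic;

// Hypothetical, UI-free version of the voice-command handling: it reads the
// query-string values and returns the text to display.
public static class FortuneResponder
{
    // Returns the reply for a voice launch, or null for a normal launch.
    public static string Respond(IDictionary<string, string> query)
    {
        string command;
        if (!query.TryGetValue("voiceCommandName", out command))
        {
            return null;
        }

        switch (command)
        {
            case "showMoney":
            {
                // The matched item ("money", "riches" or "gold") arrives under
                // the phrase list's Label, "futureMoney".
                string phrase;
                return query.TryGetValue("futureMoney", out phrase)
                    ? "You will find " + phrase + "."
                    : "You will find something valuable.";
            }
            default:
                return "Sorry, what you said makes no sense.";
        }
    }
}
```

Keeping the page handler down to a single call makes each new command a new case that can be checked with an ordinary dictionary.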
  • 30. Modifying the phrase list
    VoiceCommandSet fortuneVcs = VoiceCommandService.InstalledCommandSets["en-US"];
    await fortuneVcs.UpdatePhraseListAsync("futureMoney",
        new string[] { "money", "cash", "millions", "piles of dough" });
    • An application can modify a phrase list while it is running
    • It cannot add new commands, however
    • This would allow a program to implement behaviours such as: “Movie Planner, tell me showings for Batman”
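Phrase lists are often filled from app data, such as the film titles in the Movie Planner example. This hypothetical plain-.NET helper cleans such data before it is passed to UpdatePhraseListAsync: it trims entries, drops blanks and removes case-insensitive duplicates. The maxItems cap is an assumption; the actual phrase list limits should be checked in the platform documentation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: turns app data (e.g. movie titles loaded at run time)
// into a phrase list. Entries are trimmed, blank ones dropped, duplicates
// removed ignoring case, and the result capped at maxItems (an assumed limit).
public static class PhraseListBuilder
{
    public static string[] Build(IEnumerable<string> source, int maxItems)
    {
        return source
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .Select(s => s.Trim())
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .Take(maxItems)
            .ToArray();
    }
}
```

A call might look like await vcs.UpdatePhraseListAsync("movieTitle", PhraseListBuilder.Build(titles, 100)); where the movieTitle list and the titles collection are hypothetical.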
  • 32. Recognizing Free Speech
    • A Windows Phone application can recognize words and phrases and pass them to your program
    • From my experiments it seems quite reliable
    • Note that a network connection is required for this feature if you use the generic dictation grammar
    • Your application can just use the speech string directly
    • The standard “Listening” interface is displayed over your application
  • 33. Simple Speech Recognition
    SpeechRecognizerUI recoWithUI;

    private async void ListenButton_Click(object sender, RoutedEventArgs e)
    {
        this.recoWithUI = new SpeechRecognizerUI();
        SpeechRecognitionUIResult recoResult = await recoWithUI.RecognizeWithUIAsync();
        if (recoResult.ResultStatus == SpeechRecognitionUIStatus.Succeeded)
            MessageBox.Show(string.Format("You said {0}.", recoResult.RecognitionResult.Text));
    }
    • The above method checks for a successful response
    • By default the system uses the language settings on the phone
  • 34. Customizing Speech Recognition
    • InitialSilenceTimeout
      • The time that the speech recognizer will wait until it hears speech
      • The default setting is 5 seconds
    • BabbleTimeout
      • The time that the speech recognizer will listen while it hears background noise
      • The default setting is 0 seconds (the feature is not activated)
    • EndSilenceTimeout
      • The time interval during which the speech recognizer will wait before finalizing the recognition operation
      • The default setting is 150 milliseconds
  • 35. Customizing Speech Recognition
    recoWithUI.Settings.ReadoutEnabled = false;   // don't read the recognized text back aloud
    recoWithUI.Settings.ShowConfirmation = false; // don't show the confirmation screen
    recoWithUI.Recognizer.Settings.InitialSilenceTimeout = TimeSpan.FromSeconds(6.0);
    recoWithUI.Recognizer.Settings.BabbleTimeout = TimeSpan.FromSeconds(4.0);
    recoWithUI.Recognizer.Settings.EndSilenceTimeout = TimeSpan.FromSeconds(1.2);
    • A program can choose whether the recognizer reads the recognized text back to the user (ReadoutEnabled) and whether it displays it on a confirmation screen (ShowConfirmation)
    • The code above also sets the timeout values
  • 36. Handling Errors
    recoWithUI.Recognizer.AudioProblemOccurred += Recognizer_AudioProblemOccurred;
    recoWithUI.Recognizer.AudioCaptureStateChanged += Recognizer_AudioCaptureStateChanged;
    ...
    void Recognizer_AudioProblemOccurred(SpeechRecognizer sender,
        SpeechAudioProblemOccurredEventArgs args)
    {
        // The event may not arrive on the UI thread, so marshal the message box to it.
        Dispatcher.BeginInvoke(() => MessageBox.Show("Please speak more clearly"));
    }
    • An application can bind to events which indicate problems with the audio input
    • There is also an event fired when the state of the capture changes
  • 38. Grammars and Speech input
    • The simple speech recognition we have seen so far uses the “Short Dictation” grammar, which just captures the text and returns it to the application
    • You can add your own grammars that will structure the conversation between the user and the application
    • Grammars can be created using the Speech Recognition Grammar Specification (SRGS) Version 1.0 and stored as XML files loaded when the application runs
    • This is a little complex, but worth the effort if you want to create applications with rich language interaction with the user
    • If the application just needs to identify particular commands, you can use a grammar list to achieve this
    • Custom grammars can be handled on the client without any network access
  • 39. Using Grammar Lists
    string[] strengthNames = { "weak", "mild", "medium", "strong", "english" };
    recoWithUI.Recognizer.Grammars.AddGrammarFromList("cheeseStrength", strengthNames);
    • To create a grammar list, an application defines an array of strings that form the words in the list
    • The grammar can then be added to the recognizer and given a name
    • Multiple grammar lists can be added to a recognizer
    • The recognizer will now resolve any of the words in the lists that have been supplied
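With a grammar list active, the recognized text is one of the list's words, so mapping it back to an application value is a lookup. A plain-.NET sketch with a hypothetical CheeseStrength class that keeps the word array in one place, so the same array can be passed to AddGrammarFromList and used to interpret RecognitionResult.Text; trimming trailing punctuation is a defensive assumption, not documented recognizer behaviour.

```csharp
using System;

// Hypothetical helper: one shared word array for the "cheeseStrength"
// grammar list, plus a lookup from recognized text to a strength level.
public static class CheeseStrength
{
    public static readonly string[] Names = { "weak", "mild", "medium", "strong", "english" };

    // Returns 0 for the weakest word up to 4 for the strongest,
    // or -1 if the text is not one of the list's words.
    public static int Level(string recognizedText)
    {
        if (recognizedText == null)
        {
            return -1;
        }
        string word = recognizedText.Trim().TrimEnd('.', '!', '?');
        return Array.FindIndex(Names,
            n => string.Equals(n, word, StringComparison.OrdinalIgnoreCase));
    }
}
```

Passing CheeseStrength.Names to AddGrammarFromList keeps the grammar and the lookup from drifting apart if a word is added later.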
  • 40. Enabling and Disabling Grammar Lists
    recoWithUI.Settings.ListenText = "How strong do you like your cheese?";
    recoWithUI.Recognizer.Grammars["cheeseStrength"].Enabled = true;
    SpeechRecognitionUIResult recoResult = await recoWithUI.RecognizeWithUIAsync();
    • An application can enable or disable particular grammars before a recognition action
    • It is also possible to set relative weightings of grammar lists
    • The text displayed as part of the listen operation can also be set, as shown above
  • 41. Determining the confidence in the result
    SpeechRecognitionUIResult recoResult = await recoWithUI.RecognizeWithUIAsync();
    if (recoResult.ResultStatus == SpeechRecognitionUIStatus.Succeeded &&
        recoResult.RecognitionResult.TextConfidence == SpeechRecognitionConfidence.High)
    {
        // select cheese based on strength value
    }
    • An application can determine the confidence that the speech system has in the result that was obtained
    • Result values are High, Medium, Low and Rejected
  • 42. Matching Multiple Grammars
    var alternatives = recoResult.RecognitionResult.GetAlternates(3);
    • If the spoken input matches multiple grammars, a program can obtain a list of the alternative results using recoResult.RecognitionResult.GetAlternates
    • The list is supplied in order of confidence
    • The application can then determine the best fit from the context of the voice request
    • This list is also provided if the request used a more complex grammar
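Choosing among alternates by context can be as simple as taking the most confident alternate that the current question expects. A plain-.NET sketch with a hypothetical AlternatePicker; on the phone the input would be the Text of each result returned by GetAlternates, which the slide notes are already ordered by confidence.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns the first (most confident) alternate that is
// valid in the current context, or null when none of them fits.
public static class AlternatePicker
{
    public static string PickBest(IEnumerable<string> alternatesByConfidence,
                                  ISet<string> expectedInContext)
    {
        return alternatesByConfidence.FirstOrDefault(
            a => a != null && expectedInContext.Contains(a.Trim()));
    }
}
```

For example, PickBest(alternatives.Select(r => r.Text), expected) would skip a misheard phrase that ranked first and return the first word the question actually allows.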
  • 43. Profanity
    • Words that are recognised as profanities are not displayed in the response from a recognizer command
    • The speech system will also not repeat them
    • They are enclosed in <Profanity> </Profanity> tags when supplied to the program that receives the speech data
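If an app displays or logs recognized text, it may want to mask tagged profanities instead of showing the raw tags. A sketch with a hypothetical ProfanityMask helper, assuming the tags appear in the text as <Profanity>...</Profanity> (matched case-insensitively here, in case the casing differs).

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: replaces each <Profanity>word</Profanity> span with
// one '*' per character of the enclosed word, removing the tags.
public static class ProfanityMask
{
    static readonly Regex Tagged = new Regex(@"<Profanity>(.*?)</Profanity>",
                                             RegexOptions.IgnoreCase);

    public static string Mask(string recognizedText)
    {
        return Tagged.Replace(recognizedText,
            m => new string('*', m.Groups[1].Value.Length));
    }
}
```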
  • 44. Summary
    • Applications in Windows Phone 8 can use speech generation and recognition to interact with users
    • Applications can produce speech output from text, which can be marked up with Speech Synthesis Markup Language (SSML) to include sound files
    • Applications can be started and provided with initial commands by registering a Voice Command Definition file with the Windows Phone
    • The commands can be picked up when a page is loaded, or the commands can specify a particular page to load
    • An application can modify the phrase part of a command to change the activation commands
    • Applications can recognise speech using complex grammars or simple word lists
  • 45. Summary and Next Steps…
    1 Get Ready to Become a Windows Phone Developer
      • Download the SDK at dev.windowsphone.com
      • Explore the Microsoft samples and start building apps in Visual Studio
    2 Learn More About Windows Phone Development via Official Microsoft Videos
      • Windows Phone 8 Jump Start Training: http://bit.ly/wp8jump
      • Windows Phone 8 Dev for Absolute Beginners: http://bit.ly/wp8devAB
    3 Check Out Additional Learning Resources
      • Pluralsight WP Training: www.pluralsight.com/training/Courses#windows-phone
      • Nokia Developer: www.developer.nokia.com
    4 Download Additional Resources & Become an Expert
      • Download the Windows Phone Toolkit: phone.codeplex.com
      • Nokia Developer Offers: http://bit.ly/nokiadevoffers
  • 46. Windows Phone Resources
    • Windows Phone Developer Blog: blogs.windows.com/windows_phone/b/wpdev
    • Windows Phone Consumer Blog: blogs.windows.com/windows_phone/b/windowsphone
    • Nokia WP Wiki: www.developer.nokia.com/Community/Wiki/Category:Windows_Phone
    • Nokia Dvlup Challenges & Rewards: www.dvlup.com
    • Nokia Conversations Blog: http://conversations.nokia.com
    • Microsoft App Studio: http://apps.windowsstore.com
    • Nick Landry’s Blog: ActiveNick.net
    • Windows Phone Developer Magazine (online): http://flip.it/95YFG
    • GeekChamp (WP & Win8 dev): www.geekchamp.com
    • Windows Phone Central (News): www.wpcentral.com
  • 47. Thank You!
    Slides and demos will be posted on SlideShare (see links below)
    Let me know how you liked this session. Your feedback is important and appreciated.
    • Blog: www.ActiveNick.net
    • Twitter: @ActiveNick
    • Email: activenick@mobility42.com
    • Mobile Apps: www.bigbaldapps.com
    • LinkedIn: www.linkedin.com/in/activenick
    • Website: www.mobility42.com
    • Slideshare: www.slideshare.net/ActiveNick