AT&T 2012 DevLab Speech API Deep Dive
Michael Owens
Speech given at the 2012 DevLab ( http://2012devlab.com ) about AT&T's Speech API.
AT&T 2012 DevLab Speech API Deep Dive
1.
09.25.2012
2.
September 25, 2012
AT&T SPEECH API DEEP DIVE
Michael Owens (@mko on Twitter, mowens on GitHub)
Jay Lieske (jay.lieske@att.com, jayatyp on GitHub)
AT&T Developer Program
© 2012 AT&T Intellectual Property. All rights reserved. AT&T and the AT&T logo are trademarks of AT&T Intellectual Property.
3.
WHAT IS THE AT&T SPEECH API?
4.
How the AT&T Speech API Works
5.
Powered by AT&T WATSON℠
• Developed over 20+ years
• Optimized for different usage scenarios:
  • Web Search
  • Business Search
  • Question & Answer
  • Voicemail-to-Text
  • Short Message (SMS)
  • TV Search/Remote (U-Verse)
  • Generic Speech-to-Text
6.
Simple Speech-to-Text
• One REST endpoint
• Accepts audio in WAV or AMR
• Structured JSON response
  • Text spoken by user
  • Metrics to evaluate recognition quality
• AT&T Native SDKs for Android and iOS handle audio capture and streaming
7.
Apps in the Wild
• AT&T Translator
• Speak4it
• U-Verse Easy Remote
8.
GETTING STARTED WITH THE AT&T SPEECH API
9.
Sign Up for API Access
• j.mp/ATTDevSignUp
• Free API Access for DevLab Attendees
• Detailed Instructions in your Attendee Packet
• Sign up with code “APILAB12”
• AT&T Staff is on hand to answer questions and help get you set up
10.
Before You Code
• Get your API Keys from the Developer Portal:
  • Client ID (“API Key” on the AT&T Developer Portal)
  • Client Secret (“Secret Key” on the AT&T Developer Portal)
• OAuth 2.0 client_credentials grant type
• OAuth 2.0 access_token
• Audio File Types:
  • AMR: narrowband, 12.2 kbit/s, 8 kHz sampling
  • WAV: 16 bit PCM WAV, single channel, 8 kHz sampling
• Audio File Length:
  • Voicemail: 4 minutes or less
  • Other: 1 minute or less
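The WAV requirements above (16 bit PCM, single channel, 8 kHz) can be checked locally before uploading. The following is a minimal sketch, not part of the original slides: `checkWavHeader` is a hypothetical helper, and it assumes the canonical 44-byte WAV layout in which the `fmt ` chunk immediately follows the RIFF header. Files with extra chunks before `fmt ` would need a real parser.

```javascript
// Hypothetical helper (not from the slides): checks a WAV header against the
// stated limits. Assumes the canonical layout: 'fmt ' chunk right after 'RIFF'/'WAVE'.
function checkWavHeader(buf) {
  const problems = [];
  if (buf.length < 44 ||
      buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE' ||
      buf.toString('ascii', 12, 16) !== 'fmt ') {
    return { ok: false, problems: ['not a canonical RIFF/WAVE header'] };
  }
  // Field offsets of the canonical PCM header (all little-endian).
  if (buf.readUInt16LE(20) !== 1) problems.push('audio format is not PCM');
  if (buf.readUInt16LE(22) !== 1) problems.push('not single channel');
  if (buf.readUInt32LE(24) !== 8000) problems.push('sample rate is not 8 kHz');
  if (buf.readUInt16LE(34) !== 16) problems.push('not 16 bits per sample');
  return { ok: problems.length === 0, problems };
}
```

A caller could read the first 44 bytes of the recorded file into a Buffer and refuse to upload when `ok` is false.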
11.
Step 1: Connect via OAuth
Request Method: POST
Request URL: https://api.att.com/oauth/token
Request Headers:
    Content-Type: application/x-www-form-urlencoded
Request Body:
    client_id=ATT_API_CLIENT_ID
    &client_secret=ATT_API_CLIENT_SECRET
    &grant_type=client_credentials
    &scope=SPEECH
Response Body:
    { "access_token": "xxyz123" }
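As a rough illustration of the token request above, the sketch below builds the same form-encoded POST in Node.js. The function names `buildTokenRequest` and `requestAccessToken` are hypothetical, and the second function assumes a runtime with a global `fetch` (Node.js 18 or later). The URL, field names, and scope are taken from the slide. Nothing is sent until `requestAccessToken` is called.

```javascript
// Hypothetical helper: describes the Step 1 request without sending it.
function buildTokenRequest(clientId, clientSecret) {
  const body = new URLSearchParams({
    client_id: clientId,
    client_secret: clientSecret,
    grant_type: 'client_credentials',
    scope: 'SPEECH',
  });
  return {
    url: 'https://api.att.com/oauth/token',
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: body.toString(),
  };
}

// Hypothetical helper: sends the request and returns the access_token field.
// Assumes global fetch is available (Node.js 18+).
async function requestAccessToken(clientId, clientSecret) {
  const { url, ...init } = buildTokenRequest(clientId, clientSecret);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error('Token request failed: HTTP ' + res.status);
  return (await res.json()).access_token;
}
```

Keeping the request description separate from the network call makes the encoding easy to inspect and test.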
12.
Step 2: POST Audio to AT&T (Non-Streaming HTTP Request)
Request Method: POST
Request URL: https://api.att.com/rest/1/SpeechToText
Request Headers:
    Accept: application/json
    Authorization: Bearer xxyz123
    Content-Type: audio/wav
    Content-Length: 1534
    X-SpeechContext: BusinessSearch
Request Body:
    AUDIO_BINARY_DATA
Note: The Audio Binary Data goes directly in the POST Body, not in a MIME Attachment.
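The non-streaming request above can be sketched as a plain request description in Node.js. `buildSpeechRequest` is a hypothetical name. The URL and header names come from the slide, and the audio Buffer is used as the raw body, with no multipart wrapping.

```javascript
// Hypothetical helper: describes the non-streaming Step 2 request.
// `audio` is a Buffer holding the complete WAV or AMR file.
function buildSpeechRequest(audio, accessToken, contentType, speechContext) {
  return {
    url: 'https://api.att.com/rest/1/SpeechToText',
    method: 'POST',
    headers: {
      'Accept': 'application/json',
      'Authorization': 'Bearer ' + accessToken,
      'Content-Type': contentType,            // 'audio/wav' or 'audio/amr'
      'Content-Length': String(audio.length), // byte length of the raw audio
      'X-SpeechContext': speechContext,       // e.g. 'BusinessSearch'
    },
    body: audio, // raw bytes, sent directly as the POST body
  };
}
```

For example, `buildSpeechRequest(fs.readFileSync('audio.wav'), token, 'audio/wav', 'BusinessSearch')` would produce headers matching the slide for a 1534-byte file.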
13.
Step 2: POST Audio to AT&T (Streaming HTTP Request)
Request Method: POST
Request URL: https://api.att.com/rest/1/SpeechToText
Request Headers:
    Accept: application/json
    Authorization: Bearer xxyz123
    Content-Type: audio/amr
    Transfer-Encoding: chunked
    X-SpeechContext: QuestionAndAnswer
Request Body:
    200
    AUDIO_BINARY_DATA_CHUNK
    200
    AUDIO_BINARY_DATA_CHUNK
    0
Note: Numbers are the recommended chunk size in hexadecimal format.
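To make the chunked body above concrete, the sketch below frames a Buffer the way HTTP/1.1 chunked transfer encoding does: each chunk is preceded by its size in hexadecimal and a CRLF, followed by a CRLF, and the body ends with a zero-length chunk. `frameChunks` is a hypothetical helper for illustration only. In practice Node's `http` module applies this framing itself when a request is written without a Content-Length. The default of 0x200 (512 bytes) mirrors the size shown on the slide.

```javascript
// Hypothetical helper: applies HTTP/1.1 chunked framing to an audio Buffer.
function frameChunks(audio, chunkSize = 0x200) {
  const parts = [];
  for (let offset = 0; offset < audio.length; offset += chunkSize) {
    const chunk = audio.subarray(offset, offset + chunkSize);
    // Chunk header: size in hex, then CRLF.
    parts.push(Buffer.from(chunk.length.toString(16) + '\r\n', 'ascii'));
    parts.push(chunk);
    parts.push(Buffer.from('\r\n', 'ascii'));
  }
  // A zero-length chunk terminates the body.
  parts.push(Buffer.from('0\r\n\r\n', 'ascii'));
  return Buffer.concat(parts);
}
```

The last data chunk is usually shorter than 0x200 bytes, so its size line differs from the others.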
14.
AT&T SPEECH API EXAMPLE APPLICATION
Download the Source: https://github.com/attdevsupport/2012DevLabExamples
15.
Transcription in Three Steps

1. Capture Audio Input
Capturing audio input differs from platform to platform.
In our Basic Example, we use a small Adobe Flex app to access the mic via Flash, capture the audio in one of the two accepted formats, then save that newly created audio file to disk on the server.
In our Speech Labs, we will look at the methods by which you can capture and stream audio directly to the Speech API.

2. POST Audio to AT&T
Once the audio input has been captured, we send the compatible audio file from our server to the Speech API using a simple POST.
In our Basic Example, we use a small Node.js module called “Watson.js” (NPM: “watson-js”) to authenticate with the Speech API via OAuth and then POST the audio file.
In our Speech Labs, we will do this on iOS, Android, and Web.

3. Use AT&T API Response
The AT&T API sends back an easy-to-parse JSON object with the interpreted text.
In our Basic Example, we output this to the user’s screen pretty-printed and syntax-highlighted, but you could do much more.
In our Speech Labs, we will look at other ways to use this data, like searching for businesses on Foursquare.
16.
Watson.js
Node.js API Wrapper for the AT&T Speech API
GitHub: http://github.com/mowens/watson-js/
NPM: https://npmjs.org/package/watson-js
17.
Using Watson.js

1. Require API Wrapper
    var WatsonClient = require('watson-js');

2. Set API Client Options
    var options = {
      client_id: ATT_API_CLIENT_ID,
      client_secret: ATT_API_CLIENT_SECRET,
      access_token: ACCESS_TOKEN,
      scope: "SPEECH",
      context: "Generic",
      access_token_url: "https://api.att.com/oauth/token",
      api_domain: "api.att.com"
    };

3. Instantiate New API Client
    var Watson = new WatsonClient.Watson(options);
18.
The Methods of Watson.js

Watson.getAccessToken(callback)
Requests a new OAuth Access Token using the Client Credentials grant type and passes the returned Access Token to the callback function.

Watson.speechToText(speechFile, accessToken, callback)
Pipes a speech file (given as an absolute file location) to the AT&T Speech API using the given access token. The API response is passed to the callback function as parsed JSON.
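The two methods above are typically chained: fetch a token, then submit the audio. The sketch below shows one way to do that. `transcribe` is a hypothetical helper, and it assumes that `getAccessToken` calls back with `(err, accessToken)` in the usual Node.js style, which the slide does not spell out. The client is passed in as a parameter so the function can be exercised with a stub.

```javascript
// Hypothetical helper: chains the two Watson.js calls described above.
// Assumption: getAccessToken's callback receives (err, accessToken).
function transcribe(watson, speechFile, callback) {
  watson.getAccessToken(function (err, accessToken) {
    if (err) {
      callback(err); // token request failed, so skip the speech request
      return;
    }
    // speechToText reports back through callback(err, reply).
    watson.speechToText(speechFile, accessToken, callback);
  });
}
```

With a real client this might be called as `transcribe(Watson, __dirname + '/audio.wav', function (err, reply) { ... })`.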
19.
AT&T SPEECH API EXAMPLE APP CODE WALKTHROUGH
Using the AT&T Speech API to convert generic audio to text in a web browser.
example-basic in the examples repo
20.
Frameworks & Requirements:

Server-side:
• Node.js: JavaScript platform for building fast, scalable network apps
• FS: Node.js File System module
• Express: Minimal web application framework for Node.js
• Optimist: Lightweight option parsing module for Node.js
• HBS: Express View Engine wrapper for Handlebars
• Watson.js: Simple API Wrapper for AT&T Speech API

Client-side:
• jQuery: The gold standard of client-side JavaScript libraries
• swfobject: JavaScript to make embedding Flash objects easier
• Bootstrap: Twitter’s CSS framework for quickly developing web apps
21.
Capture Audio Input
recorder.swf: Adobe Flex app that accesses the user’s microphone and emits events to JS
recorder.js: JavaScript interface to receive events, update UI, and POST file to Node.js
Node.js upload script:

    var fs = require('fs');

    // Copy a file by reading it fully into memory, then writing it back out
    function cp(source, destination, callback) {
      fs.readFile(source, function(err, buf) {
        if (err) { callback(err); return; }
        fs.writeFile(destination, buf, callback);
      });
    }

    // Receive the recorded audio from recorder.js and save it on the server
    app.post('/upload', function(req, res) {
      cp(req.files.upload_file.filename.path,
         __dirname + '/' + req.files.upload_file.filename.name,
         function(err) {
           if (err) { res.send({ error: err }); return; }
           res.send({ saved: 'saved' });
         });
    });
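The upload script above builds the destination path from a file name supplied by the client. As a hedged variation, the sketch below strips any directory components from that name before copying, and uses `fs.copyFile` (Node.js 8.5 or later) instead of reading the whole file into memory. `safeDestination` and `saveUpload` are hypothetical names and are not part of the example app.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: keeps only the final path segment of the client's name
// so that a name such as '../../etc/passwd' cannot escape the upload directory.
function safeDestination(uploadDir, clientName) {
  const base = path.basename(clientName);
  if (base === '' || base === '.' || base === '..') {
    throw new Error('invalid upload file name');
  }
  return path.join(uploadDir, base);
}

// Hypothetical helper: copies the temporary upload into the upload directory.
function saveUpload(tempPath, uploadDir, clientName, callback) {
  fs.copyFile(tempPath, safeDestination(uploadDir, clientName), callback);
}
```

In an Express handler, `saveUpload` would take the temporary path and original name from the uploaded file object and report the result in its callback.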
22.
POST Audio to AT&T
AJAX Request via POST from client side to Node.js

    // Receive an AJAX POST from client-side JavaScript
    app.post('/speechToText', function(req, res) {
      // Pass the audio file and access token to the AT&T Speech API.
      // The token is read from the client options set up earlier; inside an
      // Express handler, `this` does not refer to the Watson client.
      Watson.speechToText(__dirname + '/public/audio/audio.wav', options.access_token, function(err, reply) {
        // Pass any errors associated with the API call to client-side JS
        if (err) { res.send({ error: err }); return; }
        // Return the parsed JSON to client-side JavaScript
        res.send(reply);
      });
    });
23.
Use Speech API Response
Example API Response, returned from call using Content-Type of ‘application/json’:

    {
      "Recognition": {
        "ResponseId": "74a964bf2fe",
        "NBest": [ {
          "WordScores": [1, 0.75, 1, 0.75],
          "Confidence": 0.75,
          "Grade": "accept",
          "ResultText": "This is a test.",
          "Words": ["This", "is", "a", "test."],
          "LanguageId": "en-us",
          "Hypothesis": "This is a test."
        } ]
      }
    }

Response Parameter: What the Response Parameter Means
• Recognition: Body object for the AT&T Speech API Response
• ResponseId: Unique Identifier for a specific API call
• NBest: Array of hypothesis objects (possible transcriptions of audio data).
• ResultText: Plain-text, cleaned up representation of the Hypothesis. This should be used when displaying the text to users.
• Confidence: Confidence score for the overall Hypothesis. Scored on a scale from 0 (not confident) to 1.0 (very confident)
• Grade: Recommended action to take with the current Hypothesis: accept, reject, or confirm
• Words: Array of the individual words. Confidence scores for each word are available in the WordScores array.
• WordScores: Array of individual confidence scores for each word in the ResultText parameter. Corresponds to Words array.
• LanguageId: Representation of the response language. Supports English & Spanish in Generic; English-only in other contexts.
• Hypothesis: The raw transcription of the audio that was interpreted.
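One way to consume the response described above is to pick the hypothesis with the highest Confidence and flag individual words whose WordScores fall below a threshold. The sketch below does that with the sample values from the slide. `bestHypothesis` and `lowConfidenceWords` are hypothetical helpers, and the 0.8 threshold in the usage comment is an arbitrary illustration rather than a recommended value.

```javascript
// Sample reply, using the values shown on the slide.
const sampleReply = {
  Recognition: {
    ResponseId: '74a964bf2fe',
    NBest: [{
      WordScores: [1, 0.75, 1, 0.75],
      Confidence: 0.75,
      Grade: 'accept',
      ResultText: 'This is a test.',
      Words: ['This', 'is', 'a', 'test.'],
      LanguageId: 'en-us',
      Hypothesis: 'This is a test.',
    }],
  },
};

// Hypothetical helper: returns the NBest entry with the highest Confidence,
// or null when the reply contains no hypotheses.
function bestHypothesis(reply) {
  const nbest = reply && reply.Recognition && reply.Recognition.NBest;
  if (!Array.isArray(nbest) || nbest.length === 0) return null;
  return nbest.reduce((best, h) => (h.Confidence > best.Confidence ? h : best));
}

// Hypothetical helper: words whose per-word score is below the threshold.
// Relies on Words and WordScores being parallel arrays, as described above.
function lowConfidenceWords(hypothesis, threshold) {
  return hypothesis.Words.filter((word, i) => hypothesis.WordScores[i] < threshold);
}

// Usage: show ResultText to the user, and let Grade decide whether to
// accept the text, ask for confirmation, or reject it.
// const best = bestHypothesis(sampleReply);
// lowConfidenceWords(best, 0.8) lists the words worth double-checking.
```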
24.
Up Next: Michael Fitzpatrick
25.
Up Next:
Jason Goecke
Adam Kalsey
26.
ADVANCED EXAMPLES
What can you do with Speech-to-text? You could…
• Make your mobile or web application accessible with voice commands
• Post tweets using voice commands in a simple Twitter app
• Add on-the-fly transcripts while recording in a podcasting app
• Add captioning to videos hosted on your website automatically
• Create real-time closed captions of a conference speaker’s presentation
• Search for nearby places to check in at on Foursquare
27.
Speech Labs
We’re now going to break out into three clusters, each focusing on a different technology stack. Work independently or with a partner!

Web (Flex + Node.js)
In the Web Speech Lab, Michael will be on hand to help get your Node.js app working with the AT&T Speech API. Code up your own Speech API app from scratch, or start from a boilerplate app that uses Foursquare to search for locations and allows you to check in from your web browser!

iOS (Objective-C)
In the iOS Speech Lab, Brant will help you try out the AT&T Speech API on iOS and go into more depth about the AT&T Speech SDK for iOS. The mobile SDK allows you to quickly capture and stream audio from your iPhone or iPad app to the AT&T Speech API.

Android (Java)
In the Android Speech Lab, Jay will help you try out the AT&T Speech API on Android and go into more depth about the AT&T Speech SDK for Android. The mobile SDK allows you to quickly capture and stream audio from your Android phone or tablet app to the AT&T Speech API.
28.
September 25, 2012
THANKS! ANY QUESTIONS?
Michael Owens (@mko on Twitter, mowens on GitHub)
Jay Lieske (jay.lieske@att.com, jayatyp on GitHub)
AT&T Developer Program