September 25, 2012




AT&T SPEECH API DEEP DIVE
                      Michael Owens (@mko on Twitter, mowens on GitHub)
                      Jay Lieske (jay.lieske@att.com, jayatyp on GitHub)




AT&T Developer Program
© 2012 AT&T Intellectual Property. All rights reserved. AT&T and the AT&T logo are trademarks of AT&T Intellectual Property.
WHAT IS THE
    AT&T SPEECH API?




How the
    AT&T
    Speech
    API Works




Powered by AT&T WATSON℠
    • Developed over 20+ years
    • Optimized for different usage scenarios:
      • Web Search
      • Business Search
      • Question & Answer
      • Voicemail-to-Text
      • Short Message (SMS)
      • TV Search/Remote (U-Verse)
      • Generic Speech-to-Text
Simple Speech-to-Text
    • One REST endpoint
    • Accepts audio in WAV or AMR
    • Structured JSON response
       • Text spoken by user
       • Metrics to evaluate recognition quality
    • AT&T Native SDKs for Android and iOS
     handle audio capture and streaming




Apps in the Wild




    • AT&T Translator
    • Speak4it
    • U-Verse Easy Remote



GETTING STARTED
    WITH THE AT&T
    SPEECH API




Sign Up for API Access
    • j.mp/ATTDevSignUp
    • Free API Access for
     DevLab Attendees
    • Detailed Instructions in
     your Attendee Packet
    • Sign up with code
     “APILAB12”
    • AT&T Staff is on hand to
     answer questions and
     help get you set up

Before You Code
    • Get your API Keys from Developer portal:
      • Client ID (“API Key” on the AT&T Developer Portal)
      • Client Secret (“Secret Key” on the AT&T Developer Portal)
    • OAuth 2.0 client_credentials grant type
    • OAuth 2.0 access_token
    • Audio File Types:
      • AMR: narrowband, 12.2 kbits/s, 8 kHz sampling
      • WAV: 16 bit PCM WAV, single channel, 8 kHz sampling
    • Audio File Length:
      • Voicemail: 4 minutes or less
      • Other: 1 minute or less
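
The WAV constraints above can be checked before uploading. The following is a minimal sketch, assuming a canonical 44-byte RIFF header in which the "fmt " chunk comes first (real files may carry extra chunks). The function names `describeWav`, `isSpeechApiWav`, and `makeWavHeader` are hypothetical and are not part of any AT&T SDK.

```javascript
// Read the format fields from a canonical 44-byte WAV header.
// Returns null when the buffer does not look like such a header.
function describeWav(buf) {
  if (buf.length < 44 ||
      buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE' ||
      buf.toString('ascii', 12, 16) !== 'fmt ') {
    return null;
  }
  return {
    audioFormat: buf.readUInt16LE(20),   // 1 means uncompressed PCM
    channels: buf.readUInt16LE(22),
    sampleRate: buf.readUInt32LE(24),
    bitsPerSample: buf.readUInt16LE(34)
  };
}

// True only for 16-bit PCM, single channel, 8 kHz audio.
function isSpeechApiWav(buf) {
  var info = describeWav(buf);
  return info !== null &&
    info.audioFormat === 1 &&
    info.channels === 1 &&
    info.sampleRate === 8000 &&
    info.bitsPerSample === 16;
}

// Hypothetical helper that builds an empty PCM WAV header for the demo below.
function makeWavHeader(channels, sampleRate, bitsPerSample) {
  var buf = Buffer.alloc(44);
  buf.write('RIFF', 0, 'ascii');
  buf.writeUInt32LE(36, 4);              // RIFF size with zero data bytes
  buf.write('WAVE', 8, 'ascii');
  buf.write('fmt ', 12, 'ascii');
  buf.writeUInt32LE(16, 16);             // size of the PCM fmt chunk
  buf.writeUInt16LE(1, 20);              // PCM
  buf.writeUInt16LE(channels, 22);
  buf.writeUInt32LE(sampleRate, 24);
  buf.writeUInt32LE(sampleRate * channels * bitsPerSample / 8, 28);
  buf.writeUInt16LE(channels * bitsPerSample / 8, 32);
  buf.writeUInt16LE(bitsPerSample, 34);
  buf.write('data', 36, 'ascii');
  buf.writeUInt32LE(0, 40);
  return buf;
}

console.log(isSpeechApiWav(makeWavHeader(1, 8000, 16)));   // true
console.log(isSpeechApiWav(makeWavHeader(2, 44100, 16)));  // false
```

Checking the header locally is only a convenience; the Speech API remains the authority on what it accepts.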


Step 1: Connect via OAuth
    Request Method:  POST
    Request URL:     https://api.att.com/oauth/token
    Request Headers: Content-Type: application/x-www-form-urlencoded
    Request Body:    client_id=ATT_API_CLIENT_ID
                     &client_secret=ATT_API_CLIENT_SECRET
                     &grant_type=client_credentials
                     &scope=SPEECH

    Response Body:   {
                       "access_token": "xxyz123"
                     }
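
The token request above can be assembled in Node.js roughly as follows. This is a sketch, not an official client: `buildTokenRequest` and `parseAccessToken` are hypothetical helper names, and the credentials are the placeholders from the slide.

```javascript
// Build the form-encoded token request described on the slide.
function buildTokenRequest(clientId, clientSecret) {
  var body = new URLSearchParams({
    client_id: clientId,
    client_secret: clientSecret,
    grant_type: 'client_credentials',
    scope: 'SPEECH'
  }).toString();
  return {
    method: 'POST',
    url: 'https://api.att.com/oauth/token',
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      'Content-Length': Buffer.byteLength(body)
    },
    body: body
  };
}

// Pull the access token out of the JSON response body.
function parseAccessToken(responseText) {
  var parsed = JSON.parse(responseText);
  if (!parsed.access_token) {
    throw new Error('no access_token in response');
  }
  return parsed.access_token;
}

var tokenRequest = buildTokenRequest('ATT_API_CLIENT_ID', 'ATT_API_CLIENT_SECRET');
console.log(tokenRequest.body);
// client_id=ATT_API_CLIENT_ID&client_secret=ATT_API_CLIENT_SECRET&grant_type=client_credentials&scope=SPEECH
console.log(parseAccessToken('{"access_token": "xxyz123"}'));  // xxyz123
```

The sketch stops short of sending the request so that it runs without network access or real credentials.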




Step 2: POST Audio to AT&T
    (Non-Streaming HTTP Request)
    Request Method:  POST
    Request URL:     https://api.att.com/rest/1/SpeechToText
    Request Headers: Accept: application/json
                     Authorization: Bearer xxyz123
                     Content-Type: audio/wav
                     Content-Length: 1534
                     X-SpeechContext: BusinessSearch
    Request Body:    AUDIO_BINARY_DATA
    Note: The Audio Binary Data goes directly in POST Body, not a MIME Attachment.
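
A sketch of how the non-streaming request might be assembled in Node.js. `buildSpeechRequest` is a hypothetical helper name; the returned object follows the option shape accepted by Node's `https.request()`, and the header values are the ones shown on the slide.

```javascript
// Assemble request options for a non-streaming SpeechToText POST.
function buildSpeechRequest(audio, accessToken, contentType, speechContext) {
  return {
    method: 'POST',
    hostname: 'api.att.com',
    path: '/rest/1/SpeechToText',
    headers: {
      'Accept': 'application/json',
      'Authorization': 'Bearer ' + accessToken,
      'Content-Type': contentType,
      'Content-Length': audio.length,   // byte length of the raw audio
      'X-SpeechContext': speechContext
    }
  };
}

var audio = Buffer.alloc(1534);  // stand-in for real WAV bytes
var options = buildSpeechRequest(audio, 'xxyz123', 'audio/wav', 'BusinessSearch');
console.log(options.headers);

// Sending it is left as a comment because it needs network access and a valid token:
//   var req = require('https').request(options, function (res) { /* collect the JSON */ });
//   req.end(audio);  // the raw bytes are the whole body, with no multipart wrapper
```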


Step 2: POST Audio to AT&T
    (Streaming HTTP Request)
    Request Method:  POST
    Request URL:     https://api.att.com/rest/1/SpeechToText
    Request Headers: Accept: application/json
                     Authorization: Bearer xxyz123
                     Content-Type: audio/amr
                     Transfer-Encoding: chunked
                     X-SpeechContext: QuestionAndAnswer
    Request Body:    200
                     AUDIO_BINARY_DATA_CHUNK
                     200
                     AUDIO_BINARY_DATA_CHUNK
                     0
    Note: Each number is a chunk size in hexadecimal format. 200 (512 bytes) is
    the recommended chunk size; the final 0 ends the body.
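
To make the chunked body concrete, here is a sketch that frames a buffer the way HTTP/1.1 chunked transfer encoding lays it out on the wire, using the 0x200-byte chunk size from the slide. `frameChunked` is a hypothetical helper for illustration; Node's `http`/`https` modules normally apply this framing for you when a request is written without a Content-Length header.

```javascript
// Frame a buffer as an HTTP/1.1 chunked body:
// "<size in hex>\r\n<bytes>\r\n" per chunk, then "0\r\n\r\n" to finish.
function frameChunked(audio, chunkSize) {
  var parts = [];
  for (var offset = 0; offset < audio.length; offset += chunkSize) {
    var chunk = audio.subarray(offset, offset + chunkSize);
    parts.push(Buffer.from(chunk.length.toString(16) + '\r\n', 'ascii'));
    parts.push(chunk);
    parts.push(Buffer.from('\r\n', 'ascii'));
  }
  parts.push(Buffer.from('0\r\n\r\n', 'ascii'));  // zero-length chunk ends the body
  return Buffer.concat(parts);
}

// 1024 bytes of stand-in audio become two 0x200-byte chunks plus the terminator.
var framed = frameChunked(Buffer.alloc(1024, 0x61), 0x200);
console.log(JSON.stringify(framed.subarray(0, 5).toString('ascii')));  // "200\r\n"
console.log(framed.length);  // 1043
```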
AT&T SPEECH API
    EXAMPLE
    APPLICATION
    Download the Source:
    https://github.com/attdevsupport/2012DevLabExamples




Transcription in Three Steps
    1. Capture Audio Input
    Capturing audio input differs from platform to platform.
    In our Basic Example, we use a small Adobe Flex app to access the mic via Flash, capture the audio in one of the two accepted formats, then save that newly created audio file to disk on the server.
    In our Speech Labs, we will look at the methods by which you can capture and stream audio directly to the Speech API.

    2. POST Audio to AT&T
    Once the audio input has been captured, we send the compatible audio file from our server to the Speech API using a simple POST.
    In our Basic Example, we use a small Node.js module called “Watson.js” (NPM: “watson-js”) to authenticate with the Speech API via OAuth and then POST the audio file.
    In our Speech Labs, we will do this on iOS, Android, and Web.

    3. Use AT&T API Response
    The AT&T API sends back an easy-to-parse JSON object with the interpreted text.
    In our Basic Example, we output this to the user’s screen pretty-printed and syntax-highlighted, but you could do much more.
    In our Speech Labs, we will look at other ways to use this data, like searching for businesses on Foursquare.




Watson.js
    Node.js API Wrapper for the AT&T
    Speech API

     GitHub: http://github.com/mowens/watson-js/
     NPM: https://npmjs.org/package/watson-js




Using Watson.js
    1. Require API Wrapper
          var WatsonClient = require('watson-js');

    2. Set API Client Options
          var options = {
              client_id: ATT_API_CLIENT_ID,
              client_secret: ATT_API_CLIENT_SECRET,
              access_token: ACCESS_TOKEN,
              scope: "SPEECH",
              context: "Generic",
              access_token_url: "https://api.att.com/oauth/token",
              api_domain: "api.att.com"
           };

    3. Instantiate New API Client
          var Watson = new WatsonClient.Watson(options);

The Methods of Watson.js
    Watson.getAccessToken(callback)
    Requests a new OAuth Access Token using the Client Credentials
    grant type and passes the returned Access Token to the
    callback function.


    Watson.speechToText(speechFile, accessToken, callback)
    Pipes a speech file (given as an absolute file path) to the
    AT&T Speech API using the given access token. The API
    response is passed to the callback function as parsed JSON.
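
The two methods are typically chained: fetch a token, then transcribe. The sketch below shows that flow against a stand-in client so it runs without credentials. `transcribe` and `fakeClient` are hypothetical names, and the `(err, accessToken)` callback shape for `getAccessToken` is an assumption modeled on the `(err, reply)` shape used with `speechToText` in the example app; check the watson-js source before relying on it.

```javascript
// Fetch a token, then transcribe the file with it.
// Assumes both methods use Node-style (err, result) callbacks.
function transcribe(client, speechFile, callback) {
  client.getAccessToken(function (err, accessToken) {
    if (err) { callback(err); return; }
    client.speechToText(speechFile, accessToken, callback);
  });
}

// Stand-in client with the same method names, returning canned data.
var fakeClient = {
  getAccessToken: function (cb) { cb(null, 'xxyz123'); },
  speechToText: function (file, token, cb) {
    cb(null, { Recognition: { NBest: [{ ResultText: 'This is a test.' }] } });
  }
};

transcribe(fakeClient, '/tmp/audio.wav', function (err, reply) {
  if (err) { console.error(err); return; }
  console.log(reply.Recognition.NBest[0].ResultText);  // This is a test.
});
```

With the real module, the object returned by `new WatsonClient.Watson(options)` would take the place of `fakeClient`.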



AT&T SPEECH API
    EXAMPLE APP CODE
    WALKTHROUGH
    Using the AT&T Speech API to convert
    generic audio to text in a web browser.
    example-basic in the examples repo




Frameworks & Requirements:
    Server-side:
    • Node.js:    JavaScript platform for building fast, scalable network apps
    • FS:         Node.js File System module
    • Express:    Minimal web application framework for Node.js
    • Optimist:   Lightweight option parsing module for Node.js
    • HBS:        Express View Engine wrapper for Handlebars
    • Watson.js:  Simple API Wrapper for AT&T Speech API

    Client-side:
    • jQuery:     The gold standard of client-side JavaScript libraries
    • swfobject:  JavaScript to make embedding Flash objects easier
    • Bootstrap:  Twitter’s CSS framework for quickly developing web apps


Capture Audio Input
    recorder.swf:
            Adobe Flex app that accesses the user’s microphone and emits events to JS
    recorder.js:
            JavaScript interface to receive events, update UI, and POST file to Node.js
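    Node.js upload script:
            var fs = require('fs');
            var path = require('path');

            // Copy the uploaded temp file to its destination, reporting any error
            function cp(source, destination, callback) {
              fs.readFile(source, function(err, buf) {
                if (err) { callback(err); return; }
                fs.writeFile(destination, buf, callback);
              });
            }

            // `app` is the Express application created elsewhere in the example
            app.post('/upload', function(req, res) {
              var upload = req.files.upload_file.filename;
              cp(upload.path, path.join(__dirname, upload.name), function(err) {
                 // Pass any copy error back to client-side JS
                 if (err) { res.send({ error: err }); return; }
                 res.send({ saved: 'saved' });
              });
            });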

POST Audio to AT&T
    AJAX Request via POST from client side to Node.js
    // Receive an AJAX POST from client-side JavaScript
    app.post('/speechToText', function(req, res) {

      // Pass the audio file and access token to AT&T Speech API
      Watson.speechToText(__dirname + '/public/audio/audio.wav',
      this.access_token, function(err, reply) {

           // Pass any errors associated with API call to client-side JS
           if(err) { res.send({ error: err }); return; }

           // Return the parsed JSON to client-side JavaScript
           res.send(reply);
           return;

      });

    });


Use Speech API Response
    Example API Response, returned from call using Content-Type of
    ‘application/json’:

    {
      "Recognition": {
        "ResponseId": "74a964bf2fe",
        "NBest": [ {
          "WordScores": [1, 0.75, 1, 0.75],
          "Confidence": 0.75,
          "Grade": "accept",
          "ResultText": "This is a test.",
          "Words": ["This", "is", "a", "test."],
          "LanguageId": "en-us",
          "Hypothesis": "This is a test."
        } ]
      }
    }

    Response Parameter   What the Response Parameter Means
    Recognition          Body object for the AT&T Speech API Response
    ResponseId           Unique Identifier for a specific API call
    NBest                Array of hypothesis objects (possible transcriptions of audio data).
    ResultText           Plain-text, cleaned up representation of the Hypothesis. This should
                         be used when displaying the text to users.
    Confidence           Confidence score for the overall Hypothesis. Scored on a scale from
                         0 (not confident) to 1.0 (very confident)
    Grade                Recommended action to take with the current Hypothesis: accept,
                         reject, or confirm
    Words                Array of the individual words. Confidence scores for each word are
                         available in the WordScores array.
    WordScores           Array of individual confidence scores for each word in the
                         ResultText parameter. Corresponds to Words array.
    LanguageId           Representation of the response language. Supports English & Spanish
                         in Generic; English-only in other contexts.
    Hypothesis           The raw transcription of the audio that was interpreted.
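
A sketch of how the parsed response might be consumed. `summarizeRecognition` and the field names it returns are hypothetical, and treating the first NBest entry as the preferred hypothesis is an assumption; the sample object mirrors the example response on this slide.

```javascript
// Reduce a parsed Speech API reply to the fields an app is likely to display.
// Returns null when the reply carries no hypotheses.
function summarizeRecognition(reply) {
  var nbest = reply && reply.Recognition && reply.Recognition.NBest;
  if (!nbest || nbest.length === 0) { return null; }
  var best = nbest[0];  // assumption: first hypothesis is the preferred one
  // Words and WordScores correspond by index, so pair them up.
  var words = best.Words.map(function (word, i) {
    return { word: word, score: best.WordScores[i] };
  });
  return {
    text: best.ResultText,      // cleaned-up text meant for display
    grade: best.Grade,          // accept, reject, or confirm
    confidence: best.Confidence,
    words: words
  };
}

var sample = {
  Recognition: {
    ResponseId: '74a964bf2fe',
    NBest: [{
      WordScores: [1, 0.75, 1, 0.75],
      Confidence: 0.75,
      Grade: 'accept',
      ResultText: 'This is a test.',
      Words: ['This', 'is', 'a', 'test.'],
      LanguageId: 'en-us',
      Hypothesis: 'This is a test.'
    }]
  }
};

var summary = summarizeRecognition(sample);
console.log(summary.grade + ': ' + summary.text);  // accept: This is a test.
console.log(summary.words[1]);                     // { word: 'is', score: 0.75 }
```

An app could branch on `grade`, for example showing the text directly on accept and asking the user to verify it on confirm.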


Up Next:




                                     Michael Fitzpatrick

Up Next:




                                                          Jason Goecke
                                                           Adam Kalsey
ADVANCED
    EXAMPLES
    What can you do with Speech-to-text?
     You could…
     • Make your mobile or web application accessible with voice commands
     • Post tweets using voice commands in a simple Twitter app
     • Add on-the-fly transcripts while recording in a podcasting app
     • Add captioning to videos hosted on your website automatically
     • Create real-time closed captions of a conference speaker’s presentation
     • Search Foursquare for nearby places to check in




Speech Labs
    We’re now going to break out into three clusters, each focusing on a
    different technology stack. Work independently or with a partner!

    Web (Flex + Node.js)
    In the Web Speech Lab, Michael will be on hand to help get your Node.js app working with the AT&T Speech API. Code up your own Speech API app from scratch, or start from a boilerplate app that uses Foursquare to search for locations and allows you to check in from your web browser!

    iOS (Objective-C)
    In the iOS Speech Lab, Brant will help you try out the AT&T Speech API on iOS and go into more depth about the AT&T Speech SDK for iOS. The mobile SDK allows you to quickly capture and stream audio from your iPhone or iPad app to the AT&T Speech API.

    Android (Java)
    In the Android Speech Lab, Jay will help you try out the AT&T Speech API on Android and go into more depth about the AT&T Speech SDK for Android. The mobile SDK allows you to quickly capture and stream audio from your Android phone or tablet app to the AT&T Speech API.








THANKS! ANY QUESTIONS?
                      Michael Owens (@mko on Twitter, mowens on GitHub)
                      Jay Lieske (jay.lieske@att.com, jayatyp on GitHub)




Build an End-To-End IoT Example with AWS IoT Core (IOT211-R2) - AWS re:Invent...
 
Webinar: Identity Wars: The Unified Platform Awakens
Webinar: Identity Wars: The Unified Platform AwakensWebinar: Identity Wars: The Unified Platform Awakens
Webinar: Identity Wars: The Unified Platform Awakens
 
Multi-Network Location & SMS APIs
Multi-Network Location & SMS APIsMulti-Network Location & SMS APIs
Multi-Network Location & SMS APIs
 
Thadomal IEEE-HTML5-Workshop
Thadomal IEEE-HTML5-WorkshopThadomal IEEE-HTML5-Workshop
Thadomal IEEE-HTML5-Workshop
 
Near Real-time Outlier Detection and Interpretation - Part 1 by Robert Thorma...
Near Real-time Outlier Detection and Interpretation - Part 1 by Robert Thorma...Near Real-time Outlier Detection and Interpretation - Part 1 by Robert Thorma...
Near Real-time Outlier Detection and Interpretation - Part 1 by Robert Thorma...
 
Near Real-Time Outlier Detection and Interpretation
Near Real-Time Outlier Detection and InterpretationNear Real-Time Outlier Detection and Interpretation
Near Real-Time Outlier Detection and Interpretation
 

AT&T 2012 DevLab Speech API Deep Dive

  • 2. September 25, 2012: AT&T SPEECH API DEEP DIVE. Michael Owens (@mko on Twitter, mowens on GitHub) and Jay Lieske (jay.lieske@att.com, jayatyp on GitHub). AT&T Developer Program. © 2012 AT&T Intellectual Property. All rights reserved. AT&T and the AT&T logo are trademarks of AT&T Intellectual Property.
  • 3. WHAT IS THE AT&T SPEECH API?
  • 4. How the AT&T Speech API Works
  • 5. Powered by AT&T WATSON℠
      • In development for 20+ years
      • Optimized for different usage scenarios:
          • Web Search
          • Business Search
          • Question & Answer
          • Voicemail-to-Text
          • Short Message (SMS)
          • TV Search/Remote (U-Verse)
          • Generic Speech-to-Text
  • 6. Simple Speech-to-Text
      • One REST endpoint
      • Accepts audio in WAV or AMR
      • Structured JSON response
          • Text spoken by user
          • Metrics to evaluate recognition quality
      • AT&T Native SDKs for Android and iOS handle audio capture and streaming
  • 7. Apps in the Wild
      • AT&T Translator
      • Speak4it
      • U-Verse Easy Remote
  • 8. GETTING STARTED WITH THE AT&T SPEECH API
  • 9. Sign Up for API Access
      • j.mp/ATTDevSignUp
      • Free API Access for DevLab Attendees
      • Detailed Instructions in your Attendee Packet
      • Sign up with code “APILAB12”
      • AT&T Staff is on hand to answer questions and help get you set up
  • 10. Before You Code
      • Get your API Keys from the Developer Portal:
          • Client ID (“API Key” on the AT&T Developer Portal)
          • Client Secret (“Secret Key” on the AT&T Developer Portal)
      • OAuth 2.0 client_credentials grant type
      • OAuth 2.0 access_token
      • Audio File Types:
          • AMR: narrowband, 12.2 kbit/s, 8 kHz sampling
          • WAV: 16-bit PCM, single channel, 8 kHz sampling
      • Audio File Length:
          • Voicemail: 4 minutes or less
          • Other: 1 minute or less
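The WAV constraints listed on this slide (16-bit PCM, single channel, 8 kHz) can be checked before uploading. The following is a minimal sketch, not part of the original deck. It assumes the canonical WAV layout in which the "fmt " chunk starts at byte 12, and the names `describeWav`, `isSpeechApiWav`, and `makeWavHeader` are hypothetical helpers introduced only for illustration.

```javascript
// Sketch: read the format fields from a canonical RIFF/WAVE header.
// Assumption: the "fmt " chunk immediately follows the RIFF header
// (byte 12). Real files may order chunks differently.
function describeWav(buf) {
  if (buf.length < 36 ||
      buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE' ||
      buf.toString('ascii', 12, 16) !== 'fmt ') {
    return null; // not a canonical RIFF/WAVE header
  }
  return {
    audioFormat: buf.readUInt16LE(20),   // 1 means uncompressed PCM
    channels: buf.readUInt16LE(22),
    sampleRate: buf.readUInt32LE(24),
    bitsPerSample: buf.readUInt16LE(34)
  };
}

// True when the header matches the constraints stated on the slide.
function isSpeechApiWav(buf) {
  const info = describeWav(buf);
  return info !== null &&
    info.audioFormat === 1 &&
    info.channels === 1 &&
    info.sampleRate === 8000 &&
    info.bitsPerSample === 16;
}

// Hypothetical helper: builds a header-only PCM WAV buffer for a demo.
function makeWavHeader(channels, sampleRate, bitsPerSample) {
  const buf = Buffer.alloc(44);
  buf.write('RIFF', 0, 'ascii');
  buf.writeUInt32LE(36, 4);                 // RIFF chunk size (no audio data)
  buf.write('WAVE', 8, 'ascii');
  buf.write('fmt ', 12, 'ascii');
  buf.writeUInt32LE(16, 16);                // size of the PCM fmt chunk
  buf.writeUInt16LE(1, 20);                 // PCM
  buf.writeUInt16LE(channels, 22);
  buf.writeUInt32LE(sampleRate, 24);
  buf.writeUInt32LE(sampleRate * channels * bitsPerSample / 8, 28);
  buf.writeUInt16LE(channels * bitsPerSample / 8, 32);
  buf.writeUInt16LE(bitsPerSample, 34);
  buf.write('data', 36, 'ascii');
  buf.writeUInt32LE(0, 40);                 // zero bytes of audio data
  return buf;
}

console.log(isSpeechApiWav(makeWavHeader(1, 8000, 16)));
console.log(isSpeechApiWav(makeWavHeader(2, 44100, 16)));
```

In an application, the buffer would come from reading the recorded file (for example with `fs.readFile`) rather than from `makeWavHeader`. The check covers format only; the length limits on the slide would need a separate check.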
  • 11. Step 1: Connect via OAuth
      Request Method: POST
      Request URL: https://api.att.com/oauth/token
      Request Headers:
          Content-Type: application/x-www-form-urlencoded
      Request Body:
          client_id=ATT_API_CLIENT_ID
          &client_secret=ATT_API_CLIENT_SECRET
          &grant_type=client_credentials
          &scope=SPEECH
      Response Body:
          { "access_token": "xxyz123" }
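The token request on this slide could be assembled in Node.js roughly as follows. This is a sketch using only Node's built-in `https` and `querystring` modules. The URL, form fields, and response shape are taken from the slide, while the function names `buildTokenRequest` and `requestAccessToken` and the `(err, token)` callback shape are assumptions made for illustration.

```javascript
const https = require('https');
const querystring = require('querystring');

// Builds the request described on the slide. Nothing is sent here.
function buildTokenRequest(clientId, clientSecret) {
  const body = querystring.stringify({
    client_id: clientId,
    client_secret: clientSecret,
    grant_type: 'client_credentials',
    scope: 'SPEECH'
  });
  return {
    options: {
      method: 'POST',
      hostname: 'api.att.com',
      path: '/oauth/token',
      headers: {
        'Content-Type': 'application/x-www-form-urlencoded',
        'Content-Length': Buffer.byteLength(body)
      }
    },
    body: body
  };
}

// Sends the request and calls callback(err, accessToken).
function requestAccessToken(clientId, clientSecret, callback) {
  const request = buildTokenRequest(clientId, clientSecret);
  const req = https.request(request.options, function (res) {
    let raw = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { raw += chunk; });
    res.on('end', function () {
      let token;
      try {
        token = JSON.parse(raw).access_token;
      } catch (err) {
        return callback(err);
      }
      callback(null, token);
    });
  });
  req.on('error', callback);
  req.end(request.body);
}

// Inspect the form body without touching the network.
console.log(buildTokenRequest('ATT_API_CLIENT_ID', 'ATT_API_CLIENT_SECRET').body);
```

Keeping request construction separate from sending makes the form body easy to inspect and test without network access.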
  • 12. Step 2: POST Audio to AT&T (Non-Streaming HTTP Request)
      Request Method: POST
      Request URL: https://api.att.com/rest/1/SpeechToText
      Request Headers:
          Accept: application/json
          Authorization: Bearer xxyz123
          Content-Type: audio/wav
          Content-Length: 1534
          X-SpeechContext: BusinessSearch
      Request Body:
          AUDIO_BINARY_DATA
      Note: The Audio Binary Data goes directly in the POST Body, not in a MIME Attachment.
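As a rough illustration of the non-streaming request, the sketch below builds the headers from the slide and sends the raw audio bytes as the request body. It uses only Node's built-in `https` module. The helper names `buildSpeechRequest` and `postAudio` are hypothetical, and error handling is kept minimal.

```javascript
const https = require('https');

// Builds request options matching the headers shown on the slide.
// `audio` is a Buffer holding the complete WAV or AMR file.
function buildSpeechRequest(accessToken, audio, contentType, speechContext) {
  return {
    method: 'POST',
    hostname: 'api.att.com',
    path: '/rest/1/SpeechToText',
    headers: {
      'Accept': 'application/json',
      'Authorization': 'Bearer ' + accessToken,
      'Content-Type': contentType,
      'Content-Length': audio.length,
      'X-SpeechContext': speechContext
    }
  };
}

// Sends the audio and calls callback(err, parsedReply).
function postAudio(accessToken, audio, contentType, speechContext, callback) {
  const options = buildSpeechRequest(accessToken, audio, contentType, speechContext);
  const req = https.request(options, function (res) {
    const parts = [];
    res.on('data', function (chunk) { parts.push(chunk); });
    res.on('end', function () {
      let reply;
      try {
        reply = JSON.parse(Buffer.concat(parts).toString('utf8'));
      } catch (err) {
        return callback(err);
      }
      callback(null, reply);
    });
  });
  req.on('error', callback);
  req.end(audio); // raw bytes as the body, not a multipart attachment
}

// Inspect the headers for a 1534-byte payload without sending anything.
console.log(buildSpeechRequest('xxyz123', Buffer.alloc(1534), 'audio/wav', 'BusinessSearch').headers);
```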
  • 13. Step 2: POST Audio to AT&T (Streaming HTTP Request)
      Request Method: POST
      Request URL: https://api.att.com/rest/1/SpeechToText
      Request Headers:
          Accept: application/json
          Authorization: Bearer xxyz123
          Content-Type: audio/amr
          Transfer-Encoding: chunked
          X-SpeechContext: QuestionAndAnswer
      Request Body:
          200
          AUDIO_BINARY_DATA_CHUNK
          200
          AUDIO_BINARY_DATA_CHUNK
          0
      Note: The numbers are the recommended chunk size in hexadecimal format.
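The request body on this slide follows standard HTTP/1.1 chunked transfer encoding: each chunk is preceded by its size in hexadecimal, and a zero-length chunk ends the body. The sketch below frames a buffer by hand to show where the `200` (hexadecimal for 512 bytes) and the final `0` come from. The function name `frameChunked` is hypothetical.

```javascript
// Sketch: frame `audio` (a Buffer) as an HTTP/1.1 chunked body.
// Each chunk is "<size in hex>\r\n<data>\r\n"; "0\r\n\r\n" ends the body.
function frameChunked(audio, chunkSize) {
  const parts = [];
  for (let offset = 0; offset < audio.length; offset += chunkSize) {
    const chunk = audio.subarray(offset, offset + chunkSize);
    parts.push(Buffer.from(chunk.length.toString(16) + '\r\n', 'ascii'));
    parts.push(chunk);
    parts.push(Buffer.from('\r\n', 'ascii'));
  }
  parts.push(Buffer.from('0\r\n\r\n', 'ascii'));
  return Buffer.concat(parts);
}

// Two 512-byte chunks, as on the slide: the first framed line is "200".
const framed = frameChunked(Buffer.alloc(0x200 * 2, 0x61), 0x200);
console.log(framed.toString('latin1').split('\r\n')[0]);
```

In practice this framing rarely needs to be written by hand. Node's `http` and `https` clients apply chunked encoding automatically when a request is written in pieces without a `Content-Length` header, so calling `req.write()` once per 512-byte slice and then `req.end()` would produce an equivalent body.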
  • 14. AT&T SPEECH API EXAMPLE APPLICATION
      Download the Source: https://github.com/attdevsupport/2012DevLabExamples
  • 15. Transcription in Three Steps
      1. Capture Audio Input
          Capturing audio input differs from platform to platform.
          In our Basic Example, we use a small Adobe Flex app to access the mic via Flash, capture the audio in one of the two accepted formats, then save that newly created audio file to disk on the server.
          In our Speech Labs, we will look at the methods by which you can capture and stream audio directly to the Speech API.
      2. POST Audio to AT&T
          Once the audio input has been captured, we send the compatible audio file from our server to the Speech API using a simple POST.
          In our Basic Example, we use a small Node.js module called “Watson.js” (NPM: “watson-js”) to authenticate with the Speech API via OAuth and then POST the audio file.
          In our Speech Labs, we will do this on iOS, Android, and Web.
      3. Use AT&T API Response
          The AT&T API sends back an easy-to-parse JSON object with the interpreted text.
          In our Basic Example, we output this to the user’s screen pretty-printed and syntax-highlighted, but you could do much more.
          In our Speech Labs, we will look at other ways to use this data, like searching for businesses on Foursquare.
  • 16. Watson.js
      Node.js API Wrapper for the AT&T Speech API
      GitHub: http://github.com/mowens/watson-js/
      NPM: https://npmjs.org/package/watson-js
  • 17. Using Watson.js
      1. Require API Wrapper
          var WatsonClient = require('watson-js');
      2. Set API Client Options
          var options = {
            client_id: ATT_API_CLIENT_ID,
            client_secret: ATT_API_CLIENT_SECRET,
            access_token: ACCESS_TOKEN,
            scope: "SPEECH",
            context: "Generic",
            access_token_url: "https://api.att.com/oauth/token",
            api_domain: "api.att.com"
          };
      3. Instantiate New API Client
          var Watson = new WatsonClient.Watson(options);
  • 18. The Methods of Watson.js
      • Watson.getAccessToken(callback)
          Requests a new OAuth Access Token using the Client Credentials grant type and passes the returned Access Token to the callback function.
      • Watson.speechToText(speechFile, accessToken, callback)
          Pipes a speech file (passed as an absolute file location) to the AT&T Speech API using the given access token. The API response is passed to the callback function as parsed JSON.
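The two methods are typically chained: fetch a token, then submit the file. The sketch below shows that flow against any object exposing the two method names from the slide. The wrapper name `transcribe` is hypothetical, and the `(err, accessToken)` callback shape for `getAccessToken` is an assumption, since the slide does not spell out the callback arguments. The stub client exists only so the sketch runs without the real module or network access.

```javascript
// Sketch: chain the two Watson.js calls named on the slide.
// Assumption: getAccessToken calls back with (err, accessToken).
function transcribe(client, speechFile, callback) {
  client.getAccessToken(function (err, accessToken) {
    if (err) {
      return callback(err);
    }
    client.speechToText(speechFile, accessToken, callback);
  });
}

// Stub client for demonstration only; in an application this would be
// the object created with `new WatsonClient.Watson(options)`.
const stubClient = {
  getAccessToken: function (callback) {
    callback(null, 'xxyz123');
  },
  speechToText: function (speechFile, accessToken, callback) {
    callback(null, { file: speechFile, token: accessToken });
  }
};

transcribe(stubClient, '/tmp/audio.wav', function (err, reply) {
  console.log(err || reply);
});
```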
  • 19. AT&T SPEECH API EXAMPLE APP CODE WALKTHROUGH
      Using the AT&T Speech API to convert generic audio to text in a web browser.
      example-basic in the examples repo
  • 20. Frameworks & Requirements
      Server-side:
          • Node.js: JavaScript platform for building fast, scalable network apps
          • FS: Node.js File System module
          • Express: Minimal web application framework for Node.js
          • Optimist: Lightweight option parsing module for Node.js
          • HBS: Express View Engine wrapper for Handlebars
          • Watson.js: Simple API Wrapper for AT&T Speech API
      Client-side:
          • jQuery: The gold standard of client-side JavaScript libraries
          • swfobject: JavaScript to make embedding Flash objects easier
          • Bootstrap: Twitter’s CSS framework for quickly developing web apps
  • 21. Capture Audio Input
      • recorder.swf: Adobe Flex app that accesses the user’s microphone and emits events to JS
      • recorder.js: JavaScript interface to receive events, update UI, and POST file to Node.js
      • Node.js upload script:
          var fs = require('fs');

          // Copy the uploaded temp file to its destination.
          function cp(source, destination, callback) {
            fs.readFile(source, function(err, buf) {
              // Stop here if the source file could not be read
              if (err) { callback(err); return; }
              fs.writeFile(destination, buf, callback);
            });
          }

          // `app` is the Express application created elsewhere in the example
          app.post('/upload', function(req, res) {
            cp(req.files.upload_file.filename.path,
               __dirname + req.files.upload_file.filename.name,
               function(err) {
                 // Report a failed copy instead of claiming success
                 if (err) { res.send({ error: err }); return; }
                 res.send({ saved: 'saved' });
               });
          });
  • 22. POST Audio to AT&T
      AJAX Request via POST from client side to Node.js
          // Receive an AJAX POST from client-side JavaScript
          app.post('/speechToText', function(req, res) {
            // Pass the audio file and access token to AT&T Speech API
            Watson.speechToText(__dirname + '/public/audio/audio.wav', this.access_token, function(err, reply) {
              // Pass any errors associated with API call to client-side JS
              if(err) { res.send({ error: err }); return; }
              // Return the parsed JSON to client-side JavaScript
              res.send(reply);
              return;
            });
          });
  • 23. Use Speech API Response
      Example API Response, returned from call using Content-Type of ‘application/json’:
          {
            "Recognition": {
              "ResponseId": "74a964bf2fe",
              "NBest": [ {
                "WordScores": [1, 0.75, 1, 0.75],
                "Confidence": 0.75,
                "Grade": "accept",
                "ResultText": "This is a test.",
                "Words": ["This", "is", "a", "test."],
                "LanguageId": "en-us",
                "Hypothesis": "This is a test."
              } ]
            }
          }
      Response Parameter: What the Response Parameter Means
          • Recognition: Body object for the AT&T Speech API Response
          • ResponseId: Unique Identifier for a specific API call
          • NBest: Array of hypothesis objects (possible transcriptions of audio data).
          • ResultText: Plain-text, cleaned up representation of the Hypothesis. This should be used when displaying the text to users.
          • Confidence: Confidence score for the overall Hypothesis. Scored on a scale from 0 (not confident) to 1.0 (very confident)
          • Grade: Recommended action to take with the current Hypothesis: accept, reject, or confirm
          • Words: Array of the individual words. Confidence scores for each word are available in the WordScores array.
          • WordScores: Array of individual confidence scores for each word in the ResultText parameter. Corresponds to Words array.
          • LanguageId: Representation of the response language. Supports English & Spanish in Generic; English-only in other contexts.
          • Hypothesis: The raw transcription of the audio that was interpreted.
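One way to consume this response is to pick the highest-confidence entry in `NBest` and use `WordScores` to flag individual words worth double-checking. The sketch below works on the example response from the slide. The helper names `bestHypothesis` and `lowConfidenceWords` and the 0.8 threshold are illustrative assumptions, not part of the API.

```javascript
// Example response copied from the slide.
const exampleReply = {
  Recognition: {
    ResponseId: '74a964bf2fe',
    NBest: [{
      WordScores: [1, 0.75, 1, 0.75],
      Confidence: 0.75,
      Grade: 'accept',
      ResultText: 'This is a test.',
      Words: ['This', 'is', 'a', 'test.'],
      LanguageId: 'en-us',
      Hypothesis: 'This is a test.'
    }]
  }
};

// Returns the NBest entry with the highest Confidence, or null if none.
function bestHypothesis(reply) {
  const nbest = reply && reply.Recognition && reply.Recognition.NBest;
  if (!Array.isArray(nbest) || nbest.length === 0) {
    return null;
  }
  return nbest.reduce(function (best, candidate) {
    return candidate.Confidence > best.Confidence ? candidate : best;
  });
}

// Returns the words whose per-word score falls below `threshold`.
function lowConfidenceWords(hypothesis, threshold) {
  return hypothesis.Words.filter(function (word, index) {
    return hypothesis.WordScores[index] < threshold;
  });
}

const best = bestHypothesis(exampleReply);
console.log(best.ResultText, best.Grade);
console.log(lowConfidenceWords(best, 0.8));
```

An application would typically branch on `Grade` here: display `ResultText` on accept, ask the user to confirm on confirm, and prompt for a retry on reject.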
  • 24. Up Next: Michael Fitzpatrick
  • 25. Up Next: Jason Goecke, Adam Kalsey
  • 26. ADVANCED EXAMPLES
      What can you do with Speech-to-Text? You could…
          • Make your mobile or web application accessible with voice commands
          • Post tweets using voice commands in a simple Twitter app
          • Add on-the-fly transcripts while recording in a podcasting app
          • Add captioning to videos hosted on your website automatically
          • Create real-time closed captions of a conference speaker’s presentation
          • Search for nearby places to check in at on Foursquare
  • 27. Speech Labs
      We’re now going to break out into three clusters, each focusing on a different technology stack. Work independently or with a partner!
          • Web (Flex + Node.js): In the Web Speech Lab, Michael will be on hand to help get your Node.js app working with the AT&T Speech API. Code up your own Speech API app from scratch, or start from a boilerplate app that uses Foursquare to search for locations and lets you check in from your web browser!
          • iOS (Objective-C): In the iOS Speech Lab, Brant will help you try out the AT&T Speech API on iOS and go into more depth about the AT&T Speech SDK for iOS. The mobile SDK allows you to quickly capture and stream audio from your iPhone or iPad app to the AT&T Speech API.
          • Android (Java): In the Android Speech Lab, Jay will help you try out the AT&T Speech API on Android and go into more depth about the AT&T Speech SDK for Android. The mobile SDK allows you to quickly capture and stream audio from your Android phone or tablet app to the AT&T Speech API.
  • 28. THANKS! ANY QUESTIONS? September 25, 2012. Michael Owens (@mko on Twitter, mowens on GitHub) and Jay Lieske (jay.lieske@att.com, jayatyp on GitHub). AT&T Developer Program