A
REPORT
ON
SPEECH MANAGEMENT SYSTEM DESIGN
By
Name of the Student ID No.
Trishu Dey 2011B1A7689G
AT
[24]7 ILabs, Bangalore
A Practice II Station of
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI
June, 2016
A
REPORT
ON
SPEECH MANAGEMENT SYSTEM DESIGN
By
Name of the student(s) ID No.(s) Discipline(s)
Trishu Dey 2011B1A7689G B.E. Computer Science
Prepared in partial fulfillment of the
Practice School II Course
AT
[24]7 ILabs, Bangalore
A Practice II Station of
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI
June, 2016
ACKNOWLEDGEMENTS
First and foremost, I would like to thank P.V. Kannan, the founder and CEO of [24]7, for
building this wonderful company, which gave me ample opportunities to learn and hone my
programming skills. Further, without the help and guidance of Mr. Subbarao Chunduri, manager,
and my mentor Mrs. Smrti Atrey, Tech Lead of the Speech Platform team at [24]7 ILabs, I would
have had a very difficult time adapting here. As my mentor, she has already taught me a great
deal about performing well in a professional environment like this. They constantly inspired
me to take up continuous challenges and provided me with the required study materials and
tips on the compilation of this report. I would like to thank my entire service team for their
constant guidance and support, and for helping me get acquainted with the work area. I would
also like to show my gratitude towards Mr. Vineet Garg, the Practice School faculty in charge
of [24]7 ILabs. He ensured that we settled in quickly at the organization and took care of all
the difficulties we would otherwise have faced.
Abstract Sheet
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE
PILANI (RAJASTHAN)
Practice School Division
Station: [24]7 ILabs Centre: Bangalore
Duration: 5 months Date of Start: 18th January, 2016
Date of Submission: 8th June, 2016
Title of the Project: Speech management system design
ID No./Name(s): 2011B1A7689G Trishu Dey
Discipline(s) of the Student: B.E Computer Science
Name and Designation of the expert: Mrs. Smrti Atrey, Tech Lead, Speech Platform, [24]7 ILabs
Name(s) of the PS Faculty: Mr. Vineet Garg
Key Words: Utterance, Pod, DC, Sisyphus, SV1, VA1, TPC, provDB
Project Areas: Analytics
Abstract: The [24]7 ILabs Speech team carries out Speech IVR analytics and quality testing.
Constant improvements are made to the existing systems to make them more time- and
space-efficient. A huge amount of data is processed on a regular basis, and managing time is a
key factor driving the extraction.
My project on Utterance Capture deals with acquiring the data from the pod in as efficient a
manner as possible and re-designing the existing model to make it more efficient. I am also
required to generate the grammar for a caller and design a GUI to access both. It involves the
use of tools like Java, MySQL, Unix, Ruby on Rails, JavaScript etc.
Signature(s) of Student(s) Signature of PS Faculty
Trishu Dey
Date: 8th June, 2016
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE
PILANI (RAJASTHAN)
PRACTICE SCHOOL DIVISION
Response Option Sheet
Station: [24]7 ILabs Center: Bangalore
ID No. &Name(s): 2011B1A7689G Trishu Dey
Title of the Project: Speech management system design
Usefulness of the project to the on-campus courses of study in various disciplines. Project
should be scrutinized keeping in view the following response options. Write Course No. and
Course Name against the option under which the project comes.
Refer Bulletin for Course No. and Course Name.
Code No. Response Option Course No.(s) & Name
1. A new course can be designed out of this
project.
NO
2. The project can help modification of the course
content of some of the existing Courses
NO
3. The project can be used directly in some of the
existing Compulsory Discipline Courses
(CDC)/ Discipline Courses Other than
Compulsory (DCOC)/ Emerging Area (EA),
etc. Courses
OOP (Object Oriented
Programming)
4. The project can be used in preparatory courses
like Analysis and Application Oriented Courses
(AAOC)/ Engineering Science (ES)/ Technical
Art (TA) and Core Courses.
NO
5. This project cannot come under any of the
above mentioned options as it relates to the
professional work of the host organization.
NO
Signature of Student Signature of Faculty
Trishu Dey
Date: 8th June, 2016
NO DUES CERTIFICATE
PS II station at: [24]7 ILabs Date: 11th March, 2016
Name: Trishu Dey ID No. : 2011B1A7689G
will be completing his/her Practice School Program in December. In case he/she has any dues,
please report it below against your name. In case he/she has no dues, please write NO DUES
and sign.
1. Organization Coordinator : ______________________________________
2. Professional Expert : ______________________________________
3. Librarian : ______________________________________
4. Accounts Section : ______________________________________
5. PS Faculty : ______________________________________
6. Any Other : ______________________________________
7. Any Other : ______________________________________
Signature of PS Faculty
Table of Contents
S. no. Title Page No.
ACKNOWLEDGEMENT III
ABSTRACT IV
RESPONSE OPTION SHEET V
NO DUES CERTIFICATE VI
1 INTRODUCTION 9
1.1 [24]7 ILabs Company Overview 9
1.2 Self Services – [24]7 Speech Platform 12
2 UTTERANCE CAPTURE 13
2.1 Existing Utterance Capture Design 13
2.2 Design limitation 14
2.3 Issues other than the design limitation 15
2.4 Proposed New Design for Utterance capture 16
2.4.1 Highlights of the new infrastructure 16
2.4.2 Diagrammatic representation 17
3 UTTERANCE CAPTURE MANAGEMENT 19
3.1 Creating the Graphical User Interface 19
3.1.1 Home Page 19
3.1.2 Utterance Capture Page 21
3.2 Saving Parameters into the Database 25
3.3 Reading the Parameters from the Database 27
3.4 Getting the Configuration 28
3.5 XML Parsing to get Host list 28
3.6 Secure Copy for Utterance files 29
3.6.1 Generating telbox-uuid mapping 29
3.6.2 Secure copy to local system 29
3.7 Generating Grammar 32
3.7.1 Mysql queries for data 33
3.7.2 XML file generation 36
3.8 Exception Handling 37
3.9 Email notification 38
4 CONCLUSION 39
5 APPENDIX 40
6 REFERENCES 43
1. Introduction:
1.1 [24]7 ILabs Company Overview:
[24]7 Innovation Labs is renowned for delivering customer engagement solutions which enable
clients to interact with their own business from any possible place, across any possible channel/
device without ever having to restart. This unique Customer Engagement Platform makes omni-
channel (across-channels) journey possible. It is a cloud-based platform which powers each of
the products of ILabs, thus transforming data to create predictive omni-channel experiences in
customer service and sales.
The various departments at [24]7 are namely, product engineering, design, data science group
and service delivery. These departments at [24]7 ILabs are described below:
Product Engineering: This team is responsible for development and deployment of software for
the clients. The Product Engineering team is associated with providing business and technology
solutions to the clients. The operations agents of the contact centers that outsource voice and chat
service to clients make use of these products to provide a rich user experience.
Design: This team is focused on end-user experience analysis and design. Design stands at
the core of [24]7 software products. The team analyzes user responses to different UIs and
then selects and refines the best to increase sales. Different designs are continuously
evaluated, and the one which best suits the requirements and needs is chosen.
Data Science Group: The DSG focuses on predictive analytics, data modeling, real-time
decision making, and machine learning at scale. The team aims at making proper use of the data
available and directs it to the right outcome by using predictive analysis.
Service Delivery: This team deals with the clients and ensures that a proper set of standards
guides product development through all phases of the software development cycle. It lays down
the constraints to which all products must adhere in order to ensure rich customer
engagement. The company also operates contact centers that outsource both voice and chat-agent
services, for sales and support, to companies worldwide. Its largest customers are in the
telecommunications, financial services, retail, insurance and travel industries.
Together, these departments deliver the spectrum of products and appliances that make up
[24]7. The following is a brief overview of the products developed by [24]7 ILabs:
[24]7 Customer Engagement Platform: This cloud-based platform supports all the other
products of [24]7. It is based on the Anticipate, Simplify and Learn framework. The coming
together of these elements in a single platform makes self-service and agent-assisted service in
and across web, mobile, chat, social, and phone channels predictive and more intuitive. Machine
learning along with big data makes it possible to anticipate a customer’s intent in real time,
providing predictive omnichannel experiences across channels, devices, locations and time.
This makes the customer experience more intuitive and agents more effective, and consequently
impacts the strategic and operational metrics that matter most.
Self-Service: Customers may choose to serve themselves (self-service is the most commonly
selected platform) through digital self-service at any time, on any channel and device, as per
their convenience. The self-service solutions, namely [24]7 Virtual Agent and [24]7 Speech (the
team assigned to me as a part of PS-II), are integrated on the [24]7 Customer Engagement
Platform.
[24]7 Speech can be defined as a multimodal self-service solution which has reinvented IVR
using new and powerful digital capabilities which are designed to enable users to self-serve
effectively.
[24]7 Assist: For most customers the first choice to start their journey is self-service. If they
need the help of a live agent, an integrated self-service and assisted-service channel enables
users to seamlessly transition to chat without having to restart. Users expect the company
to maintain the context of their problem and infer their intent. Assist provides an intelligent,
interactive platform for live chat and voice assistance that focuses on the entire end-user
experience. It leverages analytics and optimized contact center software to make agents more
effective.
[24]7 Social: This is a suite of apps to effectively add a social influence to customers’ purchase
and service journeys, and to drive higher brand reach and sales. Customers can share their
experiences on the social platform and provide insights into the quality of service provided to
other users.
[24]7 Mobile: The aim of the Mobile is to provide more engaging mobile experiences. This is
done by providing intelligent mobile apps, visual IVR and mobile chat solutions. Thus customer
engagement is increased whether the customer journey involves self-service, assisted service or
both.
[24]7 Virtual Agent: This is the latest software product. This product involves automation of the
process of delivering correct answers to customer questions posed across a variety of interaction
channels. The Virtual Agent [VA] technology is next generation self-service software. The
product involves driving profitable online conversations with seamless omni-channel customer
service, capturing key voice of the customer insights and delivering relevant offers that improve
conversion and customer engagement.
1.2 Self Services – [24]7 Speech Platform
The main aim of Speech is to make the customer's experience with IVR more engaging and
digitized. It makes use of a cloud-based platform to provide self-service solutions to clients.
Prediction and digital customer engagement have helped increase the self-service rates for the
clients.
Key roles:
The IVR has been made web-aware. By tracking the customer's web journey, [24]7 can detect
whether the customer is on the website during the call. Customers can use both the website and
the phone at the same time, enabling better user satisfaction and faster solutions.
To make the speech recognition system more accurate, [24]7's natural language technology is
put in place. It is enhanced by advanced deep neural networks (DNNs), which draw from over
10 billion utterances. At [24]7, a single natural-language intelligence layer spans both speech
and text-based interactions to derive meaning from what has been said and/or typed. This model
provides a low cost of ownership and can be applied across platforms like IVR, mobile apps,
and virtual agents together to deliver a consistent customer experience. To accurately predict a
caller's intent and simplify their experience, customer data and real-time journey data are
combined. This has enabled [24]7 Speech to adaptively learn, track, and tailor engagement to
the needs of the individual customer. Unlike legacy IVRs, [24]7 Speech enables companies to
improve IVR performance significantly: [24]7 has achieved increases in self-service rates of up
to 20% and reductions in IVR call duration of up to 30%.
Visual IVR: Vivid Speech is the mobile solution for IVR. It combines visual display, touch, and
speech in IVR interactions to make the IVR less cumbersome and more interactive for the
customers.
IVR to Chat: By integrating [24]7 Speech with [24]7 Chat, [24]7 is able to transition phone calls
to a chat on a mobile device. Qualified IVR callers are served with mobile chat, and the rest
within the IVR.
2. Utterance Capture
Utterance capture deals with capturing the words spoken by a customer, a.k.a. utterances,
during an on-call activity, that is, when the person has chosen Speech IVR for the fulfilment of
his/her service. This data is captured by [24]7 ILabs on a regular basis, whereupon it undergoes
further processing steps to extract meaningful data from the raw captured data.
2.1 Existing Utterance Capture Design
A cron job runs nightly_portal.pl. The streams configured for capture are read from MySQL
tables, and the respective utterance files are then copied from the pod. The MySQL tables are
updated, and further activities follow to verify successful copying and downstream processing.
2.2 Design limitation
In the current scenario, the “Transcribeme” host begins the process of capturing utterances for a
particular day after 22:00 hours on the following day. For example, to procure utterance data for
the 11th of March, the “Transcribeme” host starts its processing job on the 12th of March at
22:00 hours and usually takes more than 4-5 hours to complete the entire process.
Now, as per the design specifications of the Telpod from which the data is being copied, it
retains the data for a particular date only for a period of three days. So as the Telpod enters the
13th of March, it automatically starts clearing all the data related to the 11th of March.
In the best-case scenario (a successful attempt), the data is properly copied. But in the worst-case
scenario, that is, if the Transcribeme host somehow fails to copy data from the Telpod to its local
environment, then by the time the issue comes to the support team's notice the Telpod will
already have cleared the data for that particular day, making it impossible to retrieve any
information related to that day and therefore preventing any further actions that could have
extracted meaningful information from the data.
This is the main limitation that turns any “Transcribeme” host failure into a severe issue.
2.3 Issues other than the design limitation
 No error-triggering mechanism: Currently, the automation design does not trigger a
notification in case of any error on the “Transcribeme” host. The current logging is not enough
to identify the occurrence of a failure. The design needs to be enhanced in such a way that a
failure scenario can be identified from the log.
 The Sisyphus job (which works on the data procured by Transcribeme) only catches the
failure scenario, which is not of much use for production purposes. The event needs to be caught
before Sisyphus starts its job, so that someone can rerun (if possible) the utterance capture on
the Transcribeme host.
 The database has no clean-up mechanism. A huge amount of data is already present in the
database, and with every passing day it keeps building up. This creates junk data in the
database and on the hard disk, and will cause problems in future: there may not be enough
storage for this data, and processes may become even slower trying to process it.
 There exists a locking problem when multiple threads are run for a stream on the
“Transcribeme” host, which also creates a failure scenario.
 The “Transcribeme” host does not have any alternative and/or standby setup. So, if the
“Transcribeme” host malfunctions, no other host can replace it to continue the smooth
functioning of utterance capture.
 The “Transcribeme” host and its database residing in SV2 have created this timing issue.
 The entire infrastructure is based on the extranet; if the process moves from the extranet to
TPC (which is an ongoing consideration), this design will not work.
 The capture infrastructure is based on DBlog, which may create problems if the process
moves from DBlog to Big Data.
2.4 Proposed New Design for Utterance capture
As can be seen from sections 2.2 and 2.3 above, there are multiple issues with the existing
model. Thus, an alternative is of utmost importance to make the system more efficient, both by
improving the time and success rate of the process and by providing a robust failure detection
mechanism.
2.4.1 Highlights of the new infrastructure
 SV1 and VA1 will each have their own, separate capture host instead of one single
“Transcribeme” host. This avoids the network latency of cross-data-center access.
 DBlog will be used to get data.
 Clients can configure streams on TPC instead of the portal, and will no longer require
multiple portals for configuration and fetching. A front-end module will be created for accessing
all the facilities from a single place, which reduces maintenance costs and is easier for the
client.
 Better fault tolerance through controlled time sync, so the data loss problem faced with the
current system can be curtailed. This system will ensure we do not lose any valuable data.
 Better logging for debugging, and automatic rerunning of the job for a failed date. As already
mentioned in the section above, the current logging mechanism does not reveal anything about a
failure, and a failure is not noticed until it is too late and all the data has been erased.
 Better control of the total maximum utterance capture.
2.4.2 Diagrammatic representation
In this infrastructure capture host will run in all DCs (SV1 and VA1) and contact its own DC
pod. This will reduce all the network latency.
[Diagram: SV1 and VA1 each run their own capture host, which contacts its own DC's Telpod
and DBlog. Stream configuration flows from the TPC host into provDB; captured data is written
to the NFS path in each DC, from which Sisyphus jobs and admin users perform the read and
write operations.]
Through TPC, the client admin can configure streams just as they currently do through
portal.tellme.com. To support this, a new TPC page is required where clients can configure
streams and watch their status. As the TPC backend uses provDB, all configuration information
will go into provDB and be fetched by the required hosts. A new table will be required in
provDB.
The Capture Host is the heart of this solution. The host reads the configured streams from
provDB for the previous day and, through the Index-Older host, gets all the desired UUIDs and
their respective pods. The host groups all UUIDs for a particular pod and performs a secure
copy from the pod to a local path.
Before copying the utterances and grammar from the pod, the capture host zips all utterances
into a temporary path on the pod and syncs them to the capture host. After a successful copy,
the capture host removes the temporary zipped folder from the pod.
Each DC has its own NFS device. A Sisyphus job takes care of copying data from the capture
host to the NFS path, combined into a user-readable folder.
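The zip-then-copy-then-cleanup sequence described above can be sketched as shell commands assembled in Java. All host names, paths and the "capture" user here are illustrative assumptions, not the production values:

```java
// Sketch of the pod-side zip + secure-copy + cleanup sequence.
// Paths, host names and the "capture" user are illustrative assumptions.
public class UtteranceCopySketch {

    // Run on the pod: zip the day's utterances into a temporary folder.
    static String buildZipCommand(String tmpDir, String date) {
        return "zip -r " + tmpDir + "/utt_" + date + ".zip /var/utterances/" + date;
    }

    // Run on the capture host: pull the archive down over scp.
    static String buildScpCommand(String pod, String tmpDir, String date, String localPath) {
        return "scp capture@" + pod + ":" + tmpDir + "/utt_" + date + ".zip " + localPath;
    }

    // Run on the pod after a successful copy: remove the temporary archive.
    static String buildCleanupCommand(String tmpDir, String date) {
        return "rm -f " + tmpDir + "/utt_" + date + ".zip";
    }

    public static void main(String[] args) {
        String date = "20160311";
        System.out.println(buildZipCommand("/tmp/capture", date));
        System.out.println(buildScpCommand("telpod-sv1", "/tmp/capture", date, "/data/utterances/"));
        System.out.println(buildCleanupCommand("/tmp/capture", date));
    }
}
```

In the real system these commands would be executed remotely (for example via ssh and a ProcessBuilder); the sketch only shows the order of operations the text describes.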
3. Utterance Capture Management
The project was divided into two parts:
 Back-End: (Java, MySQL, JDBC)
o Creating a system that can copy user data from various hosts and organize them
into a folder structure for further processing.
o Generating grammar for utterances of the users isolated in the previous step.
 Front-End: (Ruby on Rails, JavaScript, HTML, CSS)
o Creating a GUI that can conditionally access the backend.
Henceforth, I will explain the front-end prior to the backend for easy flow of thoughts.
3.1 Creating the Graphical User Interface
3.1.1 Home Page
Creating the GUI, or Graphical User Interface, required Ruby on Rails, JavaScript, HTML and
CSS. All four languages were used to add the required module to a pre-existing codebase. The
next page shows an image of the Platform Central Home Page which is visible to the user. As a
part of this project, I extended the pre-existing code to make the back end accessible through
the GUI.
The screenshot above shows the home page after logging into Platform Central. The menu bar
contains options to navigate to the respective features as required by the user. The field
'Utterance Capture' was added as a part of my code. The user can access the utterance page by
clicking on 'Utterance Capture' in the menu bar or by clicking on the cog wheel that represents
utterances. Both lead to the same URL, namely /utterances. All the generated paths and redirect
URLs were added to the codebase. The home page itself was pre-coded, and as a part of this
project the utterance feature was added to all the files related to the creation of the home page.
3.1.2 Utterance Capture Page
When the user clicks on 'Utterance Capture' in the menu bar or on the cog wheel, they are
redirected to '/utterances', which is represented by the screenshot shown above. This page has
been designed to take the user's search inputs and feed them into the database, from which they
are later read by the back-end code.
A brief overview of the inputs taken and their importance in the backend code:
1. Request Name
This can be any name the user wants to give to his/her request. It does not serve any purpose
from a coding perspective. It may be empty, a name, a combination of name and number,
special characters, or a combination of all of these.
2. Time Period
This field specifies the time period for which the code has to run. It has been provided with a
calendar-style input: when the user clicks on the date, a calendar pops up so that an appropriate
date can be selected. This ensures the input date format remains the same, i.e. yyyymmdd,
which makes back-end processing much easier. Even though a calendar is provided to the
client, all dates entered are internally checked to avoid any foul play.
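The server-side re-check of the yyyymmdd date can be sketched as below; the class and method names are illustrative, not the production code:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

// Illustrative re-validation of the yyyymmdd date coming from the calendar widget.
public class CaptureDateCheck {

    private static final DateTimeFormatter YYYYMMDD =
            DateTimeFormatter.ofPattern("uuuuMMdd").withResolverStyle(ResolverStyle.STRICT);

    // True only for a well-formed yyyymmdd date that is not in the future.
    static boolean isValidCaptureDate(String s) {
        if (s == null || s.length() != 8) return false;
        try {
            LocalDate d = LocalDate.parse(s, YYYYMMDD);
            return !d.isAfter(LocalDate.now());
        } catch (DateTimeParseException e) {
            return false; // rejects impossible dates such as month 13
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidCaptureDate("20160311")); // well-formed past date
        System.out.println(isValidCaptureDate("20161341")); // month 13: rejected
    }
}
```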
3. Utterances
This subheading houses two fields, maximum utterances and maximum utterances per day.
These two put a limit on the number of users whose data will be copied to the machine or
whose grammar will be generated. If they are not specified, an upper limit defined within the
code is used for extracting the data.
4. Advanced Options
(i) DNIS: This field provides a drop-down menu showing all the phone/SIP numbers
available to the logged-in client. These numbers represent the number on which the
call was placed. Multiple selection is possible for this field, and all the selected
DNIS values are considered while acquiring the data from the server. The drop-down
restriction ensures that a user cannot enter invalid or inaccessible numbers outside
the client's privileges. As with a standard drop-down menu, it is a scrollable list
housing all the SIPs/numbers available to the client. The list is also internally
sorted to make searching for a number easier and faster.
(ii) ANI: This field specifies the phone/SIP numbers from which the calls were placed.
Their validity is also internally checked. A text area has been provided for ANI
because it can take multiple inputs. The area also supports dragging: the user can
drag the corner of the box and resize it to any size he/she wants. It also scrolls
when the input exceeds the predefined area.
(iii) LOGTAG: When a call is placed on the host, several tags are added throughout
the call to track its progress. For example, if a caller chooses option 'x' from an
array of options, a tag for option 'x' is added to remember the user's choice. When a
client is searching for data, he/she can specify the logtag equivalent to 'x', and
only the calls in which the user selected option 'x' will be copied, consistent with
the other fields. A text area has been provided for LOGTAG because it can take
multiple inputs. The area also supports dragging and resizing, and scrolls when the
input exceeds the predefined area.
5. Expected Output
Three check boxes have been provided for the client to choose the desired outputs: grammar,
utterances and devlog. A user can select all of them, or any combination of the three boxes.
The selection determines whether the underlying code will be executed or not.
Note: Devlog is a future prospect and has not been implemented as a part of this project.
Selecting or de-selecting devlog will not make any difference to the procedure followed by
the backend.
6. Search
The search button serves two purposes. First, it is kept under the HTML form tag, so as
soon as search is clicked all the related JavaScript and controller (Ruby) files read the data
written in the boxes.
Once all the data is gathered, it is pushed into the database, distributed among 4 different
tables to maintain a normalized table structure. The tables are utterance_details,
dnis_requests, ani_requests and logtag_requests.
utterance_details houses all the data apart from dnis, ani and logtag, since multiple values
are possible for those fields.
After pushing the data into utterance_details, the auto-incremented request_id is acquired by
running a query on the database. This id serves as a foreign key for the tables dnis_requests,
ani_requests and logtag_requests.
Second, the page is self-rendered, so that after search is pressed it loads the original page
again instead of trying to load an undefined URL.
3.2 Saving parameters into the Database
The details for the search entered by the user are fed to a database. This database is then accessed
by the underlying code for further processing. For the purpose of maintaining normalization, four
separate tables are created and data is fed into all of them.
Table 1: ‘utterance_details’
The id field in this table is an auto-generated value, as can be seen from the Extra field of its
description. The request name, maximum utterances per day and maximum utterances are put
into the fields request_name, max_utt_per_day and max_utterances respectively, without any
modification. grammar, utterances and devlog have only two possible values, true and false;
these fields keep track of which boxes were checked in the request.
The start and end times are of the form int(10) instead of DATETIME because the date,
before being pushed to the table, is converted to a Unix timestamp, which is of the form
int(10).
For the three tables that follow, the auto-generated id of the utterance_details table is used as
a foreign key. To access the auto-generated key from the Ruby on Rails code, a query is run
on INFORMATION_SCHEMA. The query is as follows:
“SELECT auto_increment FROM information_schema.tables WHERE table_schema =
'<database_name>' AND table_name = 'utterance_details';”
This query returns the auto-increment value that the named table will use next. Therefore, the
returned value minus 1 is the id of the row just inserted. The id needed to be auto-generated
because, in case of deleted entries, previously used request ids should not be issued again.
This value is then passed on to the following three tables as a foreign key.
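The lookup described here can be sketched as below; the database name is a placeholder, and the "next value minus one" rule is the report's approach:

```java
// Sketch of the request-id lookup described above. The schema name is a
// placeholder assumption; in the real system it comes from configuration.
public class RequestIdLookup {

    // The INFORMATION_SCHEMA query from the text, built for a given schema.
    static String buildAutoIncrementQuery(String dbName) {
        return "SELECT auto_increment FROM information_schema.tables"
             + " WHERE table_schema = '" + dbName + "'"
             + " AND table_name = 'utterance_details'";
    }

    // information_schema reports the id the NEXT insert will use, so the id
    // of the row just inserted is that value minus one.
    static long lastInsertedId(long nextAutoIncrement) {
        return nextAutoIncrement - 1;
    }

    public static void main(String[] args) {
        System.out.println(buildAutoIncrementQuery("utterance_db"));
        System.out.println(lastInsertedId(42)); // if next is 42, the last row got 41
    }
}
```

As an aside, MySQL also offers SELECT LAST_INSERT_ID() on the same connection, which avoids the race where another insert lands between the insert and this lookup; the information_schema approach is shown because it is what the report describes.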
Table 2: ‘dnis_requests’
This table consists of three fields. The first one is an auto-generated id which serves as the
PRIMARY KEY for the table, as all other fields can have multiple entries with the same value.
The MySQL description shows dnis declared as MUL, which means multiple values are
possible for this field. As an example, suppose that for request id x the client selected 3 DNIS
values a, b and c. To populate this table, x is first acquired by running the above-mentioned
query; then three separate entries are made in the dnis_requests table: x --> a, x --> b and
x --> c.
Table 3: ‘ani_requests’
This table consists of three fields. The first one is an auto-generated id which serves as the
PRIMARY KEY for the table, as all other fields can have multiple entries with the same value.
The MySQL description shows ani declared as MUL, which means multiple values are
possible for this field. As an example, suppose that for request id x the client selected 3 ANI
values a, b and c. To populate this table, x is first acquired by running the above-mentioned
query; then three separate entries are made in the ani_requests table: x --> a, x --> b and
x --> c.
Table 4: ‘logtag_requests’
This table consists of three fields. The first one is an auto-generated id which serves as the
PRIMARY KEY for the table, as all other fields can have multiple entries with the same value.
The MySQL description shows logtag declared as MUL, which means multiple values are
possible for this field. As an example, suppose that for request id x the client selected 3 logtag
values a, b and c. To populate this table, x is first acquired by running the above-mentioned
query; then three separate entries are made in the logtag_requests table: x --> a, x --> b and
x --> c.
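The fan-out of one request into the three child tables (x --> a, x --> b, x --> c) can be sketched as below. The table and column names follow the report; the statement builder is illustrative, and real code would use parameterized statements:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative fan-out of one request id into a child table: for request id x
// and values {a, b, c}, three rows x->a, x->b, x->c are produced.
public class ChildTableInserts {

    static List<String> buildInserts(String table, String column, long requestId, List<String> values) {
        List<String> rows = new ArrayList<>();
        for (String v : values) {
            // In real code this would be a PreparedStatement with ? placeholders.
            rows.add("INSERT INTO " + table + " (request_id, " + column + ") VALUES ("
                    + requestId + ", '" + v + "')");
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> stmts = buildInserts("dnis_requests", "dnis", 7, Arrays.asList("a", "b", "c"));
        stmts.forEach(System.out::println); // three INSERT statements, one per value
    }
}
```

The same pattern applies unchanged to ani_requests and logtag_requests; only the table and column names differ.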
3.3 Reading the Parameters from the Database
From here onwards, the backend code is explained. The coding language used from this point
onwards is Java, and all the acronyms are with respect to Java.
The first goal of the backend code was to gather from the database all the data that the user had
requested. A JDBC connection through Java was used to connect to the MySQL database.
Java Database Connectivity (JDBC) is an application programming interface (API) for the
Java programming language which defines how a client may access a database. It is part of the
Java Standard Edition platform from Oracle Corporation.
The data present in the utterance_details table is extracted for a particular date range. The date
can be sent in as an argument while running the code; otherwise, the code takes the system date
for the previous day. A list of all the request ids is created for that date, and each request id is
then used as a key to procure all the dnis, ani and logtag values for that request.
Apart from these values, the grammar and utterances flags are also procured to ensure that the
correct portion of the code is run. If both are true, both scripts will be run; if the
client selectively chose the options, only that particular portion of the code will be run. As
already mentioned, devlog has not been implemented as a part of this project, and no checks have
been provided in the backend code for selecting devlog. If grammar and utterances are both
unselected, the code will interpret, irrespective of devlog, that no output was selected and no
further processing will be done. The data structure used for storing all these values was the
LinkedList, because of its flexibility of size. Since it is not known how many dnis, ani etc.
could be present, and theoretically the count could range from zero upwards, a LinkedList is
much more suitable than an array: memory is allocated only as needed, which avoids wasting
space on an over-sized array or overrunning an under-sized one.
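The parameter-gathering step above can be sketched as follows. This is a minimal illustration: the table name utterance_details comes from the text, but the column names and the query string are hypothetical, and the JDBC part is shown only as comments since it needs a live database.

```java
import java.time.LocalDate;

public class RequestFetcher {
    // If no date argument is supplied, default to the previous day's system date.
    static String resolveDate(String[] args) {
        return args.length > 0 ? args[0] : LocalDate.now().minusDays(1).toString();
    }

    // Hypothetical query; the real column names are not reproduced here.
    static String buildQuery(String date) {
        return "SELECT request_id FROM utterance_details WHERE request_date = '" + date + "'";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(resolveDate(args)));
        // With a live JDBC Connection, the ids would be collected into a LinkedList:
        // List<String> requestIds = new LinkedList<>();
        // ResultSet rs = conn.createStatement().executeQuery(buildQuery(date));
        // while (rs.next()) requestIds.add(rs.getString("request_id"));
    }
}
```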
3.4 Getting the Configuration
To generalize the code and make it easily maintainable, a configuration file was
created with all the configurable parameters, for example the database username and
password, host url etc. Since these values differ by environment (production, dev, qa),
it is efficient to keep such data configurable.
To accomplish this task, InputStreamReader and BufferedReader were used, and
Class.getResourceAsStream() was used to open the file. This was done because FileReader
cannot access files that are packaged into a jar; since this code had to run in a Unix environment
in the form of a jar, FileReader could not be used.
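A minimal sketch of the configuration reading, assuming a simple key=value file format (the project's actual format and key names are not reproduced here). The demo feeds the parser from an in-memory stream; in the real code the stream would come from Class.getResourceAsStream() so that it works inside a jar.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

public class Config {
    // Parse key=value lines from any stream (a classpath resource in practice).
    static Map<String, String> load(InputStream in) {
        Map<String, String> props = new HashMap<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = r.readLine()) != null) {
                int eq = line.indexOf('=');
                if (eq > 0) props.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Real usage (works inside a jar, unlike FileReader):
        // InputStream in = Config.class.getResourceAsStream("/config.properties");
        InputStream demo = new ByteArrayInputStream("db.user=app\ndb.host=localhost".getBytes());
        System.out.println(load(demo));
    }
}
```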
3.5 XML Parsing to get Host list
The list of all the hosts on which the code will be processed was provided by an API that
returned data in XML format. The response therefore needed to be parsed to make it useful for the
project.
DocumentBuilderFactory is a built-in Java class that performs the task of parsing XML data.
The javax.xml.parsers.DocumentBuilderFactory class defines a factory API that enables
applications to obtain a parser that produces DOM object trees from XML documents. It extends
the Object class. Parsing returns the tags of an XML file in the form of a NodeList. The NodeList,
also a predefined Java interface, provides the abstraction of an ordered collection of nodes,
without defining or constraining how this collection is implemented. NodeList objects in the
DOM are live. The items in the NodeList are accessible via an integral index. The Element
interface was used to access all the nodes in the NodeList and store the ones that are important
from the project perspective. Each host runs the same code to find user data within itself.
Depending on the output selected, either grammar will be generated or utterance files will be
copied, or both.
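The parsing step might look like the following sketch. The &lt;host&gt; tag name and the sample XML are assumptions; the real API's schema is not reproduced here.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.util.LinkedList;
import java.util.List;

public class HostListParser {
    // Extract the text of every <host> element from the API response.
    static List<String> parseHosts(String xml) {
        List<String> hosts = new LinkedList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes()));
            NodeList nodes = doc.getElementsByTagName("host");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);   // each item is accessed via its index
                hosts.add(e.getTextContent());
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return hosts;
    }

    public static void main(String[] args) {
        // prints [tel01, tel02]
        System.out.println(parseHosts("<hosts><host>tel01</host><host>tel02</host></hosts>"));
    }
}
```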
3.6 Secure Copy for Utterance files
When the user connects to the IVR, the user's spoken words, a.k.a. utterances, are recorded
and stored in the system for further analysis. If the client selected 'utterances' in the
'expected output' division of the GUI, this part of the code is executed. The primary
goal is to go to the telbox that contains the files for a particular user and copy them to the local
machine. To achieve this, the first step is to establish a relation between the telbox and the user id.
3.6.1 Generating telbox-uuid mapping
Telbox is the place where all the utterance files for a user are stored. The users of a
particular time frame are spread across all (theoretically) the available telboxes, and it is not a
one-to-one mapping: one telbox may host multiple uuids, but one
uuid will be present in only one telbox. Thus, the mapping generated is many-to-one.
Since the telbox is used as a key and a Java HashMap cannot hold duplicate keys,
every telbox is associated with a LinkedList of uuids.
The data pertaining to telbox and uuid is also maintained in MySQL, so a JDBC
connection was established to obtain this data, which was then stored in the above-mentioned
format. The details of the JDBC connection have already been given in section 3.3, and
the same steps were followed in acquiring this data. For the query generation, the
parameters obtained in section 3.3 were used as a filtering criterion, to
reduce the bulk of data and create a more concise output.
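The many-to-one mapping can be sketched as below. The row data is illustrative, standing in for the result set of the MySQL query; each telbox key accumulates a LinkedList of the uuids it hosts.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

public class TelboxMap {
    // Build telbox -> list-of-uuids from (telbox, uuid) rows: a many-to-one mapping.
    static Map<String, LinkedList<String>> build(String[][] rows) {
        Map<String, LinkedList<String>> map = new HashMap<>();
        for (String[] row : rows) {            // row = {telbox, uuid}
            map.computeIfAbsent(row[0], k -> new LinkedList<>()).add(row[1]);
        }
        return map;
    }

    public static void main(String[] args) {
        String[][] rows = {{"telbox1", "uuid-a"}, {"telbox1", "uuid-b"}, {"telbox2", "uuid-c"}};
        // telbox1 hosts two uuids; each uuid appears under exactly one telbox
        System.out.println(build(rows));
    }
}
```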
3.6.2 Secure copy to local system
The primary goal of this task was to secure copy (scp) folders containing user utterance
files into the local system for further processing. The files are present, as already
mentioned, as folders in various directories in different telboxes.
The steps followed to copy data into the local system, for each uuid present in the linked
list associated with each telbox:
• Log into the host using ssh.
• Find the folder that has the corresponding uuid as a substring.
• Secure copy it to the local system.
Since the code has to run in a Unix environment, Unix commands need to be run from
the Java code. There are multiple ways to achieve this; the ones suited to each
code excerpt have been used here.
Unix commands have been run on the system using the Runtime.exec() or ProcessBuilder
class of Java.
Runtime.exec(): java.lang.Runtime.exec(String command) is a method of a predefined
Java class that enables us to run any Unix command from our Java application.
ProcessBuilder: there are certain commands for which Runtime.exec() does not work
properly, so this class was also used. ProcessBuilder is also a predefined Java class;
its start() method creates a new Process instance with the configured attributes.
Both of these expose an InputStream that can be read using the BufferedReader class in
Java, and the output was used as required.
Once the telbox and uuid are finalized, the first step is to ssh login into the telbox and
find a path for the user utterance file. The uuid is the key in finding the path.
Secure Shell (SSH), sometimes known as Secure Socket Shell, is a UNIX-based command
interface and protocol for securely getting access to a remote computer. SSH uses RSA public
key cryptography for both connection and authentication. The entire SSH setup including public
and private key generation was done for the purpose of this project.
Finding the folder to copy was implemented with a simple find query in the parent
directory, which returns a path as a result.
Query: find . -name '*<uuid>'
This query returns a path as output if the files are present, or nothing otherwise. If a path is
returned, the code proceeds to the next step; else it logs a negative search result and moves
on to the next available telbox-uuid pair.
Both steps, the ssh login and the find, were implemented together using the
ProcessBuilder class, because the commands do not retain state when run individually:
once one command has executed, any other command runs independently of it, whereas
for this find to succeed it must run inside the ssh session. Thus, they were executed
together.
The next step after finding the path to the folder is to copy it to the local system. Secure
copy, or SCP, is a means of securely transferring computer files between a local host and a remote
host, or between two remote hosts. It is based on the Secure Shell (SSH) protocol.
The path received from the above query was used to do a secure copy into the local
system; for this command Runtime.exec() was used, as explained above. All
the copied folders were moved into a separate common folder specific to the
date for which the code was executed. This common folder was then tarred and
gzipped, and the originally created folders were deleted for security purposes.
This completes the secure copy of utterance files, and the code proceeds to
check whether 'grammar' was selected by the user. If it wasn't, the code terminates; else it
continues with the methods mentioned in the following sections.
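The ssh/find/scp pipeline described above can be sketched as follows. The host name, uuid and paths are hypothetical, and runFirstLine() will only succeed on a machine where passwordless ssh to the target host is configured; the command-building helpers are pure and independent of the environment.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

public class RemoteFind {
    // ssh and find go into a single invocation: separate exec() calls do not share a session.
    static List<String> findCommand(String telbox, String uuid) {
        return Arrays.asList("ssh", telbox, "find . -name '*" + uuid + "'");
    }

    // scp the found folder back to the local system (hypothetical paths).
    static String scpCommand(String telbox, String remotePath, String localDir) {
        return "scp -r " + telbox + ":" + remotePath + " " + localDir;
    }

    // Run a command and return its first output line (the found path, or null if absent).
    static String runFirstLine(List<String> cmd) throws Exception {
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            return r.readLine();
        }
    }

    public static void main(String[] args) {
        System.out.println(findCommand("telbox1", "uuid-a"));
        System.out.println(scpCommand("telbox1", "/calls/dir-uuid-a", "/local/utterances"));
    }
}
```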
3.7 Generating Grammar
When the user speaks to or interacts with the IVR, the interactions are stored in the database in
the form of codes. Grammar generation deals with decoding these and presenting more readable
data for analysis. This portion of the code is executed only if the client has selected grammar in
the expected output division.
The initial steps in this module are similar to section 3.6; a brief overview is provided
here. This module receives the day as an argument from the main class to make
sure that it is executed for the same time frame as the secure copy module.
Here, the first step is to read the configuration file to get values like username, password,
datacenter etc. These values help to filter out results and provide necessary parameters to connect
to databases and execute queries.
Secondly, we get the parameters for querying the database to get our user and telbox lists and
other parameters. This is done exactly as explained in section 3.3. It is read from the same
database as earlier. These parameters are stored appropriately for further processing.
And the final common step among these two modules is to get the hostlist. This is again done via
XML parsing explained earlier in section 3.5. The same list of hosts is obtained again.
The concepts used are the same as those used earlier, and all of these steps have been
explained in detail in the previous sections; for any queries please refer to those sections. The
purpose of repeating these tasks is twofold. First, executing these modules of the code depends on
user selection, so executing a step in one module and using its result in another is not feasible.
Secondly, if the common code were run before either module, it would increase the burden on the
main function; moreover, it is quite possible that the client selected no option, or no valid
option, that can be processed by the code, in which case a lot of system time would be wasted on
database queries and file reads. Hence, these steps have been attached separately to each module.
The following sections will describe the steps that are unique to the grammar generation module.
3.7.1 Mysql queries for data
The grammar generation process required a series of Mysql queries to be performed. From these
queries all the data was gathered that would later help in generating the xml file. The queries
were performed on the database independent of each other.
Prior to running any queries, the particular user ids of interest were filtered out. This is
necessary to reduce the huge bulk of data and effectively reduce the computation cost. A query
was run to isolate the user ids matching the parameters entered by the client, and this data is
stored locally in a file. After this, a temporary table is created in the database which loads this
file as input. All the queries that follow are performed on the users common to the concerned
tables and this file.
Query 1: This query is performed on the concerned table (unnamed due to privacy policy) and a
natural join is performed with the temporary table. This query gives the basic details about a call,
like the number from which the call was made (ani/sip) or the privacy constraints pertaining to
the user etc. These are some of the various annotations that are associated with the user. All of
the data is stored in a HashMap data structure with user id as the key.
Query 2: The second query is the transfer query, to know whether the call was transferred from
self-services to assisted-services. This data is of utmost importance as it gives insight
into the functional capabilities of the IVR and helps improve the system so that the burden on
assisted-services can be reduced. This query is performed on the concerned table (unnamed due to
privacy policy) and a natural join is performed with the temporary table. Details such as transfer
id, hang-up time, etc. are gathered in this step. All of the data is stored in a HashMap data
structure with user id as the key.
Query 3: The third query gets details about the call. This query is performed on the
concerned table (unnamed due to privacy policy) and a natural join is performed with the
temporary table. It provides information on the reason for the call, the md5
encryption associated with the call etc. This information helps the organization understand
why a call was made and what efficiency of problem solving the IVR achieved. All of the
data is stored in a HashMap data structure with user id as the key.
Query 4: The fourth query finds all the attributes associated with the call.
This query is performed on the concerned table (unnamed due to privacy policy) and a natural
join is performed with the temporary table. It gives the basic details about the attributes
of a call, like the name, value, date etc. These are some of the various annotations associated
with the user, and they help in identifying specific features of the call. All of the data is
stored in a HashMap data structure with user id as the key.
Query 5: The fifth query extracts the rules of the grammar. It comprises
two queries, where the output of the first serves as input to the second: the type of the grammar
is fixed by the first query and used as a parameter of the second. This query is performed
on the concerned table (unnamed due to privacy policy) and a natural join is performed with the
temporary table. From the combination of both queries we get data like weight, load_order, call
id etc. These help in defining the rules by which the grammar will be generated. All of the
data is stored in a HashMap data structure with user id as the key.
Query 6: The sixth query stores the results of the call. It provides details about the words spoken,
the configuration to be followed etc. This query is performed on the concerned table (unnamed
due to privacy policy) and a natural join is performed with the temporary table. Details such as
words, conf etc. were obtained. All these help in defining the rules by which the grammar will be
generated. All of the data is stored in a HashMap data structure with user id as the key.
Query 7: The seventh query stores the slots of the call, providing details such as the interp id,
name, value etc. This query is performed on the concerned table (unnamed due to privacy policy)
and a natural join is performed with the temporary table. These values help in defining the rules
by which the grammar will be generated. All of the data is stored in a HashMap data structure
with user id as the key.
Query 8: The eighth and final query pertains to the vxml log file. This is one of the
important queries, giving the log label and log value used for generating the xml files. It is
performed on the concerned table (unnamed due to privacy policy) and a natural join is
performed with the temporary table. Details such as log label, log value etc. were obtained;
these help in defining the rules by which the grammar will be generated. All of the data is
stored in a HashMap data structure with user id as the key.
All these HashMaps are combined into one common structure so that the data per user is
accessible in an indexed fashion. A user-defined structure, namely Chart, has been used to store
all the data; it has user ids as keys and entire tables as columns. The column names correspond
to the queries performed: some columns are formed by merging two or more queries while others
were added directly to the Chart. This structure is essentially a table of tables.
key            Col1      Col2      Col3     ...     Col8
user ids     +--------------------------------------------+
             |  P1    P2    P3    P4    (inner table)     |
             |  v     v     v     v     (values)          |
             +--------------------------------------------+
The sketch above displays the structure of a Chart. Here Col1, Col2 etc. represent the names of
the columns of the main Chart, and each entity of the Chart houses a table within it. P1, P2, etc.
are the properties, i.e. the column names, of that inner table, and 'v' corresponds to the value
stored for a specific user under a specific column within a specific property.
This Chart is passed on to further functions to generate the xml which has been explained in the
following sections.
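A minimal sketch of such a table-of-tables structure follows. The actual Chart class of the project is not reproduced; the column and property names used here are illustrative only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class Chart {
    // userId -> column name (one per query) -> rows, each row a property->value map.
    private final Map<String, Map<String, List<Map<String, String>>>> data = new HashMap<>();

    void put(String userId, String column, Map<String, String> row) {
        data.computeIfAbsent(userId, k -> new HashMap<>())
            .computeIfAbsent(column, k -> new LinkedList<>())   // multiple rows per call
            .add(row);
    }

    List<Map<String, String>> get(String userId, String column) {
        return data.getOrDefault(userId, Collections.emptyMap())
                   .getOrDefault(column, Collections.emptyList());
    }

    public static void main(String[] args) {
        Chart c = new Chart();
        Map<String, String> row = new HashMap<>();
        row.put("transfer_id", "42");      // illustrative property from the transfer query
        c.put("uuid-1", "transfer", row);
        System.out.println(c.get("uuid-1", "transfer"));
    }
}
```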
3.7.2 XML file generation
The next and final step of the process is generating the grammar xml files for the client. The xml
files are generated for each user separately, but they need to be organized together in one
folder according to the date on which the request was run. Unix commands to create folders
according to date were executed using Runtime.exec(); the use of this
predefined Java class has already been discussed in prior sections.
The PrintWriter class helps in generating files outside the local jar where the code is run.
This class was used to create an xml file for each user present in the Chart generated
above. The data is stored in xml format with roles and ids. There can be multiple entries for a
single user, but multiple files for a single user are not useful; thus, all the data related to a
user is stored in a single file, and multiple entries are present only if the caller places
multiple calls. The Table in each column of the Chart is therefore a linked list of tables, and
only a single file is present per user. This approach is more space efficient and avoids the
confusion that would arise if multiple files existed for the same user.
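The per-user file generation can be sketched as follows. The element names and layout are illustrative only, since the real grammar xml format is not reproduced here; render() builds the xml in memory for demonstration, while the real code points the PrintWriter at a per-user file inside a date-named folder.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class GrammarXmlWriter {
    // Build one xml document per user; tag names are hypothetical.
    static String render(String userId, String[] entries) {
        StringWriter out = new StringWriter();
        try (PrintWriter w = new PrintWriter(out)) {
            w.println("<user id=\"" + userId + "\">");
            for (String e : entries) {
                w.println("  <call>" + e + "</call>");   // one entry per call placed
            }
            w.println("</user>");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("uuid-1", new String[]{"first call", "second call"}));
        // In the real code the PrintWriter targets a file inside a date-named folder, e.g.:
        // new PrintWriter(dateDir + "/" + userId + ".xml")
    }
}
```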
A snapshot of the generated xml can be seen below (all the sensitive data has been removed):
3.8 Exception Handling
An exception (or exceptional event) is a problem that arises during the execution of a program.
When an exception occurs, the normal flow of the program is disrupted and the
program/application terminates abnormally, which is not desirable; therefore
these exceptions are to be handled.
Many inbuilt (predefined) Java classes have been used as a part of
this project. Since these classes are not user defined and are used to run system commands, for
example Runtime.exec(), ProcessBuilder or PrintWriter, most of the code excerpts are
enclosed in try-catch blocks to catch the various errors that might occur. All possible
exceptions have been handled by the code: standard exceptions such as FileNotFoundException,
NullPointerException or UnsupportedFormatException, as well as non-standard ones.
By non-standard exceptions I refer to issues like missing configuration files,
missing input parameters etc. For these, default values were provided so that the code does
not crash if, for example, the configuration file is missing.
The exceptions also help to debug the code in case any error occurs. By default,
exceptions are printed on the console. Since the script runs on a huge amount of
data, keeping track of every log on the console is a time-consuming and exhausting
process. Thus, for better management, all the exceptions are grouped together and a common log
file is generated, whose location can be configured by the user or left at a default. Any error
that occurs is recorded in this file and can be reviewed once the code run is complete.
The PrintWriter class was again used to achieve this. However, if the default path for logs is
missing and no configurable path is provided, the code will not proceed any further and will
alert the user about the problem.
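The default-value handling for a missing configuration file can be sketched like this. The path and fallback value are hypothetical, and the log is collected in memory here rather than in the project's common log file.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class SafeConfig {
    // Fall back to a default when the configuration file is missing, and
    // record the error in a log buffer instead of crashing.
    static String readOrDefault(String path, String fallback, StringBuilder log) {
        try (BufferedReader r = new BufferedReader(new FileReader(path))) {
            return r.readLine();
        } catch (FileNotFoundException e) {
            log.append("config missing, using default: ").append(e.getMessage()).append('\n');
            return fallback;
        } catch (IOException e) {
            log.append(e.toString()).append('\n');
            return fallback;
        }
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        System.out.println(readOrDefault("/no/such/config", "default-host", log));
        System.out.print(log);   // the accumulated errors would go to the common log file
    }
}
```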
3.9 Email Notification
As mentioned in section 3.8, the exceptions generated are stored in a file. Instead of
manually checking the file for errors, an automated code was written to email the
errors to the concerned person. This script was written in a separate module and built
into a separate jar file to be run automatically. It works as follows: it searches
for the log file in the configured directory, and after locating the file it reads its contents.
The next step is mailing the contents of the file if any error is recorded in it. This was
accomplished using the mailif functionality. The mailif program reads its standard input and, if
there is data, sends it in the body of a message to the given recipient addresses. This is useful
in crontab entries for mailing a cron job's output to people so that problems are known and can
be fixed; mail is sent only when the job actually produces output, so noise is kept to a minimum.
This built-in behaviour ensures that no email is sent if the file is empty.
With this functionality, errors in the code run are sent directly to the concerned person,
and appropriate measures can be taken to rectify them. The crontab entry was added to run
the email-generation code halfway through the main code run.
4. Conclusion
Utterance capture management is of utmost priority for the system, as valuable data is lost in
case of a failure. The redesign of the system will make it more robust and reliable for utterance
capture. The new design will run separately on the datacenters, as already mentioned, which will
reduce the network delay the current system faces. There is also a new front-end feature
that will make it easier for the client to update and configure the requirements: the client will
no longer require multiple portals for configuration and fetching. A front-end module will
provide all the facilities in a single place, which reduces maintenance costs and is easier for
the client.
The new system is still in the making. As we go deeper into the code we regularly find better
ways to optimize it and make it even more efficient than was planned.
5. Appendix
Following is the process flowchart for the execution of the program, reproduced here as a step
list:
1. START: open the Platform Central home page.
2. Click on Utterance Capture Request (the cog wheel representing utterance).
3. On the new input page, fill in inputs such as request_name, time, dnis, ani, etc., and
   select the output type (Grammar/Utterance/Devlog).
4. Wait until "SEARCH" is pressed.
5. The data entered is passed to provdb02 and put into four tables: utterance_details,
   dnis_requests, ani_requests and logtag_requests.
6. The backend gets the system date and searches for requests in the utterance_details
   table in provdb.
7. It gets the dnis, ani and logtag values for those requests from provdb, along with the
   values of the selected outputs.
8. If 'Utterance' = true:
   - Read the configuration file to get db owner, db user, password, dbtype, failover and
     data center.
   - Get the hosts from the cdblog url; for each host, query the telbox and uuid for that
     day and create a map. For each pair, find the folder path for the uuid and perform scp.
9. If 'Grammar' = true:
   - Read the configuration file to get db owner, db user, password, dbtype, failover and
     data center.
   - Get the hosts from the cdblog url and for each host run a list of MySQL queries.
   - Get the uuids from calls_date for the entered dnis, ani and logtag values and create a
     temporary table using this data.
   - Execute the eight queries described in section 3.7.1.
   - Generate one hashmap with all these query results, keeping uuid as the key.
   - Generate the XML file with all this data.
10. STOP.
Internship-Report-sample-6 (1).pdfShankarYadav75
 

Similar a PS2_FinalReport_2011B1A7689G (20)

GAS MANAGEMENT SYSTEM.pdf
GAS MANAGEMENT SYSTEM.pdfGAS MANAGEMENT SYSTEM.pdf
GAS MANAGEMENT SYSTEM.pdf
 
Lecture 2
Lecture 2Lecture 2
Lecture 2
 
LBA SYNOPSIS.pdf
LBA SYNOPSIS.pdfLBA SYNOPSIS.pdf
LBA SYNOPSIS.pdf
 
Help desk system report
Help desk system reportHelp desk system report
Help desk system report
 
Sindhuri 4 plus Testing Resume
Sindhuri 4 plus Testing ResumeSindhuri 4 plus Testing Resume
Sindhuri 4 plus Testing Resume
 
IT Application Development - with SDLC.pptx
IT Application Development - with SDLC.pptxIT Application Development - with SDLC.pptx
IT Application Development - with SDLC.pptx
 
6475- RAILWAY RESERVATION SYSTEM.pdf
6475- RAILWAY RESERVATION SYSTEM.pdf6475- RAILWAY RESERVATION SYSTEM.pdf
6475- RAILWAY RESERVATION SYSTEM.pdf
 
RajeevGautam_PeopleSoft Technology Lead_Infosys
RajeevGautam_PeopleSoft Technology Lead_InfosysRajeevGautam_PeopleSoft Technology Lead_Infosys
RajeevGautam_PeopleSoft Technology Lead_Infosys
 
CS PRACRICLE.docx
CS PRACRICLE.docxCS PRACRICLE.docx
CS PRACRICLE.docx
 
Wind river webinar deck v1 as of april 23 2014 dw2
Wind river webinar deck v1 as of april 23 2014 dw2Wind river webinar deck v1 as of april 23 2014 dw2
Wind river webinar deck v1 as of april 23 2014 dw2
 
Online cinematicketingdocx
Online cinematicketingdocxOnline cinematicketingdocx
Online cinematicketingdocx
 
Duraichi _Dotnet_6yrsexp_cv
Duraichi _Dotnet_6yrsexp_cvDuraichi _Dotnet_6yrsexp_cv
Duraichi _Dotnet_6yrsexp_cv
 
SOFTWARE PROJECT MANAGEMENT TOOL Project Report
SOFTWARE PROJECT MANAGEMENT TOOL Project ReportSOFTWARE PROJECT MANAGEMENT TOOL Project Report
SOFTWARE PROJECT MANAGEMENT TOOL Project Report
 
Project Report on Employee Management System.docx
Project Report on Employee Management System.docxProject Report on Employee Management System.docx
Project Report on Employee Management System.docx
 
computer science.docx. Mohit Class 12 follow for more
computer science.docx. Mohit Class 12 follow for morecomputer science.docx. Mohit Class 12 follow for more
computer science.docx. Mohit Class 12 follow for more
 
A study of critical success factors for adaption of agile methodology
A study of critical success factors for adaption of agile methodologyA study of critical success factors for adaption of agile methodology
A study of critical success factors for adaption of agile methodology
 
Babu_Resume
Babu_ResumeBabu_Resume
Babu_Resume
 
rip 1.pdf
rip 1.pdfrip 1.pdf
rip 1.pdf
 
Internship-Report-sample-6.pdf
Internship-Report-sample-6.pdfInternship-Report-sample-6.pdf
Internship-Report-sample-6.pdf
 
Internship-Report-sample-6 (1).pdf
Internship-Report-sample-6 (1).pdfInternship-Report-sample-6 (1).pdf
Internship-Report-sample-6 (1).pdf
 

PS2_FinalReport_2011B1A7689G

ID No./Name: 2011B1A7689G / Trishu Dey
Discipline of the Student: B.E. Computer Science
Name and Designation of the Expert: Mrs. Smrti Atrey, Tech Lead, Speech Platform, [24]7 ILabs
Name of the PS Faculty: Mr. Vineet Garg
Key Words: Utterance, Pod, DC, Sisyphus, SV1, VA1, TPC, provDB
Project Areas: Analytics

Abstract: The [24]7 ILabs Speech team carries out Speech IVR analytics and quality testing. The existing systems are continually improved to make them more time- and space-efficient. A huge amount of data is processed on a regular basis, and managing time is a key factor driving the extraction. My project on utterance capture deals with acquiring data from the pod as efficiently as possible and redesigning the existing model to improve it. I am also required to generate the grammar for a caller and design a GUI to access both. The work involves tools such as Java, MySQL, Unix, Ruby on Rails and JavaScript.

Signature of Student: Trishu Dey
Signature of PS Faculty:
Date: 8th June, 2016
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI (RAJASTHAN)
PRACTICE SCHOOL DIVISION

Response Option Sheet

Station: [24]7 ILabs    Center: Bangalore
ID No. & Name: 2011B1A7689G / Trishu Dey
Title of the Project: Speech management system design

Usefulness of the project to the on-campus courses of study in various disciplines. The project should be scrutinized keeping in view the following response options. Write the Course No. and Course Name against the option under which the project comes (refer to the Bulletin for Course No. and Course Name).

1. A new course can be designed out of this project. (Response: NO)
2. The project can help modify the course content of some of the existing courses. (Response: NO)
3. The project can be used directly in some of the existing Compulsory Discipline Courses (CDC) / Discipline Courses Other than Compulsory (DCOC) / Emerging Area (EA) courses. (Response: OOP, Object Oriented Programming)
4. The project can be used in preparatory courses like Analysis and Application Oriented Courses (AAOC) / Engineering Science (ES) / Technical Art (TA) and Core Courses. (Response: NO)
5. This project cannot come under any of the above mentioned options as it relates to the professional work of the host organization. (Response: NO)

Signature of Student: Trishu Dey    Signature of Faculty
Date: 8th June, 2016
NO DUES CERTIFICATE

PS II Station at: [24]7 ILabs    Date: 11th March, 2016
Name: Trishu Dey    ID No.: 2011B1A7689G

The above student will be completing his/her Practice School Program. In case he/she has any dues, please report them below against your name. In case he/she has no dues, please write NO DUES and sign.

1. Organization Coordinator: ______________________________________
2. Professional Expert: ______________________________________
3. Librarian: ______________________________________
4. Accounts Section: ______________________________________
5. PS Faculty: ______________________________________
6. Any Other: ______________________________________
7. Any Other: ______________________________________

Signature of PS Faculty
Table of Contents

ACKNOWLEDGEMENTS
ABSTRACT
RESPONSE OPTION SHEET
NO DUES CERTIFICATE
1 INTRODUCTION
  1.1 [24]7 ILabs Company Overview
  1.2 Self Service: [24]7 Speech Platform
2 UTTERANCE CAPTURE
  2.1 Existing Utterance Capture Design
  2.2 Design Limitation
  2.3 Issues Other than the Design Limitation
  2.4 Proposed New Design for Utterance Capture
    2.4.1 Highlights of the New Infrastructure
    2.4.2 Diagrammatic Representation
3 UTTERANCE CAPTURE MANAGEMENT
  3.1 Creating the Graphical User Interface
    3.1.1 Home Page
    3.1.2 Utterance Capture Page
  3.2 Saving Parameters into the Database
  3.3 Reading the Parameters from the Database
  3.4 Getting the Configuration
  3.5 XML Parsing to Get Host List
  3.6 Secure Copy for Utterance Files
    3.6.1 Generating telbox-uuid Mapping
    3.6.2 Secure Copy to Local System
  3.7 Generating Grammar
    3.7.1 MySQL Queries for Data
    3.7.2 XML File Generation
  3.8 Exception Handling
  3.9 Email Notification
4 CONCLUSION
5 APPENDIX
6 REFERENCES
1. Introduction

1.1 [24]7 ILabs Company Overview

[24]7 Innovation Labs is renowned for delivering customer engagement solutions that enable clients' customers to interact with a business from any place, across any channel or device, without ever having to restart. This unique Customer Engagement Platform makes an omni-channel (across-channels) journey possible. It is a cloud-based platform that powers each of the ILabs products, transforming data to create predictive omni-channel experiences in customer service and sales.

The departments at [24]7 are Product Engineering, Design, the Data Science Group and Service Delivery. They are described below:

Product Engineering: This team is responsible for the development and deployment of software for clients, and for providing business and technology solutions to them. The operations agents of the contact centers that outsource voice and chat service to clients use these products to provide a rich user experience.

Design: This team focuses on end-user experience analysis and design, which stands at the core of [24]7's software products. The team analyzes user responses to different UIs and then selects and refines the design that best increases sales. Different designs are continuously evaluated, and the one that best suits the requirements and needs is chosen.

Data Science Group: The DSG focuses on predictive analytics, data modeling, real-time decision making, and machine learning at scale. The team aims to make proper use of the available data and direct it to the right outcome using predictive analysis.

Service Delivery: This team deals with the clients and ensures that a proper set of standards guides product development through all phases of the software development cycle. It lays down the constraints to which all products must adhere in order to ensure rich customer engagement. The company also operates contact centers that outsource both voice and chat-agent services, for sales and support, to companies worldwide. Its largest customers are in the telecommunications, financial services, retail, insurance and travel industries.

Together these departments produce the spectrum of products and appliances that make up [24]7. A brief overview of the products developed by [24]7 ILabs follows.

[24]7 Customer Engagement Platform: This cloud-based platform supports all the other [24]7 products. It is based on the Anticipate, Simplify and Learn framework. Bringing these elements together in a single platform makes self-service and agent-assisted service in and across web, mobile, chat, social, and phone channels predictive and more intuitive. Machine learning combined with big data enables anticipating a customer's intent in real time, providing predictive omni-channel experiences across channels, devices, locations and time. This makes the customer experience more intuitive and agents more effective, impacting the strategic and operational metrics that matter most.

Self-Service: Customers may choose to serve themselves (self-service is the most commonly selected option) through digital self-service at any time, on any channel and device, as per their convenience. The self-service solutions, namely [24]7 Virtual Agent and [24]7 Speech (the team assigned to me as part of PS-II), are integrated on the [24]7 Customer Engagement Platform. [24]7 Speech is a multimodal self-service solution that has reinvented IVR using new and powerful digital capabilities designed to enable users to self-serve effectively.

[24]7 Assist: For most customers the first choice to start their journey is self-service.
If they need the help of a live agent, an integrated self-service and assisted-service channel enables users to seamlessly transition to chat without having to restart. Users expect the company to maintain the context of their problem and anticipate their intent. Assist provides an intelligent, interactive platform for live chat and voice assistance that focuses on the entire end-user experience. It leverages analytics and optimized contact center software to make agents more effective.
[24]7 Social: This is a suite of apps that effectively adds a social influence to customers' purchase and service journeys, driving higher brand reach and sales. Customers can share their experiences on the social platform and give other users insights into the quality of service provided.

[24]7 Mobile: The aim of Mobile is to provide more engaging mobile experiences through intelligent mobile apps, visual IVR and mobile chat solutions. Customer engagement is thus increased whether the customer journey involves self-service, assisted service or both.

[24]7 Virtual Agent: This is the latest software product. It automates the process of delivering correct answers to customer questions posed across a variety of interaction channels. The Virtual Agent (VA) technology is next-generation self-service software. It drives profitable online conversations with seamless omni-channel customer service, captures key voice-of-the-customer insights and delivers relevant offers that improve conversion and customer engagement.
1.2 Self Service: [24]7 Speech Platform

The main aim of Speech is to make the customer's IVR experience more engaging and digitized. It uses a cloud-based platform to provide self-service solutions to clients. Prediction and digital customer engagement have helped increase self-service rates for clients.

Key roles:

The IVR has been made web-aware. By tracking the customer's web journey, [24]7 can detect whether the customer is on the website during a call. Customers can use both the website and the phone at the same time, enabling better user satisfaction and faster solutions.

To improve speech recognition accuracy, [24]7's natural language technology is put in place, enhanced by advanced deep neural networks (DNNs) that draw from over 10 billion utterances. At [24]7 a single intelligence layer spans both speech and text-based interactions, using natural language to derive meaning from what has been said and/or typed. This model provides a low cost of ownership and can be applied across platforms like IVR, mobile apps and virtual agents together to deliver a consistent customer experience.

To accurately predict a caller's intent and simplify their experience, customer data and real-time journey data are combined. This has enabled [24]7 Speech to adaptively learn, track and tailor engagement to the needs of the individual customer. Unlike legacy IVRs, [24]7 Speech enables companies to improve IVR performance significantly: [24]7 has increased self-service rates by up to 20% and reduced IVR call duration by up to 30%.

Visual IVR: Vivid Speech is the mobile solution for IVR. It enables visual display, touch and speech in IVR interactions, making the IVR less cumbersome and more interactive for customers.

IVR to Chat: By integrating [24]7 Speech with [24]7 Chat, [24]7 is able to transition phone calls to a chat on a mobile device.
Qualified IVR callers are served through mobile chat, while the remaining callers continue in the IVR.
2. Utterance Capture

Utterance capture deals with capturing the words spoken by a customer (the "utterance") during a call, that is, when the person has chosen Speech IVR for the fulfilment of his/her service. This data is captured by [24]7 ILabs on a regular basis and undergoes further processing steps to extract meaningful information from the raw captured data.

2.1 Existing Utterance Capture Design
A cronjob runs nightly_portal.pl. The streams configured for capture are read from MySQL tables, and the respective utterance files are then copied from the pod. The MySQL tables are updated, and further activities follow to verify successful copying and subsequent processing.

2.2 Design limitation

In the current design, the "Transcribeme" host begins capturing utterances for a particular day at 22:00 hours on the following day. For example, to procure utterance data for the 11th of March, the Transcribeme host starts its processing job on the 12th of March at 22:00 hours and usually takes more than 4-5 hours to complete the entire process. As per the design specifications of the Telpod from which the data is copied, the pod retains the data for a particular date for only three days. So as the Telpod enters the 13th of March, it automatically starts clearing all data related to the 11th of March. In the best case (a successful attempt) the data is properly copied, but in the worst case, that is, if the Transcribeme host somehow fails to copy data from the Telpod to its local environment, by the time the issue reaches the support team's notice the Telpod will already have cleared the data for that day, making it impossible to retrieve any information related to that day and preventing any further extraction of meaningful information from the data. This is the main limitation that turns any Transcribeme host failure into a severe issue.
2.3 Issues other than the design limitation

- No error-triggering mechanism: Currently, the automation does not trigger a notification in case of an error on the Transcribeme host, and the current logging is not enough to identify the occurrence of a failure. The design needs to be enhanced so that a failure scenario can be identified from the log.
- The Sisyphus job (which works on the data procured by Transcribeme) only catches the failure scenario, which is not of much use for production purposes. The event needs to be caught before Sisyphus starts its job, so that someone can rerun (if possible) the utterance capture on the Transcribeme host.
- The database has no clean-up mechanism. A huge amount of data is already present, and with every passing day it keeps building. This creates junk data in the database and on the hard disk, and will create problems in the future, as there may not be enough storage for it and the processes trying to handle it may become even slower.
- There is a locking problem when multiple threads are run for a stream on the Transcribeme host, which also creates a failure scenario.
- The Transcribeme host has no alternative and/or standby setup. If the Transcribeme host malfunctions, no other host can take over to keep utterance capture running smoothly.
- The Transcribeme host and its database being in SV2 is what created the timing issue.
- The entire infrastructure is based on the extranet; if the process moves from the extranet to TPC (which is an ongoing consideration), the current design will not work.
- The capture infrastructure is based on DBlog, which may create problems if the process moves from DBlog to Big Data.
2.4 Proposed New Design for Utterance Capture

As can be seen from sections 2.2 and 2.3 above, there are multiple issues with the existing model. An alternative is therefore of utmost importance, to make the system more efficient both by improving the time and success rate of the process and by providing a robust failure-detection mechanism.

2.4.1 Highlights of the new infrastructure

- SV1 and VA1 will each have their own, separate capture host instead of the single "Transcribeme" host. This avoids the network latency of cross-data-center access.
- DBlog will be used to get the data.
- Clients can configure streams on TPC instead of the portal, and will no longer require multiple portals for configuration and fetching. A front-end module will be created to provide all facilities from a single place, which reduces maintenance costs and is easier for the client.
- Better fault tolerance through controlled time sync. The data-loss problem faced with the current system can thus be curtailed; this design ensures we do not lose any valuable data.
- Better logging for debugging, and an automatic rerun job for failed dates. As mentioned in the section above, the current logging does not reveal anything about a failure, which is not noticed until it is too late and all the data has been erased.
- Better control over the total maximum utterance capture.
2.4.2 Diagrammatic representation

In this infrastructure the capture host runs in each DC (SV1 and VA1) and contacts its own DC's pod, which removes the cross-DC network latency.

[Figure: proposed architecture. A capture host in each of SV1 and VA1 reads from its own Telpod and Dblog; the TPC host writes stream configuration to provDB; Sisyphus copies captured data to the NFS path in each DC; Admin and User perform the read and write operations.]
Through TPC, the client admin can configure streams just as they do today through portal.tellme.com. To support this, a new TPC page is required where the client can configure streams and watch their status. As the TPC backend uses provDB, all configuration information will go into provDB and be fetched by every host that requires it; a new table will be needed in provDB.

The capture host is the heart of this solution. The host reads the configured streams from provDB for the previous day and, through the Index-Older host, gets all the desired UUIDs and their respective pods. The host groups all UUIDs for a particular pod and performs a secure copy from the pod to a local path. Before copying the utterances and grammar from the pod, the capture host zips all utterances into a temporary path on the pod and syncs them to the capture host. After a successful copy, the capture host removes the temporary zipped folder from the pod.

Each DC has its own NFS device. The Sisyphus job takes care of copying data from the capture host to the NFS path and combining it into a user-readable folder.
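The grouping and secure-copy steps described above can be sketched in Java. This is a minimal illustration under stated assumptions, not the production code: the class name, the telpod hostnames and the file paths are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the grouping step: the capture host receives a
// (uuid -> pod) mapping from the Index-Older host and batches the UUIDs
// per pod so that each pod is contacted only once for the secure copy.
public class PodGrouper {

    // Invert the uuid->pod mapping into pod -> list of uuids.
    public static Map<String, List<String>> groupByPod(Map<String, String> uuidToPod) {
        Map<String, List<String>> podToUuids = new TreeMap<>();
        for (Map.Entry<String, String> e : uuidToPod.entrySet()) {
            podToUuids.computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                      .add(e.getKey());
        }
        return podToUuids;
    }

    // Build the secure-copy command for one pod's zipped batch
    // (host and path layout are illustrative, not the real pod layout).
    public static String scpCommand(String pod, String zipPath, String localDir) {
        return String.format("scp %s:%s %s", pod, zipPath, localDir);
    }

    public static void main(String[] args) {
        Map<String, String> uuidToPod = Map.of(
                "uuid-001", "telpod-sv1",
                "uuid-002", "telpod-va1",
                "uuid-003", "telpod-sv1");
        // One batch per pod, instead of one copy per utterance.
        groupByPod(uuidToPod).forEach((pod, uuids) ->
                System.out.println(pod + " -> " + uuids.size() + " utterances"));
        System.out.println(scpCommand("telpod-sv1", "/tmp/utt_batch.zip", "/data/capture/"));
    }
}
```

Batching per pod is what makes the zip-then-sync approach worthwhile: each pod is touched with one copy operation rather than one per UUID.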
3. Utterance Capture Management

The project was divided into two parts:

- Back end (Java, MySQL, JDBC):
  - Creating a system that can copy user data from the various hosts and organize it into a folder structure for further processing.
  - Generating grammar for the utterances of the users isolated in the previous step.
- Front end (Ruby on Rails, JavaScript, HTML, CSS):
  - Creating a GUI that can conditionally access the back end.

Henceforth, the front end is explained before the back end for an easier flow of ideas.
3.1 Creating the Graphical User Interface

3.1.1 Home Page

Creating the GUI, or Graphical User Interface, required Ruby on Rails, JavaScript, HTML and CSS. All four languages were used to add the required module to a pre-existing codebase. The next page shows an image of the Platform Central home page as it is visible to the user. The code was pre-existing, and as part of this project I extended it to make the back end accessible through the GUI.
The screenshot above shows the home page after logging into Platform Central. The menu bar contains options to navigate to the features the user requires. The field 'Utterance Capture' was added as part of my code. The user can access the utterance page by clicking on 'Utterance Capture' in the menu bar or by clicking on the cog wheel that represents utterances. Both lead to the same URL, namely /utterances. All the generated paths and redirect URLs were added to the existing code. The home page itself was pre-coded, and as part of this project the utterance feature was added to all the files related to its creation.
3.1.2 Utterance Capture Page

When the user clicks on 'Utterance Capture' in the menu bar or on the cog wheel, they are redirected to /utterances, represented by the diagram shown above. This page takes the user's search inputs and feeds them into the database, which is later read by the back-end code.
A brief overview of the inputs taken and their importance in the back-end code:

1. Request Name: Any name the user wants to give the request. It serves no purpose from a coding perspective and may be null, a name, a combination of name and number, special characters, or any combination of these.

2. Time Period: Specifies the period over which the code has to run. The field provides a calendar input: when the user clicks on the date field, a calendar pops up so an appropriate date can be selected. This keeps the input date format consistent (yyyymmdd), which makes back-end processing much easier. Even though a calendar is provided to the client, all entered dates are checked internally to avoid any foul play.

3. Utterances: This subheading houses two fields, maximum utterances and maximum utterances per day. These limit the number of users whose data will be copied or whose grammar will be generated. If not specified, an upper limit defined within the code is used.

4. Advanced Options:
(i) DNIS: A drop-down menu showing all the phone/SIP numbers available to the logged-in client; these represent the numbers on which calls were placed. Multiple selection is possible, and all selected DNIS values are considered while acquiring data from the server. The drop-down restriction ensures that a user cannot enter invalid or inaccessible numbers outside the client's privileges. The menu is scrollable and internally sorted to make searching for numbers easier and faster.
(ii) ANI: The phone/SIP numbers from which calls were placed; their validity is also checked internally. A text area is provided because the field can take multiple inputs. The area supports dragging, so the user can resize it from the corner, and scrolling, which activates when the content exceeds the predefined area.
(iii) LOGTAG: When a call is placed on the host, several tags are added throughout the call to track its progress. For example, if a caller chooses option 'x' from an array of options, a tag for option 'x' is added to remember the user's choice. When searching the data, the client can specify the logtag equivalent to 'x', and only calls where the user selected option 'x' will be copied, in coherence with the other fields. As with ANI, a resizable, scrollable text area is provided because multiple inputs are possible.

5. Expected Output: Three check boxes let the client choose the outputs wanted: grammar, utterances and devlog. The user can select all three or any combination of them, and the selection determines whether the underlying code is executed. Note: devlog is a future prospect and has not been implemented as part of this project; selecting or deselecting it makes no difference to the procedure followed by the back end.

6. Search: The search button serves two purposes. First, it sits inside the HTML form tag, so as soon as it is clicked, the related JavaScript and Ruby controller files read the data in the boxes. Once gathered, the data is pushed into the database and distributed among four tables to maintain a normalized structure: utterance_details, dnis_requests, ani_requests and logtag_requests. utterance_details houses all the data apart from dnis, ani and logtag, since multiple values are possible for those fields. After pushing the data into utterance_details, the auto-incremented request_id is acquired by running a query on the database; this id serves as a foreign key for dnis_requests, ani_requests and logtag_requests. The page renders itself again, so after search is pressed it loads the original page instead of trying to reach an undefined URL.
3.2 Saving Parameters into the Database

The search details entered by the user are fed into a database, which is then accessed by the underlying code for further processing. To maintain normalization, four separate tables are created and the data is distributed among them.

Table 1: utterance_details. The id field in this table is auto-generated (visible in the Extra field of the column description). The request name, maximum utterances per day and maximum utterances go into the fields request_name, max_utt_per_day and max_utterances respectively, without modification. grammar, utterances and devlog have only two possible values, true and false; these fields keep track of which boxes were checked in the request. The start and end times are stored as int(10) instead of DATETIME, because each date is converted to a Unix timestamp, which has the int(10) form, before being pushed to the table.

For the three tables that follow, the auto-generated id of utterance_details is used as a foreign key. To access the auto-generated key from the Ruby on Rails code, a query is run against INFORMATION_SCHEMA:

SELECT auto_increment FROM information_schema.tables WHERE table_schema = '<database_name>' AND table_name = 'utterance_details';

This query returns the auto-increment value that the named table will issue next; therefore the returned value minus one is the id of the row just inserted. The id had to be auto-generated so that, in the case of deleted entries, previously used request ids are never issued again. This value is then passed on to the following three tables as a foreign key.
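The two conversions described in this section, turning the calendar widget's yyyymmdd value into a Unix timestamp and deriving the just-inserted id from the INFORMATION_SCHEMA auto-increment value, can be sketched as below. This is an illustrative sketch, not the project's Rails code; the class name and the schema-name parameter are placeholders.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helpers mirroring section 3.2; names are placeholders.
public class SearchParams {

    // The id lookup described above, with the schema name parameterised.
    public static String autoIncrementQuery(String databaseName) {
        return "SELECT auto_increment FROM information_schema.tables "
             + "WHERE table_schema = '" + databaseName + "' "
             + "AND table_name = 'utterance_details';";
    }

    // The query returns the NEXT id to be issued, so the row just
    // inserted carries that value minus one.
    public static long justInsertedId(long nextAutoIncrement) {
        return nextAutoIncrement - 1;
    }

    // Convert the calendar widget's yyyymmdd string into a Unix
    // timestamp (seconds at UTC midnight), matching the int(10) columns.
    public static long toEpochSeconds(String yyyymmdd) {
        LocalDate d = LocalDate.parse(yyyymmdd, DateTimeFormatter.BASIC_ISO_DATE);
        return d.atStartOfDay(ZoneOffset.UTC).toEpochSecond();
    }

    public static void main(String[] args) {
        System.out.println(autoIncrementQuery("utterance_db")); // "utterance_db" is illustrative
        System.out.println(toEpochSeconds("20160311"));         // start time for 11 March 2016
    }
}
```

Note that this lookup is race-prone if two requests insert concurrently; in JDBC-based code, `Statement.getGeneratedKeys()` is the usual race-free alternative for retrieving the generated id.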
  • 26. Page | 26 Table 2: ‘dnis_requests’ This table consists of three fields. The first is an auto-generated id which serves as the PRIMARY KEY for the table, since all other fields can hold multiple entries with the same value. The MySQL description shows that dnis has been declared as MUL, which means multiple values are possible for this field. For example, suppose that for request id x the client selected three dnis values a, b and c. To populate this table, x is first acquired by running the above-mentioned query; then three separate entries are made in dnis_requests: x --> a, x --> b and x --> c. Table 3: ‘ani_requests’ This table follows the same three-field structure: an auto-generated id as the PRIMARY KEY, with ani declared as MUL. For request id x with ani values a, b and c, three entries x --> a, x --> b and x --> c are made in ani_requests. Table 4: ‘logtag_requests’ This table again follows the same structure, with logtag declared as MUL. For request id x with logtag values a, b and c, three entries x --> a, x --> b and x --> c are made in logtag_requests.
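The per-value inserts described for all three tables can be sketched with one parameterised statement. The column names request_id and dnis/ani/logtag are assumptions here, since the report only names the tables:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class MultiValueInsert {
    // Builds an INSERT for one of the three request tables; the value
    // column name is assumed to match the table's purpose (dnis/ani/logtag).
    static String insertSql(String table, String valueColumn) {
        return "INSERT INTO " + table + " (request_id, " + valueColumn + ") VALUES (?, ?)";
    }

    // Inserts one row per value: x --> a, x --> b, x --> c.
    static void insertAll(Connection conn, String table, String column,
                          long requestId, List<String> values) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(insertSql(table, column))) {
            for (String v : values) {
                ps.setLong(1, requestId);
                ps.setString(2, v);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```

The same helper serves dnis_requests, ani_requests and logtag_requests, which is why the three tables can share one code path.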
  • 27. Page | 27 3.3 Reading the Parameters from the Database From this point onwards the backend code is explained. The coding language used from here on is exclusively Java, and all acronyms are with respect to Java. The first goal of the backend code is to gather all the data that the user requested from the database. A JDBC connection was used to connect to the MySQL database. Java Database Connectivity (JDBC) is an application programming interface (API) for the Java programming language which defines how a client may access a database; it is part of the Java Standard Edition platform from Oracle Corporation. The data present in the utterance_details table is extracted for a particular date range. The date can be passed in as an argument while running the code, or the code will take the system date for the previous day. A list of all the request ids is created for that date, and each request id is used as a key to procure all the dnis, ani and logtag values for that request. The grammar and utterances values are also procured to ensure that the correct portion of the code is run: if both are true, both scripts will run; if the client chose the options selectively, only the corresponding portion of the code will run. As already mentioned, devlog has not been implemented as part of this project, and no checks for it exist in the backend code either. If both grammar and utterances are unselected, the code interprets, irrespective of devlog, that no output was selected and does no further processing. LinkedLists were used as the data structure for storing all these values because of their flexibility of size. Since the number of dnis, ani etc. is not known in advance and could theoretically range from zero upwards, a LinkedList is a better fit than a fixed-size array: it grows only as needed and avoids over-allocating memory.
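Since start and end times are stored as int(10) unix timestamps, reading one day's requests reduces to a half-open epoch-second range. A sketch of this step, with the column names id and start_time assumed (the report does not list them):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.LinkedList;
import java.util.List;

public class RequestReader {
    // start/end times are stored as int(10) unix timestamps, so a day
    // becomes the half-open epoch-second range [start, start + 86400).
    static long dayStartEpoch(LocalDate day) {
        return day.atStartOfDay(ZoneOffset.UTC).toEpochSecond();
    }

    // Collects the request ids for one day into a LinkedList, since the
    // number of matching rows is not known in advance.
    static List<Long> requestIdsFor(Connection conn, LocalDate day) throws SQLException {
        long start = dayStartEpoch(day);
        String sql = "SELECT id FROM utterance_details WHERE start_time >= ? AND start_time < ?";
        List<Long> ids = new LinkedList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, start);
            ps.setLong(2, start + 86_400);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) ids.add(rs.getLong(1));
            }
        }
        return ids;
    }
}
```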
  • 28. Page | 28 3.4 Getting the Configuration To generalize the code and make it easily maintainable, a configuration file was created holding all the configurable parameters, for example the database username and password, the host url etc. Since these values differ by environment (production, dev, qa), it is efficient to keep such data configurable. To read the file, InputStreamReader and BufferedReader were used, with Class.getResourceAsStream() to open the file from the classpath. This was done because FileReader cannot access files that are packaged inside a jar; since this code had to run in a unix environment as a jar, FileReader could not be used. 3.5 XML Parsing to get Host list The list of all the hosts on which the code will be processed is provided by an api whose data is in XML format, so it needed to be parsed to be useful for the project. DocumentBuilderFactory is a built-in Java class that performs the task of parsing XML data. The javax.xml.parsers.DocumentBuilderFactory class defines a factory API that enables applications to obtain a parser that produces DOM object trees from XML documents; it extends the Object class. Parsing returns the tags of an XML file in the form of a NodeList. NodeList, a pre-defined Java interface, provides the abstraction of an ordered collection of nodes without defining or constraining how the collection is implemented; NodeList objects in the DOM are live, and their items are accessible via an integral index. The Element interface was used to walk the nodes in the NodeList and store the ones important from the project's perspective. Each host runs the same code to find user data within itself. Depending on the output selected, either grammar will be generated, or utterance files will be copied, or both.
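The DOM parsing described above can be sketched as follows. The element name host is an assumption, since the real feed's tag names are not given in the report; here the XML is fed in as a string, whereas the project reads it from the api.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class HostListParser {
    // Parses host names out of the api's XML; the <host> tag name is
    // hypothetical, standing in for the real feed's element names.
    static List<String> parseHosts(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("host");
        List<String> hosts = new LinkedList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);      // access via integral index
            hosts.add(e.getTextContent().trim());
        }
        return hosts;
    }
}
```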
  • 29. Page | 29 3.6 Secure Copy for Utterance files When the user connects to an IVR, the user's spoken words, a.k.a. utterances, are recorded and stored in the system for further analysis. If the client selected ‘utterances’ in the ‘expected output’ division of the GUI, this part of the code is executed. The primary goal is to go to the telbox that contains the files for a particular user and copy them to the local machine. The first step towards this is to establish a relation between telbox and user id. 3.6.1 Generating the telbox-uuid mapping A telbox is where all the utterance files for a user reside. The users of a particular time frame are spread across all (theoretically) available telboxes, and the mapping is not one-to-one: one telbox may host multiple uuids, but one uuid will be present in only one telbox. The mapping generated is therefore many-to-one. Since the telbox is used as the key, and Java's HashMap cannot hold duplicate keys, every telbox is associated with a linked list of uuids. The telbox and uuid data is also maintained in MySQL, so a JDBC connection was established to fetch it, and it was then stored in the format above. The details of the JDBC connection have already been covered in section 3.3, and the same steps were followed here. For the query, the parameters obtained in section 3.3 were used as a filtering criterion, to reduce the bulk of data and produce a more concise output. 3.6.2 Secure copy to the local system The primary goal of this task is to secure copy (scp) the folders containing user utterance files onto the local system for further processing. As already mentioned, the files are present as folders in various directories on different telboxes.
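The many-to-one mapping described in 3.6.1 can be sketched as a HashMap whose values are LinkedLists, since duplicate keys are not possible:

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class TelboxMap {
    // Builds the many-to-one telbox -> uuids mapping: a HashMap cannot
    // hold duplicate keys, so each telbox key carries a LinkedList of
    // the uuids hosted on that telbox.
    static Map<String, List<String>> build(List<String[]> rows) {
        Map<String, List<String>> map = new HashMap<>();
        for (String[] row : rows) {               // row[0] = telbox, row[1] = uuid
            map.computeIfAbsent(row[0], k -> new LinkedList<>()).add(row[1]);
        }
        return map;
    }
}
```

In the project the (telbox, uuid) rows come from the filtered MySQL query; here they are passed in as pairs for illustration.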
  • 30. Page | 30 The steps followed to copy the data locally: for each uuid present in the linked list associated with each telbox, do the following:  Log into the host using ssh.  Find the folder which has the corresponding uuid as a substring.  Secure copy it to the local system. Since the code has to run in a Unix environment, Unix commands need to be issued from the Java code. There are multiple ways to achieve this; the ones suited to this code excerpt have been used here. Unix commands are run using the Runtime.exec() method or the ProcessBuilder class of Java. Runtime.exec(): the java.lang.Runtime.exec(String command) method, part of a predefined Java class, enables us to run any Unix command from our Java application. ProcessBuilder: there are certain commands for which Runtime.exec() does not work properly, so this predefined class was used as well; its start() method creates a new Process instance with the configured attributes. Both of these expose an InputStream that can be read using Java's BufferedReader class, and the output was used as required. Once the telbox and uuid are finalized, the first step is to ssh into the telbox and find the path of the user utterance folder, with the uuid as the key for finding the path. Secure Shell (SSH), sometimes known as Secure Socket Shell, is a UNIX-based command interface and protocol for securely getting access to a remote computer. SSH commonly uses RSA public-key cryptography to authenticate the connection. The entire SSH setup, including public and private key generation, was done for the purpose of this project.
  • 31. Page | 31 Finding the folder to copy was implemented with a simple find query in the parent directory, which returns a path as its result. Query: ‘find . -name *<uuid>’ This query outputs a path if the files are present, or nothing otherwise. If a path is returned the code proceeds to the next step; otherwise it logs a negative search result and moves on to the next telbox-uuid pair. The ssh login and the find were implemented together using the ProcessBuilder class, because the commands do not retain state between invocations: once one command has executed, the next command runs independently of it, yet for the find to succeed we need to ssh first and then find. Thus, they were issued together. The next step after finding the path to the folder is to copy it to the local system. Secure copy, or SCP, is a means of securely transferring computer files between a local host and a remote host, or between two remote hosts; it is based on the Secure Shell (SSH) protocol. The path received from the above query was used to do a secure copy into the local system, for which Runtime.exec() was used (the method has been explained above). All the copied folders were moved into a separate common folder specific to the date for which the code was executed. This common folder was then tarred and gzipped, and the originally created folders were deleted for security purposes. This completes the secure copy of utterance files, and the code proceeds to check whether ‘grammar’ was selected by the user. If it was not, the code terminates; otherwise it continues with the methods described in the following sections.
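The ssh-and-find step above can be sketched as follows. This is a sketch under assumptions, not the project's code: it presumes passwordless key-based ssh is already set up, and the host names and destination directory are placeholders. Note how ssh and the find travel in one invocation, since each spawned process remembers nothing from the previous one.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class UtteranceCopy {
    // ssh and find are issued together in one invocation, because each
    // spawned process runs independently of any earlier one.
    static String[] sshFindCommand(String telbox, String uuid) {
        return new String[]{"ssh", telbox, "find . -name '*" + uuid + "*'"};
    }

    // scp command for the path the find returned; the local destination
    // directory is a placeholder.
    static String[] scpCommand(String telbox, String remotePath, String localDir) {
        return new String[]{"scp", "-r", telbox + ":" + remotePath, localDir};
    }

    // Runs a command and returns its first output line (the found path),
    // or null if the search came up empty.
    static String firstLine(String[] command) throws Exception {
        Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            return r.readLine();
        }
    }
}
```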
  • 32. Page | 32 3.7 Generating Grammar When the user speaks to or interacts with the IVR, the interactions are stored in the database in the form of codes. Grammar generation deals with decoding these and presenting more readable data for analysis. This portion of the code is executed only if the client selected grammar in the expected output division. The initial steps of this module are similar to section 3.6; a brief overview follows. The module receives the day as an argument from the main class, to make sure the code executes for the same time frame as the secure copy module. The first step is to read the configuration file to get values like username, password, datacenter etc.; these values help filter results and provide the parameters needed to connect to databases and execute queries. Secondly, we fetch the parameters for querying the database to get our user and telbox lists, exactly as explained in section 3.3 and from the same database as earlier; these parameters are stored appropriately for further processing. The final step common to the two modules is obtaining the host list, again done via the XML parsing explained in section 3.5; the same list of hosts is obtained using the same concepts. All of these steps have been explained in detail in the previous sections; please refer to them for any queries. The purpose of repeating these tasks is twofold. First, executing these modules depends on user selection, so computing the data in one module and using it in another is not feasible. Secondly, running the common code before either module would increase the burden on the main function, and it is entirely possible that the client selected no option, or no valid option, that the code can process.
In this case, a lot of system time will be wasted in doing database queries and file reads. Hence, they have been attached separately to each module. The following sections will describe the steps that are unique to the grammar generation module.
  • 33. Page | 33 3.7.1 Mysql queries for data The grammar generation process requires a series of Mysql queries, which gather all the data that later helps in generating the xml file. The queries are performed on the database independently of each other. Before any of them run, the particular user ids of interest are filtered out; this is necessary to reduce the huge bulk of data and effectively cut the computation cost. A query is run to isolate the user ids matching the parameters entered by the client, and this data is stored locally in a file. After this, a temporary table is created in the database which loads this file as input. All the queries that follow are performed on the users common to the concerned tables and this file. Query 1: This query is performed on the concerned table (unnamed due to privacy policy) with a natural join against the temporary table. It gives the basic details about a call, such as the number from which the call was made (ani/sip) or the privacy constraints pertaining to the user; these are some of the various annotations associated with the user. All of the data is stored in a HashMap with user id as the key. Query 2: The second query is the transfer query, which tells whether the call was transferred from self-service to assisted service. This data is of utmost importance, as it offers insight into the functional capabilities of the IVR and helps improve the system so that the burden on assisted service can be reduced. This query is performed on the concerned table (unnamed due to privacy policy) with a natural join against the temporary table, gathering details such as transfer id, hang-up time, etc. All of the data is stored in a HashMap with user id as the key. Query 3: The third query gets details about the call.
This query is performed on the concerned table (unnamed due to privacy policy) with a natural join against the temporary table. It provides information such as the reason for the call, the md5 encryption associated with the call etc. All of this information helps the organization understand why a call was made and how efficiently the IVR solved the problem. All of the data is stored in a HashMap with user id as the key.
  • 34. Page | 34 Query 4: The fourth query finds all the attributes associated with the call. It is performed on the concerned table (unnamed due to privacy policy) with a natural join against the temporary table, and gives the basic details about the attributes of a call, such as name, value, date etc. These help in identifying specific features of the call. All of the data is stored in a HashMap with user id as the key. Query 5: The fifth query extracts the rules of the grammar. It comprises two queries, the output of the first serving as input to the second: the type of the grammar is fixed in the first query and used as a query parameter of the second. Both are performed on the concerned table (unnamed due to privacy policy) with a natural join against the temporary table. From the combination we get data such as weight, load_order, call id etc., which help define the rules by which the grammar will be generated. All of the data is stored in a HashMap with user id as the key. Query 6: The sixth query stores the results of the call: details about the words spoken, the configuration to be followed etc. It is performed on the concerned table (unnamed due to privacy policy) with a natural join against the temporary table, obtaining fields such as words, conf etc. that help define the grammar rules. All of the data is stored in a HashMap with user id as the key. Query 7: The seventh query stores the slots of the call, providing details such as the interp id, name, value etc.
It is performed on the concerned table (unnamed due to privacy policy) with a natural join against the temporary table, obtaining details such as the interp id, name and value, which again help define the grammar rules. All of the data is stored in a HashMap with user id as the key.
  • 35. Page | 35 Query 8: The eighth and final query is the vxml log file query, one of the important queries: it gives the log label and log value for generating the xml files. It is performed on the concerned table (unnamed due to privacy policy) with a natural join against the temporary table, obtaining details such as log label, log value etc. that help define the grammar rules. All of the data is stored in a HashMap with user id as the key. All these HashMaps are then combined into one common structure so that the data per user is accessible in an indexed fashion. A user-defined structure, namely Chart, has been used to store all the data, with user ids as keys and entire tables as columns. The column names relate to the queries performed: some columns are formed by merging two or more queries, while others were added directly to the Chart. The structure is essentially a table of tables: the key is the user id; the value is a row spanning columns Col1 through Col8, one per query; each cell of that row houses an inner table whose own columns are properties P1, P2, P3, P4 etc.; and each value ‘v’ is the value stored for a specific user, under a specific column, within a specific property. This Chart is passed on to further functions to generate the xml, as explained in the following sections.
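A minimal sketch of the Chart, the table-of-tables described above. The method names are illustrative, not the project's actual API: the outer key is the user id, the outer column corresponds to a query, and each cell holds a linked list of property-to-value rows.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// A table of tables: outer key = user id, outer column = query name;
// each cell holds a linked list of property->value rows.
public class Chart {
    private final Map<String, Map<String, List<Map<String, String>>>> data = new HashMap<>();

    // Adds one inner-table row for a user under a given column.
    public void addRow(String userId, String column, Map<String, String> row) {
        data.computeIfAbsent(userId, k -> new HashMap<>())
            .computeIfAbsent(column, k -> new LinkedList<>())
            .add(row);
    }

    // Returns the rows stored for a user under a column (empty if none).
    public List<Map<String, String>> rows(String userId, String column) {
        return data.getOrDefault(userId, Map.of())
                   .getOrDefault(column, List.of());
    }
}
```

A linked list of rows per cell matches the report's note that a caller placing multiple calls produces multiple entries under a single user.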
  • 36. Page | 36 3.7.2 XML file generation The next and final step of the process is generating the grammar xml files for the client. The xml files are generated for each user separately; however, they need to be organized together in one folder according to the date on which the request was run. Unix commands to create folders by date were executed using Runtime.exec(); the use of this predefined Java class has already been discussed. The PrintWriter class helps generate files outside the jar from which the code is run, and was used to create an xml file for each user present in the Chart generated above. The data is stored in xml format with roles and ids. There can be multiple entries for a single user, but multiple files for a single user are not useful, so all the data related to a user is stored in a single file; multiple entries arise only when the caller places multiple calls. The Table in each column of the Chart is therefore a linked list of tables, and only a single file is kept per user. This approach is more space efficient and avoids the confusion that multiple files for the same user would create. A snapshot of the generated xml can be seen below (all the sensitive data has been removed):
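Separately from the redacted snapshot, the per-user file writing can be sketched as follows. The element names used here are hypothetical placeholders, since the real schema was removed from the report:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class GrammarWriter {
    // Writes one xml file per user into the date folder; the element
    // names are placeholders standing in for the redacted schema.
    static Path writeUserXml(Path dateDir, String uuid,
                             Map<String, String> fields) throws IOException {
        Files.createDirectories(dateDir);                 // folder per request date
        Path file = dateDir.resolve(uuid + ".xml");       // single file per user
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(file))) {
            out.println("<?xml version=\"1.0\"?>");
            out.println("<user id=\"" + uuid + "\">");
            for (Map.Entry<String, String> e : fields.entrySet()) {
                out.println("  <" + e.getKey() + ">" + e.getValue() + "</" + e.getKey() + ">");
            }
            out.println("</user>");
        }
        return file;
    }
}
```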
  • 37. Page | 37 3.8 Exception Handling An exception (or exceptional event) is a problem that arises during the execution of a program. When an exception occurs, the normal flow of the program is disrupted and the program/application terminates abnormally, which is not desirable; these exceptions therefore have to be handled. Many predefined Java classes are used in this project, for example Runtime.exec(), ProcessBuilder and PrintWriter; since these classes are not user defined and are used to run system commands, most code excerpts are enclosed in try-catch blocks to handle the various errors that might occur. All possible exceptions have been handled by the code: the standard ones, such as FileNotFoundException, NullPointerException or UnsupportedFormatException, as well as non-standard ones, by which I refer to issues like missing configuration files, missing input parameters etc. For the non-standard cases, default values were provided so that the code does not crash if the configuration file is missing. The exceptions also help debug the code when an error has occurred. By default, all exceptions are printed to the console; since the script runs over a huge amount of data, tracking every log on the console is time consuming and exhausting. For better management, therefore, all exceptions are grouped together into a common log file whose location can be configured by the user or left at a default. Any error that occurs is recorded in this file and can be reviewed once the run is complete. The PrintWriter class was again used to achieve this. However, if the default path for logs is missing and no configurable path is provided, the code will not proceed any further and will alert the user about the problem.
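The log-file handling above can be sketched like this; both paths are placeholders for the real deployment values:

```java
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ErrorLog {
    // The log location is configurable, with a default fallback; both
    // paths here are placeholders for the real deployment values.
    static Path resolveLogPath(String configured, String defaultPath) {
        return Paths.get(configured != null ? configured : defaultPath);
    }

    // Appends one exception to the common log file instead of the console.
    static void log(Path logFile, Exception e) {
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(logFile,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
            e.printStackTrace(out);
        } catch (Exception io) {
            // If even the log file is unusable, stop and alert the user.
            throw new IllegalStateException("log path unavailable: " + logFile, io);
        }
    }
}
```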
  • 38. Page | 38 3.9 Email Notification As mentioned in section 3.8, the exceptions generated are stored in a file. Instead of manually checking the file for errors, an automated code script emails the errors to the concerned person. This script was written in a separate module and built into a separate jar to be run automatically. It works as follows: it searches for the log file in the configured directory, reads its contents after locating it, and mails the contents if any error is recorded in it. This was accomplished using the mailif utility. The mailif program reads its standard input and, if there is data, sends it in the body of a message to the given recipient addresses. This is useful in crontab entries for mailing a cron job's output to people so that problems are known and can be fixed; mail is sent only when the job actually produces output, so noise is kept to a minimum. The built-in utility itself takes care not to send any email if the file's contents are empty. Thanks to this, errors from the code run are sent directly to the concerned person, and appropriate measures can be taken to rectify them. A crontab entry was added to run the email-generation code midway through the main code run.
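The read-and-forward step of the separate jar can be sketched as follows; mailif itself handles the mailing, and this sketch only prints the log contents to standard output so that the crontab entry can pipe them into mailif. The default path is a placeholder:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LogMailer {
    // Returns the log's contents only when something was actually logged;
    // printing this to stdout lets a crontab entry pipe it into mailif,
    // which sends no mail when its input is empty.
    static String contentsIfAny(Path logFile) throws IOException {
        if (!Files.exists(logFile)) return "";
        String body = new String(Files.readAllBytes(logFile)).trim();
        return body.isEmpty() ? "" : body;
    }

    public static void main(String[] args) throws IOException {
        // The path comes from the crontab entry; the default is a placeholder.
        Path log = Paths.get(args.length > 0 ? args[0] : "/var/tmp/utterance/run.log");
        System.out.print(contentsIfAny(log));
    }
}
```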
  • 39. Page | 39 4. Conclusion Utterance capture management is of utmost priority for the system, as valuable data is lost in case of a failure. The redesign makes the system more robust and reliable for utterance capture. As already mentioned, the new design runs separately on each datacenter, which reduces the network delay the current system faces. A new front-end feature has also been added that makes it easier for the client to update and configure requirements: the client will no longer need multiple portals for configuration and fetching. A single front-end module provides all the facilities in one place, which reduces maintenance costs and is easier for the client. The new system is still in the making; as we go deeper into the code we regularly find better ways to optimize it and make it even more efficient than originally planned.
  • 40. Page | 40 5. Appendix The following process flow shows the execution of the program. START: from the Platform Central home page, click on Utterance Capture Request (the cog wheel representing utterance). A new page takes inputs; fill in values such as request_name, time, dnis, ani, etc. and select the output type (Grammar/Utterance/Devlog). The flow waits until “SEARCH” is pressed.
  • 41. Page | 41 Once “SEARCH” is pressed, the data entered is passed to provdb02 and put into 4 tables: utterance_details, dnis_requests, ani_requests and logtag_requests. The backend then gets the system date, searches for requests in the utterance_details table in provdb, and fetches the dnis, ani and logtag for those requests, along with the values of the selected outputs. If ‘Utterance’ = true: read the configuration file to get db owner, db user, password, dbtype, failover and data center; get the hosts from the cdblog url; for each host, query for the telbox and uuid for that day and create a map; for each pair, find the folder path for the uuid and perform scp. Then, if ‘Grammar’ = false, STOP; otherwise continue.
  • 42. Page | 42 If ‘Grammar’ = true: read the configuration file to get db owner, db user, password, dbtype, failover and data center; get the hosts from the cdblog url and for each host run a list of mysql queries; get the uuids from calls_date for the entered dnis, ani and logtag and create a temporary table using this data; execute the eight queries mentioned in the text above (section 3.7.1); generate one hashmap with all these query results, keeping uuid as the key; generate the XML file with all this data. STOP.
  • 43. Page | 43 6. References  http://www.247-inc.com/  http://stackoverflow.com/  http://www.tutorialspoint.com/java/lang/runtime_exec.htm  https://docs.oracle.com/javase/7/docs/api/java/lang/ProcessBuilder.html  http://www.tutorialspoint.com/java/java_multithreading.htm  http://www.tutorialspoint.com/json/json_java_example.htm  http://stackoverflow.com/questions/2514439/how-do-i-run-ssh-commands-on-remote-system-using-java