International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 03 Issue: 02 | Feb-2016 www.irjet.net p-ISSN: 2395-0072
© 2016, IRJET | Impact Factor value: 4.45 | ISO 9001:2008 Certified Journal | Page 914
Automatic Subtitle Generation for Sound in Videos
Kanika Mishra1, Prerak Bhagat2, Prof. Atiya Kazi3
Kanika Mishra, Department of Information Technology, University of Mumbai,
Finolex Academy of Management & Technology, Ratnagiri, Maharashtra.
Prerak Bhagat, Department of Information Technology, University of Mumbai,
Finolex Academy of Management & Technology, Ratnagiri, Maharashtra.
Prof. Atiya Kazi, Department of Information Technology, University of Mumbai,
Finolex Academy of Management & Technology, Ratnagiri, Maharashtra.
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract – The last two decades have seen vast growth in the amount
of video content generated across various platforms and spanning
several genres. Apart from the convenient availability of cheap
portable media devices, the emergence of dedicated websites tailored
specifically for storing and sharing videos has increased the value
of audio-visual content in the eyes of the general populace.
However, as with any technology, the emergence of audio-visual
content as the preferred method of information sharing has brought
certain hurdles with it. Prominent among these is the existence of
numerous accents and dialects of any particular language. Another is
the sheer number of languages in which content is generated. In
addition, the hearing-impaired audience cannot receive the audio part
of the content and is thus, in most cases, unable to comprehend the
information presented completely or accurately. This situation can be
rectified with the aid of audio transcripts and the necessary
translations. The software currently available mostly requires active
user participation and also shows a notable lack of accuracy. The
sheer quantity of audio-video information therefore makes it
impossible to provide transcripts for all relevant data encapsulated
in such formats. Hence, the need for an automated approach is keenly
felt.
This paper analyses a specific path to automatically generating
subtitles for videos using Google's free APIs. Due to a lack of the
necessary resources, the translation component has not been included
in this paper. We propose a three-step model to realize our process.
The first stage extracts the audio to be transcribed from the video
file and, if need be, converts it into a format compatible with the
second stage. The second stage uses a speech recognition engine to
convert the information from audio format to text format. The third
and final stage encodes the generated text into a subtitle format,
which includes adding time frames and the necessary pauses and
punctuation marks.
The results of this work have been encouraging, though there remains
considerable room for improvement and adjustment.
Key Words: Audio Extraction, Speech Recognition, Subtitle
Generation.
1. INTRODUCTION
Video is one of the most popular forms of multimedia used on PCs. It
is therefore essential to make the information carried by the sound in
videos comprehensible to people with hearing impairments. The most
natural way to do so is through subtitles. However, manual subtitle
creation is tedious and requires the user's active participation.
Hence, automatic subtitle generation is a valid subject of research.
2. LITERATURE SURVEY
This section gives an overview of the concepts we gathered from papers
on automatic subtitle generation for sound in videos. Many programming
languages can be used to create AutoSubGen. A brief survey of online
sources suggests a
focus on C/C++ and Java. We now analyze the positive points of the two
languages based on the research of J.P. Lewis and Ulrich Neumann [8].
On the one hand, C++ provides remarkable advantages in speed,
cross-system performance, and well-tested packages. On the other hand,
Java offers an intuitive syntax, portability across multiple operating
systems, and trusted libraries. Both languages are used in the Sphinx
speech recognition engines. Java provides the Java Media Framework
(JMF) API, which allows developers to deal with media tracks (audio
and video) in an efficient way. In Lewis's article, Java's performance
appears quite similar to that of C/C++. Moreover, the article argues
that, in theory, Java's performance could be higher: the absence of
pointers, the use of an efficient garbage collector, and run-time
compilation play a key role in Java, and their development may
eventually allow it to exceed C++ performance.
The features of the JMF API facilitate audio extraction. We present an
overview of the JMF API and look more closely at the functions
relevant to audio extraction. A useful tutorial [4] is available for
getting started with JMF, and it is used in the following parts. The
official documentation is also a pertinent resource [4]. Furthermore,
a complete book describing the JMF API [3] was published in 1998. JMF
gives applications the capability to output multimedia content,
capture audio and video through a microphone and camera, stream media
in real time over the Internet, process media, and store media in a
file.
3. METHODOLOGY
3.1 Audio Extraction
This module aims to output an audio file from a media file. An
activity diagram of the module is presented in Figure 1. It takes as
input a file URL and, optionally, the desired audio format of the
output. Next, it checks the file content and creates a list of the
individual tracks composing the initial media file. An exception is
thrown if the file content is irrelevant. Once the tracks are
separated, the processor analyzes each track, selects the first audio
track found and discards the rest. Finally, the audio track is written
to a file in the default or desired format. As regards audio formats,
Sphinx at the moment principally supports WAV or RAW files. Hence,
obtaining the preferred format can be complicated: it often requires
various treatments and conversions. The audio extraction module
provides a way to extract audio from a media file but does not offer
conversion tools. When conversion is needed, it is advisable to use
the facilities of VLC to obtain suitable audio material. The module
uses the Java Media Framework (JMF) for audio extraction because JMF
provides a platform-neutral framework for handling multimedia [4].
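As an illustration of the format constraint discussed above, the following minimal Python sketch inspects the header of a WAV file before it is passed to a recognizer. It is not part of the described Java system. The function names `describe_wav` and `matches_target` are hypothetical, and the target of 16 kHz, mono, 16-bit audio is an assumption made for the example; the actual requirement depends on the acoustic model in use.

```python
import wave


def describe_wav(path):
    """Read basic header fields of a PCM WAV file.

    Raises wave.Error if the file is not a WAV file that the standard
    library can parse.
    """
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": sample_rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": wav.getnframes() / sample_rate,
        }


def matches_target(info, sample_rate_hz=16000, channels=1, bits_per_sample=16):
    """Check the header against an assumed target format (illustrative defaults)."""
    return (
        info["sample_rate_hz"] == sample_rate_hz
        and info["channels"] == channels
        and info["bits_per_sample"] == bits_per_sample
    )
```

A file that fails this check would need to be converted by an external tool such as VLC, as suggested above, before recognition is attempted.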
Java Media Framework (JMF): JMF is used for handling streaming media
in Java programs. JMF enables Java programs to:
• Present multimedia content,
• Capture audio through a microphone and video through a camera,
• Stream media in real time over the Internet,
• Process media (such as changing the media format or adding special
effects),
• Store media in a file.
Figure 1. Activity Diagram for Audio Extraction.
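The track-selection step of the audio extraction module can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the JMF-based implementation: the `Track` record, the `NoAudioTrackError` exception and the function `select_first_audio_track` are hypothetical names introduced only for this example, and a real media container would be demultiplexed by a library such as JMF.

```python
from dataclasses import dataclass


@dataclass
class Track:
    """Hypothetical description of one track found in a media file."""
    index: int
    kind: str   # e.g. "audio" or "video"
    codec: str


class NoAudioTrackError(ValueError):
    """Raised when the media file contains nothing usable for transcription."""


def select_first_audio_track(tracks):
    """Return the first audio track; all other tracks are simply ignored."""
    for track in tracks:
        if track.kind == "audio":
            return track
    raise NoAudioTrackError("media file contains no audio track")
```

For a file holding one video track followed by two audio tracks, the function returns the first of the two audio tracks, which mirrors the behaviour described for the module.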
3.2 Speech Recognition
Some speech recognition (SR) systems use "speaker-independent speech
recognition", while others use "training", in which a single speaker
reads sections of text into the SR system. These systems analyze the
person's specific voice and use it to fine-tune the recognition of
that individual's speech, which results in more accurate transcripts.
Systems that do not use training are called "speaker-independent"
systems. Systems that
make use of such training are called "speaker-dependent" systems. In
our project we used the Line playback device, which feeds the audio to
the Google Transcriptor in the Chrome web app, where the text is
generated. Google Transcriptor receives the voice from the audio file
provided by the user through the Line playback device, converts that
audio into text, and displays it on the front end. We can then create
a .txt or .rtf file from that text. The Sphinx systems [6] have been
developed for nearly two decades and have proved their reliability
within the several ASR systems into which they were integrated. It is
therefore natural that we opted for the Sphinx-4 decoder. Sphinx-4 is
written entirely in Java [7], which makes it fully portable and gives
it a modular architecture that allows configurations to be modified
with ease. We aim to take advantage of the features of Sphinx-4 in
order to satisfy the needs of the speech recognition module of
Automatic Subtitle Generation [5]. The module is expected to select
the most suitable models considering the audio and the parameters
(category, length, etc.) given as input, and thereby generate a
precise transcript of the audio speech. Figure 2 summarizes the
organization of the speech recognition module. Speech recognition
uses Hidden Markov Models (HMMs) as its underlying algorithm.
Figure 2. Activity Diagram for Speech Recognition.
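To make the role of HMMs more concrete, the sketch below shows textbook Viterbi decoding, the standard way of finding the most likely hidden state sequence for a series of observations. It is a generic Python illustration and not the decoder used by Sphinx-4; the function name `viterbi` and its dictionary-based parameters are assumptions made for this example.

```python
import math


def _log(p):
    """Logarithm that maps a zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log-probability, path) of the most likely state sequence.

    start_p[s], trans_p[prev][s] and emit_p[s][obs] are plain probabilities.
    """
    # Each column maps a state to (best log-score ending there, previous state).
    columns = [{
        s: (_log(start_p[s]) + _log(emit_p[s][observations[0]]), None)
        for s in states
    }]
    for obs in observations[1:]:
        previous = columns[-1]
        column = {}
        for s in states:
            score, best_prev = max(
                (previous[p][0] + _log(trans_p[p][s]), p) for p in states
            )
            column[s] = (score + _log(emit_p[s][obs]), best_prev)
        columns.append(column)

    # Backtrack from the best final state to recover the path.
    last = max(states, key=lambda s: columns[-1][s][0])
    path = [last]
    for column in reversed(columns[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return columns[-1][last][0], path
```

In a real recognizer the hidden states correspond to sub-phonetic units and the observations to acoustic feature vectors, so the emission probabilities come from a trained acoustic model rather than a small lookup table.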
3.3 Subtitle Generation
This module is expected to take a list of words and their respective
speech time-frames from the speech recognition module and produce a
.srt subtitle file. To do so, the module must scan the list of words
and use silence (SIL) utterances as the boundary between two
consecutive sentences. Figure 3 offers a visual representation of the
steps to follow for subtitle generation. However, we face a
limitation: the system cannot determine punctuation, since doing so
involves much more speech analysis and a deeper design.
Figure 3. Activity Diagram for Subtitle Generation.
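The conversion from a timed word list to subtitle cues can be sketched as follows. This is a minimal Python illustration rather than the module itself. It assumes that each recognized word arrives as a (text, start, end) tuple with times in seconds and that silence is reported with the token SIL; the function names `format_timestamp` and `words_to_srt` are hypothetical.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words, silence_token="SIL"):
    """Group timed words into cues, splitting at every silence token."""
    cues, current = [], []
    for text, start, end in words:
        if text == silence_token:
            if current:            # a silence closes the running sentence
                cues.append(current)
                current = []
        else:
            current.append((text, start, end))
    if current:                    # flush the last sentence
        cues.append(current)

    blocks = []
    for number, cue in enumerate(cues, start=1):
        line = " ".join(text for text, _, _ in cue)
        first_start, last_end = cue[0][1], cue[-1][2]
        blocks.append(
            f"{number}\n"
            f"{format_timestamp(first_start)} --> {format_timestamp(last_end)}\n"
            f"{line}\n"
        )
    return "\n".join(blocks)       # a blank line separates consecutive cues
```

Each cue runs from the start of its first word to the end of its last word. No punctuation is inserted, in line with the limitation noted above.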
4. REQUIREMENTS OF PROPOSED SYSTEMS
The requirements of our proposed system are listed in Table I.
TABLE I. REQUIREMENT SPECIFICATIONS
No | Technology used | Description
1 | 1 GB RAM | Storing
2 | Java Speech API | Allows the incorporation of speech technology into user interfaces for Java-based applets and applications. This API defines a cross-platform interface to support command-and-control recognizers and speech synthesizers. Sun [12] has also developed the JSGF (Java Speech Grammar Format) to supply a cross-platform grammar for speech recognizers.
3 | JDK | For writing and running Java applets and applications.
4 | Apache Tomcat | To manage sessions as well as applications across the network.
5 | MySQL Database | Storing audio- and video-related text files.
6 | NetBeans | The IDE used to run the whole project.
5. CONCLUSIONS
A complete system including all the required modules could not be
realized, since audio conversion needs more resources. VLC gave a
suitable solution, but a custom-coded Java component is expected in
future projects so that portability and installation of the system
become uncomplicated. Nonetheless, the expected output for each phase
has been reached. The audio extraction module supplies a suitable
audio format that can be used by the speech recognition module. The
speech recognition module produces a list of recognized words and
their corresponding time-frames in the audio content, although
accuracy cannot be guaranteed. This list is then used by the subtitle
generation module to create a standard subtitle file that is readable
by the most common media players available.
At present, subtitles are generated only for a specific audio file
format; in future, this restriction will be removed. English subtitles
for regional languages will be generated, and the accuracy of the
system will be improved. The currently offline system will become
web-based, so that users can access it and generate subtitles for
their videos conveniently. Overall, the system will be accurate, less
time-consuming, free of language barriers, and web-based.
REFERENCES
[1] Boris Guenebaut, "Automatic Subtitle Generation for Sound in
Videos", Tapro 02, University West, pp. 35, 2009.
[2] Zoubin Ghahramani, "An introduction to hidden Markov models and
Bayesian networks", World Scientific Publishing Co., Inc., River
Edge, NJ, USA, pp. 30, 2001.
[3] Robert Gordon, Stephen Talley, and Rob Gordon, Essential JMF: Java
Media Framework. Prentice Hall PTR, pp. 17, 1998.
[4] Java(TM) Media Framework API Guide, 1999. URL:
http://java.sun.com/javase/technolog/desktp/mdia/jmf/2.1.1/guide/index.html
[5] Frederick Jelinek, Statistical Methods for Speech Recognition. MIT
Press, pp. 7, 1999.
[6] L. Besacier, C. Bergamini, D. Vaufreydaz, and E. Castelli, "The
effect of speech and audio compression on speech recognition
performance". In IEEE Multimedia Signal Processing Workshop, Cannes,
France, 2001. URL:
http://hal.archives-ouvertes.fr/docs/00/32/61/65/PDF/Besacier01b.pdf
[7] Sphinx-4: a speech recognizer written entirely in the Java(TM)
programming language, pp. 5, 1999-2008. URL:
http://cmusphinx.sourceforge.net/sphinx4/
[8] J.P. Lewis and Ulrich Neumann, "Performance of Java versus C++".
URL: http://www.idiom.com/~zilla/Computer/javaCbenchmark.html

More Related Content

Similar to Automatic Subtitle Generation for Sound in Videos

IRJET- Voice to Code Editor using Speech Recognition
IRJET- Voice to Code Editor using Speech RecognitionIRJET- Voice to Code Editor using Speech Recognition
IRJET- Voice to Code Editor using Speech RecognitionIRJET Journal
 
Review on content based video lecture retrieval
Review on content based video lecture retrievalReview on content based video lecture retrieval
Review on content based video lecture retrievaleSAT Journals
 
Audio computing Image to Text Synthesizer - A Cutting-Edge Content Generator ...
Audio computing Image to Text Synthesizer - A Cutting-Edge Content Generator ...Audio computing Image to Text Synthesizer - A Cutting-Edge Content Generator ...
Audio computing Image to Text Synthesizer - A Cutting-Edge Content Generator ...IRJET Journal
 
Playing sound files in labview using audacity toolkit
Playing sound files in labview using audacity toolkitPlaying sound files in labview using audacity toolkit
Playing sound files in labview using audacity toolkiteSAT Journals
 
IRJET- Transcription of Conferences
IRJET- Transcription of ConferencesIRJET- Transcription of Conferences
IRJET- Transcription of ConferencesIRJET Journal
 
Pacify based video retrieval system
Pacify based video retrieval systemPacify based video retrieval system
Pacify based video retrieval systemeSAT Journals
 
IRJET- Audio Genre Classification using Neural Networks
IRJET-  	  Audio Genre Classification using Neural NetworksIRJET-  	  Audio Genre Classification using Neural Networks
IRJET- Audio Genre Classification using Neural NetworksIRJET Journal
 
Voice Recognition System for Automobile Safety.
Voice Recognition System for Automobile Safety.Voice Recognition System for Automobile Safety.
Voice Recognition System for Automobile Safety.IRJET Journal
 
IRJET-Raspberry Pi based Reader for Blind People
IRJET-Raspberry Pi based Reader for Blind PeopleIRJET-Raspberry Pi based Reader for Blind People
IRJET-Raspberry Pi based Reader for Blind PeopleIRJET Journal
 
IRJET - Automation in Python using Speech Recognition
IRJET -  	  Automation in Python using Speech RecognitionIRJET -  	  Automation in Python using Speech Recognition
IRJET - Automation in Python using Speech RecognitionIRJET Journal
 
Voice Based Email System For Visual Impaired Using AI
Voice Based Email System For Visual Impaired Using AIVoice Based Email System For Visual Impaired Using AI
Voice Based Email System For Visual Impaired Using AIIRJET Journal
 
VIDEO TO TEXT SUMMARIZER USING AI.pdf
VIDEO TO TEXT SUMMARIZER USING AI.pdfVIDEO TO TEXT SUMMARIZER USING AI.pdf
VIDEO TO TEXT SUMMARIZER USING AI.pdfFreeFire293813
 
A web based approach: Acronym Definition Extraction
A web based approach: Acronym Definition ExtractionA web based approach: Acronym Definition Extraction
A web based approach: Acronym Definition ExtractionIRJET Journal
 
Efficient Intralingual Text To Speech Web Podcasting And Recording
Efficient Intralingual Text To Speech Web Podcasting And RecordingEfficient Intralingual Text To Speech Web Podcasting And Recording
Efficient Intralingual Text To Speech Web Podcasting And RecordingIOSR Journals
 

Similar to Automatic Subtitle Generation for Sound in Videos (20)

IRJET- Voice to Code Editor using Speech Recognition
IRJET- Voice to Code Editor using Speech RecognitionIRJET- Voice to Code Editor using Speech Recognition
IRJET- Voice to Code Editor using Speech Recognition
 
Review on content based video lecture retrieval
Review on content based video lecture retrievalReview on content based video lecture retrieval
Review on content based video lecture retrieval
 
IRJET- Vocal Code
IRJET- Vocal CodeIRJET- Vocal Code
IRJET- Vocal Code
 
Audio computing Image to Text Synthesizer - A Cutting-Edge Content Generator ...
Audio computing Image to Text Synthesizer - A Cutting-Edge Content Generator ...Audio computing Image to Text Synthesizer - A Cutting-Edge Content Generator ...
Audio computing Image to Text Synthesizer - A Cutting-Edge Content Generator ...
 
Sub1577
Sub1577Sub1577
Sub1577
 
Playing sound files in labview using audacity toolkit
Playing sound files in labview using audacity toolkitPlaying sound files in labview using audacity toolkit
Playing sound files in labview using audacity toolkit
 
IRJET- Transcription of Conferences
IRJET- Transcription of ConferencesIRJET- Transcription of Conferences
IRJET- Transcription of Conferences
 
Pacify based video retrieval system
Pacify based video retrieval systemPacify based video retrieval system
Pacify based video retrieval system
 
IRJET- Audio Genre Classification using Neural Networks
IRJET-  	  Audio Genre Classification using Neural NetworksIRJET-  	  Audio Genre Classification using Neural Networks
IRJET- Audio Genre Classification using Neural Networks
 
Video Summarization
Video SummarizationVideo Summarization
Video Summarization
 
Voice Recognition System for Automobile Safety.
Voice Recognition System for Automobile Safety.Voice Recognition System for Automobile Safety.
Voice Recognition System for Automobile Safety.
 
Part 6
Part 6Part 6
Part 6
 
IRJET-Raspberry Pi based Reader for Blind People
IRJET-Raspberry Pi based Reader for Blind PeopleIRJET-Raspberry Pi based Reader for Blind People
IRJET-Raspberry Pi based Reader for Blind People
 
IRJET - Automation in Python using Speech Recognition
IRJET -  	  Automation in Python using Speech RecognitionIRJET -  	  Automation in Python using Speech Recognition
IRJET - Automation in Python using Speech Recognition
 
Voice Based Email System For Visual Impaired Using AI
Voice Based Email System For Visual Impaired Using AIVoice Based Email System For Visual Impaired Using AI
Voice Based Email System For Visual Impaired Using AI
 
VIDEO TO TEXT SUMMARIZER USING AI.pdf
VIDEO TO TEXT SUMMARIZER USING AI.pdfVIDEO TO TEXT SUMMARIZER USING AI.pdf
VIDEO TO TEXT SUMMARIZER USING AI.pdf
 
A web based approach: Acronym Definition Extraction
A web based approach: Acronym Definition ExtractionA web based approach: Acronym Definition Extraction
A web based approach: Acronym Definition Extraction
 
Sirio Orione and Pan
Sirio Orione and PanSirio Orione and Pan
Sirio Orione and Pan
 
Efficient Intralingual Text To Speech Web Podcasting And Recording
Efficient Intralingual Text To Speech Web Podcasting And RecordingEfficient Intralingual Text To Speech Web Podcasting And Recording
Efficient Intralingual Text To Speech Web Podcasting And Recording
 
Arneb
ArnebArneb
Arneb
 

More from IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTUREIRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsIRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASIRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProIRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemIRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesIRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web applicationIRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignIRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...IRJET Journal
 

More from IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
Automatic Subtitle Generation for Sound in Videos

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 03 Issue: 02 | Feb-2016 www.irjet.net p-ISSN: 2395-0072
© 2016, IRJET | Impact Factor value: 4.45 | ISO 9001:2008 Certified Journal | Page 914
Automatic Subtitle Generation for Sound in Videos
Kanika Mishra1, Prerak Bhagat2, Prof. Atiya Kazi3
Kanika Mishra, Department of Information Technology, University of Mumbai, Finolex Academy of Management & Technology, Ratnagiri, Maharashtra.
Prerak Bhagat, Department of Information Technology, University of Mumbai, Finolex Academy of Management & Technology, Ratnagiri, Maharashtra.
Prof. Atiya Kazi, Department of Information Technology, University of Mumbai, Finolex Academy of Management & Technology, Ratnagiri, Maharashtra.
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract – The last two decades have seen vast growth in the amount of video content generated across various platforms and spanning several genres. Apart from the convenient availability of cheap portable media devices, the emergence of dedicated websites tailored specifically for the storage and sharing of videos has increased the value of audio-visual content in the eyes of the general populace.
However, as with any technology, the emergence of audio-visual content as the preferred method of information sharing has brought with it certain hurdles. Prominent among those is the existence of numerous accents and dialects of any particular language. Another is the sheer number of languages in which the content is generated. In addition, the hearing-impaired audience is unable to receive the audio part of the content and is thus, in most cases, unable to completely or accurately comprehend the information being presented. This situation can be rectified with the aid of audio transcripts and the necessary translations.
The software currently available requires, for the most part, active user participation and also shows a notable lack of accuracy. The sheer quantity of audio-video information therefore makes it impossible to provide transcripts for all relevant data encapsulated in such formats. Hence, the need for an automated approach is strongly felt. This paper analyses a specific path to automatically generating subtitles for videos using Google's free APIs. Due to a lack of necessary resources, the translation component has not been included in this paper.
We propose a three-step model to realize our process. The first stage extracts the audio to be transcribed from the video file and, if need be, converts it into a format compatible with our second stage. The second stage uses a speech recognition engine to convert the information from audio format to text format. The third and final stage encodes the generated text into a subtitle format, which includes adding time frames and the necessary pauses and punctuation marks. The results have been encouraging, though there remains considerable room for improvements and adjustments.
Key Words: Audio Extraction, Speech Recognition, Subtitle Generation.

1. INTRODUCTION
Video is one of the most popular forms of multimedia used on PCs. It is therefore essential to make the information carried as sound in videos comprehensible to people with auditory problems. The most natural way to do so is through subtitles. However, manual subtitle creation is a tedious activity that requires the active participation of the user. Hence, automatic subtitle generation is a valid subject of research.

2. LITERATURE SURVEY
This section gives an overview of the concepts we gathered from papers on automatic subtitle generation for sound in videos. Many programming languages could be used in the creation of AutoSubGen. A brief study on the internet suggests a
focus on C/C++ and Java. We now analyze the positive points of the two languages based on the research of J.P. Lewis and Ulrich Neumann [8]. On the one hand, C++ provides remarkable advantages in speed, cross-system performance and well-tested packages. On the other hand, Java offers an intuitive syntax, portability across multiple operating systems and trusted libraries. Both are used in the Sphinx speech recognition engine. Java provides the Java Media Framework (JMF) API, which allows developers to deal with media tracks (audio and video) in an efficient way. According to Lewis's articles, Java performance is quite similar to that of C/C++. Moreover, in theory, Java performance should be higher: the absence of pointers, the use of an efficient garbage collector and run-time compilation play a key role in Java, and their further development should soon allow Java to go beyond C++ performance.
The features of the JMF API facilitate audio extraction. We present an overview of the JMF API and look more closely at the functions relevant to audio extraction. There is a useful tutorial [4] for getting started with JMF, and it is used in the following parts. The official documentation is also a pertinent resource [4]. Furthermore, a complete book describing the JMF API [3] was published in 1998. JMF gives applications the capability to output multimedia content, capture audio and video through a microphone and camera, stream media in real time over the Internet, process media and store media in a file.

3. METHODOLOGY
3.1 Audio Extraction
This module aims to output an audio file from a media file. An activity diagram of the module is presented in Figure 1. It takes as input a file URL and the desired audio format of the output.
Next, it checks the file content and creates a list of the individual tracks composing the initial media file. An exception is thrown if the file content is irrelevant. Once the tracks are separated, the processor analyzes each track, selects the first audio track found and discards the rest. Finally, the audio track is written to a file in the default or requested format.
At the moment, Sphinx principally supports WAV or RAW files. Hence, obtaining the preferred format is complicated and often requires various treatments and conversions. The audio extraction module provides a way to extract audio from a media file but does not offer conversion tools. When conversion is needed, it is advisable to use the facilities of VLC to obtain suitable audio material. The module relies on the Java Media Framework (JMF) for audio extraction because JMF provides a platform-neutral framework for handling multimedia [4].
Java Media Framework (JMF): JMF is used for handling streaming media in Java programs. JMF enables Java programs to:
• Present multimedia content,
• Capture audio through a microphone and video through a camera,
• Stream media over the Internet in real time,
• Process media (such as changing the media format or adding special effects),
• Store media in a file.
Figure 1. Activity Diagram for Audio Extraction.

3.2 Speech Recognition
Some speech recognition (SR) systems use "speaker-independent speech recognition", while others use "training", in which a single speaker reads sections of text into the SR system. The latter systems analyze the person's specific voice and use it to fine-tune the recognition of that individual's speech, which results in more accurate transcripts. Systems that do not use training are called "speaker-independent" systems. Systems that
use such training are called "speaker-dependent" systems.
In our project we used the LINE playback device, which routes the audio to the Google transcriptor in the Chrome web app, where the text is generated. The Google transcriptor receives the voice from the audio file provided by the user through the LINE playback device, converts that audio into text and displays it on the front end. A .txt or .rtf file can then be created from that text.
The various Sphinx systems [6] have been developed for nearly two decades and have proved their reliability within the several ASR systems into which they were integrated. It was therefore natural to opt for the Sphinx-4 decoder. Sphinx-4 is written entirely in Java [7], which makes it fully portable and gives it a modular architecture that allows configurations to be modified with ease. The architecture overview was shown in the background section. We aim to take advantage of Sphinx-4 features in order to satisfy the needs of the speech recognition module of Automatic Subtitle Generation [5]. The module is expected to select the most suitable models considering the audio and the parameters (category, length, etc.) given as input, and thereby generate a precise transcript of the audio speech. Figure 2 summarizes the organization of the speech recognition module. Speech recognition is based on Hidden Markov Models (HMMs).
Figure 2. Activity Diagram for Speech Recognition.

3.3 Subtitle Generation
This module is expected to get a list of words and their respective speech time-frames from the speech recognition module and then produce a .srt subtitle file.
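The conversion from a timed word list to .srt entries can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the class `SrtSketch`, the type `TimedWord` and the method `toSrt` are hypothetical names, times are assumed to be in milliseconds, and a recognizer token with the literal text "SIL" is assumed to mark silence and therefore a sentence boundary. No punctuation is produced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper that turns timed words into SubRip (.srt) text. */
final class SrtSketch {

    /** A recognized token with start and end times in milliseconds (assumed unit). */
    static final class TimedWord {
        final String text;
        final long startMs;
        final long endMs;

        TimedWord(String text, long startMs, long endMs) {
            this.text = text;
            this.startMs = startMs;
            this.endMs = endMs;
        }
    }

    /** Formats milliseconds as the SubRip timestamp HH:MM:SS,mmm. */
    static String timestamp(long ms) {
        long hours = ms / 3_600_000;
        long minutes = (ms / 60_000) % 60;
        long seconds = (ms / 1_000) % 60;
        return String.format(Locale.ROOT, "%02d:%02d:%02d,%03d",
                hours, minutes, seconds, ms % 1_000);
    }

    /** Groups the words found between "SIL" tokens into numbered subtitle entries. */
    static String toSrt(List<TimedWord> words) {
        StringBuilder srt = new StringBuilder();
        List<TimedWord> sentence = new ArrayList<>();
        int index = 1;
        for (TimedWord word : words) {
            if (!"SIL".equals(word.text)) {
                sentence.add(word);
            } else if (!sentence.isEmpty()) {
                // Silence closes the current sentence (assumed boundary rule).
                appendEntry(srt, index++, sentence);
                sentence.clear();
            }
        }
        if (!sentence.isEmpty()) {
            appendEntry(srt, index, sentence);
        }
        return srt.toString();
    }

    /** Writes one entry: index, time range, text, blank line. */
    private static void appendEntry(StringBuilder srt, int index, List<TimedWord> sentence) {
        StringBuilder line = new StringBuilder();
        for (TimedWord word : sentence) {
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word.text);
        }
        long start = sentence.get(0).startMs;
        long end = sentence.get(sentence.size() - 1).endMs;
        srt.append(index).append('\n')
           .append(timestamp(start)).append(" --> ").append(timestamp(end)).append('\n')
           .append(line).append("\n\n");
    }

    public static void main(String[] args) {
        List<TimedWord> words = Arrays.asList(
                new TimedWord("hello", 0, 400),
                new TimedWord("world", 450, 900),
                new TimedWord("SIL", 900, 1500),
                new TimedWord("goodbye", 1500, 2100));
        System.out.print(toSrt(words));
    }
}
```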
To do so, the module must look at the list of words and use silence (SIL) utterances as boundaries between two consecutive sentences. Figure 3 offers a visual representation of the steps to follow for subtitle generation for sound in videos. However, we face a limitation: our system is not able to determine punctuation, since that involves much more speech analysis and a deeper design.
Figure 3. Activity Diagram for Subtitle Generation.

4. REQUIREMENTS OF PROPOSED SYSTEMS
The requirements of our proposed system are listed in Table I.
TABLE I. REQUIREMENT SPECIFICATIONS
No. | Technology used | Description
1 | 1 GB RAM | Storage.
2 | Java Speech API | Allows the incorporation of speech technology into user interfaces for Java-based applets and applications. The API defines a cross-platform interface to command-and-control recognizers and speech synthesizers. Sun [12] has also developed JSGF (Java Speech Grammar Format) to supply a cross-platform grammar format for speech recognizers.
3 | JDK | For writing and running Java applets and applications.
4 | Apache Tomcat | To manage sessions as well as applications across the network.
5 | MySQL Database | Storing audio- and video-related text files.
6 | NetBeans | The IDE used to run the whole project.

5. CONCLUSIONS
A complete system including all the required modules could not be realized, since audio conversion needs more resources. VLC gave a suitable solution, but a custom-coded Java component is expected in future projects so that portability and installation of the system become uncomplicated. Nonetheless, the expected output for each phase has been reached. The audio extraction module supplies a suitable audio format that can be used by the intended speech recognition module. Speech recognition produces a list of recognized words and their corresponding time-frames in the audio content, although accuracy cannot be guaranteed. That list is then used by the subtitle generation module to create a standard subtitle file that is readable by the most common media players. At present, subtitles are generated only for a specific audio file format, but in future this format restriction will be removed.
English subtitles for regional languages will be generated, and the accuracy of the project will be improved. The currently offline system will become web-based, so that users can access it and generate subtitles for their videos conveniently. Overall, the system is intended to be accurate, less time-consuming, free of language barriers and web-based.

REFERENCES
[1] Boris Guenebaut, “Automatic Subtitle Generation for Sound in Videos”, Tapro 02, University West, pp. 35, 2009.
[2] Zoubin Ghahramani, “An introduction to hidden Markov models and Bayesian networks”, World Scientific Publishing Co., Inc., River Edge, NJ, USA, pp. 30, 2001.
[3] Robert Gordon, Stephen Talley, and Rob Gordon, Essential JMF: Java Media Framework. Prentice Hall PTR, pp. 17, 1998.
[4] Java™ Media Framework API Guide, 1999. URL: http://java.sun.com/javase/technolog/desktp/mdia/jmf/2.1.1/guide/index.html
[5] Frederick Jelinek, Statistical Methods for Speech Recognition. MIT Press, pp. 7, 1999.
[6] L. Besacier, C. Bergamini, D. Vaufreydaz, and E. Castelli, “The effect of speech and audio compression on speech recognition performance”, in IEEE Multimedia Signal Processing Workshop, Cannes, France, 2001. URL: http://hal.archives-ouvertes.fr/docs/00/32/61/65/PDF/Besacier01b.pdf
[7] Sphinx-4: a speech recognizer written entirely in the Java™ programming language, pp. 5, 1999-2008. URL: http://cmusphinx.sourceforge.net/sphinx4/
[8] J.P. Lewis and Ulrich Neumann, Performance of Java versus C++. URL: http://www.idiom.com/~zilla/Computer/javaCbenchmark.html