SlideShare una empresa de Scribd logo
1 de 3
Descargar para leer sin conexión
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 3866
Speech to Speech Translation using Encoder Decoder Architecture
Rohan Nakave1, Zarana Prajapati2, Pranav Phulware3, Prof. Akshay Loke4
1,2,3Students of Department of Information Technology, Vidyalankar Institute Of Technology, Mumbai,
Maharashtra
4Prof. of Department of Information Technology, Vidyalankar Institute Of Technology, Mumbai, Maharashtra
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - In this paper, we address the task of Audio
language translation. We present a Two-way translation
method for translating spoken sentences from English
language into spoken sentences in another language and vice
versa. In this device it consists of mainly three modules speech
recognition in English language, text to text translation from
English to other language and other language speech
generation. It first recognize the speech (in English) using
speech recognition and displays the text on the screen and
then translates the text from English (text) into the desired
language(text) and displays it on the screen, after that it
converts into desired language (speech) which can heard at
the other end of the device.
Key Words: Speech Recognition, Speech Generation,Audio
language translation, Two-way translation method, text to
text translation.
1.INTRODUCTION
Nowadays with people’s emerging interest in travelling
around the world and meeting and interactingwithdifferent
types of people with different culture and tradition, most of
the people face the difficulty in communicatingandspeaking
with other people, who does not know their languages. To
avoid these difficulties the developed system records the
interaction and translate it into the desired languages.
Speech-to-speech translation technology represents a
technology which automatically translates one language to
another languageinordertoenablecommunication between
two parties with different native languages [1]. Speech to
speech translation systems are developed over the past
several decades with the goal of helping folks that speak
different languages to talk with each other. These systems
have usually been broken into three separate components:
automatic speech recognition to transcribe the source
speech as text, MT to translate the transcribed text into the
target language, and text-to-speech synthesis (TTS) to urge
speech within the target language from the translated text.
Additionally, the technology helps to understand and
recognize the native language and the user interface-related
technology integrated with the UI also play a crucial role
during this speech-to-speechtranslationsystemfromknown
language to the target language.
1.1. LITERARTURE SURVEY
In [1] they had proposed a device that automatically
recognize the speech in the form of English language and
translate into Tamil language. In the system and technique,
they record the speech in English languageandtranslateinto
Tamil language using manual transaction, so that difficulties
can be avoided during communication.
In [2] they had proposed a method for translating spoken
sentences from one language into spoken sentences in
another language using Spectrogram pairs. This model is
trained completely from scratch using spectrogrampairs,so
that it translates unseen sentences. A pyramidal-
bidirectional recurrent network is combined with a
convolutional network to output sentence-level
spectrograms in the target language.
In [3], the proposed system is designed in such a way that it
helps the patients speaking English language to
communicate and describe their symptoms to Korean
doctors or nurses. It is a one-way translation system.Speech
recognition, English-Korean translation, and Korean speech
generation are the three main parts of the system.
In [4], they have presented a deep attention model (Deep
ATT) for NMT systems. For high level translationtasks,Deep
ATT allows low-level attention information to decide on
what should be passed or suppressed from the encoder
layer. The model is designed to perform translation tasks on
both NIST Chinese-English, WMT14 English-German and
English-French. The model is easy to implement and flexible
to train and shows effectiveness in improving both
translation and alignment quality.
2. PROPOSED SYSTEM
A. System Overview:
In this project, we have created this application whichcanbe
used as an alternative for the traditional translating method
i.e. a human translator which is expensive in a daytodaylife.
Here we take one language as an audio input from Speech
Recognition Device. Then input audio will get converted to
text format. This input text is then passed to our neural
network model.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 3867
Once the entire input text is translated from the source
language to the target language (the language in which the
sentence needs tobetranslated),thetranslatedsentence will
the shown in a text format. The next and the final step is to
convert the sentence from the text format into the audio
format. The converted audio format in the target language is
presented as output.
Block diagram
Fig. 2.1 Block diagram
B. Proposed Model:
The overall structure of our Neural Network model consists
of an Encoder and Decoder both of these are Recurrent
Neural Networks and they are connected to Thought Vector
as shown in fig. 2.2. The Encoder network here outputs a
Thought Vector which is an array of float point numbers
roughly between -1 and 1 which summarizes thecontents or
the meaning or the intentions of the input text. We then use
this Thought Vector as the initial state of the recurrent units
in the decoder part of the network and then we start by
inputting a marker which have taken “ssss because it does
exist in vocabulary for the dataset and given this initial state
which is the thought vector summarizing the input text and
the start marker we want the decoder to output the word
‘once’ as shown in fig. 2.2 and in the next timestampwethen
include the same initial state, the thoughtvector butwehave
we both the start marker and the word ‘once’, then we want
the network to produce the word ‘upon’ and we input the
start marker ‘once upon’ which outputtheword ‘a’.Weinput
all the remaining words in a similar way and in the end we
input the start marker and the whole input text and get the
end marker “eeee” as the output. End marker also does not
exist in vocabulary for the dataset. This isthebasicoverview
of our Neural Network model. Now we will study on how
each layer works in the encoder and decoder of the Neural
network.
Fig. 2.2 Encoder Decoder architecture
The Recurrent units cannot work directly on text data so
there is a two-step process to convert it to numbers for the
Neural Network to work on. The first step is to use a
Tokenizer. A Tokenizer basically turns words into integers,
we need a Tokenizer for source language and we also need a
Tokenizer for the destination language because they have
different vocabularies. So, if we take the words in the input
text, it gets converted into their integer token using
Tokenization as shown in the figure. The tokens are
enumerated according to their frequency in the dataset that
we are using. But the Neural Network can still not work on
integers, so we have to convert each of these integers into a
vector a float point numbers roughly between -1 and1 so
that the Neural Network can work on it.
The Embedding layer is also different for the encoder and
the decoder because of the work on different languages and
it learns semantic similarities between layers. The
Embedding layer will learn that certain words have
possibility to occur in same situations so they probablyhave
the same semantic meaning. Another thingtonoteisthatthe
Tokenizer is applied outside the Neural Network because
there is no need to do this every-time we run data through
the Neural Network, so we just process the data once. So the
Neural Network actuallyconsists oftheEmbeddinglayer and
then 3 layers of gated Recurrent units and we use these
instead of LSTM because we want to set the initial state for
the Recurrent unit in the decoder and the LSTM has to
internal states and this force to take out the last internal
state for the last Recurrent unit here and use that as the
thought vector and the initial state at the decoder. When we
use the gated recurrent units it’s a lot easier we can either
use the internal state or the output in either case we get a
vector out which has size of the internal state of the gated
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 3868
recurrent unit and because it has only on internal unit we
need only one vector to initialize it. Now what we get out of
the final layer of Recurrent units is a sequence which only
outputs one vector and not a sequenceofvectorsbecausewe
only need one thought vector to summarize the contents of
the input text. But now we want to generate a sequence of
words ultimately so if the internal state of the gated
recurrent units have 256 elements andwehavea vocabulary
and destination language of 10,000wordsthenwesomehow
need to convert this vector of 256 elements to a number
between 1 and 10,000. One way of doing that is by using
encoded arrays. So, we want to output a vector of 10,000
elements and we take the index of the highest element and
this is the integer that we want to output.Thedenselayersof
Recurrent unit maps from a vector that is 256 elements long
to a vector that is 10,000 elements long so that we can take
the maximum and can take the integer out and then we use
the Tokenizer again to convert that to the original word.
once the entire input text is translated from the source
language to the target language (the language in which the
sentence needs tobetranslated),thetranslatedsentence will
the shown in a text format. The next and the final step is to
convert the sentence from the text format into the audio
format. The converted audio format in the target language is
presented as output.
C. Flowchart:
Fig. 2.3 Workflow of the project
3. CONCLUSIONS AND FUTURE WORK
This paper is showing that neural networks can be very
powerful in translating text from one language to another.
Using a simplified model, a set of words can be recognized
easily and can be translated to other language. Thus, the
proposed model helps to avoidthelanguage barrierbetween
people and helps them to communicate and interact with
each other easily.
This model could be integrated with a functionoftranslating
the text written hoardings of shop from other languages in
English. This can be done by first clicking the picture of
hoarding, then extracting the text on the image and then
translating it into English language. We can add more
languages for translation. This would mainly help the solo
travelers while travelling.
REFERENCES
[1] J.Poornakala , A.Maheshwari, “Automatic Speech-Speech
Translation System from English to Tamil Language”,
International Journal of Innovative Research in Science,
Engineering and Technology, Vol. 5, Issue 3, March 2016.
[2] Michelle Guo, Albert Haque, “End-to-End Spoken
Language Translation”.
[3] Sangmi Shin, Eric T. Matson, “Speech-to-Speech
Translation Humanoid Robot in Doctor’s Office”.
[4] Biao Zhang, Deyi Xiong and Jinsong Su, “Neural Machine
Translation with Deep Attention”, Journal of latex class files,
Vol. 14, NO. 8, August 2015.
[5] Wouter Gevaert, Georgi Tsenov, Valeri Mladenov, Senior
member IEEE, “Neural Machine Translation with Deep
Attention”, Journal of automatic control, University of
Belgrade, Vol. 20:1-7, 2010
[6] Shi Fan, Yili Yu, “Evaluating GRU and LSTM with
Regularization on Translating Different Language Pairs”.

Más contenido relacionado

La actualidad más candente

DOMAIN BASED CHUNKING
DOMAIN BASED CHUNKINGDOMAIN BASED CHUNKING
DOMAIN BASED CHUNKING
kevig
 
Paper id 24201469
Paper id 24201469Paper id 24201469
Paper id 24201469
IJRAT
 

La actualidad más candente (16)

A Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to MarathiA Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to Marathi
 
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural NetworkSentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
 
IRJET- Applications of Artificial Intelligence in Neural Machine Translation
IRJET- Applications of Artificial Intelligence in Neural Machine TranslationIRJET- Applications of Artificial Intelligence in Neural Machine Translation
IRJET- Applications of Artificial Intelligence in Neural Machine Translation
 
DOMAIN BASED CHUNKING
DOMAIN BASED CHUNKINGDOMAIN BASED CHUNKING
DOMAIN BASED CHUNKING
 
Speech to text conversion for visually impaired person using µ law companding
Speech to text conversion for visually impaired person using µ law compandingSpeech to text conversion for visually impaired person using µ law companding
Speech to text conversion for visually impaired person using µ law companding
 
Extractive Summarization with Very Deep Pretrained Language Model
Extractive Summarization with Very Deep Pretrained Language ModelExtractive Summarization with Very Deep Pretrained Language Model
Extractive Summarization with Very Deep Pretrained Language Model
 
IRJET - Deep Learning based Chatbot
IRJET - Deep Learning based ChatbotIRJET - Deep Learning based Chatbot
IRJET - Deep Learning based Chatbot
 
IRJET- Study of Effect of PCA on Speech Emotion Recognition
IRJET- Study of Effect of PCA on Speech Emotion RecognitionIRJET- Study of Effect of PCA on Speech Emotion Recognition
IRJET- Study of Effect of PCA on Speech Emotion Recognition
 
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
 
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
 
Cleveree: an artificially intelligent web service for Jacob voice chatbot
Cleveree: an artificially intelligent web service for Jacob voice chatbotCleveree: an artificially intelligent web service for Jacob voice chatbot
Cleveree: an artificially intelligent web service for Jacob voice chatbot
 
IRJET- Conversational Assistant based on Sentiment Analysis
IRJET- Conversational Assistant based on Sentiment AnalysisIRJET- Conversational Assistant based on Sentiment Analysis
IRJET- Conversational Assistant based on Sentiment Analysis
 
Paper id 24201469
Paper id 24201469Paper id 24201469
Paper id 24201469
 
11 syllabus
11    syllabus11    syllabus
11 syllabus
 
IRJET- Speech to Speech Translation System
IRJET- Speech to Speech Translation SystemIRJET- Speech to Speech Translation System
IRJET- Speech to Speech Translation System
 
ENSEMBLE MODEL FOR CHUNKING
ENSEMBLE MODEL FOR CHUNKINGENSEMBLE MODEL FOR CHUNKING
ENSEMBLE MODEL FOR CHUNKING
 

Similar a IRJET - Speech to Speech Translation using Encoder Decoder Architecture

Similar a IRJET - Speech to Speech Translation using Encoder Decoder Architecture (20)

Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
 
A study on the techniques for speech to speech translation
A study on the techniques for speech to speech translationA study on the techniques for speech to speech translation
A study on the techniques for speech to speech translation
 
A Review On Chatbot Design And Implementation Techniques
A Review On Chatbot Design And Implementation TechniquesA Review On Chatbot Design And Implementation Techniques
A Review On Chatbot Design And Implementation Techniques
 
IRJET- Communication System for Blind, Deaf and Dumb People using Internet of...
IRJET- Communication System for Blind, Deaf and Dumb People using Internet of...IRJET- Communication System for Blind, Deaf and Dumb People using Internet of...
IRJET- Communication System for Blind, Deaf and Dumb People using Internet of...
 
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISHA NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
 
Real Time Direct Speech-to-Speech Translation
Real Time Direct Speech-to-Speech TranslationReal Time Direct Speech-to-Speech Translation
Real Time Direct Speech-to-Speech Translation
 
IRJET - Voice based Natural Language Query Processing
IRJET -  	  Voice based Natural Language Query ProcessingIRJET -  	  Voice based Natural Language Query Processing
IRJET - Voice based Natural Language Query Processing
 
Named Entity Recognition (NER) Using Automatic Summarization of Resumes
Named Entity Recognition (NER) Using Automatic Summarization of ResumesNamed Entity Recognition (NER) Using Automatic Summarization of Resumes
Named Entity Recognition (NER) Using Automatic Summarization of Resumes
 
Currency Recognition using Machine Learning
Currency Recognition using Machine LearningCurrency Recognition using Machine Learning
Currency Recognition using Machine Learning
 
IRJET- Image Caption Generation System using Neural Network with Attention Me...
IRJET- Image Caption Generation System using Neural Network with Attention Me...IRJET- Image Caption Generation System using Neural Network with Attention Me...
IRJET- Image Caption Generation System using Neural Network with Attention Me...
 
AI and Web-Based Interactive College Enquiry Chatbot
AI and Web-Based Interactive College Enquiry ChatbotAI and Web-Based Interactive College Enquiry Chatbot
AI and Web-Based Interactive College Enquiry Chatbot
 
An Efficient Approach to Produce Source Code by Interpreting Algorithm
An Efficient Approach to Produce Source Code by Interpreting AlgorithmAn Efficient Approach to Produce Source Code by Interpreting Algorithm
An Efficient Approach to Produce Source Code by Interpreting Algorithm
 
Text Summarization of Food Reviews using AbstractiveSummarization and Recurre...
Text Summarization of Food Reviews using AbstractiveSummarization and Recurre...Text Summarization of Food Reviews using AbstractiveSummarization and Recurre...
Text Summarization of Food Reviews using AbstractiveSummarization and Recurre...
 
IRJET- Recruitment Chatbot
IRJET- Recruitment ChatbotIRJET- Recruitment Chatbot
IRJET- Recruitment Chatbot
 
IRJET- Lost: The Horror Game
IRJET- Lost: The Horror GameIRJET- Lost: The Horror Game
IRJET- Lost: The Horror Game
 
IRJET - Threat Prediction using Speech Analysis
IRJET - Threat Prediction using Speech AnalysisIRJET - Threat Prediction using Speech Analysis
IRJET - Threat Prediction using Speech Analysis
 
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLEMULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE
 
Recent Trends in Translation of Programming Languages using NLP Approaches
Recent Trends in Translation of Programming Languages using NLP ApproachesRecent Trends in Translation of Programming Languages using NLP Approaches
Recent Trends in Translation of Programming Languages using NLP Approaches
 
IDE Code Compiler for the physically challenged (Deaf, Blind & Mute)
IDE Code Compiler for the physically challenged (Deaf, Blind & Mute)IDE Code Compiler for the physically challenged (Deaf, Blind & Mute)
IDE Code Compiler for the physically challenged (Deaf, Blind & Mute)
 
IRJET - Language Linguist using Image Processing on Intelligent Transport Sys...
IRJET - Language Linguist using Image Processing on Intelligent Transport Sys...IRJET - Language Linguist using Image Processing on Intelligent Transport Sys...
IRJET - Language Linguist using Image Processing on Intelligent Transport Sys...
 

Más de IRJET Journal

Más de IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Último

DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
MayuraD1
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
mphochane1998
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
jaanualu31
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
Epec Engineered Technologies
 

Último (20)

data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Wadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptxWadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptx
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to Computers
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 

IRJET - Speech to Speech Translation using Encoder Decoder Architecture

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 3866 Speech to Speech Translation using Encoder Decoder Architecture Rohan Nakave1, Zarana Prajapati2, Pranav Phulware3, Prof. Akshay Loke4 1,2,3Students of Department of Information Technology, Vidyalankar Institute Of Technology, Mumbai, Maharashtra 4Prof. of Department of Information Technology, Vidyalankar Institute Of Technology, Mumbai, Maharashtra ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - In this paper, we address the task of Audio language translation. We present a Two-way translation method for translating spoken sentences from English language into spoken sentences in another language and vice versa. In this device it consists of mainly three modules speech recognition in English language, text to text translation from English to other language and other language speech generation. It first recognize the speech (in English) using speech recognition and displays the text on the screen and then translates the text from English (text) into the desired language(text) and displays it on the screen, after that it converts into desired language (speech) which can heard at the other end of the device. Key Words: Speech Recognition, Speech Generation,Audio language translation, Two-way translation method, text to text translation. 1.INTRODUCTION Nowadays with people’s emerging interest in travelling around the world and meeting and interactingwithdifferent types of people with different culture and tradition, most of the people face the difficulty in communicatingandspeaking with other people, who does not know their languages. To avoid these difficulties the developed system records the interaction and translate it into the desired languages. Speech-to-speech translation technology represents a technology which automatically translates one language to another languageinordertoenablecommunication between two parties with different native languages [1]. Speech to speech translation systems are developed over the past several decades with the goal of helping folks that speak different languages to talk with each other. These systems have usually been broken into three separate components: automatic speech recognition to transcribe the source speech as text, MT to translate the transcribed text into the target language, and text-to-speech synthesis (TTS) to urge speech within the target language from the translated text. Additionally, the technology helps to understand and recognize the native language and the user interface-related technology integrated with the UI also play a crucial role during this speech-to-speechtranslationsystemfromknown language to the target language. 1.1. LITERARTURE SURVEY In [1] they had proposed a device that automatically recognize the speech in the form of English language and translate into Tamil language. In the system and technique, they record the speech in English languageandtranslateinto Tamil language using manual transaction, so that difficulties can be avoided during communication. In [2] they had proposed a method for translating spoken sentences from one language into spoken sentences in another language using Spectrogram pairs. This model is trained completely from scratch using spectrogrampairs,so that it translates unseen sentences. A pyramidal- bidirectional recurrent network is combined with a convolutional network to output sentence-level spectrograms in the target language. In [3], the proposed system is designed in such a way that it helps the patients speaking English language to communicate and describe their symptoms to Korean doctors or nurses. It is a one-way translation system.Speech recognition, English-Korean translation, and Korean speech generation are the three main parts of the system. In [4], they have presented a deep attention model (Deep ATT) for NMT systems. For high level translationtasks,Deep ATT allows low-level attention information to decide on what should be passed or suppressed from the encoder layer. The model is designed to perform translation tasks on both NIST Chinese-English, WMT14 English-German and English-French. The model is easy to implement and flexible to train and shows effectiveness in improving both translation and alignment quality. 2. PROPOSED SYSTEM A. System Overview: In this project, we have created this application whichcanbe used as an alternative for the traditional translating method i.e. a human translator which is expensive in a daytodaylife. Here we take one language as an audio input from Speech Recognition Device. Then input audio will get converted to text format. This input text is then passed to our neural network model.
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 3867 Once the entire input text is translated from the source language to the target language (the language in which the sentence needs tobetranslated),thetranslatedsentence will the shown in a text format. The next and the final step is to convert the sentence from the text format into the audio format. The converted audio format in the target language is presented as output. Block diagram Fig. 2.1 Block diagram B. Proposed Model: The overall structure of our Neural Network model consists of an Encoder and Decoder both of these are Recurrent Neural Networks and they are connected to Thought Vector as shown in fig. 2.2. The Encoder network here outputs a Thought Vector which is an array of float point numbers roughly between -1 and 1 which summarizes thecontents or the meaning or the intentions of the input text. We then use this Thought Vector as the initial state of the recurrent units in the decoder part of the network and then we start by inputting a marker which have taken “ssss because it does exist in vocabulary for the dataset and given this initial state which is the thought vector summarizing the input text and the start marker we want the decoder to output the word ‘once’ as shown in fig. 2.2 and in the next timestampwethen include the same initial state, the thoughtvector butwehave we both the start marker and the word ‘once’, then we want the network to produce the word ‘upon’ and we input the start marker ‘once upon’ which outputtheword ‘a’.Weinput all the remaining words in a similar way and in the end we input the start marker and the whole input text and get the end marker “eeee” as the output. End marker also does not exist in vocabulary for the dataset. This isthebasicoverview of our Neural Network model. Now we will study on how each layer works in the encoder and decoder of the Neural network. Fig. 2.2 Encoder Decoder architecture The Recurrent units cannot work directly on text data so there is a two-step process to convert it to numbers for the Neural Network to work on. The first step is to use a Tokenizer. A Tokenizer basically turns words into integers, we need a Tokenizer for source language and we also need a Tokenizer for the destination language because they have different vocabularies. So, if we take the words in the input text, it gets converted into their integer token using Tokenization as shown in the figure. The tokens are enumerated according to their frequency in the dataset that we are using. But the Neural Network can still not work on integers, so we have to convert each of these integers into a vector a float point numbers roughly between -1 and1 so that the Neural Network can work on it. The Embedding layer is also different for the encoder and the decoder because of the work on different languages and it learns semantic similarities between layers. The Embedding layer will learn that certain words have possibility to occur in same situations so they probablyhave the same semantic meaning. Another thingtonoteisthatthe Tokenizer is applied outside the Neural Network because there is no need to do this every-time we run data through the Neural Network, so we just process the data once. So the Neural Network actuallyconsists oftheEmbeddinglayer and then 3 layers of gated Recurrent units and we use these instead of LSTM because we want to set the initial state for the Recurrent unit in the decoder and the LSTM has to internal states and this force to take out the last internal state for the last Recurrent unit here and use that as the thought vector and the initial state at the decoder. When we use the gated recurrent units it’s a lot easier we can either use the internal state or the output in either case we get a vector out which has size of the internal state of the gated
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 3868 recurrent unit and because it has only on internal unit we need only one vector to initialize it. Now what we get out of the final layer of Recurrent units is a sequence which only outputs one vector and not a sequenceofvectorsbecausewe only need one thought vector to summarize the contents of the input text. But now we want to generate a sequence of words ultimately so if the internal state of the gated recurrent units have 256 elements andwehavea vocabulary and destination language of 10,000wordsthenwesomehow need to convert this vector of 256 elements to a number between 1 and 10,000. One way of doing that is by using encoded arrays. So, we want to output a vector of 10,000 elements and we take the index of the highest element and this is the integer that we want to output.Thedenselayersof Recurrent unit maps from a vector that is 256 elements long to a vector that is 10,000 elements long so that we can take the maximum and can take the integer out and then we use the Tokenizer again to convert that to the original word. once the entire input text is translated from the source language to the target language (the language in which the sentence needs tobetranslated),thetranslatedsentence will the shown in a text format. The next and the final step is to convert the sentence from the text format into the audio format. The converted audio format in the target language is presented as output. C. Flowchart: Fig. 2.3 Workflow of the project 3. CONCLUSIONS AND FUTURE WORK This paper is showing that neural networks can be very powerful in translating text from one language to another. Using a simplified model, a set of words can be recognized easily and can be translated to other language. Thus, the proposed model helps to avoidthelanguage barrierbetween people and helps them to communicate and interact with each other easily. This model could be integrated with a functionoftranslating the text written hoardings of shop from other languages in English. This can be done by first clicking the picture of hoarding, then extracting the text on the image and then translating it into English language. We can add more languages for translation. This would mainly help the solo travelers while travelling. REFERENCES [1] J.Poornakala , A.Maheshwari, “Automatic Speech-Speech Translation System from English to Tamil Language”, International Journal of Innovative Research in Science, Engineering and Technology, Vol. 5, Issue 3, March 2016. [2] Michelle Guo, Albert Haque, “End-to-End Spoken Language Translation”. [3] Sangmi Shin, Eric T. Matson, “Speech-to-Speech Translation Humanoid Robot in Doctor’s Office”. [4] Biao Zhang, Deyi Xiong and Jinsong Su, “Neural Machine Translation with Deep Attention”, Journal of latex class files, Vol. 14, NO. 8, August 2015. [5] Wouter Gevaert, Georgi Tsenov, Valeri Mladenov, Senior member IEEE, “Neural Machine Translation with Deep Attention”, Journal of automatic control, University of Belgrade, Vol. 20:1-7, 2010 [6] Shi Fan, Yili Yu, “Evaluating GRU and LSTM with Regularization on Translating Different Language Pairs”.