Transcription Factor-DNA Binding Prediction
Tahmina Ahmed
Prosunjit Biswas
Iffat Sharmin Chowdhury
Badri Sampath

Motivation
• Build a model from the labeled DNA sequences, use it to label the unlabeled
  sequences, and gain insight into a real-world machine learning problem.

Approaches
• K-mer based
     Fixed-length K-mers
     K-mers with mismatches
     Using regular expressions
• PWM based
     MEME and MAST
• Combined model
     Unite both models
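The two fixed-length K-mer variants listed above can be sketched as feature extractors. This is a minimal sketch, not the project's scripts: `kmer_counts` and `mismatch_counts` are hypothetical helpers, a "mismatch" is assumed to mean Hamming distance between equal-length strings, and only the A/C/G/T alphabet is enumerated.

```python
import itertools
from collections import Counter

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Fixed-length K-mers: count every length-k window of the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_counts(seq, k, m):
    """K-mers with mismatches (hypothetical helper; mismatch = Hamming distance).

    For each of the 4**k possible k-mers, count the windows of seq that are
    within m mismatches of it.
    """
    windows = kmer_counts(seq, k)
    features = {}
    for kmer in ("".join(p) for p in itertools.product(ALPHABET, repeat=k)):
        features[kmer] = sum(n for w, n in windows.items() if hamming(kmer, w) <= m)
    return features


print(kmer_counts("ACGTACG", 3)["ACG"])         # 2
print(mismatch_counts("ACGTACG", 3, 1)["ACC"])  # 2 (both ACG windows are 1 mismatch away)
```

With m = 0 the mismatch features reduce to the exact fixed-length counts; enumerating all 4**k k-mers is only practical for small k.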

K-mer Approach Based on Regular Expressions
Motivation
  2-mers occur most frequently in the sequences, so this approach focuses on 2-mers.

Strategy
  - For any two 2-mers X and Y, generate the regular expressions X(.*)Y and Y(.*)X.
  - Use these regular expressions as candidate attributes.
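The strategy above can be sketched in a few lines. This is a minimal sketch under two assumptions: each attribute is binary (1 if the pattern occurs anywhere in the sequence), and X and Y range over distinct 2-mers. `two_mers`, `regex_attributes` and `featurize` are hypothetical names, not the project's scripts.

```python
import itertools
import re

ALPHABET = "ACGT"


def two_mers():
    """All 16 DNA 2-mers: AA, AC, ..., TT."""
    return ["".join(p) for p in itertools.product(ALPHABET, repeat=2)]


def regex_attributes():
    """For every unordered pair of distinct 2-mers X, Y emit X(.*)Y and Y(.*)X."""
    patterns = []
    for x, y in itertools.combinations(two_mers(), 2):
        patterns.append(f"{x}(.*){y}")
        patterns.append(f"{y}(.*){x}")
    return patterns


def featurize(seq, patterns):
    """Binary attribute vector: 1 if the pattern matches somewhere in seq, else 0."""
    seq = seq.upper()
    return [1 if re.search(p, seq) else 0 for p in patterns]


print(len(regex_attributes()))                         # 240
print(featurize("ACGTTA", ["AC(.*)TA", "TA(.*)AC"]))  # [1, 0]
```

Because the gap (.*) is unbounded, each attribute only records that X occurs somewhere before Y. The 120 unordered pairs of distinct 2-mers give 240 candidate attributes.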

Classifier Selection
Fig: Nine classifiers applied to the TF data sets
Algorithms are numbered as follows:
      (1) Logistic (2) SMO (3) NaiveBayes (4) BayesianLogisticRegression (5) KStar (6) Bagging
      (7) LogitBoost (8) RandomForest (9) J48
Summary -
     * 9 classifiers were applied to 10 data sets; results for 3 of the data sets are shown
     * choosing a single best classifier is not a trivial task
     * the same classifier performs differently on different data sets
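The slides compare Weka classifiers; the per-data-set selection loop itself can be sketched in plain Python. This sketch assumes selection by cross-validated accuracy, and `MajorityClass` and `NearestCentroid` are simple hypothetical stand-ins, not implementations of any of the nine Weka algorithms. The toy data in the example are illustrative only.

```python
import random
from collections import Counter


class MajorityClass:
    """Hypothetical baseline stand-in: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestCentroid:
    """Hypothetical stand-in: predicts the label whose mean feature vector is closest."""

    def fit(self, X, y):
        counts, sums = Counter(y), {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, v in enumerate(row):
                acc[j] += v
        self.centroids = {lab: [s / counts[lab] for s in acc]
                          for lab, acc in sums.items()}
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda lab: sq_dist(row, self.centroids[lab]))
                for row in X]


def cv_accuracy(make_model, X, y, folds=10, seed=0):
    """Pooled accuracy over k (unstratified) folds; a fresh model is trained per fold."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in test])
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(y)


def best_classifier(candidates, X, y, folds=10):
    """Return (name, accuracy) of the candidate with the highest CV accuracy."""
    scores = {name: cv_accuracy(make, X, y, folds) for name, make in candidates.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Toy data: two well-separated clusters.
X = [[0.1 * i, 0.0] for i in range(10)] + [[10 + 0.1 * i, 10.0] for i in range(10)]
y = [0] * 10 + [1] * 10
print(best_classifier({"majority": MajorityClass, "centroid": NearestCentroid}, X, y, folds=5))
# ('centroid', 1.0)
```

Running the same loop separately on each data set can pick different winners, which matches the behaviour described in the summary.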

Change in Accuracy due to Different Classifiers
Fig: Accuracy of Logistic, J48, RandomForest and NaiveBayes on the TF_3 data set (left) and the TF_5 data set (right)
Summary -
       * the choice of classifier has a large effect on accuracy
       * classifiers must be chosen with care

Change in Accuracy due to Different K-mer Lengths
Fig: Accuracy with 4-mers, 5-mers and 6-mers on the TF_3 data set
Summary -
    * K-mer length also affects accuracy
    * the best length is not obvious and is difficult to determine in advance
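One way to see why the length matters: the fixed-length feature vector has 4^k entries, quadrupling with each extra base, while a sequence of length L contains at most L - k + 1 distinct k-mers. A quick check for the three lengths on the slide, using hypothetical helper names:

```python
def kmer_space(k, alphabet_size=4):
    """Number of possible DNA k-mers, i.e. the length of the feature vector."""
    return alphabet_size ** k


def occupancy(seq, k):
    """Fraction of all possible k-mers that actually occur in seq."""
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return len(present) / kmer_space(k)


for k in (4, 5, 6):
    print(k, kmer_space(k))
# 4 256
# 5 1024
# 6 4096
```

For a 100-bp sequence, at most 95 of the 4,096 possible 6-mers can be nonzero, so longer k-mers give much sparser features. This is offered as one plausible factor, not an explanation taken from the slides.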

Attribute Space Selection
Fig: Accuracy with different numbers of selected k-mers on the TF_4 data set
Summary -
    * the number of attributes considered also affects accuracy
    * accuracy increases as more attributes are used, but only up to a saturation point, after which it decreases
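The slide does not say how the k-mers were ranked before keeping the top ones. One plausible scheme, shown purely as an assumption, scores each k-mer by how differently often it occurs in bound versus unbound sequences and keeps the top n; `presence` and `top_n_kmers` are hypothetical helpers.

```python
from collections import Counter


def presence(seqs, k):
    """For each k-mer, the fraction of sequences containing it at least once."""
    counts = Counter()
    for s in seqs:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return {kmer: n / len(seqs) for kmer, n in counts.items()}


def top_n_kmers(pos, neg, k, n):
    """Keep the n k-mers whose presence differs most between bound (pos) and
    unbound (neg) sequences; ties are broken alphabetically."""
    p, q = presence(pos, k), presence(neg, k)
    score = {kmer: abs(p.get(kmer, 0.0) - q.get(kmer, 0.0)) for kmer in set(p) | set(q)}
    return sorted(score, key=lambda kmer: (-score[kmer], kmer))[:n]


print(top_n_kmers(["AAAACGT", "TTACGTT"], ["AAAAAAA", "TTTTTTT"], k=4, n=1))  # ['ACGT']
```

Sweeping n and re-running the classifier traces the kind of curve shown on the slide. The ranking should be computed on training data only, so that test labels do not leak into attribute selection.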

PWM-based Analysis of Accuracy (TF_1 data set)
Fig: J48, minW 6, maxW 15, number of sites fixed at 10 (left); J48, minW 6, maxW 15, number of motifs fixed at 5 (right)
Summary -
      * with the number of sites fixed, accuracy increases as the number of motifs increases
      * with the number of motifs fixed, accuracy increases as the number of sites increases
      * what happens when both are increased?

PWM-based Analysis
Fig: Accuracy for different numbers of motifs and sites
* 1st bar: number of sites
* 2nd bar: number of motifs
* 3rd bar: accuracy
* key observation: accuracy decreases when the number of motifs and the number of sites are both increased
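MEME discovers the motifs and MAST scores sequences against them; the core idea of scoring a sequence with a position weight matrix can be sketched as below. This is not MEME or MAST: it assumes already-aligned sites of equal length, a uniform 0.25 background, a pseudocount of 1, log2 odds and a forward-strand scan only, and the function names are hypothetical.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds PWM from equal-length aligned binding sites (uniform background assumed)."""
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for j in range(len(sites[0])):
        column = [s[j] for s in sites]
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in ALPHABET})
    return pwm


def best_site_score(seq, pwm):
    """Best PWM score over all windows of seq (forward strand only);
    returns -inf if no window of valid bases exists."""
    w = len(pwm)
    best = float("-inf")
    for i in range(len(seq) - w + 1):
        window = seq[i:i + w]
        if all(b in ALPHABET for b in window):
            best = max(best, sum(pwm[j][b] for j, b in enumerate(window)))
    return best


pwm = build_pwm(["ACGT", "ACGT", "ACGA"])
print(best_site_score("TTACGTTT", pwm) > best_site_score("TTTTTTTT", pwm))  # True
```

Per-motif scores like `best_site_score` could then serve as attributes for a classifier such as J48; that pairing is an assumption about how the pipeline fits together.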

Extra Work for TF_20
Fig: Flow diagram for building the new model for TF_20
  K-mer + PWM -> sequences identified by both models -> the new model for TF_20
  K-mer + PWM -> sequences identified differently -> biased 2-mer model -> newly labeled sequences -> the new model for TF_20
Summary -
    * for TF_20, an additional model was built that combines the K-mer and PWM predictions
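One reading of the flow diagram, sketched under assumptions: where the K-mer and PWM models agree, their shared label is kept; where they disagree, the biased 2-mer model decides; the union becomes the training labels for the new TF_20 model. `combine_labels` is a hypothetical helper, and the biased 2-mer model is represented by an arbitrary callable.

```python
def combine_labels(kmer_pred, pwm_pred, biased_2mer):
    """Merge per-sequence predictions from the two models (hypothetical helper).

    kmer_pred, pwm_pred: dict mapping sequence id -> predicted label.
    biased_2mer: callable(sequence id) -> label, a stand-in for the biased
    2-mer model that resolves disagreements.
    Returns (labels, disputed): merged labels and the ids that needed resolving.
    """
    labels, disputed = {}, []
    for sid, k_label in kmer_pred.items():
        if sid in pwm_pred and pwm_pred[sid] == k_label:
            labels[sid] = k_label            # identified the same by both models
        else:
            disputed.append(sid)             # identified differently (or missing)
            labels[sid] = biased_2mer(sid)
    return labels, disputed


labels, disputed = combine_labels({"s1": 1, "s2": 0, "s3": 1},
                                  {"s1": 1, "s2": 1, "s3": 1},
                                  biased_2mer=lambda sid: 0)
print(labels, disputed)  # {'s1': 1, 's2': 0, 's3': 1} ['s2']
```

Whether disagreements were resolved exactly this way, and how the 2-mer model was biased, is not specified on the slide.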

AUC based on the Feedback (bonus model)
Fig: AUC on the 10 data sets for the last submission
* results improved compared with the first submission
* the PWM model did not perform well
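AUC is the metric reported on this slide. It equals the probability that a randomly chosen positive sequence is scored above a randomly chosen negative one, with ties counting one half. A minimal O(P·N) sketch of that definition (the function name is illustrative):

```python
def auc(scores, labels):
    """ROC AUC via pairwise comparison (Mann-Whitney form); labels are 0/1."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0
print(auc([0.9, 0.2, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```

An AUC of 0.5 corresponds to a random ranking; unlike accuracy, AUC does not depend on a classification threshold.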

Participation
Member                   Background Study               Working with Tools       Working with Models  Parameter Tuning  Automation
Badri Sampath            DNA, RNA, protein, motif       AlignAce, MEME, MAST     PWM                  K-mer             ARFF writer, MAST output writer
Iffat Sharmin Chowdhury  Protein, Motif, Transcription  Weka, AlignAce, ScanAce  K-mer                PWM               Scripts for FASTA, Weka
Prosunjit Biswas         DNA, Transcription, K-mer      MEME, MAST               K-mer                PWM               Scripts for RE, new model
Tahmina Ahmed            MEME, MAST, PWM                MEME, MAST, Weka         PWM                  K-mer             Scripts for MEME, MAST

Acknowledgment

Questions?

