SlideShare una empresa de Scribd logo
1 de 8
Empower Public Health
through Social Media
Zhen Wang, Ph.D.
Insight Health Data Science
Text
Cleaning, Tokenizing
Convert to Feature Vectors
“I like food!”
“Food is good!”
“I had some good food.”
i, like, food
food, is, good
i, had, some, good, food
e.g., TF-IDF
I’m really good
with numbers!
i like food is good had some
1 1 1 0 0 0 0
0 0 1 1 1 0 0
1 0 1 0 1 1 1
Downweight, Normalize
Machine
Learning
Numbers
Natural Language Processing
Text Classification
Normalized Retweet Counts
NumberofTweets
Distribution of Tweets
● Sample Imbalance
● Classification (0/1: Not / Retweeted)
● Logistic Regression
Threshold: 0.005
Misclassification Error: 22%
0 01 1
Train Test
downsampling
0.81
0.740.26
0.19
Normalized Confusion Matrix
Codes: github.com/zweinstein/SpreadHealth_dev
Zhen (Jen) Wang
Beta Tester
Since 2015 Editor since 2015
Traditional Medicine Science Fiction
Public Speaking Online Education
Ph.D. in Physical Chemistry
Thank you!
See the App in Action:
Text Preprocessing Pipeline
Text Cleaning:
● Convert to lower case
● Replace URL, #, and @
● Remove special characters other than
emoticons
● Remove stopwords
Tokenizing:
● Splitting each documents into individual
elements
● Bag-of-Words or N-grams
● Stemming
○ Porter Stemmer was used
○ Snowball or Lancaster stemmer faster but
more aggressive
○ Lemmatization computationally more
expensive but little impact on the
performance of text classification
Term Frequency-Inverse Document
Frequency (tf-idf):
Term Frequency--tf(t,d): the number of times
a term t occurs in a document d
Used to downweight frequently occurring
words in the feature vectors tf(t,d)
Document Frequency--df(d,f): the number of
documents d that contain a term t.
The implementation in Scikit-learn
● Train Dataset: 10000 tweets on diabetes (4782 retweeted);
● Test Set Accuracy (Random Chance 0.49 on positive class):
○ KNN: 60%
○ Naive Bayes: 67%
○ Logistic regression: 75% (chosen and tested on imbalanced test data)
● Potential Improvements:
○ Decision Trees with Bagging/Boosting (e.g., Random Forest, XGBoost)
○ Other Features:
■ Polarity & Sentiment
■ Length
● Out-of-Core Incremental Learning with Stochastic Gradient Descent
(Advantages of Logistic Regression…)
● Automatic Update to SQLite Database and to the Classifier
Prediction Algorithms

Más contenido relacionado

Destacado

Our Opening Title sequence presentation
Our Opening Title sequence presentationOur Opening Title sequence presentation
Our Opening Title sequence presentationchloe-carman
 
Hareket Magazine 12.
Hareket  Magazine 12.Hareket  Magazine 12.
Hareket Magazine 12.Hareket
 
3D Game Environment Workflow
3D Game Environment Workflow3D Game Environment Workflow
3D Game Environment Workflowraimondklavins
 
Hareket Magazine-19-2016
Hareket Magazine-19-2016Hareket Magazine-19-2016
Hareket Magazine-19-2016Hareket
 
What is a dance music video
What is a dance music videoWhat is a dance music video
What is a dance music videochloe-carman
 
Rich Aquilone- Top 5 Rock Drummers
Rich Aquilone- Top 5 Rock DrummersRich Aquilone- Top 5 Rock Drummers
Rich Aquilone- Top 5 Rock DrummersRichard Aquilone
 
1.1 ingles sistema operativo
1.1 ingles sistema operativo1.1 ingles sistema operativo
1.1 ingles sistema operativodenissecollins94
 
A step by-step guide on i doc-ale between two sap servers
A step by-step guide on i doc-ale between two sap serversA step by-step guide on i doc-ale between two sap servers
A step by-step guide on i doc-ale between two sap serverskrishna RK
 

Destacado (8)

Our Opening Title sequence presentation
Our Opening Title sequence presentationOur Opening Title sequence presentation
Our Opening Title sequence presentation
 
Hareket Magazine 12.
Hareket  Magazine 12.Hareket  Magazine 12.
Hareket Magazine 12.
 
3D Game Environment Workflow
3D Game Environment Workflow3D Game Environment Workflow
3D Game Environment Workflow
 
Hareket Magazine-19-2016
Hareket Magazine-19-2016Hareket Magazine-19-2016
Hareket Magazine-19-2016
 
What is a dance music video
What is a dance music videoWhat is a dance music video
What is a dance music video
 
Rich Aquilone- Top 5 Rock Drummers
Rich Aquilone- Top 5 Rock DrummersRich Aquilone- Top 5 Rock Drummers
Rich Aquilone- Top 5 Rock Drummers
 
1.1 ingles sistema operativo
1.1 ingles sistema operativo1.1 ingles sistema operativo
1.1 ingles sistema operativo
 
A step by-step guide on i doc-ale between two sap servers
A step by-step guide on i doc-ale between two sap serversA step by-step guide on i doc-ale between two sap servers
A step by-step guide on i doc-ale between two sap servers
 

Similar a Zhen wang demo3

Using Bioinformatics Data to inform Therapeutics discovery and development
Using Bioinformatics Data to inform Therapeutics discovery and developmentUsing Bioinformatics Data to inform Therapeutics discovery and development
Using Bioinformatics Data to inform Therapeutics discovery and developmentEleanor Howe
 
Norwegian clinical genetics analysis platform ”genAP”, Thomas Grünfeld and To...
Norwegian clinical genetics analysis platform ”genAP”, Thomas Grünfeld and To...Norwegian clinical genetics analysis platform ”genAP”, Thomas Grünfeld and To...
Norwegian clinical genetics analysis platform ”genAP”, Thomas Grünfeld and To...The Research Council of Norway, IKTPLUSS
 
Predicting Thyroid Disorder with Deep Neural Networks
Predicting Thyroid Disorder with Deep Neural NetworksPredicting Thyroid Disorder with Deep Neural Networks
Predicting Thyroid Disorder with Deep Neural NetworksAnaelia Ovalle
 
Machine Learning Foundations
Machine Learning FoundationsMachine Learning Foundations
Machine Learning FoundationsAlbert Y. C. Chen
 
Natural Language Processing to Curate Unstructured Electronic Health Records
Natural Language Processing to Curate Unstructured Electronic Health RecordsNatural Language Processing to Curate Unstructured Electronic Health Records
Natural Language Processing to Curate Unstructured Electronic Health RecordsMMS Holdings
 
Text Mining for Biocuration of Bacterial Infectious Diseases
Text Mining for Biocuration of Bacterial Infectious DiseasesText Mining for Biocuration of Bacterial Infectious Diseases
Text Mining for Biocuration of Bacterial Infectious DiseasesDan Sullivan, Ph.D.
 
Analysing & interpreting data.ppt
Analysing & interpreting data.pptAnalysing & interpreting data.ppt
Analysing & interpreting data.pptmanaswidebbarma1
 
Modeling Electronic Health Records with Recurrent Neural Networks
Modeling Electronic Health Records with Recurrent Neural NetworksModeling Electronic Health Records with Recurrent Neural Networks
Modeling Electronic Health Records with Recurrent Neural NetworksJosh Patterson
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data MiningKai Koenig
 
Automated health responses
Automated health responses Automated health responses
Automated health responses Austin Powell
 
Recommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic AlgorithmRecommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic AlgorithmVaibhav Varshney
 
How deep learning reshapes medicine
How deep learning reshapes medicineHow deep learning reshapes medicine
How deep learning reshapes medicineHongyoon Choi
 
Biostatistics and DNA for NCKU iGEM
Biostatistics and DNA for NCKU iGEMBiostatistics and DNA for NCKU iGEM
Biostatistics and DNA for NCKU iGEMPo-Jen Wu
 
Multivariate Analysis and Visualization of Proteomic Data
Multivariate Analysis and Visualization of Proteomic DataMultivariate Analysis and Visualization of Proteomic Data
Multivariate Analysis and Visualization of Proteomic DataUC Davis
 
Deep learning for natural language understanding
Deep learning for natural language understandingDeep learning for natural language understanding
Deep learning for natural language understandingDavid Talby
 
Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...
Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...
Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...Twitter Inc.
 
TADPole_Nurjahan Begum
TADPole_Nurjahan BegumTADPole_Nurjahan Begum
TADPole_Nurjahan BegumNurjahan Begum
 

Similar a Zhen wang demo3 (20)

Using Bioinformatics Data to inform Therapeutics discovery and development
Using Bioinformatics Data to inform Therapeutics discovery and developmentUsing Bioinformatics Data to inform Therapeutics discovery and development
Using Bioinformatics Data to inform Therapeutics discovery and development
 
Norwegian clinical genetics analysis platform ”genAP”, Thomas Grünfeld and To...
Norwegian clinical genetics analysis platform ”genAP”, Thomas Grünfeld and To...Norwegian clinical genetics analysis platform ”genAP”, Thomas Grünfeld and To...
Norwegian clinical genetics analysis platform ”genAP”, Thomas Grünfeld and To...
 
Predicting Thyroid Disorder with Deep Neural Networks
Predicting Thyroid Disorder with Deep Neural NetworksPredicting Thyroid Disorder with Deep Neural Networks
Predicting Thyroid Disorder with Deep Neural Networks
 
Machine Learning Foundations
Machine Learning FoundationsMachine Learning Foundations
Machine Learning Foundations
 
Natural Language Processing to Curate Unstructured Electronic Health Records
Natural Language Processing to Curate Unstructured Electronic Health RecordsNatural Language Processing to Curate Unstructured Electronic Health Records
Natural Language Processing to Curate Unstructured Electronic Health Records
 
Text Mining for Biocuration of Bacterial Infectious Diseases
Text Mining for Biocuration of Bacterial Infectious DiseasesText Mining for Biocuration of Bacterial Infectious Diseases
Text Mining for Biocuration of Bacterial Infectious Diseases
 
2016 bergen-sars
2016 bergen-sars2016 bergen-sars
2016 bergen-sars
 
Analysing & interpreting data.ppt
Analysing & interpreting data.pptAnalysing & interpreting data.ppt
Analysing & interpreting data.ppt
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
Modeling Electronic Health Records with Recurrent Neural Networks
Modeling Electronic Health Records with Recurrent Neural NetworksModeling Electronic Health Records with Recurrent Neural Networks
Modeling Electronic Health Records with Recurrent Neural Networks
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Automated health responses
Automated health responses Automated health responses
Automated health responses
 
Recommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic AlgorithmRecommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic Algorithm
 
How deep learning reshapes medicine
How deep learning reshapes medicineHow deep learning reshapes medicine
How deep learning reshapes medicine
 
Biostatistics and DNA for NCKU iGEM
Biostatistics and DNA for NCKU iGEMBiostatistics and DNA for NCKU iGEM
Biostatistics and DNA for NCKU iGEM
 
Multivariate Analysis and Visualization of Proteomic Data
Multivariate Analysis and Visualization of Proteomic DataMultivariate Analysis and Visualization of Proteomic Data
Multivariate Analysis and Visualization of Proteomic Data
 
2014 aus-agta
2014 aus-agta2014 aus-agta
2014 aus-agta
 
Deep learning for natural language understanding
Deep learning for natural language understandingDeep learning for natural language understanding
Deep learning for natural language understanding
 
Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...
Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...
Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...
 
TADPole_Nurjahan Begum
TADPole_Nurjahan BegumTADPole_Nurjahan Begum
TADPole_Nurjahan Begum
 

Último

Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 

Último (20)

Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 

Zhen wang demo3

  • 1. Empower Public Health through Social Media Zhen Wang, Ph.D. Insight Health Data Science
  • 2. Text Cleaning, Tokenizing Convert to Feature Vectors “I like food!” “Food is good!” “I had some good food.” i, like, food food, is, good i, had, some, good, food e.g., TF-IDF I’m really good with numbers! i like food is good had some 1 1 1 0 0 0 0 0 0 1 1 1 0 0 1 0 1 0 1 1 1 Downweight, Normalize Machine Learning Numbers Natural Language Processing
  • 3. Text Classification Normalized Retweet Counts NumberofTweets Distribution of Tweets ● Sample Imbalance ● Classification (0/1: Not / Retweeted) ● Logistic Regression Threshold: 0.005 Misclassification Error: 22% 0 01 1 Train Test downsampling 0.81 0.740.26 0.19 Normalized Confusion Matrix Codes: github.com/zweinstein/SpreadHealth_dev
  • 4. Zhen (Jen) Wang Beta Tester Since 2015 Editor since 2015 Traditional Medicine Science Fiction Public Speaking Online Education Ph.D. in Physical Chemistry
  • 6. See the App in Action:
  • 7. Text Preprocessing Pipeline Text Cleaning: ● Convert to lower case ● Replace URL, #, and @ ● Remove special characters other than emoticons ● Remove stopwords Tokenizing: ● Splitting each documents into individual elements ● Bag-of-Words or N-grams ● Stemming ○ Porter Stemmer was used ○ Snowball or Lancaster stemmer faster but more aggressive ○ Lemmatization computationally more expensive but little impact on the performance of text classification Term Frequency-Inverse Document Frequency (tf-idf): Term Frequency--tf(t,d): the number of times a term t occurs in a document d Used to downweight frequently occurring words in the feature vectors tf(t,d) Document Frequency--df(d,f): the number of documents d that contain a term t. The implementation in Scikit-learn
  • 8. ● Train Dataset: 10000 tweets on diabetes (4782 retweeted); ● Test Set Accuracy (Random Chance 0.49 on positive class): ○ KNN: 60% ○ Naive Bayes: 67% ○ Logistic regression: 75% (chosen and tested on imbalanced test data) ● Potential Improvements: ○ Decision Trees with Bagging/Boosting (e.g., Random Forest, XGBoost) ○ Other Features: ■ Polarity & Sentiment ■ Length ● Out-of-Core Incremental Learning with Stochastic Gradient Descent (Advantages of Logistic Regression…) ● Automatic Update to SQLite Database and to the Classifier Prediction Algorithms

Notas del editor

  1. http://54.191.168.240