Wenzhe (Evelyn) Xu
217-417-9270 | wxu23@illinois.edu | Address: 100 24 Ave #3, San Mateo, CA 94403
CORE QUALIFICATIONS
Overall Highlight: Solid data science background with data engineering skills applied in industry
Statistical Skills:
• Data manipulation: ETL (extract, transform, load) on large datasets (Impala 1.x)
• Modeling: machine learning (regression, classification, clustering), categorical data analysis, time series analysis,
sampling, ANOVA, A/B testing, dimension reduction, model selection
Computer Skills:
• Programming: expertise in R (4+ yr; dplyr, ggplot2, shiny), Python (1+ yr; SciPy, NumPy, pandas, scikit-learn), SQL (3+ yr)
• Data engineering: Apache Impala 1.x, Apache Hadoop 2.x, Cloudera CDH 5.x, Apache Spark 1.x
• Software and systems: SAS (Advanced Certified), SPSS, Looker, Linux (CentOS 6.5), Microsoft Office, MapReduce
Communication and Presentation:
l Meeting with business colleagues and transfer the commercial goal into data-driven objective
l Quick response to the later-added requirement from business side
l Presentation of the modeling result and data insight by PPT and building interactive dashboard through Shiny
Project Management: Able to track each stage according to scheduled timeline for an independent project
EDUCATION BACKGROUND
University of Illinois at Urbana-Champaign 8/2013-5/2015
Master of Science in Statistics-Analytics | Major GPA: 3.96/4.0
Tianjin University of Finance and Economics, Tianjin, China 9/2009-6/2013
Bachelor of Science in Statistics | Overall GPA: 3.75/4.0 | Major GPA: 4.0/4.0
PROFESSIONAL EXPERIENCE
Data Scientist Intern, MasterCard Inc., San Carlos, CA 6/2015 - 12/2015 (expected)
Independently designed an automatic anomaly detection system for business-metric time series data
• Researched anomaly detection algorithms and chose Twitter's S-H-ESD test as the core logic
• Read the R package source code and reorganized its logic and inputs to fit the business-metrics use case,
incorporating special requirements from the business side and the engineering staff's execution needs
• Used Impala/SQL to ETL the production data, compiled the R code into mapper and reducer functions in Python
with each group id as the mapper key, and set up the MapReduce data pipeline job on Oozie
• Built PowerPoint presentations and an interactive Shiny dashboard to collect the team's feedback on the algorithm, and
responded quickly to additional needs from business and engineering colleagues
Collaborative tasks on other projects
• Helped write word-count mapper and reducer functions for merchant classification based on transaction records
• Built a 12-node Hadoop cluster, including tuning critical parameters, and was responsible for maintaining
memory usage and installing related packages in a CentOS 6.5 Linux environment
• Participated in Cloudera training on Apache Spark
Tech-Sale Analytics Intern, Anheuser-Busch InBev, Champaign, IL 5/2014 - 5/2015
Customer profiling with social media data sources and the internal wholesaler database
• Learned the original algorithm and created an advanced string-cleaning step for the Foursquare/Yelp database
• Improved the overall workflow by adding lat/lon information and innovative filters to increase mapping accuracy
• Wrote and integrated all steps into pipelined R code for future use and cooperated with a third party to build a dashboard
• Overcame the challenges of handling Asian languages and extended the customer profiling project to the global market
• Mentored new interns taking over the customer profiling workflow
Quantitative exploration experiment projects (ad hoc)
• Predicted volume for each POC through logistic models and learning algorithms, and identified the influential POCs
• Used a regression tree to predict the number of pipes needed for the target volume in the Belgium on-premise market
• Translated data analysis results into commercial insights and presented them to business managers on a regular basis
OTHER RELATED PROJECT EXPERIENCES
Bad Auction Prediction for Old Car Purchase, Applied Machine Learning, Champaign, IL 12/2014
• Preprocessed the data by checking for missing values and transforming features into an appropriate form for modeling
• Used random forest to obtain variable importance and performed feature selection
• Built a gradient logistic regression model and tuned the threshold parameter for binary prediction
• Applied the KNN algorithm and a cluster-based prediction algorithm and compared the results
Online advertisement click-through-rate prediction (large data set), Champaign, IL 12/2014
• Loaded and manipulated 12 GB of data to check for missing values and produce basic visualizations
• Constructed a logistic regression model and trained a random forest to predict the click probability
• Applied frequent pattern mining for auxiliary prediction and a gradient algorithm to combine the prediction results
Spatial Analysis on the Workload Distribution of Urbana Police, Consulting, Champaign, IL 5/2014
l Preprocessed about 50000 records of crimes in Urbana area from January 2011 to May 2013 using R and SAS
l Visualized the data by spatial-temporal bar plot to present the trend of the crime frequency
l Applied Kernel Density Estimation and Bernoulli likelihood estimation to detect global and local crime clusters
Time series analysis for the private final consumption of Australia, Champaign, IL 12/2013
l Collected the quarterly data with 127 records and took 121 for modeling, based on the plots of ACF and PACF after
1-step difference to built SARIMA model, conducted model selection process through AIC and BIC, and finally
choose SARIMA(1,1,1)*(2,1,2) as the prediction model which gave the smallest forecast error
l Discovered the seasonality of the raw data by plotting the smoothed periodogram through the spectral domain
approach using R
Modeling for the prediction of house price at Urbana-Champaign area, Champaign, IL 12/2013
l Cleaned the data from the original 3727 records in the raw dataset including checking and correcting record errors,
remove duplicated records and irrelevant records
l Extracted location, structure, size and age as factors which significantly influenced the price
l Built a linear regression model with R-square of 0.728 and a tree model with the most significant variables of house
size and number of bathroom using R
The correlation between the attributes of grapes and quality of wine, Tianjin, China 9/2012
l Classified grapes by the physical and chemical indexes with different quality level by K-means cluster method
l Established the canonical correlation model between the quality of grapes and wines
l Extracted the principle component of color index as significant factor in grading the quality of wine using R
HONORS, AWARDS, AND ACTIVITIES
Silver prize for the 2012 National Mathematical Modeling Contest, Tianjin, China
Attended the 2014 useR! Conference at the University of California, Los Angeles, CA
Honored as the “Most Advanced Marketing Development” intern for the internship at AB-InBev Budlab, Champaign, IL
