Personal Information
Organization/Workplace
San Francisco Bay Area, United States
Occupation
Senior Research Engineer at Netflix
Industry
Education
Website
www.dbtsai.com
About
Big Data Machine Learning Engineer with a strong background in computer science, theoretical physics, and mathematics. I have a deep understanding of how to implement data mining algorithms in a scalable way, not just how to use them as a consumer.
I'm a big fan of Scala and have been using it to develop scalable, distributed data mining algorithms with Apache Spark. I've been involved in open-source Apache Spark development as a contributor. Apache Spark is a fast and general engine for large-scale data processing, and it fits into the Hadoop open-source ecosystem.
Specialties:
• Machine Learning and Data Mining.
• Distributed/Parallel Computing and Big Data Processing.
• Expert in Apache Hadoop.
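The About text above mentions developing distributed machine learning algorithms in Scala on Apache Spark, and the tags below reference MLlib, multinomial logistic regression, elastic-net, and L-BFGS. As a minimal, hypothetical sketch of that kind of work (not taken from the profile itself), the following Scala snippet trains a multinomial logistic regression model with elastic-net regularization using Spark MLlib; the dataset path and application name are placeholders.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.SparkSession

object MultinomialLRSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("multinomial-lr-elastic-net") // placeholder app name
      .getOrCreate()

    // Placeholder path: any LIBSVM-formatted dataset of labeled feature vectors.
    val training = spark.read.format("libsvm").load("data/training_libsvm.txt")

    // Multinomial logistic regression with elastic-net regularization;
    // Spark MLlib optimizes this objective with quasi-Newton methods (L-BFGS/OWL-QN).
    val lr = new LogisticRegression()
      .setFamily("multinomial")
      .setMaxIter(100)
      .setRegParam(0.01)        // overall regularization strength
      .setElasticNetParam(0.5)  // mix between L1 and L2 penalties

    val model = lr.fit(training)
    println(s"Coefficients:\n${model.coefficientMatrix}")
    println(s"Intercepts: ${model.interceptVector}")

    spark.stop()
  }
}
```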
Tags
machine learning
spark
mapreduce
hadoop
mllib
alpine data labs
big data
logistic regression
netflix
data mining
apache spark
multinomial
l-bfgs
recommendation
pipeline
kernel methods
linear models
polynomial mapping
feature engineering
linear regression
ml
spark summit
elastic-net
batch layer
serving layer
speed layer
spark streaming
pig
lambda architecture
real time
storm
stream
large scale
iot
internet of things
svd
k-means
unsupervised learning
Presentations (9)
Recommendations (4)
Distributed Time Travel for Feature Generation at Netflix
sfbiganalytics • 7 years ago
Introducing Windowing Functions (pgCon 2009)
PostgreSQL Experts, Inc. • 10 years ago
Multinomial Logistic Regression with Apache Spark
DB Tsai • 9 years ago