yhat
YC Application (S13)
What is your company going to make?
A deployment and hosting solution specifically for predictive models. Data
scientists deploy their predictive models instantly using a Python client, and
the models are immediately accessible via a RESTful API. Data scientists also get
a GUI/portal for administration, version control, and tracking of requests and
predictions.
This enables complex analytical models to be developed and deployed quickly
for rapid integration in production, avoiding the long, costly
throw-it-over-the-wall processes that make the handoff from data scientists to
engineers painful.
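To illustrate what "immediately accessible via a RESTful API" means for whoever consumes a model, here is a minimal sketch using only Python's standard library. It is not yhat's client or API: the `predict` function, its coefficients, the `/predict` path, and the JSON payload shape are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a trained model: a fixed linear score."""
    weights = {"x1": 0.5, "x2": -0.25}  # hypothetical coefficients
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


class PredictHandler(BaseHTTPRequestHandler):
    """Serves POST /predict: a JSON object of features in, a JSON prediction out."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            features = json.loads(self.rfile.read(length))
            body = json.dumps({"prediction": predict(features)}).encode()
        except (ValueError, AttributeError, TypeError):
            # Malformed JSON, a non-object payload, or non-numeric feature values.
            self.send_error(400, "expected a JSON object of numeric features")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Example request: curl -X POST localhost:8000/predict -d '{"x1": 2, "x2": 4}'
    HTTPServer(("localhost", 8000), PredictHandler).serve_forever()
```

A hosted service would add what this sketch leaves out, such as authentication, request tracking, and versioning.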
For each founder, please list: YC username; name; age; year of graduation, school,
degree and subject for each degree; email address; personal url, github url, facebook id,
twitter id; employer and title (if any) at last job before this startup. Put unfinished degrees
in parens. List the main contact first. Separate founders with blank lines. Put an asterisk
before the name of anyone not able to move to the Bay Area.
glamp; Greg Lamp; 25; 2010, University of Virginia - BS Systems Engineering
& BA Financial Math; greg@yhathq.com; www.yaksis.com; github.com/glamp;
facebook.com/lamp.greg; @theglamp; Product Manager at On Deck Capital

hernamesbarbara; Austin Ogilvie; 26; 2010, University of Virginia - BA Foreign
Affairs; austinogilvie@gmail.com; www.hernamesbarbara.com;
github.com/hernamesbarbara; facebook.com/austinogilvie; @austinogilvie;
Product Manager at On Deck Capital; Analyst at EverFi
Please tell us in one or two sentences about the most impressive thing other than this
startup that each founder has built or achieved.
Greg - Walked on to the UVA baseball team (ranked #2 at the time).
Austin - Brought On Deck's new web app to market in less than 2 months,
working with 2 developers, neither of whom had prior experience with Rails. The
app sits at the center of the company's acquisition strategy and
automatically deployed nearly a million dollars in small business loans within
its first 6 months.
Please tell us about the time you, glamp, most successfully hacked some
(non-computer) system to your advantage.
Greg found that the math department was still counting systems engineering
credits as math credits. He took one extra math class and graduated with two
technical degrees.
Austin persuaded the Real Madrid ticket office to grant him access to the press
box under the pretense of being a student journalist.
Please tell us about an interesting project, preferably outside of class or work, that two or
more of you created together. Include urls if possible.
● Data Science Blog - blog.yhathq.com:
○ ~30,000 unique monthly visitors
○ Strong social media traction
○ Twitter Conversations: https://twitter.com/search?q=yhathq&src=typd
○ HN: https://news.ycombinator.com/item?id=5323448
○ HN: https://news.ycombinator.com/item?id=5204758
● pandasql:
○ An open-source Python package which lets you use SQL to query
pandas dataframes
○ Built in 1 weekend
○ Source code: https://github.com/yhat/pandasql
○ Blog Post: http://bit.ly/XF1FHF
○ We received great feedback on Twitter and multiple emails
thanking us for building it
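pandasql's own entry point is `sqldf(query, env)` (for example `sqldf("SELECT * FROM df", locals())`). As we understand it, the package works by copying the referenced DataFrames into an in-memory SQLite database and reading the query result back. The sketch below reproduces that idea directly with pandas and the standard-library `sqlite3` module, so it runs without pandasql installed; the `query_frames` helper and the sample `loans` data are hypothetical.

```python
import sqlite3

import pandas as pd


def query_frames(sql, frames):
    """Run a SQL query against pandas DataFrames (a pandasql-style sketch).

    `frames` maps each table name used in `sql` to a DataFrame. The frames are
    copied into a throwaway in-memory SQLite database, so the query uses
    SQLite's SQL dialect and never modifies the original DataFrames.
    """
    conn = sqlite3.connect(":memory:")
    try:
        for name, frame in frames.items():
            frame.to_sql(name, conn, index=False)  # copy the frame into SQLite
        return pd.read_sql_query(sql, conn)  # run the query, get a new DataFrame
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical sample data: loan amounts by state.
    loans = pd.DataFrame({"state": ["NY", "NY", "VA"], "amount": [10000, 25000, 15000]})
    print(query_frames(
        "SELECT state, SUM(amount) AS total FROM loans GROUP BY state ORDER BY state",
        {"loans": loans},
    ))
```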
How long have the founders known one another and how did you meet? Have any of the
founders not met in person?
We met in school in 2007 and have been close friends since then. We currently
live and work together in NYC.
Why did you pick this idea to work on? Do you have domain expertise in this area? How
do you know people need what you're making?
At On Deck, Greg turns analysis done in R and SAS into programs that can be called
from Java. Greg has a unique skill set combining math, stats, and machine learning with
the ability to build things pragmatically for the real world. Austin was the product lead for
On Deck's API and built a Rails app on top of it, which eventually became the company's
customer-facing app.
We built the first-ever online, self-serve business loan together. The biggest challenge
was integrating predictive models to detect risk and fraud. On Deck (and comScore)
developed highly predictive models only to shelve them because implementation was deemed
too difficult for engineering.
The problem is common. Our first user spent 2 months trying to use support vector
regression in a .NET app before switching to yhat. Anthony Goldbloom (Kaggle CEO) said
clients don't know what to do with winning algorithms. The algorithm that won the $1M
Netflix Prize was never used due to engineering costs (http://bit.ly/11zx95b). Brad
Gillespie (partner at IA Ventures) was likewise familiar with the challenges of
productizing predictive models.
What's new about what you're making? What substitutes do people resort to because it
doesn't exist yet (or they don't know about it)?
To get value from predictive models, companies need four things: (1) a process for
deploying models; (2) a way for data scientists to validate results in production;
(3) a system for maintaining and updating models; and (4) tools for evaluating the
efficacy and value of a model.
Traditionally, once a model is built, it must be re-coded for use in
production systems. Adapting predictive models for production is typically complex,
error-prone, and time-consuming. It's common for companies to put half-assed models
into production because the time needed to recode the model for the new environment
would be unacceptable. There's no integrated solution for deploying, testing, and
maintaining predictive models.
yhat answers these needs by allowing data scientists to deploy predictive
models as-is, without waiting for developers to port or adapt their code for use
in production. With yhat, data scientists can build a model in the morning and
have it ready for integration by the afternoon. Consistency across environments is
guaranteed, as are testing and cross-validation. Updating a model requires zero
downtime, so data scientists are free to change their models and retrain them on
the fly.
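The zero-downtime update claim can be illustrated with a small in-process sketch: every deployed version is kept, and the "live" pointer is switched under a lock, so a request already in flight finishes on the version it started with. This is not yhat's implementation; `ModelRegistry`, `deploy`, `rollback`, and `predict` are hypothetical names, and a hosted service would also need to coordinate the swap across processes and machines.

```python
import threading
from typing import Any, Callable, List

Model = Callable[[Any], Any]


class ModelRegistry:
    """Keeps every deployed version of a model and serves the live one."""

    def __init__(self) -> None:
        self._versions: List[Model] = []
        self._live = -1  # index of the live version; -1 means nothing deployed
        self._lock = threading.Lock()

    def deploy(self, model: Model) -> int:
        """Add a new version and make it live; returns its 1-based version number."""
        with self._lock:
            self._versions.append(model)
            self._live = len(self._versions) - 1
            return self._live + 1

    def rollback(self, version: int) -> None:
        """Point traffic back at an earlier version without redeploying it."""
        with self._lock:
            if not 1 <= version <= len(self._versions):
                raise ValueError(f"unknown version {version}")
            self._live = version - 1

    def predict(self, data: Any) -> Any:
        # Read the live model under the lock, then run it outside the lock so a
        # concurrent deploy neither blocks nor interrupts this request.
        with self._lock:
            if self._live < 0:
                raise RuntimeError("no model deployed")
            model = self._versions[self._live]
        return model(data)
```

Keeping old versions around is also what makes rolling back cheap: `rollback` only moves the pointer.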
Who are your competitors, and who might become competitors? Who do you fear most?
We don't plan to play in the "drag and drop data science" arena, but these
companies are definitely on our radar: "machine learning in a box" products with some
crossover, such as wise.io, Precog, Google Prediction API, and BigML.
"Enterprise solution for hosting predictive models": Zementis. Zementis does
not offer any self-serve or quick-to-get-started product for data scientists.
Instead, companies must run proofs of concept (POCs) and sign contracts to try Zementis.
Last but not least, their product relies on PMML, a markup language that has
seen little adoption despite having been around for 15 years.
Heroku, Google, or Amazon could build products for hosting scientific code.
Google seems focused on the Prediction API, so we'd expect them to stick with
that product and the "ML in a Box" category. Amazon and Heroku would likely
extend their existing product offerings, which would require data scientists to write
their own web servers, a task most data scientists are unfamiliar with.
What do you understand about your business that other companies in it just don't get?
Data science is inherently exploratory, and the best models require hyper-specific domain knowledge.
Drag-and-drop data science is too generic to solve these problems, which is why we're skeptical of
"machine learning in a box" products. Data scientists also have a preferred set of tools
(Python/R), and wise.io and BigML aren't among them.
The problem is that most models built by data scientists wind up as unintentionally academic projects
that never make it into production. Companies that manage to deploy something are doing so by hand
using painful, poorly documented, and non-reusable processes known only to a few people. yhat
makes it super easy for data scientists to produce tangible products for normal people (e.g. web apps,
CRM systems, and Excel workbooks).
Lastly, there's a big opportunity in small and medium data that others overlook. Few companies deal
with tera- and petabytes of data. On LinkedIn there are 75k Hadoop users compared with 6MM Excel,
250k R, and 400k Python users.
How do or will you make money? How much could you make? (We realize you can't
know precisely, but give your best estimate.)
We're testing a freemium model similar to Heroku's: you can host 1 model for
free but pay for additional models or for models larger than 50MB. We're partial
to this because it's cheaper and easier to bootstrap than enterprise sales, and quicker
for customers to get up and running.
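To make the freemium rule concrete, here is a sketch of which models would be billed. The rule as stated leaves one case open (whether an oversized first model uses up the free slot); the sketch assumes it does not. `billable_models` and both constants are hypothetical names, not part of any yhat code.

```python
FREE_MODEL_SLOTS = 1     # models hosted for free, per the plan description
FREE_SIZE_LIMIT_MB = 50  # models larger than this are always billed


def billable_models(model_sizes_mb):
    """Return the indices of billed models, given sizes in deployment order.

    Assumed reading of the plan: the first model at or under the size limit
    takes the free slot; every other model, and any model over the limit, is paid.
    """
    billable = []
    free_used = 0
    for index, size in enumerate(model_sizes_mb):
        if size <= FREE_SIZE_LIMIT_MB and free_used < FREE_MODEL_SLOTS:
            free_used += 1  # this model occupies the free slot
        else:
            billable.append(index)
    return billable
```

For example, a 10MB model followed by a 20MB and an 80MB model would bill the second and third.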
Another approach we've talked about is the 10gen-style model where we open
source the software and charge for enterprise support. Both models allow for
add-on products similar to Heroku's add-ons or Salesforce.com's
AppExchange.
We think we could charge bigger customers around $15k-$30k per year for the
base product, but we haven't spent much time thinking about pricing add-ons
yet.
Ultimately, yhat could become the primary hosted predictive analytics platform
and therefore the de facto "arbiter of predictive analytic insights."
If you've already started working on it, how long have you been working and
how many lines of code (if applicable) have you written?
We've spent two months working on this project. We've written between
1000 and 1200 lines of code including the prototype, website, and blog.
Do any founders have other commitments between June and August 2013
inclusive?
We will pursue yhat regardless of our participation in Y Combinator and
have no other commitments.
How far along are you? Do you have a beta yet? If not, when will you? Are
you launched? If so, how many users do you have? Do you have revenue? If
so, how much? If you're launched, what is your monthly growth rate (in users
or revenue or both)?
We have 140+ users in the beta. We get 30,000 unique visitors per month. We
were selected to be part of PyCon Silicon Valley's "Startup Row" in
March 2013.
Our monthly growth rate is 100% (70 users in February, and 70+ more and counting
in March).
How will you get users? If your idea is the type that faces a chicken-and-egg
problem in the sense that it won't be attractive to users till it has a lot of
users (e.g. a marketplace, a dating site, an ad network), how will you
overcome that?
Our primary acquisition channels are our blog, social media, and open
source. Our blog has generated a lot of interest, and we typically get
~100 new leads with every post.
We're active in the Python community and have an open source project
that has generated leads. We plan to expand our open source
presence as well as our presence at Meetups, Skillshares, and
conferences like PyCon and PyData.
If you had any other ideas you considered applying with, please list them.
One may be something we've been waiting for. Often when we fund people
it's to do something they list here and not in the main application.
We are thinking about yhat 24/7. That said...
A product for queueing, ranking, and prioritizing leads in an efficient and
elegant manner. iTunes lets users drag and drop to create playlists; this
product would give users drag-and-drop lead management and sales
contests.
One cool feature would be a way for sales managers to gamify sales
initiatives via tactical periods of "Surge Commissions." Managers would
create temporary sales contests and tactical commission structures.
Sales agents would earn bonuses by hitting contest goals within a
time limit. Agents would compete as teams or individually, and managers and
agents would be able to track commissions and goals on a real-time contest
leaderboard.
Please tell us something surprising or amusing that one of you has
discovered. (The answer need not be related to your project.)
In 1952, a London double-decker bus driver had to jump an opening
drawbridge (Tower Bridge). http://i.imgur.com/rFjHY.jpg

2013 - Yhat - YC app.pdf

  • 2. What is your company going to make? A deployment and hosting solution specifically for predictive models. Data scientists deploy predictive models they build instantly using a Python client and models are immediately accessible via a RESTful API. Data scientists get a GUI/portal for administration, version control, and tracking of requests and predictions. This enables complex analytical models to be developed and deployed quickly for rapid integration in production, thereby avoiding long and costly throw-it-over-the-wall processes that makes the handoff from data scientists to engineers painful.
  • 3. For each founder, please list: YC username; name; age; year of graduation, school, degree and subject for each degree; email address; personal url, github url, facebook id, twitter id; employer and title (if any) at last job before this startup. Put unfinished degrees in parens. List the main contact first. Separate founders with blank lines. Put an asterisk before the name of anyone not able to move to the Bay Area. glamp; Greg Lamp; 25; 2010, University of Virginia - BS Systems Engineering & BA Financial Math; greg@yhathq.com; www.yaksis.com; github.com/glamp; facebook.com/lamp.greg; @theglamp; Product Manager at On Deck Capital hernamesbarbara; Austin Ogilvie; 26; 2010, University of Virginia - BA Foreign Affairs, austinogilvie@gmail.com; www.hernamesbarbara.com; github.com/hernamesbarbara; facebook.com/austinogilvie; @austinogilvie; Product Manager at On Deck Capital; Analyst at EverFi
  • 4. Please tell us in one or two sentences about the most impressive thing other than this startup that each founder has built or achieved. Greg - Walked on to the UVA baseball team (ranked #2 at the time). Austin - brought On Deck's new web app to market in less than 2 months working with 2 developers neither of which had prior experience with Rails. The app sits at the center of the company's acquisition strategy and has automatically deployed nearly a million dollars in small business loans within the first 6 months.
  • 5. Please tell us about the time you, glamp, most successfully hacked some (non-computer) system to your advantage. Greg found that math department was still counting systems engineering credits as math credits. He took one extra math class and graduated with two technical degrees. Austin persuaded the Real Madrid ticket office to grant him access to the press box under the pretense of being a student journalist.
  • 6. Please tell us about an interesting project, preferably outside of class or work, that two or more of you created together. Include urls if possible. ● Data Science Blog - blog.yhathq.com: ○ ~30,000 unique monthly visitors ○ Strong social media traction ○ Twitter Conversations: https://twitter.com/search?q=yhathq&src=typd ○ HN: https://news.ycombinator.com/item?id=5323448 ○ HN: https://news.ycombinator.com/item?id=5204758 ● pandasql: ○ An open-source Python package which lets you use SQL to query pandas dataframes ○ Built in 1 weekend ○ Source code: https://github.com/yhat/pandasql ○ Blog Post: http://bit.ly/XF1FHF ○ We received some great feedback from twitter and got multiple emails thanking us for building it
  • 7. How long have the founders known one another and how did you meet? Have any of the founders not met in person? We met in school in 2007 and have been close friends since then. We currently live and work together in NYC.
  • 8. At On Deck, Greg turns analysis done in R and SAS into programs that can be called from Java. Greg has a unique skillset combining math, stats, and machine learning with the ability to build things pragmatically for the real-world. Austin was the product lead for On Deck's API and built a Rails app on top of it which eventually became the company's customer-facing app. We built the first-ever online, self-serve business loan together. The biggest challenge was integrating predictive models to detect risk and fraud. On Deck (and comScore) developed highly predictive models only to shelf them b/c implementation was deemed too difficult for engineering. The problem is common. Our first user spent 2 months trying to use support vector regression in a .NET app before switching to yhat. Anthony Goldblum (Kaggle CEO) said clients don't know what to do with winning algorithms. The $1M Netflix prize winner was never used due to engineering costs (http://bit.ly/11zx95b). Brad Gillespie (partner at IA Ventures) was equally familiar with challenges with productizing predictive models. Why did you pick this idea to work on? Do you have domain expertise in this area? How do you know people need what you're making?
  • 9. To get value from predictive models, companies need 4 things. (1) a process for deploying models; (2) a way for data scientists to validate results in production; (3) a system for maintaining/updating models; (4) tools for evaluating the efficacy/value of a model. Traditionally, once a model is built, it must then be re-coded for use in production systems. The process of adapting predictive models for use in production is typically complex, error-prone, and time-consuming. It's common for companies to implement half-assed models in production because the time necessary to recode the model for the new environment would be unacceptable. There's no integrated solution for deploying, testing, and maintaining predictive models. yhat answers these needs by allowing data scientists to deploy predictive models as-is, without waiting for developers to port or adapt their code for use in production. With yhat, data scientists can build a model in the morning and have it ready for integration by the afternoon. Consistency is guaranteed across environments, and testing and cross validation is guaranteed. Updating models requires zero downtime, so data scientists are free to make changes to their models and retrain them on the fly. What's new about what you're making? What substitutes do people resort to because it doesn't exist yet (or they don't know about it)?
  • 10. Who are your competitors, and who might become competitors? Who do you fear most? We don't plan to play in the "drag and drop data science" arena, but these guys are definitely on our radar -- "machine learning in a box" products with some crossover: wise.io, precog, Google Prediction API, BigML. "Enterprise Solution for hosting predictive models": Zementis. Zementis does not offer any self-serve or quick-to-get-started product for data scientists. Instead, companies must do POCs and sign contracts to test out Zementis. Last but not least, their product relies on PMML, a markup language that has little adoption despite having been around for 15 years. Heroku, Google, or Amazon could build products for hosting scientific code. Google seems focused on the Prediction API, so we'd expect them to focus on that product and in the "ML in a Box" category. Amazon and Heroku would likely extend their own product offerings which would require data scientists to write their own web servers, a task most data scientists are unfamiliar with.
  • 11. What do you understand about your business that other companies in it just don't get? Data science is inherently exploratory, and the best models require hyper-specific domain knowledge. Drag and drop data science is too generic to solve these problems which is why we're skeptical of "machine learning in a box" products. Not to mention that data scientists have a preferred set of tools (Python/R) which don't include wise.io or Big ML. The problem is that most models built by data scientists wind up as unintentionally academic projects that never make it into production. Companies that manage to deploy something are doing so by hand using painful, poorly documented and non-reusable processes known only to a few people. yhat makes it super easy for data scientists to produce tangible products for normal people (e.g. web apps, CRM systems, and Excel workbooks). Lastly, there's a big opportunity in small and medium data that others overlook. Few companies deal with tera- and petabytes of data. On LinkedIn there are 75k Hadoop users compared with 6MM Excel, 250k R, and 400k Python users.
  • 12. How do or will you make money? How much could you make? (We realize you can't know precisely, but give your best estimate.) We're testing a freemium model similar to Heroku's: you can host 1 model for free but pay for subsequent models or for models larger than 50MB. We're partial to this because it's cheaper and easier to bootstrap than enterprise sales and quicker for customers to get up and running. Another approach we've talked about is the 10gen-style model, where we open source the software and charge for enterprise support. Both models allow for add-on products similar to Heroku's add-ons or Salesforce.com's AppExchange. We think we could charge bigger customers around $15k-$30k per year for the base product, but we haven't spent much time thinking about pricing add-ons yet. Ultimately, yhat could become the primary hosted predictive analytics platform and therefore the de facto "arbiter of predictive analytic insights."
  • 13. If you've already started working on it, how long have you been working and how many lines of code (if applicable) have you written? We've spent two months working on this project. We've written between 1000 and 1200 lines of code including the prototype, website, and blog.
  • 14. Do any founders have other commitments between June and August 2013 inclusive? We will pursue yhat regardless of our participation in Y Combinator and have no other commitments.
  • 15. How far along are you? Do you have a beta yet? If not, when will you? Are you launched? If so, how many users do you have? Do you have revenue? If so, how much? If you're launched, what is your monthly growth rate (in users or revenue or both)? We have 140+ users in the beta and get 30,000 unique visitors per month. We were selected to be part of PyCon Silicon Valley's "startup row" in March 2013. Our monthly user growth rate is 100%: we had 70 users at the end of February and have added 70+ more in March, and counting.
  • 16. How will you get users? If your idea is the type that faces a chicken-and-egg problem in the sense that it won't be attractive to users till it has a lot of users (e.g. a marketplace, a dating site, an ad network), how will you overcome that? Our primary acquisition channels are our blog, social media, and open source. Our blog has generated a lot of interest and we typically get ~100 new leads with every post. We're active in the Python community and have an open source project that has generated leads. We plan to expand our open source presence as well as our presence at Meetups, Skillshares, and conferences similar to PyCon/PyData.
  • 17. If you had any other ideas you considered applying with, please list them. One may be something we've been waiting for. Often when we fund people it's to do something they list here and not in the main application. We are thinking about yhat 24/7. That said... A product for queueing, ranking, and prioritizing leads in an efficient and elegant manner. iTunes lets users drag and drop to create playlists; this product would give users drag-and-drop lead management and sales contests. One cool feature would be a way for sales managers to gamify sales initiatives via tactical periods of "Surge Commissions." Managers would create temporary sales contests and tactical commission structures. Sales agents would make bonuses by hitting contest goals within a time limit. Agents compete as teams or individually and managers and agents would be able to track commissions/goals on a real-time contest leaderboard.
  • 18. Please tell us something surprising or amusing that one of you has discovered. (The answer need not be related to your project.) In 1952, a London double-decker bus driver had to jump an opening drawbridge (Tower Bridge). http://i.imgur.com/rFjHY.jpg
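The freemium rule from answer 12 (1 model hosted for free; subsequent models, or any model larger than 50MB, are paid) can be expressed as a small billing check. This is a hypothetical sketch, not yhat's billing code: `Model`, `billable_models`, and the choice to give the free slot to the first model under the size limit, in deployment order, are all assumptions. It only flags which models would be billed, because the application states no per-model price.

```python
from dataclasses import dataclass

FREE_MODEL_LIMIT = 1     # "you can host 1 model for free"
FREE_SIZE_LIMIT_MB = 50  # models larger than 50MB are always paid

@dataclass
class Model:
    # Hypothetical record of a deployed model.
    name: str
    size_mb: float

def billable_models(models: list[Model]) -> list[str]:
    """Return the names of models that would be billed under the freemium rule.

    Assumption: the free slot goes to the first model (in deployment order)
    that is at most 50MB; every other model is billable.
    """
    billable = []
    free_used = 0
    for m in models:
        if m.size_mb <= FREE_SIZE_LIMIT_MB and free_used < FREE_MODEL_LIMIT:
            free_used += 1  # this model takes the free slot
            continue
        billable.append(m.name)  # extra model, or too large to be free
    return billable

models = [Model("churn", 12), Model("ltv", 80), Model("fraud", 30)]
print(billable_models(models))  # ['ltv', 'fraud']
```

Here "churn" takes the free slot, "ltv" is billed for exceeding 50MB, and "fraud" is billed because the single free slot is already used.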