2. What is your company going to make?
A deployment and hosting solution built specifically for predictive models.
Data scientists deploy the models they build using a Python client, and each
model is immediately accessible via a RESTful API. They also get a GUI/portal
for administration, version control, and tracking of requests and
predictions.
This enables complex analytical models to be developed and deployed quickly
for rapid integration in production, avoiding the long and costly
throw-it-over-the-wall processes that make the handoff from data scientists to
engineers painful.
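As a rough illustration of the workflow described above, here is a minimal sketch of a model registered from Python and exposed over HTTP. It is not yhat's client or API: `deploy`, `handle`, `serve`, the `MODELS` registry, and the `/models/<name>` route are all hypothetical names chosen for the example, and it uses only the Python standard library.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory registry: model name -> callable that takes a dict.
MODELS = {}

def deploy(name, predict_fn):
    """Register a prediction function under a name (stand-in for a client call)."""
    MODELS[name] = predict_fn

def handle(path, raw_body):
    """Pure request logic: map a POST path and JSON body to (status, response)."""
    name = path.rstrip("/").split("/")[-1]
    if name not in MODELS:
        return 404, {"error": "unknown model"}
    payload = json.loads(raw_body or b"{}")
    return 200, {"prediction": MODELS[name](payload)}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body, run the model, and write a JSON response.
        length = int(self.headers.get("Content-Length", 0))
        status, result = handle(self.path, self.rfile.read(length))
        body = json.dumps(result).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet

def serve(port=0):
    """Start the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In this sketch a data scientist would call `deploy("risk", some_function)` and then `serve()`, after which an application could POST JSON to `/models/risk`. Keeping the request logic in `handle` separate from the HTTP layer makes it easy to check without opening a socket.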
3. For each founder, please list: YC username; name; age; year of graduation, school,
degree and subject for each degree; email address; personal url, github url, facebook id,
twitter id; employer and title (if any) at last job before this startup. Put unfinished degrees
in parens. List the main contact first. Separate founders with blank lines. Put an asterisk
before the name of anyone not able to move to the Bay Area.
glamp; Greg Lamp; 25; 2010, University of Virginia - BS Systems Engineering
& BA Financial Math; greg@yhathq.com; www.yaksis.com; github.com/glamp;
facebook.com/lamp.greg; @theglamp; Product Manager at On Deck Capital

hernamesbarbara; Austin Ogilvie; 26; 2010, University of Virginia - BA Foreign
Affairs; austinogilvie@gmail.com; www.hernamesbarbara.com;
github.com/hernamesbarbara; facebook.com/austinogilvie; @austinogilvie;
Product Manager at On Deck Capital; Analyst at EverFi
4. Please tell us in one or two sentences about the most impressive thing other than this
startup that each founder has built or achieved.
Greg - Walked on to the UVA baseball team (ranked #2 at the time).
Austin - Brought On Deck's new web app to market in less than 2 months,
working with 2 developers, neither of whom had prior experience with Rails. The
app sits at the center of the company's acquisition strategy and has
automatically deployed nearly a million dollars in small business loans within
the first 6 months.
5. Please tell us about the time you, glamp, most successfully hacked some
(non-computer) system to your advantage.
Greg found that the math department was still counting systems engineering
credits as math credits. He took one extra math class and graduated with two
technical degrees.
Austin persuaded the Real Madrid ticket office to grant him access to the press
box under the pretense of being a student journalist.
6. Please tell us about an interesting project, preferably outside of class or work, that two or
more of you created together. Include urls if possible.
● Data Science Blog - blog.yhathq.com:
○ ~30,000 unique monthly visitors
○ Strong social media traction
○ Twitter Conversations: https://twitter.com/search?q=yhathq&src=typd
○ HN: https://news.ycombinator.com/item?id=5323448
○ HN: https://news.ycombinator.com/item?id=5204758
● pandasql:
○ An open-source Python package which lets you use SQL to query
pandas dataframes
○ Built in 1 weekend
○ Source code: https://github.com/yhat/pandasql
○ Blog Post: http://bit.ly/XF1FHF
○ We received some great feedback on Twitter and got multiple emails
thanking us for building it
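To give a feel for the idea of querying in-memory tabular data with SQL, here is a minimal standard-library sketch. It is not pandasql's implementation and does not use pandas: the `query_rows` helper and the default table name `df` are hypothetical, and a list of dicts stands in for a dataframe.

```python
import sqlite3

def query_rows(rows, sql, table="df"):
    """Load a list of dicts into an in-memory SQLite table and run SQL on it."""
    columns = list(rows[0].keys())
    conn = sqlite3.connect(":memory:")
    try:
        # Create an untyped table whose columns match the dict keys.
        col_list = ", ".join('"%s"' % c for c in columns)
        conn.execute('CREATE TABLE "%s" (%s)' % (table, col_list))
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            'INSERT INTO "%s" VALUES (%s)' % (table, placeholders),
            [tuple(r[c] for c in columns) for r in rows],
        )
        # Run the caller's query and return the result as a list of dicts.
        cur = conn.execute(sql)
        names = [d[0] for d in cur.description]
        return [dict(zip(names, rec)) for rec in cur.fetchall()]
    finally:
        conn.close()
```

For example, `query_rows(rows, "SELECT name FROM df WHERE score > 2")` filters the rows with ordinary SQL instead of dataframe indexing.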
7. How long have the founders known one another and how did you meet? Have any of the
founders not met in person?
We met in school in 2007 and have been close friends since then. We currently
live and work together in NYC.
8. Why did you pick this idea to work on? Do you have domain expertise in this area? How
do you know people need what you're making?
At On Deck, Greg turns analysis done in R and SAS into programs that can be called
from Java. Greg has a unique skillset combining math, stats, and machine learning with
the ability to build things pragmatically for the real world. Austin was the product lead for
On Deck's API and built a Rails app on top of it which eventually became the company's
customer-facing app.
We built the first-ever online, self-serve business loan together. The biggest challenge
was integrating predictive models to detect risk and fraud. On Deck (and comScore)
developed highly predictive models only to shelve them b/c implementation was deemed
too difficult for engineering.
The problem is common. Our first user spent 2 months trying to use support vector
regression in a .NET app before switching to yhat. Anthony Goldbloom (Kaggle CEO) said
clients don't know what to do with winning algorithms. The $1M Netflix prize winner was
never used due to engineering costs (http://bit.ly/11zx95b). Brad Gillespie (partner at IA
Ventures) was equally familiar with the challenges of productizing predictive models.
9. What's new about what you're making? What substitutes do people resort to because it
doesn't exist yet (or they don't know about it)?
To get value from predictive models, companies need 4 things: (1) a process for
deploying models; (2) a way for data scientists to validate results in production;
(3) a system for maintaining/updating models; (4) tools for evaluating the
efficacy/value of a model.
Traditionally, once a model is built, it must then be re-coded for use in
production systems. The process of adapting predictive models for use in
production is typically complex, error-prone, and time-consuming. It's common
for companies to implement half-assed models in production because the time
necessary to recode the model for the new environment would be
unacceptable. There's no integrated solution for deploying, testing, and
maintaining predictive models.
yhat answers these needs by allowing data scientists to deploy predictive
models as-is, without waiting for developers to port or adapt their code for use
in production. With yhat, data scientists can build a model in the morning and
have it ready for integration by the afternoon. Consistency across environments
is guaranteed, as are testing and cross-validation. Updating models
requires zero downtime, so data scientists are free to make changes to their
models and retrain them on the fly.
10. Who are your competitors, and who might become competitors? Who do you fear most?
We don't plan to play in the "drag and drop data science" arena, but these guys
are definitely on our radar -- "machine learning in a box" products with some
crossover: wise.io, precog, Google Prediction API, BigML.
"Enterprise Solution for hosting predictive models": Zementis. Zementis does
not offer any self-serve or quick-to-get-started product for data scientists.
Instead, companies must do POCs and sign contracts to test out Zementis.
Last but not least, their product relies on PMML, a markup language that has
little adoption despite having been around for 15 years.
Heroku, Google, or Amazon could build products for hosting scientific code.
Google seems focused on the Prediction API, so we'd expect them to stay with
that product and the "ML in a Box" category. Amazon and Heroku would likely
extend their own product offerings, which would require data scientists to write
their own web servers, a task most data scientists are unfamiliar with.
11. What do you understand about your business that other companies in it just don't get?
Data science is inherently exploratory, and the best models require hyper-specific domain knowledge.
Drag and drop data science is too generic to solve these problems, which is why we're skeptical of
"machine learning in a box" products. Not to mention that data scientists have a preferred set of tools
(Python/R), which doesn't include wise.io or BigML.
The problem is that most models built by data scientists wind up as unintentionally academic projects
that never make it into production. Companies that manage to deploy something are doing so by hand
using painful, poorly documented and non-reusable processes known only to a few people. yhat
makes it super easy for data scientists to produce tangible products for normal people (e.g. web apps,
CRM systems, and Excel workbooks).
Lastly, there's a big opportunity in small and medium data that others overlook. Few companies deal
with tera- and petabytes of data. On LinkedIn there are 75k Hadoop users compared with 6MM Excel,
250k R, and 400k Python users.
12. How do or will you make money? How much could you make? (We realize you can't
know precisely, but give your best estimate.)
We're testing a freemium model similar to Heroku: you can host 1 model for
free but pay for subsequent models or models larger than 50MB. We're partial
to this b/c it's cheaper and easier to bootstrap than enterprise sales and quicker
for customers to get up and running.
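The freemium rule above can be sketched as a small function. This is only one reading of it: we assume the free slot goes to the first model hosted and only if that model is 50MB or smaller. The constant and function names are hypothetical.

```python
FREE_MODEL_COUNT = 1      # assumption: only the first hosted model can be free
FREE_SIZE_LIMIT_MB = 50   # models larger than this are always paid

def paid_models(sizes_mb):
    """Return the positions of models that would be billed, in hosting order."""
    paid = []
    for position, size in enumerate(sizes_mb):
        is_free = position < FREE_MODEL_COUNT and size <= FREE_SIZE_LIMIT_MB
        if not is_free:
            paid.append(position)
    return paid
```

Under this reading, an account hosting models of 10MB, 20MB, and 80MB pays for the second and third, and an account whose only model is 80MB pays for it.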
Another approach we've talked about is the 10gen-style model where we open
source the software and charge for enterprise support. Both models allow for
add-on products similar to Heroku's add-ons or Salesforce.com's
AppExchange.
We think we could charge bigger customers around $15k-$30k per year for the
base product, but we haven't spent much time thinking about pricing add-ons
yet.
Ultimately, yhat could become the primary hosted predictive analytics platform
and therefore the de facto "arbiter of predictive analytic insights."
13. If you've already started working on it, how long have you been working and
how many lines of code (if applicable) have you written?
We've spent two months working on this project. We've written between
1000 and 1200 lines of code including the prototype, website, and blog.
14. Do any founders have other commitments between June and August 2013
inclusive?
We will pursue yhat regardless of our participation in Y Combinator and
have no other commitments.
15. How far along are you? Do you have a beta yet? If not, when will you? Are
you launched? If so, how many users do you have? Do you have revenue? If
so, how much? If you're launched, what is your monthly growth rate (in users
or revenue or both)?
140+ users in the beta. We get 30,000 unique visitors per month. We
were selected to be a part of PyCon Silicon Valley's "startup row" in
March 2013.
Our monthly growth rate is 100% (70 users in Feb, 70+ more so far in
March).
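The growth figure follows from simple arithmetic, sketched here for clarity. The function name is hypothetical, and the inputs assume 70 users at the end of February and the 140 beta users reported above.

```python
def monthly_growth_rate(users_start, users_end):
    """Percentage growth in users from the start to the end of a period."""
    return (users_end - users_start) / users_start * 100.0

# 70 users at the end of February, 140 in March.
march_growth = monthly_growth_rate(70, 140)
```

Going from 70 to 140 users is a doubling, which is where the 100% figure comes from.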
16. How will you get users? If your idea is the type that faces a chicken-and-egg
problem in the sense that it won't be attractive to users till it has a lot of
users (e.g. a marketplace, a dating site, an ad network), how will you
overcome that?
Our primary acquisition channels are our blog, social media, and open
source. Our blog has generated a lot of interest and we typically get
~100 new leads with every post.
We're active in the Python community and have an open source project
that has generated leads. We plan to expand our open source
presence as well as our presence at Meetups, Skillshares, and
conferences similar to PyCon/PyData.
17. If you had any other ideas you considered applying with, please list them.
One may be something we've been waiting for. Often when we fund people
it's to do something they list here and not in the main application.
We are thinking about yhat 24/7. That said...
A product for queueing, ranking, and prioritizing leads in an efficient and
elegant manner. iTunes lets users drag and drop to create playlists; this
product would give users drag-and-drop lead management and sales
contests.
One cool feature would be a way for sales managers to gamify sales
initiatives via tactical periods of "Surge Commissions." Managers would
create temporary sales contests and tactical commission structures.
Sales agents would make bonuses by hitting contest goals within a
time limit. Agents compete as teams or individually and managers and
agents would be able to track commissions/goals on a real-time contest
leaderboard.
18. Please tell us something surprising or amusing that one of you has
discovered. (The answer need not be related to your project.)
In 1952, a London double-decker bus driver had to jump an opening
drawbridge (Tower Bridge). http://i.imgur.com/rFjHY.jpg