Daily Deals: Prediction, Social Diffusion, and Reputational Ramifications
September 7th, 2011
John W. Byers
Computer Science Dept.
Boston University
Michael Mitzenmacher
School of Eng. Appl. Sci.
Harvard University
Georgios Zervas
Computer Science Dept.
Boston University
arXiv:1109.1530v1 [cs.SI] 7 Sep 2011

∗ John W. Byers and Georgios Zervas. E-mail: {byers, zg}@cs.bu.edu. Supported in part by Adverplex, Inc. and by NSF grant CNS-1040800.
† Michael Mitzenmacher. E-mail: michaelm@eecs.harvard.edu. Supported in part by NSF grants CCF-0915922 and CNS-0721491, and in part by a research grant from Yahoo!

ABSTRACT
Daily deal sites have become the latest Internet sensation, providing discounted offers to customers for restaurants, ticketed events, services, and other items. We begin by undertaking a study of the economics of daily deals on the web, based on a dataset we compiled by monitoring Groupon and LivingSocial sales in 20 large cities over several months. We use this dataset to characterize deal purchases; glean insights about operational strategies of these firms; and evaluate customers’ sensitivity to factors such as price, deal scheduling, and limited inventory. We then marry our daily deals dataset with additional datasets we compiled from Facebook and Yelp users to study the interplay between social networks and daily deal sites. First, by studying user activity on Facebook while a deal is running, we provide evidence that daily deal sites benefit from significant word-of-mouth effects during sales events, consistent with results predicted by cascade models. Second, we consider the effects of daily deals on the longer-term reputation of merchants, based on their Yelp reviews before and after they run a daily deal. Our analysis shows that while the number of reviews increases significantly due to daily deals, average rating scores from reviewers who mention daily deals are 10% lower than scores of their peers on average.

1. INTRODUCTION
Groupon and LivingSocial are websites offering various deals-of-the-day, with localized deals for major geographic markets. Groupon in particular has been one of the fastest growing Internet sales businesses in history, with tens of millions of registered users and 2011 sales expected to exceed 1 billion dollars.

We briefly describe how daily deal sites work; additional details relevant to our measurement methodology will be given subsequently. In each geographic market, or city, there are one or more deals of the day. Generally, one deal in each market is the featured deal of the day, and receives the prominent position on the primary webpage targeting that market. The deal provides a coupon for some product or service at a substantial discount (generally 40-60%) to the list price. Deals may be available for one or more days. We use the term size of a deal to represent the number of coupons sold, and the term revenue of a deal to represent the number of coupons multiplied by the price per coupon. Groupon retains approximately half the revenue from the discounted coupons [10], and provides the rest to the seller, as does LivingSocial. Deals each have a minimum threshold size that must be reached for the deal to take hold, and sellers may also set a maximum threshold size to limit the number of coupons sold.

Daily deal sites represent a change from recent Internet advertising trends. While large-scale e-mail distributions for sale offers are commonplace (generally in the form of spam) and coupon sites have long existed on the Internet, Groupon and LivingSocial have achieved notable success with their emphasis on higher quality localized deals, as well as their marketing savvy both with respect to buyers and sellers (merchants). This paper represents an attempt to gain insight into the success of this business model, using a combination of data analysis and modeling.

The contributions of the paper are as follows:

• We compile and analyze datasets we gathered monitoring Groupon over a period of six months and LivingSocial over a period of three months in 20 large US markets. Our datasets will be made publicly available (on publication of this paper).

• We consider how the price elasticity of demand, as well as what we call “soft incentives”, affect the size and revenue of Groupon and LivingSocial deals empirically. Soft incentives include deal aspects other than price, such as whether a deal is featured and what days of the week it is available.

• We study the predictability of the size of Groupon deals, based on deal parameters and on temporal progress. We show that deal sizes can be predicted with moderate accuracy based on a small number of parameters, and with substantially better accuracy shortly after a deal goes live.

• We examine dependencies between the spread of Groupon deals and social networks by cross-referencing our Groupon dataset with Facebook data tracking the frequency with which users “like” Groupon deals. We offer evidence that propagation of Groupon deals is consistent with predictions of social spreading made by cascade models.

• We examine the change in reputation of merchants based on their Yelp reviews before and after they run a Groupon deal. We find that reviewers mentioning daily deals are significantly more negative than their peers on average, and the volume of their reviews materially lowers Yelp scores in the months after a daily deal offering.

We note that we presented preliminary findings based on a single month of Groupon data that focused predominantly on the issue of soft incentives in a technical report [3]. The current paper enriches that study in several ways, both in
its consideration of LivingSocial as a comparison point, and especially in our use of social network data sources, such as Facebook and Yelp, to study deal sites. Indeed, we believe this use of multiple disparate data sources, while not novel as a research methodology, appears original in this context of gaining insight into deal sites.

Before continuing, we acknowledge that a reasonable question is why we gathered data ourselves, instead of asking Groupon for data; such data (if provided) would likely be more accurate and possibly more comprehensive. We offer several justifications. First, by gathering our own data, we can make it public, for others to use and to verify our results. Second, by relying on a deals site as a source for data, we would be limited to data they were willing to provide, as opposed to data we thought we needed (and was publicly available). Gathering our own data also motivated us to gather and compare data from multiple sources. Finally, due to fortuitous timing, Groupon’s recent S-1 filing [10] allowed us to validate several aggregate measures of the datasets we collected.

Related Work on Daily Deals: To date, there has been little previous work examining Groupon and LivingSocial specifically. Edelman et al. consider the benefits and drawbacks of using Groupon from the side of the merchant, modeling whether the advertising and price discrimination effects can make such discounts profitable [9]. Dholakia polls businesses to determine their experience of providing a deal with Groupon [8], and Arabshahi examines their business model [2]. Several works have studied other online group buying schemes that arose before Groupon, and that utilize substantially different dynamic pricing schemes [1, 12]. Ye et al. recently provide a stochastic “tipping point” model for sales from daily deal sites that incorporates social network effects [21]. They provide supporting evidence for their model using a Groupon data set they collected that is similar to, but less comprehensive, than ours, but they do not measure social network activity.

2. THE DAILY DEALS LANDSCAPE
In this section, we describe the current landscape of daily deal sites exemplified by Groupon and LivingSocial. We start by describing the measurement methodology we employed to collect longitudinal data from these sites, and provide additional background on how these sites operate. We then describe basic insights that can be gleaned directly from our datasets, including revenue and sales broken out by week, by deal, by geographic location, and by deal type. Moving on, we observe that given an offering, daily deal sites can optimize the performance of the offering around various parameters, most obviously price, but also day-of-week, duration, etc. We explore these through the lens of our datasets.

2.1 Measurement Methodology
We collected longitudinal data from the top two group deal sites, Groupon and LivingSocial, as well as from Facebook and Yelp. Our datasets are complex and we describe them in detail below.

2.1.1 Deal data
We collected data from Groupon between January 3rd and July 3rd, 2011. We monitored – to the best of our knowledge – all deals offered in 20 different cities during this period. Our criteria for city selection were population and geographic distribution. Specifically, our list of cities includes: Atlanta, Boston, Chicago, Dallas, Detroit, Houston, Las Vegas, Los Angeles, Miami, New Orleans, New York, Orlando, Philadelphia, San Diego, San Francisco, San Jose, Seattle, Tallahassee, Vancouver, and Washington DC. In total, our data set contains statistics for 16,692 deals.

Each Groupon deal is associated with a set of features: the deal description, the retail and discounted prices, the start and end dates, the threshold number of sales required for the deal to be activated, the number of coupons sold, whether the deal was available in limited quantities, and if it sold out. Each deal is also associated with a category such as “Restaurants”, “Nightlife”, or “Automotive”. From these basic features we compute further quantities of interest such as the revenue derived by each deal, the deal duration, and the percentage discount.

With each Groupon deal, we collected intraday time-series data which monitors two time-varying parameters: cumulative sales, and whether or not a given deal is currently featured. To compile these time-series, we monitored each deal in roughly ten-minute intervals and downloaded the value of the sales counter. Occasionally some of our requests failed and therefore some gaps are present in our time-series data, but this does not materially affect our conclusions. The second parameter we monitored was whether a deal was featured or not, with featured deals being those deals that are presented in the subject line of daily subscriber e-mails while being given prominent presentation in the associated city’s webpage. For example, visiting groupon.com/boston, one notices that a single deal occupies a significant proportion of the screen real-estate, while the rest of the deals which are concurrently active are summarized in a smaller sidebar.

Although Groupon has a public API¹ through which one can obtain some basic deal information, we decided also to monitor the Groupon website directly. Our primary rationale was that certain deal features, such as whether a link to reviews for the merchant offering the deal was present, were not available through the Groupon API. We used the API to obtain a category for each deal and to validate the sales data we collected. Observed discrepancies were infrequent and small: we used the API-collected data as the ground truth in these cases. We did not use the API to collect time-series data.

¹ http://www.groupon.com/pages/api

We collected data from LivingSocial between March 21st and July 3rd, 2011 for the same set of 20 cities. In total, our LivingSocial dataset contains 2,609 deals. LivingSocial deals differ from their Groupon counterparts in that they have no tipping point, and in that they do not explicitly indicate whether they are available in limited quantities (although they do sell out occasionally). LivingSocial runs two types of deals: one featured deal per day, and a secondary “Family Edition” deal, which offers family-friendly activities, and receives less prominent placement on the LivingSocial website. For LivingSocial deals we only collected data on their outcomes; we did not collect time-series data.

2.1.2 Facebook data
Both Groupon and LivingSocial display a Facebook Like button for each deal, where the Like button is associated with a counter representing the number of Facebook users who have clicked the button to express their positive feed-
[Figure 1 plot omitted. Panels: (a) Groupon, January to July; (b) LivingSocial, April to June. Each panel plots weekly Revenue (dollars) and Sales (coupons). Plot annotations: Superbowl ads; FTD offer; numbered markers 1 to 3: Barnes & Noble, Body Shop, Old Navy; numbered markers 1 to 3: 5 Nights in Puerto Vallarta, 5 Days in Cabo San Lucas, 7 Nights in the Caribbean.]
Figure 1: Weekly revenue and sales in 20 selected cities (2011).

[Figure 2 plot omitted. Panels: (a) Groupon, January to July; (b) LivingSocial, April to June. Each panel plots Revenue/deal (dollars) and Sales/deal (coupons).]
Figure 2: Revenue and coupons sold per deal week-over-week.
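As an illustration of how the weekly series in Figures 1 and 2 relate to the per-deal quantities defined in the Introduction (size is the number of coupons sold; revenue is the number of coupons multiplied by the price per coupon), the following Python sketch aggregates deal records by week. It is a minimal sketch under stated assumptions, not the procedure used to produce the figures: the record layout (Deal, start, price, coupons_sold), the function names, and the choice to attribute each deal entirely to the ISO week of its start date are hypothetical, and the sample values are invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Deal:
    # Hypothetical record layout; field names are not from the paper.
    start: date        # first day the deal was offered
    price: float       # discounted price per coupon, in dollars
    coupons_sold: int  # the deal's "size"


def deal_revenue(deal: Deal) -> float:
    """Revenue of a deal: coupons sold times the price per coupon."""
    return deal.coupons_sold * deal.price


def weekly_totals(deals):
    """Sum deals, sales, and revenue per ISO (year, week).

    Assumption: a deal counts entirely toward the week in which it starts.
    """
    totals = defaultdict(lambda: {"deals": 0, "sales": 0, "revenue": 0.0})
    for d in deals:
        iso = d.start.isocalendar()
        bucket = totals[(iso[0], iso[1])]
        bucket["deals"] += 1
        bucket["sales"] += d.coupons_sold
        bucket["revenue"] += deal_revenue(d)
    return dict(totals)


def per_deal_averages(totals):
    """Per-week (sales per deal, revenue per deal), the quantities of Figure 2."""
    return {
        week: (t["sales"] / t["deals"], t["revenue"] / t["deals"])
        for week, t in totals.items()
    }


# Invented records for illustration only; not values from the dataset.
deals = [
    Deal(date(2011, 1, 3), 20.0, 500),
    Deal(date(2011, 1, 5), 10.0, 300),
    Deal(date(2011, 1, 10), 25.0, 400),
]
totals = weekly_totals(deals)
averages = per_deal_averages(totals)
```

Keeping the weekly totals separate from the per-deal averages mirrors the split between the two figures: the first function yields the kind of series shown in Figure 1, and dividing by the number of deals in each week yields the kind shown in Figure 2.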
back. We refer to the value of the counter as the number of likes a deal has received, and we collected this value for each Groupon and LivingSocial deal in our dataset.

As a technical aside, we mention that Groupon and LivingSocial have different implementations of the Facebook Like button that necessitated our collecting data from them in different ways. Within each deal page, Groupon embeds code that dynamically renders the Facebook Like button. It does so by sending Facebook a request that contains a unique identifier associated with the corresponding deal page. We extracted the unique identifier from Groupon deal pages and directly contacted Facebook to obtain the number of likes for every deal. LivingSocial instead hard-codes the Like button and its associated counter within each page. As we could not obtain the identifier associated with each LivingSocial deal, we could not query Facebook to independently obtain the number of likes, and thus we collected the hard-coded number from LivingSocial deal pages.

2.1.3 Yelp data
Groupon occasionally displays reviews for the merchant offering the deal in the form of a star-rating, as well as selected reviewer comments. The reviews are sourced from major review sites such as Yelp, Citysearch, and TripAdvisor. For Groupon deals that were linked to Yelp reviews, we collected the individual reviewer ratings and comments left by customers on Yelp. We collected this dataset during the first week of September 2011. In total, our dataset contains 56,048 reviews, for 2,332 merchants who ran 2,496 deals on Groupon during our monitoring period. Yelp has implemented one measure to discourage the automated collection of reviews which directly affected our data collection: it hides a set of user reviews for each merchant. To see the hidden reviews, one has to solve a CAPTCHA. We did not attempt to circumvent CAPTCHAs, and we do not know whether the hidden reviews are randomly selected, or are selected by other criteria. Since Yelp reports the total number of reviews available per merchant, we ascertained that approximately 23% of all reviews for merchants in our dataset were hidden from our collection.

2.2 Operational Insights
Figure 1 serves as an overview of insights we are able to gain using our dataset. It displays the weekly revenue as well as the weekly sales of coupons across all 20 cities we monitored for Groupon and LivingSocial, respectively. Notably, while both Groupon and LivingSocial are widely regarded as companies enjoying extremely rapid growth, our first takeaway from these plots is that sales and revenue in these 20 established markets are relatively flat across the time period. We conjecture that much of the reported growth is in newer markets.

By happenstance, Groupon’s recent S-1 filing [10] provided financial information that allowed us to validate some of the aggregate revenue data that we collected indepen-