This document discusses content access control solutions using Varnish Cache. It outlines challenges and considerations when designing and deploying a paywall system, including anonymous metering, scalable storage options for paywall state, and implications for search engine optimization from Google's First Click Free policy. The document also covers exclusions from the paywall, fraud detection, and other topics relevant to implementing a paywall system with Varnish.
Content Access Control with Varnish Cache: Challenges and Considerations
1. Content Access Control with
Varnish Cache
A quick look at some challenges & considerations
Carlos Abalde, Roberto Moreda
{cabalde,moreda}@allenta.com
Stockholm, Varnish Summit 2014
2. Agenda
๏ Our particular journey designing & deploying
access control solutions based on Varnish Plus
๏ Contents
‣ Varnish Paywall
‣ Challenges & considerations
‣ Conclusions
3. Who are we?
๏ Allenta Consulting
‣ http://www.allenta.com
๏ Varnish Software integration partner
๏ Specialized in Varnish Paywall
‣ Paywall projects running in Italy, Ireland &
Argentina at the moment
7. Who’s Johann?
๏ According to Wikipedia, Johann Carolus is the
name of the publisher of the first newspaper
๏ He’s also the hero of this presentation
๏ Johann is yet another publisher worried about the
decline of advertising revenue in on-line media
‣ Evolution of traditional ad-based models?
‣ Alternative tool for monetizing on-line contents?
8. Johann has a wish list
๏ Transition to a subscription-based model
‣ Flexible / extensible subscription model
- Metered subscriptions
- Partial subscriptions
๏ Freemium model
๏ Owned contents
9. … a huge wish list!
๏ Separate Plug & Play component
‣ Minimal changes to existing backend
๏ Scalable & high performance solution
‣ Do not degrade current UX
๏ On-premises solution
‣ Full control of the product
12. What’s VPW?
๏ Part of Varnish Plus
‣ Access control logic moved to the caching edge
‣ Fast & flexible paid content delivery
๏ Win-win toolkit solution
‣ Powerful access control layer
‣ Advanced caching technology
13. What’s really VPW?
๏ Some VCL subroutines, a few general purpose
OSS VMODs, and one access control specific
VMOD
๏ Optionally,
‣ Some high performance storage
‣ Some Varnish Custom Statistics counters
‣ Some JavaScript assets
15. Beyond newspapers
๏ VPW is not a traditional media specific product
๏ VPW is about moving access control logic to the
caching edge
‣ Execute access control logic at Varnish speed
‣ Improve hit ratio
‣ Simplify backend logic
16. VPW is also for…
๏ Alice, who’s running a trading site and wants to
distribute certain reports only to premium users
๏ Bob, who has been asked to speed up a paid music
streaming service
๏ Emma, who’s running a slow site of stock images
limited to 5 downloads per day per authenticated
user
๏ …
19. Who’s Cosme?
๏ Cosme is an engineer working at Allenta
๏ He has been working on access control solutions
based on Varnish Plus for a few years
๏ Cosme discusses with Johann some usual
challenges & considerations when adding a
paywall layer to an existing website
‣ Anonymous metering, storage options, SEO…
20. Anonymous metering
“Let’s do this NYT style”
๏ “I don’t want the paywall to bother casual
readers. Let’s do this NYT style. Only require
authentication after 10 articles have been
accessed during the current month”
๏ “I’ve read the NYT paywall is breakable using a
simple bookmarklet. Seriously?”
๏ “What about using browser fingerprinting
to identify anonymous users?”
21. Anonymous metering
Metering cookies
๏ Metering based on cookies is breakable
‣ Is this a real issue from a business perspective?
‣ Restrict contents eligible for anonymous access
- Focus on user engagement
๏ Cookie backups in local storage, DOM…
- https://github.com/samyk/evercookie
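
A minimal sketch of a cookie-based meter, assuming the limit of 10 articles per month from the earlier example and an HMAC-signed cookie value. The cookie format, the `SECRET` key and the function names are hypothetical and are not taken from VPW. Signing stops a reader from editing the counter, but a forged or deleted cookie simply restarts the count at zero, which is why this kind of metering remains breakable.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical secret, known only to the server side
LIMIT = 10             # free articles per month, as in the NYT-style example


def sign(payload: str) -> str:
    """Return the hex HMAC-SHA256 signature of the payload."""
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def encode_cookie(month: str, count: int) -> str:
    """Build a cookie value such as '2014-12:3:<signature>'."""
    payload = f"{month}:{count}"
    return f"{payload}:{sign(payload)}"


def decode_cookie(value: str, month: str) -> int:
    """Return the stored count, or 0 if the cookie is missing, forged or stale."""
    try:
        cookie_month, count, signature = value.split(":")
        payload = f"{cookie_month}:{count}"
        if not hmac.compare_digest(signature.encode(), sign(payload).encode()):
            return 0
        if cookie_month != month:
            return 0
        return int(count)
    except ValueError:
        return 0


def meter(cookie, month):
    """Count one article view and return (allowed, new_cookie_value)."""
    count = decode_cookie(cookie or "", month)
    if count >= LIMIT:
        return False, encode_cookie(month, count)
    return True, encode_cookie(month, count + 1)
```

Calling `meter(None, "2014-12")` repeatedly and feeding back the returned cookie allows ten views and then starts returning `False` until the month changes.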
22. Anonymous metering
Browser fingerprinting
๏ Server side metering
‣ https://github.com/Valve/fingerprintjs
๏ Not a real solution
‣ Also easily breakable
‣ Collisions
- Mobile devices, cloned desktops…
23. Paywall state
“Where is metering data stored?”
๏ “Where is metering data stored?”
๏ “Systems guys are asking about scalability of
the storage layer keeping track of the state of
the paywall. What about this?”
๏ “And what about HA? What are the options
here?”
24. Paywall state
Memcached vs. Redis
๏ Memcached
‣ https://github.com/varnish/libvmod-memcached
๏ Redis
‣ https://github.com/carlosabalde/libvmod-redis
‣ Persistence
‣ Richer API & power of Lua scripting
25. Paywall state
Current scalability & HA options
๏ Twemproxy
‣ https://github.com/twitter/twemproxy
‣ Light-weight sharding proxy for MC & Redis
๏ Redis Sentinel
‣ http://redis.io/topics/sentinel
‣ Monitoring, notification & automatic failover
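
To illustrate what a sharding proxy does with paywall keys, here is a toy consistent-hash ring. It is a sketch of the general technique only, not Twemproxy's actual implementation; the server names, the SHA-256 hash and the number of virtual points per server are assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring mapping keys to storage servers."""

    def __init__(self, servers, replicas=100):
        # Each server owns several points on the ring to spread the load.
        self._ring = []
        for server in servers:
            for i in range(replicas):
                self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def server_for(self, key: str) -> str:
        """Return the server owning the first ring point at or after the key."""
        index = bisect.bisect_left(self._points, self._hash(key))
        # Wrap around to the first point when the key hashes past the last one.
        return self._ring[index % len(self._ring)][1]
```

With this layout, removing one server only moves the keys that server owned; keys stored on the remaining servers keep their location, which limits the amount of metering state lost when a node goes away.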
26. Paywall state
Future scalability & HA options
๏ Redis Cluster
‣ http://redis.io/topics/cluster-tutorial
‣ Automatic sharding & replication for Redis
๏ Dynomite
‣ https://github.com/Netflix/dynomite
‣ Dynamo implementation for MC & Redis
27. SEO
“Let Googlebot access all paywalled contents”
๏ “Googlebot should be able to index all contents
in my site, both paywalled and not paywalled
ones”
๏ “Simply detect the bot checking the User Agent
HTTP header, check the source IP address using
the DNS VMOD, and let it access all
paywalled contents”
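
The check Johann describes can be sketched as follows: match the User-Agent, resolve the source IP to a hostname, require a `googlebot.com` or `google.com` hostname, and confirm that the hostname resolves back to the same IP. This is a Python sketch of the logic rather than VCL using the DNS VMOD; the function name and the injectable resolver parameters are assumptions, and the default forward lookup only handles IPv4.

```python
import socket


def is_verified_googlebot(user_agent, ip, reverse_lookup=None, forward_lookup=None):
    """Check the User-Agent, then confirm the IP with reverse and forward DNS."""
    if "Googlebot" not in user_agent:
        return False
    # Resolvers are injectable so the logic can be exercised without network access.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip)
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        # The forward lookup must return the original IP, otherwise the
        # reverse record could have been set by whoever controls the IP range.
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

The User-Agent check alone is trivial to fake, so the DNS round trip is the part that actually authenticates the crawler.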
28. SEO
Google’s First Click Free Policy for Web Search
๏ Google penalizes content cloaking
๏ FCF requires that all users who click a Google
search result should be allowed to see the full
text of the content they are trying to access
‣ That text must be identical to the content that was
shown to Googlebot at indexing time
‣ Publishers are allowed to limit the number of
accesses under the FCF policy to 5 accesses per
user each day
29. SEO
FCF implications
๏ Users may get access even when their quotas are
exhausted or they are not even authenticated
๏ Breakable exclusion based on Referer header
‣ Well known issue of FT and other newspapers
‣ What about teasers?
- Same URL internally rewritten by Varnish
- Not useful for freemium contents
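
A sketch of an FCF exclusion with the daily cap of 5 accesses per user. The in-memory `counters` dictionary stands in for the paywall storage, the host check only covers `google.com` (country domains are left out for brevity), and all names are hypothetical. Because the Referer header is supplied by the client, anyone can forge it, which is exactly why this exclusion is breakable.

```python
from urllib.parse import urlparse

FCF_DAILY_LIMIT = 5  # accesses per user each day allowed under the policy


def is_google_referer(referer: str) -> bool:
    """True if the Referer host is google.com or one of its subdomains."""
    host = (urlparse(referer).hostname or "").lower()
    return host == "google.com" or host.endswith(".google.com")


def fcf_allows(referer, user_id, day, counters):
    """Grant a First Click Free access if the daily allowance is not used up."""
    if not is_google_referer(referer or ""):
        return False
    key = (user_id, day)
    if counters.get(key, 0) >= FCF_DAILY_LIMIT:
        return False
    counters[key] = counters.get(key, 0) + 1
    return True
```

For an anonymous reader, `user_id` would have to come from a metering cookie or similar identifier, with the same weaknesses discussed for anonymous metering.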
30. And much more…
๏ Access control exclusions
๏ Fraud detection
๏ Testing strategy
๏ Paywall API & Agent
๏ Usage statistics
๏ …
35. How does VPW work?
๏ Custom HTTP headers
‣ X-Pw-Access-Control…
๏ API services
‣ Authorization service…
๏ Securely signed cookies
๏ High performance storage
36. Exclusions
“And now some exceptions”
๏ “The IP ranges of these companies should
completely bypass the paywall. We have some
B2B agreements with them”
๏ “The web views used by our official mobile apps
should also bypass the paywall”
๏ “Any click on paywalled contents linked in
Facebook or Twitter should also bypass
the paywall”
37. Exclusions
Beware of fake HTTP headers
๏ It’s completely reasonable to bypass the paywall
logic based on:
‣ A Varnish ACL
‣ Some ad-hoc HTTP headers including an HMAC
signature generated using a secret shared
between Varnish and the mobile apps
๏ Bypassing the paywall logic based on the HTTP
Referer header is weak and should be carefully
analyzed
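
The two reasonable bypass mechanisms can be sketched as below. In a real deployment this logic would live in VCL; the network range, the secret, the signed message layout (`timestamp:path`) and the five-minute validity window are all assumptions chosen for the example.

```python
import hashlib
import hmac
import ipaddress

B2B_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]  # hypothetical partner range
APP_SECRET = b"shared-with-mobile-apps"                # hypothetical shared secret


def ip_in_acl(client_ip: str) -> bool:
    """True if the client IP belongs to one of the B2B networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in B2B_NETWORKS)


def app_signature(timestamp: str, path: str) -> str:
    """Signature the mobile app would send along with its request."""
    message = f"{timestamp}:{path}".encode()
    return hmac.new(APP_SECRET, message, hashlib.sha256).hexdigest()


def app_header_valid(timestamp, path, signature, now, max_age=300):
    """Accept the header only if it is recent and the signature matches."""
    try:
        age = now - int(timestamp)
    except ValueError:
        return False
    if not 0 <= age <= max_age:
        return False
    expected = app_signature(timestamp, path)
    return hmac.compare_digest(expected.encode(), signature.encode())


def bypass_paywall(client_ip, path, timestamp=None, signature=None, now=0):
    """Bypass for ACL members or for requests carrying a valid app signature."""
    if ip_in_acl(client_ip):
        return True
    if timestamp is not None and signature is not None:
        return app_header_valid(timestamp, path, signature, now)
    return False
```

Including a timestamp in the signed message limits how long a captured header can be replayed; a plain header without a signature would be as easy to fake as the Referer.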
38. Fraud detection
“Sharing unmetered subscriptions”
๏ “What if some user purchases an unmetered
subscription and then shares his/her credentials
with all his/her Facebook friends?”
๏ “What if an office using a NAT proxy buys a single
unmetered subscription for all the
employees in the building?”
39. Fraud detection
Rate limiting
๏ You may be able to detect fraud in your user
management component
‣ Limit number / rate of sessions per user
‣ Force extra validations / block users when
suspicious behavior is detected
๏ Paywall may help if you are not able to do that
‣ Redis sorted set restricting number of SIDs & IPs
per user during some short time window
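
The sorted-set idea can be sketched with an in-memory stand-in: one set per user, members are session IDs (or IPs), and scores are last-seen timestamps. The comments name the Redis commands each step corresponds to (`ZREMRANGEBYSCORE`, `ZADD`, `ZCARD`); the class name, window length and threshold are assumptions for illustration.

```python
class SessionWindow:
    """In-memory stand-in for one Redis sorted set per user.

    Members are session IDs or IPs, scores are last-seen timestamps.
    """

    def __init__(self, window_seconds=3600, max_members=3):
        self.window = window_seconds
        self.max_members = max_members
        self._sets = {}  # user -> {member: last_seen}

    def allow(self, user, member, now):
        """Record the member and report whether the user is within limits."""
        members = self._sets.setdefault(user, {})
        # Like ZREMRANGEBYSCORE: drop members not seen within the window.
        cutoff = now - self.window
        for stale in [m for m, seen in members.items() if seen <= cutoff]:
            del members[stale]
        # Like ZADD: insert the member or refresh its score.
        members[member] = now
        # Like ZCARD: too many distinct members suggests shared credentials.
        return len(members) <= self.max_members
```

In Redis the three steps would typically run inside a Lua script or a transaction so that concurrent requests for the same user see a consistent count.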