6. Problem: Chinese comics portals load
slowly and have poor usability
Solution: Write a web scraping tool to
batch download images, so I can
be a proper Apple fanboy and read
comics on my iPad
7. First approach: urllib2 + lxml.html
Verdict: Total fail, couldn't handle
JavaScript and cookies
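A rough sketch of why the static approach comes up short, assuming a made-up page where one image sits in the markup and a second is only added by an inline script. To stay self-contained it uses Python 3 and the standard library's html.parser in place of urllib2 + lxml.html, and it skips the network fetch entirely. SAMPLE_PAGE, ImageCollector and static_image_sources are hypothetical names for illustration, not part of any library mentioned in these slides.

```python
from html.parser import HTMLParser

# Hypothetical page: the first image is in the static markup, the second
# only exists after a browser executes the inline script.
SAMPLE_PAGE = """
<html><body>
<img src="/comics/page-001.jpg">
<script>
  var img = document.createElement('img');
  img.src = '/comics/page-002.jpg';
  document.body.appendChild(img);
</script>
</body></html>
"""


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag found in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def static_image_sources(html):
    """Return image URLs visible to a parser that never runs JavaScript."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


if __name__ == "__main__":
    # Only the image present in the static markup is reported.
    print(static_image_sources(SAMPLE_PAGE))
```

The parser treats the script body as opaque text, so the second image never shows up. Getting at it requires something that actually executes the page's JavaScript, which is where a browser-backed tool comes in.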
11. Coincidence: At Leapfrog, we needed
a better way to test JavaScript
behavior on our sites
Result: Leapfrog subsidizes my
comic book addiction
12. Over time, we added some nice stuff
to our spynner fork
- Ability to ignore SSL errors
- Form manipulation methods
- Screen capture (consistently-sized
images)
- Other stuff that I can't remember
13. Question: Hey Feihong, where can I get
this sexy library?
Answer: Nowhere, I'm too busy
reading comics to open source it
14. Real answer: We're working on
open sourcing it, but we ran into
some roadblocks
(Cast sidelong glance at Terry)
15. However, we are NOT
soliciting suggestions for a
name. We have the
PERFECT name already.
18.
from punky import Browster

browser = Browster(auto_load_images=True)
browser.create_webview(show=True)
browser.load('http://www.duckduckgo.com/')
browser.fill('input#hfih',
             'How do I de-pube-ify waterless urinals?')
browser.submit('form#hfh', wait_load=True)
for element in browser.all('#r12 > div'):
    print unicode(element.toPlainText())
19. Random note: I
never want to have
the need to see a
urologist. But if I
do, I hope he's
wearing a badge
like this: