Dynamically skinning a legacy portal using Python, WSGI (the Python Web Server Gateway Interface), and Deliverance.
So you have a big legacy portal application whose look you want to change, but which you are contractually not allowed to touch?
Here is a case study of how we used the power and flexibility of Python and WSGI, and the wonder of lxml, to dynamically re-skin a proprietary .NET portal without even touching it.
We take a giant lump of messy, invalid HTML markup, dynamically strip it back, add semantic markup and CSS, and present the user with a nice, svelte, valid site.
I will cover the history of the legacy portal, the problems encountered, our cunning plan to dynamically re-skin the site, a technical overview of the parts of the system (lxml, WSGI, etc.), and what we learned along the way.
1. Lipstick on a Pig
Dynamically re-skinning a legacy .NET portal with Python
Matt Hamilton
matth@netsight.co.uk
30th June 2009 Europython 2009, Birmingham, UK 1
2. Introduction
Dynamically re-skinning a .NET portal site
Can't name the client
Portal for teachers in the UK
Aggregating content across legacy portal, Plone and Moodle
3. Who Am I
Technical Director of Netsight
Web development firm in Bristol, UK
10 years' experience with Zope/Plone
More of an integrator than core developer
I get involved in all those sticky projects of merging Plone in with other systems in an enterprise
5. Existing Portal (1.0)
Five years old by November 2009
User registrations: 46,681
Course enrolments: 33,664
Resource Bank views: 247,911
7. Problems with Current Portal
Look-and-feel
Not very compelling
Usability
Challenging in places
A poor content management system
Can't really edit general content, so use a separate FTP server and Dreamweaver
Vendor lock-in
Even small changes, very expensive
8. The Future - Portal 2.0
Usability, Design and Content Review April 2008
Strategic Review August 2008
Feasibility Studies Jan 2009
Pilot Demonstrator (“Portal V1.5”) March 2009
10. Architecture Review
Portal 1.0 - Monolithic, tightly coupled, poor separation of skin
11. Architecture Review
Portal 2.0 - Extensible, loosely coupled, good separation of skin
12. How Do We Get There?
Remember: We Can't Touch the Existing System!
13. The Cunning Plan
Browser
Portal 1.0
Existing Skin 1.0
Portal module of functionality, e.g. portfolio
14. Total Skin Graft!
Browser
Web server
Transformation proxy
Skin 2.0: new skin via XPath and XSLT transformation.
Web server needs to handle SSL.
Together they give us nice URLs.
Portal 1.5
Skin 1.0
15. Deliverance
Several Different Projects
xdv
Deliverance 0.3
To learn more about the specifics of Deliverance, go to http://deliverance.openplans.org
16. WSGI
WSGI allows you to write small modules chained together in a 'pipeline'
Many small filters combined together as you need
Lots of existing components out there
Very easy to write new ones
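To make the idea of a small filter concrete, here is a minimal sketch of WSGI middleware that rewrites a response body. It is not taken from the project: `legacy_app`, `BodyFilter`, `strip_font_tags`, `add_marker` and `call_app` are hypothetical names used only for illustration, and the sketch buffers the whole response and ignores the legacy `write()` callable.

```python
from wsgiref.util import setup_testing_defaults


def legacy_app(environ, start_response):
    """Hypothetical innermost app, standing in for the proxied legacy site."""
    body = b"<html><body><font>hello</font></body></html>"
    start_response("200 OK", [("Content-Type", "text/html"),
                              ("Content-Length", str(len(body)))])
    return [body]


class BodyFilter:
    """WSGI middleware that passes the complete response body through `transform`."""

    def __init__(self, app, transform):
        self.app = app
        self.transform = transform

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold back the status and headers until the body has been rewritten.
            captured["status"] = status
            captured["headers"] = headers
            return lambda data: None  # write() callable, unsupported in this sketch

        chunks = self.app(environ, capture)
        try:
            body = self.transform(b"".join(chunks))
        finally:
            close = getattr(chunks, "close", None)
            if close is not None:
                close()

        # The body length may have changed, so recompute Content-Length.
        headers = [(k, v) for k, v in captured["headers"]
                   if k.lower() != "content-length"]
        headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]


def strip_font_tags(body):
    return body.replace(b"<font>", b"").replace(b"</font>", b"")


def add_marker(body):
    return body + b"<!-- skinned -->"


# Chaining is just wrapping: the outermost filter sees the response last.
pipeline = BodyFilter(BodyFilter(legacy_app, strip_font_tags), add_marker)


def call_app(app, path="/"):
    """Call a WSGI app directly, without a server, and collect the response."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)
    result = {}

    def start_response(status, headers, exc_info=None):
        result["status"] = status
        result["headers"] = dict(headers)
        return lambda data: None

    body = b"".join(app(environ, start_response))
    return result["status"], result["headers"], body
```

Calling `call_app(pipeline)` exercises both filters in turn without starting a web server, which is also a convenient way to unit-test each filter on its own.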
17. WSGI Power - The Pipeline
[pipeline:portal]
pipeline =
    theme.portal
    ploneinterceptor
    xslt
    linkrewrite
    htmlcleaner
    source.portal

[filter:theme.portal]
use = egg:dv.xdvserver#xdv
theme_uri = file://%(here)s/theme/theme.html
rules = %(here)s/rules/content.xml

[filter:ploneinterceptor]
use = egg:ns.ploneinterceptor#ploneinterceptor

[filter:xslt]
use = egg:dv.xdvserver#xslt
xslt_file = %(here)s/rules/transform.xsl

[filter:linkrewrite]
use = egg:ns.linkrewrite#linkrewrite

[filter:htmlcleaner]
use = egg:ns.htmlcleaner#htmlcleaner

[app:source.portal]
use = egg:Paste#proxy
address = http://www.theexistingsite.org.uk/
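In a Paste Deploy pipeline the first name listed is the outermost wrapper and the last name is the application, so a response from source.portal passes back through htmlcleaner first and theme.portal last. The sketch below illustrates that ordering with plain Python. The factory follows the `(app, global_conf, **local_conf)` shape of a Paste Deploy filter-app factory, but `tagging_filter_factory`, `build_pipeline` and `source_app` are hypothetical stand-ins; the real ns.* and dv.xdvserver filters are not shown in the slides.

```python
def tagging_filter_factory(app, global_conf, **local_conf):
    """Hypothetical filter factory: appends '|<name>' to the response body,
    so the order in which filters see the response becomes visible."""
    tag = ("|" + local_conf["name"]).encode("ascii")

    def filtered(environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            captured["status"], captured["headers"] = status, headers
            return lambda data: None

        body = b"".join(app(environ, capture)) + tag
        headers = [(k, v) for k, v in captured["headers"]
                   if k.lower() != "content-length"]
        headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]

    return filtered


def build_pipeline(filters, app, global_conf=None):
    """Wrap `app` in `filters`, listed outermost first as in a [pipeline:] section."""
    for factory, local_conf in reversed(filters):
        app = factory(app, global_conf or {}, **local_conf)
    return app


def source_app(environ, start_response):
    """Stand-in for the proxied site at the end of the pipeline."""
    body = b"source"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


# Same order as the pipeline = ... line in the configuration above.
names = ["theme.portal", "ploneinterceptor", "xslt", "linkrewrite", "htmlcleaner"]
portal = build_pipeline([(tagging_filter_factory, {"name": n}) for n in names],
                        source_app)


def fetch(app):
    """Call the app directly with a minimal environ and return the body."""
    def start_response(status, headers, exc_info=None):
        return lambda data: None
    return b"".join(app({"REQUEST_METHOD": "GET", "PATH_INFO": "/"}, start_response))
```

With this ordering the HTML is cleaned and its links rewritten before the XSLT and theme steps run, which matches the order of the names in the pipeline section.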
18. Link Rewriting
Old URL:
http://www.theclient.org.uk/WebPortal.aspx?page=1&module=DB920A53-01EA-4886-8878-F2CDF5FA8CFD&mode=101&IsNonNewsDB920A53_01EA_4886_8878_F2CDF5FA8CFD=True&newsIdDB920A53_01EA_4886_8878_F2CDF5FA8CFD=11208#10
205 characters!
New URL:
http://www.theclient.org.uk/news/11208#10
41 characters!
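The mapping from the old URL to the new one can be sketched as follows. This is only an illustration of the idea, not the ns.linkrewrite code: the function name `shorten_news_url` is hypothetical, and it assumes the news item is always identified by a numeric `newsId<GUID>` query parameter as in the example URL.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# The module GUID appears in parameter names with underscores instead of hyphens.
GUID = r"[0-9A-F]{8}(?:_[0-9A-F]{4}){3}_[0-9A-F]{12}"
NEWS_ID_KEY = re.compile(r"^newsId" + GUID + r"$", re.IGNORECASE)


def shorten_news_url(url):
    """Return the short /news/<id> form of a legacy news URL.

    URLs that do not look like legacy news links are returned unchanged.
    """
    parts = urlsplit(url)
    if not parts.path.endswith("/WebPortal.aspx"):
        return url
    for key, value in parse_qsl(parts.query):
        if NEWS_ID_KEY.match(key) and value.isdigit():
            short = f"{parts.scheme}://{parts.netloc}/news/{value}"
            if parts.fragment:
                short += "#" + parts.fragment
            return short
    return url


old_url = (
    "http://www.theclient.org.uk/WebPortal.aspx?"
    "page=1&module=DB920A53-01EA-4886-8878-F2CDF5FA8CFD&mode=101"
    "&IsNonNewsDB920A53_01EA_4886_8878_F2CDF5FA8CFD=True"
    "&newsIdDB920A53_01EA_4886_8878_F2CDF5FA8CFD=11208#10"
)
new_url = shorten_news_url(old_url)
```

A real filter would also have to map incoming short URLs back to the long form before proxying the request, which needs the module GUID and the other parameters; that direction is left out here.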
19. HTML Cleanup
LXML rules!
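from lxml import etree
from lxml.html import document_fromstring
from lxml.html.clean import Cleaner
# Configure the cleaner (options elided on the slide)
cleaner = Cleaner(...)
# Parse the response body and pretty print the HTML
dom = document_fromstring(body)
body = etree.tostring(dom, pretty_print=True)
# Clean the HTML
body = cleaner.clean_html(body)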
20. Result
Old
70 KB of HTML
120 validation errors, 61 warnings
New
40 KB of HTML
27 errors, 1 warning (mainly XHTML/HTML conflicts)
No significant performance impact
21. Putting It All Together
Composite:main
pipeline:portal    pipeline:plone    pipeline:moodle
theme.content      theme.content     theme.content
xslt               navmerger         navmerger
linkrewrite        plonecontent      moodlecontent
htmlcleaner        source.plone      source.moodle
source.portal
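The slide shows a composite application sitting above three pipelines but not how requests are routed between them. One common approach is to dispatch on URL prefix (Paste ships `egg:Paste#urlmap` for this). The sketch below assumes that approach; the `/plone` and `/moodle` prefixes, `PrefixComposite`, `make_source` and `get` are hypothetical and only illustrate the dispatch step.

```python
def make_source(name):
    """Build a stand-in app that reports its name and the path it was given."""
    def app(environ, start_response):
        body = f"{name}:{environ.get('PATH_INFO', '')}".encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return app


class PrefixComposite:
    """Dispatch each request to the app mounted at the longest matching prefix."""

    def __init__(self, routes, default):
        # Try longer prefixes first so that /plone/x never matches a shorter mount.
        self.routes = sorted(routes.items(), key=lambda item: len(item[0]),
                             reverse=True)
        self.default = default

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix, app in self.routes:
            if path == prefix or path.startswith(prefix + "/"):
                # Move the matched prefix from PATH_INFO to SCRIPT_NAME,
                # so the mounted app sees paths relative to its mount point.
                environ = dict(environ)
                environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                environ["PATH_INFO"] = path[len(prefix):]
                return app(environ, start_response)
        return self.default(environ, start_response)


# Each value would be a full pipeline in practice; plain apps keep the sketch short.
main = PrefixComposite(
    {"/plone": make_source("plone"), "/moodle": make_source("moodle")},
    default=make_source("portal"),
)


def get(app, path):
    """Call the app directly with a minimal environ and return the body."""
    def start_response(status, headers, exc_info=None):
        return lambda data: None
    environ = {"REQUEST_METHOD": "GET", "SCRIPT_NAME": "", "PATH_INFO": path}
    return b"".join(app(environ, start_response))
```

Anything that does not match a mounted prefix falls through to the legacy portal pipeline, so existing portal URLs keep working.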
22. End Result New Style
Portal content
23. Complications
Navigation
One page, two content sources: how is the navigation built?
Search
Search needs to go across multiple systems
Will soon be looking at Solr, Xapian, Google Mini
.NET viewstate postback
Massive hidden state variable, form wraps entire site!
24. Questions?
matth@netsight.co.uk
25. We are looking for Developers!
Come chat to me
or
drop an email to
careers@netsight.co.uk