Transmogrifier is a fantastic tool for moving content from one website to another. Simple, flexible and powerful, it makes the difficult tasks of migration easy and the impossible possible. But there's more to migrating with transmogrifier than just learning the tool. The everyday task of managing content can lead to complex problems. You need a plan. In this talk, we'll look at a real-world example of the migration of a large, content-heavy website from Liferay to Plone. We'll talk about where the hidden traps were found, the tools we used to get past them, and the knowledge that would have helped us avoid them in the first place.
26. Plone Conference 2011
DB Schema
Easy to Understand
SELECT
COALESCE(gui.guestName, mbm.userName) AS userName,
mbm.createDate,
mbm.modifiedDate,
mbm.subject,
mbm.body
FROM
MBMessage mbm
JOIN
MBDiscussion mbd ON mbd.threadId = mbm.threadId
LEFT JOIN
guestUserInfo gui ON gui.messageId = mbm.messageId
WHERE
mbd.classPK=%d ORDER BY modifiedDate
27. Plone Conference 2011
DB Schema
Easy to Understand
SELECT
    COALESCE(gui.guestName, mbm.userName) AS userName,
    mbm.createDate,
    mbm.modifiedDate,
    mbm.subject,
    mbm.body
FROM
    MBMessage mbm
JOIN
    MBDiscussion mbd ON mbd.threadId = mbm.threadId
LEFT JOIN
    guestUserInfo gui ON gui.messageId = mbm.messageId
WHERE
    mbd.classPK=%d ORDER BY modifiedDate

SELECT
    te.name
FROM TagsEntry te
WHERE te.entryId in
    (SELECT tate.entryId
     FROM TagsAssets_TagsEntries tate
     WHERE tate.assetId in
         (SELECT ta.assetId
          FROM TagsAsset ta
          WHERE ta.classPK=%d))
AND te.vocabularyid = 41473
52. Plone Conference 2011
Get Article Comments
SELECT
COALESCE(gui.guestName, mbm.userName) AS userName,
mbm.createDate,
mbm.modifiedDate,
mbm.subject,
mbm.body
FROM
MBMessage mbm
JOIN
MBDiscussion mbd ON mbd.threadId = mbm.threadId
LEFT JOIN
guestUserInfo gui ON gui.messageId = mbm.messageId
WHERE
mbd.classPK=%d ORDER BY modifiedDate
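As a rough illustration of what this query returns, here is a minimal sketch that runs it against a toy in-memory SQLite database. The table contents, the `article_comments` helper, and the use of SQLite are all assumptions made for the example. It also swaps the `%d` string formatting for a bound `?` parameter, which is not what the slide shows.

```python
import sqlite3

COMMENTS_QUERY = """
SELECT
    COALESCE(gui.guestName, mbm.userName) AS userName,
    mbm.createDate,
    mbm.modifiedDate,
    mbm.subject,
    mbm.body
FROM MBMessage mbm
JOIN MBDiscussion mbd ON mbd.threadId = mbm.threadId
LEFT JOIN guestUserInfo gui ON gui.messageId = mbm.messageId
WHERE mbd.classPK = ?
ORDER BY mbm.modifiedDate
"""


def article_comments(connection, class_pk):
    """Return the comments for one article as a list of dicts (hypothetical helper)."""
    connection.row_factory = sqlite3.Row
    rows = connection.execute(COMMENTS_QUERY, (class_pk,))
    return [dict(row) for row in rows]


# Toy schema with only the columns the query touches (made-up data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MBMessage (messageId INTEGER, threadId INTEGER, userName TEXT,
                        createDate TEXT, modifiedDate TEXT,
                        subject TEXT, body TEXT);
CREATE TABLE MBDiscussion (threadId INTEGER, classPK INTEGER);
CREATE TABLE guestUserInfo (messageId INTEGER, guestName TEXT);
INSERT INTO MBDiscussion VALUES (10, 500);
INSERT INTO MBMessage VALUES
    (1, 10, 'alice', '2011-01-01', '2011-01-02', 'First', 'Hello');
INSERT INTO MBMessage VALUES
    (2, 10, 'anonymous', '2011-01-03', '2011-01-03', 'Second', 'Hi');
INSERT INTO guestUserInfo VALUES (2, 'Visiting Guest');
""")

comments = article_comments(conn, 500)
```

The `COALESCE` call is what lets a guest name, when one exists, take precedence over the account name stored on the message.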
65. Plone Conference 2011
Not Too Opinionated
A migration deals with moving pieces of content from
one place to another
A piece of content comes from somewhere
A piece of content ends up somewhere
You should be able to do what you want to a piece of
content between point A and point B
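The idea can be sketched with plain Python generators, leaving transmogrifier itself out of the picture. Everything here (`source`, `rename_key`, `sink`, the sample records) is a hypothetical stand-in chosen for illustration, not part of any real blueprint.

```python
def source(records):
    """Content comes from somewhere: here, an in-memory list."""
    for record in records:
        yield dict(record)


def rename_key(previous, old, new):
    """Do what you want to each item between point A and point B."""
    for item in previous:
        if old in item:
            item[new] = item.pop(old)
        yield item


def sink(previous, destination):
    """Content ends up somewhere: here, a plain dict keyed by path."""
    for item in previous:
        destination[item["_path"]] = item
        yield item


site = {}
records = [
    {"_path": "/news/one", "headline": "One"},
    {"_path": "/news/two", "headline": "Two"},
]

# Chain the three steps together and drain the pipeline.
pipeline = sink(rename_key(source(records), "headline", "title"), site)
for _ in pipeline:
    pass
```

Each step only knows about the step before it, which is what keeps the approach unopinionated about where content comes from or goes to.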
92. Plone Conference 2011
Three Facts
1. Pipeline sections are generators
class MySection(object):
    classProvides(ISectionBlueprint)
    implements(ISection)

    def __init__(self, transmogrifier, name, options, previous):
        [setup]

    def __iter__(self):
        ...
        for item in self.previous:
            [do some stuff]
            ...
            yield item
        [clean up]
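To make the setup, per-item, and clean-up phases concrete, here is a small sketch of a section-like class with the Zope interface declarations left out. `CountingSection` and its `log` argument are hypothetical names used only to record the order in which things happen.

```python
class CountingSection(object):
    """A stand-in for a pipeline section that records when each phase runs."""

    def __init__(self, previous, log):
        # Setup runs once, when the section is constructed.
        self.previous = previous
        self.log = log
        self.log.append("setup")

    def __iter__(self):
        count = 0
        for item in self.previous:
            # Per-item work runs each time an item is pulled through.
            count += 1
            self.log.append("item %s" % item)
            yield item
        # Clean-up runs once, after the last item has passed through.
        self.log.append("clean up after %d items" % count)


log = []
section = CountingSection(iter(["a", "b"]), log)
results = list(section)
```

Because the body of `__iter__` is a generator, the code after the loop does not run until the upstream sections have been exhausted.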
93. Plone Conference 2011
Three Facts
2. SQL sections process items 1 query at a time
for query in self.queries:
    result = self.connection.execute(query)
    for row in result:
        yield dict((x[0].encode('utf-8'), x[1])
                   for x in row.items())
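The consequence is that every row of the first query is emitted before the second query is even executed. A minimal sketch of that behaviour follows, using the standard library `sqlite3` module in place of the connection object on the slide. The `sql_source` function and the two toy tables are assumptions for the example.

```python
import sqlite3


def sql_source(connection, queries):
    """Yield one dict per row, finishing each query before starting the next."""
    for query in queries:
        cursor = connection.execute(query)
        names = [column[0] for column in cursor.description]
        for row in cursor:
            yield dict(zip(names, row))


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (title TEXT);
CREATE TABLE comments (body TEXT);
INSERT INTO articles VALUES ('A1');
INSERT INTO articles VALUES ('A2');
INSERT INTO comments VALUES ('C1');
""")

items = list(sql_source(conn, [
    "SELECT title FROM articles ORDER BY title",
    "SELECT body FROM comments",
]))
```

In this sketch all articles arrive before any comment does, so a later section cannot assume that an article and its comments travel through the pipeline together.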
94. Plone Conference 2011
Three Facts
3. Pipelines process one item at a time
class SectionOne(object):
    def __iter__(self):
        ...
        for item in self.previous:
            [do some stuff]
            ...
            yield item

class SectionTwo(object):
    def __iter__(self):
        ...
        for item in self.previous:
            [do some stuff]
            ...
            yield item
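A short sketch shows the interleaving: with chained generators, the first item passes through every section before the second item leaves the source. `make_section` is a hypothetical helper that stands in for a real section class.

```python
def make_section(name, previous, log):
    """A generator that notes each item it sees and passes it along."""
    for item in previous:
        log.append("%s saw %s" % (name, item))
        yield item


log = []
one = make_section("one", iter([1, 2]), log)
two = make_section("two", one, log)

# Draining the last section pulls items through the whole chain.
processed = list(two)
```

The log ends up alternating between the two sections, which is why a section cannot look at "all the items" while it is still inside its loop.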
127. Plone Conference 2011
Set up ID Map
class PostCreation(object):
    classProvides(ISectionBlueprint)
    implements(ISection)

    def __init__(self, transmogrifier, name, options, previous):
        self.transmogrifier = transmogrifier
        ...
        if DOC_MAPS_KEY in annotations.keys():
            self.doc_maps = annotations[DOC_MAPS_KEY]
        else:
            annotations[DOC_MAPS_KEY] = self.doc_maps = {'resourcePrimKey': {},
                                                         'urlTitle': {},
                                                         'articleId': {},
                                                         'uuid_': {},
                                                         '_path': {}}
128. Plone Conference 2011
Set up ID Map
class PostCreation(object):
    classProvides(ISectionBlueprint)
    implements(ISection)

    def __iter__(self):
        site = self.transmogrifier.context
        for item in self.previous:
            path = item.get('_path', None)
            if path:
                # reset so a failed traversal does not reuse the previous
                # item's object
                current = None
                try:
                    current = site.unrestrictedTraverse(path)
                except KeyError:
                    # missing element in path somewhere, skip it?
                    pass
                if current:
                    cuid = current.UID()
                    for key in ['resourcePrimKey', 'urlTitle', 'articleId',
                                'uuid_', '_path']:
                        if key in item:
                            self.doc_maps[key][item[key]] = cuid
            yield item
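Stripped of the Plone traversal, the ID map is just one dictionary per source-system identifier, each pointing at the UID of the newly created object. The sketch below assumes a plain `uid_for_path` dict in place of `unrestrictedTraverse` and `UID()`. `build_doc_maps` and the sample items are hypothetical.

```python
DOC_MAP_KEYS = ["resourcePrimKey", "urlTitle", "articleId", "uuid_", "_path"]


def build_doc_maps(items, uid_for_path):
    """Map every known source identifier of an item to its new UID."""
    doc_maps = dict((key, {}) for key in DOC_MAP_KEYS)
    for item in items:
        uid = uid_for_path.get(item.get("_path"))
        if uid is None:
            # The object was never created, so there is nothing to map.
            continue
        for key in DOC_MAP_KEYS:
            if key in item:
                doc_maps[key][item[key]] = uid
    return doc_maps


uids = {"/news/one": "uid-1"}
items = [
    {"_path": "/news/one", "articleId": "10432", "urlTitle": "one"},
    {"_path": "/news/missing", "articleId": "99999"},
]
doc_maps = build_doc_maps(items, uids)
```

Keeping a separate map per identifier means an old link can be resolved whichever identifier it happens to use.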
129. Plone Conference 2011
Find Image Links
class ImageTagsFinder(object):
    classProvides(ISectionBlueprint)
    implements(ISection)

    def __init__(self, transmogrifier, name, options, previous):
        self.transmogrifier = transmogrifier
        ...
        annotations = IAnnotations(transmogrifier)
        if REWRITABLE_ELEMENTS_KEY not in annotations.keys():
            annotations[REWRITABLE_ELEMENTS_KEY] = {}
        self.rewriteable = annotations[REWRITABLE_ELEMENTS_KEY]
130. Plone Conference 2011
Find Image Links
class ImageTagsFinder(object):
    classProvides(ISectionBlueprint)
    implements(ISection)

    def __iter__(self):
        num_found = 0
        for item in self.previous:
            path = item['_path']
            if tree is None:
                parser = etree.HTMLParser()
                tree = etree.fromstring(item['text'], parser)
            if tree is not None:
                all_images = tree.xpath('//img')
                if len(all_images) > 0:
                    # we have some images, do any need re-writing?
                    internal = []
                    for img in all_images:
                        src = img.attrib.get('src', '')
                        match = img_is_internal(src)
                        if match:
                            internal.append(img)
                    mapped['images'] = internal
                    self.rewriteable[path] = internal
            yield item
131. Plone Conference 2011
Find Other Tags
class LinkFinder(object):
    """ create a mapping of the items which have links to be modified
    """
    classProvides(ISectionBlueprint)
    implements(ISection)

    def __init__(self, transmogrifier, name, options, previous):
        self.transmogrifier = transmogrifier
        annotations = IAnnotations(transmogrifier)
        if REWRITABLE_ELEMENTS_KEY not in annotations.keys():
            annotations[REWRITABLE_ELEMENTS_KEY] = {}
        self.rewriteable = annotations[REWRITABLE_ELEMENTS_KEY]
132. Plone Conference 2011
Find Other Tags
class LinkFinder(object):
    """ create a mapping of the items which have links to be modified
    """
    classProvides(ISectionBlueprint)
    implements(ISection)

    def __iter__(self):
        """ save etrees of any documents with links that need fixing
        """
        for item in self.previous:
            path = item['_path']
            tree = etree.fromstring(item['text'], parser)
            if tree is not None:
                all_anchors = tree.xpath('//a')
                if len(all_anchors) > 0:
                    # we have some anchors, do any need re-writing?
                    internal = []
                    for a in all_anchors:
                        href = a.attrib.get('href', '')
                        if is_internal_link(href):
                            internal.append(href)
                    self.rewriteable[path] = internal
            yield item
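The link-finding step can be approximated without lxml. The sketch below swaps in the standard library `html.parser` for `etree`, so it is not the parser the slides use. The body of `is_internal_link` is not shown in the talk, so the version here (relative URLs, or URLs on a made-up `old.example.com` host) is purely an assumption.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def is_internal_link(href, old_host="old.example.com"):
    # Hypothetical rule: site-relative links and links to the old host
    # are the ones that will need rewriting.
    return href.startswith("/") or old_host in href


def find_internal_links(html):
    collector = AnchorCollector()
    collector.feed(html)
    return [href for href in collector.hrefs if is_internal_link(href)]


body = ('<p><a href="/web/guest/about">About</a> '
        '<a href="http://plone.org">Plone</a> '
        '<a href="http://old.example.com/news/1">News</a></p>')
internal = find_internal_links(body)
```

Only the links that point back into the old site are kept, so external links are never touched by the later rewrite.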
133. Plone Conference 2011
Replace Found Links
class LinkReplacer(object):
    """ re-write links in body texts of all created items

    The work done by this item takes place entirely in the clean-up stage of
    the section.
    """
    classProvides(ISectionBlueprint)
    implements(ISection)

    def __init__(self, transmogrifier, name, options, previous):
        self.transmogrifier = transmogrifier
        annotations = IAnnotations(transmogrifier)
        if REWRITABLE_ELEMENTS_KEY not in annotations.keys():
            annotations[REWRITABLE_ELEMENTS_KEY] = {}
        self.rewriteable = annotations[REWRITABLE_ELEMENTS_KEY]
134. Plone Conference 2011
Replace Found Links
class LinkReplacer(object):
    """ re-write links in body texts of all created items

    The work done by this item takes place entirely in the clean-up stage of
    the section.
    """
    classProvides(ISectionBlueprint)
    implements(ISection)

    def __iter__(self):
        for item in self.previous:
            # no action takes place here
            yield item

        # get the maps we will use
        annotations = IAnnotations(self.transmogrifier)
        doc_maps = annotations[DOC_MAPS_KEY]
        image_maps = annotations[IMAGE_MAPS_KEY]

        # rewrite image and anchor tags
        for path, info in self.rewriteable.items():
            page = self.transmogrifier.context.unrestrictedTraverse(path)
            tree = info.get('tree', None)
            links = info.get('links', [])
            images = info.get('images', [])
            lg, ln, le = rewrite_links(links, page, doc_maps,
                                       self.logger, self.transmogrifier.context)
            ig, _in, ie = rewrite_image_tags(images, page, image_maps,
                                             self.logger)
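Deferring the rewrite to the clean-up stage matters because a page can link to a page that has not been created yet. The toy sketch below illustrates that with two items that link to each other. `create_items`, `replace_links`, the `old_url` key, and the generated UIDs are all hypothetical, and plain string replacement stands in for the tree-based rewriting on the slides.

```python
def create_items(previous, uids, pending):
    """Stand-in for content creation: assign a UID and remember the item."""
    for item in previous:
        uids[item["old_url"]] = "uid-%d" % (len(uids) + 1)
        pending.append(item)
        yield item


def replace_links(previous, uids, pending):
    for item in previous:
        # no action takes place here
        yield item
    # Clean-up stage: every item now has a UID, so forward references resolve.
    for item in pending:
        for old_url, uid in uids.items():
            item["text"] = item["text"].replace(old_url, "resolveuid/" + uid)


uids, pending = {}, []
items = [
    {"old_url": "/old/a", "text": "see /old/b"},
    {"old_url": "/old/b", "text": "see /old/a"},
]
list(replace_links(create_items(iter(items), uids, pending), uids, pending))
```

Had the first item been rewritten as it passed through, the map would not yet have contained an entry for `/old/b`.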
136. Plone Conference 2011
Victory!
Right?
Photo by
Petr & Bara Ruzicka - CC-BY
http://www.flickr.com/photos/pruzicka/207209564/
153.
class LinkReplacer(object):
    """ re-write links in body texts of all created items
    """
    classProvides(ISectionBlueprint)
    implements(ISection)
    ...
    def __iter__(self):
        good, notenough, errors = [], [], []
        for item in self.previous:
            yield item
        for path, info in self.rewriteable.items():
            page = self.transmogrifier.context.unrestrictedTraverse(path)
            tree = info.get('tree', None)
            links = info.get('links', [])
            images = info.get('images', [])
            lg, ln, le = rewrite_links(links, page, doc_maps,
                                       self.logger, self.transmogrifier.context)
            ig, _in, ie = rewrite_image_tags(images, page, image_maps,
                                             self.logger)
            good.extend(lg + ig)
            notenough.extend(ln + _in)
            errors.extend(le + ie)
        with open('goodlinks.csv', 'w') as f:
            goodwriter = csv.writer(f)
            goodwriter.writerows(good)
        with open('badlinks.csv', 'w') as f:
            badwriter = csv.writer(f)
            badwriter.writerows(notenough)
        ...
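The reporting step is small enough to try in isolation. This sketch writes rows shaped like the `(status, page URL, original URL)` tuples used in the talk to an in-memory buffer rather than to `goodlinks.csv`. The `write_report` helper, the header row, and the sample data are additions for illustration only.

```python
import csv
import io


def write_report(rows, stream):
    """Write one CSV line per rewritten (or unresolvable) link."""
    writer = csv.writer(stream)
    # The header row is an addition for readability; the slides do not
    # write one.
    writer.writerow(["status", "page", "url"])
    writer.writerows(rows)


good = [
    ("image match", "http://new.example.com/news/one",
     "/image/image_gallery?img_id=10"),
]

buffer = io.StringIO()
write_report(good, buffer)
report_lines = buffer.getvalue().splitlines()
```

A file of unresolved links in this form can be handed to whoever knows the source system well enough to say where each link should point.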
154.
def rewrite_image_tags(images, page, img_maps, logger):
    good = []
    notenough = []
    errors = []
    for image in images:
        url = image.attrib.get('src', '')
        # get information about the image to be subbed, either by id or uuid
        img_id = extract_img_id_from_url(url)
        if img_id is None:
            img_id = extract_img_uuid_from_url(url)
        if img_id is None:
            # we could not find a mapped image matching, not enough to go on
            logger.warn('unable to find image id in url: %s' % url)
            notenough.append(('missing img, bad url', page.absolute_url(), url))
            continue
        # resolve the id we found into a plone object UID via image maps
        img_info = find_image_from_id(img_id, img_maps)
        if img_info is None:
            logger.warn('unable to find mapped plone image id %s' % img_id)
            notenough.append(('missing img, no map', page.absolute_url(), url))
            continue
        # by default, use the 300x300 px medium size for in-page images
        # To change this, adjust the value of STANDARD_IMG_SCALE
        newurl = "resolveuid/%s%s" % (img_info['uid'],
                                      STANDARD_IMG_SCALE)
        image.attrib['src'] = newurl
        good.append(('image match', page.absolute_url(), url))
    return good, notenough, errors
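The decision logic for a single image URL can be sketched on its own. The talk does not show `extract_img_id_from_url` or the value of `STANDARD_IMG_SCALE`, so the `img_id=` query-parameter pattern, the scale suffix, and the flat `img_maps` dict below are all assumptions, and `rewrite_src` is a hypothetical simplification of the function above.

```python
import re

# Hypothetical values: neither is given in the talk.
STANDARD_IMG_SCALE = "/image_medium"
IMG_ID_PATTERN = re.compile(r"[?&]img_id=(\d+)")


def rewrite_src(url, img_maps):
    """Return (new_url, status) for one image URL; new_url is None on failure."""
    match = IMG_ID_PATTERN.search(url)
    if match is None:
        # Nothing in the URL identifies the image: not enough to go on.
        return None, "missing img, bad url"
    uid = img_maps.get(match.group(1))
    if uid is None:
        # An id was found, but no migrated image is mapped to it.
        return None, "missing img, no map"
    return "resolveuid/%s%s" % (uid, STANDARD_IMG_SCALE), "image match"


img_maps = {"42": "abc123"}
ok = rewrite_src("/image/image_gallery?img_id=42", img_maps)
unmapped = rewrite_src("/image/image_gallery?img_id=7", img_maps)
bad = rewrite_src("/static/logo.png", img_maps)
```

Returning a status alongside the URL keeps the two failure cases distinguishable, which is what makes the later CSV report useful.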
162. Plone Conference 2011
Clients, when asked to describe their existing system,
will never describe it with enough accuracy to properly
plan for a migration
165. Plone Conference 2011
Learn as much as you can about the source system
when planning a migration
but know that you will always need to know more
and that you will not find it out until you actually start
the migration