Data visualization with Python and SVG
1. Data visualization with Python and SVG
Plotting an RNA secondary structure
Sukjun Kim
The Baek Research Group of Computational Biology
Seoul National University
April 11th, 2015
Special Lecture at Biospin Group
2. Plotting libraries for data visualization
• They have their own language for plotting.
• They should be installed prior to use.
• They depend on other libraries.
• They are appropriate for high-level graphics.
• We cannot customize a plot at a low level.
Examples: R, matplotlib, d3.js, gnuplot, Origin, PgfPlots, PLplot, Pyxplot, Grace
3. SVG (Scalable Vector Graphics)
• XML-based vector image format for two-dimensional graphics.
• The SVG specification is an open standard developed by the
World Wide Web Consortium (W3C) since 1999.
• As XML files, SVG images can be created and edited with any
text editor.
• All major modern web browsers – including Mozilla Firefox,
Internet Explorer, Google Chrome, Opera, and Safari – have
at least some degree of SVG rendering support.
(Wikipedia – Scalable Vector Graphics)
Data visualization by writing an SVG document
• The SVG markup language is an open standard and easy to learn.
• Not only Python but any programming language can be used.
• It requires no additional libraries.
• We can customize graphic elements at a low level.
4. Structure of an SVG document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg"
version="1.1" width="100" height="100">
<circle cx="50" cy="50" r="40" stroke="green"
stroke-width="4" fill="yellow"/>
</svg>
From top to bottom, the document above consists of: the XML tag, the declaration of DOCTYPE, the start of the SVG tag, the contents of the SVG document, and the end of the SVG tag.
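As a minimal sketch, the document above can be held in a Python string and checked for well-formedness with the standard library's XML parser. The variable names `doc`, `root`, and `circle` are assumptions made for this illustration and do not appear in the slides.

```python
import xml.etree.ElementTree as ET

# The same SVG document as on the slide, held in a Python string.
doc = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"\n'
    ' "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">\n'
    '<svg xmlns="http://www.w3.org/2000/svg"\n'
    ' version="1.1" width="100" height="100">\n'
    '<circle cx="50" cy="50" r="40" stroke="green"\n'
    ' stroke-width="4" fill="yellow"/>\n'
    '</svg>\n'
)

# fromstring raises ParseError if the markup is not well-formed XML.
root = ET.fromstring(doc.encode('utf-8'))
circle = root[0]  # first child of <svg>

# Tags are reported with their XML namespace in braces.
print(root.tag)
print(circle.get('r'))
```

A parse like this only confirms that the file is well-formed XML. It does not check that the elements and attributes are valid SVG.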
SVG elements
• SVG has some predefined shape elements.
• rectangle <rect>, circle <circle>, ellipse <ellipse>, line <line>,
polyline <polyline>, polygon <polygon>, path <path>, ...
• group <g>, hyperlink <a>, text <text>, ...
(Figure: the rendered circle, with radius 40 and center (50,50).)
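Because every shape element is just a tag name plus attributes, a single small helper can produce any of them. The sketch below is one possible approach; the function name `element` and the underscore-to-hyphen convention are assumptions for illustration, not something the slides define.

```python
from xml.sax.saxutils import quoteattr

def element(tag, **attrs):
    """Return a self-closing SVG element such as <circle .../>.

    Underscores in keyword names become hyphens, so stroke_width=4
    is written as stroke-width="4".
    """
    parts = [tag]
    for name, value in attrs.items():
        # quoteattr adds the surrounding quotes and escapes special characters.
        parts.append('%s=%s' % (name.replace('_', '-'), quoteattr(str(value))))
    return '<%s/>' % ' '.join(parts)

print(element('circle', cx=50, cy=50, r=40, stroke='green',
              stroke_width=4, fill='yellow'))
print(element('rect', x=10, y=10, width=80, height=30))
```

The helper covers only self-closing elements. Elements with content, such as <text> or <g>, would need an opening and a closing tag.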
5. RNA secondary structural data
## microRNA structural data
seq = 'CCACCACUUAAACGUGGAUGUACUUGCUUUGAAACUAAAGAAGUAAGUGCUUCCAUGUUUUGGUGAUGG'
dotbr = '(((.((((.(((((((((.(((((((((((.........))))))))))).))))))))).)))).)))'
pairs = [(0,68), (1,67), (2,66), (4,64), (5,63), (6,62), ... , (29,39)]
coor = [(69.515,526.033), (69.515,511.033), ... , (84.515,526.033)]
How to generate RNA structural data?
seq → RNAfold → dotbr, pairs → RNAplot → coor
(Vienna RNA package, http://www.tbi.univie.ac.at/RNA/)
• seq: RNA sequence.
• dotbr: dot-bracket notation which is used
to define RNA secondary structure.
• pairs: base-pairing information.
• coor: x and y coordinates for nucleotides.
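The relationship between dotbr and pairs can be illustrated with a short stack-based parser. This is a sketch under stated assumptions: it is not necessarily how the pairs on the slide were produced, the function name `pairs_from_dotbr` is hypothetical, and it assumes the string contains only '(', ')' and '.' with properly nested brackets (no pseudoknots).

```python
def pairs_from_dotbr(dotbr):
    """Return sorted (i, j) index pairs for a dot-bracket string."""
    stack = []   # indexes of '(' that are still waiting for a partner
    pairs = []
    for j, symbol in enumerate(dotbr):
        if symbol == '(':
            stack.append(j)
        elif symbol == ')':
            if not stack:
                raise ValueError('unmatched ")" at index %d' % j)
            # The most recent open bracket pairs with this close bracket.
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError('unmatched "(" at index %d' % stack[-1])
    return sorted(pairs)

print(pairs_from_dotbr('((..))'))  # → [(0, 5), (1, 4)]

# The microRNA structure from the slide.
dotbr = '(((.((((.(((((((((.(((((((((((.........))))))))))).))))))))).)))).)))'
pairs = pairs_from_dotbr(dotbr)
print(pairs[:3])
```

Indexes are 0-based, matching the pairs list on the slide.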
(Figure: the final image to plot.)
6. Writing an SVG tag in a Python script
An SVG document requires, at a minimum, the opening and closing <svg> tags.
rna.py:
out = []
out.append('<svg xmlns="http://www.w3.org/2000/svg" version="1.1">\n')
## svg elements here
out.append('</svg>\n')
## write the collected lines to the output file
with open('rna.svg', 'w') as f:
    f.write(''.join(out))
rna.svg:
<svg xmlns="http://www.w3.org/2000/svg" version="1.1">
</svg>
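The opening tag in this script does not set width, height, or viewBox, so the visible area is left to the viewer. One option, sketched below, is to compute them from the coordinates to be plotted. The function name `svg_open_tag` and the `margin` parameter are assumptions for illustration and are not part of the original script.

```python
def svg_open_tag(coor, margin=10):
    """Return an opening <svg> tag whose viewBox encloses all points."""
    xs = [x for x, y in coor]
    ys = [y for x, y in coor]
    # Top-left corner of the visible area, pushed out by the margin.
    min_x, min_y = min(xs) - margin, min(ys) - margin
    width = max(xs) - min(xs) + 2 * margin
    height = max(ys) - min(ys) + 2 * margin
    return ('<svg xmlns="http://www.w3.org/2000/svg" version="1.1" '
            'width="%.3f" height="%.3f" viewBox="%.3f %.3f %.3f %.3f">\n'
            % (width, height, min_x, min_y, width, height))

# Two corner points are enough to show the effect.
print(svg_open_tag([(0, 0), (10, 20)], margin=5))
```

The margin leaves room for shapes drawn around the outermost points, such as circles centered on them.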
8. Drawing the phosphate backbone
In DNA and RNA, the phosphate backbone is regarded as the skeleton of the molecule. The skeleton will be represented by an SVG <polyline> tag.
We have the x and y coordinates of each nucleotide, as below.
coor = [(69.515,526.033), (69.515,511.033), ... , (84.515,526.033)]
Using this coordinate information, we can specify the points attribute of the polyline tag.
points = ' '.join(['%.3f,%.3f'%(x, y) for x, y in coor])
out.append('<polyline points="%s" style="fill:none; stroke:black; stroke-width:1;"/>\n'%(points))
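The same step can be wrapped in a function so that it can be tried on a few points in isolation. This is a minimal sketch; the function name `backbone_polyline` is an assumption for illustration.

```python
def backbone_polyline(coor):
    """Return a <polyline> element that connects the points in order."""
    # The points attribute is a space-separated list of "x,y" pairs.
    points = ' '.join('%.3f,%.3f' % (x, y) for x, y in coor)
    return ('<polyline points="%s" style="fill:none; '
            'stroke:black; stroke-width:1;"/>\n' % points)

# First three coordinates of the full coor list given in the complete script.
print(backbone_polyline([(69.515, 526.033), (69.515, 511.033), (69.515, 496.033)]))
```

Setting fill:none matters here: a polyline is filled by default, which would paint the area enclosed by the backbone.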
10. Drawing base-pairing
Watson-Crick base pairs occur between A and U, and between C and G. We will use the <line> tag to represent the hydrogen bonds.
In addition to the coordinate information, we also have base-pairing information in the form of tuples carrying the indexes of two nucleotides.
pairs = [(0,68), (1,67), (2,66), (4,64), (5,63), (6,62), ... , (29,39)]
coor = [(69.515,526.033), (69.515,511.033), ... , (84.515,526.033)]
From these two types of data, each base pair can be visualized as a simple line.
for i, j in pairs:
    x1, y1 = coor[i]
    x2, y2 = coor[j]
    out.append('<line x1="%.3f" y1="%.3f" x2="%.3f" y2="%.3f" style="stroke:black; stroke-width:1;"/>\n'%(x1, y1, x2, y2))
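Because each line is written by hand, its style can depend on the data. The sketch below draws a pair with a dashed line when its two bases are not a Watson-Crick pair. Predicted structures can also contain other pairs, such as G-U wobble pairs. The names `WATSON_CRICK` and `pair_line`, the dashed style, and the toy data are all assumptions for illustration and do not come from the slides.

```python
WATSON_CRICK = {('A', 'U'), ('U', 'A'), ('C', 'G'), ('G', 'C')}

def pair_line(seq, coor, i, j):
    """Return a <line> for pair (i, j); non-Watson-Crick pairs are dashed."""
    x1, y1 = coor[i]
    x2, y2 = coor[j]
    style = 'stroke:black; stroke-width:1;'
    if (seq[i], seq[j]) not in WATSON_CRICK:
        # stroke-dasharray draws alternating dashes and gaps of 2 units.
        style += ' stroke-dasharray:2,2;'
    return ('<line x1="%.3f" y1="%.3f" x2="%.3f" y2="%.3f" style="%s"/>\n'
            % (x1, y1, x2, y2, style))

# Toy data: four nucleotides, index 0 paired with 3 and index 1 with 2.
toy_seq = 'CUGG'
toy_coor = [(0.0, 30.0), (0.0, 15.0), (15.0, 15.0), (15.0, 30.0)]
for i, j in [(0, 3), (1, 2)]:
    print(pair_line(toy_seq, toy_coor, i, j), end='')
```

In the toy data, the C-G pair gets a solid line and the U-G pair a dashed one.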
12. SVG Text
<text x="0" y="15" font-size="15"
style="fill:blue">I love SVG!</text>
(Figure: "I love SVG!" rendered in blue (fill:blue) with font-size="15"; the point (0,15) marks the start of the text baseline.)
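When the text comes from data rather than from a fixed string, characters such as & and < must be escaped, or the file is no longer well-formed XML. The following is a minimal sketch using the standard library; the function name `svg_text` and its default values are assumptions for illustration.

```python
from xml.sax.saxutils import escape

def svg_text(x, y, content, font_size=15, fill='blue'):
    """Return a <text> element with XML special characters escaped."""
    # x and y give the start of the text baseline, not its top-left corner.
    return ('<text x="%s" y="%s" font-size="%s" style="fill:%s">%s</text>'
            % (x, y, font_size, fill, escape(content)))

print(svg_text(0, 15, 'I love SVG!'))
print(svg_text(0, 35, 'A & U, C & G'))
```

The second call writes each & as `&amp;`, which a viewer displays as a plain ampersand.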
13. Drawing nucleotides
(Figure: the letter A enclosed in a circle; the letter is a <text> element and the ring is a <circle> element.)
Each nucleotide will be represented by a one-character text label enclosed in a circle.
The RNA sequence and the coordinate information are required.
seq = 'CCACCACUUAAACGUGGAUGUACUUGCUUUGAAACUAAAGAAGUAAGUGCUUCCAUGUUUUGGUGAUGG'
coor = [(69.515,526.033), (69.515,511.033), ... , (84.515,526.033)]
The <text> tag should be written after the <circle> tag.
for i, base in enumerate(seq):
    x, y = coor[i]
    out.append('<circle cx="%.3f" cy="%.3f" r="%.3f" style="fill:white; stroke:black; stroke-width:1"/>\n'%(x, y, 5))
    out.append('<text x="%.3f" y="%.3f" font-size="6" text-anchor="middle" style="fill:black">%s</text>\n'%(x, y+6*0.35, base))
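The ordering rule follows from how SVG paints: elements are drawn in document order, so a white-filled circle written after the letter would cover it. The sketch below packages one nucleotide as a function. The name `nucleotide` and its keyword parameters are assumptions for illustration. The factor 0.35 is the one used on the slide, read here as a baseline shift that centers the letter vertically.

```python
def nucleotide(x, y, base, radius=5, font_size=6):
    """Return a <circle> followed by a centered one-letter <text>."""
    circle = ('<circle cx="%.3f" cy="%.3f" r="%.3f" '
              'style="fill:white; stroke:black; stroke-width:1"/>\n'
              % (x, y, radius))
    # Shift the baseline down by about a third of the font size so the
    # letter sits visually centered in the circle.
    label = ('<text x="%.3f" y="%.3f" font-size="%d" text-anchor="middle" '
             'style="fill:black">%s</text>\n'
             % (x, y + font_size * 0.35, font_size, base))
    # Circle first, text second: later elements are painted on top.
    return circle + label

print(nucleotide(10, 20, 'A'), end='')
```

The attribute text-anchor="middle" handles the horizontal centering, so only the vertical position needs a manual offset.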
14. Content of the Python script
## microRNA structural data
seq = 'CCACCACUUAAACGUGGAUGUACUUGCUUUGAAACUAAAGAAGUAAGUGCUUCCAUGUUUUGGUGAUGG'
dotbr = '(((.((((.(((((((((.(((((((((((.........))))))))))).))))))))).)))).)))'
pairs = [(0, 68), (1, 67), (2, 66), (4, 64), (5, 63), (6, 62), (7, 61), (9, 59), (10, 58),
    (11, 57), (12, 56), (13, 55), (14, 54), (15, 53), (16, 52), (17, 51), (19, 49), (20, 48),
    (21, 47), (22, 46), (23, 45), (24, 44), (25, 43), (26, 42), (27, 41), (28, 40), (29, 39)]
coor = [
    (69.515,526.033),(69.515,511.033),(69.515,496.033),(61.778,483.306),
    (69.515,469.506),(69.515,454.506),(69.515,439.506),(69.515,424.506),
    (62.691,412.302),(69.515,400.099),(69.515,385.099),(69.515,370.099),
    (69.515,355.099),(69.515,340.099),(69.515,325.099),(69.515,310.099),
    (69.515,295.099),(69.515,280.099),(61.778,266.298),(69.515,253.571),
    (69.515,238.571),(69.515,223.571),(69.515,208.571),(69.515,193.571),
    (69.515,178.571),(69.515,163.571),(69.515,148.571),(69.515,133.571),
    (69.515,118.571),(69.515,103.571),(56.481,95.317),(50.000,81.317),
    (52.139,66.039),(62.216,54.357),(77.015,50.000),(91.814,54.357),
    (101.891,66.039),(104.030,81.317),(97.549,95.317),(84.515,103.571),
    (84.515,118.571),(84.515,133.571),(84.515,148.571),(84.515,163.571),
    (84.515,178.571),(84.515,193.571),(84.515,208.571),(84.515,223.571),
    (84.515,238.571),(84.515,253.571),(92.252,266.298),(84.515,280.099),
    (84.515,295.099),(84.515,310.099),(84.515,325.099),(84.515,340.099),
    (84.515,355.099),(84.515,370.099),(84.515,385.099),(84.515,400.099),
    (91.339,412.302),(84.515,424.506),(84.515,439.506),(84.515,454.506),
    (84.515,469.506),(92.252,483.306),(84.515,496.033),(84.515,511.033),
    (84.515,526.033)]
out = []
out.append('<svg xmlns="http://www.w3.org/2000/svg" version="1.1">\n')
## [1] phosphate backbone - <polyline> tag
points = ' '.join(['%.3f,%.3f'%(x, y) for x, y in coor])
out.append('<polyline points="%s" style="fill:none; stroke:black; stroke-width:1;"/>\n'%(points))
## [2] base-pairing - <line> tag
for i, j in pairs:
    x1, y1 = coor[i]
    x2, y2 = coor[j]
    out.append('<line x1="%.3f" y1="%.3f" x2="%.3f" y2="%.3f" style="stroke:black; stroke-width:1;"/>\n'%(x1, y1, x2, y2))
## [3] nucleotide - <circle> and <text> tags
for i, base in enumerate(seq):
    x, y = coor[i]
    out.append('<circle cx="%.3f" cy="%.3f" r="%.3f" style="fill:white; stroke:black; stroke-width:1"/>\n'%(x, y, 5))
    out.append('<text x="%.3f" y="%.3f" font-size="6" text-anchor="middle" style="fill:black">%s</text>\n'%(x, y+6*0.35, base))
out.append('</svg>\n')
## write the collected lines to the output file
with open('rna.svg', 'w') as f:
    f.write(''.join(out))
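A quick way to sanity-check the generated file is to parse it and count the elements it contains. The sketch below runs on a small stand-in document so that it is self-contained. The names `SVG_NS`, `count_elements`, and `sample` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = '{http://www.w3.org/2000/svg}'

def count_elements(svg_markup):
    """Parse SVG markup and count the direct children of <svg> by tag name."""
    root = ET.fromstring(svg_markup)
    counts = {}
    for child in root:
        # Strip the namespace prefix that ElementTree adds to every tag.
        name = child.tag.replace(SVG_NS, '')
        counts[name] = counts.get(name, 0) + 1
    return counts

# A tiny stand-in document; for the real output, pass open('rna.svg').read().
sample = ('<svg xmlns="http://www.w3.org/2000/svg" version="1.1">\n'
          '<polyline points="0,0 10,0" style="fill:none; stroke:black;"/>\n'
          '<line x1="0" y1="0" x2="10" y2="0" style="stroke:black;"/>\n'
          '<circle cx="0" cy="0" r="5"/>\n'
          '<text x="0" y="2">A</text>\n'
          '</svg>\n')
print(count_elements(sample))  # → {'polyline': 1, 'line': 1, 'circle': 1, 'text': 1}
```

For the script above, the data imply 1 polyline, 27 lines (one per base pair), and 69 circles and 69 text elements (one of each per nucleotide).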
15. How to use other SVG tags? Go to w3schools.com!
20. Conclusions
• There are many graphic tools and libraries for data visualization.
• These tools are limited to high-level graphics.
• Writing SVG documents requires no additional libraries and no significant time spent learning a tool-specific language.
• If you want to plot a noncanonical type of graph and customize it at a low level, writing an SVG document with Python will be the best solution for your purpose.
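Hello. I am Sukjun Kim, a Ph.D. student in the bioinformatics laboratory at Seoul National University.
Today I will talk about data visualization using Python and SVG.
To make this practical, I have built the talk around a biological example:
drawing the secondary structure of an RNA.
You may think the example is too biological, but once you understand it,
I believe it will help you not only with biological topics but with any data visualization you have in mind.
To visualize data, we usually use visualization software or libraries.
Listed here are many of the software packages and libraries available for data visualization.
To use these tools, however, we have to learn each tool's own language or its complicated usage.
They also have to be installed on the computer, and dependency problems sometimes turn up during installation.
Finally, they handle only high-level graphics and are limited when it comes to low-level graphics.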