4. ELN architecture
• Hopefully
• I am not going to self-destruct
• Your project won’t be as exciting
• Your task is to
• Deliver a state-of-the-art ELN system
• In tight timescales
• With limited budget
• In the real world
• That the users like
• And will serve you for many years
http://www.amphora-research.com/
5. Introduction
• About me
• Started working with ELNs in ‘96
• President & Co-founder of Amphora
• IT background
• First ELN was enterprise-scale ELN for Kodak
• Worldwide, 1,000’s of users, diverse user base
• Completely Electronic Records (no paper)
• After a long & winding road
• New products, lots more deployments, many industries
• Certain amount of realism about ELN implementation
• Provide Patent Evidence Creation & Preservation
Systems
• Work with a wide variety of “ELN” systems etc.
• Now based in the US & UK
6. This presentation
• You can download a copy of this presentation from
our web site
http://www.amphora-research.com/
7. Why does architecture matter?
• A good architecture can help
• Integrate “Best of breed” tools with existing investments
• Allow you to split the project into manageable pieces
• Ensure you don’t get “captured” by the vendor
• Help your system withstand the ravages of time
• Keep your TCO down
• A bad architecture will hurt
• Reliability, Scalability problems
• Reduce your options going forward
• Force you into a “Big bang” project
• Some random thoughts on architecture
8. ELN architecture
• Major issues
• Diversity & Flexibility
• Project size/Justification/ROI
• Creating & Preserving Evidence for Patents
• Need for long term access to ELN contents
• Scalability
• Web-based systems
• How your network can help you
• Trends
• Integration methods
• Open Source
• In the lab
• Ones to watch
9. Diversity & Flexibility
• “Science” covers a wide variety of activity
• Each of these is served by its own industry
• Improvements in each area need to happen at their
own pace
• Things change
• Different techniques
• New data types
• Another R&D centre
• New devices for use in the lab
• The very essence of “Research” is to change the
way you work
• How do we design an ELN which can
accommodate these changes?
10. Dealing with change
• Build on other projects & integrate
• If it can be done within another project, then do so
• Keeps your life simpler and more focused, clear aims
• Those other projects can proceed according to the
rhythm and needs of the specific area
• Where possible employ loose coupling between
systems
• Message passing reduces implementation complexity
• SOAP/OLE/XML etc.
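As an illustration of the loose coupling point above, here is a minimal sketch of message passing between two systems. It assumes a hypothetical XML layout (`result`, `sample`, `technique`, `value`) and hypothetical function names; none of these come from any particular ELN product. The point is only that the consumer depends on the message format and not on the producer's code.

```python
import xml.etree.ElementTree as ET


def build_result_message(sample_id: str, technique: str, value: float) -> str:
    """Producer side: serialise a result as a small XML message."""
    root = ET.Element("result")
    ET.SubElement(root, "sample").text = sample_id
    ET.SubElement(root, "technique").text = technique
    ET.SubElement(root, "value").text = repr(value)
    return ET.tostring(root, encoding="unicode")


def read_result_message(message: str) -> dict:
    """Consumer side: relies only on the message layout, not on the producer."""
    root = ET.fromstring(message)
    return {
        "sample": root.findtext("sample"),
        "technique": root.findtext("technique"),
        "value": float(root.findtext("value")),
    }


# Hypothetical example values, for illustration only.
message = build_result_message("S-001", "HPLC", 12.5)
record = read_result_message(message)
```

Either side can be replaced or upgraded on its own schedule as long as the message layout stays stable, which is what lets the other projects proceed at their own rhythm.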
11. Loosely-Coupled
Systems Keep You
Sane
12. Project size/Justification/ROI
• Two approaches
• Either attempt to justify the whole ELN in one go
(“Big bang”)
• Or Phased
• Divide the project into phases
• Each involves a smaller investment (risk)
• With a corresponding payoff
• Move forward at a pace that’s comfortable for the
business
13. Phased ELNs
• Historically this was very difficult to do with ELNs
• Record keeping
• Integration with other systems
• Needs to be designed into the project (& product)
from the start
• Patent evidence creation/preservation system
• Generic science-neutral platform (can often be your
existing IT infrastructure)
• Integrate/collaborate with discipline-specific software
• When you can do it, makes a huge difference
• Can start at a departmental level if needed
• Asking the business to take a small risk each time
14. Creating & Preserving Evidence for Patents
• Specialized area with very specific (and unique)
considerations
• Best done separately from science-specific ELN
tools
• Hard to reconcile requirements of science and records
in one system
• You’ll often have a number of science-focused systems,
yet want only one Patent evidence system
• Run by a small group of people who know they’ll end
up in court
• Reduce risks & discovery costs
• You can have an “Electronic” notebook for the
scientist and still create a paper record
15. Paper or Electronic?
• The choice often comes down to
• Comfort
• Practicality
• Cost
[Chart: system cost of Paper vs. Electronic, plotted against an axis marked 10, 100, 500, 1000]
16. Long term access to ELN content
• Partly this is a records management issue
• But there’s a heavy technical component
• What format you store your data in
• How you store your data
• Metadata
• You need to make Open Data formats part of your
purchasing requirements
17. “Good” (open) file formats
• Publicly documented
• Legally unencumbered
• No patents, copyright concerns etc.
• Any patents or copyright must be in the public domain
• Ideally, self documenting (XML is a good start)
• Degrade gracefully
• If you can’t read the data, at least you can see a picture
• Based on more open, primitive formats where
possible
• At least two implementations of readers, one of
which is Open Source
• Widely used (W3C or IETF standards are good
signs)
18. Data formats for the long term
• Good
• For text: Plain ASCII, Unicode, HTML, possibly RTF
• For graphics: PNG, SVG
• For structured data: XML
• To preserve appearance: PDF
• Worry about
• Storing files in databases
• The database file format is probably undocumented
• Store objects on the file system and use the
database to point to them
• Anything that is proprietary - there’s no excuse for it,
and it dramatically increases your risk
• Binary files generally
• Mixing content in files (e.g. embedding XML in PDF)
• Proprietary digital signatures
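The advice above to store objects on the file system and use the database to point to them can be sketched roughly as follows. This is only an illustration: `sqlite3` stands in for whatever database is actually used, and the `objects` table, its columns, and the helper names are all hypothetical. The checksum column is an added assumption, included so that a later reader can verify the file is unchanged.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) pointer table if it does not exist yet."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        "name TEXT PRIMARY KEY, path TEXT NOT NULL, sha256 TEXT NOT NULL)"
    )
    return db


def store_object(db: sqlite3.Connection, store_dir: Path, name: str, data: bytes) -> Path:
    """Write the content to the file system and record a pointer in the database."""
    digest = hashlib.sha256(data).hexdigest()
    store_dir.mkdir(parents=True, exist_ok=True)
    path = store_dir / f"{digest[:16]}_{name}"
    path.write_bytes(data)
    db.execute(
        "INSERT INTO objects (name, path, sha256) VALUES (?, ?, ?)",
        (name, str(path), digest),
    )
    db.commit()
    return path


def load_object(db: sqlite3.Connection, name: str) -> bytes:
    """Follow the database pointer back to the file and verify its checksum."""
    row = db.execute(
        "SELECT path, sha256 FROM objects WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    path, digest = row
    data = Path(path).read_bytes()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError(f"checksum mismatch for {name}")
    return data


# Round trip in a temporary directory, for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    db = open_index()
    saved = store_object(db, Path(tmp) / "objects", "notes.txt", b"example bytes")
    restored = load_object(db, "notes.txt")
    db.close()
```

With this layout the content stays readable with ordinary file tools even if the database engine or its file format is no longer available; only the index would need rebuilding.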
19. IP concerns & data formats
• Companies have always used Proprietary Data
Formats as a competitive weapon
• Companies are waking up to the use of IP tools
(licenses, patents, copyrights) to reinforce their
control over data formats
• Just because a format is published doesn’t mean it
is open
• The Microsoft Office XML formats are a particularly
bad example
• Right now it looks positively radioactive
• They’re being very careful about what they say, which
suggests to me they’re planning something
• http://www.groklaw.net/article.php?story=20050330133833843
• (see section: 4. Dissecting Microsoft’s “Patent License”)
20. Standards
• There are so many to choose from!
• Two key ways of generating “Standards”
• De Facto - dominant supplier/format
• De Jure - committee based
• Who gets to “bless” a standard?
• What makes a “good standard”
• De Jure process has difficulty keeping up with the real
world
• De Facto process has risk of lock-in
• Pragmatic approach
• Expect your suppliers to use open file formats
• If there is an acceptable standard, use it
• Make sure you are using the right kind of format for
each purpose
21. Records considerations
• Not all the “Stuff” that’s generated during the
research process is the same
• Some of it needs to be kept for a long time
• Some is only useful for the moment
• Some will benefit anyone
• Some is only really useful for the person who created it
(using specialized tools)
• Some material is suitable for long term
preservation, some isn’t
• You can go crazy getting into this in too much
detail
• But you also need to make sure your tools and
processes do allow you to manage the data/
records you’re creating
22. Scalability
• Geographical space
• In wide area networks, latency becomes the most
noticeable issue
• Over multiple timezones, acceptable “Maintenance
Windows” disappear
• More data
• Number of data items
• Size of individual data items
• Number of users
• Larger populations generally mean more disparate
requirements
• How many people will get upset if the system goes
down
23. Latency
• The science-specific “Deep” systems
• Often highly interactive
• Lots of round trips to the server for data etc.
• This is what makes them cool
• You can’t beat the speed of light (and network
hardware adds significant latency)
• Therefore need to have a server close to the end user
• Federation will give you a single overview
• “Broad” systems have different usage
characteristics
• Very much like a normal web site, latency is much less
of a problem
• Very easy to have one system for worldwide use, even
for large companies
• Building large systems is quite easy
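A rough worked example of why round trips matter for the interactive “Deep” systems: if each user action needs a number of sequential round trips to the server, the time spent waiting on the network is roughly the number of round trips multiplied by the round-trip time. The figures below (50 round trips, 1 ms on a local network, 150 ms between continents) are illustrative assumptions, not measurements.

```python
def network_wait_seconds(round_trips: int, rtt_ms: float) -> float:
    """Time spent purely waiting on the network for sequential round trips."""
    return round_trips * rtt_ms / 1000.0


# Assumed figures, for illustration only.
local = network_wait_seconds(round_trips=50, rtt_ms=1.0)      # server in the same building
remote = network_wait_seconds(round_trips=50, rtt_ms=150.0)   # server on another continent
```

Under these assumptions the same action costs about 0.05 seconds of waiting locally and about 7.5 seconds remotely, which is why the chatty systems want a server close to the end user while a “Broad” system with few round trips per page does not.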
24. Web-based systems
• “Web based” has become a bit of a marketing tool
• Generally thin clients offer a lower TCO
• And hence IT like them
• In practice, most science-supporting ELN front
ends will be delivered as a “thick” client
• There’s a reason it’s called a browser
• Wrapping an OLE object in IE is still “thick”
• However, “Ajax” systems like GMail and Google
Maps show just what you can do with a web-based
system
• Web based systems should expose a sensible URL
interface
25. How your network can help you
• There’s a whole load of useful network services
and Interfaces that large companies have
• Useful ones
• Single Sign On
• LDAP
• Printer/Fileserver etc.
• Security/Status monitoring etc.
• Beware of Central Digital Signature Infrastructure
• Mixing vulnerabilities - leaves you open to accidents
• Often not designed for long term use
26. ELN architecture
• Major issues
• Diversity & Flexibility
• Project size/Justification/ROI
• Creating & Preserving Evidence for Patents
• Need for long term access to ELN contents
• Scale
• Web-based systems
• Trends
• Integration methods
• Open Source
• In the lab
• Ones to watch
27. Integration methods
• RPC-like mechanisms
• Service Oriented Architecture
• SOAP
• REST
• Text file passing (files, email, etc.)
• URL launching
• Often overlooked, but very powerful
• What’s important
• Loose-coupling
• Open, lightweight systems
• Consistent, stable keys
• Stable URL (& domain) space
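To make the URL launching and stable key points above concrete, here is a minimal sketch of a URL scheme built from record keys. The host `eln.example.com`, the `/notebooks/.../pages/...` layout, and the function names are hypothetical; a real system would publish its own scheme.

```python
from urllib.parse import quote, unquote, urlsplit

BASE_URL = "http://eln.example.com"  # hypothetical host, assumed stable over time


def record_url(notebook_id: str, page: int) -> str:
    """Build a stable, human-readable URL from the record's keys."""
    return f"{BASE_URL}/notebooks/{quote(notebook_id, safe='')}/pages/{page}"


def parse_record_url(url: str) -> tuple[str, int]:
    """Recover the keys from a URL built by record_url."""
    parts = urlsplit(url).path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "notebooks" or parts[2] != "pages":
        raise ValueError(f"not a record URL: {url}")
    return unquote(parts[1]), int(parts[3])


# Hypothetical notebook key and page number.
url = record_url("NB-2005/017", 42)
keys = parse_record_url(url)
```

Any other system that knows the keys can then link to or launch the record, for example with Python's `webbrowser.open(url)`, without sharing code with the ELN. This only keeps working if the keys and the domain stay stable.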
28. Open Source
• Definitely one to watch
• Not the “Free” lunch you might think, but a
pragmatic business tool
• Examples
• Linux
• Postgres
• JBoss,Tomcat etc.
• Ghostscript
• Open Source is part of everyone’s infrastructure
• Make sure you can run your systems on a variety of
platforms
29. Why?
• Good for records
• Gives you top-to-bottom control
• Good for TCO
• We’re finding the Open Source infrastructure easier to
set up and more reliable than proprietary alternatives
• Enables a better solution
• Transparent systems mean you can do things the
original designers didn't think of
• This is especially important for ELNs
30. Data point
• This is just our experience offering people
alternatives for the server portion
• 2000 - “What's Open Source? What’s Linux?”
• 2001 - No way!
• 2002 - some pilots underway, some acceptance
• 2003 - majority of installations are Open Source
infrastructure
• 2005 - we’re wondering where Windows is
• We’re not abandoning proprietary infrastructure
• But it is clear that Open Source is getting serious
consideration
• Seeing a migration away from proprietary infrastructure
to Open Source
31. In the lab
• ELN use in the lab is a hard problem
• Tablets, Laptops, Palmtops etc. don’t seem to be
working
• What does seem to work
• Small form-factor PCs on the bench
• Remote Desktop & Citrix
32. Ones to watch
• Technology
• XML generally
• Web Services
• Bluetooth and WiFi
• RSS
• OpenOffice
• Jabber (as computer messaging and IM framework)
• Trends
• File format nasties
• DMCA and other copyright legislation