The Blue Obelisk has brought together the computational chemistry community and those who are passionate about Open Chemistry and about realizing the promise of Open Data, Open Standards, and Open Software (ODOSOS), the three pillars the group promotes. We will present work from the past five years that has been inspired by these pillars, along with plans for future work.
The group is actively engaged in multiple open-source projects that rely on and promote open standards and open data, including Avogadro (a powerful 3D molecular editor), OpenQube (a library for quantum chemistry data), ChemData (a tool for large-scale chemical data analysis and visualization), Chemkit (a library for cheminformatics), MoleQueue (an HPC queue manager), and VTK (a library for scientific data visualization). The Open Chemistry project benefits greatly from the activities of the Blue Obelisk and makes use of several prominent open-source projects, including Qt and MongoDB.
Open Chemistry: Realizing Open Data, Open Standards, and Open Source
Marcus D. Hanwell, Kyle Lutz, David Lonie, Chris Harris, and David Cole
Website: http://openchemistry.org/ Email: marcus.hanwell@kitware.com, kyle.lutz@kitware.com
Scientific Computing, Kitware, Inc., 28 Corporate Drive, Clifton Park, NY 12065.
Avogadro

The Avogadro project is a cross-platform, open-source approach to building chemical structures. It uses external simulation packages in addition to integrated analysis and visualization routines. The work presented here illustrates a workflow for quantum mechanical calculations: preparing chemical structures, performing a rough geometry optimization, and then calculating electron density isosurfaces, molecular orbitals, and related properties.

Open Chemistry

The Open Chemistry project is developing a suite of applications and support libraries to improve the workflow in computational chemistry, biology, materials science and related areas. The goal is a set of open, connected components that can tackle both small problems on the desktop and large research projects requiring significant time on the world's top supercomputers.

Chemical Data Explorer

The Chemical Data Explorer is a cross-platform, open-source application that builds on the capabilities of the Visualization Toolkit, Qt and MongoDB. It can connect to a local or remote database, ingest new data from various sources, and make that data semantically rich. It can apply informatics techniques to the data it contains to search for structures with particular properties. Work is ongoing to integrate computational job storage and search more tightly.
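The property search described for the Chemical Data Explorer can be pictured with a small in-memory sketch. It assumes molecule records stored as plain Python dicts and supports only a handful of MongoDB-style comparison operators ($lt, $lte, $gt, $gte, $eq); the records, field names and the `matches`/`find` helpers are hypothetical, and this is not the application's own code.

```python
"""Sketch: range queries over molecule records, MongoDB-style.

Hypothetical illustration: records are plain dicts and the filter uses a
tiny subset of MongoDB's comparison operators. Not the Chemical Data
Explorer's implementation.
"""
import operator

# Map a subset of MongoDB comparison operators onto Python comparisons.
OPS = {
    "$lt": operator.lt,
    "$lte": operator.le,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$eq": operator.eq,
}


def matches(record, query):
    """Return True if every field condition in `query` holds for `record`."""
    for field, condition in query.items():
        if field not in record:
            return False
        value = record[field]
        if isinstance(condition, dict):
            # {"mass": {"$gte": 40, "$lt": 100}}: every operator must pass.
            for op_name, bound in condition.items():
                if not OPS[op_name](value, bound):
                    return False
        elif value != condition:
            # A bare value means an equality test, as in MongoDB filters.
            return False
    return True


def find(records, query):
    """Return the records matching `query`, preserving input order."""
    return [r for r in records if matches(r, query)]


# Hypothetical records; masses are approximate molar masses in g/mol.
molecules = [
    {"name": "water", "formula": "H2O", "mass": 18.02, "heavy_atoms": 1},
    {"name": "ethanol", "formula": "C2H6O", "mass": 46.07, "heavy_atoms": 3},
    {"name": "benzene", "formula": "C6H6", "mass": 78.11, "heavy_atoms": 6},
    {"name": "caffeine", "formula": "C8H10N4O2", "mass": 194.19, "heavy_atoms": 14},
]

hits = find(molecules, {"mass": {"$gte": 40, "$lt": 100}})
print([m["name"] for m in hits])  # ['ethanol', 'benzene']
```

With a real MongoDB collection, the same filter dictionary could be passed to pymongo's `collection.find()`, so the database rather than Python does the matching.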
Figure 5: The workflow for which the Open Chemistry components are being developed (diagram stages: input file, simulation, log file, results, informatics, job submission, and HPC integration on local, cloud, and supercomputer resources).
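The input-file stage of this workflow can be sketched for NWChem, one of the quantum packages Avogadro prepares jobs for. The `nwchem_input` function below is a hypothetical helper, not Avogadro's input generator; it assumes Cartesian coordinates in angstroms, a single library basis set for every element, and a single-point SCF energy task. The water geometry is approximate and for illustration only.

```python
"""Sketch: build an NWChem-style input deck (hypothetical helper, not Avogadro's)."""


def nwchem_input(name, atoms, basis="6-31G*", charge=0):
    """Return the text of an NWChem input for a single-point SCF energy.

    Assumptions: `atoms` is a list of (element, x, y, z) tuples in angstroms,
    and one library basis set is used for every element.
    """
    lines = [
        f"start {name}",
        f'title "{name} SCF energy"',
        f"charge {charge}",
        "geometry units angstroms",
    ]
    # One Cartesian coordinate line per atom inside the geometry block.
    for element, x, y, z in atoms:
        lines.append(f"  {element:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    lines += [
        "end",
        "basis",
        f"  * library {basis}",  # same basis set for all atoms
        "end",
        "task scf energy",
    ]
    return "\n".join(lines) + "\n"


# Approximate water geometry (angstroms), for illustration only.
water = [
    ("O", 0.0, 0.0, 0.1173),
    ("H", 0.0, 0.7572, -0.4692),
    ("H", 0.0, -0.7572, -0.4692),
]

deck = nwchem_input("water", water)
print(deck)
```

A front end would write this text to a file and hand it to the job-submission stage; the same pattern, with different keywords, applies to GAMESS, Gaussian or Q-Chem.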
Figure 1: Avogadro application (left), ray-traced molecule (center) and the periodic table widget (right).

Avogadro allows the user to prepare jobs for quantum packages such as NWChem, GAMESS, Gaussian and Q-Chem. Because Avogadro is plugin-based, specialized functionality can be added for a wide range of applications, such as molecular docking, surface modeling and electronic structure.

Figure 3: The Chemical Data Explorer user interface showing a query and structures (top-left), a scatter plot matrix (top-right), a scatter plot with tooltip (bottom-left), and K-means clustering (bottom-right).

OpenQube

OpenQube is a small, open-source C++ library that reads key quantum data from calculations produced by codes such as NWChem, GAMESS and Gaussian. It can read in basis sets, eigenvectors and density matrices, and calculate the magnitude of the molecular orbitals and electron density on regularly spaced grids. The data produced can be used for further analysis and visualization of electronic structure.
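The grid evaluation OpenQube performs can be illustrated with a toy example. This sketch assumes an H2-like molecule with one normalized s-type Gaussian per atom, atomic units, and a hand-built closed-shell density matrix; the exponent, geometry and helper names are hypothetical, and none of OpenQube's API is used. It evaluates the molecular orbital ψ(r) = Σ_μ C_μ φ_μ(r) and the density ρ(r) = Σ_μν P_μν φ_μ(r) φ_ν(r) on a regular cubic grid, then checks that summing ρ·ΔV over the grid recovers the electron count.

```python
"""Sketch: evaluate a molecular orbital and the electron density on a grid.

Hypothetical toy system: one normalized s-type Gaussian per atom (exponent
ALPHA), atomic units, and a hand-built closed-shell density matrix.
OpenQube reads real basis sets and matrices from output files instead.
"""
import math

ALPHA = 0.5  # Gaussian exponent in bohr^-2 (illustrative value)
CENTERS = [(0.0, 0.0, -0.7), (0.0, 0.0, 0.7)]  # nuclei 1.4 bohr apart


def s_gaussian(alpha, center, point):
    """Normalized s-type Gaussian (2a/pi)^(3/4) * exp(-a |r - A|^2)."""
    r2 = sum((p - c) ** 2 for p, c in zip(point, center))
    return (2.0 * alpha / math.pi) ** 0.75 * math.exp(-alpha * r2)


def basis_values(point):
    """Values of every basis function phi_mu at one point."""
    return [s_gaussian(ALPHA, c, point) for c in CENTERS]


# Overlap of two equal-exponent normalized s Gaussians: S = exp(-a R^2 / 2).
R = math.dist(CENTERS[0], CENTERS[1])
S = math.exp(-ALPHA * R * R / 2.0)

# Bonding MO coefficients C_mu, normalized so that <psi|psi> = 1.
c = 1.0 / math.sqrt(2.0 * (1.0 + S))
C = [c, c]

# Closed-shell density matrix P_mu,nu = 2 C_mu C_nu (one doubly occupied MO).
P = [[2.0 * C[m] * C[n] for n in range(2)] for m in range(2)]


def mo_value(point):
    """psi(r) = sum_mu C_mu phi_mu(r)."""
    return sum(cm * phi for cm, phi in zip(C, basis_values(point)))


def density(point):
    """rho(r) = sum_mu,nu P_mu,nu phi_mu(r) phi_nu(r)."""
    phi = basis_values(point)
    return sum(P[m][n] * phi[m] * phi[n] for m in range(2) for n in range(2))


def regular_grid(lo, hi, n):
    """Yield the points of an n x n x n cube spanning [lo, hi] on each axis."""
    step = (hi - lo) / (n - 1)
    axis = [lo + i * step for i in range(n)]
    for x in axis:
        for y in axis:
            for z in axis:
                yield (x, y, z)


# Summing rho * dV over the grid approximates the integral of the density,
# which for this closed-shell toy molecule is the electron count (2).
lo, hi, n = -7.0, 7.0, 31
dv = ((hi - lo) / (n - 1)) ** 3
electrons = sum(density(p) for p in regular_grid(lo, hi, n)) * dv
print(f"integrated electrons: {electrons:.4f}")
```

Because ρ = 2ψ² for a single doubly occupied orbital, the integrated density should come out very close to 2. Real basis sets add p, d and contracted functions, but the grid loop has the same shape.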
Visualization Toolkit and ParaView

The Visualization Toolkit (VTK) is an open-source C++ toolkit for 2D and 3D graphics, volume rendering, image processing, visualization and modeling. Development began in 1993, and it now has a large community of developers distributed around the world in a diverse set of fields. VTK processes data using a data flow graph (pipeline) in which each algorithm takes zero or more inputs and produces zero or more outputs. VTK scales to large data because it has distributed algorithms that use MPI to execute on large computing clusters.

MoleQueue

The MoleQueue application provides a graphical interface that integrates high-performance computing (HPC) resources on the desktop. It offers a seamless integration layer for applications such as Avogadro to submit jobs to local and remote computational resources. MoleQueue manages the job lifetime, and results can be opened in any external program.

Chemkit

Chemkit is an open-source C++ library for molecular modeling, cheminformatics, and molecular visualization. It features a modular, plugin-based architecture and includes over 40 plugins that implement 15 file formats, 6 line formats, 4 force fields, 2 partial charge models, 2 aromaticity models, 8 atom typers and 30 molecular descriptors. In addition, Chemkit includes an integrated visualization library built on OpenGL/Qt, with Python bindings for easy scripting.
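Chemkit's plugin-based architecture lets new file formats, force fields and descriptors be added without changing the core library. The Python sketch below shows the general registry pattern with a single XYZ-format reader; the registry, the `register_format` decorator and the other function names are hypothetical and do not reflect Chemkit's C++ API.

```python
"""Sketch: a minimal plugin registry for file-format readers.

Hypothetical illustration of the pattern; not Chemkit's API (Chemkit is C++).
"""
from typing import Callable, Dict, List, Tuple

Atom = Tuple[str, float, float, float]
Reader = Callable[[str], List[Atom]]

# Registry mapping a format name to the reader that handles it.
_FORMATS: Dict[str, Reader] = {}


def register_format(name: str) -> Callable[[Reader], Reader]:
    """Decorator that registers a function as the reader for format `name`."""
    def decorator(func: Reader) -> Reader:
        if name in _FORMATS:
            raise ValueError(f"format {name!r} already registered")
        _FORMATS[name] = func
        return func
    return decorator


def available_formats() -> List[str]:
    """Names of all registered formats, sorted."""
    return sorted(_FORMATS)


def read(name: str, text: str) -> List[Atom]:
    """Dispatch to whichever plugin registered format `name`."""
    try:
        reader = _FORMATS[name]
    except KeyError:
        raise ValueError(f"no plugin for format {name!r}") from None
    return reader(text)


@register_format("xyz")
def read_xyz(text: str) -> List[Atom]:
    """Parse XYZ: atom count, comment line, then 'El x y z' rows."""
    lines = text.strip().splitlines()
    count = int(lines[0])
    atoms = []
    for row in lines[2:2 + count]:
        element, x, y, z = row.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    return atoms


water_xyz = """3
water
O 0.000 0.000 0.117
H 0.000 0.757 -0.469
H 0.000 -0.757 -0.469
"""
print(available_formats())          # ['xyz']
print(len(read("xyz", water_xyz)))  # 3
```

Adding another format means registering one more reader; `read()` itself never changes, which is the property a plugin architecture is meant to provide.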
Figure 4: Volume rendered molecular orbital with sliced contour (left), and library dependency graph (right).
Figure 6: Cartoon rendering of protein (left), surface rendering (center), and molecule rendering (right).
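The VTK pipeline model described above, where algorithms are connected into a data flow graph and executed on demand, can be sketched in a few classes. This is a hypothetical illustration of the idea, not VTK's API: each node takes zero or more input connections, returns a tuple of outputs, and compares modification times to decide whether it must re-execute. The atom source, splitting filter and counter are invented for the example.

```python
"""Sketch: a tiny demand-driven data-flow pipeline (hypothetical, not VTK's API)."""
import itertools

# Global logical clock used for modification and execution times.
_clock = itertools.count(1)


class Algorithm:
    """Pipeline node with zero or more inputs and a tuple of outputs."""

    def __init__(self, *connections):
        # Each connection is (upstream_algorithm, output_port).
        self.connections = list(connections)
        self.mtime = next(_clock)  # when this node's parameters last changed
        self.exec_time = 0         # when this node last executed
        self.outputs = None
        self.executions = 0

    def modified(self):
        """Mark parameters as changed so downstream updates re-execute."""
        self.mtime = next(_clock)

    def pipeline_mtime(self):
        """Newest modification time of this node or anything upstream of it."""
        return max([self.mtime] + [alg.pipeline_mtime() for alg, _ in self.connections])

    def update(self):
        """Update upstream first, then re-execute only if something changed."""
        inputs = [alg.update()[port] for alg, port in self.connections]
        if self.outputs is None or self.pipeline_mtime() > self.exec_time:
            self.outputs = self.execute(inputs)
            self.exec_time = next(_clock)
            self.executions += 1
        return self.outputs

    def execute(self, inputs):
        raise NotImplementedError


class AtomSource(Algorithm):
    """Source: no inputs, one output (a list of (element, x, y, z) atoms)."""

    def __init__(self, atoms):
        super().__init__()
        self.atoms = list(atoms)

    def set_atoms(self, atoms):
        self.atoms = list(atoms)
        self.modified()

    def execute(self, inputs):
        return (list(self.atoms),)


class SplitHydrogens(Algorithm):
    """Filter: one input, two outputs (heavy atoms on port 0, H on port 1)."""

    def execute(self, inputs):
        (atoms,) = inputs
        heavy = [a for a in atoms if a[0] != "H"]
        hydrogens = [a for a in atoms if a[0] == "H"]
        return (heavy, hydrogens)


class AtomCounter(Algorithm):
    """Any number of inputs, one output: the total number of atoms received."""

    def execute(self, inputs):
        return (sum(len(atoms) for atoms in inputs),)


source = AtomSource([("O", 0.0, 0.0, 0.117), ("H", 0.0, 0.757, -0.469),
                     ("H", 0.0, -0.757, -0.469)])
split = SplitHydrogens((source, 0))
count_h = AtomCounter((split, 1))

print(count_h.update()[0])  # 2 hydrogens
count_h.update()            # nothing changed upstream, so nothing re-executes
print(split.executions)     # 1
source.set_atoms([("C", 0.0, 0.0, 0.0)] + [("H", 0.0, 0.0, 0.0)] * 4)
print(count_h.update()[0])  # 4 hydrogens
print(split.executions)     # 2
```

Requesting data from the end of the pipeline pulls updates through only the stale nodes, which is what lets a pipeline of this kind avoid recomputing unchanged stages.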
ParaView is an open-source, cross-platform data analysis and visualization
application. It is one of the flagship open-source projects developed by Kitware,
building on VTK and Qt to provide a client-server application that allows users to quickly build visualizations to analyze their data. ParaView was developed to analyze extremely large data sets using distributed-memory computing resources. It can be used interactively through the cross-platform GUI or scripted from Python. VTK and ParaView are being augmented with additional functionality for chemistry through projects such as the Google Summer of Code and Open Chemistry.

Software Process

These projects are open source and target multiple platforms and architectures. A quality-inducing software process is employed, using best-of-breed technologies such as Git for distributed version control, Gerrit for code review, CMake for cross-platform building, CTest for unit/regression testing and CDash for software quality feedback. Most code is BSD licensed and designed with reuse in mind.

Figure 2: The MoleQueue program configuration dialog for a PBS remote system.

MoleQueue features:
• Graphical configuration of queues and programs
• Support for Sun Grid Engine, PBS and running calculations locally
• JSON-RPC protocol for interprocess communication over local sockets or ZeroMQ
• C++ and Python client libraries
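MoleQueue's JSON-RPC interface can be pictured with the standard library alone. In this sketch the method name `submitJob`, its parameter keys, the `jobId` result field, the `Accepted` state and the newline-delimited framing are all assumptions for illustration, not MoleQueue's documented protocol; only the JSON-RPC 2.0 envelope (`jsonrpc`, `method`, `params`, `id`, `result`/`error`, and the -32601 "Method not found" code) follows the specification. A `socket.socketpair()` stands in for the local socket between a client and the queue manager.

```python
"""Sketch: JSON-RPC 2.0 messages exchanged over a local socket pair.

Hypothetical: method name, parameter keys, result fields and the
newline-delimited framing are invented for illustration. Only the
JSON-RPC 2.0 envelope follows the specification.
"""
import itertools
import json
import socket

_ids = itertools.count(1)


def make_request(method, params):
    """Build a JSON-RPC 2.0 request with a fresh integer id."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": next(_ids)}


def send(sock, message):
    # Hypothetical framing: one JSON document per line.
    sock.sendall((json.dumps(message) + "\n").encode("utf-8"))


def receive(sock_file):
    """Read one newline-terminated JSON document."""
    return json.loads(sock_file.readline())


def handle(request, jobs):
    """Toy server-side dispatcher for a single hypothetical method."""
    if request.get("method") != "submitJob":
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    job_id = len(jobs) + 1
    jobs[job_id] = dict(request["params"], state="Accepted")
    return {"jsonrpc": "2.0", "id": request["id"], "result": {"jobId": job_id}}


client, server = socket.socketpair()
client_in = client.makefile("r", encoding="utf-8")
server_in = server.makefile("r", encoding="utf-8")
jobs = {}

request = make_request("submitJob", {
    "queue": "local",     # hypothetical queue name
    "program": "NWChem",  # hypothetical program name
    "description": "water SCF energy",
})
send(client, request)                           # client -> server
send(server, handle(receive(server_in), jobs))  # server replies
reply = receive(client_in)                      # client reads the reply
print(reply["result"], reply["id"] == request["id"])

for f in (client_in, server_in):
    f.close()
client.close()
server.close()
```

Matching each reply to its request by `id` is what lets a client keep several submissions in flight over one connection; a ZeroMQ transport would carry the same JSON documents.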