Promiscuous patterns and perils in PubChem and the MLSCN
UNM Division of Biocomputing public web applications: Computational tools for cheminformatics and molecular discovery
Jeremy Yang, Division of Biocomputing, University of New Mexico, Albuquerque, New Mexico, USA
ChemAxon US User Group Meeting, Boston, September 13-15, 2010
UNM Division of Biocomputing
The UNM Division of Biocomputing is a multi-disciplinary research group within the Dept. of Biochemistry and Molecular Biology, in the UNM School of Medicine. Research areas include drug discovery, drug informatics, cheminformatics, bioinformatics, machine learning, QSAR, and lead and probe identification. A major effort of the group is providing biomolecular screening informatics support for the UNM Center for Molecular Discovery (UNMCMD), an NIH Roadmap Molecular Libraries Program screening center. As with UNMCMD, a long-term collaboration with Givaudan Flavors S&T has involved web apps and contributed to their development. These activities, and various other projects with geographically dispersed collaborators, have motivated extensive use of web apps.
Division of Biocomputing Personnel: Tudor I. Oprea (chief), Oleg Ursu, Cristian Bologa, Gergely Zahoransky-Kohalmi, Stephen Mathias, Jerome Abear, Jeremy Yang
Public web applications
● To benefit and engage the scientific community
● Cheminformatics and biomolecular discovery
● http://pasilla.health.unm.edu
● Employs a diverse set of commercial, open-source and community-based components
● Limited bandwidth; etiquette essential
[Screenshots: Bio-Activity Datamining Associative Promiscuity Pattern Learning Engine; drug-likeness (2)]
WWW reigns! Now superpowered.
Over the roughly 15-year history of the world wide web (WWW), the prevalence and usefulness of web applications have increased continuously. The “Web OS” paradigm is increasingly a reality, given tools such as Google Docs and Microsoft Office Web Apps, web services and cloud computing. Although web apps are not new per se, greatly enhanced capabilities are now available via web apps due to continual and dramatic improvements in (1) network bandwidth, (2) processor power, (3) web software development methods and (4) online data resources. In short, each year we can in practice do things with web apps that we could not the year prior. What has not changed is the primary motivation for adopting web apps: deploying functionality via web apps is efficient and reliable for users, developers and managers. In addition, and importantly (this has changed somewhat as web UIs have become more complex), web apps are generally easy to use by virtue of their common features.
Starting lineup of cool tools
The new public web app server offers a diverse set of tools. Some provide simple but useful routine functions, such as depicting molecules or converting among file formats. clustermols.cgi provides an enabling front-end for a successful command-line product (Mesa's Grouping Module). Others implement complex and experimental methodology, such as iPHACE (integrative navigation of pharmacological space) (5) and “model-free” drug-likeness (2). Several scientific software packages have been employed, including ChemAxon, Mesa, OpenEye, CDK, OpenBabel, SciTouch and others. Generic development tools include Java, Python, R and Gnuplot. We anticipate that the menu of web apps will expand and evolve with new research and projects.
[Screenshots: smarts filters; clustering (3)]
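The clustering front-end mentioned above (clustermols.cgi) builds on pairwise fingerprint similarity, as most fingerprint-based clustering does. The following Python sketch is not Mesa's Grouping Module, whose algorithms and interface are not described here. It assumes fingerprints have already been computed and stored as sets of "on" bit positions, and it swaps in a Taylor-Butina-style exclusion clustering with an illustrative Tanimoto threshold of 0.7. The function names tanimoto and butina_cluster and the toy ids m1 to m4 are hypothetical.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0  # assumed convention: two empty fingerprints are identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def butina_cluster(fps: Dict[str, Set[int]], threshold: float = 0.7) -> List[List[str]]:
    """Taylor-Butina-style clustering (a sketch, not Mesa's implementation).

    Each molecule's neighbors are the others with similarity >= threshold.
    Molecules are ranked once by neighbor count; the highest-ranked unassigned
    molecule becomes a centroid and takes all of its still-unassigned neighbors.
    """
    ids = sorted(fps)
    neighbors = {
        i: {j for j in ids if j != i and tanimoto(fps[i], fps[j]) >= threshold}
        for i in ids
    }
    # Most neighbors first; ties broken by id so the result is deterministic.
    order = sorted(ids, key=lambda i: (-len(neighbors[i]), i))
    assigned: Set[str] = set()
    clusters: List[List[str]] = []
    for centroid in order:
        if centroid in assigned:
            continue
        members = [centroid] + sorted(neighbors[centroid] - assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


if __name__ == "__main__":
    toy = {  # hypothetical fingerprints, not real molecules
        "m1": {1, 2, 3, 4},
        "m2": {1, 2, 3, 5},
        "m3": {1, 2, 3, 4, 5},
        "m4": {10, 11, 12},
    }
    print(butina_cluster(toy))  # → [['m3', 'm1', 'm2'], ['m4']]
```

Ranking candidate centroids by neighbor count means dense regions of chemical space are claimed first, while outliers such as m4 end up as singleton clusters.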
[Screenshot: iPHACE: Integrated Pharmacology Space Exploration (5)]
Web apps in cheminformatics
In cheminformatics, the emergence of high-quality programming toolkits, both commercial and community-based, has facilitated web app development with highly diverse aims and methods. Major software providers such as Daylight, Accelrys, ChemAxon and Chemical Computing Group have embraced web technologies, understanding their advantages and broad appeal. Web apps range from large-scale enterprise tools (e.g. database access) to special-purpose rapid prototypes for researching new algorithms. Diverse areas of research have been addressed with web apps, from toxicology prediction to 3D macromolecular visualization to quantum chemistry.
[Screenshots: depiction; property profiling]
“WWW dead” definitional delusion
Recent pronouncements that “The Web is dead” (Wired, Sept. 2010), supplanted by managed apps via closed environments (e.g. iPad), seem dubious and depend on a narrow definition of the web (HTTP). Tim Berners-Lee's functional concept of the Semantic Web seems more enlightening: a connected global network of information resources well suited to humans and machines, using standardized communication protocols and semantics. Insofar as the Internet platform enables this vision, the Internet is the Web.
Rapid prototyping enables research
For scientific research, web apps provide powerful and enabling rapid prototyping capabilities. Effective research often depends on testing and iteratively modifying hypotheses and algorithms, and on communication among collaborators. When developers can rapidly deploy a web app implementing a new algorithm, a perhaps geographically dispersed team can experiment, collaborate and accelerate progress.
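To make the rapid-prototyping point above concrete, here is a minimal sketch of wrapping a new algorithm as a web app using only Python's standard library (wsgiref). It is not how the Division's apps are built; the endpoint, the query parameters a and b (comma-separated fingerprint bit positions) and port 8000 are hypothetical, and Tanimoto similarity simply stands in for whatever algorithm a team is testing.

```python
import json
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def parse_bits(text):
    """Parse comma-separated bit positions, e.g. '1,4,9' -> {1, 4, 9}."""
    return {int(tok) for tok in text.split(",") if tok.strip()}


def tanimoto(a, b):
    """Tanimoto similarity of two on-bit sets (two empty sets count as identical)."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def app(environ, start_response):
    """Hypothetical prototype endpoint: GET /?a=1,2,3&b=2,3,4 returns JSON."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        a = parse_bits(params.get("a", [""])[0])
        b = parse_bits(params.get("b", [""])[0])
    except ValueError:
        # Non-integer tokens: report a client error instead of crashing.
        status = "400 Bad Request"
        payload = {"error": "a and b must be comma-separated integers"}
    else:
        status = "200 OK"
        payload = {"tanimoto": tanimoto(a, b)}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    # Development server bound to localhost; port 8000 is an arbitrary choice.
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

With the server running, a collaborator could open http://127.0.0.1:8000/?a=1,2,3&b=2,3,4 and receive {"tanimoto": 0.5}. Because app is a plain WSGI callable, the same function could later be hosted by a production WSGI server rather than the development server, without changing the algorithm code.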
Science enabled via WWW
Web apps can also amount to a sort of scientific publishing. A journal article on computational methodology can leave many questions unanswered; just as a picture can be “worth 1000 words”, arguably a functioning program might be worth 1000 pictures. In addition, agencies funding public research (such as NIH), where outcomes include computational methodology, quite reasonably and wisely require that such methodology and software be disseminated in effective, extensible, sustainable ways. Web apps developed with modern software engineering standards can achieve these goals well. Recently our app smartsfilter was used for a tuberculosis study (1).
WWW a great machine. WWW are We.
Our current capability to deliver great computational power via the WWW owes debts of thanks to the communities of developers and users who have pushed all the component technologies forward. In addition to the outstanding scientific software vendors and projects, the generic components that have enabled progress are numerous and enormously significant. A few of particular importance to the web are Mozilla/Firefox, Apache HTTP Server, Apache Tomcat, Java, Perl, Python, Ruby, PHP, JavaScript, CSS and the many contributed packages and libraries written for these major projects. Web standards have progressed in the now typical semi-regulated, semi-democratic, two-steps-forward, one-step-back fashion. Viewed in toto and in retrospect, it is startling to consider the resulting global organization and interoperability of these many and disparate efforts. Despite its flaws and ongoing evolution, the WWW can be viewed as one of the most complex, most powerful, most inherently interactive machines ever built. As we pursue goals in computational and data-intensive (“Fourth Paradigm”) research, engaging the WWW effectively will be essential.
[Screenshots: format conversion; similarity; ROC curves; ChemTattoo: colorized commonalities (4)]
References:
(1) Ekins S, et al., “Analysis and hit filtering of a very large library of compounds screened against Mycobacterium tuberculosis”, Molecular BioSystems, 2010 (in press).
(2) Ursu O, Oprea TI, “Model-Free Drug-Likeness from Fragments”, J. Chem. Inf. Model., published online July 22, 2010.
(3) MacCuish JD, MacCuish NE, “Clustering in Bioinformatics and Drug Discovery”, Chapman & Hall, 2010.
(4) Shemetulskis NE, Weininger D, Blankley CJ, Yang JJ, Humblet C, “Stigmata: An Algorithm to Determine Structural Commonalities in Diverse Datasets”, J. Chem. Inf. Comput. Sci., 1996, 36(4): 862-871.
(5) Garcia-Serna R, Ursu O, Oprea TI, Mestres J, “iPHACE: integrative navigation in pharmacological space”, Bioinformatics, 2010, 26: 985-986.
Powered by: SciTouch, Mesa, OpenEye, OpenBabel, CDK