NOVEMBER 30, 2011
Why We Chose 'Open Science'
To accelerate research breakthroughs on brain diseases, the Allen Institute puts all its data online for use without fees.
By PAUL ALLEN
The Allen Institute for Brain Science in Seattle grew out of a simple question I posed in 2002 to a
constellation of top people in the field: What's the most useful thing we could do to propel neuroscience
forward? The consensus became our inaugural project: a comprehensive, molecular-level,
three-dimensional map of the mouse brain to show precisely where every gene is active, or "expressed." It was
the first step on a long road to understand how genes function in the human brain, knowledge that will
point to ways to better diagnose and treat brain ailments.
A crucial aspect of this project, and of others the Allen Institute has pursued over the last eight years, is an
"open science" research model. Early on, we considered charging commercial users for access to our online
data. From a strictly financial standpoint, it made sense to reap front-end fees and, down the line,
intellectual property royalties. The revenue could cover the high costs of maintenance and development to
keep the resource current and useful.
But our mission was to spark breakthroughs, and we didn't want to exclude underfunded neuroscientists
who just might be the ones to make the next leap. And so we made all of our data free, with no registration
required. The Institute would have no gatekeeper. Our terms-of-use agreement is about 10% as long as the
one governing iTunes.
Our facility is neither the first nor the last to use a shared database to embrace "open science" and reject the
competitive, single-lab R&D paradigm. Traditional research incentives—where journal publications are the
coin of the realm—tend to discourage vital sharing.
In 1982, even before the dawn of the Internet, a consortium of government agencies established the
open-access GenBank. Maintained by a division of the National Institutes of Health (NIH), GenBank now houses
the sequence data from the Human Genome Project, the inspiration for our brain mapping.
In recent years the NIH has sponsored other data-sharing portals, including the Alzheimer's Disease
Neuroimaging Initiative and the Neuroscience Information Framework. Private nonprofits like the Pistoia
Alliance and Sage Bionetworks are curating their own open-source repositories.
But the Allen Institute remains distinct in conducting industrial-scale big science that is fundamentally
collaborative. Internally, our team of scientists and support staff works together to meet the timelines and
milestones that frame each large project. The team released the initial data set from a groundbreaking
human whole-brain atlas last year, and it is now midstream on a project to define the circuitry between
neurons and how it affects human behavior.
Most important, we generate data for the purpose of sharing it. Since opening shop in 2003, we've had 23
public releases, or about three per year. We don't wait to analyze our raw data and publish in the literature.
We pour it onto the public website as soon as it passes our quality control checks. Our goal is to speed
others' discoveries as much as to springboard our own future research.
The databases currently provide tens of millions of high-resolution images. The initial mouse brain atlas
alone involved 600 terabytes of data, or 600 trillion bytes, more than half the total content of the Internet
when we started. Since data of this volume would be of little use without effective search and navigation
tools, the Institute developed a free online viewing application as well as the downloadable Brain Explorer
3D viewer, which illuminates how expressed genes are distributed throughout the brain.
Open science is a long-term and pricey proposition. It demands consistent curating, maintenance and
updating of databases, and regular software and hardware upgrades. The institute offers online video
tutorials on a YouTube channel and in a tutorial library. For those seeking in-person walk-throughs or
forums, it hosts training workshops and user group sessions in several areas around the country each year.
These services, too, are free of charge.
It is a modest cost that is paying off as the scientific community embraces the open access model. In
October, the institute's suite of databases received more than 45,000 visits, from six continents and from
research organizations of every stripe: universities, government laboratories, independent institutes and
biotech and pharmaceutical companies. Institute brain atlases are accelerating research on the underlying
biology of a broad range of diseases, from Alzheimer's and Parkinson's to autism and schizophrenia.
Growing numbers of college educators, from UCLA to the Radboud University Nijmegen in the
Netherlands, are building curricular modules around our online resources.
What I've concluded is that foundations and other private funders who support scientific research also can
help promote wider sharing of scientific data. Before funders write a check to a university, they should ask
about the researcher's policies and track record on sharing.
On the federal level, the NIH now has strong policies on sharing data. But I'd like to see the agency do
even more to put its funding where its directives are. I propose that the NIH—along with the National
Science Foundation and the U.S. Department of Education—direct funding into grant awards for
management and curation of existing research data of special value.
That would siphon some money from traditional research grants for new work. But I think we'd get more
bang for our buck by making more data more useful to more scientists—and, by extension, to the world
community that will benefit from their work.
Mr. Allen, the co-founder of Microsoft with Bill Gates, launched the nonprofit Allen Institute for Brain
Science in 2003.