Scott Yara
THE SWEET SOUND OF BIG DATA
EMC Greenplum’s Scott Yara on the Planet-Altering
Power of Data Analytics and Collaboration
by Terry Brown
That rumbling sound? Information. It’s a dull roar of taps and keystrokes and spinning
disk drives, crowd noise, chaos, until people like EMC’s Scott Yara arrive. Then
the annoying buzz starts to sound more like music—a Big Data symphony of telling
answers to infinite questions that helps us manage a frantic, data-mad planet.
The maestro Yara is a co-founder of Greenplum, the database software company in
San Mateo, California acquired by EMC in 2010. Greenplum specializes in large-scale
data warehousing, analytics, and now, with the introduction of Greenplum Chorus,
enterprise-wide collaboration. Chorus, according to Yara, is extremely big news.
“We have found,” says Yara, now Greenplum’s Senior Vice President for Products,
“that as companies try to gain insight from their data, people and process challenges
are just as vexing as the infrastructure ones. Chorus is really the first system that
focuses on building a collaboration and social environment for doing the work of
data science. It’s really the first of its kind.
“Chorus is a breakthrough in what typically has been a pretty siloed and scattered
process. Now we can integrate all the tasks that a company does to produce an insight
from data and bring it all together in a single environment.”
In other words, for the first time the table is set to take full advantage of Big Data—
the analytics platform to work the data and the collaboration environment to speed
decisions about it. So does that mean that Big Data is ready to do what the pundits
say—change the world? Yara weighs his words carefully when asked about that—he’s
reluctant (and says so) to boil his viewpoint into sound bites. But he does concede
the presence of an inexorable trend.
“We’ve been at this for awhile,” says Yara. “These revolutions take a long time. But
today there really isn’t an organization on the planet that’s not thinking very deeply
about using data, and that just wasn’t the case three or four years ago. It’s moving.
And that’s exciting to see.”
•••
“Big Data” is just what it sounds like—huge volumes of data generated by anything
and anyone who works or plays or functions on line and in computer networks. Smart
phones, laptops, PCs, mainframes. Social networks, internet shopping, online banking,
surveillance systems, pavement sensors, call records, health care information,
and so on and so on. Of course data has always been big, in the context of existing
technology—to Ebenezer Scrooge, Big Data was a tall shelf of dusty ledger books.
The recent need to name it in capital letters stems from more than the deluge of
electronic data and the inability of conventional systems to make sense of it. It also names
the fiendishly clever new processing and analytics technologies built by people like
Yara and his Greenplum colleagues. That’s what led EMC, the leader in information
infrastructure, to Greenplum—the need to offer ways to analyze the data stored and
managed on that infrastructure.

GREENPLUM CHORUS: SOCIAL MEETS BIG DATA—IN A BIG WAY
Asked to sum up Greenplum Chorus, the Big Data collaboration software
just announced by EMC Greenplum, Scott Yara does not mince words.
“I think what Java was to the Internet,” says Yara, “Chorus will be to
Big Data.”
Yara, EMC Greenplum’s Senior Vice President for Products, says that Chorus
completes the puzzle of how to analyze gigantic volumes of disparate
data. To master Big Data you need fast processing, sophisticated analytics,
and what wasn’t possible until Chorus—the ability to collaborate.
“Chorus provides an opportunity for a data scientist to see the data assets
across a company with a simple search interface. It gives them the
freedom to manipulate and analyze that data as they see fit. You’ll have
your own private workspace and sandbox where you can manipulate
that data as you see fit. And then you have workflow and collaboration
tools to share that data or insight or process back to the company, in
a very agile way.”
It’s data at your fingertips, light-speed analytics for an extended,
enterprise-wide team. “We wanted to make using data inside the enterprise a lot
more familiar and friendly,” says Yara. “And so providing social collaboration
interfaces using common streams, user profiles, the opportunity to share
things, hopefully that lends itself to an organizational dynamic that is a
lot more natural.”

“For the last ten years, as the web has exploded and expanded around us,” says Yara,
“the idea of answering questions about customers or buying patterns or whatever
by looking through all this tremendous variety of information was technically
impossible. So Greenplum developed a way to support multiple data types, using a parallel
scale-out computing model that mirrored the internet, with analytics software support
that has made some of these really hard things much easier.”
Yara grew up in Minnesota, outside Minneapolis, matriculated to UCLA, studied
computer science, and left early to join an internet startup called Sandpiper Networks,
an early player in content delivery systems. Internet performance was spotty in the
early ’90s, and Sandpiper built caching systems that sped the performance of big
websites. Sandpiper merged with the internet services company Digital Island, went
public, and was sold to the British telecom Cable & Wireless.
In 2000 Yara started a company called Metapa to capture and analyze information on
the Web. In 2002 Metapa merged with a similar startup, called Didera, whose founder
Luke Lonergan became Yara’s partner in the new venture they called Greenplum.
(Where did the name come from? As Yara and Lonergan cast about for a name, one
of their employees asked his young daughter for her advice. She suggested “Apple.”
Told that name was taken, she offered up Greenplum, which stuck. Kids bring a lot
of naming help to the Big Data world—the developer of Hadoop, the open source
software that Greenplum uses to analyze unstructured data, named his product after
his son’s toy elephant.)

“I think what Java was to the Internet, Chorus will be to Big Data.”
SCOTT YARA, EMC GREENPLUM

“It was a natural evolution for EMC as a business,” says Yara. “It’s a huge opportunity
to provide analytic capabilities to customers once they’ve stored all that Big Data on
EMC systems. Here’s the thing: Companies are starting to realize, and consumers are
too, that their most valuable asset isn’t necessarily the intellectual property they’ve
built, but rather the data that they generate as a consequence of their products, so
there is a very aggressive movement to monetize or gain value from that data.
“So what we’re seeing is an economy being built around the data that’s being
generated across all industries and how to unlock the value of that data.”
•••
The first movers in the Big Data world were companies with Internet-enabled
businesses—search engines, online retailers, social networking sites. Now other
organizations—government, universities, offline companies—are learning the potential
of Big Data analytics, and the technology is ready to spread to every corner of the
marketplace because compute and storage costs have dropped dramatically. Now
companies can not only afford to gather and store information—they can also afford
to analyze it.
Now Google can detect regional flu outbreaks a week to ten days faster than the
Centers for Disease Control and Prevention by monitoring increased search term activity
for phrases associated with flu symptoms. Cities are analyzing traffic data in real time
and making decisions to manage congestion before it becomes a story in tomorrow’s
newspaper. Smart electric grids are helping homeowners monitor and manage their
power use. The Federal government’s USAspending.gov website tracks government
spending and charts the data based on queries by anybody who visits the site.
Big Data is woven into the physical fabric of our lives. The “Internet of Things”—the
physical assets that become part of the information infrastructure—is changing how
companies create business models and people live their lives, giving systems and
people the ability to capture, compute, communicate, and collaborate around
information. Embedded with sensors, actuators, and communications capabilities,
such assets or “things” will soon be able to absorb and transmit information on a
massive scale and, in some cases, to adapt and react to changes in the environment
automatically.
So how will lives be changed? Ask Scott Yara.
“Let’s say you’re a big retail bank,” he says. “You might have 60 million customers
that use a huge number of different products—checking account customers, home
loan customers, credit card customers. Some communicate through the website.
Some complain on Twitter. As the business owner, you want to know who your top,
most loyal customers are, and what kinds of products they’re using and not using.
And what makes a great customer?”
Big Data, says Yara, lets a business owner sift through all the data available, answer
thorny questions, and know how to create a business that has more loyal customers
and keeps the bad ones away.
“Let’s say you’re a young woman who lives in a condo in the suburbs and works in an
office downtown,” Yara says. “When it’s time to go to work, you instruct your condo
when to wash the dishes, when to start a load of laundry, when to open or shut the
windows depending on the weather.”
Then you’re at the office, Yara says. Your washing machine sends you a text that says
it’s out of detergent and can’t do the load you requested. The text includes a coupon
for detergent at the store where you most often shop (based on a credit card
spending pattern algorithm). It beeps you when you’re near the store (based on
geolocation data and smart car sensors) so you don’t forget. Your refrigerator texts to
say you’re low on lettuce so if you plan to have a salad with dinner tonight (based
on the menu you programmed into the fridge) you should pick up the veggie when you
get the laundry detergent. Maybe there is a two-for-one coupon included for
your favorite salad dressing.

“The adoption of Big Data technology in the enterprise will be twice as
fast and twice as big as the virtualization cloud computing market.”
SCOTT YARA, EMC GREENPLUM

Or maybe you’re planning a trip or wondering about your bank balance or thinking
about phone services. When you call up the airline or your bank or your telephone
company you won’t be irritated to learn they don’t have your latest purchase history or
account details available. Those simple things will start to become much more
commonplace and over time the services themselves will seem more personalized to you.
“That’s a big part of it,” says Yara, “but Big Data also represents the services and
business that we provide getting safer and more trustworthy because they can use
this information to better trap the bad guys. So the end result of Big Data is that
hopefully things start to naturally work more efficiently, more securely, and more
personally in a way that feels very natural.”
•••
Of course for some that metaphorical drone of information background noise feels a
bit creepy—the sense of systems behind every wall, in every purse or pants pocket,
on every car dashboard, overhead, in the ground, continuously gathering and
forwarding and analyzing information on everything we do. Yara is aware, but not wary.
“I think with any new technology,” he says, “there will always be concerns over privacy
or security or fraud. People had the same fears about the internet when it first
appeared—the idea of putting your credit card number online was pretty scary to a lot of
people back in the ’90s. And those fears are understandable. A number of companies are
building technologies to help make sure that data analysis has some level of
encryption and access control and security. We will see a need for more awareness around
the ethics or protection of individual rights, and there are already firms tackling these
tough issues—the Electronic Frontier Foundation, Creative Commons, and others.”
The fact is that the Big Data revolution is here to stay. “It’s only going to get bigger,”
Yara wrote in the Huffington Post last year. “There’s no turning back the tide, no going
back to an era when we knew less.”

“Big Data is about being able to take all this information, from an
incredible variety of sources, and answer questions we couldn’t answer before.”
SCOTT YARA, EMC GREENPLUM

How big will Big Data be? “We expect Data Science and data analytics to be
pervasive,” says Yara, “with far broader reach and impact even than previous-generation
computational science. Big-data computing is perhaps the biggest innovation in
computing in the last decade. We have only begun to see its potential to collect,
organize, and process data in all walks of life.
“My simple guess is that the adoption of Big Data technology in the enterprise will
be twice as fast and twice as big as the virtualization cloud computing market has
been and that’s because while cloud computing is about the bottom line, with more
efficient and optimized infrastructure, Big Data is really about the top line, because the
information itself helps you generate more revenues. It helps you get more profitable
and so I think that the enthusiasm we see growing around Big Data is accelerating
at a pace that is faster than cloud computing itself.”
For John and Jane Doe, Yara says, Big Data works because it can capture who we are
as individuals.
“I think that in the best ways,” says Yara, “Big Data is not something new. It’s an
amplifier for existing human behavior and so when you are looking for things that
you like, whether it’s an individual or music or places to eat or someone to manage
your retirement savings, you have a set of personal preferences that the technology
knows and protects. The serendipity comes when the options available are much more
closely correlated with the things that you already like—it’s an extension of yourself.”
That sounds pretty good.