
Sunday, July 24, 2011
ajackson@skylineinnovations.com


a tale of rapid prototyping, data warehousing, solar power, an architecture designed for data analysis at "scale"
...and arduinos!

So here's what I'd like to talk about: who we are, how we got started, and, most importantly, how we've been able to use MongoDB to help us. We're not a traditional startup, and while I know this is a Mongo talk rather than a "startups" talk, I'd like to show how Mongo's flexible nature really helped us as a business, and how Mongo specifically has been a good choice for us as we build some of our tools. Here are some themes:

Scaling



Mongo has come to have a pretty strong association with the word "scaling."

Scaling is a word we throw around a lot, and it almost always means "software performance, as inputs grow by orders of magnitude."

But scaling also means performance as the variety of inputs increases. I'd argue that it's scaling to go from 10 users to 10,000, and it's also scaling to go from ten 'kinds' of input to a hundred.

There's another word for this.

Scaling
                                Flexibility


Particularly when you scale in the real world, you start to find that it's complicated and messy and entropic in ways that software isn't always equipped to handle. So for us, when we say "Mongo helps us scale", we don't necessarily mean scaling to petabytes of data. We'll come back to this as well.

Business-first development

This generally means flexible, lightweight processes. Things that become fixed & unchangeable quickly become obsolete and sad :'(

When Does "Context" become "Yak Shaving"?

When I read new things or hear about new stuff, I'm always trying to put it in context. So sometimes I put too much context in my talks :( To avoid that, I sometimes go a little too fast over the context that *is* important. So please stop me to ask questions! Also, the problem domain here is a little different from what we might be used to, so bear with me as we go into plumbing & construction.

Preliminaries



Est. 8/2009

Project Development + Technology

"Project Development"
finance, develop, and operate renewable energy and efficiency installations, for measurable, guaranteed savings.

We'll pay to put stuff on your roof, and we'll keep it maximally awesome.

Right now, this means solar thermal, more efficient lighting retrofits, and maybe HVAC.

So, here's the interesting part. Since we put stuff on your roof for free, we need to get that money back. What we do is charge you for the energy that it saved you. But here's the twist: other companies have done similar things, where they say "we'll pay for a system/retrofit/whatever, and you'll agree to pay us an arbitrary number, and we say you'll get savings, but you won't actually be able to tell, really." That always seemed sketchy to us. So we actually measure the performance of this stuff, collect the data, and guarantee that you save money.

(not webapps)



Topics not covered:

• Why solar thermal?
• Why hasn't anyone else done this before?
• Pivots? Iterations?
• What's the market size?
• Funding? Capital structures?
• Wait, how do you guys make money?

Oh, right, this isn't a startup talk. But feel free to ask me these later!

Solar Thermal in Five Minutes
(Mongo next, I promise!)

Municipal => Roof => Tank => Customer

Relevant Data to Track



Temperatures (about a dozen)

Flow Rates (at least two)

Parallel data streams (hopefully many)

e.g., weather data, insolation data. It'd be nice if we didn't have to collect it all ourselves.

how much data?
20 data points @ 4 bytes
1 minute intervals
at 1000 projects (I wish!)
for 10 years
80 * 60 * 24 * 365 * 10 * 1000 ≈ 420 GB
...not much, really, "in the raw"

Unfortunately, we can't really store it with maximal efficiency, because of things like timestamps, metadata, etc., but still.
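
The arithmetic on this slide can be checked in a few lines of Python. The numbers are the slide's; the constant names are made up for illustration.

```python
# Back-of-the-envelope raw storage estimate, using the numbers from the slide.
POINTS_PER_SAMPLE = 20            # data points per reading
BYTES_PER_POINT = 4               # bytes per data point
SAMPLES_PER_YEAR = 60 * 24 * 365  # one reading per minute
YEARS = 10
PROJECTS = 1000

bytes_per_sample = POINTS_PER_SAMPLE * BYTES_PER_POINT
total_bytes = bytes_per_sample * SAMPLES_PER_YEAR * YEARS * PROJECTS

print(f"{total_bytes / 1e9:.1f} GB raw")
```

This counts payload only; timestamps and metadata would add to it.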
I hope this provides enough context on the business problems we're trying to solve. It looks like we'll need a data pipeline, and we'll need one fast.

We've got data that we'll need to use to build, monitor, and monetize these energy technologies. Having worked at other smart grid companies before, I've seen some good data pipelines and some bad data pipelines. I'd like to build a good one. The less stuff I have to build, the better.

As I do some research, I find that a lot of these data pipelines have a few well-defined areas of responsibility.

Acquisition,
                         Storage,
                          Search,
                         Retrieval,
                         Analytics.



These should be self-explanatory. What's interesting is that most of the end users of the system are analysts, interested in analyzing, yet most systems seem to be designed for the other functions. More importantly, they're not very well decoupled: by the time the analysts get to start building tools, the design decisions from the beginning are inextricable from the systems that came before.

Acquisition, Storage, Search, Retrieval   }  Designed for these
Analytics.                                <= Users are here. Business value is here!

It's important to remember that, while you can't get good analytics without the other stuff, the analytics is where almost all of the value is! Search & retrieval are approaching "solved".

So, here's how I started thinking about things. This is a design diagram from the early days of the company.

Easy, Python, no problem. There are some interesting topics here, but they're not MongoDB related. I was pretty sure I knew how to build this part, and I was pretty sure I knew what the data would look like.

This part was also easy: e-mail reports, CSVs, maybe some fancy graphs, possibly some light webapps for internal use. These would be dictated by business goals first, but the technological questions were straightforward.

Here was the real question.

What would a good experience for an analyst look like? What would they expect the tools to do?

Now we can think about what the data looks like

So, let's think about what this data looks like, how it's structured, and what it is. After that, we can look at the best ways to organize it for future usefulness.

Time series?
Time,municipal water in T,solar heated water out T,solar tank bottom taped to side,solar tank top taped to side,array in/out,array in/out,tank room ambient t,array supply temperature,array return temperature,solar energy sensor,customer flow meter,customer OIML btu meter,solar collector array flow meter,solar collector array OIML btu meter,Cycle Count
Tue Mar 9 23:01:44 2010,14.7627064834,53.7822899383,12.1642527206,51.1436001456,6.40476190476,8.9582972583,22.6857033228,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333458
Tue Mar 9 23:02:44 2010,14.958038343,53.764889193,12.1642527206,51.0925345058,6.40476190476,8.85184138407,22.5716100982,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333462
Tue Mar 9 23:03:45 2010,15.1145934976,53.6986641192,12.1642527206,50.8692901812,6.40476190476,8.78519002979,22.5673674246,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333462
Tue Mar 9 23:04:45 2010,15.2512207824,53.5955190752,12.1642527206,50.8293877551,6.40476190476,8.78519002979,22.5652456306,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333468
Tue Mar 9 23:05:45 2010,15.3690229715,53.5534492867,12.1642527206,50.8293877551,6.40476190476,8.78519002979,22.5652456306,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333471
Tue Mar 9 23:06:46 2010,15.5253261193,53.5534492867,12.1642527206,50.8658228816,6.40476190476,8.78519002979,22.5652456306,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333472
Tue Mar 9 23:07:46 2010,15.6676270005,53.5534492867,12.1642527206,50.9177829276,6.40476190476,8.78519002979,22.5652456306,24.0728390462,22.293277114,0.0,0.0,0.0,0.0,0.0,333472
Tue Mar 9 23:08:47 2010,15.7915083121,53.4761516976,12.1642527206,50.8398031014,6.40476190476,8.78519002979,22.5652456306,24.0728390462,22.1826467404,0.0,0.0,0.0,0.0,0.0,333477
Tue Mar 9 23:09:47 2010,15.9763741003,53.693428918,12.1642527206,50.7859446809,6.40476190476,8.78519002979,22.5461357574,24.0728390462,22.1782915595,0.0,1.0,0.0,0.0,0.0,333581
Tue Mar 9 23:10:47 2010,16.1650984572,54.0547534088,12.1642527206,50.725,6.40476190476,8.78519002979,22.4544906773,24.0728390462,22.1782915595,0.0,0.0,0.0,0.0,0.0,333614




TIME SERIES DATA

So what is time series data?

Features, Over Time




Multi-dimensional features. What's fun in a business like this is that we're not really sure what the features we study will be. (Flexibility callout.)

[Figure: "Thing" (feature vector, v) plotted against Time (t)]

A couple of ideas:
sampling rates. "regularity". "completeness"
analog vs. digital
instantaneous vs. cumulative (tradeoffs)

[Figure: time axis marked t_n and t_n+1]

Finding known interesting ranges (definitely the most common)
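
Looking up a known range can be sketched as a pair of binary searches over sorted timestamps. This is only an in-memory illustration: the `samples` list (values rounded from the first column of the sample data) and the `in_range` helper are hypothetical, not part of the original pipeline.

```python
from bisect import bisect_left
from datetime import datetime

# Hypothetical in-memory series: (timestamp, value) pairs sorted by time.
samples = [
    (datetime(2010, 3, 9, 23, 1, 44), 14.76),
    (datetime(2010, 3, 9, 23, 2, 44), 14.96),
    (datetime(2010, 3, 9, 23, 3, 45), 15.11),
    (datetime(2010, 3, 9, 23, 4, 45), 15.25),
    (datetime(2010, 3, 9, 23, 5, 45), 15.37),
]

def in_range(samples, start, end):
    """Return the samples with start <= timestamp < end."""
    times = [t for t, _ in samples]
    lo = bisect_left(times, start)  # first index at or after start
    hi = bisect_left(times, end)    # first index at or after end
    return samples[lo:hi]

window = in_range(samples,
                  datetime(2010, 3, 9, 23, 2, 0),
                  datetime(2010, 3, 9, 23, 5, 0))
print([value for _, value in window])
```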
[Figure: y plotted against time, with a threshold y' and the times t, t', etc. marked]

Using features to find interesting ranges.

These two ways to look for things should inform our design decisions.
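
The second kind of search, using a feature to find ranges, might look like the sketch below: group consecutive samples by whether they exceed a threshold and keep the runs that do. The `ranges_above` helper and the toy `series` are assumptions made for illustration.

```python
from itertools import groupby

def ranges_above(series, threshold):
    """Return (start_time, end_time) pairs for each run where value > threshold."""
    ranges = []
    # groupby splits the series into consecutive runs sharing the same True/False key.
    for above, run in groupby(series, key=lambda point: point[1] > threshold):
        if above:
            run = list(run)
            ranges.append((run[0][0], run[-1][0]))
    return ranges

# Toy series of (time, value) pairs.
series = [(0, 1.0), (1, 3.5), (2, 4.0), (3, 2.0), (4, 5.0), (5, 5.5), (6, 0.5)]
print(ranges_above(series, threshold=3.0))
```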
(more complicated stuff can be thought of as transformations...)

e.g., frequency analysis, wavelets, whatever.

At this point, I go off and do a bunch of research on existing technologies. I really hate reinventing the wheel, and we really don't have the manpower to do it.

Time series specific tools

Scientific tools & libraries

Traditional data-warehousing approaches

So, these were some of the options I looked at. I want to quickly point out why I eliminated the first two classes of tools.

Time series specific tools

RRDtool -- Round Robin Database

There are surprisingly few of these. One of the best is RRDtool. It's pretty sweet, and I highly recommend it. Unfortunately, it's really designed for applications that are highly regular and already pretty digital: for instance, sampling latencies, or temperatures in a datacenter. It's not really good for unreliable sensors, nor is it really designed for long-term persistence. It also has really high lock-in, with legacy data formats, etc. Don't get me wrong, it's totally rad, but I didn't think it was for us.

Scientific tools & libraries

e.g., PyTables

Pretty cool, but not many of these were mature & ready for primetime. Some that were, like PyTables, didn't really match our business use case.

Traditional data-warehousing approaches



That leaves us with the traditional approaches. This represents a pretty well established field, but very few of the tools are free, lightweight, and mature.

Enterprise buzzwords
(Just google for OLAP)

But the biggest idea I learned is that most data warehousing revolves around the idea of a "fact table". They call it a "multidimensional OLAP cube", but basically it exists as a totally denormalized SQL table.

"Measures" and their "Dimensions"


(or facts)
pretty neat!

"how elegant!"

in practice...

(from "How to Build OLAP Application Using Mondrian + XMLA + SpagoBI")

to which the only acceptable response is:

ha! Yeah right.

Time series are not relational!

Even extracted features are not inherently relational!

Also: you don't know what you're looking for, you don't know when you'll find it, and you won't know when you'll have to start looking for something different.
Why would you lock yourself into a schema?

We don't know what we'll want to know.

We won't know what we want to know. Not only are we warehousing time series of multidimensional feature vectors, we don't even know the dimensions we'll be interested in yet!

natural fit for documents

This makes a schema-less database a natural fit for these sorts of things. Think about all the alter-table calls I've avoided...
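
As a rough illustration of why documents fit, a daily record can be assembled as a plain Python dict whose "measures" sub-document holds whatever was computed that day. The `daily_document` helper is hypothetical, and deriving the calendar fields with the standard `datetime` module is an assumption about the original pipeline, although the values it yields for 2010-08-06 agree with the sample document in this talk.

```python
from datetime import datetime

def daily_document(install_name, timestamp, measures):
    """Build a daily document shaped like the sample in this talk (hypothetical helper)."""
    return {
        "_id": {
            "install.name": install_name,
            "timestamp": timestamp,
            "frequency": "daily",
        },
        # Any set of measures is acceptable; no schema change is needed to add one.
        "measures": dict(measures),
        # Calendar fields derived from the timestamp.
        "year": timestamp.year,
        "month": timestamp.month,
        "day": timestamp.day,
        "day_of_year": timestamp.timetuple().tm_yday,
        "day_of_week": timestamp.weekday(),          # Monday == 0
        "week_of_year": timestamp.isocalendar()[1],  # ISO week number
    }

doc = daily_document("agni-3501", datetime(2010, 8, 6), {"Gallons Sold": 2260})
print(doc["day_of_year"], doc["day_of_week"], doc["week_of_year"])
```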
"_id" : {
                                "install.name" : "agni-3501",
                                "timestamp" : ISODate("2010-08-06T00:00:00Z"),
                                "frequency" : "daily" },
                        "measures" : {
                                "total-delta" : -85.78773442284201,
                                "Energy Sold" : 450087.1186574721,
                                "Generation" : 57273.159890170136,
                                "consumed-delta" : 12.569841951556597,
                                "lbs-sold" : 18848.4,
                                "Gallons Loop" : 740.5,
                                "Coincident Usage" : 400,
                                "Stored Energy" : 1306699.6439737699,
                                "Gallons Sold" : 2260,
                                "Energy Delivered" : 360069.6949259777,
                                "Total Usage" : -1605086.7261496289,
                                "Stratification" : -4.905050370111111,
                                "gen-delta-roof" : 4.819865854785763,
                                "lbs-loop" : 6520.1025 },
                        "day_of_year" : 218,
                        "day_of_week" : 4,
                        "month" : 8,
                        "week_of_year" : 31,
                        "install" : {
                                "panels" : 32,
                                "name" : "agni-3501",
                                "num_files" : "3744",
                                "heater_efficiency" : 0.8,
                                "storage" : 1612,
                                "install_completed" : ISODate("2010-08-06T00:00:00Z"),
                                "logger_type" : "emerald",
                                "_id" : ObjectId("4d2905536edfdb022f000212"),
                                "polysun_proj" : [
                                        22863.7, 24651.7, 30301.7,
                                        30053.5, 29640.5, 27806.4,
                                        27511, 28563.1, 27840.7,
                                        26470.9, 21718.9, 19145.4 ],
                                "last_seen" : "2011-01-08 05:26:35.352782" },
                        "year" : 2010,
                        "day" : 6
Isn't this better?

[The same document, annotated with the callouts "measures", "dimensions", and "...right?"]

Measures & dimensions. This would be a nice, clean division, except that it isn't. Frequently we'll look for measures by other measures, i.e., each measure serves as a dimension.

...actually, not a good model.

The line gets pretty blurry in practice. Multi-dimensional vectors mean every measure provides another dimension.
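
A small sketch of a measure acting as a dimension: select one measure on the days when another measure passes a test. The `docs` list is made-up data trimmed to two measures, and `measure_where` is a hypothetical helper.

```python
# Hypothetical daily documents, trimmed to two measures each.
docs = [
    {"day": 5, "measures": {"Gallons Sold": 1800, "Energy Delivered": 290000.0}},
    {"day": 6, "measures": {"Gallons Sold": 2260, "Energy Delivered": 360069.7}},
    {"day": 7, "measures": {"Gallons Sold": 950, "Energy Delivered": 150000.0}},
]

def measure_where(docs, wanted, by, predicate):
    """Return the `wanted` measure for each document whose `by` measure passes `predicate`."""
    return [
        d["measures"][wanted]
        for d in docs
        if by in d["measures"] and predicate(d["measures"][by])
    ]

# "Energy Delivered" on heavy-usage days: "Gallons Sold" is used as a dimension here.
print(measure_where(docs, "Energy Delivered", "Gallons Sold", lambda g: g > 1500))
```

Against a MongoDB collection of such documents, the same filter could presumably be written with dot notation into the sub-document, for example a `$gt` condition on `measures.Gallons Sold`.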
Anyway!

[The same daily document again]

How do we build these quickly & efficiently?

the goal: good numbers!



Remember, the goal here is to make it easy for analysts to get comparable numbers, so when I ask for the delivered energy for one system, compared to the delivered energy from another, I can just get the time-series data, without having to worry about whether sensors changed, when the network was out, when a logger was replaced with another one, etc.

So, the OLTP layer serving as our inputs essentially serves up timestamped data as CSV series. It doesn't really provide a lot of intelligence, and is basically the raw numbers.

from rows to columns

So, most of what our pipeline does is turn things from rows to columns, in a flexible, useful way. I'm gonna walk through that process, quickly.

[The same daily document, with the callout "Let's just look at one" next to its measures]

row-major data
Time,municipal water in T,solar heated water out T,solar tank bottom taped to side,solar tank top taped to side,array in/out,array in/out,tank room ambient t,array supply temperature,array return temperature,solar energy sensor,customer flow meter,customer OIML btu meter,solar collector array flow meter,solar collector array OIML btu meter,Cycle Count
Tue Mar 9 23:01:44 2010,14.7627064834,53.7822899383,12.1642527206,51.1436001456,6.40476190476,8.9582972583,22.6857033228,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333458
Tue Mar 9 23:02:44 2010,14.958038343,53.764889193,12.1642527206,51.0925345058,6.40476190476,8.85184138407,22.5716100982,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333462
Tue Mar 9 23:03:45 2010,15.1145934976,53.6986641192,12.1642527206,50.8692901812,6.40476190476,8.78519002979,22.5673674246,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333462
Tue Mar 9 23:04:45 2010,15.2512207824,53.5955190752,12.1642527206,50.8293877551,6.40476190476,8.78519002979,22.5652456306,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333468
Tue Mar 9 23:05:45 2010,15.3690229715,53.5534492867,12.1642527206,50.8293877551,6.40476190476,8.78519002979,22.5652456306,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333471
Tue Mar 9 23:06:46 2010,15.5253261193,53.5534492867,12.1642527206,50.8658228816,6.40476190476,8.78519002979,22.5652456306,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333472
Tue Mar 9 23:07:46 2010,15.6676270005,53.5534492867,12.1642527206,50.9177829276,6.40476190476,8.78519002979,22.5652456306,24.0728390462,22.293277114,0.0,0.0,0.0,0.0,0.0,333472
Tue Mar 9 23:08:47 2010,15.7915083121,53.4761516976,12.1642527206,50.8398031014,6.40476190476,8.78519002979,22.5652456306,24.0728390462,22.1826467404,0.0,0.0,0.0,0.0,0.0,333477
Tue Mar 9 23:09:47 2010,15.9763741003,53.693428918,12.1642527206,50.7859446809,6.40476190476,8.78519002979,22.5461357574,24.0728390462,22.1782915595,0.0,1.0,0.0,0.0,0.0,333581
Tue Mar 9 23:10:47 2010,16.1650984572,54.0547534088,12.1642527206,50.725,6.40476190476,8.78519002979,22.4544906773,24.0728390462,22.1782915595,0.0,0.0,0.0,0.0,0.0,333614
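
Turning row-major CSV like this into columns can be sketched with the standard `csv` module. The `raw` text is an abbreviated copy of the sample (three rows, three columns), and `rows_to_columns` is a hypothetical helper, not the pipeline's actual code.

```python
import csv
import io

# Three rows and three columns, abbreviated from the sample above.
raw = """Time,municipal water in T,Cycle Count
Tue Mar 9 23:01:44 2010,14.7627064834,333458
Tue Mar 9 23:02:44 2010,14.958038343,333462
Tue Mar 9 23:03:45 2010,15.1145934976,333462
"""

def rows_to_columns(text):
    """Turn row-major CSV text into a dict of column name -> list of cell strings."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = {name: [] for name in header}
    for row in reader:
        for name, cell in zip(header, row):
            columns[name].append(cell)
    return columns

columns = rows_to_columns(raw)
print(columns["Cycle Count"])
```

Note that the full header repeats the name "array in/out", so keying columns by name as done here would merge those two sensors; real input would need the duplicates disambiguated first.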




"Functional"
                        import functools

                        class Mass(BasicMeasure):
                            def __init__(self, density, volume):
                                ...

                                # Bind density and volume now; only `data` is supplied at call time.
                                self._result_func = functools.partial(
                                     lambda data, density, volume: density * volume(data),
                                     density=density, volume=volume)

                            def __call__(self, data):
                                return self._result_func(data)





quasi-functional classes that describe how to calculate a value from data.
"_id" : {
                                        "install.name" : "agni-3501",
                                        "timestamp" : ISODate("2010-08-06T00:00:00Z"),
                                        "frequency" : "daily" },
                                "measures" : {
                                        "total-delta" : -85.78773442284201,
                                        "Energy Sold" : 450087.1186574721,
                                        "Generation" : 57273.159890170136,
                                        "consumed-delta" : 12.569841951556597,




                                                        A formula:

                                                      E = ∆t × F
                        #pseudocode
                        class LoopEnergy(BasicMeasure):
                            def __init__(self, heat_cap, delta, mass):
                                ...
                                def result_func(data):
                                    return self.delta(data) * self.mass(data) * self.heat_cap
                                self._result_func = result_func

                            def __call__(self, data):
                                return self._result_func(data)
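
To make the two measure slides concrete, here is a minimal runnable sketch of how such classes might compose. The real `BasicMeasure` interface isn't shown in the talk, so the base class, the `Column` and `Delta` helpers, the row field names, and the numeric constants below are all assumptions made for illustration.

```python
# Hypothetical sketch: BasicMeasure's real interface is not shown in the talk.
class BasicMeasure:
    """A measure is a callable that computes one value from a data row."""

    def __call__(self, data):
        raise NotImplementedError


class Column(BasicMeasure):
    """Assumed helper: reads a single raw sensor value out of a row (a dict)."""

    def __init__(self, name):
        self.name = name

    def __call__(self, data):
        return data[self.name]


class Delta(BasicMeasure):
    """Assumed helper: difference between two other measures."""

    def __init__(self, high, low):
        self.high, self.low = high, low

    def __call__(self, data):
        return self.high(data) - self.low(data)


class Mass(BasicMeasure):
    """Mass = density * volume, where volume is itself a measure."""

    def __init__(self, density, volume):
        self.density, self.volume = density, volume

    def __call__(self, data):
        return self.density * self.volume(data)


class LoopEnergy(BasicMeasure):
    """Energy = temperature delta * mass * heat capacity."""

    def __init__(self, heat_cap, delta, mass):
        self.heat_cap, self.delta, self.mass = heat_cap, delta, mass

    def __call__(self, data):
        return self.delta(data) * self.mass(data) * self.heat_cap


# Illustrative constants for water (kg per litre, kJ per kg per kelvin).
WATER_DENSITY = 1.0
WATER_HEAT_CAP = 4.186

loop_energy = LoopEnergy(
    heat_cap=WATER_HEAT_CAP,
    delta=Delta(Column("out_temp_c"), Column("in_temp_c")),
    mass=Mass(WATER_DENSITY, Column("litres")),
)

# One made-up row of sensor readings.
row = {"in_temp_c": 15.0, "out_temp_c": 55.0, "litres": 10.0}
energy_kj = loop_energy(row)  # (55 - 15) * (1.0 * 10) * 4.186
```

Because every measure is just a callable taking a row, new formulas can be built by nesting existing ones rather than by changing a schema.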




Creating a Cube
                        For each install, for each chunk of data:

                            apply all known formulas to get values

                            make some convenience keys (e.g., day_of_year)

                            stuff it in mongo

                         Then, map/reduce to whatever dimensionalities you're
                         interested in: e.g., downsampling.





Here's some pseudocode for how to make a cube of multidimensional data.
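
The steps on the slide could be sketched in plain Python roughly as follows. This is only an illustration under assumptions: the `FORMULAS` table, the row field names, and the `build_facts` and `downsample_daily` helpers are hypothetical, the document shape only loosely follows the `_id`/`measures` example shown earlier, and an in-memory sum stands in for the map/reduce step. With pymongo, the resulting documents could be written with something like `collection.insert_many(facts)`.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical formulas: each maps a raw row (a dict) to one measure value.
FORMULAS = {
    "Gallons Sold": lambda row: row["flow_gal"],
    "temp-delta": lambda row: row["out_temp"] - row["in_temp"],
}


def build_facts(install_name, rows, frequency="hourly"):
    """Apply every known formula to each row and add convenience keys."""
    facts = []
    for row in rows:
        ts = row["timestamp"]
        facts.append({
            "_id": {"install": install_name, "timestamp": ts, "frequency": frequency},
            "install": {"name": install_name},
            "day_of_year": ts.timetuple().tm_yday,
            "day_of_week": ts.weekday(),  # Monday is 0, so 5 and 6 are the weekend
            "measures": {name: formula(row) for name, formula in FORMULAS.items()},
        })
    return facts


def downsample_daily(facts, measure):
    """Stand-in for map/reduce: key each fact by (install, date), then sum per key."""
    totals = defaultdict(float)
    for fact in facts:
        key = (fact["install"]["name"], fact["_id"]["timestamp"].date())  # "map"
        totals[key] += fact["measures"][measure]                          # "reduce"
    return dict(totals)


# Made-up hourly rows for one install.
rows = [
    {"timestamp": datetime(2010, 3, 9, 22), "flow_gal": 12.0, "in_temp": 15.0, "out_temp": 53.0},
    {"timestamp": datetime(2010, 3, 9, 23), "flow_gal": 8.0, "in_temp": 16.0, "out_temp": 54.0},
    {"timestamp": datetime(2010, 3, 10, 0), "flow_gal": 5.0, "in_temp": 16.0, "out_temp": 52.0},
]
facts = build_facts("agni-3501", rows)
daily = downsample_daily(facts, "Gallons Sold")
```

Each install's chunk is processed independently here, which is what makes the cube-building step easy to run in parallel.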
So, what's the payoff?
How much water did
                         [x] use, monthly?
                > db.facts_monthly.find({"install.name": [foo]}, {"measures.Gallons Sold": 1}).sort({"_id": 1})





Complicated analytical queries can be boiled down to nearly single-line Mongo queries.
Here are some examples:
What were our highest
                    production days?
                > db.facts_daily.find({}, {"measures.Energy Sold": 1}).sort({"measures.Energy Sold": -1})




How does the distribution of [x]
                 on the weekend compare to its
                  distribution on the weekdays?
                > weekends = db.facts_daily.find({"day_of_week": {$in: [5,6]}})
                > weekdays = db.facts_daily.find({"day_of_week": {$nin: [5,6]}})
                > do stuff
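
One way the "do stuff" step might look, sketched in Python. It assumes the query results have already been read into lists of fact documents (with pymongo, `list(collection.find(...))`), that `day_of_week` follows Python's `weekday()` convention where 5 and 6 are Saturday and Sunday, and that a simple mean/median/standard-deviation summary is what's wanted. The `summarize` and `split_by_weekend` helpers and the sample values are made up.

```python
import statistics


def summarize(facts, measure):
    """Return basic distribution statistics for one measure across facts."""
    values = [f["measures"][measure] for f in facts if measure in f["measures"]]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


def split_by_weekend(facts):
    """Partition facts the same way the $in / $nin queries do."""
    weekends = [f for f in facts if f.get("day_of_week") in (5, 6)]
    weekdays = [f for f in facts if f.get("day_of_week") not in (5, 6)]
    return weekends, weekdays


# Made-up sample facts, shaped like documents in facts_daily.
sample = [
    {"day_of_week": 0, "measures": {"Gallons Sold": 100.0}},
    {"day_of_week": 2, "measures": {"Gallons Sold": 120.0}},
    {"day_of_week": 4, "measures": {"Gallons Sold": 110.0}},
    {"day_of_week": 5, "measures": {"Gallons Sold": 60.0}},
    {"day_of_week": 6, "measures": {"Gallons Sold": 80.0}},
]
weekends, weekdays = split_by_weekend(sample)
weekend_stats = summarize(weekends, "Gallons Sold")
weekday_stats = summarize(weekdays, "Gallons Sold")
```

Comparing `weekend_stats` with `weekday_stats` gives a first rough look at how the two distributions differ before reaching for anything heavier.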




What's the production of installs north of a certain
                        latitude, with a certain class of panel, on Tuesdays?

                        For hours where the average delivered temperature
                        delta was above [x], what was our generation
                        efficiency?

                        Normalize by number of panels? (map/reduce)

                        Normalize by distance from equinox? (map/reduce)

                        ...etc.
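
As an illustration of the "normalize by number of panels" idea, here is a small Python sketch of the map and reduce steps. The `panel_count` field, the sample values, and the helper names are assumptions; in MongoDB the same shape of logic would be written as JavaScript map and reduce functions.

```python
from collections import defaultdict


def map_fact(fact):
    """Map step: emit (install name, generation per panel) for one fact."""
    per_panel = fact["measures"]["Generation"] / fact["install"]["panel_count"]
    return fact["install"]["name"], per_panel


def reduce_values(values):
    """Reduce step: combine all per-panel values emitted for one key."""
    return sum(values)


def map_reduce(facts):
    """Group mapped values by key, then reduce each group to a single value."""
    grouped = defaultdict(list)
    for fact in facts:
        key, value = map_fact(fact)
        grouped[key].append(value)
    return {key: reduce_values(values) for key, values in grouped.items()}


# Made-up facts for two installs with different panel counts.
facts = [
    {"install": {"name": "a", "panel_count": 10}, "measures": {"Generation": 500.0}},
    {"install": {"name": "a", "panel_count": 10}, "measures": {"Generation": 300.0}},
    {"install": {"name": "b", "panel_count": 4}, "measures": {"Generation": 200.0}},
]
per_panel_totals = map_reduce(facts)
```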



• Building a cube can be done in parallel
                        • Map/reduce is an easy way to think about
                          transforms.

                        • Not maximally efficient, but parallelizes on
                          commodity hardware.




Some advantages.
re #3 -- so what? It's not a webapp.
mongoDB:
                        The future of enterprise
                         business intelligence.
                           (they just don't know it yet)




So, here's my thesis:
document-databases are far superior to relational databases for business intelligence cases.
Not only that, but mongoDB and some common sense let you replace multimillion-dollar
IBM-level enterprise solutions with open-source awesomeness. All this in a rapid, agile way.
Lastly...



Mongo expands in an
                           organization.


it's cool, don't fight it. Once we started using it for our analytics, we realized there was a lot
of other schema-loose data that we could use it for -- like the definitions of the measures
themselves, or the details about an install, etc., etc.
Final Thoughts



Ok, i want to close up with a few jumping-off points.
"Business Intelligence"
                          no longer requires
                              megabucks


Flexible tools mean
                 business responsiveness
                      should be easy


"Scaling" doesn't just
                          mean depth-first.



businesses grow deep, in the sense of adding more users, but they also grow broad.
Questions?



Epilogue
                        Quest for Logging Hardware




This'll be easy!
        This is such an obvious and well
          explored problem space, i'm
           sure we'll be able to find a
        solution that matches our needs
           without breaking the bank!




Shopping List!
           16 temperature sensors
                4 flow sensors
        maybe some miscellaneous ones
              internet backhaul
           no software/data lock in.




Conventions
                  FTW!
        And since we've walked a couple
         convention floors and product
         catalogs from major industrial
         supply vendors, i'm sure it's in
               here somewhere!




derp derp
                    "internet"?
        I'm sure there's a reason why all
        of these loggers have to connect
                    via USB...
                         Pace Scientific XR5:
                              8 analog
                               3 pulse
                              ONE MB
                            no internet?
                               $500?!?



yay windows?
            ...and require proprietary
              (windows!) software or
         subscription plans that route my
            data through their servers

                        (basically all of them!)



Maybe the gov't
          can help!
           Perhaps there's some kind of
          standard that the governments
              require for solar thermal
             monitoring systems to be
            eligible for incentives or tax
                        credits.



Vive la France!
              An obscure standard by the
                   Organisation
                Internationale de
                Métrologie Légale
                   appears! Neat!




A "Certified"
                  Logger
                 two temperature sensors
                         one pulse
                  no increase in accuracy
                  no data backhaul -- at all
                             ...
                     what's the price?



$1,000




$1,000




Hmm...
            I can solder, and arduinos are
                     pretty cheap




It's on!




arduino + netbook!
TL; DR:
                        Existing loggers
                          are terrible.


Also, existing industries aren't really ready for rapid prototyping and its destructive effects.
•   http://www.flickr.com/photos/rknight/4358119571/

•   http://4.bp.blogspot.com/_8vNzwxlohg0/TJoUWqsF4LI/AAAAAAAABMg/QaUiKwCEZn8/s320/turtles-all-the-way-down.jpg

•   http://www.flickr.com/photos/rhk313/3801302914/

•   http://www.flickr.com/photos/benny_lin/481411728/

•   http://spagobi.blogspot.com/2010_08_01_archive.html

•   http://community.qlikview.com/forums/t/37106.aspx


Sunday, July 24, 2011

More Related Content

What's hot

Data Distribution and Ordering for Efficient Data Source V2
Data Distribution and Ordering for Efficient Data Source V2Data Distribution and Ordering for Efficient Data Source V2
Data Distribution and Ordering for Efficient Data Source V2Databricks
Ā 
Getting the Scylla Shard-Aware Drivers Faster
Getting the Scylla Shard-Aware Drivers FasterGetting the Scylla Shard-Aware Drivers Faster
Getting the Scylla Shard-Aware Drivers FasterScyllaDB
Ā 
Pegasus: Designing a Distributed Key Value System (Arch summit beijing-2016)
Pegasus: Designing a Distributed Key Value System (Arch summit beijing-2016)Pegasus: Designing a Distributed Key Value System (Arch summit beijing-2016)
Pegasus: Designing a Distributed Key Value System (Arch summit beijing-2016)궛 吓
Ā 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...Altinity Ltd
Ā 
Fast Data ā€“ Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
Fast Data ā€“ Fast Cars: Wie Apache Kafka die Datenwelt revolutioniertFast Data ā€“ Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
Fast Data ā€“ Fast Cars: Wie Apache Kafka die Datenwelt revolutioniertconfluent
Ā 
Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Jay Patel
Ā 
Spark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim LauSpark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim LauSpark Summit
Ā 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureKai WƤhner
Ā 
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...HostedbyConfluent
Ā 
Distributed Locking in Kubernetes
Distributed Locking in KubernetesDistributed Locking in Kubernetes
Distributed Locking in KubernetesRafał Leszko
Ā 
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, PresetStreaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, PresetHostedbyConfluent
Ā 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
Ā 
Integrating Apache Kafka and Elastic Using the Connect Framework
Integrating Apache Kafka and Elastic Using the Connect FrameworkIntegrating Apache Kafka and Elastic Using the Connect Framework
Integrating Apache Kafka and Elastic Using the Connect Frameworkconfluent
Ā 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
Ā 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBaseå¼ŗ ēŽ‹
Ā 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!Guido Schmutz
Ā 
Hybrid Columnar Compression in a non-Exadata System
Hybrid Columnar Compression in a non-Exadata SystemHybrid Columnar Compression in a non-Exadata System
Hybrid Columnar Compression in a non-Exadata SystemEnkitec
Ā 
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKai WƤhner
Ā 
Wide Column Store NoSQL vs SQL Data Modeling
Wide Column Store NoSQL vs SQL Data ModelingWide Column Store NoSQL vs SQL Data Modeling
Wide Column Store NoSQL vs SQL Data ModelingScyllaDB
Ā 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in NetflixDanny Yuan
Ā 

What's hot (20)

Data Distribution and Ordering for Efficient Data Source V2
Data Distribution and Ordering for Efficient Data Source V2Data Distribution and Ordering for Efficient Data Source V2
Data Distribution and Ordering for Efficient Data Source V2
Ā 
Getting the Scylla Shard-Aware Drivers Faster
Getting the Scylla Shard-Aware Drivers FasterGetting the Scylla Shard-Aware Drivers Faster
Getting the Scylla Shard-Aware Drivers Faster
Ā 
Pegasus: Designing a Distributed Key Value System (Arch summit beijing-2016)
Pegasus: Designing a Distributed Key Value System (Arch summit beijing-2016)Pegasus: Designing a Distributed Key Value System (Arch summit beijing-2016)
Pegasus: Designing a Distributed Key Value System (Arch summit beijing-2016)
Ā 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
Ā 
Fast Data ā€“ Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
Fast Data ā€“ Fast Cars: Wie Apache Kafka die Datenwelt revolutioniertFast Data ā€“ Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
Fast Data ā€“ Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
Ā 
Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012
Ā 
Spark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim LauSpark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim Lau
Ā 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Ā 
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Ā 
Distributed Locking in Kubernetes
Distributed Locking in KubernetesDistributed Locking in Kubernetes
Distributed Locking in Kubernetes
Ā 
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, PresetStreaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Ā 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Ā 
Integrating Apache Kafka and Elastic Using the Connect Framework
Integrating Apache Kafka and Elastic Using the Connect FrameworkIntegrating Apache Kafka and Elastic Using the Connect Framework
Integrating Apache Kafka and Elastic Using the Connect Framework
Ā 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
Ā 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
Ā 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
Ā 
Hybrid Columnar Compression in a non-Exadata System
Hybrid Columnar Compression in a non-Exadata SystemHybrid Columnar Compression in a non-Exadata System
Hybrid Columnar Compression in a non-Exadata System
Ā 
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Ā 
Wide Column Store NoSQL vs SQL Data Modeling
Wide Column Store NoSQL vs SQL Data ModelingWide Column Store NoSQL vs SQL Data Modeling
Wide Column Store NoSQL vs SQL Data Modeling
Ā 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in Netflix
Ā 

Viewers also liked

MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB
Ā 
MongoDB for Time Series Data
MongoDB for Time Series DataMongoDB for Time Series Data
MongoDB for Time Series DataMongoDB
Ā 
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB
Ā 
MongoDB for Time Series Data: Schema Design
MongoDB for Time Series Data: Schema DesignMongoDB for Time Series Data: Schema Design
MongoDB for Time Series Data: Schema DesignMongoDB
Ā 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB
Ā 
MongoDB for Time Series Data: Setting the Stage for Sensor Management
MongoDB for Time Series Data: Setting the Stage for Sensor ManagementMongoDB for Time Series Data: Setting the Stage for Sensor Management
MongoDB for Time Series Data: Setting the Stage for Sensor ManagementMongoDB
Ā 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB
Ā 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation FrameworkMongoDB
Ā 
Using MongoDB As a Tick Database
Using MongoDB As a Tick DatabaseUsing MongoDB As a Tick Database
Using MongoDB As a Tick DatabaseMongoDB
Ā 
Time series storage in Cassandra
Time series storage in CassandraTime series storage in Cassandra
Time series storage in CassandraEric Evans
Ā 
MongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB
Ā 
Data Modeling IoT and Time Series data in NoSQL
Data Modeling IoT and Time Series data in NoSQLData Modeling IoT and Time Series data in NoSQL
Data Modeling IoT and Time Series data in NoSQLBasho Technologies
Ā 
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with TableauWebinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with TableauMongoDB
Ā 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBMongoDB
Ā 
Big Data, NoSQL with MongoDB and Cassasdra
Big Data, NoSQL with MongoDB and CassasdraBig Data, NoSQL with MongoDB and Cassasdra
Big Data, NoSQL with MongoDB and CassasdraBrian Enochson
Ā 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLMongoDB
Ā 
Resilience an engineering construction perspective
Resilience an engineering construction perspectiveResilience an engineering construction perspective
Resilience an engineering construction perspectiveBob Prieto
Ā 
International Journal of Industrial Engineering and Design vol 2 issue 1
International Journal of Industrial Engineering and Design vol 2 issue 1International Journal of Industrial Engineering and Design vol 2 issue 1
International Journal of Industrial Engineering and Design vol 2 issue 1JournalsPub www.journalspub.com
Ā 
Con8862 no sql, json and time series data
Con8862   no sql, json and time series dataCon8862   no sql, json and time series data
Con8862 no sql, json and time series dataAnuj Sahni
Ā 

Viewers also liked (20)

MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
Ā 
MongoDB for Time Series Data
MongoDB for Time Series DataMongoDB for Time Series Data
MongoDB for Time Series Data
Ā 
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
Ā 
MongoDB for Time Series Data: Schema Design
MongoDB for Time Series Data: Schema DesignMongoDB for Time Series Data: Schema Design
MongoDB for Time Series Data: Schema Design
Ā 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: Sharding
Ā 
MongoDB for Time Series Data: Setting the Stage for Sensor Management
MongoDB for Time Series Data: Setting the Stage for Sensor ManagementMongoDB for Time Series Data: Setting the Stage for Sensor Management
MongoDB for Time Series Data: Setting the Stage for Sensor Management
Ā 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
Ā 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
Ā 
Using MongoDB As a Tick Database
Using MongoDB As a Tick DatabaseUsing MongoDB As a Tick Database
Using MongoDB As a Tick Database
Ā 
Time series storage in Cassandra
Time series storage in CassandraTime series storage in Cassandra
Time series storage in Cassandra
Ā 
MongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB Tick Data Presentation
MongoDB Tick Data Presentation
Ā 
Data Modeling IoT and Time Series data in NoSQL
Data Modeling IoT and Time Series data in NoSQLData Modeling IoT and Time Series data in NoSQL
Data Modeling IoT and Time Series data in NoSQL
Ā 
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with TableauWebinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Ā 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDB
Ā 
Big Data, NoSQL with MongoDB and Cassasdra
Big Data, NoSQL with MongoDB and CassasdraBig Data, NoSQL with MongoDB and Cassasdra
Big Data, NoSQL with MongoDB and Cassasdra
Ā 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQL
Ā 
Resilience an engineering construction perspective
Resilience an engineering construction perspectiveResilience an engineering construction perspective
Resilience an engineering construction perspective
Ā 
Riak TS
Riak TSRiak TS
Riak TS
Ā 
International Journal of Industrial Engineering and Design vol 2 issue 1
International Journal of Industrial Engineering and Design vol 2 issue 1International Journal of Industrial Engineering and Design vol 2 issue 1
International Journal of Industrial Engineering and Design vol 2 issue 1
Ā 
Con8862 no sql, json and time series data
Con8862   no sql, json and time series dataCon8862   no sql, json and time series data
Con8862 no sql, json and time series data
Ā 

Similar to Time Series Data Storage in MongoDB

Operations as a Strategic Weapon
Operations as a Strategic WeaponOperations as a Strategic Weapon
Operations as a Strategic WeaponJohn Willis
Ā 
IT-enabled Business Innovation Workshop 8 July 2011
IT-enabled Business Innovation Workshop 8 July 2011IT-enabled Business Innovation Workshop 8 July 2011
IT-enabled Business Innovation Workshop 8 July 2011Lead & Transform
Ā 
Drupal as a winning Web Platform
Drupal as a winning Web PlatformDrupal as a winning Web Platform
Drupal as a winning Web PlatformChapter Three
Ā 
Building an experimentation framework
Building an experimentation frameworkBuilding an experimentation framework
Building an experimentation frameworkzsqr
Ā 
SplunkLive New York 2011: DealerTrack
SplunkLive New York 2011: DealerTrackSplunkLive New York 2011: DealerTrack
SplunkLive New York 2011: DealerTrackSplunk
Ā 
How to Recruit and Retain Top Talent - Insight into Building a Stellar Team
How to Recruit and Retain Top Talent - Insight into Building a Stellar TeamHow to Recruit and Retain Top Talent - Insight into Building a Stellar Team
How to Recruit and Retain Top Talent - Insight into Building a Stellar TeamGlenn Hilton
Ā 
How to Recruit and Retain Top Talent in the Drupal Community
How to Recruit and Retain Top Talent in the Drupal CommunityHow to Recruit and Retain Top Talent in the Drupal Community
How to Recruit and Retain Top Talent in the Drupal CommunityMediacurrent
Ā 
Lean UX Principles in Practice (Zach Larson on SideReel's iOS App)
Lean UX Principles in Practice (Zach Larson on SideReel's iOS App)Lean UX Principles in Practice (Zach Larson on SideReel's iOS App)
Lean UX Principles in Practice (Zach Larson on SideReel's iOS App)Balanced Team
Ā 
MLUC 2011 XQuery Enigma
MLUC 2011 XQuery EnigmaMLUC 2011 XQuery Enigma
MLUC 2011 XQuery EnigmaPeter O'Kelly
Ā 
Wibiya founders at The Junction
Wibiya founders at The JunctionWibiya founders at The Junction
Wibiya founders at The JunctionDaniel Tal
Ā 
CMS Expo 2011 - Social Drupal
CMS Expo 2011 - Social DrupalCMS Expo 2011 - Social Drupal
CMS Expo 2011 - Social DrupalBlake Hall
Ā 
Promise notes
Promise notesPromise notes
Promise notesCS, NcState
Ā 
Web heresies
Web heresiesWeb heresies
Web heresiesJames Aylett
Ā 
LISA 2011 Keynote: The DevOps Transformation
LISA 2011 Keynote: The DevOps TransformationLISA 2011 Keynote: The DevOps Transformation
LISA 2011 Keynote: The DevOps Transformationbenrockwood
Ā 
SecurityBSides las vegas - Agnitio
SecurityBSides las vegas - AgnitioSecurityBSides las vegas - Agnitio
SecurityBSides las vegas - AgnitioSecurity Ninja
Ā 
2011 july-gtug-high-replication-datastore
2011 july-gtug-high-replication-datastore2011 july-gtug-high-replication-datastore
2011 july-gtug-high-replication-datastoreikailan
Ā 

Similar to Time Series Data Storage in MongoDB (20)

Operations as a Strategic Weapon
Operations as a Strategic WeaponOperations as a Strategic Weapon
Operations as a Strategic Weapon
Ā 
IT-enabled Business Innovation Workshop 8 July 2011
IT-enabled Business Innovation Workshop 8 July 2011IT-enabled Business Innovation Workshop 8 July 2011
IT-enabled Business Innovation Workshop 8 July 2011
Ā 
Drupal as a winning Web Platform
Drupal as a winning Web PlatformDrupal as a winning Web Platform
Time Series Data Storage in MongoDB

  • 2. ajackson @ skylineinnovations.com Sunday, July 24, 2011
  • 3. a tale of rapid prototyping, data warehousing, solar power, an architecture designed for data analysis at "scale" ...and arduinos! So here's what I'd like to talk about: who we are, how we got started, and most importantly, how we've been able to use MongoDB to help us. We're not a traditional startup -- and while I know that this is not a "startups" talk, but a Mongo one, I'd like to show how Mongo's flexible nature really helped us as a business, and how Mongo specifically has been a good choice for us as we build some of our tools. Here are some themes:
  • 4. Scaling. Mongo has come to have a pretty strong association with the word "scaling." Scaling is a word we throw around a lot, and it almost always means "software performance, as inputs grow by orders of magnitude." But scaling also means performance as the variety of inputs increases. I'd argue that it's scaling to go from 10 users to 10,000, and it's also scaling to go from ten 'kinds' of input to a hundred. There's another word for this.
  • 5. Scaling, Flexibility. Particularly when you scale in the real world, you start to find that it's complicated and messy and entropic in ways that software isn't always equipped to handle. So for us, when we say "mongo helps us scale", we don't necessarily mean scaling to petabytes of data. We'll come back to this as well.
  • 6. Business-first development. This generally means flexible, lightweight processes. Things that become fixed & unchangeable quickly become obsolete and sad :'(
  • 7. When Does "Context" become "Yak Shaving"? When I read new things or hear about new stuff, I'm always trying to put it in context. So, sometimes I put too much context in my talks :( To avoid it, I sometimes go a little too fast over the context that *is* important. So please stop me to ask questions! Also, the problem domain here is a little different than what we might be used to, so bear with me as we go into plumbing & construction.
  • 10. Project Development + Technology
  • 12. finance, develop, and operate renewable energy and efficiency installations, for measurable, guaranteed savings.
  • 13. (slide 12 again) We'll pay to put stuff on your roof, and we'll keep it maximally awesome.
  • 14. (slide 12 again) Right now, this means solar thermal, more efficient lighting retrofits, and maybe HVAC.
  • 15. (slide 12 again) So, here's the interesting part. Since we put stuff on your roof for free, we need to get that money back. What we do is, we'll charge you for the energy that it saved you, but here's the twist. Other companies have done similar things, where they say "we'll pay for a system/retrofit/whatever, and you'll agree to pay us an arbitrary number, and we say you'll get savings, but you won't actually be able to tell, really." That always seemed sketchy to us. So, we actually measure the performance of this stuff, collect the data, and guarantee that you save money.
  • 18. Why solar thermal? Why hasn't anyone else done this before? Pivots? Iterations? What's the market size? Funding? Capital structures? Wait, how do you guys make money? (Notes: Oh, right, this isn't a startup talk. But feel free to ask me these later!)
  • 19. Solar Thermal in Five Minutes ( mongo next, I promise! )
  • 20. Municipal => Roof => Tank => Customer
  • 21. Relevant Data to Track
  • 22. Temperatures (about a dozen)
  • 23. Flow Rates (at least two)
  • 24. Parallel data streams (hopefully many), e.g., weather data, insolation data. It'd be nice if we didn't have to collect it all ourselves.
  • 25. how much data? 20 data points @ 4 bytes, at 1 minute intervals, at 1000 projects (I wish!), for 10 years: 80 * 60 * 24 * 365 * 10 * 1000 = 400 GB? (about 420 GB if you multiply it out) ...not much, really, "in the raw". Unfortunately, we can't really store it with maximal efficiency, because of things like timestamps, metadata, etc., but still.
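To make the slide's back-of-the-envelope estimate easy to check, here is a small sketch of the same multiplication. The constant and function names are mine, not the talk's; the inputs are the ones on the slide.

```python
# Raw storage estimate from slide 25 (names are illustrative, inputs are the slide's).
BYTES_PER_POINT = 4          # "@ 4 bytes"
POINTS_PER_SAMPLE = 20       # "20 data points"
SAMPLES_PER_DAY = 60 * 24    # one sample per minute
DAYS_PER_YEAR = 365
YEARS = 10
PROJECTS = 1000              # "(I wish!)"


def raw_bytes(projects: int = PROJECTS, years: int = YEARS) -> int:
    """Bytes needed for the bare readings, ignoring timestamps and metadata."""
    per_sample = BYTES_PER_POINT * POINTS_PER_SAMPLE   # 80 bytes
    return per_sample * SAMPLES_PER_DAY * DAYS_PER_YEAR * years * projects


total = raw_bytes()
print(f"{total:,} bytes")
print(f"{total / 1e9:.1f} GB (decimal), {total / 2**30:.1f} GiB (binary)")
```

Either way you count it, the raw readings land in the few-hundred-gigabyte range, which is the slide's point: the volume is modest, and the overhead comes from timestamps and metadata.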
  • 26. I hope this provides enough context on the business problems we're trying to solve. It looks like we'll need a data pipeline, and we'll need one fast. We've got data that we'll need to use to build, monitor, and monetize these energy technologies. Having worked at other smart grid companies before, I've seen some good data pipelines and some bad data pipelines. I'd like to build a good one. The less stuff I have to build, the better.
  • 27. As I do some research, I find that a lot of these data pipelines have a few well-defined areas of responsibility.
  • 28. Acquisition, Storage, Search, Retrieval, Analytics. These should be self-explanatory. What's interesting is that although most of the end users of the system are analysts, interested in analyzing, most systems seem to be designed for the other functionality. More importantly, they're not very well decoupled: by the time the analysts get to start building tools, the design decisions from the beginning are inextricable from the systems that came before.
  • 29. (slide 28, annotated) Acquisition, Storage, Search, Retrieval: designed for these. Analytics: users are here.
  • 30. (slide 28 again) It's important to remember that, while you can't get good analytics without the other stuff, the analytics is where almost all of the value is! Search & retrieval are approaching "solved".
  • 31. (slide 29, annotated further) Analytics: users are here, and business value is here!
  • 32. So, here's how I started thinking about things. This is a design diagram from the early days of the company.
  • 33. Easy, python, no problem. There are some interesting topics here, but they're not MongoDB related. I was pretty sure I knew how to build this part, and I was pretty sure I knew what the data would look like.
  • 34. This part was also easy -- e-mail reports, CSVs, maybe some fancy graphs, possibly some light webapps for internal use. These would be dictated by business goals first, but the technological questions were straightforward.
  • 35. Here was the real question. What would a good experience look like for an analyst? What would they expect the tools to do?
  • 36. Now we can think about what the data looks like. So, let's think about what this data looks like, how it's structured and what it is. Then, after that, we can look at the best ways to organize it for future usefulness.
  • 37. Time series?
    Time,municipal water in T,solar heated water out T,solar tank bottom taped to side,solar tank top taped to side,array in/out,array in/out,tank room ambient t,array supply temperature,array return temperature,solar energy sensor,customer flow meter,customer OIML btu meter,solar collector array flow meter,solar collector array OIML btu meter,Cycle Count
    Tue Mar 9 23:01:44 2010,14.7627064834,53.7822899383,12.1642527206,51.1436001456,6.40476190476,8.9582972583,22.6857033228,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333458
    Tue Mar 9 23:02:44 2010,14.958038343,53.764889193,12.1642527206,51.0925345058,6.40476190476,8.85184138407,22.5716100982,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333462
    Tue Mar 9 23:03:45 2010,15.1145934976,53.6986641192,12.1642527206,50.8692901812,6.40476190476,8.78519002979,22.5673674246,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333462
    Tue Mar 9 23:04:45 2010,15.2512207824,53.5955190752,12.1642527206,50.8293877551,6.40476190476,8.78519002979,22.5652456306,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333468
    Tue Mar 9 23:05:45 2010,15.3690229715,53.5534492867,12.1642527206,50.8293877551,6.40476190476,8.78519002979,22.5652456306,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333471
    Tue Mar 9 23:06:46 2010,15.5253261193,53.5534492867,12.1642527206,50.8658228816,6.40476190476,8.78519002979,22.5652456306,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333472
    Tue Mar 9 23:07:46 2010,15.6676270005,53.5534492867,12.1642527206,50.9177829276,6.40476190476,8.78519002979,22.5652456306,24.0728390462,22.293277114,0.0,0.0,0.0,0.0,0.0,333472
    Tue Mar 9 23:08:47 2010,15.7915083121,53.4761516976,12.1642527206,50.8398031014,6.40476190476,8.78519002979,22.5652456306,24.0728390462,22.1826467404,0.0,0.0,0.0,0.0,0.0,333477
    Tue Mar 9 23:09:47 2010,15.9763741003,53.693428918,12.1642527206,50.7859446809,6.40476190476,8.78519002979,22.5461357574,24.0728390462,22.1782915595,0.0,1.0,0.0,0.0,0.0,333581
    Tue Mar 9 23:10:47 2010,16.1650984572,54.0547534088,12.1642527206,50.725,6.40476190476,8.78519002979,22.4544906773,24.0728390462,22.1782915595,0.0,0.0,0.0,0.0,0.0,333614
  • 38. TIME SERIES DATA. So what is time series data?
  • 39. Features, Over Time. Multi-dimensional features. What's fun in a business like this is that we're not really sure what the features we study will be. -- Flexibility callout
  • 40. (slide 39, with axes added) Thing (feature vector, v) against Time (t).
  • 42. A couple of ideas: sampling rates; "regularity"; "completeness"; analog vs. digital; instantaneous vs. cumulative (tradeoffs).
  • 43. t_n, t_{n+1}. Finding known interesting ranges (definitely the most common).
  • 45. t, t', etc. Using features to find interesting ranges. These two ways to look for things should inform our design decisions.
  • 46. (slide 45, with a y axis added)
  • 47. (slide 46, with thresholds y, y' added)
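The two lookups described on these slides (pulling a known time range, and using a feature threshold to discover ranges) can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the function names and the sample readings are invented, and nothing here is taken from the speaker's code.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (time t, feature value y)


def in_time_range(samples: Sequence[Sample], t_start: float, t_end: float) -> List[Sample]:
    """Known interesting range: keep samples with t_start <= t <= t_end."""
    return [(t, y) for t, y in samples if t_start <= t <= t_end]


def ranges_above(samples: Sequence[Sample], threshold: float) -> List[Tuple[float, float]]:
    """Feature-driven ranges: (first t, last t) of each run where y > threshold."""
    ranges = []
    start = None   # time at which the current run began, if any
    last_t = None  # time of the previous sample
    for t, y in samples:
        if y > threshold and start is None:
            start = t
        elif y <= threshold and start is not None:
            ranges.append((start, last_t))
            start = None
        last_t = t
    if start is not None:  # a run that is still open at the end of the data
        ranges.append((start, last_t))
    return ranges


# Invented readings, one per time step.
readings = [(0, 1.0), (1, 5.0), (2, 6.0), (3, 2.0), (4, 7.0)]
print(in_time_range(readings, 1, 3))   # the samples at t = 1, 2, 3
print(ranges_above(readings, 4.0))     # runs where y exceeds 4.0
```

The first function only needs the data ordered or indexed by time; the second has to look at the values themselves, which is why the talk says both access patterns should shape the storage design.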
  • 49. (more complicated stuff can be thought of as transformations...) e.g., frequency analysis, wavelets, whatever.
  • 50. At this point, I go off and do a bunch of research on existing technologies. I really hate reinventing the wheel, and we really don't have the manpower.
  • 51. Time series specific tools; Scientific tools & libraries; Traditional data-warehousing approaches. So, these were some of the options I looked at. I want to quickly point out why I eliminated the first two classes of tools.
  • 52. Time series specific tools: RRDtool -- Round Robin Database. There are surprisingly few of these. One of the best is RRDtool. It's pretty sweet, and I highly recommend it. Unfortunately, it's really designed for applications that are highly regular, and that are already pretty digital, for instance, sampling latencies, or temperatures in a datacenter. It's not really good for unreliable sensors, nor is it really designed for long-term persistence. It also has really high lock-in, with legacy data formats, etc. Don't get me wrong, it's totally rad, but I didn't think it was for us.
  • 53. Scientific tools & libraries, e.g., PyTables. Pretty cool, but not many of these were mature & ready for primetime. Some that were, like PyTables, didn't really match our business use-case.
  • 54. Traditional data-warehousing approaches. So, these were some of the options I looked at. I want to quickly point out why I eliminated the first two classes of tools. [...]. That leaves us with the traditional approaches. This represents a pretty well established field, but very few of the tools are free, lightweight, and mature.
  • 55. Enterprise buzzwords (Just google for OLAP). But the biggest idea I learned is that most data warehousing revolves around the idea of a "fact table". They call it a "multidimensional OLAP cube", but basically it exists as a totally denormalized SQL table.
  • 56. "Measures" and their "Dimensions" (or facts)
  • 61. (from "How to Build OLAP Application Using Mondrian + XMLA + SpagoBI") to which the only acceptable response is:
  • 62. ha! Yeah right.
  • 63. Time series are not relational! Even extracted features are not inherently relational! Also: you don't know what you're looking for, you don't know when you'll find it, you won't know when you'll have to start looking for something different. Why would you lock yourself into a schema?
  • 64. We don't know what we'll want to know. Not only are we warehousing time-series of multidimensional feature vectors, we don't even know the dimensions we'll be interested in yet!
  • 65. natural fit for documents. This makes a schema-less database a natural fit for these sorts of things. Think about all the alter-table calls I've avoided...
  • 66. { "_id" : { "install.name" : "agni-3501", "timestamp" : ISODate("2010-08-06T00:00:00Z"), "frequency" : "daily" }, "measures" : { "total-delta" : -85.78773442284201, "Energy Sold" : 450087.1186574721, "Generation" : 57273.159890170136, "consumed-delta" : 12.569841951556597, "lbs-sold" : 18848.4, "Gallons Loop" : 740.5, "Coincident Usage" : 400, "Stored Energy" : 1306699.6439737699, "Gallons Sold" : 2260, "Energy Delivered" : 360069.6949259777, "Total Usage" : -1605086.7261496289, "Stratification" : -4.905050370111111, "gen-delta-roof" : 4.819865854785763, "lbs-loop" : 6520.1025 }, "day_of_year" : 218, "day_of_week" : 4, "month" : 8, "week_of_year" : 31, "install" : { "panels" : 32, "name" : "agni-3501", "num_files" : "3744", "heater_efficiency" : 0.8, "storage" : 1612, "install_completed" : ISODate("2010-08-06T00:00:00Z"), "logger_type" : "emerald", "_id" : ObjectId("4d2905536edfdb022f000212"), "polysun_proj" : [ 22863.7, 24651.7, 30301.7, 30053.5, 29640.5, 27806.4, 27511, 28563.1, 27840.7, 26470.9, 21718.9, 19145.4 ], "last_seen" : "2011-01-08 05:26:35.352782" }, "year" : 2010, "day" : 6 } Isn't this better?
  • 67. (the slide 66 document again, annotated: the "measures" sub-document is labelled "measures", the fields around "day_of_year", "month" and "install" are labelled "dimensions", with the caption "...right?") Measures & dimensions. This would be a nice, clean division, except that it isn't. Frequently we'll look for measures by other measures -- i.e., each measure serves as a dimension.
  • 68. ...actually, not a good model. The line gets pretty blurry, in practice. Multi-dimensional vectors mean every measure provides another dimension. Anyway!
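To illustrate "every measure provides another dimension", here is a hedged sketch of selecting documents by one measure and then reading off another. The helper names are mine, the second document is invented, and the `_id` values are simplified to strings. `measure_filter` only builds a filter document in MongoDB's dotted-path, `$gte`/`$lt` style and never talks to a database; `matches` is a local stand-in so the example runs on its own.

```python
def measure_filter(name, low, high):
    """Build a MongoDB-style range filter on one entry of the "measures" sub-document."""
    return {"measures." + name: {"$gte": low, "$lt": high}}


def matches(doc, name, low, high):
    """Local stand-in for the same filter, so the sketch runs without a database."""
    value = doc.get("measures", {}).get(name)
    return value is not None and low <= value < high


docs = [
    # Measure values copied from the slide's document; the _id is simplified.
    {"_id": "agni-3501/2010-08-06",
     "measures": {"Gallons Sold": 2260, "Stratification": -4.905050370111111}},
    # Invented document: note that it has no "Stratification" measure at all.
    {"_id": "hypothetical-install/2010-08-06",
     "measures": {"Gallons Sold": 310}},
]

# Use one measure ("Stratification") as the dimension, read another ("Gallons Sold").
gallons_on_stratified_days = [
    d["measures"]["Gallons Sold"] for d in docs if matches(d, "Stratification", -10, 0)
]
print(measure_filter("Stratification", -10, 0))
print(gallons_on_stratified_days)
```

Because documents need not share the same set of measures, a measure that only some installs report can still be filtered on without any schema change.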
  • 69. (the slide 66 document again) How do we build these quickly & efficiently?
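One small piece of building such a document can be shown concretely: the calendar fields ("year", "month", "day", "day_of_year", "day_of_week", "week_of_year") can all be derived from the timestamp. The sketch below is my own guess at how that might be done, not the speaker's code. It assumes Python's Monday=0 weekday numbering and ISO week numbers, which happen to reproduce the values shown on the slide for 2010-08-06.

```python
from datetime import datetime


def date_dimensions(ts: datetime) -> dict:
    """Derive the calendar fields seen in the slide's document from a timestamp."""
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "day_of_year": ts.timetuple().tm_yday,
        "day_of_week": ts.weekday(),          # assumption: Monday == 0
        "week_of_year": ts.isocalendar()[1],  # assumption: ISO week number
    }


def daily_id(install_name: str, ts: datetime) -> dict:
    """Compound _id in the shape shown on the slide (install, timestamp, frequency)."""
    return {"install.name": install_name, "timestamp": ts, "frequency": "daily"}


ts = datetime(2010, 8, 6)
doc = {"_id": daily_id("agni-3501", ts), "measures": {}}
doc.update(date_dimensions(ts))
print(doc)
```

Precomputing these fields trades a little storage for the ability to group or filter by week, weekday, or month without date arithmetic at query time.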
  • 70. the goal: good numbers! Remember, the goal here is to make it easy for analysts to get comparable numbers, so when I ask for the delivered energy for one system, compared to the delivered energy from another, I can just get the time-series data, without having to worry about whether sensors changed, when the network was out, when a logger was replaced with another one, etc.
  • 71. So, the OLTP layer serving as our inputs essentially serves up timestamped data as CSV series. It doesn't really provide a lot of intelligence, and is basically the raw numbers.
  • 72. from rows to columns. So, most of what our pipeline does is turn things from rows to columns, in a flexible, useful way. I'm gonna walk through that process, quickly.
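As a rough picture of what "rows to columns" means for the logger CSV, here is a minimal sketch using only the standard library. It is not the speaker's pipeline: the function name is mine, and the sample keeps just the first three columns and two rows of the CSV shown earlier in the deck.

```python
import csv
import io
from datetime import datetime

# First three columns and two rows of the deck's sample CSV (truncated for brevity).
SAMPLE = """Time,municipal water in T,solar heated water out T
Tue Mar 9 23:01:44 2010,14.7627064834,53.7822899383
Tue Mar 9 23:02:44 2010,14.958038343,53.764889193
"""


def rows_to_columns(text: str):
    """Turn row-major CSV text into a list of timestamps plus one list per sensor."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames if name != "Time"}
    times = []
    for row in reader:
        times.append(datetime.strptime(row["Time"], "%a %b %d %H:%M:%S %Y"))
        for name in columns:
            columns[name].append(float(row[name]))
    return times, columns


times, columns = rows_to_columns(SAMPLE)
print(times[0], columns["municipal water in T"])
```

One caveat if you try this on the full header: it contains the column name "array in/out" twice, and `csv.DictReader` keeps only one value per name, so real input would need the duplicate names disambiguated first.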
  • 73. (the slide 66 document again, captioned:) Let's just look at one
  • 74. row-major data
    Time,municipal water in T,solar heated water out T,solar tank bottom taped to side,solar tank top taped to side,array in/out,array in/out,tank room ambient t,array supply temperature,array return temperature,solar energy sensor,customer flow meter,customer OIML btu meter,solar collector array flow meter,solar collector array OIML btu meter,Cycle Count
    Tue Mar 9 23:01:44 2010,14.7627064834,53.7822899383,12.1642527206,51.1436001456,6.40476190476,8.9582972583,22.6857033228,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333458
    Tue Mar 9 23:02:44 2010,14.958038343,53.764889193,12.1642527206,51.0925345058,6.40476190476,8.85184138407,22.5716100982,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333462
    Tue Mar 9 23:03:45 2010,15.1145934976,53.6986641192,12.1642527206,50.8692901812,6.40476190476,8.78519002979,22.5673674246,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333462
    Tue Mar 9 23:04:45 2010,15.2512207824,53.5955190752,12.1642527206,50.8293877551,6.40476190476,8.78519002979,22.5652456306,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333468
    Tue Mar 9 23:05:45 2010,15.3690229715,53.5534492867,12.1642527206,50.8293877551,6.40476190476,8.78519002979,22.5652456306,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333471
    Tue Mar 9 23:06:46 2010,15.5253261193,53.5534492867,12.1642527206,50.8658228816,6.40476190476,8.78519002979,22.5652456306,24.0728390462,22.3083978559,0.0,0.0,0.0,0.0,0.0,333472
    Tue Mar 9 23:07:46 2010,15.6676270005,53.5534492867,12.1642527206,50.9177829276,6.40476190476,8.78519002979,22.5652456306,24.0728390462,22.293277114,0.0,0.0,0.0,0.0,0.0,333472
    Tue Mar 9 23:08:47 2010,15.7915083121,53.4761516976,12.1642527206,50.8398031014,6.40476190476,8.78519002979,22.5652456306,24.0728390462,22.1826467404,0.0,0.0,0.0,0.0,0.0,333477
    Tue Mar 9 23:09:47 2010,15.9763741003,53.693428918,12.1642527206,50.7859446809,6.40476190476,8.78519002979,22.5461357574,24.0728390462,22.1782915595,0.0,1.0,0.0,0.0,0.0,333581
    Tue Mar 9 23:10:47 2010,16.1650984572,54.0547534088,12.1642527206,50.725,6.40476190476,8.78519002979,22.4544906773,24.0728390462,22.1782915595,0.0,0.0,0.0,0.0,0.0,333614
  • 75. “Functional”
    import functools

    class Mass(BasicMeasure):
        def __init__(self, density, volume):
            ...
            # Bind density and volume now; only the data is supplied at call time.
            self._result_func = functools.partial(
                lambda data, density, volume: density * volume(data),
                density=density, volume=volume)

        def __call__(self, data):
            return self._result_func(data)
    Quasi-functional classes that describe how to calculate a value from data.
  • 76.
    "_id" : { "install.name" : "agni-3501", "timestamp" : ISODate("2010-08-06T00:00:00Z"), "frequency" : "daily" },
    "measures" : { "total-delta" : -85.78773442284201, "Energy Sold" : 450087.1186574721, "Generation" : 57273.159890170136, "consumed-delta" : 12.569841951556597,
    A formula: E = ∆t × F
    # pseudocode
    class LoopEnergy(BasicMeasure):
        def __init__(self, heat_cap, delta, mass):
            ...
            def result_func(data):
                return self.delta(data) * self.mass(data) * self.heat_cap
            self._result_func = result_func

        def __call__(self, data):
            return self._result_func(data)
  • 77. Creating a Cube
    For each install, for each chunk of data:
      apply all known formulas to get values
      make some convenience keys (e.g., day_of_year)
      stuff it in mongo
    Then, map/reduce to whatever dimensionalities you’re interested in: e.g., downsampling.
    Here’s some pseudocode for how to make a cube of multidimensional data. So, what’s the payoff?
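The first two steps of that loop might look something like the sketch below. Everything named here (`build_fact`, the `formulas` dict, the sample readings) is hypothetical and only mirrors the shape of the fact document shown earlier; the "stuff it in mongo" step is left out so the example runs without a database. The day-of-week (Monday = 0) and ISO week conventions are assumptions. They happen to reproduce the sample document's values for 2010-08-06, but the original pipeline may compute them differently.

```python
from datetime import datetime


def build_fact(install, timestamp, frequency, data, formulas):
    """Apply every known formula to one chunk of data and add convenience keys."""
    measures = {name: formula(data) for name, formula in formulas.items()}
    return {
        "_id": {
            "install.name": install["name"],
            "timestamp": timestamp,
            "frequency": frequency,
        },
        "measures": measures,
        # Convenience keys so later queries can filter without date arithmetic.
        "year": timestamp.year,
        "month": timestamp.month,
        "day": timestamp.day,
        "day_of_year": timestamp.timetuple().tm_yday,
        "day_of_week": timestamp.weekday(),           # assumption: Monday = 0
        "week_of_year": timestamp.isocalendar()[1],   # assumption: ISO week number
        "install": install,
    }


# Hypothetical formulas and readings, for illustration only.
formulas = {"Gallons Sold": lambda data: sum(data["gallons"])}
install = {"name": "agni-3501", "panels": 32}
fact = build_fact(install, datetime(2010, 8, 6), "daily",
                  {"gallons": [1000, 1260]}, formulas)
```

Because each (install, chunk) pair is handled independently, a function like this is easy to run in parallel across installs.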
  • 78. How much water did [x] use, monthly?
    > db.facts_monthly.find({"install.name": [foo]}, {"measures.Gallons Sold": 1}).sort({"_id": 1})
    Complicated analytical queries can be boiled down to nearly single-line mongo queries. Here are some examples:
  • 79. What were our highest production days?
    > db.facts_daily.find({}, {"measures.Energy Sold": 1}).sort({"measures.Energy Sold": -1})
  • 80. How does the distribution of [x] on the weekend compare to its distribution on the weekdays?
    > weekends = db.facts_daily.find({"day_of_week": {$in: [5,6]}})
    > weekdays = db.facts_daily.find({"day_of_week": {$nin: [5,6]}})
    > do stuff
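One possible reading of the "do stuff" step, sketched on plain Python dicts rather than live cursors. The helper names and the sample facts are invented for illustration, and the summary statistics are just one choice of how to compare two distributions. Days 5 and 6 are treated as the weekend, following the query above.

```python
from statistics import mean, median


def split_by_weekend(facts, measure):
    """Split one measure's values into (weekend, weekday) lists."""
    weekend, weekday = [], []
    for fact in facts:
        value = fact["measures"].get(measure)
        if value is None:
            continue  # this install or day does not report the measure
        target = weekend if fact["day_of_week"] in (5, 6) else weekday
        target.append(value)
    return weekend, weekday


def summarize(values):
    """A few summary statistics for a list of values."""
    return {"n": len(values), "mean": mean(values), "median": median(values)}


# Invented sample facts in the shape of the daily fact documents.
facts = [
    {"day_of_week": 4, "measures": {"Gallons Sold": 2000}},
    {"day_of_week": 5, "measures": {"Gallons Sold": 1000}},
    {"day_of_week": 6, "measures": {"Gallons Sold": 1200}},
    {"day_of_week": 0, "measures": {"Gallons Sold": 2400}},
    {"day_of_week": 1, "measures": {}},
]
weekend, weekday = split_by_weekend(facts, "Gallons Sold")
weekend_summary = summarize(weekend)
weekday_summary = summarize(weekday)
```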
  • 81. What’s the production of installs north of a certain latitude, with a certain class of panel, on Tuesdays? For hours where the average delivered temperature delta was above [x], what was our generation efficiency? Normalize by number of panels? (map/reduce) Normalize by distance from equinox? (map/reduce) ...etc.
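To make the "normalize by number of panels" idea concrete, here is a rough map/reduce sketch in plain Python. It is not MongoDB's map/reduce API. The function names and sample facts are invented, and it assumes each daily fact carries `measures.Generation` and `install.panels` as in the sample document. It also downsamples from daily to monthly along the way.

```python
from collections import defaultdict


def map_fact(fact):
    """Emit a (key, value) pair: monthly key, per-panel generation value."""
    key = (fact["install"]["name"], fact["year"], fact["month"])
    value = fact["measures"]["Generation"] / fact["install"]["panels"]
    return key, value


def reduce_values(values):
    """Combine all values emitted for one key."""
    return sum(values)


def map_reduce(facts):
    """Group mapped values by key, then reduce each group."""
    grouped = defaultdict(list)
    for fact in facts:
        key, value = map_fact(fact)
        grouped[key].append(value)
    return {key: reduce_values(values) for key, values in grouped.items()}


# Invented daily facts for one install.
daily = [
    {"install": {"name": "agni-3501", "panels": 32}, "year": 2010, "month": 8,
     "measures": {"Generation": 64.0}},
    {"install": {"name": "agni-3501", "panels": 32}, "year": 2010, "month": 8,
     "measures": {"Generation": 32.0}},
]
monthly_per_panel = map_reduce(daily)
```

Swapping in a different `map_fact` (for example, keyed on day_of_week) gives a different dimensionality without touching the rest.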
  • 82.
    • Building a cube can be done in parallel.
    • Map/reduce is an easy way to think about transforms.
    • Not maximally efficient, but parallelizes on commodity hardware.
    Some advantages. Re #3: so what? It’s not a webapp.
  • 83. mongoDB: The future of enterprise business intelligence. (they just don’t know it yet) So, here’s my thesis: document databases are far superior to relational databases for business intelligence cases. Not only that, but mongoDB and some common sense let you replace multimillion-dollar IBM-level enterprise solutions with open-source awesomeness. All this in a rapid, agile way.
  • 85. Mongo expands in an organization. It’s cool, don’t fight it. Once we started using it for our analytics, we realized there was a lot of other schema-loose data that we could use it for -- like the definitions of the measures themselves, or the details about an install, etc., etc.
  • 86. Final Thoughts. Ok, I want to close up with a few jumping-off points.
  • 87. “Business Intelligence” no longer requires megabucks
  • 88. Flexible tools means business responsiveness should be easy
  • 89. “Scaling” doesn’t just mean depth-first. Businesses grow deep, in the sense of adding more users, but they also grow broad.
  • 91. Epilogue: Quest for Logging Hardware
  • 92. This’ll be easy! This is such an obvious and well explored problem space, I’m sure we’ll be able to find a solution that matches our needs without breaking the bank!
  • 93. Shopping List! 16 temperature sensors; 4 flow sensors; maybe some miscellaneous ones; internet backhaul; no software/data lock-in.
  • 94. Conventions FTW! And since we’ve walked a couple convention floors and product catalogs from major industrial supply vendors, I’m sure it’s in here somewhere!
  • 95. derp derp “internet”? I’m sure there’s a reason why all of these loggers have to connect via USB... Pace Scientific XR5: 8 analog, 3 pulse, ONE MB, no internet? $500?!?
  • 96. yay windows? ...and require proprietary (windows!) software or subscription plans that route my data through their servers (basically all of them!)
  • 97. Maybe the gov’t can help! Perhaps there’s some kind of standard that the governments require for solar thermal monitoring systems to be eligible for incentives or tax credits.
  • 98. Vive la France! An obscure standard by the Organisation Internationale de Métrologie Légale appears! Neat!
  • 99. A “Certified” Logger: two temperature sensors; one pulse; no increase in accuracy; no data backhaul -- at all ... what’s the price?
  • 102. Hmm... I can solder, and arduinos are pretty cheap
  • 104. arduino + netbook!
  • 105. TL;DR: Existing loggers are terrible. Also, existing industries aren’t really ready for rapid prototyping and its destructive effects.
  • 106.
    • http://www.flickr.com/photos/rknight/4358119571/
    • http://4.bp.blogspot.com/_8vNzwxlohg0/TJoUWqsF4LI/AAAAAAAABMg/QaUiKwCEZn8/s320/turtles-all-the-way-down.jpg
    • http://www.flickr.com/photos/rhk313/3801302914/
    • http://www.flickr.com/photos/benny_lin/481411728/
    • http://spagobi.blogspot.com/2010_08_01_archive.html
    • http://community.qlikview.com/forums/t/37106.aspx