1. S ANAND
DATA SCIENTIST
GRAMENER.COM
DATA VISUALISATION
IN JAVASCRIPT
3. WHY VISUALISE?
Consider the sales report shown below. It shows the performance of 4
branches, with average price and sales across 4 cities.
Each of the branches changes prices every month, with a corresponding
change in the sales value.
Basic analytics of these numbers reveal a consistent performance across
the 4 branches. Further, these sales figures have a consistent
correlation and linear regression across all cities.

2010      Bangalore     Delhi         Hyderabad     Mumbai
Month     Price  Sales  Price  Sales  Price  Sales  Price  Sales
Jan        10.0   8.04   10.0   9.14   10.0   7.46    8.0   6.58
Feb         8.0   6.95    8.0   8.14    8.0   6.77    8.0   5.76
Mar        13.0   7.58   13.0   8.74   13.0  12.74    8.0   7.71
Apr         9.0   8.81    9.0   8.77    9.0   7.11    8.0   8.84
May        11.0   8.33   11.0   9.26   11.0   7.81    8.0   8.47
Jun        14.0   9.96   14.0   8.10   14.0   8.84    8.0   7.04
Jul         6.0   7.24    6.0   6.13    6.0   6.08    8.0   5.25
Aug         4.0   4.26    4.0   3.10    4.0   5.39   19.0  12.50
Sep        12.0  10.84   12.0   9.13   12.0   8.15    8.0   5.56
Oct         7.0   4.82    7.0   7.26    7.0   6.42    8.0   7.91
Nov         5.0   5.68    5.0   4.74    5.0   5.73    8.0   6.89
Average     9.0   7.50    9.0   7.50    9.0   7.50    9.0   7.50
Variance   10.0   3.75   10.0   3.75   10.0   3.75   10.0   3.75
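The "basic analytics" claim can be checked with a few lines of plain JavaScript. The sketch below is not from the slides: the helper names (mean, variance, covariance, summarise) are made up for illustration, and it assumes population variance (dividing by n), since that is what reproduces the 10.0 and 3.75 in the table. Each city should come out with a correlation of roughly 0.82 and a fitted line of roughly sales = 3 + 0.5 × price.

```javascript
// Price and sales per month (Jan to Nov), copied from the sales table.
const price = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5];
const cities = {
  Bangalore: { price, sales: [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68] },
  Delhi:     { price, sales: [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74] },
  Hyderabad: { price, sales: [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73] },
  Mumbai:    { price: [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
               sales: [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89] },
};

const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;

// ASSUMPTION: population variance (divide by n), which matches the table's figures.
const variance = xs => {
  const m = mean(xs);
  return mean(xs.map(x => (x - m) ** 2));
};

const covariance = (xs, ys) => {
  const mx = mean(xs), my = mean(ys);
  return mean(xs.map((x, i) => (x - mx) * (ys[i] - my)));
};

// Summary statistics plus the least-squares line sales = intercept + slope * price.
function summarise({ price, sales }) {
  const slope = covariance(price, sales) / variance(price);
  return {
    meanPrice: mean(price),
    meanSales: mean(sales),
    varPrice: variance(price),
    varSales: variance(sales),
    correlation: covariance(price, sales) / Math.sqrt(variance(price) * variance(sales)),
    slope,
    intercept: mean(sales) - slope * mean(price),
  };
}

const summaries = Object.fromEntries(
  Object.entries(cities).map(([city, data]) => [city, summarise(data)])
);

for (const [city, s] of Object.entries(summaries)) {
  const rounded = Object.fromEntries(Object.entries(s).map(([k, v]) => [k, v.toFixed(2)]));
  console.log(city, rounded);
}
```

The four rows of output are nearly identical, which is the point: summary statistics alone cannot tell these cities apart.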
4. WHY VISUALISE?
The four cities are completely
different in behaviour and need
different strategies for growth.
Bangalore sales have generally
increased with price.
Hyderabad has a nearly perfect
increase in sales with price,
except for one aberration.
Delhi, however, shows a decline
in sales as price is increased
beyond a certain point.
Mumbai sales fluctuated despite
a constant price, except for 1
month.
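These differences only show up once the numbers are plotted. As a minimal sketch (not the code behind the original charts), the function below turns one city's price and sales figures into an SVG scatter plot string. The function name scatterSvg, the canvas size and the axis ranges are all illustrative assumptions.

```javascript
// Hyderabad figures from the sales table: price and sales for Jan to Nov.
const hyderabad = {
  price: [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
  sales: [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
};

// Build an SVG scatter plot as a string.
// ASSUMPTION: width, height, padding and axis maxima are arbitrary illustrative defaults.
function scatterSvg({ price, sales },
                    { width = 200, height = 150, pad = 10, maxPrice = 20, maxSales = 14 } = {}) {
  // Map data values to pixel coordinates inside the padded drawing area.
  const x = p => pad + (p / maxPrice) * (width - 2 * pad);
  const y = s => height - pad - (s / maxSales) * (height - 2 * pad); // SVG y grows downwards
  const points = price.map((p, i) =>
    `<circle cx="${x(p).toFixed(1)}" cy="${y(sales[i]).toFixed(1)}" r="3"/>`);
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">` +
         points.join('') + '</svg>';
}

const svg = scatterSvg(hyderabad);
console.log(svg);
```

Saving the printed string as an .svg file, or assigning it to an element's innerHTML in a browser, should show the near-straight line with its single aberration. Passing the other cities' figures to the same function gives the other three panels.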
5. DETECTING FRAUD
“We know meter readings are
incorrect, for various reasons.
We don’t, however, have the
concrete proof we need to start the
process of meter reading
automation.
Part of our problem is the volume
of data that needs to be analysed.
The other is the inexperience in
tools or analyses to identify such
patterns.”
ENERGY UTILITY
6. This plot shows the frequency of all meter readings from Apr-2010 to Mar-2011. An unusually large number of readings are aligned with the tariff slab boundaries.
Why would these happen?
This clearly shows collusion of some form with the customers.
This happens with specific customers, not randomly. Here are such customers’ meter readings.

Apr-10  May-10  Jun-10  Jul-10  Aug-10  Sep-10  Oct-10  Nov-10  Dec-10  Jan-11  Feb-11  Mar-11
   217     219     200     200     200     200     200     200     200     350     200     200
   250     200     200     200     201     200     200     200     250     200     200     150
   250     150     150     200     200     200     200     200     200     200     200     150
   150     200     200     200     200     200     200     200     200     200     200      50
   200     200     200     150     180     150      50     100      50      70     100     100
   100     100     100     100     100     100     100     100     100     100     110     100
   100     150     123     123      50     100      50     100     100     100     100     100
     0     111     100     100     100     100     100     100     100     100      50      50
     0     100      27     100      50     100     100     100     100     100      70     100
     1       1       1     100      99      50     100     100     100     100     100     100

If we define the “extent of fraud” as the percentage excess of the 100-unit meter reading, the value varies considerably across sections, and time … with some explainable anomalies.

Section    Apr-10  May-10  Jun-10  Jul-10  Aug-10  Sep-10  Oct-10  Nov-10  Dec-10  Jan-11  Feb-11  Mar-11
Section 1     70%     97%    136%     65%    110%    116%    121%    107%    114%     88%     74%    109%
Section 2     66%     92%     66%     87%     70%     64%     63%     50%     58%     38%     41%     54%
Section 3     90%     46%     47%     43%     28%     31%     32%     19%     38%      8%     34%
Section 4     44%     24%     36%     39%     21%     18%     24%     49%     56%     44%     31%     14%
Section 5      4%     63%    -27%     20%     41%     82%     26%     34%     43%      2%     37%     15%
Section 6     18%     23%     30%     21%     28%     33%     39%     41%     39%     18%      0%     33%
Section 7     36%     51%     33%     33%     27%     35%     10%     39%     12%      5%     15%     14%
Section 8     22%     21%     28%     12%     24%     27%     10%     31%     13%     11%     22%     17%
Section 9     19%     35%     14%      9%     16%     32%     37%     12%      9%      5%     -3%     11%

Chart annotations over Sections 2 and 3: “New section manager arrives” … “and is transferred out”. Section 3 shows only 11 of its 12 values; a further 50% appears beside the “transferred out” annotation, so the column positions in that row are uncertain.
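The frequency plot boils down to counting how often each reading occurs. The sketch below does that for the ten customers' readings listed above and reports what share of them sit exactly on a slab boundary. The slides do not state where the tariff slabs change, so the 50-unit step used here is purely an assumption, as are the names SLAB_STEP, onBoundary and share.

```javascript
// Monthly readings (Apr-10 to Mar-11) for the ten customers shown on the slide.
const readings = [
  [217, 219, 200, 200, 200, 200, 200, 200, 200, 350, 200, 200],
  [250, 200, 200, 200, 201, 200, 200, 200, 250, 200, 200, 150],
  [250, 150, 150, 200, 200, 200, 200, 200, 200, 200, 200, 150],
  [150, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 50],
  [200, 200, 200, 150, 180, 150, 50, 100, 50, 70, 100, 100],
  [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 110, 100],
  [100, 150, 123, 123, 50, 100, 50, 100, 100, 100, 100, 100],
  [0, 111, 100, 100, 100, 100, 100, 100, 100, 100, 50, 50],
  [0, 100, 27, 100, 50, 100, 100, 100, 100, 100, 70, 100],
  [1, 1, 1, 100, 99, 50, 100, 100, 100, 100, 100, 100],
];

// ASSUMPTION: tariff slabs change every 50 units; the real boundaries are not given.
const SLAB_STEP = 50;
const onBoundary = r => r > 0 && r % SLAB_STEP === 0;

// Frequency of each distinct reading: the counts a histogram would plot.
const frequency = new Map();
for (const r of readings.flat()) {
  frequency.set(r, (frequency.get(r) || 0) + 1);
}

// Share of readings that sit exactly on an assumed boundary.
const share = rs => rs.filter(onBoundary).length / rs.length;
const overallShare = share(readings.flat());
const perCustomer = readings.map(share);

const mostFrequent = [...frequency.entries()].sort((a, b) => b[1] - a[1]).slice(0, 5);
console.log('most frequent readings [value, count]:', mostFrequent);
console.log('share on assumed boundaries:', (overallShare * 100).toFixed(1) + '%');
console.log('per customer:', perCustomer.map(s => Math.round(s * 100) + '%'));
```

These ten customers were picked because they look suspicious, so the share printed here says nothing about the full population. On the complete data set the same counts, grouped by section and month, would be the input to whatever "extent of fraud" measure is chosen.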
7. SECURITIES FINDING PATTERNS
Which securities move together?
How should I diversify?
What should I sell to reduce risk?
What’s a reliable predictor of a security?
8. 68% correlation between AUD & EUR
Plot of 6 month daily AUD - EUR values
Block of correlated currencies
… that move counter-cyclically to indices
… clustered hierarchically
Good evening. My name is Anand, and you can find more about me by googling for “S Anand”. My site is the first hit. I’ll be talking about recent trends in technology, and how you can leverage them.