Have you ever written a long PROJECT for a simple column rename and thought, this should be easier? What about nicely named output statements? Yeah, they bother me too. Oh, and DEDUP(SORT(DISTRIBUTE()))? There is a better way! Learn how Dapper can help!
4. Engineers on big projects may need this level of control. But what about QAs, analysts, developers and data scientists?
5. For these people, ECL syntax is a bit of a trial!
Dedup
• DEDUP(SORT(DISTRIBUTE(x, HASH(y)), y, LOCAL), y, LOCAL);
One column transform
• PROJECT(x, TRANSFORM(RECORDOF(LEFT), SELF.y := LEFT.y+1; SELF := LEFT;));
Named output
• OUTPUT(x, NAMED('x'));
Write to CSV
• OUTPUT(x, , '~ROB::TEMP::x', CSV(HEADING(SINGLE), SEPARATOR(','), TERMINATOR('\n'), QUOTE('"')));
Grouped count
• [I ran out of space]
Dapper – A Bundle to Make Your ECL Neater
6. How does this stuff work in other languages? Well, R is nice!
library(dplyr)
df <- read.csv('x')
df <- select(df, col1, col2)
df <- mutate(df, col3 = col1 + col2)
df <- group_by(df, col3)
df <- summarise(df, col5 = n())
write.csv(df, file='output.csv')
7. How does this stuff work in other languages? Well, R is nice!
library(dplyr)
df <- read.csv('x') %>%
  select(col1, col2) %>%
  mutate(col3 = col1 + col2) %>%
  group_by(col3) %>%
  summarise(col5 = n()) %>%
  write.csv(file='output.csv')
8. SQL is also lovely, but can be hard to arrange into a single call
SELECT COUNT(col2), col1 FROM TABLE GROUP BY col1;
9. …and Python is, as always, Python
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
…
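For comparison, here is a sketch of the same select / mutate / group / count pipeline from the R slides in plain Python. The rows are made up for illustration; only the column names (col1, col2, col3, col5) come from the R example.

```python
from collections import Counter

# Hypothetical stand-in for read.csv('x'): a few made-up rows.
rows = [
    {"col1": 1, "col2": 2, "other": "a"},
    {"col1": 2, "col2": 1, "other": "b"},
    {"col1": 5, "col2": 5, "other": "c"},
]

# select(col1, col2)
selected = [{"col1": r["col1"], "col2": r["col2"]} for r in rows]

# mutate(col3 = col1 + col2)
mutated = [{**r, "col3": r["col1"] + r["col2"]} for r in selected]

# group_by(col3) %>% summarise(col5 = n())
counts = Counter(r["col3"] for r in mutated)
summary = [{"col3": key, "col5": n} for key, n in sorted(counts.items())]

print(summary)
```

Each step builds a new list rather than changing the old one, which is the same habit the piped R version encourages.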
13. View Data
//load data
StarWars := ExampleData.starwars;
// Look at the data
tt.nrows(StarWars);
tt.head(StarWars);
15. Fill in some blanks
//Fill blank species with unknown
fillblankHome := tt.mutate(StarWars, species, IF(species = '', 'Unkn.', species));
tt.head(fillblankHome);
16. That’s right, we don’t need LEFT or SELF!!!
What sorcery is this?!?!?
17. Okay, we now need to make our BMI column!
//make height meters
heightMeters := tt.mutate(fillblankHome, height, height/100);
//Create a BMI for each character
bmi := tt.append(heightMeters, REAL, BMI, mass/(height^2));
//Look at just the new column and name
bmiSelect := tt.select(bmi, 'name, bmi');
tt.head(bmiSelect);
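The arithmetic behind those two steps, as a quick Python sketch. The mass and height below are illustrative values, not numbers taken from the deck, and the bmi function is a hypothetical helper.

```python
def bmi(mass_kg: float, height_cm: float) -> float:
    """Body mass index: mass in kg divided by height in metres, squared."""
    height_m = height_cm / 100  # same conversion as the 'make height meters' step
    return mass_kg / height_m ** 2


# Illustrative character: 77 kg and 172 cm tall.
value = bmi(77, 172)
print(round(value, 1))
```

Skipping the division by 100 would shrink every BMI by a factor of 10,000 but leave the ranking unchanged, so the sort on the next slide gives the same order either way.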
18. Let's work through an example
Sort!
//Find the highest
sortedBMI := tt.arrange(bmiSelect, '-bmi');
tt.head(sortedBMI);
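One way to read a sort spec like '-bmi' is "sort by bmi, descending". The sketch below shows that reading in Python. The arrange function and the sample rows are hypothetical; this is not Dapper's implementation.

```python
def arrange(rows, spec):
    """Sort dict rows by a comma-separated spec; a leading '-' means descending."""
    keys = [k.strip() for k in spec.split(",")]
    result = list(rows)
    # Apply keys right to left: Python's sort is stable, so earlier keys win ties.
    for key in reversed(keys):
        descending = key.startswith("-")
        field = key.lstrip("-")
        result.sort(key=lambda r: r[field], reverse=descending)
    return result


# Made-up rows for illustration.
rows = [
    {"name": "A", "bmi": 24.0},
    {"name": "B", "bmi": 31.5},
    {"name": "C", "bmi": 19.2},
]
top = arrange(rows, "-bmi")
print([r["name"] for r in top])
```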
19. Lovely, I feel that’s one of life’s great questions answered.
I do, of course, have other questions on Star Wars.
20. Has anyone else noticed the lack of diversity in the SW universe?
//How many of each species are there?
species := tt.countn(sortedBMI, 'species');
sortedspecies := tt.arrange(species, '-n');
tt.head(sortedspecies);
21. There are some pretty exciting eye colours though!
//Finally let's look at unique eye colours:
colourData := tt.select(StarWars, 'eye_color');
uniqueColours := tt.distinct(colourData, 'eye_color');
//see arrangedistinct() for fancy sort/dedup
tt.head(uniqueColours);
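For the curious, here is roughly why the hand-written DEDUP(SORT(DISTRIBUTE(...), LOCAL), LOCAL) pattern from slide 5 gives a global distinct: hashing sends equal values to the same node, so each node can sort and dedup on its own. This is a single-process Python simulation with a made-up node count and colour list; it is not how Dapper or HPCC Systems is implemented.

```python
import zlib


def distributed_distinct(values, n_nodes=3):
    """Simulate distribute-by-hash, then a local sort and local dedup per node."""
    # DISTRIBUTE(x, HASH(y)): equal values always land on the same node.
    nodes = [[] for _ in range(n_nodes)]
    for v in values:
        nodes[zlib.crc32(v.encode()) % n_nodes].append(v)

    result = []
    for part in nodes:
        part.sort()  # SORT(..., LOCAL)
        deduped = []
        for v in part:  # DEDUP(..., LOCAL): drop adjacent repeats
            if not deduped or deduped[-1] != v:
                deduped.append(v)
        result.extend(deduped)
    return result


# Made-up values for illustration.
colours = ["blue", "yellow", "blue", "red", "brown", "yellow", "blue"]
unique = distributed_distinct(colours)
print(sorted(unique))
```

Because no value can appear on two nodes, joining the per-node results needs no further dedup step.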
22. Let's work through an example
Save
//and save our results
tt.to_csv(sortedBMI, 'ROB::TEMP::STARWARSCSV');
tt.to_thor(sortedBMI, 'ROB::TEMP::STARWARS');
24. Dapper:
IMPORT dapper.ExampleData;
IMPORT dapper.TransformTools as tt;
//load data
StarWars := ExampleData.starwars;
// Look at the data
tt.nrows(StarWars);
tt.head(StarWars);
//Fill blank species with unknown
fillblankHome := tt.mutate(StarWars, species, IF(species = '', 'Unkn.', species));
tt.head(fillblankHome);
//Create a BMI for each character
bmi := tt.append(fillblankHome, REAL, BMI, mass/height^2);
tt.head(bmi);
//Find the highest
sortedBMI := tt.arrange(bmi, '-bmi');
tt.head(sortedBMI);
//Jabba should probably go on a diet.
Base ECL:
IMPORT dapper.ExampleData;
//load data
StarWars := ExampleData.starwars;
// Look at the data
OUTPUT(COUNT(StarWars), NAMED('COUNTstarWars'));
OUTPUT(StarWars, NAMED('starWars'));
//Fill blank species with unknown
//Create a BMI for each character
fillblankHomeAndBMI :=
  PROJECT(StarWars,
    TRANSFORM({RECORDOF(LEFT); REAL BMI;},
      SELF.BMI := LEFT.mass / LEFT.Height^2;
      SELF.species := IF(LEFT.species = '', 'Unkn.', LEFT.species);
      SELF := LEFT;));
OUTPUT(fillblankHomeAndBMI, NAMED('fillblankHomeAndBMI'));
//Find the highest
sortedBMI := SORT(fillblankHomeAndBMI, -bmi);
OUTPUT(sortedBMI, NAMED('sortedBMI'));
//Jabba should probably go on a diet.
25. Dapper:
//How many of each species are there?
species := tt.countn(sortedBMI, 'species');
sortedspecies := tt.arrange(species, '-n');
tt.head(sortedspecies);
//Finally let's look at unique eye colours:
colourData := tt.select(StarWars, 'eye_color');
uniqueColours := tt.distinct(colourData, 'eye_color');
//see arrangedistinct() for fancy sort/dedup
tt.head(uniqueColours);
//and save our results
tt.to_csv(sortedBMI, 'ROB::TEMP::STARWARSCSV');
Base ECL:
//How many of each species are there?
CountRec := RECORD
  STRING Species := sortedBMI.species;
  INTEGER n := COUNT(GROUP);
END;
species := TABLE(sortedBMI, CountRec, species);
sortedspecies := SORT(species, -n);
OUTPUT(sortedspecies, NAMED('sortedspecies'));
//Finally let's look at unique eye colours:
colourData := TABLE(sortedBMI, {eye_color});
uniqueColours := DEDUP(SORT(DISTRIBUTE(colourData, HASH(eye_color)),
                            eye_color, LOCAL), eye_color, LOCAL);
OUTPUT(COUNT(uniqueColours), NAMED('COUNTuniqueColours'));
OUTPUT(uniqueColours, NAMED('uniqueColours'));
//and save our results
OUTPUT(sortedBMI, , 'ROB::TEMP::STARWARSCSV',
       CSV(HEADING(SINGLE), SEPARATOR(','),
           TERMINATOR('\n'), QUOTE('"')));
27. Interested? You can install from our GitHub:
ecl bundle install https://github.com/OdinProAgrica/dapper.git
There’s also a more in-depth walkthrough (and infographic) here:
https://hpccsystems.com/blog/dapper-bundle
Similar projects? Yes, yes we have!
https://github.com/OdinProAgrica
28. Bonus deck! We would like to introduce you to hpycc
29. hpycc is a Python package that builds on the ideas of Dapper.
That is:
How can we make HPCC Systems more usable for the data scientist?
How can this translate to engineering and development?
30. Things I find overly taxing
• Spraying new data
• Running scripts that I can customise easily
• Getting the results of queries and files
• ECL dev when I’m offsite
31. What if you could run all this from a Python notebook?
Now you can!
32. For the purposes of this demo I’ve made a throwaway function
33. I’m dev-ing locally so I’ll need HPCC Systems running
…then create a connection to my server
34. Let’s grab the raw Star Wars dataset…
35. What if we have more than one output?
42. Interested? You can install from PyPI:
pip install hpycc
There’s also more info on our GitHub:
https://github.com/OdinProAgrica/hpycc
Similar projects? Yes, yes we have!
https://github.com/OdinProAgrica
43. Watch this space for our most recent project: Wally!
44. A little flavour of what we have already…
45. Interested? You can install from our GitHub:
pip install hpycc
There’s also more info on our GitHub:
https://github.com/OdinProAgrica/wally
Similar projects? Yes, yes we have!
https://github.com/OdinProAgrica
47. …we are also building stringtools as part of the Dapper bundle
IMPORT dapper.stringtools as st;
source := 'No1 e-xp-ec-t-s t809he [S]pammish ReQuIsiTion';
target := 'nobody expects the spanish inquisition';
49. …we are also building stringtools as part of the Dapper bundle
IMPORT STD;
source := 'No1 e-xp-ec-t-s t809he [S]pammish ReQuIsiTion';
target := 'nobody expects the spanish inquisition';
one := TRIM(std.Str.ToLowerCase(source), LEFT, RIGHT);
two := REGEXREPLACE('1', one, 'body');
three := REGEXREPLACE('[^a-z ]', two, '');
four := REGEXREPLACE('mm', three, 'n');
five := REGEXREPLACE('req', four, 'inq');
//collapse runs of whitespace into a single space
six := REGEXREPLACE('\\s+', five, ' ');
six;
50. …we are also building stringtools as part of the Dapper bundle
IMPORT dapper.stringtools as st;
source := 'No1 e-xp-ec-t-s t809he [S]pammish ReQuIsiTion';
target := 'nobody expects the spanish inquisition';
regexDS := DATASET([
  {'1'      , 'body'},
  {'[^a-z ]', ''    },
  {'mm'     , 'n'   },
  {'req'    , 'inq' },
  {'\\s+'   , ' '   }
], {STRING Regex; STRING Repl;});
st.regexLoop(source, regexDS);
target;
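The idea of a regex loop can be sketched in a few lines of Python: walk a table of (pattern, replacement) pairs and apply each in turn. The regex_loop function is hypothetical and is not the stringtools implementation. As an assumption, it lower-cases and trims first, as the base ECL version on the previous slide does.

```python
import re


def regex_loop(text, rules):
    """Apply each (pattern, replacement) pair to text, in order."""
    result = text.lower().strip()  # assumption: mirrors the ToLowerCase/TRIM step
    for pattern, replacement in rules:
        result = re.sub(pattern, replacement, result)
    return result


source = "No1 e-xp-ec-t-s t809he [S]pammish ReQuIsiTion"
target = "nobody expects the spanish inquisition"

rules = [
    (r"1", "body"),
    (r"[^a-z ]", ""),
    (r"mm", "n"),
    (r"req", "inq"),
    (r"\s+", " "),
]

cleaned = regex_loop(source, rules)
print(cleaned == target)
```

Keeping the rules in a table means adding a cleaning step is a one-line change, with no renumbering of intermediate variables.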