Whirlwind tour of Pig
           Chris Wilkes
    cwilkes@seattlehadoop.org
Agenda


  1   Why Pig?
  2   Data types
  3   Operators
  4   UDFs
  5   Using Pig
Agenda


  1   Why Pig?
  2   Data types
  3   Operators
  4   UDFs
  5   Using Pig
Why Pig?                                      Tired of boilerplate


•   Started off writing Mappers/Reducers in Java

    •   Fun at first

    •   Gets a little tedious

•   Need to do more than one MR step

    •   Write own flow control

    •   Utility classes to pass parameters / input paths

•   Go back and change a Reducer’s input type

    •   Did you change it in the Job setup?

    •   Processing two different input types in first job
Why Pig?                 Java MapReduce boilerplate example




•   Typical use case: have two different input types

    •   log files (timestamps and userids)

    •   database table dump (userids and names)

•   Want to combine the two together

•   Relatively simple, but tedious
Why Pig?                   Java MapReduce boilerplate example

The two inputs must share a single Mapper output type, so one wrapper
class carries both record types, designated with a "tag":
 Mapper<LongWritable,Text,TaggedKeyWritable,TaggedValueWritable>
 Reducer<TaggedKeyWritable,TaggedValueWritable,Text,PurchaseInfoWritable>

Inside the Mapper, check in setup() or run() the Path of the input
split to decide whether this is a log file or a database table:
 if (((FileSplit) context.getInputSplit()).getPath().toString().contains("logfile")) {
   inputType = "LOGFILE"; } else { inputType = "DATABASE"; }

In the Reducer, check the tag and then combine:
 if (key.getTag().equals("LOGFILE")) { LogEntry logEntry = value.getValue(); }
 else if (key.getTag().equals("DATABASE")) { UserInfo userInfo = value.getValue(); }
 context.write(userInfo.getId(), logEntry.getTime() + " " + userInfo.getName());
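
For contrast, a minimal sketch of the same join in Pig, assuming the
field layouts above (timestamps and userids in the logs, userids and
names in the table dump):
 logs  = LOAD 'logs'  AS (time: chararray, userid: int);
 users = LOAD 'users' AS (userid: int, name: chararray);
 -- one JOIN replaces the tagged Mapper/Reducer pair
 joined = JOIN logs BY userid, users BY userid;
 report = FOREACH joined GENERATE users::userid, logs::time, users::name;
 STORE report INTO 'combined';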
Where’s your shears?

"I was working on my thesis
and realized I needed a
reference. I'd
seen a post on comp.arch
recently that cited a paper,
so I fired up
gnus. While I was searching
the for the post, I came
across another
post whose MIME encoding
screwed up my ancient version
of gnus, so I
stopped and downloaded the
latest version of gnus.
Agenda


  1   Why Pig?
  2   Data types
  3   Operators
  4   UDFs
  5   Using Pig
Data Types




•   From largest to smallest:

    •   Bag (relation / group)

    •   Tuple

    •   Field

•   A bag is a collection of tuples, tuples have fields
Data Types                                                                   Bag

$ cat logs
101   1002   10.09
101   8912   5.96
102   1002   10.09
103   8912   5.96
103   7122   88.99

$ cat groupbooks.pig
logs = LOAD 'logs' AS
  (userid: int, bookid: long, price: double);
bookbuys = GROUP logs BY bookid;
DESCRIBE bookbuys;
DUMP bookbuys;

$ pig -x local groupbooks.pig
bookbuys: {group: long,logs: {userid: int,bookid: long,price: double}}
(1002L,{(101,1002L,10.09),(102,1002L,10.09)})
(7122L,{(103,7122L,88.99)})
(8912L,{(101,8912L,5.96),(103,8912L,5.96)})

Each output row is a tuple; the {...} inside each tuple is an inner
bag, and values like 1002L and 10.09 are fields.
Data Types                                               Tuple and Fields

$ cat booksexpensive.pig
logs = LOAD 'logs' AS (userid: int, bookid: long, price: double);
bookbuys = GROUP logs BY bookid;
expensive = FOREACH bookbuys {
      inside = FILTER logs BY price > 6.0;  -- 'logs' here refers to the inner bag
      GENERATE inside;
}
DESCRIBE expensive;
DUMP expensive;

$ pig -x local booksexpensive.pig
expensive: {inside: {userid: int,bookid: long,price: double}}
({(101,1002L,10.09),(102,1002L,10.09)})
({(103,7122L,88.99)})
({})

Each output row holds an inner bag.
Note: can always refer to fields as $0, $1, etc
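
A quick sketch of positional references, using the logs relation above:
 -- $0 and $1 are userid and bookid even when no schema was declared
 firsttwo = FOREACH logs GENERATE $0, $1;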
Agenda


  1   Why Pig?
  2   Data types
  3   Operators
  4   UDFs
  5   Using Pig
Operator                                                             Load

This will load all files under the logs/2010/05 directory
(or the logs/2010/05 file) and put into clicklogs:
      clicklogs = LOAD 'logs/2010/05';


Names the fields in the tuple "userid" and "url" -- instead
of having to refer to them as $0 and $1:

      clicklogs = LOAD 'logs/2010/05' as (userid: int, url: chararray)


Note: no actual loading occurs until a DUMP or STORE
command is executed.
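
A minimal sketch of that lazy behavior (paths assumed):
 clicklogs = LOAD 'logs/2010/05' AS (userid: int, url: chararray);
 -- no MapReduce job has run yet; this statement triggers execution:
 DUMP clicklogs;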
Operator                                           Load

By default splits on the tab character (the same as the
key/value separator in MapReduce jobs). Can also specify
your own delimiter:
         LOAD 'logs' USING PigStorage('~')

PigStorage implements LoadFunc -- implement this
interface to create your own loader, e.g. "RegExLoader"
from the Piggybank.
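
A minimal sketch of both routes -- the jar path, and MyRegExLoader as
the Piggybank's concrete regex loader, are assumptions to verify
against your Piggybank build:
 -- built-in loader with a custom delimiter and an explicit schema
 logs = LOAD 'logs' USING PigStorage('~')
        AS (userid: int, bookid: long, price: double);

 -- hypothetical path/class: register the Piggybank jar, then load by regex
 REGISTER piggybank.jar;
 raw = LOAD 'access.log'
       USING org.apache.pig.piggybank.storage.MyRegExLoader('(\\S+)\\t(\\S+)');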


Operator                              Describe, Dump, and Store

“Describe” prints out that variable’s schema:
    DESCRIBE combotimes;
    combotimes: {group: chararray,
      enter: {time: chararray,userid: chararray},
      exit: {time: chararray,userid: chararray,cost: double}}
To see output on the screen type "dump varname":
    DUMP namesandaddresses;
To output to a file / directory use store:
    STORE patienttrials INTO 'trials/2010';


Operator                                                           Group

 $ cat starnames
 1     Mintaka
 2     Alnitak
 3     Epsilon Orionis

 $ cat starpositions
 1     R.A. 05h 32m 0.4s, Dec. -00 17' 57"
 2     R.A. 05h 40m 45.5s, Dec. -01 56' 34"
 3     R.A. 05h 36m 12.8s, Dec. -01 12' 07"

    $ cat starsandpositions.pig
    names = LOAD 'starnames' as (id: int, name: chararray);
    positions = LOAD 'starpositions' as (id: int, position: chararray);
    nameandpos = GROUP names BY id, positions BY id;
    DESCRIBE nameandpos;
    DUMP nameandpos;

 nameandpos: {group: int,names: {id: int,name: chararray},
  positions: {id: int,position: chararray}}
 (1,{(1,Mintaka)},{(1,R.A. 05h 32m 0.4s, Dec. -00 17' 57")})
 (2,{(2,Alnitak)},{(2,R.A. 05h 40m 45.5s, Dec. -01 56' 34")})
 (3,{(3,Epsilon Orionis)},{(3,R.A. 05h 36m 12.8s, Dec. -01 12' 07")})
Operator                                                                 Join

Just like GROUP but flatter
   $ cat starsandpositions2.pig
   names = LOAD 'starnames' as (id: int, name: chararray);
   positions = LOAD 'starpositions' as (id: int, position: chararray);
   nameandpos = JOIN names BY id, positions BY id;
   DESCRIBE nameandpos;
   DUMP nameandpos;

 nameandpos: {names::id: int,names::name: chararray,
 positions::id: int,positions::position: chararray}

 (1,Mintaka,1,R.A. 05h 32m 0.4s, Dec. -00 17' 57")
 (2,Alnitak,2,R.A. 05h 40m 45.5s, Dec. -01 56' 34")
 (3,Epsilon Orionis,3,R.A. 05h 36m 12.8s, Dec. -01 12' 07")
Operator                                                          Flatten

Ugly looking output from before:
  expensive: {inside: {userid: int,bookid: long,price: double}}
  ({(101,1002L,10.09),(102,1002L,10.09)})
  ({(103,7122L,88.99)})
Use the FLATTEN operator
  expensive = FOREACH bookbuys {
        inside = FILTER logs BY price > 6.0;
        GENERATE group, FLATTEN (inside);
  }
  expensive: {group: long,inside::userid: int,inside::bookid:
  long,inside::price: double}
  (1002L,101,1002L,10.09)
  (1002L,102,1002L,10.09)
  (7122L,103,7122L,88.99)
Operator                                       Renaming in Foreach

 All columns with cumbersome names:
  expensive: {group: long,inside::userid: int,inside::bookid:
  long,inside::price: double}
 Pick and rename:
  expensive = FOREACH bookbuys {
     inside = FILTER logs BY price > 6.0;
     GENERATE group AS userid,
       FLATTEN (inside.(bookid, price)) AS (bookid, price);
  }
                                               Note: the AS clause kept the types!
 Now easy to use:
  expensive: {userid: long,bookid: long,price: double}
  (1002L,1002L,10.09)
  (1002L,1002L,10.09)
  (7122L,7122L,88.99)
Operator                                                          Split

When an input file mixes record types and needs separating:
  $ cat enterexittimes
  2010-05-10 12:55:12     user123 enter
  2010-05-10 13:14:23     user456 enter
  2010-05-10 13:16:53     user123 exit 23.79
  2010-05-10 13:17:49     user456 exit 0.50

  inandout = LOAD 'enterexittimes';
  SPLIT inandout INTO enter1 IF $2 == 'enter', exit1 IF $2 == 'exit';

  enter1:
  (2010-05-10 12:55:12,user123,enter)
  (2010-05-10 13:14:23,user456,enter)

  exit1:
  (2010-05-10 13:16:53,user123,exit,23.79)
  (2010-05-10 13:17:49,user456,exit,0.50)
Operator                                                          Split

If every line had the same schema it could be specified at load
time; in this case we need a FOREACH:
 enter = FOREACH enter1 GENERATE
   (chararray)$0 AS time:chararray, (chararray)$1 AS userid:chararray;
 exit = FOREACH exit1 GENERATE
   (chararray)$0 AS time:chararray, (chararray)$1 AS userid:chararray,
   (double)$3 AS cost:double;
 DESCRIBE enter;
 DESCRIBE exit;

 enter: {time: chararray,userid: chararray}
 exit: {time: chararray,userid: chararray,cost: double}
Operator                                                 Sample, Limit

For testing purposes, SAMPLE takes a random fraction of each large input:
   names1 = LOAD 'starnames' as (id: int, name: chararray);
   names = SAMPLE names1 0.3;
   positions1 = LOAD 'starpositions' as (id: int, position: chararray);
   positions = SAMPLE positions1 0.3;
Each run returns different random rows:
   (1,Mintaka,1,R.A. 05h 32m 0.4s, Dec. -00 17' 57")
LIMIT returns only the first N results. Use it with ORDER BY
to return the top results:
   nameandpos1 = JOIN names BY id, positions BY id;
   nameandpos2 = ORDER nameandpos1 BY names::id DESC;
   nameandpos = LIMIT nameandpos2 2;
   (3,Epsilon Orionis,3,R.A. 05h 36m 12.8s, Dec. -01 12' 07")
   (2,Alnitak,2,R.A. 05h 40m 45.5s, Dec. -01 56' 34")
Agenda


  1   Why Pig?
  2   Data types
  3   Operators
  4   UDFs
  5   Using Pig
UDF

UDF: User Defined Function
Operates on single values or a group
Simple example: IsEmpty (a FilterFunc)
   users = GROUP names BY id, addresses BY id;
   D = FOREACH users GENERATE group,
    FLATTEN((IsEmpty(names) ? {('none')} : names.firstName));
Working over an aggregate, e.g. COUNT:
   users = GROUP names BY id, books BY buyerId;
   D = FOREACH users GENERATE group, COUNT(books);
Working on two values (Distance is a hypothetical user-written EvalFunc):
   stars2 = FOREACH stars GENERATE *;   -- CROSS needs two distinct aliases
   distance1 = CROSS stars, stars2;
   distance = FOREACH distance1 GENERATE Distance(stars::position, stars2::position);
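
To make such a UDF visible to a script, register its jar and
(optionally) DEFINE a short alias -- a minimal sketch, with the jar
path and class name as assumptions:
   REGISTER myudfs.jar;
   DEFINE Distance com.example.pig.Distance();
   -- Distance(...) can now be called inside any FOREACH ... GENERATE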
Agenda


  1   Why Pig?
  2   Data types
  3   Operators
  4   UDFs
  5   Using Pig
LOAD and GROUP
logfiles = LOAD 'logs' AS (userid: int, bookid: long, price: double);
userinfo = LOAD 'users' AS (userid: int, name: chararray);
userpurchases = GROUP logfiles BY userid, userinfo BY userid;
DESCRIBE userpurchases;
DUMP userpurchases;
Inside {} are bags (unordered); inside () are tuples (ordered lists of fields).

report = FOREACH userpurchases GENERATE
FLATTEN(userinfo.name) AS name, group AS userid,
FLATTEN(SUM(logfiles.price)) AS cost;
bybigspender = ORDER report BY cost DESC;
DUMP bybigspender;

(Bob,103,94.94999999999999)
(Joe,101,16.05)
(Cindy,102,10.09)
Entering and exiting recorded in same file:

2010-05-10 12:55:12 user123 enter
2010-05-10 13:14:23 user456 enter
2010-05-10 13:16:53 user123 exit 23.79
2010-05-10 13:17:49 user456 exit 0.50
inandout = LOAD 'enterexittimes';
SPLIT inandout INTO enter1
  IF $2 == 'enter', exit1 IF $2 == 'exit';

enter = FOREACH enter1 GENERATE
 (chararray)$0 AS time:chararray,
 (chararray)$1 AS userid:chararray;

exit = FOREACH exit1 GENERATE
 (chararray)$0 AS time:chararray,
 (chararray)$1 AS userid:chararray,
 (double)$3 AS cost:double;
combotimes = GROUP enter BY $1, exit BY $1;

purchases = FOREACH combotimes GENERATE
 group AS userid,
 FLATTEN(enter.$0) AS entertime,
 FLATTEN(exit.$0) AS exittime,
 FLATTEN(exit.$2);

DUMP purchases;
Schemas for inandout, enter1, and exit1 are unknown.

enter: {time: chararray,userid: chararray}
exit: {time: chararray,userid: chararray,cost: double}

combotimes: {group: chararray,
 enter: {time: chararray,userid: chararray},
 exit: {time: chararray,userid: chararray,cost: double}}

purchases: {userid: chararray,entertime: chararray,
 exittime: chararray,cost: double}
UDFs
• User Defined Function
• For doing an operation on data
• Already use several builtins:
  • COUNT
  • SUM
