2. What is this?
● With pack/unpack, Perl becomes a deft manipulator of bytes
● Albeit slow, relative to C.
3. Other languages
● Ruby/PHP (same or follows closely)
● Python (suffers from Perl hatred)
– Changed the format grammar
● Node.js (two versions: Perl-like, Python-like)
4. Useful?
● Good for small jobs, light duty
● Not something you use very often
● You will never be asked about it in a job interview
5. My story
● Worked at SOLiD Technology
● Proprietary protocol that ran over serial
● The same data frame was also used in the socket communication
6. What we need to start
● Hex viewer
– We will use GHex
● The binary data format specification
– We will use ID3v2
● Perl
– pack/unpack is a built-in
7. House rules
● We will mostly talk about unpack
– The format specifier is basically the same between
pack and unpack
● Whereas the documentation uses the term
template, I use the term format specifier
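
To illustrate the point that one format specifier serves both directions, here is a minimal sketch. The field values ("ID3", 3, 0) are made up for the demonstration and are not read from a real file.

```perl
use strict;
use warnings;

# The same format specifier (template) drives both directions.
my $format = "A3 C C";

# pack: Perl values -> bytes
my $bytes = pack $format, "ID3", 3, 0;

# unpack: bytes -> Perl values
my ($tag, $major, $minor) = unpack $format, $bytes;

printf "%d bytes: tag=%s major=%d minor=%d\n",
    length($bytes), $tag, $major, $minor;
```

Whitespace inside the format specifier is ignored, so "A3 C C" and "A3CC" behave the same.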
8. Example: ID3
● MP3 with ID3v2 metadata
● Only parsing the header
● http://id3.org/id3v2.3.0#ID3v2_header
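
As a rough picture of what the 10-byte header looks like in a hex viewer, the sketch below builds a synthetic header with pack and prints it as hex bytes. The values (version 3.0, no flags, size bytes 00 00 02 01) are illustrative assumptions, not taken from a real MP3.

```perl
use strict;
use warnings;

# Synthetic ID3v2 header, 10 bytes in total:
#   3 bytes "ID3" | 1 byte major | 1 byte minor | 1 byte flags | 4 bytes size
my $header = pack "A3 C C C C4",
    "ID3", 3, 0, 0x00, 0x00, 0x00, 0x02, 0x01;

# Print each byte as two hex digits, roughly as a hex viewer shows it.
my $hex = join " ", unpack "(H2)*", $header;
print "$hex\n";    # 49 44 33 03 00 00 00 00 02 01
```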
12. Pack/Unpack
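use strict;
use warnings;
use Data::Dumper;

my $file_name = "admiralbob77_-_Beautiful_Mystery_4.mp3";
open my $fh, "<", $file_name or die "Cannot open $file_name: $!";
binmode $fh, ":raw";
# the header sits at the start of the file; 128 bytes is more than enough
read $fh, my $data, 128;
close $fh;
my @L;
# a bare H yields only the high nybble of each byte; use H2 for a full byte
@L = unpack "H H H H H H", $data;
print Dumper @L;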
20. Format specifier
C   Unsigned char (8-bit) value
S   Unsigned short (16-bit) value
L   Unsigned long (32-bit) value
Q   Unsigned quad (64-bit) value
B   Bit string (descending order)
21. Format specifier
c   Signed char (8-bit) value
s   Signed short (16-bit) value
l   Signed long (32-bit) value
q   Signed quad (64-bit) value
b   Bit string (ascending order)
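
A small sketch of how the upper-case and lower-case specifiers differ on the same input. The input bytes are arbitrary test values chosen for the demonstration.

```perl
use strict;
use warnings;

# One byte with all bits set: unsigned and signed readings differ.
my $all_ones = "\xFF";
my $unsigned = unpack "C", $all_ones;    # 255
my $signed   = unpack "c", $all_ones;    # -1

# One byte with the value 5 (binary 0000 0101): bit order differs.
my $five       = "\x05";
my $descending = unpack "B8", $five;     # most significant bit first
my $ascending  = unpack "b8", $five;     # least significant bit first

print "C: $unsigned  c: $signed\n";
print "B8: $descending  b8: $ascending\n";
```

The B form matches how bits are usually written on paper, which is why it is the natural choice for flag bytes.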
22. Unpack major/minor
my ($tag, $other) = unpack "A3 a*", $data;
# major / minor version
(my $major, my $minor, $other) = unpack "C C a*", $other;
print "major: $major\n";
print "minor: $minor\n";
23. Unpack bit flags
# B3 reads the top three bits and consumes the whole flags byte;
# keep the remainder in $other for the next step
(my $flags, $other) = unpack "B3 a*", $other;
print "flags: $flags\n";
# split the flags
my ($flag_unsynchronization, $flag_extended_header,
$flag_experimental_indicator)
= split m{}, $flags;
print " flag_unsynchronization: $flag_unsynchronization\n";
print " flag_extended_header: $flag_extended_header\n";
print " flag_experimental_indicator: $flag_experimental_indicator\n";
24. Unpack/Pack/Unpack
# The next 32 bits describe the size of the tag (excluding the 10-byte header)
my $bit_string = unpack "B32 a*", $other;
print "bit_string: $bit_string\n";
# Drop the eighth (most significant) bit of each byte,
# then pad back to 32 bits with leading zeros
$bit_string = "0000" . join '', $bit_string =~ m{.(.{7})}g;
print "bit_string: $bit_string\n";
# pack to binary
# unpack to long int ("L" uses the machine's native byte order)
my $size = unpack "L", pack "B*", $bit_string;
print "size: $size\n";
34. Big endian vs Little Endian
● Intel is Little Endian
● ID3v2 size is Big Endian
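
One way to see which byte order the local machine uses is to pack the number 1 with the native "L" specifier and look at the resulting bytes. This is only a sketch; the variable names are my own.

```perl
use strict;
use warnings;

# Pack the value 1 as a native 32-bit unsigned long and dump it as hex.
my $native = unpack "H*", pack "L", 1;

# The position of the 01 byte reveals the byte order.
my $order = $native eq "01000000" ? "little-endian"
          : $native eq "00000001" ? "big-endian"
          :                         "something else";

print "native L of 1 is $native: $order\n";
```

On a little-endian machine the least significant byte comes first, so a big-endian field read with "L" comes out with its bytes reversed.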
35. Format specifier
n   Unsigned (16-bit) in "network" (big-endian) order
N   Unsigned (32-bit) in "network" (big-endian) order
v   Unsigned (16-bit) in "VAX" (little-endian) order
V   Unsigned (32-bit) in "VAX" (little-endian) order
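
With "N" available, the ID3v2 size can be decoded the same way on any machine. The sketch below assumes four illustrative size bytes (00 00 02 01) rather than a real file, and cross-checks the bit-string approach against plain shift arithmetic.

```perl
use strict;
use warnings;

# Four size bytes as they might appear in a header (illustrative values).
my $size_bytes = "\x00\x00\x02\x01";

# Bit-string approach: drop the top bit of each byte, pad to 32 bits,
# repack, then read the result as a big-endian 32-bit integer.
my $bits = unpack "B32", $size_bytes;
$bits = "0000" . join '', $bits =~ m{.(.{7})}g;
my $size = unpack "N", pack "B32", $bits;

# Cross-check: each byte contributes 7 bits, most significant byte first.
my ($b0, $b1, $b2, $b3) = unpack "C4", $size_bytes;
my $check = ($b0 << 21) | ($b1 << 14) | ($b2 << 7) | $b3;

print "size: $size (check: $check)\n";    # size: 257 (check: 257)
```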
39. Named Unpack
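use List::MoreUtils qw(part);

# Takes the data and a flat list of name/format pairs and returns
# a hash of name => unpacked value.
# Each format is expected to produce exactly one value.
sub named_unpack {
    my ($data, $format_list) = @_;
    my %named;
    my $i = 0;
    # even positions are names, odd positions are formats
    my ($lefts, $rights) = part { $i++ % 2 } @$format_list;
    my $format = join ('', @$rights);
    @named{@$lefts} = unpack ($format, $data);
    return %named;
}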
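
A possible way to call such a function is sketched below. To keep the example runnable without List::MoreUtils, it uses a core-only variant named named_unpack_core, which is a hypothetical rewrite for illustration and not the version from the slide. The header bytes are synthetic.

```perl
use strict;
use warnings;

# Hypothetical core-only variant: same idea, no List::MoreUtils needed.
sub named_unpack_core {
    my ($data, $format_list) = @_;
    my @pairs = @$format_list;    # copy, so the caller's list is untouched
    my (@names, @formats);
    while (my ($name, $format) = splice @pairs, 0, 2) {
        push @names,   $name;
        push @formats, $format;
    }
    my %named;
    @named{@names} = unpack join(' ', @formats), $data;
    return %named;
}

# Synthetic start of an ID3v2 header: tag, major version, minor version.
my $header = pack "A3 C C", "ID3", 3, 0;

my %h = named_unpack_core(
    $header,
    [ tag => "A3", major => "C", minor => "C" ],
);

print "$h{tag} v2.$h{major}.$h{minor}\n";    # ID3 v2.3.0
```

An array reference is used instead of a hash so that the order of the fields is preserved, since the order decides which bytes each name receives.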