This article introduces installing and using PostgreSQL modules.
- Learn how to query the key-value pairs with hstore
- Store and validate ISBN numbers with isn
- Store encrypted data with chkpass
- Do partial keyword matching (fuzzy string matching) with fuzzystrmatch
Admin How To
Installing and Using PostgreSQL Modules
In this article, we will learn how to install and use the PostgreSQL modules chkpass,
fuzzystrmatch, isn and hstore. Modules add different capabilities to a database, like
admin and monitoring tools, new data types, operators, functions and algorithms.
Let’s look at modules that add new data types and algorithms, which will help us to
push some of the application logic to the database.
PostgreSQL has been called the ‘most advanced open source database’. I have been using
it for the last four years as an RDBMS for Foodlets.in, and as a spatial data store at
CSTEP (Center for Study of Science, Technology and Policy). PostgreSQL is one piece of
software that keeps impressing me.

Installing the modules
Note: I am running Ubuntu 10.04 and PostgreSQL 8.4.

Install the postgresql-contrib package and restart the database server, then check the
contrib directory for the list of available modules:

sudo apt-get install postgresql-contrib
sudo /etc/init.d/postgresql-8.4 restart
cd /usr/share/postgresql/8.4/contrib/
ls

Create a test database called module_test:

su postgres
createdb module_test

Apply the chkpass, fuzzystrmatch, isn and hstore modules to the module_test database by
running the following commands:

psql -d module_test -f chkpass.sql
psql -d module_test -f fuzzystrmatch.sql
psql -d module_test -f isn.sql
psql -d module_test -f hstore.sql

Let us now look at an example of how each of the modules is used.

Using chkpass
The chkpass module introduces a new data type, ‘chkpass’, in the database. This type is
used to store an encrypted field, e.g., a password. Let’s see how chkpass works for a
user account table that we create and insert two rows into:
88 | March 2012 | LINUX For You | www.LinuxForU.com
CREATE TABLE accounts (username varchar(100), password chkpass);
INSERT INTO accounts(username, "password") VALUES ('user1', 'pass1');
INSERT INTO accounts(username, "password") VALUES ('user2', 'pass2');

We can authenticate users with a query like the one that follows:

SELECT count(*) FROM accounts WHERE username = 'user1' AND password = 'pass1';

The ‘=’ operator uses the eq(column_name, text) function in the module to test for
equality. Chkpass uses the Unix crypt() function, and hence it is weak; only the first
eight characters of the text are used in the algorithm. Chkpass has limited practical
use; the pgcrypto module is an effective alternative.

Using isn
This module introduces data types to store international standard numbers like
International Standard Book Numbers (ISBN), International Standard Music Numbers (ISMN),
International Standard Serial Numbers (ISSN), Universal Product Codes (UPC), etc. It also
adds functions to validate data, and to type-cast numbers from older formats to the
newer 13-digit formats and vice versa. Let’s test this module for storing book
information:

CREATE TABLE books(number isbn13, title varchar(100));
INSERT INTO books("number", title) VALUES ('978-03', 'Rework');

The INSERT statement throws an error: Invalid input syntax for ISBN number: "978-03".
However, this works just fine:

INSERT INTO books("number", title) VALUES ('978-0307463746', 'Rework');
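
To give a feel for the kind of validation involved, here is a minimal Python sketch of the standard ISBN-13 checksum: the digits are weighted 1, 3, 1, 3, ... and the total must be a multiple of 10. This is only an illustration. The function name isbn13_is_valid is made up for this example, and the isn module's own checks are not reproduced here.

```python
DIGITS = "0123456789"


def isbn13_is_valid(text: str) -> bool:
    """Return True if text is a 13-digit ISBN (hyphens allowed) with a correct check digit."""
    # Only ASCII digits and hyphens are accepted in this sketch.
    if any(ch not in DIGITS + "-" for ch in text):
        return False
    digits = [int(ch) for ch in text if ch in DIGITS]
    if len(digits) != 13:
        return False
    # Weights alternate 1, 3, 1, 3, ... over all 13 digits, check digit included.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


print(isbn13_is_valid("978-03"))          # False: far too short
print(isbn13_is_valid("978-0307463746"))  # True: weighted sum is 100
```

The two calls mirror the two INSERT statements: the truncated value is rejected, while the full number passes the checksum.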
To convert a 10-digit ISBN to 13 digits, use the isbn13() function:

INSERT INTO books("number", title) VALUES (isbn13('0307463745'), 'Rework');

(Incidentally, the book mentioned here, ‘Rework’ by Jason Fried, happens to be my
favourite book on product/project management! I have recommended it to all my
team-mates.)

Using fuzzystrmatch
This module installs the soundex(), difference(), levenshtein() and metaphone()
functions. Soundex() and metaphone() are phonetic algorithms: they convert a text string
to a code string based on its pronunciation. Difference() and levenshtein() return a
numeric value based on the similarity of the two input strings. Let’s now look into the
levenshtein() and metaphone() functions. The Levenshtein distance between two strings is
the minimum number of insertions, deletions or substitutions required to convert one
string to another.
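
The definition can be made concrete with a short Python sketch of the classic dynamic-programming computation. This illustrates the distance itself and is not the module's source code; every edit is assumed to cost 1.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    # previous[j] = distance between the processed prefix of source and target[:j].
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning source[:i] into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("foodlets", "booklets"))  # 2
```

Only two rows of the table are kept at any time, so memory use grows with the length of the second string rather than with the product of both lengths.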
With the module installed, the distance can be computed directly in SQL:

SELECT levenshtein('foodlets', 'booklets');

This query returns 2, since two substitutions (‘f’ to ‘b’ and ‘d’ to ‘k’) turn one word
into the other.
The metaphone() function takes a text string and the maximum length of the output code
as its two input parameters. These examples both return FTLTS:

SELECT metaphone('foodlets', 6);
SELECT metaphone('fudlets', 6);

If we compute the Levenshtein distance between the returned strings, the result is 0:

SELECT levenshtein('FTLTS', 'FTLTS');

This means that the two words sound similar. Fuzzystrmatch is very helpful in
implementing the search feature for a website, since the search can then work with
alternate spellings and misspelled keywords. Reminds you of the ‘Did you mean...’
feature on Google Search, right?

Using hstore
You must have heard enough about NoSQL and key-value databases. It’s not always NoSQL vs
relational databases: with the hstore module, PostgreSQL allows you to store data in the
form of key-value pairs within a column of a table. Imagine you are processing
spreadsheets and you have no idea about the column headers and the data types of the
data in the sheets. That’s when hstore comes to your rescue! Incidentally, hstore takes
keys and values as text; the value can be NULL, but not the key. Let’s create a table
with a column of type hstore and insert some rows:

CREATE TABLE kv_data(id integer, data hstore);
INSERT INTO kv_data VALUES
(1, hstore('name', 'amit') || hstore('city', 'bangalore')),
(2, hstore('name', 'raghu') || hstore('age', '26')),
(3, hstore('name', 'ram') || hstore('age', '28'));

You can create your own keys like ‘height’, ‘favourite_book’, etc. The ‘||’ operator is
used for concatenation. Now that we have a table and a few rows of data, let’s look at
some SELECT, UPDATE and DELETE queries. To select rows with the value for ‘city’ as
‘bangalore’, use the following query:
SELECT * FROM kv_data WHERE data->'city' = 'bangalore';
To get the average age across the table (returns 27.0), use
the query given below:
SELECT avg((data->'age')::integer) AS age FROM kv_data;
Here, ::integer is used to type-cast the text value to an
integer, so that math operations can be performed on it.
To select and sort rows by ‘name’ values, use:
SELECT * FROM kv_data ORDER BY data->'name' DESC;
Update the ‘city’ value to ‘delhi’ for all rows, as follows:
UPDATE kv_data SET data = data || ('city' => 'delhi');
Then, delete the ‘age’ key (and values) from all rows, as
shown below:
UPDATE kv_data SET data = delete(data, 'age');
Next, delete rows with the ‘name’ as ‘amit’:
DELETE FROM kv_data WHERE data->'name' = 'amit';
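
One detail in these queries is easy to miss: row 1 has no ‘age’ key, yet the average still comes out as 27. The ‘->’ operator yields NULL for a missing key, and avg() skips NULLs. The rough Python analogue below models each hstore value as a dict of text keys and text values to make that behaviour visible. It is only a mental model of the queries above, not how hstore is implemented.

```python
# Each hstore value is modelled as a dict mapping text keys to text values.
rows = [
    {"id": 1, "data": {"name": "amit", "city": "bangalore"}},
    {"id": 2, "data": {"name": "raghu", "age": "26"}},
    {"id": 3, "data": {"name": "ram", "age": "28"}},
]

# data->'city' = 'bangalore': a missing key gives None (NULL), which never matches.
in_bangalore = [r["id"] for r in rows if r["data"].get("city") == "bangalore"]

# avg((data->'age')::integer): cast text to int, skipping rows without the key,
# just as SQL aggregates skip NULLs.
ages = [int(r["data"]["age"]) for r in rows if r["data"].get("age") is not None]
average_age = sum(ages) / len(ages)

# data || ('city' => 'delhi'): the right-hand pair overrides or adds the key.
for r in rows:
    r["data"] = {**r["data"], "city": "delhi"}

# delete(data, 'age'): drop the key where present, leave other rows unchanged.
for r in rows:
    r["data"] = {k: v for k, v in r["data"].items() if k != "age"}

# DELETE ... WHERE data->'name' = 'amit'
rows = [r for r in rows if r["data"].get("name") != "amit"]

print(in_bangalore, average_age, [r["id"] for r in rows])
```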
Although not a full-fledged key-value store, hstore does provide us with the flexibility
of a key-value database and the power of SQL queries.
Other useful modules
Here are some other modules you may find useful:
• Pgcrypto provides functions for hashing and
encryption. It supports SHA, MD5, Blowfish, AES
and other algorithms.
• Citext adds a case-insensitive text data type, which
compares values without regard to case while keeping the
stored text in its original case.
• Uuid-ossp provides functions to generate universally
unique identifiers.
• Pg_trgm adds functions to find text similarity based
on trigram matching.
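
To illustrate what trigram matching means in the last item, here is a simplified Python sketch: each word is lower-cased and padded with two spaces in front and one behind, every run of three characters becomes a trigram, and similarity is the share of trigrams the two strings have in common. The function names are chosen for this example, and the word splitting is an approximation that only handles ASCII letters and digits.

```python
import re


def trigrams(text: str) -> set:
    """Collect the set of three-character sequences from each padded word."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("cat")))
print(round(similarity("word", "two words"), 4))
```

A misspelled keyword usually still shares most of its trigrams with the intended word, which is why this measure suits search features.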
By: Sagar Arlekar
The author is a research engineer at CSTEP, Bengaluru. He
works in the domains of GIS and agent-based simulations. He
co-founded Foodlets.in, a visual food guide built entirely on
open source technologies.