3. RegEx Parsing
What is RegEx?
Regular expressions, commonly known as “RegEx”, are one of the most popular
and widely used technologies for extracting specific data from large bodies
of text.
A regular expression is a pattern: a sequence of ordinary characters and special
characters (known as “metacharacters”) that lets you concisely and flexibly
“match” or “capture” (specify and recognize) strings of text, such as
particular characters, words, or patterns of characters.
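As a minimal sketch of the difference between "matching" and "capturing", the snippet below uses Python's standard `re` module. The sample sentence and the variable names are invented for this illustration and are not part of the original slides.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "Order 66 shipped to Alice"

# "Match": find the first run of digits anywhere in the text.
match = re.search(r"\d+", text)
first_number = match.group(0) if match else None

# "Capture": parentheses mark the part of the match we want to keep.
captured = re.search(r"to (\w+)", text)
recipient = captured.group(1) if captured else None

print(first_number)  # 66
print(recipient)     # Alice
```

The first pattern only recognizes a piece of text; the second also pulls out a sub-part of what it recognized.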
What is RegEx Parsing?
RegEx parsing is the technique of using regular expressions to extract only
the required data from a large text while ignoring all the other, unnecessary
content.
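The idea can be sketched in a few lines of Python. The log excerpt and its `user=...` format are assumptions made up for this example; the point is only that one pattern keeps the wanted data and ignores the rest.

```python
import re

# Hypothetical log excerpt; the format is invented for this illustration.
log = """\
2024-01-05 INFO  user=alice action=login
2024-01-05 WARN  disk almost full
2024-01-06 INFO  user=bob action=logout
"""

# Keep only the user names; everything else in the text is ignored.
users = re.findall(r"user=(\w+)", log)

print(users)  # ['alice', 'bob']
```

When the pattern contains one capturing group, `re.findall` returns just the captured parts, which is usually what a parsing task needs.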
4. RegEx Parsing
Which tools are used for parsing?
Many tools are available for RegEx parsing.
Some of them are:
1) RegexBuddy
2) RegexPal
3) RegexMagic
5) RegexPlanet
6) Rubular
5. RegEx Parsing
How does it work?
What makes regular expressions simple and useful is their
syntax. The syntax is declarative: the pattern “looks like”
what you want to match. The other reason regular expressions
caught on so quickly is their rich and powerful set of
metacharacters. Each metacharacter has its own meaning and,
alone or combined with others, makes a regular expression
more expressive.
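To illustrate what "declarative" means here, the sketch below checks the same text shape twice in Python: once with hand-written steps and once with a pattern. The function names and the `dd:dd` shape are hypothetical choices for this example, and the check covers only the shape, not whether the value is a valid time.

```python
import re

DIGITS = "0123456789"

def looks_like_time_manual(s: str) -> bool:
    """Step-by-step check for the shape 'dd:dd' (two digits, colon, two digits)."""
    return (
        len(s) == 5
        and s[0] in DIGITS and s[1] in DIGITS
        and s[2] == ":"
        and s[3] in DIGITS and s[4] in DIGITS
    )

def looks_like_time_regex(s: str) -> bool:
    """Same check; the pattern reads like the text it describes."""
    # re.ASCII restricts \d to 0-9 so both functions accept the same inputs.
    return re.fullmatch(r"\d\d:\d\d", s, flags=re.ASCII) is not None

for candidate in ["09:30", "9:30", "ab:cd"]:
    print(candidate, looks_like_time_manual(candidate), looks_like_time_regex(candidate))
```

The manual version spells out how to check each position, while the pattern `\d\d:\d\d` simply describes what a matching string looks like.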
6. Metacharacters of Regular Expression
\d - Matches a digit character.
\s - Matches any whitespace character, including space, tab, and form feed.
\w - Matches any word character (letters, digits, and the underscore).
\n - Matches a newline character.
* - Matches the preceding subexpression zero or more times.
+ - Matches the preceding subexpression one or more times.
? - Matches the preceding subexpression zero or one time.
[a-z] - A range of characters. Matches any character in the specified range.
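The following sketch tries each metacharacter from the list above with Python's `re` module. The sample strings are invented for illustration; other regex engines may differ in details such as which characters count as word characters.

```python
import re

# pattern -> sample text; the sample strings are invented for this illustration.
examples = {
    r"\d": "a1b22",             # digits
    r"\s": "a b\tc",            # whitespace (space, tab)
    r"\w": "a_1!",              # word characters, including underscore
    r"\n": "one\ntwo",          # newline
    r"ab*": "a ab abb",         # 'a' followed by zero or more 'b'
    r"ab+": "a ab abb",         # 'a' followed by one or more 'b'
    r"colou?r": "color colour", # optional 'u'
    r"[a-z]": "aBcD",           # lowercase range only
}

results = {pattern: re.findall(pattern, text) for pattern, text in examples.items()}

for pattern, found in results.items():
    print(f"{pattern!r:12} -> {found}")
```

Comparing `ab*` with `ab+` on the same text shows the practical difference between "zero or more" and "one or more": only the first also matches a lone `a`.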
8. Advantages
It is a compact way of describing sets of strings that conform to a pattern.
The regular expression syntax is declarative: the pattern "looks like" what
you want to match.
Supported by many different programming languages.
Often easier for non-programmers than writing code.
Often less error-prone than hand-written parsing code.
Used in compiler construction, for example in lexical analysis.
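As a rough sketch of the compiler-construction use, regular expressions can describe the tokens of a language. The token names and the toy arithmetic language below are assumptions made for this example, not something defined in the slides.

```python
import re

# Hypothetical token set for a toy arithmetic language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),       # integer literal
    ("IDENT",  r"[a-z]\w*"),  # identifier starting with a lowercase letter
    ("OP",     r"[+*=-]"),    # single-character operator
    ("SKIP",   r"\s+"),       # whitespace, discarded
]
# Combine all token patterns into one alternation of named groups.
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source: str) -> list[tuple[str, str]]:
    """Split source into (token_type, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("x = 3 + y1 * 42"))
```

Each token type is described by one small pattern, so adding a new kind of token means adding one line to the table rather than writing new scanning code.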