Regular Expression
R.Rajkumar
Asst.Professor
CSE
Lexical analyzer
• Lexical analysis, also called scanning, is the phase of compilation that
reads the source program character by character. The higher-level parts of
the compiler call the lexical analyzer with the request "get the next word
from the input", and it is the scanner's job to sort through the input
characters and find that word.
• The types of "words" commonly found in a program are:
• programming language keywords, such as if, while, struct, int, etc.
• operator symbols such as =, +, -, &&, !, <=, etc.
• other special symbols such as ( ), { }, [ ], ;, &, etc.
• constants such as 1, 2, 3, 'a', 'b', 'c', "any quoted string", etc.
• variable and function names (called identifiers) such as x, i, t1, etc.
• Some languages (such as C) are case sensitive, in that they differentiate
between e.g. if and IF; thus the former is a keyword, while the latter is an
ordinary identifier.
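The word categories listed above can be sketched as a small C classifier. This is only an illustration under stated assumptions: the names `word_kind` and `classify_word` are hypothetical, the keyword list is deliberately incomplete, and the word is assumed to have been separated from its neighbours already.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical word categories, named here for illustration only. */
enum word_kind { KEYWORD, IDENTIFIER, CONSTANT, SYMBOL, UNKNOWN };

/* A tiny, deliberately incomplete keyword list (case sensitive, as in C). */
static const char *const keywords[] = { "if", "while", "struct", "int" };

/* Classify one already-separated word. */
enum word_kind classify_word(const char *w)
{
    size_t i, n = sizeof keywords / sizeof keywords[0];

    assert(w != NULL);
    if (w[0] == '\0')
        return UNKNOWN;

    /* strcmp is case sensitive, so "if" matches but "IF" does not. */
    for (i = 0; i < n; i++)
        if (strcmp(w, keywords[i]) == 0)
            return KEYWORD;

    /* Identifier: a letter or underscore, then letters, digits, underscores. */
    if (isalpha((unsigned char)w[0]) || w[0] == '_') {
        for (i = 1; w[i] != '\0'; i++)
            if (!isalnum((unsigned char)w[i]) && w[i] != '_')
                return UNKNOWN;
        return IDENTIFIER;
    }

    /* Constant: digits only, or text opened and closed by the same quote. */
    if (isdigit((unsigned char)w[0])) {
        for (i = 1; w[i] != '\0'; i++)
            if (!isdigit((unsigned char)w[i]))
                return UNKNOWN;
        return CONSTANT;
    }
    if (w[0] == '\'' || w[0] == '"') {
        size_t len = strlen(w);
        return (len >= 2 && w[len - 1] == w[0]) ? CONSTANT : UNKNOWN;
    }

    /* Pure punctuation is treated as an operator or special symbol. */
    for (i = 0; w[i] != '\0'; i++)
        if (!ispunct((unsigned char)w[i]))
            return UNKNOWN;
    return SYMBOL;
}
```

With this sketch, `classify_word("if")` yields `KEYWORD` while `classify_word("IF")` yields `IDENTIFIER`, which mirrors the case-sensitivity point made for C.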
Tokens
• Also, most languages insist that identifiers cannot be any of the keywords or
contain operator symbols (some versions of Fortran don't, which makes lexical analysis
quite difficult).
• In addition to the basic grouping process, lexical analysis usually performs the
following tasks:
• Since there are only a finite number of types of words, instead of passing the actual
word to the next phase we can save space by passing a suitable representation. This
representation is known as a token.
• If the language isn't case sensitive, we can eliminate differences in case at this
point by using just one token per keyword, irrespective of case; e.g.
    #define IF_TOKEN 1
    #define WHILE_TOKEN 2
    .....
    if we meet "IF", "If", "iF", "if" then return IF_TOKEN
    if we meet "WHILE", "While", "WHile", ... then return WHILE_TOKEN
• We can pick out mistakes in the lexical syntax of the program, such as the use of a
character which is not valid in the language. (Note that we do not worry about the
combination of patterns; e.g. the sequence of characters "+*" would be returned
as PLUS_TOKEN, MULT_TOKEN, and it would be up to the next phase to see that
these should not follow in sequence.)
• We can eliminate pieces of the program that are no longer relevant, such as spaces,
tabs, carriage-returns (in most languages), and comments.
• In order to specify the lexical analysis process, what we need is some method of
describing which patterns of characters correspond to which words.
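The tasks above can be sketched as a very small hand-written scanner in C. It is a minimal sketch, not a complete lexer: `IF_TOKEN`, `WHILE_TOKEN`, `PLUS_TOKEN` and `MULT_TOKEN` follow the names used in the slide, while `ID_TOKEN`, `ERROR_TOKEN`, `END_TOKEN`, `same_word` and `next_token` are hypothetical names introduced for illustration. It assumes a case-insensitive language and does not handle comments, constants or other operators.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define END_TOKEN    0   /* hypothetical: end of input */
#define IF_TOKEN     1
#define WHILE_TOKEN  2
#define PLUS_TOKEN   3
#define MULT_TOKEN   4
#define ID_TOKEN     5   /* hypothetical: any name that is not a keyword */
#define ERROR_TOKEN  6   /* hypothetical: character not valid in the language */

/* Compare a word of length n against a lower-case keyword, ignoring case. */
static int same_word(const char *s, size_t n, const char *kw)
{
    size_t i;
    if (strlen(kw) != n)
        return 0;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char)s[i]) != kw[i])
            return 0;
    return 1;
}

/* Return the next token starting at *p and advance *p past it. */
int next_token(const char **p)
{
    const char *s, *start;
    size_t n;

    assert(p != NULL && *p != NULL);
    s = *p;

    /* Spaces, tabs and line ends carry no meaning here, so drop them. */
    while (*s == ' ' || *s == '\t' || *s == '\r' || *s == '\n')
        s++;

    if (*s == '\0') { *p = s;     return END_TOKEN; }
    if (*s == '+')  { *p = s + 1; return PLUS_TOKEN; }
    if (*s == '*')  { *p = s + 1; return MULT_TOKEN; }

    if (isalpha((unsigned char)*s)) {
        start = s;
        while (isalnum((unsigned char)*s))
            s++;
        n = (size_t)(s - start);
        *p = s;
        /* One token per keyword, whatever the case of its letters. */
        if (same_word(start, n, "if"))    return IF_TOKEN;
        if (same_word(start, n, "while")) return WHILE_TOKEN;
        return ID_TOKEN;
    }

    /* Anything else is a lexical error; skip one character and report it. */
    *p = s + 1;
    return ERROR_TOKEN;
}
```

Calling `next_token` repeatedly on the input `"+*"` returns `PLUS_TOKEN` and then `MULT_TOKEN` without complaint; deciding that the two should not appear in sequence is left to the next phase, as the slide notes.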
Regular Expressions
• Regular expressions are used to define patterns of characters; they are used in UNIX tools
such as awk, grep, vi and, of course, lex.
• A regular expression is just a form of notation, used for describing sets of words. For any
given set of characters Σ, a regular expression over Σ is defined by:
• The empty string, ε, which denotes a string of length zero, and means "take nothing from
the input". It is most commonly used in conjunction with other regular expressions, e.g. to
denote optionality.
• Any character in Σ may be used in a regular expression. For instance, if we write a as a
regular expression, this means "take the letter a from the input"; i.e. it denotes the
(singleton) set of words {"a"}.
• The union operator, "|", which denotes the union of two sets of words. Thus the regular
expression a|b denotes the set {"a", "b"}, and means "take either the letter a or the
letter b from the input".
• Writing two regular expressions side by side is known as concatenation; thus the regular
expression ab denotes the set {"ab"} and means "take the character a followed by the
character b from the input".
• The Kleene closure of a regular expression, denoted by "*", indicates zero or more
occurrences of that expression. Thus a* denotes the (infinite) set {ε, "a", "aa", "aaa", ...} and
means "take zero or more a's from the input".
• Brackets may be used in a regular expression to enforce precedence or increase clarity.
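The five building blocks above (empty string, single character, union, concatenation, Kleene closure) can be sketched as a tree type with a direct membership test. This is an illustrative sketch only and is not how lex works internally: the names `re`, `re_kind`, `re_step` and `re_matches` are hypothetical, the tree is built by hand rather than parsed from text, and inputs are limited to 31 characters so that a set of input positions fits in one 32-bit word.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tree form of a regular expression over plain chars. */
enum re_kind { RE_EPSILON, RE_CHAR, RE_UNION, RE_CONCAT, RE_STAR };

struct re {
    enum re_kind kind;
    char ch;                    /* used by RE_CHAR */
    const struct re *left;      /* used by RE_UNION, RE_CONCAT, RE_STAR */
    const struct re *right;     /* used by RE_UNION, RE_CONCAT */
};

/*
 * Bit i of `starts` means "matching may begin at s[i]".
 * The result has bit j set when r can take s[i..j) for some such i.
 */
static uint32_t re_step(const struct re *r, const char *s, size_t n,
                        uint32_t starts)
{
    uint32_t ends = 0, prev;
    size_t i;

    switch (r->kind) {
    case RE_EPSILON:            /* take nothing from the input */
        return starts;
    case RE_CHAR:               /* take exactly one matching character */
        for (i = 0; i < n; i++)
            if (((starts >> i) & 1u) && s[i] == r->ch)
                ends |= (uint32_t)1 << (i + 1);
        return ends;
    case RE_UNION:              /* either alternative */
        return re_step(r->left, s, n, starts)
             | re_step(r->right, s, n, starts);
    case RE_CONCAT:             /* left, then right from where left stopped */
        return re_step(r->right, s, n, re_step(r->left, s, n, starts));
    case RE_STAR:               /* zero or more: iterate to a fixed point */
        ends = starts;
        do {
            prev = ends;
            ends |= re_step(r->left, s, n, ends);
        } while (ends != prev);
        return ends;
    }
    return 0;
}

/* Return 1 if the whole string s is in the set denoted by r, else 0. */
int re_matches(const struct re *r, const char *s)
{
    size_t n;

    assert(r != NULL && s != NULL);
    n = strlen(s);
    if (n > 31)
        return 0;               /* outside the range this sketch supports */
    return (int)((re_step(r, s, n, 1u) >> n) & 1u);
}
```

For example, a node of kind `RE_STAR` whose `left` is the `RE_CHAR` node for a accepts the empty string, "a", "aa" and so on, while a `RE_UNION` of a with an `RE_EPSILON` node expresses an optional a.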
Thompson's Algorithm for converting RE to NFA