How to create a programming language
TABLE OF CONTENT

 1. Table of Content
 2. Introduction
       1. Summary
       2. About The Author
       3. Before We Begin
 3. Overview
       1. The Four Parts of a Language
       2. Meet Awesome: Our Toy Language
 4. Lexer
       1. Lex (Flex)
       2. Ragel
       3. Python Style Indentation For Awesome
       4. Do It Yourself I
 5. Parser
       1. Bison (Yacc)
       2. Lemon
       3. ANTLR
       4. PEGs
       5. Operator Precedence
       6. Connecting The Lexer and Parser in Awesome
       7. Do It Yourself II
 6. Runtime Model
       1. Procedural
       2. Class-based
       3. Prototype-based
       4. Functional
       5. Our Awesome Runtime
       6. Do It Yourself III
 7. Interpreter
       1. Do It Yourself IV
 8. Compilation
       1. Using LLVM from Ruby
       2. Compiling Awesome to Machine Code
 9. Virtual Machine
       1. Byte-code
       2. Types of VM
       3. Prototyping a VM in Ruby
10. Going Further
       1. Homoiconicity
       2. Self-Hosting
       3. What’s Missing?
11. Resources
       1. Books & Papers
       2. Events
       3. Forums and Blogs
       4. Interesting Languages
12. Solutions to Do It Yourself
       1. Solutions to Do It Yourself I
       2. Solutions to Do It Yourself II
       3. Solutions to Do It Yourself III
       4. Solutions to Do It Yourself IV
13. Appendix: Mio, a minimalist homoiconic language
       1. Homoicowhat?
       2. Messages all the way down
       3. The Runtime
       4. Implementing Mio in Mio
       5. But it’s ugly
14. Farewell!
Published November 2011.

Cover background image © Asja Boros

Content of this book is © Marc-André Cournoyer. All rights reserved. This eBook copy
is for a single user. You may not share it in any way unless you have written permission
of the author.
This is a sample chapter.
Buy the full book online at
   createyourproglang.com
PARSER

    By themselves, the tokens output by the lexer are just building blocks. The parser
    contextualizes them by organizing them in a structure. The lexer produces an array of
    tokens; the parser produces a tree of nodes.

    Let’s take those tokens from the previous section:


     [IDENTIFIER print] [STRING "I ate"] [COMMA]
                        [NUMBER 3] [COMMA]
                        [IDENTIFIER pies]



    The most common parser output is an Abstract Syntax Tree, or AST. It’s a tree of
    nodes that represents what the code means to the language. The previous lexer
    tokens will produce the following:


     [<Call name=print,
            arguments=[<String value="I ate">,
                       <Number value=3>,
                       <Local name=pies>]
     >]



    Or as a visual tree:

    Figure 2
    The parser found that print is a method call and that the tokens following it are
    its arguments.
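
    As a rough illustration, the tree above could be modelled in Ruby with a few small
    node classes. This is only a sketch: the class names (CallNode, StringNode,
    NumberNode, LocalNode) and the dump helper are assumptions that mirror the printed
    AST, not the node classes the book itself defines.

```ruby
# Hypothetical node classes; the names mirror the AST printed above.
CallNode   = Struct.new(:name, :arguments)
StringNode = Struct.new(:value)
NumberNode = Struct.new(:value)
LocalNode  = Struct.new(:name)

# The tree for: print "I ate", 3, pies
AST = [
  CallNode.new("print", [
    StringNode.new("I ate"),
    NumberNode.new(3),
    LocalNode.new("pies")
  ])
]

# Walk the tree and return one line per node, indented by depth.
def dump(node, depth = 0)
  pad = "  " * depth
  case node
  when Array
    node.flat_map { |child| dump(child, depth) }
  when CallNode
    ["#{pad}Call #{node.name}"] + node.arguments.flat_map { |arg| dump(arg, depth + 1) }
  when StringNode
    ["#{pad}String #{node.value.inspect}"]
  when NumberNode
    ["#{pad}Number #{node.value}"]
  when LocalNode
    ["#{pad}Local #{node.name}"]
  end
end

puts dump(AST)
```

    The call node sits at the root and owns its arguments as children, which is the
    structure the flat token array could not express.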

    Parser generators are commonly used to accomplish the otherwise tedious task of
    building a parser. Much like the English language, a programming language needs a
    grammar to define its rules. The parser generator will convert this grammar into a parser
    that will compile lexer tokens into AST nodes.


    BISON (YACC)

    Bison is a modern version of Yacc, the most widely used parser generator. Yacc stands
    for Yet Another Compiler Compiler, because it compiles the grammar to a compiler of
    tokens. It’s used in several mainstream languages, like Ruby. Most often used with Lex, it
    has been ported to several target languages.

             Racc for Ruby
             Ply for Python
             JavaCC for Java

    Like Lex, from the previous chapter, Yacc compiles a grammar into a parser. Here’s how
    a Yacc grammar rule is defined:


     Call: /* Name of the rule */
         Expression '.' IDENTIFIER                     { $$ = CallNode_new($1, $3, NULL); }
     | Expression '.' IDENTIFIER '(' ArgList ')'       { $$ = CallNode_new($1, $3, $5); }
     /*     $1       $2        $3     $4   $5      $6     <= values from the rule are stored in
                                                           these variables. */
     ;



    On the left is defined how the rule can be matched using tokens and other rules.
    On the right side, between braces, is the action to execute when the rule matches.
    In that block, we can reference the values of the tokens and rules being matched
    using $1, $2, etc. Finally, we store the result in $$.
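
    To make the role of an action concrete, here is a small Ruby simulation of the two
    alternatives of the Call rule. It does not use Yacc or Racc; it only imitates what the
    generated parser does when a rule matches. In Racc grammars the matched values are
    available as val[0], val[1], ... (Yacc's $1, $2, ...) and the action assigns to
    result (Yacc's $$). The names MethodCallNode and ACTIONS are assumptions made up for
    this sketch.

```ruby
# Hypothetical node class standing in for the C CallNode of the rule above.
MethodCallNode = Struct.new(:receiver, :method_name, :arguments)

# Each action receives the values matched by the rule, in order
# (Yacc's $1, $2, ...), and returns the value of the whole rule (Yacc's $$).
ACTIONS = {
  # Call: Expression '.' IDENTIFIER
  call_without_args: ->(val) { MethodCallNode.new(val[0], val[2], nil) },
  # Call: Expression '.' IDENTIFIER '(' ArgList ')'
  call_with_args:    ->(val) { MethodCallNode.new(val[0], val[2], val[4]) }
}

# Values the parser would have collected while matching  receiver.size(1, 2)
matched = ["receiver", ".", "size", "(", [1, 2], ")"]
node = ACTIONS[:call_with_args].call(matched)

puts node.method_name
```

    Only the first, third and fifth values are kept; the punctuation tokens are needed to
    recognize the rule but carry no meaning of their own in the resulting node.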


LEMON

Lemon is quite similar to Yacc, with a few differences. From its website:

                    Using a different grammar syntax which is less prone to programming
                    errors.

                    The parser generated by Lemon is both re-entrant and thread-safe.

                    Lemon includes the concept of a non-terminal destructor, which
                    makes it much easier to write a parser that does not leak memory.


For more information, refer to the manual or check real examples inside Potion.


ANTLR

ANTLR is another parsing tool. This one lets you declare lexing and parsing rules in
the same grammar. It has been ported to several target languages.


PEGS

Parsing Expression Grammars, or PEGs, are very powerful at parsing complex
languages. I’ve used a PEG parser generated with peg/leg in tinyrb to parse Ruby’s
infamous syntax with encouraging results (tinyrb’s grammar).

Treetop is an interesting Ruby tool for creating PEGs.


OPERATOR PRECEDENCE

One of the common pitfalls of language parsing is operator precedence. Parsing
x + y * z should not produce the same result as (x + y) * z, and the same goes for all
other operators. Each language has an operator precedence table, often based on
    the mathematical order of operations. Several ways to handle this exist. Yacc-based
    parsers let you give a precedence level to each kind of operator and use those levels
    to decide which operator to group first, much as the Shunting Yard algorithm does.
    Operators are declared in Bison and Yacc with the %left and %right declarations.
    Read more in Bison’s manual.

    Here’s the operator precedence table for our language, based on the C language
    operator precedence:


     left   '.'
     right  '!'
     left   '*' '/'
     left   '+' '-'
     left   '>' '>=' '<' '<='
     left   '==' '!='
     left   '&&'
     left   '||'
     right  '='
     left   ','



    The higher the precedence (top is higher), the sooner the operator will be parsed. If
    the line a + b * c is being parsed, the part b * c will be parsed first since *
    has higher precedence than +. Now, if several operators having the same
    precedence are competing to be parsed all at once, the conflict is resolved using
    associativity, declared with the left and right keywords before the token. For
    example, take the expression a = b = c. Since = has right-to-left associativity,
    parsing starts from the right with b = c, resulting in a = (b = c).
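
    The effect of precedence levels and associativity can be tried out with a small
    hand-written parser. The sketch below uses precedence climbing, a different technique
    from the table-driven parser Yacc generates, chosen here only because it fits in a
    few lines. The OPERATORS table covers a subset of the binary operators listed above,
    and the numeric levels and the parse_expression name are assumptions of this example.

```ruby
# Precedence level (higher binds tighter) and associativity for a few of
# the binary operators from the table above.
OPERATORS = {
  "="  => [1, :right],
  "||" => [2, :left],
  "&&" => [3, :left],
  "==" => [4, :left], "!=" => [4, :left],
  "+"  => [6, :left], "-"  => [6, :left],
  "*"  => [7, :left], "/"  => [7, :left]
}

# Parse `tokens` (an array of strings, consumed in place) into nested arrays
# of the form [operator, left, right].
def parse_expression(tokens, min_prec = 1)
  left = tokens.shift # an operand: a name or a number
  while (op = tokens.first) && OPERATORS.key?(op) && OPERATORS[op][0] >= min_prec
    prec, assoc = OPERATORS[op]
    tokens.shift
    # A left-associative operator only accepts tighter operators on its right;
    # a right-associative one accepts its own level again.
    next_min = assoc == :left ? prec + 1 : prec
    right = parse_expression(tokens, next_min)
    left = [op, left, right]
  end
  left
end

p parse_expression(%w[a + b * c])  # ["+", "a", ["*", "b", "c"]]
p parse_expression(%w[a = b = c])  # ["=", "a", ["=", "b", "c"]]
p parse_expression(%w[a - b - c])  # ["-", ["-", "a", "b"], "c"]
```

    The first result shows precedence at work, the second right associativity, and the
    third left associativity.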

    For other types of parsers (ANTLR and PEG) a simpler but less efficient alternative
    can be used. Simply declaring the grammar rules in the right order will produce the
    desired result:
      expression:          equality
      equality:            additive ( ( '==' | '!=' ) additive )*
      additive:            multiplicative ( ( '+' | '-' ) multiplicative )*
      multiplicative:      primary ( ( '*' | '/' ) primary )*
      primary:             '(' expression ')' | NUMBER | VARIABLE | '-' primary



     The parser will try to match rules recursively, starting from expression and
     finding its way to primary. Since multiplicative is reached last, after equality
     and additive, its operators are grouped first and so have greater precedence.
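
     The same grammar can be written by hand as a recursive-descent parser, with one
     method per rule. This is a sketch under simplifying assumptions: tokens are plain
     strings, digits stand in for NUMBER, any other word is treated as VARIABLE, and
     there is no error handling. The TinyParser class and its binary helper are names
     invented for this example.

```ruby
# One method per grammar rule; each rule calls the next, tighter-binding one.
class TinyParser
  def initialize(tokens)
    @tokens = tokens.dup
  end

  def expression
    equality
  end

  def equality
    binary(%w[== !=]) { additive }
  end

  def additive
    binary(%w[+ -]) { multiplicative }
  end

  def multiplicative
    binary(%w[* /]) { primary }
  end

  def primary
    token = @tokens.shift
    case token
    when "("
      node = expression
      @tokens.shift # consume the closing ")"
      node
    when "-"
      ["neg", primary]
    when /\A\d+\z/
      token.to_i # NUMBER
    else
      token # VARIABLE
    end
  end

  private

  # Implements the shape  operand ( ( op1 | op2 ) operand )*
  def binary(operators)
    left = yield
    while operators.include?(@tokens.first)
      op = @tokens.shift
      left = [op, left, yield]
    end
    left
  end
end

p TinyParser.new(%w[a + b * c]).expression      # ["+", "a", ["*", "b", "c"]]
p TinyParser.new(%w[( a + b ) * 2]).expression  # ["*", ["+", "a", "b"], 2]
```

     No precedence table appears anywhere; the nesting of the method calls alone decides
     that * is grouped before +.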


     CONNECTING THE LEXER AND PARSER IN AWESOME

     For our Awesome parser we’ll use Racc, the Ruby version of Yacc. It’s much harder
     to build a parser from scratch than it is to create a lexer. However, most languages
     end up with their own hand-written parser because the result is faster and provides
     better error reporting.

     The input file you supply to Racc contains the grammar of your language and is very
     similar to a Yacc grammar.


      # grammar.y
      class Parser

      # Declare tokens produced by the lexer
      token IF ELSE
      token DEF
      token CLASS
      token NEWLINE
      token NUMBER
      token STRING
      token TRUE FALSE NIL
      token IDENTIFIER
      token CONSTANT
      token INDENT DEDENT