How Programming Languages Work

Posted 2013-12-04 04:48 PM GMT

About a year ago I finished a Udacity course called "Programming Languages." This course opened my eyes to how modern programming languages work. I recommend it to anyone interested in becoming a better programmer. Here is a very brief summary of what I learned.

Grammar

Every programming language (and every human language) has a grammar. The grammar's of human languages evolve over time. Human lanaguages are so extensive, expressive, and ambiguous, that creating an exhaustive formal definition of the grammar is practically impossible. Programming languages, on the other hand, must be completely unambiguous (the computer is incapable of trying to figure out what you meant), and an exhaustive formal definitition of the grammar is a requirement.

The grammar defines the different parts of the language. For instance, a variable might be defined as any characters a-z, A-Z, 0-9, -, _, or $, but the first character must be a-z or A-Z.

Parser

The grammar is used to make a parser. The parser takes code (in the form of a string), and turns it into an Abstract Syntax Tree (commonly referred to as an AST). Parse errors occur when something in your code breaks the grammar.

Example parse error in English:

I has good grammars.

Example that will cause a parse error in JavaScript:

for (var i=0; i<5; i++ {                     // Note the parentheses is not closed here.
  console.log('Why won't this work?!?!')
}

Abstract Syntax Tree

The AST can be used by another computer program to compile the code, because an AST is a structured object that gives meaning to the different parts of the code.

Below is an example of JavaScript code and its corresponding AST (built using Esprima):

var myName = 'Steven';
{
    "type": "Program",
    "body": [
        {
            "type": "VariableDeclaration",
            "declarations": [
                {
                    "type": "VariableDeclarator",
                    "id": {
                        "type": "Identifier",
                        "name": "myName"
                    },
                    "init": {
                        "type": "Literal",
                        "value": "Steven",
                        "raw": "'Steven'"
                    }
                }
            ],
            "kind": "var"
        }
    ]
}

Compiler

The compiler "walks" the AST, and turns it into something that can be executed by the computer. Some compilers turn the code into binary, some turn the code into assembly, and some turn the code into some other high-level programming language. Compilation errors happen when the grammar is correct, but the code doesn't make any sense.

Example compilation error in English:

I have a goose, and the bus stop merry goes round.

Example compilation error Python:

def busStop():
    merry = goesRound   #goesRound was never defined.
    return merry

Interpreter

Some programming languages are not compiled at all. Instead they are interpreted. Instead of being compiled to machine code, the interpreter uses the AST to execute each instruction on the fly. Interpreted languages are slower than compiled languages, but they are generally easier and faster to develop because there is no compile step. Scripting languages like JavaScript, PHP, and Perl fall into this category.

Next week I will write a post about how all of this applies to Dust.js