Programming Languages G22.2110-001 Walter Williams
This course delves into the design and implementation of programming languages, focusing on concepts like type systems, object-oriented programming, and language attributes. Students will explore different computation models such as imperative, functional, and declarative styles, as well as the roles of compilers and interpreters. Key textbooks include works by Scott on compilers, Barnes on Ada, and Paulson on ML. Emphasis is placed on practical programming skills and theoretical understanding, aiming to equip students with the knowledge to effectively design and analyze programming languages.
Presentation Transcript
Programming Languages G22.2110-001 Walter Williams
Administrative Stuff • Homework, Exams, etc. • Weekly assignments • Programming projects • Mid-Term & Final Exams • No cheating • Join the mailing list: http://www.cs.nyu.edu/mailman/listinfo/g22_2110_001_su04 • Recitation
What’s covered in Lectures & Texts • Purpose of the course is for you to understand: • The issues involved in programming language design • The various strategies for programming and how languages support those strategies • Type systems, OO support, abstraction, concurrent & generic programming • Not just learning to program in different languages • Textbooks: • Scott – covers both compilers and programming languages. You can skip the compiler stuff. • Barnes – Ada language, used by the Defense Department and in other critical applications. • Paulson – ML language, widely used in AI, language theory, etc. • Others: • Stanley Lippman, The C++ Object Model • Bjarne Stroustrup, The Design and Evolution of C++ • The Little Schemer • Java Language Specification
Language & Communication • Human (Natural) Language • Problem Domain Language • Algorithmic Language • Documentation Language • Programming Language • Language as a tool for thought
Programming Language Stakeholders • Software Developers • Specification & Design • Coding • Compiler Writers • Maintenance Programmers • Quality Control & Support • Management
Language Attributes • Expressiveness • APL for arrays, Lisp for lists, etc. • All major languages are Turing complete • Efficiency • Of coding, compilation or execution • Readability • By programming experts, domain experts and non-experts • Scalability • Communicating parallel programmers • Modules, separate compilation and information hiding • Safety and Security • Market Attributes • Popularity => availability of programmers, tools, libraries, etc.
Models (Styles) of Computation • Imperative (Procedural) • Mutable storage – modified by assignment • Fortran, Algol, C++, Java • Functional (Applicative) • Pure mathematical functions – no side effects • ML, Haskell, Scheme • Declarative • Programs are sets of (logical) assertions • Prolog, SQL • Object Oriented • Orthogonal to the three models above • Inheritance, Polymorphism, Encapsulation
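To make the contrast between the imperative and functional styles concrete, here is a minimal sketch of the same computation written both ways. The function names are hypothetical and chosen only for illustration; Python is used because it permits both styles, not because the slides use it.

```python
from functools import reduce


def sum_squares_imperative(numbers):
    """Imperative style: a mutable accumulator modified by assignment."""
    total = 0
    for n in numbers:
        total = total + n * n  # storage changes on every iteration
    return total


def sum_squares_functional(numbers):
    """Functional style: a fold over a pure function, no reassignment."""
    return reduce(lambda acc, n: acc + n * n, numbers, 0)


print(sum_squares_imperative([1, 2, 3]))  # prints 14
print(sum_squares_functional([1, 2, 3]))  # prints 14
```

Both functions return the same value; what differs is whether the result is built by updating storage or by composing function applications.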
Compilers & Interpreters • Compiling vs. Interpreting • Compilers translate at compile time, once • Interpreters translate at runtime, every time • Front End • Syntactic Analysis: Lexical Analysis & Parsing • Semantic Analysis & Error Checking • Generates Intermediate Code • Back End • Most optimizations • Turns Intermediate Code into Executable
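The front-end stages can be observed directly in CPython, which exposes them through its standard library. The sketch below is only an analogy to the pipeline on the slide: CPython bytecode stands in for "intermediate code", and it is executed by a virtual machine rather than turned into a native executable by a back end.

```python
import ast

source = "x = 3 + 4"

# Front end: lexical analysis and parsing produce a syntax tree.
tree = ast.parse(source)

# Translation of the tree to an intermediate form (CPython bytecode).
code = compile(tree, filename="<demo>", mode="exec")

# Execution: the CPython virtual machine interprets the bytecode.
namespace = {}
exec(code, namespace)
print(namespace["x"])  # prints 7
```

The parse and compile steps happen once; the resulting code object could be executed many times without repeating them.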
Programming Environments • Development Environment • Interactive Development Environments • Smalltalk browser environment • Microsoft IDE • Development Frameworks • Swing, MFC • Language-aware editors • Libraries • Java Swing classes • C++ Standard Template Library (STL) • Libraries change much more quickly than the language • Libraries are usually very different for different languages
Lexical Issues • Lexical Elements are Tokens • Keywords, operators, punctuation, names, numbers, etc. • Tokens are described by regular expressions (Type 3 grammars) • Examples • Identifiers: letter (letter or digit)* • Integer: digit digit* • Terminal symbols of lexical grammar are usually characters • ASCII, Unicode, etc. • Escape sequences and trigraphs
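The two token definitions above translate almost directly into regular expressions. The following is a minimal tokenizer sketch, assuming ASCII letters and digits; the token names and the small operator set are hypothetical choices for illustration, not part of the slides.

```python
import re

# Each token class is described by a regular expression.
TOKEN_SPEC = [
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),  # letter (letter or digit)*
    ("INTEGER", r"[0-9][0-9]*"),              # digit digit*
    ("OPERATOR", r"[+\-*/=()]"),              # assumed operator set
    ("SKIP", r"\s+"),                         # whitespace separates tokens
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Split text into (kind, lexeme) pairs, or raise ValueError."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("x1 = 42 + y"))
```

The characters of the input are the terminal symbols of this lexical grammar; the tokens it produces become the terminal symbols of the syntactic grammar.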
Syntax & Semantics • Syntax • Deals with Form • Gives structure to a stream of lexical elements • Semantics • Deals with meaning • Meaning often depends on context • Both syntax and semantics can be represented by grammars – attribute grammars are used for semantics. • Distinction is somewhat artificial • Syntax is that which can be conveniently expressed using a context free grammar • Semantics is everything else
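A small experiment shows where the line between form and meaning falls in one particular language. This sketch uses Python's own parser; the example strings are arbitrary, and in a statically typed language the semantic error would typically be reported at compile time rather than at run time.

```python
import ast

source = "1 + 'a'"

# Syntax: the text is well formed, so parsing succeeds.
tree = ast.parse(source, mode="eval")

# Semantics: adding an integer to a string has no meaning in Python,
# which is only discovered when the expression is evaluated.
try:
    eval(compile(tree, "<demo>", "eval"))
    outcome = "ok"
except TypeError:
    outcome = "type error"

# A malformed string is rejected earlier, by the parser itself.
try:
    ast.parse("1 + + )", mode="eval")
    syntax_outcome = "ok"
except SyntaxError:
    syntax_outcome = "syntax error"

print(outcome, "/", syntax_outcome)
```

The first string has the right form but no meaning; the second never gets far enough for meaning to be considered.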
Language and Grammar • An Alphabet Σ is a finite set of lexical symbols • Formal languages use letters of the alphabet as lexical symbols • Programming languages use Tokens • L systems use lines to draw realistic images of trees and flowers • Language L is a subset of strings in Σ* • A grammar G defines the subset of Σ* that belongs to L, and excludes the subset that does not belong to L. • A grammar can be used to generate new strings in L or to accept (or reject) strings in (or not in) L.
CFG Example
Block:
    { BlockStatements_opt }
BlockStatements:
    BlockStatement
    BlockStatements BlockStatement
BlockStatement:
    LocalVariableDeclarationStatement
    Statement
LocalVariableDeclarationStatement:
    LocalVariableDeclaration ;
LocalVariableDeclaration:
    TypeName VariableDeclaratorId
Statement:
    while ( expr ) BlockStatement ;
Context Free Grammars • Substitution rules of the form A ::= ω, where A is a non-terminal symbol and ω is a string of terminal and non-terminal symbols • A simple CFG for a language E • S ::= EXPR • S ::= EXPR S • EXPR ::= EXPR ‘+’ EXPR • EXPR ::= EXPR ‘-’ EXPR • EXPR ::= ‘(’ EXPR ‘)’ • EXPR ::= digit • At least one rule must have only terminal symbols on RHS • Every rule must have exactly one non-terminal on LHS • Terminal Symbols: digit + - ( ) • Non-Terminal Symbols: EXPR S • Examples of statements in E: 1, 1+1, (1+1) - 1
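A grammar can also be used to accept or reject strings. The sketch below is a hand-written recognizer for E, under two assumptions that are not in the slides: whitespace is ignored, and the left-recursive EXPR rules are rewritten into the equivalent form EXPR ::= TERM { ('+' | '-') TERM }, TERM ::= digit | '(' EXPR ')', which accepts the same strings but suits a top-down parser. The name `accepts` is hypothetical.

```python
DIGITS = "0123456789"


def accepts(text):
    """Return True if text is a statement of E (one or more expressions)."""
    chars = [c for c in text if not c.isspace()]
    pos = 0

    def term():
        # TERM ::= digit | '(' EXPR ')'
        nonlocal pos
        if pos < len(chars) and chars[pos] in DIGITS:
            pos += 1
            return True
        if pos < len(chars) and chars[pos] == "(":
            pos += 1
            if expr() and pos < len(chars) and chars[pos] == ")":
                pos += 1
                return True
        return False

    def expr():
        # EXPR ::= TERM { ('+' | '-') TERM }
        nonlocal pos
        if not term():
            return False
        while pos < len(chars) and chars[pos] in "+-":
            pos += 1
            if not term():
                return False
        return True

    # S ::= EXPR | EXPR S, i.e. one or more expressions in sequence.
    if not expr():
        return False
    while pos < len(chars):
        if not expr():
            return False
    return True


for candidate in ["1", "1+1", "(1+1) - 1", "1+", "(1"]:
    print(repr(candidate), accepts(candidate))
```

The first three candidates are the example statements from the slide and are accepted; the last two are rejected because an operand or a closing parenthesis is missing.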
Formal CFG • A CFG, G, is a 4-tuple G = (Σ, N, S, δ) • Σ is an alphabet of terminal symbols • N is a set of non-terminal symbols • S is a distinguished element of N, called the start symbol, which represents all strings in the language. • δ is a set of rules of the form A ::= ω, where A ∈ N and ω ∈ (Σ ∪ N)+
CFG Idioms • L ::= a L | a (a list of one or more ‘a’s) • L ::= a , L | a (a comma-separated list of ‘a’s) • L ::= a L | λ (a list of zero or more ‘a’s) • λ is a null symbol • L ::= L L | a | λ (another way to make a list) • P ::= ( P ) (P’s within nested parentheses of arbitrary depth)
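One way to check what an idiom produces is to use the grammar as a generator and enumerate its short strings. This is a minimal sketch, assuming rules are stored as a dictionary from non-terminal to right-hand sides, with an empty tuple standing for λ. The length bound used for pruning is a simplification that is adequate for the first three idioms, where a sentential form never holds more than one non-terminal; it is not a general-purpose enumerator.

```python
from collections import deque


def generate(rules, start, max_len):
    """List every terminal string of at most max_len symbols derivable from start."""
    results = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Locate the leftmost non-terminal in the sentential form, if any.
        index = next((i for i, sym in enumerate(form) if sym in rules), None)
        if index is None:
            if len(form) <= max_len:
                results.add("".join(form))
            continue
        for rhs in rules[form[index]]:
            new_form = form[:index] + rhs + form[index + 1:]
            # Prune forms that can no longer yield a short enough string.
            if len(new_form) <= max_len + 1 and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(results, key=lambda s: (len(s), s))


one_or_more = {"L": [("a", "L"), ("a",)]}       # L ::= a L | a
comma_list = {"L": [("a", ",", "L"), ("a",)]}   # L ::= a , L | a
zero_or_more = {"L": [("a", "L"), ()]}          # L ::= a L | λ

print(generate(one_or_more, "L", 3))   # ['a', 'aa', 'aaa']
print(generate(comma_list, "L", 5))    # ['a', 'a,a', 'a,a,a']
print(generate(zero_or_more, "L", 2))  # ['', 'a', 'aa']
```

The only difference between the first and third grammars is the base case, which is what decides whether the empty list is in the language.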
Backus-Naur Form (BNF) • Most language specifications use some variation of BNF • Non-terminal symbols are identified by angle brackets, e.g. <stmt> • Terminal symbols are token names or literal symbols • “::=” is definitional equivalence • ‘|’ indicates “or” • Many variations • [ ] for optional elements • Parentheses for grouping • + and * (Kleene star) • Superscripts for n occurrences • Subscripts, e.g. opt in Java • Italics or lowercase for non-terminal symbols
<stmt> ::= while (<exp>) <stmt>
         | if (<exp>) <stmt> [else <stmt>]
         | ID = <exp>
         | <stmt_list> ;
<stmt_list> ::= <stmt> | <stmt_list> <stmt> ;
<exp> ::= <exp> <op> <exp> | ID | NUMBER ;
<op> ::= + | - | * | / ;
Derivation & Parse Tree • Parse tree represents structure of parse • Leaf nodes are terminal symbols • Intermediate nodes are non-terminal symbols • Root node is start symbol of grammar • Derivation tree also records which rules were used to build tree • Each node represents a specific production • Example • (1 + 2 + 3) - 2
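The example expression can be turned into an explicit tree with a few lines of code. This sketch assumes the EXPR rules of the language E from the Context Free Grammars slide, starts at EXPR rather than S for brevity, and resolves the grammar's ambiguity by grouping operators to the left. Nodes are tuples whose first element is the non-terminal label; the function names are hypothetical.

```python
DIGITS = "0123456789"


def parse_expr(tokens, pos=0):
    """Parse an EXPR starting at pos; return (tree, next position)."""
    tree, pos = parse_operand(tokens, pos)
    while pos < len(tokens) and tokens[pos] in "+-":
        op = tokens[pos]
        right, pos = parse_operand(tokens, pos + 1)
        tree = ("EXPR", tree, op, right)  # EXPR ::= EXPR op EXPR
    return tree, pos


def parse_operand(tokens, pos):
    if pos < len(tokens) and tokens[pos] in DIGITS:
        return ("EXPR", tokens[pos]), pos + 1  # EXPR ::= digit
    if pos < len(tokens) and tokens[pos] == "(":
        inner, pos = parse_expr(tokens, pos + 1)
        if pos < len(tokens) and tokens[pos] == ")":
            return ("EXPR", "(", inner, ")"), pos + 1  # EXPR ::= '(' EXPR ')'
    raise SyntaxError(f"unexpected input at position {pos}")


def leaves(tree):
    """Read the terminal symbols off the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    result = []
    for child in tree[1:]:  # tree[0] is the non-terminal label
        result.extend(leaves(child))
    return result


tokens = [c for c in "(1 + 2 + 3) - 2" if not c.isspace()]
tree, end = parse_expr(tokens)
print(tree)
print(leaves(tree) == tokens)  # prints True
```

Every interior node is labelled with a non-terminal and every leaf is a terminal, so reading the leaves in order reproduces the input.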
Grammars – Chomsky Hierarchy • Type 0 – Unrestricted • Can express anything that can be computed • Impossible to parse in general (membership is undecidable) • Type 1 – Context Sensitive • Difficult to parse • Attribute Grammars used for programming language semantics • Type 2 – Context Free • CFGs used for describing programming language syntax • Type 3 – Regular • Used to describe lexical elements of programming languages
Grammatical Problems • Programming languages use restricted grammars, such as LL or LR, which are not as powerful as general CFGs • Dangling else: the grammar is ambiguous, hence not LR, and produces a shift/reduce conflict • S ::= if E then S • S ::= if E then S else S • Solutions: • Always choose shift • Specify an end marker, e.g., endif • Left recursion: not LL • Ambiguity • Foo(A) (in C): declaration or use of function Foo? • Requires lookahead in the parser or a more complex grammar
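The dangling-else ambiguity can be demonstrated by counting parse trees. The sketch below assumes the two rules from the slide plus a hypothetical base statement `s` (so that derivations can terminate) and treats the condition `E` as a single token. It counts how many distinct ways S can derive a token sequence.

```python
from functools import lru_cache


def count_parses(tokens):
    """Count the parse trees for tokens under
    S ::= if E then S | if E then S else S | s
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways S derives tokens[i:j].
        total = 0
        if j - i == 1 and tokens[i] == "s":
            total += 1  # S ::= s (assumed base case)
        if j - i >= 4 and tokens[i:i + 3] == ("if", "E", "then"):
            total += count(i + 3, j)  # S ::= if E then S
            for k in range(i + 4, j - 1):
                if tokens[k] == "else":  # S ::= if E then S else S
                    total += count(i + 3, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parses("if E then s else s".split()))               # prints 1
print(count_parses("if E then if E then s else s".split()))     # prints 2
```

The nested statement has two parse trees because the `else` can attach to either `if`. Always choosing shift selects the tree that binds it to the nearest `if`, and an end marker removes the second tree from the grammar altogether.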
Programming Language History