Programming Languages and Design Lecture 2 Syntax Specifications of Programming Languages


  1. Programming Languages and Design • Lecture 2: Syntax Specifications of Programming Languages • Instructor: Li Ma, Department of Computer Science, Texas Southern University, Houston • January 2008

  2. Review and Preview • Last lecture • Introduction to programming languages • Fundamental concepts • Computation models • Programming models/paradigms • Program processing • Today’s lecture • Syntax specifications of programming languages • Reference: Chapter 4 of “Foundations of Programming Languages: Design and Implementation”, S. H. Roosta • Three mechanisms: regular expressions, formal grammars, attribute grammars

  3. Language Description • A formal language is any set of character strings whose characters are chosen from a fixed, finite alphabet of symbols • The strings that belong to the language are called its constructs, or phrases • Any programming language description can be classified according to its • Syntax, which deals with the formation of phrases • Semantics, which deals with the meaning of phrases • Pragmatics, which deals with the practical use of phrases

  4. Syntax • Syntax refers to the formation of constructs in the language and defines relations between them • It describes the structure of the language without addressing the meaning of its constructs • The syntax of a programming language is similar to the grammar of a natural language • Three mechanisms are used to describe syntax in the design and implementation of programming languages • Regular expressions • Formal grammars • Attribute grammars

  5. Regular Expressions • Invented by Stephen Kleene in about 1950 • Represent a form of language definition • Each regular expression E denotes some language L(E) defined over the alphabet of the language • Defined by the following set of rules • Alternation • If a and b are regular expressions, then so is (a+b) • The language defined by (a+b) contains all the strings from the language identified by a and all the strings from the language identified by b • Concatenation • If a and b are regular expressions, then so is (a·b) • The language defined by (a·b) contains all the strings formed by appending a string from the language identified by b to the end of a string from the language identified by a

  6. Regular Expressions (cont’) • Defined by the following set of rules (cont’) • Kleene closure • If a is a regular expression, then so is a* • The language defined by a* consists of all the strings formed by concatenating zero or more strings from the language identified by a • Positive closure • If a is a regular expression, then so is a+ • The language defined by a+ consists of all the strings formed by concatenating one or more strings from the language identified by a • a+ is the same as a* except that ε is excluded (unless ε already belongs to the language identified by a) • Empty • ø is a regular expression; the language it defines contains no strings • Atom • Any single symbol such as a, or ε, is a regular expression; the language it defines contains exactly one string, {a} or {ε}
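The rules on slides 5 and 6 can be tried out by treating each language as a Python set of strings. This is only a minimal sketch: the function names are made up for illustration, and because the two closures denote infinite languages they are truncated at an assumed `max_len`.

```python
from itertools import product

def alternation(la, lb):
    """Alternation: every string from L(a) together with every string from L(b)."""
    return la | lb

def concatenation(la, lb):
    """Concatenation: a string from L(a) followed by a string from L(b)."""
    return {x + y for x, y in product(la, lb)}

def kleene_closure(la, max_len):
    """Kleene closure, truncated to strings of at most max_len characters."""
    result = {""}              # zero repetitions give the empty string (epsilon)
    frontier = {""}
    while frontier:
        # Extend every newly found string by one more string from L(a).
        frontier = {x + y for x in frontier for y in la
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result

def positive_closure(la, max_len):
    """Positive closure, truncated: one or more repetitions of L(a)."""
    return {s for s in concatenation(la, kleene_closure(la, max_len))
            if len(s) <= max_len}

A = {"a"}                      # language of the atom a
B = {"b"}                      # language of the atom b
EMPTY = set()                  # language of the empty expression: no strings

print(sorted(alternation(A, B)))        # ['a', 'b']
print(sorted(concatenation(A, B)))      # ['ab']
print(sorted(kleene_closure(A, 3)))     # ['', 'a', 'aa', 'aaa']
print(sorted(positive_closure(A, 3)))   # ['a', 'aa', 'aaa']
```

Note that the empty language `EMPTY` and the language `{""}` containing only ε are different sets, which matches the distinction between the Empty and Atom rules.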

  7. Defined Language of the Regular Expressions

  8. Formal Grammars • A grammar is a notation that you can use to specify a structural description of the various constructs in the language • Four components of the grammar of a programming language • Terminal symbols • Variable symbols (nonterminal) • Production rules • Start symbol

  9. Production Rules • Each production rule has • one or more symbols as its left side • the symbol => • a string over the set of terminals and variables as its right side • A production rule indicates that the left-side symbols derive, or simply imply, the right-side symbols • Derivation begins with the start symbol • Each successive string in the sequence is derived from the preceding string by applying a production rule

  10. Definitions for Grammar • The grammar of a programming language can be defined as a quadruple, G = (T, V, P, S) • T is a finite set of terminal symbols, written as lowercase characters • V is a finite set of variable symbols (V∩T = ø), written as uppercase characters • P is a finite set of production rules of the form α.X.β => δ, where α, β, and δ are in (V∪T)* and X is in V • S in V is the start symbol of the phrase • Two grammars, G1 and G2, are equivalent if and only if L(G1) = L(G2)
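To make the quadruple concrete, the sketch below encodes one small grammar as plain Python values and replays a derivation. The grammar (for strings of the form aⁿbⁿ), the helper names, and the choice to rewrite the first occurrence of the left side are all assumptions made for illustration, not part of the lecture.

```python
# Hypothetical example grammar for the language { a^n b^n : n >= 1 }.
T = {"a", "b"}                      # terminal symbols (lowercase)
V = {"S"}                           # variable symbols (uppercase)
P = [("S", "aSb"), ("S", "ab")]     # production rules as (left side, right side)
S = "S"                             # start symbol

def is_well_formed(T, V, P, S):
    """Check the conditions stated in the definition of G = (T, V, P, S)."""
    symbols = T | V
    return (
        not (T & V)                                  # V and T are disjoint
        and S in V                                   # the start symbol is a variable
        and all(any(c in V for c in left)            # left side contains a variable
                and set(left + right) <= symbols     # both sides use only V and T
                for left, right in P)
    )

def derive(start, steps, P):
    """Apply the rules with the given indices in order; return every string reached."""
    current = start
    sequence = [current]
    for index in steps:
        left, right = P[index]
        if left not in current:
            raise ValueError(f"rule {index} does not apply to {current!r}")
        current = current.replace(left, right, 1)    # rewrite the first occurrence
        sequence.append(current)
    return sequence

print(is_well_formed(T, V, P, S))               # True
print(" => ".join(derive(S, [0, 0, 1], P)))     # S => aSb => aaSbb => aaabbb
```

Each string in the printed sequence is obtained from the preceding one by a single production rule, starting from the start symbol.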

  11. Classification of Grammars • Type 0: unrestricted grammar • Requires at least one nonterminal symbol on the left side of a production rule • Form α => β, where α in (V∪T)+ and β in (V∪T)* • Also called a phrase-structure grammar; generates the recursively enumerable languages • Type 1: context-sensitive grammar • Requires that the right side of a production rule have no fewer symbols than the left side • Form α => β, where α = δ1Aδ2, β = δ1ωδ2, A in V, ω in (V∪T)+ and δ1, δ2 in (V∪T)*

  12. Classification of Grammars (cont’) • Type 2: context-free grammar • Requires that the left side of a production rule be a single variable symbol and the right side be a combination of terminal and variable symbols • Form A => α, where A in V and α in (V∪T)* • Backus-Naur Form (BNF) grammar • Equivalent to context-free grammar • Differs only in the notation • Nonterminals are enclosed in < > • The symbol ::= is used in place of =>

  13. Classification of Grammars (cont’) • Type 3: regular grammar • Restricted to only one terminal or one terminal and one variable on the right side of a production rule • Restrictive grammar • Right-linear grammar • Form A => xB or A => x, where A, B in V, x in T • Rightmost derivation • Left-linear grammar • Form A => Bx or A => x, where A, B in V, x in T • Leftmost derivation
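As a small illustration of the right-linear form, the sketch below enumerates the strings generated by a hypothetical grammar A => aA, A => b and checks them against the regular expression `a*b` using Python's `re` module. The grammar, the rule encoding, and the length bound are assumptions chosen for the example.

```python
import re

# Hypothetical right-linear grammar: A => aA and A => b.
# Each rule is stored as (terminal, next variable); None encodes the form A => x.
RULES = {"A": [("a", "A"), ("b", None)]}

def generate(rules, start, max_len):
    """Enumerate the terminal strings of length <= max_len derivable from start."""
    results = set()
    pending = [("", start)]             # (terminals emitted so far, current variable)
    while pending:
        prefix, variable = pending.pop()
        for terminal, next_variable in rules[variable]:
            word = prefix + terminal
            if len(word) > max_len:
                continue
            if next_variable is None:   # a rule of the form A => x ends the derivation
                results.add(word)
            else:                       # a rule of the form A => xB continues it
                pending.append((word, next_variable))
    return results

words = generate(RULES, "A", 4)
print(sorted(words))                                    # ['aaab', 'aab', 'ab', 'b']
print(all(re.fullmatch(r"a*b", w) for w in words))      # True
```

Every sentential form in such a derivation contains at most one variable, always at the right end, which is why a simple (prefix, variable) pair is enough to track progress.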

  14. Syntax Tree • Two parts of programming language syntax • Lexical syntax: describes the smallest units with significance, called tokens • Phrase-structure syntax: explains how tokens are arranged into programs • The syntactic structure of a phrase can be represented with a syntax tree (derivation tree or parse tree) • Terminal nodes – terminal symbols • Internal nodes – variable symbols • Root – start symbol • The label of an internal node – left side of the production rule; the labels of the children of the node (from left to right) – right side of the production rule
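The node-labelling convention can be sketched with a small tree type. The `Node` class, the helper names, and the example grammar (S => aSb, S => ab) are assumptions for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                                    # a terminal or variable symbol
    children: list = field(default_factory=list)  # empty for terminal nodes

def frontier(node):
    """Concatenate the terminal-node labels from left to right."""
    if not node.children:
        return node.label
    return "".join(frontier(child) for child in node.children)

def productions(node):
    """List the production rule applied at each internal node, top-down."""
    if not node.children:
        return []
    rules = [f"{node.label} => {''.join(c.label for c in node.children)}"]
    for child in node.children:
        rules.extend(productions(child))
    return rules

# Derivation tree of "aabb" under the hypothetical rules S => aSb and S => ab.
tree = Node("S", [Node("a"), Node("S", [Node("a"), Node("b")]), Node("b")])
print(frontier(tree))       # aabb
print(productions(tree))    # ['S => aSb', 'S => ab']
```

Reading the leaves from left to right recovers the phrase, and each internal node together with its children spells out one production rule.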

  15. Syntax Tree (cont’) • Recognition/representation • Determining whether the phrase is syntactically valid • Production rules are used to construct a syntax tree • The grammar-oriented compiling technique consists of two components • A lexical analyzer: converts the stream of input characters to a stream of tokens • A syntactic analyzer: forms a derivation tree from the token list; it is a combination of • A parser • An intermediate code generator
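A lexical analyzer of the kind described above can be sketched with Python's `re` module. The token names and patterns below are assumptions chosen for a tiny expression language, not a specification from the lecture.

```python
import re

# Hypothetical token classes for a tiny expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),
]
# One alternative per token class, each wrapped in a named group.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Convert a stream of characters into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":       # whitespace separates tokens but is dropped
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("x = 3 + y1"))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '3'), ('OP', '+'), ('IDENT', 'y1')]
```

The resulting token list is what a syntactic analyzer would consume in place of raw characters.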

  16. Parsers • Parsing: deriving the parse tree • Two basic approaches to deriving parse trees • Top-down parsers • Begin with the start symbol as the root of the tree • Repeatedly replace variable symbols with the right side of a production rule until only terminal symbols remain • Bottom-up parsers • Begin with a string of terminal symbols • Repeatedly replace sequences in the string with variable symbols • The process continues until the start symbol is produced • In both cases, the tree is the result of a syntactic analysis of the input according to the grammar
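The top-down approach is often illustrated with a recursive-descent parser, in which each variable symbol becomes one function. The sketch below assumes a hypothetical expression grammar (shown in the comments) and returns a simplified tree of nested tuples rather than a full derivation tree.

```python
import re

def parse(text):
    """Top-down (recursive-descent) parse for a hypothetical expression grammar:
         Expr   => Term { "+" Term }
         Term   => Factor { "*" Factor }
         Factor => number | "(" Expr ")"
    """
    # Simplified tokenizer: characters outside the grammar are silently ignored.
    tokens = re.findall(r"\d+|[+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, found {token!r}")
        pos += 1
        return token

    def expr():
        node = term()
        while peek() == "+":
            take("+")
            node = ("+", node, term())
        return node

    def term():
        node = factor()
        while peek() == "*":
            take("*")
            node = ("*", node, factor())
        return node

    def factor():
        if peek() == "(":
            take("(")
            node = expr()
            take(")")
            return node
        token = take()
        if not token.isdigit():
            raise SyntaxError(f"expected a number, found {token!r}")
        return int(token)

    tree = expr()                       # start from the start symbol Expr
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse("1 + 2 * 3"))      # ('+', 1, ('*', 2, 3))
print(parse("(1 + 2) * 3"))    # ('*', ('+', 1, 2), 3)
```

Because `Term` sits below `Expr` in the grammar, multiplication ends up deeper in the tree than addition, which is how the grammar itself encodes precedence.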

  17. Ambiguity • Ambiguous grammar: a grammar in which some phrase of its language has two or more distinct derivation trees • Due to lack of syntactic structure • Should eliminate ambiguity whenever possible • Revise the grammar • Introduce a disambiguating rule
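Ambiguity can be demonstrated by counting derivation trees. The sketch below assumes the classic ambiguous grammar E => E+E, E => E*E, E => a, which is a standard textbook example rather than one taken from the lecture, and counts the trees for a phrase by trying every operator as the root.

```python
from functools import lru_cache

def count_trees(phrase):
    """Count derivation trees of phrase under E => E+E | E*E | a."""
    tokens = tuple(phrase)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of trees rooted at E whose leaves spell tokens[i:j]."""
        if j - i == 1:
            return 1 if tokens[i] == "a" else 0     # rule E => a
        total = 0
        for k in range(i + 1, j - 1):               # choose the operator at the root
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_trees("a+a"))      # 1
print(count_trees("a+a*a"))    # 2
```

The phrase `a+a*a` has two trees, one with `+` at the root and one with `*` at the root, so this grammar is ambiguous. Revising the grammar into layered rules for sums and products is one common way to leave only a single tree.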

  18. BNF Variations • Other notational variations • Example: Notation { … }_i^j (subscript i, superscript j) can be used to express any number n of occurrences of the enclosed sequence of symbols, for i≤n≤j • Extended BNF grammar • Add some extra notations to allow easier description of languages • Anything that can be specified with BNF can also be specified with Extended BNF (EBNF) grammar • Increases the readability and writability of the production rules • Syntax diagram • A pictorial technique, equivalent to BNF grammar • In this approach, each production rule is represented as a directed graph whose vertices are symbols • Terminal symbols: circles • Variable symbols: rectangles
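The bounded-repetition notation, where the enclosed sequence occurs n times for i≤n≤j, has a close analogue in the `{i,j}` quantifier of Python regular expressions. The sketch below assumes a hypothetical identifier rule (one letter followed by at most seven letters or digits) purely to show the correspondence.

```python
import re

def repeat(pattern, i, j):
    """Build a regex for between i and j occurrences of pattern."""
    return f"(?:{pattern}){{{i},{j}}}"

# Hypothetical rule: identifier ::= letter { letter | digit } with bounds 0 and 7
IDENT = re.compile("[A-Za-z]" + repeat("[A-Za-z0-9]", 0, 7))

print(repeat("[A-Za-z0-9]", 0, 7))          # (?:[A-Za-z0-9]){0,7}
print(bool(IDENT.fullmatch("count1")))      # True
print(bool(IDENT.fullmatch("a" * 9)))       # False
print(bool(IDENT.fullmatch("1abc")))        # False
```

The second test string fails because it would need eight repetitions of the enclosed pattern, one more than the upper bound allows.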

  19. Attribute Grammars • Developed by Donald Knuth in 1968 • Powerful and elegant mechanisms that formalize both the context-free and context-sensitive aspects of a language’s syntax • Can be used to determine whether a variable has been declared and whether the use of the variable is consistent with its declaration • An extension of a context-free grammar with certain formal primitives • These primitives enable the syntactic aspects of a language to be specified more precisely
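The declaration check mentioned above can be approximated by threading a symbol-table attribute through a list of statements. This is only a rough sketch of the idea, not a full attribute-grammar evaluator: the statement format `(kind, name, type)` and the function name are assumptions made for illustration.

```python
def check(program):
    """Pass a symbol-table attribute from statement to statement and collect errors."""
    declared = {}                   # attribute: variable name -> declared type
    errors = []
    for kind, name, type_name in program:
        if kind == "decl":
            if name in declared:
                errors.append(f"{name} declared twice")
            declared[name] = type_name
        elif kind == "use":
            if name not in declared:
                errors.append(f"{name} used before declaration")
            elif declared[name] != type_name:
                errors.append(f"{name} declared {declared[name]}, used as {type_name}")
    return errors

# Hypothetical program, already flattened into (kind, name, type) statements.
program = [
    ("decl", "x", "int"),
    ("use",  "x", "int"),
    ("use",  "y", "int"),
    ("use",  "x", "bool"),
]
print(check(program))
# ['y used before declaration', 'x declared int, used as bool']
```

Neither error can be expressed by context-free production rules alone, which is the kind of context-sensitive condition that attributes are meant to capture.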