PLC : SESSION 2 (Describing Syntax and Semantics)

What Are Syntax and Semantics?

Syntax: the form of the expressions, statements, and program units

Semantics: the meaning of the expressions, statements, and program units

  Ex:   while (<Boolean_expr>)<statement>

         * The semantics of this statement form is that when the current value of the Boolean expression is true, the embedded statement is executed.

*  The form of a statement should strongly suggest what the statement is meant to accomplish.

 

The General Problem of Describing Syntax

A language, whether natural (such as English) or artificial (such as Java), is a set of strings of characters from some alphabet. The strings of a language are called sentences or statements. Formal descriptions of the syntax of programming languages often do not include descriptions of the lowest-level syntactic units. These small units are called lexemes. The description of lexemes can be given by a lexical specification, which is usually separate from the syntactic description of the language. The lexemes of a programming language include its numeric literals, operators, and special words, among others. Each lexeme group is represented by a name, or token. So, a token of a language is a category of its lexemes. In some cases, a token has only a single possible lexeme.
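The lexeme/token distinction can be illustrated with a small sketch of a lexical specification. The token names and the regular expressions below are hypothetical, chosen only to demonstrate the idea that each token is a category matching many lexemes (while a token like PLUS has a single lexeme):

```python
import re

# A tiny hypothetical lexical specification: each token (category)
# is defined by a regular expression matching its lexemes.
TOKEN_SPEC = [
    ("INT_LIT", r"\d+"),          # numeric literals: 17, 42, ...
    ("IDENT",   r"[A-Za-z]\w*"),  # identifiers: sum, total, ...
    ("PLUS",    r"\+"),           # the token PLUS has a single lexeme: +
    ("ASSIGN",  r"="),
    ("SKIP",    r"\s+"),          # whitespace: discarded, not a token
]

def tokenize(source):
    """Return (token, lexeme) pairs for the input string."""
    pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC)
    tokens = []
    for m in re.finditer(pattern, source):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("total = sum + 17"))
# [('IDENT', 'total'), ('ASSIGN', '='), ('IDENT', 'sum'),
#  ('PLUS', '+'), ('INT_LIT', '17')]
```

Each lexeme in the input is classified into its token; a real lexical analyzer would also report errors for characters that match no pattern.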

 

Formal Definition of Languages

There are two formal mechanisms for defining languages:

  1. Recognizers

A recognition device reads input strings over the alphabet of the language and decides whether the input strings belong to the language. The syntax analysis part of a compiler is a recognizer for the language the compiler translates. In this role, the recognizer need not test all possible strings of characters from some set to determine whether each is in the language. Rather, it need only determine whether given programs are in the language. In effect, then, the syntax analyzer determines whether the given programs are syntactically correct.
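As a minimal sketch of a recognizer, consider a finite automaton for the language of binary strings containing an even number of 1s. Like a compiler's syntax analyzer, it does not enumerate the language; it only decides, for a given input string, whether that string belongs to the language:

```python
# A minimal recognizer (a two-state DFA) for the language of binary
# strings with an even number of 1s.
def recognize(string):
    state = "even"                      # start state (also accepting)
    for ch in string:
        if ch not in "01":
            return False                # symbol outside the alphabet
        if ch == "1":
            state = "odd" if state == "even" else "even"
    return state == "even"              # accept iff we end in "even"

print(recognize("1010"))   # True: two 1s
print(recognize("1000"))   # False: one 1
```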

  2. Generators

A language generator is a device that can be used to generate the sentences of a language. We can think of the generator as having a button that produces a sentence of the language every time it is pushed. Because the particular sentence produced by a generator when its button is pushed is unpredictable, a generator seems to be a device of limited usefulness as a language descriptor.
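The "button" metaphor can be made concrete with a toy generator. The grammar below is hypothetical; calling `generate` presses the button once and yields some sentence of the language, but which sentence comes out is unpredictable:

```python
import random

# A language generator for a toy grammar: each nonterminal maps to a
# list of alternative right-hand sides; terminals are plain strings.
GRAMMAR = {
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<expr>": [["<var>", "+", "<var>"], ["<var>"]],
    "<var>":  [["a"], ["b"], ["c"]],
}

def generate(symbol="<stmt>"):
    if symbol not in GRAMMAR:           # terminal symbol: emit as-is
        return symbol
    rhs = random.choice(GRAMMAR[symbol])  # unpredictable rule choice
    return " ".join(generate(s) for s in rhs)

print(generate())  # e.g. "a = b + c" or "c = a" -- unpredictable
```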

Formal Methods of Describing Syntax

There are two formal methods of describing syntax

  1. Context-Free Grammars

Context-free grammars were developed by Noam Chomsky in the mid-1950s. Chomsky, a noted linguist, described four classes of generative devices, or grammars, that define four classes of languages. Two of these grammar classes, context-free and regular, turned out to be useful for describing the syntax of programming languages.

  2. Backus-Naur Form (BNF)

Backus-Naur Form was invented by John Backus in 1959. Backus, a prominent member of the ACM-GAMM group, presented a landmark paper describing the syntax of ALGOL 58. This paper introduced a new formal notation for specifying programming language syntax. The notation was later modified slightly by Peter Naur for the description of ALGOL 60. This revised method of syntax description became known as Backus-Naur Form, or simply BNF.

 

BNF Fundamentals

BNF is a metalanguage for programming languages. A metalanguage is a language that is used to describe another language. BNF uses abstractions for syntactic structures. For example, the definition of an assignment statement, <assign>, can be given by:

<assign> → <var> = <expression>

* The text on the left side of the arrow, which is aptly called the left-hand side (LHS), is the abstraction being defined.

* The text to the right of the arrow is the definition of the LHS. It’s called the right-hand side (RHS) and consists of some mixture of tokens, lexemes, and references to other abstractions.

Altogether, the definition is called a rule or production. The abstractions in a BNF description, or grammar, are often called nonterminal symbols, or simply nonterminals, and the lexemes and tokens of the rules are called terminal symbols, or simply terminals. A BNF description, or grammar, is a collection of rules. Although BNF is simple, it is sufficiently powerful to describe nearly all of the syntax of programming languages. In particular, it can describe lists of similar constructs, the order in which different constructs must appear, and nested structures to any depth; it can even imply operator precedence and operator associativity.

 

Describing Lists

–  Syntactic lists are described using recursion.

<ident_list> → ident

             | ident, <ident_list>

–  A rule is recursive if its LHS appears in its RHS.

Variable-length lists in mathematics are often written using an ellipsis. BNF doesn’t include the ellipsis, so an alternative method is required for describing lists of syntactic elements in programming languages. For BNF, the alternative is recursion. A rule is recursive if its LHS appears in its RHS.

 

Grammars and Derivations

A grammar is a generative device for defining languages. The sentences of the language are generated through a sequence of applications of the rules, beginning with a special nonterminal of the grammar called the start symbol. This sequence of the rule applications is called a derivation.

The example of grammar

<program> → <stmts>

<stmts> → <stmt> | <stmt> ; <stmts>

<stmt> → <var> = <expr>

<var> → a | b | c | d

<expr> → <term> + <term> | <term> - <term>

<term> → <var> | const

The example of derivation

<program> => <stmts> => <stmt>

=> <var> = <expr>

=> a = <expr>

=> a = <term> + <term>

=> a = <var> + <term>

=> a = b + <term>

=> a = b + const
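The derivation above can be mechanized: represent each rule as a list of alternatives, and at each step replace the leftmost nonterminal in the sentential form. This sketch reproduces the example derivation; the list of rule-choice indices is just one path through the grammar:

```python
# The example grammar, with each nonterminal's alternatives in order.
RULES = {
    "<program>": [["<stmts>"]],
    "<stmts>":   [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>":    [["<var>", "=", "<expr>"]],
    "<var>":     [["a"], ["b"], ["c"], ["d"]],
    "<expr>":    [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>":    [["<var>"], ["const"]],
}

def derive(choices, start="<program>"):
    """Apply the given rule choices leftmost-first; return sentential forms."""
    form, forms = [start], []
    for choice in choices:
        i = next(k for k, s in enumerate(form) if s in RULES)  # leftmost nonterminal
        form = form[:i] + RULES[form[i]][choice] + form[i + 1:]
        forms.append(" ".join(form))
    return forms

# These choices reproduce the derivation of: a = b + const
for f in derive([0, 0, 0, 0, 0, 0, 1, 1]):
    print("=>", f)
```

The printed sentential forms match the derivation shown above, ending with the sentence `a = b + const`.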

 

Parse Trees

One of the most attractive features of grammars is that they naturally describe the hierarchical syntactic structure of the sentences of the languages they define. These hierarchical structures are called parse trees. Every internal node of a parse tree is labelled with a nonterminal symbol; every leaf is labelled with a terminal symbol. Every subtree of a parse tree describes one instance of an abstraction in the sentence.

 

Ambiguity in Grammars

A grammar that generates a sentential form for which there are two or more distinct parse trees is said to be ambiguous. Syntactic ambiguity of language structures is a problem because compilers often base the semantics of those structures on their syntactic form. Specifically, the compiler chooses the code to be generated for a statement by examining its parse tree. If a language structure has more than one parse tree, then the meaning of the structure cannot be determined uniquely. This problem is discussed in two specific examples in the following subsections.

The example of ambiguous grammar for expression

<assign> → <id> = <expr>

<id> → A | B | C

<expr> → <expr> + <expr>

| <expr> * <expr>

| ( <expr> )

| <id>

The example of unambiguous grammar for expression

<assign> → <id> = <expr>

<id> → A | B | C

<expr> → <expr> + <term>

| <term>

<term> → <term> * <factor>

| <factor>

<factor> → ( <expr> )

| <id>
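One way to see how the unambiguous grammar enforces precedence is to transcribe it into a recursive-descent evaluator. Because <term> sits below <expr> and <factor> below <term>, * is applied before +, and each expression gets exactly one parse (and so one value). The left-recursive rules are implemented with loops, a standard transformation; the variable values are hypothetical, purely for illustration:

```python
VALUES = {"A": 2, "B": 3, "C": 4}   # hypothetical values for the <id>s

def parse_expr(toks):    # <expr> -> <term> | <expr> + <term>
    val = parse_term(toks)
    while toks and toks[0] == "+":
        toks.pop(0)
        val += parse_term(toks)
    return val

def parse_term(toks):    # <term> -> <factor> | <term> * <factor>
    val = parse_factor(toks)
    while toks and toks[0] == "*":
        toks.pop(0)
        val *= parse_factor(toks)
    return val

def parse_factor(toks):  # <factor> -> ( <expr> ) | <id>
    tok = toks.pop(0)
    if tok == "(":
        val = parse_expr(toks)
        toks.pop(0)      # discard the matching ")"
        return val
    return VALUES[tok]

print(parse_expr(list("B+C*A")))    # 3 + (4 * 2) = 11, not (3 + 4) * 2 = 14
print(parse_expr(list("(B+C)*A")))  # parentheses force the other grouping: 14
```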

 

Static Semantics

The static semantics of a language is only indirectly related to the meaning of programs during execution; rather, it has to do with the legal forms of programs (syntax rather than semantics). Many static semantic rules of a language state its type constraints. Static semantics is so named because the analysis required to check these specifications can be done at compile time.

 

Attribute Grammars

An attribute grammar is a grammar with the following additional features

  1. Associated with each grammar symbol X is a set of attributes A(X).

The set A(X) consists of two disjoint sets S(X) and I(X), called synthesized and inherited attributes, respectively. Synthesized attributes are used to pass semantic information up a parse tree, while inherited attributes pass semantic information down and across a tree.

  2. Associated with each grammar rule is a set of semantic functions and a possibly empty set of predicate functions over the attributes of the symbols in the grammar rule.
  3. A predicate function has the form of a Boolean expression on the union of the attribute set and a set of literal attribute values.
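A minimal sketch of attribute evaluation, assuming a rule like <expr> → <var> + <var> with a synthesized actual_type attribute computed bottom-up and a predicate checking that the operand types agree. The symbol table and function names here are hypothetical:

```python
SYMBOL_TABLE = {"a": "int", "b": "int", "x": "real"}  # hypothetical

def var_actual_type(name):
    # semantic function: <var>.actual_type <- lookup(<var>.string)
    return SYMBOL_TABLE[name]

def expr_actual_type(left, right):
    # semantic function for the rule <expr> -> <var>[2] + <var>[3]
    t1, t2 = var_actual_type(left), var_actual_type(right)
    # predicate: <var>[2].actual_type == <var>[3].actual_type
    assert t1 == t2, "type error: operand types differ"
    return t1   # synthesized attribute <expr>.actual_type

print(expr_actual_type("a", "b"))   # int
# expr_actual_type("a", "x") would fail the predicate: int vs real
```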

 

Semantics

There is no single widely acceptable notation or formalism for describing semantics. There are several different reasons underlying the need for a methodology and notation for describing semantics. Programmers obviously need to know precisely what the statements of a language do before they can use them effectively in their programs. Compiler writers must know exactly what language constructs mean to design implementations for them correctly. If there were a precise semantics specification of a programming language, programs written in the language potentially could be proven correct without testing. Also, compilers could be shown to produce programs that exhibited exactly the behavior given in the language definition; that is, their correctness could be verified. A complete specification of the syntax and semantics of a programming language could be used by a tool to generate a compiler for the language automatically. Finally, language designers, who would develop the semantic descriptions of their languages, could in the process discover ambiguities and inconsistencies in their designs.

 

Operational Semantics

The idea behind operational semantics is to describe the meaning of a statement or program by specifying the effects of running it on a machine. The effects on the machine are viewed as the sequence of changes in its state, where the machine’s state is the collection of the values in its storage. There are different levels of uses of operational semantics. At the highest level, the interest is in the final result of the execution of a complete program. This is sometimes called natural operational semantics. At the lowest level, operational semantics can be used to determine the precise meaning of a program through an examination of the complete sequence of state changes that occur when the program is executed. This use is sometimes called structural operational semantics.
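The structural view can be sketched with a toy interpreter for a language of assignment statements, where the meaning of a program is exactly the sequence of machine states (variable-to-value maps) its execution passes through. The use of Python's `eval` for right-hand sides is a stand-in for a real expression semantics:

```python
def step(state, stmt):
    """Execute one assignment statement, producing the next state."""
    var, expr = (s.strip() for s in stmt.split("="))
    new_state = dict(state)
    new_state[var] = eval(expr, {}, state)   # toy evaluator for the RHS
    return new_state

def run(program):
    state, trace = {}, [{}]
    for stmt in program:
        state = step(state, stmt)
        trace.append(state)                  # record every state change
    return trace

# The complete sequence of state changes is the program's meaning:
for s in run(["x = 1", "y = x + 2", "x = y * 2"]):
    print(s)
# {} -> {'x': 1} -> {'x': 1, 'y': 3} -> {'x': 6, 'y': 3}
```

Natural operational semantics would look only at the final state; structural operational semantics examines the whole trace.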

 

Denotational Semantics

Denotational semantics is the most rigorous and most widely known formal method for describing the meaning of programs. It is solidly based on recursive function theory. In denotational semantics, the domain is called the syntactic domain, because it is syntactic structures that are mapped. The range is called the semantic domain. Denotational semantics is related to operational semantics. In operational semantics, programming language constructs are translated into simpler programming language constructs, which become the basis of the meaning of the construct. In denotational semantics, programming language constructs are mapped to mathematical objects, either sets or, more often, functions. However, unlike operational semantics, denotational semantics does not model the step-by-step computational processing of programs.
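A classic introductory denotational example maps the syntactic domain of binary numerals to the semantic domain of natural numbers. The meaning function M is defined recursively on the structure of the numeral, with no reference to any execution steps:

```python
# Denotational meaning function for binary numerals:
#   M("0") = 0        M(n "0") = 2 * M(n)
#   M("1") = 1        M(n "1") = 2 * M(n) + 1
def M(numeral):
    if numeral == "0":
        return 0
    if numeral == "1":
        return 1
    prefix, last = numeral[:-1], numeral[-1]
    return 2 * M(prefix) + (1 if last == "1" else 0)

print(M("110"))   # 6: the numeral "110" denotes the number 6
```

Note that the function says what each numeral denotes, not how a machine would compute it; that is the contrast with operational semantics.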

 

Axiomatic Semantics

Axiomatic semantics is so named because it is based on mathematical logic. Rather than modeling the state of a machine or the state changes that take place when a program is executed, axiomatic semantics specifies what can be proven about the program. Recall that one of the possible uses of semantic specifications is to prove the correctness of programs. The meaning of a program is based on relationships among program variables and constants, which are the same for every execution of the program. Axiomatic semantics has two distinct applications: program verification and program semantics specification. This section focuses on program verification in its description of axiomatic semantics. Axiomatic semantics was defined in conjunction with the development of an approach to proving the correctness of programs.
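The central axiom for assignment illustrates the flavor of this approach: for a Hoare triple {P} x = E {Q}, the weakest precondition P is the postcondition Q with every occurrence of x replaced by E. The sketch below performs that substitution textually, for single-token variable names only, purely as an illustration:

```python
import re

def weakest_precondition(postcondition, var, expr):
    """Compute Q[x -> E]: substitute expr for var in the postcondition."""
    return re.sub(rf"\b{var}\b", f"({expr})", postcondition)

# For the statement  a = b / 2 - 1  with postcondition  a < 10,
# the computed precondition is (b / 2 - 1) < 10, i.e. b < 22.
print(weakest_precondition("a < 10", "a", "b / 2 - 1"))
```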
