PLC : SESSION 4 (Data Type) (GSLC)

In this session we had a GSLC class, so Ms. Yanfi only gave us some questions through the Binusmaya Forum and gave us about a week to answer them there.

 

Here are the questions:

  1. What are the advantages and disadvantages of decimal data types?
  2. How does a decimal value waste memory space?
  3. In what way is the static type checking better than dynamic type checking?
  4. What is coercion in programming languages?
  5. Explain how coercion rules can weaken the beneficial effect of strong typing?

 

And here are the answers I wrote on the forum:

  1. The advantage of decimal data types is being able to precisely store decimal values, at least those within a restricted range, which cannot be done with floating-point types.

The disadvantages of decimal data types are that the range of values is restricted because no exponents are allowed, and their representation in memory is mildly wasteful.

 

  2. Decimal types are stored very much like character strings, using binary codes for the decimal digits. These representations are called binary coded decimal (BCD). In some cases, they are stored one digit per byte, but in others, they are packed two digits per byte. Either way, they take more storage than binary representations. It takes at least four bits to code a decimal digit. Therefore, to store a six-digit coded decimal number requires 24 bits of memory, whereas it takes only 20 bits to store the same number in binary. Decimal values store numeric information as digits encoded using the four-bit binary equivalents: 0 (0000) to 9 (1001). That means a single byte can hold values between 0 and 99, whereas the same byte holding a plain binary value can represent values between 0 and 255 (or –128 and +127).
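
To make the packed representation concrete, here is a minimal C sketch (my own illustration, not part of the forum answer) that packs two decimal digits into one byte, BCD style:

// Pack n (0..99) into one byte: one decimal digit per four-bit nibble.
unsigned char bcd_pack(unsigned n) {
    return (unsigned char)(((n / 10) << 4) | (n % 10));
}

// bcd_pack(42) yields 0x42: the digit 4 in the high nibble and the digit 2
// in the low nibble, whereas the plain binary representation of 42 is 0x2A.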

 

  3. The big benefit of static type checking is that it allows many type errors to be caught early in the development cycle. Static typing usually results in compiled code that executes more quickly, because when the compiler knows the exact data types that are in use, it can produce optimized machine code (i.e., faster and/or using less memory). Static type checkers evaluate only the type information that can be determined at compile time, but they are able to verify that the checked conditions hold for all possible executions of the program, which eliminates the need to repeat type checks every time the program is executed.

 

  4. Many programming languages support the conversion of a value into a value of a different data type. These type conversions can be made implicitly or explicitly. Implicit conversion, which is also called coercion, is done automatically. Explicit conversion, which is also called casting, is performed by code instructions that treat a variable of one data type as if it belonged to a different data type. Languages that support implicit conversion define the rules that are automatically applied when compatible primitive values are involved. The C code below illustrates implicit and explicit coercion. In line 2 the int constant 3 is automatically converted to double before assignment (implicit coercion). In line 3 an explicit coercion is performed by writing the destination type in parentheses before the value (casting).

1  double x, y;
2  x = 3;           // implicit coercion
3  y = (double) 5;  // explicit coercion (casting)

 

  5. Coercion rules decrease the compiler's ability to detect type errors. A coercion is an automatic conversion of an operand to a type that is legal for its operator, so once an operand can be coerced to a legal type, the mismatch is no longer reported as a type error. For example, suppose a program had the int variables a and b and the float variable d. Now, if a programmer meant to type a + b, but mistakenly typed a + d, the error would not be detected by the compiler. The value of a would simply be coerced to float. So, the value of strong typing is weakened by coercion.
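
A small C fragment of my own, mirroring the example above, shows that the compiler accepts the mistyped expression without complaint:

#include <stdio.h>

int main(void) {
    int a = 1, b = 2;
    float d = 3.5f;
    float sum = a + d;              // the programmer meant a + b; a is silently coerced to float
    printf("%f %d\n", sum, a + b);  // no type error is ever reported
    return 0;
}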

PLC : SESSION 4 (Data Type)

Data Types

A data type defines a collection of data values and a set of predefined operations on those values. Computer programs produce results by manipulating data. A descriptor is the collection of the attributes of a variable. If the attributes are all static, descriptors are required only at compile time. An object represents an instance of a user-defined abstract data type. In object-oriented languages, every instance of every class, whether predefined or user-defined, is called an object.

Primitive Data Types

Data types that are not defined in terms of other types are called primitive data types. Nearly all programming languages provide a set of primitive data types. Some of the primitive types are merely reflections of the hardware, for example, most integer types. Others require only a little nonhardware support for their implementation.

Integer (int)

The most common primitive numeric data type is integer. Many computers now support several sizes of integers. These sizes of integers, and often a few others, are supported by some programming languages. A negative integer could be stored in sign-magnitude notation, in which the sign bit is set to indicate negative and the remainder of the bit string represents the absolute value of the number. Most computers now use a notation called twos complement to store negative integers, which is convenient for addition and subtraction. In twos-complement notation, the representation of a negative integer is formed by taking the logical complement of the positive version of the number and adding one.
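
The rule can be checked directly in C; this small program (my own sketch) forms the twos complement of a positive integer by complementing it and adding one:

#include <stdio.h>

int main(void) {
    int n = 6;
    int neg = ~n + 1;     // logical complement of the positive version, plus one
    printf("%d\n", neg);  // prints -6
    return 0;
}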

Floating-Point

Floating-point data types model real numbers, but the representations are only approximations for many real values. On most computers, floating-point numbers are stored in binary, which exacerbates the problem. Another problem with floating-point types is the loss of accuracy through arithmetic operations. Floating-point values are represented as fractions and exponents, a form that is borrowed from scientific notation. Most languages include two floating-point types, often called float and double. The float type is the standard size, usually being stored in four bytes of memory. The double type is provided for situations where larger fractional parts and/or a larger range of exponents is needed. Precision is the accuracy of the fractional part of a value, measured as the number of bits. Range is a combination of the range of fractions and, more important, the range of exponents.
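
The approximation problem is easy to demonstrate. In this C sketch of my own (the four- and eight-byte sizes are typical, though not fixed by the language standard), neither type stores 0.1 exactly:

#include <stdio.h>

int main(void) {
    float  f = 0.1f;       // typically four bytes, about 7 significant decimal digits
    double d = 0.1;        // typically eight bytes, about 15 significant decimal digits
    printf("%.20f\n", f);  // 0.10000000149011611938...
    printf("%.20f\n", d);  // 0.10000000000000000555...
    return 0;
}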

Complex

Some programming languages support a complex data type—for example, Fortran and Python. Complex values are represented as ordered pairs of floating-point values.

Decimal

Most larger computers that are designed to support business systems applications have hardware support for decimal data types. Decimal data types store a fixed number of decimal digits, with the decimal point at a fixed position in the value. Decimal types have the advantage of being able to precisely store decimal values, at least those within a restricted range, which cannot be done with floating-point. The disadvantages of decimal types are that the range of values is restricted because no exponents are allowed, and their representation in memory is mildly wasteful, for reasons discussed in the following paragraph. Decimal types are stored very much like character strings, using binary codes for the decimal digits. These representations are called binary coded decimal (BCD).

Boolean

Boolean types are perhaps the simplest of all types. Their range of values has only two elements: one for true and one for false. Boolean types are often used to represent switches or flags in programs. Although other types, such as integers, can be used for these purposes, the use of Boolean types is more readable. A Boolean value could be represented by a single bit, but because a single bit of memory cannot be accessed efficiently on many machines, they are often stored in the smallest efficiently addressable cell of memory, typically a byte.

Character Type

Character data are stored in computers as numeric coding. Traditionally, the most commonly used coding was the 8-bit code ASCII (American Standard Code for Information Interchange), which uses the values 0 to 127 to code 128 different characters. ISO 8859-1 is another 8-bit character code, but it allows 256 different characters. Because of the globalization of business and the need for computers to communicate with other computers around the world, the ASCII character set became inadequate. In response, in 1991, the Unicode Consortium published the UCS-2 standard, a 16-bit character set. This character code is often called Unicode. Unicode includes the characters from most of the world’s natural languages.

Character String Types

A character string type is one in which the values consist of sequences of characters. Character string constants are used to label output, and the input and output of all kinds of data are often done in terms of strings. Of course, character strings also are an essential type for all programs that do character manipulation.

Strings and Their Operations

The most common string operations are assignment, catenation, substring reference, comparison, and pattern matching. A substring reference is a reference to a substring of a given string. Substring references are discussed in the more general context of arrays, where the substring references are called slices. In general, both assignment and comparison operations on character strings are complicated by the possibility of string operands of different lengths. Pattern matching is another fundamental character string operation. In some languages, pattern matching is supported directly in the language. In others, it is provided by a function or class library. If strings are not defined as a primitive type, string data is usually stored in arrays of single characters and referenced as such in the language.
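
Since C has no primitive string type, these operations are provided by the standard library on character arrays; a small sketch of my own:

#include <stdio.h>
#include <string.h>

int main(void) {
    char s[32] = "pro";
    strcat(s, "gram");                     // catenation: s is now "program"
    printf("%d\n", strcmp(s, "program"));  // comparison: prints 0, meaning equal
    printf("%.4s\n", s);                   // a crude substring reference: prints "prog"
    return 0;
}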

String Length Options

There are several design choices regarding the length of string values. First, the length can be static and set when the string is created. Such a string is called a static length string. The second option is to allow strings to have varying length up to a declared and fixed maximum set by the variable’s definition, as exemplified by the strings in C and the C-style strings of C++. These are called limited dynamic length strings. Such string variables can store any number of characters between zero and the maximum. The third option is to allow strings to have varying length with no maximum, as in JavaScript, Perl, and the standard C++ library. These are called dynamic length strings. This option requires the overhead of dynamic storage allocation and deallocation but provides maximum flexibility.

User-Defined Ordinal Types

An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers. In Java, for example, the primitive ordinal types are integer, char, and boolean. There are two user-defined ordinal types that have been supported by programming languages: enumeration and subrange.

Enumeration Types

An enumeration type is one in which all of the possible values, which are named constants, are provided, or enumerated, in the definition. Enumeration types provide a way of defining and grouping collections of named constants, which are called enumeration constants. If an enumeration variable is coerced to a numeric type, then there is little control over its range of legal operations or its range of values. If an int type value is coerced to an enumeration type, then an enumeration type variable could be assigned any integer value, whether it represented an enumeration constant or not. Enumeration types can provide advantages in both readability and reliability.
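
The coercion problem described above can be seen in C, where enumeration values convert freely to and from int (a sketch of my own; C++ is stricter here):

enum colors { red, blue, green };   // enumeration constants: 0, 1, 2

void enum_demo(void) {
    enum colors c = blue;
    int i = c;    // coerced to int: i is 1
    c = i + 2;    // legal in C, but c now holds 3, which names no constant
}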

Subrange Types

A subrange type is a contiguous subsequence of an ordinal type. Subrange types enhance readability by making it clear to readers that variables of subtypes can store only certain ranges of values. Reliability is increased with subrange types, because assigning a value to a subrange variable that is outside the specified range is detected as an error, either by the compiler (in the case of the assigned value being a literal value) or by the run-time system (in the case of a variable or expression). It is odd that no contemporary language except Ada has subrange types.

Array Types

An array is a homogeneous aggregate of data elements in which an individual element is identified by its position in the aggregate, relative to the first element. The individual data elements of an array are of the same type. References to individual array elements are specified using subscript expressions. If any of the subscript expressions in a reference include variables, then the reference will require an additional run-time calculation to determine the address of the memory location being referenced.

Arrays and Indices

Specific elements of an array are referenced by means of a two-level syntactic mechanism, where the first part is the aggregate name, and the second part is a possibly dynamic selector consisting of one or more items known as subscripts or indices. If all of the subscripts in a reference are constants, the selector is static; otherwise, it is dynamic. The selection operation can be thought of as a mapping from the array name and the set of subscript values to an element in the aggregate. Indeed, arrays are sometimes called finite mappings.

Subscript Bindings and Array Categories

The binding of the subscript type to an array variable is usually static, but the subscript value ranges are sometimes dynamically bound. In some languages, the lower bound of the subscript range is implicit. There are five categories of arrays, based on the binding to subscript ranges, the binding to storage, and from where the storage is allocated. The category names indicate the design choices of these three. In the first four of these categories, once the subscript ranges are bound and the storage is allocated, they remain fixed for the lifetime of the variable. Keep in mind that when the subscript ranges are fixed, the array cannot change size.

            A static array is one in which the subscript ranges are statically bound and storage allocation is static (done before run time). The advantage of static arrays is efficiency: No dynamic allocation or deallocation is required. The disadvantage is that the storage for the array is fixed for the entire execution time of the program.

            A fixed stack-dynamic array is one in which the subscript ranges are statically bound, but the allocation is done at declaration elaboration time during execution. The advantage of fixed stack-dynamic arrays over static arrays is space efficiency. A large array in one subprogram can use the same space as a large array in a different subprogram, as long as both subprograms are not active at the same time. The same is true if the two arrays are in different blocks that are not active at the same time. The disadvantage is the required allocation and deallocation time.

            A stack-dynamic array is one in which both the subscript ranges and the storage allocation are dynamically bound at elaboration time. Once the subscript ranges are bound and the storage is allocated, however, they remain fixed during the lifetime of the variable. The advantage of stack-dynamic arrays over static and fixed stack-dynamic arrays is flexibility. The size of an array need not be known until the array is about to be used.

            A fixed heap-dynamic array is similar to a fixed stack-dynamic array, in that the subscript ranges and the storage binding are both fixed after storage is allocated. The differences are that both the subscript ranges and storage bindings are done when the user program requests them during execution, and the storage is allocated from the heap, rather than the stack. The advantage of fixed heap-dynamic arrays is flexibility—the array’s size always fits the problem. The disadvantage is allocation time from the heap, which is longer than allocation time from the stack.

            A heap-dynamic array is one in which the binding of subscript ranges and storage allocation is dynamic and can change any number of times during the array’s lifetime. The advantage of heap-dynamic arrays over the others is flexibility: Arrays can grow and shrink during program execution as the need for space changes. The disadvantage is that allocation and deallocation take longer and may happen many times during execution of the program.
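
The first four categories can be sketched in C (my own illustration; the growable heap-dynamic category has no direct C declaration and is usually simulated with realloc):

#include <stdlib.h>

static int a[100];   // static array: storage allocated before run time

void array_categories(int n) {
    int b[100];                       // fixed stack-dynamic: size fixed, allocated at elaboration
    int c[n];                         // stack-dynamic (a C99 VLA): size bound at elaboration
    int *d = malloc(n * sizeof *d);   // fixed heap-dynamic: heap storage, fixed once allocated
    b[0] = a[0];
    c[0] = b[0];
    d[0] = c[0];
    free(d);
}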

Array Initialization

Some languages provide the means to initialize arrays at the time their storage is allocated. In Fortran, for example, an array aggregate for a single-dimensioned array is a list of literals delimited by parentheses and slashes.

Array Operations

An array operation is one that operates on an array as a unit. The most common array operations are assignment, catenation, comparison for equality and inequality, and slices.

Rectangular and Jagged Arrays

A rectangular array is a multidimensioned array in which all of the rows have the same number of elements and all of the columns have the same number of elements. Rectangular arrays model rectangular tables exactly. A jagged array is one in which the lengths of the rows need not be the same. Jagged arrays are made possible when multidimensioned arrays are actually arrays of arrays.
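
In C, a jagged array can be built as an array of pointers to rows of different lengths (a sketch of my own):

void jagged_demo(void) {
    static int row0[4] = {1, 2, 3, 4};
    static int row1[2] = {5, 6};
    static int row2[3] = {7, 8, 9};
    int *jagged[3] = {row0, row1, row2};  // an array of arrays with unequal row lengths
    int x = jagged[0][2];                 // x is 3
    (void)x;
}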

Slices

A slice of an array is some substructure of that array. For example, if A is a matrix, then the first row of A is one possible slice, as are the last row and the first column. It is important to realize that a slice is not a new data type. Rather, it is a mechanism for referencing part of an array as a unit. If arrays cannot be manipulated as units in a language, that language has no use for slices.

Associative Arrays

An associative array is an unordered collection of data elements that are indexed by an equal number of values called keys. In the case of non-associative arrays, the indices never need to be stored (because of their regularity). In an associative array, however, the user-defined keys must be stored in the structure. So each element of an associative array is in fact a pair of entities, a key and a value.

Record Types

A record is an aggregate of data elements in which the individual elements are identified by names and accessed through offsets from the beginning of the structure. There is frequently a need in programs to model a collection of data in which the individual elements are not of the same type or size.

The fundamental difference between a record and an array is that record elements, or fields, are not referenced by indices. Instead, the fields are named with identifiers, and references to the fields are made using these identifiers. Another difference between arrays and records is that records in some languages are allowed to include unions.

References to the individual fields of records are syntactically specified by several different methods, two of which name the desired field and its enclosing records. Most languages use dot notation for field references, where the components of the reference are connected with periods. A fully qualified reference to a record field is one in which all intermediate record names, from the largest enclosing record to the specific field, are named in the reference.
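
A short C sketch of my own illustrates dot notation and a fully qualified reference to a nested record:

#include <string.h>

struct name {
    char first[20];
    char last[20];
};

struct employee {
    struct name employee_name;   // a nested record
    double salary;
};

void record_demo(void) {
    struct employee emp;
    emp.salary = 50000.0;                     // dot notation names the field
    strcpy(emp.employee_name.last, "Smith");  // a fully qualified reference
}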

Tuple Types

A tuple is a data type that is similar to a record, except that the elements are not named. In Python, for example, tuples can be catenated with the plus (+) operator and deleted with the del statement, and there are other operators and functions that operate on tuples.

List Types

Lists were first supported in the first functional programming language, LISP. They have always been part of the functional languages, but in recent years they have found their way into some imperative languages.

Union Types

A union is a type whose variables may store different type values at different times during program execution. As an example of the need for a union type, consider a table of constants for a compiler, which is used to store the constants found in a program being compiled. One field of each table entry is for the value of the constant. Suppose that for a particular language being compiled, the types of constants were integer, floating point, and Boolean. In terms of table management, it would be convenient if the same location, a table field, could store a value of any of these three types. Then all constant values could be addressed in the same way. The type of such a location is, in a sense, the union of the three value types it can store.
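
A C sketch of my own for such a table entry (the names are hypothetical; this is the book's scenario, not its code):

union const_value {       // the three value types share one memory location
    int    int_val;
    double float_val;
    int    bool_val;      // int stands in for Boolean storage here
};

struct table_entry {
    int type_tag;              // records which member is currently stored
    union const_value value;   // the shared field for the constant's value
};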

Pointer and Reference Types

A pointer type is one in which the variables have a range of values that consists of memory addresses and a special value, nil. The value nil is not a valid address and is used to indicate that a pointer cannot currently be used to reference a memory cell. Pointers are designed for two distinct kinds of uses. First, pointers provide some of the power of indirect addressing, which is frequently used in assembly language programming. Second, pointers provide a way to manage dynamic storage. A pointer can be used to access a location in an area where storage is dynamically allocated, called a heap. Variables that are dynamically allocated from the heap are called heap-dynamic variables. They often do not have identifiers associated with them and thus can be referenced only by pointer or reference type variables. Variables without names are called anonymous variables. It is in this latter application area of pointers that the most important design issues arise.

Pointer Operations

Languages that provide a pointer type usually include two fundamental pointer operations: assignment and dereferencing. The first operation sets a pointer variable’s value to some useful address. If pointer variables are used only to manage dynamic storage, then the allocation mechanism, whether by operator or built-in subprogram, serves to initialize the pointer variable. If pointers are used for indirect addressing to variables that are not heap dynamic, then there must be an explicit operator or built-in subprogram for fetching the address of a variable, which can then be assigned to the pointer variable.
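
Both operations, plus the two sources of useful addresses, look like this in C (a sketch of my own):

#include <stdlib.h>

void pointer_demo(void) {
    int x = 7;
    int *p = &x;                  // assignment: & fetches the address of a named variable
    int y = *p;                   // dereferencing: y gets 7
    int *q = malloc(sizeof *q);   // the allocation mechanism initializes q with a heap address
    *q = y;                       // the anonymous heap variable is reachable only through q
    free(q);
}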

Reference Types

A reference type variable is similar to a pointer, with one important and fundamental difference: A pointer refers to an address in memory, while a reference refers to an object or a value in memory. As a result, although it is natural to perform arithmetic on addresses, it is not sensible to do arithmetic on references.

Type Checking

Type checking is the activity of ensuring that the operands of an operator are of compatible types. A compatible type is one that either is legal for the operator or is allowed under language rules to be implicitly converted by compiler-generated code (or the interpreter) to a legal type. This automatic conversion is called a coercion. A type error is the application of an operator to an operand of an inappropriate type. If all bindings of variables to types are static in a language, then type checking can nearly always be done statically. Dynamic type binding requires type checking at run time, which is called dynamic type checking.

Strong Typing

One of the ideas in language design that became prominent in the so-called structured-programming revolution of the 1970s was strong typing. Strong typing is widely acknowledged as being a highly valuable language characteristic. Unfortunately, it is often loosely defined, and it is often used in computing literature without being defined at all. A programming language is strongly typed if type errors are always detected. This requires that the types of all operands can be determined, either at compile time or at run time. The importance of strong typing lies in its ability to detect all misuses of variables that result in type errors. A strongly typed language also allows the detection, at run time, of uses of the incorrect type values in variables that can store values of more than one type.

Type Equivalence

The idea of type compatibility was defined when the issue of type checking was introduced. The compatibility rules dictate the types of operands that are acceptable for each of the operators and thereby specify the possible type errors of the language. The rules are called compatibility because in some cases the type of an operand can be implicitly converted by the compiler or run-time system to make it acceptable to the operator. The type compatibility rules are simple and rigid for the predefined scalar types. However, in the cases of structured types, such as arrays and records and some user-defined types, the rules are more complex. Coercion of these types is rare, so the issue is not type compatibility, but type equivalence. That is, two types are equivalent if an operand of one type in an expression can be substituted for one of the other type, without coercion. Type equivalence is a strict form of type compatibility—compatibility without coercion. The central issue here is how type equivalence is defined.

There are two approaches to defining type equivalence: name type equivalence and structure type equivalence. Name type equivalence means that two variables have equivalent types if they are defined either in the same declaration or in declarations that use the same type name. Structure type equivalence means that two variables have equivalent types if their types have identical structures. There are some variations of these two approaches, and many languages use combinations of them.
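
The difference is easy to see with two structurally identical struct types in C (a sketch of my own; C treats separately declared struct types as distinct, which behaves like name type equivalence):

typedef struct { double x, y; } Point;
typedef struct { double x, y; } Vector;

void equivalence_demo(Point p, Vector v) {
    // Under structure type equivalence, p = v would be legal, since the
    // two types have identical structures. Under name type equivalence
    // they are distinct types, and a C compiler rejects the assignment:
    // p = v;   // error: incompatible types
    (void)p; (void)v;
}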


PLC : SESSION 3 (Names, Bindings, and Scopes)

Names

A name is a string of characters used to identify some entity in a program. Fortran 95+ allows up to 31 characters in its names. C99 has no length limitation on its internal names, but only the first 63 are significant. External names in C99 (those defined outside functions, which must be handled by the linker) are restricted to 31 characters. Names in Java, C#, and Ada have no length limit, and all characters in them are significant. C++ does not specify a length limit on names, although implementers sometimes do. Names in most programming languages have the same form, a letter followed by a string consisting of letters, digits, and underscore characters.

All variable names in PHP must begin with a dollar sign. In Perl, the special character at the beginning of a variable’s name, $, @, or %, specifies its type (although in a different sense than in other languages). In Ruby, special characters at the beginning of a variable’s name, @ or @@, indicate that the variable is an instance or a class variable, respectively.

In many languages, notably the C-based languages, uppercase and lowercase letters in names are distinct; that is, names in these languages are case sensitive. For example, the following three names are distinct in C++: rose, ROSE, and Rose. To some people, this is a serious detriment to readability, because names that look very similar in fact denote different entities. In that sense, case sensitivity violates the design principle that language constructs that look similar should have similar meanings. But in languages whose variable names are case-sensitive, although Rose and rose look similar, there is no connection between them.

Special words in programming languages are used to make programs more readable by naming actions to be performed. They also are used to separate the syntactic parts of statements and programs. In most languages, special words are classified as reserved words, which means they cannot be redefined by programmers, but in some they are only keywords, which means they can be redefined.

A keyword is a word of a programming language that is special only in certain contexts. Fortran is the only remaining widely used language whose special words are keywords. In Fortran, the word Integer, when found at the beginning of a statement and followed by a name, is considered a keyword that indicates the statement is a declarative statement. However, if the word Integer is followed by the assignment operator, it is considered a variable name.

 

Variables

A program variable is an abstraction of a computer memory cell or collection of cells. Programmers often think of variable names as names for memory locations, but there is much more to a variable than just a name. A variable can be characterized as a sextuple of attributes (name, address, value, type, lifetime, and scope). Variable names are the most common names in programs. The address of a variable is the machine memory address with which it is associated. The type of a variable determines the range of values the variable can store and the set of operations that are defined for values of the type. The value of a variable is the contents of the memory cell or cells associated with the variable.

 

The Concept of Binding

A binding is an association between an attribute and an entity, such as between a variable and its type or value, or between an operation and a symbol. The time at which a binding takes place is called binding time. Binding and binding times are prominent concepts in the semantics of programming languages. Bindings can take place at language design time, language implementation time, compile time, load time, link time, or run time.

 

Possible Binding Times

There are five possible binding times:

  1. Language Design Time

Bind operator symbols to operations

  2. Language Implementation Time

Bind floating point type to a representation

  3. Compile Time

Bind a variable to a type in C or Java

  4. Load Time

Bind a C or C++ static variable to a memory cell

  5. Runtime

Bind a non-static local variable to a memory cell

 

Static Type Binding

An explicit declaration is a statement in a program that lists variable names and specifies that they are a particular type. An implicit declaration is a means of associating variables with types through default conventions, rather than declaration statements. In this case, the first appearance of a variable name in a program constitutes its implicit declaration. Both explicit and implicit declarations create static bindings to types. Most widely used programming languages that use static type binding exclusively and were designed since the mid-1960s require explicit declarations of all variables (Perl, JavaScript, Ruby, and ML are some exceptions).

 

Dynamic Type Binding

With dynamic type binding, the type of a variable is not specified by a declaration statement, nor can it be determined by the spelling of its name. Instead, the variable is bound to a type when it is assigned a value in an assignment statement. When the assignment statement is executed, the variable being assigned is bound to the type of the value of the expression on the right side of the assignment. Such an assignment may also bind the variable to an address and a memory cell, because different type values may require different amounts of storage. Any variable can be assigned any type value. Furthermore, a variable’s type can change any number of times during program execution. It is important to realize that the type of a variable whose type is dynamically bound may be temporary.

 

Storage Bindings and Lifetime

The fundamental character of an imperative programming language is in large part determined by the design of the storage bindings for its variables. It is therefore important to have a clear understanding of these bindings. The memory cell to which a variable is bound somehow must be taken from a pool of available memory. This process is called allocation. Deallocation is the process of placing a memory cell that has been unbound from a variable back into the pool of available memory. The lifetime of a variable is the time during which the variable is bound to a specific memory location. So, the lifetime of a variable begins when it is bound to a specific cell and ends when it is unbound from that cell.

 

Static Variables

Static variables are those that are bound to memory cells before program execution begins and remain bound to those same memory cells until program execution terminates. Statically bound variables have several valuable applications in programming. One advantage of static variables is efficiency. All addressing of static variables can be direct; other kinds of variables often require indirect addressing, which is slower. Also, no run-time overhead is incurred for allocation and deallocation of static variables, although this time is often negligible. One disadvantage of static binding to storage is reduced flexibility; in particular, a language that has only static variables cannot support recursive subprograms. Another disadvantage is that storage cannot be shared among variables. For example, suppose a program has two subprograms, both of which require large arrays. Even if the two subprograms are never active at the same time, static arrays cannot share the same storage.
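
The classic use is a local variable that must survive between calls, as in this C sketch of my own:

#include <stdio.h>

int counter(void) {
    static int count = 0;   // bound to its memory cell before execution begins
    return ++count;         // the value persists across calls
}

int main(void) {
    counter();
    counter();
    printf("%d\n", counter());   // prints 3: count survived the earlier calls
    return 0;
}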

 

Stack-Dynamic Variables

Stack-dynamic variables are those whose storage bindings are created when their declaration statements are elaborated, but whose types are statically bound. Elaboration of such a declaration refers to the storage allocation and binding process indicated by the declaration, which takes place when execution reaches the code to which the declaration is attached. Therefore, elaboration occurs during run time. For example, the variable declarations that appear at the beginning of a Java method are elaborated when the method is called and the variables defined by those declarations are deallocated when the method completes its execution.

The advantages of stack-dynamic variables are as follows: To be useful, at least in most cases, recursive subprograms require some form of dynamic local storage so that each active copy of the recursive subprogram has its own version of the local variables. These needs are conveniently met by stack-dynamic variables. Even in the absence of recursion, having stack-dynamic local storage for subprograms is not without merit, because all subprograms share the same memory space for their locals.

The disadvantages, relative to static variables, of stack-dynamic variables are the run-time overhead of allocation and deallocation, possibly slower accesses because indirect addressing is required, and the fact that subprograms cannot be history sensitive. The time required to allocate and deallocate stack-dynamic variables is not significant, because all of the stack-dynamic variables that are declared at the beginning of a subprogram are allocated and deallocated together, rather than by separate operations.

 

Explicit Heap-Dynamic Variables

Explicit heap-dynamic variables are nameless (abstract) memory cells that are allocated and deallocated by explicit run-time instructions written by the programmer. These variables, which are allocated from and deallocated to the heap, can only be referenced through pointer or reference variables. The heap is a collection of storage cells whose organization is highly disorganized because of the unpredictability of its use.

Explicit heap-dynamic variables are often used to construct dynamic structures, such as linked lists and trees, that need to grow and/or shrink during execution. Such structures can be built conveniently using pointers or references and explicit heap-dynamic variables.
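
A minimal C sketch of my own of such a structure: the node is an anonymous heap variable, reachable only through a pointer:

#include <stdlib.h>

struct node {
    int value;
    struct node *next;   // links let the structure grow during execution
};

void list_demo(void) {
    struct node *head = malloc(sizeof *head);   // explicit allocation from the heap
    head->value = 1;
    head->next = NULL;
    free(head);                                 // explicit deallocation by the programmer
}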

The disadvantages of explicit heap-dynamic variables are the difficulty of using pointer and reference variables correctly, the cost of references to the variables, and the complexity of the required storage management implementation. This is essentially the problem of heap management, which is costly and complicated.

 

Implicit Heap-Dynamic Variables

Implicit heap-dynamic variables are bound to heap storage only when they are assigned values. In fact, all their attributes are bound every time they are assigned. The advantage of such variables is that they have the highest degree of flexibility, allowing highly generic code to be written. One disadvantage of implicit heap-dynamic variables is the run-time overhead of maintaining all the dynamic attributes, which could include array subscript types and ranges, among others. Another disadvantage is the loss of some error detection by the compiler.

 

Scope

The scope of a variable is the range of statements in which the variable is visible. A variable is visible in a statement if it can be referenced in that statement. The scope rules of a language determine how a particular occurrence of a name is associated with a variable, or in the case of a functional language, how a name is associated with an expression. In particular, scope rules determine how references to variables declared outside the currently executing subprogram or block are associated with their declarations and thus their attributes.

 

Static Scope

There are two categories of static-scoped languages, those in which subprograms can be nested, which creates nested static scopes, and those in which subprograms cannot be nested. In the latter category, static scopes are also created by subprograms but nested scopes are created only by nested class definitions and blocks. Ada, JavaScript, Common LISP, Scheme, Fortran 2003+, F#, and Python allow nested subprograms, but the C-based languages do not.

In static-scoped languages with nested subprograms, this process can be thought of in the following way. Suppose a reference is made to a variable x in subprogram sub1. The correct declaration is found by first searching the declarations of subprogram sub1. If no declaration is found for the variable there, the search continues in the declarations of the subprogram that declared subprogram sub1, which is called its static parent. If a declaration of x is not found there, the search continues to the next-larger enclosing unit (the unit that declared sub1’s parent), and so forth, until a declaration for x is found or the largest unit’s declarations have been searched without success. In that case, an undeclared variable error is reported. The static parent of subprogram sub1, and its static parent, and so forth up to and including the largest enclosing subprogram, are called the static ancestors of sub1.

 

Blocks

Many languages allow new static scopes to be defined in the midst of executable code. This powerful concept, introduced in ALGOL 60, allows a section of code to have its own local variables whose scope is minimized. Such variables are typically stack dynamic, so their storage is allocated when the section is entered and deallocated when the section is exited. Such a section of code is called a block. Blocks provide the origin of the phrase block-structured language. The C-based languages allow any compound statement (a statement sequence surrounded by matched braces) to have declarations and thereby define a new scope. Such compound statements are called blocks.
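
In C, any compound statement with declarations creates such a scope (a sketch of my own):

void block_demo(void) {
    int count = 10;
    {                       // a compound statement with a declaration: a block
        int temp = count;   // temp is visible only inside these braces
        count = temp * 2;
    }
    // temp no longer exists here; its stack storage has been deallocated
}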

 

Global Scope

Some languages, including C, C++, PHP, JavaScript, and Python, allow a program structure that is a sequence of function definitions, in which variable definitions can appear outside the functions. Definitions outside functions in a file create global variables, which potentially can be visible to those functions. C and C++ have both declarations and definitions of global data. Declarations specify types and other attributes but do not cause allocation of storage. Definitions specify attributes and cause storage allocation. For a specific global name, a C program can have any number of compatible declarations, but only a single definition.

A declaration of a variable outside function definitions specifies that the variable is defined in a different file. A global variable in C is implicitly visible in all subsequent functions in the file, except those that include a declaration of a local variable with the same name. A global variable that is defined after a function can be made visible in the function by declaring it to be external.
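
A two-file C sketch of my own (the file names are hypothetical) showing one definition and a matching extern declaration:

// file1.c : the single definition, which allocates storage
int day = 1;

// file2.c : a declaration; extern says day is defined elsewhere
extern int day;

void next_day(void) {
    day++;   // refers to the global defined in file1.c
}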

PHP statements can be interspersed with function definitions. Variables in PHP are implicitly declared when they appear in statements. Any variable that is implicitly declared outside any function is a global variable; variables implicitly declared in functions are local variables. The scope of global variables extends from their declarations to the end of the program but skips over any subsequent function definitions. So, global variables are not implicitly visible in any function.

 

Evaluation of Static Scoping

Static scoping provides a method of nonlocal access that works well in many situations. However, it is not without its problems. First, in most cases it allows more access to both variables and subprograms than is necessary. It is simply too crude a tool for concisely specifying such restrictions. Second, and perhaps more important, is a problem related to program evolution. Software is highly dynamic—programs that are used regularly continually change. These changes often result in restructuring, thereby destroying the initial structure that restricted variable and subprogram access. To avoid the complexity of maintaining these access restrictions, developers often discard structure when it gets in the way. Thus, getting around the restrictions of static scoping can lead to program designs that bear little resemblance to the original, even in areas of the program in which changes have not been made. Designers are encouraged to use far more globals than are necessary. All subprograms can end up being nested at the same level, in the main program, using globals instead of deeper levels of nesting.

 

Dynamic Scope

Dynamic scoping is based on the calling sequence of subprograms, not on their spatial relationship to each other. Thus, the scope can be determined only at run time. Perl’s dynamic scoping is unusual—in fact, it is not exactly like that discussed in this section, although the semantics are often that of traditional dynamic scoping.

 

Evaluation of Dynamic Scope

The effect of dynamic scoping on programming is profound. When dynamic scoping is used, the correct attributes of non-local variables visible to a program statement cannot be determined statically. Furthermore, a reference to the name of such a variable is not always to the same variable. A statement in a subprogram that contains a reference to a non-local variable can refer to different non-local variables during different executions of the subprogram. Several kinds of programming problems follow directly from dynamic scoping.

First, during the time span beginning when a subprogram begins its execution and ending when that execution ends, the local variables of the subprogram are all visible to any other executing subprogram, regardless of its textual proximity or how execution got to the currently executing subprogram. There is no way to protect local variables from this accessibility. Subprograms are always executed in the environment of all previously called subprograms that have not yet completed their executions. As a result, dynamic scoping results in less reliable programs than static scoping.

A second problem with dynamic scoping is the inability to type check references to non-locals statically. This problem results from the inability to statically find the declaration for a variable referenced as a non-local.

Dynamic scoping also makes programs much more difficult to read, because the calling sequence of subprograms must be known to determine the meaning of references to non-local variables. This task can be virtually impossible for a human reader. Finally, accesses to non-local variables in dynamic-scoped languages take far longer than accesses to non-locals when static scoping is used.


PLC : SESSION 2 (Describing Syntax and Semantics)

What is Syntax and Semantics?

Syntax: the form of the expressions, statements, and program units

Semantics: the meaning of the expressions, statements, and program units

  Ex:   while (<Boolean_expr>)<statement>

         * The semantics of this statement form is that when the current value of the Boolean expression is true, the embedded statement is executed.

*  The form of a statement should strongly suggest what the statement is meant to accomplish.

 

The General Problem of Describing Syntax

A language, whether natural (such as English) or artificial (such as Java), is a set of strings of characters from some alphabet. The strings of a language are called sentences or statements. Formal descriptions of the syntax of programming languages often do not include descriptions of the lowest-level syntactic units. These small units are called lexemes. The description of lexemes can be given by a lexical specification, which is usually separate from the syntactic description of the language. The lexemes of a programming language include its numeric literals, operators, and special words, among others. Each lexeme group is represented by a name, or token. So, a token of a language is a category of its lexemes. In some cases, a token has only a single possible lexeme.
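
For example (a standard textbook illustration), the C statement below contains eight lexemes drawn from six tokens:

index = 2 * count + 17;

Lexemes: index, =, 2, *, count, +, 17, ;
Tokens: identifier, equal_sign, int_literal, mult_op, identifier, plus_op, int_literal, semicolon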

 

Formal Definition of Languages

There are two formal ways of defining languages:

  1. Recognizers

A recognition device reads input strings over the alphabet of the language and decides whether  the input strings belong to the language. The syntax analysis part of a compiler is a recognizer for the language the compiler translates. In this role, the recognizer need not test all possible strings of characters from some set to determine whether each is in the language. Rather, it need only determine whether given programs are in the language. In effect then, the syntax analyser determines whether the given programs are syntactically correct.

  2. Generators

A language generator is a device that can be used to generate the sentences of a language. We can think of the generator as having a button that produces a sentence of the language every time it is pushed. Because the particular sentence that is produced by a generator when its button is pushed is unpredictable, a generator seems to be a device of limited usefulness as a language descriptor.

Formal Methods of Describing Syntax

There are two formal methods of describing syntax:

  1. Context-Free Grammars

Context-free grammars were developed by Noam Chomsky in the mid-1950s. Chomsky, who was a noted linguist, described four classes of generative devices, or grammars, that define four classes of languages. Two of these grammar classes, named context-free and regular, turned out to be useful for describing the syntax of programming languages.

  2. Backus-Naur Form (BNF)

Backus-Naur Form was invented by John Backus in 1959. Backus, who was a prominent member of the ACM-GAMM group, presented a landmark paper to describe the syntax of ALGOL 58. This paper introduced a new formal notation for specifying programming language syntax. The new notation was later modified slightly by Peter Naur for the description of ALGOL 60. This revised method of syntax description became known as Backus-Naur Form, or simply BNF.

 

BNF Fundamentals

BNF is a metalanguage for programming languages. A metalanguage is a language that is used to describe another language. BNF uses abstractions for syntactic structures. The actual definition of <assign> can be given by:

<assign> → <var> = <expression>

* The text on the left side of the arrow, which is aptly called the left-hand side (LHS), is the abstraction being defined.

* The text to the right of the arrow is the definition of the LHS. It’s called the right-hand side (RHS) and consists of some mixture of tokens, lexemes, and references to other abstractions.

Altogether, the definition is called a rule or production. The abstractions in a BNF description, or grammar, are often called nonterminal symbols, or simply nonterminals, and the lexemes and tokens of the rules are called terminal symbols, or simply terminals. A BNF description, or grammar, is a collection of rules. Although BNF is simple, it is sufficiently powerful to describe nearly all of the syntax of programming languages. In particular, it can describe lists of similar constructs, the order in which different constructs must appear, and nested structures to any depth, and it can even imply operator precedence and associativity.

 

Describing Lists

–  Syntactic lists are described using recursion.

<ident_list> → ident

| ident, <ident_list>

–  A rule is recursive if its LHS appears in its RHS.

Variable-length lists in mathematics are often written using an ellipsis. BNF does not include the ellipsis, so an alternative method is required for describing lists of syntactic elements in programming languages; for BNF, as the rule above shows, that alternative is recursion.

 

Grammars and Derivations

A grammar is a generative device for defining languages. The sentences of the language are generated through a sequence of applications of the rules, beginning with a special nonterminal of the grammar called the start symbol. This sequence of the rule applications is called a derivation.

An example grammar:

<program> → <stmts>

<stmts> → <stmt> | <stmt> ; <stmts>

<stmt> → <var> = <expr>

<var> → a | b | c | d

<expr> → <term> + <term> | <term> – <term>

<term> → <var> | const

An example derivation:

<program> => <stmts> => <stmt>

=> <var> = <expr>

=> a = <expr>

=> a = <term> + <term>

=> a = <var> + <term>

=> a = b + <term>

=> a = b + const

 

Parse Trees

One of the most attractive features of grammars is that they naturally describe the hierarchical syntactic structure of the sentences of the languages they define. These hierarchical structures are called parse trees. Every internal node of a parse tree is labelled with a nonterminal symbol; every leaf is labelled with a terminal symbol. Every subtree of a parse tree describes  one instance of an abstraction in the sentence.

 

Ambiguity in Grammars

A grammar that generates a sentential form for which there are two or more distinct parse trees is said to be ambiguous. Syntactic ambiguity of language structures is a problem because compilers often base the semantics of those structures on their syntactic form. Specifically, the compiler chooses the code to be generated for a statement by examining its parse tree. If a language structure has more than one parse tree, then the meaning of the structure cannot be determined uniquely. This problem is discussed in two specific examples in the following subsections.

An example of an ambiguous grammar for expressions:

<assign> → <id> = <expr>

<id> → A | B | C

<expr> → <expr> + <expr>

| <expr> * <expr>

| ( <expr> )

| <id>

An example of an unambiguous grammar for expressions:

<assign> → <id> = <expr>

<id> → A | B | C

<expr> → <expr> + <term>

| <term>

<term> → <term> * <factor>

| <factor>

<factor> → ( <expr> )

| <id>

 

Static Semantics

The static semantics of a language is only indirectly related to the meaning of programs during execution; rather, it has to do with the legal forms of programs (syntax rather than semantics). Many static semantic rules of a language state its type constraints. Static semantics is so named because the analysis required to check these specifications can be done at compile time.

 

Attribute Grammars

An attribute grammar is a grammar with the following additional features:

  1. Associated with each grammar symbol X is a set of attributes A(X).

The set A(X) consists of two disjoint sets S(X) and I(X), called synthesized and inherited attributes, respectively. Synthesized attributes are used to pass semantic information up a parse tree, while inherited attributes pass semantic information down and across a tree.

  2. Associated with each grammar rule is a set of semantic functions and a possibly empty set of predicate functions over the attributes of the symbols in the grammar rule.
  3. A predicate function has the form of a Boolean expression on the union of the attribute set and a set of literal attribute values.

 

Semantics

There is no single widely acceptable notation or formalism for describing semantics. There are several different reasons underlying the need for a methodology and notation for describing semantics. Programmers obviously need to know precisely what the statements of a language do before they can use them effectively in their programs. Compiler writers must know exactly what language constructs mean to design implementations for them correctly. If there were a precise semantics specification of a programming language, programs written in the language potentially could be proven correct without testing. Also, compilers could be shown to produce programs that exhibited exactly the behavior given in the language definition; that is, their correctness could be verified. A complete specification of the syntax and semantics of a programming language could be used by a tool to generate a compiler for the language automatically. Finally, language designers, who would develop the semantic descriptions of their languages, could in the process discover ambiguities and inconsistencies in their designs.

 

Operational Semantics

The idea behind operational semantics is to describe the meaning of a statement or program by specifying the effects of running it on a machine. The effects on the machine are viewed as the sequence of changes in its state, where the machine’s state is the collection of the values in its storage. There are different levels of uses of operational semantics. At the highest level, the interest is in the final result of the execution of a complete program. This is sometimes called natural operational semantics. At the lowest level, operational semantics can be used to determine the precise meaning of a program through an examination of the complete sequence of state changes that occur when the program is executed. This use is sometimes called structural operational semantics.

 

Denotational Semantics

Denotational semantics is the most rigorous and most widely known formal method for describing the meaning of programs. It is solidly based on recursive function theory. In denotational semantics, the domain is called the syntactic domain, because it is syntactic structures that are mapped. The range is called the semantic domain. Denotational semantics is related to operational semantics. In operational semantics, programming language constructs are translated into simpler programming language constructs, which become the basis of the meaning of the construct. In denotational semantics, programming language constructs are mapped to mathematical objects, either sets or, more often, functions. However, unlike operational semantics, denotational semantics does not model the step-by-step computational processing of programs.

 

Axiomatic Semantics

Axiomatic semantics is thus named because it is based on mathematical logic. It specifies what can be proven about the program. Recall that one of the possible uses of semantic specifications is to prove the correctness of programs. In axiomatic semantics, there is no model of the state of a machine or program, or model of state changes that take place when the program is executed. The meaning of a program is based on relationships among program variables and constants, which are the same for every execution of the program. Axiomatic semantics has two distinct applications: program verification and program semantics specification. This section focuses on program verification in its description of axiomatic semantics. Axiomatic semantics was defined in conjunction with the development of an approach to proving the correctness of programs.


PLC : Session 1 (Introduction To Programming Language Concept)

Programming Language

A Programming Language is a formal language that specifies a set of instructions that can be used to produce various kinds of output. Programming languages generally consist of instructions for a computer. Programming languages can be used to create programs that implement specific algorithms.

Why must we learn the concepts of programming languages?

  1. Increase our ability to express ideas
  2. Improve our background for choosing appropriate languages
  3. Increase our ability to learn new languages
  4. Better understand the significance of implementation
  5. Make better use of the languages we already know
  6. And, contribute to the overall advancement of computing

Programming Domain

Scientific Applications : Programs tend to involve use of arithmetic on real numbers, arrays/matrices, and “counting” loops. The first high-level programming language for scientific applications was FORTRAN. It was specifically designed for scientific computing. However, most scientific applications are now written in more general programming languages, such as C and C++.

Business Applications : Among the desires here are decimal arithmetic (to deal with monetary amounts) and the ability to produce elaborate reports. The first language for business applications was COBOL, which was designed specifically for this area, and few competitors have ever emerged. Now, Microsoft’s .NET platform is heavily used in business applications running on the Microsoft Windows operating system.

Artificial Intelligence : These applications tend to require symbolic manipulation rather than numeric computation, and lists rather than arrays. The first language for artificial intelligence was LISP. Today, however, Python is one of the most widely used languages in the AI field thanks to its simplicity.

Systems Programming : The first languages for systems programming were PL/S (for IBM), BLISS (for Digital), and Extended ALGOL (for Burroughs). However, most system software is now written in more general programming languages, such as C and C++.

Web Software : The World Wide Web is supported by an eclectic collection of languages, ranging from markup languages such as HTML, which is not a programming language, to general-purpose programming languages such as Java. Web software typically needs good string-processing capabilities, and scripting languages such as PHP and JavaScript have become popular here.

Language Evaluation Criteria

There are four language evaluation criteria

1. Readability (Easy to Read and Understand)

There are four characteristics that contribute to the readability of a language

A. Overall Simplicity

The overall simplicity of a programming language strongly affects its readability. A language with too many features is difficult to learn, so programmers often learn only a subset of it; readability problems occur whenever the program’s author has learned a different subset from the one with which the reader is familiar. The second complicating characteristic of a programming language is feature multiplicity, that is, having more than one way to accomplish a particular operation. The third potential problem is operator overloading, in which a single operator symbol has more than one meaning.

B. Orthogonality

Orthogonality in a programming language means that a relatively small set of primitive constructs can be combined in a relatively small number of ways to build the control and data structures of the language. Furthermore, every possible combination of primitives is legal and meaningful.

C. Data Types

The presence of adequate facilities for defining data types and data structures in a language is another significant aid to readability. For example, suppose a numeric type is used for an indicator flag because there is no Boolean type in the language; the reader then cannot tell from the declaration what the variable is for, as the sketch below shows.
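A minimal C sketch of the difference (timeOut is an illustrative name; the Boolean version assumes a C99 compiler with <stdbool.h>):

    #include <stdbool.h>

    int main(void) {
        int  timeOutFlag = 1;   /* numeric flag: the reader must guess what 1 means */
        bool timeOut = true;    /* Boolean type: the purpose is apparent from the code */
        (void)timeOutFlag;      /* suppress unused-variable warnings */
        (void)timeOut;
        return 0;
    }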

D. Syntax Design

The syntax, or form, of the elements of a language has a significant effect on the readability of programs. Program appearance and thus program readability are strongly influenced by the forms of a language’s special words. Designing statements so that their appearance at least partially indicates their purpose is an obvious aid to readability.

2. Writability (Easy to Create)

There are three characteristics that contribute to the writability of a language

A. Simplicity and Orthogonality

A smaller number of primitive constructs and a consistent set of rules for combining them (that is, orthogonality) is much better than simply having a large number of primitives.

B. Support for Abstraction

Abstraction is a key concept in contemporary programming language design. This is a reflection of the central role that abstraction plays in modern program design methodologies. The degree of abstraction allowed by a programming language and the naturalness of its expression are therefore important to its writability.

C. Expressivity

Expressivity in a language means that there are very powerful operators that allow a great deal of computation to be accomplished with a very small program. More commonly, it means that a language has relatively convenient, rather than cumbersome, ways of specifying computations.
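For example, C’s increment operator lets a common computation be written compactly; a minimal sketch:

    int main(void) {
        int count = 0;
        count = count + 1;  /* the long form */
        count++;            /* the same computation, more convenient and shorter */
        return count;       /* returns 2 */
    }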

3. Reliability (Performs Well)

There are four characteristics that contribute to the reliability of a language

A. Type Checking

Type checking is simply testing for type errors in a given program, either by the compiler or during program execution. Type checking is an important factor in language reliability. Because run-time type checking is expensive, compile-time type checking is more desirable.
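A minimal C sketch: as written the program compiles, and uncommenting the marked line produces a compile-time diagnostic rather than a run-time failure:

    struct point { int x, y; };

    int main(void) {
        struct point p = {1, 2};
        /* int n = p + 1; */   /* compile-time type error: '+' is not defined for a struct operand */
        (void)p;
        return 0;
    }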

B. Exception Handling

The ability of a program to intercept run-time errors (as well as other unusual conditions detectable by the program), take corrective measures, and then continue is an obvious aid to reliability. This language facility is called exception handling.

C. Aliasing

Loosely defined, aliasing is having two or more distinct names that can be used to access the same memory cell. It is now widely accepted that aliasing is a dangerous feature in a programming language.
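A minimal C sketch of why aliasing is dangerous (names are illustrative):

    #include <stdio.h>

    int main(void) {
        int total = 100;
        int *alias = &total;    /* 'alias' and 'total' now name the same memory cell */

        *alias = 0;             /* looks like it touches only the pointer's target... */
        printf("%d\n", total);  /* ...but prints 0: 'total' was changed through the alias */
        return 0;
    }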

D. Readability and Writability

Both readability and writability influence reliability. A program written in a language that does not support natural ways to express the required algorithms will necessarily use unnatural approaches. Unnatural approaches are less likely to be correct for all possible situations. The easier a program is to write, the more likely it is to be correct. Readability affects reliability in both the writing and maintenance phases of the life cycle. Programs that are difficult to read are difficult both to write and to modify.

4. Cost (Total Cost)

The total cost of a programming language is a function of many of its characteristics. There are seven types of cost. First, there is the cost of training programmers to use the language, which is a function of the simplicity and orthogonality of the language and the experience of the programmers. Second, there is the cost of writing programs in the language. This is a function of the writability of the language, which depends in part on its closeness in purpose to the particular application. Third, there is the cost of compiling programs in the language.

Fourth, the cost of executing programs written in a language is greatly influenced by that language’s design. The fifth factor in the cost of a language is the cost of the language implementation system. Sixth, there is the cost of poor reliability. The final consideration is the cost of maintaining programs, which includes both corrections and modifications to add new functionality.

The cost of software maintenance depends on a number of language characteristics, primarily readability. Of all the contributors to language costs, three are most important: program development, maintenance, and reliability. Because these are functions of writability and readability, these two evaluation criteria are, in turn, the most important.

Influences on Programming Language Design

There are two influences on programming language design

1. Computer Architecture

The basic architecture of computers has had a profound effect on language design. Most of the popular languages of the past 50 years have been designed around the prevalent computer architecture, called the von Neumann architecture, after one of its originators, John von Neumann. These languages are called imperative languages. In a von Neumann computer, both data and programs are stored in the same memory. The central processing unit (CPU), which executes instructions, is separate from the memory.

2. Programming Design Methodologies

A. 1950s and Early 1960s

Applications were simple, and the main concern was machine efficiency.

B. Late 1960s and Early 1970s

An intense analysis of both the software development process and programming language design began, driven in large part by the structured-programming movement. People efficiency became important, which led to an emphasis on readability and better control structures.

C. Late 1970s

A shift from procedure-oriented to data-oriented program design methodologies began.

D. Middle 1980s

Object-oriented design began as the latest step in the evolution of data-oriented software development.

Language Categories

There are four language categories

1. Imperative

In imperative languages, the central features are variables, assignment statements, and iteration. Examples are C and Pascal.

2. Functional

In functional languages, the main means of making computations is applying functions to given parameters. Examples are LISP and Scheme.

3. Logic

Logic languages are rule-based, and the rules are specified in no particular order. An example is Prolog.

4. Object-oriented

Object-oriented languages encapsulate data objects with their processing, support inheritance and dynamic type binding, and grew out of imperative languages. Examples are C++ and Java.

Implementation Methods

There are three implementation methods

1. Compilation

Compilation means that programs are translated into machine language; this category includes JIT systems. Translation is slow but execution is fast. Examples are C, COBOL, and Ada. Compilation is used for large commercial applications.

2. Pure Interpretation

Pure interpretation means that programs are interpreted by another program, known as an interpreter. Implementation is easy but execution is slower. Examples are JavaScript and PHP. Pure interpretation is used for small programs, or when efficiency is not an issue.

3. Hybrid Implementation System

A hybrid implementation system is a compromise between compilers and pure interpreters. It is easy to implement, and execution is faster than with pure interpretation. Examples are Perl and Java. Hybrid systems are used for small and medium systems when efficiency is not the first concern.

Posted in Programming Language Concept

Hello world!

Welcome to Binusian blog.
This is the first post of any blog.binusian.org member blog. First of all, I want to introduce myself: my name is Evan Renanto, my major is Computer Science, and I am a Binusian21.

Posted in Uncategorized