COMS W4115
Programming Languages and Translators
Lecture 2: September 14, 2009
Language Design Issues
Overview
- Course project
- Fundamental elements of programming languages
- An overview of the C programming language
1. Course Project
- Project description
- Form a team of five to create and implement
an innovative little language of your own design.
- Each team member must write at least 500 lines of
code for the compiler.
- Project constitutes 40% of final grade.
- Project team: elect one person to serve each of the following functions.
- Project manager
- This person sets the project schedule, holds weekly meetings
with the entire team, maintains the project log, and makes
sure the project deliverables get done on time.
- Language and tools guru
- This person defines the baseline process to track language changes
and maintain the intellectual integrity of the language.
- This person teaches the team how to use the tools
needed to build the compiler.
- System architect
- This person defines the compiler architecture, modules, and
interfaces.
- System integrator
- This person defines the system integration environment
and makes sure the compiler components work together.
- Tester and validator
- This person defines the test suites and executes them
to make sure the compiler meets the language specification.
- Project due dates and deliverables:
- Oct. 7: Language white paper (written by entire team, 3-4 pages).
- Nov. 4: Language tutorial (written by entire team, 10-20 pages).
- Chapter 1 of Kernighan and Ritchie is a good model of a
language tutorial.
- Describe a few representative programs that illustrate the
nature and scope of your language.
- A "hello, world" program is de rigueur.
- Nov. 4: Language reference manual (written by entire team, 20-30 pages).
- Appendix A of Kernighan and Ritchie is a good model.
- Give a complete description of the lexical and syntactic
structure of your language.
- Include a full grammar for your language.
- Dec. 7 and 9: Ten-minute project presentations in class.
- Dec. 21-23: Working compiler and demo.
- Dec. 21-23: Final project report due at project demo.
- Let Peter Lu (yl2505@columbia.edu)
know by Sep 16 what implementation language you are most comfortable with
(e.g., C or Java) and the kinds of areas for which you might be interested
in creating a new language. Contact Peter for help on joining
a team or finding additional teammates.
2. Fundamental Elements of Programming Languages
- Programming model
- For example, C is an imperative language, designed around the
von Neumann model of computation.
- Program structure
- Character set and lexical conventions
- Names, scopes, bindings, and lifetimes
- Data types and operators
- Expressions and assignment statements
- Control flow
- Procedures and control abstraction
- Data abstraction and object orientation
- Concurrency
3. The C Programming Language
- C is a general-purpose programming language originally developed
by Dennis Ritchie at Bell Labs in 1972 for implementing UNIX;
it is now widely used
for systems and applications development.
- The
Microsoft C Language Reference Manual describes C as
- "a general-purpose programming language known
for its efficiency, economy, and portability. While these characteristics
make it a good choice for almost any kind of programming, C has proven
especially useful in systems programming because it facilitates writing
fast, compact programs that are readily adaptable to other systems.
Well-written C programs are often as fast as assembly-language programs,
and they are typically easier for programmers to read and maintain."
- See The Development of the C Language
by Dennis Ritchie for the history surrounding the development of C.
- The original version of C as published in Kernighan and Ritchie,
The C Programming Language, Prentice Hall 1978, is called
K&R C.
- C was standardized by ANSI in 1989 (this version is called C89), and the
same standard was adopted by ISO/IEC in 1990 (commonly called C90).
The 1988 edition of Kernighan and Ritchie's The C Programming
Language describes the ANSI standard version of C.
- A new international standard for C was created by ISO/IEC in
1999 and is referred to as C99.
- Example source program: line-counting program in C (adapted from K & R, p. 19,
with main declared as returning int, since C99 no longer allows an implicit int).
#include <stdio.h>

/* count lines in input */
int main(void)
{
    int c, nl;

    nl = 0;
    while ((c = getchar()) != EOF)
        if (c == '\n')
            ++nl;
    printf("%d\n", nl);
    return 0;
}
4. An Overview of the C Programming Language
- From Appendix A of Kernighan and Ritchie, 1988.
- Program structure
- Source files and source programs
- A source program consists of one or more source files called translation units.
- A translation unit is the input to the compiler.
- A translation unit is a sequence of external declarations,
which are either declarations or function definitions.
- The main function
- Every C program must have exactly one function named main,
which is where program execution starts.
- A C program can receive command-line arguments (parameters)
when it begins executing if main is defined with the signature
int main(int argc, char *argv[])
Lifetime, scope, visibility, and linkage
- Lifetime is the period of time during the execution of a program
in which a variable or function exists.
- The scope of a name is the region of the program in which it is known (visible).
C uses static scope; that is, the scope of a name can be determined at
compile time.
- The linkage of a name determines whether the same name declared in another scope,
or in another translation unit, refers to the same variable or function.
Name spaces
- Identifiers in a C program can fall into several disjoint name spaces. This allows the
same identifier to be used for different purposes even in the same scope if the uses
are in different name spaces.
- E.g., the three uses of student in this structure declaration are in three different
name spaces (structure tag, structure member, structure variable):
struct student {        /* here student is a structure tag */
    char student[25];   /* here student is a structure member */
    int id;
} student;              /* here student is a structure variable */
5. Lexical Conventions of C
- Character set
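- A C source program contains characters from the source character set,
which includes the upper and lowercase letters, the digits, graphic characters
such as the underscore and punctuation, and whitespace characters such as space,
horizontal tab, vertical tab, and form feed. C also provides several notations for
constants, such as decimal, octal, and hexadecimal integer constants and escape
sequences like \n for characters that cannot be typed directly.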
- A C target program uses characters from the target character set
when it executes in the target environment.
- The source and target character sets are related but need not be
the same and can use different encodings.
- See the C99 standard or
C characters for more details.
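- Tokens: there are six classes of tokens
- identifiers
- keywords
- constants
- string literals
- operators
- other separators
- Whitespace (blanks, tabs, newlines, form feeds) and comments are ignored
except as they separate tokens.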
- Comments
- C89 has only the
/* ... */ style comment, which may span multiple lines but does not nest.
- C99 added the C++-style comment, which extends to the end of the line:
// this is a single line comment
6. Identifiers
- Identifiers (or names) in C can refer to
- objects (variables)
- functions
- tags of structures, unions, or enumerations
- members of structures or unions
- enumeration constants
- typedef names
- An object, sometimes called a variable, is a location in storage
that has two main
attributes: its storage class (automatic or static) and its type.
7. Types
- Types determine the permissible values and operations within
a program.
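- They provide an implicit context for operations; for example, whether
+ performs integer or floating-point addition depends on the types of its operands.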
- They help detect bugs.
- A type system is a set of rules for
- defining and associating types with various parts of a
program, and
- defining type equivalence, compatibility, and inference.
- Basic types of C89
- char
- integer
- floating point
- enumeration
- void
- Derived types
- arrays of objects of a given type
- functions returning objects of a given type
- pointers to objects of a given type
- structures containing a sequence of objects of various types
- unions capable of containing any one of several objects
of various types
8. Objects and Lvalues
- An object in C (not to be confused with an object
in an object-oriented language) is a named region of storage.
- An lvalue is an expression that refers to an object.
- If
p is an expression of pointer type, then
*p is an lvalue expression
referring to the object to which p points.
9. Conversions
- Some operators in C may convert the value of an operand from one type
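to another. E.g., if
f is a variable of floating type
and i is a variable of an integral type, the assignment
f = i converts the integral value of i to floating type before assigning
that value to f. If the value cannot be represented exactly, the result is
the nearest representable value above or below it (which one is implementation-defined).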
10. Expressions
- C has a rich set of operators for manipulating data.
- The precedence and associativity of operators are fully specified,
but the order of evaluation of operands is, with some exceptions, unspecified.
- The assignment operators that can be used in assignment expressions
group right-to-left.
- The assignment operators are:
= *= /= %= += -= <<= >>= &= ^= |=
- All assignment operators require a modifiable lvalue as a left operand.
- In C, pointers and arrays are closely related. The declaration
int a[10];
defines an array
a of 10 integers, a[0] through a[9].
If pa is a pointer to an integer, declared as
int *pa;
then the assignment
pa = &a[0]; /* & is the "address-of" operator */
sets pa to point to element zero of a, and the assignment
x = *pa;
copies the contents of
a[0] into x.
- C allows pointer arithmetic, so the assignment
y = *(pa+1);
copies the contents of
a[1] into y.
- Because the name of an array evaluates to the address of its first element,
after the assignment pa = &a[0], pa and a have identical values.
The assignment could therefore also have been written as
pa = a;
11. Declarations and Definitions
- Declarations specify the interpretation given to each identifier
but do not necessarily reserve storage associated with the identifier.
- Declarations that reserve storage are called definitions.
12. Typedefs
- A typedef is a declaration (whose storage class specifier is typedef)
that defines identifiers (called typedef names) that name types.
13. Type Equivalence
- Programming languages often use a combination of two approaches to
type equivalence.
- In name equivalence two variables have equivalent types
if they are defined either in the same declaration or in declarations
that use the same type name.
- In structural equivalence two variables have equivalent types
if their types have identical structures.
- C uses both name and structural type equivalence.
- C uses name equivalence for structures, enumerations, and unions
(unless they are defined in different files, in which case
structural equivalence is used).
- Other nonscalar types use structural equivalence.
- Array types are equivalent if their elements have the same type and,
when both sizes are specified, the same size.
14. Statements
- Most statements in C are executed in sequence.
- Statements are executed for their effect and do not have values.
- C has the following kinds of statements:
- labeled statements
- expression statements
- compound (also called block) statements
- selection statements: if, if-else, switch
- iteration statements: while, do, for
- jump statements: goto, continue, break, return
15. Preprocessing
- Lines beginning with # are directives (commands) that communicate with the preprocessor.
- The preprocessor can perform macro substitution, conditional compilation, and
inclusion of named files.
- The effect of a preprocessor command lasts until the end of the translation unit.
16. Reading Assignment
- Kernighan and Ritchie, 1988, Appendix A.
17. References
aho@cs.columbia.edu