Showing posts with label Programming in c. Show all posts

Monday, March 4, 2013

Origination of C-Programming Language





Origination of C-Programming Language and getting its Name

To understand what a programming language is, you need to be familiar with two words: programming and language. A language is a collection of symbols, marks, sounds, gestures, icons and pictures that have special meaning and are used by people (or any other creature) to share emotions or information. Programming is the process of listing, in sequence, the jobs that something or somebody has to perform. So a programming language is a collection of notations for describing computation to people and to machines.

C is a general-purpose programming language initially developed by Dennis Ritchie between 1969 and 1972. Many of its principles and ideas were taken from the earlier language B and from B’s ancestors CPL (Combined Programming Language) and BCPL (Basic Combined Programming Language), the latter developed by Martin Richards. CPL was developed with the aim of creating a language that was capable of high-level, machine-independent programming and would still allow the programmer to control the behavior of individual bits of information. One major drawback of CPL was that it was too large for use in many applications. In 1967, BCPL was created as a scaled-down version of CPL that still retained its basic features. In 1970 Ken Thompson, while working at Bell Labs, took this process further by developing the B language. B was a scaled-down version of BCPL written specifically for use in systems programming. Finally, in 1972, a coworker of Ken Thompson, Dennis Ritchie, returned some of the generality found in BCPL to the B language in the process of developing the language we now know as C. One common explanation of the names is that B, the first scaled-down version of BCPL, took the first character of BCPL, and that C, the second scaled-down version, took the second character.

C’s power and flexibility soon became apparent. Because of this, the UNIX operating system, which was originally written in assembly language, was almost immediately rewritten in C. During the rest of the 1970s, C spread throughout many colleges and universities because of its close ties to UNIX and the availability of C compilers.

The origin of C is closely tied to the development of the UNIX operating system, originally implemented in assembly language on a PDP-7 by Ritchie and Thompson, incorporating several ideas from colleagues. Eventually they decided to port the operating system to a PDP-11. B’s inability to take advantage of some of the PDP-11’s features, notably byte addressability, led to the development of an early version of C.

The original PDP-11 version of UNIX was developed in assembly language. By 1973, with the addition of struct types, the C language had become powerful enough that most of the UNIX kernel was rewritten in C. This was one of the first operating system kernels implemented in a language other than assembly.

K&R C
In 1978, Brian Kernighan and Dennis Ritchie published the first edition of The C Programming Language. This book, known to C programmers as “K&R”, served as an informal specification of the language. The version of C that it describes is commonly referred to as K&R C. The second edition of the book covers the ANSI C standard.

K&R introduced several features, listed below:
a)     Standard I/O library,
b)     long int data type,
c)     unsigned int data type,
d)     compound assignment operators of the form =op (such as =-) were changed to the form op= (that is, -=) to remove the semantic ambiguity created by constructs such as i=-10, which had been interpreted as i =- 10 (decrement i by 10) instead of the possibly intended i = -10 (let i be -10).
Even after the publication of the 1989 C standard, K&R C was for many years still considered the “lowest common denominator” to which C programmers restricted themselves when maximum portability was desired, since many older compilers were still in use and because carefully written K&R C can be legal standard C as well.

In early versions of C, only functions that returned a non-int value needed to be declared if used before the function definition; a function used without any previous declaration was assumed to return type int.

In the years following the publication of K&R C, several unofficial features were added to the language, supported by compilers from AT&T and some other vendors. These included:
a)     void functions (i.e. functions with no return value).
b)     Functions returning struct or union types (rather than pointers).
c)     Assignment for struct data types.
d)     Enumerated types.
The large number of extensions and the lack of agreement on a standard library, together with the language’s popularity and the fact that not even the UNIX compilers precisely implemented the K&R specification, led to the necessity of standardization.

ANSI C and ISO C
During the late 1970s and 1980s, versions of C were implemented for a wide variety of mainframe computers, minicomputers and microcomputers, including the IBM PC, as its popularity began to increase significantly.

In 1983, the American National Standards Institute (ANSI) formed a committee, X3J11, to establish a standard specification of C. X3J11 based the C standard on the UNIX implementation; however, the non-portable portion of the UNIX C library was handed off to the IEEE working group 1003 to become the basis for the 1988 POSIX standard. In 1989, the C standard was ratified as ANSI X3.159-1989 “Programming Language C”. This version of the language is often referred to as ANSI C, or C89.

In 1990, the ANSI C standard (with formatting changes) was adopted by the International Organization for Standardization (ISO) as ISO/IEC 9899:1990, which is sometimes called C90. Therefore, the terms C89 and C90 refer to the same programming language.

ANSI, like other national standards bodies, no longer develops the C standard independently, but defers to the international C standard maintained by the working group ISO/IEC JTC1/SC22/WG14. National adoption of an update to the international standard typically occurs within a year of ISO publication.

One of the aims of the C standardization process was to produce a superset of K&R C, incorporating many of the unofficial features subsequently introduced. The standards committee also included several additional features such as function prototypes (borrowed from C++), void pointers, support for international character sets and locales, and preprocessor enhancements. Although the syntax for parameter declarations was augmented to include the style used in C++, the K&R interface continued to be permitted, for compatibility with existing source code.

C89 is supported by current C compilers, and most C code being written today is based on it. Any program written only in standard C and without any hardware-dependent assumptions will run correctly on any platform with a conforming C implementation, within its resource limits. Without such precautions, programs may compile only on a certain platform or with a particular compiler.

C99
After the ANSI/ISO standardization process, the C language specification remained relatively static for several years. In 1995, Normative Amendment 1 to the 1990 C standard was published, to correct some details and to add more extensive support for international character sets. The C standard was further revised in the late 1990s, leading to the publication of ISO 9899:1999 in 1999, which is commonly referred to as “C99”. It has since been amended three times by Technical Corrigenda.

C99 introduced several new features, including inline functions, several new data types (including long long int and a complex type to represent complex numbers), variable-length arrays, improved support for IEEE 754 floating point, support for variadic macros, and support for one-line comments beginning with //, as in BCPL or C++. Many of these had already been implemented as extensions in several C compilers.

C99 is for the most part backward compatible with C90, but is stricter in some ways; in particular, a declaration that lacks a type specifier no longer has int implicitly assumed. A standard macro __STDC_VERSION__ is defined with the value 199901L to indicate that C99 support is available. GCC, Solaris Studio, and other C compilers now support many or all of the new features of C99.
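
As a small illustration of that version macro, the sketch below picks a label at compile time. The macro is spelled __STDC_VERSION__; C99 defines it as 199901L and C11 as 201112L. The helper name c_standard_label is made up for this example.

```c
/* Report which C standard the compiler claims to support.
   c_standard_label is a hypothetical helper, not a library function. */
const char *c_standard_label(void)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    return "C11 or later";
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    return "C99";
#else
    return "C89/C90";   /* the macro is absent in plain C89/C90 mode */
#endif
}
```

Which string comes back depends on the compiler and on the options it was invoked with.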

C11
In 2007, work began on another revision of the C standard, informally called “C1X” until its official publication on 2011-12-08. The C standards committee adopted guidelines to limit the adoption of new features that had not been tested by existing implementations.

The C11 standard adds numerous new features to C and the library, including type-generic macros, anonymous structures, improved Unicode support, atomic operations, multi-threading, and bounds-checked functions. It also makes some portions of the existing C99 library optional, and improves compatibility with C++.



Embedded C
Historically, embedded C programming requires nonstandard extensions to the C language in order to support exotic features such as fixed-point arithmetic, multiple distinct memory banks, and basic I/O operations.
In 2008, the C standards committee published a technical report extending the C language to address these issues by providing a common standard for all implementations to adhere to. It includes a number of features not available in normal C, such as fixed-point arithmetic, named address spaces, and basic I/O hardware addressing.

C is not a very low-level language, nor a big one, and it is not specialized to any particular area of application. But its absence of restrictions and its generality make it more convenient and effective for many tasks than supposedly more powerful languages. C is often called a middle-level language; this does not mean that C is less powerful, harder to use, or less developed. Instead, C combines the advantages of a high-level language with the functionality of assembly language. Like a higher-level language, C provides block structures, standalone functions, and a small amount of data typing. Like assembly, it allows the manipulation of bits, bytes, words, and pointers, but it abstracts the hardware away from the code, so that something written in C is very portable, meaning that a program can easily be adapted to run on several different computers. This is a great thing for system programmers: they get the efficiency of assembly language programming without all the fuss, and they end up with a highly portable program. C also allows programmers to do many things that would probably be caught as errors in a high-level language. This is both an advantage and a disadvantage. For the inexperienced programmer, it may be confusing when the behavior of a program is not correct. A high-level language catches many more possible errors at compile time; C lacks the strongly typed environment that characterizes such languages.

According to Brian Kernighan, co-author with Dennis Ritchie of “The C Programming Language”: “Although the absence of some of these features may seem like a grave deficiency … keeping the language down to modest size has real benefits.” Since C is relatively small, it can be described in a small space and learned quickly. It has only 32 keywords to learn and supports conversion between its data types (QBASIC, by comparison, has 159 keywords).

Uses of C
C was initially used for system development work. Why use C? Mainly because it produces code that runs nearly as fast as code written in assembly language. C is mostly used for writing the following types of programs:
            -Operating Systems                         -Text Editors
            -Language Compilers                      -Printer Spoolers
            -Assemblers                                     -Network Drivers
            -Databases                                         -Utilities
Importance/Advantages of programming in C
1)     It is easy to understand.
2)     It gives freedom to use different types of data.
3)     It has only a short list of keywords.
4)     It allows efficient and fast programming.
5)     It can be used as a mid-level language.
6)     Almost any type of software, including operating systems, can be developed with the help of the C language.
All fields have technical words. These words are useful when you communicate with people in your field, but they do not communicate with outsiders. C also has its own technical words, which are reserved by the language and have special meanings; they are referred to as keywords or reserved words. All of the keywords are listed below, grouped by the version of the standard that introduced them.

Below this can be ignored

C89 has 32 keywords
auto                                                    extern                                     sizeof
break                                                  float                                        static
case                                                    for                                           struct
char                                                    goto                                        switch
const                                                   if                                             typedef
continue                                             int                                           union
default                                                long                                        unsigned
do                                                        register                                   void
double                                                 return                                     volatile
else                                                     short                                       while
enum                                                  signed

C99 adds five more keywords
            _Bool                                                  inline
            _Complex                                           restrict
            _Imaginary

C11 adds seven more keywords
            _Alignas                                            _Noreturn
            _Alignof                                             _Static_assert
            _Atomic                                             _Thread_local
            _Generic

Tuesday, March 27, 2012

Chapter 14: What's Next?


This last handout contains a brief list of the significant topics in C which we have not covered, and which you'll want to investigate further if you want to know all of C.

Types and Declarations

We have not talked about the void, short int, and long double types. void is a type with no values, used as a placeholder to indicate functions that do not return values or that accept no arguments, and in the ``generic'' pointer type void * that can point to anything. short int is an integer type that might use less space than a plain int; long double is a floating-point type that might have even more range or precision than plain double.

The char type and the various sizes of int also have ``unsigned'' versions, which are declared using the keyword unsigned. Unsigned types cannot hold negative values but have guaranteed properties on overflow. (Whether a plain char is signed or unsigned is implementation-defined; you can use the keyword signed to force a character type to contain signed characters.) Unsigned types are also useful when manipulating individual bits and bytes, when ``sign extension'' might otherwise be a problem.
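
To make the ``guaranteed properties on overflow'' concrete, here is a minimal sketch; the function names are invented for this illustration. Unsigned arithmetic wraps around modulo one more than the largest value the type can hold, so the result is always well defined.

```c
/* Unsigned addition never overflows in the undefined sense;
   the result simply wraps around to the low end of the range. */
unsigned int wrapping_add(unsigned int a, unsigned int b)
{
    return a + b;
}

/* Because of the wraparound, a sum smaller than one of its operands
   means the true result did not fit. */
int add_wrapped(unsigned int a, unsigned int b)
{
    return a + b < a;
}
```

For example, adding 1 to the largest unsigned int (which can be written ~0u) yields 0.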

Two additional type qualifiers const and volatile allow you to declare variables (or pointers to data) which you promise not to change, or which might change in unexpected ways behind the program's back.

There are user-defined structure and union types. A structure or struct is a ``record'' consisting of one or more values of one or more types concatenated together into one entity which can be manipulated as a whole. A union is a type which, at any one time, can hold a value from one of a specified set of types.
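
A minimal sketch of both kinds of type follows. The names point, number, and point_add are hypothetical, chosen only for this example.

```c
/* A structure groups several values into one object. */
struct point {
    int x;
    int y;
};

/* A union holds only one of its members at a time;
   the members share the same storage. */
union number {
    int i;
    double d;
};

/* Structures can be passed, returned, and assigned as whole values. */
struct point point_add(struct point a, struct point b)
{
    struct point sum;
    sum.x = a.x + b.x;
    sum.y = a.y + b.y;
    return sum;
}
```

With a union, it is up to the program to remember which member was stored most recently.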

There are user-defined enumeration types (``enum'') which are like integers but which always contain values from some fixed, predefined set, and for which the values are referred to by name instead of by number.
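
As a quick sketch (the type color and the function color_name are made up for illustration), an enumeration and a function that maps its values back to names might look like this:

```c
/* The enumerators are numbered 0, 1, 2 unless told otherwise. */
enum color { RED, GREEN, BLUE };

const char *color_name(enum color c)
{
    switch (c) {
    case RED:   return "red";
    case GREEN: return "green";
    case BLUE:  return "blue";
    }
    return "unknown";   /* reached only for a value outside the set */
}
```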

Pointers can point to functions as well as to data types.
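
Here is a small sketch of a function pointer in use; add, multiply, and apply are hypothetical names for this example.

```c
int add(int a, int b)      { return a + b; }
int multiply(int a, int b) { return a * b; }

/* op is a pointer to any function taking two ints and returning int. */
int apply(int (*op)(int, int), int a, int b)
{
    return op(a, b);
}
```

The call apply(add, 2, 3) computes 2 + 3, and apply(multiply, 2, 3) computes 2 * 3, using the same calling code.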

Types can be arbitrarily complicated, when you start using multiple levels of pointers, arrays, functions, structures, and/or unions. Eventually, it's important to understand the concept of a declarator: in the declaration

        int i, *ip, *fpi();

we have the base type int and three declarators i, *ip, and *fpi(). The declarator gives the name of a variable (or function) and also indicates whether it is a simple variable or a pointer, array, function, or some more elaborate combination (array of pointers, function returning pointer, etc.). In the example, i is declared to be a plain int, ip is declared to be a pointer to int, and fpi is declared to be a function returning pointer to int. (Complicated declarators may also contain parentheses for grouping, since there's a precedence hierarchy in declarators as well as expressions: [] for arrays and () for functions have higher precedence than * for pointers.)
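
The effect of parentheses in declarators is easiest to see side by side. In this sketch all of the names (aip, pai, pfi, answer, get_answer) are invented for illustration, and fpi is given a body so that the fragment is complete.

```c
int *aip[3];        /* array of 3 pointers to int          */
int (*pai)[3];      /* pointer to an array of 3 ints       */
int *fpi(void);     /* function returning pointer to int   */
int (*pfi)(void);   /* pointer to function returning int   */

static int answer = 42;

int *fpi(void)       { return &answer; }
int get_answer(void) { return answer; }
```

Without the parentheses, [] and () bind more tightly than *, which is why aip is an array and fpi is a function.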

We have not said much about pointers to pointers, or arrays of arrays (i.e. multidimensional arrays), or the ramifications of array/pointer equivalence on multidimensional arrays. (In particular, a reference to an array of arrays does not generate a pointer to a pointer; it generates a pointer to an array. You cannot pass a multidimensional array to a function which accepts pointers to pointers.)
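
A minimal sketch of a function that does accept a two-dimensional array: its parameter is a pointer to an array, not a pointer to a pointer. COLS and sum_grid are names made up for this example.

```c
#define COLS 3

/* grid is a pointer to an array of COLS ints, which is what a
   two-dimensional array argument turns into when passed. */
int sum_grid(int (*grid)[COLS], int rows)
{
    int r, c, total = 0;

    for (r = 0; r < rows; r++)
        for (c = 0; c < COLS; c++)
            total += grid[r][c];
    return total;
}
```

A caller with int m[2][COLS] can simply write sum_grid(m, 2).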

Variables can be declared with a hint that they be placed in high-speed CPU registers, for efficiency. (These hints are rarely needed or used today, because modern compilers do a good job of register allocation by themselves, without hints.)

A mechanism called typedef allows you to define user-defined aliases (i.e. new and perhaps more-convenient names) for other types.

Operators

The bitwise operators &, |, ^, and ~ operate on integers thought of as binary numbers or strings of bits. The & operator is bitwise AND, the | operator is bitwise OR, the ^ operator is bitwise exclusive-OR (XOR), and the ~ operator is a bitwise negation or complement. (&, |, and ^ are ``binary'' in that they take two operands; ~ is unary.) These operators let you work with the individual bits of a variable; one common use is to treat an integer as a set of single-bit flags. You might define the 3rd (2**2) bit as the ``verbose'' flag bit by defining

        #define VERBOSE 4

Then you can ``turn the verbose bit on'' in an integer variable flags by executing

        flags = flags | VERBOSE;
or
        flags |= VERBOSE;

and turn it off with

        flags = flags & ~VERBOSE;
or
        flags &= ~VERBOSE;

and test whether it's set with

        if(flags & VERBOSE)

The left-shift and right-shift operators << and >> let you shift an integer left or right by some number of bit positions; for example, value << 2 shifts value left by two bits.
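
Putting the shift and bitwise operators together, here is a rough sketch of flag handling. It defines VERBOSE with a shift rather than the literal 4, and the three helper functions are hypothetical.

```c
#define VERBOSE (1u << 2)   /* third bit, value 4 */

/* Turn the given bit on. */
unsigned int set_flag(unsigned int flags, unsigned int bit)
{
    return flags | bit;
}

/* Turn the given bit off. */
unsigned int clear_flag(unsigned int flags, unsigned int bit)
{
    return flags & ~bit;
}

/* Return 1 if the given bit is on, 0 otherwise. */
int test_flag(unsigned int flags, unsigned int bit)
{
    return (flags & bit) != 0;
}
```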

The ?: or conditional operator (also called the ``ternary operator'') essentially lets you embed an if/then statement in an expression. The assignment

        a = expr ? b : c;

is roughly equivalent to

        if(expr)
               a = b;
        else    a = c;

Since you can use ?: anywhere in an expression, it can do things that if/then can't, or that would be cumbersome with if/then. For example, the function call

        f(a, b, c ? d : e);

is roughly equivalent to

        if(c)
               f(a, b, d);
        else    f(a, b, e);

(Exercise: what would the call

        g(a, b, c ? d : e, h ? i : j, k);

be equivalent to?)

The comma operator lets you put two separate expressions where one is required; the expressions are executed one after the other. The most common use for comma operators is when you want multiple variables controlling a for loop, for example:

        for(i = 0, j = 10; i < j; i++, j--)

A cast operator allows you to explicitly force conversion of a value from one type to another. A cast consists of a type name in parentheses. For example, you could convert an int to a double by typing

        int i = 10;
        double d;
        d = (double)i;

(In this case, though, the cast is redundant, since this is a conversion that C would have performed for you automatically, i.e. if you'd just said d = i .) You use explicit casts in those circumstances where C does not do a needed conversion automatically. One example is division: if you're dividing two integers and you want a floating-point result, you must explicitly force at least one of the operands to floating-point, otherwise C will perform an integer division and will discard the remainder. The code

        int i = 1, j = 2;
        double d = i / j;

will set d to 0, but

        d = (double)i / j;

will set d to 0.5. You can also ``cast to void'' to explicitly indicate that you're ignoring a function's return value, as in

        (void)fclose(fp);

or

        (void)printf("Hello, world!\n");

(Usually, it's a bad idea to ignore return values, but in some cases it's essentially inevitable, and the (void) cast keeps some compilers from issuing warnings every time you ignore a value.)

There's a precise, mildly elaborate set of rules which C uses for converting values automatically, in the absence of explicit casts.

The . and -> operators let you access the members (components) of structures and unions.

Statements

The switch statement allows you to jump to one of a number of numeric case labels depending on the value of an expression; it's more convenient than a long if/else chain. (However, you can use switch only when the expression is integral and all of the case labels are compile-time constants.)
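
As an illustrative sketch (days_in_month is a made-up function, and the caller is assumed to know whether the year is a leap year), several case labels can share one block of code:

```c
int days_in_month(int month, int is_leap_year)
{
    switch (month) {
    case 2:
        return is_leap_year ? 29 : 28;
    case 4: case 6: case 9: case 11:
        return 30;
    case 1: case 3: case 5: case 7: case 8: case 10: case 12:
        return 31;
    default:
        return 0;   /* not a valid month number */
    }
}
```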

The do/while loop is a loop that tests its controlling expression at the bottom of the loop, so that the body of the loop always executes once even if the condition is initially false. (C's do/while loop is therefore like Pascal's repeat/until loop, while C's while loop is like Pascal's while/do loop.)
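
Counting decimal digits is one case where the body has to run at least once, since even 0 has one digit. A minimal sketch, with count_digits as a hypothetical name:

```c
int count_digits(unsigned int n)
{
    int digits = 0;

    do {
        digits++;       /* executed once even when n is 0 */
        n /= 10;
    } while (n != 0);
    return digits;
}
```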

Finally, when you really need to write ``spaghetti code,'' C does have the all-purpose goto statement, and labels to go to.

Functions

Functions can't return arrays, and it's tricky to write a function as if it returns an array (perhaps by simulating the array with a pointer) because you have to be careful about allocating the memory that the returned pointer points to.
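
One common way of handling this, sketched here under the assumption that the caller is willing to free the result, is to allocate the array with malloc and return a pointer to its first element. The name make_squares is invented for the example.

```c
#include <stdlib.h>

/* Returns a newly allocated array holding 0, 1, 4, 9, ... (n values),
   or NULL if the allocation fails. The caller must free() it. */
int *make_squares(size_t n)
{
    size_t i;
    int *result = malloc(n * sizeof *result);

    if (result == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        result[i] = (int)(i * i);
    return result;
}
```

Returning a pointer to a local array instead would not work, because the local array disappears when the function returns.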

The functions we've written have all accepted a well-defined, fixed number of arguments. printf accepts a variable number of arguments (depending on how many % signs there are in the format string) but we haven't seen how to declare and write functions that do this.
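
For a first taste, here is a minimal sketch of such a function using the standard <stdarg.h> macros. The function sum_ints is hypothetical; its first argument tells it how many int arguments follow.

```c
#include <stdarg.h>

int sum_ints(int count, ...)
{
    va_list ap;
    int i, total = 0;

    va_start(ap, count);             /* begin after the last named parameter */
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);    /* fetch the next argument as an int */
    va_end(ap);
    return total;
}
```

The function has no way of checking the count or the types, so the caller must get both right.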

C Preprocessor

If you're careful, it's possible (and can be useful) to use #include within a header file, so that you end up with ``nested header files.''

It's possible to use #define to define ``function-like'' macros that accept arguments; the expansion of the macro can therefore depend on the arguments it's ``invoked'' with.
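
Two small examples, with macro names chosen only for illustration. The extra parentheses matter, because the arguments are substituted as text.

```c
/* Parenthesize each argument and the whole expansion, so that
   SQUARE(1 + 2) expands to ((1 + 2) * (1 + 2)) and not 1 + 2 * 1 + 2. */
#define SQUARE(x)  ((x) * (x))

/* Each argument may be evaluated more than once, so avoid
   arguments with side effects such as i++. */
#define MAX(a, b)  ((a) > (b) ? (a) : (b))
```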

Two special preprocessing operators # and ## let you control the expansion of macro arguments in fancier ways.
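
A brief sketch of the two operators; the macro names STRINGIFY and CONCAT are made up here. # turns a macro argument into a string literal, and ## pastes two tokens together into one.

```c
#define STRINGIFY(x)  #x      /* STRINGIFY(a + b) becomes "a + b" */
#define CONCAT(a, b)  a##b    /* CONCAT(total, _count) becomes total_count */
```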

The preprocessor directive #if lets you conditionally include (or, with #else, conditionally not include) a section of code depending on some arbitrary compile-time expression. (#if can also do the same macro-definedness tests as #ifdef and #ifndef, because the expression can use a defined() operator.)

Other preprocessing directives are #elif, #error, #line, and #pragma.

There are a few predefined preprocessor macros, some required by the C standard, others perhaps defined by particular compilation environments. These are useful for conditional compilation (#ifdef, #ifndef).

Standard Library Functions

C's standard library contains many features and functions which we haven't seen.

We've seen many of printf's formatting capabilities, but not all. Besides format specifier characters for a few types we haven't seen, you can also control the width, precision, justification (left or right) and a few other attributes of printf's format conversions. (In their full complexity, printf formats are about as elaborate and powerful as FORTRAN format statements.)
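
As a small sketch of width, precision, and justification, the hypothetical function format_row below builds one table row in a caller-supplied buffer. It uses snprintf, which assumes a C99 or later library.

```c
#include <stdio.h>

/* %-8s  : string, left-justified in a field 8 characters wide
   %8.2f : number, right-justified in 8 characters, 2 decimal places */
int format_row(char *out, size_t size, const char *name, double value)
{
    return snprintf(out, size, "%-8s|%8.2f|", name, value);
}
```

Called with the name "pi" and the value 3.14159, it produces the name padded on the right to 8 columns, a bar, the number 3.14 padded on the left to 8 columns, and a closing bar.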

A scanf function lets you do ``formatted input'' analogous to printf's formatted output. scanf reads from the standard input; a variant fscanf reads from a specified file pointer.

The sprintf and sscanf functions let you ``print'' and ``read'' to and from in-memory strings instead of files. We've seen that atoi lets you convert a numeric string into an integer; the inverse operation can be performed with sprintf:

        int i = 10;
        char str[10];
        sprintf(str, "%d", i);
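
Going the other way, sscanf can pick values out of a string. In this sketch the function parse_pair and the "number,number" input format are assumptions made for the example; the return value of sscanf says how many items were successfully converted.

```c
#include <stdio.h>

/* Parses text such as "3,4" into *a and *b.
   Returns 1 on success, 0 if the string does not match. */
int parse_pair(const char *s, int *a, int *b)
{
    return sscanf(s, "%d,%d", a, b) == 2;
}
```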

We've used printf and fprintf to write formatted output, and getchar, getc, putchar, and putc to read and write characters. There are also functions gets, fgets, puts, and fputs for reading and writing lines (though we rarely need these, especially if we're using our own getline and maybe fgetline), and also fread and fwrite for reading or writing arbitrary numbers of characters.

It's possible to ``un-read'' a character, that is, to push it back on an input stream, with ungetc. (This is useful if you accidentally read one character too far, and would prefer that some other part of your program read that character instead.)

You can use the ftell, fseek, and rewind functions to jump around in files, performing random access (as opposed to sequential) I/O.

The feof and ferror functions will tell you whether you got EOF due to an actual end-of-file condition or due to a read error of some sort. You can clear errors and end-of-file conditions with clearerr.

You can open files in ``binary'' mode, or for simultaneous reading and writing. (These options involve extra characters appended to fopen's mode string: b for binary, + for read/write.)

There are several more string functions in <string.h>. A second set of string functions strncpy, strncat, and strncmp all accept a third argument telling them to stop after n characters if they haven't found the \0 marking the end of the string. A third set of ``mem'' functions, including memcpy and memcmp, operate on blocks of memory which aren't necessarily strings and where \0 is not treated as a terminator. The strchr and strrchr functions find characters in strings. There is a motley collection of ``span'' and ``scan'' functions, strspn, strcspn, and strpbrk, for searching out or skipping over sequences of characters all drawn from a specified set of characters. The strtok function aids in breaking up a string into words or ``tokens,'' much like our own getwords function.
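
As one illustration, here is a rough sketch of strtok in action; count_tokens is a hypothetical helper. Note that strtok writes \0 characters into the string it scans, so it must be given a modifiable array, not a string literal.

```c
#include <string.h>

/* Counts the whitespace-separated words in s, modifying s in the process. */
int count_tokens(char *s)
{
    int count = 0;
    char *tok;

    /* The first call passes the string; later calls pass NULL to continue. */
    for (tok = strtok(s, " \t\n"); tok != NULL; tok = strtok(NULL, " \t\n"))
        count++;
    return count;
}
```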

The header file <ctype.h> contains several functions which let you classify and manipulate characters: check for letters or digits, convert between upper- and lower-case, etc.
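
A minimal sketch using toupper from <ctype.h>; the function upcase is made up for this example. The cast to unsigned char is a precaution, since these functions expect a value in the range of unsigned char (or EOF).

```c
#include <ctype.h>

/* Converts every lower-case letter in s to upper case, in place. */
void upcase(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```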

A host of mathematical functions are defined in the header file <math.h>. (As we've mentioned, besides including <math.h>, you may on some Unix systems have to ask for a special library containing the math functions while compiling/linking.)

There's a random-number generator, rand, and a way to ``seed'' it, srand. rand returns integers from 0 up to RAND_MAX (where RAND_MAX is a constant #defined in <stdlib.h>). One way of getting random integers from 1 to n is to call

        (int)(rand() / (RAND_MAX + 1.0) * n) + 1

Another way is

        rand() / (RAND_MAX / n + 1) + 1

It seems like it would be simpler to just say

        rand() % n + 1

but this method is imperfect (or rather, it's imperfect if n is a power of two and your system's implementation of rand() is imperfect, as all too many of them are).
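
Wrapped up as a function, the second of the expressions above might be sketched like this. The name random_1_to_n is hypothetical, and n is assumed to be at least 1 and no larger than RAND_MAX.

```c
#include <stdlib.h>

/* Returns a pseudo-random integer from 1 to n, using the high-order
   part of rand()'s result rather than its low-order bits. */
int random_1_to_n(int n)
{
    return rand() / (RAND_MAX / n + 1) + 1;
}
```

Calling srand once at the start of the program, for example with the current time, gives a different sequence on each run.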

Several functions let you interact with the operating system under which your program is running. The exit function returns control to the operating system immediately, terminating your program and returning an ``exit status.'' The getenv function allows you to read your operating system's or process's ``environment variables'' (if any). The system function allows you to invoke an operating-system command (i.e. another program) from within your program.

The qsort function allows you to sort an array (of any type); you supply a comparison function (via a function pointer) which knows how to compare two array elements, and qsort does the rest. The bsearch function allows you to search for elements in sorted arrays; it, too, operates in terms of a caller-supplied comparison function.
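
Here is a minimal sketch for arrays of int. The names compare_ints, sort_ints, and contains_int are invented for the example; qsort and bsearch are the standard functions from <stdlib.h>.

```c
#include <stdlib.h>

/* Comparison function: negative, zero, or positive, like strcmp. */
int compare_ints(const void *pa, const void *pb)
{
    int a = *(const int *)pa;
    int b = *(const int *)pb;

    return (a > b) - (a < b);   /* avoids the overflow that a - b could cause */
}

void sort_ints(int *values, size_t n)
{
    qsort(values, n, sizeof values[0], compare_ints);
}

/* The array must already be sorted in the order compare_ints defines. */
int contains_int(const int *sorted, size_t n, int key)
{
    return bsearch(&key, sorted, n, sizeof sorted[0], compare_ints) != NULL;
}
```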

Several functions--time, ctime, gmtime, localtime, asctime, mktime, difftime, and strftime--allow you to determine the current date and time, print dates and times, and perform other date/time manipulations. For example, to print today's date in a program, you can write

        #include <time.h>
 
        time_t now;
        now = time((time_t *)NULL);
        printf("It's %.24s", ctime(&now));

The header file <stdarg.h> lets you manipulate variable-length function argument lists (such as the ones printf is called with). Additional members of the printf family of functions let you write your own functions which accept printf-like format specifiers and variable numbers of arguments but call on the standard printf to do most of the work.
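
A rough sketch of such a function: format_error is a hypothetical name, and the "error: " prefix is just an example of something the wrapper adds itself. It hands the caller's arguments on to vsnprintf, which assumes a C99 or later library.

```c
#include <stdarg.h>
#include <stdio.h>

/* Writes "error: " followed by the formatted message into buf.
   Returns the number of characters the full message needs,
   or a negative value on an output error. */
int format_error(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int used, n;

    used = snprintf(buf, size, "error: ");
    if (used < 0 || (size_t)used >= size)
        return used;                 /* no room left for the message */

    va_start(ap, fmt);
    n = vsnprintf(buf + used, size - (size_t)used, fmt, ap);
    va_end(ap);
    return n < 0 ? n : used + n;
}
```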

There are facilities for dealing with multibyte and ``wide'' characters and strings, for use with multinational character sets.

Chapter 13: Reading the Command Line


We've mentioned several times that a program is rarely useful if it does exactly the same thing every time you run it. Another way of giving a program some variable input to work on is by invoking it with command line arguments.

(We should probably admit that command line user interfaces are a bit old-fashioned, and currently somewhat out of favor. If you've used Unix or MS-DOS, you know what a command line is, but if your experience is confined to the Macintosh or Microsoft Windows or some other Graphical User Interface, you may never have seen a command line. In fact, if you're learning C on a Mac or under Windows, it can be tricky to give your program a command line at all. Think C for the Macintosh provides a way; I'm not sure about other compilers. If your compilation environment doesn't provide an easy way of simulating an old-fashioned command line, you may skip this chapter.)

C's model of the command line is that it consists of a sequence of words, typically separated by whitespace. Your main program can receive these words as an array of strings, one word per string. In fact, the C run-time startup code is always willing to pass you this array, and all you have to do to receive it is to declare main as accepting two parameters, like this:

        int main(int argc, char *argv[])
        {
        ...
        }

When main is called, argc will be a count of the number of command-line arguments, and argv will be an array (``vector'') of the arguments themselves. Since each word is a string which is represented as a pointer-to-char, argv is an array-of-pointers-to-char. Since we are not defining the argv array, but merely declaring a parameter which references an array somewhere else (namely, in main's caller, the run-time startup code), we do not have to supply an array dimension for argv. (Actually, since functions never receive arrays as parameters in C, argv can also be thought of as a pointer-to-pointer-to-char, or char **. But multidimensional arrays and pointers to pointers can be confusing, and we haven't covered them, so we'll talk about argv as if it were an array.) (Also, there's nothing magic about the names argc and argv. You can give main's two parameters any names you like, as long as they have the appropriate types. The names argc and argv are traditional.)

The first program to write when playing with argc and argv is one which simply prints its arguments:

#include <stdio.h>

int main(int argc, char *argv[])
{
int i;
 
for(i = 0; i < argc; i++)
        printf("arg %d: %s\n", i, argv[i]);
return 0;
}

(This program is essentially the Unix or MS-DOS echo command.)

If you run this program, you'll discover that the set of ``words'' making up the command line includes the command you typed to invoke your program (that is, the name of your program). In other words, argv[0] typically points to the name of your program, and argv[1] is the first argument.

There are no hard-and-fast rules for how a program should interpret its command line. There is one set of conventions for Unix, another for MS-DOS, another for VMS. Typically you'll loop over the arguments, perhaps treating some as option flags and others as actual arguments (input files, etc.), interpreting or acting on each one. Since each argument is a string, you'll have to use strcmp or the like to match arguments against any patterns you might be looking for. Remember that argc contains the number of words on the command line, and that argv[0] is the command name, so if argc is 1, there are no arguments to inspect. (You'll never want to look at argv[i], for i >= argc, because it will be a null or invalid pointer.)
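Here is a minimal sketch of the kind of loop just described. The -v option and the function name count_operands are invented for illustration; they are not part of any convention mentioned above. The function walks the arguments, uses strcmp to recognize the flag, and counts everything else as an ordinary argument.

```c
#include <string.h>

/* Scan the command line for a hypothetical "-v" flag.
   Sets *verbose to 1 if the flag appears, 0 otherwise, and
   returns the number of arguments that were not the flag. */
int count_operands(int argc, char *argv[], int *verbose)
{
    int i;
    int noperands = 0;

    *verbose = 0;
    for (i = 1; i < argc; i++) {       /* start at 1: argv[0] is the command name */
        if (strcmp(argv[i], "-v") == 0)
            *verbose = 1;              /* an option flag */
        else
            noperands++;               /* an actual argument, such as a file name */
    }
    return noperands;
}
```

A real program would usually do something with each non-flag argument (open it as a file, say) rather than merely count it, but the shape of the loop would be the same.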

As another example, also illustrating fopen and the file I/O techniques of the previous chapter, here is a program which copies one or more input files to its standard output. Since ``standard output'' is usually the screen by default, this is therefore a useful program for displaying files. (It's analogous to the obscurely-named Unix cat command, and to the MS-DOS type command.) You might also want to compare this program to the character-copying program of section 6.2.

#include <stdio.h>
 
int main(int argc, char *argv[])
{
int i;
FILE *fp;
int c;
 
for(i = 1; i < argc; i++)
        {
        fp = fopen(argv[i], "r");
        if(fp == NULL)
               {
               fprintf(stderr, "cat: can't open %s\n", argv[i]);
               continue;
               }
 
        while((c = getc(fp)) != EOF)
               putchar(c);
 
        fclose(fp);
        }
 
return 0;
}

As a historical note, the Unix cat program is so named because it can be used to concatenate two files together, like this:

        cat a b > c

This illustrates why it's a good idea to print error messages to stderr, so that they don't get redirected. The ``can't open file'' message in this example also includes the name of the program as well as the name of the file.

Yet another piece of information which it's usually appropriate to include in error messages is the reason why the operation failed, if known. For operating system problems, such as inability to open a file, a code indicating the error is often stored in the global variable errno. The standard library function strerror will convert an errno value to a human-readable error message string. Therefore, an even more informative error message printout would be

        fp = fopen(argv[i], "r");
        if(fp == NULL)
               fprintf(stderr, "cat: can't open %s: %s\n",
                               argv[i], strerror(errno));

If you use code like this, you can #include <errno.h> to get the declaration for errno, and <string.h> to get the declaration for strerror().
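Putting the pieces together, the error report might be wrapped up as in the sketch below. The function name open_for_reading is made up for this example. Note also the assumption that fopen sets errno when it fails: this is true on Unix-like systems, but the C standard itself does not promise it, so on some systems the reason printed may not be meaningful.

```c
#include <stdio.h>
#include <string.h>
#include <errno.h>

/* Try to open the named file for reading. On failure, print a
   message to stderr that includes the reason, and return NULL. */
FILE *open_for_reading(const char *name)
{
    FILE *fp;

    errno = 0;                     /* clear any stale error code first */
    fp = fopen(name, "r");
    if (fp == NULL)
        fprintf(stderr, "can't open %s: %s\n", name, strerror(errno));
    return fp;
}
```

The caller still has to check for a null return before using the file pointer; the function only takes care of the message.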

Chapter 12: Input and Output

So far, we've been calling printf to print formatted output to the ``standard output'' (wherever that is). We've also been calling getchar to read single characters from the ``standard input,'' and putchar to write single characters to the standard output. ``Standard input'' and ``standard output'' are two predefined I/O streams which are implicitly available to us. In this chapter we'll learn how to take control of input and output by opening our own streams, perhaps connected to data files, which we can read from and write to.


12.1 File Pointers and fopen

How will we specify that we want to access a particular data file? It would theoretically be possible to mention the name of a file each time it was desired to read from or write to it. But such an approach would have a number of drawbacks. Instead, the usual approach (and the one taken in C's stdio library) is that you mention the name of the file once, at the time you open it. Thereafter, you use some little token--in this case, the file pointer--which keeps track (both for your sake and the library's) of which file you're talking about. Whenever you want to read from or write to one of the files you're working with, you identify that file by using its file pointer (that is, the file pointer you obtained when you opened the file). As we'll see, you store file pointers in variables just as you store any other data you manipulate, so it is possible to have several files open, as long as you use distinct variables to store the file pointers.

You declare a variable to store a file pointer like this:

        FILE *fp;

The type FILE is predefined for you by <stdio.h>. It is a data structure which holds the information the standard I/O library needs to keep track of the file for you. For historical reasons, you declare a variable which is a pointer to this FILE type. The name of the variable can (as for any variable) be anything you choose; it is traditional to use the letters fp in the variable name (since we're talking about a file pointer). If you were reading from two files at once you'd probably use two file pointers:

        FILE *fp1, *fp2;

If you were reading from one file and writing to another you might declare an input file pointer and an output file pointer:

        FILE *ifp, *ofp;

Like any pointer variable, a file pointer isn't any good until it's initialized to point to something. (Actually, no variable of any type is much good until you've initialized it.) To actually open a file, and receive the ``token'' which you'll store in your file pointer variable, you call fopen. fopen accepts a file name (as a string) and a mode value indicating among other things whether you intend to read or write this file. (The mode variable is also a string.) To open the file input.dat for reading you might call

        ifp = fopen("input.dat", "r");

The mode string "r" indicates reading. Mode "w" indicates writing, so we could open output.dat for output like this:

        ofp = fopen("output.dat", "w");

The other values for the mode string are less frequently used. The third major mode is "a" for append. (If you use "w" to write to a file which already exists, its old contents will be discarded.) You may also add a + character to the mode string to indicate that you want to both read and write, or a b character to indicate that you want to do ``binary'' (as opposed to text) I/O.
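To see the difference between "w" and "a", here is a small sketch of a function which adds one line to the end of a file each time it is called. The name append_line is invented for this example. If the mode were "w" instead, each call would discard whatever the previous calls had written.

```c
#include <stdio.h>

/* Append one line of text to the named file, creating the file
   if it does not exist yet. Returns 0 on success, -1 on failure. */
int append_line(const char *filename, const char *text)
{
    FILE *fp = fopen(filename, "a");   /* "a": existing contents are kept */

    if (fp == NULL)
        return -1;
    fprintf(fp, "%s\n", text);
    if (fclose(fp) == EOF)             /* fclose reports failure with EOF */
        return -1;
    return 0;
}
```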

One thing to beware of when opening files is that it's an operation which may fail. The requested file might not exist, or it might be protected against reading or writing. (These possibilities ought to be obvious, but it's easy to forget them.) fopen returns a null pointer if it can't open the requested file, and it's important to check for this case before going off and using fopen's return value as a file pointer. Every call to fopen will typically be followed with a test, like this:

        ifp = fopen("input.dat", "r");
        if(ifp == NULL)
               {
               printf("can't open file\n");
               /* exit or return */
               }

If fopen returns a null pointer, and you store it in your file pointer variable and go off and try to do I/O with it, your program will typically crash.

It's common to collapse the call to fopen and the assignment in with the test:

        if((ifp = fopen("input.dat", "r")) == NULL)
               {
               printf("can't open file\n");
               /* exit or return */
               }

You don't have to write these ``collapsed'' tests if you're not comfortable with them, but you'll see them in other people's code, so you should be able to read them.

12.2 I/O with File Pointers

For each of the I/O library functions we've been using so far, there's a companion function which accepts an additional file pointer argument telling it where to read from or write to. The companion function to printf is fprintf, and the file pointer argument comes first. To print a string to the output.dat file we opened in the previous section, we might call

        fprintf(ofp, "Hello, world!\n");

The companion function to getchar is getc, and the file pointer is its only argument. To read a character from the input.dat file we opened in the previous section, we might call

        int c;
        c = getc(ifp);

The companion function to putchar is putc, and the file pointer argument comes last. To write a character to output.dat, we could call

        putc(c, ofp);
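Combining getc and putc gives a loop which copies one open file to another, one character at a time. The sketch below is one way to package it; the function name copy_stream and the idea of returning a character count are choices made for this example only.

```c
#include <stdio.h>

/* Copy everything remaining on ifp to ofp, one character at a
   time. Returns the number of characters copied. */
long copy_stream(FILE *ifp, FILE *ofp)
{
    long nchars = 0;
    int c;                         /* int, not char, so EOF can be told apart */

    while ((c = getc(ifp)) != EOF) {
        putc(c, ofp);
        nchars++;
    }
    return nchars;
}
```

Both file pointers must already have been opened successfully by the caller, ifp for reading and ofp for writing.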

Our own getline function calls getchar and so always reads the standard input. We could write a companion fgetline function which reads from an arbitrary file pointer:

#include <stdio.h>
 
/* Read one line from fp, */
/* copying it to line array (but no more than max chars). */
/* Does not place terminating \n in line array. */
/* Returns line length, or 0 for empty line, or EOF for end-of-file. */
 
int fgetline(FILE *fp, char line[], int max)
{
int nch = 0;
int c;
max = max - 1;                 /* leave room for '\0' */
 
while((c = getc(fp)) != EOF)
        {
        if(c == '\n')
               break;
 
        if(nch < max)
               {
               line[nch] = c;
               nch = nch + 1;
               }
        }
 
if(c == EOF && nch == 0)
        return EOF;
 
line[nch] = '\0';
return nch;
}

Now we could read one line from ifp by calling

        char line[MAXLINE];
        ...
        fgetline(ifp, line, MAXLINE);

12.3 Predefined Streams

Besides the file pointers which we explicitly open by calling fopen, there are also three predefined streams. stdin is a constant file pointer corresponding to standard input, and stdout is a constant file pointer corresponding to standard output. Both of these can be used anywhere a file pointer is called for; for example, getchar() is the same as getc(stdin) and putchar(c) is the same as putc(c, stdout). The third predefined stream is stderr. Like stdout, stderr is typically connected to the screen by default. The difference is that stderr is not redirected when the standard output is redirected. For example, under Unix or MS-DOS, when you invoke

        program > filename

anything printed to stdout is redirected to the file filename, but anything printed to stderr still goes to the screen. The intent behind stderr is that it is the ``standard error output''; error messages printed to it will not disappear into an output file. For example, a more realistic way to print an error message when a file can't be opened would be

        if((ifp = fopen(filename, "r")) == NULL)
               {
               fprintf(stderr, "can't open file %s\n", filename);
               /* exit or return */
               }

where filename is a string variable indicating the file name to be opened. Not only is the error message printed to stderr, but it is also more informative in that it mentions the name of the file that couldn't be opened. (We'll see another example in the next chapter.)

12.4 Closing Files

Although you can open multiple files, there's a limit to how many you can have open at once. If your program will open many files in succession, you'll want to close each one as you're done with it; otherwise the standard I/O library could run out of the resources it uses to keep track of open files. Closing a file simply involves calling fclose with the file pointer as its argument:

        fclose(fp);

Calling fclose arranges that (if the file was open for output) any last, buffered output is finally written to the file, and that those resources used by the operating system (and the C library) for this file are released. If you forget to close a file, it will be closed automatically when the program exits.
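Because fclose may be the moment when the last buffered output actually gets written, it can fail too, and it returns EOF when it does. A careful program writing an important file might check for this, along the lines of the following sketch. The function name close_checked is invented here, and most small programs do not bother with the check.

```c
#include <stdio.h>

/* Close fp, printing a message to stderr if the close fails.
   The name argument is used only in the message.
   Returns 0 on success, -1 on failure. */
int close_checked(FILE *fp, const char *name)
{
    if (fclose(fp) == EOF) {
        fprintf(stderr, "error closing %s\n", name);
        return -1;
    }
    return 0;
}
```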

12.5 Example: Reading a Data File

Suppose you had a data file consisting of rows and columns of numbers:

        1       2       34
        5       6       78
        9       10      112

Suppose you wanted to read these numbers into an array. (Actually, the array will be an array of arrays, or a ``multidimensional'' array; see section 4.1.2.) We can write code to do this by putting together several pieces: the fgetline function we just showed, and the getwords function from chapter 10. Assuming that the data file is named input.dat, the code would look like this:

#define MAXLINE 100
#define MAXROWS 10
#define MAXCOLS 10
 
int array[MAXROWS][MAXCOLS];
char *filename = "input.dat";
FILE *ifp;
char line[MAXLINE];
char *words[MAXCOLS];
int nrows = 0;
int n;
int i;
 
ifp = fopen(filename, "r");
if(ifp == NULL)
        {
        fprintf(stderr, "can't open %s\n", filename);
        exit(EXIT_FAILURE);
        }
 
while(fgetline(ifp, line, MAXLINE) != EOF)
        {
        if(nrows >= MAXROWS)
               {
               fprintf(stderr, "too many rows\n");
               exit(EXIT_FAILURE);
               }
 
        n = getwords(line, words, MAXCOLS);
 
        for(i = 0; i < n; i++)
               array[nrows][i] = atoi(words[i]);
        nrows++;
        }

Each trip through the loop reads one line from the file, using fgetline. Each line is broken up into ``words'' using getwords; each ``word'' is actually one number. The numbers are however still represented as strings, so each one is converted to an int by calling atoi before being stored in the array. The code checks for two different error conditions (failure to open the input file, and too many lines in the input file) and if one of these conditions occurs, it prints an error message, and exits. The exit function is a standard library function which terminates your program. It is declared in <stdlib.h>, and accepts one argument, which will be the exit status of the program. EXIT_FAILURE is a code, also defined by <stdlib.h>, which indicates that the program failed. Success is indicated by a code of EXIT_SUCCESS, or simply 0. (These values can also be returned from main(); calling exit with a particular status value is essentially equivalent to returning that same status value from main.)
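If you don't have the getwords function handy, the splitting and converting of one line can be sketched with the standard library function strtol, which skips leading whitespace, converts one number, and reports where it stopped. This is a stand-in under that assumption, not the getwords-and-atoi approach used above, and the name parse_row is made up for the example.

```c
#include <stdlib.h>

/* Convert the whitespace-separated integers in line, storing at
   most maxcols of them in row. Stops at the first thing that is
   not a number. Returns how many numbers were stored. */
int parse_row(const char *line, int row[], int maxcols)
{
    int n = 0;
    char *end;

    while (n < maxcols) {
        long value = strtol(line, &end, 10);
        if (end == line)           /* nothing converted: no more numbers */
            break;
        row[n] = (int)value;
        n = n + 1;
        line = end;                /* continue just past the number we read */
    }
    return n;
}
```

Inside the loop above, a call such as parse_row(line, array[nrows], MAXCOLS) could then take the place of the getwords call and the for loop which follows it.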