Tuesday, March 27, 2012

Chapter 14: What's Next?

Chapter 14: What's Next?

This last handout contains a brief list of the significant topics in C which we have not covered, and which you'll want to investigate further if you want to know all of C.

Types and Declarations

We have not talked about the void, short int, and long double types. void is a type with no values, used as a placeholder to indicate functions that do not return values or that accept no arguments, and in the ``generic'' pointer type void * that can point to anything. short int is an integer type that might use less space than a plain int; long double is a floating-point type that might have even more range or precision than plain double.

The char type and the various sizes of int also have ``unsigned'' versions, which are declared using the keyword unsigned. Unsigned types cannot hold negative values but have guaranteed properties on overflow. (Whether a plain char is signed or unsigned is implementation-defined; you can use the keyword signed to force a character type to contain signed characters.) Unsigned types are also useful when manipulating individual bits and bytes, when ``sign extension'' might otherwise be a problem.

Two additional type qualifiers const and volatile allow you to declare variables (or pointers to data) which you promise not to change, or which might change in unexpected ways behind the program's back.

There are user-defined structure and union types. A structure or struct is a ``record'' consisting of one or more values of one or more types concreted together into one entity which can be manipulated as a whole. A union is a type which, at any one time, can hold a value from one of a specified set of types.

There are user-defined enumeration types (``enum'') which are like integers but which always contain values from some fixed, predefined set, and for which the values are referred to by name instead of by number.

Pointers can point to functions as well as to data types.

Types can be arbitrarily complicated, when you start using multiple levels of pointers, arrays, functions, structures, and/or unions. Eventually, it's important to understand the concept of a declarator: in the declaration

        int i, *ip, *fpi();

we have the base type int and three declarators i, *ip, and *fpi(). The declarator gives the name of a variable (or function) and also indicates whether it is a simple variable or a pointer, array, function, or some more elaborate combination (array of pointers, function returning pointer, etc.). In the example, i is declared to be a plain int, ip is declared to be a pointer to int, and fpi is declared to be a function returning pointer to int. (Complicated declarators may also contain parentheses for grouping, since there's a precedence hierarchy in declarators as well as expressions: [] for arrays and () for functions have higher precedence than * for pointers.)

We have not said much about pointers to pointers, or arrays of arrays (i.e. multidimensional arrays), or the ramifications of array/pointer equivalence on multidimensional arrays. (In particular, a reference to an array of arrays does not generate a pointer to a pointer; it generates a pointer to an array. You cannot pass a multidimensional array to a function which accepts pointers to pointers.)

Variables can be declared with a hint that they be placed in high-speed CPU registers, for efficiency. (These hints are rarely needed or used today, because modern compilers do a good job of register allocation by themselves, without hints.)

A mechanism called typedef allows you to define user-defined aliases (i.e. new and perhaps more-convenient names) for other types.

Operators

The bitwise operators &, |, ^, and ~ operate on integers thought of as binary numbers or strings of bits. The & operator is bitwise AND, the | operator is bitwise OR, the ^ operator is bitwise exclusive-OR (XOR), and the ~ operator is a bitwise negation or complement. (&, |, and ^ are ``binary'' in that they take two operands; ~ is unary.) These operators let you work with the individual bits of a variable; one common use is to treat an integer as a set of single-bit flags. You might define the 3rd (2**2) bit as the ``verbose'' flag bit by defining

        #define VERBOSE 4

Then you can ``turn the verbose bit on'' in an integer variable flags by executing

        flags = flags | VERBOSE;
or
        flags |= VERBOSE;

and turn it off with

        flags = flags & ~VERBOSE;
or
        flags &= ~VERBOSE;

and test whether it's set with

        if(flags & VERBOSE)

The left-shift and right-shift operators << and >> let you shift an integer left or right by some number of bit positions; for example, value << 2 shifts value left by two bits.

The ?: or conditional operator (also called the ``ternary operator'') essentially lets you embed an if/then statement in an expression. The assignment

        a = expr ? b : c;

is roughly equivalent to

        if(expr)
               a = b;
        else    a = c;

Since you can use ?: anywhere in an expression, it can do things that if/then can't, or that would be cumbersome with if/then. For example, the function call

        f(a, b, c ? d : e);

is roughly equivalent to

        if(c)
               f(a, b, d);
        else    f(a, b, e);

(Exercise: what would the call

        g(a, b, c ? d : e, h ? i : j, k);

be equivalent to?)

The comma operator lets you put two separate expressions where one is required; the expressions are executed one after the other. The most common use for comma operators is when you want multiple variables controlling a for loop, for example:

        for(i = 0, j = 10; i < j; i++, j--)

A cast operator allows you to explicitly force conversion of a value from one type to another. A cast consists of a type name in parentheses. For example, you could convert an int to a double by typing

        int i = 10;
        double d;
        d = (double)i;

(In this case, though, the cast is redundant, since this is a conversion that C would have performed for you automatically, i.e. if you'd just said d = i .) You use explicit casts in those circumstances where C does not do a needed conversion automatically. One example is division: if you're dividing two integers and you want a floating-point result, you must explicitly force at least one of the operands to floating-point, otherwise C will perform an integer division and will discard the remainder. The code

        int i = 1, j = 2;
        double d = i / j;

will set d to 0, but

        d = (double)i / j;

will set d to 0.5. You can also ``cast to void'' to explicitly indicate that you're ignoring a function's return value, as in

        (void)fclose(fp);

or

        (void)printf("Hello, world!\n");

(Usually, it's a bad idea to ignore return values, but in some cases it's essentially inevitable, and the (void) cast keeps some compilers from issuing warnings every time you ignore a value.)

There's a precise, mildly elaborate set of rules which C uses for converting values automatically, in the absence of explicit casts.

The . and -> operators let you access the members (components) of structures and unions.

Statements

The switch statement allows you to jump to one of a number of numeric case labels depending on the value of an expression; it's more convenient than a long if/else chain. (However, you can use switch only when the expression is integral and all of the case labels are compile-time constants.)

The do/while loop is a loop that tests its controlling expression at the bottom of the loop, so that the body of the loop always executes once even if the condition is initially false. (C's do/while loop is therefore like Pascal's repeat/until loop, while C's while loop is like Pascal's while/do loop.)

Finally, when you really need to write ``spaghetti code,'' C does have the all-purpose goto statement, and labels to go to.

Functions

Functions can't return arrays, and it's tricky to write a function as if it returns an array (perhaps by simulating the array with a pointer) because you have to be careful about allocating the memory that the returned pointer points to.

The functions we've written have all accepted a well-defined, fixed number of arguments. printf accepts a variable number of arguments (depending on how many % signs there are in the format string) but we haven't seen how to declare and write functions that do this.

C Preprocessor

If you're careful, it's possible (and can be useful) to use #include within a header file, so that you end up with ``nested header files.''

It's possible to use #define to define ``function-like'' macros that accept arguments; the expansion of the macro can therefore depend on the arguments it's ``invoked'' with.

Two special preprocessing operators # and ## let you control the expansion of macro arguments in fancier ways.

The preprocessor directive #if lets you conditionally include (or, with #else, conditionally not include) a section of code depending on some arbitrary compile-time expression. (#if can also do the same macro-definedness tests as #ifdef and #ifndef, because the expression can use a defined() operator.)

Other preprocessing directives are #elif, #error, #line, and #pragma.

There are a few predefined preprocessor macros, some required by the C standard, others perhaps defined by particular compilation environments. These are useful for conditional compilation (#ifdef, #ifndef).

Standard Library Functions

C's standard library contains many features and functions which we haven't seen.

We've seen many of printf's formatting capabilities, but not all. Besides format specifier characters for a few types we haven't seen, you can also control the width, precision, justification (left or right) and a few other attributes of printf's format conversions. (In their full complexity, printf formats are about as elaborate and powerful as FORTRAN format statements.)

A scanf function lets you do ``formatted input'' analogous to printf's formatted output. scanf reads from the standard input; a variant fscanf reads from a specified file pointer.

The sprintf and sscanf functions let you ``print'' and ``read'' to and from in-memory strings instead of files. We've seen that atoi lets you convert a numeric string into an integer; the inverse operation can be performed with sprintf:

        int i = 10;
        char str[10];
        sprintf(str, "%d", i);

We've used printf and fprintf to write formatted output, and getchar, getc, putchar, and putc to read and write characters. There are also functions gets, fgets, puts, and fputs for reading and writing lines (though we rarely need these, especially if we're using our own getline and maybe fgetline), and also fread and fwrite for reading or writing arbitrary numbers of characters.

It's possible to ``un-read'' a character, that is, to push it back on an input stream, with ungetc. (This is useful if you accidentally read one character too far, and would prefer that some other part of your program read that character instead.)

You can use the ftell, fseek, and rewind functions to jump around in files, performing random access (as opposed to sequential) I/O.

The feof and ferror functions will tell you whether you got EOF due to an actual end-of-file condition or due to a read error of some sort. You can clear errors and end-of-file conditions with clearerr.

You can open files in ``binary'' mode, or for simultaneous reading and writing. (These options involve extra characters appended to fopen's mode string: b for binary, + for read/write.)

There are several more string functions in . A second set of string functions strncpy, strncat, and strncmp all accept a third argument telling them to stop after n characters if they haven't found the \0 marking the end of the string. A third set of ``mem'' functions, including memcpy and memcmp, operate on blocks of memory which aren't necessarily strings and where \0 is not treated as a terminator. The strchr and strrchr functions find characters in strings. There is a motley collection of ``span'' and ``scan'' functions, strspn, strcspn, and strpbrk, for searching out or skipping over sequences of characters all drawn from a specified set of characters. The strtok function aids in breaking up a string into words or ``tokens,'' much like our own getwords function.

The header file contains several functions which let you classify and manipulate characters: check for letters or digits, convert between upper- and lower-case, etc.

A host of mathematical functions are defined in the header file . (As we've mentioned, besides including , you may on some Unix systems have to ask for a special library containing the math functions while compiling/linking.)

There's a random-number generator, rand, and a way to ``seed'' it, srand. rand returns integers from 0 up to RAND_MAX (where RAND_MAX is a constant #defined in ). One way of getting random integers from 1 to n is to call

        (int)(rand() / (RAND_MAX + 1.0) * n) + 1

Another way is

        rand() / (RAND_MAX / n + 1) + 1

It seems like it would be simpler to just say

        rand() % n + 1

but this method is imperfect (or rather, it's imperfect if n is a power of two and your system's implementation of rand() is imperfect, as all too many of them are).

Several functions let you interact with the operating system under which your program is running. The exit function returns control to the operating system immediately, terminating your program and returning an ``exit status.'' The getenv function allows you to read your operating system's or process's ``environment variables'' (if any). The system function allows you to invoke an operating-system command (i.e. another program) from within your program.

The qsort function allows you to sort an array (of any type); you supply a comparison function (via a function pointer) which knows how to compare two array elements, and qsort does the rest. The bsearch function allows you to search for elements in sorted arrays; it, too, operates in terms of a caller-supplied comparison function.

Several functions--time, asctime, gmtime, localtime, asctime, mktime, difftime, and strftime--allow you to determine the current date and time, print dates and times, and perform other date/time manipulations. For example, to print today's date in a program, you can write

        #include 
 
        time_t now;
        now = time((time_t *)NULL);
        printf("It's %.24s", ctime(&now));

The header file lets you manipulate variable-length function argument lists (such as the ones printf is called with). Additional members of the printf family of functions let you write your own functions which accept printf-like format specifiers and variable numbers of arguments but call on the standard printf to do most of the work.

There are facilities for dealing with multibyte and ``wide'' characters and strings, for use with multinational character sets.

No comments:

Post a Comment