Second edition. I'm looking at their hash table example in section 6.6. I found the full source transcribed here. This is the part I'm puzzling over:
struct nlist *np;
if ((np = lookup(name)) == NULL) {
    np = (struct nlist *) malloc(sizeof(*np));
Why the cast to (struct nlist *) on the last line? I can remove it without getting any compiler warnings.
I'm similarly confused by
free((void *)np->def);
Are these intended to aid readability somehow?
Casting the result of malloc was necessary in some pre-ANSI dialects of C, and the usage was retained in K&R2 (which covers the language as of the 1989 ANSI C standard).
This is mentioned in the errata list for the book. I've seen it via Dennis Ritchie's home page, which isn't currently available (I hope AT&T hasn't permanently removed it), but I found it elsewhere:
142(§6.5, toward the end): The remark about casting the return value of malloc ("the proper method is to declare ... then explicitly coerce") needs to be rewritten. The example is correct and works, but the advice is debatable in the context of the 1988-1989 ANSI/ISO standards. It's not necessary (given that coercion of void * to ALMOSTANYTYPE * is automatic), and possibly harmful if malloc, or a proxy for it, fails to be declared as returning void *. The explicit cast can cover up an unintended error. On the other hand, pre-ANSI, the cast was necessary, and it is in C++ also.
Despite the opinion of the legions of posters here who will immediately jump on any code with an unnecessary (but harmless) cast of malloc(), the truth is that it just doesn't matter. Yes, assignment to and from void * does not require casting, but nor is it forbidden, and the arguments for leaving it in or taking it out really aren't that strong.
There are more important things to spend brain cells on. It just doesn't matter.
For that example to be completely correct today, you have to put
#include <stdlib.h>
so you get the proper prototype for malloc(3). In this case it doesn't matter whether you cast or not, as malloc is declared there as returning void *, and there is no need to cast from that type to another pointer type (but you can if you desire).
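To illustrate, here is a minimal sketch of the K&R fragment with the header in place (lookup() and struct nlist as in the book):

#include <stdlib.h>   /* declares void *malloc(size_t) */

struct nlist *np;
if ((np = lookup(name)) == NULL) {
    np = malloc(sizeof(*np));   /* no cast: void * converts implicitly */
}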
Today, it's better not to do the cast, because the cast can hide a more-than-frequent error. If you do the cast and don't provide a prototype for malloc, the compiler assumes the implicit declaration int malloc(); (returning an int instead of a void *, and taking an unspecified number of arguments), and you have stated explicitly that you want that int converted to a pointer. The compiler will call malloc, take the supposed int result (say a 32-bit value, not the actual 64-bit pointer malloc returned; depending on the architecture's calling conventions the two values may or may not be related, but the int space is always smaller than the pointer space), blindly convert it to the type named in the cast (without a warning, since you asked for the cast explicitly), and pad in the missing 32 bits to make a full 64-bit pointer. This is truly dangerous whenever integer types are not the same size as pointer types; you can check this on 64-bit platforms, where they differ, or in old MS-DOS compilers with large memory models, where they also differ. And it hides the real problem, which is that you did not provide the proper #include. You will be lucky if it works, since that would mean every virtual address malloc returned happened to fall below the 0x100000000 limit (that's on 64-bit Intel; good luck on 64-bit big-endian architectures). This is the actual source of the undefined behaviour you should expect.
Normally, with modern compilers, you'll probably get some kind of warning for calling a function with no declaration (if you provided no prototype), but the compiler will still compile the program and generate an executable, probably not the one you want. This is one of the main differences between C and C++: C++ doesn't allow you to compile a call to a function for which no prototype has been declared, so you get an error, instead of a possible warning, for the invalid malloc described above.
This kind of error is very difficult to track down, and that is exactly why the people who advocate against the cast do so.
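A minimal sketch of that failure mode (assuming a pre-C99 compiler that accepts implicit declarations, on an LP64 platform where int is 32 bits and pointers are 64 bits):

/* <stdlib.h> deliberately missing */
int main(void)
{
    /* With no prototype, a C89 compiler assumes int malloc();
       the cast converts that (possibly truncated) int to a pointer
       and silences the diagnostic that would have exposed the
       missing include. */
    char *buf = (char *) malloc(1024);
    buf[0] = 'x';   /* may crash if the upper pointer bits were lost */
    return 0;
}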
The cast is obsolete. void * can be assigned directly to any object pointer type, and you actually should do it that way. K&R is a bit outdated in some respects, and you should definitely get something more recent that covers the newer standards (C99 upwards).
See n1570: 6.3.2.3/1.
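That is the paragraph that makes the conversion implicit in both directions; a short sketch:

#include <stdlib.h>

int main(void)
{
    void *v = malloc(10);
    int *p = v;    /* void * converts to any object pointer type, no cast */
    void *w = p;   /* and back again, also implicitly */
    free(w);
    return 0;
}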
#include <stdio.h>

int main()
{
    const int a = 1;
    int *p = (int *)&a;
    (*p)++;
    printf("%d %d\n", *p, a);
    if (a == 1)
        printf("No\n");  // "No" in g++.
    else
        printf("Yes\n"); // "Yes" in gcc.
    return 0;
}
The above code prints No when compiled with g++ and Yes when compiled with gcc. Can anybody explain the reason behind this?
Your code triggers undefined behaviour because you are modifying a const object (a). It doesn't have to produce any particular result, not even on the same platform, with the same compiler.
Although the exact mechanism for this behaviour isn't specified, you may be able to figure out what is happening in your particular case by examining the assembly the compiler produces (pass the -S flag to see it). Note that compilers are allowed to make aggressive optimizations by assuming the code has well-defined behaviour; for instance, a could simply be replaced by 1 wherever it is used.
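As a hypothetical sketch (not actual compiler output), the g++ build may effectively reduce to something like this after constant-folding a:

#include <stdio.h>

int main(void)
{
    int cell = 1;              /* the storage behind 'a' */
    int *p = &cell;
    (*p)++;                    /* the memory cell now holds 2 */
    printf("%d %d\n", *p, 1);  /* every read of 'a' was folded to 1 */
    printf("No\n");            /* if (a == 1) was folded to true */
    return 0;
}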
From the C++ Standard (1.9 Program execution)
4 Certain other operations are described in this International
Standard as undefined (for example, the effect of attempting to
modify a const object). [ Note: This International Standard imposes
no requirements on the behavior of programs that contain undefined
behavior. —end note ]
Thus your program has undefined behaviour.
In your code, notice the following two lines:
const int a=1; // a is of type constant int
int *p=(int *)&a; // p is of type int *
You are storing the address of a const int in an int * and then trying to modify, through that pointer, a value which should be treated as const. This is not allowed and invokes undefined behaviour.
For your reference, as stated in the C11 standard, §6.7.3 paragraph 6:
If an attempt is made to modify an object defined with a const-qualified type through use
of an lvalue with non-const-qualified type, the behavior is undefined. If an attempt is
made to refer to an object defined with a volatile-qualified type through use of an lvalue
with non-volatile-qualified type, the behavior is undefined.
So, to cut the long story short, you cannot rely on the outputs for comparison; they are the result of undefined behaviour.
Okay, we have here 'identical' code passed to 'the same' compiler, but once with a C flag and the other time with a C++ flag. As far as any reasonable user is concerned nothing has changed, so the code should be interpreted identically by the compiler because nothing significant has happened.
Actually, that's not true. While I would be hard pressed to point to it in a standard, the precise interpretation of 'const' differs slightly between C and C++. In C it's very much an add-on: the 'const' flag says that this normal variable 'a' should not be written to by the code around here, but there is a possibility that it will be written to elsewhere. In C++ the emphasis is much more on the immutable-constant concept, and the compiler knows that such a constant is more akin to an 'enum' than to a normal variable.
So I expect this slight difference means that slightly different parse trees are generated, which eventually leads to different assembler.
This sort of thing is actually fairly common: code that's in the C/C++ subset does not always compile to exactly the same assembler, even with 'the same' compiler. It tends to be caused by other language features meaning that there are some things you can't prove about the code right now in one of the languages, but that are fine in the other.
Usually C is the performance winner (as was re-discovered by the Linux kernel devs) because it's a simpler language, but in this example C++ would probably turn out faster (unless the C dev switches to a macro or an enum, and thereby catches the unreasonable act of taking the address of an immutable constant).
Why is it sensible for a language to allow implicit declarations of functions and typeless variables? I get that C is old, but allowing declarations to be omitted, with functions defaulting to int() and variables to int, doesn't seem so sane to me, even for back then.
So, why was it originally introduced? Was it ever really useful? Is it actually (still) used?
Note: I realise that modern compilers give you warnings (depending on which flags you pass them), and you can suppress this feature. That's not the question!
Example:
int main() {
    static bar = 7;  // defaults to "int bar"
    return foo(bar); // defaults to a "int foo()"
}

int foo(int i) {
    return i;
}
See Dennis Ritchie's "The Development of the C Language": http://web.archive.org/web/20080902003601/http://cm.bell-labs.com/who/dmr/chist.html
For instance,
In contrast to the pervasive syntax variation that occurred during the
creation of B, the core semantic content of BCPL—its type structure
and expression evaluation rules—remained intact. Both languages are
typeless, or rather have a single data type, the 'word', or 'cell', a
fixed-length bit pattern. Memory in these languages consists of a
linear array of such cells, and the meaning of the contents of a cell
depends on the operation applied. The + operator, for example, simply
adds its operands using the machine's integer add instruction, and the
other arithmetic operations are equally unconscious of the actual
meaning of their operands. Because memory is a linear array, it is
possible to interpret the value in a cell as an index in this array,
and BCPL supplies an operator for this purpose. In the original
language it was spelled rv, and later !, while B uses the unary *.
Thus, if p is a cell containing the index of (or address of, or
pointer to) another cell, *p refers to the contents of the pointed-to
cell, either as a value in an expression or as the target of an
assignment.
This typelessness persisted in C until the authors started porting it to machines with different word lengths:
The language changes during this period, especially around 1977, were largely focused on considerations of portability and type safety,
in an effort to cope with the problems we foresaw and observed in
moving a considerable body of code to the new Interdata platform. C at
that time still manifested strong signs of its typeless origins.
Pointers, for example, were barely distinguished from integral memory
indices in early language manuals or extant code; the similarity of
the arithmetic properties of character pointers and unsigned integers
made it hard to resist the temptation to identify them. The unsigned
types were added to make unsigned arithmetic available without
confusing it with pointer manipulation. Similarly, the early language
condoned assignments between integers and pointers, but this practice
began to be discouraged; a notation for type conversions (called
`casts' from the example of Algol 68) was invented to specify type
conversions more explicitly. Beguiled by the example of PL/I, early C
did not tie structure pointers firmly to the structures they pointed
to, and permitted programmers to write pointer->member almost without
regard to the type of pointer; such an expression was taken
uncritically as a reference to a region of memory designated by the
pointer, while the member name specified only an offset and a type.
Programming languages evolve as programming practices change. In modern C and the modern programming environment, where many programmers have never written assembly language, the notion that ints and pointers are interchangeable may seem nearly unfathomable and unjustifiable.
It's the usual story — hysterical raisins (aka 'historical reasons').
In the beginning, the big computers that C ran on (DEC PDP-11) had 64 KiB for data and code (later 64 KiB for each). There was a limit to how complex you could make the compiler and still have it run. Indeed, there was scepticism that you could write an O/S using a high-level language such as C, rather than needing to use assembler. So, there were size constraints. Also, we are talking a long time ago, in the early to mid 1970s. Computing in general was not as mature a discipline as it is now (and compilers specifically were much less well understood). Also, the languages from which C was derived (B and BCPL) were typeless. All these were factors.
The language has evolved since then (thank goodness). As has been extensively noted in comments and down-voted answers, in strict C99, implicit int for variables and implicit function declarations have both been made obsolete. However, most compilers still recognize the old syntax and permit its use, with more or less warnings, to retain backwards compatibility, so that old source code continues to compile and run as it always did. C89 largely standardized the language as it was, warts (gets()) and all. This was necessary to make the C89 standard acceptable.
There is still old code around using the old notations — I spend quite a lot of time working on an ancient code base (circa 1982 for the oldest parts) which still hasn't been fully converted to prototypes everywhere (and that annoys me intensely, but there's only so much one person can do on a code base with multiple millions of lines of code). Very little of it still has 'implicit int' for variables; there are too many places where functions are not declared before use, and a few places where the return type of a function is still implicitly int. If you don't have to work with such messes, be grateful to those who have gone before you.
Probably the best explanation for "why" comes from Ritchie's "The Development of the C Language", cited above:
Two ideas are most characteristic of C among languages of its class: the relationship between arrays and pointers, and the way in which declaration syntax mimics expression syntax. They are also among its most frequently criticized features, and often serve as stumbling blocks to the beginner. In both cases, historical accidents or mistakes have exacerbated their difficulty. The most important of these has been the tolerance of C compilers to errors in type. As should be clear from the history above, C evolved from typeless languages. It did not suddenly appear to its earliest users and developers as an entirely new language with its own rules; instead we continually had to adapt existing programs as the language developed, and make allowance for an existing body of code. (Later, the ANSI X3J11 committee standardizing C would face the same problem.)
Systems programming languages don't necessarily need types; you're mucking around with bytes and words, not floats and ints and structs and strings. The type system was grafted onto it in bits and pieces, rather than being part of the language from the very beginning. As C has moved from being primarily a systems programming language to a general-purpose programming language, it has become more rigorous in how it handles types. But, even though paradigms come and go, legacy code is forever. There's still a lot of code out there that relies on that implicit int, and the standards committee is reluctant to break anything that's working. That's why it took almost 30 years to get rid of it.
A long, long time ago, back in the K&R, pre-ANSI days, functions looked quite different than they do today.
add_numbers(x, y)
{
    return x + y;
}

int ansi_add_numbers(int x, int y); // modern, ANSI C
When you call a function like add_numbers, there is an important difference in the calling conventions: all types are "promoted" when the function is called. So if you do this:
// no prototype for add_numbers
short x = 3;
short y = 5;
short z = add_numbers(x, y);
What happens is x is promoted to int, y is promoted to int, and the return type is assumed to be int by default. Likewise, if you pass a float it is promoted to double. These rules ensured that prototypes weren't necessary, as long as you got the right return type, and as long as you passed the right number and type of arguments.
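For instance, here is a sketch of the float case (print_val is a hypothetical name; K&R-style definitions still compile with pre-C23 compilers):

#include <stdio.h>

/* K&R-style definition: the parameter is declared double because
   that is what every unprototyped call actually passes. */
print_val(x)
double x;
{
    printf("%f\n", x);
    return 0;
}

int main()
{
    float f = 1.5f;
    print_val(f);   /* f is promoted to double by the default promotions */
    return 0;
}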
Note that the syntax for prototypes is different:
// K&R style function
// number of parameters is UNKNOWN, but fixed
// return type is known (int is default)
add_numbers();
// ANSI style function
// number of parameters is known, types are fixed
// return type is known
int ansi_add_numbers(int x, int y);
A common practice back in the old days was to avoid header files for the most part, and just stick the prototypes directly in your code:
void *malloc();
char *buf = malloc(1024);
if (!buf) abort();
Header files are accepted as a necessary evil in C these days, but just as modern C derivatives (Java, C#, etc.) have gotten rid of header files, old-timers didn't really like using header files either.
Type safety
From what I understand about the old old days of pre-C, there wasn't always much of a static typing system. Everything was an int, including pointers. In this old language, the only point of function prototypes would be to catch arity errors.
So if we hypothesize that functions were added to the language first, and then a static type system was added later, this theory explains why prototypes are optional. This theory also explains why arrays decay to pointers when used as function arguments -- since in this proto-C, arrays were nothing more than pointers which get automatically initialized to point to some space on the stack. For example, something like the following may have been possible:
function()
{
    auto x[7];
    x += 1;
}
Citations
The Development of the C Language, Dennis M. Ritchie
On typelessness:
Both languages [B and BCPL] are typeless, or rather have a single data type, the 'word,' or 'cell,' a fixed-length bit pattern.
On the equivalence of integers and pointers:
Thus, if p is a cell containing the index of (or address of, or pointer to) another cell, *p refers to the contents of the pointed-to cell, either as a value in an expression or as the target of an assignment.
Evidence for the theory that prototypes were omitted due to size constraints:
During development, he continually struggled against memory limitations: each language addition inflated the compiler so it could barely fit, but each rewrite taking advantage of the feature reduced its size.
Some food for thought. (It's not an answer; we actually know the answer — it's permitted for backward compatibility.)
And people should look at COBOL code bases or Fortran 66 libraries before asking why it hasn't all been cleaned up in 30 years or so!
gcc with its defaults does not spit out any warnings. With -Wall, or with -std=c99, gcc does spit out the correct thing:
main.c:2: warning: type defaults to ‘int’ in declaration of ‘bar’
main.c:3: warning: implicit declaration of function ‘foo’
The lint functionality built into modern gcc is showing its colors.
Interestingly the modern clone of lint, the secure lint — I mean splint — gives only one warning by default.
main.c:3:10: Unrecognized identifier: foo
Identifier used in code has not been declared. (Use -unrecog to inhibit
warning)
The LLVM C compiler, clang, which like gcc has a static analyser built in, spits out both warnings by default:
main.c:2:10: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
static bar = 7; // defaults to "int bar"
~~~~~~ ^
main.c:3:10: warning: implicit declaration of function 'foo' is invalid in C99
[-Wimplicit-function-declaration]
return foo(bar); // defaults to a "int foo()"
^
People used to think we wouldn't need backward compatibility for 80's stuff, that all the old code must be cleaned up or replaced. But it turns out that's not the case: a lot of production code remains stuck in prehistoric, non-standard times.
EDIT:
I didn't look through other answers before posting mine, so I may have misunderstood the intention of the poster. But the thing is, there was a time when you hand-compiled your code and used toggles to put the binary pattern into memory. They didn't need a "type system". Nor did the PDP machine in front of which Ritchie and Thompson famously posed.
Don't look at the beard, look at the "toggles", which I heard were used to bootstrap the machine.
And also look how they used to boot UNIX in this paper. It's from the Unix 7th edition manual.
http://wolfram.schneider.org/bsd/7thEdManVol2/setup/setup.html
The point of the matter is that they didn't need so much software layering to manage a machine with KB-sized memory. Knuth's MIX has 4000 words; you don't need all these types to program a MIX computer. You can happily compare an integer with a pointer on a machine like this.
I thought the reason they did this was quite self-evident, so I focused on how much is left to be cleaned up.
I'm quite often confused when coming back to C by the inability to create an array using the following initialisation pattern...
const int SOME_ARRAY_SIZE = 6;
const int myArray[SOME_ARRAY_SIZE];
My understanding of the problem is that the const qualifier does not guarantee compile-time constness but rather merely asserts that the value of SOME_ARRAY_SIZE will not change at runtime. But why can the compiler not assume that the value is constant at compile time? It says 6 right there in the source code...
I think I'm missing something core in my fundamental understanding of C. Somebody help me out here. :)
[UPDATE]After reading a bit more around C99 and variable length arrays I think I understand this a bit better. What I was trying to create was a variable length array - const does not create a compile-time constant but rather a runtime constant. Therefore I was initialising a variable length array, which is only valid in C99 at function/block scope. A variable length array at file scope is impossible, as the compiler cannot assign a fixed memory address to an unbounded array.[/UPDATE]
Well, in C++ the semantics are a bit different; in C++ your code would work fine. You must distinguish between two things: const and constant expressions. const means simply, as you described, that the value is read-only. A constant expression, on the other hand, means the value is known at compile time and is a compile-time constant. The semantics of const in C are always of the first type: a const-qualified variable is never a constant expression in C, which is why #define is used for this kind of thing.
In C++ however, any const object initialized with a constant expression is in itself a constant expression.
I don't know exactly why this is so in C; it's just the way it is.
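A short sketch of the difference at file scope (the names are illustrative):

const int N = 6;

int a1[N];       /* C: error, N is not a constant expression.
                    C++: fine, a const int initialized with a
                    constant expression is itself one. */

#define M 6      /* the usual C workarounds: */
enum { K = 6 };

int a2[M];       /* OK in both C and C++ */
int a3[K];       /* OK in both C and C++ */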
The problem is that the language syntax demands an integer constant expression between the [ ]. SOME_ARRAY_SIZE is still a variable (even if you told the compiler nobody is allowed to vary it!).
The const keyword is basically a read-only indication. It does not, really, indicate the underlying value will not change, even though that is the case in your example.
When it comes to pointers, this is more clear:
void foo(int const *p)
{
    if (*p == 100)
    {
        bar();
        /* Here, the compiler can not assume that *p is 100 */
    }
}
In this case, a compiler should not accept the code in your example, as it requires the array size to be constant. If it accepted the code anyway, the user could later run into trouble when porting it to a stricter compiler.
You can do this in C99, and some compilers prior to C99 also had support for this as an extension to C89 (e.g. gcc). If you're stuck with an old compiler that doesn't have C99 support though (e.g. MSVC) then you'll have to do it the old skool way and use a #define for the array size.
Note that the above comments apply only to such declarations at local scope (i.e. automatic variables). C99 still doesn't allow such declarations at global scope.
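A sketch of both cases under C99 (demo is a hypothetical function name):

#define SOME_ARRAY_SIZE 6              /* file scope: macro or enum */
static const int myArray[SOME_ARRAY_SIZE] = {0};

void demo(int n)
{
    const int len = n;   /* a runtime value, not a constant expression */
    int vla[len];        /* fine in C99: a variable length array */
    vla[0] = myArray[0];
}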
I just did a very quick test with Xcode and an Objective-C file I currently had open on my machine, and put this in the .m file:
const int arrs = 6;
const int arr[arrs];
This compiles without any issues.
E.g. I'm usually obsessive-compulsive and like to do
static int i = 0;
if (!i) i = var;
but
static int i;
if (!i) i = var;
would also work.
Why? Why can't it segfault so we can all be happy that undefined variables are evil and be concise about it?
Not even the compilers complain:(
This 'philosophy' of indecisiveness in C has made me do errors such as this:
strcat(<uninitialized>, <proper_string>); //wrong!!1
strcpy(<uninitialized>, <proper_string>); //nice
In your example, i is not an undefined variable, it is an uninitialized variable. And C has good reasons for not producing an error in these cases. For instance, a variable may be uninitialized when it is defined but assigned a value before it is used, so it is not a semantic error to lack an initialization in the definition statement.
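For example (compute is a hypothetical helper), this is perfectly legal:

#include <stdio.h>

int compute(void) { return 42; }   /* hypothetical helper */

int main(void)
{
    int i;              /* uninitialized at its definition: not an error */
    i = compute();      /* assigned before the first read */
    printf("%d\n", i);  /* well defined: prints 42 */
    return 0;
}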
Not all uses of uninitialized variables can be checked at compile-time. You could suggest that the program check every access to every variable by performing a runtime check, but that requires incurring a runtime overhead for something that is not necessary if the programmer wrote the code correctly. That's against the philosophy of C. A similar argument applies to why automatically-allocated variables aren't initialized by default.
However, in cases where the use of a variable before being initialized can be detected at compile-time, most modern compilers will emit a warning about it, if you have your warning level turned up high enough (which you always should). So even though the standard does not require it, it's easy to get a helpful diagnostic about this sort of thing.
Edit: The edit to your question makes it moot. If i is declared static, then it is initialized -- to zero.
This comes from C's "lightweight" and "concise" roots. Default initializing to zero bytes was free (for global variables). And why specify anything in source text when you know what the compiler is going to do?
Uninitialized auto variables can contain random data, and in that case your "if" statements are not only odd but don't reliably do what you desire.
It seems you don't understand something about C.
int i;
actually DOES define the variable in addition to declaring it. There is memory storage. There is just no initialization when in function scope.
int i=0;
declares, defines, and initializes the storage to 0.
if (!i)
is completely unnecessary before assigning a value to i. All it does is test the value of integer i (which may or may not be initialized to a specific value depending on which statement above you used).
It would only be useful if you did:
int *i = malloc(sizeof(int));
because then i would be a pointer you are checking for validity.
You said:
Why? Why can't it segfault so we can all be happy that undefined variables are evil and be concise about it?
A "segfault" or segmentation fault, is a term that is a throwback to segmented memory OSes. Segmentation was used to get around the fact that the size of the machine word was inadequate to address all of available memory. As such, it is a runtime error, not a compile time one.
C is really not that many steps up from assembly language. It just does what you tell it to do. When you define your int, a machine word's worth of memory is allocated. Period. That memory is in a particular state at runtime, whether you initialize it specifically or leave it to randomness.
It's to squeeze every last cycle out of your CPU. On a modern CPU of course not initializing a variable until the last millisecond is a totally trivial thing, but when C was designed, that was not necessarily the case.
That behavior is undefined. Stack variables are uninitialized, so your second example may work in your compiler on your platform that one time you ran it, but it probably won't in most cases.
Getting to your broader question, compile with -Wall and -pedantic and it may make you happier. Also, if you're going to be ocd about it, you may as well write if (i == 0) i = var;
p.s. Don't be ocd about it. Trust that variable initialization works or don't use it. With C99 you can declare your variables right before you use them.
C simply gives you space, but it doesn't promise to know what is in that space. It is garbage data. An automatically added check in the compiler is possible, but that extends compile times. C is a powerful language, and as such you have the power to do anything and fall on your face at the same time. If you want something, you have to explicitly ask for it. Thus is the C philosophy.
Where is var defined?
The codepad compiler for C gives me the following error:
In function 'main':
Line 4: error: 'var' undeclared (first use in this function)
Line 4: error: (Each undeclared identifier is reported only once
Line 4: error: for each function it appears in.)
for the code:
int main(void)
{
    static int i;
    if (!i) i = var;
    return 0;
}
If I define var as an int then the program compiles fine.
I am not really sure where your problem is. The program seems to be working fine. A segfault is not there to crash your program whenever you code something the language leaves undefined. The variable i is uninitialized, not undefined; you defined it as static int. Had you simply done:
int main(void)
{
    i = var;
    return 0;
}
then it would most definitely be undefined.
Your compiler should be throwing a warning because i isn't initialized, to catch these sorts of gotchas. It seems your if statement is sort of a catch for that warning, even if the compiler does not report it.
Static variables (in function or file scope) and global variables are always initialized to zero.
Stack variables have no dependable value. To answer your question, uninitialized stack variables are often set to non-zero, so they often evaluate as true. That cannot be depended on.
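A sketch of the guarantees (reading local before assignment is the undefined case):

int global;              /* static storage duration: guaranteed 0 */
static int file_local;   /* likewise guaranteed 0 */

void f(void)
{
    static int count;    /* static inside a function: guaranteed 0 */
    int local;           /* automatic: indeterminate value */
    count++;             /* fine */
    local = 0;           /* must be assigned before any read;
                            reading it first would be undefined */
}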
When I was running Gentoo Linux I once found a bug in some open source Unicode handling code that checked an uninitialized variable against -1 in a while loop. On 32-bit x86 with GCC this code always ran fine, because the variable was never -1. On AMD-64 with its extra registers, the variable ended up always set to -1 and the processing loop never ran.
So, always use the compiler's high warning levels when building so you can find those bugs.
That's C, you cannot do much about it. The basic purpose of C is to be fast - a default initialization of the variable takes a few more CPU cycles and therefore you have to explicitly specify that you want to spend them. Because of this (and many other pitfalls) C is not considered good for those who don't know what they are doing :)
The C standard says the behavior of uninitialized auto variables is undefined. A particular C compiler may initialize all variables to zero (or pointers to null), but since this is undefined behavior you can't rely on it being the case for a compiler, or even any version of a particular compiler. In other words, always be explicit, and undefined means just that: The behavior is not defined and may vary from implementation to implementation.
-- Edit --
As pointed out, the particular question was about static variables, which have defined initialization behavior according to the standard. Although it's still good practice to always explicitly initialize variables, this answer is only relevant to auto variables which do not have defined behavior.