How can some GCC compilers modify a constant char pointer?

I am reading a book titled "Understanding and Using C Pointers".
On page 110, it has these lines:
... However, in some compilers, such as GCC, modification of the string literal is possible. Consider the following example:
char *tabheader = "Sound";
*tabheader = 'L';
printf("%s\n", tabheader); //Displays "Lound"
It goes on to describe the use of const char *tabheader, which prevents the string from being modified through this pointer.
I am currently using Cloud 9/Ubuntu. I compiled this code using GCC and ran it. It caused a segmentation fault, as I expected.
I am very perplexed by these statements in the book.
All this time, my understanding has been that the statement char *tabheader = "Sound"; is the same as const char *tabHeader = "Sound";. Now this book is saying that it depends on which GCC compiler is used.
My question is this: Which GCC compiler allows this code to run?
What is your opinion on this?
Does this also belong to undefined behavior?

This would work in versions of GCC prior to 4.0 if you use the -fwritable-strings option when compiling. This option was removed in 4.0.

It would work on systems that don't store string literals in a protected part of memory. For example, the AVR port of GCC stores string literals in RAM, and all RAM is writable, so you can probably write to them. In general, writing to a string literal is undefined behavior, so you should not do it.
You mentioned you were confused about the difference between these two lines:
char *tabheader = "Sound";
const char *tabHeader = "Sound";
The main difference is that with the const qualifier, the compiler knows at compile time that you cannot write to the string, so if you try to write to it you get an error at compile time instead of undefined behavior at run time.
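If what you actually want is a modifiable "Sound" buffer, a minimal sketch is to copy the literal into an array rather than point at it. The helper name make_header and its buffer parameters are hypothetical; the point is that char tabheader[] = "Sound"; gives the program its own writable copy, while the const char * pointer documents (and enforces) that the literal itself is read-only.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "Lound" in the caller's buffer without
   ever writing to a string literal. */
void make_header(char *out, size_t outsize)
{
    char tabheader[] = "Sound";      /* array initialized FROM the literal: a writable copy */
    const char *original = "Sound";  /* pointer TO the literal: read-only */

    tabheader[0] = 'L';              /* fine: modifies the array, not the literal */
    /* *original = 'L'; */           /* rejected at compile time because of const */

    if (outsize > 0) {
        strncpy(out, tabheader, outsize - 1);
        out[outsize - 1] = '\0';     /* strncpy may not terminate, so do it explicitly */
    }
    printf("%s (literal is still %s)\n", tabheader, original);
}
```

Called with a char buf[16] as make_header(buf, sizeof buf), it prints Lound (literal is still Sound) and leaves "Lound" in buf. Separately, GCC's -Wwrite-strings option gives string literals the type const char[N] when compiling C, so the original char *tabheader = "Sound"; line itself draws a warning.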

gcc has many modes and compatibilities. Originally (1970s), C had no const qualifier and certainly no concept that a string literal was constant. In those days it was an occasional practice to use a string literal as an initialized buffer.
The eventual, slow evolution of string literals into implied constants has caused pain when maintaining ancient code that depends on the earlier behavior. GCC's philosophy apparently is to enable the old behavior with a compiler flag. For example, from man gcc for gcc 6.3.1 20161221 (Red Hat 6.3.1-1), the section on -std is (partially):
-std=
Determine the language standard. This option is currently only
supported when compiling C or C++.
The compiler can accept several base standards, such as c90 or
c++98, and GNU dialects of those standards, such as gnu90 or
gnu++98. When a base standard is specified, the compiler accepts
all programs following that standard plus those using GNU
extensions that do not contradict it. For example, -std=c90 turns
off certain features of GCC that are incompatible with ISO C90,
such as the "asm" and "typeof" keywords, but not other GNU
extensions that do not have a meaning in ISO C90, such as omitting
the middle term of a "?:" expression. On the other hand, when a GNU
dialect of a standard is specified, all features supported by the
compiler are enabled, even when those features change the meaning
of the base standard. As a result, some strict-conforming programs
may be rejected. The particular standard is used by -Wpedantic to
identify which features are GNU extensions given that version of
the standard. For example -std=gnu90 -Wpedantic warns about C++
style // comments, while -std=gnu99 -Wpedantic does not.
A value for this option must be provided; possible values are
c90
c89
iso9899:1990
Support all ISO C90 programs (certain GNU extensions that
conflict with ISO C90 are disabled). Same as -ansi for C code.
iso9899:199409
ISO C90 as modified in amendment 1.
c99
c9x
iso9899:1999
iso9899:199x
ISO C99. This standard is substantially completely supported,
modulo bugs and floating-point issues (mainly but not entirely
relating to optional C99 features from Annexes F and G). See
<http://gcc.gnu.org/c99status.html> for more information. The
names c9x and iso9899:199x are deprecated.
c11
c1x
iso9899:2011
ISO C11, the 2011 revision of the ISO C standard. This
standard is substantially completely supported, modulo bugs,
floating-point issues (mainly but not entirely relating to
optional C11 features from Annexes F and G) and the optional
Annexes K (Bounds-checking interfaces) and L (Analyzability).
The name c1x is deprecated.
gnu90
gnu89
GNU dialect of ISO C90 (including some C99 features).
gnu99
gnu9x
GNU dialect of ISO C99. The name gnu9x is deprecated.
gnu11
gnu1x
GNU dialect of ISO C11. This is the default for C code. The
name gnu1x is deprecated.
c++98
c++03
The 1998 ISO C++ standard plus the 2003 technical corrigendum
and some additional defect reports. Same as -ansi for C++ code.
gnu++98
gnu++03
GNU dialect of -std=c++98.
c++11
c++0x
The 2011 ISO C++ standard plus amendments. The name c++0x is
deprecated.
gnu++11
gnu++0x
GNU dialect of -std=c++11. The name gnu++0x is deprecated.
c++14
c++1y
The 2014 ISO C++ standard plus amendments. The name c++1y is
deprecated.
...
Note that there are other compiler flags which control acceptance or rejection or alternate handling of K&R function headers and similar aspects.

Related

How can I specify to gcc that {} init in C shouldn't compile?

With gcc using -std=gnu99, the following code compiles:
void f()
{
struct X data = {};
// do something with data
}
Is this valid C?
Is this a GNU extension?
How can I tell gcc not to accept this kind of initialization?
I want to ensure compatibility with other compilers (for example, Visual Studio 2015).
If you want to reject code containing GNU-specific extensions, use -std=c99 -pedantic-errors (-pedantic will issue diagnostics for non-standard extensions, but it won't necessarily reject the code outright). However, if you want to guarantee ISO conformance, be aware that this isn't a 100% solution. From the gcc man page:
Some users try to use -pedantic to check programs for strict ISO C conformance. They soon find that it does not do quite what they want: it finds some non-ISO practices, but not all---only those for which ISO C requires a diagnostic, and some others for which diagnostics have been added.
A feature to report any failure to conform to ISO C might be useful in some instances, but would require considerable additional work and would be quite different from -pedantic. We don't have plans to support such a feature in the near future.
The -pedantic option will cause a warning to be displayed in this case, and -Werror will cause all warnings to be treated as errors.
For example:
x1.c: In function ‘f’:
x1.c:11:19: error: ISO C forbids empty initializer braces [-Werror=pedantic]
struct X data = {};
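No, the empty initializer is not standard C (at least not before C23, which added empty initializers). It is a gcc extension. See this for a detailed description.
By specifying -std=gnu99, you allowed the GNU extensions to be used. A -std=cXX option restricts the compiler to standard C, but, as the manual excerpt below explains, GNU extensions that do not contradict the standard (such as this one) are still accepted unless you also pass -pedantic or -pedantic-errors.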
From the gcc online manual (emphasis mine)
-std=
The compiler can accept several base standards, such as ‘c90’ or ‘c++98’, and GNU dialects of those standards, such as ‘gnu90’ or ‘gnu++98’. When a base standard is specified, the compiler accepts all programs following that standard plus those using GNU extensions that do not contradict it. For example, -std=c90 turns off certain features of GCC that are incompatible with ISO C90, such as the asm and typeof keywords, but not other GNU extensions that do not have a meaning in ISO C90, such as omitting the middle term of a ?: expression. On the other hand, when a GNU dialect of a standard is specified, all features supported by the compiler are enabled, even when those features change the meaning of the base standard. As a result, some strict-conforming programs may be rejected. The particular standard is used by -Wpedantic to identify which features are GNU extensions given that version of the standard. For example -std=gnu90 -Wpedantic warns about C++ style ‘//’ comments, while -std=gnu99 -Wpedantic does not.
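If the goal is simply a zero-initialized struct, a portable alternative is = {0}, which is valid from C90 onward and is therefore accepted under -std=c99 -pedantic-errors. This is a sketch: the members of struct X and the function name make_zeroed_x are hypothetical, since the question does not show the struct's definition. Members that are not explicitly initialized are initialized as if they had static storage duration, which means zero for arithmetic members and a null pointer for pointer members.

```c
#include <stddef.h>

/* Hypothetical definition; the question does not show struct X. */
struct X {
    int count;
    double ratio;
    const char *name;
};

/* Standard C: the first member gets 0 explicitly and every remaining
   member is zero-initialized (0, 0.0, null pointer), so no empty {}
   initializer is needed. */
struct X make_zeroed_x(void)
{
    struct X data = {0};
    return data;
}
```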

GCC options for strict C90 code?

I am trying to find the combination of gcc flags to use when testing strict C90 conformance. According to a previous post, GCC options for strictest C code?, I should only need --std=c90.
However here is what I tried:
$ cat t.c
#include <stdint.h> /* added in C99 */
int main()
{
uint64_t t;
return 0;
}
$ gcc -std=c90 -ansi -pedantic t.c
The above compiles without complaint (no warnings or errors are produced).
Does anyone know of:
gcc flags to have strict ISO/IEC 9899:1990 conformance
A different compiler (tcc, clang, ...) with a different set of flags?
EDIT:
Sorry for my wording. Yes, I would really like to mimic a strictly conforming C90 compiler; in other words, it should fail if the code tries to use any feature added later (C99 comes to mind). So including the pthread header ought to emit a warning when compiling in what GNU/GCC calls C90 mode (just as the stdint.h header should produce a warning without C99). -pedantic nicely warns me about the use of long long, so I do not see why it should not warn me about uint64_t.
I used the terminology of ISO/IEC 9899:1990 as quoted from:
http://en.wikipedia.org/wiki/C_(programming_language)#ANSI_C_and_ISO_C
In 1990, the ANSI C standard (with formatting changes) was adopted by
the International Organization for Standardization (ISO) as ISO/IEC
9899:1990, which is sometimes called C90. Therefore, the terms "C89"
and "C90" refer to the same programming language.
EDIT2:
The GCC documentation is actually quite clear:
Some features that are part of the C99 standard are accepted as
extensions in C90 mode, and some features that are part of the C11
standard are accepted as extensions in C90 and C99 modes.
So my question can be rephrased as:
Is there a compiler plus standard headers on a Linux system that strictly conforms to C90?
C90 compliance doesn't mean that the compiler can't offer other headers that aren't mentioned in the C90 standard (sys/socket.h, for instance). If you want to disallow these for some strange reason, you can pass the -I option to add an extra include path, and in that path put versions of all the C99-only headers that consist of nothing but #error Don't include me.
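A sketch of that trick, assuming a hypothetical directory named c90-stubs passed as -Ic90-stubs (for example, gcc -std=c90 -pedantic -Ic90-stubs t.c). GCC searches -I directories before the default system directories, so #include <stdint.h> picks up the stub and compilation stops there. The same one-line file would be repeated for each header C99 added (complex.h, fenv.h, inttypes.h, stdbool.h, stdint.h, tgmath.h):

```c
/* c90-stubs/stdint.h: hypothetical stub; <stdint.h> first appeared in C99 */
#error "stdint.h is not part of C90"
```

This stub is meant to fail compilation by design, so it is a build-configuration fragment rather than a program to run.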
Keep in mind here that GCC itself is a conforming freestanding implementation of the C standard specified; such an implementation only supplies a small subset of the standard header files, and practically none of the actual functionality of the C standard library, instead relying on another party -- glibc on Linux systems, for instance -- to supply the C standard library's functionality.
What you seek is something that warns you not only when you use a C99/C11/GNU language feature that is not in C90, but also when you use a library function that is not defined by C90 itself. Sadly, the compiler alone cannot do this for the reason stated above: it is unaware of which libc it is used with. On glibc systems, the C standard library will pick up on the macros defined by -std=c90 or -ansi:
The macro __STRICT_ANSI__ is predefined when the -ansi option is used. Some header files may notice this macro and refrain from declaring certain functions or defining certain macros that the ISO standard doesn't call for; this is to avoid interfering with any programs that might use these names for other things.
and give you some help by turning off gratuitous extensions:
If you compile your programs using ‘gcc -ansi’, you get only the ISO C library features, unless you explicitly request additional features by defining one or more of the feature macros.
However, this only covers extensions and POSIX-but-not-ISO C functions; it will not save you if a function's behavior is specified differently in ISO C and POSIX.1!
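A small probe can show both effects on a glibc system. It is a sketch under these assumptions: the helper names strict_iso_mode and copy_name are hypothetical, and _POSIX_C_SOURCE is used as an example of the feature macros the quoted glibc manual refers to. GCC predefines __STRICT_ANSI__ for -ansi and for the strict -std=cNN modes (not for the gnuNN dialects), and defining a feature macro before any #include asks glibc to declare POSIX functions such as strdup even in those strict modes.

```c
/* Request POSIX.1-2008 declarations; must come before any #include.
   Without it, glibc does not declare strdup() under -ansi or -std=c90. */
#define _POSIX_C_SOURCE 200809L

#include <stdlib.h>
#include <string.h>

/* Returns 1 if GCC was invoked in a strict ISO mode (-ansi, -std=c90,
   -std=c99, ...), 0 for a GNU dialect such as -std=gnu90 or the default. */
int strict_iso_mode(void)
{
#ifdef __STRICT_ANSI__
    return 1;
#else
    return 0;
#endif
}

/* strdup is POSIX, not C90; it is declared here only because of the
   feature macro above. The caller must free() the result. */
char *copy_name(const char *s)
{
    return strdup(s);
}
```

Commenting out the #define and compiling with gcc -ansi -pedantic should then produce an implicit-declaration diagnostic for strdup, which is exactly the library-level checking the compiler alone cannot provide.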

Should I use "-ansi" or explicit "-std=..." as compiler flags?

I've read that ANSI C is not exactly the same as ISO C and that compilers may differ in their interpretation of what "-ansi" means (gcc maps it to C90, clang maps it to C89). At the moment I would tend to use "-std=..." over "-ansi", as it then shows explicitly which standard is used. As I am specifically interested in compiling on Linux, Windows and Mac, I fear some compilers might understand "-ansi" but not "-std=...". So are there any pros and cons to using one over the other?
If you want the compiler to enforce the 1989 ANSI C standard, or equivalently the 1990 ISO C standard (they describe exactly the same language), you can safely use either -ansi or -std=c89.
The name -ansi is, strictly speaking, incorrect; it refers to the 1989 ANSI C standard, but ANSI itself considers that standard to be obsolete; it was replaced by the 1999 ISO C standard (which ANSI officially adopted shortly after it was released) which itself either has been, or soon will be, replaced by the new 2011 ISO C standard. But changing the meaning of the -ansi option would break too many Makefiles and build scripts.
gcc 4.7 and later versions also recognize -std=c90 as a synonym for -std=c89. gcc 4.7 was released in March 2012, so -std=c90 is reasonably portable unless you need to allow for older versions of gcc.
-std=c99 enforces (most of) the 1999 ISO C standard. Since Microsoft in particular doesn't support C99 (even after all these years), using this option means the compiler won't warn you about use of C99-specific features that might not be supported elsewhere. gcc's C99 support is documented here.
gcc 4.7 has partial support for the new ISO C 2011 standard, with -std=c11. That support has improved in later releases, but is not yet complete. gcc C11 status is documented here, and is said to be similar to the level of C99 support.
There are more options, and a number of aliases for the ones I've mentioned; for example, the option -std=c9x was added before the 1999 ISO standard was finalized, and it's still supported; similarly, -std=c1x is a synonym for -std=c11.
I believe that clang is intended to be as compatible as possible with gcc, so it should support the same options with the same meanings (except perhaps for some of the newer ones, depending on which versions of gcc and clang you're using).
The gcc manual has the full details, with one section describing the supported standards and another specifying the various -ansi and -std=... options. The links are to the 4.7 version. You can also run info gcc (if you have the GNU info command and the gcc documentation installed), or you can see multiple versions of the manual here.
If you're going to use compilers other than gcc (and compilers that aim to be gcc-compatible), you'll have to read their documentation to find out how to enforce various versions of the C standard.
-ansi and -std= compiler flags may be shared by other compilers but they are gcc flags.
As of now, -ansi is equivalent to -std=c89 in gcc, but this may change in the future (see note 1), so I suggest using -std=c89 rather than -ansi. Indeed, ISO C99, for example, has also been ratified by ANSI.
You should note that c89 and c90 are essentially the same C Standard. c89 is the ANSI name while c90 is the ISO name.
From gcc page:
There were no technical differences between these publications, although the sections of the ANSI standard were renumbered and became clauses in the ISO standard. This standard, in both its forms, is commonly known as C89, or occasionally as C90, from the dates of ratification.
1) As noted by Keith Thompson in the comments, such a change is probably unlikely, since it would break many build scripts.

C99 not default C- version for GCC?

Why doesn't GCC compile C99 by default? That is, why is it necessary to add the --std=c99 flag every time C99 code is written?
Edit: As of GCC 5, -std=gnu11 is the default. See Porting to GCC 5.
See C Dialect Options: gnu89 is the default.
`gnu89'
GNU dialect of ISO C90 (including some
C99 features). This is the default for
C code.
As @tsv mentioned, ISO C99 is not fully supported yet:
`c99'
`c9x'
`iso9899:1999'
`iso9899:199x'
ISO C99. Note that this standard is not yet fully supported; see http://gcc.gnu.org/c99status.html for more information. The names `c9x' and `iso9899:199x' are deprecated.
And also:
`gnu99'
`gnu9x'
GNU dialect of ISO C99. When ISO C99 is fully implemented in GCC, this will become the default. The name `gnu9x' is deprecated.
Perhaps because it still isn't fully implemented - see C99 status.
It also could be argued C99 features haven't been widely adopted, although that's something of a circular argument.
Use the command c99 to compile C programs.
The current POSIX standard specifies the command c99, so it should be available on most Unix-like systems.
The reason is that gcc's default configuration takes a really long time to change, since every change to the default can potentially break the compilation of valid programs (in this case, valid C89 programs that are invalid in C99). Starting with gcc 5.0, the default C standard used by gcc will be gnu11, which is C11 with GNU extensions (see here):
The default mode for C is now -std=gnu11 instead of -std=gnu89.
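To check which dialect a particular gcc actually defaults to, one approach (a sketch; the function name c_standard_name is hypothetical) is to compile a probe without any -std option and inspect __STDC_VERSION__. C90 does not define it, and later standards set it to fixed values: 199409L for the 1995 amendment, 199901L for C99, 201112L for C11 and 201710L for C17. The gnuNN dialects report the same value as the corresponding base standard, so __STRICT_ANSI__ (defined only in the strict modes) is what tells, say, gnu11 apart from c11.

```c
/* Map __STDC_VERSION__ to a readable name. It is undefined in C90
   mode, so that case is tested first. */
const char *c_standard_name(void)
{
#if !defined(__STDC_VERSION__)
    return "C90";
#elif __STDC_VERSION__ > 201710L
    return "newer than C17 (C2x/C23)";
#elif __STDC_VERSION__ >= 201710L
    return "C17";
#elif __STDC_VERSION__ >= 201112L
    return "C11";
#elif __STDC_VERSION__ >= 199901L
    return "C99";
#else
    return "C90 + Amendment 1";
#endif
}
```

Under gcc 5 with no -std flag, this should report C11, matching the gnu11 default quoted above; older releases defaulting to gnu89 would report C90.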

What does the term *ANSI C* specify when used alongside GNU89, C89, GNU99 and C99?

In Xcode IDE, I have an option to set C language dialect one of
ANSI C
GNU89
C89
GNU99
C99
Compiler Default
I understand what they all mean except ANSI C, because as far as I know, ANSI C is just one of C89 or C99. But there must be a reason it's there. What does the term ANSI C specify there?
Edit: Credit goes to @Nicholas Knight for posting a screenshot of Xcode's C dialect selection window: http://dl.dropbox.com/u/14571816/xcodelang.png
ANSI C refers, historically, to the ANSI C89 standard (practically the same thing as C90). Xcode uses a version of GCC as the compiler back end for compiling C code, so I think that's where these 'options' come from: you can specify the -ansi flag or various -std= flags to choose the mode in which the C compiler back end compiles your code.
So if you pass -ansi when compiling C, it's equivalent to -std=c90, which is also equivalent to -std=c89 or -std=iso9899:1990.
-ansi
In C mode, this is equivalent to -std=c90. In C++ mode, it is equivalent to
-std=c++98.
And if you use the -std flags, you can pass certain values to activate different language features.
-std=
Determine the language standard. This option is currently only supported when compiling C or C++.
These arguments are equivalent:
c90
c89
iso9899:1990
Support all ISO C90 programs (certain GNU extensions that conflict with ISO C90 are disabled). Same as -ansi for C code.
These arguments are equivalent:
iso9899:199409
ISO C90 as modified in amendment 1.
These following arguments are equivalent:
c99
c9x
iso9899:1999
iso9899:199x
ISO C99. Note that this standard is not yet fully supported; see
http://gcc.gnu.org/gcc-4.5/c99status.html for more information. The names c9x
and iso9899:199x are deprecated.
These following arguments are equivalent:
gnu90
gnu89
GNU dialect of ISO C90 (including some C99 features). This is the default for C
code.
These following arguments are equivalent:
gnu99
gnu9x
GNU dialect of ISO C99. When ISO C99 is fully implemented in GCC, this will
become the default. The name gnu9x is deprecated.
Compilers have profiles of the languages they target. As pmg said in his reply, ANSI C was one of the earliest profiles, the one described in the second edition of the K&R book.
The question of interest is: why do compilers maintain a list of legacy language profiles? Because writing code against the ANSI C profile is quite a strong guarantee that your code will work with virtually any compiler (and, more importantly, any compiler version).
When software projects claim ANSI C compatibility, they are telling you that the code will compile practically everywhere, give or take. Lua's source code is an example of this.
C was "born" in the 70's.
In 1978 Brian Kernighan and Dennis Ritchie published the book. The language as described in the book (the 1st edition) is now called "K&R C".
Around 1988, a 2nd edition was published. This 2nd edition is very, very similar to the ANSI (ISO) Standard, and is the edition people usually mean when referring to the book :)
Compiler writers started to make changes to the language and, in order to standardize it, ANSI published a Standard in 1989 (the C89 Standard, or ANSI C). This was shortly followed by the ISO standard (C90), which made hardly any changes to the ANSI one.
In 1999, ISO published another C Standard: What we call C99.
So, if I'm right, ANSI C was current only for a few months, but the difference between ANSI C and ISO C90 is minimal. In fact, many compilers today are compilers for ANSI C with extras (rather than for ISO C99 with extras but without a few things)
Assuming you are, in fact, using GCC as the compiler, ANSI and C89 are aliases for the same thing. See:
http://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html#C-Dialect-Options
Why Apple made the design decision to present them both, I'm not sure. There is no practical distinction in GCC. Perhaps they're being paranoid in case the meaning of -ansi changes in later versions of GCC (perhaps to C99).
