How to check C source code against the current standard?

I'm continuing to learn C and would like to adhere to whatever is the current standard, but finding a good reference to that seems to be a problem.
From what I've found online (mostly through Google and Wikipedia), the current standard used today is C99, more formally the ISO/IEC 9899:1999 standard.
When I'm writing C code, I often bring up a browser and do simple web searches for things like finding out the exact return values of the stdio.h function scanf. Mostly I just want to get into a good practice of adhering to the current standard, but even if I search for the specific string "C99 printf" or something like it, there doesn't seem to be one single place to find a definitive spec.
So I have two questions:
1) Is there a central C99 spec that is available online, maintained by the organization responsible for this standard?
[edit]: This first question has already been answered here: Where do I find the current C or C++ standard documents?. Thanks to James McNellis for pointing this out.
2) Is there a program that can parse a C source file to make sure it adheres to the C99 spec? I know there are programs like this to parse XHTML files and it seems like there should be one for C99 as well...
[edit]:
I should also mention that I'm doing C development using gcc, specifically version 3.4.4.
When I go to the main gcc website (http://gcc.gnu.org/) I'm still running into difficulty figuring out which compiler version supports which C specification.
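One rough way to see what a particular gcc build claims (not a full conformance check, just the version macro it reports) is to ask the preprocessor under a given -std setting:
gcc -std=c99 -dM -E -x c /dev/null | grep __STDC_VERSION__
If it prints 199901L, that gcc is at least targeting C99 in that mode, although support may still be partial (it was for the 3.x series).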

The current standard for Programming Language C is ISO/IEC 9899:1999, published 1999-12-01
and
Published ISO and IEC standards can be purchased from a member body of ISO or IEC.
From your bestest buds in the standards world, ISO.
It's also worth noting the draft of the NEXT C standard is available in PDF.

Depending on what compiler you use you can ask it to check for you. For example with gcc you would run it like this:
gcc -Wall -Wextra -pedantic -std=c99 -o program program.c
You would replace program.c and program with the name of your program of course.
-Wall and -Wextra provide more warnings and tell you if you have done something funky; -pedantic provides more of that as well.
You can use -ansi if you really want to follow the ANSI spec. Personally I don't use it since I'm lazy, but it is more proper.
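As a rough illustration (assuming a reasonably recent gcc; the file name here is made up for the sketch), code that uses a GCC extension such as a binary constant gets flagged by -pedantic:
/* pedantic_demo.c */
#include <stdio.h>
int main(void)
{
    int mask = 0b1010;   /* binary constants are a GCC extension, not C99 */
    printf("%d\n", mask);
    return 0;
}
Compiled with
gcc -Wall -Wextra -pedantic -std=c99 -o pedantic_demo pedantic_demo.c
this should produce a warning along the lines of "binary constants are a GCC extension", whereas without -pedantic it would pass silently.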

It is not actually possible to analyse a source file and conclusively determine that it complies with the C99 standard. C is different to XHTML in this regard, because C is a Turing-complete language and XHTML is not.
It is certainly possible to find many instances of non-conformance, but it's impossible to find them all. Consider, for example, a program that generates a printf format string on-the-fly - how can you statically determine that the format string will be conforming? What if it depends on the input given to the program? Consider also a program that right shifts signed integers - if the signed integer in question is never negative, then the program may be conforming, but if not then it is probably relying on implementation-defined results.
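As a minimal sketch of the format-string case (hypothetical code, just to illustrate the point):
#include <stdio.h>
int main(void)
{
    char fmt[8];
    /* the format string is only known at run time */
    if (scanf("%7s", fmt) == 1)
        printf(fmt, 42);   /* conforming only if the user typed a suitable format */
    return 0;
}
No static checker can decide whether that printf call is conforming, because that depends entirely on the input supplied when the program runs.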

The C99 draft can be obtained here for free. You can also purchase the official standard from the ANSI store.

Related

When to use -std=c11 while compiling C source code using Ubuntu

I am trying to compile C source code to machine code using an Ubuntu terminal.
My tutor's instruction was to use the following command:
running clang myprogramm.c -std=c11
Why should I use the option -std=c11, and what is the difference compared to using just
clang myprogramm.c
Using std= options is required by your tutor (I'm divining her motives, I'm particularly good at this!) because she wants to make sure you stay away from all those nifty Clang features that turn the accepted language from C to A LANGUAGE SUPERFICIALLY LOOKING LIKE C BUT ACTUALLY A DIFFERENT LANGUAGE NOT SUPPORTED BY OTHER C COMPILERS.
That is more than just additional library functions. It includes syntax changes that break the grammar of Standard C, as defined by ISO. A grasshopper should not use these while learning. Using -std=c11 makes sure Clang either warns about or even rejects, with an error, such constructs.
When to specify the standard? Whenever you use the compiler. It is never a good idea to let the compiler just use whatever it wants.
If someone tries to use a compiler that is too old, then they will get a warning or error, and they will understand why the compile fails.
If a code contributor (maybe even yourself!) tries to add code using features that are too new, their code will be rejected. That's very important if you intend to keep compatibility with an older standard.
By explicitly stating the standard, using new features or extensions is a choice and doesn't happen by accident.
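As a rough sketch of the kind of construct this catches (hypothetical file, and exact diagnostics vary by clang version), the GNU keyword typeof is not part of C11:
/* gnu_demo.c */
#include <stdio.h>
int main(void)
{
    int a = 1;
    typeof(a) b = a;   /* 'typeof' without underscores is a GNU/C23 feature, not C11 */
    printf("%d\n", b);
    return 0;
}
In clang's default GNU mode, clang gnu_demo.c typically compiles this silently; with clang gnu_demo.c -std=c11 it should be rejected or at least diagnosed, because the strict mode disables the non-ISO keyword.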

What are Vectors and < > in C?

I was looking at the source code for gcc (out of curiosity), and I noticed a data structure that I've never seen in C before.
At line 80 and 129 (and many other places) in the parser, they seem to be using vectors.
80: vec<tree> incomplete_record_decls;
129: ridpointers = ggc_cleared_vec_alloc<tree> ((int) RID_MAX);
I've never encountered this data type in C, nor these: < >. Are they native to C?
Does anyone know what they are and how they are used?
Despite the .c filename, this code is not valid C; it is C++, using that language's template feature. If you inspect the gcc build process, you will find that this file is actually compiled with a C++ compiler.
https://gcc.gnu.org/codingconventions.html
The directories gcc, libcpp and fixincludes may use C++03. They may also use the long long type if the host C++ compiler supports it. These directories should use reasonably portable parts of C++03, so that it is possible to build GCC with C++ compilers other than GCC itself. If testing reveals that reasonably recent versions of non-GCC C++ compilers cannot compile GCC, then GCC code should be adjusted accordingly. (Avoiding unusual language constructs helps immensely.) Furthermore, these directories should also be compatible with C++11.
Keep in mind that although compilers will usually by default infer a source file's language from its filename, this default can always be overridden. It is entirely possible to have C++ code in a .c file, or C code in a .bas file for that matter; you just may have to tell the compiler some other way what language is in use.
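For instance (a sketch, not taken from gcc's actual build system), the -x option tells the gcc or clang driver to override the guess made from the file extension:
gcc -x c++ -c c-parser.c
Without -x, the .c suffix would make the driver treat the file as C.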
I expect that gcc chose this file naming convention because this code was originally written in C and later converted to C++, and they found it too much of a pain to change all the filenames. It would mean a lot of work to update all the makefiles, etc. It may have been less of a pain to just change which compiler was used, and to explain the convention to all the developers. Of course, in general it is better programming practice to name your files in the standard way, but apparently the gcc developers felt it was not the best course of action in this case.
GCC has moved from C to C++ since GCC 4.8
GCC now uses C++ as its implementation language. This means that to build GCC from sources, you will need a C++ compiler that understands C++ 2003. For more details on the rationale and specific changes, please refer to the C++ conversion page.
GCC 4.8 Release Series - Changes, New Features, and Fixes
The work actually began long before that, with the creation of the gcc-in-cxx branch. The developers first tried to compile the existing source code with a C++ compiler, so there weren't any name changes. I guess they didn't bother to rename the files later when merging the two branches and officially moving to a single C++ codebase.
You can read GCC's move to C++ for more historical information

Do GCC atomic builtins work with -std=c99?

I am using the built-in atomic methods from this link.
It is mentioned that:
The following built-in functions approximately match the requirements
for the C++11 memory model.
However, I have tried compiling code using these methods with -std=c99 and -std=c89. The program compiles and I get the right results. Is there something I am missing here?
Do C99 and C89 have a memory model as well?
It is a compiler extension, and therefore it is allowed to provide functionality outside of what the standard allows, but that page does not make it obvious that it can be used in C.
Fortunately, gcc does have good online documentation, and if we check, for example, the 4.9 series document on C extensions, the __atomic builtins entry points to the same page.
So that would indicate that it is valid to use in C, and it will stick to the requirements as laid out in the documentation, so it will work in C99 as it does in C++. Usually, if there is a difference in how a feature/extension is implemented between C and C++, the documents will note this; for example, compound literals have significant differences.
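A minimal sketch (hypothetical file name, assuming gcc 4.7 or later, where the __atomic builtins were introduced):
/* atomic_demo.c */
#include <stdio.h>
int main(void)
{
    int counter = 0;
    /* __atomic_fetch_add is a GCC extension, usable even in -std=c99 mode */
    __atomic_fetch_add(&counter, 1, __ATOMIC_SEQ_CST);
    printf("%d\n", counter);
    return 0;
}
Compiling it with gcc -std=c99 -pedantic atomic_demo.c builds cleanly; -pedantic has nothing to object to because the builtin is just a call through a reserved-namespace identifier rather than a syntax extension.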

Does changing the target C standard achieve anything?

I'm interested in the effects of compiling valid C99 code with a C11 compiler. Is there any practical difference?
As an example, could changing
gcc -c -pedantic -std=c99 source.c
to
gcc -c -pedantic -std=c11 source.c
achieve anything, where source.c is valid C99? Could this introduce regressions, or give optimisations for free?
I'm interested specifically in gcc, although answers addressing other compilers are most welcome.
I used the C11 Wikipedia page as a quick check to see the difference between C99 and C11.
I do notice that gets is removed in C11, so that's one possible regression. The only other one I can see is if the code does something like version detection that's not future-proof, like #if __STDC_VERSION__ == 199901L .
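A more future-proof version check (a small sketch, not part of any original code here) tests for "at least" a given version instead of an exact match:
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
/* C99 or later: C99 features may be used here */
#endif
An exact comparison such as == 199901L silently stops matching once a compiler reports 201112L for C11.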
Backwards compatibility is considered very important for the committee that maintains the C standard.
You can expect that a strictly conforming program written according to the preceding standard will function identically when compiled with the settings for the new standard, unless
the program uses a feature that was removed (gets when moving from C99 to C11)
the program uses a feature that was made optional and is not provided by the compiler (e.g. complex types or VLAs)
The latter is unlikely to really hit you, because it is not expected that compilers that had these features will remove them now that they are optional. It is more likely that if a compiler in C11 mode doesn't support VLAs, then it also doesn't do so in C99 mode (and will be non-conforming to C99 in that respect).
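If code does rely on the now-optional features, C11 provides feature-test macros to detect their absence; a minimal sketch:
#include <stdio.h>
int main(void)
{
#if defined(__STDC_NO_VLA__)
    puts("no VLA support in this implementation");
#else
    int n = 4;
    int vla[n];     /* variable-length array: mandatory in C99, optional in C11 */
    (void) vla;
    puts("VLAs available");
#endif
    return 0;
}
__STDC_NO_COMPLEX__ and __STDC_NO_THREADS__ play the same role for complex types and <threads.h>.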
For programs that aren't strictly conforming (the majority of real programs), you also have to check that the new additions don't interfere with your code. Things to watch out for are:
New standard headers that have the same name as your own headers
Incorrect use of reserved identifiers that have been given a use in the new standard. In particular, identifiers that start with an underscore and a capital letter
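As a concrete sketch of the reserved-identifier point (hypothetical code), consider a pre-C11 source that used one of the names C11 later claimed:
int _Noreturn = 0;   /* reserved even in C99; becomes a keyword, hence a syntax error, in C11 */
A compiler that knows only C99 accepts this, but switching to -std=c11 (and, in practice, many compilers even in C99 mode, since they enable the new keywords everywhere) rejects it. That is exactly why identifiers starting with an underscore and a capital letter are reserved.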

Is C open source?

Does C (or any other low-level language, for that matter) even have source, or is the compiler the part that "does all the work", including parsing? If so, couldn't different compilers have different C dialects? Where does the stdlib factor into this? I would really like to know how this works.
The C language is not a piece of software but a defined standard, so one wouldn't say that it's open-source, but rather that it's an open standard.
There are a gazillion different compilers for C however, and many of those are indeed open-source. The most notable example is GCC's C compiler, which is all under the GNU General Public License (GPL), an open-source license.
There are more options. Watcom is open-source, for instance. There is no shortage of open-source C compilers, but without a doubt the most widespread one, at least in the non-Windows world, is GCC.
For Windows, your best bet is probably Watcom or GCC by using Cygwin or MinGW.
C is a standard which specifies the language that C compilers must implement and how conforming programs behave.
C itself doesn't have any source code, just like a musical note doesn't have any plastic.
Some C compilers, such as GCC, are open source.
C is just a language, and a standardised one at that. It pretty much is the compiler that "does all the work". Different compilers did have different dialects; before the ANSI and ISO standards (C89 and later C99), you had things like Borland C and other competing compilers that implemented the C language in their own fantastic ways.
stdlib is just an agreed-upon collection of standard libraries that are required to be present in any ANSI C implementation.
To add on to the other great answers:
Regarding different dialects -- there are some additional features added to C that are compiler-specific. You can provide the command-line flag -std=... to gcc to specify the C standard that you want to use; each has slight variations/additions to the syntax, and the most common is probably c99.
Each compiler tends to implement a few different extras. For example, typeof() is not in the C standard, so compilers do not have to implement it, but it is nevertheless useful and most compilers provide it. Here is a list of gcc C extensions.
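A tiny sketch of what typeof() buys you (a hypothetical macro using the GNU extension, not ISO C):
/* swap two lvalues of any matching type without spelling out the type */
#define SWAP(a, b) do { typeof(a) tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)
Under a strict mode such as -std=c99 you would need the __typeof__ spelling, or have to avoid the extension entirely.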
The stdlib is a set of functions specified in the C standard. Much like compilers, stdlib can have different implementations. The GNU implementation is open source, as is gcc, but there are other compilers and could be other implementations of stdlib that are closed source.
The compiler determines all the mappings from C to assembly and so on, but as far as someone owning it goes, no one really owns C; ANSI/ISO determines the standards.
GCC's C compiler is written in C. So we know there is at least one C compiler written in C.
GNU's stdlib (glibc) is also written in C (stdio.h, stdlib.h). But it also has some parts written in assembly language.
A really good question. There is a way to define a language standard (not the implementation!) in the form of "source code", in a strict and unambiguous language. Unfortunately, all of the old languages, including C, are poorly defined. But it is still possible to translate those definitions into a source code form.
Another approach is to define a language via its operational semantics, often in the form of a simple (and inefficient) reference implementation.
Helgi Hrafn Gunnarsson has written the main answer but I thought it would be worth noting that you can effectively end up with dialects too.
The compilers should do the same thing with regards to whichever standard they support (which these days should be pretty much all the same version), but there are grey areas: the way in which compilers handle 'undefined' behaviour, for example. If the C specification says that the behaviour is undefined for a specific case then the compiler can do pretty much what it wants.
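A classic illustration (a minimal sketch, not from the original answer) is signed integer overflow, which the standard leaves undefined:
#include <limits.h>
#include <stdio.h>
int main(void)
{
    /* signed overflow is undefined: a compiler may wrap, trap, or assume it never happens */
    for (int i = INT_MAX - 1; i > 0; i += 2)
        ;
    puts("reached the end?");
    return 0;
}
Depending on the compiler and optimisation level, this can terminate, loop forever or trap, and every one of those outcomes is permitted.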
There are also examples of functions added to the libraries (and new libraries added) by the compiler makers to support specific platform traits, create a competitive advantage or simply to make life easier. The cynical might suggest that some of these are added to help lock people into a specific compiler too.
I would say that C as a language is not open source.
As pointed out by many, you can download GNU licensed compilers and libraries for free, but if you wanted to write your own C compiler, you would need to follow the ISO C standards, and ISO charge hard cash for the specification of the C language, which at the time of posting this is $178.
So really the answer depends on what elements you are interested in being free and open source.
I'm not sure what your definitions of "open source" are.
For the standardization process, it is possible for anyone to participate, but if you want to be able to vote then you will need to pay to join your national body (for instance, ANSI for the USA, BSI for the UK, AFNOR for France etc.). As a rule most standards body memberships are paid by corporations. That said, the process is fairly open. You can access discussion papers on the standards web site.
The standards themselves are not free either. The ISO pdf store currently sells the C standard for 198 swiss francs. Draft copies of the standard can be found easily for free.
There are plenty of open source implementations of both compilers and libraries.
