Related
I have been trying to find a documentation explaining the use of escape sequences but have not been successful.
For instance, I know that I can use
printf("%c[%d;%df",0x1B, y, x);
for placing the cursor to a certain position in the console.
But where I would find an explanation for this and other escape sequences. As said, I've been looking through the internet, there are a lot of articles explaining that you can escape sequences for different things but not found one with a list of available functions.
It would be great if anyonw knew where I can find this. Thanks for all answers!
Update after some answer:
I am aware of the wikipedia page. It e.g. mentions the above possibility but not really explained in the table of CSIs.
What I am looking for is something like
ESC[<l>;<c>f => move cursor to line "l" and column "c"
ESC[<l>;<c>H => move cursor to line "l" and column "c"
and explanation of other ESC...
I am not looking for formatting possibilities of printf (but thanks anyway for all answers)
where I would find an explanation for this and other escape sequences
Wikipedia has a quite extensive list https://en.wikipedia.org/wiki/ANSI_escape_code . The standard is ECMA-48 (and it's horrible to read), but it's old, and I think there are some new escape sequences "in the wild".
but not found one with a list of available functions.
There is no list, or the closest you can get is ECMA-48. Each and every terminal (well, nowadays, terminal emulators) has different support for ANSI escape sequences, and this list is not strict, developers add support for new escape sequences, and terminals sometimes support their own escape sequences. There are endless terminals and emulators and versions of them. The terminfo database was created to deal with compatibility issues between ANSI escape codes between terminals.
The escape sequences are different for each terminal type as a general rule. In the past, each terminal brand used (and published) their own set of escape sequences and they were in generla incompatible.
With time, DEC (Digital Equipment Corporation) imposed their set for several reasons:
Their terminals where the most extended and popular ones (vt100, vt200, vt220, vt420, etc.)
All their models shared the same specification.
PDP-11 and later the VAX mainly where sold with these terminals.
For these reasons, the escape sequences of DEC terminals became an standard and slowly all the software adapted to them.
At the same time, some software tools started to use full screen applications, and addressed the problem of using different terminals. This resulted in the unix environments in a library (curses) that allowed the user to have almost any terminal type with addressable cursor and display features to be possible to use with almost any application. Curses was written to support vi(1) but later, it has been successfuly used in many other programs.
Escape sequences became standarized, and the standard (ANSI X3.64 (ISO 6429)) became a de-facto standard in almost any application that was not designed using the curses library. This standard covers just a subset of the full set of escapes that DEC terminals implement (mainly because the sequences to multiplex several sessions in the same terminal is a patented ---and not published--- set of commands, protected by copyright rules).
ECMA has also standarized escape sequences, as answered in another answer to this question.
But, if you actually want to be completely terminal agnostic, you had better to use some curses-like (e.g. ncurses, which is also opensource) library in order to cope with the large database of terminals that have different and incompatible escape sequences. For example, Hewlett Packard terminals have a completely different language for expressing escape codes, and so, escape sequences for HP terminals are completely different than the ones from DEC.
Look at ANSI wikipedia page for a medium to full list of these escapes, and for other links related to documentation of these escapes.
Something I still don't fully understand. For example, standard C functions such as printf() and scanf() which deal with sending data to the standard output or getting data from the standard input. Will the source code which implements these functions be different depending on if we are using them for Windows or Linux?
I'm guessing the quick answer would be "yes", but do they really have to be different?
I'm probably wrong , but my guess is that the actual function code be the same, but the lower layer functions of the OS that eventually get called by these functions are different. So could any compiler compile these same C functions, but it is what gets linked after (what these functions depend on to work on lower layers) is what gives us the required behavior?
Will the source code which implements these functions be different
depending on if we are using them for Windows or Linux?
Probably. It may even be different on different Linuxes, and for different Windows programs. There are several distinct implementations of the C standard library available for Linux, and maybe even more than one for Windows. Distinct implementations will have different implementation code, otherwise lawyers get involved.
my guess is that the actual function code be the same, but the lower
layer functions of the OS that eventually get called by these
functions are different. So could any compiler compile these same C
functions, but it is what gets linked after (what these functions
depend on to work on lower layers) is what gives us the required
behavior?
It is conceivable that standard library functions would be written in a way that abstracts the environment dependencies to some lower layer, so that the same source for each of those functions themselves can be used in multiple environments, with some kind of environment-specific compatibility layer underneath. Inasmuch as the GNU C library supports a wide variety of environments, it serves as an example of the general principle, though Windows is not among the environments it supports. Even then, however, the environment distinction would be effective even before the link stage. Different environments have a variety of binary formats.
In practice, however, you are very unlikely to see the situation you describe for Windows and Linux.
Yes, they have different implementations.
Moreover you might be using multiple different implementations on the same OS. For example:
MinGW is shipped with its own implementation of standard library which is different from the one used by MSVC.
There are many different implementations of C library even for Linux: glibc, musl, dietlibc and others.
Obviously, this means there is some code duplication in the community, but there are many good reasons for that:
People have different views on how things should be implemented and tested. This alone is enough to "fork" the project.
License: implementations put some restrictions on how they can be used and might require some actions from the end user (GPL requires you to share your code in some cases). Not everyone can follow those requirements.
People have very different needs. Some environments are multithreaded, some are not. printf might need or might not need to use some thread synchronization mechanisms. Some people need locale support, some don't. All this can bloat the code in the end, not everyone is willing to pay for things they do not use. Even strerror is vastly different on different OSes.
Aforementioned synchronization mechanisms are usually OS-specific and work in specific ways. Same can be said about locale handling, signal handling and other things, including the actual data writing and reading.
Some implementations add non-standard extensions that can make your life easier. Not all of those make sense on other OSes. For example glibc adds 'e' mode specifier to open file with O_CLOEXEC flag. This doesn't make sense for Windows.
Many complex things cannot be implemented in pure C and require some compiler-specific extensions. This can tie implementation to a limited number of compilers.
In the end, it is much simpler to have many C libraries, than trying to create a one-size-fits-all implementation.
As you say the higher level parts of the implementation of something like printf, like the code used to format the string using the arguments, can be written in a cross-platform way and be shared between Linux and Windows. I'm not sure if there's a C library that actually does it though.
But to interact with the hardware or use other operating system facilities (such as when printf writes to the console), the libc implementation has to use the OS's interface: the system calls. And these are very different between Windows and Unix-likes, and different even among Unix-likes (POSIX specifies a lot of them but there are OS specific extensions). For example here you can find system call tables for Linux and Windows.
There are two parts to functions like printf(). The first part parses the format string, and assembles an array of characters ready for output. If this part is written in C, there's no reason preventing it being common across all C libraries, and no reason preventing it being different, so long the standard definition of what printf() does is implemented. As it happens, different library developers have read the standard's definition of printf(), and have come up with different ways of parsing and acting on the format string. Most of them have done so correctly.
The second part, the bit that outputs those characters to stdout, is where the differences come in. It depends on using the kernel system call interface; it's the kernel / OS that looks after input/output, and that is done in a specific way. The source code required to get the Linux kernel to output characters is very different to that required to get Windows to output characters.
On Linux, it's usual to use glibc; this does some elaborate things with printf(), buffering the output characters in a pipe until a newline is output, and only then calling the Linux system call for displaying characters on the screen. This means that printf() calls from separate threads are neatly separated, each being on their own line. But the same program source code, compiled against another C library for Linux, won't necessarily do the same thing, resulting in printf() output from different threads being all jumbled up and unreadable.
There's also no reason why the library that contains printf() should be written in C. So long as the same function calling convention as used by the C compiler is honoured, you could write it in assembler (though that'd be slightly mad!). Or Ada (calling convention might be a bit tricky...).
Will the source code which implements these functions be different
Let us try another point-of-view: competition.
No. Competitors in industry are not required by the C spec to share source code to issue a compliant compiler - nor would various standard C library developers always want to.
C does not require "open source".
The compiler (Visual C++) always corrects me when I type flushall();
warning C4996: 'flushall': The POSIX name for this item is deprecated. Instead, use the ISO C++ conformant name: _flushall.
They both work for me, is there any difference?
This warning is nonsensical; flushall is not POSIX, and to my knowledge, never was. The correct ISO C way to flush all open stdio streams is fflush(NULL);.
To shed some more light on the issue, I believe this is a generic warning message Visual C uses for functions which are not part of the C standard, but which their standard library has for some minimal degree of compatibility. Most of these functions are things like open and read, and correspond in some vague way to the POSIX functions by the same name, and can be thought of as the lower-level primitives underlying stdio, etc. Some, like flushall, are not actually from POSIX, but just based on oddball functions copied from various legacy unices.
Since these functions are not actually part of the C (or C++) standards, and the C standard (and possibly also the C++ standard...?) reserves for the application identifiers which are not explicitly defined or reserved in the standard, it's non-conformant to expose in the standard headers names like open or flushall. POSIX gets around this by using feature test macros by which an application can make the namespace guarantees it wants from the implementation explict, but Microsoft instead took an approach of deprecating the POSIX-like names and offering an alternative set of functions, prefixed with underscores, which makes them part of the namespace reserved for the implementation. The warning is recommending that you use these Microsoft-specific names, so that your code would still compile if you turned on the options (probably controlled by macros of compiler switches) for a strict namespace conformance in the headers.
Unfortunately, writing code the Microsoft-recommended way with the underscore-prefixed names renders your code completely non-portable. Since these names are in the namespace reserved for the implementation, other platforms might happen to use the same names for functions which do completely different things, and being as they're reserved, you would not be allowed to replace or #define around them to remap them to other functions.
In short: I would find the warning flag that's generating this warning, and turn it off. Along with the one that spews FUD that such-and-such standard function is "insecure" and you should use the _s-suffixed function instead. But if the only function you need is fflushall(), simply replace it with fflush(NULL); and you're good to go.
This goes back to late 1988, back when Dave Cutler's team started development on Windows NT. The design goal was to create an operating system with three distinct api layers, Winapi, OS/2 and POSIX. POSIX was very new back then, the very first version was released in 1988 as well. It got included because it was listed as a requirement in a USA government standard, FIPS 151-2.
Back then the Unix Wars started to get serious. Pretty vicious with intentional differences in competing implementations. Main combatants were the Open Software Foundation, an industry group that feared Sun's increasing dominance and Unix International, a group around AT&T. That dust didn't settle until 1993 and the formation of the COSE alliance, now Open Group.
This makes the exact origin of flushall() pretty hard to trace, it was very much the wild wild west back then. Microsoft was also in the C compiler business so of course they released the tools to build programs that ran on the POSIX layer, including headers that declared the functions. So surely the time that flushall came about in their toolset. I don't have a copy anymore to check, destroyed by a leaky roof.
The NT Posix and OS/2 api layers never gained much traction, significantly overshadowed by Win32's runaway success. They were entirely dropped in 2001 with the XP release. Which, I think, was also around the time that they started deprecating the original Posix function names. Clearly it didn't make much sense anymore to keep these declarations around when they were no longer supported by the operating system. The Posix names are fairly awkward, short lower-case names in the global namespace cause many accidents.
They were however never completely removed, still documented to this day. Which is why you got the warning. Rather then choosing, do favor standard C library names, fflush().
Does C (or any other low-level language, for that matter) even have source, or is the compiler the part that "does all the work", including parsing? If so, couldn't different compilers have different C dialects? Where does the stdlib factor into this? I would really like to know how this works.
The C language is not a piece of software but a defined standard, so one wouldn't say that it's open-source, but rather that it's an open standard.
There are a gazillion different compilers for C however, and many of those are indeed open-source. The most notable example is GCC's C compiler, which is all under the GNU General Public License (GPL), an open-source license.
There are more options. Watcom is open-source, for instance. There is no shortage of open-source C compilers, but without a doubt the most widespread one, at least in the non-Windows world, is GCC.
For Windows, your best bet is probably Watcom or GCC by using Cygwin or MinGW.
C is a standard which specifies how C compilers should generate programs.
C itself doesn't have any source code, just like a musical note doesn't have any plastic.
Some C compilers, such as GCC, are open source.
C is just a language, and a standardised one at that, too. It pretty much is the compiler that "does all the work". Different compilers did have different dialects; before the the C99 ANSI standard, you had things like Borland C and other competing compilers, that implemented the C language in their own fantastic ways.
stdlib is just an agreed-upon collection of standard libraries that are required to be present in any ANSI C implementation.
To add on to the other great answers:
Regarding different dialects -- there are some additional features added to C that are compiler specific. You can provide the command line flag -std=... to gcc to specify the C standard that you want to use, each has slight variations/additions to syntax, the most common is probably c99.
Each compiler tends to implement a few different extras, for example, typeof() is not in the C standard and so compilers do not have to implement this but nevertheless it is useful and most compilers provide it. Here is a list of gcc C extensions
The stdlib is a set of functions specified in the C standard. Much like compilers, stdlib can have different implementations. The GNU implementation is open source, as is gcc, but there are other compilers and could be other implementations of stdlib that are closed source.
The Compiler would determine all the mappings from C to Assembly etc... but as far as someone owning it.....noone really owns C however the ANSI/ISO determines the standards
GCC's C compiler is written in C. So we know there are at least one C compiler written in C.
GNU's stdlib (glibc) is also written in C (stdio.h, stdlib.h). But it also has some parts written in assembly language.
A really good question. There is a way to define a language standard (not the implementation!) in a form of a "source code", in a strict and unambigous language. Unfortunately, all of the old languages, including C, are poorly defined. But it is still possible to translate that definitions into a source code form.
Another approach is to define a language via its operational semantics, often in a form of a simple (and unefficient) reference implementation.
Helgi Hrafn Gunnarsson has written the main answer but I thought it would be worth noting that you can effectively end up with dialects too.
The compilers should do the same thing with regards to whichever standard they support (which these days should be pretty much all the same version) but there are grey areas. The way in which the compilers work for 'undefined' functionality for example. If the C specification says that the behaviour is undefined for a specific case then the compiler can do pretty much what it wants.
There are also examples of functions added to the libraries (and new libraries added) by the compiler makers to support specific platform traits, create a competitive advantage or simply to make life easier. The cynical might suggest that some of these are added to help lock people into a specific compiler too.
I would say that C as a language is not open source.
As pointed out by many, you can download GNU licensed compilers and libraries for free, but if you wanted to write your own C compiler, you would need to follow the ISO C standards, and ISO charge hard cash for the specification of the C language, which at the time of posting this is $178.
So really the answer depends on what elements you are interested in being free and open source.
I'm not sure what your definitions of "open source" are.
For the standardization process, it is possible for anyone to participate, but if you want to be able to vote then you will need to pay to join your national body (for instance, ANSI for the USA, BSI for the UK, AFNOR for France etc.). As a rule most standards body memberships are paid by corporations. That said, the process is fairly open. You can access discussion papers on the standards web site.
The standards themselves are not free either. The ISO pdf store currently sells the C standard for 198 swiss francs. Draft copies of the standard can be found easily for free.
There are plenty of open source implementations of both compilers and libraries.
The C standard library is notoriously poor when it comes to I/O safety. Many functions have buffer overflows (gets, scanf), or can clobber memory if not given proper arguments (scanf), and so on. Every once in a while, I come across an enterprising hacker who has written his own library that lacks these flaws.
What are the best of these libraries you have seen? Have you used them in production code, and if so, which held up as more than hobby projects?
I use GLib library, it has many good standard and non standard functions.
See https://developer.gnome.org/glib/stable/
and maybe you fall in love... :)
For example:
https://developer.gnome.org/glib/stable/glib-String-Utility-Functions.html#g-strdup-printf
explains that g_strdup_printf is:
Similar to the standard C sprintf() function but safer, since it calculates the maximum space required and allocates memory to hold the result.
This isn't really answering your question about the safest libraries to use, but most functions that are vulnerable to buffer overflows that you mentioned have safer versions which take the buffer length as an argument to prevent the security holes that are opened up when the standard methods are used.
Unless you have relaxed the level of warnings, you will usually get compiler warnings when you use the deprecated methods, suggesting you use the safer methods instead.
I believe the Apache Portable Runtime (apr) library is safer than the standard C library. I use it, well, as part of an apache module, but also for independent processes.
For Windows there is a 'safe' C/C++ library.
You're always at liberty to implement any library you like and to use it - the hard part is making sure it is available on the platforms you need your software to work on. You can also use wrappers around the standard functions where appropriate.
Whether it is really a good idea is somewhat debatable, but there is TR24731 published by the C standard committee - for a safer set of C functions. There's definitely some good stuff in there. See this question: Do you use the TR 24731 Safe Functions in your C code?, which includes links to the technical report.
Maybe the first question to ask is if your really need plain C? (maybe a language like .net or java is an option - then e.g. buffer overflows are not really a problem anymore)
Another option is maybe to write parts of your project in C++ if other higher level languages are not an option. You can then have a C interface which encapsulates the C++ code if you really need C.
Because if you add all the advanced functions the C++ standard library has build in - your C code would only be marginally faster most times (and contain a lot more bugs than an existing and tested framework).