Argument treatment at the command line in C

The question put to me is this: even if we supply integer/float arguments at the command prompt, are they still treated as strings in C? I am not sure about this, so any help would be appreciated. Is this true or not in C, and why? And what about other languages such as Java or Python?

It is true, independent of the language, that the command line arguments to programs on Unix are strings. If the program chooses to interpret them as numbers, that is fine, but only because the program (or programmer) chose to do so. Similarly, a language's runtime support might convert the arguments passed by the OS into integer or float types, but the OS passes strings to that runtime (I know of no language that does this, but I don't claim to know all languages).
To see this, note that the ways to execute a program are the exec*() family of functions, and each of those takes a string which is the name of the program to be executed, and an array of strings which are the arguments to be passed to the program. Functions such as system() and popen() are built atop the exec*() family of functions — they also use fork(), of course. Even the posix_spawn() function takes an array of pointers to strings for the arguments of the spawned program.
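To make this concrete, here is a minimal sketch (assuming a POSIX system; the child program path "./child" is made up) showing that every argument handed to execv(), numeric-looking or not, is a plain string:

    #include <sys/types.h>
    #include <sys/wait.h>   /* waitpid() */
    #include <unistd.h>     /* fork(), execv() */
    #include <stdio.h>

    int main(void)
    {
        /* Even "42" and "3.14" are passed as plain C strings. */
        char *args[] = { "./child", "42", "3.14", NULL };

        pid_t pid = fork();
        if (pid == 0) {
            execv(args[0], args);       /* only returns on failure */
            perror("execv");
            return 1;
        }
        waitpid(pid, NULL, 0);          /* parent waits for the child */
        return 0;
    }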

It's not unlike mailing a letter: we all agree to use the common enclosure known as an envelope. Operating systems pass parameters to programs using the common item known as a string of characters. It's beyond the scope of the operating system to understand what the program wants to do with the parameters.
There are some exceptions; one that comes to mind is the passing of parameters to a Linux kernel module, which can be passed as items other than strings.

Basically, this is an issue of creating an interface between the operating system and the program, any program. Remember that programs are not always written in C, and you don't even know whether the language has things like float or int at all.
You want to be able to pass several arguments (with natural delimiters) that can easily encode arbitrary information. In C, strings can be of arbitrary length; the only constraint on them is that a zero byte signifies the end of the string. This makes them a highly flexible and natural way to pass arbitrary information to a program.
So you can never supply integer/float arguments directly to a program; the operating system (Unix, Linux, Windows, etc.) won't let you. No tool gives you that interface, in the same way that you can't pass a mouse click as an argument. All you can supply is a sequence of characters (see the parsing sketch below).
Since Unix and C were designed together, this convention is also part of the C programming language, and from there it worked its way into C++, Java, Python and most other modern programming languages, and likewise into Linux, Windows and most other operating systems.
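And the receiving side, as a minimal sketch: the program gets strings in argv and, if it wants numbers, performs the conversion itself (here with strtol() and strtod(); error handling kept deliberately short):

    #include <stdio.h>
    #include <stdlib.h>     /* strtol(), strtod() */

    int main(int argc, char *argv[])
    {
        if (argc < 3) {
            fprintf(stderr, "usage: %s <int> <float>\n", argv[0]);
            return 1;
        }

        /* argv[1] and argv[2] are strings; converting them is our choice. */
        long n = strtol(argv[1], NULL, 10);
        double x = strtod(argv[2], NULL);

        printf("n = %ld, x = %f\n", n, x);
        return 0;
    }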

Related

Execute a C program from within another C program as if it were a function call (in Windows)?

Is it possible to call a separate C program (a .exe file) from within a C program, as if it were a function?
I would like to be able to pass arguments of any kind (like any other function) to this separate program, and get the return value (so it can be used in the host program).
I imagine that the arguments can be passed by using int argc, char *argv[], but I don't know if it's possible to pass integers, arrays, pointers to structures and so on.
On the other hand, I've read that the return value from the main function is system specific. Since I'm using Windows, are there any limitations on this return value (type, size, etc.)? Can it be anything that could be used as a return value in a normal function?
Thanks!
What you describe is the basic premise of the Unix operating system. Unix was designed to allow accomplishing very complex tasks by chaining several commands, piping the (text) output of one command into the input of the next (this was pretty revolutionary back then).
As klutt already suggested, you can accomplish the same with a Windows executable. To his list, I would add learning how to redirect the input/output of a program to a file handle (a sketch follows below).
Windows PowerShell extended this concept to allow passing data types other than text to some special executables known as cmdlets; however, to write your own, you need support from the .NET Framework or .NET Core infrastructure, so you must do so from a managed language such as C# or C++/CLI.
Keep in mind that spawning a whole process is an extremely expensive operation (compared to simply calling a linked function), so there is some significant overhead you need to be aware of.
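As a hedged sketch of the "redirect the output to a file handle" idea: POSIX popen() runs another program through the shell and lets you read its text output as if it were a file (MSVC spells the same pair _popen()/_pclose(); the command used here is just an example):

    #include <stdio.h>

    int main(void)
    {
        /* Run another program and read its (text) output through a pipe.
           On Windows/MSVC, use _popen()/_pclose() instead. */
        FILE *p = popen("ls -l", "r");
        if (p == NULL) {
            perror("popen");
            return 1;
        }

        char line[256];
        while (fgets(line, sizeof line, p) != NULL)
            fputs(line, stdout);            /* forward the child's output */

        int status = pclose(p);             /* wait status of the child */
        printf("child wait status: %d\n", status);
        return 0;
    }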

Why does system() exist?

Many papers and such mention that calls to 'system()' are unsafe and unportable. I do not dispute their arguments.
I have noticed, though, that many Unix utilities have a C library equivalent. If not, the source is available for a wide variety of these tools.
While many papers and such recommend against goto, there are those who can make an argument for its use, and there are simple reasons why it's in C at all.
So, why do we need system()? How much existing code relies on it that can't easily be changed?
Sarcastic answer: because if it didn't exist, people would ask why that functionality didn't exist...
Better answer:
Much of this system functionality is not part of the C standard but is part of, say, the Linux spec, and Windows most likely has some equivalent. So if you're writing an app that will only ever be used in Linux environments, using these functions is not an issue and can actually be useful. If you're writing an application that must run on both Linux and Windows (or others), these calls become problematic because they may not be portable between systems. The key (IMO) is simply to be aware of the issues/concerns and program accordingly (e.g. use appropriate #ifdefs to protect the code, etc.).
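A minimal sketch of that kind of #ifdef protection (the sleep_ms wrapper name is made up; Sleep() and usleep() are the platform-specific functions being hidden behind it):

    #include <stdio.h>

    /* Hypothetical portability wrapper around non-standard sleep functions. */
    #if defined(_WIN32)
      #include <windows.h>
      static void sleep_ms(unsigned ms) { Sleep(ms); }
    #else
      #include <unistd.h>
      static void sleep_ms(unsigned ms) { usleep(ms * 1000u); }
    #endif

    int main(void)
    {
        sleep_ms(100);      /* same call on every platform */
        puts("done");
        return 0;
    }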
The closest thing to an official "why" answer you're likely to find is the C89 Rationale; section 4.10.4.5, The system function, reads:
The system function allows a program to suspend its execution temporarily in order to run another program to completion.
Information may be passed to the called program in three ways: through command-line argument strings, through the environment, and (most portably) through data files. Before calling the system function, the calling program should close all such data files.
Information may be returned from the called program in two ways: through the implementation-defined return value (in many implementations, the termination status code which is the argument to the exit function is returned by the implementation to the caller as the value returned by the system function), and (most portably) through data files.
If the environment is interactive, information may also be exchanged with users of interactive devices.
Some implementations offer built-in programs called "commands" (for example, date) which may provide useful information to an application program via the system function. The Standard does not attempt to characterize such commands, and their use is not portable.
On the other hand, the use of the system function is portable, provided the implementation supports the capability. The Standard permits the application to ascertain this by calling the system function with a null pointer argument. Whether more levels of nesting are supported can also be ascertained this way; assuming more than one such level is obviously dangerous.
Aside from that, I would say mainly for historical reasons. In the early days of Unix and C, system was a convenient library function that fulfilled a need several interactive programs had: as mentioned above, "suspend[ing] its execution temporarily in order to run another program". It's not well designed or suitable for any serious task (the POSIX requirements for it make it fundamentally non-thread-safe, it doesn't allow asynchronous events to be handled by the calling program while the other program is running, etc.), and its use is error-prone (safe construction of the command string is difficult) and non-portable (because the particular form of command strings is implementation-defined, though POSIX defines this for POSIX-conforming implementations).
If C were being designed today, it almost certainly would not include system, and would either leave this type of functionality entirely to the implementation and its library extensions, or would specify something more akin to posix_spawn and related interfaces.
Many interactive applications offer a way for users to execute shell commands. For instance, in vi you can do:
:!ls
and it will execute the ls command. system() is a function they can use to do this, rather than having to write their own fork() and exec() code.
Also, fork() and exec() aren't portable between operating systems; using system() makes code that executes shell commands more portable.
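A hedged sketch of both approaches on a POSIX system: the one-line system() call versus (roughly) the fork()/exec()/wait() dance it saves an interactive program from writing. Note that the real system() runs the command through /bin/sh, which the manual version below skips.

    #include <stdio.h>
    #include <stdlib.h>     /* system() */
    #include <sys/types.h>
    #include <sys/wait.h>   /* waitpid() */
    #include <unistd.h>     /* fork(), execvp() */

    int main(void)
    {
        /* The convenient way: hand a command string to the shell. */
        int rc = system("ls -l");
        printf("system() returned %d\n", rc);

        /* Roughly what system() hides (error handling trimmed, no shell). */
        pid_t pid = fork();
        if (pid == 0) {
            char *args[] = { "ls", "-l", NULL };
            execvp("ls", args);
            _exit(127);                 /* exec failed */
        }
        int status;
        waitpid(pid, &status, 0);
        return 0;
    }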

Are functions such as printf() implemented differently for Linux and Windows?

Something I still don't fully understand: standard C functions such as printf() and scanf(), which deal with sending data to standard output or getting data from standard input. Will the source code that implements these functions be different depending on whether we are using them on Windows or Linux?
I'm guessing the quick answer would be "yes", but do they really have to be different?
I'm probably wrong, but my guess is that the actual function code would be the same, while the lower-layer OS functions that eventually get called by these functions are different. So could any compiler compile these same C functions, with what gets linked afterwards (what these functions depend on to work at the lower layers) being what gives us the required behavior?
Will the source code that implements these functions be different depending on whether we are using them on Windows or Linux?
Probably. It may even be different on different Linuxes, and for different Windows programs. There are several distinct implementations of the C standard library available for Linux, and maybe even more than one for Windows. Distinct implementations will have different implementation code, otherwise lawyers get involved.
my guess is that the actual function code would be the same, while the lower-layer OS functions that eventually get called by these functions are different. So could any compiler compile these same C functions, with what gets linked afterwards (what these functions depend on to work at the lower layers) being what gives us the required behavior?
It is conceivable that standard library functions would be written in a way that abstracts the environment dependencies to some lower layer, so that the same source for each of those functions themselves can be used in multiple environments, with some kind of environment-specific compatibility layer underneath. Inasmuch as the GNU C library supports a wide variety of environments, it serves as an example of the general principle, though Windows is not among the environments it supports. Even then, however, the environment distinction would be effective even before the link stage. Different environments have a variety of binary formats.
In practice, however, you are very unlikely to see the situation you describe for Windows and Linux.
Yes, they have different implementations.
Moreover you might be using multiple different implementations on the same OS. For example:
MinGW ships with its own implementation of the standard library, which is different from the one used by MSVC.
There are many different implementations of C library even for Linux: glibc, musl, dietlibc and others.
Obviously, this means there is some code duplication in the community, but there are many good reasons for that:
People have different views on how things should be implemented and tested. This alone is enough to "fork" the project.
License: implementations put some restrictions on how they can be used and might require some actions from the end user (GPL requires you to share your code in some cases). Not everyone can follow those requirements.
People have very different needs. Some environments are multithreaded, some are not. printf might or might not need to use some thread-synchronization mechanism. Some people need locale support, some don't. All of this can bloat the code in the end, and not everyone is willing to pay for things they do not use. Even strerror is vastly different on different OSes.
Aforementioned synchronization mechanisms are usually OS-specific and work in specific ways. Same can be said about locale handling, signal handling and other things, including the actual data writing and reading.
Some implementations add non-standard extensions that can make your life easier. Not all of those make sense on other OSes. For example, glibc adds the 'e' mode specifier to fopen, which opens the file with the O_CLOEXEC flag; this doesn't make sense on Windows (see the sketch below).
Many complex things cannot be implemented in pure C and require some compiler-specific extensions. This can tie implementation to a limited number of compilers.
In the end, it is much simpler to have many C libraries than to try to create a one-size-fits-all implementation.
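As a tiny illustration of the glibc extension mentioned above (non-portable by design; the file path is just an example):

    #include <stdio.h>

    int main(void)
    {
        /* glibc extension: the "e" flag opens the file with O_CLOEXEC,
           so the descriptor is not inherited across exec(). Other C
           libraries may ignore or reject the extra flag. */
        FILE *f = fopen("/etc/hostname", "re");
        if (f == NULL) {
            perror("fopen");
            return 1;
        }

        char buf[128];
        if (fgets(buf, sizeof buf, f) != NULL)
            fputs(buf, stdout);

        fclose(f);
        return 0;
    }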
As you say, the higher-level parts of the implementation of something like printf, such as the code used to format the string from its arguments, can be written in a cross-platform way and shared between Linux and Windows. I'm not sure whether any C library actually does that, though.
But to interact with the hardware or use other operating system facilities (such as when printf writes to the console), the libc implementation has to use the OS's interface: the system calls. And these are very different between Windows and Unix-likes, and differ even among Unix-likes (POSIX specifies a lot of them, but there are OS-specific extensions). For example, here you can find system call tables for Linux and Windows.
There are two parts to functions like printf(). The first part parses the format string and assembles an array of characters ready for output. If this part is written in C, there's nothing preventing it from being common across all C libraries, and nothing preventing it from being different, so long as the standard definition of what printf() does is implemented. As it happens, different library developers have read the standard's definition of printf() and have come up with different ways of parsing and acting on the format string. Most of them have done so correctly.
The second part, the bit that outputs those characters to stdout, is where the differences come in. It depends on using the kernel system call interface; it's the kernel / OS that looks after input/output, and that is done in a specific way. The source code required to get the Linux kernel to output characters is very different to that required to get Windows to output characters.
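A rough sketch of that two-part split (not how any particular libc actually does it; the function name is made up): the formatting step is portable C, and only the final output step has to talk to the OS, here via the POSIX write() call (a Windows build would call WriteFile() at that point instead).

    #include <stdarg.h>
    #include <stdio.h>      /* vsnprintf() */
    #include <unistd.h>     /* write() -- POSIX */

    /* Hypothetical miniature printf: portable formatting, OS-specific output. */
    int my_printf(const char *fmt, ...)
    {
        char buf[512];

        va_list ap;
        va_start(ap, fmt);
        int len = vsnprintf(buf, sizeof buf, fmt, ap);   /* part 1: format */
        va_end(ap);

        if (len < 0)
            return len;                                  /* encoding error */
        if (len > (int)sizeof buf - 1)
            len = (int)sizeof buf - 1;                   /* output was truncated */

        return (int)write(STDOUT_FILENO, buf, (size_t)len);  /* part 2: output */
    }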
On Linux, it's usual to use glibc; this does some elaborate things with printf(), buffering the output characters until a newline is output, and only then making the Linux system call that writes them to the screen. This means that printf() calls from separate threads are neatly separated, each ending up on its own line. But the same program source code, compiled against another C library for Linux, won't necessarily do the same thing, and printf() output from different threads may end up all jumbled together and unreadable.
There's also no reason why the library that contains printf() should be written in C. So long as the same function calling convention as used by the C compiler is honoured, you could write it in assembler (though that'd be slightly mad!). Or Ada (calling convention might be a bit tricky...).
Will the source code that implements these functions be different
Let us try another point-of-view: competition.
No. Competitors in industry are not required by the C spec to share source code to issue a compliant compiler - nor would various standard C library developers always want to.
C does not require "open source".

Definition of the function printf in the C language

I have read that the C language does not include instructions for input and output, and that printf, scanf, getchar and putchar are actually functions.
Which primitive C language instructions are used to obtain the function printf, then?
Thank you.
If you want to use printf, you have to #include <stdio.h>. That file declares the function.
If you were thinking about how printf is implemented: printf might internally call any number of other functions and probably goes down to putc (also part of the C runtime) to write out the characters one by one. Eventually one of these functions really needs to write the character to the console, and how this is done depends on the operating system. On Linux, for example, printf might internally call the Linux write function; on Windows it might internally call WriteConsole.
The function printf is documented here; in fact, it is not part of the C language itself. The language itself does not provide a means for input and output. The function printf is defined in a library, whose declarations are made available with the preprocessor directive #include <stdio.h>.
No programming language provides true "primitives" for I/O. Any I/O "primitives" rely on lower abstraction levels, in this language or another.
I/O, at the lowest level, needs to access hardware. You might be looking at BIOS interrupts, hardware I/O ports, memory-mapped device controllers, or something else entirely, depending on the actual hardware your program is running on.
Because it would be a real burden to cater for all these possibilities in the implementation of the programming language, a hardware abstraction layer is employed. Individual I/O controllers are accessed by hardware drivers, which in turn are controlled by the operating system, which provides I/O services to the application developer through a defined, abstract API. These may be accessed directly (e.g. by user-space assembly), or wrapped further (e.g. by the implementation of a programming language's interpreter, or by standard library functions).
Whether you are looking at "commands" like (bash) echo or (Python) print, or library functions like (Java) System.out.println() or (C) printf() or (C++) std::cout, is just a syntactic detail: Any I/O is going through several layers of abstraction, because it is easier, and because it protects you from all kinds of erroneous or malicious program behaviour.
Those are the "primitives" of the respective language. If you're digging down further, you are leaving the realm of the language, and enter the realm of its implementation.
I once worked on a C library implementation myself. Ownership of the project has passed on since, but basically it worked like this:
printf() was implemented by means of vfprintf() (as was, eventually, every function of the *printf() family).
vfprintf() used a couple of internal helpers to do the fancy formatting, writing into a buffer.
If the buffer needed to be flushed, it was passed to an internal writef() function.
This writef() function needed to be implemented differently for each target system. On POSIX, it would call write(). On Win32, it would call WriteFile(). And so on.
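A hedged sketch of what such a writef()-style split might look like (the name and structure follow the description above, not any real library's source):

    /* Internal output primitive: same interface everywhere,
       different system call underneath. */
    #if defined(_WIN32)
      #include <windows.h>

      static int writef(const char *buf, unsigned len)
      {
          DWORD written = 0;
          HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
          if (!WriteFile(h, buf, len, &written, NULL))
              return -1;
          return (int)written;
      }
    #else   /* assume POSIX */
      #include <unistd.h>

      static int writef(const char *buf, unsigned len)
      {
          return (int)write(STDOUT_FILENO, buf, len);
      }
    #endif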

What 'mark' does a language leave on a library such that we need language bindings?

What mark does a language leave on a compiled library, such that we need language bindings if we want to call its functions from a different language?
Object code looks 'language-free' to me.
While learning OpenGL in C in a Linux environment, I have come across language bindings.
Binding provides a simple and consistent way for applications to present and interact with data.
Source: The tag under your question
I'm guessing that you're either young or haven't been programming for more than a decade or so.
Object code should look language free, but it ain't, due to history. Back in the 1970s and 1980s, on Intel 80x86 and Motorola 680x0 CPUs, function call arguments were passed on the stack. In the 'Pascal' convention, the number of arguments was fixed and the called function removed them from the stack before returning. In the 'C' convention, the number of arguments was variable (e.g. printf), so the calling code had to remove them when the function returned. This cost 2 extra bytes per function call, which is nothing today but was significant back then, when PCs only came with 128K or so of RAM. So Microsoft chose to use the Pascal calling convention for the Windows API, even though it was written in C. If your object code called a Windows function with the C convention by mistake, kaboom. This is why the header files are still cluttered up with WINAPI and _stdcall and _fastcall and whatnot.
Starting in the 1990s, operating system authors realized this was silly and started imposing standard calling conventions on everyone. The C convention could handle both cases, so it got used everywhere. With the moves to Mac OS X, 64-bit Windows, and ARM, we are finally getting language-free object code.
Now, OpenGL was designed to be used from C and Fortran. (Which was in the 1990s still an important language for scientific calculations and visualization.) Both languages have integers, floating point numbers, and arrays of various sized ints/floats. C has structs but Fortran doesn't, and I suspect this is a major reason why the OpenGL API never uses any structs. There are also differences in the memory layout of 2D or higher dimension arrays between C and Fortran, and again note that the OpenGL API never specifies 2D arrays, only 1D.
A C API works for most languages. This is partly because C is 'portable assembler' that maps onto almost any CPU and operating system. It's also because most other programming languages in common use are either supersets of C (C++, Objective-C) or implemented in C themselves (Python, Perl, Ruby), so they can be made to call the OpenGL C API reasonably easily.
Java and C# have more problems, because they define their own object code, so to speak, and memory access is more tightly controlled. The C/OpenGL notion of 'here is a pointer to a block of memory, do what you like with it' breaks the security model of the JVM/CLR. So you end up having to use Java NIO ByteBuffer objects instead of just passing arrays.
A lot of it also comes down to the skill of the language binding designer. For one example, Python-OpenGL by Mike Fletcher is a really good binding. All the functions and constants have exactly the same names, so a lot of code can be just copied from C and pasted into Python. Python doesn't have C style arrays directly, but the language binding will silently translate any Python sequences/tuples you pass as "arrays" into the underlying C format for you. It feels natural for a Python programmer and still exposes the full capabilities of OpenGL.
For a bad example, JOGL is a pain in the arse. There's no automatic conversion from Java arrays to C, so you have to futz around with NIO ByteBuffers yourself. It's so annoying that it's actually easier to use glBegin..glEnd blocks. And extra array-offset arguments got added to a lot of OpenGL functions, so your code no longer looks the same as C/C++ and you waste a lot of time sticking ,0 on the end of function calls. Some of this is due to the JVM as mentioned before, but a lot of it is just bad design by (I suspect) somebody who never actually wrote much OpenGL themselves.
A long and rambling answer to a vague question.
Well, all you have to do is think about the myriad of calling conventions in C and C++. In order to prevent serious mishaps, the compiler mangles the function names based on calling convention so that you do not accidentally call a stdcall function using fastcall conventions. Each language has its own set of superfluous details like this that a language independent API should never have to burden itself with. Language bindings serve as an adapter/bridge that separates the language-specific stuff from the standardized API, filling in the gaps wherever necessary.
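As a small C-side illustration: an API header that is meant to be bound from other languages typically nails down the calling convention and sticks to plain C types, roughly like this (a hedged sketch; the MYLIB_* names are made up, but the pattern mirrors what headers such as GL/gl.h do with APIENTRY):

    /* mylib.h -- hypothetical C API designed to be easy to bind. */
    #ifndef MYLIB_H
    #define MYLIB_H

    #ifdef _WIN32
      #define MYLIB_CALL __stdcall       /* fix the calling convention explicitly */
    #else
      #define MYLIB_CALL                  /* default C convention elsewhere */
    #endif

    #ifdef __cplusplus
    extern "C" {                          /* no C++ name mangling across the boundary */
    #endif

    /* Only plain ints, floats and pointers -- no structs, no C++ types. */
    void MYLIB_CALL mylib_draw_points(int count, const float *xy);

    #ifdef __cplusplus
    }
    #endif

    #endif /* MYLIB_H */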
The OpenGL API is generally implemented in a single language (C) and programs written in other languages interface with the system's implementation through language bindings. OpenGL uses null-terminated ASCII strings for GLSL and has numerous functions that use pointers, things that make perfect sense for an API that is designed to be implemented in C. In Java, strings are not null-terminated and they are UTF-16 encoded; you can see why a bridge is needed. The Java GL bindings take care of string conversion and alter glVertexPointer (...)-like functions to fit Java's conditions for "pointing to" contiguous blocks of memory.
