C program: how can the result of the first few lines of code change based on what happens much later in the program? - c

I'm writing some C code, compiled with gcc using the -std=c89 switch, on a Linux box. This C code communicates with an Oracle database using OCI drivers called through the OCILIB library. After downloading the necessary data from the database, the C program calls a C function (my_function) that performs a lot of complex math. The program flow looks like:
int main (void) {
OCI_Connection *cn;
OCI_Statement *st;
OCI_Resultset *rs;
...
/* FIRST CALL TO DB */
OCI_Initialize(NULL, NULL, OCI_ENV_DEFAULT);
cn = OCI_ConnectionCreate(...);
st = OCI_StatementCreate(cn);
OCI_Prepare(st, ...);
OCI_Bindxxx(st, ...);
OCI_Execute(st);
printf(...); /* verify data retrieved from database is correct */
/* SECOND CALL TO DB */
OCI_Prepare(st, ...); /* different prepare stmt than above */
OCI_Bindxxx(st, ...);
OCI_Execute(st);
printf(...); /* verify data retrieved from database is correct */
/* THIRD CALL TO DB */
OCI_SetFetchSize(st, 200);
OCI_Prepare(st, ...);
OCI_Bindxxx(st, ...);
OCI_Execute(st);
rs = OCI_GetResultset(st);
...
printf(...); /* verify data retrieved from database is correct */
OCI_Cleanup();
return EXIT_SUCCESS;
my_function(...);
}
If I run the program as shown, the printf statements all show that the correct data has been downloaded from the database into the C program. However, my_function has not executed.
If I then move the return EXIT_SUCCESS line of code from before my_function() to AFTER my_function(), re-compile the code and run it, the printf statements show that the data from the 1st call to the database is saved correctly in the C program, but the 2nd call's data is incorrect, and the printf statement from the 3rd call appears not to have done anything.
There are no errors or warnings reported at compile and run time.
I'm not that experienced in C (or OCILIB), but for those who are, is there a logical explanation how the placement of return EXIT_SUCCESS in the code can interact with code located much before it, to cause this?
In my simple mind, I think of the code as executing one line at a time, so if the code works up to line 123 (for example), a change to the code at line 456 shouldn't affect the results up to line 123 (e.g. when comparing before-versus-after the change to line 456). Am I missing something?

Another possibility is that your code is relying on the value of uninitialized variables, and that by adding the return before calling my_function() you are changing the way the compiler lays out variables in memory.
For example, an optimizing compiler might notice that the call to my_function() is unreachable because of the return, and so avoid setting aside space for a temporary variable it would otherwise need for the my_function() call.
Make sure your compiler is set to warn about use of uninitialized variables.

I'm guessing that your printf statements don't end in newlines; in that case, the output may sit in the stdio buffer and not be flushed until main ends. This gives my_function the opportunity to corrupt the buffered stdout data in the meantime. Use newlines or fflush and I'll bet this apparently anomalous behavior will cease.
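To rule out buffering as the cause, each diagnostic can be flushed as soon as it is printed. The sketch below shows one way to do that; `checkpoint` is a hypothetical helper name, not something from OCILIB or the original program.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: print a diagnostic and flush it immediately, so the
   text has left the stdio buffer before any later code runs.
   Returns the number of characters written, or -1 on error. */
int checkpoint(FILE *out, const char *fmt, ...)
{
    va_list args;
    int written;

    va_start(args, fmt);
    written = vfprintf(out, fmt, args);
    va_end(args);

    /* fflush pushes buffered output to the underlying file or terminal. */
    if (written < 0 || fflush(out) != 0) {
        return -1;
    }
    return written;
}
```

Replacing each verification printf with a call such as `checkpoint(stdout, "first query done\n");` would make the output appear at the point of the call rather than at program exit.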

Rewritten answer:
If your code is behaving as differently as you describe, it suggests that the version with the return before your call doesn't include your function in its executable image (the unused code is optimized out), which changes the memory layout. This then might be affecting your code if you have serious memory management issues.
Did you try reprinting the first data after the second lot of database activity, to ensure that you still had the information you thought you had read successfully? Was your printing of the information retrieved thorough and complete?
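One way to act on that suggestion is to fingerprint the first result right after it is read and check the fingerprint again later; a mismatch would point to something overwriting the data in between. This is only a sketch: `buffer_checksum` is a hypothetical helper, and the djb2-style hash is just one reasonable choice.

```c
#include <stddef.h>

/* Hypothetical helper: djb2-style checksum over an arbitrary buffer.
   Identical buffers give identical checksums; a modified buffer will
   almost always give a different one. */
unsigned long buffer_checksum(const void *data, size_t len)
{
    const unsigned char *bytes = (const unsigned char *)data;
    unsigned long hash = 5381UL;
    size_t i;

    for (i = 0; i < len; i++) {
        hash = hash * 33UL + bytes[i];
    }
    return hash;
}
```

Record the checksum of the first query's data straight after the first printf, then recompute and compare it after the second and third database calls.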

Related

Do programs look over all the code and then run, or does it run line by line?

To put things in perspective, here's a simple program in C that asks the user to input their name, and then the program says "Hello, [your name]":
void PrintName(string name);
int main(void)
{
printf("Your name: ");
string s = GetString();
PrintName(s);
}
void PrintName(string name)
{
printf("Hello, %s\n", name);
}
Inside main, I've written PrintName(s); however, PrintName is not defined until the end.
My question: If a program runs things line by line, then when it first encounters PrintName(s), wouldn't it fail to understand what PrintName is (because the function definition comes after, not before) and thus not output a name?
Remember that programming languages are there to make our lives easier; writing binary machine code (1's and 0's) is a pain, so languages let us express ourselves more concisely.
Some languages are interpreted (the interpreter goes through the code line by line, roughly speaking), some are statically compiled (a compiler goes through all the code and generates an executable when it's done), and some do something else entirely.
C is statically compiled.
void PrintName(string name);
This is a function declaration. This tells the compiler "a function called PrintName that takes a string argument exists".
So when your program is being compiled and the identifier PrintName is hit, the compiler knows about it, can check you're giving it a string (amongst other things) and carries on happily.
The compiler later comes across the definition of PrintName and uses that to generate the executable.
If you declare a function, but do not define it, you'll later get an error along the lines of undefined reference to MyFunction, which is saying "you told me this function definition existed somewhere, but I can't find it".
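The declaration/definition split described above can be seen in a small sketch. The function names here (`format_greeting`, `greet_length`) are made up for illustration and use plain `char *` instead of the question's `string` type.

```c
#include <stdio.h>

/* Declaration (prototype): gives the compiler the name, return type and
   parameter types, so the call further down can be type-checked. */
int format_greeting(char *out, size_t size, const char *name);

/* This caller appears BEFORE the definition of format_greeting.
   It compiles because the declaration above is already visible. */
int greet_length(const char *name)
{
    char buf[64];
    return format_greeting(buf, sizeof buf, name);
}

/* Definition: the actual body, found later in the file. The compiler and
   linker connect the call above to this code before the program ever runs. */
int format_greeting(char *out, size_t size, const char *name)
{
    return snprintf(out, size, "Hello, %s\n", name);
}
```

Deleting the definition while keeping the declaration would still compile this file, but linking would fail with an undefined reference error, as the answer describes.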
You are getting confused between compiling and running code.
Your code is compiled first when you build it, so the compiler already understands what PrintName is at that point. Later you run the compiled code, which calls PrintName. So it already "knows" what it is from when you compiled it.
The program is not "executed", so to speak, in the language you wrote it in. Technically speaking, the only real execution language is binary; computer chips don't understand source code, they only understand on or off.
When you write your code, it is compiled into lower level languages like assembly, and eventually to binary. All the connections you make in your code are created at that time, before execution.
C programs are compiled before you're able to run them. This includes reading all of the sources and building a graph with all functions and entry points. In general, it is possible to call or otherwise use a function defined anywhere, but in C you should declare a forward prototype if your callee function is defined below the place it is used.
So, C programs are not executed line-by-line.
Some scripting languages have a notion of line-by-line execution, but really it is statement-by-statement. For example, an if line cannot be executed on its own, because it only controls the program flow. So the interpreter must read the entire body of a statement (see below) before it can execute it correctly.
if (condition) {
...
} <- here it starts executing the entire statement
function foo(bar) {
while (true) {
bar = bar - bar
} <- not here!
} <- last brace closed, statement is complete

Is it possible to print on a file the output of a function?

I'm writing a tool which uses various void functions to analyze some elements that the program receives as input, and prints the results of the analysis to stdout (using printf()). Obviously that means that I use printf() a lot, because there are lots of things that need to be printed. I would like everything that is printed to stdout to be printed to a log file as well. But adding an fprintf() for each printf() call makes the program much longer, confusing and disorganized. Is there a way to save the output of a function directly to a file? So if I have a void function, for example analyze(), which contains lots of printf() calls, the output of analyze() will be printed to stdout (to the shell) and also to a file. I tried to look for it but without any results.
Thanks for help guys!
You could write a function that will issue a printf and fprintf and call that function instead.
For instance, here's a skeleton, obviously you'll have to fill in the ...:
void myPrintf(...)
{
printf(...);
fprintf(...);
}
Then in your code, instead of printf, you could call:
myPrintf(...);
You could also do this with a macro.
If you need to turn off logging, you could do so by simply pulling out the fprintf in that function and leave the rest of your code unchanged.
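The skeleton above can be filled in with the standard variadic-argument functions. This is a minimal sketch under some assumptions: the names `tee_printf` and `set_log_file` are hypothetical, and the log file is held in a file-scope variable for brevity.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical log destination; NULL means "logging switched off". */
static FILE *log_file = NULL;

void set_log_file(FILE *fp)
{
    log_file = fp;
}

/* Drop-in replacement for printf: writes to stdout and, if a log file has
   been set, writes the same formatted text there too.
   Returns what vprintf returned for the stdout copy. */
int tee_printf(const char *fmt, ...)
{
    va_list args;
    int written;

    va_start(args, fmt);
    written = vprintf(fmt, args);
    va_end(args);

    if (log_file != NULL) {
        /* The argument list must be restarted before it is walked again. */
        va_start(args, fmt);
        vfprintf(log_file, fmt, args);
        va_end(args);
    }
    return written;
}
```

With this in place, analyze() would call `tee_printf(...)` wherever it used to call printf, and calling `set_log_file(NULL)` turns the file copy off without touching the rest of the code.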
It's generally better to have a specialised logging function, rather than peppering your code with direct printf/fprintf calls. That logging function can write to stdout, a log-file, whatever you want, and can be different in debug/release, etc.

Automatically running code at the start of every C function

I have an almost identical question to "How to add code at the entry of every function?", but for C:
As I'm maintaining someone else's large undocumented project, I wish to have code similar to
static int C0UNT_identifier_not_used_anywhere_else = 0;
printf("%s%s:%d#%d", __func__, strrchr(__FILE__,'/'), __LINE__, ++C0UNT_identifier_not_used_anywhere_else);
to run on entry of every function, so that I
have a log of what calls what, and
can tell, on which nth call to a function it breaks.
The existing code comprises hundreds of source files, so it is unfeasible to put a macro e.g.
#define ENTRY_CODE ...
...
int function() {
ENTRY_CODE
...
}
in every function. I am also not using DevStudio, Visual Studio or other compiler providing __cyg_profile_func_enter or such extensions.
Optionally, I'd like to printf the return value of each function on exit in a similar style. Can I do that too?
Since you have tagged this question with gcc: gcc has the -finstrument-functions option:
Generate instrumentation calls for entry and exit to functions. ...
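With that option, GCC inserts calls to two hooks, `__cyg_profile_func_enter` and `__cyg_profile_func_exit`, which the program has to supply. A minimal sketch of such hooks follows; the `entry_count` counter and the message format are illustrative choices, not part of GCC.

```c
#include <stdio.h>

/* Illustrative global counter of instrumented function entries. */
static unsigned long entry_count = 0;

/* The hooks themselves must not be instrumented, or they would call
   themselves recursively. */
void __cyg_profile_func_enter(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));

/* Called on entry to every function compiled with -finstrument-functions. */
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    entry_count++;
    fprintf(stderr, "enter %p (called from %p) #%lu\n",
            this_fn, call_site, entry_count);
}

/* Called just before every instrumented function returns. */
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    fprintf(stderr, "exit  %p (returning to %p)\n", this_fn, call_site);
}
```

The hooks only receive addresses, so function names have to be recovered afterwards (for example with addr2line), and the return value of the instrumented function is not available through this mechanism. The counter here is global rather than per function, which differs from the per-function static in the question.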

Is returning zero from main necessary, and how can the return value from main be useful?

I know it's been the convention in C89 to always return a 0 integer value from main in a C program, like this:
int main() {
/* do something useful here */
return 0;
}
This is to return a "successful" result to the operating system. I still consider myself a novice (or an intermediate programmer at best) in C, but to date I've never fully understood why this is important.
My guess is, this is a useful return result if you're tying the output of this program into the input of another, but I'm not sure. I've never found it useful, or maybe I just don't understand what the intention is.
My questions:
Is returning zero always necessary from a C program?
How is the return value from main() useful?
When writing scripts (like in Bash, or CMD.exe on Windows)
you can chain some commands with the && and || operators.
Canonically, a && b will run b if the result of a is zero, and a || b will run b if a returned nonzero.
This is useful if you wish to run a command only if the previous one succeeded. For example, say you would like to delete a file if it contains the word foo. Then you would use:
grep foo myfile && rm myfile
grep returns 0 when there was a match, else nonzero.
Returning 0 is a convention. When a program returns 0, it can be assumed that it worked OK, without actually looking at what the program did (ahem :D).
Because the convention is so widely used, that assumption is built into a lot of places. As Benoit points out, that's the case for the shell (UNIX and Windows) and other parts of the operating system.
So answering your questions:
For a C program you must return either EXIT_SUCCESS or EXIT_FAILURE. But you can return EXIT_FAILURE even if your program worked OK.
If you don't return a 0 (EXIT_SUCCESS), it's quite possible that other programs will assume your program failed.
There's a related question with great responses covering both C and C++:
What should main() return in C and C++?
In modern C, aka C99 (not sure about C89), all three of these ways of terminating main are equivalent:
just ending main at the last }
return 0
exit(0)
The idea behind all this is (as others mentioned) to give a return status to the invoking program.
The return value is the "result" of the program execution, and 0 is used to indicate a successful termination, while a non-zero return value indicates a failure or unexpected termination.
The return value doesn't really matter to the system when you call your program normally, but it can serve two purposes. One of them is debugging. In MSVC, a very commonly used compiler for C++, you see the program's return value after it finishes executing. This can be helpful to see "how and why" your program exited.
Another use is when your application is called from other programs, where the return value may indicate success, or pass on a result.
Hiya, like you said, it's mainly useful if the program is used as part of a wider network of programs. Returning zero to the program's environment lets the network know everything went fine (like you said); however, you can also have it return 1, 2, 3... (depending on what error has occurred) to let your network of code know that something has gone wrong. Equivalently, you can have exit(0) or exit(1) at the end of your 'main' program to do exactly the same thing. You may find this useful:
http://en.wikipedia.org/wiki/Exit_status
Hope that helps. Jack
You should use EXIT_SUCCESS when the program finished correctly, and EXIT_FAILURE when it didn't. EXIT_SUCCESS is zero, and zero is portable to any operating system, while EXIT_FAILURE changes from UNIX to Windows, for example. These constants are defined in the stdlib.h header.
#include <stdlib.h>
int main()
{
int toret = EXIT_SUCCESS;
if ( !( /* do something useful here */ ) ) {
toret = EXIT_FAILURE;
}
return toret;
}
The return code of the program was more useful when programs were written for the console. Nowadays, it is quite uncommon, unless you work in a very professional environment (and even this is now changing, with the workflow tools available).
As @Benoit said, the exit code tells the operating system whether the operation was successful or not. If the exit code means failure, then you can break the flow of the batch program, since it is not likely to work out.
For example, a compiler can have an exit code of zero if compilation was successful, and any other value if compilation was unsuccessful. In Windows, this can be accessed through the operating system variable "errorlevel":
gcc helloworld.cpp -ohelloworld.exe
goto answer%errorlevel%
:answer0
copy helloworld.exe c:\users\username\Desktop
echo Program installed
goto end
:answer1
echo There were errors. Check your source code.
:end
echo Now exiting...
This Windows batch file "installs" helloworld.exe on the Desktop when the compilation was successful. Since you can trigger execution of batch files with a double-click, this lets you avoid touching the command line for compilation.
Of course, take into account that this is better managed by integrated environments (if the exit code did not exist, they wouldn't be able to work correctly). Also note that make is best in this field:
https://en.wikipedia.org/wiki/Make_(software)
Make also needs exit codes to run correctly.
We can return any integer from main().
By default (since C99), main() returns 0 if execution reaches the end of the function without an explicit return statement.
return 0 indicates that the program terminated successfully without any errors; it's not a rule enforced by the language, but it is the convention everything relies on.
If we explicitly return a non-zero number, then even if the program ran correctly, callers will take it to mean that the program terminated with an error or an unexpected result. This can be useful if another program is run based on our program's return value using '&&' or '||' on the command line.
You can also check the return value of the previously executed program on Linux using this command:
echo $?
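The same status that `echo $?` shows can be read from C by a calling program. The sketch below assumes a POSIX system (it relies on `<sys/wait.h>`); `exit_status_of` is a hypothetical helper name.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run a shell command and return its exit status
   (0..255), or -1 if the command could not be run or did not exit
   normally (for example, it was killed by a signal). */
int exit_status_of(const char *command)
{
    int status = system(command);

    if (status == -1 || !WIFEXITED(status)) {
        return -1;
    }
    /* WEXITSTATUS extracts the value the child passed to exit() or
       returned from main(). */
    return WEXITSTATUS(status);
}
```

A caller could then branch on the result, for instance treating `exit_status_of("grep foo myfile") == 0` as "a match was found", which mirrors what && does in the shell.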

Access command line arguments without using char **argv in main

Is there any way to access the command line arguments, without using the argument to main? I need to access it in another function, and I would prefer not passing it in.
I need a solution that only necessarily works on Mac OS and Linux with GCC.
I don't know how to do it on MacOS, but I suspect the trick I will describe here can be ported to MacOS with a bit of cross-reading.
On Linux you can use the so-called ".init_array" section of the ELF binary to register a function which gets called during program initialization (before main() is called). This function has the same signature as the normal main() function, except it returns "void".
Thus, you can use this function to remember or process argc, argv[] and envp[].
Here is some code you can use:
static void my_cool_main(int argc, char* argv[], char* envp[])
{
// your code goes here
}
__attribute__((section(".init_array"))) void (* p_my_cool_main)(int,char*[],char*[]) = &my_cool_main;
PS: This code can also be put in a library, so it should fit your case.
It even works when your program is run with valgrind; valgrind does not fork a new process, and this results in /proc/self/cmdline showing the original valgrind command line.
PPS: Keep in mind that during this very early stage of program execution many subsystems are not yet fully initialized. I tried libc I/O routines and they seem to work, but don't rely on it; even global variables might not yet be constructed, etc.
In Linux, you can open /proc/self/cmdline (assuming that /proc is present) and parse manually (this is only required if you need argc/argv before main() - e.g. in a global constructor - as otherwise it's better to pass them via global vars).
More solutions are available here: http://blog.linuxgamepublishing.com/2009/10/12/argv-and-argc-and-just-how-to-get-them/
Yeah, it's gross and unportable, but if you are solving practical problems you may not care.
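For the /proc/self/cmdline approach, the file holds the arguments back to back, each terminated by a NUL byte. A minimal, Linux-only sketch of reading it follows; `read_cmdline` is a hypothetical helper name, and a buffer that is too small simply truncates the result.

```c
#include <stdio.h>

/* Hypothetical helper: read /proc/self/cmdline into buf and return the
   number of complete (NUL-terminated) arguments found, or -1 if the file
   cannot be opened or the buffer is unusable. On return, buf holds the
   arguments one after another, each ending in '\0'. */
int read_cmdline(char *buf, size_t size)
{
    FILE *fp;
    size_t len, i;
    int count = 0;

    if (buf == NULL || size == 0) {
        return -1;
    }
    fp = fopen("/proc/self/cmdline", "rb");
    if (fp == NULL) {
        return -1;   /* /proc not mounted, or not a Linux system */
    }
    len = fread(buf, 1, size - 1, fp);
    fclose(fp);
    buf[len] = '\0';

    /* Each argument ends with a NUL byte, so counting NULs counts args. */
    for (i = 0; i < len; i++) {
        if (buf[i] == '\0') {
            count++;
        }
    }
    return count;
}
```

After a successful call, buf itself is the first argument (the program name), and the next argument starts one byte past its terminating NUL.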
You can copy them into global variables if you want.
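Copying the arguments into globals keeps everything within standard C. A small sketch, with hypothetical names (`save_args`, `arg_count`, `arg_at`), might look like this:

```c
#include <stddef.h>

/* File-scope copies of main's parameters. Only the pointers are copied;
   the strings themselves stay valid for the lifetime of the program. */
static int saved_argc = 0;
static char **saved_argv = NULL;

/* Call once at the top of main: save_args(argc, argv); */
void save_args(int argc, char **argv)
{
    saved_argc = argc;
    saved_argv = argv;
}

/* Number of saved arguments (0 if save_args was never called). */
int arg_count(void)
{
    return saved_argc;
}

/* Argument at the given index, or NULL if the index is out of range. */
const char *arg_at(int index)
{
    if (saved_argv == NULL || index < 0 || index >= saved_argc) {
        return NULL;
    }
    return saved_argv[index];
}
```

Any function in the program can then call `arg_at(1)` and so on without argc or argv being passed down to it, at the cost of one extra line in main.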
I do not think you should do it: the C runtime prepares the arguments and passes them into main via int argc, char **argv. Do not attempt to work around that by hacking it up, as the result would be largely unportable or possibly undefined behaviour. Stick to the rules and you will have portability; there is no other way of doing it short of breaking it.
You can. Some platforms (notably Microsoft's C runtime) provide global variables __argc and __argv. But again, I support zneak's comment.
P.S. Use boost::program_options to parse them. Please do not do it any other way in C++.
Is there some reason why passing a pointer to space that is already consumed is so bad? You won't be getting any real savings out of eliminating the argument to the function in question and you could set off an interesting display of fireworks. Skirting around main()'s call stack with creative hackery usually ends up in undefined behavior, or reliance on compiler specific behavior. Both are bad for functionality and portability respectively.
Keep in mind the arguments in question are pointers to arguments, they are going to consume space no matter what you do. The convenience of an index of them is as cheap as sizeof(int), I don't see any reason not to use it.
It sounds like you are optimizing rather aggressively and prematurely, or you are stuck with having to add features into code that you really don't want to mess with. In either case, doing things conventionally will save both time and trouble.
