input scanner that takes any type - c

I am trying to write an easier version of scanf. Basically, I want to be able to assign whatever was scanned from user input to a pointer, like this:
int *p = (int) w_insc();
Here is my implementation:
void *w_insc()
{
void *temp = 0;
scanf("???", &temp);
return &temp;
}
I am confused about what to pass as the format parameter to scanf. I also think returning the address of a variable that will soon be destroyed is not right, so I thought of doing this:
int *p = 0;
p = (int) w_insc((int) p);
Can someone help?

You are right that returning a pointer to a soon-to-be-destroyed variable is not correct. You can solve this problem by returning a pointer to a memory region allocated with malloc (although the caller must remember to free this memory), or by taking a pointer as the argument to w_insc, and then filling in the pointer with the returned value.
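The second option (the caller passes a pointer and the function fills it in) could be sketched as follows. The name w_insc_int and the FILE * parameter are assumptions made for illustration, not part of the question; passing stdin gives the keyboard-reading behavior.

```c
#include <stdio.h>

/* Hypothetical helper (not from the question): reads one int from `in`
   and stores it through `out`. Returns 1 on success, 0 on failure, so
   the caller's variable is only written when a value was actually read. */
int w_insc_int(FILE *in, int *out)
{
    int value;

    if (fscanf(in, "%d", &value) != 1)
        return 0;            /* nothing usable was read; *out is untouched */
    *out = value;            /* the storage belongs to the caller */
    return 1;
}
```

A caller would write something like `int n; if (w_insc_int(stdin, &n)) { ... }`. Because the int lives in the caller, nothing is returned by address and nothing has to be freed.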
However, there are much broader issues with trying to implement the w_insc function. There is no way for w_insc to know what the caller expects. Just because the caller casts the return value of w_insc to int doesn't allow w_insc to know that it should return an int. The only information that a C function has available to it is its set of parameters, plus any global variables in the program (and global variables are usually the wrong way to solve your problem). Note that a C function has no way of knowing what the caller will do with its return value. As a result, there is no way to write w_insc to take no parameters and return something scanned correctly based on some cast that the caller makes.
You could add a parameter to w_insc, making the declaration into
int w_insc(const char *format, ...)
This allows the caller to pass in a format string and a series of arguments detailing what they expect to get out of standard input. However, with this implementation you have just wrapped scanf with ... another function that looks just like scanf.
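To see how thin such a wrapper ends up being, here is a minimal sketch. It is an assumption-laden variant: the name w_sinsc is made up, and it scans from a string via vsscanf so it can be tried without a terminal. Replacing the vsscanf call with vscanf(format, args) would give the stdin version.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical string-based variant of w_insc: it forwards its arguments
   to vsscanf, so it behaves exactly like sscanf under a different name. */
int w_sinsc(const char *src, const char *format, ...)
{
    va_list args;
    int matched;

    va_start(args, format);
    matched = vsscanf(src, format, args);   /* all the real work happens here */
    va_end(args);
    return matched;                         /* items assigned, as with scanf */
}
```

The caller still has to supply the format string and correctly typed pointers, which is exactly the information the original w_insc() was missing.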
My advice: use the standard library functions, since they are standardized, and someone reading your code will know instantly what it means, rather than having to read through the definition of a nearly-trivial wrapper like the w_insc you have described.

Related

How do you use scanf to get an int in C?

I'm trying to learn the benefits and shortcomings of different ways to get input from the console. I'm confused by scanf. Why do I need to use &favNumber instead of favNumber?
I understand that &favNumber is the address location of favNumber, but why is it done this way?
I feel like there's a type mismatch here where favNumber is an int and I'm telling scanf that it's a pointer to an int. I thought I wrapped my head around pointers but this is confusing me a bit. Any help would be appreciated. Thanks!
#include <stdio.h>
int main()
{
char userPrompt[100] = "What is your favorite number?";
int favNumber;
printf("%s", userPrompt);
scanf("%d", &favNumber);
printf("%d", favNumber);
return 0;
}
When you call a function by value, the function gets a copy of the argument. Any changes to the argument in the function do not affect the value of the original variable.
void foo(int i )
{
i = 20; // The change is local to the function.
}
void bar()
{
int i = 10;
foo(i);
printf("i=%d\n", i); // i is still 10.
}
If you want a function to change the value of a variable, the function must use a pointer type and the calling function must use the address of the variable when calling the function.
void foo(int* i )
{
*i = 20; // The change is visible in the calling function
}
void bar()
{
int i = 10;
foo(&i);
printf("i=%d\n", i); // i is now 20.
}
This is why scanf expects pointers and the calling functions must use the address of variables when calling scanf. scanf must be able to set the values of the variables.
The & operator yields the memory address of a variable. When that address is passed to a function, writes made through it modify the value stored at that address.
scanf is basically just a function. If you are familiar with functions, you will know that a parameter passed to a function by value is local to the function, and any assignment to it only changes its value within the function (which does not meet the need of storing the "scanned" value in the passed variable). scanf instead accepts a reference (in other words, the location in memory of that value), so it can modify the value at that location and the "scanned" value can be stored in the variable of interest.
So, to wrap up: &favNumber passes scanf the memory address of the favNumber variable, which is itself an int, so an int value is written to that address and can then be accessed through favNumber.
"How do you use scanf to get an int in C?"
– you don't. You use a saner approach (e.g. fgets() + strtol()), because scanf() is quirky and clumsy and hard to use correctly.
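A minimal sketch of the strtol() half of that approach, assuming the line has already been read into a buffer with fgets(). The helper name parse_int and its return convention are made up for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts one line of text (for example a buffer
   filled by fgets) to an int. Returns 1 on success, 0 if the line does
   not hold exactly one valid int. */
int parse_int(const char *line, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(line, &end, 10);
    if (end == line)
        return 0;                      /* no digits were consumed */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                      /* does not fit in an int */
    while (*end == ' ' || *end == '\t' || *end == '\n' || *end == '\r')
        end++;                         /* tolerate trailing whitespace */
    if (*end != '\0')
        return 0;                      /* trailing junk such as "12abc" */
    *out = (int)value;
    return 1;
}
```

Typical use would be `char buf[64]; if (fgets(buf, sizeof buf, stdin) != NULL && parse_int(buf, &favNumber)) { ... }`. Unlike a bare scanf("%d", ...), bad input is consumed with the line instead of being left in the stream.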
However, your question is apparently not about this; you are asking why you have to write scanf("%d", &favNumber); when &favNumber is an int * but %d specifies an int.
Well, you seem to be confusing type safety/type mismatches with arbitrary denotation of types expected by library functions.
&favNumber is indeed a pointer to int. However, the %d specifier does NOT mean that "you must pass an int for this argument". %d is a library-defined notation that tells scanf() to scan an integer and put it into the next argument. In order for scanf() to be able to modify your argument, you need to pass a pointer to it, and indeed this function expects that you pass a pointer to it.
I could put it this way: "%d" simply means something different when used with printf() and scanf(): in the former case, it means you pass an int argument, in the latter case, it means you should pass an int *.
Again, that is the case because these format strings have no inherent semantics. It's the formatted input/output functions that interpret them – in this case, they interpret format strings differently for technical necessity reasons.
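The two meanings of "%d" can be put side by side in a small sketch. The function name roundtrip is made up; snprintf and sscanf are used instead of printf and scanf only so the example needs no console.

```c
#include <stdio.h>

/* Round-trips an int through text. The output function takes the int
   itself for %d; the input function takes the address of an int for
   the very same %d. */
int roundtrip(int value)
{
    char text[32];
    int parsed = 0;

    snprintf(text, sizeof text, "%d", value);   /* %d here: an int      */
    sscanf(text, "%d", &parsed);                /* %d here: an int *    */
    return parsed;
}
```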
Alright, so I believe your confusion comes from the fact that '&' denotes an address: not a pointer variable, as a '*' in a declaration denotes, but the address itself. You're telling the scan function where to store the value that is received from the user.
If you were to pass the variable itself, i.e. 'favNumber', how would scanf know where to store the value that you've placed into stdin? favNumber is just a container; it's nothing special, just a place in memory that is allocated to hold a certain number of bytes. I feel as if I understand where your question is coming from, but if you've already encountered pointers, I think you may be confusing the two. A pointer variable holds an address in memory; '&' produces the actual address of a non-pointer variable, which is the same kind of value a pointer would hold.
If favNumber were an 'int *' then you would not need the ampersand, as it would already hold an address, but you would need to dereference it to see what is stored there. Your favNumber is a plain int, so using it by name gives you what is stored at its address, and that storage is allocated on the stack when main starts running.

Getting return value from a function in C

Consider the two functions:
int add1(int x,int y)
{
return x+y;
}
void add2(int x,int y,int *sum)
{
*sum=x+y;
}
I generally use functions of the form add1, but I found some code using functions of the form add2.
Even if the return value is large (like an array or struct), we can just return a pointer to it.
I wonder, is there any reason for using the second form?
There's also the reason of returning success state.
There are a lot of functions like:
bool f(int arg1, int arg2, int *ret)
{
}
Here the bool (or enum) return value reports whether the function succeeded, instead of the caller checking whether ret is null. (The same pattern helps if you have more than one output variable.)
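A minimal sketch of that pattern, using integer division as a made-up example (the name divide is hypothetical): the return value carries only success or failure, and the result travels through the pointer.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical example: returns true and stores num / den in *result,
   or returns false and leaves *result untouched when the division
   cannot be performed. */
bool divide(int num, int den, int *result)
{
    if (den == 0)
        return false;                       /* division by zero */
    if (num == INT_MIN && den == -1)
        return false;                       /* quotient would overflow int */
    *result = num / den;
    return true;
}
```

The caller can then test the call directly: `if (divide(a, b, &q)) { /* use q */ }`.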
If you want to return two values from your function, then C is helpless unless you use pointers just like your function add2.
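void add2(int *ptr1, int *ptr2)
{
    /* Some Code */
    *ptr1 = Something;     /* first value handed back to the caller */
    *ptr2 = SomethingElse; /* second value handed back to the caller */
}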
Form 2 is very common for "multiple returns" in C. A canonical example is returning the address to a buffer and the length of the buffer:
/* Returns a buffer based on param. Returns -1 on failure, or 0 on success.
Buffer is returned in buf and buflen. */
int get_buffer(void *param, char **buf, int *buflen);
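A function in the same shape might be implemented like this. The name copy_to_buffer and its behavior (duplicating a string) are assumptions chosen for illustration; the original get_buffer is only shown as a declaration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical function in the same shape as get_buffer: copies src into
   a freshly allocated buffer. Returns 0 on success, -1 on failure.
   On success the caller owns *buf and must free() it. */
int copy_to_buffer(const char *src, char **buf, int *buflen)
{
    size_t len = strlen(src) + 1;     /* include the null terminator */
    char *copy = malloc(len);

    if (copy == NULL)
        return -1;                    /* outputs are left untouched */
    memcpy(copy, src, len);
    *buf = copy;                      /* first "return value"  */
    *buflen = (int)len;               /* second "return value" */
    return 0;
}
```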
Functions of form 2 are not faster than functions of form 1 when you're using things as small as int. In fact, in this case the second one is slower because you have to dereference the passed pointer. It's only useful in this case if your aim is to pass in an array of values.
Always use functions of form 1 unless you want to pass a very large piece of data to the function. In that case, form 2 would be faster.
The reason we use the second form is that, for large objects, we want to avoid copying them. Instead of copying them, we can just pass their memory addresses to the function. This is where the pointer comes in. So instead of giving the function all the data, you just tell it where this data is. (I hope this analogy is good enough.)
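As a rough illustration of the large-object case, consider a struct holding a sizeable array. The struct samples type and the mean function are invented for this sketch.

```c
#include <stddef.h>

/* Hypothetical large object: passing it by value would copy the whole
   array (typically 8 KiB) on every call. */
struct samples {
    double values[1024];
    size_t count;          /* how many entries of values are in use */
};

/* Takes only the address; const documents that the data is not modified. */
double mean(const struct samples *s)
{
    double sum = 0.0;
    size_t i;

    if (s->count == 0)
        return 0.0;
    for (i = 0; i < s->count; i++)
        sum += s->values[i];
    return sum / (double)s->count;
}
```

Declaring the parameter as a pointer to const keeps the efficiency benefit while still telling the reader that the function will not change the caller's data.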
It is largely a matter of preference and local conventions. The second form might be used alongside a bunch of other similar functions where the third parameter in each of them is always passed as a pointer to a return value.
Personally, I like the first form for almost all purposes: it does not require a pointer to be passed, and it allows some type flexibility in handling the return value.
Returning a value by writing to memory passed via a pointer is reasonable, when the returned object is large, or when the return value of the function is used for other purposes (e.g. signaling error conditions). In the code you have shown, neither of these two is the case, so I'd go for the first implementation.
When you return a pointer from a function, you have to make sure that the pointed-to memory is still valid after the function returns. It cannot point to a local variable, so it typically points to the heap, making a heap allocation necessary. This puts a burden on the caller, who has to deallocate memory that they did not explicitly allocate.

Why would I pass function parameters by value in C?

I am dusting off my C skills working on some C libraries of mine. After having put together a first working implementation I am now going over the code to make it more efficient. Currently I am on the topic of passing function parameters by reference or value.
My question is, why would I ever pass any function parameter by value in C? The code might look cleaner, but wouldn't it always be less efficient than passing by reference?
Because it's not as important to code for the computer as it is to code for the next human being. If you are passing references around then any reader must assume that any called function could change the value of his parameters and would be obligated to check it or copy the parameter before calling.
Your function signature is a contract and divides your code up so that you don't have to fit the entire code base into your head in order to comprehend what is going on in some area, by passing references you are making the next guy's life worse, your biggest job as a programmer should be making the next guy's life better--because the next guy will probably be you.
In C, all arguments are passed by value. A true pass by reference is when you see the effect of a modification without any explicit indirection at all:
void f(int c, int *p) {
    c++; // in C you can't change the caller's original argument like this
    p++; // or like this: only the local copy of the pointer moves
}
Using values instead of pointers though, is frequently desirable:
int sum(int a, int b) {
return a + b;
}
You would not write this like:
int sum(int *a, int *b) {
return *a + *b;
}
Because it is not safe and it is inefficient. Inefficient because there is an additional indirection. Moreover, in C, a pointer argument suggests to the caller that the value will be modified through the pointer (especially true when the pointed-to type has a size less than or equal to that of the pointer itself).
Please refer to Passing by reference in C. Pass by reference is a misnomer in C. It refers to passing the address of a variable instead of the variable, but you are passing a pointer to the variable by value.
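The point that the pointer itself is passed by value can be seen in a small sketch (the function names retarget and set_value are made up):

```c
/* The pointer parameter p is a copy of the caller's pointer. */
void retarget(int *p, int *other)
{
    p = other;   /* changes only this function's copy of the pointer */
    *p = 0;      /* writes through the new target, i.e. *other */
}

/* Writing through the pointer is what reaches the caller's variable. */
void set_value(int *p, int value)
{
    *p = value;
}
```

After `retarget(p, &b)` the caller's p still points where it did before; only b has changed. After `set_value(p, 9)` the int that p points to holds 9.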
That said, if you were to pass the variable as a pointer, then yes it would be marginally more efficient, but the main reason is to be able to modify the original variable it points to. If you don't want to be able to do this, it is recommended you take it by value to make your intent clear.
Of course, all this is moot for one of C's heavier data structures: arrays are passed as a pointer to their first element whether you like it or not.
Two reasons:
Often times you will have to dereference the pointer you've passed in many times (think a long for-loop). You don't want to dereference every single time you want to look up the value at that address. Direct access is faster.
Sometimes you want to modify the passed-in value inside your function, but not in the caller. Example:
void foo( int count ){
while (count>0){
printf("%d\n",count);
count--;
}
}
If you wanted to do the above with something passed by reference, you would have to create yet another variable inside your function to store it first.

C char* pointers pointing to same location where they definitely shouldn't

I'm trying to write a simple C program on Ubuntu using Eclipse CDT (yes, I'm more comfortable with an IDE and I'm used to Eclipse from Java development), and I'm stuck with something weird. On one part of my code, I initialize a char array in a function, and it is by default pointing to the same location with one of the inputs, which has nothing to do with that char array. Here is my code:
char* subdir(const char input[], const char dir[]){
[*] int totallen = strlen(input) + strlen(dir) + 2;
char retval[totallen];
strcpy(retval, input);
strcat(retval, dir);
...}
OK, at the part I've marked with [*] there is a breakpoint. Even at that breakpoint, when I check my locals, I see that retval is pointing to the same address as my argument input. That should not even be possible, as input comes from another function and retval is created in this function. Is it me being inexperienced with C and missing something, or is there a bug somewhere in the C compiler?
It seems so obvious to me that they shouldn't point to the same location (a valid one, of course; they aren't NULL). As the code goes on, it literally messes up everything; I get random characters and shapes in the console and the program crashes.
I don't think it makes sense to check the address of retval BEFORE it appears, it being a VLA and all (by definition the compiler and the debugger don't know much about it, it's generated at runtime on the stack).
Try checking its address after its point of definition.
EDIT
I just read the "I get random characters and shapes in console". It's obvious now that you are returning the VLA and expecting things to work.
A VLA is only valid inside the block where it was defined. Using it outside is undefined behavior and thus very dangerous. Even if the size were constant, it still wouldn't be valid to return it from the function. In this case you most definitely want to malloc the memory.
What cnicutar said.
I hate people who do this, so I hate me ... but ... Arrays of non-const size (VLAs) are a C99 feature and are not supported by standard C++. Of course GCC has extensions to make it happen.
Under the covers you are essentially doing an _alloca, so your odds of blowing out the stack are proportional to who has access to abuse the function.
Finally, I hope it doesn't actually get returned, because that would be returning a pointer to a stack allocated array, which would be your real problem since that array is gone as of the point of return.
In C++ you would typically use a string class.
In C you would either pass a pointer and length in as parameters, or a pointer to a pointer (or return a pointer) and specify that callers should call free() on it when done. These solutions all suck because they are prone to leaks or truncation or overflow. :/
Well, your fundamental problem is that you are returning a pointer to the stack allocated VLA. You can't do that. Pointers to local variables are only valid inside the scope of the function that declares them. Your code results in Undefined Behaviour.
At least I am assuming that somewhere in the ..... in the real code is the line return retval.
You'll need to use heap allocation, or pass a suitably sized buffer to the function.
As well as that, you only need +1 rather than +2 in the length calculation - there is only one null-terminator.
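A heap-allocating version could look like the sketch below. It assumes the elided part of the original function does nothing beyond returning retval; the NULL-on-failure convention is likewise an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of subdir using heap allocation. Returns a newly allocated
   string that the caller must free(), or NULL if allocation fails. */
char *subdir(const char *input, const char *dir)
{
    size_t total = strlen(input) + strlen(dir) + 1;  /* one terminator */
    char *retval = malloc(total);

    if (retval == NULL)
        return NULL;
    strcpy(retval, input);    /* the buffer is large enough for both */
    strcat(retval, dir);
    return retval;            /* heap memory outlives this function */
}
```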
Try changing retval to a character pointer and allocating your buffer using malloc().
Pass the two string arguments as, char * or const char *
Rather than returning char *, you should just pass another parameter with a string pointer that you already malloc'd space for.
Return bool or int describing what happened in the function, and use the parameter you passed to store the result.
Lastly don't forget to free the memory since you're having to malloc space for the string on the heap...
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
//retstr is not a const like the other two
bool subdir(const char *input, const char *dir, char *retstr){
    strcpy(retstr, input);
    strcat(retstr, dir);
    return true;
}
int main()
{
    char h[]="Hello ";
    char w[]="World!";
    char *greet=malloc(strlen(h)+strlen(w)+1); //Size of the result plus room for the terminator!
    if (greet == NULL) return 1; //allocation failed
    subdir(h,w,greet);
    printf("%s",greet);
    free(greet); //release the buffer we allocated
    return 0;
}
This will print: "Hello World!" added together by your function.
Also, when you're creating a string on the fly that has to outlive the function, you must malloc. A variable-length array such as char retval[totallen]; lives on the stack of the function that declares it, so it cannot be handed back to the caller.

Function format in a C program

I'm writing some functions that manipulate strings in C and return extracts from the string.
What are your thoughts on good styles for returning values from the functions.
Referring to Steve McConnell's Code Complete (section 5.8 in 1993 edition) he suggests I use
the following format:
void my_function ( char *p_in_string, char *p_out_string, int *status )
The alternatives I'm considering are:
Return the result of the function (option 2) using:
char* my_function ( char *p_in_string, int *status )
Return the status of the function (option 3) using:
int my_function ( char *p_in_string, char *p_out_string )
In option 2 above I would be returning the address of a local variable from my_function but my calling function would be using the value immediately so I consider this to be OK and assume the memory location has not been reused (correct me on this if I'm wrong).
Is this down to personal style and preference or should I be considering other issues ?
Option 3 is pretty much the unspoken(?) industry standard. If an IO-based C function that returns an integer returns a non-zero value, it almost always means that the IO operation failed. You might want to refer to this Wikibook's section on return values in C/C++.
The reason that people use 0 for success is because there is only one condition of success. Then if it returns non-zero, you look up somehow what the non-zero value means in terms of errors. Perhaps a 1 means it couldn't allocate memory, 2 means the argument was invalid, 3 means there was some kind of IO error, for instance. Technically, typically you wouldn't return 1, but you'd return XXX_ERR_COULD_NOT_MALLOC or something like that.
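In code, that convention might be sketched as below. Only XXX_ERR_COULD_NOT_MALLOC is named in the text; the other enumerators and the xxx_dup function are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes: 0 is the single success value, every
   non-zero value names one specific failure. */
enum xxx_status {
    XXX_OK = 0,
    XXX_ERR_COULD_NOT_MALLOC,   /* 1 */
    XXX_ERR_INVALID_ARG         /* 2 */
};

/* Duplicates src into *out. The return value is purely a status;
   on XXX_OK the caller owns *out and must free() it. */
enum xxx_status xxx_dup(const char *src, char **out)
{
    char *copy;

    if (src == NULL || out == NULL)
        return XXX_ERR_INVALID_ARG;
    copy = malloc(strlen(src) + 1);
    if (copy == NULL)
        return XXX_ERR_COULD_NOT_MALLOC;
    strcpy(copy, src);
    *out = copy;
    return XXX_OK;
}
```

Because success is 0, a caller can write `if (xxx_dup(text, &copy) != XXX_OK) { /* handle error */ }`.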
Also, never return addresses of local variables. Unless you personally malloced it, there are no guarantees about that variable's address after you return from the function. Read the link for more info.
"In option 2 above I would be returning the address of a local variable from my_function but my calling function would be using the value immediately so I consider this to be OK and assume the memory location has not been reused (correct me on this if I'm wrong)."
I'm sorry but you're wrong; go with Steve McConnell's method, or the last method (by the way, in the first method, "int status" should be "int* status").
You're forgiven for thinking you'd be right, and it could work for the first 99,999 times you run the program, but the 100,000th time is the kicker. Once the function has returned, you can't rely on that memory not having been reused (by a later function call, for instance) before you get to it.
Better to be safe than sorry.
The second option is problematic because you have to get memory for the result string, so you either use a static buffer (which possibly causes several problems) or you allocate memory, which in turn can easily cause memory leaks since the calling function has the responsibility to free it after use, something that is easily forgotten.
There is also option 4,
char* my_function ( char *p_in_string, char* p_out_string )
which simply returns p_out_string for convenience.
a safer way would be:
int my_function(const char* p_in_string, char* p_out_string, unsigned int max_out_length);
the function would return status, so that it's check-able immediately like in
if( my_function(....) )
and the caller would allocate the memory for the output, because:
- the caller will have to free it, and that is best done at the same level
- the caller, not the function, knows how it handles memory allocation in general
void my_function ( char *p_in_string, char *p_out_string, int *status )
char* my_function ( char *p_in_string, int *status )
int my_function ( char *p_in_string, char *p_out_string )
In all cases, the input string should be const, unless my_function is explicitly being given permission to write - for example - temporary terminating zero's or markers into the input string.
The second form is only valid if my_function calls "malloc" or some variant to allocate the buffer. It's not safe in any C/C++ implementation to return pointers to local / stack-scoped variables. Of course, when my_function calls malloc itself, there is a question of how the allocated buffer is freed.
In some cases, the caller is given the responsibility for releasing the buffer - by calling free(), or, to allow different layers to use different allocators, via a my_free_buffer(void*) that you publish. A further frequent pattern is to return a pointer to a static buffer maintained by my_function - with the proviso that the caller should not expect the buffer to remain valid after the next call to my_function.
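The static-buffer pattern mentioned last might look like this. The function int_to_text is a made-up example, not something from the question.

```c
#include <stdio.h>

/* Hypothetical example of the static-buffer pattern: the returned
   pointer stays valid only until the next call, which overwrites
   the same buffer. */
const char *int_to_text(int value)
{
    static char buffer[32];   /* static storage: outlives the call */

    snprintf(buffer, sizeof buffer, "%d", value);
    return buffer;
}
```

Every call returns the same address, so a caller that needs to keep a result must copy it out before calling again. The shared buffer also makes the function unsuitable for concurrent use from several threads.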
In all the cases where a pointer to an output buffer is passed in, it should be paired with the size of the buffer.
The form I most prefer is
int my_function(char const* pInput, char* pOutput,int cchOutput);
This returns 0 on failure, or the number of characters copied into pOutput on success, with cchOutput being the size of pOutput to prevent my_function overrunning the pOutput buffer. If pOutput is NULL, then it returns the exact number of characters that pOutput needs, including the space for a null terminator of course.
// This is one easy way to call my_function if you know the output is <1024 characters
char szFixed[1024];
int cch1 = my_function(pInput,szFixed,sizeof(szFixed)/sizeof(char));
// Otherwise you can call it like this in two passes to find out how much to alloc
int cch2 = my_function(pInput,NULL,0);
char* pBuf = malloc(cch2);
my_function(pInput,pBuf,cch2);
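For concreteness, here is one way a function honoring that contract could be written. What my_function actually extracts is not stated, so this sketch uses an invented task (copying the first space-delimited word) under the invented name first_word, and it counts the terminator in the returned length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical function following the contract described above.
   - pOutput == NULL: returns the buffer size needed, terminator included.
   - otherwise: returns the number of chars written (terminator included),
     or 0 if cchOutput is too small to hold the result. */
int first_word(char const *pInput, char *pOutput, int cchOutput)
{
    size_t len = strcspn(pInput, " ");   /* length of the first word */
    int needed = (int)len + 1;           /* plus the null terminator */

    if (pOutput == NULL)
        return needed;                   /* size query only */
    if (cchOutput < needed)
        return 0;                        /* would overrun pOutput */
    memcpy(pOutput, pInput, len);
    pOutput[len] = '\0';
    return needed;
}
```

Both calling styles shown above work with it: a fixed buffer with its size, or a first call with NULL to learn how much to allocate.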
2nd Style:
Don't assume that memory will not be used. There can be threads that may eat up that memory and you are left with nothing but never-ending garbage.
I prefer option 3. This is so I can do error checking for the function inline, i.e. in if statements. Also, it gives me the scope to add an additional parameter for string length, should that be needed.
int my_function(char *p_in_string, char **p_out_string, int *p_out_string_len)
Regarding your option 2:
If you return a pointer to a local variable, that has been allocated on the stack, the behavior is undefined.
If you return a pointer to some piece of memory you allocated yourself (malloc, calloc, ...), this would be safe (but ugly, as you might forget free()).
I vote for option 3:
It allows you to manage memory outside of my_function(...) and you can also return some status code.
I would say option 3 is the best to avoid memory management issues. You can also do error checking using the status integer.
There's also a point to consider if your function is time critical. On most architectures, it's faster to use the return value than to use a pointer parameter.
I had a case where, by using the function return value, I could avoid memory accesses in an inner loop, whereas with the pointer parameter the value was always written out to memory (the compiler doesn't know whether the value will be accessed via another pointer somewhere else).
With some compilers you can even apply attributes to a function that returns its result by value, which can't be expressed for pointer parameters.
With a function like strlen, for instance, some compilers know that between two calls of strlen, if the pointer and the string it points to weren't changed, the same value will be returned, and thus avoid calling the function again.
In GNU C you can give such a function the attribute pure or even const (when appropriate), something that is impossible with a reference parameter.
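A small sketch of how those attributes are spelled, assuming GCC or a compatible compiler such as Clang. The macro names ATTR_CONST and ATTR_PURE and both functions are invented here; on other compilers the macros expand to nothing.

```c
#include <stddef.h>

#if defined(__GNUC__)
#define ATTR_CONST __attribute__((const))
#define ATTR_PURE  __attribute__((pure))
#else
#define ATTR_CONST   /* attribute not available: plain function */
#define ATTR_PURE
#endif

/* const: the result depends only on the argument values; no memory is read. */
ATTR_CONST int square(int x);

/* pure: no side effects, but the result may depend on memory reached via s. */
ATTR_PURE size_t count_char(const char *s, char c);

int square(int x)
{
    return x * x;
}

size_t count_char(const char *s, char c)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}
```

With these annotations the compiler is allowed to merge repeated calls that have the same arguments, which is the optimization described for strlen.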

Resources