Char* p, and scanf


I have been trying to look for a reason why the following code is failing, and I couldn't find one.
So please, excuse my ignorance and let me know what's happening here.
#include<stdio.h>
int main(void){
char* p="Hi, this is not going to work";
scanf("%s",p);
return 0;
}
As far as I understood, I created a pointer p to a contiguous area in the memory of the size 29 + 1(for the \0).
Why can't I use scanf to change the contents of that?
P.S Please correct me If I said something wrong about char*.

char* p="Hi, this is not going to work";
This does not allocate memory for you to write to.
It creates a string literal, and any attempt to change its contents results in undefined behaviour.
To use p as a buffer for your scanf, do something like
char * p = malloc(sizeof(char) * 128); // 128 is an Example
OR
you could as well do:
char p[]="Hi, this is not going to work";
Which I guess is what you really wanted to do.
Keep in mind that this can still end up being UB, because scanf() neither checks that the destination is valid writable memory nor limits how much input it stores there.
Remember:
char * p = "..." makes p point to a string literal, which must not be modified.
char p[] = "..." allocates an array just large enough to hold the string inside the "..." (including the terminating \0), and its contents may be changed.
Edit:
A nice trick to avoid the overflow is to give scanf a field width one less than the buffer size:
char * p = malloc(sizeof(char) * 128);
scanf("%127s", p);
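A minimal sketch of the bounded read described in this answer, assuming a 128-byte buffer. The helper name `read_word` and the macro `WORD_BUF_SIZE` are made up for illustration and are not part of the original answer. The field width is one less than the buffer size, so there is always room for the terminating '\0'.

```c
#include <stdio.h>

#define WORD_BUF_SIZE 128  /* example size, as in the answer above */

/* Reads one whitespace-delimited word from `in` into `buf`, which must
   hold at least WORD_BUF_SIZE bytes. Returns 1 on success, 0 otherwise. */
int read_word(FILE *in, char *buf)
{
    /* %127s stores at most 127 characters plus the terminating '\0' */
    return fscanf(in, "%127s", buf) == 1;
}
```

It could be called as read_word(stdin, p), where p is either a char array of WORD_BUF_SIZE bytes or a malloc'd block of that size that has been checked for NULL.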

p points to a constant literal, which may in fact reside in a read-only memory area (implementation dependent). At any rate, trying to overwrite that is undefined behaviour. I.e. it might result in nothing, or an immediate crash, or a hidden memory corruption which causes mysterious problems much later. Don't ever do that.

It is crashing because no writable memory has been allocated for p. What you have is p pointing to a constant memory area. When you attempt to write something into this data segment, the runtime environment will typically raise a trap, which leads to a crash. Give p writable memory (an array or a malloc'd block) and it should be OK.
Hope this answers your question

scanf() parses data entered from stdin (normally, the keyboard). I think you want sscanf().
However, the purpose of scanf() is to parse input according to conversion specifiers such as %d or %s, which your test string doesn't contain. So that makes it a little unclear exactly what you are trying to do.
Note that sscanf() takes an additional argument as the first argument, which specifies the string being parsed.
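As a rough illustration of the difference, here is a sketch that parses an existing string with sscanf() instead of reading from stdin. The function name `parse_name_age`, the "name age" layout, and the 32-byte buffer are assumptions chosen for the example.

```c
#include <stdio.h>

/* Parses "<name> <age>" out of an existing string.
   `name` must hold at least 32 bytes. Returns 1 if both fields were read. */
int parse_name_age(const char *text, char *name, int *age)
{
    /* the string to parse is the first argument; %31s bounds the write */
    return sscanf(text, "%31s %d", name, age) == 2;
}
```

The destination still has to be writable memory, for example char name[32];, never a string literal.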

Related

C null terminators throughout char*: correct handling

This question is aimed at improving my understanding of
what I can and cannot do with pointers when allocating and freeing:
The code below is not meant to run; it just sets up a situation for the questions below.
char *var1 = calloc(8,sizeof(char));
char **var2 = calloc(3,sizeof(char*));
var1 = "01234567";
var1[2] = '\0';
var1[5] = '\0';
//var1 = [0][1][\0][3][4][\0][6][7]
var2[0] = var1[0];
var2[1] = var1[3];
var2[2] = var1[6];
free(var1);
free(var2);
given the following snippet
1: is it ok to write to a location after the \0 if you know the size you allocated.
2: Can I do what I did with var2 , if it points to a block that another pointer is pointing at?
3: are the calls to free ok? or will free die due to the \0 located throughout var1.
I printed out all the variables after free, and only the ones up to the first null got freed (changed to null or other weird and normal looking characters). Is that ok?
4: Any other stuff you wish to point out that is completely wrong and should be avoided.
Thank you very much.

OK, let's just recap what you have done here:
char *var1 = calloc(8,sizeof(char));
char **var2 = calloc(3,sizeof(char*));
So var1 is (a pointer to) a block of 8 chars, all set to zero \0.
And var2 is (a pointer to) a block of 3 pointers, all set to NULL.
So now it's the program's memory, it can do whatever it wants with it.
To answer your questions specifically ~
It's quite normal to write characters around inside your char block. It's a common programming pattern to parse string buffers by writing a \0 after a section of text to use everyday C string operations on it, but then point to the next character after the added \0 and continue parsing.
var2 is simply a bunch of char-pointers, it can point to whatever char is necessary, it doesn't necessarily have to be at the beginning of the string.
The calls to free() are somewhat OK (except for the bug - see below). It's normal for the content of free()d blocks to be overwritten when they are returned to the heap, so they often seem to have "rubbish" characters in them if printed out afterwards. Reading a block after freeing it is itself undefined behaviour, though, so don't rely on what you see.
There are some issues with the assignment of var1 ~
var1 = "01234567";
Here you are saying "var1 now points to this constant string". Your compiler may have generated a warning about this. Firstly, the code assigns a const char* to a char* (hard-coded strings are const, but C compilers will only warn about this [EDIT: this is true for C++, not C, see comment from n.m.]). Secondly, the code has lost every reference to the block of memory that var1 used to point to. You can now never free() this memory: it has leaked. Moreover, at the end of the program, free() is called on a pointer to a block of memory (the "01234567") which was not allocated on the heap. This is BAD: it is undefined behaviour. Since you're exiting immediately, you may see no ill effects, but if this were in the middle of execution, the next allocation (or the 1000th one after it!) could crash weirdly. These sorts of problems are hard to debug.
Probably what you should have done here (I'm guessing your intention though) is used a string copy:
strncpy(var1, "01234567", 8);
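With that operation instead of the assignment, everything is OK, because the digits are stored in the memory allocated on line 1. Note that copying 8 characters into an 8-byte block leaves no room for a terminating \0, so the block is not a C string until you write one (as var1[2] = '\0' does).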
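Putting the pieces together, here is one possible corrected version of the snippet from the question. It is a sketch under two assumptions: the buffer is given 9 bytes so the copied string keeps its terminator, and var2 is meant to hold addresses into the buffer. The function name `split_digits` is made up for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Copies "01234567" into a heap buffer, splits it in place by writing
   '\0' at indexes 2 and 5, and stores pointers to the three pieces in
   parts[0..2]. Returns the buffer (caller frees it), or NULL on failure. */
char *split_digits(char *parts[3])
{
    const char *src = "01234567";
    char *buf = calloc(strlen(src) + 1, sizeof *buf); /* 8 chars + '\0' */
    if (buf == NULL)
        return NULL;
    strcpy(buf, src);      /* copy the characters; do not reassign buf */
    buf[2] = '\0';         /* buf now holds "01", "34" and "67" */
    buf[5] = '\0';
    parts[0] = &buf[0];    /* addresses, not char values */
    parts[1] = &buf[3];
    parts[2] = &buf[6];
    return buf;
}
```

The three pointers in parts refer into the same block, so only the returned buffer is passed to free(), and only once.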

Question 4 - what's wrong
You 'calloc' some memory and store a pointer to it in var1. Then later you execute var1 = "01234567" which stores a pointer to a literal string in var1, thus losing the calloc'd memory. I imagine you thought you were copying a string. Use strcpy or similar.
Then you write zero values into what var1 points to. Since that's a literal string, it may fail if the literal is in read-only memory. The result is undefined.
free(var1) is not going to go well with a pointer to a literal. Your code may fail or you may get heap corruption.

Pointers don't work this way.
If someone wrote
int a = 6*9;
a = 42;
you would wonder why they ever bothered to initialise a to 6*9 in the first place — and you would be right. There's no reason to. The value returned by * is simply forgotten without being used. It could be never calculated in the first place and no one would know the difference. This is exactly equivalent to
int a = 42;
Now when pointers are involved, there's some kind of evil neural pathway in our brain that tries to tell us that a sequence of statements that is exactly like the one shown above is somehow working differently. Don't trust your brain. It isn't.
char *var1 = calloc(8,sizeof(char));
var1 = "01234567";
You would wonder why they ever bothered to initialise var1 to calloc(8,sizeof(char)); in the first place — and you would be right. There's no reason to. The value returned by calloc is simply forgotten without being used. It could be never calculated in the first place and no one would know the difference. This is exactly equivalent to
char* var1 = "01234567";
... which is a problem, because you cannot modify string literals.
What you probably want is
char *var1 = calloc(8, 1); // note sizeof(char)==1, always
strncpy (var1, "01234567", 8); // note not strcpy — you would need 9 bytes for it
or some variation of that.

var1 = "01234567"; is not correct: a string literal must not be modified, so it should only be assigned to a pointer to const char, and the assignment also causes a memory leak, because the pointer to the calloc-allocated buffer of 8 chars that was stored in var1 is lost. It seems you actually intended to initialize the allocated array with the contents of the string literal instead (though that would require allocating an array of 9 items). The assignment var1[2] = '\0'; causes undefined behavior, because the location var1 points to is not mutable. var2[0] = var1[0]; is wrong as well, because it assigns a char value to a pointer to char; you would need var2[0] = &var1[0];. Finally, free(var1); will try to deallocate a pointer to the buffer backing the string literal, not something you allocated.

Concatenating strings - need clarification

char * a = (char *) malloc(10);
strcpy(a,"string1");
char * x = "string2";
strcat(a,x);
printf("\n%s",a);
Here, I allocated only 10B to a, but still after concatenating a and x (combined size is 16B), C prints the answer without any problem.
But if I do this:
char * a = "string1";
char * x = "string2";
strcat(a,x);
printf("\n%s",a);
Then I get a segfault. Why is this? Why does the first one work despite lower memory allocation? Does strcat reallocate memory for me? If yes, why does the second one not work? Is it because a & x declared that way are unmodifiable string literals?

In your first example, a is allocated in the heap. So when you're concatenating the other string, something in the heap will be overwritten, but there is no write-protection.
In your second example, a points to a region of the memory that contains constants, and is readonly. Hence the seg fault.

The first one doesn't always work, it already caused an overflow. The second one, a is a pointer to the constant string which is stored in the data section, in a read-only page.

In the 2nd case, what you have is a pointer to an unmodifiable string literal.
In the 1st case, you are writing past the end of a heap allocation and then printing it; that is undefined behaviour, so you cannot guarantee that it will work every time.
(Maybe run it in a very large loop; you may then see this undefined behavior.)

Your code is writing beyond the buffer that it's permitted, which causes undefined behavior. This can work and it can fail, and worse: it can look like it worked but cause seemingly unrelated failures later. The language allows you to do things like this because you're supposed to know what you're doing, but it's not recommended practice.
In your first case, of having used malloc to allocate buffers, you're actually being helped but not in a manner you should ever rely on. The malloc function allocates at least as much space as you've requested, but in practice it typically rounds up to a multiple of 16... so your malloc(10); probably got a 16 byte buffer. This is implementation specific and it's never a good idea to rely on something like that.
In your second case, it's likely that the memory pointed to by your a (and x) variable(s) is non-writable, which is why you've encountered a segfault.
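For comparison, a minimal sketch of a concatenation that stays within its allocation: the result buffer is sized from both inputs before anything is copied. The helper name `concat_alloc` is hypothetical, not a standard function.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string holding a followed by b,
   or NULL if allocation fails. The caller must free() the result. */
char *concat_alloc(const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);
    char *out = malloc(la + lb + 1);   /* +1 for the terminating '\0' */
    if (out == NULL)
        return NULL;
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);       /* also copies b's terminator */
    return out;
}
```

Because the inputs are only read, both may be string literals, as in concat_alloc("string1", "string2").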

How strcpy works behind the scenes?

This may be a very basic question for some. I was trying to understand how strcpy works actually behind the scenes. for example, in this code
#include <stdio.h>
#include <string.h>
int main ()
{
char s[6] = "Hello";
char a[20] = "world isnsadsdas";
strcpy(s,a);
printf("%s\n",s);
printf("%d\n", sizeof(s));
return 0;
}
I am declaring s as a fixed-size array smaller than the source, so I thought it wouldn't print the whole string, but it did print world isnsadsdas. So I thought that strcpy might be allocating more space if the destination is smaller than the source. But when I check sizeof(s), it is still 6, yet it prints more than that. How does that actually work?

You've just caused undefined behaviour, so anything can happen. In your case, you're getting lucky and it's not crashing, but you shouldn't rely on that happening. Here's a simplified strcpy implementation (but it's not too far off from many real ones):
char *strcpy(char *d, const char *s)
{
char *saved = d;
while (*s)
{
*d++ = *s++;
}
*d = 0;
return saved;
}
sizeof is just returning you the size of your array from compile time. If you use strlen, I think you'll see what you expect. But as I mentioned above, relying on undefined behaviour is a bad idea.
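Since strcpy itself never looks at the destination size, the check has to happen before the copy. A small sketch of one way to do that follows; `copy_checked` is a made-up name, not a library function.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst only if it fits, including the terminator.
   Returns 1 on success; returns 0 and leaves dst untouched otherwise. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;
    if (needed > dst_size)
        return 0;
    memcpy(dst, src, needed);
    return 1;
}
```

With the arrays from the question, copy_checked(s, sizeof s, a) returns 0 instead of overflowing s.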

http://natashenka.ca/wp-content/uploads/2014/01/strcpy8x11.png
strcpy is considered dangerous for reasons like the one you are demonstrating. The two buffers you created are local variables stored in the stack frame of the function. Here is roughly what the stack frame looks like:
http://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/Call_stack_layout.svg/342px-Call_stack_layout.svg.png
FYI things are put on top of the stack meaning it grows backwards through memory (This does not mean the variables in memory are read backwards, just that newer ones are put 'behind' older ones). So that means if you write far enough into the locals section of your function's stack frame, you will write forward over every other stack variable after the variable you are copying to and break into other sections, and eventually overwrite the return pointer. The result is that if you are clever, you have full control of where the function returns. You could make it do anything really, but it isn't YOU that is the concern.
As you seem to know by making your first buffer 6 chars long for a 5 character string, C strings end in a null byte \x00. The strcpy function copies bytes until the source byte is 0, but it does not check that the destination is that long, which is why it can copy over the boundary of the array. This is also why your print is reading the buffer past its size: it reads until it finds \x00. Interestingly, the strcpy may have written into the data of a, depending on the order the compiler gave the arrays in the stack, so a fun exercise could be to also print a and see if you get something like 'snsadsdas'. I can't be sure what it would look like even if a is being polluted, because there are sometimes bytes in between the stack entries for various reasons.
If this buffer holds say, a password to check in code with a hashing function, and you copy it to a buffer in the stack from wherever you get it (a network packet if a server, or a text box, etc) you very well may copy more data from the source than the destination buffer can hold and give return control of your program to whatever user was able to send a packet to you or try a password. They just have to type the right number of characters, and then the correct characters that represent an address to somewhere in ram to jump to.
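You can use strcpy if you check the bounds and maybe trim the source string, but it is considered bad practice. There are functions that take a maximum length, like strncpy (http://www.cplusplus.com/reference/cstring/strncpy/), but note that strncpy does not write a terminating null byte when the source is at least as long as the limit, so you have to add one yourself.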
Oh and lastly, this is all called a buffer overflow. Some compilers can place a small randomly chosen value (often called a stack canary) between a function's local variables and its return address. Before the function returns, the generated code checks that value against a saved copy and terminates the program if they differ. This solves a lot of security problems, but it is still possible to copy bytes far enough into the stack to overwrite the pointer to the function that handles what happens when those bytes have been changed, thus letting you do the same thing. It just becomes a lot harder to do right.

In C there is no bounds checking of arrays; it's a trade-off in order to have better performance at the risk of shooting yourself in the foot.
strcpy() doesn't care whether the target buffer is big enough, so copying too many bytes will cause undefined behavior.
That is one of the reasons a new version of strcpy was introduced where you can specify the target buffer size: strcpy_s() (part of the optional Annex K of C11, so not available on every implementation).

Note that sizeof(s) is determined at compile time. Use strlen() to find the number of characters actually stored in s. When you perform strcpy(), the destination's contents are replaced by the source string, not appended to, so your output won't be "Helloworld isnsadsdas".
#include <stdio.h>
#include <string.h>
int main(void)
{
char s[6] = "Hello";
char a[20] = "world isnsadsdas";
strcpy(s,a); /* still overflows s: undefined behaviour, kept from the question */
printf("%s\n",s);
printf("%zu\n", strlen(s)); /* strlen returns size_t, so use %zu */
return 0;
}

You are relying on undefined behaviour, inasmuch as the compiler has chosen to place the two arrays where your code happens to work. This may not work in future.
As to the sizeof operator, this is figured out at compile time.
Once you use adequate array sizes you need to use strlen to fetch the length of the strings.

The best way to understand how strcpy works behind the scene is...reading its source code!
You can read the source for GLibC : http://fossies.org/dox/glibc-2.17/strcpy_8c_source.html . I hope it helps!

At the end of every string/character array there is a null terminator character '\0' which marks the end of the string/character array.
strcpy() performs its task until it sees the '\0' character.
printf() also performs its task until it sees the '\0' character.
sizeof() on the other hand is not interested in the content of the array, only its allocated size (how big it is supposed to be), thus not taking into consideration where the string/character array actually ends (how big it actually is).
As opposed to sizeof(), there is strlen() that is interested in how long the string actually is (not how long it was supposed to be) and thus counts the number of characters until it reaches the end ('\0' character) where it stops (it doesn't include the '\0' character).
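The difference can be seen side by side in a short sketch. The 20-byte array and the function name `size_vs_length` are assumptions picked for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Fills the two outputs for a 20-byte array holding a 5-character string. */
void size_vs_length(size_t *array_size, size_t *string_length)
{
    char s[20] = "Hello";
    *array_size = sizeof s;        /* 20: fixed by the declaration */
    *string_length = strlen(s);    /* 5: characters before the first '\0' */
}
```

Note that sizeof only reports the array size where the array itself is visible; applied to a char * parameter it would give the size of the pointer instead.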

Better Solution is
char *strcpy(char *p,char const *q)
{
char *saved=p;
while(*p++=*q++);
return saved;
}

Why do I need to allocate memory?

#include<stdio.h>
#include<stdlib.h>
void main()
{
char *arr;
arr=(char *)malloc(sizeof (char)*4);
scanf("%s",arr);
printf("%s",arr);
}
In the above program, do I really need to allocate the arr?
It is giving me the result even without using the malloc.
My second doubt: I am expecting an error in the 9th line, because I think it must be
printf("%s",*arr);
or something.

do I really need to allocate the arr?
Yes, otherwise you're dereferencing an uninitialised pointer (i.e. writing to a random chunk of memory), which is undefined behaviour.

do I really need to allocate the arr?
You need to set arr to point to a block of memory you own, either by calling malloc or by setting it to point to another array. Otherwise it points to a random memory address that may or may not be accessible to you.
In C, casting the result of malloc is discouraged [1]; it's unnecessary, and in some cases can mask an error if you forget to include stdlib.h or otherwise don't have a prototype for malloc in scope.
I usually recommend malloc calls be written as
T *ptr = malloc(N * sizeof *ptr);
where T is whatever type you're using, and N is the number of elements of that type you want to allocate. sizeof *ptr is equivalent to sizeof (T), so if you ever change T, you won't need to duplicate that change in the malloc call itself. Just one less maintenance headache.
It is giving me the result even without using the malloc
Because you don't explicitly initialize it in the declaration, the initial value of arr is indeterminate [2]; it contains a random bit string that may or may not correspond to a valid, writable address. The behavior on attempting to read or write through an invalid pointer is undefined, meaning the compiler isn't obligated to warn you that you're doing something dangerous. One of the possible outcomes of undefined behavior is that your code appears to work as intended. In this case, it looks like you're accessing a random segment of memory that just happens to be writable and doesn't contain anything important.
My second doubt: I am expecting an error in the 9th line because I think it must be printf("%s",*arr); or something.
The %s conversion specifier tells printf that the corresponding argument is of type char *, so printf("%s", arr); is correct. If you had used the %c conversion specifier, then yes, you would need to dereference arr with either the * operator or a subscript, such as printf("%c", *arr); or printf("%c", arr[i]);.
Also, unless your compiler documentation explicitly lists it as a valid signature, you should not define main as void main(); either use int main(void) or int main(int argc, char **argv) instead.
[1] The cast is required in C++, since C++ doesn't allow you to assign void * values to other pointer types without an explicit cast.
[2] This is true for pointers declared at block scope. Pointers declared at file scope (outside of any function) or with the static keyword are implicitly initialized to NULL.
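To make the recommended malloc idiom concrete, here is a brief sketch with T chosen as int. The function name `make_squares` and what it stores are invented for the example.

```c
#include <stdlib.h>

/* Allocates n ints and fills them with 0, 1, 4, 9, ...
   Returns NULL if n is 0 or allocation fails; the caller frees the result. */
int *make_squares(size_t n)
{
    if (n == 0)
        return NULL;
    int *ptr = malloc(n * sizeof *ptr);  /* no cast; size follows the pointee type */
    if (ptr == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        ptr[i] = (int)(i * i);
    return ptr;
}
```

If the element type later changes to, say, long, only the declaration of ptr needs editing; the malloc expression stays correct as written.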

Personally, I think this a very bad example of allocating memory.
A char * will take up, in a modern OS/compiler, at least 4 bytes, and on a 64-bit machine, 8 bytes. So you use at least four bytes to store the location of the four bytes for your three-character string. Not only that, but malloc will have overheads that probably add between 16 and 32 bytes to the actual allocated memory. So, we're using something like 20 to 40 bytes to store 4 bytes. That's 5-10 times more than it actually needs.
The code also casts the result of malloc, which is unnecessary in C.
And with only four bytes in the buffer, the chances of scanf overflowing are substantial.
Finally, there is no call to free to return the memory to the system.
It would be MUCH better to use:
int len;
char arr[5];
if (fgets(arr, sizeof(arr), stdin) != NULL) {
    len = strlen(arr);
    /* the last character is at index len - 1, not len */
    if (len > 0 && arr[len - 1] == '\n') arr[len - 1] = '\0';
}
This will not overflow the string, and only use 9 bytes of stackspace (not counting any padding...), rather than 4-8 bytes of stackspace and a good deal more on the heap. I added an extra character to the array, so that it allows for the newline. Also added code to remove the newline that fgets adds, as otherwise someone would complain about that, I'm sure.
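The same idea can be wrapped in a reusable helper. This is only a sketch: `read_line` is a hypothetical name, and it uses strcspn() to find the newline rather than indexing with the string length. It assumes size is at least 1 and fits in an int.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line from `in` into buf (at most size - 1 characters) and
   strips the trailing newline if present. Returns 1 on success, 0 on
   end of file or error. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* index of '\n', or of '\0' if none */
    return 1;
}
```

If a line is longer than the buffer, the rest of it stays in the stream and is returned by the next call, so nothing is written past the end of buf.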

In the above program, do I really need to allocate the arr?
You bet you do.
It is giving me the result even without using the malloc.
Sure, that's entirely possible... arr is a pointer. It points to a memory location. Before you do anything with it, it's uninitialized... so it's pointing to some random memory location. The key here is that wherever it's pointing is a place your program is not guaranteed to own. That means the scanf() may appear to work, storing the value at whatever location arr happens to point to, but that location may hold something else your program relies on, or may not be writable at all.
When you say malloc(X) you're telling the computer that you need X bytes of memory for your own usage that no one else can touch. Then when arr captures the data it will be there safely for your usage until you call free() (which you forgot to do in your program BTW)
This is a good example of why you should always initialize your pointers to NULL when you create them... it reminds you that you don't own what they're pointing at and you better point them to something valid before using them.
I am expecting an error in 9th line because I think it must be printf("%s",*arr)
Incorrect. scanf() wants an address, which is what arr holds; that's why you don't need to do scanf("%s", &arr). And printf's "%s" specifier wants a pointer to a null-terminated string of characters, which again is what arr is, so there is no need to dereference.

C char* pointers pointing to same location where they definitely shouldn't

I'm trying to write a simple C program on Ubuntu using Eclipse CDT (yes, I'm more comfortable with an IDE and I'm used to Eclipse from Java development), and I'm stuck with something weird. On one part of my code, I initialize a char array in a function, and it is by default pointing to the same location with one of the inputs, which has nothing to do with that char array. Here is my code:
char* subdir(const char input[], const char dir[]){
[*] int totallen = strlen(input) + strlen(dir) + 2;
char retval[totallen];
strcpy(retval, input);
strcat(retval, dir);
...}
OK, at the part I've marked with [*] there is a breakpoint. Even at that breakpoint, when I check my locals, I see that retval is pointing to the same address as my argument input. That should not even be possible, as input comes from another function and retval is created in this function. Is it me being inexperienced with C and missing something, or is there a bug somewhere in the C compiler?
It seems so obvious to me that they shouldn't point to the same location (and a valid one, of course; they aren't NULL). When the code goes on, it literally messes up everything; I get random characters and shapes in the console and the program crashes.

I don't think it makes sense to check the address of retval BEFORE it appears, it being a VLA and all (by definition the compiler and the debugger don't know much about it, it's generated at runtime on the stack).
Try checking its address after its point of definition.
EDIT
I just read the "I get random characters and shapes in console". It's obvious now that you are returning the VLA and expecting things to work.
A VLA is only valid inside the block where it was defined. Using it outside is undefined behavior and thus very dangerous. Even if the size were constant, it still wouldn't be valid to return it from the function. In this case you most definitely want to malloc the memory.

What cnicutar said.
I hate people who do this, so I hate me ... but ... Arrays of non-const size (VLAs) are a C99 feature and are not supported by standard C++. Of course GCC has extensions to make it happen.
Under the covers you are essentially doing an _alloca, so your odds of blowing out the stack are proportional to who has access to abuse the function.
Finally, I hope it doesn't actually get returned, because that would be returning a pointer to a stack allocated array, which would be your real problem since that array is gone as of the point of return.
In C++ you would typically use a string class.
In C you would either pass a pointer and length in as parameters, or a pointer to a pointer (or return a pointer) and specify that callers should call free() on it when done. These solutions all suck because they are prone to leaks, truncation, or overflow. :/

Well, your fundamental problem is that you are returning a pointer to the stack allocated VLA. You can't do that. Pointers to local variables are only valid inside the scope of the function that declares them. Your code results in Undefined Behaviour.
At least I am assuming that somewhere in the ..... in the real code is the line return retval.
You'll need to use heap allocation, or pass a suitably sized buffer to the function.
As well as that, you only need +1 rather than +2 in the length calculation - there is only one null-terminator.
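For the second option, passing a buffer in, a sketch could look like the following. The name `subdir_into` and its parameter order are assumptions, and snprintf() is used so that the write can never exceed the capacity the caller states.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes input followed by dir into out (capacity out_size bytes).
   Returns 1 if the whole result fit, 0 if it was truncated or on error. */
int subdir_into(char *out, size_t out_size, const char *input, const char *dir)
{
    int n = snprintf(out, out_size, "%s%s", input, dir);
    return n >= 0 && (size_t)n < out_size;
}
```

The caller decides where the buffer lives (a local array, a static buffer, or a malloc'd block), so nothing that belongs to the function's own stack frame is ever returned.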

Try changing retval to a character pointer and allocating your buffer using malloc().

Pass the two string arguments as char * or const char *.
Rather than returning char *, you should just pass another parameter with a string pointer that you already malloc'd space for.
Return bool or int describing what happened in the function, and use the parameter you passed to store the result.
Lastly don't forget to free the memory since you're having to malloc space for the string on the heap...
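#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// retstr is not const like the other two; the caller must make it large enough
bool subdir(const char *input, const char *dir, char *retstr){
    strcpy(retstr, input);
    strcat(retstr, dir);
    return true;
}
int main(void)
{
    char h[]="Hello ";
    char w[]="World!";
    char *greet=malloc(strlen(h)+strlen(w)+1); // size of the result plus room for the terminator
    if (greet == NULL) return 1; // allocation failed
    subdir(h,w,greet);
    printf("%s",greet);
    free(greet); // release the buffer once it is no longer needed
    return 0;
}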
This will print: "Hello World!" added together by your function.
Also, when you're creating a string on the fly and returning it from a function, you must malloc. A variable-length array such as char greet[totallen]; does compile in C99, but it lives on the stack and is gone once the function returns, so it cannot be handed back to the caller.