Is strdup adding a '\0' when duplicating the array of char? [closed] - c

I would like to know whether strdup adds a '\0' at the end of the new array if the src array does not contain one.
let's assume we have an array containing "hello" allocated by this kind of malloc
malloc(sizeof(char) * 5);
so 5 bytes for 5 characters.
I guess the src string did not receive sufficient memory for the '\0'.
What is supposed to happen in this case ?

First, the standard disclaimer: strdup was not part of the C standard for most of its history (it comes from POSIX and was only added to ISO C in C23), so exactly what it does could, at least in theory, vary from one implementation to the next. That said, every implementation I've seen of it has worked the same way.
strdup will determine the length of the input string--it'll start from the address in the pointer you pass, and find the first NUL byte after that location. Then it'll allocate a new block of memory and copy the bytes in that range to the newly allocated block.
So one of two things will happen. Either the input will contain a zero byte, and the result will too, or else strdup will read past the end of the input you passed, and you'll get undefined behavior (but chances are pretty good it'll find a zero byte eventually, and copy a bunch of extra garbage to the duplicate string).
One other minor note: if you use strdup, and then try to port your code to a compiler that doesn't define it, you might consider writing your own:
#include <stdlib.h>
#include <string.h>

char *strdup(char const *s) {
    size_t len = strlen(s) + 1;   /* +1 for the terminating '\0' */
    char *ret = malloc(len);
    if (ret != NULL)
        strcpy(ret, s);
    return ret;
}
That's obviously a pretty easy thing to do, but it has one other problem: including it in your code produces undefined behavior. You're not allowed to define a function whose name starts with str followed by a lowercase letter; those names are all reserved for the implementation. So even though the function is simple and the behavior of its content is perfectly well defined, the mere existence of the function as a whole still gives undefined behavior.
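One way around the reserved-name problem is to give the function a name outside the str namespace. A minimal sketch, assuming the hypothetical name dup_string (it is not a library function):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* dup_string: hypothetical name, chosen to avoid the reserved str* prefix. */
char *dup_string(const char *s)
{
    assert(s != NULL);               /* s must point to a terminated string */

    size_t size = strlen(s) + 1;     /* characters plus the terminating '\0' */
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, s, size);       /* the terminator is copied as well */
    return copy;                     /* NULL if allocation failed; caller frees */
}
```

The caller owns the result and should pass it to free() when done.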

What strdup() will do in this case is start with the string passed, and go on looking through memory until it either falls off the end of allocated memory (and you get a SIGSEGV or similar) or finds a byte that happens to contain a '\0'.
It will allocate enough memory to include a copy of everything it scanned, including the '\0', and then copy everything.
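To stay out of that situation, the source buffer needs one byte more than the number of characters it holds. A small sketch, assuming a hypothetical helper make_terminated that builds a proper string from n raw characters:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* make_terminated: hypothetical helper that copies n characters and appends '\0'. */
char *make_terminated(const char *chars, size_t n)
{
    assert(chars != NULL);

    char *s = malloc(n + 1);         /* n characters + 1 byte for '\0' */
    if (s == NULL)
        return NULL;
    memcpy(s, chars, n);
    s[n] = '\0';                     /* now strlen and strdup have a place to stop */
    return s;
}
```

For "hello" this allocates 6 bytes rather than 5, and the result can safely be handed to strdup.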

Related

The use of strcat in C overwrites irrelevant strings

The code is as follows:
char seg1[] = "abcdefgh";
char seg2[] = "ijklmnop";
char seg3[] = "qrstuvwx";
strcat(seg2, seg3);
Then the value stored in seg1 will become:
"rstuvwx\0\0"
I have learned that strings declared next to each other are also adjacent in the stack area, but I forgot the details.
I guess the memory address of seg1 was overwritten when strcat() was executed, but I'm not sure about the specific process. Can someone tell me the specific process of this event? Thanks
C does not have a string class, it has character arrays which may be used as strings by appending a null terminator. And since there is no string class, all memory management of strings/arrays must be done manually.
char seg1[] = "abcdefgh"; Allocates space for exactly 8 characters and 1 null terminator. There is no room to append anything else at the end. If you try anyway, that's the realm of undefined behavior, where anything can happen. Crashes, overwriting other variables, program ceasing to function as expected and so on.
Solve this by allocating enough space to append something at the end, for example
char seg1[50] = "abcdefgh";
(and likewise for seg2, the array that the question actually appends to). Alternatively, allocate a new, third array and copy the strings into that one.
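The size check that strcat does not perform can be done by hand before appending. A minimal sketch, assuming a hypothetical helper append_checked that refuses to append when the destination array is too small:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* append_checked: hypothetical helper.
   Returns 0 on success, -1 if src will not fit into dst. */
int append_checked(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t used = strlen(dst);        /* dst must already hold a terminated string */
    size_t extra = strlen(src);
    if (used + extra + 1 > dst_size)  /* +1 for the terminating '\0' */
        return -1;
    memcpy(dst + used, src, extra + 1);
    return 0;
}
```

With char seg2[32] = "ijklmnop"; the call append_checked(seg2, sizeof seg2, seg3) succeeds; with char seg2[] = "ijklmnop"; (9 bytes) it returns -1 and leaves seg2 untouched.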

Different input types for fscanf [closed]

My understanding of fscanf:
grabs a line from a file and based on format, stores it to a string.
That being said, there are three (seemingly different) ways to pass "strings" around(array of chars).
Some assumptions:
1. fp is a valid FILE pointer.
2. The file has 1 line in it that reads "Something"
A pointer with allocated memory
char* temp = malloc(sizeof(char) * 1); // points to some small part in mem.
int resp = fscanf(fp,"%s", temp);
printf("Trying to print: %s\n",temp); // prints "Something" (that's what's in the file)
An array with predefined length (it's different from the pointer!)
char temp[100]; // this buffer MUST be big enough, or we get segmentation fault
int resp = fscanf(fp,"%s", temp);
printf("Trying to print: %s\n",temp); // prints "Something" (that's what's in the file)
A null pointer
char* temp; // null pointer
int resp = fscanf(fp,"%s", temp);
printf("Trying to print: %s\n",temp); // Crashes, segmentation fault
So a few questions have arisen!
How can a pointer with malloc of 1 contain longer texts?
Since the pointer's content doesn't seem to matter, why does a null pointer crash? I would expect the allocated pointer to crash as well, since it points to a small piece of memory.
Why does the pointer work, but an array (char temp[1];) crashes?
Edit:
I'm well aware that you need to pass a big enough buffer to contain the data from the line, I was wondering why it was still working and not crashing in other situations.
My understanding of fscanf:
grabs a line from a file and based on format, stores it to a string.
No, that contains some serious and important misconceptions. fscanf() reads from a file as directed by the specified format, so as to assign values to some or all of the objects pointed-to by its third and subsequent arguments. It does not necessarily read a whole line, but on the other hand, it may read more than one.
In your particular usage,
int resp = fscanf(fp,"%s", temp);
, it attempts to skip any leading whitespace, including but not limited to empty and blank lines, then read characters into the pointed-to character array, up to the first whitespace character or the end of the file. Under no circumstance will it consume the line terminator of the line from which it populates the array contents, but it will not even get that far if there is other whitespace on the line following at least one non-whitespace character (though that is not the case in the particular sample input you describe).
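Because %s stops only at whitespace or end of file, a field width is the usual way to keep it inside the buffer. A minimal sketch, assuming a hypothetical helper read_word and a hypothetical limit WORD_MAX of 15 characters:

```c
#include <assert.h>
#include <stdio.h>

#define WORD_MAX 15   /* hypothetical limit: longest word we accept */

/* read_word: hypothetical helper. buf must have room for WORD_MAX + 1 bytes.
   Returns fscanf's result: 1 if a word was stored, EOF at end of input. */
int read_word(FILE *fp, char *buf)
{
    assert(fp != NULL && buf != NULL);
    /* The width 15 must match WORD_MAX; fscanf appends the '\0' itself. */
    return fscanf(fp, "%15s", buf);
}
```

A caller would declare char buf[WORD_MAX + 1]; and call read_word(fp, buf). A longer word is split: the first 15 characters are stored and the rest stays in the stream for the next read.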
That being said, there are three (seemingly different) ways to pass "strings" around(array of chars).
Strings are not an actual data type in C. Arrays of chars are, but such arrays are not "strings" in the C sense unless they contain at least one null character. Furthermore, in that case, C string functions for the most part operate only on the portions of such arrays up to and including the first null, so it is those portions that are best characterized as "strings".
There is more than one way to obtain storage for character sequences that can be considered strings, but there is only one way to pass them around: by means of a pointer to their first character. Whether you obtain storage by declaring a character array, by a string literal, or by allocating memory for it, the contents are accessed only via pointers. Even when you declare a char array and access elements by applying the index operator, [], to the name of the array variable, you are actually still using a pointer to access the contents.
How can a pointer with malloc of 1 contain longer texts?
A pointer does not contain anything but itself. It is the space it points to that contains anything else, such as text. If you allocate only one byte, then the allocated space can contain only one byte. If you overrun that one byte by attempting to write a longer character sequence where the pointer points, then you invoke undefined behavior. In particular, C does not guarantee that an error will be generated, or that the program will fail to behave as you expect, but all manner of havoc can ensue, without limit.
Since the pointer content doesn't seem to matter, why does a null pointer crash? I would expect the allocated pointer to crash as well, since it points to a small piece of memory.
Attempting to dereference an invalid pointer, including, but not limited to a null pointer, also produces undefined behavior. A crash is well within the realm of possible behaviors. C does not guarantee a crash in that case, but that's reliably provided by some implementations.
Why does the pointer work, but an array(char temp[1];) crashes?
You do not demonstrate your 1-character array alternative, but again, overrunning the bounds of the object -- in this case an array -- produces undefined behavior. It is undefined so it is not justified to suppose that the behavior would be the same as for overrunning the bounds of an allocated object, or even that either one of those behaviors would be consistent.
That being said, there are three (seemingly different) ways to pass "strings" around(array of chars).
For passing a C-"string" to scanf() & friends there is just one way: Pass it the address of enough valid memory.
If you don't, the code invokes the infamous Undefined Behaviour, which means anything can happen, from a crash to seemingly running fine.
How can a pointer with malloc of 1 contain longer texts?
In theory, it can't without causing undefined behavior. In practice, however, when you allocate a single byte, the allocator gives you a small chunk of memory of the smallest size it supports, which is usually sufficient for 8..10 characters without causing a crash. The additional memory serves as a "padding" that prevents a crash (but it is still undefined behavior).
Since the pointer content doesn't seem to matter, why does a null pointer crash, I would expect the allocated pointer to crash as well, since it points to a small piece of memory.
A null pointer, on the other hand, does not provide room even for an empty string, because you need space for the null terminator. Hence, writing through it is guaranteed UB, which manifests itself as a crash on most platforms.
Why does the pointer work, but an array(char temp[1]) crashes?
Because arrays are allocated without any extra "padding" memory after them. Note that a crash is not guaranteed, because the array may be followed by unused bytes of memory, which your string could corrupt without any consequences.
Because a null pointer does not point to any memory at all.
When you request a small piece of memory with malloc, it is carved out of a region called the "heap". The allocator typically obtains that region from the operating system in large units (pages, usually several KB) and hands out chunks that are rounded up to some minimum size and alignment.
So when you allocate a single byte with malloc, the chunk you get is usually a little larger than what you requested, and writing (or reading) a few bytes past the end often appears to work. It is still undefined behavior, though: nothing guarantees the extra space, and the overrun may corrupt the allocator's own bookkeeping.
A null pointer points to address 0, which on most platforms is deliberately left unmapped so that it can't be read from or written to. C itself doesn't guarantee a crash, but in practice writing through a null pointer usually ends in a segmentation fault.
Small arrays may crash more often than malloc'ed blocks because they aren't allocated from the heap (local arrays typically live on the stack) and come without any extra space after them, so writing past the end is more likely to hit something important. However, they are often followed by unused memory, so sometimes your program doesn't crash but ends up with corrupted data instead.

Overflow not detected when writing nul character in middle of string?

Say I have the code:
char* word = malloc (sizeof(char) * 6);
strcpy(word, "hello\0extra");
puts(word);
free(word);
This compiles just fine and Valgrind has no issue, but is there actually a problem? It seems like I am writing into memory that I don't own.
Also, a separate issue, but when I do overfill my buffer with something like
char* word = malloc (sizeof(char) * 6);
strcpy(word, "1234567\0");
puts(word);
free(word);
It prints out 1234567 and Valgrind does catch the problem. What are the consequences of doing something like this? It seems to work every time. Please correct me if this is wrong, but from what I understand, it is possible for another program to take the memory past the 6 and write into it. If that happened, would printing the word just go on forever until it encounters a nul character? That character has just been really confusing for me in learning C strings.
The first strcpy is okay
strcpy(word, "hello\0extra");
You create a char array constant and pass a pointer to it to strcpy. All characters up to and including the first \0 are copied; the remainder is ignored.
But wait... You have some extra characters. This makes your const data section a bit larger. That could be a problem in an embedded environment where flash space is scarce. But there is no run-time problem.
strcpy(word, "hello\0extra");
This is valid because the second parameter must be a well-formed string, and it is: the \0 as the 6th character forms a string of length 5.
strcpy(word, "1234567\0");
Here you are writing to memory which you don't own (you did not allocate it), so this is undefined behavior and might cause a crash (seg fault).
With your first call to strcpy, there is a NUL in the middle of the string literal. strcpy, like every function that deals with null-terminated strings, treats the string as ending at the first NUL, so it copies only "hello" and its terminator (6 bytes, which is exactly what you allocated) and ignores the rest. free releases the whole block, and valgrind does not report a problem, because the allocator keeps track of the size of each block itself (typically in bookkeeping data stored with the allocation). In other words, malloc and free do not deal with null-terminated strings, so the NUL in the middle of the string does not affect them. Instead, free determines the size of the block from how many bytes you allocated in the first place.
With the second example, you overflow the end of the buffer that was allocated by malloc. The results of that are undefined. In theory, the memory that you are writing to could belong to another malloc allocation or to the allocator's own bookkeeping, but in your example nothing else happens to use the memory just past your buffer, so the overflow goes unnoticed. The string-processing functions think of your string as ending at the first NUL, not at the end of the buffer allocated by malloc, so all of the string is printed out.
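The difference between the two calls comes down to how many bytes strcpy writes: strlen(src) + 1. A small sketch, assuming a hypothetical helper fits_in that performs this check before copying:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* fits_in: hypothetical helper. Non-zero if strcpy(dst, src) would stay
   inside a destination of dst_size bytes. */
int fits_in(size_t dst_size, const char *src)
{
    assert(src != NULL);
    return strlen(src) + 1 <= dst_size;   /* strcpy writes strlen(src) + 1 bytes */
}
```

fits_in(6, "hello\0extra") is true because strlen sees only "hello", while fits_in(6, "1234567\0") is false because 8 bytes would be written into a 6-byte block.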
Your first question has a couple good answers already. About your second question, on the consequences of writing one byte past the end of your malloced memory:
It's doubtful that mallocing 6 bytes and writing 7 into it will cause a crash. malloc likes to align memory on certain boundaries, so it's not likely to give you six bytes right at the end of a page, such that there would be an access violation at byte 7. But if you malloc 65536 bytes and try to write past the end of that, your program might crash. Writing to invalid memory works a lot of the time, which makes debugging tricky, because you get random crashes only in certain situations.

How does the program identify the end of an array pointed to by a pointer and created dynamically? [closed]

Would you like to help me to understand the mechanism of "pointers" in C:
How does the program identify the end of an array, which was dynamically allocated and pointed at by a pointer (example: montext1)? Where are these arrays stored in RAM (probably not in data, not in stack, perhaps in the heap)?
A pointer is defined by a type and a size: how is this implemented in RAM for a dynamic allocation like in the example below?
#include <stdio.h>
char * gettext()
{
char *text;
printf("Text:");
scanf("%s", &text);
printf("\n");
return text;
}
int main()
{
char *montext1 = gettext();
char *montext2 = gettext();
}
Your program is very wrong, and has undefined behavior. So it's not a very good starting point for discussion.
There is no "dynamic allocation" in your program, only chaotic overwriting of random memory.
It should use heap allocation, i.e.:
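#include <stdio.h>
#include <stdlib.h>

char * gettext(void)
{
    char *s;
    printf("Text:");
    fflush(stdout);                      /* make sure the prompt is shown before reading */
    if((s = malloc(256)) != NULL)
    {
        /* fgets writes at most 256 bytes, including the terminating '\0' */
        if(fgets(s, 256, stdin) == NULL)
        {
            free(s);                     /* no input: release the buffer */
            s = NULL;
        }
    }
    return s;
}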
The caller must free() the returned string, and check for NULL before printing it.
The problem here is that the CPU does not know where the data pointed to by a pointer ends. For the CPU it's just raw bytes in memory, which can be either application bytes or data entered by the user. However, the compiled C code, via the C standard library, knows that a string (char*) is supposed to end with a zero byte. That's how it knows where the end is.
But in the gettext function you need to allocate some memory too, via malloc (or calloc), because in its current state your application is writing into memory which it does not own. And of course, the caller of gettext needs to free the memory.
And finally: a pointer is just an address in the memory, it points to some bytes. It is the role of the application to interpret those bytes in the proper way, such as identify zero terminated strings.
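For data that is not a zero-terminated string there is no marker to look for, so a program that needs the size of an allocated block has to remember it next to the pointer. A minimal sketch, assuming a hypothetical byte_buffer type and helper names:

```c
#include <assert.h>
#include <stdlib.h>

/* byte_buffer: hypothetical type pairing a heap block with its size. */
struct byte_buffer {
    char  *data;
    size_t size;   /* number of bytes requested from malloc */
};

struct byte_buffer buffer_create(size_t size)
{
    struct byte_buffer b = { malloc(size), size };
    if (b.data == NULL)
        b.size = 0;          /* allocation failed: nothing usable */
    return b;
}

void buffer_release(struct byte_buffer *b)
{
    assert(b != NULL);
    free(b->data);           /* free(NULL) is a no-op */
    b->data = NULL;
    b->size = 0;
}
```

Code that receives a struct byte_buffer can check indexes against size instead of guessing where the block ends.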
How does the program identify the end of an array, which was dynamically allocated and pointed at by a pointer
This is handled internally by the dynamic memory allocation library routines and it is handled differently on every implementation. The actual code could be in stdlib or it could be in an OS API. So how it is done depends on compiler and OS both.
Where are these arrays stored in RAM
If they were dynamically allocated, they were stored on the heap.
A pointer is defined by a type and a size
No, a pointer is a type, end of story.
how is this implemented in RAM for a dynamic allocation like in the example below
You linked no example containing dynamic allocation. The code you posted is nonsense code. It passes scanf the address of the pointer variable itself, so the input is written over the pointer (and whatever follows it in memory) rather than into any buffer. This doesn't make any sense and is undefined behavior, so the program will most likely crash and burn.
If you rewrite the program so that scanf("%s", &text); is replaced by scanf("%s", text);, then you attempt to copy data to wherever the uninitialized pointer happens to point, which is indeterminate. This is undefined behavior and will most likely also cause your program to crash and burn.

How strcpy works behind the scenes?

This may be a very basic question for some. I was trying to understand how strcpy works actually behind the scenes. for example, in this code
#include <stdio.h>
#include <string.h>
int main ()
{
char s[6] = "Hello";
char a[20] = "world isnsadsdas";
strcpy(s,a);
printf("%s\n",s);
printf("%d\n", sizeof(s));
return 0;
}
As I am declaring s to be a static array with size less than that of the source, I thought it won't print the whole word, but it did print world isnsadsdas. So I thought that this strcpy function might be allocating new size if the destination is smaller than the source. But now, when I check sizeof(s), it is still 6, but it is printing out more than that. How is that working actually?
You've just caused undefined behaviour, so anything can happen. In your case, you're getting lucky and it's not crashing, but you shouldn't rely on that happening. Here's a simplified strcpy implementation (but it's not too far off from many real ones):
char *strcpy(char *d, const char *s)
{
    char *saved = d;
    while (*s)
    {
        *d++ = *s++;
    }
    *d = 0;
    return saved;
}
sizeof is just returning you the size of your array from compile time. If you use strlen, I think you'll see what you expect. But as I mentioned above, relying on undefined behaviour is a bad idea.
http://natashenka.ca/wp-content/uploads/2014/01/strcpy8x11.png
strcpy is considered dangerous for reasons like the one you are demonstrating. The two buffers you created are local variables stored in the stack frame of the function. Here is roughly what the stack frame looks like:
http://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/Call_stack_layout.svg/342px-Call_stack_layout.svg.png
FYI things are put on top of the stack meaning it grows backwards through memory (This does not mean the variables in memory are read backwards, just that newer ones are put 'behind' older ones). So that means if you write far enough into the locals section of your function's stack frame, you will write forward over every other stack variable after the variable you are copying to and break into other sections, and eventually overwrite the return pointer. The result is that if you are clever, you have full control of where the function returns. You could make it do anything really, but it isn't YOU that is the concern.
As you seem to know by making your first buffer 6 chars long for a 5 character string, C strings end in a null byte \x00. The strcpy function copies bytes until the source byte is 0, but it does not check that the destination is that long, which is why it can copy over the boundary of the array. This is also why your print is reading the buffer past its size: it reads until \x00. Interestingly, the strcpy may have written into the data of a, depending on the order the compiler gave the two arrays in the stack, so a fun exercise could be to also print a and see if you get something like 'snsadsdas' (but I can't be sure what it would look like even if it is polluting a, because there are sometimes bytes in between the stack entries for various reasons).
If this buffer holds say, a password to check in code with a hashing function, and you copy it to a buffer in the stack from wherever you get it (a network packet if a server, or a text box, etc) you very well may copy more data from the source than the destination buffer can hold and give return control of your program to whatever user was able to send a packet to you or try a password. They just have to type the right number of characters, and then the correct characters that represent an address to somewhere in ram to jump to.
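You can use strcpy if you check the bounds and maybe trim the source string, but it is considered bad practice. There are functions that take a max length, like http://www.cplusplus.com/reference/cstring/strncpy/, but note that strncpy does not write a terminating null byte when the source is at least as long as that maximum, so you have to terminate the destination yourself.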
Oh and lastly, this is all called a buffer overflow. Some compilers add a nice little blob of bytes, randomly chosen at program startup (a "stack canary"), between the local buffers and the return address. Before the function returns, compiler-generated code checks these bytes against a saved copy and terminates the program if they differ. This solves a lot of security problems, but it is still possible to copy bytes far enough into the stack to overwrite the pointer to the function that handles what happens when those bytes have been changed, thus letting you do the same thing. It just becomes a lot harder to do right.
In C there is no bounds checking of arrays; it's a trade-off in order to have better performance at the risk of shooting yourself in the foot.
strcpy() doesn't care whether the target buffer is big enough so copying too many bytes will cause undefined behavior.
That is one of the reasons a newer variant of strcpy was introduced in which you specify the target buffer size: strcpy_s() (part of the optional Annex K of C11, so not every implementation provides it).
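Where strcpy_s is not available, a size-limited copy can be built from snprintf, which is part of standard C since C99. A minimal sketch, assuming a hypothetical helper copy_bounded:

```c
#include <assert.h>
#include <stdio.h>

/* copy_bounded: hypothetical helper. Copies src into dst without writing
   more than dst_size bytes. Returns 1 if src was truncated, 0 otherwise. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    /* snprintf never writes more than dst_size bytes and always terminates dst. */
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed < 0 || (size_t)needed >= dst_size;
}
```

With char s[6], copy_bounded(s, sizeof s, a) stores "world" plus the terminator and reports the truncation instead of overrunning the array.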
Note that sizeof(s) is determined at compile time. Use strlen() to find the number of characters s holds. When you perform strcpy(), the contents of the destination are replaced by the source string, so your output won't be "Helloworld isnsadsdas".
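#include <stdio.h>
#include <string.h>
int main(void)
{
    char s[6] = "Hello";
    char a[20] = "world isnsadsdas";
    strcpy(s,a);                 /* still overflows s: undefined behaviour, as in the question */
    printf("%s\n",s);
    printf("%zu\n", strlen(s));  /* strlen returns size_t, so use %zu rather than %d */
    return 0;
}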
You are relying on undefined behaviour inasmuch as the compiler has chosen to place the two arrays where your code happens to work. This may not work in future.
As to the sizeof operator, this is figured out at compile time.
Once you use adequate array sizes you need to use strlen to fetch the length of the strings.
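The difference between the two can be seen on an array that is large enough for its contents. A short sketch; both function names are hypothetical and exist only for this illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Both helpers are hypothetical names used only for this illustration. */
size_t array_capacity(void)
{
    char s[20] = "Hello";
    return sizeof s;        /* size of the array object: always 20 */
}

size_t string_length(void)
{
    char s[20] = "Hello";
    assert(s[5] == '\0');   /* the initializer zero-fills the rest of s */
    return strlen(s);       /* characters before the first '\0': 5 */
}
```

sizeof reports the storage the compiler reserved, while strlen walks the bytes at run time until it finds the terminator.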
The best way to understand how strcpy works behind the scene is...reading its source code!
You can read the source for GLibC : http://fossies.org/dox/glibc-2.17/strcpy_8c_source.html . I hope it helps!
At the end of every C string there is a null terminator character '\0' which marks where the string ends (a character array that lacks one is not a string).
strcpy() performs its task until it sees the '\0' character.
printf() also performs its task until it sees the '\0' character.
sizeof() on the other hand is not interested in the content of the array, only its allocated size (how big it is supposed to be), thus not taking into consideration where the string/character array actually ends (how big it actually is).
As opposed to sizeof(), there is strlen() that is interested in how long the string actually is (not how long it was supposed to be) and thus counts the number of characters until it reaches the end ('\0' character) where it stops (it doesn't include the '\0' character).
A better solution is
char *strcpy(char *p,char const *q)
{
    char *saved=p;
    while((*p++=*q++) != '\0')
        ;   /* copies every character, including the terminating '\0' */
    return saved;
}
