Why am I able to copy more bytes than defined in the char array? [duplicate] - c

This question already has answers here:
Why doesn't my program crash when I write past the end of an array?
(9 answers)
Closed 3 years ago.
I have the following code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* for strcpy */

int main()
{
    char buffer[2];
    strcpy(buffer, "12345678910");
    printf("%s\n", buffer);
    return 0;
}
Since I have already defined the char array with size 2, I shouldn't be able to put in more than 2 chars plus a null terminating character. Yet it is able to take more than that without any buffer overflow or segmentation fault. Even if I copy the string strcpy(buffer, "123456789101298362936129736129369182");, it works fine. The error only appears when I use strcpy(buffer, "1234567891012983629361297361293691823691823869182632918263918");.
More of a theoretical question than a practical one, but I hope it helps new and experienced programmers alike, since it touches on the fundamentals and helps improve coding practice. Thanks in advance.

The simple answer is that C does not protect you from yourself. It's YOUR responsibility to check boundaries. The program will happily read and write wherever you instruct it to. However, the operating system may object if you do this, usually with a "segmentation fault". A worse scenario is that it may silently overwrite other variables.
This is a source of many bugs in C, so be careful. Whenever you write outside a buffer, you invoke undefined behavior, which can manifest in various ways, including the program working as it should, variables being overwritten, and segmentation faults.
I shouldn't be able to put in more than 2 char plus null terminating character
This is a common misconception. It's NOT "plus a null terminating character". It's INCLUDING the null terminating character: a char array of size 2 has room for just one character plus the terminator.
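For instance, here is a minimal sketch of the same program with a buffer that actually fits the string; sizeof on the string literal counts the terminator, so the array gets 12 bytes for the 11 characters:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char buffer[sizeof "12345678910"];  /* 12 bytes: 11 chars + '\0' */
    strcpy(buffer, "12345678910");      /* now stays in bounds */
    printf("%s\n", buffer);
    return 0;
}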

Related

Strings and Dynamic allocation in C [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Undefined, unspecified and implementation-defined behavior
This should seg fault. Why doesn't it?
#include <string.h>
#include <stdio.h>

char str1[] = "Sample string. Sample string. Sample string. Sample string. Sample string. ";
char str2[2];

int main ()
{
    strcpy (str2,str1);
    printf("%s\n", str2);
    return 0;
}
I am using gcc version 4.4.3 with the following command:
gcc -std=c99 testString.c -o test
I also tried setting optimisation to 0 (-O0).
This should seg fault
There's no reason it "should" segfault. The behaviour of the code is undefined. This does not mean it necessarily has to crash.
A segmentation fault only occurs when you access memory that the operating system knows you're not supposed to access.
So what's likely going on is that the OS allocates memory in pages (typically 4 KiB). str2 is probably on the same page as str1, and you're not running off the end of the page, so the OS doesn't notice.
That's the thing about undefined behavior. Anything can happen. Right now, that program actually "works" on your machine. Tomorrow, str2 may be put at the end of a page, and then segfault. Or possibly, you'll overwrite something else in memory, and have completely unpredictable results.
edit: how to cause a segfault:
Two ways. One is still undefined behavior, the other is not.
int main() {
    *((volatile char *)0) = 42; /* undefined behavior, but normally segfaults */
}
Or to do it in a defined way:
#include <signal.h>

int main() {
    raise(SIGSEGV); /* segfault using defined behavior */
}
edit: third and fourth way to segfault
Here is a variation of the first method using strcpy:
#include <string.h>

const char src[] = "hello, world";

int main() {
    strcpy(0, src); /* undefined */
}
And this variant only crashes for me with -O0:
#include <string.h>

const char src[] = "hello, world";

int main() {
    char too_short[1];
    strcpy(too_short, src); /* smashes stack; undefined */
}
Your program writes beyond the allocated bounds of the array, and this results in undefined behavior.
The behavior of the program is undefined: it might crash or it might not, and an explanation may or may not be possible.
It probably doesn't crash because it overwrites memory beyond the array bounds that is not currently being used, but it will once the rightful owner of that memory tries to access it.
A seg-fault is NOT guaranteed behavior. It is one possible (and sometimes likely) outcome of doing something bad. Another possible outcome is that it works by pure luck. A third possible outcome is nasal demons.
If you really want to find out what this might be corrupting, I would suggest you look at what follows the overwritten memory. Generating a linker map file should give you a fair idea, though this all depends on how things are laid out in memory. You can also run the program under gdb to see why it does or does not segfault. That said, the granularity of built-in (hardware-assisted) access-violation checks cannot be finer than a page unless some software magic is thrown in; even with page-granularity checking, it may happen that the page immediately following really does belong to your program and is writable. Tools like Valgrind and libefence can detect such out-of-bounds accesses in software: Valgrind tracks the validity of every address the program touches, and libefence places each allocation next to an inaccessible guard page so that running off the end faults immediately.
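To illustrate the page-granularity point, here is a minimal sketch of the guard-page trick that libefence relies on, assuming a POSIX system with mmap() and mprotect() (the sizes and names here are just for the demonstration):

#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);

    /* Map two writable pages, then revoke all access to the second one
       so it acts as an inaccessible guard page. */
    char *buf = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;
    mprotect(buf + page, page, PROT_NONE);

    buf[page - 1] = 'x';  /* last byte of the first page: fine */
    buf[page]     = 'x';  /* one byte past it: reliable segfault */
    return 0;
}

An overflow that lands inside the same writable page goes unnoticed; only when the write crosses into a protected page does the OS step in.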

Confusion in "strcat function in C assumes the destination string is large enough to hold contents of source string and its own."

So I read that the strcat function is to be used carefully because the destination string must be large enough to hold its own contents plus those of the source string. And that was true for the following program that I wrote:
#include <stdio.h>
#include <string.h>

int main(){
    char *src, *dest;
    printf("Enter Source String : ");
    fgets(src, 10, stdin);
    printf("Enter destination String : ");
    fgets(dest, 20, stdin);
    strcat(dest, src);
    printf("Concatenated string is %s", dest);
    return 0;
}
But not true for the one that I wrote here:
#include <stdio.h>
#include <string.h>

int main(){
    char src[11] = "Hello ABC";
    char dest[15] = "Hello DEFGIJK";
    strcat(dest, src);
    printf("concatenated string %s", dest);
    getchar();
    return 0;
}
This program ends up concatenating both without considering that the destination string is not large enough. Why is that?
The strcat function has no way of knowing exactly how long the destination buffer is, so it assumes that the buffer passed to it is large enough. If it's not, you invoke undefined behavior by writing past the end of the buffer. That's what's happening in the second piece of code.
The first piece of code is also invalid because both src and dest are uninitialized pointers. When you pass them to fgets, it reads whatever garbage value they contain, treats it as a valid address, then tries to write values to that invalid address. This is also undefined behavior.
One of the things that makes C fast is that it doesn't check to make sure you follow the rules. It just tells you the rules and assumes that you follow them, and if you don't, bad things may or may not happen. In your particular case it appeared to work, but there's no guarantee of that.
For example, when I ran your second piece of code it also appeared to work. But if I changed it to this:
#include <stdio.h>
#include <string.h>

int main(){
    char dest[15] = "Hello DEFGIJK";
    strcat(dest, "Hello ABC XXXXXXXXXX");
    printf("concatenated string %s", dest);
    return 0;
}
The program crashes.
I think your confusion is not actually about the definition of strcat. Your real confusion is that you assumed that the C compiler would enforce all the "rules". That assumption is quite false.
Yes, the first argument to strcat must be a pointer to memory sufficient to store the concatenated result. In both of your programs, that requirement is violated. You may be getting the impression, from the lack of error messages in either program, that perhaps the rule isn't what you thought it was, that somehow it's valid to call strcat even when the first argument is not a pointer to enough memory. But no, that's not the case: calling strcat when there's not enough memory is definitely wrong. The fact that there were no error messages, or that one or both programs appeared to "work", proves nothing.
Here's an analogy. (You may even have had this experience when you were a child.) Suppose your mother tells you not to run across the street, because you might get hit by a car. Suppose you run across the street anyway, and do not get hit by a car. Do you conclude that your mother's advice was incorrect? Is this a valid conclusion?
In summary, what you read was correct: strcat must be used carefully. But let's rephrase that: you must be careful when calling strcat. If you're not careful, all sorts of things can go wrong, without any warning. In fact, many style guides recommend not using functions such as strcat at all, because they're so easy to misuse if you're careless. (Functions such as strcat can be used perfectly safely as long as you're careful -- but of course not all programmers are sufficiently careful.)
The strcat() function is indeed to be used carefully because it doesn't protect you from anything. If the source string isn't null-terminated, the destination string isn't null-terminated, or the destination string doesn't have enough space, strcat will still copy data. Therefore, it is easy to overwrite data you didn't mean to overwrite. It is your responsibility to make sure you have enough space. Using strncat() instead of strcat() will also give you some extra safety.
Edit Here's an example:
#include <stdio.h>
#include <string.h>

int main()
{
    char s1[16] = {0};
    char s2[16] = {0};
    strcpy(s2, "0123456789abcdefOOPS WAY TOO LONG");
    /* ^^^ purposefully copy too much data into s2 */
    printf("-%s-\n", s1);
    return 0;
}
I never assigned to s1, so the output should ideally be --. However, because of how the compiler happened to arrange s1 and s2 in memory, the output I actually got was -OOPS WAY TOO LONG-. The strcpy(s2,...) overwrote the contents of s1 as well.
On gcc, -Wall or -Wstringop-overflow will help you detect situations like this one, where the compiler knows the size of the source string. However, in general, the compiler can't know how big your data will be. Therefore, you have to write code that makes sure you don't copy more than you have room for.
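For example, here is a minimal sketch of the second program rewritten with strncat(); note that strncat's limit is the number of characters left in the destination, not sizeof dest, which is itself a common pitfall:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char dest[15] = "Hello DEFGIJK";
    const char src[] = "Hello ABC";

    /* Room left for appended characters, reserving 1 byte for the '\0'. */
    size_t room = sizeof dest - strlen(dest) - 1;
    strncat(dest, src, room);   /* truncates instead of overflowing */

    printf("%s\n", dest);       /* prints "Hello DEFGIJKH": truncated, but in bounds */
    return 0;
}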
Both snippets invoke undefined behavior - the first because src and dest are not initialized to point anywhere meaningful, and the second because you are writing past the end of the array.
C does not enforce any kind of bounds checking on array accesses - you won't get an "Index out of range" exception if you try to write past the end of an array. You may get a runtime error if you try to access past a page boundary or clobber something important like the frame pointer, but otherwise you just risk corrupting data in your program.
Yes, you are responsible for making sure the target buffer is large enough for the final string. Otherwise the results are unpredictable.
I'd like to point out what is actually happening in the 2nd program in order to illustrate the problem.
It allocates 15 bytes at the memory location starting at dest and copies 14 bytes into it (including the null terminator):
char dest[15] = "Hello DEFGIJK";
...and 11 bytes at src with 10 bytes copied into it:
char src[11] = "Hello ABC";
The strcat() call then copies 10 bytes (9 chars plus the null terminator) from src into dest, starting right after the 'K' in dest. The resulting string at dest will be 23 bytes long including the null terminator. The problem is, you allocated only 15 bytes at dest, and the memory adjacent to that memory will be overwritten, i.e. corrupted, leading to program instability, wrong results, data corruption, etc.
Note that the strcat() function knows nothing about the amount of memory you've allocated at dest (or src, for that matter). It is up to you to make sure you've allocated enough memory at dest to prevent memory corruption.
By the way, the first program doesn't allocate memory at dest or src at all, so your calls to fgets() are corrupting memory starting at those locations.
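For comparison, a minimal sketch of the second program with a destination that is actually large enough, sized at runtime with malloc() (heap allocation here is just one way to get the 23 bytes counted above):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char start[] = "Hello DEFGIJK";  /* 13 chars */
    const char src[]   = "Hello ABC";      /* 9 chars */

    /* 13 + 9 characters plus one '\0' = 23 bytes. */
    char *dest = malloc(strlen(start) + strlen(src) + 1);
    if (dest == NULL)
        return 1;

    strcpy(dest, start);
    strcat(dest, src);   /* safe: dest has room for the whole result */

    printf("concatenated string %s\n", dest);
    free(dest);
    return 0;
}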

What does it matter what argument I give malloc? [duplicate]

This question already has answers here:
I can use more memory than how much I've allocated with malloc(), why?
(17 answers)
Closed 6 years ago.
C noob here. What does it matter what argument I give malloc when I can pass whatever size string to it later?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
    char *str;
    str = malloc(1*sizeof(char));
    strcpy(str, "abcd");
    printf(str);
    printf("\n");
    return 0;
}
This works fine. I would have thought I wouldn't be able to store more than 1 char in str from my understanding of what malloc is supposed to be.
malloc can end up actually allocating more than expected to maintain alignment/simplify the allocator.
What you're doing is undefined behavior, and among other things, "undefined" can mean "works, sometimes". Don't rely on that, though, because the other possible outcomes are not nearly so pleasant. Some of the time it will crash. Some of the time it will appear to work, but it turns out you corrupted the heap, and at some later point, using or freeing some completely different allocation, you'll get "inexplicable" data or heap-corruption errors that aren't tied to the overflow in any obvious way.
It's a terrible idea, never rely on having even one byte more than you requested.
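A minimal sketch of the question's program with a correctly sized allocation; the +1 is for the null terminator that strcpy writes:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char *text = "abcd";

    /* Request room for every character plus the terminating '\0'. */
    char *str = malloc(strlen(text) + 1);
    if (str == NULL)
        return 1;

    strcpy(str, text);    /* now stays inside the allocation */
    printf("%s\n", str);
    free(str);
    return 0;
}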

How does a program shut down when reading farther than memory allocated to an array?

Good evening everybody, I am learning C++ on Dev C++ 5.9.2; I am a real novice at it. I intentionally make my programs crash to get a better understanding of bugs. I've just learned that a char string is passed to a function as a pointer to the array's first element, and that this is the only way to do it. Therefore we should always pass the function the size of that string so it can handle it properly. It also means that any procedure can run with a wrong size passed as an argument, hence I supposed we could read farther than the memory allocated to the string.
But how far can we go? I've tested several integers, and apparently it works fine below 300 bytes but not above 1000 (the program displays characters but ends up crashing). So my questions are:
How far can we read or write on the string out of its memory range?
Is it called an overflow?
How does the program detect that the procedure is doing something illegitimate?
Is it the console, or the code behind 'cout', that causes the program to shut down?
What is the condition for the program to stop?
Does the limit depend on the console or the OS?
I hope my questions don't sound too trivial. Thank you for any answer. Good day.
#include <iostream>
#include <cstdlib>   // for system()
using namespace std;

void change(char str[])
{
    str[0] = 'C';
}

void display(char str[], int lim)
{
    for(int i = 0; i < lim; i++) cout << str[i];
}

int main ()
{
    char mystr[] = "Hello.";
    change(mystr);
    display(mystr, 300);
    system("PAUSE");
    return 0;
}
The behavior when you read past the end of an array is undefined. The compiler is free to implement the read operation in whatever way works correctly when you don't read beyond the end of the buffer, and then if you do read too far - well, whatever happens is what happens.
So there are no definite answers to most of your questions. The program could crash as soon as you read 1 byte too far, or you could get random data as you read several megabytes and never crash. If you crash, it could be for any number of reasons - though it likely will have something to do with how memory is managed by the OS.
As an aside, the normal way to let a function know where a string ends is to end it with a null character rather than passing a separate length value.
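To illustrate that convention, here is a minimal C sketch of a display function that stops at the null terminator instead of trusting a caller-supplied length (the same idea carries over to the C++ snippet above):

#include <stdio.h>

/* Print characters until the terminator; no length parameter needed. */
static void display(const char *str)
{
    for (int i = 0; str[i] != '\0'; i++)
        putchar(str[i]);
    putchar('\n');
}

int main(void)
{
    char mystr[] = "Hello.";
    display(mystr);   /* prints exactly six characters, never reads past the end */
    return 0;
}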

sprintf() prints string to (m)allocated array, although it should be too small

I've been searching the net including stackoverflow for hours and didn't find any answer which suits my problem - maybe because it's not a real problem since the program works...but it shouldn't. Sounds strange? It is - at least to me.
It's part of a task for university. The task is to allocate memory for a char array, then print a string to the array using sprintf() and finally print the array with printf(). For memory allocation, malloc() is to be used (I know there are better ways, but we have to use exactly these functions).
That's what I've got:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    // Declare char array
    char *string;
    /* Allocate memory for string of certain length.
       The 1 is only to show what's wrong. I'm aware of the actual size needed. */
    if( (string = malloc( 1 * sizeof(char) ) ) == NULL )
    {
        perror("malloc failed to allocate n chars!");
        exit(1);
    }
    /* Print string to previously allocated memory. Now I would expect an error
       due to too few bytes allocated. */
    sprintf(string, "Too many characters here...");
    // Print string to command line
    printf("%s\n", string);
    return 0;
}
So far it works: it compiles without any warnings using gcc -Wall -std=c99, on Ubuntu as well as on Mac OS X.
BUT
The problem is that it shouldn't. As you might have noticed, I allocated too few bytes for the string I am writing to the array. Still it works, no matter how long the string is (I tried up to 1000 chars) or how few bytes I allocate.
I wouldn't care about it if the university's automated testing unit didn't mark it as wrong. It says the program is not reading from the allocated array. That's why I assume that sprintf puts the string anywhere but in the allocated array. But I can't explain how this could be possible.
I would be deeply grateful if you guys know what I'm doing wrong.
Thanks in advance!
------ UPDATE ------
As Mike pointed out, I'm not using free(string) in this snippet (thanks for the hint!). In the actual program I placed free(string) after the printf(). But when I try to print string again after that statement, it's printed as if nothing happened! How is that possible?
The problem is with the assertion "The problem is that it shouldn't." or "expect an error".
/* Now I would expect an error due to too few bytes allocated */
sprintf(string, "Too many characters here...");
When code does something it should not, like writing beyond the allocated memory space, C does not define what should happen. It is therefore undefined behavior (UB). To expect an error requires a defined behavior on C's part.
UB means anything may happen. The code is not required to check and complain that an attempt to access memory outside the allocation occurred.
C provides code with lots of rope to do all sorts of things quickly - including enough rope for code to hang itself.
Given that sprintf() is prone to writing out of bounds, the code could have used snprintf() and checked its result. snprintf() will not write past the given size of the buffer.
char *string;
size_t size = 1; // or whatever
string = malloc(size);
...
int n = snprintf(string, size, "Too many characters here...");
if (n < 0 || (size_t)n >= size) return Error_code; /* output failed or was truncated */
...
printf("%s\n", string);
Overwriting the end of a malloc'd array is likely to mess things up, but exactly what gets messed up is a matter of chance. That it happens not to fail in a simple test is not surprising, especially since your program exits shortly after committing the misdeed. The string that's written is, in itself, intact and valid -- it's only other things using that area of memory that may suffer. That doesn't mean it will work in a more complex program.
I just tested the code on the university's server again - the SAME code as before - and now it works. I have absolutely no idea why, certainly there was an error in the testing unit.
So there was no error in my code. But at least testing it with wrong parameters now taught me something important (what you guys pointed out):
Undefined behavior can also be that everything appears to work fine; although it shouldn't.
So from this point of view you were right. This is very similar to the posted topics. It was me approaching the problem with wrong expectations.
Thank you!
The problem is that you're assuming that there should be something wrong and that the compiler should tell you that it's wrong.
The syntax is correct, but the semantics aren't - the compiler can only tell you so much. sprintf() will print what you want it to, but where in memory it ends up writing can vary.
Consider using snprintf() instead.
