Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
Please explain this output. The problem is the same with scanf. How can the array be reinitialized every time?
Input:
HelloWorld
Tech
Output:
eo
Tech
TW
Code:
#include<stdio.h>
#include<string.h>
int main()
{ char c[1024]; int i,d=1;
gets(c);
printf("%c%c\n",c[d],c[d+5]);
gets(c);
puts(c);
printf("%c%c\n",c[0],c[4+d]);
return 0;
}
In C, strings are stored as null-terminated arrays of characters. This means that there are many ways to represent the same string: everything in the array of characters after the first null byte is ignored, as far as the value of the string is concerned.
When the first gets call reads HelloWorld, it stores the character 'H' in c[0], 'e' in c[1], …, 'd' in c[9] and 0 (that's a null byte, not '0') in c[10]. The contents of c[11] through c[1023] are unchanged.
When the second gets call reads Tech, it stores the character 'T' in c[0], 'e' in c[1], 'c' in c[2], 'h' in c[3] and 0 in c[4]. The contents of c[5] through c[1023] are unchanged. In particular, c[5] still has the value W that was set by the first gets call.
If you're used to high-level languages, you might expect that gets allocates new storage, or that doing what looks like a string access guarantees that you're actually accessing the string, but neither of these is true.
As you might guess from the fact that you pass an array of characters to gets, it merely writes to that array of characters; it doesn't allocate new storage. When gets reads a 4-byte string, it writes that 4-byte string plus the terminating null byte, and it doesn't touch whatever comes after those 5 bytes. In fact, gets doesn't even know the size of the array. That is why gets is practically unusable and was removed from the C language in C11: if you input a line that's too long for the array, it will overwrite whatever comes after the array in memory.
c[5] accesses the element of the array c at position 5. This happens to be outside the string, but that's not relevant to how an array access operates. It's an array access, not a string access. C doesn't really have strings natively: it fakes them with arrays of characters. String manipulation functions treat their arguments as strings (i.e. they only look up to the first null byte). But an array access is just an array access, and it's up to the programmer not to access the string, or the array, out of bounds.
Please correct your indentation. You may want to use 2, 3, or 4 spaces, but 1 just looks wrong. I don't see any use of the string.h library.
The array is not cleared, but characters 0..3 are replaced with the new ones and the character at position 4 is \0. That's why you can read a character from the previously entered value at position 5 in this array.
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I have noticed that often when I set a character array the exact size it would need, the data stored in that array gets corrupted. Why does that happen, and how much space more should I allocate an array than the maximum size I would need to store data at?
Why is it recommended to give an array more size than needed?
That is not true for all array types. It is not recommended in general, and your array's contents will not be corrupted as long as you don't store more elements than it can hold. However, there is one special case: storing strings in an array of chars, as explained below.
I have noticed that often when I set a character array the exact size it would need, the data stored in that array gets corrupted.
When you want to store a string in a char array, you have to consider the terminating null character - \0.
This character is required so that a string-handling function can determine where the string ends, and so tell which part of a char array's contents belongs to the string.
If you don't reserve room for that character and then copy a string into the char array, the program causes a buffer overflow in memory: the \0 gets written beyond the bounds of the array, because the array only has space for the string's visible characters, not for the null character.
How much space more should I allocate an array than the maximum size I would need to store data at?
Just 1 char element more is needed to hold the \0.
For example:
char a[6] = "hello";
a needs to consist of 6 elements, not 5, because it has to hold the terminating null character.
In comparison,
int b[5] = {1,2,3,4,5};
b doesn't need to have more elements than explicitly required.
In conclusion, when you want to store a string in a char array, allocate one element more than is needed for the visible characters of the string alone.
This depends on the type of data that you want to store in the array, and can have a number of complicated answers.
Generally, I would say that for data you can allocate exactly what you need and not a byte more. For strings, ensure you allocate a single additional byte for the null terminator.
There are plenty of resources to read more about the operations of arrays:
https://www.cs.swarthmore.edu/~newhall/unixhelp/C_arrays.html
https://www.cs.uic.edu/~jbell/CourseNotes/C_Programming/Arrays.html
https://www.tutorialspoint.com/cprogramming/c_arrays.htm
This question already has answers here:
what should strlen() really return in this code?
(4 answers)
Closed 6 years ago.
The C code:
char c = 'a';
char *p = &c;
printf("%lu\n",strlen(p));
I get the result 7, and I have no idea where this 7 comes from.
The variable p points to a single character, not to a null terminated string. So when you call strlen on it, it attempts to access whatever memory is after c. This invokes undefined behavior.
What's happening in this particular case is that after a in memory there are six non-zero bytes followed by one zero byte, so you get 7. You can't however depend on this behavior. For example, if you add more local variables before and after a, even unused ones, you'll probably get a different result.
Remember that strings in C are really called null-terminated byte strings. All strings are terminated with a single '\0' character, meaning that a single-character string actually is two characters: The single character plus the terminator.
When you have the pointer pointing to c in your code, you don't have two characters, only the single character contained in c. You don't know if there is a terminator after that character in memory, so when strlen looks for that terminator it will pass the character and go out into memory not belonging to any string to look for it, and you will have undefined behavior.
To try to illustrate what you have, take a look at this "graphical" representation:
+---+ +-----+----------------------
| p | --> | 'a' | indeterminate data...
+---+ +-----+----------------------
That's basically what it looks like in memory. The variable p points to the location where your character is stored, but what comes after that in memory is just indeterminate data. It will be seemingly random, and you cannot tell where there will be a byte corresponding to a string terminator character.
There's no way to say where strlen gets the value 7 from, except that it finds six non-terminator bytes in the indeterminate data after your character. Next time you run it, or if you run it on a different system, you might get a completely different result.
strlen looks for the first null character, starting from the address you pass. You pass the address of a single character, so strlen keeps reading forward in memory until it finds the first null character after the variable c.
In any case, you cannot rely on the result: it depends on the compiler and on all the other code you wrote. Moreover, your program can even crash with a memory access error.
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 6 years ago.
I tried to populate an array of structs defined as follows:
typedef struct{
char directive[5];
}directive_nfo_t;
By using the following:
directive_nfo_t directive_list[]=
{
{"ALIGN"},{"ASCII"},{"BSS"},{"BYTE"},{"END"},{"EQU"},{"ORG"}
};
To my surprise, the first few elements were corrupted like so:
[0]= ALIGNASCIIBSS
[1]= ASCIIBSS
[2]= BSS
...
Until I made the following change:
typedef struct{
char directive[6]; /* made char array +1 */
}directive_nfo_t;
Then the first few arrays were correct like so:
[0]= ALIGN
[1]= ASCII
[2]= BSS
...
My question is what happens in the background to explain this behavior?
Regards.
In C, a string is a sequence of character values followed by a 0-valued terminator; the string "ASCII" is represented by the character sequence 'A', 'S', 'C', 'I', 'I', 0. Thus, you need a six-element array to store the string.
For a string that's N characters long, you need an array of N+1 characters to store it.
When you explicitly initialize a char array with a string literal, the way you do:
char some_array[] = {"ALIGN"};
the compiler populates positions 0 through 4 (five positions in total) with the characters inside the quotation marks, and also position 5 with \0, without requiring you to write it explicitly (if there is enough space). So the size equals 6. If you leave the \0 out of the size calculation and restrict the size to 5, there is no room for it, and the compiler omits the terminating character.
In your case it looks as if the first character of the next element "overwrote" the \0 of the previous one, but in fact that \0 was never stored, since you haven't reserved a place for it. The "mechanics of populating the array" boil down to the compiler writing as much data as fits inside the bounds. Each element's string starts at the address your initializer implies; only the \0 of the previous one is missing.
Since your printf() format specifier was %s, the function printed characters until it reached the first \0 character. Reading past the end of the five-character array like this is in fact undefined behavior.
That's why
char directive[6];
was the correct size in your code.
If the char array is big enough, the C compiler automatically places a '\0' after the text.
If it is just large enough for the text, that terminator is omitted, which is what has happened here.
If there isn't even room for the text, the compiler will say something like "too many initialisers" or "array bounds overflow".
The struct array elements are adjacent in memory. The first two items lack a terminator, so the second item, when printed, only stops at the terminator after the third item. The first item is also printed until it reaches that same terminator. By making the array size 6, the compiler was able to place a terminator after every item.
Unlike C++, C allows you to (unintentionally) shoot yourself in the foot by letting you omit the terminating NUL character '\0' from a char array initializer when there is no room for it. Your case can be narrowed down to a simple array definition such as:
char str[5] = "ALFAP";
which is a syntactic shortcut for:
char str[5] = {'A', 'L', 'F', 'A', 'P'};
It may be kind of misleading, because in a different context the same "ALFAP" represents a string literal, which always has the ending NUL character:
char* str = "ALFAP"; // here, "ALFAP" always contains NUL character at the end
My question is what happens in the background to explain this behavior? Regards.
You have an array of type directive_nfo_t, and each directive_nfo_t holds an array of five characters (in your first example).
The output you got with the 5-character array in directive_nfo_t was basically due to two things:
Array elements are stored in consecutive memory locations.
In C, the abstract idea of a string is implemented with just null terminated array of characters.
When you declare an array of directive_nfo_t, its elements are stored in consecutive memory locations, and each element contains a 5-character array (whose characters are also stored consecutively). In your initializer list ({"ALIGN"},{"ASCII"},{"BSS"},{"BYTE"},{"END"},{"EQU"},{"ORG"}), the first two elements ("ALIGN" and "ASCII") use all 5 characters for data. In C, the functions that treat a character array as a string assume that it ends with a null character. So for the first two elements, printf keeps printing characters until it reaches a null character, which it first finds in the element holding "BSS". After printing ALIGN, printf goes on to the first character of the second element (the A of ASCII). This happens because the first element has no room for a null character, and the compiler does not write initializer characters beyond the array's size. From the third element of your array on, there is enough room for the null character, so printf works as expected.
You will get UNDEFINED BEHAVIOR if you allocate too little memory for your character array and then use functions that assume a null-terminated character array. Always set the size of the character array to MAX + 1 when you want to store at most MAX characters in it.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
Disclaimer: Been doing Java for a while, but new to C.
I have a program that I wrote, and I'm purposely trying to see what happens with different inputs and outputs.
#include <stdio.h>
int main() {
printf("whattup\n");
char str1[1], str2[1];
printf("Enter something: ");
scanf("%s", &str1);
printf("Enter something else: ");
scanf("%s", &str2);
printf("first thing: %s\n", str1);
printf("second thing: %s", str2);
}
This is the program flow:
whattup
Enter something: ahugestatement
Enter something else: smallertext
first thing: mallertext
Things I don't understand:
Why does "first thing" print out the str2?
Why does str2 have its first letter cut off?
Why does "second thing:" not print out?
I made the char array with a size of 1, shouldn't it only hold 1 letter?
To answer your questions specifically, you'll have to keep in mind that what happens is very much implementation-specific. The specific behavior you're seeing doesn't have to hold true on all C implementations. This is what the C standard calls "undefined behavior". With that in mind:
Why does "first thing" print out the str2?
Why does str2 have its first letter cut off?
You have allocated storage for two chars on the stack. The compiler allocates them next to each other, with str2 preceding str1 in memory. Therefore, after your first scanf, part of the stack will look like this:
  str1 is allocated here
  v
? a h u g e s t a t e m e n t \0
^
str2 is allocated here
Then, after the second scanf, the same part of memory will look like this:
  str1 is allocated here
  v
s m a l l e r t e x t \0 e n t \0
^
str2 is allocated here
In other words, the second input simply overwrites the first, since it goes beyond the bounds of the storage you allocated for it. Then, when you print out str1, it simply prints whatever is at the address of str1, which, as you can see in the figure above, is mallertext.
Why does "second thing:" not print out?
This is because of two effects interacting. For one thing, where you print str2, you do not end the output with a newline. stdout is normally line-buffered, which means that data written to it is not actually written to the underlying terminal until either A) a newline is written, B) you explicitly call fflush(stdout), or C) the program exits.
It would, therefore, print it when the program exited, but your program never exits normally. Since you overwrite parts of the stack that you don't manage, in this case you overwrite the return address from main, and therefore, when you return from main, your program promptly crashes, and thus never reaches the point where it would flush stdout.
In the case of your program, the stack-frame layout of main looks like this (assuming AMD64 Linux):
RBP+8: Return address
RBP+0: Previous frame address
RBP-1: str1
RBP-2: str2
Since ahugestatement including its NUL terminator is 15 bytes, the 14 of those bytes that don't fit in str1 overwrite the entire previous frame address and 6 bytes of the return address. Since the new return address is entirely invalid, your program segfaults when the return from main jumps to an address that isn't even mapped in memory.
I made the char array with a size of 1, shouldn't it only hold 1 letter?
Yes, and it does. It's just that you clobber the memory that follows it.
As a general statement, scanf is not a terribly useful function if you want to do even the most basic form of checking for illegal input. If you're hoping to do interactive input at all, it is almost always better to use something like fgets() instead and then parse the input you read. fgets(), unlike scanf, takes an additional argument for the size of the receiving buffer, and will then make sure not to write outside it.
In C, you have to do the bounds checking yourself to make sure your buffers don't overflow. Since you don't, your program's behavior is undefined. If you run that code many times, it's bound to crash at some point because the overflowing buffers will end up overwriting something important.
That's called buffer overflow. You allocated one character to hold your input, but you are writing beyond that (messing up the rest of your program's memory).
Unlike Java, the C compiler and runtime do not enforce array bounds. That is one of the main differences between "(memory-) managed languages" and low-level languages.
Your array only holds one character and the rest is out of bounds.
Access out of the range of an array is undefined and usually disastrous.
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I was trying to copy the contents of one string to another (a into b).
I deliberately made the second string (b) smaller than the first one (a).
I copied the contents of the first one into the second and added a WATCH on both of them. In the Debug tab, I found that while copying, the original string gets destroyed and the new one is also DISPLAYED LARGER than its size.
#include<stdio.h>
int main()
{
char a[10]="What?";
char b[2];
int i;
for(i=0;i<6;i++)
{
b[i]=a[i];
}
printf("This is %s",a);
printf("\n this is b now: %s",b);
return 0;
}
I have attached a screenshot. I took a as a string of size 10, a="WHat?", and then a string b[2].
After copying, I printed both a and b. I expected the output to be a = "WHat?", b = "WH", but the output is something else (see the screenshot).
Why did the original string get destroyed? Has the pointer changed? But I have made it a constant pointer. It can't be changed.
Here is the screenshot of the problem I am facing:
https://www.dropbox.com/s/8xwxwb27qis8xww/sjpt.jpg
Please Help Somebody !!
You are copying 6 bytes into an array of two bytes, essentially invoking undefined behavior.
You are passing array b to printf with the %s specifier, which expects a null-terminated string, while b is most likely not null-terminated at that point. That is another instance of undefined behavior.
Also, a null-terminated string that fits into a 2-byte array can have only one printable character, so you should not expect b to be "WH". At best, if you fix the copying, it can only be "W", as the second character will be the terminating byte (\0). If you want to have two characters, either increase the array size to 3 to allow for the null terminator, or simply do not use C strings and print out two bytes using the "%c%c" format string.
As pointed out in other answers, you are writing outside the bounds of the array. The original string a changes because it happens to be exactly after b in memory as you can see in the debug window.
Before the loop, memory looks like this:
b a
|00|WHat?00000|
After the loop, memory looks like this:
b a
|WH|at?0?00000|
This explains why
a is changed
the original question mark in a is still there (you only write 6 characters: two into the location reserved for b, and 4, including the null terminator, into the location of a)
Of course this is undefined behavior as already mentioned by Vlad Lazarenko, but it explains the behavior for your compiler/settings/version/etc.
A constant pointer only exists for the compiler. It ensures that you cannot explicitly manipulate its data, but if you write outside the bounds of another array, nothing can be guaranteed.
What you're doing currently is very unsafe! It might work on Windows for some godforsaken reason, but don't do this!
The C standard library has dedicated functions for working with strings and memory; strcpy, for example, copies a null-terminated string. I suggest you learn more about how strings work and how you can manipulate them.