Strings behvior on C - c

I want to understand a number of things about the strings on C:
I could not understand why you can not change the string in a normal assignment. (But only through the functions of string.h), for example: I can't do d="aa" (d is a pointer of char or a array of char).
Can someone explain to me what's going on behind the scenes - the compiler gives to run such thing and you receive segmentation fault error.
Something else, I run a program in C that contains the following lines:
char c='a',*pc=&c;
printf("Enter a string:");
scanf("%s",pc);
printf("your first char is: %c",c);
printf("your string is: %s",pc);
If I put more than 2 letters (on scanf) I get segmentation fault error, why is this happening?
If I put two letters, the first letter printed right! And the string is printed with a lot of profits (incorrect)
If I put a letter, the letter is printed right! And the string is printed with a lot of profits and at the end something weird (a square with four numbers containing zeros and ones)
Can anyone explain what is happening behind?
Please note: I do not want the program to work, I did not ask the question to get suggestions for another program, I just want to understand what happens behind the scenes in these situations.

Strings almost do not exist in C (except as C string literals like "abc" in some C source file).
In fact, strings are mostly a convention: a C string is an array of char whose last element is the zero char '\0'.
So declaring
const char s[] = "abc";
is exactly the same as
const char s[] = {'a','b','c','\0'};
in particular, sizeof(s) is 4 (3+1) in both cases (and so is sizeof("abc")).
The standard C library contains a lot of functions (such as strlen(3) or strncpy(3)...) which obey and/or presuppose the convention that strings are zero-terminated arrays of char-s.
Better code would be:
char buf[16]="a",*pc= buf;
printf("Enter a string:"); fflush(NULL);
scanf("%15s",pc);
printf("your first char is: %c",buf[0]);
printf("your string is: %s",pc);
Some comments: be afraid of buffer overflow. When reading a string, always give a bound to the read string, or else use a function like getline(3) which dynamically allocates the string in the heap. Beware of memory leaks (use a tool like valgrind ...)
When computing a string, be also aware of the maximum size. See snprintf(3) (avoid sprintf).
Often, you adopt the convention that a string is returned and dynamically allocated in the heap. You may want to use strdup(3) or asprintf(3) if your system provides it. But you should adopt the convention that the calling function (or something else, but well defined in your head) is free(3)-ing the string.
Your program can be semantically wrong and by bad luck happening to sometimes work. Read carefully about undefined behavior. Avoid it absolutely (your points 1,2,3 are probable UB). Sadly, an UB may happen to sometimes "work".
To explain some actual undefined behavior, you have to take into account your particular implementation: the compiler, the flags -notably optimization flags- passed to the compiler, the operating system, the kernel, the processor, the phase of the moon, etc etc... Undefined behavior is often non reproducible (e.g. because of ASLR etc...), read about heisenbugs. To explain the behavior of points 1,2,3 you need to dive into implementation details; look into the assembler code (gcc -S -fverbose-asm) produced by the compiler.
I suggest you to compile your code with all warnings and debugging info (e.g. using gcc -Wall -g with GCC ...), to improve the code till you got no warning, and to learn how to use the debugger (e.g. gdb) to run your code step by step.

If I put more than 2 letters (on scanf) I get segmentation fault error, why is this happening?
Because memory is allocated for only one byte.
See char c and assigned with "a". Which is equal to 'a' and '\0' is written in one byte memory location.
If scanf() uses this memory for reading more than one byte, then this is simply undefined behavior.

char c="a"; is a wrong declaration in c language since even a single character is enclosed within a pair of double quotes("") will treated as string in C because it is treated as "a\0" since all strings ends with a '\0' null character.
char c="a"; is wrong where as char c='c'; is correct.
Also note that the memory allocated for char is only 1byte, so it can hold only one character, memory allocation details for datatypes are described bellow

Related

If strncat adding NUL may cause the array go out of bound

I have some trouble with strncat().The book called Pointers On C says the function ,strncat(),always add a NUL in the end of the character string.To better understand it ,I do an experiment.
#include<stdio.h>
#include<string.h>
int main(void)
{
char a[14]="mynameiszhm";
strncat(a,"hello",3);
printf("%s",a);
return 0;
}
The result is mynameiszhmhel
In this case the array has 14 char memory.And there were originally 11 characters in the array except for NUL.Thus when I add three more characters,all 14 characters fill up the memory of array.So when the function want to add a NUL,the NUL takes up memory outside the array.This cause the array to go out of bounds but the program above can run without any warning.Why?Will this causes something unexpected?
So when we use the strncat ,should we consider the NUL,in case causes the array go out of bound?
And I also notice the function strncpy don't add NUL.Why this two string function do different things about the same thing?And why the designer of C do this design?
This cause the array to go out of bounds but the program above can run without any warning. Why?
Maybe. With strncat(a,"hello",3);, code attempted to write beyond the 14 of a[]. It might go out of bounds, it might not. It is undefined behavior (UB). Anything is allowed.
Will this causes something unexpected?
Maybe, the behavior is not defined. It might work just as you expect - whatever that is.
So when we use thestrncat ,should we consider the NUL, in case causes the array go out of bound?
Yes, the size parameter needs to account for appending a null character, else UB.
I also notice the function strncpy don't add NUL. Why this two string function do different things about the same thing? And why the designer of C do this design?
The 2 functions strncpy()/strncat() simple share similar names, not highly similar paired functionality of strcpy()/strcat().
Consider that the early 1970s, memory was far more expensive and many considerations can be traced back to a byte of memory more that an hour's wage. Uniformity of functionality/names was of lesser importance.
And there were originally 11 characters in the array except for NUL.
More like "And there were originally 11 characters in the array except for 3 NUL.". This is no partial initialization in C.
This is not really an answer, but a counterexample.
Observe the following modification to your program:
#include<stdio.h>
#include<string.h>
int main(void)
{
char p[]="***";
char a[14]="mynameiszhm";
char q[]="***";
strncat(a,"hello",3);
printf("%s%s%s", p, a, q);
return 0;
}
The results of this program are dependent on where p and q are located in memory, compared to a. If they are not adjacent, the results are not so clear but if either p or q immediately comes after a, then your strncat will overwrite the first * causing one of them not to be printed anymore because that will now be a string of length 0.
So the results are dependent on memory layout, and it should be clear that the compiler can put the variables in memory in any order it likes. And they can be adjacent or not.
So the problem is that you are not keeping to your promise not to put more than 14 bytes into a. The compiler did what you asked, and the C standards guarantee behaviour as long as you keep to the promises.
And now you have a program that may or may not do what you wanted it to do.

Lack of understanding of security engineering in simple C code

I am absolutely new to C programming. Currently I am preparing for my new course of studies IT Security. In a slightly older exam I found a task where I have no approach how to solve it. The task is in German. In principle it is about finding critical errors.
It is not written how the passed parameters look like.
1) I have come to the point that you should not use strcpy because it has no bounds checking.
2) Also char[10] should not be used if you want to store 10 characters (\0). It should be char[11].
Is it possible to read Adresses or write sth due the printf(argv[1]) command ?
I would like to mention again that you help me here personally and do not help to collect bonus points in the university.
#include <stdio.h>
int main(int argc, char *argv[])
{
char code[10];
if(argc != 2) return 1;
printf(argv[1]);
strcpy(code, "9999999999");
for(int i = 0; i < 10; ++i){
code[i] -= argv[1][i] % 10;
}
printf(", %s\n", code);
return 0;
}
See
related.
you should not use strcpy() because it has no bounds checking
Nothing in C has bounds checking unless either
the compiler writer put it there, or
you put it there.
Few compiler writers incorporate bounds checking into their products, because it usually causes the resulting code to be bigger and slower. Some tools exist (e.g.
Valgrind,
Electric Fence)
to provide bounds-checking-related debugging assistance, but they are not commonly incorporated into delivered software because of limitations they impose.
You absolutely should use strcpy() if
you know your source is a NUL-terminated array of characters, a.k.a. "a string", and
you know your destination is large enough to hold all of the source array including the terminating NUL
because the compiler writer is permitted to use behind-the-scenes tricks unavailable to compiler users to ensure strcpy() has the best possible performance while still providing the behaviour guaranteed by the standard.
char[10] should not be used if you want to store 10 characters (\0)
Correct.
To store 10 characters and the terminating NUL ('\0'), you must have at least 11 characters of space available.
Is it possible to read Adresses or write sth due the printf(argv[1]) command ?
In principle: maybe.
The first argument to printf() is a format string which is interpreted by printf() to determine what further arguments have been provided. If the format string contains any format specifications (e.g. "%d" or "%n") then printf() will try to retrieve corresponding arguments.
If they were not in fact passed to it, then it invokes Undefined Behaviour which is Bad.
An attacker could run your program giving it a command-line argument containing format specifiers, which would lead to such UB.
The right way to print an arbitrary string like this with printf() is printf("%s", argv[1]);

C character array and its length

I am studying now C with "C Programming Absolute Beginner's Guide" (3rd Edition) and there was written that all character arrays should have a size equal to the string length + 1 (which is string-termination zero length). But this code:
#include <stdio.h>
main()
{
char name[4] = "Givi";
printf("%s\n",name);
return 0;
}
outputs Givi and not Giv. Array size is 4 and in that case it should output Giv, because 4 (string length) + 1 (string-termination zero character length) = 5, and the character array size is only 4.
Why does my code output Givi and not Giv?
I am using MinGW 4.9.2 SEH for compilation.
You are hitting what is considered to be undefined behavior. It's working now, but due to chance, not correctness.
In your case, it's because the memory in your program is probably all zeroed out at the beginning. So even though your string is not terminated properly, it just so happens that the memory right after it is zero, so printf knows when to stop.
+-----------------------+
|G|i|v|i|\0|\0|... |
+-----------------------+
| your | rest of |
| stuff | memory (stack)|
+-----------------------+
Other languages, such as Java, have safeguards against this sort of situations. Languages like C, however, do less hand holding, which, on the one hand, allows more flexibility, but on the other, give you much, much more ways to shoot you in the foot with subtle issues such as this one. In other words, if your code compiles, that doesn't mean it's correct and it won't blow up now, in 5 minutes or in 5 years.
In real life, this is almost never the case, and your string might end up getting stored next to other things, which would always end up getting printed out together with your string. You never want this. Situations like this might lead to crashes, exploits and leaked confidential information.
See the following diagram for an example. Imagine you're working on a web server and the string "secret"--a user's password or key is stored right next to your harmless string:
+-----------------------+
|G|i|v|i|s|e|c|r|e|t |
+-----------------------+
| your | rest of |
| stuff | memory (stack)|
+-----------------------+
Every time you would output what you would think is "Givi", you'd end up printing out the secret string, which is not what you want.
The byte after the last character always has to be 0, otherwise printf would not know when the string is terminanted and would try to access bytes (or chars) while they are not 0.
As Andrei said, apparently it just happened, that the compiler put at least one byte with the value 0 after your string data, so printf recognized the end of the string.
This can vary from compiler to compiler and thus is undefined behaviour.
There could, for instance, be a chance to have printf accessing an address, which your program is not allowed to. This would result in a crash.
In C text strings are stored as zero terminated arrays of characters. This means that the end of a text string is indicated by a special character, a numeric value of zero (0), to indicate the end of the string.
So the array of text characters to be used to store a C text string must include an array element for each of the characters as well as an additional array element for the end of string.
All of the C text string functions (strcpy(), strcmp(), strcat(), etc.) all expect that the end of a text string is indicated by a value of zero. This includes the printf() family of functions that print or output text to the screen or to a file. Since these functions depend on seeing a zero value to terminate the string, one source of errors when using C text strings is copying too many characters due to a missing zero terminator or copying a long text string into a smaller buffer. This type of error is known as a buffer overflow error.
The C compiler will perform some types of adjustments for you automatically. For instance:
char *pText = "four"; // pointer to a text string constant, compiler automatically adds zero to an additional array element for the constant "four"
char text[] = "four"; // compiler creates a array with 5 elements and puts the characters four in the first four array elements, a value of 0 in the fifth
char text[5] = "four"; // programmer creates array of 5 elements, compiler puts the characters four in the first four array elements, a value of 0 in the fifth
In the example you provided a good C compiler should issue at the minimum a warning and probably an error. However it looks like your compiler is truncating the string to the array size and is not adding the additional zero string terminator. And you are getting lucky in that there is a zero value after the end of the string. I suppose there is also the possibility that the C compiler is adding an additional array element anyway but that would seem unlikely.
What your book states is basically right, but there is missing the phrase "at least". The array can very well be larger.
You already stated the reason for the min length requirement. So what does that tell you about the example? It is crap!
What it exhibits is called undefined behaviour (UB) and might result in daemons flying out your nose for the printf() - not the initializer. It is just not covered by the C standard (well ,the standard actually says this is UB), so the compiler (and your libraries) are not expected to behave correctly.
For such cases, no terminator will be appended explicitly, so the string is not properly terminated when passed to `printf()".
Reason this does not produce an error is likely some legacy code which did exploit this to safe some bytes of memory. So, instead of reporting an error that the implicit trailing '\0' terminator does not fit, it simply does not append it. Silently truncating the string literal would also be a bad idea.
The following line:
char name[4] = "Givi";
May give warning like:
string for array of chars is too long
Because the behavior is Undefined, still compiler may pass it. But if you debug, you will see:
name[0] 'G'
name[1] 'i'
name[2] 'V'
name[3] '\0'
And so the output is
Giv
Not Give as you mentioned in the question!
I'm using GCC compiler.
But if you write something like this:
char name[4] = "Giv";
Compiles fine! And output is
Giv

Why output length is coming 6?

I have written a simple program to calculate length of string in this way.
I know that there are other ways too. But I just want to know why this program is giving this output.
#include <stdio.h>
int main()
{
char str[1];
printf( "%d", printf("%s", gets(str)));
return 0;
}
OUTPUT :
(null)6
Unless you always pass empty strings from the standard input, you are invoking undefined behavior, so the output could be pretty much anything, and it could crash as well. str cannot be a well-formed C string of more than zero characters.
char str[1] allocates storage room for one single character, but that character needs to be the NUL character to satisfy C string constraints. You need to create a character array large enough to hold the string that you're writing with gets.
"(null)6" as the output could mean that gets returned NULL because it failed for some reason or that the stack was corrupted in such a way that the return value was overwritten with zeroes (per the undefined behavior explanation). 6 following "(null)" is expected, as the return value of printf is the number of characters that were printed, and "(null)" is six characters long.
There's several issues with your program.
First off, you're defining a char buffer way too short, a 1 char buffer for a string can only hold one string, the empty one. This is because you need a null at the end of the string to terminate it.
Next, you're using the gets function which is very unsafe, (as your compiler almost certainly warned you about), as it just blindly takes input and copies it into a buffer. As your buffer is 0+terminator characters long, you're going to be automatically overwriting the end of your string into other areas of memory which could and probably does contain important information, such as your rsp (your return pointer). This is the classic method of smashing the stack.
Third, you're passing the output of a printf function to another printf. printf isn't designed for formating strings and returning strings, there are other functions for that. Generally the one you will want to use is sprintf and pass it in a string.
Please read the documentation on this sort of thing, and if you're unsure about any specific thing read up on it before just trying to program it in. You seem confused on the basic usage of many important C functions.
It invokes undefined behavior. In this case you may get any thing. At least str should be of 2 bytes if you are not passing a empty string.
When you declare a variable some space is reserved to store the value.
The reserved space can be a space that was previously used by some other
code and has values. When the variable goes out of scope or is freed
the value is not erased (or it may be, anything goes.) only the programs access
to that variable is revoked.
When you read from an unitialised location you can get anything.
This is undefined behaviour and you are doing that,
Output on gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 is 0
For above program your input is "(null)", So you are getting "(null)6". Here "6" is the output from printf (number of characters successfully printed).

C String Null Zero?

I have a basic C programming question, here is the situation. If I am creating a character array and if I wanted to treat that array as a string using the %s conversion code do I have to include a null zero. Example:
char name[6] = {'a','b','c','d','e','f'};
printf("%s",name);
The console output for this is:
abcdef
Notice that there is not a null zero as the last element in the array, yet I am still printing this as a string.
I am new to programming...So I am reading a beginners C book, which states that since I am not using a null zero in the last element I cannot treat it as a string.
This is the same output as above, although I include the null zero.
char name[7] = {'a','b','c','d','e','f','\0'};
printf("%s",name);
You're just being lucky; probably after the end of that array, on the stack, there's a zero, so printf stops reading just after the last character. If your program is very short and that zone of stack is still "unexplored" - i.e. the stack hasn't grown yet up to that point - it's very easy that it's zero, since generally modern OSes give initially zeroed pages to the applications.
More formally: by not having explicitly the NUL terminator, you're going in the land of undefined behavior, which means that anything can happen; such anything may also be that your program works fine, but it's just luck - and it's the worst type of bug, since, if it generally works fine, it's difficult to spot.
TL;DR version: don't do that. Stick to what is guaranteed to work and you won't introduce sneaky bugs in your application.
The output of your fist printf is not predictable specifically because you failed to include the terminating zero character. If it appears to work in your experiment, it is only because by a random chance the next byte in memory happened to be zero and worked as a zero terminator. The chances of this happening depend greatly on where you declare your name array (it is not clear form your example). For a static array the chances might be pretty high, while for a local (automatic) array you'll run into various garbage being printed pretty often.
You must include the null character at the end.
It worked without error because of luck, and luck alone. Try this:
char name[6] = {'a','b','c','d','e','f'};
printf("%s",name);
printf("%d",name[6]);
You'll most probably see that you can read that memory, and that there's a zero in it. But it's sheer luck.
What most likely happened is that there happened to be the value of 0 at memory location name + 6. This is not defined behavior though, you could get different output on a different system or with a different compiler.
Yes. You do. There are a few other ways to do it.
This form of initialization, puts the NUL character in for you automatically.
char name[7] = "abcdef";
printf("%s",name);
Note that I added 1 to the array size, to make space for that NUL.
One can also get away with omitting the size, and letting the compiler figure it out.
char name[] = "abcdef";
printf("%s",name);
Another method is to specify it with a pointer to a char.
char *name = "abcdef";
printf("%s",name);

Resources