Checking validity of non null-terminated string

I am having trouble wrapping my brain around null terminators and non-null terminating arrays.
Let's say I have two declarations:
const char *string = "mike";
and
const char string[4] = {'m', 'i', 'k', 'e'};
I understand the first declaration: in C, a string is defined to be a contiguous block of characters in memory terminated by a null character, and I can check its length with strlen.
The problem that I'm having is understanding declarations like the second, with no null terminator.
How can I check for validity of a string with no null terminator? (as in, what if there are additional values in the array?)

How can I check for validity of a string with no null terminator?
You need to know array bounds in order to see if a null-terminated string is contained within the bounds. Here is how you can do that in your example:
const char string[4] = {'m', 'i', 'k', 'e'};
int good = 0;
for (int i = 0 ; i != sizeof(string) ; i++) {
    if (string[i] == '\0') {
        good = 1;
        break;
    }
}
if (good) {
    printf("String '%s' is null-terminated.\n", string);
} else {
    printf("String is not null-terminated; cannot print.\n");
}
Although C library provides support only for null-terminated strings, you could use character arrays without null termination as long as you have access to their size (i.e. it's an array, not a pointer). For example, you could print your array like this:
printf("'%.*s'\n", (int)sizeof(string), string); /* the precision given by '*' must be an int */
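As a variation on the loop above, the same bounds-limited check can be written with the standard memchr function, which never looks past the size it is given. This is only a sketch; the helper name has_terminator is made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a '\0' occurs within the first
   `size` bytes of `buf`, and 0 otherwise. memchr never reads past `size`. */
int has_terminator(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

With the array from the question, has_terminator(string, sizeof(string)) returns 0, while for an array declared as char s[] = "mike" it returns 1.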

You can't. A string with no null terminator is not a string. It's just an array of characters. A C string must have a null terminator to be considered a string.
You'd have to deal with it like you would with an int[] array or any other type of array: keep track of the size separately, if it's not a known fixed size. Since it's not a string, you couldn't call string functions like strlen.

A string needs to end with a null terminator. If you tried to do printf("%s",string) on the second example or use functions like strcmp, strcpy, or strlen, it would not work correctly. It is true that a string is just an array of characters with a null terminator, but the null terminator needs to be there if it is to be considered a string. So if you are not sure that you actually have an array of characters that is null terminated, you'll need to check for the null terminator.
This distinction is very important, but the similarities can be used to your advantage, especially when you get to the embedded level of programming and are receiving characters across a wire. Let's say you have a buffer that you are putting received characters in and you are looking for the string "mike". You most likely will not receive a null terminator across the wire, so when you search for the string you'll either need to compare the characters individually or use strncmp, which compares only the number of characters that you tell it to. If the string you are looking for is hardcoded, you can use strlen on it to get the count to pass to strncmp.
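A minimal sketch of that strncmp approach, assuming the received bytes sit in a plain char buffer and the number of valid bytes is tracked separately. The function name and parameters are invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first bytes of a received buffer
   match `word`. `buf` does not need to be null-terminated; `buf_len` is
   the number of valid bytes in it. */
int starts_with_word(const char *buf, size_t buf_len, const char *word)
{
    size_t word_len = strlen(word);   /* `word` is a real C string */

    if (buf_len < word_len)
        return 0;                     /* not enough bytes received yet */
    return strncmp(buf, word, word_len) == 0;
}
```

The length check comes first so that strncmp is never asked to look at more bytes than the buffer actually holds.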

Related

When/Why is '\0' necessary to mark end of an (char) array?

So I just read an example of how to create an array of characters which represent a string.
The null-character \0 is put at the end of the array to mark the end of the array. Is this necessary?
If I created a char array:
char line[100];
and put the word:
"hello\n"
in it, the chars would be placed at the first six indexes line[0] - line[5], so the rest of the array would be filled with null characters anyway?
This book says that it is a convention that, for example, the string constant "hello\n" is put in a character array and terminated with \0.
Maybe I don't understand this topic to its full extent and would be glad for enlightenment.

The \0 character does not mark the "end of the array". The \0 character marks the end of the string stored in a char array, if (and only if) that char array is intended to store a string.
A char array is just a char array. It stores independent integer values (char is just a small integer type). A char array does not have to end in \0. \0 has no special meaning in a char array. It is just a zero value.
But sometimes char arrays are used to store strings. A string is a sequence of characters terminated by \0. So, if you want to use your char array as a string you have to terminate your string with a \0.
So, the answer to the question about \0 being "necessary" depends on what you are storing in your char array. If you are storing a string, then you will have to terminate it with a \0. If you are storing something that is not a string, then \0 has no special meaning at all.

'\0' is not required if you are using it as a character array. But if you use the character array as a string, you need to put '\0' at the end. There is no separate string type in C.
There are multiple ways to declare a character array.
Ex:
char str1[] = "my string";
char str2[64] = "my string";
char str3[] = {'m', 'y', ' ', 's', 't', 'r', 'i', 'n', 'g', '\0'};
char str4[64] = {'m', 'y', ' ', 's', 't', 'r', 'i', 'n', 'g' };
All these arrays have the same string "my string". In str1, str2, and str4, the '\0' character is added automatically, but in str3, you need to explicitly add the '\0' character.
(When the size of an array is explicitly declared, and there are fewer items in the initializer list than the size of the array, the rest of the array is initialized with however many zeros it takes to fill it; see "C char array initialization" and "The N_ELEMENTS macro".)
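To see that zero-filling for yourself, a small check like the following can help. It is a sketch only, and rest_is_zero is a made-up name.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every element of `arr` from index
   `from` up to (but not including) `size` is zero. */
int rest_is_zero(const char *arr, size_t size, size_t from)
{
    for (size_t i = from; i < size; i++) {
        if (arr[i] != '\0')
            return 0;
    }
    return 1;
}
```

For str4 above, rest_is_zero(str4, sizeof(str4), 9) returns 1, because only the nine characters of "my string" were given explicitly and the other 55 elements were zero-initialized.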

When/Why is '\0' necessary to mark end of an (char) array?
The terminating zero is necessary if a character array contains a string. It makes it possible to find the point where the string ends.
As for your example, I think it looks like this:
char line[100] = "hello\n";
then for starters the string literal has 7 characters. It is a string and includes the terminating zero. This string literal has type char[7]. You can imagine it like
char no_name[] = { 'h', 'e', 'l', 'l', 'o', '\n', '\0' };
When a string literal is used to initialize a character array then all its characters are used as initializers. So relative to the example the seven characters of the string literal are used to initialize first 7 elements of the array. All other elements of the array that were not initialized by the characters of the string literal will be initialized implicitly by zeroes.
If you want to determine how long is the string stored in a character array you can use the standard C function strlen declared in the header <string.h>. It returns the number of characters in an array before the terminating zero.
Consider the following example
#include <stdio.h>
#include <string.h>
int main(void)
{
    char line[100] = "hello\n";
    printf( "The size of the array is %zu"
            "\nand the length of the stored string \n%s is %zu\n",
            sizeof( line ), line, strlen( line ) );
    return 0;
}
Its output is
The size of the array is 100
and the length of the stored string
hello
is 6
In C you may use a string literal to initialize a character array excluding the terminating zero of the string literal. For example
char line[6] = "hello\n";
In this case you may not say that the array contains a string because the sequence of symbols stored in the array does not have the terminating zero.

You need the null character to mark the end of the string. C does not store any internal information about the length of the character array or the length of a string, and so the null character/byte \0 marks where it ends.
This is only required for strings, however – you can have any ordinary array of characters that does not represent a string.
For example, try this piece of code:
#include <stdio.h>
int main(void) {
    char string[1];
    string[0] = 'a';
    printf("%s", string);
}
Note that the character array is completely filled with data, so there is no null byte to mark the end. printf will keep reading until it hits a null byte, which means reading past the end of the array. That is undefined behavior: in practice you will often see junk printed after the "a", but anything can happen.
Now, try this:
#include <stdio.h>
int main(void) {
    char string[2];
    string[0] = 'a';
    string[1] = '\0';
    printf("%s", string);
}
It will only print "a", because the end of the string is explicitly marked.
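If you cannot add a terminator to the array itself, one option is to cap how many bytes are read by giving "%s" a precision. The sketch below does that with snprintf; the helper name is hypothetical, and it assumes the count fits in an int.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the first `n` chars of `arr` into `out` as a
   proper null-terminated string. `arr` itself needs no terminator, because
   the precision in "%.*s" caps how many bytes are read.
   Assumes `n` fits in an int. Returns whatever snprintf returns. */
int chars_to_string(char *out, size_t out_size, const char *arr, size_t n)
{
    return snprintf(out, out_size, "%.*s", (int)n, arr);
}
```

Called on the one-element array from the first program with n equal to 1, this produces the string "a" without reading past the end of the array.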

The length of a C string (an array containing the characters and terminated with a '\0' character) is found by searching for the (first) NUL byte. \0 is zero character. In C it is mostly used to indicate the termination of a character string.
Here is an example.
Let's say you've written a word into a file:
char *word = malloc(sizeof(char) * 6);
strcpy(word, "Hello"); /* copies the 5 characters plus the terminating '\0' */
fwrite(word, sizeof(char), 6, fp);
Here we allocate space for the 5 characters of "Hello" plus one more for its terminating '\0'. fp is the file.
And now we write another word after the first one:
char *word2 = malloc(sizeof(char) * 7);
strcpy(word2, "world!");
fwrite(word2, sizeof(char), 7, fp);
So now, let's read the two words back (after rewinding the file):
char *buff = malloc(sizeof(char)*1000); // The buffer can be larger than needed; it won't change the final result
/* 13 = (5 characters from 'Hello')+(1 character for the \0)+(6 characters from 'world!')+(1 character for the \0) */
fread(buff, sizeof(char), 13, fp); // We read the words 'Hello\0' and 'world!\0'
printf("the content of buff is: %s\n", buff); // This prints only 'Hello', because %s stops at the first \0
printf("the second word is: %s\n", buff + 6); // This prints 'world!', which starts right after the first \0
This works because of the terminating \0 characters: they let C tell that there are two separate strings in the buffer. If we had not written the \0 at the end of each word and repeated the same example, the bytes in the file would run together as "Helloworld!" with nothing to mark where either word ends, and printing them with %s would read past the data.
Many string functions rely on this terminator.
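To make the idea of several terminated words sharing one buffer concrete, here is a sketch of a helper that steps from one terminator to the next. The name nth_string is made up, and the caller is assumed to know how many bytes of the buffer are valid.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the n-th (0-based)
   null-terminated string packed in `buf`, or NULL if the first `size`
   bytes hold fewer complete strings than that. */
const char *nth_string(const char *buf, size_t size, size_t n)
{
    const char *p = buf;
    const char *end = buf + size;

    while (p < end) {
        const char *nul = memchr(p, '\0', (size_t)(end - p));
        if (nul == NULL)
            return NULL;   /* trailing bytes without a terminator */
        if (n == 0)
            return p;
        n--;
        p = nul + 1;       /* the next string starts right after the '\0' */
    }
    return NULL;
}
```

For the 13 bytes read in the example, index 0 gives the start of "Hello" and index 1 gives the start of "world!", six bytes into the buffer.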

Why is the entirety of this first array being added onto the second, on top of the two values (from the first) that I assign it?

I want to assign the first two values from the hash array to the salt array.
char hash[] = {"HAodcdZseTJTc"};
char salt[] = {hash[0], hash[1]};
printf("%s", salt);
However, when I attempt this, the first two values are assigned and then all thirteen values are also assigned to the salt array. So my output here is not:
HA
but instead:
HAHAodcdZseTJTc

salt is not null-terminated. Try:
char salt[] = {hash[0], hash[1], '\0'};
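If the salt has to be built at run time rather than in an initializer, the same fix might look like the sketch below. The function name make_salt is invented here, and it assumes hash holds at least two characters.

```c
#include <string.h>

/* Hypothetical helper: copies the first two characters of `hash` into
   `salt` and terminates the result. Assumes `hash` holds at least two
   characters and `salt` has room for three. */
void make_salt(char salt[3], const char *hash)
{
    memcpy(salt, hash, 2);
    salt[2] = '\0';   /* without this, printf("%s", salt) would run on */
}
```

The destination needs three elements: two for the characters and one for the terminator.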

This happens because you are putting just two characters into the salt array and not adding the '\0' terminator.
Passing an array that is not nul-terminated to printf() with a "%s" specifier causes undefined behavior. In your case it went on to print hash; in my case
HA#
was printed.
Strings in C use a special convention to know where they end: a non-printable special character '\0' is appended at the end of a sequence of non-'\0' bytes, and that's how a C string is built.
For example, if you were to compute the length of a string you would do something like
size_t stringlength(const char *string)
{
    size_t length;
    for (length = 0 ; string[length] != '\0' ; ++length);
    return length;
}
there are of course better ways of doing it, but I just want to illustrate what the significance of the terminating '\0' is.
Now that you know this, you should notice that
char string[] = {'A', 'B', 'C'};
is an array of char but it's not a string. For it to be a string, it needs a terminating '\0', so
char string[] = {'A', 'B', 'C', '\0'};
would actually be a string.
Notice, then, that when you allocate space to store a string of n characters, you need to allocate n + 1 bytes to make room for the '\0'.
In the case of printf(), it will consume the bytes that the passed pointer points at until one of them is '\0', and stop there.
That also explains the undefined behavior: printf() would be reading out of bounds, and anything could happen. It depends on what is actually at the memory addresses that lie beyond the passed data.
There are many functions in the standard library that expect strings, i.e. sequences of non-nul bytes followed by a nul byte.

Sizeof(char[]) in C

Consider this code:
char name[]="123";
char name1[]="1234";
And this result
The size of name (char[]):4
The size of name1 (char[]):5
Why is the size of char[] always one more?

Note the difference between sizeof and strlen. The first is an operator that gives the size of the whole data item. The second is a function that returns the length of the string, which will be less than its sizeof (unless you've managed to get string overflow), depending how much of its allocated space is actually used.
In your example
char name[]="123";
sizeof(name) is 4, because of the terminating '\0', and strlen(name) is 3.
But in this example:
char str[20] = "abc";
sizeof(str) is 20, and strlen(str) is 3.
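One way to put the two numbers to work together is sketched below. The helper name spare_room is made up, and it assumes the string really is terminated inside the array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given a string and the sizeof of the array that
   holds it, returns how many more characters could be appended while
   still leaving room for the terminating '\0'.
   Assumes `str` is null-terminated within `array_size` bytes. */
size_t spare_room(const char *str, size_t array_size)
{
    return array_size - (strlen(str) + 1);
}
```

For name above the result is 0 (sizeof is 4, strlen is 3), and for str it is 16 (sizeof is 20, strlen is 3).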

As Michael pointed out in the comments the strings are terminated by a zero. So in memory the first string will look like this
"123\0"
where \0 is a single char and has the ASCII value 0. Then the above string has size 4.
Without this terminating character, how would one know where the string (or char[] for that matter) ends? Well, one other way is to store the length somewhere. Some languages do that. C doesn't.

In C, strings are stored as arrays of chars. With a recognised terminating character ('\0' or just 0) you can pass a pointer to the string, with no need for any further meta-data. When processing a string, you read chars from the memory pointed at by the pointer until you hit the terminating value.
As your array initialisation is using a string literal:
char name[]="123";
is equivalent to:
char name[]={'1','2','3',0};
If you want your array to be of size 3 (without the terminating character, as you are not storing a string), you will want to use:
char name[]={'1','2','3'};
or
char name[3]="123";
(thanks alk)
which will do as you were expecting.

Because there is a null character that is attached to the end of string in C.
Like here in your case
name[0] = '1'
name[1] = '2'
name[2] = '3'
name[3] = '\0'
name1[0] = '1'
name1[1] = '2'
name1[2] = '3'
name1[3] = '4'
name1[4] = '\0'

A string in C is an array of characters terminated by \0, the character with the ASCII value 0 (other languages often keep track of the length separately instead).
When you write char arr[] = "1234";, you initialize the array from a string literal, which is null-terminated by default (\0 is also called the null character).
To avoid a null (assuming you want just an array of chars and not a string), you can declare it the following way char arr[] = {'1', '2', '3', '4'}; and the program will behave as you wish (sizeof(arr) would be 4).

name = {'1','2','3','\0'};
name1 = {'1','2','3','4','\0'};
So
sizeof(name) = 4;
sizeof(name1) = 5;
sizeof returns the size of the object and in this case the object is an array and it is defined that your array is 4 bytes long in first case and 5 bytes in second case.

In C, string literals have a null terminating character added to them.
Your strings,
char name[]="123";
char name1[]="1234";
are stored as if you had written:
char name[] = {'1', '2', '3', '\0'};
char name1[] = {'1', '2', '3', '4', '\0'};
Hence, the size is always plus one. Keep in mind when reading strings from files or from whatever source, the variable where you store your string, should always have extra space for the null terminating character.
For example if you are expected to read string, whose maximum size is 100, your buffer variable, should have size of 101.
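A sketch of that sizing rule using sscanf follows. MAX_WORD and read_word are names chosen for this example only. The field width in "%100s" limits how many characters are stored, and sscanf then appends the '\0', which is why the buffer needs 101 elements.

```c
#include <stdio.h>

#define MAX_WORD 100   /* assumed maximum string length for this sketch */

/* Hypothetical helper: reads one whitespace-delimited word of at most
   MAX_WORD characters from `input` into `out`, which must have room for
   MAX_WORD + 1 chars. Returns 1 on success, 0 if no word was found. */
int read_word(const char *input, char out[MAX_WORD + 1])
{
    /* The field width 100 must match MAX_WORD; it does not count the
       '\0' that sscanf stores after the characters. */
    return sscanf(input, "%100s", out) == 1;
}
```

If the input word is longer than 100 characters, only the first 100 are stored, followed by the terminator at index 100.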

Every string is terminated with the null byte '\0', which adds 1 to the size of the array.

How to add a character to the back of a char array when you obtain it with a gets() function in c?

I have an array of characters that I fill with information using gets().
char inname[30];
gets(inname);
How can I add another character to this array without knowing the length of the string in C? (I mean the part that is actual letters, not the unused memory after it.)
Note: my buffer is long enough for what I want to ask the user (a filename; probably not many people have names longer than 29 characters).

Note that gets is prone to buffer overflow and should be avoided.
Reading a line of input:
char inname[30];
if (fgets(inname, sizeof(inname), stdin) == NULL) {
    inname[0] = '\0'; // No input was read; treat it as an empty string
}
size_t len = strlen(inname);
// Remove trailing newline
if (len > 0 && inname[len-1] == '\n') {
    len--;
    inname[len] = '\0';
}
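As an alternative to the length check above, the newline can also be cut off with strcspn. This is just a sketch, and strip_newline is a made-up name.

```c
#include <string.h>

/* Hypothetical helper: cuts the string at its first '\n', if any, and
   returns the resulting length. */
size_t strip_newline(char *s)
{
    /* strcspn returns the index of the first '\n', or strlen(s) if there
       is none, so this assignment is always within the string. */
    s[strcspn(s, "\n")] = '\0';
    return strlen(s);
}
```

When the string has no newline, the assignment simply rewrites the existing terminator.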
Appending to the string:
const char *string_to_append = ".";
if (len + strlen(string_to_append) + 1 <= sizeof(inname)) {
    // There is enough room to append the string
    strcat(inname, string_to_append);
}
Optional way to append a single character to the string:
if (len + 2 <= sizeof(inname)) {
    // There is room for one more character plus the terminator
    inname[len++] = '.'; // Add a '.' character to the string.
    inname[len] = '\0'; // Don't forget to nul-terminate
}

As you have asked in comment, to determine the string length you can directly use
strlen(inname);
OR
you can loop through string in a for loop until \0 is found.
Now, after getting the length of the previous string, you can append a new string:
strcat(&inname[prevLength],"NEW STRING");
EDIT:
To find the null character you can write a for loop like this:
int i;
for (i = 0; inname[i] != 0; i++)
{
    //do nothing
}
Now you can use i directly to place a character at the end of the string:
inname[i] = your_char; /* your_char is whatever character you want to append */
After this, increment i and write the null character (0) at that position, making sure the array has room for it.
P.S.
Any String in C end with a Null character termination. ASCII null char '\0' is equivalent to 0 in decimal.

You know that the final character of a C string is '\0', e.g. the array:
char foo[10]={"Hello"};
is equivalent to this array:
['H'] ['e'] ['l'] ['l'] ['o'] ['\0'] ['\0'] ['\0'] ['\0'] ['\0']
Thus you can iterate on the array until you find the '\0' character, and then you can substitute it with the character you want.
Alternatively you can use the function strcat of string.h library

The short answer is you can't do it without finding the length.
In C you must know the length of the string to append chars to it. The same applies in other languages, but there it happens behind the scenes.
C strings are defined as sequences of bytes terminated by a special byte, the nul character, which has ASCII code 0 and is written '\0' in C.
You must find this terminator, put the new character in its place, and write the terminator one position later. To illustrate this, suppose you have
char hello[10] = "Hello";
then you want to append a '!' after the 'o' so you can just do this
size_t length;
length = strlen(hello);
/* move the '\0' one position after its current position */
hello[length + 1] = hello[length];
hello[length] = '!';
now the string is "Hello!".
Of course, you should take care that hello is large enough to hold one extra character. That is also not automatic in C, which is one of the things I love about working with it because it gives you maximum flexibility.
You can of course use some available functions to achieve this without worrying about moving the '\0' for example, with
strcat(hello, "!");
you will achieve the same.
Both strlen() and strcat() are defined in string.h header.
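Putting the length lookup and the capacity check together, a helper along these lines is one possible sketch. The name append_char and its return convention are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends `c` to the string in `buf`, whose total
   capacity is `size` bytes. Returns 1 on success, 0 if there is no room
   for the extra character plus the terminating '\0'. */
int append_char(char *buf, size_t size, char c)
{
    size_t len = strlen(buf);

    if (len + 2 > size)
        return 0;
    buf[len] = c;          /* overwrite the old terminator */
    buf[len + 1] = '\0';   /* and put a new one right after it */
    return 1;
}
```

With char hello[10] = "Hello"; a call to append_char(hello, sizeof(hello), '!') succeeds, while the same call on a 6-element array holding "Hello" is refused.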

null terminator in the middle of a string

what happens if a program receives as an argv[1] argument a string with a null terminator in the middle? for example:
./program test'\0'example
what is the value of argv[1]? is it test? is it test\0example? I have these lines of code
max = sizeof(filename);
len = strlen(argv[1]);
if (len > max) goto error;
strcpy(filename, argv[1]);
I need to build an exploit for this program and what I wanted to do, is making argv[1] worth test'\0'example so strlen(argv[1])=strlen("test")=4 and strcpy(filename, argv[1])=strcpy(filename, "test") so I can use the rest of the string (the example part) to put my exploit. is it possible? thank you very much?

argv[1] is a pointer object of type char*. Its value is an address, not a string. Specifically, its value is the address of a char object whose value is 't'.
The C standard (in section 7.1.1) has the following definitions:
A string is a contiguous sequence of characters terminated by and including the first null character.
[...]
A pointer to a string is a pointer to its initial (lowest addressed) character. The length of a string is the number of bytes preceding the null character and the value of a string is the sequence of the values of the contained characters, in order.
Since argv[1] points to the first of a contiguous sequence of characters, one of which is a null character, it's a pointer to a string. The value of that string is "test" (which includes the terminating '\0'), and the length of the string is 4.
It's common to say, as a kind of verbal shorthand, that the value of argv[1] is "test", but that's imprecise -- especially in a case like this where the distinction between the value of a string and the value of the array containing that string is significant.
argv[1] also points to the first character of an array of characters. The first 5 bytes of that array contain the string "test". The entire array contains the character values:
{ 't', 'e', 's', 't',
'\0',
'e', 'x', 'a', 'm', 'p', 'l', 'e',
'\0' }
If you pass the value of argv[1] to a string function, that function will only see "test", and will not access anything past the terminating '\0'. The rest of the contents of the array are still perfectly valid, and can be accessed using functions (like memcpy) that don't just operate on strings.
Whether it's possible to invoke your main program in such a way that argv[1] will point to the first element of an array with those particular contents is another matter, one that depends on your operating system.

The value of argv[1] will be "test", assuming you actually manage to pass a real null character and not just the literal characters \ and 0.
As RedAlert's comment mentioned, strlen and strcpy both stop on a null character, so getting a null character will not help for most exploits.
You most likely need to find a way to do the exploit without using the character \0.

Your idea works when main() is called from within main()
#include <stdio.h>
int main(int argc, char **argv) {
    if (argc == 1) {
        char *data[] = {"", "5one\0two", "7three\0four", "6five\0six"};
        main(4, data); // call main again, with exploitable data
    } else {
        if (!argv[0][0]) { // test for empty argv[0]
            for (int i = 1; i < 4; i++) {
                printf("%s ==> %s\n", argv[i] + 1, argv[i] + argv[i][0] - '0');
            }
        }
    }
    return 0;
}
I'm not sure if it will work when main() is called from the C library initialization code ... or even if you can make your shell accept a NUL character as part of an argument.
