How does the string populate an array - c

I'm creating a simple program to see how a string populates an array.
#include <stdio.h>
#include <string.h>
#include <stddef.h>
#include <stdlib.h>
int main(void)
{
char string1[100];
int i;
printf("Enter sentence.\n");
fgets(string1, 100, stdin);
for (i=0; i<=15; i++)
puts(&string1[i]);
return 0;
}
I'm having a bit of a problem understanding how the string populates the array. My expectation was that the whole string would be stored in string1[0] and any further indexes would come up blank. However, when I added a loop to test that assumption, it turned out that every index had been filled in by the string. Am I misunderstanding how the string fills the array?

For the string "Hello!", the memory representation would be something like this
+-------+-------+-------+-------+-------+-------+-------+
| 'H' | 'e' | 'l' | 'l' | 'o' | '!' | '\0' |
+-------+-------+-------+-------+-------+-------+-------+
The first cell, at index 0, contains the first character, and each subsequent character is stored in the cell with the next index.
Library functions like puts expect you to pass the address of the first character; they then read the string up to the \0.
So if you pass simply string1 or &string1[0], it resolves to the address of 'H'.
If you pass &string1[1], it resolves to the address of 'e', and the library function will treat that as the first character, because that is the contract C strings are designed around.
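A minimal sketch of the same idea, assuming a hypothetical helper named suffix: &s[i] and s + i are the same address, and any string function handed that address sees only the characters from index i up to the '\0'. The loop in print_suffixes stops at strlen(s), so, unlike a loop with a fixed upper bound, it never runs past the terminator of a short input.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: the string that starts at index i of s.
   &s[i] and s + i are the same address. Requires i <= strlen(s). */
const char *suffix(const char *s, size_t i)
{
    return &s[i];
}

/* Print every suffix of s on its own line, but stop at the
   terminator instead of at a fixed index. */
void print_suffixes(const char *s)
{
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++)
        puts(suffix(s, i));   /* puts reads from s + i up to the '\0' */
}
```

For "Hello!", print_suffixes prints Hello!, ello!, llo!, lo!, o! and ! on successive lines.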

Your problem is not string1's layout per se but how puts interprets it. Strings in C are represented by char arrays, with their end marked by a null terminator (the character with code 0):
S e n t e n c e \0
^         ^
string1   &string1[5]
&string1[5] is a pointer to a single character, but since that character is not the null terminator, puts keeps reading the following characters until it reaches the terminator, so nce gets printed.
You'll need to access individual characters and print them with putchar (or with putc, which also takes the output stream as a second argument):
putchar(string1[i]);
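The per-character approach can be sketched as follows, assuming a hypothetical function name put_chars. putchar(c) writes one character to stdout; putc(c, stream) does the same for an explicit stream.

```c
#include <stdio.h>

/* Hypothetical helper: print each character of s on its own line and
   return how many characters were printed (newlines not counted). */
size_t put_chars(const char *s)
{
    size_t n = 0;
    for (; s[n] != '\0'; n++) {
        putchar(s[n]);   /* one character, not a string */
        putchar('\n');   /* same as putc('\n', stdout) */
    }
    return n;
}
```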

The string is not stored in string1[0]; rather, its first character is stored at string1[0], i.e. the string starts at (string1 + 0). Here, &string1[0] or (string1 + 0) can be seen as a pointer to the C string string1.
In that sense, every index i up to the terminating '\0' gives you a valid pointer (string1 + i) that points to some part of the C string string1.
In the for loop you are printing the suffixes of string1, which are pointed to by (string1 + 0), (string1 + 1), (string1 + 2), ...

Related

How does a null character behave in a char array in C?

I tried to reverse this char array with null characters in the middle and the end, without using string length. (original code)
#include <stdio.h>
#include <stdlib.h>
int main()
{
char string[4] ={'c', '\0', 's', '\0'};
printf("What do we love?\n");
printf("Yes, we love:");
for(int i=3; i>=0; i--){
printf("%d", string[i]);
}
return 0;
}
I expected the output to display nothing. Instead I got the reverse of the array, with whitespace where I'm guessing the null characters are.
Because I have tried using %d instead of %c too, and found that those spaces apparently do have the ASCII value 0.
So does this mean that a loop will not always treat a null character in a char array as an indicator of termination? And does it also mean that null characters, which automatically get appended to the unused elements of a char array, actually get printed as spaces on the display, and we only say that nothing is printed after a null character because with most code we see 'nothing' on the display?
A null byte is used in a char array to designate the end of a string. Functions that operate on strings such as strcpy, strcmp, and the %s format specifier for printf, will look for a null byte to find the end of a string.
You're not treating string as a string, but as just an array of char. So it doesn't matter whether or not a particular element of the array has the value 0 as you're not treating that value as special in any way. You're just printing the decimal value of each of the elements of the array.
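To make the difference concrete, here is a sketch (reverse_bytes is a hypothetical name) that reverses the array byte by byte, as the question's loop does. Because it walks a known length, the embedded '\0' bytes are copied like any other value. Only when a result is handed to a string function such as printf with %s does the first '\0' matter: the original array would print as c, and the reversed one, whose first byte is '\0', would print as nothing.

```c
#include <stddef.h>

/* Hypothetical helper: copy arr[0..n-1] into out in reverse order.
   The loop is bounded by n, not by a '\0', so null bytes are ordinary values. */
void reverse_bytes(const char *arr, char *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = arr[n - 1 - i];
}
```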

Array showing random characters at the end

I wanted to test things out with arrays on C as I'm just starting to learn the language. Here is my code:
#include <stdio.h>
main(){
int i,t;
char orig[5];
for(i=0;i<=4;i++){
orig[i] = '.';
}
printf("%s\n", orig);
}
Here is my output:
.....�
It is exactly that. What are those mysterious characters? What have I done wrong?
%s with printf() expects a pointer to a string, that is, a pointer to the initial element of a null-terminated character array. Your array is not null terminated.
Thus, in search of the terminating null character, printf() goes out of bounds and subsequently invokes undefined behavior.
You have to null-terminate your array if you want it to be used as a string.
Quote: C11, chapter §7.21.6.1, (emphasis mine)
s
If no l length modifier is present, the argument shall be a pointer to the initial element of an array of character type.280) Characters from the array are written up to (but not including) the terminating null character. If the precision is specified, no more than that many bytes are written. If the precision is not specified or is greater than the size of the array, the array shall contain a null character.
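The last two sentences of that quote also give a way to print an array that is not null terminated: if the precision is at most the array's size, printf stops after that many bytes and never looks for a '\0'. A sketch using snprintf so the result can be inspected; format_unterminated is a hypothetical name, and adding a terminator, as recommended below, remains the usual fix.

```c
#include <stdio.h>

/* Hypothetical helper: write the first n chars of arr into out without
   requiring arr to contain a '\0'. The precision in "%.*s" caps how many
   bytes are read. Returns the number of characters formatted. */
int format_unterminated(char *out, size_t outsize, const char *arr, int n)
{
    return snprintf(out, outsize, "%.*s", n, arr);
}
```

With the question's array, printf("%.*s\n", 5, orig); prints the five dots without reading past orig[4].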
Quick solution:
Increase the array size by 1, char orig[6];.
Add a null terminator at the end. After the loop body, add orig[i] = '\0';
And then, print the result.
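The quick solution can be generalized into a small helper, sketched here under the assumption that the caller passes the full capacity of the buffer (for example sizeof orig); fill_dots is a hypothetical name. memset writes the dots, and the last slot is reserved for the terminator.

```c
#include <string.h>

/* Hypothetical helper: fill buf with '.' and null-terminate it.
   size is the total capacity of buf, so size - 1 dots fit before the '\0'. */
void fill_dots(char *buf, size_t size)
{
    if (size == 0)
        return;                  /* no room even for the terminator */
    memset(buf, '.', size - 1);
    buf[size - 1] = '\0';
}
```

With char orig[6]; and fill_dots(orig, sizeof orig);, the call printf("%s\n", orig); prints exactly five dots.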
char orig[5];//creates an array of 5 char. (with indices ranging from 0 to 4)
|?|?|?|0|0|0|0|0|?|?|?|?|
       |         ^memory you do not own (your mysterious characters)
       ^start of orig
for(i=0;i<=4;i++){ //attempts to populate array with '.'
    orig[i] = '.';
}
|?|?|?|.|.|.|.|.|?|?|?|?|
       |         ^memory you do not own (your mysterious characters)
       ^start of orig
This results in a non null terminated char array, which will invoke undefined behavior if used in a function that expects a C string. C strings must contain enough space to allow for null termination. Change your declaration to the following to accommodate.
char orig[6];
Then add the null termination to the end of your loop:
...
for(i=0;i<=4;i++){
orig[i] = '.';
}
orig[i] = 0;
Resulting in:
|?|?|?|.|.|.|.|.|0|?|?|?|
       |           ^memory you do not own
       ^start of orig
Note: Because the null termination results in a C string, the function using it knows how to interpret its contents (i.e. no undefined behavior), and your mysterious characters are held at bay.
There is a difference between an ordinary array and a character array used as a string. You can consider such a character array a special case of an array in which each element is of type char and the contents are terminated by a null character (ASCII value 0).
The %s format specifier of printf() expects a pointer to a character array that is terminated by a null character. Your array is not null terminated, so printf goes beyond the 5 characters you assigned and prints whatever garbage values happen to lie after your 5th character ('.').
To solve the issue, declare the character array with a size one larger than the number of characters you want to store. In your case, a character array of size 6 will work.
#include <stdio.h>
int main(){
int i,t;
char orig[6]; // If you want to store 5 characters, allocate an array of size 6 to store null character at last position.
for (i=0; i<=4; i++) {
orig[i] = '.';
}
orig[5] = '\0';
printf("%s\n", orig);
}
There is a reason to spend one extra character on the null character. Whenever you pass an array to a function, only a pointer to its first element is passed. This makes it impossible for the function to determine where the array ends (operators like sizeof won't help inside the function: sizeof will return the size of a pointer on your machine). That is why functions like memcpy and memset take an additional argument giving the size (the length up to which you want them to operate).
With a null-terminated character array, however, a function can determine the length of the string by looking for the special null character.
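A sketch of how such a function finds the end when it only receives a pointer; my_strlen is a hypothetical stand-in for strlen, not its actual implementation. It counts characters until it reaches the null character, so it works for any buffer size but returns the length of the string, not the size of the array.

```c
#include <stddef.h>

/* Hypothetical stand-in for strlen: only a pointer is passed in,
   so the '\0' is the only way to know where the string ends. */
size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```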
You need to add a NUL character (\0) at the end of your string.
#include <stdio.h>
int main(void)
{
    int i;
    char orig[6];
    for(i=0;i<=4;i++){
        orig[i] = '.';
    }
    orig[i] = '\0';
    printf("%s\n", orig);
    return 0;
}
If you do not know what \0 is, I strongly recommend checking the ASCII table (https://www.asciitable.com/).
Good luck
printf takes a pointer to the start of a block of memory (an array, in this case) and prints characters until it encounters a \0 character. Strings like this are called null-terminated strings.
So add a \0 at the end, and fill in characters only up to index (size of array - 2), like this:
#include <stdio.h>
int main(void){
    int i;
    char orig[5];
    for(i=0;i<4;i++){ //less than size of array - 1
        orig[i] = '.';
    }
    orig[i] = '\0';
    printf("%s\n", orig);
    return 0;
}

Recursive addition code in C

The following code works, but I don't quite understand how if (*s == 0) works.
It checks if the string is 0?
Also, for return(isnumber(s+1)), what is the logic behind that?
I know s is a string, but can I just pass s+1 into a function? How does it even know which character I'm looking for?
int isnumber(char *s) {
if (*s == 0) {
return 1; /* Reached end, we've only seen digits so far! */
}
if(!isdigit(*s)) {
printf("The number is invalid\n");
return 0; /* first character is not a digit, so no go */
}
return(isnumber(s+1));
}
int main () {
char inbuf[LENGTH];
int i, j;
printf("Enter a string > ");
fgets(inbuf, LENGTH-1, stdin); // ignore carriage return
inbuf[strlen(inbuf)-1] = 0;
j = isnumber(inbuf);
....
}
This function is a recursive function that checks whether a string contains only digits. To understand how the code works, you must understand how C stores strings. If you have the string "123", C stores it in memory like this:
|-----------------------------------|
| 0x8707 | 0x8708 | 0x8709 | 0x870A |
|--------|--------|--------|--------|
| | | | |
| '1' | '2' | '3' | '\0' |
|-----------------------------------|
What C does is break your string up into characters, store them one after another at some location in memory, and add a null character (\0, ASCII 0) at the end of the string. This null character is how C knows where the string ends.
Your isnumber() function takes a char *s as a parameter. This is called a pointer. Internally, what's going on is that your main() function calls isnumber() and passes in the address of your string, not the string itself. This is important:
j = isnumber(inbuf);
How the compiler interprets this is call isnumber() and pass along the address of inbuf and assign the return value to j.
Now back in the isnumber() function, it receives the address of inbuf and assigns it to s. Putting an asterisk (*) in front of s is called dereferencing s. Dereferencing means you want the value stored at the address held in s. So the line if (*s == 0) is basically saying "if the value stored at the address in s is equal to 0". Remember how I said earlier that strings in memory always end with a terminating null (\0) character? That is how your function knows when to stop and return.
The next thing to understand is pointer arithmetic. In C a char always occupies exactly 1 byte; sizeof(char) is 1 by definition. When you write (s+1), you are telling the compiler to take the address stored in s and advance it by the size of one char. So if s points to 0x8707, then (s+1) is 0x8708, and in the recursive call the new s points to the '2' in our string, so *s is '2' (see my memory block diagram above). Note that s+1 does not change s itself; it produces a new pointer value that is passed to the next call. This is how the function iterates through each character in the string.
Hopefully this clears up the confusion!
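The same pattern, dereferencing s to check the current character and recursing on s + 1 for the rest, can be seen in isolation in this sketch of a recursive length function (rec_len is a hypothetical name). Each call receives a pointer one char further along, so it sees a shorter suffix, until *s is the terminating '\0'.

```c
#include <stddef.h>

/* Hypothetical helper: length of s, computed recursively.
   *s is the current character; s + 1 points at the rest of the string. */
size_t rec_len(const char *s)
{
    if (*s == '\0')
        return 0;                 /* reached the terminator */
    return 1 + rec_len(s + 1);    /* one char here, plus the rest */
}
```

For "123", the calls see "123", "23", "3" and "", which return 3, 2, 1 and 0 respectively.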
The statement if (*s == 0) checks to see if the char s points to is zero. In other words, it checks to see if s is a zero-length string and returns 1 if so.
The statement return (isnumber(s+1)) adds 1 to s, producing a pointer to the second char in the string, and passes that to isnumber(). That call returns true only if every character from s[1] up to the terminating null is a digit.
In C, strings are terminated with a null character.
(*s == 0) is checking for the null terminator.
This code is a little weirder.
return(isnumber(s+1));
Since the current character is a digit, keep going: call the function again starting at the NEXT character. This is a recursive function call, and there is really no need for it when iteration would be simpler.
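A sketch of the iterative alternative, keeping the recursive version's behavior of returning 1 for an empty string. The name isnumber_iter is hypothetical; a different name is used because some C libraries (on BSD-derived systems, for instance) already declare an isnumber function in <ctype.h>. The cast to unsigned char is there because passing a negative char value to isdigit is undefined behavior.

```c
#include <ctype.h>

/* Hypothetical iterative version of isnumber(): returns 1 if every
   character before the '\0' is a decimal digit, 0 otherwise. */
int isnumber_iter(const char *s)
{
    for (; *s != '\0'; s++) {
        if (!isdigit((unsigned char)*s))
            return 0;            /* found a non-digit */
    }
    return 1;                    /* reached the end: only digits seen */
}
```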

null terminator in the middle of a string

What happens if a program receives as its argv[1] argument a string with a null terminator in the middle? For example:
./program test'\0'example
What is the value of argv[1]? Is it test? Is it test\0example? I have these lines of code:
max = sizeof(filename);
len = strlen(argv[1]);
if (len > max) goto error;
strcpy(filename, argv[1]);
I need to build an exploit for this program. What I wanted to do is make argv[1] equal to test'\0'example, so that strlen(argv[1]) = strlen("test") = 4 and strcpy(filename, argv[1]) = strcpy(filename, "test"), so I can use the rest of the string (the example part) for my exploit. Is it possible? Thank you very much!
argv[1] is a pointer object of type char*. Its value is an address, not a string. Specifically, its value is the address of a char object whose value is 't'.
The C standard (in section 7.1.1) has the following definitions:
A string is a contiguous sequence of characters terminated by and including the first null character.
[...]
A pointer to a string is a pointer to its initial (lowest addressed) character. The length of a string is the number of bytes preceding the null character and the value of a string is the sequence of the values of the contained characters, in order.
Since argv[1] points to the first of a contiguous sequence of characters, one of which is a null character, it's a pointer to a string. The value of that string is "test" (which includes the terminating '\0'), and the length of the string is 4.
It's common to say, as a kind of verbal shorthand, that the value of argv[1] is "test", but that's imprecise -- especially in a case like this where the distinction between the value of a string and the value of the array containing that string is significant.
argv[1] also points to the first character of an array of characters. The first 5 bytes of that array contain the string "test". The entire array contains the character values:
{ 't', 'e', 's', 't',
'\0',
'e', 'x', 'a', 'm', 'p', 'l', 'e',
'\0' }
If you pass the value of argv[1] to a string function, that function will only see "test", and will not access anything past the terminating '\0'. The rest of the contents of the array are still perfectly valid, and can be accessed using functions (like memcpy) that don't just operate on strings.
Whether it's possible to invoke your main program in such a way that argv[1] will point to the first element of an array with those particular contents is another matter, one that depends on your operating system.
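A sketch of that distinction with an in-program array, since what the shell delivers is system dependent: strlen and strcpy stop at the first '\0', while memchr, given an explicit size, can find it and reach the bytes after it. after_first_nul is a hypothetical helper name.

```c
#include <string.h>

/* Hypothetical helper: return a pointer to the bytes following the first
   '\0' in arr[0..size-1], or NULL if there is no '\0' or nothing follows it. */
const char *after_first_nul(const char *arr, size_t size)
{
    const char *nul = memchr(arr, '\0', size);
    if (nul == NULL || (size_t)(nul - arr) + 1 >= size)
        return NULL;
    return nul + 1;
}
```

For the array "test\0example" (13 bytes including both terminators), strlen reports 4, strcpy copies only "test", and after_first_nul returns a pointer to "example".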
The value of argv[1] will be "test", assuming you actually manage to get a real null character on the terminal and not just the literal characters \ and 0.
As RedAlert's comment mentioned, strlen and strcpy both stop on a null character, so getting a null character will not help for most exploits.
You most likely need to find a way to do the exploit without using the character \0.
Your idea works when main() is called from within main():
#include <stdio.h>
int main(int argc, char **argv) {
if (argc == 1) {
char *data[] = {"", "5one\0two", "7three\0four", "6five\0six"};
main(4, data); // call main again, with exploitable data
} else {
if (!argv[0][0]) { // test for empty argv[0]
for (int i = 1; i < 4; i++) {
printf("%s ==> %s\n", argv[i] + 1, argv[i] + argv[i][0] - '0');
}
}
}
return 0;
}
I'm not sure if it will work when main() is called from the C library initialization code ... or even if you can make your shell accept a NUL character as part of an argument.

Couldn't access the first subscript of an array when %s is used, but could with %c

#include <stdio.h>
#include <string.h>
void main()
{
char array[]="hello";
printf("%s",array[0]);
printf("%c",array[0]);
}
I couldn't access array[0] when using %s, but I could when using %c. Please help me find a solution for this.
You should pass an address when using %s, i.e. &array[0], because %s requires a pointer as its argument.
Usually we write
printf("%s",character_array_name);
Here character_array_name evaluates to the address of the first element:
character_array_name == &character_array_name[0];
And if you want to print only one character with %s, you can use
printf("%.1s",character_array_name);
Example Code:
#include<stdio.h>
int main()
{
const char *str="Hello World";
printf("%s\n",str); //prints the entire string up to the terminating null; str itself is not modified
printf("%.1s\n",str); //only one character will be printed: H
printf("%.2s\n",str); //only two characters will be printed: He
//initially str points to H in the "Hello World" string; now increment str
str++; //try this: str=str+3;
printf("%.1s\n",str); //only one character: e
printf("%.2s\n",str); //only two characters: el
str=str+5; //str now points to the W
printf("%s\n",str); //prints World, up to the terminating null
printf("%.1s\n",str); //only one character: W
printf("%.2s\n",str); //only two characters: Wo
return 0;
}
Although you have already accepted an answer, I thought my answer could help you in a certain way.
String literals are sequences of characters, and you can visualize your array like this:
       +---+---+---+---+---+----+
array: | h | e | l | l | o | \0 |
       +---+---+---+---+---+----+
         ^
         |
       array[0]
printf is a variadic function: it doesn't know the types of its arguments except through the format specifiers you give it. So when it sees the %c format specifier, it assumes that the next argument is a character; in this case that is array[0], the character h stored at index 0 of the array.
Now, when printf sees %s, it assumes that the next argument is a pointer to the string ("hello") you want it to print. In this case array[0] is not a pointer but a char, so passing it for %s is undefined behavior; you should pass array instead. Note that array names are not pointers, but an array name decays to a pointer to its first element.
Also, you should use int main(void) instead of void main(), since that is the standard form.
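A sketch that puts the correct forms side by side, using snprintf so the result can be checked; format_first is a hypothetical name. %c takes the character itself, while %s and %.1s take a pointer to it (array and &array[0] are the same address here).

```c
#include <stdio.h>

/* Hypothetical helper: format array[0] three ways into out.
   %c gets a char value; %s and %.1s get a pointer to a char. */
int format_first(char *out, size_t outsize, const char *array)
{
    return snprintf(out, outsize, "%c|%s|%.1s", array[0], &array[0], array);
}
```

For "hello" the result is h|hello|h.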
