In K&R, we are introduced to char arrays to represent strings.
Arrays are passed by reference. From what I understand, we can point to the first element in an array (a pointer?). Declaring the char array input without initializing its values means it holds garbage data. (Honestly, I'm not really sure what garbage data is, maybe nulls?)
Anyway, initially the uninitialized char array is passed to the function getLength, which stores the typed characters in input. In my code, I display len and the contents of input.
On the next input, I call getLength again, and pass the same char array input. I set the values like before and return the length.
How is the old input erased? Aren't I referencing the exact same array that previously stored the previous input? Below my code, I'll show an example.
#include <stdio.h>

#define MAXLINE 1000 /* For allocating storage size for char array */

int getLength(char s[]); /* set char array and return length */

int main(void) {
    int len;
    char input[MAXLINE];

    while ((len = getLength(input)) > 0) {
        printf("len = %d\n", len);
        printf("string = %s", input);
    }
}

int getLength(char s[]) {
    int i, c;

    for (i = 0; i < MAXLINE - 1 && (c = getchar()) != EOF && c != '\n'; ++i) {
        s[i] = c;
    }
    if (c == '\n') {
        s[i++] = '\n';
    }
    s[i] = '\0';
    return i; /* return length including newline */
}
Example:
Input: "Hello my name is Philip"
Output: "len = 24"
"string = Hello my name is Philip"
Input: "Hi"
Output: "len = 3"
"string = Hi"
When I input "Hi", aren't I using the same array that still has "Hello my name is Philip" stored inside? So wouldn't I expect the array to look like:
['H', 'i', '\n', '\0', 'o', ' ', 'm', 'y', ' ', 'n', 'a', 'm', 'e', ' ', 'i', 's', ' ', 'P', 'h', 'i', 'l', 'i', 'p', '\n', '\0', etc...]
Edit:
Just to clarify, I understand how printf("%s", input) is printing the correct string. I also understand getLength will return the correct length every time.
I'm just confused about the chars saved in the array input. If we are referencing this same array in memory, how are old chars being handled?
How is the old input erased? Aren't I referencing the exact same array that previously stored the previous input?
The old input is not erased. In each iteration of the loop, you simply overwrite the same array input, and getLength() freshly zero-terminates it (s[i] = '\0';) on every call.
Because you print the string before the next iteration, you can safely reuse (and overwrite) the same array. So there really isn't any need to "save" anything.
A C array is not a pointer: the line char input[MAXLINE]; allocates a contiguous block of 1000 bytes, and when you pass input to getLength it decays to a pointer to the first of those bytes, which is why getLength writes into the same memory every time. Those bytes are not reinitialized on each iteration unless you do so explicitly. The junk data you refer to is simply whatever was left over from earlier uses of this block of bytes (before the first call, its contents are indeterminate).
The end of a string stored in a char array is marked by the '\0' character. Library functions that work with strings, such as printf with %s (from <stdio.h>) and strlen (from <string.h>), rely on this and find the length by scanning until the zero character is encountered. One danger arises when you write nonzero characters all the way to the end of your array and then call strlen on it: the function reads past the end of the buffer, which is undefined behavior and may give a wrong length or crash your program.
After assigning to index 26, printing the array still shows "Computer", even though I assigned a character to index 26. I expected something like "Computer K".
What is the reason?
#include <stdio.h>

int main()
{
    char m1[40] = "Computer";
    printf("%s\n", m1); /*prints out "Computer"*/
    m1[26] = 'K';
    printf("%s\n", m1); /*prints out "Computer"*/
    printf("%c", m1[26]); /*prints "K"*/
}
At index 8 of that array there is a \0 character, and %s prints only until it finds a \0 (which marks the end of the string). The character 'K' is at index 26, but it is not printed because a \0 is found before it.
char s[100] = "Computer";
is basically the same as
char s[100] = { 'C', 'o', 'm', 'p', 'u','t','e','r', '\0'};
Since printf stops at the first 0 terminator, it won't print the character at index 26.
Whenever you partially initialize an array, the remaining elements are filled with zeroes. (This is a rule in the C standard, C17 6.7.9 §19.)
Therefore char m1[40] = "Computer"; ends up in memory like this:
[0] = 'C'
[1] = 'o'
...
[7] = 'r'
[8] = '\0' // the null terminator you automatically get by using the " " syntax
[9] = 0 // everything to zero from here on
...
[39] = 0
Now of course \0 and 0 mean the same thing, the value 0. Either will be interpreted as a null terminator.
If you go ahead and overwrite index 26 and then print the array as a string, it will still only print until it encounters the first null terminator at index 8.
If you do like this however:
#include <stdio.h>

int main()
{
    char m1[40] = "Computer";
    printf("%s\n", m1); // prints out "Computer"
    m1[8] = 'K';
    printf("%s\n", m1); // prints out "ComputerK"
}
You overwrite the null terminator, and the next zero that happened to be in the array is treated as null terminator instead. This code only works because we partially initialized the array, so we know there are more zeroes trailing.
Had you instead written
int main()
{
    char m1[40];             // uninitialized: contents are garbage
    strcpy(m1, "Computer");  // needs <string.h>; sets only m1[0] to m1[8]
    m1[8] = 'K';             // destroys the only known null terminator
}
This is not initialization but run-time assignment. strcpy would only set indices 0 to 8 ("Computer" with the null terminator at index 8). The remaining elements would be left uninitialized, holding garbage values, and writing m1[8] = 'K' would destroy the string, since it would no longer be reliably null terminated. Trying to print it would be undefined behavior: something like garbage output or a program crash.
In C strings are 0-terminated.
Your initialization fills all array elements after the 'r' with 0.
If you place a non-0 character in any random field of the array, this does not change anything in the fields before or after that element.
This means your string is still 0-terminated right after the 'r'.
How should any function know that after that string some other string might follow?
That's because after "Computer" there's a null terminator (\0) in your array. If you add a character after this \0, it won't be printed because printf() stops printing when it encounters a null terminator.
Just as an addition to the other users' answers: you should try to answer your question by being more proactive in your learning. It is enough to write a simple program to understand what is happening.
#include <stdio.h>

int main()
{
    char m1[40] = "Computer";
    printf("%s\n", m1); /*prints out "Computer"*/
    m1[26] = 'K';
    /* dump every byte: its hex value and, if printable, the character */
    for(size_t index = 0; index < 40; index++)
    {
        printf("m1[%zu] = 0x%hhx ('%c')\n", index, (unsigned char)m1[index], (m1[index] >= 32) ? m1[index] : ' ');
    }
}
Please pardon me if this is a duplicate question. I will be happy to delete it if someone points that out.
The question is: if I declare a character array in C, say
char character_array[4];
Does that mean I can only store 3 characters, with a '\0' added as the fourth character? I have tried it and successfully stored four characters in the character array. But in that case, where is the '\0' added, since I have already used up all four positions?
Well, yes, you can store any four characters. The string-termination character '\0' is a character just like any other.
But you don't have to store strings; char is a small integer type, so you can do:
char character_array[] = { 1, 2, 3, 4 };
This uses all four elements, but doesn't store printable characters nor any termination; the result is not a C string.
If you want to store a string, you need to accommodate the terminator character of course, since C strings by definition always end with the termination character.
C does not have protection against buffer overflow, if you aim at your foot and pull the trigger it will, in general, happily blow it off for you. Some of us like this. :)
You mix two notions: the notion of arrays and the notion of strings.
In this declaration
char character_array[4];
there is declared an array that can store 4 objects of type char. It is not important what values the objects will have.
On the other hand, the array can contain a string: a sequence of characters terminated by a zero character.
For example you can initialize the array above in C the following way
char character_array[4] = { 'A', 'B', 'C', 'D' };
or
char character_array[4] = "ABCD";
or
char character_array[4] = { '\0' };
or
char character_array[4] = "";
and so on.
In all these cases the array has 4 objects of type char. In the last two cases you may assume that the array contains a string (an empty string), because the array has an element with the zero character ('\0'). That is, in the last two cases you may apply string functions to the array.
Or another example
char character_array[4] = { 'A', 'B', '\0', 'C' };
You can treat the array as containing the string "AB", or just as four objects.
Consider this demonstrative program
#include <stdio.h>
#include <string.h>

int main( void )
{
    char character_array[4] = { 'A', 'B', '\0', 'C' };

    char *p = strchr(character_array, 'C');

    if (p == NULL)
    {
        printf("Character '%c' is not found in the array\n", 'C');
    }
    else
    {
        printf("Character '%c' is found in the array at position %zu\n",
               'C',
               (size_t)(p - character_array));
    }

    p = ( char * )memchr(character_array, 'C', sizeof(character_array));

    if (p == NULL)
    {
        printf("Character '%c' is not found in the array\n", 'C');
    }
    else
    {
        printf("Character '%c' is found in the array at position %zu\n",
               'C',
               (size_t)(p - character_array));
    }
}
The program output is
Character 'C' is not found in the array
Character 'C' is found in the array at position 3
In the first part of the program it is assumed that the array contains a string. The standard string function strchr just ignores all elements of the array after encountering the element with the value '\0'.
In the second part of the program it is assumed that the array contains a sequence of objects with the length of 4. The standard function memchr knows nothing about strings.
Conclusion.
This array
char character_array[4];
can contain 4 objects of type char. That is how it is declared.
The array can be interpreted as containing a string, provided that at least one element of the array is equal to '\0'.
For example, if you declare the array like
char character_array[4] = "A";
that is equivalent to
char character_array[4] = { 'A', '\0', '\0', '\0' };
then it may be said that the array contains the string "A" with a length of 1. On the other hand, the array actually contains 4 objects of type char, as the second, equivalent declaration shows.
You just reserve 4 bytes to fill. If you write to character_array[4] (the fifth character), you have a so-called buffer overflow: you write to memory that was not reserved for the array, which is undefined behavior.
If you store a string in 4 characters, you actually have just 3 characters for printable characters (character_array[0], ..., character_array[2]), and the last one (character_array[3]) is needed for the string terminator '\0'.
For instance, for a 3-character string stored that way, strlen() scans until the terminating '\0' and returns 3.
So I just read an example of how to create an array of characters that represents a string.
The null-character \0 is put at the end of the array to mark the end of the array. Is this necessary?
If I created a char array:
char line[100];
and put the word:
"hello\n"
in it, the chars would be placed at the first six indexes, line[0] - line[5], so would the rest of the array be filled with null characters anyway?
This book says that it is a convention that, for example, the string constant "hello\n" is put in a character array and terminated with \0.
Maybe I don't understand this topic to its full extent and would be glad for enlightenment.
The \0 character does not mark the "end of the array". The \0 character marks the end of the string stored in a char array, if (and only if) that char array is intended to store a string.
A char array is just a char array. It stores independent integer values (char is just a small integer type). A char array does not have to end in \0. \0 has no special meaning in a char array. It is just a zero value.
But sometimes char arrays are used to store strings. A string is a sequence of characters terminated by \0. So, if you want to use your char array as a string you have to terminate your string with a \0.
So, the answer to the question about \0 being "necessary" depends on what you are storing in your char array. If you are storing a string, then you will have to terminate it with a \0. If you are storing something that is not a string, then \0 has no special meaning at all.
'\0' is not required if you are using the array just as a character array. But if you use the character array as a string, you need to put a '\0' at the end. There is no separate string type in C.
There are multiple ways to declare a character array.
Ex:
char str1[] = "my string";
char str2[64] = "my string";
char str3[] = {'m', 'y', ' ', 's', 't', 'r', 'i', 'n', 'g', '\0'};
char str4[64] = {'m', 'y', ' ', 's', 't', 'r', 'i', 'n', 'g' };
All these arrays have the same string "my string". In str1, str2, and str4, the '\0' character is added automatically, but in str3, you need to explicitly add the '\0' character.
(When the size of an array is explicitly declared, and there are fewer items in the initializer list than the size of the array, the rest of the array is initialized with however many zeros it takes to fill it -- see C char array initialization and The N_ELEMENTS macro .).
When/Why is '\0' necessary to mark end of an (char) array?
The terminating zero is necessary if a character array contains a string. It lets you find the point where the string ends.
As for your example, which I think looks like this:
char line[100] = "hello\n";
then for starters the string literal has 7 characters. It is a string and includes the terminating zero. This string literal has type char[7]. You can imagine it like
char no_name[] = { 'h', 'e', 'l', 'l', 'o', '\n', '\0' };
When a string literal is used to initialize a character array, all its characters are used as initializers. So in this example, the seven characters of the string literal initialize the first 7 elements of the array. All other elements of the array that were not initialized by characters of the string literal are initialized implicitly with zeroes.
If you want to determine how long is the string stored in a character array you can use the standard C function strlen declared in the header <string.h>. It returns the number of characters in an array before the terminating zero.
Consider the following example
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[100] = "hello\n";

    printf( "The size of the array is %zu"
            "\nand the length of the stored string \n%s is %zu\n",
            sizeof( line ), line, strlen( line ) );

    return 0;
}
Its output is
The size of the array is 100
and the length of the stored string
hello
is 6
In C you may use a string literal to initialize a character array excluding the terminating zero of the string literal. For example
char line[6] = "hello\n";
In this case you may not say that the array contains a string, because the sequence of characters stored in the array does not have a terminating zero.
You need the null character to mark the end of the string. C does not store any internal information about the length of the character array or the length of a string, and so the null character/byte \0 marks where it ends.
This is only required for strings, however – you can have any ordinary array of characters that does not represent a string.
For example, try this piece of code:
#include <stdio.h>

int main(void) {
    char string[1];
    string[0] = 'a';
    printf("%s", string); // undefined behavior: string is not null-terminated
}
Note that the character array is completely filled with data, so there is no null byte to mark the end. printf will keep reading until it happens to hit a null byte somewhere past the end of the array. That is undefined behavior: in practice you may see a lot of junk printed after "a", or the program may crash.
Now, try this:
#include <stdio.h>

int main(void) {
    char string[2];
    string[0] = 'a';
    string[1] = '\0';
    printf("%s", string);
}
It will only print "a", because the end of the string is explicitly marked.
The length of a C string (an array containing the characters and terminated with a '\0' character) is found by searching for the (first) NUL byte. \0 is the zero (NUL) character. In C it is mostly used to mark the end of a character string.
Here is an example.
Let's say you've written a word into a file:
char *word = malloc(sizeof(char) * 6);
strcpy(word, "Hello");               // copy the characters; don't reassign the pointer
fwrite(word, sizeof(char), 6, fp);
where for word we allocate space for the 5 characters of "Hello" plus one more for its terminating '\0'. fp is the (open) file.
And now, we write another word after the first one:
char *word2 = malloc(sizeof(char) * 7);
strcpy(word2, "world!");
fwrite(word2, sizeof(char), 7, fp);
So now, let's read the two words back (after returning to the start of the file, e.g. with rewind(fp)):
char *buff = malloc(sizeof(char) * 1000); // any size of at least 13 works, it won't change the result
/* 13 = (5 characters of "Hello") + (1 '\0') + (6 characters of "world!") + (1 '\0') */
fread(buff, sizeof(char), 13, fp);        // reads "Hello\0world!\0"
printf("%s\n", buff);                     // prints "Hello": %s stops at the first '\0'
printf("%s\n", buff + 6);                 // prints "world!": the second string starts right after it
This is due to the terminating \0 characters: they tell C where each of the two separate strings in the buffer ends. If we had not written those \0 characters and repeated the same example, the buffer would just hold "Helloworld!" with no boundary between the words, and with no terminator at all, printing it with %s would be undefined behavior.
This is what many string functions rely on!
I am a very new programmer and I have been having a hard time figuring this out. I have looked online but I can't seem to find a clear answer. How do I go about replacing a desired substring by another in C?
I basically would like the user to be able to edit their string if they wanted to.
I know how to get the following:
The index from where to start making the edit
The length of the new substring to be added
I was wondering how do I then go about inserting the new substring and deleting the old one?
Any help would be much appreciated.
This is my code:
int length;
char firstString[100];
char editIndex[100];
char newString[100];
char editedString[100];
printf("Enter your string: \n");
fgets(firstString, 100, stdin);
printf("Enter the word from which you would like to start editing: \n");
fgets(editIndex, 100, stdin);
printf("What substring would you like there instead?\n");
fgets(newString, 100, stdin);
length = strlen (newString);
This is where I get confused: I have the index from which to start the edit (the first character of the word), and I know how long the edit is. I just don't know how to delete the original character(s) that were there and replace them with the new ones. I thought of using a for loop but I am not sure it would work.
This is what I would like to have as an end result:
printf("%s", editedString);
and then have this print the string with the user's edit applied.
You will need a loop that starts at the index where the edit begins and runs for the length of the new substring, copying its characters into place one by one. It needs to be careful not to run past the end of the original string.
Example
char theString[] = "HelloWorld";        // 10 characters + '\0'
char theSubString[] = "Earth";
for (int i = 0; i < 5; i++)             // 5 == strlen(theSubString)
    theString[5 + i] = theSubString[i]; // overwrite "World", which starts at index 5
This replaces the desired substring "World" with the string "Earth"
"HelloWorld"
becomes
"HelloEarth"
The problem you're facing here is how strings are stored in C. Strings in C are simply stored as an array of chars, with a terminating null character (\0). That information leads to the answer to your question, as well as to a problem with what you're trying to do.
The problem is that, because strings are stored as arrays, if you simply copy the new string over the old one, the new string must be the same length as the string you are removing. Otherwise, if the new string is shorter than the old one, you won't fully remove the old string, and if the new string is longer, you'll overwrite characters that are not part of the old string. To support strings of different lengths, your code has to account for the change in the number of characters and shift the remaining characters accordingly. Don't forget that if you shift characters further down the string, you also have to move the null character at the end, and make sure you don't go beyond the end of the array.
The solution, though, is that because it's only an array of characters, individual characters can be accessed (and changed) simply by using array subscripts. If you can find the index of your substring, you can simply copy in your new string letter by letter.
#include <string.h>

int delete_substring (char *str, const char *substr) {
    char *loc = strstr (str, substr); // find position of substring
    if (!loc) return -1; // not found
    int sublen = strlen (substr);
    for (int i = loc - str;; i++) {
        str [i] = str [i + sublen]; // move characters left overwriting substr positions
        if (str [i] == '\0') break; // moved the end of string char, we are done
    }
    return loc - str; // use pointer difference to return the integer where substr found
}
I'm working to try and understand some string functions so I can more effectively use them in later coding projects, so I set up the simple program below:
#include <stdio.h>
#include <string.h>

int main (void)
{
    // Declare variables:
    char test_string[5];
    char test_string2[] = { 'G', 'O', '_', 'T', 'E', 'S', 'T'};
    int init;
    int length = 0;
    int match;

    // Initialize array:
    for (init = 0; init < strlen(test_string); init++)
    {
        test_string[init] = '\0';
    }

    // Fill array:
    test_string[0] = 'T';
    test_string[1] = 'E';
    test_string[2] = 'S';
    test_string[3] = 'T';

    // Get Length:
    length = strlen(test_string);

    // Get number of characters from string 1 in string 2:
    match = strspn(test_string, test_string2);

    printf("\nstrlen return = %d", length);
    printf("\nstrspn return = %d\n\n", match);
    return 0;
}
I expect to see a return of:
strlen return = 4
strspn return = 4
However, I see strlen return = 6 and strspn return = 4. From what I understand, char test_string[5] should allocate 5 bytes of memory and place hex 00 into the fifth byte. The for loop (which should not even be necessary) should then set all the bytes of memory for test_string to hex 00. Then, the immediately following lines should fill test_string bytes 1 through 4 (or test_string[0] through test_string[3]) with what I have specified. Calling strlen at this point should return 4, because it should start at the address of test_string[0] and count up until it hits the first null character, which is at test_string[4]. Yet strlen returns 6. Can anyone explain this? Thanks!
char test_string[5];
test_string is an array of 5 uninitialized char objects.
for (init = 0; init < strlen(test_string); init++)
Kaboom. strlen scans for the first '\0' null character. Since the contents of test_string are garbage, the behavior is undefined. It might return a small value if there happens to be a null character, or a large value or program crash if there don't happen to be any zero bytes in test_string.
Even if that weren't the case, evaluating strlen() in the header of a for loop is inefficient. Each strlen() call has to re-scan the entire string (assuming you've given it a valid string), so if your loop worked it would be O(N^2).
If you want test_string to contain just zero bytes, you can initialize it that way:
char test_string[5] = "";
or, since you initialize the first 4 bytes later:
char test_string[5] = "TEST";
or just:
char test_string[] = "TEST";
(The latter lets the compiler figure out that it needs 5 bytes.)
Going back to your declarations:
char test_string2[] = { 'G', 'O', '_', 'T', 'E', 'S', 'T'};
This causes test_string2 to be 7 bytes long, without a trailing '\0' character. That means that passing test_string2 to any function that expects a pointer to a string will cause undefined behavior. You probably want something like:
char test_string2[] = "GO_TEST";
strlen counts characters up to the first '\0'. Your test_string does not contain one (it is uninitialized), so strlen keeps reading past the end of the array until it finds one, which here happens to be 6 bytes from the start of your array. Reading past the end like that is undefined behavior.
The compiler does not generate code to initialize the array so you don't have to pay to run that code if you fill it later.
To initialize it to 0 and skip the loop, you can use
char test_string[5] = {0};
This way, all character will be initialized to 0 and your strlen will work after you filled the array with "TEST".
There are a few problems here. First of all, char test_string[5]; simply sets aside 5 bytes for that string, but does not set the bytes to anything. In particular, when you say "char test_string[5] should allocate 5 bytes of memory and place hex 00 into the fifth byte", the second part is wrong.
Secondly, your array initialization loop uses strlen(test_string) but since the bytes of test_string are uninitialized, there's no way to know what's there so strlen(test_string) returns some undefined result. A better way to clear the array would be memset( test_string, 0, sizeof(test_string) );.
You fill the array with "TEST" but don't set the null byte at the end, so the last byte is still uninitialized. If you do the memset above this will be fixed, or you can set it manually with test_string[4] = '\0';.