I'm having an issue with a program in C and I think that a for loop is the culprit, but I'm not certain. The function is meant to take a char[] that has already been passed through a reverse function, and write it into a different char[] with all trailing white space characters removed. That is to say, any ' ' or '\t' characters that lie between a '\n' and any other character shouldn't be part of the output.
It works perfectly if there are no trailing white space characters, as in re-writing an exact duplicate of the input char[]. However, if there are any, there is no output at all.
The program is as follows:
#include<stdio.h>
#define MAXLINE 1000
void trim(char output[], char input[], int len);
void reverse(char output[], char input[], int len);
main()
{
int i, c;
int len;
char block[MAXLINE];
char blockrev[MAXLINE];
char blockout[MAXLINE];
char blockprint[MAXLINE];
i = 0;
while ((c = getchar()) != EOF)
{
block[i] = c;
++i;
}
printf("%s", block); // for debugging purposes
reverse(blockrev, block, i); // reverses block for trim function
trim(blockout, blockrev, i);
reverse(blockprint, blockout, i); // reverses it back to normal
// i also have a sneaking suspicion that i don't need this many arrays?
printf("%s", blockprint);
}
void trim(char output[], char input[], int len)
{
int i, j;
i = 0;
j = 0;
while (i <= len)
{
if (input[i] == ' ' || input[i] == '\t')
{
if (i > 0 && input[i-1] == '\n')
for (; input[i] == ' ' || input[i] == '\t'; ++i)
{
}
else
{
output[j] = input[i];
++i;
++j;
}
}
else
{
output[j] = input[i];
++i;
++j;
}
}
}
void reverse(char output[], char input[], int len)
{
int i;
for (i = 0; len - i >= 0; ++i)
{
output[i] = input[len - i];
}
}
I should note that this is a class assignment that doesn't allow the use of string functions, hence why it's so roundabout.
Change
for (i; input[i] == ' ' || input[i] == '\t'; ++i);
to
for (; i <= len && (input[i] == ' ' || input[i] == '\t'); ++i);
With the first method, if the whitespace is at the end, the loop will iterate indefinitely. Not sure how you didn't get an out of bounds access exception, but that's C/C++ for you.
Edit As Arkku brought up in the comments, make sure your character array is still NUL-terminated (the \0 character), and you can check on that case instead. Make sure you're not trimming the NUL character from the end either.
Declaring your main() function simply as main() is an obsolete style that should not be used. The function must be declared either as int main(void) or as int main(int argc, char *argv[]).
Your input process does not null-terminate your input. This means that what you're working with is not a "string", because a C string, by definition, is an array of char that the last element is a null character ('\0'). Instead, what you've got are simple arrays of char. This wouldn't be a problem as long as you're expecting that, and indeed your code is passing array lengths about, but you're also trying to print it with printf(), which requires C strings, not simple arrays of char.
Your reverse() function has an off-by-one error, because you aren't accounting for the fact that C arrays are zero-indexed, so what you're reversing is always one byte longer than your actual input.
What this means is that if you call reverse(output, input, 10), your code will start by assigning the value at input[10] to output[0], but input[10] is one past the end of your data, and since you didn't initialize your arrays before starting to fill them, that's an indeterminate value. In my testing, that indeterminate value happens, coincidentally, to have zero values much of the time, which means that output[0] gets filled with a null ('\0').
You need to be subtracting one more from the index into the input than you actually are. The loop-termination condition in the reverse() function is also wrong, in compensation, that condition should be len - i > 0, not len - i >= 0.
Your trim() function is unnecessarily complex. Additionally, it too has an incorrect loop condition to compensate for the off-by-one error in reverse(). The loop should be while ( i < len ), not while ( i <= len ).
Additionally, the trim() function has the ability to reduce the size of your data, but you don't provide a way to retain that information. (I see in the comments of Arkku's answer that you've corrected for this already. Good.)
Once you've fixed the issue with not keeping track of your data's size changes, and the off-by-one error which is copying indeterminate data (which happens, coincidentally, to be a null) from the end of the blockout array to the beginning of the blockprint array when you do the second reverse(), and you fix the incorrect <= condition in trim() and the incorrect >= condition in reverse(), and you null-terminate your byte array before passing it to printf(), your program will work.
(Moved from comments to an answer)
My guess is that the problem is outside this function, and is caused by the fact that in the described problem cases the output is shorter than the input. Since you are passing the length of the string as an argument, you need to calculate the length of the string after trim, because it may have changed...
For instance, passing an incorrect length to reverse can cause the terminating NUL character (and possibly some leftover whitespace) to end up at the beginning of the string, thus making the output appear empty.
edit: After seeing the edited question with the code of reverse included, in addition to the above problem, your reverse puts the terminating NUL as the first character of the reversed string, which causes it to be the empty string (in some cases your second reverse puts it back, so you don't see it without printing the output of the first reverse). Note that input[len] contains the '\0', not the last character of the string itself.
edit 2: Furthermore, you are not actually terminating the string in block before using it. It may be the case that the uninitialised array often happens to contain zeroes that serve to terminate the string, but for the program to be correct you absolutely need to terminate it with block[i] = '\0'; immediately after the input loop. Similarly you need ensure NUL-termination of the outputs of reverse and trim (in case of trim it seems to me that this already happens as a side-effect of having the loop condition i <= len instead of i < len, but it's not a sign of good code that it's hard to tell).
Related
I started learning C and I had this exercise from the book "Prentice Hall - The C Programming Language".
Chapter 5 Exercise 3:
Write a pointer version of the fuction strcat that we showed in Chapter 2. strcat(s, t) copies the string t to the end of s.
I did the exercise but the first method that came up to my mind was:
void stringcat(char *s, char *t){
int i,j;
i = j = 0;
while(*(s+i) != '\0'){
printf("%d", i);
i++;
}
while ( (*(t+j)) != '\0'){
*(s+i) = *(t+j);
i++;
j++;
}
}
In main I had:
int main(){
char s[] = "Hola";
char t[] = "lala";
stringcat(s,t);
printf("%s\n", s);
}
At first sight I thought it was right but the actual output was Holalalaa.
Of course it was not the output that I expected, but then I coded this:
void stringcat(char *s, char *t){
int i,j;
i = j = 0;
while(*(s+i) != '\0'){
printf("%d", i);
i++;
}
while((*(s+i) = *(t+j)) != '\0'){
i++;
j++;
}
}
And the output was right.
But then I was thinking a lot about the first code because it's very similar to the second one but why the first output was wrong?. Is it something related with the while statement? or something with pointers?. I found it really hard to understand because you can't see what's happening in the array.
Thanks a lot.
Your code has more than the one problem that you found, but let's start with it.
Actually you are asking why
/* ... */
while ((*(t+j)) != '\0') {
*(s+i) = *(t+j);
/* ... */
works differently than
/* ... */
while ((*(s+i) = *(t+j)) != '\0') {
/* ... */
I hope you see it already, now that both cases stand side by side, actually vertically ;-). In the first case the value of t[j] is compared before it is copied to s[i]. In the second case the comparison is done after the copy. That's why the second case copies the terminating '\0' to the target string, and the first case does not.
The output you get works accidentally, it is Undefined Behavior, since you are writing beyond the border of the target array. Fortunately for you, both strings are laying in sequence in the memory, and you are overwriting the source string with its own characters.
Because your first case does not copy the '\0', the final printf() outputs more characters until a '\0' is encountered. By chance this is the last 'a'.
As others commented, the target string has not enough space for the concatenated string. Provide some more space like this:
char s[10] = "Hola"; /* 10 is enough for both strings and the terminating '\0'. */
However, if you had done this already, the error would have not been revealed, because the last 6 characters of s are initialized with '\0'. Not copying the terminating '\0' makes no difference. You can see this if you use
char s[10] = "Hola\0xxxx";
I don't think that your solution is the expected one. Instead of s[i] you are using *(s + i), which is essentially the same, accessing an array. Consider changing s (and in the course, t) in the function and use just *s.
Side note: The printf() in the function is most probably a leftover from debugging. But I'm sure you know.
for (unsigned int i = 0; i < strlen(s); i++) {
if (s[i] != ' ')
strcat(p, s[i]);
I want to add the current character of the s string at the end of the p string provided it is not a space. How do I do that using strcat? The code above gives the following error "invalid conversion from 'char' to 'const char*'".
I want to use strcat because this way I don't have to store an index for p string in order to know where to place the current character. I hope this makes sense.
Also, I need to do this using array of chars, not c-strings or whatever those are called.
A more sensible algorithm would avoid using strcat() or strncat() altogether:
int j = strlen(p);
for (int i = 0; s[i] != '\0'; i++)
{
if (s[i] != ' ')
p[j++] = s[i];
}
p[j] = '\0';
This avoids quadratic behaviour which using strlen() and strcat() (or strncat()) necessarily involves. It does mean you need to keep a track of where to place characters in p, but the work involved in doing that is trivial. Generally speaking, the quadratic behaviour won't be a problem on strings of 10 characters or so, but if the strings reach 1000 bytes or more, then quadratic behaviour becomes a problem (it takes 1,000,000 operations instead of 1,000 operations — that can become noticeable).
First, you need addresses (the array itself is an address to the first element) to pass to the strcat, not a char. That is why you need to use the & operator before s[i].
You have a working example here
#include <stdio.h>
#include <string.h>
int main()
{
char s[50]= "Hello World";
char p[50]= "Hello World";
for(unsigned int i=0;i<strlen(s);i++){
if(s[i]!=' ')
strncat(p,&s[i],1);
}
puts(p);
return 0;
}
It is a simple program to check whether string is palindrome or not. I made following code in the program.
When i compile it, there is no error but when i try to run the .exe file, i always get following message.
The fundamental problem here is that the loop exit condition can never be met, and this leads to an infinite loop:
(i != j || i != j - 1)
This condition is logically equivalent to:
!(i == j && i == j - 1)
which is obviously always true, so the loop continues indefinitely. The loop only needs to continue so long as j > i.
There is another problem here; namely that the dangerous function gets() should never be used. This function was deprecated in C99 and completely removed from the language in C11. One alternative is to use fgets() instead. Note that this function keeps the newline (if there is room in the buffer), so you will need to remove this after getting the input. Also, characters may be left in the input stream if the buffer is too small. For this reason, it would be better to declare a generously sized input buffer to reduce the risk of problems here. There is no reason not to use an input buffer holding 1000 characters, and I usually just use 4096 for something like this. Memory is cheap.
Further, there is a risk of undefined behavior in the posted code, since the input string may be empty. In this case, strlen(str) would be 0, so in the first execution of the loop body j would be decremented to -1. But the array access str[-1] is out of bounds, and leads to undefined behavior.
This problem can be fixed by checking that j is positive before the first decrement. Note that size_t is the correct type for array indices, as it is an unsigned integer type that is guaranteed to be able to hold any array index. Also note that the strlen() function returns a value of type size_t, not int.
Here is a modified version of the posted code. The size_t type is used for array indices. The length of the input string is stored in j, which is then decremented only if it is a positive value; this sets j to the index of the character preceding the null terminator so long as the input string is not an empty string. The loop continues while j is greater than i and the characters indexed by these values match. After the loop terminates, str[i] and str[j] should agree; if they do not, then the input was not a palindrome.
#include <stdio.h>
#include <string.h>
#define BUF_SZ 4096
int main(void)
{
char str[BUF_SZ];
printf("Enter string:\n");
fgets(str, sizeof str, stdin); // Never use gets()
str[strcspn(str, "\r\n")] = '\0'; // remove '\n'
size_t i = 0;
size_t j = strlen(str);
if (j > 0) {
--j;
}
while (i < j && str[i] == str[j]) {
++i;
--j;
}
if (str[i] == str[j]) {
puts("string is palindrome!!");
} else {
puts("string is not palindrome!!");
}
return 0;
}
The below code reads a line and return the line length. lim is the length of the array s[].
When the input line length is lim, then s[lim] = '\0'. But the array s[] is only lim-length long, from s[0] to s[lim-1]. Will it cause an buffer overflow? I tested it many times, but the code seemed to work just fine.
int getline(char s[], int lim)
{
int c, i;
for(i = 0; i < lim-1 && ( c = getchar())!= EOF && c!= '\n'; i++)
s[i] = c;
if( c == '\n') {
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
The '\0' is just another character. It is stored right after the last character of the string.
Often, you can "get away" with writing off the end of a buffer with no obvious harm, but don't do it. It's a bug.
I once had to debug a program that contained an error like this. The program was writing a single byte past the end of one buffer. In the debug build, there was enough extra stuff on the stack that the single byte extra caused no harm; the crash only occurred in the release build, but the debugger didn't really work since it was the non-debug build. This is an example of why it is good to test your code both in a "debug" build and in a release build (compiled the way you would give it to your users).
This is a good example as to how to clearly define an interface - its input and returned value;
" int getline(char s[], int lim) "
One possible definition of "lim" is, maximum number of characters to be copied to s[], excluding the terminating null-character i.e. '\0'
Example:
char arr[] = "hello";
getline(arr, strlen(arr));
The other definition of "lim" is, Maximum number of characters to be copied into s[] (including the terminating null-character)
Example:
char arr[] = "hello";
getline(arr, sizeof(arr));
You seem to be supposing the 2nd definition of "lim".
This is a function straight out of "The C Programming Language" by K&R. It's from chapter one. It works because it is correct.
Consider "cat". This is a four character array {'c','a','t','\0'}. The length of the string is 3.
If s[]="cat" then s[0]='c', s[3]='\0'. Eh?
The string length returned by srtlen or what have you is the number of characters minus one. The array is allocated to hold all the 4 characters. That's where the '\0' is, at the end of the array.
No, it won't cause buffer overflow. In fact, a '\0' indicates a NULL position, which is considered as the end of an array. When you go from the beginning to the end of an array, the last position containing the '\0' character will never be considered as a position containing valid data.
You could go over all the array by using while(index < size) as a condition, or by using while(array[position] != NULL)
Why do we need to add a '\0' (null) at the end of a character array in C?
I've read it in K&R 2 (1.9 Character Array). The code in the book to find the longest string is as follows :
#include <stdio.h>
#define MAXLINE 1000
int readline(char line[], int maxline);
void copy(char to[], char from[]);
main() {
int len;
int max;
char line[MAXLINE];
char longest[MAXLINE];
max = 0;
while ((len = readline(line, MAXLINE)) > 0)
if (len > max) {
max = len;
copy(longest, line);
}
if (max > 0)
printf("%s", longest);
return 0;
}
int readline(char s[],int lim) {
int c, i;
for (i=0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
s[i] = c;
if (c == '\n') {
s[i] = c;
++i;
}
s[i] = '\0'; //WHY DO WE DO THIS???
return i;
}
void copy(char to[], char from[]) {
int i;
i = 0;
while ((to[i] = from[i]) != '\0')
++i;
}
My Question is why do we set the last element of the character array as '\0'?
The program works fine without it...
Please help me...
You need to end C strings with '\0' since this is how the library knows where the string ends (and, in your case, this is what the copy() function expects).
The program works fine without it...
Without it, your program has undefined behaviour. If the program happens to do what you expect it to do, you are just lucky (or, rather, unlucky since in the real world the undefined behaviour will choose to manifest itself in the most inconvenient circumstances).
In c "string" means a null terminated array of characters. Compare this with a pascal string which means at most 255 charactes preceeded by a byte indicating the length of the string (but requiring no termination).
Each appraoch has it's pros and cons.
Especially string pointers pointed to array of characters without length known is the only way NULL terminator will determine the length of the string.
Awesome discussion about NULL termination at link
Because C defines a string as contiguous sequence of characters terminated by and including the first null character.
Basically the authors of C had the choice to define a string as a sequence of characters + the length of string or to use a magic marker to delimit the end of the string.
For more information on the subject I suggest to read this article:
"The Most Expensive One-byte Mistake" by Poul-Henning Kamp
http://queue.acm.org/detail.cfm?id=2010365
You have actually written the answer yourself right here:
void copy(char to[], char from[]) {
int i;
i = 0;
while ((to[i] = from[i]) != '\0')
++i;
}
The loop in this function will continue until it encounters a '\0' in the array from. Without a terminating zero the loop will continure an unknown number of steps, until it encounters a zero or an invalid memory region.
Really, you do not need to end a character array by \0. It is the char*, or the C representation of the string that needs to be ended by it.
As for array, you have to add a \0 after its end if you want to transfer it to the string (representer by char*).
On the other hand, you need to have \0 at the end of the array, if you want to address it as char* and plan to use char* functions on it.
'\0' in the array indicates the end of string, which means any character after this character is not considered part of the string.It doesn’t mean they are not part of the character array. i.e., we can still access these characters by indexing but they are just not part when we invoke string related things to this character array.
For a string to be in proper format and be able to work properly with the string functions, it must be a null-terminated character array. Without NULL, the programs show undefined behavior when we invoke string functions on the character array. Even though we might get lucky with the results most of the times, it still is an undefined behavior.
I've just looked it up
If your array is considered as string
Which means like this char array[MAX]="string";
Or like this scanf("%s",array);
Or char* table;
Then the NULL character '\0' will append automatically as the end of the characters on that table
But if you initialized it like this char array[MAX]={'n','o','t','s','t,'r'};
Or you fill it using character by character with %c format
for(int i=0;i<MAX;i++)
scanf("%c",&array[i]);
Or getchar() instead of scanf("%c",...)
Then you have to add '\0' by yourself
Because now it considered as any other array's type (int,float...) So the cases that we consider as empty are actually filled by random numbers or characters depends on the type
Meanwhile in the case of a string type the next character after the last considered character is by default '\0'
for more explanation the length of this char array[]="12345" is 6
The array[5]=='\0' will return 1
by other words you can't define a string array like this char array[3]="123" because we left no room for the '\0' that has to append automatically
last example char array[7]={'t','e','s','t','\0'};
Here array[4] is the NULL character
array[5] and array[6] are random values
But if it was string then "test" array[4] and 5 and 6 are all filled by the NULL character (NULL character can refers to any white_space as I think so tab '\t' and enter '\n' are also NULL characters just like '\0' which may refer to spacebar)
nota ben: we can't assign array[7] or more as we all know but if you try to output it, it'll show a random value as any empty case
It is string terminating symbol,When this is encountered ,compiler comes to know that your string is ended.