C line printed from file undefined behavior - c

I have made a c program that parses through the source code file of a language called rapid to extract certain data that I will need to document at work. the data extracted is saved to a csv file that is then formatted into an excel worksheet.
Everything is working except for this function that I have put below. In certain scenarios I was wanting to remove all of the spaces and tabs from a line read from a file so that I can store the statement as a string, in a struct attribute.
The program isn't crashing, but when I printf() the new line with the whitespace removed, some other characters get printed out to.
Example "cmd.exe" , "PowerShell\v1.0\Modules", "igh\AppData\LocaloYSφo¡"
If I do Printf("%s\n", currentLine); It prints fine
When I use printf("%s\n", removeWhiteSpace(currentLine)); I get the undefined behavior.
Here is the function
/******************************************************************
* Takes a string as input, returns it without tabs or spaces
* Used to put whole line into the additional commands
* Attribute
******************************************************************/
static char* removeWhiteSpace(char* string)
{
int i;
int j;
int len = strlen(string);
char ch;
char* result = malloc(sizeof(char)*len+1);
memset(result, 0, sizeof(*result));
j=0;
for (i=0; i<len; i++)
{
ch = string[i];
if ((ch != ' ') && (ch != '\t'))
{
result[j] = ch;
j++;
}
}
result[strlen(result)] = '\0';
return result;
}
Also, I am using fgets() to get the line from the file, and the size for the buffer is at 1000.
The unwanted characters don't exist in the text file, at least not visible anyways.
Thank you for your time, and if you need the text file or the rest of the program I can provide it, but it is lengthy.
Also, I'm using codeblocks IDE using the GCC compiler, I have no errors or warnings when I compile.

memset(result, 0, sizeof(*result));
That is wrong. *result is the thing result points to. result is char *, so it points to a char, and the size of a char is 1. So that statement sets one char to zero. It does not set the entire block of allocated memory to zero.
As we will see, it is unneeded, so just delete that statement.
result[strlen(result)] = '\0';
This statement is useless. strlen works by finding the first null (zero) character in an array. So strlen(result) would report where the first null character is. Then result[strlen(result)] = '\0'; would set that character to zero. But it is already zero. So this statement can never accomplish anything. More than that, though, it does not work because the memset above failed to set the memory to zero, so there may be no null character inside the allocated memory to find. In that case, the behavior is not defined by the C standard.
However, there is no need to use strlen to find the end of the string. We know where the end of the string should be. The object j has been counting the characters written to result. So just delete this line too and use:
result[j] = '\0';
When I use printf("%s\n", removeWhiteSpace(currentLine)); I get the undefined behavior.
That does not make any sense. “Undefined behavior” is not a thing. It is a lack of a thing. Saying something has “undefined behavior” means the C standard does not define what the behavior is. A program that has undefined behavior may print nothing, it may print a desired result, it may print an undesired result, it may print garbage characters, it may crash, and it may hang.
Saying a program produced undefined behavior does not tell anybody what happened. Instead, you should have written a specific description of the behavior of the program, such as “The program printed the expected text followed by unexpected characters.” A copy-and-paste of the exact input and the exact output would be good.

Related

Removing spaces in a string

I was trying to remove spaces of a string after scanning it. The program has compiled perfectly but it is not showing any output and the output screen just keeps getting shut down after scanning the string. Is there a logical error or is there some problem with my compiler(I am using a devc++ compiler btw).
Any kind of help would be appreciated
int main()
{
char str1[100];
scanf("%s",&str1);
int len = strlen(str1);
int m;
for (m=0;m<=len;){
if (&str1[m]==" "){
m++;
}
else {
printf("%c",&str1[m]);
}
m++;
}
return 0;
}
Edit : sorry for the error of m=1, I was just checking in my compiler whether that works or not and just happened to paste that code
Your code contains a lot of issues, and the behaviour you describe is very likely not because of a bug in the compiler :-)
Some of the issues:
Use scanf("%s",str1) instead of scanf("%s",&str1). Since str1 is defined as a character array, it automatically decays to a pointer to char as required.
Note that scanf("%s",str1) will never read in any white space because "%s" is defined as skipping leading white spaces and stops reading in when detecting the first white space.
In for (m=1;m<=len;) together with str1[m], note that an array in C uses zero-based indizes, i.e. str1[0]..str1[len-1], such that m <= len exceeds array bounds. Use for (m=0;m<len;) instead.
Expression &str1[m]==" " is correct from a type perspective, but semantically a nonsense. You are comparing the memory address of the mth character with the memory address of a string literal " ". Use str1[m]==' ' instead and note the single quotes denoting a character value rather than double quotes denoting a string literal.
Statement printf("%c",&str1[m]) passes the memory address of a character to printf rather than the expected character value itself. Use printf("%c",str1[m]) instead.
Hope I found everything. Correct these things, turn on compiler warnings, and try to get ahead. In case you face further troubles, don't hesitate to ask again.
Hope it helps a bit and good luck in experiencing C language :-)
There are many issues:
You cannot read a string with spaces using scanf("%s") , use fgets instead (see comments).
scanf("%s", &str1) is wrong anyway, it should be scanf("%s", str1);, str1 being already the address of of the string
for (m = 0; m <= len;) is wrong, it should be for (m = 0; m < len;), because otherwise the last character you will check is the NUL string terminator.
if (&str1[m]==" ") is wrong, you should write if (str1[m]==' '), " " does not denote a space character but a string literal, you need ' ' instead..
printf("%c", &str1[m]); is wrong, you want to print a char so you need str1[m] (without the &).
You should remove both m++ and put that into the for statement: for (m = 1; m < len; m++), that makes the code clearer.
And possibly a few more problems.
And BTW your attempt doesn't remove the spaces from the string, it merely displays the string skipping spaces.
There are a number of smaller errors here that are adding up.
First, check the bounds on your for loop. You're iterating from index 1 to index strlen(str1), inclusive. That's a reasonable thing to try, but remember that in C, string indices start from 0 and go up to strlen(str1), inclusive. How might you adjust the loop to handle this?
Second, take a look at this line:
if (&str1[m] == " ") {
Here, you're attempting to check whether the given character is a space character. However, this doesn't do what you think it does. The left-hand side of this expression, &str1[m], is the memory address of the mth character of the string str1. That should make you pause for a second, since if you want to compare the contents of memory at a given location, you shouldn't be looking at the address of that memory location. In fact, the true meaning of this line of code is "if the address of character m in the array is equal to the address of a string literal containing the empty string, then ...," which isn't what you want.
I suspect you may have started off by writing out this line first:
if (str1[m] == " ") {
This is closer to what you want, but still not quite right. Here, the left-hand side of the expression correctly says "the mth character of str1," and its type is char. The right-hand side, however, is a string literal. In C, there's a difference between a character (a single glyph) and a string (a sequence of characters), so this comparison isn't allowed. To fix this, change the line to read
if (str1[m] == ' ') {
Here, by using single quotes rather than double quotes, the right-hand side is treated as "the space character" rather than "a string containing a space." The types now match.
There are some other details about this code that need some cleanup. For instance, look at how you're printing out each character. Is that the right way to use printf with a character? Think about the if statement we discussed above and see if you can tinker with your code. Similarly, look at how you're reading a string. And there may be even more little issues here and there, but most of them are variations on these existing themes.
But I hope this helps get you in the right direction!
For loop should start from 0 and less than length (not less or equal)
String compare is wrong. Should be char compare to ' ' , no &
Finding apace should not do anything, non space outputs. You ++m twice.
& on %c output is address not value
From memory, scanf stops on whitespace anyway so needs fgets
int main()
{
char str1[100];
scanf("%s",str1);
int len = strlen(str1);
int m;
for (m=0;m<len;++m){
if (str1[m]!=' '){
printf("%c",str1[m]);
}
}
return 0;
}
There are few mistakes in your logic.
scanf terminates a string when it encounter any space or new line.
Read more about it here. So use fgets as said by others.
& in C represents address. Since array is implemented as pointers in C, its not advised to use & while getting string from stdin. But scanf can be used like scanf("%s",str1) or scanf("%s",&str[1])
While incrementing your index put m++ inside else condition.
Array indexing in C starts from 0 not 1.
So after these changes code will becames something like
int main()
{
char str1[100];
fgets(str1, sizeof str1 , stdin);
int len = strlen(str1);
int m=0;
while(m < len){
if (str1[m] == ' '){
m++;
}
else {
printf("%c",str1[m]);
m++;
}
}
return 0;
}

What does this line of code do? [Newbie]

int main(void)
{
string n = GetString();
if(n!=NULL){
for(int i=0, j=strlen(n); i<j; i++){
if(!isalpha(n[i-1]) && isalpha(n[i])){
printf("%c", toupper(n[i]));
}
}
}
}
if(!isalpha(name[i-1]) && isalpha(name[i]))
how can this line be explained to a new starter?(by the way the code works properly on harvard's cs50 ide)
The code is attempting to find every occurrence of a non-alphabetic character in the array n followed by an alphabetic character and, in each one, print that alphabetic character in upper case.
The problem is, since i starts with the value of 0, the code has undefined behaviour in the first iteration since it accesses a character before the start of the array.
The code might seem to work properly under cs50, but that is just happenstance. One feature of undefined behaviour is that it is not required to produce any error, or any unexpected results. But that doesn't make it right. It simply means that it did not produce an observable symptom in some set of circumstances.
Note: for sake of discussion, I am assuming string is a pointer to char, and that GetString() returns the address of the first character in an array of char.

Strncpy (String C programming) doesn't perform well the second time

Why does the second strncpy give incorrect weird symbols when printing?
Do I need something like fflush(stdin) ?
Note that I used scanf("%s",aString); to read an entire string, the input that is entered starts first off with a space so that it works correctly.
void stringMagic(char str[], int index1, int index2)
{
char part1[40],part2[40];
strncpy(part1,&str[0],index1);
part1[sizeof(part1)-1] = '\0';//strncpy doesn't always put '\0' where it should
printf("\n%s\n",part1);
strncpy(part2,&str[index1+1],(index2-index1));
part2[sizeof(part2)-1] = '\0';
printf("\n%s\n",part2);
}
EDIT
The problem seems to lie in
scanf("%s",aString);
because when I use printf("\n%s",aString); and I have entered something like "Enough with the hello world" I only get as output "Enough" because of the '\0'.
How can I correctly input the entire sentence with whitespace stored? Reading characters?
Now I use: fgets (aString, 100, stdin);
(Reading string from input with space character?)
In order to print a char sequence correctly using %s it needs to be null-terminated. In addition the terminating symbol should be immediately after the last symbol to be printed. However this section in your code: part2[sizeof(part2)-1] = '\0'; always sets the 39th cell of part2 to be the 0 character. Note that sizeof(part2) will always return the memory size allocated for part2 which in this case is 40. The value of this function does not depend on the contents of part2.
What you should do instead is to set the (index2-index1) character in part2 to 0. You know you've copied that many characters to part2, so you know what is the expected length of the resulting string.

In K&R 1.9 longest line example, what is getchar() doing?

I seem to understand the program now, except the getline function is not very intuitive as it seems to copy everything getchar() returns to a character array s[] which is never really used for anything important.
int getline(char s[], int lim)
{
int c, i;
for(i=0; i < lim - 1 && (c = getchar()) != EOF && c != '\n'; ++i)
s[i] = c;
if(c == '\n')
{
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
The function could just as easily ignore the line s[i] = c; because all the function is really doing is counting the number of characters until it reaches EOF or '\n' returns from getchar()
What I really do not understand is why the program progressed forward as the main loop is as follows:
main()
{
int len; /* current line length */
int max; /* maximum length seen so far */
char line[MAXLINE]; /* current input line */
char longest[MAXLINE]; /* longest line saved here */
max = 0;
while ((len = getline(line, MAXLINE)) > 0)
if (len > max)
{
max = len;
copy(longest, line);
}
if (max > 0) /* there was a line */
printf("%s", longest);
return 0;
}
The only explanation would be that the getchar() function does its magic after the user has entered in a full line of text, and hits the enter key. So it would appear to work during run-time is my guess.
Is this how the program progresses? Does the program first enter the while loop, and then wait for a user to enter a line of text, and once the user hits enter, the getline function's for-loop is iterated? I feel like this would be the case, since the user can enter backspace during input.
My question is, how exactly does the program move forward at all? Is it all because of the getchar() function?
When I hit ctrl-D in the terminal, some other confusing stuff happens. If I hit ctrl-D at the start of a newline, the program will terminate. If I hit ctrl-D at the end of a line filled with some text, it does not terminate and it does not act the same way as hitting enter. If I hit ctrl-D a few times in a line with text, the program will finally end.
Is this just the way my terminal is treating the session, or is this all stuff I should not be worrying about if I just want to learn C?
The reason why I ask is that I like to trace the program to get a good understanding of it, but the getchar() function makes that tricky.
In a parameter declaration (and only in that context), char s[] really means char *s. The way the C standard describes this is that:
A declaration of a parameter as "array of type" shall be adjusted to
"qualified pointer to
type".
So s really is a pointer, of type char*, and when the function modifies s[i] it's modifying the ith element of line.
On the call:
getline(line, MAXLINE)
line is an array, but in most contexts an array expression is implicitly converted to a pointer to the array's first element.
These two rules almost seem to be part of a conspiracy to make it look like arrays and pointers are really the same thing in C. They most definitely are not. A pointer object contains the address of some object (or a null pointer that doesn't point to any object); an array object contains an ordered sequence of elements. But most manipulation of arrays in C is done via pointers to the array's elements, with pointer arithmetic used to advance from one element to the next.
Suggested reading (I say this a lot): section 6 of the comp.lang.c FAQ.
getchar reads a character from standard input. So if that's you sitting at the terminal, it blocks the program until it receives a character you've typed, then it's done. But standard input is line buffered when its interactive, so what you type isn't processed by the program until you press enter. That means that getchar will be able to keep reading all the characters you typed, as they're read from the buffer.
You're mistaken about the function. The array is passed to the function*, and it stores each character read by getchar (except for EOF or newline) in successive elements. That's the point of it - not to count the characters, but to store them in the array.
(*a pointer is actually passed, but the function here can still treat it like an array.)
The array is used for something important: it is provided by the caller and returned modified with the new content. From a reasonable point of view, filling in the array is the purpose of calling the function.
That array (an array reference) is actually a pointer, char s[] is the same as char *s. so it's building its result in that array, which is why it's copied later in main. there is rarely any "magic" in K&R.

File Reading in C, Random Error

Thank you everybody so far for your input and advice!
Additionally:
After testing and toying further, it seems individual calls to FileReader succeed. But calling FileReader multiple times (these might be separate versions of FileReader) causes the issue to occur.
End Add
Hello,
I have a very unusual problem [please read this fully: it's important] (Code::Blocks compiler, Windows Vista Home) [no replicable code] with the C File Reading functions (fread, fgetc). Now, normally, the File Reading functions load up the data correctly to a self-allocating and self-deallocating string (and it's not the string's issue), but this is where it gets bizarre (and where Quantum Physics fits in):
An error catching statement reports that EOF occurred too early (IE inside the comments section at the start of the text file it's loading). Printing out the string [after it's loaded] reports that indeed, it's too short (24 chars) (but it has enough space to fit it [~400] and no allocation issues). The fgetc loop iterator reports it's terminating at just 24 (the file is roughly 300 chars long) with an EOF: This is where it goes whacky:
Temporarily checking Read->_base reports the entire (~300) chars are loaded - no EOF at 24. Perplexed, [given it's an fgetc loop] I added a printf to display each char [as a %d so I could spot the -1 EOF] at every step so I could see what it was doing, and modified it so it was a single char. It loops fine, reaching the ~300 mark instead of 24 - but freezes up randomly moments later. BUT, when I removed printf, it terminated at 24 again and got caught by the error-catching statement.
Summary:
So, basically: I have a bug that is affected by the 'Observer Effect' out of quantum physics: When I try to observe the chars I get from fgetc via printf, the problem (early EOF termination at 24) disappears, but when I stop viewing it, the error-catch statement reports early termination.
The more bizarre thing is, this isn't the first time it's occurred. Fread had a similar problem, and I was unable to figure out why, and replaced it with the fgetc loop.
[Code can't really be supplied as the code base is 5 headers in size].
Snippet:
int X = 0;
int C = 0;
int I = 0;
while(Copy.Array[X] != EOF)
{
//Copy.Array[X] = fgetc(Read);
C = fgetc(Read);
Copy.Array[X] = C;
printf("%d %c\n",C,C); //Remove/add this as necessary
if(C == EOF){break;}
X++;
}
Side-Note: Breaking it down into the simplest format does not reproduce the error.
This is the oldest error in the book, kind of.
You can't use a variable of type char to read characters (!), since the EOF constant doesn't fit.
You need:
int C;
Also, the while condition looks scary, you are incrementing X in the loop, then checking the (new) position, is that properly initialized? You don't show how Copy.Array is set up before starting the loop.
I would suggest removing that altogether, it's very strange code.
In fact, I don't understand why you loop reading single characters at all, why not just use fread() to read as much as you need?
Firstly, unwind's answer is a valid point although I'm not sure whether it explains the issues you are seeing.
Secondly,
printf("%d %c\n",C,C); //Remove/add this as necessary
might be a problem. The %d and %c format specifiers expect an int to be the parameter, you are only passing a char. Depending on your compiler, this might mean that they are too small.
This is what I think the problem is:
How are you allocating Copy.Array? Are you making sure all its elements are zeroed before you start? If you malloc it (malloc just leaves whatever garbage was in the memory it returns) and an element just happens to contain 0xFF, your loop will exit prematurely because your while condition tests Copy.Array[X] before you have placed a character in that location.
This is one of the few cases where I allow myself to put an assignment in a condition because the pattern
int c;
while ((c = fgetc(fileStream)) != EOF)
{
doSomethingWithC(c);
}
is really common
Edit
Just read your "Additionally" comment. I think it is highly likely you are overrunning your output buffer. I think you should change your code to something like:
int X = 0; int C = 0; int I = 0;
while(X < arraySize && (C = fgetc(Read)) != EOF)
{
Copy.Array[X] = C;
printf("%d %c\n", (int)C, (int)C);
X++;
}
printf("\n");
Note that I am assuming that you have a variable called arraySize that is set to the number of characters you can write to the array without overrunning it. Note also, I am not writing the EOF to your array.
You probably have some heap corruption going on. Without seeing code it's impossible to say.
Not sure if this is your error but this code:
C = fgetc(Read);
Copy.Array[X] = C;
if(C == EOF){break;}
Means you are adding the EOF value into your array - I'm pretty sure you don't want to do that, especially as your array is presumably char and EOF is int, so you'll actually end up with some other value in there (which could mess up later loops etc).
Instead I suggest you change the order so C is only put in the array once you know it is not EOF:
C = fgetc(Read);
if(C == EOF){break;}
Copy.Array[X] = C;
Whilst this isn't what I'd call a 'complete' answer (as the bug remains), this does solve the 'observer effect' element: I found, for some reason, printf was somehow 'fixing' the code, and using std::cout seemed to (well, I can't say 'fix' the problem) prevent the observer effect happening. That is to say, use std::cout instead of printf (as printf is the origin of the observer effect).
It seems to me that printf does something in memory on a lower level that seems to partially correct what does indeed seem to be a memory allocation error.

Resources