A peculiar bug while reading and printing strings of *char in C

I've just encountered something really odd. My string of char (let's call it word) turns out to have additional letters when I print it. The concatenated letter varies depending on:
the length of the proper prefix word.
the number of spaces after the word.
I'm parsing the word from a line, which is just a single line from the standard input. I'm using a function readWord to get the word out of the line:
void readWord(char **linePointer, char **wordPointer){
char *line = *linePointer;
char *word = *wordPointer;
while (!isEndOfLine(line) && isLowerCaseLetter(*line)){
*word = *line;
word++;
line++;
}
word++;
*word = '\0';
printf("The retrieved word is: %s.\n", *wordPointer);
*linePointer = line;
}
My inputs/outputs look like this (please note that I call the readWord function AFTER taking care of insert and the whitespace between):
// INPUT 1:
insert foo
insert ba // several spaces after 'ba'
// OUTPUT 1:
The retrieved word is foo.
The retrieved word is bas.
// INPUT 2:
insert foo
insert bar // several spaces after 'bar'
// OUTPUT 2:
The retrieved word is foo.
The retrieved word is bare.
I was thinking whether I allocate the *word properly and I guess I do:
root.word = (char *)malloc(sizeof(char *)); //root is my structure
Moreover, it is unlikely to be caused by reassigning the word string, because the string is completely empty at the beginning of the readWord() function.
Thank you for any help. It is indeed a challenging bug for me and I don't know what else I can do.
UPDATE
It turns out that I actually have some problems with allocating/reassigning, since:
//INPUT
insert foo//no spaces
insert bar //spaces here
//OUTPUT
word variable before calling readWord function: ' '.
The retrieved word is foo.
word variable before calling readWord function: 'insert foo
'.
The retrieved word is bare.

Never trust your input: skip any leading spaces first so that you start at the beginning of the word.
You increment word one time too many, as @rpattiso notes.
I have doubts about your memory allocation (you don't show us all your code):
root.word = (char *)malloc(sizeof(char *)); allocates only sizeof(char *) bytes (typically 4 or 8), that is, room for one pointer to a char, not room sized for the characters of the word. readWord can do that allocation.
The following adapted version should work (updated):
void readWord(char **linePointer, char **wordPointer){
char *line = *linePointer;
int i;
while (!isEndOfLine(line) && !isLowerCaseLetter(*line)) line++; // go to begin of word
*linePointer= line;
while (!isEndOfLine(line) && isLowerCaseLetter(*line)) line++; // go to end of word
i= line - *linePointer; // allocate room for word and copy it
*wordPointer= malloc((i+1) * sizeof(char));
strncpy(*wordPointer, *linePointer, i);
(*wordPointer)[i]= '\0';
printf("The retrieved word is: %s.\n", *wordPointer);
*linePointer = line;
}
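For reference, here is a self-contained sketch of the same idea. The bodies of isEndOfLine and isLowerCaseLetter are assumptions, since the question does not show them, and the caller is assumed to free the returned word.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed helper: the question does not show its definition. */
static int isEndOfLine(const char *p) {
    return *p == '\0' || *p == '\n';
}

/* Assumed helper: the question does not show its definition. */
static int isLowerCaseLetter(char c) {
    return c >= 'a' && c <= 'z';
}

/* Skips to the next word in *linePointer, stores a freshly allocated copy
   in *wordPointer (NULL if malloc fails) and advances *linePointer past it. */
void readWord(char **linePointer, char **wordPointer) {
    char *line = *linePointer;
    char *start;
    size_t len;

    while (!isEndOfLine(line) && !isLowerCaseLetter(*line)) line++; /* go to begin of word */
    start = line;
    while (!isEndOfLine(line) && isLowerCaseLetter(*line)) line++;  /* go to end of word */
    len = (size_t)(line - start);

    *wordPointer = malloc(len + 1);       /* room for the word plus '\0' */
    if (*wordPointer != NULL) {
        memcpy(*wordPointer, start, len);
        (*wordPointer)[len] = '\0';
    }
    *linePointer = line;
}
```

Because the terminator is written at exactly index len, trailing spaces after the word can no longer leak stale characters into the output.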

Related

Problem reading two strings with getchar() and then printing those strings in C

This is my code for two functions in C:
// Begin
void readTrain(Train_t *train){
printf("Name des Zugs:");
char name[STR];
getlinee(name, STR);
strcpy(train->name, name);
printf("Name des Drivers:");
char namedriver[STR];
getlinee(namedriver, STR);
strcpy(train->driver, namedriver);
}
void getlinee(char *str, long num){
char c;
int i = 0;
while(((c=getchar())!='\n') && (i<num)){
*str = c;
str++;
i++;
}
printf("i is %d\n", i);
*str = '\0';
fflush(stdin);
}
// End
With the function void getlinee(char *str, long num) I want to read user input into a first string, char name[STR], and then into a second one, char namedriver[STR]. The maximum string size is STR (30 characters). If I enter more than 30 characters for the first string ("Name des Zugs"), which is stored in name[STR], then enter the second string, which is stored in namedriver, and then print the FIRST string, I do not get just the first 30 characters of my input: the second string is also "attached" to it, and I simply do not know why. Otherwise, as long as the 30-character limit is respected for the first string, it works fine.
Here is my output when the input for the first string is longer than 30 characters. The problem is in row 5, "Zugname": why do I also get the second string when I am printing just the first one?
Name des Zugs:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
i is 30
Name des Drivers:xxxxxxxx
i is 8
Zugname: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaxxxxxxxx
Drivername: xxxxxxxx
I think your issue is that your train->name is not properly terminated with '\0', as a consequence when you call printf("%s", train->name) the function keeps reading memory until it finds '\0'. In your case I guess your structure looks like:
struct Train_t {
//...
char name[STR];
char driver[STR];
//...
};
In the getlinee() function, you write '\0' after the last character. In particular, if the input is more than 30 characters long, you copy the first 30 characters, then add '\0' at the 31st position (name[30]). This is the first buffer overflow.
So where is this '\0' actually written? Well, at name[30], even though you're not supposed to write there. Then, if you have the structure above, strcpy(train->name, name); will actually copy a 31-byte string: 30 chars into train->name, and the '\0' will overflow into train->driver[0]. This is the second buffer overflow.
After this, you overwrite the train->driver buffer, so the '\0' disappears and your data in memory basically looks like:
train->name = "aaa...aaa" // no '\0' at the end so printf won't stop reading here
train->driver = "xxx\0" // but there
You have an off-by-one error on your array sizes -- you have arrays of STR chars, and you read up to STR characters into them, but then you store a NUL terminator, requiring (up to) STR + 1 bytes total. So whenever you have a max size input, you run off the end of your array(s) and get undefined behavior.
Pass STR - 1 as the second argument to getlinee for the easiest fix.
Key issues
Size test in wrong order and off-by-one. ((c=getchar())!='\n') && (i<num) --> (i+1<num) && ((c=getchar())!='\n'). Else no room for the null character. Bad form to consume an excess character here.
getlinee() should be declared before first use. Tip: Enable all compiler warnings to save time.
Other
Use int c; not char c; to properly distinguish the typical 257 different possible results from getchar().
fflush(stdin); is undefined behavior. Better code would consume excess characters in a line with other code.
void getlinee(char *str, long num) better with size_t num. size_t is the right size type for array sizing and indexing.
int i should be the same type as num.
Better code would also test for EOF.
while((i+1<num) && ((c=getchar())!='\n') && (c != EOF)){
A better design would return something from getlinee() to indicate success and identify troubles like end-of-file with nothing read, input error, too long a line and parameter trouble like str == NULL, num <= 0.
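The points above could be combined roughly as follows. This is only a sketch: the name read_line, the FILE * parameter (added so the function is not tied to stdin) and the status codes are all assumptions, not part of the original code.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical status codes; the names are illustrative only. */
enum { LINE_OK = 0, LINE_TOO_LONG = 1, LINE_EOF = 2, LINE_BAD_ARGS = 3 };

/* Reads one line from `in` into `str` (capacity `num`, terminator included).
   Excess characters on an over-long line are read and discarded, which
   replaces the undefined fflush(stdin). */
int read_line(FILE *in, char *str, size_t num) {
    size_t i = 0;
    int c;                    /* int, so EOF stays distinguishable */
    int status = LINE_OK;

    if (in == NULL || str == NULL || num == 0) return LINE_BAD_ARGS;

    while ((c = getc(in)) != '\n' && c != EOF) {
        if (i + 1 < num) {
            str[i++] = (char)c;       /* still room for this char and '\0' */
        } else {
            status = LINE_TOO_LONG;   /* keep reading to drop the rest */
        }
    }
    str[i] = '\0';

    /* End-of-file (or a read error) before any character was read. */
    if (c == EOF && i == 0 && status == LINE_OK) return LINE_EOF;
    return status;
}
```

A call such as read_line(stdin, name, sizeof name) then never writes past the end of name, and the caller can decide how to react to an over-long line.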
I believe you have a struct similar to this:
typedef struct train_s
{
//...
char name[STR];
char driver[STR];
//...
} Train_t;
When the input is STR characters or longer (30 in this case), you actually write the '\0' to name[STR], which does not exist: the last element of an array of length STR has index STR-1 (29 in this case), so you are writing a '\0' outside your array.
And, since two strings in this struct are stored one after another, you are writing a '\0' to driver[0], which you immediately overwrite, hence when printing out name, printf doesn't find a '\0' until it reaches the end of driver, so it prints both.
Fixing this should be easy.
Just change:
while(((c=getchar())!='\n') && (i<num))
to:
while(((c=getchar())!='\n') && (i<num - 1))
Or, as I would do it, add 1 to array size:
char name[STR + 1];
char driver[STR + 1];

Scanning data from text file, that doesn't have spacing between each item of data

I have encountered a problem with my homework. I need to scan some data from a text file, to a struct.
The text file looks like this.
012345678;danny;cohen;22;M;danny1993;123;1,2,4,8;Nice person
223325222;or;dan;25;M;ordan10;1234;3,5,6,7;Singer and dancer
203484758;shani;israel;25;F;shaninush;12345;4,5,6,7;Happy and cool girl
349950234;nadav;cohen;50;M;nd50;nadav;3,6,7,8;Engineer very smart
345656974;oshrit;hasson;30;F;osh321;111;3,4,5,7;Layer and a painter
Each item of data to its matching variable.
id = 012345678
first_name = danny
etc...
Now I can't use fscanf because there is no spacing, and fgets reads the whole line.
I found a solution with %[^;]s, but then I would need to write one block of code and copy and paste it 9 times, once for each item of data.
Is there any other option, without changing the text file, that is similar to the code I would write with fscanf if there were spacing between the items of data?
************* UPDATE **************
Hey, first of all, thanks everyone for the help, I really appreciate it.
I didn't understand all your answers, but here is something I did use.
Here's my code :
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct
{
char *idP, *firstNameP, *lastNameP;
int age;
char gender, *userNameP, *passwordP, hobbies, *descriptionP;
}user;
void main() {
FILE *fileP;
user temp;
char test[99];
temp.idP = (char *)malloc(99);
temp.firstNameP = (char *)malloc(99);
temp.lastNameP = (char *)malloc(99);
temp.age = (int )malloc(4);
temp.gender = (char )malloc(sizeof(char));
temp.userNameP = (char *)malloc(99);
fileP = fopen("input.txt", "r");
fscanf(fileP, "%9[^;];%99[^;];%99[^;];%d;%c", temp.idP,temp.firstNameP,temp.lastNameP,&temp.age, temp.gender);
printf("%s\n%s\n%s\n%d\n%c", temp.idP, temp.firstNameP, temp.lastNameP, temp.age, temp.gender);
fgets(test, 60, fileP); // Just testing where it stop scanning
printf("\n\n%s", test);
fclose(fileP);
getchar();
}
It all works well until I scan the int variable; right after that it doesn't scan anything, and I get an error.
Thanks a lot.
As discussed in the comments, fscanf is probably the shortest option (although fgets followed by strtok, and manual parsing are viable options).
You need to use the %[^;] specifier for the string fields (meaning: a string of characters other than ;), with the fields separated by ; to consume the actual semicolons (which we specifically requested not to be consumed as part of the string field). The last field should be %[^\n] to consume up to the newline, since the input doesn't have a terminating semicolon.
You should also (always) limit the length of each string field read with a scanf family function to one less than the available space (the terminating NUL byte is the +1). So, for example, if the first field is at most 9 characters long, you would need char field1[10] and the format would be %9[^;].
It is usually a good idea to put a single space in the beginning of the format string to consume any whitespace (such as the previous newline).
And, of course you should check the return value of fscanf, e.g., if you have 9 fields as per the example, it should return 9.
So, the end result would be something like:
if (fscanf(file, " %9[^;];%99[^;];%99[^;];%d;%c;%99[^;];%99[^;];%99[^;];%99[^\n]",
s.field1, s.field2, s.field3, &s.field4, …, s.field9) != 9) {
// error
break;
}
(Alternatively, the field with numbers separated by commas could be read as four separate fields as %d,%d,%d,%d, in which case the count would go up to 12.)
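As a sketch of how this could look for the sample file, the following parses one line with sscanf (a variation: each line is assumed to have been read with fgets first). The User layout, the buffer sizes and the name parseUser are assumptions. The password is read as a string because one sample row has a non-numeric password, and the hobbies are read as four integers.

```c
#include <stdio.h>

/* Hypothetical record layout; buffer sizes are assumptions. */
typedef struct {
    char id[10];
    char firstName[100];
    char lastName[100];
    int  age;
    char gender;
    char userName[100];
    char password[100];
    int  hobbies[4];
    char description[100];
} User;

/* Parses one semicolon-separated record; returns 1 on success, 0 otherwise. */
int parseUser(const char *line, User *u) {
    /* Note the & on the int and char targets; arrays decay to pointers. */
    int n = sscanf(line,
        " %9[^;];%99[^;];%99[^;];%d;%c;%99[^;];%99[^;];%d,%d,%d,%d;%99[^\n]",
        u->id, u->firstName, u->lastName, &u->age, &u->gender,
        u->userName, u->password,
        &u->hobbies[0], &u->hobbies[1], &u->hobbies[2], &u->hobbies[3],
        u->description);
    return n == 12;   /* 12 conversions must all succeed */
}
```

With fixed-size arrays inside the struct no malloc is needed at all, and scalar members such as age and gender are filled in through their addresses.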
Here is a simple tokenizer. As I see it, you have more than one delimiter here (; and ,).
str - string to be tokenized
del - string containing delimiters (in your case ";," or ";" only)
allowempty - if true, allows empty tokens when there are two or more consecutive delimiters
The return value is a NULL-terminated table of pointers to the tokens (NULL if no token was found).
/* needs <stdlib.h> and <string.h> */
char **mystrtok(const char *str, const char *del, int allowempty)
{
    char **result = NULL;
    const char *end = str;
    size_t size = 0;
    while(*end)
    {
        int isdel = strchr(del, *end) != NULL;
        int islast = !*(end + 1);
        if(isdel || islast)
        {
            /* a final non-delimiter character still belongs to the token */
            size_t len = (size_t)(end - str) + (!isdel && islast);
            /* add temp variable and malloc / realloc checks */
            /* free allocated memory on error */
            if(allowempty || len)
            {
                result = realloc(result, (++size + 1) * sizeof(*result));
                result[size] = NULL;
                result[size - 1] = malloc(len + 1);
                strncpy(result[size - 1], str, len);
                result[size - 1][len] = 0;
            }
            str = end + 1;
        }
        end++;
    }
    return result;
}
To free the memory allocated by the tokenizer:
void myfree(char **ptr)
{
    char **savedptr = ptr;
    if(!ptr) return; /* the tokenizer returns NULL when there are no tokens */
    while(*ptr)
    {
        free(*ptr++);
    }
    free(savedptr);
}
The function is simple, but you can use any separators and any number of separators.

How to copy text until newline character?

I have a char array list that contains text from a text file, for example:
this is the first line
this is the second line
I want to have the first line copied to another char array without \n (and/or \r).
I do not know the size of the first line exactly but I do know it is less than 100 bytes.
Snippet of my code:
unsigned char *line;
line = (u_char *)calloc(100, sizeof(char));
//read txt file to list
while(list[0] != '\n'){
line[0] = list[0];
list++;
line++;
}
Unfortunately, line is empty. Note that I know for sure list isn't empty and contains the text shown above.
Any suggestions on this code, or another solution? The file is opened using open() and not fopen(), so I have to loop through my list array.
You can do it like this:
for ( int i = 0; list[i] && list[i] != '\n'; ++i ) {
line[i] = list[i];
}
You also could use strcspn() from the standard library string.h:
Declaration:
size_t strcspn(const char *str1, const char *str2);
Finds the first sequence of characters in the string str1 that does
not contain any character specified in str2.
Returns the length of this first sequence of characters found that do
not match with str2.
Source
Your program would then become
unsigned char *line;
int firstlineLength;
//read txt file to list
/*count the characters up to first linebreak */
firstlineLength = strcspn(list, "\n");
/* allocate just the memory you need +1 one for the terminating zero*/
line = (u_char *)calloc(firstlineLength+1, sizeof(char));
strncpy(line, list, firstlineLength);
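Wrapped into a function, the strcspn approach might look like the sketch below. The name copyFirstLine is hypothetical, and list is assumed to be NUL-terminated. Passing "\r\n" as the reject set stops at whichever line-break character comes first, so a Windows-style "\r\n" ending is stripped as well.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of the first line of `list`,
   without the trailing "\r" or "\n"; NULL if allocation fails. */
char *copyFirstLine(const char *list) {
    size_t len = strcspn(list, "\r\n");   /* characters before the first line break */
    char *line = malloc(len + 1);         /* +1 for the terminating zero */
    if (line == NULL) return NULL;
    memcpy(line, list, len);
    line[len] = '\0';
    return line;
}
```

Unlike the loop in the question, the returned pointer still points at the start of the copy, because only a length is advanced and never the destination pointer itself.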

Reading a line of chars one word at a time

I'm new to C and I see plenty of examples of reading a file one word at a time, but I'm trying to make a function that is given a line of text (actually a list of filenames) and needs to read one word (filename) at a time.
Eg. I call the function, words("file1.c file2.c file3.txt");
And the function needs to read each word(filename) and put it through another function.
So far I've got:
void words(char* line) {
char buf[100];
while (!feof(line)) {
fscanf(line,"%s",buf);
printf("current word %s \n", buf);
}
}
But this won't compile. I get "passing argument 1 of ‘feof’ from incompatible pointer type"
Edit: So this is the code I've come up with. It seems to work fine if I call it with words("test1 test2 test3 test4 "); but if the last character is not a space, there is an error in the output, e.g. with ("test1 test2 test3 test4");
char buf[100];
int word_length = 0;
int n;
while((sscanf(line + word_length,"%s",buf, &n)) == 1) {
printf("current word %s \n", buf);
word_length = word_length + strlen(buf) + 1;
}
What am I doing wrong?
The fscanf and feof functions work on files.
The corresponding function for strings is sscanf.
The return value from sscanf can be used to check whether you managed to scan anything from the string and how far into the string you should look for the next word.
Edit:
Good effort. There are two problems left. First, if there are multiple spaces between words your code will fail. Also, the + 1 will move you past the null terminator if there is no space after the last word.
The second problem can be solved by not adding a +1. That means that the next item will be scanned right after the previous one ends. This is not a problem because scanf will skip initial whitespace.
The problem with multiple spaces can be solved by finding how far into the string the next token starts using strstr.
Because strstr returns a pointer I switched to using a pointer instead of an index to keep track of progress through the string.
char *ptr = line;
while((sscanf(ptr,"%s",buf)) == 1) {
printf("current word %s \n", buf);
ptr = strstr(ptr, buf); // Find where the current word starts.
ptr += strlen(buf); // Skip past the current word.
}
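An alternative to strstr, sketched here as an assumption rather than as part of the answer above, is the %n conversion, which reports how many characters sscanf has consumed so far. The function name splitWords and the output array are hypothetical; they just make the result easy to inspect.

```c
#include <stdio.h>
#include <string.h>

#define WORD_MAX 100

/* Copies up to `max` whitespace-separated words from `line` into `out`
   and returns how many were stored. */
size_t splitWords(const char *line, char out[][WORD_MAX], size_t max) {
    char buf[WORD_MAX];
    const char *ptr = line;
    size_t count = 0;
    int n = 0;

    /* %99s bounds the write into buf; %n stores the number of characters
       consumed, including any leading whitespace that %s skipped.
       %n does not count towards the return value of sscanf. */
    while (count < max && sscanf(ptr, "%99s%n", buf, &n) == 1) {
        strcpy(out[count++], buf);
        ptr += n;   /* continue right after the word just read */
    }
    return count;
}
```

This handles multiple spaces between words and a missing trailing space alike, since the loop never steps past the terminating null character.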

Tokenize Strings using Pointers in ANSI C

This is in ANSI C. I am given a string. I am supposed to create a method that returns an array of character pointers that point to the beginning of each word of said string. I am not allowed to use malloc, but I am instead told that the maximum length of the input will be 80.
Also, before anyone flames me for not searching the forum, I can't use strtok :(
char input[80] = "hello world, please tokenize this string";
and the output of the method should have 6 elements;
output[0] points to the "h",
output[1] points to the "w",
and so on.
How should I write the method?
Also, I need a similar method to handle input from a file with maximum of 110 lines.
Pseudocode:
boolean isInWord = false
while (*ptr != NUL character) {
if (!isInWord and isWordCharacter(*ptr)) {
isInWord = true
save ptr
} else if (isInWord and !isWordCharacter(*ptr)) {
isInWord = false
}
increment ptr
}
isWordCharacter checks whether the character is part of a word or not. Depending on your definition, a word may consist of alphabetic characters only (so that part-time is recognized as 2 words), or it may also include - (so that part-time is recognized as one word).
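A possible translation of the pseudocode into C is sketched below. The letters-only definition of isWordCharacter, the name tokenize and the max parameter are assumptions. The stored pointers point into the original string, so the words are not individually null-terminated.

```c
#include <ctype.h>
#include <stddef.h>

/* Assumed definition of a word character: letters only. */
static int isWordCharacter(char c) {
    return isalpha((unsigned char)c);
}

/* Stores a pointer to the start of each word of `input` in `output`
   (at most `max` entries) and returns the number of pointers stored. */
size_t tokenize(char *input, char *output[], size_t max) {
    size_t count = 0;
    int isInWord = 0;
    char *ptr;

    for (ptr = input; *ptr != '\0'; ptr++) {
        if (!isInWord && isWordCharacter(*ptr)) {
            if (count == max) break;   /* output array is full */
            isInWord = 1;
            output[count++] = ptr;     /* save the start of the word */
        } else if (isInWord && !isWordCharacter(*ptr)) {
            isInWord = 0;
        }
    }
    return count;
}
```

Since an input of at most 80 characters cannot contain more than 40 words, a fixed array such as char *output[40] is enough and no malloc is required.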
Because it's homework here's a part of what you might need:
char* readPtr = input;
char* wordPtr = input;
int wordCount = 0;
while (*readPtr++ != ' ');
/* Here we have a word from wordPtr to readPtr-1 */
output[wordCount++] = /* something... :) */
You'll need that in a loop, and must consider how to move onto the next word, and check for end of input.
