Using fgets() in file parsing - c

I have a file which contains several lines.
I am tokenizing the file, and if the token contains .word, I would like to store the rest of the line in a c-string.
So if:
array: .word 0:10
I would like to store 0:10 in a c-string.
I am doing the following:
if (strstr(token, ".word")) {
char data_line[MAX_LINE_LENGTH + 1];
int word_ret = fgets(data_line, MAX_LINE_LENGTH, fptr);
printf(".word is %s\n", data_line);
}
The problem with this is that fgets() grabs the next line. How would I grab the remainder of the current line? Is that possible?
Thank you,

strstr() returns a pointer to where the first character of ".word" is found.
This means that if you add the length of ".word" (5 characters) to that, you will get a pointer to the characters after ".word", which is the string you want.
char *x = strstr(token, ".word");
char *string_wanted = x + 5;
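Adding 5 leaves the separating space in the result, and the trailing newline too if the line came from fgets. A minimal sketch of one way to trim both, assuming the line sits in a writable buffer; word_operand is a hypothetical helper name, not a library function:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the text that follows ".word"
 * in line, or NULL if ".word" does not occur. Leading blanks are skipped
 * and the line ending is cut off in place, so line must be writable. */
char *word_operand(char *line)
{
    assert(line != NULL);

    char *p = strstr(line, ".word");
    if (p == NULL)
        return NULL;

    p += strlen(".word");          /* step over the directive itself */
    p += strspn(p, " \t");         /* skip blanks and tabs after it */
    p[strcspn(p, "\r\n")] = '\0';  /* drop the newline kept by fgets */
    return p;
}
```

Given a buffer holding "array: .word 0:10\n", word_operand returns a pointer to "0:10". The result points into the original buffer, so it is only valid until that buffer is reused.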

First of all, you need to call fgets only once for every line you parse and then work with the buffer where the line is stored.
Next, once you have the whole line, you have several choices. If the string format is fixed (something like ".word " followed by the value), you can use the result of the strstr function to locate the start of ".word", advance 6 characters (including the space) from it, and print the required word from that position.
Another option is more complex but is in fact a little better: using the strtok function.
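For illustration, here is a rough sketch of the strtok variant. It assumes the line has already been read into a writable buffer with a single fgets call and that tokens are separated by blanks or tabs; rest_after_word is a made-up name:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: tokenizes line in place and, once the ".word"
 * token is seen, returns everything up to the end of the line.
 * Returns NULL if there is no ".word" token or nothing follows it. */
char *rest_after_word(char *line)
{
    assert(line != NULL);

    for (char *tok = strtok(line, " \t\n"); tok != NULL;
         tok = strtok(NULL, " \t\n")) {
        if (strcmp(tok, ".word") == 0) {
            /* Switch delimiters: the next "token" is the rest of the line. */
            char *rest = strtok(NULL, "\n");
            if (rest != NULL)
                rest += strspn(rest, " \t");  /* skip extra blanks */
            return rest;
        }
    }
    return NULL;
}
```

Keep in mind that strtok overwrites delimiters in the buffer and keeps hidden state between calls, so this is not suitable if the original line is needed afterwards or if another strtok sequence is in progress.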

You need to have already read the input into a buffer, which I'm assuming is token, and from there you just copy from the return value of strstr + the length of ".word" to the end of the buffer. This is what I'd do:
char *location = strstr(token, ".word");
if (location != NULL) {
    char data_line[MAX_LINE_LENGTH];
    strncpy(data_line, location + 5, MAX_LINE_LENGTH - 1);
    data_line[MAX_LINE_LENGTH - 1] = '\0'; /* strncpy does not terminate on truncation */
    printf(".word is %s\n", data_line);
}
You could add 5 or 6 to the pointer location (depending on whether or not there's going to be a space after ".word") to get the rest of the line.
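Also note that the size parameter of fgets includes space for the terminating NUL character, so the buffer does not need an extra byte. strncpy is different: it writes at most that many bytes and does not add a NUL if the source is that long or longer, so you have to terminate the destination yourself in that case.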

Related

Tokenizing a phone number in C

I'm trying to tokenize a phone number and split it into two arrays. It starts out in a string in the form of "(515) 555-5555". I'm looking to tokenize the area code, the first 3 digits, and the last 4 digits. The area code I would store in one array, and the other 7 digits in another one. Both arrays are to hold just the numbers themselves.
My code seems to work... sort of. The issue is when I print the two storage arrays, I find some quirks;
My array aCode; it stores the first 3 digits as I ask it to, but then it also prints some garbage values notched at the end. I walked through it in the debugger, and the array only stores what I'm asking it to store- the 515. So how come it's printing those garbage values? What gives?
My array aNum; I can append the tokens I need to the end of it, the only problem is I end up with an extra space at the front (which makes sense; I'm adding on to an empty array, ie adding on to empty space). I modify the code to only hold 7 variables just to mess around, I step into the debugger, and it tells me that the array holds an empty space and 6 of the digits I need- there's no room for the last one. Yet when I print it, the space AND all 7 digits are printed. How does that happen?
And how could I set up my strtok function so that it first copies the 3 digits before the "-", then appends to that the last 4 I need? All examples of tokenization I've seen utilize a while loop, which would mean I'd have to choose either strcat or strcpy to complete my task. I can set up an "if" statement to check for the size of the current token each time, but that seems too crude to me and I feel like there's a simpler method to this. Thanks all!
int main() {
char phoneNum[]= "(515) 555-5555";
char aCode[3];
char aNum[7];
char *numPtr;
numPtr = strtok(phoneNum, " ");
strncpy(aCode, &numPtr[1], 3);
printf("%s\n", aCode);
numPtr = strtok(&phoneNum[6], "-");
while (numPtr != NULL) {
strcat(aNum, numPtr);
numPtr = strtok(NULL, "-");
}
printf("%s", aNum);
}
I can primarily see two errors,
Being an array of 3 chars, aCode is not null-terminated here. Using it as an argument to the %s format specifier in printf() invokes undefined behaviour. The same applies, in a different way, to aNum.
strcat() expects null-terminated arrays for both arguments. aNum is uninitialized, and therefore not null-terminated, when it is first used, which results in UB, too. Always initialize your local variables.
Also, see other answers for a complete bug-free code.
The biggest problem in your code is undefined behavior: since you are reading a three-character constant into a three-character array, you have left no space for null terminator.
Since you are tokenizing a value in a very specific format of fixed length, you could get away with a very concise implementation that employs sscanf:
char *phoneNum = "(515) 555-5555";
char aCode[3+1];
char aNum[7+1];
sscanf(phoneNum, "(%3[0-9]) %3[0-9]-%4[0-9]", aCode, aNum, &aNum[3]);
printf("%s %s", aCode, aNum);
This solution passes the format (###) ###-#### directly to sscanf, and tells the function where each value needs to be placed. The only "trick" used above is passing &aNum[3] for the last argument, instructing sscanf to place data for the third segment into the same storage as the second segment, but starting at position 3.
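Since sscanf returns the number of conversions it stored, the result can be checked to reject input that does not match the format. A small sketch of that idea; parse_phone is a hypothetical wrapper, and the array sizes assume exactly the (###) ###-#### layout:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper around the sscanf call above.
 * area needs room for 3 digits + '\0', number for 7 digits + '\0'.
 * Returns 1 only if all three digit groups were read, 0 otherwise. */
int parse_phone(const char *s, char area[4], char number[8])
{
    assert(s != NULL && area != NULL && number != NULL);

    /* The third group is written right behind the second one. */
    return sscanf(s, "(%3[0-9]) %3[0-9]-%4[0-9]",
                  area, number, number + 3) == 3;
}
```

For "(515) 555-1234" this fills area with "515" and number with "5551234". An input such as "515-555-1234" fails at the opening parenthesis, so the function returns 0 and the arrays should not be used.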
Your code has multiple issues
You allocate the wrong size for aCode: you should add 1 for the nul terminator byte, and initialize the whole array to '\0' so the string is always terminated.
char aCode[4] = {'\0'};
You don't check if strtok() returns NULL.
numPtr = strtok(phoneNum, " ");
strncpy(aCode, &numPtr[1], 3);
Point 1 also applies to aNum. In addition, strcat(aNum, numPtr) will fail because aNum is not yet initialized at the first call.
Subsequent calls to strtok() must have NULL as the first parameter, hence
numPtr = strtok(&phoneNum[6], "-");
is wrong, it should be
numPtr = strtok(NULL, "-");
Other answers have already mentioned the major issue, which is insufficient space in aCode and aNum for the terminating NUL character. The sscanf answer is also the cleanest for solving the problem, but given the restriction of using strtok, here's one possible solution to consider:
char phone_number[]= "(515) 555-1234";
char area[3+1] = "";
char digits[7+1] = "";
const char *separators = " (-)";
char *p = strtok(phone_number, separators);
if (p) {
int len = 0;
(void) snprintf(area, sizeof(area), "%s", p);
while (len < sizeof(digits) && (p = strtok(NULL, separators))) {
len += snprintf(digits + len, sizeof(digits) - len, "%s", p);
}
}
(void) printf("(%s) %s\n", area, digits);

in C I want to read in line by line from a file a certain way with the end length of the file changing

Ok I need to read information in from a file. I have to take certain parts of the line apart and do different things with each part. I know the maximum and minimum length of the file but I am doing something wrong when I read in the file and then split it up as I am getting really funny values and stuff when I try to compare methods. The maximum length of any line is 80 characters.
The format for each line will be as follows: (I will write them in column form as they would appear in a character array)
0-7 _ 9 10-16 _ 18 19-28_ _31-79
spots 0-7 will contain a string(any being under 8 will have blank spaces)
spots 8,17,29,30 are all blank spaces (Marked by underscores)
spots 10-16 will contain a string (again any being under the max length will have blank spaces at the end)
spot 18 will contain a blank space or a character
spot 19-28 will contain another string (Same as other cases)
spot 31-79 can be filled with a string or may not exist at all depends on the users input.
Right now I am using a buffer of size 82 and then doing strncpy to take certain parts from the buffer to break it up. It appears to be working fine but when I do strcmp I am getting funky answers and the strlen is not giving the char arrays I declared the right length.
(I have declared them as having a max length of 8,9,etc. but strlen has been returning weird numbers like 67)
So if I could just read it in broken up it should completely resolve the issue.
I was hoping there would be a way to do this but am currently unsure.
Any help would be greatly appreciated. I have attached the part of the code where I think the error is.
(I know it isn't good to have the size hardcoded in there but I want to get it working first and then I'll get rid of the magic numbers)
while (fgets(buffer, sizeof buffer, fp) != NULL) /* read a line from a file */
{
if (buffer[0] == '.') //If it is a comment line just echo it do not increase counter
{
printf("%s", buffer);
}
else if (buffer[0] == ' ' && buffer[10] == ' ') // If it is a blank line print blank line do not increase counter
{
printf("\n");
}
else //it is an actual instruction perform the real operations
{
//copy label down
strncpy(label, &buffer[0], 8);
//copy mnemonic into command string
strncpy(command, &buffer[9], 8);
//copy symbol down
symbol = buffer[syLoc];
//copy operand down
strncpy(operand, &buffer[19], 9);
Funky characters and overlong string lengths are a sign that the strings aren't null-terminated, as C (or at least most of C's library functions) expects them.
strncpy yields a null-terminated string only if the source string is shorter than the count you pass. In your case, you copy substrings out of the middle of a longer string, so the count is reached first and no null terminator is written.
You could add the null-terminator by hand:
char label[9];
strncpy(label, &buffer[0], 8);
label[8] = '\0';
But given that you have spaces after the substrings you want anyway, you could also use strtok's approach to make your substrings pointers into the line you have read and overwrite the spaces with the null character:
char *label;
char *command;
label = &buffer[0];
buffer[8] = '\0';
command = &buffer[9];
buffer[9 + 8] = '\0';
This approach has the advantage that you don't need extra memory for the substrings. It has the drawback that your substrings become invalid when you read the next line. If your substrings don't need to outlive the current line, that approach might be good for you.
Warning: strncpy does not add a null terminator (\0) at the end of the copied chars when the source is at least as long as the count.
To protect the target char array, you have to add a \0 manually after each strncpy call, like this:
//copy label down
strncpy(label, &buffer[0], 8);
label[8]='\0';
//copy mnemonic into command string
strncpy(command, &buffer[9], 8);
command[8]='\0';
//copy symbol down
symbol = buffer[syLoc]; //Ok just a single char
//copy operand down
strncpy(operand, &buffer[19], 9);
operand[9]='\0';
If no '\0' is added, functions such as strlen or strcmp keep reading past the end of the char array until they happen to encounter a '\0' in memory (a buffer over-read).
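Even with the terminator in place, strcmp can still give surprising results because each field carries its trailing padding blanks. One possible way to handle both points is a small helper that copies a fixed-width column, terminates it, and trims the padding. This is only a sketch; copy_field and its parameters are invented for illustration, and the column numbers in the usage note follow the strncpy calls above:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies the column range [start, start + width)
 * of line into dst, never writing more than dstsize bytes, always
 * terminating dst, and trimming trailing blanks. Lines that end before
 * the column starts yield an empty string. */
void copy_field(char *dst, size_t dstsize,
                const char *line, size_t start, size_t width)
{
    assert(dst != NULL && line != NULL && dstsize > 0);

    size_t linelen = strcspn(line, "\r\n");  /* length without line ending */
    size_t n = 0;

    if (start < linelen) {
        n = linelen - start;                 /* what the line still offers */
        if (n > width)
            n = width;
        if (n > dstsize - 1)
            n = dstsize - 1;                 /* leave room for '\0' */
        memcpy(dst, line + start, n);
    }
    while (n > 0 && dst[n - 1] == ' ')       /* drop padding blanks */
        n--;
    dst[n] = '\0';
}
```

With this, copy_field(label, sizeof label, buffer, 0, 8) and copy_field(operand, sizeof operand, buffer, 19, 10) produce strings that compare equal to "FIRST" or "RETADR" with plain strcmp, and a missing comment column simply comes back empty.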

Variable reset after scanf

I wrote the below function :
typedef enum {GREEN,BLACK, WHITE} color;
void StartGame(Piece board[8][8])
{
color currentPlayer=WHITE;
char location[2];
int gameover=1;
while(gameover)
{
printf("%d\n",currentPlayer);
if(currentPlayer==WHITE)
printf(BOLDWHITE"White: Please select a piece:\n");
else
printf(BOLDBLACK"Black: Please select a piece:\n");
printf("%d\n",currentPlayer);
scanf("%s",location);
printf("%d\n",currentPlayer);
if(currentPlayer==WHITE)
currentPlayer=BLACK;
else
currentPlayer=WHITE;
}
}
I print currentPlayer at every step to see what's going on. Here is what I get:
2
White: Please select a piece:
2
a1
0
2
White: Please select a piece:
2
Why is currentPlayer 0 after scanf? I didn't touch it.
The buffer location has only room for 2 characters and scanf puts an extra NUL character at end. Therefore you have a stack corruption issue. Just give more room to location, for example:
char location[8];
EDIT
Since you just want to read a string, I recommend using fgets, which lets you limit the number of characters read from the input. My code would look like this:
char location[8];
...
fgets(location, sizeof(location), stdin); //instead of scanf, fgets reads at most one less than buffer's size characters.
You only have to keep in mind that fgets stores the final newline character (\n) in the buffer, but this should not be a problem if you just process the first 2 characters of the string.
It seems you overwrite the memory occupied by currentPlayer when you enter a string into the character array location. As seen from the console output, you entered the string a1. To store it, the array location must be defined at least as
char location[3];
because scanf appends entered strings with the terminating zero.
It would be better if you would use function fgets instead.
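If scanf is kept, a field width in the conversion bounds how much it writes. A minimal sketch of that, using sscanf on a string so it can be tried without console input (the same "%2s" works with scanf); parse_square is a hypothetical name:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads a two-character square such as "a1".
 * "%2s" stores at most 2 characters plus the terminating '\0',
 * which is exactly what a char[3] can hold.
 * Returns 1 if something was read, 0 otherwise. */
int parse_square(const char *text, char square[3])
{
    assert(text != NULL && square != NULL);

    return sscanf(text, "%2s", square) == 1;
}
```

Longer input such as "a1xyz" is cut to "a1" instead of running past the array. With scanf the unread characters stay in the input stream, so they would still have to be consumed before the next prompt.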
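You should use something like this, building the field width from the size of the buffer array:
char format[16];
snprintf(format, sizeof format, "%%%zus", sizeof(buffer) - 1); /* e.g. "%7s" for an 8-byte array */
fscanf(file, format, buffer);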

fscanf overwriting next bytes in memory (C)

The basic gist is, I'm reading words from a text file, storing them as a string, running a function, and then looping over this multiple times, rewriting that string with every new line read. After this loop is done, I need to deal with a different string. The problem is, the second string's bytes, even though I've memset them to 0 at declaration, are getting overwritten by the extra letters in words longer than the space I've allocated to the first string:
char* currDictWord = malloc(9 * sizeof(char));
char* currBrutWord = malloc(9 * sizeof(char));
memset(currBrutWord, 0, 9);
memset(currDictWord, 0, 9);
...
while (stuff) {
fscanf(dictionary, "%s", currDictWord);
}
...
printf("word: %s\n", currBrutWord);
currBrutWord will not be empty anymore. The two ways I've dealt with this are by either making sure currDictWord is longer than the longest word in the dictionary file (kind of a ghetto solution), and doing a new memset on currBrutWord after the loop. Is there no way to tell C to stop writing stuff into memory I've specifically allocated for a different variable?
Yes: stop using fscanf (and preferably the whole scanf-family), and use fgets instead, it lets you pass the maximum number of bytes to read into the variable.
EDIT: (in response to the comment)
fgets keeps reading until count - 1 bytes have been read or a newline has been found; the newline, if read, is stored in the string. So after calling fgets, check whether there is a newline at the end of the string (and remove it if necessary). If there is no newline in the string, call fgetc on the file until you find one, like this:
if (fgets(currDictWord, 9, dictionary) != NULL) {
    size_t len = strlen(currDictWord);
    if (len > 0 && currDictWord[len - 1] != '\n') {
        int c;
        while ((c = fgetc(dictionary)) != '\n' && c != EOF)
            ; /* no body necessary: discard the rest of the line */
        /* the stream pointer is now at the beginning of the next line */
    }
}
There are two problems here: improper string assignment and not validating data read from a file.
currBrutWord is overwritten because too many chars were written into currDictWord. The same would have happened had you done:
strcpy(currDictWord, "123456789"); // Bad: this copies 9+1 chars into the 9-byte currDictWord
When using fscanf(), one could limit the data read via:
fscanf(dictionary, "%8s", currDictWord);
This prevents fscanf() from putting too much data into currDictWord. That part is good, but you still have unexpected data coming from the file. You need to challenge any data from the outside world.
if (NULL == fgets(bigbuf, sizeof bigbuf, dictionary)) {
    /* handle EOF or I/O error */
}
// now parse and validate bigbuf using various tools: strtok(), sscanf(), etc.
int n;
if ((sscanf(bigbuf, "%8s%n", currDictWord, &n) < 1) || (bigbuf[n] != '\n')) {
    /* handle error */
}
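The validation step can be packaged as a function that works on a line already read by fgets. This is just a sketch of the same "%8s%n" idea; parse_word_line is a hypothetical name, and accepting a line that ends without a newline (as the last line of a file may) is an assumption on my part:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: extracts one word of at most 8 characters from
 * line into word (which needs 9 bytes). Returns 1 only if the word is
 * immediately followed by the end of the line, 0 for empty lines,
 * overlong words, or trailing text. */
int parse_word_line(const char *line, char word[9])
{
    assert(line != NULL && word != NULL);

    int n = 0;
    /* %n records how many characters of line were consumed so far. */
    if (sscanf(line, "%8s%n", word, &n) != 1)
        return 0;
    return line[n] == '\n' || line[n] == '\0';
}
```

A ten-letter word is rejected because the character after the first eight is not a line ending, so an overlong entry is reported instead of silently truncated.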

Strchr and strncpy Misuse

Hi, I'm trying to find the - char and then place the characters to its left into a string. Here I would like FUPOPER to be stored in program_id_DB; however, when I run this code my output is:
Character '-' found at position 8.
The prgmid contains FUPOPERL <-where is it getting this l?!?!
char data_DB[]="FUPOPER-$DSMSCM.OPER*.FUP";
char program_id_DB[10];
char program_name_DB_c[ZSYS_VAL_LEN_FILENAME];
char *pos = strchr(data_DB, '-');
if (pos)
strncpy(program_id_DB,data_DB, pos-data_DB);
printf("Character '-' found at position %d.\n", pos-data_DB+1);
printf("The prgmid contains %s\n",program_id_DB);
You didn't initialize program_id_DB, so it's free to contain anything it wants. Set it to zero before you start:
memset(program_id_DB, 0, 10);
(You need to #include <string.h> for memset.)
In fact, what you're doing is terribly dangerous because there's no guarantee that the string you pass to printf is null-terminated! Always zero the array before use and copy at most 9 non-null characters into it.
You need to put a \0 to mark the string's end.
A way to do it is: memset(program_id_DB, 0, sizeof(program_id_DB)); before you strncpy to it.
You have to append a null-terminating character to program_id_DB yourself, because strncpy does not do this for you when the source has no null character within the first N characters. In your case N is 7 (the length of "FUPOPER"), and the first seven characters of data_DB contain no null, so none is copied into the buffer. Either that, or zero-initialize your program_id_DB string using memset before using it with strncpy.
strncpy is a bitch!
It doesn't terminate the string. You need to terminate the string yourself.
if (pos) {
strncpy(program_id_DB,data_DB, pos-data_DB);
program_id_DB[pos - data_DB] = 0;
}
Conversely, if the source string is shorter than the count, strncpy fills the remainder of the destination with zeros.
strncpy(dst, src, 1000); /* always writes 1000 bytes, whether it needs to or not */
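Since the length of the prefix is already known from strchr, one alternative is to skip strncpy altogether and use memcpy with an explicit size check. A minimal sketch under that assumption; copy_until is a made-up helper name:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies the part of src before the first delim
 * into dst and terminates it. Returns 1 on success, 0 if delim is not
 * found or the prefix (plus '\0') does not fit into dstsize bytes. */
int copy_until(char *dst, size_t dstsize, const char *src, char delim)
{
    assert(dst != NULL && src != NULL);

    const char *pos = strchr(src, delim);
    if (pos == NULL)
        return 0;

    size_t n = (size_t)(pos - src);
    if (n >= dstsize)
        return 0;          /* no room for the prefix and its terminator */

    memcpy(dst, src, n);
    dst[n] = '\0';
    return 1;
}
```

Called as copy_until(program_id_DB, sizeof program_id_DB, data_DB, '-'), it leaves "FUPOPER" in program_id_DB, and the return value tells the caller when the dash is missing or the destination is too small.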
