strtok() appends some character to my string - c

I'm using strtok() to parse a string I get from fgets() that is separated by the ~ character
e.g. data_1~data_2
Here's a sample of my code:
fgets(buff, LINELEN, stdin);
pch = strtok(buff, " ~\n");
//do stuff
pch = strtok(NULL, " ~\n");
//do stuff
The first instance of strtok breaks it apart fine, I get data_1 as is, and strlen(data_1) provides the correct length of it. However, the second instance of strtok returns the string, with something appended to it.
With an input of andrewjohn ~ jamessmith, I printed out each character and the index, and I get this output:
a0
n1
d2
r3
e4
w5
j6
o7
h8
n9
j0
a1
m2
e3
s4
s5
m6
i7
t8
h9
10
What is that "11th" value corresponding to?
EDIT:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    char buff[100];
    char *pch;
    int i;
    fgets(buff, 100, stdin);
    pch = strtok(buff, " ~\n");
    printf("FIRST NAME\n");
    for (i = 0; i < (int)strlen(pch); i++)
    {
        printf("%c %d %d\n", *(pch+i), *(pch+i), i);
    }
    printf("SECOND NAME\n");
    pch = strtok(NULL, " ~\n");
    for (i = 0; i < (int)strlen(pch); i++)
    {
        printf("%c %d %d\n", *(pch+i), *(pch+i), i);
    }
    return 0;
}
I ran it by:
cat sample.in | ./myfile
Where sample.in had
andrewjohn ~ johnsmith
Output was:
FIRST NAME
a 97 0
n 110 1
d 100 2
r 114 3
e 101 4
w 119 5
j 106 6
o 111 7
h 104 8
n 110 9
SECOND NAME
j 106 0
o 111 1
h 104 2
n 110 3
s 115 4
m 109 5
i 105 6
t 116 7
h 104 8
13 9
So the last character is ASCII value 13, which says it's a carriage return ('\r'). Why is this coming up?

Based on your edit, the input line ends in \r\n. As a workaround you could just add \r to the list of delimiters you pass to strtok.
However, this should be investigated further. \r\n is the line ending in a Windows file, but on Windows stdin is a text stream, so \r\n in a file would normally be converted to just \n in the fgets result.
Are you perhaps piping in a file that contains something unusual like \r\r\n? Try hex-dumping the file you're piping in to check this.
Another possible explanation is that your Cygwin (or similar) environment has been configured not to translate line endings in a file that is piped in.
edit: Joachim's suggestion is much more likely: you are using a \r\n file on a non-Windows system. If this is the case, you can fix it by running dos2unix on the file. But in accordance with the principle "accept everything, generate correctly", it would be useful for your program to handle this file.
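A minimal sketch of that workaround, assuming the fields are separated by spaces and ~ as in the question. The helper name split_fields is made up for illustration; the only real change is that \r is part of the delimiter set, so a Windows line ending never ends up inside a token.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `line` in place and stores up to `max`
 * token pointers in `out`. '\r' is listed as a delimiter so that a
 * trailing "\r\n" is dropped together with the newline. */
static size_t split_fields(char *line, char *out[], size_t max)
{
    size_t n = 0;
    char *tok = strtok(line, " ~\r\n");

    while (tok != NULL && n < max) {
        out[n++] = tok;
        tok = strtok(NULL, " ~\r\n");
    }
    return n;
}
```

Called on a buffer holding "andrewjohn ~ johnsmith\r\n", this should yield two tokens, and the second one has length 9 rather than 10.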

Related

How can fscanf(), in C, be used to read a .gro file?

I am trying to read the following gro file via a C code.
FJP in Pol Water in water t= 0.00000 step= 0
16
1FJP P 1 5.346 7.418 0.319
2FJP P 2 5.151 7.405 0.499
3FJP P 3 5.260 7.178 0.428
4FJP P 4 5.159 6.961 0.342
5FJP P 5 5.355 6.909 0.220
6FJP P 6 5.169 6.824 0.043
7FJP P 7 5.068 6.669 11.454
8FJP P 8 4.919 6.861 11.482
9FJP P 9 4.835 7.075 11.364
10FJP P 10 4.738 6.987 11.197
11FJP P 11 4.847 7.115 10.993
12FJP P 12 4.642 7.126 10.870
13FJP P 13 4.680 6.940 10.674
14FJP P 14 4.521 7.052 10.545
15FJP P 15 4.321 6.973 10.513
16FJP P 16 4.315 6.728 10.516
11.56681 11.56681 11.56681
My code is:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, const char * argv[])
{
char input_file[]="file.gro";
FILE *input;
char *myfile=malloc(sizeof(char)*80);
sprintf(myfile,"%s",input_file); //the .gro file being read in
input=fopen(myfile,"r");
double dummy1,dummy6,dummy7,dummy8,dummy9,dummy10,dummy11;
int dummy2,dummy3,dummy4,dummy5;
int lines=0;
while (fscanf(input,"FJP in Pol Water in water t= %lf step= %d",&dummy1,&dummy2)==2
||fscanf(input," %d\n",&dummy3)==1
||fscanf(input," %dFJP P %d %lf %lf %lf\n",
&dummy4,&dummy5,&dummy6,&dummy7,&dummy8)==5
||fscanf(input," %lf %lf %lf\n",&dummy9,&dummy10,&dummy11)==3)
{
printf("%lf %d\n",dummy1,dummy2);
printf("%d\n",dummy3);
printf("%d %d\n",dummy4,dummy5);
printf("%lf %lf %lf\n",dummy6,dummy7,dummy8);
printf("%lf %lf %lf\n",dummy9,dummy10,dummy11);
lines=lines+1;
}
printf("lines=%d\n",lines);
fclose(input);
}
The problem is the values printed by the various dummy variables do not match what is in the file. Also, the number of lines being read is 3 as opposed to 19, which matches the file. I am not certain what is incorrect about my fscanf() statements to read this file. Any help for this problem would be much appreciated.
Your main problem is that you are assuming scanf is better than it is.
Scanf will read and parse as many arguments as it can, and then give up. It does not rewind to where that scanf call started. It also treats spaces, newlines and tabs in the format as simply "skip all whitespace".
So the call fscanf(input," %d\n",&dummy3) will also match the start of the main lines, e.g. 1FJP.
It will read the digit 1 OK into dummy3 and report success, leaving FJP P ... unread.
On the next pass the header pattern consumes the FJP and then fails on the P, and all the other rules get stuck too, because none of them expect a P or any string first.
If you want to do it this way, you will just have to apply the scanf statements more intelligently as and when they are expected.
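To see the "parse as much as possible, then give up" behaviour in isolation, here is a small sketch that applies the " %d" pattern to a string with sscanf and uses %n to report how far it got. The helper name try_count_pattern is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: applies the " %d" pattern to `s`. Returns sscanf's
 * result and stores the number of characters consumed in *used
 * (left at 0 if the %d conversion failed). */
static int try_count_pattern(const char *s, int *value, int *used)
{
    *used = 0;
    return sscanf(s, " %d%n", value, used);
}
```

On "1FJP P 1 5.346" the call returns 1 with value 1 after consuming a single character, which is why the atom-count pattern "succeeds" on a data line. On the remainder "P 1 5.346" it returns 0 and consumes nothing.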
The problem is that you try to read and match the header repeatedly, before each line read (in the while loop). You should read the header once, then read the lines. You also only need to skip any given piece of whitespace once. So you end up with code like:
if (fscanf(input, "FJP in Pol Water in water t=%lf step=%d%d", &dummy1, &dummy2, &dummy3) != 3) {
    fprintf(stderr, "Invalid header\n");
    exit(1);
}
while (fscanf(input, "%dFJP P%d%lf%lf%lf", &dummy4, &dummy5, &dummy6, &dummy7, &dummy8) == 5) {
    /* ... process one line of the table ... */
}
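Putting those pieces together, a possible sketch of a complete reader follows. The struct and function names are hypothetical, and the literal title text and the "FJP P" residue/atom names are copied from the sample file, so this only works for files that look like the one in the question.

```c
#include <stdio.h>

struct gro_atom { int resnum; int index; double x, y, z; };

/* Hypothetical helper: returns the number of atoms read, or -1 if the
 * file does not match the layout of the sample in the question. */
static int read_gro(FILE *fp, struct gro_atom *atoms, int max_atoms, double box[3])
{
    double t;
    int step, natoms;

    /* Header: title line plus the atom count, read exactly once. */
    if (fscanf(fp, "FJP in Pol Water in water t=%lf step=%d%d",
               &t, &step, &natoms) != 3)
        return -1;
    if (natoms < 0 || natoms > max_atoms)
        return -1;

    /* Body: exactly natoms rows, so the box line is never touched here. */
    for (int i = 0; i < natoms; i++) {
        struct gro_atom *a = &atoms[i];
        if (fscanf(fp, "%dFJP P%d%lf%lf%lf",
                   &a->resnum, &a->index, &a->x, &a->y, &a->z) != 5)
            return -1;
    }

    /* Footer: the three box dimensions. */
    if (fscanf(fp, "%lf%lf%lf", &box[0], &box[1], &box[2]) != 3)
        return -1;
    return natoms;
}
```

The row loop is driven by the count from the header rather than running until fscanf fails. A row pattern that starts with %d would otherwise consume the leading 11 of the box line before giving up, and the box values could then no longer be read correctly.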

Reading data from a text file line by line into arrays using strtok in C

I'm currently trying to read data from a text file line by line, using strtok with a space as the delimiter, and save the values into different arrays. I'm using the FatFs library to read the file from an SD card. At the moment I'm only trying to read the first 2 elements from each line.
My text file looks like this:
223 895 200 200 87 700 700 700
222 895 200 200 87 700 700 700
221 895 200 200 87 700 700 700
222 895 200 200 87 700 700 700
My current code is something like this:
void sd_card_read()
{
char buffer[30];
char buffer2[10];
char buffer3[10];
int i=0;
int k=0;
int l=0;
int16 temp_array[500];
int16 hum_array[500];
char *p;
FIL fileO;
uint8 resultF;
resultF = f_open(&fileO, "dados.txt", FA_READ);
if(resultF == FR_OK)
{
UART_UartPutString("Reading...");
UART_UartPutString("\n\r");
while(f_gets(buffer, sizeof(buffer), &fileO))
{
p = strtok(buffer, " ");
temp_array[i] = atoi(p);
UART_UartPutString(p);
UART_UartPutString("\r\n");
p = strtok(NULL, " ");
hum_array[i] = atoi(p);
UART_UartPutString(p);
UART_UartPutString("\r\n");
i++;
}
UART_UartPutString("Done reading");
resultF = f_close(&fileO);
}
UART_UartPutString("Printing");
UART_UartPutString("\r\n");
for (k = 0; k < 10; k++)
{
itoa(temp_array[k], buffer2, 10);
UART_UartPutString(buffer2);
UART_UartPutString("\r\n");
}
for (l = 0; l < 10; l++)
{
itoa(hum_array[l], buffer3, 10);
UART_UartPutString(buffer3);
UART_UartPutString("\r\n");
}
}
The current output is this:
223
0
222
0
etc..
895
0
895
0
etc..
After each value that is read correctly, the next position in both arrays is filled with 0, which is not what I want. It's probably something basic, but I can't see what is wrong.
Any help is valuable!
If we take the first line of the file
223 895 200 200 87 700 700 700
Including the spaces and the newline (assuming a single '\n'), that line is 31 characters long. And since strings in C need to be terminated by '\0', the line requires at least 32 characters (if f_gets works like the standard fgets function and stores the newline).
The buffer you read into holds only 30 characters, which means only 29 characters of the line are read before the terminator is added. So the first call reads only
223 895 200 200 87 700 700 70
The next time you call f_gets the function will read the remaining
0
You need to increase the size of the buffer to be able to fit all of the line. With the current data it needs to be at least 32 characters. But be careful since an extra character in one of the lines will give you the same problem again.
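To guard against that, the reading loop can check whether each buffer really holds a whole line. A minimal sketch, assuming f_gets fills the buffer the same way the standard fgets does (at most cap - 1 characters, newline kept, then '\0'); the helper name line_was_truncated is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `buf` (filled by an fgets-style call with capacity `cap`)
 * holds only part of a line: the buffer is full and does not end in '\n'. */
static int line_was_truncated(const char *buf, size_t cap)
{
    size_t len = strlen(buf);

    return cap > 1 && len == cap - 1 && buf[len - 1] != '\n';
}
```

With the 30-byte buffer from the question this reports truncation for the first line of the file; with a 32-byte buffer it does not. A final line without a trailing newline is not flagged as long as it is shorter than the buffer.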

Having issues iterating through machine code

I'm attempting to recreate the wc command in C and having issues getting the proper number of words in any file containing machine code (core files or compiled C). The number of logged words always comes up around 90% short of the amount returned by wc.
For reference here is the project info
Compile statement
gcc -ggdb wordCount.c -o wordCount -std=c99
wordCount.c
/*
* Author(s) - Colin McGrath
* Description - Lab 3 - WC LINUX
* Date - January 28, 2015
*/
#include<stdio.h>
#include<string.h>
#include<dirent.h>
#include<sys/stat.h>
#include<ctype.h>
struct counterStruct {
int newlines;
int words;
int bt;
};
typedef struct counterStruct ct;
ct totals = {0};
struct stat st;
void wc(ct counter, char *arg)
{
printf("%6lu %6lu %6lu %s\n", counter.newlines, counter.words, counter.bt, arg);
}
void process(char *arg)
{
lstat(arg, &st);
if (S_ISDIR(st.st_mode))
{
char message[4056] = "wc: ";
strcat(message, arg);
strcat(message, ": Is a directory\n");
printf(message);
ct counter = {0};
wc(counter, arg);
}
else if (S_ISREG(st.st_mode))
{
FILE *file;
file = fopen(arg, "r");
ct currentCount = {0};
if (file != NULL)
{
char holder[65536];
while (fgets(holder, 65536, file) != NULL)
{
totals.newlines++;
currentCount.newlines++;
int c = 0;
for (int i=0; i<strlen(holder); i++)
{
if (isspace(holder[i]))
{
if (c != 0)
{
totals.words++;
currentCount.words++;
c = 0;
}
}
else
c = 1;
}
}
}
currentCount.bt = st.st_size;
totals.bt = totals.bt + st.st_size;
wc(currentCount, arg);
}
}
int main(int argc, char *argv[])
{
if (argc > 1)
{
for (int i=1; i<argc; i++)
{
//printf("%s\n", argv[i]);
process(argv[i]);
}
}
wc(totals, "total");
return 0;
}
Sample wc output:
135 742 360448 /home/cpmcgrat/53/labs/lab-2/core.22321
231 1189 192512 /home/cpmcgrat/53/labs/lab-2/core.26554
5372 40960 365441 /home/cpmcgrat/53/labs/lab-2/file
24 224 12494 /home/cpmcgrat/53/labs/lab-2/frequency
45 116 869 /home/cpmcgrat/53/labs/lab-2/frequency.c
5372 40960 365441 /home/cpmcgrat/53/labs/lab-2/lineIn
12 50 1013 /home/cpmcgrat/53/labs/lab-2/lineIn2
0 0 0 /home/cpmcgrat/53/labs/lab-2/lineOut
39 247 11225 /home/cpmcgrat/53/labs/lab-2/parseURL
138 318 2151 /home/cpmcgrat/53/labs/lab-2/parseURL.c
41 230 10942 /home/cpmcgrat/53/labs/lab-2/roman
66 162 1164 /home/cpmcgrat/53/labs/lab-2/roman.c
13 13 83 /home/cpmcgrat/53/labs/lab-2/romanIn
13 39 169 /home/cpmcgrat/53/labs/lab-2/romanOut
7 6 287 /home/cpmcgrat/53/labs/lab-2/URLs
11508 85256 1324239 total
Sample rebuild output (./wordCount):
139 76 360448 /home/cpmcgrat/53/labs/lab-2/core.22321
233 493 192512 /home/cpmcgrat/53/labs/lab-2/core.26554
5372 40960 365441 /home/cpmcgrat/53/labs/lab-2/file
25 3 12494 /home/cpmcgrat/53/labs/lab-2/frequency
45 116 869 /home/cpmcgrat/53/labs/lab-2/frequency.c
5372 40960 365441 /home/cpmcgrat/53/labs/lab-2/lineIn
12 50 1013 /home/cpmcgrat/53/labs/lab-2/lineIn2
0 0 0 /home/cpmcgrat/53/labs/lab-2/lineOut
40 6 11225 /home/cpmcgrat/53/labs/lab-2/parseURL
138 318 2151 /home/cpmcgrat/53/labs/lab-2/parseURL.c
42 3 10942 /home/cpmcgrat/53/labs/lab-2/roman
66 162 1164 /home/cpmcgrat/53/labs/lab-2/roman.c
13 13 83 /home/cpmcgrat/53/labs/lab-2/romanIn
13 39 169 /home/cpmcgrat/53/labs/lab-2/romanOut
7 6 287 /home/cpmcgrat/53/labs/lab-2/URLs
11517 83205 1324239 total
Notice the difference in the word count (second int) from the first two files (core files) as well as the roman file and parseURL files (machine code, no extension).
C strings do not store their length. They are terminated by a single NUL (0) byte.
Consequently, strlen needs to scan the entire string, character by character, until it reaches the NUL. That makes this:
for (int i=0; i<strlen(holder); i++)
desperately inefficient: for every character in holder, it needs to count all the characters in holder in order to test whether i is still in range. That transforms a simple linear Θ(N) algorithm into a Θ(N²) cycle-burner.
But in this case, it also produces the wrong result, since binary files typically include lots of NUL characters. Since strlen will actually tell you where the first NUL is, rather than how long the "line" is, you'll end up skipping a lot of bytes in the file. (On the bright side, that makes the scan quadratically faster, but computing the wrong result more rapidly is not really a win.)
You cannot use fgets to read binary files because the fgets interface doesn't tell you how much it read. You can use the POSIX 2008 getline interface instead, or you can do binary input with fread, which is more efficient but will force you to count newlines yourself. (Not the worst thing in the world; you seem to be getting that count wrong, too.)
Or, of course, you could read the file one character at a time with fgetc. For a school exercise, that's not a bad solution; the resulting code is easy to write and understand, and typical implementations of fgetc are more efficient than the FUD would indicate.
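A minimal sketch of the fgetc approach. The struct and function names are hypothetical, and a "word" here is simply a run of bytes for which isspace is false, which may not match the system wc exactly for every locale or binary file.

```c
#include <ctype.h>
#include <stdio.h>

struct counts { unsigned long lines, words, bytes; };

/* Counts newlines, words and bytes by reading one byte at a time.
 * NUL bytes are ordinary non-space bytes here, so they do not cut
 * the scan short the way strlen does. */
static struct counts count_stream(FILE *fp)
{
    struct counts c = {0, 0, 0};
    int ch;            /* int, so that EOF is distinguishable from a byte */
    int in_word = 0;

    while ((ch = fgetc(fp)) != EOF) {
        c.bytes++;
        if (ch == '\n')
            c.lines++;
        if (isspace(ch)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;   /* first byte of a new word */
            c.words++;
        }
    }
    return c;
}
```

For binary files it is safer to open the file with fopen(path, "rb") before passing it in. Because every byte is examined exactly once, no line buffer and no strlen call are needed.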

Using strcat to append spaces. Compiles but overwrites string

The language I am working in is C.
I am trying to use a mix of built-in C string functions to take a list of space-separated tokens and "convert" it into a list of tokens in which quoted text stays together as one token.
A string like
echo "Hello 1 2 3 4" test test2
gets converted to
[echo] ["Hello] [1] [2] [3] [4"] [test] [test2]
I then use my code (at bottom) to attempt to convert it into something like
[echo] [Hello 1 2 3 4] [test] [test2]
For some reason the second 'token' in the quoted statement gets overwritten.
Here's a snippet of the code that runs over the token list and converts it to the new one.
for (int i = 0; i < counter; i++) {
    if ( (strstr(tokenized[i],"\"") != NULL) && (inQuotes == 0)) {
        inQuotes = 1;
        tokenizedQuoted[quoteCounter] = tokenized[i];
        strcat(tokenizedQuoted[quoteCounter]," ");
    } else if ( (strstr(tokenized[i],"\"") != NULL) && (inQuotes == 1)) {
        inQuotes = 0;
        strcat(tokenizedQuoted[quoteCounter],tokenized[i]);
        quoteCounter++;
    } else {
        if (inQuotes == 0) {
            tokenizedQuoted[quoteCounter] = tokenized[i];
            quoteCounter++;
        } else if (inQuotes == 1) {
            strcat(tokenizedQuoted[quoteCounter], tokenized[i]);
            strcat(tokenizedQuoted[quoteCounter], " ");
        }
    }

}
In short, appending a space to the string a char * points to means that the memory it points to needs more bytes. Since you do not provide them, you are overwriting the first byte of the following "word" with \0, so the char * to that word is interpreted as the empty string. Note that writing to a location that has not been reserved is undefined behavior, so really ANYTHING could happen (from a segmentation fault to "correct" results with no errors).
Use malloc to create a new buffer for the expanded result with enough bytes for it (do not forget to free the old buffers if they were malloc'd).
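One way this could look, as a sketch: compute the total length first, allocate once, then copy the tokens in. The helper name join_tokens and its first/last index parameters are hypothetical; the caller is assumed to pass first <= last and to free the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc'd string containing
 * tokens[first..last] joined by single spaces, or NULL on allocation
 * failure. The original tokens are not modified. */
static char *join_tokens(char *const tokens[], size_t first, size_t last)
{
    size_t total = 0;

    /* Each token needs its length plus one byte: a space, or the
     * terminating '\0' after the last token. */
    for (size_t i = first; i <= last; i++)
        total += strlen(tokens[i]) + 1;

    char *out = malloc(total);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (size_t i = first; i <= last; i++) {
        size_t n = strlen(tokens[i]);
        memcpy(p, tokens[i], n);
        p += n;
        *p++ = (i < last) ? ' ' : '\0';
    }
    return out;
}
```

In the loop from the question, the quoted run ["Hello] ... [4"] would be joined into one fresh buffer and that pointer stored in tokenizedQuoted, instead of calling strcat on pointers into the original token memory.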

Overwriting lines in file in C

I'm doing a project on filesystems for a university operating systems course. My C program should simulate a simple filesystem in a human-readable file, so the file is line-based and each line is a "sector". I've learned that lines must be the same length to be overwritten in place, so I'll pad them with ASCII zeroes up to the end of the line and leave a certain number of lines of ASCII zeroes that can be filled later.
Now I'm making a test program to see if it works like I want it to, but it doesn't. The critical part of my code:
file = fopen("irasproba_tesztfajl.txt", "r+"); //it is previously loaded with 10 copies of the line I'll print later in reverse order
/* this finds the 3rd line */
int count = 0; //how much have we gone yet?
char c;
while(count != 2) {
if((c = fgetc(file)) == '\n') count++;
}
fflush(file);
fprintf(file, "- . , M N B V C X Y Í Ű Á É L K J H G F D S A Ú Ő P O I U Z T R E W Q Ó Ü Ö 9 8 7 6 5 4 3 2 1 0\n");
fflush(file);
fclose(file);
Now it does nothing, the file stays the same. What could be the problem?
Thank you.
From here,
"When a file is opened with a "+" option, you may both read and write on it. However, you may not perform an output operation immediately after an input operation; you must perform an intervening "rewind" or "fseek". Similarly, you may not perform an input operation immediately after an output operation; you must perform an intervening "rewind" or "fseek"."
The fflush calls do not satisfy that requirement: between the reads and the write you need a positioning call such as fseek, even if it only seeks to the current position. This is how I implemented it (it could probably be better):
/* this finds the 3rd line */
int count = 0; // how many newlines have we passed so far?
int c;         // int rather than char, so EOF can be detected
long position_in_file;
while (count != 2 && (c = fgetc(file)) != EOF) {
    if (c == '\n') count++;
}
// Store the position
position_in_file = ftell(file);
// Reposition it
fseek(file, position_in_file, SEEK_SET); // Or fseek(file,ftell(file),SEEK_SET);
fprintf(file, "- . , M N B V C X Y Í Ű Á É L K J H G F D S A Ú Ő P O I U Z T R E W Q Ó Ü Ö 9 8 7 6 5 4 3 2 1 0\n");
fclose(file);
Also, as has been commented, you should check if your file has been opened successfully, i.e. before reading/writing to file, check:
file = fopen("irasproba_tesztfajl.txt", "r+");
if(file == NULL)
{
printf("Unable to open file!");
exit(1);
}
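The same idea wrapped into a reusable function, as a sketch. The name overwrite_line is hypothetical, and it assumes the replacement text has exactly the same length as the line it replaces (including the newline), as the question's fixed-length "sectors" do.

```c
#include <stdio.h>

/* Hypothetical helper: overwrites the line with 0-based index
 * `line_index` in a stream opened for update ("r+"). `text` must be
 * exactly as long as the old line. Returns 0 on success, -1 on error. */
static int overwrite_line(FILE *fp, int line_index, const char *text)
{
    int c;
    int seen = 0;

    rewind(fp);
    /* Skip `line_index` newlines to reach the start of the target line. */
    while (seen < line_index && (c = fgetc(fp)) != EOF) {
        if (c == '\n')
            seen++;
    }
    if (seen < line_index)
        return -1;                      /* file has too few lines */

    /* Positioning call required between reading and writing. */
    if (fseek(fp, 0L, SEEK_CUR) != 0)
        return -1;
    if (fputs(text, fp) == EOF)
        return -1;
    return fflush(fp) == 0 ? 0 : -1;
}
```

Calling overwrite_line(file, 2, new_line) would replace the 3rd line. If the new text were shorter or longer than the old line, the remainder of the old line would be left in place or the following line would be partly overwritten.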
