C program doesn't recognize '\n' in file exported from MATLAB

I have a matrix G in MATLAB that I have printed into a text file using:
file = fopen('G.dat','w');
fprintf(file, [repmat('%f\t', 1, size(G, 2)) '\n'], G');
fclose(file);
The dimension of this matrix is 100 x 500. If I count rows and columns using awk, for instance, using
cat G.dat | awk '{print NF}END{print NR}'
I see that the dimensions correspond to the original one.
Now, I want to read this file, G.dat, from a C program that counts the columns of the first row just to understand the columns' dimension as in:
while (!feof(file) && (fscanf(file, "%lf%c", &k, &c) == 2) ) {
Ng++;
if (c == '\n')
break;
}
Unfortunately it gives me Ng = 50000 and it doesn't recognize any of the '\n'.
However, if I create the text file just by copying and pasting the data, it works. Can you explain why? Thanks!

Are you working in Windows? Try opening your output file in text mode:
file = fopen('G.dat','wt');
This will automatically insert a carriage return before each newline when writing to the file.

The code's approach is too fragile to "count the columns of the first row just to understand the columns' dimension". fscanf(file, "%lf%c"... is too sensitive to variations in white-space delimiters and end-of-line characters to detect the '\n' reliably.
I recommend examining the white space explicitly to determine the width:
// return 0 on success, 1 on error
int GetWidth(FILE *file, size_t *width) {
*width = 0;
for (;;) {
int ch;
while (isspace(ch = fgetc(file))) {
if (ch == '\n') return 0;
}
if (ch == EOF) return 0;
ungetc(ch, file);
double d;
if (fscanf(file, "%lf", &d) != 1) {
return 1; // unexpected non convertible text
}
(*width)++;
}
}
// Sample usage
size_t width;
if (GetWidth(file, &width)) return 1;
// read entire file
rewind(file);
for (size_t line = 0; foo(); line++) {
for (size_t column = 0; column<width; column++) {
double d;
if (fscanf(file, "%lf", &d) != 1) {
break; // EOF, unexpected non convertible text or input error
}
}
...
}

MATLAB writes each row as
%f\t%f ... %f\t\n
which is a problem because of the trailing tab before the newline. I have used
dlmwrite('G.dat', G, '\t');
and it is fine!

If I understand the MATLAB syntax correctly, this expands to a format string like %f\t%f\t%f\t\n for a matrix with three columns, and MATLAB reuses it for every row. Note the extra \t at the end of each line. If this assumption is correct, the last fscanf() call in a line assigns that trailing \t to c. The next fscanf() call then silently consumes the \n, because %lf skips leading white space, so c never receives it.
I'd suggest using fgets() to read each line and then looping over the fields with strtok(), converting the values with atof(), e.g.
char buf[8192];
if (fgets(buf, sizeof buf, file))
{
if (strtok(buf, "\t\n")) /* strtok() takes a string of delimiters, not a char */
{
++Ng;
while (strtok(0, "\t\n")) ++Ng;
}
}
else
{
/* error reading ... */
}
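A minimal sketch of counting the fields of the first row without depending on which white-space character follows each number. It assumes the whole row fits in the buffer, and count_first_row is a hypothetical helper name, not part of the original code. strtod() skips leading white space itself, so tabs, spaces and the trailing tab before the newline are all treated alike:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: count the numeric fields on the first line of `file`.
   Returns -1 if no line could be read. Assumes the line fits in buf. */
static int count_first_row(FILE *file)
{
    char buf[8192];
    if (!fgets(buf, sizeof buf, file))
        return -1;

    int n = 0;
    char *p = buf;
    for (;;) {
        char *end;
        (void)strtod(p, &end); /* skips leading white space, then converts */
        if (end == p)          /* nothing converted: end of line or junk   */
            break;
        n++;
        p = end;
    }
    return n;
}
```

Called right after fopen(), this would replace the fscanf() loop from the question; rewind(file) afterwards puts the stream back at the start for the real read.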

Related

C: Time-efficient way to get value of second column in text file

I am parsing a simple text file with two columns in C.
The two columns are separated by a tab. While I need the whole line in a later stage I also have to extract the value in the second column.
My implementation of this part is so far (reading a gzipped file):
while (! gzeof(fp)) {
// here I keep the whole line since I need it later (can I do this also faster?)
strcpy(line_save, line);
// get the value in the second column (first removing the newline char.):
line[strcspn(line, "\n")] = 0;
linkage = strtok(line,"\t");
linkage = strtok(NULL,"\t"); // here I have the value in the second col. as the result
// do stuff
gzgets(fp, line, LL);
}
What is a more time-efficient way to do this?
I am reading a gzipped file. gzeof() checks if EOF is reached and gzgets() reads one line.
I am not looking for an overly advanced solution here, but I am interested mainly in the "low-hanging fruits". However, if you can present more advanced solutions I do not mind.
Try the following code. By the way, you probably do not need to copy line into line_save, as this code does not destroy the original line. If that is the case, you can break out of the inner loop once t2 has been set.
while (! gzeof(fp)) {
int i, t1, t2;
t1 = t2 = -1;
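for(i=0; line[i]!=0; i++) {
line_save[i] = line[i];
if (t1 < 0) {
if (line[i] == '\t') t1 = i; // first tab: end of column one
}
else if (t2 < 0 && (line[i] == '\t' || line[i] == '\n')) t2 = i; // end of column two
}
line_save[i] = 0;
if (t1 >= 0) {
if (t2 < 0) t2 = i; // no newline: column two runs to the end of the string
char saved = line[t2];
line[t2] = 0;
linkage = &line[t1+1];
// do what you need with 'linkage'
// reconstruct the original line
line[t2] = saved;
}
// do other stuff with 'line'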
gzgets(fp, line, LL);
}
I'm assuming that gzgets() behaves in a similar way to fgets():
ZEXTERN char * ZEXPORT gzgets OF((gzFile file, char *buf, int len));
Reads bytes from the compressed file until len-1 characters are read, or a newline character is read and transferred to buf, or an end-of-file condition is encountered. If any characters are read or if len == 1, the string is terminated with a null character. If no characters are read due to an end-of-file or len < 1, then the buffer is left untouched.
gzgets returns buf which is a null-terminated string, or it returns NULL for end-of-file or in case of error. If there was an error, the contents at buf are indeterminate.
char line[128]; // Extend as you see fit
while (gzgets(gzfile, line, sizeof(line))) {
line[strcspn(line, "\n")] = '\0';
char col1[64], col2[64];
if (sscanf(line, " %63s\t%63[^\n]", col1, col2) != 2) {
// Error while parsing the line
puts("Error");
}
// Testing
printf("col1: '%s'\ncol2: '%s'\n", col1, col2);
// And line is untouched.
}
Edit: The below version should run slightly faster than the one above:
Removed the call for strcspn()
The for-loop stops when a \t is met, so this avoids scanning the entire string.
char line[128]; // Extend as you see fit
while (gzgets(gzfile, line, sizeof(line))) {
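char col1[128] = "", col2[128] = ""; // same size as line, so the copies cannot overflow
for (char *p = line; *p != '\0' && *p != '\n'; ++p) {
if (*p == '\t') {
memcpy(col1, line, p - line);
col1[p - line] = '\0'; // terminate the first column explicitly
strcpy(col2, p+1); // note: col2 keeps the trailing '\n'
break;
}
}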
// Testing
printf("col1: '%s'\ncol2: '%s'\n", col1, col2);
// And line is untouched.
}
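If only the position of the second column is needed, a pointer plus a length avoids both the copy and any write into line. A rough sketch using the standard strchr() and strcspn(); second_column is a made-up name, and it assumes the columns are separated by a single tab:

```c
#include <string.h>

/* Hypothetical helper: locate the second tab-separated column of `line`
   without modifying it. Returns a pointer to the column's first character
   and stores its length in *len, or returns NULL if there is no tab. */
static const char *second_column(const char *line, size_t *len)
{
    const char *tab = strchr(line, '\t');
    if (tab == NULL)
        return NULL;
    const char *start = tab + 1;
    *len = strcspn(start, "\t\n"); /* stop at the next tab or the newline */
    return start;
}
```

Because the result is not null-terminated, it has to be used with length-aware calls such as printf("%.*s", (int)len, col) or memcpy().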

C Reading a file of digits separated by commas

I am trying to read in a file that contains digits separated by commas and store them in an array without the commas present.
For example: processes.txt contains
0,1,3
1,0,5
2,9,8
3,10,6
And an array called numbers should look like:
0 1 3 1 0 5 2 9 8 3 10 6
The code I had so far is:
FILE *fp1;
char c; //declaration of characters
fp1=fopen(argv[1],"r"); //opening the file
int list[300];
c=fgetc(fp1); //taking character from fp1 pointer or file
int i=0,number,num=0;
while(c!=EOF){ //iterate until end of file
if (isdigit(c)){ //if it is digit
sscanf(&c,"%d",&number); //changing character to number (c)
num=(num*10)+number;
}
else if (c==',' || c=='\n') { //if it is new line or ,then it will store the number in list
list[i]=num;
num=0;
i++;
}
c=fgetc(fp1);
}
But this is having problems if it is a double digit. Does anyone have a better solution? Thank you!
For the data shown with no space before the commas, you could simply use:
while (i < 300 && fscanf(fp1, "%d,", &num) == 1)
list[i++] = num;
This will read the comma after the number if there is one, silently ignoring when there isn't one. If there might be white space before the commas in the data, add a blank before the comma in the format string. The test on i prevents you writing outside the bounds of the list array. The ++ operator comes into its own here.
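Wrapped up as a function, the same loop might look like the sketch below. read_ints and its parameters are illustrative names only; the format string is the one from the answer above.

```c
#include <stdio.h>

/* Hypothetical helper: read comma- or newline-separated integers from fp
   into list, storing at most cap values. Returns how many were stored. */
static size_t read_ints(FILE *fp, int *list, size_t cap)
{
    size_t n = 0;
    int num;
    /* %d skips leading white space (including '\n'); the ',' in the format
       is consumed when present and ignored when absent. */
    while (n < cap && fscanf(fp, "%d,", &num) == 1)
        list[n++] = num;
    return n;
}
```

Testing n < cap before calling fscanf() means no value is consumed from the file once the array is full.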
First, fgetc returns an int, so c needs to be an int.
Other than that, I would use a slightly different approach. I admit that it is slightly overcomplicated. However, this approach may be usable if you have several different types of fields that require different actions, like a parser. For your specific problem, I recommend Jonathan Leffler's answer.
int c=fgetc(f);
while(c!=EOF && i<300) {
if(isdigit(c)) {
fseek(f, -1, SEEK_CUR);
if(fscanf(f, "%d", &list[i++]) != 1) {
// Handle error
}
}
c=fgetc(f);
}
Here I don't care about commas and newlines. I take ANYTHING other than a digit as a separator. What I do is basically this:
read next byte
if byte is digit:
back one byte in the file
read number, regardless of length
else continue
The added condition i<300 is there for safety, so the loop cannot write past the end of list. If you really want to check that nothing other than commas and newlines appears between the numbers (I did not get the impression that you found that important), you could easily add an else if (c == ... to handle the error.
Note that you should always check the return value for functions like sscanf, fscanf, scanf etc. Actually, you should also do that for fseek. In this situation it's not as important since this code is very unlikely to fail for that reason, so I left it out for readability. But in production code you SHOULD check it.
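A sketch of the same "peek, then re-read" idea with ungetc() swapped in for fseek(f, -1, SEEK_CUR): the C standard guarantees one character of pushback, and the call works on streams that cannot seek. read_numbers is a hypothetical name, and, as in the loop above, a minus sign is treated as just another separator.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: collect every run of digits in f as an int,
   treating any other character as a separator. Stores at most cap values. */
static size_t read_numbers(FILE *f, int *list, size_t cap)
{
    size_t n = 0;
    int c;
    while (n < cap && (c = fgetc(f)) != EOF) {
        if (!isdigit(c))
            continue;             /* separator: skip it                  */
        if (ungetc(c, f) == EOF)  /* push the digit back onto the stream */
            break;
        if (fscanf(f, "%d", &list[n]) != 1)
            break;                /* conversion failed                   */
        n++;                      /* count only successful conversions   */
    }
    return n;
}
```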
My solution is to read the whole line first and then parse it with strtok_r with comma as a delimiter. If you want portable code you should use strtok instead.
A naive implementation of readline would be something like this:
static char *readline(FILE *file)
{
char *line = malloc(sizeof(char));
int index = 0;
int c = fgetc(file);
if (c == EOF) {
free(line);
return NULL;
}
while (c != EOF && c != '\n') {
line[index++] = c;
char *l = realloc(line, (index + 1) * sizeof(char));
if (l == NULL) {
free(line);
return NULL;
}
line = l;
c = fgetc(file);
}
line[index] = '\0';
return line;
}
Then you just need to parse the whole line with strtok_r, so you would end with something like this:
int main(int argc, char **argv)
{
FILE *file = fopen(argv[1], "re");
int list[300];
if (file == NULL) {
return 1;
}
char *line;
int numc = 0;
while((line = readline(file)) != NULL) {
char *saveptr;
// Get the first token
char *tok = strtok_r(line, ",", &saveptr);
// Now start parsing the whole line
while (tok != NULL) {
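// Convert the token to a long if possible
char *end;
errno = 0; // strtol only sets errno on failure, so clear it first
long num = strtol(tok, &end, 10); // base 10, so "010" is not read as octal
if (errno != 0 || end == tok) {
// Handle range error or no value conversion
// ...
// ...
}
if (numc < 300) list[numc++] = (int) num; // stay inside list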
// Get next token
tok = strtok_r(NULL, ",", &saveptr);
}
free(line);
}
fclose(file);
return 0;
}
And for printing the whole list just use a for loop:
for (int i = 0; i < numc; i++) {
printf("%d ", list[i]);
}
printf("\n");
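The conversion step is where most of the error handling lives, so it may be worth isolating. The following is a sketch under the assumption that each token should be a plain base-10 integer that fits in an int; parse_int is a hypothetical helper, not part of the answer above.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert the start of s to an int.
   Returns 1 on success and stores the value in *out; returns 0 if s does
   not start with a number or the value does not fit in an int. */
static int parse_int(const char *s, int *out)
{
    char *end;
    errno = 0;                      /* strtol sets errno only on failure */
    long v = strtol(s, &end, 10);
    if (end == s)                   /* no digits were consumed           */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                   /* out of range for long or for int  */
    *out = (int)v;
    return 1;
}
```

Checking end == s matters because strtol() returns 0 both for the text "0" and for text that contains no number at all.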

Using fgetc to pass only part of a text file to a buffer

I have the following text file:
13.69 (s, 1H), 11.09 (s, 1H).
So far I can quite happily use either fgets or fgetc to pass all text to a buffer as follows:
char* data;
data = malloc(sizeof(char) * 100);
int c;
int n = 0;
FILE* inptr = NULL;
inptr = fopen("NMR", "r");
if(NULL == fopen("NMR", "r"))
{
printf("Error: could not open file\n");
return 1;
}
for (c = fgetc(inptr); c != EOF && c != '\n'; c = fgetc(inptr))
{
data[n++] = c;
}
for (int i = 0, n = 100; i < n; i++)
{
printf ("%c", data[i]);
}
printf("\n");
and then print the buffer to the screen afterwards. However, I am only looking to pass part of the textfile to the buffer, namely:
13.69 (s, 1H),
So this means I want fgetc to stop after ','. However, this means that the text will stop at 13.69 (s, and not 13.69 (s, 1H),
Is there a way around this? I have also experimented with fgets and then using strstr as follows:
char needle[4] = ")";
char* ret;
ret = strstr(data, needle);
printf("The substring is: %s\n", ret);
However, the output from this is:
), 11.09 (s, 1H)
thus giving me the rest of the string which I do not want. It's an interesting one and if anyone has any tips it would be much appreciated!
If you know that the closing parenthesis is the last character you want, you can use that as your stopping point in the fgetc() loop:
char data[100]; //No need to dynamically allocate if we know the size at compile time
int c;
int n = 0;
FILE* inptr = NULL;
inptr = fopen("NMR", "r");
if(inptr == NULL) //We want to check the value of the file we just opened
{ //and plan to use
printf("Error: could not open file\n");
return 1;
}
//We'll keep the original value guards (EOF and '\n') below and add two more
//to make sure we break from the loop
//We use n<98 below to make sure we can always create a null-terminated string,
//If we used 99, the 100th character might be a ')', then we have no room for a
//terminating null-char
for (c = fgetc(inptr); c != ')' && n < 98 && c != EOF && c != '\n'; c = fgetc(inptr))
{
data[n++] = c;
}
if(c != ')') //We hit EOF, \n, or ran out of space in data[]
{
printf("Error: no matching sequence found\n");
return 2;
}
data[n]=')'; //Could also write data[n]=c here, since we know it's a ')'
data[n+1]='\0'; //Add the terminating null character
printf("%s\n",data); //Since it's a properly formatted string, we can use %s
(Note that this example will handle null input characters differently from yours. If you expect null characters to be in the input stream (NMR file) then change the printf("%s",...) line back to the for loop you originally had.)
Well, with only one example of the format you are trying to parse it's not really possible to give a definitive answer. However, if your input is always like this I would simply keep a counter and break after the second comma.
int comma = 0;
for (c = fgetc(inptr); c != EOF && c != '\n' && comma < 2 && n < 99; c = fgetc(inptr))
{
if (c == ',') // count the commas seen so far
comma++;
data[n++] = c;
}
data[n] = '\0'; // terminate so the buffer can be printed with %s
In case the characters inside the parenthesis can be more complex I would simply maintain a boolean state to know if I am actually inside or outside a parenthesis and break when I read a comma outside of it.
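A minimal sketch of that state-tracking idea. It uses a nesting counter rather than a single boolean so that nested parentheses would also work; read_first_entry is a hypothetical name, and the input format is assumed to match the single example in the question.

```c
#include <stdio.h>

/* Hypothetical helper: copy characters from `in` into buf up to and
   including the first comma that is not inside parentheses.
   Returns 1 if such a comma was found, 0 otherwise. buf is always
   null-terminated when size > 0. */
static int read_first_entry(FILE *in, char *buf, size_t size)
{
    size_t n = 0;
    int depth = 0; /* how many '(' are currently open */
    int found = 0;
    int c;

    if (size == 0)
        return 0;
    while (n + 1 < size && (c = fgetc(in)) != EOF && c != '\n') {
        buf[n++] = (char)c;
        if (c == '(') {
            depth++;
        } else if (c == ')' && depth > 0) {
            depth--;
        } else if (c == ',' && depth == 0) {
            found = 1; /* top-level comma: the entry is complete */
            break;
        }
    }
    buf[n] = '\0';
    return found;
}
```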
Simply read the line with fgets and extract the desired string into a char * buffer using sscanf:
char *new_data;
new_data=malloc(100); // allocate memory
...
fgets(data,100,inptr); // read from file but check its return
sscanf(data,"%98[^)]",new_data); // store string until ')' in new_data from data; the width leaves room for ")" and the terminator
strcat(new_data,")"); // concatenating new_data and ")"
printf("%s",new_data); // print new_data
...
free(new_data); // remember to free memory
You should also check the return value of malloc, which my example does not do, and close the file you opened.

Reading different types in C from File

I'm trying to read in a file that has different types to read in (integers, chars). This is relatively simple, yet I'm confused on which method to use to read in these different values.
I'm using fgets to make sure the file isn't at the end, i.e;
char line[MAX_CHARS];
char ch;
FILE *infile = fopen("file.txt", "r");
const int MAX_CHARS = 50;
while(fgets(line, MAX_CHARS, infile) != NULL)
Given the input;
-Flight one
83, 34
X XX X
X X
-Flight two
....
I want to print the line that starts with a dash, send the integers to a method, and then print the X's and spaces. So, my code for this would be:
if(line[0] == '-')
{
printf("%s\n", line);
}
else if(2==sscanf(line, "%d%d", &long, &lat))
{
coordinates(long, lat);
}
I used scanf to try to read the X's and spaces, but it doesn't work at all. getchar() doesn't seem to work either, so should I start over and instead read each char individually instead of a char array?
EDIT: So I did as someone suggested, this is my updated code to read in spaces and X's, but it is clearly not reading right, as it's not going to the next line of X's.
else
{
while(line[++index] != '\0')
{
if(line[index] == ' ')
{
printf("%c", '.');
}
else if(line[index] == 'X')
{
printf("%c", '*');
}
}
}
For output;
*Flight one
.......*Flight two
sscanf(line, " %c", &ch)
/*            ^ */
The space before percent will cause scanf to ignore all whitespace characters (space, tab, newline) if present.
i.e. condition if(ch == ' ') will be always false.
That is not what you want here, so remove that space from the format string.
EDIT
Also as suggested by Barak Manos
I think that for the second line you need to scan "%d, %d" instead of "%d%d".
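Putting both points together, a sketch of the two per-line steps might look like this. parse_coords and render_row are hypothetical helper names; the '.' and '*' output characters come from the question's own code, and the row loop deliberately starts at index 0 on every line.

```c
#include <stdio.h>

/* Hypothetical helper: parse "83, 34" style coordinates.
   The blank before the comma in the format also accepts "83 , 34". */
static int parse_coords(const char *line, int *lon, int *lat)
{
    return sscanf(line, "%d , %d", lon, lat) == 2;
}

/* Hypothetical helper: translate one map row into out, writing '.' for
   a space and '*' for an 'X', and dropping every other character. */
static void render_row(const char *line, char *out, size_t size)
{
    size_t n = 0;
    for (size_t i = 0; line[i] != '\0' && line[i] != '\n' && n + 1 < size; i++) {
        if (line[i] == ' ')
            out[n++] = '.';
        else if (line[i] == 'X')
            out[n++] = '*';
    }
    if (size > 0)
        out[n] = '\0';
}
```

Inside the fgets() loop, a line starting with '-' would be printed as is, a line accepted by parse_coords() would go to coordinates(), and anything else would be passed to render_row().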

C skipping line starts with '#'

I'm trying to read a file, using fscanf to pick out some values and store them in an array. However, some lines in the file start with '#' (e.g. #this is just a command), and I want to skip them. How should I do that? The lines that start with # can appear anywhere in the file. Here is some of my code:
//do line counts of how many lines contain parameters
while(!EOF) {
fgets(lines, 90, hi->agentFile);
count++;
if (lines[0] == '#') {
count--;
}
}
//mallocing an array of struct.
agentInfo* array = malloc(count*sizeof(agentInfo));
for (i = 0; i < count; i++) {
fscanf(hi->agentFile,"%d %d %c %s %c",&array[i].r,&array[i].c,
&array[i].agent_name,&array[i].function[80],
&array[i].func_par);
So I need to add something to skip the lines that start with '#'. How?
Your EOF test is wrong. You also need to rewind the file between the fgets() loop and the fscanf() loop. And you need to replace the fscanf() loop with a second fgets() loop using sscanf() to read the data. Or you need to allocate the memory as you go while reading the file once. Let's leave that for later, though:
while(fgets(lines, sizeof(lines), hi->agentFile) != NULL)
{
if (lines[0] != '#')
count++;
}
agentInfo *array = malloc(count*sizeof(agentInfo));
if (array != 0)
{
int i = 0;
rewind(hi->agentFile);
while (i < count && fgets(lines, sizeof(lines), hi->agentFile) != NULL)
{
if (lines[0] != '#')
{
if (sscanf(lines, "%d %d %c %s %c",&array[i].r,&array[i].c,
&array[i].agent_name,array[i].function, // assuming function is a char array
&array[i].func_par) != 5)
...format error in non-comment line...
i++; // advance only after a non-comment line
}
}
assert(i == count); // else someone changed the file, or ...
}
Note that this checks for a memory allocation error and for format errors for non-comment lines.
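The counting pass on its own can be sketched as below. count_data_lines is a hypothetical name, and the 90-byte buffer mirrors the size used in the question. The extra flag guards against a line longer than the buffer being counted more than once, since fgets() then returns it in several pieces.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the lines of fp that do not start with '#',
   then rewind the stream so it can be read again from the beginning. */
static size_t count_data_lines(FILE *fp)
{
    char buf[90];
    size_t count = 0;
    int at_line_start = 1; /* does the next chunk begin a new line? */

    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (at_line_start && buf[0] != '#')
            count++;
        /* A chunk without '\n' means the line continues in the next read. */
        at_line_start = (strchr(buf, '\n') != NULL);
    }
    rewind(fp);
    return count;
}
```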
