Parsing a tab-delimited text file - c

I'm trying to write a code that will parse a tab-delimited text file by assigning each string between tabs to a given element of a sample struct that I've defined. In the input file, the first row will have all the class identifiers (c_name), the second row will have all the sample identifiers (s_name), and the rest of the rows will contain data.
I know it's going to be a bit more complicated because the first column will actually just contain labels, but I figured I'd start with trying to figure out the general parsing scheme.
I can gather that, for the class identifiers for example, I should probably be using fscanf in a for loop add each identifier to the class field of a given sample, but I'm getting lost in the actual implementation. Based on one post I saw, I thought I could do something along the lines of using %[^\t]\t in fscanf to read into an array everything that's not a tab up to a tab, but I don't think I have this quite right.
Any suggestions would be greatly appreciated.
#define LENGTH 30
#define MAX_OBS 80000
typedef struct
{
char c_name[LENGTH];
char s_name[LENGTH];
double value[MAX_OBS];
}
sample;
// I've already calculated the number of columns in the file
sample sample[total_columns];
for (int i = 0; i < total_columns; i++)
{
fscanf(input, "%[^\t]\t", sample[i].s_name);
}
Edit: I've tried several different variations of the code below ("%[^\t\n\r]\t\n\r", or "%[^\t\n\r]%*1[\t\n\r]", or " %[^\t\n\r]") and they all seem to be generally working except that, depending on the size I'm allocating to data and how long I'm iterating, it gives a segmentation fault at some point. The code below gives a segmentation fault immediately, but if I arbitrarily change total_columns in both places to 3, it will print Class Case Case. This seems to work up until 14, at which point the whole program segmentation faults. I'm fairly confused about the issue here. I've also tried mallocing memory to the sample data array to see if it was an issue of stack vs heap, but that doesn't seem to be helping either. Thanks so much for your help!
sample data[total_columns];
fseek(input, 0, SEEK_SET);
for (int i = 0; i < total_columns; i++)
{
fscanf(input, "%[^\t\n\r]\t\n\r", data[i].s_name);
printf("%s\n", data[i].s_name);
}
An example input file would look like:
Class Case Case Case Case Case Case Case Case Case Case Case Case Case Case Control Control Control Control Control Control Control Control Control Control Control Control Control Control Control Control
Subject G038 G144 G135 G161 G116 G165 G133 G069 G002 G059 G039 G026 G125 G149 G108 G121 G060 G140 G127 G113 G023 G147 G011 G019 G148 G132 G010 G142 G020 G021
Data1 0.000741628 0.00308607 0.000267431 0.001418697 0.001237904 0.000761145 0.0008281 0.002426075 0.000236698 0.004924871 0.000722752 0.003758006 0.000104813 0.000986619 0.000121803 0.000666854 0 0.000171394 0.000877993 0.002717391 0.001336501 0.000812089 0.001448743 5.28E-05 0.001944298 0.000292529 0.000469631 0.001674047 0.000651526 0.000336615
Data2 0.102002396 0.108035127 0.015052531 0.079923731 0.020643362 0.086480609 0.017907667 0.016279315 0.076263965 0.034876124 0.187481931 0.090615572 0.037460171 0.143326961 0.029628502 0.049487575 0.020175439 0.122975405 0.019754837 0.006702899 0.014033264 0.040024363 0.076610375 0.069287599 0.098896479 0.011813681 0.293331246 0.037558052 0.303052867 0.137591517
Data2 0.218495065 0.242891829 0.23747851 0.101306336 0.309040188 0.237477347 0.293837554 0.34351816 0.217572429 0.168651691 0.179387106 0.166516699 0.099970652 0.181003474 0.076126675 0.10244981 0.449561404 0.139257863 0.127579104 0.355797101 0.354544105 0.262855651 0.10167146 0.186068602 0.316763006 0.187466247 0.05701315 0.123825467 0.064780343 0.069847682
Data4 0.141137543 0.090948286 0.102502388 0.013063365 0.162060849 0.166292135 0.070215996 0.063535037 0.333743609 0.131011609 0.140936687 0.150108506 0.07812762 0.230704405 0.069792935 0.120770743 0.164473684 0.448110378 0.42599534 0.074094203 0.096525097 0.157661185 0.036737518 0.213931398 0.091119285 0.438073807 0.224921728 0.187034237 0.06611442 0.086005218
Data5 0.003594044 0.003948354 0.008137536 0.001327901 0.002161974 0.003552012 0.002760334 0.001898667 0.001420186 0.003165988 0.001011853 0.001217382 0.000314439 0.004254794 0.000213155 0.003650147 0 0.002742309 0.002633978 0 0.002524503 0.002146234 0.001751465 0.006543536 0.003941146 0.00049505 0.00435191 0.001944054 0.001303053 0.004207692
Data6 0.000285242 2.27E-05 0 1.13E-05 0.0002964 3.62E-05 0.000138017 0.000210963 0.000662753 0 0 0 0 4.11E-05 0 0 0 0 0.000101307 0 0 0 0 5.28E-05 0.00152391 0 0 0 0 0
Data7 0.002624223 0.001134584 0.00095511 0.000419934 0.000401011 0.001739761 0.00272583 0.002566717 0.000520735 0.002311674 0.006287944 0 6.29E-05 0.000143882 3.05E-05 0.000491366 0 0 3.38E-05 0 0.001782002 0.000957104 0.002594763 0.000527704 0.000105097 0.001192619 3.13E-05 0 0.000744602 0.000252461
Data8 0.392777683 0.383875286 0.451499522 0.684663315 0.387394299 0.357992026 0.488406597 0.423473155 0.27267563 0.47454646 0.331020526 0.484041709 0.735955056 0.338841956 0.781699147 0.625403622 0.313596491 0.270545891 0.379259109 0.498913043 0.372438372 0.446271644 0.606698813 0.305593668 0.360535996 0.29889739 0.328710081 0.521222594 0.419924299 0.584111756
Edit: I seem to have fixed it by changing the MAX_OBS definition - pretty sure I have a fundamental misunderstanding of what that actually means. I'll have to look into that. Thanks again for the help!

try this:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define LENGTH 30
#define MAX_OBS 80000
typedef struct{
char c_name[LENGTH];
char s_name[LENGTH];
double value[MAX_OBS];
} Sample;//Duplication of type and variable names should be avoided. pointed out by Jonathan Leffler.
int main(void){
char line[1024];
FILE *input = fopen("data.txt", "r");
fgets(line, sizeof(line), input);
int total_columns = 0;
char *p = strtok(line, "\t\n");
while(p){
++total_columns;
p = strtok(NULL, "\t\n");
}
--total_columns;//first column is field name
rewind(input);
//*******************************************************************************
Sample *sample = malloc(total_columns * sizeof(*sample));//To allocate in the stack is large. So allocate by malloc.
fscanf(input, "%*s\t");//skip first column
for (int i = 0; i < total_columns; i++){
fscanf(input, "%[^\t\n]\t", sample[i].c_name);//\n for last column
}
fscanf(input, "%*s\t");//skip first column
for (int i = 0; i < total_columns; i++){
fscanf(input, "%[^\t\n]\t", sample[i].s_name);
}
int r;
for(r = 0; r < MAX_OBS; ++r){
if(EOF==fscanf(input, "%*s")) break;
for (int i = 0; i < total_columns; i++){
fscanf(input, "%lf", &sample[i].value[r]);
}
}
fclose(input);
//test print
printf("%s\n", sample[0].c_name);
printf("%s\n", sample[0].s_name);
for(int i = 0; i < r; ++i)
printf("%f\n", sample[0].value[i]);
printf("\n%s\n", sample[total_columns-1].c_name);
printf("%s\n", sample[total_columns-1].s_name);
for(int i = 0; i < r; ++i)
printf("%f\n", sample[total_columns-1].value[i]);
free(sample);
}

Related

How to read imaginary data from text file, with C

I am unable to read imaginary data from text file.
Here is my .txt file
abc.txt
0.2e-3+0.3*I 0.1+0.1*I
0.3+0.1*I 0.1+0.4*I
I want to read this data into a matrix and print it.
I found the solutions using C++ here and here. I don't know how to do the same in C.
I am able to read decimal and integer data in .txt and print them.
I am also able to print imaginary data initialized at the declaration, using complex.h header. This is the program I have writtern
#include<stdio.h>
#include<stdlib.h>
#include<complex.h>
#include<math.h>
int M,N,i,j,k,l,p,q;
int b[2];
int main(void)
{
FILE* ptr = fopen("abc.txt", "r");
if (ptr == NULL) {
printf("no such file.");
return 0;
}
long double d=0.2e-3+0.3*I;
long double c=0.0000000600415046630252;
double matrixA[2][2];
for(i=0;i<2; i++)
for(j=0;j<2; j++)
fscanf(ptr,"%lf+i%lf\n", creal(&matrixA[i][j]), cimag(&matrixA[i][j]));
//fscanf(ptr, "%lf", &matrixA[i][j]) for reading non-imainary data, It worked.
for(i=0;i<2; i++)
for(j=0;j<2; j++)
printf("%f+i%f\n", creal(matrixA[i][j]), cimag(matrixA[i][j]));
//printf("%lf\n", matrixA[i][j]); for printing non-imainary data, It worked.
printf("%f+i%f\n", creal(d), cimag(d));
printf("%Lg\n",c);
fclose(ptr);
return 0;
}
But I want to read it from the text, because I have an array of larger size, which I can't initialize at declaration, because of it's size.
There are two main issues with your code:
You need to add complex to the variables that hold complex values.
scanf() needs pointers to objects to store scanned values in them. But creal() returns a value, copied from its argument's contents. It is neither a pointer, nor could you get the address of the corresponding part of the complex argument.
Therefore, you need to provide temporary objects to scanf() which receive the scanned values. After successfully scanning, these values are combined to a complex value and assigned to the indexed matrix cell.
Minor issues not contributing to the core problem are:
The given source is "augmented" with unneeded #includes, unused variables, global variables, and experiments with constants. I removed them all to see the real thing.
The specifier "%f" (as many others) lets scanf() skip whitespace like blanks, tabs, newlines, and so on. Providing a "\n" mostly does more harm than one would expect.
I kept the "*I" to check the correct format. However, an error will only be found on the next call of scanf(), when it cannot scan the next number.
You need to check the return value of scanf(), always! It returns the number of conversions that were successful.
It is a common and good habit to let the compiler calculate the number of elements in an array. Divide the total size by an element's size.
Oh, and sizeof is an operator, not a function.
It is also best to return symbolic values to the caller, instead of magic numbers. Fortunately, the standard library defines these EXIT_... macros.
The signs are correctly handled by scanf() already. There is no need to tell it more. But for a nice output with printf(), you use the "+" as a flag to always output a sign.
Since the sign is now placed directly before the number, I moved the multiplication by I (you can change it to lower case, if you want) to the back of the imaginary part. This also matches the input format.
Error output is done via stderr instead of stdout. For example, this enables you to redirect the standard output to a pipe or file, without missing potential errors. You can also redirect errors somewhere else. And it is a well-known and appreciated standard.
This is a possible solution:
#include <stdio.h>
#include <stdlib.h>
#include <complex.h>
int main(void)
{
FILE* ptr = fopen("abc.txt", "r");
if (ptr == NULL) {
perror("\"abc.txt\"");
return EXIT_FAILURE;
}
double complex matrixA[2][2];
for (size_t i = 0; i < sizeof matrixA / sizeof matrixA[0]; i++)
for (size_t j = 0; j < sizeof matrixA[0] / sizeof matrixA[0][0]; j++) {
double real;
double imag;
if (fscanf(ptr, "%lf%lf*I", &real, &imag) != 2) {
fclose(ptr);
fprintf(stderr, "Wrong input format\n");
return EXIT_FAILURE;
}
matrixA[i][j] = real + imag * I;
}
fclose(ptr);
for (size_t i = 0; i < sizeof matrixA / sizeof matrixA[0]; i++)
for (size_t j = 0; j < sizeof matrixA[0] / sizeof matrixA[0][0]; j++)
printf("%+f%+f*I\n", creal(matrixA[i][j]), cimag(matrixA[i][j]));
return EXIT_SUCCESS;
}
Here's a simple solution using scanf() and the format shown in the examples.
It writes the values in the same format that it reads them — the output can be scanned by the program as input.
/* SO 7438-4793 */
#include <stdio.h>
static int read_complex(FILE *fp, double *r, double *i)
{
int offset = 0;
char sign[2];
if (fscanf(fp, "%lg%[-+]%lg*%*[iI]%n", r, sign, i, &offset) != 3 || offset == 0)
return EOF;
if (sign[0] == '-')
*i = -*i;
return 0;
}
int main(void)
{
double r;
double i;
while (read_complex(stdin, &r, &i) == 0)
printf("%g%+g*I\n", r, i);
return 0;
}
Sample input:
0.2e-3+0.3*I 0.1+0.1*I
0.3+0.1*I 0.1+0.4*I
-1.2-3.6*I -6.02214076e23-6.62607015E-34*I
Output from sample input:
0.0002+0.3*I
0.1+0.1*I
0.3+0.1*I
0.1+0.4*I
-1.2-3.6*I
-6.02214e+23-6.62607e-34*I
The numbers at the end with large exponents are Avogadro's Number and the Planck Constant.
The format is about as stringent are you can make it with scanf(), but, although it requires a sign (+ or -) between the real and imaginary parts and requires the * and I to be immediately after the imaginary part (and the conversion will fail if the *I is missing), and accepts either i or I to indicate the imaginary value:
It doesn't stop the imaginary number having a second sign (so it will read a value such as "-6+-4*I").
It doesn't stop there being white space after the mandatory sign (so it will read a value such as "-6+ 24*I".
It doesn't stop the real part being on one line and the imaginary part on the next line.
It won't handle either a pure-real number or a pure-imaginary number properly.
The scanf() functions are very flexible about white space, and it is very hard to prevent them from accepting white space. It would require a custom parser to prevent unwanted spaces. You could do that by reading the numbers and the markers separately, as strings, and then verifying that there's no space and so on. That might be the best way to handle it. You'd use sscanf() to convert the string read after ensuring there's no embedded white space yet the format is correct.
I do not know which IDE you are using for C, so I do not understand this ./testprog <test.data.
I have yet to find an IDE that does not drive me bonkers. I use a Unix shell running in a terminal window. Assuming that your program name is testprog and the data file is test.data, typing ./testprog < test.data runs the program and feeds the contents of test.data as its standard input. On Windows, this would be a command window (and I think PowerShell would work much the same way).
I used fgets to read each line of the text file. Though I know the functionality of sscanf, I do not know how to parse an entire line, which has about 23 elements per line. If the number of elements in a line are few, I know how to parse it. Could you help me about it?
As I noted in a comment, the SO Q&A How to use sscanf() in loops? explains how to use sscanf() to read multiple entries from a line. In this case, you will need to read multiple complex numbers from a single line. Here is some code that shows it at work. It uses the POSIX getline() function to read arbitrarily long lines. If it isn't available to you, you can use fgets() instead, but you'll need to preallocate a big enough line buffer.
#include <stdio.h>
#include <stdlib.h>
#include <complex.h>
#ifndef CMPLX
#define CMPLX(r, i) ((double complex)((double)(r) + I * (double)(i)))
#endif
static size_t scan_multi_complex(const char *string, size_t nvalues,
complex double *v, const char **eoc)
{
size_t nread = 0;
const char *buffer = string;
while (nread < nvalues)
{
int offset = 0;
char sign[2];
double r, i;
if (sscanf(buffer, "%lg%[-+]%lg*%*[iI]%n", &r, sign, &i, &offset) != 3 || offset == 0)
break;
if (sign[0] == '-')
i = -i;
v[nread++] = CMPLX(r, i);
buffer += offset;
}
*eoc = buffer;
return nread;
}
static void dump_complex(size_t nvalues, complex double values[nvalues])
{
for (size_t i = 0; i < nvalues; i++)
printf("%g%+g*I\n", creal(values[i]), cimag(values[i]));
}
enum { NUM_VALUES = 128 };
int main(void)
{
double complex values[NUM_VALUES];
size_t nvalues = 0;
char *buffer = 0;
size_t buflen = 0;
int length;
size_t lineno = 0;
while ((length = getline(&buffer, &buflen, stdin)) > 0 && nvalues < NUM_VALUES)
{
const char *eoc;
printf("Line: %zu [[%.*s]]\n", ++lineno, length - 1, buffer);
size_t nread = scan_multi_complex(buffer, NUM_VALUES - nvalues, &values[nvalues], &eoc);
if (*eoc != '\0' && *eoc != '\n')
printf("EOC: [[%s]]\n", eoc);
if (nread == 0)
break;
dump_complex(nread, &values[nvalues]);
nvalues += nread;
}
free(buffer);
printf("All done:\n");
dump_complex(nvalues, values);
return 0;
}
Here is a data file with 8 lines with 10 complex numbers per line):
-1.95+11.00*I +21.72+64.12*I -95.16-1.81*I +64.23+64.55*I +28.42-29.29*I -49.25+7.87*I +44.98+79.62*I +69.80-1.24*I +61.99+37.01*I +72.43+56.88*I
-9.15+31.41*I +63.84-15.82*I -0.77-76.80*I -85.59+74.86*I +93.00-35.10*I -93.82+52.80*I +85.45+82.42*I +0.67-55.77*I -58.32+72.63*I -27.66-81.15*I
+87.97+9.03*I +7.05-74.91*I +27.60+65.89*I +49.81+25.08*I +44.33+77.00*I +93.27-7.74*I +61.62-5.01*I +99.33-82.80*I +8.83+62.96*I +7.45+73.70*I
+40.99-12.44*I +53.34+21.74*I +75.77-62.56*I +54.16-26.97*I -37.02-31.93*I +78.20-20.91*I +79.64+74.71*I +67.95-40.73*I +58.19+61.25*I +62.29-22.43*I
+47.36-16.19*I +68.48-15.00*I +6.85+61.50*I -6.62+55.18*I +34.95-69.81*I -88.62-81.15*I +75.92-74.65*I +85.17-3.84*I -37.20-96.98*I +74.97+78.88*I
+56.80+63.63*I +92.83-16.18*I -11.47+8.81*I +90.74+42.86*I +19.11-56.70*I -77.93-70.47*I +6.73+86.12*I +2.70-57.93*I +57.87+29.44*I +6.65-63.09*I
-35.35-70.67*I +8.08-21.82*I +86.72-93.82*I -28.96-24.69*I +68.73-15.36*I +52.85+94.65*I +85.07-84.04*I +9.98+29.56*I -78.01-81.23*I -10.67+13.68*I
+83.10-33.86*I +56.87+30.23*I -78.56+3.73*I +31.41+10.30*I +91.98+29.04*I -9.20+24.59*I +70.82-19.41*I +29.21+84.74*I +56.62+92.29*I +70.66-48.35*I
The output of the program is:
Line: 1 [[-1.95+11.00*I +21.72+64.12*I -95.16-1.81*I +64.23+64.55*I +28.42-29.29*I -49.25+7.87*I +44.98+79.62*I +69.80-1.24*I +61.99+37.01*I +72.43+56.88*I]]
-1.95+11*I
21.72+64.12*I
-95.16-1.81*I
64.23+64.55*I
28.42-29.29*I
-49.25+7.87*I
44.98+79.62*I
69.8-1.24*I
61.99+37.01*I
72.43+56.88*I
Line: 2 [[-9.15+31.41*I +63.84-15.82*I -0.77-76.80*I -85.59+74.86*I +93.00-35.10*I -93.82+52.80*I +85.45+82.42*I +0.67-55.77*I -58.32+72.63*I -27.66-81.15*I]]
-9.15+31.41*I
63.84-15.82*I
-0.77-76.8*I
-85.59+74.86*I
93-35.1*I
-93.82+52.8*I
85.45+82.42*I
0.67-55.77*I
-58.32+72.63*I
-27.66-81.15*I
Line: 3 [[+87.97+9.03*I +7.05-74.91*I +27.60+65.89*I +49.81+25.08*I +44.33+77.00*I +93.27-7.74*I +61.62-5.01*I +99.33-82.80*I +8.83+62.96*I +7.45+73.70*I]]
87.97+9.03*I
7.05-74.91*I
27.6+65.89*I
49.81+25.08*I
44.33+77*I
93.27-7.74*I
61.62-5.01*I
99.33-82.8*I
8.83+62.96*I
7.45+73.7*I
Line: 4 [[+40.99-12.44*I +53.34+21.74*I +75.77-62.56*I +54.16-26.97*I -37.02-31.93*I +78.20-20.91*I +79.64+74.71*I +67.95-40.73*I +58.19+61.25*I +62.29-22.43*I]]
40.99-12.44*I
53.34+21.74*I
75.77-62.56*I
54.16-26.97*I
-37.02-31.93*I
78.2-20.91*I
79.64+74.71*I
67.95-40.73*I
58.19+61.25*I
62.29-22.43*I
Line: 5 [[+47.36-16.19*I +68.48-15.00*I +6.85+61.50*I -6.62+55.18*I +34.95-69.81*I -88.62-81.15*I +75.92-74.65*I +85.17-3.84*I -37.20-96.98*I +74.97+78.88*I]]
47.36-16.19*I
68.48-15*I
6.85+61.5*I
-6.62+55.18*I
34.95-69.81*I
-88.62-81.15*I
75.92-74.65*I
85.17-3.84*I
-37.2-96.98*I
74.97+78.88*I
Line: 6 [[+56.80+63.63*I +92.83-16.18*I -11.47+8.81*I +90.74+42.86*I +19.11-56.70*I -77.93-70.47*I +6.73+86.12*I +2.70-57.93*I +57.87+29.44*I +6.65-63.09*I]]
56.8+63.63*I
92.83-16.18*I
-11.47+8.81*I
90.74+42.86*I
19.11-56.7*I
-77.93-70.47*I
6.73+86.12*I
2.7-57.93*I
57.87+29.44*I
6.65-63.09*I
Line: 7 [[-35.35-70.67*I +8.08-21.82*I +86.72-93.82*I -28.96-24.69*I +68.73-15.36*I +52.85+94.65*I +85.07-84.04*I +9.98+29.56*I -78.01-81.23*I -10.67+13.68*I]]
-35.35-70.67*I
8.08-21.82*I
86.72-93.82*I
-28.96-24.69*I
68.73-15.36*I
52.85+94.65*I
85.07-84.04*I
9.98+29.56*I
-78.01-81.23*I
-10.67+13.68*I
Line: 8 [[+83.10-33.86*I +56.87+30.23*I -78.56+3.73*I +31.41+10.30*I +91.98+29.04*I -9.20+24.59*I +70.82-19.41*I +29.21+84.74*I +56.62+92.29*I +70.66-48.35*I]]
83.1-33.86*I
56.87+30.23*I
-78.56+3.73*I
31.41+10.3*I
91.98+29.04*I
-9.2+24.59*I
70.82-19.41*I
29.21+84.74*I
56.62+92.29*I
70.66-48.35*I
All done:
-1.95+11*I
21.72+64.12*I
-95.16-1.81*I
64.23+64.55*I
28.42-29.29*I
-49.25+7.87*I
44.98+79.62*I
69.8-1.24*I
61.99+37.01*I
72.43+56.88*I
-9.15+31.41*I
63.84-15.82*I
-0.77-76.8*I
-85.59+74.86*I
93-35.1*I
-93.82+52.8*I
85.45+82.42*I
0.67-55.77*I
-58.32+72.63*I
-27.66-81.15*I
87.97+9.03*I
7.05-74.91*I
27.6+65.89*I
49.81+25.08*I
44.33+77*I
93.27-7.74*I
61.62-5.01*I
99.33-82.8*I
8.83+62.96*I
7.45+73.7*I
40.99-12.44*I
53.34+21.74*I
75.77-62.56*I
54.16-26.97*I
-37.02-31.93*I
78.2-20.91*I
79.64+74.71*I
67.95-40.73*I
58.19+61.25*I
62.29-22.43*I
47.36-16.19*I
68.48-15*I
6.85+61.5*I
-6.62+55.18*I
34.95-69.81*I
-88.62-81.15*I
75.92-74.65*I
85.17-3.84*I
-37.2-96.98*I
74.97+78.88*I
56.8+63.63*I
92.83-16.18*I
-11.47+8.81*I
90.74+42.86*I
19.11-56.7*I
-77.93-70.47*I
6.73+86.12*I
2.7-57.93*I
57.87+29.44*I
6.65-63.09*I
-35.35-70.67*I
8.08-21.82*I
86.72-93.82*I
-28.96-24.69*I
68.73-15.36*I
52.85+94.65*I
85.07-84.04*I
9.98+29.56*I
-78.01-81.23*I
-10.67+13.68*I
83.1-33.86*I
56.87+30.23*I
-78.56+3.73*I
31.41+10.3*I
91.98+29.04*I
-9.2+24.59*I
70.82-19.41*I
29.21+84.74*I
56.62+92.29*I
70.66-48.35*I
The code would handle lines with any number of entries on a line (up to 128 in total because of the limit on the size of the array of complex numbers — but that can be fixed too.

Reading file containing multiple data formats

I'm supposed to read a file in C with a structure that looks like this
A:
1
2
3
4
B:
1 1
2 2
3 3
4 4
C:
1 1 1
2 2 2
3 3 3
4 4 4
The file is always separated into three parts and each part starts with an identifier (A:, B:,..).
Identifier is followed by unspecified number of rows containing data. But in each part the format of the data is different. Also it's not just integers but that's not important in this question.
I don't have a problem reading the file. My question is what would be an optimal way to read such a file? It can contain thousands of rows or even more parts than just three. The result should be for example string arrays each containing rows from a different part of the file.
I didn't post any code because I don't need/want you to post any code either. Idea is good enough for me.
You could read the file line by line and check every time if a new section starts. If this is the case, you allocate new memory for a new section and read all the following lines to the data structure for that new section.
For dynamic memory allocation, you will need some counters so you know how many lines per section and how many sections in total you have read.
To illustrate the idea (no complete code):
typedef struct {
int count;
char **lines;
} tSection;
int section_counter = 0;
tSection *sections = NULL;
tSection *current_section = NULL;
char line[MAXLINE];
while (fgets(line, MAXLINE, file)) {
if (isalpha(line[0])) { // if line is identifier, start new section
sections = realloc(sections, sizeof(tSection)*(section_counter+1));
current_section = &sections[section_counter];
current_section->lines = NULL;
current_section->count = 0;
section_counter++;
}
else { // if line contains data, add new line to structure of current section
current_section->lines = realloc(current_section->lines, sizeof(char*)*(current_section->count+1));
current_section->lines[current_section->count] = malloc(sizeof(char)*MAXLINE);
strcpy(current_section->lines[current_section->count], line);
current_section->count++;
}
}
If each section in the file has a fixed format and the section header has a fixed format, you can use fscanf and a state machine based approach. For example in the code below, the function readsec reads a section based on the parameters passed to it. The arguments to readsec depends on which state it is in.
void readsec(FILE* f, const char* fmt, int c, char sec) {
printf("\nReading section %c\n",sec);
int data[3];
int ret=0;
while ((ret=fscanf(f, fmt, &data[0], &data[1], &data[2]))!=EOF){
if (ret!=c) {
return;
}
// processData(sec, c, &data); <-- process read data based on section
}
}
int main() {
FILE * f = fopen("file","r");
int ret = 0;
char sect = 0;
while ((ret=fscanf(f, "%c%*c\n", &sect))!=EOF){
switch (sect) {
case 'A':
readsec(f, "%d", 1, 'A');break;
case 'B':
readsec(f, "%d %d", 2, 'B');break;
case 'C':
readsec(f, "%d %d %d", 3, 'C');break;
default:break;
}
}
return 0;
}

Segmentation error, what is missing in my code?

I'm trying to get my code to convert a text file with 3 columns, xcoor, ycoor, and a symbol with 2 characters into a 30x30 map that prints the 2nd character of the symbol with the rest of the spaces being filled with a '.' However, my code doesn't seem to run, and I get a segmentation error when I try inputting the text file, what am I doing wrong? Thanks in advance
int main(void)
{
char grid[30][30];
for(int i=0;i<30;i++){
for(int j=0;j<30;j++){
grid[i][j]='.';
}
}
int xcoor,ycoor;
char symbol[2];
while((xcoor!=0)||(scanf("%d",&xcoor)))
{
while(xcoor==0){
scanf("%d",&xcoor);
}
scanf("%d %c%c",&ycoor,&symbol[0],&symbol[1]);
grid[xcoor-1][ycoor-1]=symbol[1];
}
for(int i=0;i<30;i++){
for(int j=0;j<30;j++){
printf("%c ",grid[i][j]);
}
printf("\n");
}
return 0;
}
This may not cover ALL of your errors, but immediately I see this:
int xcoor,ycoor;
char symbol[2];
while((xcoor!=0)
Do you think xcoor has a valid value right now? Should it? Because it doesn't. You've created a variable, then before actually setting it to anything, you are checking its value.
It's more than likely your scanf call that's giving you trouble. Regardless, try actually setting these variables. It will most likely fix your issues.
See here for more info: Is reading from unallocated memory safe?
You are using an uninitialized variable xcoor in the conditional of the while statement.
You can fix that by initializing xcoor.
More importantly, you can simplify the code for reading user data and the related error checks. Here's what I suggest:
while ( scanf("%d%d %c%c", &xcoor, &ycoor, &symbol[0], &symbol[1]) == 4 )
{
if ( xcoor < 0 || xcoor >= 30 )
{
// Deal with problem.
fprintf(stderr, "Out or range value of xcoor: %d\n", xcoor);
exit(1);
}
if ( ycoor < 0 || ycoor >= 30 )
{
// Deal with problem.
fprintf(stderr, "Out or range value of ycoor: %d\n", ycoor);
exit(1);
}
grid[xcoor-1][ycoor-1] = symbol[1];
}

wcstok() not working correctly?

I'm trying to get each line in a loop for a wchar_t string using wcstok(), that string is supposed to contain at least two lines, the latest 'wcstok(0, L"\n")' is getting always null result and I'm getting the value of i using printf as 1 only instead of 2 or higher, but the problem got solved when doing #if 0 instead of #if 1.
this is the code below:
wchar_t* w;
wchar_t* line;
int j;
wchar_t**** lines;
int** linescount;
......
int i=0;
#if 1 //problem get solved when changing to #if 0
line = wcstok(w, L"\n");
do{
((*linescount)[j])++;
}while(line=wcstok(0, L"\n"));
(*lines)[annex] = calloc(sizeof(wchar_t**), (*linescount)[j]);
#endif
line = wcstok(w, L"\n");
do{
#if 1 //problem get solved when changing to #if 0
(*lines)[j][i] = calloc(sizeof(wchar_t*), wcslen(line)+1);
wcscpy((*lines)[j][i], line);
#endif
i++;
}while(line=wcstok(0, L"\n"));
printf("i = %d\n", i); /*prints the i value to check if the latest line=wcstok(0, L"\n") worked correctly or not*/
so what's supposed the cause of this problem? and how can I solve it? please help.
The wcstok modifies the string passed in as argument so once you have run your loop to count lines the buffer is basically kaputt.
It seems like overkill to use wcstok to count lines when you easily could just loop through the buffer counting number of \n.

fgetc not starting at beginning of file - c [duplicate]

This question already has an answer here:
fgetc not starting at beginning of large txt file
(1 answer)
Closed 9 years ago.
Problem solved here:
fgetc not starting at beginning of large txt file
I am working in c and fgetc isn't getting chars from the beginning of the file. It seems to be starting somewhere randomly within the file after a \n. The goal of this function is to modify the array productsPrinted. If "More Data Needed" or "Hidden non listed" is encountered, the position in the array, productsPrinted[newLineCount], will be changed to 0. Any help is appreciated.
Update: It works on smaller files, but doesn't start at the beginning of the larger,617kb, file.
function calls up to category:
findNoPics(image, productsPrinted);
findVisible(visible, productsPrinted);
removeCategories(category, productsPrinted);
example input from fgetc():
Category\n
Diagnostic & Testing /Scan Tools\n
Diagnostic & Testing /Scan Tools\n
Hidden non listed\n
Diagnostic & Testing /Scan Tools\n
Diagnostic & Testing /Scan Tools\n
Hand Tools/Open Stock\n
Hand Tools/Sockets and Drive Sets\n
More Data Needed\n
Hand Tools/Open Stock\n
Hand Tools/Open Stock\n
Hand Tools/Open Stock\n
Shop Supplies & Equip/Tool Storage\n
Hidden non listed\n
Shop Supplies & Equip/Heaters\n
Code:
void removeCategories(FILE *category, int *prodPrinted){
char more[17] = { '\0' }, hidden[18] = { '\0' };
int newLineCount = 0, i, ch = 'a', fix = 0;
while ((ch = fgetc(category)) != EOF){ //if fgetc is outside while, it works//
more[15] = hidden[16] = ch;
printf("%c", ch);
/*shift char in each list <- one*/
for (i = 0; i < 17; i++){
if (i < 17){
hidden[i] = hidden[i + 1];
}
if (i < 16){
more[i] = more[i + 1];
}
}
if (strcmp(more, "More Data Needed") == 0 || strcmp(hidden, "Hidden non listed") == 0){
prodPrinted[newLineCount] = 0;
/*printf("%c", more[0]);*/
}
if (ch == '\n'){
newLineCount++;
}
}
}
Let computers do the counting. You have not null terminated your strings properly. The fixed strings (mdn and hdl are initialized but do not have null terminators, so string comparisons using them are undefined.
Given this sample data:
Example 1
More Data Needed
Hidden non listed
Example 2
Keeping lines short.
But as they get longer, the overwrite is worse...or is it?
Hidden More Data Needed in a longer line.
Lines containing "Hidden non listed" are zapped.
Example 3
This version of the program:
#include <stdio.h>
#include <string.h>
static
void removeCategories(FILE *category, int *prodPrinted)
{
char more[17] = { '0' };
char hidden[18] = { '0' };
char mdn[17] = { "More Data Needed" };
char hnl[18] = { "Hidden non listed" };
int newLineCount = 0, i, ch = '\0';
do
{
/*shift char in each list <- one*/
for (i = 0; i < 18; i++)
{
if (i < 17)
hidden[i] = hidden[i + 1];
if (i < 16)
more[i] = more[i + 1];
}
more[15] = hidden[16] = ch = fgetc(category);
if (ch == EOF)
break;
printf("%c", ch); /*testing here, starts rndmly in file*/
//printf("<<%c>> ", ch); /*testing here, starts rndmly in file*/
//printf("more <<%s>> hidden <<%s>>\n", more, hidden);
if (strcmp(more, mdn) == 0 || strcmp(hidden, hnl) == 0)
{
prodPrinted[newLineCount] = 0;
}
if (ch == '\n')
{
newLineCount++;
}
} while (ch != EOF);
}
int main(void)
{
int prod[10];
for (int i = 0; i < 10; i++)
prod[i] = 37;
removeCategories(stdin, prod);
for (int i = 0; i < 10; i++)
printf("%d: %d\n", i, prod[i]);
return 0;
}
produces this output:
Example 1
More Data Needed
Hidden non listed
Example 2
Keeping lines short.
But as they get longer, the overwrite is worse...or is it?
Hidden More Data Needed in a longer line.
Lines containing "Hidden non listed" are zapped.
Example 3
0: 37
1: 0
2: 0
3: 37
4: 37
5: 37
6: 0
7: 0
8: 37
9: 37
You may check which mode you opened the file, and you may have some error-check to make sure you have got the right return value.
Here you can refer to man fopen to get which mode to cause the stream position.
The fopen() function opens the file whose name is the string pointed to
by path and associates a stream with it.
The argument mode points to a string beginning with one of the follow‐
ing sequences (Additional characters may follow these sequences.):
r Open text file for reading. The stream is positioned at the
beginning of the file.
r+ Open for reading and writing. The stream is positioned at the
beginning of the file.
w Truncate file to zero length or create text file for writing.
The stream is positioned at the beginning of the file.
w+ Open for reading and writing. The file is created if it does
not exist, otherwise it is truncated. The stream is positioned
at the beginning of the file.
a Open for appending (writing at end of file). The file is cre‐
ated if it does not exist. The stream is positioned at the end
of the file.
a+ Open for reading and appending (writing at end of file). The
file is created if it does not exist. The initial file position
for reading is at the beginning of the file, but output is
always appended to the end of the file.
And there is another notice, that the file you operated should not more than 2G, or there maybe problem.
And you can use fseek to set the file position indicator.
And you can use debugger to watch these variables to see why there are random value. I think debug is efficient than trace output.
Maybe you can try rewinding the file pointer at the beginning of your function.
rewind(category);
Most likely another function is reading from the same file. If this solves your problem, it would be better to find which other function (or previous call to this function) is reading from the same file and make sure rewinding the pointer won't break something else.
EDIT:
And just to be sure, maybe you could change the double assignment to two different statements. Based on this post, your problem might as well be caused by a compiler optimization of that line. I haven't checked with the standard, but according to best answer the behavior in c and c++ might be undefined, therefore your strange results. Good luck

Resources