Reading a file containing multiple lines of strings, integers and floats - c

I'm trying to read a file containing a line containing a string, integer and float. The data is separated by commas and I've seen a tonne of information about how best to approach this. I've simplified my problem by first trying to read in one line, and will then work on implementing multiple lines.
I've managed to read the first two pieces of data. It's the float that's giving me issues.
Here is an example of what it is I'm trying to read:
CHANNEL_1, 0, 0.453
char CHANNEL_NAME_[BUF_SIZE];
uint32_t val1_;
double val2_;
FILE *fp;
int c;
fp = fopen("E:\\read_from_file\\config.cfg", "r");
if (fp < 1)
{
printf("Failed to open file = %d\n", fp);
fclose(fp);
}
c = fscanf(fp, "%[^,], %u[^,], %lf", CHANNEL_NAME_, &val1_, &val2_);
printf("[%s] [%u] [%lf]\n", CHANNEL_NAME_, val1_, val2_);
printf("C = %d\n", c);
I'm able to print the string and integer correctly, however, it's the float that's giving me issues. It comes out as a random float, something like 34534524524523452345.0000000. I expect to see the float as per above, 0.453.
When I print C, which is the result of the fscanf, I get 2 which is incorrect as I'd expect to read 3, due to 3 data types being read in.
What am I doing wrong?

There's no such specifier as "%u[^,]". That format is treated as a separate "%u" followed by a separate "[^,]".
The %[ format only reads strings, it doesn't have any type-prefix (and it's not needed as scanf will stop reading integers at the first non-digit character in the input). Which means you can use only "%u" for the middle specifier:
c = fscanf(fp, " %[^,], %u, %lf", CHANNEL_NAME_, &val1_, &val2_);
Note that I added a leading space in the format string. That's because the "%[" format does not skip leading spaces (like any possible newlines after the previous line).
With the format specifier "%u[^,]", the function actually tries to match the exact sequence "[^,]" in the input, which it won't find, leading to the last value not being read. Hence the value of c being 2 (as fscanf only matched two values, the initial string and the first unsigned integer).
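Putting the corrected format to work on several lines could look like the sketch below. It is only an illustration under a few assumptions: NAME_SIZE, parse_channel and read_channels are made-up names (the question's BUF_SIZE is not shown), each record is first read with fgets and then handed to sscanf, and the name conversion gets a field width so it cannot overflow the buffer. It also uses a plain unsigned int, because "%u" expects an unsigned int *; for a uint32_t the matching conversion would be SCNu32 from <inttypes.h>.

```c
#include <stdio.h>

#define NAME_SIZE 32  /* assumed size; the question's BUF_SIZE is not shown */

/* Parses one "NAME, unsigned, double" record from a line of text.
   Returns the number of fields converted; 3 means the whole record was read. */
int parse_channel(const char *line, char name[NAME_SIZE],
                  unsigned int *val1, double *val2)
{
    /* The leading space skips any whitespace before the name.
       %31[^,] stops at the comma and cannot overflow name[NAME_SIZE]. */
    return sscanf(line, " %31[^,], %u, %lf", name, val1, val2);
}

/* Reads every line of fp and prints the records that parse cleanly.
   Returns how many records were accepted. */
int read_channels(FILE *fp)
{
    char line[256];
    char name[NAME_SIZE];
    unsigned int val1;
    double val2;
    int count = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (parse_channel(line, name, &val1, &val2) == 3) {
            printf("[%s] [%u] [%f]\n", name, val1, val2);
            count++;
        }
    }
    return count;
}
```

The caller would open the file, compare the result of fopen against NULL (not "< 1"), pass the FILE * to read_channels, and close it afterwards.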

Related

How to read and print hexadecimal numbers from a file in C

I'm trying to read 14 digit long hexadecimal numbers from a file and then print them. My idea is to use a long long int and read the lines from the files with fscanf as if they were strings and then turn the string into a hex number using atoll. The problem is that I am getting a segfault on my fscanf line according to valgrind and I have absolutely no idea why. Here is the code:
#include<stdio.h>
int main(int argc, char **argv){
if(argc != 2){
printf("error argc!= 2\n");
return 0;
}
char *fileName = argv[1];
FILE *fp = fopen( fileName, "r");
if(fp == NULL){
return 0;
}
long long int num;
char *line;
while( fscanf(fp, "%s", line) == 1 ){
num = atoll(line);
printf("%x\n", num);
}
return 0;
}
Are you sure you want to read your numbers as character strings? Why not let scanf do the work for you?
unsigned long long num; // %llx expects a pointer to unsigned long long
while( fscanf(fp, "%llx", &num) == 1 ){ // read an unsigned long long in hex
printf("%llx\n", num); // print an unsigned long long in hex
}
BTW, note the ll length modifier on the %x conversion in printf: it tells printf that the argument is an unsigned long long rather than an unsigned int.
Edit
Here is a simple example of two loops reading a 3-line input (with two, no and three numbers in consecutive lines) with a 'hex int' format and with a 'string' format:
http://ideone.com/ntzKEi
A call to rewind allows the second loop to read the same input data.
That line variable is not initialized, so when fscanf() dereferences it you get undefined behavior.
You should use:
char line[1024];
while(fgets(line, sizeof line, fp) != NULL)
To do the loading.
If you're on C99, you might want to use uint64_t to hold the number, since that makes it clear that 14-digit hexadecimal numbers (4 * 14 = 56) will fit.
The other answers are good, but I want to clarify the actual reason for the crash you are seeing. The problem is that:
fscanf(fp, "%s", line)
... essentially means "read a string from a file, and store it in the buffer pointed at by line". In this case, your line variable hasn't been initialised, so it doesn't point anywhere. Technically, this is undefined behavior; in practice, the result will often be that you write over some arbitrary location in your process's address space; furthermore, since it will often point at an illegal address, the operating system can detect and report it as a segment violation or similar, as you are indeed seeing.
Note that fscanf with a %s conversion will not necessarily read a whole line - it reads a string delimited by whitespace. It might skip lines if they are empty and it might read multiple strings from a single line. This might not matter if you know the precise format of the input file (and it always has one value per line, for instance).
Although it appears in that case that you can probably just use an appropriate modifier to read a hexadecimal number (fscanf(fp, "%llx", &num)), rather than read a string and try to do a conversion, there are various situations where you do need to read strings and especially whole lines. There are various solutions to that problem, depending on what platform you are on. If it's a GNU system (generally including Linux) and you don't care about portability, you could use the m modifier, and change line to &line:
fscanf(fp, "%ms", &line);
This passes a pointer to line to fscanf, rather than its value (which is uninitialised), and the m causes fscanf to allocate a buffer and store its address in line. You then should free the buffer when you are done with it. Check the Glibc manual for details. The nice thing about this approach is that you do not need to know the line length beforehand.
If you are not using a GNU system or you do care about portability, use fgets instead of fscanf - this is more direct and allows you to limit the length of the line read, meaning that you won't overflow a fixed buffer - just be aware that it will read a whole line at a time, unlike fscanf, as discussed above. You should declare line as a char-array rather than a char * and choose a suitable size for it. (Note that you can also specify a "maximum field width" for fscanf, eg fscanf(fp, "%1000s", line), but you really might as well use fgets).
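To tie the suggestions above together, here is a rough sketch that reads whole lines with fgets into a fixed-size array and converts each one with strtoull in base 16. The function names parse_hex_line and print_hex_lines are invented for this example, and strtoull is my substitution for the question's atoll (which only understands decimal), so treat it as one possible approach rather than the definitive one.

```c
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Converts one line of text holding a hexadecimal number.
   Returns 1 and stores the value in *out on success, 0 otherwise. */
int parse_hex_line(const char *line, unsigned long long *out)
{
    char *end;

    errno = 0;
    unsigned long long value = strtoull(line, &end, 16);
    if (end == line || errno == ERANGE)
        return 0;               /* no hex digits, or value out of range */

    while (isspace((unsigned char)*end))
        end++;                  /* allow the trailing newline */
    if (*end != '\0')
        return 0;               /* unexpected characters after the number */

    *out = value;
    return 1;
}

/* Prints every valid hexadecimal line of fp; returns how many were printed. */
int print_hex_lines(FILE *fp)
{
    char line[1024];            /* fixed buffer, so fgets cannot overflow it */
    unsigned long long num;
    int count = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (parse_hex_line(line, &num)) {
            printf("%llx\n", num);
            count++;
        }
    }
    return count;
}
```

Because line is a real array here, there is no uninitialised pointer for the read to write through, and a 14-digit value fits comfortably in an unsigned long long.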

Why in C, when reading files, do we have to use a character array?

I've literally just started programming in C. Coming from a little understanding of Python.
Just had a lecture on C, the lecture was about this:
#include <stdio.h>
int main() {
FILE *file;
char name[10], degree[5];
int mark;
file = fopen("file.txt", "r");
while (fscan(file("%s %s %d", name, degree, &mark) != EOF);
printf("%s %s %d", name, degree, mark);
fclose(file);
}
I'm specifically asking why the lecturer would have used an array rather than just declaring two string variables. I'm looking for a deeper answer than just, "that's just C for you".
There are multiple typos on this line:
while (fscan(file("%s %s %d", name, degree, &mark) != EOF);
printf("%s %s %d", name, degree, mark);
It should read:
while (fscanf(file, "%s %s %d", name, degree, &mark) == 3)
printf("%s %s %d", name, degree, mark);
Can you spot the 4 mistakes?
The function is called fscanf
file is an argument followed by ,, not a function name followed by (
you should keep looping for as long as fscanf converts 3 values. If conversion fails, for example because the third field is not a number, it will return a short count, not necessarily EOF.
you typed an extra ; after the condition. This is parsed as an empty statement: the loop keeps reading until end of file, doing nothing, and finally executes printf just once, with potentially invalid arguments.
The programmer uses char arrays and passes their address to fscanf. If he had used pointers (char *), he would have needed to allocate memory to make them point to something and pass their values to fscanf, a different approach that is not needed for fixed array sizes.
Note that the code should prevent potential buffer overflows by specifying the maximum number of characters to store into the arrays:
while (fscanf(file, "%9s %4s %d", name, degree, &mark) == 3) {
printf("%s %s %d", name, degree, mark);
}
Note also that these hard-coded numbers must match the array sizes minus 1, and there is no direct way to pass the array sizes to fscanf(), a common source of bugs when the code is modified. This function has many quirks and shortcomings; use it with extreme care.
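One indirect way to keep the widths and the array sizes in step is to build the format string from the same constants with the preprocessor. The sketch below is only an illustration: NAME_LEN, DEGREE_LEN, STR, RECORD_FMT and parse_record are names I made up, and it parses from a string with sscanf so it is easy to try out.

```c
#include <stdio.h>

#define NAME_LEN   9    /* longest name stored, excluding the '\0' */
#define DEGREE_LEN 4    /* longest degree stored, excluding the '\0' */

/* Two-step stringification so the macro's value, not its name, is pasted. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Expands to "%9s %4s %d" with the values above. */
#define RECORD_FMT "%" STR(NAME_LEN) "s %" STR(DEGREE_LEN) "s %d"

/* Returns the number of fields converted; 3 means a complete record. */
int parse_record(const char *line, char name[NAME_LEN + 1],
                 char degree[DEGREE_LEN + 1], int *mark)
{
    return sscanf(line, RECORD_FMT, name, degree, mark);
}
```

With this, "Alice BSc 72" converts all three fields, while an over-long name such as "Bartholomew BSc 72" is cut off at nine characters, the leftover "ew" lands in degree, and the call returns 2 instead of 3, so the bad record is detected rather than overflowing the array.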
Take name[] for example. It's an array of chars, a collection of chars if you like.
There is no string type in C, so we use an array of chars when we want to use a string.
The code is written as such, so that we can read the actual string in a line of the file, in our array.
As a side note, this program will produce syntax errors.
What you have in every non-empty file is a series of bytes. Therefore, what your C program has to do is read bytes. Since the variable type char is used to represent a byte, and since you want to read multiple bytes at once for efficiency purposes, you read an array of chars. That's for the general understanding of what reading from a file means.
Going back to your example, there is no string type in C. A string is an array of bytes (char[]) ended by a nul character.
What the lecturer is doing is:
define an array of chars (which will contain a string) => char name[10] is an array that may contain a string of at most 9 characters (the last byte would be used for '\0').
ask fscanf to read a string, which means it will skip leading whitespace, read multiple bytes (multiple chars) until it reaches the next whitespace character, put all of that in the given array and append a nul character ('\0').
To understand what's happening, forget about string as an opaque data type (which might be true in other programming languages) and see them as what they really are: arrays of chars.
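A tiny sketch may make the "strings are just arrays of chars" point concrete. The helper names count_until_nul and dump_bytes are made up for this illustration; the first does what strlen does, the second shows the raw contents of the array.

```c
#include <stddef.h>
#include <stdio.h>

/* Counts the chars before the terminating '\0', which is what strlen does. */
size_t count_until_nul(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Prints every element of the array as a number, including the unused tail. */
void dump_bytes(const char *buf, size_t size)
{
    for (size_t i = 0; i < size; i++)
        printf("%d ", buf[i]);
    printf("\n");
}
```

Given char name[10] = "Ada";, count_until_nul(name) is 3 while sizeof name is still 10, and dump_bytes(name, sizeof name) prints 65 100 97 followed by seven zeros on an ASCII system: the "string" is simply the part of the array before the first zero byte.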

splitting string in c

I have a file where each line looks like this:
cc ssssssss,n
where the two first 'c's are individual characters, possibly spaces, then a space after that, then the 's's are a string that is 8 or 9 characters long, then there's a comma and then an integer.
I'm really new to C and I'm trying to figure out how to put this into 4 separate variables per line (each of the first two characters, the string, and the number)
Any suggestions? I've looked at fscanf and strtok but I'm not sure how to make them work for this.
Thank you.
I'm assuming this is a C question, as the question suggests, not C++ as the tags perhaps suggest.
Read the whole line in.
Use strchr to find the comma.
Do whatever you want with the first two characters.
Switch the comma for a zero, marking the end of a string.
Call strcpy from the fourth character on to extract the sssssss part.
Call atoi on one character past where the comma was to extract the integer.
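The steps above could be sketched roughly as follows. This is just one reading of them: split_record and STR_MAX are hypothetical names, the line is assumed to have been read already (for example with fgets), and the length checks are my additions so that strcpy cannot overflow the destination.

```c
#include <stdlib.h>
#include <string.h>

#define STR_MAX 9   /* the question says the string is 8 or 9 characters */

/* Splits a line of the form "cc ssssssss,n" in place.
   Returns 1 on success, 0 if the line does not have the expected shape.
   Note: the comma in line is overwritten with '\0'. */
int split_record(char *line, char *c1, char *c2, char str[STR_MAX + 1], int *n)
{
    char *comma = strchr(line, ',');    /* find the comma */
    if (comma == NULL || comma - line < 4)
        return 0;                       /* need "cc " plus at least one char */

    *c1 = line[0];                      /* the first two characters */
    *c2 = line[1];

    *comma = '\0';                      /* end the string where the comma was */
    if (strlen(line + 3) > STR_MAX)
        return 0;                       /* would not fit in str */
    strcpy(str, line + 3);              /* copy from the fourth character on */

    *n = atoi(comma + 1);               /* the integer after the comma */
    return 1;
}
```

Since the first two characters are copied by position, this also works when they are spaces. atoi gives no error indication, so strtol would be the safer choice if malformed numbers need to be detected.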
A string is a sequence of characters that ends at the first '\0'. Keep this in mind. What you have in the file you described isn't a string.
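I presume n is an integer that could span multiple digits and could be negative. If that's the case, I believe the format string you require is "%2[^ ] %9[^,\n],%d". This assumes the first character is not a space: %2[^ ] stops at a space, and fails outright if the very first character is one. You'll want to pass fscanf the following expressions: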
Your FILE *,
The format string,
An array of 3 chars silently converted to a pointer,
An array of 10 chars (9 characters plus the terminating '\0') silently converted to a pointer,
... and a pointer to int.
Store the return value of fscanf into an int. If fscanf returns negative, you have a problem such as EOF or some other read error. Otherwise, fscanf tells you how many objects it assigned values into. The "success" value you're looking for in this case is 3. Anything else means incorrectly formed input.
I suggest reading the fscanf manual for more information, and/or for clarification.
fscanf function is very powerful and can be used to solve your task:
We need to read two chars - the format is "%c%c".
Then skip a space (just add it to the format string) - "%c%c ".
Then read a string until we hit a comma. Don't forget to specify max string size. So, the format is "%c%c %10[^,]". 10 - max chars to read. [^,] - list of allowed chars. ^, - means all except a comma.
Then skip a comma - "%c%c %10[^,],".
And finally read an integer - "%c%c %10[^,],%d".
The last step is to be sure that all 4 tokens are read - check fscanf return value.
Here is the complete solution:
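FILE *f = fopen("input_file", "r");
if (f != NULL)
{
    int n;
    do
    {
        char c1 = 0;
        char c2 = 0;
        char str[11] = {0};
        int d = 0;
        n = fscanf(f, "%c%c %10[^,],%d", &c1, &c2, str, &d);
        if (n == 4)
        {
            // successfully got 4 values from the file
        }
        // skip the rest of the line so the next "%c" does not read the newline
        int ch;
        while ((ch = getc(f)) != '\n' && ch != EOF)
            ;
    }
    while (n != EOF);
    fclose(f);
}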

Get number of characters read by sscanf?

I'm parsing a string (a char*) and I'm using sscanf to parse numbers from the string into doubles, like so:
// char* expression;
double value = 0;
sscanf(expression, "%lf", &value);
This works great, but I would then like to continue parsing the string through conventional means. I need to know how many characters have been parsed by sscanf so that I may resume my manual parsing from the new offset.
Obviously, the easiest way would be to somehow calculate the number of characters that sscanf parses, but if there's no simple way to do that, I am open to alternative double parsing options. However, I'm currently using sscanf because it's fast, simple, and readable. Either way, I just need a way to evaluate the double and continue parsing after it.
You can use the format specifier %n and provide an additional int * argument to sscanf():
int pos;
sscanf(expression, "%lf%n", &value, &pos);
Description for format specifier n from the C99 standard:
No input is consumed. The corresponding argument shall be a pointer to signed integer into which is to be written the number of characters read from the input stream so far by this call to the fscanf function. Execution of a %n directive does not increment the assignment count returned at the completion of execution of the fscanf function. No argument is converted, but one is consumed. If the conversion specification includes an assignment suppressing character or a field width, the behavior is undefined.
Always check the return value of sscanf() to ensure that assignments were made, and subsequent code does not mistakenly process variables whose values were unchanged:
/* Number of assignments made is returned,
which in this case must be 1. */
if (1 == sscanf(expression, "%lf%n", &value, &pos))
{
/* Use 'value' and 'pos'. */
}
int i, j, k;
char s[20];
if (sscanf(somevar, "%d %19s %d%n", &i, s, &j, &k) != 3)
...something went wrong...
The variable k contains the character count up to the point where the end of the integer stored in j was scanned.
Note that the %n is not counted in the successful conversions. You can use %n several times in the format string if you need to.
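As a small illustration of resuming manual parsing at the offset reported by %n, the sketch below sums numbers separated by '+'. The function name sum_terms and the tiny "grammar" are invented for the example; only the sscanf call with "%lf%n" comes from the answers above.

```c
#include <stdio.h>

/* Sums numbers separated by '+', e.g. "1.5+2.25+3".
   Returns the number of terms read, or -1 on a syntax error. */
int sum_terms(const char *expression, double *sum)
{
    const char *p = expression;
    int terms = 0;

    *sum = 0.0;
    for (;;) {
        double value;
        int pos = 0;

        if (sscanf(p, "%lf%n", &value, &pos) != 1)
            return -1;          /* no number where one was expected */
        *sum += value;
        terms++;
        p += pos;               /* resume right after the number */

        if (*p == '\0')
            return terms;       /* reached the end of the expression */
        if (*p != '+')
            return -1;          /* manual parsing: only '+' is accepted here */
        p++;                    /* step over the operator */
    }
}
```

The return value of sscanf is checked before pos is used, because pos is only written when the scan actually reaches the %n directive.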

How to read a line and split into string and float values in a C program?

I have a file that contains 80 characters per line. I want to go to a particular line that starts with "ATOM".
I tried with fscanf(f1," %s%*[^\n]", rec) and compared rec using strcmp(rec,"ATOM"), but it reads the next line from the match.
I also tried with fscanf("line_format", variables), but this reads somewhere else from the file.
The line is
ATOM 1 N MET A 1 36.643 -24.862 8.890 1.00 24.11 N
From this I want to read character by character and assign it to variables. I have a problem with the float values and spaces. If I find a space in a place of particular variable how do I read it? How do I read the float values if there is no space between them?
You can read each line from the input file using fgets(), tokenise it using strtok() or strtok_r(), compare the first token to "ATOM", and then parse the rest of the tokens using atof() or atoi() to convert them to floating point or integer numbers if necessary.
That said, this is overkill for PDB files: the ATOM record has a well-defined structure with fixed-size fields, so any conformant PDB file is much easier to parse by position. You just pick the relevant substrings and pass them to atof() or atoi().
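A rough sketch of the substring approach for the three coordinates is shown below. The offsets are an assumption on my part, taken from the PDB format description (x in columns 31-38, y in 39-46, z in 47-54), so check them against your files; field_to_double and parse_atom_xyz are made-up names, and strtod is used in place of atof so that an empty field can be detected.

```c
#include <stdlib.h>
#include <string.h>

/* Zero-based offsets and width of the coordinate fields in an ATOM record.
   Assumption: x in columns 31-38, y in 39-46, z in 47-54 (1-based). */
#define X_OFFSET 30
#define Y_OFFSET 38
#define Z_OFFSET 46
#define COORD_WIDTH 8

/* Converts the fixed-width field starting at line[start] to a double.
   Returns 1 on success, 0 if the line is too short or the field holds no number. */
static int field_to_double(const char *line, size_t start, size_t width, double *out)
{
    char buf[32];
    char *end;

    if (width >= sizeof buf || strlen(line) < start + width)
        return 0;
    memcpy(buf, line + start, width);   /* cut the field out by position */
    buf[width] = '\0';
    *out = strtod(buf, &end);           /* strtod skips the padding spaces */
    return end != buf;
}

/* Extracts x, y and z from an ATOM line; returns 1 on success, 0 otherwise. */
int parse_atom_xyz(const char *line, double *x, double *y, double *z)
{
    if (strncmp(line, "ATOM", 4) != 0)
        return 0;
    return field_to_double(line, X_OFFSET, COORD_WIDTH, x)
        && field_to_double(line, Y_OFFSET, COORD_WIDTH, y)
        && field_to_double(line, Z_OFFSET, COORD_WIDTH, z);
}
```

Because each field is cut out by position, two values that run together with no space between them (for example "  36.643-124.862") are still separated correctly, which whitespace-based tokenising cannot do.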
I believe you had an error in your (not shown) line_format. You really should be able to just do:
if( fscanf(f1, "ATOM %d %s %s %s %d %f %f %f %f %f %s", /* ... */) == 11 )
{
/* store/analyze/print the parsed values */
}
Note of course that this runs the risk of overflowing the buffers that receive the %s conversions. You could use a more specific format string to limit the lengths.

Resources