I have a scientific application for which I want to input initial values at run time. I have the option to get them from the command line or from an input file. Either of these is fed to a generic parser that uses strtod() to return a linked list of initial values for each simulation run; I use either the command-line argument or getline() to read the values.
The question is, should I be rolling my own parser, or should I be using a parser-generator or some library? What is the standard method? This is the only data I will read at run time, and everything else is set at compile time (except for output files and a few other totally simple things).
Thanks,
Joel
Also check out strtof() for floats, strtod() for doubles.
sscanf() is probably the standard way to parse them. However, there are some problems with sscanf(), especially if you are parsing user input. And, of course, there is atof().
In general, I prefer to have data inputs come from a file (e.g. the initial conditions for the run, the total number of timesteps, etc), and flag inputs come from the command line (e.g. the input file name, the output file name, etc). This allows the files to be archived and used again, and allows comments to be embedded in the file to help explain the inputs.
If the input file has a regular format:
For parsing, read in a full line from the file, and use sscanf to "parse" the line into variables.
If the input file has an irregular format:
Fix the file format so that it is regular (if that is an option).
If not, then strtof and strtod are the best options.
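For that irregular case, here is a minimal sketch of the getline()/strtod() loop the question describes, building a linked list of values; the node type and all names are hypothetical:

    #define _POSIX_C_SOURCE 200809L   /* for getline() */
    #include <stdio.h>
    #include <stdlib.h>

    struct node { double value; struct node *next; };   /* hypothetical list node */

    int main(void) {
        char *line = NULL;
        size_t cap = 0;
        struct node *head = NULL;

        while (getline(&line, &cap, stdin) != -1) {
            const char *p = line;
            char *end;
            for (;;) {
                double v = strtod(p, &end);
                if (end == p)               /* no further number on this line */
                    break;
                struct node *n = malloc(sizeof *n);
                if (!n) { perror("malloc"); return 1; }
                n->value = v;
                n->next = head;             /* prepend, so the list is reversed */
                head = n;
                p = end;
            }
        }

        for (struct node *n = head; n; n = n->next)
            printf("%g\n", n->value);
        free(line);
        return 0;
    }

Note that strtod() stops at the first token that doesn't look like a number, so anything after it on that line is ignored.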
How can I modify my lex or yacc files to reproduce the input in the output file? I read statements from a file; for certain statements I want to add an invariant, insert it into the output, and then continue with the remaining statements. For example, I read this file:
char mem(d);
int fun(a,b);
char a ;
The output should be like:
char mem(d);
int fun(a,b);
invariant(a>b) ;
char a;
I can't get this to work: I can only write the new statements to the output file, not echo the original input.
It's useful to understand why this is a non-trivial question.
The goal is to
Copy the entire input to the output; and
Insert some extra information produced while parsing.
The problem is that the first of those needs to be done by the scanner (lexer), because the scanner doesn't usually pass every character through to the parser. It drops at least whitespace and comments, and it may do other things, like convert numbers to their binary representation, losing the original textual representation.
But the second one clearly needs to be done by the parser. And here is the problem: the parser is (almost) always one token behind the scanner, because it needs the lookahead token to decide whether or not to reduce. Consequently, by the time a reduction action gets executed, the scanner will already have processed all the input up to the end of the next token. If the scanner is echoing input to output, the place where the parser wants to insert data has already been output.
Two approaches suggest themselves.
First, the scanner could pass all of the input to the parser, by attaching extra data to every token. (For example, it could attach all whitespace and comments to the following token.) That's often used for syntax coloring and reformatting applications, but it can be awkward to get the tokens output in the right order, since reduction actions are effectively executed in a post-order walk.
Second, the scanner could just remember where every token is in the input file, and the parser could attach notes (such as additional output) to token locations. Then the input file could be read again and merged with the notes. Unfortunately, that requires that the input be rewindable, which would preclude parsing from a pipe, for example; a more general solution would be to copy the input into a temporary file, or even just keep it in memory if you don't expect it to be too huge.
Since you can already output your own statements, your problem is how to write out the input as it is being read in. In lex, the value of each token being read is available in the variable yytext, so just write it out for every token you read. Depending on how your lexer is written, this could be used to echo whitespace as well.
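For instance, a minimal flex sketch of that (the action code is plain C; ECHO is flex's built-in shorthand for writing yytext to yyout):

    %%
    [ \t\n]+                { ECHO; /* whitespace is copied, not dropped */ }
    [A-Za-z_][A-Za-z0-9_]*  { ECHO; /* a real scanner would also return a token here */ }
    .                       { ECHO; /* everything else is echoed verbatim */ }
    %%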
This might sound rather awkward, but I want to ask if there is a commonly practiced way of storing tabular data in a text file to be read and written in C.
In Python, for example, you can load a full text file into an array with f.readlines(), then go through all the lines and split each line on a specific character or sequence of characters (the delimiter).
How do you approach this problem in C?
Pretty much the same way you would in any other language. Pick a field separator (e.g., the tab character), open the text file for reading, and parse each line.
Of course, in C it will never be as easy as it is in Python, but approaches are similar.
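For example, a minimal sketch of that, using fgets() and strtok() with a tab separator; the file name is hypothetical:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        FILE *fp = fopen("table.txt", "r");      /* hypothetical file name */
        if (!fp) { perror("fopen"); return 1; }

        char line[1024];
        while (fgets(line, sizeof line, fp)) {
            line[strcspn(line, "\n")] = '\0';    /* strip the trailing newline */
            for (char *field = strtok(line, "\t"); field; field = strtok(NULL, "\t"))
                printf("[%s] ", field);          /* each field, in order */
            putchar('\n');
        }
        fclose(fp);
        return 0;
    }

Keep in mind that strtok() treats runs of delimiters as one, so it cannot report empty fields.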
Whoa. I am a bit baffled by the other answers which make me feel like I'm on Mainframes.stackexchange.com instead of stackoverflow.com
Why don't you pick a modern data format like JSON or XML and follow best practices for the data format of your choice?
If you want a good JSON reader/writer for C, I've used Jansson, and it's very easy and fast.
If you want a good XML reader/writer for C, I've used miniXML and it's also easy and fast. It has both SAX and DOM support, depending on how you want to read in the XML.
Obviously there are a wealth of other libraries available as well.
Please don't give the next guy to come along and support your program some wacky custom file format to deal with.
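For instance, here is a minimal sketch of reading one value with Jansson; the file name and key are hypothetical:

    #include <stdio.h>
    #include <jansson.h>

    int main(void) {
        json_error_t error;
        json_t *root = json_load_file("config.json", 0, &error);  /* hypothetical file */
        if (!root) {
            fprintf(stderr, "JSON error on line %d: %s\n", error.line, error.text);
            return 1;
        }
        json_t *steps = json_object_get(root, "timesteps");       /* hypothetical key */
        if (json_is_integer(steps))
            printf("timesteps = %lld\n", (long long)json_integer_value(steps));
        json_decref(root);
        return 0;
    }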
I find getline() and strtok() to be quite convenient (getline() was a GNU extension, standardized in POSIX.1-2008).
There's a handful of mechanisms, but there's a reason why scripting languages have become so popular over the last twenty years -- some of the tasks that seem simple in scripting languages are ponderous in C.
You could use flex and bison to write a parser for your tables. This really only works if the format is very well defined and "static". They're amazing tools that can do more than you might suspect, but it is very heavy machinery for what could be done simply with a split() in a scripting language.
You could read individual fields using getdelim(3). However, this was only standardized in POSIX.1-2008, so it is far from ubiquitous. (Every Linux machine with glibc should have it.)
You could read lines with fgets(3) and discover the split locations using strchr(3).
You could read lines with fgets(3) and use strtok(3) to tokenize strings.
You can use scanf(3) to perform input and scanning in one go; it seems from the questions here that scanf(3) is difficult to use correctly.
You could use a character-at-a-time parsing approach: read characters using getc(3), inspect each one, do something with it, and iterate until there are no more characters.
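As an illustration of that last approach, a minimal character-at-a-time sketch that counts the fields on each line of tab-separated input from stdin (a toy example, not a full parser):

    #include <stdio.h>

    int main(void) {
        int c, fields = 1;
        while ((c = getc(stdin)) != EOF) {
            if (c == '\t')
                fields++;          /* a separator ends one field */
            else if (c == '\n') {
                printf("%d fields\n", fields);
                fields = 1;        /* reset for the next line */
            }
        }
        return 0;
    }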
I have taken up a project and I would like some help. Basically it is a program to check whether some pins are connected or not on a board.
(Well, that's the simplified version. The whole thing is a circuit with a microcontroller.)
The problem is that when a pin is connected I get a numeric value, and when it's not connected I get no value; it's a blank in my table.
How can I accept these values?
I need to accept even the blank, to know that it's not connected, and the table contains some other non-numeric values as well.
I tried reading the file using the fscanf() function but it didn't quite work. I'm aware of only fscanf(), fread(), fgets() and fgetc() functions to read from different kinds of files.
Also, is it possible to read data from an Excel file using C?
An example of the table is:
FROM    TO
1       39
2
Here, the numbers 1 and 2 are under the FROM column, and each tells which pin the first end of a connector is attached to. The numbers under TO tell which pin the other end of the connector is attached to; when that column is blank, the connector is not attached at one end.
Now what I'm trying to do is write a program that generates an assembly language program for the microcontroller, so I need to be able to read whether each connector is connected, and if it is, to which pin. Accordingly, I need to perform some operations (which I can manage by myself).
The difficulty I'm facing is reading from a specific line and reading the blank.
Read the lines using fgets() or a relative. Then use sscanf() on the line, checking to see whether there were one or two successful conversions (the return value). If there's one conversion, the second value was empty or missing; if two, then you have both numbers safely.
Note that fscanf() and relatives will read past newlines unless you're careful, so they do not provide the information you need.
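A minimal sketch of that, assuming the two-column table from the question, complete with its header line; the file name is hypothetical:

    #include <stdio.h>

    int main(void) {
        FILE *fp = fopen("pins.txt", "r");       /* hypothetical file name */
        if (!fp) { perror("fopen"); return 1; }

        char line[256];
        fgets(line, sizeof line, fp);            /* skip the "FROM TO" header */
        while (fgets(line, sizeof line, fp)) {
            int from, to;
            int n = sscanf(line, "%d %d", &from, &to);
            if (n == 2)
                printf("pin %d connects to pin %d\n", from, to);
            else if (n == 1)
                printf("pin %d is not connected\n", from);
        }
        fclose(fp);
        return 0;
    }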
So your file is more like this:

Col1 col2\n
r1val1 r1val2\n
.
.

and so on. If this is the case, then use fscanf() to read the string (up to \n) from the file, then use the strtok() function to break the string into tokens. Here is a tutorial on that:
http://www.gnu.org/s/hello/manual/libc/Finding-Tokens-in-a-String.html
hope this helps...
One more humble suggestion: just work on C programming first if you are a newbie; don't go directly to microcontrollers, as there are lots of things you might understand in a wrong way if you don't know some of the basic concepts.
This is a common problem in C. When line boundaries carry meaning in the grammar, it's difficult to directly read the file using only the scanf()-family functions.
Just read each line with fgets(3) and then run sscanf() on one line at a time. By doing this you won't incorrectly jump ahead to read the next line's first column.
Since there are two values on a line, you can parse the first, skip the intervening whitespace, then parse the second, checking for its absence. I say parse rather than scanf() because when I really want control, or have a huge volume of numbers to scan, I use calls in the strtol() family.
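A minimal sketch of that strtol()-style approach, using the end pointer to detect the missing second value (strtol() skips leading whitespace by itself):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        const char *line = "2   ";     /* e.g. a row whose TO column is blank */
        char *p, *q;

        long from = strtol(line, &p, 10);
        if (p == line) {               /* nothing converted: no FROM value */
            fprintf(stderr, "no FROM value on this line\n");
            return 1;
        }
        long to = strtol(p, &q, 10);
        if (q == p)                    /* nothing converted: blank TO column */
            printf("pin %ld is not connected\n", from);
        else
            printf("pin %ld connects to pin %ld\n", from, to);
        return 0;
    }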
Hey guys!
Is there any way of directly accessing a cell in a .csv file using C?
E.g., I want to sum up a column using C; how do I do it?
It's probably easiest to use the scanf family for this, but it depends a little on how your data is organized. Let's say you have three columns of numeric data and you want to sum up the third column. You could loop over a statement like this (file is a FILE* opened with fopen(), and you loop until end of file is reached):
    int n;
    fscanf(file, "%*d,%*d,%d", &n);
and sum up the ns. If you have other kinds of data in your file, you need to specify your format string accordingly. If different lines have different kinds of data, you'll probably need to search the string for separators instead and pick the third interval.
That said, it's probably easier not to use C at all; e.g., perl or awk will probably do a better job :) but I suppose that's not an option.
If you have to use C: read the entire line into memory, count ',' characters until you reach your desired column, read the value and sum it, then go to the next line.
When you reach your value, you can use sscanf to read it.
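A minimal sketch of that approach, counting commas to reach the third column and summing it; the file name is hypothetical, and every row is assumed to have at least three fields:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        FILE *fp = fopen("data.csv", "r");       /* hypothetical file name */
        if (!fp) { perror("fopen"); return 1; }

        double sum = 0.0;
        char line[1024];
        while (fgets(line, sizeof line, fp)) {
            char *p = line;
            for (int i = 0; i < 2 && p; i++) {   /* skip two commas: third column */
                p = strchr(p, ',');
                if (p) p++;                      /* step past the comma itself */
            }
            double v;
            if (p && sscanf(p, "%lf", &v) == 1)
                sum += v;
        }
        fclose(fp);
        printf("column sum: %g\n", sum);
        return 0;
    }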
You might want to start by looking at RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files, and then looking for implementations of the same. (Be aware though, that the notion of comma separated values predates the RFC and there are many implementations that do not comply with that document.)
I find:
ccsv
and not many others in plain C. There are quite a few C++ implementations, and most of them are probably readily adapted to C.
I am working on a small text replacement application that basically lets the user select a file and replace text in it without ever having to open the file itself. However, I want to make sure that the function only runs for files that are text-based. I thought I could accomplish this by checking the encoding of the file, but I've found that Notepad .txt files use Unicode UTF-8 encoding, and so do MS Paint .bmp files. Is there an easy way to check this without placing restrictions on the file extensions themselves?
Unless you get a huge hint from somewhere, you're stuck. Purely by examining the bytes there's a non-zero probability you'll guess wrong given the plethora of encodings ("ASCII", Unicode, UTF-8, DBCS, MBCS, etc). Oh, and what if the first page happens to look like ASCII but the next page is a btree node that points to the first page...
Hints can be:
extension (not likely that foo.exe is editable)
something in the stream itself (like a BOM [byte-order mark])
user direction (just edit the file, goshdarnit)
Windows used to provide an API IsTextUnicode that would do a probabilistic examination, but there were well-known false-positives.
My take is that trying to be smarter than the user has some issues...
Honestly, given the Windows environment that you're working with, I'd consider a whitelist of known text formats. Windows users are typically trained to stick with extensions. However, I would personally relax the requirement that it not function on non-text files, instead checking with the user for a go-ahead if the file does not match the internal whitelist. The risk of changing a binary file would be mitigated if your search string is long - that is, assuming you're not performing a Y2K conversion (a la sed 's/y/k/g').
It's pretty costly to determine if a file is text-based or not (i.e. a binary file). You would have to examine each byte in the file to determine if it is a valid character, irrespective of the file encoding.
Others have said to look at all the bytes in the file and see if they're alphanumeric. Some UNIX/Linux utils do this, but just check the first 1K or 2K of the file as an "optimistic optimization".
Well, a text file contains text, right? So a really easy way to check whether a file contains only text is to read it and check whether every character is alphanumeric (or allowed whitespace).
So basically, the first thing you have to do is check the file encoding. If it's pure ASCII, you have an easy task: just read the whole file into a char array (I'm assuming you are doing it in C/C++ or similar) and check every char in that array with functions like isalpha() and isdigit(). Of course, you have to take care of special exceptions like the tab '\t', the space ' ', and the newline ('\n' on Linux, "\r\n" on Windows).
In the case of a different encoding the process is the same, except that you have to use different functions to check whether the current character is alphanumeric. Also note that with UTF-16 or wider encodings a simple char array is simply too small; but if you are doing it in C#, for example, you don't have to worry about the size :)
You can write a function that will try to determine if a file is text based. While this will not be 100% accurate, it may be just enough for you. Such a function does not need to go through the whole file, about a kilobyte should be enough (or even less). One thing to do is to count how many whitespaces and newlines are there. Another thing would be to consider individual bytes and check if they are alphanumeric or not. With some experiments you should be able to come up with a decent function. Note that this is just a basic approach and text encodings might complicate things.
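A minimal sketch of such a heuristic; the helper name and the one-kilobyte sample size are arbitrary, and in the C locale this rejects non-ASCII text such as UTF-8, so loosen it as needed:

    #include <ctype.h>
    #include <stdio.h>

    /* Heuristic: call it text if every sampled byte is printable or common
       whitespace. Deliberately crude; not reliable for UTF-16 and friends. */
    int is_probably_text(FILE *fp) {
        unsigned char buf[1024];
        size_t n = fread(buf, 1, sizeof buf, fp);
        for (size_t i = 0; i < n; i++) {
            unsigned char c = buf[i];
            if (!isprint(c) && c != '\t' && c != '\n' && c != '\r')
                return 0;       /* control or non-ASCII byte: likely binary */
        }
        return 1;
    }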