How to save a 2-Dimensional array to a file in C? - c

I am an beginner C programmer and I am currently working on a project to implement viola jones object detection algorithm using C. I would like to know how I would be able to store data in a 2-Dimensional array to a file that can be easily ported and accessed by different program files(e.g. main.c, header_file.h etc.)
Thank you in advance.

There's not quite enough detail to be sure what you're looking for, but the basic structure of what you want to do is going to look something like this:
open file.csv for writing
for(iterate through one dimension of the array using i)
{
for(iterate through the other dimension of the array using j)
{
fprintf(yourfilehandle,"%d,",yourvalue[i][j]);
}
fprintf(yourfilehandle,"\n");
}
close your file
As has been suggested by others, this will leave you with a .CSV file, which is a pretty good choice, as it's easy to read in and parse, and you can open your file in Notepad or Excel and view it no problems.
This is assuming you really meant to do this with C file I/O, which is a perfectly valid way of doing things, some just feel it's a bit dated.
Note this leaves an extraneous comma at the end of the line. If that bugs you it's easy enough to do the pre and post conditions to only get commas where you want. Hint: it involves printing the comma before the entry inside the second for loop, reducing the number of entries you iterate over for the interior for loop, and printing out the first and last case of each row special, immediately before and after the inner for loop, respectively. Harder to explain that to do, probably.
Here is a reference for C-style file I/O, and here is a tutorial.

Without knowing anything about what type of data you're storing, I would say to store this as a matrix. You'll need to choose a delimiter to separate your elements (tab or space are common choices, aka 'tsv' and 'csv', respectively) and then something to mark the end of a row (new line is a good choice here).
So your saved file might look something like:
10 162 1 5
7 1 4 12
9 2 2 0
You can also define your format as having some metadata in the first line -- the number of rows and columns may be useful if you want to pre-allocate memory, along with other information like character encoding. Start simple and add as necessary!

Related

Standard format for writing Compressed Row Storage into text files?

I have a large sparse matrix stored in Compressed Row Storage (CRS) format. This is basically three arrays: an array containing the Values, an array for Column Index, and a final array containing the Row Pointers. E.g. http://web.eecs.utk.edu/~dongarra/etemplates/node373.html
I want to write this information into a text (.txt) file, which is intended to be read and put into three arrays using C. I currently plan to do this by writing all the entries in the Value array in one long line separated by commas. E.g. 5.6,10,456,78.2,... etc. Then do the same for the other two arrays.
My C code will end read the first line, put all the values into an array labeled "Value". And so on.
Question
Is this "correct"? Or is there a standard way of putting CRS data into text files?
No standard format that I'm aware of. You decide on a format that makes your life easy.
First, consider that if you want to look at one of these text files, you'll be instantly put off by the long lines. Some text editors might simply hate you. There's nothing wrong with splitting lines up.
Second, consider writing out the number of elements in each array (well, I suppose there's only two different array lengths for the three arrays) at the beginning of the file. This will let you preallocate your arrays. If you have all array lengths at hand, you have the option of doing a single memory allocation.
Finally, consider writing out some sensible tag names. Some kind of header that can identify your file is the correct format, then something to denote the start of each array. It's kind of a sanity thing for your code to detect problems with the file. It might just be one character, but it's something.
Now... call me a grungy old programmer, but I'd probably just write whole lot in binary. Especially if it's floating point data, I wouldn't want to deal with the loss of precision you get when you write out numbers as text (or the space they can consume when you write them with full precision). Binary files are easy to write and quick to run. You just have to be careful if you're going to be using them across platforms with different endian order.
That's my 2 cents worth.. Hope it's useful to you.
If you want to stick to some widely-used standards, have a look at the Matrix Market. This is a repository with many matrices arising in a variety of engineering and science problems. You can find software libraries to save and read the matrices as well.

How to take numbers and words out of a txt file and assign them to int and char*

I am making a text based game and want the user to be able to save. When they save all the variables will be saved to a text file.
I can't figure out how to take them out of the file and assigning them to specific variables and pointers.
The file will look somewhat like this:
jesse
hello
yes
rifle
0
1
3
20
Is there anyway I can specify what line I want to take out with fscanf? Or do I have to take a different approach?
There is no way to specify what line to read from because the concept of a file stream in C does not explicitly distinguish new lines. They are simply treated as a character. To read from a specific line, you would have to loop forward with fseek and fgetc until you find '\n' at which point you can update some variable that holds the current line number the stream points to.
One way around this would be to have information at a fixed offset. For example, say you are storing player information then if you make player information a fixed size X and have the constituent data at fixed indexes into each structure, you can just fseek to the right location straight away.
However, if you have structured data, it may be more suitable to use a format which is able to represent these structures inherently such as XML or JSON.
Altough I can't exactly tell what you want, I'd do a few suggestions:
Use a SQLite file instead of a text file. This way you can use SQL to get exactly what you want. Shortcut for you: http://www.sqlite.org/
If you still want to use a text file, use it comma-separated instead of spaces-separated. It's more common of a practice.
I have created a simple settings reader for my C program, maybe it might be useful to you to know how to parse test files
https://codereview.stackexchange.com/questions/8620/coding-style-in-c

How to represent a random-access text file in memory (C)

I'm working on a project in which I need to read text (source) file in memory and be able to perform random access into (say for instance, retrieve the address corresponding to line 3, column 15).
I would like to know if there is an established way to do this, or data structures that are particularly good for the job. I need to be able to perform a (probably amortized) constant time access. I'm working in C, but am willing to implement higher level data structures if it is worth it.
My first idea was to go with a linked list of large buffer that will hold the character data of the file. I would also make an array, whose index are line numbers and content are addresses corresponding to the begin of the line. This array would be reallocated on need.
Subsidiary question: does anyone have an idea the average size of a source file ? I was surprised not to find this on google.
To clarify:
The file I'm concerned about are source files, so their size should be manageable, they should not be modified and the lines have variables length (tough hopefully capped at some maximum).
The problem I'm working on needs mostly a read-only file representation, but I'm very interested in digging around the problem.
Conlusion:
There is a very interesting discussion of the data structures used to maintain a file (with read/insert/delete support) in the paper Data Structures for Text Sequences.
If you just need read-only, just get the file size, read it in memory with fread(), then you have to maintain a dynamic array which maps the line numbers (index) to pointer to the first character in the line. Someone below suggested to build this array lazily, which seems a good idea in many cases.
I'm not quite sure what the question is here, but there seems to be a bit of both "how do I keep the file in memory" and "how do I index it". Since you need random access to the file's contents, you're probably well advised to memory-map the file, unless you're tight on address space.
I don't think you'll be able to avoid a linear pass through the file once to find the line endings. As you said, you can create an index of the pointers to the beginning of each line. If you're not sure how much of the index you'll need, create it lazily (on demand). You can also store this index to disk (as offsets, not pointers) if you will need it on subsequent runs. You can estimate the size of the index based on the file size and the expected line length.
1) Read (or mmap) the entire file into one chunk of memory.
2) In a second pass create an array of pointers or offsets pointing to the beginnings of the lines (hint: one after the '\n' ) into that memory.
Now you can index the array to access a specific line.
It's impossible to make insertion, deletion, and reading at a particular line/column/character address all simultaneously O(1). The best you can get is simultaneous O(log n) for all of these operations, and it can be achieved using various sorts of balanced binary trees for storing the file in memory.
Of course, unless your files will be larger than 100 kB or so, you're probably best off not bothering with anything fancy and just using a flat linear buffer...
solution: If lines are about same size, make all lines equally long by appending needed number of metacharacters to each line. Then you can simply calculate the fseek() position from line number, making your search O(1).
If lines are sorted, then you can perform binary search, making your search O(log(nõLines)).
If neither, you can store the indexes of line begginings. But then, you have a problem if you modify file a lot, because if you insert let's say X characters somewhere, you have to calculate which line it is, and then add this X to the all next lines. Similar with with deletion. Yu essentially get O(nõLines). And code gets ugly.
If you want to store whole file in memory, just create aray of lines *char[]. You then get line by first dereference and character by second dereference.
As an alternate suggestion (although I do not fully understand the question), you might want to consider a struct based, dynamically linked list of dynamic strings. If you want to be astutely clever, you could build a dynamically linked list of chars which you then export as strings.
You'd have to use OO type design for this to be manageable.
So structs you'd likely want to build are:
DynamicArray;
DynamicListOfArrays;
CharList;
So it goes:
CharList(Gets Chars/Size) -> (SetSize)DynamicArray -> (AddArray)DynamicListOfArrays
If you build suitable helper functions for malloc and delete, and make it so the structs can either delete themselves automatically or manually. Using the above combinations won't get you O(1) read in (which isn't possible without the files have a static format), but it will get you good time.
If you know the file static length (at least individual line wise), IE no bigger than 256 chars per line, then all you need is the DynamicListOfArries - write directly to the array (preset to 256), create a new one, repeat. Downside is it wastes memory.
Note: You'd have to convert the DynamicListOfArrays into a 'static' ArrayOfArrays before you could get direct point-to-point access.
If you need source code to give you an idea (although mine is built towards C++ it wouldn't take long to rewrite), leave a comment about it. As with any other code I offer on stackoverflow, it can be used for any purpose, even commercially.
Average size of a source file? Does such a thing exist? A source file could go from 0 bytes to thousands of bytes, like any text file, it depends on the number of caracters it contains

Efficient random access within a file? [C]

I have a text file I use to hold an index of files and words (with their frequencies) that appear in them. I need to read the file into memory and store the words so they can be searched. The file is formatted as follows:
<files> 169
0:file0.txt
1:file1.txt
2:file2.txt
3:file3.txt
... etc ...
</files>
<list> word 2
9: 10
1: 2
</list>
<list> word2 4
3: 19
5: 12
0: 2
8: 2
</list>
... etc ...
The problem is that this index file can become extremely large and won't all fit into memory at once. My solution is to only store a handful of them in a HashTable at once and then when I need to get the data for another word, I would kick an old word out and then parse the data for the new word from a file.
How can I efficiently accomplish this in C? I was thinking I would have to do something with fseek and rewinding once I got to certain points.
Thanks,
Mike
Although C has poor string support - from what I can tell looking at the sample, it has a distinct pattern, re-parsing this from disk would be practical.
I would however consider converting the file into a database and work from there. Unless there is reason not to, pull in a third party database engine.
If you decide to go for re parsing the text file, It does not look too difficult. First pass store the start locations of each list, as a pair. Then all you do is seek to the index to read the data for a particular word.
If your efficiency concern is how long it will take the computer to do the parsing, forget it, work out what is easiest for you. Don't optimize till you know you need to. Computers are fast and cheap, programmers aren't.
Like mattnz pointed out, this is best achieved using separate database layer. You can try SQlite. There is almost zero setup and is very stable. Otherwise, if you want to do this in C, you can have a header in beginning of file with links/indexes to each section of the file. Section being <files>..</files>, <list>..</list>. This is just on top of my head. If you read any book on implementing databases, you can find many more techniques.
It ended up that the best way to do this (for my needs) was to keep a pointer to current location in the file and the use rewind( FILE *f ); when I reached the end.

Access a cell directly in .csv file using C programming

hey guys!
is there any way of directly accessing a cell in a .csv file format using C?
e.g. i want to sum up a column using C, how do i do it?
It's probably easiest to use the scanf-family for this, but it depends a little on how your data is organized. Let's say you have three columns of numeric data, and you want to sum up the third column, you could loop over a statement like this: (file is a FILE*, and is opened using fopen, and you loop until end of file is reached)
int n; fscanf(file, "%*d,%*d,%d", &n);
and sum up the ns. If you have other kinds of data in your file, you need to specify your format string accordingly. If different lines have different kinds of data, you'll probably need to search the string for separators instead and pick the third interval.
That said, it's probably easier not to use C at all, e.g. perl or awk will probably do a better job, :) but I suppose that's not an option.
If you have to use C: read the entire line to memory, go couting "," until you reach your desired column, read the value and sum it, go to next line.
When you reach your value, you can use sscanf to read it.
You might want to start by looking at RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files, and then looking for implementations of the same. (Be aware though, that the notion of comma separated values predates the RFC and there are many implementations that do not comply with that document.)
I find:
ccsv
And not many others in plain c. There are quite a few c++ implementations, and most of the are probably readily adapted to c.

Resources