Reading a list of numbers separated by comma into an array in C - c

I have a file of the following form
-1,1.2
0.3,1.5
Basically a list of vectors, where the dimension of the vectors is known but the number of vectors isn't. I need to read each vector into an array. In other words I need to turn
-1,1.2
into an array of doubles so that vector[0] == -1 , vector[1] == 1.2
I'm really not sure how to start.

There are three parts to the problem:
Getting access to the data in the file, i.e. opening it
Reading the data in the file
Tidying up, i.e. closing the file
The first and last parts are covered in this tutorial, along with a couple of other things.
The middle part can be done using formatted input; here's an example. As long as the input is well formatted, i.e. it is in the format you expect, this will work OK. If the file contains formatting errors, this becomes trickier: you need to check the file for formatting errors before converting the data.
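To make the middle part concrete, here is a minimal sketch of one way to do it, assuming the dimension is fixed at compile time. It uses fgets plus strtod rather than fscanf so that a malformed line can be detected and skipped. The names DIM, parse_vector and read_vectors are made up for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>

#define DIM 2  /* assumed vector dimension, known in advance */

/* Parse one line such as "-1,1.2" into out[0..DIM-1].
 * Returns 1 on success, 0 if the line is malformed.
 * Anything after the last component is ignored. */
int parse_vector(const char *line, double out[DIM])
{
    const char *p = line;
    for (int i = 0; i < DIM; i++) {
        char *end;
        out[i] = strtod(p, &end);
        if (end == p)            /* no number found at this position */
            return 0;
        p = end;
        if (i < DIM - 1) {
            if (*p != ',')       /* expected a comma between components */
                return 0;
            p++;
        }
    }
    return 1;
}

/* Read every well-formed vector from fp into a growing flat array:
 * vector k occupies result[k*DIM] .. result[k*DIM + DIM - 1].
 * The caller frees the result; *count receives the number of vectors. */
double *read_vectors(FILE *fp, size_t *count)
{
    double *vecs = NULL;
    size_t n = 0, cap = 0;
    char line[256];              /* assumes lines shorter than 256 chars */

    while (fgets(line, sizeof line, fp)) {
        double v[DIM];
        if (!parse_vector(line, v))
            continue;            /* skip malformed lines */
        if (n == cap) {          /* grow the array when it is full */
            size_t new_cap = cap ? cap * 2 : 16;
            double *tmp = realloc(vecs, new_cap * DIM * sizeof *tmp);
            if (!tmp)
                break;           /* out of memory: return what we have */
            vecs = tmp;
            cap = new_cap;
        }
        for (int i = 0; i < DIM; i++)
            vecs[n * DIM + i] = v[i];
        n++;
    }
    *count = n;
    return vecs;
}
```

Open the file with fopen, pass the FILE* to read_vectors, then fclose it and free the returned array when you are done.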

Related

writing float as bytearray produces newline in python

Short version first, long version will follow :
Short :
I have a 2D matrix of float32 values that I want to write to a .txt file as a bytearray. I also want to keep the structure, which means adding a newline character at the end of each row. Some numbers, like 683.61, include \n when converted to a bytearray, which produces an undesired newline character and breaks reading the file line by line. How can I do this?
Long :
I am writing a program to work with huge arrays of data (2D matrices). For that purpose, I need the array stored on disk rather than in RAM, as the data might be too big for the computer's RAM. I created my own file type, which is going to be read by the program. It has a header with important parameters as bytes, followed by the matrix as bytearrays.
As I write the data to the file one float32 at a time, I add a newline (\n) character at the end of each row of the matrix, so I keep the structure.
Writing goes well, but reading causes issues, as some numbers, once converted to a bytearray, include \n.
As an example :
struct.pack('f',683.61)
will yield
b'\n\xe7*D'
This cuts my matrix rows, and sometimes cuts in the middle of a bytearray, making the bytearray size wrong.
From this question :
Python handling newline and tab characters when writing to file
I found out that a str can be encoded with 'unicode_escape' to double the backslash and avoid confusion when reading.
Some_string.encode('unicode_escape')
However, this method only works on strings, not bytes or bytearrays (I tried it). This means I can't use it when I directly convert a float32 to a bytearray and write it to a file.
I have also tried to convert the float to a bytearray, decode the bytearray as a str and re-encode it like so:
struct.pack('f',683.61).decode('utf-8').encode('unicode_escape')
but decode fails on this bytearray, because these bytes are not valid UTF-8.
I have also tried converting the bytearray to a string directly, then encoding it like so:
str(struct.pack('f',683.61)).encode('unicode_escape')
This yields a mess from which it is possible to get the right bytes with this :
bytes("b'\\n\\xe7*D'"[2:-1],'utf-8')
And finally, when I actually read the byte array, I obtain two different results depending on whether unicode_escape has been used or not:
numpy.frombuffer(b'\n\xe7*D', dtype=float32)
yields : array([683.61], dtype=float32)
numpy.frombuffer(b'\\n\\xe7*D', dtype=float32)
yields : array([1.7883495e+34, 6.8086554e+02], dtype=float32)
I am expecting the top result, not the bottom one. So I am back to square one.
--> How can I encode my matrix of floats as a bytearray, on multiple lines, without being affected by newline character in the bytearrays?
F.Y.I. I decode the bytearray with numpy as this is the working method I found, but it might not be the best way. Just starting to play around with bytes.
Thank you for your help. If there is any issue with my question, please let me know and I will gladly rewrite it.
You either write your data as binary data, or you use newlines to keep it readable - it does not even make sense otherwise.
When you are trying to record "bytes" to a file, and have float32 values raw as a 4 byte sequence, each of those bytes can, of course, have any value from 0-255 - and some of these will be control characters.
The alternatives are to serialize to a format that will encode your byte values to characters in the printable ASCII range, like base64, JSON, or even pickle, using protocol 0.
Perhaps what will be most comfortable for you is just to write your raw bytes to a binary file, and change the programs you use to interact with it: use a hex editor like "hexedit" or Midnight Commander. Both will allow you to browse your bytes by their hexadecimal representation in a comfortable way, and will display any ASCII-text sequences inside the files.
For anyone with the same question I had, trying to keep the readline function working with bytes: the previous answer from @jsbueno got me thinking of alternative ways to proceed rather than modifying the bytes.
Here is an alternative if, like me, you are making your own file with data as bytes: write your own readline() function based on the classic read() function, but with a customized "newline character". Here is what I worked out:
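def readline(file, newline=b'Some_byte', size=None):
    """Read bytes from file up to and including a custom newline marker."""
    buffer = bytearray()
    # Read one byte at a time until the marker, the size limit, or end of file.
    while size is None or len(buffer) < size:
        chunk = file.read(1)
        if not chunk:  # end of file: stop instead of looping forever
            break
        buffer += chunk
        if buffer.endswith(newline):
            break
    return buffer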

Writing matrix on binary file C

I am working in C with some binary files using the famous commands fwrite/fread.
I have to write pairs of numbers, one pair per line, like this:
double values[2];
for (int i = 0; i < numPairs; i++) {
    values[0] = rand();
    values[1] = rand();
    fwrite(values, sizeof(double), 2, myFile);
}
where myFile is (as its name suggests) a file I've opened using fopen().
I have a couple of questions, though:
In a binary file, is it possible to write 2 numbers on the same line?
If so, will this command do the trick? I've been searching around for answers but I wasn't able to find anything that confirms this point. It's OK with arrays and such, but for matrices...?
A binary file does not have a concept of "lines" - it's entirely up to your program.
Currently you write numPairs*2 doubles to the file, two at a time. You could equally well write each double individually, or store them all in an array and write them all with one call to fwrite.
Likewise, the reading program is free to read them individually, or two at a time, or all at once.
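As a rough sketch of that freedom, the functions below write all pairs with a single fwrite and then read them back either all at once or one pair at a time by seeking. The function names are made up for this example, and the file is assumed to be written and read on the same platform (same double representation and byte order).

```c
#include <stdio.h>

/* Write n_pairs pairs stored back to back in values (2 * n_pairs doubles)
 * with one fwrite call. Returns the number of complete pairs written. */
size_t write_pairs(FILE *fp, const double *values, size_t n_pairs)
{
    return fwrite(values, 2 * sizeof(double), n_pairs, fp);
}

/* Read up to n_pairs pairs from the current file position.
 * Returns the number of complete pairs read. */
size_t read_pairs(FILE *fp, double *values, size_t n_pairs)
{
    return fread(values, 2 * sizeof(double), n_pairs, fp);
}

/* Read only pair number `index` (0-based) by seeking to its byte offset.
 * Every pair has the same size, so no "line" marker is needed.
 * Returns 1 on success, 0 on failure. */
int read_pair_at(FILE *fp, long index, double pair[2])
{
    if (fseek(fp, index * (long)(2 * sizeof(double)), SEEK_SET) != 0)
        return 0;
    return fread(pair, sizeof(double), 2, fp) == 2;
}
```

The file should be opened in binary mode ("wb" / "rb") so that no newline translation is applied to the bytes.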
I think you mean a text file, since binary files don't have columns or rows, just 1's and 0's, which are only readable by computers.

How to save a 2-Dimensional array to a file in C?

I am a beginner C programmer and I am currently working on a project to implement the Viola-Jones object detection algorithm in C. I would like to know how I can store the data in a 2-dimensional array to a file that can be easily ported and accessed by different program files (e.g. main.c, header_file.h etc.)
Thank you in advance.
There's not quite enough detail to be sure what you're looking for, but the basic structure of what you want to do is going to look something like this:
open file.csv for writing
for(iterate through one dimension of the array using i)
{
    for(iterate through the other dimension of the array using j)
    {
        fprintf(yourfilehandle,"%d,",yourvalue[i][j]);
    }
    fprintf(yourfilehandle,"\n");
}
close your file
As has been suggested by others, this will leave you with a .CSV file, which is a pretty good choice, as it's easy to read in and parse, and you can open your file in Notepad or Excel and view it with no problems.
This is assuming you really meant to do this with C file I/O, which is a perfectly valid way of doing things, some just feel it's a bit dated.
Note this leaves an extraneous comma at the end of the line. If that bugs you, it's easy enough to add the pre and post conditions to only get commas where you want. Hint: it involves printing the comma before the entry inside the second for loop, reducing the number of entries you iterate over in the interior for loop, and printing the first and last case of each row specially, immediately before and after the inner for loop, respectively. Harder to explain than to do, probably.
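For what it's worth, here is a small runnable sketch of the loop without the trailing comma. It uses a slightly simpler trick than the one hinted at above: the separator is printed before every entry except the first one in the row. The function name write_csv and the flat row-by-row layout of the array are assumptions made for this example.

```c
#include <stdio.h>

/* Write a rows x cols matrix of ints, stored row by row in data, as CSV.
 * Returns 0 on success, -1 on a write error. */
int write_csv(FILE *fp, const int *data, size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            /* Print the comma before every entry except the first in a row. */
            if (fprintf(fp, "%s%d", j ? "," : "", data[i * cols + j]) < 0)
                return -1;
        }
        if (fputc('\n', fp) == EOF)
            return -1;
    }
    return 0;
}
```

A true 2D array such as int m[2][3] can be passed as &m[0][0], since its elements are contiguous in memory.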
Here is a reference for C-style file I/O, and here is a tutorial.
Without knowing anything about what type of data you're storing, I would say to store this as a matrix. You'll need to choose a delimiter to separate your elements (tab, comma, and space are common choices; tab and comma give you 'tsv' and 'csv' files, respectively) and then something to mark the end of a row (a newline is a good choice here).
So your saved file might look something like:
10 162 1 5
7 1 4 12
9 2 2 0
You can also define your format as having some metadata in the first line -- the number of rows and columns may be useful if you want to pre-allocate memory, along with other information like character encoding. Start simple and add as necessary!
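Here is a minimal sketch of reading such a file, assuming the metadata line is simply "rows cols" and the entries are whitespace-separated integers. That header layout and the name read_matrix are assumptions for this example, not a standard.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a matrix whose first line is "rows cols", followed by rows * cols
 * whitespace-separated integers. Returns a malloc'd row-major array
 * (entry (i, j) is at data[i * cols + j]), or NULL on error.
 * Note: fscanf treats newlines like any other whitespace, so this does
 * not check that each row really sits on its own line. */
int *read_matrix(FILE *fp, size_t *rows, size_t *cols)
{
    if (fscanf(fp, "%zu %zu", rows, cols) != 2 || *rows == 0 || *cols == 0)
        return NULL;

    size_t n = *rows * *cols;       /* overflow check omitted for brevity */
    int *data = malloc(n * sizeof *data);   /* pre-allocate from the header */
    if (!data)
        return NULL;

    for (size_t k = 0; k < n; k++) {
        if (fscanf(fp, "%d", &data[k]) != 1) {  /* too few or bad entries */
            free(data);
            return NULL;
        }
    }
    return data;
}
```

Because the sizes come first, the whole matrix is allocated once instead of being grown while reading.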

Standard format for writing Compressed Row Storage into text files?

I have a large sparse matrix stored in Compressed Row Storage (CRS) format. This is basically three arrays: an array containing the Values, an array for Column Index, and a final array containing the Row Pointers. E.g. http://web.eecs.utk.edu/~dongarra/etemplates/node373.html
I want to write this information into a text (.txt) file, which is intended to be read and put into three arrays using C. I currently plan to do this by writing all the entries in the Value array in one long line separated by commas. E.g. 5.6,10,456,78.2,... etc. Then do the same for the other two arrays.
My C code will then read the first line and put all the values into an array labeled "Value". And so on.
Question
Is this "correct"? Or is there a standard way of putting CRS data into text files?
No standard format that I'm aware of. You decide on a format that makes your life easy.
First, consider that if you want to look at one of these text files, you'll be instantly put off by the long lines. Some text editors might simply hate you. There's nothing wrong with splitting lines up.
Second, consider writing out the number of elements in each array (well, I suppose there's only two different array lengths for the three arrays) at the beginning of the file. This will let you preallocate your arrays. If you have all array lengths at hand, you have the option of doing a single memory allocation.
Finally, consider writing out some sensible tag names: some kind of header that can identify that your file is in the correct format, then something to denote the start of each array. It's kind of a sanity thing, so your code can detect problems with the file. It might just be one character, but it's something.
Now... call me a grungy old programmer, but I'd probably just write the whole lot in binary. Especially if it's floating point data, I wouldn't want to deal with the loss of precision you get when you write out numbers as text (or the space they can consume when you write them with full precision). Binary files are easy to write and quick to run. You just have to be careful if you're going to be using them across platforms with different endian order.
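To illustrate these three suggestions together, here is one possible layout, sketched in C. The "CRS1" magic word, the tag names, the Crs struct and the function names are all invented for this example; nothing here is a standard. Values are written with "%.17g" so that doubles survive the round trip through text.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t nnz, nrows;
    double *values;     /* nnz entries */
    size_t *col_index;  /* nnz entries */
    size_t *row_ptr;    /* nrows + 1 entries */
} Crs;

/* Write: magic word, the two array lengths, then each array after a tag,
 * one number per line so no line gets long. Returns 0 on success. */
int crs_write(FILE *fp, const Crs *m)
{
    fprintf(fp, "CRS1\n%zu %zu\nvalues\n", m->nnz, m->nrows);
    for (size_t i = 0; i < m->nnz; i++)
        fprintf(fp, "%.17g\n", m->values[i]);
    fprintf(fp, "col_index\n");
    for (size_t i = 0; i < m->nnz; i++)
        fprintf(fp, "%zu\n", m->col_index[i]);
    fprintf(fp, "row_ptr\n");
    for (size_t i = 0; i <= m->nrows; i++)
        fprintf(fp, "%zu\n", m->row_ptr[i]);
    return ferror(fp) ? -1 : 0;
}

/* Sanity check: the next word in the file must be exactly `tag`. */
static int expect_tag(FILE *fp, const char *tag)
{
    char word[32];
    return fscanf(fp, "%31s", word) == 1 && strcmp(word, tag) == 0;
}

/* Read the layout written by crs_write. Returns 0 on success, -1 on error.
 * A matrix with nnz == 0 may be rejected, since malloc(0) can return NULL. */
int crs_read(FILE *fp, Crs *m)
{
    m->values = NULL;
    m->col_index = NULL;
    m->row_ptr = NULL;
    if (!expect_tag(fp, "CRS1") ||
        fscanf(fp, "%zu %zu", &m->nnz, &m->nrows) != 2)
        return -1;

    /* The lengths are known up front, so allocate before reading. */
    m->values    = malloc(m->nnz * sizeof *m->values);
    m->col_index = malloc(m->nnz * sizeof *m->col_index);
    m->row_ptr   = malloc((m->nrows + 1) * sizeof *m->row_ptr);
    if (!m->values || !m->col_index || !m->row_ptr)
        goto fail;

    if (!expect_tag(fp, "values")) goto fail;
    for (size_t i = 0; i < m->nnz; i++)
        if (fscanf(fp, "%lf", &m->values[i]) != 1) goto fail;
    if (!expect_tag(fp, "col_index")) goto fail;
    for (size_t i = 0; i < m->nnz; i++)
        if (fscanf(fp, "%zu", &m->col_index[i]) != 1) goto fail;
    if (!expect_tag(fp, "row_ptr")) goto fail;
    for (size_t i = 0; i <= m->nrows; i++)
        if (fscanf(fp, "%zu", &m->row_ptr[i]) != 1) goto fail;
    return 0;

fail:
    free(m->values);
    free(m->col_index);
    free(m->row_ptr);
    m->values = NULL;
    m->col_index = NULL;
    m->row_ptr = NULL;
    return -1;
}
```

If a tag is missing or misspelled, crs_read fails early instead of silently filling the arrays with the wrong numbers.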
That's my 2 cents worth.. Hope it's useful to you.
If you want to stick to some widely-used standards, have a look at the Matrix Market. This is a repository with many matrices arising in a variety of engineering and science problems. You can find software libraries to save and read the matrices as well.

Access a cell directly in .csv file using C programming

hey guys!
is there any way of directly accessing a cell in a .csv file format using C?
e.g. i want to sum up a column using C, how do i do it?
It's probably easiest to use the scanf-family for this, but it depends a little on how your data is organized. Let's say you have three columns of numeric data, and you want to sum up the third column, you could loop over a statement like this: (file is a FILE*, and is opened using fopen, and you loop until end of file is reached)
int n; fscanf(file, "%*d,%*d,%d", &n);
and sum up the ns. If you have other kinds of data in your file, you need to specify your format string accordingly. If different lines have different kinds of data, you'll probably need to search the string for separators instead and pick the third interval.
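Spelled out as a loop, that could look like the sketch below. The function name is made up, and it assumes every line has exactly three integer columns. The return value of fscanf is checked so the loop stops cleanly at end of file or at the first line that does not match.

```c
#include <stdio.h>

/* Sum the third column of a file with three integer columns per line. */
long sum_third_column(FILE *file)
{
    long sum = 0;
    int n;
    /* fscanf returns the number of items assigned; the suppressed %*d
     * fields are not counted, so a fully matched line gives 1. */
    while (fscanf(file, "%*d,%*d,%d", &n) == 1)
        sum += n;
    return sum;
}
```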
That said, it's probably easier not to use C at all, e.g. perl or awk will probably do a better job, :) but I suppose that's not an option.
If you have to use C: read the entire line into memory, count "," characters until you reach your desired column, read the value and add it to the sum, then go to the next line.
When you reach your value, you can use sscanf to read it.
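A rough sketch of that approach follows. The function name sum_column is made up, lines are assumed to be shorter than 1024 characters, and quoted fields containing commas are not handled.

```c
#include <stdio.h>
#include <string.h>

/* Sum column `col` (0-based) of a simple CSV file.
 * Lines with too few columns or a non-numeric field are skipped. */
double sum_column(FILE *fp, int col)
{
    char line[1024];
    double sum = 0.0;

    while (fgets(line, sizeof line, fp)) {
        const char *p = line;
        /* Step past `col` commas to reach the start of the wanted field. */
        for (int i = 0; i < col && p; i++) {
            p = strchr(p, ',');
            if (p)
                p++;
        }
        if (!p)
            continue;            /* this line has too few columns */

        double value;
        if (sscanf(p, "%lf", &value) == 1)
            sum += value;
    }
    return sum;
}
```

Unlike a single fscanf format string, this works when the other columns hold text or other kinds of data.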
You might want to start by looking at RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files, and then looking for implementations of the same. (Be aware though, that the notion of comma separated values predates the RFC and there are many implementations that do not comply with that document.)
I find:
ccsv
And not many others in plain C. There are quite a few C++ implementations, and most of them can probably be readily adapted to C.
