Create an array of values from different text files in C

Create an array of values from different text files in C - c

I'm working in C on 64-bit Ubuntu 14.04.
I have a number of .txt files, each containing lines of floating point values (1 value per line). The lines represent parts of a complex sample, and they're stored as real(a1) \n imag(a1) \n real(a2) \n imag(a2), if that makes sense.
In a specific scenario there are 4 text files each containing 32768 samples (thus 65536 values), but I need to make the final version dynamic to accommodate up to 32 files (the maximum samples per file would not exceed 32768 though). I'll only be reading the first 19800 samples (depending on other things) though, since the entire signal is contained in those 39600 points (19800 samples).
A common abstraction is to represent the files / samples as a matrix, where columns represent return signals and rows represent the value of each signal at a sampling instant, up until the maximum duration.
What I'm trying to do is take the first sample from each return signal and move it into an array of double-precision floating point values to do some work on, move on to the second sample for each signal (which will overwrite the previous array) and do some work on them, and so forth, until the last row of samples have been processed.
Is there a way in which I can dynamically open files for each signal (depending on the number of pulses I'm using in that particular instance), read the first sample from each file into a buffer and ship that off to be processed. On the next iteration, the file pointers will all be aligned to the second sample, it would then move those into an array and ship it off again, until the desired amount of samples (19800 in our hypothetical case) has been reached.
I can read samples just fine from the files using fscanf:
rx_length = 19800;
int x;
float buf;
double *range_samples = calloc(num_pulses, 2 * sizeof(range_samples));
for (i=0; i < 2 * rx_length; i++){
x = fscanf(pulse_file, "%f", &buf);
*(range_samples) = buf;
}
All that needs to happen (in my mind) is that I need to cycle both sample# and pulse# (in that order), so when finished with one pulse it would move on to the next set of samples for the next pulse, and so forth. What I don't know how to do is to somehow declare file pointers for all return signal files, when the number of them can vary inbetween calls (e.g. do the whole thing for 4 pulses, and on the next call it can be 16 or 64).
If there are any ideas / comments / suggestions I would love to hear them.
Thanks.

I would make the code you posted a function that takes an array of file names as an argument:
void doPulse( const char **file_names, const int size )
{
FILE *file = 0;
// declare your other variables
for ( int i = 0; i < size; ++i )
{
file = fopen( file_names[i] );
// make sure file is open
// do the work on that file
fclose( file );
file = 0;
}
}

What you need is a generator. It would be reasonably easy in C++, but as you tagged C, I can imagine a function, taking a custom struct (the state of the object) as parameter. It could be something like (pseudo code) :
struct GtorState {
char *files[];
int filesIndex;
FILE *currentFile;
};
void gtorInit(GtorState *state, char **files) {
// loads the array of file into state, set index to 0, and open first file
}
int nextValue(GtorState *state, double *real, double *imag) {
// read 2 values from currentFile and affect them to real and imag
// if eof, close currentFile and open files[++currentIndex]
// if real and imag were found returns 0, else 1 if eof on last file, 2 if error
}
Then you main program could contain :
GtorState state;
// initialize the list of files to process
gtorInit(&state, files);
double real, imag);
int cr;
while (0 == (cr = nextValue(&state, &real, &imag)) {
// process (real, imag)
}
if (cr == 2) {
// process (at least display) error
}
Alternatively, your main program could iterate the values of the different files and call a function with state analog of the above generator that processes the values, and at the end uses the state of the processing function to get the results.

Tried a slightly different approach and it's working really well.
In stead of reading from the different files each time I want to do something, I read the entire contents of each file into a 2D array range_phase_data[sample_number][pulse_number], and then access different parts of the array depending on which range bin I'm currently working on.
Here's an excerpt:
#define REAL(z,i) ((z)[2*(i)])
#define IMAG(z,i) ((z)[2*(i)+1])
for (i=0; i<rx_length; i++){
printf("\t[%s] Range bin %i. Samples %i to %i.\n", __FUNCTION__, i, 2*i, 2*i+1);
for (j=0; j<num_pulses; j++){
REAL(fft_buf, j) = range_phase_data[2*i][j];
IMAG(fft_buf, j) = range_phase_data[2*i+1][j];
}
printf("\t[%s] Range bin %i done, ready to FFT.\n", __FUNCTION__, i);
// do stuff with the data
}
This alleviates the need to dynamically allocate file pointers and in stead just opens the files one at a time and writes the data to the corresponding column in the matrix.
Cheers.

Related

MSP430 SD card application running on top of FATFS appears too restrictive . Is my understanding correct?

I am working my way through an SD card application code example provided by TI for their MSP530 LaunchPad microcontroller development kit. It appears that the example restricts the number of directories and number of files to 10 each (a total of 100 files) which seems overly restrictive for a 32GB SD card. The current code compiles to use less than half of the program space and less than half of available RAM. I am wondering if I misunderstand the code, or if the code is limited by some other reason, such as the available stack size in memory. Below is the code and my comments.
There are several layers: SDCardLogMode, sdcard (SDCardLib), and ff (HAL layer). I've reduced the code below to illustrate the constructions but not runnable - I am more interested if I understand it correctly and if my solution to increase the number of allowed files and directories is flawed.
SDCardLogMode.c there are two places of interest here. The first is the declaration of char dirs[10][MAX_DIR_LEN] and files[10][MAX_FILE_LEN]. the MAX LENs are 8 and 12 respectively and are the maximum allowed length of a name.
/*******************************************************************************
*
* SDCardLogMode.c
* ******************************************************************************/
#include "stdlib.h"
#include "string.h"
#include "SDCardLogMode.h"
#include "driverlib.h"
#include "sdcard.h"
#include "HAL_SDCard.h"
#pragma PERSISTENT(numLogFiles)
uint8_t numLogFiles = 0;
SDCardLib sdCardLib;
char dirs[10][MAX_DIR_LEN];
char files[10][MAX_FILE_LEN]; //10 file names. MAX_FILE_LEN =10
uint8_t dirNum = 0;
uint8_t fileNum = 0;
#define MAX_BUF_SIZE 32
char buffer[MAX_BUF_SIZE];
// FatFs Static Variables
static FIL fil; /* File object */
static char filename[31];
static FRESULT rc;
//....
Later in the same SDCardLogMode.c file is the following function (also reduced for readability). Here the interesting thing is that the code calls SDCardLib_getDirectory(&sdCardLib, "data_log", dirs, &dirNum, files, &fileNum) which consume the "data_log" path and produces dir, and updates &dirNum, files, and &fileNum. I do not believe &sdCardLib (which holds a handle to the FATFS and an interface pointer) is used in this function. At least not that I can tell.
What is puzzling is what's the point of calling SDCardLib_getDirectory() and then not using anything it produces? I did not find any downstream use of the dirs and files char arrays. Nor did I find any use of dirNum and fileNum either.
In the code snippets I show the code for SDCardLib_getDirectory(). I could not find where SDCardLib parameter is used. And as mentioned earlier, I found no use of files and dirs arrays. I can see where the file and directory count could be used to generate new names, but there are already static variables to hold the file count. Can anyone see a reason why the SDCard_getDirectory() was called?
/*
* Store TimeStamp from PC when logging starts to SDCard
*/
void storeTimeStampSDCard()
{
int i = 0;
uint16_t bw = 0;
unsigned long long epoch;
// FRESULT rc;
// Increment log file number
numLogFiles++;
,
//Detect SD card
SDCardLib_Status st = SDCardLib_detectCard(&sdCardLib);
if (st == SDCARDLIB_STATUS_NOT_PRESENT) {
SDCardLib_unInit(&sdCardLib);
mode = '0';
noSDCard = 1; //jn added
return;
}
// Read directory and file
rc = SDCardLib_getDirectory(&sdCardLib, "data_log", dirs, &dirNum, files, &fileNum);
//Create the directory under the root directory
rc = SDCardLib_createDirectory(&sdCardLib, "data_log");
if (rc != FR_OK && rc != FR_EXIST) {
SDCardLib_unInit(&sdCardLib);
mode = '0';
return;
}
//........
}
Now jumping to sdcard.c (SDCardLib layer) to look at SDCardLib_getDirectory() is interesting. It takes the array pointer assigns it to a one dimensional array (e.g. char (*fileList)[MAX_FILE_LEN] and indexes it each time it writes a filename). This code seem fragile since the SDCardLib_createDirectory() simply returns f_mkdir(directoryName), it does not check how many files already exist. Perhaps TI assumes this checking should be done at the application layer above SDCardLogMode....
void SDCardLib_unInit(SDCardLib * lib)
{
/* Unregister work area prior to discard it */
f_mount(0, NULL);
}
FRESULT SDCardLib_getDirectory(SDCardLib * lib,
char * directoryName,
char (*dirList)[MAX_DIR_LEN], uint8_t *dirNum,
char (*fileList)[MAX_FILE_LEN], uint8_t *fileNum)
{
FRESULT rc; /* Result code */
DIRS dir; /* Directory object */
FILINFO fno; /* File information object */
uint8_t dirCnt = 0; /* track current directory count */
uint8_t fileCnt = 0; /* track current directory count */
rc = f_opendir(&dir, directoryName);
for (;;)
{
rc = f_readdir(&dir, &fno); // Read a directory item
if (rc || !fno.fname[0]) break; // Error or end of dir
if (fno.fattrib & AM_DIR) //this is a directory
{
strcat(*dirList, fno.fname); //add this to our list of names
dirCnt++;
dirList++;
}
else //this is a file
{
strcat(*fileList, fno.fname); //add this to our list of names
fileCnt++;
fileList++;
}
}
*dirNum = dirCnt;
*fileNum = fileCnt;
return rc;
}
Below is SDCardLib_createDirectory(SDCardLib *lib, char *directoryName). It just creates a directory, it does not check on the existing number of files.
FRESULT SDCardLib_createDirectory(SDCardLib * lib, char * directoryName)
{
return f_mkdir(directoryName);
}
So coming back to my questions:
Did I understand this code correctly, does it really does limit the number of directories and files to 10 each?
If so, why would the number of files and directories be so limited? The particular MSP430 that this example code came with has 256KB of program space and 8KB of RAM. The compiled code consumes less than half of the available resources (68KB of program space and about 2.5KB of RAM). Is it because any larger would overflow the stack segment?
I want to increase the number of files that can be stored. If I look at the underlying FATFS code, it does not to impose a limit on the number of files or directories (at least not until the sd card is full). If I never intend to display or search the contents of a directory on the MSP430 my thought is to remove SDCard_getDirectory() and the two char arrays (files and dirs). Would there a reason why this would be a bad idea?

There are other microcontrollers with less memory.
The SDCardLib_getDirectory() function treats its dirList and fileList parameters as simple strings, i.e., it calls strcat() on the same pointers. This means that it can read as many names as fit into 10*8 or 10*12 bytes.
And calling strcat() without adding a delimiter means that it is impossible to get the individual names out of the string.
This codes demonstrates that it is possible to use the FatFs library, and in which order its functions need to be called, but it is not necessarily a good example of how to do that. I recommend that you write your own code.

gnu FORTRAN unformatted file record markers stored as 64-bit width?

I have a legacy code and some unformatted data files that it reads, and it worked with gnu-4.1.2. I don't have access to the method that originally generated these data files. When I compile this code with a newer gnu compiler (gnu-4.7.2) and attempt to load the old data files on a different computer, it is having difficulty reading them. I start by opening the file and reading in the first record which consists of three 32-bit integers:
open(unit, file='data.bin', form='unformatted', status='old')
read(unit) x,y,z
I am expecting these three integers here to describe x,y,z spans so that next it can load a 3D matrix of float values with those same dimensions. However, instead it's loading a 0 for the first value, then the next two are offset.
Expecting:
x=26, y=127, z=97 (1A, 7F, 61 in hex)
Loaded:
x=0, y=26, z=127 (0, 1A, 7F in hex)
When I checked the data file in a hex editor, I think I figured out what was happening.
The first record marker in this case has a value of 12 (0C in hex) since it's reading three integers at 4 bytes each. This marker is stored both before and after the record. However, I notice that the 32bits immediately after each record marker is 00000000. So either the record markers are treated as 64bit integers (little-Endian) or there is a 32-bit zero padding after each record marker. Either way, the code generated with the new compiler is reading the record markers as 32-bit integers and not expecting any padding. This effectively intrudes/corrupts the data being read in.
Is there an easy way to fix this non-portable issue? The old and new hardware are 64 bit architecture and so is the executable I compiled. If I try to use the older compiler version again will it solve the problem, or is it hardware dependent? I'd prefer to use the newer compilers because they are more efficient, and I really don't want to edit the source code to open all the files as access='stream' and manually read in a trailing 0 integer after each record marker, both before and after each record.
P.S. I could probably write a C++ code to alter the data files and remove these zero paddings if there is no easier alternative.

See the -frecord-marker= option in the gfortran manual. With -frecord-marker=8 you can read the old style unformatted sequential files produced by older versions of gfortran.

Seeing as how Fortran doesn't have a standardization on this, I opted to convert the data files to a new format that uses 32-bit wide record lengths instead of 64-bit wide. In case anyone needs to do this in the future I've included some Visual C++ code here that worked for me and should be easily modifiable to C or another language. I have also uploaded a Windows executable (fortrec.zip) here.
CFile OldFortFile, OutFile;
const int BUFLEN = 1024*20;
char pbuf[BUFLEN];
int i, iIn, iRecLen, iRecLen2, iLen, iRead, iError = 0;
CString strInDir = "C:\folder\";
CString strIn = "file.dat";
CString strOutDir = "C:\folder\fortnew\"
system("mkdir \"" + strOutDir + "\""); //create a subdir to hold the output files
strIn = strInDir + strIn;
strOut = strOutDir + strIn;
if(OldFortFile.Open(strIn,CFile::modeRead|CFile::typeBinary)) {
if(OutFile.Open(strOut,CFile::modeCreate|CFile::modeWrite|CFile::typeBinary)) {
while(true) {
iRead = OldFortFile.Read(&iRecLen, sizeof(iRecLen)); //Read the record's raw data
if (iRead < sizeof(iRecLen)) //end of file reached
break;
OutFile.Write(&iRecLen, sizeof(iRecLen));//Write the record's raw data
OldFortFile.Read(&iIn, sizeof(iIn));
if (iIn != 0) {//this is the padding we need to ignore, ensure it's always zero
//Padding not found
iError++;
break;
}
i = iRecLen;
while (i > 0) {
iLen = (i > BUFLEN) ? BUFLEN : i;
OldFortFile.Read(&pbuf[0], iLen);
OutFile.Write(&pbuf[0], iLen);
i -= iLen;
}
if (i != 0) { //Buffer length mismatch
iError++;
break;
}
OldFortFile.Read(&iRecLen2, sizeof(iRecLen2));
if (iRecLen != iRecLen2) {//ensure we have reached the end of the record proeprly
//Record length mismatch
iError++;
break;
}
OutFile.Write(&iRecLen2, sizeof(iRecLen));
OldFortFile.Read(&iIn, sizeof(iIn));
if (iIn != 0) {//this is the padding we need to ignore, ensure it's always zero
//Padding not found
break;
}
}
OutFile.Close();
OldFortFile.Close();
}
else { //Could not create the ouput file.
OldFortFile.Close();
return;
}
}
else { //Could not open the input file
}
if (iError == 0)
//File successfully converted
else
//Encountered error

Adding characters to the middle of a file without overwriting the existing characters in C

I am quite rusty with C and system calls and pointers in general, so this is a good refresher exercise to get back on track. All I need to do is, given a file such as this:
YYY.txt: "somerandomcharacters"
Change it to be like this:
YYY.txt: "somerandomabcdefghijklmnopqrstuvwxyzcharacters"
So all that is done is some characters added to the middle of the file. Obviously, this is quite simple, but in C you must keep track and manage the size of the file in advance before adding the additional characters.
Here is my naive try:
//(Assume a file called YYY.txt exists and an int YYY is the file descriptor.)
char ToBeInserted[26] = "abcdefghijklmnopqrstuvwxyz";
//Determine the current length of YYY
int LengthOfYYY = lseek(YYY, 0, 2);
if(LengthOfYYY < 0)
printf("Error upon using lseek to get length of YYY");
//Assume we want to insert at position 900 in YYY.txt, and length of YYY is over 1000.
//1.] Keep track of all characters past position 900 in YYY and store in a char array.
lseek(YYY, 900, 0); //Seeks to position 900 in YYY, so reading begins there.
char NextChar;
char EverythingPast900[LengthOfYYY-900];
int i = 0;
while(i < (LengthOfYYY - 900)) {
int NextRead = read(YYY, NextChar, 1); //Puts next character from YYY in NextChar
EverythingPast900[i] = NextChar;
i++;
}
//2.] Overwrite what used to be at position 900 in YYY:
lseek(YYY, 900, 0); //Moves to position 900.
int WriteToYYY = write(YYY, ToBeInserted, sizeof(ToBeInserted));
if(WriteToYYY < 0)
printf("Error upon writing to YYY");
//3.] Move to position 900 + length of ToBeInserted, and write the characters that were saved.
lseek(YYY, 926, 0);
int WriteMoreToYYY = write(YYY, EverythingPast900, sizeof(EverythingPast900));
if (WriteMoreToYYY < 0) {
printf("Error writing the saved characters back into YYY.");
}
I think the logic is sound, mostly, although there are much better ways to do it in C. I need help on my C pointers, basically, as well as the UNIX system calls. Does anyone mind walking me through how to properly implement this in C?

That's the basic idea. If you had to really conserve RAM and the file was a lot bigger you'd want to copy block by block in reverse order. But the simpler way is to read the entire thing into memory and rewrite the entire file.
also, I prefer the stream functions: fopen, fseek, fread. But the file descriptor method works.

Writing a VTK ASCII Legacy File to draw contours in VisIt

I'm trying to write a legacy .vtk file to be read into VisIt using C. Unfortunately my installed VisIt program refuses to render the VTK file that I have writing, reading: 'local host failed'
Below is the code used to read data from one file and convert it to a legacy VTK file. I use the macros XPIX, YPIX, and ZPIX to describe the dimensions of a pixel grid. Each pixel contains a scalar density value. I have listed the pixels in a 'grid-file' using row-major ordering: i.e.
int list_index(x,y,z) = YPIX * ZPIX * x + ZPIX * y + z;
Every entry in this pixel list is read into an array called grid[] of type double, and written to outfile beneath the legacy VTK header data:
/*Write vtk header */
fprintf(outfile,"# vtk DataFile Version 3.0\n");
fprintf(outfile,"Galaxy density grid\nASCII\nDATASET STRUCTURED_POINTS\n");
fprintf(outfile,"DIMENSIONS %d %d %d \n", (XPIX+1), (YPIX+1), (ZPIX+1));
fprintf(outfile,"ORIGIN 0 0 0\n");
fprintf(outfile,"SPACING 1 1 1\n");//or ASPECT_RATIO
fprintf(outfile,"CELL_DATA %d\n", totalpix);
fprintf(outfile,"SCALARS cell_density float 1\n");
fprintf(outfile, "LOOKUP_TABLE default\n");
/*Create Memory Space to store Pixel Grid*/
double *grid;
grid = malloc(XPIX * YPIX * ZPIX * sizeof(double));
if (grid == NULL ){
fprintf(stderr, "Pixel grid of type double failed to initialize\n");
exit(EXIT_FAILURE);
}
fprintf(stderr,"Pixel grid has been initialized.\n Now reading infile\n");
/*Read infile contents into double grid[], using Row-Major Indexing*/
double rho;
char newline;
int i, j, k;
for(i = 0; i < XPIX; i++){
for(j = 0; j < YPIX; j++){
for(k = 0; k < ZPIX; k++){
fscanf(infile, "%lf", &rho);
grid[getindex(i,j,k)] = rho;
}
}
fprintf(stderr,"%d\n", i);
}
fprintf(stderr,"Finished reading\n");
#if !DEBUG
/*Write out grid contents in Row major order*/
fprintf(stderr,"Now writing vtk file");
for(i = 0; i < XPIX; i++){
for(j = 0; j < YPIX; j++){
for(k = 0; k < ZPIX; k++){
fprintf(outfile, "%lf ", grid[getindex(i,j,k)]);
}
fprintf(outfile,"\n");
}
}
fprintf(stderr,"Finished Writing to outfile\n");
#endif
After running the grid data list through this routine, I have XPIX*YPIX lines in the lookup_table, each with ZPIX entries. Is this an incorrect format? VisIt continues to fail reading the input file. I'm aware of the fact that structured_points may use column major indexing, but my first goal of course is to get some sort of result from VisIt. I would like to draw a contour using the scalar cell_density eventually. Is my data set simply too large?

Have you seen the accepted answer to the question vtk data format error? The question is debugging a VTK writer in C++ but it is very similar to your code (and of course should yield the same results).
The key point from the accepted answer is that data is written in column major order, not row major order (you seem to hint at this in your question: "I'm aware of the fact that structured_points may use column major indexing").
Also, it is always helpful (if you can) to compare your code with something which you know works. For example, VisIt provides a small C library for writing legacy VTK file formats called VisItWriterLib. Compare the output from your code and the VisItWriterLib to see where your data files differ. I would recommend using VisItWriterLib for your VTK IO rather than writing your own routines - no need to reinvent the wheel.
Edit: To answer a couple of your other questions:
After running the grid data list through this routine, I have
XPIX*YPIX lines in the lookup_table, each with ZPIX entries. Is this
an incorrect format?
This is not the correct format. LOOKUP_TABLE should be a list of XPIX*YPIX*ZPIX lines, with one element per line (or alternatively, VisIt will accept one line with XPIX*YPIX*ZPIX elements). See the section Dataset Attribute Format in the VTK File Formats document (www.vtk.org/VTK/img/file-formats.pdf).
Is my data set simply too large?
I doubt it. VisIt is designed to handle huge datasets and, AFAIK, can render petabyte data sets. I would be very surprised if your data is that large.
However, if you are concerned about having large files, you can split your data into multiple files and tell VisIt to read these files is parallel. To do this, write a bit a your data into separate files, e.g. domain1.vtk, domain2.vtk, ... domainN.vtk etc. Then write a .visit master file, which has the structure
!NBLOCKS N
domain1.vtk
domain2.vtk
...
domainN.vtk
Save this as, for example, mydata.visit and then open this .visit file, rather than the .vtk files, in VisIt.

What could cause a Labwindows/CVI C program to hate the number 2573?

Using Windows
So I'm reading from a binary file a list of unsigned int data values. The file contains a number of datasets listed sequentially. Here's the function to read a single dataset from a char* pointing to the start of it:
function read_dataset(char* stream, t_dataset *dataset){
//...some init, including setting dataset->size;
for(i=0;i<dataset->size;i++){
dataset->samples[i] = *((unsigned int *) stream);
stream += sizeof(unsigned int);
}
//...
}
Where read_dataset in such a context as this:
//...
char buff[10000];
t_dataset* dataset = malloc( sizeof( *dataset) );
unsigned long offset = 0;
for(i=0;i<number_of_datasets; i++){
fseek(fd_in, offset, SEEK_SET);
if( (n = fread(buff, sizeof(char), sizeof(*dataset), fd_in)) != sizeof(*dataset) ){
break;
}
read_dataset(buff, *dataset);
// Do something with dataset here. It's screwed up before this, I checked.
offset += profileSize;
}
//...
Everything goes swimmingly until my loop reads the number 2573. All of a sudden it starts spitting out random and huge numbers.
For example, what should be
...
1831
2229
2406
2637
2609
2573
2523
2247
...
becomes
...
1831
2229
2406
2637
2609
0xDB00000A
0xC7000009
0xB2000008
...
If you think those hex numbers look suspicious, you're right. Turns out the hex values for the values that were changed are really familiar:
2573 -> 0xA0D
2523 -> 0x9DB
2247 -> 0x8C7
So apparently this number 2573 causes my stream pointer to gain a byte. This remains until the next dataset is loaded and parsed, and god forbid it contain a number 2573. I have checked a number of spots where this happens, and each one I've checked began on 2573.
I admit I'm not so talented in the world of C. What could cause this is completely and entirely opaque to me.

You don't specify how you obtained the bytes in memory (pointed to by stream), nor what platform you're running on, but I wouldn't be surprised to find your on Windows, and you used the C stdio library call fopen(filename "r"); Try using fopen(filename, "rb");. On Windows (and MS-DOS), fopen() translates MS-DOS line endings "\r\n" (hex 0x0D 0x0A) in the file to Unix style "\n", unless you append "b" to the file mode to indicate binary.

A couple of irrelevant points.
sizeof(*dataset) doesn't do what you think it does.
There is no need to use seek on every read
I don't understand how you are calling a function that only takes one parameter but you are giving it two (or at least I don't understand why your compiler doesn't object)

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight