Pack file names and unpack from archive? - c

I was asked to create an archive file that multiple files can be packed into and unpacked from. I noticed that file names longer than the allotted buffer size of 20 gave me problems. While I did find a working solution, I feel like there are better ways to go about it.
This is the method that I am using to store the file name in the archive and pad it on the right with nulls up to 20 bytes:
/*
copies the original name using memcpy to preserve it for deletion purposes
calculates the length of string in an if-else statement for padding purposes
truncates the file name in argv[i] if file name is too long ( > 20 bytes )
if the file is not long enough ( != 20 bytes) pads with '\0'
*/
memcpy(original_name,argv[i],strlen(argv[i]));
original_name[strlen(argv[i])] = '\0';
if(strlen(argv[i]) > 20){
argv[i][20] = '\0';
fputs(argv[i], archive);
} else {
fputs(argv[i],archive);
for ( j = 0; j < 20-strlen(argv[i]); j++)
{
fputc('\0', archive);
}
}
And this is the method I am using to extract the file size, which is followed by the file name:
/*
copies the first four bytes from the buffer into an int which will be used as the size of the file
opens a new file with the file name taken from the last 20 bytes of the buffer
checks if the file was opened properly
*/
memcpy(&size_of_file,&buff[0],sizeof(size_of_file));
memcpy(file_name,&buff[4],20*sizeof(char));
file_name[20] = '\0';
new_file = fopen(file_name,"wb+");
if_opened(new_file);
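One possible way to avoid truncating argv[i] in place (and therefore the need to save a copy of the original name) is to build the 20-byte field in a separate buffer and write it with a single fwrite. The sketch below is only an illustration: write_name_field, read_name_field and NAME_FIELD are hypothetical names, and the 20-byte width is taken from the question.

```c
#include <stdio.h>
#include <string.h>

#define NAME_FIELD 20  /* width of the name field, taken from the question */

/* Write `name` as a fixed NAME_FIELD-byte field: truncated if too long,
   padded with '\0' if shorter. The caller's string is not modified.
   Returns 0 on success, -1 on a short write. */
int write_name_field(FILE *archive, const char *name)
{
    char field[NAME_FIELD];
    size_t n = strlen(name);

    if (n > NAME_FIELD)
        n = NAME_FIELD;
    memset(field, 0, sizeof field);   /* padding bytes */
    memcpy(field, name, n);           /* at most NAME_FIELD name bytes */
    return fwrite(field, 1, sizeof field, archive) == sizeof field ? 0 : -1;
}

/* Read a NAME_FIELD-byte field into `out`, which must hold at least
   NAME_FIELD + 1 bytes, and terminate it. Returns 0 on success. */
int read_name_field(FILE *archive, char *out)
{
    if (fread(out, 1, NAME_FIELD, archive) != NAME_FIELD)
        return -1;
    out[NAME_FIELD] = '\0';
    return 0;
}
```

Because the field is assembled in a local buffer, argv[i] keeps its full length and can still be used later, for example to delete the original file.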

Related

Can't read real numbers from Yale Bright Star Catalog

I'm currently trying to read some star data from the BSC. I've managed to read in the header and that shows up more or less correctly, but I'm having trouble reading in the star data itself. The specification states that values are stored as 4/8-byte "Real" numbers, which I assumed meant floats/doubles, but the Ascension and Declination I get are all wrong: well into the trillions for one and zero for the other. The magnitude is also wrong, despite it just being an integer, which I could read fine in the header. Here's an image of the output thus far. Does anyone know what I'm doing wrong?
Alright, after some more testing, I managed to solve my problem. The crucial step was to abandon the binary file altogether and use the ASCII file instead. I had some problems reading from it before due to how it was formatted, but I came up with a method that worked:
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Struct to store all the attributes I'm interested in */
struct StarData_t{
char Name[11];
char SpType[21];
float GLON, GLAT, Vmag;
};
int main()
{
/* Allocate a list of the structs
(the BSC has 9110 entries) */
struct StarData_t stars[9110];
/* Open the catalog */
FILE *fptr = fopen("catalog", "r");
if(fptr != NULL){
/* Create a buffer for storing the star entries.
The ASCII file has one entry per line.
Each line has a max length of 197,
which becomes 199 with the newline and null terminator,
so I round up to 200. */
size_t star_size = 200;
char *star_buffer;
star_buffer = (char *)malloc(star_size * sizeof(char));
/* Create a buffer for reading in the numbers.
The catalog has no numbers longer than 6 characters,
so I allocate 7 to account for the null terminator. */
char data_buffer[7];
/* For each entry in the BSC... */
for(int i = 0; i < 9110; i++){
/* Read the line to the buffer */
getline(&star_buffer, &star_size, fptr);
/* And put the data in the matching index,
Using the data buffer to create the floats */
// GLON
strncpy(data_buffer, &(star_buffer[90]), 6);
data_buffer[6] = '\0';
stars[i].GLON = fmod(atof(data_buffer)+180, 360)-180;
// GLAT
strncpy(data_buffer, &(star_buffer[96]), 6);
data_buffer[6] = '\0';
stars[i].GLAT = atof(data_buffer);
// Vmag
strncpy(data_buffer, &(star_buffer[102]), 5);
data_buffer[5] = '\0';
stars[i].Vmag = atof(data_buffer);
// Name
strncpy(stars[i].Name, &(star_buffer[4]), 10);
stars[i].Name[10] = '\0';
// Spectral Type
strncpy(stars[i].SpType, &(star_buffer[127]), 20);
stars[i].SpType[20] = '\0';
printf("Name: %s, Long: %7.2f, Lat: %6.2f, Vmag: %4.2f, SpType: %s\n", stars[i].Name, stars[i].GLON, stars[i].GLAT, stars[i].Vmag, stars[i].SpType);
}
free(star_buffer);
}
}
Hope this is useful!
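The repeated strncpy / terminate / atof steps above could be folded into one small helper. This is just a sketch; parse_field is a hypothetical name, and the offsets passed to it would be the same 0-based column offsets used in the code above.

```c
#include <stdlib.h>
#include <string.h>

/* Parse `width` characters of `line`, starting at 0-based offset `start`,
   as a double. Returns 0.0 for a blank field or when the line is too
   short to contain the field. */
double parse_field(const char *line, size_t start, size_t width)
{
    char tmp[32];
    size_t len = strlen(line);

    if (start >= len || width >= sizeof tmp)
        return 0.0;
    if (width > len - start)
        width = len - start;          /* field cut off by end of line */
    memcpy(tmp, line + start, width);
    tmp[width] = '\0';
    return strtod(tmp, NULL);
}
```

With such a helper, the GLAT line would become something like stars[i].GLAT = parse_field(star_buffer, 96, 6); and short lines would no longer read past the end of the buffer contents.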

How does fread() in C work inside a for loop?

I am new to C programming, but I need it to read some binary file which I describe below.
The India Meteorological Department (IMD) has provided historical weather data in .GRD files on their website. They have also provided sample C code to read those files. Based on their sample C code, I have written the following code that extracts the daily minimum temperatures on 15 April 1980 recorded on a 31x31 grid over India.
/* This program reads binary data for 365/366 days and writes in ascii file. */
#include <stdio.h>
int main() {
float t[31][31];
int i,j ,k;
FILE *fin,*fout;
fin = fopen("C:\\New folder\\Mintemp_MinT_1980.GRD","rb"); // Input file
fout = fopen("C:\\New folder\\MINT15APR1980.TXT","w"); // Output file
/* Check both files before using them. */
if(fin == NULL) {
printf("Can't open input file");
return 1;
}
if(fout == NULL) {
printf("Can't open output file");
fclose(fin);
return 1;
}
fprintf(fout,"Daily Minimum Temperature for 15 April 1980\n");
for(k=0 ; k<366 ; k++) {
fread(&t,sizeof(t),1,fin);
if(k == 105) {
for(i=0 ; i < 31 ; i++) {
fprintf(fout,"\n") ;
for(j=0 ; j < 31 ; j++)
fprintf(fout,"%6.2f",t[i][j]);
}
}
}
fclose(fin);
fclose(fout);
return 0;
}
/* end of main */
The file Mintemp_MinT_1980.GRD can be downloaded from the IMD website by selecting the year as 1980 against Minimum Temperature.
What I don't understand is how the fread() function actually works in the line fread(&t,sizeof(t),1,fin) within the loop for(k=0 ; k<366 ; k++). At first sight, the arguments of fread() here do not depend on the loop variable k, so it should read the same data into the matrix t[31][31] for every k. However, I have checked that, surprisingly, the data extracted by this program differ for different values of k in the line if(k == 105); for example, the data extracted for k == 105 and k == 32 are different.
I would very much appreciate it if someone could explain the above.
Files contain sequential data. All the file operators are based on the premise that whatever you do to a file, you'll generally be doing it in a sequential way.
So when you read data, and then read more data, you will be getting sequential chunks of the file. Both the FILE datatype and the operating system itself do a number of things for you, including keeping track of your current position in the file and doing block buffering in memory to improve performance.
If you wanted to reread the same data over, or skip around in the file, you would need to use fseek() to change positions in the file before doing your next read.
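As an illustration of the fseek() approach, the sketch below jumps straight to one day's grid instead of reading all the preceding days. It assumes, as the loop in the question implies, that the file is nothing but consecutive 31x31 float grids with no header; read_day and GRID are hypothetical names.

```c
#include <stdio.h>

#define GRID 31  /* grid dimension from the question */

/* Read the grid for 0-based day index `day` directly, without reading
   the preceding days. Returns 0 on success, -1 on seek or read failure. */
int read_day(FILE *fin, long day, float t[GRID][GRID])
{
    long offset = day * (long)sizeof(float[GRID][GRID]);

    /* Move the file position to the start of the requested grid. */
    if (fseek(fin, offset, SEEK_SET) != 0)
        return -1;
    /* fread reads from the current position and then advances it. */
    return fread(t, sizeof(float[GRID][GRID]), 1, fin) == 1 ? 0 : -1;
}
```

Calling read_day(fin, 105, t) would then be equivalent to the k == 105 iteration of the original loop, and a plain fread() issued right after it would return the grid for day 106, because the position has moved on.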

gnu FORTRAN unformatted file record markers stored as 64-bit width?

I have a legacy code and some unformatted data files that it reads, and it worked with gnu-4.1.2. I don't have access to the method that originally generated these data files. When I compile this code with a newer gnu compiler (gnu-4.7.2) and attempt to load the old data files on a different computer, it is having difficulty reading them. I start by opening the file and reading in the first record which consists of three 32-bit integers:
open(unit, file='data.bin', form='unformatted', status='old')
read(unit) x,y,z
I am expecting these three integers here to describe x,y,z spans so that next it can load a 3D matrix of float values with those same dimensions. However, instead it's loading a 0 for the first value, then the next two are offset.
Expecting:
x=26, y=127, z=97 (1A, 7F, 61 in hex)
Loaded:
x=0, y=26, z=127 (0, 1A, 7F in hex)
When I checked the data file in a hex editor, I think I figured out what was happening.
The first record marker in this case has a value of 12 (0C in hex) since it's reading three integers at 4 bytes each. This marker is stored both before and after the record. However, I notice that the 32 bits immediately after each record marker are 00000000. So either the record markers are treated as 64-bit integers (little-endian) or there is a 32-bit zero padding after each record marker. Either way, the code generated with the new compiler is reading the record markers as 32-bit integers and not expecting any padding. This effectively corrupts the data being read in.
Is there an easy way to fix this non-portable issue? The old and new hardware are 64 bit architecture and so is the executable I compiled. If I try to use the older compiler version again will it solve the problem, or is it hardware dependent? I'd prefer to use the newer compilers because they are more efficient, and I really don't want to edit the source code to open all the files as access='stream' and manually read in a trailing 0 integer after each record marker, both before and after each record.
P.S. I could probably write a C++ code to alter the data files and remove these zero paddings if there is no easier alternative.
See the -frecord-marker= option in the gfortran manual. With -frecord-marker=8 you can read the old style unformatted sequential files produced by older versions of gfortran.
Seeing as how Fortran doesn't have a standardization on this, I opted to convert the data files to a new format that uses 32-bit wide record lengths instead of 64-bit wide. In case anyone needs to do this in the future I've included some Visual C++ code here that worked for me and should be easily modifiable to C or another language. I have also uploaded a Windows executable (fortrec.zip) here.
CFile OldFortFile, OutFile;
const int BUFLEN = 1024*20;
char pbuf[BUFLEN];
int i, iIn, iRecLen, iRecLen2, iLen, iRead, iError = 0;
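CString strInDir = "C:\\folder\\"; //backslashes must be escaped in string literals
CString strInName = "file.dat";
CString strOutDir = "C:\\folder\\fortnew\\";
system("mkdir \"" + strOutDir + "\""); //create a subdir to hold the output files
CString strIn = strInDir + strInName;
CString strOut = strOutDir + strInName; //same file name, placed in the output dir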
if(OldFortFile.Open(strIn,CFile::modeRead|CFile::typeBinary)) {
if(OutFile.Open(strOut,CFile::modeCreate|CFile::modeWrite|CFile::typeBinary)) {
while(true) {
iRead = OldFortFile.Read(&iRecLen, sizeof(iRecLen)); //Read the record's raw data
if (iRead < sizeof(iRecLen)) //end of file reached
break;
OutFile.Write(&iRecLen, sizeof(iRecLen));//Write the record's raw data
OldFortFile.Read(&iIn, sizeof(iIn));
if (iIn != 0) {//this is the padding we need to ignore, ensure it's always zero
//Padding not found
iError++;
break;
}
i = iRecLen;
while (i > 0) {
iLen = (i > BUFLEN) ? BUFLEN : i;
OldFortFile.Read(&pbuf[0], iLen);
OutFile.Write(&pbuf[0], iLen);
i -= iLen;
}
if (i != 0) { //Buffer length mismatch
iError++;
break;
}
OldFortFile.Read(&iRecLen2, sizeof(iRecLen2));
if (iRecLen != iRecLen2) {//ensure we have reached the end of the record properly
//Record length mismatch
iError++;
break;
}
OutFile.Write(&iRecLen2, sizeof(iRecLen2));
OldFortFile.Read(&iIn, sizeof(iIn));
if (iIn != 0) {//this is the padding we need to ignore, ensure it's always zero
//Padding not found
iError++;
break;
}
}
OutFile.Close();
OldFortFile.Close();
}
else { //Could not create the output file.
OldFortFile.Close();
return;
}
}
else { //Could not open the input file
}
if (iError == 0) {
//File successfully converted
}
else {
//Encountered error
}
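For anyone without MFC, roughly the same conversion might look like this in plain C. This is a sketch under stated assumptions, not a tested replacement for the code above: it treats each old marker as one 64-bit integer in the host's byte order (equivalent to a 32-bit marker plus zero padding on little-endian machines), assumes no record exceeds the 32-bit range, and convert_markers is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>

/* Copy `in` to `out`, rewriting 8-byte record markers as 4-byte ones.
   Returns the number of records converted, or -1 on a malformed file
   or I/O error. */
long convert_markers(FILE *in, FILE *out)
{
    char buf[4096];
    long records = 0;
    uint64_t head, tail;

    while (fread(&head, sizeof head, 1, in) == 1) {
        if (head > UINT32_MAX)
            return -1;                 /* does not fit a 4-byte marker */
        uint32_t len32 = (uint32_t)head;
        if (fwrite(&len32, sizeof len32, 1, out) != 1)
            return -1;

        /* Copy the record payload in chunks. */
        uint64_t left = head;
        while (left > 0) {
            size_t n = left < sizeof buf ? (size_t)left : sizeof buf;
            if (fread(buf, 1, n, in) != n || fwrite(buf, 1, n, out) != n)
                return -1;
            left -= n;
        }

        /* The trailing marker must repeat the leading one. */
        if (fread(&tail, sizeof tail, 1, in) != 1 || tail != head)
            return -1;
        if (fwrite(&len32, sizeof len32, 1, out) != 1)
            return -1;
        records++;
    }
    return records;
}
```

Both streams should be opened in binary mode ("rb" and "wb"). If recompiling is an option, the -frecord-marker=8 flag mentioned above avoids converting the files at all.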

Create an array of values from different text files in C

I'm working in C on 64-bit Ubuntu 14.04.
I have a number of .txt files, each containing lines of floating point values (1 value per line). The lines represent parts of a complex sample, and they're stored as real(a1) \n imag(a1) \n real(a2) \n imag(a2), if that makes sense.
In a specific scenario there are 4 text files each containing 32768 samples (thus 65536 values), but I need to make the final version dynamic to accommodate up to 32 files (the maximum samples per file would not exceed 32768 though). I'll only be reading the first 19800 samples (depending on other things) though, since the entire signal is contained in those 39600 points (19800 samples).
A common abstraction is to represent the files / samples as a matrix, where columns represent return signals and rows represent the value of each signal at a sampling instant, up until the maximum duration.
What I'm trying to do is take the first sample from each return signal and move it into an array of double-precision floating point values to do some work on, move on to the second sample for each signal (which will overwrite the previous array) and do some work on them, and so forth, until the last row of samples have been processed.
Is there a way in which I can dynamically open files for each signal (depending on the number of pulses I'm using in that particular instance), read the first sample from each file into a buffer and ship that off to be processed? On the next iteration, the file pointers will all be aligned to the second sample; it would then move those into an array and ship it off again, until the desired number of samples (19800 in our hypothetical case) has been reached.
I can read samples just fine from the files using fscanf:
rx_length = 19800;
int x;
float buf;
double *range_samples = calloc(num_pulses, 2 * sizeof(*range_samples));
for (i=0; i < 2 * rx_length; i++){
x = fscanf(pulse_file, "%f", &buf);
*(range_samples) = buf;
}
All that needs to happen (in my mind) is that I need to cycle both sample# and pulse# (in that order), so when finished with one pulse it would move on to the next set of samples for the next pulse, and so forth. What I don't know how to do is to somehow declare file pointers for all return signal files, when the number of them can vary between calls (e.g. do the whole thing for 4 pulses, and on the next call it can be 16 or 64).
If there are any ideas / comments / suggestions I would love to hear them.
Thanks.
I would make the code you posted a function that takes an array of file names as an argument:
void doPulse( const char **file_names, const int size )
{
FILE *file = 0;
// declare your other variables
for ( int i = 0; i < size; ++i )
{
file = fopen( file_names[i], "r" );
// make sure file is open
// do the work on that file
fclose( file );
file = 0;
}
}
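If the files do need to stay open side by side, so that one sample can be taken from every pulse on each iteration, one option is an array of FILE pointers sized at run time. The following is a rough sketch; open_pulses, read_row and close_pulses are hypothetical names, and each file is assumed to hold one value per line in real/imaginary order as described in the question.

```c
#include <stdio.h>
#include <stdlib.h>

/* Open `n` files for reading. Returns a malloc'd array of n FILE pointers,
   or NULL if any file cannot be opened (files opened so far are closed). */
FILE **open_pulses(const char **file_names, size_t n)
{
    FILE **files = malloc(n * sizeof *files);
    if (files == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        files[i] = fopen(file_names[i], "r");
        if (files[i] == NULL) {
            while (i > 0)
                fclose(files[--i]);
            free(files);
            return NULL;
        }
    }
    return files;
}

/* Read the next complex sample (real line, then imaginary line) from every
   file into row[2*p] and row[2*p + 1]. Returns 0 on success, -1 if any
   file runs out of data or fails to parse. */
int read_row(FILE **files, size_t n, double *row)
{
    for (size_t p = 0; p < n; p++)
        if (fscanf(files[p], "%lf %lf", &row[2 * p], &row[2 * p + 1]) != 2)
            return -1;
    return 0;
}

/* Close every file and release the array. */
void close_pulses(FILE **files, size_t n)
{
    for (size_t i = 0; i < n; i++)
        fclose(files[i]);
    free(files);
}
```

A caller would open the files once, call read_row() rx_length times with a buffer of 2 * num_pulses doubles, process the buffer after each call, and finish with close_pulses().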
What you need is a generator. It would be reasonably easy in C++, but as you tagged C, I can imagine a function, taking a custom struct (the state of the object) as parameter. It could be something like (pseudo code) :
typedef struct GtorState {
char **files;
int filesIndex;
FILE *currentFile;
} GtorState;
void gtorInit(GtorState *state, char **files) {
// loads the array of file into state, set index to 0, and open first file
}
int nextValue(GtorState *state, double *real, double *imag) {
// read 2 values from currentFile and affect them to real and imag
// if eof, close currentFile and open files[++filesIndex]
// if real and imag were found returns 0, else 1 if eof on last file, 2 if error
}
Then your main program could contain:
GtorState state;
// initialize the list of files to process
gtorInit(&state, files);
double real, imag;
int cr;
while (0 == (cr = nextValue(&state, &real, &imag))) {
// process (real, imag)
}
if (cr == 2) {
// process (at least display) error
}
Alternatively, your main program could iterate over the values of the different files and call a processing function that keeps its own state, analogous to the generator above; at the end, that state is used to get the results.
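To make the generator idea concrete, here is one way the state struct and the two functions might be filled in. It is a sketch only: the fileCount member and the extra gtorInit parameter are additions that the pseudo code does not have (the generator needs to know where the list ends), and the files are consumed one after another, not in lockstep.

```c
#include <stdio.h>

typedef struct {
    const char **files;   /* names of the files to read, in order */
    size_t fileCount;     /* number of entries in files (added here) */
    size_t filesIndex;    /* index of the file currently being read */
    FILE *currentFile;    /* NULL when no file is open */
} GtorState;

/* Store the file list and open the first file, if there is one. */
void gtorInit(GtorState *state, const char **files, size_t fileCount)
{
    state->files = files;
    state->fileCount = fileCount;
    state->filesIndex = 0;
    state->currentFile = fileCount > 0 ? fopen(files[0], "r") : NULL;
}

/* Fetch the next (real, imag) pair.
   Returns 0 if a pair was read, 1 at the end of the last file, 2 on error. */
int nextValue(GtorState *state, double *real, double *imag)
{
    while (state->filesIndex < state->fileCount) {
        if (state->currentFile == NULL)
            return 2;                          /* fopen failed */
        int n = fscanf(state->currentFile, "%lf %lf", real, imag);
        if (n == 2)
            return 0;
        /* n == EOF with no stream error means a clean end of file. */
        int clean_eof = (n == EOF) && !ferror(state->currentFile);
        fclose(state->currentFile);
        state->currentFile = NULL;
        if (!clean_eof)
            return 2;                          /* malformed data or read error */
        if (++state->filesIndex < state->fileCount)
            state->currentFile = fopen(state->files[state->filesIndex], "r");
    }
    return 1;
}
```

The calling loop stays as shown above, apart from passing the number of files to gtorInit.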
Tried a slightly different approach and it's working really well.
Instead of reading from the different files each time I want to do something, I read the entire contents of each file into a 2D array range_phase_data[sample_number][pulse_number], and then access different parts of the array depending on which range bin I'm currently working on.
Here's an excerpt:
#define REAL(z,i) ((z)[2*(i)])
#define IMAG(z,i) ((z)[2*(i)+1])
for (i=0; i<rx_length; i++){
printf("\t[%s] Range bin %i. Samples %i to %i.\n", __FUNCTION__, i, 2*i, 2*i+1);
for (j=0; j<num_pulses; j++){
REAL(fft_buf, j) = range_phase_data[2*i][j];
IMAG(fft_buf, j) = range_phase_data[2*i+1][j];
}
printf("\t[%s] Range bin %i done, ready to FFT.\n", __FUNCTION__, i);
// do stuff with the data
}
This removes the need to dynamically allocate file pointers; instead it just opens the files one at a time and writes the data to the corresponding column in the matrix.
Cheers.
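The loading step itself is not shown in the excerpt. A minimal sketch of how one file could be read into one column might look like the following; it uses a flat row-major array in place of the 2D range_phase_data array, and load_column is a hypothetical name.

```c
#include <stdio.h>

/* Read up to n_rows values from f into column `col` of a row-major
   n_rows x n_cols matrix. Returns the number of values stored. */
size_t load_column(FILE *f, double *matrix, size_t n_rows, size_t n_cols,
                   size_t col)
{
    size_t r = 0;

    /* Element (r, col) of a row-major matrix lives at r * n_cols + col. */
    while (r < n_rows && fscanf(f, "%lf", &matrix[r * n_cols + col]) == 1)
        r++;
    return r;
}
```

With n_rows set to 2 * rx_length and n_cols to num_pulses, each pulse file would be opened, passed to load_column() with its own column index, and closed again before the next one is opened.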

Adding characters to the middle of a file without overwriting the existing characters in C

I am quite rusty with C and system calls and pointers in general, so this is a good refresher exercise to get back on track. All I need to do is, given a file such as this:
YYY.txt: "somerandomcharacters"
Change it to be like this:
YYY.txt: "somerandomabcdefghijklmnopqrstuvwxyzcharacters"
So all that is done is some characters added to the middle of the file. Obviously, this is quite simple, but in C you must keep track and manage the size of the file in advance before adding the additional characters.
Here is my naive try:
//(Assume a file called YYY.txt exists and an int YYY is the file descriptor.)
char ToBeInserted[26] = "abcdefghijklmnopqrstuvwxyz";
//Determine the current length of YYY
int LengthOfYYY = lseek(YYY, 0, 2);
if(LengthOfYYY < 0)
printf("Error upon using lseek to get length of YYY");
//Assume we want to insert at position 900 in YYY.txt, and length of YYY is over 1000.
//1.] Keep track of all characters past position 900 in YYY and store in a char array.
lseek(YYY, 900, 0); //Seeks to position 900 in YYY, so reading begins there.
char NextChar;
char EverythingPast900[LengthOfYYY-900];
int i = 0;
while(i < (LengthOfYYY - 900)) {
int NextRead = read(YYY, &NextChar, 1); //Puts next character from YYY in NextChar (read needs the address of the buffer)
EverythingPast900[i] = NextChar;
i++;
}
//2.] Overwrite what used to be at position 900 in YYY:
lseek(YYY, 900, 0); //Moves to position 900.
int WriteToYYY = write(YYY, ToBeInserted, sizeof(ToBeInserted));
if(WriteToYYY < 0)
printf("Error upon writing to YYY");
//3.] Move to position 900 + length of ToBeInserted, and write the characters that were saved.
lseek(YYY, 926, 0);
int WriteMoreToYYY = write(YYY, EverythingPast900, sizeof(EverythingPast900));
if (WriteMoreToYYY < 0) {
printf("Error writing the saved characters back into YYY.");
}
I think the logic is sound, mostly, although there are much better ways to do it in C. I need help on my C pointers, basically, as well as the UNIX system calls. Does anyone mind walking me through how to properly implement this in C?
That's the basic idea. If you had to really conserve RAM and the file was a lot bigger you'd want to copy block by block in reverse order. But the simpler way is to read the entire thing into memory and rewrite the entire file.
Also, I prefer the stream functions: fopen, fseek, fread. But the file descriptor method works.
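For comparison, here is roughly what the same idea could look like with the stream functions. This is a sketch of the simple in-memory variant (save the tail, write the new bytes, write the tail back), not the block-by-block reverse copy; insert_at is a hypothetical name.

```c
#include <stdio.h>
#include <stdlib.h>

/* Insert `len` bytes from `data` at byte offset `pos` of `f`, which must be
   opened in a read/write binary mode such as "rb+". The tail of the file is
   held in memory while it is shifted. Returns 0 on success, -1 on failure. */
int insert_at(FILE *f, long pos, const char *data, size_t len)
{
    /* Find the current length of the file. */
    if (fseek(f, 0, SEEK_END) != 0)
        return -1;
    long size = ftell(f);
    if (size < 0 || pos < 0 || pos > size)
        return -1;

    size_t tail_len = (size_t)(size - pos);
    char *tail = malloc(tail_len ? tail_len : 1);
    if (tail == NULL)
        return -1;

    /* Save everything after pos, then write the new bytes and the tail. */
    int ok = fseek(f, pos, SEEK_SET) == 0
          && fread(tail, 1, tail_len, f) == tail_len
          && fseek(f, pos, SEEK_SET) == 0
          && fwrite(data, 1, len, f) == len
          && fwrite(tail, 1, tail_len, f) == tail_len
          && fflush(f) == 0;

    free(tail);
    return ok ? 0 : -1;
}
```

For the example in the question this would be called as insert_at(f, 10, "abcdefghijklmnopqrstuvwxyz", 26) on a file holding "somerandomcharacters". Reading the whole tail in one fread also avoids the one-byte-at-a-time loop.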

Resources