Fragmenting and Un-fragmenting files in C

I wanted to take a file (text or binary) and fragment it into small pieces of a certain size (about 250-500kB), randomize the order of the fragments, and put it into another temporary fragmented file.
The un-fragmenting would then take the fragmented file, extract the pieces, put them in order and allow the original file to be intact.
This would be very easy for simple text-based ASCII files as you could use the C library functions (like sscanf) for formatting/parsing the information. The single file could then have a format like
(#### <fragment #> <fragment> ...)
However, I am not sure how one would do something like that with binary files.
I know one easy solution is to use separate files for the fragments like <.part1, .part2> files but this would be a bit ugly and wouldn't scale well to much larger files. It would be a lot better to just store it in one file.
Thanks a lot.

Doing this with binary files is easiest of all, and also the fastest and most reliable. Your fragment files need a simple segment record that gives you the offset in the original file and length of the segment. The record might look like this:
typedef struct _Fragment
{
unsigned long offset;
unsigned long length;
} Fragment;
Writing your file would go like this:
Fragment fragment;
FILE *outFile;
unsigned long segmentOffset, segmentLength;
char segmentData[MAXSEGMENTLENGTH];
outFile = fopen(fileName, "wb");
while (ReadNextSegment(segmentData, &segmentOffset, &segmentLength))
{
fragment.offset = segmentOffset;
fragment.length = segmentLength;
fwrite(&fragment, sizeof(fragment), 1, outFile); /* write the segment record first */
fwrite(segmentData, 1, segmentLength, outFile);
}
fclose(outFile);
Reassembling the file is accomplished by reversing the process. Read each Fragment record, then read the following data with fread using fragment.length, then position to the correct offset in the target file using the fseek function and fragment.offset, and then write it using fwrite.
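The reassembly steps described above can be sketched roughly as follows. This is only an illustration under a few assumptions: the function name reassemble is hypothetical, the Fragment record is written raw (so the fragment file is only readable on a machine with the same unsigned long size and byte order), and each segment is small enough to allocate in one piece.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    unsigned long offset;   /* position of the segment in the original file */
    unsigned long length;   /* number of data bytes that follow the record */
} Fragment;

/* Rebuild the original file into `out` from the fragment stream `in`.
   Returns 0 on success, -1 on a short read, a failed seek or a failed write.
   (reassemble is a hypothetical name used for this sketch.) */
int reassemble(FILE *in, FILE *out)
{
    Fragment fragment;
    assert(in != NULL && out != NULL);

    /* Each iteration consumes one record plus the data that follows it. */
    while (fread(&fragment, sizeof fragment, 1, in) == 1) {
        char *data = malloc(fragment.length ? fragment.length : 1);
        if (data == NULL)
            return -1;
        if (fread(data, 1, fragment.length, in) != fragment.length ||
            fseek(out, (long)fragment.offset, SEEK_SET) != 0 ||
            fwrite(data, 1, fragment.length, out) != fragment.length) {
            free(data);
            return -1;
        }
        free(data);
    }
    return ferror(in) ? -1 : 0;
}
```

Because every record carries its own offset, the fragments can be stored in any order and the loop still puts each one in the right place.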

Try to use binary data only. In your fragmented file, follow the structure:
OFFSET SIZE DESCRIPTION
0 4 BLOCK NUMBER
4 4 BLOCK SIZE IN BYTES
8 ? BLOCK DATA
Define a header structure:
typedef struct hdr
{
uint32_t number;
uint32_t size;
} hdr_t;
Code to work with it can look like:
void file_append(FILE *f, size_t block, size_t size, const void *data)
{
hdr_t hdr;
hdr.number = block;
hdr.size = size;
fwrite(&hdr, sizeof(hdr), 1, f);
fwrite(data, size, 1, f);
}
And reading the data:
void file_read_chunk(FILE *f, size_t *block, size_t *size, void **data)
{
hdr_t hdr;
fread(&hdr, sizeof(hdr), 1, f); /* fread takes an element size and an element count */
*block = hdr.number;
*size = hdr.size;
*data = malloc(hdr.size);
fread(*data, hdr.size, 1, f);
}
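The functions above write the header struct as raw memory and ignore the return values of fread and fwrite. A slightly more defensive variant might look like the sketch below. It assumes the 4-byte fields are stored little-endian (the layout table above does not specify a byte order), and the names put_u32le, get_u32le, chunk_write and chunk_read are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Store a 32-bit value as 4 little-endian bytes, independent of host order. */
static void put_u32le(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Read back a value stored by put_u32le. */
static uint32_t get_u32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Append one block: 8-byte header (number, size) followed by the data.
   Returns 0 on success, -1 if a write fails. */
int chunk_write(FILE *f, uint32_t number, uint32_t size, const void *data)
{
    unsigned char hdr[8];
    assert(f != NULL);
    put_u32le(hdr, number);
    put_u32le(hdr + 4, size);
    if (fwrite(hdr, 1, sizeof hdr, f) != sizeof hdr)
        return -1;
    if (size != 0 && fwrite(data, 1, size, f) != size)
        return -1;
    return 0;
}

/* Read the next block. On success returns 0 and the caller frees *data.
   Returns 1 at a clean end of file and -1 on a truncated or failed read. */
int chunk_read(FILE *f, uint32_t *number, uint32_t *size, void **data)
{
    unsigned char hdr[8];
    size_t got;
    assert(f != NULL);
    got = fread(hdr, 1, sizeof hdr, f);
    if (got == 0 && feof(f))
        return 1;
    if (got != sizeof hdr)
        return -1;
    *number = get_u32le(hdr);
    *size = get_u32le(hdr + 4);
    *data = malloc(*size ? *size : 1);
    if (*data == NULL)
        return -1;
    if (fread(*data, 1, *size, f) != *size) {
        free(*data);
        *data = NULL;
        return -1;
    }
    return 0;
}
```

To un-fragment, one would call chunk_read until it returns 1 and write each block at position number times the block size in the output file, assuming every block except the last has the same size.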

Related

How can I read a large set of data from a file into either a pointer to a structure or array of a structure in C

I have a data file with a known key, that is, it has many entries (devices) with the same properties and I have this structure in code to capture it.
struct deviceData{
int id;
char serial[10];
float temperature;
float speed;
long timestamp;
};
struct deviceData fileItems;
It's 4 bytes for the ID, 10 bytes for the serial code, 4 bytes for both the temperature and speed and 8 bytes for the timestamp. 30 bytes in total.
What I would like to achieve is to be able to read all those entries and run a calculation in the quickest way I can.
What I initially thought of doing was to simply create a giant array to capture all the entries but that causes errors.
Secondly I thought of allocating space from a pointer to that structure and reading the whole file to that. That worked in execution but I had trouble processing the data. Possibly a gap in fundamentals on my part.
The way I'm currently looking at is to loop through readings where I capture a single entry using fread(), process that and then move the file to put the next entry into the buffer.
Something like this:
fread(&fileItems, 30, 1, filename)
What happens though is that when I view what actually gets read I see that the ID and the serial code were read correctly but the following data points are garbage. Reading a little bit about it I came across something about padding which I don't fully understand but the fix seems to be to make my char array 100 which seems to work for the first entry but I suspect it's causing problems with subsequent readings because it's throwing my calculations off.
I'm kind of at a wall here because every strategy I try seems to have something that works strangely. If I could at least be pointed in the right direction I'll at least know I'm putting effort in the right thing.
If you are just wanting to process a group of data record by record, you probably can utilize the methodology of defining a structure, then reading the data from a file into the structure, and then processing the data. To make things consistent, it would make sense to store the data as a structure in a binary file. Following is a code snippet creating some sample data and then processing it following the spirit of your project.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct deviceData{
int id;
char serial[10];
float temperature;
float speed;
long timestamp;
};
struct deviceData fileItems;
void create_data(char * file_name)
{
FILE* fp = fopen(file_name, "wb");
if(!fp)
{
printf("\n\tFile open error\n");
return;
}
for (int i = 0; i < 10; i++)
{
fileItems.id = i + 20000 + i * 3;
sprintf(fileItems.serial, "SN%d", fileItems.id);
fileItems.temperature = 166.0 + i;
fileItems.speed = 2400.0;
fileItems.timestamp = 20220830;
fwrite(&fileItems, sizeof(struct deviceData), 1, fp);
}
fclose(fp);
}
void read_data(char *file_name)
{
FILE* fp = fopen(file_name, "rb");
if(!fp)
{
printf("\n\tFile open error\n");
return;
}
while(fread(&fileItems, sizeof(struct deviceData), 1, fp))
{
printf("ID. . . . . . .: %d\n", fileItems.id);
printf("Serial number. .: %s\n", fileItems.serial);
printf("Temperature. . .: %f\n", fileItems.temperature);
printf("Speed. . . . . .: %f\n", fileItems.speed);
printf("Timestamp. . . .: %ld\n", fileItems.timestamp);
}
fclose(fp); /* release the file handle once all records are read */
}
int main()
{
create_data("device.dat"); /* Create some sample data to be read */
read_data("device.dat"); /* Read and print the data to the terminal */
return 0;
}
The storage of device data probably would occur in some other monitor program, but for this snippet a one-off function was included to produce some data to process.
Analyze that code and see if it meets the spirit of your project.
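If the data file already exists as packed 30-byte records, as the question describes, reading it straight into the struct goes wrong because of padding. On a typical 64-bit Linux compiler the struct in the question is laid out with 2 unused bytes after serial[10] so that the floats start on a 4-byte boundary, which makes sizeof(struct deviceData) 32 rather than 30. That matches the symptom: id and serial come out right and everything after them is shifted. A minimal sketch of one way around it is to read the 30 raw bytes and copy each field out by its file offset. The sketch assumes the file uses the same byte order and float format as the machine reading it, uses fixed-width types in place of int and long, and the names RECORD_SIZE, decode_record and read_record are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RECORD_SIZE 30   /* 4 + 10 + 4 + 4 + 8, as described in the question */

struct deviceData {
    int32_t id;
    char serial[11];       /* 10 bytes from the file plus a terminating NUL */
    float temperature;
    float speed;
    int64_t timestamp;
};

/* Copy each field out of one packed 30-byte record. Using memcpy with explicit
   offsets avoids any assumption about padding inside the struct. */
void decode_record(const unsigned char *raw, struct deviceData *out)
{
    assert(raw != NULL && out != NULL);
    memcpy(&out->id, raw, 4);
    memcpy(out->serial, raw + 4, 10);
    out->serial[10] = '\0';
    memcpy(&out->temperature, raw + 14, 4);
    memcpy(&out->speed, raw + 18, 4);
    memcpy(&out->timestamp, raw + 22, 8);
}

/* Read the next record from fp. Returns 1 on success, 0 when no complete
   record is left. */
int read_record(FILE *fp, struct deviceData *out)
{
    unsigned char raw[RECORD_SIZE];
    if (fread(raw, RECORD_SIZE, 1, fp) != 1)
        return 0;
    decode_record(raw, out);
    return 1;
}
```

A loop such as while (read_record(fp, &item)) then processes one entry at a time without ever depending on sizeof(struct deviceData).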

Quantization noise during writing wav file using C

I'm writing a library to read and write WAV files (I need it for audio processing). As a test, I read samples from a WAV file, convert them to double (normalized to the range -1 to 1), and then do nothing but convert them back to integers according to the bits per sample (for a WAV file with N bits per sample, I divide by 2^(N-1)-1 and multiply by the same factor afterwards to restore the value).
The problem is that the resulting WAV file has background noise (it sounds like quantization noise) and I don't know why. Can you help me find the cause?
the library is here: https://pastebin.com/mz5TWMPN
the header file is: https://pastebin.com/Lr2tbmnv
and a demo main function is like:
#include <stdio.h>
#include <math.h>
#include "wavreader.h"
#define FRAMESIZE 512
int main()
{
FILE *fh;
FILE *fhWrite;
struct WavHeader * header;
struct WavHeader * newHeader;
double frame[FRAMESIZE];
int iBytesWritten;
int i;
char test;
fh = fopen("D:/ArbeitsOrdner/advanced_pacev/AudioSample/spfg.wav", "rb+");
if (fh == NULL)
{
printf("Failed to open organ.wav\n");
return 1;
}
fhWrite = fopen("D:/ArbeitsOrdner/MyC/test_organ.wav", "wb+");
if (fhWrite == NULL)
{
printf("Failed to create test_organ.wav\n");
return 1;
}
header = readWaveHeader(fh);
printWaveHeader(header);
newHeader = createWaveHeader(header->iChannels, header->iSampleRate, header->iBitsPerSample);
WaveWriteHeader(fhWrite, newHeader);
while (WaveReadFrame(fh, header, FRAMESIZE, frame) != -1)
{
iBytesWritten = WaveWriteFrame(fhWrite, newHeader, FRAMESIZE, frame);
if (iBytesWritten < 0)
{
printf("Error occurred while writing to new file\n");
return 1;
}
}
WaveWriteHeader(fhWrite, newHeader);
fclose(fhWrite);
fclose(fh);
return 0;
}
Thanks for viewing this post. I found the problem myself: I used char instead of unsigned char for the raw data (raw bytes). When converting them to int16 or int32, I had not considered the sign bit, so the values were not what I expected after conversion.
The solution is:
either stay with signed char and use:
buffer[i] & 0xff
to get the correct raw data for conversion, or change the char types to unsigned char:
unsigned char * buffer;
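To illustrate the fix, here is a minimal sketch of combining two raw bytes into a 16-bit sample both ways. It assumes little-endian sample data, which is what WAV files use. The function names are hypothetical, and the pastebin library itself is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Turn a 16-bit pattern into a signed sample without relying on
   implementation-defined narrowing conversions. */
static int16_t to_signed16(uint16_t u)
{
    return (u >= 0x8000u) ? (int16_t)((int)u - 0x10000) : (int16_t)u;
}

/* Variant 1: the raw buffer is unsigned char, so each byte is already in the
   range 0..255 when it is promoted to int. */
int16_t sample_from_bytes(const unsigned char *p)
{
    assert(p != 0);
    return to_signed16((uint16_t)(p[0] | (p[1] << 8)));
}

/* Variant 2: the raw buffer stays plain char. Masking with 0xff removes the
   sign extension that would otherwise turn a byte like 0x80 into 0xFFFFFF80
   and overwrite the high byte when the two halves are OR-ed together. */
int16_t sample_from_chars(const char *p)
{
    assert(p != 0);
    return to_signed16((uint16_t)((p[0] & 0xff) | ((p[1] & 0xff) << 8)));
}
```

For the byte pair 0x80, 0x01 both variants give 384 (0x0180). Without the mask, on a platform where char is signed, the low byte would be promoted to -128 and the result would be wrong, which is heard as noise in the output.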

Encoding a wav file using G711 encoding

I am trying to encode a PCM uncompressed Wav file using A law encoding.
I have written a function which takes in the 16-bit PCM data and returns 8-bit encoded data. After encoding, my file does not play properly. I feel that there is something I am not doing correctly to handle the files. I have separated the header information of the file and written the same header to the output file.
// Code for compressing data is below
short inbuff;
unsigned char outbuff;
while (!feof(inp))
{
fread(inbuff, 2 , BUFSIZE, inp);
for (i=0; i < BUFSIZE; ++i)
{
temp_16 = inbuff[i];
temp_8 = Lin2Alaw(temp_16);
outbuff[i] = temp_8;
}
fwrite(outbuff, 1 , (BUFSIZE), out);
}
You are writing the data with the same header, which means that any audio program will think the data inside the WAV file is still PCM. Check the file format for WAV and change it accordingly.
Mainly you need to change audio format at 0x0014-0x0015 to a-law and other values also to mark the proper bytes per second, block size etc.
Easiest way to make sure they're correct might be to convert the file with an audio editor and then checking for the differences in the values.
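As a rough illustration of which header fields change, the sketch below patches a header in memory. It assumes the input uses the common 44-byte layout (a 16-byte fmt chunk immediately followed by the data chunk) and 16-bit PCM samples; real files can carry extra chunks, in which case the fixed offsets do not apply. Strictly, non-PCM formats are also expected to carry an extended fmt chunk and a fact chunk, which this sketch leaves out. The helper names are hypothetical. The value 6 is the registered WAVE format tag for A-law.

```c
#include <assert.h>
#include <stdint.h>

#define WAVE_FORMAT_ALAW 0x0006   /* registered format tag for A-law */

/* Little-endian field access, since WAV headers are little-endian. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void wr16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)(v >> 8);
}

static void wr32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Rewrite a 44-byte 16-bit PCM header in place so that it describes the same
   audio encoded as 8-bit A-law. */
void wav_header_to_alaw(unsigned char hdr[44])
{
    uint16_t channels = rd16(hdr + 22);
    uint32_t rate = rd32(hdr + 24);
    uint32_t samples;

    assert(rd16(hdr + 34) == 16);          /* sketch only handles 16-bit input */
    samples = rd32(hdr + 40) / 2;          /* one output byte per 16-bit sample */

    wr16(hdr + 20, WAVE_FORMAT_ALAW);      /* audio format at 0x14 */
    wr32(hdr + 28, rate * channels);       /* bytes per second */
    wr16(hdr + 32, channels);              /* block align: 1 byte per channel */
    wr16(hdr + 34, 8);                     /* bits per sample */
    wr32(hdr + 40, samples);               /* data chunk size in bytes */
    wr32(hdr + 4, 36 + samples);           /* RIFF size: file size minus 8 */
}
```

Comparing the result against a file converted by an audio editor, as suggested above, is still a good way to confirm the values.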
How did your code even compile when you are not using arrays? Even so, your use of feof isn't good, please see Why is “while ( !feof (file) )” always wrong?
#include <stdio.h>
#define BUFSIZE 512
int main(void) {
short inbuff[BUFSIZE]; // must be an array
unsigned char outbuff[BUFSIZE]; // must be an array
size_t bread, i;
unsigned char temp_8;
short temp_16;
FILE *inp, *out;
// ... open the file
// ... transcribe the header
// rewrite the data
while ((bread = fread(inbuff, 2 , BUFSIZE, inp)) > 0)
{
for (i=0; i < bread; ++i) // only the data actually read
{
temp_16 = inbuff[i];
temp_8 = Lin2Alaw(temp_16);
outbuff[i] = temp_8;
}
fwrite(outbuff, 1 , bread, out); // only the data actually read
}
// ... finish off and close the file
return 0;
}
I notice too you are using signed short for the 16-bit data - should that be unsigned short?
The format of a WAV file is described at
http://www.topherlee.com/software/pcm-tut-wavformat.html
Check all the bytes of the header and make sure that the information about bit rate, sample rate, etc. is correct.
If your compression code is correct, then the issue should be with the header only.

On Linux, the zlib compress() and uncompress() functions sometimes return Z_BUFFER_ERROR

I want to test the compress() and uncompress() functions provided by the zlib library. I wrote the following code, which opens an existing file and reads it inside a while() loop; the compressed output is written to one file and the uncompressed output to another. The existing file (originalFile) is about 78K. On the first pass through the while() loop, both compress() and uncompress() return 0, so the first pass succeeds. On the second and following passes, the return value is -5 (which, according to the official documentation, means the output buffer is not large enough to hold the content). Why? What is wrong? Thank you very much in advance!
#include <string>
#include <time.h>
#include <stdio.h>
#include <iostream>
#include <string.h>
#include "zlib.h"
int main()
{
unsigned long int fileLength;
unsigned long int readLength;
unsigned long int compressBufLength;
unsigned long int uncompressLength;
unsigned long int offset;
unsigned char *readBuf = new unsigned char[512];//the readbuf of the exist file content
unsigned char *compressBuf = new unsigned char[512];//the compress buffer
unsigned char *uncompressBuf = new unsigned char[512];//the uncompress content buffer
FILE *originalFile = fopen("/lgw150/temp/src/lg4/original.lg4","a+");//the exist file
FILE *compressedFile = fopen("/lgw150/temp/src/lg4/compressed.lg4","a+");//compressfile
FILE *uncompressFile = fopen("/lgw150/temp/src/lg4/uncompressed.lg4","a+");//
fseek(originalFile,0,2);
fileLength = ftell(originalFile);
offset = 0;//
while(offset <fileLength)//
{
printf("offset=%lu;fileLength=%lu\n",offset,fileLength);
memset(readBuf,0,512);
memset(compressBuf,0,512);
memset(uncompressBuf,0,512);
fseek(originalFile,offset,0);//
readLength = fread(readBuf,sizeof(char),512,originalFile);
offset += readLength;//
int compressValue = compress(compressBuf,&compressBufLength,readBuf,readLength);
int fwriteValue = fwrite(compressBuf,sizeof(char),compressBufLength,compressedFile);//
printf("compressValue = %d;fwriteLength = %d;compressBufLength=%lu;readLength = %lu\n",compressValue,fwriteValue,compressBufLength,readLength);
int uncompressValue = uncompress(uncompressBuf,&uncompressLength,compressBuf,compressBufLength);//
int fwriteValue2= fwrite(uncompressBuf,sizeof(char),uncompressLength,uncompressFile);//
}
fseek(originalFile,0,0);
fseek(compressedFile,0,0);
fseek(uncompressFile,0,0);
if(originalFile != NULL)
{
fclose(originalFile);
originalFile = NULL;
}
if(compressedFile != NULL)
{
fclose(compressedFile);
compressedFile = NULL;
}
if(uncompressFile != NULL)
{
fclose(uncompressFile);
uncompressFile = NULL;
}
delete[] readBuf;
delete[] compressBuf;
delete[] uncompressBuf;
return 0;
}
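First off, the reason you're getting "buffered output size is not large enough to contain the content" is because the buffered output size is not large enough to contain the content. If you give incompressible data to compress it will expand the data. So 512 bytes is not large enough if the input is 512 bytes. Use the compressBound() function for the maximum expansion for sizing the compression output buffer. Note also that the destLen argument of compress() and uncompress() is both an input and an output: on entry it must hold the size of the destination buffer, and the code above never sets compressBufLength or uncompressLength before the calls.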
Second, compressing 512 bytes at a time is silly. You're not giving the compression algorithm enough data to work with in order to get the mileage you should be getting from the compression. Your application of reading 512 byte chunks at a time should not be using compress() and uncompress(). You should be using deflate() and inflate(), which were written for this purpose -- to feed chunks of data through the compression and decompression engines.
You need to read zlib.h. All of it. You can also look at the example (after reading zlib.h).

How to get the uncompressed size of an LZMA2 file (.xz / liblzma)

I'm looking for a way to get the uncompressed stream size of an LZMA2 / .xz file compressed with the xz utility.
I'm using liblzma from Windows/Linux for this task, so I guess I'm looking for some C/C++ API in liblzma that will do the trick.
I think I've found a solution.
This is a very crude code sample, but seems to work fine.
I'm assuming I have a do_mmap() function that maps the entire file as read-only into memory, and returns the total size mapped.
This can naturally be adapted to use read/fread/ReadFile or any other File API.
extern size_t get_uncompressed_size(const char *filename)
{
lzma_stream_flags stream_flags;
int file_size;
const uint8_t *data = (uint8_t *) do_mmap(filename, &file_size);
// 12 is the size of the footer per the file-spec...
const uint8_t *footer_ptr = data + file_size - 12;
// Something is terribly wrong
if (footer_ptr < data) {
do_unmap((void *)data, file_size);
return -1;
}
// Decode the footer, so we have the backward_size pointing to the index
lzma_stream_footer_decode(&stream_flags, (const uint8_t *)footer_ptr);
// This is the index pointer, where the size is ultimately stored...
const uint8_t *index_ptr = footer_ptr - stream_flags.backward_size;
// Allocate an index
lzma_index *index = lzma_index_init(NULL);
uint64_t memlimit = UINT64_MAX; // memlimit is read by the decoder, so it must be initialized
size_t in_pos = 0;
// decode the index we calculated
lzma_index_buffer_decode(&index, &memlimit, NULL, index_ptr, &in_pos, footer_ptr - index_ptr);
// Just make sure the whole index was decoded, otherwise, we might be
// dealing with something utterly corrupt
if (in_pos != stream_flags.backward_size) {
do_unmap((void *)data, file_size);
lzma_index_end(index, NULL);
return -1;
}
// Finally get the size
lzma_vli uSize = lzma_index_uncompressed_size(index);
lzma_index_end(index, NULL);
do_unmap((void *)data, file_size); // release the mapping on the success path too
return (size_t) uSize;
}
Having downloaded the source from sourceforge and had a look here, I quoted this from the main header file LzmaLib.h
/*
LzmaUncompress
--------------
In:
dest - output data
destLen - output data size
src - input data
srcLen - input data size
Out:
destLen - processed output size
srcLen - processed input size
Returns:
SZ_OK - OK
SZ_ERROR_DATA - Data error
SZ_ERROR_MEM - Memory allocation arror
SZ_ERROR_UNSUPPORTED - Unsupported properties
SZ_ERROR_INPUT_EOF - it needs more bytes in input buffer (src)
*/
MY_STDAPI LzmaUncompress(unsigned char *dest, size_t *destLen, const unsigned char *src, SizeT *srcLen,
const unsigned char *props, size_t propsSize);
It looks like destLen, on output, is the size of the uncompressed data.

Resources