Filling a 2GiB file with 0s in C

I am about to do some data processing in C, and the processing part is working logically, but I am having a strange file problem. I conveniently have 32-bits of numbers to consider, so I need a file of 32-bits of 0s, and then I will change the 0 to 1 if something exists in a finite field.
My question is: What is the best way to make a file with all "0s" in C?
What I am currently doing seems to make sense but is not working. I am doing the following, and it doesn't stop at the 2.4GiB mark. I have no idea what's wrong or whether there's a better way.
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
typedef uint8_t u8;
typedef uint32_t u32;
int main (int argc, char **argv) {
u32 l_counter32 = 0;
u8 l_ubyte = 0;
FILE *f_data;
f_data = fopen("file.data", "wb+");
if (f_data == NULL) {
printf("file error\n");
return(0);
}
for (l_counter32 = 0; l_counter32 <= 0xfffffffe; l_counter32++) {
fwrite(&l_ubyte, sizeof(l_ubyte), 1, f_data);
}
fwrite(&l_ubyte, sizeof(l_ubyte), 1, f_data); //final byte at 0xffffffff
fclose(f_data);
}
I increment my counter in the loop up to 0xFFFFFFFE so that it doesn't wrap around and run forever. I haven't actually waited for it to stop; I just keep checking the file on disk via ls -alF, and when it's larger than 2.4GiB, I stop it. I checked sizeof(l_ubyte), and it is indeed 8 bits.
I feel that I must be missing some mundane detail.

Your loop counts up to 0xffffffff (4,294,967,295) and writes one byte per iteration, so the program is on its way to producing a 4 GiB file. For exactly 2 GiB of data, count up to 0x80000000 (2,147,483,648) instead.
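As a rough illustration of the corrected count, the sketch below writes a given number of zero bytes in 64 KiB chunks rather than one byte per fwrite() call. The function name write_zeros and the chunk size are assumptions made for this example; they are not part of the original program.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Write `total` zero bytes to `path` in 64 KiB chunks.
 * Returns 0 on success, -1 on any I/O error.
 * write_zeros and the chunk size are illustrative choices. */
int write_zeros(const char *path, uint64_t total)
{
    static const unsigned char zeros[65536]; /* static storage starts out zeroed */
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "wb");
    if (f == NULL)
        return -1;

    while (total > 0) {
        /* Write a full chunk, or whatever is left for the last one. */
        size_t chunk = total < sizeof zeros ? (size_t)total : sizeof zeros;
        if (fwrite(zeros, 1, chunk, f) != chunk) {
            fclose(f);
            return -1;
        }
        total -= chunk;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Calling write_zeros("file.data", UINT64_C(0x80000000)) would request exactly 2 GiB. Whether stdio can actually write past the 2 GiB mark depends on the platform's large-file support.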

A faster way to create a file initialized with zeroes (\0 null bytes) is to use truncate()/ftruncate(); see the truncate(2) man page.
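A minimal POSIX sketch of the ftruncate() approach might look like the following. The helper name make_zero_file and the 0644 permissions are assumptions for illustration. On POSIX systems the bytes added when a file is extended this way read back as zero, and the file system may store the result as a sparse file.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or reset) `path` so that it is exactly `size` bytes long.
 * Returns 0 on success, -1 on error.
 * make_zero_file is a hypothetical helper name. */
int make_zero_file(const char *path, off_t size)
{
    int fd;

    assert(path != NULL && size >= 0);
    /* O_TRUNC drops any old content, so every byte comes from the extension. */
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, size) == -1) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

For a 2 GiB file the size would be (off_t)1 << 31, which assumes a 64-bit off_t; on some 32-bit systems that requires building with large-file support enabled.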


How can I read a large set of data from a file into either a pointer to a structure or array of a structure in C

I have a data file with a known key, that is, it has many entries (devices) with the same properties and I have this structure in code to capture it.
struct deviceData{
int id;
char serial[10];
float temperature;
float speed;
long timestamp;
};
struct deviceData fileItems;
It's 4 bytes for the ID, 10 bytes for the serial code, 4 bytes for both the temperature and speed and 8 bytes for the timestamp. 30 bytes in total.
What I would like to achieve is to be able to read all those entries and run a calculation in the quickest way I can.
What I initially thought of doing was to simply create a giant array to capture all the entries but that causes errors.
Secondly I thought of allocating space from a pointer to that structure and reading the whole file to that. That worked in execution but I had trouble processing the data. Possibly a gap in fundamentals on my part.
The way I'm currently looking at is to loop through readings where I capture a single entry using fread(), process that and then move the file to put the next entry into the buffer.
Something like this:
fread(&fileItems, 30, 1, filename)
What happens, though, is that when I view what actually gets read, the ID and the serial code are correct but the following data points are garbage. Reading a little about it, I came across something about padding, which I don't fully understand. The fix seems to be to make my char array 100, which works for the first entry, but I suspect it is causing problems with subsequent readings because it throws my calculations off.
I'm kind of at a wall here because every strategy I try seems to have something that works strangely. If I could at least be pointed in the right direction I'll at least know I'm putting effort in the right thing.
If you just want to process the data record by record, you can define a structure, read each record from the file into that structure, and then process it. For this to be consistent, the data should be stored in the binary file as that same structure and sized with sizeof(struct deviceData) rather than a hand-computed byte count, because the compiler may insert padding between members. The following snippet creates some sample data and then processes it in the spirit of your project.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct deviceData{
int id;
char serial[10];
float temperature;
float speed;
long timestamp;
};
struct deviceData fileItems;
void create_data(char * file_name)
{
FILE* fp = fopen(file_name, "wb");
if(!fp)
{
printf("\n\tFile open error\n");
return;
}
for (int i = 0; i < 10; i++)
{
fileItems.id = i + 20000 + i * 3;
snprintf(fileItems.serial, sizeof fileItems.serial, "SN%d", fileItems.id); /* never write past serial[10] */
fileItems.temperature = 166.0 + i;
fileItems.speed = 2400.0;
fileItems.timestamp = 20220830;
fwrite(&fileItems, sizeof(struct deviceData), 1, fp);
}
fclose(fp);
}
void read_data(char *file_name)
{
FILE* fp = fopen(file_name, "rb");
if(!fp)
{
printf("\n\tFile open error\n");
return;
}
while(fread(&fileItems, sizeof(struct deviceData), 1, fp) == 1) /* stop at end of file or on a short read */
{
printf("ID. . . . . . .: %d\n", fileItems.id);
printf("Serial number. .: %s\n", fileItems.serial);
printf("Temperature. . .: %f\n", fileItems.temperature);
printf("Speed. . . . . .: %f\n", fileItems.speed);
printf("Timestamp. . . .: %ld\n", fileItems.timestamp);
}
fclose(fp); /* release the file once every record has been read */
}
int main()
{
create_data("device.dat"); /* Create some sample data to be read */
read_data("device.dat"); /* Read and print the data to the terminal */
return 0;
}
The storage of device data probably would occur in some other monitor program, but for this snippet a one-off function was included to produce some data to process.
Analyze that code and see if it meets the spirit of your project.
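To see the padding that the question ran into, the following sketch prints where each member of the structure actually sits, using offsetof and sizeof. The helper names packed_size and print_layout are made up for this illustration, and the exact numbers are implementation-defined, so none are hard-coded here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct deviceData {
    int id;
    char serial[10];
    float temperature;
    float speed;
    long timestamp;
};

/* Sum of the member sizes: the record size if there were no padding at all. */
size_t packed_size(void)
{
    return sizeof(int) + 10 * sizeof(char) + 2 * sizeof(float) + sizeof(long);
}

/* Print the offset of every member and the total size of the struct. */
void print_layout(void)
{
    /* Padding can only ever add bytes, never remove them. */
    assert(sizeof(struct deviceData) >= packed_size());

    printf("id          at offset %zu\n", offsetof(struct deviceData, id));
    printf("serial      at offset %zu\n", offsetof(struct deviceData, serial));
    printf("temperature at offset %zu\n", offsetof(struct deviceData, temperature));
    printf("speed       at offset %zu\n", offsetof(struct deviceData, speed));
    printf("timestamp   at offset %zu\n", offsetof(struct deviceData, timestamp));
    printf("sizeof(struct deviceData) = %zu, sum of members = %zu\n",
           sizeof(struct deviceData), packed_size());
}
```

If the two sizes printed on the last line differ, a hard-coded record length such as 30 in fread() will not match the in-memory layout, which would explain garbage in the members after serial.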

Dumping an INT32 array into a .bin file

I have the array defined as below
INT32 LUT_OffsetValues[6][12] = {
0,180,360,540,720,900,1080,1260,1440,1620,1800,1980,
2160,2340,2520,2700,2880,3060,3240,3420,3600,3780,3960,4140,
4320,4500,4680,4860,5040,5220,5400,5580,5760,5940,6120,6300,
6480,6660,6840,7020,7200,7380,7560,7740,7920,8100,8280,8460,
8640,8820,9000,9180,9360,9540,9720,9900,10080,10260,10440,10620,
10800,10980,11160,11340,11520,11700,11880,12060,12240,12420,12600,12780
};
int main(int argc,char *argv[])
{
int var_row_index = 4 ;
int var_column_index = 5 ;
int computed_val = 0 ;
FILE *fp = NULL ;
fp = fopen("./LUT_Offset.bin","wb");
if(NULL != fp)
{
fwrite(LUT_OffsetValues,sizeof(INT32),72,fp);
fclose(fp);
}
printf("Size of Array:%zu\n",sizeof(LUT_OffsetValues)); /* sizeof yields a size_t, so use %zu */
//computed_val = LUT_OffsetValues[var_row_index][var_column_index];
return 0;
}
Above is the code snippet with which I have generated the .bin file. Is that the right way of doing it?
No, it is not the right way if you plan to transfer the file to a different machine and read it there, because you haven't considered endianness. Say the file is:
Written on a little-endian machine but read on a big-endian machine
Written on a big-endian machine but read on a little-endian machine
It won't work in either of the cases above.
Apart from the byte-order issue pointed out by askinoor, this approach is not generic, because the reader has to know that the data is an INT32[6][12] when reading it.
Why are the unused variables var_row_index etc. in your program?
As already mentioned, when serializing data in and out of the CPU it is preferable to force network byte order. This can be done easily using functions like htonl(), which should be available on most platforms (and compile down to nothing on big endian machines).
Here's the doc from Linux:
https://linux.die.net/man/3/htonl
Also, it's not good practice to explicitly code sizes and types into your program.
Use sizeof(array[0][0]) to get the size of the element type of array, then iterate over it and use htonl() to write each element to the file.
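A possible shape for that loop is sketched below. It assumes a POSIX system (for <arpa/inet.h>) and uses int32_t as a stand-in for the INT32 type from the question; the helper names write_be32 and read_be32 are invented for this example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Write `count` 32-bit values to `fp` in network (big-endian) byte order.
 * Returns the number of elements actually written. */
size_t write_be32(FILE *fp, const int32_t *values, size_t count)
{
    size_t i;

    assert(fp != NULL && values != NULL);
    for (i = 0; i < count; i++) {
        uint32_t be = htonl((uint32_t)values[i]); /* host order -> network order */
        if (fwrite(&be, sizeof be, 1, fp) != 1)
            break;
    }
    return i;
}

/* Read `count` big-endian 32-bit values back into host byte order.
 * Returns the number of elements actually read. */
size_t read_be32(FILE *fp, int32_t *values, size_t count)
{
    size_t i;

    assert(fp != NULL && values != NULL);
    for (i = 0; i < count; i++) {
        uint32_t be;
        if (fread(&be, sizeof be, 1, fp) != 1)
            break;
        values[i] = (int32_t)ntohl(be); /* network order -> host order */
    }
    return i;
}
```

With the array from the question, the element count would be sizeof(LUT_OffsetValues) / sizeof(LUT_OffsetValues[0][0]) instead of a literal 72, and the pointer passed in would be &LUT_OffsetValues[0][0]. The reader still needs to know the 6 by 12 shape unless the dimensions are also written to the file.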

Reading serial port faster

I have computer software that sends RGB color codes to an Arduino over USB. It works fine when they are sent slowly, but when tens of them are sent every second it freaks out. What I think happens is that the Arduino serial buffer fills up so quickly that the processor can't handle it the way I'm reading it.
#define INPUT_SIZE 11
void loop() {
if(Serial.available()) {
char input[INPUT_SIZE + 1];
byte size = Serial.readBytes(input, INPUT_SIZE);
input[size] = 0;
int channelNumber = 0;
char* channel = strtok(input, " ");
while(channel != 0) {
color[channelNumber] = atoi(channel);
channel = strtok(0, " ");
channelNumber++;
}
setColor(color);
}
}
For example the computer might send 255 0 123 where the numbers are separated by space. This works fine when the sending interval is slow enough or the buffer is always filled with only one color code, for example 255 255 255 which is 11 bytes (INPUT_SIZE). However if a color code is not 11 bytes long and a second code is sent immediately, the code still reads 11 bytes from the serial buffer and starts combining the colors and messes them up. How do I avoid this but keep it as efficient as possible?
It is not a matter of reading the serial port faster, it is a matter of not reading a fixed block of 11 characters when the input data has variable length.
You are telling it to read until 11 characters are received or the timeout occurs, but if the first group is fewer than 11 characters, and a second group follows immediately there will be no timeout, and you will partially read the second group. You seem to understand that, so I am not sure how you conclude that "reading faster" will help.
Using your existing data encoding of ASCII decimal space-delimited triplets, one solution would be to read the input one character at a time until the entire triplet has been read. However, you could more simply use the Arduino readBytesUntil() function:
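#define INPUT_SIZE 3
void loop()
{
    if (Serial.available())
    {
        // Room for up to INPUT_SIZE digits, one spare character, and the terminating NUL
        char rgb_str[3][INPUT_SIZE+2] = {{0},{0},{0}};
        // The terminator is a single character (' '), not a string (" ").
        // Reading up to INPUT_SIZE+1 characters lets the delimiter after a
        // three-digit value be consumed as well. This assumes the sender
        // follows every value, including the third, with a space.
        Serial.readBytesUntil( ' ', rgb_str[0], INPUT_SIZE+1 );
        Serial.readBytesUntil( ' ', rgb_str[1], INPUT_SIZE+1 );
        Serial.readBytesUntil( ' ', rgb_str[2], INPUT_SIZE+1 );
        for( int channelNumber = 0; channelNumber < 3; channelNumber++)
        {
            // Convert the string captured for this channel
            color[channelNumber] = atoi(rgb_str[channelNumber]);
        }
        setColor(color);
    }
}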
Note that this solution does not require the somewhat heavyweight strtok() processing since the Stream class has done the delimiting work for you.
However there is a simpler and even more efficient solution. In your solution you are sending ASCII decimal strings then requiring the Arduino to spend CPU cycles needlessly extracting the fields and converting to integer values, when you could simply send the byte values directly - leaving if necessary the vastly more powerful PC to do any necessary processing to pack the data thus. Then the code might be simply:
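void loop()
{
    // Wait until all three bytes of a triplet have arrived;
    // Serial.read() returns -1 when nothing is available
    if( Serial.available() >= 3 )
    {
        for( int channelNumber = 0; channelNumber < 3; channelNumber++)
        {
            color[channelNumber] = Serial.read() ;
        }
        setColor(color);
    }
}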
Note that I have not tested any of the above code, and the Arduino documentation is lacking in some cases, for example with respect to descriptions of return values. You may need to tweak the code somewhat.
Neither of the above solves the synchronisation problem, i.e. when the colour values are streaming, how do you know which is the start of an RGB triplet? You have to rely on getting the first field value and maintaining count and sync thereafter, which is fine until perhaps the Arduino is started after the data stream starts, or is reset, or the PC process is terminated and restarted asynchronously. However, that was a problem with your original implementation too, so it is perhaps a problem to be dealt with elsewhere.
First of all, I agree with @Thomas Padron-McCarthy. Sending a character string instead of a byte array (11 bytes instead of 3 bytes, plus the parsing process) would simply be a waste of resources. On the other hand, the approach you should follow depends on your sender:
Is it periodic or not?
Is it fixed size or not?
If it's periodic, you can check within the time period of the messages. If not, you need to check the messages before the buffer is full.
Even if you decide that printable encoding is not suitable for you, I would in any case add a checksum to the message. Let's say you have a fixed-size message structure:
struct MyMessage
{
// unsigned char id; // id of a message maybe?
unsigned char colors[3]; // or unsigned char r,g,b; //maybe
unsigned char checksum; // more than one byte could be a more powerful checksum
};
unsigned char calcCheckSum(struct MyMessage msg)
{
//...
}
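unsigned int validateCheckSum(struct MyMessage msg)
{
// valid when the stored checksum matches the one recomputed from the message
if(calcCheckSum(msg) == msg.checksum)
return 1;
else
return 0;
}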
Now, you should check every 4 bytes (the size of MyMessage) in a sliding-window fashion to see whether they form a valid message:
void findMessages( )
{
struct MyMessage* msg;
byte size = Serial.readBytes(input, INPUT_SIZE);
byte msgSize = sizeof(struct MyMessage);
for(int i = 0; i+msgSize <= size; i++)
{
msg = (struct MyMessage*) &input[i]; // point at the window that starts at byte i
if(validateCheckSum(*msg)) // validateCheckSum() takes the struct by value
{// found a message
processMessage(msg);
}
else
{
//discard this byte, it's a part of a corrupted msg (you are too late to process this one maybe)
}
}
}
If it's not a fixed size, it gets complicated. But I'm guessing you don't need to hear that for this case.
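For a concrete picture of the sliding-window check, here is a self-contained sketch in plain C. The checksum used (low 8 bits of the sum of the three colour bytes) is only an assumed example, since the checksum function above is left unspecified, and the names calc_checksum and find_messages are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSG_SIZE 4 /* three colour bytes followed by one checksum byte */

/* Assumed checksum: low 8 bits of the sum of the three colour bytes. */
uint8_t calc_checksum(const uint8_t *colors)
{
    return (uint8_t)(colors[0] + colors[1] + colors[2]);
}

/* Scan `buf` for valid messages. On a match, copy the colours into `out`
 * and skip the whole message; otherwise discard one byte and try again.
 * Returns the number of messages found (at most max_out). */
size_t find_messages(const uint8_t *buf, size_t len,
                     uint8_t out[][3], size_t max_out)
{
    size_t found = 0;
    size_t i = 0;

    assert(buf != NULL && out != NULL);
    while (i + MSG_SIZE <= len && found < max_out) {
        if (calc_checksum(&buf[i]) == buf[i + 3]) {
            out[found][0] = buf[i];
            out[found][1] = buf[i + 1];
            out[found][2] = buf[i + 2];
            found++;
            i += MSG_SIZE; /* jump past the message just accepted */
        } else {
            i++;           /* slide the window by one byte */
        }
    }
    return found;
}
```

A one-byte checksum accepts some misaligned windows by chance (four zero bytes always pass this particular one), so a stronger checksum or an additional start marker may be worth considering.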
EDIT (2)
I've struck out this edit following the comments.
One last thing: I would use a circular buffer. First add the received bytes to the buffer, then check the bytes in that buffer.
EDIT (3)
I have given some thought to the comments. I see the point of printable encoded messages. I guess my bias comes from working in a military company. We don't have printable encoded "fire" arguments here :) A lot of messages come and go all the time, and decoding/encoding printable messages would be a waste of time. We also use hardware that usually has very small messages with bitfields. I accept that a printable message could be easier to examine and understand.
Hope it helps,
Gokhan.
If faster is really what you want... this is a little far-fetched.
The fastest way I can think of to meet your needs and provide synchronization is to send one byte for each color and change the parity bit in a defined way, assuming you can read both the parity and the byte value of a character received with the wrong parity.
You will have to deal with the changing parity and most of the characters will not be human readable, but it's gotta be one of the fastest ways to send three bytes of data.

On Linux, compress() and uncompress() from zlib sometimes return Z_BUFFER_ERROR

I want to test the compress() and uncompress() functions provided by the zlib library. I wrote the following code, which opens an existing file (originalFile, about 78K), reads its contents inside a while() loop, and writes the compressed data to one file and the uncompressed data to another. On the first pass through the while() loop, both compress() and uncompress() return 0, so the first pass succeeds. On the second and following passes, however, the return value is -5 (according to the official documentation, the output buffer is not large enough to hold the content). Why? What is wrong? Thank you very much in advance!
#include <string>
#include <time.h>
#include <stdio.h>
#include <iostream>
#include <string.h>
#include "zlib.h"
int main()
{
unsigned long int fileLength;
unsigned long int readLength;
unsigned long int compressBufLength;
unsigned long int uncompressLength;
unsigned long int offset;
unsigned char *readBuf = new unsigned char[512];//the readbuf of the exist file content
unsigned char *compressBuf = new unsigned char[512];//the compress buffer
unsigned char *uncompressBuf = new unsigned char[512];//the uncompress content buffer
FILE *originalFile = fopen("/lgw150/temp/src/lg4/original.lg4","a+");//the exist file
FILE *compressedFile = fopen("/lgw150/temp/src/lg4/compressed.lg4","a+");//compressfile
FILE *uncompressFile = fopen("/lgw150/temp/src/lg4/uncompressed.lg4","a+");//
fseek(originalFile,0,2);
fileLength = ftell(originalFile);
offset = 0;//
while(offset <fileLength)//
{
printf("offset=%lu;fileLength=%lu\n",offset,fileLength);
memset(readBuf,0,512);
memset(compressBuf,0,512);
memset(uncompressBuf,0,512);
fseek(originalFile,offset,0);//
readLength = fread(readBuf,sizeof(char),512,originalFile);
offset += readLength;//
int compressValue = compress(compressBuf,&compressBufLength,readBuf,readLength);
int fwriteValue = fwrite(compressBuf,sizeof(char),compressBufLength,compressedFile);//
printf("compressValue = %d;fwriteLength = %d;compressBufLength=%lu;readLength = %lu\n",compressValue,fwriteValue,compressBufLength,readLength);
int uncompressValue = uncompress(uncompressBuf,&uncompressLength,compressBuf,compressBufLength);//
int fwriteValue2= fwrite(uncompressBuf,sizeof(char),uncompressLength,uncompressFile);//
}
fseek(originalFile,0,0);
fseek(compressedFile,0,0);
fseek(uncompressFile,0,0);
if(originalFile != NULL)
{
fclose(originalFile);
originalFile = NULL;
}
if(compressedFile != NULL)
{
fclose(compressedFile);
compressedFile = NULL;
}
if(uncompressFile != NULL)
{
fclose(uncompressFile);
uncompressFile = NULL;
}
delete[] readBuf;
delete[] compressBuf;
delete[] uncompressBuf;
return 0;
}
First off, the reason you're getting "buffered output size is not large enough to contain the content" is because the buffered output size is not large enough to contain the content. If you give incompressible data to compress it will expand the data. So 512 bytes is not large enough if the input is 512 bytes. Use the compressBound() function for the maximum expansion for sizing the compression output buffer.
Second, compressing 512 bytes at a time is silly. You're not giving the compression algorithm enough data to work with in order to get the mileage you should be getting from the compression. Your application of reading 512 byte chunks at a time should not be using compress() and uncompress(). You should be using deflate() and inflate(), which were written for this purpose -- to feed chunks of data through the compression and decompression engines.
You need to read zlib.h. All of it. You can also look at the example (after reading zlib.h).

Generating Random ASCII

I've been trying to work on a very simple encryption routine. It should work like this:
-- Generate a random key of ASCII characters (just a permutation of the ASCII table)
-- For every char in the file to be encrypted, get its decimal representation (X), then replace it with the char at index X in the key.
The problem is that it corrupts some files and I have no idea why.
Any help would be appreciated.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
int main()
{
int temp,used[256];
char *key,*mFile;
long i,fSize;
memset(used,0,sizeof(used));
srand(time(NULL));
FILE *pInput = fopen("Input.in","rb");
FILE *pOutput = fopen("Encrypted.out","wb");
FILE *pKeyOutput = fopen("Key.bin","wb");
if(pInput==NULL||pOutput==NULL||pKeyOutput==NULL)
{
printf("File I/O Error\n");
return 1;
}
key = (char*)malloc(255);
for(i=0;i<256;i++)
{
temp = rand()%256;
while(used[temp])
temp = rand()%256;
key[i] = temp;
used[temp] = 1;
}
fwrite(key,1,255,pKeyOutput);
fseek(pInput,0,SEEK_END);
fSize = ftell(pInput);
rewind(pInput);
mFile = (char*)malloc(fSize);
fread(mFile,1,fSize,pInput);
for(i=0;i<fSize;i++)
{
temp = mFile[i];
fputc(key[temp],pOutput);
}
fclose(pInput);
fclose(pOutput);
fclose(pKeyOutput);
free(mFile);
free(key);
return 0;
}
The decryption routine:
#include <stdio.h>
#include <stdlib.h>
int main()
{
int temp,j;
char *key,*mFile;
long i,fSize;
FILE *pKeyInput = fopen("key.bin","rb");
FILE *pInput = fopen("Encrypted.out","rb");
FILE *pOutput = fopen("Decrypted.out","wb");
if(pInput==NULL||pOutput==NULL||pKeyInput==NULL)
{
printf("File I/O Error\n");
return 1;
}
key = (char*)malloc(255);
fread(key,1,255,pKeyInput);
fseek(pInput,0,SEEK_END);
fSize = ftell(pInput);
rewind(pInput);
mFile = (char*)malloc(fSize);
fread(mFile,1,fSize,pInput);
for(i=0;i<fSize;i++)
{
temp = mFile[i];
for(j=0;j<256;j++)
{
if(key[j]==temp)
fputc(j,pOutput);
}
}
fclose(pInput);
fclose(pOutput);
fclose(pKeyInput);
free(mFile);
free(key);
return 0;
}
Make sure you use unsigned char; if char is signed, things will go wrong when you process characters in the range 0x80..0xFF. Specifically, you'll be accessing negative indexes in your 'mapping table'.
Of course, strictly speaking, ASCII is a 7-bit code set and any character outside the range 0x00..0x7F is not ASCII.
You only allocate 255 bytes but you then proceed to overwrite one byte beyond what you allocate. This is a basic buffer overflow; you invoke undefined behaviour (which means anything may happen, including the possibility that it seems to work correctly without causing trouble - on some machines).
Another problem is that you write mappings for 255 of the 256 possible byte codes, which is puzzling. What happens with the other byte value?
Of course, since you write the 256-byte mapping to the 'encrypted' file, it will be child's play to decode; the security in this scheme is negligible. However, as a programming exercise, it still has some merit.
There is no reason to slurp the entire file and then write it out byte by byte. You can perfectly well read it byte by byte as well as write it byte by byte. Or you could slurp the whole file, map it in situ, and then write the whole file in one go. Consistency is important in programming.
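Putting these points together, a sketch with unsigned char tables of all 256 entries and in-place mapping could look like this. It swaps in a Fisher-Yates shuffle for the retry loop used in the question and builds an inverse table so that decoding needs no search; both are substitutions made for this example, and the names make_key and map_bytes are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill key[0..255] with a random permutation of 0..255 (Fisher-Yates
 * shuffle) and inverse[] with the reverse mapping: inverse[key[x]] == x. */
void make_key(unsigned char key[256], unsigned char inverse[256])
{
    assert(key != NULL && inverse != NULL);
    for (int i = 0; i < 256; i++)
        key[i] = (unsigned char)i;
    for (int i = 255; i > 0; i--) {
        int j = rand() % (i + 1); /* pick a position in 0..i */
        unsigned char tmp = key[i];
        key[i] = key[j];
        key[j] = tmp;
    }
    for (int i = 0; i < 256; i++)
        inverse[key[i]] = (unsigned char)i;
}

/* Replace every byte of buf with table[byte], in place.
 * Bytes are unsigned, so the index is always within 0..255. */
void map_bytes(unsigned char *buf, size_t len, const unsigned char table[256])
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        buf[i] = table[buf[i]];
}
```

Encoding is then map_bytes(data, n, key) and decoding is map_bytes(data, n, inverse), with all 256 key bytes written to and read from the key file. As already noted, this is a programming exercise and offers negligible security.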
