How large is a deflated zlib object? - zlib

I am trying to identify the size of a deflated zlib blob. Specifically, I am reading one deflated object, and another deflated blob follows shortly afterwards in the same file. I want to know how many bytes the deflated blob takes up before the next one begins. Is there a way to determine this number of bytes?
Thanks

I figured it out.
When you assign the value for strm.avail_in, also save the number of input bytes, as follows:
input_len = strm.avail_in = NUMBER_OF_BYTES_READ_FROM_FILE_OR_STREAM;
Then when you run this:
ret = inflate(&strm, Z_NO_FLUSH);
inflate() updates strm.avail_in to the number of input bytes that are still unconsumed after that call. The difference between input_len and strm.avail_in is therefore the number of deflated bytes consumed in that iteration. Summing this over every call until inflate() returns Z_STREAM_END gives the size of the blob; zlib also keeps that running total in strm.total_in. So, something like this:
read_bytes = input_len - strm.avail_in;
Hope this helps someone else in the future.

Related

Using Linux AIO, able to do IOs but writing garbage as well into the file

This might seem silly, but I am using libaio (not POSIX AIO). I am able to write something into the file, but I am also writing extra stuff into it.
I have read about the alignment requirement and the data type of the buffer field of the iocb.
Here is the code sample (only the relevant sections, for illustration):
aio_context_t someContext;
struct iocb somecb;
struct io_event someevents[1];
struct iocb *somecbs[1];
somefd = open("/tmp/someFile", O_RDWR | O_CREAT, 0644); // O_CREAT requires a mode argument
char someBuffer[4096];
... // error checks
someContext = 0; // this is necessary
io_setup(32, &someContext ); // no error checks pasted here
strcpy(someBuffer, "hello stack overflow");
memset(&somecb, 0, sizeof(somecb));
somecb.aio_fildes = somefd ;
somecb.aio_lio_opcode = IOCB_CMD_PWRITE;
somecb.aio_buf = (uint64_t)someBuffer;
somecb.aio_offset = 0;
somecb.aio_nbytes = 100; // // //
// I am leaving out the memalign and sysconf page-size part in this sample
somecbs[0] = &somecb; // address of the solid struct, avoiding heap
// avoiding error checks for this sample listing
io_submit(someContext, 1, somecbs);
// not checking for events count or errors
io_getevents(someContext, 1, 1, someevents, NULL);
The output:
This code does create the file, and does write the intended string
hello stack overflow into the file /tmp/someFile.
The problem:
After the intended string, the file /tmp/someFile also contains a series of
#^#^#^#^#^#^#^#^#^ and some sections of the file itself (the code section); in other words, garbage.
I am fairly certain that this is some pointer gone wrong in the data field, but I cannot crack it.
How do I use aio (not POSIX) to write exactly and only 'hello world' into a file?
I am aware that aio calls might not be supported on all file systems as of now. The one I am running against does support them.
Edit: If you want the starter pack for this attempt, you can get it from here:
http://www.fsl.cs.sunysb.edu/~vass/linux-aio.txt
Edit 2: Carelessness on my part. I was asking for more bytes to be written to the file than the string contains, and the code was honoring that. Put simply, to write exactly 'hw', one needs no more than 2 bytes in the nbytes field of the iocb.
There are a few things going on here. First up, the alignment requirement that you mentioned is either 512 bytes or 4096 bytes, depending on your underlying device. Try 512 bytes to start. It applies to:
The offset that you're writing in the file must be a multiple of 512 bytes. It can be 0, 512, 1024, etc. You can write at offset 0 like you're doing here, but you can't write at offset 100.
The length of data that you're writing to the file must be a multiple of 512 bytes. Again, you can write 512 bytes, 1024 bytes, or 2048 bytes, and so on - any multiple of 512. You can't write 100 bytes like you're trying to do here.
The address of the memory that contains the data you're writing must be a multiple of 512. (I typically use 4096, to be safe.) Here, you'll need to be able to do someBuffer % 512 and get 0. (With the code the way it is, it most likely won't be.)
In my experience, failing to meet any of the above requirements doesn't actually give you an error back! Instead, it'll complete the I/O request using normal, regular old blocking I/O.
Unaligned I/O: If you really, really need to write a smaller amount of data or write at an unaligned offset, then things get tricky even above and beyond the io_submit interface. You'll need to do an aligned read to cover the range of data that you need to write, then modify the data in memory and write the aligned region back to disk.
For example, say you wanted to modify offset 768 through 1023 on the disk. You'd need to read 512 bytes at offset 512 into a buffer. Then, memcpy() the 256 bytes you wanted to write to offset 256 within that buffer. Finally, you issue a write of the 512 byte buffer at offset 512.
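The read-modify-write sequence above could be sketched roughly as follows, using plain pread() and pwrite() rather than io_submit to keep it short. The function name write_unaligned and the BLOCK constant are made up for illustration, and the sketch assumes every block it touches already exists in the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK 512   /* assumed alignment unit; some devices need 4096 */

/* Overwrite len bytes at an arbitrary offset using only block-aligned I/O.
 * Returns 0 on success, -1 on failure. */
int write_unaligned(int fd, off_t offset, const void *data, size_t len)
{
    off_t start, end;
    size_t span;
    void *buf = NULL;
    int rc = -1;

    assert(len > 0 && offset >= 0);
    start = offset - (offset % BLOCK);                          /* round down */
    end = ((offset + (off_t)len + BLOCK - 1) / BLOCK) * BLOCK;  /* round up */
    span = (size_t)(end - start);

    if (posix_memalign(&buf, BLOCK, span) != 0)
        return -1;

    /* 1. aligned read, 2. patch in memory, 3. aligned write back */
    if (pread(fd, buf, span, start) == (ssize_t)span) {
        memcpy((char *)buf + (offset - start), data, len);
        if (pwrite(fd, buf, span, start) == (ssize_t)span)
            rc = 0;
    }
    free(buf);
    return rc;
}
```

With offset 768 and len 256, start works out to 512 and span to 512, which matches the example in the text.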
Uninitialized Data: As others have pointed out, you haven't fully initialized the buffer that you're writing. Use memset() to initialize it to zero to avoid writing junk.
Allocating an Aligned Pointer: To meet the pointer requirements for the data buffer, you'll need to use posix_memalign(). For example, to allocate 4096 bytes with a 512 byte alignment restriction: posix_memalign(&ptr, 512, 4096);
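As a rough illustration of the last two points together, the helper below returns a buffer that is both aligned and zero-filled before the message is copied in. The name make_aligned_block and the ALIGNMENT constant are hypothetical; posix_memalign() is the real POSIX call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALIGNMENT 512   /* assumed; use 4096 if the device requires it */

/* Return a zero-filled, ALIGNMENT-aligned buffer holding msg, padded up to
 * a multiple of ALIGNMENT. *padded_len receives the buffer size.
 * Returns NULL on failure; release the buffer with free(). */
char *make_aligned_block(const char *msg, size_t *padded_len)
{
    size_t len = strlen(msg);
    size_t size = ((len + ALIGNMENT - 1) / ALIGNMENT) * ALIGNMENT;
    void *ptr = NULL;

    if (size == 0)
        size = ALIGNMENT;               /* an empty message still gets one block */
    if (posix_memalign(&ptr, ALIGNMENT, size) != 0)
        return NULL;
    memset(ptr, 0, size);               /* padding is zeros, not leftover junk */
    memcpy(ptr, msg, len);
    assert((uintptr_t)ptr % ALIGNMENT == 0);
    *padded_len = size;
    return ptr;
}
```

Writing the whole padded block leaves the string followed by zero bytes in the file; trimming the file to the exact string length afterwards would need something like ftruncate().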
Lastly, consider whether you need to do this at all. Even in the best of cases, io_submit still "blocks", albeit at the 10 to 100 microsecond level. Normal blocking I/O with pread and pwrite offers a great many benefits to your application. And, if it becomes onerous, you can relegate it to another thread. If you've got a latency-sensitive app, you'll need to do io_submit in another thread anyway!

Using fseek() To Update a Binary File

I have looked and looked online for help on using fseek() efficiently, but no matter what I do, I am still not getting the right results. Basically, I am reading from a file of animals that have an "age" parameter. If the age is -1, then when adding to this binary file I should use fseek() to find the first -1 in the file and overwrite that entire line with the new information that the user inputs. I have an array that traverses the file and finds all of the holes at the beginning, and it is working correctly. My issue is that the code updates each new animal and assigns it to the next empty slot with age -1, but when I refresh my file, all of the animals are appended to the end, even though their IDs are the IDs of the once empty slots. Here is my code:
void addingAnimal(FILE *file, struct animal ani, int * availableHoles) {
int i;
int offset = ((sizeof(int) + sizeof(ani)) * ani.id -1);
if (availableHoles[0] != 0) {
fseek(file, offset, SEEK_SET);
ani.id = availableHoles[0];
fwrite(&ani, sizeof(ani), 1, file);
for (i = 0; i < sizeof(availableHoles) -1; i++) {
availableHoles[i] = availableHoles[i+1];
}
}
}
The very beginning of the file has an integer that tells us the number of holes within the file, so the offset is removing that, so once I print it, it prints everything correctly. Then I check if there are holes in the helper array I created, if there are, then I want the animal's id to be that id and I am trying to seek to the line with the first -1 age to put my updated animal's information there, and then writing it to the file. The last for-loop is just me shifting up the available holes. Oh and as for opening the file, I am using r+b for reading and writing. Thank you in advance!
You cannot use sizeof(availableHoles) to iterate over the array. You are in a function that receives availableHoles as a pointer, so sizeof yields the size of the pointer, which has nothing to do with the number of holes.
Pass the number of elements of this array as a separate argument.
Using FILE streams in read/write mode is tricky: do you call fseek() systematically between accesses in read mode and write mode?
Post the calling code, the function addingAnimal alone is not enough to investigate your problem.
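For illustration only, here is one way the slot overwrite and the hole bookkeeping might look once the element count is passed explicitly. The struct animal layout, the 1-based slot numbering, and both function names are assumptions, since the question does not show them; the file is assumed to start with a single int header followed by fixed-size records.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct animal {          /* hypothetical layout; the real one is not shown */
    int id;              /* 1-based slot number */
    int age;             /* -1 marks an empty slot */
    char name[20];
};

/* Overwrite the record in slot id (1-based) of a file opened with "r+b".
 * Returns 0 on success, -1 on failure. */
int write_animal_at(FILE *file, int id, const struct animal *ani)
{
    long offset;

    assert(id >= 1);
    offset = (long)sizeof(int) + (long)(id - 1) * (long)sizeof *ani;
    if (fseek(file, offset, SEEK_SET) != 0)
        return -1;
    if (fwrite(ani, sizeof *ani, 1, file) != 1)
        return -1;
    /* A flush or seek is needed before the stream switches back to reading. */
    return fflush(file) == 0 ? 0 : -1;
}

/* Remove the first entry of holes[0 .. *count-1], shifting the rest down. */
void pop_first_hole(int *holes, size_t *count)
{
    for (size_t i = 0; i + 1 < *count; i++)
        holes[i] = holes[i + 1];
    if (*count > 0)
        (*count)--;
}
```

Note that the offset is computed from the slot being filled, not from the id the animal had before a hole was assigned to it.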

Copying specific data from a source buffer to several target buffers

I have a source buffer which I allocated using malloc, and I have used fread to read some data from a big file into it. Now I want to separate out alternate chunks of data (say 2 bytes each) from this source buffer into two target buffers. This problem can be generalized to copying every nth chunk to n target buffers. I need help in the form of sample code for the simplest case of two target buffers. This is what I came up with, which I am quite sure isn't the right thing.
int totsamples = 256*2*2;
int *sbuff = malloc(totsamples);
int *tbuff1 = malloc(totsamples/2);
int *tbuff2 = malloc(totsamples/2);
elements = fread(sbuff, 2, 256*2, fs);
for(i = 0; i<256; i++)
{
tbuff1[i] = sbuff[i*2];
tbuff2[i] = sbuff[(i*2) + 1];
}
Maybe this will give you an idea:
for(i = 0; i<256; i++)
{
tbuff1[2*i+0] = sbuff[i*4+0];
tbuff1[2*i+1] = sbuff[i*4+1];
tbuff2[2*i+0] = sbuff[i*4+2];
tbuff2[2*i+1] = sbuff[i*4+3];
}
Note: The above code does not match your malloc() parameters, as it is unclear what your totsamples means, so fix that before using it.
Another note: If you want chunks longer than 2 items, it starts to make sense to use memcpy to do the copying.
Suggestion: Use constants instead of magic numbers, such as const int SAMPLES=256;. Also, I'm not sure, but it appears you think the size of int is 2? Don't assume that; use sizeof(int) and so on instead (and the size of int is rarely 2, by the way).
Hmm... Are you actually trying to optimize things by using integers to copy 4 bytes at a time? Don't! "Premature optimization is the root of all evil". You may consider that later, after your code works otherwise, but first create a working non-hacky version, and doubly so if you need to ask how to do even that, like here...
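The memcpy-based variant mentioned above might look something like this sketch, which works on raw bytes so that the chunk size is explicit and handles the general case of n targets. The function name deinterleave and its parameters are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Distribute consecutive chunks of src round-robin across n target buffers:
 * chunk 0 goes to targets[0], chunk 1 to targets[1], ..., chunk n back to
 * targets[0], and so on. src_len is assumed to be a multiple of chunk, and
 * each target must be large enough for its share of the chunks. */
void deinterleave(const unsigned char *src, size_t src_len, size_t chunk,
                  unsigned char **targets, size_t n)
{
    size_t chunks;

    assert(chunk > 0 && n > 0 && src_len % chunk == 0);
    chunks = src_len / chunk;
    for (size_t c = 0; c < chunks; c++) {
        /* c % n picks the target; c / n is the chunk's index within it */
        unsigned char *dst = targets[c % n] + (c / n) * chunk;
        memcpy(dst, src + c * chunk, chunk);
    }
}
```

For the two-buffer case in the question this would be called with chunk = 2 and n = 2, with the buffers declared as unsigned char rather than int.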

Remove Last two bytes from the File or Ignore last two bytes of the file In C

I am implementing CRC-16 for file verification.
I append a 2-byte CRC at the end of the file. When the file is received on the target device, I have to calculate the CRC of the file without the last two bytes.
Here is my data after appending the CRC at the end of the file:
test123
wU
Now, when I calculate the CRC again on the target device, I want to ignore the last two bytes.
I have one common function in which I open the file in read mode and calculate the CRC, and I want to use the same function this time.
One solution is to write another function like the previous one that only goes up to filesize-2, but I don't want to duplicate the function. I would rather delete the last two bytes.
Does anybody have a suggestion or solution for this?
In addition, do you need help truncating two bytes off a file?
What kind of API is available on the target?
On POSIX you can open the file, then off_t pos = lseek(fd, 0, SEEK_END) to seek to the end, which returns the position. if (pos == (off_t) -1) then the call failed. If the call succeeded, you can just ftruncate(fd, pos - 2) (provided that pos >= 2).
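Put together, the POSIX calls above might be wrapped like this. The function name drop_tail is hypothetical; open, lseek, ftruncate and close are the standard calls. Keep in mind that this modifies the received file, so the stored CRC bytes would have to be read out first if they are still needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Remove the last n bytes of the file at path.
 * Returns 0 on success, -1 on failure (including a file shorter than n). */
int drop_tail(const char *path, off_t n)
{
    int fd, rc = -1;
    off_t pos;

    assert(n >= 0);
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    pos = lseek(fd, 0, SEEK_END);       /* current file size */
    if (pos != (off_t)-1 && pos >= n)
        rc = ftruncate(fd, pos - n);
    close(fd);
    return rc;
}
```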
Have your function take a parameter for the number of trailing bytes to ignore. Pass in 0 for normal use and 2 for this case.
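A sketch of what such a function could look like. The question does not say which CRC-16 variant is in use, so this assumes CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF); the name file_crc16 and the skip_tail parameter are likewise made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* CRC-16/CCITT-FALSE over a file, ignoring the last skip_tail bytes.
 * The CRC variant is an assumption; substitute the one actually in use.
 * Returns 0 and stores the CRC in *out on success, -1 on failure. */
int file_crc16(const char *path, long skip_tail, uint16_t *out)
{
    FILE *f;
    long size, remaining;
    uint16_t crc = 0xFFFF;
    int c;

    assert(skip_tail >= 0 && out != NULL);
    f = fopen(path, "rb");
    if (!f)
        return -1;
    if (fseek(f, 0, SEEK_END) != 0 || (size = ftell(f)) < 0 || size < skip_tail) {
        fclose(f);
        return -1;
    }
    rewind(f);
    for (remaining = size - skip_tail; remaining > 0; remaining--) {
        if ((c = getc(f)) == EOF) {
            fclose(f);
            return -1;
        }
        crc ^= (uint16_t)(c << 8);          /* feed the next byte, MSB first */
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    fclose(f);
    *out = crc;
    return 0;
}
```

The sender would call it with skip_tail = 0 before appending the CRC, and the receiver with skip_tail = 2, then compare the result with the two trailing bytes.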

How to handle a huge string correctly?

This may be a newbie question, but I want to avoid a buffer overflow. I read a lot of data from the registry, which will be uploaded to an SQL database. I read the data in a loop, and the data is inserted after each iteration. My problem is that, this way, if I read 20 keys and the values under them (the number of keys is different on every computer), then I have to connect to the SQL database 20 times.
However, I found out that there is a way to create a stored procedure and pass the whole data to it, so that the SQL server deals with the data and I have to connect to the SQL server only once.
Unfortunately, I don't know how to handle such a big string so as to avoid unexpected errors, like a buffer overflow. So my question is: how should I declare this string?
Should I just make a string like char string[ 15000 ]; and concatenate the values? Or is there a simpler way of doing this?
Thanks!
STL strings should do a much better job than the approach you have described.
You'll also need to build in some thresholds. For example, if your string grows beyond a megabyte, it is worth considering splitting the work across separate SQL connections, since otherwise your transaction will be too long.
You may read (key, value) pairs from a registry and store them into a preallocated buffer while there is sufficient space there.
Maintain "write" position within the buffer. You could use it to check whether there is enough space for new key,value pair in the buffer.
When there is no space left for new (key,value) pair - execute stored procedure and reset "write" position within the buffer.
At the end of the "read key, value pairs" loop - check the buffer's "write" position and execute the stored procedure if it is greater than 0.
This way you will minimize number of times you execute stored procedure on a server.
const int MAX_BUFFER_SIZE = 15000;
char buffer[MAX_BUFFER_SIZE];
int buffer_pos = 0; // "write" position within the buffer (an int, since a char cannot count up to 15000).
...
// Retrieve key, value pairs and push them into the buffer.
while(get_next_key_value(key, value)) {
post(key, value);
}
// Execute stored procedure if buffer is not empty.
if(buffer_pos > 0) {
exec_stored_procedure(buffer);
}
...
bool post(const char* key, const char* value)
{
int len = strlen(key) + strlen(value) + <length of separators>;
// Execute stored procedure if there is no space for new key/value pair.
if(len + buffer_pos >= MAX_BUFFER_SIZE) {
exec_stored_procedure(buffer);
buffer_pos = 0; // Reset "write" position.
}
// Copy key, value pair to the buffer if there is sufficient space.
if(len + buffer_pos < MAX_BUFFER_SIZE) {
<copy key, value to the buffer, starting from "write" position>
buffer_pos += len; // Adjust "write" position.
return true;
}
else {
return false;
}
}
bool exec_stored_procedure(const char* buf)
{
<connect to SQL database and execute stored procedure.>
}
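A compilable take on the same batching idea, offered as a sketch rather than a drop-in replacement. The "key=value;" format, the flush_count counter, and the stub exec_stored_procedure are assumptions made so that the example is self-contained; a real program would send the buffer to the SQL server in that stub. The buffer is deliberately tiny so that a flush happens after a few pairs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_BUFFER_SIZE 64      /* small for demonstration; the answer uses 15000 */

char buffer[MAX_BUFFER_SIZE];
size_t buffer_pos = 0;          /* "write" position within the buffer */
int flush_count = 0;            /* stand-in for real stored-procedure calls */

/* Placeholder: a real program would pass buf to the SQL server here. */
void exec_stored_procedure(const char *buf)
{
    (void)buf;
    flush_count++;
}

/* Send whatever has been batched so far, then start a new batch. */
void flush_buffer(void)
{
    if (buffer_pos > 0) {
        exec_stored_procedure(buffer);
        buffer_pos = 0;
        buffer[0] = '\0';
    }
}

/* Append "key=value;" to the batch, flushing first if it would not fit.
 * Returns 0 on success, -1 if a single pair can never fit. */
int post(const char *key, const char *value)
{
    size_t need = strlen(key) + strlen(value) + 2;   /* '=' and ';' */

    if (need >= MAX_BUFFER_SIZE)
        return -1;
    if (buffer_pos + need >= MAX_BUFFER_SIZE)
        flush_buffer();
    /* snprintf never writes past the remaining space and always terminates */
    snprintf(buffer + buffer_pos, MAX_BUFFER_SIZE - buffer_pos,
             "%s=%s;", key, value);
    buffer_pos += need;
    return 0;
}
```

After the read loop finishes, one final flush_buffer() call sends the last partial batch.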
To do this properly in C you need to allocate the memory dynamically, using malloc or one of the operating system equivalents. The idea here is to figure out how much memory you actually need and then allocate the correct amount. The registry functions provide various ways you can determine how much memory you need for each read.
It gets a bit trickier if you're reading multiple values and concatenating them. One approach would be to read each value into a separately allocated memory block, then concatenate them to a new memory block once you've got them all.
However, it may not be necessary to go to this much trouble. If you can say "if the data is more than X bytes the program will fail" then you can just create a static buffer as you suggest. Just make sure that you provide the registry and/or string concatenation functions with the correct size for the remaining part of the buffer, and check for errors, so that if it does fail it fails properly rather than crashing.
One more note: char buf[15000]; is OK provided the declaration is in program scope, but if it appears in a function you should add the static specifier. Implicitly allocated memory in a function is by default taken from the stack, so a large allocation is likely to fail and crash your program. (Fifteen thousand bytes should be OK but it's not a good habit to get into.)
Also, it is preferable to define a macro for the size of your buffer, and use it consistently:
#define BUFFER_SIZE 15000
char buf[BUFFER_SIZE];
so that you can easily increase the size of the buffer later on by modifying a single line.
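For the dynamic-allocation route described at the start of this answer, a growable string is one common shape. The sketch below is an assumption about how that might be structured, not code from the answer: struct strbuf and strbuf_append are made-up names, and the doubling growth policy is just one reasonable choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A heap-allocated string that grows as needed. Initialize with {NULL, 0, 0}. */
struct strbuf {
    char *data;     /* NUL-terminated once anything has been appended */
    size_t len;     /* length without the terminator */
    size_t cap;     /* allocated bytes */
};

/* Append s, growing the allocation when necessary.
 * Returns 0 on success, -1 if memory runs out (the buffer is left intact). */
int strbuf_append(struct strbuf *sb, const char *s)
{
    size_t add = strlen(s);

    assert(sb != NULL);
    if (sb->len + add + 1 > sb->cap) {
        size_t cap = sb->cap ? sb->cap : 64;
        char *p;

        while (cap < sb->len + add + 1)
            cap *= 2;                   /* doubling keeps appends cheap on average */
        p = realloc(sb->data, cap);
        if (!p)
            return -1;
        sb->data = p;
        sb->cap = cap;
    }
    memcpy(sb->data + sb->len, s, add + 1);   /* copies the terminator too */
    sb->len += add;
    return 0;
}
```

The caller releases the memory with free(sb.data) once the string has been handed to the database call.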
