errors concatenating bytes from a block of bytes using memcpy - c

On occasion, the following code works, which probably means good concept, but poor execution. Since this crashes depending on where the bits fell, this means I am butchering a step along the way. I am interested in finding an elegant way to fill bufferdata with <=4096 bytes from buffer, but admittedly, this is not it.
EDIT: the error I receive is illegal access on bufferdata
unsigned char buffer[4096] = {0};
char *bufferdata;
bufferdata = (char*)malloc(4096 * sizeof(*bufferdata));
if (! bufferdata)
return false;
while( ... )
{
// int nextBlock( voidp _buffer, unsigned _length );
read=nextBlock( buffer, 4096);
if( read > 0 )
{
memcpy(bufferdata+bufferdatawrite,buffer,read);
if(read == 4096) {
// let's go for another chunk
bufferdata = (char*)realloc(bufferdata, ( bufferdatawrite + ( 4096 * sizeof(*bufferdata)) ) );
if (! bufferdata) {
printf("failed to realloc\n");
return false;
}
}
}
else if( read<0 )
{
printf("error.\n");
break;
}
else {
printf("done.\n");
break;
}
}
free(bufferdata);

It's hard to tell where the error is, since some code is missing here and there.
if(read == 4096) { looks like the culprit. What if nextBlock returned 4000 on one iteration and 97 on the next? Now you need to store 4097 bytes, but you never reallocate the buffer to accommodate them.
You need to accumulate the bytes and realloc whenever you cross a 4096-byte boundary.
Something like:
#define CHUNK_SIZE 4096
int total_read = 0;
int buffer_size = CHUNK_SIZE ;
char *bufferdata = malloc(CHUNK_SIZE );
char buffer[CHUNK_SIZE];
while( ... )
{
// int nextBlock( voidp _buffer, unsigned _length );
read=nextBlock( buffer, CHUNK_SIZE );
if( read > 0 )
{
total_read += read;
if(buffer_size < total_read) {
// let's go for another chunk
char *tmp_buf;
tmp_buf= (char*)realloc(bufferdata, buffer_size + CHUNK_SIZE );
if (! tmp_buf) {
free(bufferdata);
printf("failed to realloc\n");
return false;
}
bufferdata = tmp_buf;
buffer_size += CHUNK_SIZE ;
}
memcpy(bufferdata+total_read-read,buffer,read);
}
...
}

A few comments:
Please #define 4096 or make it a const. You will get burned if you ever need to change it. Chaining realloc calls is an extremely inefficient way to build a buffer. Is there any way you could find out the size up front and grab it all at once? Perhaps not, but I always cringe when I see realloc(). I'd also like to know what kZipBufferSize is and whether it's in bytes like the rest of your counts. Also, what exactly is bufferdatawrite? I'm assuming it's source data, but I'd like to see its declaration to make sure it's not a memory alignment issue, which is kind of what this feels like. Or a buffer overrun due to bad sizing.
Finally, are you sure that nextBlock isn't overrunning memory somehow? This is another point of potential weakness in your code.
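To make the "accumulate and grow" idea concrete, here is a minimal sketch that doubles the capacity instead of adding one chunk at a time, which keeps the number of realloc calls small. The `struct source` and `next_block` below are hypothetical stand-ins for the real nextBlock(), and `read_all` is a name invented for this example, so treat it as an illustration under those assumptions rather than a drop-in fix.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4096

/* Hypothetical in-memory source standing in for the real nextBlock(). */
struct source {
    const unsigned char *data;
    size_t len;
    size_t pos;
};

/* Copies up to `length` bytes into `out`; returns the count, 0 at end of data. */
static int next_block(struct source *src, void *out, unsigned length)
{
    size_t left = src->len - src->pos;
    size_t n = left < length ? left : length;
    memcpy(out, src->data + src->pos, n);
    src->pos += n;
    return (int)n;
}

/* Reads everything from `src` into a heap buffer that doubles when full.
 * On success stores the byte count in *out_len and returns the buffer
 * (the caller frees it); on failure returns NULL. */
static unsigned char *read_all(struct source *src, size_t *out_len)
{
    assert(src != NULL && out_len != NULL);

    unsigned char chunk[CHUNK_SIZE];
    size_t cap = CHUNK_SIZE, used = 0;
    unsigned char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        int n = next_block(src, chunk, CHUNK_SIZE);
        if (n < 0) { free(buf); return NULL; }
        if (n == 0) break;

        /* Grow before copying so the write never runs past the allocation. */
        if (used + (size_t)n > cap) {
            size_t new_cap = cap * 2;
            unsigned char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
            cap = new_cap;
        }
        memcpy(buf + used, chunk, (size_t)n);
        used += (size_t)n;
    }
    *out_len = used;
    return buf;
}
```

The key point is that the capacity check uses the running total (`used + n`), not whether the last read happened to be exactly 4096 bytes.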

Related

Splitting a string and store to the heap algorithm question

For the code below, I was wondering: if I want to split the string but still retain the original string, is this the best method?
Should the caller provide the char **, or should the function "split" make an additional malloc call and manage the memory for the char ** itself?
Also, is this the most efficient method, or could I optimize the code further?
I have not debugged the code yet, and I am undecided whether the caller or the function should manage the char ** pointer.
#include <stdio.h>
#include <stdlib.h>
size_t split(const char * restrict string, const char splitChar, char ** restrict parts, const size_t maxParts){
size_t size = 100;
size_t partSize = 0;
size_t len = 0;
size_t newPart = 1;
char * tempMem;
/*
* We just reserve a large block of memory.
* Each split character marks the boundary of a new part.
*/
char * mem = (char*) malloc( sizeof(char) * size );
if ( mem == NULL ) return 0;
for ( size_t i = 0; string[i] != 0; i++ ) {
// If it is a split char we at a new part
if ( string[i] == splitChar) {
// If the last character was not the split character
// Then mem[len] = 0 and increase the len by 1.
if (newPart == 0) mem[len++] = 0;
newPart = 1;
continue;
} else {
// If this is a new part
// and not a split character
// we make a new pointer
if ( newPart == 1 ){
// if reach maxpart we break.
// It is okay here, to not worry about memory
if ( partSize == maxParts ) break;
parts[partSize++] = &mem[len];
newPart = 0;
}
mem[len++] = string[i];
if ( len == size ){
// if ran out of memory realloc.
tempMem = (char*)realloc(mem, sizeof(char) * (size << 1) );
// if fail quit loop
if ( tempMem == NULL ) {
// If we can't get more memory the last part could be corrupted
// We have to return.
// Otherwise the code below can seg.
// There maybe a better way than this.
return partSize - 1;
}
size = size << 1;
mem = tempMem;
}
}
}
// If we got here and still in a newPart that is fine no need
// an additional character.
if ( newPart != 1 ) mem[len++] = 0;
// realloc to give back the unneed memory
if ( len < size ) {
tempMem = (char*) realloc(mem, sizeof(char) * len );
// If the resizing did not fail but yielded a different
// memory block;
if ( tempMem != NULL && tempMem != mem ){
for ( size_t i = 0; i < partSize; i++ ){
parts[i] = tempMem + (parts[i] - mem);
}
}
}
return partSize;
}
int main(){
char * tStr = "This is a super long string just to test the str str adfasfas something split";
char * parts[10];
size_t len = split(tStr, ' ', parts, 10);
for (size_t i = 0; i < len; i++ ){
printf("%zu: %s\n", i, parts[i]);
}
}
What is "best" is very subjective, as well as use case dependent.
I personally would keep the parameters as input only, define a struct to contain the split result, and probably return it by value. The struct would probably contain pointers to allocated memory, so I would also create a helper function to free that memory. The parts might be stored as a list of strings (copying the string data) or as index and length pairs into the original string (no string copies needed, but the original string needs to remain valid).
But there are dozens of very different ways to do this in C, and all a bit klunky. You need to choose your flavor of klunkiness based on your use case.
About being "more optimized": unless you are coding for a very small embedded device or something, always choose a more robust, clear, easier to use, harder to use wrong over more micro-optimized. The useful kind of optimization turns, for example, O(n^2) to O(n log n). Turning O(3n) to O(2n) of a single function is almost always completely irrelevant (you are not going to do string splitting in a game engine inner rendering loop...).
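As one illustration of the index and length variant described above, here is a minimal sketch. All names (`struct span`, `struct split_result`, `split_spans`, `split_free`) are invented for this example, and it skips empty parts the same way the question's code does; it is one possible flavor, not the only reasonable design.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One part of the input, described by position instead of a copy. */
struct span {
    size_t start;
    size_t len;
};

/* Result returned by value; `spans` is heap-allocated. */
struct split_result {
    struct span *spans;
    size_t count;
};

/* Splits `s` on `sep`, skipping empty parts. The spans index into `s`,
 * so `s` must stay valid while the result is in use.
 * On allocation failure, returns spans == NULL and count == 0
 * (indistinguishable from "no parts" in this simple sketch). */
static struct split_result split_spans(const char *s, char sep)
{
    assert(s != NULL);
    struct split_result r = { NULL, 0 };
    size_t cap = 0;
    size_t i = 0;

    while (s[i] != '\0') {
        if (s[i] == sep) { i++; continue; }      /* skip separators */

        size_t start = i;
        while (s[i] != '\0' && s[i] != sep)      /* find the end of this part */
            i++;

        if (r.count == cap) {                    /* grow the span array */
            size_t new_cap = cap ? cap * 2 : 8;
            struct span *tmp = realloc(r.spans, new_cap * sizeof *tmp);
            if (tmp == NULL) {
                free(r.spans);
                r.spans = NULL;
                r.count = 0;
                return r;
            }
            r.spans = tmp;
            cap = new_cap;
        }
        r.spans[r.count].start = start;
        r.spans[r.count].len = i - start;
        r.count++;
    }
    return r;
}

/* Helper that releases whatever split_spans() allocated. */
static void split_free(struct split_result *r)
{
    free(r->spans);
    r->spans = NULL;
    r->count = 0;
}
```

A part can be printed without copying it, for example with printf("%.*s\n", (int)r.spans[i].len, text + r.spans[i].start).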

Brute-forcing find FILE* C

I have been looking for a way to brute-force search for an int64_t in a file in C.
I have written the following code.
int64_t readbyte = 0, totalreadbytes = 0;
int64_t totalfound = 0;
const int64_t magic = MAGIC_NUMBER;
char *buffer = (char *)malloc(BUFFER_SIZE);
int64_t *offsets = (int64_t *)malloc(sizeof(int64_t) * (1 << 24));
if (buffer == NULL || offsets == NULL)
{
return -3;
}
while ((readbyte = fread(buffer, 1, BUFFER_SIZE, inptr)) > 0)
{
for (int i = 0; i <= readbyte - 8; i++)
{
if (memcmp(buffer + i, &magic, sizeof(magic))==0)
{
offsets[totalfound++] = totalreadbytes + i;
}
}
totalreadbytes += readbyte - 8;
fseek(inptr, -8, SEEK_CUR);
}
// Do something to those offsets found
free(offsets);
free(buffer);
I have been wondering if there is a better way to find that int64_t, because my goal is to find them in a file as large as 60 GB, and there may be several hundred thousand of them in that file.
Backing up and re-reading data is going to slow things down quite a bit.
Building on @melpomene's comment, here's a very simple way to do it with mmap():
uint64_t needle;
struct stat sb;
int fd = open( filename, O_RDONLY );
fstat( fd, &sb );
unsigned char *haystack = mmap( NULL, sb.st_size,
PROT_READ, MAP_PRIVATE, fd, 0 );
close( fd );
off_t bytesToSearch = sb.st_size - sizeof( needle );
// <= so the last bytes get searched
for ( off_t ii = 0; ii <= bytesToSearch; ii++ )
{
if ( 0 == memcmp( haystack + ii, &needle, sizeof( needle ) ) )
{
// found it!
}
}
Error checking and proper headers omitted for clarity.
There are a lot of ways to improve the performance of that. This IO pattern is the worst possible use of mmap() with regard to performance: it reads every byte in the file just once, then throws the mappings away. Mapping a file isn't all that fast in the first place, and it impacts the entire machine.
It'd probably be a lot faster to just use open() and read() with direct IO in large page-sized chunks into page-aligned memory, especially if the file is a significant fraction of the system's RAM. But that would make the code much more complex, as the comparisons would have to span buffers - it's almost certainly much faster to use two buffers and copy a few bytes out to search across a break between buffers than it is to back up and do a non-aligned read.
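To show the "carry a few bytes across the break" idea without backing up the file position, here is a minimal sketch. It deliberately uses plain buffered fread() instead of direct IO into page-aligned memory, so it illustrates only the boundary handling, not the IO tuning. The name `find_u64` and the 64 KiB `SEARCH_CHUNK` are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SEARCH_CHUNK (1u << 16)
#define KEEP (sizeof(uint64_t) - 1)   /* bytes carried between chunks */

/* Scans `fp` from its current position for the 8-byte pattern `needle`
 * (compared in host byte order, as memcmp on &needle does).
 * Stores up to `max_offsets` match offsets, relative to the starting
 * position, and returns the total number of matches seen.
 * Not reentrant: it uses a static buffer. */
static size_t find_u64(FILE *fp, uint64_t needle,
                       uint64_t *offsets, size_t max_offsets)
{
    assert(fp != NULL);
    static unsigned char buf[KEEP + SEARCH_CHUNK];
    size_t have = 0;      /* valid bytes currently in buf */
    uint64_t base = 0;    /* stream offset of buf[0] */
    size_t found = 0;
    size_t got;

    while ((got = fread(buf + have, 1, SEARCH_CHUNK, fp)) > 0) {
        have += got;
        for (size_t i = 0; i + sizeof needle <= have; i++) {
            if (memcmp(buf + i, &needle, sizeof needle) == 0) {
                if (found < max_offsets)
                    offsets[found] = base + i;
                found++;
            }
        }
        /* Keep only the tail that could still start a match. */
        if (have > KEEP) {
            size_t drop = have - KEEP;
            memmove(buf, buf + drop, KEEP);
            base += drop;
            have = KEEP;
        }
    }
    return found;
}
```

Because only the last 7 bytes are kept, every position is compared exactly once and a match that straddles two reads is still found.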

malloc fails and return NULL

this is my code:
#include <stdlib.h>
#include <stdio.h>
int sendMessage(uint8_t *pui8MsgData, int messageLength, uint32_t ui32ObjID)
{
int total = 0;
int bytesleft = messageLength;
int n;
int chunkSize;
while (bytesleft)
{
chunkSize = bytesleft > sizeof(uint8_t)*8 ? sizeof(uint8_t)*8 : bytesleft;
uint8_t *buffer = (uint8_t *)malloc(sizeof(uint8_t) * chunkSize);
if(buffer == NULL)
{
printf("Memory allocation failed");
return 0;
}
memcpy(buffer, pui8MsgData, sizeof(uint8_t) * chunkSize);
n = send(buffer, chunkSize, ui32ObjID);
total += n;
bytesleft -= n;
}
return 1;
}
But for some reason, malloc always returns NULL. What could be wrong? Or how can I get the error returned by malloc?
It is not possible to tell you what is wrong here with 100% certainty; there's too little information.
However, the malloc() seems pointless, and you never free() it. This is a memory leak, which might explain why you run out of memory, causing malloc() to return NULL. Seems plausible to me.
Just pass the data directly to send(), no need to allocate a new buffer and copy data around.
Edit: also, you never update pui8MsgData so you're processing the first bytes of the message over and over.
So, to summarize, the loop should be something like:
while (bytesleft)
{
const int chunkSize = bytesleft > 8 ? 8 : bytesleft;
const ssize_t n = send(ui32ObjID, pui8MsgData + total, chunkSize);
if (n < 0)
{
fprintf(stderr, "Send() failed\n");
return 0;
}
total += n;
bytesleft -= n;
}
This fixes the problem by removing malloc(). I also swapped the arguments to send(), assuming ui32ObjID is a valid file descriptor.
You are using buffer as the first argument to send(). But the send() function expects a file descriptor, not some uint8_t * so send() will likely return -1. This results in an ever increasing value for bytesleft, and thus an infinite loop with infinite memory allocations, eventually returning NULL.

Is this appender, with realloc function safe?

I just finished putting this function together from some man documentation. It takes a char* and appends a const char* to it; if the char* is too small, it reallocates it to something a little bigger and then appends. It's been a long time since I used C, so I'm just checking in.
// append with realloc
int append(char *orig_str, const char *append_str) {
int result = 0; // fail by default
// is there enough space to append our data?
int req_space = strlen(orig_str) + strlen(append_str);
if (req_space > strlen(orig_str)) {
// just reallocate enough + 4096
int new_size = req_space;
char *new_str = realloc(orig_str, req_space * sizeof(char));
// resize success..
if(new_str != NULL) {
orig_str = new_str;
result = 1; // success
} else {
// the resize failed..
fprintf(stderr, "Couldn't reallocate memory\n");
}
} else {
result = 1;
}
// finally, append the data
if (result) {
strncat(orig_str, append_str, strlen(append_str));
}
// return 0 if Ok
return result;
}
This is not usable because you never tell the caller where the memory is that you got back from realloc.
You will need to either return a pointer, or pass orig_str by reference.
Also (as pointed out in comments) you need to do realloc(orig_str, req_space + 1); to allow space for the null terminator.
Your code also has some inefficient logic; compare it with this fixed version:
bool append(char **p_orig_str, const char *append_str)
{
// no action required if appending an empty string
if ( append_str[0] == 0 )
return true;
size_t orig_len = strlen(*p_orig_str);
size_t req_space = orig_len + strlen(append_str) + 1;
char *new_str = realloc(*p_orig_str, req_space);
// resize success..
if(new_str == NULL)
{
fprintf(stderr, "Couldn't reallocate memory\n");
return false;
}
*p_orig_str = new_str;
strcpy(new_str + orig_len, append_str);
return true;
}
This logic doesn't make any sense:
// is there enough space to append our data?
int req_space = strlen(orig_str) + strlen(append_str);
if (req_space > strlen(orig_str)) {
As long as append_str has non-zero length, you're always going to have to re-allocate.
The main problem is that you're trying to track the size of your buffers with strlen. If your string is NUL-terminated (as it should be), your perceived buffer size is always going to be the exact length of the data in it, ignoring any extra.
If you want to work with buffers like this, you need to track the size in a separate size_t, or keep some sort of descriptor like this:
struct buffer {
void *buf;
size_t alloc_size;
size_t used_amt; /* Omit if strings are NUL-terminated */
};
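As a rough sketch of how such a descriptor could be used for appending, here is a string-only variant. The names `struct strbuf`, `strbuf_append` and `strbuf_free` are made up for this example, and it uses `char *` rather than `void *` because it only handles NUL-terminated text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* String buffer that tracks its allocation separately from its contents. */
struct strbuf {
    char  *buf;         /* NUL-terminated data, or NULL when empty */
    size_t alloc_size;  /* bytes allocated */
    size_t used_amt;    /* bytes in use, excluding the terminator */
};

/* Appends `s`, growing the allocation only when the spare capacity is
 * too small. Returns false (leaving the buffer unchanged) on failure. */
static bool strbuf_append(struct strbuf *sb, const char *s)
{
    assert(sb != NULL && s != NULL);
    size_t add = strlen(s);
    size_t need = sb->used_amt + add + 1;   /* +1 for the terminator */

    if (need > sb->alloc_size) {
        size_t new_size = sb->alloc_size ? sb->alloc_size : 16;
        while (new_size < need)
            new_size *= 2;
        char *tmp = realloc(sb->buf, new_size);
        if (tmp == NULL)
            return false;
        sb->buf = tmp;
        sb->alloc_size = new_size;
    }
    memcpy(sb->buf + sb->used_amt, s, add + 1);  /* copies the '\0' too */
    sb->used_amt += add;
    return true;
}

/* Releases the allocation and resets the descriptor to empty. */
static void strbuf_free(struct strbuf *sb)
{
    free(sb->buf);
    sb->buf = NULL;
    sb->alloc_size = sb->used_amt = 0;
}
```

Start from a zeroed descriptor (struct strbuf sb = { NULL, 0, 0 };) and call strbuf_append repeatedly. Since the size lives in the struct, no strlen call on the destination is needed to decide whether to reallocate.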

Copying a 2D array of char * to user space from kernel space?

In kernel space, I have the following:
char * myData[MAX_BUF_SIZE][2];
I need to define a kernel method that copies this data into user space, so how would I go about defining this method? I've got the following, but I'm not quite sure what I'm doing.
asmlinkage int sys_get_my_data(char __user ***data, int rowLen, int bufferSize) {
if (rowLen < 1 || bufferSize < 1 || rowLen > MAX_BUF_SIZE || bufferSize
> MAX_BUF_SIZE) {
return -1;
}
if( copy_to_user( data, myData, rowLen * bufferSize * dataCounter * 2) )
{
printk( KERN_EMERG "Copy to user failure for get all minifiles\n" );
return -1;
}
return 0;
}
Help?
Per your comment, these char * values point to nul-terminated strings.
Now, you can't just go copying that whole fileDataMap memory block to userspace - that'll just give userspace a bunch of char * values that point into kernel space, so it won't actually be able to use them. You need to copy the strings themselves to userspace, not just the pointers (this is a "deep copy").
Now, there are a few ways you can go about this. The easiest is to simply pack all the strings, one after another, into a big char array in userspace. It's then up to userspace to scan through the block, reconstructing the pointers:
asmlinkage int sys_get_my_data(char __user *data, size_t bufferSize)
{
size_t i;
for (i = 0; i < MAX_BUF_SIZE; i++) {
size_t s0_len = strlen(fileDataMap[i][0]) + 1;
size_t s1_len = strlen(fileDataMap[i][1]) + 1;
if (s0_len + s1_len > bufferSize) {
return -ENOSPC;
}
if (copy_to_user(data, fileDataMap[i][0], s0_len)) {
return -EINVAL;
}
data += s0_len;
bufferSize -= s0_len;
if (copy_to_user(data, fileDataMap[i][1], s1_len)) {
return -EINVAL;
}
data += s1_len;
bufferSize -= s1_len;
}
return 0;
}
This will only work if there are always MAX_BUF_SIZE string-pairs, because userspace will need to know how many strings it is expecting to receive in order to be able to safely scan through them. If that's not the case, you'll have to return that information somehow - perhaps the return value of the syscall could be the number of string-pairs?
If you want the kernel to reconstruct the pointer table in userspace, you'll have to copy the strings as above, and then fill out the pointer table - userspace will have to pass two buffers, one for the strings themselves and one for the pointers.
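For the userspace side of the packed-strings approach, the scan could look roughly like the sketch below. The function name `unpack_pairs` and its parameters are invented for this example; it assumes the block holds pairs of NUL-terminated strings back to back, and that the caller knows how many pairs to expect (`max_pairs`), as discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Walks a block of back-to-back NUL-terminated strings and records a
 * pointer to each string of each pair. Stops after `max_pairs` pairs or
 * when the block runs out. Returns the number of complete pairs found,
 * or -1 if a string is not terminated inside the block. */
static int unpack_pairs(const char *block, size_t block_len,
                        const char *pairs[][2], size_t max_pairs)
{
    assert(block != NULL && pairs != NULL);
    size_t pos = 0;
    size_t n = 0;

    while (n < max_pairs && pos < block_len) {
        for (int k = 0; k < 2; k++) {
            /* Look for the terminator without reading past the block. */
            const char *end = memchr(block + pos, '\0', block_len - pos);
            if (end == NULL)
                return -1;
            pairs[n][k] = block + pos;
            pos = (size_t)(end - block) + 1;
        }
        n++;
    }
    return (int)n;
}
```

The pointers stored in `pairs` point into `block`, so the block has to outlive them. Bounding the scan with memchr means a missing terminator is reported instead of running off the end of the buffer.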
