I have a source buffer which I allocated with malloc, and I have used fread to read some data into it from a big file. Now I want to separate alternate chunks of data (say 2 bytes each) from this source buffer into two target buffers. The problem generalizes to copying every nth chunk into n target buffers. I need help in the form of sample code for the simplest case of two target buffers. This is what I came up with, which I am quite sure isn't right:
int totsamples = 256*2*2;
int *sbuff = malloc(totsamples);
int *tbuff1 = malloc(totsamples/2);
int *tbuff2 = malloc(totsamples/2);
int elements = fread(sbuff, 2, 256*2, fs);
for (int i = 0; i < 256; i++)
{
    tbuff1[i] = sbuff[i*2];
    tbuff2[i] = sbuff[(i*2) + 1];
}
Maybe this will give you an idea:
for (i = 0; i < 256; i++)
{
    tbuff1[2*i+0] = sbuff[i*4+0];
    tbuff1[2*i+1] = sbuff[i*4+1];
    tbuff2[2*i+0] = sbuff[i*4+2];
    tbuff2[2*i+1] = sbuff[i*4+3];
}
Note: the above code is wrong with respect to your malloc() parameters, as it is unclear what your totsamples means, so fix that before using it...
Another note: if you want chunks longer than 2 items, it starts to make sense to use memcpy to do the copying.
Suggestion: use constants instead of magic numbers, such as const int SAMPLES = 256;. Also, I'm not sure, but it appears you think the size of int is 2? Don't; use sizeof(int) etc. instead (and the size of int is rarely 2, btw).
Hmm... are you actually trying to optimize by copying bytes through integers, 4 bytes at a time? Don't! "Premature optimization is the root of all evil." You can consider that later, after your code otherwise works; first create a working, non-hacky version, and doubly so if you need to ask how to do even that, as here...
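To make the memcpy suggestion concrete, here is a minimal sketch of deinterleaving a source buffer into n target buffers; deinterleave, nbufs, and chunk_bytes are made-up names for the example, and it assumes src_bytes is an exact multiple of chunk_bytes * nbufs:

#include <string.h>

/* Scatter consecutive chunk_bytes-sized chunks of src round-robin
   into nbufs target buffers. */
static void deinterleave(const unsigned char *src, size_t src_bytes,
                         unsigned char **targets, size_t nbufs,
                         size_t chunk_bytes)
{
    size_t nchunks = src_bytes / chunk_bytes;
    for (size_t c = 0; c < nchunks; c++) {
        unsigned char *dst = targets[c % nbufs];   /* which target buffer */
        size_t slot = c / nbufs;                   /* position within it */
        memcpy(dst + slot * chunk_bytes, src + c * chunk_bytes, chunk_bytes);
    }
}

For the question's two-buffer case, pass an array holding tbuff1 and tbuff2 (viewed as unsigned char pointers) with nbufs = 2 and chunk_bytes = 2.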
I have a program that produces an integer array of variable size, and I need to write the elements to a text file. The elements need to be on the same line in the file.
This is a minimal example of what I am currently doing. I'm using the approach in this post: https://stackoverflow.com/a/30234430/10163981
FILE *file = fopen("file.txt", "w");
int nLines = 1000;
char breakstr[] = "\n";
for (; ix < N; ix++) {   /* ix starts at a variable value */
    char s[nLines * 13];
    int index = 0;       /* reset the write position for each row */
    for (int jx = 0; jx < nLines; jx++) {
        index += sprintf(&s[index], "%03i %03i %03i ", array[ix][jx], array[ix][jx], array[ix][jx]);
        // I need the jx:th element in repeats of three, and may need to modify it externally for printing
    }
    fwrite(s, sizeof(char), strlen(s), file);
    fwrite(breakstr, sizeof(char), strlen(breakstr), file);
}
fclose(file);
I am formatting the array contents as a string and using fwrite, as this method has been given to me as a requirement. My problem is that this implementation is way too slow. I have also tried using shorter strings and writing on each iteration, but that is even slower. There is not much I can do about the outer ix loop, as the initial value of ix is variable; I included it for the sake of completeness.
nLines is expected to reach 10000 at most.
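One easy win, as a sketch rather than a measured fix: sprintf already returns the number of bytes written, so index is the string length, and calling strlen on a buffer of up to 130000 characters every row just rescans it. Writing the tracked length, folding the newline into the same buffer, and giving the stream a bigger buffer with setvbuf are all standard-library-only changes that typically help (same assumptions as the code above: file, ix, N, nLines, and array exist, and values fit in 3 digits):

#include <stdio.h>

static char iobuf[1 << 20];
setvbuf(file, iobuf, _IOFBF, sizeof iobuf);   /* call before the first write */

for (; ix < N; ix++) {
    char s[nLines * 13 + 1];
    int index = 0;
    for (int jx = 0; jx < nLines; jx++)
        index += sprintf(&s[index], "%03i %03i %03i ",
                         array[ix][jx], array[ix][jx], array[ix][jx]);
    s[index++] = '\n';                        /* newline in the same buffer */
    fwrite(s, 1, index, file);                /* no strlen rescan */
}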
So I am writing a program, and I want to move all elements in the array N places to the left. Whether the first elements of the array get added to the end or deleted, I don't care; the last N elements need to be nulled out anyway. I could of course just make a copy of that array,
like this:
int *buffer = [loads of elements, these get assigned dynamically];
int *tmpbuffer = buffer;
for (int i = 0; i < sizeof(buffer); i++) {
    buffer[i] = tmpbuffer[i + N];
}
(please ignore any pointer and sizeof mistakes, this is a really quick sketch)
But I doubt that'll be efficient at all. This is an array with roughly 4400 elements, but it will be expanded to a LOT more elements later.
What am I trying to do?
Think of it like a terminal program, but slightly different: there are a few text lines, and when there are more than N lines, the topmost line is deleted and a new line appears at the bottom. Even though this sounds like a 2D array (one array for all the vertical lines, and one for the text lines), it's not.
This is done without any external libraries, because it's for a "kernel". (You might say that I am probably not skilled enough to do so, and you're definitely right; right now I only have VGA output and a basic terminal, and when all lines are filled it just erases the entire screen. I just like to learn this way: have an objective and chase it.)
I hope I provided enough info; if I didn't, I'll try to provide it.
Your approach
// sizeof(buffer) is the size of a pointer; I replaced that with (nelems - N)
for (int i = 0; i < nelems - N; i++) {
    buffer[i] = tmpbuffer[i + N];
}
looks very efficient to me.
You may want to compare with a pure pointer-based approach, but I doubt there will be any difference:
int *src = buffer + N;
int *dst = buffer;
for (int i = 0; i < nelems - N; i++) *dst++ = *src++;
Hi, I came across the following code in an answer to a Stack Overflow question. The answer has no direct connection to the question, but it seems to improve the efficiency of the code:
for (int i = 0; i < nodeList.getLength(); i++)
change to
for (int i = 0, len = nodeList.getLength(); i < len; i++)
to be more efficient. The second way may be the best as it tends to
use a flatter, predictable memory model.
I have read about the flat memory model, but I don't get the point here, i.e., in what way does it make the code more efficient? Can somebody explain it to me?
Ref: https://stackoverflow.com/a/12736268/3320657
Flat memory model or linear memory model refers to a memory addressing paradigm in which "memory appears to the program as a single contiguous address space." The CPU can directly (and linearly) address all of the available memory locations without having to resort to any sort of memory segmentation or paging schemes.
Keeping this in mind, the computer lays out memory one declaration at a time; declaring the variable within the loop line itself puts less strain on seeking the value.
for (int i = 0, len = nodeList.getLength(); i < len; i++)
is more efficient than:
int len = nodeList.getLength();
for (int i = 0; i < len; i++)
Either way, nodeList.getLength() isn't called every time the program loops; instead it is called once and stored in the integer len, and the loop then compares i to len instead of calling nodeList.getLength() on every iteration.
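The same loop-invariant hoisting shows up in plain C; a classic illustration (mine, not from the linked answer, and process() is a hypothetical per-character operation) is strlen in a loop condition:

#include <stddef.h>
#include <string.h>

void process(char c);   /* hypothetical per-character work */

/* O(n^2): strlen rescans the whole string on every iteration. */
void slow(const char *s)
{
    for (size_t i = 0; i < strlen(s); i++)
        process(s[i]);
}

/* O(n): the length is computed once, just like the Java example above. */
void fast(const char *s)
{
    for (size_t i = 0, len = strlen(s); i < len; i++)
        process(s[i]);
}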
I have a massive array I need to search (actually it's a massive array of smaller arrays, but for all intents and purposes, let's consider it one huge array). What I need to find is a specific series of numbers. Obviously, a simple for loop will work:
Pseudocode:
for (x = 0; x < arraylen; x++) {
    if (array[x] == searchfor[location])
        location++;
    else
        location = 0;   /* reset; a full version would also recheck array[x] against searchfor[0] here */
    if (location >= strlen(searchfor))
        return FOUND_IT;
}
The thing is, I want this to be efficient, and in a perfect world I do NOT want to return the prepared data from an OpenCL kernel just to run a simple search loop over it.
I'm open to non-OpenCL ideas, but something I can implement across a work group size of 64 on a target array length of 1024 would be ideal.
I'm kicking around ideas (split the target across work items; each work item compares, in a loop, against each target and sets a flag on a match; after all work items complete, check the flags; though as I write that, it sounds very inefficient), but I'm sure I'm missing something.
Another idea was, since the target array is uchar, to lump it together as a double and check 8 indexes at a time. Not sure I can do that in OpenCL easily.
I'm also toying with the idea of hashing the search target with something fast, likely MD5, then grabbing strlen(searchtarget) characters at a time, hashing those, and seeing if they match. Not sure how much the hashing will kill my search speed, though.
Oh, the code is in C, so no C++ maps (something I found while googling that seemed like it might help?).
Based on comments above, for future searches, it seems a simple for loop scanning the range IS the most efficient way to find matches given an OpenCL implementation.
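For what that scan might look like on the device, here is a minimal sketch (my own illustration; the kernel name, argument names, and the found_at convention are invented). Each work-item tests one candidate start offset, which maps naturally onto a global size of 1024 with work groups of 64; the host initializes found_at to UINT_MAX before enqueueing:

__kernel void find_pattern(__global const uchar *haystack,
                           uint haystack_len,
                           __constant uchar *needle,
                           uint needle_len,
                           __global volatile uint *found_at)
{
    uint start = get_global_id(0);
    if (start + needle_len > haystack_len)
        return;                         /* match can't fit here */
    for (uint j = 0; j < needle_len; j++)
        if (haystack[start + j] != needle[j])
            return;                     /* mismatch: this offset is out */
    atomic_min(found_at, start);        /* keep the earliest match offset */
}

If found_at is still UINT_MAX after the kernel finishes, there was no match.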
Create an index array of UCHAR_MAX + 1 entries. For each uchar in the search string, set array[uchar] = the position in the search string of the last occurrence of that uchar. The rest of the array contains (unsigned)-1.
unsigned searchindexing[UCHAR_MAX + 1];            /* needs <limits.h> */
for (i = 0; i <= UCHAR_MAX; i++)
    searchindexing[i] = (unsigned)-1;              /* marks "not in searchfor" */
for (i = 0; i < strlen(searchfor); i++)
    searchindexing[(unsigned char)searchfor[i]] = i;
The loop must run from the beginning of searchfor so that, for a uchar occurring more than once, the last occurrence is the position that ends up in searchindexing.
Then you search the array by stepping strlen(searchfor) at a time until you hit a uchar that occurs in searchfor.
for (i = 0; i < MAXARRAYLEN; i += strlen(searchfor))
    if ((unsigned)-1 != searchindexing[array[i]]) {
        if (searchindexing[array[i]] > i)
            continue;                              /* would step before the start */
        i -= searchindexing[array[i]];
        if (!memcmp(searchfor, &array[i], strlen(searchfor)))
            return FOUND_IT;
    }
If most of the uchars in array aren't in searchfor, this is probably the fastest way. Note the code has not been optimized.
Example: searchfor = "banana", so strlen is 6. searchindexing['a'] = 5, ['b'] = 0, ['n'] = 4, and the rest hold a value not between 0 and 5, like -1 or maxuint. If array[i] is something not in "banana", like a space, i increments by 6. If array[i] is 'a', you might be inside "banana", and it could be any of the 3 'a's. So we assume the last 'a', move 5 places back, and compare against searchfor. On success we have found it; otherwise we step 6 places forward.
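Pulled together as a self-contained sketch (the function name, the test text, and the extra bounds checks are mine), the whole technique looks like this:

#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Return the offset of the first match found, or -1. A sketch of the
   skip-indexing idea above; like the original, it is a heuristic scan,
   not an exhaustively verified string-search routine. */
static long find_series(const unsigned char *array, size_t arraylen,
                        const char *searchfor)
{
    size_t patlen = strlen(searchfor);
    unsigned searchindexing[UCHAR_MAX + 1];
    size_t i;

    for (i = 0; i <= UCHAR_MAX; i++)
        searchindexing[i] = (unsigned)-1;
    for (i = 0; i < patlen; i++)
        searchindexing[(unsigned char)searchfor[i]] = (unsigned)i;

    for (i = 0; i + patlen <= arraylen; i += patlen)
        if ((unsigned)-1 != searchindexing[array[i]]) {
            if (searchindexing[array[i]] > i)
                continue;                  /* would step before the start */
            i -= searchindexing[array[i]];
            if (i + patlen <= arraylen &&
                !memcmp(searchfor, &array[i], patlen))
                return (long)i;
        }
    return -1;
}

int main(void)
{
    const unsigned char text[] = "a big bowl of banana pudding";
    printf("%ld\n", find_series(text, sizeof text - 1, "banana")); /* 14 */
    return 0;
}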
I am working on a Windows C project which is string-intensive: I need to convert a marked up string from one form to another. The basic flow is something like:
DWORD convert(char *point, DWORD extent)
{
    char *point_start = point;        /* remember where the input begins */
    char *point_end = point + extent;
    char *result = memory_alloc(1);   /* fake allocator, as noted below */
    char *p_result = result;          /* write cursor into result */
    DWORD result_extent = 0;
    while (point < point_end)
    {
        switch (*point)
        {
        case FOO:
            /* grow result by 12 bytes and re-derive the cursor,
               since reallocation may move the block */
            result_extent = p_result - result;
            result = memory_realloc(result, result_extent + 12);
            p_result = result + result_extent;
            *p_result++ = '\n';
            *p_result++ = '\t';
            memcpy(p_result, point, 10);
            point += 10;
            p_result += 10;
            break;
        case BAR:
            /* grow result by 1 byte for a literal copy */
            result_extent = p_result - result;
            result = memory_realloc(result, result_extent + 1);
            p_result = result + result_extent;
            *p_result++ = *point++;
            break;
        default:
            point++;
            break;
        }
    }
    result_extent = p_result - result;
    // assume the input buffer is big enough to take anything I would copy to it
    memcpy(point_start, result, result_extent);
    return result_extent;
}
memory_alloc() and memory_realloc() are fake functions to highlight the purpose of my question. I do not know beforehand how big the result 'string' will be (technically, it's not a C-style/null-terminated string I'm working with, just a pointer to a memory address and a length/extent), so I'll need to size the result string dynamically (it might be bigger than the input, or smaller).
In my initial pass, I used malloc() to create room for the first byte/bytes and then subsequently realloc() whenever I needed to append another byte/handful of bytes...it works, but it feels like this approach will needlessly hammer away at the OS and likely result in shifting bytes around in memory over and over.
So I made a second pass, which determines how long the result string will be after an individual unit of the transformation (illustrated above with the FOO and BAR cases) and picks a 'preferred allocation size', e.g. 256 bytes. For example, if result_extent is 250 bytes and I'm in the FOO case, I know I need to grow the memory by 12 bytes (newline, tab and 10 bytes from the input string); rather than reallocating 262 bytes of memory, I'd reach for 512 bytes, hedging my bet that I'm likely going to continue adding more data (and thus saving myself a few calls into realloc), roughly as sketched below.
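In code, the rounding I have in mind looks something like this (a sketch; the constant and the helper name are just for illustration):

#include <stddef.h>

#define PREFERRED_CHUNK 256   /* illustrative 'preferred allocation size' */

/* Round a needed size up by doubling from the preferred chunk, so 262
   bytes needed yields a 512-byte request, as described above. */
static size_t preferred_capacity(size_t needed)
{
    size_t cap = PREFERRED_CHUNK;
    while (cap < needed)
        cap *= 2;             /* 256, 512, 1024, ... */
    return cap;
}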
On to my question: is this latter thinking sound or is it premature optimization that the compiler/OS is probably already taking care of for me? Other than not wasting memory space, is there an advantage to reallocating memory by a couple bytes, as needed?
I have some rough ideas of what I might expect during a single conversion instance, e.g. a worst-case scenario might be a 2MB input string with a couple hundred bytes of markup that will result in 50-100 bytes of data being added to the result string per markup instance (so, say, 200 reallocs stretching the string by 50-100 bytes each, with another 100 reallocations caused by simply copying data from the input string into the result string, aside from the markup).
Any thoughts on the subject would be appreciated. Thanks.
As you might know, realloc can move your data on each call, which results in an additional copy. In cases like this, I think it is much better to allocate one large buffer that will most probably be sufficient for the operation (an upper bound). At the end, you can allocate the exact amount for the result and do a final copy/free. This is better, and is not premature optimization at all; IMO, using realloc throughout might be considered premature optimization in this case.
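A minimal sketch of that strategy, assuming the FOO/BAR rules shown in the question (the function name, the 2x bound, and out_len are my own illustration):

#include <stdlib.h>
#include <string.h>

/* Sketch: convert with a single up-front allocation. 'extent' is the
   input length, as in the question; the conversion loop itself is elided. */
char *convert_once(const char *point, size_t extent, size_t *out_len)
{
    (void)point;   /* used by the elided conversion loop */

    /* For the FOO/BAR rules above, output is at most 12 bytes per 10
       consumed (ratio 1.2), so 2x the input is a comfortably safe
       upper bound; tighten this for the real markup rules. */
    char *result = malloc(extent * 2 + 1);
    if (result == NULL)
        return NULL;

    char *p_result = result;
    /* ... the switch-based conversion loop writes through p_result,
       with no reallocation in the hot path ... */

    /* One final exact-sized allocation, then copy and free. */
    size_t result_extent = (size_t)(p_result - result);
    char *exact = malloc(result_extent ? result_extent : 1);
    if (exact != NULL) {
        memcpy(exact, result, result_extent);
        free(result);
        result = exact;
    }
    *out_len = result_extent;
    return result;
}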