Move a pointer location around to write a recursively allocated buffer - c

Apologies if the title doesn't make sense, I've been staring at my monitor for 15 minutes trying to come up with one.
I'm using a library function from a C API (in 64-bit Xubuntu 14.04) to move a set number of int16_t values into a buffer and repeat it a set number of times, described here in (sort of) pseudo-code:
int16_t *buffer = calloc(total_values_to_receive, 2 * sizeof(samples[0]));

while (!done) {
    receive_into_buffer(buffer, num_values_to_receive_per_pass);
    fwrite(buffer, 2 * sizeof(samples[0]), num_values_to_receive_per_pass, file);
    values_received += num_values_to_receive_per_pass;
    if (values_received == total_values_to_receive) {
        done = true;
    }
}
Basically, it receives a set number of values on each pass and writes those values to a file; note that the same file is appended to on each pass. E.g. if total_values_to_receive = 100 and num_values_to_receive_per_pass = 10, there would be 10 passes in total.
What I would like to do, mainly to increase speed, is to have the write happen once, after all passes have completed. The library function prototype contains void* samples, size_t num_samples, which, you guessed it, refer to the buffer the samples are stored into and the number of samples to store.
I'm not completely confident with pointers, but is there a way to write into the buffer on one pass, then move the pointer by num_values_to_receive_per_pass so that the next time the library function is called it effectively appends to that buffer (so to speak)? Then the pointer could be moved back to the start of the buffer and fwrite called once to write the total number of values to the file.
Does that make sense? Any tips on how to actually implement it?
Thanks in advance.

Assuming the second argument of receive_into_buffer is in units of buffer elements, not bytes, the following should work:
int16_t *buffer = calloc(total_values_to_receive, 2 * sizeof(buffer[0]));
int16_t *temp = buffer;

while (!done) {
    receive_into_buffer(temp, num_values_to_receive_per_pass);
    temp += 2 * num_values_to_receive_per_pass;
    values_received += num_values_to_receive_per_pass;
    if (values_received == total_values_to_receive) {
        done = true;
    }
}

fwrite(buffer, 2 * sizeof(buffer[0]), total_values_to_receive, file);

Style: a plain for() loop will avoid the indicator variable:
for (values_received = 0; values_received < total_values_to_receive; values_received += num_values_to_receive_per_pass) {
    receive_into_buffer(buffer, num_values_to_receive_per_pass);
    fwrite(buffer, 2 * sizeof(samples[0]), num_values_to_receive_per_pass, file);
}
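For completeness, here is a sketch that combines the two suggestions above: advance a temporary cursor on each pass and issue a single fwrite() at the end. Like the first answer, it assumes each "value" occupies two int16_t elements:

int16_t *buffer = calloc(total_values_to_receive, 2 * sizeof(buffer[0]));
int16_t *cursor = buffer;

for (size_t values_received = 0;
     values_received < total_values_to_receive;
     values_received += num_values_to_receive_per_pass) {
    receive_into_buffer(cursor, num_values_to_receive_per_pass);   /* fill the next slice */
    cursor += 2 * num_values_to_receive_per_pass;                  /* advance past what was just received */
}

fwrite(buffer, 2 * sizeof(buffer[0]), total_values_to_receive, file);
free(buffer);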

Related

How to read a file and simultaneously fill an array

In a C program I'm writing I have to read values from a text file and put them in an array for later use.
I don't like my code (snippet shown below) because I do two while loops, the first to count the number of values, then I create an array as big as that value, and lastly I read the file again, filling the array.
Also, in the first loop I use a variable x because fscanf() requires it, but I never use it later in the code and I'd like to avoid it altogether if possible.
int x, n = 0, sum = 0, i = 0;
fp = fopen("data.txt", "r");
while (fscanf(fp, "%d\n", &x) != EOF) {
    n++;
}
rewind(fp);
int v[n];
while (fscanf(fp, "%d\n", &v[i]) != EOF) {
    sum += v[i];
    i++;
}
So, any advice on how I can improve this code? I figured I could kinda "fix" it by declaring an array "big enough" at the beginning and filling it as needed, but I don't know in advance how many values I have to work with, so I decided to throw that method out.
This is one scenario where dynamic memory allocation can come handy. You can follow the general procedure as described below
1. Define a pointer.
2. Open the file with fopen() and read the first element from the file with fscanf(). Error checking should be taken care of, too.
3. If the read is successful, allocate memory dynamically with malloc() and copy the value into it.
4. Read the next element.
   4.1. If the read is successful, re-allocate the memory with realloc(), adding room for one more element, and copy the newly read value into the newly allocated space.
   4.2. If the read is a failure, check for EOF and stop reading.
5. Continue from step 4.
Also, please keep in mind that the memory you allocate dynamically needs to be free()d once you are done with it.
As a note, referring to the comment by @szczurcio, this is not an optimized approach, because you have to re-allocate memory on each successful read. To minimize the impact of dynamic memory allocation, we can pick a threshold value to allocate initially and then, whenever it is exhausted, double the previous capacity. This way the allocation happens in chunks and the allocation overhead on every read cycle is avoided.
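A minimal sketch of that grow-by-doubling idea (the starting capacity of 64 and the function name are my own choices, not from the answer above):

#include <stdio.h>
#include <stdlib.h>

int *read_ints(FILE *fp, size_t *count)
{
    size_t capacity = 64, used = 0;
    int *v = malloc(capacity * sizeof *v);
    if (!v)
        return NULL;

    int value;
    while (fscanf(fp, "%d", &value) == 1) {
        if (used == capacity) {
            capacity *= 2;                      /* double when exhausted */
            int *tmp = realloc(v, capacity * sizeof *v);
            if (!tmp) {
                free(v);
                return NULL;
            }
            v = tmp;
        }
        v[used++] = value;
    }
    *count = used;
    return v;                                   /* the caller must free() it */
}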
Minor changes to the code: please note that I've changed v to be an int* and then count the newlines in the file. I then allocate the correct amount of memory for the array, rewind the file, and let your code loop through the file again...
int x, n = 0, sum = 0;
int c;
int *v;
int i = 0;
fp = fopen("data.txt", "r");
while ((c = fgetc(fp)) != EOF)
    if (c == '\n')
        ++i;
rewind(fp);
v = malloc(i * sizeof(int));
i = 0;
while (fscanf(fp, "%d\n", &v[i]) != EOF)
{
    sum += v[i];
    i++;
}
As said by Sourav, dynamic memory allocation is definitely the way to go.
That said, you can also change the data structure to another that doesn't require a priori knowledge of N. If you only need sequential access to the values and don't really need random access, a linked list is an option. Moreover, you can always use binary trees, hash tables and so on. Depends on what you want to do with the data.
P.S.: sorry, I'd post this as a comment, but I don't have the reputation.
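For illustration, here is a small sketch of the linked-list option mentioned above (the type and function names are made up):

#include <stdio.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

struct node *read_list(FILE *fp)
{
    struct node *head = NULL, **tail = &head;
    int value;
    while (fscanf(fp, "%d", &value) == 1) {
        struct node *n = malloc(sizeof *n);
        if (!n)
            break;                  /* keep whatever was read so far */
        n->value = value;
        n->next = NULL;
        *tail = n;                  /* append at the tail to preserve file order */
        tail = &n->next;
    }
    return head;                    /* the caller walks the list and free()s each node */
}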
This is the typical scenario in which you would like to know the size of the file before creating the array. Or, better said, the number of lines in the file.
I'm going to suggest something radically different. Since this is a text file, the smallest number will occupy two chars (smallest in a "text" sense): one for the digit, and another for the \n (though the \n can be one or two bytes, which is OS dependent).
We can know the size of the file: after fopen()'ing it, you can seek to the end and use ftell() to learn how many bytes it holds. If you divide that number by 2, you have an upper bound on the number of lines in the file. So you can create an array of that size, and then record the number of positions actually occupied.
FILE * fp = fopen( "data.txt", "rt" );

/* Get file size */
fseek( fp, 0, SEEK_END );
long size = ftell( fp );
fseek( fp, 0, SEEK_SET );

/* Create data */
long MaxNumbers = size / 2;
int * data = (int *) malloc( sizeof( int ) * MaxNumbers );
long lastPos = 0;

/* Read file */
int * next = data;
while( fscanf(fp, "%d\n", next) != EOF ) {
    ++next;
}
lastPos = next - data;    /* pointer difference is already in elements */

/* Close the file */
fclose( fp );
Once you have the data loaded in data, you know the real number of items, so you can copy it to another array of the exact size (maybe through memcpy()), or stay with this array. If you want to change the array:
int * v = (int *) malloc( sizeof( int ) * lastPos );
memcpy( v, data, sizeof( int ) * lastPos );
free( data );
Note: this code is a simple demo, and it does not check for NULL's after calling malloc(), while a real program should.
This code does not waste memory or computation time on expanding the array when the data does not fit. However, it a) creates an array at the beginning that is potentially bigger than needed, and b) if you want an array of the exact size, you will temporarily have twice the needed space allocated. We are trading memory for performance, and sometimes that is not a good idea for a given environment (e.g., an embedded system).
A big improvement to this strategy would be possible if you could control the format of the input file. If the same space were dedicated to each number (say, always three positions, so a 3 is stored as 003), and you knew the maximum number (so you know how many positions each number needs), then the count would be exact and you would not need to copy the data to another array at all. With that change, this strategy is simply the best one I can imagine.
Hope this helps.

How do I read and parse a text file with numbers, fast (in C)?

Latest update: my classmate uses fread() to read about one third of the whole file into a string at a time; this avoids running out of memory. He then processes that string, separating it into his data structure. Note that you need to take care of one problem: at the end of the string, the last few characters may not form a whole number. Think about a way to detect this situation so you can join those characters with the first few characters of the next string.
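One way to handle that chunk boundary, as a rough sketch (the chunk size and the commented-out parse step are placeholders, not from the original code):

#include <stdio.h>
#include <string.h>

#define CHUNK (1 << 20)

void read_in_chunks(FILE *fp)
{
    static char buf[CHUNK + 1];
    size_t carry = 0;                   /* bytes kept from the previous chunk */
    size_t got;

    while ((got = fread(buf + carry, 1, CHUNK - carry, fp)) > 0) {
        size_t len = carry + got;
        buf[len] = '\0';

        /* Everything after the last whitespace may be a number cut in half. */
        size_t end = len;
        while (end > 0 && buf[end - 1] != ' ' && buf[end - 1] != '\n')
            end--;

        /* parse_numbers(buf, end);       <- placeholder for the real parsing */

        carry = len - end;              /* move the partial tail to the front */
        memmove(buf, buf + end, carry);
    }
    /* if carry > 0 here, buf[0..carry) still holds the final number */
}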
Each number corresponds to a different variable in your data structure. Your data structure should be very simple, because inserting the data into it is where most of the time goes. Therefore, the fastest way to process this data is: use fread() to read the file into a string, then separate that string into plain one-dimensional arrays.
For example (just an example, not taken from my project), I have a text file like:
72 24 20
22 14 30
23 35 40
42 29 50
19 22 60
18 64 70
.
.
.
Each row is one person's information. The first column is the person's age, the second column is his deposit, and the third is his wife's age.
Then we use fread() to read this text file into a string, and I use strtok() to separate it (you can use a faster way to separate it).
Don't use a data structure to store the separated data!
I mean, don't do this:
struct person
{
    int age;
    int deposit;
    int wife_age;
};

struct person *my_data_store;
my_data_store = malloc(sizeof(struct person) * length_of_this_array);
// then insert separated data into my_data_store
Don't use a data structure to store the data!
The fastest way to store your data is like this:
int *age;
int *deposit;
int *wife_age;
age=(int*)malloc(sizeof(int)*age_array_length);
deposit=(int*)malloc(sizeof(int)*deposit_array_length);
wife_age=(int*)malloc(sizeof(int)*wife_array_length);
// the values of age_array_length, deposit_array_length and wife_array_length can be found with `wc -l` (you can invoke wc -l from your C program).
// then you can insert the separated data into these arrays as you split the string with strtok().
Second update: the best way is to use fread() to read part of the file into a string, then separate that string into your data structure. By the way, don't use any standard library function that parses a string into an integer, such as fscanf() or atoi(); they are too slow. We should write our own function to convert a string into an integer. Not only that, we should design a simpler data structure to store the data. By the way, my classmate can read this 1.7 GB file within 7 seconds. There is a way to do this, and it is much better than using multithreading. I haven't seen his code; after I see it, I will update a third time to tell you how he did it. That will be about two months from now, after our course finishes.
Update: I used multithreading to solve this problem, and it works! Notice: don't use clock() to measure the time when using multiple threads; that is why I thought the execution time had increased.
One thing I want to clarify: the time to scan the file without storing the values into my structure is about 20 seconds; storing the values into my structure takes about 60 seconds. My definition of "time of reading the file" includes reading the whole file and storing the values into my structure: time of reading the file = scanning the file + storing the values into my structure. So, does anyone have suggestions for storing the values faster? (By the way, I don't have control over the input file; it is generated by our professor. I am trying to use multithreading to solve this problem; if it works, I will tell you the result.)
I have a file, its size is 1.7G.
It looks like:
1 1427826
1 1427827
1 1750238
1 2
2 3
2 4
3 5
3 6
10 7
11 794106
.
.
and so on.
It has about ten million lines. Now I need to read this file and store these numbers in my data structure within 15 seconds.
I have tried using fread() to read the whole file and then strtok() to separate each number, but it still needs 80 seconds. If I use fscanf(), it is even slower. How do I speed it up? Maybe we cannot get it under 15 seconds, but 80 seconds to read it is too long. How can I read it as fast as possible?
Here is part of my reading code:
int Read_File(FILE *fd,int round)
{
clock_t start_read = clock();
int first,second;
first=0;
second=0;
fseek(fd,0,SEEK_END);
long int fileSize=ftell(fd);
fseek(fd,0,SEEK_SET);
char * buffer=(char *)malloc(sizeof(char)*(fileSize+1)); /* +1 for the terminating '\0' */
char *string_first;
char *string_second;
long int newFileSize=fread(buffer,1,fileSize,fd);
buffer[newFileSize]='\0';              /* strtok() needs a NUL-terminated string */
string_first=strtok(buffer," \t\n");   /* get the first token before the loop */
while(string_first!=NULL)
{
first=atoi(string_first);
string_second=strtok(NULL," \t\n");
second=atoi(string_second);
string_first=strtok(NULL," \t\n");
max_num= first > max_num ? first : max_num ;
max_num= second > max_num ? second : max_num ;
root_level=first/NUM_OF_EACH_LEVEL;
leaf_addr=first%NUM_OF_EACH_LEVEL;
if(root_addr[root_level][leaf_addr].node_value!=first)
{
root_addr[root_level][leaf_addr].node_value=first;
root_addr[root_level][leaf_addr].head=(Neighbor *)malloc(sizeof(Neighbor));
root_addr[root_level][leaf_addr].tail=(Neighbor *)malloc(sizeof(Neighbor));
root_addr[root_level][leaf_addr].g_credit[0]=1;
root_addr[root_level][leaf_addr].head->neighbor_value=second;
root_addr[root_level][leaf_addr].head->next=NULL;
root_addr[root_level][leaf_addr].tail=root_addr[root_level][leaf_addr].head;
root_addr[root_level][leaf_addr].degree=1;
}
else
{
//insert its new neighbor
Neighbor *newNeighbor;
newNeighbor=(Neighbor*)malloc(sizeof(Neighbor));
newNeighbor->neighbor_value=second;
root_addr[root_level][leaf_addr].tail->next=newNeighbor;
root_addr[root_level][leaf_addr].tail=newNeighbor;
root_addr[root_level][leaf_addr].degree++;
}
root_level=second/NUM_OF_EACH_LEVEL;
leaf_addr=second%NUM_OF_EACH_LEVEL;
if(root_addr[root_level][leaf_addr].node_value!=second)
{
root_addr[root_level][leaf_addr].node_value=second;
root_addr[root_level][leaf_addr].head=(Neighbor *)malloc(sizeof(Neighbor));
root_addr[root_level][leaf_addr].tail=(Neighbor *)malloc(sizeof(Neighbor));
root_addr[root_level][leaf_addr].head->neighbor_value=first;
root_addr[root_level][leaf_addr].head->next=NULL;
root_addr[root_level][leaf_addr].tail=root_addr[root_level][leaf_addr].head;
root_addr[root_level][leaf_addr].degree=1;
root_addr[root_level][leaf_addr].g_credit[0]=1;
}
else
{
//insert its new neighbor
Neighbor *newNeighbor;
newNeighbor=(Neighbor*)malloc(sizeof(Neighbor));
newNeighbor->neighbor_value=first;
root_addr[root_level][leaf_addr].tail->next=newNeighbor;
root_addr[root_level][leaf_addr].tail=newNeighbor;
root_addr[root_level][leaf_addr].degree++;
}
}
Some suggestions:
a) Consider converting (or pre-processing) the file into a binary format; with the aim to minimise the file size and also drastically reduce the cost of parsing. I don't know the ranges for your values, but various techniques (e.g. using one bit to tell if the number is small or large and storing the number as either a 7-bit integer or a 31-bit integer) could halve the file IO (and double the speed of reading the file from disk) and slash parsing costs down to almost nothing. Note: For maximum effect you'd modify whatever software created the file in the first place.
b) Reading the entire file into memory before you parse it is a mistake. It doubles the amount of RAM required (and the cost of allocating/freeing) and has disadvantages for CPU caches. Instead read a small amount of the file (e.g. 16 KiB) and process it, then read the next piece and process it, and so on; so that you're constantly reusing the same small buffer memory.
c) Use parallelism for file IO. It shouldn't be hard to read the next piece of the file while you're processing the previous piece of the file (either by using 2 threads or by using asynchronous IO).
d) Pre-allocate memory for the "neighbour" structures and remove most/all malloc() calls from your loop. The best possible case is to use a statically allocated array as a pool - e.g. Neighbor myPool[MAX_NEIGHBORS]; where malloc() can be replaced with &myPool[nextEntry++]; (see the sketch after this list). This reduces/removes the overhead of malloc() while also improving cache locality for the data itself.
e) Use parallelism for storing values. For example, you could have multiple threads where the first thread handles all the cases where root_level % NUM_THREADS == 0, the second thread handles all cases where root_level % NUM_THREADS == 1, etc.
With all of the above (assuming a modern 4-core CPU), I think you can get the total time (for reading and storing) down to less than 15 seconds.
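A minimal sketch of suggestion (d), a statically allocated pool in place of per-node malloc() (MAX_NEIGHBORS is an assumed constant and Neighbor is the type from the question):

#define MAX_NEIGHBORS 20000000     /* size the pool for the worst case you expect */

static Neighbor myPool[MAX_NEIGHBORS];
static size_t nextEntry = 0;

static Neighbor *alloc_neighbor(void)
{
    /* No free(): the whole pool is released when the program exits. */
    return (nextEntry < MAX_NEIGHBORS) ? &myPool[nextEntry++] : NULL;
}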
My suggestion would be to form a processing pipeline and thread it. Reading the file is an I/O bound task and parsing it is CPU bound. They can be done at the same time in parallel.
There are several possibilities. You'll have to experiment.
Exploit what your OS gives you. On Windows, check out overlapped I/O. This lets your computation proceed with parsing one buffer full of data while the Windows kernel fills another; then you switch buffers and continue. This is related to what @Neal suggested, but has less overhead for buffering: Windows deposits data directly into your buffer through the DMA channel, with no copying. On Linux, check out memory mapped files. There the OS uses the virtual memory hardware to do more or less what Windows does with overlapped I/O.
Code your own integer conversion. This is likely to be a bit faster than making a clib call per integer.
Here's example code. You want to absolutely limit the number of comparisons.
// Process one input buffer.
*end_buf = ' ';    // add a sentinel at the end of the buffer
for (char *p = buf; p < end_buf; p++) {
    // somewhat unsafe (but fast) reliance on unsigned wrapping
    unsigned val = *p - '0';
    if (val <= 9) {
        // Found start of integer.
        for (;;) {
            unsigned digit_val = *p - '0';
            if (digit_val > 9) break;
            val = 10 * val + digit_val;
            p++;
        }
        ... do something with val
    }
}
Don't call malloc once per record. You should allocate blocks of many structs at a time.
Experiment with buffer sizes.
Crank up compiler optimizations. This is the kind of code that benefits greatly from excellent code generation.
Yes, standard library conversion functions are surprisingly slow.
If portability is not a problem, I'd memory-map the file. Then, something like the following C99 code (untested) could be used to parse the entire memory map:
#include <stdlib.h>
#include <errno.h>
struct pair {
unsigned long key;
unsigned long value;
};
typedef struct {
size_t size; /* Maximum number of items */
size_t used; /* Number of items used */
struct pair item[];
} items;
/* Initial number of items to allocate for */
#ifndef ITEM_ALLOC_SIZE
#define ITEM_ALLOC_SIZE 8388608
#endif
/* Adjustment to new size (parameter is old number of items) */
#ifndef ITEM_REALLOC_SIZE
#define ITEM_REALLOC_SIZE(from) (((from) | 1048575) + 1048577)
#endif
items *parse_items(const void *const data, const size_t length)
{
const unsigned char *ptr = (const unsigned char *)data;
const unsigned char *const end = (const unsigned char *)data + length;
items *result;
size_t size = ITEM_ALLOC_SIZE;
size_t used = 0;
unsigned long val1, val2;
result = malloc(sizeof (items) + size * sizeof (struct pair));
if (!result) {
errno = ENOMEM;
return NULL;
}
while (ptr < end) {
/* Skip newlines and whitespace. */
while (ptr < end && (*ptr == '\0' || *ptr == '\t' ||
*ptr == '\n' || *ptr == '\v' ||
*ptr == '\f' || *ptr == '\r' ||
*ptr == ' '))
ptr++;
/* End of data? */
if (ptr >= end)
break;
/* Parse first number. */
if (*ptr >= '0' && *ptr <= '9')
val1 = *(ptr++) - '0';
else {
free(result);
errno = ECOMM; /* Bad data! */
return NULL;
}
while (ptr < end && *ptr >= '0' && *ptr <= '9') {
const unsigned long old = val1;
val1 = 10UL * val1 + (*(ptr++) - '0');
if (val1 < old) {
free(result);
errno = EDOM; /* Overflow! */
return NULL;
}
}
/* Skip whitespace. */
while (ptr < end && (*ptr == '\t' || *ptr == '\v' ||
*ptr == '\f' || *ptr == ' '))
ptr++;
if (ptr >= end) {
free(result);
errno = ECOMM; /* Bad data! */
return NULL;
}
/* Parse second number. */
if (*ptr >= '0' && *ptr <= '9')
val2 = *(ptr++) - '0';
else {
free(result);
errno = ECOMM; /* Bad data! */
return NULL;
}
while (ptr < end && *ptr >= '0' && *ptr <= '9') {
const unsigned long old = val2;
val2 = 10UL * val2 + (*(ptr++) - '0');
if (val2 < old) {
free(result);
errno = EDOM; /* Overflow! */
return NULL;
}
}
if (ptr < end) {
/* Error unless whitespace or newline. */
if (*ptr != '\0' && *ptr != '\t' && *ptr != '\n' &&
*ptr != '\v' && *ptr != '\f' && *ptr != '\r' &&
*ptr != ' ') {
free(result);
errno = ECOMM; /* Bad data! */
return NULL;
}
/* Skip the rest of this line. */
while (ptr < end && *ptr != '\n' && *ptr != '\r')
ptr++;
}
/* Need to grow result? */
if (used >= size) {
items *const old = result;
size = ITEM_REALLOC_SIZE(used);
result = realloc(result, sizeof (items) + size * sizeof (struct pair));
if (!result) {
free(old);
errno = ENOMEM;
return NULL;
}
}
result->item[used].key = val1;
result->item[used].value = val2;
used++;
}
/* Note: we could reallocate result here,
* if memory use is an issue.
*/
result->size = size;
result->used = used;
errno = 0;
return result;
}
I've used a similar approach to load molecular data for visualization. Such data contains floating-point values, but precision is typically only about seven significant digits, no multiprecision math needed. A custom routine to parse such data beats the standard functions by at least an order of magnitude in speed.
At least the Linux kernel is pretty good at observing memory/file access patterns; using madvise() also helps.
If you cannot use a memory map, then the parsing function would be a bit different: it would append to an existing result, and if the final line in the buffer is partial, it would indicate so (and the number of chars not parsed), so that the caller can memmove() the buffer, read more data, and continue parsing. (Use 16-byte aligned addresses for reading new data, to maximize copy speeds. You don't necessarily need to move the unread data to the exact beginning of the buffer, you see; just keep the current position in the buffered data.)
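As a usage sketch for the parser above, assuming a POSIX mmap() (untested, minimal error handling):

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int fd = open("data.txt", O_RDONLY);          /* the file name is an assumption */
    struct stat st;
    if (fd == -1 || fstat(fd, &st) == -1)
        return 1;

    void *map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        return 1;
    madvise(map, (size_t)st.st_size, MADV_SEQUENTIAL);   /* hint: sequential access */

    items *result = parse_items(map, (size_t)st.st_size);
    if (result)
        printf("Parsed %zu pairs\n", result->used);

    free(result);
    munmap(map, (size_t)st.st_size);
    close(fd);
    return 0;
}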
Questions?
First, what's your disk hardware? A single SATA drive is likely to be topped out at 100 MB/sec. And probably more like 50-70 MB/sec. If you're already moving data off the drive(s) as fast as you can, all the software tuning you do is going to be wasted.
If your hardware CAN support reading faster, then: first, your read pattern - reading the whole file into memory once - is the perfect use-case for direct IO. Open your file using open( "/file/name", O_RDONLY | O_DIRECT );. Read into page-aligned buffers (see the man page for valloc()) in page-sized chunks. Using direct IO causes your data to bypass the double buffering in the kernel page cache, which is useless when you're reading that much data that fast and not re-reading the same pages over and over.
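A rough sketch of that direct-IO read loop (Linux O_DIRECT semantics assumed; I use posix_memalign() rather than valloc(), the 1 MiB chunk size is arbitrary, and error handling is minimal):

#define _GNU_SOURCE               /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK (1 << 20)           /* read in 1 MiB, page-aligned pieces */

int read_direct(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd == -1)
        return -1;

    void *buf;
    long page = sysconf(_SC_PAGESIZE);
    if (posix_memalign(&buf, (size_t)page, CHUNK) != 0) {
        close(fd);
        return -1;
    }

    ssize_t got;
    while ((got = read(fd, buf, CHUNK)) > 0) {
        /* ... parse the `got` bytes in buf here ... */
    }

    free(buf);
    close(fd);
    return got == 0 ? 0 : -1;
}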
If you're running on a true high-performance file system, you can read asynchronously, and likely faster, with lio_listio() or aio_read(). Or you can just use multiple threads to read - and use pread() so you don't have to waste time seeking - because when you read with multiple threads, a seek on a shared open file affects all threads trying to read from it.
And do not try to read fast into a newly-malloc'd chunk of memory - memset() it first. Because truly fast disk systems can pump data into the CPU faster than the virtual memory manager can create virtual pages for a process.

Very Slow Data Processing

Consider the following code, which loads a dataset of records into a buffer and creates a Record object for each record. A record consists of one or more columns, and this information is uncovered at run-time. However, in this particular example I have set the number of columns to 3.
typedef unsigned int uint;
typedef struct
{
uint *data;
} Record;
Record *createNewRecord (short num_cols);
int main(int argc, char *argv[])
{
time_t start_time, end_time;
int num_cols = 3;
char *relation;
FILE *stream;
int offset;
char *filename = "file.txt";
stream = fopen(filename, "r");
fseek(stream, 0, SEEK_END);
long fsize = ftell(stream);
fseek(stream, 0, SEEK_SET);
if(!(relation = (char*) malloc(sizeof(char) * (fsize + 1))))
printf((char*)"Could not allocate buffer");
fread(relation, sizeof(char), fsize, stream);
relation[fsize] = '\0';
fclose(stream);
char *start_ptr = relation;
char *end_ptr = (relation + fsize);
while (start_ptr < end_ptr)
{
Record *new_record = createNewRecord(num_cols);
for(short i = 0; i < num_cols; i++)
{
sscanf(start_ptr, " %u %n",
&(new_record->data[i]), &offset);
start_ptr += offset;
}
}
return 0;
}
Record *createNewRecord (short num_cols)
{
Record *r;
if(!(r = (Record *) malloc(sizeof(Record))) ||
!(r->data = (uint *) malloc(sizeof(uint) * num_cols)))
{
printf(("Failed to create new a record\n");
}
return r;
}
This code is highly inefficient. My dataset contains around 31 million records (~1 GB), and this code processes only ~200 records per minute. The reason I load the dataset into a buffer is that I'll later have multiple threads process the records in this buffer, and I want to avoid file accesses. Moreover, I have 48 GB of RAM, so the dataset should fit in memory without a problem. Any ideas on how to speed things up?
SOLUTION: the sscanf function was actually extremely slow and inefficient. When I switched to strtoul, the job finished in less than a minute. Malloc-ing ~3 million structs of type Record took only a few seconds.
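A rough sketch of the kind of strtoul()-based loop meant here (the function below is illustrative, not the poster's actual code; it assumes the whole file is already in a NUL-terminated buffer):

#include <stdlib.h>

size_t parse_uints(const char *buf, unsigned *out, size_t max)
{
    size_t n = 0;
    const char *p = buf;
    while (n < max) {
        char *next;
        unsigned long v = strtoul(p, &next, 10);
        if (next == p)              /* no digits parsed: end of data (or bad data) */
            break;
        out[n++] = (unsigned)v;
        p = next;                   /* strtoul already skipped leading whitespace */
    }
    return n;
}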
I am confident that some lurking non-numeric data exists in the file.
int offset;
...
sscanf(start_ptr, " %u %n", &(new_record->data[i]), &offset);
start_ptr += offset;
Notice that if the file begins with non-numeric input, offset is never set, and even if it happened to hold 0, start_ptr += offset; would never advance.
If non-numeric data such as "3x" exists later in the file, offset will get the value 1 and the while loop will crawl forward, because offset never gets an updated value.
Best to check results of fread(), ftell() and sscanf() for unexpected return values and act accordingly.
Further: long fsize may be too small a type. Look into using fgetpos() and fsetpos().
Note: to save processing time, consider using strtoul() as it is certainly faster than sscanf(" %u %n"). Again - check for errant results.
BTW: if the code needs to use sscanf(), use sscanf("%u%n") instead; it is a tad faster and, for your code, has the same functionality.
I'm not an optimization professional but I think some tips should help.
First of all, I suggest you make filename and num_cols macros; literals tend to be faster, and I don't see you changing their values anywhere in the code.
Second, using a struct to store only one member is generally not recommended, but if you do want to pass one to functions, pass pointers only. Since I see you're using malloc once to allocate the struct and again for its only member, I suppose that is part of why it is slow: you're using twice the memory you need. This might not be the case with some compilers, however. Practically speaking, a struct with only one member is pointless; if you want to ensure that the integer you get is specifically a record, you can typedef it.
You should also make end_ptr and fsize const, to allow some optimization.
Now, as for functionality, have a look at memory-mapped I/O.

Best approach to continuously scan for a string in a streaming buffer

I have a situation where my function continuously receives data of various lengths. The data can be anything. I want to find the best way to hunt for a particular string in this data. The solution will require buffering previous data somehow, but I cannot wrap my head around the problem.
Here is an example of the problem:
DATA IN -> [\x00\x00\x01\x23B][][LABLABLABLABLA\x01TO][KEN][BLA\x01]...
if every [...] represents a data chunk and [] represents a data chunk with no items, what is the best way to scan for the string TOKEN?
UPDATE:
I realised the question is a bit more complex: the [] are not separators, I just use them to describe the structure of the chunks in the example above. Also, TOKEN is not a static string per se; it is of variable length. I think the best approach would be to read line by line, but then the question becomes how to read a streaming buffer of variable length into lines.
The simplest way to search for TOKEN is:
try to match "TOKEN" starting from position 0 in the stream
try to match "TOKEN" starting from position 1 in the stream
etc
So all you need to buffer is a number of bytes from the stream equal to the length of "TOKEN" (5 bytes, or actually 4 will do). At each position try to match "TOKEN", which might require waiting until you have at least 5 bytes read into your buffer. If the match fails, rewind to where you started matching, plus one. Since you never rewind more than the length of the string you're searching for (minus one) that's all the buffer you really need.
The technical issue then is, how to maintain your 5 bytes of buffered data as you read continuously from the stream. One way is a so-called "circular buffer". Another way, especially if the token is small, is to use a larger buffer and whenever you get too near the end, copy the bytes you need to the beginning and start again.
If your function is a callback, called once for each new chunk of data, then you will need to maintain some state from one call to the next to allow for a match that spans two chunks. If you're lucky then your callback API includes a "user data pointer", and you can set that to point to whatever struct you like that includes the buffer. If not, you'll need global or thread-local variables.
If the stream has a high data rate then you might want to think about speeding things up, with the KMP algorithm or otherwise.
Sorry, I voted to delete my previous answer as my understanding of the question was not correct. I didn't read carefully enough and thought that the [] were token delimiters.
For your problem I'd recommend building a small state machine based on a simple counter:
For every character you do something like the following pseudo code:
if (received_character == token[pos]) {
    ++pos;
    if (pos >= token_length) {
        token_received = 1;
    }
}
else {
    pos = 0; // Startover
}
This takes a minimum of processor cycles and also a minimum of memory, as you don't need to buffer anything except the chunk just received.
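A self-contained version of that counter idea, keeping the match position across chunks (the type and function names are illustrative, not from the answer):

#include <stddef.h>

typedef struct {
    const char *token;
    size_t token_length;
    size_t pos;                 /* how many characters have matched so far */
} token_scanner;

/* Feed one chunk; returns the number of complete matches seen in it. */
size_t scan_chunk(token_scanner *s, const char *data, size_t len)
{
    size_t matches = 0;
    for (size_t i = 0; i < len; i++) {
        if (data[i] == s->token[s->pos]) {
            if (++s->pos == s->token_length) {
                matches++;
                s->pos = 0;
            }
        } else {
            /* start over, but re-check whether this byte begins a new match;
               tokens that overlap themselves would still need KMP */
            s->pos = (data[i] == s->token[0]) ? 1 : 0;
        }
    }
    return matches;
}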
If the needle is contained within memory, it could be assumed that you can allocate an equally-sized object to read into (e.g. char input_array[needle_size];).
To start the search process, fill that object with bytes from your file (e.g. size_t sz = fread(input_array, 1, input_size, input_file);) and attempt a match (e.g. if (sz == needle_size && memcmp(input_array, needle, needle_size) == 0) { /* matched */ }.
If the match fails, or you want to continue searching after a successful match, advance the position forward by one byte (e.g. memmove(input_array, input_array + 1, input_size - 1); input_array[input_size - 1] = fgetc(input_file);) and try again.
A concern was raised in the comments that this idea copies too many bytes around. While I don't believe that concern has significant merit (as there is no evidence of a significant cost), the copies can be avoided by using a circular array: we insert new characters at pos % needle_size and compare the regions before and after that boundary as though they were the tail and head respectively. For example:
void find_match(FILE *input_file, char const *needle, size_t needle_size) {
    char input_array[needle_size];
    size_t sz = fread(input_array, 1, needle_size, input_file);
    if (sz != needle_size) {
        // No matches possible
        return;
    }
    setvbuf(input_file, NULL, _IOFBF, BUFSIZ);
    unsigned long long pos = 0;
    for (;;) {
        size_t cursor = pos % needle_size;
        int tail_compare = memcmp(input_array, needle + needle_size - cursor, cursor),
            head_compare = memcmp(input_array + cursor, needle, needle_size - cursor);
        if (head_compare == 0 && tail_compare == 0) {
            printf("Match found at offset %llu\n", pos);
        }
        int c = fgetc(input_file);
        if (c == EOF) {
            break;
        }
        input_array[cursor] = c;
        pos++;
    }
}

fwrite() - effect of size and count on performance

There seems to be a lot of confusion regarding the purpose of the two arguments 'size' and 'count' in fwrite(). I am trying to figure out which will be faster -
fwrite(source, 1, 50000, destination);
or
fwrite(source, 50000, 1, destination);
This is an important decision in my code as this command will be executed millions of times.
Now, I could just jump to testing and use the one which gives better results, but the problem is that the code is intended for MANY platforms.
So,
How can I get a definitive answer to which is better across platforms?
Will implementation logic of fwrite() vary from platform to platform?
I realize there are similar questions (What is the rationale for fread/fwrite taking size and count as arguments?, Performance of fwrite and write size) but do understand that this is a different question regarding the same issue. The answers in similar questions do not suffice in this case.
The performance should not depend on either way, because anyone implementing fwrite would multiply size and count to determine how much I/O to do.
This is exemplified by FreeBSD's libc implementation of fwrite.c, which in its entirety reads (include directives elided):
/*
* Write `count' objects (each size `size') from memory to the given file.
* Return the number of whole objects written.
*/
size_t
fwrite(buf, size, count, fp)
const void * __restrict buf;
size_t size, count;
FILE * __restrict fp;
{
size_t n;
struct __suio uio;
struct __siov iov;
/*
* ANSI and SUSv2 require a return value of 0 if size or count are 0.
*/
if ((count == 0) || (size == 0))
return (0);
/*
* Check for integer overflow. As an optimization, first check that
* at least one of {count, size} is at least 2^16, since if both
* values are less than that, their product can't possible overflow
* (size_t is always at least 32 bits on FreeBSD).
*/
if (((count | size) > 0xFFFF) &&
(count > SIZE_MAX / size)) {
errno = EINVAL;
fp->_flags |= __SERR;
return (0);
}
n = count * size;
iov.iov_base = (void *)buf;
uio.uio_resid = iov.iov_len = n;
uio.uio_iov = &iov;
uio.uio_iovcnt = 1;
FLOCKFILE(fp);
ORIENT(fp, -1);
/*
* The usual case is success (__sfvwrite returns 0);
* skip the divide if this happens, since divides are
* generally slow and since this occurs whenever size==0.
*/
if (__sfvwrite(fp, &uio) != 0)
count = (n - uio.uio_resid) / size;
FUNLOCKFILE(fp);
return (count);
}
The purpose of the two arguments becomes clearer if you consider the return value, which is the count of objects successfully written to (or read from) the stream:
fwrite(src, 1, 50000, dst); // will return 50000
fwrite(src, 50000, 1, dst); // will return 1
The speed might be implementation dependent, although I don't expect any considerable difference.
I'd like to point you to my question, which ended up exposing an interesting performance difference between calling fwrite once and calling fwrite multiple times to write a file "in chunks".
My problem was that there's a bug in Microsoft's implementation of fwrite so files larger than 4GB cannot be written in one call (it hangs at fwrite). So I had to work around this by writing the file in chunks, calling fwrite in a loop until the data was completely written. I found that this latter method always returns faster than the single fwrite call.
I'm in Windows 7 x64 with 32 GB of RAM, which makes write caching pretty aggressive.
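For reference, here is a sketch of that chunked-write workaround (the 32 MiB chunk size is my own arbitrary pick, not from the linked question):

#include <stdio.h>

size_t fwrite_chunked(const char *data, size_t total, FILE *fp)
{
    const size_t chunk = 32u * 1024 * 1024;
    size_t written = 0;
    while (written < total) {
        size_t n = total - written;
        if (n > chunk)
            n = chunk;
        size_t got = fwrite(data + written, 1, n, fp);
        written += got;
        if (got != n)               /* short write: stop and report what made it out */
            break;
    }
    return written;
}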
