building large strings of unknown length in c

I've no doubt there is an answer to this somewhere, I just can't find it.
I have just returned to C after a long break and am very rusty, so please excuse dumb errors. I need to generate a large string (maybe the equivalent of 10 MB). I don't know how long it's going to be until it's built.
I tried the following two approaches to test speed:
int main() {
#if 1
size_t message_len = 1; /* + 1 for terminating NULL */
char *buffer = (char*) malloc(message_len);
for (int i = 0; i < 200000; i++)
{
int size = snprintf(NULL, 0, "%d \n", i);
char * a = malloc(size + 1);
sprintf(a, "%d \n", i);
message_len += 1 + strlen(a); /* 1 + for separator ';' */
buffer = (char*) realloc(buffer, message_len);
strncat(buffer, a, message_len);
}
#else
FILE *f = fopen("test", "w");
if (f == NULL) return -1;
for (int i = 0; i < 200000; i++)
{
fprintf(f, "%d \n", i);
}
fclose(f);
FILE *fp = fopen("test", "r");
fseek(fp, 0, SEEK_END);
long fsize = ftell(f);
fseek(fp, 0, SEEK_SET);
char *buffer = malloc(fsize + 1);
fread(buffer, fsize, 1, f);
fclose(fp);
buffer[fsize] = 0;
#endif
char substr[56];
memcpy(substr, buffer, 56);
printf("%s", substr);
return 1;
}
The first solution, concatenating strings each time, took 3.8 s; the second, writing to a file and then reading it back, took 0.02 s.
Surely there is a fast way to build a big string in C without resorting to reading and writing a file? Am I just doing something very inefficient? If not, can I write to some kind of file object, then read it at the end and never save it?
In C# you would use a StringBuilder to avoid the slow concatenation; what's the equivalent in C?
Thanks in advance.

You are making life pretty rough with these lines:
for (int i = 0; i < 200000; i++)
{
int size = snprintf(NULL, 0, "%d \n", i); // formats the number once just to measure it
char * a = malloc(size + 1); // a new allocation on every iteration, never freed
sprintf(a, "%d \n", i); // formats the same number a second time
message_len += 1 + strlen(a); /* 1 + for separator ';' */
buffer = (char*) realloc(buffer, message_len); // reallocates on every iteration
strncat(buffer, a, message_len); // rescans buffer from the start to find its end
}
You compute size and allocate space for a in every iteration but never free it, which creates a giant memory leak; to do this correctly you would have to free a at the end of every loop. You also format each number twice, call realloc on every pass, and make strncat search buffer from the beginning for the terminating NUL each time, so the work per append grows with the length of the string. (buffer[0] is also never set to '\0' before the first strncat, so the first search reads uninitialized memory.)
The solution, in C, is to pre-allocate plenty of memory and only reallocate in an emergency. If you know "roughly" how big your string will be, allocate all that memory at once; keep track of how big it is, and add more if you run short. At the end you can always "give back what you didn't use". Too many calls to realloc keep moving memory around (since you often don't have enough contiguous memory available where you were). As @Matt clarified in his comment: there is a real risk that every call to realloc moves the entire block of memory, and as the block gets bigger, the total copying cost grows quadratically. Here is a possible better solution (complete, tested with small N and BLOCK just to show the principle; you will want to use a large N (your value of 200000) and a larger BLOCK, and get rid of the printf statements that were there to show things are working):
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#define N 2000000
#define BLOCK 32
int main(void) {
size_t message_len = BLOCK; //
char *buffer = (char*) malloc(message_len);
int bb;
int i, n=0;
char* a = buffer;
clock_t start, stop;
for(bb = 1; bb < 128; bb *= 2) {
int rCount = 0;
start = clock();
for (i = 0; i < N; i++)
{
a = buffer + n;
n += sprintf(a, "%d \n", i);
if ((message_len - n) < BLOCK*bb) {
rCount++;
message_len += BLOCK*bb;
//printf("increasing buffer\n");
//printf("increased buffer to %ld\n", (long int)message_len);
buffer = realloc(buffer, message_len);
}
}
stop = clock();
printf("\nat the end, buffer length is %zu; rCount = %d\n", strlen(buffer), rCount);
// buffer = realloc(buffer, strlen(buffer) + 1);
//printf("buffer is now: \n%s\n", buffer);
printf("time taken with blocksize = %d: %.1f ms\n", BLOCK*bb, (stop - start) * 1000.0 / CLOCKS_PER_SEC);
}
}
You will want to use a fairly large value for BLOCK - this will limit the number of calls to realloc. I would use something like 100000; you get rid of the space at the end anyway.
EDIT: I modified the code I had posted to allow timing of the loop, increasing N to 2 million to get "reasonable times". I also minimized the initial memory allocation (to force a lot of calls to realloc) and fixed a bug: when realloc had to move memory, a was no longer pointing to an offset in buffer. That is fixed now by keeping track of the string length so far in n.
This is pretty fast - 450 ms for the smallest block, dropping to 350 ms for larger blocks (2 million numbers). That is comparable (within the resolution of my measurement) to your file read/write operation. But yes - file I/O streaming and associated memory management are highly optimized...

I have left out some details, but my approach is generally like this
create a structure like this one
typedef struct {
char *curr ;
char *start ;
char *end ;
} VBUF ;
write some functions along these lines:
void vbuf_alloc(VBUF *v,int n)
{
v->start = malloc(n) ;
v->end = v->start + n ;
v->curr = v->start ;
}
int vbuf_add(VBUF *v,char *s,int length)
{
if (v->end - v->curr < length) {
vbuf_realloc(v,(v->end - v->start) * 2) ;
}
memcpy(v->curr,s,length) ;
v->curr += length ;
return length ;
}
int vbuf_adds(VBUF *v,char *s)
{
return vbuf_add(v,s,strlen(s)) ;
}
You can extend this suite of functions as much as you like.
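The functions above call vbuf_realloc without showing it. Here is a minimal sketch of how the whole suite might fit together, assuming that vbuf_realloc keeps the contents and the write offset, and that vbuf_add keeps doubling until the new data fits. The size_t parameters, the const qualifiers and the int return codes are assumptions added for this sketch, not part of the outline above.

```c
#include <stdlib.h>
#include <string.h>

/* Growable byte buffer: start/end delimit the allocation, curr is the write position. */
typedef struct {
    char *curr;
    char *start;
    char *end;
} VBUF;

/* Allocate n bytes (at least 1); returns 0 on success, -1 on failure. */
int vbuf_alloc(VBUF *v, size_t n)
{
    if (n == 0) n = 1;
    v->start = malloc(n);
    if (v->start == NULL) return -1;
    v->end = v->start + n;
    v->curr = v->start;
    return 0;
}

/* Assumed helper: resize to n bytes, preserving the contents and the write offset. */
int vbuf_realloc(VBUF *v, size_t n)
{
    size_t used = (size_t)(v->curr - v->start);
    char *p = realloc(v->start, n);
    if (p == NULL) return -1;      /* the old block is still valid */
    v->start = p;
    v->curr = p + used;            /* re-derive the pointers: the block may have moved */
    v->end = p + n;
    return 0;
}

/* Append length bytes, doubling the capacity until they fit. */
int vbuf_add(VBUF *v, const char *s, size_t length)
{
    size_t cap = (size_t)(v->end - v->start);
    size_t used = (size_t)(v->curr - v->start);
    if (cap - used < length) {
        size_t newcap = cap;
        while (newcap - used < length) newcap *= 2;
        if (vbuf_realloc(v, newcap) != 0) return -1;
    }
    memcpy(v->curr, s, length);
    v->curr += length;
    return 0;
}

/* Append a NUL-terminated string (without its terminator). */
int vbuf_adds(VBUF *v, const char *s)
{
    return vbuf_add(v, s, strlen(s));
}
```

To get a C string at the end, append one terminating byte with vbuf_add(&v, "", 1) and read it from v.start; free(v.start) releases the buffer.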

C has no objects, so there's no direct equivalent to the C# StringBuilder (though in C++ you would use std::string).
You would get a performance boost by not calling realloc on every append, and never calling malloc the way you are.
You can avoid your malloc completely by simply declaring a char[] large enough to print the largest int into; this would avoid the snprintf too, and the size is fairly small.
Instead of constantly calling realloc, you should grow your buffer by some reasonable size, say 4 KB (a nice size to correspond with page size), and only grow it again when it comes close to being exhausted (that is, when the space remaining is smaller than the char[] scratch array described above).
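As a rough sketch of that advice, the function below formats each number into a small fixed scratch array and grows the output by a fixed 4 KiB step only when the remaining space cannot hold the next piece. The names build_numbers and GROW are made up for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define GROW 4096   /* growth step: 4 KiB, as suggested above (illustrative name) */

/* Build "0 \n1 \n..." for count numbers. Returns a malloc'd NUL-terminated
 * string (caller frees) or NULL on failure; *out_len receives its length. */
char *build_numbers(int count, size_t *out_len)
{
    char tmp[32];                 /* large enough for any int plus " \n" and NUL */
    size_t cap = GROW, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) return NULL;

    for (int i = 0; i < count; i++) {
        int n = snprintf(tmp, sizeof tmp, "%d \n", i);
        if (n < 0 || (size_t)n >= sizeof tmp) { free(buf); return NULL; }
        if (cap - len < (size_t)n + 1) {          /* no room for text + NUL */
            char *p = realloc(buf, cap + GROW);
            if (p == NULL) { free(buf); return NULL; }
            buf = p;
            cap += GROW;
        }
        memcpy(buf + len, tmp, (size_t)n + 1);    /* copy including the terminator */
        len += (size_t)n;
    }
    buf[len] = '\0';              /* also covers count == 0 */
    if (out_len) *out_len = len;
    return buf;
}
```

With a fixed step the number of realloc calls still grows linearly with the output size; doubling the capacity instead keeps the number of calls logarithmic.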

I suggest instead of realloc on every successive string, attempt to intelligently realloc ahead of time if the length is too short. In other words, avoid realloc whenever possible.
A naive implementation in pseudocode might be something like
Initialize an int/long to "written so far"
Initialize an int/long to remember "buffer size"
Alloc memory for a string up to the "buffer size"
Read in the next chunk into a temporary buffer
Get the "chunk size" from the temporary buffer
If "written so far" + "chunk size" > "buffer size"
Reallocate the buffer to be much bigger (double "buffer size"?)
Set the new "buffer size"
Copy the data from the temporary buffer to "buffer address" + "written so far"
Set "written so far" to "written so far" + "chunk size"
I just threw this together, so there may be indexing errors, but you get the idea: only allocate and copy when you have to, instead of every time through the loop.
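The pseudocode could be turned into C roughly as follows. This is only a sketch: StrBuf and strbuf_append are hypothetical names, the starting size of 64 is arbitrary, and one extra byte is reserved so the result stays NUL-terminated.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative names: a buffer plus the two counters from the pseudocode. */
typedef struct {
    char  *data;      /* "buffer address" */
    size_t written;   /* "written so far" */
    size_t size;      /* "buffer size" */
} StrBuf;

/* Append chunk_size bytes and keep data NUL-terminated.
 * Returns 0 on success, -1 if memory could not be obtained. */
int strbuf_append(StrBuf *b, const char *chunk, size_t chunk_size)
{
    size_t needed = b->written + chunk_size + 1;    /* + 1 for the terminator */
    if (needed > b->size) {
        size_t new_size = b->size ? b->size : 64;
        while (new_size < needed) new_size *= 2;    /* double until it fits */
        char *p = realloc(b->data, new_size);       /* realloc(NULL, n) acts like malloc */
        if (p == NULL) return -1;
        b->data = p;
        b->size = new_size;
    }
    memcpy(b->data + b->written, chunk, chunk_size);
    b->written += chunk_size;
    b->data[b->written] = '\0';
    return 0;
}
```

Start from StrBuf b = {0}; and call free(b.data) when finished. Because the capacity doubles, appending N bytes in total triggers only about log2(N) reallocations.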

Related

Stray characters seen at output of snprintf

I have a string-creating function in C which accepts an array of structs as its argument and outputs a string based on a predefined format (like a list of lists in Python).
Here's the function
typedef struct
{
PacketInfo_t PacketInfo;
char Gnss60[1900];
//and other stuff...
} Track_json_t;
typedef struct
{
double latitude;
double longitude;
} GPSPoint_t;
typedef struct
{
UInt16 GPS_StatusCode;
UInt32 fixtime;
GPSPoint_t point;
double altitude;
unsigned char GPS_Satilite_Num;
} GPS_periodic_t;
unsigned short SendTrack()
{
Track_json_t i_sTrack_S;
memset(&i_sTrack_S, 0x00, sizeof(Track_json_t));
getEvent_Track(&i_sTrack_S);
//Many other stuff added to the i_sTrack_S struct...
//Make a JSON format out of it
BuildTrackPacket_json(&i_sTrack_S, XPORT_MODE_GPRS);
}
Track_json_t *getEvent_Track(Track_json_t *trk)
{
GPS_periodic_t l_gps_60Sec[60];
memset(&l_gps_60Sec, 0x00,
sizeof(GPS_periodic_t) * GPS_PERIODIC_ARRAY_SIZE);
getLastMinGPSdata(l_gps_60Sec, o_gps_base);
get_gps60secString(l_gps_60Sec, trk->Gnss60);
return trk;
}
void get_gps60secString(GPS_periodic_t input[60], char *output)
{
int i = 0;
memcpy(output, "[", 1); ///< Copy the first char as [
char temp[31];
for (i = 0; i < 59; i++) { //Run for n-1 elements
memset(temp, 0, sizeof(temp));
snprintf(temp, sizeof(temp), "[%0.8f,%0.8f],",
input[i].point.latitude, input[i].point.longitude);
strncat(output, temp, sizeof(temp));
}
memset(temp, 0, sizeof(temp)); //assign last element
snprintf(temp, sizeof(temp), "[%0.8f,%0.8f]]",
input[i].point.latitude, input[i].point.longitude);
strncat(output, temp, sizeof(temp));
}
So the output of the function must be a string of format
[[12.12345678,12.12345678],[12.12345678,12.12345678],...]
But at times I get a string which looks like
[[12.12345678,12.12345678],[55.01[12.12345678,12.12345678],...]
[[21.28211567,84.13454083],[21.28211533,21.22[21.28211517,84.13454000],..]
Previously, I had a buffer overflow at the function get_gps60secString, I fixed that by using snprintf and strncat.
Note: This is an embedded application and this error occurs once or twice a day (out of 1440 packets).
Question
1. Could this be caused by an interrupt during the snprintf/strncat process?
2. Could this be caused by a memory leak, overwriting the stack, or some other segmentation issue caused elsewhere?
Basically I would like to understand what might be causing a corrupt string.
Having a hard time finding the cause and fixing this bug.
EDIT:
I used chux's function. Below is the Minimal, Complete, and Verifiable Example
/*
* Test code for SO question https://stackoverflow.com/questions/5216413
* A Minimal, Complete, and Verifiable Example
*/
#include <stdio.h>
#include <stdlib.h>
#include <memory.h>
#include <stdbool.h>
#include <signal.h>
#include <unistd.h>
typedef unsigned short UInt16;
typedef unsigned long UInt32;
#define GPS_PERIODIC_ARRAY_SIZE 60
#define GPS_STRING_SIZE 1900
/* ---------------------- Data Structs --------------------------*/
typedef struct
{
char Gnss60[GPS_STRING_SIZE];
} Track_json_t;
typedef struct
{
double latitude;
double longitude;
} GPSPoint_t;
typedef struct
{
UInt16 GPS_StatusCode;
UInt32 fixtime;
GPSPoint_t point;
double altitude;
unsigned char GPS_Satilite_Num;
} GPS_periodic_t;
/* ----------------------- Global --------------------------------*/
FILE *fptr; //Global file pointer
int res = 0;
int g_last = 0;
GPS_periodic_t l_gps_60Sec[GPS_PERIODIC_ARRAY_SIZE];
/* ----------------------- Function defs --------------------------*/
/* At signal interrupt this function is called.
* Flush and close the file. And safely exit the program */
void userSignalInterrupt()
{
fflush(fptr);
fclose(fptr);
res = 1;
exit(0);
}
/* @brief From the array of GPS structs we create a string of the format
* [[lat,long],[lat,long],..]
* @param input The input array of GPS structs
* @param output The output string which will contain lat, long
* @param sz Size left in the output buffer
* @return 0 Successfully completed operation
* 1 Failed / Error
*/
int get_gps60secString(GPS_periodic_t input[GPS_PERIODIC_ARRAY_SIZE],
char *output, size_t sz)
{
int cnt = snprintf(output, sz, "[");
if (cnt < 0 || cnt >= sz)
return 1;
output += cnt;
sz -= cnt;
int i = 0;
for (i = 0; i < GPS_PERIODIC_ARRAY_SIZE; i++) {
cnt = snprintf(output, sz, "[%0.8f,%0.8f]%s",
input[i].point.latitude, input[i].point.longitude,
i + 1 == GPS_PERIODIC_ARRAY_SIZE ? "" : ",");
if (cnt < 0 || cnt >= sz)
return 1;
output += cnt;
sz -= cnt;
}
cnt = snprintf(output, sz, "]");
if (cnt < 0 || cnt >= sz)
return 1;
return 0; // no error
}
/* @brief Create a GPS struct with data for testing. It will populate the
* point field of GPS_periodic_t. Lat starts from 0.0 and increases by 1*10^(-8)
* and Long will start at 99.99999999 and decrease by 1*10^(-8)
*
* @param o_gps_60sec Output array of GPS structs
*/
void getLastMinGPSdata(GPS_periodic_t *o_gps_60sec)
{
//Fill in GPS related data here
int i = 0;
double latitude = o_gps_60sec[0].point.latitude;
double longitude = o_gps_60sec[0].point.longitude;
for (i = 0; i < 60; i++)
{
o_gps_60sec[i].point.latitude = latitude + (0.00000001 * (float)g_last +
0.00000001 * (float)i);
o_gps_60sec[i].point.longitude = longitude - (0.00000001 * (float)g_last +
0.00000001 * (float)i);
}
g_last = 60;
}
/* @brief Get the GPS data and convert it into a string
* @param trk Track structure with GPS string
*/
int getEvent_Track(Track_json_t *trk)
{
getLastMinGPSdata(l_gps_60Sec);
get_gps60secString(l_gps_60Sec, trk->Gnss60, GPS_STRING_SIZE);
return 0;
}
int main()
{
fptr = fopen("gpsAno.txt", "a");
if (fptr == NULL) {
printf("Error!!\n");
exit(1);
}
//Quit at signal interrupt
signal(SIGINT, userSignalInterrupt);
Track_json_t trk;
memset(&l_gps_60Sec, 0x00, sizeof(GPS_periodic_t) * GPS_PERIODIC_ARRAY_SIZE);
//Init Points to be zero and 99.99999999
int i = 0;
for (i = 0; i < 60; i++) {
l_gps_60Sec[i].point.latitude = 00.00000000;
l_gps_60Sec[i].point.longitude = 99.99999999;
}
do {
memset(&trk, 0, sizeof(Track_json_t));
getEvent_Track(&trk);
//Write to file
fprintf(fptr, "%s", trk.Gnss60);
fflush(fptr);
sleep(1);
} while (res == 0);
//close and exit
fclose(fptr);
return 0;
}
Note: The error was not reproduced with the above code,
because it doesn't have the strcat pitfalls.
I tested this function in the embedded application.
Through this I was able to find that snprintf returned an error and the string created ended up as:
[17.42401750,78.46098717],[17.42402083,53.62
It ended there (because of the return 1).
Does this mean that the data passed to snprintf was corrupted? It's a floating-point value. How can it get corrupted?
Solution
The error has not been seen since I replaced the sprintf function with one that doesn't directly deal with 64 bits of data.
Here's the function modp_dtoa2
/** \brief convert a floating point number to char buffer with a
* variable-precision format, and no trailing zeros
*
* This is similar to "%.[0-9]f" in the printf style, except it will
* NOT include trailing zeros after the decimal point. This type
* of format oddly does not exists with printf.
*
* If the input value is greater than 1<<31, then the output format
* will be switched exponential format.
*
* \param[in] value
* \param[out] buf The allocated output buffer. Should be 32 chars or more.
* \param[in] precision Number of digits to the right of the decimal point.
* Can only be 0-9.
*/
void modp_dtoa2(double value, char* str, int prec)
{
/* if input is larger than thres_max, revert to exponential */
const double thres_max = (double)(0x7FFFFFFF);
int count;
double diff = 0.0;
char* wstr = str;
int neg= 0;
int whole;
double tmp;
uint32_t frac;
/* Hacky test for NaN
* under -fast-math this won't work, but then you also won't
* have correct nan values anyways. The alternative is
* to link with libmath (bad) or hack IEEE double bits (bad)
*/
if (! (value == value)) {
str[0] = 'n'; str[1] = 'a'; str[2] = 'n'; str[3] = '\0';
return;
}
if (prec < 0) {
prec = 0;
} else if (prec > 9) {
/* precision of >= 10 can lead to overflow errors */
prec = 9;
}
/* we'll work in positive values and deal with the
negative sign issue later */
if (value < 0) {
neg = 1;
value = -value;
}
whole = (int) value;
tmp = (value - whole) * pow10[prec];
frac = (uint32_t)(tmp);
diff = tmp - frac;
if (diff > 0.5) {
++frac;
/* handle rollover, e.g. case 0.99 with prec 1 is 1.0 */
if (frac >= pow10[prec]) {
frac = 0;
++whole;
}
} else if (diff == 0.5 && ((frac == 0) || (frac & 1))) {
/* if halfway, round up if odd, OR
if last digit is 0. That last part is strange */
++frac;
}
/* for very large numbers switch back to native sprintf for exponentials.
anyone want to write code to replace this? */
/*
normal printf behavior is to print EVERY whole number digit
which can be 100s of characters overflowing your buffers == bad
*/
if (value > thres_max) {
sprintf(str, "%e", neg ? -value : value);
return;
}
if (prec == 0) {
diff = value - whole;
if (diff > 0.5) {
/* greater than 0.5, round up, e.g. 1.6 -> 2 */
++whole;
} else if (diff == 0.5 && (whole & 1)) {
/* exactly 0.5 and ODD, then round up */
/* 1.5 -> 2, but 2.5 -> 2 */
++whole;
}
//vvvvvvvvvvvvvvvvvvv Diff from modp_dto2
} else if (frac) {
count = prec;
// now do fractional part, as an unsigned number
// we know it is not 0 but we can have leading zeros, these
// should be removed
while (!(frac % 10)) {
--count;
frac /= 10;
}
//^^^^^^^^^^^^^^^^^^^ Diff from modp_dto2
// now do fractional part, as an unsigned number
do {
--count;
*wstr++ = (char)(48 + (frac % 10));
} while (frac /= 10);
// add extra 0s
while (count-- > 0) *wstr++ = '0';
// add decimal
*wstr++ = '.';
}
// do whole part
// Take care of sign
// Conversion. Number is reversed.
do *wstr++ = (char)(48 + (whole % 10)); while (whole /= 10);
if (neg) {
*wstr++ = '-';
}
*wstr='\0';
strreverse(str, wstr-1);
}

Here's (part of) my unabashedly opinionated guide on safe string handling in C. Normally, I would promote dynamic memory allocation instead of fixed-length strings, but in this case I'm assuming that in the embedded environment that might be problematic. (Although assumptions like that should always be checked.)
So, first things first:
Any function which creates a string in a buffer must be told explicitly how long the buffer is. This is non-negotiable.
As should be obvious, it's impossible for a function filling a buffer to check for buffer overflow unless it knows where the buffer ends. "Hope that the buffer is long enough" is not a viable strategy. "Document the needed buffer length" would be fine if everyone carefully read the documentation (they don't) and if the required length never changes (it will). The only thing that's left is an extra argument, which should be of type size_t (because that's the type of buffer lengths in the C library functions which require lengths).
Forget that strncpy and strncat exist. Also forget about strcat. They are not your friends.
strncpy is designed for a specific use case: ensuring that an entire fixed-length buffer is initialised. It is not designed for normal strings, and since it doesn't guarantee that the output is NUL-terminated, it doesn't produce a string.
If you're going to NUL-terminate yourself anyway, you might as well use memmove, or memcpy if you know that the source and destination don't overlap, which should almost always be the case. Since you'll want the memmove to stop at the end of the string for short strings (which strncpy does not do), measure the string length first with strnlen: strnlen takes a maximum length, which is precisely what you want in the case that you are going to move a maximum number of characters.
Sample code:
/* Safely copy src to dst where dst has capacity dstlen. */
if (dstlen) {
/* to_move will have maximum value dstlen - 1 */
size_t to_move = strnlen(src, dstlen - 1);
/* copy the characters */
memmove(dst, src, to_move);
/* NUL-terminate the string */
dst[to_move] = 0;
}
strncat has slightly more sensible semantics, but it's practically never useful because in order to use it, you already have to know how many bytes you could copy. In order to know that, in practice, you need to know how much space is left in your output buffer, and to know that you need to know where in the output buffer the copy will start [Note 1]. But if you already know where the copy will start, what's the point of searching through the buffer from the beginning to find the copy point? And if you do let strncat do the search, how sure are you that your previously computed start point is correct?
In the above code snippet, we already computed the length of the copy. We can extend that to do an append without rescanning:
/* Safely copy src1 and then src2 to dst where dst has capacity dstlen. */
/* Assumes that src1 and src2 are not contained in dst. */
if (dstlen) {
/* to_move will have maximum value dstlen - 1 */
size_t to_move = strnlen(src1, dstlen - 1);
/* Copy the characters from src1 */
memcpy(dst, src1, to_move);
/* Adjust the output pointer and length */
dst += to_move;
dstlen -= to_move;
/* Now safely copy src2 to just after src1. */
to_move = strnlen(src2, dstlen - 1);
memcpy(dst, src2, to_move);
/* NUL-terminate the string */
dst[to_move] = 0;
}
It might be that we want the original values of dst and dstlen after creating the string, and it might also be that we want to know how many bytes we inserted into dst in all. In that case, we would probably want to make copies of those variables before doing the copies, and save the cumulative sum of moves.
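One way to package that is a small function that works on copies of dst and dstlen and returns the cumulative count. This is a sketch only: copy2 is a hypothetical name, and strnlen is a POSIX function, so the feature-test macro at the top is an assumption about the build environment.

```c
#define _POSIX_C_SOURCE 200809L   /* for strnlen on strict compilers */
#include <string.h>

/* Hypothetical helper: copy src1 and then src2 into dst (capacity dstlen),
 * truncating if necessary. Returns the number of characters stored, not
 * counting the NUL. Assumes src1 and src2 are not contained in dst. */
size_t copy2(char *dst, size_t dstlen, const char *src1, const char *src2)
{
    if (dstlen == 0) return 0;

    char *out = dst;          /* working copies: dst and dstlen stay intact */
    size_t avail = dstlen;
    size_t total = 0;         /* cumulative sum of the moves */

    size_t to_move = strnlen(src1, avail - 1);
    memcpy(out, src1, to_move);
    out += to_move;
    avail -= to_move;
    total += to_move;

    to_move = strnlen(src2, avail - 1);
    memcpy(out, src2, to_move);
    total += to_move;

    dst[total] = '\0';        /* NUL-terminate the string */
    return total;
}
```

The caller still has the original dst and dstlen, and can compare the return value with strlen(src1) + strlen(src2) if it needs to detect truncation.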
The above assumes that we're starting with an empty output buffer, but perhaps that isn't the case. Since we still need to know where the copy will start in order to know how many characters we can put at the end, we can still use memcpy; we just need to scan the output buffer first to find the copy point. (Only do this if there is no alternative. Doing it in a loop instead of recording the next copy point is Shlemiel the Painter's algorithm.)
/* Safely append src to dst where dst has capacity dstlen and starts
* with a string of unknown length.
*/
if (dstlen) {
/* The following code will "work" even if the existing string
* is not correctly NUL-terminated; the code will not copy anything
* from src, but it will put a NUL terminator at the end of the
* output buffer.
*/
/* Figure out where the existing string ends. */
size_t prefixlen = strnlen(dst, dstlen - 1);
/* Update dst and dstlen */
dst += prefixlen;
dstlen -= prefixlen;
/* Proceed with the append, as above. */
size_t to_move = strnlen(src, dstlen - 1);
memmove(dst, src, to_move);
dst[to_move] = 0;
}
Embrace snprintf. It really is your friend. But always check its return value.
Using memmove, as above, is slightly awkward. It requires you to manually check that the buffer's length is not zero (otherwise subtracting one would be disastrous since the length is unsigned), and it requires you to manually NUL-terminate the output buffer, which is easy to forget and the source of many bugs. It is very efficient, but sometimes it's worth sacrificing a little efficiency so that your code is easier to write and easier to read and verify.
And that leads us directly to snprintf. For example, you can replace:
if (dstlen) {
size_t to_move = strnlen(src, dstlen - 1);
memcpy(dst, src, to_move);
dst[to_move] = 0;
}
with the much simpler
int copylen = snprintf(dst, dstlen, "%s", src);
That does everything: checks that dstlen is not 0; only copies the characters from src which can fit in dst, and correctly NUL-terminates dst (unless dstlen was 0). And the cost is minimal; it takes very little time to parse the format string "%s" and most implementations are pretty well optimised for this case. [Note 2]
But snprintf is not a panacea. There are still a couple of really important warnings.
First, the documentation for snprintf makes clear that it is not permitted for any input argument to overlap the output range. (So it replaces memcpy but not memmove.) Remember that overlap includes NUL-terminators, so the following code which attempts to double the string in str instead leads to Undefined Behaviour:
char str[BUFLEN];
/* Put something into str */
get_some_data(str, BUFLEN);
/* DO NOT DO THIS: input overlaps output */
int result = snprintf(str, BUFLEN, "%s%s", str, str);
/* DO NOT DO THIS EITHER; IT IS STILL UB */
size_t len = strnlen(str, BUFLEN - 1);
result = snprintf(str + len, BUFLEN - len, "%s", str);
The problem with the second invocation of snprintf is that the NUL which terminates str is precisely at str + len, the first byte of the output buffer. That's an overlap, so it's illegal.
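For comparison, here is a sketch of one way to double a string in place without overlap, by copying only the characters with memcpy and writing the new terminator by hand. double_string is a hypothetical helper, and strnlen again assumes a POSIX environment.

```c
#define _POSIX_C_SOURCE 200809L   /* for strnlen on strict compilers */
#include <string.h>

/* Hypothetical helper: turn "abc" into "abcabc" inside str (capacity cap).
 * Returns 0 on success, -1 if the doubled string would not fit. */
int double_string(char *str, size_t cap)
{
    if (cap == 0) return -1;
    size_t len = strnlen(str, cap);
    if (len == cap) return -1;             /* no NUL found within cap */
    if (len > (cap - 1) / 2) return -1;    /* needs 2 * len + 1 bytes */

    /* Source [str, str + len) and destination [str + len, str + 2 * len)
     * are disjoint because the old terminator is not part of the copy. */
    memcpy(str + len, str, len);
    str[2 * len] = '\0';
    return 0;
}
```

The explicit length makes the two ranges provably separate, which is exactly what the "%s" version cannot guarantee.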
The second important note about snprintf is that it returns a value, which must not be ignored. The value returned is not the length of the string created by snprintf. It's the length the string would have been had it not been truncated to fit in the output buffer.
If no truncation occurred, then the result is the length of the result, which must be strictly less than the size of the output buffer (because there must be room for a NUL terminator, which is not considered part of the length of the result.) You can use this fact to check whether truncation occurred:
if (result >= dstlen) /* Output was truncated */
This can be used, for example, to redo the snprintf with a larger, dynamically-allocated buffer (of size result + 1; never forget the need to NUL-terminate).
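A minimal sketch of that retry pattern follows. It uses vsnprintf, the va_list counterpart of snprintf with the same return-value rules; format_alloc and the 64-byte first attempt are assumptions made for the illustration.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: format into a small stack buffer first, then produce a
 * heap copy of exactly result + 1 bytes, formatting again only if the first
 * attempt was truncated. Returns NULL on error; the caller frees the result. */
char *format_alloc(const char *fmt, ...)
{
    char small[64];
    va_list ap;

    va_start(ap, fmt);
    int result = vsnprintf(small, sizeof small, fmt, ap);
    va_end(ap);
    if (result < 0) return NULL;                 /* encoding or other error */

    char *out = malloc((size_t)result + 1);      /* + 1 for the NUL terminator */
    if (out == NULL) return NULL;

    if ((size_t)result < sizeof small) {
        memcpy(out, small, (size_t)result + 1);  /* no truncation: reuse first pass */
    } else {
        va_start(ap, fmt);                       /* truncated: format again */
        int again = vsnprintf(out, (size_t)result + 1, fmt, ap);
        va_end(ap);
        if (again != result) { free(out); return NULL; }
    }
    return out;
}
```

For example, format_alloc("[%0.8f,%0.8f]", lat, lon) returns a right-sized string regardless of how many digits the values need.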
But remember that the result is an int -- that is, a signed value. That means that snprintf cannot cope with very long strings. That's not likely to be an issue in embedded code, but on systems where it's conceivable that strings exceed 2GB, you may not be able to safely use %s formats in snprintf. It also means that snprintf is allowed to return a negative value to indicate an error. Very old implementations of snprintf returned -1 to indicate truncation, or in response to being called with buffer length 0. That's not standard behaviour according to C99 (nor recent versions of Posix), but you should be prepared for it.
Standard-compliant implementations of snprintf will return a negative value if the buffer length argument is too big to fit in a (signed) int; it's not clear to me what the expected return value is if the buffer length is OK but the untruncated length is too big for an int. A negative value will also be returned if you used a conversion which resulted in an encoding error; for example, a %lc conversion whose corresponding argument contains an integer which cannot be converted to a multibyte (typically UTF-8) sequence.
In short, you should always check the return value of snprintf (recent gcc/glibc versions will produce a warning if you do not), and you should be prepared for it to be negative.
So, with all that behind us, let's write a function which produces a string of co-ordinate pairs:
/* Arguments:
* buf the output buffer.
* buflen the capacity of buf (including room for trailing NUL).
* points a vector of struct Point pairs.
* npoints the number of objects in points.
* Description:
* buf is overwritten with a comma-separated list of points enclosed in
* square brackets. Each point is output as a comma-separated pair of
* decimal floating point numbers enclosed in square brackets. No more
* than buflen - 1 characters are written. Unless buflen is 0, a NUL is
* written following the (possibly-truncated) output.
* Return value:
* If the output buffer contains the full output, the number of characters
* written to the output buffer, not including the NUL terminator.
* If the output was truncated, (size_t)(-1) is returned.
*/
size_t sprint_points(char* buf, size_t buflen,
struct Point const* points, size_t npoints)
{
if (buflen == 0) return (size_t)(-1);
size_t avail = buflen;
char delim = '[';
while (npoints) {
int res = snprintf(buf, avail, "%c[%f,%f]",
delim, points->lat, points->lon);
if (res < 0 || res >= avail) return (size_t)(-1);
buf += res; avail -= res;
++points; --npoints;
delim = ',';
}
if (avail <= 1) return (size_t)(-1);
strcpy(buf, "]");
return buflen - (avail - 1);
}
Notes
1. You will often see code like this:
strncat(dst, src, sizeof(src)); /* NEVER EVER DO THIS! */
Telling strncat not to append more characters from src than can fit in src is obviously pointless (unless src is not correctly NUL-terminated, in which case you have a bigger problem). More importantly, it does absolutely nothing to protect you from writing beyond the end of the output buffer, since you have not done anything to check that dst has room for all those characters. So about all it does is get rid of compiler warnings about the unsafety of strcat. Since this code is exactly as unsafe as strcat was, you probably would be better off with the warning.
2. You might even find a compiler which understands snprintf well enough to parse the format string at compile time, so the convenience comes at no cost at all. (And if your current compiler doesn't do this, no doubt a future version will.) As with any use of the *printf family, you should never try to economize keystrokes by leaving out the format string (snprintf(dst, dstlen, src) instead of snprintf(dst, dstlen, "%s", src)). That's unsafe (it has undefined behaviour if src contains an unduplicated %). And it's much slower because the library function has to parse the entire string to be copied looking for percent signs, instead of just copying it to the output.

Code is using functions that expect pointers to string, yet not always passing pointers to strings as arguments.
Stray characters seen at output of snprintf
A string must have a terminating null character.
strncat(char *, ...) expects the first parameter to be a pointer to a string. memcpy(output, "[", 1); does not ensure that. @Jeremy
memcpy(output, "[",1);
...
strncat(output, temp,sizeof(temp));
This is a candidate source of stray characters.
strncat(..., ..., size_t size) is itself a problem, as size should be the amount of space available for concatenating (minus the null character). The size available to char *output is not passed in (@Jonathan Leffler), so the code might as well call strcat() here.
Instead, pass in the size available to output to prevent buffer overflow.
#define N 60
int get_gps60secString(GPS_periodic_t input[N], char *output, size_t sz) {
int cnt = snprintf(output, sz, "[");
if (cnt < 0 || cnt >= sz)
return 1;
output += cnt;
sz -= cnt;
int i = 0;
for (i = 0; i < N; i++) {
cnt = snprintf(output, sz, "[%0.8f,%0.8f]%s", input[i].point.latitude,
input[i].point.longitude, i + 1 == N ? "" : ",");
if (cnt < 0 || cnt >= sz)
return 1;
output += cnt;
sz -= cnt;
}
cnt = snprintf(output, sz, "]");
if (cnt < 0 || cnt >= sz)
return 1;
return 0; // no error
}
OP has posted more code - will review.
Apparently the buffer char *output is pre-filled with 0 before the get_gps60secString() so the missing null character from memcpy(output, "[",1); should not cause the issue - hmmmmmm
unsigned short SendTrack() does not return a value. 1) Using its result value is UB. 2) Enable all compiler warnings.

how to keep using malloc?

I have a file which stores a sequence of integers. The total number of integers is unknown, so I keep calling malloc() to request new memory each time I read an integer from the file.
I don't know if I can keep asking for memory and adding the integers at the end of the array. Xcode keeps reporting 'EXC_BAD_ACCESS' on the line with malloc().
How can I do this if I keep reading integers from a file?
int main()
{
//1.read from file
int *a = NULL;
int size=0;
//char ch;
FILE *in;
//open file
if ( (in=fopen("/Users/NUO/Desktop/in.text","r")) == NULL){
printf("cannot open input file\n");
exit(0); //if file open fail, stop the program
}
while( ! feof(in) ){
a = (int *)malloc(sizeof(int));
fscanf(in,"%d", &a[size] );;
printf("a[i]=%d\n",a[size]);
size++;
}
fclose(in);
return 0;
}

Calling malloc() repeatedly like that doesn't do what you think it does. Each time malloc(sizeof(int)) is called, it allocates a separate, new block of memory that's only large enough for one integer. Writing to a[size] ends up writing off the end of that array for every value past the first one.
What you want here is the realloc() function, e.g.
a = realloc(a, sizeof(int) * (size + 1));
if (a == NULL) { ... handle error ... }
Reworking your code such that size is actually the size of the array, rather than its last index, would help simplify this code, but that's neither here nor there.

Instead of using malloc, use realloc.
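Don't use feof(in) to control a while loop. feof() only becomes true after a read has already failed, so the loop body runs once more after the last integer; test the return value of fscanf instead.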
int number;
while( fscanf(in, "%d", &number) == 1 ){
a = realloc(a, sizeof(int)*(size+1));
if ( a == NULL )
{
// Problem.
exit(0);
}
a[size] = number;
printf("a[i]=%d\n", a[size]);
size++;
}

Your malloc() is overwriting your previous storage with just enough space for a single integer!
a = (int *)malloc(sizeof(int));
^^^ assignment overwrites what you have stored!
Instead, realloc() the array:
a = realloc(a, sizeof(int)*(size+1));

You haven't allocated an array of integers; you've allocated a single integer. You need to allocate an initial array and then resize it when it is about to overflow. The code below doubles the capacity each time the array is full. Doubling may not be ideal for every workload; you could also reallocate for each additional element, at the cost of many more realloc calls.
size_t size = 0;
size_t current_size = 2;
int number;
int *tmp;
a = malloc(sizeof(int) * current_size);
if(!a)
handle_error();
while( fscanf(in, "%d", &number) == 1 ){ // stop when no more integers can be read
if(size >= current_size) {
current_size *= 2;
tmp = realloc(a, sizeof(int) * current_size); // keep a valid if realloc fails
if(!tmp)
handle_error();
a = tmp;
}
a[size] = number;
printf("a[i]=%d\n",a[size]);
size++;
}

The usual approach is to allocate some amount of space at first (large enough to cover most of your cases), then double it as necessary, using the realloc function.
An example:
#define INITIAL_ALLOCATED 32 // or enough to cover most cases
...
size_t allocated = INITIAL_ALLOCATED;
size_t size = 0;
...
int *a = malloc( sizeof *a * allocated );
if ( !a )
{
// panic
}
int val;
while ( fscanf( in, "%d", &val ) == 1 )
{
if ( size == allocated )
{
int *tmp = realloc( a, sizeof *a * allocated * 2 ); // double the size of the buffer
if ( tmp )
{
a = tmp;
allocated *= 2;
}
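else
{
// realloc failed - you can treat this as a fatal error, or you
// can give the user a choice to continue with the data that's
// been read so far. Either way, stop reading: there is no room left.
break;
}
}
a[size++] = val; // store the value whether or not we had to grow
}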
We start by allocating 32 elements to a. Then we read a value from the file. If we're not at the end of the array (size is not equal to allocated), we add that value to the end of the array. If we are at the end of the array, we then double the size of it using realloc. If the realloc call succeeds, we update the allocated variable to keep track of the new size and add the value to the array. We keep going until we reach the end of the input file.
Doubling the size of the array each time we reach the limit reduces the total number of realloc calls, which can save performance if you're loading a lot of values.
Note that I assigned the result of realloc to a different variable tmp. realloc will return NULL if it cannot extend the array for any reason. If we assign that NULL value to a, we lose our reference to the memory that was allocated before, causing a memory leak.
Note also that we check the result of fscanf instead of calling feof, since feof won't return true until after we've already tried to read past the end of the file.
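The doubling strategy described above can be packaged into a single function. The following is only a sketch: the name read_ints, the initial capacity of 32 and the choice to return NULL when nothing was read are assumptions made for illustration, not part of the original answer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads whitespace-separated ints from `in` until
 * fscanf stops matching. Returns a malloc'd array (the caller frees it)
 * and stores the element count in *count. Returns NULL, with *count == 0,
 * on allocation failure or when no values were read. */
int *read_ints(FILE *in, size_t *count)
{
    size_t allocated = 32;   /* initial capacity, an arbitrary choice */
    size_t size = 0;
    int *a = malloc(allocated * sizeof *a);
    int val;

    *count = 0;
    if (a == NULL)
        return NULL;

    while (fscanf(in, "%d", &val) == 1) {
        if (size == allocated) {
            /* Grow through a temporary so `a` survives a failed realloc. */
            int *tmp = realloc(a, allocated * 2 * sizeof *a);
            if (tmp == NULL) {
                free(a);
                return NULL;
            }
            a = tmp;
            allocated *= 2;
        }
        a[size++] = val;
    }

    if (size == 0) {
        free(a);
        return NULL;
    }
    *count = size;
    return a;
}
```

A caller would open the file, call read_ints(in, &n), use the n elements, and then free() the returned pointer.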

Multiple producer single consumer with Circular Buffer

I need help getting the following to work.
I have multiple producer threads, each writing, say, 100 bytes of data to a ring buffer.
A single reader (consumer) thread reads 100 bytes at a time and writes them to stdout. (Eventually I want to write to files based on the data.)
With this implementation, the data read from the ring buffer is sometimes wrong; see below.
Since the ring buffer is small, it fills up and some data is lost. That is not my current problem.
Questions:
1. When printing the data that is read from the ring buffer, some data gets interchanged! I'm unable to find the bug.
2. Is the logic/approach correct, or is there a better way to do this?
ringbuffer.h
#define RING_BUFFER_SIZE 500
struct ringbuffer
{
char *buffer;
int wr_pointer;
int rd_pointer;
int size;
int fill_count;
};
ringbuffer.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "ringbuffer.h"
int init_ringbuffer(char *rbuffer, struct ringbuffer *rb, size_t size)
{
rb->buffer = rbuffer;
rb->size = size;
rb->rd_pointer = 0;
rb->wr_pointer = 0;
rb->fill_count = 0;
return 0;
}
int rb_get_free_space (struct ringbuffer *rb)
{
return (rb->size - rb->fill_count);
}
int rb_write (struct ringbuffer *rb, unsigned char * buf, int len)
{
int availableSpace;
int i;
availableSpace = rb_get_free_space(rb);
printf("In Write AVAIL SPC=%d\n",availableSpace);
/* Check if Ring Buffer is FULL */
if(len > availableSpace)
{
printf("NO SPACE TO WRITE - RETURN\n");
return -1;
}
i = rb->wr_pointer;
if(i == rb->size) //At the end of Buffer
{
i = 0;
}
else if (i + len > rb->size)
{
memcpy(rb->buffer + i, buf, rb->size - i);
buf += rb->size - i;
len = len - (rb->size - i);
rb->fill_count += len;
i = 0;
}
memcpy(rb->buffer + i, buf, len);
rb->wr_pointer = i + len;
rb->fill_count += len;
printf("w...rb->write=%tx\n", rb->wr_pointer );
printf("w...rb->read=%tx\n", rb->rd_pointer );
printf("w...rb->fill_count=%d\n", rb->fill_count );
return 0;
}
int rb_read (struct ringbuffer *rb, unsigned char * buf, int max)
{
int i;
printf("In Read,Current DATA size in RB=%d\n",rb->fill_count);
/* Check if Ring Buffer is EMPTY */
if(max > rb->fill_count)
{
printf("In Read, RB EMPTY - RETURN\n");
return -1;
}
i = rb->rd_pointer;
if (i == rb->size)
{
i = 0;
}
else if(i + max > rb->size)
{
memcpy(buf, rb->buffer + i, rb->size - i);
buf += rb->size - i;
max = max - (rb->size - i);
rb->fill_count -= max;
i = 0;
}
memcpy(buf, rb->buffer + i, max);
rb->rd_pointer = i + max;
rb->fill_count -= max;
printf("r...rb->write=%tx\n", rb->wr_pointer );
printf("r...rb->read=%tx\n", rb->rd_pointer );
printf("DATA READ ---> %s\n",(char *)buf);
printf("r...rb->fill_count=%d\n", rb->fill_count );
return 0;
}

At the producer you also need to wait on a condition variable for the "has empty space" condition. Both condition variables should be signaled unconditionally: when a consumer removes an element from the ring buffer it should signal the producers, and when a producer puts something in the buffer it should signal the consumers.
Also, I would move this waiting/signaling logic into the rb_read and rb_write implementations, so that your ring buffer is a complete, ready-to-use solution for the rest of your program.
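One way to put the waiting and signaling inside the buffer itself is sketched below with POSIX threads. This is not the asker's code: the struct sync_ringbuffer and the srb_* names are invented for the example, the storage is embedded in the struct, and both calls block instead of returning -1. A request larger than RB_CAPACITY would block forever, so callers are assumed to keep len within the capacity.

```c
#include <pthread.h>
#include <string.h>

#define RB_CAPACITY 500   /* bytes; same size as in the question */

struct sync_ringbuffer {
    unsigned char buffer[RB_CAPACITY];
    size_t wr;                  /* next write index */
    size_t rd;                  /* next read index */
    size_t fill;                /* bytes currently stored */
    pthread_mutex_t lock;       /* protects every field above */
    pthread_cond_t not_full;    /* signaled after data is removed */
    pthread_cond_t not_empty;   /* signaled after data is added */
};

void srb_init(struct sync_ringbuffer *rb)
{
    rb->wr = rb->rd = rb->fill = 0;
    pthread_mutex_init(&rb->lock, NULL);
    pthread_cond_init(&rb->not_full, NULL);
    pthread_cond_init(&rb->not_empty, NULL);
}

/* Blocks until len bytes fit, then copies them in as one atomic record. */
void srb_write(struct sync_ringbuffer *rb, const unsigned char *src, size_t len)
{
    pthread_mutex_lock(&rb->lock);
    while (RB_CAPACITY - rb->fill < len)
        pthread_cond_wait(&rb->not_full, &rb->lock);

    size_t first = RB_CAPACITY - rb->wr;   /* room before wrap-around */
    if (first > len)
        first = len;
    memcpy(rb->buffer + rb->wr, src, first);
    memcpy(rb->buffer, src + first, len - first);
    rb->wr = (rb->wr + len) % RB_CAPACITY;
    rb->fill += len;

    pthread_cond_broadcast(&rb->not_empty);
    pthread_mutex_unlock(&rb->lock);
}

/* Blocks until len bytes are available, then copies them out. */
void srb_read(struct sync_ringbuffer *rb, unsigned char *dst, size_t len)
{
    pthread_mutex_lock(&rb->lock);
    while (rb->fill < len)
        pthread_cond_wait(&rb->not_empty, &rb->lock);

    size_t first = RB_CAPACITY - rb->rd;   /* bytes before wrap-around */
    if (first > len)
        first = len;
    memcpy(dst, rb->buffer + rb->rd, first);
    memcpy(dst + first, rb->buffer, len - first);
    rb->rd = (rb->rd + len) % RB_CAPACITY;
    rb->fill -= len;

    pthread_cond_broadcast(&rb->not_full);
    pthread_mutex_unlock(&rb->lock);
}
```

Because every update of wr, rd and fill happens while the mutex is held, two producers can no longer interleave their copies. The fill counter is adjusted exactly once per call, by the full record length.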

As to your questions --
1. I can't find that bug either -- in fact, I've tried your code and don't see that behavior.
2. You ask if this is logic/approach correct -- well, as far as it goes, this does implement a kind of ring buffer. Your test case happens to have an integer multiple of the size, and the record size is constant, so that's not the best test.
In trying your code, I found that there is a lot of thread starvation -- the 1st producer thread to run (the last created) hits things really hard, trying and failing after the 1st 5 times to stuff things into the buffer, not giving the consumer thread a chance to run (or even start). Then, when the consumer thread starts, it stays cranking for quite some time before it releases the cpu, and the next producer thread finally starts. That's how it works on my machine -- it will be different on different machines, I'm sure.
It's too bad that your current code doesn't have a way to end -- creating files of 10's or 100's of MB ... hard to wade through.

(Probably a bit late for the author, but in case anyone else searches for "multiple producers single consumer".)
I think the fundamental problem in that implementation is that rb_write modifies shared state (rb->fill_count and the other rb fields) without any synchronization between multiple writers.
For alternative ideas, see http://www.linuxjournal.com/content/lock-free-multi-producer-multi-consumer-queue-ring-buffer.

Create a string from uint32/16_t and then parse back the original numbers

I need to put into a char* some uint32_t and uint16_t numbers. Then I need to get them back from the buffer.
I have read some questions and I've tried to use sprintf to put them into the char* and sscanf get the original numbers again. However, I'm not able to get them correctly.
Here's an example of my code with only 2 numbers. But I need more than 2, that's why I use realloc. Also, I don't know how to use sprintf and sscanf properly with uint16_t
uint32_t gid = 1100;
uint32_t uid = 1000;
char* buffer = NULL;
uint32_t offset = 0;
buffer = realloc(buffer, sizeof(uint32_t));
sprintf(buffer, "%d", gid);
offset += sizeof(uint32_t);
buffer = realloc(buffer, sizeof(uint32_t) + sizeof(buffer));
sprintf(buffer+sizeof(uint32_t), "%d", uid);
uint32_t valorGID;
uint32_t valorUID;
sscanf(buffer, "%d", &valorGID);
buffer += sizeof(uint32_t);
sscanf(buffer, "%d", &valorUID);
printf("ValorGID %d ValorUID %d \n", valorGID, valorUID);
And what I get is
ValorGID 11001000 ValorUID 1000
What I need to get is
ValorGID 1100 ValorUID 1000
I am new in C, so any help would be appreciated.

buffer = realloc(buffer, sizeof(uint32_t));
sprintf(buffer, "%d", gid);
offset += sizeof(uint32_t);
buffer = realloc(buffer, sizeof(uint32_t) + sizeof(buffer));
sprintf(buffer+sizeof(uint32_t), "%d", uid);
This doesn't really make sense, and will not work as intended except in lucky circumstances.
Let us assume that the usual CHAR_BIT == 8 holds, so sizeof(uint32_t) == 4. Further, let us assume that int is a signed 32-bit integer in two's complement representation without padding bits.
sprintf(buffer, "%d", gid) prints the decimal string representation of the bit-pattern of gid interpreted as an int to buffer. Under the above assumptions, gid is interpreted as a number between -2147483648 and 2147483647 inclusive. Thus the decimal string representation may contain a '-', contains 1 to 10 digits and the 0-terminator, altogether it uses two to twelve bytes. But you have allocated only four bytes, so whenever 999 < gid < 2^32-99 (the signed two's complement interpretation is > 999 or < -99), sprintf writes past the allocated buffer size.
That is undefined behaviour.
It's likely to not crash immediately because allocating four bytes usually gives you a larger chunk of memory effectively (if e.g. malloc always returns 16-byte aligned blocks, the twelve bytes directly behind the allocated four cannot be used by other parts of the programme, but belong to the programme's address space, and writing to them will probably go undetected). But it can easily crash later when the end of the allocated chunk lies on a page boundary.
Also, since you advance the write offset by four bytes for subsequent sprintfs, part of the previous number gets overwritten if its string representation (excluding the 0-terminator) used more than four bytes (assuming the programme hasn't yet crashed from writing to non-allocated memory).
The line
buffer = realloc(buffer, sizeof(uint32_t) + sizeof(buffer));
contains further errors.
buffer = realloc(buffer, new_size); loses the reference to the allocated memory and causes a leak if realloc fails. Use a temporary and check for success
char *temp = realloc(buffer, new_size);
if (temp == NULL) {
/* reallocation failed, recover or cleanup */
free(buffer);
exit(EXIT_FAILURE);
}
/* it worked */
buffer = temp;
/* temp = NULL; or let temp go out of scope */
The new size sizeof(uint32_t) + sizeof(buffer) of the new allocation is always the same, sizeof(uint32_t) + sizeof(char*). That's typically eight or twelve bytes, so it doesn't take many numbers to write outside the allocated area and cause a crash or memory corruption (which may cause a crash much later).
You must keep track of the number of bytes allocated to buffer and use that to calculate the new size. There is no (portable¹) way to determine the size of the allocated memory block from the pointer to its start.
Now the question is whether you want to store the string representations or the bit patterns in the buffer.
Storing the string representations has the problem that the length of the string representation varies with the value. So you need to include separators between the representations of the numbers, or ensure that all representations have the same length by padding (with spaces or leading zeros) if necessary. That would for example work like
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#define MAKESTR(x) # x
#define STR(x) MAKESTR(x)
/* A uint32_t can use 10 decimal digits, so let each field be 10 chars wide */
#define FIELD_WIDTH 10
uint32_t gid = 1100;
uint32_t uid = 1000;
size_t buf_size = 0, offset = 0;
char *buffer = NULL, *temp = NULL;
buffer = realloc(buffer, FIELD_WIDTH + 1); /* one for the '\0' */
if (buffer == NULL) {
exit(EXIT_FAILURE);
}
buf_size = FIELD_WIDTH + 1;
sprintf(buffer, "%0" STR(FIELD_WIDTH) PRIu32, gid);
offset += FIELD_WIDTH;
temp = realloc(buffer, buf_size + FIELD_WIDTH);
if (temp == NULL) {
free(buffer);
exit(EXIT_FAILURE);
}
buffer = temp;
temp = NULL;
buf_size += FIELD_WIDTH;
sprintf(buffer + offset, "%0" STR(FIELD_WIDTH) PRIu32, uid);
offset += FIELD_WIDTH;
/* more */
uint32_t valorGID;
uint32_t valorUID;
/* rewind for scanning */
offset = 0;
sscanf(buffer + offset, "%" STR(FIELD_WIDTH) SCNu32, &valorGID);
offset += FIELD_WIDTH;
sscanf(buffer + offset, "%" STR(FIELD_WIDTH) SCNu32, &valorUID);
printf("ValorGID %" PRIu32 " ValorUID %" PRIu32 " \n", valorGID, valorUID);
with zero-padded fixed-width fields. If you'd rather use separators than a fixed width, the calculation of the required length and the offsets becomes more complicated, but unless the numbers are large, it would use less space.
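For comparison, a separator-based variant might look like the sketch below. The helper names join_u32 and split_u32 and the choice of ';' as the separator are assumptions made for this example. To keep the length calculation simple, it allocates the worst case of 10 digits plus one separator per value instead of computing the exact size.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats n values as "v0;v1;...;" into a malloc'd
 * string that the caller must free. Returns NULL on failure. */
char *join_u32(const uint32_t *vals, size_t n)
{
    /* Worst case per value: 10 digits + 1 separator; +1 for the '\0'. */
    size_t cap = n * 11 + 1;
    char *buf = malloc(cap);
    size_t off = 0;

    if (buf == NULL)
        return NULL;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + off, cap - off, "%" PRIu32 ";", vals[i]);
        if (w < 0 || (size_t)w >= cap - off) {   /* error or truncation */
            free(buf);
            return NULL;
        }
        off += (size_t)w;
    }
    return buf;
}

/* Hypothetical helper: parses up to max ';'-terminated values from s.
 * Returns the number of values stored in out. */
size_t split_u32(const char *s, uint32_t *out, size_t max)
{
    size_t n = 0;
    while (n < max) {
        int used = 0;
        /* %n records how many characters were consumed, but only if the
         * number and the ';' both matched; otherwise used stays 0. */
        if (sscanf(s, "%" SCNu32 ";%n", &out[n], &used) != 1 || used == 0)
            break;
        s += used;
        n++;
    }
    return n;
}
```

With gid = 1100 and uid = 1000 this produces "1100;1000;", which takes 10 bytes where the fixed-width layout takes 20.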
If you'd rather store the bit-patterns, which would be the most compact way of storing, you'd use something like
size_t buf_size = 0, offset = 0;
unsigned char *buffer = NULL, *temp = NULL;
buffer = realloc(buffer, sizeof(uint32_t));
if (buffer == NULL) {
exit(EXIT_FAILURE);
}
buf_size = sizeof(uint32_t);
for(size_t b = 0; b < sizeof(uint32_t); ++b) {
buffer[offset + b] = (gid >> b*8) & 0xFF;
}
offset += sizeof(uint32_t);
temp = realloc(buffer, buf_size + sizeof(uint32_t));
if (temp == NULL) {
free(buffer);
exit(EXIT_FAILURE);
}
buffer = temp;
temp = NULL;
buf_size += sizeof(uint32_t);
for(size_t b = 0; b < sizeof(uint32_t); ++b) {
buffer[offset + b] = (uid >> b*8) & 0xFF;
}
offset += sizeof(uint32_t);
/* And for reading the values */
uint32_t valorGID, valorUID;
/* rewind */
offset = 0;
valorGID = 0;
for(size_t b = 0; b < sizeof(uint32_t); ++b) {
valorGID |= (uint32_t)buffer[offset + b] << b*8; /* cast first: shifting a promoted int by 24 can overflow */
}
offset += sizeof(uint32_t);
valorUID = 0;
for(size_t b = 0; b < sizeof(uint32_t); ++b) {
valorUID |= (uint32_t)buffer[offset + b] << b*8; /* cast first: shifting a promoted int by 24 can overflow */
}
offset += sizeof(uint32_t);
¹ If you know how malloc etc. work in your implementation, it may be possible to find the size from malloc's bookkeeping data.

The format specifier '%d' is for int and is therefore wrong for uint32_t. First, uint32_t is an unsigned type, so you should at least use '%u'; but it might also have a different width than int or unsigned. The standard provides macros for this in <inttypes.h>: PRIu32 for printf and SCNu32 for scanf. As an example:
sprintf(buffer, "%" PRIu32, gid);
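The same header also covers the uint16_t case the question asks about, through PRIu16 and SCNu16. The sketch below round-trips one value of each type; the helper names format_ids and parse_ids and the space used as separator are assumptions made for illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes "<gid> <port>" into buf.
 * Returns 0 on success, -1 if the text did not fit in size bytes. */
int format_ids(char *buf, size_t size, uint32_t gid, uint16_t port)
{
    int w = snprintf(buf, size, "%" PRIu32 " %" PRIu16, gid, port);
    return (w >= 0 && (size_t)w < size) ? 0 : -1;
}

/* Hypothetical helper: reads both numbers back.
 * Returns 0 when both fields were parsed, -1 otherwise. */
int parse_ids(const char *buf, uint32_t *gid, uint16_t *port)
{
    return sscanf(buf, "%" SCNu32 " %" SCNu16, gid, port) == 2 ? 0 : -1;
}
```

Note that the PRI and SCN macros for one type are not always interchangeable: for uint16_t, SCNu16 typically expands to "hu" while PRIu16 is usually plain "u", so each family should be used only with its own function.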

sprintf produces a string, which in C is held in a char*. If you are trying to store an array of integers as their string representations, then your fundamental data type is a char**. This would be a ragged matrix of char if we stored only the string data itself, but since the longest string a uint32_t can yield is 10 chars, plus one for the terminating null, it makes sense to preallocate that many bytes for each string.
So to store n uint32_t's from array a in array s as strings:
#include <inttypes.h>
const size_t kMaxIntLen=11;
uint32_t *a,b;
size_t n,i;
// fill a and set n somehow
...
char **s,*d;
if((d=malloc(n*kMaxIntLen))==NULL)
{
// error!
}
if((s=malloc(n*sizeof(char*)))==NULL)
{
// error!
}
for(i=0;i<n;i++)
{
s[i]=d+i*kMaxIntLen; // each string gets its own kMaxIntLen-byte slot in d
snprintf(s[i],kMaxIntLen,"%" PRIu32,a[i]); // snprintf to be safe
}
Now the ith number is at s[i], so printing it is just printf("%s",s[i]);, and retrieving it as an integer into b is sscanf(s[i],"%" SCNu32,&b);.
Subsequent memory management is a bit trickier. Rather than constantly using realloc() to grow the buffer, it is better to preallocate a chunk of memory and only resize it when it is exhausted. If realloc() fails it returns NULL, so keep a pointer to your main buffer before calling it; that way you won't lose the reference to your data. Reallocate the d buffer first (again allocating enough room for several more strings), then, if it succeeds, check whether d has changed. If so, destroy (free()) the s buffer, malloc() it again and rebuild the indices (you have to do this because if d has moved, all your indices are stale). If not, realloc() s and fill in the new indices. I would suggest wrapping this whole thing in a structure and having a set of routines to operate on it, e.g.:
typedef struct StringArray
{
char **strArray;
char *data;
size_t nStrings;
} StringArray;
This is a lot of work. Do you have to use C? This is vastly easier as a C++ STL vector<string> or list<string> with the istringstream classes and the push_back() container method.

uint32_t gid = 1100;
uint32_t uid = 1000;
char* buffer = NULL;
uint32_t offset = 0;
buffer = realloc(buffer, sizeof(uint32_t));
sprintf(buffer, "%d", gid);
offset += sizeof(uint32_t);
buffer = realloc(buffer, sizeof(uint32_t) + sizeof(buffer));
sprintf(buffer+sizeof(uint32_t), "%d", uid);
uint32_t valorGID;
uint32_t valorUID;
sscanf(buffer, "%4d", &valorGID);
buffer += sizeof(uint32_t);
sscanf(buffer, "%d", &valorUID);
printf("ValorGID %d ValorUID %d \n", valorGID, valorUID);
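I think this may resolve the issue: limiting the first sscanf to four characters with %4d stops it from reading on into the second number. Note that this only works while the first value has exactly four digits.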

C - Unable to free allocated memory

I have a problem with an application I'm currently developing. In this program I have to read huge amounts (billions) of data from text files and process them accordingly, but since it's a two-student project, the reading part will be developed by my teammate. For testing purposes I wrote a small procedure that generates pseudo-random structures to stand in for what my teammate will do.
The problem is the following: a large amount of the generated data can be discarded (due to redundancy) in order to free its memory. But even after invoking free(), the memory usage keeps growing. So I wrote a debug application that simply generates a chunk of data and immediately frees it, and repeats that thousands of times. I can't grasp the reason, but the memory allocated to the process grows to ~1.8 GB of RAM and then the process crashes. Why? The strangest thing, which makes me think there's a lot I'm not understanding, is that when the process crashes malloc does NOT return a NULL pointer: the process always crashes when readCycles == 6008 and bypasses the NULL check.
I have already read other related topics here on StackOverflow and I understand why free() doesn't reduce the memory allocated to my process. That's fine. But why does the memory usage keep growing? Shouldn't malloc reuse previously freed memory instead of constantly requesting new memory?
This is the most relevant part of my code:
#define NREAD 1000
#define READCYCLES 10000
#define N_ALPHA_ILLUMINA 7
#define N_ALPHA_SOLID 5
#define SEQLEN 76
typedef struct{
char* leftDNA;
char* leftQuality;
unsigned long int leftRow;
char* rightDNA;
char* rightQuality;
unsigned long int rightRow;
} MatePair;
unsigned long int readCycles = 0;
MatePair* readStream(MatePair* inputStream, short* eof, unsigned long int* inputSize){
double r;
unsigned long int i, j;
unsigned long int leftRow;
int alphabet[] = {'A', 'C', 'G', 'T', 'N'};
inputStream = (MatePair*) malloc (sizeof(MatePair) * (NREAD + 1));
printf("%d\n", readCycles);
if (inputStream == NULL){
(*eof) = 1;
return;
}
for (i = 0; i < NREAD; i++){
leftRow = readCycles * NREAD + i;
inputStream[i].leftDNA = (char*) malloc (SEQLEN);
inputStream[i].rightDNA = (char*) malloc (SEQLEN);
inputStream[i].leftQuality = (char*) malloc (SEQLEN);
inputStream[i].rightQuality = (char*) malloc (SEQLEN);
for (j = 0; j < SEQLEN; j++){
r = rand() / (RAND_MAX + 1);
inputStream[i].leftDNA[j] = alphabet[(int)(r * 5)];
inputStream[i].rightDNA[j] = alphabet[(int)(r * 5)];
inputStream[i].leftQuality[j] = (char) 64 + (int)(r * 60);
inputStream[i].rightQuality[j] = (char) 64 + (int)(r * 60);
}
inputStream[i].leftDNA[SEQLEN - 1] = '\0';
inputStream[i].rightDNA[SEQLEN - 1] = '\0';
inputStream[i].leftQuality[SEQLEN - 1] = '\0';
inputStream[i].rightQuality[SEQLEN - 1] = '\0';
inputStream[i].leftRow = leftRow;
inputStream[i].rightRow = leftRow;
}
inputStream[i].leftRow = -1;
readCycles++;
(*inputSize) = NREAD;
(*eof) = readCycles > READCYCLES;
return inputStream;
}
int main(int argc, char* argv[]){
short eof = 0;
unsigned long int inputSize = 0;
MatePair* inputStream = NULL;
while (!eof){
inputStream = readStream(inputStream, &eof, &inputSize);
free(inputStream);
inputStream = NULL;
}
return 0;
}
I forgot to mention that, but before posting here, instead of calling free(inputStream), I tried invoking freeMemory(inputStream). Not sure if it's the correct way of doing it, though.
void freeMemory(MatePair* memblock){
for ( ; memblock->leftRow != 1; memblock++){
free(memblock -> leftDNA);
free(memblock -> leftQuality);
free(memblock -> rightDNA);
free(memblock -> rightQuality);
}
}

Memory leaks. Every call to malloc() must be matched by exactly one call to free() in order to release all the memory allocated on the heap.
Thus, in
inputStream[i].leftDNA = (char*) malloc (SEQLEN);
inputStream[i].rightDNA = (char*) malloc (SEQLEN);
inputStream[i].leftQuality = (char*) malloc (SEQLEN);
inputStream[i].rightQuality = (char*) malloc (SEQLEN);
each of these malloc() calls must be paired with a free().

You're not freeing all the members allocated within the read loop, hence you're losing memory each time. Remember, you have to free everything you allocate with malloc, not just your array.
OK, I just looked at your edit, and your freeMemory is still wrong. Try this:
void freeMemory(MatePair* inputStream)
{
for (size_t i = 0; i < NREAD; i++){
free(inputStream[i].leftDNA);
free(inputStream[i].leftQuality);
free(inputStream[i].rightDNA);
free(inputStream[i].rightQuality);
}
free (inputStream);
}
Your free(memblock) was in the loop, which it shouldn't have been, and I'd tend to use the same iteration sequence on freeing as mallocing. You also need to error check after each malloc, and decide what to do with a NULL at that point.
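A sketch of how allocation, error checking and freeing can be kept symmetric is shown below. It uses a reduced version of the MatePair struct with only the four string members; the names alloc_pairs and free_pairs are made up for the example, and calloc is used for the strings so that each one starts out as an empty string.

```c
#include <stdlib.h>

#define SEQLEN 76

/* Reduced version of the question's struct: only the four strings. */
typedef struct {
    char *leftDNA;
    char *leftQuality;
    char *rightDNA;
    char *rightQuality;
} MatePair;

/* Frees the four strings of the first n entries, then the array itself. */
void free_pairs(MatePair *pairs, size_t n)
{
    if (pairs == NULL)
        return;
    for (size_t i = 0; i < n; i++) {
        free(pairs[i].leftDNA);      /* free(NULL) is a harmless no-op */
        free(pairs[i].leftQuality);
        free(pairs[i].rightDNA);
        free(pairs[i].rightQuality);
    }
    free(pairs);
}

/* Allocates n entries with SEQLEN-byte strings. On any failure it
 * releases everything allocated so far and returns NULL. */
MatePair *alloc_pairs(size_t n)
{
    MatePair *pairs = malloc(n * sizeof *pairs);
    if (pairs == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        pairs[i].leftDNA      = calloc(SEQLEN, 1);
        pairs[i].leftQuality  = calloc(SEQLEN, 1);
        pairs[i].rightDNA     = calloc(SEQLEN, 1);
        pairs[i].rightQuality = calloc(SEQLEN, 1);
        if (!pairs[i].leftDNA || !pairs[i].leftQuality ||
            !pairs[i].rightDNA || !pairs[i].rightQuality) {
            /* Entries 0..i have all four members assigned (possibly
             * to NULL), so freeing i + 1 entries is safe. */
            free_pairs(pairs, i + 1);
            return NULL;
        }
    }
    return pairs;
}
```

In the question's main loop, each readStream call would then be followed by free_pairs(inputStream, NREAD) in place of the bare free(inputStream).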
