How to create an array of structs in C

I want to implement a searching table and
here's the data:
20130610 Diamond CoinMate 11.7246 15.7762 2897
20130412 Diamond Bithumb 0.209 0.2293 6128
20130610 OKCash Bithumb 0.183 0.2345 2096
20130412 Ethereum Chbtc 331.7282 401.486 136786
20170610 OKCash Tidex 0.0459 0.0519 66
...
and my code
typedef struct data{
int *date;
string currency[100];
string exchange[100];
double *low;
double *high;
int *daily_cap;
} Data;
int main()
{
FILE *fp = fopen("test_data.txt", "r");
Data tmp[50];
int i = 0;
while (!feof(fp)){
fscanf(fp, "%d%s%s%f%f%7d", &tmp[i].date, tmp[i].currency, tmp[i].exchange, &tmp[i].low, &tmp[i].high, &tmp[i].daily_cap);
i++;
}
fclose(fp);
}
but the first problem is that I can't create a large array to store my structs, like
Data tmp[1000000]
and even when I try just 50 elements, the program crashes when main() finishes.
Can anyone tell me how to fix it or suggest a better method? Thanks.

You cannot scan a value into unallocated space. In other words, the pointers in your struct need storage to point to. Either switch to
typedef struct data{
int date;
string currency[100];
string exchange[100];
double low;
double high;
int daily_cap;
} Data;
Or use malloc to assign space to those pointers before using them.
while (!feof(fp)){
tmp[i].date = malloc(sizeof(int));
...
But in this case, you don't need to pass the address of such members to fscanf since they are already pointers:
fscanf(fp, "%d%s%s%f%f%7d", &tmp[i].date, ..
should be
fscanf(fp, "%d%s%s%lf%lf%7d", tmp[i].date, ...
Notice that double wants %lf instead of %f
This is also very confusing:
typedef struct data{
int *date;
string currency[100];
...
Is string a typedef of char? I think you mean string currency; since string is usually an alias of char *. In that case you need room for this member too: currency = malloc(100);
Finally, take a look at Why is “while ( !feof (file) )” always wrong?
There are too many errors for such a short snippet; I suggest you read a good C book.
Here is your code corrected to use dynamic memory, which lets you reserve space for a large amount of data (see the other answer by @LuisColorado), and to use fgets and sscanf instead of fscanf:
#include <stdio.h>
#include <stdlib.h>
typedef struct data{
int date;
char currency[100];
char exchange[100];
double low;
double high;
int daily_cap;
} Data;
int main(void)
{
FILE *fp = fopen("test_data.txt", "r");
/* Always check the result of fopen */
if (fp == NULL) {
perror("fopen");
exit(EXIT_FAILURE);
}
Data *tmp;
tmp = malloc(sizeof(*tmp) * 50);
if (tmp == NULL) {
perror("malloc");
exit(EXIT_FAILURE);
}
char buf[512];
int i = 0;
/* Check that you don't read more than 50 lines */
while ((i < 50) && (fgets(buf, sizeof buf, fp))) {
/* Keep the record only if all six fields were converted */
if (sscanf(buf, "%d%99s%99s%lf%lf%7d", &tmp[i].date, tmp[i].currency, tmp[i].exchange, &tmp[i].low, &tmp[i].high, &tmp[i].daily_cap) == 6) {
i++;
}
}
fclose(fp);
/* Always clean what you use */
free(tmp);
return 0;
}

Of course you can't. You are creating an array of 1.0E6 elements of sizeof (Data) each. I guess that is not less than 32 bytes (four pointers) plus 200 bytes (at least, as you don't give the definition of the type string), which makes at least 232 MB on a 64-bit machine (216 MB on a 32-bit one), and that is only if string is a single character wide, which I fear it is not. If string is a typedef of char *, the struct holds 204 pointers, i.e. 1632 bytes on a 64-bit machine, or about 1.6 GB in a single variable. Next, if you declare this huge variable as a local variable, you should know that the stack on most Unix operating systems is limited to around 8 MB by default, so you would need to build your program with special parameters to allow a larger maximum stack size. You would probably also need to raise your account's ulimits so that the kernel allows such a large stack segment.
Next time, please give us full information. Without the definition of the string type, and with an incomplete program, we can only guess at what is going on and cannot find the actual errors. That wastes your time and ours. Thanks.

If your lists of currencies and exchanges are known beforehand, then there is no need to allocate or store any arrays within your struct. The lists can be global arrays of pointers to string literals, and all you need do is store a pointer to the literal for both currency and exchange (you can even save a few more bytes by storing the index instead of a pointer).
For example, your lists of currencies and exchanges can be stored once as follows:
const char *currency[] = { "Diamond", "OKCash", "Ethereum" },
*exchange[] = { "CoinMate", "Bithumb", "Chbtc", "Tidex" };
(if the number warrants, allocate storage for the strings and read them from a file)
Now you have all of the possible strings for currency and exchange stored, all you need in your data struct is a pointer for each, e.g.
typedef struct {
const char *currency, *exchange;
double low, high;
unsigned date, daily_cap;
} data_t;
(unsigned gives a better range and there are no negative dates or daily_cap)
Now simply declare an array of data_t (or allocate one, depending on the number of records). Below is a simple array with automatic storage for example purposes, e.g.
#define MAXD 128
...
data_t data[MAXD] = {{ .currency = NULL }};
Since you are reading 'lines' of data, fgets or POSIX getline are the line-oriented choices. After reading a line, you can parse the line with sscanf using temporary values, compare whether the values for currency and exchange read from the file match values stored, and then assign a pointer to the appropriate string to your struct, e.g.
int main (void) {
char buf[MAXC] = "";
size_t n = 0;
data_t data[MAXD] = {{ .currency = NULL }};
while (n < MAXD && fgets (buf, MAXC, stdin)) {
char curr[MAXE] = "", exch[MAXE] = "";
int havecurr = 0, haveexch = 0;
data_t tmp = { .currency = NULL };
if (sscanf (buf, "%u %31s %31s %lf %lf %u", &tmp.date,
curr, exch, &tmp.low, &tmp.high, &tmp.daily_cap) == 6) {
for (int i = 0; i < NELEM(currency); i++) {
if (strcmp (currency[i], curr) == 0) {
tmp.currency = currency[i];
havecurr = 1;
break;
}
}
for (int i = 0; i < NELEM(exchange); i++) {
if (strcmp (exchange[i], exch) == 0) {
tmp.exchange = exchange[i];
haveexch = 1;
break;
}
}
if (havecurr && haveexch)
data[n++] = tmp;
}
}
...
Putting it all together in a short example, you could do something similar to the following:
#include <stdio.h>
#include <string.h>
#define MAXC 256
#define MAXD 128
#define MAXE 32
#define NELEM(x) (int)(sizeof (x)/sizeof (*x))
const char *currency[] = { "Diamond", "OKCash", "Ethereum" },
*exchange[] = { "CoinMate", "Bithumb", "Chbtc", "Tidex" };
typedef struct {
const char *currency, *exchange;
double low, high;
unsigned date, daily_cap;
} data_t;
int main (void) {
char buf[MAXC] = "";
size_t n = 0;
data_t data[MAXD] = {{ .currency = NULL }};
while (n < MAXD && fgets (buf, MAXC, stdin)) {
char curr[MAXE] = "", exch[MAXE] = "";
int havecurr = 0, haveexch = 0;
data_t tmp = { .currency = NULL };
if (sscanf (buf, "%u %31s %31s %lf %lf %u", &tmp.date,
curr, exch, &tmp.low, &tmp.high, &tmp.daily_cap) == 6) {
for (int i = 0; i < NELEM(currency); i++) {
if (strcmp (currency[i], curr) == 0) {
tmp.currency = currency[i];
havecurr = 1;
break;
}
}
for (int i = 0; i < NELEM(exchange); i++) {
if (strcmp (exchange[i], exch) == 0) {
tmp.exchange = exchange[i];
haveexch = 1;
break;
}
}
if (havecurr && haveexch)
data[n++] = tmp;
}
}
for (size_t i = 0; i < n; i++)
printf ("%u %-10s %-10s %8.4f %8.4f %6u\n", data[i].date,
data[i].currency, data[i].exchange, data[i].low,
data[i].high, data[i].daily_cap);
}
Example Use/Output
$ ./bin/coinread <dat/coin.txt
20130610 Diamond CoinMate 11.7246 15.7762 2897
20130412 Diamond Bithumb 0.2090 0.2293 6128
20130610 OKCash Bithumb 0.1830 0.2345 2096
20130412 Ethereum Chbtc 331.7282 401.4860 136786
20170610 OKCash Tidex 0.0459 0.0519 66
With this approach, regardless of whether you allocate your array of structs or use automatic storage, you minimize the size of the data stored by not duplicating storage of known values. On x86_64, your data_t struct will be approximately 40 bytes. With a typical 1-4 megabyte stack, you can store a lot of 40-byte structs safely before you need to start allocating. You can always start with automatic storage and, if you reach some percentage of the available stack space, dynamically allocate, memcpy, set a flag to indicate the storage in use, and keep going...

Related

Why is realloc giving me inconsistent behaviour?

I am currently taking a procedural programming course at my school. We are using C with C99 standard. I discussed this with my instructor and I cannot understand why realloc() is working for his machine, but it is not working for mine.
The goal of this program is to parse a text file students.txt that has students' name and their GPA formatted like this:
Mary 4.0
Jack 2.45
John 3.9
Jane 3.8
Mike 3.125
I have a function that resizes my dynamically allocated array, and when it calls realloc, the debugger in my CLion IDE reports SIGABRT.
I tried using an online compiler and I get realloc(): invalid next size.
I have been trying to debug this all weekend and I can't find the answer and I need help.
My code currently looks like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define INITIAL_SIZE 4
#define BUFFER_SIZE 512
#define GRADE_CUTOFF 3.9
// ERROR CODES
#define FILE_OPEN_ERROR 1
#define MEMORY_ALLOCATION_ERROR 2
struct student {
double gpa;
char *name;
};
struct student *resizeAllocationIfNeeded(struct student *listOfStudents,
unsigned int studentCount, size_t *currentSize) {
if (studentCount <= *currentSize) {
return listOfStudents;
}
*currentSize *= 2;
struct student *resizedList = (struct student *) realloc(listOfStudents, *currentSize * sizeof(struct student));
if (resizedList == NULL) {
perror("Failed to allocate memory");
exit(MEMORY_ALLOCATION_ERROR);
}
return resizedList;
}
size_t getNamesAndGrades(FILE *file, struct student *listOfStudents, size_t size) {
unsigned int studentCount = 0;
char buffer[BUFFER_SIZE];
while(fscanf(file, "%s %lf", buffer, &listOfStudents[studentCount].gpa) > 0) {
listOfStudents[studentCount].name = strdup(buffer);
studentCount++;
listOfStudents = resizeAllocationIfNeeded(listOfStudents, studentCount, &size);
}
return studentCount;
}
void swapStudents(struct student *listOfStudents, int x, int y) {
struct student temp = listOfStudents[x];
listOfStudents[x] = listOfStudents[y];
listOfStudents[y] = temp;
}
void sortStudentsByGPA(struct student *listOfStudents, unsigned int studentCount) {
for (int i = 0; i < studentCount; i++) {
for (int j = 0; j < studentCount - i - 1; j++) {
if (listOfStudents[j].gpa < listOfStudents[j + 1].gpa) {
swapStudents(listOfStudents, j, j + 1);
}
}
}
}
void printStudentAndGPA(struct student *listOfStudents, unsigned int studentCount) {
for (int i = 0; i < studentCount; i++) {
if (listOfStudents[i].gpa > GRADE_CUTOFF) {
printf("%s %lf\n", listOfStudents[i].name, listOfStudents[i].gpa);
}
free(listOfStudents[i].name);
}
}
void topStudents(char *fileName) {
FILE *file = fopen(fileName, "r");
if (!file) {
perror("Could not open file for reading");
exit(FILE_OPEN_ERROR);
}
struct student *listOfStudents = (struct student *) malloc(INITIAL_SIZE * sizeof(struct student));
if (listOfStudents == NULL) {
perror("Failed to allocate memory");
exit(MEMORY_ALLOCATION_ERROR);
}
unsigned int studentCount = getNamesAndGrades(file, listOfStudents, INITIAL_SIZE);
sortStudentsByGPA(listOfStudents, studentCount);
printStudentAndGPA(listOfStudents, studentCount);
free(listOfStudents);
}
int main() {
topStudents("students.txt");
return 0;
}
You have a fencepost error when checking whether you need to resize the array.
Your initial allocation size is 4, which means that the highest valid index is 3.
In the loop in getNamesAndGrades(), after you read into listOfStudents[3] you increment studentCount to 4. Then you call resizeAllocationIfNeeded(listOfStudents, studentCount, &size);
Inside resizeAllocationIfNeeded(), studentCount == 4 and *currentSize == 4. So the test
if (studentCount <= *currentSize) {
return listOfStudents;
}
succeeds and you return without calling realloc().
Then the next iteration of the loop assigns to listOfStudents[4], which causes a buffer overflow.
You need to change that condition to studentCount < *currentSize.
There are two errors in your code: one is just a typo, the other is a more serious logical error.
First, you are reallocating too late, because of the condition in resizeAllocationIfNeeded(). When studentCount == currentSize, this doesn't resize (even though it should), which makes you overflow the array of students and causes problems.
You can change the condition to fix this:
if (studentCount < *currentSize) {
return listOfStudents;
}
Apart from the above, your main error is in getNamesAndGrades(), where you are reallocating memory and assigning the new pointers to a local variable. You then use that variable in topStudents() as if it was updated. This will of course not work, as the initial pointer passed by topStudents() becomes invalid after the first realloc() and memory is irrevocably lost when getNamesAndGrades() returns.
You should either pass a pointer to the student array, or better just make the function create the array for you.
Here's a solution, renaming getNamesAndGrades to getStudents:
struct student *getStudents(FILE *file, unsigned int *studentCount) {
char buffer[BUFFER_SIZE];
struct student *listOfStudents;
size_t size = INITIAL_SIZE;
*studentCount = 0;
listOfStudents = malloc(size * sizeof(struct student));
if (listOfStudents == NULL) {
perror("Failed to allocate memory");
exit(MEMORY_ALLOCATION_ERROR);
}
while(fscanf(file, "%511s %lf", buffer, &listOfStudents[*studentCount].gpa) == 2) {
listOfStudents[*studentCount].name = strdup(buffer);
(*studentCount)++;
listOfStudents = resizeAllocationIfNeeded(listOfStudents, *studentCount, &size);
}
return listOfStudents;
}
// ...
void topStudents(char *fileName) {
FILE *file = fopen(fileName, "r");
if (!file) {
perror("Could not open file for reading");
exit(FILE_OPEN_ERROR);
}
unsigned int studentCount;
struct student *listOfStudents = getStudents(file, &studentCount);
sortStudentsByGPA(listOfStudents, studentCount);
printStudentAndGPA(listOfStudents, studentCount);
free(listOfStudents);
}
int main() {
topStudents("students.txt");
return 0;
}
Additional notes:
When scanning into a fixed-size buffer (in this case 512 bytes), use %511s, not just %s; a bare %s is a buffer overflow waiting to happen.
You are scanning two fields, so check that fscanf's return value is == 2, not > 0; you don't want, for example, one field initialized and the other not.
Don't cast the result of malloc() or realloc().
For the future, if you are on Linux, compiling with gcc -g -fsanitize=address will give you detailed error reports when something goes bad in the heap, telling you exactly where memory was allocated, freed and used.

I lose the values in a struct (c)

#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#define stock_dir "/Users/myname/prices/"
#define file_list "/Users/myname/trade/trade/nasdaq100_stock_list.txt"
#define look_back_period 3
#define num_stocks 103
#define days_of_data 21
int main()
{
FILE *stocks, *stk;
char stock[11], fullpath[50] = "\0", header[25];
char line_of_data[40];
char *sclose, *svol;
int n = 0, i = 0;
typedef struct daily_data {
char *date;
float close;
int vol;
}data;
sclose = (char*) malloc(20*sizeof(char));
svol = (char*) malloc(20*sizeof(char));
data** day_data = (data**) malloc(num_stocks*sizeof(data*) );
if (day_data == NULL)
{
printf("day_data not allocated\n");
exit(0);
}
for(i = 0; i < num_stocks; i++)
if ((day_data[i] = (data*)malloc(days_of_data*sizeof(data))) == NULL)
{
printf("data[%d] not allocated\n", i);
exit(0);
}
for(i = 0; i < num_stocks; i++)
for(n = 0; n < days_of_data; n++)
if ((day_data[i][n].date = (char*)malloc(20)) == NULL)
{ printf("data[%d][%d] not allocated\n", i,n);
exit(0);
}
/* ... code omitted ... */
if ( (stocks = fopen(file_list, "r") )== NULL)
printf("didn't open file list\n");
i = 0;
while (fgets(stock, sizeof(stock), stocks) != NULL)
{
printf("%s",stock);
strcpy(fullpath,stock_dir);
strcat(fullpath,stock);
fullpath[strcspn(fullpath, "\n")] = 0;
if ( (stk = fopen(fullpath, "r") )== NULL)
printf("didn't open quote list\n");
fgets(header,sizeof(header),stk);
n=0;
while(fgets(line_of_data, sizeof(line_of_data),stk) !=NULL)
{
fgets(line_of_data,sizeof(line_of_data),stk);
day_data[i][n].date = strtok(line_of_data, ",");
sclose = strtok(NULL,",");
day_data[i][n].close = atof(sclose);
svol = strtok(NULL, ",");
day_data[i][n].vol = atoi(svol);;
printf("%s %f %d\n",day_data[i][n].date,day_data[i][n].close,day_data[i][n].vol);
n++;
}
fclose(stk);
i++;
}
for (n = look_back_period - 1; n < (days_of_data - look_back_period); n++)
printf("%d %s %f %d\n",n, day_data[1][n].date, day_data[1][n].close, day_data[1][n].vol);
}
The print statement in the while(fgets(line_of_data, sizeof(line_of_data),stk) !=NULL) loop shows that everything went into the right place. But when I print values outside they're mostly wrong. I'm supposed to add more details but I don't know what else to say. I lose the values in the struct when I leave the loop.
You overwrite the same data again and again.
Take a look at your structure:
typedef struct daily_data {
char *date; ///< a pointer without own storage
float close;
int vol;
}data;
While processing your file, you read each line into line_of_data:
while(fgets(line_of_data, sizeof(line_of_data),stk) !=NULL)
You tokenize line_of_data and assign the pointer to data->date:
day_data[i][n].date = strtok(line_of_data, ",");
What strtok does is insert terminators into your input string and return a pointer to the start of the next token in that input. No new memory is allocated at this point; the returned pointer points into your input string.
So effectively you are assigning a pointer into a local buffer to your data storage structure, and every line you read afterwards overwrites that buffer.
In addition, you lose the pointer to the memory you initially allocated for date.
I suggest you remove the a priori allocation of date and allocate the required memory once you know the required length. Alternatively, if you are sure of the maximum length, you can make the date member an array.
So you either have to allocate new memory and copy the tokenized data, or, if you made date a fixed-size array, just copy the tokenized data.
The first variant would look like this:
char * tok = strtok(line_of_data, ",");
day_data[i][n].date = malloc(strlen(tok)+1);
strcpy(day_data[i][n].date, tok);
(+ remove the pre allocation of the date member)
or the second variant:
change data to
typedef struct daily_data {
char date[20];
float close;
int vol;
}data;
and the processing code looks like this:
char * tok = strtok(line_of_data, ",");
strcpy(day_data[i][n].date, tok);
(+ (of course) remove the pre allocation of the date member)
You also should in any case add error handling if the tokenized string exceeds the max length or the format of the lines does not match the expectation (missing delimiters, wrong/invalid number(formats), ...).

C : Realloc doesn't work with dynamic double pointer array

I am facing some issues regarding a realloc with a double pointer dynamic array.
What I would like to perform is to add 2 pointers of type Flight* inside the array schedule of type Flight **.
For that, I am relying on the function add_flight in the Functions.c file.
This function asks the user for the airline and flight number and stores them in a new Flight* f. If schedule is NULL (no flight added yet), it allocates memory for the newly created flight; otherwise it reallocs schedule in order to add the new flight.
Main.c file:
int main() {
int choice = 1;
Flight** schedule = NULL;
printf("---AIRPORT MANAGER---");
schedule = add_flight(schedule);
printf("\n%s : %d\n", (*schedule)->airline, (*schedule)->flightNumber);
schedule = add_flight(schedule);
printf("\n%s : %d\n", (*schedule + 1)->airline, (*schedule)->flightNumber);
return 0;
}
Functions.c file :
#include "Functions.h"
void mygets(char* s, int maxLength) {
fflush(stdout);
if (fgets(s, maxLength, stdin) != NULL) {
size_t lastIndex = strlen(s) - 1;
if (s[lastIndex] == '\n')
s[lastIndex] = '\0';
}
}
void flush() {
char buffer;
while ((buffer = getchar()) != EOF && buffer != '\n');
}
Flight** add_flight(Flight** schedule) {
Flight* f;
char buffer[100];
if ((f = (Flight*)malloc(sizeof(Flight*))) == NULL) {
exit(1);
}
printf("\n\n---FLIGHT CREATION---");
printf("\nAirline: ");
mygets(buffer, sizeof(buffer));
if ((f->airline = _strdup(buffer)) == NULL) {
exit(1);
}
memset(buffer, 0, 100);
printf("\nFlight number: ");
scanf("%d", &f->flightNumber);
flush();
if (schedule == NULL) {
if ((schedule = malloc(sizeof(Flight*))) == NULL) {
exit(1);
}
*schedule = f;
}
else {
int numberFlights = ((sizeof(*schedule)) / 4) + 1;
if ((schedule = realloc(schedule, numberFlights * sizeof(Flight*))) == NULL) {
exit(1);
}
*(schedule + numberFlights -1) = f;
}
return schedule;
}
The issue comes when the second call of add_flight is performed in the main.c
In the add_flight function, the data are indeed stored in the new Flight* f and then the else statement is considered: the variable numberFlights gets the value 2. However, the realloc doesn't work, the schedule is not enlarged and thus there is still only the first flight stored inside this schedule array. I can't figure out why the second flight is not added inside the schedule.
Can someone explain to me why this realloc fails?
Thanks for your help :)
The sizeof operator is evaluated at compile time. It cannot be used to determine the size of a dynamically allocated array.
C imposes the burden of keeping track of the actual size of an array onto the programmer. You could keep a separate count variable, but because the actual array and its size belong together, it is useful to store them alongside each other in a struct:
typedef struct Flight Flight;
typedef struct Flights Flights;
struct Flight {
char airline[4];
int number;
char dest[4];
};
struct Flights {
Flight *flight;
int count;
};
Instead of operating on the array, operate on the struct:
void add_flight(Flights *fl,
const char *airline, int number, const char *dest)
{
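int n = fl->count; // index of the new element
// Grow through a temporary so the old block is not lost if realloc fails
Flight *tmp = realloc(fl->flight, (n + 1) * sizeof(*tmp));
if (tmp == NULL) {
perror("realloc");
exit(EXIT_FAILURE);
}
fl->flight = tmp;
fl->count = n + 1;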
snprintf(fl->flight[n].airline, 4, "%s", airline);
snprintf(fl->flight[n].dest, 4, "%s", dest);
fl->flight[n].number = number;
}
Initialize the flights struct with NULL and a count of zero, and don't forget to release the used memory when you're done:
int main(void)
{
Flights fl = {NULL, 0};
add_flight(&fl, "AF", 5512, "CDG");
add_flight(&fl, "AA", 1100, "ATL");
add_flight(&fl, "LH", 6537, "FRA");
add_flight(&fl, "BA", 8821, "LHR");
add_flight(&fl, "IB", 1081, "EZE");
print_flights(&fl);
free(fl.flight);
return 0;
}
Some observations:
There is no need to distinguish between adding the first and subsequent flights, because realloc(NULL, size) behaves exactly like malloc(size).
It is not very efficient to reallocate the memory for each added item. Instead, you pick a suitable initial array size like 4 or 8, then double the size when you hit the limit. That means that the allocated size and the count may differ and you need an additional memsize field in your flights struct.
The code above relies on manual initialization and destruction. Usually, you will write "constructor" and "destructor" functions to do that for you.

Most memory-efficient way to read & store list of strings in C

I'd like to know what's the most memory efficient way to read & store a list of strings in C.
Each string may have a different length, so pre-allocating a big 2D array would be wasteful.
I also want to avoid a separate malloc for each string, as there may be many strings.
The strings will be read from a large buffer into this list data-structure I'm asking about.
Is it possible to store all strings separately with a single allocation of exactly the right size?
One idea I have is to store them contiguously in a buffer, then have a char * array pointing to the different parts in the buffer, which will have '\0's in it to delimit. I'm hoping there's a better way though.
struct list {
char *index[32];
char buf[];
};
The data-structure and strings will be strictly read-only.
Here's a mildly efficient format, assuming you know the length of all the strings in advance:
|| total size | string 1 | string 2 | ........ | string N | len(string N) | ... | len(string 2) | len(string 1) ||
You can store the lengths either in fixed-width integers or in variable-width integers, but the point is that you can jump to the end and scan all the lengths relatively efficiently, and from the length sum you can compute the offset of the string. You know when you reached the last string when there is no remaining space.
You can create your single buffer and store them contiguously, expanding the buffer as needed by using realloc(). But then you would need a second array to store string positions and maybe realloc() it as well, so I might simply create a dynamically allocated array and malloc() each string separately.
Find the number and total-length of all strings:
int num = 0;
int len = 0;
char* string = GetNextString(input);
while (string)
{
num += 1;
len += strlen(string);
string = GetNextString(input);
}
Rewind(input);
Then, allocate the following two buffers:
int* indexes = malloc(num*sizeof(int));
char* strings = malloc((num+len)*sizeof(char));
Finally, fill these two buffers:
int index = 0;
for (int i=0; i<num; i++)
{
indexes[i] = index;
string = GetNextString(input);
strcpy(strings+index,string);
index += strlen(string)+1;
}
After that, you can simply use &strings[indexes[i]] (or strings + indexes[i]) in order to access the ith string.
The fastest and most memory-efficient way is a two-pass solution. In the first pass you calculate the total size of all strings, then you allocate the whole memory block. In the second pass you read all strings using large buffers.
You can create a pointer array for the strings and calculate the difference between adjacent pointers to get the string sizes. This way you save the null byte as an end marker.
Here is a complete example:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
typedef struct StringMap
{
char *data;
char **ptr;
long cPos;
} StringMap;
void initStringMap(StringMap *stringMap, long numberOfStrings, long totalCharacters)
{
stringMap->data = (char*)malloc(sizeof(char)*(totalCharacters+1));
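// endString() runs once per line plus once at end of input and writes
// ptr[cPos+1] each time, so numberOfStrings+3 slots are needed
stringMap->ptr = (char**)malloc(sizeof(char*)*(numberOfStrings+3));
memset(stringMap->ptr, 0, sizeof(char*)*(numberOfStrings+3));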
stringMap->ptr[0] = stringMap->data;
stringMap->ptr[1] = stringMap->data;
stringMap->cPos = 0;
}
void extendString(StringMap *stringMap, char *str, size_t size)
{
memcpy(stringMap->ptr[stringMap->cPos+1], str, size);
stringMap->ptr[stringMap->cPos+1] += size;
}
void endString(StringMap *stringMap)
{
stringMap->cPos++;
stringMap->ptr[stringMap->cPos+1] = stringMap->ptr[stringMap->cPos];
}
long numberOfStringsInStringMap(StringMap *stringMap)
{
return stringMap->cPos;
}
size_t stringSizeInStringMap(StringMap *stringMap, long index)
{
return stringMap->ptr[index+1] - stringMap->ptr[index];
}
char* stringinStringMap(StringMap *stringMap, long index)
{
return stringMap->ptr[index];
}
void freeStringMap(StringMap *stringMap)
{
free(stringMap->data);
free(stringMap->ptr);
}
int main()
{
// The interesting values
long numberOfStrings = 0;
long totalCharacters = 0;
// Scan the input for required information
FILE *fd = fopen("/path/to/large/textfile.txt", "r");
if (fd == NULL) {
perror("fopen");
return EXIT_FAILURE;
}
int bufferSize = 4096;
char *readBuffer = (char*)malloc(sizeof(char)*bufferSize);
int currentStringLength = 0;
size_t readBytes; // fread returns size_t
while ((readBytes = fread(readBuffer, sizeof(char), bufferSize, fd))>0) {
for (int i = 0; i < readBytes; ++i) {
const char c = readBuffer[i];
if (c != '\n') {
++currentStringLength;
} else {
++numberOfStrings;
totalCharacters += currentStringLength;
currentStringLength = 0;
}
}
}
// A final line without a trailing newline still has to be counted
if (currentStringLength > 0) {
++numberOfStrings;
totalCharacters += currentStringLength;
}
// Display the found results
printf("Found %ld strings with total of %ld bytes\n", numberOfStrings, totalCharacters);
// Allocate the memory for the resource
StringMap stringMap;
initStringMap(&stringMap, numberOfStrings, totalCharacters);
// read all strings
rewind(fd);
while ((readBytes = fread(readBuffer, sizeof(char), bufferSize, fd))>0) {
char *stringStart = readBuffer;
for (int i = 0; i < readBytes; ++i) {
const char c = readBuffer[i];
if (c == '\n') {
extendString(&stringMap, stringStart, &readBuffer[i]-stringStart);
endString(&stringMap);
stringStart = &readBuffer[i+1];
}
}
if (stringStart < &readBuffer[readBytes]) {
extendString(&stringMap, stringStart, &readBuffer[readBytes]-stringStart);
}
}
endString(&stringMap);
fclose(fd);
// Ok read the list
numberOfStrings = numberOfStringsInStringMap(&stringMap);
printf("Number of strings in map: %ld\n", numberOfStrings);
for (long i = 0; i < numberOfStrings; ++i) {
size_t stringSize = stringSizeInStringMap(&stringMap, i);
char *buffer = (char*)malloc(stringSize+1);
memcpy(buffer, stringinStringMap(&stringMap, i), stringSize);
buffer[stringSize] = '\0'; // terminate after the last copied character
printf("string %05ld size=%8zu : %s\n", i, stringSize, buffer);
free(buffer);
}
// free the resource
freeStringMap(&stringMap);
}
This example reads a very large text file, splits it into lines and creates an array with a string per line. It only needs two malloc calls: one for the pointer array and one for the string block.
If it's strictly read-only as you've described, you can store the entire list of strings and their offsets in a single chunk of memory and read the whole thing with a single read.
The first sizeof(long) bytes stores the number of strings, n. The next n longs store the offsets into each string from the start of the string buffer which starts at position (n+1)*sizeof(long). You don't have to store the trailing zero for each string, but if you do, you can access each string with &str_buffer[offset[i]]. If you don't store the trailing '\0' then you would have to copy into a temporary buffer and append it yourself.

Initializing an infinite number of char **

I'm making a raytracing engine in C using the minilibX library.
I want to be able to read in a .conf file the configuration for the scene to display:
For example:
(Az#Az 117)cat universe.conf
#randomcomment
obj:eye:x:y:z
light:sun:100
light:moon:test
The number of objects can vary between 1 and the infinite.
So far, I'm reading the file, copying each line one by one into a char **tab, and allocating according to the number of objects found, like this:
void open_file(int fd, struct s_img *m)
{
int i;
char *s;
int curs_obj;
int curs_light;
i = 0;
curs_light = 0;
curs_obj = 0;
while (s = get_next_line(fd))
{
i = i + 1;
if (s[0] == 'l')
{
m->lights[curs_light] = s;
curs_light = curs_light + 1;
}
else if (s[0] == 'o')
{
m->objs[curs_obj] = s;
curs_obj = curs_obj + 1;
}
else if (s[0] != '#')
{
show_error(i, s);
stop_parsing(m);
}
}
Now, I want to be able to store each piece of information from each tab[i] in a new char **tab, one for each object, using ':' as the separator.
So I need to initialize and malloc an undetermined number of char **tab. How can I do that?
(Ps: I hope my code and my english are good enough for you to understand. And I'm using only the very basic function, like read, write, open, malloc... and I'm re-building everything else, like printf, get_line, and so on)
You can't allocate an indeterminate amount of memory; malloc doesn't support it. What you can do is to allocate enough memory for now and revise that later:
size_t buffer = 10; /* capacity in elements, not bytes */
char **tab = malloc(buffer * sizeof *tab);
//...
if (indexOfObjectToCreate >= buffer) {
buffer *= 2;
tab = realloc(tab, buffer * sizeof *tab);
}
I'd use an alternative approach (as this is C, not C++) and simply allocate large buffers as we go:
char *my_malloc(size_t n) {
static size_t space_left = 0;
static char *base = NULL;
if (base==NULL || space_left < n) base=malloc(space_left=BIG_N); /* assumes n <= BIG_N */
base +=n; space_left -= n; return base-n;
}
Disclaimer: I've omitted the garbage collection stuff and testing return values and all safety measures to keep the routine short.
Another way to think about this is to read the file into a large enough malloc'ed array (you can check the size with ftell), scan the buffer, replace delimiters, line feeds, etc. with NUL characters, and remember the starting locations of the keywords.
