I need a struct declaration that contains a single string, which I can then make an array of (so basically an array of pointers to strings). I know how to make an array of structs that each contain a string:
typedef char line_t[MAX_INPUT + 1];
typedef struct {
line_t text;
} lines;
lines *arrayoflines;
arrayoflines = (char *)calloc(MAX_INPUT + 1, sizeof(char));
No issue here. However, what if the number of characters in the string that needs to be stored goes past the bounds of MAX_INPUT? I feel I'd need to realloc char line_t[MAX_INPUT + 1] and that is what I have absolutely no idea how to do.
Edit: Seems like some thought that the number of strings in the array of strings was the issue. I meant reallocating for the length of the string that can be stored in each element in the array of strings.
It is better to allocate memory for each line separately. That is, instead of an array of fixed-size arrays line_t[MAX_INPUT + 1], define an array of char pointers: char **arrayoflines.
You have a limit for the maximum line length defined for your array, so I assume no line will ever exceed it (you should check your input in case this assumption is wrong). You can read a shorter line into a line_t (terminating the string with a null character, '\0'), but you will not recover the unused space in memory. If you don't expect many lines, your fixed-buffer approach will work, i.e., you don't need to realloc.
However, if you need to be efficient with space, maybe because you will read millions of lines, you should allocate memory for each line. In other words, you might consider changing line_t to a char* instead of char[MAX_INPUT+1]. Both can be indexed the same way in your code, but with a pointer the size is no longer fixed at compile time.
typedef char* line_t;
line_t *arrayoflines;
arrayoflines = calloc(1, sizeof(line_t));
arrayoflines[0] = calloc(MAX_INPUT + 1, sizeof(char));
// If you learn the size after reading into the buffer, and it's smaller,
// realloc the buffer. Be sure to include space for the terminating '\0'
// and be sure it is set.
arrayoflines[0] = realloc(arrayoflines[0], (new_smaller_count + 1) * sizeof(char));
arrayoflines[0][new_smaller_count] = '\0';
If you cannot ensure that all inputs will be no longer than MAX_INPUT, you have to switch to dynamic allocation instead.
To do that, declare:
typedef struct {
char * text;
} lines;
lines *arrayoflines;
arrayoflines = calloc(MAX_LINES, sizeof(lines));
Then, for every element n, allocate THE_DESIRED_INPUT_SIZE bytes for that particular element:
arrayoflines[n].text=calloc(THE_DESIRED_INPUT_SIZE, sizeof(char));
where n is in the range [0, MAX_LINES-1].
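To address the edit in the question (growing the space for one string rather than the number of strings), here is a minimal sketch. It assumes a hypothetical struct dyn_line that stores the current capacity next to the text pointer, and a hypothetical helper dyn_line_append; neither name comes from the original code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of the `lines` struct: text is heap-allocated and
   capacity records how many bytes are currently reserved for it. */
typedef struct {
    char  *text;
    size_t capacity;
} dyn_line;

/* Append src to line->text, growing the buffer when it is too small.
   Returns 0 on success, -1 if realloc fails (the line is left unchanged). */
int dyn_line_append(dyn_line *line, const char *src)
{
    size_t old_len = line->text ? strlen(line->text) : 0;
    size_t src_len = strlen(src);
    size_t needed  = old_len + src_len + 1;      /* +1 for the '\0' */

    if (needed > line->capacity) {
        size_t new_cap = line->capacity ? line->capacity : 16;
        while (new_cap < needed)
            new_cap *= 2;                        /* grow geometrically */
        char *p = realloc(line->text, new_cap);
        if (p == NULL)
            return -1;
        line->text = p;
        line->capacity = new_cap;
    }
    memcpy(line->text + old_len, src, src_len + 1);  /* copies the '\0' too */
    return 0;
}
```

Start each element as { NULL, 0 }, call dyn_line_append as input arrives, and free(line.text) when the line is no longer needed.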
I am trying to write a function to convert a text file into a CSV file.
The input file has 3 lines with space-delimited entries. I have to find a way to read a line into a string and transform the three lines from the input file to three columns in a CSV file.
The files look like this :
Jake Ali Maria
24 23 43
Montreal Johannesburg Sydney
And I have to transform it into something like this:
Jake, 24, Montreal
...etc
I figured I could create a char **line variable that would hold three references to three separate char arrays, one for each of the three lines of the input file. I.e., my goal is to have *(line+i) store the i+1'th line of the file.
I wanted to avoid hardcoding char array sizes, such as
char line1 [999];
fgets(line1, 999, file);
so I wrote a while loop to fgets pieces of a line into a small buffer array of predetermined size, and then strcat and realloc memory as necessary to store the line as a string, with *(line+i) as a pointer to the string, where i is 0 for the first line, 1 for the second, etc.
Here is the problematic code:
#include <stdio.h>
#include<stdlib.h>
#include<string.h>
#define CHUNK 10
char** getLines (const char * filename){
FILE *file = fopen(filename, "rt");
char **lines = (char ** ) calloc(3, sizeof(char*));
char buffer[CHUNK];
for(int i = 0; i < 3; i++){
int lineLength = 0;
int bufferLength = 0;
*(lines+i) = NULL;
do{
fgets(buffer, CHUNK, file);
bufferLength = strlen(buffer);
lineLength += bufferLength;
*(lines+i) = (char*) realloc(*(lines+i), (lineLength +1)*sizeof(char));
strcat(*(lines+i), buffer);
}while(bufferLength ==CHUNK-1);
}
puts(*(lines+0));
puts(*(lines+1));
puts(*(lines+2));
fclose(file);
}
void load_and_convert(const char* filename){
char ** lines = getLines(filename);
}
int main(){
const char* filename = "demo.txt";
load_and_convert(filename);
}
This works as expected only for i=0. However, going through this with GDB, I see that I get a realloc(): invalid pointer error. The buffer loads fine, and it only crashes when I call 'realloc' in the for loop for i=1, when I get to the second line.
I managed to store the strings like I wanted in a small example I did to try to see what was going on, but the inputs were all on the same line. Maybe this has to do with fgets reading from a new line?
I would really appreciate some help with this, I've been stuck all day.
Thanks a lot!
***edit
I tried, as suggested, to use calloc instead of malloc to initialize the variable **lines, but I still have the same issue. I have added the modifications to the original code I uploaded.
***edit
After deleting the file and recompiling, the above now seems to work. Thank you to everyone for helping me out!
You allocate line (which is a misnomer since it's not a single line), which is a pointer to three char*s. You never initialize the contents of line (that is, you never make any of those three char*s point anywhere). Consequently, when you do realloc(*(line + i), ...), the first argument is uninitialized garbage.
To use realloc to do an initial memory allocation, its first argument must be a null pointer. You should explicitly initialize each element of line to NULL first.
Additionally, *(line+i) = (char *)realloc(*(line+i), ...) is still bad because if realloc fails to allocate memory, it will return a null pointer, clobber *(line + i), and leak the old pointer. You instead should split it into separate steps:
char* p = realloc(line[i], ...);
if (p == NULL) {
// Handle failure somehow.
exit(1);
}
line[i] = p;
A few more notes:
In C, you should avoid casting the result of malloc/realloc/calloc. It's not necessary since C allows implicit conversion from void* to other pointer types, and the explicit cast could mask an error where you accidentally omit #include <stdlib.h>.
sizeof(char) is, by definition, 1 byte.
When you're allocating memory, it's safer to get into a habit of using T* p = malloc(n * sizeof *p); instead of T* p = malloc(n * sizeof (T));. That way if the type of p ever changes, you won't silently be allocating the wrong amount of memory if you neglect to update the malloc (or realloc or calloc) call.
Here, you have to zero your array of pointers (for example by using calloc()),
char **line = (char**)malloc(sizeof(char*)*3); //allocate space for three char* pointers
otherwise the reallocs
*(line+i) = (char *)realloc(*(line+i), (inputLength+1)*sizeof(char)); //+1 for the empty character
use an uninitialized pointer, leading to undefined behaviour.
That it works with i=0 is pure coincidence and is a typical pitfall when encountering UB.
Furthermore, when using strcat(), you have to make sure that the first parameter is already a zero-terminated string! This is not the case here, since at the first iteration, realloc(NULL, ...); leaves you with an uninitialized buffer. This can lead to strcat() reading and writing past the end of your allocated buffer and corrupting the heap. A possible fix is to use strcpy() instead of strcat() (this should even be more efficient here):
do{
    fgets(buffer, CHUNK, file);
    bufferLength = strlen(buffer);
    lines[i] = realloc(lines[i], (lineLength + bufferLength + 1));
    strcpy(lines[i]+lineLength, buffer);
    lineLength += bufferLength;
}while(bufferLength == CHUNK-1);
The check bufferLength == CHUNK-1 will not do what you want if the line (including the newline) is exactly CHUNK-1 bytes long. A better check might be while (buffer[bufferLength-1] != '\n').
By the way, line[i] is far more readable than *(line+i) (which is semantically identical).
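Putting the points from these answers together (start from a null pointer, keep the realloc result in a temporary, copy at the known offset instead of calling strcat, and stop on the newline rather than on the chunk length), a line reader might be sketched as follows. The name read_line is hypothetical, and the sketch additionally checks the return value of fgets, which the snippets above do not.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 10

/* Read one line of any length from fp. Returns a malloc'd string without
   the trailing newline, or NULL at end of file or on allocation failure.
   The caller frees the result. */
char *read_line(FILE *fp)
{
    char buffer[CHUNK];
    char *line = NULL;
    size_t length = 0;

    while (fgets(buffer, sizeof buffer, fp) != NULL) {
        size_t chunk_len = strlen(buffer);
        char *p = realloc(line, length + chunk_len + 1);
        if (p == NULL) {
            free(line);              /* do not leak the partial line */
            return NULL;
        }
        line = p;
        memcpy(line + length, buffer, chunk_len + 1);  /* copies the '\0' too */
        length += chunk_len;
        if (length > 0 && line[length - 1] == '\n') {
            line[length - 1] = '\0';                   /* strip the newline */
            break;
        }
    }
    return line;
}
```

Because the loop ends on the newline itself, a line that is exactly CHUNK-1 bytes long (newline included) does not trigger an extra read.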
I was wondering is it possible to create one endless array which can store endlessly long strings?
So what I exactly mean is, I want to create a function which gets i strings of length n. I want to input infinitely many strings into the program, each of which can be infinitely many characters long!
void endless(int i){
//store user input on char array i times
}
To achieve that I need malloc, which I would normally use like this:
string = malloc(sizeof(char));
But how would that work for, let's say, 5 or 10 arrays, or even an endless stream of arrays? Or is this not possible?
Edit:
I do know memory is not endless. What I mean is, if it were infinite, how would you try to achieve it? Or maybe just allocate memory until all memory is used?
Edit 2:
So I played around a little and this came out:
void endless (char* array[], int numbersOfArrays){
int j;
//allocate memory
for (j = 0; j < numbersOfArrays; j++){
array[j] = (char *) malloc(1024*1024*1024);
}
//scan strings
for (j = 0; j < numbersOfArrays; j++){
scanf("%s",array[j]);
array[j] = realloc(array[j],strlen(array[j]+1));
}
//print strings
for (j = 0; j < numbersOfArrays; j++){
printf("%s\n",array[j]);
}
}
However, this isn't working. Maybe I got the realloc part terribly wrong?
The memory is not infinite, thus you cannot.
I mean the physical memory in a computer has its limits.
malloc() will fail and allocate no memory when your program requests too much memory:
If the function failed to allocate the requested block of memory, a null pointer is returned.
Assuming that memory is infinite, then I would create an SxN 2D array, where S is the number of strings and N the longest length of the strings you got, but obviously there are many ways to do this! ;)
Another way would be to have a simple linked list (I have one in List (C) if you need one), where every node would have a char pointer and that pointer would eventually host a string.
You can define a maximum length and assume that none of your strings will exceed it. Otherwise, you could allocate one huge 1D char array to hold each new string as it is read, use strlen() to find the actual length of the string, and then dynamically allocate an array of exactly the size needed: that length + 1 for the null terminator.
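The "read into a big scratch buffer, then keep an exact-size copy" idea can be sketched like this. The helper name copy_exact is hypothetical. Note that the size is strlen(s) + 1, not strlen(s + 1); the latter skips the first character and leaves the copy two bytes too small for any non-empty string.

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap copy of scratch that is exactly large enough to hold it,
   or NULL if the allocation fails. The caller frees the result. */
char *copy_exact(const char *scratch)
{
    size_t len = strlen(scratch);
    char *copy = malloc(len + 1);        /* +1 for the terminating '\0' */
    if (copy != NULL)
        memcpy(copy, scratch, len + 1);
    return copy;
}
```

The scratch buffer can then be reused for the next input string, so only one large allocation is ever needed.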
Here is a toy example program that asks the user to enter some strings. Memory is allocated for the strings in the get_string() function, then pointers to the strings are added to an array in the add_string() function, which also allocates memory for array storage. You can add as many strings of arbitrary length as you want, until your computer runs out of memory, at which point you will probably segfault because there are no checks on whether the memory allocations are successful. But that would take an awful lot of typing.
I guess the important point here is that there are two allocation steps: one for the strings and one for the array that stores the pointers to the strings. If you add a string literal to the storage array, you don't need to allocate for it. But if you add a string that is unknown at compile time (like user input), then you have to dynamically allocate memory for it.
Edit:
If anyone tried to run the original code listed below, they might have encountered some bizarre behavior for long strings. Specifically, they could be truncated and terminated with a mystery character. This was a result of the fact that the original code did not handle the input of an empty line properly. I did test it for a very long string, and it seemed to work. I think that I just got "lucky." Also, there was a tiny (1 byte) memory leak. It turned out that I forgot to free the memory pointed to from newstring, which held a single '\0' character upon exit. Thanks, Valgrind!
This all could have been avoided from the start if I had passed a NULL back from the get_string() function instead of an empty string to indicate an empty line of input. Lesson learned? The source code below has been fixed, NULL now indicates an empty line of input, and all is well.
#include <stdio.h>
#include <stdlib.h>
char * get_string(void);
char ** add_string(char *str, char **arr, int num_strings);
int main(void)
{
char *newstring;
char **string_storage;
int i, num = 0;
string_storage = NULL;
puts("Enter some strings (empty line to quit):");
while ((newstring = get_string()) != NULL) {
string_storage = add_string(newstring, string_storage, num);
++num;
}
puts("You entered:");
for (i = 0; i < num; i++)
puts(string_storage[i]);
/* Free allocated memory */
for (i = 0; i < num; i++)
free(string_storage[i]);
free(string_storage);
return 0;
}
char * get_string(void)
{
int ch; /* int, not char, so that EOF can be told apart from valid characters */
int num = 0;
char *newstring;
newstring = NULL;
while ((ch = getchar()) != '\n' && ch != EOF) {
++num;
newstring = realloc(newstring, (num + 1) * sizeof(char));
newstring[num - 1] = ch;
}
if (num > 0)
newstring[num] = '\0';
return newstring;
}
char ** add_string(char *str, char **arr, int num_strings)
{
++num_strings;
arr = realloc(arr, num_strings * (sizeof(char *)));
arr[num_strings - 1] = str;
return arr;
}
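As noted above, the toy program does not check whether its allocations succeed. A hedged sketch of a checked variant of add_string() follows; the name add_string_checked and its pointer-based signature are assumptions for illustration, not part of the original program.

```c
#include <stdlib.h>

/* Checked variant of add_string(). On success, stores str as the new last
   element, increments *num and returns 0. On failure, returns -1 and leaves
   *arr and *num untouched, so the caller can still free what it has. */
int add_string_checked(char ***arr, size_t *num, char *str)
{
    char **grown = realloc(*arr, (*num + 1) * sizeof **arr);
    if (grown == NULL)
        return -1;
    grown[*num] = str;
    *arr = grown;
    *num += 1;
    return 0;
}
```

Assigning the result of realloc to a temporary first is what keeps the old array reachable if the call fails.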
I was wondering is it possible to create one endless array which can store endlessly long strings?
Memory can't be infinite, so the answer is no. Even if you had a very large memory, you would need a processor that could address that huge memory space. There is a limit on the amount of dynamic memory that can be allocated by malloc and on the amount of static memory (allocated at compile time) that can be allocated. A malloc call will return NULL if the heap has no suitable block of the size you requested.
Assuming that the memory available to you is very large relative to the space required by your input strings, so that you will never run out, you can store your input strings using a 2-dimensional array.
C does not really have multi-dimensional arrays, but there are several ways to simulate them. You can use a (dynamically allocated) array of pointers to (dynamically allocated) arrays. This is used mostly when the array bounds are not known until runtime.
Alternatively, you can allocate a global two-dimensional array of sufficient length and width. However, static allocation is not a good idea for storing input strings of arbitrary size: most of the memory will go unused.
Also, the C programming language doesn't have a string data type. You can simulate a string using a null-terminated array of characters. So, to dynamically allocate a character array in C, we should use malloc as shown below:
char *cstr = malloc((MAX_CHARACTERS + 1)*sizeof(char));
Here, MAX_CHARACTERS represents the maximum number of characters that can be stored in your cstr array. The +1 is added to allocate a space for null character if MAX_CHARACTERS are stored in your string.
In a program I am writing, I made a Tokenize function that looks like this:
TokenizerT *Tokenize(TokenizerT *str) {
TokenizerT *tok;
*tok->array = malloc(sizeof(TokenizerT));
char * arr = malloc(sizeof(50));
const char *s = str->input_strng;
int i = 0;
char *ds = malloc(strlen(s) + 1);
strcpy(ds, s);
*tok->array[i] = strtok(ds, " ");
while(*tok->array[i]) {
*tok->array[++i] = strtok(NULL, " ");
}
free(ds);
return tok;
}
where TokenizerT is defined as:
struct TokenizerT_ {
char * input_strng;
int count;
char **array[];
};
So what I am trying to do is create smaller tokens out of a large token that I already created. I had issues returning an array so I made array part of the TokenizerT struct so I can access it by doing tok->array. I am getting no errors when I build the program, but when I try to print the tokens I get issues.
TokenizerT *ans;
TokenizerT *a = Tokenize(tkstr);
char ** ab = a->array;
ans = TKCreate(ab[0]);
printf("%s", ans->input_strng);
TKCreate works because I use it to print argv, but when I try to print ab it does not work. I figured it would be like argv and so would work as well. If someone can help me it would be greatly appreciated. Thank you.
Creating the Tokenizer
I'm going to go out on a limb, and guess that the intent of:
TokenizerT *tok;
*tok->array = malloc(sizeof(TokenizerT));
char * arr = malloc(sizeof(50));
was to dynamically allocate a single TokenizerT with the capacity to contain 49 strings and a NULL endmarker. arr is not used anywhere in the code, and tok is never given a value; it seems to make more sense if the values are each shifted one statement up, and corrected:
// Note: I use 'sizeof *tok' instead of naming the type because that's
// my style; it allows me to easily change the type of the variable
// being assigned to. I leave out the parentheses because
// that makes sure that I don't provide a type.
// Not everyone likes this convention, but it has worked pretty
// well for me over the years. If you prefer, you could just as
// well use sizeof(TokenizerT).
TokenizerT *tok = malloc(sizeof *tok);
// (See the third section of the answer for why this is not *tok->array)
tok->array = malloc(50 * sizeof *tok->array);
(tok->array is not a great name. I would have used tok->argv since you are apparently trying to produce an argument vector, and that's the conventional name for one. In that case, tok->count would probably be tok->argc, but I don't know what your intention for that member is since you never use it.)
Filling in the argument vector
strtok will overwrite (some) bytes in the character string it is given, so it is entirely correct to create a copy (here ds), and your code to do so is correct. But note that all of the pointers returned by strtok are pointers to characters in the copy. So when you call free(ds), you free the storage occupied by all of those tokens, which means that your new freshly-created TokenizerT, which you are just about to return to an unsuspecting caller, is full of dangling pointers. So that will never do; you need to avoid freeing those strings until the argument vector is no longer needed.
But that leads to another problem: how will the string be freed? You don't save the value of ds, and it is possible that the first token returned by strtok does not start at the beginning of ds. (That will happen if the first character in the string is a space character.) And if you don't have a pointer to the very beginning of the allocated storage, you cannot free the storage.
The TokenizerT struct
char is a character (usually a byte). char* is a pointer to a character, which is usually (but not necessarily) a pointer to the beginning of a NUL-terminated string. char** is a pointer to a character pointer, which is usually (but not necessarily) the first character pointer in an array of character pointers.
So what is char** array[]? (Note the trailing []). "Obviously", it's an array of unspecified length of char**. Because the length of the array is not specified, it is an "incomplete type". Using an incomplete array type as the last element in a struct is allowed by modern C, but it requires you to know what you're doing. If you use sizeof(TokenizerT), you'll end up with the size of the struct without the incomplete type; that is, as though the size of the array had been 0 (although that's technically illegal).
At any rate, that wasn't what you wanted. What you wanted was a simple char**, which is the type of an argument vector. (It's not the same as char*[] but both of those pointers can be indexed by an integer i to return the ith string in the vector, so it's probably good enough.)
That's not all that's wrong with this code, but it's a good start at fixing it. Good luck.
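One way the three points above could fit together is sketched below, under assumptions of my own: a hypothetical struct token_vec (not the asker's TokenizerT) keeps the strtok copy alongside a plain char** vector, so the copy can be freed once the tokens are no longer needed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical replacement for TokenizerT. */
typedef struct {
    char  *buffer;  /* private copy of the input; the tokens point into it */
    char **argv;    /* NULL-terminated vector of tokens */
    int    argc;    /* number of tokens, not counting the NULL */
} token_vec;

/* Split input on spaces. Returns NULL if an allocation fails. */
token_vec *token_vec_create(const char *input)
{
    size_t len = strlen(input);
    token_vec *tv = malloc(sizeof *tv);
    if (tv == NULL)
        return NULL;
    tv->buffer = malloc(len + 1);
    /* A string of len bytes holds at most len/2 + 1 tokens; one more slot
       is reserved for the terminating NULL. */
    tv->argv = malloc((len / 2 + 2) * sizeof *tv->argv);
    if (tv->buffer == NULL || tv->argv == NULL) {
        free(tv->buffer);
        free(tv->argv);
        free(tv);
        return NULL;
    }
    memcpy(tv->buffer, input, len + 1);
    tv->argc = 0;
    for (char *t = strtok(tv->buffer, " "); t != NULL; t = strtok(NULL, " "))
        tv->argv[tv->argc++] = t;
    tv->argv[tv->argc] = NULL;
    return tv;
}

/* Release the vector and the buffer the tokens point into. */
void token_vec_destroy(token_vec *tv)
{
    if (tv == NULL)
        return;
    free(tv->argv);
    free(tv->buffer);
    free(tv);
}
```

Keeping buffer in the struct answers the "how will the string be freed?" problem: the start of the allocation is saved even when the first token does not begin at the first byte.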
I'm fairly new to C; been at it for 3 weeks in a class. I am having a bit of trouble with pointers, and am sure there is probably an easy fix. So basically, this program is supposed to read a word from an input file, store it in an array of pointers with memory allocation, print the word and the normalized form of the word (irrelevant process), and then reallocate the space so that the pointer array will grow as more words are inputted. However, I am having a bit of trouble getting the words to print and the array to reallocate (I currently have it set to a fixed size just to troubleshoot the whole printing aspect). Please let me know if there is something wrong with my variable declarations, or if I am just making a stupid mistake (I am sure it is probably a combination of the two). Again, I'm very new to C, so I apologize if this is an easy question.
char * word_regular[100];
char * word_norm[100];
int main (int argc, char * argv[])
{
if (argc != 2){
printf("You have not entered a valid number of files.\n");
exit(1);
}
FILE * f_in = fopen(argv[1],"r");
int i = 0;
char word[512];
char norm_word[512];
while(fscanf(f_in, "%s", word) != EOF) {
if (is_valid_entry(word)) {
word_regular[i] = malloc(sizeof(char) * strlen(word) + 1);
strcpy(word_regular[i],word);
printf("%s\n",*word_regular[i]);
word_norm[i] = malloc(sizeof(char) * strlen(norm_word) + 1);
normalize(word, norm_word);
strcpy(word_norm[i],norm_word);
printf("%s\n", *word_norm[i]);
i++;
Some problems with your current code (ignoring the need for a dynamic size rather than a fixed one, since you already said you are using the fixed size to debug):
printf("%s\n",*word_regular[i]);
%s takes a char * for printing, so it should be
printf("%s\n",word_regular[i]);
For the second printf the same applies: word_norm[i] is already a char *, so you should simply use
printf("%s\n", word_norm[i]);
(If you instead wanted to print the local buffer norm_word starting from its ith index, you would pass &norm_word[i].)
Update:
A quick tip: pay attention to whether you are copying the terminating \0 along with your strings. API calls such as strlen will run past the end of a string that is not null-terminated and crash (or, worse, fail silently).
The problem with your printf call is that you pass a char (*word_regular[i], *word_norm[i]) instead of char * (word_regular[i], word_norm[i]) when trying to print a string.
If you want to dynamically grow the array, you need to dynamically allocate it in the first place, so instead of declaring arrays of pointers:
char * word_regular[100];
char * word_norm[100];
You need to declare pointers to pointers:
char ** word_regular;
char ** word_norm;
Allocate an initial buffer for them (in a function, main for example):
word_regular = malloc(sizeof(char *) * INITIAL_AMOUNT);
Then reallocate them as needed.
word_regular = realloc(word_regular, sizeof(char *) * new_amount);
You will need to keep track of the amount of pointers in the arrays, and free them properly of course...
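A minimal sketch of that bookkeeping, assuming a hypothetical word_list struct that stores the count and the allocated capacity next to the pointer array and doubles the capacity whenever it fills up. All names here are illustrative.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable list of heap-allocated words. */
typedef struct {
    char  **items;
    size_t  count;     /* number of words stored */
    size_t  capacity;  /* number of pointer slots allocated */
} word_list;

/* Store a copy of word. Returns 0 on success, -1 on allocation failure. */
int word_list_add(word_list *list, const char *word)
{
    if (list->count == list->capacity) {
        size_t new_cap = list->capacity ? list->capacity * 2 : 4;
        char **p = realloc(list->items, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        list->items = p;
        list->capacity = new_cap;
    }
    size_t len = strlen(word);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, word, len + 1);
    list->items[list->count++] = copy;
    return 0;
}

/* Free every word, then the pointer array itself. */
void word_list_free(word_list *list)
{
    for (size_t i = 0; i < list->count; i++)
        free(list->items[i]);
    free(list->items);
    list->items = NULL;
    list->count = list->capacity = 0;
}
```

Start from word_list list = { NULL, 0, 0 }; one list would hold the regular words and a second one the normalized words.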
I have a string array in the form of char**.
I am struggling to find the length of that array:
typedef struct _stringArray
{
int (*Length)(char**);
char** (*Push)(char**, char*);
char** (*Pop)(char**, char*);
}StringArray;
StringArray* StringArray_Constructor(void)
{
StringArray* stringArray = (StringArray *)malloc(sizeof(StringArray));
stringArray->Push = StringArray_Push;
stringArray->Pop = StringArray_Pop;
}
char** StringArray_Push(char** array, char* string)
{
int size = 0; //how to find how many elements in the array object???
array = realloc(array, sizeof(char *) * (size + 1));
array[size] = string;
return array;
}
Any help would be greatly appreciated!
Thanks.
With C, you will have to keep track of this yourself.
There's no way you can infer the length of the array from the pointer alone; you would have to work it out dynamically. You have an array of strings (char**), so you have a pointer to the first string of the array. In C, every string must end with '\0', so you could try to "scan" through the strings: take the pointer to the first character, save it, then increment it until you reach a '\0'. The next address would be the first character of the next string, and so on.
But this has a huge flaw: memory is not as linear as it appears. Your first string may be allocated entirely at, e.g., address 0x0010101A and the next at 0xF0FF0001, so either you have a huge string at 0x0010101A or there is a bunch of data between them, and you cannot know whether that data is part of a string or not.
And that's why you need to maintain a counter of how many strings you have. :)
PS: since this number is never negative, you should use an unsigned type such as unsigned int for it.
You have a few options:
1) Pass a size parameter around which indicates the current size of your char **array.
2) Declare a structure which combines char **array with int array_size (really the same as #1).
3) If your array will always contain valid pointers (i.e. non-NULL) then create an extra element at the end which is always set to NULL. This acts as an array terminator, you can scan char **array looking for this terminating element:
int size;
for (size = 0; array[size] != NULL; size++);
// 'size' is number of valid entries in 'array'.
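Option 3 could be turned into a push function roughly as follows. This is only a sketch: the names string_array_length and string_array_push are hypothetical, and it assumes an empty array is represented by a null pointer.

```c
#include <stdlib.h>

/* Count the entries before the terminating NULL. A null array is empty. */
size_t string_array_length(char **array)
{
    size_t n = 0;
    if (array != NULL)
        while (array[n] != NULL)
            n++;
    return n;
}

/* Append string and keep the array NULL-terminated. Returns the (possibly
   moved) array, or NULL on failure, in which case the old array is still
   valid and unchanged. */
char **string_array_push(char **array, char *string)
{
    size_t n = string_array_length(array);
    char **grown = realloc(array, (n + 2) * sizeof *grown);  /* entry + NULL */
    if (grown == NULL)
        return NULL;
    grown[n] = string;
    grown[n + 1] = NULL;
    return grown;
}
```

The trade-off is that every push rescans the array to find its end; storing the count (options 1 and 2) avoids that.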