I'm trying to split a char* into an array of char* in C.
I'm used to programming in Java / PHP OO. I know several easy ways to do that in these languages, but in C... I'm totally lost. I often fight segfaults for hours x)
I'm using TinyXML and getting info from XML File.
Here's the struct where we find the array.
const int MAX_GATES = 64;
typedef struct {
char *name;
char *firstname;
char *date;
char *id;
char *gates[MAX_GATES];
} UserInfos;
And here's where I fill this struct :
UserInfos * infos = (UserInfos*)malloc(1024);
infos->firstname = (char*)malloc(256);
infos->name = (char*)malloc(128);
infos->id = (char*)malloc(128);
infos->date = (char*)malloc(128);
sprintf(infos->firstname, "%s", card->FirstChild("firstname")->FirstChild()->Value());
sprintf(infos->name, "%s", card->FirstChild("name")->FirstChild()->Value());
sprintf(infos->date, "%s", card->FirstChild("date")->FirstChild()->Value());
sprintf(infos->id, "%s", card->FirstChild("filename")->FirstChild()->Value());
////////////////////////
// Gates
char * gates = (char*) card->FirstChild("gates")->FirstChild()->Value();
//////////////////////////
The only problem is on 'gates'.
The input from the XML looks like "gate1/gate2/gate3" or is just blank sometimes.
I want gate1 to be in infos->gates[0] ; etc.
I want to be able to list the gates array afterwards..
I always have a segfault when I try.
Btw, I don't really know how to initialize this array of pointers. I always initialize all gates[i] to NULL, but it seems that I get a segfault when I do
for(int i=0;i
Thanks for all.
It's OK when I've only pointers but when String(char*) / Arrays / Pointers are mixed.. I can't manage =P
I saw too that we can use something like
int *myArray = calloc(NbOfRows, NbOfRows*sizeof(int));
Why should we declare an array like that.. ? x)
Thanks!
The problem that people frequently have with XML is that they assume all the elements are available. That's not always safe. Thus this statement:
sprintf(infos->firstname, "%s", card->FirstChild("firstname")->FirstChild()->Value());
isn't safe, because you don't actually know whether each of those
functions returns a valid object. You really need something
like the following (which is not optimized for speed: I don't
know the TinyXML type returned at each point, so instead of
storing each result once I call each function
multiple times):
if (card->FirstChild("firstname") &&
card->FirstChild("firstname")->FirstChild()) {
sprintf(infos->firstname, "%s", card->FirstChild("firstname")->FirstChild()->Value());
}
And then, to protect against buffer overflows from the data you should
really be doing:
if (card->FirstChild("firstname") &&
    card->FirstChild("firstname")->FirstChild()) {
    /* firstname is a char* obtained from malloc(256), so sizeof(infos->firstname)
       would only give the size of the pointer; use the allocated size instead */
    infos->firstname[256 - 1] = '\0';
    snprintf(infos->firstname, 256, "%s", card->FirstChild("firstname")->FirstChild()->Value());
}
Don't you just love error handling?
As to your other question:
I saw too that we can use something like int *myArray =
calloc(NbOfRows, NbOfRows*sizeof(int)); Why should we declare an array
like that.. ? x)
calloc initializes the resulting memory to 0, unlike malloc.
If you look above at where I set the end of the buffer to '\0' (which is
actually 0), that's because malloc returns a buffer with potentially
random (non-zero) data in it. calloc sets the entire buffer
to all 0s first, which is generally safer. Note that calloc takes an element count and an element size, so the call quoted above asks for room for NbOfRows * NbOfRows ints.
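The answer above does not show the split itself. Here is a minimal sketch of one way to fill the gates array, assuming the separator is always '/' and that every gate should be a separately allocated copy. split_gates and free_gates are hypothetical helper names, not part of TinyXML.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_GATES 64

/* Hypothetical helper: splits "gate1/gate2/gate3" on '/' into out[],
 * allocating a copy of each piece. Unused slots are set to NULL.
 * Returns the number of gates stored, or -1 if an allocation fails
 * (pieces stored so far can still be released with free_gates). */
int split_gates(const char *input, char *out[MAX_GATES])
{
    int count = 0;

    for (int i = 0; i < MAX_GATES; i++)
        out[i] = NULL;              /* so the array can be listed and freed safely */

    if (input == NULL)
        return 0;                   /* a missing value simply means no gates */

    while (*input != '\0' && count < MAX_GATES) {
        const char *end = strchr(input, '/');
        size_t len = end ? (size_t)(end - input) : strlen(input);

        if (len > 0) {              /* skip empty pieces such as in "a//b" */
            char *copy = malloc(len + 1);
            if (copy == NULL)
                return -1;
            memcpy(copy, input, len);
            copy[len] = '\0';
            out[count++] = copy;
        }
        if (end == NULL)
            break;                  /* that was the last piece */
        input = end + 1;            /* continue after the separator */
    }
    return count;
}

/* Frees every copy made by split_gates and resets the slots to NULL. */
void free_gates(char *out[MAX_GATES])
{
    for (int i = 0; i < MAX_GATES; i++) {
        free(out[i]);
        out[i] = NULL;
    }
}
```

With the struct from the question, the call would look like split_gates(gates, infos->gates), made only after checking that the TinyXML lookups returned non-null. Listing the gates afterwards is then a loop that stops at the first NULL slot.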
Hello and TIA for your help. As I am new to posting questions, I welcome any feedback on how this question has been asked. I have researched much on SO without finding what I was looking for.
I'm still working on it, and I'm not really good at C.
My purpose is extracting data from certain specific tags from a given XML and writing it to file. My issue arises because as I try to fill up the data struct I created for this purpose, at a certain point the realloc() function gives me a pointer to an address that's out of bounds.
If you look at this example
#include <stdio.h>
int main() {
char **arrayString = NULL;
char *testString;
testString = malloc(sizeof("1234567890123456789012345678901234567890123456789"));
strcpy(testString, "1234567890123456789012345678901234567890123456789");
int numElem = 0;
while (numElem < 50) {
numElem++;
arrayString = realloc(arrayString, numElem * sizeof(char**));
arrayString[numElem-1] = malloc(strlen(testString)+1);
strcpy(arrayString[numElem-1], testString);
}
printf("done\n");
return 0;
}
it does a similar but simplified version of what my code does. It basically tries to fill up the char** with C strings, but it segfaults. (Yes, I understand I am using strcpy and not its safer alternatives, but as far as I understand it copies up to and including the '\0', which is automatically included when you write a string between "", and that's all I need.)
I'll explain in more depth below.
In this code I make use of libxml2, but you don't need to know it to help me.
I have a custom struct declared this way:
struct List {
char key[24][15];
char **value[15];
int size[15];
};
struct List *list; //i've tried to make this static after reading that it could make a difference but to no avail
It is filled with the necessary key values. list->size[] is initialized with zeros, to keep track of how many values I've inserted in value.
value is declared this way because for each key I need an array of char* to store each and every value associated with it. (I thought this through, but it could be the wrong approach and I welcome suggestions, though that's not the purpose of the question.)
I loop through the XML file, and for each node I do a strcmp between the name of the node and each of my keys. When there is a match, the index of that key is used as an index into the value matrix. I then try to extend the allocated memory for the C string matrix and afterwards for the single char*.
The "broken" code follows, where
read is the index of the key mentioned above.
reader is the xmlNode
string contained the name of the xmlNode but is then freed, so consider it a new char*
list is the struct declared above
if (xmlTextReaderNodeType(reader) == 3 && read >= 0)
{
/* pull out the node value */
xmlChar *value;
value = xmlTextReaderValue(reader);
if (value != NULL) {
free(string);
string=strdup(value);
/*increment array size */
list->size[read]++;
/* allocate char** */
list->value[read]=realloc(list->value[read],list->size[read] * sizeof(char**));
if (list->value[read] == NULL)
return 16;
/*allocate string (char*) memory */
list->value[read][list->size[read]-1] = realloc(list->value[read][list->size[read]-1], sizeof(char*)*sizeof(string));
if (list->value[read][list->size[read]-1] == NULL)
return 16;
/*write string in list */
strcpy(list->value[read][list->size[read]-1], string);
}
/*free memory*/
xmlFree(value);
}
xmlFree(name);
free(string);
I'd expect this to allocate the char**, and then the char*, but after a few iterations of this code (which is a function wrapped in a while loop) I get a segfault.
Analyzing this with gdb (I'm not an expert with it, I just learned it on the fly), I noticed that the code seems to work as expected for 15 iterations. At the 16th iteration, after the size is incremented, list->value[read][list->size[read]-1] points to 0x51, marked as address out of bounds. The realloc only brings it to 0x3730006c6d782e31, still marked as out of bounds. I would expect it to point at the last allocated value.
Here is an image of that: https://imgur.com/a/FAHoidp
How can I properly allocate the needed memory without going out of bounds?
Your code has quite a few problems:
You are not including all the appropriate headers. How did you get this to compile? If you are using malloc and realloc, you need to #include <stdlib.h>. If you are using strlen and strcpy, you need to #include <string.h>.
Not really a mistake, but unless you are applying sizeof to a type itself you don't need the enclosing parentheses.
Stop using sizeof str to get the length of a string. The correct and safe approach is strlen(str)+1. If you apply sizeof to a pointer someday you will run into trouble.
Don't use sizeof(type) as argument to malloc, calloc or realloc. Instead, use sizeof *ptr. This will avoid your incorrect numElem * sizeof(char**) and instead replace it with numElem * sizeof *arrayString, which correctly translates to numElem * sizeof(char*). This time, though, you were saved by the pure coincidence that sizeof(char**) == sizeof(char*), at least on GCC.
If you are dynamically allocating memory, you must also deallocate it manually when you no longer need it. Use free for this purpose: free(testString);, free on each arrayString[i], and finally free(arrayString);.
Not really a mistake, but if you want to cycle through elements, use a for loop, not a while loop. This way your intention is known by every reader.
This code compiles fine on GCC:
#include <stdio.h>  //printf
#include <stdlib.h> //malloc, realloc, free
#include <string.h> //strlen, strcpy
int main(void)
{
    char** arrayString = NULL;
    char* testString;
    testString = malloc(strlen("1234567890123456789012345678901234567890123456789") + 1);
    strcpy(testString, "1234567890123456789012345678901234567890123456789");
    //50 iterations, like the original while loop
    for (int numElem = 1; numElem <= 50; numElem++)
    {
        arrayString = realloc(arrayString, numElem * sizeof *arrayString);
        arrayString[numElem - 1] = malloc(strlen(testString) + 1);
        strcpy(arrayString[numElem - 1], testString);
    }
    //free every string before freeing the array that holds the pointers
    for (int i = 0; i < 50; i++)
        free(arrayString[i]);
    free(arrayString);
    free(testString);
    printf("done\n");
    return 0;
}
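For the libxml2 snippet in the question, one likely culprit (an assumption, since the full program isn't shown) is that the slot added by the first realloc holds an indeterminate pointer, which is then itself passed to realloc. The byte count sizeof(char*)*sizeof(string) is also unrelated to the length of the string. The sketch below appends a copy of a string to a growing array without ever reading the new slot. append_string is a hypothetical helper, and the array pointer must start out as NULL with a count of 0.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: appends a copy of str to the array *arr, which
 * currently holds *count strings. *arr must be NULL when *count is 0.
 * Returns 0 on success, -1 on allocation failure (array left unchanged). */
int append_string(char ***arr, size_t *count, const char *str)
{
    /* Copy the string first: strlen + 1 bytes, not sizeof of a pointer. */
    size_t len = strlen(str) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, str, len);

    /* Grow through a temporary so the old block is not lost on failure. */
    char **tmp = realloc(*arr, (*count + 1) * sizeof **arr);
    if (tmp == NULL) {
        free(copy);
        return -1;
    }

    tmp[*count] = copy;   /* the new slot is only assigned, never read */
    *arr = tmp;
    (*count)++;
    return 0;
}
```

In the question's struct this would correspond to calling append_string(&list->value[read], &count, string) with a size_t counter, provided every list->value[i] was set to NULL at start-up.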
Let's consider the following piece of code:
int len = 100;
char *buf = (char*)malloc(sizeof(char)*len);
printf("Appended: %s\n",struct_to_string(some_struct,buf,len));
Someone allocated an amount of memory in order to fill it with string data. The problem is that the string data taken from some_struct could be ANY length. So what I want to achieve is to make the struct_to_string function do the following:
Do not allocate any memory itself (so buf has to be allocated outside of the function and passed in)
Inside the struct_to_string I want to do something like:
char* struct_to_string(const struct type* some_struct, char* buf, int len) {
//it will be more like pseudo code to show the idea :)
char var1_name[] = "int l1";
buf += var1_name + " = " + some_struct->l1;
//when l1 is a int or some non char, I need to cast it
char var2_name[] = "bool t1";
buf += var2_name + " = " + some_struct->t1;
// buf+= (I mean an appending function) should check whether there is room left in buf;
//if there is not, it should fill buf with
//as many characters as possible (without writing past the end of the memory) and stop
//etc.
return buf;
}
Output should be like:
Appended: int l1 = 10 bool t1 = 20 //if there was good amount of memory allocated or
ex: Appended: int l1 = 10 bo //if there was not enough memory allocated
To sum up:
I need a function (or a couple of functions) that appends given strings to the base string without overwriting the base string;
does nothing when the base string's memory is full
I cannot use C++ libraries
Other things that I could ask but that are not so important right now:
Is there a way (in C) to iterate through a structure's member list to get the names, or at least to get the values without the names? (for example, iterate through a structure like through an array ;d)
I do not normally use C, but for now I'm obligated to do, so I have very basic knowledge.
(sorry for my English)
Edit:
Good way to solve that problem is shown in post below: stackoverflow.com/a/2674354/2630520
I'd say all you need is the standard strncat function defined in the string.h header.
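One detail worth spelling out: the third argument of strncat is the maximum number of characters to append, not the size of the destination, so the remaining space has to be computed by the caller. A minimal sketch, where append_bounded is a hypothetical wrapper and buf is assumed to already hold a NUL-terminated string:

```c
#include <string.h>

/* Hypothetical helper: appends src to the NUL-terminated string in buf,
 * never using more than cap bytes in total (terminator included).
 * Whatever does not fit is silently dropped. Returns buf. */
char *append_bounded(char *buf, size_t cap, const char *src)
{
    size_t used = strlen(buf);

    if (used + 1 < cap) {
        /* strncat's count is the maximum number of characters to append,
         * excluding the terminator, which it always adds. */
        strncat(buf, src, cap - used - 1);
    }
    return buf;
}
```

Numbers still have to be converted to text before they can be appended this way, which strncat does not do for you.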
About the 'iterate through structure variable list' part, I'm not exactly sure what you mean. If you're talking about iterating over the structure's members, the short answer is: you can't introspect C structs for free.
You need to know beforehand what structure type you're using, so that the compiler knows at what offset in memory it can find each member of your struct. Otherwise it's just an array of bytes like any other.
Don't hesitate to ask if I wasn't clear enough or if you want more details.
Good luck.
So basically I did it like here: stackoverflow.com/a/2674354/2630520
int struct_to_string(const struct struct_type* struct_var, char* buf, const int len)
{
unsigned int length = 0;
unsigned int i;
length += snprintf(buf+length, len-length, "v0[%d]", struct_var->v0);
length += other_struct_to_string(struct_var->sub, buf+length, len-length);
length += snprintf(buf+length, len-length, "v2[%d]", struct_var->v2);
length += snprintf(buf+length, len-length, "v3[%d]", struct_var->v3);
....
return length;
}
snprintf writes as much as possible and discards everything left, so it was exactly what I was looking for.
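One caveat with accumulating the return value: on truncation, C99 snprintf returns the length the output would have had, so length can grow past len and len-length then wraps around in unsigned arithmetic. A sketch of a clamped append follows; append_fmt is a hypothetical helper, not something from the linked post.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: formats into buf starting at offset used and returns
 * the new offset, clamped so that it never exceeds cap - 1.
 * cap is the total size of buf and must be at least 1. */
size_t append_fmt(char *buf, size_t cap, size_t used, const char *fmt, ...)
{
    if (used + 1 >= cap)
        return used;                    /* buffer already full: do nothing */

    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + used, cap - used, fmt, ap);
    va_end(ap);

    if (n < 0) {
        buf[used] = '\0';               /* formatting error: keep the old text */
        return used;
    }
    if ((size_t)n >= cap - used)
        return cap - 1;                 /* output was truncated */
    return used + (size_t)n;
}
```

Each call in struct_to_string would then read length = append_fmt(buf, len, length, "v0[%d]", struct_var->v0); with length and len declared as size_t.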
Since I'm very new to C programming, I have a probably very simple problem.
I got a struct looking like this
typedef struct Vector{
int a;
int b;
int c;
}Vector;
Now I want to write an array of Vectors to a file. To achieve that, I thought of the following scheme of functions:
String createVectorString(Vector vec){
// (1)
}
String createVectorArrayString(Vector arr[]){
int i;
String arrayString;
for(i=0; i<sizeof(arr); i++){
//append createVectorString(arr[i]) to arrayString (2)
}
}
void writeInFile(Vector arr[]){
FILE *file;
file = fopen("sorted_vectors.txt", "a+");
fprintf(file, "%s", createVectorArrayString(arr));
fclose(file);
}
int main(void){
// create here my array of Vectors (this has already been made and is not part of the question)
// then call writeInFile
return 0;
}
My main problems are at (1), which also affects (2), since I have no clue how to work with strings in C (Eclipse says "Type "String" unknown", although I included <string.h>).
So I read at some point that converting an int to a string is possible with the function itoa().
As I understand it, I can simply do the following:
char buf[33];
int a = 5;
itoa(a, buf, 10);
However, I cannot get that to work, let alone figure out how to "paste" chars or ints into a string.
At point (1), I would like to create a string of the form (a,b,c), where a, b and c are the "fields" of my struct Vector.
At point (2), I would like to create a single string of the form (a1,b1,c1)\n(a2,b2,c2)\n...(an,bn,cn), where n is the number of Vectors in the array.
Is there a quick solution? Am I confusing Java's concept of Strings with C's?
Yes, you do confuse the concept of strings in Java and C.
The C strings are rather inconvenient to work with. They require dynamic memory allocation, and what is worse, corresponding deallocation (which is possible but tedious). In your case, it might be best to remove strings completely, and implement whatever you need without strings.
To write a vector directly to file:
Vector vec;
FILE* file = ...;
fprintf(file, "%d %d %d\n", vec.a, vec.b, vec.c);
To write an array of vectors, just do the above in a loop.
A string, in C, is just a null-terminated array of characters. It is generally declared as a char *, though if you have a fixed maximum length, and can allocate it on the stack or inline in a structure, it might be declared as char str[LENGTH].
One of the easiest ways to build a string out of a mix of characters and numbers is to use snprintf(). This is like printf(), but instead of printing to standard output, will print into a string (an array of char). Note that you need to allocate and pass in the buffer yourself; so you will either need to know the maximum length beforehand, or find out by trying to call snprintf(), finding out how many characters it would print, allocating an array of that size, and calling snprintf() again to actually print the result.
So if you have a vector of three integers, and want to build a string out of it, you could write:
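char *createVectorString(Vector vec){
    /* first pass: ask how many characters are needed (terminator not counted) */
    int count = snprintf(NULL, 0, "(%d,%d,%d)", vec.a, vec.b, vec.c);
    if (count < 0)
        return NULL;
    /* +1 for the terminating '\0' */
    char *result = malloc((size_t)count + 1);
    if (result == NULL)
        return NULL;
    /* second pass: actually write the string */
    count = snprintf(result, (size_t)count + 1, "(%d,%d,%d)", vec.a, vec.b, vec.c);
    if (count < 0) {
        free(result);
        return NULL;
    }
    return result;
}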
Note that because you called malloc() to allocate this buffer, you will need to call free() once you are done with it, to avoid a memory leak.
Note that snprintf() only returns the length that you need as of C99. Some compilers (such as older versions of MSVC) don't support C99, so they return -1 instead of the length that the string would have. In those cases, there may be another function that you can call to determine the size of buffer you need (in MSVC, it's _vscprintf), or you may need to just guess at a size and, if that doesn't work, allocate a buffer twice that size and try again, until it succeeds.
In short: yes, you are confusing Java Strings with C, where you do not have a standard string type. What is called a string is in reality a sequence of chars terminated by a char with value 0 (or '\0', if you want to be a purist).
The quickest solution is not to generate strings (and manually allocate all the memory), but rather to use fprintf with a FILE*. Instead of functions that create strings, write functions that write various things into a supplied FILE*, for example int writeVector(FILE* output, Vector v). That will be easier to begin with. I don't think all the gory details of the manual memory management required for constructing such strings are a good start.
(Note the return type of int in the proposed prototype; this is for error codes.)
Additionally, as one of the commenters noted, you misunderstand sizeof. For a real array, sizeof(arr) returns the size of all the elements of the array combined, in bytes (well, technically in chars, but it's a distinction you don't need to worry about right now). To get the number of elements in an array, you'd need to use sizeof(arr)/sizeof(arr[0]). But that does not work with your function argument, which is technically a pointer, despite the fancy syntax. Applying sizeof to a pointer returns the size of the pointer itself, not of the data it points to.
Which is why in C you would usually provide the size of an array in an extra function argument, like:
String createVectorArrayString(Vector arr[], size_t n)
or more in line with what I wrote above:
int writeVectorArray(FILE *output, Vector arr[], size_t n)
{
    int retcode = 0;
    size_t i;
    for (i = 0; i < n; ++i) {
        /* stop at the first vector that fails to write */
        if ( (retcode = writeVector(output, arr[i])) != 0)
            return retcode;
    }
    return 0; /* every vector was written */
}
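The loop above relies on writeVector, for which only the prototype was proposed. A minimal sketch of what it could look like, assuming the (a,b,c) output format asked for in the question and a return code of 1 on a write error. ARRAY_LEN is a hypothetical convenience macro for the sizeof(arr)/sizeof(arr[0]) idiom.

```c
#include <stdio.h>

typedef struct Vector {
    int a;
    int b;
    int c;
} Vector;

/* Number of elements in a real array. It must not be applied to a pointer
 * or to an array function parameter, where it gives a meaningless result. */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Sketch of the proposed writeVector: writes "(a,b,c)\n" and returns
 * 0 on success or a non-zero error code if fprintf reports a failure. */
int writeVector(FILE *output, Vector v)
{
    if (fprintf(output, "(%d,%d,%d)\n", v.a, v.b, v.c) < 0)
        return 1;
    return 0;
}
```

At the call site, where the array is still a real array, the element count can be computed with ARRAY_LEN(vectors) and passed as n.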
Yes, you are confusing Java Strings with C.
You can't pass arrays in C, only pointers to the first element.
sizeof (arr) where arr is a function argument is the size of the pointer.
You can't return a block scope String, only a pointer to a string. But pointers to local automatic variables become invalid when the function returns.
I'd write a loop more along these lines:
#define N 42
/* Typedef for Vector assumed somewhere.*/
Vector arr[N];
/* Fill arr[]. */
for (i = 0; i < N; ++i) {
fprintf (file, "arr[%d] = { a=%d, b=%d, c=%d }\n", i, arr[i].a, arr[i].b, arr[i].c);
}
It's the first time posting so I apologise for any confusion:
I am writing a function like this:
int myFunc(char* inputStr, int *argCTemp, char** argVTemp[]);
The purpose of my function is to take a copy of the input string (basically any user input) and then use strtok to convert it to tokens and populate an array via an array pointer (argV). When myFunc is finished, hopefully I have the argument count and array of strings from my inputStr string.
Here is an example of how I call it:
int main(int argc, char** argv[])
{
int argCTemp = -1;
char** argVTemp;
// 1 Do Stuff
// 2 Get input string from user
// 3 then call myfunc like this:
myFunc(inputStr, &argCTemp, &argVTemp);
// 4: I get garbage whenever I try to use "argVTemp[i]"
}
My questions: how should I best do this in a safe and consistent way? How do the pros do this?
I don't use malloc because:
I don't know the number of arguments or the length of each for my input (to dynamically allocate the space). I figured that's why I use pointers
since I declare it in the main function, I thought the pointers to/memory used by argCTemp and argVTemp would be fine/remain in scope even if they are on the stack.
I know when myFunc exits it invalidates any stack references it created, so that's why I sent it pointers from a calling function. Should I be using pointers and malloc and such or what?
Last thing: before myfunc exits, I check to see the values of argCTemp and argVTemp and they have valid content. I am setting argCtemp and argVtemp like this:
(*argCTemp) = argCount;
(*argVTemp)[0] = "foo";
and it seems to be working just fine BEFORE the function exits. Since I'm setting pointers somewhere else in memory, I'm confused why the reference is failing. I tried using malloc INSIDE myFunc when setting the pointers and it is still becoming garbage when myFunc ends and is read by the calling function.
I'm sorry if any of this is confusing and thank you in advance for any help.
Since you "don't know the number of arguments or the length of each" for your input, you can still use malloc. When your buffer is about to fill up, grow it with realloc.
A better way: you needn't store the whole input. Handling a line, a token or a block at a time is better. Just use a static array to store them. A hash may be better if your input is larger than 100 MB.
I'm sorry for my poor English.
You pass the function the address of an uninitialized pointer (the & in your call is only right if the function is meant to set the pointer itself). When the function evaluates (*argVTemp)[0], it follows that pointer to some random place, which is why you get garbage; you can also get a segmentation fault.
You can do one of two things.
Preallocate a large enough array, which can be static, for example
static char *arr[MAX_SIZE]; point argVTemp at it before the call. Or run over the input twice: count the tokens first, then use malloc.
You should also pass the maximum size, or use a constant, and make sure you don't go past it.
Let's say you have the number of tokens in int n; then
char **arr = malloc(sizeof(char *) * n);
creates an array of n pointers. Now you pass it to your populate function by calling
it with &arr, and use it like you did in your code,
for example (*argVTemp)[0] = ;.
(When the array is not needed any more, don't forget to free it by calling free(arr).)
Generally speaking, since you don't know how many tokens will be in the result you'll need to allocate the array dynamically using malloc(), realloc() and/or some equivalent. Alternatively you can have the caller pass in array along with the array's size and return an error indication if the array isn't large enough (I do this for simple command parsers on embedded systems where dynamic allocation isn't appropriate).
Here's an example that allocates the returned array in small increments:
static
char** myFunc_realloc( char** arr, size_t* elements)
{
enum {
allocation_chunk = 16
};
*elements += allocation_chunk;
char** tmp = (char**) realloc( arr, (*elements) * sizeof(char*));
if (!tmp) {
abort(); // or whatever error handling
}
return tmp;
}
void myFunc_free( char** argv)
{
free(argv);
}
int myFunc(char* inputStr, int *argCTemp, char** argVTemp[])
{
size_t argv_elements = 0;
size_t argv_used = 0;
char** argv_arr = NULL;
char* token = strtok( inputStr, " ");
while (token) {
if ((argv_used+1) >= argv_elements) {
// we need to realloc - the +1 is because we want an extra
// element for the NULL sentinel
argv_arr = myFunc_realloc( argv_arr, &argv_elements);
}
argv_arr[argv_used] = token;
++argv_used;
token = strtok( NULL, " ");
}
if ((argv_used+1) >= argv_elements) {
argv_arr = myFunc_realloc( argv_arr, &argv_elements);
}
argv_arr[argv_used] = NULL;
*argCTemp = argv_used;
*argVTemp = argv_arr;
return argv_used;
}
Some notes:
if an allocation fails, the program is terminated. You may need different error handling.
the passed in input string is 'corrupted'. This might not be an appropriate interface for your function (in general, I'd prefer that a function like this not destroy the input data).
the user of the function should call myFunc_free() to deallocate the returned array. Currently this is a simple wrapper for free(), but this gives you flexibility to do more sophisticated things (like allocating memory for the tokens so you don't have to destroy the input string).
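The alternative mentioned at the start of this answer, where the caller passes in the array together with its size, could be sketched as follows. tokenize_fixed is a hypothetical name, and like myFunc it lets strtok modify the input string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant: the caller owns argv_out, which has room for
 * max_args pointers. Tokens point into input_str, which strtok modifies.
 * Returns the token count, or -1 if the array is too small. */
int tokenize_fixed(char *input_str, char *argv_out[], size_t max_args)
{
    size_t used = 0;

    if (max_args == 0)
        return -1;                      /* no room even for the sentinel */

    for (char *tok = strtok(input_str, " "); tok != NULL; tok = strtok(NULL, " ")) {
        if (used + 1 >= max_args)       /* keep one slot for the NULL sentinel */
            return -1;
        argv_out[used++] = tok;
    }
    argv_out[used] = NULL;
    return (int)used;
}
```

Nothing is allocated here, so there is nothing to free; the price is a hard upper limit on the number of tokens, and the tokens are only valid as long as the input buffer is.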
I'm pretty new at C programming, and this type of thing keeps popping up. As a simple example, suppose I have a struct http_header with some char pointers:
struct http_header {
char* name;
char* value;
};
I want to fill an http_header where value is the string representation of an int. I "feel" like, semantically, I should be able to write a function that takes in an empty header pointer, a name string, and an int and fills out the header appropriately.
void fill_header(struct http_header *h, char* name, int value)
{
h->name = name;
char *value_str = malloc(100);
sprintf(value_str, "%d", value);
h->value = value_str;
}
int main(int argc, const char * argv[])
{
struct http_header h;
char *name = "Header Name";
int val = 42;
fill_header(&h, name, val);
...
free(h.value);
}
Here, the calling code reads exactly as my intent, but in this case I'm creating the value string dynamically, which means I'd have to free it later. That doesn't smell right to me; it seems like the caller then knows too much about the implementation of fill_header. And in actual implementations it may not be so easy to know what to free: consider filling an array of http_headers where only one of them needed to have its value malloced.
To get around this, I'd have to create the string beforehand:
void fill_header2(struct http_header *h, char* name, char *value_str)
{
h->name = name;
h->value = value_str;
}
int main(int argc, const char * argv[])
{
struct http_header h;
char *name = "Header Name";
int value = 42;
char value_str[100];
sprintf(value_str, "%d", value);
fill_header2(&h, name, value_str);
}
As this pattern continues down the chain of structures with pointers to other structures, I end up doing so much work in the top-level functions that the lower-level ones hardly seem worth it. Furthermore, I've essentially sacrificed the "fill a header with an int" idea which I set out to write in the first place. Am I missing something here? Is there some pattern or design choice that will make my life easier and keep my function calls expressing my intent?
P.S. Thanks to all at Stack Overflow for being the best professor I've ever had.
Well, I would go with the first approach (with a twist), and also provide a destroy function:
struct http_header *make_header(char *name, int value)
{
struct http_header *h = malloc(sizeof *h);
/* ... */
return h;
}
void destroy_header(struct http_header *h)
{
free(h->name);
free(h);
}
This way the caller doesn't have to know anything about http_header.
You might also get away with a version that leaves the main allocation (the struct itself) to the caller and does its own internal allocation. Then you would have to provide a clear_header which frees only what fill allocated. But this clear_header leaves you with a partially valid object.
I think your problem is simply that you are programming asymmetrically. You should once and for all decide who is responsible for the string inside your structure. Then you should have two functions, not only one, that should be called something like header_init and header_destroy.
For the init function I'd be a bit more careful. Check for a null pointer argument, and initialize your data structure completely, with something like *h = (struct http_header){ .name = name }. You never know whether you or somebody else will end up adding another field to your structure. This way, at least all the other fields are initialized to 0.
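A sketch of such a header_init / header_destroy pair, under the assumption that the header owns value and only borrows name. The answer leaves that ownership decision to you, so treat the choice made here as one possibility.

```c
#include <stdio.h>
#include <stdlib.h>

struct http_header {
    char *name;   /* borrowed: stays owned by the caller */
    char *value;  /* owned by the header: released by header_destroy */
};

/* Sketch: initializes *h completely. Returns 0 on success, -1 on failure. */
int header_init(struct http_header *h, char *name, int value)
{
    if (h == NULL)
        return -1;

    /* Compound literal: every field not named here is zero-initialized. */
    *h = (struct http_header){ .name = name };

    int len = snprintf(NULL, 0, "%d", value);   /* length without terminator */
    if (len < 0)
        return -1;
    h->value = malloc((size_t)len + 1);
    if (h->value == NULL)
        return -1;
    snprintf(h->value, (size_t)len + 1, "%d", value);
    return 0;
}

/* Releases what header_init allocated and leaves *h in a known state. */
void header_destroy(struct http_header *h)
{
    if (h == NULL)
        return;
    free(h->value);
    *h = (struct http_header){ 0 };
}
```

The caller then always pairs one header_init with one header_destroy and never needs to know which fields were allocated.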
If you are new to C programming, you might want to use Boehm's conservative garbage collector. Boehm's GC works very well in practice, and by using it systematically in your own code you can call GC_malloc instead of malloc and never bother about calling free or GC_free.
Hunting memory leaks in C (or even C++) code is often a headache. There are tools (like valgrind) which can help you, but you could decide to not bother by using Boehm's GC.
Garbage collection (and memory management) is a global property of a program, so if you use Boehm's GC you should decide that early.
The general solution to your problem is that of object ownership, as others have suggested. The simplest solution to your particular problem is, however, to use a char array for value, i.e., char value[12]. 2^32 has 10 decimal digits, +1 for the sign, +1 for the null terminator.
You should 1) ensure at compile time that int is not larger than 32 bits, 2) ensure that the value is within some acceptable range (HTTP status codes have only 3 digits) before calling sprintf, and 3) use snprintf.
So by using a fixed-size array inside the struct you get rid of the ownership problem, AND you use less memory.
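A minimal sketch of this variant, with the compile-time check done through the preprocessor. The struct layout follows the question; the #error guard is just one assumed way to enforce the 32-bit limit.

```c
#include <limits.h>
#include <stdio.h>

/* The size 12 below assumes int is at most 32 bits wide. */
#if INT_MAX > 2147483647
#error "value[12] is too small for int on this platform"
#endif

struct http_header {
    char *name;
    char value[12];   /* 10 digits + sign + terminating '\0' */
};

void fill_header(struct http_header *h, char *name, int value)
{
    h->name = name;
    /* snprintf never writes more than sizeof h->value bytes */
    snprintf(h->value, sizeof h->value, "%d", value);
}
```

Since value lives inside the struct, there is nothing to free, and copying the struct copies the text with it.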