I have a function that accepts a struct * pointer containing sensitive data (in a char array) as an argument (sort of a small library).
The two struct models are as follows:
struct struct1 {
char str[1024]; /* maybe even 4096 or 10KB+ */
size_t str_length;
};
struct struct2 {
char *str;
size_t str_length;
};
The test function is:
/* Read str_length bytes from the char array. */
void foo(struct struct1/struct2 *s) {
size_t i; /* same type as str_length, avoids a signed/unsigned comparison */
for (i = 0; i < s->str_length; i++) {
printf("%c\n", s->str[i]);
}
}
My concern is that, since str_length is an arbitrary value, someone could intentionally set it to cause a buffer overflow (admittedly that would take someone stupid enough to deliberately create a security flaw in their own program, but I feel I have to take such cases into account). With the struct1 model, however, I could check for a possible buffer overflow simply with:
if (s->str_length > sizeof(s->str)) {
/* ERROR */
}
The problem is that the array length is actually unknown at compile time. So I don't know whether to use a char * pointer (struct2 style, so no overflow check) or define a very big array (struct1), which would limit the maximum length (something I would like to avoid) and would allocate unnecessary space most of the time (which could be problematic in embedded systems with scarce memory, I suppose). I know I have to make a compromise. I'd personally use the struct2 model, but I'm not sure if it's a good choice security-wise.
Where does the user of your library get the struct2 instance to pass to the function from? I don't think he creates it by himself and then passes its address to your function, that would be a weird way to pass arguments. It is most likely returned from another function in your library, in which case you can make struct2 an opaque data type that the user cannot alter directly (or only in hacky ways):
/* in the header file */
typedef struct struct2_s struct2;
/* in the implementation file, where allocation is handled as well
* so you know str_length is set to the proper value.
*/
struct struct2_s {
char *str;
size_t str_length;
};
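A minimal sketch of the opaque-type idea described above. The function names secret_create, secret_length and secret_destroy are hypothetical and not part of the original library. In a real project the typedef and prototypes would sit in the header and the struct definition in the .c file; they share one listing here only to keep the example self-contained.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* --- would go in the public header --- */
typedef struct struct2_s struct2;   /* incomplete type: callers cannot reach the fields */
struct2 *secret_create(const char *data, size_t len);   /* hypothetical name */
size_t secret_length(const struct2 *s);                 /* hypothetical name */
void secret_destroy(struct2 *s);                        /* hypothetical name */

/* --- would go in the implementation file --- */
struct struct2_s {
    char *str;
    size_t str_length;
};

struct2 *secret_create(const char *data, size_t len)
{
    struct2 *s;
    if (data == NULL)
        return NULL;
    s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->str = malloc(len ? len : 1);   /* avoid a zero-size allocation */
    if (s->str == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->str, data, len);
    s->str_length = len;              /* set here only, so it always matches the allocation */
    return s;
}

size_t secret_length(const struct2 *s)
{
    return s ? s->str_length : 0;
}

void secret_destroy(struct2 *s)
{
    if (s == NULL)
        return;
    assert(s->str != NULL);           /* invariant established by secret_create */
    memset(s->str, 0, s->str_length); /* best-effort wipe; a compiler may optimize it away */
    free(s->str);
    free(s);
}
```

Because callers only ever hold a struct2 *, they have no supported way to set str_length to a value that disagrees with the buffer.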
Put the big array at the end:
struct struct1 {
anyType thisVar;
someType anotherVar;
size_t str_length;
char str[10240000];
};
Let the user malloc it to whatever 'real' size they wish. If they set 'str_length' wrong, well, there's not much you can do about it, no matter what you do:(
I looked at couple of instances wherein I see something like char fl[1] in the following code snippet. I am not able to guess what might possibly be the use of such construct.
struct test
{
int i;
double j;
char fl[1];
};
int main(int argc, char *argv[])
{
struct test a,b;
a.i=1;
a.j=12;
a.fl[0]='c';
b.i=2;
b.j=24;
memcpy(&(b.fl), "test1" , 6);
printf("%lu %lu\n", sizeof(a), sizeof(b));
printf("%s\n%s\n",a.fl,b.fl);
return 0;
}
output -
24 24
c<some junk characters here>
test1
It's called "the struct hack", and you can read about it at the C FAQ. The general idea is that you allocate more memory than necessary for the structure as listed, and then use the array at the end as if it had length greater than 1.
There's no need to use this hack anymore though, since it's been replaced by C99+ flexible array members.
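For comparison, a minimal sketch of the same struct written with a C99 flexible array member. The helper name test_new is hypothetical; the struct fields follow the question's struct test.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct test {
    int i;
    double j;
    char fl[];   /* C99 flexible array member: must be the last member */
};

/* Hypothetical helper: allocates the struct plus room for a copy of text. */
struct test *test_new(int i, double j, const char *text)
{
    size_t n;
    struct test *t;

    assert(text != NULL);
    n = strlen(text) + 1;          /* include the terminator */
    t = malloc(sizeof *t + n);     /* header plus n bytes for fl */
    if (t == NULL)
        return NULL;
    t->i = i;
    t->j = j;
    memcpy(t->fl, text, n);        /* stays inside the extra n bytes */
    return t;
}
```

A single free(t) releases both the fixed fields and the trailing array, since they come from one allocation.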
The idea usually is to have a name for variable-size data, like a packet read off a socket:
struct message {
uint16_t len; /* tells length of the message */
uint16_t type; /* tells type of the message */
char payload[1]; /* placeholder for message data */
};
Then you cast your buffer to such struct, and work with the data by indexing into the array member.
Note that the code you have written is overwriting memory that you shouldn't be touching. The memcpy() is writing more than one character into a one character array.
The use case for this is often more like this:
struct test *obj;
obj = malloc(sizeof(struct test) + 300); // 300 characters of space in the
// flexible member (the array).
obj->i = 3;
obj->j = 300;
snprintf(obj->fl, 300, "hello!");
I'm trying to split a char* to an array of char* in C.
I'm used to programming in Java / PHP OO. I know several easy ways to do that in these languages, but in C... I'm totally lost. I often fight segfaults for hours x)
I'm using TinyXML and getting info from XML File.
Here's the struct where we find the array.
const int MAX_GATES = 64;
typedef struct {
char *name;
char *firstname;
char *date;
char *id;
char *gates[MAX_GATES];
} UserInfos;
And here's where I fill this struct :
UserInfos * infos = (UserInfos*)malloc(1024);
infos->firstname = (char*)malloc(256);
infos->name = (char*)malloc(128);
infos->id = (char*)malloc(128);
infos->date = (char*)malloc(128);
sprintf(infos->firstname, "%s", card->FirstChild("firstname")->FirstChild()->Value());
sprintf(infos->name, "%s", card->FirstChild("name")->FirstChild()->Value());
sprintf(infos->date, "%s", card->FirstChild("date")->FirstChild()->Value());
sprintf(infos->id, "%s", card->FirstChild("filename")->FirstChild()->Value());
////////////////////////
// Gates
char * gates = (char*) card->FirstChild("gates")->FirstChild()->Value();
//////////////////////////
The only problem is on 'gates'.
The input from the XML looks like "gate1/gate2/gate3" or is sometimes just blank.
I want gate1 to be in infos->gates[0] ; etc.
I want to be able to list the gates array afterwards..
I always have a segfault when I try.
Btw, I don't really know how to initialize this array of pointers. I always initialize all gates[i] to NULL, but it seems that I get a segfault when I do
for(int i=0;i
Thanks for all.
It's OK when I've only pointers but when String(char*) / Arrays / Pointers are mixed.. I can't manage =P
I saw too that we can use something like
int *myArray = calloc(NbOfRows, NbOfRows*sizeof(int));
Why should we declare an array like that.. ? x)
Thanks!
The problem that people frequently have with XML is that they assume all the elements are available. That's not always safe. Thus this statement:
sprintf(infos->firstname, "%s", card->FirstChild("firstname")->FirstChild()->Value());
isn't safe, because you don't actually know whether each of those functions returns a valid object. You really need something like the following (which is not optimized for speed: I don't know the TinyXML type returned at each point, so I am calling each function multiple times rather than storing the results once):
if (card->FirstChild("firstname") &&
card->FirstChild("firstname")->FirstChild()) {
sprintf(infos->firstname, "%s", card->FirstChild("firstname")->FirstChild()->Value());
}
And then, to protect against buffer overflows from the data, you should really be doing:
if (card->FirstChild("firstname") &&
card->FirstChild("firstname")->FirstChild()) {
/* firstname is a pointer to a 256-byte malloc'd buffer, so sizeof(infos->firstname)
   would give the pointer size, not 256; pass the allocated size explicitly */
infos->firstname[256 - 1] = '\0';
snprintf(infos->firstname, 256, "%s", card->FirstChild("firstname")->FirstChild()->Value());
}
Don't you just love error handling?
As to your other question:
I saw too that we can use something like int *myArray =
calloc(NbOfRows, NbOfRows*sizeof(int)); Why should we declare an array
like that.. ? x)
calloc initializes the memory it returns to 0, unlike malloc. Where I set the end of the buffer to '\0' above (which is actually 0), that is because malloc returns a buffer with potentially random (non-zero) data in it. calloc sets the entire buffer to all 0s first, which is generally safer. Note that calloc(n, size) allocates n * size bytes, so the call quoted above reserves room for NbOfRows * NbOfRows ints.
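The splitting itself, which the question asks about, could look roughly like the sketch below. split_gates is a hypothetical helper, not a TinyXML function. It copies each piece, so it never writes into the buffer returned by Value(), and it sets every unused slot to NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_GATES 64

/* Hypothetical helper: splits "gate1/gate2/gate3" on '/' into out[].
 * Each piece is a fresh malloc'd copy; unused slots are set to NULL.
 * Returns the number of gates stored, or -1 on allocation failure. */
int split_gates(const char *input, char *out[MAX_GATES])
{
    int count = 0;
    int k;

    assert(out != NULL);
    for (k = 0; k < MAX_GATES; k++)
        out[k] = NULL;
    if (input == NULL)
        return 0;

    while (*input != '\0' && count < MAX_GATES) {
        size_t len = strcspn(input, "/");   /* length up to the next '/' or the end */
        if (len > 0) {                      /* skip empty pieces such as in "a//b" */
            char *piece = malloc(len + 1);
            if (piece == NULL) {
                while (count > 0) {         /* undo the copies made so far */
                    free(out[--count]);
                    out[count] = NULL;
                }
                return -1;
            }
            memcpy(piece, input, len);
            piece[len] = '\0';
            out[count++] = piece;
        }
        input += len;
        if (*input == '/')
            input++;                        /* step over the separator */
    }
    return count;
}
```

With this, split_gates(gates, infos->gates) fills the struct member, a blank input simply yields 0 gates, and a loop that stops at the first NULL entry (or at MAX_GATES) lists them. Each stored pointer has to be passed to free eventually.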
I'm pretty new at C programming, and this type of thing keeps popping up. As a simple example, suppose I have a struct http_header with some char pointers:
struct http_header {
char* name;
char* value;
};
I want to fill an http_header where value is the string representation of an int. I "feel" like, semantically, I should be able to write a function that takes in an empty header pointer, a name string, and an int and fills out the header appropriately.
void fill_header(struct http_header *h, char* name, int value)
{
h->name = name;
char *value_str = malloc(100);
sprintf(value_str, "%d", value);
h->value = value_str;
}
int main(int argc, const char * argv[])
{
struct http_header h;
char *name = "Header Name";
int val = 42;
fill_header(&h, name, val);
...
free(h.value);
}
Here, the calling code reads exactly as my intent, but in this case I'm creating the value string dynamically, which means I'd have to free it later. That doesn't smell right to me; it seems like the caller then knows too much about the implementation of fill_header. And in actual implementations it may not be so easy to know what to free: consider filling an array of http_headers where only one of them needed to have its value malloced.
To get around this, I'd have to create the string beforehand:
void fill_header2(struct http_header *h, char* name, char *value_str)
{
h->name = name;
h->value = value_str;
}
int main(int argc, const char * argv[])
{
struct http_header h;
char *name = "Header Name";
int value = 42;
char value_str[100];
sprintf(value_str, "%d", value);
fill_header2(&h, name, value_str);
}
As this pattern continues down the chain of structures with pointers to other structures, I end up doing so much work in top-level functions that the lower-level ones seem hardly worth it. Furthermore, I've essentially sacrificed the "fill a header with an int" idea which I set out to write in the first place. Am I missing something here? Is there some pattern or design choice that will make my life easier and keep my function calls expressing my intent?
P.S. Thanks to all at Stack Overflow for being the best professor I've ever had.
Well, I would go with the first approach (with a twist), and also provide a destroy function:
struct http_header *make_header(char *name, int value)
{
struct http_header *h = malloc(sizeof *h);
/* ... */
return h;
}
void destroy_header(struct http_header *h)
{
free(h->name);
free(h);
}
This way the caller doesn't have to know anything about http_header.
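One way the elided body might be filled in, as a sketch only: here make_header takes private copies of both strings (an assumption, the answer does not say so), so that destroy_header can free everything without knowing where the strings came from.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct http_header {
    char *name;
    char *value;
};

/* Allocates the header and private copies of both strings,
 * so destroy_header can free everything unconditionally. */
struct http_header *make_header(const char *name, int value)
{
    struct http_header *h;
    size_t name_len;
    int value_len;

    assert(name != NULL);
    h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;

    name_len = strlen(name) + 1;
    value_len = snprintf(NULL, 0, "%d", value);   /* C99: returns the length needed */
    h->name = malloc(name_len);
    h->value = value_len < 0 ? NULL : malloc((size_t)value_len + 1);
    if (h->name == NULL || h->value == NULL) {
        free(h->name);                            /* free(NULL) is a no-op */
        free(h->value);
        free(h);
        return NULL;
    }
    memcpy(h->name, name, name_len);
    snprintf(h->value, (size_t)value_len + 1, "%d", value);
    return h;
}

void destroy_header(struct http_header *h)
{
    if (h == NULL)
        return;
    free(h->name);
    free(h->value);
    free(h);
}
```

The cost of this design is one extra copy of the name; the benefit is that every header is released the same way, whichever fields happened to need allocation.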
You might also get away with a version that leaves the main allocation (the struct itself) to the caller and does its own internal allocation. Then you would have to provide a clear_header which frees only what fill allocated. But this clear_header leaves you with a partially valid object.
I think your problem is simply that you are programming asymmetrically. You should once and for all decide who is responsible for the string inside your structure. Then you should have two functions, not only one, that should be called something like header_init and header_destroy.
For the init function I'd be a bit more careful. Check for a null pointer argument, and initialize your data structure completely, with something like *h = (struct http_header){ .name = name }. You never know whether you or somebody else will end up adding another field to your structure, and this way all other fields are at least initialized to 0.
If you are new to C programming, you might want to use Boehm's conservative garbage collector. Boehm's GC works very well in practice, and by using it systematically in your own code you could use GC_malloc instead of malloc and never bother about calling free or GC_free.
Hunting memory leaks in C (or even C++) code is often a headache. There are tools (like valgrind) which can help you, but you could decide to not bother by using Boehm's GC.
Garbage collection (and memory management) is a global property of a program, so if you use Boehm's GC you should decide that early.
The general solution to your problem is that of object ownership, as others have suggested. The simplest solution to your particular problem is, however, to use a char array for value, i.e., char value[12]. 2^32 has 10 decimal digits, +1 for the sign, +1 for the null-terminator.
You should 1) check at compile time that int is not larger than 32 bits, 2) check that the value is within some acceptable range (HTTP codes have only 3 digits) before calling sprintf, and 3) use snprintf.
So by using a fixed-size array inside the struct you get rid of the ownership problem, AND you use less memory.
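A minimal sketch of the fixed-array variant, assuming the name string outlives the header (the struct only borrows it). The typedef is one C99-compatible way to get the compile-time check; C11's _Static_assert would do the same job.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Compile-time check: the array size becomes -1, and compilation fails,
 * if int is wider than 32 bits. */
typedef char int_fits_in_12_chars[(sizeof(int) * CHAR_BIT <= 32) ? 1 : -1];

struct http_header {
    const char *name;   /* not owned: the caller keeps the string alive */
    char value[12];     /* 10 digits + sign + terminator for a 32-bit int */
};

/* Returns 0 on success, -1 if the text did not fit in value. */
int fill_header(struct http_header *h, const char *name, int value)
{
    int n;

    assert(h != NULL);
    h->name = name;
    /* value is a real array here, so sizeof gives its full size (12) */
    n = snprintf(h->value, sizeof h->value, "%d", value);
    return (n >= 0 && (size_t)n < sizeof h->value) ? 0 : -1;
}
```

Nothing is allocated, so the caller has nothing to free.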
Is it possible to have pointers to data variables? I know I can have, say, pointers to strings e.g. char *str[n] and I can perform a 'for' loop over those pointers to retrieve the strings ... str[i] where i is the index counter.
If I have some data e.g.
char var1;
int var2;
char var3;
and I wanted to get data from stdin I might use 3 separate calls to scanf()- just an example - to populate these variables.
Can I have 'an array of pointers to data' e.g. void *data[] where data[0] = char var1, data[1] = int var2 and data[2] = char var3, so that I could then use a single call to scanf() in a 'for' loop to populate these variables? (I'm assuming the type would have to be void to cater for the different types in the array)
I don't really recommend this, but here's the implementation you describe:
char var1;
int var2;
char var3;
void *vars[3];
char *types[3];
vars[0]=&var1; types[0]="%c";
vars[1]=&var2; types[1]="%d";
vars[2]=&var3; types[2]="%c";
for (int i=0;i<3;i++)
{
scanf(types[i],vars[i]);
}
You need the array of types so that scanf knows what it should expect.
However, this procedure is extremely unsafe. By discarding any type-safety, you invite crashes from malformed input. Also, if you misconfigure types[] then you will almost certainly crash, or see unexpected results.
By the time you've set up the arrays, have you really saved any code?
There are plenty of answers here that will allow you to use either a type-safe C++ solution, or as others have recommended, calling scanf() explicitly.
You certainly could have such a void *data[] array. You wouldn't be able to read those in via scanf, though, as you need a different format specifier for the different data types.
If you wanted to do this, you could iterate over an array of
struct dataType
{
void *data;
char *format_specifier;
};
or somesuch. However, I doubt this would be a good idea - you probably want to also prompt for each value, so you'd add another char *prompt to that struct, and you'll probably need other things later as well.
I suspect the code you'd end up writing to do this would be much more effort than simply scanf-ing n times, even for quite large n.
The problem is that this void * array would be dealing with datatypes of different sizes. For this problem, you'd probably want to use a struct and maintain an array of those instead. Or you could just put your data in as a byte array, but then you'd have to know how to "chop it up" properly.
You could have an array of void*, so array[0] = &var1, etc.
To illustrate the problem..
int main(int argc, char* argv[])
{
char var1 = 'a';
int var2 = 42;
char var3 = 'b';
void* stuff[3] = {0};
stuff[0] = &var1;
stuff[1] = &var2;
stuff[2] = &var3;
// Can't really use the array of void*'s in a loop because the
// types aren't known...
assert( var1 == (char)(*(char*)stuff[0]));
assert( var2 == (int)(*(int*)stuff[1]));
assert( var3 == (char)(*(char*)stuff[2]));
return 0;
}
Not directly, no. If you're able to use C++ in this situation, the closest you could do would be to wrap each variable in an object (either something like a variant_t or some templated, polymorphic solution). For instance, I believe you can do something like this:
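class BaseType
{
public:
virtual ~BaseType() {}
virtual void DoScanf() = 0; // pure virtual: each SubType supplies its own
};
template<typename TYPE>
class SubType : public BaseType
{
public:
SubType(TYPE& data) : m_data(data) {}
TYPE& m_data; // non-const reference, so DoScanf can store into the wrapped variable
virtual void DoScanf()
{
// Your code here
}
};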
int num1;
char char1;
SubType<int> num1Wrapper(num1);
SubType<char> char1Wrapper(char1);
// You can then make a list/vector/array of BaseTypes and iterate over those.
Looks like you are trying to implement an equivalent of C++ templates in C. :D That is exactly what I am trying to do for one of my projects. I think my case is less confusing, as I am using only one data type (a project-specific structure), so my array does not mix data types.
One thing to try: use a union of the various data types you intend to use. I think reading it should not be a problem: when your read function goes through the union, each member has a definite C type, so it will be able to read it. What I mean is to follow the same concept we usually use to check the endianness of a machine.
These are a few ideas I am working on myself; I should be able to complete this in a day or two, and only then can I tell you whether it is really possible. Good luck if you are implementing this for a project. Do share your solution, maybe here; I might find some answers too. :)
I am using the array of void pointers.
If you want an array of variables that can be "anything" and you are working in C, then I think you want something like a struct that contains a typeid and a union.
Like this, maybe: (Note, quick example, not compile tested, not a complete program)
struct anything_t {
union {
int i;
double d;
char short_str[7]; /* 7 because with this and typeid makes 8 */
char *str; /* must malloc or strdup to use this */
}; /* pretty sure anonymous union like this works, not compiled */
char type; /* char because it is small, last because of alignment */
};
char *anything_to_str(char *str, size_t len, const struct anything_t *any)
{
switch(any->type) {
case 1: snprintf(str, len, "%d", any->i); break;
case 2: snprintf(str, len, "%f", any->d); break;
case 3: snprintf(str, len, "%.7s", any->short_str); break; /* careful, not necessarily 0-terminated */
case 4: snprintf(str, len, "%s", any->str); break;
default: abort(); break;
}
return str;
}
And I forgot to add the scanf part I intended:
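/* Returns the number of items read successfully. */
size_t scanf_anything(struct anything_t *inputs, size_t count)
{
size_t input_i;
struct anything_t *any;
int ok;
for(input_i=0; input_i<count; ++input_i) {
any = inputs + input_i;
switch(any->type) {
case 1: ok = scanf(" %d", &any->i); break;
case 2: ok = scanf(" %lf", &any->d); break;
case 3: ok = scanf(" %6s", any->short_str); break; /* at most 6 chars plus the terminator */
case 4: ok = scanf(" %ms", &any->str); break; /* POSIX.1-2008 'm' modifier, not ISO C: scanf mallocs the buffer, caller frees it */
default: abort(); break;
}
if (ok != 1) break; /* stop at the first item that fails to parse */
}
return input_i;
}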
Formally, all the solutions presented here that use void* have the same problem. Passing a void* as a variadic argument (as with scanf) where another pointer type is expected puts you in the "undefined behavior" domain (for the same reason, you should cast NULL to the correct type when passing it as a variadic argument).
On most common platforms, I see no reason for a compiler to take advantage of that. So it will probably work until a compiler maker finds out that there is a test in SPEC where it allows a 0.000001% improvement :-)
On some exotic platforms, taking advantage of that is the obvious thing to do (the rule was put in for them, after all; they are mostly of historical interest, but I would not bet anything about embedded platforms; I know, using scanf on embedded platforms can be considered strange).
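A minimal sketch of one way around the formal problem: keep a type tag next to each void* and convert the pointer back to its real type before it reaches the variadic call. The names slot, slot_type and read_slots are hypothetical, and the sketch parses from a string with sscanf rather than from stdin, only so that it is easy to try out.

```c
#include <assert.h>
#include <stdio.h>

enum slot_type { SLOT_CHAR, SLOT_INT };   /* hypothetical names */

struct slot {
    enum slot_type type;
    void *target;   /* must really point at an object of the tagged type */
};

/* Reads one value per slot from text. Each void * is converted back to
 * its real pointer type before being passed to the variadic function.
 * Returns the number of slots filled. */
size_t read_slots(const char *text, const struct slot *slots, size_t count)
{
    size_t k;

    assert(text != NULL && slots != NULL);
    for (k = 0; k < count; k++) {
        int used = 0;   /* characters consumed, reported by %n */
        int got;

        switch (slots[k].type) {
        case SLOT_CHAR:
            got = sscanf(text, " %c%n", (char *)slots[k].target, &used);
            break;
        case SLOT_INT:
            got = sscanf(text, "%d%n", (int *)slots[k].target, &used);
            break;
        default:
            return k;
        }
        if (got != 1)
            return k;   /* stop at the first value that fails to parse */
        text += used;
    }
    return k;
}
```

The switch is the price of doing this correctly: the format string and the cast are chosen together, so they cannot drift apart the way two parallel arrays can.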
All solutions above are valid but no one has yet mentioned that you can do what you want with a single simple scanf call:
scanf("%c%d%c", &var1, &var2, &var3);