C conventions - how to use memset on array field of a struct - c

I would like to settle an argument about the proper usage of memset when zeroing an array field in a struct (the language is C).
Let's say that we have the following struct:
struct my_struct {
int a[10];
};
Which of the following implementations is more correct?
Option 1:
void f (struct my_struct * ptr) {
memset(&ptr->a, 0, sizeof(ptr->a));
}
Option 2:
void f (struct my_struct * ptr) {
memset(ptr->a, 0, sizeof(ptr->a));
}
Notes:
If the field were of a primitive type (such as int) or another struct, option 2 would not work, and if it were a pointer (int *), option 1 would not work.
Please advise,

For a non-compound type, you would not use memset at all, because direct assignment would be easier and potentially faster. It would also allow for compiler optimizations a function call does not.
For arrays, variant 2 works, because an array is implicitly converted to a pointer for most operations.
For pointers, note that in variant 2 the value of the pointer is used, not the pointer itself, while for an array, a pointer to the array is used.
Variant 1 yields the address of the object itself. For a pointer, that is that of the pointer (if this "works" depends on your intention), for an array, it is that of the array - which happens to always be the address of its first element - but the type differs here (irrelevant, as memset takes void * and internally converts to char *).
So: it depends. For an array, I do not see much difference, except that the address operator might confuse readers not so familiar with operator precedence (and it is more to type). As a personal opinion: I prefer the simpler syntax, but would not complain about the other.
Note that memset with any value other than 0 does not make much sense; and even with 0, it is not guaranteed that the elements of an array of pointers will be interpreted as null pointers.

IMO, option 1 is preferable because the same pattern works for any object, not just arrays:
memset(&obj, 0, sizeof obj);
You can tell just from this statement that it does not cause a buffer overflow -- i.e. does not access out of bounds. It's still possible that this doesn't do what was intended (e.g. if obj is a pointer and it was intended to set what the pointer was pointing to), but at least the damage is contained.
However if you accidentally use memset(p, 0, sizeof p) on a pointer then you may write past the end of the object being pointed to; or if the object is bigger than sizeof p, you leave the object in a weird state.
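A minimal sketch of the point the answers above agree on, reusing the struct from the question: both spellings hand memset the same address and the same byte count, so they zero the same bytes. The helper names zero_with_address_of, zero_with_decay and same_address are made up for this illustration.

```c
#include <assert.h>
#include <string.h>

struct my_struct {
    int a[10];
};

/* Option 1: pointer to the whole array (type int (*)[10]). */
void zero_with_address_of(struct my_struct *ptr)
{
    assert(ptr != NULL);
    memset(&ptr->a, 0, sizeof ptr->a);
}

/* Option 2: the array decays to a pointer to its first element (type int *). */
void zero_with_decay(struct my_struct *ptr)
{
    assert(ptr != NULL);
    memset(ptr->a, 0, sizeof ptr->a);
}

/* Returns 1 if both expressions designate the same address, else 0. */
int same_address(struct my_struct *ptr)
{
    assert(ptr != NULL);
    return (void *)&ptr->a == (void *)ptr->a;
}
```

In both helpers sizeof is applied to the array member itself, so the byte count stays correct whichever pointer expression is used.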

Related

(int*) when dynamically allocating array of ints in c

So I'm a bit confused on how to make a function that will return a pointer to an array of ints in C. I understand that you cannot do:
int* myFunction() {
int myInt[aDefinedSize];
return myInt; }
because this is returning a pointer to a local variable.
So, I thought about this:
int* myFunction(){
int* myInt = (int) malloc(aDefinedSize * sizeof(int));
return myInt; }
This gives the error: warning cast from pointer to integer of different size
This implies to use this, which works:
int* myFunction(){
int* myInt = (int*) malloc(aDefinedSize * sizeof(int));
return myInt; }
What I'm confused by though is this:
the (int*) before the malloc was explained to me to do this: it tells the compiler what the datatype of the memory being allocated is. This is then used when, for example, you are stepping through the array and the compiler needs to know how many bytes to increment by.
So, if this explanation I was given is correct, isn't memory being allocated for aDefinedSize number of pointers to ints, not actually ints? Thus, isn't myInt a pointer to an array of pointers to ints?
Some help in understanding this would be wonderful. Thanks!!
So, if this explanation I was given is correct, isn't memory being allocated for aDefinedSize number of pointers to ints, not actually ints?
No, you asked malloc for aDefinedSize * sizeof(int) bytes, not
aDefinedSize * sizeof(int *) bytes. That's the size of memory you get, the type depends on the pointer used to access the memory.
Thus, isn't myInt a pointer to an array of pointers to ints?
No, since you defined it as an int *, a pointer-to-an-int.
Of course the pointer has no knowledge of how large the allocated memory area is, but only points at the first int that fits there. It's up to you as programmer to keep track of the size.
Note that you shouldn't use that explicit typecast. malloc returns a void *, that can be silently assigned to any pointer, as in here:
int* myInt = malloc(aDefinedSize * sizeof(int));
Arithmetic on the pointer works in strides of the pointed-to type, i.e. with int *p, p[3] is the same as *(p+3), which means roughly "go to p, go forward three times sizeof(int) in bytes, and access that location".
int **q would be a pointer-to-a-pointer-to-an-int, and might point to an array of pointers.
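The stride rule described above can be checked directly. This is only a sketch; the function names stride_of_three_ints and same_element are hypothetical, and the pointer passed in is assumed to point into an array of at least four ints.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes between p and p + 3 for an int pointer. */
size_t stride_of_three_ints(int *p)
{
    assert(p != NULL);
    return (size_t)((char *)(p + 3) - (char *)p);
}

/* p[3] and *(p + 3) name the same element, so their addresses match. */
int same_element(int *p)
{
    assert(p != NULL);
    return &p[3] == p + 3;
}
```

For an int buf[4], stride_of_three_ints(buf) equals 3 * sizeof(int), whatever sizeof(int) happens to be on the platform.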
malloc allocates an array of bytes and returns void* pointing to the first byte. Or NULL if the allocation failed.
To treat this array as an array of a different data type, the pointer must be converted to a pointer to that data type.
In C, void* converts implicitly to any object pointer type, so no explicit cast is required:
int* allocateIntArray(unsigned number_of_elements) {
int* int_array = malloc(number_of_elements * sizeof(int)); // <--- no cast is required here.
return int_array;
}
Arrays in C
In C, you want to remember that an array is just an address in memory, plus a length and an object type. When you pass it as an argument to a function or a return value from a function, the length gets forgotten and it’s treated interchangeably with the address of the first element. This has led to a lot of security bugs in programs that either read or write past the end of a buffer.
The name of an array automatically converts to the address of its first element in most contexts, so you can for example pass either arrays or pointers to memmove(), but there are a few exceptions where the fact it also has a length matters. The sizeof() operator on an array is the number of bytes in the array, but sizeof() a pointer is the size of a pointer variable. So if we declare int a[SIZE];, sizeof(a) is the same as sizeof(int)*(size_t)(SIZE), whereas sizeof(&a[0]) is the same as sizeof(int*). Another important one is that the compiler can often tell at compile time if an array access is out of bounds, whereas it does not know which accesses to a pointer are safe.
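A short sketch of the sizeof difference described above, assuming a hypothetical SIZE of 8. The function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SIZE 8

/* Where the array is declared, sizeof sees the whole array. */
size_t size_seen_by_owner(void)
{
    int a[SIZE];
    return sizeof a;            /* SIZE * sizeof(int) */
}

/* Once passed to a function, only a pointer arrives; the length is gone. */
size_t size_seen_by_callee(int *p)
{
    assert(p != NULL);
    return sizeof p;            /* sizeof(int *), regardless of the array */
}
```

This is why a function that receives an array usually needs a separate length parameter.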
How to Return an Array
If you want to return a pointer to the same, static array, and it’s fine that you’ll get the same array each time you call the function, you can do this:
#define ARRAY_SIZE 32U
int* get_static_array(void)
{
static int the_array[ARRAY_SIZE];
return the_array;
}
You must not call free() on a static array.
If you want to create a dynamic array, you can do something like this, although it is a contrived example:
#include <stdlib.h>
int* make_dynamic_array(size_t n)
// Returns an array that you must free with free().
{
return calloc( n, sizeof(int) );
}
The dynamic array must be freed with free() when you no longer need it, or the program will leak memory.
Practical Advice
For anything that simple, you would actually write:
int * const p = calloc( n, sizeof(int) );
Unless for some reason the array pointer would change, such as:
int* p = calloc( n, sizeof(int) );
/* ... */
p = realloc( p, new_size );
I would recommend calloc() over malloc() as a general rule, because it initializes the block of memory to zeroes, and malloc() leaves the contents unspecified. That means, if you have a bug where you read uninitialized memory, using calloc() will always give you predictable, reproducible results, and using malloc() could give you different undefined behavior each time. In particular, if you allocate a pointer and then dereference it on an implementation where 0 is a trap value for pointers (like typical desktop CPUs), a pointer created by calloc() will always give you a segfault immediately, while a garbage pointer created by malloc() might appear to work, but corrupt any part of memory. That kind of bug is a lot harder to track down. It’s also easier to see in the debugger that memory is or is not zeroed out than whether an arbitrary value is valid or garbage.
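As a small illustration of the zero-initialization point, here is a sketch with two hypothetical helpers, make_zeroed_ints and all_zero. It only shows that memory from calloc() starts out as all-zero ints; it makes no claim about what malloc() would have returned.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates n ints, all zero. Returns NULL on failure; the caller must free(). */
int *make_zeroed_ints(size_t n)
{
    return calloc(n, sizeof(int));
}

/* Returns 1 if the first n ints starting at p are all zero, else 0. */
int all_zero(const int *p, size_t n)
{
    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (p[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```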
Further Discussion
In the comments, one person objects to some of the terminology I used. In particular, C++ offers a few different kinds of ways to return a reference to an array that preserve more information about its type, for example:
#include <array>
#include <cstdlib>
using std::size_t;
constexpr size_t size = 16U;
using int_array = int[size];
int_array& get_static_array()
{
static int the_array[size];
return the_array;
}
std::array<int, size>& get_static_std_array()
{
static std::array<int, size> the_array;
return the_array;
}
So, one commenter (if I understand correctly) objects that the phrase “return an array” should only refer to this kind of function. I use the phrase more broadly than that, but I hope that clarifies what happens when you return the_array; in C. You get back a pointer. The relevance to you is that you lose the information about the size of the array, which makes it very easy to write security bugs in C that read or write past the block of memory allocated for an array.
There was also some kind of objection that I shouldn’t have told you that using calloc() instead of malloc() to dynamically allocate structures and arrays that contain pointers will make almost all modern CPUs segfault if you dereference those pointers before you initialize them. For the record: this is not true of absolutely all CPUs, so it’s not portable behavior. Some CPUs will not trap. Some old mainframes will trap on a special pointer value other than zero. However, it’s come in very handy when I’ve coded on a desktop or workstation. Even if you’re running on one of the exceptions, at least your pointers will have the same value each time, which should make the bug more reproducible, and when you debug and look at the pointer, it will be immediately obvious that it’s zero, whereas it will not be immediately obvious that a pointer is garbage.

Can malloc() be used to define the size of an array?

Here consider the following sample of code:
int *a = malloc(sizeof(int) * n);
Can this code be used to define an array a containing n integers?
int *a = malloc(sizeof(int) * n);
Can this code be used to define an array a containing n integers?
That depends on what you mean by "define an array".
A declaration like:
int arr[10];
defines a named array object. Your pointer declaration and initialization does not.
However, the malloc call (if it succeeds and returns a non-NULL result, and if n > 0) will create an anonymous array object at run time.
But it does not "define an array a". a is the name of a pointer object. Given that the malloc call succeeds, a will point to the initial element of an array object, but it is not itself an array.
Note that, since the array object is anonymous, there's nothing to which you can apply sizeof, and no way to retrieve the size of the array object from the pointer. If you need to know how big the array is, you'll need to keep track of it yourself.
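One common way to keep track of the size yourself is to store the length next to the pointer. The following is a sketch under that assumption; struct int_buffer and its two helper functions are hypothetical names, not part of any library.

```c
#include <assert.h>
#include <stdlib.h>

struct int_buffer {
    int   *data;   /* first element of the allocated array, or NULL */
    size_t len;    /* number of elements, tracked by hand */
};

/* Allocates n zeroed ints. On failure data is NULL and len is 0. */
struct int_buffer int_buffer_make(size_t n)
{
    struct int_buffer b = { calloc(n, sizeof(int)), 0 };
    if (b.data != NULL) {
        b.len = n;
    }
    return b;
}

/* Releases the storage and resets the bookkeeping. */
void int_buffer_free(struct int_buffer *b)
{
    assert(b != NULL);
    free(b->data);
    b->data = NULL;
    b->len = 0;
}
```

Code that receives a struct int_buffer can loop up to len instead of guessing how many elements the pointer refers to.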
(Some of the comments suggest that the malloc call allocates memory that can hold n integer objects, but not an array. If that were the case, then you wouldn't be able to access the elements of the created array object. See N1570 6.5.6p8 for the definition of pointer addition, and 7.22.3p1 for the description of how a malloc call can create an accessible array.)
int *a = malloc(sizeof(int) * n);
Assuming the malloc() call succeeds, you can use the pointer a like an array using array notation (e.g. a[0] = 5;). But a is not an array itself; it's just a pointer to an int (which may be the first element of a block of memory that can store multiple ints).
Your comment
But I can use an array a in my program with no declaration otherwise
suggests this is what you are mainly asking about.
In C language,
p[i] == *(p + i) == *(i + p) == i[p]
as long as one of i or p is of pointer type (p can be an array as well -- it would be converted into a pointer in this expression). Hence, you are able to index a as you would an array. But a is actually a pointer.
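The equivalence above can be checked by comparing addresses. This is a sketch; subscript_forms_agree is a hypothetical name, and i is assumed to be a valid index into the array that p points to.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if all four spellings designate the same element, else 0. */
int subscript_forms_agree(int *p, size_t i)
{
    assert(p != NULL);
    return &p[i] == &*(p + i)
        && &*(p + i) == &*(i + p)
        && &*(i + p) == &i[p];
}
```

The i[p] form is legal but rarely a good idea in real code; it is shown only because it follows from the definition of the subscript operator.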
Yes. That is exactly what malloc() does.
The important distinction is that
int array[10];
declares array as an array object with enough room for 10 integers. In contrast, the following:
int *pointer;
declares pointer as a single pointer object.
It is important to distinguish that one of them is a pointer and the other is an actual array, and that arrays and pointers are closely related but are different things. However, saying that there is no array in the following is also incorrect:
pointer = malloc(sizeof (int) * 10);
Because what this piece of code does is precisely to allocate an array object with room for 10 integers. The pointer pointer contains the address of the first element of that array. (C99 draft, section 7.20.3 "Memory management functions")
Interpreting your question very literally, the answer is No: To "define an array" means something quite specific; an array definition looks something like:
int a[10];
Whereas what you have posted is a memory allocation. It allocates a space suitable for holding an array of 10 int values, and stores a pointer to the first element within this space - but it doesn't define an array; it allocates one.
With that said, you can use the array element access operator, [], in either case. For instance the following code snippets are legal:
int a[10];
for (int i = 0; i < 10; i++) a[i] = 0;
and
int *a = malloc(sizeof(int) * n);
for (int i = 0; i < n; i++) a[i] = 0;
There is a subtle difference between what they do however. The first defines an array, and sets all its elements to 0. The second allocates storage which can hold an equivalently-typed array value, and uses it for this purpose by initialising each element to 0.
It is worth pointing out that the second example does not check for an allocation error, which is generally considered bad practice. Also, it constitutes a potential memory leak if the allocated storage is not later freed.
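A sketch of the second snippet with the two omissions addressed: the allocation result is checked and the storage is freed. The function name sum_of_first_n and its error convention (0 on success, -1 on failure) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stores 0 + 1 + ... + (n - 1) in *sum_out. Returns 0 on success, -1 on failure. */
int sum_of_first_n(size_t n, long *sum_out)
{
    assert(sum_out != NULL);
    if (n == 0 || n > SIZE_MAX / sizeof(int)) {
        return -1;                      /* nothing to allocate, or size would overflow */
    }
    int *a = malloc(n * sizeof *a);
    if (a == NULL) {
        return -1;                      /* allocation error is reported, not ignored */
    }
    for (size_t i = 0; i < n; i++) {
        a[i] = (int)i;                  /* fine for the small n used in this sketch */
    }
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i];
    }
    free(a);                            /* released on every successful path: no leak */
    *sum_out = sum;
    return 0;
}
```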
In the language the Standard was written to describe (as distinct from the language that would be described by a pedantic literal reading of it), the intention was that malloc(n) would return a pointer that, if cast to a T*, could be treated as a pointer to the first element of a T[n/sizeof(T)]. Per N1570 7.22.3:
The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated).
The definitions of pointer addition and subtraction, however, do not speak of acting upon pointers that are "suitably aligned" to allow access to arrays of objects, but rather speak of pointers to elements of actual array objects. If a program allocates space for 20 int objects, I don't think the Standard actually says that the resulting pointer would behave in all respects as though it were a pointer to element [0] of an int[20], as distinct from e.g. a pointer to element [0][0] of an int[4][5]. An implementation would have to be really obtuse not to allow it to be used as either, of course, but I don't think the Standard actually requires such treatment.

memcpy start index really needed?

The question is: when copying a byte array using memcpy(), should we explicitly specify the starting (0th) index of the destination buffer, or does simply naming the buffer suffice? Let me show examples of what I'm talking about, given that we are trying to copy the source buffer to the start of the destination buffer.
BYTE *pucInputData; // we have some data here
BYTE ucOutputData[20] = {0};
Code 1
memcpy((void*)&ucOutputData, (void*)pucInputData, 20);
Code 2
memcpy((void*)&ucOutputData[0], (void*)pucInputData, 20);
In your case, considering this a C code snippet, and ucOutputData is an array
memcpy(ucOutputData, pucInputData, 20);
memcpy(&ucOutputData[0], pucInputData, 20);
both are the same and can be used interchangeably. The name of the array essentially gives you the address of the first element in the array.
Now, as per the very useful discussion in the comments below, it is worth mentioning that
memcpy(&ucOutputData, pucInputData, 20);
will also do the job here, but there is a fundamental difference between the usage of array name and address of array name. Considering the example in the question, for a definition like BYTE ucOutputData[20],
ucOutputData evaluates to a pointer to the first element of an array of 20 BYTEs.
&ucOutputData is a pointer to an array of 20 BYTEs.
So, they are of different types, and C respects the type of the variable. Hence, to avoid any possible misuse and misconception, the recommended and safe way to use this is either of the first two expressions.
FWIW, the casts here are really not needed. Any object pointer type can be implicitly and safely converted to void * in C.
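The type difference between the array name and the address of the array shows up in pointer arithmetic. Here is a sketch, assuming BYTE is a typedef for unsigned char (the question does not show its definition); the two function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char BYTE;   /* assumption: BYTE is a one-byte type */

/* Byte distance covered by adding 1 to the decayed array (type BYTE *). */
size_t step_of_element_pointer(void)
{
    BYTE buf[20];
    return (size_t)((char *)(buf + 1) - (char *)buf);
}

/* Byte distance covered by adding 1 to a pointer to the whole array (type BYTE (*)[20]). */
size_t step_of_array_pointer(void)
{
    BYTE buf[20];
    return (size_t)((char *)(&buf + 1) - (char *)&buf);
}
```

Both pointers start at the same address, but the first steps by one element and the second by the whole array, which is why mixing them up can go wrong in code that does more than a single memcpy.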
No, both of your examples are sub-optimal.
Remember that all data pointers in C convert to/from void * (which is the type of the first argument to memcpy()) without loss of information and that no cast is necessary to do so.
Also remember that the name of an array evaluates to the address of the first element in many contexts, such as here.
Also remember to use sizeof when you can, never introduce a literal constant when you don't have to.
So, the copy should just be:
memcpy(ucOutputData, pucInputData, sizeof ucOutputData);
Note that we use sizeof without parentheses, it's not a function. Also we use it on the destination buffer, which seems the safer choice.
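A sketch of the same advice wrapped in a function. The names struct packet and load_packet are hypothetical, and BYTE is assumed to be unsigned char. Keeping the buffer inside a struct lets sizeof still see the array rather than a pointer.

```c
#include <assert.h>
#include <string.h>

typedef unsigned char BYTE;   /* assumption: BYTE is a one-byte type */

struct packet {
    BYTE out[20];
};

/* Copies sizeof dst->out bytes; src must point to at least that many readable bytes. */
void load_packet(struct packet *dst, const BYTE *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst->out, src, sizeof dst->out);   /* no cast, no literal 20 */
}
```

If the array size in struct packet changes later, the copy length follows automatically.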
Since an expression &array[0] is the same as array, and because any pointer can be implicitly converted to void*, you should do this instead:
memcpy(ucOutputData, pucInputData, 20);
Moreover, since you are writing over the entire ucOutputData, you do not need to zero out its content, so it's OK to drop the initializer:
BYTE ucOutputData[20]; // no "= {0}" part
A native array decays to a pointer without a cast, so in the snippet below, the three assignments to p all result in the same value; p will point to the beginning of the array. No explicit cast is needed because conversion to void* is implicit.
typedef char BYTE;
BYTE ucOutputData[20] = {0};
void *p = &ucOutputData;
p = ucOutputData;
p = &ucOutputData[0];

Getting length of an array

I've been wondering how to get the number of elements of an array. Somewhere in this website I found an answer which told me to declare the following macro:
#define NELEMS(x) (sizeof(x) / sizeof((x)[0]))
It works well for arrays defined as:
type arr[];
but not for the following:
type *arr = (type) malloc(32*sizeof(type));
it returns 1 in that case (it's supposed to return 32).
I would appreciate some hint on that
Pointers do not keep information about whether they point to a single element or the first element of an array
So if you have a statement like this
type *arr = (type) malloc(32*sizeof(type));
then arr here is not an array. It is a pointer to the beginning of the dynamically allocated memory extent.
Or even if you have the following declarations
type arr[10];
type *p = arr;
then again the pointer knows nothing about whether it points to a single object or the first element of an array. You can at any time write, for example,
type obj;
p = &obj;
So when you deal with pointers that point to first elements of arrays you have to keep somewhere (in some other variable) the actual size of the referenced array.
As for arrays themselves then indeed you may use expression
sizeof( arr ) / sizeof( *arr )
or
sizeof( arr ) / sizeof( arr[0] )
But arrays are not pointers, though they are very often converted to pointers to their first elements, with rare exceptions. The sizeof operator is one such exception: arrays used as operands of sizeof are not converted to pointers to their first elements.
The sizeof operator yields the size of the type of its operand. It does not count the amount of memory that a pointer points to (for example, an allocated array).
To elaborate,
in case of type arr[32];, sizeof (arr) is essentially sizeof(type[32]).
in case of type *arr;, sizeof(arr) is essentially sizeof(type*)
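The two cases above can be put side by side. This is a sketch using double as the element type; count_from_array and count_from_pointer are hypothetical names, and the second function spells out by hand the division the macro would perform on a pointer.

```c
#include <assert.h>
#include <stddef.h>

#define NELEMS(x) (sizeof(x) / sizeof((x)[0]))

/* On a real array the macro yields the element count. */
size_t count_from_array(void)
{
    double arr[32];
    return NELEMS(arr);           /* sizeof(double[32]) / sizeof(double) */
}

/* What the same division computes once only a pointer is left. */
size_t count_from_pointer(const double *p)
{
    assert(p != NULL);
    size_t whole = sizeof p;      /* sizeof(const double *) */
    size_t elem  = sizeof p[0];   /* sizeof(double) */
    return whole / elem;          /* unrelated to how many elements p points to */
}
```

On a platform where pointers and doubles are both 8 bytes, the second function returns 1, which would match the behavior reported in the question.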
To get the length of a string, you need to use strlen().
Remember, the definition of string is a null-terminated character array.
That said, in your code,
type *arr = (type) malloc(32*sizeof(type));
is very wrong. To avoid this kind of error, we suggest that you do not cast the result of malloc().
So remove the cast. You should not cast the result of malloc and family.
These are the main reasons for not casting the returned value from malloc (and family of functions).
In C, the return type of those functions is 'void*'. A void * can be assigned to any pointer type.
During debugging and during maintenance the receiving pointer type is often changed. The origin of that change is often not where the malloc function is called. If the returned value is cast, then a bug is introduced to the code. This kind of bug can be very difficult to find.
There is no safe and sound way of finding the length of an array from a pointer to it in C, since no bookkeeping is exposed for that.
You will need to use some other data structure that does the bookkeeping for you in order to ensure the correct result every time.

What's a Singleton pointer in C?

I have some code which is like this (This is not production code. Just a sample code)
char *inbuf = NULL;
inbuf = buf; //buf is some other valid buffer of size 100.
func(&inbuf);
.....
void func(char **p)
{
...
(*p)++;
...
}
Coverity Tool says that "Taking address with &inbuf yields a singleton". I have heard the term singleton with respect to C++. But, what does a singleton pointer mean in terms of C?
What does a singleton pointer mean in terms of C?
In this case I think Coverity is referring to the difference between an array of char* and a pointer to a single char* created by taking the address of that array.
Coverity is warning you that by passing the address of the first element of buf to func, you're making it more difficult for yourself to safely write to that array because you can't easily determine its size.
It's difficult to be sure without seeing all of your code, but assuming buf is an array you've declared somewhere in your top-level function then using the sizeof operator on buf from that function will yield the size of the array.
However, when you pass the address of buf to func at the line
func(&inbuf);
...func merely receives a pointer to the first element of that array. From func you can no longer use sizeof to determine the size of the array - it will just return the size of the pointer - and so you can't safely write to that pointer without some implicit understanding of how much space the array contains.
This makes for fragile code, and hence is poor practice.
(None of this is anything to do with the Singleton Pattern)
The Coverity analysis is flagging a defect of the following pattern:
typeA var; // declare a variable of some type
func(&var); // call a function, passing the address of var
void func(typeA *var) {
...
var++; // inside the function, do pointer arithmetic on var
}
This is a bug pattern, frequently, because the function expects a pointer to a buffer, but you're passing it a pointer to a singleton value. The type systems in C/C++ do not distinguish between "pointer to one object" and "pointer to array of objects".
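A sketch of why the pattern is risky, using a hypothetical function sum_ints that walks a buffer. The compiler accepts a pointer to a single int just as readily as a pointer into an array, so the length argument is the only thing that keeps the access in bounds.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n ints starting at p; p must point into an array of at least n elements. */
long sum_ints(const int *p, size_t n)
{
    assert(p != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += p[i];            /* indexes past the first object when n > 1 */
    }
    return total;
}
```

Given int one = 7;, the call sum_ints(&one, 1) is fine, while sum_ints(&one, 4) would compile but read past the single object, which is the kind of defect the tool is trying to flag.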

Resources