I'm trying to figure out which OpenSSL functions might return NULL pointers and which cannot. Sometimes the documentation clearly states that the function might return NULL, e.g.
X509_NAME_get_entry
X509_NAME_get_entry() returns an X509_NAME pointer to the requested entry or NULL if the index is invalid.
Sometimes it does not, e.g.
X509_get_subject_name
It only states:
X509_get_subject_name() returns the subject name of certificate x. The returned value is an internal pointer that MUST NOT be freed.
It's unclear (to me) if that means that the pointer is always valid or if it can be NULL, too.
Another example: the x509_name_entry_get_data page explains it for version 1.1.1:
X509_NAME_ENTRY_get_object() returns a valid ASN1_OBJECT structure if it is set or NULL if an error occurred.
but the 1.1.0 version does not.
A glance at the source code does not help much either as e.g. X509_get_subject_name just returns a member:
X509_NAME *X509_get_subject_name(const X509 *a)
{
    return a->cert_info.subject;
}
Whether this member can ever be assigned a NULL pointer is not visible from here.
The last example makes it seem necessary to add checks to all pointers coming from functions where there is no explicit documentation that the pointer cannot be NULL. I don't want to add pointless pointer checks in the code, but I do not want to use a security related library in a way where missing pointer checks might somehow be exploited.
Does anybody know how to interpret the OpenSSL documentation correctly in this regard? Or is it just horribly inconsistent and one cannot really tell?
In general, I would trust the man pages: if one says that a pointer is returned and doesn't mention that NULL may be returned, you should be able to rely on NULL never being returned. Otherwise, I would consider it a severe bug (be it in the man page or in the code) that should be reported.
For x509_name_entry_get_data(), the return-value section was entirely missing in the 1.1.0 version. This has since been fixed.
I would indeed play it safe and check every returned pointer for NULL. This also saves your code when the API changes in the future.
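A minimal sketch of that defensive style. The types `struct cert` and `struct name` and the getter `cert_get_subject()` are hypothetical stand-ins for X509, X509_NAME and X509_get_subject_name(), used only so the example compiles without linking libcrypto; the point is that an unexpected NULL turns into an error return instead of a crash.

```c
#include <stdio.h>

/* Hypothetical stand-ins for library types and a library getter. */
struct name { const char *text; };
struct cert { struct name *subject; };

static const struct name *cert_get_subject(const struct cert *c)
{
    return c != NULL ? c->subject : NULL;   /* may legitimately yield NULL */
}

/* Writes "subject=<text>" into buf. Every pointer obtained from the
   "library" is checked before use. Returns 0 on success, -1 on failure. */
int describe_subject(const struct cert *c, char *buf, size_t cap)
{
    const struct name *subj = cert_get_subject(c);
    if (subj == NULL || subj->text == NULL)
        return -1;                           /* unexpected NULL: report, don't crash */

    int n = snprintf(buf, cap, "subject=%s", subj->text);
    if (n < 0 || (size_t)n >= cap)
        return -1;                           /* encoding error or truncated output */
    return 0;
}
```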
If you really do not want that and do not want to rely only on the manpages, you can at least perform a simple check: Create an empty object, call the function in question and check the return value. For example, for an X509 object:
X509 *x509 = X509_new();
if (x509 != NULL)
    printf("%p\n", (void *)X509_get_subject_name(x509));
X509_free(x509);
This indeed gives a valid (non-NULL) pointer, which suggests that the man page is correct.
When browsing through code on the Internet, I often see snippets like these:
struct epoll_event event;
memset(&event, 0, sizeof(event));
This pattern seems needless to me, if event is filled out in full, but it is widespread. Perhaps to take into account possible future changes of the struct?
This is surely just bad copy-and-paste coding. The man page for epoll does not document any need to zero-initialize the epoll_event structure, and does not do so in the examples. Future changes to the struct do not even seem to be possible (ABI), but if they were, the contract would clearly be that any parts of the structure not related to the events you requested would be ignored (and not even read, since the caller may be passing a pointer to storage that does not extend past the original definition).
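A minimal sketch of filling in an epoll_event without memset (the helper names `make_read_event` and `watch_readable` are hypothetical). struct epoll_event has only the two members events and data, and both are set here; with a designated initializer, any struct member you do not name would be zero-initialized anyway.

```c
#include <sys/epoll.h>

/* Builds an event asking for read readiness on fd. */
struct epoll_event make_read_event(int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    return ev;
}

/* Registers fd with an existing epoll instance.
   Returns 0 on success, -1 with errno set on failure (as epoll_ctl does). */
int watch_readable(int epfd, int fd)
{
    struct epoll_event ev = make_read_event(fd);
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}
```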
Also, in general it's at best pointless and at worst incorrect/nonportable to use memset when a structure is supposed to be zero-initialized, since an all-zero-bytes representation need not be the zero value (for pointer and floating point types). Nowadays this generality is mostly a historical curiosity, and not relevant to a Linux-specific interface like epoll anyway, but it comes up as well with mbstate_t, which exists in fully general C, and where zero initialization is required to correctly use the associated interfaces. The correct way to zero-initialize things that need zero values, rather than all-zero-bytes representations, is with the universal zero initializer, { 0 }.
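A small sketch of the mbstate_t case, assuming the program has not called setlocale() (so the "C" locale is in effect); the function name `count_mb_chars` is hypothetical. The universal zero initializer gives a zero-valued mbstate_t, which the C standard accepts as a description of the initial conversion state.

```c
#include <string.h>
#include <wchar.h>

/* Counts the multibyte characters in the NUL-terminated string s under the
   current locale. Returns (size_t)-1 on an invalid or truncated sequence. */
size_t count_mb_chars(const char *s)
{
    mbstate_t state = { 0 };     /* universal zero initializer: initial shift state */
    size_t remaining = strlen(s);
    size_t count = 0;

    while (remaining > 0) {
        size_t len = mbrlen(s, remaining, &state);
        if (len == (size_t)-1 || len == (size_t)-2)
            return (size_t)-1;   /* invalid (-1) or incomplete (-2) sequence */
        s += len;
        remaining -= len;
        count++;
    }
    return count;
}
```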
Using memset like this can help you locate bugs faster. Consider it a defensive (even secure) style of programming.
Let's say you didn't use memset, and instead attempted to diligently fill in each member as documented by the API. If you ever forget to fill in a field (or a later API change adds a new field), then the value of that field at run time is indeterminate; in practice it will be whatever the memory previously held.
What are the consequences?
If you are lucky, your code will immediately fail in a nice way that can be debugged, for example, if the unset field needs a highly specific value.
If you are unlucky, your code may still work, and it may work for years. Perhaps on your current operating system the program memory somehow already held the correct value expected by the API. But as you move your code across systems and compilers, expect confusing behavior: "it works on my machine, but I don't understand why it doesn't work on yours".
So in this case, memset helps you avoid this nondeterministic behavior.
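A minimal sketch of the scenario above, using a hypothetical `struct request_options` in which `flags` stands for a field added in a later API version that the calling code does not know about. All members are integers, so an all-zero-bytes fill really does mean zero here; with pointer or floating-point members, `= { 0 }` is the portable way to get the same effect.

```c
#include <string.h>

/* Hypothetical options struct; pretend `flags` was added later. */
struct request_options {
    int timeout_ms;
    int retries;
    unsigned flags;    /* newer field the caller below never sets */
};

/* Older caller: zero everything first, then set only the known fields,
   so `flags` is reliably 0 instead of an indeterminate leftover value. */
void fill_options(struct request_options *opt)
{
    memset(opt, 0, sizeof *opt);
    opt->timeout_ms = 5000;
    opt->retries = 3;
}
```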
Of course, you can still profile your code, check for undefined memory, unit tests etc. Doing memset is not a replacement for those. It's just another technique to get to safe software.
Checking the changes in recent OpenSSL releases, I noticed that the HMAC_CTX structure must now be allocated on the heap. The headers only forward-declare it (in ossl_typ.h).
I wonder what the idea behind is. Given that heap allocated memory creates overhead, they must have a good reason for making the library slower. I just can't find the rationale behind it.
Anyone here know what made the developers decide to force allocation for this?
I've seen a lot of the OpenSSL structures going the same way. I would think that it's because the implementers of OpenSSL want to "hide" the implementation state away from the users of the library. That way the user can't "mess" with it in ways that the implementers don't intend. It also means that the implementers can change their implementation without user code having to care. It's basically the C version of the C++ PIMPL pattern.
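A minimal sketch of that opaque-type pattern, with hypothetical names (`counter_ctx`, `counter_new`, and so on) standing in for pairs like OpenSSL 1.1's HMAC_CTX_new()/HMAC_CTX_free(). Callers see only an incomplete type, so they cannot declare one on the stack or know its size; that is exactly why allocation has to move behind a library function.

```c
#include <stdlib.h>

/* ---- what the public header would expose ---- */
typedef struct counter_ctx counter_ctx;     /* incomplete type for callers */
counter_ctx *counter_new(void);
void counter_add(counter_ctx *ctx, unsigned long n);
unsigned long counter_total(const counter_ctx *ctx);
void counter_free(counter_ctx *ctx);

/* ---- what only the library's own .c file would contain ---- */
struct counter_ctx {
    unsigned long total;   /* layout can change without breaking callers */
};

counter_ctx *counter_new(void)
{
    return calloc(1, sizeof(counter_ctx));  /* NULL if allocation fails */
}

void counter_add(counter_ctx *ctx, unsigned long n) { ctx->total += n; }

unsigned long counter_total(const counter_ctx *ctx) { return ctx->total; }

void counter_free(counter_ctx *ctx) { free(ctx); }   /* free(NULL) is a no-op */
```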
Recently, I read a white paper by an individual who refers to a pointer to a struct as a handle. The author was clearly someone who had previously written C code on the Windows platform. Googling indicates that Windows programmers interact with system components via handles. Is it common practice for Windows programmers to refer to all struct pointers as handles? Or is the term handle meant to convey something beyond pointer to struct? I am asking as a Linux C programmer.
The white paper I am referring to is:
Duff, Heroux, and Pozo. An Overview of the Sparse Basic Linear Algebra Subprograms: The New Standard from the BLAS Technical Forum. ACM Transactions on Mathematical Software, Vol 28, No. 2, June 2002, Pages 239-267.
The term handle generally means some opaque value that has meaning only to the API which produced it. In Win32, the HANDLE type is either a pointer in kernel memory (which applications cannot access anyway) or an index into some kernel-internal array.
A handle is an old and revered concept.
A cookie is much the same thing. Or a GUID. Or a ticket to retrieve your car from a car park, or your coat from a fancy restaurant, etc.
It's any unique value that, when presented back to the issuer, can be used to track back to the actual thing referred to, by whatever opaque mechanism the issuer wants. You may or may not know anything about that process, nor what the underlying thing is, exactly (only conceptually).
It was heavily used by Windows, but it is certainly not unique to Windows.
You would not normally use "handle" to mean "pointer to struct." Handle is more like "token" than like "pointer." It refers to something - file, system resource, memory, state-snapshot, etc. But what-exactly-it-is is based on the context of the handle itself (i.e. who issued the handle).
Handles were also used heavily in early filesystem programming in K&R C.
I use the word handle to mean a pointer that points to an "object" that represents a resource - often an OS resource, whereas a pointer just points to some memory. If you have a handle to something, you shouldn't try to read and write bytes into it directly, but manipulate it through provided methods.
Often handles are implemented as an opaque void *, which is further encouragement not to try to directly dereference it.
Since you refer to handles being used as a pointer to a structure, as used by a Windows programmer, I'll answer within that context. Please note that there are clearly many different kinds of "handles", as it is a generic concept widely used within the computing environment. Certainly you will be familiar with the concept of a file handle; Windows also offers window handles and many other kinds of handles. Having said that:
A "memory handle" (that is similar to a pointer to a struct) is a concept from the land of 16-bit Windows programming, where there was no memory manager in the CPU and all memory management had to be done in software. Essentially, a "handle" was sort of a pointer, but the OS would be free to move around the memory that the handle referred to. You can't do that with a regular pointer, but the handle had functions that would get and release the actual memory address.
With the introduction of Win32, where the CPU had a hardware memory manager, the concept of the memory handle became obsolete. Other types of handles such as file handles and window handles still exist in Win32, but are not pointers to structs.
The term handle is used to mean any technique that lets you access another object. A handle can be a pointer, a reference, a pointer to a pointer, etc. It is certainly related to classes, objects, etc., but a handle need not always be a pointer to a structure.
In the old days of MacOS programming, back before OSX, a handle was a pointer to a pointer. That allowed the OS to move things around without invalidating the user's pointers. There were rules on when we could assume the pointed-to object wouldn't move, which I don't remember.
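A simplified sketch of that pointer-to-a-pointer idea. The classic Mac OS Memory Manager's real calls included NewHandle, DisposeHandle, HLock and HUnlock; the lowercase functions below are hypothetical stand-ins, and locking is left out. The allocator can move a block and update the master pointer, so code that always dereferences through the handle still finds the data, while a raw pointer saved earlier would dangle.

```c
#include <stdlib.h>
#include <string.h>

typedef char **Handle;   /* a pointer to a "master pointer" */

Handle new_handle(size_t size)
{
    Handle h = malloc(sizeof *h);     /* storage for the master pointer */
    if (h == NULL)
        return NULL;
    *h = calloc(1, size);             /* the relocatable block itself */
    if (*h == NULL) {
        free(h);
        return NULL;
    }
    return h;
}

/* Simulates compaction: copy the block elsewhere and repoint the master
   pointer. Returns 0 on success, -1 if the new block cannot be allocated. */
int move_block(Handle h, size_t size)
{
    char *fresh = malloc(size);
    if (fresh == NULL)
        return -1;
    memcpy(fresh, *h, size);
    free(*h);
    *h = fresh;                       /* every holder of h sees the new address */
    return 0;
}

void dispose_handle(Handle h)
{
    if (h != NULL) {
        free(*h);
        free(h);
    }
}
```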
The term "handle" didn't originate on Windows, though it became widespread among Windows programmers.
In the C standard library (stdio.h), file handles are pointers to a data structure used by the C library.
Pure Unix programming uses file descriptors, which are indexes into a kernel data structure, but pointers have been used as handles in Unix for over 30 years.
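Both kinds of handle can be seen side by side in a short POSIX sketch (the helper name `descriptor_behind` is hypothetical): a stdio `FILE *` is a pointer handle to a C-library structure, and fileno() returns the integer descriptor, an index into the process's kernel file table, that the FILE object wraps.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Returns the file descriptor underlying a stdio stream,
   or -1 (with errno set) if the stream has none. */
int descriptor_behind(FILE *fp)
{
    return fileno(fp);
}
```

For the standard streams this normally yields 0, 1 and 2.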
"Handle" is a logical term, not a physical one. It's meant as a proxy to a physical object for code that has more intimate knowledge of the object. A pointer to a struct is one such proxy, but there are many other possibilities.
No, it is not particularly common amongst Windows programmers to refer to pointers as handles, but doing so isn't WRONG either. The term "handle" is usually used to describe something you use to access something through, and in that sense all pointers are handles (but not all handles are pointers). Win32's handles are AFAIK usually not pointers, but instead indices into internal OS tables, though this might change in future versions of Windows.
A handle is a generic term for a reference (not specifically a C++ reference) to an object.
A pointer is a subset of handle, since it points to objects.
A foreign key in a database is also a handle, since it points to records in other tables; and it is not a pointer.
In the Windows API environment, they used the abstract term handle so they could use an integer into a table, a pointer, or other methods, without interfering with the client; IOW, defining an abstract interface.
In summary, a handle can be something other than a pointer, such as an integer index or an object containing more details about the object (such as a smart pointer).
I'm probably older than most of the respondents, having earned a living coding in C on both the early (late 80s) Macintosh and both 16 and 32-bit Windows. In those ancient times (when an IBM mainframe might have only 256k of memory) a handle was always a pointer (or table offset) to a memory pointer.
As a previous respondent mentioned, that allowed tables of pointers to memory blocks to be managed by the OS without invalidating the "handles" used by the programmer. Unfortunately, I do not remember how we guaranteed that an allocated block would not be moved while we were using the handle.
Windows defines handles for many things. They're not necessarily pointers at all -- some are, but others are things like offsets into particular tables. A few are intentionally obfuscated. There are handles for everything from windows to device contexts to bitmaps, and so on.
In any case, a handle is normally intended as an opaque data type -- i.e. you're not supposed to know anything about its value, only a set of predefined operations that can use it to accomplish various tasks. I believe C++/CLI also has a pointer-like object that's called a handle. I believe it's supposed to be closer to an opaque data type though -- if memory serves, you're not allowed to do any pointer arithmetic on them.
Handles are generally pointers that you don't directly need to dereference. Rather you pass them to API calls which operate on the underlying structs.
Historically on Windows, handles were not pointers. You would lock the handle to get a pointer before using it, and unlock it when you were done (after which the pointer would become invalid). In the days before paged memory, old-school Windows did its own memory management by swapping out resources referenced only by handles and swapping them back in when they got locked. In practice, this made memory management a nightmare, but it allowed Windows to simulate virtual memory on systems without hardware support for it.
A pointer is definitely different than a handle. A pointer is an address of something unspecified in memory. A pointer to a structure can be called a "handle" (usually by using 'typedef').
A handle is a concept used in writing the Windows operating system.
A pointer is a part of the C language.
Actually, a pointer is a variable which contains the address of another variable, but a handle is a pointer to a pointer, i.e. a pointer which contains the address of another pointer.
For example:
int x=10;
int *a=&x;// this is a simple pointer
int **b=&a;// this is a handle: a pointer to a pointer
A handle is a number; a pointer is not a handle.
// storage
char data[0xFFFF] = {0} ;
// pointer aka "iterator"
char * storage_pointer = & data[42];
// handle
size_t storage_handle = 42 ;
The key difference (or, if you prefer, the "advantage") of handles is that one can try to deduce whether a handle is still valid or, to use the other term, "dangling".
I do use handles whenever feasible. Here is a good article on advantages and implementation practices.
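One common way to make handles detect dangling use is a generation counter per slot; this is a sketch of that technique, not necessarily what the article above describes, and all names (`handle_alloc`, `handle_get`, `handle_free`, `MAX_SLOTS`) are hypothetical. A handle packs a slot index (low 16 bits) with the slot's generation (high 16 bits); freeing a slot bumps its generation, so every old copy of the handle stops matching.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS 64

typedef uint32_t handle_t;   /* 0 is reserved as "invalid" */

static struct { uint16_t generation; int in_use; int value; } slots[MAX_SLOTS];

static handle_t make_handle(uint16_t index, uint16_t gen)
{
    return ((uint32_t)gen << 16) | index;
}

/* Stores value in a free slot; returns its handle, or 0 if the table is full. */
handle_t handle_alloc(int value)
{
    for (uint16_t i = 0; i < MAX_SLOTS; i++) {
        if (!slots[i].in_use) {
            if (slots[i].generation == 0)
                slots[i].generation = 1;   /* keeps handle value 0 unused */
            slots[i].in_use = 1;
            slots[i].value = value;
            return make_handle(i, slots[i].generation);
        }
    }
    return 0;
}

/* Returns a pointer to the stored value, or NULL for a stale or bogus handle. */
int *handle_get(handle_t h)
{
    uint16_t index = (uint16_t)(h & 0xFFFF);
    uint16_t gen = (uint16_t)(h >> 16);
    if (index >= MAX_SLOTS || !slots[index].in_use || slots[index].generation != gen)
        return NULL;
    return &slots[index].value;
}

void handle_free(handle_t h)
{
    if (handle_get(h) == NULL)
        return;                            /* ignores double frees */
    uint16_t index = (uint16_t)(h & 0xFFFF);
    slots[index].in_use = 0;
    slots[index].generation++;             /* invalidates all copies of h */
    if (slots[index].generation == 0)
        slots[index].generation = 1;       /* skip 0 on wrap-around */
}
```

The check is probabilistic in the same way as other answers here describe: after 65535 reuses of one slot the generation wraps, and a very old handle could match again.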
I've written an API that requires a context to be initialized and thereafter passed into every API call. The caller allocates the memory for the context, and then passes it to the init function with other parameters that describe how they want later API calls to behave. The context is opaque, so the client can't really muck around in there; it's only intended for the internal use of the API functions.
The problem I'm running into is that callers are allocating the context, but not initializing it. As a result, subsequent API functions are referring to meaningless garbage as if it was a real context.
I'm looking for a way to verify that the context passed into an API function has actually been initialized. I'm not sure if this is possible. Two ideas I've thought of are:
Use a pre-defined constant and store it in a "magic" field of the context to be verified at API invocation time.
Use a checksum of the contents of the context, storing this in the "magic" field and verifying it at invocation time.
Unfortunately I know that either one of these options could result in a false positive verification, either because random crap in memory matches the "magic" number, or because the context happens to occupy the same space as a previously initialized context. I think the latter scenario is more likely.
Does this simply boil down to a question of probability? That I can avoid false positives in most cases, but not all? Is it worth using a system that merely gives me a reasonable probability of accuracy, or would this just make debugging other problems more difficult?
Best solution, I think, is add create()/delete() functions to your API and use create to allocate and initialize the structure. You can put a signature at the start of the structure to verify that the pointer you are passed points to memory allocated with create() and use delete() to overwrite the signature (or entire buffer) before freeing the memory.
You can't entirely avoid false positives in C, because memory the caller malloc'd might "happen" to start with your signature; but make your signature reasonably long (say 8 bytes) and the odds are low. Taking allocation out of the hands of the caller by providing a create() function will go a long way, though.
And, yeah, your biggest risk is that an initialized buffer is free'd without using delete(), and a subsequent malloc happens to reuse that memory block.
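A minimal sketch of the create()/delete() approach with an 8-byte signature; the names `api_ctx`, `api_ctx_create`, `api_ctx_do_work` and `api_ctx_delete` and the `setting` field are hypothetical. The signature is cleared with a volatile store before free(), because a plain store or memset right before free() may be removed by the optimizer as a dead store.

```c
#include <stdint.h>
#include <stdlib.h>

#define CTX_MAGIC UINT64_C(0x4354584D41474943)   /* "CTXMAGIC" in ASCII */

typedef struct api_ctx {
    uint64_t magic;      /* signature, checked on every API call */
    int setting;         /* hypothetical caller-chosen option */
} api_ctx;

static int ctx_is_valid(const api_ctx *ctx)
{
    return ctx != NULL && ctx->magic == CTX_MAGIC;
}

/* The API, not the caller, allocates and initializes the context. */
api_ctx *api_ctx_create(int setting)
{
    api_ctx *ctx = malloc(sizeof *ctx);
    if (ctx == NULL)
        return NULL;
    ctx->magic = CTX_MAGIC;
    ctx->setting = setting;
    return ctx;
}

/* Example API call: refuses anything that lacks the signature. */
int api_ctx_do_work(const api_ctx *ctx, int x)
{
    if (!ctx_is_valid(ctx))
        return -1;
    return x * ctx->setting;
}

void api_ctx_delete(api_ctx *ctx)
{
    if (!ctx_is_valid(ctx))
        return;
    *(volatile uint64_t *)&ctx->magic = 0;   /* wipe so a reused block fails the check */
    free(ctx);
}
```

This only lowers the odds of a false positive; it cannot help if the caller frees the block directly instead of calling the delete function.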
Your context variable is probably at the moment some kind of pointer to allocated memory. Instead of this, make it a token or handle that can be explicitly verified. Every time a context is initialised, you return a new token (not the actual context object) and store that token in an internal list. Then, when a client gives you a context later on, you check it is valid by looking in the list. If it is, the token can then be converted to the actual context and used, otherwise an error is returned.
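#include <map>
#include <stdexcept>

struct InternalContext { /* the real per-context state lives here */ };

typedef long Context;                          // opaque token handed to callers
typedef std::map<Context, InternalContext> Contexts;

static Contexts g_contexts;                    // every token issued so far -> its state

static Context nextContext()
{
    static Context next = 0;
    return next++;
}

Context initialise()
{
    Context c = nextContext();
    g_contexts.insert(std::make_pair(c, InternalContext()));
    return c;
}

void doSomethingWithContext(Context c)
{
    Contexts::iterator it = g_contexts.find(c);
    if (it == g_contexts.end())
        throw std::invalid_argument("invalid context");
    // otherwise do stuff with the valid context
    InternalContext &internalContext = it->second;
    (void)internalContext;                     // placeholder for real work
}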
With this method, there is no risk of an invalid memory access as you will only correctly use valid context references.
Look at the paper by Matt Bishop on Robust Programming. The use of tickets or tokens (similar to file handles in some respects, but also including a nonce, a number used once) allows your library code to ensure that the token it is using is valid. In fact, you allocate the data structure on behalf of the user, and pass back to the user a ticket which must be provided for each call to the API you define.
I have some code based closely on that system. The header includes the comments:
/*
** Based on the tickets in qlib.c by Matt Bishop (bishop#ucdavis.edu) in
** Robust Programming. Google terms: Bishop Robust Nonce.
** http://nob.cs.ucdavis.edu/~bishop/secprog/robust.pdf
** http://nob.cs.ucdavis.edu/classes/ecs153-1998-04/robust.html
*/
I also built an arena-based memory allocation system using tickets to identify different arenas.
You could define a new API call that takes uninitialised memory and initialises it in whatever way you need. Then, part of the client API is that the client must call the context initialisation function, otherwise undefined behaviour will result.
To sidestep the issue of a memory location of a previous context being reused, you could, in addition to freeing the context, reset it and remove the "magic" number, assuming of course that the user frees the context using your API. That way when the system returns that same block of memory for the next context request, the magic number check will fail.
See what your system does with uninitialized memory. Microsoft's behavior, for example, is described in "Uninitialized memory blocks in VC++".