Is there a requirement to use context in OpenSSL MP functions? - c

Some of the functions from <openssl/bn.h> take BN_CTX *ctx as the last argument.
This is a structure that stores temporary BIGNUM variables, allowing you to avoid frequent memory allocation when creating variables with repeated subroutine calls.
I assumed that using a ctx is optional, since it only helps performance, and that the functionality can simply be skipped — for example, if I call the division operation only once.
I also found that in OpenSSL 1.1.1, BN_mod, BN_div and BN_mul worked even if NULL was passed instead of a ctx pointer.
In version 3.2.0, this leads to a segmentation fault.
Please explain the logic of using BN_CTX.


How to hook an unknown number of functions - x86

Problem description
At runtime, I am given a list of addresses of functions (in the same process). Each time any of them is called, I need to log its address.
My attempt
If there were just one function, I could (with the help of a hooking library like subhook) create a hook:
create_hook(function_to_be_hooked, intermediate)

intermediate(args...):
    log("function with address {&function_to_be_hooked} got called")
    remove_hook(function_to_be_hooked)
    ret = function_to_be_hooked(args...)
    create_hook(function_to_be_hooked, intermediate)
    return ret
This approach does not trivially extend. I could add any number of functions at compile time, but I only know how many I need at runtime. If I hook multiple functions with the same intermediate, the intermediate doesn't know which function was called.
Details
It seems like this problem should be solved by a hooking library. I am using C/C++ and Linux and the only options seem to be subhook and funchook, but none of them seem to support this functionality.
This should be fairly doable with assembly language manually, like if you were modifying a hook library. The machine code that overwrites the start of the original function can set a register or global variable before jumping to (or calling) the hook. Using call would push a unique return address which the hook likely wouldn't want to actually return to. (So it unbalances the return-address predictor stack, unless the hook uses ret with a modified return address, or it uses some prefixes as padding to make the call hook or call [rel hook_ptr] or whatever end at an instruction boundary of the original code so it can ret.)
Like mov al, imm8 if the function isn't variadic in the x86-64 System V calling convention, or mov r11b, imm8 (R11 is call-clobbered in the mainstream x86-64 conventions). Or mov ah, imm8 would work in x86-64 SysV without disturbing the AL = number-of-XMM-args for a variadic function, and still only be 2 bytes. Or use push imm8.
If the hook function itself were written in asm, it would be straightforward for it to look for a register, an extra stack arg, or just a return address from a call, as an extra arg, without disturbing its ability to find the args for the hooked function. If it's written in C, looking in a global (or thread-local) variable avoids needing a custom calling convention.
But with existing hook libraries, assuming you're right that they don't pass an int id:
Using that library interface, it seems you'd need to generate an unknown number of unique things that are callable as a function pointer. That's not something ISO C can do: the language is designed to be strictly ahead-of-time compilable, never needing to generate new machine code at run time, so it's compatible with a strict Harvard architecture.
You could define a huge array of structs, each holding a function pointer to hook1(), hook2(), etc. plus that hook's piece of side data. Define enough hook functions that, however many you need at run time, you'll already have enough. Each one can hard-code the array element it should access for its unique string.
You could use some C preprocessor macros to define some large more-than-enough number of hooks, and separately get an array initialized with structs containing function pointers to them. Some CPP tricks may allow iterating over names so you don't have to manually write out define_hook(0) define_hook(1) ... define_hook(MAX_HOOKS-1). Or maybe have a counter as a CPP macro that gets #defined to a new higher value.
Unused hooks would be sitting in memory and in your executable on disk, but wouldn't ever be called so they wouldn't be hot in cache. Ones that didn't share a page with any other code wouldn't ever need to get paged in to RAM at all. Same for later parts of the array of pointers and side-data. It's inelegant and clunky, and doesn't allow an unbounded number, but if you can reasonably say that 1024 or 8000 "should be enough for everyone", then this can work.
Another way also has many downsides, different but worse than the above. Especially that it requires calling the rest of your program from the bottom of a recursion (not just calling an init function that returns normally), and it uses a lot of stack space. (You might use ulimit -s to bump your stack size limit above Linux's usual 8MiB.) It also requires GNU extensions.
GNU C nested functions can make new callable entities, generating "trampoline" machine code on the stack when you take the address of a nested function. This would make your stack executable, so there's a security-hardening downside. There'd be one copy of the actual machine code for the nested function, but n copies of trampoline code that sets up a pointer to the right stack frame, and n instances of a local variable that you can arrange to have different values.
So you could use a recursive function that went through your array of hooks like foo(counter+1, hooks+1), and have the hook be a nested function that reads counter. Or instead of a counter, it can be a char* or whatever you like; you just set it in this invocation of the function.
This is pretty nasty (the hook machine code and data is all on the stack) and uses potentially a lot of stack space for the rest of your program. You can't return from this recursion or your hooks will break. So the recursion base-case will have to be (tail) calling a function that implements the rest of your program, not returning to your ultimate caller until the program is ending.
C++ has some std:: callable objects, like a std::function built from std::bind of a member function on a specific object, but they're not type-compatible with function pointers.
You can't pass a std::function * pointer to a function expecting a bare void (*fptr)(void) function pointer; making that happen would potentially require the library to allocate some executable memory and generate machine code in it. But ISO C++ is designed to be strictly ahead-of-time compilable, so they don't support that.
std::function<void(void)> f = std::bind(&Class::member, hooks[i]); compiles, but the resulting std::function<void(void)> object can't convert to a void (*)() function pointer. (https://godbolt.org/z/TnYM6MYTP). The caller needs to know it's invoking a std::function<void()> object, not a function pointer. There is no new machine code, just data, when you do this.
My instinct is to follow a debugger path.
You would need
a uint8_t * -> uint8_t map,
a trap handler, and
a single step handler
In broad strokes:
When you get a request to monitor a function, add its address, and the byte pointed by it to the map. Patch the pointed-to byte with int3.
The trap handler shall get the offending address from the exception frame and log it. Then it shall unpatch the byte with the value from the map, set the single-step flag (TF) in FLAGS (again, in the exception frame), and return. That will execute the instruction and raise a single-step exception.
You can set TF from user-space yourself and catch the resulting SIGTRAPs until you clear it (on a POSIX OS); it's more common for TF to only be used by debuggers, e.g. set by the kernel as part of Linux's ptrace(PTRACE_SINGLESTEP). But setting/clearing TF is not a privileged operation. (Patching bytes of machine code with int3 is how debuggers implement software breakpoints, not using x86's dr0-7 hardware debug registers. In your own process, no system call is necessary after an mprotect to make it writeable.)
The single-step handler shall re-patch int3, and return to let the program run until it hits int3 again.
In POSIX, the exception frame is pointed by uap argument to a sigaction handler.
PROS:
No bloated binary
No compile-time instrumentation
CONS:
Tricky to implement correctly: remapping the text segment writable, invalidating the I-cache, perhaps more.
Huge performance penalty; a no-go in real-time systems.
Funchook now implements this functionality (on master branch, to be released with 2.0.0).

A problem parsing a certificate with C and OpenSSL

I have a self-signed X509 certificate, and I need to parse the key in this certificate.
I referred to the code on this website: https://zakird.com/2013/10/13/certificate-parsing-with-openssl
In its parsing part, there is some syntax I can't use.
Like:
int pubkey_algonid = OBJ_obj2nid(cert->cert_info->key->algor->algorithm);
and:
rsa_key = pkey->pkey.rsa;
rsa_e_dec = BN_bn2dec(rsa_key->e);
rsa_n_hex = BN_bn2hex(rsa_key->n);
I can't use '->' to refer to the structs' internal variables. Am I missing some necessary files?
How do I solve it? Or is there any other way to get the key in the certificate?
You're referring to a website from 2013. Old versions of OpenSSL (and SSLeay before it) did allow access to fields in many structures (by declaring them publicly), including X509 (a typedef for struct x509_st) for a cert, and RSA for an RSA key(pair). Since 2016 it no longer does; this is called 'opaque' typing, and it is considered good style for a library which undergoes significant changes from time to time, because it prevents those changes from causing crashes, corruption, and dangerously or even catastrophically wrong results in programs using the library. By making the interface 'opaque', internal changes either still work correctly, or, if they can't, give an error message rather than undetected and possibly dangerous nonsense or undefined behavior.
Use X509_get_pubkey or X509_get0_pubkey to get the public key as an EVP_PKEY structure, from which
you can get the (already-converted) NID for the algorithm with EVP_PKEY_id, and,
assuming the key is RSA, get the RSA-specific structure with EVP_PKEY_get1_RSA or EVP_PKEY_get0_RSA, from which
you can get the n and e fields with RSA_get0_n and RSA_get0_e (or both at once with RSA_get0_key).
The difference between get or get1 routines and get0 routines is that the former allocate a copy, which you should free using the appropriate routine when done to avoid a memory leak; the latter (get0) share existing memory and should not be freed, but must not be used if the thing they are sharing is freed or otherwise invalidated.
PS: the fact that the cert is self-signed has no effect on parsing the public key, although it can affect whether a system that uses it is secure.
But that's out of scope for SO.

Bison - symbol table - matching free for malloc

I am going through mfcalc example in the Bison manual and I had a question about the symbol table.
Specifically, in the routine putsym() we have calls to malloc, but I don't see the corresponding calls to free. Do we need to deallocate the symbol table (sym_table in the following code) manually, or does the tool take care of this automatically?
symrec *
putsym (char const *sym_name, int sym_type)
{
  symrec *ptr = (symrec *) malloc (sizeof (symrec));
  ptr->name = (char *) malloc (strlen (sym_name) + 1);
  strcpy (ptr->name, sym_name);
  ptr->type = sym_type;
  ptr->value.var = 0; /* Set value to 0 even if fctn. */
  ptr->next = (struct symrec *) sym_table;
  sym_table = ptr;
  return ptr;
}
"The tool" knows nothing about what your actions do.
I quoted "the tool" because in reality, there are at least two code generation tools involved in most parsing projects: a parser generator (bison, in this case) and a scanner generator ((f)lex, perhaps). The mfcalc example uses a hand-built lexer to avoid depending on lex, although it would probably have been simpler to have used (f)lex. In any event, the only calls to the symbol table library are in the scanner and have absolutely nothing to do with the bison-generated code.
Of course, there are other tools at play. For example, the entire project is built with a C compiler and runs inside some kind of hosted environment (to use the words of the C standard); in other words, an operating system and runtime support library which includes implementations of malloc and free (although, as you point out, free is nowhere called by the example code).
I mention these last because they are relevant to your question. When a process terminates, all process resources are released, including its memory image. (This is not required by the C standard but almost all hosted environments work that way.) So you don't really need to free() memory allocated if it is going to be in use up to program termination.
Like global variables, unreleased memory allocations were pretty common at one time. These days, such things are considered poor practice (at best) and most programmers will avoid them, but it wasn't always the case. There was a time when many programmers considered it wasteful to track resources only in order to release them just before program termination, or to jump through the hoops necessary to ensure that pre-termination cleanup was guaranteed to execute. (Even today, many programmers will just insert a call to exit(1) when an unrecoverable error occurs, rather than going to the bother of tracking down and manually freeing every allocated memory block. Particularly in non-production code.)
Whether you approve of this coding style or not, the examples in the bison manual (and many other code examples of all kinds) date back to that innocent time.
So, it's true that the symbol table entries in this example are never freed. Your production code should probably do better, but it also should probably use a more efficient data structure and avoid depending on a (single) global context. But none of that has anything to do with the bison features that mfcalc is attempting to illustrate.

memcpy [or not?] and multithreading [std::thread from c++11]

I am writing software in C/C++ that makes heavy use of BIAS/Profil, an interval algebra library. In my algorithm I have a master which divides a domain and feeds parts of it to slave thread(s). Those return an int status about those domain parts. There is common data, for reading only, and that's it.
I need to parallelize my code; however, as soon as two slave threads (or more, I guess) are running and both calling functions of this library, it segfaults. What is peculiar about those segfaults is that gdb rarely indicates the same error line across two runs: it depends on the speed of the threads, whether one started earlier, etc. I've tried having the threads yield until a go-ahead from the master; it 'stabilizes' the error. I'm fairly sure that it comes from the library's calls to memcpy (following the gdb backtrace, I always end up in a BIAS/Profil function calling memcpy; to be fair, almost all its functions memcpy into a temporary object before returning the result). From what I read on the web, it would appear that memcpy() may not be thread-safe, depending on the implementation (especially here). (It seems weird for a function that is only supposed to read the shared data... or maybe when writing the thread-local data, both threads go for the same memory space?)
To try to address this, I'd like to 'replace' (at least for tests, to see if the behavior changes) the call to memcpy with a mutex-framed call (something like mtx.lock(); memcpy(...); mtx.unlock();).
1st question: I'm not a dev/code engineer at all and lack a lot of basic knowledge. Since I use a pre-built BIAS/Profil library, the memcpy called is the one of the system the library was built on, correct? If so, would it change anything if I built the library from source on my system? (I'm not sure I can build this library, hence the question.)
2nd question:
in my string.h, memcpy is declared by:
#ifndef __HAVE_ARCH_MEMCPY
extern void * memcpy(void *, const void *, __kernel_size_t);
#endif

and in some other string headers (string_64.h, string_32.h) there is a definition of the form:

#define memcpy(dst, src, len) __inline_memcpy((dst), (src), (len))

or some more explicit definition, or just a declaration like the one quoted.
It's starting to get ugly, but ideally I'd like to define a preprocessor variable #define __HAVE_ARCH_MEMCPY 1, plus a void * memcpy(void *, const void *, __kernel_size_t) which would do the mutex-framed memcpy using the dismissed memcpy.
The idea here is to avoid messing with the library and make it work with 3 lines of code ;)
Any better idea? (it would make my day...)
IMHO you shouldn't concentrate on the memcpy()s, but on the higher-level functionality.
memcpy() is thread-safe if the memory ranges handled by the parallel running threads don't overlap. In practice, memcpy() is just a copy loop (with a lot of optimizations), at least in glibc, which is why that is the only condition.
If you want to know what your parallel memcpy()-ing threads will do, imagine for(;;) loops copying memory through word-sized pointers.
Given that your observations, and that the Profil lib is from the last millennium, and that the documentation (homepage and Profil2.ps) do not even contain the word "thread", I would assume that the lib is not thread safe.
1st: No, usually memcpy is part of libc, which is dynamically linked (at least nowadays). On Linux, check with ldd NAMEOFBINARY, which should give a line with something like libc.so.6 => /lib/i386-linux-gnu/libc.so.6 or similar. If not: rebuild. If yes: rebuilding could help anyway, as there are many other factors.
Besides this, I think memcpy is thread-safe as long as you never write back data (even writing back unmodified data will hurt: https://blogs.oracle.com/dave/entry/memcpy_concurrency_curiosities).
2nd: If it turns out that you have to use a modified memcpy, also think about LD_PRELOAD.
In general, you must use a critical section, mutex, or some other protection technique to keep multiple threads from simultaneously calling functions that are not thread-safe (not reentrant). Some ANSI C implementations of memcpy() are thread-safe, some are not.
Writing functions that are thread-safe, and/or writing threaded programs that can safely accommodate non-thread-safe functions, is a substantial topic. Very doable, but it requires reading up on the subject; there is much written. This will at least help you start asking the right questions.

shared_ptr Assertion px != 0 failed

I have a fairly complex multi threaded application (server) that from time to time will crash due to an assert:
/usr/include/boost/smart_ptr/shared_ptr.hpp:418: T* boost::shared_ptr< <template-parameter-1-1> >::operator->() const [with T = msg::Player]: Assertion `px != 0' failed.
I have been unable to pinpoint the cause and was wondering if this is a problem with boost::shared_ptr or it is me?
I tried g++ 4.4.3-4ubuntu5 and llvm-g++ (GCC) 4.2.1 with optimization and without optimization and libboost1.40-dev (= 1.40.0-4ubuntu4).
There should be no problem with using boost::shared_ptr as long as you initialize your shared pointers correctly and use the same memory management context for all your shared object libraries.
In your case you are trying to use an uninitialized shared pointer.
boost::shared_ptr<Obj> obj;
obj->Something(); // assertion failed

boost::shared_ptr<Obj> obj(new Obj);
obj->Something(); // ok
I would advise initializing them right at declaration whenever possible. Exception handling can create a lot of "invisible" paths for the code to run, and it might be quite difficult to identify the uninitialized shared pointers.
PS: There are other issues if you load/unload modules where shared_ptrs are in use, leading to chaos. This is very hard to solve, but in that case you would have a non-zero pointer, so it is not what is happening to you right now.
PPS: The pattern used is called RAII (Resource Acquisition Is Initialization)
You might want to make sure that you always use a named smart pointer variable to hold the result of new, as recommended here: boost::shared_ptr - Best Practices.
Here's to reviving an ancient question. I just hit this, and it was due to a timing issue. I was trying to use the shared_ptr from one thread before I'd finished initializing it in another.
So if someone hits the above message, check your timing to ensure your shared_ptr has been initialized.
