Define a struct pointer as threadprivate in OpenMP - c

From C I am calling a piece of Fortran code that then calls some other C code. In order to call the last bit of C code, I need to have two global pointers to an EarthModel struct and a SurveyGeometry struct that I have defined. I have tried to parallelize the for loop below in calcGreen.c, but have been unsuccessful with more than 1 thread (the program segfaults).
I need each thread to have its own pointer to different EarthModels and SurveyGeometrys while keeping the global definition. I tried using the omp threadprivate directive to give each thread its own struct pointer which it can allocate and free and maintain the global definition on the thread level. I have also read that the default stack is 2M for created threads, so I've tried giving the threads more memory by setting the environment variable with export OMP_STACKSIZE=512M (and higher), but the segfault persists.
shared.h
extern EarthModel *g_em;
extern SurveyGeometry *g_sg;
#pragma omp thradprivate(g_em, g_sg)
util.h
#include "shared.h"
EarthModel *g_em;
SurveyGeometry *g_sg;
calcGreen.c
#include "util.h"
...
omp_set_num_threads(2);
#pragma omp parallel for schedule(dynamic,1)
for(int ii=0; ii<nseg; ++ii){
    for(int jj=0; jj<nseg; ++jj){
        ...
        // code to allocate and initialize g_sg and g_em
        g_sg = initSG();
        g_em = initEM();
        // code to pass through to Fortran and execute C function on g_sg and g_em
        // code to free g_sg and g_em
        freeSG(g_sg);
        freeEM(g_em);
        ...
    }
}
...
EDIT: Alternatively, is there a thread-safe way of getting the structs g_sg and g_em from the first C function, where they are allocated and set, to the C function that Fortran calls, without using global variables?

Not entirely sure why this worked, but spelling "threadprivate" correctly AND moving the #pragma omp threadprivate directive to util.h seems to have done the trick. The first is unsurprising, but the second isn't intuitive to me. Thank you for the help.
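A minimal sketch of the working arrangement, under some assumptions: the misspelled pragma was apparently ignored by the compiler, so the pointers stayed shared between threads, and the OpenMP rule is that a threadprivate directive must follow the variable's declaration and be present in every translation unit that declares the variable. The EarthModel layout, use_model() and run_segments() below are hypothetical stand-ins, not the original code, and everything is kept in one file for brevity.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the real EarthModel struct. */
typedef struct { double depth; } EarthModel;

EarthModel *g_em;                /* global definition */
#pragma omp threadprivate(g_em)  /* after the declaration, before any use */

/* Stand-in for the C routine reached through Fortran: it reads the global. */
static double use_model(void)
{
    return g_em->depth * 2.0;
}

/* Fills out[0..nseg-1]; returns the number of failed allocations. */
int run_segments(int nseg, double *out)
{
    int failed = 0;
    #pragma omp parallel for schedule(dynamic,1) reduction(+:failed)
    for (int ii = 0; ii < nseg; ++ii) {
        g_em = malloc(sizeof *g_em);   /* each thread sets its own copy */
        if (!g_em) {
            failed++;
            continue;
        }
        g_em->depth = (double)ii;
        out[ii] = use_model();
        free(g_em);
        g_em = NULL;
    }
    return failed;
}
```

Built without -fopenmp the pragmas are ignored and the loop simply runs serially, which makes it easy to compare results between the serial and the parallel build.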

If Harald's comment does not already solve the problem, some suggestions:
1) If it is allowed to change the source code of calcGreen.c and if each thread does not use the pointers before they are (re-?)allocated and (re-?)initialized by calling initSG() and initEM(), I would declare them as local variables inside the inner for-loop.
2) Are the implementations of initSG(), initEM(), freeSG() and freeEM() thread-safe and reentrant?
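Both suggestions can be sketched together, assuming the initializer is not reentrant: the pointer is a local variable inside the loop, and the call to the initializer is serialized with a named critical section. SurveyGeometry's layout, initSG_unsafe() and sum_ids() are hypothetical and only illustrate the pattern.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the real SurveyGeometry struct. */
typedef struct { int id; } SurveyGeometry;

/* Shared counter: this is what makes the initializer non-reentrant. */
static int next_id;

static SurveyGeometry *initSG_unsafe(void)
{
    SurveyGeometry *sg = malloc(sizeof *sg);
    if (sg)
        sg->id = next_id++;
    return sg;
}

/* Creates n geometries and returns the sum of their ids. */
long sum_ids(int n)
{
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        SurveyGeometry *sg;            /* local: one pointer per iteration */
        #pragma omp critical(sg_init)  /* only one thread at a time in here */
        sg = initSG_unsafe();
        if (sg) {
            sum += sg->id;
            free(sg);
        }
    }
    return sum;
}
```

If the initializers turn out to be thread-safe already, the critical section is unnecessary and only costs parallelism.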

Related

changing extern function pointer to extern pointer using preprocessor

I am using a library whose files I shouldn't change, and which includes my header file.
The code of the library looks something like this:
#include "my_file"
extern void (*some_func)();
void foo()
{
(some_func)();
}
My problem is that I want some_func to be an extern function rather than an extern pointer to a function (I am implementing and linking some_func), and that is how main will call it.
That way I will save a little run time and code space, and no one will change this global by mistake.
Is it possible?
I thought about adding something like this to my_file.h:
#define *some_func some_func
but it won't compile because an asterisk is not allowed in a macro name.
EDIT
The library is not compiled yet, so changes to my_file.h will affect the compilation.
First of all, you say that you can't change the source of the library. Well, this is bad, and some "betrayal" is necessary.
My approach is to leave the declaration of the pointer some_func as is, a non-constant writable variable, but to define it as a constant non-writable variable, which is initialized once and for all with the wanted address.
Here comes the minimal, reproducible example.
The library is implemented as you show us:
// lib.c
#include "my_file"
extern void (*some_func)();
void foo()
{
(some_func)();
}
Since you have this include file in the library's source, I provide one. But it is empty.
// my_file
I use a header file that declares the public API of the library. This file still has the writable declaration of the pointer, so that offenders believe they can change it.
// lib.h
extern void (*some_func)();
void foo();
I separated an offending module to try the impossible. It has a header file and an implementation file. In the source the erroneous assignment is marked, already revealing what will happen.
// offender.h
void offend(void);
// offender.c
#include <stdio.h>
#include "lib.h"
#include "offender.h"
static void other_func()
{
puts("other_func");
}
void offend(void)
{
some_func = other_func; // the assignment gives a run-time error
}
The test program consists of this little source. To avoid compiler errors, the declaration has to be qualified as const. Here, where we include the declaring header file, we can use some preprocessor magic.
// main.c
#include <stdio.h>
#define some_func const some_func
#include "lib.h"
#undef some_func
#include "offender.h"
static void my_func()
{
puts("my_func");
}
void (* const some_func)() = my_func;
int main(void)
{
foo();
offend();
foo();
return 0;
}
The trick is that the compiler places the pointer variable in the read-only section of the executable. The const qualifier is only used by the compiler and is not stored in the intermediate object files, so the linker happily resolves all references. Any write access to the variable will generate a runtime error.
Now all of this is compiled in an executable, I used GCC on Windows. I did not bother to create a separated library, because it doesn't make a difference for the effect.
gcc -Wall -Wextra -g main.c offender.c lib.c -o test.exe
If I run the executable in "cmd", it just prints "my_func". Apparently the second call of foo() is never executed. The ERRORLEVEL is -1073741819, which is 0xC0000005. Looking up this code gives the meaning "STATUS_ACCESS_VIOLATION", on other systems known as "segmentation fault".
Because I deliberately compiled with the debugging flag -g, I can use the debugger to examine more deeply.
d:\tmp\StackOverflow\103> gdb -q test.exe
Reading symbols from test.exe...done.
(gdb) r
Starting program: d:\tmp\StackOverflow\103\test.exe
[New Thread 12696.0x1f00]
[New Thread 12696.0x15d8]
my_func
Thread 1 received signal SIGSEGV, Segmentation fault.
0x00000000004015c9 in offend () at offender.c:16
16 some_func = other_func;
Alright, as I intended, the assignment is blocked. However, the reaction of the system is quite harsh.
Unfortunately we cannot get a compile-time or link-time error. This is because of the design of the library, which is fixed, as you say.
You could look at the ifunc attribute if you are using GCC or related. It should patch a small trampoline at load time. So when calling the function, the trampoline is called with a known static address and then inside the trampoline there is a jump instruction that was patched with the real address. So when running, all jump locations are directly in the code, which should be efficient with the instruction cache. Note that it might even be more efficient than this, but at most as bad as calling the function pointer. Here is how you would implement it:
extern void (*some_func)(void); // declared in the header you do not control
void some_func_resolved(void) __attribute__((ifunc("resolve_some_func")));
static void (*resolve_some_func(void)) (void)
{
return some_func;
}
// call some_func_resolved instead now

OpenMP and math.h

I've been experimenting with openmp and some math functions in C. If I try to declare and initialize some variables outside a parallel construct, then use them within a math function inside the parallel, when I compile using gcc -fopenmp practice.c -o practice I get the following error:
/usr/bin/ld: /tmp/ccQj4iIQ.o: in function `main._omp_fn.0':
practice.c:(.text+0xb3): undefined reference to `fmax'
collect2: error: ld returned 1 exit status
This issue happens with fmax, fmin, sqrt, pow, cos, etc. Some sample code that illustrates this is:
#include <omp.h>
#include <math.h>
void main(void){
double m=1;
double a=12;
#pragma omp parallel
{
m = fmax(m,a);
}
}
I've found that the issue goes away if I 1) move the fmax outside the parallel, or 2) re-initialize the variables inside the parallel, or 3) use fmax on 1 and 12 directly instead of m and a. This issue also does NOT occur if I simply try to use printf to print m and a inside the parallel, so I know each thread can "see" the values correctly.
Why is this happening, and is there a way to fix it other than the 3 things I've already tried? So far it seems like 2) is my best bet, but it seems silly to have to do initialization immediately inside the parallel when it would make more sense to do it beforehand.

OpenMP and Thread Local Storage identifier with icc

This is a simple test code:
#include <stdlib.h>
__thread int a = 0;
int main() {
#pragma omp parallel default(none)
{
a = 1;
}
return 0;
}
gcc compiles this without any problems with -fopenmp, but icc (ICC) 12.0.2 20110112 with -openmp complains with
test.c(7): error: "a" must be specified in a variable list at enclosing OpenMP parallel pragma
#pragma omp parallel default(none)
I have no clue which paradigm (i.e. shared, private, threadprivate) applies to this type of variables. Which one is the correct one to use?
I get the expected behaviour when calling a function that accesses that thread local variable, but I have trouble accessing it from within an explicit parallel section.
Edit:
My best solution so far is to return a pointer to the variable through a function
static inline int * get_a() { return &a; }
__thread is roughly analogous to the effect that the threadprivate OpenMP directive has. To a great extent (read as when no C++ objects are involved), both are often implemented using the same underlying compiler mechanism and therefore are compatible but this is not guaranteed to always work. Of course, the real world is far from ideal and we have to sometimes sacrifice portability for just having things working within the given development constraints.
threadprivate is a directive and not a clause, therefore you have to do something like:
#include "header_providing_a.h"
#pragma omp threadprivate(a)
void parallel_using_a()
{
#pragma omp parallel default(none) ...
... use 'a' here
}
GCC (at least version 4.7.1) treats __thread as implicit threadprivate declaration and you don't have to do anything.
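A compilable version of the pattern above might look as follows. It is a sketch assuming a compiler that accepts the __thread extension (GCC and Clang do); set_in_parallel() and get_a() are hypothetical names. Because threadprivate variables have a predetermined data-sharing attribute, `a` does not need to appear in any clause even with default(none).

```c
/* Thread-local variable, additionally declared threadprivate for OpenMP. */
static __thread int a = 0;
#pragma omp threadprivate(a)

/* Each thread writes its own copy of 'a'; returns the number of threads
 * that took part (1 when compiled without OpenMP support). */
int set_in_parallel(void)
{
    int total = 0;
    #pragma omp parallel default(none) reduction(+:total)
    {
        a = 1;        /* no clause needed: 'a' is threadprivate */
        total += a;
    }
    return total;
}

/* Reads the calling thread's copy of 'a'. */
int get_a(void)
{
    return a;
}
```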

How to run constructor even if "-nostdlib" option is defined

I have a dynamic library that contains a constructor.
__attribute__ ((constructor))
void construct() {
// This is initialization code
}
The library is compiled with -nostdlib option and I cannot change that. As a result there are no .ctor and .dtor sections in library and the constructor is not running on the library load.
As written there, there should be special measures that allow the constructor to run even in this case. Could you please advise me what they are and how this can be done?
Why do you need constructors? Most programmers I work with, myself included, refuse to use libraries with global constructors because all too often they introduce bugs by messing up the program's initial state when main is entered. One concrete example I can think of is OpenAL, which broke programs when it was merely linked, even if it was never called. I was not the one on the project who dealt with this bug, but if I'm not mistaken it had something to do with mucking with ALSA and breaking the main program's use of ALSA later.
If your library has nontrivial global state, instead see if you can simply use global structs and initializers. You might need to add flags with some pointers to indicate whether they point to allocated memory or static memory, though. Another method is to defer initialization to the first call, but this can have thread-safety issues unless you use pthread_once or similar.
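The deferred-initialization variant with pthread_once could be sketched like this. The names lib_init(), lib_get_value(), lib_init_count() and the state they manage are hypothetical; the point is only that every public entry point goes through pthread_once, so the initializer runs exactly once no matter which thread calls first.

```c
#include <pthread.h>

static pthread_once_t lib_once = PTHREAD_ONCE_INIT;
static int lib_state;   /* hypothetical library state */
static int init_calls;  /* counts how often the initializer ran */

/* Takes the place of the __attribute__((constructor)) function. */
static void lib_init(void)
{
    lib_state = 42;
    init_calls++;
}

/* Public entry point: initializes lazily on first use. */
int lib_get_value(void)
{
    pthread_once(&lib_once, lib_init);
    return lib_state;
}

/* Exposed only to show that initialization happened once. */
int lib_init_count(void)
{
    return init_calls;
}
```

On older systems this may need to be linked with -pthread.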
Hmm, I missed the part that there were no .ctor and .dtor sections... forget about this.
#include <stdio.h>
#include <stdint.h>
typedef void (*func)(void);
__attribute__((constructor))
void func1(void) {
printf("func1\n");
}
__attribute__((constructor))
void func2(void) {
printf("func2\n");
}
extern func* __init_array_start;
int main(int argc, char **argv)
{
    func *funcarr = (func*)&__init_array_start;
    func f;
    int idx;
    printf("start %p\n", *funcarr);
    // iterate over the array
    for (idx = 0; ; ++idx) {
        f = funcarr[idx];
        // skip the end of array marker (0xFFFFFFFF) on 64 bit it's twice as long ;)
        if (f == (void*)~0)
            continue;
        // till f is NULL which indicates the start of the array
        if (f == NULL)
            break;
        printf("constructor %p\n", *f);
        f();
    }
    return 0;
}
Which gives:
Compilation started at Fri Mar 9 09:28:29
make test && ./test
cc test.c -o test
func2
func1
start 0xffffffff
constructor 0x80483f4
func1
constructor 0x8048408
func2
You probably need to swap the continue and break if you are running on a big-endian system, but I'm not entirely sure.
But just like R.. stated using static constructors in libraries is not so nice to the developers using your library :p
On some platforms, .init_array/.fini_array sections are generated to include all global constructors/destructors. You may use that.

Function pointer location not getting passed

I've got some C code I'm targeting for an AVR. The code is being compiled with avr-gcc, basically the gnu compiler with the right backend.
What I'm trying to do is create a callback mechanism in one of my event/interrupt driven libraries, but I seem to be having some trouble keeping the value of the function pointer.
To start, I have a static library. It has a header file (twi_master_driver.h) that looks like this:
#ifndef TWI_MASTER_DRIVER_H_
#define TWI_MASTER_DRIVER_H_
#define TWI_INPUT_QUEUE_SIZE 256
// define callback function pointer signature
typedef void (*twi_slave_callback_t)(uint8_t*, uint16_t);
typedef struct {
uint8_t buffer[TWI_INPUT_QUEUE_SIZE];
volatile uint16_t length; // currently used bytes in the buffer
twi_slave_callback_t slave_callback;
} twi_global_slave_t;
typedef struct {
uint8_t slave_address;
volatile twi_global_slave_t slave;
} twi_global_t;
void twi_init(uint8_t slave_address, twi_global_t *twi, twi_slave_callback_t slave_callback);
#endif
Now the C file (twi_driver.c):
#include <stdint.h>
#include "twi_master_driver.h"
void twi_init(uint8_t slave_address, twi_global_t *twi, twi_slave_callback_t slave_callback)
{
twi->slave.length = 0;
twi->slave.slave_callback = slave_callback;
twi->slave_address = slave_address;
// temporary workaround <- why does this work??
twi->slave.slave_callback = twi->slave.slave_callback;
}
void twi_slave_interrupt_handler(twi_global_t *twi)
{
(twi->slave.slave_callback)(twi->slave.buffer, twi->slave.length);
// some other stuff (nothing touches twi->slave.slave_callback)
}
Then I build those two files into a static library (.a) and construct my main program (main.c)
#include
#include
#include
#include
#include "twi_master_driver.h"
// ...define microcontroller safe way for mystdout ...
twi_global_t bus_a;
ISR(TWIC_TWIS_vect, ISR_NOBLOCK)
{
twi_slave_interrupt_handler(&bus_a);
}
void my_callback(uint8_t *buf, uint16_t len)
{
uint8_t i;
fprintf(&mystdout, "C: ");
for(i = 0; i < len; i++)
{
fprintf(&mystdout, "%d,", buf[i]);
}
fprintf(&mystdout, "\n");
}
int main(int argc, char **argv)
{
twi_init(2, &bus_a, &my_callback);
// ...PMIC setup...
// enable interrupts.
sei();
// (code that causes interrupt to fire)
// spin while the rest of the application runs...
while(1){
_delay_ms(1000);
}
return 0;
}
I carefully trigger the events that cause the interrupt to fire and call the appropriate handler. Using some fprintfs I'm able to tell that the location assigned to twi->slave.slave_callback in the twi_init function is different than the one in the twi_slave_interrupt_handler function.
Though the numbers are meaningless, in twi_init the value is 0x13b, and in twi_slave_interrupt_handler when printed the value is 0x100.
By adding the commented workaround line in twi_driver.c:
twi->slave.slave_callback = twi->slave.slave_callback;
The problem goes away, but this is clearly a magic and undesirable solution. What am I doing wrong?
As far as I can tell, I've marked appropriate variables volatile, and I've tried marking other portions volatile and removing the volatile markings. I came up with the workaround when I noticed removing fprintf statements after the assignment in twi_init caused the value to be read differently later on.
The problem seems to be with how I'm passing around the function pointer -- and notably the portion of the program that is accessing the value of the pointer (the function itself?) is technically in a different thread.
Any ideas?
Edits:
resolved typos in code.
links to actual files: http://straymark.com/code/ [test.c|twi_driver.c|twi_driver.h]
fwiw: compiler options: -Wall -Os -fpack-struct -fshort-enums -funsigned-char -funsigned-bitfields -mmcu=atxmega128a1 -DF_CPU=2000000UL
I've tried the same code included directly (rather than via a library) and I've got the same issue.
Edits (round 2):
I removed all the optimizations, without my "workaround" the code works as expected. Adding back -Os causes an error. Why is -Os corrupting my code?
Just a hunch, but what happens if you switch these two lines around:
twi->slave.slave_callback = slave_callback;
twi->slave.length = 0;
Does removing the -fpack-struct gcc flag fix the problem? I wonder if you haven't stumbled upon a bug where writing that length field is overwriting part of the callback value.
It looks to me like with the -Os optimisations on (you could try combinations of the individual optimisations enabled by -Os to see exactly which one is causing it), the compiler isn't emitting the right code to manipulate the uint16_t length field when it's not aligned on a 2-byte boundary. This happens when you include a twi_global_slave_t inside a twi_global_t that is packed, because the initial uint8_t member of twi_global_t causes the twi_global_slave_t struct to be placed at an odd address.
If you make that initial field of twi_global_t a uint16_t it will probably fix it (or you could turn off struct packing). Try the latest gcc build and see if it still happens - if it does, you should be able to create a minimal test case that shows the problem, so you can submit a bug report to the gcc project.
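The layout argument can be checked with offsetof. The sketch below mirrors the structs from the question on a host compiler, using __attribute__((packed)) as a stand-in for -fpack-struct; the type and function names are hypothetical, and it only demonstrates the offsets, not the suspected code-generation bug.

```c
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SIZE 256

typedef void (*callback_t)(uint8_t *, uint16_t);

/* Packed variants: no padding between members. */
typedef struct __attribute__((packed)) {
    uint8_t buffer[QUEUE_SIZE];
    volatile uint16_t length;
    callback_t slave_callback;
} packed_slave_t;

typedef struct __attribute__((packed)) {
    uint8_t slave_address;
    packed_slave_t slave;
} packed_bus_t;

/* Same members with the compiler's natural alignment. */
typedef struct {
    uint8_t buffer[QUEUE_SIZE];
    volatile uint16_t length;
    callback_t slave_callback;
} natural_slave_t;

typedef struct {
    uint8_t slave_address;
    natural_slave_t slave;
} natural_bus_t;

/* Byte offset of 'length' from the start of the packed bus struct. */
size_t packed_length_offset(void)
{
    return offsetof(packed_bus_t, slave) + offsetof(packed_slave_t, length);
}

/* Byte offset of 'length' from the start of the naturally aligned struct. */
size_t natural_length_offset(void)
{
    return offsetof(natural_bus_t, slave) + offsetof(natural_slave_t, length);
}
```

In the packed layout the one-byte slave_address followed by the 256-byte buffer puts length at offset 257, an odd offset; the naturally aligned layout keeps it on an even offset.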
This really sounds like a stack/memory corruption issue. If you run avr-size on your elf file, what do you get? Make sure (data + bss) < the RAM you have on the part. These types of issues are very difficult to track down. The fact that removing/moving unrelated code changes the behavior is a big red flag.
Replace "&my_callback" with "my_callback" in function main().
Because different threads access the callback address, try protecting it with a mutex or read-write lock.
If the callback function pointer isn't accessed by a signal handler, then the "volatile" qualifier is unnecessary.

Resources