I'm having trouble identifying why valgrind is throwing this error:
==82185== Thread 2:
==82185== Use of uninitialised value of size 8
==82185== at 0x401B9A: proc_outconnection_thread (station.c:401)
==82185== by 0x4E3CDF4: start_thread (in /usr/lib64/libpthread-2.17.so)
==82185== by 0x51471AC: clone (in /usr/lib64/libc-2.17.so)
==82185==
the pass im sending is 'this'
==82185== Use of uninitialised value of size 8
==82185== at 0x401BCA: proc_outconnection_thread (station.c:403)
==82185== by 0x4E3CDF4: start_thread (in /usr/lib64/libpthread-2.17.so)
==82185== by 0x51471AC: clone (in /usr/lib64/libc-2.17.so)
==82185==
As a bit of background, the program I'm writing in C simulates a train station that uses TCP connections as "trains". I'm trying to get the program to use threads so it can both listen for connections and attempt to connect to other stations (other instances of the program).
The problem seems to occur when passing the internal data struct to the thread-creation function via an argument struct that contains a pointer to it. This way each thread has a pointer to the program's internal data.
For testing, the file is compiled with
gcc -pthread -g -o station station.c -Wall -pedantic -std=gnu99
To reproduce the error, start one instance of station with valgrind ./station tom authf logfile 3329 127.0.1.1
and then start another instance with valgrind ./station tim authf logfile 3328 127.0.1.1
Due to an if statement in main, the station named tim will attempt to connect to tom, and tom will create a socket and listen for tim's connection attempt. The connection seems to be successful; however, I'm also unable to flush the connection to send anything between the two, which I suspect is related to what valgrind is telling me.
What's strange is that when a thread is created for the connection on tom's instance, valgrind throws no errors, despite a very similar procedure for creating the thread (the same arguments are passed through the argument pointer and the same assignments are made).
Could it be a false positive on tim's end, or am I doing something severely wrong here?
Your problem is that you are passing a pointer to a local variable into the thread function. The simplest workaround is to declare the variable static or global, but that is no good if several threads use it.
It's better to allocate memory for the structure, initialize it, and pass that to the thread function:
ArgStruct *argStruct = malloc(sizeof(ArgStruct));
if (argStruct == NULL) {
    fprintf(stderr, "Can't alloc memory!\n");
    exit(98);
}
argStruct->internalStruct = internal;
argStruct->clientCon = fdopen(fd, "r+");
pthread_create(&threadId, NULL, proc_outconnection_thread, (void *)argStruct);
Also, don't forget to free() this memory (at the end of proc_outconnection_thread(), for example).
Track the value of your internal data structure back to where it comes from and you will see that it originates from a struct object that is never initialized. You later assign values to some of the fields, but not to all of them.
Always initialize struct objects, and adopt a convention that makes it clear what default initialization (as if done with 0) means for each type.
If, one day, you really hit a performance bottleneck because your compiler doesn't optimize away an unused initialization, revisit it then and do it differently. Here, because you are launching threads and doing other complicated work, the difference will never be measurable.
For a project written in C, I am creating a number of semaphores with sem_open(); some are binary, others are counting, though I don't think it matters. The semaphores are stored in a static structure (a singleton). I then fork the process multiple times and afterwards exit all my forks. The main process returns only after I have used sem_close() followed by sem_unlink() on all my semaphores.
void init_semaphore(void)
{
    ru->stop = sem_open("/stop", O_CREAT, 0644, 1);
    ...
}

void sem_close_all(void)
{
    sem_close(ru->stop);
    ...
    sem_unlink("/stop");
}
When I run valgrind I get the following errors:
==744644== 39 bytes in 1 blocks are still reachable in loss record 10 of 13
==744644== at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==744644== by 0x49107A8: __sem_check_add_mapping (sem_routines.c:104)
==744644== by 0x49104BB: sem_open##GLIBC_2.34 (sem_open.c:192)
==744644== by 0x10A34F: init_semaphore
==744644== by 0x10A0A9: init_all
==744644== by 0x10A05E: main
So I don't understand how to avoid these leaks, which (I think) happen in my children. Note: I can't use sem_destroy() because of a project restriction.
Am I doing something wrong? I tried to use free() on sem_t *stop, but it didn't get rid of the leaks. I also tried calling my sem_close_all() function at the end of every process, but that does not fix the valgrind leak report. Is there a solution, or is this a valgrind false positive?
I would not worry too much about it. Valgrind suppresses many other known memory leaks.
The error-checking tools detect numerous problems in the system
libraries, such as the C library, which come preinstalled with your
OS. You can't easily fix these, but you don't want to see these errors
(and yes, there are many!) So Valgrind reads a list of errors to
suppress at startup. A default suppression file is created by the
./configure script when the system is built.
You can manually add a suppression for this yourself, permanently.
tried to use free() on sem_t *stop
Do not do this. The sem_t returned by sem_open() is owned by the C library; it must only be released with sem_close(), never passed to free().
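If the still-reachable report bothers you, you can write a suppression entry for it. This is a sketch based on the trace in the question; the exact frame names must come from your own valgrind output (run with --gen-suppressions=all to have valgrind print them for you):

```
{
   sem_open_mapping_still_reachable
   Memcheck:Leak
   match-leak-kinds: reachable
   fun:malloc
   fun:__sem_check_add_mapping
   fun:sem_open*
}
```

Save it as, say, sem.supp and run valgrind --suppressions=sem.supp ./yourprog.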
I had an exercise for class a few weeks ago; my solution was good, but I noticed some weird behaviour when observing it for a longer time.
The exercise was to generate a deadlock with two POSIX threads and then resolve it. (I abstracted the solution so it has no unnecessary code.)
The scenario is the following:
I have two threads who share two fictional resources
both threads start in sequence and then try to occupy both resources (in sequence too)
both threads have different time spans for occupying
when a thread has both resources it works for 5 seconds, then frees the resources and takes a break; when the break is over it starts trying to occupy both resources again
every 8 seconds a function checks whether both threads are in the waiting state (each thread holds ONE resource and is waiting for the second)
when a deadlock occurs, the thread that has worked more is cancelled and then restarted
Here comes the problem: depending on the machine and the compiler flags, the output says that e.g. thread A is cancelled but then thread B is started. I tried it on different computers with different compilers and different installations.
Weirder still: I compile with gcc -Wall -Werror -ansi -pedantic -D_POSIX_C_SOURCE=200809L -pthread -lrt and the problem occurs with the second deadlock, but when I remove -Wall and -Werror it comes with the third deadlock.
I uploaded the source here. Compile flags are in the source, I tried gcc and clang.
And I also tried Ubuntu 13.04 and Arch.
Here is the output, I marked the lines with "-->"
Did I forget something that makes this effect appear? I don't think there are bugs in the libraries.
The problem is that you are passing the address of a local variable to the thread, and that local variable may no longer exist when the thread starts. You are then dereferencing the address that used to hold the local variable but now holds something else.
Since it is in the stack space of the program you aren't getting a segfault.
Here's a highlight of the problem areas of code and how it can be caused:
void resolve_deadlock()
{
    void *pthread_exit_state;
    int id_a = THREAD_A;
    int id_b = THREAD_B;

    <some code to detect deadlocks and kill a thread>

    /* restart the killed thread */
    if (pthread_create(&threads[THREAD_B], NULL, &thread_function, (void *) &id_b) != 0) {
        perror("Create THREAD_B\n");
        exit(EXIT_FAILURE);
    }
}
So the program runs and:
resolve_deadlock is called
thread X is killed
pthread_create is called to create a thread
resolve_deadlock function ends
stack is overwritten by the next function call
The OS swaps us out and runs another thread
thread X runs and dereferences our local variable, which no longer exists -> undefined behaviour.
When I use OpenCL to process many chunks of data, it crashes in the 7th iteration.
I ensure that memory is released before each iteration of the loop and allocated again for the new chunk, but the crash still occurs, with error -38 from clEnqueueWriteBuffer().
I have tried a lot but am not getting anywhere.
The following is the flow of my code :
clGetPlatformIDs
clGetDeviceIDs
clCreateContext
clCreateCommandQueue
clCreateProgramWithSource
clBuildProgram
clCreateKernel
for(x){
    clCreateBuffer
    clEnqueueWriteBuffer
    clSetKernelArg
    clEnqueueNDRangeKernel
    clFinish
    clEnqueueMapBuffer
    clReleaseMemObject
}
Is this correct, or do I have to do it another way?
If so, what am I doing wrong?
Some code and the specific command where this error comes up would be nice.
Error -38 is CL_INVALID_MEM_OBJECT
Please check whether you initialised all memory objects correctly.
Could you explicitly check the return value of clCreateBuffer, clCreateImage, or whatever you are using? This error can also occur if the buffer you provide to your kernel doesn't match its parameter definition in terms of type or read/write modifiers.
EDIT to match the edited question:
1) You can change a kernel arg while the kernel is not running, but good practice is to set a kernel arg only once (ideally directly after clCreateKernel).
Even better is to reuse the assigned buffer. (Or create several kernels if you use the same buffer combinations several times)
In your case I would at least do createBuffer and setKernelArg before the loop and releaseMemObject after the loop.
2) You are calling clEnqueueMapBuffer on your mem object. This should be followed by clEnqueueUnmapMemObject when you are done interacting with the object. If you just want to read data from your buffer, try clEnqueueReadBuffer as the counterpart to clEnqueueWriteBuffer.
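Applied to the flow in the question, the restructured call order would look roughly like this (a sketch of the sequence, not compilable code):

```
clGetPlatformIDs
clGetDeviceIDs
clCreateContext
clCreateCommandQueue
clCreateProgramWithSource
clBuildProgram
clCreateKernel
clCreateBuffer        /* once, reused for every chunk */
clSetKernelArg        /* once, right after kernel creation */
for(x){
    clEnqueueWriteBuffer
    clEnqueueNDRangeKernel
    clFinish
    clEnqueueMapBuffer
    /* ...read results... */
    clEnqueueUnmapMemObject
}
clReleaseMemObject    /* once, after the loop */
```

This both avoids the repeated allocate/release churn that can leave a stale mem object bound to the kernel, and pairs every map with an unmap.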
I am writing a stock market system that uses several threads to process incoming orders.
The project was going fine until I added one more thread. When I launch that thread my program segfaults; the segfault is generated in that thread by an invalid memory read.
This segfault is generated only when the program is compiled with optimization -O2 and above.
After compiling the program with debug info using -g3 and running valgrind with
valgrind ./marketSim
and get the following output about the segfault
==2524== Thread 5:
==2524== Invalid read of size 4
==2524== at 0x402914: limitWorker (limit.c:4)
==2524== by 0x4E33D5F: start_thread (in /lib/libpthread-2.14.so)
==2524== Address 0x1c is not stack'd, malloc'd or (recently) free'd
==2524==
==2524==
==2524== Process terminating with default action of signal 11 (SIGSEGV)
==2524== Access not within mapped region at address 0x1C
==2524== at 0x402914: limitWorker (limit.c:4)
==2524== by 0x4E33D5F: start_thread (in /lib/libpthread-2.14.so)
The thread is launched like this
pthread_t limit_thread;
pthread_create(&limit_thread, NULL, limitWorker, q);
q is a variable which is also passed to the other threads I initialize
the limitWorker code is as follows
void *limitWorker(void *arg)
{
    while (1) {
        if ((!lsl->empty) && (!lbl->empty)) {
            if ((currentPriceX10 > lGetHead(lsl)->price1) && (currentPriceX10 < lGetHead(lbl)->price1)) {
                llPairDelete(lsl, lbl);
            }
        }
    }
    return NULL;
}
Line 4, which according to valgrind produces the segfault, is the function header: void *limitWorker(void *arg){
Some more info: this is compiled with gcc 4.6.1. With gcc 4.1.2 the program doesn't segfault even when optimized, although its performance is much worse.
When the program is compiled using clang it also doesn't segfault when optimized.
Question
Am I making a mistake? Is it a gcc bug? What course of action should I follow?
If you want to take a look at the code the github page is https://github.com/spapageo/Stock-Market-Real-Time-System/
The code in question is in file marketSim.c and limit.c
EDIT: Valgrind specifies that the invalid read happens at line 4, which is the "head" of the function. I don't know compiler internals, so my naive thought is that the argument is wrong. BUT when using gdb after the segfault, the argument is optimized out (because the program is optimized), according to gdb. So I don't think that is the culprit.
If you are compiling for a 64 bit system, then 0x1c is the offset of the price1 field within the order struct. This implies that either (or both) of lsl->HEAD and lbl->HEAD are NULL pointers when the fault occurs.
Note that because your limitWorker() function includes no thread synchronisation outside of the llPairDelete() function, it is incorrect, and the compiler may not be reloading those values on every iteration of the loop. You should be using a mutex to protect the linked lists even on the read-only paths.
Additionally, your lsl and lbl variables are multiply defined. You should declare them as extern in limit.h, and define them without the extern in limit.c.
I've done a lot of programming but not much in C, and I need advice on debugging. I have a static variable (file scope) that is being clobbered after about 10-100 seconds of execution of a multithreaded program (using pthreads on OS X 10.4). My code looks something like this:
static float some_values[SIZE];
static int * addr;
addr points to a valid memory address for a while, and then gets clobbered with some value (sometimes 0, sometimes nonzero), causing a segfault when dereferenced. Poking around with gdb I have verified that addr is laid out in memory immediately after some_values, as one would expect, so my first guess was that I had used an out-of-bounds index to write to some_values. However, this is a tiny file, so it is easy to check that this is not the problem.
The obvious debugging technique would be to set a watchpoint on the variable addr. But doing so seems to create erratic and inexplicable behavior in gdb. The watchpoint gets triggered at the first assignment to addr; then after I continue execution, I immediately get a nonsensical segfault in another thread: supposedly a segfault on accessing the address of a static variable in a different part of the program. But then gdb lets me read from and write to that memory address interactively.
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x001d5bd0
0x0000678d in receive (arg=0x0) at mainloop.c:39
39 sample_buf_cleared ++;
(gdb) p &sample_buf_cleared
$17 = (int *) 0x1d5bd0
(gdb) p sample_buf_cleared
$18 = 1
(gdb) set sample_buf_cleared = 2
(gdb)
gdb is obviously confused. Does anyone know why? Or does anyone have any suggestions for debugging this bug without using watchpoints?
You could put an array of uints between some_values and addr to determine whether you are overrunning some_values, or whether the corruption affects more addresses than you first thought. I would initialize the padding to 0xDEADBEEF or some other obvious pattern that is easy to distinguish and unlikely to occur in the program. If a value in the padding changes, reinterpret it as a float and see if the number makes sense as a float.
static float some_values[SIZE];
static unsigned int padding[1024];
static int * addr;
Run the program multiple times. In each run disable a different thread and see when the problems goes away.
Set the program's process affinity to a single core and then try the watchpoint. You may have better luck if you don't have two threads simultaneously modifying the value. Note: this does not prevent that from happening; it may just make the problem easier to catch in a debugger.
static variables and multi-threading generally do not mix.
Without seeing your code (you should include your threaded code), my guess is that you have two threads concurrently writing to the addr variable. That doesn't work.
You either need to:
create separate instances of addr for each thread; or
provide some sort of synchronisation around addr to stop two threads changing the value at the same time.
Try using valgrind; I haven't tried valgrind on OS X, and I don't understand your problem, but "try valgrind" is the first thing I think of when you say "clobbered".
One thing you could try would be to create a separate thread whose only purpose is to watch the value of addr, and to break when it changes. For example:
static int * volatile addr;  // volatile here is important, and must be after the *

void *addr_thread_proc(void *arg)
{
    while (1) {
        int *old_value = addr;
        while (addr == old_value)
            /* spin */;
        __asm__("int3");  // break into the debugger, or raise SIGTRAP if no debugger
    }
}

...

pthread_t spin_thread;
pthread_create(&spin_thread, NULL, &addr_thread_proc, NULL);
Then, whenever the value of addr changes, the int3 instruction will run, which will break the debugger, stopping all threads.
gdb often acts weird with multithreaded programs. Another solution (if you can afford it) would be to put printf()s all over the place to try and catch the moment where your value gets clobbered. Not very elegant, but sometimes effective.
I have not done any debugging on OSX, but I have seen the same behavior in GDB on Linux: program crashes, yet GDB can read and write the memory which program just tried to read/write unsuccessfully.
This doesn't necessarily mean GDB is confused; rather, the kernel allowed GDB to read/write memory via ptrace() which the inferior process itself is not allowed to read or write. In other words, it was a (recently fixed) kernel bug.
Still, it sounds like GDB watchpoints aren't working for you for whatever reason.
One technique you could use is to mmap() space for some_values rather than statically allocating it, arrange for the array to end on a page boundary, and make the following page inaccessible (via mprotect()).
If any code tries to access past the end of some_values, it will get an exception (effectively you are setting a non-writable "watch point" just past some_values).