Why am I getting segmentation fault here?


I have the following code, where I try to write something into the stack. I write at the bottom of the stack, which the application hasn't touched yet (note that the stack grows downwards and stackaddr here points to the bottom).
However, I get a segmentation fault even after calling mprotect to give that memory region both read and write permissions. I get the segmentation fault even if I compile with -fno-stack-protector. What is happening here?
pthread_attr_t attr;
void * stackaddr;
int * plocal_var;
size_t stacksize;
pthread_getattr_np(pthread_self(), &attr);
pthread_attr_getstack( &attr, &stackaddr, &stacksize );
printf( "stackaddr = %p, stacksize = %zu\n", stackaddr, stacksize );
plocal_var = (int*)stackaddr;
mprotect((void*)plocal_var, 4096, PROT_READ | PROT_WRITE);
*plocal_var = 4;
printf( "local_var = %d!\n", *plocal_var );

You are almost certainly trying to mprotect() pages which are not yet mapped. You should check the return code: mprotect() is probably returning -1 and setting errno to ENOMEM (this is documented in the mprotect(2) man page).
Stack pages are mapped on demand, but the kernel is clever enough to distinguish between page faults caused by an access at or above the current stack pointer (which are caused by valid attempts to expand the stack downwards, by decrementing the stack pointer, and then performing a read or write to some positive offset from the new value), and page faults caused by an access below the stack pointer (which are not valid).

Related

Why can't malloc/free APIs be correctly called in threads created by clone?

Why can't some glibc APIs (such as malloc(), realloc() or free()) be called correctly in threads created by the clone syscall?
Here is my code only for testing:
int thread_func( void *arg )
{
void *ptr = malloc( 4096 );
printf( "tid=%d, ptr=%x\n", gettid(), ptr );
sleep(1);
if( ptr )
free( ptr );
return 0;
}
int main( int argc, char **argv )
{
int i, m;
void *stk;
int stksz = 1024 * 128;
int flag = CLONE_VM | CLONE _FILES | CLONE_FS | CLONE_SIGHAND;
for( i=m=0; i < 100; i++ )
{
stk = malloc( stksz );
if( !stk ) break;
if( clone( thread_func, stk+stksz, flags, NULL, NULL, NULL, NULL ) != -1 )
m++;
}
printf( "create %d thread\n", m );
sleep(10);
return 0;
}
Test result: the thread function thread_func or the main thread randomly blocks in malloc() or free(). Sometimes malloc() or free() crashes instead.
I think maybe malloc() and free() need certain TLS data to distinguish every thread.
Does anyone know the reason, and what solution can be used to resolve this problem?

I think maybe malloc() and free() need certain TLS data to distinguish every thread.
Glibc's malloc() and free() do not rely on TLS. They use mutexes to protect the shared memory-allocation data structures. To reduce contention for those, they employ a strategy of maintaining separate memory-allocation arenas with independent metadata and mutexes. This is documented on their manual page.
After correcting the syntax errors in your code and dummying-out the call to non-existent function gettid() (see comments on the question), I was able to produce segmentation faults, but not blockage. Perhaps you confused the exit delay caused by your program's 10-second sleep with blockage.
In addition to any issue that may have been related to your undisclosed implementation of gettid(), your program contains two semantic errors, each producing undefined behavior:
As I already noted in comments, it passes the wrong child-stack pointer values.*
It uses the wrong printf() directive in thread_func() for printing the pointer. The directive for pointer values is %p; %x is for arguments of type unsigned int.
After I corrected those errors as well, the program consistently ran to completion for me. Revised code:
int thread_func(void *arg) {
void *ptr = malloc(4096);
// printf( "tid=%d, ptr=%x\n", gettid(), ptr );
printf("tid=%d, ptr=%p\n", 1, ptr);
sleep(1);
if (ptr) {
free(ptr);
}
return 0;
}
int main(int argc, char **argv) {
int i, m;
char *stk; // Note: char * instead of void * to afford arithmetic
int stksz = 1024 * 128;
int flags = CLONE_VM | CLONE_FILES | CLONE_FS | CLONE_SIGHAND;
for (i = m = 0; i < 100; i++) {
stk = malloc( stksz );
if( !stk ) break;
if (clone(thread_func, stk + stksz - 1, flags, NULL, NULL, NULL, NULL ) != -1) {
m++;
}
}
printf("create %d thread\n", m);
sleep(10);
return 0;
}
Even with that, however, all is not completely well: I see various anomalies in the program output, especially near the beginning.
The bottom line is that, contrary to your assertion, you are not creating any threads, at least not in the sense that the C library recognizes. You are merely creating processes that have behavior similar to threads'. That may be sufficient for some purposes, but you cannot rely on the system to treat such processes identically to threads.
On Linux, bona fide threads that the system and standard library will recognize are POSIX threads, launched via pthread_create(). (I note here that modifying your program to use pthread_create() instead of clone() resolved the output anomalies for me.) You might be able to add flags and arguments to your clone() calls that make the resulting processes enough like the Linux implementation of pthreads to be effectively identical, but whyever would you do such a thing instead of just using real pthreads in the first place?
* The program also performs pointer arithmetic on a void *, which C does not permit. GCC accepts that as an extension, however, and since your code is deeply Linux-specific anyway, I'm letting that slide with only this note.

Correct, malloc and free need TLS for at least the following things:
The malloc arena attached to the current thread (used for allocation operations).
The errno TLS variable (written to when system calls fail).
The stack protector canary (if enabled and the architecture stores the canary in the TCB).
The malloc thread cache (enabled by default in the upcoming glibc 2.26 release).
All these items need a properly initialized thread control block (TCB), but curiously, until recently and as far as malloc/free was concerned, it almost did not matter if a thread created with clone shared its TCB with another thread (so that the data is no longer thread-local):
Threads basically never reattach themselves to a different arena, so the arena TLS variable is practically read-only after initialization—and multiple threads can share a single arena. errno can be shared as long as system calls only fail in one of the threads undergoing sharing. The stack protector canary is read-only after process startup, and its value is identical across threads anyway.
But all this is an implementation detail, and things change radically in glibc 2.26 with its malloc thread cache: The cache is read and written without synchronization, so it is very likely that what you are trying to do results in memory corruption.
This is not a material change in glibc 2.26; it is how things have always been: calling any glibc function from a thread created with clone is undefined. As John Bollinger pointed out, this mostly worked by accident before, but I can assure you that it has always been completely undefined.

Debug and HeapAlloc

While trying to work with heaps in the WinAPI, I got some strange results from HeapAlloc. Let's consider the following code. According to the Microsoft documentation on the Windows API (hereafter Doc), I should get two Success strings printed to the console. But I get an Error when I run this code with the Debug option in MSVC 2013. The strangest thing is that when I run this code without the Debug option, or run the compiled .exe file directly, I get the correct result.
#include <Windows.h>
#include <stdio.h>
#include <stdlib.h> // for system()
int main()
{
LPSYSTEM_INFO sys;
HANDLE hNewHeap;
LPVOID ptr;
sys = (LPSYSTEM_INFO) HeapAlloc(GetProcessHeap(),
0,
sizeof(SYSTEM_INFO));
GetSystemInfo(sys);
printf("Page size: %lu\n", sys->dwPageSize);//Here we get the
//'Page size: 4096' string
//printed to the console
hNewHeap = HeapCreate(0, 1, 1);
//That's easy. We create new heap object, getting its HANDLE descriptor.
//According to Doc, the initial heap size is set to page size, which is
//4096 on my computer, like maximum heap size is also done. So the heap
//size now is 4096.
ptr = HeapAlloc(hNewHeap, 0, 2624); //Here we allocate the memory
//block in our new heap, that might have 2624 bytes size.
if ( ptr ) printf("Success!\n");//Here we check if the HeapAlloc function
//worked correctly and print the appropriate string.
else printf("Error!\n");
//On this time we get 'Success' string printed to the console and free
//allocated memory block
if ( ptr ) HeapFree(hNewHeap, 0, ptr);
ptr = HeapAlloc(hNewHeap, 0, 2525);//Here we try to allocate the memory
//block, which size is 2526. And, like previous time, we expect to get
//'Success'.
if ( ptr ) printf("Success!\n");
else printf("Error!\n");
//But we get 'Error' here!!!
if ( ptr ) HeapFree(hNewHeap, 0, ptr);
HeapDestroy(hNewHeap);
system("pause");
return 0;
}
If you try the same with any number less than 2624, you will not get an 'Error'. If you try it with any number greater than 2625, you WILL get an 'Error'. But the 'Error' appears only when the Debug option is on.
Can somebody explain why this happens?
P.S.: Sorry for bad English.
P.P.S.: It is also strange that the number 2625 does not correspond to any function or application size, and that I sometimes get the correct result after restarting Visual Studio or making some changes in the code (but only sometimes).

You are creating a fixed size heap. The documentation says:
The HeapCreate function rounds dwMaximumSize up to a multiple of the system page size and then reserves a block of that size in the process's virtual address space for the heap.
So your heap has a fixed size of 4096 bytes.
Subsequently your allocation from this heap fails because the heap is not large enough for the block being allocated. Again from the documentation:
The system uses memory from the private heap to store heap support structures, so not all of the specified heap size is available to the process. For example, if the HeapAlloc function requests 64 kilobytes (K) from a heap with a maximum size of 64K, the request may fail because of system overhead.
It is these heap support structures that are causing your confusion. You think that there is sufficient space in the heap for your second allocation, but you are not accounting for the heap support structures.
The documentation for HeapAlloc tells you to call GetExceptionCode on failure. Expect that to return STATUS_NO_MEMORY.

Assert calls segmentation fault

I wrote a basic multi-threading library in which each thread has a context (ucontext_t). In one of my test programs, I put an assert which failed. Instead of aborting with a line number, it caused a segmentation fault. I then checked and saw that the stack size of my context was 8192. When I increased it to 16384, the assert failure worked as expected.
Can someone tell me how assert works internally and why it would use up so many bytes? I believed 8192 was a fairly large size for my context.
This is how my thread is created
MyThread *temp;
temp=malloc(sizeof(MyThread_t));
ucontext_t tempContext;
if (getcontext(&tempContext) == -1)
temp->ThreadId = 0;
tempContext.uc_stack.ss_sp = (char *)malloc(SIZE_STACK*sizeof(char));
tempContext.uc_stack.ss_size = SIZE_STACK*sizeof(char);
tempContext.uc_link = NULL;
makecontext(&tempContext,(void(*)(void))start_funct,1, args);
And my test function has it this way.
T = MyThreadCreate(t0, (void *)n2);
re=MyThreadJoin(T);
printf("%d\n",re);
assert(re==-1);
The value of re is 0. When my SIZE_STACK is 8192, I get a seg fault. When it's increased to 16384, I get a proper abort, as expected from assert.

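The implementation of assert is platform dependent. The C standard only requires that a failed assert write a diagnostic to stderr and then call abort(); how much stack that takes is up to the C library. The formatted-output code that prints the diagnostic can need several kilobytes of stack in some implementations, which is consistent with an 8192-byte stack overflowing while a 16384-byte one does not.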

Bus error: 10 when scanning address space in c

I am trying to scan the address space to find the chunks of memory that have read/write permissions. It is acceptable to check a single address per page, as every address in a page has the same permissions. I know I should be getting Segmentation Fault: 11 when trying to write to a piece of memory I shouldn't be able to write to. This happens when I access higher addresses, but when I am in the lower portion, say 0x00000100, I get Bus error: 10.
NOTE: The code is compiled with the -m32 flag so it simulates a 32 bit machine.
ALSO NOTE: The memory for chunk_list has already been malloc'ed before this function is called.
I have copied the code below:
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include "memchunk.h"
int get_mem_layout (struct memchunk *chunk_list, int size)
{
//grab the page size
long page = sysconf(_SC_PAGESIZE);
printf("The page size for this system is %ld bytes\n", page);
//page size is 4096 bytes
//test printing the number of words on a page
long words = page / 4;
printf("Which works out to %ld words per page\n", words);
//works out to 1024 words a page
//1024 = 0x400
//create the addy pointer
//start will be used after bus error: 10 is solved
void *start;
char * currAddy;
currAddy = (char*)0x01000000;
//someplace to store the addy to write to
//char * testWrite;
//looping through the first size pages
int i;
for(i = 0; i < size; i++){
//chunk start - wrong addy being written just testing
chunk_list[i].start = currAddy;
printf("addy is %p\n",currAddy);
sleep(1);
//try and write to the current addy
//testWrite = currAddy;
//*testWrite = 'a';
*currAddy = '1';
//currAddy is a char *, so advance by the page size in bytes (not 0x400 words) to get to the next page
currAddy += page;
}
//while loop here for all the addys - not there yet because still dealing with bus error: 10
return 0;
}
Any help would be greatly appreciated. I also left some other attempts at it commented out in the code, still all produce a bus error: 10 in the lower portion of the memory space.
EDIT: I will be dealing with seg faults using signals. I know how to deal with the seg fault, so is there a way to handle a bus error: 10 using signals as well?

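Reading from or writing to unmapped memory is supposed to cause a fault, which the process sees as SIGSEGV or SIGBUS depending on the platform and the kind of fault. To discover whether memory is mapped at an address, install handlers for both signals and react accordingly.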
On systems with address space layout randomization (ASLR), program sections are loaded at randomized locations so that malicious code cannot rely on stable addresses.
In most virtual memory systems, a non-mapped space is usually left from address zero up a ways so attempts to dereference a NULL pointer or a structure based on a NULL pointer cause an exception. In the 1980s, the blank space was often 64K to 256K. On modern architectures, 16M is a reasonable choice to detect NULL-based accesses.
On many virtual memory systems, there is a system call to obtain per process mapped memory locations. On Linux, inspect the contents of /proc/self/maps.

Can I get a thread's stack address from pthread_self()

I want to get the stack address of a thread through some function to which we can pass pthread_self(). Is it possible? The reason is that I want to write my own assigned thread identifier for a thread somewhere in its stack. I can write near the end of the stack (the end of the stack memory, not the current stack address). We can of course expect the application not to reach the bottom of the stack, and can therefore use space from there.
In other words, I want to use the thread stack for putting a kind of thread local variable there. So, do we have some function like the following provided by pthread?
stack_address = stack_address_for_thread( pthread_self() );
I can use the syntax for thread local variables by gcc for this purpose, but I'm in a situation where I can't use them.

It's probably better to use pthread_key_create together with pthread_setspecific and pthread_getspecific, and let the implementation worry about those details.
A good example of usage is here:
pthread_key_create
Edit: I should clarify -- I'm suggesting you use the libpthread provided method of creating thread-local information, instead of rolling your own by pushing something onto the end of the stack where it's possible your information could be lost.

With GCC, it is simpler to declare your thread local variables with __thread keyword, like
__thread int i;
extern __thread struct state s;
static __thread char *p;
That is GCC specific (but I'd guess clang has it too, and the newest C++ and future C standards have something similar), but less brittle than pointer hacks based upon pthread_self() (and it should be a bit faster, but less portable, than pthread_getspecific, as suggested by Denniston).
But I would really like you to give more context and motivation in your questions.

I want to write my own assigned thread identifier for a thread
There are multiple ways to achieve that. The most obvious one:
__thread int my_id;
I can use the syntax for thread local variables by gcc for this purpose, but I'm in a situation where I can't use them.
You need to explain why you can't use thread-locals. Chances are high that other solutions, such as pthread_getattr_np, wouldn't work either.

First get the bottom of the stack and give read/write permission to it with the following code.
pthread_attr_t attr;
void * stackaddr;
int * plocal_var;
size_t stacksize;
pthread_getattr_np(pthread_self(), &attr);
pthread_attr_getstack( &attr, &stackaddr, &stacksize );
printf( "stackaddr = %p, stacksize = %zu\n", stackaddr, stacksize );
plocal_var = (int*)mmap( stackaddr, 4096, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0 );
// Now try to write something
*plocal_var = 4;
and then you can get the thread ID, with the function get_thread_id() shown below. Note that calling mmap with size 4096 has the effect of pushing the boundary of the stack by 4096, that is why we subtract 4096 when getting the local variable address.
int get_thread_id()
{
pthread_attr_t attr;
char * stackaddr;
int * plocal_var;
size_t stacksize;
pthread_getattr_np(pthread_self(), &attr);
pthread_attr_getstack( &attr, (void**)&stackaddr, &stacksize );
//printf( "stackaddr = %p, stacksize = %d\n", stackaddr, stacksize );
plocal_var = (int*)(stackaddr - 4096);
return *plocal_var;
}
