segmentation fault when using a non NULL pointer - c

there is a weird problem as title when using dpdk,
When I use rte_pktmbuf_alloc(struct rte_mempool *) and already verify the return value of rte_pktmbuf_pool_create() is not NULL, the process receive segmentation fault.
Follow
ing message is output of gdb in dpdk source code:
Thread 1 "osw" received signal SIGSEGV, Segmentation fault.
0x00000000005e9f41 in __mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdec8, mp=0x101a7df00)at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1449
1449 if (unlikely(cache == NULL || n >= cache->size))
(gdb) p cache
$1 = (struct rte_mempool_cache *) 0x1a7dfc000000000
(gdb) bt
0 0x00000000005e9f41 in __mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1449
1 rte_mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1517
2 rte_mempool_get_bulk (n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1552
3 rte_mempool_get (obj_p=0x7fffffffdeb8, mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1578
4 rte_mbuf_raw_alloc (mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:586
5 rte_pktmbuf_alloc (mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:896
And I dig into rte_mempool.h:
and change line 1449-1450
1449 if (unlikely(cache == NULL || n >= cache->size))
1450 goto ring_dequeue;
to
1449 if (unlikely(cache == NULL))
1450 goto ring_dequeue;
1451 if (unlikely(n >= cache->size))
1452 goto ring_dequeue;
and it also fail at line 1451
the gdb output message after changing:
Thread 1 "osw" received signal SIGSEGV, Segmentation fault.
__mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1451
1451 if (unlikely(n >= cache->size))
(gdb) p cache
$1 = (struct rte_mempool_cache *) 0x1a7dfc000000000
(gdb) bt
0 __mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1451
1 rte_mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1519
2 rte_mempool_get_bulk (n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1554
3 rte_mempool_get (obj_p=0x7fffffffdeb8, mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1580
4 rte_mbuf_raw_alloc (mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:586
5 rte_pktmbuf_alloc (mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:896
6 main (argc=<optimized out>, argv=<optimized out>) at ofpd.c:150
(gdb) p cache->size
Cannot access memory at address 0x1a7dfc000000000
It looks like the memory address “cache” pointer stored is not NULL but it actually is a NULL pointer.
I have no idea that why does the "cache" pointer address be non zero at prefix 4 bytes and zero at postfix 4 bytes.
The DPDK version is 20.05, I also tried 18.11 and 19.11.
OS is CentOS 8.1 kernel is 4.18.0-147.el8.x86_64.
CPU is AMD EPYC 7401P.
#define RING_SIZE 16384
#define NUM_MBUFS 8191
#define MBUF_CACHE_SIZE 512
int main(int argc, char **argv)
{
int ret;
uint16_t portid;
unsigned cpu_id = 1;
struct rte_mempool *tmp;
int arg = rte_eal_init(argc, argv);
if (arg < 0)
rte_exit(EXIT_FAILURE, "Cannot init EAL: %s\n", rte_strerror(rte_errno));
if (rte_lcore_count() < 10)
rte_exit(EXIT_FAILURE, "We need at least 10 cores.\n");
argc -= arg;
argv += arg;
/* Creates a new mempool in memory to hold the mbufs. */
tmp = rte_pktmbuf_pool_create("TMP", NUM_MBUFS, MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
if (tmp == NULL)
rte_exit(EXIT_FAILURE, "Cannot create mbuf pool, %s\n", rte_strerror(rte_errno));
printf("tmp addr = %x\n", tmp);
struct rte_mbuf *test = rte_pktmbuf_alloc(tmp);
rte_exit(EXIT_FAILURE, "end\n");
}
I have ever faced same problem when using the return pointer of getifaddrs(), it also got segmentation fault, I had to shift the pointer address like
ifa->ifa_addr = (struct sockaddr *)((uintptr_t)(ifa->ifa_addr) >> 32);
and then it can work normally.
Thereforer, I think this is not dpdk specific issue.
Does anyone know this issue?
Thanks.

I am able to run this without any error by modifying your code for
include headers
removed unused variables
add check if the returned value is NULL or not for alloc
Test on:
CPU: Intel(R) Xeon(R) CPU E5-2699
OS: 4.15.0-101-generic
GCC: 7.5.0
DPDK version: 19.11.2, dpdk mainline
Library mode: static
code:
int main(int argc, char **argv)
{
int ret = 0;
struct rte_mempool *tmp;
int arg = rte_eal_init(argc, argv);
if (arg < 0)
rte_exit(EXIT_FAILURE, "Cannot init EAL: %s\n", rte_strerror(rte_errno));
if (rte_lcore_count() < 10)
rte_exit(EXIT_FAILURE, "We need at least 10 cores.\n");
argc -= arg;
argv += arg;
/* Creates a new mempool in memory to hold the mbufs. */
tmp = rte_pktmbuf_pool_create("TMP", NUM_MBUFS, MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
if (tmp == NULL)
rte_exit(EXIT_FAILURE, "Cannot create mbuf pool, %s\n", rte_strerror(rte_errno));
printf("tmp addr = %p\n", tmp);
struct rte_mbuf *test = rte_pktmbuf_alloc(tmp);
if (test == NULL)
rte_exit(EXIT_FAILURE, "end\n");
return ret;
}
[EDIT-1] based on the comment Brydon Gibson
Note:
As I do not have access to your codebase or working code snippet my suggestion is to lookup any example code from DPDK/examples/l2fwd or DPDK/examples/skeleton and copy the headers for your compilation.
I assume both the author THE and Brydon are different individuals and might be facing similar on the different code bases.
current question claims with DPDK version 20.05, 18.11 and 19.11 the error is reproduced with code snippet.
current answer clearly sates with static linking of the library the same code snippet works
Requested #BrydonGibson to open the ticket with relevant information and environment details as it might be different.

Related

pintos userprogram-argument passing segmentation fault problem

I'm Working on Pintos Project 2 to prepare Operating System Course.
After implementing argument passing, I enter below command.
pintos -q run 'echo x'
The result is like below.
0627antaechan#pintos-2:~/pintos/src/userprog/build$ pintos -q run 'echo x'
Use of literal control characters in variable names is deprecated at /home/0627antaechan/pintos/src/utils/pintos line 911.
Prototype mismatch: sub main::SIGVTALRM () vs none at /home/0627antaechan/pintos/src/utils/pintos line 935.
Constant subroutine SIGVTALRM redefined at /home/0627antaechan/pintos/src/utils/pintos line 927.
squish-pty bochs -q
========================================================================
Bochs x86 Emulator 2.6.2
Built from SVN snapshot on May 26, 2013
Compiled on Aug 5 2021 at 07:51:40
========================================================================
00000000000i[ ] reading configuration from bochsrc.txt
00000000000e[ ] bochsrc.txt:8: 'user_shortcut' will be replaced by new 'keyboard' option.
00000000000i[ ] installing nogui module as the Bochs GUI
00000000000i[ ] using log file bochsout.txt
PiLo hda1
Loading.........
Kernel command line: -q run 'echo x'
Pintos booting with 4,096 kB RAM...
383 pages available in kernel pool.
383 pages available in user pool.
Calibrating timer... 204,600 loops/s.
hda: 1,008 sectors (504 kB), model "BXHD00011", serial "Generic 1234"
hda1: 147 sectors (73 kB), Pintos OS kernel (20)
hdb: 5,040 sectors (2 MB), model "BXHD00012", serial "Generic 1234"
hdb1: 4,096 sectors (2 MB), Pintos file system (21)
filesys: using hdb1
Boot complete.
Executing 'echo x':
Execution of 'echo' complete.
Timer: 127 ticks
Thread: 30 idle ticks, 96 kernel ticks, 8 user ticks
hdb1 (filesys): 28 reads, 0 writes
Console: 611 characters output
Keyboard: 0 keys pressed
Exception: 1 page faults
Powering off...
========================================================================
Bochs is exiting with the following message:
[UNMP ] Shutdown port: shutdown requested
========================================================================
In Program echo, it calls system call function 'write'.
Below is my syscall_handler function in
//userprog/syscall.c
static void
syscall_handler (struct intr_frame *f UNUSED)
{
printf ("system call!\n");
thread_exit ();
}
However, there are no message "system call!" after I enter pintos -q run echo x.
I also tried to implement gdb debugger. The result is like below.
0627antaechan#pintos-2:~/pintos/src/userprog/build$ gdb kernel.o
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from kernel.o...done.
(gdb) run echo x
Starting program: /home/0627antaechan/pintos/src/userprog/build/kernel.o echo x
Program received signal SIGSEGV, Segmentation fault.
start () at ../../threads/start.S:30
30 mov %ax, %es
(gdb)
I don't know the cause of problem. Plz let me some help!!.
Below is code of process_execute, start_process, construct_esp function.
tid_t
process_execute (const char *file_name)
{
char *fn_copy;
tid_t tid;
// make token to put in "thread_create"
char *token;
char *save_ptr;
/* Make a copy of FILE_NAME.
Otherwise there's a race between the caller and load(). */
fn_copy = palloc_get_page (0);
if (fn_copy == NULL)
return TID_ERROR;
strlcpy (fn_copy, file_name, PGSIZE);
token = strtok_r(file_name, " ", &save_ptr);
/* Create a new thread to execute FILE_NAME. */
// modifying file_name to token
tid = thread_create (token, PRI_DEFAULT, start_process, fn_copy);
if (tid == TID_ERROR)
palloc_free_page (fn_copy);
return tid;
}
/* A thread function that loads a user process and starts it
running. */
static void
start_process (void *file_name_)
{
char *cmd_line = file_name_;
struct intr_frame if_;
bool success;
char fn_copy[MAX_ARGU_LENGTH];
// char *argv[MAX_ARGUMENT];
char ** argv = malloc(sizeof(char *)*MAX_ARGUMENT);
char *save_ptr;
int argc = 0;
strlcpy(fn_copy, cmd_line, PGSIZE);
argv[0] = strtok_r(fn_copy, " ", &save_ptr);
while(argv[argc++] != NULL){
argv[argc] = strtok_r(NULL, " ", &save_ptr);
}
/* Initialize interrupt frame and load executable. */
memset (&if_, 0, sizeof if_);
if_.gs = if_.fs = if_.es = if_.ds = if_.ss = SEL_UDSEG;
if_.cs = SEL_UCSEG;
if_.eflags = FLAG_IF | FLAG_MBS;
success = load (argv[0], &if_.eip, &if_.esp);
// if success, construct esp
if(success) construct_esp(argv, argc, &if_.esp);
/* If load failed, quit. */
palloc_free_page (cmd_line);
free(argv);
if (!success)
thread_exit ();
/* Start the user process by simulating a return from an
interrupt, implemented by intr_exit (in
threads/intr-stubs.S). Because intr_exit takes all of its
arguments on the stack in the form of a `struct intr_frame',
we just point the stack pointer (%esp) to our stack frame
and jump to it. */
asm volatile ("movl %0, %%esp; jmp intr_exit" : : "g" (&if_) : "memory");
NOT_REACHED ();
}
void construct_esp(char **argv, int argc, void **esp) {
uint32_t *addrs[MAX_ARGUMENT];
int i;
for(i = argc-1; i >= 0; i--){
*esp = *esp - (strlen(argv[i])+1);
addrs[i] = (uint32_t *)*esp;
memcpy(*esp, argv[i], (strlen(argv[i])+1));
}
//word-align
while(((uintptr_t)(*esp) % WSIZE) != 0){
*esp = *esp -1;
}
//argv[argc], namely NULL
*esp = *esp - WSIZE;
*(int32_t *)*esp = 0;
for(i = argc-1; i >= 0; i--){
*esp = *esp - WSIZE;
*(uint32_t **)*esp = addrs[i];
}
*esp = *esp - WSIZE;
*(uint32_t **)*esp = *esp + WSIZE;
*esp = *esp - WSIZE;
*(int32_t *)*esp = argc;
*esp = *esp - WSIZE;
*(int32_t *)*esp = 0;
printf("your stack is like below\n");
hex_dump((uintptr_t)*esp, *esp, 0xc0000000-(uintptr_t)*esp, true);
}

double free or corruption after freeing a pointer on 6731st time

The problem is that my code is not so small so I will post only fragments of it, let me know if more of it is needed for the question to be valid.
The program uses library with various sorting algorithms (18 of them), and applies them to an unsorted array. User can select array size and how many arrays will be tested.
If I select, for example, 100arrays of 1000 elements (18*100 = 1800 times algorithms sort the array), the code works. If it is the other way, ie 1000 arrays of say 10 elements, 1000 * 18 = 18 000 sortings, then it crashes with "double free or corruption".
The question is not as much as where the problem lies, but firstly, how to debug it correctly.
Here is the function which gets a function pointer (func) of sorting algorithm, sorts the elements, checks if sorted, adds appropriate data to
"Iteration" struct, and then frees both "target" (array to be sorted) and "Iter" (Iterations struct).
void test_sort(int* data, int size, sort_pointer func, Algorithm* Algo, int no)
{
count_ncomp = 0;
count_assign = 0;
begin = clock();
int* target = NULL;
target = malloc(size * sizeof(int));
if (!target)
die("Memory error.");
memcpy(target, data, size * sizeof(int));
Iteration* Iter = malloc(sizeof(Iteration));
Iter->no = no;
if (is_sorted(func(target, size), size)) {
end = clock();
clocks = (double)(end - begin);
time_spent = clocks / CLOCKS_PER_SEC;
Iter->is_sorted = 1;
Iter->comp_count = count_ncomp;
Iter->assign_count = count_assign;
Iter->clocks_total = clocks;
Iter->time_spent = time_spent;
} else {
Iter->is_sorted = 0;
debug("Not sorted, no: %d", no);
};
Algo->iterations[no - 1] = Iter;
if (target == NULL) {
debug("Target is NULL");
}
debug("before target1 free");
debug("NO: %d", no);
debug("pointer %p", target);
free(target);
free(Iter);
debug("after target1 free");
/*target = NULL;*/
}
Here is the list of data structures:
typedef struct {
int no;
int is_sorted;
int comp_count;
int assign_count;
double clocks_total;
double time_spent;
} Iteration;
typedef struct {
char* type;
char* complexity;
int iter_count;
int rank;
int avg_comp;
int avg_assign;
double avg_clocks;
double avg_time;
Iteration* iterations[MAX_ITER];
} Algorithm;
Here is the start of the program:
Starting program: /home/riddle/tmp1/pratybos12
Mem used total: 0
How many arrays would you like to test? > 1000
What is the size of each array? > 10
What is the minimum number in each array? > 1
What is the maximum number in each array? > 10
How many repeating values there will be AT LEAST? > 0
Here is the running program:
DEBUG pratybos12.c:537: NO: 374
DEBUG pratybos12.c:538: pointer 0x55899fe8cb10
*** Error in `./pratybos12': double free or corruption (out): 0x000055899fe8cae0 ***
[1] 3123 abort (core dumped) ./pratybos12
GDB output (I get a crash always on no=374):
DEBUG pratybos12.c:542: after target1 free
DEBUG pratybos12.c:535: before target1 free
DEBUG pratybos12.c:537: NO: 374
DEBUG pratybos12.c:538: pointer 0x555555760b10
Breakpoint 1, test_sort (data=0x555555760ab0, size=10, func=0x555555558c04 <bubble_sort_b_and_c_and_e_and_f>, Algo=0x55555575ff00, no=374) at pratybos12.c:541
541 free(Iter);
(gdb) print Iter
$3 = (Iteration *) 0x555555760ae0
(gdb) c
Continuing.
*** Error in `/home/riddle/tmp1/pratybos12': double free or corruption (out): 0x0000555555760ae0 ***
Program received signal SIGABRT, Aborted.
0x00007ffff7a54860 in raise () from /usr/lib/libc.so.6
I've read many posts suggesting testing it with Valgrind, however, running with Valgrind, this crash does not arise.
I know that I have posted very little info, please keep me updated to what more is needed.
The main question is how to debug. How do I check if the pointer before freeing points to something valid, is the memory accessible, etc.
These two lines from the function:
Algo->iterations[no - 1] = Iter;
...
free(Iter);
First you make Algo->iterations[no - 1] point to the same memory that Iter is pointing two, so you have two pointers pointing to the same memory. Then you free the memory, making both pointers invalid.
If you want to use the pointer later you can't free it.

Segmentation Fault: C-Program migrating from HPUX to Linux

I'm trying to migrate a small c program from hpux to linux. The project compiles fine but crashes at runtime showing me a segmentation fault. I've already tried to see behind the mirror using strace and gdb but still don't understand. The relevant (truncated) parts:
tts_send_2.c
Contains a method
int sequenznummernabgleich(int sockfd, char *snd_id, char *rec_id, int timeout_quit) {
TS_TEL_TAB tel_tab_S01;
int n;
# truncated
}
which is called from within that file like this:
. . .
. . .
switch(sequenznummernabgleich(sockfd,c_snd_id,c_rec_id,c_timeout_quit)) {
/* kritischer Fehler */
case -1:
. . .
. . .
when calling that method I'm presented a segmentation fault (gdb output):
Program received signal SIGSEGV, Segmentation fault.
0x0000000000403226 in sequenznummernabgleich (sockfd=<error reading variable: Cannot access memory at address 0x7fffff62f94c>,
snd_id=<error reading variable: Cannot access memory at address 0x7fffff62f940>, rec_id=<error reading variable: Cannot access memory at address 0x7fffff62f938>,
timeout_quit=<error reading variable: Cannot access memory at address 0x7fffff62f934>) at tts_snd_2.c:498
498 int sequenznummernabgleich(int sockfd, char *snd_id, char *rec_id, int timeout_quit) {
which I just don't understand. When I'm stepping to the line where the method is called using gdb, all the variables are looking fine:
1008 switch(sequenznummernabgleich(sockfd,c_snd_id,c_rec_id,c_timeout_quit)) {
(gdb) p sockfd
$9 = 8
(gdb) p &sockfd
$10 = (int *) 0x611024 <sockfd>
(gdb) p c_snd_id
$11 = "KR", '\000' <repeats 253 times>
(gdb) p &c_snd_id
$12 = (char (*)[256]) 0xfde220 <c_snd_id>
(gdb) p c_rec_id
$13 = "CO", '\000' <repeats 253 times>
(gdb) p &c_rec_id
$14 = (char (*)[256]) 0xfde560 <c_rec_id>
(gdb) p c_timeout_quit
$15 = 20
(gdb) p &c_timeout_quit
$16 = (int *) 0xfde660 <c_timeout_quit>
I've also created an strace output. Here's the last part concerning the code shown above:
strace output
Any ideas ? I've searched the web and of course stackoverflow for hours without finding a really similar case.
Thanks
Kriz
I haven't used an HP/UX in eons but do hazily remember enough for the following suggestions:
Make sure you're initializing variables / struts correctly. Use calloc instead of malloc.
Also don't assume a specific bit pattern order: eg low byte then high byte. Ska endian-ness of the machine. There are usually macros in the compiler that will handle the appropriate ordering for you.
Update 15.10.16
After debugging for even more hours I found the real Problem. On the first line of the Method "sequenznummernabgleich" is a declaration of a struct
TS_TEL_TAB tel_tab_S01;
This is defined as following:
typedef struct {
TS_BOF_REC bof;
TS_REM_REC rem;
TS_EOF_REC eof;
int bof_len;
int rem_len;
int eof_len;
int cnt;
char teltyp[LEN_TELTYP+1];
TS_TEL_ENTRY entries[MAX_TEL];
} TS_TEL_TAB;
and it's embedded struct TS_TEL_ENTRY
typedef struct {
int len;
char tel[MAX_TEL_LEN];
} TS_TEL_ENTRY;
The problem is that the value for MAX_TEL_LEN had been changed from 512 to 1024 and thus the struct almost doubled in size what lead to that the STACK SIZE was not big enough anymore.
SOLUTION
Simply set the stack size from 8Mb to 64Mb. This can be achieved using ulimit command (under linux).
List current stack size: ulimit -s
Set stack size to 64Mb: ulimit -s 65535
Note: Values for stack size are in kB.
For a good short ref on ulimit command have a look # ss64

Why does srand(time(NULL)) gives me segmentation fault on main?

Need some help here.
I want to understand what's happening in this code.
I'm trying to generate random numbers as tickets to the TCP_t struct created inside ccreate function.
The problem is, everytime I executed this code WITHOUT the srand(time(NULL)) it returned the same sequence of "random" numbers over and over, for example:
TID: 0 | TICKET : 103
TID: 1 | TICKET : 198
So I seeded it with time, to generate really random numbers.
When I put the seed inside the newTicket function, it brings different numbers in every execution, but the same numbers for every thread. Here is an example of output:
Execution 1:
TID: 0 | TICKET : 148
TID: 1 | TICKET : 148
Execution 2:
TID: 0 | TICKET : 96
TID: 1 | TICKET : 96
So, after some research, I found out I shouldn't seed it everytime I call rand but only once, in the beginning of the program. Now, after putting the seed inside the main function, it gives me segmentation fault, and I have NO IDEA why.
This might be a stupid question, but I really want to understand what's happening.
Is the seed screwing anything, somehow?
Am I missing something?
Should I generate random number in another way?
#include <ucontext.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#define MAX_TICKET 255
#define STACK_SIZE 32000
typedef struct s_TCB {
int threadId;
int ticket;
ucontext_t context;
} TCB_t;
void test();
int newTicket();
int newThreadId();
int ccreate (void* (*start)(void*), void *arg);
int threadId = 0;
int main(){
srand(time(NULL)); //<<<============== HERE = SEGMENTATION FAULT
ccreate((void*)&test, 0);
ccreate((void*)&test, 0);
}
int ccreate (void* (*start)(void*), void *arg){
if(start == NULL) return -1;
ucontext_t threadContext;
getcontext(&threadContext);
makecontext(&threadContext, (void*)start, 0);
threadContext.uc_stack.ss_sp = malloc(STACK_SIZE);
threadContext.uc_stack.ss_size = STACK_SIZE;
TCB_t * newThread = malloc(sizeof(TCB_t));
if (newThread == NULL) return -1;
int threadThreadId = newThreadId();
newThread->threadId = threadThreadId;
newThread->ticket = newTicket();
printf("TID: %d | TICKET : %d\n", newThread->threadId, newThread->ticket);
return threadThreadId;
}
int newThreadId(){
int newThreadId = threadId;
threadId++;
return newThreadId;
}
int newTicket(){
//srand(time(NULL)); //<<<============== HERE = IT PARTIALLY WORKS
return (rand() % (MAX_TICKET+1));
}
void test(){
printf("this is a test function");
}
Thanks to everyone who lends me a hand here.
And sorry if the code is too ugly to read. Tried to simplify it as much as I could.
The problem is not with srand(time(NULL)), but with makecontext.
You can run your code through a sanatizer to confirm:
gcc-6 -fsanitize=undefined -fsanitize=address -fsanitize=leak -fsanitize-recover=all -fuse-ld=gold -o main main.c
./main
ASAN:DEADLYSIGNAL
=================================================================
==8841==ERROR: AddressSanitizer: SEGV on unknown address 0x7fc342ade618 (pc 0x7fc340aad235 bp 0x7ffd1b945950 sp 0x7ffd1b9454f8 T0)
#0 0x7fc340aad234 in makecontext (/lib/x86_64-linux-gnu/libc.so.6+0x47234)
#1 0x400d2f in ccreate (/home/malko/Desktop/main+0x400d2f)
#2 0x400c19 in main (/home/malko/Desktop/main+0x400c19)
#3 0x7fc340a87f44 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21f44)
#4 0x400b28 (/home/malko/Desktop/main+0x400b28)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/lib/x86_64-linux-gnu/libc.so.6+0x47234) in makecontext
==8841==ABORTING
You can solve the problem by setting a stack size before making the context:
char stack[20000];
threadContext.uc_stack.ss_sp = stack;
threadContext.uc_stack.ss_size = sizeof(stack);
makecontext(&threadContext, (void*)start, 0);
Unrelated, but make sure you also free that malloc'd memory in your sample code.

Segmentation fault from a function that is not called at all

Ok, this is really freaking me out. I have a following function that just reads input and returns a string
unsigned char* readFromIn() {
unsigned char* text = malloc(1024);
if (fgets(text, 1024, stdin) != NULL) { <--This is what's causing segmentation fault
int textLen = strlen(text);
if (textLen > 0 && text[textLen - 1] == '\n')
text[textLen - 1] = '\0'; // getting rid of newline character
return text;
}
else {
free(text);
return NULL;
}
}
The thing is, this function isn't called anywhere and just to confirm, I changed the name of the function to something crazy like 9rawiohawr90awrhiokawrioawr and put printf statement on the top of the function.
I'm genuinely not sure why an uncalled function might cause a segmentation fault error.
I'm using gcc 4.6.3 on ubuntu.
Edit: I know that the line
if (fgets(text, 1024, stdin) != NULL) {
is the offending code because as soon as i comment out that conditional, no segmentation error occurs.
I know that the function is NOT being called because i'm seeing no output of the printf debug statement I put.
Edit2: I've tried changing the type from unsigned char to char. Still segmentation error. I will try to get gdb output.
Edit3: gdb backtrace produced the following
#0 0xb7fa5ac2 in _IO_2_1_stdin_ () from /lib/i386-linux-gnu/libc.so.6
#1 0xb7faf2fb in libwebsocket_create_context (info=0xbffff280) at libwebsockets.c:2125
#2 0x0804a5bb in main()
doing frame 0,1,2 doesn't output anything interesting in particular.
Edit4: I've tried all of the suggestions in the comment, but to no avail, I still get the same segmentation fault.
So I installed a fresh copy of Ubuntu on a virtual OS and recompiled my code. Still the same issue occurs.
It seems to me the problem is in either some obscurity going on in my code or the library itself. I've created a minimal example demonstrating the problem:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libwebsockets.h>
unsigned char* readFromIn() {
unsigned char* text = malloc(1024);
if (fgets(text, 1024, stdin) != NULL) { <--SEGMENTATION FAULT HERE
int textLen = strlen(text);
if (textLen > 0 && text[textLen - 1] == '\n')
text[textLen - 1] = '\0';
return text;
}
else {
free(text);
return NULL;
}
}
int callback_http(struct libwebsocket_context *context,
struct libwebsocket *wsi,
enum libwebsocket_callback_reasons reason, void *user,
void *in, size_t len)
{
return 0;
}
static struct libwebsocket_protocols protocols[] = {
/* first protocol must always be HTTP handler */
{
"http-only", // name
callback_http, // callback
0 // per_session_data_size
}
};
int main(void) {
printf("Initializing Web Server\n");
// server url will be http://localhost:8081
int port = 8081;
const char *interface = NULL;
struct libwebsocket_context *context;
// we're not using ssl
const char *cert_path = NULL;
const char *key_path = NULL;
// no special options
int opts = 0;
struct lws_context_creation_info info;
memset(&info, 0, sizeof info);
info.port = port;
info.iface = interface;
info.protocols = protocols;
info.extensions = libwebsocket_get_internal_extensions();
info.ssl_cert_filepath = NULL;
info.ssl_private_key_filepath = NULL;
info.gid = -1;
info.uid = -1;
info.options = opts;
context = libwebsocket_create_context(&info);
if (context == NULL) {
fprintf(stderr, "libwebsocket init failed\n");
return 0;
}
printf("starting server...\n");
while (1) {
libwebsocket_service(context, 50);
}
printf("Shutting server down...\n");
libwebsocket_context_destroy(context);
return 0;
}
And here's how I compiled my code
gcc -g testbug.c -o test -lwebsockets
Here's the library I'm using
http://git.libwebsockets.org/cgi-bin/cgit/libwebsockets/tag/?id=v1.23-chrome32-firefox24
You will see that I'm not calling the function readFromIn() yet, segmentation fault occurs as soon as you try to run the executable.
I've re-ran gdb and this time, backtrace and the frames tell me a little bit more info.
(gdb) run
Starting program: /home/l46kok/Desktop/websocketserver/test
Initializing Web Server
[1384002761:2270] NOTICE: Initial logging level 7
[1384002761:2270] NOTICE: Library version: 1.3 unknown-build-hash
[1384002761:2271] NOTICE: Started with daemon pid 0
[1384002761:2271] NOTICE: static allocation: 4448 + (12 x 1024 fds) = 16736 bytes
[1384002761:2271] NOTICE: canonical_hostname = ubuntu
[1384002761:2271] NOTICE: Compiled with OpenSSL support
[1384002761:2271] NOTICE: Using non-SSL mode
[1384002761:2271] NOTICE: per-conn mem: 124 + 1360 headers + protocol rx buf
[1384002761:2294] NOTICE: Listening on port 8081
Program received signal SIGSEGV, Segmentation fault.
0xb7fb1ac0 in _IO_2_1_stdin_ () from /lib/i386-linux-gnu/libc.so.6
(gdb) backtrace
#0 0xb7fb1ac0 in _IO_2_1_stdin_ () from /lib/i386-linux-gnu/libc.so.6
#1 0xb7fcc2c6 in libwebsocket_create_context () from /usr/local/lib/libwebsockets.so.4.0.0
#2 0x080488c4 in main () at testbug.c:483
(gdb) frame 1
#1 0xb7fcc2c6 in libwebsocket_create_context () from /usr/local/lib/libwebsockets.so.4.0.0
(gdb) frame 2
#2 0x080488c4 in main () at testbug.c:483
483 context = libwebsocket_create_context(&info);
So yeah.. I think I gave all the information at hand.. but I'm genuinely not sure what the issue is. The program causes segmentation fault at line 483 but the issue is gone when I comment out the offending function that's not being called.
Probably you're missing something when initializing libwebsockets.
Indeed, recompiling libwebsockets with debug reveals that:
GNU gdb (GDB) 7.6.1 (Debian 7.6.1-1)
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/vili/x...done.
(gdb) r
Starting program: /home/vili/./x
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
Initializing Web Server
[1384020141:5692] NOTICE: Initial logging level 7
[1384020141:5692] NOTICE: Library version: 1.2
[1384020141:5693] NOTICE: Started with daemon pid 0
[1384020141:5693] NOTICE: static allocation: 5512 + (16 x 1024 fds) = 21896 bytes
[1384020141:5693] NOTICE: canonical_hostname = x220
[1384020141:5693] NOTICE: Compiled with OpenSSL support
[1384020141:5693] NOTICE: Using non-SSL mode
[1384020141:5693] NOTICE: per-conn mem: 248 + 1328 headers + protocol rx buf
[1384020141:5713] NOTICE: Listening on port 8081
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7bc2080 in _IO_2_1_stderr_ () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007ffff7bc2080 in _IO_2_1_stderr_ () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff7bcd83c in libwebsocket_create_context (info=0x7fffffffe580)
at libwebsockets.c:2093
#2 0x0000000000400918 in main () at x.c:66
(gdb) up
#1 0x00007ffff7bcd83c in libwebsocket_create_context (info=0x7fffffffe580)
at libwebsockets.c:2093
2093 info->protocols[context->count_protocols].callback(context,
(gdb) p context->count_protocols
$1 = 1
(gdb) p info->protocols[1]
$2 = {
name = 0x7ffff7bc2240 <_IO_2_1_stdin_> "\210 \255", <incomplete sequence \373>, callback = 0x7ffff7bc2080 <_IO_2_1_stderr_>,
per_session_data_size = 140737349689696, rx_buffer_size = 0,
owning_server = 0x602010, protocol_index = 1}
(gdb)
Quite likely you need to close the array of libwebsocket_protocols with a special entry (NULL) so that the lib will know how many entries it got via info->protocols.
Edit: yep, check the docs: http://jsk.pp.ua/knowledge/libwebsocket.html
Array of structures listing supported protocols and a protocol- specific callback for each one. The list is ended with an entry that
has a NULL callback pointer.

Resources