Segmentation fault from a function that is not called at all - c

OK, this is really freaking me out. I have the following function that just reads input and returns a string:
unsigned char* readFromIn() {
unsigned char* text = malloc(1024);
if (fgets(text, 1024, stdin) != NULL) { // <-- This is what's causing the segmentation fault
int textLen = strlen(text);
if (textLen > 0 && text[textLen - 1] == '\n')
text[textLen - 1] = '\0'; // getting rid of newline character
return text;
}
else {
free(text);
return NULL;
}
}
The thing is, this function isn't called anywhere. To confirm this, I renamed the function to something random like 9rawiohawr90awrhiokawrioawr and put a printf statement at the top of the function.
I'm genuinely not sure why an uncalled function would cause a segmentation fault.
I'm using gcc 4.6.3 on Ubuntu.
Edit: I know that the line
if (fgets(text, 1024, stdin) != NULL) {
is the offending code because as soon as I comment out that conditional, the segmentation fault no longer occurs.
I know that the function is NOT being called because I see no output from the printf debug statement I added.
Edit2: I've tried changing the type from unsigned char to char. Still a segmentation fault. I will try to get gdb output.
Edit3: the gdb backtrace produced the following:
#0 0xb7fa5ac2 in _IO_2_1_stdin_ () from /lib/i386-linux-gnu/libc.so.6
#1 0xb7faf2fb in libwebsocket_create_context (info=0xbffff280) at libwebsockets.c:2125
#2 0x0804a5bb in main()
Selecting frames 0, 1 and 2 doesn't show anything particularly interesting.
Edit4: I've tried all of the suggestions in the comments, but to no avail; I still get the same segmentation fault.
So I installed a fresh copy of Ubuntu in a virtual machine and recompiled my code. The same issue still occurs.
It seems to me the problem is either something obscure going on in my code or in the library itself. I've created a minimal example demonstrating the problem:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libwebsockets.h>
unsigned char* readFromIn() {
unsigned char* text = malloc(1024);
if (fgets(text, 1024, stdin) != NULL) { // <-- SEGMENTATION FAULT HERE
int textLen = strlen(text);
if (textLen > 0 && text[textLen - 1] == '\n')
text[textLen - 1] = '\0';
return text;
}
else {
free(text);
return NULL;
}
}
int callback_http(struct libwebsocket_context *context,
struct libwebsocket *wsi,
enum libwebsocket_callback_reasons reason, void *user,
void *in, size_t len)
{
return 0;
}
static struct libwebsocket_protocols protocols[] = {
/* first protocol must always be HTTP handler */
{
"http-only", // name
callback_http, // callback
0 // per_session_data_size
}
};
int main(void) {
printf("Initializing Web Server\n");
// server url will be http://localhost:8081
int port = 8081;
const char *interface = NULL;
struct libwebsocket_context *context;
// we're not using ssl
const char *cert_path = NULL;
const char *key_path = NULL;
// no special options
int opts = 0;
struct lws_context_creation_info info;
memset(&info, 0, sizeof info);
info.port = port;
info.iface = interface;
info.protocols = protocols;
info.extensions = libwebsocket_get_internal_extensions();
info.ssl_cert_filepath = NULL;
info.ssl_private_key_filepath = NULL;
info.gid = -1;
info.uid = -1;
info.options = opts;
context = libwebsocket_create_context(&info);
if (context == NULL) {
fprintf(stderr, "libwebsocket init failed\n");
return 0;
}
printf("starting server...\n");
while (1) {
libwebsocket_service(context, 50);
}
printf("Shutting server down...\n");
libwebsocket_context_destroy(context);
return 0;
}
And here's how I compiled my code
gcc -g testbug.c -o test -lwebsockets
Here's the library I'm using
http://git.libwebsockets.org/cgi-bin/cgit/libwebsockets/tag/?id=v1.23-chrome32-firefox24
As you can see, I never call the function readFromIn(), yet a segmentation fault occurs as soon as I run the executable.
I re-ran gdb, and this time the backtrace and frames give a little more information.
(gdb) run
Starting program: /home/l46kok/Desktop/websocketserver/test
Initializing Web Server
[1384002761:2270] NOTICE: Initial logging level 7
[1384002761:2270] NOTICE: Library version: 1.3 unknown-build-hash
[1384002761:2271] NOTICE: Started with daemon pid 0
[1384002761:2271] NOTICE: static allocation: 4448 + (12 x 1024 fds) = 16736 bytes
[1384002761:2271] NOTICE: canonical_hostname = ubuntu
[1384002761:2271] NOTICE: Compiled with OpenSSL support
[1384002761:2271] NOTICE: Using non-SSL mode
[1384002761:2271] NOTICE: per-conn mem: 124 + 1360 headers + protocol rx buf
[1384002761:2294] NOTICE: Listening on port 8081
Program received signal SIGSEGV, Segmentation fault.
0xb7fb1ac0 in _IO_2_1_stdin_ () from /lib/i386-linux-gnu/libc.so.6
(gdb) backtrace
#0 0xb7fb1ac0 in _IO_2_1_stdin_ () from /lib/i386-linux-gnu/libc.so.6
#1 0xb7fcc2c6 in libwebsocket_create_context () from /usr/local/lib/libwebsockets.so.4.0.0
#2 0x080488c4 in main () at testbug.c:483
(gdb) frame 1
#1 0xb7fcc2c6 in libwebsocket_create_context () from /usr/local/lib/libwebsockets.so.4.0.0
(gdb) frame 2
#2 0x080488c4 in main () at testbug.c:483
483 context = libwebsocket_create_context(&info);
So yeah... I think I've given all the information I have, but I'm genuinely not sure what the issue is. The program segfaults at line 483, yet the problem goes away when I comment out the offending function, which is never called.

Probably you're missing something when initializing libwebsockets.
Indeed, recompiling libwebsockets with debugging information reveals this:
GNU gdb (GDB) 7.6.1 (Debian 7.6.1-1)
Reading symbols from /home/vili/x...done.
(gdb) r
Starting program: /home/vili/./x
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
Initializing Web Server
[1384020141:5692] NOTICE: Initial logging level 7
[1384020141:5692] NOTICE: Library version: 1.2
[1384020141:5693] NOTICE: Started with daemon pid 0
[1384020141:5693] NOTICE: static allocation: 5512 + (16 x 1024 fds) = 21896 bytes
[1384020141:5693] NOTICE: canonical_hostname = x220
[1384020141:5693] NOTICE: Compiled with OpenSSL support
[1384020141:5693] NOTICE: Using non-SSL mode
[1384020141:5693] NOTICE: per-conn mem: 248 + 1328 headers + protocol rx buf
[1384020141:5713] NOTICE: Listening on port 8081
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7bc2080 in _IO_2_1_stderr_ () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007ffff7bc2080 in _IO_2_1_stderr_ () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff7bcd83c in libwebsocket_create_context (info=0x7fffffffe580)
at libwebsockets.c:2093
#2 0x0000000000400918 in main () at x.c:66
(gdb) up
#1 0x00007ffff7bcd83c in libwebsocket_create_context (info=0x7fffffffe580)
at libwebsockets.c:2093
2093 info->protocols[context->count_protocols].callback(context,
(gdb) p context->count_protocols
$1 = 1
(gdb) p info->protocols[1]
$2 = {
name = 0x7ffff7bc2240 <_IO_2_1_stdin_> "\210 \255", <incomplete sequence \373>, callback = 0x7ffff7bc2080 <_IO_2_1_stderr_>,
per_session_data_size = 140737349689696, rx_buffer_size = 0,
owning_server = 0x602010, protocol_index = 1}
(gdb)
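Quite likely you need to terminate the array of libwebsocket_protocols with a special entry (one whose callback is NULL) so that the library knows how many entries it received via info->protocols. Without it, the library reads past the end of protocols[]: as the dump of info->protocols[1] shows, whatever follows the array in memory (here, pointers to libc's stdin and stderr objects) is treated as a second protocol, and its "callback" gets called. That is likely also why the uncalled readFromIn() seemed to matter: it changes what ends up after the array.
Edit: yep, check the docs: http://jsk.pp.ua/knowledge/libwebsocket.html
Array of structures listing supported protocols and a protocol-specific callback for each one. The list is ended with an entry that has a NULL callback pointer.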
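To illustrate why the terminator matters, here is a minimal, self-contained sketch that does not use libwebsockets itself: struct protocol and count_protocols are hypothetical stand-ins that model only the name/callback/per_session_data_size fields from the question. A library that receives only a pointer to the array cannot know its length except by walking until the NULL-callback entry the docs describe.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct libwebsocket_protocols: only the
   fields used in the question are modelled here. */
struct protocol {
    const char *name;
    int (*callback)(void);
    size_t per_session_data_size;
};

static int callback_http(void) { return 0; }

static struct protocol protocols[] = {
    /* first protocol must always be HTTP handler */
    { "http-only", callback_http, 0 },
    /* terminator: an entry with a NULL callback ends the list */
    { NULL, NULL, 0 }
};

/* Counts entries the only way a callee that gets a bare pointer can:
   by walking until the terminator. Without the terminator this loop
   would read past the end of the array (undefined behaviour). */
static size_t count_protocols(const struct protocol *p) {
    size_t n = 0;
    while (p[n].callback != NULL)
        n++;
    return n;
}
```

In the question's code, the corresponding change would be appending { NULL, NULL, 0 } as the last element of protocols[]; the remaining struct members are then zero-initialized.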

Related

pintos userprogram-argument passing segmentation fault problem

I'm working on Pintos Project 2 to prepare for an operating systems course.
After implementing argument passing, I entered the command below.
pintos -q run 'echo x'
The result is below.
0627antaechan#pintos-2:~/pintos/src/userprog/build$ pintos -q run 'echo x'
Use of literal control characters in variable names is deprecated at /home/0627antaechan/pintos/src/utils/pintos line 911.
Prototype mismatch: sub main::SIGVTALRM () vs none at /home/0627antaechan/pintos/src/utils/pintos line 935.
Constant subroutine SIGVTALRM redefined at /home/0627antaechan/pintos/src/utils/pintos line 927.
squish-pty bochs -q
========================================================================
Bochs x86 Emulator 2.6.2
Built from SVN snapshot on May 26, 2013
Compiled on Aug 5 2021 at 07:51:40
========================================================================
00000000000i[ ] reading configuration from bochsrc.txt
00000000000e[ ] bochsrc.txt:8: 'user_shortcut' will be replaced by new 'keyboard' option.
00000000000i[ ] installing nogui module as the Bochs GUI
00000000000i[ ] using log file bochsout.txt
PiLo hda1
Loading.........
Kernel command line: -q run 'echo x'
Pintos booting with 4,096 kB RAM...
383 pages available in kernel pool.
383 pages available in user pool.
Calibrating timer... 204,600 loops/s.
hda: 1,008 sectors (504 kB), model "BXHD00011", serial "Generic 1234"
hda1: 147 sectors (73 kB), Pintos OS kernel (20)
hdb: 5,040 sectors (2 MB), model "BXHD00012", serial "Generic 1234"
hdb1: 4,096 sectors (2 MB), Pintos file system (21)
filesys: using hdb1
Boot complete.
Executing 'echo x':
Execution of 'echo' complete.
Timer: 127 ticks
Thread: 30 idle ticks, 96 kernel ticks, 8 user ticks
hdb1 (filesys): 28 reads, 0 writes
Console: 611 characters output
Keyboard: 0 keys pressed
Exception: 1 page faults
Powering off...
========================================================================
Bochs is exiting with the following message:
[UNMP ] Shutdown port: shutdown requested
========================================================================
The echo program calls the write system call.
Below is my syscall_handler function:
//userprog/syscall.c
static void
syscall_handler (struct intr_frame *f UNUSED)
{
printf ("system call!\n");
thread_exit ();
}
However, the message "system call!" never appears after I run pintos -q run 'echo x'.
I also tried debugging with gdb. The result is below.
0627antaechan#pintos-2:~/pintos/src/userprog/build$ gdb kernel.o
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Reading symbols from kernel.o...done.
(gdb) run echo x
Starting program: /home/0627antaechan/pintos/src/userprog/build/kernel.o echo x
Program received signal SIGSEGV, Segmentation fault.
start () at ../../threads/start.S:30
30 mov %ax, %es
(gdb)
I don't know the cause of the problem. Please help!
Below is the code of the process_execute, start_process and construct_esp functions.
tid_t
process_execute (const char *file_name)
{
char *fn_copy;
tid_t tid;
// make token to put in "thread_create"
char *token;
char *save_ptr;
/* Make a copy of FILE_NAME.
Otherwise there's a race between the caller and load(). */
fn_copy = palloc_get_page (0);
if (fn_copy == NULL)
return TID_ERROR;
strlcpy (fn_copy, file_name, PGSIZE);
token = strtok_r(file_name, " ", &save_ptr);
/* Create a new thread to execute FILE_NAME. */
// modifying file_name to token
tid = thread_create (token, PRI_DEFAULT, start_process, fn_copy);
if (tid == TID_ERROR)
palloc_free_page (fn_copy);
return tid;
}
/* A thread function that loads a user process and starts it
running. */
static void
start_process (void *file_name_)
{
char *cmd_line = file_name_;
struct intr_frame if_;
bool success;
char fn_copy[MAX_ARGU_LENGTH];
// char *argv[MAX_ARGUMENT];
char ** argv = malloc(sizeof(char *)*MAX_ARGUMENT);
char *save_ptr;
int argc = 0;
strlcpy(fn_copy, cmd_line, PGSIZE);
argv[0] = strtok_r(fn_copy, " ", &save_ptr);
while(argv[argc++] != NULL){
argv[argc] = strtok_r(NULL, " ", &save_ptr);
}
/* Initialize interrupt frame and load executable. */
memset (&if_, 0, sizeof if_);
if_.gs = if_.fs = if_.es = if_.ds = if_.ss = SEL_UDSEG;
if_.cs = SEL_UCSEG;
if_.eflags = FLAG_IF | FLAG_MBS;
success = load (argv[0], &if_.eip, &if_.esp);
// if success, construct esp
if(success) construct_esp(argv, argc, &if_.esp);
/* If load failed, quit. */
palloc_free_page (cmd_line);
free(argv);
if (!success)
thread_exit ();
/* Start the user process by simulating a return from an
interrupt, implemented by intr_exit (in
threads/intr-stubs.S). Because intr_exit takes all of its
arguments on the stack in the form of a `struct intr_frame',
we just point the stack pointer (%esp) to our stack frame
and jump to it. */
asm volatile ("movl %0, %%esp; jmp intr_exit" : : "g" (&if_) : "memory");
NOT_REACHED ();
}
void construct_esp(char **argv, int argc, void **esp) {
uint32_t *addrs[MAX_ARGUMENT];
int i;
for(i = argc-1; i >= 0; i--){
*esp = *esp - (strlen(argv[i])+1);
addrs[i] = (uint32_t *)*esp;
memcpy(*esp, argv[i], (strlen(argv[i])+1));
}
//word-align
while(((uintptr_t)(*esp) % WSIZE) != 0){
*esp = *esp -1;
}
//argv[argc], namely NULL
*esp = *esp - WSIZE;
*(int32_t *)*esp = 0;
for(i = argc-1; i >= 0; i--){
*esp = *esp - WSIZE;
*(uint32_t **)*esp = addrs[i];
}
*esp = *esp - WSIZE;
*(uint32_t **)*esp = *esp + WSIZE;
*esp = *esp - WSIZE;
*(int32_t *)*esp = argc;
*esp = *esp - WSIZE;
*(int32_t *)*esp = 0;
printf("your stack is like below\n");
hex_dump((uintptr_t)*esp, *esp, 0xc0000000-(uintptr_t)*esp, true);
}

segmentation fault when using a non NULL pointer

There is a strange problem, as described in the title, when using DPDK.
When I call rte_pktmbuf_alloc(struct rte_mempool *), the process receives a segmentation fault, even though I have already verified that the value returned by rte_pktmbuf_pool_create() is not NULL.
The following is the gdb output inside the DPDK source code:
Thread 1 "osw" received signal SIGSEGV, Segmentation fault.
0x00000000005e9f41 in __mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdec8, mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1449
1449 if (unlikely(cache == NULL || n >= cache->size))
(gdb) p cache
$1 = (struct rte_mempool_cache *) 0x1a7dfc000000000
(gdb) bt
#0 0x00000000005e9f41 in __mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
    at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1449
#1 rte_mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
    at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1517
#2 rte_mempool_get_bulk (n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
    at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1552
#3 rte_mempool_get (obj_p=0x7fffffffdeb8, mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1578
#4 rte_mbuf_raw_alloc (mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:586
#5 rte_pktmbuf_alloc (mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:896
I dug into rte_mempool.h and changed lines 1449-1450
1449 if (unlikely(cache == NULL || n >= cache->size))
1450 goto ring_dequeue;
to
1449 if (unlikely(cache == NULL))
1450 goto ring_dequeue;
1451 if (unlikely(n >= cache->size))
1452 goto ring_dequeue;
and it still fails, now at line 1451.
The gdb output after the change:
Thread 1 "osw" received signal SIGSEGV, Segmentation fault.
__mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1451
1451 if (unlikely(n >= cache->size))
(gdb) p cache
$1 = (struct rte_mempool_cache *) 0x1a7dfc000000000
(gdb) bt
#0 __mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
    at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1451
#1 rte_mempool_generic_get (cache=0x1a7dfc000000000, n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
    at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1519
#2 rte_mempool_get_bulk (n=1, obj_table=0x7fffffffdeb8, mp=0x101a7df00)
    at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1554
#3 rte_mempool_get (obj_p=0x7fffffffdeb8, mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1580
#4 rte_mbuf_raw_alloc (mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:586
#5 rte_pktmbuf_alloc (mp=0x101a7df00) at /root/dpdk-20.05/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:896
#6 main (argc=<optimized out>, argv=<optimized out>) at ofpd.c:150
(gdb) p cache->size
Cannot access memory at address 0x1a7dfc000000000
It looks like the address stored in the "cache" pointer is not NULL, but it is actually a NULL pointer.
I have no idea why the "cache" pointer value has non-zero upper 4 bytes and zero lower 4 bytes.
The DPDK version is 20.05; I also tried 18.11 and 19.11.
The OS is CentOS 8.1 with kernel 4.18.0-147.el8.x86_64.
The CPU is an AMD EPYC 7401P.
#define RING_SIZE 16384
#define NUM_MBUFS 8191
#define MBUF_CACHE_SIZE 512
int main(int argc, char **argv)
{
int ret;
uint16_t portid;
unsigned cpu_id = 1;
struct rte_mempool *tmp;
int arg = rte_eal_init(argc, argv);
if (arg < 0)
rte_exit(EXIT_FAILURE, "Cannot init EAL: %s\n", rte_strerror(rte_errno));
if (rte_lcore_count() < 10)
rte_exit(EXIT_FAILURE, "We need at least 10 cores.\n");
argc -= arg;
argv += arg;
/* Creates a new mempool in memory to hold the mbufs. */
tmp = rte_pktmbuf_pool_create("TMP", NUM_MBUFS, MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
if (tmp == NULL)
rte_exit(EXIT_FAILURE, "Cannot create mbuf pool, %s\n", rte_strerror(rte_errno));
printf("tmp addr = %x\n", tmp);
struct rte_mbuf *test = rte_pktmbuf_alloc(tmp);
rte_exit(EXIT_FAILURE, "end\n");
}
I once faced the same problem with the pointer returned by getifaddrs(): it also caused a segmentation fault, and I had to shift the pointer address like this
ifa->ifa_addr = (struct sockaddr *)((uintptr_t)(ifa->ifa_addr) >> 32);
and then it worked normally.
Therefore, I think this is not a DPDK-specific issue.
Has anyone seen this issue?
Thanks.
I am able to run this without any error after modifying your code to:
include the required headers
remove unused variables
check whether the value returned by rte_pktmbuf_alloc() is NULL
Test on:
CPU: Intel(R) Xeon(R) CPU E5-2699
OS: 4.15.0-101-generic
GCC: 7.5.0
DPDK version: 19.11.2, dpdk mainline
Library mode: static
code:
#include <stdio.h>
#include <stdlib.h>
#include <rte_debug.h>
#include <rte_eal.h>
#include <rte_errno.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

#define NUM_MBUFS 8191
#define MBUF_CACHE_SIZE 512

int main(int argc, char **argv)
{
    int ret = 0;
    struct rte_mempool *tmp;
    int arg = rte_eal_init(argc, argv);
    if (arg < 0)
        rte_exit(EXIT_FAILURE, "Cannot init EAL: %s\n", rte_strerror(rte_errno));
    if (rte_lcore_count() < 10)
        rte_exit(EXIT_FAILURE, "We need at least 10 cores.\n");
    argc -= arg;
    argv += arg;
    /* Creates a new mempool in memory to hold the mbufs. */
    tmp = rte_pktmbuf_pool_create("TMP", NUM_MBUFS, MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
    if (tmp == NULL)
        rte_exit(EXIT_FAILURE, "Cannot create mbuf pool, %s\n", rte_strerror(rte_errno));
    /* %p, not %x: a pointer is 64 bits wide on x86_64 */
    printf("tmp addr = %p\n", (void *)tmp);
    struct rte_mbuf *test = rte_pktmbuf_alloc(tmp);
    if (test == NULL)
        rte_exit(EXIT_FAILURE, "end\n");
    return ret;
}
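On the printf change in the code above (%x to %p): %x expects an unsigned int, so passing a 64-bit pointer to it is undefined behaviour and cannot be trusted to show the full address. Below is a small, self-contained sketch of printing a pointer at full width and of splitting a suspicious 64-bit value, such as the question's cache pointer, into its 32-bit halves. Both helper names are hypothetical; this is a diagnostic aid, not a fix for the crash.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats a pointer at full width as "0x...".
   PRIxPTR expands to the right conversion for uintptr_t. */
static int format_pointer(char *out, size_t outlen, const void *p) {
    return snprintf(out, outlen, "0x%" PRIxPTR, (uintptr_t)p);
}

/* Hypothetical helper: splits a 64-bit value into its upper and lower
   32-bit halves, e.g. to inspect a value like 0x1a7dfc000000000. */
static void split_halves(uint64_t v, uint32_t *hi, uint32_t *lo) {
    *hi = (uint32_t)(v >> 32);
    *lo = (uint32_t)(v & 0xffffffffu);
}
```

For the question's cache value, split_halves gives an upper half of 0x1a7dfc0 and a lower half of 0, which matches the "non-zero prefix, zero suffix" pattern described above.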
[EDIT-1] based on the comment from Brydon Gibson
Note:
As I do not have access to your codebase or a working code snippet, my suggestion is to look at the example code in DPDK/examples/l2fwd or DPDK/examples/skeleton and copy the headers for your compilation.
I assume the question's author and Brydon are different people who might be facing similar issues in different code bases.
The current question claims that the error is reproduced with the code snippet on DPDK versions 20.05, 18.11 and 19.11.
This answer clearly shows that the same code snippet works with static linking of the library.
I have requested @BrydonGibson to open a ticket with the relevant information and environment details, as the cause might be different.

Segmentation Fault: C-Program migrating from HPUX to Linux

I'm trying to migrate a small C program from HP-UX to Linux. The project compiles fine but crashes at runtime with a segmentation fault. I've already tried to look behind the scenes using strace and gdb, but I still don't understand what's happening. The relevant (truncated) parts:
tts_send_2.c
contains the function
int sequenznummernabgleich(int sockfd, char *snd_id, char *rec_id, int timeout_quit) {
TS_TEL_TAB tel_tab_S01;
int n;
/* truncated */
}
which is called from within that file like this:
. . .
. . .
switch(sequenznummernabgleich(sockfd,c_snd_id,c_rec_id,c_timeout_quit)) {
/* kritischer Fehler */
case -1:
. . .
. . .
When that function is called, I get a segmentation fault (gdb output):
Program received signal SIGSEGV, Segmentation fault.
0x0000000000403226 in sequenznummernabgleich (sockfd=<error reading variable: Cannot access memory at address 0x7fffff62f94c>,
snd_id=<error reading variable: Cannot access memory at address 0x7fffff62f940>, rec_id=<error reading variable: Cannot access memory at address 0x7fffff62f938>,
timeout_quit=<error reading variable: Cannot access memory at address 0x7fffff62f934>) at tts_snd_2.c:498
498 int sequenznummernabgleich(int sockfd, char *snd_id, char *rec_id, int timeout_quit) {
I just don't understand this. When I step in gdb to the line where the function is called, all the variables look fine:
1008 switch(sequenznummernabgleich(sockfd,c_snd_id,c_rec_id,c_timeout_quit)) {
(gdb) p sockfd
$9 = 8
(gdb) p &sockfd
$10 = (int *) 0x611024 <sockfd>
(gdb) p c_snd_id
$11 = "KR", '\000' <repeats 253 times>
(gdb) p &c_snd_id
$12 = (char (*)[256]) 0xfde220 <c_snd_id>
(gdb) p c_rec_id
$13 = "CO", '\000' <repeats 253 times>
(gdb) p &c_rec_id
$14 = (char (*)[256]) 0xfde560 <c_rec_id>
(gdb) p c_timeout_quit
$15 = 20
(gdb) p &c_timeout_quit
$16 = (int *) 0xfde660 <c_timeout_quit>
I've also created an strace output. Here's the last part concerning the code shown above:
strace output
Any ideas? I've searched the web, and of course Stack Overflow, for hours without finding a really similar case.
Thanks
Kriz
I haven't used HP-UX in ages, but I remember enough to offer the following suggestions:
Make sure you're initializing variables and structs correctly. Use calloc instead of malloc, so that allocated memory starts out zeroed.
Also, don't assume a specific byte order (e.g. low byte then high byte), a.k.a. the endianness of the machine. There are usually macros or functions (such as htonl/ntohl) that handle the appropriate ordering for you.
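As an illustration of the byte-order point: HP-UX runs big-endian, while x86-64 Linux is little-endian, so code that reinterprets raw bytes as integers can behave differently after the port. A minimal sketch of encoding and decoding a 32-bit value in an explicit big-endian (network) order using shifts, which gives the same bytes on either host. put_be32 and get_be32 are hypothetical helpers; POSIX's htonl/ntohl from <arpa/inet.h> serve the same purpose for 32-bit values.

```c
#include <stdint.h>

/* Hypothetical helper: stores v most-significant byte first,
   independent of the host's native byte order. */
static void put_be32(unsigned char out[4], uint32_t v) {
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Hypothetical helper: reads a big-endian 32-bit value back. */
static uint32_t get_be32(const unsigned char in[4]) {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```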
Update 15.10.16
After debugging for even more hours, I found the real problem. The first line of the function "sequenznummernabgleich" is a declaration of a struct:
TS_TEL_TAB tel_tab_S01;
This is defined as following:
typedef struct {
TS_BOF_REC bof;
TS_REM_REC rem;
TS_EOF_REC eof;
int bof_len;
int rem_len;
int eof_len;
int cnt;
char teltyp[LEN_TELTYP+1];
TS_TEL_ENTRY entries[MAX_TEL];
} TS_TEL_TAB;
and its embedded struct TS_TEL_ENTRY:
typedef struct {
int len;
char tel[MAX_TEL_LEN];
} TS_TEL_ENTRY;
The problem is that the value of MAX_TEL_LEN had been changed from 512 to 1024, so the struct almost doubled in size, which meant the stack was no longer big enough.
SOLUTION
Simply increase the stack size from 8 MB to 64 MB. This can be done with the ulimit command (on Linux).
List current stack size: ulimit -s
Set stack size to 64 MB: ulimit -s 65536
Note: Values for stack size are in kB.
For a good short reference on the ulimit command, have a look at ss64.
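Raising the stack limit works, but an alternative not mentioned above is to stop putting the large struct on the stack at all and allocate it on the heap, which does not depend on the shell's ulimit setting. A minimal sketch with simplified structs: the real values of LEN_TELTYP, MAX_TEL and MAX_TEL_LEN are not shown in the question, so the numbers below are hypothetical, and the bof/rem/eof records are omitted.

```c
#include <stdlib.h>

/* Hypothetical sizes: the question does not show the real values. */
#define LEN_TELTYP  8
#define MAX_TEL     4096
#define MAX_TEL_LEN 1024

typedef struct {
    int len;
    char tel[MAX_TEL_LEN];
} TS_TEL_ENTRY;

/* Simplified TS_TEL_TAB: the bof/rem/eof records and length fields are left out. */
typedef struct {
    int cnt;
    char teltyp[LEN_TELTYP + 1];
    TS_TEL_ENTRY entries[MAX_TEL];
} TS_TEL_TAB;

/* Allocates a zero-initialized table on the heap instead of the stack.
   Returns NULL on failure; the caller must free() the result. */
static TS_TEL_TAB *tel_tab_new(void) {
    return calloc(1, sizeof(TS_TEL_TAB));
}
```

Inside sequenznummernabgleich, TS_TEL_TAB tel_tab_S01; would then become TS_TEL_TAB *tel_tab_S01 = tel_tab_new(); followed by a NULL check, each tel_tab_S01. access would become tel_tab_S01->, and the table would need to be freed before every return.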

memcpy segmentation fault. Misalignment of data structure boundaries

I have been trying to debug this error for a while now without success. I also tried memmove as an alternative, but that also results in a segmentation fault.
The code for this question is posted at http://pastebin.com/hiwV5G04
Can someone please help me understand what I am doing wrong?
//------------------------------------------------------------------------
// Somewhere in the main function, This is the piece of code I am executing
//------------------------------------------------------------------------
SslDecryptSession *ssl_session = malloc(sizeof(struct _SslDecryptSession ));
ssl_session->client_random.data = NULL; //Make the stuff point somewhere. Else can use malloc also here. Not sure if this is a problem
ssl_session->server_random.data= NULL;
const u_char *payload; /* Packet payload */
//Case for client random
printf("Client Random ");
for (cs_id = 11; cs_id < 43; cs_id++){
printf("%hhX", payload[cs_id] );
}
printf("\n");
cs_id=11;
ssl_session->client_random.data_len=32;
// Segmentation fault here
memcpy(ssl_session->client_random.data, payload[cs_id], 32);
The definitions of the structures involved are -
typedef struct _SslDecryptSession {
guchar _master_secret[SSL_MASTER_SECRET_LENGTH];
guchar _session_id[256];
guchar _client_random[32];
guchar _server_random[32];
StringInfo session_id;
StringInfo session_ticket;
StringInfo server_random;
StringInfo client_random;
StringInfo master_secret;
StringInfo handshake_data;
StringInfo pre_master_secret;
guchar _server_data_for_iv[24];
StringInfo server_data_for_iv;
guchar _client_data_for_iv[24];
StringInfo client_data_for_iv;
gint state;
SslCipherSuite cipher_suite;
SslDecoder *server;
SslDecoder *client;
SslDecoder *server_new;
SslDecoder *client_new;
gcry_sexp_t private_key;
StringInfo psk;
guint16 version_netorder;
StringInfo app_data_segment;
SslSession session;
} SslDecryptSession;
typedef struct _StringInfo {
guchar *data;
guint data_len;
} StringInfo;
The output from gdb is:
b 1985 // Putting a breakpoint at line 1985 in my source code.
// This is equivalent to line 83, that is "ssl_session->client_random.data_len=32;"
Breakpoint 1 at 0x403878: file Newversion.c, line 1985.
run //run the code in gdb
At breakpoint 1 the following info is in the variables
p ssl_session
$1 = (SslDecryptSession *) 0x60fc50 // I put some data in ssl_session->version_netorder earlier. So it is not null here. Everything works fine here
p ssl_session->client_random.data
$2 = (guchar *) 0x0
p ssl_session->client_random.data_len
$3 = 32
step // Execute 1 more line in the code
// I reach at the memcpy line and I get this error then
Breakpoint 1, got_packet (args=0x0, header=0x7fffffffe2c0, packet=0x7ffff6939086 "P=\345\203\376\177") at Newversion.c:1995
1995 memcpy(ssl_session->client_random.data, payload[cs_id], 32);
(gdb)
(gdb) s
__memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:27
27 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
(gdb)
28 in ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S
(gdb)
29 in ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S
(gdb)
30 in ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S
(gdb)
31 in ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S
(gdb)
32 in ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S
(gdb)
33 in ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S
(gdb)
34 in ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S
(gdb)
35 in ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S
(gdb)
Program received signal SIGSEGV, Segmentation fault.
__memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:35
35 in ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S
There are many things in this code that don't seem right. The problematic line is:
memcpy(ssl_session->client_random.data, payload[cs_id], 32);
This line copies 32 bytes, starting at the address given by payload[cs_id], into the memory pointed to by ssl_session->client_random.data.
You passed the value of payload[cs_id] to memcpy instead of its address, hence the warning you get at compile time.
You probably meant something like
memcpy(ssl_session->client_random.data, &payload[cs_id], 32); // Note the & symbol
Also, there is a comment in your code stating that you are unsure whether you should use malloc. You should.
In the snippet of code you provided, payload is not initialized (so it holds an unpredictable value) and ssl_session->client_random.data is initialized with NULL. This means you try to write to address 0, which will certainly raise a segfault. Moreover, before writing to address 0, you read from a random address in memory, which will most likely raise an exception as well.
To solve the issue, make sure the memory you read from and write to has actually been allocated before you use it.
u_char payload[43]; // 43 is based on the example you provided; fill it with the packet data before use
...
ssl_session->client_random.data = malloc(sizeof(u_char)*32); // Also based on your example
...
memcpy(ssl_session->client_random.data, &payload[cs_id], 32);
Hope this helps.
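Putting the two fixes from this answer together (allocate first, then pass an address), here is a minimal self-contained sketch. string_info_set is a hypothetical helper, and the guchar/guint typedefs stand in for the GLib types so the snippet compiles without GLib.

```c
#include <stdlib.h>
#include <string.h>

typedef unsigned char guchar; /* stand-in for GLib's guchar */
typedef unsigned int guint;   /* stand-in for GLib's guint */

typedef struct _StringInfo {
    guchar *data;
    guint data_len;
} StringInfo;

/* Hypothetical helper: allocates len bytes and copies them from
   src + offset. si->data must be NULL or a previous malloc() result.
   Returns 0 on success, -1 if allocation fails. */
static int string_info_set(StringInfo *si, const guchar *src, size_t offset, guint len) {
    guchar *buf = malloc(len);
    if (buf == NULL)
        return -1;
    memcpy(buf, &src[offset], len); /* &src[offset]: an address, not a byte value */
    free(si->data);                 /* free(NULL) is a no-op */
    si->data = buf;
    si->data_len = len;
    return 0;
}
```

With the question's variables, the call would look like string_info_set(&ssl_session->client_random, payload, 11, 32), once payload points at real packet data.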
1. You forgot to allocate the memory.
2. Pass the address of the payload byte: memcpy(ssl_session->client_random.data, &payload[cs_id], 32*sizeof(u_char));
SslDecryptSession *ssl_session = malloc(sizeof(struct _SslDecryptSession ));
ssl_session->client_random.data = NULL; //Make the stuff point somewhere. Else can use malloc also here. Not sure if this is a problem
ssl_session->server_random.data= NULL;
const u_char *payload; /* Packet payload */
//Case for client random
printf("Client Random ");
for (cs_id = 11; cs_id < 43; cs_id++){
printf("%hhX", payload[cs_id] );
}
printf("\n");
cs_id=11;
ssl_session->client_random.data_len=32;
guchar *pData = malloc(32*sizeof(guchar));
ssl_session->client_random.data = pData;
memcpy(ssl_session->client_random.data, &payload[cs_id], 32*sizeof(u_char));
The offending code is:
memcpy(ssl_session->client_random.data, payload[cs_id], 32);
With payload defined as:
const u_char *payload;
You seem to have a type mismatch for operand 2 of memcpy: you are passing an integer, not a pointer. The compiler should warn about this, and such warnings should not be ignored.
Did you mean to use memset() to initialize the data instead of memcpy()?

How to use GDB inside giant loops

I have the following loop. My code breaks, but I don't know at exactly which iteration.
int n=1000;
for (i=0; i<n; i++) {
slot = random() % max_allocs;
doAlloc = random() % 4;
doWrite = writeData;
if (!doAlloc || ptr[slot] != NULL) {
if (ptr[slot] == NULL)
;//assert(Mem_Free(ptr[slot]) == -1);
else
{
printf("I got here \n");
printf("mem free ptr slot is %d \n",Mem_Free(ptr[slot]));
}
free(shadow[slot]);
ptr[slot] = NULL;
shadow[slot] = NULL;
}
if (doAlloc) {
size[slot] = min_alloc_size +
(random() % (max_alloc_size - min_alloc_size + 1));
printf("size[slot] :%d\n", size[slot]);
ptr[slot] = Mem_Alloc(size[slot], BESTFIT);
printf("ptr slot is %p \n",ptr[slot]);
assert(ptr[slot] != NULL);
if (doWrite) {
shadow[slot] = malloc(size[slot]);
int j;
for (j=0; j<size[slot]; j++) {
char data = random();
*((char*)(ptr[slot] + j)) = data;
*((char*)(shadow[slot] + j)) = data;
}
}
}
}
How can I find at which iteration of n the code breaks and how can I put a breakpoint at that iteration?
P.S.: Is there a better debugger for this purpose on Linux (if I don't want to use Eclipse)?
Here's the error I am receiving in gdb:
mymain: mymain.c:104: main: Assertion `ptr[slot] != ((void *)0)' failed.
Program received signal SIGABRT, Aborted.
0x000000368da328e5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) backtrace
#0 0x000000368da328e5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x000000368da340c5 in abort () at abort.c:92
#2 0x000000368da2ba0e in __assert_fail_base (fmt=<value optimized out>, assertion=0x40114b "ptr[slot] != ((void *)0)", file=0x401142 "mymain.c", line=<value optimized out>, function=<value optimized out>)
at assert.c:96
#3 0x000000368da2bad0 in __assert_fail (assertion=0x40114b "ptr[slot] != ((void *)0)", file=0x401142 "mymain.c", line=104, function=0x401199 "main") at assert.c:105
#4 0x0000000000400e2a in main (argc=4, argv=0x7fffffffdb68) at mymain.c:104
(gdb) frame 1
#1 0x000000368da340c5 in abort () at abort.c:92
92 raise (SIGABRT);
(gdb) frame 3
#3 0x000000368da2bad0 in __assert_fail (assertion=0x40114b "ptr[slot] != ((void *)0)", file=0x401142 "mymain.c", line=104, function=0x401199 "main") at assert.c:105
105 __assert_fail_base (_("%s%s%s:%u: %s%sAssertion `%s' failed.\n%n"),
How do you know the code is "breaking" in the first place? Usually it's because some variable suddenly takes on a value you don't expect. In this case, you can set a watchpoint rather than a breakpoint, and it'll break when and only when that variable goes outside of expectations.
For instance, with this program:
#include <stdio.h>
int main(void) {
    int b = 0;
    for ( int i = 0; i < 20; ++i ) {
        b += 5;
    }
    return 0;
}
we can get gdb to stop when b exceeds a certain value, and find out on exactly which iteration of the loop that happened:
paul#local:~/src/c/scratch$ gdb testwatch
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/paul/src/c/scratch/testwatch...done.
(gdb) list
1 #include <stdio.h>
2
3 int main(void) {
4 int b = 0;
5 for ( int i = 0; i < 20; ++i ) {
6 b += 5;
7 }
8 return 0;
9 }
(gdb) break 5
Breakpoint 1 at 0x400567: file testwatch.c, line 5.
(gdb) run
Starting program: /home/paul/src/c/scratch/testwatch
Breakpoint 1, main () at testwatch.c:5
5 for ( int i = 0; i < 20; ++i ) {
(gdb) watch b > 20
Hardware watchpoint 2: b > 20
(gdb) continue
Continuing.
Hardware watchpoint 2: b > 20
Old value = 0
New value = 1
main () at testwatch.c:5
5 for ( int i = 0; i < 20; ++i ) {
(gdb) print b
$1 = 25
(gdb) print i
$2 = 4
(gdb)
Here we can tell that b went above 20 when i was 4, i.e. on the fifth iteration of the loop. You can watch for whole expressions, such as watch b > 20 && i > 10, to look for combinations of values that you don't expect to be simultaneously true. gdb is pretty powerful when you get into it.
You can watch for a variable becoming a particular value, or a pointer becoming NULL, or a range counter going past the last element of your array, or whatever other condition is resulting in your code being broken. Once it stops, you'll know exactly the point at which your error occurs, and you can poke around looking at other variables to figure out what's going wrong.
In general, a debugger wouldn't be all that useful if you had to know where and when an error was occurring before you could use it.
EDIT: Now that you've updated your post: in your particular case, you can just use backtrace and go straight to the iteration, e.g.:
paul#local:~/src/c/scratch$ gdb segfault
Reading symbols from /home/paul/src/c/scratch/segfault...done.
(gdb) list 1,16
1 #include <stdlib.h>
2
3 void segfault(int * p) {
4 int n = *p;
5 }
6
7 int main(void) {
8 int n = 0;
9 int * parray[] = {&n, &n, &n, &n, NULL};
10
11 for ( int i = 0; i < 10; ++i ) {
12 segfault(parray[i]);
13 }
14
15 return 0;
16 }
(gdb) run
Starting program: /home/paul/src/c/scratch/segfault
Program received signal SIGSEGV, Segmentation fault.
0x0000000000400568 in segfault (p=0x0) at segfault.c:4
4 int n = *p;
(gdb) backtrace
#0 0x0000000000400568 in segfault (p=0x0) at segfault.c:4
#1 0x00000000004005c1 in main () at segfault.c:12
(gdb) frame 1
#1 0x00000000004005c1 in main () at segfault.c:12
12 segfault(parray[i]);
(gdb) print i
$1 = 4
(gdb)
In your case, go to the frame of the function that contains your loop (frame 4, main, in the backtrace you posted) and run print i to get the loop index.
Take a look at this GDB Tutorial.
You can use break (to set a breakpoint) and continue / next to do what you want:
Don't forget to compile with -g option: gcc -g source.c
gdb ./a.out
break linenumber
continue (to run to the next breakpoint) or next (to execute the next source line)
print variable (to print the value of variable)
Hope it helps.
From gdb's documentation 5.1.7 "Breakpoint Command Lists":
You can give any breakpoint (or watchpoint or catchpoint) a series of commands to execute when your program stops due to that breakpoint. For example, you might want to print the values of certain expressions, or enable other breakpoints.
So you can set a breakpoint in the loop that displays the iteration value, i, each time it is hit. That way when you crash you can see the last value printed:
break <line number just after start of the loop>
commands
silent
printf "i == %d\n", i
continue
end
Of course there are other (probably more efficient) ways of debugging this problem, but the technique of using a breakpoint to display information or perform other scripted actions then continue running is a valuable thing to have in your debugging toolbox.
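If you want to stop at line 94 only when i equals 500, set a conditional breakpoint like this:
b 94 if i == 500
Use ==, not =: the condition is evaluated as a C expression, so i=500 would assign 500 to i every time the breakpoint is reached.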
In general:
break line_number if condition
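With a conditional breakpoint, gdb still has to stop and evaluate the condition every time the line is reached, which can be noticeably slow in a large loop. If you can edit and rebuild the program, one alternative is to call an empty function only on the iteration you care about and put an ordinary breakpoint on it (break debug_hook). The sketch below assumes this approach; debug_hook, debug_hook_hits and run_loop are hypothetical names, and __attribute__((noinline)) is a GCC/Clang extension used to stop the call from being inlined away.

```c
#include <assert.h>

/* Hypothetical hook: in gdb, run `break debug_hook`.
 * The volatile counter gives the function a side effect so the
 * compiler cannot drop the call. */
volatile int debug_hook_hits = 0;

__attribute__((noinline)) void debug_hook(int iteration)
{
    debug_hook_hits++;
    (void)iteration; /* shows up as an argument in the gdb backtrace */
}

/* Hypothetical stand-in for the real loop: the hook is called
 * only when i == stop_at. */
long run_loop(int n, int stop_at)
{
    long sum = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (i == stop_at)
            debug_hook(i); /* gdb stops here via `break debug_hook` */
        sum += i;
    }
    return sum;
}
```

Build with gcc -g -O0, set break debug_hook, run, and when gdb stops use up (or frame 1) to get back into the loop's frame and inspect its variables.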
You seem to be hung up on finding the iteration on which it breaks, but the answer from nos, above, clearly states how to do this.
Run your program in GDB, wait for the code to crash (at which point GDB will grab it), and then work out which iteration it's crashed in by printing the value of the index variable using print i at the GDB prompt.
Edit: OK, I think I understand now. When you say the code "breaks", you mean it misbehaves but keeps running: it isn't crashing, so GDB doesn't catch it automatically.
In that case, you can't know in advance where to set the breakpoint, because you don't know when the problem occurs. How are you determining that the program is breaking? Are there any variables whose values show when the breakage occurs? If so, you could have GDB print those values on each iteration (rather than adding debug output directly to the code).
You can do this by attaching a command list to a breakpoint with gdb's commands command. There's an example of how to do this in this thread.
On each iteration print the value of i and also the value of whichever variable you're using to track the breakage. This should then give you the iteration on which the breakage occurs, and you can go back and set a breakpoint in the right place.
