This is my first time working with wchar_t and I found something surprising about it. I couldn't find an answer, so I'm sharing what I observed.
I have a simple test program (based on a swprintf example)
#include <stdio.h>
#include <wchar.h>
int main()
{
wchar_t str_unicode[100] = {0};
swprintf(str_unicode, sizeof(str_unicode), L"%3d\n", 120);
fputws(str_unicode, stdout);
return 0;
}
Compiling it with or without optimization works fine:
gcc -O2 test.c -o test
Running it also works fine:
./test
120
But in my current project, I use -D_FORTIFY_SOURCE=2, and it makes this simple program crash:
gcc -O2 -D_FORTIFY_SOURCE=2 test.c -o test
./test
*** buffer overflow detected ***: terminated
[1] 28569 IOT instruction (core dumped) ./test
I have more details with valgrind, and it seems that __swprintf_chk fails.
valgrind ./test
==30068== Memcheck, a memory error detector
==30068== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==30068== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==30068== Command: ./test
==30068==
*** buffer overflow detected ***: terminated
==30068==
==30068== Process terminating with default action of signal 6 (SIGABRT): dumping core
==30068== at 0x48A29E5: raise (in /usr/lib64/libc-2.32.so)
==30068== by 0x488B8A3: abort (in /usr/lib64/libc-2.32.so)
==30068== by 0x48E5006: __libc_message (in /usr/lib64/libc-2.32.so)
==30068== by 0x4975DF9: __fortify_fail (in /usr/lib64/libc-2.32.so)
==30068== by 0x4974695: __chk_fail (in /usr/lib64/libc-2.32.so)
==30068== by 0x49752C4: __swprintf_chk (in /usr/lib64/libc-2.32.so)
==30068== by 0x401086: main (in /home/pierre/workdir/test)
==30068==
==30068== HEAP SUMMARY:
==30068== in use at exit: 0 bytes in 0 blocks
==30068== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==30068==
==30068== All heap blocks were freed -- no leaks are possible
==30068==
==30068== For lists of detected and suppressed errors, rerun with: -s
==30068== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
[1] 30068 IOT instruction (core dumped) valgrind ./test
I don't understand why this check fails; the buffer (100 elements) is more than large enough for a single integer. Is it a bug in libc, or did I do something wrong?
My GCC version is 10.3.1
gcc --version
gcc (GCC) 10.3.1 20210422 (Red Hat 10.3.1-1)
Your problem is the second parameter of the function call -
swprintf(str_unicode, sizeof(str_unicode), L"%3d\n", 120);
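You passed in the size in bytes of the entire array, i.e. 400 bytes if sizeof(wchar_t) == 4.
But swprintf's second parameter is the number of wchar_t elements in the array, i.e. 100 in your example. Without fortification the oversized limit goes unnoticed because the output is short; with _FORTIFY_SOURCE the checked variant compares the limit you pass against the real size of the destination object and aborts when the limit is larger.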
Change your function call to:
swprintf(str_unicode, sizeof(str_unicode) / sizeof(wchar_t), L"%3d\n", 120);
Description
This is a reduced example of an issue from a larger project. I have to use flex and yy_scan_string(), and I have memory leaks in flex (code below). In this example the leaked blocks are reported as "still reachable", but in the original project they are reported as lost.
I think the problem is in memory that flex allocates internally, which I don't know how to free properly, and I can't find any tutorial or documentation covering it.
file.lex
%option noyywrap
%{
#include <stdio.h>
%}
%%
. printf("%s\n", yytext);
%%
int main() {
printf("Start\n");
yy_scan_string("ABC");
yylex();
printf("Stop\n");
return 0;
}
bash$ flex file.lex
bash$ gcc lex.yy.c
bash$ ./a.out
Start
A
B
C
Stop
bash$ valgrind --leak-check=full --show-leak-kinds=all ./a.out
==6351== Memcheck, a memory error detector
==6351== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==6351== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==6351== Command: ./a.out
==6351==
==6351== error calling PR_SET_PTRACER, vgdb might block
Start
A
B
C
Stop
==6351==
==6351== HEAP SUMMARY:
==6351== in use at exit: 77 bytes in 3 blocks
==6351== total heap usage: 4 allocs, 1 frees, 589 bytes allocated
==6351==
==6351== 5 bytes in 1 blocks are still reachable in loss record 1 of 3
==6351== at 0x483577F: malloc (vg_replace_malloc.c:299)
==6351== by 0x10AE84: yyalloc (in ./a.out)
==6351== by 0x10ABE5: yy_scan_bytes (in ./a.out)
==6351== by 0x10ABBC: yy_scan_string (in ./a.out)
==6351== by 0x10AEE2: main (in /home/./a.out)
==6351==
==6351== 8 bytes in 1 blocks are still reachable in loss record 2 of 3
==6351== at 0x483577F: malloc (vg_replace_malloc.c:299)
==6351== by 0x10AE84: yyalloc (in ./a.out)
==6351== by 0x10A991: yyensure_buffer_stack (in ./a.out)
==6351== by 0x10A38D: yy_switch_to_buffer (in ./a.out)
==6351== by 0x10AB8E: yy_scan_buffer (in ./a.out)
==6351== by 0x10AC68: yy_scan_bytes (in ./a.out)
==6351== by 0x10ABBC: yy_scan_string (in ./a.out)
==6351== by 0x10AEE2: main (in ./a.out)
==6351==
==6351== 64 bytes in 1 blocks are still reachable in loss record 3 of 3
==6351== at 0x483577F: malloc (vg_replace_malloc.c:299)
==6351== by 0x10AE84: yyalloc (in ./a.out)
==6351== by 0x10AAEF: yy_scan_buffer (in ./a.out)
==6351== by 0x10AC68: yy_scan_bytes (in ./a.out)
==6351== by 0x10ABBC: yy_scan_string (in ./a.out)
==6351== by 0x10AEE2: main (in ./a.out)
==6351==
==6351== LEAK SUMMARY:
==6351== definitely lost: 0 bytes in 0 blocks
==6351== indirectly lost: 0 bytes in 0 blocks
==6351== possibly lost: 0 bytes in 0 blocks
==6351== still reachable: 77 bytes in 3 blocks
==6351== suppressed: 0 bytes in 0 blocks
==6351==
==6351== For counts of detected and suppressed errors, rerun with: -v
==6351== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
The answer is mentioned in the flex documentation, but you have to read carefully to find it. (See below.)
If you generate a reentrant scanner, you are responsible for creating and destroying each scanner object you require, and the scanner object manages all of the memory required by a scanner instance. But even if you don't use the reentrant interface, you can use yylex_destroy to release the scanner's memory. In the traditional non-reentrant interface there is no yyscan_t argument, so the prototype of yylex_destroy is simply
int yylex_destroy(void);
(Although it has a return value which is supposed to be a status code, it never returns an error.)
You can, if you want to, call yylex_init, which also takes no arguments in the non-reentrant interface, but unlike the reentrant interface, it is not necessary to call it.
From the manual chapter on memory management:
Flex allocates dynamic memory during initialization, and once in a while from within a call to yylex(). Initialization takes place during the first call to yylex(). Thereafter, flex may reallocate more memory if it needs to enlarge a buffer. As of version 2.5.9 Flex will clean up all memory when you call yylex_destroy. See faq-memory-leak.
Example:
$ cat clean.l
%option noinput nounput noyywrap nodefault
%{
#include <stdio.h>
%}
%%
.|\n ECHO;
%%
int main() {
printf("Start\n");
yy_scan_string("ABC");
yylex();
printf("Stop\n");
yylex_destroy();
return 0;
}
$ flex -o clean.c clean.l
$ gcc -Wall -o clean clean.c
$ valgrind --leak-check=full --show-leak-kinds=all ./clean
==16187== Memcheck, a memory error detector
==16187== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==16187== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==16187== Command: ./clean
==16187==
Start
ABCStop
==16187==
==16187== HEAP SUMMARY:
==16187== in use at exit: 0 bytes in 0 blocks
==16187== total heap usage: 4 allocs, 4 frees, 1,101 bytes allocated
==16187==
==16187== All heap blocks were freed -- no leaks are possible
==16187==
==16187== For counts of detected and suppressed errors, rerun with: -v
==16187== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
In my C code I call an assembly (NASM x86_64) function (main.c).
In the assembly I call a C function (printf.asm).
That function prints a float using printf (c_printf.c).
I'm getting a segfault.
Should I compile or link differently? Or is the way I call the C function from assembly incorrect?
I compile with:
gcc -c c_printf.c
nasm -f elf64 -F dwarf -g printf.asm
gcc -o main printf.o c_printf.o main.c
My code:
main.c
extern void start_asm(void);
int main(void) {
start_asm();
return 0;
}
printf.asm
global start_asm
extern printf_float
start_asm:
call printf_float
ret
c_printf.c
#include <stdio.h>
void printf_float(void) {
printf("%f\n", 42.0f);
}
Maybe valgrind output will be helpful?
==16257== Memcheck, a memory error detector
==16257== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==16257== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==16257== Command: ./main
==16257==
--16257-- WARNING: Serious error when reading debug info
--16257-- When reading debug info from /home/starsep/stack/main:
--16257-- Overrun whilst parsing .debug_abbrev section(2)
==16257==
==16257== Process terminating with default action of signal 11 (SIGSEGV)
==16257== General Protection Fault
==16257== at 0x4E8F824: printf (printf.c:28)
==16257== by 0x40055F: printf_float (in /home/starsep/stack/main)
==16257== by 0x400534: ??? (printf.asm:6)
==16257== by 0x4E5A82F: (below main) (libc-start.c:291)
==16257==
==16257== HEAP SUMMARY:
==16257== in use at exit: 0 bytes in 0 blocks
==16257== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==16257==
==16257== All heap blocks were freed -- no leaks are possible
==16257==
==16257== For counts of detected and suppressed errors, rerun with: -v
==16257== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
[1] 16257 segmentation fault valgrind ./main
Consider the following clearly buggy program:
#include <string.h>
int main()
{
char string1[10] = "123456789";
char *string2 = "123456789";
strcat(string1, string2);
}
Suppose we compile it with:
gcc program.c -ggdb
and run valgrind on it:
valgrind --track-origins=yes --leak-check=yes --tool=memcheck --read-var-info=yes ./a.out
In the result, no error is shown:
==29739== Memcheck, a memory error detector
==29739== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==29739== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==29739== Command: ./a.out
==29739==
==29739==
==29739== HEAP SUMMARY:
==29739== in use at exit: 0 bytes in 0 blocks
==29739== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==29739==
==29739== All heap blocks were freed -- no leaks are possible
==29739==
==29739== For counts of detected and suppressed errors, rerun with: -v
==29739== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
What am I missing?
It did not report anything wrong because you were using Memcheck, which does not do bounds checking on global or stack arrays; it only performs bounds checks and use-after-free checks on heap blocks. In this case you can use Valgrind's SGCheck tool to check stack arrays:
valgrind --tool=exp-sgcheck ./a.out
It does report the error for me.
For more information, refer to the SGCheck docs:
http://valgrind.org/docs/manual/sg-manual.html
adding the log:
$ valgrind --tool=exp-sgcheck ./a.out
==10485== exp-sgcheck, a stack and global array overrun detector
==10485== NOTE: This is an Experimental-Class Valgrind Tool
==10485== Copyright (C) 2003-2015, and GNU GPL'd, by OpenWorks Ltd et al.
==10485== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==10485== Command: ./a.out
==10485==
==10485== Invalid read of size 1
==10485== at 0x4C2A374: strlen (h_intercepts.c:131)
==10485== by 0x4E9DD5B: puts (in /usr/lib64/libc-2.22.so)
==10485== by 0x4005C8: main (v.c:11)
==10485== Address 0xfff00042a expected vs actual:
==10485== Expected: stack array "string1" of size 10 in frame 2 back from here
==10485== Actual: unknown
==10485== Actual: is 0 after Expected
==10485==
==10485== Invalid read of size 1
==10485== at 0x4EA9BA2: _IO_default_xsputn (in /usr/lib64/libc-2.22.so)
==10485== by 0x4EA7816: _IO_file_xsputn@@GLIBC_2.2.5 (in /usr/lib64/libc-2.22.so)
==10485== by 0x4E9DDF7: puts (in /usr/lib64/libc-2.22.so)
==10485== by 0x4005C8: main (v.c:11)
==10485== Address 0xfff00042a expected vs actual:
==10485== Expected: stack array "string1" of size 10 in frame 3 back from here
==10485== Actual: unknown
==10485== Actual: is 0 after Expected
==10485==
123456789123456789
==10485==
==10485== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
I have a simple question here. I have some variable declarations as follows:
char long_name_VARA[]="TEST -- Gridded 450m daily Evapotranspiration (ET)";
int16 fill_PET_8day=32767;
Given the above, valgrind complains about the char declaration as follows:
Invalid write of size 8
==21902== at 0x408166: main (main.c:253)
==21902== Location 0x7fe677840 is 0 bytes inside long_name_VARA[0]
and for the int16 declaration as follows:
==21902== Invalid write of size 2
==21902== at 0x408178: main (main.c:226)
Location 0x7fe677420 is 0 bytes inside local var "fill_PET_8day"
What am I doing wrong in my declarations here?
Also, can I not declare a char array like this:
char temp_year[5]={0};
The warning messages you quoted show invalid memory accesses that happen to hit the memory belonging to these two variables. The variables are victims of the error, not perpetrators: nothing is wrong with the declarations, and most likely they are not relevant here at all.
The perpetrators are lines at main.c:253 and main.c:226, which you haven't quoted yet. That's where your problem occurs.
A wild guess would be that you have another object declared after fill_PET_8day (an array?). When working with that other object, you overrun its memory boundary by ~10 bytes, thus clobbering fill_PET_8day and first 8 bytes of long_name_VARA. This is what valgrind is warning you about.
As mentioned, the declarations aren't the issue.
Although FYI, it may be better to declare your constant string as const...
const char* long_name_VARA = "TEST -- Gridded 450m daily Evapotranspiration (ET)";
or even...
const char* const long_name_VARA = "TEST -- Gridded 450m daily Evapotranspiration (ET)";
The first form prevents the string from being modified through the pointer; the second form also prevents the pointer itself from being reassigned.
When I run the following
/* test.c */
#include <stdio.h>
int main(void)
{
char s[8000000];
int x;
s[0] = '\0';
x=5;
printf("%s %d\n",s,x);
return 0;
}
with Valgrind I get
$ gcc test.c
$ valgrind ./a.out
==1828== Memcheck, a memory error detector
==1828== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==1828== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==1828== Command: ./a.out
==1828==
==1828== Warning: client switching stacks? SP change: 0xfff0002e0 --> 0xffe85f0d0
==1828== to suppress, use: --max-stackframe=8000016 or greater
==1828== Invalid write of size 1
==1828== at 0x400541: main (in /home/m/a.out)
==1828== Address 0xffe85f0d0 is on thread 1's stack
==1828== in frame #0, created by main (???)
==1828==
==1828== Invalid write of size 8
==1828== at 0x400566: main (in /home/m/a.out)
==1828== Address 0xffe85f0c8 is on thread 1's stack
==1828== in frame #0, created by main (???)
==1828==
==1828== Invalid read of size 1
==1828== at 0x4E81ED3: vfprintf (vfprintf.c:1642)
==1828== by 0x4E88038: printf (printf.c:33)
==1828== by 0x40056A: main (in /home/m/a.out)
==1828== Address 0xffe85f0d0 is on thread 1's stack
==1828== in frame #2, created by main (???)
==1828==
5
==1828== Invalid read of size 8
==1828== at 0x4E88040: printf (printf.c:37)
==1828== by 0x40056A: main (in /home/m/a.out)
==1828== Address 0xffe85f0c8 is on thread 1's stack
==1828== in frame #0, created by printf (printf.c:28)
==1828==
==1828== Warning: client switching stacks? SP change: 0xffe85f0d0 --> 0xfff0002e0
==1828== to suppress, use: --max-stackframe=8000016 or greater
==1828==
==1828== HEAP SUMMARY:
==1828== in use at exit: 0 bytes in 0 blocks
==1828== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==1828==
==1828== All heap blocks were freed -- no leaks are possible
==1828==
==1828== For counts of detected and suppressed errors, rerun with: -v
==1828== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
If I pass the option that Valgrind suggests, I get
$ valgrind --max-stackframe=10000000 ./a.out
==1845== Memcheck, a memory error detector
==1845== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==1845== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==1845== Command: ./a.out
==1845==
5
==1845==
==1845== HEAP SUMMARY:
==1845== in use at exit: 0 bytes in 0 blocks
==1845== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==1845==
==1845== All heap blocks were freed -- no leaks are possible
==1845==
==1845== For counts of detected and suppressed errors, rerun with: -v
==1845== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
So the "Invalid reads/writes" are due to the large stack variable without the proper --max-stackframe=... option.
I'm not able to figure out why Valgrind is printing Invalid read of size 8 when using wchar_t. I'm running a 64bit Ubuntu (3.5.0-25) system with valgrind-3.7.0 and gcc 4.7.2.
#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>
#include <string.h>
int main()
{
// const wchar_t *text = L"This is a t"; // no Valgrind error
// const wchar_t *text = L"This is a teeeeeeee"; // no Valgrind error
const wchar_t *text = L"This is a test"; // Valgrind ERROR
wchar_t *new_text = NULL;
new_text = (wchar_t*) malloc( (wcslen(text) + 1) * sizeof(wchar_t));
wcsncpy(new_text, text, wcslen(text));
new_text[wcslen(text)] = L'\0';
printf("new_text: %ls\n", new_text);
free(new_text);
return 0;
}
Compile:
$ gcc -g -std=c99 test.c -o test
$ valgrind --tool=memcheck --leak-check=full --track-origins=yes --show-reachable=yes ./test
Valgrind results:
==19495== Memcheck, a memory error detector
==19495== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==19495== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==19495== Command: ./test
==19495==
==19495== Invalid read of size 8
==19495== at 0x4ED45A7: wcslen (wcslen.S:55)
==19495== by 0x4ED5C0E: wcsrtombs (wcsrtombs.c:74)
==19495== by 0x4E7D160: vfprintf (vfprintf.c:1630)
==19495== by 0x4E858D8: printf (printf.c:35)
==19495== by 0x4006CC: main (test.c:16)
==19495== Address 0x51f1078 is 56 bytes inside a block of size 60 alloc'd
==19495== at 0x4C2B3F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==19495== by 0x40066F: main (test.c:12)
==19495==
new_text: This is a test
==19495==
==19495== HEAP SUMMARY:
==19495== in use at exit: 0 bytes in 0 blocks
==19495== total heap usage: 1 allocs, 1 frees, 60 bytes allocated
==19495==
==19495== All heap blocks were freed -- no leaks are possible
==19495==
==19495== For counts of detected and suppressed errors, rerun with: -v
==19495== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)
Now if I run the same but with a 'working string', let's say
const wchar_t *text = L"This is a t"; // no Valgrind error
// const wchar_t *text = L"This is a teeeeeeee"; // no Valgrind error
// const wchar_t *text = L"This is a test"; // Valgrind ERROR
I get no issue:
==19571== Memcheck, a memory error detector
==19571== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==19571== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==19571== Command: ./test
==19571==
new_text: This is a t
==19571==
==19571== HEAP SUMMARY:
==19571== in use at exit: 0 bytes in 0 blocks
==19571== total heap usage: 1 allocs, 1 frees, 48 bytes allocated
==19571==
==19571== All heap blocks were freed -- no leaks are possible
==19571==
==19571== For counts of detected and suppressed errors, rerun with: -v
==19571== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
At first I thought the allocation size should always be a multiple of 8 (maybe some wcs functions read in chunks of 8), but some cases failed. Then I thought I would always have to reserve 8 bytes for the terminator ((wcslen(item) + 2) * sizeof(wchar_t)). That worked, but it doesn't make sense, since sizeof(wchar_t) on my system is 4 bytes and should be enough to hold the L'\0' terminator.
I also read the glibc wcslen source code but found nothing new. I now suspect a Valgrind issue. Could you shed some light on this? Is it worth filing a bug against Valgrind?
Thank you
This is probably caused by SSE optimisation of the wcslen function; see e.g. https://bugzilla.redhat.com/show_bug.cgi?id=798968 or https://bugs.archlinux.org/task/30643.
When optimising wcslen, it's faster to read multiple wide characters at a time and use vectorised instructions (SSE) to compare them to L'\0'. Unfortunately valgrind sees this as a read past the end of the allocated block, which it is: the 8-byte read at offset 56 of the 60-byte block touches 4 bytes beyond it. It is harmless, though, because the result of wcslen does not depend on the bytes after the terminator.
The fix is to update valgrind in the hope that a newer version will suppress the false positive.