I'm converting some source from VC6 to VS2010. The code is written in C++/CLI and it is an MFC application. It includes a line:
BYTE mybyte;
sscanf(source, "%x", &mybyte);
This has worked fine in VC6 for more than 15 years, but it causes problems in VS2010, so I created some test code.
void test_WORD_scanf()
{
char *source = "0xaa";
char *format = "%x";
int result = 0;
try
{
WORD pre = -1;
WORD target = -1;
WORD post = -1;
printf("Test (pre scan): stack: pre=%04x, target=%04x, post=%04x, source='%s', format='%s'\n", pre, target, post, source, format);
result = sscanf(source, format, &target);
printf("Test (post scan): stack: pre=%04x, target=%04x, post=%04x, source='%s', format='%s'\n", pre, target, post, source, format);
printf("result=%x\n", result);
// modification suggested by Werner Henze.
printf("&pre=%p sizeof(pre)=%u, &target=%p, sizeof(target)=%u, &post=%p, sizeof(post)=%u\n", (void *)&pre, (unsigned)sizeof(pre), (void *)&target, (unsigned)sizeof(target), (void *)&post, (unsigned)sizeof(post));
}
catch (...)
{
printf("Exception: Bad luck!\n");
}
}
Building this (in DEBUG mode) is no problem. Running it gives strange results that I cannot explain. First, I get the output from the two printf statements as expected. Then I get a run-time warning, which is the unexpected bit for me.
Test (pre scan): stack: pre=ffff, target=ffff, post=ffff, source='0xaa', format='%x'
Test (post scan): stack: pre=ffff, target=00aa, post=ffff, source='0xaa', format='%x'
result=1
Run-Time Check Failure #2 - Stack around the variable 'target' was corrupted.
Using the debugger, I found out that the run-time check failure is triggered on returning from the function. Does anybody know where the run-time check failure comes from? I searched on Google but couldn't find any suggestions.
In the actual code it is not a WORD that is used in sscanf but a BYTE (and I have a BYTE version of the test function). This caused actual stack corruption with the "%x" format (overwriting variable pre with 0), while "%hx" (which I expected to be the correct format) still causes problems by overwriting the lower byte of variable pre.
Any suggestion is welcome.
Note: I edited the example code to include the return value from sscanf().
Kind regards,
Andre Steenveld.
sscanf with %x writes an unsigned int. If you provide the address of a BYTE or a WORD, you get a buffer overflow/stack overwrite. %hx writes an unsigned short int.
The solution is to have an int variable, let sscanf write to that and then set your WORD or BYTE variable to the read value.
unsigned int x;
sscanf("0xaa", "%x", &x);
BYTE b = (BYTE)x;
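A self-contained C sketch of the same read-into-an-int-then-narrow idea, with a range check added. BYTE is typedef'd as unsigned char here as a stand-in for the Windows header (an assumption), and parse_hex_byte is a hypothetical helper name, not an existing API.

```c
#include <stdio.h>
#include <assert.h>

/* Stand-in for the BYTE typedef from <windows.h> (assumption). */
typedef unsigned char BYTE;

/* Parses a hex string such as "0xaa" into *out.
   %x stores an unsigned int, so sscanf gets the address of a real
   unsigned int, never the address of a 1-byte BYTE.
   Returns 1 on success, 0 if nothing was parsed or the value exceeds 0xFF. */
static int parse_hex_byte(const char *source, BYTE *out)
{
    unsigned int value;
    if (sscanf(source, "%x", &value) != 1 || value > 0xFFu)
        return 0;
    *out = (BYTE)value;  /* explicit, range-checked narrowing */
    return 1;
}
```

C99 also defines %hhx for unsigned char, but support depends on the C runtime, so the int-then-narrow form is the more portable choice.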
BTW, for your test and the message
Run-Time Check Failure #2 - Stack around the variable 'target' was corrupted.
you should also print out the addresses of the variables and you'll probably see that the compiler added some padding/security check space between the variables pre/target/post.
Related
This is my first time using Ghidra and debugging. My project involves reverse engineering a DOS executable from 2007 to understand how it generates a code.
I looked for the strings I can read when launching the program through Wine (debugging under Linux) and found one place:
/* Reverses the string */
__strrev(local_8);
local_4 = 0;
DISPLAY_MESSAGE(s__Code_=_%s_0040704c);
with DISPLAY_MESSAGE being:
int __cdecl DISPLAY_MESSAGE(byte *param_1)
{
int iVar1;
int errorCode;
iVar1 = FUN_004019c0((undefined4 *)&DAT_004072e8);
errorCode = FUN_00401ac0((char **)&DAT_004072e8,param_1,(undefined4 *)&stack0x00000008);
FUN_00401a60(iVar1,(int *)&DAT_004072e8);
return errorCode;
}
I named the function "DISPLAY_MESSAGE" because I saw the string on the screen ;-). I would like to name it printf, but its signature does not match printf's: it takes byte * instead of char *, ... as input parameters, and it returns an int instead of void for the actual printf.
The string "Code = %s" (ignoring the CRs and newlines) is actually located at address "0040704c", and I am very surprised not to see the variable holding the generated code value instead (that could help me rename the variables).
If I change the signature to that of printf, it yields:
DISPLAY_MESSAGE(s__Code_=_%s_0040704c,local_8)
which looks better, because local_8 could be the code, but I don't know whether it is correct to change the signature like this (since the local variable I renamed errorCode is then never used, whereas it was returned before the signature change).
void __cdecl DISPLAY_MESSAGE(char *param_1,...)
{
int iVar1;
int errorCode;
iVar1 = FUN_004019c0((undefined4 *)&DAT_004072e8);
FUN_00401ac0((char **)&DAT_004072e8,(byte *)param_1,(undefined4 *)&stack0x00000008);
FUN_00401a60(iVar1,(int *)&DAT_004072e8);
return;
}
So my questions are:
Why is Ghidra appending _0040704c to the string's name (should it help me, and how should I make use of this piece of info)?
If my signature change is correct, what prevents Ghidra from finding the correct signature from its analysis ?
Should I assume there is a problem with the function signature whenever I see undefinedX, as in DISPLAY_MESSAGE?
Any help greatly appreciated!
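For comparison, here is how a printf-style function is commonly written in C. This is a hedged sketch, not code recovered from the binary, and display_message is a hypothetical name. On 32-bit x86 with __cdecl, the first argument sits at [esp+4] on entry, so the next stack slot (which Ghidra shows as &stack0x00000008) is where any variadic arguments begin, much like the va_list that va_start produces. Note that the standard printf returns int (the number of characters written), so a wrapper that returns its formatter's result is consistent with printf. Whether DAT_004072e8 is the output stream is only an assumption.

```c
#include <stdarg.h>
#include <stdio.h>
#include <assert.h>

/* Hypothetical printf-style wrapper. The "..." arguments are gathered
   into a va_list and passed to a formatter taking (stream, format, args),
   the same shape as the FUN_00401ac0(&DAT_004072e8, param_1,
   &stack0x00000008) call in the decompiled code. Like printf, it returns
   the number of characters written, or a negative value on error. */
static int display_message(const char *format, ...)
{
    va_list args;
    int written;

    va_start(args, format);
    written = vfprintf(stdout, format, args);
    va_end(args);
    return written;
}
```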
I'm trying to compile wingraphviz for x64 (it's an old, unmaintained project) and ran into a very strange problem:
There's a call to getDefaultFont() that looks like this:
const char* def = getDefaultFont();
Deffontname = late_nnstring(g->proto->n,N_fontname,def);
(The original code made this call inside another function call, but I extracted it to make it easier to follow.)
The getDefaultFont function is very simple and returns a string literal based on the current charset:
const char * getDefaultFont() {
switch(DOT_CODEPAGE) {
case CP_KOREAN:
return CP_949_DEFAULTFONT;
break;
[...]
default:
return DEFAULT_FONTNAME;
break;
}
}
with DEFAULT_FONTNAME and the others defined in a header file:
#define DEFAULT_FONTNAME "Times New Roman"
I changed the return to { const char* r = DEFAULT_FONTNAME; return r; } to see the value while debugging: r is correct at the return statement.
But when the debugger returns to the caller, def points to invalid memory.
I ran the debugger in assembly mode and saw this:
const char* def = getDefaultFont();
000007FEDA1244FE call getDefaultFont (07FEDA1291A0h)
000007FEDA124503 cdqe
000007FEDA124505 mov qword ptr [def],rax
After the call instruction, RAX contains the correct value, a pointer into .data: RAX = 000007FEDA0C9A20.
But the next instruction, cdqe ("Convert dword (eax) to qword (rax)"), sign-extends EAX into RAX and destroys the upper 4 bytes, so now RAX = FFFFFFFFDA0C9A20. Then the third instruction stores this truncated value on the stack.
After that, late_nnstring() tries to de-reference the corrupted pointer and crashes...
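To see exactly what cdqe does to the pointer, here is a small C model of the instruction using plain bit operations (sign_extend_low32 is a hypothetical name). It reproduces the RAX values from the debugger, and it also shows why a pointer below 2 GB (bit 31 clear) survives the round trip unchanged.

```c
#include <stdint.h>
#include <assert.h>

/* Models cdqe: keep only the low 32 bits (EAX), then copy bit 31 into
   the upper 32 bits, i.e. sign-extend EAX into RAX. */
static uint64_t sign_extend_low32(uint64_t rax)
{
    uint64_t eax = rax & 0xFFFFFFFFULL;
    return (eax & 0x80000000ULL) ? (eax | 0xFFFFFFFF00000000ULL) : eax;
}
```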
Do you know why VS inserts this cdqe instruction?
All these functions are in .c files in the same project.
I've implemented a workaround, using strdup() to return low-memory addresses, but it's not safe (maybe the heap can use memory above 4 GB?), and there may be other cases I didn't find while testing that will crash when using the library.
I published the files here : https://gitlab.com/data-public/wingraphviz
In particular:
caller at https://gitlab.com/data-public/wingraphviz/blob/97085eeb6e9356c7784965c5a43757d8db3fec41/dependencies/graphviz-1.8.10/dotneato/common/emit.c#L842
getDefaultFont at https://gitlab.com/data-public/wingraphviz/blob/97085eeb6e9356c7784965c5a43757d8db3fec41/dependencies/graphviz-1.8.10/dotneato/common/utils.c#L111
constant defines at https://gitlab.com/data-public/wingraphviz/blob/97085eeb6e9356c7784965c5a43757d8db3fec41/dependencies/graphviz-1.8.10/dotneato/common/const.h#L49
Your links require some account I don’t have.
You likely failed to include the header declaring that function, or got the header order wrong. In C, when a function is called without a visible declaration, the compiler assumes it returns int (MSVC reports this as warning C4013), so only the 32-bit EAX is kept and cdqe sign-extends it back to 64 bits.
P.S. A great example of why you should read, and fix, compiler warnings.
Update: If you have a circular dependency problem and can’t just include utils.h, a quick workaround is to declare const char * getDefaultFont(); in emit.c before you call that function.
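A minimal sketch of the fix: the prototype is visible before the call, so the compiler knows the function returns a pointer and keeps all 64 bits of RAX. DEFAULT_FONTNAME matches the header quoted above; default_font_name_length is a hypothetical stand-in for the caller in emit.c, and the definition is simplified (no DOT_CODEPAGE switch).

```c
#include <string.h>
#include <assert.h>

#define DEFAULT_FONTNAME "Times New Roman"

/* Prototype visible before any call. In the real project this belongs
   in utils.h, or is declared in emit.c as a workaround. Without it, a
   C89 compiler assumes an int return and truncates the pointer. */
const char *getDefaultFont(void);

/* Hypothetical caller standing in for the code in emit.c. */
static size_t default_font_name_length(void)
{
    const char *def = getDefaultFont();  /* full 64-bit pointer kept */
    return strlen(def);
}

/* Simplified definition: the real one switches on DOT_CODEPAGE. */
const char *getDefaultFont(void)
{
    return DEFAULT_FONTNAME;
}
```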
The following code keeps crashing when it reaches the call to _itoa. I tried implementing that function myself, and then it got even weirder: it kept crashing when I ran it without the debugger but worked fine under the debugger.
# include "HNum.h"
# include <stdio.h>
# include <stdlib.h>
# include <string.h>
# include <assert.h>
# define START_value 30
typedef enum {
HNUM_OUT_OF_MEMORY = -1,
HNUM_SUCCESS = 0,
} HNumRetVal;
typedef struct _HNum{
size_t Size_Memory;
char* String;
}HNum;
HNum *HNum_alloc(){
HNum* first = (HNum*)malloc(sizeof(HNum));
if(first==NULL){
return NULL;
}
first->String =(char*)malloc(sizeof(START_value));
if(first->String==NULL){
return NULL;
}
first->Size_Memory = START_value; // slash zero && and starting from zero index;
return first;
}
HNumRetVal HNum_setFromInt(HNum *hnum, int nn){
itoa(nn,hnum->String,10);
}
void main(){
HNum * nadav ;
int h = 13428637;
nadav = HNum_alloc();
nadav->String="1237823423423434";
HNum_setFromInt(nadav,h);
printf("nadav string : %s \n ",nadav->String);
//printf("w string %s\n",w->String);
//printf("nadav string %s\n",nadav->String);
HNum_free(nadav);
}
I've been trying to figure this out for hours and couldn't come up with anything...
The IDE I'm using is Visual Studio 2012 Express; when it crashes, it shows the following:
"PROJECT C.exe has stopped working
windows can check online for a solution to the program."
first->String =(char*)malloc(sizeof(START_value));
should be
first->String = malloc(START_value);
The current version allocates sizeof(int) bytes, which is only enough for sizeof(int)-1 characters plus the nul terminator. This is too small to hold your target value (13428637 needs 9 bytes including the terminator), so _itoa writes beyond the memory allocated for first->String. This results in undefined behaviour; it is quite possible for different runs to fail in different places, or for debug and release builds to behave differently.
You also need to remove the line
nadav->String="1237823423423434";
which leaks the memory allocated for String in HNum_alloc, replacing it with a pointer to a string literal. A string literal should be treated as read-only; you cannot write to it inside _itoa.
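Putting both fixes together, a minimal sketch of the helpers. It swaps the non-standard itoa for standard sprintf (a substitution, not the original code), frees the struct if the string allocation fails, and defines HNum_free, which the question calls but never shows; that definition is an assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#define START_value 30

typedef enum {
    HNUM_OUT_OF_MEMORY = -1,
    HNUM_SUCCESS = 0
} HNumRetVal;

typedef struct _HNum {
    size_t Size_Memory;
    char *String;
} HNum;

HNum *HNum_alloc(void)
{
    HNum *first = malloc(sizeof(HNum));
    if (first == NULL)
        return NULL;
    first->String = malloc(START_value);  /* 30 bytes, not sizeof(int) */
    if (first->String == NULL) {
        free(first);                       /* don't leak the struct */
        return NULL;
    }
    first->String[0] = '\0';
    first->Size_Memory = START_value;
    return first;
}

HNumRetVal HNum_setFromInt(HNum *hnum, int nn)
{
    /* A decimal int needs at most 12 bytes for a 32-bit int (21 for a
       64-bit int), including sign and '\0', so 30 bytes always suffice. */
    sprintf(hnum->String, "%d", nn);
    return HNUM_SUCCESS;
}

/* Hypothetical definition: the question calls HNum_free but omits it. */
void HNum_free(HNum *hnum)
{
    if (hnum != NULL) {
        free(hnum->String);
        free(hnum);
    }
}
```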
Since I'm not allowed to comment:
simonc's answer is correct. If you find the following answer useful, you should mark his answer as the accepted one :P
I tried that code myself, and the only changes needed were:
strcpy(nadav->String, "1237823423423434"); INSTEAD OF nadav->String="1237823423423434";
and
first->String = malloc(START_value); INSTEAD OF first->String =(char*)malloc(sizeof(START_value));
Also, you may have to use _itoa instead of itoa; that's one of the things I had to change in my case, anyhow.
If that doesn't work, you should probably consider using a different version of VS.
Here is a C function that segfaults:
void compileShaders(OGL_STATE_T *state) {
// First testing to see if I can access object properly. Correctly outputs:
// nsHandle: 6
state->nsHandle = 6;
printf("nsHandle: %d\n", state->nsHandle);
// Next testing if glCreateProgram() returns proper value. Correctly outputs:
// glCreateProgram: 1
printf("glCreateProgram: %d\n", glCreateProgram());
// Then the program segfaults on the following line according to gdb
state->nsHandle = glCreateProgram();
}
For the record, state->nsHandle is of type GLuint and glCreateProgram() returns a GLuint, so that shouldn't be my problem.
gdb says that my program segfaults on line 303, which is actually the comment line before that statement. I don't know if that matters.
Is gdb lying to me? How do I debug this?
EDIT:
I turned off optimizations (-O3) and now it works. If somebody could explain why, that would be great.
EDIT 2:
For the purpose of the comments, here's a watered down version of the important components:
typedef struct {
GLuint nsHandle;
} OGL_STATE_T;
int main (int argc, char *argv[]) {
OGL_STATE_T _state, *state=&_state;
compileShaders(state);
}
EDIT 3:
Here's a test I did:
int main(int argc, char *argv[]) {
OGL_STATE_T _state, *state=&_state;
// Assign value and try to print it in other function
state->nsHandle = 5;
compileShaders(state);
}
void compileShaders(OGL_STATE_T *state) {
// Test to see if the first call to state is getting optimized out
// Correctly outputs:
// nsHandle (At entry): 5
printf("nsHandle (At entry): %d\n", state->nsHandle);
}
Not sure if that helps, or whether the compiler would actually optimize away the value set in main.
EDIT 4:
I printed out the pointer address in main and in compileShaders, and they match. So I'm gonna assume it's segfaulting somewhere else and gdb is lying to me about which line is actually causing it.
This is going to be guesswork based on what you've posted, but with optimization on, these lines:
state->nsHandle = 6;
printf("nsHandle: %d\n", state->nsHandle);
is probably optimized to just
printf("nsHandle: 6\n");
So the first access to state is where the segfault is. With optimization on, GDB can report odd line numbers for where the issue is, because the running code may no longer map cleanly to source lines, as you can see from the example above.
As mentioned in the comments, state is almost certainly not initialized. Some other difference in the optimized code causes it to point to an invalid memory area, whereas in the non-optimized code it happens to point somewhere valid.
This might happen if you're doing something with pointers directly that prevents the optimizer from 'seeing' that a given variable is used.
A sanity check that state != NULL would be useful, but it won't help if the pointer is non-null but invalid.
You'd need to post the calling code for anyone to tell you more. However, you asked how to debug it: I would print (or use GDB to view) the value of state when that function is entered. I imagine it will be vastly different in the optimized and non-optimized versions. Then track back to the function call to work out why that's the case.
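A sketch of that debugging approach: log the pointer on entry and refuse to dereference NULL. GLuint and OGL_STATE_T are redeclared so the sketch compiles without GL headers (assuming GLuint is an unsigned int, as in the usual GL headers), and fake_create_program is a hypothetical stub replacing glCreateProgram(). A non-null but dangling pointer is not detected; comparing the printed address between -O0 and -O3 builds is what helps there.

```c
#include <stdio.h>
#include <assert.h>

/* Stand-ins so the sketch compiles without OpenGL headers (assumption). */
typedef unsigned int GLuint;
typedef struct { GLuint nsHandle; } OGL_STATE_T;

/* Hypothetical stub replacing glCreateProgram() for this sketch. */
static GLuint fake_create_program(void) { return 1; }

/* Logs the incoming pointer, then bails out instead of dereferencing NULL.
   Returns 0 on success, -1 if state is NULL. */
static int compileShaders_checked(OGL_STATE_T *state)
{
    fprintf(stderr, "compileShaders: state=%p\n", (void *)state);
    if (state == NULL)
        return -1;
    state->nsHandle = fake_create_program();
    return 0;
}
```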
EDIT
You posted the calling code, and that should be fine. Are you getting any warnings when compiling? (Turn all the warnings on with -Wall.) In any case, my advice about printing the value of state in different scenarios still stands.
(removed comment about adding & since you edited the question again)
When you optimize your program, there is no longer a 1:1 mapping between source lines and emitted code.
Typically, the compiler will reorder the code to be more efficient for your CPU, inline function calls, etc.
This code is wrong:
*state=_state
It should be:
*state=&_state
Well, you edited your post, so ignore the above fix.
Check for NULL before dereferencing or reading through the pointer. If the pointer you pass is NULL, or a pointer value stored in the struct is NULL, you will hit a segfault if you don't perform any checks.
FYI: GDB can't lie!
I ended up starting a new thread with more relevant information and somebody found the answer. New thread is here:
GCC: Segmentation fault and debugging program that only crashes when optimized
First off, this snippet is not meant for production code. So please, no lecturing about it "being unsafe." Thanks!
So, the following code is part of a parser that takes in a CSV file and uses it to populate an sqlite3 db. When compiled and run on Snow Leopard, it worked just fine. Now that I've switched to Lion, the fscanf statement throws Bus Error: 10. Specifically, it seems to have something to do with how I am consuming and discarding the '\n' at the end of each line:
int main()
{
sqlite3* db;
sqlite3_open("someExistingDB.sqlite3", &db);
FILE *pFile;
pFile = fopen("excelData.csv","r");
char name[256],country[256], last[256], first[256], photoURI[256];
char sqlStatement[16384];
while(fscanf(pFile, "%[^,],%[^,],%[^,],%[^,],%[^\n]%*c", name, country, last,first, photoURI) != EOF)
{
blah...
...
If I remove the trailing %*c, which is meant to consume and ignore the '\n' so as to advance to the next line, the program does not crash, but of course the parsing is then incorrect.
Also, mind you, the EOF doesn't seem to be the problem; I've also tried a single fscanf statement instead of the while loop shown above.
Any thoughts?
EDIT: Let me add that the code was originally compiled and run on an Intel Core Duo (32-bit) MacBook with Snow Leopard, and now I am compiling and running it on a Mac Pro (64-bit) with Lion. So I wonder if it might have something to do with alignment?
Interesting. Bus errors are usually due to alignment issues but that may not be the case here since all you're scanning in is chars.
One thing you may want to consider is to fgets the entire line into a buffer and then sscanf it. This will allow you to do two things:
print out the line in a debug statement before sscanfing it (or after scanning, if the expected conversion count is wrong), so you can see if there are any problems; and
not worry about trying to align the line-ending with fscanf, since fgets does a good job of this already.
So it would be something like (untested):
char bigHonkinBuffer[16384];
while (fgets (bigHonkinBuffer, sizeof(bigHonkinBuffer), pFile) != NULL) {
if (sscanf(bigHonkinBuffer, "%[^,],%[^,],%[^,],%[^,],%[^\n]", name, country, last,first, photoURI) != 5) {
// printf ("Not scanned properly: [%s]\n", bigHonkinBuffer);
exit (1);
}
}
You should also check the return values from the sqlite3_open and fopen calls, if this is anything more than "play" code (i.e., if there's the slightest possibility that those files may not exist).
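A sketch combining both suggestions: the fopen result is checked, and each %[ conversion gets a width of 255 so it can never overflow a 256-byte buffer. Record, parse_line, and count_records are hypothetical names, and the sqlite3 insert step is left out.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

#define FIELD_LEN 256

/* Hypothetical struct grouping the five 256-byte buffers. */
typedef struct {
    char name[FIELD_LEN], country[FIELD_LEN], last[FIELD_LEN],
         first[FIELD_LEN], photoURI[FIELD_LEN];
} Record;

/* Returns 1 if all five fields were converted, 0 otherwise.
   Width 255 leaves room for the '\0' in each buffer. Note that %[...]
   fails on an empty field, so lines containing ",," are rejected. */
static int parse_line(const char *line, Record *r)
{
    return sscanf(line, "%255[^,],%255[^,],%255[^,],%255[^,],%255[^\n]",
                  r->name, r->country, r->last, r->first, r->photoURI) == 5;
}

/* Counts parsable lines in path; returns -1 if the file can't be opened. */
static long count_records(const char *path)
{
    FILE *fp = fopen(path, "r");
    char line[16384];
    Record r;
    long n = 0;

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL)
        if (parse_line(line, &r))
            n++;
    fclose(fp);
    return n;
}
```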
I tried the following adaptation of your code on a Mac Mini running Lion (10.7.1) with Xcode 4.
#include <stdio.h>
static void print(const char *tag, const char *str)
{
printf("%8s: <<%s>>\n", tag, str);
}
int main(void)
{
FILE *pFile = fopen("excelData.csv","r");
char name[256], country[256], last[256], first[256], photoURI[256];
while (fscanf(pFile, "%[^,],%[^,],%[^,],%[^,],%[^\n]%*c",
name, country, last, first, photoURI) == 5)
{
print("name", name);
print("country", country);
print("last", last);
print("first", first);
print("photoURI", photoURI);
}
return 0;
}
I produced a 64-bit binary using:
gcc -O -std=c99 -Wall -Wextra xxx.c -o xxx
There were no warnings of any sort. Given the input data:
Monster,United States,Smith,John,http://www.example.com/photo1
Emancipated Majority,Canada,Jones,Alan,http://www.example.com/photo2
A Much Longer Name Than Any Before,A Land from Far Away and In the Imagination Most Beautiful,OneOfTheLongerFamilyNamesYou'llEverSee,ALongishGivenName,http://www.example.com/photo3/elephant/pygmalion/photo3,x31
It produces the output:
name: <<Monster>>
country: <<United States>>
last: <<Smith>>
first: <<John>>
photoURI: <<http://www.example.com/photo1>>
name: <<Emancipated Majority>>
country: <<Canada>>
last: <<Jones>>
first: <<Alan>>
photoURI: <<http://www.example.com/photo2>>
name: <<A Much Longer Name Than Any Before>>
country: <<A Land from Far Away and In the Imagination Most Beautiful>>
last: <<OneOfTheLongerFamilyNamesYou'llEverSee>>
first: <<ALongishGivenName>>
photoURI: <<http://www.example.com/photo3/elephant/pygmalion/photo3,x31>>
The != EOF vs == 5 change does not matter with the sample data, but is arguably more robust in general. The last line of data exploits your change in pattern and contains a comma in the last field.
Since your code does not check that the file was opened correctly, I have to wonder whether that is your problem, though that's more likely to produce a segmentation violation than a bus error.
So, no answer to your problem - but some code for you to try.