sprintf corrupting arrays in IAR microcontroller - c

I am currently learning embedded programming, and thus working on an IAR-platform using a TI microcontroller with ARM architecture. Since I am not at all familiar with the technicalities related to this kind of programming, or C programming in general, I would like to ask a basic question:
I have the following simple code snippet:
int i;
for(i = 0; i < NUM_SAMPLES; i++)
{
    sinTable[i] = sinf(2*i*dT*PI);
}
for(i = 0; i < NUM_SAMPLES; i++)
{
    char out[32];
    sprintf(out,"sin: %.7f, %.7f;", i*dT, sinTable[i]);
    putString(out);
    delay(DELAY_100US);
}
Where sinTable[] is a global array of size NUM_SAMPLES, putString(char *) is a function which writes to an RS232 port, and delay(float) is a simple delay function.
My problem is that once the sprintf(...) is called, it corrupts sinTable, giving some very peculiar results when plotting the table on the receiver end of the COM-signal.
I don't expect that I run out of memory, as the MC has 64KB SRAM.
Does anyone have any thoughts?

Ensure that your stack pointer is on a 64-bit boundary when main is reached.
The symptom you are seeing is typical of a stack aligned on an odd 32-bit boundary. Everything seems to work properly until a double is used as a variadic argument. This breaks when the code expects such arguments to be on 8-byte boundaries.
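As a minimal sketch of the check described above: the helper below tests whether an address is a multiple of 8. The name is_aligned8 is hypothetical, and how you obtain the stack pointer value (from the debugger's register view or from the startup code) depends on the toolchain and is not shown here.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if addr is a multiple of 8, else 0. */
static int is_aligned8(uintptr_t addr)
{
    return (addr & (uintptr_t)7) == 0;
}
```

For example, a stack pointer of 0x20000FF8 passes the check, while 0x20000FFC (a multiple of 4 but not of 8) fails it.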

Upon further review:
I suspect that Michael Burr's response regarding stack utilization was on the right track. Selecting a smaller printf library might be sufficient, but if you can increase your stack size, that seems safer. Note that the IAR C/C++ Development Guide includes info on linker stack usage analysis.
Original:
When I upgraded from IAR 6.1 (licensed) to 6.4 (kickstart), I ran into a similar problem - vsnprintf was writing "all over RAM", even though the return value indicated the number of characters written was well within the target bounds. The "solution" was to avoid the printf library that has multibyte support.
Project Options > General > Library Options > printf small w/o multi-byte
You might also want to uncheck
Project Options > C/C++ Compiler / Language 2 / enable multibyte
I tried to report this to IAR, but since my support contract is expired ...
Unfortunately, a similar problem is back with IAR 7.3.4, and the multibyte "fix" does NOT seem to be sufficient. It happens with both sprintf() and snprintf(), although the out-of-bounds corruption is not identical between the two.

You seem pretty confident that your result string is only 31 characters long. The only way to corrupt another variable with your sprintf statement is that your result string is longer than 32 bytes (31 chars and a nul-byte), therefore overwriting other parts of memory. Make your numbers smaller or make your temporary buffer larger.
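To rule out the buffer-length explanation, one option is to format with snprintf() and check its return value. This is only a sketch: format_sample is a hypothetical helper, and it addresses the overflow theory only, not the stack or library issues raised in the other answers.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: formats one sample into out (capacity n bytes).
 * Returns 1 if the whole string fit, 0 if it was truncated or an
 * encoding error occurred. */
static int format_sample(char *out, size_t n, double t, double s)
{
    int needed = snprintf(out, n, "sin: %.7f, %.7f;", t, s);
    return needed >= 0 && (size_t)needed < n;
}
```

With a 32-byte buffer, t = 0.5 and s = 1.0 need 26 characters and fit. A time value of 123456.0 with s = -1.0 needs 32 characters plus the terminating NUL, so it would overflow with sprintf() and is reported as truncated here.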

Thank you to everyone who has suggested a solution to this problem. I eventually ended up writing a conversion method that gives a hex representation of the string and transmitted that instead, omitting the sprintf(...) completely.
It is very crude, but suits my needs.
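The poster does not show that conversion code, so the following is only a guess at what such a routine might look like. It assumes float is a 32-bit IEEE-754 value, and float_to_hex is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes the 8 hex digits of a float's bit pattern
 * (most significant nibble first) plus a terminating NUL into out[9].
 * Assumes float is a 32-bit IEEE-754 value. */
static void float_to_hex(float value, char out[9])
{
    static const char digits[] = "0123456789ABCDEF";
    uint32_t bits;
    int i;

    memcpy(&bits, &value, sizeof bits);   /* copy the raw bit pattern */
    for (i = 0; i < 8; i++) {
        out[i] = digits[(bits >> (28 - 4 * i)) & 0xFu];
    }
    out[8] = '\0';
}
```

The receiving side then has to turn the hex digits back into a float, which moves the formatting cost off the microcontroller.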

Related

Why is it better to use `%s` to print a string using `printf` rather than printing it directly? [duplicate]

I was reading about vulnerabilities in code and came across this Format-String Vulnerability.
Wikipedia says:
Format string bugs most commonly appear when a programmer wishes to
print a string containing user supplied data. The programmer may
mistakenly write printf(buffer) instead of printf("%s", buffer). The
first version interprets buffer as a format string, and parses any
formatting instructions it may contain. The second version simply
prints a string to the screen, as the programmer intended.
I got the problem with printf(buffer) version, but I still didn't get how this vulnerability can be used by attacker to execute harmful code. Can someone please tell me how this vulnerability can be exploited by an example?
You may be able to exploit a format string vulnerability in many ways, directly or indirectly. Let's use the following as an example (assuming no relevant OS protections, which is very rare anyways):
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char text[1024];
    static int some_value = -72;
    strcpy(text, argv[1]); /* ignore the buffer overflow here */
    printf("This is how you print correctly:\n");
    printf("%s", text);
    printf("This is how not to print:\n");
    printf(text); /* deliberately vulnerable: text is used as the format string */
    printf("some_value # 0x%08x = %d [0x%08x]", &some_value, some_value, some_value);
    return(0);
}
The basis of this vulnerability is the behaviour of functions with variable arguments. A function which implements handling of a variable number of parameters has to read them from the stack, essentially. If we specify a format string that will make printf() expect two integers on the stack, and we provide only one parameter, the second one will have to be something else on the stack. By extension, and if we have control over the format string, we can have the two most fundamental primitives:
Reading from arbitrary memory addresses
[EDIT] IMPORTANT: I'm making some assumptions about the stack frame layout here. You can ignore them if you understand the basic premise behind the vulnerability, and they vary across OS, platform, program and configuration anyways.
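It's possible to use format specifiers such as %x (or %s, which treats a stack word as a pointer and dereferences it) to read data. Because the format string passed to printf(text) itself lives on the stack, enough specifiers will eventually walk up to it, so you can read anything off the stack along the way: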
./vulnerable AAAA%08x.%08x.%08x.%08x
This is how you print correctly:
AAAA%08x.%08x.%08x.%08x
This is how not to print:
AAAAXXXXXXXX.XXXXXXXX.XXXXXXXX.41414141
some_value # 0x08049794 = -72 [0xffffffb8]
Writing to arbitrary memory addresses
You can use the %n format specifier to write to an arbitrary address (almost). Again, let's assume our vulnerable program above, and let's try changing the value of some_value, which is located at 0x08049794, as seen above:
./vulnerable $(printf "\x94\x97\x04\x08")%08x.%08x.%08x.%n
This is how you print correctly:
??%08x.%08x.%08x.%n
This is how not to print:
??XXXXXXXX.XXXXXXXX.XXXXXXXX.
some_value # 0x08049794 = 31 [0x0000001f]
We've overwritten some_value with the number of bytes written before the %n specifier was encountered (man printf). We can use the format string itself, or field width to control this value:
./vulnerable $(printf "\x94\x97\x04\x08")%x%x%x%n
This is how you print correctly:
??%x%x%x%n
This is how not to print:
??XXXXXXXXXXXXXXXXXXXXXXXX
some_value # 0x08049794 = 21 [0x00000015]
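The behaviour of %n can be seen in isolation with a harmless call where the pointer argument is supplied properly. This is a sketch only: count_before_n is a hypothetical name, and some C libraries disable %n by default, in which case the call behaves differently.

```c
#include <stdio.h>

/* Returns the value that %n stores: the number of characters
 * produced before %n was reached. */
static int count_before_n(void)
{
    char buf[64];
    int count = -1;

    /* "AAAA" (4) + "%08x" (8) + "." (1) = 13 characters before %n */
    snprintf(buf, sizeof buf, "AAAA%08x.%n tail", 0x41u, &count);
    return count;
}
```

In the exploit, the attacker does not pass &count; printf() instead picks up whatever word happens to sit at that argument position, which is why placing an address at the start of the format string matters.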
There are many possibilities and tricks to try (direct parameter access, large field width making wrap-around possible, building your own primitives), and this just touches the tip of the iceberg. I would suggest reading more articles on fmt string vulnerabilities (Phrack has some mostly excellent ones, although they may be a little advanced) or a book which touches on the subject.
Disclaimer: the examples are taken [although not verbatim] from the book Hacking: The art of exploitation (2nd ed) by Jon Erickson.
It is interesting that no-one has mentioned the n$ notation supported by POSIX. If you can control the format string as the attacker, you can use notations such as:
"%200$p"
to read the 200th item on the stack (if there is one). The intention is that you should list all the n$ numbers from 1 to the maximum, and it provides a way of resequencing how the parameters appear in a format string, which is handy when dealing with I18N (L10N, G11N, M18N*).
However, some (probably most) systems are somewhat lackadaisical about how they validate the n$ values and this can lead to abuse by attackers who can control the format string. Combined with the %n format specifier, this can lead to writing at pointer locations.
* The acronyms I18N, L10N, G11N and M18N are for internationalization, localization, globalization, and multinationalization respectively. The number represents the number of omitted letters.
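For reference, this is what legitimate use of the n$ notation looks like. It is a minimal sketch assuming a POSIX-conforming printf family (the notation is not part of ISO C), and swap_order is a hypothetical name.

```c
#include <stdio.h>

/* Formats "<second> <first>" by reordering the arguments with the
 * POSIX n$ syntax. Returns the snprintf() result. */
static int swap_order(char *out, size_t n, const char *first, const char *second)
{
    return snprintf(out, n, "%2$s %1$s", first, second);
}
```

Calling swap_order(buf, sizeof buf, "world", "hello") produces "hello world". An attacker who controls the format string can use the same syntax with a large index to reach far up the stack in a single specifier.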
Ah, the answer is in the article!
Uncontrolled format string is a type of software vulnerability, discovered around 1999, that can be used in security exploits. Previously thought harmless, format string exploits can be used to crash a program or to execute harmful code.
A typical exploit uses a combination of these techniques to force a program to overwrite the address of a library function or the return address on the stack with a pointer to some malicious shellcode. The padding parameters to format specifiers are used to control the number of bytes output and the %x token is used to pop bytes from the stack until the beginning of the format string itself is reached. The start of the format string is crafted to contain the address that the %n format token can then overwrite with the address of the malicious code to execute.
This is because %n causes printf to write data to a variable, which is on the stack. But that means it could write to something arbitrarily. All you need is for someone to use that variable (it's relatively easy if it happens to be a function pointer, whose value you just figured out how to control) and they can make you execute anything arbitrarily.
Take a look at the links in the article; they look interesting.
I would recommend reading this lecture note about format string vulnerability.
It describes in details what happens and how, and has some images that might help you to understand the topic.
AFAIK it's mainly because it can crash your program, which is considered to be a denial-of-service attack. All you need is to give an invalid address (practically anything with a few %s's is guaranteed to work), and it becomes a simple denial-of-service (DoS) attack.
Now, it's theoretically possible for that to trigger anything in the case of an exception/signal/interrupt handler, but figuring out how to do that is beyond me -- you need to figure out how to write arbitrary data to memory as well.
But why does anyone care if the program crashes, you might ask? Doesn't that just inconvenience the user (who deserves it anyway)?
The problem is that some programs are accessed by multiple users, so crashing them has a non-negligible cost. Or sometimes they're critical to the running of the system (or maybe they're in the middle of doing something very critical), in which case this can be damaging to your data. Of course, if you crash Notepad then no one might care, but if you crash CSRSS (which I believe actually had a similar kind of bug -- a double-free bug, specifically) then yeah, the entire system is going down with you.
Update:
See this link for the CSRSS bug I was referring to.
Edit:
Take note that reading arbitrary data can be just as dangerous as executing arbitrary code! If you read a password, a cookie, etc. then it's just as serious as an arbitrary code execution -- and this is trivial if you just have enough time to try enough format strings.

Buffer overflow that overwrites local variables

I'm doing a buffer overflow exercise where the source code is given. The exercise allows you to change the number of argument vectors you feed into the program so you can get around the null problem making it easy.
However the exercise also mentions that it is possible to use just 1 argument vector to compromise this code. I'm curious to see how this can be done. Any ideas on how to approach this would be greatly appreciated.
The problem here is that length needs to be overwritten in order for the overflow to take place and the return address to be compromised. To my knowledge, you can't really use NUL bytes in the string, since it is passed in via execve arguments. So the length ends up being a very large number, because every byte written must be nonzero, and the entire stack goes boom; the same applies to the return address. Am I missing something obvious? Does strlen need to be exploited? I saw some references to arithmetic overflow of signed numbers, but I'm not sure whether manipulating the local variables that way does anything.
The code is posted below and returns to a main function which then ends the program and runs on a little endian system with all stack protection turned off as this is an introductory exercise for infosec:
int TrickyOverflowSeq ( char *in )
{
    char to_be_exploited[128];
    int c;
    int limit;
    limit = strlen(in);
    if (limit > 144)
        limit = 144;
    for (c = 0; c <= limit; c++)
        to_be_exploited[c] = in[c];
    return(0);
}
I don't know where arg comes from, but since your buffer is only 128 bytes, and you cap the max length to 144, you need only pass in a string longer than 128 bytes to cause a buffer overrun when copying in to to_be_exploited. Any malicious code would be in the input buffer from positions 129 to 144.
Whether or not that will properly set up a return to a different location depends on many factors.
However the exercise also mentions that it is possible to use just 1 argument vector to compromise this code. I'm curious to see how this can be done.
...
The problem here is that length needs to be overwritten in order for the overflow to take place and the return address to be compromised.
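It seems pretty straightforward to me. That magic number 144 makes sense if sizeof(int) == 8 (128 bytes of buffer plus two 8-byte ints). Note that this holds only on an ILP64 data model; the common 64-bit ABIs (LP64 and LLP64) keep int at 4 bytes, so the offsets below depend on the actual stack layout.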
So assuming a stack layout where to_be_exploited comes before c and limit, you can simply pass in a very long string with junk in the bytes starting at offset 136 (i.e., 128 + sizeof(int)), and then carefully crafted junk in the bytes starting with offset 144. This will overwrite limit starting with that byte, thus disabling the length check. Then the carefully crafted junk overwrites the return address.
You could put almost anything into the 8 bytes starting at offset 136 and have them make a number that is large enough to disable the security check. Just make sure you don't end up with a negative number. For example, the string "HAHAHAHA" would evaluate, as an integer, to 5206522089439316033. This number is larger than 144... actually, it's too large as you want this function to stop copying once your string is copied. So you just need to figure out how long your attack string actually is and put the correct bytes for that length into that position, and the attack will be copied in.
Note that normal string-handling functions in C use a NUL byte as a terminator, and stop copying. This function doesn't do that; it just trusts limit. So you could put any junk you want in the input string to exploit this function. However, if normal C library functions need to copy the input data, you might end up needing to avoid NUL bytes.
Of course nobody should put code this silly into production.
EDIT: I wrote the above in a hurry. Now that I have more time, I re-read your question and I think I better understand what you wanted to have explained.
You are wondering how a string can correctly clobber limit with a correct length without having strlen() chop it off short. This is impossible on a big-endian computer, but perfectly possible on a little-endian computer.
On a little-endian computer, the first byte is the least significant byte. See the Wikipedia entry:
http://en.wikipedia.org/wiki/Endianness
Any number that is not ridiculously large must have zero in its most significant bytes. On a big-endian computer that means the first several bytes will all be zero, will act like a NUL, and will cause strlen() to chop the string before the function can clobber limit. However, on a little-endian computer, the important bytes you want copied will all come before the NUL bytes.
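The byte-order argument can be checked with a small sketch. The helper below (visible_bytes is a hypothetical name) encodes a 32-bit value in the requested byte order and reports how many of its bytes strlen() would see before stopping at a zero byte.

```c
#include <stdint.h>
#include <string.h>

/* Encodes value as 4 bytes in the requested byte order, then returns how
 * many of those bytes strlen() sees before hitting a zero byte. */
static size_t visible_bytes(uint32_t value, int big_endian)
{
    unsigned char bytes[5];
    int i;

    for (i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        bytes[i] = (unsigned char)((value >> shift) & 0xFFu);
    }
    bytes[4] = '\0';                 /* sentinel so strlen always stops */
    return strlen((const char *)bytes);
}
```

For the value 200, the little-endian encoding is C8 00 00 00, so one meaningful byte is visible before the NUL. The big-endian encoding is 00 00 00 C8, so the string ends immediately and the value cannot be delivered.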
In the early days of the Internet, it was common for big-endian computers (often bought from Sun Microsystems) to run Internet server apps. These days, commodity x86 server hardware is most common, and x86 is little-endian. In practice, anyone deploying such exploitable code as the TrickyOverflowSeq() function will get 0wned.
If you don't think this answer is thorough enough, please post a comment explaining what part you think I need to cover better and I'll update the answer.
I am aware that this is quite an old post, however I stumbled on your question because I found myself in the same situation with exactly the same questions as the ones you ask in your post and in the comments.
A few minutes later, I solved the problem. I don't know how much of it I should "spoil" here, since AFAIK this is a typical problem in many Computer Security courses. I can say however that the solution can indeed be achieved with exactly one argument... and with a couple of environment variables. Additional hint: environment variables are stored after function arguments on the stack (that is, at higher addresses than the function arguments).

Any function instead of sprintf() in C? code size is too big after compile

I am working on developing an embedded system (Cortex M3). For sending some data from the device to the serial port (to show on a PC screen), I use my own functions built on putchar().
When I want to send an integer or a float, I use sprintf() to convert it to a string of characters and send that to the serial port.
Now, the problem is that I am using the Keil uVision IDE, and it is a limited version with a maximum code size of 32 KB.
Whenever I call sprintf() in different functions, the size of the code after compiling increases a lot, and I don't know why.
I have now surpassed 32 KB, and I wonder whether I have to change some of my functions and use something else instead of sprintf()!
Any clue?
Two potential offerings (neither of which I have used myself - my compiler vendors usually supply a stripped down printf for embedded use):
http://eprintf.sourceforge.net/ - [Sep 2017: unfortunately, seems to have gone away, but source code still here: https://sourceforge.net/projects/eprintf/files/ ]
http://www.sparetimelabs.com/tinyprintf/index.html - 2 files, about 1.4KB code. Option to enable 'longs' (means more code size). Supports leading zeros and field widths. No floating point support.
If you want it to be efficient, the best way is probably to code it yourself, or find some already written code for it on the net. Int to string conversion is however very simple, every programmer can write that in less than 30 minutes. Float to string conversion is a bit more intricate and depends on the floating point format used.
For convenience, here is a simple int-to-string algorithm for use in microcontroller applications:
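#include <stddef.h>
#include <stdint.h>

/* Writes val as exactly len decimal digits, zero-padded on the left,
   followed by a NUL. str must hold len + 1 bytes; digits that do not
   fit in len are silently dropped. */
void get_dec_str (uint8_t* str, size_t len, uint32_t val)
{
    size_t i; /* size_t rather than uint8_t, so a large len cannot wrap the counter */
    for(i=1; i<=len; i++)
    {
        str[len-i] = (uint8_t) ((val % 10UL) + '0');
        val/=10;
    }
    str[i-1] = '\0';
}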
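The routine above always emits a fixed number of zero-padded digits. If leading zeros are unwanted, a variable-width variant might look like the following sketch; u32_to_dec is a hypothetical name and is not part of the answer's code.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: writes val in decimal, without leading zeros, into
 * str. str must hold at least 11 bytes (10 digits + NUL).
 * Returns the number of digits written. */
static size_t u32_to_dec(char *str, uint32_t val)
{
    char tmp[10];
    size_t n = 0;
    size_t i;

    do {                              /* digits come out least significant first */
        tmp[n++] = (char)('0' + (val % 10u));
        val /= 10u;
    } while (val != 0u);

    for (i = 0; i < n; i++) {         /* reverse into the caller's buffer */
        str[i] = tmp[n - 1 - i];
    }
    str[n] = '\0';
    return n;
}
```

The do/while loop guarantees that 0 is printed as "0" rather than as an empty string.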
You can try itoa() or ftoa() and implement these to suit your requirements. I mean that, since they convert the values to characters anyway, you can call putchar() inside their definitions to print each character directly.
This should work, I think.
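Since float-to-string conversion is the expensive part, one way to avoid it is to keep the value as a scaled integer and format that instead. The sketch below assumes the value is held in thousandths; this is a design choice not mentioned in the answers, and format_milli is a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: formats a value held in thousandths
 * (e.g. 1250 -> "1.250"). out must hold at least 16 bytes.
 * Returns the string length. */
static size_t format_milli(char *out, int32_t milli)
{
    char tmp[10];
    size_t pos = 0, n = 0;
    /* work in unsigned so that INT32_MIN is handled too */
    uint32_t mag = (milli < 0) ? (0u - (uint32_t)milli) : (uint32_t)milli;
    uint32_t whole = mag / 1000u;
    uint32_t frac = mag % 1000u;

    if (milli < 0) out[pos++] = '-';
    do { tmp[n++] = (char)('0' + whole % 10u); whole /= 10u; } while (whole != 0u);
    while (n > 0) out[pos++] = tmp[--n];      /* integer part, in order */
    out[pos++] = '.';
    out[pos++] = (char)('0' + frac / 100u);   /* three fractional digits */
    out[pos++] = (char)('0' + (frac / 10u) % 10u);
    out[pos++] = (char)('0' + frac % 10u);
    out[pos] = '\0';
    return pos;
}
```

Each character could equally be passed straight to putchar() instead of being stored, which removes the need for a buffer.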

Convert binary data file generated in windows to linux

I apologize ahead of time for my lack of c knowledge, as I am a native FORTRAN programmer. I was given some c code to debug which ingests a binary file and parses it into an input file containing several hundred records (871, to be exact) for a Fortran program that I'm working with. The problem is that these input binaries, and the associated c code, were created in a Windows environment. The parser reads through the binary until it reaches the end of the file:
SAGE_Lvl0_Packet GetNextPacket()
{
    int i;
    SAGE_Lvl0_Packet inpkt;
    WORD rdbuf[128];
    memset(rdbuf,0,sizeof(rdbuf));
    fprintf(stdout,"Nbytes: %u\n",Nbytes);//returns 224
    if((i = fread(rdbuf,Nbytes,1,Fp)) != 1)
        FileEnd = 1;
    else
    {
        if(FileType == 0)
            memcpy(&(inpkt.CCSDS),rdbuf,Nbytes);
        else
            memcpy(&inpkt,rdbuf,Nbytes);
        memcpy(&CurrentPacket,&inpkt,sizeof(inpkt));
    }
    return inpkt;
}
So when the code gets to packet 872, this snippet should return FileEnd = 1. Instead, the parser attempts to read a large amount of data from (near) the end of the file. This, I would think, would cause the program to crash (at least it would in Fortran. Would c just start reading the next portion of memory?) Fortunately, there is a CRC later on in the code that catches that the parser isn't reading correct data and exits gracefully.
I assume the problem originates with the binary buffer size and value in a Windows binary being larger/different than that in Linux. If that is the case, is there an easy way to convert Windows' binaries to Linux either in c or Linux? If I'm wrong in my assumption, then perhaps I need to look over the code some more. BTW, a WORD is an unsigned short int, and a SAGE_Lvl0_Packet is a 3-tiered structure with a total of 106 WORDs.
I think the biggest problem here is that, when fread() indicates end of file, the FileEnd flag gets set, but the function still ends up returning an (invalid) zeroed-out packet. Not a particularly robust design. I assume that the caller should be checking FileEnd before it attempts to use the packet just returned, but since that's not shown, it's quite possible that's a false assumption.
Also, not knowing what the packet looks like, it's impossible to tell whether the various memcpy() calls are correct. The fact that memcpy() is asked to copy 224 bytes into a structure that is supposedly only 212 bytes long is highly problematic.
There are likely other issues, but those are the big ones I see at the moment.
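Both concerns (the ignored end-of-file condition and the copy that may exceed the destination) can be handled in one place. The following is a sketch under assumptions: read_packet and its parameters are hypothetical, and the real packet types are not shown in the question.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads exactly nbytes from fp into dst, whose
 * capacity is cap bytes. Returns 1 on success. Returns 0 without touching
 * dst if nbytes would not fit or if the read came up short (end of file
 * or error). */
static int read_packet(FILE *fp, void *dst, size_t cap, size_t nbytes)
{
    unsigned char buf[256];

    if (nbytes > cap || nbytes > sizeof buf) {
        return 0;                       /* would overrun the destination */
    }
    if (fread(buf, 1, nbytes, fp) != nbytes) {
        return 0;                       /* short read: no valid packet */
    }
    memcpy(dst, buf, nbytes);
    return 1;
}
```

The caller then loops with while (read_packet(fp, &pkt, sizeof pkt, Nbytes)) and never sees a half-filled packet. With the numbers from the question, a 224-byte read into a 212-byte structure would be rejected by the first check.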

Stack overflow error in C, before any step

When I try to debug my C program, and even before the compiler starts executing any line I get:
"Unhandled exception at 0x00468867 in HistsToFields.exe: 0xC00000FD: Stack overflow."
I have no clue on how to spot the problem since the program hasn't even started executing any line (or at least this is what I can see from the compiler debugging window). How can I tell what is causing the overflow if there isn't yet any line of my program executed?
When the debugger breaks, it points to a line in chkstk.asm.
I'm using Microsoft Visual Studio 2008 on a win7.
I set the Stack Reserve Size to 300000000
PS: the program used to execute fine before but on another machine.
I have a database (120000 x 60) in CSV format that I need to change to space-delimited. The program (which I didn't write myself) defines a structure for the output file:
struct OutputFileContents {
    char Filename[LINE_LEN];
    char Title[LINE_LEN];
    int NVar;
    char VarName[MAX_NVAR][LINE_LEN];
    char ZoneTitle[LINE_LEN];
    int NI;
    int NJ;
    int NK;
    double Datums[MAX_NVAR];
    double Data[MAX_NVAR][MAX_NPOINT];
};
This last array, "Data[][]", is what contains all the output, hence the huge size.
This array size "MAX_NPOINT" is set in a header source file in the project, and this header is used by several programs in the projects.
Thank you very much in advance.
Ahmad.
First, IDE != compiler != debugger.
Second, and no matter why the debugger fails debugging the application - a dataset that huge, on the stack, is a serious design error. Fix that design error, and your debugger problem will go away.
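One way to take such a dataset off the stack is to allocate it on the heap. The sketch below uses a cut-down structure, and the sizes are assumptions taken from the "120000 x 60" figure in the question; the real MAX_NVAR and MAX_NPOINT values are not shown.

```c
#include <stdlib.h>

/* Sizes are assumptions for illustration, taken from the "120000 x 60"
 * figure in the question; the real MAX_* values are not shown. */
#define MAX_NVAR   60
#define MAX_NPOINT 120000

struct BigTable {
    double Data[MAX_NVAR][MAX_NPOINT];   /* about 57.6 MB with these sizes */
};

/* Allocates the table on the heap; returns NULL if memory is unavailable. */
static struct BigTable *big_table_new(void)
{
    return calloc(1, sizeof(struct BigTable));
}
```

The caller checks the result for NULL, uses t->Data[v][p] as before, and calls free(t) when done. A local variable of this type would instead need all 57.6 MB of stack at function entry, which is the kind of demand chkstk.asm probes for.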
As for why the debugger fails... no idea. Too little RAM installed? 32bit vs 64bit platform? Infinite recursion in constructing static variables? Can't really say without looking at things you haven't shown us, like source, specs of environment, etc.
Edit: In case the hint is missed: Global / static data objects are constructed before main() starts executing. An infinite (or just much-too-deep) recursion in those constructors can trigger a stack overflow. (I am assuming C++ instead of C as the error message you gave says "unhandled exception".)
Edit 2: You added that you have a "database" that you need to convert to space-delimited. Without seeing the rest of your code: Trying to do the whole conversion in one go in memory isn't a good idea. Read a record, convert it, write it. Repeat. If you need stuff like "longest record" to determine the output format, iterate over the input once read-only for finding the output sizes, then iterate again doing the actual conversion.
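The read, convert, write loop can be sketched as below. It works one character at a time, so memory use does not depend on the file size. The function name csv_to_spaces is hypothetical, and quoted CSV fields that contain commas are deliberately not handled.

```c
#include <stdio.h>

/* Copies in to out one character at a time, replacing each comma with a
 * space. Returns the number of commas replaced.
 * Quoted CSV fields are NOT handled. */
static long csv_to_spaces(FILE *in, FILE *out)
{
    long replaced = 0;
    int ch;

    while ((ch = fgetc(in)) != EOF) {
        if (ch == ',') {
            ch = ' ';
            replaced++;
        }
        fputc(ch, out);
    }
    return replaced;
}
```

If the output needs fixed column widths, a first pass over the input can record the longest field per column before this conversion pass runs.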

Resources