String parsing fails when using a function doing substring in C

I am having an issue with parsing a string in C. It causes a HardFault eventually.
MCU: LPC1769,
OS: FreeRTOS 10,
Toolchain: IAR
To test it, I keep sending the same data frame (you can see a sample in the commented-out message variable in parseMessage below). For the first 5-6 frames, parsing goes fine and works as expected. Then it suddenly falls into a HardFault when I send the exact same string to the function one more time.
I tested the function in OnlineGDB and didn't observe any issue.
I have tried a couple of slightly different versions of that function, with the same result. Here is one of them:
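#include <stdlib.h>
#include <string.h>
char *substr3(char const *input, size_t start, size_t len) {
char *ret = malloc(len+1);
if (ret == NULL) {
return NULL; // heap exhausted: writing through NULL would fault
}
memcpy(ret, input+start, len);
ret[len] = '\0';
return ret;
}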
I've extracted the relevant function for a better overview:
(Ignore the stripEOL(message) call; it just strips out end-of-line characters. You can see it in my OnlineGDB share.)
void parseMessage(char * message){
//char* message= "7E00002A347C31323030302D3132353330387C33302E30372E323032307C31317C33307C33317C31352D31367C31357C317C57656E67657274880D";
// Parsing the frame
char* start;
char* len;
char* cmd;
char* data;
char* chksum;
char* end;
stripEOL(message);
unsigned int messagelen = strlen(message);
start = substr3(message, 0, 2);
len = substr3(message, 2, 4);
cmd = substr3(message, 6, 2);
data = substr3(message, 8, messagelen-8-4);
chksum = substr3(message, messagelen-4, 2);
end = substr3(message, messagelen-2, 2);
// Note: none of the six buffers allocated above is freed before returning.
}
Only the data variable differs in length.
e.g. data --> "347C31323030302D3132353330387C33302E30372E323032307C31317C33307C33317C31352D31367C31357C317C57656E67657274"
A HardFault debug log:
LR = 0x8667 in disassembly
PC = 0x2dd0 in disassembly

I'm grateful to the contributors who led me to the solution in my case.
Since no contributor posted a complete solution and I found one that works, I'm writing it up for anyone who may be interested in the future.
I am developing my application on top of FreeRTOS 10 and was using malloc from the C library, which apparently wasn't cooperating, at least with my implementation. Some resources say you can use the standard malloc within FreeRTOS, but I couldn't get it to work for some unknown reason. Increasing the heap size might have helped, I don't know, but I didn't want to do that either.
I just placed the two wrapper functions below in a common file, without even changing my malloc and free calls.
Creating malloc/free functions that work with the built-in FreeRTOS heap is quite simple: we just wrap the pvPortMalloc/vPortFree calls:
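#include <stddef.h>
#include "FreeRTOS.h" // pulls in portable.h, which declares pvPortMalloc() and vPortFree()
void* malloc(size_t size)
{
void* ptr = NULL;
if(size > 0)
{
// We simply wrap the FreeRTOS call into a standard form
ptr = pvPortMalloc(size);
} // ptr stays NULL for size 0; pvPortMalloc also returns NULL when the heap is exhausted
return ptr;
}
void free(void* ptr)
{
if(ptr)
{
// We simply wrap the FreeRTOS call into a standard form
vPortFree(ptr);
}
}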
Note that this won't work with heap scheme 1 (heap_1.c), which never frees memory; it works with the others (heap_2 to heap_5).
I would recommend starting with portable/MemMang/heap_4.c.
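Independently of which heap malloc draws from, parseMessage as posted never frees the six buffers returned by substr3, so every frame leaks some heap memory; on a small MCU heap that alone could plausibly exhaust it after a few frames. Below is a minimal sketch that sidesteps the heap entirely by copying each field into fixed-size arrays. The names frame_fields, copy_field, parse_frame and FRAME_DATA_MAX are hypothetical, and the field layout (2-char start, 4-char length, 2-char command, variable data, 2-char checksum, 2-char end) is taken from the substr3 offsets in the question.

```c
#include <stddef.h>
#include <string.h>

#define FRAME_DATA_MAX 256   /* hypothetical upper bound for the data field */

typedef struct {
    char start[3];
    char len[5];
    char cmd[3];
    char data[FRAME_DATA_MAX + 1];
    char chksum[3];
    char end[3];
} frame_fields;

/* Copies n characters of src starting at pos into dst and terminates it. */
static void copy_field(char *dst, const char *src, size_t pos, size_t n)
{
    memcpy(dst, src + pos, n);
    dst[n] = '\0';
}

/* Returns 0 on success, -1 if the frame is too short or the data does not fit. */
int parse_frame(const char *message, frame_fields *out)
{
    size_t messagelen = strlen(message);
    size_t datalen;

    if (messagelen < 12)            /* 8 header chars + 4 trailer chars */
        return -1;
    datalen = messagelen - 8 - 4;   /* no unsigned underflow after the check */
    if (datalen > FRAME_DATA_MAX)
        return -1;

    copy_field(out->start,  message, 0, 2);
    copy_field(out->len,    message, 2, 4);
    copy_field(out->cmd,    message, 6, 2);
    copy_field(out->data,   message, 8, datalen);
    copy_field(out->chksum, message, messagelen - 4, 2);
    copy_field(out->end,    message, messagelen - 2, 2);
    return 0;
}
```

Usage would be along the lines of `frame_fields f; if (parse_frame(message, &f) == 0) { ... }` after stripEOL. The struct is a few hundred bytes, so on a small FreeRTOS task stack it may be better kept in static storage.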

Related

Program with while loop causes stack overflow, but only in x86 and only when injected into another process

I have an unfortunately convoluted problem that I am hopeful someone might be able to help me with.
I have written a reasonably large program that I have converted into position-independent code (see here for reference: https://bruteratel.com/research/feature-update/2021/01/30/OBJEXEC/). This basically means that the resulting exe (compiled using MinGW) contains data only in the .text section, and thus can be injected into and run from an arbitrary place in memory. I have successfully ported the program to this format and can compile it for both x86 and x64.
I created two "helper" executables to run the PIC program: a local injector and a remote injector. The local injector runs the program by calling VirtualAlloc, memcpy, and CreateThread. The remote injector runs the program by calling CreateProcess (suspended), VirtualAllocEx, WriteProcessMemory, QueueAPCThread, and ResumeThread (the last two APIs being called on pi.hThread, which was returned from CreateProcess).
I am experiencing inconsistent results in the program depending on the architecture and method of execution.
x64 local: works
x64 inject: works
x86 local: works
x86 inject: fails; stack overflow
I have determined that my program is crashing in a while loop in a particular function. This function is used to format data contained in buffers (heap allocated) that are passed in as function args. The raw data buffer (IOBuf) contains a ~325k long string containing Base64 characters with spaces randomly placed throughout. The while loop in question iterates over this buffer and copies non-space characters to a second buffer (IntermedBuf), with the end goal being that IntermedBuf contains the full Base64 string in IOBuf minus the random spaces.
A few notes about the following code snippet:
Because the code is written to be position-independent, all APIs must be resolved manually, which is why you see things like (SPRINTF)(Apis.sprintfFunc). I have resolved the address of each API in its respective DLL and have created typedefs for each API that is called. While odd, this is not in itself causing the issue, as the code works fine in 3 of the 4 situations.
Because this program is failing when injected, I cannot use print statements to debug, so I have added calls to MessageBoxA to pop up at certain places to determine contents of variables and/or if execution is reaching that part of the code.
The relevant code snippet is as follows:
char inter[] = {'I','n','t',' ',0};
char tools[100] = {0};
if (((STRCMP)Apis.strcmpFunc)(IntermedBuf, StringVars->b64Null) != 0)
{
int i = 0, j = 0, strLen = 0, lenIOBuf = ((STRLEN)Apis.strlenFunc)(IOBuf);
((SPRINTF)Apis.sprintfFunc)(tools, StringVars->poi, IOBuf);
((MESSAGEBOXA)Apis.MessageBoxAFunc)(NULL, tools, NULL, NULL);
((MEMSET)Apis.memsetFunc)(tools, 0, 100 * sizeof(char));
((SPRINTF)Apis.sprintfFunc)(tools, StringVars->poi, IntermedBuf);
((MESSAGEBOXA)Apis.MessageBoxAFunc)(NULL, tools, NULL, NULL);
char* locSpace;
while (j < lenIOBuf)
{
locSpace = ((STRSTR)Apis.strstrFunc)(IOBuf + j, StringVars->space);
if (locSpace == 0)
locSpace = IOBuf + lenIOBuf;
strLen = locSpace - IOBuf - j;
((MEMCPY)Apis.memcpyFunc)(IntermedBuf + i, IOBuf + j, strLen);
i += strLen, j += strLen + 1;
}
((MESSAGEBOXA)Apis.MessageBoxAFunc)(NULL, StringVars->here, NULL, NULL);
((MEMSET)Apis.memsetFunc)(IOBuf, 0, BUFFSIZE * sizeof(char));
The first two MessageBoxA calls successfully execute, each containing the address of IOBuf and IntermedBuf respectively. The last call to MessageBoxA, after the while loop, never comes, meaning the program is crashing in the while loop as it copies data from IOBuf to IntermedBuf.
I ran remote.exe, which spawned a new WerFault.exe (I have tried calc, notepad, and several other processes with the same result) containing the PIC program, and attached WinDbg to it to get a better sense of what was happening. After I received the first two message boxes and clicked through them, WerFault crashed with a stack overflow caused by a call to strstr. [screenshot not included]
Examining the contents of the stack at crash time shows this: [screenshot not included]
Looking at the contents of IntermedBuf (which is one of the arguments passed to the strstr call) I can see that the program IS copying data from IOBuf to IntermedBuf and removing spaces as intended, however the program crashes after copying ~80k.
[Screenshot: IOBuf (raw data)]
[Screenshot: IntermedBuf (after removing spaces)]
My preliminary understanding of what is happening here is that strstr (and potentially memcpy) push data to the stack with each call, and given the length of the loop (lenIOBuf is ~325K, and spaces occur randomly every 2-11 characters throughout), the stack overflows before the while loop finishes and the stack unwinds. However, this doesn't explain why it succeeds in x64 in both cases, and in x86 when the PIC program runs in a user-made program as opposed to being injected into a legitimate process.
I have run the x86 PIC program in the local injector, where it succeeds, and also attached WinDbg to it to examine what is happening differently there. The stack contains the same sort of pattern of characters as seen earlier, but later in the loop (because, again, the program succeeds) the stack appears to... jump? I examined the contents of the stack early in the while loop (having set a bp on strstr) and saw much the same pattern as in the remote injector session: [screenshot not included]
I also added another MessageBox this time inside the while loop, set to pop when j > lenIOBuf - 500 in order to catch the program as it neared completion of the while loop.
char* locSpace;
while (j < lenIOBuf)
{
if (j > lenIOBuf - 500)
{
((MEMSET)Apis.memsetFunc)(tools, 0, 100 * sizeof(char));
((SPRINTF)Apis.sprintfFunc)(tools, StringVars->poi, IntermedBuf);
((MESSAGEBOXA)Apis.MessageBoxAFunc)(NULL, tools, NULL, NULL);
}
locSpace = ((STRSTR)Apis.strstrFunc)(IOBuf + j, StringVars->space);
if (locSpace == 0)
locSpace = IOBuf + lenIOBuf;
strLen = locSpace - IOBuf - j;
((MEMCPY)Apis.memcpyFunc)(IntermedBuf + i, IOBuf + j, strLen);
i += strLen, j += strLen + 1;
}
When this MessageBox popped, I paused execution and found that ESP was now 649fd80; previously it was around 13beb24?
So it appears that the stack was relocated, or the local injector added more memory to the stack or something (I am embarrassingly naive about this stuff). Looking at the "original" stack location at this stage of execution shows that the data previously there is still there when the loop is near completion: [screenshot not included]
So, bottom line: this code, which by all accounts runs successfully in x64 local/remote and x86 local, crashes when run in another process in x86. In the local injector case the stack seems to fill in a similar fashion as in the remote injector where it crashes, but the local injector is relocating the stack or adding more stack space or something, which isn't happening in the remote injector. Does anyone have any ideas why, or, more importantly, how I could alter the code to remove spaces from a large, arbitrary buffer in a different way that avoids the overflow I am currently hitting?
Thanks for any help
typedef void*(WINAPI* MEMCPY)(void * destination, const void * source, size_t num);
typedef char*(WINAPI* STRSTR)(const char *haystack, const char *needle);
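These are wrong declarations. Both of these APIs use the __cdecl calling convention, which means the caller must clean up the stack (add esp, 4*param_count) after the call. Because you declared them as __stdcall (== WINAPI), the compiler does not generate the add esp, 4*param_count instruction, so the parameters pushed for every call are never popped and the stack keeps growing inside your loop. On x64 Windows there is only one calling convention (__stdcall and __cdecl are both ignored), so the mismatch is harmless there, which is consistent with only the x86 build failing.
You need to use:
typedef void * (__cdecl * MEMCPY)(void * _Dst, const void * _Src, _In_ size_t _MaxCount);
typedef char* (__cdecl* STRSTR)(_In_z_ char* const _String, _In_z_ char const* const _SubString);
and so on.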
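A minimal sketch of corrected pointer types, assuming the resolved functions are C runtime exports that use __cdecl. API_CDECL, MEMCPY_FN, STRSTR_FN and strip_spaces_via are hypothetical names; the macro expands to __cdecl only for 32-bit Windows builds, where the convention matters, and to nothing elsewhere so the sketch also compiles on other platforms. The loop mirrors the one in the question, but the pointers are passed in rather than looked up at run time.

```c
#include <stddef.h>
#include <string.h>

/* The calling-convention keyword only matters for 32-bit x86 Windows. */
#if defined(_WIN32) && !defined(_WIN64)
#define API_CDECL __cdecl
#else
#define API_CDECL
#endif

/* Caller-cleans-up pointer types matching the C runtime's memcpy/strstr. */
typedef void *(API_CDECL *MEMCPY_FN)(void *dst, const void *src, size_t count);
typedef char *(API_CDECL *STRSTR_FN)(const char *haystack, const char *needle);

/* Copies src to dst without spaces, calling only through the function pointers.
   Returns the number of characters written (excluding the terminator). */
size_t strip_spaces_via(MEMCPY_FN pmemcpy, STRSTR_FN pstrstr,
                        char *dst, const char *src)
{
    size_t srclen = strlen(src);
    size_t i = 0, j = 0;

    while (j < srclen) {
        const char *space = pstrstr(src + j, " ");
        size_t run = space ? (size_t)(space - (src + j)) : srclen - j;
        pmemcpy(dst + i, src + j, run);   /* copy the run before the space */
        i += run;
        j += run + 1;                     /* skip the space itself */
    }
    dst[i] = '\0';
    return i;
}
```

For example, `strip_spaces_via(memcpy, strstr, out, text)` exercises the same loop with correctly typed pointers; with the wrong WINAPI typedefs on x86, each call would leave its arguments on the stack.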
I'm familiar with what you are doing, and frankly I moved on to compiling the required functions (memcpy, etc.) into the program instead of manually looking them up and making external calls.
For example:
#include <stddef.h>
#include <string.h> // strlen/strncmp, used by _strstr (compile your own versions of these too)
static inline void* _memcpy(void* dest, const void* src, size_t count)
{
char *char_dest = (char *)dest;
const char *char_src = (const char *)src;
if ((char_dest <= char_src) || (char_dest >= (char_src+count)))
{
/* non-overlapping buffers (or dest before src): copy forwards */
while(count > 0)
{
*char_dest = *char_src;
char_dest++;
char_src++;
count--;
}
}
else
{
/* overlapping buffers with dest after src: copy backwards */
char_dest = (char *)dest + count - 1;
char_src = (const char *)src + count - 1;
while(count > 0)
{
*char_dest = *char_src;
char_dest--;
char_src--;
count--;
}
}
return dest;
}
static inline char * _strstr(const char *s, const char *find)
{
char c, sc;
size_t len;
if ((c = *find++) != 0)
{
len = strlen(find);
do {
do {
if ((sc = *s++) == 0)
return 0;
} while (sc != c);
} while (strncmp(s, find, len) != 0);
s--;
}
return (char *)s; // casts away const, as the standard strstr does
}
Credit for the above code goes to ReactOS. You can look up the rest of what's required (strlen, etc.) there as well.
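For this particular task you may not need any imported functions at all: the spaces can be dropped in a single pass with plain pointer arithmetic, which leaves nothing to resolve and no calling convention to get wrong. A minimal sketch; remove_spaces is a hypothetical name, and it assumes dst has room for at least as many bytes as src, including the terminator. It also works in place (dst == src), because the write position never overtakes the read position.

```c
#include <stddef.h>

/* Copies src to dst, skipping every ' ' character, and returns the new length. */
size_t remove_spaces(char *dst, const char *src)
{
    char *out = dst;

    for (; *src != '\0'; src++) {
        if (*src != ' ')
            *out++ = *src;   /* keep every non-space character */
    }
    *out = '\0';
    return (size_t)(out - dst);
}
```

Called as `remove_spaces(IntermedBuf, IOBuf)`, this would replace the whole strstr/memcpy loop from the question.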

How to create a process that runs a routine with variable number of parameters?

I know there are lots of questions here about functions that take a variable number of arguments. I also know there's lots of docs about stdarg.h and its macros. And I also know how printf-like functions take a variable number of arguments. I already tried each of those alternatives and they didn't help me. So, please, keep that in mind before marking this question as duplicate.
I'm working on the process management features of a little embedded operating system, and I'm stuck on the design of a function that can create processes that run a function with a variable number of parameters. Here's a simplified version of how I want my API to look:
// create a new process
// * function is a pointer to the routine the process will run
// * nargs is the number of arguments the routine takes
void create(void* function, uint8_t nargs, ...);
void f1();
void f2(int i);
void f3(float f, int i, const char* str);
int main()
{
create(f1, 0);
create(f2, 1, 9);
create(f3, 3, 3.14f, 9, "string");
return 0;
}
And here is a pseudocode for the relevant part of the implementation of system call create:
void create(void* function, uint8_t nargs, ...)
{
process_stack = create_stack();
first_arg = &nargs + 1;
copy_args_list_to_process_stack(process_stack, first_arg);
}
Of course I'll need to know the calling convention in order to copy from create's activation record to the new process stack, but that's not the problem. The problem is how many bytes I need to copy. Even though I know how many arguments I need to copy, I don't know how much space each of those arguments occupies, so I don't know when to stop copying.
The Xinu Operating System does something very similar to what I want to do, but I tried hard to understand the code and didn't succeed. I'll transcribe a very simplified version of Xinu's create function here. Maybe someone can understand it and help me.
pid32 create(void* procaddr, uint32 ssize, pri16 priority, char *name, int32 nargs, ...)
{
int32 i;
uint32 *a; /* points to list of args */
uint32 *saddr; /* stack address */
saddr = (uint32 *)getstk(ssize); // return a pointer to the new process's stack
*saddr = STACKMAGIC; // STACKMAGIC is just a marker to detect stack overflow
// this is the cryptic part
/* push arguments */
a = (uint32 *)(&nargs + 1); /* start of args */
a += nargs -1; /* last argument */
for ( ; nargs > 4 ; nargs--) /* machine dependent; copy args */
*--saddr = *a--; /* onto created process's stack */
*--saddr = (long)procaddr;
for(i = 11; i >= 4; i--)
*--saddr = 0;
for(i = 4; i > 0; i--) {
if(i <= nargs)
*--saddr = *a--;
else
*--saddr = 0;
}
}
I got stuck on this line: a += nargs -1;. This should move the pointer a 4*(nargs - 1) bytes ahead in memory, right? What if an argument's size is not 4 bytes? But that is just the first question. I also didn't understand the next lines of the code.
If you are writing an operating system, you also define the calling convention(s), right? Settle for argument sizes of sizeof(void*) and pad as necessary.
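A minimal sketch of that convention, assuming every process argument occupies exactly one machine word (uintptr_t) and the caller casts each argument explicitly. proc_word_t, MAX_PROC_ARGS, collect_proc_args and push_proc_args are hypothetical names, not Xinu's. Reading the arguments with va_arg, instead of walking &nargs + 1, avoids depending on where the compiler places variadic arguments; nargs is an int rather than a uint8_t because va_start's last named parameter should not have a type that the default argument promotions change. With one word per argument, the Xinu line a += nargs - 1 simply points at the last word, and the copy loop pushes words from the last argument back to the first, which is what push_proc_args does here.

```c
#include <stdarg.h>
#include <stdint.h>

/* Hypothetical convention: each process argument is exactly one machine word. */
typedef uintptr_t proc_word_t;

#define MAX_PROC_ARGS 8   /* hypothetical limit on arguments per process */

/* Copies nargs word-sized variadic arguments into words[] in call order.
   Returns nargs, or -1 if nargs is out of range. */
int collect_proc_args(proc_word_t words[], int nargs, ...)
{
    va_list ap;
    int i;

    if (nargs < 0 || nargs > MAX_PROC_ARGS)
        return -1;
    va_start(ap, nargs);
    for (i = 0; i < nargs; i++)
        words[i] = va_arg(ap, proc_word_t);
    va_end(ap);
    return nargs;
}

/* Pushes the words onto a downward-growing stack, last argument first,
   so that on return sp[0] holds the first argument. Returns the new sp. */
proc_word_t *push_proc_args(proc_word_t *sp, const proc_word_t words[], int nargs)
{
    while (nargs > 0) {
        nargs--;
        *--sp = words[nargs];
    }
    return sp;
}
```

For example, `collect_proc_args(words, 3, (proc_word_t)1, (proc_word_t)9, (proc_word_t)"string")` fills words[0..2] in call order. Arguments wider than a word, or floating-point ones (a float is promoted to double when passed through ...), do not fit this scheme directly and would need to be passed by pointer.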

How do I use memcpy_toio/fromio?

I am working on a kernel module in C to talk to a PCIe card and I have allocated some io memory using pci_iomap, and I write/read there using ioread/write32.
This works, but the performance is quite poor, and I read I could use block transfers through memcpy_toio/fromio instead of just doing 32 bits at a time.
To write, I am using iowrite32(buffer[i], privdata->registers + i);
To read, I do buffer[i] = ioread32(&privdata->registers[i]);
I tried to replace the for loops these are in with:
memcpy_toio(privdata->registers, buffer, 2048);
memcpy_fromio(buffer, privdata->registers, 2048);
If I only replace the write loop with memcpy_toio and do the reading using ioread32, the program doesn't crash, but the instruction doesn't seem to do anything (the registers don't change).
Also, when I replace the read loop as well with the memcpy_fromio instruction, it crashes.
I was thinking it might be because the reads try to access the mem location while it is still being written to. Is there a way to flush the writes queue after either iowrite32 or memcpy_toio?
What am I doing wrong here?
memcpy_from/toio() can be used only if the I/O memory behaves like memory, i.e., if values can be read speculatively, and be written multiple times or out of order.
An I/O range marked as non-prefetchable does not support this.
I don't know whether my suggestion is valid, but a single ioread32() call may be more efficient than memcpy: it reads or writes the PCI device exactly once, while memcpy may access it several times. The kernel also provides functions such as ioread32_rep() that can replace a hand-written for loop. Note, however, that ioread32_rep() reads every value from the same address (it is meant for FIFO-style registers), so it only replaces a loop over a single register, not a loop over registers[i]. If you need efficiency, you can try ioread32_rep() for such FIFO transfers, and memcpy for variable-length reads and writes.
Which buffer type do you use?
Look at the implementation of memcpy_fromio() and memcpy_toio() in one architecture's asm/io.h (these helpers are architecture-specific):
static inline void
memcpy_fromio(void *dst, volatile void __iomem *src, int count)
{
memcpy(dst, (void __force *) src, count);
}
static inline void
memcpy_toio(volatile void __iomem *dst, const void *src, int count)
{
memcpy((void __force *) dst, src, count);
}
You can see they are just plain memcpy calls.
And look at the iowrite32() and ioread32() implementations on the same architecture:
static inline void iowrite32(u32 val, void __iomem *p)
{
if (__is_PCI_addr(p))
val = _swapl(val);
__builtin_write32(p, val);
if (__is_PCI_MEM(p))
__flush_PCI_writes();
}
static inline unsigned int ioread32(void __iomem *p)
{
uint32_t ret = __builtin_read32(p);
if (__is_PCI_addr(p))
ret = _swapl(ret);
return ret;
}
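As you can see, unlike iowrite32() and ioread32(), these memcpy_fromio() and memcpy_toio() implementations neither apply the _swapl() byte swap for PCI addresses nor call __flush_PCI_writes(), so on this architecture they are not suitable for working with PCIe devices.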

Run-time mocking in C?

This has been pending on my list for a long time now. In brief: I need to run mocked_dummy() in place of dummy() AT RUN TIME, without modifying factorial(). I don't care about the entry point of the software. I can add any number of additional functions (but cannot modify code within /*---- do not modify ----*/).
Why do I need this?
To unit-test some legacy C modules. I know there are a lot of tools available, but if run-time mocking is possible I can change my UT approach (add reusable components) and make my life easier :).
Platform / Environment?
Linux, ARM, gcc.
Approach that I'm trying with?
I know GDB uses trap/illegal instructions for adding up breakpoints (gdb internals).
Make the code self modifiable.
Replace dummy() code segment with illegal instruction, and return as immediate next instruction.
Control transfers to trap handler.
Trap handler is a reusable function that reads from a unix domain socket.
Address of mocked_dummy() function is passed (read from map file).
Mock function executes.
There are problems going ahead from here. I also found the approach is tedious and requires good amount of coding, some in assembly too.
I also found that under gcc each function call can be hooked/instrumented, but again that is not very useful, since the function that is to be mocked still gets executed.
Is there any other approach that I could use?
#include <stdio.h>
#include <stdlib.h>
void mocked_dummy(void)
{
printf("__%s__()\n",__func__);
}
/*---- do not modify ----*/
void dummy(void)
{
printf("__%s__()\n",__func__);
}
int factorial(int num)
{
int fact = 1;
printf("__%s__()\n",__func__);
while (num > 1)
{
fact *= num;
num--;
}
dummy();
return fact;
}
/*---- do not modify ----*/
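int main(int argc, char * argv[])
{
if (argc < 2)
return 1;
/* argv[1] is a function address (decimal or 0x-prefixed hex), e.g. read from the map file */
int (*fp)(int) = (int (*)(int))strtoul(argv[1], NULL, 0);
printf("fp = %p\n", (void *)fp);
printf("factorial of 5 is = %d\n",fp(5));
printf("factorial of 5 is = %d\n",factorial(5));
return 1;
}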
test-dept is a relatively recent C unit testing framework that allows you to do runtime stubbing of functions. I found it very easy to use - here's an example from their docs:
void test_stringify_cannot_malloc_returns_sane_result() {
replace_function(&malloc, &always_failing_malloc);
char *h = stringify('h');
assert_string_equals("cannot_stringify", h);
}
Although the downloads section is a little out of date, it seems fairly actively developed - the author fixed an issue I had very promptly. You can get the latest version (which I've been using without issues) with:
svn checkout http://test-dept.googlecode.com/svn/trunk/ test-dept-read-only
the version there was last updated in Oct 2011.
However, since the stubbing is achieved using assembler, it may need some effort to get it to support ARM.
This is a question I've been trying to answer myself. I also have the requirement that the mocking method/tools be written in the same language as my application. Unfortunately this cannot be done in C in a portable way, so I've resorted to what you might call a trampoline or detour. This falls under the "Make the code self modifiable." approach you mentioned above: we change the actual bytes of a function at run time so that it jumps to our mock function.
#include <stdio.h>
#include <stdlib.h>
// Additional headers
#include <stdint.h> // for uint32_t
#include <sys/mman.h> // for mprotect
#include <errno.h> // for errno
void mocked_dummy(void)
{
printf("__%s__()\n",__func__);
}
/*---- do not modify ----*/
void dummy(void)
{
printf("__%s__()\n",__func__);
}
int factorial(int num)
{
int fact = 1;
printf("__%s__()\n",__func__);
while (num > 1)
{
fact *= num;
num--;
}
dummy();
return fact;
}
/*---- do not modify ----*/
typedef void (*dummy_fun)(void);
void set_run_mock(void)
{
dummy_fun run_ptr = dummy;
dummy_fun mock_ptr = mocked_dummy;
unsigned char * ptr = (unsigned char *)(uintptr_t)run_ptr;
unsigned char * pg;
uint32_t off;
/* JMP rel32 is relative to the end of the 5-byte instruction (run_ptr + 5);
unsigned wrap-around gives the right two's-complement value for a backward jump */
off = (uint32_t)((uintptr_t)mock_ptr - ((uintptr_t)run_ptr + 5));
/* mprotect needs a page-aligned start (4k pages assumed); the length must reach
the last patched byte, which may spill into the next page */
pg = (unsigned char *)((uintptr_t)ptr & ~(uintptr_t)4095);
if (mprotect(pg, (size_t)(ptr - pg) + 5, PROT_READ | PROT_WRITE | PROT_EXEC)) {
perror("Couldn't mprotect");
exit(errno);
}
ptr[0] = 0xE9; //x86 JMP rel32
ptr[1] = off & 0x000000FF;
ptr[2] = (off & 0x0000FF00) >> 8;
ptr[3] = (off & 0x00FF0000) >> 16;
ptr[4] = (off & 0xFF000000) >> 24;
}
int main(int argc, char * argv[])
{
// Run for realz
factorial(5);
// Set jmp
set_run_mock();
// Run the mock dummy
factorial(5);
return 0;
}
Portability explanation...
mprotect() - This changes the memory page access permissions so that we can actually write to memory that holds the function code. This isn't very portable, and in a WINAPI env, you may need to use VirtualProtect() instead.
The address passed to mprotect is aligned down to the start of its 4k page. The page size can change from system to system (sysconf(_SC_PAGESIZE) reports it); 4k is appropriate for a vanilla x86 Linux kernel.
The method that we use to jmp to the mock function is to actually put down our own opcodes, this is probably the biggest issue with portability because the opcode I've used will only work on a little endian x86 (most desktops). So this would need to be updated for each arch you plan to run on (which could be semi-easy to deal with in CPP macros.)
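The function itself has to be at least five bytes long. This is usually the case, because a function's prologue and epilogue normally take at least 5 bytes. The function also must not be inlined into its callers (compile with -O0 or mark it __attribute__((noinline))), otherwise the calls never reach the patched bytes.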
Potential Improvements...
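The set_run_mock() call could easily be set up to accept parameters for reuse. You could also save the five overwritten bytes from the original function to restore them later in the code if you want.
I'm unable to test this on ARM, but the idea would be similar using the B (branch) instruction. Note that B takes a PC-relative offset, not an absolute address: in ARM (A32) state an unconditional B has 0xEA as its most significant byte and a 24-bit signed word offset, relative to the branch address + 8, in the low three bytes (a reach of about ±32 MB). Because instructions are stored little-endian, 0xEA is the last of the four bytes in memory. Thumb code uses different encodings, and on ARM you also need to flush the instruction cache after patching (for example with GCC's __builtin___clear_cache()).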
Chenz
An approach that I have used in the past that has worked well is the following.
For each C module, publish an 'interface' that other modules can use. These interfaces are structs that contain function pointers.
struct Module1
{
int (*getTemperature)(void);
int (*setKp)(int Kp);
};
During initialization, each module initializes these function pointers with its implementation functions.
When you write the module tests, you can dynamically change these function pointers to their mock implementations and, after testing, restore the original implementations.
Example:
#include <stdio.h>
void mocked_dummy(void)
{
printf("__%s__()\n",__func__);
}
/*---- do not modify ----*/
void dummyFn(void)
{
printf("__%s__()\n",__func__);
}
static void (*dummy)(void) = dummyFn;
int factorial(int num)
{
int fact = 1;
printf("__%s__()\n",__func__);
while (num > 1)
{
fact *= num;
num--;
}
dummy();
return fact;
}
/*---- do not modify ----*/
int main(int argc, char * argv[])
{
void (*oldDummy)(void) = dummy;
/* with the original dummy function */
printf("factorial of 5 is = %d\n",factorial(5));
/* with the mocked dummy */
oldDummy = dummy; /* save the old dummy */
dummy = mocked_dummy; /* put in the mocked dummy */
printf("factorial of 5 is = %d\n",factorial(5));
dummy = oldDummy; /* restore the old dummy */
return 1;
}
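The example above uses a single function pointer; the struct-based interface described earlier can be sketched in the same spirit. All names here (realGetTemperature, realSetKp, module1, module1Init, isOverheated, mockGetTemperature) are hypothetical, and the "real" implementations just return constants as stand-ins for hardware access.

```c
/* Hypothetical real implementations (stand-ins for hardware access). */
static int realGetTemperature(void) { return 21; }
static int realSetKp(int Kp) { return Kp; }

/* Interface published by the module, as in the answer. */
struct Module1
{
    int (*getTemperature)(void);
    int (*setKp)(int Kp);
};

/* The single instance other modules call through. */
struct Module1 module1;

/* Called during initialization: install the real implementations. */
void module1Init(void)
{
    module1.getTemperature = realGetTemperature;
    module1.setKp = realSetKp;
}

/* Hypothetical code under test: it only ever calls through the interface. */
int isOverheated(int limit)
{
    return module1.getTemperature() > limit;
}

/* Mock used by a test. */
static int mockGetTemperature(void) { return 100; }
```

A test saves the whole struct (`struct Module1 saved = module1;`), swaps in `mockGetTemperature`, runs the code under test, and then restores `module1 = saved;`. Copying the struct restores every pointer at once, which scales better than saving pointers one by one.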
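You can replace functions exported by shared libraries by using LD_PRELOAD: you create a shared library and have it loaded first via LD_PRELOAD. This is a standard technique used, for example, to turn programs without SOCKS support into SOCKS-aware programs. Here is a tutorial which explains it. Note that this only affects calls resolved by the dynamic linker; a call from factorial() to dummy() inside the same executable is normally bound at link time and cannot be interposed this way.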

Simple C implementation to track memory malloc/free?

programming language: C
platform: ARM
Compiler: ADS 1.2
I need to keep track of simple malloc/free calls in my project. I just need a very basic idea of how much heap memory is required when the program has allocated all its resources. Therefore, I have provided wrappers for the malloc/free calls. In these wrappers I need to increment a current memory count when malloc is called and decrement it when free is called. The malloc case is straightforward, as I have the size to allocate from the caller. I am wondering how to deal with the free case, as I need to store the pointer/size mapping somewhere. This being C, I do not have a standard map to implement this easily.
I am trying to avoid linking in any libraries so would prefer *.c/h implementation.
So I am wondering if there already is a simple implementation one may lead me to. If not, this is motivation to go ahead and implement one.
EDIT: Purely for debugging and this code is not shipped with the product.
EDIT: Initial implementation based on answer from Makis. I would appreciate feedback on this.
EDIT: Reworked implementation
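#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
#include <string.h>
#include <limits.h>
#include <stdbool.h> // bool/true used in sample_buffer (C99; older compilers may need their own typedef)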
static size_t gnCurrentMemory = 0;
static size_t gnPeakMemory = 0;
void *MemAlloc (size_t nSize)
{
void *pMem = malloc(sizeof(size_t) + nSize);
if (pMem)
{
size_t *pSize = (size_t *)pMem;
memcpy(pSize, &nSize, sizeof(nSize));
gnCurrentMemory += nSize;
if (gnCurrentMemory > gnPeakMemory)
{
gnPeakMemory = gnCurrentMemory;
}
printf("PMemAlloc (%p) - Size (%lu), Current (%lu), Peak (%lu)\n",
(void *)(pSize + 1), (unsigned long)nSize, (unsigned long)gnCurrentMemory, (unsigned long)gnPeakMemory);
return(pSize + 1);
}
return NULL;
}
void MemFree (void *pMem)
{
if(pMem)
{
size_t *pSize = (size_t *)pMem;
// Get the size
--pSize;
assert(gnCurrentMemory >= *pSize);
printf("PMemFree (%p) - Size (%lu), Current (%lu), Peak (%lu)\n",
pMem, (unsigned long)*pSize, (unsigned long)gnCurrentMemory, (unsigned long)gnPeakMemory);
gnCurrentMemory -= *pSize;
free(pSize);
}
}
#define BUFFERSIZE (1024*1024)
typedef struct
{
bool flag;
int buffer[BUFFERSIZE];
bool bools[BUFFERSIZE];
} sample_buffer;
typedef struct
{
unsigned int whichbuffer;
char ch;
} buffer_info;
int main(void)
{
unsigned int i;
buffer_info *bufferinfo;
sample_buffer *mybuffer;
char *pCh;
printf("Testing MemAlloc - MemFree\n");
mybuffer = (sample_buffer *) MemAlloc(sizeof(sample_buffer));
if (mybuffer == NULL)
{
printf("ERROR ALLOCATING mybuffer\n");
return EXIT_FAILURE;
}
bufferinfo = (buffer_info *) MemAlloc(sizeof(buffer_info));
if (bufferinfo == NULL)
{
printf("ERROR ALLOCATING bufferinfo\n");
MemFree(mybuffer);
return EXIT_FAILURE;
}
pCh = (char *)MemAlloc(sizeof(char));
printf("finished malloc\n");
// fill allocated memory with integers and read back some values
for(i = 0; i < BUFFERSIZE; ++i)
{
mybuffer->buffer[i] = i;
mybuffer->bools[i] = true;
bufferinfo->whichbuffer = (unsigned int)(i/100);
}
MemFree(bufferinfo);
MemFree(mybuffer);
if(pCh)
{
MemFree(pCh);
}
return EXIT_SUCCESS;
}
You could allocate a few extra bytes in your wrapper and put either an id (if you want to be able to pair malloc() and free() calls) or just the size there. Just malloc() that much more memory, store the information at the beginning of the block, and move the pointer you return that many bytes forward.
This can, by the way, also easily be used for fence pointers/fingerprints and such.
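A minimal sketch of the size-prefix idea, with one adjustment over prefixing a bare size_t (as the question's MemAlloc does): the header is a union, so its size is a multiple of the strictest alignment among its members, and the pointer handed back keeps malloc's alignment for types such as double or long double on targets where they need more alignment than size_t. alloc_header, tracked_malloc, tracked_free, tracked_current and tracked_peak are hypothetical names.

```c
#include <stdlib.h>

/* Header stored in front of each user block. Only 'size' is used; the other
   members exist to give the union (and thus the header) a safe alignment. */
typedef union {
    size_t size;
    long double ld;
    double d;
    void *p;
    long l;
} alloc_header;

static size_t g_current = 0;
static size_t g_peak = 0;

void *tracked_malloc(size_t size)
{
    alloc_header *h;

    if (size > (size_t)-1 - sizeof(alloc_header))
        return NULL;                       /* size + header would overflow */
    h = malloc(sizeof(alloc_header) + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    g_current += size;
    if (g_current > g_peak)
        g_peak = g_current;
    return h + 1;                          /* user memory starts after the header */
}

void tracked_free(void *ptr)
{
    alloc_header *h;

    if (ptr == NULL)
        return;
    h = (alloc_header *)ptr - 1;           /* step back to the header */
    g_current -= h->size;
    free(h);
}

size_t tracked_current(void) { return g_current; }
size_t tracked_peak(void) { return g_peak; }
```

The peak value after the program has allocated all its resources is the figure the question is after; the header costs one union's worth of bytes per block, which the counters deliberately leave out.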
Either you can have access to internal tables used by malloc/free (see this question: Where Do malloc() / free() Store Allocated Sizes and Addresses? for some hints), or you have to manage your own tables in your wrappers.
You could always use valgrind instead of rolling your own implementation. If you don't care about the amount of memory you allocate you could use an even simpler implementation: (I did this really quickly so there could be errors and I realize that it is not the most efficient implementation. The pAllocedStorage should be given an initial size and increase by some factor for a resize etc. but you get the idea.)
EDIT: I missed that this was for ARM. Valgrind does run on ARM Linux, but it needs an operating system such as Linux underneath, so on a bare-metal target built with ADS it is probably not an option.
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static size_t indexAllocedStorage = 0;
static size_t *pAllocedStorage = NULL;
static unsigned int free_calls = 0;
static unsigned long long int total_mem_alloced = 0;
void *
my_malloc(size_t size){
size_t *temp;
void *p = malloc(size);
if(p == NULL){
fprintf(stderr,"my_malloc malloc failed, %s", strerror(errno));
exit(EXIT_FAILURE);
}
total_mem_alloced += size;
temp = (size_t *)realloc(pAllocedStorage, (indexAllocedStorage+1) * sizeof(size_t));
if(temp == NULL){
fprintf(stderr,"my_malloc realloc failed, %s", strerror(errno));
exit(EXIT_FAILURE);
}
pAllocedStorage = temp;
pAllocedStorage[indexAllocedStorage++] = (size_t)p;
return p;
}
void
my_free(void *p){
size_t i;
int found = 0;
for(i = 0; i < indexAllocedStorage; i++){
if(pAllocedStorage[i] == (size_t)p){
pAllocedStorage[i] = (size_t)NULL;
found = 1;
break;
}
}
if(!found){
printf("Free Called on unknown\n");
}
free_calls++;
free(p);
}
void
free_check(void) {
size_t i;
printf("checking freed memory\n");
for(i = 0; i < indexAllocedStorage; i++){
if(pAllocedStorage[i] != (size_t)NULL){
printf( "Memory leak %p\n", (void *)pAllocedStorage[i]);
free((void *)pAllocedStorage[i]);
}
}
free(pAllocedStorage);
pAllocedStorage = NULL;
}
I would use rmalloc. It is a simple library (actually only two files) for debugging memory usage, and it also supports statistics. Since you already have wrapper functions, it should be very easy to use rmalloc with them. Keep in mind that you also need to replace strdup, etc.
Your program may also need to intercept realloc(), calloc(), getcwd() (as it may allocate memory when buffer is NULL in some implementations) and maybe strdup() or a similar function, if it is supported by your compiler.
If you are running on x86 you could just run your binary under valgrind and it would gather all this information for you, using the standard implementation of malloc and free. Simple.
I've been trying out some of the same techniques mentioned on this page and wound up here from a google search. I know this question is old, but wanted to add for the record...
1) Does your operating system not provide any tools to see how much heap memory is in use in a running process? I see you're talking about ARM, so this may well be the case. In most full-featured OSes, this is just a matter of using a cmd-line tool to see the heap size.
2) If available in your libc, sbrk(0) on most platforms returns the current end of your data segment (the program break). If you have it, store that address at the start of your program (say, startBrk = sbrk(0)); at any time afterwards, sbrk(0) - startBrk tells you how much the heap has grown. This figure includes allocator overhead and freed memory that has not been returned, and it misses allocations that the allocator satisfies with mmap() instead of growing the break.
3) If shared objects can be used, you're dynamically linking to your libc, and your OS's runtime loader has something like an LD_PRELOAD environment variable, you might find it more useful to build your own shared object that defines the actual libc functions with the same symbols (malloc(), not MemAlloc()), then have the loader load your library first and "interpose" the libc functions. You can further obtain the addresses of the actual libc functions with dlsym() and the RTLD_NEXT handle, so you can do what you are doing above without having to recompile all your code to use your malloc/free wrappers. It is then just a run-time decision when you start your program (or any program that fits the description in the first sentence): you set an environment variable like LD_PRELOAD=mymemdebug.so and then run it. (Google "shared object interposition"; it's a great technique and one used by many debuggers/profilers.)
