I have some software working on a Red Hat system with icc, and it is working fine. When I ported the code to an IRIX system running on MIPS, some of my calculations come out as "nan" when there should definitely be values there.
I don't have any good debuggers on the non-Red Hat system, but I have tracked down that some of my arrays are sporadically getting "nan" in them, and that is causing my dot product calculation to come back as "nan."
Seeing as how I can't track it down with a debugger, I am thinking that the problem may be with a memcpy. Are there any known issues with the MIPS compiler's memcpy() and dynamically allocated arrays? I am basically using
memcpy(to, from, n*sizeof(double));
And I can't really prove it, but I think this may be the issue. Is there some workaround? Perhaps some data is misaligned? How do I fix that?
I'd be surprised if your problem came from a bug in memcpy. It may be an alignment issue: are your doubles sufficiently aligned? (They will be if you only store them in double or double[] objects or through double* pointers, but they might not be if you move them around via void* pointers.) x86 platforms are more tolerant of misalignment than most.
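To make that distinction concrete, here is a minimal sketch (names are made up, not taken from your program) contrasting a copy that keeps the doubles properly aligned with one that can leave them misaligned on a strict-alignment target like MIPS:

#include <stdlib.h>
#include <string.h>

void copy_samples(const double *from, size_t n)
{
    /* Fine: malloc returns storage aligned for any object type, and the
       copy is only ever accessed through a double pointer. */
    double *to = malloc(n * sizeof(double));
    memcpy(to, from, n * sizeof(double));

    /* Risky: packing doubles at an arbitrary byte offset inside a raw
       buffer.  Reading them back through a double pointer is a
       misaligned access on MIPS, even though x86 tolerates it. */
    unsigned char *raw = malloc(1 + n * sizeof(double));
    memcpy(raw + 1, from, n * sizeof(double));
    double first = *(double *)(raw + 1);   /* may trap or give garbage */

    (void)first;
    free(to);
    free(raw);
}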
Did you try compiling your code with gcc at a high warning level? (Gcc is available just about everywhere that's not a microcontroller or mainframe. It may produce slower code but better diagnostics than the “native” compiler.)
Of course, it could always be a buffer overflow or other memory management problem in some unrelated part of the code that just happened not to cause any visible bug on your original platform.
If you can't get access to a good debugger, try at least printf'ing stuff in key places.
Is it possible for the to and from memory regions to overlap? memcpy isn't required to handle overlapping memory regions. If this is your problem, then the solution is as simple as using memmove instead.
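As a hypothetical example, shifting elements within a single array is exactly this overlapping case:

#include <string.h>

/* Drop the first element of a by moving the rest down one slot (n > 0).
   The source and destination regions overlap, so memmove is required. */
void shift_left(double *a, size_t n)
{
    /* memcpy(a, a + 1, (n - 1) * sizeof(double));   undefined: overlap */
    memmove(a, a + 1, (n - 1) * sizeof(double));  /* well-defined */
}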
Is sizeof() definitely supported?
Consider the following two lines of C:
int a[1] = {0};
a[1] = 0;
The second line makes a write access somewhere in memory where it should not. Sometimes such a program will segfault during execution and sometimes not, depending on the environment, I suppose, and maybe other things.
I wonder if there is a way to force, as much as possible, such programs to segfault (by compiling them in a special way for instance, or execute them in some virtual machine, I don't know).
This is for pedagogic purpose.
According to the C language standard these kinds of accesses are undefined behaviour and the compiler and runtime are not obliged to make them segfault (though they obviously do sometimes).
For pedagogical purposes you can have a look at the address sanitizers in popular compilers like GCC (-fsanitize=address in https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html) and Clang (https://clang.llvm.org/docs/AddressSanitizer.html).
In simple terms these options cause the compiler to instrument memory accesses with extra logic to catch out-of-bounds memory accesses and produce a user-visible error (though not exactly a segfault message), allowing users to spot such errors and fix them.
This might be what you are looking for.
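As a rough sketch (the exact invocation can vary by compiler and version), building the example from the question with the sanitizer enabled makes the stray write visible at run time:

/* oob.c -- compile with something like: gcc -g -fsanitize=address oob.c */
int main(void)
{
    int a[1] = {0};
    a[1] = 0;       /* one past the end: reported as a stack-buffer-overflow */
    return a[0];
}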
Valgrind on Linux, stack guards for most compilers, debug options for your selected runtime (e.g. Application Verifier on Windows): there are plenty of options.
In your example the overflow is on the stack, and catching it will always require the compiler to emit the appropriate guards. For dynamic memory allocations it's either up to the C/C++ runtime library in use or a custom wrapper inside your application to catch this.
Tools like Valgrind catch heap-based buffer overflows as they happen, since they actually execute the code in a VM.
Compiler assisted options work with canaries which are placed in front and back of the buffer, and which are typically checked again when the buffer is released. Options from the address sanitizer family may also add additional checks to all accesses on fields of a fixed size, but this won't work if raw pointers are involved.
Debug options for the runtime typically only provide a very rough granularity. Often they work by simply placing each allocation in a dedicated page in a non-contiguous address space. Accessing the gaps between the pages is then an instant error; however, minor buffer overflows that stay within the allocated page are typically not detected immediately.
Finally there is also static code analysis which all modern compilers support to some extent, which can easily detect at least trivial mistakes like the one in your example.
None of these options is able to catch all possible errors though. The C language gives you plenty of options to achieve undefined behavior which none of these tools can detect.
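For instance, here is a hedged example of the kind of mistake that redzone- and guard-page-based checkers usually miss, because the stray write never leaves the enclosing object:

#include <string.h>

struct record {
    char name[8];
    int  id;
};

/* No length check: a name longer than 7 characters runs off the end of
   the field and into id, yet stays inside *r, so allocation-boundary
   checks see a perfectly valid write. */
void store_name(struct record *r, const char *s)
{
    strcpy(r->name, s);
}

int main(void)
{
    struct record r = { "", 42 };
    store_name(&r, "too long!");   /* r.id is silently corrupted */
    return r.id;
}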
strlen is a fairly simple function, and it is obviously O(n) to compute. However, I have seen a few approaches that operate on more than one character at a time. See example 5 here or this approach here. The basic way these work is by reinterpret-casting the char const* buffer to a uint32_t const* buffer and then checking four bytes at a time.
Personally, my gut reaction is that this is a segfault-waiting-to-happen, since I might dereference up to three bytes outside valid memory. However, this solution seems to hang around, and it seems curious to me that something so obviously broken has stood the test of time.
I think this comprises UB for two reasons:
Potential dereference outside valid memory
Potential dereference of unaligned pointer
(Note that there is not an aliasing issue; one might think the uint32_t is aliased as an incompatible type, and that code after the strlen (such as code that might change the string) could be reordered relative to the strlen, but it turns out that char is an explicit exception to strict aliasing).
But, how likely is it to fail in practice? At minimum, I think there needs to be 3 bytes padding after the string literal data section, malloc needs to be 4-byte or larger aligned (actually the case on most systems), and malloc needs to allocate 3 extra bytes. There are other criteria related to aliasing. This is all fine for compiler implementations, which create their own environments, but how frequently are these conditions met on modern hardware for user code?
The technique is valid, and you will not avoid it if you call your C library's strlen. If that library is, for instance, a recent version of the GNU C library (at least on certain targets), it does the same thing.
The key to making it work is to ensure that the pointer is aligned properly. If the pointer is aligned, the operation will read beyond the end of the string sure enough, but not into an adjacent page. If the null terminating byte is within one word of the end of a page, then that last word will be accessed without touching the subsequent page.
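For what it's worth, here is a sketch of what such an implementation looks like (this is the same not-strictly-defined technique the question is about, shown only to make the alignment argument concrete; it assumes a 32-bit word and relies on aligned reads never crossing a page boundary):

#include <stddef.h>
#include <stdint.h>

size_t wordwise_strlen(const char *s)
{
    const char *p = s;

    /* Walk byte by byte until p is aligned to a word boundary. */
    while ((uintptr_t)p % sizeof(uint32_t) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Read one aligned word at a time.  The read may go past the
       terminating null byte, but never past the aligned word (and thus
       never past the page) that contains it. */
    const uint32_t *w = (const uint32_t *)p;
    for (;;) {
        uint32_t v = *w;
        /* Classic "has a zero byte" test: nonzero iff some byte of v is 0. */
        if ((v - 0x01010101u) & ~v & 0x80808080u)
            break;
        w++;
    }

    /* Locate the exact null byte within that final word. */
    p = (const char *)w;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}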
It certainly isn't well-defined behavior in C, and so it carries the burden of careful validation when ported from one compiler to another. It also triggers false positives from out-of-bounds access detectors like Valgrind.
Valgrind had to be patched to work around Glibc doing this. Without the patches, you get nuisance errors such as this:
==13669== Invalid read of size 8
==13669== at 0x411D6D7: __wcslen_sse2 (wcslen-sse2.S:59)
==13669== by 0x806923F: length_str (lib.c:2410)
==13669== by 0x807E61A: string_out_put_string (stream.c:997)
==13669== by 0x8075853: obj_pprint (lib.c:7103)
==13669== by 0x8084318: vformat (stream.c:2033)
==13669== by 0x8081599: format (stream.c:2100)
==13669== by 0x408F4D2: (below main) (libc-start.c:226)
==13669== Address 0x43bcaf8 is 56 bytes inside a block of size 60 alloc'd
==13669== at 0x402BE68: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==13669== by 0x8063C4F: chk_malloc (lib.c:1763)
==13669== by 0x806CD79: sub_str (lib.c:2653)
==13669== by 0x804A7E2: sysroot_helper (txr.c:233)
==13669== by 0x408F4D2: (below main) (libc-start.c:226)
Glibc is using SSE instructions to calculate wcslen eight bytes at a time (instead of four, the width of wchar_t). In doing so, it is accessing offset 56 in a block that is 60 bytes wide. However, note that this access could never straddle a page boundary: the address is divisible by 8.
If you're working in assembly language, you don't have to think twice about the technique.
In fact, the technique is used quite a bit in some optimized audio codecs that I work with (targeting ARM), which feature a lot of hand-written assembly language in the Neon instruction set.
I noticed it when running Valgrind on code which integrated these codecs, and contacted the vendor. They explained that it was just a harmless loop optimization technique; I went through the assembly language and convinced myself they were right.
(1) can definitely happen. There's nothing preventing you from taking the strlen of a string near the end of an allocated page, which could result in an access past the end of allocated memory and a nice big crash. As you note, this could be mitigated by padding all your allocations, but then you have to have any libraries do the same. Worse, you have to arrange for the linker and OS to always add this padding (remember the OS passes argv[] in a static memory buffer somewhere). The overhead of doing this isn't worth it.
(2) also definitely happens. Earlier versions of ARM processors generate data aborts on unaligned accesses, which either cause your program to die with a bus error (or halt the CPU if you're running bare-metal), or force a very expensive trap through the kernel to handle the unaligned access. These earlier ARM chips are still in wide use in older cellphones and embedded devices. Later ARM processors synthesize multiple word accesses to deal with unaligned accesses, but this will result in overall slower performance since you basically double the number of memory loads you need to do.
Many current ("modern") PICs and embedded microprocessors lack the logic to handle unaligned accesses, and may behave unpredictably or even nonsensically when given unaligned addresses (I've personally seen chips that will just mask off the bottom bits, which would give incorrect answers, and others that will just give garbage results with unaligned accesses).
So, this is ridiculously dangerous to use in anything that should be remotely portable. Please, please, please do not use this code; use the libc strlen. It will usually be faster (optimized for your platform properly) and will make your code portable. The last thing you want is for your code to subtly and unexpectedly break in some situation (string near the end of an allocation) or on some new processor.
Donald Knuth, a person who wrote 3+ volumes on clever algorithms, said: "Premature optimization is the root of all evil."
strlen() is used a lot, so it really should be fast. Riffing on wildplasser's remark, "I would trust the library function", what makes you think that the library function works a byte at a time, or is slow?
The title may give folks the impression that the code you suggest is faster than the standard system library strlen(), but I think what you mean is that it is faster than a naive strlen() which probably doesn't get used, anyway.
I compiled a simple C program and looked on my 64-bit system which uses GNU's glibc function. The code I saw was pretty sophisticated and looks pretty fast in terms of working with register width rather than byte at a time. The code I saw for strlen() is written in assembly language so there probably aren't junk instructions as you might get if this were compiled C code. What I saw was rtld-strlen.S. This code also unrolls loops to reduce the overhead in looping.
Before you think you can do better on strlen, you should look at that code, or the corresponding code for your particular architecture, and register size.
And if you do write your own strlen, benchmark it against the existing implementation.
And obviously, if you use the system strlen, then it is probably correct and you don't have to worry about invalid memory references as a result of an optimization in the code.
I agree it's a bletcherous technique, but I suspect it's likely to work most of the time. It's only a segfault if the string happens to be right up against the end of your data (or stack) segment. The vast majority of strings (whether statically or dynamically allocated) won't be.
But you're right, to guarantee it working you'd need some guarantee that all strings were padded somehow, and your list of shims looks about right.
If alignment is a problem, you could take care of that in the fast strlen implementation; you wouldn't have to run around trying to align all strings.
(But of course, if your problem is that you're spending too much time scanning strings, the right fix is not to desperately try to make string scanning faster, but to rig things up so that you don't have to scan so many strings in the first place...)
I'm working on an embedded program. I use the avr-gcc tool chain to compile the C source from my MacBook Pro. Until recently things have been going pretty well. In my latest development iteration though, I seem to have introduced some sort of intermittent bug that I'm suspecting is some sort of stack or other memory corruption error.
I've never used Valgrind, but it seems it gets rave reviews, but most of the references seem to refer to malloc/free types of errors. I don't do any malloc'ing. It's a smallish embedded program, no OS. Can Valgrind help me? Any pointers on how I would use it to help find static memory mismanagement errors in a cross-compiled scenario would be really helpful!
Or is there a different tool or technique I should look at to validate my code's memory management?
Yes, Valgrind can definitely help you. In addition to a lot of heap-based analysis (illegal frees, memory leaks, etc.), its memcheck tool detects illegal reads and writes, i.e. situations when your program accesses memory that it should not access. This analysis does not differentiate between static and dynamic memory: it would report accesses outside of a stack frame, accesses beyond the bounds of a static array, and so on. It also detects use of variables that have not been initialized previously. Both situations are undefined behavior and can lead to a crash.
Frama-C is a static analysis framework (as opposed to Valgrind, which provides dynamic analysis). It was originally designed with embedded, possibly low-level code in mind. Frama-C's “value analysis” plug-in basically detects all the C undefined behaviors that you may want to know about in embedded code (including accessing an invalid pointer).
Since it is a static analyzer, it does not execute the code(*) and is thus ideal in a cross-compilation context. Look for the option -machdep; values for this option include x86_64, x86_32, ppc_32, and x86_16.
Disclaimer: I am one of the contributors of Frama-C's “value analysis” plug-in.
(*) though if you provide all inputs and set precision on maximum, it can interpret the source code as precisely as any cross-compilation+execution would.
I have C code that runs without any memory leak on a 64-bit processor but shows a leak on a 32-bit processor. What could be the reason for this? GCC 4.1.2 is the compiler and Debian is the operating system.
That sounds weird. But it's too vague to answer, for me. Since you're on Linux though, I would recommend that you simply run the 32-bit version under Valgrind, with maximum memory tracking.
Keep in mind that even though your own code is the same, you are still running two distinct programs -- they are linked against different runtime libraries. It could be that some aspect of your code is triggering a leak in one runtime and not the other. If that is the case, it can occur in one of two ways:
a) You've done nothing badly. The problem is in the 32-bit runtime.
or
b) You have something wrong, but something defensive in the 64-bit runtime is masking it.
Really hard to tell without the code. Things that can go wrong:
implicit conversions. In most places narrow data types are promoted to int or unsigned int. If you have implicit assumptions about the width of these, you may get all kinds of things: overflow, undefined behavior, compiler-specific behavior
missing function prototypes. Oldish C assumes that a function it doesn't know returns an int. If in reality it returns, e.g., a pointer, you are in trouble (a sketch follows this list)
pointer to int conversions (or the other way round)
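Here is a hedged illustration of the missing-prototype case (file and function names are made up; it assumes a compiler, such as GCC 4.1, that accepts calls to undeclared functions with only a warning):

/* --- buffer.c --- */
char *make_buffer(void)            /* really returns a pointer */
{
    static char storage[128];
    return storage;
}

/* --- main.c: no header included, so no prototype for make_buffer --- */
int main(void)
{
    /* Without a prototype, old C assumes make_buffer() returns int.
       On a 64-bit target the pointer is squeezed through a 32-bit int,
       so p can end up pointing at garbage; on 32-bit, int and pointers
       happen to be the same size, so it appears to work. */
    char *p = (char *)make_buffer();
    *p = 'x';                      /* possible crash only on 64-bit */
    return 0;
}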
Your compiler is quite old. You should probably try to obtain something more recent, if possible, or try something different like clang.
Compile with all warnings on -Wall -Wextra... and work on your code until it doesn't have any warning at all.
If your problem persists, run it with valgrind as @unwind suggested.
Then, with a concrete problem, come back here so we might help you.
What are the points that should be kept in mind while writing code that should be portable on both 32 bit and 64 bit machines?
Thinking more on this, I feel that if you can add your experience in terms of issues faced, that would help.
Adding further to this: I once faced a problem due to a missing prototype for a function which was returning a pointer. When I ported the code to a 64-bit machine, it was crashing and I had no clue about the reason for quite some time; later I realised that all functions with missing prototypes are assumed to return int, causing the problem.
Any such examples can help.
EDIT: Adding to community wiki.
Some integral types may have different sizes
Pointers are of different lengths
Structure padding
Alignment
On Windows, there is only one calling convention on x64, as opposed to the multiple conventions on a regular x86 machine.
Things get murkier when you have some components which are 32-bit and some 64-bit. On Windows, I ended up writing a COM service to get them to talk.
Pushing pointers onto the stack takes up twice as much space. The stack size may not change between OS versions though, causing code that runs fine in 32-bit to mysteriously fail when compiled and run unchanged on 64-bit. Don't ask me how I know this.
sizeof(int) might != sizeof(void*)
alignment. It's possible that alignment needs might change. This can expose bugs where you have been mistreating things that should have been aligned but were only aligned by accident in 32-bit (or on a processor that doesn't care)
don't pass 0 into varargs if the receiver is expecting a pointer (see the sketch below). This is painful in C++, where savvy devs know that 0 is a valid null pointer. A C dev will usually use NULL, so you are probably OK
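A hedged sketch of that varargs pitfall, using execl as the pointer-expecting receiver:

#include <stddef.h>
#include <unistd.h>

void list_directory(void)
{
    /* Formally wrong: the literal 0 is passed as an int, but execl reads
       back a char * as the argument-list terminator.  Where int and
       pointers differ in size, the missing bytes are whatever happens to
       be lying around. */
    /* execl("/bin/ls", "ls", "-l", 0); */

    /* Portable: pass an actual null pointer as the terminator. */
    execl("/bin/ls", "ls", "-l", (char *)NULL);
}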
Gotchas:
Casting pointers to integer types is dangerous (a sketch follows this list)
Data structure sizes can change
Watch out for sign extension
Different ABI?
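As a small sketch of the first gotcha (the function names are illustrative), round-tripping a pointer through a plain int truncates it on LP64, while intptr_t from <stdint.h> stays wide enough:

#include <stdint.h>

void *bad_roundtrip(void *p)
{
    int cookie = (int)(intptr_t)p;     /* drops the upper 32 bits on LP64 */
    return (void *)(intptr_t)cookie;   /* may no longer equal p */
}

void *good_roundtrip(void *p)
{
    intptr_t cookie = (intptr_t)p;     /* wide enough for any object pointer */
    return (void *)cookie;             /* compares equal to p */
}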
Some tips & tricks I've found helpful:
Get yourself a native-size integer type (from a header or typedef your own) and use it when you have variables that don't care about size.
Use explicit fixed-width variable types wherever possible (uint64_t, int32_t, etc.); see the sketch after this list.
Write automated tests and run them regularly on both platforms.
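A short sketch of the first two tips together (the names are only illustrative): a native-size integer typedef for "don't care" variables, plus exact-width types where the layout matters:

#include <stddef.h>
#include <stdint.h>

typedef ptrdiff_t native_int;     /* tracks the platform's natural width */

struct packet_header {
    uint32_t length;              /* exactly 32 bits on every platform */
    uint64_t timestamp;           /* exactly 64 bits on every platform */
};

size_t count_nonzero(const int32_t *v, size_t n)
{
    size_t hits = 0;
    for (native_int i = 0; i < (native_int)n; i++)   /* index doesn't care about size */
        if (v[i] != 0)
            hits++;
    return hits;
}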