Static keyword workaround for "out of memory" error?

Is there some known VB6 problem or issue for which using a static variable could avoid an out of memory error?
Details:
Got a unique end user problem report which described an error #7 "out of memory" from the VB6 application I support. This is the only case of the problem that I know of; I have not been able to reproduce it locally.
The error report indicated the procedure which failed; in that proc I found the following:
'the below is static because it didn't fit on the stack
Static obReport As clsReport
'this makes it work like it did when it was on the stack
Set obReport = Nothing
Set obReport = New clsReport
Maybe someone in years past had a similar error and came up with this hack as a workaround. We don't have this pattern anywhere else that I have seen.
As far as I can tell, the reported "out of memory" error does not occur until later in the code, long after these lines have been executed.
The (ancient) documentation reference for this error message doesn't seem to offer anything corresponding to this. Google/SO searching didn't turn anything up either.
My interpretation is that the author was trying to free up stack space by allocating the obReport variable to the heap by making it Static. I can imagine someone thinking this could "save" on memory somehow.
But this snippet may just be nonsense... if anything, the Static keyword only gets the object reference off the stack, not the actual object, which I think would be on the heap anyway. I can't see how this could have resolved any problem unless it is a VB6 quirk/bug which just can't be reasoned about normally.
(Or, I'm just wrong - enlighten me!)

Some of the ancient versions of VB, even back into VBDOS, had quirks like the one you describe, but I don’t think they were related to classes. We used to call the solutions “programming by witchcraft” and they typically were used to trigger garbage collection. The instructions themselves didn’t do anything at all, but their presence would cause garbage collection to occur at different locations in the code. The giveaway was usually “DO NOT MOVE OR REMOVE THE BELOW CODE”.
I do see some things that, when viewed from a larger perspective, might lead to a solution. “unique end user problem” and “This is the only case of the problem that I know of; I have not been able to reproduce it locally” and “the reported ‘out of memory’ error does not occur until later in the code”. And from VBA documentation you linked: “You have too many applications, documents, or source files open. Close any unnecessary applications, documents, or source files that are open.” The only “out of memory” or “out of resources” I have seen like this occurred on machines where the end user had many, many programs and/or browsers or browser windows open when running with a very small amount of RAM.
So examine the end user’s run time environment using Remote Desktop or some such while the error is displayed. It may be a valid error message.
From VBA documentation you linked: “You have a module or procedure that's too large. Break large modules or procedures into smaller ones. This doesn't save memory, but it can prevent hitting 64K segment boundaries.” Have you tried this? It would be the next step.
You may just have a memory leak. Check for this using Task Manager to see if you have multiple instances of obReport created and not being terminated properly. "The procedure's job is to gather data then show some reports." How much data and how many reports? Could it be a problem trying to handle a huge amount of data either by design or programming error that is causing the out of memory error?

Related

Global variables in driver affecting one another

I am experiencing a weird problem regarding global variables in a WFP driver I'm writing. After writing a new piece of code I started to get 0x18 bugchecks in an entirely different part of my code, not related to the new one; moreover, it occurred even before the new piece of code was executed. After a short analysis I noticed that it was caused by the declaration of a global variable, which, with the new piece commented out, was omitted by the compiler. As soon as the variable was used, it got placed in memory and the bugchecks started to occur. My first thought was that maybe I should try changing the order in which the variables are declared, and thus how they are placed in memory (I think). As it turned out, after a couple of tries I was able to run the code correctly. An example of code that is NOT working:
PKEVENT gPacketEvent;
HANDLE gFilteringEngineHandle;
HANDLE gInjectionEngineHandle;
PVOID gUASharedMem;
PVOID gSharedMem;
UINT64 gActiveFilter;
UINT32 gCalloutID;
PMDL gMdl;
I am guessing that one or more of the types I used above (PMDL and PKEVENT?) can cause such problems, but unfortunately I have never seen such behaviour, and I haven't been able to find anything useful on Google. I was able to develop this code for a while, changing the order back and forth to find a working one, but it would be great to get to the bottom of this... Any ideas why this happens?

How to debug memory that is changed randomly

My application is a multi-thread program that runs on Solaris.
Recently, I found it may crash, and the reason is that one member in a pointer array is changed from a valid value to NULL, so when it is accessed, the program crashes.
Because the occurrence rate is very low (in the past 2 months it has only occurred twice), and the changed members in the array aren't the same, I can't find steps to reproduce it, and reviewing the code turned up no valuable clues.
Could anyone give some advice on how to debug this kind of randomly-changed-memory issue?
Since you aren't able to reproduce the crash, debugging it isn't going to be easy.
However, there are some things you can do:
Go through the code and make a list of all of the places in the code that write to that variable--particularly the ones that could write a NULL to it. It's likely that one of them is your culprit.
Try to develop some kind of torture test that makes the fault more likely to occur (e.g. running through simulated or random transactions at top speed). If you can reproduce the crash this way you'll be in a much better situation, as you can then analyze the actual cause of the crash instead of just speculating.
If possible, run the program under Valgrind or Purify or similar. If they give any warnings, track down what is causing those warnings and fix it; it's possible that your program is, e.g., accessing memory that has been freed, which might seem to work most of the time (if the freed memory hasn't been reused for anything when it is accessed) but would fail occasionally (when something is reusing it).
Add a memory checker like Electric Fence to your code, or just replace free() with a custom version that overwrites the free memory with random garbage in the hopes that this will make the crash more likely to occur.
Recompile your program using different compilers (especially new/fancy ones like clang++ with the static analyzer enabled) and fix whatever they warn about. This may point you to your problem.
Run the program under different hardware and OS's; sometimes an obscure problem under one OS gives really obvious symptoms on another.
Review the various machines where the crash is known to have occurred. Do they all have anything in common? What about the machines where it hasn't crashed? Is there something different about them?
Step 2 is really the most important one, because even if you think you have fixed the problem, you won't be able to prove it unless you can reproduce the crash in the old code, and cannot reproduce it with the fixed code. Without being able to reproduce the fault, you're just guessing about whether a particular code change actually helps or not.

Methods/Tools for solving a Mystery Segfault while running on condor

I'm writing a C application which is run across a compute cluster (using condor). I've tried many methods to reveal the offending code but to no avail.
Clues:
On average, when I run the code on 15 machines for 2 days, I get two or three segfaults (signal 11).
When I run the code locally I do not get a segfault. I ran it for nearly 3 weeks on my home machine.
Attempts:
I ran the code in Valgrind for four days locally with no memory errors.
I captured the segfault signal by defining my own signal handler so that I can output some of the program state.
Now when a segfault happens I can print out the current stack using backtrace.
I can print out variable values.
I created a variable which is set to the current line number.
I have also tried commenting out chunks of the code, hoping that if the problem goes away I will have located the offending code.
Sadly, the line number output is fairly random. I'm not entirely sure what I can do with the stack trace. Am I correct in assuming that it only records the address of the function in which the segfault occurs?
Suspicions:
I suspect that the checkpointing system which condor uses to move jobs across machines is more sensitive to memory corruption, and this is why I don't see it locally.
That indices are being corrupted by the bug, and that these indices are causing the segfault. This would explain the fact that the segfaults are occurring on fairly random line numbers.
UPDATE
Researching this some more I've found the following links:
LibSegFault - a library for automatically catching and printing state data about segfaults.
Stack unwinding (stack trace) with GCC - a tutorial on catching segfaults and getting the line numbers of the offending instructions.
UPDATE 2
Greg suggested looking at the condor log and to 'correlate the segfaults to when condor restarts the executable from a checkpoint'. Looking at the logs the segfaults all occur immediately after a restart. All of the failures appear to occur when a job switches from one type of machine to another type.
UPDATE 3
The segfault was being caused by differences between hosts; by setting the 'requirements' field in the condor submit file, the problem completely disappeared.
One can set individual machines:
requirements = machine == "hostname1" || machine == "hostname2"
or an entire class of machines:
requirements = classOfMachinesName
See requirements example here
If you can, compile with debugging symbols and run under gdb.
Alternatively, get a core dump and load that into the debugger.
MPICH has a built-in debugger, or you can buy a commercial parallel debugger.
Then you can step through the code in the debugger to see what is happening.
http://nmi.cs.wisc.edu/node/1610
http://nmi.cs.wisc.edu/node/1611
Can you create a core dump when your segfault happens? You can then debug this dump to try to figure out the state of the code when it crashed.
Look at what instruction caused the fault. Was it even a valid instruction, or are you trying to execute data? If valid, what memory is it trying to access? Where did this pointer come from? You need to narrow down the location of your fault (stack corruption, heap corruption, uninitialized pointer, accessing invalid memory). If it's a corruption, see if there's any tell-tale data in the corrupted area (pointers to symbols, data that looks like something in your structures, ...). Your memory allocator may already have built-in features to debug some corruption (see MALLOC_CHECK_ on Linux or MallocGuardEdges on Mac OS). A common case for these is using memory that has been free()'d, so logging your malloc() / free() pairs might help.
If you have used the condor_compile tool to relink your code with the condor checkpointing code, it does a few things differently than a normal link. Most importantly, it statically links your code and uses its own malloc. Another big difference is that condor will then run it on a foreign machine, where the environment may be different enough from what you expect to cause problems.
The executable generated by condor_compile is runnable as a standalone binary outside of the condor system. If you run the binary emitted from condor_compile locally, outside of condor, do you still see the segfaults?
If it doesn't, can you correlate the segfaults to when condor restarts the executable from a checkpoint (the user log will tell you when this happens).
You've tried most of what I'd think of. The only other thing I'd suggest is start adding a lot of logging code and hope you can narrow down where the error is happening.
The one thing you do not say is how much flexibility you have to solve the problem.
Can you, for example, have the system come to a halt and just run your application?
Also, how important are these crashes to solve? I am assuming that, for the most part, you do need to solve them. This may require a lot of resources.
The short-term step is to put tons of asserts (semi-handwritten) on each variable to make sure it hasn't changed when you don't want it to. This can continue to work as you go through the long-term process.
Long term, try running it on a cluster of two (maybe your home computer and a VM). Do you still see the segfaults? If not, increase the cluster size until you start seeing them.
Run it on a minimal configuration (one that still produces segfaults) and record all your inputs until a crash. Automate running the system with the inputs you recorded, tweaking them until you can consistently get a crash with minimal input.
At that point look around. If you still can't find the bug, then you will have to ask again with some extra data you gathered with those runs.

Need help with buffer overrun

I've got a buffer overrun I absolutely can't seem to figure out (in C). First of all, it only happens maybe 10% of the time or so. The data that it is pulling from the DB each time doesn't seem to be all that much different between executions... at least not different enough for me to find any discernible pattern as to when it happens. The exact message from Visual Studio is this:
A buffer overrun has occurred in hub.exe which has corrupted the program's internal state. Press Break to debug the program or Continue to terminate the program.
For more details please see Help topic 'How to debug Buffer Overrun Issues'.
If I debug, I find that it is broken in __report_gsfailure() which I'm pretty sure is from the /GS flag on the compiler and also signifies that this is an overrun on the stack rather than the heap. I can also see the function it threw this on as it was leaving, but I can't see anything in there that would cause this behavior, the function has also existed for a long time (10+ years, albeit with some minor modifications) and as far as I know, this has never happened.
I'd post the code of the function, but it's decently long and references a lot of proprietary functions/variables/etc.
I'm basically just looking for either some idea of what I should be looking for that I haven't or perhaps some tools that may help. Unfortunately, nearly every tool I've found only helps with debugging overruns on the heap, and unless I'm mistaken, this is on the stack. Thanks in advance.
You could try putting some local variables on either end of the buffer, or even sentinels into the (slightly expanded) buffer itself, and trigger a breakpoint if those values aren't what you think they should be. Obviously, using a pattern that is not likely in the data would be a good idea.
While it won't help you in Windows, Valgrind is by far the best tool for detecting bad memory behavior.
If you are debugging the stack, you need to get down to low-level tools: place canaries in the stack frame (perhaps buffers filled with something like 0xA5) around any potential suspects. Run the program in a debugger and see which canaries no longer contain the right contents. You will gobble up a large chunk of stack doing this, but it may help you spot exactly what is occurring.
One thing I have done in the past to help narrow down a mystery bug like this was to create a variable with global visibility named checkpoint. Inside the culprit function, I set checkpoint = 0; as the very first line. Then, I added ++checkpoint; statements before and after function calls or memory operations that I even remotely suspected might be able to cause an out-of-bounds memory reference (plus peppering the rest of the code so that I had a checkpoint at least every 10 lines or so). When your program crashes, the value of checkpoint will narrow down the range you need to focus on to a handful of lines of code. This may be a bit of overkill; I do this sort of thing on embedded systems (where tools like Valgrind can't be used), but it should still be useful.
Wrap it in an exception handler and dump out useful information when it occurs.
Does this program recurse at all? If so, I'd check there to ensure you don't have an infinite recursion bug. If you can't see it manually, sometimes you can catch it in the debugger by pausing frequently and observing the stack.

c runtime error message

This error appeared while creating a file using fopen in the C programming language:
The NTVDM CPU has encountered an illegal instruction. CS:0000 IP0075 OP:f0 00 f0 37 05. Choose 'Close' to terminate the application.
This kind of thing typically happens when a program tries to execute data as code. In turn, this typically happens when something tramples the stack and overwrites a return address.
In this case, I would guess that "IP0075" is the instruction pointer, and that the illegal instructions executed were at address 0x0075. My bet is that this address is NOT mapped to the app's executable code.
UPDATE on the possible connection with 'fopen': The OP states that deleting the fopen code makes the problem go away. Unfortunately, this does not prove that the fopen code is the cause of the problem. For example:
The deleted code may include extra local variables, which may mean that the stack trampling is hitting the return address in one case ... and in the other case, some word that is not going to be used.
The deleted code may cause the size of the code segment to change, causing some significant address to point somewhere else.
The problem is almost certainly that your application has done something that has "undefined behavior" per the C standard. Anything can happen, and the chances are that it won't make any sense.
Debugging this kind of problem can be really hard. You should probably start by running "lint" or the equivalent over your code and fixing all of the warnings. Next, you should probably use a good debugger and single step the application to try to find where it is jumping to the bad code/address. Then work back to figure out what caused it to happen.
Assuming that it's really the fopen() call that causes problems (it's hard to say without your source code), have you checked that the 2 character pointers that you pass to the function actually point to correctly allocated memory?
Maybe they are not properly initialized?
Hmmm... you did mention NTVDM, which sounds like an old 16-bit application that crashed inside an old command window with application compatibility set somehow. As no code was posted, one can only guess that it's something to do with files (but fopen? how do you know that without showing a hint?). Perhaps there was a particular file name longer than the conventional 8.3 DOS filename convention and it borked when attempting to read it, or the 16-bit application is running inside a folder with a name that is, again, longer than 8.3?