Analyzing the root cause of OutOfMemoryException in WPF app with WinDbg - wpf

I'm having some trouble understanding the crash dump and finding the root cause of the OutOfMemoryException thrown by the WPF application. The exception is thrown after the application has been running for several hours, so this clearly indicates a memory leak.
My first step was to look at the !address -summary command:
--- Usage Summary ---------------- RgnCount ------- Total Size -------- %ofBusy %ofTotal
<unknown> 2043 58997000 ( 1.384 Gb) 71.43% 69.22%
Heap 152 fcc3000 ( 252.762 Mb) 12.74% 12.34%
Image 1050 bc77000 ( 188.465 Mb) 9.50% 9.20%
Stack 699 7d00000 ( 125.000 Mb) 6.30% 6.10%
Free 518 3f6b000 ( 63.418 Mb) 3.10%
TEB 125 7d000 ( 500.000 kb) 0.02% 0.02%
Other 12 36000 ( 216.000 kb) 0.01% 0.01%
PEB 1 1000 ( 4.000 kb) 0.00% 0.00%
--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_PRIVATE 2186 685b7000 ( 1.631 Gb) 84.14% 81.53%
MEM_IMAGE 1710 f3f3000 ( 243.949 Mb) 12.29% 11.91%
MEM_MAPPED 186 46db000 ( 70.855 Mb) 3.57% 3.46%
--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_COMMIT 3366 73fe7000 ( 1.812 Gb) 93.52% 90.62%
MEM_RESERVE 716 809e000 ( 128.617 Mb) 6.48% 6.28%
MEM_FREE 518 3f6b000 ( 63.418 Mb) 3.10%
--- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal
PAGE_READWRITE 1650 5e19e000 ( 1.470 Gb) 75.87% 73.52%
PAGE_EXECUTE_READ 224 bc42000 ( 188.258 Mb) 9.49% 9.19%
PAGE_READWRITE|PAGE_WRITECOMBINE 28 439f000 ( 67.621 Mb) 3.41% 3.30%
PAGE_READONLY 573 3d7b000 ( 61.480 Mb) 3.10% 3.00%
PAGE_WRITECOPY 214 f8f000 ( 15.559 Mb) 0.78% 0.76%
PAGE_EXECUTE_READWRITE 265 d0a000 ( 13.039 Mb) 0.66% 0.64%
PAGE_READWRITE|PAGE_GUARD 357 33b000 ( 3.230 Mb) 0.16% 0.16%
PAGE_EXECUTE_WRITECOPY 55 119000 ( 1.098 Mb) 0.06% 0.05%
--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
<unknown> 78d40000 2350000 ( 35.313 Mb)
Heap 36db0000 fd0000 ( 15.813 Mb)
Image 64a8c000 e92000 ( 14.570 Mb)
Stack 4b90000 fd000 (1012.000 kb)
Free 7752f000 1a1000 ( 1.629 Mb)
TEB 7ede3000 1000 ( 4.000 kb)
Other 7efb0000 23000 ( 140.000 kb)
PEB 7efde000 1000 ( 4.000 kb)
This shows that memory usage is quite high.
Then I look at the GC heap size with the !eeheap -gc command. It shows that the heap is quite big (1.1 GB), which indicates that there is a problem within the managed part of the application.
5fc90000 5fc91000 60c7acd4 0xfe9cd4(16686292)
5a060000 5a061000 5b05e9c0 0xffd9c0(16767424)
56de0000 56de1000 57ddf1c4 0xffe1c4(16769476)
57de0000 57de1000 58ddbbbc 0xffabbc(16755644)
73ff0000 73ff1000 74fe0f5c 0xfeff5c(16711516)
50de0000 50de1000 51dcfa58 0xfeea58(16706136)
5b060000 5b061000 5c05ca54 0xffba54(16759380)
4fde0000 4fde1000 50ddfd8c 0xffed8c(16772492)
Large object heap starts at 0x03921000
segment begin allocated size
03920000 03921000 049013d0 0xfe03d0(16647120)
14850000 14851000 15837380 0xfe6380(16671616)
178d0000 178d1000 1889a3e0 0xfc93e0(16552928)
1a1c0000 1a1c1000 1b1abca8 0xfeaca8(16690344)
40de0000 40de1000 41dc8b48 0xfe7b48(16677704)
42de0000 42de1000 43827170 0xa46170(10772848)
54de0000 54de1000 55dd6d18 0xff5d18(16735512)
Total Size: Size: 0x448fde94 (1150279316) bytes.
------------------------------
GC Heap Size: Size: 0x448fde94 (1150279316) bytes.
Notice that there are 64 segments, each about 16 MB in size. It seems that some data is held in memory and never released.
Next I look at !dumpheap -stat:
65c1f26c 207530 19092760 System.Windows.Media.GlyphRun
65c2c434 373991 20943496 System.Windows.Media.RenderData
68482bb0 746446 26872056 MS.Utility.ThreeItemList`1[[System.Double, mscorlib]]
65c285b4 746448 29857920 System.Windows.Media.DoubleCollection
64c25d58 299568 32353344 System.Windows.Data.BindingExpression
6708a1b8 2401099 38417584 System.WeakReference
67082c2c 1288315 41226080 System.EventHandler
67046f80 1729646 42238136 System.Object[]
64c1409c 206969 52156188 System.Windows.Controls.ContentPresenter
67094c9c 382163 64812664 System.Byte[]
004b0890 159 65181140 Free
64c150d0 207806 72316488 System.Windows.Controls.TextBlock
6708fd04 1498498 97863380 System.String
6848038c 847783 128775772 System.Windows.EffectiveValueEntry[]
As I understand it, there is no single object that takes all the memory. The biggest one is just about 122 MB. Summing up all the sizes (8500 lines of output) gives the 1.1 GB of occupied memory. It seems that the whole object graph is somehow duplicated, kept in memory and never released.
Running !gcroot 6848038c or !gcroot 6708fd04 to inspect how EffectiveValueEntry and System.String are reachable never ends; the output is just too big.
!dumpheap -mt <address> doesn't show me anything that strikes me. !finalizequeue shows that there are many objects (more than 2 million) registered for finalization:
6708a1b8 2401099 38417584 System.WeakReference
Total 2417538 objects
I suspect that the OutOfMemoryException occurs when the application tries to duplicate the object graph and allocate new memory, but I cannot find the root cause of it.
Question: How can I drill down to the root of the problem (what other WinDbg commands can I use to check it)? It seems that not just one object is leaking but the whole object graph. Am I on the right track, or is there something else I'm overlooking? What other hypotheses are there?

Object graph duplication
Your application uses ~1.1 GB of virtual memory for .NET. You can see that directly in the output of !eeheap -gc:
GC Heap Size: Size: 0x448fde94 (1150279316) bytes.
or by summing up the values of !dumpheap -stat.
Summing up all the sizes (8500 lines of output) gives the 1.1 GB
This roughly correlates to the value displayed as <unknown> in !address -summary.
--- Usage Summary ---------------- RgnCount ------- Total Size -------- %ofBusy %ofTotal
<unknown> 2043 58997000 ( 1.384 Gb) 71.43% 69.22%
There is no reason to assume that the whole object graph is being duplicated. This is a normal situation.
OutOfMemory
At the moment, there are 65 MB of virtual memory already committed by .NET and marked as free (from !dumpheap -stat):
004b0890 159 65181140 Free
Unfortunately, those 65 MB are split into 159 smaller regions. To find the largest block among them, run !dumpheap -mt 004b0890.
In addition, .NET could obtain another 63 MB from Windows (from !address -summary):
--- Usage Summary ---------------- RgnCount ------- Total Size --------
Free 518 3f6b000 ( 63.418 Mb)
But the largest block is only 1.6 MB, so that's almost useless:
--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
Free 7752f000 1a1000 ( 1.629 Mb)
So, clearly, the application is out of memory.
Where is the memory?
252 MB are in native heaps. It seems you're using some native DLLs. While that does not seem too much at the moment, it could indicate the presence of pinned objects. Pinned objects cannot be moved or collected by the GC while they are pinned, and they fragment the heap. Look at the output of !gchandles to find out whether that could be part of the problem.
188 MB are in DLLs. You can unload native DLLs which are not in use, but for .NET assemblies, you probably can't do much about that.
125 MB are in stacks. With a default size of 1 MB each, it seems that there are 125 threads in your application. Use !clrstack to find out what they are doing and why they have not finished yet. Potentially each thread works on something and has not freed its objects yet. If you have it under your control, do not start so many threads in parallel; e.g. use only 8 threads and wait for them to finish before starting the next piece of work.
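To get the managed stacks of all threads in one go, something like the following should work (the ~*e prefix simply repeats a command for every thread in the process):
~*e !clrstack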
Of course the majority of memory is used by .NET objects. However, you drew a few wrong conclusions.
As I understand it, there is no single object that takes all the memory. The biggest one is just about 122 MB.
Note that there is not a single EffectiveValueEntry[] object eating 122 MB of memory; there are 847,783 of them. This changes the question from "Why does this object use so much memory?" to "Why are there so many of them?". For example, why does your application need 207,806 TextBlocks? Is it really displaying that much text?
Using !gcroot is a good idea. However, you used it with the address of a method table instead of an object:
!gcroot 6848038c
!gcroot 6708fd04
These were both numbers from the output of !dumpheap -stat. Using them in !gcroot should have given a warning like
Please note that 6848038c is not a valid object.
Instead, !gcroot works on individual objects only, which you get from !dumpheap without the -stat parameter.
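So the rough workflow is: list the individual objects for one of the suspicious method tables, pick any address from that list and ask for its roots. Using the EffectiveValueEntry[] method table from your !dumpheap -stat output as an example:
!dumpheap -mt 6848038c
!gcroot <address of one object printed by the previous command>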
You can use !traverseheap filename.log to dump all objects into a file compatible with CLR Profiler (CodePlex). Note that CLR Profiler cannot read the -xml format. After loading the object information, Heap Graph is probably the most useful button for you.
To find out what triggered the OutOfMemoryException, you can use the !u command. You'll need to read some MSIL code to understand what it does. See also: How to identify array type. However, in your scenario I guess that's useless, because even small objects could trigger it.

In addition to the other answer, you can try to visualize the memory addresses and the GC heap with the WinDbg extension cosos gcview.

Related

Linker remove unused execution paths when linking with archive file

I have an embedded C application which is developed using the CrossWorks for ARM toolchain.
This project targets a specific processor which is getting old and hard to source, we are working towards revising our design with a new processor. So my plan is to divide the source code into a set of low level driver code which targets the old processor, and another set of common code which will be able to compile on both processors.
I got started making a drivers project which compiles down to a drivers.a file. Currently this file is literally empty. Its entire contents are
!<arch>
The problem I have is that merely including this file in the compilation of the common code causes much bloating of the compiled size; the resulting binary is about 33% larger...
Below is an example of the size of some of the sections from the map file; the symbols listed are the FatFs functions.
Size without drivers.a Size with drivers.a
f_close 76 f_close 148
f_closedir 84 f_closedir 136
f_findfirst 48 f_findfirst 108
f_findnext 116 f_findnext 144
f_getfree 368 f_getfree 636
f_lseek 636 f_lseek 1,148
f_mkdir 488 f_mkdir 688
f_mount 200 f_mount 256
f_open 1,096 f_open 1,492
f_opendir 324 f_opendir 472
f_read 564 f_read 1,132
f_readdir 176 f_readdir 268
f_stat 156 f_stat 228
f_sync 244 f_sync 440
f_unlink 380 f_unlink 556
f_write 668 f_write 1,324
So clearly, because of the additional drivers.a file, the linker is unable to determine that certain parts of the code are unreachable, due to the possibility that the linked-in drivers.a code might call those routines. This makes sense I guess, but I need a way to get around it so that I can divide the code into separately maintainable pieces while still compiling down as efficiently as before.
I had not realized that linking *.a files could have this consequence; I previously had the mental image that *.a files were no different from a bunch of *.o files effectively tar'ed together into a single file. Clearly this isn't the case.
It turns out this has nothing to do with merely linking in the drivers.a file...
The way I had my project set up, the compiler options were changing when I included drivers.a. Effectively, I believe I was actually comparing Debugging Level 3 to Debugging Level 2, in which case the added binary size is understandable.

"Cannot allocate memory" issue with shared memory using shmat command in C

I have two programs in C that need to communicate with each other. There is a single variable that I am storing in shared memory using shmget(key, 27, IPC_CREAT | 0666) in one program. I update this variable every second. From the other program, I access it every second using shmget(key, 27, 0666).
This works great, but after a while (usually a few hours) the program that retrieves the data crashes with a segfault. I used gdb to pinpoint the segfault to the shmget(key, 27, 0666) line. The error code returned is:
ENOMEM Could not allocate memory for the descriptor or for the page tables.
When I check the shared memory segments from the command prompt using ipcs -m, I currently see this:
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 65536 root 600 393216 2 dest
0x00000000 98305 root 600 393216 2 dest
0x00000000 131074 root 600 393216 2 dest
0x00000000 163843 root 600 393216 2 dest
0x00000000 196612 root 600 393216 2 dest
0x00000000 229381 root 600 393216 2 dest
0x00000000 262150 root 600 393216 2 dest
0x00000000 294919 root 600 393216 2 dest
0x00000000 327688 root 600 393216 2 dest
0x00000000 589833 root 600 393216 2 dest
0x00000000 655370 root 600 393216 2 dest
0x00000000 524299 root 600 393216 2 dest
0x00000000 688140 root 666 27 0
0x0008aa53 720909 root 666 27 31950
0x0006f855 753678 root 666 27 33564
It seems to me like the shared memory I'm using is hitting some kind of maximum, but I'm not sure what to do about that, and I'm finding precious little info by searching Google. Any thoughts? This program needs to run for ~24 hours at a time at least, if not longer.
Thank you in advance.
You seem to misunderstand how to use Sys V shared memory segments. You should not need to perform a shmget() more than once in the same process for the same shared memory segment. You are meant to get a segment ID via shmget(), attach it to your memory space via shmat(), and thereafter simply read and/or write it as many times as you want. Modifications will be visible to other processes that have attached the same shared memory segment.
If you nevertheless do attach the memory segment multiple times, then you must be sure to also detach it each time via shmdt(), else, yes, you will eventually fill up the process's whole address space.
In addition, to use shared memory properly, you need some kind of synchronization mechanism. For this purpose, the natural complement to Sys V shared memory segments is Sys V semaphores. You use this to prevent one process from reading while another process is writing the same data.
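For illustration, here is a minimal sketch of what the reading side could look like under those rules (the key value is just taken from your ipcs output as a stand-in for whatever key your programs actually agree on, and error handling is trimmed). shmget() and shmat() each run exactly once; only the reads repeat:
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <unistd.h>

int main(void)
{
    key_t key = 0x0008aa53;             /* assumption: the key your writer uses */
    int shmid = shmget(key, 27, 0666);  /* once: look up the existing segment */
    if (shmid == -1) { perror("shmget"); return 1; }

    char *data = shmat(shmid, NULL, SHM_RDONLY);  /* once: attach it */
    if (data == (char *)-1) { perror("shmat"); return 1; }

    for (int i = 0; i < 60; i++) {      /* afterwards, just read the memory */
        printf("current value: %.27s\n", data);
        sleep(1);
    }

    shmdt(data);                        /* detach once, when you are done */
    return 0;
}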

Write one billion files in one Folder BUT "(No space left on device)" error

I'm trying to write 1 billion files in one folder using multiple threads, but after my program had written 20 million files I got "No space left on device". I did not close my program because it is still writing the same files.
I don't have any problems with inodes; only 7% are used.
There is no problem with /tmp or /var/tmp; they are empty.
I increased fs.inotify.max_user_watches to 1048576.
I use Debian with EXT4 as the filesystem.
Has anyone else run into this problem? Thank you so much for your help.
Running tune2fs -l /path/to/drive gives
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 260276224
Block count: 195197952
Reserved block count: 9759897
Free blocks: 178861356
Free inodes: 260276213
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 1024
Blocks per group: 24576
Fragments per group: 24576
Inodes per group: 32768
Inode blocks per group: 2048
Flex block group size: 16
Filesystem created: ---
Last mount time: ---
Last write time: ---
Mount count: 2
Maximum mount count: -1
Last checked: ---
Check interval: 0 ()
Lifetime writes: 62 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: ---
Directory Hash Seed: ---
Journal backup: inode blocks
Check this question:
How to store one billion files on ext4?
You have fewer blocks than inodes, which is not going to work, though I think that is the least of your problems. If you really want to do this (would a database be better?), you may need to look into filesystems other than ext4; ZFS springs to mind as an option that allows 2^48 entries per directory and should do what you want.
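As a rough sanity check from the tune2fs output above (assuming every file needs at least one 4096-byte data block):
195,197,952 blocks  x 4,096 bytes ≈ 745 GiB of total capacity
1,000,000,000 files x 4,096 bytes ≈ 3.7 TiB needed for the data alone
260,276,224 inodes  < 1,000,000,000 files
So even leaving the per-directory behaviour aside, the filesystem as sized cannot hold a billion non-empty files.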
If this question https://serverfault.com/questions/506465/is-there-a-hard-limit-to-the-number-of-files-a-directory-can-have is anything to go by, there is a limit on the number of files per directory using ext4 which you are likely hitting

Is there a way to calculate I/O and memory of current process in C?

If I use
/usr/bin/time -f"%e,%P,%M,%I,%O"
I get (for the last three placeholders) the memory the process used and whether there was some input and output during its run.
Obviously, it's easy to get %e or something like it using sys/time.h, but is there a way to get %M, %I and %O programmatically?
You could read and parse the files in the /proc filesystem. /proc/self refers to the process accessing the /proc filesystem.
/proc/self/statm contains information about memory usage, measured in pages. Sample output:
% cat /proc/self/statm
1115 82 63 12 0 79 0
Fields are size resident share text lib data dt; see the proc manual page for some additional details.
/proc/self/io contains the I/O for the current process. Sample output:
% cat /proc/self/io
rchar: 2012
wchar: 0
syscr: 6
syscw: 0
read_bytes: 0
write_bytes: 0
cancelled_write_bytes: 0
Unfortunately, io isn't documented in the proc manual page (at least on my Debian system). I had to check the iotop source code to see how it obtains the per-process I/O information.
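As a minimal sketch of that approach (Linux-specific, error handling trimmed), a process can read its own numbers like this; note that statm reports the current resident size in pages, whereas time's %M is the peak, so treat the two as related but not identical:
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Memory: /proc/self/statm is one line of page counts
       (size resident share text lib data dt). */
    FILE *f = fopen("/proc/self/statm", "r");
    if (f) {
        long size, resident, share, text, lib, data, dt;
        if (fscanf(f, "%ld %ld %ld %ld %ld %ld %ld",
                   &size, &resident, &share, &text, &lib, &data, &dt) == 7)
            printf("resident: %ld pages (%ld kB)\n",
                   resident, resident * sysconf(_SC_PAGESIZE) / 1024);
        fclose(f);
    }

    /* I/O: /proc/self/io is "key: value" per line. */
    f = fopen("/proc/self/io", "r");
    if (f) {
        char line[128];
        while (fgets(line, sizeof line, f)) {
            if (strncmp(line, "rchar:", 6) == 0 ||
                strncmp(line, "wchar:", 6) == 0 ||
                strncmp(line, "read_bytes:", 11) == 0 ||
                strncmp(line, "write_bytes:", 12) == 0)
                printf("%s", line);   /* echo the counters we care about */
        }
        fclose(f);
    }
    return 0;
}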

Determining CPU utilization- solaris unix

I was just going through SO and found the question Determining CPU utilization.
The question is interesting, and the answer is even more interesting.
So I thought I would do some checks on my Solaris SPARC Unix system.
I went to /proc as the root user and found some directories with numbers as their names.
I think these numbers are the process IDs. Surprisingly, I did not find /stat (don't know why).
I took one process ID (one directory) and checked what's present inside it. Below is the output:
root#tiger> cd 11770
root#tiger> pwd
/proc/11770
root#tiger> ls
as contracts ctl fd lstatus lwp object path psinfo root status watch
auxv cred cwd lpsinfo lusage map pagedata priv rmap sigact usage xmap
I checked what those files are:
root#tigris> file *
as: empty file
auxv: data
contracts: directory
cred: data
ctl: cannot read: Invalid argument
cwd: directory
fd: directory
lpsinfo: data
lstatus: data
lusage: data
lwp: directory
map: TrueType font file version 1.0 (TTF)
object: directory
pagedata: cannot read: Arg list too long
path: directory
priv: data
psinfo: data
rmap: TrueType font file version 1.0 (TTF)
root: directory
sigact: ascii text
status: data
usage: data
watch: empty file
xmap: TrueType font file version 1.0 (TTF)
Given this, I am not sure how to determine the CPU utilization.
For example: what is the idle time of my process?
Can anyone point me in the right direction?
Preferably with an example!
As no one else is taking the bait, I'll add some comments/answers.
First off, did you check out the info available for Solaris System Tuning? That is for old Solaris (2.6, 7 & 8); presumably a little searching at developers.sun.com will find something newer.
You wrote:
I went to /proc as the root user and found some directories with numbers as their names. I think these numbers are the process IDs. Surprisingly, I did not find /stat (don't know why).
Many non-Linux OS's have their own special conventions on how processes are managed.
For Solaris, the /proc directory is not a directory of disk-based files, but information about all of the active system processes arranged like a directory hierarchy. Cool, right?!
I don't know the exact meaning of stat (status? statistics? something else?), but that is just the convention used in a different OS's directory structure that holds the process information.
As you have discovered, below /proc/ are a bunch of numbered entries; these are the active process IDs. When you cd into any one of those, you're seeing the system information available for that process.
I checked what those files are: ....
I don't have access to Solaris servers any more, so we'll have to guess a little. I recommend 'drilling down' into any file or directory whose name hints at anything related.
Did you try cat psinfo? What did that produce?
If the Solaris tuning page didn't help, is your apropos working? Do apropos proc and see what man pages are mentioned; drill down on those. Else try man proc, and look near the bottom of the entry for the 'See Also' section AND for the Examples section.
(Un)?fortunately, most man pages are not tutorials, so reading through these may only give you an idea on how much more you need to study.
You know about the built-in commands that offer some performance monitoring capabilities, i.e. ps, top, etc?
And the excellent AIX-based nmon has been/is being? ported to Solaris too, see http://sourceforge.net/projects/sarmon/.
There are also expensive monitoring/measuring/utilization tools that network managers like to have, as a mere developer, we never got to use them. Look at the paid ads when you google for 'solaris performance monitoring'.
Finally, keep in mind this excellent observation from the developer of the nmon AIX system monitor, included in the FAQ for nmon:
If you keep using shorter and shorter periods you will eventually see that the CPUs are either 100% busy or 100% idle all the other numbers are just a feature of humans not thinking fast enough and having to average out the CPU use in longer periods.
I hope this helps.
There is no simple and accurate way to get the CPU utilization from Solaris /proc hierarchy.
Unlike Linux, which uses it to store various system information and statistics, Solaris only presents process-related data under /proc.
There is also another difference: Linux usually presents preprocessed, readable data (text), while Solaris always presents the actual kernel structures or raw data (binary).
All of this is fully documented in the 46-page Solaris proc manual page (man -s 4 proc).
While it would be possible to get the CPU utilization by summing the per-process usage from this hierarchy, i.e. by reading the /proc/<pid>/xxx files, the usual way is through the Solaris kstat (kernel statistics) interface. Moreover, the former method would be inaccurate, as it misses CPU usage that is not accounted to processes but directly to the kernel.
kstat (man -a kstat) is what all the usual commands that report what you are looking for, like vmstat, iostat, prstat, sar, top and the like, use under the hood.
For example, CPU usage is displayed in the last three columns of vmstat output (us, sy and id, for time spent in userland, in the kernel and idling).
$ vmstat 10 8
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr cd s0 -- -- in sy cs us sy id
0 0 0 1346956 359168 34 133 96 0 0 0 53 11 0 0 0 264 842 380 9 7 84
0 0 0 1295084 275292 0 4 4 0 0 0 0 0 0 0 0 248 288 200 2 3 95
0 0 0 1295080 275276 0 0 0 0 0 0 0 3 0 0 0 252 271 189 2 3 95
0 0 0 1295076 275272 0 14 0 0 0 0 0 0 0 0 0 251 282 189 2 3 95
0 0 0 1293840 262364 1137 1369 4727 0 0 0 0 131 0 0 0 605 1123 620 15 19 66
0 0 0 1281588 224588 127 561 750 1 1 0 0 89 0 0 0 438 1840 484 51 15 34
0 0 0 1275392 217824 31 115 233 2 2 0 0 31 0 0 0 377 821 465 20 8 72
0 0 0 1291532 257892 0 0 0 0 0 0 0 8 0 0 0 270 282 219 2 3 95
If for some reason you don't want to use vmstat, you can get the kstat counters directly with the kstat command, but that would be cumbersome and less portable.
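For completeness, here is a rough, untested sketch of the programmatic route via libkstat (compile with -lkstat); the cpu_stat module name and the CPU_IDLE/CPU_USER/CPU_KERNEL/CPU_WAIT indices are my recollection of what <sys/sysinfo.h> defines, so double-check them against your system headers:
#include <stdio.h>
#include <kstat.h>
#include <sys/sysinfo.h>    /* cpu_stat_t and the CPU_* tick indices */

int main(void)
{
    kstat_ctl_t *kc = kstat_open();
    if (kc == NULL) { perror("kstat_open"); return 1; }

    /* There is one "cpu_stat" kstat per CPU; instance 0 is the first CPU. */
    kstat_t *ksp = kstat_lookup(kc, "cpu_stat", 0, NULL);
    if (ksp == NULL || kstat_read(kc, ksp, NULL) == -1) {
        perror("cpu_stat");
        kstat_close(kc);
        return 1;
    }

    cpu_stat_t *cs = (cpu_stat_t *)ksp->ks_data;
    printf("idle=%u user=%u kernel=%u wait=%u (ticks since boot)\n",
           cs->cpu_sysinfo.cpu[CPU_IDLE],
           cs->cpu_sysinfo.cpu[CPU_USER],
           cs->cpu_sysinfo.cpu[CPU_KERNEL],
           cs->cpu_sysinfo.cpu[CPU_WAIT]);

    kstat_close(kc);
    return 0;
}
Utilization is then the change in those counters over an interval (e.g. the user-tick delta divided by the total-tick delta), which is exactly what vmstat computes for you; per-process figures would come from the /proc files discussed above instead.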
