Finding the true memory footprint of a Windows application - wpf

I've run into a few OutOfMemoryExceptions with my C#/WPF application and I'm running into some confusing data while attempting to profile the memory usage.
When the app is typically running, Windows Task Manager shows the memory usage as somewhere around 34 MB (bounces around slightly as objects are created and garbage collected). When I run memory profiling applications such as CLR Profiler and dotTrace Memory, they show the total memory usage at around 1.2 MB.
Why this huge discrepancy? What does Task Manager see that these profilers do not?
UPDATE: I added some diag code to my application to print out various memory information every so often via the Process class.
While running my app, I set up a rule in DebugDiag to perform a memory dump in the event of an exception. I forced an exception and the memory dump occurred. At this point, the memory usage of my app (as shown by task manager) jumped from 32 MB to 145 MB and remained there.
You can see this jump in the table below (WorkingSet64). I'm still trying to make sense of all the types of memory info provided by the Process class. How would an external application make the working set of my app grow like this?
Link to data table here.

Using some of the diagnostics tools suggested here, plus the ANTS memory profiler (which is so money) I found the source of the leak.
WPF Storyboard animations leak under .NET 3.5
The WPF BitmapEffect class can cause leaks. The alternative "Effect" class fixes the leak. Link, Link
XAML Merged ResourceDictionaries can cause leak. Link, Link
The "Working Set" memory footprint of an application (memory shown by task manager) is not a good indication of your process' footprint. Outside applications can influence this. Link
The memory profiling tools helped me find that the leaks were mostly in unmanaged code, which made it a real pain to track down. Dealing with these leaks, plus a better understanding of Windows memory (private vs working set) cleared things up.

Prcess Explorer and VMMap, both part of the Sysinternals Suite by Mark Russinovich.

Related

WindowsCodecs adding memory footprint of WPF application

I downloaded Red gate ANTS Memory Profiler and profiled my app. I noticed that my app had a lot of unmanaged memory allocated (160MB). So I ran the profiler again with unmanaged memory profiling enabled. The break down of the unmanaged memory as follows:
WindowsCodecs: 45.19MB
CLR: 43.21MB
Allocated by Managed code: 30.23MB
Allocated before profiling started: 2.047MB
d3d9: 595.5KB
Other modules: 67.27MB
Other modules also contains the red gate memory profiler which accounted for 64.94MB of the 67.27MB.
From reading the Red Gate documentation, loading the CLR is expected to be around 40MB and my managed code allocating 30MB seems reasonable. But I am confused by the WindowsCodecs taking up 45MB! I am assuming that WindowsCodex is related to video codecs or playback. My app does not use Windows Media Player media control nor does it do any media playback of any sort (audio or video).
Anybody run into this? It appears my app could get a 45MB memory footprint reduction if I can figure out what is pulling in this dependency.

The memory usage shown in memory profiler is different from Process Explorer

I'm learning to use .Net Memory Profiler and testing the memory usage of Microsoft's DataBindingLab sample project.
.Net Memory Profiler shows total live bytes are 2011015, ie. 2MB.
Process Explorer shows Private Bytes are 56MB.
Obviously, there is such a big difference between the value given by the two tools. Where does the remaining 54MB come from?

Memory usage profiled in task manager and memory profiler tools

.Net winform application.
I used several memory profiler, including CLR profiler, DotTrace memory, Net memory profiler.
The tools gave the result that the allocated memory was 38-40M. But I found that working set was 300-400M in task manager(almost the same size as Peak working set or memory or commit size.
So what's the difference between the two results? What do the results mean?
those tools may show you private bytes or managed heap size, this does not include, e.g. memory mapped file, either page file backed or disk file backed, your app may be r/w ing
big mapped file, so working set looks big, or your app just load too many dll/assemblies.
VMMAP (from sysinternals) can give a clear overview of memory type/size in your app.

What makes WPF application startup slow?

I noticed that WPF application startup is sometimes pretty slow. Does anybody know if the the cause is the elements initialization or DLLs loading or something else?
The text below was extracted from this MSDN article on Improving WPF applications startup time (Edit: now merged into WPF Application Startup Time)
Application Startup Time
The amount of time that is required for a WPF application to start can vary greatly. This topic describes various techniques for reducing the perceived and actual startup time for a Windows Presentation Foundation (WPF) application.
Understanding Cold Startup and WarmStartup
Cold startup occurs when your application starts for the first time after a system reboot, or when you start your application, close it, and then start it again after a long period of time. When an application starts, if the required pages (code, static data, registry, etc) are not present in the Windows memory manager's standby list, page faults occur. Disk access is required to bring the pages into memory.
Warm startup occurs when most of the pages for the main common language runtime (CLR) components are already loaded in memory, which saves expensive disk access time. That is why a managed application starts faster when it runs a second time.
Implement a Splash Screen
In cases where there is a significant, unavoidable delay between starting an application and displaying the first UI, optimize the perceived startup time by using a splash screen. This approach displays an image almost immediately after the user starts the application. When the application is ready to display its first UI, the splash screen fades. Starting in the .NET Framework 3.5 SP1, you can use the SplashScreen class to implement a splash screen. For more information, see How to: Add a Splash Screen to a WPF Application.
You can also implement your own splash screen by using native Win32 graphics. Display your implementation before the Run method is called.
Analyze the Startup Code
Determine the reason for a slow cold startup. Disk I/O may be responsible, but this is not always the case. In general, you should minimize the use of external resources, such as network, Web services, or disk.
Before you test, verify that no other running applications or services use managed code or WPF code.
Start your WPF application immediately after a reboot, and determine how long it takes to display. If all subsequent launches of your application (warm startup) are much faster, your cold startup issue is most likely caused by I/O.
If your application's cold startup issue is not related to I/O, it is likely that your application performs some lengthy initialization or computation, waits for some event to complete, or requires a lot of JIT compilation at startup. The following sections describe some of these situations in more detail.
Optimize Module Loading
Use tools such as Process Explorer (Procexp.exe) and Tlist.exe to determine which modules your application loads. The command Tlist <pid> shows all the modules that are loaded by a process.
For example, if you are not connecting to the Web and you see that System.Web.dll is loaded, then there is a module in your application that references this assembly. Check to make sure that the reference is necessary.
If your application has multiple modules, merge them into a single module. This approach requires less CLR assembly-loading overhead. Fewer assemblies also mean that the CLR maintains less state.
Defer Initialization Operations
Consider postponing initialization code until after the main application window is rendered.
Be aware that initialization may be performed inside a class constructor, and if the initialization code references other classes, it can cause a cascading effect in which many class constructors are executed.
Avoid Application Configuration
Consider avoiding application configuration. For example, if an application has simple configuration requirements and has strict startup time goals, registry entries or a simple INI file may be a faster startup alternative.
Utilize the GAC
If an assembly is not installed in the Global Assembly Cache (GAC), there are delays caused by hash verification of strong-named assemblies and by Ngen image validation if a native image for that assembly is available on the computer. Strong-name verification is skipped for all assemblies installed in the GAC. For more information, see Gacutil.exe (Global Assembly Cache Tool).
Use Ngen.exe
Consider using the Native Image Generator (Ngen.exe) on your application. Using Ngen.exe means trading CPU consumption for more disk access because the native image generated by Ngen.exe is likely to be larger than the MSIL image.
To improve the warm startup time, you should always use Ngen.exe on your application, because this avoids the CPU cost of JIT compilation of the application code.
In some cold startup scenarios, using Ngen.exe can also be helpful. This is because the JIT compiler (mscorjit.dll) does not have to be loaded.
Having both Ngen and JIT modules can have the worst effect. This is because mscorjit.dll must be loaded, and when the JIT compiler works on your code, many pages in the Ngen images must be accessed when the JIT compiler reads the assemblies' metadata.
Ngen and ClickOnce
The way you plan to deploy your application can also make a difference in load time. ClickOnce application deployment does not support Ngen. If you decide to use Ngen.exe for your application, you will have to use another deployment mechanism, such as Windows Installer.
For more information, see Ngen.exe (Native Image Generator).
Rebasing and DLL Address Collisions
If you use Ngen.exe, be aware that rebasing can occur when the native images are loaded in memory. If a DLL is not loaded at its preferred base address because that address range is already allocated, the Windows loader will load it at another address, which can be a time-consuming operation.
You can use the Virtual Address Dump (Vadump.exe) tool to check if there are modules in which all the pages are private. If this is the case, the module may have been rebased to a different address. Therefore, its pages cannot be shared.
For more information about how to set the base address, see Ngen.exe (Native Image Generator).
Optimize Authenticode
Authenticode verification adds to the startup time. Authenticode-signed assemblies have to be verified with the certification authority (CA). This verification can be time consuming, because it can require connecting to the network several times to download current certificate revocation lists. It also makes sure that there is a full chain of valid certificates on the path to a trusted root. This can translate to several seconds of delay while the assembly is being loaded.
Consider installing the CA certificate on the client computer, or avoid using Authenticode when it is possible. If you know that your application does not need the publisher evidence, you do not have to pay the cost of signature verification.
Starting in .NET Framework 3.5, there is a configuration option that allows the Authenticode verification to be bypassed. To do this, add the following setting to the app.exe.config file:
<configuration>
<runtime>
<generatePublisherEvidence enabled="false"/>
</runtime>
</configuration>
Compare Performance on Windows Vista
The memory manager in Windows Vista has a technology called SuperFetch. SuperFetch analyzes memory usage patterns over time to determine the optimal memory content for a specific user. It works continuously to maintain that content at all times.
This approach differs from the pre-fetch technique used in Windows XP, which preloads data into memory without analyzing usage patterns. Over time, if the user uses your WPF application frequently on Windows Vista, the cold startup time of your application may improve.
Use AppDomains Efficiently
If possible, load assemblies into a domain-neutral code area to make sure that the native image, if one exists, is used in all AppDomains created in the application.
For the best performance, enforce efficient cross-domain communication by reducing cross-domain calls. When possible, use calls without arguments or with primitive type arguments.
Use the NeutralResourcesLanguage Attribute
Use the NeutralResourcesLanguageAttribute to specify the neutral culture for the ResourceManager. This approach avoids unsuccessful assembly lookups.
Use the BinaryFormatter Class for Serialization
If you must use serialization, use the BinaryFormatter class instead of the XmlSerializer class. The BinaryFormatter class is implemented in the Base Class Library (BCL) in the mscorlib.dll assembly. The XmlSerializer is implemented in the System.Xml.dll assembly, which might be an additional DLL to load.
If you must use the XmlSerializer class, you can achieve better performance if you pre-generate the serialization assembly.
Configure ClickOnce to Check for Updates After Startup
If your application uses ClickOnce, avoid network access on startup by configuring ClickOnce to check the deployment site for updates after the application starts.
If you use the XAML browser application (XBAP) model, keep in mind that ClickOnce checks the deployment site for updates even if the XBAP is already in the ClickOnce cache. For more information, see ClickOnce Security and Deployment.
Configure the PresentationFontCache Service to Start Automatically
The first WPF application to run after a reboot is the PresentationFontCache service. The service caches the system fonts, improves font access, and improves overall performance. There is an overhead in starting the service, and in some controlled environments, consider configuring the service to start automatically when the system reboots.
Set Data Binding Programmatically
Instead of using XAML to set the DataContext declaratively for the main window, consider setting it programmatically in the OnActivated method.
The most useful advice on fixing WPF startup performance I've ever seen was given in this other question: run "ngen update" in every framework folder.
It seems that Microsoft can't keep their ngen cache up-to-date, which results in your application pretty much recompiling half the .NET framework every single startup.
Hard to believe, but seems to be true.
The startup time of a WPF Application can be much faster if you use Framework 3.51 and not 3.5 or 3.0. The 3.51 is really an improvement.
This is an old thread, but I've ended up here several times while trying to fix a startup performance issue with WPF apps on my Win10 system, so I thought I'd state an answer which may help others - an answer that takes a horrible 5 second startup time for all WPF apps on this system down to just a few milliseconds. Remove the nVidia "3d Vision" driver. I have a GeForce GTX 650 card, and the "3d Vision" driver doesn't seem to offer anything useful, so removing it is no problem for me. The VisualStudio2015 performance analysis tool finally helped show that almost all the 5 second startup time was spent IDLE after a call through nvapi64.dll - the nVidia driver. Wow.
What helped me the most from the excellent article that Stuart links to was the XmlSerializer trick. That really shaved up quite a few seconds. Furthermore don't underestimate defragmenting your HD :-)

Why is WPF application running in debug mode slow?

I know that running applications in DEBUG (build configuration) thru the visual studio adds a level of overhead but I have a WPF application that I am testing out that is painfully slow in its execution and other functions such as drag/drop of items. When I run the application in Release mode it performs like one would expect, very quickly and without hesitation. I've set up no special debugging parameters or any other watches, settings or breakpoints that would interrupt the application.
Has anyone else run across an issue like this or is there possibly just some setting that can be adjusted? It's not really an issue more of a why is this happening...
thanks.
The garbage collector is much less aggressive in debug mode.
Try watching the memory usage in task manager, the VM Size column is often the most useful.
See if during the slow operations a lot of memory gets released - this will indicate that the collector hasn't bothered doing much work for a while and then has had to kick in a do a larger clean up.
You might check your Output and Immediate windows. You're might be getting a lot of messages thrown in there, especially if you're getting binding errors.

Resources