WPF ListView with millions of rows

I need to display a very large amount of data in a list view, up to 5 million rows or more. I am trying to find a solution where I can show all 5 million items without holding all of them in my application's memory.
So the basic idea is that only a small number of items is loaded, say 1k or 2k, and new data is retrieved from a database on demand while scrolling, so that the application never has more than a couple of thousand items in memory. However, the user should not notice this: the list view should behave as if it had 5 million rows. That means if the user drags the scrollbar all the way down, it should display the last of the 5 million rows and not the last of the couple of thousand rows currently in memory.
Does anyone have an idea how such a feature can be achieved? I am grateful for any input you can provide.

Data virtualization in WPF is much discussed - here is a good starting place.
I have implemented an approach that is somewhat different from anything I've found online. It isn't perfect but it suits my needs quite well.
I have a generic ItemsProvider interface that can page through data and exposes some other basic functions and info about the backing data. I also created a VirtualizationManager class that inherits from DependencyObject and has the dependency properties ScrollableHeight and VerticalOffset. These are bound to the matching properties on a ScrollViewer (found in the templates of your finer ItemsControls). When either of these properties changes, it fires a callback that calculates how close the scroll viewer is to the end of the (currently loaded) list - for this my VirtualizationManager needs a handle to the ItemsProvider - and if that distance is less than some minimum amount, the ItemsProvider is instructed to load the next page. The whole contraption can be installed on an ItemsControl via a set of attached properties.
My implementation is rather idiosyncratic, but the idea is fairly simple.
If you're dealing with very large lists you'll want a solution that not only loads incrementally but also unloads old items. There is at least one such solution at the link above.
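The "load the next page when the scroll viewer gets close to the end" idea described above can be sketched roughly as follows. This is not WPF code and not the answerer's implementation: Python is used only to keep the sketch short and runnable, and the names PagedItems, on_scroll_changed, fake_fetch, and the threshold of 50 units are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PagedItems:
    """Hypothetical stand-in for the ItemsProvider: appends fixed-size pages on request."""
    fetch_page: Callable[[int, int], List[str]]  # (start_index, count) -> rows
    page_size: int = 100
    items: List[str] = field(default_factory=list)

    def load_next_page(self) -> None:
        # Ask the backing store for the rows that follow the ones already loaded.
        self.items.extend(self.fetch_page(len(self.items), self.page_size))


def on_scroll_changed(provider, vertical_offset, scrollable_height, threshold=50.0):
    """Mimics the property-changed callback: load a page when close to the end."""
    remaining = scrollable_height - vertical_offset
    if remaining < threshold:
        provider.load_next_page()
        return True
    return False


def fake_fetch(start, count):
    # Pretend database with 1000 rows (an assumption for the example).
    total = 1000
    return [f"row {i}" for i in range(start, min(start + count, total))]


provider = PagedItems(fetch_page=fake_fetch)
provider.load_next_page()                                # initial page
far_loaded = on_scroll_changed(provider, 10.0, 500.0)    # far from the end: nothing happens
near_loaded = on_scroll_changed(provider, 480.0, 500.0)  # near the end: next page is loaded
```

Note that this sketch only ever grows the loaded list, so it illustrates incremental loading and not the unloading of old items.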


DataGrid row request patterns with data virtualization

I implemented a data virtualization solution using some ideas from CodePlex, the blog of Bea Stollnitz, and Vincent Van Den Berghe's paper (same link). However, I needed a different approach, so I decided to write my own solution.
I am using a DataGrid to display about a million rows with this solution. I am using UI virtualization as well. My solution works, but in certain situations I see some weird behavior in how the DataGrid requests data from its source.
About the solution
I ended up writing a list which does all the heavy work. It is a generic class named VirtualList<T>. It implements the ICollectionViewFactory interface, so the collection view creation mechanism can create a VirtualListCollectionView<T> instance to wrap it. This class inherits from ListCollectionView. I did not follow the suggestions to write my own ICollectionView implementation. Inheriting seems to work fine as well.
The VirtualList<T> splits the whole data set into pages. It gets the total item count, and every time the DataGrid requests a row via the list indexer, it loads the appropriate page or returns it from the cache. The pages are recycled internally, and a DispatcherTimer disposes of unused pages during idle time.
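A minimal sketch of the paging idea, written in Python only to keep it short and runnable (the real class is a C# VirtualList<T>, whose code is not shown in the post). The load_page callback, the page size of 100, and the cap of 4 cached pages are assumptions. The sketch also swaps the timer-based disposal for simple least-recently-used eviction on access, which is a different technique from the DispatcherTimer described above.

```python
from collections import OrderedDict


class VirtualList:
    """Serves rows by index, loading whole pages on demand and caching a few of them."""

    def __init__(self, count, load_page, page_size=100, max_pages=4):
        self._count = count
        self._load_page = load_page    # callable: page_number -> list of rows
        self._page_size = page_size
        self._max_pages = max_pages
        self._pages = OrderedDict()    # page_number -> rows, least recently used first
        self.loads = 0                 # number of page loads, for demonstration only

    def __len__(self):
        return self._count

    def __getitem__(self, index):
        if not 0 <= index < self._count:
            raise IndexError(index)
        page_no, offset = divmod(index, self._page_size)
        if page_no in self._pages:
            self._pages.move_to_end(page_no)        # mark the page as recently used
        else:
            self._pages[page_no] = self._load_page(page_no)
            self.loads += 1
            if len(self._pages) > self._max_pages:
                self._pages.popitem(last=False)     # evict the least recently used page
        return self._pages[page_no][offset]


def load_page(page_no):
    # Hypothetical data source: fabricates the 100 rows of the requested page.
    start = page_no * 100
    return [f"row {i}" for i in range(start, start + 100)]


rows = VirtualList(1_000_000, load_page)
first = rows[0]          # loads page 0
last = rows[999_999]     # loads page 9999
again = rows[5]          # served from the cached page 0
```

After these three accesses only two pages have been loaded, even though the list reports a length of one million.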
Data request patterns
The first thing I learned is that VirtualList<T> should implement IList (non-generic). Otherwise the ItemsControl will treat it as an IEnumerable and query/enumerate all the rows. This is logical, since the DataGrid is not type safe, so it cannot use the IList<T> interface.
The row with index 0 is frequently requested by the DataGrid. It seems to be used for visual item measurement (according to the call stack), so I simply cache this one.
The caching mechanism inside the DataGrid uses a predictable pattern to query the rows it shows. First it asks for the visible rows from top to bottom (twice for every row). Then it queries a couple of rows (depending on the size of the visible area) before the visible area, including the first visible row, in descending order, that is, from bottom to top. After that it requests the same number of rows after the visible rows, including the last visible row, from top to bottom.
If the visible row indexes are 4, 5, 6, the data requests would be: 4,4,5,5,6,6,4,3,2,1,6,7,8,9.
If my page size is properly set, I can serve all these requests from the current and previously loaded page.
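To see how the page size interacts with this request pattern, here is a small sketch in Python (used only as a runnable notation). It reproduces the sequence from the example and lists which pages it touches. The margin of 3 rows before and after the visible area is inferred from that one example and is an assumption; the post says the real number depends on the size of the visible area.

```python
def request_pattern(first_visible, last_visible, margin):
    """Rebuild the observed request order for one visible range."""
    visible = range(first_visible, last_visible + 1)
    requests = [i for i in visible for _ in range(2)]                 # each visible row twice
    requests += range(first_visible, first_visible - margin - 1, -1)  # upwards, bottom to top
    requests += range(last_visible, last_visible + margin + 1)        # downwards, top to bottom
    return [i for i in requests if i >= 0]                            # no rows before index 0


def pages_touched(requests, page_size):
    """Distinct page numbers needed to serve the given row requests."""
    return sorted({i // page_size for i in requests})


pattern = request_pattern(4, 6, margin=3)
one_page = pages_touched(pattern, page_size=10)
two_pages = pages_touched(pattern, page_size=5)
four_pages = pages_touched(pattern, page_size=3)
```

With a page size of 10 the whole burst is served from a single page, with 5 it needs two pages, and with 3 it already needs four, which is more than a "current plus previous page" cache can hold.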
If CanSelectMultipleItems is True and the user selects multiple items using the SHIFT key or a mouse drag, the DataGrid enumerates all the rows from the beginning of the list to the end of the selection. This enumeration happens via the IEnumerable interface regardless of whether IList is implemented.
If the selected row is not visible and the current visible area is "far" from the selected row, the DataGrid sometimes starts requesting all the items from the selected row to the end of the visible area, including all the rows in between, which are not even visible. I could not figure out the exact pattern of this behavior. Maybe my implementation is the reason for it.
My questions
Why does the DataGrid request non-visible rows, given that those rows will be requested again when they become visible?
Why is it necessary to request every row two or three times?
Can anyone tell me how to stop the DataGrid from using IEnumerable, other than by turning off multiple item selection?
I at least found some way to fool the VirtualList. You can read it here.
If you have found another solution (that is even better than mine), please tell me!

Rendering Thread still slow after Virtualization

At a high level, my application applies about 5 different DataTemplates to a set of ListBoxItems based on their type. These items are laid out on a canvas at specific x, y points. I implemented virtualization on my ListBox and it did not seem to improve the time it takes to complete the rendering thread's work. It still takes about 8-12 seconds for the UI to be completely loaded and usable by the user. I thought virtualization would help fix this problem, but after looking around it seems that it only helps when scrolling large amounts of data. Am I correct in this assumption, and does anyone have any other tips for improving the rendering thread? This is the only problem I am having and then my project is complete. Thanks StackOverflow!
Virtualisation means that only the items you have visible are created, then dynamically destroyed/new items created as you scroll. The alternative is all UI controls are created for all items at once.
It sounds like you have bigger problems with the rest of the app. Do you perform all loading operations on a background thread? Is the UI control tree very complex? Are you displaying 100s or 1,000s of items?
We also had a lot of trouble with performance in WPF. The best way is of course to profile your application. We use ANTS Performance Profiler for that, but any .NET profiler will do. We took a huge performance hit because of the lookup of our XAML resources. This is the advice I can give you:
Try to minimize all resources in XAML. But not only that, also try to minimize the number of XAML files you have. One thing you can try is to defer the loading of complex parts of your DataTemplate. This is similar to what happens when you load a JPEG in a browser: first you see a pixelated image, which becomes finer once the JPEG has finished loading. To accomplish that, use a simpler DataTemplate at first and load the complex template only on demand, when the item is visible, or after a while.
But without more information about your specific problem, we can only guess. This is an old question of mine about a similar subject; maybe it will help as well.
Yes, ListBox virtualization is for scrolling. When you have a large number of items in a ListBox, enabling virtualization will make only the visible items (+ a few extra items for scrolling) render, and scrolling the ListBox replaces the data in the rendered items instead of rendering new items.
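As a rough illustration of "only the visible items plus a few extra", the following Python sketch computes which item indexes would need UI containers for a given scroll position. It is not how WPF's panel is implemented; it assumes every item has the same height, and the function name and the default of 2 extra items on each side are made up for the example.

```python
def realized_range(vertical_offset, viewport_height, item_height, item_count, extra=2):
    """Return (first, last) index of the items that need containers, inclusive."""
    first_visible = int(vertical_offset // item_height)
    last_visible = int((vertical_offset + viewport_height - 1) // item_height)
    first = max(0, first_visible - extra)               # a few extra items above
    last = min(item_count - 1, last_visible + extra)    # a few extra items below
    return first, last


at_top = realized_range(0, 200, 20, 10_000)
scrolled = realized_range(1000, 200, 20, 10_000)
containers_needed = scrolled[1] - scrolled[0] + 1
```

For a 200-unit viewport and 20-unit items, 14 containers are enough in the scrolled position, no matter whether the list holds ten thousand or ten million items. This also shows why virtualization does not help when most items are visible at once.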
If you were to post some code, perhaps we could assist you with some performance tweaks.

How to best display large number of items in program

I currently have a collection of 952 items. I am displaying 500+ of them as polygons, and this is causing some noticeable but manageable lag in my application. What is the most lightweight control / element that I can use to display these items at one time?
DrawingVisuals provide a more lightweight approach for rendering objects than Paths:
The downside of this approach is that they do not provide events such as mouse enter / leave, you must perform hit testing manually. However, this might be OK for your needs.
There is an even more lightweight approach where you add items to the visual layer directly, you can see an example on this page:
My advice would be to try DrawingVisuals first.
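For the manual hit testing mentioned above, WPF offers VisualTreeHelper.HitTest, which is what you would normally call from a mouse handler. The geometric idea behind testing a point against polygons can be sketched as below. This is a plain ray-casting test in Python, shown only to illustrate the concept; the function names and the sample shapes are assumptions, and the sketch treats polygons later in the list as drawn on top.

```python
def point_in_polygon(x, y, vertices):
    """Ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge spans the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def hit_test(x, y, polygons):
    """Return the index of the topmost polygon containing the point, or None."""
    for index in range(len(polygons) - 1, -1, -1):
        if point_in_polygon(x, y, polygons[index]):
            return index
    return None


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
triangle = [(5, 5), (15, 5), (10, 15)]   # drawn after the square, so it is on top
shapes = [square, triangle]

hit_overlap = hit_test(6, 6, shapes)     # inside both shapes, the triangle wins
hit_square = hit_test(2, 2, shapes)      # only inside the square
hit_nothing = hit_test(50, 50, shapes)   # outside everything
```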

MS Word pagination using Multiple wpf RichTextBox

My aim is to make an editor that behaves similarly to MS Word. The WPF RichTextBox is a wonderful control for this. By placing it inside a ScrollViewer, we can make it editable (like a notepad). But I need MS Word-like pages. One effective way is probably to apply a style to the ScrollViewer so that we create the look and feel of multiple pages on the RichTextBox, but I don't know how to do it. What we are doing in the project is to use a DocumentViewer. Inside a FixedPage, we create a Header (Canvas), Body (WPF RichTextBox), and Footer (Canvas). We create multiple pages this way and, by subscribing to the RichTextBox SizeChanged event, we do the pagination manually, i.e. we move the blocks from one page to another when the height has changed. Do you see any better approach to doing this? Does using multiple RichTextBoxes hamper my performance?
#WpfProgrammer This is a good approach, I would say. But if you have 1000s of pages, there will definitely be a performance problem. To avoid that problem, you need to do demand paging.
Virtual paging:
1. You need to construct a page table, which will contain the pages. Each page entry contains information about the controls, images, their positions, dimensions, and styles for the page. [All serializable data]
2. Virtual pages - You need to deserialize all the data for a page and create a page with a RichTextBox. Virtual pages are nothing but pre-cached pages that are about to be rendered. For example, if I'm on the 1st page, I'll deserialize the next 3 consecutive pages and keep them in a collection, then repeat this procedure as the user moves to consecutive pages. With some added logic based on a Most Frequently Used collection, it will be fast enough. In the case of 1000s of pages, you can collapse the non-dirty or never-visited pages, which could yield a little more performance. If performance is an even bigger concern on low-end hardware, you should also consider cleaning.
3. Cleaning - Cleaning is the process of identifying LFU (least frequently used) pages and removing them. This is very helpful when performance matters more.
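The three steps above can be sketched as follows. This is only a rough model in Python, not WPF code: pages are plain JSON strings instead of real serialized controls, "creating a page" is reduced to deserializing it, and the class name PageCache, the capacity of 6 live pages, and the tie-breaking rule in the cleaning step are assumptions.

```python
import json
from collections import Counter


class PageCache:
    """Keeps a few deserialized pages alive: the current one plus a lookahead."""

    def __init__(self, page_table, lookahead=3, capacity=6):
        self.page_table = page_table   # step 1: page number -> serialized page data
        self.lookahead = lookahead     # how many following pages to pre-cache
        self.capacity = capacity       # maximum number of live pages
        self.live = {}                 # page number -> deserialized page
        self.visits = Counter()        # how often each page was navigated to

    def go_to(self, number):
        """Step 2: materialize the requested page and the next few pages."""
        self.visits[number] += 1
        last = min(number + self.lookahead, len(self.page_table) - 1)
        for n in range(number, last + 1):
            if n not in self.live:
                self.live[n] = json.loads(self.page_table[n])
        self._clean(keep=set(range(number, last + 1)))
        return self.live[number]

    def _clean(self, keep):
        """Step 3: drop the least frequently visited pages outside the current window."""
        while len(self.live) > self.capacity:
            candidates = [n for n in self.live if n not in keep]
            if not candidates:
                break
            victim = min(candidates, key=lambda n: (self.visits[n], n))
            del self.live[victim]


page_table = [json.dumps({"number": n, "blocks": [f"text {n}"]}) for n in range(1000)]
cache = PageCache(page_table)
cache.go_to(0)            # pages 0-3 become live
cache.go_to(0)            # page 0 is now the most frequently visited page
page = cache.go_to(10)    # pages 10-13 become live, cleaning removes pages 1 and 2
live_pages = sorted(cache.live)
```

In this run the frequently visited page 0 survives the cleaning, while two pages that were only pre-cached and never visited are dropped.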
Hi Tameem
Set the minimum height and width of the RichTextBox to A4 size (let's say). Subscribe to the RichTextBox SizeChanged event. As soon as the content exceeds the page, this event is fired. Then I take the last block of the previous page and push it in as the first block of the next page. (Remember, if the page does not exist, you need to create a new page and then add the block as its first block.) The focus should also be moved to the new page, because if you press Enter in the last RichTextBox, you expect the focus to be on the new page. When the user deletes a block on some page (say the 2nd), you need to add all the blocks of the following pages to this page, so that the pagination logic will push the blocks down again and adjust. I can share some code if you need further help.
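A rough model of the push-down logic described above, in Python rather than WPF code. Each page is just a list of block heights, the page height of 1000 is an arbitrary number, and the function names are made up; the real implementation would move Block elements between RichTextBox documents and read the heights from layout.

```python
def repaginate(pages, page_height):
    """Move trailing blocks of every overfull page to the front of the next page."""
    i = 0
    while i < len(pages):
        # Keep at least one block per page so an oversized block cannot loop forever.
        while sum(pages[i]) > page_height and len(pages[i]) > 1:
            if i + 1 == len(pages):
                pages.append([])                     # create a new page when needed
            pages[i + 1].insert(0, pages[i].pop())   # last block becomes first block
        i += 1
    return pages


def reflow_after_delete(pages, page_index, page_height):
    """Pull all blocks of the following pages up, then let repaginate push them down."""
    merged = [block for page in pages[page_index:] for block in page]
    pages[page_index:] = [merged]
    return repaginate(pages, page_height)


# Typing on the only page until it overflows creates a second page.
grown = repaginate([[300, 300, 300, 300]], page_height=1000)

# One block was deleted from the first page, so content from later pages moves up.
after_delete = reflow_after_delete([[300, 300], [300, 400], [500]], 0, page_height=1000)
```

The second call ends with two pages instead of three, because the deleted block freed enough room for content to move up.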

In what way is a WPF WrapPanel so slow that we need a virtualizing wrap panel

I hear a lot about the WrapPanel being slow to load things and that we therefore need a virtualising panel.
Can somebody give me a small WrapPanel sample which proves that it is so slow to load that it needs a virtualising panel, please?
I set a WrapPanel as the items panel for a ListBox and added 10000 string objects to it, and it was not a problem. I am sure my sample was silly; maybe I have to write a business object and create a larger data template to see this problem in action.
Kindly show me a sample that proves a WrapPanel without virtualisation is slower.
I think the performance issue depends mainly on the number of visual objects in your tree.
The default ListBoxItem template consists of a small number of elements (a border and a text block, I think). If you have a template that creates a complex visualization of, let's say, 100 visual elements per item, you get a fairly large number of visuals, depending on your item count.
This is the reason why the normal panel is slower at load time: it has to create all the objects at startup, whereas the virtualising version only creates visuals for the visible items and disposes of visuals that are no longer displayed.
In addition, this also has implications for memory usage.
I recently needed this functionality when making an insert-symbol form. Using a ListBox with a normal WrapPanel as the items panel, load time would take up to 5 seconds.
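A quick back-of-the-envelope calculation makes the difference concrete. The 10000 items and 100 visuals per item come from the posts above; the figure of 40 items visible at once is an assumption chosen for the example.

```python
def visual_count(item_count, visuals_per_item):
    """Total number of visual elements that must be created."""
    return item_count * visuals_per_item


# Non-virtualising panel: every item gets its visuals at load time.
all_at_once = visual_count(10_000, 100)

# Virtualising panel: only the (assumed) 40 visible items get visuals.
virtualized = visual_count(40, 100)

ratio = all_at_once // virtualized
```

Under these assumptions the plain panel creates one million visuals at startup, 250 times as many as the virtualising one. With a template as small as the default ListBoxItem the absolute numbers are far lower, which is consistent with the 10000-string test showing no problem.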
