How to generate and print large XPS documents in WPF?

I would like to generate (and then print or save) big XPS documents (>400 pages) from my WPF application. We have a large amount of in-memory data that needs to be written to XPS.
How can this be done without getting an OutOfMemoryException? Is there a way I can write the document in chunks? How is this usually done? Should I not be using XPS for large files in the first place?
The root cause of the OutOfMemoryException seems to be the creation of the huge FlowDocument. I am creating the full FlowDocument and then sending it to the XPS document writer. Is this the wrong approach?

How do you do it? You didn't show any code.
I use an XpsDocumentWriter to write in chunks, like this:
FlowDocument flowDocument = ...;

// Write the XPS document.
using (XpsDocument doc = new XpsDocument(fileName, FileAccess.ReadWrite))
{
    XpsDocumentWriter writer = XpsDocument.CreateXpsDocumentWriter(doc);
    DocumentPaginator paginator = ((IDocumentPaginatorSource)flowDocument).DocumentPaginator;

    // Change the PageSize and PagePadding for the document
    // to match the CanvasSize for the printer device.
    paginator.PageSize = new Size(816, 1056);
    flowDocument.PagePadding = new Thickness(72);
    flowDocument.ColumnWidth = double.PositiveInfinity;

    writer.Write(paginator);
}
Does this not work for you?

Speaking from perfect ignorance of the specific system involved, might I suggest using the Wolf Fence in Alaska debugging technique to identify the source of the problem? I'm suggesting this because other responders are not reporting the same problem you are experiencing. When working with easy-to-reproduce bugs Wolf Fence is dead simple to do (it doesn't work so well with race conditions and the like).
Pick a midpoint in your input data and try to generate your output document from only that data, no more.
If it succeeds, pick a point about 75% into the input and try again, otherwise pick a point at about 25% of the way into the input and try again.
Lather, rinse, repeat, each time narrowing the window to where the works/fails line is.
You may find that you fairly quickly identify one specific page -- or perhaps one specific object on that page -- that is "funny." Note: you only have to do this log2(N) times, or in this case 9 times given the 400 pages you mention.
Now you probably have something you can attack directly. Good luck.
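For illustration only, a bisection harness along these lines might look like the sketch below. TryGenerateXps and the items list are hypothetical stand-ins for your own data and output code, not anything from the question (assumes System.Linq for Take).

// Wolf-fence bisection sketch: find the smallest prefix of the data that fails.
// TryGenerateXps is a hypothetical helper that builds a document from the given
// items and reports success or failure (catching the exception internally).
int lo = 0, hi = items.Count;
while (hi - lo > 1)
{
    int mid = (lo + hi) / 2;
    if (TryGenerateXps(items.Take(mid)))
        lo = mid;   // a prefix of length 'mid' still works; the problem lies further on
    else
        hi = mid;   // a prefix of length 'mid' already fails; narrow towards the front
}
// 'hi' is now the smallest prefix length that fails; inspect the item at index hi - 1.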

You cannot use a single FlowDocument to generate large documents, because you will run out of memory. However, if you can generate your output as a sequence of FlowDocuments, or as something like an extremely tall ItemsControl, it is possible.
I've found the easiest way to do this is to subclass DocumentPaginator and pass an instance of my subclass to XpsDocumentWriter.Write:
var xpsDocument = new XpsDocument(...);
var writer = XpsDocument.CreateXpsDocumentWriter(xpsDocument);
writer.Write(new WidgetPaginator { Widget = widgetBeingPrinted, PageSize = ... });
The WidgetPaginator class itself is quite simple:
class WidgetPaginator : DocumentPaginator, IDocumentPaginatorSource
{
    Size _pageSize;

    public Widget Widget { get; set; }

    public override Size PageSize { get { return _pageSize; } set { _pageSize = value; } }
    public override bool IsPageCountValid { get { return true; } }
    public override IDocumentPaginatorSource Source { get { return this; } }
    public DocumentPaginator DocumentPaginator { get { return this; } }

    public override int PageCount
    {
        get
        {
            return ...; // Compute page count
        }
    }

    public override DocumentPage GetPage(int pageNumber)
    {
        var visual = ...; // Compute page visual
        Rect box = new Rect(0, 0, _pageSize.Width, _pageSize.Height);
        return new DocumentPage(visual, _pageSize, box, box);
    }
}
Of course you still have to write the code that actually creates the pages.
If you want to use a series of FlowDocuments to create your document
If you're using a sequence of FlowDocuments to lay out your document one section at a time instead of all at once, your custom paginator can work in two passes:
The first pass occurs as the paginator is constructed. It creates a FlowDocument for each section, then gets a DocumentPaginator to retrieve the number of pages. Each section's FlowDocument is discarded after the pages are counted.
The second pass occurs during actual document output: if the page number passed to GetPage() falls within the most recently created FlowDocument, GetPage() simply calls that document's paginator to get the appropriate page. Otherwise it discards that FlowDocument, creates a FlowDocument for the new section, gets its paginator, then calls GetPage() on that paginator.
This strategy allows you to continue to use FlowDocuments as you have been, as long as you can break the data into "sections" each with its own document. Your custom paginator then effectively treats all the individual FlowDocuments as one big document. This is similar to Word's "Master Document" feature.
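The answer doesn't include code for that second pass, but a rough sketch could look like the following. The field and helper names (_sectionStartPages, _currentSection, FindSectionForPage, BuildSectionDocument) are placeholders of mine, not part of the original answer.

public override DocumentPage GetPage(int pageNumber)
{
    // _sectionStartPages[i] holds the first global page number of section i,
    // computed during the counting pass described above.
    int section = FindSectionForPage(pageNumber);              // hypothetical lookup
    if (section != _currentSection)
    {
        _currentFlowDocument = BuildSectionDocument(section);  // hypothetical: rebuild only this section
        _currentPaginator = ((IDocumentPaginatorSource)_currentFlowDocument).DocumentPaginator;
        _currentPaginator.PageSize = PageSize;
        _currentSection = section;
    }

    // Translate the global page number into a page number local to the section.
    int localPage = pageNumber - _sectionStartPages[section];
    return _currentPaginator.GetPage(localPage);
}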
If you can render your data as a sequence of vertically-stacked visuals
In this case, the same technique can be used. During the first pass, all visuals are generated in order and measured to see how many will fit on a page. A data structure is built to indicate which range of visuals (by index or whatever) are found on a given page. During this process each time a page fills up, the next visual is placed on a new page. Headers and footers would be handled in the obvious way.
During the actual document generation, the GetPage() method is implemented to regenerate the visuals previously decided to be on a given page and combine them using a vertical DockPanel or other panel of your choice.
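As a minimal sketch of that GetPage(), assuming a _pageRanges list built during the first pass and a CreateVisual(i) helper that regenerates the i-th item (both are placeholders of mine, not code from the answer):

public override DocumentPage GetPage(int pageNumber)
{
    var range = _pageRanges[pageNumber];          // first/last item index for this page
    var panel = new StackPanel { Width = _pageSize.Width };

    for (int i = range.First; i <= range.Last; i++)
        panel.Children.Add(CreateVisual(i));      // regenerate only this page's visuals

    // Lay the panel out at the page size before handing it to the writer.
    panel.Measure(_pageSize);
    panel.Arrange(new Rect(new Point(0, 0), _pageSize));

    var box = new Rect(0, 0, _pageSize.Width, _pageSize.Height);
    return new DocumentPage(panel, _pageSize, box, box);
}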
I've found this technique more flexible in the long run because you don't have to deal with the limitations of FlowDocument.

I can confirm that XPS does not throw out-of-memory on long documents, both in theory (operations on XPS are page-based, so it doesn't try to load the whole document into memory) and in practice (I use XPS-based reporting and have seen runaway error reports add up to many thousands of pages).
Could it be that the problem is in a single particularly large page? A huge image, for example? A large page at a high DPI resolution? If a single object in the document is too big to be allocated at once, it will lead to an out-of-memory exception.

Have you used SOS to find out what is using up all the memory?
It could be that managed or unmanaged objects are being created during the production of your document, and they're not being released until the document is finished (or not at all).
Tracking down managed memory leaks by Rico Mariani could be of help.

Like you say: the in-memory FixedDocument is probably consuming too much memory.
Maybe an approach in which you write out each XPS page individually (making sure the FixedDocument is released each time), and then use a merger afterwards, could prove fruitful.
Are you able to write each page separately?
Nick.
ps. Feel free to contact me directly (info#nixps.com); we do a lot of XPS stuff at NiXPS, and I'm very interested in helping you get this issue resolved.

Related

OxyPlot performance issue on large data in WPF on InvalidatePlot

I'm using OxyPlot in my WPF application as a line recorder, similar to the LiveDemo example.
With a large visible data set, I get some UI performance issues, and the whole application can even freeze. It seems to be PlotModel.InvalidatePlot being called with too many points too often, but I haven't found a better way.
In depth:
Using OxyPlot 2.0.0
All my code works against the PlotModel; the XAML PlotView only binds to the PlotModel.
I cyclically collect data in a thread and put it into a data source (a List of Lists that serve as the ItemsSource for the LineSeries).
I have a class which cyclically calculates, in a thread, the presentation for the x and y axes and a bit more. After all this work, it calls PlotModel.InvalidatePlot.
If I
have more than 100 k points on the display (no matter whether they are in multiple LineSeries or not)
and add 1 DataPoint per LineSeries every 500 ms
and call PlotModel.InvalidatePlot every 200 ms,
then not only does the PlotView have performance issues, the window is also very slow to react, even if I call PlotModel.InvalidatePlot(false).
My goal
My goal is that the window/application keeps working normally; it should not hang because of a line recorder. Ideally there would be no performance issues at all, but I'm skeptical.
What I have found or tested
OxyPlot has performance guidelines. I'm using ItemsSource with DataPoints. I have also tried adding the points directly to LineSeries.Points, but then the plot doesn't refresh at all (even with an ObservableCollection), so I still have to call PlotModel.InvalidatePlot, which results in the same effect. I cannot bind to a LineSeries defined in XAML because I don't know how many lines there will be. Maybe I missed something when adding the points directly?
I have also found GitHub issue 1286, which describes a related problem, but that workaround is slower in my tests.
I have also measured the time taken by the call to PlotModel.InvalidatePlot; the number of points does not affect it.
I have checked the UI thread, and it seems to have trouble handling this large set of points.
If I zoom in to the plot so that fewer than 20 k points are displayed, it looks fine.
Question:
Is there a way to handle this better, except to call PlotModel.InvalidatePlot much less?
Restrictions:
I also have to update axes and annotations, so I don't think I can avoid calling PlotModel.InvalidatePlot.
I have found that using the OxyPlot Windows Forms implementation and then displaying it via Windows Forms integration (WindowsFormsHost) in WPF gives much better performance.
e.g.
var plotView = new OxyPlot.WindowsForms.PlotView();
plotView.Model = Plot;
var host = new System.Windows.Forms.Integration.WindowsFormsHost();
host.Child = plotView;
PlotContainer = host;
Where 'Plot' is the PlotModel you call InvalidatePlot() on.
And then in your XAML:
<ContentControl Content="{Binding PlotContainer}"/>
Or however else you want to use your WindowsFormsHost.
I have a similar problem and found that you can use a Decimator in LineSeries. It is documented in the examples: LineSeriesExamples.cs
The usage is like this:
public static PlotModel WithXDecimator()
{
    var model = new PlotModel { Title = "LineSeries with X Decimator" };
    var s1 = CreateSeriesSuitableForDecimation();
    s1.Decimator = Decimator.Decimate;
    model.Series.Add(s1);
    return model;
}
This may solve the problem on my side, and I hope it helps others too. Unfortunately it is not covered in the documentation.
For the moment I ended up calculating the time until the next call to InvalidatePlot. I calculate it with the method given in this answer, which returns the number of visible points. This reduces the performance issue, but doesn't fix the blocking of the UI thread when InvalidatePlot is called.
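A minimal sketch of that kind of adaptive throttling, assuming a WPF DispatcherTimer, an existing PlotModel named Plot, and a hypothetical CountVisiblePoints() helper (none of which are from the original post):

// Refresh the plot on a timer whose interval grows with the number of visible points.
private readonly DispatcherTimer _refreshTimer = new DispatcherTimer();

private void StartRefreshLoop()
{
    _refreshTimer.Interval = TimeSpan.FromMilliseconds(200);
    _refreshTimer.Tick += (s, e) =>
    {
        int visiblePoints = CountVisiblePoints();       // hypothetical helper
        _refreshTimer.Interval = TimeSpan.FromMilliseconds(
            Math.Max(200, visiblePoints / 500));        // crude scaling; tune as needed
        Plot.InvalidatePlot(false);                     // false: the data itself is unchanged
    };
    _refreshTimer.Start();
}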

WPF app with multiple usercontrols

I am creating a simple WPF test project which contains multiple UserControls (instead of Pages). I am using a Switcher class to navigate between the different UserControls. When I navigate between pages, I have observed that memory consumption keeps increasing on each UserControle navigation, and the GC is not invoked.
1. Am I doing something wrong in the following code?
2. Which part of the code is consuming more memory?
3. Do I need to invoke the GC to dispose my UserControls on each new UserControle creation? If so, how can I invoke the GC?
public void On_Navigate_Click()
{
    UserControle newusercontrole = new UserControle();

    DataSet ds = new DataSet();
    ds = con.getSome_Datafrom_SQL();                        // gets data from SQL via connection class
    dataGrid_test.ItemsSource = ds.Tables[0].DefaultView;   // dataGrid_test is inside newusercontrole

    // Add the user control to the main window.
    Grid.SetColumn(newusercontrole, 1);
    Grid.SetRow(newusercontrole, 1);
    Grid.SetZIndex(newusercontrole, 10);
    Container.Children.Add(newusercontrole);
}
First off, I must point out that if garbage collection really isn't happening (as you said), it's not your fault and it does not mean you're doing something wrong. It only means that the CLR doesn't think that your system is under memory pressure yet.
Now, to manually invoke a garbage collection cycle anyway, you can use the GC.Collect() static method. If a garbage collection actually starts and your memory consumption is still unreasonably high, it means that you're probably doing something wrong: you're keeping an ever increasing number of unnecessary object references, and the garbage collector cannot safely collect those objects. This is a kind of memory leak.
As far as your code goes, I think that the problem is at the end of the method you posted:
Container.Children.Add(newusercontrole);
This seems to add a new object (on every click) to the collection Container.Children. If this is not removed elsewhere, this is probably the cause of your memory leak. I don't know what the suitable solution would be for your use case (since I don't know exactly how your UI should behave), but you'll likely need to find a way to remove the last UserControle added from the Container.Children. If you can use LINQ, then the methods OfType<T>() and Last() could be of use to find it.
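A minimal sketch of that idea (this assumes Container.Children should only ever hold one UserControle at a time and that System.Linq is imported; adjust to your actual navigation rules):

// Remove the previously added UserControle, if any, before adding the new one.
var previous = Container.Children.OfType<UserControle>().LastOrDefault();
if (previous != null)
    Container.Children.Remove(previous);

Container.Children.Add(newusercontrole);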
In any case, don't leave the GC.Collect() line in production code. Use it only to force a collection cycle for testing purposes, like this one.

Immutable data model for an WPF application with MVVM implementation

I have an application which has a data tree as a data backend. To implement the MVVM pattern I have a logic layer of classes which encapsulate the data tree, so the logic layer is also arranged in a tree. If the input is in a valid state, the data should be copied to a second thread which works as a double buffer of the last valid state. One way to do that would be cloning.
Another approach would be to make the complete data backend immutable. This would imply rebuilding the whole data tree whenever something new is entered. My question is: is there a practical way to do this? I'm stuck at the point where I have to reassign the data tree efficiently to the logic layer.
UPDATE - Some code
What we are doing is to abstract hardware devices which we use to run our experiments. Therefore we defined classes like "chassis, sequence, card, channel, step". Those build a tree structure like this:
Chassis
    Sequence1
        Card1
            Channel1
                Step1
                Step2
            Channel2
        Card2
        Card3
    Sequence2
In code it looks like this:
public class Chassis
{
    readonly List<Sequence> Sequences = new List<Sequence>();
}

public class Sequence
{
    readonly List<Card> Cards = new List<Card>();
}
and so on. Of course each class has some more properties, but those are easy to handle. My problem now is that List is a mutable object: I can call List.Add() and it changes. OK, there is a ReadOnlyList, but I'm not sure it implements immutability the right way, that is, copied by value rather than by reference, and not just blocking writes by hiding the set methods.
The next problem is that the number of sequences and steps can vary. For this reason I need an atomic exchange of list elements.
At the moment I don't have any more code as I'm still thinking if this way would help me and if it is possible at all to implement it in a reasonable amount of time.
Note that there are new immutable collections for .NET that could help you achieve your goal.
Be very cautious about Dave Turvey's statement (I would downvote/comment if I could):
If you are looking to implement an immutable list you could try storing the list as a private member but exposing a public IEnumerable<>
This is incorrect. The private member can still be changed by its containing class, and the public member can be cast back to List<T>, IList<T>, or ICollection<T> without an exception being thrown. Thus anything that depends on the immutability of the public IEnumerable<T> could break.
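A quick illustration of that second point, assuming the Chassis class from the answer quoted below (the one that exposes its private List<Sequence> as an IEnumerable<Sequence>):

var chassis = new Chassis();

// The property is typed as IEnumerable<Sequence>, but the underlying object is
// still a List<Sequence>, so a consumer can cast it back and mutate it.
var sneaky = (List<Sequence>)chassis.Sequences;
sneaky.Add(new Sequence());   // the "immutable" collection now has an extra element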
I'm not sure if I understand 100% what you're asking. It sounds like you have a tree of objects in a particular state and you want to perform some processing on a copy of that without modifying the original object state. You should look into cloning via a "Deep Copy". This question should get you started.
If you are looking to implement an immutable list you could try storing the list as a private member but exposing a public IEnumerable<>
public class Chassis
{
    List<Sequence> _sequences = new List<Sequence>();

    public IEnumerable<Sequence> Sequences { get { return _sequences; } }
}
18/04/13 update in response to Brandon Bonds' comments
The library linked in Brandon Bonds' answer is certainly interesting and offers advantages over IEnumerable<>. In many cases it is probably a better solution. However, there are a couple of caveats that you should be aware of if you use this library.
As of 18/04/2013 this is a beta library. It is obviously still in development and may not be ready for production use. For example, the code sample for list creation in the linked article doesn't work in the current NuGet package.
It is a .NET 4.5 library, so it will not be suitable for programs targeting an older framework.
It does not guarantee immutability of the objects contained in the collections, only of the collection itself. It is possible to modify objects held in an immutable list, so you will still need to consider a deep copy for copying collections.
This is addressed in the FAQ at the end of the article
Q: Can I only store immutable data in these immutable collections?
A: You can store all types of data in these collections. The only immutable aspect is the collections themselves, not the items they contain.
In addition the following code sample illustrates this point (using version 1.0.8-beta)
class Data
{
    public int Value { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        var test = ImmutableList.Create<Data>();
        test = test.Add(new Data { Value = 1 });
        Console.WriteLine(test[0].Value);

        test[0].Value = 2;
        Console.WriteLine(test[0].Value);

        Console.ReadKey();
    }
}
This code will allow modification of the Data object and output
1
2
Here are a couple of articles for further reading on this topic
Read only, frozen, and immutable collections
Immutability in C# Part One: Kinds of Immutability

Programmatically determining max fit in textbox (WP7)

I'm currently writing an eBook reader for Windows Phone Seven, and I'm trying to style it like the Kindle reader. In order to do so, I need to split my books up into pages, and this is going to get a lot more complex when variable font sizes are added.
To do this at the moment, I just add a word at a time into the textblock until it becomes higher than its container. As you can imagine though, with a document of over 120,000 words, this takes an unacceptable period of time.
Is there a way I can find out when the text would exceed the bounds (logically dividing it into pages), without having to actually render it? That way I'd be able to run it in a background thread so the user can keep reading in the meantime.
So far, the only idea that has occurred to me is to find out how the textblock decides its bounds (in the measure call?), but I have no idea how to find that code, because reflector didn't show anything.
Thanks in advance!
From what I can see the Kindle app appears to use a similar algorithm to the one you suggest. Note that:
it generally shows the % position through the book - it doesn't show total number of pages.
if you change the font size, then the first word on the page remains the same (so that's where the % comes from) - so the Kindle app just does one page worth of repagination assuming the first word of the page stays the same.
if you change the font size and then scroll back to the first page, then actually there is a discontinuity - they pull content forwards again in order to fill the first page.
Based on this, I would suggest you do not index the whole book. Instead just concentrate on the current page based on a "position" of some kind (e.g. character count - displayed as a percentage). If you have to do something on a background thread, then just look at the next page (and maybe the prev page) in order that scrolling can be more responsive.
Further to optimise your experience, there are a couple of changes you could make to your current algorithm that you could try:
try a different starting point and search increment for your algorithm - no need to start at one word and then add only one word at a time.
assuming most of your books are ASCII, try caching the width of the common characters, and then work out the width of textblocks yourself.
Beyond that, I'd also quite like to try using <Run> blocks within your TextBlock - it may be possible to get the relative position of each Run within the TextBlock - although I've not managed to do this yet.
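As a rough sketch of the character-width caching idea (my own illustration, not code from the answer; txtPage stands for the on-screen TextBlock whose font settings you want to match, and a Silverlight/WP7 TextBlock reports ActualWidth as soon as Text is set):

// Cache the width of each printable ASCII character at the page's font settings.
var charWidths = new Dictionary<char, double>();
var probe = new TextBlock
{
    FontFamily = txtPage.FontFamily,
    FontSize = txtPage.FontSize,
    FontWeight = txtPage.FontWeight,
    FontStyle = txtPage.FontStyle
};
for (char c = ' '; c <= '~'; c++)
{
    probe.Text = c.ToString();
    charWidths[c] = probe.ActualWidth;
}
// A word's width can then be approximated by summing its characters' widths.
// This ignores kerning, so treat it as an estimate rather than an exact layout measurement.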
I do something similar to adjust font size for individual textboxes (to ensure they all fit). Basically, I create a TextBlock in code, set all my properties and check the ActualWidth and ActualHeight properties. Here is some pseudo code to help with your problem:
public static String PageText(TextBlock txtPage, String BookText)
{
    TextBlock t = new TextBlock();
    t.FontFamily = txtPage.FontFamily;
    t.FontStyle = txtPage.FontStyle;
    t.FontWeight = txtPage.FontWeight;
    t.FontSize = txtPage.FontSize;
    // Constrain to the same width so the text wraps like the real page
    // (needed for the height comparison to be meaningful).
    t.Width = txtPage.ActualWidth;
    t.TextWrapping = TextWrapping.Wrap;
    t.Text = BookText;

    Size Actual = new Size();
    Actual.Width = t.ActualWidth;
    Actual.Height = t.ActualHeight;

    // The whole text fits in the page, so we're done.
    if (Actual.Height <= txtPage.ActualHeight)
        return BookText;

    // Otherwise return (approximately) the fraction of the text that fits.
    Double hRatio = txtPage.ActualHeight / Actual.Height;
    return BookText.Substring(0, (int)((BookText.Length - 1) * hRatio));
}
The above is untested code, but hopefully can get you started. Basically it sees if the text can fit in the box, if so you're good to go. If not, it finds out what percentage of the text can fit and returns it. This does not take word breaks into account, and may not be a perfect match, but should get you close.
You could alter this code to return the length rather than the actual substring and use that as your page size. Creating the textblock in code (with no display) actually performs pretty well (I do it in some table views with no noticeable lag). I wouldn't send all 120,000 words to this function, but a reasonable subset of some sort.
Once you have the ideal length you can use a RegEx to split the book into pages. There are examples on this site of RegEx that break on word boundaries after a specific length.
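For example, a sketch of that idea (assuming System.Text.RegularExpressions; pageLength is whatever character count you derived for the current font settings, not a value from the answer):

// Split the book into chunks of at most pageLength characters, breaking on whitespace.
int pageLength = 1500;   // hypothetical value derived from the measurement above
MatchCollection matches = Regex.Matches(
    bookText,
    @"(.{1," + pageLength + @"})(\s+|$)",
    RegexOptions.Singleline);

var pages = new List<string>();
foreach (Match m in matches)
    pages.Add(m.Groups[1].Value);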
Another option is to calculate the page size ahead of time for each potential font size (and hardcode it with a switch statement). This could easily get crazy if you are allowing any font and any size combination, and would be awful if you allowed mixed fonts/sizes, but it would perform very well. Most likely you have a particular range of readable sizes and just a few fonts. Creating a test app to calculate the text length of a page for each of these combinations wouldn't be that hard and would probably make your life easier - even if it doesn't "feel" right as a programmer :)
I didn't see any reference here to this example from Microsoft, called "Principles of Pagination".
It has some interesting sample code running in Windows Phone.
http://msdn.microsoft.com/en-us/magazine/hh205757.aspx
You can also look this article about Page Transitions in Windows Phone and this other about the final touches in the E-Book project.
The code is downloadable: http://archive.msdn.microsoft.com/mag201111UIFrontiers/Release/ProjectReleases.aspx?ReleaseId=5776
You can query the FormattedText class, which AFAIK is used inside TextBlock. Since this is the class used to format text in preparation for rendering, it is the lowest-level class available and should be fast.
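For illustration, a sketch of measuring with FormattedText (note this is the desktop WPF API; the text, font, and page dimensions below are assumptions of mine, not values from the question):

// Measure how tall a candidate page of text would be at a given width,
// without creating or rendering any UI elements.
var formatted = new FormattedText(
    candidateText,
    CultureInfo.CurrentUICulture,
    FlowDirection.LeftToRight,
    new Typeface("Segoe WP"),        // hypothetical font
    20.0,                            // em size
    Brushes.Black);

formatted.MaxTextWidth = pageWidth;  // wrap at the page width
bool fits = formatted.Height <= pageHeight;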

Why is my WPF Treeview bound to LinqToSql classes being a memory hog?

I have a WPF App which is grinding to a halt after running out of memory...
It is basically a TreeView displaying nodes, which are instances of the LINQ to SQL OR-designer-generated class ICTemplates.Segment. There are around 20 tables indirectly linked via associations to this class in the OR designer.
<TreeView Grid.Column="0" x:Name="tvwSegments"
          ItemsSource="{Binding}"
          SelectedItemChanged="OnNewSegmentSelected"/>

<HierarchicalDataTemplate DataType="{x:Type local:Segment}" ItemsSource="{Binding Path=Children}">
...

// Code-behind: set the data context based on user input (Site, Id).
KeeperOfControls.DataContext = from segment in tblSegments
                               where segment.site == iTemplateSite && segment.id == iTemplateSid
                               select segment;
I've added an explicit property called Children to the segment class which looks up another table with parent-child records.
public IEnumerable<Segment> Children
{
    get
    {
        System1ConfigDataContext dc = new System1ConfigDataContext();
        return from link in this.ChildLinks
               join segment in dc.Segments
                   on new { Site = link.ChildSite, ID = link.ChildSID } equals new { Site = segment.site, ID = segment.id }
               select segment;
    }
}
The rest of it is data binding coupled with data templates to display each Segment as a set of UI Controls.
I'm pretty certain that the children are being loaded on-demand (when I expand the parent) going by the response time. When I expand a node with around 70 children, it takes a while before the children are loaded (Task manager shows Mem Usage as 1000000K!). If I expand the next node with around 50 children, BOOM! OutOfMemoryException
I ran the VS Profiler to dig deeper, and here are the results (profiler screenshots: Summary Page, Object Lifetimes, Allocation).
The top 3 allocations are Action, DeferredSourceFactory.DeferredSource and EntitySet (all .NET/LINQ classes). The only user classes, Segment[] and Segment, come in at #9 and #10.
I can't think of a lead to pursue. What could be the reason?
Maybe a using block surrounding that DataContext?
using (System1ConfigDataContext dc = new System1ConfigDataContext())
{
    .... ?
}
Also, have you tried using a SQL profiler? It might shed some light on the matter.
Have you tried using a global DataContext instead of one for each element?
Creating all of those DataContexts, each with their own query and results, could be the cause of your memory bloat.
I don't know the exact solution, but the new statements in the join may cause this, because a new key object could be created for each relation (though, as I mentioned, I don't know if this is correct).
Could you try this:
public IEnumerable<Segment> Children
{
    get
    {
        System1ConfigDataContext dc = new System1ConfigDataContext();
        // The same query expressed without the anonymous key objects:
        return from link in this.ChildLinks
               from segment in dc.Segments
               where link.ChildSite == segment.site && link.ChildSID == segment.id
               select segment;
    }
}
The issue seems to be the creation of multiple S1DataContext objects as Sirocco referred to.
I tried the using statement to force a Dispose and make it eligible for collection. However it resulted in an ObjectDisposedException that I can't make sense of.
Control flow goes from the line that sets the data context of the DockPanel KeeperOfAllControls:
1. [External Code] (shown in the call stack)
2. Segment.Children.get (which has a using block around its dc)
3. Back at the line in step 1: the LINQ query uses tblSegments, which is retrieved from a local instance of S1DataContext.
Anyway, I assume something there prevents multiple DataContexts from being created and disposed cleanly, so I tried a singleton DataContext.
And it works!
The TreeView control is significantly more responsive; every node I tried loads in 3-4 seconds max.
I put in a GC.Collect (for verification) before every fetch/search, and now the memory usage stays between 200,000 and 300,000 K.
The OR-generated System.Data.Linq.DataContext doesn't seem to go away unless it is disposed explicitly (eating memory). Trying to dispose it in my case didn't pan out, even though both functions had their own using blocks (no shared instance of DataContext). Though I dislike singletons, I'm making a small internal tool for devs and hence don't mind it for now. None of the LINQ to SQL samples I saw online mandated Dispose calls.
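For reference, a minimal sketch of that singleton approach (the names are illustrative, not from the post, and the generated context is still not thread-safe, so this only suits a single-threaded tool like the one described):

public static class Db
{
    // One shared instance of the generated LINQ to SQL context for the whole app.
    private static System1ConfigDataContext _context;

    public static System1ConfigDataContext Context
    {
        get
        {
            if (_context == null)
                _context = new System1ConfigDataContext();
            return _context;
        }
    }
}

// The Children property then reuses the shared context instead of creating a new one:
// join segment in Db.Context.Segments on ... equals ... select segment;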
So I guess the problem has been fixed. Thanks to all the people that acted as more eyeballs to make this bug shallow.
