I have a simple WPF app that displays and prints some
reports with a FixedDocument.
How can I generate PDFs from that, with a free and open solution,
such as iTextSharp?
A WPF FixedDocument, also known as an XPS document, is a definite improvement over PDF. It has many capabilities that PDF lacks. In most cases it is better to distribute your document as XPS rather than PDF, but sometimes it is necessary to convert from XPS to PDF, for example if you need to open the document on devices that have only PDF support. Unfortunately most free tools to convert from XPS to PDF, such as CutePDF and BullzipPDF, require installing a printer driver or are not open source.
A good open-source solution is to use the "gxps" tool that is part of GhostPDL. GhostPDL is part of the Ghostscript project and is open-source licensed under GPL2.
Download GhostPDL from http://ghostscript.com/releases/ghostpdl-8.71.tar.bz2 and compile it.
Copy the gxps.exe executable into your project as Content and call it from your code using Process.Start.
Your code might look like this:
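string pdfPath = ... // Path to place PDF file
// GetTempPath() returns a directory, so use a temporary file for the intermediate XPS
string xpsPath = Path.GetTempFileName();
using (XpsDocument doc = new XpsDocument(xpsPath, FileAccess.Write))
    XpsDocument.CreateXpsDocumentWriter(doc).Write(... content ...);
// Arguments must be separated by spaces; paths are quoted in case they contain spaces
Process.Start("gxps.exe",
    "-sDEVICE=pdfwrite -sOutputFile=\"" +
    pdfPath + "\" " +
    "-dNOPAUSE " +
    "\"" + xpsPath + "\"").WaitForExit();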
// Now the PDF file is found at pdfPath
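Building the command as a list of arguments, rather than one concatenated string, sidesteps spacing and quoting problems. The following is only a minimal sketch in Python (not the C# used above). The function names are made up for illustration, the flags are the ones from the snippet above, and it assumes gxps.exe can be found on the PATH or in the working directory.

```python
import subprocess


def build_gxps_args(xps_path, pdf_path, exe="gxps.exe"):
    """Build the argument list for converting an XPS file to PDF with gxps."""
    return [
        exe,
        "-sDEVICE=pdfwrite",
        "-sOutputFile=" + pdf_path,  # each list item is passed as one argument
        "-dNOPAUSE",
        xps_path,
    ]


def convert_xps_to_pdf(xps_path, pdf_path, exe="gxps.exe"):
    """Run gxps and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_gxps_args(xps_path, pdf_path, exe), check=True)
```

Because every path is its own list item, a path containing spaces needs no manual quoting.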
A simple approach, though probably not the most efficient, is to render the FixedDocument to an image and then embed that image in a PDF using iTextSharp.
I have done it this way before successfully. Initially I tried to convert the control primitives (shapes) to PDF equivalents, but this proved too hard.
If you can get it into an image from WPF, you can import it into iTextSharp as shown in this article. You can even avoid the filesystem altogether by writing the image to a MemoryStream instead of a FileStream.
http://www.mikesdotnetting.com/Article/87/iTextSharp-Working-with-images
If you want to do it programmatically, your best bet is the following path: XPS (FixedDocument) -> print to PS -> use Ghostscript to read the PS and convert it to PDF.
If you don't care about reading the PDF back in code, you can print to any of the free PDF printers that accept a destination path. This way your target PDF file will still be searchable if your report contains any text.
I'm trying to OCR resumes. My first problem is, before OCR, to get the main blocks of a document.
Since all resumes have "visual blocks" (referring to professional experience, skills, languages, hobbies, whatever ...), I wonder if there's any open-source solution to split a document into such blocks, regardless of the layout design (that's where some kind of AI would come in, I assume).
Thank you
First, decompress the streams in your PDF using zlib.
You will then be able to see the PDF in a readable format - https://web.archive.org/web/20141010035745/http://gnupdf.org/Introduction_to_PDF#A_first_example
The PDF format is somewhat similar to PostScript.
Also try converting your PDF to PostScript to see how the contents are arranged.
You can decompress the PDF using pdf-parser: https://blog.didierstevens.com/2008/10/30/pdf-parserpy/
Try this as well - https://gist.github.com/averagesecurityguy/ba8d9ed3c59c1deffbd1390dafa5a3c2
Once you can see how your data is laid out, you can start applying algorithms to extract more meaning.
I am trying to convert a PDF file to a Bitmap by first reading the bytes of the PDF file and then wrapping them in a MemoryStream to be converted into the Bitmap.
This worked when converting images, but it does not work with a PDF.
Dim bytes As Byte() = System.IO.File.ReadAllBytes("C:\Users\s.ferry\Downloads\test2.pdf")
Dim myimage As Image
Dim msPdf As System.IO.MemoryStream = New System.IO.MemoryStream(bytes)
myimage = System.Drawing.Image.FromStream(msPdf)
upBmp = myimage
Above is the code snippet I am using to try to accomplish this. I am getting an error on the last line saying that the parameter msPdf is not valid.
I was hoping to accomplish this without introducing a third-party library, but I don't think I will have a choice.
Any help is appreciated
I don't think this is possible without a third-party library.
A useful and free tool for this is Spire.PDF.
It's understandable that you might think you could create an instance of the System.Drawing.Image class from a PDF directly. In many cases a PDF document consists of a single page that is no more than a scanned image, so some users see it as just another image format.
But most PDFs are much more complicated than this. You will find a plethora of PDF software online; however, rasterizing a PDF page to an image is a very complicated task that not every vendor gets right.
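Image loaders such as Image.FromStream only decode raster formats, which is why handing them PDF bytes fails. A quick way to see what a byte array really contains is to check its leading signature bytes. The sketch below is in Python for illustration only. The function name and the list of formats are choices made for this example, not part of any library.

```python
# Leading "magic" bytes of a few common formats.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
]


def sniff_format(data):
    """Guess the file format from the first bytes of data."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"
```

A check like this lets the code route PDFs to a rasterizing library and everything else to the normal image loader.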
I am using the Aspose.Total package for PDF generation. The reports are generated fine in English, but some Chinese characters do not render correctly: instead of the characters, square boxes are shown. Does anyone know how to deal with this?
Thanks,
Please note that Aspose.Words requires TrueType fonts when rendering documents to fixed-page formats (JPEG, PNG, PDF or XPS). You need to install fonts that are used in your document on the machine where you're converting documents to Pdf. Please refer to the following articles:
How Aspose.Words Uses True Type Fonts
I work with Aspose as Developer evangelist.
You can set the font as follows:
TextFragment textFragment = new TextFragment("text");
textFragment.TextState.Font = FontRepository.OpenFont("C:\\WINDOWS\\FONTS\\SIMHEI.TTF");
I'm including a number of images as "Content" in my deployed XAP for Mango.
I'd like to enumerate these at runtime - is there any way to do this?
I've tried enumerating resources like:
foreach (string key in Application.Current.Resources.Keys)
{
Debug.WriteLine("Resource:" + key);
}
But the images aren't included in the list. I've also tried using embedded resources instead - but that didn't help. I can read the streams using Application.GetResourceStream(uri) but obviously I need to know the names in order to do this.
There is no API baked into WP7 that allows you to enumerate the contents of the XAP. You need to know the names of the content items before you can retrieve them.
There is probably some code floating around that can sniff out the Zip catalog in the XAP; however, I would strongly recommend that you don't bother. Instead, include a sensible resource such as an XML file or ResourceDictionary that lists them.
Having found no practical way to read the Content files from a XAP I build such a list at design time using T4.
See an example at https://github.com/mrlacey/phonegap-wp7/blob/master/WP7Gap/WP7Gap/MainPage.xaml.cs
This seems the right way to go as:
a) I'd rather build the list once at design time rather than on every phone which needs the code.
and
b) I shouldn't ever be building the XAP without being certain about what files I'm including anyway.
Plus it's a manual step to set the build action on all such files so adding a manual step to "Run Custom Tool" once for each build isn't an issue for me.
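The same idea of generating the list before the XAP is built can be sketched outside of T4. This is a hypothetical Python build step, not the code from the linked project: it scans a project folder for image files and produces a small XML listing that could be added to the project as Content. The element and attribute names (ContentFiles, File, Path) and the extension list are made up for the example.

```python
from pathlib import Path
from xml.etree import ElementTree as ET

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def find_images(project_dir):
    """Return image paths under project_dir, relative and with forward slashes."""
    root = Path(project_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


def build_manifest(names):
    """Serialize the names as an XML document with one File element per image."""
    root = ET.Element("ContentFiles")
    for name in names:
        ET.SubElement(root, "File", {"Path": name})
    return ET.tostring(root, encoding="unicode")
```

At runtime the app would read this one known file and then open each listed image with Application.GetResourceStream.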
There is no way to enumerate the files set as "Content".
However, there is a way to enumerate files at runtime, if you set your files as "Embedded Resource".
Here is how you can do this:
Set the Build Action of your images to "Embedded Resource".
Use Assembly.GetCallingAssembly().GetManifestResourceNames() to enumerate the resource names.
Use Assembly.GetCallingAssembly().GetManifestResourceStream(resName) to get the file streams.
Here is the code:
// Requires: using System.IO; using System.Reflection;
public void Test()
{
    foreach (String resName in GetResourcesNames())
    {
        // One stream per embedded resource; dispose it when done
        Stream s = GetStreamFromEmbeddedResource(resName);
    }
}
string[] GetResourcesNames()
{
    return Assembly.GetCallingAssembly().GetManifestResourceNames();
}
Stream GetStreamFromEmbeddedResource(string resName)
{
    return Assembly.GetCallingAssembly().GetManifestResourceStream(resName);
}
EDIT: As quetzalcoatl noted, the drawback of this solution is that the images are embedded in the DLL, so if you have a high volume of images, the app load time might take a hit.
I have a PDF file where every page is a (LZW) TIFF file. I know this because I created it. I want to be able to load it and save it as a bunch of TIFF files.
I can open the PDF file with CGPDFDocumentCreateWithURL, and get a page. I can even draw the page onto the screen.
What I WANT to do is draw the page into a bitmapContext, so that I can use CGBitmapContextCreateImage to get the image into a CGImageRef. However, in order to create a bitmap context, I need to know the size and resolution of the image. I can't seem to find out how to get either a CGPDFDocument or a CGPDFPage to tell me the resolution of the image object on that page.
Is there an easier way to do this that I'm not realizing?
thanks.
Ghostscript will work for you here:
gs -sDEVICE=tiff32nc -sOutputFile=foo-Page%d.tif foo.pdf
For a two-page document foo.pdf you should get:
foo-Page1.tif
foo-Page2.tif
From memory I think the output resolution from GS is that of the containing Page, not necessarily the resolution of the embedded file (unless these are the same to begin with).
If this is the case and you want to recover the image as it was originally res-wise, you can use iText (java) or iTextSharp(.net) to get to the image content stream (ie. Bytes) and write them out to disk in the format of your choice, after converting the content stream into a PdfImage iirc.
Hope the ghostscript option is applicable to save writing yet another utility...
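If you render the page yourself, the bitmap size follows from the page box and the resolution you choose, because the default PDF user-space unit is 1/72 inch. The sketch below is in Python for illustration and the function name is made up. It does not recover the resolution of the embedded image; it only sizes a bitmap for a resolution you pick. The page width and height in points are assumed to come from the page's media box (for example via CGPDFPageGetBoxRect).

```python
import math

POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch


def bitmap_size(width_pt, height_pt, dpi):
    """Pixel dimensions needed to render a page of the given size at dpi."""
    # Multiply before dividing so whole-number results stay exact.
    width_px = math.ceil(width_pt * dpi / POINTS_PER_INCH)
    height_px = math.ceil(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px
```

For example, a US Letter page (612 x 792 points) rendered at 300 dpi needs a 2550 x 3300 pixel bitmap. With Ghostscript, the output resolution can be set with the -r option, for instance -r300.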