How to scrape data from an obscure Windows 7 program? - database

I have the task of scraping the data from a piece of software written especially for a small charity. I have attached a screenshot (with identifying information blurred out) below:
The data is in a table format, with too much information to contain in a single screen capture. What options do I have to 'scrape' the data? Ideally the method would preserve the table format.

For those who may find it useful, I found several simple and relatively cheap data scraping tools. I went with 'Screen Scraping From Windows Applications Software' from Sobolsoft, a Windows 7 compatible tool that exports data directly as an Excel file.

Related

Print PDF programmatically - C# WinForms

I need to print an SSRS report in PDF format from a WinForms application written in C#. The report is a PDF document (containing text, images & tables) held in a byte array, and I don't want to save it to disk for security/performance reasons. The requirements for printing are that it needs to be done:
- in the fastest way possible
- with no user interaction
- without the need to install anything on the client machine (we can't rely on any Adobe products being installed)
- third-party libraries can be used, as long as they can be installed together with the application
I came to 2 potential solutions:
1. using MigraDoc - but I can't find a way to load and print an existing file, only a newly created PDF file, or one already saved to disk
2. sending the PDF directly to the printer, using "PDF Direct Print"/PCL/etc. This seems to be the fastest option, but I haven't implemented it yet, and it seems to not be supported by all printers.
Does anybody have any suggestions on how to implement the options above, or any other options which meet the requirements?
MigraDoc cannot print PDF files, so one of your potential solutions is void.
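For the second option in the question (sending the PDF bytes straight to the printer), here is a minimal sketch in Python rather than C#. It assumes a network printer that interprets PDF natively and accepts raw jobs on TCP port 9100; as the question notes, not all printers do. The function name and the printer address in the usage note are hypothetical.

```python
import socket


def send_raw_to_printer(data, host, port=9100, timeout=10.0):
    """Send a byte array to a printer's raw TCP port without touching disk.

    Assumption: the printer understands the payload (e.g. PDF direct print)
    and listens for raw jobs on `port`. Returns the number of bytes sent.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(data)
    return len(data)
```

Usage would look like `send_raw_to_printer(pdf_bytes, "192.0.2.10")`, where the address is a placeholder. A printer without PDF support will typically print the raw bytes as text or reject the job, so this only fits environments where the printer models are known.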

Display data from excel in a web page

I am a complete newbie when it comes to web page design, and what I am trying to achieve is a web page that I can display on a wall-mounted screen as an office dashboard. I have data in Excel that is constantly being updated (on the server) and I want to be able to summarise this and display it (e.g. total orders etc.) for staff to see. Therefore the web page needs to be able to connect to the data source and update itself every few minutes. I am hoping to then use Ubuntu or even a Raspberry Pi to drive the dashboards.
Can anybody point me towards either some clear instructions on how to achieve this, or better still some sample files that will help me see how it's done?
Really appreciate any help!!
If you want to use PHP, you can read your Excel files with PHPExcel. If you only want to display the information as is, you can output it directly to HTML; the library also has an API for manipulating the Excel file, which you can use to summarize your data. If you need something cleaner, you may want to use Windows instead of Linux: on Windows you can use an Excel file as a data source, and there are third-party products that can treat it as if it were a database (using SQL queries to retrieve data).
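Whatever reads the spreadsheet, the "update itself every few minutes" part can be handled by the page itself with a meta refresh tag. Here is a minimal sketch in Python rather than PHP. It assumes the order rows have already been loaded from the Excel file by some library (not shown); the `summarise` and `render_dashboard` names and the `value` field are hypothetical.

```python
from html import escape


def summarise(orders):
    """Reduce a list of order rows (dicts with a 'value' key) to headline figures."""
    return {
        "Total orders": len(orders),
        "Total value": sum(order["value"] for order in orders),
    }


def render_dashboard(summary, refresh_seconds=300):
    """Build a static HTML page that asks the browser to reload itself periodically."""
    rows = "\n".join(
        f"    <tr><th>{escape(label)}</th><td>{value:,}</td></tr>"
        for label, value in summary.items()
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        f'  <meta http-equiv="refresh" content="{refresh_seconds}">\n'
        "  <title>Office dashboard</title>\n"
        "</head>\n<body>\n  <table>\n"
        f"{rows}\n"
        "  </table>\n</body>\n</html>\n"
    )
```

A scheduled job on the server could regenerate the page every few minutes and write it where the web server can serve it. The browser on the Ubuntu or Raspberry Pi machine then only needs to keep that one page open.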

SSRS 2008 R2 - Excel output not formatting to page size

I have a batch of reports that are set up to print very nicely in landscape on an A4 page. But when I set the default format to Excel, the resulting spreadsheet, when printed without changing anything in the print setup, is wider than an A4 page, so of course it gets broken up over multiple pages (i.e. each page is 2 pages wide rather than 1).
Most of our users just want to print these as soon as they arrive via email (but they still want Excel format so they can re-sort, cut and paste, etc) so how can I make Excel keep the print format defined in the report in SSRS so the users don't have to mess about with print settings? (These are daily reports so this is driving our users mad as some of them may get 4 or 5 reports!)
Do I have to use an Excel template (can this even be done?) or is there a way to achieve what I want via SSRS?
TIA for any help....
Mike
The short answer is that you can't exactly do what you want with the Excel renderer. Some workarounds that come to mind:
- Filling an Excel template with data might be an option, but is more of a job for SSIS, not reporting services.
- Send the report in PDF for printing, and if needed in Excel as well.
- Re-layout the report so it plays well with the default printing of Excel. This won't be very pretty, you'd need to either make columns much smaller (and perhaps rotate headers using the WritingMode property) or turn columns into row groups somehow.
- (hack warning!) create an Excel macro or something alike for your users, that does some printing-quick-fixes.
Some background
Unfortunately SSRS gives you only a small bit of control over how the report is rendered in the various rendering extensions. There's this MSDN page on rendering extensions (additional emphasis mine) with some useful info:
Soft page-break renderers: Soft page-break renderers maintain the report layout and formatting. The resulting file is optimized for screen-based viewing and delivery, such as on a Web page. The available soft page-break renderers are: Microsoft Excel, Microsoft Word, Web archive (MHTML), and HTML.
Hard page-break renderers: Hard page-break renderers maintain the report layout and formatting. The resulting file is optimized for a consistent printing experience, or to view the report online in a book format. The available hard page-break renderers are supported: TIFF and PDF.
So, if you want to optimize for printing experience, you should probably use the PDF export. You can then play around with the page size and margins to fit as much info as possible on a page, and let the client program (probably Adobe Reader) worry about printing it nicely.

DB objects relations visualization

I'm no DBA guru, so I'll try to explain what I want in the terms I imagine it.
I have an Oracle DB with network devices. Each device has ports, and each port has a parent device/port.
I want a tool which will automatically create a visual map of these device relations,
i.e. a "Network Map" based on these relations.
It would be better if the tool had some output ready for web publishing, or were web-based from the beginning. Also, it should automatically update the "picture" as soon as I add a new relation/object.
From afar it looks something like Gource http://youtu.be/E5xPMW5fg48
But that's not exactly what I need.
I hope to get some suggestions.
Thanks in advance!
UPD: found another tool: Gephi
You could try graphviz. It was created specifically for visualising large graphs of network nodes.
It's not out of the box; you'll have to write some code that:
1. Reads data on the devices & their relationships
2. Creates the graphviz input file
3. Generates the diagram by calling the graphviz binary.
There are many ways to do that. One of the easiest is to use python with the pydot library.
Note that graphviz generates static images (jpeg / tiff etc.) so you'd have to regenerate on demand.
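The first two steps can be sketched without any library, since the graphviz input is plain text in the DOT language. Here is a minimal sketch in Python. It assumes the port relations have already been queried from Oracle as (child, parent) pairs; the function names and the sample port names are hypothetical.

```python
def quote(name):
    """Quote an identifier for DOT, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def links_to_dot(links, graph_name="network"):
    """Turn (child, parent) pairs into a DOT digraph with parent -> child edges."""
    lines = [f"digraph {quote(graph_name)} {{"]
    for child, parent in links:
        lines.append(f"    {quote(parent)} -> {quote(child)};")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample data standing in for rows read from the database.
    sample = [("switch1:port1", "core:port3"), ("switch2:port1", "core:port4")]
    with open("network.dot", "w", encoding="ascii") as handle:
        handle.write(links_to_dot(sample))
```

The third step is then a call to the graphviz binary, for example `dot -Tpng network.dot -o network.png`. Rerunning the script and that command whenever the data changes regenerates the picture.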
There are more interactive toolkits available, e.g. protovis / infovis. Both are javascript based and render directly in the browser.
hth.

Convert pcl to image

I'm communicating with a logic analyzer (HP 1660A) over RS232. I issue a command which tells the analyzer to take a print screen of its display and send it to the controller (my PC) over the serial link. I save the result (usually about 25 kB) to my computer and would like to view it as a TIFF or other image format. The problem is that the response from the analyzer comes in PCL format, so it is suitable for sending to a printer and printing directly, but not for opening as an image. I have tried a few PCL-to-image converters; I found one which does the job properly, but I've only used the trial version and I am reluctant to purchase it. That's the background of my labour. I would appreciate any kind of help: a reference to the commands in pcl 1, and what I should do in order to extract the data from the PCL file and format it properly. I have no experience with PCL or image processing whatsoever, so please give me a hand here. Thank you.
P.S. I've obtained the PCL file from the analyzer, both in C# and matlab... I have one slight problem in C# with the serial port control, some images have some uninterpreted characters in the image, when using the above converters. I say all these because I need an algorithm or some indications, no matter the programming language, so please feel free to post.
PCL is complex to read. There are only a handful of tools out there that do a good job of this. We have lots of PCL expertise and still often look to others to supply conversion to PDF and other formats. If the PCL is quite simple, that is, just text, a few fonts, and a graphic or two, a couple of RegEx commands could deal with the extraction of the text and then you could mock up a new document using whatever tools you wish.
Looking at these files in stackoverflow might be tough. If you can get them on an ftp and post a link I can take a quick look and post my findings/thoughts here. The other option is to look to an outside tool. There are a few we've had success with. Our needs are broad so I've settled on one that works the best with many different PCL streams (some PCL coding is better than others). As you are dealing with a known quantity of PCL you may have a few options. Here are a few we've used and had some success with (in order of usefulness to us)
- PCLWorks by PageTech (they have a GUI viewer and complete SDK)
- VeryPDF PCL Converter (command line tool)
- SwiftView
There are others, and even an open-source variant of Ghostscript that handles PCL (we've never had much luck with it, as the PCL we use often contains very custom fonts, symbol sets, and tons of macros which seem to choke it):
- GhostPCL
EDIT: Most recently we've been working with LincPDF (http://www.lincolnco.com/). This is also an excellent product which has one big benefit: deployment is simple. Some of the other tools have complex software installations. This solution is very easy for us to deploy as a feature in an application. It's also faster than any tool we've tested to date (at least with the PCL that we generate from our apps, which is quite complex as it includes specialized fonts and macros).
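To illustrate the kind of parsing involved when the PCL is simple, here is a minimal Python sketch that pulls monochrome raster rows out of a PCL stream and writes them as a PBM image. It relies on the PCL 5 raster commands `ESC*b#M` (set compression mode) and `ESC*b#W` (transfer one row of `#` bytes). It assumes a single-plane dump with unencoded rows (mode 0) and has not been tested against output from the HP 1660A. The function names are hypothetical.

```python
import re

ESC = 0x1B
# One "value + letter" pair inside a parameterized PCL escape sequence.
_PARAM = re.compile(rb"([+-]?\d+)?([A-Za-z])")


def extract_raster_rows(pcl):
    """Return the raw bytes of every row sent with ESC*b#W in mode 0."""
    rows, mode, i, n = [], 0, 0, len(pcl)
    while i < n:
        if pcl[i] != ESC or pcl[i + 1:i + 3] != b"*b":
            i += 1  # not a raster-transfer sequence; keep scanning
            continue
        i += 3
        while i < n:  # lowercase letters chain further parameters
            match = _PARAM.match(pcl, i)
            if match is None:
                break
            value = int(match.group(1) or 0)
            letter = match.group(2).decode("ascii")
            i = match.end()
            if letter.upper() == "M":
                mode = value
            elif letter.upper() == "W":
                if mode != 0:
                    raise ValueError(f"compressed rows (mode {mode}) not handled")
                rows.append(pcl[i:i + value])
                i += value  # skip the binary row data
                break
            if letter.isupper():
                break
    return rows


def rows_to_pbm(rows):
    """Pack rows into a binary PBM (P4): 1 bit per pixel, MSB first, 1 = black."""
    width_bytes = max((len(row) for row in rows), default=0)
    header = f"P4\n{width_bytes * 8} {len(rows)}\n".encode("ascii")
    return header + b"".join(row.ljust(width_bytes, b"\x00") for row in rows)
```

Writing the result of `rows_to_pbm(extract_raster_rows(data))` to a `.pbm` file gives an image that common image tools can convert to TIFF. Text, fonts, macros and compressed rows are ignored here, which is where the dedicated converters listed above earn their keep.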
According to the spec sheet (pdf), the HP 1660 series can send TIFF, PCX and PostScript.
Wouldn't it be easier to use TIFF?
The project was put on hold for a while, but I would like to offer a complete and usable solution.
@Adrian
You can save the image to a floppy disk. I've done that, saved it as TIFF, and everything worked fine. Unfortunately, it sends only PCL through RS232. The idea of saving the print screen over serial communication was to avoid overusing the floppy disk, which the device uses in order to boot.
@Douglas
Thank you for your elaborate answer. I'll take a look at the indicated tools, however, my desire is to offer a complete front-end solution, which yields directly the graphic. I've put some files from my tests here in order to see the complexity of the PCL constructions. Do you have any knowledge of a possible API that I could integrate into my application, which can parse the file and interpret the PCL?
Regards,
Cosmin
We capture the serial input via a serial spooler that watches COM1:. It's called SSpool.exe. It redirects the PCL as input to PCLXForm. PCLXForm converts it into any raster format (TIFF, JPG, PDF, BMP, etc.). However, we can also extract the text during the conversion, and we can extract individual raster objects from the PCL for re-arrangement in the downstream application. Our pricing model is positioned for licensees that need to convert up to 50,000 pages of invoices into indexed PDFs per month. However, this type of application normally requires a custom license in order to get our pricing down to the level required. In order to do so, we often have to restrict our product to convert unlimited files, but only up to the 20th page within any one PCL print file. That provides enough page volume and gives us the ability to reduce the pricing per unit. To demo, you would need the PCLTool SDK.
