Optical character recoginition - artificial-intelligence

I've to write a program which is able to recognize patterns, specially characters. I've implemented back-propagation in c# and now I want to use it for the pattern recognition. I've also created a form application and used brush/graphics so that user can write something with the help of mouse (just like 'pencil tool' in MS Paint). So I need some helping material about "How to implement character recognition method in my application?".
Helping stuff over the internet mostly related to back-propagation and software demos.

If your project is something else but you want to have OCR in your project, you should search for third party tools that do this. But if your project is this and you want to do that yourself, read this answer:
There are two ways of recognizing characters. Online and offline.
Online way uses the pen (or mouse) input data. and offline way uses just the pixels.
Your first step will be choose from one of them. offline way hasn't the pen data, this is a useful feature. but in offline, you can recognize characters from image files (created by paint and saved or even scanned)
Second, you should preprocess data (this step is for only offline way). you should remove noises from it, scale it, and do the Thinning to it.
Next, you should extract useful features from the preprocessed data (online or offline). for this, you can read some articles about optical character recognition and feature extractions of it. there is a good powerpoint presentation about preprocessing and feature extraction here. Also pdf keyword and filetype:pdf at the end of your search term in google would help you!
Then you should use neural networks or something like that to recognize the character. inputs should be extracted features.
But remember, this project is not easy and may take some time! (This was my project for Persian language)

Related

Recreating KaTeX emulation in La/TeX?

I'm working on a site that is using KaTeX for rendering math. However, the interface for entering the math content is (really) not ideal, so it is actually faster for me to work in an editor, like Sublime Text 3 and import the work; however, an issue I run into is that when I import, I discover various functions / environments aren't supported (i.e. emulated) by KaTeX.
If it were just me working on the material, I would simply learn as I go and consult the KaTeX documentation page; however, I have several contractors working on digitizing content who do not have access to the site (and I don't have the ability to give them access), and so cannot learn by trial-and-error. Instead, I end up with piles of documents that all need to be manually adjusted, to render as desired with KaTeX.
As such, I wanted to assemble a preamble for a LaTeX document that would recreate the abilities (i.e. functions and environments) KaTeX can emulate, and was wondering if such a preamble / package already exists? I have tried a few quick searches, but because I'm looking for something that imitates an emulator, I'm finding it tricky to find the right choice of words to get relevant results.
I wasn't sure if this were best posted here or on the TeX.se - I suspect it falls in-between the two - so I apologize if my guess was wrong and I should've tried there first. Any suggestions would be very much appreciated, as this is creating a substantial bottle-neck in my workflow but is also just outside of my ability to solve on my own.
Supported functions is one thing. To tackle that you might actually stand a fair chance of just tokenizing the input, looking for backslash name sequences and checking them against a list extracted from KaTeX sources to see which are supported.
I guess one could even try to remove all other functions from LaTeX. Or rather hide them, such that the user input can't access them but third party libraries can. Getting rid of language features (as opposed to macros) such as \def would probably be even harder. Better askn on the TeX stack exchange for details of you really want to follow this route.
As an alternative I guess you might be able to perform the check I described above in TeX. Write a macro which reads the current file as plain text instead of TeX source, to perform this analysis. Or some such. But a separate stand-alone tool would be much easier.
If you are going for a separate tool, you might as well write it in JavaScript for Node, and have it run KaTeX on the input. That way you can at least tell whether it will get typeset to something or error out.
Whether the rendering is what you expect from LaTeX may be another question. In general KaTeX aims to reproduce LaTeX behaviour, so any difference might indicate a bug. But bugs exist, so all of this might not avoid the need for checks. How about you just processing the math part of the input with KaTeX to some HTML which authors can check without access to the site?
As for existing tools or macro packages, I know of none, but tool or library questions are off topic on stack exchange anyway.

Generate a series of documents based on SQL table

I am trying to formulate a proposal for an application that allows a user to print a batch of documents based on data stored in a SQL table. The SQL table indicates which documents are due and also contains all demographic information. This is outside of what I normally do and am trying to see if these is a platform/application that already exists to do such a task
For example
List of all documents: Document #1 - Document #10
Person 1 is due for document #: 1,5,7,8
Person 2 is due for document #: 2.6
Person 3 is due for document #: 7,8,10
etc
Ideally, what I would like is for the user to be able to push a button and get a printed stack of documents that have been customized for each user including basic demographic info like name, DOB, etc
Like i said at the top, I already have all of the needed information in a database, I am just trying to figure out the best approach to move that information onto a document
I have done some research and found some people have used mail merge in Word or using Access as a front end but I don't know if this is the best way. I've also found this document. Any advice would be greatly appreciated
If I understand your problem correctly, your problem is two-fold: Firstly, you need to find a way to generated documents based on data (mail-merge) and secondly, you might need to print them two.
For document generation you have two basic approaches: template-based and programmatically from scratch. I suppose that you will opt for a template based approach which basically means that you design (in MS Word) a template document (Word, RTF, ...) that acts as a template and contains placeholders and other tags that designate »dynamic« parts of the document. Then, at document generation time, you need a .NET library/processor that you will pass this template document and the data, where the processor will populate the template with the data and return the resulting document.
One way to achieve this functionality would be employing MS Words' native mail-merge, but you should know that this would involve using Office COM and Word Application Automation which should be avoided almost always.
Another option is to build such a system on top of Open XML SDK. This is velid option, but it will be a pretty demanding task and will most probably cost you much more than buying a commercial .NET library that does mail-merge out-of-the-box – been there, done that. But of course, the good side here is that you will be able to tailer the solution to your needs. If you go down this road I recoment that you use Content Controls for tagging documents/templates. The solution with CCs will be much easier to implement than the solution with bookmarks.
I'm not very familliar with the open source solutions and I'm not sury how many there are that can do mail-merge. One I know is FlexDoc (on CodePlex) but its problem is that uses a construct (XmlControl) for tagging that is depricated in Word 2010+.
Then there are commercial solutions. Again I don't know them in detail but I know that the majority of them are a general purpose document processing libraries. Our company has been using this document generation toolkit for some time now and I can say it covers all our »template-based document generation« needs. It doesn't require MS Word at doc generation time, and has really helpful add-in for MS word and you only need several lines of code to integrate it in your project. Templating is very powerful and you can set-up a template in a very short time. While templates are Word documents, you can generate PDF or XPS docs as well. XPS is useful because you can use .NET/WPF prining framework that works with XPS docs to print documents. This is a very high-end solution, but of course, the downside here is that it is not a free solution.

Mapping without Google Maps (on a stand-alone server)

I've been asked to create a stand-alone site/app that's not connected to the web (all on a local server).
One part of it is to have a map of a natural reserve with a bunch of links that will show footpaths, different animals habitat areas, visitor centres and such.
So there's a map (static picture) and when you click on it some overlay goes on top of it.
At least that's the way I see it now.
I've looked here: http://www.carto.net/williams/yosemite/ but it just looks mucho ugly.
Getting Maps Premium is not an option as it's not that cheap. And the reason they don't want to use Maps/Earth free API is because internet connection is still very slow there (sattelite internet only and when optic cable will be hooked up nobody knows).
Looking for some recommendations as to how to proceed there. Drawing paths/areas on the picture of the maps seems extremely insufficient and time consuming.
I'd need some way to use coordinates to automatically draw areas and lines over the map (and then somehow export that as a graphis file (or SVG) that'll be layered on top of original map simply using ajax.
Will ARCGIS pro edition be the way to go or should I start learning SVG. Do you know some good SVG books/tutorials (as related to mapping)? Maybe there's some other way around altogether...
They do have detailed maps of the area in ARCGIS (whatever format they are in I don't know yet).
Just looking for some ideas, any help will be appreciated. Thanks in advance.
Do you know GeoServer? More or less all-in-one, compatible with different types of datasets, widely customisable.
Starting from "raw" SVG and write the whole thing yourself will probably be prohibitively time consuming.
If you have very little data (say less than 50 geometries) that is fixed, you could also use OpenLayers without any backend server.
For the data you could use a OpenLayers.Layer.Image if your (overlay-) map consists of a small raster image. For vector data, you can use OpenLayers.Layer.Text or a OpenLayers.Layer.Vecor together with protocols OpenLayers.Layer.KML or .JSON.
You can click through the current release examples.
I admit that this is not an easy task for a beginner, but it's fun hacking the maps together.

Creating Reports in Silverlight (either as PDF or send it off to a printer)

I have recently attempted to generate reports in Silverlight 4. In my problem domain, these reports either need to go directly to the printer and/or the client-side SL application creates a PDF and allows the user to store it somewhere.
As for the report, it's roughly composed of 50% flow text (incl. enumerations), 30% tables and 20% charts. The flow text part makes it slighty more challenging, as proper line breaking would have to take place.
So far, I have tried the following approaches - each with its own shortcomings that make them not so much feasible:
Silverlight's own PrintDocument: technically, there are two major concerns. For one, getting page breaks to work and printing UIElements on it with proper layout is a bit of a dirty hackjob and full of compromises; thankfully that's the part I've managed to get working so far. However, the PrintDocument class always renders all visuals as bitmaps before sending them off; this is not so much fun, if one uses a PDF printer and hopes to still be able to search in / select text. David Poll's approach in "Silverlight and Beyond" [1] wasn't that helpful as well as it inherently follows the same approach and thus suffers from very similar issues.
silverPDF [2]: a barely documented library that requires to do most of the layout manually (the former approach at least allowed me to re-use Silverlight's layouting engine). So far, I see no way to (for instance) measure paragraphs and the only sample with long flowtext uses hardcoded absolute values for layout rectangles. Also, the developing party seems to be inactive.
Personally, I'm now thinking of following an entirely different strategy: simply generate HTML documents. But I was hoping that the community here might have hints for the two approaches above or know other good approaches.
Thanks in advance,
~Manny
Do you need to generate the report on the client, or can you get the server to generate it? Your options are better if you can generate it on the server. Personally, I think the way Silverlight printing works at the moment is pretty poor for report usage (sending each page to the printer as raster rather than vector, resulting in potentially huge amounts of data travelling through the network, and lower printing quality output). I've found the best strategy is to generate the PDF on the server (enabling you to take advantage of a reporting engine), and display it in your application. There are also a few commercial products (such as Telerik's Silverlight Report Viewer, Report Sharp Shooter, or even First Floor Software's Document Toolkit). If a client side solution is really required, perhaps one of these might be the best option (although the printing quality will still be poor). Note that Silverlight 5 is supposed to have support for vector printing, but it's another 6 months or more away from release. Yet another option is Pete Brown and David Poll's open source reporting framework here: http://silverlightreporting.codeplex.com/.
If you want to take the option of generating the report on the server as a PDF and displaying it in your application, I've written an article on doing so here: http://www.silverlightshow.net/items/Building-a-Silverlight-Line-Of-Business-Application-Part-6.aspx. This doesn't work for OOB applications, but the source code accompanying my book (Pro Business Applications with Silverlight 4) does: apress.com/book/view/9781430272076.
Hope this helps...
Chris Anderson

Convert pcl to image

I'm communicating with a logic analyzer (HP 1660A) over RS232. I issue a command which tells the analyzer to print screen its display and send it over to the controller (my pc) through serial communication. I'm saving the result (which is usually abut 25kB) to my computer and I would like to view it as a TIFF or other format. The problem is that the response from the analyzer comes in PCL format, therefore suitable to be sent to a printer and printed directly, but not to be opened as an image. I have tried a few PCL to image converters to do the job, I found one which does it properly, however I've used the trial version and I am reluctant to purchase it. I've given you the background of my labour. I would appreciate any kind of help, a reference to the commands in pcl 1 and what should I do in order to extract the data and format it properly from the PCL file. I have no experience with PCL and image processing whatsoever, so please, give me a hand here. Thank you.
P.S. I've obtained the PCL file from the analyzer, both in C# and matlab... I have one slight problem in C# with the serial port control, some images have some uninterpreted characters in the image, when using the above converters. I say all these because I need an algorithm or some indications, no matter the programming language, so please feel free to post.
PCL is complex to read. There are only a handful of tools out there that do a good job of this. We have lots of PCL expertise and still often look to other to supply conversion to PDF and other formats. If the PCL is quite simple, that is, just text, a few fonts, and a graphic or two, a couple of RegEx commands could deal with the extraction of the text and then you could mock up a new document using whatever tools you wish.
Looking at these files in stackoverflow might be tough. If you can get them on an ftp and post a link I can take a quick look and post my findings/thoughts here. The other option is to look to an outside tool. There are a few we've had success with. Our needs are broad so I've settled on one that works the best with many different PCL streams (some PCL coding is better than others). As you are dealing with a known quantity of PCL you may have a few options. Here are a few we've used and had some success with (in order of usefulness to us)
PCLWorks by PageTech (they have a GUI viewer and complete SDK)
VeryPDF PCL Converter (command line tool)
SwiftView
There are others, and even an opensource variant of Ghostscript that handles PCL (we've never had much luck as the PCL we use often contains very custom fonts, symbol sets, and tons of macros which seem to choke it.
GhostPCL
EDIT: Most recently we've been working with LincPDF (http://www.lincolnco.com/). This is also an excellent product with has one big benefit, deployment is simple. Some of the other tools have complex software installations. This solution is very easy for us to deploy as a feature in an application. It's also faster then any tools we've tested to date (at least with the PCL that we generate from our apps which is quite complex as they include specialized fonts and macros).
According to the spec sheet for the HP 1660 (pdf) series can send the TIFF,PCX and postscript.
Wouldn't it be easier to use TIFF?
The project was put on hold for a while, but I would like to offer a complete and usable solution.
#Adrian
You can save the image to a floppy disk, I've done that, saved it as TIFF and everything worked fine. Unfortunately, it sends only PCL through RS232. The idea to save the print screen over serial communication was to avoid using too much the floppy disk, which the device uses in order to boot.
#Douglas
Thank you for your elaborate answer. I'll take a look at the indicated tools, however, my desire is to offer a complete front-end solution, which yields directly the graphic. I've put some files from my tests here in order to see the complexity of the PCL constructions. Do you have any knowledge of a possible API that I could integrate into my application, which can parse the file and interpret the PCL?
Regards,
Cosmin
We capture the serial input via a serial spooler that watches COM1:. It's called SSpool.exe. It redirects the PCL as input to PCLXForm. PCLXForm converts it into any raster format (TIFF, JPG, PDF, BMP, etc.) However, we can also extract the text during the conversion and we can extract individual raster objects from the PCL for re-arrangement in the downstream application. Our pricing model is positioned for licensee's that need to convert up to 50,000 pages of invoices into indexed PDF's per month. However, this type of application normally requires a custom license in order to get our pricing down to the level required. In order to do so, we often have to restrict our product to convert unlimited files, but only up to the 20th page within any one PCL print file. That provides enough page volume and gives us the ability to reduce the pricing per unit. To demo, you would need the PCLTool SDK.

Resources