OCR batch processing tiff to text - batch-file

I need to batch convert 50,000 TIFFs into 50,000 corresponding txt files. I am aware of ABBYY FineReader and some other software that may be able to do this, but a free solution would be best. I have also been researching Tesseract. Is anyone aware of a script or program that uses Tesseract to do this automatically with good-quality output?
Thanks in advance

For a free solution with Tesseract, here's a straightforward command line batch file. Change the variable contents and/or create folders as necessary:
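:Start
@Echo off
Rem Source files, output folder (create it first if needed) and Tesseract location
Set _SourcePath=C:\tifs\*.tif
Set _OutputPath=C:\txts\
Set _Tesseract="C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
:Convert
Rem Both paths are quoted so file names with spaces work; Tesseract adds .txt itself
For %%A in (%_SourcePath%) Do Echo Converting %%A...&%_Tesseract% "%%A" "%_OutputPath%%%~nA"
:End
Rem Clear the helper variables again
Set "_SourcePath="
Set "_OutputPath="
Set "_Tesseract="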
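With 50,000 files it can help to be able to resume an interrupted run and to collect the files that failed. The following is only a rough sketch of the same loop in Python rather than batch; the function names `build_command` and `convert_all` are made up for illustration, and it assumes `tesseract` is on the PATH (otherwise pass the full path to the executable).

```python
import subprocess
from pathlib import Path


def build_command(tesseract: str, tif: Path, out_dir: Path) -> list[str]:
    """Build the same call as the batch loop: tesseract <input> <output base>."""
    # Tesseract appends ".txt" to the output base name on its own.
    return [tesseract, str(tif), str(out_dir / tif.stem)]


def convert_all(src_dir: Path, out_dir: Path, tesseract: str = "tesseract") -> list[Path]:
    """Convert every *.tif in src_dir and return the files that failed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for tif in sorted(src_dir.glob("*.tif")):
        # Skip files that already have a .txt so an interrupted run can resume.
        if (out_dir / (tif.stem + ".txt")).exists():
            continue
        result = subprocess.run(build_command(tesseract, tif, out_dir),
                                capture_output=True)
        if result.returncode != 0:
            failed.append(tif)
    return failed


# Example call, using the folders from the batch file above:
# failed = convert_all(Path(r"C:\tifs"), Path(r"C:\txts"))
```

Passing the command as a list means paths with spaces need no extra quoting.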

In my opinion, Tesseract is going to give you the best results, whether you're looking at free solutions or not.
If you figure out how to convert one file, and then you post back the command you use, it will be easy to hack a batch script together to process multiple files.

Take a look at VietOCR, a Java/.NET frontend for Tesseract; its function seems to suit your need.

Related

Error "Invalid path 0 File(s) copied" using xcopy in batch file

I want to start with a little disclaimer:
I read this thread on a similar issue to mine, but it seems like the solution doesn't work. It might just be me not understanding it completely but here I am asking for clarification.
My goal is to copy a shortcut to the start menu programs folder conserving all of its attributes, icon and start in value. I thought making a copy would be simple but it seems like my brain can't understand anything today.
So here's the actual xcopy argument:
@echo off
xcopy "%~dp0\file.lnk" "%userprofile%\Start Menu\Programs\file.lnk\" /p /v /f
pause
I have tried every combination of adding/removing the file name, with/without the \ at the end and any combination of both... I also tried running the batch file as administrator just in case.
The @echo off is just a habit, and the pause is there so I can read any error messages that pop up. I also added the extra arguments to the xcopy line to try to get more information, but they don't seem to help much.
I'm starting to think the issue is completely isolated from the other thread.
As suggested by SomethingDark, changing the path from %userprofile%\Start Menu\Programs\ to %AppData%\Microsoft\Windows\Start Menu\Programs\ fixed my issue.

How to edit and read specific lines of a text document with a Batch script

I am writing a Batch text adventure at the moment, and I am attempting to find a way to make a world save and also save the various bits of armor and the statistics they have onto a text document. I am at a loss for how to do this, and I was wondering if anyone could help.
By the way, I mean the actual Batch scripting language, on its own. I do not mean things like PowerShell or VBScript, although, if Batch turns out not to be worth it, I might switch to one of them for continued development.
Thanks!
It depends on how you have the script set up. If you keep the info in variables like %armor% or whatever, you can set an out variable for the save folder and redirect echo into a file there. Here is an example:
@Echo Off
rem Read the armor value from the player
set /p armor=
rem Folder the save file goes into
Set "out=C:\users\*your login name*\Desktop"
rem Everything echoed inside the parentheses is written to the file
> "%out%\YourFileName.txt" (
echo %armor%
)
Then whatever the armor variable is set to will be put in the text document. I'm not sure if this is sufficient, as I am fairly new to messing with scripts myself, but I hope it helps!

Batch command double extension fix

Hi everybody, I would like to ask how I can remove a mistakenly added second file extension using a batch script.
E.g. "test.aac.m4a" -> "test.m4a"
So the last extension is the right one, which I want to keep.
But this is ONLY the case for
.aac.m4a
-> .m4a
and
.m4a.aac
-> .aac
I know some batch scripting but
ren *.aac.m4a *.m4a
Won't work :(
Another thing worth mentioning would be that these double extensions come from my music software MusicBee.
I use mp4box on m4a files to extract the raw AAC stream from the m4a container so I can edit it in other software.
Currently the syntax is:
mp4box.exe -raw 1 "<URL>" -out "<URL>".aac
The "<URL>" is the variable MusicBee will replace with the file url.
But this will add the .aac extension after the .m4a, and I have no idea how to replace it instead (the same happens again when I repack the files: ".aac -> .aac.m4a").
As far as I know, MusicBee just replaces the variables and launches the batch code when activated, so I think other batch code will work too.
Is it possible to prevent this double extension from even developing?
As always, ANY help is appreciated!!
Thanks, Daniel
As for preventing the double extension from developing, my guess is that the file extension is not being appended the right way, but I am not at all familiar with MusicBee.
As for creating batch files that do what you want, I've used Advanced File Renamer in the past for all sorts of renaming patterns such as yours. It's freeware too! The program has a fairly advanced feature that lets users write custom scripts in JavaScript (see its user guide), and it can even generate batch scripts that do special renaming (as you've noted in the comments) for your specific use case.
For less advanced users, the program has a GUI that makes batch renames easy.
Best of all, if you, like me, avoid installing third-party software just to do one thing, the program has a portable mode that won't hook itself into your system, which is always good.
Read the manuals and guides there for more information. My answer might sound a little too much like advertising for it, but that's only because it's helped me so much in renaming my music a long time ago.
Here's a screenshot by the OP, RapidFireArts that shows how OP used the software to remove the second file extension.
It ought to be possible, but I have no idea how to work with your software to prevent the double extensions from occurring in the first place. But it is fairly easy to strip off the unwanted middle "extension".
If you know for a fact that none of your .m4a or .aac files are supposed to have multiple dots, then you could simply do the following:
ren *.m4a ???????????????????????????????????????????.m4a
ren *.aac ???????????????????????????????????????????.aac
Just make sure you have enough ? wildcards to match the longest name in your folder. See How does the Windows RENAME command interpret wildcards? for an explanation of why this solution works.
But sometimes file names legitimately have additional dots prior to the actual extension. If this is your case, then the following batch script will remove only the unwanted .m4a and .aac middle "extensions"
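@echo off
rem List only the files that carry one of the two double extensions
for /f "eol=: delims=" %%A in ('dir /b /a-d *.m4a.aac *.aac.m4a') do (
rem %%~nA is the name without the last extension; %%~nB strips the middle one as well
for %%B in ("%%~nA") do ren "%%A" "%%~nB%%~xA"
)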
Another option is to use my JREN.BAT regular expression file renaming utility. JREN.BAT is a hybrid JScript/batch script that runs natively on any Windows machine from XP onward. Ideally, the script should be placed within a folder that is included within your PATH. I like to use c:\utils for all of my non-standard utilities.
Once you have JREN.BAT, then all you would need would be
call jren "\.(m4a|aac)(?=\.(m4a|aac)$)" ""
Provided you understand regular expressions, and you take the time to read the built-in JREN help, then there are many wondrous things you can do with the utility. The help is accessed by issuing jren /? from the command line. You might want to use jren /?|more if you have not configured your console window to have a large buffer that enables scrolling to see prior output.
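To see what the regular expression in the JREN call matches, the same pattern can be tried out in a few lines of Python. This is only an illustration of the pattern, not a replacement for JREN; the helper name `strip_middle_extension` is made up here, and the match is case-sensitive as written.

```python
import re

# Same pattern as in the JREN call: a ".m4a" or ".aac" that is immediately
# followed by a final ".m4a" or ".aac" extension (the lookahead keeps that
# final extension out of the match, so only the middle one is removed).
MIDDLE_EXT = re.compile(r"\.(m4a|aac)(?=\.(m4a|aac)$)")


def strip_middle_extension(name: str) -> str:
    """Return the file name with the unwanted middle extension removed."""
    return MIDDLE_EXT.sub("", name)


for example in ("test.aac.m4a", "test.m4a.aac", "my.song.m4a"):
    print(example, "->", strip_middle_extension(example))
```

Names with other dots before the real extension, such as my.song.m4a, are left untouched because the dot-separated part in front of the extension is neither m4a nor aac.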
I use File Renamer Basic.
http://download.cnet.com/File-Renamer-Basic/3000-2248_4-10306538.html
It's Free

Batch incrementation from external file

I have a script that creates new tags in an SVN repository and adds some files. I want to automate this task, so I would like to find a way to increment the tag name automatically, from 1.0 to X.0.
I thought about a conf file that would contain "1.0" as the first version number and would be overwritten on each call to the script. But I'm not sure I can get the "1.0" value from the file and then increment it in my script.
Any help would be really appreciated.
Thanks in advance
Don't create a seed configuration file. Instead, let the batch script default to 1.0 if file does not exist.
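@echo off
setlocal
set "conf=version.conf"
rem No conf file yet: start at major version 1, otherwise read the major number and add 1
if not exist "%conf%" (set version=1) else (
for /f "usebackq delims=." %%N in ("%conf%") do set /a version=%%N+1
)
rem Append the minor part and store the new version for the next run
set "version=%version%.0"
(echo %version%)>"%conf%"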
I'm assuming you will never run this process multiple times in parallel - it can fail if you do run in parallel. Modifications can be made to lock the conf file so you can run in parallel if need be. See the accepted answer to how to check in command line if given file or directory is locked, that it is used by a process? for more info.
Take a look at keywords in Subversion using autoprops.
First, set up Subversion to honor keyword expansion
enable-auto-props = yes
[auto-props]
version.txt = svn:keywords=Revision
Then, set up a simple file, let's call it version.txt, with the $Revision$ keyword (keyword anchors are case-sensitive) and some random content.
$Revision$
Random content
Then, in your batch file, recreate the version.txt file with new random content
echo $Revision$ >version.txt
echo %random% %date% %time% >>version.txt
and check in this new file every time your batch file is run, so it will become
$Revision: 32 $
4214 Mon 21/01/2013 15:53:27,62
This way, Subversion will keep an accurate version number of all the runs of the batch file, even across multiple clients running simultaneously.
You might then extract and use the revision number from version.txt with code similar to
for /f "tokens=1,2" %%a in (version.txt) do (
if "%%a"=="$Revision:" (
echo Revision number is %%b
echo do something with %%b, create %%b tag or whatever
)
)
Since you don't say what language you want to use only general remarks can be given:
It certainly is possible to maintain a small 'version' file holding the 'dotted version number', something like 0.2.6 maybe. That file's content can be read by any process. You should implement a small collection of methods to split that content into its numerical tokens (major and minor version and the like). Those numerical values can be processed by any mathematical function you like; for example, you can increment them. Another method would be some 'implode' function that takes the numerical tokens and creates a 'dotted version number' again (now maybe 0.2.7...), and finally you can write that information back into the file. It certainly makes sense to allow an argument that controls which part of the version should be incremented.
Such a scheme is not really efficient, but often sufficient.
Note that such an approach will only work if you can guarantee that only a single process accesses that version file at any time. Otherwise multiple processes might overwrite each other's results, which is certain to cause problems.
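The split, increment and 'implode' steps described above could look roughly like this in Python. It is a sketch, not a finished tool: the function names and the `version.conf` file name used in the example call are illustrative, the default of 1.0 for a missing file mirrors the batch answer above, and resetting the lower-order parts to 0 is an assumed convention.

```python
from pathlib import Path


def split_version(text: str) -> list[int]:
    """Split a dotted version such as "0.2.6" into its numeric tokens."""
    return [int(part) for part in text.strip().split(".")]


def join_version(parts: list[int]) -> str:
    """The 'implode' step: turn numeric tokens back into a dotted version."""
    return ".".join(str(part) for part in parts)


def bump(text: str, index: int = -1) -> str:
    """Increment one part of the version; index 0 is major, -1 the last part."""
    parts = split_version(text)
    index %= len(parts)
    parts[index] += 1
    # Assumption: parts to the right of the incremented one restart at 0.
    for i in range(index + 1, len(parts)):
        parts[i] = 0
    return join_version(parts)


def bump_file(path: Path, index: int = -1, default: str = "1.0") -> str:
    """Read the version file, bump it, write it back; start at default if missing."""
    new = bump(path.read_text(), index) if path.exists() else default
    path.write_text(new + "\n")
    return new


# Example call: bump_file(Path("version.conf"), index=0) yields 1.0, then 2.0, ...
```

As noted above, this is only safe while a single process touches the file.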
As an alternative, maybe a more elegant one, you might consider treating the Subversion repository itself as the seed storage for your version number: instead of reading a special file's content (what if that file is deleted or something else happens?), make a request to the tags folder inside Subversion. It should contain all previously tagged versions, so that is precisely the information you want. Take all version numbers, sort them, take the highest one and process it as above.
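Picking the highest tag needs a numeric comparison, because sorted as plain text "10.0" comes before "2.0". A small Python sketch, assuming the tag names have already been fetched (for example from the output of `svn list` on the tags folder, which prints directory names with a trailing slash); the function name `highest_version` is made up for illustration.

```python
def highest_version(tags):
    """Return the highest dotted version among tag names as a tuple, or None."""
    versions = []
    for tag in tags:
        name = tag.strip().rstrip("/")  # drop the trailing slash of directories
        try:
            versions.append(tuple(int(part) for part in name.split(".")))
        except ValueError:
            continue  # ignore entries that are not purely numeric versions
    # Tuples compare element by element, so (10, 0) ranks above (2, 0).
    return max(versions, default=None)


latest = highest_version(["1.0/", "2.0/", "10.0/", "notes/"])
print(latest)
```

The resulting tuple can then be incremented and joined with dots to form the next tag name.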

Help with Batch Files?

What are batch files useful for? They just seem to be used to make viruses and other things...but it seems like shell scripting to me.
What are the uses for batch files?
From Batch file article at Wikipedia:
Batch files are useful for running a sequence of executables automatically and are often used by system administrators to automate tedious processes. Unix-like operating systems (such as Linux) have a similar type of file called a shell script.
A simple example:
for /f "tokens=* delims=" %%i in ('dir /s /b /a:d *svn') do ( rd /s /q "%%i")
If you save the above line in a file called ClearSVNFolders.bat and then double-click it, you'll delete every folder whose name ends in svn (for example .svn) under the folder the batch file is in...
You automated the whole process. You could easily spend hours doing the above task if you had a deep root directory, that is, one containing thousands of folders. :)
Batch files are the Windows equivalent of a Unix shell script. So you can use them to automate things.
You could use them for shell scripting. :-P
Of course, they kind of suck at that, compared to bash (or perl/python/tcl). But if you're on Windows, it's a one-horse race unless you want to install cygwin or msys and battle with Unix/Windows incompatibilities.
Batch files are extremely useful, and they are super easy to learn as well. You can make them do things on startup. For example, if a program keeps opening itself and won't close even from Task Manager, you can force it to shut down without warning.
Or you could make games and interactive things, like I like to do.
I have a messenger that I made with fully customizable colors, and accounts with account management and servers.
But you probably don't trust me enough to download it.
But yeah, they are pretty useful.
Batch file is "a computer file containing a list of instructions to be carried out in turn."
We have been taught since childhood that a computer is a dumb machine, and this is a method of instructing that machine.
For example:
If you want to instruct the system to create a folder with a random name, then type:
@echo off
md %random%
Creating batch files enables you to execute several lines of CMD commands from a single file.
For example :-
@echo off
md %random%
tasklist
Pause
The entire purpose of a Batch script is to execute several DOS commands in sequence:
echo Hello!
set var=7
echo I just made var=%var%!
pause
It was invented in MS-DOS so that users could easily run the things they did all the time. The most notable example is "AUTOEXEC.BAT", which ran once the command interpreter started; people would add things like:
echo Welcome to my computer!
or
cd C:\Games\
To make it quicker to access their games or whatever they needed.

Resources