Apply batch OCR through command line - batch-file

I am totally new to batch scripting for cmd (Windows).
I have installed tesseract to work as a command line OCR tool.
Now I would like to run OCR on 100 images that I have stored in a folder.
How can I do it with batch ?
The command to run tesseract on an image and return the OCR text in a text file is:
"C:\OCR\tesseract" "C:\Image_to_OCR.jpg" "C:\out"
More information: http://chillyfacts.com/convert-image-to-text-using-cmd-prompt/
As you can see, I would probably need a for loop which automatically iterates through the pictures and changes the name of the picture in the command accordingly, and of course also the output name of the text file, but I don't know how to do it.
Any help would be very appreciated !
EDIT:
As suggested in the answer by Stephan, I could write:
for %%A in (C:\*.jpg) do "C:\OCR\tesseract.exe" "%%~fA" "C:\out"
However, the command line window (cmd) only appears briefly, closes immediately, and nothing happens.
My files are not directly in C:\ but in "C:\Users\James\Desktop\", therefore I wrote the command as such:
for %%A in (C:\Users\James\Desktop\*.jpg) do "C:\OCR\tesseract.exe" "%%~fA" "C:\out"
...but as said before, it does not work somehow.
Also, can I change the output txt name to be the same as the input image name, like so ?
for %%A in (C:\Users\James\Desktop\*.jpg) do "C:\OCR\tesseract.exe" "%%~fA" "%%~fA"
This worked :
I got two great answers! Thanks a lot. The final thing that worked was a mix between both answers:
@Echo off
PushD "C:\Program Files (x86)\Tesseract-OCR" || (Echo couldn't pushd to the Tesseract-OCR folder & Exit /B 1)
for %%A in ("C:\Users\EPFL\Google Drive\EDx PDF Maker\Cellular Mechanisms of Brain Functions\Slides\1\*.jpg") do tesseract.exe "%%~fA" "%%~dpnxA"

I don't know your program C:\OCR\tesseract.exe but I assume it needs supporting tools/files present in the C:\OCR folder, so either you have to set that folder as the current one or have it contained in your path variable.
@Echo off
PushD "C:\OCR" || (Echo couldn't pushd C:\OCR & Exit /B 1)
for %%A in ("C:\Users\James\Desktop\*.jpg") do tesseract.exe "%%~fA" "%%~dpnA"
"%%~dpnA" expands to the drive, path and file name without the extension. Tesseract appends .txt to the output name itself, so the text is saved next to the image with the same name and the extension .txt.

Use a for loop to iterate over the files:
for %%A in (C:\*.jpg) do "C:\OCR\tesseract.exe" "%%~fA" "C:\out"
%%A is the filenames (one at each run of the loop),
%%~fA is the fully qualified filename (just to be sure).
Read the output of for /? to learn more about those modifiers.
Note: this is batchfile syntax. To use it directly on command line, replace every %% with a single %
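As a rough illustration of those modifiers, the following batch sketch only echoes what each one expands to for every .jpg file in a folder, without running any OCR. The folder path is the one from the question and is just an assumption about where the images are.

```batch
@echo off
rem Dry run: print what each modifier expands to; nothing is processed.
rem The folder below is the one from the question; change it as needed.
for %%A in ("C:\Users\James\Desktop\*.jpg") do (
    echo Full path        : %%~fA
    echo Drive and folder : %%~dpA
    echo Name only        : %%~nA
    echo Extension        : %%~xA
    echo Path without ext : %%~dpnA
)
pause
```

The pause at the end keeps the window open when the batch file is started with a double click, which makes it easier to see whether the loop found any files at all.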

Related

Batch: Preserve (creation-)date on copying files to another (flatten) folder structure, incl. renaming files to avoid doublettes

this is my first question, so I apologize beforehand if I write not as you are used to...
fact:
I have a deep folder structure with tons of files (images, videos and so on), and I want to copy those files to a flat structure for a better overview.
I want to keep (at least) the original date attributes: creation-date and last-modified date.
Problem 1) there are files with same name 00001.jpg in different folders which I want to have in same folder, so I want to add creation date/time to filename on copy process.
00001.jpg becomes 2015-11-17_11-23-35_00001.jpg
So far so good. Or not good...
Copy and XCopy don't give me an option to do that without losing at least the creation date information (I didn't find a solution with either).
Now I try to copy the files (file by file) with robocopy to the new folder and use ren on the copied file to put the date/time information before the filename.
Here is a simple test.bat example:
setlocal enableextensions enabledelayedexpansion
robocopy . ./zzz original.txt /copy:DATSO
pause
rem :: formatted creation date of original file will be here, in real life
set "myDate=2015-11-17-_11-23-35"
rem rename "./zzz/original.txt" "!myDate!_renamed.txt" ::doesnt work: why? relative path??
rem :: this will do what I want - original creation date is kept on copy file
FOR %%A IN (zzz/original.txt) DO REN "%%~fA" "!myDate!_%%~nxA"
[possibly] Problem 2) Is there a better way to do this, or could I run into threading problems (asynchronous execution)? Could it be that I try to rename a file before robocopy has finished the copy process (e.g. for large files)?
Sorry, I'm a total batch newbie (also as a poster on SO ;).
ThanX in advance for each tip and also for criticism of my solution approach. Maybe I have blinkers on and don't see the easy solution?!
[edit: formatting of post]
[edit: content of post -> date/time in front of filename for better sorting]
It is possible to use the command DIR to recursively list all files in the specified folder and its subfolders with their creation date/time.
The format of the date/time depends on Windows Region and Language settings.
Example output for F:\Temp\Test on my machine with English Windows 7 and the region configured to Germany, on running the command line dir F:\Temp\Test\* /A-D /S /TC:
Volume in drive F is TEMP
Volume Serial Number is 1911-26A4
Directory of F:\Temp\Test
25.09.2017 17:26 465.950 SimpleFile.ccl
1 File(s) 465.950 bytes
Directory of F:\Temp\Test\Test Folder
25.09.2017 17:26 360.546 Test File.tmp
1 File(s) 360.546 bytes
Total Files Listed:
2 File(s) 826.496 bytes
0 Dir(s) 58.279.460.864 bytes free
This output is filtered with findstr /R /C:"^ Directory of" /C:"^[0123][0123456789]" to get only the lines starting with " Directory of" (note the space at the beginning) or with a number in the range 00 to 39.
Directory of F:\Temp\Test
25.09.2017 17:26 465.950 SimpleFile.ccl
Directory of F:\Temp\Test\Test Folder
25.09.2017 17:26 360.546 Test File.tmp
And this filtered output is processed by FOR loop and the commands executed by FOR.
@echo off
for /F "tokens=1-2*" %%A in ('dir "F:\Temp\Test\*" /A-D /S /TC ^| %SystemRoot%\System32\findstr.exe /R /C:"^ Directory of" /C:"^[0123][0123456789]" 2^>nul') do (
    if "%%A %%B" == "Directory of" (
        set "FilePath=%%C"
    ) else (
        set "CreationDate=%%A"
        set "CreationTime=%%B"
        for /F "tokens=1*" %%I in ("%%C") do set "FileName=%%J"
        call :RenameFile
    )
)
goto :EOF
:RenameFile
set "NewName=%CreationDate:~-4%-%CreationDate:~3,2%-%CreationDate:~0,2%_%CreationTime:~0,2%-%CreationTime:~3,2%_%FileName%"
ren "%FilePath%\%FileName%" "%NewName%"
goto :EOF
It is advisable to first put the command echo before the command ren in the last but one line to verify the expected new file names. For the example files, the batch file then only prints:
ren "F:\Temp\Test\SimpleFile.ccl" "2017-09-25_17-26_SimpleFile.ccl"
ren "F:\Temp\Test\Test Folder\Test File.tmp" "2017-09-25_17-26_Test File.tmp"
Note: The batch file must be in a folder not processed by this batch file as otherwise the batch file itself would be renamed while running which breaks the processing of the batch file.
For understanding the used commands and how they work, open a command prompt window, execute there the following commands, and read entirely all help pages displayed for each command very carefully.
call /?
dir /?
echo /?
for /?
goto /?
if /?
ren /?
set /?
By the way: WinRAR can add files into a RAR archive with creation and last access time in addition to last modification time and extract the files to a different directory with restoring also creation and last access time using SetFileTime function of Windows kernel.
Currently I use a locale-independent date. I use tokens from that for the current date/time.
for /f "tokens=2 delims==" %%I in ('wmic os get localdatetime /format:list') do set datetime=%%I
rem :: format it to YYYY-MM-DD_hh-mm-ss
set myDateTime=%datetime:~0,4%-%datetime:~4,2%-%datetime:~6,2%_%datetime:~8,2%-%datetime:~10,2%-%datetime:~12,2%
That's not the problem.
To clarify:
The listing is also not the problem.
I loop through all related files without a problem (except batch files and the output dir and its subtree ;).
My problem is:
I call robocopy for each file, then I rename the file to save the original creation date. My fear is that this causes problems (multi-threading?) for large files and for the number of calls (many thousands).
Is batch execution really serialized? Does the process wait until robocopy has finished before I try to rename the file? Could I run into deadlocks for very large files? I'll test it with some fake gigabyte files.
Your suggestion, to use winrar sounds interesting.
If I could add all that files to a big archive (with structure) and at the end extract it to target dir... I'll try it ;)
If it doesn't work, I will program it in Java!
There I know what to do, that's my playground :D
I thought it would be easy to write a simple batch file, to do this for me, but it seems it's not as easy as I thought!
ThanX anyway
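On the serialization worry: a batch file runs its commands one after another, and cmd waits for a console program such as robocopy to exit before it executes the next line, so the rename cannot start before the copy has finished. Below is a minimal sketch that additionally checks robocopy's exit code (values of 8 or higher indicate a failed copy) before renaming. The folder zzz, the file name and the date string are the placeholders from the question, not real values.

```batch
@echo off
setlocal EnableExtensions

rem Placeholder from the question; a real script would compute this value.
set "myDate=2015-11-17_11-23-35"

rem Copy one file with the copy flags used in the question.
rem cmd waits here until robocopy has exited.
robocopy . zzz original.txt /COPY:DATSO >nul

rem robocopy exit codes of 8 or higher mean that something failed.
if errorlevel 8 (
    echo Copying original.txt failed, rename skipped.
    exit /b 1
)

rem ren takes a path for the source but only a bare name for the target.
ren "zzz\original.txt" "%myDate%_original.txt"
endlocal
```

Checking the exit code is optional, but with many thousands of calls it avoids renaming a file whose copy did not succeed.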

Getting the attributes of the last modified file in a directory written in a file

I am in the need of a batch script that checks a drive (D:) for the 'last modified' attribute of *.czi files (Carl Zeiss image files) and append the data to a file on another drive. I have tried solutions with the following line:
FOR /F "delims=" %%I IN ('DIR %source%*.czi /A:-D /O:-D /T:W /B') DO COPY "%%I" > %target%
That does give me the last file, but it copies the entire file, which is not that smart since the files can be big. As a biologist I will spare you my desperate attempts that did not work (spent 4-5 hours). I figure this can be done dead easily, that is, if you know how... Any good suggestions? Any reply will be appreciated! Thanks in advance.
Let's assume just the last modified file time from newest file is wanted from all *.czi files in directory C:\Temp containing for example:
30.01.2017 08:13 First Listed.czi
28.01.2017 21:26 Oldest Image.czi
03.02.2017 17:13 Newest Image.czi
The batch code for this task could be:
@echo off
set "Source=C:\Temp\"
set "Target=%Source%List.txt"
for /F "delims=" %%I in ('dir "%Source%*.czi" /A:-D /B /O:-D /T:W 2^>nul') do (
    echo File "%%I" last modified on %%~tI>>"%Target%"
    goto AfterLoop
)
:AfterLoop
The command DIR searches for *.czi files in directory C:\Temp and outputs the list sorted by last modification time in reverse order, from newest to oldest.
In case of no *.czi file could be found, command DIR would output an error message to handle STDERR. This output message is redirected with 2>nul to device NUL to suppress it whereby the redirection operator > must be escaped here with ^ to be interpreted as redirection operator on execution of DIR command line and not already on parsing FOR command line.
%%I references the name of the file as output by DIR and %%~tI references the last modification date of the file. The help of command FOR output by running in a command prompt window for /? explains those modifiers.
The loop is exited after the first line has been written to the target file. Because >> is used instead of just >, the line is appended if the file already exists.
For the example files list the following line is appended to C:\Temp\List.txt:
File "Newest Image.czi" last modified on 03.02.2017 17:13
For understanding the used commands and how they work, open a command prompt window, execute there the following commands, and read entirely all help pages displayed for each command very carefully.
dir /?
echo /?
for /?
goto /?
set /?
See also the Microsoft article Using command redirection operators.
Your question is unclear, so let me try to rephrase it:
I think you want to find the Most Recently Modified file with a .CZI extension, and copy only that newest file to some target destination.
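To list all .CZI files in all subdirectories with their last-modified time, newest first (run directly at the command prompt; the sort is textual, so it is chronological only if the regional date format starts with the year):
(for /f "delims=" %a in ('dir *.CZI /s /b') do @echo %~ta %~fa) | sort /R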
If the first line of this output is the file that you want, then all you need to do is copy that one file to your target.
(and please, take the time to write detailed and clear questions so we can provide good answers)

Batch File to Move Files based on List of Strings

I have searched for days with results of similar circumstances but none that exactly addresses my problem.
Problem: I have 10,000 files in C:\Data folder. They all have a file name such as 1234_File_Log_Date_Time.csv. 1234 is a serial number. I have a list of multiple serial numbers in a SN.txt file. I would like to have a batch file read SN.txt, then copy the files found in C:\Test Data based on this list to another directory of C:\My Data. There are no duplicate files to contend with.
I have never written a batch file in my life so be gentle haha.
I have never written a batch file in my life... Read Command-Line Reference or Windows Commands.
For an initial look, start with a simple batch script which could appear like
@ECHO OFF >NUL
SETLOCAL EnableExtensions
pushd "C:\Data"
for /f "delims=" %%G in (SN.txt) do (
    echo "%%~G"
)
popd
pause
Then replace the echo "%%~G" line (step by step) with
if exist "%%~G_*.csv" dir /B "%%~G_*.csv" (to see file names that are to be copied);
if exist "%%~G_*.csv" echo copy /B "%%~G_*.csv" "C:\My Data\" (to see commands that are to be performed);
if exist "%%~G_*.csv" copy /B "%%~G_*.csv" "C:\My Data\" (final edit to execute the commands).
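Put together, the final stage of those steps could look roughly like the sketch below. It assumes that SN.txt sits in the source folder with one serial number per line, and that the folder names from the question and the answer are the right ones; both are assumptions to adjust.

```batch
@echo off
setlocal EnableExtensions
rem Folder names are taken from the question and answer; adjust as needed.
set "Source=C:\Data"
set "Target=C:\My Data"

rem Create the target folder if it does not exist yet.
if not exist "%Target%\" md "%Target%"
pushd "%Source%" || (echo Cannot change to %Source% & exit /b 1)

rem SN.txt is assumed to be in the source folder, one serial number per line.
for /f "usebackq delims=" %%G in ("SN.txt") do (
    if exist "%%~G_*.csv" (
        copy /B "%%~G_*.csv" "%Target%\" >nul
    ) else (
        echo No file found for serial number %%~G
    )
)
popd
endlocal
```

The else branch is an addition for illustration: it reports serial numbers from the list that match no file, which helps to spot typos in SN.txt.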
Additional resources (required reading for any batch scripter):
(command reference) An A-Z Index of the Windows CMD command line
(additional particularities) Windows CMD Shell Command Line Syntax
(%~G etc. special page) Command Line arguments (Parameters)
(special page) EnableDelayedExpansion

How to create empty files in a loop with batch?

I am having a problem running a script that creates empty files in a loop.
This is what I have done so far:
@echo off
for /l %%a (1;1;20) do (echo m> ".mp4" c:\test)
pause
exit
Basically I have twenty names in a file on my desktop and I intend to create them as empty *.mp4 files in folder c:\test with the command echo m> .mp4. When I run the code above, it does not seem to work.
The following FOR loop can be used in the batch file to create empty files 1.mp4, 2.mp4, 3.mp4, ..., 20.mp4 in directory C:\test as suggested by rojo:
for /L %%I in (1,1,20) do type NUL >"C:\test\%%I.mp4"
And the next FOR loop can be used in the batch file to read the file names for the empty *.mp4 files to create from a list file on Windows desktop of current user as also suggested by rojo:
for /F "usebackq delims=" %%I in ("%USERPROFILE%\Desktop\List of Names.txt") do type NUL >"C:\test\%%I.mp4"
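Both loops assume that the folder C:\test already exists, because the redirection fails otherwise. A small sketch of a complete batch file for the numbered variant, which creates the folder first and lists the result afterwards:

```batch
@echo off
rem Create the target folder if it is missing; the redirection below needs it.
if not exist "C:\test\" md "C:\test"

rem Create the empty files 1.mp4 to 20.mp4.
for /L %%I in (1,1,20) do type NUL >"C:\test\%%I.mp4"

rem Show the created files for a quick check.
dir /B "C:\test\*.mp4"
pause
```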
For understanding the used commands and how they work, open a command prompt window, execute there the following commands, and read entirely all help pages displayed for each command very carefully.
for /?
type /?
Further the Microsoft article Using command redirection operators should be read explaining the redirection operator > and the SS64 page about NUL (null device).

How to bulk rename files within subfolders - using CMD command prompt

I am quite new to Command Prompt (Windows); however, I have used it to change some file extensions and it was extremely helpful. I know nothing about coding, so I am simply going off what I have read, but I am looking for a command line that I can't seem to find anywhere. I have a folder, and inside that folder are 70 subfolders, each labelled with a number from 1-70. Inside these subfolders are roughly 20 png picture files that are currently named with a number from 1-20. I am looking for a command line to rename each file from its original name to "folder name (page number).png".
For example, I have a folder called '68' and inside that folder are 1.png, 2.png, 3.png. I want the command line to change 1.png, 2.png etc. to 68 (1).png and 68 (2).png; note the space between the folder name and the bracket. Sorry if I have made it confusing, but I would really appreciate it, and I have got some very helpful and quick answers from StackOverflow before.
Thank you if you are able to help me, as I am completely hopeless on this matter.
Only run this once - launch it from the folder containing the 70 folders.
@echo off
for /f "delims=" %%a in ('dir /ad /b') do (
    pushd "%%a"
    for /f "delims=" %%b in ('dir /a-d /b') do ren "%%b" "%%a (%%~nb)%%~xb"
    popd
)
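Because the script above must only be run once, it may be worth doing a dry run first. The sketch below is the same loop with echo in front of ren, so it only prints the rename commands. Restricting the inner dir to *.png and hiding its error output are additions made for this illustration.

```batch
@echo off
rem Dry run: only prints the ren commands, nothing is renamed.
for /f "delims=" %%a in ('dir /ad /b') do (
    pushd "%%a"
    for /f "delims=" %%b in ('dir /a-d /b *.png 2^>nul') do echo ren "%%b" "%%a (%%~nb)%%~xb"
    popd
)
pause
```

If the printed commands look right, removing the word echo turns the dry run into the real rename.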
I am not a very experienced bash scripter, but I guess this should do the task for you. I am assuming that you are using a Linux operating system. Open a text editor and copy the following:
#!/bin/bash
# Rename N.png to "<folder> (N).png" in the folders 1 to 70.
NumberOfFolders=70
for ((a = 1; a <= NumberOfFolders; a++))
do
    # Skip folder numbers that do not exist.
    cd "./$a" || continue
    b=1
    while [ -f "$b.png" ]
    do
        mv "$b.png" "$a ($b).png"
        b=$((b + 1))
    done
    cd ..
done
Save this script in the directory that contains the folders 1-70 (and call it whatever.sh). Then open a terminal, run chmod a+x whatever.sh and after that ./whatever.sh. This should do the job as you presented it in your question.
A slight modification of the approach @foxidrive suggested, so the script can be run from anywhere and won't have to parse dir output:
@echo off
for /r "C:\root\folder" %%d in (.) do (
    pushd "%%~fd"
    for %%f in (*) do ren "%%~ff" "%%~nd (%%~nf)%%~xf"
    popd
)
Note that this will not work in Windows XP (not sure about Windows Vista), due to a bug with the expansion of * in the inner for loop that would cause the files to be renamed multiple times.

Resources