Batch file: count duplicate ids and write them in column of csv - loops

I am currently trying to automate the preprocessing process on a csv file via a batch file. I have the following table:
id;street;name;nrOfIds
4014001;T1;example1;0
4014002;B2;example2;0
4014003;B3;example3;0
4014004;L1;example4;0
4015001;M3;example5;0
4015002;B9;example6;0
4016001;T4;example7;0
4016002;L2;example8;0
4016003;L1;example9;0
The first column, "id", holds the id of the entry, which is made unique by its last 3 digits (for example 001, 002, 003, ...). The digits before the last three are not unique. As you can see in the result table, I want to count how often the first part of the id (the part before the last three digits) occurs in the table and write that count into the fourth column, "nrOfIds". The result table should then look like this:
id;street;name;nrOfIds
4014001;T1;example1;4
4014002;B2;example2;4
4014003;B3;example3;4
4014004;L1;example4;4
4015001;M3;example5;2
4015002;B9;example6;2
4016001;T4;example7;3
4016002;L2;example8;3
4016003;L1;example9;3
For example, the part before the last three digits of the first line (4014) exists exactly 4 times in the whole table, so I write 4 in the "nrOfIds" column and so on.
The code used for this looks like this:
@echo off
setlocal enabledelayedexpansion
for /F "tokens=1-3* delims=;" %%a in (%PREPROCESSING_INPUT_PATH%%INPUT_FILENAME%) do (
(echo %%a;%%b;%%c)> "%PREPROCESSING_INPUT_PATH%%OUTPUT_FILENAME%" & goto :file
)
:file
(for /F "skip=1 tokens=1-3* delims=;" %%a in (%PREPROCESSING_INPUT_PATH%%INPUT_FILENAME%) do (
REM count ids (like 4014, 4015, ... and write sum into "nrOfIds" column
)
) >> %PREPROCESSING_OUTPUT_PATH%%OUTPUT_FILENAME%
pause
Any suggestions on how to do this? Thank you very much in advance! Your help is greatly appreciated.

Pretty similar to the previous answer I posted; here we just use findstr /B piped into find /C to count the occurrences of the ID prefix (everything before the last 3 digits):
@echo off
setlocal enabledelayedexpansion
set "infile=z:\folder31\testcsv.csv"
set "outfile=%PREPROCESSING_OUTPUT_PATH%testOutput.csv"
for /f "usebackq delims=" %%a in ("%infile%") do (
(echo %%a)>"%outfile%" & goto :file
)
:file
(for /f "skip=1 usebackq tokens=1-4*delims=;" %%a in ("%infile%") do (
set "match=%%a"
for /f %%i in ('findstr /B "!match:~0,-3!" "%infile%" ^| find /C "!match:~0,-3!"') do (
set /a _cnt=%%i
echo %%a;%%b;%%c;!_cnt!
)
)
)>>"%outfile%"
Debug version:
@echo off
setlocal enabledelayedexpansion
set "infile=%PREPROCESSING_INPUT_PATH%%INPUT_FILENAME%"
set "outfile=%PREPROCESSING_OUTPUT_PATH%%OUTPUT_FILENAME%"
for /f "usebackq delims=" %%a in ("%infile%") do (
(echo %%a) & goto :file
)
:file
(for /f "skip=1 usebackq tokens=1-4*delims=;" %%a in ("%infile%") do (
set "match=%%a"
for /f %%i in ('findstr /B "!match:~0,-3!" "%infile%" ^|find /C "!match:~0,-3!"') do (
set /a _cnt=%%i
echo %%a;%%b;%%c;!_cnt!
)
)
)
pause

This method is simple and runs fast:
@echo off
setlocal enabledelayedexpansion
rem Count ids
for /F "skip=1 delims=;" %%a in (input.txt) do (
set "id=%%a"
set /A "count[!id:~0,-3!]+=1"
)
rem Update the file
set "header="
(for /F "tokens=1-4 delims=;" %%a in (input.txt) do (
if not defined header (
echo %%a;%%b;%%c;%%d
set "header=1"
) else (
set "id=%%a"
for /F %%i in ("!id:~0,-3!") do echo %%a;%%b;%%c;!count[%%~i]!
)
)) > output.txt
A method based on external commands, like findstr or find, is slower...
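As a minimal sketch of the array-based counting idea above, here is the prefix extraction on its own. The three ids are copied from the question's table; the loop over literal values is only a stand-in for reading input.txt, and the variable names follow the answer.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Sample ids copied from the question; a real script would read them from the file.
rem The substring 0,-3 drops the last three characters, leaving the shared prefix.
rem set /A treats an undefined count[...] entry as 0, so no initialisation is needed.
for %%i in (4014001 4014002 4015001) do (
    set "id=%%i"
    set /A "count[!id:~0,-3!]+=1"
)
echo 4014: !count[4014]!
echo 4015: !count[4015]!
endlocal
```

Everything happens inside the one cmd process, which is why this tends to be faster than spawning findstr or find for every line.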

Related

Calling a function for every attribute I want to read from multiple .txt files and write to .csv file

I tried to call a function for every attribute (column) that I want to read from 4 .txt files and then write into a .csv file. One column has flawed output, and the code probably has a few logic flaws, as I haven't learned batch cleanly from scratch. Do you know a fix?
Link to previous solved question: Read information from multiple .txt files and sort it into .csv file
@Magoo
echo Name;Prename;Sign;Roomnumber;Phonenumber > sorted.csv
for /f "tokens=1,2 delims= " %%a in (TestEmployees.txt) do (
call :findSign %%a %%b
)
:findSign
set prename=%1
set name=%2
for /f "tokens=1,2 delims= " %%a in (TestSign.txt) do (
if "%name%"=="%%a" (
call :findRoomNumber
)
)
:End
:findRoomNumber
set sign=%1
for /f "tokens=1,2 delims=|" %%q in (TestRoomNumber.txt) do (
if "%sign%"=="%%q" (
call :findPhoneNumber
)
)
:End
:findPhoneNumber
for /f "tokens=1,2 delims=;" %%u in (TestPhoneNumber.txt) do (
if "%%b"=="%%u" (
echo %name%;%prename%;%%b;%%r;%%v >> sorted.csv
)
)
:End
This is the way I would do it:
@echo off
setlocal EnableDelayedExpansion
rem Load PhoneNumber array
for /F "tokens=1,2 delims=;" %%a in (PhoneNumber.txt) do set "phone[%%a]=%%b"
rem Load RoomNumber array
for /F "tokens=1,2 delims=|" %%a in (RoomNumber.txt) do set "room[%%a]=%%b"
rem Load Sign array
for /F "tokens=1,2" %%a in (Sign.txt) do set "sign[%%a]=%%b"
rem Process Employees file and generate output
> sorted.csv (
echo Name;Prename;Sign;RoomNumber;PhoneNumber
for /F "tokens=1,2" %%a in (Employees.txt) do for %%s in (!sign[%%b]!) do (
echo %%b;%%a;%%s;!room[%%s]!;!phone[%%s]!
)
)
@ECHO OFF
SETLOCAL
rem The following settings for the directories and filenames are names
rem that I use for testing and deliberately includes spaces to make sure
rem that the process works using such names. These will need to be changed to suit your situation.
SET "sourcedir=u:\your files"
SET "destdir=u:\your results"
SET "filename1=%sourcedir%\q74258020_TestEmployees.txt"
SET "filename2=%sourcedir%\q74258020_TestSign.txt"
SET "filename3=%sourcedir%\q74258020_TestRoomNumber.txt"
SET "filename4=%sourcedir%\q74258020_TestPhoneNumber.txt"
SET "outfile=%destdir%\outfile.txt"
>"%outfile%" (
echo Surname;Name;Sign;Roomnumber;Phonenumber
for /f "usebackqtokens=1,2 delims= " %%g in ("%filename1%") do (
call :findSign %%g %%h
)
)
GOTO :eof
:findSign
for /f "usebackqtokens=1,2 delims= " %%b in ("%filename2%") do (
if "%2"=="%%b" (
for /f "usebackqtokens=1,2 delims=|" %%q in ("%filename3%") do (
if "%%c"=="%%q" (
for /f "usebackqtokens=1,2 delims=;" %%u in ("%filename4%") do (
if "%%c"=="%%u" (
echo %2;%1;%%c;%%r;%%v
)
)
)
)
)
)
GOTO :EOF
Always verify against a test directory before applying to real data.
Note that if the filename does not contain separators like spaces, then both usebackq and the quotes around %filename1% can be omitted.
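A small sketch of that last note, assuming two hypothetical scratch files that the script creates in the current directory (the names demo_employees.txt and "demo employees.txt" are made up for illustration):

```batch
@echo off
setlocal
rem Hypothetical scratch files, created only for this demonstration.
set "plain=demo_employees.txt"
set "spaced=demo employees.txt"
> "%plain%" echo John Smith
> "%spaced%" echo Jane Doe

rem No separators in the name: neither usebackq nor quotes are needed.
for /f "tokens=1,2" %%g in (%plain%) do echo %%h, %%g

rem Name with a space: quote it, and add usebackq so the quoted text
rem is read as a file name instead of being parsed as a literal string.
for /f "usebackq tokens=1,2" %%g in ("%spaced%") do echo %%h, %%g

del "%plain%" "%spaced%"
endlocal
```

Keeping usebackq and the quotes everywhere costs nothing and means the script still works if a path later gains a space.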
Why should I not upload images of code/data/errors? Copy and paste the code as text
For similar reasons, please post all relevant data into a question to obviate every potential respondent having to switch back and forth to the data to which you have linked.

Windows Batch Scripting: Checking file for multiple strings

I have a batch file that processes scanned PDFs using ghostscript. One of the user prompts is for the resolution of the desired output. I wrote a crude autodetect routine like this:
for /f "delims=" %%a in ('findstr /C:"/Height 1650" %1') do set resdect=150
for /f "delims=" %%a in ('findstr /C:"/Height 3300" %1') do set resdect=300
for /f "delims=" %%a in ('findstr /C:"/Height 6600" %1') do set resdect=600
echo %resdect% DPI detected.
%1 is the filename passed to the batch script.
This should return the highest resolution detected among some common sizes we see. My question to the community is: is there a faster or more efficient way to do this than searching the file multiple times?
Assuming that the value of RESDECT is the /Height value divided by 11, and that no line contains more than one /Height token, the following code might work for you:
@echo off
for /F delims^=^ eol^= %%A in ('findstr /R /I /C:"/Height *[0-9][0-9]*" "%~1"') do (
set "LINE=%%A"
setlocal EnableDelayedExpansion
set "RESDECT=!LINE:*/Height =!"
set /A "RESDECT/=11"
echo/!RESDECT!
endlocal
)
If you only want to match the dedicated /Height values 1650, 3300, 6600, you could use this:
@echo off
for /F delims^=^ eol^= %%A in ('findstr /I /C:"/Height 1650" /C:"/Height 3300" /C:"/Height 6600" "%~1"') do (
set "LINE=%%A"
setlocal EnableDelayedExpansion
set "RESDECT=!LINE:*/Height =!"
set /A "RESDECT/=11"
echo/!RESDECT!
endlocal
)
To gather the greatest /Height value appearing in the file, you can use this script, respecting the aforementioned assumptions:
@echo off
set "RESDECT=0"
for /F delims^=^ eol^= %%A in ('findstr /R /I /C:"/Height *[0-9][0-9]*" "%~1"') do (
set "LINE=%%A"
setlocal EnableDelayedExpansion
set "HEIGHT=!LINE:*/Height =!"
for /F %%B in ('set /A HEIGHT/11') do (
if %%B gtr !RESDECT! (endlocal & set "RESDECT=%%B") else endlocal
)
)
echo %RESDECT%
Of course you can again exchange the findstr command line like above.
Here is another approach to get the greatest /Height value, using (pseudo-)arrays, which might be faster than the above method, because there are no extra cmd instances created in the loop:
@echo off
setlocal
set "RESDECT=0"
for /F delims^=^ eol^= %%A in ('findstr /R /I /C:"/Height *[0-9][0-9]*" "%~1"') do (
set "LINE=%%A"
setlocal EnableDelayedExpansion
set "HEIGHT=!LINE:*/Height =!"
set /A "HEIGHT+=0, RES=HEIGHT/11" & set "HEIGHT=0000000000!HEIGHT!"
for /F %%B in ("$RESOLUTIONS[!HEIGHT:~-10!]=!RES!") do endlocal & set "%%B"
)
for /F "tokens=2 delims==" %%B in ('set $RESOLUTIONS[') do set "RESDECT=%%B"
echo %RESDECT%
endlocal
First, all heights and their related resolutions are collected in an array called $RESOLUTIONS[], where the /Height values are used as indexes and the resolutions are the values. The heights are left-zero-padded to a fixed number of digits, so set $RESOLUTIONS[ returns them in ascending order. The second for /F loop keeps the last array element, whose value is the greatest resolution.
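To see why the zero-padding matters, here is a minimal sketch with three made-up heights (the $DEMO[ array name and the sample values are assumptions for illustration, not part of the answer):

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical sample heights, deliberately out of order and of different lengths.
for %%H in (3300 900 1650) do (
    set "padded=0000000000%%H"
    set "$DEMO[!padded:~-10!]=%%H"
)
rem set lists variables sorted by name, so the padded keys come out in numeric order
rem and the last value seen by the loop is the greatest height.
for /F "tokens=2 delims==" %%V in ('set $DEMO[') do set "last=%%V"
echo Largest height: !last!
endlocal
```

Without the padding, the key for 900 would sort after the key for 3300, because the names are compared as text rather than as numbers.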
I do have to admit that this was inspired by Aacini's nice answer.
Get the matching lines into a variable and work with that instead of the whole file. Instead of your three for loops, you can use just one if you change the logic a bit:
@echo off
setlocal enabledelayedexpansion
for /f "delims=" %%a in ('findstr /C:"/Height " %1') do (
set "line=%%a"
set "line=!line:*/Height =!"
for /f "delims=/ " %%b in ("!line!") do set "hval=!hval! %%b"
)
for %%a in (1650,3300,6600) do @(
echo " %hval% " | find " %%a " >nul && set /a resdect=%%a/11
)
echo %resdect% DPI detected.
A solution with jrepl.bat could look something like:
for /f %a in ('type t.txt^|find "/Height "^|jrepl ".*/Height ([0-9]{4}).*" "$1"^|sort') do set /a dpi=%a / 11
(given, all valid Heights have 4 digits)
Note: for use in batchfiles, use %%a instead of %a
I barely scratched the surface of jrepl - I'm quite sure, there is a much more elegant (and probably faster) solution.
You may directly convert the Height value into the highest resolution in a single operation using an array. However, to do that we need to know the format of the line that contains the Height value. In the code below I assumed that the format of such a line is /Height xxxx, that is, that the height is the second token in the line. If this is not true, just adjust the "tokens=2" value in the for /F command.
EDIT: Code modified as requested in comments
In this modified code the Height value may appear anywhere in the line.
@echo off
setlocal EnableDelayedExpansion
rem Initialize "resDect" array
for %%a in ("1650=150" "3300=300" "6600=600") do (
for /F "tokens=1,2 delims==" %%b in (%%a) do (
set "resDect[%%b]=%%c"
)
)
set "highResDect=0"
for /F "delims=" %%a in ('findstr "/Height" %1') do (
set "line=%%a"
set "line=!line:*/Height =!"
for /F %%b in ("!line!") do set /A "thisRectDect=resDect[%%b]"
if !thisRectDect! gtr !highResDect! set "highResDect=!thisRectDect!"
)
echo %highResDect% DPI detected.
For the record, the final code was:
setlocal enabledelayedexpansion
set resdetc=0
for /f "delims=" %%a in ('findstr /C:"/Height " %1') do (
set "line=%%a"
set "line=!line:*/Height =!"
for /f "delims=/ " %%b in ("!line!") do set "hval=!hval! %%b"
)
for %%a in (1650,3300,6600) do @(
echo " %hval% " | find " %%a " >nul && set /a resdetc=%%a/11
)
if %resdetc%==0 SET resDefault=3
if %resdetc%==150 SET resDefault=1
if %resdetc%==300 SET resDefault=3
if %resdetc%==600 SET resDefault=6
ECHO.
ECHO Choose your resolution
ECHO ----------------------
ECHO 1. 150 4. 400
ECHO 2. 200 5. 500
ECHO 3. 300 6. 600
ECHO.
IF NOT %RESDETC%==0 ECHO 7. Custom (%resdetc% DPI input detected)
IF %RESDETC%==0 ECHO 7. Custom
ECHO ----------------------
choice /c 1234567 /T 3 /D %resDefault% /N /M "Enter 1-7 (defaults to %resDefault% after 3 sec.): "
IF errorlevel==7 goto choice7
IF errorlevel==6 set reschoice=600 & goto convert
IF errorlevel==5 set reschoice=500 & goto convert
[...]
Thanks everyone for the help!

Locating keyword in .csv files

I am trying to create a batch file that reads the CSV files in a specific folder, extracts the lines that contain a specific number, and prints each whole line on the screen. I wrote the code below, but it doesn't work at all: whenever I run it, it only prints the line numbers.
The code:
@echo off
setlocal EnableDelayedExpansion
set "yourDir=C:\Users\Adminm\Desktop\test11\"
set "yourExt=csv"
set "keyword=44"
set /a count=0
set linenum=!count!
set c=0
pushd %yourDir%
for %%a in (*.%yourExt%) do (
for /f "usebackq tokens=3 delims=," %%b in (%yourDir%%%a) do (
set /a count = !count! + 1
if NOT %%b == %keyword% (
for /f "delims=" %%1 in ('type %yourDir%%%a') do (
set /a c+=1 && if "!c!" equ "%linenum%" echo %%1%
)
)
)
)
echo !count!
popd
endlocal
thanks in advance <3
Assuming what you want to do is scan each file for a target string in column 3, then:
for %%a in (*.%yourExt%) do (
for /f "usebackq delims=" %%L in ("%%a") do (
for /f "tokens=3 delims=," %%b in ("%%L") do (
if %%b == %keyword% echo %%L
)
)
)
Since you have already changed to yourdir, there's no requirement to specify it in the scan-for-filenames for.
Your attempt to locate the required line is clumsy. All you need to do is assign each line in turn to a metavariable (%%L) and then use for /f to parse the metavariable. When the required data matches, simply echo the metavariable containing the entire line.
You've attempted to use %%1 as a metavariable. %n for n=0..9 refers to the parameter number supplied to the routine. The only officially defined metavariables for use here are %%a..%%z and %%A..%%Z (one of the very few places where batch is case-sensitive) - although some other symbols also work. Numerics will not work here.
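A short sketch of the metavariable rules just described; the literal strings are made up for illustration:

```batch
@echo off
rem Letters are valid metavariables, and case matters: %%a and %%A are distinct.
for %%a in (lower) do for %%A in (UPPER) do echo %%a %%A

rem With "tokens=1,2" the first token goes to the named letter (%%x)
rem and the second token goes to the next letter in sequence (%%y).
for /f "tokens=1,2 delims=," %%x in ("left,right") do echo %%x and %%y
```

This is also why %%1 fails: inside a batch file, %1 is already reserved for the first command-line parameter.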

Display txt contents below string then remove duplicate lines

I need help creating a batch file that displays a certain amount of text (e.g. 5 lines) from a txt file, but only the text below a specific keyword, e.g. 'Home', and finally removes any duplicate text.
So:
Search for a specific string, e.g. 'Home'; display the text below 'Home' (not all of it, just 5 lines' worth); and finally remove any duplicate sentences.
I've tried modifying the following command.
@echo OFF
:: Get the number of lines in the file
set LINES=0
for /f "delims==" %%I in (data.txt) do (
set /a LINES=LINES+1
)
:: Print the last 10 lines (suggestion to use more courtesy of dmityugov)
set /a LINES=LINES-10
more +%LINES% < data.txt
Displaying lines from text file in a batch file
Read every 5th line using Batch Script
I don't know if it's possible to remove duplicates.
Update
Yes, that's right: duplicate lines within the block of 5 following the keyword.
However, don't worry about removing duplicates; my main concern is just showing the text below a certain string, e.g. Home.
I have the command below, but it doesn't show all the information below the text, just one line. Ideally I would like to adjust the amount displayed, e.g. 5 lines' worth of data.
setlocal EnableDelayedExpansion
rem Assemble the list of line numbers
set numbers=
set "folder=C:\test\world.txt"
for /F "delims=:" %%a in ('findstr /I /N /C:"home" "%folder%"') do (
set /A before=%%a-0, after=%%a+3
set "numbers=!numbers!!before!: !after!: "
)
rem Search for the lines
(for /F "tokens=1* delims=:" %%a in ('findstr /N "^" "%folder%" ^| findstr /B "%numbers%"') do echo. %%b)
batch script to print previous and next lines of search string in a text file
#ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
:: remove variables starting $
For %%b IN ($) DO FOR /F "delims==" %%a In ('set %%b 2^>Nul') DO SET "%%a="
SET /a before=0
SET /a after=5
SET "target=home"
SET /a count=0
SET "file=q24813694.txt"
FOR /f "delims=:" %%a IN ('findstr /i /n /L /c:"%target%" "%file%"'
) DO SET /a $!count!=%%a-%before%&SET /a count+=1
FOR /f "tokens=1*delims=:" %%a IN ('findstr /n /r "." "%file%"') DO (
SET "printed="
FOR /f "tokens=1,2delims==" %%m IN ('set $ 2^>nul') DO IF NOT DEFINED printed IF %%a geq %%n (
SET /a count=%%n+%before%+%after%
IF %%a geq !count! (SET "%%m=") ELSE (SET "printed=Y"&ECHO %%b)
)
)
GOTO :EOF
This routine should do the trick. You'd need to set file to suit yourself, of course; and to set the target.
If you want to set the number of lines to print before the target, and the number after it (which includes the target line), then those settings should work, too.

Find And Replace in a TXT From CSV File

I'm trying to find words in the first column of a CSV or XLS file and replace them with the words in the second column of the same file. I have made something like that, but it doesn't work.
Can you help me? For each line, the first column should go into a variable called ita and the second column into a variable called eng; then I want to find ita and replace it with eng. As you can imagine, I need to translate a web page, starting from a csv with one language per column. My csv file structure is:
ita1;eng1
ita2;eng2
etc...
This is my wrong script:
@echo off
setlocal enableextensions enabledelayedexpansion
set host=%COMPUTERNAME%
echo Host: %host%
pause
for /f "tokens=1 delims=;" %%Ita in (index.csv) do (
SET ita=%%Ita
echo %ita%
pause
for /f "tokens=2 delims=;" %%eng in (index.csv) do (
set eng=%%eng
echo %eng
pause
(for /f "delims=" %%i in ('findstr /n "^" "index.txt"') do (
set "transl=%%i"
set "transl=!line:%ita%=%eng%!"
echo(!line!
endlocal
))>"index2.txt"
type "index2.txt"
)
)
)
(for /f "tokens=1,2 delims=;" %%a in (index.csv) do echo(%%b;%%a)>index2.txt
type index2.txt
@echo off
setlocal enableextensions enabledelayedexpansion
(for /f "tokens=* usebackq" %%l in ("index.txt") do (
set "line=%%l"
for /f "tokens=1,2 delims=; usebackq" %%a in ("index.csv") do (
set "line=!line:%%a=%%b!"
)
echo !line!
)) > index2.txt
But note that this approach does not guard against substring matches: a search term is also replaced when it occurs inside a longer word.
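A minimal sketch of that substring caveat, using a made-up line and word pair (casa/house is an assumption for illustration, not taken from the question's csv):

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical line: the search term appears once on its own and once inside a longer word.
set "line=casa casale"
rem The variable substitution replaces every occurrence, including the one inside "casale".
set "line=!line:casa=house!"
echo !line!
endlocal
```

If whole-word replacement matters, the csv entries would need to be ordered or delimited so that longer terms are handled before shorter ones, or a tool with word-boundary support would be a better fit.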
Try using awk
awk -F';' 'NR==FNR {a[$1]=$2;next} { for(x in a) gsub(x,a[x]) } 1' index.csv index.txt

Resources