I have a batch file that checks for duplicate lines in a TXT file (over one million lines, 13 MB). It runs for over 2 hours. How can I speed it up? Thank you!
TXT file
11
22
33
44
.
.
.
44 (over one million lines)
Existing Batch
setlocal
set var1=*
sort original.txt>sort.txt
for /f %%a in ('type sort.txt') do (call :run %%a)
goto :end
:run
if %1==%var1% echo %1>>duplicate.txt
set var1=%1
goto :eof
:end
This should be the fastest method using a Batch file:
@echo off
setlocal EnableDelayedExpansion
set var1=*
sort original.txt>sort.txt
(for /f %%a in (sort.txt) do (
if "%%a" == "!var1!" (
echo %%a
) else (
set "var1=%%a"
)
)) >duplicate.txt
This method uses the findstr command, as in aschipfl's answer, but here each line and its duplicates are removed from the file after findstr has processed them. It could be faster if the file contains many duplicates; otherwise it will be slower because of the large volume of data rewritten on each pass. Only a test can confirm this.
@echo off
setlocal EnableDelayedExpansion
del duplicate.txt 2>NUL
copy /Y original.txt input.txt > NUL
:nextTurn
for %%a in (input.txt) do if %%~Za equ 0 goto end
< input.txt (
set /P "line="
findstr /X /C:"!line!"
find /V "!line!" > output.txt
) >> duplicate.txt
move /Y output.txt input.txt > NUL
goto nextTurn
:end
@echo off
setlocal enabledelayedexpansion
set var1=*
(
for /f %%a in ('sort q42574625.txt') do (
if "%%a"=="!var1!" echo %%a
set "var1=%%a"
)
)>"u:\q42574625_2.txt"
GOTO :EOF
This may be faster - I don't have your file to test against
I used a file named q42574625.txt containing some dummy data for my testing.
It's not clear whether you want only one instance of a duplicate line or not. Your code would produce 5 "duplicate" lines if there were 6 identical lines in the source file.
Here's a version which will report each duplicated line only once:
@echo off
setlocal enabledelayedexpansion
set var1=*
set var2=*
(
for /f %%a in ('sort q42574625.txt') do (
if "%%a"=="!var1!" IF "!var2!" neq "%%a" echo %%a&SET "var2=%%a"
set "var1=%%a"
)
)>"u:\q42574625.txt"
GOTO :EOF
Supposing you provide the text file as the first command line argument, you could try the following:
@echo off
for /F "usebackq delims=" %%L in ("%~1") do (
for /F "delims=" %%K in ('
findstr /X /C:"%%L" "%~1" ^| find /C /V ""
') do (
if %%K GTR 1 echo %%L
)
)
This returns all duplicate lines, but multiple times each, namely as often as each occurs in the file.
Not able to substring the dynamic variable inside forloop in Windows batch script.
I have the properties file in my git hub in the below format.
"collectionName=TestCollectionRun.json=test"
So I have written the code below to fetch these values. But the requirement is that I need to strip off the '.json' part from the collection name. With the code below I am not able to set/echo that value. Can you please help with this?
@ECHO ON
:BEGIN
IF EXIST "test.properties" ECHO Found properties file, reading file..
SET props=test.properties
setlocal EnableDelayedExpansion
For /F "delims== tokens=1,2,3" %%G in (%props%) Do (
if "%%I" EQU "test" if "%%G" EQU "collectionName" SET collName=%%H(
SET finalCollName=%collName%:~0,-5
ECHO %finalCollName%
)
)
:END
We need the ECHO to return "TestCollectionRun". Currently it's not returning anything.
For /F "delims== tokens=1,2,3" %%G in (%props%) Do (
if "%%I" EQU "test" if "%%G" EQU "collectionName" SET "collName=%%~nH"&echo %%~nH
)
ECHO %CollName%
Note that the second ) is now redundant. Your problem has to do with delayed expansion, which you are invoking but not using. call echo %%collName%% within the for loop would have shown the value after assignment, if required.
This code works by interpreting %%H as a filename and assigning simply the name part of %%H (%%~nH)
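If the substring form is wanted anyway, here is a minimal sketch of what using delayed expansion inside the loop could look like. To keep it self-contained it feeds the sample line in as a literal string rather than reading test.properties, so treat that part as an assumption:

```batch
@echo off
setlocal EnableDelayedExpansion
rem Sketch only: the sample line is passed as a literal string
rem instead of being read from test.properties.
for /F "tokens=1-3 delims==" %%G in ("collectionName=TestCollectionRun.json=test") do (
    set "collName=%%H"
    rem Exclamation marks expand the variable when this line runs
    set "finalCollName=!collName:~0,-5!"
    echo !finalCollName!
)
```

With the sample line this should echo TestCollectionRun. The substring form only strips a fixed five characters, whereas the name-part modifier shown above copes with any extension.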
Given a line content of collectionName=TestCollectionRun.json=test, here's a quick rewrite of what I think you're trying to do:
@Echo Off
Set "props=test.properties"
If Not Exist "%props%" (
Echo Properties file not found!
Echo Exiting..
Timeout /T 3 /NoBreak >NUL
Exit /B
)
Echo Found properties file, reading file..
For /F "UseBackQ Tokens=1-3 Delims==" %%A In ("%props%") Do (
If /I "%%C" == "test" If /I "%%A" == "collectionName" Echo %%~nB
)
Pause
If you wanted to do something with the collection name within the loop then you would probably need to use delayed expansion:
@Echo Off
SetLocal DisableDelayedExpansion
Set "props=test.properties"
If Not Exist "%props%" (
Echo Properties file not found!
Echo Exiting..
Timeout /T 3 /NoBreak >NUL
Exit /B
)
Echo Found properties file, reading file..
For /F "UseBackQ Tokens=1-3 Delims==" %%A In ("%props%") Do (
If /I "%%C" == "test" If /I "%%A" == "collectionName" (
Set "collName=%%B"
SetLocal EnableDelayedExpansion
Echo !collName!
Rem Perform substring task on the variable named collName
Set "finalCollName=!collName:~0,-5!"
Echo !finalCollName!
EndLocal
)
)
Pause
Note, these answers will not work, as is, if your string is surrounded by doublequotes, (as in your question body), or if the line content differs (e.g. begins with spaces or tabs).
[Edit /]
Looking at your 'after the fact' question in the comments, it is clear that you do not need to substring the variable at all, so should use the first method posted:
Echo Found properties file, reading file..
For /F "UseBackQ Tokens=1-3 Delims==" %%A In ("%props%") Do (
If /I "%%C" == "test" If /I "%%A" == "collectionName" (
newman run "%%B" -e "%envName%" --insecure --reporters cli,htmlextra --reporter-htmlextra-export "newman\%BUILD_NUMBER%\%%~nB.html" --disable-unicode
)
)
Pause
This assumes that both %envName% and %BUILD_NUMBER% have been previously defined correctly.
I am trying to write a batch script that reads the 18th line from a .log file and outputs it. The .log file name varies each time: abc_XXXX.log, where XXXX is the process ID. Below is the code I am trying to run to achieve this.
:Test1
set "xprvar=" for /F "skip=17 delims=" %%p in (abc*.log) do (echo %%p& goto break)
:break
pause
goto END
set var=anyCommand doesn't work. It just sets the var to the literal string.
Using for /f is the right approach; the variable assignment just works differently:
for /F "skip=17 delims=" %%p in ('type abc*.log 2^>nul') do ( set "xprvar=%%p"& goto break )
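To make the difference between a plain set and for /f visible, here is a small sketch. The ver command is only a stand-in for any command that prints a single line, and the variable names literal and captured are made up for the illustration:

```batch
@echo off
setlocal
rem Sketch only: ver stands in for any command that prints one line
set "literal=ver"
for /f "delims=" %%v in ('ver') do set "captured=%%v"
echo literal : %literal%
echo captured: %captured%
```

The first variable holds the three letters of the command name, while the second holds what the command actually printed.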
There is also an option using FindStr:
@Echo Off
For /F "Tokens=1-2* Delims=:" %%A In ('FindStr/N "^" "abc_*.log" 2^>Nul'
) Do If %%B Equ 18 Echo %%A:%%C
Pause
The above example Echoes the <filename>:<18th line content>, but there's no reason in the appropriate situation why you couldn't change that to read:
@Echo Off
For /F "Tokens=1-2* Delims=:" %%A In ('FindStr/N "^" "abc_*.log" 2^>Nul'
) Do If %%B Equ 18 Set "xprvar=%%C"
If there is more than one matching filename in the directory, the variable would be set to the content in the last file parsed.
@ECHO Off
SETLOCAL
FOR %%f IN (abc*.log) DO (
SET "reported="
FOR /f "skip=17delims=" %%p IN (%%f) DO IF NOT DEFINED reported (
ECHO %%p
SET "reported=Y"
)
)
Assign each filename in turn to %%f.
For each filename found, clear the reported flag then read the file, skipping the first 17 lines. echo the 18th line and set the reported flag so that the remainder of the lines are not echoed.
How can we split string using windows bat script?
for below .bat code snippet
@echo off & setlocal EnableDelayedExpansion
set j=0
for /f "delims=""" %%i in (config.ini) do (
set /a j+=1
set con!j!=%%i
call set a=%%con!j!%%
echo !a!
(echo !a!|findstr "^#">nul 2>nul && (
rem mkdir !a!
) || (
echo +)
rem for /f "tokens=2" %%k in(config.ini) do echo %%k
)
)
pause
below config file
Q
What goes wrong when I delete the rem at the beginning of the line rem for /f "tokens=2" %%k in(config.ini) do echo %%k?
How can I get the /path/to/case and value as a pair?
for /f xxxx in (testconfig.ini) do (set a=/path/to/case1 set b=vaule1)
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET "sourcedir=U:\sourcedir"
SET "filename1=%sourcedir%\q43407067.txt"
set j=0
for /f "delims=""" %%i in (%filename1%) do (
set /a j+=1
set con!j!=%%i
call set a=%%con!j!%%
echo !a! SHOULD BE EQUAL TO %%i
(echo !a!|findstr "^#">nul 2>nul && (
echo mkdir !a!
) || (
echo +)
for /f "tokens=2" %%k IN ("%%i") do echo "%%k"
for /f "tokens=1,2" %%j IN ("%%i") do echo "%%j" and "%%k"
)
)
ECHO ----------------------------
SET con
GOTO :EOF
You would need to change the setting of sourcedir to suit your circumstances.
I used a file named q43407067.txt containing your data for my testing.
(These are settings that suit my system.)
SO - to address your problems:
Because the ) on that line closes the ( on the previous line, and the ) on the line before it closes the ( on the line prior to that. (I changed the rem to an echo so that the code would produce something visible.) The first ( on the (echo !a! line is closed by the ) on the line following the (now) two for /f commands, and the ( on the for..%%i..do ( line is closed by the final ) before the echo -----.
You can't delete that ) because it's participating in a parenthesis-pair.
You need a space between the in and the (.
I've shown a way. See for /?|more from the prompt for documentation (or many articles here on SO)
In your code, !a! is the same as %%i - so I've no idea why you are conducting all the gymnastics - doubtless to present a minimal example showing the problem.
Note that since the default delimiters include Space then if any line contains a space in the /path/to/case or value then you'll have to re-engineer the approach.
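As a partial workaround, and only for the case where the spaces are in the value rather than in the path, a sketch using tokens=1* could look like this. The sample string is invented for the illustration:

```batch
@echo off
rem Sketch only: tokens=1* puts everything after the first run of
rem delimiters into the second token, spaces included.
for /f "tokens=1*" %%a in ("/path/to/case1 value with spaces") do (
    echo path : %%a
    echo value: %%b
)
```

Here %%a receives /path/to/case1 and %%b receives the rest of the line. A space inside the path itself would still split in the wrong place, so that case needs a different separator in the config file.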
I'm not sure if I understand exactly what it is you need, so what follows may not suit your needs:
@Echo Off
SetLocal EnableDelayedExpansion
Set "n=0"
For /F "Delims=" %%A In (testConfig.ini) Do (Set "_=%%A"
If "!_:~,1!"=="#" (Set/A "n+=1", "i=0"
Echo=MD %%A
Set "con[!n!]!i!=%%A") Else (For /F "Tokens=1-2" %%B In ('Echo=%%A'
) Do (Set/A "i+=1"
Set "con[!n!]!i!=%%B"&&Set/A "i+=1"&&Set "con[!n!]!i!=%%C")))
Set con[
Timeout -1
GoTo :EOF
Remove Echo= on line 6 if you are happy with the output and really want to create those directories.
Is it possible to write a batch file that deletes all files in a directory for which the first n characters of the file's root name do not match the first n characters of any other filenames in that directory? For instance, suppose the directory contains the following:
Purcell_HenryA.txt
Purcell_HenryB.txt
Casaubon_IsaacA.txt
In this case, we would want to delete all files in the directory whose first 13 characters did not match the first 13 characters in any other files in the directory. (That is, we'd want to delete only Casaubon_IsaacA.txt.) I have tracked down scripts that delete all files with unique extensions in a directory, but don't know how to begin to write this script, and would therefore be grateful for any leads on the question.
This checks for root filenames of 14 characters and over - and if there is only 1 file with the same leading 13 characters then it will echo del. Remove the echo to make it perform the deletion.
@echo off
setlocal enabledelayedexpansion
for /f "delims=" %%a in ('dir /b /a-d') do (
set "part=%%~na"
if not "!part:~13,1!"=="" (
set "part=!part:~0,13!"
for /f "delims=" %%b in ('dir /b /a-d "!part!*.*" ^|find /c "!part!" ') do (
if %%b EQU 1 echo del "%%a"
)
)
)
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET target=u:\testdir
DIR /b /a-d %target%
echo====^^ names IN DIR ^^===
SET length=13
SET match=:
SET "candidate="
FOR /f "delims=" %%i IN ('dir /b/a-d "%target%\*"') DO (
SET filename=%%i
SET section=!filename:~0,%length%!
IF !section!==!match! (SET "candidate=") ELSE (
IF DEFINED candidate ECHO(DEL %target%\!candidate!
SET candidate=%%i
SET match=!section!
)
)
IF DEFINED candidate ECHO(DEL %target%\!candidate!
GOTO :EOF
Test result:
abc123_uniquename.txt
another_uniquename.txt
duplicate_name1234.txt
duplicate_name1235.txt
duplicate_name1236.txt
hello.txt
repeated__name1236.txt
repeated__name1235.txt
unique__name1235.txt
===^ names IN DIR ^===
DEL u:\testdir\abc123_uniquename.txt
DEL u:\testdir\another_uniquename.txt
DEL u:\testdir\hello.txt
DEL u:\testdir\unique__name1235.txt
If you are happy after testing, remove both ECHO( to activate the delete function.
For this code file name = name+extension:
@echo off &SETLOCAL enabledelayedexpansion
FOR %%a IN (*) DO (
SET "search=%%~a"
IF "!search:~13!" neq "" (
FOR /f "delims=[]" %%b IN ('dir /b /a-d /on "!search:~0,13!*" ^| find /n "!search:~0,13!"') DO SET found=%%b
IF !found! equ 1 ECHO DEL "%%~a"
)
)
And because I chose a solution very similar to foxidrive's, here is another one:
@echo off &SETLOCAL enabledelayedexpansion
FOR %%a IN (*) DO (
SET search=%%a
IF "!search:~13!" neq "" SET /a $!search:~0,13!+=1 2>nul
)
FOR /f "tokens=1*delims=$=" %%a IN ('set "$"') DO if %%b equ 1 echo del "%%~a*"
The way I'd go about this is as follows. I will explain the logic and leave the coding to you.
First, parse all the file names into numbered variables, incrementing a counter each time.
Then set the loop limit to the number of files. On each pass, search the directory listing for the first 13 characters of the current file name; if the number of matching lines equals 1, delete that file. Afterwards, increase the loop variable by 1. At the end of each pass, check whether the limit (the number of files in the directory) has been reached: if so, end the loop; otherwise continue.
Hah, I finally decided to write it myself after someone turned the idea I described into actual code. Anyway, this is much shorter and a lot faster than his; tested and verified to work:
@echo off & setlocal enabledelayedexpansion
set dir=directoryyouwanttosearchin
for /f "delims=" %%a in ('dir /A:a /b %dir%') do set /A name+=1 & set file!name!=%%a
:LOOP
set /A cnt+=1
for /f "delims=" %%a in ('dir /A:a /b %dir% ^| find /C /I "!file%cnt%:~0,13!"') do set lines=%%a
if %lines%==1 del %dir%\!file%cnt%! > nul
if %cnt% NEQ %name% Goto :LOOP
exit /b
That's 9 lines :).
How to delete last n lines from file using batch script
I don't have any idea about batch files, I am writing batch file for the first time.
How should I write this batch file?
For Windows7
Try it for
<Project_Name>
<Noter>
<Common>
<File>D:\Project_Name\Util.jar</File>
<File>D:\Project_Name\Noter.bat</File>
<File>D:Project_Name\Noter.xml</File>
<File>D:Project_Name\Noter.jar</File>
</Common>
<Project_Name>
<File>D:\Util.bat</File>
<File>D:\Util.xml</File>
<File>D:\log.bat</File>
</Project_Name>
</Noter>
<CCNET>
This is the complete script for removing the last N lines:
1 Count the total number of lines.
2 Set LINES = LINES - N, leaving only the number of lines to be processed.
@echo OFF
setlocal EnableDelayedExpansion
set LINES=0
for /f "delims=" %%I in (infile.txt) do (
set /a LINES=LINES+1
)
echo Total Lines : %LINES%
echo.
:: n = 5; the last 5 lines are ignored
set /a LINES=LINES-5
call:PrintFirstNLine > output.txt
goto EOF
:PrintFirstNLine
set cur=0
for /f "delims=" %%I in (infile.txt) do (
echo %%I
rem echo !cur! : %%I
set /a cur=cur+1
if "!cur!"=="%LINES%" goto EOF
)
)
:EOF
exit /b
Here call:PrintFirstNLine > output.txt will give the output in an external file name as output.txt
Output for sample Input
<Project_Name>
<CBA_Notifier>
<Common>
<File>D:\CBA\CBA_Notifier\Project_Name\IPS-Util.jar</File>
<File>D:\CBA\CBA_Notifier\Project_Name\Notifier.bat</File>
<File>D:\CBA\CBA_Notifier\Project_Name\Notifier.xml</File>
<File>D:\CBA\CBA_Notifier\Project_Name\Notifier.jar</File>
</Common>
<Project_Name>
<File>D:\CBA\CBA_Notifier\IPS-Util.bat</File>
(last 5 lines removed)
Update
:PrintFirstNLine
set cur=0
for /F "tokens=1* delims=]" %%I in ('type "infile.txt" ^| find /V /N ""') do (
if "%%J"=="" (echo.) else (
echo.%%J
set /a cur=cur+1
)
if "!cur!"=="%LINES%" goto EOF
)
This script takes 1 argument, the file to be truncated, creates a temporary file, and then replaces the original file with the shorter one.
@echo off
setlocal enabledelayedexpansion
set count=
for /f %%x in ('type %1 ^| find /c /v ""') do set /a lines=%%x-5
copy /y nul %tmp%\tmp.zzz > nul
for /f "tokens=*" %%x in ('type %1 ^| find /v ""') do (
set /a count=count+1
if !count! leq %lines% echo %%x>>%tmp%\tmp.zzz
)
move /y %tmp%\tmp.zzz %1 > nul
If the original file has 5 or fewer lines, the main output routine will not create a file. To combat this, I use copy /y nul to create a zero-byte file.
If you would rather not have an empty file, just remove the copy /y nul line, and replace it with the following line:
if %lines% leq 0 del %1
You should use one method or the other; otherwise source files with 5 or fewer lines will remain untouched (neither replaced nor deleted).
To delete the last lines from your file:
1 Copy the leading lines that you need from the source file, e.g. e:\original.txt.
2 Write them to a new file, e.g. e:\new\newfile1.txt.
The code below comes from someone else, with thanks:
remember all may be done if you have motive and even blood hb =6. but help of nature is required always as you are a part of it
@echo off & setLocal enableDELAYedeXpansion
set N=
for /f "tokens=* delims= " %%a in (e:\4.txt) do (
set /a N+=1
if !N! gtr 264 goto :did
>>e:\new4.txt echo.%%a
)
:did
If you have 800 files, use Excel to generate the code for all 800, copy it to Notepad, and use Ctrl+H to replace spaces with nothing. Then save the file as haha.bat and run it in the folder with the files numbered 1.txt, 2.txt, 3.txt, etc. Any enquiries welcome: Erkamaldev#gmail.com "Long Live Bharata"
A slow method with less coding:
set input=file.txt
set remove=7
for /f "delims=" %i in ('find /c /v "" ^< "%cd%\%input%"') do set lines=%i
set /a lines-=remove
for /l %i in (1,1,%lines%) do findstr /n . "%input%" | findstr /b %i:
May be redirected to a file.
Each line is prefixed with the line number; may be removed with extra coding.
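One way the prefix might be stripped, sketched here as a batch file rather than as commands typed at the prompt (hence the doubled percent signs). The file name file.txt and the count of 7 are carried over from the commands above, and the variable keep is made up for the illustration:

```batch
@echo off
setlocal
rem Sketch only: file.txt is the input and the last 7 lines are dropped.
set "input=file.txt"
set "remove=7"
rem Count all lines, then work out how many to keep.
for /f %%c in ('find /c /v "" ^< "%input%"') do set /a "keep=%%c-remove"
rem findstr /n prefixes each line with its number and a colon;
rem tokens=1* with delims=: separates the number from the text again.
for /f "tokens=1* delims=:" %%n in ('findstr /n "^" "%input%"') do (
    if %%n leq %keep% echo(%%o
)
```

This variant reads the file in a single pass and prints blank lines as well. One known limitation: a line that itself starts with colons loses them, because for /f treats consecutive delimiters as one.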
A faster version with /g flag in my answer at:
How to split large text file in windows?
Tested in Win 10 CMD, on 577KB file, 7669 lines.