Batch script to merge lines from two files into a third file

Contents of file A1:
AA
VV
BB
Contents of file A2:
DD
EE
FF
I want to merge the contents of A1 and A2 as below into A3, so that the expected data in A3 is:
AADD
VVEE
BBFF
Alternatively, the expected output in A3 may be:
AA is from DD
VV is from EE
BB is from FF
Thanks for the help. I did try and search before I posted and could not find someone that has already posted something similar...

We can load the contents of the files into Batch variable arrays so each of its lines can be directly accessed in any way you wish:
@echo off
setlocal EnableDelayedExpansion
rem Load first file into A1 array:
set i=0
for /F "delims=" %%a in (A1.txt) do (
set /A i+=1
set A1[!i!]=%%a
)
rem Load second file into A2 array:
set i=0
for /F "delims=" %%a in (A2.txt) do (
set /A i+=1
set A2[!i!]=%%a
)
rem At this point, the number of lines is in %i% variable
rem Merge data from both files and create the third one:
for /L %%i in (1,1,%i%) do echo !A1[%%i]! is from !A2[%%i]!>> A3.txt
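For comparison, here is a minimal Python sketch of the same line-by-line pairing, assuming both files have already been read into lists of lines. The name merge_lines and its joiner parameter are made up for this illustration; they are not part of the Batch answer above.

```python
def merge_lines(first, second, joiner=""):
    """Pair the n-th line of `first` with the n-th line of `second`."""
    # zip stops at the shorter input, so unmatched trailing lines are dropped.
    return [a + joiner + b for a, b in zip(first, second)]


a1 = ["AA", "VV", "BB"]
a2 = ["DD", "EE", "FF"]
plain = merge_lines(a1, a2)                 # first expected layout
labelled = merge_lines(a1, a2, " is from ")  # alternative layout
```

With the sample data from the question, plain holds AADD, VVEE, BBFF and labelled holds the "is from" variant.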
EDIT Alternative solution
There is another way to do it that doesn't use Batch variables, so it can be used on files of any size, although it is slower. I borrowed the method used by Andy Morris in his solution: 1- Insert line numbers in both files, 2- Combine both files into one, 3- Sort the combined file, and 4- Merge each group of lines into a single line. The program below is basically Andy's with several small modifications that make it faster (and with a subtle error fixed).
@echo off
setlocal EnableDelayedExpansion
call :AddLineNumbers A1.txt A > Both.txt
call :AddLineNumbers A2.txt B >> Both.txt
sort Both.txt /O Sorted.txt
echo EOF: >> Sorted.txt
call :creatNewLines < Sorted.txt > Result.txt
goto :eof
:AddLineNumbers
findstr /n ^^ %1 > tem.tmp
for /f "tokens=1* delims=:" %%a in (tem.tmp) do (
set /a lineNo=1000000+%%a
echo !lineNo!%2:%%b
)
goto :eof
:creatNewLines
set /p lineA1=
for /f "tokens=1* delims=:" %%a in ("%lineA1%") do (
if %%a == EOF goto :eof
set /p dummy=%%b< nul
)
set /p lineA2=
for /f "tokens=1* delims=:" %%a in ("%lineA2%") do echo is from %%b
goto creatNewLines
The SORT command orders lines based on their contents. Andy's original method may fail because, after the line number, the lines are ordered by their contents, so the lines of each file may be misplaced. In this method an additional character (A or B) is added after the line number, so the lines of each file are always placed in the right order.
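The effect of the extra A/B tag can be sketched in Python. This is only an illustration under some assumptions: both inputs have the same number of lines, and Python's sorted() compares by code point, which is not the same collation as the Windows SORT command. The names tag_lines and merge_by_sorting are hypothetical.

```python
def tag_lines(lines, tag):
    """Prefix each line with a fixed-width line number plus a file tag."""
    # 1000000 + n keeps every number the same width, as in the batch script.
    return [f"{1000000 + n}{tag}:{text}" for n, text in enumerate(lines, start=1)]


def merge_by_sorting(first, second):
    """Number, combine, sort, then join each A line with the B line after it."""
    combined = sorted(tag_lines(first, "A") + tag_lines(second, "B"))
    # For equal-length inputs the sorted entries alternate A, B, A, B ...
    pairs = zip(combined[0::2], combined[1::2])
    return [a.split(":", 1)[1] + " is from " + b.split(":", 1)[1] for a, b in pairs]
```

Without the tag, a pair such as ZZ (first file) and BB (second file) on the same line number would sort with BB first, which is the misplacement described above.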

If your original data is in Data1.txt and Data2.txt this should do:
@echo off
call :AddLineNumbers data1.txt Tem1.txt
call :AddLineNumbers data2.txt Tem2.txt
copy tem1.txt + tem2.txt tem3.txt
sort < tem3.txt > tem4.txt
call :GetDataOut tem4.txt > tem5.txt
set OddData=
for /f %%a in (tem5.txt) do call :creatNewLines %%a
goto :eof
:AddLineNumbers
find /v /n "xx!!xx" < %1 > tem.txt
call :ProcessLines > %2
goto :eof
:ProcessLines
for /f "tokens=1,2 delims=[]" %%a in (tem.txt) do call :EachLine %%a %%b
goto :eof
:eachLine
set LineNo=00000%1
set data=%2
set LineNo=%LineNo:~-6%
echo %LineNo% %data%
goto :eof
:GetDataOut
for /f "tokens=2" %%a in (%1) do @echo %%a
goto :eof
:creatNewLines
if "%oddData%"=="" (
set oddData=%1
) else (
echo %oddData% %1
set oddData=
)
goto :eof

If using Linux, I would recommend using cut and paste (command line). See the man pages.
Alternatively, if you don't need the automation, you could use vim's visual block mode to cut and paste. Enter visual block mode with Ctrl-V.

Related

Windows Batch FOR Loop improvement

I have a batch file that checks for duplicate lines in a TXT file (over one million lines, 13 MB), but it runs for over 2 hours. How can I speed it up? Thank you!
TXT file
11
22
33
44
.
.
.
44 (over one million line)
Existing Batch
setlocal
set var1=*
sort original.txt>sort.txt
for /f %%a in ('type sort.txt') do (call :run %%a)
goto :end
:run
if %1==%var1% echo %1>>duplicate.txt
set var1=%1
goto :eof
:end
This should be the fastest method using a Batch file:
@echo off
setlocal EnableDelayedExpansion
set var1=*
sort original.txt>sort.txt
(for /f %%a in (sort.txt) do (
if "%%a" == "!var1!" (
echo %%a
) else (
set "var1=%%a"
)
)) >duplicate.txt
This method uses the findstr command as in aschipfl's answer, but in this case each line and its duplicates are removed from the file after being processed by findstr. This method could be faster if the number of duplicates in the file is high; otherwise it will be slower because of the high volume of data manipulated in each turn. Only a test can confirm this point...
@echo off
setlocal EnableDelayedExpansion
del duplicate.txt 2>NUL
copy /Y original.txt input.txt > NUL
:nextTurn
for %%a in (input.txt) do if %%~Za equ 0 goto end
< input.txt (
set /P "line="
findstr /X /C:"!line!"
find /V "!line!" > output.txt
) >> duplicate.txt
move /Y output.txt input.txt > NUL
goto nextTurn
:end
@echo off
setlocal enabledelayedexpansion
set var1=*
(
for /f %%a in ('sort q42574625.txt') do (
if "%%a"=="!var1!" echo %%a
set "var1=%%a"
)
)>"u:\q42574625_2.txt"
GOTO :EOF
This may be faster - I don't have your file to test against
I used a file named q42574625.txt containing some dummy data for my testing.
It's not clear whether you want only one instance of a duplicate line or not. Your code would produce 5 "duplicate" lines if there were 6 identical lines in the source file.
Here's a version which will report each duplicated line only once:
@echo off
setlocal enabledelayedexpansion
set var1=*
set var2=*
(
for /f %%a in ('sort q42574625.txt') do (
if "%%a"=="!var1!" IF "!var2!" neq "%%a" echo %%a&SET "var2=%%a"
set "var1=%%a"
)
)>"u:\q42574625.txt"
GOTO :EOF
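The difference between the two behaviours (one report per repeat versus one report per duplicated value) can be sketched in Python. This is a comparison only, assuming the lines are already in a list; the function names repeats and duplicated_once are made up for the illustration.

```python
def repeats(lines):
    """Every line equal to its predecessor after sorting (n copies -> n-1 hits)."""
    ordered = sorted(lines)
    return [cur for prev, cur in zip(ordered, ordered[1:]) if cur == prev]


def duplicated_once(lines):
    """Each duplicated value reported a single time."""
    result = []
    for value in repeats(lines):
        # repeats() is sorted, so equal values are adjacent.
        if not result or result[-1] != value:
            result.append(value)
    return result
```

Six identical lines give five entries from repeats, matching the remark above, and a single entry from duplicated_once.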
Supposing you provide the text file as the first command line argument, you could try the following:
@echo off
for /F "usebackq delims=" %%L in ("%~1") do (
for /F "delims=" %%K in ('
findstr /X /C:"%%L" "%~1" ^| find /C /V ""
') do (
if %%K GTR 1 echo %%L
)
)
This returns all duplicate lines, but multiple times each, namely as often as each occurs in the file.

Batch output one occurrence instead of several [duplicate]

Is it possible to remove duplicate rows from a text file? If yes, how?
Sure can, but like most text file processing with batch, it is not pretty, and it is not particularly fast.
This solution ignores case when looking for duplicates, and it sorts the lines. The name of the file is passed in as the 1st and only argument to the batch script.
@echo off
setlocal disableDelayedExpansion
set "file=%~1"
set "sorted=%file%.sorted"
set "deduped=%file%.deduped"
::Define a variable containing a linefeed character
set LF=^


::The 2 blank lines above are critical, do not remove
sort "%file%" >"%sorted%"
>"%deduped%" (
set "prev="
for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%sorted%") do (
set "ln=%%A"
setlocal enableDelayedExpansion
if /i "!ln!" neq "!prev!" (
endlocal
(echo %%A)
set "prev=%%A"
) else endlocal
)
)
>nul move /y "%deduped%" "%file%"
del "%sorted%"
This solution is case sensitive and it leaves the lines in the original order (except for duplicates of course). Again the name of the file is passed in as the 1st and only argument.
@echo off
setlocal disableDelayedExpansion
set "file=%~1"
set "line=%file%.line"
set "deduped=%file%.deduped"
::Define a variable containing a linefeed character
set LF=^


::The 2 blank lines above are critical, do not remove
>"%deduped%" (
for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%file%") do (
set "ln=%%A"
setlocal enableDelayedExpansion
>"%line%" (echo !ln:\=\\!)
>nul findstr /xlg:"%line%" "%deduped%" || (echo !ln!)
endlocal
)
)
>nul move /y "%deduped%" "%file%"
2>nul del "%line%"
EDIT
Both solutions above strip blank lines. I didn't think blank lines were worth preserving when talking about distinct values.
I've modified both solutions to disable the FOR /F "EOL" option so that all non-blank lines are preserved, regardless what the 1st character is. The modified code sets the EOL option to a linefeed character.
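To make the contrast between the two approaches concrete, here is a rough Python sketch of both behaviours: sorted and case-insensitive versus original order and case-sensitive. It assumes the lines are already in a list and is not a translation of the Batch code; the function names are hypothetical, and Python's sort order differs from the SORT command's collation.

```python
def dedupe_sorted_ignore_case(lines):
    """Sort, then drop lines equal to the previous kept line, ignoring case."""
    kept = []
    for line in sorted(lines, key=str.lower):
        if not kept or kept[-1].lower() != line.lower():
            kept.append(line)
    return kept


def dedupe_keep_order(lines):
    """Case-sensitive; keep the first occurrence of each line in place."""
    seen = set()
    kept = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return kept
```

The second function trades memory (a set of every distinct line) for keeping the original order, much as the second Batch solution trades speed for it.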
New solution 2016-04-13: JSORT.BAT
You can use my JSORT.BAT hybrid JScript/batch utility to efficiently sort and remove duplicate lines with a simple one liner (plus a MOVE to overwrite the original file with the final result). JSORT is pure script that runs natively on any Windows machine from XP onward.
@jsort file.txt /u >file.txt.new
@move /y file.txt.new file.txt >nul
You may use uniq (http://en.wikipedia.org/wiki/Uniq) from UnxUtils (http://sourceforge.net/projects/unxutils/).
Some time ago I found an unexpectedly simple solution, but unfortunately it only works on Windows 10: the sort command features some undocumented options that can be adopted:
/UNIQ[UE] to output only unique lines;
/C[ASE_SENSITIVE] to sort case-sensitively;
So use the following line of code to remove duplicate lines (remove /C to do that in a case-insensitive manner):
sort /C /UNIQUE "incoming.txt" /O "outgoing.txt"
This removes duplicate lines from the text in incoming.txt and provides the result in outgoing.txt. Note that the original order is of course not going to be preserved (because, well, this is the main purpose of sort).
However, you should use these options with care, as there might be some (un)known issues with them; there is possibly a good reason why they are not documented (so far).
The Batch file below does what you want:
@echo off
setlocal EnableDelayedExpansion
set "prevLine="
for /F "delims=" %%a in (theFile.txt) do (
if "%%a" neq "!prevLine!" (
echo %%a
set "prevLine=%%a"
)
)
If you need a more efficient method, try this Batch-JScript hybrid script that is written as a filter, similar to the Unix uniq program. Save it with a .bat extension, like uniq.bat:
@if (@CodeSection == @Batch) @then
@CScript //nologo //E:JScript "%~F0" & goto :EOF
@end
var line, prevLine = "";
while ( ! WScript.Stdin.AtEndOfStream ) {
line = WScript.Stdin.ReadLine();
if ( line != prevLine ) {
WScript.Stdout.WriteLine(line);
prevLine = line;
}
}
Both programs were copied from this post.
set "file=%CD%\%~1"
sort "%file%">"%file%.sorted"
del /q "%file%"
rem Enable delayed expansion once, outside the loop, so SETLOCAL is not nested per line
SETLOCAL EnableDelayedExpansion
set "ln="
FOR /F "usebackq tokens=*" %%A IN ("%file%.sorted") DO (
rem Quoted comparison so lines containing spaces do not break the IF
if not "%%A"=="!ln!" (
set "ln=%%A"
echo %%A>>"%file%"
)
)
ENDLOCAL
del /q "%file%.sorted"
This should work exactly the same. That dbenham example seemed way too hardcore for me, so I tested my own solution. Usage example: filedup.cmd filename.ext
Pure batch - 3 effective lines.
@ECHO OFF
SETLOCAL
:: remove variables starting $
FOR /F "delims==" %%a In ('set $ 2^>Nul') DO SET "%%a="
FOR /f "delims=" %%a IN (q34223624.txt) DO SET $%%a=Y
(FOR /F "delims=$=" %%a In ('set $ 2^>Nul') DO ECHO %%a)>u:\resultfile.txt
GOTO :EOF
Works happily if the data does not contain characters to which batch has a sensitivity.
"q34223624.txt" because question 34223624 contained this data
1.1.1.1
1.1.1.1
1.1.1.1
1.2.1.2
1.2.1.2
1.2.1.2
1.3.1.3
1.3.1.3
1.3.1.3
on which it works perfectly.
I came across this issue and had to resolve it myself because the use case was particular to my need.
I needed to find duplicate URL's and order of lines was relevant so it needed to be preserved. The lines of text should not contain any double quotes, should not be very long and sorting cannot be used.
Thus I did this:
setlocal enabledelayedexpansion
type nul>unique.txt
for /F "tokens=*" %%i in (list.txt) do (
find "%%i" unique.txt 1>nul
if !errorlevel! NEQ 0 (
echo %%i>>unique.txt
)
)
Auxiliary: if the text does contain double quotes then the FIND needs to use a filtered set variable as described in this post: Escape double quotes in parameter
So instead of:
find "%%i" unique.txt 1>nul
it would be more like:
set test=%%i
set test=!test:"=""!
find "!test!" unique.txt 1>nul
Thus find will look like find """what""" file and %%i will be unchanged.
I have used a fake "array" to accomplish this
@echo off
:: filter out all duplicate ip addresses
REM your file would take the place of %1
set file=%1
if [%1]==[] goto :EOF
setlocal EnableDelayedExpansion
set size=0
set cond=false
set max=0
for /F %%a IN ('type %file%') do (
if [!size!]==[0] (
set cond=true
set /a size="size+1"
set arr[!size!]=%%a
) ELSE (
call :inner
if [!cond!]==[true] (
set /a size="size+1"
set arr[!size!]=%%a&& ECHO > NUL
)
)
)
break> %file%
:: destroys old output
for /L %%b in (1,1,!size!) do echo !arr[%%b]!>> %file%
endlocal
goto :eof
:inner
for /L %%b in (1,1,!size!) do (
if "%%a" neq "!arr[%%b]!" (set cond=true) ELSE (set cond=false&&goto :break)
)
:break
The use of a label for the inner loop is something specific to cmd.exe and is the only way I have been successful in nesting for loops within each other. Basically, this compares each new value read from the file against the values already stored, and if there is no match the program adds the value to memory. When it is done, it destroys the target file's contents and replaces them with the unique strings.

Copying multiple files with batch

I am writing a batch program for controlling my movie archive (personal use). This is what I am trying to do for copying folders.
:_Kopya
set "TRGT=%~1" & set "KPY-GLN[1]=%~2" & set "KPY-GLN[2]=%~3" & set "KPY-GLN[3]=%~4"
REM Checking user input and defining variables.
for /l %%s in (1,1,3) do (
if DEFINED KPY-GLN[%%s] (
for /f "tokens=1-2 delims=:" %%a in ("!KPY-GLN[%%s]!") do (
call :_Kontrol "%%a" "%%b" "" "" "aaaaa[%%s]" "bbbbb[%%s]" "" ""
if "!TEST!"=="0" goto :EOF
)
)
)
REM Copying folders.
for /l %%s in (1,1,3) do (
if NOT DEFINED bbbbb[%%s] set bbbbb[%%s]=!aaaaa[%%s]!
for /l %%a in (!aaaaa[%%s]!,1,!bbbbb[%%s]!) do (
call :_ReadLine "%MURL%" "%%a" "LINE"
if EXIST "!TRGT!\!LINE:~20!" rd /s /q !TRGT!\!LINE:~20!
robocopy /s /e "!LINE!" "!TRGT!\!LINE:~20!" >NUL 2>&1
)
)
goto :EOF
And this is the way I call it:
call :_Kopya "C:\" "123:125" "124:130" "125"
This means: copy file numbers 123 to 125, 124 to 130, and 125.
It works fine, but there is a problem I want to solve. When I call the function this way, it copies file number 124 two times and file number 125 three times. How can I fix this issue?
PS1: %MURL% is a text file that contains the local paths of those files, something like this: M:\Movies\000y.001y\The.Lord.of.the.Rings.The.Return.of.the.King.(2003){0167260}[00087]
PS2: :_ReadLine is a function that reads a specific line and stores the value of that line in the LINE variable.
@ECHO Off
SETLOCAL
:: remove variables starting $
FOR /F "delims==" %%a In ('set $ 2^>Nul') DO SET "%%a="
:: Parameters are adirectory range*
:: where range may be a single number or start:finish
SET "directory=%~1"
:loop
SHIFT
IF "%~1"=="" GOTO :eof
FOR /f "tokens=1,2delims=:" %%a IN ("%~1") DO (
IF "%%b"=="" (CALL :kopythis %%a) ELSE (FOR /L %%c IN (%%a,1,%%b) DO CALL :kopythis %%c)
)
GOTO loop
GOTO :EOF
:kopythis
IF DEFINED $%1 GOTO :EOF
SET $%1=Y
ECHO(COPY whatever with parameters %directory% and %1
GOTO :eof
This should do what you seem to need. I'll leave you to work out the details of how to structure whatever copy mechanism you need from the parameters provided.
Note that with this approach, quoting the parameters is optional; the obvious exception is the first, where quoting is optional only if it doesn't contain separators. It also allows any number of range parameters.
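The core idea (expand every range, but act on each number only once) can be sketched in Python as a comparison. The name expand_ranges is made up for this illustration, and the set stands in for the $-prefixed marker variables used in the Batch code.

```python
def expand_ranges(specs):
    """Turn 'start:finish' or single-number strings into unique numbers."""
    seen = set()
    ordered = []
    for spec in specs:
        start, _, finish = spec.partition(":")
        last = int(finish) if finish else int(start)
        for number in range(int(start), last + 1):
            if number not in seen:  # plays the role of IF DEFINED $%1
                seen.add(number)
                ordered.append(number)
    return ordered
```

Called with the ranges from the question, "123:125", "124:130" and "125", it yields each number from 123 to 130 exactly once, so 124 and 125 would no longer be copied repeatedly.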

How to replace Strings in Windows Batch file

I would like to replace the following String in a file:
android:versionName="anyStringHere" >
*anyStringHere represents any possible string
With:
android:versionName="1.04.008" >
How would I do this in a clean, reusable way, and preserve the new lines, tabs, and indentation in the file?
Not even close to the fastest option, and not 100% bulletproof, but this is pure batch and will handle spacing and indentation while doing the replacement.
@echo off
setlocal enableextensions disabledelayedexpansion
rem File to process
set "file=data.txt"
rem How to find lines
set "match=public static String CONST = \"abc\";"
rem String to replace and replacement
set "findStr=abc"
set "replaceStr=def"
rem temporary file to work with lines
set "tempFile=%temp%\repl.tmp"
rem All the output goes into the temporary file
(
rem Process input file extracting non matching lines
for /f tokens^=^1^*^ delims^=^:^ eol^= %%a in ('findstr /n /v /c:"%match%" ^< "%file%"') do (
set /a "n=1000000+%%a"
setlocal enabledelayedexpansion
< nul set /p "n=!n!"
endlocal
echo :%%b
)
rem Process input file extracting matching lines and changing strings
for /f tokens^=^1^*^ delims^=^:^ eol^= %%a in ('findstr /n /c:"%match%" ^< "%file%"') do (
set /a "n=1000000+%%a"
set "data=%%b"
setlocal enabledelayedexpansion
set "data=!data:%findStr%=%replaceStr%!"
echo !n!:!data!
endlocal
)
)> "%tempFile%"
rem Sort the output file to get the final file
(for /f tokens^=^1^*^ delims^=^:^ eol^= %%a in ('sort "%tempFile%"') do (
if "%%b"=="" (
echo.
) else (
echo %%b
)
)) > "%file%.repl"
This is the simplest way to do this that I could come up with. It takes a String and searches for it in a file, then replaces the entire line that contains the string. It won't only replace parts of a line, which can be done with a bit more effort.
@echo off
:: file containing string to replace
set file=test.txt
:: string to replace in file
set searchString=line 4
:: string to write to file
set repString=line 4 edited
setLocal enableDelayedExpansion
set count=0
if not exist %file% echo cannot find file - %file% & goto :EOF
:: Search for string - and get its line number
for /F "delims=:" %%a in ('findstr /N /I /C:"%searchString%" "%file%"') do set searchLine=%%a
if not defined searchLine echo cannot find string - %searchString% - in file - %file% & goto :EOF
:: Read file into variables - by line number
for /F "delims=~!" %%b in ('type %file%') do (
set /a count=!count!+1
set line!count!=%%b
)
:: Edit the one line
set line%searchLine%=%repString%
:: Empty file and write new contents
del %file%
for /L %%c in (1,1,!count!) do echo !line%%c!>>%file%
pause
You can change the echo on the last for loop to output to a different file, maybe %file%.new or something, and then remove the del command.
This is a robust solution that retains all formatting. It uses a helper batch file called repl.bat - download from: https://www.dropbox.com/s/qidqwztmetbvklt/repl.bat
Place repl.bat in the same folder as the batch file or in a folder that is on the path.
type "file.txt" | repl "(public static String CONST = \q).*(\q.*)" "$1def$2" x >"newfile.txt"
I found that using sed was the cleanest solution
http://gnuwin32.sourceforge.net/packages/sed.htm
sed "s/android:versionName=\".*\" >/android:versionName=\"%NEW_VERSION%\" >/g" %ORIG_FILE_NAME% > %TEMP_FILE_NAME%
@move /Y %TEMP_FILE_NAME% %ORIG_FILE_NAME% >nul
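For readers without sed, the same substitution can be sketched with Python's re module. This is an illustration only: the function name set_version_name is hypothetical, and the pattern is deliberately narrower than the sed expression above (it stops at the first closing quote instead of matching greedily).

```python
import re


def set_version_name(text, new_version):
    """Replace whatever is inside android:versionName="..." with new_version."""
    pattern = r'(android:versionName=")[^"]*(")'
    # A function replacement avoids backslash handling in the new version string.
    return re.sub(pattern, lambda m: m.group(1) + new_version + m.group(2), text)
```

Because only the quoted value is replaced, the surrounding new lines, tabs and indentation in the text are left exactly as they were.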

Deleting last n lines from file using batch file

How to delete last n lines from file using batch script
I don't have any idea about batch files, I am writing batch file for the first time.
How should I write this batch file?
For Windows7
Try it for
<Project_Name>
<Noter>
<Common>
<File>D:\Project_Name\Util.jar</File>
<File>D:\Project_Name\Noter.bat</File>
<File>D:Project_Name\Noter.xml</File>
<File>D:Project_Name\Noter.jar</File>
</Common>
<Project_Name>
<File>D:\Util.bat</File>
<File>D:\Util.xml</File>
<File>D:\log.bat</File>
</Project_Name>
</Noter>
<CCNET>
This is the complete script for removing the last N lines:
count the total number of lines
set LINES = LINES - N, so that only the number of lines to keep remains
@echo OFF
setlocal EnableDelayedExpansion
set LINES=0
for /f "delims==" %%I in (infile.txt) do (
set /a LINES=LINES+1
)
echo Total Lines : %LINES%
echo.
:: n = 5 , last 5 line will ignore
set /a LINES=LINES-5
call:PrintFirstNLine > output.txt
goto EOF
:PrintFirstNLine
set cur=0
for /f "delims==" %%I in (infile.txt) do (
echo %%I
rem echo !cur! : %%I
set /a cur=cur+1
if "!cur!"=="%LINES%" goto EOF
)
:EOF
exit /b
Here call:PrintFirstNLine > output.txt will give the output in an external file name as output.txt
Output for sample Input
<Project_Name>
<CBA_Notifier>
<Common>
<File>D:\CBA\CBA_Notifier\Project_Name\IPS-Util.jar</File>
<File>D:\CBA\CBA_Notifier\Project_Name\Notifier.bat</File>
<File>D:\CBA\CBA_Notifier\Project_Name\Notifier.xml</File>
<File>D:\CBA\CBA_Notifier\Project_Name\Notifier.jar</File>
</Common>
<Project_Name>
<File>D:\CBA\CBA_Notifier\IPS-Util.bat</File>
The last 5 lines have been removed.
Update
:PrintFirstNLine
set cur=0
for /F "tokens=1* delims=]" %%I in ('type "infile.txt" ^| find /V /N ""') do (
if "%%J"=="" (echo.) else (
echo.%%J
set /a cur=cur+1
)
if "!cur!"=="%LINES%" goto EOF
)
This script takes 1 argument, the file to be truncated, creates a temporary file and then replaces the original file with the shorter one.
@echo off
setlocal enabledelayedexpansion
set count=
for /f %%x in ('type %1 ^| find /c /v ""') do set /a lines=%%x-5
copy /y nul %tmp%\tmp.zzz > nul
for /f "tokens=*" %%x in ('type %1 ^| find /v ""') do (
set /a count=count+1
if !count! leq %lines% echo %%x>>%tmp%\tmp.zzz
)
move /y %tmp%\tmp.zzz %1 > nul
If the original file has 5 or fewer lines, the main output routine will not create a file. To combat this, I use copy /y nul to create a zero-byte file.
If you would rather not have an empty file, just remove the copy /y nul line, and replace it with the following line:
if %lines% leq 0 del %1
You should use one method or the other; otherwise source files with 5 or fewer lines will remain untouched (neither replaced nor deleted).
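The edge case discussed above (a file with no more lines than the number to remove) is easy to see in a small Python sketch. It assumes the lines are already in a list, and the name drop_last_lines is made up for this illustration.

```python
def drop_last_lines(lines, n):
    """Return all but the last n lines (an empty list when n >= len(lines))."""
    if n <= 0:
        # lines[:-0] would be empty, so handle "remove nothing" explicitly.
        return list(lines)
    return list(lines[:-n])
```

Here a short input simply produces an empty result, which corresponds to the zero-byte file created by copy /y nul in the script.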
To delete the last lines from your file:
1 copy the leading lines that are needed from the file, e.g. e:\original.txt
2 paste them into a new file, e.g. e:\new\newfile1.txt
Credit for the code goes to the person who gave it to me:
remember all may be done if you have motive and even blood hb =6. but help of nature is required always as you are a part of it
@echo off & setLocal enableDELAYedeXpansion
set N=
for /f "tokens=* delims= " %%a in (e:\4.txt) do (
set /a N+=1
if !N! gtr 264 goto :did
>>e:\new4.txt echo.%%a
)
:did
If you have 800 files, use Excel to generate the code for all 800, copy it into Notepad, and use Ctrl+H to replace each space with nothing. Then rename the file to haha.bat and run it in the folder containing the files numbered 1.txt, 2.txt, 3.txt, etc. Any enquiries welcome: Erkamaldev@gmail.com "Long Live Bharata"
A slow method with less coding:
set input=file.txt
set remove=7
for /f "delims=" %i in ('find /c /v "" ^< "%cd%\%input%"') do set lines=%i
set /a lines-=remove
for /l %i in (1,1,%lines%) do findstr /n . "%input%" | findstr /b %i:
May be redirected to a file.
Each line is prefixed with the line number; may be removed with extra coding.
A faster version with /g flag in my answer at:
How to split large text file in windows?
Tested in Win 10 CMD, on 577KB file, 7669 lines.

Resources