Is it possible to remove duplicate rows from a text file? If yes, how?
Sure can, but like most text file processing with batch, it is not pretty, and it is not particularly fast.
This solution ignores case when looking for duplicates, and it sorts the lines. The name of the file is passed in as the 1st and only argument to the batch script.
@echo off
setlocal disableDelayedExpansion
set "file=%~1"
set "sorted=%file%.sorted"
set "deduped=%file%.deduped"
::Define a variable containing a linefeed character
set LF=^


::The 2 blank lines above are critical, do not remove
sort "%file%" >"%sorted%"
>"%deduped%" (
  set "prev="
  for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%sorted%") do (
    set "ln=%%A"
    setlocal enableDelayedExpansion
    if /i "!ln!" neq "!prev!" (
      endlocal
      (echo %%A)
      set "prev=%%A"
    ) else endlocal
  )
)
>nul move /y "%deduped%" "%file%"
del "%sorted%"
This solution is case sensitive and it leaves the lines in the original order (except for duplicates of course). Again the name of the file is passed in as the 1st and only argument.
@echo off
setlocal disableDelayedExpansion
set "file=%~1"
set "line=%file%.line"
set "deduped=%file%.deduped"
::Define a variable containing a linefeed character
set LF=^


::The 2 blank lines above are critical, do not remove
>"%deduped%" (
  for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%file%") do (
    set "ln=%%A"
    setlocal enableDelayedExpansion
    >"%line%" (echo !ln:\=\\!)
    >nul findstr /xlg:"%line%" "%deduped%" || (echo !ln!)
    endlocal
  )
)
>nul move /y "%deduped%" "%file%"
2>nul del "%line%"
EDIT
Both solutions above strip blank lines. I didn't think blank lines were worth preserving when talking about distinct values.
I've modified both solutions to disable the FOR /F "EOL" option so that all non-blank lines are preserved, regardless of what the 1st character is. The modified code sets the EOL option to a linefeed character.
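For comparison, the two behaviours described above (sorted with case ignored, and original order with case respected, both dropping blank lines) can be sketched outside of batch. This is only a minimal Python illustration; the function names are hypothetical, and Python's sort order is not guaranteed to match the Windows SORT command.

```python
def dedupe_sorted_ignore_case(lines):
    """Sort the lines, then keep one line per case-insensitive value; drop blank lines."""
    result = []
    prev = None
    for line in sorted((l for l in lines if l != ""), key=str.lower):
        if prev is None or line.lower() != prev.lower():
            result.append(line)
            prev = line
    return result


def dedupe_keep_order(lines):
    """Keep the first occurrence of each line (case sensitive); drop blank lines."""
    seen = set()
    result = []
    for line in lines:
        if line != "" and line not in seen:
            seen.add(line)
            result.append(line)
    return result


sample = ["b", "A", "", "a", "b", "B"]
print(dedupe_sorted_ignore_case(sample))  # ['A', 'b']
print(dedupe_keep_order(sample))          # ['b', 'A', 'a', 'B']
```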
New solution 2016-04-13: JSORT.BAT
You can use my JSORT.BAT hybrid JScript/batch utility to efficiently sort and remove duplicate lines with a simple one liner (plus a MOVE to overwrite the original file with the final result). JSORT is pure script that runs natively on any Windows machine from XP onward.
@jsort file.txt /u >file.txt.new
@move /y file.txt.new file.txt >nul
You may use uniq (http://en.wikipedia.org/wiki/Uniq) from UnxUtils (http://sourceforge.net/projects/unxutils/).
Some time ago I found an unexpectedly simple solution, but unfortunately it only works on Windows 10: the sort command features some undocumented options that can be used:
/UNIQ[UE] to output only unique lines;
/C[ASE_SENSITIVE] to sort case-sensitively;
So use the following line of code to remove duplicate lines (remove /C to do that in a case-insensitive manner):
sort /C /UNIQUE "incoming.txt" /O "outgoing.txt"
This removes duplicate lines from the text in incoming.txt and provides the result in outgoing.txt. Note that the original order is of course not preserved (sorting is, after all, the main purpose of sort).
However, you should use these options with care, as there might be some (un)known issues with them; there is possibly a good reason why they are not documented (so far).
The Batch file below does what you want:
@echo off
setlocal EnableDelayedExpansion
set "prevLine="
for /F "delims=" %%a in (theFile.txt) do (
   if "%%a" neq "!prevLine!" (
      echo %%a
      set "prevLine=%%a"
   )
)
If you need a more efficient method, try this Batch-JScript hybrid script, which is written as a filter, similar to the Unix uniq program. Save it with a .bat extension, for example uniq.bat:
@if (@CodeSection == @Batch) @then
@CScript //nologo //E:JScript "%~F0" & goto :EOF
@end
var line, prevLine = "";
while ( ! WScript.Stdin.AtEndOfStream ) {
   line = WScript.Stdin.ReadLine();
   if ( line != prevLine ) {
      WScript.Stdout.WriteLine(line);
      prevLine = line;
   }
}
Both programs were copied from this post.
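Note that both programs above compare each line only with the line immediately before it, like uniq does, so they remove all duplicates only when the input is already sorted or grouped. A minimal Python sketch of that logic (the function name is hypothetical, and unlike the scripts above it does not treat a leading empty line specially):

```python
def uniq(lines):
    """Yield each line unless it equals the line immediately before it."""
    prev = None
    for line in lines:
        if line != prev:
            yield line
        prev = line


# The second "a" group is not adjacent to the first, so it survives.
print(list(uniq(["a", "a", "b", "a"])))  # ['a', 'b', 'a']
```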
set "file=%CD%\%~1"
sort "%file%" >"%file%.sorted"
del /q "%file%"
rem Enable delayed expansion once, outside the loop; SETLOCAL can only be nested 32 levels deep
SETLOCAL EnableDelayedExpansion
set "ln="
FOR /F "usebackq tokens=*" %%A IN ("%file%.sorted") DO (
  if not "%%A"=="!ln!" (
    set "ln=%%A"
    >>"%file%" echo(%%A
  )
)
ENDLOCAL
del /q "%file%.sorted"
This should work exactly the same. That dbenham example seemed way too hardcore for me, so I tested my own solution. Usage example: filedup.cmd filename.ext
Pure batch - 3 effective lines.
@ECHO OFF
SETLOCAL
:: remove variables starting $
FOR /F "delims==" %%a In ('set $ 2^>Nul') DO SET "%%a="
FOR /f "delims=" %%a IN (q34223624.txt) DO SET $%%a=Y
(FOR /F "delims=$=" %%a In ('set $ 2^>Nul') DO ECHO %%a)>u:\resultfile.txt
GOTO :EOF
Works happily if the data does not contain characters to which batch has a sensitivity.
"q34223624.txt" because question 34223624 contained this data
1.1.1.1
1.1.1.1
1.1.1.1
1.2.1.2
1.2.1.2
1.2.1.2
1.3.1.3
1.3.1.3
1.3.1.3
on which it works perfectly.
I came across this issue and had to resolve it myself because the use case was particular to my needs.
I needed to find duplicate URLs, and the order of the lines was relevant, so it needed to be preserved. The lines of text should not contain any double quotes and should not be very long, and sorting cannot be used.
Thus I did this:
setlocal enabledelayedexpansion
type nul>unique.txt
for /F "tokens=*" %%i in (list.txt) do (
    find "%%i" unique.txt 1>nul
    if !errorlevel! NEQ 0 (
        echo %%i>>unique.txt
    )
)
Auxiliary: if the text does contain double quotes then the FIND needs to use a filtered set variable as described in this post: Escape double quotes in parameter
So instead of:
find "%%i" unique.txt 1>nul
it would be more like:
set test=%%i
set test=!test:"=""!
find "!test!" unique.txt 1>nul
Thus find will look like find """what""" file and %%i will be unchanged.
I have used a fake "array" to accomplish this
@echo off
:: filter out all duplicate ip addresses
REM your file would take the place of %1
set file=%1
if [%1]==[] goto :EOF
setlocal EnableDelayedExpansion
set size=0
set cond=false
set max=0
for /F %%a IN ('type %file%') do (
    if [!size!]==[0] (
        set cond=true
        set /a size="size+1"
        set arr[!size!]=%%a
    ) ELSE (
        call :inner
        if [!cond!]==[true] (
            set /a size="size+1"
            set arr[!size!]=%%a&& ECHO > NUL
        )
    )
)
break> %file%
:: destroys old output
for /L %%b in (1,1,!size!) do echo !arr[%%b]!>> %file%
endlocal
goto :eof
:inner
for /L %%b in (1,1,!size!) do (
    if "%%a" neq "!arr[%%b]!" (set cond=true) ELSE (set cond=false&&goto :break)
)
:break
The use of the label for the inner loop is something specific to cmd.exe and is the only way I have been successful nesting for loops within each other. Basically, this compares each new value read from the file against the values already stored, and if there is no match, the program adds the value to memory. When it is done, it destroys the target file's contents and replaces them with the unique strings.
The code is used to search for a string in a .c file and replace it. The file is very large, so I just want to exit the loop as soon as the string has been found and replaced.
@echo on
setlocal enabledelayedexpansion
cd D:\abc
set INTEXTFILE=rev.c
set OUTTEXTFILE=test_out.txt
set SEARCHTEXT=BNE1.9
set REPLACETEXT=BNE1.8
set OUTPUTLINE=
for /f "tokens=1,* delims=¶" %%A in ( '"findstr /n ^^ %INTEXTFILE%"') do (
SET string=%%A
for /f "delims=: tokens=1,*" %%a in ("!string!") do set "string=%%b"
if "!string!" == "" (
echo.>>%OUTTEXTFILE%
) else (
SET modified=!string:%SEARCHTEXT%=%REPLACETEXT%!
echo.!modified! >> %OUTTEXTFILE%
exit /b
)
)
del %INTEXTFILE%
rename %OUTTEXTFILE% %INTEXTFILE%
Because your code reads each line one by one and writes it to a new file, there is unfortunately no way with this batch file methodology to stop at the first replacement without losing every line below the replaced line.
To do that you would need to use a scripting language or utility which can edit files as opposed to read them bit by bit and rewrite them entirely. I strongly advise that you consider a find and replace utility, or perform this task using the built-in WSH or PowerShell scripting languages instead.
As a side note, this is how I might have tackled the script you posted, which may perform the task a little more speedily:
@Echo Off
SetLocal EnableExtensions DisableDelayedExpansion
CD /D "D:\abc" 2>NUL || Exit /B
Set "INTEXTFILE=rev.c"
Set "OUTTEXTFILE=test_out.txt"
Set "SEARCHTEXT=BNE1.9"
Set "REPLACETEXT=BNE1.8"
If Not Exist "%INTEXTFILE%" Exit /B
Copy /Y "%INTEXTFILE%" "%OUTTEXTFILE%" 1>NUL
(
    For /F "Delims=" %%G In (
        '%SystemRoot%\System32\find.exe /I /N /V "" 0^<"%OUTTEXTFILE%"'
    ) Do (
        Set "NumberedLine=%%G"
        SetLocal EnableDelayedExpansion
        Set NumberedLine | %SystemRoot%\System32\find.exe "%SEARCHTEXT%" 1>NUL
        If Not ErrorLevel 1 (
            Set "NumberedLine=!NumberedLine:%SEARCHTEXT%=%REPLACETEXT%!"
        )
        Echo(!NumberedLine:*]=!
        EndLocal
    )
) 1>"%INTEXTFILE%"
Rem Del "%OUTTEXTFILE%"
Feel free to remove the Remark on the last line, if you are sure you do not want a copy of the original file content.
The main issue with your chosen method is that multiple consecutive delimiters are treated as one. That means any colon, :, at the beginning of a line would be lost here: "delims=: tokens=1,*".
Additionally, this version opens the file for writing only once, whereas yours opened the file for writing, for each line, then closed it again before repeating for the next.
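If a scripting language outside of batch is an option, the "replace in the first matching line, then keep everything else untouched" idea can be sketched as follows. This is only an illustrative Python sketch; the function name is hypothetical and the sample strings are made up around the BNE1.9/BNE1.8 values from the question.

```python
def replace_in_first_match(lines, search, replacement):
    """Replace search in the first line that contains it; pass all other lines through."""
    replaced = False
    for line in lines:
        if not replaced and search in line:
            line = line.replace(search, replacement)
            replaced = True
        yield line


source = ["rev BNE1.9", "", "  keep BNE1.9"]
print(list(replace_in_first_match(source, "BNE1.9", "BNE1.8")))
# ['rev BNE1.8', '', '  keep BNE1.9']
```

Blank lines and leading spaces pass through unchanged, and no comparison work is done once the first match has been handled.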
I am writing a .bat program that will find and replace text in a file. The problem that I am having is that it is removing blank lines and left-justifying the other lines. I need the blank lines to remain and the new text to remain in the same location. Here is what I have written, and also the result. Can anybody please help?
program:
@ECHO OFF
cls
cd\
c:
setLocal EnableDelayedExpansion
For /f "tokens=* delims= " %%a in (samplefile.tx) do (
Set str=%%a
set str=!str:day=night!
set str=!str:winter=summer!
echo !str!>>samplefile2.txt)
ENDLOCAL
cls
exit
sample file:
this line is the first line in my file that I am using as an example.This is made up text
the cat in the hat
day
winter
below is the result:
this line is the first line in my file that I am using as an example.This is made up text
the cat in the hat
night
summer
I need the lines, spaces and new text to remain in the same position while making the text replacement. Please help
Your use of "tokens=* delims= " will trim leading spaces. Instead, use "delims=" to preserve leading spaces.
FOR /F always skips empty lines. The trick is to insert something before each line. Typically FIND or FINDSTR is used to insert the line number at the front of each line.
You can use !var:*:=! to delete the line number prefix from FINDSTR.
Use echo(!str! to prevent the "ECHO is off" message when the line is empty.
It is more efficient (faster) to redirect only once.
@echo off
setlocal enableDelayedExpansion
>samplefile2.txt (
  for /f "delims=" %%A in ('findstr /n "^" samplefile.txt') do (
    set "str=%%A"
    set "str=!str:*:=!"
    set "str=!str:day=night!"
    set "str=!str:winter=summer!"
    echo(!str!
  )
)
This still has a potential problem. It will corrupt lines that contain ! when %%A is expanded because of the delayed expansion. The trick is to toggle delayed expansion on and off within the loop.
@echo off
setlocal disableDelayedExpansion
>samplefile2.txt (
  for /f "delims=" %%A in ('findstr /n "^" samplefile.txt') do (
    set "str=%%A"
    setlocal enableDelayedExpansion
    set "str=!str:*:=!"
    set "str=!str:day=night!"
    set "str=!str:winter=summer!"
    echo(!str!
    endlocal
  )
)
Or you could forget custom batch entirely and get a much simpler and faster solution using my JREPL.BAT utility that performs regular expression search and replace on text. There are options to specify multiple literal search/replace pairs.
jrepl "day winter" "night summer" /t " " /l /i /f sampleFile.txt /o sampleFile2.txt
I used the /I option to make the search case insensitive. But you can drop that option to make it case sensitive if you prefer. That cannot be done easily using pure batch.
@ECHO Off
SETLOCAL
(
 FOR /f "tokens=1*delims=]" %%a IN ('find /n /v "" q27459813.txt') DO (
  SET "line=%%b"
  IF DEFINED line (CALL :subs) ELSE (ECHO()
 )
)>newfile.txt
GOTO :EOF
:subs
SET "line=%line:day=night%"
SET "line=%line:winter=summer%"
ECHO(%line%
GOTO :eof
This should work for you. I used a file named q27459813.txt containing your data for my testing.
Produces newfile.txt
Will not work correctly if the datafile lines start with ].
Revised to allow leading ]
@ECHO Off
SETLOCAL
(
 FOR /f "delims=" %%a IN ('type q27459813.txt^|find /n /v "" ') DO (
  SET "line=%%a"
  CALL :subs
 )
)>newfile.txt
GOTO :EOF
:subs
SET "line=%line:*]=%"
IF NOT DEFINED line ECHO(&GOTO :EOF
SET "line=%line:day=night%"
SET "line=%line:winter=summer%"
ECHO(%line%
GOTO :eof
I am wondering if there is an easy way to check for files in a directory that contain a line that exceeds a certain number of characters. For example, I have a directory with 10000 files and I would like to see which files have at least one line that has over 1000 characters. Is it possible to check this via a batch script? Thank you.
This is for Windows 7 Enterprise, 64-bit, Service Pack 1
Easiest and fastest way would be to use the grep binary from GnuWin32. I believe this syntax would work:
grep -Pl ".{1000}" *
Which will perform a perl-syntax regular expression search in * for any line containing 1000 characters, and output the filename if a match is found.
It would definitely be possible to accomplish what you are asking with a pure batch script, but a for loop looping through 10,000 files with who-knows-how-many lines each, would take forever and a day.
OK Prof. Pickle, here's your batch file. I went with using variable substring extraction for speed. Also, if a line with 1000 characters is encountered, immediately move to the next file. I still reckon grep will be faster and simpler. o°/
@echo off
setlocal enabledelayedexpansion
for %%a in (*) do (
    call :look "%%a"
)
goto :EOF
:look
for /f "usebackq delims=" %%I in ("%~1") do (
    set "line=%%I"
    if "!line:~999,1!" neq "" echo %~1 && exit /b
)
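For anyone who can use something other than batch or grep, the same "stop at the first long line and move on to the next file" idea can be sketched in Python. This is a rough sketch, not a tested replacement for the scripts above: the function names and the UTF-8 assumption are mine, and "long" here means strictly more than limit characters.

```python
import os


def has_long_line(path, limit=1000):
    """Return True as soon as one line in the file is longer than limit characters."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if len(line.rstrip("\r\n")) > limit:
                return True
    return False


def files_with_long_lines(directory, limit=1000):
    """Return the sorted names of regular files in directory that contain a long line."""
    names = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and has_long_line(path, limit):
            names.append(name)
    return names
```

Calling files_with_long_lines(".") would list the matching files in the current directory.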
Pure batch:
@echo off&setlocal enabledelayedexpansion
for %%a in (*.txt) do (
for /f "tokens=1-2delims=:" %%i in ('"cmd /c type "%%~a" ^&echo(|findstr /no ^^"') do (
set "pos1=!pos0!"&set "line1=!line0!"
set "pos0=%%j"&set "line0=%%i"
set /a length=!pos0!-!pos1!-2
if !length! gtr 1000 echo line: !line1! length: !length! in file: %%~a
))
Change *.txt to your desired search pattern.
Edit: minor improvement (^^).
I found a much faster solution with a temp file:
@echo off&setlocal enabledelayedexpansion
set "tempfile=%temp%\%random%"
for %%a in (*.txt) do (
<"%%~a">"%tempfile%" more
echo(>>"%tempfile%"
for /f "tokens=1-2delims=:" %%i in ('^"^< "%tempfile%" findstr /no "^" ^"') do (
set "pos1=!pos0!"&set "line1=!line0!"
set "pos0=%%j"&set "line0=%%i"
set /a length=!pos0!-!pos1!-2
if !length! gtr 1000 echo line: !line1! length: !length! in file: %%~a
))
del "%tempfile%" >nul 2>&1
Edit: improved escaping for XP.
I'd like to print each line of 2 separate txt files alternately using a for loop in a batch file, I tried using an AND but was given: "AND was unexpected at this time" in cmd.exe when I ran my batch. Any ideas?
FOR /F "tokens=*" %%F in (!logPath!) AND for /f "tokens=*" %%H in (%%refLogPath) DO (
REM print each line of log file and refLog file sequentially
echo %%F
echo %%H
REM set logLine=%%F
REM check 'each line' of log file against ENG-REF.log
)
There isn't a keyword like AND, and normally you can't solve this with two FOR loops.
But there is an alternative way to read a file with set /p.
setlocal EnableDelayedExpansion
<file2.txt (
  FOR /F "delims=" %%A in (file1.txt) DO (
    set /p lineFromFile2=
    echo file1=%%A, file2=!lineFromFile2!
  )
)
I believe this is as robust as a batch solution can get.
It handles blank lines in both files
It can read up to approximately 8k bytes on each line
The number of lines in the files does not have to match
A line can begin with any character (avoiding a FOR /F EOL issue)
A line can contain ! without getting corrupted (avoiding a problem of expanding a FOR variable while delayed expansion is enabled)
Lines can be either Unix or Windows style.
Control characters will not be stripped from end of line
But this solution will get progressively slower as it reads a large file because it must rescan the 2nd file from the beginning for every line.
@echo off
setlocal disableDelayedExpansion
set "file1=file1.txt"
set "file2=file2.txt"
for /f %%N in ('find /c /v "" ^<"%file2%"') do set file2Cnt=%%N
findstr /n "^" "%file1%" >"%file1%.tmp"
findstr /n "^" "%file2%" >"%file2%.tmp"
set "skip=0"
set "skipStr="
for /f "usebackq delims=" %%A in ("%file1%.tmp") do (
  set "ln1=%%A"
  call :readFile2
  set /a "skip+=1"
)
if %file2Cnt% gtr %skip% (
  for /f "usebackq skip=%skip% delims=" %%B in ("%file2%.tmp") do (
    set "ln2=%%B"
    setlocal enableDelayedExpansion
    set "ln2=!ln2:*:=!"
    (echo()
    (echo(!ln2!)
    rem Balance the SETLOCAL above so the 32 level nesting limit is never reached
    endlocal
  )
)
del "%file1%.tmp" 2>nul
del "%file2%.tmp" 2>nul
exit /b
:readFile2
if %skip% gtr 0 set "skipStr=skip=%skip% "
if %file2Cnt% gtr %skip% (
  for /f "usebackq %skipStr%delims=" %%B in ("%file2%.tmp") do (
    set "ln2=%%B"
    goto :break
  )
) else set "ln2="
:break
setlocal enableDelayedExpansion
set "ln1=!ln1:*:=!"
if defined ln2 set "ln2=!ln2:*:=!"
(echo(!ln1!)
(echo(!ln2!)
exit /b
Much better to use jeb's approach if that solution's limitations are not a concern with your files. It currently has the following limitations that could be removed with fairly minor modifications:
Files must have same number of lines
Files must not have blank lines
File1 must not contain ! character
No line in File1 can start with ;
In addition it has the following limitations when reading File2 that are inherent to the SET /P limitations
Lines must be Windows style, ending in carriageReturn lineFeed
Lines cannot exceed 1021 characters (bytes) excluding the line terminators
Control characters will be stripped off the end of each line
An even better solution would be to use something other than batch. There are many possibilities: VBS, JScript, PowerShell, perl ... the list goes on and on.
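As one example of the "something other than batch" route, here is a minimal Python sketch of the alternating read. It assumes the same padding behaviour as the robust batch script above, where an empty line stands in for the file that ran out first; the function name is hypothetical.

```python
from itertools import zip_longest


def interleave(lines1, lines2):
    """Yield lines alternately from both inputs, padding the shorter one with empty lines."""
    for first, second in zip_longest(lines1, lines2, fillvalue=""):
        yield first
        yield second


print(list(interleave(["a", "b"], ["1"])))  # ['a', '1', 'b', '']
```

Because both inputs are consumed in step, the second file is read only once, which avoids the rescanning cost mentioned above.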
There is a folder which contains some random files:
file1.txt
file2.exe
file3.cpp
file4.exe
How to SIMPLY display exe files connected with numbers like this:
1. file2.exe
2. file4.exe
Then I enter the number of the file I want to delete. Is it even possible to do this simply?
Shortest bulletproof solution I can come up with. As in Anders' answer, the DEL statement is disabled by the ECHO command. Remove the ECHO to make the menu functional.
@echo off
setlocal disableDelayedExpansion
for /f "delims==" %%A in ('set menu 2^>nul') do set "%%A="
for /f "tokens=1* delims=:" %%A in ('dir /b *.exe 2^>nul ^| findstr /n "^"') do (
  set menu%%A=%%B
  echo %%A. %%B
)
if not defined menu1 exit /b
set "delNum="
set /p "delNum=Delete which file (enter the number): "
setlocal enableDelayedExpansion
if defined menu!delNum! echo del "!menu%delNum%!"
The only thing I can think of that could go wrong is part of the menu could scroll off the screen if there are too many entries.
Additional messages can easily be incorporated, and an ELSE condition could be appended to the input validation to deal with invalid input.
A few subtle points of the code:
FINDSTR /N provides incrementing file number. Avoids need for delayed expansion or CALL within menu builder loop. Delayed expansion should not be enabled when expanding a FOR variable containing a file name because it will corrupt names containing !.
: is a safe FOR delimiter because a file name cannot contain :.
delNum is cleared prior to SET /P because SET /P will preserve existing value if <Enter> is pressed without entering anything.
Checking for the existence of the variable is the simplest way to validate the input. This is why it is critical that any existing MENU variables are undefined prior to building the menu.
Must use delayed expansion in IF DEFINED validation, otherwise space in input could crash the script (thanks Anders for pointing out the flaw in the original code)
DEL target must be quoted in case it contains spaces, even when delayed expansion is used.
Added test to make sure at least one menu entry exists before continuing. There may not be any .exe files left to delete.
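The validation idea in the points above (number the entries, store them under their number, and accept the input only if an entry with exactly that number exists) can be sketched in Python like this. It is only an illustration of the lookup logic with hypothetical function names; it does not list a real directory or delete anything.

```python
def build_menu(names):
    """Map 1-based menu numbers (as strings) to the .exe names, in the order given."""
    exe_names = [n for n in names if n.lower().endswith(".exe")]
    return {str(number): name for number, name in enumerate(exe_names, start=1)}


def pick(menu, answer):
    """Return the chosen file name, or None when the answer is not a menu number."""
    return menu.get(answer)


menu = build_menu(["file1.txt", "file2.exe", "file3.cpp", "file4.exe"])
for number, name in menu.items():
    print(f"{number}. {name}")
print(pick(menu, "2"))  # file4.exe
print(pick(menu, ""))   # None
```

Looking the raw input up as a key means empty input, text, or out-of-range numbers are all rejected the same way, without any numeric parsing.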
@echo off
setlocal EnableDelayedExpansion
set i=0
for %%f in (*.exe) do (
   set /A i+=1
   set file[!i!]=%%f
   echo !i!. %%f
)
set i=0
set /P i=File to delete:
rem Quote the target so names containing spaces are deleted correctly
del "!file[%i%]!"
Not exactly pretty but it gets the job done
@echo off
setlocal ENABLEEXTENSIONS DISABLEDELAYEDEXPANSION
goto main
:addit
set /A end=end + 1
set %end%=%~1
echo %end%. %~1
goto :EOF
:main
set end=0
for %%A in ("*.exe") do (
    call :addit "%%~A"
)
if "%end%"=="0" goto :EOF
echo.&set idx=
set /P idx=Delete (1...%end%)
if not "%idx%"=="" if %idx% GEQ 1 if %idx% LEQ %end% (
    for /F "tokens=1,* delims==" %%A in ('set %idx% 2^>nul') do (
        if "%idx%"=="%%~A" (
            echo.Deleting %%~B...
            rem del "%%~B"
        )
    )
)