I need code for a batch file that renames bank files created by SAP (I am an SAP man) and stored in a location on the server.
Problem: All bank files get a name from a sequence table in SAP (date + number). Before I send them to the bank, they must follow a certain name structure.
I have code that has worked fine up to now. The problem now is that I send a "batch" (several) of files from SAP and they are named randomly.
The first line of each file contains a unique batch ID, which is a bank sequence number, and the files have to be named in this order.
I have done a lot of VBA programming, but I am not too strong in this subject.
Needed solution: For each file, the code should read the first line, fetch 4 characters starting at position 71, save them in a variable and add them to the end of the file name.
Example: If the original file name from SAP is 20160301-001.txt, I want to rename it to "P.00934681758.005.TBII.00000xxxx.txt" (where "xxxx" is positions 71 to 74 of the first line of the file).
The batch file scans a lot of bank directories today. Below you can find today's code, which works, except that it numbers the files with the variable "a", starting at 1 (never more than 9 files), and I want to replace that number with these 4 digits from the file:
Today's name: P.00934681758.005.TBII.00000!a!.dat ("a" being a variable)
New name: P.00934681758.005.TBII.00xxxx.dat ("xxxx" the digits from the file)
Today's code (part of it, showing 2 "scanned" directories):
@echo off & setlocal EnableDelayedExpansion
REM Start by renaming all files in each folder to the namerule of the bank
cd PL270\UT01
set a=1
for /f "delims=" %%i in ('dir /b *') do (
if not "%%~nxi"=="%~nx0" (
ren "%%i" "P.00934681758.002.TBII.00000!a!".dat
set /a a+=1
)
)
cd..
cd..\PL570\UT01
set a=1
for /f "delims=" %%i in ('dir /b *') do (
if not "%%~nxi"=="%~nx0" (
ren "%%i" "P.00934681758.005.TBII.00000!a!".dat
set /a a+=1
)
)
The code runs in a *.bat file today, scheduled every 5 minutes on the server.
All help will be VERY much appreciated :-)
B.r. Solve
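As I understand the question, you need to replace the hardcoded a with a code read from the file you are renaming:
for /f "delims=" %%i in ('dir /b *') do (
if not "%%~nxi"=="%~nx0" (
rem Clear the variable first, because set /p keeps the old value if the file is empty
set "fline="
rem Read the first line of the file
set /p fline=<"%%i"
rem Take characters 71 to 74, that is zero-based offset 70 and length 4
set "code=!fline:~70,4!"
echo !code!
ren "%%i" "P.00934681758.002.TBII.00000!code!.dat"
)
)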
(I once worked for SAP; it was the most boring job I ever had...)
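To double-check the offsets outside of cmd, here is a minimal Python sketch of the same extraction. It is only an illustration, not a replacement for the batch file: the function name `bank_file_name` and the digit check are my own assumptions, and the prefix is the one from the PL270 example above.

```python
def bank_file_name(first_line: str,
                   prefix: str = "P.00934681758.002.TBII.00000",
                   ext: str = ".dat") -> str:
    """Build the bank file name from the first line of a SAP file."""
    # Positions 71-74 (1-based) are indices 70..73 (0-based),
    # the same slice as !fline:~70,4! in the batch code.
    code = first_line[70:74]
    # Assumption: the sequence number is always four digits.
    if len(code) != 4 or not code.isdigit():
        raise ValueError("no 4-digit sequence number at positions 71-74")
    return f"{prefix}{code}{ext}"
```

For a first line with 70 filler characters followed by 1234, this yields P.00934681758.002.TBII.000001234.dat.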
The following script walks through a given directory tree and searches for matching files. A file is considered matching if its name begins with today's date in the format YYYYMMDD, followed by a -, followed by a number, followed by the extension .txt. For each file found, the requested characters are extracted from the first line, then they are appended to the prefix P.00934681758.005.TBII.00, and the extension .dat is appended.
The behaviour of the batch file can be controlled by the constants defined at the beginning. To learn how it works and what happens, consult the rem comments throughout the whole file.
Here is the code:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem Definition of constants:
set "LOCATION=%~dp0" & rem Working directory; `%~dp0` is container, `.` is current;
set "RECURSIVE=#" & rem Flag to walk through working directory recursively, if defined;
set "SEPARATOR=-" & rem Separator character of original file name (one character!);
set "PATTERN=????????%SEPARATOR%*.txt" & rem Search pattern for `dir`;
rem Filter for `findstr /R` to match only correct file names (`.*` deactivates it):
set "REGEX=[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%SEPARATOR%[0-9][0-9]*\.txt";
rem Filter for date prefix in file names (fixed date like `YYYYMMDD`, all files if empty):
set "TODAY=%DATE:~,4%%DATE:~5,2%%DATE:~8,2%" & rem Expected `%DATE%` format: `YYYY/MM/DD`;
set "NEWNAME=P.00934681758.005.TBII.00" & rem Beginning of new file name;
set "NEWEXT=.dat" & rem New file extension;
set "NUMLINE=0" & rem Zero-based line number where to extract characters from;
set "POSITION=70,4" & rem Zero-based start position, number of characters (length);
set "FORCE=#" & rem Flag to force extracted characters not to be empty, if defined;
rem Change to working directory:
pushd "%LOCATION%" || (
>&2 echo Cannot find "%LOCATION%". & exit /B 1
)
rem Initialise some variables:
set /A NUMLINE+=0
if %NUMLINE% LEQ 0 (set "NUMSKIP=") else (set "NUMSKIP=skip^=%NUMLINE%^ ")
if defined RECURSIVE (set "RECURSIVE=/S ")
rem Walk through directory (tree) and search for matching file names (sorted by name):
for /F "eol=| delims=" %%F in ('
dir /B %RECURSIVE%/A:-D /O:-N-D "%PATTERN%" ^| ^
findstr /R /C:"^%REGEX%$" /C:"\\%REGEX%$"
') do (
rem Split file name by given separator:
for /F "eol=| tokens=1,* delims=%SEPARATOR%" %%I in ("%%~nxF") do (
set "PREFIX=%%I"
set "SUFFIX=%%J"
setlocal EnableDelayedExpansion
rem Check whether first part of file name matches today's date:
if "%TODAY%"=="" (set "TODAY=!PREFIX!")
if "!PREFIX!"=="!TODAY!" (
set "OLDNAME=!PREFIX!%SEPARATOR%!SUFFIX!"
rem Extract characters from predefined position from file (sub-routine):
call :EXTRACT PORTION "!OLDNAME!"
if defined FORCE (
rem Check extracted character string if empty optionally:
if not defined PORTION (
>&2 echo Nothing at position ^(%POSITION%^). & popd & exit /B 1
)
)
rem Build new file name using extracted character string:
set "BUILTNAME=%NEWNAME%!PORTION!%NEWEXT%"
rem Check whether a file with the new name already exists:
if not exist "!BUILTNAME!" (
rem Actually rename file here (as soon as `ECHO` is removed!):
ECHO ren "!OLDNAME!" "!BUILTNAME!"
) else (
>&2 echo "!BUILTNAME!" already exists. & popd & exit /B 1
)
) else (
endlocal
rem First part of file name does not match today's date, so leave loop:
goto :CONTINUE
)
endlocal
)
)
:CONTINUE
popd
endlocal
exit /B
:EXTRACT PORTION "FileSpec"
rem Sub-routine to extract characters from a file at a given position;
set "PART="
setlocal DisableDelayedExpansion
rem Read specified line from given file:
for /F usebackq^ %NUMSKIP%delims^=^ eol^= %%L in ("%~2") do (
set "LINE=%%L"
if defined LINE (
setlocal EnableDelayedExpansion
rem Extract specified sub-string of read line by position and length:
set "PART=!LINE:~%POSITION%!"
if defined PART (
for /F delims^=^ eol^= %%S in ("!PART!") do (
endlocal
set "PART=%%S"
)
) else (
endlocal
)
)
rem Do not read any more lines from the file, leave loop:
goto :QUIT
)
:QUIT
rem Return extracted character string:
endlocal & set "%~1=%PART%"
exit /B
The script does no renaming as long as you do not remove the upper-case ECHO in front of the ren command.
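The file name filter used by the script (eight digits, the separator, a number, the extension .txt, plus the comparison with today's date) can be cross-checked outside of cmd. This is a minimal Python sketch under the assumption that the separator is - and the date format is YYYYMMDD; the names `NAME_RE` and `matches_today` are hypothetical and do not appear in the batch file.

```python
import re
from datetime import date

# Eight digits, a hyphen, at least one digit, then ".txt" (case-sensitive,
# like the findstr /R filter in the script, which is used without /I).
NAME_RE = re.compile(r"([0-9]{8})-[0-9]+\.txt")

def matches_today(file_name: str, today: date) -> bool:
    """Return True if file_name has the expected shape and today's date prefix."""
    match = NAME_RE.fullmatch(file_name)
    return match is not None and match.group(1) == today.strftime("%Y%m%d")
```

For example, `matches_today("20160301-001.txt", date(2016, 3, 1))` is true, while a file with yesterday's date or a different extension is rejected.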
As seen in the title, I am wondering if I can name a .txt file based on a check of the other files located in the same folder...
Before even creating my new file, I tried this, but it doesn't work: I get the error "The system cannot find the file specified." and I can't use Traces!i!.txt as a variable even if I define it later:
@Echo Off
SETLOCAL EnableDelayedExpansion
Set dir=C:\...\myFolder\
Set /A i=1
for /f %%a in ('dir %dir%Traces*.txt /b') do (
Set file=%%a
Set /A result=!i! - !file:~6,-4!
If "!result:~0,1!"=="-" ( Set /A i=!file:~6,-4! + 1 )
If "!result:~0,1!"=="0" ( Set /A i=!i! + 1 )
)
Echo Traces!i!.txt
ENDLOCAL
pause
Also, is there a way to get the final chosen name as a variable?
I want the files in my folder to look like:
list1.txt
list2.txt
list3.txt
The next time I create a new file, its name is supposed to be list4.txt, so I need a program that actually checks for other files, as I said before.
@echo off
setlocal
set "dir=C:\...\myFolder\"
set "i=0"
:get_filename
Set /a "i+=1"
if exist "%dir%Traces%i%.txt" goto :get_filename
set "filename=Traces%i%.txt"
echo "%dir%%filename%"
endlocal
pause
You can simply use goto with a label to loop until a free indexed file name is found. As soon as an indexed file name does not exist, filename is set to that name, which is then available for creation.
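For comparison, the same first-free-index search as a small Python sketch. To keep it testable it takes a set of existing names instead of touching the disk; the function name `next_free_name` and its parameters are my own, hypothetical choices.

```python
def next_free_name(existing: set, prefix: str = "Traces", ext: str = ".txt") -> str:
    """Return the first PREFIX<n>EXT, counting from 1, that is not in `existing`."""
    i = 1
    # Same idea as the goto loop: keep counting while the name is taken.
    while f"{prefix}{i}{ext}" in existing:
        i += 1
    return f"{prefix}{i}{ext}"
```

Note that, like the goto loop, this returns the first gap in the numbering, not the highest number plus one.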
If the number suffixes are always consecutive, you could sort the file names so that the highest number can be retrieved. Since batch scripting is only capable of purely alphabetic sorting, you need to pad the number suffixes with zeros on the left to get a fixed width; alphabetic sorting then yields the same order as alphanumeric sorting.
Here is an example of what I mean:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_TARGET=D:\path\to\target\dir" & rem // (path to the target directory)
set "_PREFIX=Traces" & rem // (constant file name prefix)
set "_EXT=.txt" & rem // (file name extension with `.`)
set "_TEMPF=%TEMP%\%~n0_%RANDOM%.tmp" & rem // (path to a temporary file)
rem // Change into target directory:
pushd "%_TARGET%" && (
rem // Write to temporary file:
> "%_TEMPF%" (
rem // Loop through all matching file names:
for /F "delims= eol=|" %%F in ('dir /B /A:-D "%_PREFIX%*%_EXT%"') do (
rem // Store current base file name:
set "NAME=%%~nF"
rem // Toggle delayed expansion to avoid loss of `!`:
setlocal EnableDelayedExpansion
rem // Remove prefix from base file name, pad number with 10 zeros:
set "NUM=!NAME:*%_PREFIX%=!" & set "PAD=0000000000!NUM!"
rem // Return 10-digit number plus `|` plus original number (suffix):
echo(!PAD:~-10!^|!NUM!
endlocal
)
)
rem // Restore previous working directory:
popd
)
rem // Reset variable that will receive the resulting highest number (suffix):
set "ITEM="
for /F "tokens=2 delims=| eol=|" %%E in ('sort "%_TEMPF%"') do (
set "ITEM=%%E"
)
rem // Clean up temporary file:
del "%_TEMPF%"
rem // Check whether there were matching files:
if defined ITEM (
rem // Increment found highest number:
set /A "ITEM+=1"
rem // Return next free file path and name:
setlocal EnableDelayedExpansion
echo(!_TARGET!\%_PREFIX%!ITEM!%_EXT%
endlocal
)
endlocal
exit /B
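Why the padding matters is easy to see in a small Python sketch of the same idea: collect the number suffixes, pad them to a fixed width, sort them as plain strings and take the last one. The function name `next_name_by_highest` is hypothetical, and skipping names whose suffix is not purely numeric is my own assumption, which the batch script does not make.

```python
def next_name_by_highest(names, prefix="Traces", ext=".txt", width=10):
    """Return PREFIX<highest+1>EXT, or None if no matching name exists."""
    padded = []
    for name in names:
        if name.startswith(prefix) and name.endswith(ext):
            suffix = name[len(prefix):len(name) - len(ext)]
            if suffix.isdigit():  # assumption: ignore non-numeric suffixes
                # Left-pad with zeros so that string order equals numeric order.
                padded.append(suffix.zfill(width))
    if not padded:
        return None
    highest = int(sorted(padded)[-1])
    return f"{prefix}{highest + 1}{ext}"
```

Without the padding, a plain string sort of 1, 10 and 9 puts 9 last, so the next name would wrongly be derived from 9 instead of 10.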
I have a Batch script :
@echo off
setlocal enableDelayedExpansion
SET /P UserInput=Please Enter a Number:
SET /A number=UserInput
ECHO number=%number%
for %%i in (*.jpeg) do call :JPG %%~ni %%i
goto :end
:JPG
set str=%1
set /a str2=%str:_color=%
set /a newnamej=%str2%+%number%
echo %1 ==> I can see the problem with it
set lastnamej=%newnamej%_color.jpeg
ren %2 %lastnamej%
goto :eof
:end
The goal of this script is to take all files in a folder. They are all named after a number (1_color.jpeg, 2_color.jpeg, 3_color.jpeg, ...) and I want to rename them by adding a number (if the user input is 5, 1_color.jpeg will become 6_color.jpeg, and so on).
I have a problem with this script.
If I use a number such as 555, the first file will pass through the for loop 2 times.
(Little example: with 1_color.jpeg and 2_color.jpeg,
I run my script with 5, so 1_color.jpeg => 6_color.jpeg and 2_color.jpeg => 7_color.jpeg, but then 6_color.jpeg is read once more and becomes 11_color.jpeg, so my result is 11_color.jpeg and 7_color.jpeg.)
Does anyone know how to fix this issue?
Thanks for all!
The problem has two parts: the for %%i in (*.jpeg) ... command may be dynamically affected by the position that a renamed file occupies in the whole file list, so some files may be renamed twice and, in certain particular cases with many files, up to three times.
The solution is to use a for /F %%i in ('dir /B *.jpeg') ... command instead, which first gets the list of all files and only then starts the renaming process.
Also, the renaming must be done from the last file to the first one, to avoid duplicate numbers.
However, in this case, using for /F with the "tokens=1* delims=_" option also allows processing the first underscore-separated number in the file names in a simpler way:
@echo off
setlocal EnableDelayedExpansion
SET /P number=Please Enter a Number:
ECHO number=%number%
for /F "tokens=1* delims=_" %%a in ('dir /O:-N /B *.jpeg') do (
set /A newNum=%%a+number
ren "%%a_%%b" "!newNum!_%%b"
)
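The effect of the processing order can be simulated without touching any files. The following Python sketch is only an illustration (the function `simulate_renames` is hypothetical): it applies the renames to an in-memory set of names in a given order and raises as soon as a target name is still occupied.

```python
def simulate_renames(names, offset, order):
    """Apply the renames in `order`; return the final name set or raise on a collision."""
    present = set(names)
    for old in order:
        number, rest = old.split("_", 1)
        new = f"{int(number) + offset}_{rest}"
        if new in present:
            # This is where ren would fail with a duplicate file name.
            raise FileExistsError(f"{old} -> {new}")
        present.remove(old)
        present.add(new)
    return present
```

With 1_color.jpeg, 2_color.jpeg and 3_color.jpeg and an offset of 1, the ascending order collides on the very first rename, while the descending order goes through.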
User Aacini provided a nice solution in his answer, pointing out both issues at hand, namely the fact that for does not fully enumerate the directory in advance (see this thread: At which point does for or for /R enumerate the directory (tree)?) and the flaw in the logic concerning the sort order of the processed files.
However, there is still a problem derived from the purely (reverse-)alphabetic sort order of dir /B /O:-N *.jpeg, which can still cause collisions, as the following example illustrates:
9_color.jpeg
8_color.jpeg
7_color.jpeg
6_color.jpeg
5_color.jpeg
4_color.jpeg
3_color.jpeg
2_color.jpeg
10_color.jpeg
1_color.jpeg
So if the entered number is 1, the script tries to rename 9_color.jpeg to 10_color.jpeg, which fails because that file still exists: it has not yet been processed (that is, renamed to 11_color.jpeg).
To overcome this problem, you need to correctly sort the items in reverse alpha-numeric order. This can be achieved by left-zero-padding the numbers before sorting them, because then, alphabetic and alpha-numeric sort orders match. Here is a possible implementation:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_LOCATION=." & rem // (directory containing the files to rename)
set "_PATTERN=*_*.jpeg" & rem // (search pattern for the files to rename)
set "_REGEX1=^[0-9][0-9]*_[^_].*\.jpeg$" & rem // (`findstr` filter expression)
set "_TEMPFILE=%TEMP%\%~n0_%RANDOM%.tmp" & rem // (path to temporary file)
rem // Retrieve numeric user input:
set "NUMBER="
set /P NUMBER="Please Enter a number: "
set /A "NUMBER+=0"
if %NUMBER% GTR 0 (set "ORDER=/R") else if %NUMBER% LSS 0 (set "ORDER=") else exit /B
rem /* Write `|`-separated list of left-zero-padded file prefixes, original and new
rem file names into temporary file: */
> "%_TEMPFILE%" (
for /F "tokens=1* delims=_" %%E in ('
dir /B "%_LOCATION%\%_PATTERN%" ^| findstr /I /R /C:"%_REGEX1%"
') do (
set "NAME=%%F"
setlocal EnableDelayedExpansion
set "PADDED=0000000000%%E"
set /A "NUMBER+=%%E"
echo !PADDED:~-10!^|%%E_!NAME!^|!NUMBER!_!NAME!
endlocal
)
)
rem /* Read `|`-separated list from temporary file, sort it by the left-zero-padded
rem prefixes, extract original and new file names and perform actual renaming: */
< "%_TEMPFILE%" (
for /F "tokens=2* delims=|" %%K in ('sort %ORDER%') do (
ECHO ren "%%K" "%%L"
)
)
rem // Clean up temporary file:
del "%_TEMPFILE%"
endlocal
exit /B
After having successfully verified the correct output of the script, do not forget to remove the upper-case ECHO command in front of the ren command line.
The script uses a temporary file that receives a |-separated table with the padded numeric prefix in the first, the original file name in the second and the new file name in the third column, like this:
0000000010|10_color.jpeg|11_color.jpeg
0000000001|1_color.jpeg|2_color.jpeg
0000000002|2_color.jpeg|3_color.jpeg
0000000003|3_color.jpeg|4_color.jpeg
0000000004|4_color.jpeg|5_color.jpeg
0000000005|5_color.jpeg|6_color.jpeg
0000000006|6_color.jpeg|7_color.jpeg
0000000007|7_color.jpeg|8_color.jpeg
0000000008|8_color.jpeg|9_color.jpeg
0000000009|9_color.jpeg|10_color.jpeg
The temporary file is read and sorted by the sort command. The strings from the second and third columns are extracted and passed over to the ren command.
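The table-and-sort step can be sketched in Python to check the resulting order. This is a rough equivalent under my own naming (`rename_plan` is hypothetical): it builds the three columns, sorts by the padded prefix, descending for a positive offset and ascending for a negative one, and returns the old/new pairs in the order in which they would be passed to ren.

```python
def rename_plan(names, offset, width=10):
    """Return (old, new) pairs ordered so that no rename hits an unprocessed file."""
    rows = []
    for name in names:
        number, rest = name.split("_", 1)
        # Columns: zero-padded prefix, original name, new name.
        rows.append((number.zfill(width), name, f"{int(number) + offset}_{rest}"))
    # Positive offset: highest number first; negative offset: lowest first.
    rows.sort(reverse=offset > 0)
    return [(old, new) for _, old, new in rows]
```

For the ten files above and an offset of 1, the plan starts with 10_color.jpeg and only then handles 9_color.jpeg, so the collision from the purely alphabetic order no longer occurs.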
I have never done a batch file before. I have a few dozen .txt files sitting in a folder (ex. C:\files).
The files all end with 6 rows of text that need to be deleted. A sample would be (note spaces in first line):
var...
'ascending';...
'LIT-xxx,LIT-xxx...
setfunction...
0.33...
getdate...
Additionally, I would like the "new" files to overwrite the current files so that the file names and directory do not change.
abs 10.txt
him 4.txt
lab 18.txt
The following code snippet does exactly what you want, deleting the last six lines from text files:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "FILES=C:\files\*.txt" & rem // (specify file location and pattern)
set /A "LINES=-6" & rem /* (specify number of lines to delete;
rem positive number: delete from begin,
rem negative number: delete from end) */
rem // Standard `for` loop to resolve file pattern:
for %%F in ("%FILES%") do (
rem // Get the count of lines for the current file:
for /F %%N in ('^< "%%~F" find /C /V ""') do set "COUNT=%%N"
rem // Initialise a line index:
set /A "INDEX=-LINES"
rem /* Enumerate all lines of the current file, preserving empty ones
rem by preceding each with a line number, so no line appears empty
rem to the `for /F` loop; the line number is split off later on;
rem in addition, the current file is emptied after being read: */
for /F "delims=" %%L in ('
findstr /N /R "^" "%%~F" ^& ^> "%%~F" break
') do (
rem // Increment index, get text of currently iterated line:
set /A "INDEX+=1" & set "LINE=%%L"
rem // Toggle delayed expansion to preserve exclamation marks:
setlocal EnableDelayedExpansion
rem // Check index value and write to current file conditionally:
if !INDEX! GTR 0 if !INDEX! LEQ !COUNT! (
rem // Split off line number from line text:
>> "%%~F" echo(!LINE:*:=!
)
endlocal
)
)
endlocal
exit /B
This approach does not use temporary files in order to avoid any name conflicts. However, due to the fact that there are multiple file write operations for every single file, the overall performance is a bit worse than when writing all data to a temporary file at once and moving it back onto the original file.
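For comparison, the temporary-file variant mentioned above could look roughly like this in Python. It is a sketch, not a drop-in replacement: the function names are hypothetical, the sign convention of `count` mirrors the LINES constant of the batch script, and the platform's default text encoding is assumed.

```python
import os
import tempfile

def trim_lines(lines, count):
    """count > 0 drops lines from the beginning, count < 0 from the end."""
    if count >= 0:
        return lines[count:]
    return lines[:count]

def trim_file(path, count):
    """Rewrite `path` without the given lines, going through a temporary file."""
    # newline="" keeps the original line endings untouched.
    with open(path, "r", newline="") as src:
        kept = trim_lines(src.readlines(), count)
    # Create the temporary file next to the original so the final move stays on one volume.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w", newline="") as dst:
        dst.writelines(kept)
    os.replace(tmp, path)  # move the result back onto the original file
```

Calling `trim_file(path, -6)` for every .txt file in the folder would remove the last six lines of each while keeping names and location unchanged.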
Back up your original files to a different backup folder, then run this script:
@echo off
setlocal enabledelayedexpansion
pushd "%temp%\Test"
for %%G in ("*.txt") do (set "break="
(for /f "usebackq delims=|" %%H in ("%%~G") do (
if not defined break (
echo %%H | findstr /r /b /c:"[ ]*var.*" >nul && set break=TRUE || echo %%H )
)) >> "%%~nG_mod.txt"
del "%%~G" & ren "%%~nG_mod.txt" "%%~nxG" )
popd
exit /b
It assumes that:
your 6 rows of text always start with a row of the form [any number of spaces]var[any text], as posted in the question, and only one such row is present in any file;
the other 5 bottom rows don't need to match in every file;
you save the files to filter in %temp%\Test, and there are no other unrelated files in that directory.
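The cut-at-marker logic itself is simple; here is a minimal Python sketch of it, assuming the same marker as the findstr expression (optional leading spaces followed by var). The names `MARKER` and `cut_at_marker` are hypothetical.

```python
import re

# Optional leading spaces, then "var": the first of the six trailing rows.
MARKER = re.compile(r" *var")

def cut_at_marker(lines):
    """Keep all lines before the first one that starts with the marker."""
    kept = []
    for line in lines:
        if MARKER.match(line):
            break  # everything from here on is dropped
        kept.append(line)
    return kept
```

If no line matches the marker, the content is returned unchanged.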
How can I remove duplicate entries from a text file using a batch script? I want to remove the duplicates based on the part before the "=" sign; "%%" exists in every single text file. The text file looks like this:
%%B05AIPS_CDDOWNLOAD_IBDE_UNC=\\%%B05AIPS_UPLOAD_NODE.\F$\DATA\IPSL\CDFILES\B05_NAG\CD\INCOMING
%%B05AIPS_CDDOWNLOAD_FTS_UNC=\\%%B05AIPS_UPLOAD_NODE.\B05_NAG\FTS\To_Clearpath\%%DATE_CCYYMMDD.
%%B05AIPS_CDDOWNLOAD_FTS_UNC=%%B05AIPS_CDDOWNLOAD_FTS_UNC.
I have 30-plus different text files which contain entries of the above kind, and I want to remove the duplicate lines, keeping the first occurrence. Remember that a duplicate should be identified by the part before the "=" sign only, and the entire line has to be removed. Each of the text files contains the "%%" sign. Please guide me if there is a way to do this through a batch script or VBScript. Thanks
Here is a simple batch-file solution; let us call the script rem-dups.bat. Supposing your input file is test.txt and your output file is result.txt, you need to provide these files as command line arguments, so call it by: rem-dups.bat "test.txt" "result.txt". Here is the script:
@echo off
setlocal EnableExtensions EnableDelayedExpansion
set "INFILE=%~1"
set "OUTFILE=%~2"
if not defined INFILE exit /B 1
if not defined OUTFILE set "OUTFILE=con"
for /F "usebackq tokens=1,* delims==" %%K in ("%INFILE%") do (
set "LEFT=%%K"
set "RIGHT=%%L"
set "LEFT=!LEFT:*%%%%=__!"
rem Remove `if` query to keep last occurrence:
if not defined !LEFT! set "!LEFT!=!RIGHT!"
)
> "%OUTFILE%" (
for /F "delims=" %%F in ('set __') do (
set "LINE=%%F"
echo(!LINE:*__=%%%%!
)
)
endlocal
exit /B
The script is based on the fact that duplicate environment variables, that is, variables with identical names, cannot occur.
This code only works if the following conditions are fulfilled:
the file content is treated in a case-insensitive manner;
the order of lines in the output file does not matter;
the partial strings before the first = sign start with %% and contain at least one more character other than %;
the partial strings before the first = contain only characters which may occur within environment variable names, besides the leading %%;
the partial strings after the first = must not be empty;
the partial strings after the first = must not start with = on their own;
no exclamation marks ! are allowed within the file, because they may get lost or lead to other unexpected results;
Here is an alternative method using a temporary file:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "INFILE=%~1"
set "OUTFILE=%~2"
if not defined INFILE exit /B 1
if not defined OUTFILE set "OUTFILE=con"
set "TEMPFILE=%TEMP%\%~n0_%RANDOM%.tmp"
> "%TEMPFILE%" break
> "%OUTFILE%" (
for /F usebackq^ delims^=^ eol^= %%L in ("%INFILE%") do (
for /F tokens^=1^,*^ delims^=^=^ eol^= %%E in ("%%L") do (
> nul 2>&1 findstr /I /X /L /C:"%%E" "%TEMPFILE%" || (
echo(%%L
>> "%TEMPFILE%" echo(%%E
)
)
)
)
> nul 2>&1 del "%TEMPFILE%"
endlocal
exit /B
Every unique (non-empty) token to the left of the first = sign is written to a temporary file, which is searched after each line has been read from the input file. If the token is already present in the temporary file, the line is skipped; if not, it is written to the output file.
The file content is treated in a case-insensitive manner, unless you remove the /I switch from the findstr command.
Update: Improved Scripts
Here are two improved scripts that no special character can break. They do not use temporary files. Both scripts remove lines with duplicate keywords (that is, the partial string before the first = sign).
This script keeps the first line when duplicate keywords are encountered:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "INFILE=%~1"
set "OUTFILE=%~2"
if not defined INFILE exit /B 1
if not defined OUTFILE exit /B 1
> "%OUTFILE%" break
for /F usebackq^ delims^=^ eol^= %%L in ("%INFILE%") do (
for /F tokens^=1^ delims^=^=^ eol^= %%E in ("%%L") do (
set "LINE=%%L"
set "KEY=%%E"
setlocal EnableDelayedExpansion
if not "!LINE:~,1!"=="=" (
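rem NOTE: the search string in the next line is meant to be a single TAB character, which may show up as a SPACE here
set "KEY=!KEY: = !"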
set "KEY=!KEY:\=\\!" & set "KEY=!KEY:"=\"!"
more /T1 "%OUTFILE%" | > nul 2>&1 findstr /I /M /B /L /C:"!KEY!=" || (
>> "%OUTFILE%" echo(!LINE!
)
)
endlocal
)
)
endlocal
exit /B
This script keeps the last line when duplicate keywords are encountered:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "INFILE=%~1"
set "OUTFILE=%~2"
if not defined INFILE exit /B 1
if not defined OUTFILE exit /B 1
> "%OUTFILE%" (
for /F delims^=^ eol^= %%L in ('findstr /N /R "^" "%INFILE%"') do (
set "LINE=%%L"
for /F "delims=:" %%N in ("%%L") do set "LNUM=%%N"
setlocal EnableDelayedExpansion
set "LINE=!LINE:*:=!"
if defined LINE if not "!LINE:~,1!"=="=" (
for /F tokens^=1^ delims^=^=^ eol^= %%E in ("!LINE!") do (
setlocal DisableDelayedExpansion
set "KEY=%%E"
setlocal EnableDelayedExpansion
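rem NOTE: the search string in the next line is meant to be a single TAB character, which may show up as a SPACE here
set "KEY=!KEY: = !"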
set "KEY=!KEY:\=\\!" & set "KEY=!KEY:"=\"!"
more /T1 +!LNUM! "%INFILE%" | > nul 2>&1 findstr /I /M /B /L /C:"!KEY!=" || (
echo(!LINE!
)
endlocal
endlocal
)
)
endlocal
)
)
endlocal
exit /B
For both scripts, the following rules apply:
the order of lines with non-duplicate keywords is maintained;
empty lines are ignored and therefore removed;
empty keywords, meaning lines starting with =, are ignored and therefore removed;
non-empty lines that do not contain an = at all are treated as if they ended with an = for the check for duplicates, hence the entire line is used as the keyword;
for the check for duplicates, each TAB character is replaced by a single SPACE;
every line that is transferred to the returned file is copied from the original file without changes (hence the aforementioned attachment of = or replacement of TAB is not reflected there);
the check for duplicates is done in a case-insensitive manner, unless you remove the /I switch from the findstr command;
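To make these rules easier to check, here is a compact Python sketch that follows them as I read them; it is not a translation of the batch code. The names `normalise` and `remove_duplicates` are hypothetical, and lower-casing is only an approximation of what findstr /I does.

```python
def normalise(text):
    """TAB counts as SPACE and case is ignored when comparing keywords."""
    return text.replace("\t", " ").lower()

def remove_duplicates(lines, keep="first"):
    """Drop lines whose keyword (text before the first '=') occurs more than once."""
    result = []
    for index, line in enumerate(lines):
        # Empty lines and lines with an empty keyword are removed.
        if not line or line.startswith("="):
            continue
        probe = normalise(line.split("=", 1)[0]) + "="
        # keep="first": compare with the lines kept so far;
        # keep="last": compare with all later lines of the input.
        others = result if keep == "first" else lines[index + 1:]
        if not any(normalise(other).startswith(probe) for other in others):
            result.append(line)  # the line is copied unchanged
    return result
```

Like the batch scripts, this compares every line against many others, so it is quadratic in the number of lines, which is harmless for files of this size.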
Amendment: Processing Multiple Files
All of the above scripts are designed for processing a single file only. However, if you need to process multiple files, you could simply write a wrapper that contains a for loop enumerating all the input files and calls one of the scripts above (called rem-dups.bat) for every item -- like this:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem Define constants here:
set "INPATH=D:\Data\source" & rem (location of input files)
set "OUTPATH=D:\Data\target" & rem (location of output files)
set INFILES="source.txt" "test*.txt" & rem (one or more input files)
set "OUTSUFF=_no-dups" & rem (optional suffix for output file names)
set "SUBBAT=%~dp0rem-dups.bat"
pushd "%INPATH%" || exit /B 1
for %%I in (%INFILES%) do if exist "%%~fI" (
call "%SUBBAT%" "%%~fI" "%OUTPATH%\%%~nI%OUTSUFF%%%~xI"
)
popd
endlocal
exit /B
You must not specify the same locations for the input and output files. If you want to overwrite the original input files, you need to write the modified output files to another location first, then you can move them back to the source location -- supposing you have set OUTSUFF in the wrapper script to an empty string (set "OUTSUFF=" instead of set "OUTSUFF=_no-dups"). The command line to overwrite the original input files would be: move /Y "D:\Data\target\*.*" "D:\Data\source".
You could read the file into Excel without splitting it into multiple columns. Use Excel functionality to eliminate duplicates and save it back. You could do all this in VBScript.
Create an Excel Object
Loop
Load text file
Remove duplicates
Save text file
Until there are no more files
Dispose of the Excel Object
Code for the individual pieces should be easily available on the web. Do ask for any additional, specific, pointers you might need.
I have about 300 000 files in a directory. They are sequentially numbered - x000001, x000002, ..., x300000. But some of these files are missing and I need to write an output text file containing the missing file numbers. The following code does it only up to 10 000 files:
@echo off
setlocal enabledelayedexpansion
set "log=%cd%\logfile.txt"
for /f "delims=" %%a in ('dir /ad /b /s') do (
pushd "%%a"
for /L %%b in (10000,1,19999) do (
set a=%%b
set a=!a:~-4!
if not exist "*!a!.csv" >>"%log%" echo "%%a - *!a!.csv"
)
popd
)
How to extend it to 3 * 10^5 files?
Solution 1 - simple but slow
If all 300000 CSV files are in current directory on executing the batch file, this batch code would do the job.
@echo off
set "log=%cd%\logfile.txt"
del "%log%" 2>nul
for /L %%N in (1,1,9) do if not exist *00000%%N.csv echo %%N - *00000%%N.csv>>"%log%"
for /L %%N in (10,1,99) do if not exist *0000%%N.csv echo %%N - *0000%%N.csv>>"%log%"
for /L %%N in (100,1,999) do if not exist *000%%N.csv echo %%N - *000%%N.csv>>"%log%"
for /L %%N in (1000,1,9999) do if not exist *00%%N.csv echo %%N - *00%%N.csv>>"%log%"
for /L %%N in (10000,1,99999) do if not exist *0%%N.csv echo %%N - *0%%N.csv>>"%log%"
for /L %%N in (100000,1,300000) do if not exist *%%N.csv echo %%N - *%%N.csv>>"%log%"
set "log="
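The check performed by Solution 1 boils down to "for every expected number, test whether the file exists". As a cross-check of the expected names, here is a minimal Python sketch that works on a list of existing names; the function name `missing_files` is hypothetical, and the x prefix, six-digit width and .csv extension are taken from the question.

```python
def missing_files(existing, last=300000, prefix="x", ext=".csv", width=6):
    """Return the expected names from 1..last that are not in `existing`."""
    # File names are compared case-insensitively, as on Windows.
    present = {name.lower() for name in existing}
    expected = (f"{prefix}{n:0{width}d}{ext}" for n in range(1, last + 1))
    return [name for name in expected if name.lower() not in present]
```

Because the lookup goes through a set, this stays fast even for 300000 names.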
Solution 2 - faster but more difficult to understand
This second solution is definitely much faster than the one above, as it processes the list of file names in the current directory from the first file name to the last one.
If the last file is not x300000.csv, the batch code below writes one more line into the log file stating from which number up to the expected end number 300000 files are missing in the current directory.
@echo off
setlocal EnableExtensions EnableDelayedExpansion
rem Delete log file before running file check.
set "log=%cd%\logfile.txt"
del "%log%" 2>nul
rem Define initial value for the number in the file names.
set "Number=0"
rem Define the file extension of the files.
set "Ext=.csv"
rem Define beginning of first file name with number 1.
set "Name=x00000"
rem Define position of dot separating name from extension.
set "DotPos=7"
rem Process list of files matching the pattern of fixed length in current
rem directory sorted by file name line by line. Each file name is compared
rem case-sensitive with the expected file name according to current number.
rem A subroutine is called if current file name is not equal expected one.
for /F "delims=" %%F in ('dir /B /ON x??????%Ext% 2^>nul') do (
set /A Number+=1
if "!Name!!Number!%Ext%" NEQ "%%F" call :CheckDiff "%%F"
)
rem Has last file not expected number 300000, log the file numbers
rem of the files missing in current directory with a single line.
if "%Number%" NEQ "300000" (
set /A Number+=1
echo All files from number !Number! to 300000 are also missing.>>"%log%"
)
endlocal
rem Exit this batch file to jump to predefined label EOF (End Of File).
goto :EOF
rem This is a subroutine called from main loop whenever current file name
rem does not match with expected file name. There are two reasons possible
rem with file names being in expected format:
rem 1. One leading zero must be removed from variable "Name" as number
rem has increased to next higher power of 10, i.e. from 1-9 to 10,
rem from 10-99 to 100, etc.
rem 2. The next file name has really a different number as expected
rem which means there are one or even more files missing in list.
rem The first reason is checked by testing if the dot separating name
rem and extension is at correct position. One zero from end of string
rem of variable "Name" is removed if this is the case and then the
rem new expected file name is compared with the current file name.
rem Is the perhaps newly determined expected file name still not
rem equal the current file name, the expected file name is written
rem into the log file because this file is missing in list.
rem There can be even more files missing up to current file name. Therefore
rem the number is increased and entire subroutine is executed once more as
rem long as expected file name is not equal the current file name.
rem The subroutine is exited with goto :EOF if the expected file name
rem is equal the current file name resulting in continuing in main
rem loop above with checking next file name from directory listing.
:CheckDiff
set "Expected=%Name%%Number%%Ext%"
if "!Expected:~%DotPos%,1!" NEQ "." (
set "Name=%Name:~0,-1%"
set "Expected=!Name!%Number%%Ext%"
)
if "%Expected%" EQU %1 goto :EOF
echo %Expected%>>"%log%"
set /A Number+=1
goto CheckDiff
To understand the commands used in both solutions and how they work, open a command prompt window, execute the following commands there, and read all help pages displayed for each command carefully.
call /?
dir /?
echo /?
endlocal /?
for /?
if /?
goto /?
rem /?
set /?
setlocal /?
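The idea behind Solution 2, walking once through the sorted list and reporting every gap, can be sketched in a few lines of Python. The function name `missing_by_walk` is hypothetical, and the sketch assumes that all names follow the x + six digits + .csv pattern and are already sorted. Unlike the batch code, it lists the trailing missing files individually instead of writing one summary line.

```python
def missing_by_walk(sorted_names, last=300000, prefix="x", ext=".csv", width=6):
    """Walk the sorted names once and collect every number that is skipped."""
    missing = []
    expected = 1
    for name in sorted_names:
        number = int(name[len(prefix):len(name) - len(ext)])
        # Every number between the expected one and the found one is missing.
        missing.extend(range(expected, number))
        expected = number + 1
    # Numbers after the last existing file up to `last` are missing as well.
    missing.extend(range(expected, last + 1))
    return [f"{prefix}{n:0{width}d}{ext}" for n in missing]
```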
@echo off
setlocal EnableDelayedExpansion
for /F %%a in ('copy /Z "%~F0" NUL') do set "CR=%%a"
set "num=1000000"
del logfile.txt 2> NUL
< NUL (for %%a in (*.csv) do (
set /A num+=1
set /P "=!num:~1!!CR!"
if "x!num:~1!" neq "%%~Na" call :missingFile "%%~Na"
))
goto :EOF
:missingFile file
echo x%num:~1%.csv>> logfile.txt
echo x%num:~1%.csv Missing
set /A num+=1
if "x%num:~1%" neq "%~1" goto missingFile
exit /B