Batch script deleting specific lines in multiple files - batch-file

I'm looking for a script or a program that can delete specific lines from text files (input.001.log.....input.log.1900). Each file is about 50 MB and I have around 2,000 of them. Every line contains a single string. I want to delete every line that contains a doubled character ("aa", "bb" and so on), every line with more than 5 digits, every line with a special character other than @ # &, and every line with more than 2 special characters (for example, a#bcd#38s# needs to be deleted).
Note that I don't have any programming skills, just a little experience with batch scripting.
So far, I'm using this code:
@ECHO OFF
SETLOCAL
FOR %%i IN (input.txt) DO (
TYPE "%%i"|FINDstr /l /v "aa bb cc dd ff gg hh ii jj kk ll mm nn pp qq rr ss tt uu vv xx yy zz" >"input_1.txt"
)
GOTO :EOF

This would be easy if batch had a decent regular expression utility, but FINDSTR is extremely limited and buggy. However, FINDSTR can solve this problem rather efficiently without too much difficulty.
You aren't very clear about what you mean by "special character". My interpretation is that you only want to accept the letters a-z and A-Z, the digits 0-9, and the special characters @, #, and &. I can only guess that you are building a dictionary of potential passwords.
I find this problem easier if you build environment variables that represent various classes of characters, as well as various logical expressions, and then use the variables within your search string.
I recommend you write your modified files to a new folder.
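@echo off
setlocal
rem Character classes. The /i switch used below makes the letters match both cases.
set "alpha=abcdefghijklmnopqrstuvwxyz"
set "num=0123456789"
set "sym=@#&"
rem Any doubled character rejects the line.
set "dups=aa bb cc dd ee ff gg hh ii jj kk ll mm nn oo pp qq rr ss tt uu vv ww xx yy zz 00 11 22 33 44 55 66 77 88 99 @@ ## &&"
rem Any character outside the accepted classes.
set "bad=[^%alpha%%num%%sym%]"
rem Six digits anywhere in the line, that is, more than 5.
set "num6=[%num%][^%num%]*[%num%][^%num%]*[%num%][^%num%]*[%num%][^%num%]*[%num%][^%num%]*[%num%]"
rem Three symbols anywhere in the line, that is, more than 2.
set "sym3=[%sym%][^%sym%]*[%sym%][^%sym%]*[%sym%]"
set "source=c:\your\source\folder"
set "destination=c:\your\destination\folder"
rem /r regex, /i ignore case, /v keep only the lines that match none of the expressions.
for %%F in ("%source%\*.txt") do findstr /riv "%dups% %bad% %num6% %sym3%" "%%F" >"%destination%\%%~nxF"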
Edit in response to Magoo's comment
The solution must be modified a bit if you are running on Windows XP, as that has a regular expression length limit of 127 bytes, and the %num6% expression exceeds that limit.
The solution should work on XP if you change num6 to
set "num6=[%num%].*[%num%].*[%num%].*[%num%].*[%num%].*[%num%]"
That search logically gives the same result, but it is significantly less efficient because it may require excessive backtracking during the matching process.
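The length claim can be checked by counting the characters of both expressions. The following is a minimal sketch and not part of the original answer: the :len subroutine and the xp6 variable name are made up for illustration, and the script only counts characters, it does not exercise FINDSTR itself.

```batch
@echo off
setlocal EnableDelayedExpansion
set "num=0123456789"
rem Original expression from the answer.
set "num6=[%num%][^%num%]*[%num%][^%num%]*[%num%][^%num%]*[%num%][^%num%]*[%num%][^%num%]*[%num%]"
rem Shorter XP variant; xp6 is a hypothetical name used only in this sketch.
set "xp6=[%num%].*[%num%].*[%num%].*[%num%].*[%num%].*[%num%]"
call :len num6
call :len xp6
exit /b

rem Hypothetical helper: prints the length of the variable whose name is passed in.
:len
set "s=!%~1!"
set n=0
:lenloop
if defined s (
    set "s=!s:~1!"
    set /a n+=1
    goto lenloop
)
echo %~1 has %n% characters
exit /b
```

Counted by hand, num6 is 6 x 12 + 5 x 14 = 142 characters and the XP variant is 6 x 12 + 5 x 2 = 82 characters, so only the second stays under the 127-byte limit mentioned above.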

Related

How can I replace all instances of a letter and decimal combination greater than a predefined value

I am trying to search through text and find every instance of "Z" followed by a number. If the number is 40 or higher, then it will be replaced with 32.
So for example
N170G00Z58
N280G81X9.1787Y15.1981Z2.3803R4.6F.75L0.0
N300G00Z15.0
N580G03X-12.125Y6.7311Z52.775I-12.5J6.7311F35.0
Would produce
N170G00Z32
N280G81X9.1787Y15.1981Z2.3803R4.6F.75L0.0
N300G00Z15.0
N580G03X-12.125Y6.7311Z32I-12.5J6.7311F35.0
We are only looking at and changing the Z values.
I have tried the following code, but it removes all Z values instead.
The "%VarOne%%MS201%" is just the file I previously output, which I am using as the source.
set "INTEXTFILE=%VarOne%%MS201%"
for /f "delims=Z*" %%a in ('type "%INTEXTFILE%"') do (
SET s=%%a
IF s GTR Z40 SET s=!s:Z32!
echo !s!>>new.txt
)
I need to do this with other values as well (any Y value over 40 needs to be changed to "Y40"), so hopefully the solution is expandable and understandable to me. I am fully aware that I do not fully know what I am doing, but I am trying.
One possible solution is using the batch/JScript hybrid JREPL.BAT with the command line:
call "%~dp0jrepl.bat" "Z(?:[1-9][0-9]{2,}|[4-9][0-9])(?:\.[0-9]+)?" "Z32" /F "%VarOne%%MS201%" /O New.txt
Use - instead of New.txt as the /O argument to make the replacements directly in the file named by %VarOne%%MS201%.
jrepl.bat is referenced here with the path of the batch file that contains this command line and the definitions of the environment variables VarOne and MS201, so jrepl.bat must be in the same directory as that batch file.
The search expression Z(?:[1-9][0-9]{2,}|[4-9][0-9])(?:\.[0-9]+)? means:
Z ... matches this letter literally; the match is case-sensitive.
(?:...) ... is a non-capturing group, used here to hold an OR expression.
[1-9][0-9]{2,} ... after Z there must be a digit in range 1 to 9 followed by two or more digits in range 0 to 9. So this expression matches the numbers 100 and higher.
| ... means OR; a second expression is needed for numbers lower than 100 after Z.
[4-9][0-9] ... matches a two-digit number whose first digit is in range 4 to 9 and whose second digit is in range 0 to 9. So this expression matches the numbers 40 to 99.
(?:...)? ... is again a non-capturing group, used here to apply the quantifier ? to the entire expression inside the group, which means zero times or exactly once. In other words, the expression inside this group is optional.
\.[0-9]+ ... matches a dot, escaped with a backslash so that it is interpreted as a literal character, followed by one or more digits in range 0 to 9. This optional expression matches the decimal point and the decimal places of a floating point value.
For a replacement of all Z values with value 32 or higher the group with the OR expression must be extended by one more expression:
call "%~dp0jrepl.bat" "Z(?:[1-9][0-9]{2,}|[4-9][0-9]|3[2-9])(?:\.[0-9]+)?" "Z32" /F "%VarOne%%MS201%" /O New.txt
|3[2-9] ... is a third OR expression matching numbers in range 32 to 39.
So the three expressions in the OR group match numbers 100 or higher, 40 to 99 and 32 to 39 as integers or as floating point values with a decimal point and one or more decimal places with the optionally applied expression in second non-marking group.
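The question also asks about capping Y values at 40. One way to extend the approach is to chain a second JREPL call with the same numeric pattern and a different letter. This is a sketch under assumptions: the file name program.nc is hypothetical (the question builds the real name from VarOne and MS201), and jrepl.bat is assumed to sit next to the script as described above.

```batch
@echo off
setlocal
rem Hypothetical input file name, used only for this sketch.
set "INTEXTFILE=program.nc"
rem Step 1: Z values of 40 or higher become Z32, the result is written to New.txt.
call "%~dp0jrepl.bat" "Z(?:[1-9][0-9]{2,}|[4-9][0-9])(?:\.[0-9]+)?" "Z32" /F "%INTEXTFILE%" /O New.txt
rem Step 2: Y values of 40 or higher become Y40, New.txt is rewritten in place.
call "%~dp0jrepl.bat" "Y(?:[1-9][0-9]{2,}|[4-9][0-9])(?:\.[0-9]+)?" "Y40" /F New.txt /O -
```

Because the pattern requires a digit directly after the letter, negative values such as Y-45.0 are left untouched by this sketch. A value of exactly Y40 is matched and rewritten to Y40, which changes nothing.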

What does %var:~0,4% and %var:.=% mean in batch file?

Here is my sample batch file code and I really don't know what it does.
set TEMPRPSW=%RPSW_VERSION%
set RELVER=%TEMPRPSW:~0,4%
set RELVER=%RELVER:.=%
if %RELVER% GEQ 30 goto :eof
Please give me a working sample.
%TEMPRPSW:~0,4% takes a 4-character substring of TEMPRPSW, starting at character 0.
In other words, it takes the first 4 characters of TEMPRPSW and puts them in RELVER.
set TEMPRPSW=abcdef
set RELVER=%TEMPRPSW:~0,4%
echo %RELVER%
rem prints abcd
%VAR:str=% removes every occurrence of str from the value of VAR.
set RELVER=123.456
set RELVER=%RELVER:.=%
echo %RELVER%
rem prints 123456, without the dot
Here is a nice article: https://www.dostips.com/DtTipsStringManipulation.php
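Putting both operations together with the IF line from the question gives a complete picture of what the snippet does. The version string 2.5.1 below is a made-up value, since the question does not show what RPSW_VERSION actually contains.

```batch
@echo off
setlocal
rem Hypothetical version string; the real value is not shown in the question.
set "RPSW_VERSION=2.5.1"
set "TEMPRPSW=%RPSW_VERSION%"
rem First four characters: 2.5.
set "RELVER=%TEMPRPSW:~0,4%"
rem Remove every dot: 25
set "RELVER=%RELVER:.=%"
rem GEQ compares numerically when both sides are numbers.
if %RELVER% GEQ 30 (
    echo %RELVER% is 30 or higher, the original script would exit here
) else (
    echo %RELVER% is below 30, the original script would continue
)
```

With this input the script prints "25 is below 30, the original script would continue". With a value such as 3.10.5 the first four characters are 3.10, which becomes 310, so the GEQ 30 branch would be taken instead.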

What does "!S:~%I%,1!"=="" mean?

I found some sample code but I am unable to get what this if condition means:
set /p sourceDB=Enter Source DB: %=%
set S=%sourceDB%
set I=0
set L=-1
:l ----- Forget about this line
if "!S:~%I%,1!"=="" goto ld
if "!S:~%I%,1!"=="/" set K=%I%
if "!S:~%I%,1!"=="#" set Z=%I%
if "!S:~%I%,1!"==":" set Y=%I%
set /a I+=1
goto l
The short answer is that this is how you get substrings in batch.
When you extract a substring, you use the format %string_name:~index_of_first_character_in_substring,length_of_substring%. If either the index or the length is held in a separate variable (in your example, the index is its own variable), you can enable delayed expansion and use the format !string_name:~%variable_whose_value_is_the_index_of_first_character_in_substring%,length_of_substring! instead.
In this case, your main string is in a variable called S, you are starting at character %I%, and grabbing 1 character.
The line you've told us to ignore is actually pretty important, as it's used to loop through the entire string.
The entire line "!S:~%I%,1!"=="" is used to check if the substring is empty -- that is, the script is finished iterating through the string. There are also conditions for if the substring is /, #, and :; with K, Z, and Y respectively containing the indices of those substrings.
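The loop can be tried in isolation with a fixed string. This is a minimal sketch: the sample value of S is invented for illustration (the original script reads it with set /p), and the final echo is added only to show the results.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical input; the original script asks the user for it.
set "S=user/secret#host:1521"
set I=0
:l
rem An empty one-character substring means I has moved past the end of S.
if "!S:~%I%,1!"=="" goto ld
if "!S:~%I%,1!"=="/" set K=%I%
if "!S:~%I%,1!"=="#" set Z=%I%
if "!S:~%I%,1!"==":" set Y=%I%
set /a I+=1
goto l
:ld
echo length=%I% slash=%K% hash=%Z% colon=%Y%
```

For this sample string the script prints "length=21 slash=4 hash=11 colon=16". Indices are zero-based, and I ends up equal to the length of the string because the loop stops at the first index that yields an empty substring.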

Batch File - How do I display a percent of a variable?

I have a batch file which must be able to display a percentage. Unfortunately I have no idea how to accomplish this.
The file takes a range of individual points from 0 to 29 and adds or subtracts points from this range in a background process the user never sees. I want the current percentage of how full that range is to be displayed. For example, if there are 29 points the file displays "100 %", if there are 22 points it displays "75 %", etc.
Mathematically the operation should be (x/29)*100. I have coded this operation as:
set /a math="%shields%" / "%scap%"
set /a sm="%math%" * 100
but my code does not work. sm is the variable which will hold the percentage, shields is the current 0 - 29 point value and scap is the maximum value shields can be (normally 29, but some conditions can adjust this).
Can I get a hand with this please? It's confusing.
Arithmetic in batch files is integer-only; set /a does not handle floating point values. If you compute
22/29 in a batch file you get 0, and 0*100 = 0.
So you have to compute (x*100)/29 instead:
@echo off
set $val=22
set /a $percent=(%$val%*100)/29
echo %$percent% %%
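The same idea written with the variable names from the question might look like the sketch below. The values 22 and 29 are sample inputs, and the rounded variable is an addition for illustration that is not part of the original answer.

```batch
@echo off
setlocal
rem Sample values; in the real script these are maintained elsewhere.
set /a shields=22
set /a scap=29
rem Multiply first, then divide, because set /a truncates every division.
set /a sm=shields*100/scap
rem Optional rounding to the nearest percent: add half the divisor before dividing.
set /a rounded=(shields*100 + scap/2)/scap
echo %sm% %%
echo %rounded% %%
```

This prints "75 %" and then "76 %": 2200/29 is about 75.86, which truncates to 75 and rounds to 76. Reading scap from the variable instead of hard-coding 29 keeps the result correct when the maximum changes.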

Weird results using a script to find the length of a string?

I was testing this code, submitted by unclemeat in this question (unclemeat referenced this site). When I tested it by inputting some carets (^), it produced some interesting results.
Len.bat
@echo off
setLocal EnableDelayedExpansion
set s=%1
set length=0
:count
if defined s (
set s=%s:~1%
set /A length += 1
goto count
)
echo %length%
Testing of Len.bat
C:\Users\Public>len ^
More?
More?
0
C:\Users\Public>len ^^
12
C:\Users\Public>len ^^^
More?
More?
12
C:\Users\Public>len ^^^^
1
C:\Users\Public>len ^^^^^
More?
More?
1
C:\Users\Public>len ^^^^^^
13
C:\Users\Public>len ^^^^^^^
More?
More?
13
C:\Users\Public>len ^^^^^^^^
22
C:\Users\Public>
Ignoring the double More? where I simply returned without inputting anything, the pattern is:
0
12
12
1
1
13
13
22
22
13
13
2
2
14
14
23
23
14
14
23
23
14
14
23
23
Every odd occurrence prompts me with the double More?, which is why it is doubled, but otherwise these results are just weird. I thought it would have something to do with the following line in the code, but there seems to be no relationship!
Any explanation for this irregular data? Or is this just one of those things about cmd...
There are several reasons why the code fails completely with carets.
First, the way you call your batch file will fail.
A caret escapes the next character and is itself removed from the line.
A single caret at the end of a line escapes the line end (it's called a multiline caret); that is why cmd.exe shows you the More? prompt.
This is true for every odd number of carets.
Sample with seven carets.
length ^^^^^^^
More?
More?
cmd.exe will call the length batch file with the following string: ^^^<newline>.
The newline is split off from the %1 parameter, so %1 contains only ^^^.
But now this part fails completely:
set s=%1
set length=0
As it expands to
set s=^^^
set length=0
Because the last caret is now a multiline caret, it appends set length=0 to the line!
So the variable s now contains ^set length=0.
This will never work...
Even in this block, %s:~1% causes further problems, because it can also expand to a multiline caret when s contains carets (for example, when you call length ^^^^^^^^ with 8 carets).
if defined s (
set s=%s:~1%
set /A length += 1
goto count
)
For some more explanations about the caret you can read
SO:Long commands split over multiple lines in Vista/DOS batch (.bat) file
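One way to sidestep both problems is to keep the carets away from the parser: pass the argument in quotes, assign it with the quoted form of set, and take substrings with delayed expansion. The following is a sketch under those assumptions and not the original author's code. It is not hardened against arguments that contain exclamation marks or embedded quotes.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Quoted assignment: a caret inside the quotes is kept as a literal character.
set "s=%~1"
set length=0
:count
if defined s (
    rem Delayed expansion inserts the value after parsing, so its carets are not interpreted again.
    set "s=!s:~1!"
    set /a length+=1
    goto count
)
echo %length%
```

The script is meant to be called with a quoted argument, for example len "^^^", so that cmd.exe does not consume the carets on the command line before the batch file sees them.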

Resources