Trimming unwanted characters from a large file via batch commands - batch-file

I have a datafile in the following format
012394994SomeunwantedString
394949585MoreUnwantedString
348020200
349585940FurtherUnwantedString
I want to remove the unwanted strings from the file. The problem is that neither the unwanted string nor the characters before it are consistent. The only consistent part is the length of the string that is needed; everything after that position should be trimmed from the line.
I realize that I can simply extract the characters from the left given that I know the count, but is there a more efficient manner to do this? The file contains over 80,000 lines, out of which only 10-20 will have unwanted characters.
Looking for a set of simple batch commands to get this done as this will need to run on a server.

@echo off
setlocal enabledelayedexpansion
(
for /f "delims=" %%a in (infile.name) do (
set "line=%%a"
echo(!line:~0,9!
)
)>"outfile.name"
This reads each line from the input file, assigns it to line, and echoes the first 9 characters (starting at index 0).
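Since only a handful of lines are expected to carry extra text, it may help to list them before rewriting the file. The following is a minimal sketch, assuming the wanted part is always the first 9 characters as in the sample data. The pattern is a start-of-line anchor followed by ten wildcards, so it matches only lines that hold at least 10 characters (trailing spaces count as characters too).

```batch
@echo off
rem List, with line numbers, every line that has more than 9 characters.
rem "^.........." is ^ (start of line) followed by ten "any character" dots.
findstr /r /n "^.........." "infile.name"
```

If this prints nothing, no line is longer than 9 characters and the file does not need trimming.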

Related

Remove certain characters in a string using Windows command line

I have a text file having the string:
pub:
04:d6:b6:f2:98:ff:94:d8:3c:36:ad:5f:86:40:aa:
d1:5a:e1:87:5d:55:9d:ad:8b:2b:fc:18:e7:bb:47:
7f:9f:9a:62:c6:19:3a:9e:65:62:4e:5e:98:6d:db:
0e:7d:f9:22:a3:ca:cb:12:b2:ed:eb:14:0c:b3:31:
59:02:17:6d:6a
I need to remove the 04 from the beginning of this and also remove the ':' from between the characters and print it like this, in a single line:
d6b6f298ff94d83c36ad5f8640aad15ae1875d559dad8b2bfc18e7bb477f9f9a62c6193a9e65624e5e986ddb0e7df922a3cacb12b2edeb140cb3315902176d6a
How can I do that using Windows commands?
Quite straightforward:
@echo off
setlocal enabledelayedexpansion
set "first=true"
(for /f "eol=p delims=" %%a in (input.txt) do (
set "line=%%a"
if defined first (set "line=!line:~2!" & set "first=")
<nul set /p ".=!line::=!"
))>output.txt
Read the input file line by line.
Use a flag (first) to check for first line and remove the first two chars.
Remove the colons and
use <nul set /p to write without a line break.
Edit: as it turned out in the comments, your file actually has just one long line. This changes the processing to:
for /f %%a in (input.txt) do set "line=%%a"
set "line=%line:pub:04:=%"
set "line=%line::=%"
echo %line%
Note: a variable can hold only a limited amount of data, so this approach will fail when the line exceeds that limit.
Your question is pretty confusing. You first say: "I have a text file having the string:", but the example data has six lines that could be taken as six strings, so at this point we have no idea what the real data format is. Perhaps it is a single long line that you wrote here in six parts?
Next, you said "I need to remove the 04 from the beginning of this", but what happens if the data does not have a 04 at the beginning? Perhaps you want to remove the first element even if it is not a 04?
Because of this, we must make several assumptions in order to write a working solution.
The batch file below reads a file with several lines, skips the first line (even if it does not contain pub:), and removes the first colon-separated element (even if it is not 04):
@echo off
setlocal EnableDelayedExpansion
rem Read all lines, excepting the first one:
set "string="
for /F "skip=1" %%a in (input.txt) do set "string=!string!%%a"
rem Remove the first element:
set "string=%string:*:=%"
rem Show the rest, removing colons:
echo %string::=%
This code also relies on assumptions that are implicit in the way the commands work, for example that the lines contain neither spaces nor exclamation marks. Of course, if the real data file has a different format, this program will fail...

Batch Remove Trailing Whitespace in a Text File

Remove trailing spaces from a file using Windows batch? and How to remove trailing and leading whitespace for user-provided input in a batch file? seem to be similar questions, but I can't get any of the answers to work.
I have a text file, say C:\Users\%username%\Desktop\sometext.txt that has a long list of numbers, some with trailing whitespace. How can I loop through and remove the whitespace at the end of each line? I'd like to do this all in batch. It'll be in a file with some other batch commands in it.
Regrettably, you don't show us an example from your file, so we're left to assume from your description.
Assuming your file is something like
1
22
3
64
where some of the lines have trailing spaces, then
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
(
FOR /f "usebackq" %%a IN (q30594509.txt) DO (
SET /a num=%%a
ECHO(!num!
)
)>u:\newfile.txt
GOTO :EOF
I used a file named q30594509.txt containing the above data for my testing.
Produces u:\newfile.txt
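Note that SET /a does arithmetic, so it treats a number with a leading zero as octal (010 becomes 8) and is limited to 32-bit signed values. As an alternative sketch, not part of the answer above: FOR /F with its default delimiters (space and tab) already returns only the first token of each line, which drops trailing whitespace without any arithmetic. The file names below are placeholders. This assumes each line holds a single value with no embedded spaces; empty lines and lines starting with ; are skipped.

```batch
@echo off
rem Default delims (space, tab) and tokens=1 keep only the first token of
rem each line, so trailing whitespace never reaches %%a.
(
  for /f "usebackq" %%a in ("sometext.txt") do echo(%%a
)>"newfile.txt"
```

The output goes to a separate file because the input cannot be safely overwritten while it is still being read.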

Read text file lines starting with numbers using a batch file

I'm not sure if this is possible, but is there a way for a batch file to read a text file, but skip lines that do not start with a number?
For example:
handled
219278
check
219276
control
219274
co
219268
Can a for loop skip handled, check, control, etc.?
I know this:
cscript C:\Users\c1921\Test\curltest.vbs"!$MyVar!">>C:\Users\c1921\Test\Datapoints\!$MyVar!.txt
Will put all output to this text file but this:
FOR /F %%i in (C:\Users\c1921\Test\Datapoints\!$MyVar!.txt) DO (
set "$UnitID=%%i"
)
Reads every line into a variable. Can I somehow use delims and tokens to only get the numbers?
Edit:
I thought this might be possible going off of an answer on this question: Windows Batch file to echo a specific line number
On occasion, the file I have might not have a number between the words, for example:
handled
check
219276
control
219274
co
219268
This should not happen often, but I'd like to make sure I can avoid this when it does.
FINDSTR /b "[0-9]" q25003233.txt
I used a file named q25003233.txt containing your data for my testing.
for /f "delims=" %%a in ('findstr /r /b /c:"[0-9]" "c:\somewhere\file.txt"') do echo %%a
This uses findstr to filter the input file with a regular expression, returning only the lines that start with a number.
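To tie this back to the loop in the question, the filtered lines can be assigned to the variable as before. This is only a sketch: the file name Datapoints.txt is a placeholder for the real path, and the echo is there just to show the value on each pass.

```batch
@echo off
setlocal enabledelayedexpansion
rem findstr keeps only lines that begin with a digit; words such as
rem "handled" or "check" never reach the loop body.
for /f "delims=" %%i in ('findstr /r /b /c:"[0-9]" "Datapoints.txt"') do (
  set "$UnitID=%%i"
  echo Unit ID: !$UnitID!
)
```

Two consecutive words in the file cause no problem here, because each line is filtered on its own rather than by position.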

how to copy a single instance of a string from one text file into a new text file?

I have a text file with n rows, m of which contain a string that I'm interested in (m<=n). I need a batch file that will copy only a single row containing the string (e.g. the first occurrence) to a new text file. When I use the findstr command, it copies all rows containing the string.
Thanks!
Paul Safier
Given your FINDSTR command that locates your m rows (it can be as simple or as complicated as you need)
findstr "search" "fileName.txt"
then you can process the results of that command with a FOR /F loop. You can break out of the loop after the first matching line by using GOTO.
for /f "delims=" %%A in ('findstr "search" "fileName.txt"') do (
echo %%A >>"outFile.txt"
goto :break
)
:break
The FOR command is one of the more complicated commands available to batch. There are many options. You can get help on the command by typing HELP FOR or FOR /? from a command prompt.
The "DELIMS=" option disables the parsing of the line into tokens. Without that option, FOR /F would break each line into tokens, delimited by space or tab characters. The list of delimiters can be set to other character(s), or in your case, set to nothing.
The code I gave above will skip lines that begin with ; because FOR /F will skip any lines that begin with the EOL character - ; by default. You can change the EOL character to any single character. But if you don't know what your matching line might start with, then you don't know what character to use for EOL. The syntax to completely disable all token parsing and EOL line skipping is odd:
for /f delims^=^ eol^= %%A in (...) do ...
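Applied to the loop above, that syntax could look like the following sketch. The search string and file names are the same placeholders used earlier; the redirection is placed before echo so that no trailing space is written to the output file.

```batch
@echo off
rem delims^=^ eol^= turns off both token splitting and EOL skipping, so a
rem matching line that starts with ; is no longer skipped.
for /f delims^=^ eol^= %%A in ('findstr "search" "fileName.txt"') do (
  >>"outFile.txt" echo(%%A
  goto :break
)
:break
```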

DOS FOR loop - Can I use an entire word as a delimiter (multi-character-delimiter)

I have a source file that would look something like this:
Name SerialNumber
\\.\PHYSICALDRIVE1 000000002027612
\\.\PHYSICALDRIVE0 000000002027476
\\.\PHYSICALDRIVE2 00000000202746E
\\.\PHYSICALDRIVE3 00000000202760E
Using FOR loops in DOS, I need to be able to parse out just the number associated with each PHYSICALDRIVE entry, to be used later in the bat file (e.g. 1, 0, 2 and 3).
From what I gather, delims= only looks at one character at a time,
so I can't say delims=PHYSICALDRIVE and have it treated as a single delimiter.
Can anyone give an example on how to parse out only the numbers at the end of the string?
In case it matters Delayed expansion is being used.
Thanks.
I think any solution will require first parsing out the full column value, and then using SET search and replace or substring to parse out the number at the end.
I'm assuming the Name column value can never have a space within it. So the default FOR /F delimiters will parse out the 1st column easily enough.
If the drive number will always be less than 10 then
for /f %%A in (yourFileName.txt) do (
set "drive=%%A"
set "drive=!drive:~-1!"
echo !drive!
)
Else if it can be 10 or greater then
for /f %%A in (yourFileName.txt) do (
set "drive=%%~A"
set "drive=!drive:*physicaldrive=!"
echo !drive!
)
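Both loops depend on delayed expansion being enabled, and they also process the header row of the sample file. A fuller sketch under those assumptions: skip=1 drops the "Name SerialNumber" header, and tokens=1,2 additionally captures the serial number in %%B. The output wording is only illustrative.

```batch
@echo off
setlocal enabledelayedexpansion
rem skip=1 ignores the header row; %%A is the Name column, %%B the serial.
for /f "skip=1 tokens=1,2" %%A in (yourFileName.txt) do (
  set "drive=%%A"
  rem Remove everything up to and including PHYSICALDRIVE (case-insensitive).
  set "drive=!drive:*PHYSICALDRIVE=!"
  echo Drive !drive! has serial %%B
)
```

The `*` form of search and replace removes everything through the first match, so it works for drive numbers of any length.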

Resources