Find line in text file, check for text in between? - file

Question about Batch/Windows/CMD:
I would like my batch file to search for a line (which I have already achieved, but not what comes next). The line looks like this:
<name>MyName</name>
It needs to find the text in between <name> and </name>. After that it needs to set that as a variable (%name%).
Does anyone have any idea?
EDIT: if someone wants to give an answer, please list the code. Perl is OK, but this should be open-source and not everyone has Perl.

It can be done this way (assuming your input is in file "test1.html"):
findstr "<name>" test1.html > temp1.lis
FOR /F "tokens=2 delims=>" %%i in (temp1.lis) do @echo %%i > temp2.lis
FOR /F "tokens=1 delims=<" %%i in (temp2.lis) do @echo %%i > temp3.lis
The first line is a guard that ensures only lines containing
the HTML/XML tag "name" reach the two FOR lines (you may
already have done this). The result is saved in a temporary
file, "temp1.lis".
The second line captures what is to the right of the first
">" - in effect what is after "<name>". At this stage
"MyName</name" is left in temporary file "temp2.lis" (as
the closing tag also contains ">"). Note the double "%"s
(%%i) as this is in a BAT file (if you want to test directly
from the command line then it should only be one "%").
The third line captures what is to the left of the first "<"
- this is the desired result: "MyName" (is left of "<" in
"MyName</name"). The result is in variable %%i and you
can call a function with %%i as a parameter and access the
result in that function (in the FOR line above the function
was the built-in "echo" and the result thus ended up in
temporary file "temp3.lis" by the redirection of standard
output)
Note that the above only works if
<name>MyName</name>
is the first HTML/XML tag in a line.
If that is not the case or you want a more robust solution
you can instead call a function in the first FOR line (that
receives %%i as the first parameter). That function can then
replace "<name>" with a single character that you are
sure is not in the input, e.g.:
set "RLINE=%MYLINE:<name>=£%"
Explanation: if the input line is in variable %MYLINE% then
"<name>" will be replaced with "£" and the result will be
assigned to variable %RLINE%.
The reason for the replace is that the delimiters for the
FOR loop are single character only.
You can then use "£" as a delimiter in the FOR loop (to extract what is
to the right of "<name>" - as before):
echo %RLINE%>temp5.lis
FOR /F "tokens=2 delims=£" %%i in (temp5.lis) do @echo %%i > temp6.lis
You have to repeat this technique for "</name>"
(but only if <name>MyName</name> is not
the first HTML/XML tag in a line).
So as you see, it is possible, but it is quite painful.
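For comparison, here is a minimal sketch of the same extraction without temporary files. It is not part of the answer above. It assumes the input is in "test1.html" as before, that the text between the tags is non-empty, and that the line contains no "!" characters (delayed expansion would eat them). The variable names infile and line are only illustrative.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Assumed input file, same name as in the answer above.
set "infile=test1.html"
set "name="
for /f "delims=" %%L in ('findstr /c:"<name>" "%infile%"') do (
    set "line=%%L"
    rem Drop everything up to and including the opening tag.
    set "line=!line:*<name>=!"
    rem Keep the text before the next left angle bracket, which starts the closing tag.
    for /f "delims=<" %%V in ("!line!") do if not defined name set "name=%%V"
)
echo name=!name!
```

The *<name> form of the substitution removes everything up to the first occurrence of the tag, so the tag does not have to be the first one on the line. Only the first matching line is kept because of the "if not defined" test.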

Learn Perl, it's made for exactly that kind of thing.

Related

extract number from string from another file

I am currently working on a batch file that is supposed to read a version number from another file.
Basically, I just need to extract the string from the other document and get the number from it (number changes over time).
set dir=..\folder1\file1.vcxproj
:: I want to get a string from %dir% with a specific beginning and then extract the version number from it
set string=... ("<PlatformToolset>*")
set vernum=... ("..>v142<..)
That's what I want to achieve:
echo %string%
<PlatformToolset>v142</PlatformToolset>
echo %vernum%
142
I hope I could describe my problem. Unfortunately I have little experience with cmd and it is difficult for me to articulate myself in this respect.
I hope someone can still help :)
dir is a poor name for a variable for two reasons. First dir is a keyword in batch, and then it implies that it's a directory when it seems to be a file.
Use set "var1=value" for setting STRING values - this avoids problems caused by trailing spaces. Quotes are not needed for setting arithmetic values (set /a)
So presuming that what you're doing is to extract the 142 from the string <PlatformToolset>v142</PlatformToolset> found in the file dir (please change that name)
for /f "tokens=1,2delims=<> " %%b in ('type "%dir%"') do if "%%b"=="PlatformToolset" set "vernum=%%c"
set "vernum=%vernum:~1%"
note that the location and quote type (' or ") are critical.
The for reads each line of the file and analyses that line, assigning the first "token" (1) to %%b and the second (2) to %%c.
The line is regarded as being
<delimiter sequence><token1><delimiter sequence><token2><delimiter sequence><token3><delimiter sequence><token4>
A delimiter sequence is one or more of any of the characters specified in the delims= option; in this case, <, > and Space
So - where %%b (token 1) is PlatformToolset, we want to assign %%c (token 2) to vernum
so vernum will become v142 with a file containing the test line you've posted.
The hieroglyphics in the second set command are explained by set /? from the prompt: it takes the value in vernum, starting at "character 1" (where character numbering starts at 0, so the leading v is skipped), and assigns the result back to vernum, leaving 142.
for more detail about for gymnastics, see for /? from the prompt.
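To see the tokenizing in isolation, the following sketch runs the same for /f options against a literal string instead of the file. The variable sample is a hypothetical stand-in for one line of the .vcxproj file, not something from the question.

```batch
@echo off
setlocal
rem Hypothetical sample line, indented the way it might be in the project file.
set "sample=    <PlatformToolset>v142</PlatformToolset>"
set "vernum="
rem Double quotes in the IN clause make for /f parse a string rather than a file.
for /f "tokens=1,2 delims=<> " %%b in ("%sample%") do if "%%b"=="PlatformToolset" set "vernum=%%c"
echo full=%vernum%
rem Skip character 0, the letter v, and keep the rest.
set "vernum=%vernum:~1%"
echo number=%vernum%
```

With the sample line shown, this should print full=v142 followed by number=142. The leading spaces do not matter because the space is one of the delimiters and leading delimiters are skipped.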

Tokens not creating from a variable by for [batch]

I want to extract tokens from a variable, using \ as the delimiter.
I used this:
set fileinput=properties\iam\self_iam_properties.xml
for /f "tokens=*delims=\" %%a in (%fileinput%) do @echo %%a
which gives me the following error:
The system cannot find the file properties\iam\self_iam_properties.xml.
Why is it searching for a file? I just need to use the value stored in the variable; I do not want to read values from a file.
Also, after parsing, I want to create a file named self_iam_parsedoutput.txt, so I need to extract self_iam_ from the input file name. How should I proceed?
Put the %fileinput% in double quotes to treat it like a string.
for /f "tokens=*delims=\" %%a in ("%fileinput%") do @echo %%a
Note that because you're using tokens=*, it won't actually split into multiple variables. If you want each section of the path to get its own variable, you could use something like tokens=1-26
A second way to escape variables in for /f is to run a command which outputs the variable.
for /f %%a in ('echo.%variable%') do @echo %%a
This form is useful if you're otherwise running into problems with double quotes.
As for the filename change, if the input file is always xyz_properties.xml and the output is always xyz_parsedoutput.txt, then you can use a simple text substitution of the form %variable:old=new%
set inputfile=self_iam_properties.xml
set outputfile=%inputfile:properties.xml=parsedoutput.txt%
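Putting both parts together, a small sketch might look like this. It assumes the path always has three components, as in the question, and uses the %%~nxF modifier (documented in for /?) to take the file name and extension off the path before doing the substitution.

```batch
@echo off
setlocal
set "fileinput=properties\iam\self_iam_properties.xml"
rem Split the path at each backslash; the quotes make for /f treat it as a string.
for /f "tokens=1-3 delims=\" %%a in ("%fileinput%") do (
    echo part1=%%a
    echo part2=%%b
    echo part3=%%c
)
rem Take only the file name and extension from the path.
for %%F in ("%fileinput%") do set "inputfile=%%~nxF"
rem Swap the fixed suffix to build the output name.
set "outputfile=%inputfile:properties.xml=parsedoutput.txt%"
echo %outputfile%
```

For the path in the question, the last line should print self_iam_parsedoutput.txt.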

Removing some columns and rows from csv file via batch

I am trying to create a batch file that will edit a .csv and remove the first column, and any summary lines contained in the file. I am, however, fairly new to programming batch files, so I am not sure the best way to start this, and it would be great if you could include a basic explanation of how the code works so I can be self-sustaining in the future!
,Type,Date,Num,Name,Memo,Member,Clr,Split,Alias,Value,Balance
ABB - Egypt,,,,,,,,,,,
ElAin EL-Sokhna,,,,,,,,,,,
,Invoice,09-06-10,12005,ABB - EL-Sokhna,,Accounts Receivable,,Training Income,15000,,15000
,Invoice,09-14-11,12005,ABB - EL-Sokhna,“ElAin EL-Sokhna“ Trainer for OTS Application: First two weeks,Training Income,,Accounts,,150001,0
Total ElAin EL-Sokhna,,,,,,,,,241194,210400,301794
ABB - Egypt - Other,,,,,,,,,,,
There are various iterations of this file, as they come from a monthly report, I need to remove the first (empty) column, and any rows that look like ABB - Egypt,,,,,,,,,,, or Total ElAin EL-Sokhna,,,,,,,,,241194,210400,301794
So the output should be:
Type,Date,Num,Name,Memo,Member,Clr,Split,Alias,Value,Balance
Invoice,09-06-10,12005,ABB - EL-Sokhna,,Accounts,,Training Income,15000,,15000
Invoice,09-14-11,13002,ABB - EL-Sokhna,“ElAin EL-Sokhna“ Trainer for OTS Application: First two weeks,Training Income,,Accounts,,150001,0
Thanks for the input!
EDIT: It seems I wasn't clear enough in my OP (Sorry, first time here).
There are two processes that need to happen here, in every file the first column must be deleted, and any lines that are either title lines ABB - Egypt,,,,,,,,,,, or summary lines Total ElAin EL-Sokhna,,,,,,,,,241194,210400,301794 need to be removed.
All lines that need to be kept will be mostly filled in, such as ,Type,Date,Num,Name,Memo,Member,Clr,Split,Alias,Value,Balance or ,Invoice,09-06-10,12005,ABB - EL-Sokhna,,Accounts Receivable,,Training Income,15000,,15000 Notice that, as in the second line, it is possible for there to be some missing values in them, so doing a search for something like ",," will not work.
Batch is a terrible language for modifying text files. There are a great many special cases that require arcane knowledge to work around the problem. You may have a script that seems to do what you want, and then some wrinkle appears in your data, and the entire script may have to be redesigned.
With regard to your specific problem, it appears to me that you only want to preserve rows that begin with a comma, meaning the first column is empty. Of those remaining rows, you want to remove the first (empty) column.
Assuming none of the rows you want to keep have an empty value for the second column, then there is a really trivial solution:
@echo off
>"%~1.new" (for /f "delims=, tokens=*" %%A in ('findstr "^," %1') do echo %%A)
move /y "%~1.new" %1 >nul
The script expects the file to be passed as the first and only argument. So if your script is named "fixCSV.bat", and the file to be modified is "c:\test\file.csv", then you would use:
fixCSV "c:\test\file.csv"
The %1 expands to the value of the first argument, and %~1 is the same, except it also strips any enclosing quotes that may or may not be present.
The FINDSTR command reads the file and writes out only lines that begin with a comma. The FOR /F command iterates each line of output. The "delims=, tokens=*" options effectively strip all leading commas from each line, and the result is in variable %%A, which is then ECHOed. The entire construct is enclosed in parentheses and stdout is redirected to a temporary file. Finally, the temporary file is moved over top of the original file, thus replacing it.
If the 2nd column may be empty, then the result will be corrupted because it removes all leading commas (both columns 1 and 2 in this case). The script must be more complicated to compensate. You would need to set a variable and then use delayed expansion to get the sub-string, skipping the first character. But delayed expansion will corrupt expansion of the %%A variable if it contains the ! character. So delayed expansion must be toggled on and off. You are beginning to see what I mean by lots of special cases.
@echo off
setlocal disableDelayedExpansion
>"%~1.new" (
for /f "delims=" %%A in ('findstr "^," %1') do (
set "ln=%%A"
setlocal enableDelayedExpansion
echo !ln:~1!
endlocal
)
)
move /y "%~1.new" %1 >nul
As the batch scripts become more complicated, they become slower and slower. It may not be an issue for most files, but if the file is really large (say hundreds of megabytes) then it can become an issue.
I almost never use pure batch to modify text files anymore. Instead, I use a hybrid JScript/batch utility that I wrote called JREPL.BAT. The utility is pure script that runs natively on any Windows machine from XP onward. JREPL.BAT is able to efficiently modify text files using regular expression replacement. Regular expressions can appear to be mysterious, but they are well worth the investment in learning.
Assuming you have JREPL.BAT somewhere within your PATH, then the following command is all that you would need:
jrepl "^,(.*)" "$1" /jmatch /f "yourFile.csv" /o -
The /F option specifies the file to read.
The /O option with value of - specifies that the output should replace the original file.
The /JMATCH option specifies that each replacement value is written out to a new line. All other text is dropped.
The first argument is the search expression. It matches any line that begins with a comma, and everything after that is captured in a variable named $1.
The second argument specifies the replacement value, which is simply the captured value in variable $1.
One way is to define all your exclusion rules in a variable that is passed to
findstr. The rules must be defined like this:
/c:"String which excludes the line" /c:"Another string which excludes the line" /c:"etc.."
The rules must be exact (that is, they must not match any line that has to stay).
For the empty first column you can use a substitution, the way I did it in the code, with
,Type=Type
,Invoice=Invoice
Test.bat :
@echo off&cls
setlocal enabledelayedexpansion
Rem The rules
set $String_To_Search=/c:"ABB - Egypt," /c:"Total ElAin El-Sokhna," /c:"ElAin EL-Sokhna," /c:"ABB - Egypt - Other,"
(for /f "delims=" %%a in (test.csv) do (
set $line=%%a
Rem the substitutions for the first Column
set $Line=!$Line:,Type=Type!
set $line=!$Line:,Invoice=Invoice!
Rem the test and the output if nothing was found
echo !$Line! | findstr /i %$String_To_Search% >nul || echo !$Line!
))>Output.csv
I used a file test.csv for my test.
The output is redirected to Output.csv
Perhaps this is what you want?
@echo off
setlocal EnableDelayedExpansion
for /F "delims=" %%a in (input.csv) do (
set "line=%%a"
if "!line:~0,1!" equ "," echo !line:~1!
)
When a problem is not explained well enough, we can only guess the missing details. In this case, I assumed that you just want the lines that start with a comma, with that comma deleted. The output is the same as your output example...
EDIT: Output example added
Type,Date,Num,Name,Memo,Member,Clr,Split,Alias,Value,Balance
Invoice,09-06-10,12005,ABB - EL-Sokhna,,Accounts Receivable,,Training Income,15000,,15000
Invoice,09-14-11,12005,ABB - EL-Sokhna,“ElAin EL-Sokhna“ Trainer for OTS Application: First two weeks,Training Income,,Accounts,,150001,0
I would start here to learn this: How can you find and replace text in a file using the Windows command-line environment?
It covers many details of substitution from Windows command line and many ways to do it, some requiring only what's built into Windows, and some requiring other downloadable software.
Magoo is right, more criteria are needed, but there might be enough information in the linked page for you to get past the main hurdles.
@ECHO OFF
SETLOCAL
(FOR /f "tokens=*delims=," %%a IN ('findstr /b /l "," q28079306.txt') DO ECHO %%a)>newfile.txt
GOTO :EOF
I used a file named q28079306.txt containing your data for my testing.
Produces newfile.txt

Using inputs from a batch file to create a new, saved as a different name batch file

I am trying to create a batch file to input 3 pieces of data and use that data to create another batch file. Just create it, and stop. The batch maps several network drives for users that haven't a clue as to how to do it.
I have a "master.bat" and using notepad I am using "replace" to fill in the "username" "Password" and "drive path". I thought I would try to get it down to entering the variables into the "master.bat" creating a "custom.bat" for that user.
I got a lot of help here getting to the final step. Everything is working except the final part. Now that I have all the variables as well as a template to put them in, how do I get that first batch file to create the customized output as a workable file that I can send to the user, where all they have to do is run it?
One way would be to use your template in file form and replace placeholders in there by your actual values:
setlocal enabledelayedexpansion
rem username must already be set before this block is parsed
>network-drives.cmd (
for /f "delims=" %%L in (template.cmd) do (
set "Line=%%L"
set "Line=!Line:[username]=%username%!"
...
echo(!Line!
)
)
This assumes placeholders like [username] in the template and corresponding variables defined.
However, I always get a little anxious if I use data read from a file in a batch. When I recently had to create a batch file from another I went the following route:
(
echo @echo off
echo net use !drivepath! \\server\share "/user:!username!" "!password!"
echo net use !drivepath2! \\server\share2 "/user:!username!" "!password!"
) > network_drives.cmd
Care has to be taken with things like closing parentheses and several characters reserved for the syntax you may need in the generated batch file. But this approach is entirely self-contained, albeit a little harder to maintain.
It is simple to embed the template within your batch file. There are multiple ways to do this. One is to simply prefix each template line with :::. I chose that sequence because : is already used as a batch label and :: is frequently used as a batch comment.
Delayed expansion can be used to do your search and replace automatically!
There are just 3 special characters you need to worry about if you want to include them in your output. These special characters are probably not needed for the original question. But it is good to know how to handle them in a general sense.
An exclamation literal ! must be either escaped or substituted
A caret literal ^ can be escaped or substituted if it appears on a line with an exclamation. But it must not be escaped if there is not an exclamation on the line. Caret substitution is always safe.
Use substitution to start a line with :
@echo off
setlocal
::The following would be set by your existing script code
set drivePath=x:
set username=rumpelstiltskin
set password=gold
::This is only needed if you want to include ! literals using substitution
set "X=!"
::This is only needed if you want to include ^ literal on same line
::containing ! literal
set "C=^"
::This is only needed if you want to start a line with :
set ":=:"
::This is all that is needed to write your output
setlocal enableDelayedExpansion
>mapDrive.bat (
for /f "tokens=* delims=:" %%A in ('findstr "^:::" "%~f0"') do @echo(%%A
)
::----------- Here begins the template -----------
:::@echo off
:::net use !drivePath! \\server\share "/user:!username!" "!password!"
:::!:!: Use substitution to start a line with a :
:::!:!: The following blank line will be preserved
:::
:::echo Exclamation literal must be escaped ^! or substituted !X!
:::echo Caret with exclamation must be escaped ^^ or substituted !C!
:::echo Caret ^ without exclamation must not be escaped

Dos Batch - For Loop

Can anyone tell me why, in the example below, the value of LIST is always blank?
I would also like to retrieve only the first 4 characters of %%i in the variable LIST.
cd E:\Department\Finance\edi\Japan_orders\
FOR /f %%i IN ('dir /b *.*') DO (
copy %%i E:\Department\Finance\Edi\commsfile.txt
set LIST=%%i
echo %LIST%
if %%i == FORD110509 CALL E:\Department\Finance\edi\EXTRACT.exe E:\Department\Finance\edi\COMMSFILE.TXT
)
pause
thanks in advance
You need delayed expansion. Add the following at the start of your program:
setlocal enabledelayedexpansion
and then use !LIST! instead of %LIST% inside of the loop.
For a thorough explanation please read help set.
Bracketed command blocks are parsed in their entirety before any of their commands are executed. Your %LIST% expression, therefore, is expanded at the beginning, while the LIST variable is still empty. When the time comes to execute echo %LIST%, there is no %LIST% left there, only the empty string (read: 'nothing') instead. It's just how it works (don't ask me why).
In such cases the delayed expansion mechanism is used, and Joey has already told you that you need to use a special syntax of !LIST! instead of %LIST%, which must first be enabled (typically, by the command SETLOCAL EnableDelayedExpansion, which he has mentioned as well).
On your other point, you can extract a substring from a value, but the value must first be stored in a variable. Basically, the syntax for extracting substrings is one of these:
%VARIABLE:~offset,charcount%
%VARIABLE:~offset%
That is, you specify the starting position (0-based) and, optionally, the number of characters to take from the value. (If the character count is omitted, you get everything from the offset to the end of the string.) You can read more about it by issuing HELP SET from the command line (wait, it's the same command that Joey has mentioned!).
One more thing: don't forget about the delayed expansion. You need to change the above % syntax to the ! one. In your case the correct expression for retrieving the first 4 characters would be:
!LIST:~0,4!
You can use it directly or you could first store it back to LIST and then use simply !LIST! wherever you need the substring.
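The difference between the two expansion styles can be seen side by side in a small sketch. The two names in the loop are hypothetical stand-ins for whatever dir /b returns in your folder, and LIST is cleared first so that a value left over in the environment does not hide the effect.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Start from an undefined LIST so the percent form has nothing to show.
set "LIST="
rem Hypothetical names standing in for the files listed by dir /b.
for %%i in (FORD110509 GM220610) do (
    set "LIST=%%i"
    rem The percent form was expanded when the block was parsed, the ! forms at run time.
    echo percent: [%LIST%]  delayed: [!LIST!]  first4: [!LIST:~0,4!]
)
```

On each pass the percent form should come out empty, while the delayed forms show the current name and its first four characters (FORD on the first pass).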

Resources