I need to extract texts between two given words from a file.
The File format is as below :
some lines
<name>text1</name>
some lines
some lines
<name>text2</name>
some lines
<name>text3</name>
some more lines
I need to extract all the occurrences of texts that occur between each of the name tags
<name> extract this text here </name>
Expected Output for above file :
text1
text2
text3
Thank you.
This should work for the sample data provided:
for /f "tokens=2 delims=<>" %A in ('type test.txt ^| findstr "<name>"') do @echo %A
If using this inside of a batch script, be sure to change %A to %%A. Basically, this will run through lines containing <name>, and split the line by < and > characters using delims=<>, giving you name, text in between, /name. The tokens=2 sets %A to only the second string.
Keep in mind this won't work if you have anything on the line before <name>. That would probably complicate things a lot more in batch, and I would then suggest using some parsing library in another language for that.
Also, this will not work if the text you wanted to extract contains < or >.
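For the case where something precedes <name> on the line, or where several tags share one line, a regular expression in another language is one way out. The following is only a minimal Python sketch and not part of the answer above; the name extract_names and the sample text are made up for illustration, and it assumes each <name>...</name> pair sits on a single line.

```python
import re

# Non-greedy match, so several <name>...</name> pairs on one line stay separate.
# "." does not match a newline, so a pair must open and close on the same line.
NAME_TAG = re.compile(r"<name>(.*?)</name>")

def extract_names(text):
    """Return the text found between every <name> and </name> pair."""
    return NAME_TAG.findall(text)

# Made-up sample input: the second tag has text before and after it.
sample = (
    "some lines\n"
    "<name>text1</name>\n"
    "prefix <name>text2</name> suffix\n"
    "<name>text3</name>\n"
)
for name in extract_names(sample):
    print(name)
```

Unlike the tokens=2 approach, this does not depend on the position of the tag within the line.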
The following script extracts the text in between the desired tags of the file(s) provided as command line argument(s):
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Resolve command line arguments:
for %%F in (%*) do (
rem // Read a single line of text following certain criteria:
for /F "delims=" %%L in ('
findstr /R "^[^<>]*<name>[^<>][^<>]*</name>[^<>]*$" "%%~F"
') do (
set "LINE=%%L"
rem /* Extract the desired string portion;
rem the preceding `_` is inserted for the first token
rem never to appear empty to the `for /F` loop: */
setlocal EnableDelayedExpansion
for /F "tokens=3 delims=<>" %%K in ("_!LINE!") do (
endlocal
rem // Return found string portion:
echo(%%K
)
)
)
endlocal
exit /B
This works only if there is exactly one tag <name>, followed by some text not containing < and > on its own, followed by exactly one tag </name>; this string must be on a single line and may be preceded or followed by some texts not containing < and > on their own.
Suppose the input file is input.txt.
This should work :
grep '<name>.*</name>' input.txt | sed -r 's/<name>(.*)<\/name>/\1/'
grep finds the lines
sed deletes the name tags
Two files (binary or non-binary) can be joined to one file in a specific manner with a specific divider string between them, using only console commands. Later the second file can be extracted by the following simple method°, again using only console commands:
@echo off
setlocal enableextensions enabledelayedexpansion
REM === JOINING FILES ===
REM Divider.txt must consist of a line starting with "di-vi-der"
REM preceded by at least one line and followed by a blank line.
echo This is a necessary preceding line. > Divider.txt
echo di-vi-der This line must be followed by a blank line. >> Divider.txt
REM Join files to Joined.bin by one of these two methods:
REM copy /b FirstFile.exe + /a Divider.txt + /b File2BeExtracted.exe /b Joined.bin
type FirstFile.exe Divider.txt File2BeExtracted.exe > Joined.bin
REM === UNJOINING FILES ===
REM Get the line number of the dividing line in Joined.bin:
for /F "delims=:" %%a in ('findstr /N "^di-vi-der" "Joined.bin"') do set "lines=%%a"
REM Extract the part of the Joined.bin following the divider line:
< "Joined.bin" (
REM Pass thru the first lines:
for /L %%i in (1,1,%lines%) do set /P "="
REM Copy the rest to Output.bin:
findstr "^"
) > Output.bin
This is working nearly perfectly.
My first (and main) problem is: findstr adds 0D 0A at the end of the last line written to Output.bin. How can this be corrected?
A second possible problem: due to some restrictions of the findstr command (maximum line length), this method could fail in some (rare?) cases. To detect such cases, the file size of File2BeExtracted.exe could be determined and stored in the divider line, for a final identity check after extracting the file. How could this be done?
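The size check described above can be illustrated outside of batch. This is a rough Python sketch of the idea only, not the findstr-based method: it works on raw bytes, stores the payload size in the divider line, and compares sizes after extraction. The function names are invented for the example, and it assumes the divider text does not occur in the first file.

```python
DIVIDER = b"di-vi-der"

def join_files(first, second):
    """Concatenate two byte strings with a divider line recording the size of the second."""
    header = b"\r\n" + DIVIDER + b" " + str(len(second)).encode("ascii") + b"\r\n"
    return first + header + second

def split_joined(joined):
    """Return the bytes after the divider line, checked against the stored size."""
    marker = b"\r\n" + DIVIDER + b" "
    start = joined.index(marker) + len(marker)    # first divider occurrence
    line_end = joined.index(b"\r\n", start)       # end of the divider line
    expected_size = int(joined[start:line_end])   # size stored by join_files
    payload = joined[line_end + 2:]
    if len(payload) != expected_size:
        raise ValueError("extracted size does not match the size stored in the divider")
    return payload
```

Because the payload is sliced as bytes rather than processed line by line, nothing is appended to its end in this sketch.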
° This tricky method was invented by jeb. See here too.
Hi, I am trying to extract the substrings ZoomIn10X, ZoomIn20X, ZoomIn30X, etc. from a file that contains the lines below, and write them to another file:
Job Name : STANALONE/1234/JobId/Date/ZoomIn10X
Job Name : STANALONE/1234/JobId/Date/ZoomIn20X
Job Name : STANALONE/JobId/Date/ZoomIn30X
Job Name : STANALONE/1234/JobId/Date/ZoomIn40X
Job Name : STANALONE/1234/Date/ZoomIn10X
I have tried
for /F "tokens=*" %%A in (input.txt) do (
echo %%A r "/" "\n" %%A | tail -1 >> output.txt
)
but it is not working properly. Can you please help?
On Unix, with Perl:
perl -pe 's#.*/##' input.txt
See perl -h for the options and the regex section of perlop for more details.
Substitution expressions are usually written with / as the delimiter, but any other character can be used, as in sed. Here # is used to avoid escaping /.
Or with the bash shell (slower, because of the bash read loop):
while read -r line; do
echo "${line##*/}"
done <input.txt
See bash parameter expansion. Here ## removes the longest matching prefix.
>output.txt (for /f "delims=" %%a in (input.txt) do echo %%~nxa)
Given the input lines, and handling them as paths+files references (yes, they are not, but can be handled as if they were), using the for replaceable parameter modifiers (see for /?) we request the name and extension of the file being referenced. All the output of the for execution is redirected to output.txt.
[W:\44365640]:# type go.cmd
@echo off
setlocal enableextensions disabledelayedexpansion
>output.txt (for /f "delims=" %%a in (input.txt) do echo %%~nxa)
[W:\44365640]:# type input.txt
Job Name : STANALONE/1234/JobId/Date/ZoomIn10X
Job Name : STANALONE/1234/JobId/Date/ZoomIn20X
Job Name : STANALONE/JobId/Date/ZoomIn30X
Job Name : STANALONE/1234/JobId/Date/ZoomIn40X
Job Name : STANALONE/1234/Date/ZoomIn10X
[W:\44365640]:# go
[W:\44365640]:# type output.txt
ZoomIn10X
ZoomIn20X
ZoomIn30X
ZoomIn40X
ZoomIn10X
[W:\44365640]:#
Using awk you could define / as field separator and output the last field:
$ awk -F/ '{print $NF}' input.txt
ZoomIn10X
ZoomIn20X
ZoomIn30X
ZoomIn40X
ZoomIn10X
This outputs the last field of every line, including the empty lines you had between the data lines. If you need to filter the output somehow, please update the question with the requirements.
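If the empty lines are not wanted in the output, they have to be filtered out explicitly. As an illustration only (Python rather than awk, with a made-up function name), the last-field idea plus that filter could look like this:

```python
def last_fields(lines):
    """Return the text after the last '/' of every non-blank line."""
    result = []
    for line in lines:
        line = line.strip()
        if line:  # skip the empty lines between the data lines
            result.append(line.rsplit("/", 1)[-1])
    return result

# Two data lines from the question, with an empty line between them.
sample = [
    "Job Name : STANALONE/1234/JobId/Date/ZoomIn10X",
    "",
    "Job Name : STANALONE/JobId/Date/ZoomIn30X",
]
print("\n".join(last_fields(sample)))
```

Splitting from the right means the number of path components before the last one does not matter.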
I think the other answers do not use batch-only syntax.
The following solution should work if your input.txt file looks as you have written it (I have also added an option to collect the results in a variable):
EDIT:
Added comments for the script:
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
REM your input.txt file looks like this
REM Job Name : STANALONE/1234/JobId/Date/ZoomIn10X
REM Job Name : STANALONE/1234/JobId/Date/ZoomIn20X
REM Job Name : STANALONE/JobId/Date/ZoomIn30X
REM Job Name : STANALONE/1234/JobId/Date/ZoomIn40X
REM Job Name : STANALONE/1234/Date/ZoomIn10X
REM if you wished to use variable
REM SET final_name=""
FOR /F "tokens=2 delims=:" %%A IN (input.txt) DO (
SET temp_var=%%A
FOR /F "tokens=4 delims=/" %%B IN ("!temp_var!") DO (
SET temp_var_B=%%B
IF "!temp_var_B!" EQU "Date" (
FOR /F "tokens=5 delims=^/" %%C IN ("!temp_var!") DO (
ECHO %%C >> output.txt
REM if you wished to use variable
REM SET final_name=!final_name!,%%C
)
) ELSE (
ECHO %%B >> output.txt
REM if you wished to use variable
REM SET final_name=!final_name!,%%B
)
)
)
ECHO All collected variables: !final_name!
ENDLOCAL
This is a somewhat inefficient solution (a much better one follows below), but it can also collect the results in a variable. It relies on the fact that you can split a string several times, and that even when the lines have a different number of parts you can still pick the right one by testing for a part that is always the same, in your case "Date". The first FOR splits the line on the ":" character, and the other FORs split on the "/" character. There are two inner FORs because the extracted string can sit at two different positions.
EDIT :
I think I was too quick to post a solution. When I thought about it, I recognized it as a very crude one. I have also commented it better, as I had posted only the code.
The more efficient solution:
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
REM your input.txt file looks like this
REM Job Name : STANALONE/1234/JobId/Date/ZoomIn10X
REM Job Name : STANALONE/1234/JobId/Date/ZoomIn20X
REM Job Name : STANALONE/JobId/Date/ZoomIn30X
REM Job Name : STANALONE/1234/JobId/Date/ZoomIn40X
REM Job Name : STANALONE/1234/Date/ZoomIn10X
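REM Start from an empty output file on every run
IF EXIST output.txt DEL output.txt
FOR /F "tokens=2 delims=:" %%A IN (input.txt) DO (
SET var=%%A
REM Append the last 9 characters; redirecting first avoids writing a trailing space
>> output.txt ECHO !var:~-9!
)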
ENDLOCAL
Description:
First comes the FOR /F with "tokens=2" (takes the second part) and "delims=:" (uses ":" as the delimiter).
The core of the solution relies on the fact that the string you want to extract sits at the end of the line and has a constant size (9 characters, to be exact): !var:~-9! takes the last 9 characters of the variable that was set inside the FOR loop.
The solution needs SETLOCAL ENABLEDELAYEDEXPANSION because it uses delayed expansion (!var!).
There is also an IF EXIST clause, intended to create a new file every time you run it. It is not a vital part of the script and you may comment it out.
i currently have this command for a batch file
for /F "skip=1 tokens=1 delims=\n" %i in (stats.txt) do echo %i
with the contents of stats.txt being
Title = Subaru's Great Rehab Strategy
URL = http://dynasty-scans.com/chapters/subarus_great_rehab_strategy
Tags = Subaru x Tsukasa[|]Yuri[|]
No. of Pages = 3
^ NOTE: the final line is actually blank
The idea of this line of code is to return the 2nd line, the one with the URL. The end goal is to run it in some sort of loop over a series of ~12000+ stats.txt files, collecting all the URL lines into a single file.
But when I run the command I get this
As you can see, it has skipped the first line, but it cuts the URL line off at the n in dynasty and also outputs the last 3 lines.
Now if I remove delims=\n I get the same 3 lines, but only the first word before the space. This seems to indicate that the value of delims is what splits a line into "tokens", of which I then grab just the first one (and space must be the default).
When I go into Notepad++, open the Find and Replace dialog, set Search Mode to Extended and look for "\r\n", I get taken to the end of each line, which is why I chose \n for delims, assuming this would make the entire line one token.
So my question is: how can I get the whole 2nd line, and only that line, of my stats.txt file?
The for /f loop already treats the carriage return and / or line feed as an end-of-line. No need to specify it as a delimiter. With delims=\n you're actually saying that all literal backslashes and letter n's should be treated as token delimiters. If you want the whole line, what you want is "skip=1 delims=".
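To make the effect of delims=\n concrete, here is a rough Python model of how the first token is picked when every character of delims is a separator. It is only an illustration of the splitting rule described above, not a reimplementation of for /F, and first_token is a made-up helper.

```python
def first_token(line, delims):
    """Rough model of `for /F "tokens=1 delims=..."`: each character in delims separates tokens."""
    token = ""
    for char in line:
        if char in delims:
            if token:      # a delimiter after some text ends the first token
                break
            continue       # leading delimiters are skipped
        token += char
    return token

url_line = "URL = http://dynasty-scans.com/chapters/subarus_great_rehab_strategy"
print(first_token(url_line, "\\n"))  # delims=\n : the backslash and the letter n
print(first_token(url_line, ""))     # delims=   : nothing splits, the whole line is kept
```

The first call stops right before the n of dynasty, which matches the truncation described in the question; the second returns the complete line.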
Just out of habit, when reading the contents of a file with a for /f loop, I find it useful to enable usebackq just in case the filename / path contains a space or ampersand. That allows you to quote the filename to protect against such potential treachery.
@echo off
setlocal
for /F "usebackq skip=1 delims=" %%I in ("stats.txt") do if not defined URL set "URL=%%~I"
echo %URL%
Put into context, to use this to read many files named stats.txt and output the URLs into a single collection, enclose the whole thing in another for loop and enable delayed expansion.
@echo off
setlocal
>URLs.txt (
for /R %%N in ("*stats.txt") do (
for /F "usebackq skip=1 delims=" %%I in ("%%~fN") do (
if not defined URL set "URL=%%~I"
)
setlocal enabledelayedexpansion
echo(!URL!
endlocal
set "URL="
)
)
echo Done. The results are in URLs.txt.
If you want to strip the "URL = " from the beginning of each line and keep only the address, you could try changing your for /F parameters to "usebackq skip=1 tokens=3" if all the files follow the same format of URL, space, =, space, http://etc. If you can't depend on that, or if any of the URLs might contain unencoded spaces, you could also change echo(!URL! to echo(!URL:*http=http!
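For comparison, the same collection step can be sketched in Python. This is an assumption-laden illustration rather than a translation of the batch script: the helper names are invented, only files named exactly stats.txt are considered, and the address is taken to start at the first "http" on the second line (the same idea as !URL:*http=http!).

```python
from pathlib import Path

def second_line(path):
    """Return the second line of a text file, or None if the file has fewer than two lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        handle.readline()          # skip the first (Title) line
        line = handle.readline()
    return line.rstrip("\r\n") if line else None

def collect_urls(root):
    """Gather the address part of the second line of every stats.txt below root."""
    urls = []
    for path in sorted(Path(root).rglob("stats.txt")):
        line = second_line(path)
        if line and "http" in line:
            urls.append(line[line.index("http"):])  # drop everything before "http"
    return urls
```

Calling collect_urls(".") from the top folder and writing the returned list to a file would give one URL per line.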
You don't need to use a FOR /F loop, you can also read it with a SET /P
setlocal EnableDelayedExpansion
< stats.txt (
set /p line1=
set /p URL_Line=
)
echo(!URL_Line!
Try this from the command line:
(for /F "tokens=1* delims=:" %i in ('findstr "URL" stats*.txt') do echo %j) > output.txt
the idea ... is to return the 2nd line with URL
If you want to insert this line in a Batch file, just double the percent signs.
Try this from the prompt:
(for /f "tokens=1*delims=]" %a in ('find /v /n "" *.csv^|findstr /l /b "[2]"') do @echo %b)>u:\r1.txt
Here I used *.csv for testing (substitute your own filemask) and u:\r1.txt for the result; substitute as you see fit (but don't output to a file that fits your selected filemask!).
It works by prefixing each line in each file with a bracketed number [n] (find: /n numbers the lines, /v selects lines that do not match "", an empty string), then selecting those lines that match "[2]" literally (/l) at the beginning of the line (/b).
The result is all of the second lines of the files, each preceded by the literal "[2]". All we need to do then is tokenise the result: the first token, up to the delimiter "]", will be "[2" and is assigned to %a, and the remainder of the line (token *) is assigned to %b.
Have you tried
for /F "skip=1 delims=" %%i in (stats.txt) do echo %%i && goto :eof
I haven't tested it as I don't have access to a Windows machine at the moment, but in a batch file that should exit the for-loop after the first iteration, which is what you want.
I've searched through a thousand examples and tried them, but none of them actually works for me. My requirement is pretty straightforward: I have a file, config.txt, and one of its lines is:
sqlServer=localhost
I'm trying to update this line to:
sqlServer=myMachine\sql2012
I looked at examples online; some of them only work with set variables, and some insert rather than replace. Some examples write into a new file, but the new file has a line number in front of each line. I can't find useful instructions on how to write batch scripts, and none of the file-updating batch scripts works for me.
It will be very helpful if you leave some comments.
Thanks in advance
EDITED - to adapt to comments
@echo off
setlocal enableextensions disabledelayedexpansion
set "config=.\config.txt"
set "dbServer=localhost\sql2012"
for /f "tokens=*" %%l in ('type "%config%"^&cd.^>"%config%"'
) do for /f "tokens=1 delims== " %%a in ("%%~l"
) do if /i "%%~a"=="sqlServer" (
>>"%config%" echo(sqlServer=%dbServer%
) else (
>>"%config%" echo(%%l
)
type "%config%"
endlocal
The input file is read line by line (for /f %%l), then each line is split (for /f %%a); if the first token in the line is "sqlserver" the line is replaced, otherwise the original line is sent to the file.
The command used to retrieve the information in the first for loop includes a cd.>"%config%" that empties the file, so the resulting lines (which have been read into memory by the for command before the file was emptied) are sent directly to the same file.
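The replace-or-copy decision described above can also be shown in isolation. The following Python sketch is an illustration under assumptions, not the batch script itself: set_config_value is a made-up name, it works on a list of lines instead of the file, and it only treats lines containing "=" as settings.

```python
def set_config_value(lines, key, value):
    """Return the lines with the `key=...` entry replaced in place; other lines are copied."""
    updated = []
    for line in lines:
        name = line.split("=", 1)[0].strip()
        # Case-insensitive comparison of the setting name, similar to `if /i`
        if "=" in line and name.lower() == key.lower():
            updated.append(f"{key}={value}")
        else:
            updated.append(line)
    return updated

# Made-up config content around the line from the question.
config = ["port=1433", "sqlServer=localhost", "timeout=30"]
print("\n".join(set_config_value(config, "sqlServer", r"myMachine\sql2012")))
```

The replaced line keeps its original position, and every other line passes through unchanged.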
You can do this:
FINDSTR /I /V /B "sqlServer=" config.txt > config.new
ECHO sqlServer=myMachine\sql2012 >> config.new
DEL config.txt
REN config.new config.txt
The FINDSTR will remove all lines that start with sqlServer= and write the remaining lines to a new file called config.new.
The ECHO will add a line at the end with sqlServer=myMachine\sql2012.
The last two lines delete your existing config.txt and replace it with the output of the first two lines.
I have a log file containing a stack trace split over a number of lines. I need to read this file into a batch file and remove all of the line breaks.
As a first step, I tried this:
if exist "%log_dir%\Log.log" (
for /F "tokens=*" %%a in ("%log_dir%\Log.log") do @echo %%a
)
My expectation was that this would echo out each line of the log file. I was then planning to concatenate these lines together and set that value in a variable.
However, this code doesn't do what I would expect. I have tried changing the value of the options for delims and tokens, but the only output I can get is the absolute path to the log file and nothing from the contents of this file.
How can I set a variable to be equal to the lines of text in a file with the line breaks removed?
If you want to use quotes around your filename in the FOR /F loop, you need to add the usebackq option too; otherwise the loop processes the quoted string itself, not the content of your file.
for /F "usebackq delims=" %%a in ("%log_dir%\Log.log") do @echo %%a
Or remove the quotes
if exist "%log_dir%\Log.log" (
for /F "tokens=*" %%a in (%log_dir%\Log.log) do @echo %%a
)
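Once the lines are being read, the remaining step from the question is to join them into a single value. As a language-neutral illustration of that step (Python here, with a made-up function name and a made-up stack trace), the concatenation could look like this:

```python
def join_lines(text, separator=" "):
    """Collapse a multi-line string into one line, dropping empty lines."""
    return separator.join(line.strip() for line in text.splitlines() if line.strip())

# Made-up log content with Windows line endings and one empty line.
log = (
    "java.lang.IllegalStateException: boom\r\n"
    "    at Foo.bar(Foo.java:10)\r\n"
    "\r\n"
    "    at Main.main(Main.java:3)\r\n"
)
print(join_lines(log))
```

Each line is trimmed before joining, so the indentation of the stack frames does not end up in the result.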