Windows batch strip tags - batch-file

File with HTML Code:
<table>
<tr><th>ID</th><th>NAME</th></tr>
<tr><th>1</th><th>Alex</th></tr>
<tr><th>2</th><th>Andy</th></tr></table>
How to OUTPUT DATA WITHOUT TAGS with windows .bat file? (no vb)
Like this:
1:Alex
2:Andy
Thanks

I like batch, but honestly, it is not a suitable tool for processing XML or HTML files.
The following is more an exercise in logic and pain than a proper solution, but it works, at least with input like your example:
@echo off
setlocal EnableDelayedExpansion
for /f "delims=" %%a in (t.txt) do call :process "%%a"
goto :eof
:process
set "line=%~1"
set flag=0
set var=
for /l %%i in (0,1,100) do (
if "!line:~%%i,1!"=="<" (
set /a "flag+=1"
set "var=!var!:"
)
if !flag!==0 set "var=!var!!line:~%%i,1!"
if "!line:~%%i,1!"==">" set /a flag-=1
)
for /f "tokens=1,2 delims=:" %%b in ("!var!") do echo %%b:%%c
How it works:
The first for loop processes the text file line by line.
The subroutine walks through the line character by character (the for /l loop covers positions 0 to 100, so longer lines are cut off). It increases flag each time it hits a < and appends a : to var, because a value may follow the tag; it decreases flag at each >. While flag is zero we are "outside" of a tag, so the character is added to var.
The last for just reformats var, because it contains too many : characters (one is added at every tag start).
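For readers who find the batch version hard to follow, the same depth-counter idea can be sketched in Python. This is only an illustration of the logic described above, not part of the original answer; the function name strip_tags and the sample list are made up for the example.

```python
def strip_tags(line: str) -> str:
    """Keep only the text outside of tags, using a depth counter like the batch flag."""
    depth = 0
    out = []
    for ch in line:
        if ch == "<":
            depth += 1
            out.append(":")   # marker: a value may follow this tag
        if depth == 0:
            out.append(ch)    # we are outside of any tag
        if ch == ">":
            depth -= 1
    # Drop the surplus ":" markers and keep at most two fields,
    # roughly what the final for /f with tokens=1,2 does.
    fields = [f for f in "".join(out).split(":") if f]
    return ":".join(fields[:2])


html = [
    "<table>",
    "<tr><th>ID</th><th>NAME</th></tr>",
    "<tr><th>1</th><th>Alex</th></tr>",
    "<tr><th>2</th><th>Andy</th></tr></table>",
]
for row in html:
    text = strip_tags(row)
    if text:
        print(text)
```

Like the batch script, this sketch also prints the header row (ID:NAME); skipping it would need an extra check.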

Related

Avoid a null value in last for loop iteration?

I have a text file with one string per line (only a couple lines total). This file needs to be searched and each string stored. The script will ultimately prompt the user to choose one of the stored strings (if more than one string/line is present in the file), but the for loop is iterating an extra time when I don't want it to.
I'm using a counter to check the number of iterations. I also read that it's probably a CR/LF issue, but putting findstr /v /r /c:"^$" in the for file-set, as in the post "Batch file for loop appears to be running one extra time/iteration", doesn't work for me (but I probably don't understand it correctly).
The "pref" term is because the strings are to be eventually used as a prefix of files in the same folder...
@echo off
setlocal enabledelayedexpansion
set /a x=1
for /f %%a in (sys.txt) do (
set pref!x!=%%a && echo %^%pref!X!%^% && set /a x+=1
)
echo last value of x = !x!
for /L %%a in (1,1,!x!) do (
echo !pref%%a!
)
REM The rest would be to prompt user to choose one (if multiple) and
REM then use choice as a prefix with a ren %%a %prefX%%%a
If the "sys.txt" contains three lines with strings A, B, C respectively, then the output I currently get is:
pref1
pref2
pref3
last value of x = 4
A
B
C
ECHO is off.
ECHO is off. is not desired, clearly.
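The loop itself does not run an extra time. x starts at 1 and is incremented after each line is stored, so it ends up one higher than the number of lines, and the final for /L then reads a variable that was never set. Change the increment structure like this: start from a base of 0 and increment before storing each line.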
@Echo Off
Setlocal EnableDelayedExpansion
Set "i=0"
For /F "UseBackQ Delims=" %%A In ("sys.txt") Do (
Set/A "i+=1"
Set "pref!i!=%%A"
Echo(pref!i!
)
Echo(last value of i = %i%
For /L %%A in (1,1,%i%) Do Echo(!pref%%A!

Reformatting Web Query Results for use in Batch

This is working, but doesn't feel elegant to me. I'm creating an automated movie archive script in batch and would like to automatically find a movie title based on the disc volume name. The web query is done via tmdb, but returned results is difficult to parse since it isn't meant for batch. The results would be a contiguous line like:
{"page":1,"results":[{"poster_path":"\/5ttOaThDVmTpV8iragbrhdfxEep.jpg","adult":false,"overview":"At the height of the Cold War, a mysterious criminal organization plans to use nuclear weapons and technology to upset the fragile balance of power between the United States and Soviet Union. CIA agent Napoleon Solo and KGB agent Illya Kuryakin are forced to put aside their hostilities and work together to stop the evildoers in their tracks. The duo's only lead is the daughter of a missing German scientist, whom they must find soon to prevent a global catastrophe.","release_date":"2015-08-13","genre_ids":[35,28,12],"id":203801,"original_title":"The Man from U.N.C.L.E.","original_language":"en","title":"The Man from U.N.C.L.E.","backdrop_path":"\/bKxcCNv2xq8M3GD5iSrv9bMGDVa.jpg","popularity":5.346674,"vote_count":1842,"video":false,"vote_average":7},{"poster_path":"\/3VScfiBmE1loQxMkuN1suALv4f8.jpg","adult":false,"overview":"When THRUSH steals a nuclear weapon and demands a ransom delivered by Napoleon Solo, UNCLE recalls him and his partner to duty.","release_date":"1983-04-05","genre_ids":[28,80,53,10770],"id":94116,"original_title":"The Return of the Man from U.N.C.L.E.: The Fifteen Years Later Affair","original_language":"en","title":"The Return of the Man from U.N.C.L.E.: The Fifteen Years Later Affair","backdrop_path":"\/5LGBhGg5Tj9OSW4rD0itz0sYKPT.jpg","popularity":1.046707,"vote_count":5,"video":false,"vote_average":3.6}],"total_results":2,"total_pages":1}
You don't really know what you're going to get or how many titles will be returned. Dumping this into a file and reading back tokens doesn't make sense. The delimiter is a string (,") so I've come up with the following script which does function.
@echo off
setlocal EnableDelayedExpansion
set _tmdbReturn=
set _metaDataFile=
set _metaDataFile="C:\some path\metaData.txt"
set _metaDataFile=%_metaDataFile:~1,-1%
:: Do a movie title search based on a Disc Volume Label
for /f "usebackq delims=" %%a in (`PowerShell -Command "(new-object net.webclient).DownloadString('https://api.themoviedb.org/3/search/movie?api_key=xxx&query=The+Man+from+uncle')"`) do (set _tmdbReturn=%%a)
:: Result is in a contiguous string and the delimiter is a string with a comma and double quotes (,")
:: Replace delimiter string with a single character that does not occur in tmdb data
set _tmdbReturn=%_tmdbReturn:,"=#"%
set _tmdbReturn=%_tmdbReturn:"=%
:: replace unique single character with a line feed
set _tmdbReturn=!_tmdbReturn:#=^
!
:: Eliminate the special character
set _tmpdbReturn=!_tmdbReturn:#=!
:: Rewrite data to txt file with row separated data.
echo !_tmdbReturn!>"%_metaDataFile%"
set x=
set /a x=0
for /f "tokens=* delims=#" %%a in ('type "%_metaDataFile%"') do (
if !x!==0 (
set _newline=%%a
echo !_newline!>"%_metaDataFile%"
) else (
set _newline=%%a
echo !_newline!>>"%_metaDataFile%"
)
set /a x+=1
)
My question is two fold...is there a better way to do this? I also have not figured out how to write to the _metaDataFile without first dumping !_tmdbReturn! into a txt file. I've tried replacing the command of the last For Loop with
for /f "tokens=* delims=#" %%a in ('echo !_tmdbReturn!') do (
Only the first token writes yet
echo !_tmdbReturn!
displays the data properly producing the following:
{page:1
results:[{poster_path:\/5ttOaThDVmTpV8iragbrhdfxEep.jpg
adult:false
overview:At the height of the Cold War, a mysterious criminal organization plans to use nuclear weapons and technology to upset the fragile balance of power between the United States and Soviet Union. CIA agent Napoleon Solo and KGB agent Illya Kuryakin are forced to put aside their hostilities and work together to stop the evildoers in their tracks. The duo's only lead is the daughter of a missing German scientist, whom they must find soon to prevent a global catastrophe.
release_date:2015-08-13
genre_ids:[35,28,12]
id:203801
original_title:The Man from U.N.C.L.E.
original_language:en
title:The Man from U.N.C.L.E.
backdrop_path:\/bKxcCNv2xq8M3GD5iSrv9bMGDVa.jpg
popularity:5.346674
vote_count:1842
video:false
vote_average:7},{poster_path:\/3VScfiBmE1loQxMkuN1suALv4f8.jpg
adult:false
overview:When THRUSH steals a nuclear weapon and demands a ransom delivered by Napoleon Solo, UNCLE recalls him and his partner to duty.
release_date:1983-04-05
genre_ids:[28,80,53,10770]
id:94116
original_title:The Return of the Man from U.N.C.L.E.: The Fifteen Years Later Affair
original_language:en
title:The Return of the Man from U.N.C.L.E.: The Fifteen Years Later Affair
backdrop_path:\/5LGBhGg5Tj9OSW4rD0itz0sYKPT.jpg
popularity:1.046707
vote_count:5
video:false
vote_average:3.6}]
total_results:2
total_pages:1}
I'm attempting to redirect echo !_tmdbReturn! to the Find function extracting a particular value by name. I can do it in a file using findstr, but was trying it on the variable. I'm not fluent in batch so any suggestions are appreciated.
In case it's useful for someone, I settled on the following:
set x=
set /a x=0
set y=
set /a y=0
:: clean up the beginning of the data replace {" with " so poster_path is passed as a value
set _tmdbReturn=%_tmdbReturn:{"="%
set _tmdbReturn=%_tmdbReturn:~0,-1%
for /F "tokens=1* delims=[" %%a in ("!_tmdbReturn!") do ( set _tmdbReturn=%%b)
rem Separate the string in lines at ," delimiter
for /F "delims=" %%a in (^"!_tmdbReturn:^,^"^=^
% Do NOT remove this line %
!^") do (
set "line=%%a"
rem Eliminate quotes
set "line=!line:"=!"
rem Show lines of desired values only
for /F "tokens=1* delims=:" %%b in ("!line!") do (
if "%%b" equ "poster_path" set /a x+=1
if "%%b" equ "total_results" (
call set _movie.%%b=%%c
) else (
echo call set _movie[!x!].%%b=%%c
call set _movie[!x!].%%b=%%c
)
)
)
This gives me an array of the returned results with structured object properties that I can use as my script morphs. This may be old hat to most, but I'm having fun!
The code below separates your long string in several lines:
@echo off
setlocal EnableDelayedExpansion
:: Do a movie title search based on a Disc Volume Label
:: for /f "usebackq delims=" %%a in (`PowerShell -Command "(new-object net.webclient).DownloadString('https://api.themoviedb.org/3/search/movie?api_key=xxx&query=The+Man+from+uncle')"`) do (set _tmdbReturn=%%a)
for /F "delims=" %%a in (input.txt) do set "_tmdbReturn=%%a"
rem Separate the string in lines at ," delimiter
for /F "delims=" %%a in (^"!_tmdbReturn:^,^"^=^
% Do NOT remove this line %
!^") do (
set "line=%%a"
rem Eliminate quotes
echo !line:"=!
)
I stored your long line in the input.txt file for my tests.
About "redirect echo !_tmdbReturn! to the Find function extracting a particular value by name": if you show exactly what you want, perhaps I could show you how to get an equivalent result in a simpler way (without using the find command)...
EDIT: Show just desired values
If you do not want to create a file with all the lines, but just show the lines for a desired value, then you can look for that value directly in each line:
@echo off
setlocal EnableDelayedExpansion
for /F "delims=" %%a in (input.txt) do set "_tmdbReturn=%%a"
rem Separate the string in lines at ," delimiter
for /F "delims=" %%a in (^"!_tmdbReturn:^,^"^=^
% Do NOT remove this line %
!^") do (
set "line=%%a"
rem Eliminate quotes
set "line=!line:"=!"
rem Show lines of desired values only
for /F "tokens=1* delims=:" %%b in ("!line!") do (
if "%%b" equ "original_title" echo %%b: %%c
)
)
You may also look for several values; just define the list of desired values at the beginning, enclosing each value in a delimiter character:
set "values=/title/original_title/"
... and change the if command inside the for by this one:
if "!values:/%%b/=!" neq "%values%" echo %%b: %%c
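On the "is there a better way" part of the question: the tmdb response is JSON, so a language with a JSON parser can read the fields directly instead of splitting the string at ,". The following Python sketch is only an illustration under that assumption; the function name titles and the shortened sample string are made up for the example, and it is not a drop-in replacement for the batch code above.

```python
import json


def titles(raw: str) -> list[tuple[str, str]]:
    """Return (title, release_date) for every entry under "results"."""
    data = json.loads(raw)
    return [(m["title"], m["release_date"]) for m in data.get("results", [])]


# Shortened sample: only the two fields used here are kept from the real response.
sample = (
    '{"page":1,"results":['
    '{"title":"The Man from U.N.C.L.E.","release_date":"2015-08-13"},'
    '{"title":"The Return of the Man from U.N.C.L.E.: The Fifteen Years Later Affair",'
    '"release_date":"1983-04-05"}'
    '],"total_results":2,"total_pages":1}'
)

for title, date in titles(sample):
    print(f"{date}  {title}")
```

Because the parser understands quoting and nesting, values that contain commas, colons or quotes need no special handling.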

print specific lines from a batch file

I am trying to print Line 4, Col 21-50 out of a text file, can this be simply done under Windows somehow? I've been trying to do this:
FOR /F "usebackq tokens=1 delims=-" %G IN (%COMPUTERNAME%.txt) DO ECHO %G
This is just working out terribly. Can't I just print a specific set of lines?
I need this script to be run on multiple computers, ideally I'd like to convert it to a variable for use with slmgr -ipk, maybe someone has a better suggestion?
Contents of text file (I want the XXXXX-XXXXX-XXXXX-XXXXX-XXXXX portion):
==================================================
Product Name : Windows 7 Professional
Product ID : 00371-OEM-9044632-95844
Product Key : XXXXX-XXXXX-XXXXX-XXXXX-XXXXX
Installation Folder : C:\Windows
Service Pack : Service Pack 1
Computer Name : LIBRA
Modified Time : 6/4/2015 7:26:54 PM
==================================================
if you want only the "Product Key" line you can try with
type %COMPUTERNAME%.txt|find /i "Product Key"
or
for /f "tokens=2 delims=:" %%# in (' type %COMPUTERNAME%.txt^|find /i "Product Key"') do echo %%#
For the task at hand, npocmaka's answer is the most suitable approach, as it does not depend on a fixed position of the string in the file.
However, I want to provide a variant that sticks to a certain position.
The following code extracts the string starting at column 21 in line 4 of the file list.txt. The substring expression ~20,29 takes 29 characters from zero-based offset 20, that is columns 21 to 49, which is the length of the product key. The result is echoed (enclosed in "") and stored in the variable LINE_TXT (without ""):
@echo off
for /F "tokens=1,* delims=:" %%L in (
'findstr /N /R ".*" "list.txt"'
) do (
if %%L equ 4 (
set "LINE_TXT=%%M"
goto :NEXT
)
)
:NEXT
if defined LINE_TXT set "LINE_TXT=%LINE_TXT:~20,29%"
echo."%LINE_TXT%"
The goto :NEXT command terminates the for /F loop at the given line; this is not mandatory but will improve performance for huge files (as long as the given line number is quite small).
To be more flexible, the following code can be used (define the string position in the initial set block):
@echo off
rem Define the string position here:
set FILE_TXT="list.txt"
set LINE_NUM=4
set COL_FROM=21
set COL_UPTO=50
setlocal EnableDelayedExpansion
set /A COL_UPTO-=COL_FROM
set /A COL_FROM-=1
for /F "tokens=1,* delims=:" %%L in (
'findstr /N /R ".*" %FILE_TXT%'
) do (
if %%L equ %LINE_NUM% (
set "LINE_TXT=%%M"
if defined LINE_TXT (
set "LINE_TXT=!LINE_TXT:~%COL_FROM%,%COL_UPTO%!"
)
goto :NEXT
)
)
:NEXT
endlocal & set "LINE_TXT=%LINE_TXT%"
echo."%LINE_TXT%"
Both of the above code snippets rely on the output of findstr /N /R ".*". The regular expression .* means zero or more characters, so it matches every line in the file; the /N switch prefixes each line with its line number and a colon. The for /F loop splits off that number (delims=:) and compares it with the requested one.
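The position arithmetic of the flexible variant can be checked with a small Python sketch. It is only an illustration of the logic, not part of the original answer: the function name extract is made up, and the padding of the "Product Key" line is an assumption, since the spacing of the sample file is not preserved here.

```python
def extract(lines, line_num, col_from, col_upto):
    """Return the text from 1-based column col_from up to, but excluding, col_upto."""
    length = col_upto - col_from   # set /A COL_UPTO-=COL_FROM
    offset = col_from - 1          # set /A COL_FROM-=1
    for number, text in enumerate(lines, start=1):   # numbering as done by findstr /N
        if number == line_num:
            return text[offset:offset + length]      # !LINE_TXT:~offset,length!
    return ""


# Assumed layout: label padded to 18 characters, then ": ", so the key starts at column 21.
key_line = "Product Key".ljust(18) + ": " + "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX"
report = ["=" * 50, "Product Name", "Product ID", key_line]

print(extract(report, 4, 21, 50))
```

With COL_FROM=21 and COL_UPTO=50 the computed length is 29, so column 50 itself is not included.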
Here is another variant which uses for /F to directly loop through the content (lines) of the given text file, without using findstr:
@echo off
for /F "usebackq skip=3 eol== delims=" %%L in (
"list.txt"
) do (
set "LINE_TXT=%%L"
goto :NEXT
)
:NEXT
if defined LINE_TXT set "LINE_TXT=%LINE_TXT:~20,29%"
echo."%LINE_TXT%"
This method performs better than the findstr variant, because the skip option makes for /F pass over the lines before the line of interest (1 to 3) without parsing or iterating them.
However, there is one disadvantage:
for /F features an eol option which defines a character interpreted as line comment (and defaults to ;); there is no way to switch this option off as long as delims= defines no delimiters (last position in option string), which is mandatory here to return the line as is; so you have to find a character that does not appear as the first one in any line (I defined = here because your sample text file uses this as header/footer character only).
To extract a string from line 1, remove the skip option as skip=0 results in a syntax error.
Note that goto :NEXT is required here; otherwise, the last (non-empty) line of the file is extracted.
Although for /F does not iterate over empty lines, this is no problem here: the skip option does not check the line content, so it counts empty lines as well.
Finally, here is one more approach using more +3 where no text parsing is done. However, a temporary file is needed here to pass the text of the desired line to the variable LINE_TXT:
@echo off
set LINE_TXT=
more +3 "list.txt" > "list.tmp"
set /P LINE_TXT= < "list.tmp"
del /Q "list.tmp"
if defined LINE_TXT set "LINE_TXT=%LINE_TXT:~20,29%"
echo."%LINE_TXT%"
exit /B 0
This method avoids for /F and therefore the problem with the unwanted eol option mentioned in the previous solution. However, it does not handle tabs correctly, because more replaces them with spaces (8 per tab by default, configurable with the /Tn switch, where n is the number of spaces).

Trying to reformat a very large csv with a batch file

I have an application that exports data in the format:
1a,1b,1c1,1c2,1c3, ... (up to 1c100),1d1,1d2,1d3, ... (up to 1d100)
2a,2b,2c1,2c2,2c3, ... (up to 2c100),2d1,2d2,2d3, ... (up to 2d100)
etc.
and I am trying to reformat this into
1a,1b,1c1,1d1
1a,1b,1c2,1d2
.
.
1a,1b,1c100,1d100
2a,2b,2c1,2d1
2a,2b,2c2,2d2
etc.
I figured that if this can be done a row at a time I can just loop through the file. However I can't find a way of doing a single row with either tokens, a list, or even as a string function. There is too much data to process in a single operation (each value is about 12 chars). Tokens limit at (roughly) 64/202, a list at about 107/202 and a string at about 1000/2300
Does anyone know how this can be written into a new file?
I was trying things like:
@echo off
setlocal enableDelayedExpansion
set dimCnt=0
<example.csv (
set /p "dimList=" >nul
for %%D in (!dimList!) do (
set /a dimCnt+=1
set "dim[!dimCnt!]=%%D"
)
)
echo
for /l %%I in (3 1 102) do echo !dim[1]!,!dim[2]!,!dim[%%I]!
..besides the fact that I have missed out the last variable in the line (need to add 100 to it), I can't get more than about 80-110 values out of the list (I guess it depends on value string length)
@echo off
setlocal enableextensions enabledelayedexpansion
(for /f "tokens=1,2,* delims=," %%a in (example.csv) do (
set "data=%%c"
set "i=0"
for %%f in ("!data:,=" "!") do (
set /a "i+=1"
set "d[!i!]=%%~f"
)
set /a "end=!i!/2"
set /a "j=!end!+1"
for /l %%i in (1 1 !end!) do (
for %%j in (!j!) do echo %%a,%%b,!d[%%i]!,!d[%%j]!
set /a "j+=1"
)
)) > output.csv
endlocal
This iterates over the file, getting the first two tokens in the line (%%a and %%b). The rest of the line (%%c) is split, and each value is stored in an environment variable array (kind of). Then the array is iterated from the start and from the middle, reading the values to append to %%a and %%b and generating the output file.
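The "iterate from the start and from the middle" step may be easier to see in a short Python sketch. It only illustrates the pairing logic described above; the function name reshape is made up, and the sample row is shortened to three c and three d values instead of 100.

```python
def reshape(row: str) -> list[str]:
    """Turn a,b,c1..cN,d1..dN into N rows of the form a,b,ci,di."""
    a, b, *rest = row.split(",")
    half = len(rest) // 2   # same as set /a "end=!i!/2"
    # Pair the i-th value of the first half with the i-th value of the second half.
    return [f"{a},{b},{rest[i]},{rest[half + i]}" for i in range(half)]


for out in reshape("1a,1b,1c1,1c2,1c3,1d1,1d2,1d3"):
    print(out)
```

The sketch assumes every row has at least two leading fields and an even number of remaining values, as in the format shown in the question.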
#ECHO OFF
SETLOCAL
(
FOR /f "tokens=1,2,*delims=," %%a IN (u:\long.csv) DO (
SET rpta=%%a
SET rptb=%%b
CALL :rptcd %%c
)
)>newfile.txt
GOTO :EOF
:rptcd
SET /a lines=100
SET lined=%*
FOR /l %%x IN (1,1,99) DO CALL SET lined=%%lined:*,=%%
:loop
IF %lines%==0 GOTO :EOF
SET /a lines-=1
CALL SET lined=%lined:*,=%
FOR /f "delims=," %%x IN ("%lined%") DO ECHO %rpta%,%rptb%,%1,%%x&shift&GOTO loop
GOTO :eof
This should get you going - just need to change the input filename and output filename...
Your code does not work because SET /P cannot read more than 1023 bytes. At that point it returns the data read so far, and the next SET /P picks up where it left off. Adapting your code to compensate will be very difficult. You would be better off using FOR /F as in MC ND's answer. But beware, batch has a hard limit of 8191 characters per line in pretty much all contexts.
Better yet, you could use another scripting language like JScript, VBS, or PowerShell. Performance will be much better, and the code much more robust and far less arcane. I love working with batch, but it simply is not a good text processing language.

How to change an image tag url in multiple html files using batch script?

There are more than 10 HTML files with image tags. Every time we deploy our build to the test site we need to change the img source, for example from <img src=/live/Content/xyz.png />
to <img src=/test/Content/xyz.png />.
After looking around and reading for some time, I have come up with the following batch script, but I can't figure out how to go further from here:
for /r %%i in (*.html) do echo %%i
for %%f in (*.html) do (
FOR /F %%L IN (%%f) DO (
SET "line=%%L"
SETLOCAL ENABLEDELAYEDEXPANSION
SET "x= <--------------------WHAT DO I SET HERE?
echo %x%
ENDLOCAL )) pause
This is my first batch script, could anyone please guide me in the right direction?
@ECHO OFF
SETLOCAL enabledelayedexpansion
for /r U:\ %%i in (*.html) do (
echo found %%i
SET outfile="%%~dpni.lmth"
(
SETLOCAL disabledelayedexpansion
FOR /F "usebackq delims=" %%L IN ("%%i") DO (
SET "line=%%L"
SETLOCAL ENABLEDELAYEDEXPANSION
SET "line=!line:/live/=/test/!"
echo !line!
ENDLOCAL
)
ENDLOCAL
)>!outfile!
)
pause
GOTO :EOF
How about this development?
Notes:
I've modified your FOR/R to ECHO the HTML file being processed and use %%i rather than switching to %%f. U: is my RAMDRIVE; you'd need to modify that to suit.
outfile is set to generate a filename which matches the HTML filename, but with a .lmth extension (can't update in-place) - it gets that from the ~dpn prefixing the i, which means the drive, path and name of the file %%i. It's quoted to take care of potential spaces in the filename or pathname.
The next logical statement is (for /f...[lines] )>!outfile!, which sends any echoed text to a NEW file !outfile!. The enabledelayedexpansion in the second physical line of the batch makes !outfile! the RUN-TIME value, as it is changed within the outer FOR /R loop.
Since the actual HTML filename in %%i may contain spaces, it needs to be quoted, hence the 'usebackq' clause in the FOR/F. The delims= clause ensures that the ENTIRE line from the file "%%i" is applied to %%L - not just the first token (well, actually, makes the entire line the first token).
The SET command substitutes the string "/test/" for any occurrence of "/live/" in the RUN-TIME value of the variable line and assigns the result to line. The resultant value is then ECHOed, which is redirected to outfile.
Note that in your original, you would be assigning x in the set x= but echo %x% would have reproduced x as it stood when the line was PARSED because batch substitutes the value of any variable for %var% as part of the parsing phase. Consequently, the line would have become simply ECHO (since x would likely be unassigned) and bizarrely would have reported the echo state (Echo is OFF)
A couple of gotchas here. First, % and some other characters are notoriously hard to process with batch, so be careful. Next, FOR /F will bypass empty lines. This can be overcome if required. Third, this will replace ANY occurrence of /live/, regardless of case, with /test/.
Good luck!
Edit to support exclamation marks: 20130711T0624Z
Added the SETLOCAL disabledelayedexpansion line, and the matching ENDLOCAL just before )>!outfile!
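The substitution step described in the notes above, including the case-insensitive matching mentioned in the third gotcha, can be sketched in Python for comparison. This is only an illustration; the function name to_test_urls and the second sample line are made up, and the sketch works on single strings rather than on files.

```python
import re


def to_test_urls(line: str) -> str:
    """Replace every /live/ with /test/, ignoring case like the batch SET substitution."""
    return re.sub(r"/live/", "/test/", line, flags=re.IGNORECASE)


print(to_test_urls("<img src=/live/Content/xyz.png />"))
print(to_test_urls("<a href=/LIVE/index.html>"))
```

Dropping flags=re.IGNORECASE would restrict the replacement to the exact lower-case spelling.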

Resources