Trying to reformat a very large csv with a batch file - arrays

I have an application that exports data in the format:
1a,1b,1c1,1c2,1c3, ... (up to 1c100),1d1,1d2,1d3, ... (up to 1d100)
2a,2b,2c1,2c2,2c3, ... (up to 2c100),2d1,2d2,2d3, ... (up to 2d100)
etc.
and I am trying to reformat this into
1a,1b,1c1,1d1
1a,1b,1c2,1d2
.
.
1a,1b,1c100,1d100
2a,2b,2c1,2d1
2a,2b,2c2,2d2
etc.
I figured that if this can be done a row at a time I can just loop through the file. However I can't find a way of doing a single row with either tokens, a list, or even as a string function. There is too much data to process in a single operation (each value is about 12 chars). Tokens limit at (roughly) 64/202, a list at about 107/202 and a string at about 1000/2300
Does anyone know how this can be written into a new file?
I was trying things like:
@echo off
setlocal enableDelayedExpansion
set dimCnt=0
<example.csv (
set /p "dimList=" >nul
for %%D in (!dimList!) do (
set /a dimCnt+=1
set "dim[!dimCnt!]=%%D"
)
)
echo
for /l %%I in (3 1 102) do echo !dim[1]!,!dim[2]!,!dim[%%I]!
Besides the fact that I have left out the last variable on the line (I need to add 100 to the index), I can't get more than about 80-110 values out of the list (I guess it depends on the length of each value).

@echo off
setlocal enableextensions enabledelayedexpansion
(for /f "tokens=1,2,* delims=," %%a in (example.csv) do (
set "data=%%c"
set "i=0"
for %%f in ("!data:,=" "!") do (
set /a "i+=1"
set "d[!i!]=%%~f"
)
set /a "end=!i!/2"
set /a "j=!end!+1"
for /l %%i in (1 1 !end!) do (
for %%j in (!j!) do echo %%a,%%b,!d[%%i]!,!d[%%j]!
set /a "j+=1"
)
)) > output.csv
endlocal
This iterates over the file, getting the first two tokens in the line (%%a and %%b). The rest of the line (%%c) is split and each value is stored in an environment variable array (kind of). Then the array is iterated from the start and from the middle at the same time, reading the values needed to append to %%a and %%b and generating the output file.

@ECHO OFF
SETLOCAL
(
FOR /f "tokens=1,2,*delims=," %%a IN (u:\long.csv) DO (
SET rpta=%%a
SET rptb=%%b
CALL :rptcd %%c
)
)>newfile.txt
GOTO :EOF
:rptcd
SET /a lines=100
SET lined=%*
FOR /l %%x IN (1,1,99) DO CALL SET lined=%%lined:*,=%%
:loop
IF %lines%==0 GOTO :EOF
SET /a lines-=1
CALL SET lined=%lined:*,=%
FOR /f "delims=," %%x IN ("%lined%") DO ECHO %rpta%,%rptb%,%1,%%x&shift&GOTO loop
GOTO :eof
This should get you going - just need to change the input filename and output filename...

Your code does not work because SET /P cannot read more than 1023 bytes. At that point it returns the data read so far, and the next SET /P picks up where it left off. Adapting your code to compensate will be very difficult. You would be better off using FOR /F as in MC ND's answer. But beware, batch has a hard limit of 8191 characters per line in pretty much all contexts.
Better yet, you could use another scripting language like JScript, VBS, or PowerShell. Performance will be much better, and the code much more robust and far less arcane. I love working with batch, but it simply is not a good text processing language.
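As a rough illustration of the "use another scripting language" suggestion, here is a minimal Python sketch of the reshape the question asks for. Python is my choice and is not one of the languages named above; the function names reshape_row and reshape_csv are made up for this example, and it assumes every row has two leading columns followed by two equally sized groups.

```python
import csv
import io

def reshape_row(row, fixed=2):
    """Turn [a, b, c1..cN, d1..dN] into N rows of [a, b, ci, di]."""
    head, rest = row[:fixed], row[fixed:]
    half = len(rest) // 2          # assumes the c and d groups have equal size
    return [head + [rest[i], rest[half + i]] for i in range(half)]

def reshape_csv(src, dst):
    """Stream src to dst one input row at a time, so file size does not matter."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        if row:                    # skip blank lines
            writer.writerows(reshape_row(row))

if __name__ == "__main__":
    # Shortened sample: 3 values per group instead of 100.
    sample = io.StringIO(
        "1a,1b,1c1,1c2,1c3,1d1,1d2,1d3\n"
        "2a,2b,2c1,2c2,2c3,2d1,2d2,2d3\n"
    )
    out = io.StringIO()
    reshape_csv(sample, out)
    print(out.getvalue(), end="")
```

For real files, pass objects opened with open(path, newline="") instead of the StringIO buffers. Because each row is handled on its own, the per-line length limits discussed above do not come into play.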

Related

Windows batch strip tags

File with HTML Code:
<table>
<tr><th>ID</th><th>NAME</th></tr>
<tr><th>1</th><th>Alex</th></tr>
<tr><th>2</th><th>Andy</th></tr></table>
How to OUTPUT DATA WITHOUT TAGS with windows .bat file? (no vb)
Like this:
1:Alex
2:Andy
Thanks
I like batch, but honestly: it is not a suitable tool for processing xml files.
The following is more an exercise in logic and pain than a suitable solution (but works. At least with something like your example...):
@echo off
setlocal EnableDelayedExpansion
for /f "delims=" %%a in (t.txt) do call :process "%%a"
goto :eof
:process
set "line=%~1"
set flag=0
set var=
for /l %%i in (0,1,100) do (
if "!line:~%%i,1!"=="<" (
set /a "flag+=1"
set "var=!var!:"
)
if !flag!==0 set "var=!var!!line:~%%i,1!"
if "!line:~%%i,1!"==">" set /a flag-=1
)
for /f "tokens=1,2 delims=:" %%b in ("!var!") do echo %%b:%%c
How it works:
The first for loop processes each line of the text file, one after the other.
The subroutine processes the line character by character. It increases the flag variable each time it hits a < (and adds a :, because we know a possible value starts where the tag ends) and decreases it at each >. So if flag is zero, we are "outside" of a tag and can add the character to a variable.
The last for just reformats var, because there are too many : characters (a : is added at every tag start).
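The same flag logic may be easier to follow outside batch. Below is a small Python sketch of it (Python is my choice here, and strip_tags is a made-up name); it is a transliteration for illustration, not a replacement for a real HTML parser, and unlike the batch version it does not stop at 100 characters.

```python
def strip_tags(line):
    """Mimic the batch subroutine: track tag depth, mark each tag start with ':'."""
    depth = 0
    out = []
    for ch in line:
        if ch == "<":
            depth += 1
            out.append(":")        # a value may follow once this tag closes
        if depth == 0:
            out.append(ch)         # outside any tag: keep the character
        if ch == ">":
            depth -= 1
    # Like "tokens=1,2 delims=:", collapse runs of ':' and keep the first two fields.
    fields = [f for f in "".join(out).split(":") if f]
    return ":".join(fields[:2])

if __name__ == "__main__":
    html = [
        "<table>",
        "<tr><th>ID</th><th>NAME</th></tr>",
        "<tr><th>1</th><th>Alex</th></tr>",
        "<tr><th>2</th><th>Andy</th></tr></table>",
    ]
    for row in html:
        text = strip_tags(row)
        if text:                   # lines with no text outside tags give nothing
            print(text)
```

As in the batch version, the header row is printed too, and a value that itself contains a : would be split.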

Avoid a null value in last for loop iteration?

I have a text file with one string per line (only a couple lines total). This file needs to be searched and each string stored. The script will ultimately prompt the user to choose one of the stored strings (if more than one string/line is present in the file), but the for loop is iterating an extra time when I don't want it to.
I'm using a counter to check the number of iterations. I also read that it's probably a NL CR regex issue, but putting findstr /v /r /c:"^$" in the for file-set, as in the post "Batch file for loop appears to be running one extra time/iteration", doesn't work for me (though I probably don't understand it correctly).
The "pref" term is because the strings are to be eventually used as a prefix of files in the same folder...
@echo off
setlocal enabledelayedexpansion
set /a x=1
for /f %%a in (sys.txt) do (
set pref!x!=%%a && echo %^%pref!X!%^% && set /a x+=1
)
echo last value of x = !x!
for /L %%a in (1,1,!x!) do (
echo !pref%%a!
)
REM The rest would be to prompt user to choose one (if multiple) and
REM then use choice as a prefix with a ren %%a %prefX%%%a
If the "sys.txt" contains three lines with strings A, B, C respectively, then the output I currently get is:
pref1
pref2
pref3
last value of x = 4
A
B
C
ECHO is off.
ECHO is off. is not desired, clearly.
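Your counter starts at 1 and is incremented after each assignment, so it finishes one past the last stored line and the final loop reads an undefined variable. You just need to change your increment structure like this: start from a base of 0 and increment before storing each line.
@Echo Off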
Setlocal EnableDelayedExpansion
Set "i=0"
For /F "UseBackQ Delims=" %%A In ("sys.txt") Do (
Set/A "i+=1"
Set "pref!i!=%%A"
Echo(pref!i!)
Echo(last value of i = %i%
For /L %%A in (1,1,%i%) Do Echo(!pref%%A!

Reformatting Web Query Results for use in Batch

This is working, but doesn't feel elegant to me. I'm creating an automated movie archive script in batch and would like to automatically find a movie title based on the disc volume name. The web query is done via tmdb, but the returned result is difficult to parse since it isn't meant for batch. The result is one contiguous line like:
{"page":1,"results":[{"poster_path":"\/5ttOaThDVmTpV8iragbrhdfxEep.jpg","adult":false,"overview":"At the height of the Cold War, a mysterious criminal organization plans to use nuclear weapons and technology to upset the fragile balance of power between the United States and Soviet Union. CIA agent Napoleon Solo and KGB agent Illya Kuryakin are forced to put aside their hostilities and work together to stop the evildoers in their tracks. The duo's only lead is the daughter of a missing German scientist, whom they must find soon to prevent a global catastrophe.","release_date":"2015-08-13","genre_ids":[35,28,12],"id":203801,"original_title":"The Man from U.N.C.L.E.","original_language":"en","title":"The Man from U.N.C.L.E.","backdrop_path":"\/bKxcCNv2xq8M3GD5iSrv9bMGDVa.jpg","popularity":5.346674,"vote_count":1842,"video":false,"vote_average":7},{"poster_path":"\/3VScfiBmE1loQxMkuN1suALv4f8.jpg","adult":false,"overview":"When THRUSH steals a nuclear weapon and demands a ransom delivered by Napoleon Solo, UNCLE recalls him and his partner to duty.","release_date":"1983-04-05","genre_ids":[28,80,53,10770],"id":94116,"original_title":"The Return of the Man from U.N.C.L.E.: The Fifteen Years Later Affair","original_language":"en","title":"The Return of the Man from U.N.C.L.E.: The Fifteen Years Later Affair","backdrop_path":"\/5LGBhGg5Tj9OSW4rD0itz0sYKPT.jpg","popularity":1.046707,"vote_count":5,"video":false,"vote_average":3.6}],"total_results":2,"total_pages":1}
You don't really know what you're going to get or how many titles will be returned. Dumping this into a file and reading back tokens doesn't make sense. The delimiter is a string (,") so I've come up with the following script which does function.
@echo off
setlocal EnableDelayedExpansion
set _tmdbReturn=
set _metaDataFile=
set _metaDataFile="C:\some path\metaData.txt"
set _metaDataFile=%_metaDataFile:~1,-1%
:: Do a movie title search based on a Disc Volume Label
for /f "usebackq delims=" %%a in (`PowerShell -Command "(new-object net.webclient).DownloadString('https://api.themoviedb.org/3/search/movie?api_key=xxx&query=The+Man+from+uncle')"`) do (set _tmdbReturn=%%a)
:: Result is in a contiguous string and the delimiter is a string with a comma and double quotes (,")
:: Replace delimiter string with a single character that does not occur in tmdb data
set _tmdbReturn=%_tmdbReturn:,"=#"%
set _tmdbReturn=%_tmdbReturn:"=%
:: replace unique single character with a line feed
set _tmdbReturn=!_tmdbReturn:#=^
!
:: Eliminate the special character
set _tmpdbReturn=!_tmdbReturn:#=!
:: Rewrite data to txt file with row separated data.
echo !_tmdbReturn!>"%_metaDataFile%"
set x=
set /a x=0
for /f "tokens=* delims=#" %%a in ('type "%_metaDataFile%"') do (
if !x!==0 (
set _newline=%%a
echo !_newline!>"%_metaDataFile%"
) else (
set _newline=%%a
echo !_newline!>>"%_metaDataFile%"
)
set /a x+=1
)
My question is two fold...is there a better way to do this? I also have not figured out how to write to the _metaDataFile without first dumping !_tmdbReturn! into a txt file. I've tried replacing the command of the last For Loop with
for /f "tokens=* delims=#" %%a in ('echo !_tmdbReturn!') do (
Only the first token writes yet
echo !_tmdbReturn!
displays the data properly producing the following:
{page:1
results:[{poster_path:\/5ttOaThDVmTpV8iragbrhdfxEep.jpg
adult:false
overview:At the height of the Cold War, a mysterious criminal organization plans to use nuclear weapons and technology to upset the fragile balance of power between the United States and Soviet Union. CIA agent Napoleon Solo and KGB agent Illya Kuryakin are forced to put aside their hostilities and work together to stop the evildoers in their tracks. The duo's only lead is the daughter of a missing German scientist, whom they must find soon to prevent a global catastrophe.
release_date:2015-08-13
genre_ids:[35,28,12]
id:203801
original_title:The Man from U.N.C.L.E.
original_language:en
title:The Man from U.N.C.L.E.
backdrop_path:\/bKxcCNv2xq8M3GD5iSrv9bMGDVa.jpg
popularity:5.346674
vote_count:1842
video:false
vote_average:7},{poster_path:\/3VScfiBmE1loQxMkuN1suALv4f8.jpg
adult:false
overview:When THRUSH steals a nuclear weapon and demands a ransom delivered by Napoleon Solo, UNCLE recalls him and his partner to duty.
release_date:1983-04-05
genre_ids:[28,80,53,10770]
id:94116
original_title:The Return of the Man from U.N.C.L.E.: The Fifteen Years Later Affair
original_language:en
title:The Return of the Man from U.N.C.L.E.: The Fifteen Years Later Affair
backdrop_path:\/5LGBhGg5Tj9OSW4rD0itz0sYKPT.jpg
popularity:1.046707
vote_count:5
video:false
vote_average:3.6}]
total_results:2
total_pages:1}
I'm attempting to redirect echo !_tmdbReturn! to the Find function extracting a particular value by name. I can do it in a file using findstr, but was trying it on the variable. I'm not fluent in batch so any suggestions are appreciated.
In case it's useful for someone, I settled on the following:
set x=
set /a x=0
set y=
set /a y=0
:: clean up the beginning of the data replace {" with " so poster_path is passed as a value
set _tmdbReturn=%_tmdbReturn:{"="%
set _tmdbReturn=%_tmdbReturn:~0,-1%
for /F "tokens=1* delims=[" %%a in ("!_tmdbReturn!") do ( set _tmdbReturn=%%b)
rem Separate the string in lines at ," delimiter
for /F "delims=" %%a in (^"!_tmdbReturn:^,^"^=^
% Do NOT remove this line %
!^") do (
set "line=%%a"
rem Eliminate quotes
set "line=!line:"=!"
rem Show lines of desired values only
for /F "tokens=1* delims=:" %%b in ("!line!") do (
if "%%b" equ "poster_path" set /a x+=1
if "%%b" equ "total_results" (
call set _movie.%%b=%%c
) else (
echo call set _movie[!x!].%%b=%%c
call set _movie[!x!].%%b=%%c
)
)
)
This gives me an array of the returned results with structured object properties that I can use as my script morphs. This may be old hat to most, but I'm having fun!
The code below separates your long string in several lines:
@echo off
setlocal EnableDelayedExpansion
:: Do a movie title search based on a Disc Volume Label
:: for /f "usebackq delims=" %%a in (`PowerShell -Command "(new-object net.webclient).DownloadString('https://api.themoviedb.org/3/search/movie?api_key=xxx&query=The+Man+from+uncle')"`) do (set _tmdbReturn=%%a)
for /F "delims=" %%a in (input.txt) do set "_tmdbReturn=%%a"
rem Separate the string in lines at ," delimiter
for /F "delims=" %%a in (^"!_tmdbReturn:^,^"^=^
% Do NOT remove this line %
!^") do (
set "line=%%a"
rem Eliminate quotes
echo !line:"=!
)
I stored your long line in input.txt file for my testings.
About "redirect echo !tmdbReturn! to the Find function extracting a particular value by name"; if you show what exactly you want, perhaps I could show you how to get an equivalent result in a simpler way (without using find command)...
EDIT: Show just desired values
If you do not want to create a file with all the lines, but just show the lines for a desired value, then you may look for that value directly in each line:
@echo off
setlocal EnableDelayedExpansion
for /F "delims=" %%a in (input.txt) do set "_tmdbReturn=%%a"
rem Separate the string in lines at ," delimiter
for /F "delims=" %%a in (^"!_tmdbReturn:^,^"^=^
% Do NOT remove this line %
!^") do (
set "line=%%a"
rem Eliminate quotes
set "line=!line:"=!"
rem Show lines of desired values only
for /F "tokens=1* delims=:" %%b in ("!line!") do (
if "%%b" equ "original_title" echo %%b: %%c
)
)
You may also look for several values; just define the list of the desired values at beginning, enclosing all values by a certain delimiter character:
set "values=/title/original_title/"
... and change the if command inside the for by this one:
if "!values:/%%b/=!" neq "%values%" echo %%b: %%c
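Since the tmdb response is JSON, a language with a JSON parser avoids the delimiter tricks altogether. The following is only a sketch in Python (my choice, not something used in this thread); pick_fields is a hypothetical helper, and the sample string is a heavily shortened stand-in for the real response shown in the question.

```python
import json

def pick_fields(payload, wanted=("title", "original_title")):
    """Return one dict per search result, keeping only the wanted keys."""
    data = json.loads(payload)
    return [
        {key: result[key] for key in wanted if key in result}
        for result in data.get("results", [])
    ]

if __name__ == "__main__":
    # Abbreviated, illustrative sample; the real response has many more keys.
    sample = (
        '{"page":1,"results":[{"id":203801,'
        '"original_title":"The Man from U.N.C.L.E.",'
        '"title":"The Man from U.N.C.L.E."}],'
        '"total_results":1,"total_pages":1}'
    )
    for movie in pick_fields(sample):
        for key, value in movie.items():
            print(f"{key}: {value}")
```

The wanted tuple plays the same role as the values list above: add or remove key names to change what is shown.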

How to split string with "=" without for loop in batch file

I would like to split a string into two parts with = as the delimiter.
I saw this post but I could not manage to adapt it.
I tried this:
set "str=4567=abcde"
echo %str%
set "var1=%str:^=="^&REM #%
echo var1=%var1%
Why does it not work?
While not a bulletproof solution (use the for, artoon), without more info, this can do the work
@echo off
setlocal enableextensions enabledelayedexpansion
set "str=4567=abcde"
rem Step 1 - remove the left part
set "str1=!str:%str%!"
rem Step 2 - Get the right part
set "right=!str:*%str1%!"
rem Step 3 - Get the left part
set "left=!str:%right%=!"
set "left=%left:~0,-1%"
echo [%left%] [%right%]
edited to adapt to comments (OP code in comments adapted to my code, or the reverse)
for /f "delims=" %%i in ('set') do (
setlocal enabledelayedexpansion
rem Step 1 - remove the left part
set "str=%%i"
for %%x in ("!str!") do set "str1=!str:%%~x!"
rem Step 2 - Get the right part
for %%x in ("!str1!") do set "right=!str:*%%~x!"
rem Step 3 - Get the left part
for %%x in ("!right!") do set "left=!str:%%~x=!"
set "left=!left:~0,-1!"
echo [!left!] [!right!]
endlocal
)
And no, as previously indicated this is not bulletproof and some of the variables show problems (had I said it is not bulletproof?).
What I don't understand is the requirement to not use a for loop and then use a for loop. It is a lot easier this way:
for /f "tokens=1,* delims==" %%a in ('set') do (
echo [%%a] [%%b]
)
Another alternative (not as easy as the for, more stable than the previous one, non bulletproof) is
for /f %%a in ('set') do (
call :split left right %%a
echo [!left!] [!right!]
)
goto :eof
:split leftVar rightVar data
set "%~1=%~3"
setlocal enabledelayedexpansion
set "data=%*"
set "data=!data:*%1 %2 %3=!"
set "data=%data:~1%"
endlocal & set "%~2=%data%"
goto :eof
As npocmaka commented above, = has special meaning and cannot be replaced with traditional variable string manipulation. If you know the length of either side of the equal sign, you could strip off a number of characters. For example, if "4567" will always be 4 characters, you could set "var1=%str:~0,4%". Or if "abcde" will always be 5 characters, you could set "var1=%str:~0,-6%" (5 chars + 1 for the equal sign).
Otherwise, a for loop is your only other option without using 3rd party utilities.
for /f "delims==" %%I in ("%str%") do set "var1=%%I"
If you've got grep installed, you can do something like:
echo %str% | grep -P -o "^[^=]*"
... but you'd still need to capture its output with another for /f loop.
If you are allergic to for loops, and as an exercise in providing a solution to your question without any regard for efficiency, here's how you get the first half of your string without using a single for loop. Put grep and its dependencies in your %PATH%. Then:
echo %str% | grep -P -o "^[^=]*" >temp.txt
set /P "var1="<temp.txt
del temp.txt
echo %var1%
There, I fixed it!
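For comparison only, and assuming a non-batch tool is acceptable (the grep variant above already steps outside pure batch), splitting at the first = needs no loop at all in Python. This is a sketch; split_at_equals is a made-up name.

```python
def split_at_equals(text):
    """Split at the first '='; the right part is empty if there is no '='."""
    left, _, right = text.partition("=")
    return left, right

if __name__ == "__main__":
    left, right = split_at_equals("4567=abcde")
    print(f"[{left}] [{right}]")
```

Only the first = is treated as the separator, so any later = characters stay in the right part, which matches what "tokens=1,* delims==" does for typical variable=value lines.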

Batch - Read contents of a file in an array

I've a text file with two rows (say param.txt) which is shown below:
Mar2012
dim1,dim2,dim3,dim4
I want to read this file in batch and store the contents of first line in a variable called cube_name. When I'm reading the second line, I want to split the comma delimited string dim1,dim2,dim3,dim4 and create an array of four elements. I am planning to use the variable and the array in later part of the script.
The code which I created is shown below. The code is not working as expected.
@echo off & setlocal enableextensions enabledelayedexpansion
set /a count_=0
for /f "tokens=*" %%a in ('type param.txt') do (
set /a count_+=1
set my_arr[!count_!]=%%a
)
set /a count=0
for %%i in (%my_arr%) do (
set /a count+=1
if !count! EQU 1 (
set cube_name=%%i
)
if !count! GTR 1 (
set dim_arr=%%i:#=,%
)
)
for %%i in (%dim_arr%) do (
echo %%i
)
echo !cube_name!
I get to see the following when I run the code:
C:\Working folder>test2.bat
ECHO is off.
So this doesn't appear to work and I can't figure out what I'm doing wrong. I am fairly new to the batch scripting so help is appreciated
Your first FOR loop is OK. It is not how I would do it, but it works. Everything after that is a mess. It looks like you think arrays are a formal concept in batch, when they are not. It is possible to work with variables in a way that looks reminiscent of arrays. But true arrays do not exist within batch.
You use %my_arr% as if it is an array, but my_arr is not even defined. You have defined variables my_arr[1] and my_arr[2] - the brackets and number are part of the variable name.
It also looks like you have a misunderstanding of FOR loops. I suggest you carefully read the FOR documentation (type HELP FOR from a command line). Also look at examples on this and other sites. The FOR command is very complicated because it has many variations that look similar to the untrained eye, yet have profoundly different behaviors. One excellent resource to help your understanding is http://judago.webs.com/batchforloops.htm
Assuming the file always has exactly 2 lines, I would solve your problem like so
@echo off
setlocal enableDelayedExpansion
set dimCnt=0
<param.txt (
set /p "cube_name=" >nul
set /p "dimList=" >nul
for %%D in (!dimList!) do (
set /a dimCnt+=1
set "dim[!dimCnt!]=%%D"
)
)
echo cube_name=!cube_name!
for /l %%I in (1 1 !dimCnt!) do echo dim[%%I]=!dim[%%I]!
One nice feature of the above solution is it allows for a varying number of terms in the list of dimensions in the 2nd line. It will fail if there are tabs, spaces, semicolon, equal, * or ? in the dimension names. There are relatively simple ways to get around this limitation if need be.
Tabs, spaces, semicolon and equal can be handled by using search and replace to enclose each term in quotes.
for %%D in ("!dimList:,=","!") do (
set /a dimCnt+=1
set "dim[!dimCnt!]=%%~D"
)
I won't post the full solution here since it is not likely to be needed. But handling * and/or ? would require replacing the commas with a new-line character and switching to a FOR /F statement.
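If leaving batch is an option, the special-character problems above mostly disappear. Here is a minimal Python sketch of the same task (Python and the name read_params are my assumptions, not part of the question); like the batch solution, it assumes the first line is present and treats the second line as a plain comma-separated list.

```python
import io

def read_params(lines):
    """First line -> cube name; second line -> list of comma-separated dimensions."""
    rows = [line.strip() for line in lines]
    cube_name = rows[0]            # assumes the input has at least one line
    dims = rows[1].split(",") if len(rows) > 1 and rows[1] else []
    return cube_name, dims

if __name__ == "__main__":
    # Stand-in for open("param.txt"); any iterable of lines works.
    sample = io.StringIO("Mar2012\ndim1,dim2,dim3,dim4\n")
    cube_name, dims = read_params(sample)
    print(f"cube_name={cube_name}")
    for index, dim in enumerate(dims, start=1):
        print(f"dim[{index}]={dim}")
```

Spaces, semicolons, equal signs, * and ? inside a dimension name are kept as ordinary characters here, because only the comma is treated as a separator.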
I'm impressed by your code!
Did you try to debug or echo anything there?
You could simply add some echoes to see why your code can't work.
@echo off & setlocal enableextensions enabledelayedexpansion
set /a count_=0
for /f "tokens=*" %%a in ('type param.txt') do (
set /a count_+=1
set my_arr[!count_!]=%%a
)
echo ### show the variable(s) beginning with my_arr...
set my_arr
echo Part 2----
set /a count=0
echo The value of my_arr is "%my_arr%"
for %%i in (%my_arr%) do (
set /a count+=1
echo ## Count=!count!, content is %%i
if !count! EQU 1 (
set cube_name=%%i
)
if !count! GTR 1 (
echo ## Setting dim_arr to "%%i:#=,%"
set dim_arr=%%i:#=,%
echo
)
)
for %%i in (%dim_arr%) do (
echo the value of dim_arr is "%%i"
)
echo cube_name is "!cube_name!"
Output is
### show the variable(s) beginning with my_arr...
my_arr[1]=Mar2012
my_arr[2]=dim1,dim2,dim3,dim4
Part 2----
The value of my_arr is ""
cube_name is ""
As you can see, your part 2 fails completely.

Resources