I am trying to create a batch file that will edit a .csv and remove the first column, and any summary lines contained in the file. I am, however, fairly new to programming batch files, so I am not sure the best way to start this, and it would be great if you could include a basic explanation of how the code works so I can be self-sustaining in the future!
,Type,Date,Num,Name,Memo,Member,Clr,Split,Alias,Value,Balance
ABB - Egypt,,,,,,,,,,,
ElAin EL-Sokhna,,,,,,,,,,,
,Invoice,09-06-10,12005,ABB - EL-Sokhna,,Accounts Receivable,,Training Income,15000,,15000
,Invoice,09-14-11,12005,ABB - EL-Sokhna,“ElAin EL-Sokhna“ Trainer for OTS Application: First two weeks,Training Income,,Accounts,,150001,0
Total ElAin EL-Sokhna,,,,,,,,,241194,210400,301794
ABB - Egypt - Other,,,,,,,,,,,
There are various iterations of this file, as they come from a monthly report. I need to remove the first (empty) column, and any rows that look like ABB - Egypt,,,,,,,,,,, or Total ElAin EL-Sokhna,,,,,,,,,241194,210400,301794
So the output should be:
Type,Date,Num,Name,Memo,Member,Clr,Split,Alias,Value,Balance
Invoice,09-06-10,12005,ABB - EL-Sokhna,,Accounts Receivable,,Training Income,15000,,15000
Invoice,09-14-11,12005,ABB - EL-Sokhna,“ElAin EL-Sokhna“ Trainer for OTS Application: First two weeks,Training Income,,Accounts,,150001,0
Thanks for the input!
EDIT: It seems I wasn't clear enough in my OP (Sorry, first time here).
There are two processes that need to happen here: in every file the first column must be deleted, and any lines that are either title lines ABB - Egypt,,,,,,,,,,, or summary lines Total ElAin EL-Sokhna,,,,,,,,,241194,210400,301794 need to be removed.
All lines that need to be kept will be mostly filled in, such as ,Type,Date,Num,Name,Memo,Member,Clr,Split,Alias,Value,Balance or ,Invoice,09-06-10,12005,ABB - EL-Sokhna,,Accounts Receivable,,Training Income,15000,,15000 Notice that, as in the second line, it is possible for there to be some missing values in them, so doing a search for something like ",," will not work.
Batch is a terrible language for modifying text files. There are a great many special cases that require arcane knowledge to work around the problem. You may have a script that seems to do what you want, and then some wrinkle appears in your data, and the entire script may have to be redesigned.
With regard to your specific problem, it appears to me that you only want to preserve rows that begin with a comma, meaning the first column is empty. Of those remaining rows, you want to remove the first (empty) column.
Assuming none of the rows you want to keep have an empty value for the second column, then there is a really trivial solution:
@echo off
>"%~1.new" (for /f "delims=, tokens=*" %%A in ('findstr "^," %1') do echo %%A)
move /y "%~1.new" %1 >nul
The script expects the file to be passed as the first and only argument. So if your script is named "fixCSV.bat", and the file to be modified is "c:\test\file.csv", then you would use:
fixCSV "c:\test\file.csv"
The %1 expands to the value of the first argument, and %~1 is the same, except it also strips any enclosing quotes that may or may not be present.
The FINDSTR command reads the file and writes out only lines that begin with a comma. The FOR /F command iterates each line of output. The "delims=, tokens=*" options effectively strip all leading commas from each line, and the result is in variable %%A, which is then ECHOed. The entire construct is enclosed in parentheses and stdout is redirected to a temporary file. Finally, the temporary file is moved over top of the original file, thus replacing it.
If the 2nd column may be empty, then the result will be corrupted because it removes all leading commas (both columns 1 and 2 in this case). The script must be more complicated to compensate. You would need to set a variable and then use delayed expansion to get the sub-string, skipping the first character. But delayed expansion will corrupt expansion of the %%A variable if it contains the ! character. So delayed expansion must be toggled on and off. You are beginning to see what I mean by lots of special cases.
@echo off
setlocal disableDelayedExpansion
>"%~1.new" (
    for /f "delims=" %%A in ('findstr "^," %1') do (
        set "ln=%%A"
        setlocal enableDelayedExpansion
        echo !ln:~1!
        endlocal
    )
)
move /y "%~1.new" %1 >nul
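The difference between the two scripts above is easier to see outside batch. The following is only an illustrative sketch in Python, not part of the batch solution; the function names and the sample row are made up for this example. It assumes, as described above, that "delims=, tokens=*" strips every leading comma while !ln:~1! skips exactly one character.

```python
def strip_all_leading_commas(line):
    # Mimics the first script: "delims=, tokens=*" eats every leading comma.
    return line.lstrip(",")


def strip_first_comma_only(line):
    # Mimics the second script: !ln:~1! skips exactly one character.
    return line[1:]


# Hypothetical row whose second column is empty.
row = ",,09-06-10,12005"

print(strip_all_leading_commas(row))  # 09-06-10,12005   (columns shift left)
print(strip_first_comma_only(row))    # ,09-06-10,12005  (empty column kept)
```

With the first approach the row loses a column, which is the corruption described above; the second approach keeps the empty second column in place.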
As the batch scripts become more complicated, they become slower and slower. It may not be an issue for most files, but if the file is really large (say hundreds of megabytes) then it can become an issue.
I almost never use pure batch to modify text files anymore. Instead, I use a hybrid JScript/batch utility that I wrote called JREPL.BAT. The utility is pure script that runs natively on any Windows machine from XP onward. JREPL.BAT is able to efficiently modify text files using regular expression replacement. Regular expressions can appear to be mysterious, but they are well worth the investment in learning.
Assuming you have JREPL.BAT somewhere within your PATH, then the following command is all that you would need:
jrepl "^,(.*)" "$1" /jmatch /f "yourFile.csv" /o -
The /F option specifies the file to read.
The /O option with value of - specifies that the output should replace the original file.
The /JMATCH option specifies that each replacement value is written out to a new line. All other text is dropped.
The first argument is the search expression. It matches any line that begins with a comma, and everything after that is captured in a variable named $1.
The second argument specifies the replacement value, which is simply the captured value in variable $1.
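To see what the search expression and replacement do to the sample data, here is a rough Python equivalent. It is a sketch only: it uses Python's re module rather than JREPL's JScript regex engine, and the name keep_matches is hypothetical. For this simple pattern both engines should behave the same way.

```python
import re

# Same pattern as the JREPL search expression:
# a comma at the start of the line, then capture everything after it.
PATTERN = re.compile(r"^,(.*)")


def keep_matches(lines):
    """Return the captured group for each matching line; drop all other lines."""
    out = []
    for line in lines:
        m = PATTERN.match(line)
        if m:
            out.append(m.group(1))
    return out


sample = [
    ",Type,Date,Num,Name",
    "ABB - Egypt,,,,",
    ",Invoice,09-06-10,12005,ABB - EL-Sokhna",
    "Total ElAin EL-Sokhna,,,,",
]
for line in keep_matches(sample):
    print(line)
```

Only the two rows that begin with a comma are printed, each without its leading comma; title and summary rows never match and are therefore dropped.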
One way is to define all your rules in a variable that is then passed to findstr. The rules must be defined like this:
/c:"String which excludes the line" /c:"Another string which excludes the line" /c:"etc.."
These rules must be exact (that is, they must not be found in a line that has to stay).
For the empty first column you can use a substitution, the way I did it in the code, with
,Type=Type
,Invoice=Invoice
Test.bat :
@echo off&cls
setlocal enabledelayedexpansion
Rem The rules
set $String_To_Search=/c:"ABB - Egypt," /c:"Total ElAin El-Sokhna," /c:"ElAin EL-Sokhna," /c:"ABB - Egypt - Other,"
(for /f "delims=" %%a in (test.csv) do (
    set $line=%%a
    Rem the substitutions for the first Column
    set $Line=!$Line:,Type=Type!
    set $line=!$Line:,Invoice=Invoice!
    Rem the test and the output if nothing was found
    echo !$Line! | findstr /i %$String_To_Search% >nul || echo !$Line!
))>Output.csv
I used a file test.csv for my test.
The output is redirected to Output.csv
Perhaps this is what you want?
@echo off
setlocal EnableDelayedExpansion
for /F "delims=" %%a in (input.csv) do (
    set "line=%%a"
    if "!line:~0,1!" equ "," echo !line:~1!
)
When a problem is not explained well enough, we can only guess the missing details. In this case, I assumed that you just want the lines that start with a comma, with that comma deleted. The output is the same as your output example...
EDIT: Output example added
Type,Date,Num,Name,Memo,Member,Clr,Split,Alias,Value,Balance
Invoice,09-06-10,12005,ABB - EL-Sokhna,,Accounts Receivable,,Training Income,15000,,15000
Invoice,09-14-11,12005,ABB - EL-Sokhna,“ElAin EL-Sokhna“ Trainer for OTS Application: First two weeks,Training Income,,Accounts,,150001,0
I would start here to learn this: How can you find and replace text in a file using the Windows command-line environment?
It covers many details of substitution from Windows command line and many ways to do it, some requiring only what's built into Windows, and some requiring other downloadable software.
Magoo is right, more criteria are needed, but there might be enough information in the linked page for you to get past the main hurdles.
@ECHO OFF
SETLOCAL
(FOR /f "tokens=*delims=," %%a IN ('findstr /b /l "," q28079306.txt') DO ECHO %%a)>newfile.txt
GOTO :EOF
I used a file named q28079306.txt containing your data for my testing.
Produces newfile.txt
I have this script to read an XML file. The file contains coordinates and I want to list the coordinates:
@echo off
setlocal EnableDelayedExpansion
FOR %%K IN (*.xml) DO (
SET K=%%K
SET K=!K:~0,-4!
SET "prep=0"
REM READ DATA
FOR /F "tokens=*" %%X IN (!K!.kml) DO (
if !prep! == 1 (
echo %%X
pause
FOR /F %%L IN ("%%X") DO (
SET L=%%L
IF NOT "!L:~0,1!" == "<" (
echo %%L
)
)
SET "prep=0"
)
if "%%X" == "<coordinates>" ( SET "prep=1" )
)
)
I got this result:
14.63778004128814,49.50141683426452,0 14.63696238385996,49.48348965654706,0 14.6
8840586504191,49.47901033971912,0 14.68589371304878,49.49939179836829,0 14.63778
004128814,49.50141683426452,0 </coordinates>
Press any key to continue . . .
14.63778004128814,49.50141683426452,0
Press any key to continue . . .
First you see the whole line with the coordinates. Then the third (innermost) loop should print each coordinate, but only the first one is printed. If I press a key again, the batch finishes without printing the remaining coordinates. Can you help?
Edit
After the answer was posted, I have two questions. 1) Could we use this:
SET LF=^


setlocal EnableDelayedExpansion
... (next code) ...
set "var=!var: =%LF%!"
That is, since LF is expanded here with percent signs rather than with delayed expansion, could we embed it this way? Or not?
And 2) why, in your code
for %%L in ("!LF!") do set "X=!X: =%%~L!"
did you use %%~L and not just %%L?
Your immediate problem is that FOR /F does not iterate the tokens in a line. It simply parses each token that you ask for. If you don't specify a "tokens" option, then it defaults to "tokens=1" - it only parses the first token in the line.
However, FOR /F will treat a string as multiple lines if the string contains linefeed characters. It will then iterate each line like you want. The trick is to replace your space delimiter with a line feed character. There are multiple methods that can do the job, but I will show what I think is the easiest to work with.
First define a variable containing a single linefeed
set LF=^


::The two blank lines above are critical for the definition of the line feed
The next trick is to replace spaces in your variable with linefeeds. Normally substitution using a variable for the replacement would look something like set "var=!var:search=%replaceVar%!". But that won't work for the LF variable - it is difficult to work with the LF variable using normal expansion. It is much easier to use delayed expansion. We can't embed delayed expansion within delayed expansion, but we can transfer the value of LF to a simple FOR variable and use for %%L in ("!LF!") do set "var=!var: =%%~L!"
One thing about your code I do not understand - your initial FOR loop is iterating across all the .KML files. You strip off the extension using a substring operation. There is a much easier way to do that without using an environment variable: %%~nK will give the base name of the file without the extension. But why do that at all when you turn around and append the extension again?
I used the %%K value directly - I added the USEBACKQ option and added quotes to allow for spaces in the file name.
Here is code that should do what you are expecting.
@echo off
setlocal EnableDelayedExpansion
::define a variable containing a linefeed character
set LF=^


::Above 2 blank lines are part of the LF definition, do not remove
for %%K in (*.kml) do (
    set "prep=0"
    for /f "usebackq tokens=*" %%X in ("%%K") do (
        if !prep! == 1 (
            echo %%X
            pause
            set "ln=%%X"
            for %%L in ("!LF!") do set "ln=!ln: =%%~L!"
            for /f %%L in ("!ln!") do (
                set L=%%L
                if not "!L:~0,1!" == "<" (
                    echo %%L
                )
            )
            set "prep=0"
        )
        if "%%X" == "<coordinates>" ( set "prep=1" )
    )
)
BUT - I think you have a bigger problem. I am worried that you are setting yourself up for a world of pain by using batch to parse XML. You are assuming the XML will always be laid out the same way. There are countless valid ways of adding or subtracting linefeeds and white space into the XML document that would break your algorithm. Can you be sure all your input files came from the same source and will always be formatted like you expect? I think you really should be using XSLT to parse and transform your XML document into a naked list of coordinates.
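To illustrate why a real XML parser is more robust than line-based matching, here is a minimal sketch. It is not XSLT; it swaps in Python's standard xml.etree.ElementTree module instead, and the function name list_coordinates and the sample document are made up for this example. It assumes the coordinates sit in elements named coordinates, separated by whitespace, as in the output shown in the question.

```python
import xml.etree.ElementTree as ET


def list_coordinates(xml_text):
    """Return every whitespace-separated entry found in <coordinates> elements."""
    root = ET.fromstring(xml_text)
    coords = []
    for elem in root.iter():
        # With a declared namespace the tag looks like "{namespace}coordinates",
        # so compare only the part after the closing brace.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "coordinates" and elem.text:
            # split() handles spaces, tabs and line breaks alike.
            coords.extend(elem.text.split())
    return coords


# Hypothetical document; the line breaks are placed differently from the
# layout the batch script expects, yet the result is the same.
sample = """<Placemark><LineString><coordinates>
14.63778004128814,49.50141683426452,0 14.63696238385996,49.48348965654706,0
</coordinates></LineString></Placemark>"""

for entry in list_coordinates(sample):
    print(entry)
```

Because the parser works on elements rather than on physical lines, it does not matter whether the opening tag, the coordinates, and the closing tag share a line.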
Answers to additional questions
1) set "var=!var: =%LF%!" will not work - Regular expansion of LF requires escape sequences and multiple expansions. This will work: set "var=!var: =^%LF%%LF%!"
The escape sequences for %LF% can get very tricky, so I try to avoid them.
2) Regarding for %%L in ("!LF!") do set "X=!X: =%%~L!", note that it is a simple FOR, not FOR /F. The !LF! must be quoted or else FOR will not read it. But the FOR statement preserves the quotes (unlike FOR /F), so I need %%~L to remove the enclosing quotes.
There is a very important distinction between FOR and FOR /F with regard to linefeeds. FOR will preserve quoted linefeeds, whereas FOR /F treats the linefeed as a line delimiter and iterates each line, so the linefeeds are not preserved.
Sometimes FINDSTR with multiple literal search strings fails to find all matches. For example, the following FINDSTR example fails to find a match.
echo ffffaaa|findstr /l "ffffaaa faffaffddd"
Why?
Apparently this is a long standing FINDSTR bug. I think it can be a crippling bug, depending on the circumstances.
I have confirmed the command fails on two different Vista machines, a Windows 7 machine, and an XP machine.
I found a forum thread titled "findstr - broken ???" that reports a similar search fails on Windows Server 2003, but it succeeds on Windows 2000.
I've done a number of experiments and it seems all of the following conditions must be met for the potential of a failure:
The search is using multiple literal search strings
The search strings are of different lengths
A short search string has some amount of overlap with a longer search string
The search is case sensitive (no /I option)
In every failure I have seen, it is always one of the shorter search strings that fails.
It does not matter how the search strings are specified. The same faulty result is achieved using multiple /C:"search" options and also with the /G:file option.
The only 3 workarounds I have been able to come up with are:
Use the /I option if you don't care about case. Obviously this might not meet your needs.
Use the /R regular expression option. But if you do then you have to make sure you escape any meta-characters in the search so that it matches the result expected of a literal search. This can be problematic as well.
If you are using the /V option, then use multiple piped FINDSTR commands with one search string each instead of one FINDSTR with multiple searches. This also can be a problem if you have a lot of search strings for which you want to use the /G:file option.
I hate this bug!!!!
Note - See What are the undocumented features and limitations of the Windows FINDSTR command? for a comprehensive list of FINDSTR idiosyncrasies.
I cannot tell why findstr may fail with multiple literal strings. However, I can provide a method to work around that annoying bug.
Given that the literal search strings are listed in a text file called search_strings.txt...:
ffffaaa
faffaffddd
..., you can convert it to regular expressions by inserting a backslash in front of every single character:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
> "regular_expressions.txt" (
    for /F usebackq^ delims^=^ eol^= %%S in ("search_strings.txt") do (
        set "REGEX=" & set "STRING=%%S"
        for /F delims^=^ eol^= %%T in ('
            cmd /U /V /C echo(!STRING!^| find /V ""
        ') do (
            set "ESCCHR=\%%T"
            if "%%T"=="<" (set "ESCCHR=%%T") else if "%%T"==">" (set "ESCCHR=%%T")
            setlocal EnableDelayedExpansion
            for /F "delims=" %%U in ("REGEX=!REGEX!!ESCCHR!") do (
                endlocal & set "%%U"
            )
        )
        setlocal EnableDelayedExpansion
        echo(!REGEX!
        endlocal
    )
)
endlocal
Then use the converted file regular_expressions.txt...:
\f\f\f\f\a\a\a
\f\a\f\f\a\f\f\d\d\d
...to do a regular expression search, which seems to work fine also with multiple search strings:
echo ffffaaa| findstr /R /G:"regular_expressions.txt"
The preceding backslashes simply escape every character including those that have a particular meaning in regular expression searches.
The characters < and > are excluded from being escaped in order to avoid conflicts with word boundaries, which are expressed by \< and \> when appearing at the beginning and at the end of a search string, respectively.
Since regular expressions are limited to 254 characters for findstr versions past Windows XP (opposed to literal strings, which are limited to 511 characters), the length of the original search strings is limited to 127 characters, because every such character is expressed by two characters due to the escaping.
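The conversion itself is simple once the batch plumbing is set aside. The following Python sketch (an illustration only; the function name escape_every_char is hypothetical) shows the transformation the script above applies to each search string, including the exception for < and >.

```python
def escape_every_char(search):
    """Prefix every character with a backslash, except < and >.

    Escaping < or > would turn them into findstr word-boundary markers,
    so those two characters are copied through unchanged.
    """
    return "".join(ch if ch in "<>" else "\\" + ch for ch in search)


print(escape_every_char("ffffaaa"))     # \f\f\f\f\a\a\a
print(escape_every_char("faffaffddd"))  # \f\a\f\f\a\f\f\d\d\d
```

Each escaped character takes two characters in the result, which is where the 127-character limit for the original search strings comes from.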
Here is an alternative approach that only escapes the meta-characters ., *, ^, $, [, ], \, ":
@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "_META=.*^$[]\"^" & rem (including `"`)
> "regular_expressions.txt" (
    for /F usebackq^ delims^=^ eol^= %%S in ("search_strings.txt") do (
        set "REGEX=" & set "STRING=%%S"
        for /F delims^=^ eol^= %%T in ('
            cmd /U /V /C echo(!STRING!^| find /V ""
        ') do (
            set "CHR=%%T"
            setlocal EnableDelayedExpansion
            if not "!_META!"=="!_META:*%%T=!" set "CHR=\!CHR!"
            for /F "delims=" %%U in ("REGEX=!REGEX!!CHR!") do (
                endlocal & set "%%U"
            )
        )
        setlocal EnableDelayedExpansion
        echo(!REGEX!
        endlocal
    )
)
endlocal
The advantage of this method is that the length of the search strings is no longer limited to 127 characters but to 254 characters minus 1 for every occurring aforementioned meta-character, applying for findstr versions past Windows XP.
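For comparison, the selective variant can be sketched the same way. Again this is only a Python illustration of the transformation, with a hypothetical function name; the set of meta-characters is the one listed above.

```python
# The meta-characters named above: . * ^ $ [ ] \ "
META = '.*^$[]\\"'


def escape_meta_only(search):
    """Prefix only regular-expression meta-characters with a backslash."""
    return "".join("\\" + ch if ch in META else ch for ch in search)


print(escape_meta_only("ffffaaa"))   # ffffaaa   (nothing to escape)
print(escape_meta_only("a.b*c"))     # a\.b\*c
```

A string without meta-characters keeps its original length, so it only loses one character of the 254-character budget per meta-character it contains.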
Here is another work-around, using a case-insensitive search with findstr at the first place, then post-filtering the result by case-sensitive comparisons:
echo ffffaaa|findstr /L /I "ffffaaa faffaffddd"|cmd /V /C set /P STR=""^&if #^^!STR^^!==#^^!STR:ffffaaa=ffffaaa^^! (echo(^^!STR^^!) else if #^^!STR^^!==#^^!STR:faffaffddd=faffaffddd^^! (echo(^^!STR^^!)
The double-escaped exclamation marks ensure the variable STR is expanded in the explicitly invoked cmd instance even in case delayed expansion is enabled in the hosting cmd instance.
By the way, due to what I call a design flaw, searches with literal strings using findstr never work reliably as soon as they contain backslashes, because such may still be consumed to escape following meta-characters, although not necessary; for example, the search string \. actually matches .; to truly match \. literally, you must specify the search string \\.. I do not understand why meta-characters are still recognised when doing literal searches, that is not what I call literal.
Adding a colon ":" before the search string made it work for me:
k get service | findstr /R *initiator*
FINDSTR: No search strings
but
k get service | findstr /R :*initiator*
foo-initiator-service ClusterIP 12.97.103.214 <none> 9100/TCP 90d