I am trying to create a batch file that will edit a .csv and remove the first column, and any summary lines contained in the file. I am, however, fairly new to programming batch files, so I am not sure the best way to start this, and it would be great if you could include a basic explanation of how the code works so I can be self-sustaining in the future!
,Type,Date,Num,Name,Memo,Member,Clr,Split,Alias,Value,Balance
ABB - Egypt,,,,,,,,,,,
ElAin EL-Sokhna,,,,,,,,,,,
,Invoice,09-06-10,12005,ABB - EL-Sokhna,,Accounts Receivable,,Training Income,15000,,15000
,Invoice,09-14-11,12005,ABB - EL-Sokhna,“ElAin EL-Sokhna“ Trainer for OTS Application: First two weeks,Training Income,,Accounts,,150001,0
Total ElAin EL-Sokhna,,,,,,,,,241194,210400,301794
ABB - Egypt - Other,,,,,,,,,,,
There are various iterations of this file, since they come from a monthly report. I need to remove the first (empty) column, and any rows that look like ABB - Egypt,,,,,,,,,,, or Total ElAin EL-Sokhna,,,,,,,,,241194,210400,301794
So the output should be:
Type,Date,Num,Name,Memo,Member,Clr,Split,Alias,Value,Balance
Invoice,09-06-10,12005,ABB - EL-Sokhna,,Accounts Receivable,,Training Income,15000,,15000
Invoice,09-14-11,12005,ABB - EL-Sokhna,“ElAin EL-Sokhna“ Trainer for OTS Application: First two weeks,Training Income,,Accounts,,150001,0
Thanks for the input!
EDIT: It seems I wasn't clear enough in my OP (Sorry, first time here).
There are two processes that need to happen here, in every file the first column must be deleted, and any lines that are either title lines ABB - Egypt,,,,,,,,,,, or summary lines Total ElAin EL-Sokhna,,,,,,,,,241194,210400,301794 need to be removed.
All lines that need to be kept will be mostly filled in, such as ,Type,Date,Num,Name,Memo,Member,Clr,Split,Alias,Value,Balance or ,Invoice,09-06-10,12005,ABB - EL-Sokhna,,Accounts Receivable,,Training Income,15000,,15000 Notice that, as in the second line, it is possible for there to be some missing values in them, so doing a search for something like ",," will not work.
Batch is a terrible language for modifying text files. There are a great many special cases that require arcane knowledge to work around the problem. You may have a script that seems to do what you want, and then some wrinkle appears in your data, and the entire script may have to be redesigned.
With regard to your specific problem, it appears to me that you only want to preserve rows that begin with a comma, meaning the first column is empty. Of those remaining rows, you want to remove the first (empty) column.
Assuming none of the rows you want to keep have an empty value for the second column, then there is a really trivial solution:
@echo off
>"%~1.new" (for /f "delims=, tokens=*" %%A in ('findstr "^," %1') do echo %%A)
move /y "%~1.new" %1 >nul
The script expects the file to be passed as the first and only argument. So if your script is named "fixCSV.bat", and the file to be modified is "c:\test\file.csv", then you would use:
fixCSV "c:\test\file.csv"
The %1 expands to the value of the first argument, and %~1 is the same, except it also strips any enclosing quotes that may or may not be present.
The FINDSTR command reads the file and writes out only lines that begin with a comma. The FOR /F command iterates each line of output. The "delims=, tokens=*" options effectively strip all leading commas from each line, and the result is in variable %%A, which is then ECHOed. The entire construct is enclosed in parentheses and stdout is redirected to a temporary file. Finally, the temporary file is moved over top of the original file, thus replacing it.
If the 2nd column may be empty, then the result will be corrupted because it removes all leading commas (both columns 1 and 2 in this case). The script must be more complicated to compensate. You would need to set a variable and then use delayed expansion to get the sub-string, skipping the first character. But delayed expansion will corrupt expansion of the %%A variable if it contains the ! character. So delayed expansion must be toggled on and off. You are beginning to see what I mean by lots of special cases.
@echo off
setlocal disableDelayedExpansion
>"%~1.new" (
  for /f "delims=" %%A in ('findstr "^," %1') do (
    set "ln=%%A"
    setlocal enableDelayedExpansion
    echo !ln:~1!
    endlocal
  )
)
move /y "%~1.new" %1 >nul
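To make the difference between the two scripts concrete, here is a rough Python model of what each one does to the lines of the file. It is only an illustration of the logic, not a replacement for the batch code, and the function names and sample rows are made up for this sketch.

```python
def keep_and_strip_all(lines):
    """Model of the first script: keep lines that start with ',' and
    drop every leading comma (the effect of "delims=, tokens=*")."""
    return [ln.lstrip(",") for ln in lines if ln.startswith(",")]


def keep_and_strip_one(lines):
    """Model of the second script: keep lines that start with ',' and
    drop only the first character (the effect of !ln:~1!)."""
    return [ln[1:] for ln in lines if ln.startswith(",")]


rows = [
    ",Type,Date,Num",
    "ABB - Egypt,,,",
    ",,09-06-10,12005",              # hypothetical row with an empty 2nd column
    "Total ElAin EL-Sokhna,,,301794",
]

print(keep_and_strip_all(rows))  # ['Type,Date,Num', '09-06-10,12005']
print(keep_and_strip_one(rows))  # ['Type,Date,Num', ',09-06-10,12005']
```

The row with the empty second column loses a field in the first model and keeps it in the second, which is the corruption described above.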
As the batch scripts become more complicated, they become slower and slower. It may not be an issue for most files, but if the file is really large (say hundreds of megabytes) then it can become an issue.
I almost never use pure batch to modify text files anymore. Instead, I use a hybrid JScript/batch utility that I wrote called JREPL.BAT. The utility is pure script that runs natively on any Windows machine from XP onward. JREPL.BAT is able to efficiently modify text files using regular expression replacement. Regular expressions can appear to be mysterious, but they are well worth the investment in learning.
Assuming you have JREPL.BAT somewhere within your PATH, then the following command is all that you would need:
jrepl "^,(.*)" "$1" /jmatch /f "yourFile.csv" /o -
The /F option specifies the file to read.
The /O option with value of - specifies that the output should replace the original file.
The /JMATCH option specifies that each replacement value is written out to a new line. All other text is dropped.
The first argument is the search expression. It matches any line that begins with a comma, and everything after that is captured in a variable named $1.
The second argument specifies the replacement value, which is simply the captured value in variable $1.
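If regular expressions are new to you, the following Python sketch may help show what the search expression does. It uses Python's `re` module rather than the JScript engine that JREPL relies on, on the assumption that this simple pattern behaves the same in both; `jmatch_like` is a hypothetical name for this illustration only.

```python
import re

# ^   anchors the match at the start of the line
# ,   requires the first character to be a comma
# (.*) captures everything after that comma as group 1 ($1 in JREPL)
PATTERN = re.compile(r"^,(.*)")


def jmatch_like(lines):
    """Write out the captured text of each matching line; drop all other lines."""
    out = []
    for ln in lines:
        m = PATTERN.match(ln)
        if m:
            out.append(m.group(1))
    return out


print(jmatch_like([",Type,Date", "ABB - Egypt,,", ",,09-06-10"]))
# ['Type,Date', ',09-06-10']
```

Only one leading comma is consumed by the pattern, so an empty second column survives.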
One approach is to define all your exclusion rules in a variable that is then passed to findstr. The rules must be defined like this:
/c:"String which excludes the line" /c:"Another string which excludes the line" /c:"etc.."
These rules must be exact: none of them may occur in a line that has to stay.
For the empty first column you can use a substitution, the way I did it in the code, with
,Type=Type
,Invoice=Invoice
Test.bat :
@echo off&cls
setlocal enabledelayedexpansion
Rem The rules
set $String_To_Search=/c:"ABB - Egypt," /c:"Total ElAin El-Sokhna," /c:"ElAin EL-Sokhna," /c:"ABB - Egypt - Other,"
(for /f "delims=" %%a in (test.csv) do (
  set $line=%%a
  Rem the substitutions for the first column
  set $Line=!$Line:,Type=Type!
  set $line=!$Line:,Invoice=Invoice!
  Rem the test and the output if nothing was found
  echo !$Line! | findstr /i %$String_To_Search% >nul || echo !$Line!
))>Output.csv
I used a file test.csv for my test.
The output is redirected to Output.csv
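The requirement that the rules be exact can be seen in a small Python model of this approach. This is a sketch of the logic only: it assumes that findstr /i with several /c: strings drops a line when any of the literal strings occurs in it, ignoring case, and that the batch substitution replaces every occurrence, also ignoring case. All names are invented for the illustration.

```python
import re

EXCLUDE = ["ABB - Egypt,", "Total ElAin El-Sokhna,", "ElAin EL-Sokhna,", "ABB - Egypt - Other,"]
SUBSTITUTIONS = [(",Type", "Type"), (",Invoice", "Invoice")]


def substitute(line, old, new):
    # Model of !var:old=new!: replace every occurrence, ignoring case.
    return re.sub(re.escape(old), lambda m: new, line, flags=re.IGNORECASE)


def is_excluded(line):
    # Model of findstr /i /c:"..." /c:"...": any literal substring, ignoring case.
    low = line.lower()
    return any(rule.lower() in low for rule in EXCLUDE)


def clean(lines):
    out = []
    for line in lines:
        for old, new in SUBSTITUTIONS:
            line = substitute(line, old, new)
        if not is_excluded(line):
            out.append(line)
    return out


print(clean([",Type,Date", "ABB - Egypt,,", ",Invoice,09-06-10,12005,ABB - EL-Sokhna,,x"]))
# ['Type,Date', 'Invoice,09-06-10,12005,ABB - EL-Sokhna,,x']
```

A hypothetical data row whose Name field is exactly ElAin EL-Sokhna would contain the text "ElAin EL-Sokhna," and would therefore be dropped along with the title lines, which is why each rule has to be chosen with care.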
Perhaps this is what you want?
@echo off
setlocal EnableDelayedExpansion
for /F "delims=" %%a in (input.csv) do (
set "line=%%a"
if "!line:~0,1!" equ "," echo !line:~1!
)
When a problem is not explained well enough, we can only guess the missing details. In this case, I assumed that you just want the lines that start with a comma, with that comma deleted. The output is the same as your output example...
EDIT: Output example added
Type,Date,Num,Name,Memo,Member,Clr,Split,Alias,Value,Balance
Invoice,09-06-10,12005,ABB - EL-Sokhna,,Accounts Receivable,,Training Income,15000,,15000
Invoice,09-14-11,12005,ABB - EL-Sokhna,“ElAin EL-Sokhna“ Trainer for OTS Application: First two weeks,Training Income,,Accounts,,150001,0
I would start here to learn this: How can you find and replace text in a file using the Windows command-line environment?
It covers many details of substitution from Windows command line and many ways to do it, some requiring only what's built into Windows, and some requiring other downloadable software.
Magoo is right, more criteria are needed, but there might be enough information in the linked page for you to get past the main hurdles.
@ECHO OFF
SETLOCAL
(FOR /f "tokens=*delims=," %%a IN ('findstr /b /l "," q28079306.txt') DO ECHO %%a)>newfile.txt
GOTO :EOF
I used a file named q28079306.txt containing your data for my testing.
Produces newfile.txt
I want to split a string in two parts, without using any for loop.
For example, I have the string in a variable:
str=45:abc
I want to get 45 in a variable and abc in another variable. Is it possible in batch file?
pattern is like somenumber:somestring
You can split str in different ways.
A FOR loop is the obvious one, but you don't want to use it.
The trailing part is easy with the * (match anything up to and including the first ...)
set "var2=%str:*:=%"
The leading part can be done with a nasty trick
set "var1=%str::="^&REM #%
The caret is needed to escape the ampersand,
so effectively the colon will be replaced by "&REM #
So in your case, after the replacement you get the line
set "var1=4567"&REM #abcde
And this is split into two commands
set "var1=4567"
REM #abcde
And the complete code is here:
set "str=4567:abcde"
echo %str%
set "var1=%str::="^&REM #%
set "var2=%str:*:=%"
echo var1=%var1% var2=%var2%
Edit 2: More stable leading part
Thanks Dave for the idea to use a linefeed.
The REM technique isn't very stable against content with quotes and special characters.
But with a linefeed trick there exists a more stable version which also works when the split argument is longer than a single character.
@echo off
setlocal enableDelayedExpansion
set ^"str=456789#$#abc"
for /F "delims=" %%a in (^"!str:#$#^=^

!^") do (
  set "lead=%%a"
  goto :break
)
:break
echo !lead!
Solution 3: Adapted dbenham's answer
Dbenham uses a linefeed with a pipe in his solution.
This seems a bit overcomplicated.
The solution relies on the fact that the parser removes the rest of the line after an unescaped linefeed (when this is found before or in the special character phase).
First, the colon character is replaced with a linefeed by a delayed expansion replacement.
That is allowed, and the linefeed is now part of the variable.
Then the line set lead=%lead% strips the trailing part.
It's better not to use the extended syntax here, as set "lead=%lead%" would break if a quote is part of the string.
setlocal enableDelayedExpansion
set "str=45:abc"
set ^"lead=!str::=^

!"
set lead=%lead%
echo "!lead!"
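The idea behind the linefeed variants can be modelled in a few lines of Python. This is a sketch of the principle only (replace the separator with a linefeed, then keep what precedes the first linefeed); it does not reproduce the cmd parser, and `lead_part` is a hypothetical helper.

```python
LF = "\n"


def lead_part(s, sep=":"):
    """Replace the separator with a linefeed, then keep only the text
    before the first linefeed, as the parser does when it drops the
    rest of the line."""
    with_lf = s.replace(sep, LF)
    return with_lf.split(LF, 1)[0]


print(lead_part("45:abc"))               # 45
print(lead_part("456789#$#abc", "#$#"))  # 456789
```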
You can try this. If it is fixed that there will always be 2 characters to the left of the colon and 3 to the right, then the following code should work, assuming your str has the value.
set "str=45:abc"
echo %str%
set var1=%str:~0,2%
set var2=%str:~3,3%
echo %var1% %var2%
Keep me posted. :)
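For readers unfamiliar with the %var:~start,length% syntax, it corresponds to a plain slice. The Python sketch below models it for non-negative offsets only and shows why the fixed-width assumption matters; `substr` is a name invented for this illustration.

```python
def substr(s, start, length):
    """Model of %s:~start,length% for non-negative start and length."""
    return s[start:start + length]


s = "45:abc"
print(substr(s, 0, 2), substr(s, 3, 3))        # 45 abc

# With other field widths the same offsets silently cut the data.
s2 = "4567:abcde"
print(substr(s2, 0, 2), substr(s2, 3, 3))      # 45 7:a
```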
It seems pointless to avoid using a FOR loop, but it does make the problem interesting.
As jeb has pointed out, getting the trailing part is easy using !str:*:=!.
The tricky bit is the leading part. Here is an alternative to jeb's solution.
You can insert a linefeed into a variable in place of the : using the following syntax
setlocal enableDelayedExpansion
set "str=45:abc"
echo !str::=^

!
--OUTPUT--
45
abc
The empty line above the last ! is critical.
I'm not sure why, but when the output of the above is piped to a command, only the first line is preserved. So the output can be piped to a FINDSTR that matches any line, and that result directed to a file that can then be read into a variable using SET /P.
The 2nd line must be eliminated prior to using SET /P because SET /P does not recognize <LF> as a line terminator - it only recognizes <CR><LF>.
Here is a complete solution:
@echo off
setlocal enableDelayedExpansion
set "str=45:abc"
echo(!str::=^

!|findstr "^" >test.tmp
<test.tmp set /p "var1="
del test.tmp
set "var2=!str:*:=!"
echo var1=!var1! var2=!var2!
Update
I believe I've mostly figured out why the 2nd line is stripped from the output :)
It has to do with how pipes are handled by Windows cmd.exe, with each side being processed by a new CMD.EXE process. See Why does delayed expansion fail when inside a piped block of code? for a related question with a great answer from jeb.
Just looking at the left side of the piped command, I believe it is parsed (in memory) into a statement that looks like
C:\Windows\system32\cmd.exe /S /D /c" echo {delayedExpansionExpression}"
I use {delayedExpansionExpression} to represent the multi-line search and replace expansion that has not yet occurred.
Next, I think the variable expression is actually expanded and the line is broken in two by the search and replace:
C:\Windows\system32\cmd.exe /S /D /c" echo 45
abc"
Only then is the command executed, and by normal cmd.exe rules, the command ends at the linefeed. The quoted command string is missing the end quote, but the parser doesn't care about that.
The part I am still puzzled by is what happens to the abc"? I would have thought that an attempt would be made to execute it, resulting in an error message like 'abc"' is not recognized as an internal or external command, operable program or batch file. But instead it appears to simply get lost in the ether.
note - jeb's 3rd comment explains why :)
Safe version without FOR
My original solution will not work with a string like this & that:cats & dogs. Here is a variation without FOR that should work with nearly any string, except for string length limits and trailing control chars will be stripped from leading part.
@echo off
setlocal enableDelayedExpansion
set "str=this & that:cats & dogs"
set ^"str2=!str::=^

!^"
cmd /v:on /c echo ^^!str2^^!|findstr /v "$" >test.tmp
<test.tmp set /p "var1="
del test.tmp
set "var2=!str:*:=!"
echo var1=!var1! var2=!var2!
I delay the expansion until the new CMD process, and I use a quirk of FINDSTR regex that $ only matches lines that end with <cr>. The first line doesn't have it and the second does. The /v option inverts the result.
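The filtering step can be pictured with a short Python model. It assumes, as described above, that the echoed text reaches FINDSTR as a first line ending in a bare linefeed and a second line ending in carriage return plus linefeed; the function name is hypothetical.

```python
def findstr_v_dollar(raw: bytes):
    """Model of findstr /v "$": keep only the lines that do NOT end
    with a carriage return before the linefeed."""
    lines = raw.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece after the final linefeed
    return [ln for ln in lines if not ln.endswith(b"\r")]


piped = b"this & that\ncats & dogs\r\n"
print(findstr_v_dollar(piped))  # [b'this & that']
```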
Yes, I know this is a very old topic, but I just discovered it and I can't resist the temptation to post my solution:
@echo off
setlocal
set "str=45:abc"
set "var1=%str::=" & set "var2=%"
echo var1="%var1%" var2="%var2%"
You may read full details of this method here.
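As a rough illustration of why this one-liner works: percent expansion is a plain text substitution that happens before the line is split into commands, so replacing the colon with `" & set "var2=` turns one SET into two. The Python sketch below only models that text substitution (case-sensitively, unlike cmd); the helper name is made up.

```python
def percent_replace(value, search, replacement):
    """Rough model of %var:search=replacement% as a text substitution."""
    return value.replace(search, replacement)


s = "45:abc"

# Source line:  set "var1=%str::=" & set "var2=%"
line = 'set "var1=' + percent_replace(s, ":", '" & set "var2=') + '"'
print(line)                 # set "var1=45" & set "var2=abc"

# The unquoted & now separates two ordinary commands.
print(line.split(" & "))    # ['set "var1=45"', 'set "var2=abc"']
```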
In light of people posting all sorts of methods for splitting variables here, I might as well post my own method. It allows not just one but several splits out of a variable, all indicated by the same symbol, which is not possible with the REM method (which I used for some time, thanks @jeb).
With the method below, the string defined in the second line is split into three parts:
setlocal EnableDelayedExpansion
set fulline=one/two/three or/more
set fulline=%fulline%//
REM above line prevents unexpected results when input string has less than two /
set line2=%fulline:*/=%
set line3=%line2:*/=%
set line1=!fulline:/%line2%=!
set line2=!line2:/%line3%=!
setlocal DisableDelayedExpansion
echo."%line1%"
echo."%line2%"
echo."%line3%"
OUTPUT:
"one"
"two"
"three or/more//"
I recommend using the last partition created this way as a "bin" for the remaining "safety" split characters.
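The sequence of substitutions is easier to follow when traced step by step. The Python sketch below mirrors each SET line under the assumption that %var:*x=% removes everything up to and including the first x, and that %var:x=% removes every occurrence of x; the function names are invented, and the comparison here is case-sensitive, unlike cmd.

```python
def star_remove(s, sep):
    """Model of %s:*sep=%: drop everything up to and including the first sep."""
    head, found, tail = s.partition(sep)
    return tail if found else s


def split_three(full, sep="/"):
    full = full + sep + sep                  # set fulline=%fulline%//
    line2 = star_remove(full, sep)           # set line2=%fulline:*/=%
    line3 = star_remove(line2, sep)          # set line3=%line2:*/=%
    line1 = full.replace(sep + line2, "")    # set line1=!fulline:/%line2%=!
    line2 = line2.replace(sep + line3, "")   # set line2=!line2:/%line3%=!
    return line1, line2, line3


print(split_three("one/two/three or/more"))
# ('one', 'two', 'three or/more//')
```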
Here's a solution without nasty tricks for the leading piece
REM accepts userID@host
setlocal enableDelayedExpansion
set "str=%1"
set "host=%str:*@=%"
for /F "tokens=1 delims=@" %%F IN ("%str%") do set "user=%%F"
echo user@host = %user%@%host%
endlocal