repl.bat : Using Regular Expressions in the Replace Parameter - batch-file

I have a question about using the 'repl' batch command, specifically its replace parameter.
After taking the time to read the documentation :-) and doing some testing, it seems that regular expressions can't be used in the replace parameter.
type file.txt | repl "Jacob is alive. He lives.\n" "Betty lives.\nGo Betty." M >file.txt.new
This does a literal replace using the characters '.' and '\n' rather than inserting a newline. Is it true that regular expressions cannot be used in the [replace] parameter of repl.bat? If so, do you know of a way to achieve this behavior? Thanks ahead of time!

Extracted from the repl.bat /? information
M - Multi-line mode. The entire contents of stdin is read and
processed in one pass instead of line by line, thus enabling
search for \n. This also enables preservation of the original
line terminators. If the M option is not present, then every
printed line is terminated with carriage return and line feed.
The M option is incompatible with the A option unless the S
option is also present.
X - Enables extended substitution pattern syntax with support
for the following escape sequences within the Replace string:
\\ - Backslash
\b - Backspace
\f - Formfeed
\n - Newline
\q - Quote
\r - Carriage Return
\t - Horizontal Tab
\v - Vertical Tab
\xnn - Extended ASCII byte code expressed as 2 hex digits
\unnnn - Unicode character expressed as 4 hex digits
So your repl command options should be MX instead of only M:
type file.txt | repl "Jacob is alive. He lives.\n" "Betty lives.\nGo Betty." MX >file.txt.new

Related

JREPL.bat regex replacing inside quotes

I'm using JREPL.BAT to find and replace specific instances, and my regex works for find and replace in VS Code and also in the couple of regex editors I've used.
CALL ./framework/config/JREPL.BAT "(Error)+\(([^()]*|\(([^()]*|\([^()]*\))*\))*\)" "Error(\"\")" /f ./dist/index.html /o
So what I'm expecting is for it to find any case of
Error("")
or
Error( skjdksjdskd() + "" + )
etc
Find and replace works perfectly in those tools, but JREPL takes
Error( skjdksjdskd() + "" + )
and changes it to
Error()( skjdksjdskd() + "" + )
Does anyone with more JREPL experience know why it's ignoring the quotes and also not replacing the () area?
JREPL is hybrid JScript/batch that uses CSCRIPT - the Windows script host.
CSCRIPT has an inherent limitation that prevents double quote literals from being passed as parameters - there is no CSCRIPT escape sequence that includes a " literal.
To include a " literal in your query string, you can use \x22 instead. All of the standard JScript escape sequences can be used in the query string. By default, escape sequences are not recognized in the replace string.
But you want a quote literal in your replace string. This requires the /XSEQ option so you can use the JREPL extended escape sequence of \q. A significant advantage of this option is that you can also use the extended escape sequences in the search string. You could also use \x22 for both the search and replace strings if you prefer, but I find \q much easier to remember.
You have one other potential problem - the CALL command doubles all quoted carets, so [^()] (any character other than ( or )) becomes [^^()] (any character other than ^, ( or )). This is definitely not what you want. That is the reason I added the \c = ^ extended escape sequence.
So I believe the following will give your expected result:
CALL .\framework\config\JREPL.BAT "(Error)+\(([\c()]*|\(([\c()]*|\([\c()]*\))*\))*\)" "Error(\q\q)" /xseq /f .\dist\index.html /o -
FYI - The effect of the ^ beginning of string anchor is not harmed by caret doubling - you don't need the \c escape sequence for the beginning of string anchor because "^MatchStringBeginning" and "^^MatchStringBeginning" yield identical regex results.
You can get more information about the extended escape sequences by issuing jrepl /?/xseq, or jrepl /??/xseq for paged help.
>jrepl /?/xseq
/XSEQ - Enables extended escape sequences for both Search strings and
Replacement strings, with support for the following sequences:
\\ - Backslash
\b - Backspace
\c - Caret (^)
\f - Formfeed
\n - Newline
\q - Quote (")
\r - Carriage Return
\t - Horizontal Tab
\v - Vertical Tab
\xnn - Extended ASCII byte code expressed as 2 hex digits nn.
The code is mapped to the correct Unicode code point,
depending on the chosen character set. If used within
a Find string, then the input character set is used. If
within a Replacement string, then the output character
set is used. If the selected character set is invalid or
not a single byte character set, then \xnn is treated as
a Unicode code point. Note that extended ASCII character
class ranges like [\xnn-\xnn] should not be used because
the intended range likely does not map to a contiguous
set of Unicode code points - use [\x{nn-mm}] instead.
\x{nn-mm} - A range of extended ASCII byte codes for use within
a regular expression character class expression.
The min value nn and max value mm are expressed as hex
digits. The range is automatically expanded into the
full set of mapped Unicode code points. The character
set mapping rules are the same as for \xnn.
\x{nn,CharSet} - Same as \xnn, except explicitly uses CharSet
character set mapping.
\x{nn-mm,CharSet} - Same as \x{nn-mm}, except explicitly uses
CharSet character set mapping.
\unnnn - Unicode code point expressed as 4 hex digits nnnn.
\u{N} - Any Unicode code point where N is 1 to 6 hex digits
JREPL automatically creates an XBYTES.DAT file containing all 256
possible byte codes. The XBYTES.DAT file is preferentially created
in "%ALLUSERSPROFILE%\JREPL\" if at all possible. Otherwise the
file is created in "%TEMP%\JREPL\" instead. JREPL uses the file
to establish the correct \xnn byte code mapping for each character
set. Once created, successive runs reuse the same XBYTES.DAT file.
If the file gets corrupted, then use the /XBYTES option to force
creation of a new XBYTES.DAT file. If JREPL cannot create the file
for any reason, then JREPL silently defaults to using pre v7.4
behavior where /XSEQ \xnn is interpreted as Windows-1252. Creation
of XBYTES.DAT requires either CERTUTIL.EXE or ADO. It is possible
that both may be missing from an XP machine.
Without the /XSEQ option, only standard JSCRIPT escape sequences
\\, \b, \f, \n, \r, \t, \v, \xnn, \unnnn are available for the
search strings. And the \xnn sequence represents a Unicode
code point, not extended ASCII.
Extended escape sequences are supported even when the /L option
is used. Both Search and Replace support all of the extended
escape sequences if both the /XSEQ and /L options are combined.
Extended escape sequences are not applied to JScript code when
using any of the /Jxxx options. Use the decode() function if
extended escape sequences are needed within the code.
The final answer for this was to escape the quotes and backslashes as \" and \\ when using CALL in Webpack-shell-plugin:
'call "./framework/config/JREPL.BAT" \"(Error)\\(([\\c()]*|\\(([\\c()]*|\\([\\c()]*\\))*\\))*\\)\" \"Error(\\q\\q)\" /xseq /f ./dist/index.html /o ./dist/indexFinal.html'

What is the utility of escape sequence '\'?

In the code snippet below, how is '\' behaving?
printf("hii\"); // This line gives error : missing terminating " character
printf("hii\ n"); // This line prints hii n
I don't understand how this escape sequence behaves here. Please explain.
An escape sequence isn't the single \ character; it's that followed by another character. For example, \" is an escape sequence, as is \n. Under some circumstances you can see more than a single character following the backslash all as the same escape code; this has to do with how the characters are represented internally (ASCII or Unicode value) and can be safely ignored for now.
An escape sequence is used to write a character that is inconvenient/impossible to put into the code directly. For example, \" is the escape sequence for a quotation mark. It is like putting a quote inside the string, which you couldn't otherwise do because it would instead close the string literal. Look at the syntax highlighting of your question to see what I mean; most of the first line is considered part of the string, because you never have an unescaped closing quote.
The most common escape sequence is perhaps \n. Unlike with \", it doesn't just produce a literal n in the string; you could do that without an escape. Instead it produces a newline. The code
printf("hii\nthere");
prints
hii
there
to the screen.
The second line of code in your question uses the escape sequence \  (backslash followed by a space). This is not a standard escape sequence; if you compile with warnings, your compiler will probably report an unknown escape sequence. Compilers such as GCC then drop the backslash and keep the space, which is why that line prints hii n.
(If you want to actually print a backslash to the screen, you need to escape it, using \\.)

C: Ignoring a specific character, while using fscanf

As an example I have a text file that includes this text: "name?"
I want to save this String only as name?
I tried ("%["]"), but this doesn't work.
Which function should I use?
The scanf and fscanf functions work the same way; fscanf just reads from the FILE * you pass it. Your format, however, is wrong.
Try instead e.g. "\"%[^\"]\"" as your format.
The first and last " mark the start and end of the string literal. Inside the literal you can't use a plain double quote, because that would end the string, so the quotes you want to match have to be escaped with a backslash.
If we break down the format string into its three main components:
\" - This matches the literal double-quote
%[^\"] - This matches a string not containing the double-quote (the negation is what the ^ does)
Lastly \" again, to match the end quote of your input

What do \t and \b do?

I expect this simple line of code
printf("foo\b\tbar\n");
to replace "o" with "\t" and to produce the following output
fo      bar
(assuming that tab stop occurs every 8 characters).
On the contrary I get
foo     bar
It seems that my shell interprets \b as "move the cursors one position back" and \t as "move cursor to the next tab stop". Is this behaviour specific to the shell in which I'm running the code? Should I expect different behaviour on different systems?
No, that's more or less what they're meant to do.
In C (and many other languages), you can insert hard-to-see/type characters using \ notation:
\a is alert/bell
\b is backspace/rubout
\n is newline
\r is carriage return (return to left margin)
\t is tab
You can also specify the octal value of any character using \ooo (one to three octal digits o), or the hexadecimal value of any character with \xnn.
E.g., the ASCII value of _ is octal 137, hex 5f, so it can also be typed \137 or \x5f, if your keyboard didn't have a _ key or something. This is more useful for control characters like NUL (\0) and ESC (\033).
As someone posted (then deleted their answer before I could +1 it), there are also some less-frequently-used ones:
\f is a form feed/new page (eject page from printer)
\v is a vertical tab (move down one line, on the same column)
On screens, \f usually works the same as \v, but on some printers/teletypes, it will go all the way to the next form/sheet of paper.
Backspace and tab both move the cursor position. Neither is truly a 'printable' character.
Your code says:
print "foo"
move the cursor back one space
move the cursor forward to the next tabstop
output "bar".
To get the output you expect, you need printf("foo\b \tbar"). Note the extra 'space'. That says:
output "foo"
move the cursor back one space
output a ' ' (this replaces the second 'o').
move the cursor forward to the next tabstop
output "bar".
Most of the time it is inappropriate to use tabs and backspace for formatting your program output. Learn to use printf() formatting specifiers. Rendering of tabs can vary drastically depending on how the output is viewed.
This little script shows one way to alter your terminal's tab rendering. Tested on Ubuntu + gnome-terminal:
#!/bin/bash
tabs -8
echo -e "\tnormal tabstop"
for x in `seq 2 10`; do
tabs $x
echo -e "\ttabstop=$x"
done
tabs -8
echo -e "\tnormal tabstop"
Also see man setterm and regtabs.
And if you redirect your output or just write to a file, tabs will quite commonly be displayed as fewer than the standard 8 chars, especially in "programming" editors and IDEs.
So in other words:
printf("%-8s%s", "foo", "bar"); /* this will ALWAYS output "foo     bar" */
printf("foo\tbar"); /* who knows how this will be rendered */
IMHO, tabs in general are rarely appropriate for anything. An exception might be generating output for a program that requires tab-separated-value input files (similar to comma separated value).
Backspace '\b' is a different story... it should never be used to create a text file since it will just make a text editor spit out garbage. But it does have many applications in writing interactive command line programs that cannot be accomplished with format strings alone. If you find yourself needing it a lot, check out "ncurses", which gives you much better control over where your output goes on the terminal screen. And typically, since it's 2011 and not 1995, a GUI is usually easier to deal with for highly interactive programs. But again, there are exceptions. Like writing a telnet server or console for a new scripting language.
The C standard (actually C99, I'm not up to date) says:
Alphabetic escape sequences representing nongraphic characters in the execution character set are intended to produce actions on display devices as follows:
\b (backspace) Moves the active position to the previous position on the current line. [...]
\t (horizontal tab) Moves the active position to the next horizontal tabulation position on the current line. [...]
Both just move the active position; neither is supposed to write any character on or over another character. To overwrite with a space you could try puts("foo\b \tbar");, but note that on some display devices (say, a daisy wheel printer) the o will still show, because printing a space over it does not erase it.
This behaviour is terminal-specific and specified by the terminal emulator you use (e.g. xterm) and the semantics of terminal that it provides. The terminal behaviour has been very stable for the last 20 years, and you can reasonably rely on the semantics of \b.
\t is the tab character, and it is doing exactly what you'd expect given the action of \b: after "foo" the cursor is at column 3, \b moves it back to column 2, and \t then moves it to the next tab stop (column 8, assuming stops every 8 columns). That is the same tab stop it would have reached without the \b, so the output looks unchanged.

How do I replace a word in a text file with another word and an action?

Hi all,
Suppose we have a text file (file1.txt).
file1.txt contains many words, spaces, and line breaks (CR+LF).
I want to find a specific word that is followed by a line break and replace it with just that word, i.e. eliminate the CR+LF after it.
How?
Thank you
I assume you're asking how to do it programmatically.
CR and LF are characters, so they have ASCII codes assigned (13 and 10, respectively). You'll need to load the text file and copy it to a new buffer word by word; whenever you encounter the word you want to replace, check whether it is followed by 13, 10 (CR, LF) and, if so, don't copy those two characters.
Then write the new buffer back to the file.
Use of regular expressions should make short work of this:
replace word\r\n with word
How exactly this is done depends on your environment, editor, or tools. You mentioned CR + LF, which hints that you're using Windows.
If you use Notepad++, for example, it has built-in regex support, and you can use these facilities to reach your goal.
Update: I have tried this variant, and it works:
Download Vim for Windows.
Open your file in Vim.
In it, issue the following command:
%s/\v([[:digit:]]+NPN[[:alpha:]]+)\n/\1  /g
Explanation:
%s - work for all lines
\v - "very magic" mode, so fewer characters in the regex need a backslash
([[:digit:]]+NPN[[:alpha:]]+) - match some digits, then NPN, then letters and capture this
\n - match end of line
\1 - replace everything with first group and two spaces
g - do this many times for each line (this is basically optional)
If you want to convert CRLF to LF:
sed 's/.$//' # assumes that all lines end with CR/LF
If you want to remove CRLF altogether
cat file1.txt | tr '\n' ' ' # join the lines with a space
cat file1.txt | tr -d '\n' # join the lines without a space
On CRLF files, tr only removes the LF and leaves the CR behind, so you may need to convert the line endings to Unix style (CRLF to LF) first, or delete both characters with tr -d '\r\n'.
