How to escape semicolon in C system function - c

I am calling a utility program installed on Unix from a C program using system(). The input to call the utility program has arguments separated by semicolons, as below:
snprintf(buffer, sizeof(buffer), ". /path/to/program/env.sh && utilityname command WKS#%s\;at=%s", strmnm,dte);
system(buffer);
The issue is that the arguments after the semicolon are being ignored/treated as next command. I tried escaping with '\' as above \; but it is not working.

system invokes the shell. On Unix that is most commonly a derivative of sh or a derivative of csh; POSIX specifies only sh (thanks @KeithThompson). Both families of shells treat an unescaped ; as a command separator.
To escape a character according to shell rules, precede it with a backslash or enclose it in quotes. C has its own rules for quotes and backslashes in string literals, so more backslashes are usually needed: \; is not a valid C escape sequence, so compilers typically warn and drop the backslash, and the shell then sees a bare ;. Single quotes have the nice property of needing no backslashes in C strings.
So any of these should work:
"..... ';' ....."
"..... \";\" ....."
"..... \\; ....."

Related

Improperly terminated macro invocation when using # before the replacement text

I am new to C so this question might be dumb. Nevertheless, I was trying to play around with macro.
I was using # to make the replacement text into a quoted string and utilizing the feature that the macro will automatically add escape characters where appropriate.
In VS2019 the following code gives me an error saying improperly terminated macro invocation. Why is that?
#include <stdio.h>
#define toStr(x) #x
main(){
printf(toStr(the "\" is the escape character));
}
As @alinsoar mentioned in their comment, the string literal is not written properly. The escape character, '\', escapes the character that follows it. In "\" the backslash therefore escapes the quote after it, which leaves the string literal without a terminating quote. See How to print out a slash (/ or \) in C?
To get a properly terminated literal, escape the backslash itself, as in "\\". In an ordinary string literal "\\" stands for a single backslash. Note that the # operator stringifies the argument's source spelling, quotes and backslashes included, so with toStr the printed text shows "\\" exactly as it was typed.
See C Preprocessor, Stringify the result of a macro which provides information about the Stringify operator of the C Preprocessor, which is what the # for macro expansion does.
You should review the various rules for the escape character and its usage in C. This list for C++ is pretty much the same as for C. https://en.cppreference.com/w/cpp/language/escape
See as well Explain Backslash in C

What is the utility of escape sequence '\'?

In the code snippet below, how is '\' behaving?
printf("hii\"); // This line gives error : missing terminating " character
printf("hii\ n"); // This line prints hii n
I am unable to understand how this escape sequence behaves here. Please explain.
An escape sequence isn't the single \ character; it's that followed by another character. For example, \" is an escape sequence, as is \n. Under some circumstances you can see more than a single character following the backslash all as the same escape code; this has to do with how the characters are represented internally (ASCII or Unicode value) and can be safely ignored for now.
An escape sequence is used to write a character that is inconvenient/impossible to put into the code directly. For example, \" is the escape sequence for a quotation mark. It is like putting a quote inside the string, which you couldn't otherwise do because it would instead close the string literal. Look at the syntax highlighting of your question to see what I mean; most of the first line is considered part of the string, because you never have an unescaped closing quote.
The most common escape sequence is perhaps \n. Unlike with \", it doesn't just produce a literal n in the string; you could do that without an escape. Instead it produces a newline. The code
printf("hii\nthere");
prints
hii
there
to the screen.
The second line of code in your question uses the sequence \ followed by a space. This is not a standard escape sequence; if you compile with warnings enabled, your compiler will probably report an unknown escape sequence and ignore the backslash.
(If you want to actually print a backslash to the screen, you need to escape a backslash, using \\)

Single Quotes or No quotes in file paths in Unix shells

I am new to Unix systems and trying to learn something with the help of the terminal. I have the following question. If we can write a file path without single quotes in the terminal (for example: mv path1 path2), then why do we sometimes use single quotes to specify paths? What is the difference between the two?
This is not a question of the operating system, but of the shell you use. You can actually choose which shell to use on a Unix-like system if several are installed (which is usually the case).
In general the shell has to interpret the input you make. It has to decide how to handle the tokens of the input. What to consider as the "command" you want to execute, what as arguments. For the arguments it has to decide if the string is meant as a single argument or multiple arguments.
Without quotes (single or double), whitespace characters are treated as separators between words, and each word typically becomes a separate argument. That is how you specify multiple arguments for a single command. If that is not desired, you can use quote characters to group several words separated by whitespace into a single argument, for example a folder name containing a space character. This works because the shell then treats everything from the opening quote up to the next matching quote as a single argument (with special rules for escaped quote characters).
Quotes are used to protect spaces in file names; otherwise each space needs a backslash. For instance:
$ rm spaces\ in\ file\ name
$ rm 'spaces in file name'
If your file path contains no spaces or other characters that are special to the shell, it's probably safe to omit the quotes.

cmd - sed command input parse error

I'm translating a GNU Makefile into MSBuild XML.
I have the following sed command (part of a larger command):
... | sed 's/^Q(.*)/"&"/' | ...
When I execute just that sed portion in Cygwin, it "works" in the sense that it doesn't error out.
However, after I've translated it to MSBuild XML - replaced the XML-sensitive symbols with &quot;, &amp;, &apos; - I get the error
sed: unsupported command '
I'm sure it's not an issue with XML escaping issues; the Visual Studio build log says
Task Parameter:Command="C:\Program Files\GNU ARM Eclipse\Build Tools\2.8-201611221915\bin\sed" 's/^Q(.*)/"&"/' (TaskId:21)
sed: unsupported command ' (TaskId:21)
The command ""C:\Program Files\GNU ARM Eclipse\Build Tools\2.8-201611221915\bin\sed" 's/^Q(.*)/"&"/' " exited with code 1. (TaskId:21)
The command was translated into the originally intended sed 's/^Q(.*)/"&"/'
However, there now appears to be an issue with cmd.exe.
With respect to cmd.exe, what part of that command doesn't it like?
Is it the single/double quote? The ampersand?
I'm using the table from here.
The shell interpreters on Unix/Linux/Mac support at least three types of argument strings:
Unquoted argument strings, for simple strings containing neither a space character nor a character with a special meaning for the shell interpreter.
" quoted argument strings, for strings that contain a space character or a character with a special meaning for the shell interpreter but do not contain ".
' quoted argument strings, for strings that contain a space character or a character with a special meaning for the shell interpreter but do not contain '.
The Windows command line interpreter supports only two types of argument strings:
Unquoted argument strings, for simple strings containing neither a space character nor one of these characters: &()[]{}^=;!'+,`~|<>
" quoted argument strings, for strings that contain a space character or a character with a special meaning for the Windows command interpreter but do not contain ".
For that reason a regular expression string as argument string containing " is problematic on Windows as there is no alternate solution for quoting the argument string as on *nix/Mac shells.
A solution for regular expressions in Perl syntax is using the hexadecimal notation \x22 for double quote character " in the regular expression.
So "s/^Q(.*)/\x22&\x22/" could work on Windows as well as on Linux instead of 's/^Q(.*)/"&"/'.
However, the command line argument strings are finally processed by the called application and it depends on the application and the used compiler how the application interprets the arguments. Therefore it depends always also on the application what finally works. See C code in this answer and the two different results on running this code compiled with three different compilers with a misquoted argument string. Another example for application dependent interpretation of argument string is explained in answer on How to set environment variables with spaces?
For that reason using just s/^Q(.*)/"&"/ without enclosing the regular expression argument string in the single quotes ' works for sed on Windows too.

Find and replace multi line file content in files

What I want to do is:
find some_files -name '*.html' -exec sed -i "s/`cat old`/`cat new`/g" {} \;
with old and new containing newline characters and slashes and other special characters, which prevent sed from parsing correctly.
I have read about how to escape newline characters with sed, and the command tr, the command printf '%q', but I can't make these work properly, maybe because I don't fully understand their function. Additionally, I don't know which special characters I still have to escape for sed to work.
I'm not sure what you want to do exactly, but if the old file contains newlines, you're probably going to run into trouble. That is because sed works by applying the commands on each line, so trying to match a line with a pattern that represents multiple lines will not work unless you load more lines explicitly.
My suggestion would be to load the whole file into sed's "buffer" before applying the substitute command. Then, you'd have to make sure that old and new are escaped correctly. Also, what could become more confusing is that escaping for the old file (the pattern) must be different than for the new file (the replacement).
Let's start by escaping the new file into a "new.tmp" file. For clarity, we'll create a sed script called "escape_new.sed":
#!/bin/sed -f
# Commas used as separators
s,\\,\\\\,g
s,$,\\,g
s,[/&],\\&,g
$ a/
Then run it: sed -f escape_new.sed new > new.tmp
There are three commands we use to escape:
Backslashes should be preceded by another backslash
Newlines should be preceded by a backslash (we do this by adding a backslash before the end of the line).
Ampersands and slashes should be preceded by a backslash (notice that the & at the replacement text is actually an operator that contains the match, therefore if it matches the slash it contains the slash, and if it matches the ampersand, it contains the ampersand).
On the last line (referred to with the "$" symbol), we append (through the "a" command) a slash. This is the closing slash for the substitute command we will be using later. We have to put it here because the backticks remove any trailing newlines from the input, and that can cause problems (for example, a backslash meant to quote a newline would end up quoting the terminating slash instead).
Now let's escape the old file. As above, we'll create an "escape_old.sed" script. Before we do it though, we need to load the whole file into the pattern space (sed's internal buffer) so we can replace newline characters. We can do that with the following commands:
: a
$! {
N
b a
}
The first command creates a label called "a". The second command ("{") starts a group of commands. The magic here is the "$!" address prefix, which tells sed to run the group only if the line just read was not the last line of the input ("$" means the last line of the input and "!" means not). The first command in the group, "N", appends the next input line to the pattern space. If "N" is executed on the last line, it terminates the script, so we must be careful not to execute it there. The second command in the group is a branch command, "b", which jumps back to the "a" label. The closing brace ends the group. This group, with its address prefix, lets us loop through all of the lines, concatenating them together, and stop after the last line so that any further commands can run. We then have the final script:
#!/bin/sed -f
: a
$! {
N
b a
}
s,\\,\\\\,g
s,\n,\\n,g
s,[][/^$.*],\\&,g
As above, we need to escape the special characters. In this case an actual newline is escaped as a backslash followed by the letter n. In the last command, there are more characters that need to be prefixed by a backslash, including the * repetition operator. Notice that to match a closing square bracket, it needs to be the first character inside the square brackets, to prevent sed from interpreting it as the closing character of our list of characters to match. Therefore, the characters listed in order between the square brackets are ][/^$.* .
And again, we execute it with: sed -f escape_old.sed old > old.tmp
Now we can use these escaped files in the sed command, but again we must load all of the lines into the pattern space. Using the same commands as before, but placing them on a single line, we have the compact form :a;$!{N;ba}; which we can now use in the final expression (without the closing slash character, which is now in the new.tmp file):
find some_files -name '*.html' -exec sed -e ":a;\$!{N;ba};s/`cat old.tmp`/`cat new.tmp`g" -i {} \;
And hopefully it will work =)
Notice that we have escaped the $ symbol with a backslash; otherwise the shell would try to expand the $! variable (the process ID of the most recent background command).
