How to portably use "${@:2}"? - arrays

In "Allow for ${@:2} syntax in variable assignment", they say I should not use "${@:2}" because it breaks things across different shells, and that I should use "${*:2}" instead.
But using "${*:2}" instead of "${@:2}" is nonsense, because "${@:2}" is not equivalent to "${*:2}", as the following example shows:
#!/bin/bash
check_args() {
echo "\$#=$#"
local counter=0
for var in "$@"
do
counter=$((counter+1));
printf "%s. '%s', " "$counter" "$var";
done
printf "\\n\\n"
}
# setting arguments
set -- "space1 notspace" "space2 notspace" "lastargument"; counter=1
echo $counter': ---------------- "$*"'; counter=$((counter+1))
check_args "$*"
echo $counter': ---------------- "${*:2}"'; counter=$((counter+1))
check_args "${*:2}"
echo $counter': ---------------- "${@:2}"'; counter=$((counter+1))
check_args "${@:2}"
-->
GNU bash, version 4.4.12(1)-release (x86_64-pc-linux-gnu)
1: ---------------- "$*"
$#=1
1. 'space1 notspace space2 notspace lastargument',
2: ---------------- "${*:2}"
$#=1
1. 'space2 notspace lastargument',
3: ---------------- "${@:2}"
$#=2
1. 'space2 notspace', 2. 'lastargument',
If I cannot use "${@:2}" (as they say), what equivalent can I use instead?
This is the original question, "Process all arguments except the first one (in a bash script)", and the only answer there that keeps arguments with spaces together is to use "${@:2}".

There's context that's not clear in the question unless you follow the links. It concerns the following recommendation from shellcheck.net:
local _help_text="${@:2}"
^––SC2124 Assigning an array to a string! Assign as array, or use * instead of @ to concatenate.
Short answer: Don't assign lists of things (like arguments) to plain variables, use an array instead.
Long answer: Generally, "${@:2}" will get all but the first argument, with each treated as a separate item ("word"). "${*:2}", on the other hand, produces a single item consisting of all but the first argument stuck together, separated by a space (or whatever the first character of $IFS is).
But in the specific case where you're assigning to a plain variable, the variable is only capable of storing a single item, so var="${@:2}" also collapses the arguments down to a single item, but it does it in a less consistent way than "${*:2}". In order to avoid this, use something that is capable of storing multiple items: an array. So:
Really bad: var="${@:2}"
Slightly less bad: var="${*:2}"
Much better: arrayvar=("${@:2}") (the parentheses make this an array)
Note: to get the elements of the array back, with each one treated properly as a separate item, use "${arrayvar[@]}". Also, arrays are not supported by all shells (notably, dash doesn't support them), so if you use them you should be sure to use a bash shebang (#!/bin/bash or #!/usr/bin/env bash). If you really need portability to other shells, things get much more complicated.
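The difference between the array form and the joined form can be seen in a minimal bash sketch. The helper names count_items and rest_as_array and the global array rest are made up for this illustration and do not come from the question:

```shell
#!/bin/bash
# count_items: hypothetical helper; reports how many separate words it received.
count_items() { echo "$#"; }

# rest_as_array: hypothetical helper; stores all but the first argument
# in the global array "rest" (bash only, arrays are not POSIX).
rest_as_array() {
    rest=("${@:2}")
}

set -- "first" "space1 notspace" "lastargument"
rest_as_array "$@"
count_items "${rest[@]}"   # every array element stays a separate word

joined="${*:2}"            # one string, joined with the first character of IFS
count_items "$joined"
```

With the three arguments set above, the first call reports 2 and the second reports 1, which mirrors the `$#` values in the question's output.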

Neither ${@:2} nor ${*:2} is portable, and many shells will reject both as invalid syntax. If you want to process all arguments except the first, you should get rid of the first with a shift.
first="${1}"
shift
echo The arguments after the first are:
for x; do echo "$x"; done
At this point, the first argument is in "$first" and the positional parameters are shifted down one.
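The same idea can be wrapped in a function so the caller's own positional parameters are left alone. This is only a sketch using POSIX sh features; the function name print_rest is an assumption made for the example:

```shell
#!/bin/sh
# print_rest: hypothetical helper; prints every argument after the first,
# one per line, without relying on bash-only slicing such as ${@:2}.
print_rest() {
    [ "$#" -gt 0 ] || return 0   # nothing to drop when called without arguments
    shift                        # discard the first argument
    for arg in "$@"; do          # "$@" keeps arguments with spaces intact
        printf '%s\n' "$arg"
    done
}

print_rest "first" "space1 notspace" "lastargument"
```

Because shift runs inside the function, it only affects the function's own argument list, so the script's "$1" is still available afterwards.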

This demonstrates how to combine all "${@}" arguments into a single variable without the hack ${@:1} or ${@:2}:
#!/bin/bash
function get_all_arguments_as_single_one_unquoted() {
single_argument="$(printf "%s " "${@}")";
printf "unquoted arguments %s: '%s'\\n" "${#}" "${single_argument}";
}
function get_all_arguments_as_single_one_quoted() {
single_argument="${1}";
printf "quoted arguments %s: '%s'\\n" "${#}" "${single_argument}";
}
function escape_arguments() {
escaped_arguments="$(printf '%q ' "${@}")";
get_all_arguments_as_single_one_quoted "${escaped_arguments}";
get_all_arguments_as_single_one_unquoted ${escaped_arguments};
}
set -- "first argument" "last argument";
escape_arguments "${@}";
-->
GNU bash, version 4.4.12(1)-release (x86_64-pc-linux-gnu)
quoted arguments 1: 'first\ argument last\ argument '
unquoted arguments 4: 'first\ argument last\ argument '
As @William Pursell's answer points out, if you want only the "${@:2}" arguments, you can add a shift call before using "${@}":
function escape_arguments() {
shift;
escaped_arguments="$(printf '%q ' "${@}")";
get_all_arguments_as_single_one_quoted "${escaped_arguments}";
get_all_arguments_as_single_one_unquoted ${escaped_arguments};
}
-->
GNU bash, version 4.4.12(1)-release (x86_64-pc-linux-gnu)
quoted arguments 1: 'last\ argument '
unquoted arguments 2: 'last\ argument '

Related

How to use C code variable inside system()

I am using C code with sed. I want to read lines in the interval 1-10,11-20 etc. to perform some calculation.
int i,j,m,n;
for(i=0;i<10;i++){
j=i+1;
//correction. m,n is modified which was incorrect earlier.
m=i*10;
n=j*10;
system("sed -n 'm,n p' oldfile > newfile");
}
Output:
m,n p
It looks like the variables are not passed to system(). Is there any way to do that?
Use sprintf to build the command line:
char cmdline[100];
sprintf(cmdline, "sed -n '%d,%dp' oldfile.txt > newfile.txt", 10*i+1, 10*(i+1));
puts(cmdline); // optionally, verify manually it's going to do the right thing
system(cmdline);
(This is vulnerable to buffer overflow, but if your command-line arguments are not too flexible, 100 bytes should be enough.)
You cannot replace part of a string literal in C. What you need to do is:
1. Form a string with patterns (format specifiers).
2. Replace those patterns with the proper values using the formatted I/O functions.
sprintf()/snprintf() will be your friend in this. You can do something like this (copying from pmg's comment):
char cmd[100];
snprintf(cmd, sizeof cmd, "sed -n '%d,%dp' oldfile > newfile", 10*i+1, 10*(i+1)); // sizeof keeps the limit in sync with the buffer
system(cmd);

How to let a user enter command line arguments in any order?

I have to write an encrypt/decrypt C program and to start off it needs to take 6 CL arguments. This is normally fine for me, but this time the argument order needs to not matter. The flags also always match the argument type. eg. -t will always be before a csv file.
For example, the following are all equivalent ways to run the program and will yield the same behavior:
./encrypt -t mappingfile.csv -m 1 -i words.txt
./encrypt -m 2 -i words.txt -t mappingfile.csv
./encrypt -m 1 -i words_to_encrypt.txt -t mappingfile.csv
I'm not exactly sure how to check for this, any info helps! Thanks!
If you don't want to use some other library and would rather handle everything yourself, you would want to run a loop to handle needed arguments.
Typically, if you are certain that a value will come after an argument, you can do something like this:
for (int i = 1; i < argc; i++) { // start at 1: argv[0] is the program name
if ( [ argv[i] is equal to some tag ] ) {
[ handle the value at argv[i+1] ]
i++; // Skip the next arg
} else if ( [ argv[i] is equal to next tag ] ) {
} // Use for any additional tags you need
}
You can add a check before handling the argument to ensure that i+1 does not pass the bound of the argv array (that is, i + 1 < argc). To check if the argument is equal you could use the <string.h> function strcmp() or write your own.
Handling the argument could be something as simple as copying the string into some other char[] array or maybe even remembering the index of the desired argument. That all depends on how you want to use it.
Looping through the tags like this means you will not need them to be in any particular order.
--- Hope my first SA answer wasn't too bad :)

Finding "main" functions' names in a C file via Bash script

I have a large number of C files that are structured according to the following principles:
All functions are declared in the C file and are with return type int, double or void.
All functions start with "ksz_". Only functions use this - nothing else uses "ksz_" in their names.
The file contains "main" functions. All supporting functions use their "main" function's name to form themselves.
Because they were written by different people, they are quite messy and have spaces in random places.
A rough visualization would be (note the spaces):
int ksz_Print(...)
{
...
}
void ksz_Print_Helper1 (... ){
...
}
void ksz_Print_Helper2(...) {
...
}
int ksz_Input(...){
...
}
double ksz_Input_Helper1 ( ...){
...
}
I need to find the "main" function names of each individual C file in order to use them for another search algorithm.
Since these files are huge (some of them have over a dozen thousand lines) and there are hundreds of them, I need a Bash script for this.
Ideally this script would extract only the "main" functions:
ksz_Print
ksz_Input
What stops me is that I can't work out the regex for my grep to extract the function lines. I think its logic should look like this:
(spaces)(int/float/double)(spaces)(ksz_)(other characters without space)(spaces)(open bracket)
After that I guess I'll extract the word containing "ksz_" from each line with cut (after trimming and removing duplicate spaces).
And last I'll need to find a way to filter out the supporting functions.
But what would be my initial grep in this script?
If I understand your specifications correctly this should do it:
root@local [~]# awk '/^[ \t]*(int|float|double)[ \t]+ksz_/ {print $2}' sample.txt
One thing I did not understand was whether there should only be one "_" after ksz so for example if "double ksz_Input_Helper1" is not something you want to match. In the regex above it does match.
I also chose to go with awk rather than grep because you said you want only the name; the awk above prints only the second field, using whitespace as a delimiter. If you still want to use grep, this one matches the same lines:
root@local [~]# egrep '^\s*(int|float|double)\s+ksz_' sample.txt
Here is a breakdown (note that in awk I use [ \t] in place of \s, as I could not get it to recognize \s):
^ - match start of line
\s* - match if there are 0 or more white spaces
(int|float|double) - match int, float, OR double
\s+ - match at least one whitespace
ksz_ - match literal string "ksz_"
Try using a regex that only matches the portion you want and only print that:
grep -oRE "(ksz_[a-zA-Z_]*\b)" *
-o - output only match
-R - recursive
-E - extended regex
[a-zA-Z_] - upper and lower case letters, underscore
\b - ending at word boundary
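Neither answer filters out the supporting functions. One possible way to do that is sketched here, under the assumptions that each definition has its return type and name on the same line and that a helper is always named "<main name>_<something>". The function name list_main_functions is invented for the example:

```shell
#!/bin/sh
# list_main_functions FILE: hypothetical helper; prints the ksz_ names that
# are not "<another ksz_ name>_<suffix>", i.e. the "main" functions.
list_main_functions() {
    # 1. Keep only "<type> ksz_Name" from lines that start a definition.
    grep -oE '^[[:space:]]*(int|double|void)[[:space:]]+ksz_[A-Za-z0-9_]+' "$1" |
        awk '{ print $2 }' |
        sort -u |
        # 2. Drop every name that begins with another name followed by "_".
        awk '
            { names[NR] = $0 }
            END {
                for (i = 1; i <= NR; i++) {
                    helper = 0
                    for (j = 1; j <= NR; j++)
                        if (i != j && index(names[i], names[j] "_") == 1)
                            helper = 1
                    if (!helper) print names[i]
                }
            }'
}

# Usage (file name is a placeholder): list_main_functions some_file.c
```

The comparison is quadratic in the number of function names per file, which should be acceptable for a few hundred names. The names are printed in sorted order rather than in file order.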

Splitting a file into multiple files on a delimiter with awk

I am trying to split files evenly in a number of chunks. This is my code:
awk '/*/ { delim++ } { file = sprintf("splits/audio%s.txt", int(delim /2)); print >> file; }' < input_file
My file looks like this:
"*/audio1.lab"
0 6200000 a
6200000 7600000 b
7600000 8200000 c
.
"*/audio2.lab"
0 6300000 a
6300000 8300000 w
8300000 8600000 e
8600000 10600000 d
.
It is giving me an error: awk: line 1: syntax error at or near *
I do not know enough about awk to understand this error. I tried escaping characters but still haven't been able to figure it out. I could write a script in python but I would like to learn how to do this in awk. Any awkers know what I am doing wrong?
Edit: I have 14021 files. I gave the first two as an example.
For one thing, your regular expression is illegal; '*' says to match the previous character 0 or more times, but there is no previous character.
It's not entirely clear what you're trying to do, but it looks like when you encounter a line with an asterisk you want to bump the file number. To match an asterisk, you'll need to escape it:
awk '/\*/ { close(file); delim++ } { file = sprintf("splits/audio%d.txt", int(delim /2)); print >> file; }' < input_file
Also note %d is the correct format character for decimal output from an int.
I don't know what all the other stuff around this question is about, but to just split your input file into separate output files all you need is:
awk '/\*/{close(out); out="splits/audio" (++c) ".txt"} {print > out}' file
"Repetition" metacharacters like * or ? or + can take on a literal meaning when they are the first character in a regexp, so the regexp /*/ will work just fine in some awks (e.g. gawk) but not all. Since you apparently have a problem with having too many files open, you must not be using gawk (which manages files for you), so you probably need to escape the * and close() each output file when you're done writing to it. There is no harm in doing that, and it makes the script portable to all awks.
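The escape-and-close approach can be wrapped in a small shell function. This is a sketch under the assumption that every chunk starts at a line containing a literal asterisk. The function name split_on_marker and its two parameters are made up for the example:

```shell
#!/bin/sh
# split_on_marker INPUT OUTDIR: hypothetical helper; starts a new output file
# (OUTDIR/audio1.txt, OUTDIR/audio2.txt, ...) at every line containing "*".
split_on_marker() {
    mkdir -p "$2" || return 1
    awk -v dir="$2" '
        /\*/ {
            if (out != "") close(out)        # release the previous file handle
            out = dir "/audio" (++n) ".txt"  # name of the next chunk
        }
        out != "" { print > out }            # skip lines before the first marker
    ' "$1"
}

# Usage (paths are placeholders): split_on_marker input_file splits
```

Closing each file before opening the next keeps only one output file open at a time, so the number of chunks is not limited by the open-file limit.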

Regex to detect beginning of the C function body

I am working on a Perl script that prints the required function body from a C source file. I have written a regex to get to the start of the function body:
(/(void|int)\s*($function_name)\s*\(.*?\)\s*{/s
But this works only for functions returning void or int (basic types).
How can I change this regex to handle user-defined data types (structs or pointers)?
Try this one (untested!), although it does expect the function to start at the beginning of a line :
/
^ # Start of line
\s*(?:struct\s+)?[a-z0-9_]+ # Return type, optionally prefixed by "struct"
\s*\** # return type can be a pointer
\s*([a-z0-9_]+) # Function name
\s*\( # Opening parenthesis
(
(?:struct\s+)? # Maybe we accept a struct?
\s*[a-z0-9_]+\** # Argument type
\s*(?:[a-z0-9_]+) # Argument name
\s*,? # Comma to separate the arguments
)*
\s*\) # Closing parenthesis
\s*\{? # Maybe a {
\s*$ # End of the line
/mxi # Close our regex; x allows the whitespace and comments, i makes it case insensitive
You can squeeze all of these into a single line by removing the whitespace and comments.
Parsing code with a regex is generally hard though, and this regex is not perfect at all.
