unix file utility: magic syntax


I would like to create a custom magic file for the file utility, but I'm having a really hard time understanding the syntax described in man magic.
I need to test several places, each of which can contain several strings. Only if all the tests succeed would it print a file type.
To summarize, I would like a test similar to this if it were fields in an SQL database:
( byte_0 = "A" OR byte_0 = "B" OR byte_0 = "C" )
AND
( byte_1_to_3 = "DEF" OR byte_1_to_3 = "GHI" OR byte_1_to_3 = "JKL" )
Or in Perl regexp syntax:
m/^
[ABC]
(DEF|GHI|JKL)
/x

file has its own syntax, with hundreds of examples. If the documentation is unclear, you should start by reading examples which are close to your intended changes. That's what I did with ncurses for example, in the terminfo magic-file, to describe the Solaris xcurses header as a sequence of strings:
# Rather than SVr4, Solaris "xcurses" writes this header:
0 regex \^MAX=[0-9]+,[0-9]+$
>1 regex \^BEG=[0-9]+,[0-9]+$
>2 regex \^SCROLL=[0-9]+,[0-9]+$
>3 regex \^VMIN=[0-9]+$
>4 regex \^VTIME=[0-9]+$
>5 regex \^FLAGS=0x[[:xdigit:]]+$
>6 regex \^FG=[0-9],[0-9]+$
>7 regex \^BG=[0-9]+,[0-9]+, Solaris xcurses screen image
#
but without the insight gained by reading this example,
0 string \032\001
# 5th character of terminal name list, but not Targa image pixel size (15 16 24 32)
>16 ubyte >32
# namelist, if more than 1 separated by "|" like "st|stterm| simpleterm 0.4.1"
>>12 regex \^[a-zA-Z0-9][a-zA-Z0-9.][^|]* Compiled terminfo entry "%-s"
the manual page did not (as you report) make it clear enough that file processes a numbered series of steps in sequence.
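For the test described in the question, one possible entry is sketched below. It assumes that a single regex test at offset 0 is acceptable; the working directory, the file names and the description text are made up for illustration.

```shell
# Hypothetical working directory and file names, for illustration only.
workdir=$(mktemp -d)

# One regex test at offset 0: one of A, B, C followed by DEF, GHI or JKL.
# The leading ^ is escaped, as in the terminfo examples above.
cat > "$workdir/custom.magic" <<'EOF'
0 regex \^[ABC](DEF|GHI|JKL) Custom ABC-type data
EOF

# A sample that should match: byte 0 is "B", bytes 1 to 3 are "GHI".
printf 'BGHI and the rest\n' > "$workdir/sample.dat"

# -m makes file use the given magic file instead of the default ones.
file -m "$workdir/custom.magic" "$workdir/sample.dat"
```

Because the alternatives live inside one regular expression, no nesting with > is needed here; continuation lines would only come into play for further tests that depend on this one.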

Related

sh - appending 0's to file names according to the max

I am trying to make a file sorter. In the current directory I have files named like this :
info-0.jpg
info-12.jpg
info-40.jpg
info-5.jpg
info-100.jpg
I want it to become
info-000.jpg
info-012.jpg
info-040.jpg
info-005.jpg
info-100.jpg
That is, pad with leading 0's so that every number has 3 digits, because the max number is 100, which has 3 digits.
I would like to use cut and wc in a loop over the file names (if $1 is "info", then for i in $1-*.jpg), but I don't know how. Thanks
I did this to start but get a syntax error
wcount=0
for i in $filename-*.jpg; do
wcount=$((echo $i | wc -c))
done

for f in info*.jpg ; do
    numPart=${f%.*} ; #dbg echo numPart1=$numPart;
    numPart=${numPart#*-}; #dbg echo numPart2=$numPart;
    newFilename="${f%-*}"-$(printf '%03d' "$numPart")."${f##*.}"
    echo /bin/mv "$f" "$newFilename"
done
The key is using printf with a format that forces a width of 3 digits and pads with 0s: the printf '%03d' "$numPart" portion of the script.
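If the width should follow the largest number rather than being fixed at 3, a two-pass variant might look like the sketch below. The temporary directory and the "info" prefix are assumptions for the demonstration, and the %0*d form (width taken from an argument) is supported by bash's printf but is not guaranteed in every shell.

```shell
# Demo setup in a throwaway directory (hypothetical; use your own directory).
workdir=$(mktemp -d)
touch "$workdir"/info-0.jpg "$workdir"/info-12.jpg "$workdir"/info-40.jpg \
      "$workdir"/info-5.jpg "$workdir"/info-100.jpg
prefix=info

# Pass 1: find the length of the longest numeric part.
width=0
for f in "$workdir/$prefix"-*.jpg; do
    num=${f##*/}     # drop the directory part
    num=${num%.*}    # drop the extension
    num=${num#*-}    # drop everything up to the first "-"
    if [ "${#num}" -gt "$width" ]; then width=${#num}; fi
done

# Pass 2: rename, zero-padding every number to that width.
# printf reads a number with a leading zero (such as 08) as octal,
# so this assumes the existing names have no leading zeros.
for f in "$workdir/$prefix"-*.jpg; do
    num=${f##*/}; num=${num%.*}; num=${num#*-}
    new=$workdir/$prefix-$(printf '%0*d' "$width" "$num").jpg
    if [ "$f" != "$new" ]; then mv -- "$f" "$new"; fi
done

ls "$workdir"
```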
Also, the syntax ${f%.*} is one of a set of features offered by modern shells to remove parts of a variable's value, where % means match (and destroy) the minimal match from the right side of the value, and ${numPart#*-} means match (and destroy) the minimal match from the left side of the value. There are also %% (maximum match from right) and ## (maximum match from left). Experiment with a variable on your command line to get comfortable with this.
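A quick way to experiment is to try all four operators on one value. The file name here is made up for the demonstration.

```shell
# A sample value to experiment with (hypothetical file name).
f=photo-set-12.tar.gz

echo "${f%.*}"     # shortest ".*" removed from the right -> photo-set-12.tar
echo "${f%%.*}"    # longest ".*" removed from the right  -> photo-set-12
echo "${f#*-}"     # shortest "*-" removed from the left  -> set-12.tar.gz
echo "${f##*-}"    # longest "*-" removed from the left   -> 12.tar.gz
```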
Triple check the output of this code in your environment and only when sure all mv commands look correct, remove the echo in front of /bin/mv.
If you get an error message like Can't find /bin/mv, then enter type mv and replace /bin/ with whatever path is returned for mv.
IHTH

Reading lines from each file in an array -- condition never succeeds

I'm trying to integrate a cat command into a for loop with the cat reading the element '$currentccoutput' but it seems (I think) that cat is reading the line literally rather than understanding that it's an array element with the name of a txt file.
#create an array of text files
currentccoutputs=($currentccfolder'/'*.txt*)
#basic for loop until I can get my cat command working
for currentccoutput in "${currentccoutputs[@]}"; do
cat "$currentccoutput" | while read LINE; do
# I have .txt files with three numbers per line
# that I would like to read / use
IFS=' ' read C1 C2 threshold
if [ $C1 != $C2 ] && [ $threshold \> 0.2 ]; then
echo "Huzzah!!!! Progress at last"
fi
done < "$currrentccoutput" # I don't know what
# this backwards chevron
# does but other people
# have used it...
done
I've no doubt there are other imperfections with this snippet but I'm entirely new to creating scripts so I'm trying to keep things within the realms of what I know for now and hopefully sophisticated solutions will come later. (for now, I'm trying to get from island A to island B, where a few bits of wood and some hemp rope will be both understood and replicable. Whilst I appreciate advice on - and hope one day to build - a decent frigate, right now it might leave me somewhat confused).
I've never even used 'while' 'read' or 'LINE', I've pinched it from someone else's solution.
I have used the echo command to ensure it's not my paths that are wrong, just that I'm not using cat correctly.

The only problem with how you're using cat is that you're overriding it with a (much better) shell-builtin redirection. That's fine -- in fact, it's preferable; you shouldn't use cat unless you absolutely must.[1]
What is a problem is that you're running read LINE and then read C1 C2 threshold after each other, both coming from the same source.
This means that you read the first line of each file into the variable LINE (which your code never looks at again), and the second line into the variables C1, C2 and threshold. If there are more lines, you read the third into LINE, the fourth into C1/C2/threshold, etc.
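The skipping is easy to reproduce with a small made-up file. In this sketch the data, the temporary file and the seen variable are all illustrative; seen is collected only so the result can be inspected afterwards.

```shell
# Four lines of made-up data, three numbers per line.
demo_file=$(mktemp)
printf '%s\n' '1 2 0.5' '3 4 0.1' '5 6 0.9' '7 8 0.3' > "$demo_file"

seen=""
while read -r LINE; do
    # This second read takes the *next* line from the same input.
    read -r C1 C2 threshold
    echo "LINE=[$LINE]  fields=[$C1 $C2 $threshold]"
    seen="$seen$C1,"
done < "$demo_file"

# Only lines 2 and 4 ever reach C1/C2/threshold.
echo "first fields that were parsed: $seen"
```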
If you don't want to skip every other line (starting at the first one), just take out the read LINE entirely, making your code something like:
#!/usr/bin/env bash
case $BASH_VERSION in '') echo "ERROR: This script must be run with bash" >&2; exit 1;; esac
currentccoutputs=( "$currentccfolder"/*.txt )
for currentccoutput in "${currentccoutputs[@]}"; do
    while IFS=$' \t\r' read -r c1 c2 threshold; do
        if [ "$c1" != "$c2" ] && [ "$(bc -l <<<"$threshold > 0.2")" = 1 ]; then
            echo "Huzzah!!!! Progress at last: c1=$c1; c2=$c2; threshold=$threshold"
        fi
    done < "$currentccoutput"
done
See:
BashFAQ #1 - How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
BashFAQ #22 - How can I calculate with floating point numbers instead of just integers? (describing the bc idiom used above)
BashFAQ #24 - I set variables in a loop that's in a pipeline. Why do they disappear after the loop terminates? Or, why can't I pipe data to read? (describing why cat | while read is a Bad Idea)
[1] - Yes, this means you should ignore many if not most of the examples of bash code you find online. Sturgeon's Law applies.
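The effect described in BashFAQ #24 can be seen in a small experiment. This is only a sketch with made-up input and illustrative variable names. It assumes bash with default options; some other shells run the last part of a pipeline in the current shell and behave differently.

```shell
# Made-up input: three lines.
input=$(printf '%s\n' a b c)

# Pipeline: in bash (without the lastpipe option) the while loop runs in a
# subshell, so the counter it updates is gone when the loop ends.
piped=0
printf '%s\n' "$input" | while read -r line; do piped=$((piped + 1)); done
echo "after pipeline: $piped"

# Redirection: the loop runs in the current shell, so the counter survives.
redirected=0
while read -r line; do redirected=$((redirected + 1)); done <<EOF
$input
EOF
echo "after redirection: $redirected"
```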

Bash - how to ignore first delimiter of each line?

I have a file BookDB.txt which stores information in the following manner :
C++ for dummies:Jared:10.67:4:5
Java for dummies:David:10.45:3:6
PHP for dummies:Sarah:10.47:2:7
How do I ignore the first delimiter of each line and add the first 2 fields into an array? (Refer to example below).
Assuming that at runtime, the script asks the user for the variables TITLE and AUTHOR respectively. How would I then store the combined fields into an array?
Eg :
ARRAY=('C++ for dummies:Jared' 'Java for dummies:David' 'PHP for dummies:Sarah')
ARRAY=($TITLE:$AUTHOR)

This is very similar to your other question, and it would have been beneficial for you to link it.
My answer there can be modified to handle this quite easily.
IFS=$'\n'; arr=( $(awk -F':' '{print $1 ":" $2 }' Input.txt ) )
Note that there is no need to ignore the first delimiter to solve this problem. It suffices to acknowledge it and incorporate two fields instead of one.
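As a variation on the same idea, the sketch below uses mapfile (bash 4 or later) instead of an unquoted command substitution, so IFS does not have to be changed and titles containing characters such as * are not glob-expanded. The temporary file and the TITLE and AUTHOR values are assumptions for the demonstration.

```shell
# Sample data from the question, written to a temporary file.
db=$(mktemp)
cat > "$db" <<'EOF'
C++ for dummies:Jared:10.67:4:5
Java for dummies:David:10.45:3:6
PHP for dummies:Sarah:10.47:2:7
EOF

# One array element per output line; -t strips the trailing newline.
mapfile -t arr < <(awk -F':' '{ print $1 ":" $2 }' "$db")

# Appending an entry supplied at runtime (hypothetical values).
TITLE='Perl for dummies'
AUTHOR='Larry'
arr+=( "$TITLE:$AUTHOR" )

printf '%s\n' "${arr[@]}"
```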

Sorting by unique values of multiple fields in UNIX shell script

I am new to unix and would like to be able to do the following but am unsure how.
Take a text file with lines like:
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
And output this:
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
I would like the script to be able to find all the lines for each TR value that have a unique Line value.
Thanks

Since you are apparently O.K. with randomly choosing among the values for dir, day, TI, and stn, you can write:
sort -u -t ';' -k 1,1 -k 6,6 -s < input_file > output_file
Explanation:
The sort utility, "sort lines of text files", lets you sort/compare/merge lines from files. (See the GNU Coreutils documentation.)
The -u or --unique option, "output only the first of an equal run", tells sort that if two input-lines are equal, then you only want one of them.
The -k POS1[,POS2] or --key=POS1[,POS2] option, "start a key at POS1 (origin 1), end it at POS2 (default end of line)", tells sort where the "keys" are that we want to sort by. In our case, -k 1,1 means that one key consists of the first field (from field 1 through field 1), and -k 6,6 means that one key consists of the sixth field (from field 6 through field 6).
The -t SEP or --field-separator=SEP option tells sort that we want to use SEP — in our case, ';' — to separate and count fields. (Otherwise, it would think that fields are separated by whitespace, and in our case, it would treat the entire line as a single field.)
The -s or --stable option, "stabilize sort by disabling last-resort comparison", tells sort that we only want to compare lines in the way that we've specified; if two lines have the same above-defined "keys", then they're considered equivalent, even if they differ in other respects. Since we're using -u, that means that one of them will be discarded. (If we weren't using -u, it would just mean that sort wouldn't reorder them with respect to each other.)
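Running the command on the sample data from the question shows the effect. The temporary file is an assumption, and LC_ALL=C is added here only to make the ordering reproducible. Note that sort emits the lines in key order, not in input order; the awk line is an alternative sketch (not part of the answer above) that keeps the first line seen for each TR/Line pair and preserves the input order.

```shell
# Sample input from the question.
input_file=$(mktemp)
cat > "$input_file" <<'EOF'
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P567;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=lowell
TR=P234;dir=o;day=su;TI=12:10;stn=westborough;Line=worcester
EOF

# One line per distinct (field 1, field 6) pair, in key order.
sorted=$(LC_ALL=C sort -u -t ';' -k 1,1 -k 6,6 -s < "$input_file")
printf '%s\n' "$sorted"

# Alternative: same de-duplication, but input order is kept.
ordered=$(awk -F';' '!seen[$1 FS $6]++' "$input_file")
printf '%s\n' "$ordered"
```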

How to 'cut' on null?

Unix 'file' command has a -0 option to output a null character after a filename. This is supposedly good for using with 'cut'.
From man file:
-0, --print0
Output a null character ‘\0’ after the end of the filename. Nice
to cut(1) the output. This does not affect the separator which is
still printed.
(Note, on my Linux, the '-F' separator is NOT printed - which makes more sense to me.)
How can you use 'cut' to extract a filename from output of 'file'?
This is what I want to do:
find . "*" -type f | file -n0iNf - | cut -d<null> -f1
where <null> is the NUL character.
Well, that is what I am trying to do, what I want to do is get all file names from a directory tree that have a particular MIME type. I use a grep (not shown).
I want to handle all legal file names and not get stuck on file names with colons, for example, in their name. Hence, NUL would be excellent.
I guess non-cut solutions are fine too, but I hate to give up on a simple idea.

Just specify an empty delimiter:
cut -d '' -f1
(N.B.: The space between the -d and the '' is important, so that the -d and the empty string get passed as separate arguments; if you write -d'', then that will get passed as just -d, and then cut will think you're trying to use -f1 as the delimiter, which it will complain about, with an error message that "the delimiter must be a single character".)
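A small check with made-up input, assuming GNU cut (as found on most Linux systems), shows that a colon inside the name is left alone:

```shell
# Made-up records in the shape "name NUL description"; the first name has a colon.
first_fields=$(printf 'a:b.txt\0text/plain\nnotes.md\0text/markdown\n' | cut -d '' -f1)
printf '%s\n' "$first_fields"
```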

This works with GNU awk.
awk 'BEGIN{FS="\x00"}{print$1}'

ruakh's helpful answer works well on Linux.
On macOS, the cut utility doesn't accept '' as a delimiter argument; it fails with a "bad delimiter" error.
Here is a portable workaround that works on both platforms, via the tr utility; it only makes one assumption:
The input mustn't contain \1 control characters (START OF HEADING, U+0001) - which is unlikely in text.
You can substitute any character known not to occur in the input for \1; if it's a character that can be represented verbatim in a string, that simplifies the solution because you won't need the auxiliary command substitution ($(...)) with a printf call for the -d argument.
If your shell supports so-called ANSI C-quoted strings - which is true of bash, zsh and ksh - you can replace "$(printf '\1')" with $'\1'.
(The following uses a simpler input command to demonstrate the technique).
# In zsh, bash, ksh you can simplify "$(printf '\1')" to $'\1'
$ printf '[first field 1]\0[rest 1]\n[first field 2]\0[rest 2]' |
tr '\0' '\1' | cut -d "$(printf '\1')" -f 1
[first field 1]
[first field 2]
Alternatives to using cut:
C. Paul Bond's helpful answer shows a portable awk solution.