uuidgen on macOS generates all uppercase letters

I am using uuidgen on macOS and I get output like:
9404CF07-BBED-41F4-A81F-1FE2F04D2C9E
FFE3EF70-B04D-4614-A9AF-9A0828AF514C
7F433185-E0C4-4664-B841-AD5795751F6E
Any reason why all the letters are capitalized? It seems like allowing lowercase letters would add extra randomness in there?
If someone can tag this with uuidgen, that'd be great.
It's a Linux utility: http://man7.org/linux/man-pages/man1/uuidgen.1.html

An easy way to create an alias:
alias uuidgen='uuidgen | tr -d "\n" | tr "[:upper:]" "[:lower:]" | pbcopy; pbpaste'

A UUID is, at its heart, a 128-bit value. The standard representation of a UUID is as hexadecimal, split into groups at certain points by hyphens; the capitalization of any letters which appear in that value is not significant.
(Note that the only letters which can appear in a UUID are A through F. Other letters are not valid.)
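As a quick check (a sketch; the UUID is just the first one from the question), the upper- and lowercase spellings denote the same 128-bit value:
$ u=9404CF07-BBED-41F4-A81F-1FE2F04D2C9E
$ [ "$(printf %s "$u" | tr '[:upper:]' '[:lower:]')" = "9404cf07-bbed-41f4-a81f-1fe2f04d2c9e" ] && echo "same value"
same value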


BASH: Parsing CSV into array using IFS, last array element missing when it is an empty string [duplicate]

To parse colon-delimited fields I can use read with a custom IFS:
$ echo 'foo.c:41:switch (color) {' | { IFS=: read file line text && echo "$file | $line | $text"; }
foo.c | 41 | switch (color) {
If the last field contains colons, no problem, the colons are retained.
$ echo 'foo.c:42:case RED: //alert' | { IFS=: read file line text && echo "$file | $line | $text"; }
foo.c | 42 | case RED: //alert
A trailing delimiter is also retained...
$ echo 'foo.c:42:case RED: //alert:' | { IFS=: read file line text && echo "$file | $line | $text"; }
foo.c | 42 | case RED: //alert:
...Unless it's the only extra delimiter. Then it's stripped. Wait, what?
$ echo 'foo.c:42:case RED:' | { IFS=: read file line text && echo "$file | $line | $text"; }
foo.c | 42 | case RED
Bash, ksh93, and dash all do this, so I'm guessing it is POSIX standard behavior.
Why does it happen?
What's the best alternative?
I want to parse the strings above into three variables and I don't want to mangle any text in the third field. I had thought read was the way to go but now I'm reconsidering.
Yes, that's standard behaviour (see the read specification and Field Splitting). A few shells (ash-based including dash, pdksh-based, zsh, yash at least) used not to do it, but except for zsh (when not in POSIX mode) and busybox sh, most of them have since been updated for POSIX compliance.
That's the same for:
$ var='a:b:c:' IFS=:
$ set -- $var; echo "$#"
3
(see how the POSIX specification for read actually defers to the Field Splitting mechanism where a:b:c: is split into 3 fields, and so with IFS=: read -r a b c, there are as many fields as variables).
The rationale is that in ksh (on which the POSIX spec is based), $IFS (initially, in the Bourne shell, the internal field separator) became a field delimiter, I think so that any list of elements (not containing the delimiter) could be represented.
When $IFS is a separator, one can't represent a list of one empty element ("" is split into a list of 0 elements, ":" into a list of two empty elements¹). When it's a delimiter, you can express a list of zero elements with "", one empty element with ":", or two empty elements with "::".
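A quick illustration of the delimiter behaviour in bash (the variable name is just for the example):
$ IFS=:
$ v=; set -- $v; echo "$#"     # empty string: zero elements
0
$ v=:; set -- $v; echo "$#"    # one empty element
1
$ v=::; set -- $v; echo "$#"   # two empty elements
2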
It's a bit unfortunate as one of the most common usages of $IFS is to split $PATH. And a $PATH like /bin:/usr/bin: is meant to be split into "/bin", "/usr/bin", "", not just "/bin" and "/usr/bin".
Now, with POSIX shells (but not all shells are compliant in that regard), for word splitting upon parameter expansion, that can be worked around with:
IFS=:; set -o noglob
for dir in $PATH""; do
  printf '%s\n' "${dir:-.}"   # or whatever you need to do with each directory
done
That trailing "" makes sure that if $PATH ends in a trailing :, an extra empty element is added. And also that an empty $PATH is treated as one empty element as it should be.
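For example, with a sample PATH containing a trailing colon (a sketch; don't run this in a shell whose PATH you care about):
$ PATH=/bin:/usr/bin:
$ IFS=:; set -o noglob
$ for dir in $PATH""; do printf '%s\n' "${dir:-.}"; done
/bin
/usr/bin
.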
That approach can't be used for read though.
Short of switching to zsh, there's no easy workaround other than inserting an extra : and removing it afterwards, like:
echo a:b:c: | sed 's/:/::/2' | { IFS=: read -r x y z; z=${z#:}; echo "$z"; }
Or (less portable):
echo a:b:c: | paste -d: - /dev/null | { IFS=: read -r x y z; z=${z%:}; echo "$z"; }
I've also added the -r which you generally want when using read.
Most likely here you'd want to use a proper text processing utility like sed/awk/perl instead of writing convoluted and probably inefficient code around read which has not been designed for that.
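For instance, a hedged awk sketch that splits on the first two colons only, leaving the rest of the line intact (the sample input is from the question):
$ echo 'foo.c:42:case RED:' | awk -F: '{ text = substr($0, length($1) + length($2) + 3); print $1 " | " $2 " | " text }'
foo.c | 42 | case RED: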
¹ Though in the Bourne shell, that was still split into zero elements as there was no distinction between IFS-whitespace and IFS-non-whitespace characters there, something that was also added by ksh
One "feature" of read is that it will strip leading and trailing whitespace separators in the variables it populates - it is explained in much more detail at the linked answer. This enables beginners to have read do what they expect when doing for example read first rest <<< ' foo bar ' (note the extra spaces).
The take-away? It is hard to do accurate text processing using Bash and shell tools. If you want full control it's probably better to use a "stricter" language like for example Python, where split() will do what you want, but where you might have to dig much deeper into string handling to explicitly remove newline separators or handle encoding.

unix file utility: magic syntax

I would like to create a custom magic file for the file utility, but I'm having a really hard time understanding the syntax described in man magic.
I need to test several places, each of which can contain several strings. Only if all the tests succeed would it print a file type.
To summarize, I would like a test similar to this if it were fields in an SQL database:
( byte_0 = "A" OR byte_0 = "B" OR byte_0 = "C" )
AND
( byte_1_to_3 = "DEF" OR byte_1_to_3 = "GHI" OR byte_1_to_3 = "JKL" )
Or in Perl regexp syntax:
m/^
[ABC]
(DEF|GHI|JKL)
/x
file has its own syntax, with hundreds of examples. If the documentation is unclear, you should start by reading examples which are close to your intended changes. That's what I did with ncurses, for example, in the terminfo magic file, to describe the Solaris xcurses header as a sequence of strings:
# Rather than SVr4, Solaris "xcurses" writes this header:
0 regex \^MAX=[0-9]+,[0-9]+$
>1 regex \^BEG=[0-9]+,[0-9]+$
>2 regex \^SCROLL=[0-9]+,[0-9]+$
>3 regex \^VMIN=[0-9]+$
>4 regex \^VTIME=[0-9]+$
>5 regex \^FLAGS=0x[[:xdigit:]]+$
>6 regex \^FG=[0-9],[0-9]+$
>7 regex \^BG=[0-9]+,[0-9]+, Solaris xcurses screen image
#
but without the insight gained by reading this example,
0 string \032\001
# 5th character of terminal name list, but not Targa image pixel size (15 16 24 32)
>16 ubyte >32
# namelist, if more than 1 separated by "|" like "st|stterm| simpleterm 0.4.1"
>>12 regex \^[a-zA-Z0-9][a-zA-Z0-9.][^|]* Compiled terminfo entry "%-s"
the manual page was not (as you report) clear enough that file processes a numbered series of steps in sequence.
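As a side note (standard file(1) usage, not specific to this answer): while drafting a custom magic file you can point file at it directly with -m, and use -C to write a pre-parsed .mgc version, which is a convenient way to surface syntax errors. The name mymagic is a hypothetical placeholder here:
$ file -m ./mymagic sample.dat    # test the draft magic file against a sample
$ file -C -m ./mymagic            # pre-parse/compile it; syntax errors are reported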

BASH grep result as array name

Heyhey,
since this is my first post, please be patient :) I'll try my best.
I'm trying to "grep" the language out of my system (OS X) and use it as a string name to set a language.
I've got some strings called $en['a' 'b' 'c'], $de['d' 'e' 'f'] and $fr['g' 'h' 'i'] somewhere...
I use:
language=$(locale | grep LANG= | cut -d'"' -f2 | cut -d_ -f1)
which gives me an ISO value like en, fr, de, ...
Here comes my main problem: I just can't use ${language[*]}.
It feels like I've tried everything, doing trial and error with {}, (), '' and $.
The only thing I found out while debugging is that language results in
language=ISO (so this works correctly)
and if I try to get this value as my desired string
echo ${language[*]} , ${language[0]} , ${language[1]}
results in
ISO , ISO ,
which is not correct. It seems like I'm creating a new string but I want to use the existing ones.
Don't know any more keywords to google :(
These all use array syntax:
echo ${language[*]} , ${language[0]} , ${language[1]}
But the way you created the variable, this is not an array, this is a simple variable:
language=$(locale | grep LANG= | cut -d'"' -f2 | cut -d_ -f1)
To access its value, use simply $language, for example:
echo $language
Also, the pipeline with locale, grep and cut gets the first two characters of the LANG variable in a very inefficient way.
You can get it more efficiently using a substring:
language=${LANG:0:2}
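For example, with a typical LANG value (shown here just as an illustration):
$ LANG=en_US.UTF-8
$ echo "${LANG:0:2}"
en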
If you want to use an array (though I don't see the point here), then you must put parentheses around the values to assign, for example:
language=(${LANG:0:2})
The array in this example has one element; you can access its value like this:
echo ${language[0]}
Note that the syntax of Bash is very strict with respect to symbols and spaces, every little detail may make a big difference, so it's important to write accurately.
You can paste your scripts to shellcheck.net to check for trivial errors.
You can read more about arrays in man bash.
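If what you're ultimately after is selecting one of the existing arrays (en, de, fr) by the name stored in language, a nameref can do that; a minimal sketch, assuming bash 4.3 or later (the array contents are the ones from the question):
en=(a b c); de=(d e f); fr=(g h i)
language=${LANG:0:2}           # e.g. "de"
declare -n words=$language     # words is now another name for the array named by $language
echo "${words[@]}"             # prints: d e f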

awk: make it less system-dependent

If I'm not mistaken, awk parses a number depending on the OS locale (e.g., echo "1,2" | awk '{printf("%f\n",$1)}' would be interpreted as 1 on an English system and as 1.2 on a system where a comma separates the integer from the decimal part).
I don't know if the C printf does this too, so I added the C tag.
I would like to modify the previous command so that it returns the same value (1.2) regardless of the system being used.
Welcome to the ugliness of locale. To fix your problem, first set the locale to the C one.
export LC_NUMERIC=C
echo "1,2" | awk '...your code...'
To turn off other locale-dependent tomfoolery, you can
export LC_ALL=C
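For example (in the C locale the comma is never treated as a decimal point, so the result is at least consistent across systems):
$ echo "1,2" | LC_ALL=C awk '{printf("%f\n", $1)}'
1.000000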
If you're using gawk, you can use the --use-lc-numeric option.
$ LC_NUMERIC=de_DE.UTF-8 awk 'BEGIN {printf("%f\n", "1,2")}'
1.000000
$ LC_NUMERIC=de_DE.UTF-8 awk --use-lc-numeric 'BEGIN {printf("%f\n", "1,2")}'
1,200000
From the GAWK manual:
The POSIX standard says that awk always uses the period as the decimal
point when reading the awk program source code, and for command-line
variable assignments (see Other Arguments). However, when interpreting
input data, for print and printf output, and for number to string
conversion, the local decimal point character is used. Here are some
examples indicating the difference in behavior, on a GNU/Linux system:
$ gawk 'BEGIN { printf "%g\n", 3.1415927 }'
-| 3.14159
$ LC_ALL=en_DK gawk 'BEGIN { printf "%g\n", 3.1415927 }'
-| 3,14159
$ echo 4,321 | gawk '{ print $1 + 1 }'
-| 5
$ echo 4,321 | LC_ALL=en_DK gawk '{ print $1 + 1 }'
-| 5,321
The ‘en_DK’ locale is for English in Denmark, where the comma acts as
the decimal point separator. In the normal "C" locale, gawk treats
‘4,321’ as ‘4’, while in the Danish locale, it's treated as the full
number, 4.321.
Some earlier versions of gawk fully complied with this aspect of the
standard. However, many users in non-English locales complained about
this behavior, since their data used a period as the decimal point, so
the default behavior was restored to use a period as the decimal point
character. You can use the --use-lc-numeric option (see Options) to
force gawk to use the locale's decimal point character. (gawk also
uses the locale's decimal point character when in POSIX mode, either
via --posix, or the POSIXLY_CORRECT environment variable.)
I get similar behavior from /usr/bin/printf
$ LC_NUMERIC=de_DE.UTF-8 /usr/bin/printf "%f\n" "1,2"
/usr/bin/printf: 1,2: value not completely converted
1,000000
$ LC_NUMERIC=de_DE.UTF-8 /usr/bin/printf "%f\n" "1.2"
1,200000
But without the ability to override it.
If your intent is to do the opposite, that is, to take "European" input and output "US" numbers, you're going to need something more robust, possibly Python or Perl with their locale modules.

How to 'cut' on null?

The Unix 'file' command has a -0 option to output a null character after a filename. This is supposedly good for use with 'cut'.
From man file:
-0, --print0
Output a null character ‘\0’ after the end of the filename. Nice
to cut(1) the output. This does not affect the separator which is
still printed.
(Note, on my Linux, the '-F' separator is NOT printed - which makes more sense to me.)
How can you use 'cut' to extract a filename from output of 'file'?
This is what I want to do:
find . -type f | file -n0iNf - | cut -d<null> -f1
where <null> is the NUL character.
Well, that is what I am trying to do. What I want to do is get all file names from a directory tree that have a particular MIME type (I use a grep, not shown).
I want to handle all legal file names and not get stuck on file names that contain, for example, colons. Hence, NUL would be excellent.
I guess non-cut solutions are fine too, but I hate to give up on a simple idea.
Just specify an empty delimiter:
cut -d '' -f1
(N.B.: The space between the -d and the '' is important, so that -d and the empty string get passed as separate arguments. If you write -d'', that will get passed as just -d, and cut will then think you're trying to use -f1 as the delimiter, which it will complain about with an error message that "the delimiter must be a single character".)
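For example, on a GNU system (the input here is just two NUL-separated fields built with printf):
$ printf 'name\0rest of line\n' | cut -d '' -f1
name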
This works with GNU awk:
awk 'BEGIN{FS="\x00"}{print $1}'
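Usage looks much the same as with cut (again with a made-up NUL-separated input):
$ printf 'name\0rest of line\n' | awk 'BEGIN{FS="\x00"}{print $1}'
name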
ruakh's helpful answer works well on Linux.
On macOS, the cut utility doesn't accept '' as a delimiter argument (it reports "bad delimiter").
Here is a portable workaround that works on both platforms, via the tr utility; it only makes one assumption:
The input mustn't contain \1 control characters (START OF HEADING, U+0001) - which is unlikely in text.
You can substitute any character known not to occur in the input for \1; if it's a character that can be represented verbatim in a string, that simplifies the solution, because you won't need the auxiliary command substitution ($(...)) with a printf call for the -d argument.
If your shell supports so-called ANSI C-quoted strings - which is true of bash, zsh, and ksh - you can replace "$(printf '\1')" with $'\1'.
(The following uses a simpler input command to demonstrate the technique).
# In zsh, bash, ksh you can simplify "$(printf '\1')" to $'\1'
$ printf '[first field 1]\0[rest 1]\n[first field 2]\0[rest 2]' |
tr '\0' '\1' | cut -d "$(printf '\1')" -f 1
[first field 1]
[first field 2]
Alternatives to using cut:
C. Paul Bond's helpful answer shows a portable awk solution.
