Organizing ".bash_history" into ".bash_history.cmd" category files - arrays

My script loads a "categories" array with chosen terminal commands.
This array is then used to yank matching ".bash_history" records into
separate category files. My function: "extract_records()" extracts each
category using an ERE grep:
BASH_HISTORY="$HOME"/.bash_history
...
# (where "$1" here is a category)
grep -E "^($1 )" "$BASH_HISTORY" >> ".bash_history.$1"
Once the chosen records are grepped from "$BASH_HISTORY" into individual
category files, they are then removed from "$BASH_HISTORY". This is done
using "grep -v -e" patterns where the category list is re-specified.
My script works but a potential problem exists: the list of history
command keywords is defined twice, once in the array and then in a grep
pattern list. Excerpted from the script:
#!/usr/bin/bash
# -----------------------------------------------------------------
# original array definition.
categories=(apt cat dpkg echo file find git grep less locate)
...
for i in "${categories[#]}"; do
extract_records "$i" # which does the grep -E shown above.
done
...
# now remove what has been categorized to separate files.
grep -E -v \
-e "^(apt )" \
-e "^(cat )" \
-e "^(dpkg )" \
... \
"$BASH_HISTORY" >> "$BASH_HISTORY".$$
# finally the temporary "$$" file is optionally sorted and moved
# back as the main "$BASH_HISTORY".
The first part calls extract_records() each time to grep and create
each category file. The second part uses a single grep to remove
records using a pattern list, re-specified based on the array.
PROBLEM: Potentially, the two independent lists can be mismatched.
Optimally, the array "${categories[@]}" should be used for each part:
extracting chosen records, and then rebuilding "$BASH_HISTORY" without
the separated records. This would replace my current "grep -E -v"
pattern list. Something of the sort:
grep -E -v "^(${categories[@]})" "$BASH_HISTORY"
It's nice and compact, but this does not work.
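One way to make it work (a sketch, not from the original script): join the array into a single ERE alternation by setting IFS in a subshell, then reuse that one pattern for both extraction and removal.
pattern="$(IFS='|'; printf '%s' "${categories[*]}")"    # -> apt|cat|dpkg|...
grep -E    "^(${pattern}) " "$BASH_HISTORY"             # everything to be categorized
grep -E -v "^(${pattern}) " "$BASH_HISTORY" >> "$BASH_HISTORY".$$
The quoted "${categories[*]}" joins the elements with the first character of IFS, so a single array definition drives both greps.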
The goal is to divide out oft-used terminal commands into separate files
so as to keep "$BASH_HISTORY" reasonably small. The separately saved
records can then be recalled using another script that functions like
Bash's internal history facility. In this way, no history is lost
and everything is grouped and better managed.

Related

exclude stdout based on an array of words

Question:
Is it possible to exclude some of a command's output based on an array of words?
Why?
On Ubuntu/Debian there are two ways to list all available pkgs:
apt list (show all available pkgs, installed pkgs too)
apt-cache search . (show all available pkgs, installed pkgs too)
The difference is that with the first command you can exclude all installed pkgs using grep -v; the problem is that, unlike the first, the second command gives you nothing to match on, as the word "installed" isn't present. The problem with the first command is that it doesn't show the pkg description, so I want to use apt-cache search . but exclude all installed pkgs.
# List all of my installed pkgs,
# Get just the pkg's name,
# Swap newlines for spaces,
# Save this list as an array for exclusion.
INSTALLED=("$(apt list --installed 2> /dev/null | awk -F/ '{print $1}' | tr '\r\n' ' ')")
I then tried:
apt-cache search . | grep -v "${INSTALLED[#]}"
Unfortunately this doesn't work as I still see my installed pkgs too, so I'm guessing it's excluding the first pkg in the array and not the rest.
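A quick way to inspect what the array really holds (a debugging sketch; the quotes around $( ) make INSTALLED a single big element, and even unquoted, grep would treat elements after the first as file names, not extra patterns):
declare -p INSTALLED                # prints the array definition
printf '<%s>\n' "${INSTALLED[@]}"   # one line per element, brackets show boundaries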
Again thank you in advance!
Would you please try the following:
#!/bin/bash
declare -A installed # an associative array to memorize the "installed" command names
while IFS=/ read -r cmd others; do # split the line on "/" and assign the variable "cmd" to the 1st field
(( installed[$cmd]++ )) # increment the array value associated with the "$cmd"
done < <(apt list --installed 2> /dev/null) # execute the `apt` command and feed the output to the `while` loop
while IFS= read -r line; do # read the whole line "as is" because it includes a package description
read -r cmd others <<< "$line" # split the line on the whitespace and assign the variable "cmd" to the 1st field
[[ -z ${installed[$cmd]} ]] && echo "$line" # if the array value is not defined, the cmd is not installed, then print the line
done < <(apt-cache search .) # execute the `apt-cache` command to feed the output to the `while` loop
The associative array installed is used to check if the command is
installed.
The 1st while loop scans over the list of installed packages and
stores their names in the associative array installed.
The 2nd while loop scans over the available package list and, if the
name is not found in the associative array, prints the line.
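The same idea can be condensed into one awk call (a sketch under the same assumptions: the package name is the first /-separated field of apt list and the first whitespace-separated field of apt-cache search):
awk 'NR==FNR {split($0, a, "/"); inst[a[1]]; next} !($1 in inst)' \
    <(apt list --installed 2> /dev/null) <(apt-cache search .)
NR==FNR is true only while the first input is being read, so the installed names are collected into inst before any apt-cache line is tested.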
BTW your trial code starts with #!/bin/sh which is run with sh, not bash.
Please make sure it looks like #!/bin/bash.
Sorry if I'm misunderstanding, but you just want to get the list of packages which are not installed, right?
If so, you can just do this -
apt list --installed=false | grep -v '\[installed'

Creating an array from JSON data of file/directory locations; bash not registering if element is a directory

I have a JSON file which holds data like this: "path/to/git/directory/location": "path/to/local/location". A minimum example of the file might be this:
{
"${HOME}/dotfiles/.bashrc": "${HOME}/.bashrc",
"${HOME}/dotfiles/.atom/": "${HOME}/.atom/"
}
I have a script that systematically reads the above JSON (called locations.json) and creates an array, and then prints elements of the array that are directories. MWE:
#!/usr/bin/env bash
unset sysarray
declare -A sysarray
while IFS=: read -r field data
do
sysarray["${field}"]="${data}"
done <<< $(sed '/^[{}]$/d;s/\s*"/"/g;s/,$//' locations.json)
for file in "${sysarray[#]}"
do
if [ -d "${file}" ]
then
echo "${file}"
fi
done
However, this does not print the directory (i.e., ${HOME}/.atom).
I don't understand why this is happening, because
I have tried creating an array manually (i.e., not from a JSON) and checking if its elements are directories, and that works fine.
I have tried echoing each element in the array into a temporary file and reading each line in the file to see if it was a product of how the information was stored in the array, but no luck.
I have tried adding | tr -d "[:blank:]" | tr -d '\"' after using sed on the JSON (to see if it was a product of unintended whitespace or quotes), but no luck.
I have tried simply running [ -d "${HOME}/.atom/" ] && echo '.atom is a directory', and that works (so indeed it is a directory). I'm unsure what might be causing this.
Help on this would be great!
You could use a tool such as jq to process JSON files properly; it will deal with any valid JSON.
#!/usr/bin/env bash
unset sysarray
declare -A sysarray
while IFS="=" read -r field data
do
sysarray["${field}"]=$(eval echo "${data}")
done <<< $(jq -r 'keys[] as $k | "\($k)=\(.[$k])"' locations.json)
for file in "${sysarray[#]}"
do
if [ -d "${file}" ]
then
echo "${file}"
fi
done
Another problem is that, once the extra quote signs are properly processed, we have a literal ${HOME} that is not expanded. The only solution I came up with is using eval to force the expansion. It is not the nicest way, but right now I cannot find a better one.
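If eval feels too risky, a literal parameter substitution avoids it (a sketch, assuming ${HOME} is the only variable that appears in the JSON values):
data='${HOME}/.atom/'             # what jq hands us: a literal, unexpanded string
data=${data//'${HOME}'/$HOME}     # replace the literal token with the real value
echo "$data"                      # -> /home/you/.atom/
Unlike eval, this cannot execute anything that happens to be embedded in the JSON.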

Bash help tallying/parsing substrings

I have a shell script I wrote a while back, that reads a word list (HITLIST), and recursively searches a directory for all occurrences of those words. Each line containing a "hit" is appended to file (HITOUTPUT).
I have used this script a couple of times over the last year or so, and have noticed that we often get hits from frequent offenders, and that it would be nice if we kept a count of each "super-string" that is triggered and automatically removed repeat offenders.
For instance, if my word list contains "for" I might get a hundred hits or so for "foreign" or "form" or "force". Instead of validating each of these lines, it would be nice to simply wipe them all with one "yes/no" dialog per super-string.
I was thinking the best way to do this would be to start with a word from the hitlist, and record each unique occurrence of the super-string for that word (go until you are book-ended by whitespace) and go from there.
So on to the questions ...
What would be a good and efficient way to do this? My current idea
was to read in the file as a string, perform my counts, remove
repeat offenders from the file input string, and output, but this is
proving to be a little more painful than I first suspected.
Would any specific data type/structure be preferred for this type of
work?
I have also thought about building the super-string count as I
create the HitOutput file, but I could not figure out a clean way of
doing this either. Any thoughts or suggestions?
A sample of the file I am reading in, and my code for reading in and traversing the hitlist and creating the HitOutput file below:
# Loop through hitlist list
while read -re hitlist || [[ -n "$hitlist" ]]
do
# If first character is "#" it's a comment, or line is blank, skip
if [ "$(echo $hitlist | head -c 1)" != "#" ]; then
if [ ! -z "$hitlist" -a "$histlist" != "" ]; then
# Parse comma delimited hitlist
IFS=',' read -ra categoryWords <<< "$hitlist"
# Search for occurrences/hits for each hit
for categoryWord in "${categoryWords[#]}"; do
# Append results to hit output string
eval 'find "$DIR" -type f -print0 | xargs -0 grep -HniI "$categoryWord"' >> HITOUTPUT
done
fi
fi
done < "$HITLIST"
src/fakescript.sh:1:Never going to win the war you mother!
src/open_source_licenses.txt:6147:May you share freely, never taking more than you give.
src/open_source_licenses.txt:8764:May you share freely, never taking more than you give.
src/open_source_licenses.txt:21711:No Third Party Beneficiaries. You agree that, except as otherwise expressly provided in this TOS, there shall be no third party beneficiaries to this Agreement. Waiver and Severability of Terms. The failure of UBM LLC to exercise or enforce any right or provision of the TOS shall
not constitute a waiver of such right or provision. If any provision of the TOS is found by a court of competent jurisdiction to be invalid, the parties nevertheless agree that the court should endeavor to give effect to the parties' intentions as reflected in the provision, and the other provisions of the TOS remain in full force and effect.
src/fakescript.sh:1:Never going to win the war you mother!
An example of my hitlist file:
# Comment out any category word lines that you do not want processed (the comma delimited lines)
# -----------------
# MEH
never,going,to give,you up
# ----------------
# blah
word to,your,mother
Let's divide this problem into two parts. First, we will update the hitlist interactively as required by your customer. Second, we will find all matches to the updated hitlist.
1. Updating the hitlist
This searches for all words in files under directory dir that contain any word on the hitlist:
#!/bin/bash
grep -Erowhf <(sed -E 's/.*/([[:alpha:]]+&[[:alpha:]]*|[[:alpha:]]*&[[:alpha:]]+)/' hitlist) dir |
sort |
uniq -c |
while read n word
do
read -u 2 -p "$word occurs $n times. Include (y/n)? " a
[ "$a" = y ] && echo "$word" >>hitlist
done
This script runs interactively. As an example, suppose that dir contains these two files:
$ cat dir/file1.txt
for all foreign or catapult also cat.
The catapult hit the catermaran.
The form of a foreign formula
$ cat dir/file2.txt
dog and cat and formula, formula, formula
And hitlist contains two words:
$ cat hitlist
for
cat
If we then run our script, it looks like:
$ bash script.sh
catapult occurs 2 times. Include (y/n)? y
catermaran occurs 1 times. Include (y/n)? n
foreign occurs 2 times. Include (y/n)? y
form occurs 1 times. Include (y/n)? n
formula occurs 4 times. Include (y/n)? n
After the script is run, the file hitlist is updated with all the words that you want to include. We are now ready to proceed to the next step:
2. Finding matches to the updated hitlist
To read each word from a "hitlist" and search recursively for matches while ignoring, foreign even if the hitlist contains for, try:
grep -wrFf ../hitlist dir
-w tells grep to look only for full-words. Thus foreign will be ignored.
-r tells grep to search recursively.
-F tells grep to treat the hitlist entries as fixed strings, not regular expressions. (optional)
-f ../hitlist tells grep to read words from the file ../hitlist.
Following on with the example above, we would have:
$ grep -wrFf ./hitlist dir
dir/file2.txt:dog and cat and formula, formula, formula
dir/file1.txt:for all foreign or catapult also cat.
dir/file1.txt:The catapult hit the catermaran.
dir/file1.txt:The form of a foreign formula
If we don't want the file names displayed, use the -h option:
$ grep -hwrFf ./hitlist dir
dog and cat and formula, formula, formula
for all foreign or catapult also cat.
The catapult hit the catermaran.
The form of a foreign formula
Automatic update for counts 10 or less
#!/bin/bash
grep -Erowhf <(sed -E 's/.*/([[:alpha:]]+&[[:alpha:]]*|[[:alpha:]]*&[[:alpha:]]+)/' hitlist) dir |
sort |
uniq -c |
while read n word
do
a=y
[ "$n" -gt 10 ] && read -u 2 -p "$word occurs $n times. Include (y/n)? " a
[ "$a" = y ] && echo "$word" >>hitlist
done
Reformatting the customer's hitlist
I see that your customer's hitlist has extra formatting, including comments, empty lines, and duplicated words. For example:
$ cat hitlist.source
# MEH
never,going,to give,you up
# ----------------
# blah
word to,your,mother
To convert that to format useful here, try:
$ sed -E 's/#.*//; s/[[:space:],]+/\n/g; s/\n\n+/\n/g; /^$/d' hitlist.source | grep . | sort -u >hitlist
$ cat hitlist
give
going
mother
never
to
up
word
you
your

How do I store the output from a find command in an array? + bash

I have the following find command with the following output:
$ find -name '*.jpg'
./public_html/github/screencasts-gh-pages/reactiveDataVis/presentation/images/telescope.jpg
./public_html/github/screencasts-gh-pages/introToBackbone/presentation/images/telescope.jpg
./public_html/github/StarCraft-master/img/Maps/(6)Thin Ice.jpg
./public_html/github/StarCraft-master/img/Maps/Snapshot.jpg
./public_html/github/StarCraft-master/img/Maps/Map_Grass.jpg
./public_html/github/StarCraft-master/img/Maps/(8)TheHunters.jpg
./public_html/github/StarCraft-master/img/Maps/(2)Volcanis.jpg
./public_html/github/StarCraft-master/img/Maps/(3)Trench wars.jpg
./public_html/github/StarCraft-master/img/Maps/(8)BigGameHunters.jpg
./public_html/github/StarCraft-master/img/Maps/(8)Turbo.jpg
./public_html/github/StarCraft-master/img/Maps/(4)Blood Bath.jpg
./public_html/github/StarCraft-master/img/Maps/(2)Switchback.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(6)Thin Ice.jpg
./public_html/github/StarCraft-master/img/Maps/Original/Map_Grass.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(8)TheHunters.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(2)Volcanis.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(3)Trench wars.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(8)BigGameHunters.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(8)Turbo.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(4)Blood Bath.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(2)Switchback.jpg
./public_html/github/StarCraft-master/img/Maps/Original/(4)Orbital Relay.jpg
./public_html/github/StarCraft-master/img/Maps/(4)Orbital Relay.jpg
./public_html/github/StarCraft-master/img/Bg/GameLose.jpg
./public_html/github/StarCraft-master/img/Bg/GameWin.jpg
./public_html/github/StarCraft-master/img/Bg/GameStart.jpg
./public_html/github/StarCraft-master/img/Bg/GamePlay.jpg
./public_html/github/StarCraft-master/img/Demo/Demo.jpg
./public_html/github/flot/examples/image/hs-2004-27-a-large-web.jpg
./public_html/github/minicourse-ajax-project/other/GameLose.jpg
How do I store this output in an array? I want it to handle filenames with spaces.
I have tried arrayname=($(find -name '*.jpg')), but this seems to store just the first element:
$ arrayname=($(find -name '*.jpg'))
$ echo "$arrayname"
./public_html/github/screencasts-gh-pages/reactiveDataVis/presentation/images/telescope.jpg
$
I have tried the approach here, but again this just stores the 1st element.
Other similar Qs
How do I capture the output from the ls or find command to store all file names in an array?
How do i store the output of a bash command in a variable?
If you know with certainty that your filenames will not contain newlines, then
mapfile -t arrayname < <(find ...)
If you want to be able to handle any file
arrayname=()
while IFS= read -d '' -r filename; do
arrayname+=("$filename")
done < <(find ... -print0)
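If your bash is 4.4 or newer, mapfile can consume the NUL-delimited stream directly (a shorter sketch of the same idea):
mapfile -t -d '' arrayname < <(find ... -print0)
Here -d '' makes mapfile split on NUL bytes and -t strips the delimiter from each element.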
echo "$arrayname" will only show the first element of the array. It is equivalent to echo "${arrayname[0]}". To dump an array:
printf "%s\n" "${arrayname[#]}"
# ............^^^^^^^^^^^^^^^^^ must use exactly this form, with the quotes.
arrayname=($(find ...)) is still wrong. It will store the file ./file with spaces.txt as 3 separate elements in the array.
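A quick demonstration of that breakage (a sketch to run in an empty directory):
touch './file with spaces.txt'
arrayname=($(find . -name '*.txt'))
declare -p arrayname    # declare -a arrayname=([0]="./file" [1]="with" [2]="spaces.txt")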
If you have a sufficiently recent version of bash, you can save yourself a lot of trouble by just using a ** glob.
shopt -s globstar
files=(**/*.jpg)
The first line enables the feature. Once enabled, ** in a glob pattern will match any number (including 0) of directories in the path.
Using the glob in the array definition makes sure that whitespace is handled correctly.
To view an array in a form which could be used to define the array, use the -p (print) option to the declare builtin:
declare -p files

script for getting extensions of a file

I need to get all the file extension types in a folder. For instance, if the directory's ls gives the following:
a.t
b.t.pg
c.bin
d.bin
e.old
f.txt
g.txt
I should get this by running the script
.t
.t.pg
.bin
.old
.txt
I have a bash shell.
Thanks a lot!
See the BashFAQ entry on ParsingLS for a description of why many of these answers are evil.
The following approach avoids this pitfall (and, by the way, completely ignores files with no extension):
shopt -s nullglob
for f in *.*; do
printf '%s\n' ".${f#*.}"
done | sort -u
Among the advantages:
Correctness: ls behaves inconsistently and can produce inappropriate results. See the link at the top.
Efficiency: Minimizes the number of subprocesses invoked (only one, sort -u, and even that could be removed if we wanted to use Bash 4's associative arrays to store results)
Things that still could be improved:
Correctness: this will correctly discard newlines in filenames before the first . (which some other answers won't) -- but filenames with newlines after the first . will be treated as separate entries by sort. This could be fixed by using nulls as the delimiter, or by the aforementioned bash 4 associative-array storage approach.
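A sketch of that associative-array variant (assumes bash 4 or newer):
shopt -s nullglob
declare -A seen
for f in *.*; do
    seen[".${f#*.}"]=1          # the extension is the key, so duplicates collapse
done
printf '%s\n' "${!seen[@]}"     # unique extensions, in no particular order
Because extensions are stored as array keys rather than passed through a pipe, a newline inside a filename can no longer split one entry into two.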
try this:
ls -1 | sed 's/^[^.]*\(\..*\)$/\1/' | sort -u
ls lists files in your folder, one file per line
sed magic extracts extensions
sort -u sorts extensions and removes duplicates
sed magic reads as:
s/ / /: substitutes whatever is between first and second / by whatever is between second and third /
^: match beginning of line
[^.]: match any character that is not a dot
*: match it as many times as possible
\( and \): remember whatever is matched between these two parentheses
\.: match a dot
.: match any character
*: match it as many times as possible
$: match end of line
\1: this is what has been matched between parentheses
People are really over-complicating this - particularly the regex:
ls | grep -o "\..*" | uniq
ls - get all the files
grep -o "\..*" - -o only show the match; "\..*" match at the first "." & everything after it
uniq - don't print duplicates but keep the same order
you can also sort if you like, but sorting doesn't match the example
This is what happens when you run it:
> ls -1
a.t
a.t.pg
c.bin
d.bin
e.old
f.txt
g.txt
> ls | grep -o "\..*" | uniq
.t
.t.pg
.bin
.old
.txt
