The US Naval Observatory has an API that outputs a JSON file containing the sunrise and sunset times, among other things, as documented here.
Here is an example of the output JSON file:
{
"error":false,
"apiversion":"2.0.0",
"year":2017,
"month":6,
"day":10,
"dayofweek":"Saturday",
"datechanged":false,
"lon":130.000000,
"lat":30.000000,
"tz":0,
"sundata":[
{"phen":"U", "time":"03:19"},
{"phen":"S", "time":"10:21"},
{"phen":"EC", "time":"10:48"},
{"phen":"BC", "time":"19:51"},
{"phen":"R", "time":"20:18"}],
"moondata":[
{"phen":"R", "time":"10:49"},
{"phen":"U", "time":"16:13"},
{"phen":"S", "time":"21:36"}],
"prevsundata":[
{"phen":"BC","time":"19:51"},
{"phen":"R","time":"20:18"}],
"closestphase":{"phase":"Full Moon","date":"June 9, 2017","time":"13:09"},
"fracillum":"99%",
"curphase":"Waning Gibbous"
}
I'm relatively new to using JSON, but I understand that everything in square brackets after "sundata" is a JSON array (please correct me if I'm wrong). So I searched for instructions on how to get a value from a JSON array, without success.
I have downloaded the file to my system using:
wget -O usno.json "http://api.usno.navy.mil/rstt/oneday?ID=iOnTheSk&date=today&tz=0&coords=30,130"
I need to extract the time (in HH:MM format) from this line:
{"phen":"S", "time":"10:21"},
...and then use it to create a variable (that I will later write to a separate file).
I would prefer to use Bash if possible, preferably using a JSON parser (such as jq) if it'll be easier to understand/implement. I'd rather not use Python (which was suggested by a lot of the articles I have read previously) if possible as I am trying to become more familiar with Bash specifically.
I have examined a lot of different webpages, including answers on Stack Overflow, but none of them have specifically covered an array line with two key/value pairs per line (they've only explained how to do it with only one pair per line, which isn't what the above file structure has, sadly).
Specifically, I have read these articles, but they did not solve my particular problem:
https://unix.stackexchange.com/questions/177843/parse-one-field-from-an-json-array-into-bash-array
Parsing JSON with Unix tools
Parse json array in shell script
Parse JSON to array in a shell script
What is JSON and why would I use it?
https://developers.squarespace.com/what-is-json/
Read the json data in shell script
Thanks in advance for any thoughts.
Side note: I have managed to do this with a complex 150-odd line script made up of "sed"s, "grep"s, "awk"s, and whatnot, but obviously if there's a one-liner JSON-native solution that's more elegant, I'd prefer to use that as I need to minimise power usage wherever possible (it's being run on a battery-powered device).
(Side-note to the side-note: the script was so long because I need to do it for each line in the JSON file, not just the "S" value)
If you already have jq you can easily select your desired time with:
sun_time=$(jq -r '.sundata[] | select(.phen == "S").time' usno.json)
echo "$sun_time"
# 10:21
If you must use "regular" bash commands (really, use jq):
wget -O - "http://api.usno.navy.mil/rstt/oneday?ID=iOnTheSk&date=today&tz=0&coords=30,130" \
| sed -n '/^"sundata":/,/}],$/p' \
| sed -n -e '/"phen":"S"/{s/^.*"time":"//'\;s/...$//\;p}
Example:
$ wget -O - "http://api.usno.navy.mil/rstt/oneday?ID=iOnTheSk&date=today&tz=0&coords=30,130" | sed -n '/^"sundata":/,/}],$/p' | sed -n -e '/"phen":"S"/{s/^.*"time":"//'\;s/...$//\;p}
--2017-06-10 08:02:46-- http://api.usno.navy.mil/rstt/oneday?ID=iOnTheSk&date=today&tz=0&coords=30,130
Resolving api.usno.navy.mil (api.usno.navy.mil)... 199.211.133.93
Connecting to api.usno.navy.mil (api.usno.navy.mil)|199.211.133.93|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/json]
Saving to: ‘STDOUT’
- [ <=> ] 753 --.-KB/s in 0s
2017-06-10 08:02:47 (42.6 MB/s) - written to stdout [753]
10:21
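Since the question's side note says every entry is needed, not just "S", here is a minimal sketch of a reusable awk lookup. It assumes the exact layout of the sample response (one object per line, "sundata":[ on its own line) and will break if the API ever reformats its output, which is why jq remains the safer choice. The function name sun_time and the trimmed sample file are made up for illustration.

```shell
# Sketch only: relies on the line layout of the sample response shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{
"sundata":[
{"phen":"U", "time":"03:19"},
{"phen":"S", "time":"10:21"},
{"phen":"EC", "time":"10:48"},
{"phen":"BC", "time":"19:51"},
{"phen":"R", "time":"20:18"}],
"moondata":[
{"phen":"R", "time":"10:49"},
{"phen":"U", "time":"16:13"},
{"phen":"S", "time":"21:36"}]
}
EOF

# Usage: sun_time PHEN FILE
# Prints the time of the given phenomenon code, looking only inside "sundata".
sun_time() {
    awk -F'"' -v want="$1" '
        /^"sundata":/             { in_sun = 1; next }  # array starts here
        in_sun && $4 == want      { print $8 }          # 4th field: code, 8th: time
        in_sun && index($0, "]")  { in_sun = 0 }        # last entry closes the array
    ' "$2"
}

sunset=$(sun_time S "$sample")
sunrise=$(sun_time R "$sample")
echo "sunset=$sunset sunrise=$sunrise"
rm -f "$sample"
```

Because the lookup stops at the closing bracket of "sundata", the "S" entry under "moondata" (21:36) is not picked up by mistake.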
Disclaimers:
1) English is my second language, so please forgive any grammatical horrors you may find. I am pretty confident you will be able to understand what I need despite these.
2) I have found several examples on this site that address questions/problems similar to mine, though I was unfortunately not able to figure out the modifications needed to fit my needs.
3) You will find some text in capital letters here and there. It is of course not me "shouting" at you, but only a way to make portions of text stand out. Please do not consider this an act of impoliteness.
4) For those of you who get to the bottom of this novella alive, THANKS IN ADVANCE for your patience, even if you are not able to, or do not feel like, helping me. My disclaimer here is that, after surfing the site for a while, I noticed that the most common "complaint" from people willing to help seems to be the lack of information (and/or its lack of quality) provided by those seeking help. I therefore preferred to be accused of overwording if need be... It would be, at least, not a common offense...
The "Problem":
I have 2 files (a and b for simplification). File a has 7 columns separated by commas. File b has 2 columns separated by commas.
What I need: Whenever the data in the 7th column of file a matches -EXACT MATCHES ONLY- the data in the 1st column of file b, a new line containing the whole line of file a plus column 2 of file b is to be appended to a new file "c".
--- MORE INFO IN THE NOTES AT THE BOTTOM ---
file a:
Server Name,File System,Path,File,Date,Type,ID
horror,/tmp,foldera/folder/b/folderc,binaryfile.bin,2014-01-21 22:21:59.000000,typet,aaaaaaaa
host1,/,somefolder,test1.txt,2016-08-18 00:00:20.000000,typez,11111111
host20,/,somefolder/somesubfolder,usr.cfg,2015-12-288 05:00:20.000000,typen,22222222
hoster,/lol,foolie,anotherfile.sad,2014-01-21 22:21:59.000000,typelol,66666666
hostie,/,someotherfolder,somefile.txt,2016-06-17 18:43:12.000000,typea,33333333
hostile,/sad,folder22,higefile.hug,2016-06-17 18:43:12.000000,typeasd,77777777
hostin,/var,folder30,someotherfile.cfg,2014-01-21 22:21:59.000000,typo,44444444
hostn,/usr,foldie,tinyfile.lol,2016-08-18 00:00:20.000000,typewhatever,55555555
server10,/usr,foldern,tempfile.tmp,2016-06-17 18:43:12.000000,tipesad,99999999
file b:
ID,Size
11111111,215915
22222222,1716
33333333,212856
44444444,1729
55555555,215927
66666666,1728
88888888,1729
99999999,213876
bbbbbbbb,26669080
Expected file c:
Server Name,File System,Path,File,Date,Type,ID,Size
host1,/,somefolder,test1.txt,2016-08-18 00:00:20.000000,typez,11111111,215915
host20,/,somefolder/somesubfolder,usr.cfg,2015-12-288 05:00:20.000000,typen,22222222,1716
hoster,/lol,foolie,anotherfile.sad,2014-01-21 22:21:59.000000,typelol,66666666,1728
hostie,/,someotherfolder,somefile.txt,2016-06-17 18:43:12.000000,typea,33333333,212856
hostin,/var,folder30,someotherfile.cfg,2014-01-21 22:21:59.000000,typo,44444444,1729
hostn,/usr,foldie,tinyfile.lol,2016-08-18 00:00:20.000000,typewhatever,55555555,215927
server10,/usr,foldern,tempfile.tmp,2016-06-17 18:43:12.000000,tipesad,99999999,213876
Additional notes:
0) Notice how the line with ID "aaaaaaaa" in file a does not make it into file c, since ID "aaaaaaaa" is not present in file b. Likewise, the line with ID "bbbbbbbb" in file b does not make it into file c, since ID "bbbbbbbb" is not present in file a and is therefore never looked up in the first place.
1) The data is clearly completely made up for confidentiality reasons, though the examples provided fairly resemble what the real files look like.
2) I added headers just to give a better idea of the nature of the data. The real files don't have them, so there is no need to skip them in the source file or create them in the destination file.
3) Both files come sorted by default, meaning that IDs will be properly sorted in file b, while they will most likely be scrambled in file a. File c should preferably follow the order of file a (though I can manipulate it later to fit my needs anyway, so no worries there, as long as the code does what I need and doesn't mess up the data by combining the wrong lines).
4) VERY VERY VERY IMPORTANT:
4.a) I already have "working" ksh code (attached below) that uses "cat", "grep", "while" and "if" to do the job. It worked like a charm (well, acceptably) with 160K-line sample files: it produced approximately 60K output lines an hour, which, projected, would yield an acceptable "20 days" to produce 30 million lines [KEEP ON READING]. But somehow (I have plenty of processor and memory capacity) cat and/or grep seem to be struggling with a real-life 5-million-line file, and file c is now only being fed a couple of hundred lines every 24 hours. (Files a and b can have up to 30 million lines each, so that is also the maximum number of lines in the resulting file, assuming every line in file a finds its match in file b.)
4.b) I was told that awk, being more powerful, should succeed where the weaker commands I worked with seem to fail. I was also told that working with arrays might solve my performance problem, since all the data is loaded into memory once and worked on from there, instead of having to cat | grep file b as many times as there are lines in file a, as I am currently doing.
4.c) I am working on AIX, so I only have sh and ksh, no bash, and therefore cannot use the array tools the latter provides. That is why I thought of AWK; that, and the fact that I think AWK is probably "stronger", though I might be (probably?) wrong.
Now, I present to you the magnificent piece of ksh code (obvious sarcasm here, though I like the idea of you picturing for a brief moment in your mind the image of the monkey holding up and showing all other jungle-crawlers their future lion king) I have managed to develop (feel free to laugh as hard as you need while reading this code, I will not be able to hear you anyway, so no feelings harmed :P ):
cat "${file_a}" | while read -r line_file_a; do
server_name_file_a=`echo "${line_file_a}" | awk -F"," '{print $1}'`
filespace_name_file_a=`echo "${line_file_a}" | awk -F"," '{print $2}'`
folder_name_file_a=`echo "${line_file_a}" | awk -F"," '{print $3}'`
file_name_file_a=`echo "${line_file_a}" | awk -F"," '{print $4}'`
file_date_file_a=`echo "${line_file_a}" | awk -F"," '{print $5}'`
file_type_file_a=`echo "${line_file_a}" | awk -F"," '{print $6}'`
file_id_file_a=`echo "${line_file_a}" | awk -F"," '{print $7}'`
cat "${file_b}" | grep "${file_id_file_a}" | while read -r line_file_b; do
file_id_file_b=`echo "${line_file_b}" | awk -F"," '{print $1}'`
file_size_file_b=`echo "${line_file_b}" | awk -F"," '{print $2}'`
if [ "${file_id_file_a}" = "${file_id_file_b}" ]; then
echo "${server_name_file_a},${filespace_name_file_a},${folder_name_file_a},${file_name_file_a},${file_date_file_a},${file_type_file_a},${file_id_file_a},${file_size_file_b}" >> ${file_c}.csv
fi
done
done
One last additional note, just in case you wonder:
The "if" section was not only built as a means to assemble the output line; it serves a double purpose, also guarding against any false positives that may come from grep, i.e. 100 matching 1000 (bear in mind that, as I mentioned earlier, I am working on AIX, so my grep does not have the -m switch the GNU one has, and I need matches to be exact/absolute).
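As a side illustration of the false-positive concern (100 matching 1000), here is a minimal sketch that anchors the pattern to the whole first column. It uses only basic regular expressions, so it should not depend on GNU-specific grep switches; it does assume the IDs contain no regex metacharacters. The three sample rows are invented.

```shell
# Sketch: anchor the ID to the start of the line and to the comma that ends
# the first column, so 100 can no longer match 1000 or 2100.
file_b=$(mktemp)
printf '%s\n' '100,11' '1000,22' '2100,33' > "$file_b"

id=100
loose=$(grep -c "$id" "$file_b")     # substring match: counts all three lines
exact=$(grep "^${id}," "$file_b")    # only the line whose first field is exactly 100

echo "loose matches: $loose"
echo "exact match:   $exact"
rm -f "$file_b"
```

With an anchored pattern the extra string comparison becomes a safety net rather than a necessity, although the per-line grep is still the real performance problem.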
You have reached the end. CONGRATULATIONS! You've been awarded the medal to patience.
$ cat stuff.awk
BEGIN { FS=OFS="," }
NR == FNR { a[$1] = $2; next }
$7 in a { print $0, a[$7] }
Note the order for providing the files to the awk command, b first, followed by a:
$ awk -f stuff.awk b.txt a.txt
host1,/,somefolder,test1.txt,2016-08-18 00:00:20.000000,typez,11111111,215915
host20,/,somefolder/somesubfolder,usr.cfg,2015-12-288 05:00:20.000000,typen,22222222,1716
hoster,/lol,foolie,anotherfile.sad,2014-01-21 22:21:59.000000,typelol,66666666,1728
hostie,/,someotherfolder,somefile.txt,2016-06-17 18:43:12.000000,typea,33333333,212856
hostin,/var,folder30,someotherfile.cfg,2014-01-21 22:21:59.000000,typo,44444444,1729
hostn,/usr,foldie,tinyfile.lol,2016-08-18 00:00:20.000000,typewhatever,55555555,215927
server10,/usr,foldern,tempfile.tmp,2016-06-17 18:43:12.000000,tipesad,99999999,213876
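To see why the file order matters: while awk reads the first file, NR (lines read overall) equals FNR (lines read in the current file), so only the first rule runs and stores Size under ID. For the second file NR is larger than FNR, so the lookup rule runs instead. The sketch below reproduces this on a few trimmed-down rows; the temporary file names and the array name size are chosen for illustration only.

```shell
# Sketch of the two-file lookup on a handful of made-up rows.
dir=$(mktemp -d)
printf '%s\n' '11111111,215915' '22222222,1716' 'bbbbbbbb,26669080' > "$dir/b.txt"
printf '%s\n' \
    'horror,/tmp,f,binaryfile.bin,2014-01-21,typet,aaaaaaaa' \
    'host20,/,f,usr.cfg,2015-12-28,typen,22222222' \
    'host1,/,f,test1.txt,2016-08-18,typez,11111111' > "$dir/a.txt"

joined=$(awk '
    BEGIN      { FS = OFS = "," }
    NR == FNR  { size[$1] = $2; next }   # first file (b): remember ID -> Size
    $7 in size { print $0, size[$7] }    # second file (a): exact ID lookup
' "$dir/b.txt" "$dir/a.txt")

printf '%s\n' "$joined"
rm -rf "$dir"
```

The output keeps the order of file a, and array subscripts are compared as whole strings, so an ID of 100 cannot match 1000.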
EDIT: Updated calculation
You can try to predict how often you are calling another program:
Each line in file a starts at least 9 processes: 7 awk's + 1 cat + 1 grep.
For the 160K-line sample that is 9 * 160,000 = 1,440,000 process starts.
For file b: 2 awk's, one file open and one file close for each hit. With 60K output lines, that would be another 4 * 60,000 = 240,000 operations.
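The same estimate as shell arithmetic, in case you want to plug in your real line counts. The variable names are made up, and the figures are the ones quoted in the question.

```shell
# Rough cost estimate for the original loop.
lines_a=160000                 # lines in the sample file a
hits=60000                     # lines written to file c
per_line=$((7 + 1 + 1))        # 7 awk + 1 cat + 1 grep per line of file a
per_hit=$((2 + 2))             # 2 awk + open and close of the output file per hit

process_starts=$((per_line * lines_a))
hit_ops=$((per_hit * hits))
echo "process starts for file a: $process_starts"
echo "extra operations for hits: $hit_ops"
```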
A small change in the code reduces this to "only" 160,000 grep calls:
cat "${file_a}" | while IFS=, read -r server_name_file_a \
filespace_name_file_a folder_name_file_a file_name_file_a \
file_date_file_a file_type_file_a file_id_file_a; do
grep "${file_id_file_a}" "${file_b}" | while IFS="," read -r file_id_file_b file_size_file_b; do
if [ "${file_id_file_a}" = "${file_id_file_b}" ]; then
echo "${server_name_file_a},${filespace_name_file_a},${folder_name_file_a},${file_name_file_a},${file_date_file_a},${file_type_file_a},${file_id_file_a},${file_size_file_b}"
fi
done
done >> ${file_c}.csv
Well, try this with your 160K files and see how much faster it is.
Before I explain why this is still the wrong way, I will make another small improvement: I will replace the cat that feeds the while loop with an input redirection at the end (after done).
while IFS=, read -r server_name_file_a \
filespace_name_file_a folder_name_file_a file_name_file_a \
file_date_file_a file_type_file_a file_id_file_a; do
grep "${file_id_file_a}" "${file_b}" | while IFS="," read -r file_id_file_b file_size_file_b; do
if [ "${file_id_file_a}" = "${file_id_file_b}" ]; then
echo "${server_name_file_a},${filespace_name_file_a},${folder_name_file_a},${file_name_file_a},${file_date_file_a},${file_type_file_a},${file_id_file_a},${file_size_file_b}"
fi
done
done < "${file_a}" >> ${file_c}.csv
The main drawback of these solutions is that grep reads the complete file_b again and again, once for each line in file a.
This solution is a nice improvement in performance, but grep still adds a lot of overhead. Another huge improvement can be found with awk.
The best solution is using awk, as explained in What is "NR==FNR" in awk? and shown in the answer by @jas.
It starts only one process, and both files are read only once.
I'm running GNU Screen (4.03.01) so I can have multiple terminals in one, and I'm looking for a good way to display live stats of my memory, so that as I do things like compiling, testing programs, etc., I can see how many resources I have left.
I know there is "TOP" the performance monitor... and other similar programs, but I'm not looking for the entire active process list etc... I just want a snapshot of my memory stats that updates for example every 3-5 seconds.
I really appreciate anyone taking the time to help me with this, so thank you!
You can combine watch, which repeatedly runs the specified program and displays its output, with free, which shows current memory usage. watch refreshes every 2 seconds by default; add -n 5 for a 5-second interval.
watch free -m
free --help
Usage:
free [options]
Options:
-b, --bytes show output in bytes
-k, --kilo show output in kilobytes
-m, --mega show output in megabytes
-g, --giga show output in gigabytes
--tera show output in terabytes
-h, --human show human-readable output
--si use powers of 1000 not 1024
-l, --lohi show detailed low and high memory statistics
-o, --old use old format (without -/+buffers/cache line)
-t, --total show total for RAM + swap
-s N, --seconds N repeat printing every N seconds
-c N, --count N repeat printing N times, then exit
--help display this help and exit
-V, --version output version information and exit
For more details see free(1).
watch --help
Usage:
watch [options] command
Options:
-b, --beep beep if command has a non-zero exit
-c, --color interpret ANSI color sequences
-d, --differences[=<permanent>]
highlight changes between updates
-e, --errexit exit if command has a non-zero exit
-g, --chgexit exit when output from command changes
-n, --interval <secs> seconds to wait between updates
-p, --precise attempt run command in precise intervals
-t, --no-title turn off header
-x, --exec pass command to exec instead of "sh -c"
-h, --help display this help and exit
-v, --version output version information and exit
You could use the Valgrind tool Massif. I haven't tried it, but it seems to be what you are looking for.
To use massif, install valgrind then run:
valgrind --tool=massif program argument1 argument2 ...
Another quick solution is a script like this:
while true; do
    clear
    free -m
    # Total CPU usage in percent, summed over all processes.
    # (It was not clear to me which CPU figure you want to see; please clarify.)
    ps -A -o pcpu | tail -n+2 | paste -sd+ - | bc
    sleep 3    # without a pause the loop would run flat out
done
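A variation on the loop idea, as a minimal sketch: read the figures straight from /proc/meminfo and stop after a fixed number of samples. This assumes Linux and a kernel that reports MemAvailable; the function names mem_snapshot and mem_watch are made up for this example.

```shell
# Sketch, Linux only: print total and available memory in MiB.
mem_snapshot() {
    awk '/^MemTotal:/     { total = $2 }   # values in /proc/meminfo are in kB
         /^MemAvailable:/ { avail = $2 }
         END { printf "total=%d MiB available=%d MiB\n", total / 1024, avail / 1024 }' /proc/meminfo
}

# Usage: mem_watch COUNT SECONDS
# Prints COUNT snapshots, sleeping SECONDS between them.
mem_watch() {
    n=$1
    while [ "$n" -gt 0 ]; do
        mem_snapshot
        n=$((n - 1))
        if [ "$n" -gt 0 ]; then sleep "$2"; fi
    done
}

mem_watch 2 1
```

In a small Screen region, something like mem_watch 1000 5 would give one compact line every 5 seconds.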
The other thing you can do is use htop. It displays memory usage, CPU usage per core and shows resources used by each process. Really neat but maybe not that detailed as the rest of the answers.
Edit: I think this has been answered successfully, but I can't check 'til later. I've reformatted it as suggested though.
The question: I have a series of files, each with a name of the form XXXXNAME, where XXXX is some number. I want to move them all to separate folders called XXXX and have them called NAME. I can do this manually, but I was hoping that by naming them XXXXNAME there'd be some way I could tell Terminal (I think that's the right name, but not really sure) to move them there. Something like
mv *NAME */NAME
but where it takes whatever * was in the first case and regurgitates it to the path.
This is on some form of Linux, with a bash shell.
In the real life case, the files are 0000GNUmakefile, with sequential numbering. I'm having to make lots of similar-but-slightly-altered versions of a program to compile and run on a cluster as part of my research. It would probably have been quicker to write a program to edit all the files and put in the right place in the first place, but I didn't.
This is probably extremely simple, and I should be able to find an answer myself, if I knew the right words. Thing is, I have no formal training in programming, so I don't know what to call things to search for them. So hopefully this will result in me getting an answer, and maybe knowing how to find out the answer for similar things myself next time. With the basic programming I've picked up, I'm sure I could write a program to do this for me, but I'm hoping there's a simple way to do it just using functionality already in Terminal. I probably shouldn't be allowed to play with these things.
Thanks for any help! I can actually program in C and Python a fair amount, but that's through trial and error largely, and I still don't know what I can do and can't do in Terminal.
SO many ways to achieve this.
I find that the old standbys sed and awk are often the most powerful.
ls | sed -rne 's:^([0-9]{4})(NAME)$:mv -iv & \1/\2:p'
If you're satisfied that the commands look right, pipe the command line through a shell:
ls | sed -rne 's:^([0-9]{4})(NAME)$:mv -iv & \1/\2:p' | sh
I put NAME in brackets and used \2 so that if it varies more than your example indicates, you can come up with a regular expression to handle your filenames better.
To do the same thing in gawk (GNU awk, the variant found in most GNU/Linux distros):
ls | gawk '/^[0-9]{4}NAME$/ {printf("mv -iv %s %s/%s\n", $1, substr($0,1,4), substr($0,5))}'
As with the first sample, this produces commands which, if they make sense to you, can be piped through a shell by appending | sh to the end of the line.
Note that with all these mv commands, I've added the -i and -v options. This is for your protection. Read the man page for mv (by typing man mv in your Linux terminal) to see if you should be comfortable leaving them out.
Also, I'm assuming with these lines that all your directories already exist. You didn't mention if they do. If they don't, here's a one-liner to create the directories.
ls | sed -rne 's:^([0-9]{4})(NAME)$:mkdir -p \1:p' | sort -u
As with the others, append | sh to run the commands.
I should mention that it is generally recommended to use constructs like for (in Tim's answer) or find instead of parsing the output of ls. That said, when your filename format is as simple as /[0-9]{4}word/, I find the quick sed one-liner to be the way to go.
Lastly, if by NAME you actually mean "any string of characters" rather than the literal string "NAME", then in all my examples above, replace NAME with .*.
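Here is a dry run of that generalised pattern in a scratch directory, as a sketch. It uses (.+) rather than .* so that a bare four-digit name with nothing after it is skipped, and -E, which GNU and BSD sed both accept in place of -r. The file names are invented; nothing is moved, the commands are only printed.

```shell
# Dry run: generate the mv commands for any XXXX<name> file, do not execute them.
dir=$(mktemp -d)
touch "$dir/0000GNUmakefile" "$dir/0001GNUmakefile" "$dir/notes.txt"

# (.+) stands in for the literal NAME; & is the whole matched file name.
cmds=$(cd "$dir" && ls | sed -En 's:^([0-9]{4})(.+)$:mv -iv & \1/\2:p')

printf '%s\n' "$cmds"
rm -rf "$dir"
```

notes.txt produces no command because it does not start with four digits.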
The following script will do this for you. Copy the script into a file on the remote machine (we'll call it sortfiles.sh).
#!/bin/bash
# Get all files in the current directory named XXXXsomename, where each X is a digit
files=$(find . -maxdepth 1 -type f -name '[0-9][0-9][0-9][0-9]*')
# Build a list of the XXXX prefixes found in the list of files
# (cut -c 3-6 skips the leading "./" that find prints)
dirs=
for name in ${files}; do
    dirs="${dirs} $(echo ${name} | cut -c 3-6)"
done
# Remove duplicate entries from the list of XXXX prefixes
# (uniq works per line, so put one prefix on each line first)
dirs=$(echo ${dirs} | tr ' ' '\n' | sort -u)
# Create any XXXX directories that are not already present
for name in ${dirs}; do
    if [[ ! -d ${name} ]]; then
        mkdir ${name}
    fi
done
# Move each XXXXsomename file to XXXX/somename
for name in ${files}; do
    mv ${name} $(echo ${name} | cut -c 3-6)/$(echo ${name} | cut -c 7-)
done
# Return from script with normal status
exit 0
From the command line, do chmod +x sortfiles.sh
Execute the script with ./sortfiles.sh
Just open the Terminal application, cd into the directory that contains the files you want moved/renamed, and copy and paste these commands into the command line.
shopt -s extglob    # the *(...) patterns below need extended globbing
for file in [0-9][0-9][0-9][0-9]*; do
    dirName="${file%%*([^0-9])}"
    mkdir -p "$dirName"
    mv "$file" "$dirName/${file##*([0-9])}"
done
This assumes all the files that you want to rename and move are in the same directory. The file globbing also assumes that there are at least four digits at the start of the filename. If there are more than four digits, the file will still be caught, but not if there are fewer than four. If there are fewer than four, take off the appropriate number of [0-9]s from the first line.
It does not handle the case where "NAME" (i.e. the name of the new file you want) starts with a number.
See this site for more information about string manipulation in bash.
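If the NAME part may itself start with (or contain) a digit, one option is to split after the fourth character instead of at the first non-digit. This is a sketch under that assumption (the prefix is always exactly four characters); it uses only POSIX parameter expansion, and the sample file names are invented.

```shell
# Sketch: split each name after its fourth character.
dir=$(mktemp -d)
touch "$dir/0000GNUmakefile" "$dir/00012nd-try.txt"

(
    cd "$dir" || exit 1
    for file in [0-9][0-9][0-9][0-9]?*; do
        [ -f "$file" ] || continue    # skip directories and an unmatched glob
        name=${file#????}             # drop the first four characters: NAME
        prefix=${file%"$name"}        # what is left in front of NAME: XXXX
        mkdir -p "$prefix"
        mv -i "$file" "$prefix/$name"
    done
)

moved=$(cd "$dir" && find . -type f | sort)
printf '%s\n' "$moved"
rm -rf "$dir"
```

The ?* at the end of the glob requires at least one character after the digits, so existing XXXX directories are not picked up as files to move.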
Page 38 of the book Linux 101 Hacks suggests:
cat url-list.txt | xargs wget -c
I usually do:
for i in `cat url-list.txt`
do
wget -c $i
done
Is there anything, other than brevity, that makes the xargs technique superior to the good old for-loop technique in bash?
Added
The C source code seems to have only one fork. In contrast, how many forks does the bash combo have? Please elaborate on the issue.
From the Rationale section of a UNIX manpage for xargs. (Interestingly this section doesn't appear in the OS X BSD version of xargs, nor in the GNU version.)
The classic application of the xargs utility is in conjunction with the find utility to reduce the number of processes launched by a simplistic use of the find -exec combination. The xargs utility is also used to enforce an upper limit on memory required to launch a process. With this basis in mind, this volume of POSIX.1-2008 selected only the minimal features required.
In your follow-up, you ask how many forks the other version will have. Jim already answered this: one per iteration. How many iterations are there? It's impossible to give an exact number, but easy to answer the general question. How many lines are there in your url-list.txt file?
There are some other considerations. xargs requires extra care for filenames with spaces or other problematic characters, and find's -exec has an option (+) that groups processing into batches. So not everyone prefers xargs, and perhaps it's not best for all situations.
See these links:
http://www.sunmanagers.org/pipermail/summaries/2005-March/006255.html
http://fahdshariff.blogspot.com/2009/05/find-exec-vs-xargs.html
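To make the filename caveat concrete, here is a small sketch comparing three ways of handing file names to a command. The inner sh -c 'echo "$#"' simply reports how many arguments it received. Note that -print0 and -0 are GNU/BSD extensions rather than POSIX, while -exec ... {} + is standard; the file names are invented.

```shell
# Sketch: count the arguments each approach passes for two files,
# one of which has a space in its name.
dir=$(mktemp -d)
touch "$dir/plain.txt" "$dir/with space.txt"

# 1) NUL-separated names survive spaces intact.
count_xargs=$(find "$dir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

# 2) find batching with "+": as many names as fit go to one command.
count_exec=$(find "$dir" -type f -exec sh -c 'echo "$#"' sh {} +)

# 3) Plain xargs splits on whitespace and sees three "names".
count_naive=$(find "$dir" -type f | xargs sh -c 'echo "$#"' sh)

echo "xargs -0: $count_xargs, -exec +: $count_exec, plain xargs: $count_naive"
rm -rf "$dir"
```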
Also consider:
xargs -I'{}' wget -c '{}' < url-list.txt
but wget provides an even better means for the same:
wget -c -i url-list.txt
With respect to xargs versus a loop, I prefer xargs when the meaning and implementation are relatively "simple" and "clear"; otherwise, I use loops.
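xargs also copes with a huge list: it splits the input into as many command lines as needed to stay under the system's limit on argument length. That limit applies whenever a command is executed with the whole list as arguments; a "for" loop inside the shell does not hit it, although it does hold the entire list in memory.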
xargs is designed to process multiple inputs for each process it forks. A shell script with a for loop over its inputs must fork a new process for each input. Avoiding that per-process overhead can give an xargs solution a significant performance enhancement.
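A minimal sketch that makes the difference in process count visible, with /bin/echo standing in for wget so that nothing is downloaded: each invocation prints one line, so counting output lines counts invocations. The list contents are placeholders.

```shell
# Sketch: count how many times the command is started in each approach.
list=$(mktemp)
printf '%s\n' url1 url2 url3 > "$list"

# xargs packs all three items into a single invocation: one output line.
xargs_runs=$(xargs /bin/echo < "$list" | wc -l | tr -d ' ')

# The loop starts the external command once per item: three output lines.
loop_runs=$(for i in $(cat "$list"); do /bin/echo "$i"; done | wc -l | tr -d ' ')

echo "xargs: $xargs_runs invocation(s), for loop: $loop_runs invocation(s)"
rm -f "$list"
```

For wget the saving is small compared with the download time itself, but for cheap commands run over many inputs it adds up.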
Instead of GNU Parallel, I prefer using xargs' built-in parallel processing. Add -P to indicate how many processes to run in parallel. As in...
seq 1 10 | xargs -n 1 -P 3 echo
would run up to 3 processes at a time, which the kernel can schedule on different cores. This is supported by modern GNU xargs. You will have to verify for yourself if using BSD or Solaris.
Depending on your internet connection you may want to use GNU Parallel http://www.gnu.org/software/parallel/ to run it in parallel.
cat url-list.txt | parallel wget -c
One advantage I can think of is that, if you have lots of files, it could be slightly faster since you don't have as much overhead from starting new processes.
I'm not really a bash expert though, so there could be other reasons it's better (or worse).