How to generate a table from a CSV file in PostgreSQL

I am new to database management and we are using psql. I need to migrate around 200 CSV files into our database. Manually creating a table for every CSV file is a bit tiresome, so please help me out: is there any way to generate a table from a CSV file?

Answered at DBA Stack Exchange by the OP. I'm copying the answer here because this was the first link returned by my search engine.
The OP's script looks like this:
DATADIR='data'      # directory holding the CSV files
PREFIX='jobd'       # prefix for the generated table names
DBNAME='divacsv'    # target database

function createSchema {
    # Build the column list from the CSV header: every column becomes type text,
    # plus a trailing catch-all MYEXTRA column; strip any double quotes.
    COLUMNS=$(head -n 1 "$1" |
        awk -F, '{for(i=1; i<=NF; i++){out=out $i" text, ";} print out;}' |
        sed 's/, $/, MYEXTRA text/' |
        sed 's/"//g')

    CMD_CREATE="psql $DBNAME -c \"CREATE TABLE $2 ($COLUMNS);\""
    echo "$CMD_CREATE"
    sh -c "$CMD_CREATE"

    CMD_COPY="psql $DBNAME -c \"COPY $2 FROM '$(pwd)/$1' DELIMITER ',' CSV;\""
    echo "$CMD_COPY"
    sh -c "$CMD_COPY"
}

for file in "$DATADIR"/*.csv; do
    table=${PREFIX}_$(echo "$file" | sed 's/.*\///' | sed 's/\.csv$//')
    createSchema "$file" "$table"
done
Comments advise that the HEADER option might be needed to avoid loading the first line (the column names) as data, which is true.
I've tested this code but couldn't make it work under CentOS.
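For reference, this is what the COPY step looks like with the HEADER option added (a minimal sketch; the table and file names are placeholders):

psql "$DBNAME" -c "COPY mytable FROM '/path/to/mytable.csv' DELIMITER ',' CSV HEADER;"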

Related

How to run a command on all .cs files in a directory and store the file path as a variable to be used in a command on Windows

I'm trying to run the following command on each file of a directory.
svn blame FILEPATH | gawk '{print $2}' | sort | uniq -c
It works well; however, it only works on individual files. For whatever reason, it won't run on the directory as a whole. I was hoping to create some form of batch script that would iterate through the directory, grab each file path, and store it as a variable to be used in the command. However, I've never written a batch script, nor do I know the first thing about them. I tried this loop but couldn't get it to work:
set codedirectory=%C:\Repo\Pineapple% for %codedirectory% %%i in (*.cs) do
but I'm not necessarily sure what to do next. Unfortunately, this all has to be run on windows. Any help would be greatly appreciated. Thanks!
Use for and find, similar to the example at https://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-7.html:
for i in $(find . -name "*.cs"); do
    svn blame "$i" | gawk '{print $2}' | sort | uniq -c
done
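If any of the paths contain spaces, a null-delimited variant is safer; here is a sketch of that (not part of the original answer):

find . -name "*.cs" -print0 | while IFS= read -r -d '' f; do
    svn blame "$f" | gawk '{print $2}' | sort | uniq -c
done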

using command variable in column output not working

I'm using a script that uses curl to obtain specific array values from a configuration. I'd like to place the output into columns separating values (values are unknown to script). Here's my code:
# get overlay networks and their details
get_overlay=`curl -H "X-Person-Token: $auth_token" -H "X-Person-Email: $auth_email" -k "$api_host/api/v1/networks"`
# array of overlay names with uuid
overlay_name=`echo $get_overlay | jq '.[] | .name'`
overlay_uuid=`echo $get_overlay | jq '.[] | .uuid'`
echo ""
echo -e "Overlay UUID\n$oname $ouuid" | column -t
exit 0
Here's the output:
Overlay UUID
"TESTOVERLAY"
"Auto_API_Overlay"
"ANOTHEROVERLAYTEST" "ea178905-6ab0-4154-ab05-412dc4b39151"
"e5be9dbe-b0fc-4e30-aaf5-ac4bdcd863a7"
"850ebf6b-3651-4cf1-aae1-5a6c03fad61b"
What I was expecting was:
Overlay UUID
"TESTOVERLAY" "ea178905-6ab0-4154-ab05-412dc4b39151"
"Auto_API_Overlay" "e5be9dbe-b0fc-4e30-aaf5-ac4bdcd863a7"
"ANOTHEROVERLAYTEST" "850ebf6b-3651-4cf1-aae1-5a6c03fad61b"
I'm an absolute beginner at this, any insight is very much appreciated.
Thanks!
I would suggest using paste to combine your two variables line by line:
paste <(printf 'Overlay\n%s\n' "$overlay_name") <(printf 'UUID\n%s\n' "$overlay_uuid") | column -t
Two process substitutions are used to pass the contents of each variable along with their titles.
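Putting the answer together with the variables from the question (a sketch; jq's -r flag drops the surrounding quotes and is optional):

overlay_name=$(echo "$get_overlay" | jq -r '.[] | .name')
overlay_uuid=$(echo "$get_overlay" | jq -r '.[] | .uuid')
paste <(printf 'Overlay\n%s\n' "$overlay_name") <(printf 'UUID\n%s\n' "$overlay_uuid") | column -t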

How to copy second column from all the files in the directory and place them as columns in a new text file

I have 150 tab-delimited text files. I want to copy the 2nd column of each file and paste them next to one another in a new text file; the new file will have 150 columns, one 2nd column from each file. Help me out, guys.
This code worked but placed each column under the other, forming one loooong column.
for file in *.txt
do
    awk '{print $2}' *.txt > AllCol.txt
done
Here is another approach, without looping:
$ c=$(ls -1 file*.tsv | wc -l); cut -f2 file*.tsv | pr -$c -t
#!/bin/bash
# Be sure the file suffix of the new file is not .txt,
# or the loop below would pick it up as input.
OUT=AllColumns.tsv
touch "$OUT"
for file in *.txt
do
    # Append this file's 2nd (tab-delimited) column as a new column of $OUT
    paste "$OUT" <(awk -F'\t' '{print $2}' "$file") > "$OUT.tmp"
    mv "$OUT.tmp" "$OUT"
done
One of many alternatives would be to use cut -f 2 instead of awk, but you flagged your question with awk.
Since your files are so regular, you could also skip the do loop, and use a command-line utility such as rs (reshape) or datamash.
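Since rs and datamash are mentioned without code, here is one possible datamash sketch (assuming GNU datamash is installed and every input file has the same number of lines):

# Turn each file's 2nd column into one tab-separated row, then transpose
# the rows back into columns.
for file in *.txt; do
    cut -f2 "$file" | paste -s -
done | datamash transpose > AllColumns.tsv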

importing data from a CSV in Bash

I have a CSV file that I need to use in a bash script. The CSV is formatted like so.
server1,file.name
server1,otherfile.name
server2,file.name
server3,file.name
I need to be able to pull this information into either an array or in some other way so that I can then filter the information and only pull out data for a single server that I can then pass to another command within the script.
I need it to go something like this.
Import workfile.csv
check hostname | return only lines from workfile.csv that have the hostname as column one and store column 2 as a variable.
find / -xdev -type f -perm -002 | compare to stored info | chmod o-w all files not in listing
I'm stuck using bash because of the environment that I'm working in.
The CSV can be too big to put all the filenames in the find parameter list.
You also do not want to call find in a loop for every line in the CSV.
Solution:
First, make a complete list of files in a tmp file.
Second, parse the CSV and filter out the files it lists.
Third, chmod -w the files that remain.
The next solution stores the files in a tmp file.
Make a script that gets the servername as a parameter.
See the comments in the code:
# Before EDIT:
# Hostname given by parameter 1
# Check that you have a hostname
if [ $# -ne 1 ]; then
    echo "Usage: $0 hostname"
    # Exit script, failure
    exit 1
fi
hostname=$1

# EDIT: get hostname by system call instead
hostname=$(hostname)
# Or: hostname=$(hostname -s)

# Additional check
if [ ! -f workfile.csv ]; then
    echo "inputfile missing"
    exit 1
fi

# After the edits, ${hostname} is now filled.
# Complete list of world-writable files (no -name filter here; ${file} is not set yet)
find / -xdev -type f -perm -002 > /tmp/allfiles.tmp

# Do not use cat workfile.csv | grep ..., you do not need to call cat
# grep with ^ for beginning of line, add a , for a complete first field
#     grep "^${hostname}," workfile.csv
# cut for selecting the second field with delimiter ','
#     cut -d"," -f2
# while read file => can be improved with xargs, but let's start with this.
grep "^${hostname}," workfile.csv | cut -d"," -f2 | while read -r file; do
    # Using sed with #, not /, since you need / in the search string;
    # an address with an alternative delimiter must start with \#
    # The variable must be outside the single quotes and inside double quotes
    # Add $ after the file for end-of-line
    # Delete the line with the file (\#searchstring#d)
    sed -i '\#/'"${file}"'$#d' /tmp/allfiles.tmp
done

echo "Review /tmp/allfiles.tmp before chmodding all these files"
echo "Delete the echo and exit when you are happy"
# Just an exit for testing
exit

# Using < avoids a call to cat
</tmp/allfiles.tmp xargs chmod -w
It might be easier to chmod -w all the files and then chmod +w the files in the CSV. This is a little different from what you asked, since all files from the CSV end up writable after this process, and maybe you do not want that.
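A rough sketch of that alternative, under the same assumptions about workfile.csv and the hostname (and filenames without spaces or newlines):

# Remove write permission from every world-writable file on this filesystem...
find / -xdev -type f -perm -002 | xargs chmod -w
# ...then give write permission back to the files listed for this host in the CSV
grep "^$(hostname)," workfile.csv | cut -d"," -f2 | while IFS= read -r file; do
    chmod +w "$file"
done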

Execute bash command stored in associative array over SSH, store result

For a larger project that's not relevant, I need to collect system stats from the local system or a remote system. Since I'm collecting the same stats either way, I'm preventing code duplication by storing the stats-collecting commands in a Bash associative array.
declare -A stats_cmds
# Actually contains many more key:value pairs, similar style
stats_cmds=([total_ram]="$(free -m | awk '/^Mem:/{print $2}')")
I can collect local system stats like this:
get_local_system_stats()
{
    # Collect stats about local system
    complex_data_structure_that_doesnt_matter=${stats_cmds[total_ram]}
    # Many more similar calls here
}
A precondition of my script is that ~/.ssh/config is setup such that ssh $SSH_HOSTNAME works without any user input. I would like something like this:
get_remote_system_stats()
{
    # Collect stats about remote system
    complex_data_structure_that_doesnt_matter=`ssh $SSH_HOSTNAME ${stats_cmds[total_ram]}`
}
I've tried every combination of single quotes, double quotes, backticks and such that I can imagine. Some combinations result in the stats command getting executed too early (bash: 7986: command not found), others cause syntax errors, others return null (single quotes around the stats command) but none store the proper result in my data structure.
How can I evaluate a command, stored in an associative array, on a remote system via SSH and store the result in a data structure in my local script?
Make sure that the commands you store in your array don't get expanded when you assign your array!
Also note that the complex-looking quoting style is necessary when nesting single quotes. See this SO post for an explanation.
stats_cmds=([total_ram]='free -m | awk '"'"'/^Mem:/{print $2}'"'"'')
And then just launch your ssh as:
ssh "$ssh_hostname" "${stats_cmds[total_ram]}"
(yeah, I lowercased your variable name because uppercase variable names in Bash are really sick). Then:
get_local_system_stats() {
    # Collect stats about local system
    # eval is needed so the pipe inside the stored command is interpreted
    complex_data_structure_that_doesnt_matter=$(eval "${stats_cmds[total_ram]}")
    # Many more similar calls here
}
and
get_remote_system_stats() {
    # Collect stats about remote system
    complex_data_structure_that_doesnt_matter=$(ssh "$ssh_hostname" "${stats_cmds[total_ram]}")
}
First, I'm going to suggest an approach that makes minimal changes to your existing implementation. Then, I'm going to demonstrate something closer to best practices.
Smallest Modification
Given your existing code:
declare -A remote_stats_cmds
remote_stats_cmds=([total_ram]='free -m | awk '"'"'/^Mem:/{print $2}'"'"''
[used_ram]='free -m | awk '"'"'/^Mem:/{print $3}'"'"''
[free_ram]='free -m | awk '"'"'/^Mem:/{print $4}'"'"''
[cpus]='nproc'
[one_min_load]='uptime | awk -F'"'"'[a-z]:'"'"' '"'"'{print $2}'"'"' | awk -F "," '"'"'{print $1}'"'"' | tr -d " "'
[five_min_load]='uptime | awk -F'"'"'[a-z]:'"'"' '"'"'{print $2}'"'"' | awk -F "," '"'"'{print $2}'"'"' | tr -d " "'
[fifteen_min_load]='uptime | awk -F'"'"'[a-z]:'"'"' '"'"'{print $2}'"'"' | awk -F "," '"'"'{print $3}'"'"' | tr -d " "'
[iowait]='cat /proc/stat | awk '"'"'NR==1 {print $6}'"'"''
[steal_time]='cat /proc/stat | awk '"'"'NR==1 {print $9}'"'"'')
...one can evaluate these locally as follows:
result=$(eval "${remote_stats_cmds[iowait]}")
echo "$result" # demonstrate value retrieved
...or remotely as follows:
result=$(ssh "$hostname" bash <<<"${remote_stats_cmds[iowait]}")
echo "$result" # demonstrate value retrieved
No separate form is required.
The Right Thing
Now, let's talk about an entirely different way to do this:
# no awful nested quoting by hand!
collect_total_ram() { free -m | awk '/^Mem:/ {print $2}'; }
collect_used_ram() { free -m | awk '/^Mem:/ {print $3}'; }
collect_cpus() { nproc; }
...and then, to evaluate locally:
result=$(collect_cpus)
...or, to evaluate remotely:
result=$(ssh "$hostname" bash <<<"$(declare -f collect_cpus); collect_cpus")
...or, to iterate through defined functions with the collect_ prefix and do both of these things:
declare -A local_results
declare -A remote_results
while IFS= read -r funcname; do
    local_results["${funcname#collect_}"]=$("$funcname")
    remote_results["${funcname#collect_}"]=$(ssh "$hostname" bash <<<"$(declare -f "$funcname"); $funcname")
done < <(compgen -A function collect_)
...or, to collect all the items into a single remote array in one pass, avoiding extra SSH round-trips and not eval'ing or otherwise taking security risks with results received from the remote system:
remote_cmd=""
while IFS= read -r funcname; do
remote_cmd+="$(declare -f "$funcname"); printf '%s\0' \"$funcname\" \"\$(\"$funcname\")\";"
done < <(compgen -A function collect_)
declare -A remote_results=( )
while IFS= read -r -d '' funcname && IFS= read -r -d '' result; do
remote_results["${funcname#collect_}"]=$result
done < <(ssh "$hostname" bash <<<"$remote_cmd")
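To check what came back, a small usage sketch:

for key in "${!remote_results[@]}"; do
    printf '%s: %s\n' "$key" "${remote_results[$key]}"
done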
