Redis mass insertion CSV - data is not inserting to Redis - database

I am trying to insert CSV data into Redis. My CSV (data_topics.csv) looks like this:
Column1,Column2
long_ago/speech:,well-done speech
long-ago/debate:,well-done debate
long ago/work:,well-done work
and I am using the command below:
awk -F ',' 'FNR > 1 && $1 && $2 {printf("SET Topic:%s %s\n",$1,$2)}' data_topics.csv | redis-cli --pipe
The expectation is that when I do
GET "Topic:long_ago/speech:"
it should print
"well-done speech"
But I am not getting any output when I try this after inserting 1000 rows from the CSV. So I tried with just the 3 rows above and got the error below:
[admin~]$ awk -F ',' 'FNR > 1 && $1 && $2 {printf("SET Topic:%s %s\n",$1,$2)}' data_topics.csv | redis-cli --pipe
All data transferred. Waiting for the last reply...
ERR syntax error
ERR syntax error
ERR syntax error
Last reply received from the server.
errors: 3, Replies: 3
So I tried adding double quotes around the 2nd column; now my CSV looks like this:
Column1,Column2
long_ago/speech:,"well-done speech"
long-ago/debate:,"well-done debate"
long ago/work:,"well-done work"
and this is the error I am getting now:
[admin~]$ awk -F ',' 'FNR > 1 && $1 && $2 {printf("SET Topic:%s %s\n",$1,$2)}' data_topics.csv | redis-cli --pipe
All data transferred. Waiting for the last reply...
ERR Protocol error: unbalanced quotes in request
Please help me to insert my CSV data into Redis.

Using a CSV called data.csv that contains this:
long_ago/speech:,well-done speech
long-ago/debate:,well-done debate
long-ago/work:,well-done work
The unquoted values contain a space, so redis-cli parses the second word as an extra argument to SET and reports a syntax error. You could quote both the key and the value in the generated command instead:
awk -F, '{printf("SET \"Topic:%s\" \"%s\"\n",$1,$2)}' data.csv | redis-cli --pipe
Then you could do:
redis-cli GET "Topic:long_ago/speech:"
"well-done speech"

Related

How to look up data from a CSV file

I have an array and a csv file. I'm trying to get data from the csv file that match with the data from the array.
Here's my array:
array[12345678876543]=ID00000111
array[87654321234567]=ID00000222
and here is the data from csv file:
12345678876543,floor1
87654321234567,floor2
I'm trying to get this output:
ID00000111 floor1
ID00000222 floor2
I tried this syntax but I can only get the floor number.
for key in ${!array[@]}; do
awk -F, -v serial="${key}" '$1 == serial { print $2; exit}' test.csv
done
I hope someone could help me in my problem.
I'm assuming that the first entry in your csv file is the key to the array.
#!/bin/bash
array[12345678876543]=ID00000111
array[87654321234567]=ID00000222
while read -r line; do
key=$(echo $line | cut -d, -f1)
val=$(echo $line | cut -d, -f2-)
echo ${array[$key]} $val
done < test.csv
You can also do something like this which would be closer to what you have right now:
for key in ${!array[@]}; do
echo ${array[$key]} $(grep "$key" test.csv | cut -d, -f2-)
done
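If the CSV is large, a single awk pass avoids re-reading it once per key. A sketch, assuming the array is first dumped as key=value lines (an intermediate format made up here, not part of the original question):
for key in "${!array[@]}"; do echo "$key=${array[$key]}"; done |
awk -F'[,=]' '
  NR==FNR   { map[$1] = $2; next }   # first input ("-"): key=value lines from the array
  $1 in map { print map[$1], $2 }    # second input: test.csv rows (serial,floor)
' - test.csv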

Bash array values as variables

Is it possible to use array values as variables?
For example, I have this script:
#!/bin/bash
SOURCE=$(curl -k -s $1 | sed 's/{//g;s/}//g;s/,/"\n"/g;s/:/=/g;s/"//g' | awk -F"=" '{ print $1 }')
JSON=$(curl -k -s $1 | sed 's/{//g;s/}//g;s/,/"\n"/g;s/:/=/g;s/"//g' | awk -F"=" '{ print $NF }')
data=$2
readarray -t prot_array <<< "$SOURCE"
readarray -t pos_array <<< "$JSON"
for ((i=0; i<${#prot_array[@]}; i++)); do
echo "${prot_array[i]}" "${pos_array[i]}" | sed 's/NOK/0/g;s/OK/1/g' | grep $2 | awk -F' ' '{ print $2,$3,$4 }'
done
EDIT:
I just added: grep $2 | awk -F' ' '{ print $2,$3,$4 }'
Usage:
./json.sh URL
Sample (very short) output:
DATABASE 1
STATUS 1
I don't want to echo out all the lines; I would like to use a name such as DATABASE or STATUS as a variable ($DATABASE) and echo just that value out.
I just need DATABASE (or any other) value from command line.
Is it somehow possible to use something like this?
./json.sh URL $DATABASE
Happy to explain more if needed.
EDIT:
curl output without any formatting, etc.:
{
"VERSION":"R3.1",
"STATUS":"OK",
"DATABASES":{
"READING":"OK"
},
"TIMESTAMP":"2017-03-08-16-20-35"
}
Output using script:
VERSION R3.1
STATUS 1
DATABASES 1
TIMESTAMP 2017-03-08-16-21-54
What I want is described above: for example, use DATABASE as the variable $DATABASE and somehow get the value "1".
EDIT:
Random json from uconn.edu
./json.sh https://github.uconn.edu/raw/nam12023/novaLauncher/master/manifest.json
Another:
./json.sh https://gitlab.uwe.ac.uk/dc2-roskilly/angular-qs/raw/master/.npm/nan/2.4.0/package/package.json
Last output begins with:
name nan
version 2.4.0
From command line: ./json.sh URL version
At least it works for me.
I think you want to use jq something like this:
$ curl -k -s "$1" | jq --arg d DATABASES -r '
"VERSION \(.VERSION)",
"STATUS \(if .STATUS == "OK" then 1 else 0 end)",
"DATABASES \(if .[$d].READING == "OK" then 1 else 0 end)",
"TIMESTAMP \(.TIMESTAMP)"
'
VERSION R3.1
STATUS 1
DATABASES 1
TIMESTAMP 2017-03-08-16-20-35
(I'm probably missing a simpler way to convert a boolean value to an integer.)
Quick explanation:
The ,-separated strings each become a separate output line.
The -r option outputs a raw string, rather than a JSON string value.
The name of the database field is passed using the --arg option.
\(...) is jq's interpolation operator; the contents are evaluated as a JSON expression and the result is inserted into the string.
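To get closer to the asker's ./json.sh URL DATABASE idea, a wrapper could take the field name as its second argument. A rough sketch (the script name and argument handling are assumptions, and a nested object such as DATABASES would print as JSON rather than as 1):
#!/bin/bash
# usage: ./json.sh URL FIELD     e.g. ./json.sh "$url" STATUS
url=$1
field=$2
curl -k -s "$url" | jq -r --arg f "$field" '
  .[$f] | if . == "OK" then 1 elif . == "NOK" then 0 else . end'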

How do I echo specific rows and columns from CSVs in a variable?

The below script:
#!/bin/bash
otscurrent="
AAA,33854,4528,38382,12
BBB,83917,12296,96213,13
CCC,20399,5396,25795,21
DDD,27198,4884,32082,15
EEE,2472,981,3453,28
FFF,3207,851,4058,21
GGG,30621,4595,35216,13
HHH,8450,1504,9954,15
III,4963,2157,7120,30
JJJ,51,59,110,54
KKK,87,123,210,59
LLL,573,144,717,20
MMM,617,1841,2458,75
NNN,234,76,310,25
OOO,12433,1908,14341,13
PPP,10627,1428,12055,12
QQQ,510,514,1024,50
RRR,1361,687,2048,34
SSS,1,24,25,96
TTT,0,5,5,100
UUU,294,1606,1900,85
"
IFS="," array1=(${otscurrent})
echo ${array1[4]}
Prints:
$ ./test.sh
12
BBB
I'm trying to get it to print just 12... and I am not even sure how to make it print just row 5, column 4.
The variable is the output of an SQL query that has been parsed with several sed commands to change the formatting to CSV.
otscurrent="$(sqlplus64 user/password@dbserverip/db as sysdba @query.sql |
sed '1,11d; /^-/d; s/[[:space:]]\{1,\}/,/g; $d' |
sed '$d'|sed '$d'|sed '$d' | sed '$d' |
sed 's/Used,MB/Used MB/g' |
sed 's/Free,MB/Free MB/g' |
sed 's/Total,MB/Total MB/g' |
sed 's/Pct.,Free/Pct. Free/g' |
sed '1b;/^Name/d' |
sed '/^$/d'
)"
Ultimately I would like to be able to call on a row and column and run statements on the values.
Initially I was piping that into:
awk -F "," 'NR>1{ if($5 < 10) { printf "%-30s%-10s%-10s%-10s%-10s\n", $1,$2,$3,$4,$5"%"; } else { echo "Nothing to do" } }')"
Which works, but I couldn't run commands from the if/else ... or at least I didn't know how.
If you have bash 4.0 or newer, an associative array is an appropriate way to store data in this kind of form.
otscurrent=${otscurrent#$'\n'} # strip leading newline present in your sample data
declare -A data=( )
row=0
while IFS=, read -r -a line; do
for idx in "${!line[@]}"; do
data["$row,$idx"]=${line[$idx]}
done
(( row += 1 ))
done <<<"$otscurrent"
This lets you access each individual item:
echo "${data[0,0]}" # first field of first line
echo "${data[9,0]}" # first field of tenth line
echo "${data[9,1]}" # second field of tenth line
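A possible follow-up, using the row counter left over from the load loop to walk every stored row (just a sketch; a trailing blank line in $otscurrent may leave the last row empty):
for (( r = 0; r < row; r++ )); do
    echo "row $r, column 5: ${data[$r,4]}"   # indices here are zero-based
done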
"I'm trying to get it to just print 12..."
The issue is that IFS="," splits on commas and there is no comma between 12 and BBB. If you want those to be separate elements, add a newline to IFS. Thus, replace:
IFS="," array1=(${otscurrent})
With:
IFS=$',\n' array1=(${otscurrent})
Output:
$ bash test.sh
12
All you need to print the value of the 4th column on the 5th row is:
$ awk -F, 'NR==5{print $4}' <<< "$otscurrent"
3453
and just remember that in awk row (record) and column (field) numbers start at 1, not 0. Some more examples:
$ awk -F, 'NR==1{print $5}' <<< "$otscurrent"
12
$ awk -F, 'NR==2{print $1}' <<< "$otscurrent"
BBB
$ awk -F, '$5 > 50' <<< "$otscurrent"
JJJ,51,59,110,54
KKK,87,123,210,59
MMM,617,1841,2458,75
SSS,1,24,25,96
TTT,0,5,5,100
UUU,294,1606,1900,85
If you'd like to avoid all of this complexity and simply parse your SQL output to produce what you want, without 20 sed commands in between, post a new question showing the raw sqlplus output as the input and the final output you want; someone will post a brief, clear, simple, efficient awk script to do it all at one time, or maybe 2 commands if you still want an intermediate CSV for some reason.
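Regarding "run statements on the values": one common pattern is to let the shell do the splitting and keep the conditional logic in bash rather than inside awk. A sketch, assuming the same 5-column CSV held in $otscurrent:
while IFS=, read -r name used free total pct; do
    [ -z "$name" ] && continue                  # skip the blank lines in $otscurrent
    if (( pct < 10 )); then
        printf '%-30s%-10s%-10s%-10s%s%%\n' "$name" "$used" "$free" "$total" "$pct"
    else
        echo "Nothing to do for $name"
    fi
done <<< "$otscurrent"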

SED Match/Replace URL and Update Serialized Array Count

Below is an example snippet from a sql dump file. This specific row contains a meta_value of a Wordpress PHP serialized array. During database restores in dev., test., and qc. environments I'm using sed to replace URLs with the respective environment sub-domain.
INSERT INTO `wp_postmeta`
(`meta_id`,
`post_id`,
`meta_key`,
`meta_value`)
VALUES
(527,
1951,
'ut_parallax_image',
'a:4:{
s:17:\"background-image\";
s:33:\"http://example.com/background.jpg\";
s:23:\"mobile-background-image\";
s:37:\"www.example.com/mobile-background.jpg\";
}')
;
However, I need to extend this to correct the string lengths in the serialized arrays after the replacement.
sed -r -e "s/:\/\/(www\.)?${domain}/:\/\/\1${1}\.${domain}/g" "/vagrant/repositories/apache/$domain/_sql/$(basename "$file")" > "/vagrant/repositories/apache/$domain/_sql/$1.$(basename "$file")"
The result should look like this for dev.:
INSERT INTO `wp_postmeta`
(`meta_id`,
`post_id`,
`meta_key`,
`meta_value`)
VALUES
(527,
1951,
'ut_parallax_image',
'a:4:{
s:17:\"background-image\";
s:37:\"http://dev.example.com/background.jpg\";
s:23:\"mobile-background-image\";
s:41:\"www.dev.example.com/mobile-background.jpg\";
}')
;
I'd prefer not to introduce any dependencies other than sed.
Thanks @John1024, @Fabio and @Seth. I'm not sure about performance, but this code works, and without wp-cli:
localdomain=mylittlewordpress.local
maindomain=strongwordpress.site.ru
cat dump.sql | sed 's/;s:/;\ns:/g' | awk -F'"' '/s:.+'$maindomain'/ {sub("'$maindomain'", "'$localdomain'"); n=length($2)-1; sub(/:[[:digit:]]+:/, ":" n ":")} 1' | sed ':a;N;$!ba;s/;\ns:/;s:/g' | sed "s/$maindomain/$localdomain/g" | mysql -u$USER -p$PASS $DBNAME
The PHP serialized string is exploded on ';s:' into a multiline string, and awk processes all lines using @John1024's solution.
cat dump.sql | sed 's/;s:/;\ns:/g'
The output is piped to awk:
awk -F'"' '/^s:.+'$maindomain'/ {sub("'$maindomain'", "'$localdomain'"); n=length($2)-1; sub(/:[[:digit:]]+:/, ":" n ":")} 1'
After all lines are processed, the multiline string is imploded back into one line (as it exists in the original dump.sql). Thanks @Zsolt https://stackoverflow.com/a/1252191
sed ':a;N;$!ba;s/;\ns:/;s:/g'
An additional sed replacement is needed for any other strings in the WordPress database.
sed "s/$maindomain/$localdomain/g"
And load into the main server DB:
... | mysql -u$USER -p$PASS $DBNAME
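It can be safer to write the result to a file and inspect it before loading. A sketch with the same pipeline, only the last stage changed (dump.local.sql is just an assumed name):
sed 's/;s:/;\ns:/g' dump.sql \
  | awk -F'"' '/s:.+'$maindomain'/ {sub("'$maindomain'", "'$localdomain'"); n=length($2)-1; sub(/:[[:digit:]]+:/, ":" n ":")} 1' \
  | sed ':a;N;$!ba;s/;\ns:/;s:/g' \
  | sed "s/$maindomain/$localdomain/g" > dump.local.sql   # review this file, then:
mysql -u"$USER" -p"$PASS" "$DBNAME" < dump.local.sql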
Your algorithm involves arithmetic. That makes sed a poor choice. Consider awk instead.
Consider this input file:
$ cat inputfile
something...
s:33:\"http://example.com/background.jpg\";
s:37:\"www.example.com/mobile-background.jpg\";
s:33:\"http://www.example.com/background.jpg\";
more lines...
I believe that this does what you want:
$ awk -F'"' '/:\/\/(www[.])?example.com/ {sub("example.com", "dev.example.com"); n=length($2)-1; sub(/:[[:digit:]]+:/, ":" n ":")} 1' inputfile
something...
s:37:\"http://dev.example.com/background.jpg\";
s:37:\"www.example.com/mobile-background.jpg\";
s:41:\"http://www.dev.example.com/background.jpg\";
more lines...
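For what it's worth, length($2)-1 works because with -F'"' the second field is the URL plus the trailing backslash of the \" escape, so one character has to be subtracted. A quick check:
$ echo 's:33:\"http://example.com/background.jpg\";' | awk -F'"' '{ print length($2)-1 }'
33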
WP-CLI handles serialized PHP arrays during a search-replace http://wp-cli.org/commands/search-replace/. I wanted to try a native shell solution, but having WP-CLI was worth the extra overhead in the end.
Here is a sample text file you asked for (it's a database export).
Original (https://www.example.com) :
LOCK TABLES `wp_options` WRITE;
INSERT INTO `wp_options` VALUES (1,'siteurl','https://www.example.com','yes'),(18508,'optionsframework','a:48:{s:4:\"logo\";s:75:\"https://www.example.com/wp-content/uploads/2014/04/logo_imbrique_small3.png\";s:7:\"favicon\";s:62:\"https://www.example.com/wp-content/uploads/2017/04/favicon.ico\";}','yes')
/*!40000 ALTER TABLE `wp_options` ENABLE KEYS */;
UNLOCK TABLES;
Result needed (http://example.localhost) :
LOCK TABLES `wp_options` WRITE;
INSERT INTO `wp_options` VALUES (1,'siteurl','http://example.localhost','yes'),(18508,'optionsframework','a:48:{s:4:\"logo\";s:76:\"http://example.localhost/wp-content/uploads/2014/04/logo_imbrique_small3.png\";s:7:\"favicon\";s:64:\"https://example.localhost/wp-content/uploads/2017/04/favicon.ico\";}','yes');
/*!40000 ALTER TABLE `wp_options` ENABLE KEYS */;
UNLOCK TABLES;
As you can see:
there are multiple occurrences on the same line
escape characters aren't counted in the length number (e.g. "/")
some occurrences aren't preceded by an "s:" length number (no length fix is needed for those; they can be handled after awk with a simple sed)
Thanks in advance !
@Alexander Demidov's answer is great; here's our implementation for reference.
public static function replaceInFile(string $replace, string $replacement, string $absoluteFilePath): void
{
ColorCode::colorCode("Attempting to replace ::\n($replace)\nwith replacement ::\n($replacement)\n in file ::\n(file://$absoluteFilePath)", iColorCode::BACKGROUND_MAGENTA);
$replaceDelimited = preg_quote($replace, '/');
$replacementDelimited = preg_quote($replacement, '/');
$replaceExecutable = CarbonPHP::CARBON_ROOT . 'extras/replaceInFileSerializeSafe.sh';
// #link https://stackoverflow.com/questions/29902647/sed-match-replace-url-and-update-serialized-array-count
$replaceBashCmd = "chmod +x $replaceExecutable && $replaceExecutable '$absoluteFilePath' '$replaceDelimited' '$replace' '$replacementDelimited' '$replacement'";
Background::executeAndCheckStatus($replaceBashCmd);
}
public static function executeAndCheckStatus(string $command, bool $exitOnFailure = true): int
{
$output = [];
$return_var = null;
ColorCode::colorCode('Running CMD >> ' . $command,
iColorCode::BACKGROUND_BLUE);
exec($command, $output, $return_var);
if ($return_var !== 0 && $return_var !== '0') {
ColorCode::colorCode("The command >> $command \n\t returned with a status code (" . $return_var . '). Expecting 0 for success.', iColorCode::RED);
$output = implode(PHP_EOL, $output);
ColorCode::colorCode("Command output::\t $output ", iColorCode::RED);
if ($exitOnFailure) {
exit($return_var);
}
}
return (int) $return_var;
}
#!/usr/bin/env bash
set -e
SQL_FILE="$1"
replaceDelimited="$2"
replace="$3"
replacementDelimited="$4"
replacement="$5"
if ! grep --quiet "$replace" "$SQL_FILE" ;
then
exit 0;
fi
cp "$SQL_FILE" "$SQL_FILE.old.sql"
# #link https://stackoverflow.com/questions/29902647/sed-match-replace-url-and-update-serialized-array-count
# #link https://serverfault.com/questions/1114188/php-serialize-awk-command-speed-up/1114191#1114191
sed 's/;s:/;\ns:/g' "$SQL_FILE" | \
awk -F'"' '/s:.+'$replaceDelimited'/ {sub("'$replace'", "'$replacement'"); n=length($2)-1; sub(/:[[:digit:]]+:/, ":" n ":")} 1' 2>/dev/null | \
sed -e ':a' -e 'N' -e '$!ba' -e 's/;\ns:/;s:/g' | \
sed "s/$replaceDelimited/$replacementDelimited/g" > "$SQL_FILE.replaced.sql"
cp "$SQL_FILE.replaced.sql" "$SQL_FILE"
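For reference, a direct invocation of the script above might look like this (the file name and domains are hypothetical; arguments 2 and 4 are the preg_quote'd forms of arguments 3 and 5):
./replaceInFileSerializeSafe.sh dump.sql 'www\.example\.com' 'www.example.com' 'example\.localhost' 'example.localhost'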

Find content of one file from another file in UNIX

I have 2 files. The first file contains a list of row IDs of tuples of a table in the database.
The second file contains SQL queries with these row IDs in the "where" clause of the query.
For example:
File 1
1610657303
1610658464
1610659169
1610668135
1610668350
1610670407
1610671066
File 2
update TABLE_X set ATTRIBUTE_A=87 where ri=1610668350;
update TABLE_X set ATTRIBUTE_A=87 where ri=1610672154;
update TABLE_X set ATTRIBUTE_A=87 where ri=1610668135;
update TABLE_X set ATTRIBUTE_A=87 where ri=1610672153;
I have to read File 1, search File 2 for all the SQL commands that match the row IDs from File 1, and dump those SQL queries into a third file.
File 1 has 100,000 entries and File 2 contains 10 times the entries of File 1, i.e. 1,000,000.
I used grep -f File_1 File_2 > File_3. But this is extremely slow and the rate is 1000 entries per hour.
Is there any faster way to do this?
You don't need regexps, so grep -F -f file1 file2
One way with awk:
awk -v FS="[ =]" 'NR==FNR{rows[$1]++;next}(substr($NF,1,length($NF)-1) in rows)' File1 File2
This should be pretty quick. On my machine, it took under 2 seconds to create a lookup of 1 million entries and compare it against 3 million lines.
Machine Specs:
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (8 cores)
98 GB RAM
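Since every statement ends in ri=<id>;, a variant sketch splits on both = and ; so the ID lands in its own field and no substr() is needed (this assumes the trailing-semicolon format shown above):
awk -F'[=;]' 'NR==FNR{rows[$1]++;next}($(NF-1) in rows)' File1 File2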
I suggest using a programming language such as Perl, Ruby or Python.
In Ruby, a solution reading both files (f1 and f2) just once could be:
idxes = File.readlines('f1').map(&:chomp)
File.foreach('f2') do | line |
next unless line =~ /where ri=(\d+);$/
puts line if idxes.include? $1
end
or with Perl
open $file, '<', 'f1';
while (<$file>) { chomp; $idxs{$_} = 1; }
close($file);
open $file, '<', 'f2';
while (<$file>) {
next unless $_ =~ /where ri=(\d+);$/;
print $_ if $idxs{$1};
}
close $file;
The awk/grep solutions mentioned above were slow or memory hungry on my machine (file1 10^6 rows, file2 10^7 rows). So I came up with an SQL solution using sqlite3.
Turn file2 into a CSV-formatted file where the first field is the value after ri=
cat file2.txt | gawk -F= '{ print $3","$0 }' | sed 's/;,/,/' > file2_with_ids.txt
Create two tables:
sqlite> CREATE TABLE file1(rowId char(10));
sqlite> CREATE TABLE file2(rowId char(10), statement varchar(200));
Import the row IDs from file1:
sqlite> .import file1.txt file1
Import the statements from file2, using the "prepared" version:
sqlite> .separator ,
sqlite> .import file2_with_ids.txt file2
Select all and only the statements in table file2 with a matching rowId in table file1:
sqlite> SELECT statement FROM file2 WHERE file2.rowId IN (SELECT file1.rowId FROM file1);
File 3 can be easily created by redirecting output to a file before issuing the select statement:
sqlite> .output file3.txt
Test data:
sqlite> select count(*) from file1;
1000000
sqlite> select count(*) from file2;
10000000
sqlite> select * from file1 limit 4;
1610666927
1610661782
1610659837
1610664855
sqlite> select * from file2 limit 4;
1610665680|update TABLE_X set ATTRIBUTE_A=87 where ri=1610665680;
1610661907|update TABLE_X set ATTRIBUTE_A=87 where ri=1610661907;
1610659801|update TABLE_X set ATTRIBUTE_A=87 where ri=1610659801;
1610670610|update TABLE_X set ATTRIBUTE_A=87 where ri=1610670610;
Without creating any indices, the select statement took about 15 secs on an AMD A8 1.8 GHz 64-bit Ubuntu 12.04 machine.
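If that is still too slow, an index on the row ID columns should help the IN lookup; a sketch (plain SQLite, nothing specific to this data):
sqlite> CREATE INDEX file1_rowid_idx ON file1(rowId);
sqlite> CREATE INDEX file2_rowid_idx ON file2(rowId);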
Most of the previous answers are correct, but the only thing that worked for me was this command:
grep -oi -f a.txt b.txt
Maybe try awk, using the numbers from file 1 as keys. For example, a simple two-step approach: first generate a second awk script from file 1:
awk -f script1.awk file1
where script1.awk contains:
{
print "$0 ~ ", $0, "{ print $0 }" > "script2.awk";
}
and then invoke script2.awk on file 2 (awk -f script2.awk file2 > file3).
I may be missing something, but wouldn't it be sufficient to just iterate the IDs in file1 and for each ID, grep file2 and store the matches in a third file? I.e.
for ID in `cat file1`; do grep $ID file2; done > file3
This is not terribly efficient (since file2 will be read over and over again), but it may be good enough for you. If you want more speed, I'd suggest to use a more powerful scripting language which lets you read file2 into a map which quickly allows identifying lines for a given ID.
Here's a Python version of this idea:
queryByID = {}
for line in file('file2'):
lastEquals = line.rfind('=')
semicolon = line.find(';', lastEquals)
id = line[lastEquals + 1:semicolon]
queryByID[id] = line.rstrip()
for line in file('file1'):
id = line.rstrip()
if id in queryByID:
print queryByID[id]
## reports any lines contained in < file 1> missing in < file 2>
IFS=$(echo -en "\n\b") && for a in $(cat < file 1>);
do ((\!$(grep -F -c -- "$a" < file 2>))) && echo $a;
done && unset IFS
or to do what the asker wants, take off the negation and redirect
(IFS=$(echo -en "\n\b") && for a in $(cat < file 1>);
do (($(grep -F -c -- "$a" < file 2>))) && echo $a;
done && unset IFS) >> < file 3>
