How to download Postgres bytea column as file

Currently, I have a number of files stored in Postgres 8.4 as bytea. The file types are .doc, .odt, .pdf, .txt, etc.
How can I download all the files stored in Postgres? I need to do a backup.
I need them in their original file type instead of bytea format.
Thanks!

One simple option is to use the COPY command with encode to hex format and then apply the xxd shell command (with the -p continuous hexdump style switch). For example, let's say I have a jpg image in a bytea column in the samples table:
\copy (SELECT encode(file, 'hex') FROM samples LIMIT 1) TO '/home/grzegorz/Desktop/image.hex'
$ xxd -p -r image.hex > image.jpg
As far as I checked, it works in practice.
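If you need to dump every row rather than a single one, the same idea can be wrapped in a small shell loop. This is only a rough sketch, assuming the table is called samples, has an integer primary key id, and that psql and xxd are on the path:
#!/usr/bin/env bash
# Export every bytea in "samples" to one file per row, named after its id.
DB=yourdb   # placeholder database name
for id in $(psql -Aqt -d "$DB" -c "SELECT id FROM samples"); do
  psql -Aqt -d "$DB" -c "SELECT encode(file, 'hex') FROM samples WHERE id = $id" \
    | xxd -p -r > "sample_${id}.bin"
done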

Try this:
COPY (SELECT yourbyteacolumn FROM yourtable WHERE <add your clauses here> ...) TO 'youroutputfile' (FORMAT binary)
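If you are running this from a client machine rather than on the server, psql's \copy is the client-side variant and accepts the same options; a rough sketch, keeping the placeholders above:
psql -d yourdb -c "\copy (SELECT yourbyteacolumn FROM yourtable) TO 'youroutputfile' (FORMAT binary)"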

Here's the simplest thing I could come up with:
psql -qAt -c "select encode(file,'base64') from files limit 1" | base64 -d
The -qAt flags are important as they strip off any formatting of the output. These options are available inside the psql shell, too.
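For what it's worth, the same decoding can be driven from inside psql via \pset and \o piped to a shell command; a rough equivalent, assuming a database named yourdb and an output path of /tmp/file.bin:
psql -d yourdb <<'EOF'
\pset tuples_only on
\pset format unaligned
\o | base64 -d > /tmp/file.bin
select encode(file, 'base64') from files limit 1;
\o
EOF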

base64
psql -Aqt -c "SELECT encode(content, 'base64') FROM ..." | base64 -d > file
xxd
psql -Aqt -c "SELECT encode(content, 'hex') FROM ..." | xxd -p -r > file

If you have a lot of data to download, you can fetch the rows first and then iterate through each one, writing the bytea field to a file.
$resource = pg_connect('host=localhost port=5432 dbname=website user=super password=************');

// grab all the user IDs
$userResponse = pg_query('select distinct(r.id) from resource r
    join connection c on r.id = c.resource_id_from
    join resource rfile on c.resource_id_to = rfile.id and rfile.resource_type_id = 10
    join file f on rfile.id = f.resource_id
    join file_type ft on f.file_type_id = ft.id
    where r.resource_type_id = 38');

// need to work through one by one to handle data
while ($user = pg_fetch_array($userResponse)) {
    $user_id = $user['id'];

    $query = 'select r.id, f.data, rfile.resource_type_id, ft.extension from resource r
        join connection c on r.id = c.resource_id_from
        join resource rfile on c.resource_id_to = rfile.id and rfile.resource_type_id = 10
        join file f on rfile.id = f.resource_id
        join file_type ft on f.file_type_id = ft.id
        where r.resource_type_id = 38 and r.id = ' . $user_id;

    $fileResponse = pg_query($query);
    $fileData = pg_fetch_array($fileResponse);

    $data = pg_unescape_bytea($fileData['data']);
    $extension = $fileData['extension'];
    $fileId = $fileData['id'];
    $filename = $fileId . '.' . $extension;

    $fileHandle = fopen($filename, 'w');
    fwrite($fileHandle, $data);
    fclose($fileHandle);
}

DO $$
DECLARE
    l_lob_id OID;
    r record;
BEGIN
    FOR r IN
        SELECT data, filename FROM bytea_table
    LOOP
        l_lob_id := lo_from_bytea(0, r.data);
        PERFORM lo_export(l_lob_id, '/home/...' || r.filename);
        PERFORM lo_unlink(l_lob_id);
    END LOOP;
END; $$
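Note that the DO block runs inside the server process, so lo_export writes to the database server's filesystem and normally requires superuser privileges. A minimal way to run it, assuming the block above is saved in a file named export_bytea_files.sql (a name chosen here just for illustration):
psql -d yourdb -f export_bytea_files.sql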

As far as I'm aware, bytea to file needs to be done at the app level.
(9.1 might change this with the file_fdw contrib module. There's also a lo_export function, but it is not applicable here.)

If you want to do this from a local Windows machine rather than from the server, you will have to run every statement individually, and you will need pgAdmin and certutil:
Have pgAdmin installed.
Open cmd from the runtime folder, or cd "C:\Program Files\pgAdmin 4\v6\runtime"
Run this query in pgAdmin to get every statement that you will have to paste into cmd:
SELECT 'set PGPASSWORD={PASSWORD} && psql -h {host} -U {user} -d {db name} -Aqt -c "SELECT encode({bytea_column}, ''base64'') FROM {table} WHERE id='||id||'" > %a% && CERTUTIL -decode %a% "C:\temp\{name_of_the_folder}\FileName - '||{file_name}||' ('||TO_CHAR(current_timestamp,'DD.MM.YYYY,HH24 MI SS')||').'||{file_extension}||'"'
FROM table WHERE ....;
Replace {...}
It will generate something like:
set PGPASSWORD=123 && psql -h 192.1.1.1 -U postgres -d my_test_db -Aqt -c "SELECT encode(file_bytea, 'base64') FROM test_table_bytea WHERE id=33" > %a% && CERTUTIL -decode %a% "C:\temp\DB_FILE\FileName - test1 - (06.04.2022,15 42 26).docx"
set PGPASSWORD=123 && psql -h 192.1.1.1 -U postgres -d my_test_db -Aqt -c "SELECT encode(file_bytea, 'base64') FROM test_table_bytea WHERE id=44" > %a% && CERTUTIL -decode %a% "C:\temp\DB_FILE\FileName - test2 - (06.04.2022,15 42 26).pdf"
Copy and paste all the generated statements into cmd. The files will be saved to your local machine.

Related

How to escape this command in a batch file?

I want to dynamically generate a set of switches for pg_dump like below:
--table=mySchema.foo --table=mySchema.bar ...
However, I want to restrict those switches to views only. The view names don't follow a pattern. They all reside in a single schema called mySchema.
Here is the batch file script I wrote:
@echo off
set PARAM_HOTE=localhost
set PARAM_PORT=5435
set PSQL="C:\Program Files\PostgreSQL\9.4\bin\psql.exe"
set SQL_QUERY=^
select string_agg( '--table=' || quote_ident(nspname) || '.' || quote_ident(relname), ' ' )^
from (^
select *^
from pg_class^
join pg_namespace on pg_namespace.oid = pg_class.relnamespace^
where relkind = 'v'^
and nspname = 'mySchema'^
order by relname ASC^
) infos_vues^
;
for /f %%i in ('"%PSQL%" --quiet --tuples-only --host %PARAM_HOTE% --port %PARAM_PORT% --username "rec" -c "%SQL_QUERY%" db') do set PG_DUMP_SWITCHES_FOR_VIEWS_ONLY=%%i
:: Call PG_DUMP...
When I run it, I am getting the following error:
'"C:\Program Files\PostgreSQL\9.4\bin\psql.exe"" -c "select' is not recognized as an internal
or external command, operable program or batch file.
Here is how I solved my issue:
@echo off
set PARAM_HOTE=localhost
set PARAM_PORT=5435
set PSQL="C:\Program Files\PostgreSQL\9.2\bin\psql.exe"
set SQL_LISTE_VUES=^
select string_agg( concat('--table=' , quote_ident(nspname) , '.' , quote_ident(relname)), ' ' )^
from (^
select *^
from pg_class^
join pg_namespace on pg_namespace.oid = pg_class.relnamespace^
where relkind = 'v'^
and nspname = 'rec'^
order by relname ASC^
) infos_vues^
;
for /f "usebackq delims=" %%i in (`%%PSQL%% --quiet --tuples-only --host %PARAM_HOTE% --port %PARAM_PORT% --username "rec" -c "%SQL_LISTE_VUES%" REC`) do set LISTE_VUES=%%i
echo %LISTE_VUES%
I rewrote my query, replacing || with the concat function
I used backticks
I escaped % as %% in the for command

SED Match/Replace URL and Update Serialized Array Count

Below is an example snippet from a SQL dump file. This specific row contains a meta_value holding a WordPress PHP serialized array. During database restores in the dev., test., and qc. environments I'm using sed to replace URLs with the respective environment sub-domain.
INSERT INTO `wp_postmeta`
(`meta_id`,
`post_id`,
`meta_key`,
`meta_value`)
VALUES
(527,
1951,
'ut_parallax_image',
'a:4:{
s:17:\"background-image\";
s:33:\"http://example.com/background.jpg\";
s:23:\"mobile-background-image\";
s:37:\"www.example.com/mobile-background.jpg\";
}')
;
However, I need to extend this to correct the string lengths in the serialized arrays after the replacement.
sed -r -e "s/:\/\/(www\.)?${domain}/:\/\/\1${1}\.${domain}/g" "/vagrant/repositories/apache/$domain/_sql/$(basename "$file")" > "/vagrant/repositories/apache/$domain/_sql/$1.$(basename "$file")"
The result should look like this for dev.:
INSERT INTO `wp_postmeta`
(`meta_id`,
`post_id`,
`meta_key`,
`meta_value`)
VALUES
(527,
1951,
'ut_parallax_image',
'a:4:{
s:17:\"background-image\";
s:37:\"http://dev.example.com/background.jpg\";
s:23:\"mobile-background-image\";
s:41:\"www.dev.example.com/mobile-background.jpg\";
}')
;
I'd prefer not to introduce any dependencies other than sed.
Thanks @John1024, @Fabio and @Seth. I'm not sure about performance, but this code works, and without wp-cli:
localdomain=mylittlewordpress.local
maindomain=strongwordpress.site.ru
cat dump.sql | sed 's/;s:/;\ns:/g' | awk -F'"' '/s:.+'$maindomain'/ {sub("'$maindomain'", "'$localdomain'"); n=length($2)-1; sub(/:[[:digit:]]+:/, ":" n ":")} 1' | sed ':a;N;$!ba;s/;\ns:/;s:/g' | sed "s/$maindomain/$localdomain/g" | mysql -u$USER -p$PASS $DBNAME
The PHP serialized string is exploded on ';s:' into a multiline string, and awk then processes every line using @John1024's solution.
cat dump.sql | sed 's/;s:/;\ns:/g'
The output is piped to awk:
awk -F'"' '/^s:.+'$maindomain'/ {sub("'$maindomain'", "'$localdomain'"); n=length($2)-1; sub(/:[[:digit:]]+:/, ":" n ":")} 1'
After all lines are processed, the multiline string is imploded back into one line (as it exists in the original dump.sql). Thanks @Zsolt https://stackoverflow.com/a/1252191
sed ':a;N;$!ba;s/;\ns:/;s:/g'
An additional sed replacement is needed for any other strings in the WordPress database.
sed "s/$maindomain/$localdomain/g"
And finally load into the main server DB:
... | mysql -u$USER -p$PASS $DBNAME
Your algorithm involves arithmetic. That makes sed a poor choice. Consider awk instead.
Consider this input file:
$ cat inputfile
something...
s:33:\"http://example.com/background.jpg\";
s:37:\"www.example.com/mobile-background.jpg\";
s:33:\"http://www.example.com/background.jpg\";
more lines...
I believe that this does what you want:
$ awk -F'"' '/:\/\/(www[.])?example.com/ {sub("example.com", "dev.example.com"); n=length($2)-1; sub(/:[[:digit:]]+:/, ":" n ":")} 1' inputfile
something...
s:37:\"http://dev.example.com/background.jpg\";
s:37:\"www.example.com/mobile-background.jpg\";
s:41:\"http://www.dev.example.com/background.jpg\";
more lines...
WP-CLI handles serialized PHP arrays during a search-replace: http://wp-cli.org/commands/search-replace/. I wanted to try a native shell solution, but having WP-CLI was worth the extra overhead in the end.
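For reference, a typical WP-CLI invocation might look like the following (the URLs are placeholders); the tool recalculates the serialized string lengths on its own:
wp search-replace 'http://example.com' 'http://dev.example.com' --all-tables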
Here is a sample text file you asked for (it's a database export).
Original (https://www.example.com):
LOCK TABLES `wp_options` WRITE;
INSERT INTO `wp_options` VALUES (1,'siteurl','https://www.example.com','yes'),(18508,'optionsframework','a:48:{s:4:\"logo\";s:75:\"https://www.example.com/wp-content/uploads/2014/04/logo_imbrique_small3.png\";s:7:\"favicon\";s:62:\"https://www.example.com/wp-content/uploads/2017/04/favicon.ico\";}','yes')
/*!40000 ALTER TABLE `wp_options` ENABLE KEYS */;
UNLOCK TABLES;
Result needed (http://example.localhost):
LOCK TABLES `wp_options` WRITE;
INSERT INTO `wp_options` VALUES (1,'siteurl','http://example.localhost','yes'),(18508,'optionsframework','a:48:{s:4:\"logo\";s:76:\"http://example.localhost/wp-content/uploads/2014/04/logo_imbrique_small3.png\";s:7:\"favicon\";s:64:\"https://example.localhost/wp-content/uploads/2017/04/favicon.ico\";}','yes');
/*!40000 ALTER TABLE `wp_options` ENABLE KEYS */;
UNLOCK TABLES;
As you can see:
there are multiple occurrences on the same line
escape characters aren't counted in the length number (e.g. "/")
some occurrences aren't preceded by an "s:" length number (no need to replace those; it can be done after awk with a simple sed)
Thanks in advance!
@Alexander Demidov's answer is great; here's our implementation for reference.
public static function replaceInFile(string $replace, string $replacement, string $absoluteFilePath): void
{
    ColorCode::colorCode("Attempting to replace ::\n($replace)\nwith replacement ::\n($replacement)\n in file ::\n(file://$absoluteFilePath)", iColorCode::BACKGROUND_MAGENTA);

    $replaceDelimited = preg_quote($replace, '/');
    $replacementDelimited = preg_quote($replacement, '/');
    $replaceExecutable = CarbonPHP::CARBON_ROOT . 'extras/replaceInFileSerializeSafe.sh';

    // @link https://stackoverflow.com/questions/29902647/sed-match-replace-url-and-update-serialized-array-count
    $replaceBashCmd = "chmod +x $replaceExecutable && $replaceExecutable '$absoluteFilePath' '$replaceDelimited' '$replace' '$replacementDelimited' '$replacement'";

    Background::executeAndCheckStatus($replaceBashCmd);
}

public static function executeAndCheckStatus(string $command, bool $exitOnFailure = true): int
{
    $output = [];
    $return_var = null;

    ColorCode::colorCode('Running CMD >> ' . $command, iColorCode::BACKGROUND_BLUE);

    exec($command, $output, $return_var);

    if ($return_var !== 0 && $return_var !== '0') {
        ColorCode::colorCode("The command >> $command \n\t returned with a status code (" . $return_var . '). Expecting 0 for success.', iColorCode::RED);

        $output = implode(PHP_EOL, $output);
        ColorCode::colorCode("Command output::\t $output ", iColorCode::RED);

        if ($exitOnFailure) {
            exit($return_var);
        }
    }

    return (int) $return_var;
}
#!/usr/bin/env bash
set -e
SQL_FILE="$1"
replaceDelimited="$2"
replace="$3"
replacementDelimited="$4"
replacement="$5"
if ! grep --quiet "$replace" "$SQL_FILE" ;
then
exit 0;
fi
cp "$SQL_FILE" "$SQL_FILE.old.sql"
# #link https://stackoverflow.com/questions/29902647/sed-match-replace-url-and-update-serialized-array-count
# #link https://serverfault.com/questions/1114188/php-serialize-awk-command-speed-up/1114191#1114191
sed 's/;s:/;\ns:/g' "$SQL_FILE" | \
awk -F'"' '/s:.+'$replaceDelimited'/ {sub("'$replace'", "'$replacement'"); n=length($2)-1; sub(/:[[:digit:]]+:/, ":" n ":")} 1' 2>/dev/null | \
sed -e ':a' -e 'N' -e '$!ba' -e 's/;\ns:/;s:/g' | \
sed "s/$replaceDelimited/$replacementDelimited/g" > "$SQL_FILE.replaced.sql"
cp "$SQL_FILE.replaced.sql" "$SQL_FILE"

Convert output to arrays then extract values to loop

I want to grep the second and third columns from that output result:
1 db1 ADM_DAT 300 yes 95.09
2 db2 SYSAUX 400 yes 94.52
and convert them like array for example:
outputres=("db1 ADM_DAT" "db2 SYSAUX")
and after that to be able to read those values in a loop, for example:
for i in "${outputres[#]}"; do read -r a b <<< "$i"; unix_command $(cat file|grep $a|awk '{print $1}') $a $b;done
file:
10.1.1.1 db1
10.1.1.2 db2
Final expectation:
unix_command 10.1.1.1 db1 ADM_DAT
unix_command 10.1.1.2 db2 SYSAUX
This is only a theoretical example, I am not sure if it is working.
I would use a simple bash while read and keep adding elements into the array with the += syntax:
outputres=()
while read -r _ a b _; do
outputres+=("$a $b")
done < file
Doing so, with your input file, I got:
$ echo "${outputres[#]}" #print all elements
db1 ADM_DAT db2 SYSAUX
$ echo "${outputres[0]}" #print first one
db1 ADM_DAT
$ echo "${outputres[1]}" #print second one
db2 SYSAUX
Since you want to use both values separately, it may be better to use an associative array:
$ declare -A array=()
$ while read -r _ a b _; do array[$a]=$b; done < file
And then you can loop through the values with:
$ for key in ${!array[@]}; do echo "array[$key] = ${array[$key]}"; done
array[db2] = SYSAUX
array[db1] = ADM_DAT
See a basic example of utilization of these arrays:
#!/bin/bash
declare -A array=([key1]='value1' [key2]='value2')
for key in ${!array[@]}; do
echo "array[$key] = ${array[$key]}"
done
echo ${array[key1]}
echo ${array[key2]}
So maybe this can solve your problem: loop through the file with the columns, fetch the 2nd and 3rd fields and use them twice: first $a to perform the grep in file, and then both as parameters to cmd_command:
while read -r _ a b _
do
echo "cmd_command $(awk -v patt="$a" '$0~patt {print $1}' file) $a, $b"
done < columns_file
For a sample file file:
$ cat file
hello this is db1
and this is another db2
I got this output (note I am just echoing):
$ while read -r _ a b _; do echo "cmd_command $(awk -v patt="$a" '$0~patt {print $1}' file) $a, $b"; done < a
cmd_command hello db1, ADM_DAT
cmd_command and db2, SYSAUX

Bash - read Postgresql zero separated fields into array

I want to read the output of a psql query produced with --field-separator-zero into an array inside my bash script. The best I have tried was the following:
psql -w -t --quiet --no-align --field-separator-zero -c $'select nickname,first_name,last_name,email from users' | while IFS= read -d '' -a USERS; do
echo ${USERS[0]} ${USERS[1]} ${USERS[2]} ${USERS[3]};
done;
The above would return each field of a row as a new array. Changing the delimiter to anything else would make the process work, but the problem is that the nickname field might contain any character, so I'm forced to use the safe NUL char as a delimiter. Is there any way to do this?
I'm assuming here that nickname is a unique key; make the appropriate modifications if a different field should be used in that role.
The below code reads the data into a series of associative arrays, and emits each row in turn.
Note that associative arrays are a Bash 4 feature; if you're on Mac OS, which ships 3.2, use MacPorts or a similar tool to install a modern release.
declare -A first_names=( ) last_names=( ) emails=( )
while IFS= read -r -d '' nickname && \
IFS= read -r -d '' first_name && \
IFS= read -r -d '' last_name && \
IFS= read -r -d '' email; do
first_names[$nickname]=$first_name
last_names[$nickname]=$last_name
emails[$nickname]=$email
done < <(psql ...)
echo "Found users: "
for nickname in "${!emails[@]}"; do
printf 'nickname - %q\n' "$nickname"
printf 'email - %q\n' "${emails[$nickname]}"
printf 'first name - %q\n' "${first_names[$nickname]}"
printf 'last name - %q\n' "${last_names[$nickname]}"
echo
done
This technique is described in BashFAQ #1 -- search for -print0 to find its mention.
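Putting the question's query together with the loop above, a complete sketch might look like this. Note the addition of --record-separator-zero (available in psql 9.2+), which makes the row terminator NUL as well, so each of the four fields per row ends in a NUL byte:
declare -A first_names=( ) last_names=( ) emails=( )
while IFS= read -r -d '' nickname &&
      IFS= read -r -d '' first_name &&
      IFS= read -r -d '' last_name &&
      IFS= read -r -d '' email; do
    first_names[$nickname]=$first_name
    last_names[$nickname]=$last_name
    emails[$nickname]=$email
done < <(psql -w -qAt --field-separator-zero --record-separator-zero \
             -c 'select nickname, first_name, last_name, email from users')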

Find content of one file from another file in UNIX

I have 2 files. The first file contains the list of row IDs of tuples of a table in the database.
The second file contains SQL queries with these row IDs in the "where" clause of the query.
For example:
File 1
1610657303
1610658464
1610659169
1610668135
1610668350
1610670407
1610671066
File 2
update TABLE_X set ATTRIBUTE_A=87 where ri=1610668350;
update TABLE_X set ATTRIBUTE_A=87 where ri=1610672154;
update TABLE_X set ATTRIBUTE_A=87 where ri=1610668135;
update TABLE_X set ATTRIBUTE_A=87 where ri=1610672153;
I have to read File 1 and search File 2 for all the SQL commands which match the row IDs from File 1, and dump those SQL queries into a third file.
File 1 has 100,000 entries and File 2 contains 10 times as many, i.e. 1,000,000.
I used grep -f File_1 File_2 > File_3, but this is extremely slow and the rate is 1000 entries per hour.
Is there any faster way to do this?
You don't need regexps, so grep -F -f file1 file2
One way with awk:
awk -v FS="[ =]" 'NR==FNR{rows[$1]++;next}(substr($NF,1,length($NF)-1) in rows)' File1 File2
This should be pretty quick. On my machine, it took under 2 seconds to create a lookup of 1 million entries and compare it against 3 million lines.
Machine Specs:
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (8 cores)
98 GB RAM
I suggest using a programming language such as Perl, Ruby or Python.
In Ruby, a solution reading both files (f1 and f2) just once could be:
idxes = File.readlines('f1').map(&:chomp)
File.foreach('f2') do | line |
next unless line =~ /where ri=(\d+);$/
puts line if idxes.include? $1
end
or with Perl
open $file, '<', 'f1';
while (<$file>) { chomp; $idxs{$_} = 1; }
close($file);
open $file, '<', 'f2';
while (<$file>) {
next unless $_ =~ /where ri=(\d+);$/;
print $_ if $idxs{$1};
}
close $file;
The awk/grep solutions mentioned above were slow or memory hungry on my machine (file1 10^6 rows, file2 10^7 rows). So I came up with an SQL solution using sqlite3.
Turn file2 into a CSV-formatted file where the first field is the value after ri=
cat file2.txt | gawk -F= '{ print $3","$0 }' | sed 's/;,/,/' > file2_with_ids.txt
Create two tables:
sqlite> CREATE TABLE file1(rowId char(10));
sqlite> CREATE TABLE file2(rowId char(10), statement varchar(200));
Import the row IDs from file1:
sqlite> .import file1.txt file1
Import the statements from file2, using the "prepared" version:
sqlite> .separator ,
sqlite> .import file2_with_ids.txt file2
Select all and only the statements in table file2 with a matching rowId in table file1:
sqlite> SELECT statement FROM file2 WHERE file2.rowId IN (SELECT file1.rowId FROM file1);
File 3 can be easily created by redirecting output to a file before issuing the select statement:
sqlite> .output file3.txt
Test data:
sqlite> select count(*) from file1;
1000000
sqlite> select count(*) from file2;
10000000
sqlite> select * from file1 limit 4;
1610666927
1610661782
1610659837
1610664855
sqlite> select * from file2 limit 4;
1610665680|update TABLE_X set ATTRIBUTE_A=87 where ri=1610665680;
1610661907|update TABLE_X set ATTRIBUTE_A=87 where ri=1610661907;
1610659801|update TABLE_X set ATTRIBUTE_A=87 where ri=1610659801;
1610670610|update TABLE_X set ATTRIBUTE_A=87 where ri=1610670610;
Without creating any indices, the select statement took about 15 seconds on an AMD A8 1.8GHz 64-bit Ubuntu 12.04 machine.
Most of the previous answers are correct, but the only thing that worked for me was this command:
grep -oi -f a.txt b.txt
Maybe try awk and use the numbers from File 1 as keys, for example with a simple generated script.
The first script will produce a second awk script:
awk -f script1.awk
where script1.awk contains:
{
    print "$0 ~ ", $0, " { print $0 }" > "script2.awk";
}
and then invoke script2.awk against File 2 (see the sketch below).
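A hedged sketch of the whole two-step flow, using the question's File_1 and File_2 names (the script and output file names are just illustrative):
# step 1: turn every row ID in File_1 into an awk rule in script2.awk
awk '{ print "$0 ~ " $0 " { print $0 }" > "script2.awk" }' File_1
# step 2: apply all of those rules to the SQL file and collect the matches
awk -f script2.awk File_2 > File_3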
I may be missing something, but wouldn't it be sufficient to just iterate the IDs in file1 and for each ID, grep file2 and store the matches in a third file? I.e.
for ID in `cat file1`; do grep $ID file2; done > file3
This is not terribly efficient (since file2 will be read over and over again), but it may be good enough for you. If you want more speed, I'd suggest to use a more powerful scripting language which lets you read file2 into a map which quickly allows identifying lines for a given ID.
Here's a Python version of this idea:
queryByID = {}

for line in open('file2'):
    lastEquals = line.rfind('=')
    semicolon = line.find(';', lastEquals)
    id = line[lastEquals + 1:semicolon]
    queryByID[id] = line.rstrip()

for line in open('file1'):
    id = line.rstrip()
    if id in queryByID:
        print(queryByID[id])
## reports any lines contained in <file1> missing in <file2>
IFS=$(echo -en "\n\b") && for a in $(cat <file1>);
do ((\!$(grep -F -c -- "$a" <file2>))) && echo $a;
done && unset IFS
or to do what the asker wants, take off the negation and redirect:
(IFS=$(echo -en "\n\b") && for a in $(cat <file1>);
do (($(grep -F -c -- "$a" <file2>))) && echo $a;
done && unset IFS) >> <file3>
