I encountered the following confusing problem:
A DB table was updated from a CSV file by a KSH script:
logFile=$moduleDir/install_scripts/`basename $0`_$(date '+%Y%m%d_%H%M').log
cp -ri $moduleDir/ntscripts/datafiles/* $PROVHOME/database/ntscripts/datafiles
echo "\n***********************"
echo "* New/Modified files: *"
echo "***********************"
find $moduleDir/ntscripts/datafiles -type f | xargs ls -l
) 2>&1 | tee $logFile
echo "\nInstallation of Database completed\n"
The update is equivalent to the following INSERT query (by equivalent I mean that the visible result is the same):
VALUES (171, 'Postpaid', '11', '', '')
When I'm using the following SELECT command:
SELECT * FROM tp_nt_mapping ORDER BY tariff_plan DESC
I'm able to see the new inserted record, but when I try with any of the following SELECT queries, I'm not:
SELECT * FROM tp_nt_mapping WHERE network_template = 11 ORDER BY tariff_plan DESC
SELECT * FROM tp_nt_mapping WHERE network_template = '11' ORDER BY tariff_plan DESC
Any suggestions?
The value in the network_template field isn't just '11' but '11' || chr(13).
So there is a carriage return character at the end.
You can fix the data by doing:
update tp_nt_mapping
set network_template = replace(network_template, chr(13), '')
But it is better to check why it was added in the first place.
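A likely cause is a data file saved with Windows (CRLF) line endings and then loaded on Unix, where only the line feed is treated as the line terminator. The following is a minimal sketch of that effect and of stripping the character before loading. The comma-separated layout and the sample line are assumptions for illustration, not the asker's actual data file.

```python
def strip_cr(line):
    """Split one comma-separated line and drop carriage returns from each field."""
    return [field.replace("\r", "") for field in line.rstrip("\n").split(",")]


# Hypothetical line from a CSV saved with CRLF endings, as seen on Unix.
raw = "171,Postpaid,11\r\n"

# Removing only the line feed leaves the carriage return in the last field.
naive = raw.rstrip("\n").split(",")
print(repr(naive[-1]))            # '11\r'
print(naive[-1] == "11")          # False
print(strip_cr(raw)[-1] == "11")  # True
```

This is why the row shows up in an unfiltered SELECT but not with WHERE network_template = '11': the stored value is one character longer than it looks.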
I want to dynamically generate a set of switches for pg_dump, like this:
--table=mySchema.foo --table=mySchema.bar ...
However, I want to restrict those switches to views only. The view names don't follow a pattern. They all reside in a single schema called mySchema.
Here is the batch file script I wrote:
@echo off
set PARAM_HOTE=localhost
set PARAM_PORT=5435
set PSQL="C:\Program Files\PostgreSQL\9.4\bin\psql.exe"
select string_agg( '--table=' || quote_ident(nspname) || '.' || quote_ident(relname), ' ' )^
from (^
select *^
from pg_class^
join pg_namespace on pg_namespace.oid = pg_class.relnamespace^
where relkind = 'v'^
and nspname = 'mySchema'^
order by relname ASC^
) infos_vues^
for /f %%i in ('"%PSQL%" --quiet --tuples-only --host %PARAM_HOTE% --port %PARAM_PORT% --username "rec" -c "%SQL_QUERY%" db') do set PG_DUMP_SWITCHES_FOR_VIEWS_ONLY=%%i
:: Call PG_DUMP...
When I run it, I am getting the following error:
'"C:\Program Files\PostgreSQL\9.4\bin\psql.exe"" -c "select' is not recognized as an internal
or external command, operable program or batch file.
Here is how I solved my issue:
@echo off
set PARAM_HOTE=localhost
set PARAM_PORT=5435
set PSQL="C:\Program Files\PostgreSQL\9.2\bin\psql.exe"
select string_agg( concat('--table=' , quote_ident(nspname) , '.' , quote_ident(relname)), ' ' )^
from (^
select *^
from pg_class^
join pg_namespace on pg_namespace.oid = pg_class.relnamespace^
where relkind = 'v'^
and nspname = 'rec'^
order by relname ASC^
) infos_vues^
for /f "usebackq delims=" %%i in (`%%PSQL%% --quiet --tuples-only --host %PARAM_HOTE% --port %PARAM_PORT% --username "rec" -c "%SQL_LISTE_VUES%" REC`) do set LISTE_VUES=%%i
I rewrote my query by replacing || with the concat function
I used back ticks
I escaped % with %% in the for command
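If the cmd quoting rules get in the way, another option is to build the switches outside the batch file. The sketch below is only an illustration: the view names are hypothetical and would in practice come from the pg_class query above, and quote_ident here is a simplified stand-in that always quotes, whereas PostgreSQL's own quote_ident quotes only when needed.

```python
def quote_ident(name):
    """Always double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def table_switches(schema, views):
    """Return one --table=schema.view argument per view name, sorted by name."""
    return ["--table={}.{}".format(quote_ident(schema), quote_ident(view))
            for view in sorted(views)]


# Hypothetical view names for illustration only.
args = ["pg_dump"] + table_switches("mySchema", ["foo", "bar"])
print(args)
```

Keeping the switches as a list (rather than one long string) means each one can be handed to the process launcher as a separate argument, so no extra layer of shell escaping is involved.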
I have a bash script that I am using to modify a sql file (test.sql).
The sql file is as follows:
select count(*) from (
from MY_TABLE I with (nolock)
where I.START_DATE >= '20170101' and I.START_DATE < '20170201'
) as cnt
I want to replace all the occurrences of the string START_DATE but only in the WHERE clause of the sql file.
So far, I have tried this:
sed 'x; ${/START_DATE/s/START/END/;p;x}; 1d' test.sql > new-test.sql
However, this returns the following new-test.sql file:
select count(*) from (
from MY_TABLE I with (nolock)
where I.END_DATE >= '20170101' and I.START_DATE < '20170201'
) as cnt
The second occurrence of the string START_DATE in the WHERE clause is not being replaced.
How should I modify my sed expression so that I can achieve this?
Try it like this:
sed -e '/where/ s/START_DATE/END_DATE/g' -i test.sql
operate only on lines that contain where (we "address" only the lines that match the regex pattern where)
replace each occurrence of START_DATE with END_DATE - notice the "global" flag g at the end
the -i flag tells sed to edit the file "in place" (no need to redirect output).
If you are using GNU sed then you can use the following command
sed '/WHERE/I s/START_DATE/END_DATE/g' test.sql > new-test.sql
|-------------------------- case insensitive match; GNU sed only
The capital I tells sed to perform a case insensitive match. This comes in handy when dealing with case insensitive SQL commands.
If you do not have GNU sed then case insensitive matching is a bit more complicated:
sed '/[wW][hH][eE][rR][eE]/ s/START_DATE/END_DATE/g' test.sql > new-test.sql
Simply adding the global g option at the end of the s command will make your solution work, too.
sed 'x; ${/START_DATE/s/START/END/g;p;x}; 1d'
However, it will break if the file has a trailing newline. I strongly recommend searching (case-insensitively) for the WHERE clause and running the corresponding substitution command.
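For comparison, the same idea (address only the lines that contain WHERE, in any case, then replace every occurrence on those lines) can be sketched in Python. This is an illustration rather than a replacement for the sed commands; the function name and defaults are made up for the example.

```python
import re


def replace_in_where(sql, old="START_DATE", new="END_DATE"):
    """Replace old with new, but only on lines that contain WHERE (any case)."""
    out = []
    for line in sql.splitlines(keepends=True):
        if re.search(r"where", line, flags=re.IGNORECASE):
            # str.replace substitutes every occurrence, like sed's g flag.
            line = line.replace(old, new)
        out.append(line)
    return "".join(out)


sample = (
    "select count(*) from (\n"
    "from MY_TABLE I with (nolock)\n"
    "where I.START_DATE >= '20170101' and I.START_DATE < '20170201'\n"
    ") as cnt\n"
)
result = replace_in_where(sample)
print(result, end="")
```

Like the sed versions, this works line by line, so it assumes the WHERE keyword and the column names sit on the same line.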
I have a dataset of many files. Each file contains many reviews of the following form, separated by blank lines:
<Content>definitely above average! we had a really nice stay there last year when I and...USUALLY OVER MANY LINES
<Date>Jan 2, 2009
<img src="http://cdn.tripadvisor.com/img2/new.gif" alt="New"/>
<No. Reader>-1
<No. Helpful>-1
<Check in / front desk>4
<Business service>4
<Author>rickMN... next review goes on
For every review I need to extract the data after each tag and put it into something like this (which I plan to write to a .sql file, so that ".read" will populate my database):
INSERT INTO [HotelReviews] ([Author], [Content], [Date], [Image], [No_Reader], [No_Helpful], [Overall], [Value], [Rooms], [Location], [Cleanliness], [Check_In], [Service], [Business_Service]) VALUES ('bigBob', 'definitely above...', ...)
My question is how can I extract the data after each tag and put it in an insert statement using bash?
Text after <Content> tag is usually a paragraph with a number of lines
This is the right approach for what you're trying to do:
$ cat tst.awk
NF {
    if ( match($0,/^<img\s+src="([^"]+)/,a) ) {
        name = "Image"
        value = a[1]
    }
    else if ( match($0,/^<([^>"]+)>(.*)/,a) ) {
        name = a[1]
        value = a[2]
        sub(/ \/.*|\./,"",name)
        gsub(/ /,"_",name)
    }
    names[++numNames] = name
    values[numNames] = value
    next
}
{ prt() }
END { prt() }
function prt() {
    printf "INSERT INTO [HotelReviews] ("
    for (nameNr=1; nameNr<=numNames; nameNr++) {
        printf " [%s]", names[nameNr]
    }
    printf ") VALUES ("
    for (nameNr=1; nameNr<=numNames; nameNr++) {
        printf " \047%s\047", values[nameNr]
    }
    print ""
    numNames = 0
    delete names
    delete values
}
$ awk -f tst.awk file
INSERT INTO [HotelReviews] ( [Author] [Content] [Date] [Image] [No_Reader] [No_Helpful] [Overall] [Value] [Rooms] [Location] [Cleanliness] [Check_in] [Service] [Business_service]) VALUES ( 'bigBob' 'definitely above average! we had a really nice stay there last year when I and...USUALLY OVER MANY LINES' 'Jan 2, 2009' 'http://cdn.tripadvisor.com/img2/new.gif' '-1' '-1' '4' '4' '4' '4' '5' '4' '3' '4'
INSERT INTO [HotelReviews] ( [Author]) VALUES ( 'rickMN... next review goes on'
The above uses GNU awk for the 3rd arg to match(). Massage to get the precise formatting/output you want.
while IFS= read -r line; do
[[ $line =~ ^\<Author\>(.*) ]] && Author="${BASH_REMATCH[1]}"
[[ $line =~ ^\<Content\>(.*) ]] && Content="${BASH_REMATCH[1]}"
# capture lines not starting with < and append to variable Content
[[ $line =~ ^[^\<] ]] && Content+="$line"
# match an empty line
[[ $line =~ ^$ ]] && echo "${Author}, ${Content}"
done < file
Output with your file:
bigBob, definitely above average! we had a really nice stay there last year when I and ...
=~: match to a regex (string left, regex right without quotes)
^: match start of line
\< or \>: match < or >
.*: here match rest of line
(.*): capture rest of line to first element of array BASH_REMATCH
See: The Stack Overflow Regular Expressions FAQ
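If the statements need to be valid SQL as written (comma-separated, with embedded single quotes escaped), the same parsing idea can be sketched in Python. This is one possible approach under assumptions: reviews are separated by blank lines, untagged lines continue the previous value, and the helper names are invented for the example.

```python
import re

TAG = re.compile(r'^<([^>"]+)>(.*)')
IMG = re.compile(r'^<img\s+src="([^"]+)')


def sql_quote(value):
    """Quote a string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def review_to_insert(block):
    """Turn one blank-line-delimited review into an INSERT statement."""
    names, values = [], []
    for line in block.splitlines():
        img = IMG.match(line)
        tag = TAG.match(line)
        if img:
            names.append("Image")
            values.append(img.group(1))
        elif tag:
            # "Check in / front desk" -> "Check_in", "No. Reader" -> "No_Reader"
            name = re.sub(r" /.*|\.", "", tag.group(1)).replace(" ", "_")
            names.append(name)
            values.append(tag.group(2))
        elif values:
            # Untagged line: treat it as a continuation of the previous value.
            values[-1] += " " + line.strip()
    cols = ", ".join("[{}]".format(n) for n in names)
    vals = ", ".join(sql_quote(v) for v in values)
    return "INSERT INTO [HotelReviews] ({}) VALUES ({});".format(cols, vals)


def reviews_to_sql(text):
    """Split the text on blank lines and convert each non-empty review."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [review_to_insert(b) for b in blocks if b.strip()]


# Hypothetical, shortened sample in the format shown in the question.
sample = (
    "<Author>bigBob\n"
    "<Content>it's nice\n"
    "second line\n"
    '<img src="http://x/new.gif" alt="New"/>\n'
    "<No. Reader>-1\n"
    "\n"
    "<Author>rickMN\n"
)
for statement in reviews_to_sql(sample):
    print(statement)
```

The column list is taken from the tags found in each review, so reviews with missing tags produce shorter statements rather than NULL placeholders.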
I have 2 files. First file contains the list of row ID's of tuples of a table in the database.
And second file contains SQL queries with these row ID's in "where" clause of the query.
For example:
File 1
File 2
update TABLE_X set ATTRIBUTE_A=87 where ri=1610668350;
update TABLE_X set ATTRIBUTE_A=87 where ri=1610672154;
update TABLE_X set ATTRIBUTE_A=87 where ri=1610668135;
update TABLE_X set ATTRIBUTE_A=87 where ri=1610672153;
I have to read File 1, search File 2 for all the SQL commands that match the row IDs from File 1, and dump those SQL queries into a third file.
File 1 has 100,000 entries and File 2 contains 10 times as many, i.e. 1,000,000.
I used grep -f File_1 File_2 > File_3, but this is extremely slow: the rate is about 1000 entries per hour.
Is there any faster way to do this?
You don't need regexps, so grep -F -f file1 file2
One way with awk:
awk -v FS="[ =]" 'NR==FNR{rows[$1]++;next}(substr($NF,1,length($NF)-1) in rows)' File1 File2
This should be pretty quick. On my machine, it took under 2 seconds to create a lookup of 1 million entries and compare it against 3 million lines.
Machine Specs:
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (8 cores)
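The awk one-liner works by loading the IDs into a hash and then streaming the second file once. A rough Python sketch of the same technique is shown here; the function name and the in-memory sample lists are illustrative, and open file objects could be passed instead of lists.

```python
import re

# Captures the number between "where ri=" and the trailing semicolon.
RI = re.compile(r"where ri=(\d+);\s*$")


def matching_queries(ids, queries):
    """Yield each query whose ri value appears among the given ID lines."""
    wanted = {line.strip() for line in ids if line.strip()}
    for query in queries:
        m = RI.search(query)
        if m and m.group(1) in wanted:
            yield query


# Hypothetical sample input; any iterable of lines works.
ids = ["1610668350\n", "1610672153\n"]
queries = [
    "update TABLE_X set ATTRIBUTE_A=87 where ri=1610668350;\n",
    "update TABLE_X set ATTRIBUTE_A=87 where ri=1610672154;\n",
    "update TABLE_X set ATTRIBUTE_A=87 where ri=1610672153;\n",
]
print("".join(matching_queries(ids, queries)), end="")
```

Only the ID list is held in memory; the query file is read line by line, so each lookup is a single set membership test.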
I suggest using a programming language such as Perl, Ruby or Python.
In Ruby, a solution reading both files (f1 and f2) just once could be:
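require 'set'
# A Set gives fast membership tests; an Array would be scanned for every line.
idxes = File.readlines('f1').map(&:chomp).to_set
File.foreach('f2') do |line|
  next unless line =~ /where ri=(\d+);$/
  puts line if idxes.include?($1)
end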
or with Perl
open $file, '<', 'f1';
while (<$file>) { chomp; $idxs{$_} = 1; }
close $file;
open $file, '<', 'f2';
while (<$file>) {
    next unless $_ =~ /where ri=(\d+);$/;
    print $_ if $idxs{$1};
}
close $file;
The awk/grep solutions mentioned above were slow or memory hungry on my machine (file1 10^6 rows, file2 10^7 rows). So I came up with an SQL solution using sqlite3.
Turn file2 into a CSV-formatted file where the first field is the value after ri=
cat file2.txt | gawk -F= '{ print $3","$0 }' | sed 's/;,/,/' > file2_with_ids.txt
Create two tables:
sqlite> CREATE TABLE file1(rowId char(10));
sqlite> CREATE TABLE file2(rowId char(10), statement varchar(200));
Import the row IDs from file1:
sqlite> .import file1.txt file1
Import the statements from file2, using the "prepared" version:
sqlite> .separator ,
sqlite> .import file2_with_ids.txt file2
Select all and only the statements in table file2 with a matching rowId in table file1:
sqlite> SELECT statement FROM file2 WHERE file2.rowId IN (SELECT file1.rowId FROM file1);
File 3 can be easily created by redirecting output to a file before issuing the select statement:
sqlite> .output file3.txt
Test data:
sqlite> select count(*) from file1;
sqlite> select count(*) from file2;
sqlite> select * from file1 limit 4;
sqlite> select * from file2 limit 4;
1610665680|update TABLE_X set ATTRIBUTE_A=87 where ri=1610665680;
1610661907|update TABLE_X set ATTRIBUTE_A=87 where ri=1610661907;
1610659801|update TABLE_X set ATTRIBUTE_A=87 where ri=1610659801;
1610670610|update TABLE_X set ATTRIBUTE_A=87 where ri=1610670610;
Without creating any indices, the select statement took about 15 seconds on an AMD A8 1.8GHz 64-bit Ubuntu 12.04 machine.
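The same steps can also be scripted instead of typed into the sqlite3 shell. This is a sketch using Python's sqlite3 module with an in-memory database. The column is named ri here (an assumption of this example, to avoid confusion with SQLite's implicit rowid), and the sample inputs are taken from the test data above.

```python
import sqlite3


def select_matching(ids, statements):
    """Load both inputs into in-memory tables and select the matching statements."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE file1(ri TEXT)")
    con.execute("CREATE TABLE file2(ri TEXT, statement TEXT)")
    con.executemany("INSERT INTO file1 VALUES (?)", [(i.strip(),) for i in ids])
    con.executemany(
        "INSERT INTO file2 VALUES (?, ?)",
        # The ID is the text between the last '=' and the trailing ';'.
        [(s.rsplit("=", 1)[1].rstrip(";\n "), s.rstrip("\n")) for s in statements],
    )
    rows = con.execute(
        "SELECT statement FROM file2 "
        "WHERE ri IN (SELECT ri FROM file1) ORDER BY file2.rowid"
    ).fetchall()
    con.close()
    return [row[0] for row in rows]


statements = [
    "update TABLE_X set ATTRIBUTE_A=87 where ri=1610665680;\n",
    "update TABLE_X set ATTRIBUTE_A=87 where ri=1610661907;\n",
]
print(select_matching(["1610661907\n"], statements))
```

For files of the size discussed here, an on-disk database file instead of ":memory:" may be the safer choice if RAM is tight.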
Most of the previous answers are correct, but the only thing that worked for me was this command:
grep -oi -f a.txt b.txt
Maybe try awk, using each number from file 1 as a pattern. For example, a first script can generate a second awk script:
awk -f script1.awk file1
where script1.awk contains:
{ print "$0 ~ ", $0, "{ print $0 }" > "script2.awk" }
Then run script2.awk against file 2.
I may be missing something, but wouldn't it be sufficient to just iterate the IDs in file1 and for each ID, grep file2 and store the matches in a third file? I.e.
for ID in `cat file1`; do grep $ID file2; done > file3
This is not terribly efficient (since file2 will be read over and over again), but it may be good enough for you. If you want more speed, I'd suggest using a more powerful scripting language that lets you read file2 into a map, which quickly identifies the line for a given ID.
Here's a Python version of this idea:
queryByID = {}
with open('file2') as f2:
    for line in f2:
        # The ID sits between the last '=' and the following ';'.
        lastEquals = line.rfind('=')
        semicolon = line.find(';', lastEquals)
        id = line[lastEquals + 1:semicolon]
        queryByID[id] = line.rstrip()
with open('file1') as f1:
    for line in f1:
        id = line.rstrip()
        if id in queryByID:
            print(queryByID[id])
## reports any lines contained in < file 1> missing in < file 2>
IFS=$(echo -en "\n\b") && for a in $(cat < file 1>);
do ((\!$(grep -F -c -- "$a" < file 2>))) && echo $a;
done && unset IFS
or to do what the asker wants, take off the negation and redirect
(IFS=$(echo -en "\n\b") && for a in $(cat < file 1>);
do (($(grep -F -c -- "$a" < file 2>))) && echo $a;
done && unset IFS) >> < file 3>
Currently, I have a number of files stored in Postgres 8.4 as bytea. The file types are .doc, .odt, .pdf, .txt, etc.
How can I download all the files stored in Postgres? I need to make a backup.
I need them in their original file types rather than in bytea format.
One simple option is to use the COPY command with encode to hex format and then apply the xxd shell command (with the -p continuous hexdump style switch). For example, let's say I have a jpg image in a bytea column in the samples table:
\copy (SELECT encode(file, 'hex') FROM samples LIMIT 1) TO
$ xxd -p -r image.hex > image.jpg
I checked, and it works in practice.
Try this:
COPY (SELECT yourbyteacolumn FROM yourtable WHERE <add your clauses here> ...) TO 'youroutputfile' (FORMAT binary)
Here's the simplest thing I could come up with:
psql -qAt -c "select encode(file,'base64') from files limit 1" | base64 -d
The -qAt is important as it strips off any formatting of the output. These options are available inside the psql shell, too.
psql -Aqt -c "SELECT encode(content, 'base64') FROM ..." | base64 -d > file
psql -Aqt -c "SELECT encode(content, 'hex') FROM ..." | xxd -p -r > file
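Where base64 or xxd is not available, the decoding step can be done in a few lines of Python. The sketch below assumes the text is the output of one of the psql commands above for a single row; the function name is made up for the example.

```python
import base64


def decode_psql_output(text, encoding):
    """Convert psql -Aqt output of encode(col, 'hex' or 'base64') back to bytes."""
    # Drop all whitespace: the base64 output may be wrapped over several lines.
    compact = "".join(text.split())
    if encoding == "hex":
        return bytes.fromhex(compact)
    if encoding == "base64":
        return base64.b64decode(compact)
    raise ValueError("unsupported encoding: " + encoding)


print(decode_psql_output("68656c6c6f\n", "hex"))     # b'hello'
print(decode_psql_output("aGVs\nbG8=\n", "base64"))  # b'hello'
```

The returned bytes can then be written to a file opened in binary mode ('wb'), which restores the original document.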
If you have a lot of data to download, you can fetch the rows first and then iterate through them, writing each bytea field to a file.
$resource = pg_connect('host=localhost port=5432 dbname=website user=super password=************');
// grab all the user IDs
$userResponse = pg_query('select distinct(r.id) from resource r
join connection c on r.id = c.resource_id_from
join resource rfile on c.resource_id_to = rfile.id and rfile.resource_type_id = 10
join file f on rfile.id = f.resource_id
join file_type ft on f.file_type_id = ft.id
where r.resource_type_id = 38');
// need to work through one by one to handle data
while($user = pg_fetch_array($userResponse)){
$user_id = $user['id'];
$query = 'select r.id, f.data, rfile.resource_type_id, ft.extension from resource r
join connection c on r.id = c.resource_id_from
join resource rfile on c.resource_id_to = rfile.id and rfile.resource_type_id = 10
join file f on rfile.id = f.resource_id
join file_type ft on f.file_type_id = ft.id
where r.resource_type_id = 38 and r.id = ' . $user_id;
$fileResponse = pg_query($query);
$fileData = pg_fetch_array($fileResponse);
$data = pg_unescape_bytea($fileData['data']);
$extension = $fileData['extension'];
$fileId = $fileData['id'];
$filename = $fileId . '.' . $extension;
$fileHandle = fopen($filename, 'w');
fwrite($fileHandle, $data);
fclose($fileHandle);
}
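DO $$
DECLARE
  l_lob_id OID;
  r record;
BEGIN
  FOR r IN
    select data, filename from bytea_table
  LOOP
    -- Copy the bytea value into a temporary large object, export it, then remove it.
    l_lob_id := lo_from_bytea(0, r.data);
    PERFORM lo_export(l_lob_id, '/home/...' || r.filename);
    PERFORM lo_unlink(l_lob_id);
  END LOOP;
END; $$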
As far as I'm aware, bytea to file needs to be done at the app level.
(9.1 might change this with the filesystem data wrapper contrib. There's also a lo_export function, but it is not applicable here.)
If you want to do this from a local Windows machine rather than from the server, you will have to run each statement individually, and you need pgAdmin and certutil:
Have pgAdmin installed.
Open cmd in the runtime folder, or cd "C:\Program Files\pgAdmin 4\v6\runtime"
Run a query in pgAdmin to generate each statement that you will paste into cmd:
SELECT 'set PGPASSWORD={PASSWORD} && psql -h {host} -U {user} -d {db name} -Aqt -c "SELECT encode({bytea_column}, ''base64'') FROM {table} WHERE id='||id||'" > %a% && CERTUTIL -decode %a% "C:\temp{name_of_the_folder}\FileName - '||{file_name}||' ('||TO_CHAR(current_timestamp(),'DD.MM.YYYY,HH24 MI SS')||').'||{file_extension}||'"'
FROM table WHERE ....;
Replace each {...} placeholder with your own values.
It will generate something like:
set PGPASSWORD=123 psql -h -U postgres -d my_test_db -Aqt -c "SELECT encode(file_bytea, 'base64') FROM test_table_bytea WHERE id=33" > %a% && CERTUTIL -decode %a% "C:\temp\DB_FILE\FileName - test1 - (06.04.2022,15 42 26).docx"
set PGPASSWORD=123 psql -h -U postgres -d my_test_db -Aqt -c "SELECT encode(file_bytea, 'base64') FROM test_table_bytea WHERE id=44" > %a% && CERTUTIL -decode %a% "C:\temp\DB_FILE\FileName - test2 - (06.04.2022,15 42 26).pdf"
Copy and paste all the generated statements into cmd. The files will be saved to your local machine.