Insert SQL statements via command line without reopening connection to remote database

I have a large number of data files to process and store in a remote database. Each line of a data file represents a row in the database, but each line must be formatted before it can be inserted.
My first solution was to process the data files with bash scripts, producing SQL dump files, and then import those dump files into the database. This proved too slow and, as you can see, involves the extra step of creating intermediary SQL files.
My second solution was to write bash scripts that, while processing each line of a data file, create an INSERT INTO ... statement and send it to the remote database:
echo sql_statement | psql -h remote_server -U username -d database
i.e. it does not create an SQL file. This solution, however, has one major issue that I am seeking advice on:
Each time I have to reconnect to the remote database to insert one single row.
Is there a way to connect to the remote database, stay connected and then "pipe" or "send" the insert-SQL-statement without creating a huge SQL file?

Answer to your actual question
Yes. You can use a named pipe instead of creating a file. Consider the following demo.
Create a schema x in my database event for testing:
-- DROP SCHEMA x CASCADE;
CREATE SCHEMA x;
CREATE TABLE x.x (id int, a text);
Create a named pipe (fifo) from the shell like this:
postgres@db:~$ mkfifo --mode=0666 /tmp/myPipe
Either 1) call the SQL command COPY using a named pipe on the server:
postgres@db:~$ psql event -p5433 -c "COPY x.x FROM '/tmp/myPipe'"
This will acquire an exclusive lock on the table x.x in the database. The connection stays open until the fifo gets data, so be careful not to leave it open for too long! To minimize blocking time, you can fill the pipe first and call this afterwards; you can choose the sequence of events. Either way, the command executes as soon as two processes bind to the pipe, with the first waiting for the second.
Or 2) you can execute SQL from the pipe on the client:
postgres@db:~$ psql event -p5433 -f /tmp/myPipe
This is better suited to your case. Also, there are no table locks until the SQL is executed, in one piece.
Bash will appear blocked; it is waiting for input to the pipe. To do it all from one bash instance, you can send the waiting process to the background instead, like this:
postgres@db:~$ psql event -p5433 -f /tmp/myPipe 2>&1 &
Either way, from the same bash or a different instance, you can fill the pipe now.
Demo with three rows for variant 1):
postgres@db:~$ printf '1\tfoo\n' >> /tmp/myPipe; printf '2\tbar\n' >> /tmp/myPipe; printf '3\tbaz\n' >> /tmp/myPipe
(Take care to use tabs as delimiters; the printf '\t' above emits real tabs. Or instruct COPY to accept a different delimiter using WITH DELIMITER 'delimiter_character'.)
That will trigger the pending psql with the COPY command to execute and return:
COPY 3
Demo for variant 2):
postgres@db:~$ (echo -n "INSERT INTO x.x VALUES (1,'foo')" >> /tmp/myPipe; echo -n ",(2,'bar')" >> /tmp/myPipe; echo ",(3,'baz')" >> /tmp/myPipe;)
INSERT 0 3
Delete the named pipe after you are done:
postgres@db:~$ rm /tmp/myPipe
Check success:
event=# select * from x.x;
 id | a
----+-----
  1 | foo
  2 | bar
  3 | baz
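To tie variant 2) together for the original use case, here is a minimal end-to-end sketch. It assumes a tab-separated input file data.txt; the table and the formatting step are illustrative, and values are interpolated without escaping, so treat it as a demo rather than production code:
mkfifo --mode=0666 /tmp/myPipe
psql event -p5433 -f /tmp/myPipe 2>&1 &    # reader waits in the background

{
    while IFS=$'\t' read -r id a; do       # format each line into an INSERT
        echo "INSERT INTO x.x VALUES ($id, '$a');"
    done < data.txt
} > /tmp/myPipe                            # single writer; closing it ends the psql script

wait                                       # let the background psql finish
rm /tmp/myPipe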
Useful links for the code above
Reading compressed files with postgres using named pipes
Introduction to Named Pipes
Best practice to run bash script in background
Advice you may or may not need
For bulk INSERT you have better solutions than a separate INSERT per row. Use this syntax variant:
INSERT INTO mytable (col1, col2, col3) VALUES
(1, 'foo', 'bar')
,(2, 'goo', 'gar')
,(3, 'hoo', 'har')
...
;
Write your statements to a file and do one mass INSERT like this:
psql -h remote_server -U username -d database -p 5432 -f my_insert_file.sql
(5432 or whatever port the db-cluster is listening on)
my_insert_file.sql can hold multiple SQL statements. In fact, it's common practice to restore / deploy whole databases like that. Consult the manual about the -f parameter, or in bash: man psql.
Or, if you can transfer the (compressed) file to the server, you can use COPY to insert the (decompressed) data even faster.
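A hedged sketch of that route, using COPY ... FROM PROGRAM to decompress on the fly (available since PostgreSQL 9.3 and requiring superuser or the pg_execute_server_program role); host, table, and file names are placeholders:
scp data.txt.gz remote_server:/tmp/data.txt.gz
psql -h remote_server -U username -d database \
    -c "COPY mytable FROM PROGRAM 'gzip -dc /tmp/data.txt.gz'"   # decompresses on the server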
You can also do some or all of the processing inside PostgreSQL. For that, you can COPY the data into a temporary table (or INSERT INTO it) and use plain SQL statements to prepare the rows before the final INSERT / UPDATE into your target tables. I do that a lot. Be aware that temporary tables live and die with the session.
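A minimal sketch of that staging approach, with a hypothetical target table mytable(col1 int, col2 text) and a tab-separated local file; everything runs in one session, so the temporary table is still there for the final INSERT:
psql -h remote_server -U username -d database <<'SQL'
CREATE TEMP TABLE tmp_stage (col1 int, col2 text);  -- lives and dies with this session
\copy tmp_stage FROM 'datafile.txt'
INSERT INTO mytable (col1, col2)
SELECT col1, trim(col2)                             -- example clean-up step
FROM tmp_stage;
SQL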
You could use a GUI like pgAdmin for comfortable handling. A session in an SQL Editor window remains open until you close the window. (Therefore, temporary tables live until you close the window.)

I know I'm late to the party, but why couldn't you combine all your INSERT statements into a single string, with a semicolon marking the end of each statement? (Warning! Pseudocode ahead...)
Instead of:
while read -r line; do
    sql_statement="INSERT whatever YOU want"
    echo "$sql_statement" | psql ...
done < data_file
Use:
sql_statements=""
for each line
sql_statement="INSERT whatever YOU want;"
sql_statements="$sql_statements $sql_statement"
done
echo $sql_statements | psql ...
That way you don't have to create anything on your filesystem, do a bunch of redirection, run any tasks in the background, remember to delete anything on your filesystem afterwards, or even remind yourself what a named pipe is.

Related

Complete novice question: I want to delete a table via command prompt (postgis)

I am a regular FME user and I know how to create a PostGIS database for use in FME (using pgAdmin). I now want to use FME to back up and then delete a generated table. I use the 'systemcaller' transformer for this; the systemcaller basically opens a command prompt.
I was able to make a backup file using cd C:\Program Files\PostgreSQL\118\bin\ && set PGPASSWORD=password&& pg_dump.exe -U postgres -p 5433 -d bag -t tablename -F c -f C:/Users/username/Downloads/tablename.backup, but I am having a hard time getting the table to be deleted.
I tried the 'drop table' command, but this is not recognized. It will recognize dropdb, but I obviously do not want to drop the whole database.
What command line should I use to drop the table 'testtable' in the database 'testdatabase'?
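One way to do it, sketched under the assumption that psql.exe sits in the same bin directory as pg_dump.exe and the same credentials apply:
psql.exe -U postgres -p 5433 -d testdatabase -c "DROP TABLE IF EXISTS testtable;"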

SQL Server OPENROWSET error reading bcp file

I'm trying to transfer table data from one SQL Server to another and want to use the bcp utility for it. This is purely to transfer data between two identical schemas, but I'm not able to use something like SSDT; I need something that is scriptable and portable so it can be run by others with just SQL Server and SSMS access.
I am generating a native output file and format file like so:
$> bcp database.TableName OUT c:\data\bcp\TableName.bcp -T -N -S SQLINSTANCE
$> bcp database.TableName format nul -f c:\data\bcp\TableName.fmt -T -N
Then, in Management Studio, I am trying in turn to read the files like this:
SELECT *
FROM OPENROWSET (
    BULK 'c:\data\bcp\TableName.bcp',
    FORMATFILE = 'c:\data\bcp\TableName.fmt'
) AS t1
But am getting this error:
The bulk load failed. The column is too long in the data file for row 6, column 19. Verify that the field terminator and row terminator are specified correctly.
I have followed this process before successfully, and it works for other tables, but I'm running into an issue with this table. The column mentioned is of datatype nvarchar(max). I can inspect what I think is the "problem" record in the source data and it's just a very long string; I don't see anything else special about it.
Is there something else I should be doing when generating the format file, or what else am I missing?
If you are only exporting for the purpose of importing into another SQL Server, native format is the way to go. And in this case you don't need format files: just do a native export and import.
Note you are specifying a capital -N, which is Unicode native format; plain native is lowercase -n.
You should export using something like:
bcp database.Schema.TableName OUT c:\data\bcp\TableName.bcp -T -n -S SQLINSTANCE
Then on the importing side I suggest using BULK INSERT, which doesn't need a format file for native format at all:
BULK INSERT TargetDB.dbo.TargetTable
FROM 'c:\data\bcp\TableName.bcp'
WITH (DATAFILETYPE = 'native');
If you can't use BULK INSERT and absolutely must go with OPENROWSET, you need a format file. bcp can generate that for you, but again, use lowercase -n:
bcp database.Schema.TableName format nul -f c:\data\bcp\TableName.fmt -T -n -S SQLINSTANCE
Now your OPENROWSET should work.

SQL - Automatic results to CSV or Text File

I was wondering if anyone can help.
I have a number of queries in SQL (all in separate *.sql files). I wanted to know if there is a way to run these queries automatically, or mass-run them, and save the results to either a CSV or TXT file?
Also, I have some variables within these queries which will need to be amended on a weekly basis before the queries are run.
Thanks.
KJ
Could you please provide some additional help in relation to the variables? Previously I would declare and set variables as:
DECLARE @TW_FROM DATETIME
DECLARE @TW_TO DATETIME
SET @TW_FROM = '2015-11-16 00:00:00';
SET @TW_TO = '2015-11-22 23:00:00';
How do I do this using sqlcmd?
Yes, you can use sqlcmd to do this.
First of all, variables. You can refer to your variables in the .sql files using $(variablename) wherever you want to substitute the variable. For example,
use $(dbname);
select $(columnname) from table1 where column= '$(var1)'
You then call sqlcmd with the following command (note the -v argument, which passes the variables):
sqlcmd -S servername -d database -i "yoursqlfile.sql" -v dbname="database" columnname="column" var1="Fred"
To output this to a file, append > filename.txt to the end:
sqlcmd -S servername -d database -i "yoursqlfile.sql" -v dbname="database" columnname="column" var1="Fred" > filename.txt
If you want to output to a CSV, you can also specify the delimiter using the argument -s (note the difference from the capital S used for the server name). So now we have
sqlcmd -S servername -d database -s "," -i "yoursqlfile.sql" -v dbname="database" columnname="column" var1="Fred" > filename.csv
If you want to output several commands to the same CSV or TXT file, use >> instead of >, as it appends to the bottom of the file rather than replacing it.
sqlcmd -S servername -d database -s "," -i "yoursqlfile.sql" -v dbname="database" columnname="column" var1="Fred" >> filename.csv
To run this for several scripts, you can put the statements in a batch file and change the variables every week, for example:
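A sketch of such a weekly run, written here as a bash loop (the same idea works in a Windows batch file); the server, database, and file names are placeholders, and it assumes the .sql files reference the variables as '$(TW_FROM)' and '$(TW_TO)' as described above:
TW_FROM="2015-11-16 00:00:00"
TW_TO="2015-11-22 23:00:00"
for f in *.sql; do
    # run each script with this week's variables, appending CSV output
    sqlcmd -S servername -d database -s "," -i "$f" \
        -v TW_FROM="$TW_FROM" TW_TO="$TW_TO" >> weekly_output.csv
done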
You could write a batch file that uses sqlcmd:
MSDN sqlcmd
That will allow you to call script files in a loop and output the results to a file.
Convert your current scripts to a Stored Procedure.
You can then pass your variables to that and run the query.
If you have SQL Server Agent available (SQL Server Standard edition or better), you can use it to automate running the stored procedures.
Otherwise the same can be achieved with Task Scheduler in Windows.
As for exporting to CSV this will be useful.
It depends on where your SQL Server is actually running. It might be quite tricky to write anything to the location you want.
You could read about BCP.
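For instance, a hedged sketch of a BCP export straight to CSV; server, database, and column names are placeholders:
bcp "SELECT col1, col2 FROM database.dbo.table1" queryout c:\data\out.csv -c -t"," -T -S servername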
My suggestion is:
Create a UDF (ideally an inline UDF!) from each of your queries within your database. Then call them from Excel or any other fitting product. You might want to set up an Excel workbook where all your queries are filled in automatically, one per sheet.

perl script without using DBI

I have to make a Perl script populate a database in PostgreSQL without using DBI or any sort of database interface module. I am a beginner to scripting, so naturally I've been stuck on this for quite a while. I only have this much so far:
open my $pipe, '|-', "psql -d postgres -U postgres", #options or die;
# NOT SURE WHAT TO DO AFTER THIS
close $pipe;
Edit 1: Now I'm trying to do this:
for ($count = $iters; $count >= 1; $count--) {
$randdecimal = rand();
$pipe "INSERT INTO random_table (runid, random_number) VALUES ($runid, $randdecimal)";
}
but it gives me a syntax error
Like the others say, DBI is much better than printing to a pipe.
However, there is a halfway house. Just print all your SQL to STDOUT and then do something like:
myscript.pl | psql -v ON_ERROR_STOP=1 --single-transaction -f -
This lets you easily check your script's output or send it to a file. The psql options stop on the first error, wrap everything in a transaction, and read from STDIN. You might want the usual -h/-U options too.
Personally, I tend to have two terminals open and just write to a .sql file then \i from a psql prompt. I like having a record of what command I ran.

disable NOTICES in psql output

How do I stop psql (PostgreSQL client) from outputting notices? e.g.
psql:schema/auth.sql:20: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "users_pkey" for table "users"
In my opinion a program should be silent unless it has an error, or some other reason to output stuff.
SET client_min_messages TO WARNING;
That could be set only for the session or made persistent with ALTER ROLE or ALTER DATABASE.
Or you could put that in your ".psqlrc".
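For example, sketches of each option (myuser and mydb are placeholders):
psql -d mydb -c "ALTER ROLE myuser SET client_min_messages = warning;"      # persists per role
psql -d mydb -c "ALTER DATABASE mydb SET client_min_messages = warning;"    # persists per database
echo 'SET client_min_messages TO WARNING;' >> ~/.psqlrc                     # applies to every psql session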
Probably the most comprehensive explanation is in Peter Eisentraut's blog entry here (Archive).
I would strongly encourage studying and digesting the original blog post, but the final recommendation is something like:
PGOPTIONS='--client-min-messages=warning' psql -X -q -a -1 -v ON_ERROR_STOP=1 --pset pager=off -d mydb -f script.sql
Use --quiet when you start psql.
A notice is not useless, but that's my point of view.
It can also be set in the global postgresql.conf file by modifying the client_min_messages parameter.
Example:
client_min_messages = warning
I tried the various solutions (and permutations thereof) suggested in this thread, but I was unable to completely suppress psql output / notifications.
I am executing a claws2postgres.sh bash script that does some preliminary processing, then calls/executes a psql .sql script to insert thousands of entries into PostgreSQL.
...
PGOPTIONS="-c client_min_messages=error"
psql -d claws_db -f claws2postgres.sql
Output
[victoria@victoria bash]$ ./claws2postgres.sh
pg_terminate_backend
----------------------
DROP DATABASE
CREATE DATABASE
You are now connected to database "claws_db" as user "victoria".
CREATE TABLE
SELECT 1
INSERT 0 1
UPDATE 1
UPDATE 1
UPDATE 1
Dropping tmp_table
DROP TABLE
You are now connected to database "claws_db" as user "victoria".
psql:/mnt/Vancouver/projects/ie/claws/src/sql/claws2postgres.sql:33: NOTICE: 42P07: relation "claws_table" already exists, skipping
LOCATION: transformCreateStmt, parse_utilcmd.c:206
CREATE TABLE
SELECT 1
INSERT 0 1
UPDATE 2
UPDATE 2
UPDATE 2
Dropping tmp_table
DROP TABLE
[ ... snip ... ]
SOLUTION
Note this modified PSQL line, where I redirect the psql output:
psql -d claws_db -f $SRC_DIR/sql/claws2postgres.sql &>> /tmp/pg_output.txt
The &>> /tmp/pg_output.txt redirect appends all output to an output file, that can also serve as a log file.
BASH terminal output
[victoria@victoria bash]$ time ./claws2postgres.sh
pg_terminate_backend
----------------------
DROP DATABASE
CREATE DATABASE
2:40:54 ## 2 h 41 min
[victoria@victoria bash]$
Monitor progress:
In another terminal, execute
PID=$(pgrep -l -f claws2postgres.sh | grep claws | awk '{ print $1 }')
while kill -0 $PID >/dev/null 2>&1; do
    NOW=$(date)
    progress=$(cat /tmp/pg_output.txt | wc -l)
    printf "\t%s: %i lines\n" "$NOW" $progress
    sleep 60
done
for i in {1..5}; do
    aplay 2>/dev/null /mnt/Vancouver/programming/scripts/phaser.wav && sleep 0.5
done
...
Sun 28 Apr 2019 08:18:43 PM PDT: 99263 lines
Sun 28 Apr 2019 08:19:43 PM PDT: 99391 lines
Sun 28 Apr 2019 08:20:43 PM PDT: 99537 lines
[victoria@victoria output]$
pgrep -l -f claws2postgres.sh | grep claws | awk '{ print $1 }' gets the script PID, assigned to $PID
while kill -0 $PID >/dev/null 2>&1; do ... : while that script is running, do ...
cat /tmp/pg_output.txt | wc -l : use the output file line count as a progress indicator
when done, notify by playing phaser.wav 5 times
phaser.wav: https://persagen.com/files/misc/phaser.wav
Output file:
[victoria@victoria ~]$ head -n22 /tmp/pg_output.txt
You are now connected to database "claws_db" as user "victoria".
CREATE TABLE
SELECT 1
INSERT 0 1
UPDATE 1
UPDATE 1
UPDATE 1
Dropping tmp_table
DROP TABLE
You are now connected to database "claws_db" as user "victoria".
psql:/mnt/Vancouver/projects/ie/claws/src/sql/claws2postgres.sql:33: NOTICE: 42P07: relation "claws_table" already exists, skipping
LOCATION: transformCreateStmt, parse_utilcmd.c:206
CREATE TABLE
SELECT 1
INSERT 0 1
UPDATE 2
UPDATE 2
UPDATE 2
Dropping tmp_table
DROP TABLE
References
[re: solution, above] PSQL: How can I prevent any output on the command line?
[re: this SO thread] disable NOTICES in psql output
[related SO thread] Postgresql - is there a way to disable the display of INSERT statements when reading in from a file?
[relevant to solution] https://askubuntu.com/questions/350208/what-does-2-dev-null-mean
The > operator redirects output, usually to a file, but it can also be to a device. You can use >> to append.
If you don't specify a number, the standard output stream is assumed, but you can also redirect errors:
> file redirects stdout to file
1> file redirects stdout to file
2> file redirects stderr to file
&> file redirects stdout and stderr to file
/dev/null is the null device it takes any input you want and throws it away. It can be used to suppress any output.
Offering a suggestion that is useful for a specific scenario I had:
Windows command shell calls psql.exe call to execute one essential SQL command
Only want to see warnings or errors, and suppress NOTICES
Example:
psql.exe -c "SET client_min_messages TO WARNING; DROP TABLE IF EXISTS mytab CASCADE"
(I was unable to make things work with PGOPTIONS as a Windows environment variable; I couldn't work out the right syntax. I tried multiple approaches from different posts.)
