How to remove comments and settings from pg_dump output?

I am trying to dump only the data from a PostgreSQL database using pg_dump and then restore that data into another one. But generating the SQL script with this tool also adds some comments and settings to the output file.
Running this command:
pg_dump --column-inserts --data-only my_db > my_dump.sql
I get something like:
--
-- PostgreSQL database dump
--
-- Dumped from database version 8.4.22
-- Dumped by pg_dump version 10.8 (Ubuntu 10.8-0ubuntu0.18.04.1)
SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET escape_string_warning = off;
SET row_security = off;
--
-- Data for Name: sf_guard_user; Type: TABLE DATA; Schema: public; Owner: admin
--
INSERT INTO public.....
Is there any way to prevent pg_dump from generating those comments and settings?
I could write a small script to remove every line before the first INSERT, but pg_dump also generates comments throughout the file, and I am sure there is a cleaner way to proceed, but I found nothing.

I don't think there is. I'd simply pipe through grep to filter out lines that start with the comment delimiter:
pg_dump --column-inserts --data-only my_db | grep -v "^--" > my_dump.sql
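If you also want to drop the SET/SELECT configuration lines and any blank lines, the same pipe can be extended (a sketch; adjust the patterns if your pg_dump version emits different settings):
pg_dump --column-inserts --data-only my_db | grep -v -e '^--' -e '^SET ' -e '^SELECT pg_catalog.set_config' -e '^$' > my_dump.sql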

Related

Trying to Export Tables to CSVs from SQL Server

I ran the following script to try to get all the tables in my DB exported (trying to back up the data in CSVs).
SELECT 'sqlcmd -S . -d '+DB_NAME()+' -E -s, -W -Q "SET NOCOUNT ON; SELECT * FROM '+table_schema+'.'+TABLE_name+'" > "C:\Temp\'+Table_Name+'.csv"'
FROM [INFORMATION_SCHEMA].[TABLES]
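Each row that query returns is itself a complete sqlcmd invocation; with a hypothetical database MyDb and table dbo.Patients, a generated line looks roughly like this:
sqlcmd -S . -d MyDb -E -s, -W -Q "SET NOCOUNT ON; SELECT * FROM dbo.Patients" > "C:\Temp\Patients.csv"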
I saved the results as a batch file and ran the batch file as Administrator.
That runs without an error, but I get no data exported. All it does is create blank CSV files.
I ran this as well: EXEC sp_configure 'remote access', 1; RECONFIGURE;
Still, nothing is exported. CSVs are created, but no data is exported...
Any thoughts?
I ended up using R to do the task...
library("RODBC")
conn <- odbcDriverConnect('driver={SQL Server};server=Server_Name;DB_Name;trusted_connection=true')
data <- sqlQuery(conn, "SELECT * FROM DB.dbo.TBL#1")
write.csv(data,file=paste("C:/Users/TBL#1.csv",sep=""),row.names=FALSE)
data <- sqlQuery(conn, "SELECT * FROM DB.dbo.TBL#2")
write.csv(data,file=paste("C:/Users/TBL#2.csv",sep=""),row.names=FALSE)
Gotta love the IT teams in corporate America...especially when they lock down your system so tight, you need to come up with all kinds of weird hacks just so you can do the job that you were hired to do...
Is there a word for negative synergy?

What is the best way to remove xp_cmdshell calls from T-SQL code

I'm maintaining a large T-SQL-based application.
It makes heavy use of bcp called through xp_cmdshell.
This is problematic because xp_cmdshell runs in the same security context as the SQL Server service account, which is more privilege than the work requires.
My first idea for removing this disadvantage is to use CLR code, which runs with the permissions of the user who called it.
I created the following procedure and it works fine; I can see that it uses the permissions of the account running the code:
public static void RunBCP(SqlString arguments, out SqlString output_msg, out SqlString error_msg, out SqlInt32 return_val)
{
    output_msg = string.Empty;
    error_msg = string.Empty;
    try
    {
        // launch bcp.exe with the caller-supplied arguments and capture its standard output
        var proc = new Process
        {
            StartInfo = new ProcessStartInfo
            {
                FileName = "bcp",
                Arguments = arguments.ToString(),
                UseShellExecute = false,
                RedirectStandardOutput = true,
                CreateNoWindow = true
            }
        };
        proc.Start();
        while (!proc.StandardOutput.EndOfStream)
        {
            output_msg += proc.StandardOutput.ReadLine();
        }
        // wait for bcp to finish before reading its exit code
        proc.WaitForExit();
        return_val = proc.ExitCode;
    }
    catch (Exception e)
    {
        error_msg = e.Message;
        return_val = 1;
    }
}
This is a good solution because I'm not touching the bcp calls themselves (the arguments stay the same). There are no major changes in logic, so there is little risk of introducing an error.
The previous call of bcp in T-SQL looked like this:
declare @ReturnCode int;
declare @cmd varchar(1000);
SELECT @cmd = 'bcp "select FirstName, LastName, DateOfBirth" queryout "c:\temp\OutputFile.csv" -c -t -T -S"(local)"'
EXEC @ReturnCode = xp_cmdshell @cmd, no_output
Now I call it this way:
declare @ReturnCode int;
declare @cmd varchar(1000);
SELECT @cmd = '"select FirstName, LastName, DateOfBirth" queryout "c:\temp\OutputFile.csv" -c -t -T -S"(local)"'
exec DataBase.dbo.up_RunBCP @arguments = @cmd;
So, the question is: is there any other way to get rid of xp_cmdshell bcp code?
I heard that I can use PowerShell (sqlps), but the examples I found suggest creating a PowerShell script.
Can I call such a script from T-SQL code?
How should this code (the PowerShell script) be stored? As a database object?
Or maybe there is some other way? Not necessarily SSIS; what I'd most like to know about is PowerShell.
Thanks for any advice.
Your options for data EXPORT are the following:
using xp_cmdshell to call bcp.exe - your old way of bulk copying
using CLR - your new way of bulk copying
SSIS - my preferred way of doing this; here is the example
INSERT INTO OPENROWSET - an interesting alternative you can use if you are either working in a 32-bit environment with text/Jet/whatever drivers installed, or you can install 64-bit drivers (e.g. the 64-bit ODBC text driver, see Microsoft Access Database Engine 2010 Redistributable)
SQL Server Import/Export wizard - ugly manual way that seldom works in the way you want it to work
using external CSV table - not supported yet (SQL Server 2016 promises it will be...)
HTH
I would use a simple PowerShell script that does this, something like:
Invoke-Sqlcmd -Query '...' | Export-Csv ...
Generally, for administrative functions you could add this to Task Scheduler and be done with it. If you need to run the task on demand, you can trigger it via xp_cmdshell using schtasks.exe /Run /TN Task_NAME, which might be better for you since it can be easier to express yourself in PowerShell than in T-SQL in this context.
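For example (the task name and script path here are hypothetical; the /Run line is the command you would hand to xp_cmdshell):
schtasks.exe /Create /TN "DataExport" /TR "powershell.exe -File C:\scripts\export.ps1" /SC ONCE /ST 00:00
schtasks.exe /Run /TN "DataExport"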
The other options mentioned all require extra tools (SSIS requires Visual Studio, for example); this approach is portable with no dependencies.
To call the script without xp_cmdshell, you can create a SQL Server Agent job with a PowerShell step and start it from within T-SQL.

Use sqlcmd and setvar to dynamically show or hide columns

I toggle the display of PHI when generating data extracts from a vendor's EHR system.
To date, I've been manually enabling and disabling these fields in my script files:
-- PHI enabled
SELECT MRN
--,HASHBYTES('SHA2_256',MRN) MRN_HASH
...
GO
-- PHI disabled
SELECT -- MRN
,HASHBYTES('SHA2_256',MRN) MRN_HASH
...
GO
Is there a way to do this dynamically?
--
-- disable this variable when running `SQLCMD` from command line
-- PS> sqlcmd -E -S server -d database -i .\script.sql -v hide_phi=1
--
:setvar hide_phi 0
:out c:\users\x\desktop\patients.csv
SELECT
<if $(hide_phi)=0 then hide MRN>
<if $(hide_phi)=1 then hide MRN_HASH>
...
GO
SQLCMD accepts variables. You can simply pass the variable in to your .SQL file and, inside the file, do a conditional check on its value. You could use a CASE expression to check the value of the variable and return the appropriate value.
Pseudo sample query:
select "MRN" = case
when '$(hide_phi)' = '1' then HASHBYTES('SHA2_256', MRN)
else MRN
END
...
GO
or possibly this:
select "MRN" = case '$(hide_phi)'
when '1' then HASHBYTES('SHA2_256', MRN)
else MRN
END
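Invocation from the command line then matches the comment in the question; pass hide_phi with -v (and, as the question notes, disable the :setvar line in the script when you do):
sqlcmd -E -S server -d database -i .\script.sql -v hide_phi=1
sqlcmd -E -S server -d database -i .\script.sql -v hide_phi=0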
More info: https://msdn.microsoft.com/en-us/library/ms188714.aspx

Error while updating Database with mssql_query

I'm using mssql_query to connect to an existing SQL Server 2008 Database.
SELECT queries are OK, but when I run UPDATE queries like the following:
mssql_query("UPDATE TABLENAME SET fieldname = 1 WHERE Pk = '".$pk."'");
I get this error:
UPDATE failed because the following SET options have incorrect
settings: 'ANSI_NULLS, QUOTED_IDENTIFIER, CONCAT_NULL_YIELDS_NULL,
ANSI_WARNINGS, ANSI_PADDING'. Verify that SET options are correct for
use with indexed views and/or indexes on computed columns and/or
filtered indexes and/or query notifications and/or XML data type
methods and/or spatial index operations. (severity 16)
Here is my connection code to the database:
$server = 'SRVSQL';
// Connect to MSSQL
$link = mssql_connect($server, 'xx', 'xxxxxx');
if (!$link) {
    die('Something went wrong while connecting to MSSQL');
}
$conn = mssql_select_db('xxxxxxx', $link);
You might have to explicitly turn those settings on. You can do so by issuing the following statement prior to the UPDATE:
SET
ANSI_NULLS,
QUOTED_IDENTIFIER,
CONCAT_NULL_YIELDS_NULL,
ANSI_WARNINGS,
ANSI_PADDING
ON;
Should there be additional settings yielding errors, those might have to be changed as well.
See also: UPDATE failed because the following SET options have incorrect settings: 'ANSI_NULLS, QUOTED_IDENTIFIER'

Percona's pt-table-sync: how to run on more than one table?

On the command line, this successfully updates table1:
pt-table-sync --execute h=host1,D=db1,t=table1 h=host2,D=db2
However, if I want to update more than one table, I'm not sure how to write it. This also only updates table1 and ignores the other tables:
pt-table-sync --execute h=host1,D=db1,t=table1,table2,table3 h=host2,D=db2
And this gives me an error:
pt-table-sync --execute h=host1,D=db1 --tables table1,table2,table3 h=host2,D=db2
Does anyone have an example of how to list the '--tables'... so that it successfully updates all the tables in the list?
The --tables option seems to be incompatible with the DSN notation; you get this error:
You specified a database but not a table in h=localhost,D=test.
Are you trying to sync only tables in the 'test' database?
If so, use '--databases test' instead.
As suggested in that error message, you can use --databases and then you can use --tables successfully.
For example, I created tables test.foo and test.bar, filled each with three rows, then deleted the rows from test.bar on the second server dewey.
I ran this:
$ pt-table-sync h=huey h=dewey --databases test --tables foo,bar --execute --verbose
# Syncing h=dewey
# DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE
# 0 0 3 0 Chunk 15:26:15 15:26:15 2 test.bar
# 0 0 0 0 Chunk 15:26:15 15:26:15 0 test.foo
It successfully re-inserted the 3 missing rows in test.bar.
Other tables in my test database were ignored.
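If you want to preview what would change before executing, the same command with --print instead of --execute prints the statements it would run (same hosts and tables as above):
pt-table-sync h=huey h=dewey --databases test --tables foo,bar --print --verbose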
This is an old question, but I searched everywhere for an answer. pt-table-sync only does one table. There is no tool that does the same thing to a list of tables or a full database schema. Specifically I want to run a Live server and be able to sync back to a Staging server, then edit code and files in the Staging server without fear of messing up Live or being overwritten by Live... and I want it to be free :)
I ended up writing a shell script called mysql_sync_live_to_stage.sh as follows:
#!/bin/bash
# sync db live to staging
error_log_file='./mysql_sync_errors.log'
echo $(date +"%Y %m %d %H:%M") > $error_log_file
function sync_table()
{
    pt-table-sync --no-foreign-key-checks --execute \
        h=DB_1_HOST,u=DB_1_USER,p=DB_1_PASSWORD,D=$1,t=$3 \
        h=DB_2_HOST,u=DB_2_USER,p=DB_2_PASSWORD,D=$2,t=$3 >> $error_log_file
}
# SYNC ALL TABLES IN name_of_live_database
mysql -h "DB_1_HOST" -u "DB_1_USER" -pDB_1_PASSWORD -D "DB_1_DBNAME" -e "SHOW TABLES" |
egrep -i '[0-9a-z\-\_]+' | egrep -i -v 'Tables_in' | while read -r table ; do
echo "Processing $table"
sync_table "name_of_live_database" "name_of_staging_database" $table
done
# FIX Config Settings For Staging
echo "Cleanup Queries..."
mysql -h "DB_2_HOST" -u "DB_2_USER" -pDB_2_PASSWORD -D "DB_2_DBNAME"
-e "UPDATE name_of_staging_database.nameofmyconfigtable SET value='bar'
WHERE config_id='foo'"
mysql -h "DB_2_HOST" -u "DB_2_USER" -pDB_2_PASSWORD -D "DB_2_DBNAME"
-e "UPDATE name_of_staging_database.nameofmyconfigtable SET value='bar2'
WHERE config_id='foo2'"
echo "Done"
This reads a list of table names from the live site then executes a sync on each one via the do loop. It goes through the list alphabetically, so I recommend keeping the --no-foreign-key-checks flag.
It's not perfect... It won't sync tables that don't exist in both databases, but when combined with a "git pull -f origin master" I get a complete sync in a couple of minutes.
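A possible way to run it (the log file name comes from the script itself):
chmod +x mysql_sync_live_to_stage.sh
./mysql_sync_live_to_stage.sh
cat mysql_sync_errors.log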
