Insert from select or update from select with commit every 1M records - database

I've already seen a dozen such questions but most of them get answers that doesn't apply to my case.
First off - the database is am trying to get the data from has a very slow network and is connected to using VPN.
I am accessing it through a database link.
I have full write/read access on my schema tables but I don't have DBA rights so I can't create dumps and I don't have grants for creation new tables etc.
I've been trying to get the database locally and all is well except for one table.
It has 6.5 million records and 16 columns.
There was no problem getting 14 of them but the remaining two are Clobs with huge XML in them.
The data transfer is so slow it is painful.
I tried
insert based on select
insert all 14 then update the other 2
create table as
insert based on select conditional so I get only so many records and manually commit
The issue is mainly that the connection is lost before the transaction finishes (or power loss or VPN drops or random error etc) and all the GBs that have been downloaded are discarded.
As I said I tried putting conditionals so I get a few records but even this is a bit random and requires focus from me.
Something like :
Insert into TableA
Select * from TableA#DB_RemoteDB1
WHERE CREATION_DATE BETWEEN to_date('01-Jan-2016') AND to_date('31-DEC-2016')
Sometimes it works sometimes it doesn't. Just after a few GBs Toad is stuck running but when I look at its throughput it is 0KB/s or a few Bytes/s.
What I am looking for is a loop or a cursor that can be used to get maybe 100000 or a 1000000 at a time - commit it then go for the rest until it is done.
This is a one time operation that I am doing as we need the data locally for testing - so I don't care if it is inefficient as long as the data is brought in in chunks and a commit saves me from retrieving it again.
I can count already about 15GBs of failed downloads I've done over the last 3 days and my local table still has 0 records as all my attempts have failed.
Server: Oracle 11g
Local: Oracle 11g
Attempted Clients: Toad/Sql Dev/dbForge Studio
Thanks.

You could do something like:
begin
loop
insert into tablea
select * from tablea#DB_RemoteDB1 a_remote
where not exists (select null from tablea where id = a_remote.id)
and rownum <= 100000; -- or whatever number makes sense for you
exit when sql%rowcount = 0;
commit;
end loop;
end;
/
This assumes that there is a primary/unique key you can use to check if a row int he remote table already exists in the local one - in this example I've used a vague ID column, but replace that with your actual key column(s).
For each iteration of the loop it will identify rows in the remote table which do not exist in the local table - which may be slow, but you've said performance isn't a priority here - and then, via rownum, limit the number of rows being inserted to a manageable subset.
The loop then terminates when no rows are inserted, which means there are no rows left in the remote table that don't exist locally.
This should be restartable, due to the commit and where not exists check. This isn't usually a good approach - as it kind of breaks normal transaction handling - but as a one off and with your network issues/constraints it may be necessary.
Toad is right, using bulk collect would be (probably significantly) faster in general as the query isn't repeated each time around the loop:
declare
cursor l_cur is
select * from tablea#dblink3 a_remote
where not exists (select null from tablea where id = a_remote.id);
type t_tab is table of l_cur%rowtype;
l_tab t_tab;
begin
open l_cur;
loop
fetch l_cur bulk collect into l_tab limit 100000;
forall i in 1..l_tab.count
insert into tablea values l_tab(i);
commit;
exit when l_cur%notfound;
end loop;
close l_cur;
end;
/
This time you would change the limit 100000 to whatever number you think sensible. There is a trade-off here though, as the PL/SQL table will consume memory, so you may need to experiment a bit to pick that value - you could get errors or affect other users if it's too high. Lower is less of a problem here, except the bulk inserts become slightly less efficient.
But because you have a CLOB column (holding your XML) this won't work for you, as #BobC pointed out; the insert ... select is supported over a DB link, but the collection version will get an error from the fetch:
ORA-22992: cannot use LOB locators selected from remote tables
ORA-06512: at line 10
22992. 00000 - "cannot use LOB locators selected from remote tables"
*Cause: A remote LOB column cannot be referenced.
*Action: Remove references to LOBs in remote tables.

Related

How does SQL Server handle failed query to linked server?

I have a stored procedure that relies on a query to a linked server.
This stored procedure is roughly structured as follows:
-- Create local table var to stop query from needing round trips to linked server
DECLARE #duplicates TABLE (eid NVARCHAR(6))
INSERT INTO #duplicates(eid)
SELECT eid FROM [linked_server].[linked_database].[dbo].[linked_table]
WHERE es = 'String'
-- Update on my server using data from linked server
UPDATE [my_server].[my_database].[dbo].[my_table]
-- Many things, including
[status] = CASE
WHEN
eid IN (
SELECT eid FROM #duplicates
)
THEN 'String'
ELSE es
END
FROM [my_server].[another_database].[dbo].[view]
-- This view obscures sensitive information and shows only the data that I have permission to see
-- Many other things
The query itself is much more complex, but the key idea is building this temporary table from a linked server (because it takes the query 5 minutes to run if I don't, versus 3 seconds if I do).
I've recently had an issue where I ended up with updates to my table that failed to get checked against the linked server for duplicate information.
The logical chain of events is this:
Get all of the data from the original view
The original view contains maybe 3000 records, of which maybe 30 are
duplicates of the entity in question, but with 1 field having a
different value.
I then have to grab data from a different server to know which of
the duplicates is the correct one.
When the stored procedure runs, it updates each record.
ERROR STEP - when the stored procedure hits a duplicate record, it
updates my_table again - so es gets changed multiple times in a row.
The temp table was added after the fact when we realized incorrect es values were being introduced to my_table.
'my_database` does not contain the data needed to determine which is the correct tuple, hence the requirement for the linked server.
As far as I can tell, we had a temporary network interruption or a connection timeout that stopped my_server from getting the response back from linked_server, and it just passed an empty table to the rest of the procedure.
So, my question is - how can I guard against this happening?
I can't just check if the table is empty, because it could legitimately be empty. I need to definitively know if that initial SELECT from linked_server failed, if it timed out, or if it intentionally returned nothing.
without knowing the definition of the table you're querying you could get into an issue where your data is to long and you get a truncation error on your table.
Better make sure and substring it...
DECLARE #duplicates TABLE (eid NVARCHAR(6))
INSERT INTO #duplicates(eid)
SELECT SUBSTRING(eid,1,6) FROM [linked_server].[linked_database].[dbo].[linked_table]
WHERE es = 'String'
-- Update on my server using data from linked server
UPDATE [my_server].[my_database].[dbo].[my_table]
-- Many things, including
[status] = CASE
WHEN
eid IN (
SELECT eid FROM #duplicates
)
THEN 'String'
ELSE es
END
FROM [my_server].[another_database].[dbo].[view]
I had a similar problem where I needed to move data between servers, could not use a network connection so I ended up doing BCP out and BCP in. This is fast, clean and takes away the complexity of user authentication, drivers, trust domains. also it's repeatable and can be used for incremental loading.

Query when value trails with .b decreases speed by 30x-100x

Consider table testTable, a table with six fields: one of them a UNIQUEIDENTIFIER, one a TIMESTAMP and four of them VARCHARs. Field Filename is VARCHAR.
This first query takes 1 minutes 38 seconds
Select top 1 * from testTable WHERE Filename = 'any.string.1512.b'
Either of these queries takes 1-3 seconds
Select top 1 * from testTable WHERE Filename = 'any.string.1512'
Select top 1 * from testTable WHERE Filename like 'cusip.realloc.1412.b%'
I have looked at the execution plan for all three and the only difference is that the last query (the LIKE statement) used 46% index seek\54% Key Lookup vs a 50\50 index\key lookup for the first two. As far as I can tell as soon as I no longer use the .b part of this search criterium, the queries go back to normal speed.
FileName has been indexed; table has been removed and recreated just in case. We have added indexes, removed indexes, check table, checked database, restart services, restart server, recreated the table. This field used to be VARCHAR(MAX) and I changed it to VARCHAR(100) to index it, but the problem was occurring before making this change.
Something else that I believe may be happening is that there might be something wrong with the end of the table. It will never complete a full:
Select * from testTable
I hoped it was a corrupted table but that wasn't the case. However when we attempt to generate a script in SSMS it fails to generate (no error given). I was able to recreate it by using another SQL client and generating the structure from SSMS and data copy from the other SQL client.
We are pretty stumped.

Trouble with SQL Server locks

I am running into an issue where SQL Server is causing a significant number of locks (95 to 150) on our main table. They are typically short duration locks, lasting under 3 seconds, but I would like to eliminate those if I possibly can. We have also noticed that typically there are no blocks, but occasionally we have a situation where the blocks seem to "cascade" and then the entire system slows down considerably.
Background
We have up to 600 virtual machines processing data and we loaded a table in SQL so we could monitor any records that got stalled and records that were marked complete. We typically have between 200,000 and 1,000,000 records in this table during our processing.
What we are trying to accomplish
We are attempting to get the next available record (Status = 0). However, since there can be multiple hits on the stored proc simultaneously, we are trying to make sure each VM gets a unique record. This is important because processing takes between 1.5 and 2.5 minutes per record and we want to make this as clean as possible.
Our thought process to this point
UPDATE TOP (1) dbo.Test WITH (ROWLOCK)
SET Status = 1,
VMID = #VMID,
ReadCount = ReadCount + 1,
ProcessDT = GETUTCDATE()
OUTPUT INSERTED.RowID INTO #retValue
WHERE Status = 0
This update was causing us a few issues with locks, so we re-worked the process a little bit and changed the where to a sub-query to return the top 1 RowID (primary key) from the table. This seemed to help things run a little bit smoother, but then we occasionally get over-loaded in the database again.
UPDATE TOP (1) dbo.Test WITH (ROWLOCK)
SET Status = 1,
VMID = #VMID,
ReadCount = ReadCount + 1,
ProcessDT = GETUTCDATE()
OUTPUT INSERTED.RowID INTO #retValue
-- WHERE Status = 0
WHERE RowID IN (SELECT TOP 1 RowID FROM do.Test WHERE Status = 0 ORDER BY RowID)
We discovered that having a significant number of Status 1 and 2 records int he table causes slowdowns. We figured it was from a table scan on the Status column. We added the following index but it did not help solve the locks.
CREATE NONCLUSTERED INDEX IX_Test_Status_RowID
ON [dbo].[Test] ([Status])
INCLUDE ([RowID])
The final step after the UPDATE, we use the RowID returned to select out the details:
SELECT 'Test' as FileName, *, #Nick as [Nickname]
FROM Test WITH (NOLOCK)
WHERE RowID IN (SELECT id from #retValue)
Types of locks
The majority of the blocks are LCK_M_U and LCK_M_S, which I would expect with that UPDATE and SELECT query. We did have 1 or 2 LCK_M_X locks as well occasionally. That made me think we may still be getting collisions on our "unique" record code.
Questions
Are these locks and the number of locks just normal SQL operations for this type load?
Is the sub-query causing more issues than a TOP(1) in the UPDATE we started with? I am trying to get confirmation I can remove the ORDER BY statement and remove that extra step of processing.
Would a different index help? I wondered if the index updating was a possible cause of the locks initially, but now I am not sure.
Is there a better or more efficient way to get a unique RowID?
Is the WITH (ROWLOCK) causing more locks than leaving it off would cause? The idea is ROWLOCK would only lock the 1 specific record and allow another proc to update another record and select without locking the table or page.
Does anyone have any tools they recommend to stress test and run 100 queries simultaneously in order to test any potential solutions?
Sorry for all the questions, just trying to make sure I am as clear as possible on our process and the questions we have.
Thanks in advance for any insight as this is a really frustrating issue for us.
Hardware
We are running SQL Server 2008 R2 on a Dual Xeon CPU with 24 GB of RAM. So we should have plenty of horsepower for this process.
It looks like the best solution to the issue was to create a separate table with an identity and use the ##IDENTITY from the insert to determine the next row to process. That has solved all my lock issues so far in my stress testing. Thanks to all who pointed my in the right direction!

error when insert into linked server

I want to insert some data on the local server into a remote server, and used the following sql:
select * into linkservername.mydbname.dbo.test from localdbname.dbo.test
But it throws the following error
The object name 'linkservername.mydbname.dbo.test' contains more than the maximum number of prefixes. The maximum is 2.
How can I do that?
I don't think the new table created with the INTO clause supports 4 part names.
You would need to create the table first, then use INSERT..SELECT to populate it.
(See note in Arguments section on MSDN: reference)
The SELECT...INTO [new_table_name] statement supports a maximum of 2 prefixes: [database].[schema].[table]
NOTE: it is more performant to pull the data across the link using SELECT INTO vs. pushing it across using INSERT INTO:
SELECT INTO is minimally logged.
SELECT INTO does not implicitly start a distributed transaction, typically.
I say typically, in point #2, because in most scenarios a distributed transaction is not created implicitly when using SELECT INTO. If a profiler trace tells you SQL Server is still implicitly creating a distributed transaction, you can SELECT INTO a temp table first, to prevent the implicit distributed transaction, then move the data into your target table from the temp table.
Push vs. Pull Example
In this example we are copying data from [server_a] to [server_b] across a link. This example assumes query execution is possible from both servers:
Push
Instead of connecting to [server_a] and pushing the data to [server_b]:
INSERT INTO [server_b].[database].[schema].[table]
SELECT * FROM [database].[schema].[table]
Pull
Connect to [server_b] and pull the data from [server_a]:
SELECT * INTO [database].[schema].[table]
FROM [server_a].[database].[schema].[table]
I've been struggling with this for the last hour.
I now realise that using the syntax
SELECT orderid, orderdate, empid, custid
INTO [linkedserver].[database].[dbo].[table]
FROM Sales.Orders;
does not work with linked servers. You have to go onto your linked server and manually create the table first, then use the following syntax:
INSERT INTO [linkedserver].[database].[dbo].[table]
SELECT orderid, orderdate, empid, custid
FROM Sales.Orders
WHERE shipcountry = 'UK';
I've experienced the same issue and I've performed the following workaround:
If you are able to log on to remote server where you want to insert data with MSSQL or sqlcmd and rebuild your query vice-versa:
so from:
SELECT * INTO linkservername.mydbname.dbo.test
FROM localdbname.dbo.test
to the following:
SELECT * INTO localdbname.dbo.test
FROM linkservername.mydbname.dbo.test
In my situation it works well.
#2Toad: For sure INSERT INTO is better / more efficient. However for small queries and quick operation SELECT * INTO is more flexible because it creates the table on-the-fly and insert your data immediately, whereas INSERT INTO requires creating a table (auto-ident options and so on) before you carry out your insert operation.
I may be late to the party, but this was the first post I saw when I searched for the 4 part table name insert issue to a linked server. After reading this and a few more posts, I was able to accomplish this by using EXEC with the "AT" argument (for SQL2008+) so that the query is run from the linked server. For example, I had to insert 4M records to a pseudo-temp table on another server, and doing an INSERT-SELECT FROM statement took 10+ minutes. But changing it to the following SELECT-INTO statement, which allows the 4 part table name in the FROM clause, does it in mere seconds (less than 10 seconds in my case).
EXEC ('USE MyDatabase;
BEGIN TRY DROP TABLE TempID3 END TRY BEGIN CATCH END CATCH;
SELECT Field1, Field2, Field3
INTO TempID3
FROM SourceServer.SourceDatabase.dbo.SourceTable;') AT [DestinationServer]
GO
The query is run on DestinationServer, changes to right database, ensures the table does not already exist, and selects from the SourceServer. Minimally logged, and no fuss. This information may already out there somewhere, but I hope it helps anyone searching for similar issues.

Hidden Features of PostgreSQL [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
I'm surprised this hasn't been posted yet. Any interesting tricks that you know about in Postgres? Obscure config options and scaling/perf tricks are particularly welcome.
I'm sure we can beat the 9 comments on the corresponding MySQL thread :)
Since postgres is a lot more sane than MySQL, there are not that many "tricks" to report on ;-)
The manual has some nice performance tips.
A few other performance related things to keep in mind:
Make sure autovacuum is turned on
Make sure you've gone through your postgres.conf (effective cache size, shared buffers, work mem ... lots of options there to tune).
Use pgpool or pgbouncer to keep your "real" database connections to a minimum
Learn how EXPLAIN and EXPLAIN ANALYZE works. Learn to read the output.
CLUSTER sorts data on disk according to an index. Can dramatically improve performance of large (mostly) read-only tables. Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered.
Here's a few things I've found useful that aren't config or performance related per se.
To see what's currently happening:
select * from pg_stat_activity;
Search misc functions:
select * from pg_proc WHERE proname ~* '^pg_.*'
Find size of database:
select pg_database_size('postgres');
select pg_size_pretty(pg_database_size('postgres'));
Find size of all databases:
select datname, pg_size_pretty(pg_database_size(datname)) as size
from pg_database;
Find size of tables and indexes:
select pg_size_pretty(pg_relation_size('public.customer'));
Or, to list all tables and indexes (probably easier to make a view of this):
select schemaname, relname,
pg_size_pretty(pg_relation_size(schemaname || '.' || relname)) as size
from (select schemaname, relname, 'table' as type
from pg_stat_user_tables
union all
select schemaname, relname, 'index' as type
from pg_stat_user_indexes) x;
Oh, and you can nest transactions, rollback partial transactions++
test=# begin;
BEGIN
test=# select count(*) from customer where name='test';
count
-------
0
(1 row)
test=# insert into customer (name) values ('test');
INSERT 0 1
test=# savepoint foo;
SAVEPOINT
test=# update customer set name='john';
UPDATE 3
test=# rollback to savepoint foo;
ROLLBACK
test=# commit;
COMMIT
test=# select count(*) from customer where name='test';
count
-------
1
(1 row)
The easiest trick to let postgresql perform a lot better (apart from setting and using proper indexes of course) is just to give it more RAM to work with (if you have not done so already). On most default installations the value for shared_buffers is way too low (in my opinion). You can set
shared_buffers
in postgresql.conf. Divide this number by 128 to get an approximation of the amount of memory (in MB) postgres can claim. If you up it enough this will make postgresql fly. Don't forget to restart postgresql.
On Linux systems, when postgresql won't start again you will probably have hit the kernel.shmmax limit. Set it higher with
sysctl -w kernel.shmmax=xxxx
To make this persist between boots, add a kernel.shmmax entry to /etc/sysctl.conf.
A whole bunch of Postgresql tricks can be found here:
http://www.postgres.cz/index.php/PostgreSQL_SQL_Tricks
Postgres has a very powerful datetime handling facility thanks to its INTERVAL support.
For example:
select NOW(), NOW() + '1 hour';
now | ?column?
-------------------------------+-------------------------------
2009-04-18 01:37:49.116614+00 | 2009-04-18 02:37:49.116614+00
(1 row)
select current_date ,(current_date + interval '1 year')::date;
date | date
---------------------+----------------
2014-10-17 | 2015-10-17
(1 row)
You can cast many strings to an INTERVAL type.
COPY
I'll start. Whenever I switch to Postgres from SQLite, I usually have some really big datasets. The key is to load your tables with COPY FROM rather than doing INSERTS. See documentation:
http://www.postgresql.org/docs/8.1/static/sql-copy.html
The following example copies a table to the client using the vertical bar (|) as the field delimiter:
COPY country TO STDOUT WITH DELIMITER '|';
To copy data from a file into the country table:
COPY country FROM '/usr1/proj/bray/sql/country_data';
See also here:
Faster bulk inserts in sqlite3?
My by far favorite is generate_series: at last a clean way to generate dummy rowsets.
Ability to use a correlated value in a LIMIT clause of a subquery:
SELECT (
SELECT exp_word
FROM mytable
OFFSET id
LIMIT 1
)
FROM othertable
Abitlity to use multiple parameters in custom aggregates (not covered by the documentation): see the article in my blog for an example.
One of the things I really like about Postgres is some of the data types supported in columns. For example, there are column types made for storing Network Addresses and Arrays. The corresponding functions (Network Addresses / Arrays) for these column types let you do a lot of complex operations inside queries that you'd have to do by processing results through code in MySQL or other database engines.
Arrays are really cool once you get to know 'em.
Lets say you would like to store some hyper links between pages. You might start by thinking about creating a Table kinda like this:
CREATE TABLE hyper.links (
tail INT4,
head INT4
);
If you needed to index the tail column, and you had, say 200,000,000 links-rows (like wikipedia would give you), you would find yourself with a huge Table and a huge Index.
However, with PostgreSQL, you could use this Table format instead:
CREATE TABLE hyper.links (
tail INT4,
head INT4[],
PRIMARY KEY(tail)
);
To get all heads for a link you could send a command like this (unnest() is standard since 8.4):
SELECT unnest(head) FROM hyper.links WHERE tail = $1;
This query is surprisingly fast when it is compared with the first option (unnest() is fast and the Index is way way smaller). Furthermore, your Table and Index will take up much less RAM-memory and HD-space, especially when your Arrays are so long that they are compressed to a Toast Table. Arrays are really powerful.
Note: while unnest() will generate rows out of an Array, array_agg() will aggregate rows into an Array.
Materialized Views are pretty easy to setup:
CREATE VIEW my_view AS SELECT id, AVG(my_col) FROM my_table GROUP BY id;
CREATE TABLE my_matview AS SELECT * FROM my_view;
That creates a new table, my_matview, with the columns and values of my_view. Triggers or a cron script can then be setup to keep the data up to date, or if you're lazy:
TRUNCATE my_matview;
INSERT INTO my_matview SELECT * FROM my_view;
Inheritance..infact Multiple Inheritance (as in parent-child "inheritance" not 1-to-1 relation inheritance which many web frameworks implement when working with postgres).
PostGIS (spatial extension), a wonderful add-on that offers comprehensive set of geometry functions and coordinates storage out of the box. Widely used in many open-source geo libs (e.g. OpenLayers,MapServer,Mapnik etc) and definitely way better than MySQL's spatial extensions.
Writing procedures in different languages e.g. C, Python,Perl etc (makes your life easir to code if you're a developer and not a db-admin).
Also all procedures can be stored externally (as modules) and can be called or imported at runtime by specified arguments. That way you can source control the code and debug the code easily.
A huge and comprehensive catalogue on all objects implemented in your database (i.e. tables,constraints,indexes,etc).
I always find it immensely helpful to run few queries and get all meta info e.g. ,constraint names and fields on which they have been implemented on, index names etc.
For me it all becomes extremely handy when I have to load new data or do massive updates in big tables (I would automatically disable triggers and drop indexes) and then recreate them again easily after processing has finished. Someone did an excellent job of writing handful of these queries.
http://www.alberton.info/postgresql_meta_info.html
Multiple schemas under one database, you can use it if your database has large number of tables, you can think of schemas as categories. All tables (regardless of it's schema) have access to all other tables and functions present in parent db.
You don't need to learn how to decipher "explain analyze" output, there is a tool: http://explain.depesz.com
select pg_size_pretty(200 * 1024)
pgcrypto: more cryptographic functions than many programming languages' crypto modules provide, all accessible direct from the database. It makes cryptographic stuff incredibly easy to Just Get Right.
A database can be copied with:
createdb -T old_db new_db
The documentation says:
this is not (yet) intended as a general-purpose "COPY DATABASE" facility
but it works well for me and is much faster than
createdb new_db
pg_dump old_db | psql new_db
Memory storage for throw-away data/global variables
You can create a tablespace that lives in the RAM, and tables (possibly unlogged, in 9.1) in that tablespace to store throw-away data/global variables that you'd like to share across sessions.
http://magazine.redhat.com/2007/12/12/tip-from-an-rhce-memory-storage-on-postgresql/
Advisory locks
These are documented in an obscure area of the manual:
http://www.postgresql.org/docs/9.0/interactive/functions-admin.html
It's occasionally faster than acquiring multitudes of row-level locks, and they can be used to work around cases where FOR UPDATE isn't implemented (such as recursive CTE queries).
This is my favorites list of lesser know features.
Transactional DDL
Nearly every SQL statement is transactional in Postgres. If you turn off autocommit the following is possible:
drop table customer_orders;
rollback;
select *
from customer_orders;
Range types and exclusion constraint
To my knowledge Postgres is the only RDBMS that lets you create a constraint that checks if two ranges overlap. An example is a table that contains product prices with a "valid from" and "valid until" date:
create table product_price
(
price_id serial not null primary key,
product_id integer not null references products,
price numeric(16,4) not null,
valid_during daterange not null
);
NoSQL features
The hstore extension offers a flexible and very fast key/value store that can be used when parts of the database need to be "schema-less". JSON is another option to store data in a schema-less fashion and
insert into product_price
(product_id, price, valid_during)
values
(1, 100.0, '[2013-01-01,2014-01-01)'),
(1, 90.0, '[2014-01-01,)');
-- querying is simply and can use an index on the valid_during column
select price
from product_price
where product_id = 42
and valid_during #> date '2014-10-17';
The execution plan for the above on a table with 700.000 rows:
Index Scan using check_price_range on public.product_price (cost=0.29..3.29 rows=1 width=6) (actual time=0.605..0.728 rows=1 loops=1)
Output: price
Index Cond: ((product_price.valid_during #> '2014-10-17'::date) AND (product_price.product_id = 42))
Buffers: shared hit=17
Total runtime: 0.772 ms
To avoid inserting rows with overlapping validity ranges a simple (and efficient) unique constraint can be defined:
alter table product_price
add constraint check_price_range
exclude using gist (product_id with =, valid_during with &&)
Infinity
Instead of requiring a "real" date far in the future Postgres can compare dates to infinity. E.g. when not using a date range you can do the following
insert into product_price
(product_id, price, valid_from, valid_until)
values
(1, 90.0, date '2014-01-01', date 'infinity');
Writeable common table expressions
You can delete, insert and select in a single statement:
with old_orders as (
delete from orders
where order_date < current_date - interval '10' year
returning *
), archived_rows as (
insert into archived_orders
select *
from old_orders
returning *
)
select *
from archived_rows;
The above will delete all orders older than 10 years, move them to the archived_orders table and then display the rows that were moved.
1.) When you need append notice to query, you can use nested comment
SELECT /* my comments, that I would to see in PostgreSQL log */
a, b, c
FROM mytab;
2.) Remove Trailing spaces from all the text and varchar field in a database.
do $$
declare
selectrow record;
begin
for selectrow in
select
'UPDATE '||c.table_name||' SET '||c.COLUMN_NAME||'=TRIM('||c.COLUMN_NAME||') WHERE '||c.COLUMN_NAME||' ILIKE ''% '' ' as script
from (
select
table_name,COLUMN_NAME
from
INFORMATION_SCHEMA.COLUMNS
where
table_name LIKE 'tbl%' and (data_type='text' or data_type='character varying' )
) c
loop
execute selectrow.script;
end loop;
end;
$$;
3.) We can use a window function for very effective removing of duplicate rows:
DELETE FROM tab
WHERE id IN (SELECT id
FROM (SELECT row_number() OVER (PARTITION BY column_with_duplicate_values), id
FROM tab) x
WHERE x.row_number > 1);
Some PostgreSQL's optimized version (with ctid):
DELETE FROM tab
WHERE ctid = ANY(ARRAY(SELECT ctid
FROM (SELECT row_number() OVER (PARTITION BY column_with_duplicate_values), ctid
FROM tab) x
WHERE x.row_number > 1));
4.) When we need to identify server's state, then we can use a function:
SELECT pg_is_in_recovery();
5.) Get functions's DDL command.
select pg_get_functiondef((select oid from pg_proc where proname = 'f1'));
6.) Safely changing column data type in PostgreSQL
create table test(id varchar );
insert into test values('1');
insert into test values('11');
insert into test values('12');
select * from test
--Result--
id
character varying
--------------------------
1
11
12
You can see from the above table that I have used the data type – ‘character varying’ for ‘id’
column. But it was a mistake, because I am always giving integers as id. So using varchar here is a
bad practice. So let’s try to change the column type to integer.
ALTER TABLE test ALTER COLUMN id TYPE integer;
But it returns:
ERROR: column “id” cannot be cast automatically to type integer SQL
state: 42804 Hint: Specify a USING expression to perform the
conversion
That means we can’t simply change the data type because data is already there in the column. Since the data is of type ‘character varying’ postgres cant expect it as integer though we entered integers only. So now, as postgres suggested we can use the ‘USING’ expression to cast our data into integers.
ALTER TABLE test ALTER COLUMN id TYPE integer USING (id ::integer);
It Works.
7.) Know who is connected to the Database
This is more or less a monitoring command. To know which user connected to which database
including their IP and Port use the following SQL:
SELECT datname,usename,client_addr,client_port FROM pg_stat_activity ;
8.) Reloading PostgreSQL Configuration files without Restarting Server
PostgreSQL configuration parameters are located in special files like postgresql.conf and pg_hba.conf. Often, you may need to change these parameters. But for some parameters to take effect we often need to reload the configuration file. Of course, restarting server will do it. But in a production environment it is not preferred to restarting the database, which is being used by thousands, just for setting some parameters. In such situations, we can reload the configuration files without restarting the server by using the following function:
select pg_reload_conf();
Remember, this wont work for all the parameters, some parameter
changes need a full restart of the server to be take in effect.
9.) Getting the data directory path of the current Database cluster
It is possible that in a system, multiple instances(cluster) of PostgreSQL is set up, generally, in different ports or so. In such cases, finding which directory(physical storage directory) is used by which instance is a hectic task. In such cases, we can use the following command in any database in the cluster of our interest to get the directory path:
SHOW data_directory;
The same function can be used to change the data directory of the cluster, but it requires a server restarts:
SET data_directory to new_directory_path;
10.) Find a CHAR is DATE or not
create or replace function is_date(s varchar) returns boolean as $$
begin
perform s::date;
return true;
exception when others then
return false;
end;
$$ language plpgsql;
Usage: the following will return True
select is_date('12-12-2014')
select is_date('12/12/2014')
select is_date('20141212')
select is_date('2014.12.12')
select is_date('2014,12,12')
11.) Change the owner in PostgreSQL
REASSIGN OWNED BY sa TO postgres;
12.) PGADMIN PLPGSQL DEBUGGER
Well explained here
It's convenient to rename an old database rather than mysql can do. Just using the following command:
ALTER DATABASE name RENAME TO new_name

Resources