Execute a Postgres trigger 24 hours after record insertion?

I'm trying to execute a trigger 24 hours after a record is inserted, for each row. How can I do this? Please help.
The idea is that if a user doesn't verify their email address, their account gets deleted.
Without cron and the like. PostgreSQL database.

This may help in relation to scheduling an SQL command to run at a certain time: pgAgent.
This is the command I tested, and you could use it in a pgAgent job:
delete from email_tbl
where email_id in (select email_id from email_tbl
                   where timestamp < now() - '1 day'::interval);
Here is the test schema I used:
CREATE EXTENSION citext;
CREATE DOMAIN email_addr AS citext
CHECK(
    VALUE ~ '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$'
);
CREATE TABLE email_tbl (
    email_id SERIAL PRIMARY KEY,
    email_addr email_addr NOT NULL UNIQUE,
    timestamp timestamp default current_timestamp
);
And here's some test data
insert into email_tbl (email_addr) values ('me@home.net');
insert into email_tbl (email_addr, timestamp)
values ('me2@home.net', '2015-07-15 00:00:00'::timestamp);
select * from email_tbl where timestamp < now() - '1 day'::interval;
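If you want the scheduled command itself to stay trivial, one option (just a sketch; the function name purge_unverified_emails is made up for illustration) is to wrap the cleanup in a function and have the pgAgent job simply call it:
CREATE OR REPLACE FUNCTION purge_unverified_emails() RETURNS void
LANGUAGE sql AS $$
    -- remove addresses that are more than one day old
    DELETE FROM email_tbl
    WHERE timestamp < now() - interval '1 day';
$$;
-- the scheduled job then only needs to run:
-- SELECT purge_unverified_emails();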
All the best

Related

PostgreSQL query won't finish on standby server

We have a primary-standby setup on PostgreSQL 12.1. A DELETE query run on the primary server takes time but finishes completely; however, the same query on the standby does not finish (it runs seemingly forever):
DELETE FROM EVENTS
WHERE Event_Type_ID != 2
AND last_update_time <= '2020-11-04'
AND Event_ID NOT IN ( SELECT DISTINCT Event_ID FROM Event_Association )
AND Event_ID NOT IN ( SELECT DISTINCT Event_ID FROM EVENTS WHERE Last_Update_Time > '2020-11-14');
The execution plan is as follows (with the DELETE replaced by the equivalent SELECT):
https://explain.depesz.com/s/GZp7
There is an index on EVENTS.Event_ID and on Event_Association.Event_ID, yet the delete query still won't finish on the standby server.
The EVENTS table has more than 2 million rows and the Event_Association table has more than 300,000 rows.
Can someone help me resolve this issue?
Thanks

Read amount on a postgres table

Is there any way to calculate the number of reads per second on a Postgres table?
What I really need to know is whether a table is being read at all at the moment (if not, I can safely drop it).
Thank you
To figure out if the table is currently in use, run:
SELECT pid
FROM pg_locks
WHERE relation = 'mytable'::regclass;
That will return the process IDs of all backends currently using it.
To measure whether a table is used at all, run this query:
SELECT seq_scan + idx_scan + n_tup_ins + n_tup_upd + n_tup_del
FROM pg_stat_user_tables
WHERE relname = 'mytable';
Then repeat the query in a day. If the numbers haven't changed, nobody has used the table.
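If you want to automate that comparison, here is a minimal sketch (the snapshot table usage_snapshot is made up for illustration):
-- take a snapshot of the counters now
CREATE TABLE usage_snapshot AS
SELECT relname,
       seq_scan + idx_scan + n_tup_ins + n_tup_upd + n_tup_del AS activity
FROM pg_stat_user_tables
WHERE relname = 'mytable';
-- a day later: a non-empty result means the table was touched in the meantime
SELECT t.relname
FROM pg_stat_user_tables t
JOIN usage_snapshot s ON s.relname = t.relname
WHERE t.seq_scan + t.idx_scan + t.n_tup_ins + t.n_tup_upd + t.n_tup_del <> s.activity;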
Audit SELECT activity
My suggestion is to wrap mytable in a view (called the_view_to_use_instead in the example) that invokes a logging function on every select, and then always select from the view, i.e.
select <whatever you need> from the_view_to_use_instead ...
instead of
select <whatever you need> from mytable ...
So here it is
create table audit_log (table_name text, event_time timestamptz);
create function log_audit_event(tname text) returns void language sql as
$$
insert into audit_log values (tname, now());
$$;
create view the_view_to_use_instead as
select mytable.*
from mytable, log_audit_event('mytable') as ignored;
Every time someone queries the_view_to_use_instead an audit record with a timestamp appears in table audit_log. You can then examine it in order to find out whether and when mytable was selected from and make your decision. Function log_audit_event can be reused in other similar scenarios. The average number of selects per second over the last 24 hours would be
select count(*)::numeric/86400
from audit_log
where event_time > now() - interval '86400 seconds';
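To convince yourself the mechanism works (assuming the objects were created as above), run any query against the view and then look at the newest audit record:
SELECT * FROM the_view_to_use_instead;
SELECT * FROM audit_log ORDER BY event_time DESC LIMIT 1;
-- should show ('mytable', <time of the select above>)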

Oracle where clause date selection does not work

Basically, my problem can be re-created using the following script in oracle db:
create table test
(
current_date date
);
insert into test(current_date) values( TO_DATE('2018-02-01', 'yyyy-MM-dd') );
insert into test(current_date) values( TO_DATE('2018-03-01', 'yyyy-MM-dd') );
insert into test(current_date) values( TO_DATE('2018-04-01', 'yyyy-MM-dd') );
-- select data from May 2018 onwards
select * from test where current_date >= TO_DATE('2018-05-01', 'yyyy-MM-dd');
But all three dates come out in the result. Why? Did I do something wrong here?
2/1/2018 12:00:00 AM
3/1/2018 12:00:00 AM
4/1/2018 12:00:00 AM
It's because CURRENT_DATE is an Oracle built-in function that returns the current date (and time). The way Oracle resolves names means the built-in reference trumps your column name.
One way to fix it would be to use a table alias in your query:
select * from test t
where t.current_date >= TO_DATE('2018-05-01', 'yyyy-MM-dd') ;
This tells Oracle you're referencing the column name not the built-in.
Obviously the better solution is to change your table so you don't have a column name which clashes with an Oracle built-in.
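For example, something along these lines should work (the new name record_date is only an illustration; remember to adjust the application code as well):
ALTER TABLE test RENAME COLUMN current_date TO record_date;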

How to get the most recent queries in Oracle DB

I have a web application and I suspect someone has deleted some records manually. Upon enquiry, nobody is admitting the mistake. How can I find out at what time those records were deleted? Is it possible to get the history of DELETE queries?
If you have access to the v$ views, you can use the following query; the time is reported in the FIRST_LOAD_TIME column.
select *
from v$sql v
where upper(sql_text) like '%DELETE%';
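For example, to see the statement text alongside its load time (SQL_TEXT, FIRST_LOAD_TIME and LAST_ACTIVE_TIME are existing v$sql columns; the table-name filter is just an illustration):
select sql_text, first_load_time, last_active_time
from v$sql
where upper(sql_text) like '%DELETE%MY_TABLE%'
order by last_active_time desc;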
If flashback query is enabled for your database (try it with select * from table as of timestamp sysdate - 1), it may be possible to determine the exact time the records were deleted. Use the as of timestamp clause and adjust the timestamp as necessary to narrow down to a window in which the records still existed and then no longer did.
For example
select *
from table
as of timestamp to_date('21102016 09:00:00', 'DDMMYYYY HH24:MI:SS')
where id = XXX; -- indicates record still exists
select *
from table
as of timestamp to_date('21102016 09:00:10', 'DDMMYYYY HH24:MI:SS')
where id = XXX; -- indicates record does not exist
-- conclusion: record was deleted in this 10 second window

PostgreSQL multi-layer partitioning

I have been using partitioning with a PostgreSQL database for a while. My database has grown quite a lot, and partitioning has coped with that nicely. Unfortunately, I now seem to have hit another speed barrier and am trying to figure out ways to speed up the database even more.
My basic setup is as follows:
I have one master table called database_data from which all the partitions inherit. I chose to have one partition per month and name them like: database_data_YYYY_MM which works nicely.
By analyzing my data usage, I noticed that I mostly do insert operations on the table and only some updates. The updates, however, occur only on a certain kind of row: I have a column called channel_id (a FK to another table). The rows I update always have a channel_id out of a set of maybe 50 IDs, so this would be a great way of distinguishing the rows that are never updated from the ones that potentially are.
I figured it would speed up my setup further if I used partitioning to keep one table of insert-only data and one of potentially updated data per month, as my updates would have to check fewer rows each time.
I could of course use the "simple" partitioning I am using now and add another table for each month called database_data_YYYY_MM_update, and add the special constraints to that and to the database_data_YYYY_MM table so that the query planner can distinguish between the tables.
However, I sometimes have operations that work on all data of a given month, whether updatable or not. In such a case I could combine the two tables (with a UNION), but there might be an easier way for such queries.
So now to my real question:
Is "two layer" partitioning possible in PostgreSQL? What I mean by that is, that instead of having two tables for each month inheriting from the master table, I would only have one table per month directly inheriting from the master table e.g. database_data_YYYY_MM and then have two more tables inheriting from that table, one for the insert only data e.g. database_data_YYYY_MM_insert and one for the updateable data e.g. database_data_YYYY_MM_update.
Would this speed up the query planning at all? I would guess that it would be faster if the query planner could eliminate both tables at once if the intermediate table was eliminated.
The obvious advantage here would be that I could operate on all data of one month by simply using the table database_data_YYYY_MM and for my updates use the child table directly.
Any drawbacks that I am not thinking of?
Thank you for your thoughts.
Edit 1:
I don't think a schema is strictly necessary to answer my question but if it helps understanding I'll provide a sample schema:
CREATE TABLE database_data (
    id bigint PRIMARY KEY,
    channel_id bigint, -- This is a FK to another table
    timestamp TIMESTAMP WITH TIME ZONE,
    value DOUBLE PRECISION
);
I have a trigger on the database_data table that generates the partitions on demand:
CREATE OR REPLACE FUNCTION function_insert_database_data() RETURNS TRIGGER AS $BODY$
DECLARE
    thistablename TEXT;
    thisyear      INTEGER;
    thismonth     INTEGER;
    nextmonth     INTEGER;
    nextyear      INTEGER;
BEGIN
    -- determine year and month of the timestamp
    thismonth = extract(month from NEW.timestamp AT TIME ZONE 'UTC');
    thisyear  = extract(year from NEW.timestamp AT TIME ZONE 'UTC');
    -- determine the next month for the timespan in the check constraint
    nextyear  = thisyear;
    nextmonth = thismonth + 1;
    IF (nextmonth >= 13) THEN
        nextmonth = nextmonth - 12;
        nextyear  = nextyear + 1;
    END IF;
    -- assemble the table name
    thistablename = 'database_datanew_' || thisyear || '_' || thismonth;
    -- Loop until the insert is successful, to catch the case where another connection
    -- simultaneously creates the table; if that happens, we can retry inserting the data.
    LOOP
        -- try to insert into the partition
        BEGIN
            EXECUTE 'INSERT INTO ' || quote_ident(thistablename) || ' SELECT ($1).*' USING NEW;
            -- RETURN NEW inserts the data into the main table as well, so insert statements
            -- can still return values via "INSERT INTO ... RETURNING *".
            -- This requires another trigger to delete the duplicate row again afterwards.
            RETURN NEW;
        EXCEPTION
            -- if the partition does not exist yet, create it
            WHEN UNDEFINED_TABLE THEN
                BEGIN
                    -- create the partition with a check constraint on the timestamp
                    EXECUTE 'CREATE TABLE ' || thistablename || ' (CHECK ( timestamp >= TIMESTAMP WITH TIME ZONE '''|| thisyear || '-'|| thismonth ||'-01 00:00:00+00''
                        AND timestamp < TIMESTAMP WITH TIME ZONE '''|| nextyear || '-'|| nextmonth ||'-01 00:00:00+00'' ), PRIMARY KEY (id)
                        ) INHERITS (database_data)';
                    -- add any triggers and indexes the partition might need here
                    -- insert the new data into the new partition
                    EXECUTE 'INSERT INTO ' || quote_ident(thistablename) || ' SELECT ($1).*' USING NEW;
                    RETURN NEW;
                EXCEPTION WHEN DUPLICATE_TABLE THEN
                    -- another session seems to have created the table already; simply loop again
                END;
            -- don't insert anything on other errors
            WHEN OTHERS THEN
                RETURN NULL;
        END;
    END LOOP;
END;
$BODY$
LANGUAGE plpgsql;
CREATE TRIGGER trigger_insert_database_data
BEFORE INSERT ON database_data
FOR EACH ROW EXECUTE PROCEDURE function_insert_database_data();
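To illustrate how the trigger behaves, here is a hypothetical insert (as noted in the comments above, a second trigger would still be needed to remove the duplicate row from the parent table):
INSERT INTO database_data (id, channel_id, timestamp, value)
VALUES (1, 2, '2015-11-03 12:00:00+00', 42.0);
-- creates database_datanew_2015_11 on first use and routes the row into it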
As for sample data: let's assume we have only two channels, 1 and 2. Channel 1 holds insert-only data and channel 2 holds updatable data.
My two layer approach would be something like:
Main table:
CREATE TABLE database_data (
    id bigint PRIMARY KEY,
    channel_id bigint, -- This is a FK to another table
    timestamp TIMESTAMP WITH TIME ZONE,
    value DOUBLE PRECISION
);
Intermediate table:
CREATE TABLE database_data_2015_11 (
    CHECK ( timestamp >= TIMESTAMP WITH TIME ZONE '2015-11-01 00:00:00+00'
        AND timestamp < TIMESTAMP WITH TIME ZONE '2015-12-01 00:00:00+00' ),
    PRIMARY KEY (id)
) INHERITS (database_data);
Partitions:
CREATE TABLE database_data_2015_11_insert (
    CHECK (channel_id = 1),
    PRIMARY KEY (id)
) INHERITS (database_data_2015_11);
CREATE TABLE database_data_2015_11_update (
    CHECK (channel_id = 2),
    PRIMARY KEY (id)
) INHERITS (database_data_2015_11);
Of course I would then need another trigger on the intermediate table to create the child tables on demand.
It's a clever idea, but sadly it doesn't seem to work. If I have a parent table with 1000 direct children and run a SELECT that should pull from just one child, EXPLAIN ANALYZE gives me a planning time of around 16 ms. On the other hand, if I have just 10 direct children, each of which has 10 children, and those each have 10 children, I get a planning time of about 29 ms. I was surprised; I really thought it would work!
Here is the Ruby code I used to generate my tables:
0.upto(999) do |i|
  if i % 100 == 0
    min_group_id = i
    max_group_id = min_group_id + 100
    puts "CREATE TABLE datapoints_#{i}c (check (group_id > #{min_group_id} and group_id <= #{max_group_id})) inherits (datapoints);"
  end
  if i % 10 == 0
    min_group_id = i
    max_group_id = min_group_id + 10
    puts "CREATE TABLE datapoints_#{i}x (check (group_id > #{min_group_id} and group_id <= #{max_group_id})) inherits (datapoints_#{i / 100 * 100}c);"
  end
  puts "CREATE TABLE datapoints_#{i + 1} (check (group_id = #{i + 1})) inherits (datapoints_#{i / 10 * 10}x);"
end
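For reference, a sketch of how those planning times can be observed (the group_id value is arbitrary): run
EXPLAIN (ANALYZE) SELECT * FROM datapoints WHERE group_id = 42;
and compare the "Planning time" line in the output between the flat and the nested layout.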
