Read amount on a Postgres table

Is there any way to calculate the number of reads per second on a Postgres table? What I really need to know is whether the table is being read at all at the moment. (If not, I can safely drop it.)
Thank you

To figure out if the table is currently in use, run
SELECT pid
FROM pg_locks
WHERE relation = 'mytable'::regclass;
That will return the process ID of all backends using it.
To measure whether a table is used at all or not, run this query:
SELECT seq_scan + idx_scan + n_tup_ins + n_tup_upd + n_tup_del
FROM pg_stat_user_tables
WHERE relname = 'mytable';
Then repeat the query a day later. If the numbers haven't changed, nobody has used the table.
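To automate that comparison, you could snapshot the counter into a table and diff it a day later; a minimal sketch (the name usage_snapshot is illustrative):
-- day 1: record the current counter for mytable
CREATE TABLE usage_snapshot AS
SELECT now() AS taken_at,
       seq_scan + idx_scan + n_tup_ins + n_tup_upd + n_tup_del AS activity
FROM pg_stat_user_tables
WHERE relname = 'mytable';

-- day 2: if before = after, the table has not been touched
SELECT s.activity AS before,
       t.seq_scan + t.idx_scan + t.n_tup_ins + t.n_tup_upd + t.n_tup_del AS after
FROM usage_snapshot s, pg_stat_user_tables t
WHERE t.relname = 'mytable';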

Audit SELECT activity
My suggestion is to wrap mytable in a view (called the_view_to_use_instead in the example) that invokes a logging function on every select, and then select from the view, i.e.
select <whatever you need> from the_view_to_use_instead ...
instead of
select <whatever you need> from mytable ...
So here it is
create table audit_log (table_name text, event_time timestamptz);
create function log_audit_event(tname text) returns void language sql as
$$
insert into audit_log values (tname, now());
$$;
create view the_view_to_use_instead as
select mytable.*
from mytable, log_audit_event('mytable') as ignored;
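As a quick smoke test that the logging works (assuming the objects above have been created):
select count(*) from the_view_to_use_instead;  -- any ordinary query through the view
select * from audit_log;                       -- shows one ('mytable', timestamp) row per select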
Every time someone queries the_view_to_use_instead, an audit record with a timestamp appears in the table audit_log. You can then examine it to find out whether and when mytable was selected from, and make your decision. The function log_audit_event can be reused in other similar scenarios. The average number of selects per second over the last 24 hours would be
select count(*)::numeric/86400
from audit_log
where event_time > now() - interval '86400 seconds';
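If audit_log grows large, an index on the timestamp keeps that rate query cheap, e.g.
create index on audit_log (event_time);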

Related

Update on key violation in Stored Procedure using BULK INSERT & Trigger

I have a stored procedure that performs a bulk insert of a large number of DNS log entries. I wish to summarise this raw data in a new table for analysis. The new table holds a single record per FQDN and record type, with a hit count.
Source table might include 100 rows of:
FQDN, Type
www.microsoft.com,A
Destination table would have:
FQDN, Type, HitCount
www.microsoft.com, A, 100
The SP establishes a unique ID made up of [FQDN] +'|'+ [Type], which is then used as the primary key in the destination table.
My plan was to have the SP fire a trigger that did an UPDATE...IF ##ROWCOUNT=0...INSERT. However, that of course failed because the trigger receives all the [inserted] rows as a single set so always throws a key violation error.
I'm having trouble getting my head around a solution and need some fresh eyes and better skills to take a look. The bulk insert SP works just fine and the raw data is exactly as desired. However, coming up with a suitable method to create the summary data is beyond my present skills/mindset.
I have several tens of TB of data to process, so I don't see the summary as something we could compute dynamically with a SELECT COUNT - which is why I started down the trigger route.
The relevant code in the SP is driven by a cursor consisting of a list of compressed log files needing to be decompressed and bulk-inserted, and is as follows:
-- Bulk insert to a view because bulk insert cannot populate the UID field
SET @strDynamicSQL = 'BULK INSERT [DNS_Raw_Logs].[dbo].[vwtblRawQueryLogData] FROM ''' + @strTarFolder + '\' + @strLogFileName + ''' WITH (FIRSTROW = 1, FIELDTERMINATOR = '' '', ROWTERMINATOR = ''0x0a'', ERRORFILE = ''' + @strTarFolder + '\' + @strErrorFile + ''', TABLOCK)'
--PRINT @strDynamicSQL
EXEC (@strDynamicSQL)
-- Update [UID] field after the bulk insert
UPDATE [DNS_Raw_Logs].[dbo].[tblRawQueryLogData]
SET [UID] = [FQDN] + '|' + [Type]
FROM [tblRawQueryLogData]
WHERE [UID] IS NULL
I know that the UPDATE ... IF @@ROWCOUNT = 0 ... INSERT solution is wrong because it assumes that the input data is a single row. I'd appreciate help on a way to do this.
Thank you
First, at that scale make sure you understand columnstore tables. They are very highly compressed and fast to scan.
Then write a query that reads from the raw table and returns the summarized data:
create or alter view DnsSummary
as
select FQDN, Type, count(*) HitCount
from tblRawQueryLogData
group by FQDN, Type
Then if querying that view directly is too expensive, write a stored procedure that loads a table after each bulk insert. Or make the view into an indexed view.
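For the indexed-view route, a minimal sketch; indexed views require WITH SCHEMABINDING, two-part table names and COUNT_BIG(*), and it is the unique clustered index that actually materializes the result (names here are illustrative):
create view dbo.DnsSummaryIndexed
with schemabinding
as
select FQDN, [Type], count_big(*) as HitCount
from dbo.tblRawQueryLogData
group by FQDN, [Type]
GO
create unique clustered index IX_DnsSummaryIndexed
on dbo.DnsSummaryIndexed (FQDN, [Type])
GO
Once the index exists, SQL Server maintains the summary automatically on every insert, which shifts the cost onto the bulk load.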
Thanks for the answer, David; obvious when someone else looks at it!
I ran the view-based solution with 14M records (about 4 hours' worth) and it took 40 seconds to return, so I think I'll modify the SP to drop and re-create the summary table each time it runs the bulk insert.
The source table also includes a timestamp for each entry. I would like to grab the earliest and latest times associated with each UID and add that to the summary.
My current summary query (courtesy of David) looks like this:
SELECT [UID], [FQDN], [Type], COUNT([UID]) AS [HitCount]
FROM [DNS_Raw_Logs].[dbo].tblRawQueryLogData
GROUP BY [UID], [FQDN], [Type]
ORDER BY COUNT([UID]) DESC
And returns:
UID, FQDN, Type, HitCount
www.microsoft.com|A, www.microsoft.com, A, 100
If I wanted to grab the earliest and latest times then I think I'm looking at nesting 3 queries to grab the earliest time (SELECT TOP N ... ORDER BY ... ASC), the latest time (SELECT TOP N ... ORDER BY ... DESC) and the hit count. Is there a more efficient way of doing this, before I try to wrap my head around that route?
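No nesting needed: MIN and MAX are aggregates too, so they can ride along in the same GROUP BY pass. A sketch, assuming the timestamp column is called [QueryTime] (adjust to the real name):
SELECT [UID], [FQDN], [Type],
       COUNT(*) AS [HitCount],
       MIN([QueryTime]) AS [FirstSeen],
       MAX([QueryTime]) AS [LastSeen]
FROM [DNS_Raw_Logs].[dbo].tblRawQueryLogData
GROUP BY [UID], [FQDN], [Type]
ORDER BY [HitCount] DESC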

How to access a table ABC_XXX constantly in Teradata where XXX changes periodically?

I have a table in Teradata ABC_XXX where XXX will change monthly basis.
For example: ABC_1902, ABC_1812, ABC_1904, etc.
I need to access this table in my application without modifying the code every month.
Is there any way to do this in Teradata, or any alternative solution?
Please help
Can you try using DBC.TABLES in a subquery like below:
with tbl as (
  select 'select * from ' || databasename || '.' || tablename as tb
  from dbc.tables
  where tablename like 'ABC_%'
)
select * from tbl;
If you can get the final query executed in your application, you will be able to query the required table without editing the query.
The above solution assumes that the previous month's table gets dropped whenever a new month's table is created.
However, if the previous table is not being dropped, then you can try the below approach:
select 'select * from db.ABC_' || to_char(current_date, 'YYMM')
Output will be
select * from db.ABC_1902
Execute the output in your application and you will be able to query the dynamic table.
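Another option is to keep the application pointed at a fixed view and rebuild that view each month with dynamic SQL inside a stored procedure; a sketch, assuming a recent Teradata release (the names db.ABC_CURRENT and Refresh_ABC_View are made up):
REPLACE PROCEDURE db.Refresh_ABC_View()
BEGIN
  DECLARE v_sql VARCHAR(500);
  -- builds e.g. REPLACE VIEW db.ABC_CURRENT AS SELECT * FROM db.ABC_1902
  SET v_sql = 'REPLACE VIEW db.ABC_CURRENT AS SELECT * FROM db.ABC_' ||
              TO_CHAR(CURRENT_DATE, 'YYMM') || ';';
  CALL DBC.SysExecSQL(:v_sql);
END;
The application then always selects from db.ABC_CURRENT.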

Select from a view using where clause on indexed column in Oracle Database

We have an Oracle database table called StockDailyQuote which contains billions of rows. There are two indexes: one on TradeDate (a Date field) and one on StockTicker (a string field). The table is partitioned, with an individual partition for each TradeDate. So we never query it with
SELECT * FROM StockDailyQuote
because it is impractical to get any result. Instead we always use a WHERE clause on TradeDate, and the speed is acceptable.
Now we have built a view on top of the StockDailyQuote table (joining some other tables to get useful information). Because we can't specify the TradeDate at this stage, our view looks like this (simplified):
create or replace view MyStockQuoteView as
SELECT t1.StockTicker, t2.CompanyName, t1.TradeDate, t1.ClosePrice
FROM StockDailyQuote t1
JOIN CompanyInstruction t2 ON t1.StockTicker = t2.StockTicker
And I always query this view with a WHERE clause on the TradeDate column. I thought that when Oracle sees my query SELECT * FROM MyStockQuoteView WHERE TradeDate = '20170726', it would be smart enough to push the indexed predicate down into the underlying SQL, so the query should be as fast as SELECT * FROM StockDailyQuote WHERE TradeDate = '20170726', assuming my other joins do not take much time. But it doesn't behave like that: it takes a really long time to return. I can only assume it evaluates the whole view and then applies the WHERE to the result, which makes this query impractical to use. So how can I solve my problem? One option is a procedure that runs daily and saves one day's data to a table. But are there other options?
It would be nice to see the query plan of SELECT * FROM MyStockQuoteView WHERE TradeDate = '20170726'.
Oracle can normally work out which partition it needs to read, even through a view.
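To verify that partition pruning happens, inspect the plan yourself (look at the Pstart/Pstop columns in the output):
EXPLAIN PLAN FOR
SELECT * FROM MyStockQuoteView WHERE TradeDate = DATE '2017-07-26';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
The DATE literal is deliberate: comparing TradeDate with the string '20170726' relies on an implicit conversion whose success depends on NLS settings.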
You can try to explicitly specify the hint, for example:
SELECT /*+ full(MyStockQuoteView.StockDailyQuote) */ * FROM MyStockQuoteView where TradeDate = '20170726'
(Note that a hint must open with /*+ with no space before the plus sign; otherwise Oracle treats it as an ordinary comment and ignores it.)

How to get the most recent queries in Oracle DB

I have a web application and I suspect somebody has deleted some records manually. Upon enquiry nobody is admitting the mistake. How do I find out at what time those records were deleted? Is it possible to get a history of DELETE queries?
If you have access to the v$ views then you can use the following query. The FIRST_LOAD_TIME column contains the time.
select *
from v$sql v
where upper(sql_text) like '%DELETE%';
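Since statements age out of the shared pool, it is worth also selecting LAST_ACTIVE_TIME and narrowing by the table name while the cursor is still cached (MYTABLE is a placeholder):
select sql_text, first_load_time, last_active_time
from v$sql
where upper(sql_text) like '%DELETE%MYTABLE%';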
If flashback query is enabled for your database (try it with select * from table as of timestamp sysdate - 1), then it may be possible to determine the exact time the records were deleted. Use the AS OF TIMESTAMP clause and adjust the timestamp as necessary to narrow down to a window in which the records still existed and then no longer existed.
For example
select *
from table
as of timestamp to_date('21102016 09:00:00', 'DDMMYYYY HH24:MI:SS')
where id = XXX; -- indicates record still exists
select *
from table
as of timestamp to_date('21102016 09:00:10', 'DDMMYYYY HH24:MI:SS')
where id = XXX; -- indicates record does not exist
-- conclusion: record was deleted in this 10 second window
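If your edition supports it, a flashback version query avoids the manual bisection entirely; within the undo retention window it reports each change to the row, with VERSIONS_OPERATION = 'D' marking the delete (table and XXX are placeholders as above):
select versions_starttime, versions_endtime, versions_operation
from table
versions between timestamp systimestamp - interval '1' day and systimestamp
where id = XXX;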

MAX keyword taking a lot of time to select a value from a column

Well, I have a table with 40,000,000+ records, but when I try to execute a simple query it takes ~3 minutes to finish. Since I am using the same query in my C# solution, which needs to execute it 100+ times, the overall performance of the solution is severely affected.
This is the query that I am using in a proc
DECLARE @Id bigint
SELECT @Id = MAX(ExecutionID) FROM ExecutionLog WHERE TestID = 50881
SELECT @Id
Any help to improve the performance would be great. Thanks.
What indexes do you have on the table? It sounds like you don't have anything even close to useful for this particular query, so I'd suggest trying to do:
CREATE INDEX IX_ExecutionLog_TestID ON ExecutionLog (TestID, ExecutionID)
...at the very least. Your query filters by TestID, so this needs to be the leading column in the composite index: if you have no index on TestID, then SQL Server will resort to scanning the entire table to find rows where TestID = 50881.
It may help to think of indexes on SQL tables like the index at the back of a big book: hierarchical and multi-level. If you were looking for something, you'd look under 'T' for TestID, and under TestID there would be a sub-heading for ExecutionID. Without an index entry for TestID, you'd have to read through the entire book looking for TestID, then see if there's a mention of ExecutionID with it. This is effectively what SQL Server has to do.
If you don't have any indexes yet, then you'll find it useful to review all the queries that hit the table, and to ensure that one of the new indexes is a clustered index (rather than non-clustered).
Try to re-work everything into something that works in a set based manner.
So, for instance, you could write a select statement like this:
;WITH OrderedLogs AS (
    SELECT ExecutionID, TestID,
           ROW_NUMBER() OVER (PARTITION BY TestID ORDER BY ExecutionID DESC) AS rn
    FROM ExecutionLog
)
SELECT * FROM OrderedLogs WHERE rn = 1 AND TestID IN (50881, 50882, 50883)
This would then find the maximum ExecutionID for 3 different tests simultaneously.
You might need to store that result in a table variable/temp table, but hopefully, instead, you can continue building up a larger, single, query, that processes all of the results in parallel.
This is the sort of processing that SQL is meant to be good at - don't cripple the system by iterating through the TestIDs in your code.
If you need to pass many test IDs into a stored procedure for this sort of query, look at Table Valued Parameters.
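A sketch of that approach (the type and procedure names are made up for illustration):
CREATE TYPE dbo.TestIdList AS TABLE (TestID int PRIMARY KEY);
GO
CREATE PROCEDURE dbo.GetLatestExecutions
    @TestIDs dbo.TestIdList READONLY
AS
BEGIN
    ;WITH OrderedLogs AS (
        SELECT e.ExecutionID, e.TestID,
               ROW_NUMBER() OVER (PARTITION BY e.TestID
                                  ORDER BY e.ExecutionID DESC) AS rn
        FROM ExecutionLog e
        JOIN @TestIDs t ON t.TestID = e.TestID
    )
    SELECT ExecutionID, TestID
    FROM OrderedLogs
    WHERE rn = 1;
END
GO
From C# the IDs go across as a single structured parameter (SqlDbType.Structured), so all 100+ lookups become one round trip.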
