Snowflake show tables not accessed in last 20 days

There is a situation where I need to clean up my databases in Snowflake.
We have around 40 databases and each database has more than 100 tables. Some are loaded every day and some are not, but they are used every day.
However, lots of tables have been added for testing and other purposes (by many developers and users).
Now we are working on cleaning up unused tables.
We have the query_history view, which gives us information about queries run in the past; however, it has fields such as database, warehouse, user, etc., but not table.
I was wondering if there is any way we can write a query which gives us the table names not used (both DDL and DML) in the last 10 days.

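One option is the snowflake.account_usage.access_history view: flatten direct_objects_accessed and take the latest query_start_time per object to see when each table was last touched: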
select obj.value:objectName::string as objname,
       max(query_start_time) as query_date_time
from snowflake.account_usage.access_history,
     table(flatten(direct_objects_accessed)) obj
group by 1
order by query_date_time desc;
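To answer the original question directly, the same query can be filtered to objects whose latest access is older than 10 days; a minimal variant (keep in mind that access_history has some ingestion latency, and objects that were never accessed at all will not appear here):
select obj.value:objectName::string as objname,
       max(query_start_time) as last_access_time
from snowflake.account_usage.access_history,
     table(flatten(direct_objects_accessed)) obj
group by 1
having max(query_start_time) < dateadd('day', -10, current_timestamp())
order by last_access_time desc;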

The information schema has a TABLES view, and it has a LAST_ALTERED column; will that work for you? It will not give you the last-accessed table, but it will give you the last-altered one. Other than this, there is no easy way to get this information from Snowflake at this time. I also needed this feature; I think we should request it.
select table_schema,
       table_name,
       last_altered
from information_schema.tables
where table_type = 'BASE TABLE'
  and last_altered < dateadd('day', -10, current_timestamp())
order by table_schema,
         table_name;

Related


SQL Query to get total number of tables available in Snowflake account(including all DB and schemas)
You can query the account_usage.tables or information_schema.tables views to find the total number of tables:
select count(*) from information_schema.tables;
https://docs.snowflake.com/en/sql-reference/info-schema/tables.html
select count(*) from snowflake.account_usage.tables;
https://docs.snowflake.com/en/sql-reference/account-usage/tables.html
There are three ways:
1. You can query the view INFORMATION_SCHEMA.TABLES to find all tables of your current database. So you have to write a SELECT COUNT(*) FROM [database].INFORMATION_SCHEMA.TABLES for each of your databases, UNION ALL the results, and SUM() the per-database counts to get the total number of tables across all databases (see the sketch below).
2. You can query the view ACCOUNT_USAGE.TABLES to find all tables and views of your account. One row represents one table. As ACCOUNT_USAGE.TABLES also contains views, you have to add a WHERE clause on the attribute TABLE_TYPE. Keep in mind that this view may have a latency of up to 90 minutes.
3. Run SHOW TABLES IN ACCOUNT; to see all tables.
More info about INFORMATION_SCHEMA.TABLES: https://docs.snowflake.com/en/sql-reference/info-schema/tables.html
More info about ACCOUNT_USAGE.TABLES: https://docs.snowflake.com/en/sql-reference/account-usage/tables.html
More info about SHOW TABLES: https://docs.snowflake.com/en/sql-reference/sql/show-tables.html
Note: with all three ways you can only see objects for which your current role has access privileges.
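For illustration, a minimal sketch of the first approach, assuming just two databases named DB1 and DB2 (placeholders for your real database names):
select 'DB1' as database_name, count(*) as table_count
from DB1.information_schema.tables
where table_type = 'BASE TABLE'
union all
select 'DB2', count(*)
from DB2.information_schema.tables
where table_type = 'BASE TABLE';
Summing table_count across the rows gives the account-wide total. The ACCOUNT_USAGE variant is a single query, but it should filter on table_type and on deleted is null so that views and dropped tables are excluded.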

How do you efficiently pull data from multiple records into 1 record

I currently have data in a table related to transactions. Each record has a purchase ID, a transaction number, and up to 5 purchases assigned to the transaction. Associated with each purchase there can be up to 10 transactions. For the first transaction of each purchase I need a field that is a string of each unique purchase concatenated. My solution was slow; I estimated it would take 40 days to complete. What would be the most effective way to do this?
What you are looking for can be achieved in two steps:
Step 1: Extract the first transaction of each purchase
Depending on your table configuration, this can be done in a couple of different ways.
If your transaction IDs are sequential, you can use something like:
select a.*
from table a
inner join
  (select purchaseid, min(transactionid) as transactionid
   from table
   group by purchaseid) b
  on a.purchaseid = b.purchaseid
  and a.transactionid = b.transactionid
If there is a date variable driving the transaction sequence, then:
select a.*
from (select *,
             row_number() over (partition by purchaseid order by date) as rownum
      from table) a
where a.rownum = 1
Step 2: Concatenate the purchase details
This can be done with the STRING_AGG function if you are on SQL Server 2017 or later. If not, the following link highlights a couple of different ways you can do this:
Optimal way to concatenate/aggregate strings
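For illustration, a minimal STRING_AGG sketch, assuming a table named transactions with columns purchaseid and transactionid (all names here are placeholders for your actual schema):
select purchaseid,
       string_agg(cast(transactionid as varchar(20)), ',') as transaction_list
from transactions
group by purchaseid;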
Hope this helps.

How to access a table ABC_XXX constantly in Teradata where XXX changes periodically?

I have a table in Teradata named ABC_XXX, where XXX changes on a monthly basis.
For example: ABC_1902, ABC_1812, ABC_1904, etc.
I need to access this table in my application without modifying the code every month.
Is there any way to do this in Teradata, or is there an alternate solution? Please help.
Can you try using DBC.TABLES in a subquery like below:
with tbl as (
  select 'select * from ' || databasename || '.' || tablename as tb
  from dbc.tables
  where tablename like 'ABC_%'
)
select * from tbl;
If you can get the resulting query executed in your application, you will be able to query the required table without editing the query.
The above solution assumes that the previous month's table gets dropped whenever a new month's table is created.
However, if the previous table is not being dropped, then you can try the approach below:
select 'select * from db.ABC_' || to_char(current_date, 'YYMM')
The output will be:
select * from db.ABC_1902
Execute the output in your application and you will be able to query the dynamic table.
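Combining the two ideas, a hedged sketch that only generates the query when the current month's table actually exists (db is a placeholder schema name, and the YYMM suffix format is an assumption carried over from above):
select 'select * from ' || databasename || '.' || tablename as generated_query
from dbc.tablesv
where databasename = 'db'
  and tablename = 'ABC_' || to_char(current_date, 'YYMM');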

How to get the most recent queries in Oracle DB

I have a web application and I suspect someone has deleted some records manually. Upon enquiry, nobody is admitting the mistake. How can I find out at what time those records were deleted? Is it possible to get the history of delete queries?
If you have access to the v$ views then you can use the following query. The FIRST_LOAD_TIME column contains the time.
select *
from v$sql v
where upper(sql_text) like '%DELETE%';
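To narrow the search, a hedged refinement assuming the affected table is named MY_TABLE (a placeholder); v$sql also exposes LAST_ACTIVE_TIME and PARSING_SCHEMA_NAME, which help attribute the statement:
select sql_text, first_load_time, last_active_time, parsing_schema_name
from v$sql
where upper(sql_text) like '%DELETE%MY_TABLE%';
Keep in mind that v$sql only shows statements still cached in the shared pool, so an older delete may have aged out.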
If flashback query is enabled for your database (try it with select * from table as of timestamp sysdate - 1), then it may be possible to determine the exact time the records were deleted. Use the as of timestamp clause and adjust the timestamp as necessary to narrow down to a window in which the records still existed and then no longer did.
For example
select *
from table
as of timestamp to_date('21102016 09:00:00', 'DDMMYYYY HH24:MI:SS')
where id = XXX; -- indicates record still exists
select *
from table
as of timestamp to_date('21102016 09:00:10', 'DDMMYYYY HH24:MI:SS')
where id = XXX; -- indicates record does not exist
-- conclusion: record was deleted in this 10 second window
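If undo data for that window is still available, Oracle's Flashback Version Query can pinpoint the delete directly; a minimal sketch using the same placeholder table and id:
select versions_starttime, versions_endtime, versions_operation, id
from table
  versions between timestamp
    to_date('21102016 09:00:00', 'DDMMYYYY HH24:MI:SS')
    and to_date('21102016 09:00:10', 'DDMMYYYY HH24:MI:SS')
where id = XXX;
-- versions_operation = 'D' marks the delete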

joining latest of various usermetadata tags to user rows

I have a Postgres database with a user table (userid, firstname, lastname) and a usermetadata table (userid, code, content, created datetime). I store various information about each user in the usermetadata table by code and keep a full history. So, for example, a user (userid 15) has the following metadata:
15, 'QHS', '20', '2008-08-24 13:36:33.465567-04'
15, 'QHE', '8', '2008-08-24 12:07:08.660519-04'
15, 'QHS', '21', '2008-08-24 09:44:44.39354-04'
15, 'QHE', '10', '2008-08-24 08:47:57.672058-04'
I need to fetch a list of all my users and the most recent value of each of various usermetadata codes. I did this programmatically and it was, of course, godawfully slow. The best I could figure out in SQL was to join sub-selects, which were also slow, and I had to do one for each code.
This is actually not that hard to do in PostgreSQL because it has the "DISTINCT ON" clause in its SELECT syntax (DISTINCT ON isn't standard SQL).
SELECT DISTINCT ON (code) code, content, createtime
FROM metatable
WHERE userid = 15
ORDER BY code, createtime DESC;
That will limit the returned results to the first result per unique code, and if you sort the results by the create time descending, you'll get the newest of each.
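The question asks for all users at once; a hedged extension of the same DISTINCT ON idea, using the question's table and column names (user, usermetadata, created) rather than the answer's:
SELECT DISTINCT ON (u.userid, m.code)
       u.userid, u.firstname, u.lastname, m.code, m.content, m.created
FROM "user" u
JOIN usermetadata m ON m.userid = u.userid
ORDER BY u.userid, m.code, m.created DESC;
(user is a reserved word in PostgreSQL, hence the double quotes.)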
I suppose you're not willing to modify your schema, so I'm afraid my answer might not be of much help, but here goes...
One possible solution would be to leave the time field empty until the value is superseded by a newer one, at which point you insert the 'deprecation date'. Another way is to expand the table with an 'active' column, but that would introduce some redundancy.
The classic solution would be to have both 'Valid-From' and 'Valid-To' fields where the 'Valid-To' fields are blank until some other entry becomes valid. This can be handled easily by using triggers or similar. Using constraints to make sure there is only one item of each type that is valid will ensure data integrity.
Common to these is that there is a single way of determining the set of current fields. You'd simply select all entries with the active user and a NULL 'Valid-To' or 'deprecation date' or a true 'active'.
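In PostgreSQL, the "only one valid row per type" constraint can be expressed as a partial unique index; a minimal sketch, assuming a valid_to column marks deprecation (the column name is illustrative):
CREATE UNIQUE INDEX one_active_per_code
    ON usermetadata (userid, code)
    WHERE valid_to IS NULL;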
You might be interested in taking a look at the Wikipedia entry on temporal databases and the article A consensus glossary of temporal database concepts.
A subselect is the standard way of doing this sort of thing. You just need a Unique Constraint on UserId, Code, and Date - and then you can run the following:
SELECT *
FROM Table
JOIN (
    SELECT UserId, Code, MAX(Date) as LastDate
    FROM Table
    GROUP BY UserId, Code
) as Latest
    ON Table.UserId = Latest.UserId
    AND Table.Code = Latest.Code
    AND Table.Date = Latest.Date
WHERE Table.UserId = #userId
