Extract table names from query_text in Snowflake query history

I need to find the most-queried tables in Snowflake, so I want to extract the table names from query_text in the Snowflake query_history table. Is there a way to do this in SQL?

Instead of parsing query_text you could use ACCESS_HISTORY views.
Query could look like:
SELECT f1.value:"objectId" AS table_id, COUNT(*) AS cnt
FROM "SNOWFLAKE"."ACCOUNT_USAGE".access_history
,LATERAL flatten(base_objects_accessed) f1
WHERE f1.value:"objectDomain"::string='Table'
AND query_start_time >= dateadd('day', -30, current_timestamp()) -- last 30 days
GROUP BY table_id
ORDER BY cnt DESC;
The actual table name could be found in SNOWFLAKE.ACCOUNT_USAGE.TABLES using TABLE_ID column as a lookup.
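To make the flatten-and-count step concrete, here is a minimal Python sketch of the same logic applied to rows that have already been fetched. It assumes each base_objects_accessed value arrives as a JSON array string; the sample rows, the IDs and the helper name count_table_access are made up for illustration.

```python
import json
from collections import Counter

def count_table_access(base_objects_rows):
    """Count accesses per objectId, keeping only entries whose objectDomain is 'Table'."""
    counts = Counter()
    for raw in base_objects_rows:
        # Each row holds a JSON array, which is what LATERAL FLATTEN expands in SQL.
        for obj in json.loads(raw):
            if obj.get("objectDomain") == "Table":
                counts[obj["objectId"]] += 1
    return counts

# Hypothetical base_objects_accessed values, one per query.
sample_rows = [
    '[{"objectDomain": "Table", "objectId": 101, "objectName": "DB.SCH.TABLE_A"}]',
    '[{"objectDomain": "Table", "objectId": 101, "objectName": "DB.SCH.TABLE_A"},'
    ' {"objectDomain": "Stage", "objectId": 303, "objectName": "DB.SCH.STAGE_X"}]',
    '[{"objectDomain": "Table", "objectId": 202, "objectName": "DB.SCH.TABLE_B"}]',
]

access_counts = count_table_access(sample_rows)
ranked = access_counts.most_common()  # most accessed first, like ORDER BY cnt DESC
```

Doing the aggregation in SQL, as in the query above, is usually preferable; the sketch is only meant to show what the flatten, filter and group steps amount to.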

See also: Snowflake - View what tables and columns are queried the most
It is quite difficult to do this with high accuracy, but if getting reasonably close is acceptable, the following may be sufficient.
The query below does not handle joins, unions, subqueries, multiple CTEs, and so on. Those cases are difficult to cover with REGEXP and are better handled in Python, via the Snowflake Python connector, with a SQL parser such as pglast.
select REGEXP_SUBSTR(query_text, ' from ([^\\ ]*)', 1, 1, 'ie', 1) table_name,
count(*) query_count
from "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY"
group by 1
order by 2 desc;
TABLE_NAME QUERY_COUNT
table_a 95
"SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY" 5
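If the query texts are pulled into Python, a slightly broader regex can also pick up names after JOIN and more than one table per query. This is a rough sketch, not a parser: it still counts CTE names as tables, skips subqueries in the FROM clause, and can be fooled by expressions such as EXTRACT(day FROM ts). Stripping quotes and upper-casing names is a simplification, and the sample queries and the function name count_table_refs are made up for illustration.

```python
import re
from collections import Counter

# Identifier after FROM or JOIN: bare or double-quoted parts, optionally dot-separated.
TABLE_REF = re.compile(
    r'\b(?:from|join)\s+((?:"[^"]+"|\w+)(?:\.(?:"[^"]+"|\w+))*)',
    re.IGNORECASE,
)

def count_table_refs(query_texts):
    """Count how often each table-like name follows FROM or JOIN."""
    counts = Counter()
    for text in query_texts:
        for name in TABLE_REF.findall(text):
            # Simplification: drop quotes and upper-case so spellings of one name merge.
            counts[name.replace('"', "").upper()] += 1
    return counts

# Hypothetical query texts standing in for QUERY_HISTORY.query_text values.
sample_queries = [
    "select * from table_a",
    "SELECT a.x FROM table_a a JOIN db.sch.table_b b ON a.id = b.id",
    'select 1 from "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY"',
]

table_counts = count_table_refs(sample_queries)
```

For anything beyond a rough ranking, a real SQL parser or the ACCESS_HISTORY approach is the safer route.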

Related

SQL how to find distinct values between two tables?

We have an internal and an external table. The internal table is a copy of the external table with some fields renamed, so the two are roughly the same. However, the data in the internal table might not match the external table because of incorrect operations. Here is the case:
SELECT COUNT(*) AS [Total Rows]
FROM [dbo].[Auct_Car_Ex];
-- (ANS.) 76716
SELECT COUNT(*) AS [Total Rows]
FROM [dbo].[Auct_Car];
-- (ANS.) 76716
They have the same number of rows.
SELECT COUNT(DISTINCT([HORSEPOWER]))
FROM [dbo].[Auct_Car_ex];
-- (ANS.) 459
SELECT COUNT(DISTINCT([Horsepower]))
FROM [dbo].[Auct_Car];
-- (ANS.) 458
However, the number of distinct Horsepower is different. I'd like to know which value of HORSEPOWER exists in Auct_Car_ex but not in Auct_Car. How can I find it?
Just use EXCEPT
SELECT acx.HORSEPOWER
FROM dbo.Auct_Car_ex acx
EXCEPT
SELECT ac.Horsepower
FROM dbo.Auct_Car ac;
Yes, this is easy with a subquery:
SELECT [HORSEPOWER]
FROM [dbo].[Auct_Car_ex]
WHERE [HORSEPOWER] NOT IN (
SELECT [Horsepower]
FROM [dbo].[Auct_Car]
GROUP BY [Horsepower]
)
GROUP BY [HORSEPOWER]
This looks like a case for not exists
select horsepower
from Auct_Car_ex x
where not exists (
select * from Auct_Car a
where a.horsepower = x.horsepower
);
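The three approaches can be compared on a toy dataset. The sketch below uses Python's built-in sqlite3 as a stand-in for SQL Server, with made-up rows, so treat it as an illustration rather than a test against the real tables. It also shows one difference worth knowing: if the subquery column contains a NULL, NOT IN returns no rows at all, while EXCEPT and NOT EXISTS still find the missing value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Auct_Car_ex (HORSEPOWER INTEGER);
    CREATE TABLE Auct_Car (Horsepower INTEGER);
    -- Made-up data: 200 exists only in Auct_Car_ex; Auct_Car contains a NULL.
    INSERT INTO Auct_Car_ex VALUES (100), (150), (150), (200);
    INSERT INTO Auct_Car VALUES (100), (150), (NULL), (150);
""")

except_rows = conn.execute("""
    SELECT HORSEPOWER FROM Auct_Car_ex
    EXCEPT
    SELECT Horsepower FROM Auct_Car
""").fetchall()

not_in_rows = conn.execute("""
    SELECT DISTINCT HORSEPOWER FROM Auct_Car_ex
    WHERE HORSEPOWER NOT IN (SELECT Horsepower FROM Auct_Car)
""").fetchall()

not_exists_rows = conn.execute("""
    SELECT DISTINCT x.HORSEPOWER FROM Auct_Car_ex x
    WHERE NOT EXISTS (SELECT 1 FROM Auct_Car a WHERE a.Horsepower = x.HORSEPOWER)
""").fetchall()
```

With this data, EXCEPT and NOT EXISTS both return the single value 200, while NOT IN returns nothing because of the NULL in Auct_Car.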

Use of FireDAC's macrofunction LIMIT on subqueries

I want to know if there are any restrictions on using FireDAC's "LIMIT" macrofunction on subqueries like this:
SELECT TOP 10 * FROM TABLE1 WHERE NOT EXISTS( SELECT TOP 1 FIELD1 FROM TABLE2 )
With LIMIT applied it would be as follows:
SELECT {LIMIT(0,10)} * FROM TABLE1 WHERE NOT EXISTS( SELECT {LIMIT(0,1)} FIELD1 FROM TABLE2 )
If so, I would like to know what alternatives exist to limit the number of rows returned in a subquery considering that it is necessary to be compatible with more than one database manager (Oracle and SQL Server).
This is a simplified use case, in a real scenario it is expected to use this macrofunction in much more complex queries.
Thanks in advance.

Select distinct records from MS SQL database when querying row numbers

This query returns 5 identical products, because there are 5 keywords associated with the resulting product:
SELECT
products.field1,
products.field2
FROM products,
keywords
WHERE products.itemnum = keywords.itemnum
AND products.itemnum = 123
ORDER BY products.field1, products.field2
If I put a "distinct" after "select", then I get 1 result, which is what I want.
However, when I set up my query like this:
SELECT
*
FROM (SELECT
ROW_NUMBER() OVER (ORDER BY products.field1, products.field2) AS rownum,
products.field1,
products.field2
FROM products,
keywords
WHERE products.itemnum = keywords.itemnum
AND products.itemnum = 123) AS qryresults
WHERE rownum >= 1
AND rownum <= 20
I get 5 identical products again. There doesn't seem to be anywhere I can put a "distinct" to limit it to 1 result. I assume the reason is that each row gets a different row number, so the rows are no longer identical and "distinct" has nothing to remove.
I am using the technique shown in this query to limit potentially large search results to only 20 records at a time, which greatly reduces overhead and speeds up my query. So if there are 100,000 results, I can easily set this up to return records 90,000-90,020, for example.
MySQL has this kind of thing built-in, but with MS SQL this is the workaround.
However, I am having trouble figuring out how to make it work when I am combining the keywords table.
If I replace the * with a list of columns, then I get an error:
The multi-part identifier could not be bound.
I'm not sure what else to try. Is there a way to correct this?
Thank you.
Use a CTE to separate the distinct and the ROW_NUMBER() function:
with cte as (select distinct products.field1
, products.field2
from products, keywords
where products.itemnum=keywords.itemnum and products.itemnum=123),
row_n as (select field1
, field2
, row_number() over (order by field1, field2) as rownum
from cte)
select field1, field2
from row_n
where rownum>=1 and rownum<=20
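The effect of de-duplicating first and numbering afterwards can be checked on a tiny dataset. This sketch uses Python's sqlite3 module (window functions need SQLite 3.25 or newer) as a stand-in for MS SQL, with made-up rows and an explicit JOIN in place of the comma join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (itemnum INTEGER, field1 TEXT, field2 TEXT);
    CREATE TABLE keywords (itemnum INTEGER, keyword TEXT);
    -- Made-up data: one product with five keywords.
    INSERT INTO products VALUES (123, 'A', 'B');
    INSERT INTO keywords VALUES (123, 'k1'), (123, 'k2'), (123, 'k3'), (123, 'k4'), (123, 'k5');
""")

# Plain join: one output row per keyword.
joined = conn.execute("""
    SELECT p.field1, p.field2
    FROM products p
    JOIN keywords k ON p.itemnum = k.itemnum
    WHERE p.itemnum = 123
""").fetchall()

# DISTINCT in the first CTE, ROW_NUMBER() in the second, paging filter last.
paged = conn.execute("""
    WITH cte AS (
        SELECT DISTINCT p.field1, p.field2
        FROM products p
        JOIN keywords k ON p.itemnum = k.itemnum
        WHERE p.itemnum = 123
    ),
    row_n AS (
        SELECT field1, field2,
               ROW_NUMBER() OVER (ORDER BY field1, field2) AS rownum
        FROM cte
    )
    SELECT field1, field2 FROM row_n WHERE rownum BETWEEN 1 AND 20
""").fetchall()
```

Because the row numbers are assigned only after DISTINCT has collapsed the duplicates, the paged result contains the product once.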

sql server get several group by results from one query result

Is it possible to retrieve several group by results from one query?
Currently I have a free-text book-title search system which returns the top X rows:
First it queries the book-titles
SELECT TOP 16 grouped_location, grouped_author, book_title
FROM books
WHERE book_title like '%foo%'
then it queries the location group
SELECT grouped_location, COUNT(*)
FROM books
WHERE book_title like '%foo%'
GROUP BY grouped_location
then it queries the author group: ....
Is it possible to retrieve this information with one search?
I have no problem sending multiple commands to the SQL server, but the goal is for the server to perform only one search rather than using up resources by searching three times.
Please keep in mind that a client-side solution, which returns all records to the client and calculates the grouped results there, is not an option. Only the TOP X records may be returned, for performance reasons.
This query will give you row details, with a count by grouped_location for each row.
Change the ORDER BY to meet your requirements:
select top 16 grouped_location, grouped_author, book_title,
count(*) over (partition by grouped_location) as [count]
FROM books
WHERE book_title like '%foo%'
-- order by grouped_author or some other column
order by [count] desc
To see information grouped only by grouped_location, you could do something like this:
SELECT grouped_location , COUNT(*) Totals
FROM
(
SELECT TOP 16 grouped_location, grouped_author, book_title
FROM books
WHERE book_title like '%foo%'
ORDER BY Some_Column
) Q
GROUP BY grouped_location
To see information grouped by all the columns, you could do something like this:
SELECT grouped_location, grouped_author, book_title, COUNT(*) Totals
FROM
(
SELECT TOP 16 grouped_location, grouped_author, book_title
FROM books
WHERE book_title like '%foo%'
ORDER BY Some_Column
) Q
GROUP BY grouped_location, grouped_author, book_title
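The two styles of answer count different things, which a small experiment makes visible. The sketch below uses Python's sqlite3 (SQLite 3.25 or newer for window functions) with made-up rows, and LIMIT 2 in place of TOP 16 because SQLite has no TOP. The window count is computed over every matching row before the limit is applied, while the derived-table query counts only the rows that survived the limit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (book_title TEXT, grouped_location TEXT, grouped_author TEXT);
    -- Made-up data: four titles match '%foo%', three of them in 'north'.
    INSERT INTO books VALUES
        ('foo 1', 'north', 'ann'),
        ('foo 2', 'north', 'bob'),
        ('foo 3', 'south', 'ann'),
        ('foo 4', 'north', 'cat'),
        ('bar',   'north', 'ann');
""")

# Window count: evaluated over all matching rows, then the result is limited.
window_rows = conn.execute("""
    SELECT grouped_location, grouped_author, book_title,
           COUNT(*) OVER (PARTITION BY grouped_location) AS cnt
    FROM books
    WHERE book_title LIKE '%foo%'
    ORDER BY book_title
    LIMIT 2
""").fetchall()

# Derived table: the limit is applied first, then the surviving rows are grouped.
limited_rows = conn.execute("""
    SELECT grouped_location, COUNT(*) AS totals
    FROM (
        SELECT grouped_location
        FROM books
        WHERE book_title LIKE '%foo%'
        ORDER BY book_title
        LIMIT 2
    ) q
    GROUP BY grouped_location
""").fetchall()
```

Here the window version reports 3 for 'north' on both returned rows, whereas the derived-table version reports 2, so the choice depends on whether the counts should describe the whole search or only the returned page.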

SQL query that applies aggregate function over another aggregate function

I have several query results that use one or more aggregate functions and a date GROUP-BY so they look something like this:
Date VisitCount(COUNT) TotalBilling(SUM)
1/1/10 234 15765.21
1/2/10 321 23146.27
1/3/10 289 19436.51
The simplified SQL for the above is:
SELECT
VisitDate,
COUNT(*) AS VisitCount,
SUM(BilledAmount) AS TotalBilling
FROM Visits
GROUP BY VisitDate
What I would like is a way to apply an aggregate function such as AVG to one of the columns in the result set. For example, I would like to add "AvgVisits" and "AvgBilling" columns to the result set like this:
Date VisitCount(COUNT) TotalBilling(SUM) AvgVisits AvgBilling
1/1/10 234 15765.21 281.3 19449.33
1/2/10 321 23146.27 281.3 19449.33
1/3/10 289 19436.51 281.3 19449.33
SQL does not permit the application of an aggregate function to another aggregate function or a subquery, so the only ways I can think to do this are by using a temporary table or by iterating through the result set and manually calculating the values. Are there any ways I can do this in MSSQL2008 without a temp table or manual calculation?
with cteGrouped as (
SELECT
VisitDate,
COUNT(*) AS VisitCount,
SUM(BilledAmount) AS TotalBilling
FROM Visits
GROUP BY VisitDate),
cteTotal as (
SELECT COUNT(*) * 1.0 / COUNT(DISTINCT VisitDate) as AvgVisits, -- * 1.0 avoids integer division
SUM(BilledAmount)/COUNT(DISTINCT VisitDate) as AvgBilling
FROM Visits)
SELECT *
FROM cteGrouped
CROSS JOIN cteTotal;
You can achieve the same with sub-queries, I just find CTEs more expressive.
Something similar to this:
select *, avg(visitcount * 1.0) over() as AvgVisits, -- * 1.0 avoids an integer average
avg(totalbilling) over() as AvgBilling
from(
SELECT
VisitDate,
COUNT(*) AS VisitCount,
SUM(BilledAmount) AS TotalBilling
FROM Visits
GROUP BY VisitDate) as a
Well, if you are specifically trying to avoid temporary tables, I believe it could be done using a Common Table Expression.
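The window-function variant can be tried out quickly with Python's sqlite3 module (SQLite 3.25 or newer) as a stand-in for MSSQL. The rows below are made up, and SQLite's AVG always returns a floating-point value, so it does not reproduce SQL Server's integer handling; the sketch only shows that an empty OVER () repeats the overall average on every grouped row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Visits (VisitDate TEXT, BilledAmount REAL);
    -- Made-up data: 2 visits on the first day, 4 on the second.
    INSERT INTO Visits VALUES
        ('2010-01-01', 10), ('2010-01-01', 20),
        ('2010-01-02', 5), ('2010-01-02', 5), ('2010-01-02', 5), ('2010-01-02', 5);
""")

rows = conn.execute("""
    SELECT VisitDate, VisitCount, TotalBilling,
           AVG(VisitCount)   OVER () AS AvgVisits,
           AVG(TotalBilling) OVER () AS AvgBilling
    FROM (
        SELECT VisitDate,
               COUNT(*)          AS VisitCount,
               SUM(BilledAmount) AS TotalBilling
        FROM Visits
        GROUP BY VisitDate
    ) AS a
    ORDER BY VisitDate
""").fetchall()
```

Each row keeps its own daily count and total, with the same two averages (3.0 visits and 25.0 billed) appended.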
