Snowflake - View what tables and columns are queried the most

Snowflake - View what tables and columns are queried the most - snowflake-cloud-data-platform

is there any way within snowflake/sql query to view what tables are being queried the most as well as what columns? I want to know what data is of most value to my users and not sure how to do this programatically. Any thoughts are appreciated - thank you!

2021 update
The new ACCESS_HISTORY view has this information (in preview right now, enterprise edition).
For example, if you want to find the most used columns:
select obj.value:objectName::string objName
, col.value:columnName::string colName
, count(*) uses
, min(query_start_time) since
, max(query_start_time) until
from snowflake.account_usage.access_history
, table(flatten(direct_objects_accessed)) obj
, table(flatten(obj.value:columns)) col
group by 1, 2
order by uses desc
Ref: https://docs.snowflake.com/en/sql-reference/account-usage/access_history.html
2020 answer
The best I found (for now):
For any given query, you can find what tables are scanned through looking at the plan generated for it:
SELECT *, "objects"
FROM TABLE(EXPLAIN_JSON(SYSTEM$EXPLAIN_PLAN_JSON('SELECT * FROM a.b.any_table_or_view')))
WHERE "operation"='TableScan'
You can find all of your previous ran queries too:
select QUERY_TEXT
from table(information_schema.query_history())
So the natural next step would be combine both - but that's not straightforward, as you'll get an error like:
SQL compilation error: argument 1 to function EXPLAIN_JSON needs to be constant, found 'SYSTEM$EXPLAIN_PLAN_JSON('SELECT * FROM a.b.c')'
The solution would be to combine the queries from the query_history() with the SYSTEM$EXPLAIN_PLAN_JSON outside (to make the strings constant), and then you will be able to find out the most queried tables.

Related

how to select first rows distinct by a column name in a sub-query in sql-server?

Actually I am building a Skype like tool wherein I have to show last 10 distinct users who have logged in my web application.
I have maintained a table in sql-server where there is one field called last_active_time. So, my requirement is to sort the table by last_active_time and show all the columns of last 10 distinct users.
There is another field called WWID which uniquely identifies a user.
I am able to find the distinct WWID but not able to select the all the columns of those rows.
I am using below query for finding the distinct wwid :
select distinct(wwid) from(select top 100 * from dbo.rvpvisitors where last_active_time!='' order by last_active_time DESC) as newView;
But how do I find those distinct rows. I want to show how much time they are away fromm web apps using the diff between curr time and last active time.
I am new to sql, may be the question is naive, but struggling to get it right.

If you are using proper data types for your columns you won't need a subquery to get that result, the following query should do the trick
SELECT TOP 10
[wwid]
,MAX([last_active_time]) AS [last_active_time]
FROM [dbo].[rvpvisitors]
WHERE
[last_active_time] != ''
GROUP BY
[wwid]
ORDER BY
[last_active_time] DESC
If the column [last_active_time] is of type varchar/nvarchar (which probably is the case since you check for empty strings in the WHERE statement) you might need to use CAST or CONVERT to treat it as an actual date, and be able to use function like MIN/MAX on it.
In general I would suggest you to use proper data types for your column, if you have dates or timestamps data use the "date" or "datetime2" data types
Edit:
The query aggregates the data based on the column [wwid], and for each returns the maximum [last_active_time].
The result is then sorted and filtered.
In order to add more columns "as-is" (without aggregating them) just add them in the SELECT and GROUP BY sections.
If you need more aggregated columns add them in the SELECT with the appropriate aggregation function (MIN/MAX/SUM/etc)
I suggest you have a look at GROUP BY on W3
To know more about the "execution order" of the instruction you can have a look here

You can solve problem like this by rank ordering the results by a key and finding the last x of those items, this removes duplicates while preserving the key order.
;
WITH RankOrdered AS
(
SELECT
*,
wwidRank = ROW_NUMBER() OVER (PARTITION BY wwid ORDER BY last_active_time DESC )
FROM
dbo.rvpvisitors
where
last_active_time!=''
)
SELECT TOP(10) * FROM RankOrdered WHERE wwidRank = 1

If my understanding is right, below query will give the desired output.
You can have conditions according to your need.
select top 10 distinct wwid from dbo.rvpvisitors order by last_active_time desc

Group by an evaluated field (sql server) [duplicate]

Why are column ordinals legal for ORDER BY but not for GROUP BY? That is, can anyone tell me why this query
SELECT OrgUnitID, COUNT(*) FROM Employee AS e GROUP BY OrgUnitID
cannot be written as
SELECT OrgUnitID, COUNT(*) FROM Employee AS e GROUP BY 1
When it's perfectly legal to write a query like
SELECT OrgUnitID FROM Employee AS e ORDER BY 1
?
I'm really wondering if there's something subtle about the relational calculus, or something, that would prevent the grouping from working right.
The thing is, my example is pretty trivial. It's common that the column that I want to group by is actually a calculation, and having to repeat the exact same calculation in the GROUP BY is (a) annoying and (b) makes errors during maintenance much more likely. Here's a simple example:
SELECT DATEPART(YEAR,LastSeenOn), COUNT(*)
FROM Employee AS e
GROUP BY DATEPART(YEAR,LastSeenOn)
I would think that SQL's rule of normalize to only represent data once in the database ought to extend to code as well. I'd want to only right that calculation expression once (in the SELECT column list), and be able to refer to it by ordinal in the GROUP BY.
Clarification: I'm specifically working on SQL Server 2008, but I wonder about an overall answer nonetheless.

One of the reasons is because ORDER BY is the last thing that runs in a SQL Query, here is the order of operations
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
so once you have the columns from the SELECT clause you can use ordinal positioning
EDIT, added this based on the comment
Take this for example
create table test (a int, b int)
insert test values(1,2)
go
The query below will parse without a problem, it won't run
select a as b, b as a
from test
order by 6
here is the error
Msg 108, Level 16, State 1, Line 3
The ORDER BY position number 6 is out of range of the number of items in the select list.
This also parses fine
select a as b, b as a
from test
group by 1
But it blows up with this error
Msg 164, Level 15, State 1, Line 3
Each GROUP BY expression must contain at least one column that is not an outer reference.

There is a lot of elementary inconsistencies in SQL, and use of scalars is one of them. For example, anyone might expect
select * from countries
order by 1
and
select * from countries
order by 1.00001
to be a similar queries (the difference between the two can be made infinitesimally small, after all), which are not.

I'm not sure if the standard specifies if it is valid, but I believe it is implementation-dependent. I just tried your first example with one SQL engine, and it worked fine.

use aliasses :
SELECT DATEPART(YEAR,LastSeenOn) as 'seen_year', COUNT(*) as 'count'
FROM Employee AS e
GROUP BY 'seen_year'
** EDIT **
if GROUP BY alias is not allowed for you, here's a solution / workaround:
SELECT seen_year
, COUNT(*) AS Total
FROM (
SELECT DATEPART(YEAR,LastSeenOn) as seen_year, *
FROM Employee AS e
) AS inline_view
GROUP
BY seen_year

databases that don't support this basically are choosing not to. understand the order of the processing of the various steps, but it is very easy (as many databases have shown) to parse the sql, understand it, and apply the translation for you. Where its really a pain is when a column is a long case statement. having to repeat that in the group by clause is super annoying. yes, you can do the nested query work around as someone demonstrated above, but at this point it is just lack of care about your users to not support group by column numbers.

Order BY is not supported in view in sql server

i am trying to create a view in sql server.
create view distinct_product as
select distinct name from stg_user_dtlprod_allignmnt_vw order by name;
this is showing an error.
error message is:
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.
plz help me out where i am wrong.

You could use TOP with a number that is greater than the number of records:
CREATE VIEW [dbo].[distinct_product]
AS
SELECT DISTINCT TOP 10000000 name
FROM stg_user_dtlprod_allignmnt_vw
ORDER BY name
You cannot use TOP 100 PERCENT since the optimizer recognizes that TOP 100 PERCENT qualifies all rows and does not need to be computed at all, so the ORDER BY wouldn't be guaranteed.

A view cannot be sorted with an ORDER BY clause. You need to put the ORDER BY clause into any query that references the view.
A view is not materialized - the data isn't stored, so how could it be sorted? A view is kind of like a stored procedure that just contains a SELECT with no parameters... it doesn't hold data, it just holds the definition of the query. Since different references to the view could need data sorted in different ways, the way that you do this - just like selecting from a table, which is also an unsorted collection of rows, by definition - is to include the order by on the outer query.

You can't order a view like that when it's created as the message states, unless you follow the other answers from Tim / Raphael, but you can order results selected from a view:
So create it in step 1:
create view distinct_product as
select distinct name
from stg_user_dtlprod_allignmnt_vw
Then order it when you retrieve data:
select *
from distinct_product
order by name

TSQL: order by asc without column name

I'm quite new to SQL Server. Now I came across a query like this:
SELECT country FROM Hovercraft.Orders GROUP BY country ORDER BY ASC
There is no column name given in the order by clause. Is this possible? SSMS says no.
Jörg

It's probably a misprint - you have to specify what you are ordering by; this can be a column name, an expression or the number of a column in the output. It's most likely that the query you have seen was one of the latter, that simply omitted the column number 1 - like so:
SELECT country FROM Hovercraft.Orders GROUP BY country ORDER BY 1 ASC
- so this would order by the contents of the first column of output (ie. country).

I agree with #Mahmoud Gamal. But, also, it's possible to write such hack like this -
SELECT o.country, const_column = 1
FROM Hovercraft.Orders o
GROUP BY o.country
ORDER BY const_column ASC
In this case, sorting will be performed, but rows' order will not be changed.
On MS SQL 2005:
On MS SQL 2012:

This is not possible !..
Order by clause always require column name or column number .
Would you please response me why you want this kind of situation , I think you are working with dynamic query or else please let me know.
As per SQL Standard this not possible.
Thanks.

When to use with clause in sql

Can Anybody tell me when to use with clause.

The WITH keyword is used to create a temporary named result set. These are called Common Table Expressions.
A very basic, self-explanatory example:
WITH Administrators (Name, Surname)
AS
(
SELECT Name, Surname FROM Users WHERE AccessRights = 'Admin'
)
SELECT * FROM Administrators
For further reading and more examples, I suggest starting out with the following MSDN article:
Common Table Expressions by John Papa

In SQL Server you sometimes need the WITH clause to force a query to use an Index. This is often a necessity in spatial queries that can reduce query time from 1 minute to a few seconds.
select * from MyTable with(index(MySpatialIndex)) where...