How can I get information about table usage? - snowflake-cloud-data-platform

I'm new to Snowflake and I'm trying to get a ranking of table usage, meaning how many queries referenced each table in a certain period of time. I found this link, but it only gives me the query text, not the table name. Should I parse the query text and extract the table names, or is there a better way? If parsing is the answer, is there a good SQL parser library for Python?

You can use the SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY view to see object usage.
This will get you started. The FLATTEN function turns the array in BASE_OBJECTS_ACCESSED into rows so you can count and aggregate; check the VALUE column of the flattened output for the fields you can group on. A sketch of the full ranking follows the query below.
select * from
"SNOWFLAKE"."ACCOUNT_USAGE"."ACCESS_HISTORY",
table(flatten(BASE_OBJECTS_ACCESSED))
;
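To turn that into the ranking you asked about, something like the following should work. This is only a sketch; the 30-day window and the filter on objectDomain = 'Table' are assumptions you can adjust.
-- Count how many queries touched each table in the last 30 days
select f.value:"objectName"::string as table_name,
       count(distinct ah.query_id)  as query_count
from SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY ah,
     table(flatten(ah.BASE_OBJECTS_ACCESSED)) f
where ah.query_start_time >= dateadd('day', -30, current_timestamp())
  and f.value:"objectDomain"::string = 'Table'
group by 1
order by query_count desc;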

Related

Specify columns while appending Snowpark Python Dataframe to table

Right now I have a DataFrame created using session.createDataFrame() in Python. The intention is to append this DataFrame to an existing table in Snowflake.
However, the schema of the source DataFrame doesn't exactly match the schema of the target table. In Snowpark Scala, the DataFrameWriter object has an option() method (Saving/Appending Dataframe to a table) that allows specifying the column order, and hence allows skipping columns from the DataFrame, since columns can be matched by their names.
However, Snowpark Python lacks option() on DataFrameWriter at the moment. This forces Snowflake to require that the schemas and column counts match between source and target, else an error is thrown.
Not sure when Snowpark for Python will receive this feature, but in the interim, is there any alternative (apart from hardcoding column names in the INSERT query)?
You are right that Snowpark does not make inserting novel records easy. But it is possible. I did it with the Snowpark Java SDK, which lacked any source/docs, just banging my head on the desk until it worked.
I first did a select against the target table (see the first line), then got the schema, then created a new Row object with the correct order and types. Use the column "order" mode, not the column "name" mode. It's also really finicky about types: it doesn't like java.util.Date but wants Timestamp, doesn't like Integer but needs Long, etc.
Then do an "append" -> "saveAsTable". By some miracle it worked. Agreed, it would be fantastic if they accepted a Map<String, Object> to insert a row, or let you map columns by name. But they probably want to discourage this given the nature of warehouse performance for row-based operations.
In Java...
// Select one row from the target table just to capture its schema
DataFrame dfSchema = session.sql("select * from TARGET_TABLE limit 1");
StructType schema = dfSchema.schema();
System.out.println(schema);
// Build a Row whose values follow the schema's column order and types exactly
Row[] rows = new Row[]{Row.fromArray(new Object[]{endpoint.getDatabaseTable(), statusesArr, numRecords, Integer.valueOf(filenames.size()).longValue(), filenamesArr, urlsArr, startDate, endDate})};
DataFrame df = session.createDataFrame(rows, schema);
System.out.println(df.showString(0, 120));
// Append the single-row DataFrame to the target table
df.write().mode("Append").saveAsTable("TARGET_TABLE");
In the save_as_table method, use the parameter column_order="name". See the Snowflake save_as_table docs. This matches the columns by name and allows you to omit missing columns without the column-count mismatch error.
It's also good practice to include a schema when creating your DataFrame. See the Snowflake create_dataframe docs on using the StructType class.

Mule Salesforce connector in query operation. List of Maps as input

I have a huge list of Ids and Names for Contacts defined in this simple DW script. Let's show just two for simplicity:
[
{id:'169057',Name:'John'},
{id:'169099',Name:'Mark'}
]
I need the Salesforce connector to execute a single query (not 1000 unitary queries) over these two fields.
I expect the connector to be able to use a List of Maps as input, as it does for the update/insert operations. So I tried with this connector config:
I am getting a ConsumerIterator as the response. If I add an Object to String transformer, I get an empty String as the response.
I guess there must be a way of executing a big query in just one API call... but I am not finding it. Keep in mind I need to use two or more WHERE clauses.
Any idea? Thanks!
I think you need to create two separate lists for the id and Name values from your input, and then use those lists in the query with IN:
1. Construct the id list to be used in the WHERE clause. This can be executed inside a For Each with a batch size of 1000.
2. Retrieve Name and Id from Contact for each batch (a sketch of the query is below):
select id, name from contact where id in (id list)
3. Get the desired ids by matching the names retrieved in step 2 within Mule.
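For illustration, the query sent per batch would look something like this (a sketch in SOQL; the literal Ids stand in for the values built in step 1, and Contact, Id, and Name are the standard Salesforce object and fields):
select Id, Name
from Contact
where Id in ('169057', '169099')
Additional conditions (the extra WHERE clauses you mention) can be combined with AND, but filtering by Name afterwards in Mule, as in step 3, keeps the query small.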

TSQL - Get maximum length of data in every column in every table without Dynamic SQL

Is there a way to get the maximum length of the data stored in every column in the database? I have seen some solutions that used dynamic SQL, but I was wondering if it can be done with a regular query.
Yes, just query the INFORMATION_SCHEMA.COLUMNS view for the database; you can get the information for all columns of all tables in the database if you desire. See the following for more details:
Information_Schema - COLUMNS
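For example (note that this returns the declared maximum length from CHARACTER_MAXIMUM_LENGTH, not the length of the data actually stored):
select TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
from INFORMATION_SCHEMA.COLUMNS
order by TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION;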
If you are talking about the length of the particular data stored, and not the declared length of a column, I am afraid that is not achievable without dynamic SQL.
The reason is that there is only one way to retrieve data, and that is the SELECT statement. This statement, however, requires explicit column names, which are part of the statement itself. There is nothing like
-- This does not work
select col.Data
from Table
where Table.col.Name='ColumnName'
So the answer is: No.
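For completeness, the usual dynamic SQL workaround (only a sketch, since the question asks to avoid it; STRING_AGG requires SQL Server 2017 or later) builds the explicit column references that a static SELECT cannot express:
-- Sketch: one MAX(LEN(column)) per string column, stitched together with UNION ALL
declare @sql nvarchar(max);
select @sql = string_agg(convert(nvarchar(max),
           'select ''' + TABLE_SCHEMA + '.' + TABLE_NAME + '.' + COLUMN_NAME + ''' as column_name,'
         + ' max(len(' + quotename(COLUMN_NAME) + ')) as max_data_length'
         + ' from ' + quotename(TABLE_SCHEMA) + '.' + quotename(TABLE_NAME)),
       ' union all ')
from INFORMATION_SCHEMA.COLUMNS
where DATA_TYPE in ('char', 'nchar', 'varchar', 'nvarchar');
exec sp_executesql @sql;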

Fastest way to map a list of names in an excel doc to their IDs in a lookup table?

For one of my projects I display a list of counties in a drop-down list (this list comes from a lookup table containing all counties). The client just requested that I limit it to a subset of their choice. The subset was given to me in an Excel spreadsheet containing only names (seen below):
I'm trying to figure out the quickest way to map each of these to its corresponding id in the original lookup table. The client cannot give me these IDs. The names in the spreadsheet match the names in my table (except for case).
This will most likely be a one time only thing.
Can anyone suggest a fast way to get these values into a query so I don't have to manually do it?
When I say fast I'm not talking about processing speed, just the fastest start to finish time that results in me getting the corresponding IDs using any tool available.
Note: I'm aware that I could have probably done this manually in the time it will take to get an answer, but I'd like to know for future reference.
You could do an External Data Query into another Excel sheet with
SELECT countryname, countryid FROM countries
then use a VLOOKUP to get the ids into the client-provided sheet:
=VLOOKUP(A1,Sheet2!$A$1:$B$200,2,FALSE)
See http://excelusergroup.org/blogs/nickhodge/archive/2008/11/04/excel-2007-getting-external-data.aspx for creating an External Data Table in Excel 2007. Skip down to the "Other Sources" part.
Put the list in a text file. Write a PowerShell script which will get the contents of that file and then query your database to output the keys. Here is a rough, rough example.
Get-Content c:\list.txt | ForEach-Object { Invoke-Sqlcmd -Query "select blah blah where county = '$_'" } | Format-Table
If you have access to SSIS, you could probably do a join between the Excel source and your table.
You could load the Excel sheet into a temp table to take advantage of all your SQL query knowledge.
I believe (and yes, it is true) that SQL Server can create a linked server out of a spreadsheet. Then you get to joining and you're done.
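As a sketch of that ad hoc approach (the file path, sheet name, and the Counties/CountyName/CountyId names are placeholders for your environment; this requires the ACE OLE DB provider and the 'Ad Hoc Distributed Queries' option to be enabled):
-- Read the client's sheet directly and join it to the lookup table
select c.CountyId, x.Name
from openrowset('Microsoft.ACE.OLEDB.12.0',
                'Excel 12.0;Database=C:\temp\counties.xlsx;HDR=YES',
                'select * from [Sheet1$]') as x
join dbo.Counties as c
  on c.CountyName = x.Name;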

Return dataset in dataflow

Could I get some ideas on retrieving a dataset using the Lookup method? Basically, my scenario is that I have source data that needs to be looked up against another source table, and for each matching column value from the source I need to get all the records from the other source.
It's a one-to-many relation. I tried Lookup, but it gives only one record on a matching condition, and the OLE DB Command doesn't retrieve any data, as it only does Insert/Update operations.
Thanks
prav
If you want to use a Lookup component, then the two columns you match on must be exact. To clarify, if you are doing a Lookup on a varchar-type column and only finding one match, it may be because there is only one exact match; try doing a SELECT..FROM..JOIN..WHERE statement to confirm (a sketch is below). If there are matches but they aren't coming through the Lookup, check your source data after it comes out of the OLE DB source (it may need to be trimmed).
If exact matching isn't necessary, you could try Fuzzy Lookup, which allows you to specify how close (by giving a percentage) you want the matching columns to be.
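Something like the following would confirm how many matches each key really has (SourceA, SourceB, and KeyCol are placeholder names for your two sources and the join column):
-- Count matches per key; wrap KeyCol in LTRIM(RTRIM(...)) if trailing spaces are suspected
select a.KeyCol, count(b.KeyCol) as match_count
from dbo.SourceA as a
left join dbo.SourceB as b
  on b.KeyCol = a.KeyCol
group by a.KeyCol
order by match_count desc;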
I solved this using the Script Component, which prepares the SQL script and then executes it, so in a single hit I could get the full result set, since it is not possible to retrieve a result set with Lookup. On a match, Lookup will return only one row even if multiple keys match.
Thanks
prav
