I am aware of the query_history views associated with account_usage and of the information_schema query_history family of table functions. However, I cannot find a way to monitor the queries being sent to Snowflake from outside Snowflake. For example, I have Power BI hitting a Snowflake database, but the queries Power BI sends to Snowflake don't seem to appear in the query history. Is it possible to see them?
Queries run against Snowflake, including those sent by external tools, can be found in the QUERY_HISTORY view (note that the ACCOUNT_USAGE schema has a data latency of up to 45 minutes).
If the external tool (here Power BI) uses a dedicated virtual warehouse, user, or role, it is easy to filter the records by the corresponding column: WAREHOUSE_ID, USER_NAME, or ROLE_NAME.
Another option is to use the dedicated INFORMATION_SCHEMA table function QUERY_HISTORY_BY_USER, which returns queries from the last 7 days:
select *
from table(information_schema.query_history_by_user(USER_NAME =>'<power_bi_user>'))
order by start_time desc;
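The column filter described above can be tried without a Snowflake account. Below is a minimal sketch that uses Python's built-in sqlite3 module with a local stand-in table. The table layout and rows are assumptions: only a few QUERY_HISTORY columns are included, and the data is invented. Against Snowflake itself, the equivalent would be a plain `WHERE USER_NAME = '<power_bi_user>'` filter (or WAREHOUSE_NAME / ROLE_NAME) on SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY.

```python
import sqlite3

# Local stand-in for SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY. Only a few of its
# columns are reproduced, and the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE query_history (
        query_text     TEXT,
        user_name      TEXT,
        role_name      TEXT,
        warehouse_name TEXT,
        start_time     TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO query_history VALUES (?, ?, ?, ?, ?)",
    [
        ("SELECT * FROM sales", "POWER_BI_USER", "REPORTING", "BI_WH", "2024-01-01 09:00"),
        ("SELECT 1", "ANALYST", "ANALYST_ROLE", "ADHOC_WH", "2024-01-01 09:05"),
        ("SELECT * FROM orders", "POWER_BI_USER", "REPORTING", "BI_WH", "2024-01-01 09:10"),
    ],
)

# Column names cannot be bound as query parameters, so they are whitelisted.
FILTER_COLUMNS = {"user_name", "role_name", "warehouse_name"}


def queries_for(column: str, value: str) -> list[str]:
    """Return the query texts where <column> = <value>, newest first."""
    if column not in FILTER_COLUMNS:
        raise ValueError(f"unsupported filter column: {column}")
    sql = (
        f"SELECT query_text FROM query_history WHERE {column} = ? "
        "ORDER BY start_time DESC"
    )
    return [row[0] for row in conn.execute(sql, (value,))]


print(queries_for("user_name", "POWER_BI_USER"))
# ['SELECT * FROM orders', 'SELECT * FROM sales']
```

Whitelisting the column name keeps the dynamic filter safe, because only values (not identifiers) can be passed as bound parameters.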
I'm an SSIS developer, and I use a lot of SQL stored procedure lookups in SSIS. But in Azure Data Factory, I have no idea how to perform a lookup using a SQL stored procedure.
Could anyone please guide me on this?
Thanks in advance!
Jay
Azure Data Factory (ADF) is more of an ELT tool than an ETL tool, so direct lookups are not supported. Instead, this type of operation, along with other transforms, is pushed down into the compute you are actually using. For example, if you are moving data to SQL Server, Azure SQL Database, or Azure SQL Data Warehouse, you would ensure all the data is on the same server and use a Stored Procedure activity to execute the lookups using T-SQL and joins. If you are using Azure Data Lake Analytics (ADLA), you would use the U-SQL activity to run U-SQL or execute ADLA stored procedures, again doing lookups via joins or custom U-SQL code such as a Combiner, Applier, or Reducer. In fact, you can use any of the ADF compute options, like SQL, HDInsight (including Hive, Pig, MapReduce, Streaming, and Spark scripts), Machine Learning, or custom .NET activities.
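Here is a rough illustration of a lookup expressed as a join. It is a minimal sketch that uses SQLite (via Python's sqlite3) as a stand-in for SQL Server, and the table and column names are hypothetical. In SQL Server, the SELECT would typically sit inside a stored procedure that the Stored Procedure activity calls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_code TEXT UNIQUE);
    CREATE TABLE stg_sales (customer_code TEXT, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'C001'), (2, 'C002');
    INSERT INTO stg_sales VALUES ('C001', 10.0), ('C002', 5.5), ('C999', 7.0);
    """
)

# The "lookup" is a LEFT JOIN: matched rows get the surrogate key, and
# unmatched rows get NULL (comparable to SSIS's "no match" output).
rows = conn.execute(
    """
    SELECT s.customer_code, d.customer_key, s.amount
    FROM stg_sales AS s
    LEFT JOIN dim_customer AS d ON d.customer_code = s.customer_code
    ORDER BY s.customer_code
    """
).fetchall()
print(rows)
# [('C001', 1, 10.0), ('C002', 2, 5.5), ('C999', None, 7.0)]
```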
So you need to think about things differently with ADF. Have a look through this article to get a better understanding of transforming data in ADF:
Transform data in Azure Data Factory
https://learn.microsoft.com/en-us/azure/data-factory/data-factory-data-transformation-activities
As an aside, I rarely use Lookups in SSIS, as their performance in early versions was poor. Although this has improved in later versions, generally if you can do it in SQL, you probably should. This pattern harnesses the power of SQL Server, rather than dragging data up into the SSIS pipeline (e.g., for lookups, which are essentially joins) and pushing it back out again. I reserve Data Flow transformations mainly for cases where non-relational data is involved, e.g., XML, or joining data from your email server with relational data. This is my personal view anyway :)
What are two possible classes of intervention that the administrator of a database system could take to improve the performance of a query?
First, consider that DB optimizations depend heavily on the DB type/vendor, but here's a short list:
Fetch the "explain plan" of the query (if your DB lets you): it shows what the DB does to retrieve the query results (for instance, which indexes it uses)
If your query filters on low-cardinality attributes, create a bitmap index on them to speed up retrieval of table rows
If your query uses joins and your DB supports it, use a join index
If your query retrieves results from a portion of a table identified by a single attribute, you could use sharding
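Here is a minimal sketch of the first point, using SQLite through Python's sqlite3 module. SQLite's equivalent of an explain plan is EXPLAIN QUERY PLAN. SQLite has no bitmap or join indexes, so this uses an ordinary index purely to show how the plan changes from a full scan to an index search. The table and data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (status, total) VALUES (?, ?)",
    [("open" if i % 10 == 0 else "closed", float(i)) for i in range(1000)],
)

QUERY = "SELECT id, total FROM orders WHERE status = ?"


def plan(sql: str) -> list[str]:
    # Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail).
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ("open",))]


before = plan(QUERY)  # expected to report a full SCAN of orders
conn.execute("CREATE INDEX idx_orders_status ON orders (status)")
after = plan(QUERY)   # expected to report a SEARCH using idx_orders_status
print(before)
print(after)
```

The exact wording of the detail strings differs between SQLite versions (e.g. "SCAN orders" vs. "SCAN TABLE orders"), so check for the keywords rather than the full text.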
Here's my scenario:
I have to query two PeopleSoft databases on different servers (both are SQL Server 2000) and join the data. My application is a .NET application (BizTalk).
I'm wondering which option is best with regard to performance:
use standard SELECT queries to get the data and do the join in memory (e.g., with LINQ)
generate complex dynamic queries using a linked server, e.g.:
select t1.*, t2.*
from Server1.HRDB.dbo.MyTable1 as t1
left join Server2.FinanceDb.dbo.MyTable2 as t2
    on t2.KeyColumn = t1.KeyColumn -- KeyColumn is a placeholder join key
use standard SELECT queries to load the data into an intermediate/staging SQL Server database and run my queries/joins on that database instead
should I consider using SSIS? (Are there features there that might be better than doing the join in memory, e.g., with LINQ?)
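Option 1 can be sketched roughly as below. This is an illustration only. Two in-memory SQLite databases stand in for the two SQL Server 2000 instances, and the table and column names (employee, cost_center, emplid) are hypothetical. The client runs one plain SELECT per source and then does a hash-based left join in memory, which is roughly what a LINQ join does.

```python
import sqlite3

# Two separate in-memory databases stand in for the two source servers.
hr = sqlite3.connect(":memory:")
hr.executescript(
    """
    CREATE TABLE employee (emplid TEXT PRIMARY KEY, name TEXT);
    INSERT INTO employee VALUES ('E1', 'Alice'), ('E2', 'Bob'), ('E3', 'Carol');
    """
)
finance = sqlite3.connect(":memory:")
finance.executescript(
    """
    CREATE TABLE cost_center (emplid TEXT PRIMARY KEY, dept TEXT);
    INSERT INTO cost_center VALUES ('E1', 'R&D'), ('E3', 'Ops');
    """
)

# Step 1: plain, short-lived SELECTs against each source.
employees = hr.execute("SELECT emplid, name FROM employee ORDER BY emplid").fetchall()
dept_by_emplid = dict(finance.execute("SELECT emplid, dept FROM cost_center").fetchall())

# Step 2: left join in memory via a hash lookup (None where there is no match).
joined = [(emplid, name, dept_by_emplid.get(emplid)) for emplid, name in employees]
print(joined)
# [('E1', 'Alice', 'R&D'), ('E2', 'Bob', None), ('E3', 'Carol', 'Ops')]
```

The trade-off is that every row needed for the join has to travel to the client and fit in its memory, even if only a few rows survive the join.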
I wish I could use stored procedures on the source database, but the owners of the PeopleSoft database refuse to allow them.
The main constraints we have are that the source database is old (SQL Server 2000) and that the performance of the source database is paramount. Whatever queries I run on this server must not block the other users. Hence, the DBAs are adamant about no stored procedures. They also believe that queries involving linked servers will trump (i.e., take priority over) other queries being run against the database.
Any feedback would be greatly appreciated.
Thanks!
Update: additional background information on the project
We are primarily integrating PeopleSoft databases (HR and Finance) into another product. Some integrations are simple, like AccountCode and Department. Others are more complex, like the personal data, job, and leave accrual. Some are real-time, others are scheduled, and others are 'batch' (e.g., at payroll runs).
Regardless, we have to get the source data out of the PeopleSoft databases, and my hope had been to let the (source) database do the 'heavy' lifting by executing SQL queries. I don't really want BizTalk, SSIS, or C# LINQ to be the ones doing the transformations/filtering.
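For comparison, the staging-database option might look roughly like this. Again, SQLite stands in for SQL Server and the table names are hypothetical. The sources only see short, plain SELECTs. The join and any further filtering run inside the staging database, which keeps the heavy lifting in SQL without running the joins on the PeopleSoft servers.

```python
import sqlite3


def make_source(script: str) -> sqlite3.Connection:
    """Create an in-memory database standing in for one source server."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    return conn


hr = make_source(
    """
    CREATE TABLE employee (emplid TEXT, name TEXT);
    INSERT INTO employee VALUES ('E1', 'Alice'), ('E2', 'Bob');
    """
)
fin = make_source(
    """
    CREATE TABLE cost_center (emplid TEXT, dept TEXT);
    INSERT INTO cost_center VALUES ('E1', 'R&D');
    """
)

staging = sqlite3.connect(":memory:")
staging.executescript(
    """
    CREATE TABLE stg_employee (emplid TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE stg_cost_center (emplid TEXT PRIMARY KEY, dept TEXT);
    """
)

# Extract: read-only SELECTs against each source, streamed into staging.
staging.executemany("INSERT INTO stg_employee VALUES (?, ?)",
                    hr.execute("SELECT emplid, name FROM employee"))
staging.executemany("INSERT INTO stg_cost_center VALUES (?, ?)",
                    fin.execute("SELECT emplid, dept FROM cost_center"))

# Transform: the join runs inside the staging database, not on the sources.
result = staging.execute(
    """
    SELECT e.emplid, e.name, c.dept
    FROM stg_employee AS e
    LEFT JOIN stg_cost_center AS c ON c.emplid = e.emplid
    ORDER BY e.emplid
    """
).fetchall()
print(result)
# [('E1', 'Alice', 'R&D'), ('E2', 'Bob', None)]
```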
Definitely open to suggestions.
I wrote a module that translates one SQL query into another. When users send SQL queries to the DB engine, I want the engine to forward these queries to my module first, before processing the SQL syntax.
How can I integrate my module into the SQL Server database engine?
You can redirect queries for certain data to different tables using a partitioned view:
http://technet.microsoft.com/en-US/library/ms188299(v=SQL.105).aspx
In a nutshell, you give the server rules about which values reside in which tables (usually based on primary or foreign key ranges). When you query using the partitioning column, the database can direct your query to the correct remote table. But you can still query across all the tables just as if they were held locally (only more slowly).
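Here is a minimal sketch of the idea, using SQLite via Python's sqlite3 and hypothetical table names. The CHECK constraints declare which key range each member table holds, and the UNION ALL view presents the members as one table. As far as I know, SQLite only enforces these constraints. Unlike SQL Server, it does not use them to skip member tables at query time, so this shows the structure rather than the performance benefit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Member tables: the CHECK constraints declare which key range lives where.
    CREATE TABLE customers_low  (id INTEGER PRIMARY KEY CHECK (id BETWEEN 1 AND 999), name TEXT);
    CREATE TABLE customers_high (id INTEGER PRIMARY KEY CHECK (id BETWEEN 1000 AND 1999), name TEXT);
    -- The view presents the member tables as one logical table.
    CREATE VIEW customers AS
        SELECT id, name FROM customers_low
        UNION ALL
        SELECT id, name FROM customers_high;
    INSERT INTO customers_low  VALUES (1, 'Alpha'), (500, 'Beta');
    INSERT INTO customers_high VALUES (1500, 'Gamma');
    """
)

print(conn.execute("SELECT name FROM customers WHERE id = 1500").fetchall())
print(conn.execute("SELECT COUNT(*) FROM customers").fetchone())

# A row outside a member table's declared range is rejected.
try:
    conn.execute("INSERT INTO customers_high VALUES (5, 'Misplaced')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```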
I have a table containing a bunch of queries.
What I have to do is select the records whose query uses a SQL Server system function.
It would be best to select all system function names so that I can do it with a subquery in the WHERE clause.
Any advice would be great.
We recommend that you use the system functions, Information Schema Views, or the system stored procedures to obtain system information without directly querying the system tables.
Source:
http://msdn.microsoft.com/en-us/library/ms191238%28v=sql.90%29.aspx
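Here is a minimal sketch of the question's own idea, a list of function names checked in a WHERE-clause subquery, using SQLite via Python's sqlite3. The system_functions table is a hypothetical, deliberately incomplete list that you would fill with the functions you care about. The match is a simple case-insensitive substring test for `NAME(`. It will therefore miss calls written with a space before the parenthesis, and it may flag a user-defined function whose name ends with a system function's name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE saved_queries (id INTEGER PRIMARY KEY, sql_text TEXT);
    INSERT INTO saved_queries (sql_text) VALUES
        ('SELECT GETDATE()'),
        ('SELECT name FROM users'),
        ('SELECT ISNULL(col, 0) FROM t'),
        ('SELECT db_name()');
    -- Hypothetical, deliberately incomplete list of system function names.
    CREATE TABLE system_functions (name TEXT PRIMARY KEY);
    INSERT INTO system_functions VALUES ('GETDATE'), ('ISNULL'), ('DB_NAME'), ('OBJECT_ID');
    """
)

rows = conn.execute(
    """
    SELECT q.id, q.sql_text
    FROM saved_queries AS q
    WHERE EXISTS (
        SELECT 1 FROM system_functions AS f
        -- upper() on both sides makes the substring match case-insensitive.
        WHERE instr(upper(q.sql_text), upper(f.name) || '(') > 0
    )
    ORDER BY q.id
    """
).fetchall()
print(rows)
# [(1, 'SELECT GETDATE()'), (3, 'SELECT ISNULL(col, 0) FROM t'), (4, 'SELECT db_name()')]
```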