Select on External table running very very slow on Azure SQL

Select on External table running very very slow on Azure SQL - sql-server

We have external table created, we need to run select on the table and select all the records, the select runs very very slow. Its not completing even after 30 mins, the table contains around 2millon recs
We also need to query this table from another DB and even this runs very very slow, doesn't return even after 30 mins.
Select is of the form:
select col1, col2,...col3 from ext_table;
Need help in:
1. Any suggestions on reducing the time taken for execution?
Note: we need to select entire content of the table so where condition might not be used.
Thanks in advance.

If you are not using the WHERE clause to push parameters to the remote database, then there is no way to optimize the performance of the query. You are returning the whole table.
My suggestion is to use SQL Data Sync to have a local copy of the table on this SQL Database that synchronizes with the remote Azure SQL Database at X interval of time.

Related

Executing query from SSDT

I'm using Visual Studio 2015, SSIS to run set of sql tasks in Execute Sql task and then do a data transfer between tables which are in SSMS by executing package in SSIS. When we run a series of sql statements on SSMS, we get results like rows effected for every sql successful activity. However, now I want to automate the process using SSIS to reduce the turn around time. I would like to get the rows effected for every sql query like select, insert, delete which are in execute sql task. How can it be done in SSIS? I don't have dbo_owner permission to stored procedures in SSMS. I'm thinking SSIS would be a quick way. But it is very important for me to make a log of rows effected to validate the data, as it is financial data. I have nearly 10 sql statements in each sql task like select and delete. But the output is only one table.
For example my sql task is like below
select * from dbo.table1;
select * from dbo.table2 where city = 'Chicago';
create dbo.table3(id int, name varchar(50);
insert into dbo.table3(1,'a');
select * from dbo.table3;
If I execute this in SSMS I get rows effected for each select statement and also table is created. If I execute the same through package in SSIS, how will get messages for each of them?

I assume your data lies on SQL Server. With selects, you could use data flow tasks and row counts instead of Excecute Sql's.
For inserts and updates there's a few ways to get affected rowcount, like this: https://stackoverflow.com/a/1834264/5605866
or like this: http://microsoft-ssis.blogspot.fi/2011/03/rowcount-for-execute-sql-statement.html
Basically the same thing but with a bit different syntax.

You can use the Row Count transaformation after the Data source and save it the variable. Can refer to this get the number of rows returned from the Source that SHOULD be processed.
Hope this help.

How to create a "Ghost Table" in SQL Server based off of other tables?

I need to create a "ghost" table in SQL Server, which doesn't actually exist but is a result set of a SQL Query. Pseudo code is below:
SELECT genTbl_col1, genTblcol2
FROM genTbl;
However, "genTbl" is actually:
SELECT table1.col AS genTbl_col1,
table2.col AS genTbl_col2
FROM table1 INNER JOIN table2 ON (...)
In other words, I need that every time a query is run on the server trying to select from "genTbl", it simply creates a result set from the query and treats it like a real table.
The situation is that I have a software that runs queries on a database. I need to modify it, but I cannot change the software itself, so I need to trick it into thinking it can actually query "genTbl", when it actually doesn't exist but is simply a query of other tables.
To clarify, the query would have to be a sort of procedure, available by default in the database (i.e. every time there is a query for "genTbl").

Use #TMP
SELECT genTbl_col1, genTblcol2
INTO #TMP FROM genTbl;
It exists only in current session. You can also use ##TMP for all sessions.

How to delete a table after a period of inactivity?

For the purpose of my project I cannot use session based temp tables. They need to be persistent but automatically deleted after a certain period of inactivity (no CRUD performed). Is this at all possible?

You can use the SQL Server Agent to Schedule a Job that calls a Stored Procedure that does this work for you. (How to Schedule a Job?)
How do you identify the tables that have not updated since X amount of time ?
Use this Query:
SELECT OBJECT_NAME(OBJECT_ID) AS TableName, last_user_update,
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID('DatabaseName')
AND OBJECT_NAME(OBJECT_ID) LIKE '%%' -- Here is the template name for your tables
AND DATEDIFF(MINUTE, last_user_update, GETDATE()) > 10 -- Last updated more than 10 minutes
Now that you have the tables to be deleted, you can use whatever logic you want to DROP them (Cursor, While, Procedure)

Sure it is. Write it into your program layer.
AUTOMATICALLY - within SQL Server: no. Well, you cold use the agent to start a script regularly.
Tracking what "inactivity" means - your responsibility.

You need save modification date of this table somewhere (for example in the same table or in another special table) and then you can create job, which checks last modification date and then drops the table.

sys.event_log in Azure database select query times out

I need to diagnose some issues in production but I cannot query the event_log, query times out.
I was trying to executing the following query on Master database in my Azure database,
select * from sys.event_log where start_time>='2016-02-20:12:00:00' and end_time<='2016-02-20 12:00:00'
Query starts executing, and runs over more than 8 mins and Cancels query execution. I am pretty sure that the eventlog must be a very large one in this database server. How to overcome this situation and query the sys.event_log table?
Even the top 10 query times out. Need some help!

Query I ran was, this might also get a time out, just keep trying (worked for me in the 3rd time)
SELECT *
,CAST(event_data AS XML).value('(/event/#timestamp)[1]', 'datetime2') AS TIMESTAMP
,CAST(event_data AS XML).value('(/event/data[#name="error"]/value)[1]', 'INT') AS error
,CAST(event_data AS XML).value('(/event/data[#name="state"]/value)[1]', 'INT') AS STATE
,CAST(event_data AS XML).value('(/event/data[#name="is_success"]/value)[1]', 'bit') AS is_success
,CAST(event_data AS XML).value('(/event/data[#name="database_name"]/value)[1]', 'sysname') AS database_name
FROM sys.fn_xe_telemetry_blob_target_read_file('el', NULL, NULL, NULL)
WHERE object_name = 'database_xml_deadlock_report'
This gives very useful details in the xml data field.
Use an XML viewer to view details. I used XMLGrid.
It will show what are the two processes (deadlock victim and winner) and the good news is that it gives you the SQL statements those processes were trying to execute.
In my case two processes were trying to update one data table, but two different rows. Winner process was using a SQL "Merge" which creates a table lock for the row update. Solution was I changed that Merge query to use SQL UPDATE.

Connection scoped temp tables across stored procedures

I'm working on a data virtualization solution. The user is able to write his own SQL queries as filters for a query i make. I would like not having to run this filter query every time i select something from the database(It will likely be a complex series of joins).
My idea was to use a # temp table at script level and keep the connection alive. This #temp table would then be selected from but updated only when the user changes the filter. The idea being i can actually use it from stored procedures and the table is scoped to that connection.
I got the idea from someone who suggested to use dynamic sql and ## global temp tables named with the connection process ID so to make each connection have a unique global temp table. This was to overcome sharing temp tables across stored procedures. But it seems a bit clumsy.
I did a quick test with the below code and seemed to work fine
-- Run script at connection open from some app
SELECT * INTO #test
FROM dataTable
-- Now we can use stored procedures with #test table
EXECUTE selectFromTempTable
EXECUTE updateTempTable #sqlFilterString
EXECUTE selectFromTempTable
Only real problem i can see is the connection have to be kept alive for the duration which can be a few hours maybe. A single user can have multiple connections running at the same time. The number of users on a single database server would be like max 20.
If its a huge issue i could make it so the application can close and open them as needed so each user only have 1 connection open at a time. And maybe even then close it if not in use, and reopen when needed again with the delay of having to wait for the query to run.
Would this be bad practice? or kill any performance benefit from not running the filter query? This is on SQL Server 2008 and up.

I think I would create a permanent table, using the spid (process ID) as a key value. Each connection has its own process ID, so anyone can use it to identify their entries in the table:
create table filter(
spid int,
filternum int,
filterstring varchar(255),
<other cols> );
create unique index filterindx on filter(spid, filternum);
Then when a user creates filter entries:
delete from filter where spid = ##spid
insert into filter(spid, filternum, filterstring) select ##spid, 1, 'some sql thing'
insert into filter(spid, filternum, filterstring) select ##spid, 2, 'some other sql thing'
Then you can access each user's filter values by selecting where spid = ##spid etc

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight