What compute is used to refresh an external table?
When executing
alter external table x refresh 'custom_location'
I am not seeing any activity in the warehouse tab. However, the statement ran into a timeout after running for an hour.
External table refresh is a metadata management feature that uses Snowflake's cloud services layer to perform its work. It does not use a virtual warehouse from your account (although some charges are incurred).
Your question describes an issue but lacks information such as how many files were added before the manual refresh was performed, their format, whether partition columns are in use, etc.
The refresh work can take longer depending on various factors. If it is a recurring issue and the incremental automatic refresh feature is not an option for you, consider logging a support case to have it investigated.
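If automatic refresh is an option for you, a minimal sketch looks like this (the table name is from the question; note that AUTO_REFRESH also relies on event notifications being configured on the stage's cloud storage, which is an assumption about your setup):

-- Refresh the metadata for the whole table rather than a single path.
alter external table x refresh;

-- Or let Snowflake refresh the metadata automatically on new-file notifications.
alter external table x set auto_refresh = true;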
In on-premises SQL databases, it is normal to have a maintenance plan that rebuilds the indexes once in a while, during periods when the database is not used much.
How can I set it up in Azure SQL DB?
P.S.: I looked for this before, but since I couldn't find any option for it, I thought maybe it was being handled automatically, until I read this post and ran:
SELECT
DB_NAME() AS DBName
,OBJECT_NAME(ps.object_id) AS TableName
,i.name AS IndexName
,ips.index_type_desc
,ips.avg_fragmentation_in_percent
FROM sys.dm_db_partition_stats ps
INNER JOIN sys.indexes i
ON ps.object_id = i.object_id
AND ps.index_id = i.index_id
CROSS APPLY sys.dm_db_index_physical_stats(DB_ID(), ps.object_id, ps.index_id, null, 'LIMITED') ips
ORDER BY ps.object_id, ps.index_id
And found out that I have indexes that need maintenance.
Update: Note that the engineering team has published updated guidance that codifies some of the suggestions in this answer in a more "official" Microsoft location, as some customers asked for that: SQL Server/DB Index Guidance. Thanks, Conor
Original answer:
I'll point out that most people don't need to consider rebuilding indexes in SQL Azure at all. Yes, B+ Tree indexes can become fragmented, and yes this can cause some space overhead and some CPU overhead compared to having perfectly tuned indexes. So, there are some scenarios where we do work with customers to rebuild indexes. (The primary scenario is when the customer may run out of space, currently, as disk space is somewhat limited in SQL Azure due to the current architecture). So, I will encourage you to step back and consider that using the SQL Server model for managing databases is not "wrong" but it may or may not be worth your effort.
(If you do end up needing to rebuild an index, you are welcome to use the models posted here by the other posters - they are generally fine models to script tasks. Note that SQL Azure Managed Instance also supports SQL Agent which you can also use to create jobs to script maintenance operations if you so choose).
Here are some details that may help you decide if you may be a candidate for index rebuilds:
The link you referenced is from a post in 2013. The architecture for SQL Azure was completely redone after that post. Specifically, the hardware architecture moved from a model that was based on local spinning disks to one based on local SSDs (in most cases). So, the guidance in the original post is out of date.
You can have cases in the current architecture where you can run out of space with a fragmented index. You have the option to rebuild the index, or to move to a larger reservation size for a while (which will cost more money) that supports a larger disk space allocation. [Since the local SSD space on the machines is limited, reservation sizes are roughly linked to proportions of the machine. As we get newer hardware with larger/more drives, you have more scale-up options.]
SSD fragmentation impact is relatively low compared to rotating disks, since the cost of a random IO is not really any higher than that of a sequential one. The CPU overhead of walking a few more B+ Tree intermediate pages is modest. I've usually seen an overhead of perhaps 5-20% max in the average case (which may or may not justify regular rebuilds, which themselves have a much bigger workload impact while they run).
If you are using Query Store (which is on by default in SQL Azure), you can evaluate whether a specific index rebuild visibly helps your performance or not. You can do this as a test to see if your workload improves before bothering to build and manage index rebuild operations yourself (a sketch of such a check follows these points).
Please note that there is currently no intra-database resource governance within SQL Azure for user workloads. So, if you start an index rebuild, you may end up consuming lots of resources and impacting your main workload. You can try to time things to be done off-hours, of course, but for applications with lots of customers around the world this may not be possible.
Additionally, I will note that many customers have index rebuild jobs "because they want stats to be updated". It is not necessary to rebuild an index just to rebuild its stats. In recent SQL Server and SQL Azure, the algorithm for stats updates was made more aggressive on larger tables, and the model for how cardinality is estimated when customers query recently inserted data (since the last stats update) has been changed in later compatibility levels. So it is often the case that the customer doesn't need to do any manual stats update at all.
Finally, I will note that the impact of stats being out of date was historically that you'd get plan choice regressions. For repeated queries, a lot of the impact of this was mitigated by the introduction of the automatic tuning feature over query store (which forces prior plans if it notices a large regression in query performance compared to the prior plan).
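For the Query Store point above, here is a minimal sketch of the kind of before/after check you could run (the catalog views are the standard Query Store ones; the TOP filter is just an assumption for illustration):

-- Average duration per query from Query Store runtime stats; capture this
-- before and after a trial rebuild and compare the numbers.
SELECT TOP (20)
    q.query_id,
    qt.query_sql_text,
    AVG(rs.avg_duration) AS avg_duration_us   -- microseconds
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt
    ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p
    ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON rs.plan_id = p.plan_id
GROUP BY q.query_id, qt.query_sql_text
ORDER BY avg_duration_us DESC;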
The official recommendation that I give customers is to not bother with index rebuilds unless they have a tier-1 app where they've demonstrated real need (benefits outweigh the costs) or where they are a SaaS ISV where they are trying to tune a workload over many databases/customers in elastic pools or in a multi-tenant database design so they can reduce their COGS or avoid running out of disk space (as mentioned earlier) on a very big database. In the largest customers we have on the platform, we sometimes see value in doing index operations manually with the customer, but we often do not need to have a regular job where we do this kind of operation "just in case". The intent from the SQL team is that you don't need to bother with this at all and you can just focus on your app instead. There are always things that we can add or improve into our automatic mechanisms, of course, so I completely allow for the possibility that an individual customer database may have a need for such actions. I've not seen any myself beyond the cases I mentioned, and even those are rarely an issue.
I hope this gives you some context to understand why this isn't being done in the platform yet - it just hasn't been an issue for the vast majority of customer databases we have today in our service compared to other pressing needs. We revisit the list of things we need to build each planning cycle, of course, and we do look at opportunities like this regularly.
Good luck - whatever your outcome here, I hope this helps you make the right choice.
Sincerely,
Conor Cunningham
Architect, SQL
You can use Azure Automation to schedule index maintenance tasks, as explained here: Rebuilding SQL Database indexes using Azure Automation.
Below are the steps:
1) Provision an Automation Account if you don’t have one, by going to https://portal.azure.com and selecting New > Management > Automation Account
2) After creating the Automation Account, open the details and now click on Runbooks > Browse Gallery
3) Type the word “indexes” in the search box and the runbook “Indexes tables in an Azure database if they have a high fragmentation” appears:
4) Note that the author of the runbook is the SC Automation Product Team at Microsoft. Click on Import:
5) After importing the runbook, now let’s add the database credentials to the assets. Click on Assets > Credentials and then on “Add a credential…” button.
6) Set a Credential name (that will be used later on the runbook), the database user name and password:
7) Now click again on Runbooks and then select the “Update-SQLIndexRunbook” from the list, and click on the “Edit…” button. You will be able to see the PowerShell script that will be executed:
8) If you want to test the script, just click on the “Test Pane” button and the test window opens. Enter the required parameters and click on Start to execute the index rebuild. If any error occurs, it is logged in the results window. Note that depending on the database and the other parameters, this can take a long time to complete:
9) Now go back to the editor and click on the “Publish” button to enable the runbook. If we click on “Start”, a window appears asking for the parameters. But as we want to schedule this task, we will click on the “Schedule” button instead:
10) Click on the Schedule link to create a new schedule for the runbook. I have specified once a week, but that will depend on your workload and how quickly your indexes fragment over time. You will need to tweak the schedule based on your needs, re-running the initial queries between executions to check fragmentation:
11) Now introduce the parameters and run settings:
NOTE: you can play with different schedules that use different settings, e.g. a specific schedule for a specific table.
With that, you have finished. Remember to change the Logging settings as desired:
Azure Automation is good, and the pricing is also negligible.
Some other options you have are:
1. Create an Execute SQL task and schedule it through SQL Agent. The Execute SQL task should contain the index rebuild code along with the stats rebuild (a sketch follows this list).
2. You can also create a linked server to Azure SQL and create a SQL Agent job. To create a linked server to Azure, see this SO link: I need to add a linked server to a MS Azure SQL Server
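For option 1, the body of that Execute SQL task could look something like this sketch (table and index names are placeholders; the 5%/30% thresholds are the common rule of thumb, not a requirement):

-- Reorganize lightly fragmented indexes, rebuild heavily fragmented ones,
-- then refresh statistics on the table.
ALTER INDEX IX_MyTable_Col1 ON dbo.MyTable REORGANIZE;            -- ~5-30% fragmentation
ALTER INDEX PK_MyTable ON dbo.MyTable REBUILD WITH (ONLINE = ON); -- >30% fragmentation
UPDATE STATISTICS dbo.MyTable;                                    -- stats rebuild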
As @TheGamiswar suggested, add a linked server, then create a stored procedure like this:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- Create this procedure in the remote Azure database itself (connect to it
-- directly): CREATE PROCEDURE does not accept a linked-server four-part name.
CREATE PROCEDURE [dbo].[sp_RebuildReorganizeIndexes]
AS
BEGIN
    -- Rebuild when fragmentation is heavy...
    ALTER INDEX PK_MyTable ON MyTable REBUILD WITH (STATISTICS_NORECOMPUTE = ON, ONLINE = ON);
    ALTER INDEX IX_MyTable ON MyTable REBUILD WITH (STATISTICS_NORECOMPUTE = ON, ONLINE = ON); -- nonclustered index
    -- ...or REORGANIZE instead when fragmentation is light; pick one, there is
    -- no need to rebuild and then reorganize the same index.
    ALTER INDEX PK_MyTable ON MyTable REORGANIZE;
    ALTER INDEX IX_MyTable ON MyTable REORGANIZE;
END
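A SQL Agent job step can then invoke the procedure through the linked server ('rpc out' must be enabled on the linked server; the names are the placeholders from above):

EXEC [LinkedServerName].[RemoteDB].[dbo].[sp_RebuildReorganizeIndexes];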
Then, on the server where you defined the linked server, use SQL Server Agent to create a new job and a schedule:
For details please see https://learn.microsoft.com/en-us/sql/ssms/agent/create-a-job?view=sql-server-2017
When I process a cube in SSMS and script to XMLA, I notice the following element:
<WriteBackTableCreation>UseExisting</WriteBackTableCreation>
What is the writeback table creation feature, and what does it mean for SSAS to UseExisting?
WritebackTableCreation Element (XMLA):
Determines whether a writeback table is created during the Process
operation...
UseExisting: Use the existing writeback table, if one already exists. If one does not exist, an error occurs.
Also on Specifying Processing Options:
Writeback Table Options: If writeback is enabled in the Analysis
Services project, this setting defines how writeback is handled....
For more details, see Enabling and Securing Data Entry with Analysis Services Writeback:
Why would you want to write data back to Analysis Services rather than
the relational database that provides the raw data? One reason is
latency. When you write data back to a relational database, users have
to wait until the cube is processed before the latest data becomes
available in their reports. However, when you enable writeback, users
can submit data straight into the cube in the current session, making
it instantly visible to other users of the Analysis Services database.
Check out this blog post, which describes writeback and why it would create a table (or use an existing one):
http://bimatters1403.wordpress.com/2008/02/05/ssas-2008-molap-writeback/
We're using MS Access as the GUI for one of our systems, but we've run into an issue where Access holds locks on the underlying tables or rows, which prevents SQL Server from running any update queries on this data. This is problematic because, while our Access frontend only requires read-only access to this data, we have systems in place that refresh the data at regular intervals. These refresh operations fail (or are delayed indefinitely) because Access already holds locks on the data.
This problem is illustrated by opening the Access frontend and using the sys.dm_tran_locks DMV to show locks on the data. The steps I take to reproduce the problem are:
Open the Access frontend. This shows a scrollable form with several thousand records
Use SQL server DMVs to show locks on the data. This shows 5 "object" type locks with request mode of "IS" (Intent shared). Using sys.dm_exec_requests shows the command status as "suspended" and the wait type as "ASYNC_NETWORK_IO". These locks are held as long as the user has the Access frontend open, and prevent any update/delete/truncate operations on the tables involved. Now if the user scrolls to the end of the record set in Access, the locks are released!
The second issue occurs when the user clicks through to show a single record in the frontend. When a single record is displayed onscreen, the SQL server DMVs show these locks: 3x object, 1x key, 1x page. The key is a shared lock, others are intent shared. Again, command status is suspended and wait type is ASYNC_NETWORK_IO. And these locks are held as long as the user is viewing the record
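For reference, a query along these lines surfaces those locks (the database name is a placeholder; run it in the target database so OBJECT_NAME resolves correctly):

-- Object-level locks held in the target database, with the owning session.
SELECT tl.request_session_id,
       tl.resource_type,
       tl.request_mode,
       tl.request_status,
       OBJECT_NAME(tl.resource_associated_entity_id) AS locked_object
FROM sys.dm_tran_locks AS tl
WHERE tl.resource_database_id = DB_ID('MyDatabase')
  AND tl.resource_type = 'OBJECT';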
We need to stop Access from holding these locks indefinitely. Unfortunately MS Access is not part of my skill set, so I don't know what needs to be done to fix this.
I didn't solve this problem, but a colleague did. Instead of creating linked tables that pointed directly at the SQL Server tables, he created linked tables that pointed at views. The views looked like this:
CREATE VIEW dbo.acc_tblMyTable
AS
-- NOLOCK reads without taking shared locks (dirty reads are possible).
SELECT * FROM dbo.tblMyTable WITH (NOLOCK)
No locking, and as a bonus Access treated the data as read-only.
Make sure you understand what can happen when you use NOLOCK, however: it permits dirty reads, and rows can be skipped or read twice if data moves during the scan.
Unfortunately MS Access is not part of my skill set, so I don't know what needs to be done to fix this.
Get rid of Access :)
I've been developing applications that use SQL Server as the backend for years, mostly in .NET, and I never ran into the locking (blocking) issues you are discussing. And a properly designed database should be using SQL Server's default row-level locking on updates.
It is Access that is the issue. Since once upon a time it had an internal database that it fully controlled, it continues to think that is what it has, and it behaves accordingly. Effectively it does an end run around SQL Server to do what it thinks is correct. That's not really a good thing, since Access is a file-based product, and a less than production-ready one at that. It's good for phone books or recipes and that is about all. It doesn't scale either.
I recently decided to crawl over the indexes on one of our most heavily used databases to see which were suboptimal. I generated the built-in Index Usage Statistics report from SSMS, and it's showing me a great deal of information that I'm unsure how to understand.
I found an article at Carpe Datum about the report, but it doesn't tell me much more than I could assume from the column titles.
In particular, the report differentiates between User activity and system activity, and I'm unsure what qualifies as each type of activity.
I assume that any query that uses a given index increases the '# of user X' columns. But what increases the system columns? building statistics?
Is there anything that depends on the user or role(s) of a user that's running the query?
But what increases the system columns?
building statistics?
SQL Server maintains statistics on an index (this is controlled by an option called "Auto Update Statistics", which is enabled by default). Also, sometimes an index grows or is reorganized on disk. Those things come in under system activity.
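If you want to see the raw counters that reports of this kind draw from, the underlying usage DMV separates user and system activity explicitly. A minimal sketch (run in the database of interest):

-- user_* columns count seeks/scans/lookups/updates driven by user queries;
-- system_* columns count internal operations such as statistics updates.
SELECT OBJECT_NAME(us.object_id) AS table_name,
       i.name AS index_name,
       us.user_seeks, us.user_scans, us.user_lookups, us.user_updates,
       us.system_seeks, us.system_scans, us.system_lookups, us.system_updates
FROM sys.dm_db_index_usage_stats AS us
JOIN sys.indexes AS i
    ON i.object_id = us.object_id
   AND i.index_id = us.index_id
WHERE us.database_id = DB_ID();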
Is there anything that depends on the
user or role(s) of a user that's
running the query?
You could look into using SQL Server Profiler to gather data about which users use which indexes. It allows you to save traces as a table. If you can include index usage in the trace, you could correlate it with users. I'm sure the "showplan" would include it, but that's rather coarse.
This article describes a way to collect a trace, run it through the index tuning wizard, and analyze the result.
Change Data Capture is a new feature in SQL Server 2008. From MSDN:
Change data capture provides
historical change information for a
user table by capturing both the fact
that DML changes were made and the
actual data that was changed. Changes
are captured by using an asynchronous
process that reads the transaction log
and has a low impact on the system
This is highly sweet - no more adding CreatedDate and LastModifiedBy columns manually.
Does Oracle have anything like this?
Sure. Oracle actually has a number of technologies for this sort of thing depending on the business requirements.
Oracle has had something called Workspace Manager for a long time (8i days) that allows you to version-enable a table and track changes over time. This can be a bit heavyweight, though, because it is based on views with instead-of triggers.
Starting in 11.1 (as an extra-cost option to Enterprise Edition), Oracle has Total Recall, which asynchronously mines the redo logs for data changes and logs them to a separate table; that history can then be queried using flashback query syntax on the main table. Total Recall automatically partitions and compresses the historical data, and automatically takes care of purging it after a specified retention period.
Oracle has a LogMiner technology that mines the redo logs and presents transactions to consumers. There are a number of technologies that are then built on top of LogMiner including Change Data Capture and Streams.
You can also use materialized views and materialized view logs if the goal is to replicate changes.
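For the materialized view route, a minimal sketch (table name assumed) would be:

-- Record changes to the source table so a copy can be refreshed incrementally.
CREATE MATERIALIZED VIEW LOG ON employees WITH PRIMARY KEY;

-- A fast-refreshable copy that applies only the logged changes on each refresh.
CREATE MATERIALIZED VIEW employees_copy
  REFRESH FAST ON DEMAND
  AS SELECT * FROM employees;

-- Apply pending changes, e.g. from a scheduler job.
BEGIN
  DBMS_MVIEW.REFRESH('EMPLOYEES_COPY');
END;
/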
Oracle has Change Data Notification where you register a query with the system and the resources accessed in that query are tagged to be watched. Changes to those resources are queued by the system allowing you to run procs against the data.
This is managed using the DBMS_CHANGE_NOTIFICATION package.
Here's an infodoc about it:
http://www.oracle-base.com/articles/10g/dbms_change_notification_10gR2.php
If you are connecting to Oracle from a C# app, ODP.NET (Oracle's .NET client library) can interact with Change Data Notification to alert your C# app when Oracle changes are made - pretty cool. Goodbye to polling repeatedly for data changes, if you ask me - just register the table, set up Change Data Notification through ODP.NET, and voilà: C# methods get called only when necessary. Woot!
"no more adding CreatedDate and LastModifiedBy columns manually" ... as long as you can afford to keep complete history of your database online in the redo logs and never want to move the data to a different database.
I would keep adding them and avoid relying on built-in database techniques like that. If you have a need to keep historical status of records then use an audit table or ship everything off to a data warehouse that handles slowly changing dimensions properly.
Having said that, I'll add that Oracle 10g+ can mine the log files simply by using flashback query syntax. Examples here: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_10002.htm#i2112847
This technology is also used in Oracle's Datapump export utility to provide consistent data for multiple tables.
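A minimal illustration of the flashback query syntax mentioned above (table name assumed; it requires sufficient undo retention):

-- Query the table as it existed one hour ago.
SELECT *
FROM employees AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' HOUR);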
I believe Oracle has provided auditing features since 8i; however, the tables used to capture the data are rather complex, and there is a significant performance impact when auditing is turned on.
In Oracle 8i you could only enable this for an entire database and not a table at a time, however 9i introduced Fine Grained Auditing which provides far more flexibility. This has been expanded upon in 10/11g.
For more information see http://www.oracle.com/technology/deploy/security/database-security/fine-grained-auditing/index.html.
Also, in 11g Oracle introduced Audit Vault, which provides secure storage for audit information; even DBAs cannot change this data (according to Oracle's documentation - I haven't used this feature yet). More info can be found at http://www.oracle.com/technology/deploy/security/database-security/fine-grained-auditing/index.html.
Oracle has a mechanism called Flashback Data Archive. From A Fresh Look at Auditing Row Changes:
Oracle Flashback Query retrieves data as it existed at some time in the past.
Flashback Data Archive provides the ability to track and store all transactional changes to a table over its lifetime. It is no longer necessary to build this intelligence into your application. A Flashback Data Archive is useful for compliance with record retention policies and audit reports.
CREATE TABLESPACE SPACE_FOR_ARCHIVE
DATAFILE 'C:\ORACLE DB12\ARCH_SPACE.DBF' SIZE 50G;
CREATE FLASHBACK ARCHIVE longterm
TABLESPACE space_for_archive
RETENTION 1 YEAR;
ALTER TABLE EMPLOYEES FLASHBACK ARCHIVE LONGTERM;
select EMPLOYEE_ID, FIRST_NAME, JOB_ID, VACATION_BALANCE,
VERSIONS_STARTTIME TS,
nvl(VERSIONS_OPERATION,'I') OP
from EMPLOYEES
versions between timestamp timestamp '2016-01-11 08:20:00' and systimestamp
where EMPLOYEE_ID = 100
order by EMPLOYEE_ID, ts;