We have an indexed view that spans three large tables. Two of these tables (A & B) are constantly updated with user transactions, and the other table (C) contains product data that needs to be updated once a week. This product table contains over 6 million records.
We need this view across these three tables for our core business process, and unfortunately we cannot change this aspect. We even had a SQL Server MVP come in to help test under load to make sure we have the most efficient configuration. There is one column in the product table that is used in the view and has to be updated each week.
The problem we are now encountering is that as the volume of transactions against tables A & B increases, the weekly update to table C is causing deadlocks.
I have tried several different methods to no avail:
1) I was hoping that we could change the view so that table C could be a dirty read "WITH (NOLOCK)", but apparently that functionality is not available with indexed views.
2) I thought about updating a new column in Table C and then just renaming it when the process is done, but you cannot do that due to the dependency in the view.
3) I also entertained the idea of writing this value to a temporary product table, and then running an ALTER statement against the view to have it point to my new table. However, when I did that the indexes on my view were dropped and it took quite a bit of time to recreate them.
4) We tried to do the weekly update in small chunks (as small as 100 records at a time, roughly as sketched below) but we still ran into deadlocks.
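For reference, the chunked update from (4) was roughly along these lines (table and column names are simplified, and the staging value is assumed to be NOT NULL):

-- Roughly the batching pattern from attempt (4); names are illustrative only.
DECLARE @rows INT;
SET @rows = 1;

WHILE @rows > 0
BEGIN
    UPDATE TOP (100) C
    SET    C.WeeklyValue = S.WeeklyValue
    FROM   dbo.ProductC AS C
    JOIN   dbo.WeeklyStaging AS S
           ON S.ProductId = C.ProductId
    WHERE  C.WeeklyValue IS NULL
       OR  C.WeeklyValue <> S.WeeklyValue;   -- only touch rows not yet updated

    SET @rows = @@ROWCOUNT;
END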
Questions:
a) We are using SQL Server 2005. Does SQL Server 2008 have new functionality for indexed views that would help us? Is there now a way to do dirty reads with an indexed view?
b) Is there a better approach to altering an existing view to point to a new table?
thanks!
The issue you're experiencing is that adding the indexed view across the three tables is causing lock contention. There is a really good post about the issue here: http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/06/02/be-ready-to-drop-your-indexed-view.aspx
Partitioning the table might provide some relief, although I don't know whether partitioning will circumvent the lock issue. You will have to upgrade to 2008 if you want to investigate this option, however, as you need to use partition-aligned indexed views. 2005 will require you to drop the view before you swap any partitions in or out.
More information about partition-aligned indexed views: http://msdn.microsoft.com/en-us/library/dd171921.aspx
Have you considered making C a partitioned table and swapping a partition in/out as your price update mechanism? I'm not sure how that would work with an indexed view - I would think the index needs to be rebuilt at that point. I think this is probably the same situation you are seeing with your ALTER VIEW attempt, actually.
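If you do end up on 2008 with a partition-aligned indexed view, the swap itself would look roughly like this (table and partition names are purely illustrative, and both tables must share the same partition scheme and indexes):

-- Illustrative only: move the old weekly data out, then switch the freshly
-- loaded staging partition into table C without dropping the indexed view.
ALTER TABLE dbo.ProductC SWITCH PARTITION 3 TO dbo.ProductC_Old PARTITION 3;
ALTER TABLE dbo.ProductC_Staging SWITCH PARTITION 3 TO dbo.ProductC PARTITION 3;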
Is the indexed view really necessary? i.e. could appropriate indexes on the 3 underlying tables perform just as well when a normal view is used? Remember that the indexed view may have to be updated on key changes to any of the three tables, while an index on a single table only has to be updated if a key changes or data moves in just that table. Typically indexed views are indexed on different columns than the base tables, because they represent a different cross-section of the data than is available in the underlying tables - does that description really apply here?
How long does the pricing update take? This would appear to be the core of your problem, but it's hard to say without more information.
We can try this to avoid locking:
SELECT a,b,c FROM indexedview as v WITH (NOEXPAND,NOLOCK) WHERE ...
Sorry for the longish description... but here we go...
We have a fact table somewhat flattened with a few properties that you might have put in a dimension in a more "classic" data warehouse.
I expect to have billions of rows in that table.
We want to enrich these properties with some cleansing/grouping that would not change often, but would still change from time to time.
We are thinking of keeping this initial fact table as the "master" that we never update or delete from, and making an "extended fact" table copy of it where we just add the new derived properties.
The process of generating these extended property values requires mapping to some sort of lookup table, from which we get several possibilities for each row, and then selecting the best one (one per initial row).
This is likely to be processor intensive.
QUESTION (at last!):
Imagine my lookup table is modified and I want to re-assess the extended properties for only a subset of my initial fact table.
I would end up with a few million rows I want to MODIFY in the target extended fact table.
What would be the best way to achieve this update? (updating a couple of million rows within a table of a couple of billion rows)
Should I write an UPDATE statement with a join?
Would it be better to DELETE this million rows and INSERT the new ones?
Any other way, like creating a new extended fact table with only the appropriate INSERTs?
Thanks
Eric
PS: I come from a SQL Server background where DELETE can be slow
PPS: I still love SQL Server too! :-)
Write performance for Snowflake vs. a traditional RDBMS behaves quite differently. All your tables persist in S3, and S3 does not let you rewrite only selected bytes of an existing object; the entire file object must be uploaded and replaced. So, while in, say, SQL Server data and indexes are modified in place, creating new pages as necessary, an UPDATE/DELETE in Snowflake is a full sequential scan of the table file, creating an immutable copy of the original with the applicable rows filtered out (delete) or modified (update), which then replaces the file just scanned.
So, whether you update 1 row or 1M rows, at minimum the entirety of the micro-partitions that the modified data lives in will have to be rewritten.
I would take a look at the MERGE command, which allows you to insert, update, and delete all in one command (effectively applying the differential from table A into table B). Among other things, it should keep your Time Travel costs down versus constantly wiping and rewriting tables. Another consideration is that since Snowflake is column oriented, a column update in theory should only require operations on the S3 files for that column, whereas an insert/delete would replace all S3 files for all columns, which would lower performance.
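A rough sketch of what that MERGE could look like for your case (table and column names are made up for illustration):

-- Illustrative only: apply a few million re-assessed rows from a staging table
-- into the extended fact table in one statement instead of DELETE + INSERT.
MERGE INTO extended_fact tgt
USING recomputed_rows src
   ON tgt.fact_id = src.fact_id
WHEN MATCHED THEN UPDATE SET
     derived_prop_a = src.derived_prop_a,
     derived_prop_b = src.derived_prop_b
WHEN NOT MATCHED THEN INSERT
     (fact_id, derived_prop_a, derived_prop_b)
     VALUES (src.fact_id, src.derived_prop_a, src.derived_prop_b);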
Is there something similar to master-slave replication, but at the table level within the database?
For example, I have the following scenario:
I have a table with millions of records; the reason is that the system is more than 15 years old.
I only want to show the records of the last year (2019-2020).
I decided to create a view that only shows the records of that range (1 year) from the table that contains millions of records.
Thanks to the view, that system screen loads faster, since it has fewer records to load.
The problem: what if the user adds a new record to the table that contains millions of records? How do I make my view reflect the change when the underlying table is modified?
I think I could use triggers to update the view, but is there functionality in Oracle that gives me something similar to what I just asked (master-slave), where the "slave" table is updated as the "master" table changes?
First of all, you have misunderstood views. A view is not a physical table and does not store any data. If you insert data into a view, you are actually inserting into the source table.
Since the view is not physical, you are just filtering the data. This does not have any performance benefit.
For big tables, you can use partitioning, which can drastically improve performance. And if you still need archival, you can archive the partitioned data.
Partitioning is generally the best method, because you can typically archive old data by simply doing an "exchange" command.
Data doesn't "move" in that scenario, it simply gets 'detached' from the table via data dictionary manipulation.
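As a sketch of that exchange (partition and table names are illustrative):

-- Illustrative only: detach an old partition by swapping it with an archive
-- table of the same structure; no data is physically moved.
ALTER TABLE big_table
  EXCHANGE PARTITION p_2005
  WITH TABLE big_table_archive
  INCLUDING INDEXES
  WITHOUT VALIDATION;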
Is there something similar to master-slave replication, but at the table level within the database?
If you are asking about master/slave replication at the table level, then,
I suppose, the table/materialized view relationship is appropriate to call master-slave. Quoting from the Oracle docs:
A materialized view is a database object that contains the results of a query. The FROM clause of the query can name tables, views, and other materialized views. Collectively these objects are called master tables (a replication term)...
When you need to "update" or, more appropriately, refresh the mview, you have different options:
refresh the mview periodically, on a schedule
refresh the mview each time the data in the master table is changed and committed
refresh manually by calling DBMS_MVIEW.REFRESH or DBMS_SNAPSHOT.REFRESH
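For example, creating an mview and refreshing it on demand (the last option) might look roughly like this (names are illustrative):

-- Illustrative only: an mview holding just the recent slice of the master table.
CREATE MATERIALIZED VIEW recent_records_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT *
FROM   big_table
WHERE  record_date >= DATE '2019-01-01';

-- Manual refresh, as in the last option above:
EXEC DBMS_MVIEW.REFRESH('RECENT_RECORDS_MV');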
An mview can be faster than a view because each time you select from an mview, you are selecting from a separate "table" which was replicated from the original one. Especially if you have complex logic in the SQL, you can put that logic into the mview definition.
The drawbacks are that you need extra disk space for the mview, and there will be a delay in refreshing the data.
I just started my new job and after looking at the DB I was shocked. It's a huge mess.
Now the first thing I wanted to do is get some consistency into the order of table columns. We publish new database versions via a .dacpac. My co-worker told me that changing the order of columns would force MSSQL to create a temporary table which stores all the data. MSSQL then creates a new table and inserts all the data into that table.
So let's say my server only has 2GB of RAM and 500MB of storage left on the hard drive. The whole database weighs 20GB. Is it possible that changing the order of columns will cause trouble (memory related)? Is my co-worker's statement correct?
I couldn't find any good source for my question.
Thanks!
You should NOT "go one table by one".
You should leave your tables as they are; if you don't like the column order of some table, just create a view reordering the columns as you want (see the sketch at the end of this answer).
Not only will changing the order of columns cause your tables to be recreated, all the indexes will be recreated as well, and you'll get problems with FK constraints.
And after all that, you'll gain absolutely nothing and only do damage. You'll waste server resources, make your tables temporarily inaccessible, and the columns will not be stored in the order you defined anyway; internally they are stored in "var-fix" format (divided into fixed-length and variable-length portions).
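For example (illustrative names), a view gives you the presentation order you want without touching the table:

-- Illustrative only: expose the columns in whatever order you prefer.
CREATE VIEW dbo.Customer_Ordered
AS
SELECT CustomerId, LastName, FirstName, CreatedDate
FROM   dbo.Customer;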
I use SSMS 2016. I have a view that returns a few million records. The view is not indexed, and should not be, as the underlying data is updated (insert, delete, update) every 5 minutes by a job on the server in order to present up-to-date data sets to the calling client application in the GUI.
The view does a very heavy volume of converting INT values to VARCHAR and appending string values to them.
The view also does some CAST operations on NULLs, assigning them column name aliases. And the worst performance hit is that the view uses the FOR XML PATH('') function on 20 columns.
Also, the view uses two CTEs as the source, as well as subqueries to define a single column value.
I made sure I created the right indexes (clustered, nonclustered, composite, and covering) that are used in the view's SELECT, JOIN, and WHERE clauses.
The Database Tuning Advisor also has not suggested anything that could substantially improve performance.
As a workaround, I decided to create two identical physical tables with clustered indexes on each and keep them updated using a MERGE statement (further wrapped into a stored procedure and then into a SQL Server Agent job). To ensure there is no long locking of the view, I will then swap (rename) the table names immediately after each merge finishes. So in this case all the heavy workload falls onto a SQL Server Agent job that keeps the tables updated.
The problem is that the merge will take roughly 15 minutes at the current size of the data, which may increase in the future. So I need a real-time design to ensure that the view has the most up-to-date information.
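The swap I have in mind is roughly this (table names are illustrative):

-- Illustrative only: after the MERGE into the staging table finishes, swap the
-- names inside a short transaction so readers always see a complete table.
BEGIN TRAN;
    EXEC sp_rename 'dbo.ReportData',       'ReportData_Old';
    EXEC sp_rename 'dbo.ReportData_Stage', 'ReportData';
    EXEC sp_rename 'dbo.ReportData_Old',   'ReportData_Stage';
COMMIT TRAN;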
Any ideas?
I am currently using MS SQL Server 2008 but I'm not sure if it is the best system for this particular task.
I have a single table like so:
PK_ptA PK_ptB DateInserted LookupColA LookupColB ... LookupColF DataCol (ntext)
A common query is
SELECT TOP(1000000) DataCol FROM table
WHERE LookupColA=x AND LookupColD=y AND LookupColE=z
ORDER BY DateInserted DESC
The table has about a billion rows with 5 million inserted per day.
My main problem with SQL Server is that it isn't too easy to shard or spread out the data files. Also, exporting seems to max out at 1,000 rows per second (about 1MB/s), which seems very slow.
Another problem I have with SQL Server is that if I want to add a new LookupCol, the log file grows enormously, requiring a large amount of rarely used free space on tap.
Are there any obvious better solutions for this problem?
You have a problem, and it is not SQL Server. Let me also set aside, for now, that you seem to have a bad table design.
Spreading data files is actually pretty easy. REORGANIZING them later is not that easy, but it is doable. How is your table, filegroup and file layout set up?
Exporting 1MB per second is a joke. Seriously. I have been handling 150-million-row files in minutes - that comes down to a LOT more than 60,000 rows per minute. Something is freaking out. Temp space? Did you do a performance analysis? How does the hardware look?
Nothing will help with the log usage. Basically, like most professional databases, the log contains all changed database pages during a transaction. Adding a field changes ALL pages.
You should:
Redesign the database (use a view to keep the same old table shape in place if you have to) so that it does not have "LookupColA" etc., but is normalized (a LookupValue, and a LookupTable that is coded by "column"). This way you get additional fields instantly. This turns into a data-warehouse-like star schema.
Do a performance analysis. It looks like you have some problems.
Definitely tell us about your hardware ;)
This problem here is definitely NOT SQL Server; it is related to bad table design AND - possibly - insufficient or badly utilized hardware.
OK, the table design (separate answer). The lookups are basically lookup tables.
So....
LookupTable, with as fields:
  pk (int)
  TableType
  Value

ValueTable
  pk

ValueLookupMap table
  pk of ValueTable entry
  pk of LookupTable entry
So, basically, if you add a lookup "field", you just create a set of entries in the LookupTable and then add entries in the ValueLookupMap.
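A rough DDL sketch of that layout (names and types are illustrative):

-- Illustrative only: a generic lookup design so a new lookup "field" needs
-- no schema change, just new rows in LookupTable and ValueLookupMap.
CREATE TABLE dbo.LookupTable (
    pk        INT IDENTITY PRIMARY KEY,
    TableType VARCHAR(50)  NOT NULL,   -- which logical "column" this value belongs to
    Value     VARCHAR(200) NOT NULL
);

CREATE TABLE dbo.ValueTable (
    pk           BIGINT IDENTITY PRIMARY KEY,
    DateInserted DATETIME NOT NULL,
    DataCol      NVARCHAR(MAX)          -- the payload column from the original table
);

CREATE TABLE dbo.ValueLookupMap (
    ValueTable_pk  BIGINT NOT NULL REFERENCES dbo.ValueTable (pk),
    LookupTable_pk INT    NOT NULL REFERENCES dbo.LookupTable (pk),
    PRIMARY KEY (ValueTable_pk, LookupTable_pk)
);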