Update process CUBE - sql-server

I am using the SSIS Analysis Services Processing Task to process a cube with the Process Update option.
The source is a transactional Type 1 table, so when an update arrives for an existing record, the source deletes the old record and inserts a new one. Process Update then fails with a missing-key error, because the key of the deleted record no longer exists at the source.
For example, the source table holds 1000 records and the cube has processed 1000 records. Five records are then deleted and five new ones inserted, and processing the cube fails with a key error.
We could use Process Full to avoid this problem, but that has a performance hit.
How can we do this using Process Update only?
thanks
prav

A quick fix can be applied in the Analysis Services deployment:
Select Process > Change Settings… > Dimension key errors
Choose Use custom error configuration
In Stop on Error, ensure that On Error Action is set to "Stop Logging".
In Specific error conditions make sure that "Report and Continue" is selected.
Click OK and the cube will be reprocessed ignoring this error.

If there are deleted records in the source table, you have to process the cube/measure group/partition using full mode.

Related

SSIS: Truncate table statement causing LCK_M_SCH_S lock on the table

I have an SSIS package that consists of 2 main blocks inside a Begin and Commit/Rollback transaction:
1. Truncate tables (with a TRUNCATE TABLE query)
2. Import Data (import data from a flat file and insert it into these truncated tables)
When I run the package, the job hangs. The Activity Monitor shows an LCK_M_SCH_S lock that blocks further execution.
Sometimes this works and sometimes it does not.
If I truncate these tables separately and run the package without the truncate block, it executes fine.
Also, there is not just one Import Data component. We have around 6 Import Data components for 6 different tables. For the time being I kept only one in the screenshot.
Looking at your screenshots, the first thing I'd verify is that the property RetainSameConnection is set to true on your OLE DB Connection manager (right click on CM, select Properties, find RetainSameConnection). The default for this is False.
If that resolves the issue, then the root cause was you had two requests in different transactions attempting to modify the same resource.
If you had already switched the Connection Manager's property to true, then my next guess would be that you want to set the DelayValidation property for the Data Flow "Import Data" to True.
If that resolves the issue, then the root cause was the component was attempting to validate the metadata for the target table and was getting blocked by the truncate statement (or vice versa). Setting DelayValidation will prevent the package from validating that specific task until the last possible second - giving other processes time to get out of the way. This seems less likely but it's the only other opportunity for the package to be blocking itself.

SSIS Deadlock with a Slowly Changing Dimension

I am running an SSIS package that contains many (7) reads from a single flat file uploaded from an external source. There is consistently a deadlock in every environment (Test, Pre-Production, and Production) on one of the data flows, which uses a Slowly Changing Dimension to update an existing SQL table with both new and changed rows.
I have three groups coming off the SCD:
-Inferred Member Updates Output goes directly to an OLE DB Update command.
-Historical Attribute goes to a Derived Column box that sets a delete date, then to an OLE DB update command, then to a union box where it unions with the last group, New Output.
-New Output goes into a union box along with the Historical output then to a derived column box that adds an update/create date, then inserts the values into the same SQL table as the Inferred Member Output DB Command.
The only error I am getting in my log looks like this:
"Transaction (Process ID 170) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction."
I could put the (NOLOCK) statement into the OLE db commands, but I have read that this isn't the way to go.
I am using SQL Server 2012 Data Tools to investigate and edit the Package, but I am unsure where to go from here to find the issue.
I want to put it out there that I am a novice in terms of SSIS programming. With that out of the way, any help would be greatly appreciated, even if it is just pointing me to a place I haven't looked for help.
Adding an index on the column used in the WHERE condition of the update commands may resolve your issue. With the index in place, each transaction executes faster and holds its locks for less time, which reduces the chance of a deadlock.
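To make the indexing suggestion concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The table dim_customer, the column business_key, and the index name are hypothetical, and SQLite's locking differs from SQL Server's, so this only illustrates how an index turns a full scan into a keyed lookup for the UPDATE; it does not reproduce the deadlock.

```python
import sqlite3

def plan_for_update(conn):
    """Return SQLite's query-plan text for an UPDATE filtered on business_key."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "UPDATE dim_customer SET city = ? WHERE business_key = ?",
        ("Oslo", 42),
    ).fetchall()
    # The human-readable plan description is the fourth column of each row.
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
# Hypothetical dimension table; only the surrogate id is indexed at first.
conn.execute(
    "CREATE TABLE dim_customer ("
    "id INTEGER PRIMARY KEY, business_key INTEGER, city TEXT)"
)

plan_before = plan_for_update(conn)  # expected to describe a table scan

# Index the column that the UPDATE filters on.
conn.execute(
    "CREATE INDEX ix_dim_customer_business_key ON dim_customer (business_key)"
)

plan_after = plan_for_update(conn)  # expected to mention the new index

print("before:", plan_before)
print("after: ", plan_after)
```

On SQL Server the equivalent check would be to compare the execution plans of the OLE DB Command statements before and after creating the index.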

Ignore duplicate records in SSIS' OLE DB destination

I'm using an OLE DB Destination to populate a table with values from a web service.
The package will be scheduled to run in the early AM for the prior day's activity. However, if this fails, the package can be executed manually.
My concern is that if the operator chooses a date range that overlaps existing data, the whole package will fail (verified).
I would like it to:
INSERT the missing values (works as expected if no duplicates)
ignore the duplicates; not cause the package to fail; raise an exception that can be captured by the windows application log (logged as a warning)
collect the number of successfully-inserted records and number of duplicates
If it matters, I'm using Data access mode = Table or view - fast load and
Suggestions on how to achieve this are appreciated.
That's not a feature.
If you don't want errors (duplicates), then you need to defend against them, much as you'd do in your favorite language. Instead of relying on error handling, you test for the existence of the error-inducing thing (a Lookup Transform to identify whether the row exists in the destination) and then filter the duplicates out (Redirect No Match Output).
The technical solution you absolutely should not implement
Change the access mode from "Table or View Name - Fast Load" to "Table or View Name". This changes the method of insert from a bulk/set-based operation to singleton inserts. By inserting one row at a time, this allows the SSIS package to evaluate the success or failure of each row's save. You then need to go into the advanced editor, as in your screenshot, and change the Error disposition from Fail Component to Ignore Failure.
This solution should not be used, as it yields poor performance, generates unnecessary workload, and has the potential to mask other save errors beyond just "duplicates", such as referential integrity violations.
Here's how I would do it:
1. Point your SSIS Destination to a staging table that will be empty when the package is run.
2. Insert all rows into the staging table.
3. Run a stored procedure that uses SQL to import records from the staging table to the final destination table, WHERE the records don't already exist in the destination table.
4. Collect the desired meta-data and do whatever you want with it.
5. Empty the staging table for the next use.
(Those last 3 steps would all be done in the same stored procedure).
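The staging-table steps above can be sketched roughly as follows. This is an illustration only: it uses Python's built-in sqlite3 module in place of SQL Server and a stored procedure, and the table names activity and staging_activity, their columns, and the function load_via_staging are all hypothetical. It also assumes the incoming batch contains no repeated keys within itself.

```python
import sqlite3

def load_via_staging(conn, incoming_rows):
    """Load rows through a staging table; return (inserted, duplicates)."""
    cur = conn.cursor()

    # Land every incoming row in the (empty) staging table.
    cur.executemany(
        "INSERT INTO staging_activity (activity_id, amount) VALUES (?, ?)",
        incoming_rows,
    )
    staged = cur.execute("SELECT COUNT(*) FROM staging_activity").fetchone()[0]
    before = cur.execute("SELECT COUNT(*) FROM activity").fetchone()[0]

    # Copy only the rows whose key is not already in the destination.
    cur.execute(
        "INSERT INTO activity (activity_id, amount) "
        "SELECT s.activity_id, s.amount FROM staging_activity AS s "
        "WHERE NOT EXISTS ("
        "    SELECT 1 FROM activity AS a WHERE a.activity_id = s.activity_id)"
    )
    after = cur.execute("SELECT COUNT(*) FROM activity").fetchone()[0]

    # Collect the counts the question asks for.
    inserted = after - before
    duplicates = staged - inserted

    # Empty the staging table for the next run.
    cur.execute("DELETE FROM staging_activity")
    conn.commit()
    return inserted, duplicates

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE activity (activity_id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE staging_activity (activity_id INTEGER, amount REAL);
    INSERT INTO activity VALUES (1, 10.0), (2, 20.0);
    """
)

# Row 2 overlaps existing data; rows 3 and 4 are new.
result = load_via_staging(conn, [(2, 20.0), (3, 30.0), (4, 40.0)])
print(result)  # (2, 1)
```

The duplicate count returned here is what the package could then log as a warning instead of failing.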

How to control which rows were sent via SSIS

I'm trying to create an SSIS package that will periodically send data to another database. I want to send only new records (I need to keep the sent records), so I created a status column in my source table.
I want my package to update this column after successfully sending the data, but I can't update all rows with the "unsent" status, because some rows may have been added during package execution. I also can't use transactions (I mean at isolation levels that would solve my problem: I can't use Serializable because I mustn't prevent users from adding new rows, and the Sequence Container doesn't support Snapshot).
My next idea was to use a recordset and, after sending the data to the other database, use it to get the IDs of the sent rows, but I couldn't find a way to use it as a data source.
I don't think I should set the status to "to send" and then update it to "sent"; I believe it would be too costly.
Now I'm thinking about using a temporary table, but I'm not convinced that this is the right way to do it. Am I missing something?
Recordset is a destination. You cannot use it as a source in a Data Flow task.
But since the data is saved to a variable, it is available in the Control Flow.
After the Data Flow completes, go to the Control Flow and create a Foreach Loop container that iterates over the recordset variable.
Read each recordset value into a variable and use it to run an update query.
Also, see if the Lookup Transform can be useful to you. It can output the rows that match and the rows that don't match.
I will improve the answer based on discussions
What you have here is a very typical data mirroring problem. To start with, I would not simply have a boolean that signifies that a record was "sent" to the destination (mirror) database. At the very least, I would put a LastUpdated datetime column in the source table, and have triggers on that table, on insert and update, that put the system date into that column. Then, every day I would execute an SSIS package that reads the records updated in the last week, checks to see if those records exist in the destination, splitting the datastream into records already existing and records that do not exist in the destination. For those that do exist, if the LastUpdated in the destination is less than the LastUpdated in the source, then update them with the values from the source. For those that do not exist in the destination, insert the record from the source.
It gets a little more interesting if you also have to deal with record deletions.
I know it may seem wasteful to read and check a week's worth, every day, but your database should hardly feel it, it provides a lot of good double checking, and saves you a lot of headaches by providing a simple, error tolerant algorithm. Some record does not get transferred because of some hiccup on the network, no worries, it gets picked up the next day.
I would still set up the SSIS package as a server task that sends me an email with any errors, so that I can keep track. Most days you get no errors, and when there are errors, you can wait a day or resolve the cause and let the next day's run pick up the problems.
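The LastUpdated approach described above might look something like the following sketch. It is not an SSIS package: it uses Python's built-in sqlite3 module with two in-memory databases standing in for the source and the mirror, and the table records, its columns, and the function mirror_recent are hypothetical names. Timestamps are stored as ISO-8601 strings in one consistent format, so plain string comparison orders them correctly.

```python
import sqlite3
from datetime import datetime, timedelta

def mirror_recent(src, dst, now, window_days=7):
    """Copy rows changed in the last window_days; return (inserted, updated)."""
    cutoff = (now - timedelta(days=window_days)).isoformat()
    inserted = updated = 0
    recent = src.execute(
        "SELECT id, payload, last_updated FROM records WHERE last_updated >= ?",
        (cutoff,),
    ).fetchall()
    for rec_id, payload, last_updated in recent:
        existing = dst.execute(
            "SELECT last_updated FROM records WHERE id = ?", (rec_id,)
        ).fetchone()
        if existing is None:
            # Not in the destination yet: insert it.
            dst.execute(
                "INSERT INTO records (id, payload, last_updated) VALUES (?, ?, ?)",
                (rec_id, payload, last_updated),
            )
            inserted += 1
        elif existing[0] < last_updated:
            # Destination copy is older than the source: refresh it.
            dst.execute(
                "UPDATE records SET payload = ?, last_updated = ? WHERE id = ?",
                (payload, last_updated, rec_id),
            )
            updated += 1
    dst.commit()
    return inserted, updated

schema = "CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT, last_updated TEXT)"
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
src.execute(schema)
dst.execute(schema)

now = datetime(2024, 1, 15)
src.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        (1, "a", datetime(2024, 1, 14).isoformat()),    # new, inside the window
        (2, "b2", datetime(2024, 1, 13).isoformat()),   # changed, inside the window
        (3, "old", datetime(2023, 12, 1).isoformat()),  # outside the window
    ],
)
dst.execute(
    "INSERT INTO records VALUES (?, ?, ?)",
    (2, "b1", datetime(2024, 1, 10).isoformat()),
)

first_run = mirror_recent(src, dst, now)
second_run = mirror_recent(src, dst, now)
print(first_run, second_run)  # (1, 1) (0, 0)
```

Because a second run over the same window changes nothing, re-reading a week's worth every day is safe, which is the error tolerance the answer is relying on.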
I am doing a similar thing; in my case, I have a status on the source record.
1. I read in all records with a status of new.
2. Then use an OLE DB Command to execute SQL on each row, changing the status to "In progress" (in your WHERE clause, enter a ? as the value in the Component Properties tab, and you can configure it as a parameter from the table row, like an ID or some PK, in the Column Mappings tab).
3. Once the records are processed, you can change all "In Progress" records to "Success" or something similar using another OLE DB Command.
Depending on what you are doing, you can use the status to mark records that errored at some point, and require further attention.
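As a rough illustration of this status-based flow, the sketch below uses Python's built-in sqlite3 module rather than SSIS. The tables orders and orders_copy, the status values, and the function send_new_rows are hypothetical, and the rows are claimed with one set-based UPDATE instead of a per-row OLE DB Command. The point it shows is that only rows claimed as in progress are later marked as sent, so rows added while the transfer runs keep their new status for the next run.

```python
import sqlite3

def send_new_rows(src, dst):
    """Send rows with status 'new'; mark only those rows 'sent'; return their ids."""
    # Claim the rows that exist right now.
    src.execute("UPDATE orders SET status = 'in_progress' WHERE status = 'new'")
    src.commit()
    claimed = src.execute(
        "SELECT id, item FROM orders WHERE status = 'in_progress' ORDER BY id"
    ).fetchall()

    # Send the claimed rows to the other database.
    dst.executemany("INSERT INTO orders_copy (id, item) VALUES (?, ?)", claimed)
    dst.commit()

    # Flip only the claimed rows; anything inserted meanwhile is still 'new'.
    src.execute("UPDATE orders SET status = 'sent' WHERE status = 'in_progress'")
    src.commit()
    return [row[0] for row in claimed]

src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, status TEXT)")
dst.execute("CREATE TABLE orders_copy (id INTEGER PRIMARY KEY, item TEXT)")
src.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "a", "new"), (2, "b", "new"), (3, "c", "sent")],
)
src.commit()

first_batch = send_new_rows(src, dst)

# A row that arrives later is picked up by the next run only.
src.execute("INSERT INTO orders VALUES (4, 'd', 'new')")
src.commit()
second_batch = send_new_rows(src, dst)

print(first_batch, second_batch)  # [1, 2] [4]
```

If the send step fails, the claimed rows remain in progress, which gives a place to look for records that need further attention.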

data migration in informatica

A large amount of data is coming from source to target. After a successful insertion into the target, we have to change the status of every row to "committed". But how will we know whether all of the data has arrived in the target without directly querying the source?
For example, suppose 10 records have migrated from source to target.
We cannot change the status of all the records to "committed" before all records have been successfully inserted into the target.
So before changing the status of all the records, how will we know whether an 11th record is coming or not?
Is there anything that will give me the total number of records in the source?
I need a real-time based answer.
We had the same scenario, and this is what we did.
First of all, to check whether data is loaded in the target, you can join the source and target tables. The update will lock the rows, so the commit must be fired at the database level on the target table (so that the lock for the update can happen).
After joining, update the loaded data based on the join with the target column.
A few things:
You have to stop your session (we used pmcmd to stop the session in a Command task).
Update the data in your source table and restart the session.
Keep the load to a counter of 20k-30 rows so the update goes smoothly.
