Bulk Insert with database connector with different payload and queries

Bulk Insert with database connector with different payload and queries - database

I am using mule database connector to insert update in database . now i have different queries like insert and update in different table , and payload for them will be different as well . how can i achieve bulk operations in this. can i save the queries in a flow variable as list , and accordingly save the values in another list and pass it both to database flow ? will it work .
so i want to generate raw sql queries and save it to file and then use bulk execute for that . does mule provide any tostring method to just convert the query with placeholders to actual raw query ?
like i have query
update table mytable set column1 = #[payload.column1], column2 = #[payload.id]
to
update table mytable set column1 = 'stringvalue', column2 = 1234 ;

Mule's database component does support bulk operations. You can select Bulk Execute in the Operation. The implementation is descriptive when you select the operation.
With regards to making the query dynamic, you can pass the values from variables or property files, as per your convenience.

You can have stored procedure for insert and update accepting input parameters as array.Send the records in blocks inside for loop by setting batch size. This will result in less round trips.
Below is the link to article and has all the details
https://dzone.com/articles/passing-java-arrays-in-oracle-stored-procedure-fro

Related

SSIS: enrich query and table with input file as base

I need to extract data from a DB2 database to a SQL Server. I need to create my query based on a Excel file I have 176 records, which I need to create repeating queries & put in SQL server
So for example;
I have an Excel with a Number, From date, To date, and a Country
So the query should use these information from the records
SELECT *
FROM dbo.Test
WHERE Number = excel.Number1 AND Date BETWEEN excel.fromDate1 AND excel.toDate1 AND Country = excel.country1
And then another query with
SELECT *
FROM dbo.Test
WHERE Number = excel.Number2 AND Date BETWEEN excel.fromDate2 AND excel.toDate2 AND Country = excel.country2
Etc...
How should I do something like this in SSIS?
If needed I can put the DB2 and Excel data in MS SQL

You can proceed with the following approach:
Extract data rows from Excel and put it into SSIS Object Variable
Proceed with a Foreach loop to get each row from the Object Variable, parsing Object Variable to separate variables
Inject variable values into SQL Select command with Expressions
Perform Data Flow task based on SQL command, transform and put it into the target
Overall, your task seems to be feasible, but requires some knowledge on parsing Object Variable in Foreach Loop, and writing Variable Expressions.

SSIS reading from Record set instead of Database

I'm doing some data migration of a large amount of data in which I need to perform some data matching in order to identify the operation that needs to be done on the record. For that What I'm currently doing is to read the data from the source and then match the records using a SQL Command - so that I need to hit the Database twice for each record. So Will it improve the performance if I read the data to a recordset and then match the values inside that ?
I'm reading from SQL Server 2008 R2

1) using Look up transformation is one efficient way of merging records
ex:
2) Use merge procedures
ex:
MERGE [dbo].[Value] AS TARGET
USING [dbo].[view_Value] AS SOURCE
ON (
TARGET.[Col1] = SOURCE.[col1]
)
WHEN MATCHED
THEN
UPDATE SET
TARGET.[col3] = SOURCE.[col3]
TARGET.[col2] = SOURCE.[col2]
WHEN NOT MATCHED BY TARGET THEN
INSERT ([col1], [col2], [col3] )
VALUES (SOURCE.[col1], SOURCE.[col2], SOURCE.[col3] )

How can I insert a 100+k rows of XML into SQL Server 2012 in a fast way

I have the following scenario/requirements for which I am not sure what is the best way to address to perform in the fastest way possible, looking for some guidance of features to use and examples of them, if available
I will receive anywhere between 10k to 100k of entities (in XML format) from a web service that I want to upsert (some rows might exist, others might not).
here are some of the requirements:
The source of the XML is a web service that I'm calling from C# code. Actually two different methods. For one of the methods, the return schema will be something flat that I can map directly to one of my tables. For the other, it will return an XML representation that I might need to work with in C# in order to be able to map it to flat entities for my tables. In that scenario, would it be best to do the modifications needed and then write to file to an XML to use as source?
The returned XML can contain up to 150k entities in XML, that may or may not exist in my tables yet, so I'm looking to upsert them. The files, when written to disk, can weight up to 20 megabytes. I asked if they could do JSON instead of XML, but apparently that's not a choice.
The SQL database is on a different server than my IIS server, so I rather avoid having the SQL server retrieve the XML from a file, I rather pass it from C# as a string or as a Table Value Parameter.
The tables are rather simple and don't have indexes other than the PK ones.
I've never been big on XML, although it got way easier with LINQ to XML, which I was initially using to parse each record and send individual inserts but the performance was just bad, so based on some research I've been doing, I'm thinking I could use:
Upserts from SQL server through MERGE statements.
Pass the whole XML as a parameter and use OPENXML to use as source in the MERGE statement.
Or, somehow generate a Table Value Parameter in C# and pass that to SQL to use on the MERGE.
I read on this similar question (which didnt have access to upsert/merge) that instead of trying to upsert directly from the XML, that it might be better to insert everything to a temporary table and do the merge/upsert against the temporary table?
Would this work and be considerably fast?
If anyone has had a similar scenario, can you share your thoughts/ideas about what combination of features would be best?
Thanks.

You are on the right track. I have a similar setup using XML to transfer data between an online portal and the client-server application. The rest of the setup is very similar to what you have.
The fact that your tables are not indexed is a bit of a concern, if you are comparing any fields that are not PK Fields, regardless of how you index the temp tables. It is important to have either one index with all of the fields used in the merge match clause, or an index for each of them - I find the former yields better performance. Beyond that, using an XML parameter, OpenXML and temp tables is the way to go.
The following code has not been tested, so may need a bit of debugging, but it will put you on the right track. A couple of notes: If all of the fields in the OpenXML WITH clause are attributes, then you can drop the last parameter (i.e. ", 2") and field source specifiers (i.e. "#id" for the detail table). Although the data in your description is flat, in which case you will only need one table, I do often need to import into linked records. I have included a simple master-detail relationship example in the code below, just for the sake of completeness.
CREATE PROCEDURE usp_ImportFromXML (#data XML) AS
BEGIN
/*
<root>
<data>
<match_field_1>1</match_field_1>
<match_field_2>val2</match_field_2>
<data_1>val3</data_1>
<data_2>val4</data_2>
<detail_records>
<detail_data id="detailID1">
<detail_1>blah1<detail_1>
<detail_2>blah2<detail_2>
</detail_data>
<detail_data id="detailID2">
<detail_1>blah3<detail_1>
<detail_2>blah4<detail_2>
</detail_data>
</detail_records>
</data>
<data>
...
</root>
*/
DECLARE #iDoc INT
EXEC sp_xml_preparedocument #iDoc OUTPUT, #data
SELECT * INTO #temp
FROM OpenXML(#iDoc, '/root/data', 2) WITH (
match_field_1 INT,
match_field_2 VARCHAR(50),
data_1 VARCHAR(50),
data_2 VARCHAR(50)
)
SELECT * INTO #detail
FROM OpenXML(#iDoc, '/root/data/detail_data', 2) WITH (
match_field_1 INT '../../match_field_1',
match_field_2 VARCHAR(50) '../../match_field_2',
detail_id VARCHAR(50) '#id',
detail_1 VARCHAR(50),
detail_2 VARCHAR(50)
)
EXEC sp_xml_removedocument #iDoc
CREATE INDEX #IX_temp ON #temp(match_field_1, match_field_2)
CREATE INDEX #IX_detail ON #detail(match_field_1, match_field_2, detail_id)
MERGE data_table a
USING #temp ta
ON ta.match_field_1 = a.match_field_1 AND ta.match_field_2 = a.match_field_2
WHEN MATCHED THEN
UPDATE SET data_1 = ta.data_1, data_2 = ta.data_2
WHEN NOT MATCHED THEN
INSERT (match_field_1, match_field_2, data_1, data_2) VALUES (ta.match_field_1, ta.match_field_2, ta.data_1, ta.data_2)
MERGE detail_table a
USING (SELECT d.*, p._key FROM #detail d, data_table p WHERE d.match_field_1 = p.match_field_1 AND d.match_field_2 = p.match_field_2) ta
ON a.id = ta.id AND a.parent_key = ta._key
WHEN MATCHED THEN
UPDATE SET detail_1 = ta.detail_1, detail2 = ta.detail_2
WHEN NOT MATCHED THEN
INSERT (parent_key, id, detail_1, detail_2) VALUES (ta._key, ta.id, ta.detail_1, ta.detail_2)
DROP TABLE #temp
DROP TABLE #detail
END

Use (3). Process the data ready for upset in C#. C# is made for this kind of algorithmic work. It is both the right programming language as well as the faster programming language. T-SQL is not the right tool. You do not want to use XML with T-SQL for very high performance stuff because it burns CPU like crazy. Instead use the fast TDS protocol to send TVP or bulk data.
Then, send the data to the server using either a TVP or a bulk-insert (SqlBulkCopy) to a temp table. The latter technique is great for very many rows (>10k?). Bulk insert uses special TDS features. It does not use SQL batches to transfer the data. It does not get faster than this.
Then use the MERGE statement as you described. Use big batch sizes, potentially all rows in one batch.

The best way I've found is to bulk insert into a temp table from your C# code, then issue the merge once the data is in SQL Server. I have an example here on my blog SQL Server Bulk Upsert
I use this in production to insert millions of rows daily, and have yet to find a faster way to do it. Give it a try, I think you will be impressed with the performance of the solution.

SQL Server Insert failure due to XML Schema validation error

I have a XML column in a table and it is defined by a schema. I am trying to insert values into this table by using Insert into tbl1 Select * from tbl for xml. But this is failing due to schema validation failure for one of the records. But i want to insert the records which have passed the validation atleast and i can capture the others later. Can someone help me in this.

SQL server validates all dataset, not single row. If you want to validate Row-by-Row using SQL server tools, methods are:
SQLCLR (fastest) link
SSIS (easy to create) - using loop FOREACH you try to insert row into table. All failed rows are redirecting to another table.
TSQL TRY/CATCH Block - insert xml from single row to schema validated variable. Slowest one.

error when insert into linked server

I want to insert some data on the local server into a remote server, and used the following sql:
select * into linkservername.mydbname.dbo.test from localdbname.dbo.test
But it throws the following error
The object name 'linkservername.mydbname.dbo.test' contains more than the maximum number of prefixes. The maximum is 2.
How can I do that?

I don't think the new table created with the INTO clause supports 4 part names.
You would need to create the table first, then use INSERT..SELECT to populate it.
(See note in Arguments section on MSDN: reference)

The SELECT...INTO [new_table_name] statement supports a maximum of 2 prefixes: [database].[schema].[table]
NOTE: it is more performant to pull the data across the link using SELECT INTO vs. pushing it across using INSERT INTO:
SELECT INTO is minimally logged.
SELECT INTO does not implicitly start a distributed transaction, typically.
I say typically, in point #2, because in most scenarios a distributed transaction is not created implicitly when using SELECT INTO. If a profiler trace tells you SQL Server is still implicitly creating a distributed transaction, you can SELECT INTO a temp table first, to prevent the implicit distributed transaction, then move the data into your target table from the temp table.
Push vs. Pull Example
In this example we are copying data from [server_a] to [server_b] across a link. This example assumes query execution is possible from both servers:
Push
Instead of connecting to [server_a] and pushing the data to [server_b]:
INSERT INTO [server_b].[database].[schema].[table]
SELECT * FROM [database].[schema].[table]
Pull
Connect to [server_b] and pull the data from [server_a]:
SELECT * INTO [database].[schema].[table]
FROM [server_a].[database].[schema].[table]

I've been struggling with this for the last hour.
I now realise that using the syntax
SELECT orderid, orderdate, empid, custid
INTO [linkedserver].[database].[dbo].[table]
FROM Sales.Orders;
does not work with linked servers. You have to go onto your linked server and manually create the table first, then use the following syntax:
INSERT INTO [linkedserver].[database].[dbo].[table]
SELECT orderid, orderdate, empid, custid
FROM Sales.Orders
WHERE shipcountry = 'UK';

I've experienced the same issue and I've performed the following workaround:
If you are able to log on to remote server where you want to insert data with MSSQL or sqlcmd and rebuild your query vice-versa:
so from:
SELECT * INTO linkservername.mydbname.dbo.test
FROM localdbname.dbo.test
to the following:
SELECT * INTO localdbname.dbo.test
FROM linkservername.mydbname.dbo.test
In my situation it works well.
#2Toad: For sure INSERT INTO is better / more efficient. However for small queries and quick operation SELECT * INTO is more flexible because it creates the table on-the-fly and insert your data immediately, whereas INSERT INTO requires creating a table (auto-ident options and so on) before you carry out your insert operation.

I may be late to the party, but this was the first post I saw when I searched for the 4 part table name insert issue to a linked server. After reading this and a few more posts, I was able to accomplish this by using EXEC with the "AT" argument (for SQL2008+) so that the query is run from the linked server. For example, I had to insert 4M records to a pseudo-temp table on another server, and doing an INSERT-SELECT FROM statement took 10+ minutes. But changing it to the following SELECT-INTO statement, which allows the 4 part table name in the FROM clause, does it in mere seconds (less than 10 seconds in my case).
EXEC ('USE MyDatabase;
BEGIN TRY DROP TABLE TempID3 END TRY BEGIN CATCH END CATCH;
SELECT Field1, Field2, Field3
INTO TempID3
FROM SourceServer.SourceDatabase.dbo.SourceTable;') AT [DestinationServer]
GO
The query is run on DestinationServer, changes to right database, ensures the table does not already exist, and selects from the SourceServer. Minimally logged, and no fuss. This information may already out there somewhere, but I hope it helps anyone searching for similar issues.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight