SQL - Bulk Insert and Data Types - sql-server

Today I have a bulk insert from a fixed-width file like this:
BULK INSERT #TBF8DPR501
FROM 'C:\File.txt' WITH (
FORMATFILE = 'C:\File.txt.xml'
,ROWTERMINATOR = '\n'
)
The format file is just there to set the width of each field, and after the bulk insert into the temp table I created an INSERT INTO X SELECT FROM temp to convert the columns that the bulk insert cannot convert.
My question is: is it possible to have the bulk insert itself convert values such as:
dates in the format dd.MM.yyyy or ddMMyyyy
decimal values like 0000000000010022 (which should become 100.22)
without first bulk inserting into a temp table and converting the values from there?

No, it isn't: BULK INSERT simply copies data as fast as possible; it doesn't transform the data in any way. Your current solution with a temp table is a very common one used in data warehousing and reporting scenarios, so if it works the way you want I would just keep using it.
If you do want to do the transformation during the load, then you could use an ETL tool such as SSIS. But there is nothing wrong with your current approach and SSIS would be a very 'heavy' alternative.
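For reference, here is a minimal sketch of that temp-table conversion step for the two values in the question; the target table and column names (TargetTable, DateCol, AmountCol, RawDate, RawAmount) are made up for illustration:
-- Hypothetical conversion from the staging table loaded by BULK INSERT
INSERT INTO dbo.TargetTable (DateCol, AmountCol)
SELECT
    -- ddMMyyyy -> yyyyMMdd so style 112 parses it unambiguously
    -- (for dd.MM.yyyy you could use CONVERT(DATE, RawDate, 104) instead)
    CONVERT(DATE, SUBSTRING(RawDate, 5, 4) + SUBSTRING(RawDate, 3, 2) + SUBSTRING(RawDate, 1, 2), 112),
    -- '0000000000010022' -> 100.22 (the last two digits are implied decimals)
    CAST(RawAmount AS DECIMAL(18, 2)) / 100
FROM #TBF8DPR501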

Related

How to parse string into multiple tables in SQL Server 2017

I have a text file that was created by dumping 8 SQL tables into it. Now I need to import this data back into SQL Server.
Using BULK INSERT I was able to load the data into one table with a single column, 'FileData'.
DECLARE @FileTable TABLE (FileData NVARCHAR(MAX))
INSERT INTO @FileTable
SELECT BulkColumn
FROM OPENROWSET( BULK N'C:\My\Path\Name\FileName.txt', SINGLE_CLOB) AS Contents
SELECT * FROM @FileTable
So now I have this huge string that I need to organize into different tables.
For example, this part of the string corresponds to one of the target tables:
FileData
00001 00000009716496000000000331001700000115200000000000
It also seems like all the fields have a fixed length, and I can get those lengths.
I can see doing something like this:
select SUBSTRING('00001 00000009716496000000000331001700000115200000000000 ', 1,5) as RecordKey
select SUBSTRING('00001 00000009716496000000000331001700000115200000000000 ', 6,17) as Filler
select SUBSTRING('00001 00000009716496000000000331001700000115200000000000 ', 23,16) as BundleAnnualPremium
But is there any faster and better way to load this data into the different tables?
You could just bulk insert with a format file right from the start. But since the data is already loaded into one big table, if you'd rather use pure T-SQL, you can pull the elements out of the string using LEFT(), RIGHT(), and SUBSTRING().
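As a rough sketch of that T-SQL route, assuming one fixed-width record per line; the target table dbo.BundleRecord and its columns are hypothetical, and STRING_SPLIT needs SQL Server 2016 or later:
-- Break the big string into lines, then slice each line at the known offsets
INSERT INTO dbo.BundleRecord (RecordKey, Filler, BundleAnnualPremium)
SELECT
    SUBSTRING(s.value, 1, 5),
    SUBSTRING(s.value, 6, 17),
    SUBSTRING(s.value, 23, 16)
FROM @FileTable AS f
CROSS APPLY STRING_SPLIT(REPLACE(f.FileData, CHAR(13), ''), CHAR(10)) AS s
WHERE LEN(s.value) >= 38   -- skip blank or truncated lines
Repeat the same pattern with the offsets for each of the other tables, filtering on whatever identifies the record type.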

Bulk Insert with database connector with different payload and queries

I am using the Mule database connector to insert and update records in a database. Now I have different queries, such as inserts and updates against different tables, and the payload for each will be different as well. How can I achieve bulk operations with this? Can I save the queries in a flow variable as a list, save the corresponding values in another list, and pass both to the database flow? Will that work?
So I want to generate raw SQL queries, save them to a file, and then use bulk execute on that file. Does Mule provide any toString-style method to convert a query with placeholders into the actual raw query?
For example, I have the query
update mytable set column1 = #[payload.column1], column2 = #[payload.id]
and want to turn it into
update mytable set column1 = 'stringvalue', column2 = 1234;
Mule's database component does support bulk operations. You can select Bulk Execute as the Operation; the configuration is self-explanatory once you select it.
As for making the query dynamic, you can pass the values in from flow variables or property files, whichever is convenient.
You can have stored procedures for insert and update that accept their input parameters as arrays. Send the records in blocks inside a for loop by setting a batch size; this results in fewer round trips.
The article below has all the details:
https://dzone.com/articles/passing-java-arrays-in-oracle-stored-procedure-fro

How can I insert a 100+k rows of XML into SQL Server 2012 in a fast way

I have the following scenario/requirements, and I'm not sure of the best way to address them as fast as possible. I'm looking for some guidance on which features to use, with examples if available.
I will receive anywhere between 10k and 100k entities (in XML format) from a web service that I want to upsert (some rows might exist, others might not).
Here are some of the requirements:
The source of the XML is a web service that I'm calling from C# code. Actually two different methods. For one of the methods, the return schema will be something flat that I can map directly to one of my tables. For the other, it will return an XML representation that I might need to work with in C# in order to map it to flat entities for my tables. In that scenario, would it be best to do the needed modifications and then write the result out as an XML file to use as the source?
The returned XML can contain up to 150k entities that may or may not exist in my tables yet, so I'm looking to upsert them. The files, when written to disk, can weigh up to 20 megabytes. I asked if they could do JSON instead of XML, but apparently that's not a choice.
The SQL database is on a different server than my IIS server, so I would rather avoid having SQL Server retrieve the XML from a file; I would rather pass it from C# as a string or as a table-valued parameter.
The tables are rather simple and don't have indexes other than the PK ones.
I've never been big on XML, although it got much easier with LINQ to XML. I was initially using it to parse each record and send individual inserts, but the performance was just bad, so based on some research I've been doing, I'm thinking I could use:
Upserts in SQL Server through MERGE statements.
Pass the whole XML as a parameter and use OPENXML as the source in the MERGE statement.
Or somehow generate a table-valued parameter in C# and pass that to SQL Server to use in the MERGE.
I read on this similar question (which didn't have access to upsert/merge) that instead of trying to upsert directly from the XML, it might be better to insert everything into a temporary table and do the merge/upsert against the temporary table.
Would this work and be considerably fast?
If anyone has had a similar scenario, can you share your thoughts/ideas about what combination of features would be best?
Thanks.
You are on the right track. I have a similar setup using XML to transfer data between an online portal and the client-server application. The rest of the setup is very similar to what you have.
The fact that your tables are not indexed is a bit of a concern, if you are comparing any fields that are not PK Fields, regardless of how you index the temp tables. It is important to have either one index with all of the fields used in the merge match clause, or an index for each of them - I find the former yields better performance. Beyond that, using an XML parameter, OpenXML and temp tables is the way to go.
The following code has not been tested, so it may need a bit of debugging, but it will put you on the right track. A couple of notes: if all of the fields in the OpenXML WITH clause are attributes, then you can drop the last parameter (i.e. ", 2") and the field source specifiers (i.e. "@id" for the detail table). Although the data in your description is flat, in which case you will only need one table, I do often need to import into linked records, so I have included a simple master-detail relationship example in the code below, just for the sake of completeness.
CREATE PROCEDURE usp_ImportFromXML (@data XML) AS
BEGIN
/*
<root>
<data>
<match_field_1>1</match_field_1>
<match_field_2>val2</match_field_2>
<data_1>val3</data_1>
<data_2>val4</data_2>
<detail_records>
<detail_data id="detailID1">
<detail_1>blah1</detail_1>
<detail_2>blah2</detail_2>
</detail_data>
<detail_data id="detailID2">
<detail_1>blah3</detail_1>
<detail_2>blah4</detail_2>
</detail_data>
</detail_records>
</data>
<data>
...
</data>
</root>
*/
DECLARE @iDoc INT
EXEC sp_xml_preparedocument @iDoc OUTPUT, @data
-- Shred the master rows into a temp table
SELECT * INTO #temp
FROM OpenXML(@iDoc, '/root/data', 2) WITH (
match_field_1 INT,
match_field_2 VARCHAR(50),
data_1 VARCHAR(50),
data_2 VARCHAR(50)
)
-- Shred the detail rows, pulling the parent match fields via relative paths
SELECT * INTO #detail
FROM OpenXML(@iDoc, '/root/data/detail_records/detail_data', 2) WITH (
match_field_1 INT '../../match_field_1',
match_field_2 VARCHAR(50) '../../match_field_2',
detail_id VARCHAR(50) '@id',
detail_1 VARCHAR(50),
detail_2 VARCHAR(50)
)
EXEC sp_xml_removedocument @iDoc
-- Index the fields used in the MERGE match clauses
CREATE INDEX IX_temp ON #temp(match_field_1, match_field_2)
CREATE INDEX IX_detail ON #detail(match_field_1, match_field_2, detail_id)
MERGE data_table a
USING #temp ta
ON ta.match_field_1 = a.match_field_1 AND ta.match_field_2 = a.match_field_2
WHEN MATCHED THEN
UPDATE SET data_1 = ta.data_1, data_2 = ta.data_2
WHEN NOT MATCHED THEN
INSERT (match_field_1, match_field_2, data_1, data_2) VALUES (ta.match_field_1, ta.match_field_2, ta.data_1, ta.data_2);
MERGE detail_table a
USING (SELECT d.*, p._key FROM #detail d, data_table p WHERE d.match_field_1 = p.match_field_1 AND d.match_field_2 = p.match_field_2) ta
ON a.id = ta.detail_id AND a.parent_key = ta._key
WHEN MATCHED THEN
UPDATE SET detail_1 = ta.detail_1, detail_2 = ta.detail_2
WHEN NOT MATCHED THEN
INSERT (parent_key, id, detail_1, detail_2) VALUES (ta._key, ta.detail_id, ta.detail_1, ta.detail_2);
DROP TABLE #temp
DROP TABLE #detail
END
Use (3). Process the data ready for upsert in C#. C# is made for this kind of algorithmic work; it is both the right programming language and the faster one here. T-SQL is not the right tool. You do not want to do heavy XML processing in T-SQL for very high performance work because it burns CPU like crazy. Instead, use the fast TDS protocol to send a TVP or bulk data.
Then, send the data to the server using either a TVP or a bulk-insert (SqlBulkCopy) to a temp table. The latter technique is great for very many rows (>10k?). Bulk insert uses special TDS features. It does not use SQL batches to transfer the data. It does not get faster than this.
Then use the MERGE statement as you described. Use big batch sizes, potentially all rows in one batch.
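A minimal sketch of the TVP route, with made-up names (EntityTvp, Entity, UpsertEntities): define a table type, have the C# side fill a DataTable with the same columns and pass it as a parameter of SqlDbType.Structured, and let a procedure do the MERGE:
-- Hypothetical table type and upsert procedure
CREATE TYPE dbo.EntityTvp AS TABLE (
    EntityId INT PRIMARY KEY,
    Data1    NVARCHAR(100),
    Data2    NVARCHAR(100)
);
GO
CREATE PROCEDURE dbo.UpsertEntities (@rows dbo.EntityTvp READONLY) AS
BEGIN
    MERGE dbo.Entity AS t
    USING @rows AS s
        ON t.EntityId = s.EntityId
    WHEN MATCHED THEN
        UPDATE SET Data1 = s.Data1, Data2 = s.Data2
    WHEN NOT MATCHED THEN
        INSERT (EntityId, Data1, Data2) VALUES (s.EntityId, s.Data1, s.Data2);
END
For the SqlBulkCopy variant, the same MERGE just runs against a temp table instead of the @rows parameter.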
The best way I've found is to bulk insert into a temp table from your C# code, then issue the merge once the data is in SQL Server. I have an example here on my blog SQL Server Bulk Upsert
I use this in production to insert millions of rows daily, and have yet to find a faster way to do it. Give it a try, I think you will be impressed with the performance of the solution.

How to script VARBINARY to copy it from one DB to another using a script?

I need to generate an SQL insert script to copy data from one SQL Server to another.
So with .NET, I'm reading the data from a given SQL Server table and writing it to a new text file, which can then be executed in order to insert this data on other databases.
One of the columns is a VARBINARY(MAX).
How should and can I transform the obtained byte[] into text for the script so that it can still be inserted on the other databases?
SSMS shows this data as a hex string. Is this the format to use?
I can get the same format with the following:
BitConverter.ToString(<MyByteArray>).Replace("-", "")
But how can this be inserted again?
I tried
CONVERT(VARBINARY(MAX), '0xMyHexString')
This does an insert, but the value is not the same as in the source table.
It turned out you can just directly insert the hex string, no need to convert anything:
INSERT TableName (VarBinColumnName)
VALUES (0xMyHexString)
Just don't ask why I didn't test this directly...
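If you would rather generate the script on the SQL Server side instead of with BitConverter, here is a small sketch reusing the question's placeholder names (TableName, VarBinColumnName): CONVERT with style 1 renders a varbinary value as a 0x-prefixed hex literal that can be concatenated straight into the statement:
-- Style 1 keeps the 0x prefix, so the output can be pasted into a script as-is
SELECT 'INSERT TableName (VarBinColumnName) VALUES ('
       + CONVERT(VARCHAR(MAX), VarBinColumnName, 1) + ');'
FROM TableName
Add NULL handling if the column is nullable, otherwise the concatenation yields NULL for those rows.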
There are two questions on SO that may help:
What is the fastest way to get varbinary data from SQL Server into a C# Byte array?
and
How Do I Insert A Byte[] Into an SQL Server VARBINARY column?

Sql Server XML-type column duplicate entry detection

In Sql Server I am using an XML type column to store a message. I do not want to store duplicate messages.
I will only have a few messages per user. I am currently querying the table for these messages and converting the XML to a string in my C# code, and I then compare the strings with what I am about to insert.
Unfortunately, SQL Server pretty-prints the data in XML-typed fields. What you store into the database is not necessarily exactly the same string as what you get back out later. It is functionally equivalent, but may have whitespace removed, etc.
Is there an efficient way to compare an XML string that I am considering inserting with those that are already in the database? As an aside, if I detect a duplicate I need to delete the older message then insert the replacement.
0 - Add a hash column to your table.
1 - When you receive a new message, convert the whole XML to uppercase, remove all blanks and carriage returns/line feeds, then compute the hash value of the normalized string.
2 - Check whether you already have a row with the resulting hash code in it.
If yes, it is a duplicate; treat it accordingly.
If not, store the original XML along with the hash in a new row (see the sketch below).
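A minimal T-SQL sketch of those steps, assuming a hypothetical Messages table with MsgXml and MsgHash columns (HASHBYTES with SHA2_256 needs SQL Server 2012 or later, and before SQL Server 2016 its input is capped at 8,000 bytes):
-- Normalize the incoming XML as text, hash it, and insert only when the hash is new
DECLARE @newMessage XML = '<message>You like apples</message>';
DECLARE @normalized NVARCHAR(MAX) = UPPER(REPLACE(REPLACE(REPLACE(
    CONVERT(NVARCHAR(MAX), @newMessage), ' ', ''), CHAR(13), ''), CHAR(10), ''));
DECLARE @hash VARBINARY(32) = HASHBYTES('SHA2_256', @normalized);
IF NOT EXISTS (SELECT 1 FROM dbo.Messages WHERE MsgHash = @hash)
    INSERT INTO dbo.Messages (MsgXml, MsgHash) VALUES (@newMessage, @hash);
The question's "delete the older message, then insert the replacement" step would go in the branch where a matching hash is found.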
I'm not 100% sure of your exact implementation, but here is something I played around with. The idea is that a stored procedure would do the inserting, and the insert into the messages table does a basic check against existing messages (SQL 2008 syntax):
declare @messages table (msg xml)
insert into @messages values
('<message>You like oranges</message>')
,('<message>You like apples</message>')
declare @newMessage xml = '<message>You like apples</message>'
insert into @messages (msg)
select @newMessage
where @newMessage.value('(message)[1]', 'nvarchar(50)') not in (
select msg.value('(message)[1]', 'nvarchar(50)')
from @messages
)
One solution is to stop using the XML typed field. Store the XML string into a varchar typed field.
I don't really like this solution, but I don't really like p.marino's solution either. It doesn't seem right to store a hash of something that is already in the row in the table.
What if you use OPENXML on each row in the table and query the actual XML for key nodes and/or key attributes? But then you would need to do it row by row; I don't think OPENXML works on a whole set of table rows.
