SQL Server deserialize XML into table

I have generated many XML fragments with the SQL Server command FOR XML RAW.
In that way I have filled a table with the following schema:
ACTION as CHAR(1)
TABLE_NAME as NVARCHAR(25)
PAYLOAD as NVARCHAR(MAX)
ACTION | TABLE_NAME | PAYLOAD
-------|------------|---------
I      | tbl_1      | xmlrow_1
I      | tbl_1      | xmlrow_2
U      | tbl_1      | xmlrow_3
D      | tbl_1      | xmlrow_4
D      | tbl_2      | xmlrow_5
ACTION is a char (I = insert, U = update, D = delete).
TABLE_NAME is the table on which I have to act (insert the data, update it or delete it).
PAYLOAD is the XML serialized by SQL Server using the command FOR XML RAW on the original table.
PAYLOAD example:
<row COL1="val_col_1" COL2="val_col_2" .. COLN="val_col_n"/>
I am looking for a way (I am writing a stored procedure, so I am looking for T-SQL) to deserialize it into the "configured" TABLE_NAME, possibly as an UPSERT.
If this is not feasible (as I suspect) I will build the SQL script for insert, update or delete dynamically, but I still need to deserialize the XML in the PAYLOAD and I don't know how to do that.
I mean, if there is not a better way, how can I do something like this?
UPDATE [dbo].[tbl_1]
SET [COL1] = CURRENT_ROW.COL1
,[COL2] = CURRENT_ROW.COL2
,[COL3] = CURRENT_ROW.COL3
FROM ( xmlrow_3 --DESERIALIZE ) AS CURRENT_ROW
-- EDITED: added a working example in db<>fiddle
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=ac5925a0a2f93791cc7e7c34179137ae
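For what it's worth, the shredding itself can be done with the built-in XML methods. Here is a minimal sketch (the table dbo.tbl_1, its columns and the ID key attribute are illustrative assumptions, not part of the original schema): cast the PAYLOAD to XML, then read each attribute with nodes()/value():

-- Minimal sketch, assuming a hypothetical dbo.tbl_1(ID, COL1, COL2) keyed on an ID attribute
DECLARE @payload XML = CAST(N'<row ID="1" COL1="val_col_1" COL2="val_col_2"/>' AS XML);

UPDATE t
SET COL1 = r.x.value('@COL1', 'NVARCHAR(100)')
   ,COL2 = r.x.value('@COL2', 'NVARCHAR(100)')
FROM dbo.tbl_1 AS t
CROSS APPLY @payload.nodes('/row') AS r(x)   -- one row per <row/> element
WHERE t.ID = r.x.value('@ID', 'INT');

Since TABLE_NAME is only known at runtime, a statement like this would still have to be assembled per table with dynamic SQL (sp_executesql), which matches the dynamic-script fallback described above.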

Related

How to drop duplicate records in the SQL Server using Python?

I have a .csv file and it gets updated every day. Below is an example of my .csv file.
I am pushing this .csv file into SQL Server using Python. My script reads the .csv file and uploads it into a SQL Server database.
This is my Python script:
import pandas as pd
import pyodbc

df = pd.read_csv("C:/Users/Dhilip/Downloads/test.csv")
print(df)

conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=DESKTOP-7FCK7FG;'
                      'Database=test;'
                      'Trusted_Connection=yes;')
cursor = conn.cursor()

#cursor.execute('CREATE TABLE people_info (Name nvarchar(50), Country nvarchar(50), Age int)')

for row in df.itertuples():
    cursor.execute('''
        INSERT INTO test.dbo.people_info (Name, Country, Age)
        VALUES (?,?,?)
        ''',
        row.Name,
        row.Country,
        row.Age
    )
conn.commit()
The script is working fine. I am trying to automate my Python script using a batch file and Task Scheduler, and that works fine too. However, whenever I add new data to the .csv file, SQL Server gets the new data but the old data is inserted again at the same time, so the old rows appear multiple times.
For example, if I add a new record called Israel, the output appears in SQL Server as below
I need the output as below,
Can anyone advise me on the change I need to make in the above Python script?
You can use the below query in your Python script. IF NOT EXISTS will check whether the record already exists based on the condition in the WHERE clause; if the record exists, it will go to the ELSE statement, where you can update or do anything else.
Checking for existing records in the database works faster than checking in the Python script.
if not exists (select * from Table where Name = '')
begin
    insert into Table values('b', 'Japan', 70)
end
else
begin
    update Table set Age = 54, Country = 'Korea' where Name = 'A'
end
To find existing duplicate records, use the below query:

select Name, count(Name) as dup_count
from Table
group by Name
having count(Name) > 1
I find duplicates like this:

import sqlite3

def find_duplicates(table_name):
    """
    find duplicates inside table
    :param table_name: name of the table to scan
    :return: list of duplicate rows
    """
    connection = sqlite3.connect("./k_db.db")
    cursor = connection.cursor()
    findduplicates = """ SELECT a.*
        FROM {} a
        JOIN (
            SELECT shot, seq, lower(user), date_time, written_by, COUNT(*)
            FROM {}
            GROUP BY shot, seq, lower(user), date_time, written_by
            HAVING count(*) > 1 ) b
        ON a.shot = b.shot
        AND a.seq = b.seq
        AND a.date_time = b.date_time
        AND a.written_by = b.written_by
        ORDER BY a.shot;""".format(
        table_name, table_name
    )
    # print(findduplicates)
    cursor.execute(findduplicates)
    records = cursor.fetchall()  # no commit needed for a SELECT
    cursor.close()
    connection.close()
    return records  # the original discarded the fetched rows; return them
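A possible call site for the function above (the table name is just a placeholder):

# hypothetical table name, for illustration only
duplicate_rows = find_duplicates("my_table")
for dup in duplicate_rows:
    print(dup)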
You could rephrase your insert such that it checks for existence of the tuple before inserting:
for row in df.itertuples():
    cursor.execute('''
        INSERT INTO test.dbo.people_info (Name, Country, Age)
        SELECT ?, ?, ?
        WHERE NOT EXISTS (SELECT 1 FROM test.dbo.people_info
                          WHERE Name = ? AND Country = ? AND Age = ?)
        ''', (row.Name, row.Country, row.Age, row.Name, row.Country, row.Age,))
conn.commit()
An alternative to the above would be to add a unique index on (Name, Country, Age). Then, your duplicate insert attempts would fail and generate an error.
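A sketch of that alternative (the index name here is arbitrary):

-- Duplicate inserts will now fail with a constraint violation error
CREATE UNIQUE INDEX UX_people_info_Name_Country_Age
    ON test.dbo.people_info (Name, Country, Age);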

Merge..USING..WHEN SQL Server equivalent in PostgreSQL

Right now, we are trying to migrate a stored procedure from SQL Server to PostgreSQL. In it, we came across the MERGE..USING..WHEN construct and couldn't find an equivalent. Is there any option to replicate this functionality? The actual query is given below.
WITH phase_list AS (
    SELECT @id_plant AS id_plant
         , id_phase
         , date_started
    FROM OPENJSON(@phase_list)
    WITH (
        id_phase INT '$.id_phase',
        date_started DATE '$.date_started'
    )
    WHERE date_started IS NOT NULL
)
MERGE grow.plant_phase AS t
USING phase_list AS s
    ON s.id_plant = t.id_plant AND s.id_phase = t.id_phase
WHEN MATCHED AND (t.date_started <> s.date_started) THEN
    UPDATE
    SET t.date_started = s.date_started,
        t.date_updated = getutcdate(),
        t.updated_by = @id_user
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id_plant, id_phase, date_started, created_by, updated_by)
    VALUES (s.id_plant, s.id_phase, s.date_started, @id_user, @id_user)
WHEN NOT MATCHED BY SOURCE AND t.id_plant = @id_plant THEN
    DELETE;
Can we replicate the same using a join operation with some if/else conditions, or any other approach?
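One possible PostgreSQL-side translation, as a sketch rather than a drop-in replacement: it assumes a unique constraint on (id_plant, id_phase) for ON CONFLICT to target, and uses :id_plant / :phase_list / :id_user placeholders where the T-SQL version used variables. The three WHEN branches split into a DELETE plus an INSERT ... ON CONFLICT DO UPDATE, run in one transaction:

BEGIN;

-- WHEN NOT MATCHED BY SOURCE AND t.id_plant = :id_plant THEN DELETE
DELETE FROM grow.plant_phase t
WHERE t.id_plant = :id_plant
  AND NOT EXISTS (
        SELECT 1
        FROM jsonb_to_recordset(:phase_list::jsonb)
             AS s(id_phase int, date_started date)
        WHERE s.date_started IS NOT NULL
          AND s.id_phase = t.id_phase);

-- WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED BY TARGET THEN INSERT
INSERT INTO grow.plant_phase AS t
       (id_plant, id_phase, date_started, created_by, updated_by)
SELECT :id_plant, s.id_phase, s.date_started, :id_user, :id_user
FROM jsonb_to_recordset(:phase_list::jsonb)
     AS s(id_phase int, date_started date)
WHERE s.date_started IS NOT NULL
ON CONFLICT (id_plant, id_phase) DO UPDATE
SET date_started = EXCLUDED.date_started,
    date_updated = now() AT TIME ZONE 'utc',
    updated_by   = EXCLUDED.updated_by
WHERE t.date_started IS DISTINCT FROM EXCLUDED.date_started;

COMMIT;

(PostgreSQL 15+ also has a native MERGE statement, though without the NOT MATCHED BY SOURCE branch.)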

Looking for software that will help script update statements faster

I have a lot of mundane update statements I have to push through PL/SQL and MSSQL, and it requires a lot of editing. I will usually pull the data into Excel and then reformat it in Notepad++. This is pretty time consuming and I was wondering if there are any solutions for this?
UPDATE ATGSITES SET IPADDRESS = 'xxx' WHERE OWNSITEID = '270789'
UPDATE ATGSITES SET IPADDRESS = '1yyy' WHERE OWNSITEID = '270506'
UPDATE ATGSITES SET IPADDRESS = '158568' WHERE OWNSITEID = '27745'
... times 35,353 update statements.
Perhaps you can create a table of updates and perform a single join:

UPDATE A
SET A.IPADDRESS = B.IPADDRESS
FROM ATGSITES A
JOIN NewTable B ON A.OWNSITEID = B.OWNSITEID

where NewTable has the structure OWNSITEID, IPADDRESS. I would also add an index on OWNSITEID, as in the sketch below.
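A sketch of that staging table (the column types are assumptions based on the sample values):

CREATE TABLE NewTable (OWNSITEID VARCHAR(20) NOT NULL, IPADDRESS VARCHAR(15) NOT NULL);
CREATE INDEX IX_NewTable_OWNSITEID ON NewTable (OWNSITEID);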
In MS SQL Server:
Save the data in a CSV text file, delimited by comma, in this format:

IPADDRESS , OWNSITEID
xxx ,270789
1yyy ,270506
158568 ,27745
......
...........

Suppose that the table to be updated is named users_ip.
Run the following script to update the table from the text file:
-- update table users_ip from a csv file delimited with , where the first row is a header
-- create a temp table for loading the text file
CREATE TABLE #tmp_x (IPADDRESS nvarchar(10), OWNSITEID nvarchar(10))
go
-- import the csv file
BULK INSERT #tmp_x
FROM 'c:\temp\data.txt' --CSV file with header, delimited by ,
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2)
go
-- update the target table from the temp table
update u
set u.IPADDRESS = t.IPADDRESS
from users_ip u
join #tmp_x t on t.OWNSITEID = u.OWNSITEID
-- drop the temp table
drop table #tmp_x

Send @Output table results to a flat file in SSIS package returning 0 rows

Hi and thank you for your help.
I have an SSIS package whose first step executes a SQL delete query that deletes rows in a table and sends the deleted rows to an @Output table variable. The next step tries to take the @Output table and send it to a flat file destination. When I ran the delete query in SQL Server Management Studio it successfully output the rows it deleted, but for some reason the flat file in the package ends up with 0 rows. Is there something I need to do to make the @Output table data accessible in the subsequent flat file destination component? Do I need to create a temp table instead?
Here is the query that outputs the deleted rows into the @Output table variable. I'd like to take the contents of @Output and send them to a flat file destination.
DECLARE @Output TABLE
    (PatientVisitID INT
    ,VisitNumber NVARCHAR(45)
    ,LastName NVARCHAR(45)
    ,FirstName NVARCHAR(45)
    ,MiddleName NVARCHAR(45)
    ,NamePrefix NVARCHAR(45)
    ,NameSuffix NVARCHAR(45)
    ,BirthDate NVARCHAR(45)
    ,MedicalRecordNumber NVARCHAR(45)
    ,Gender NVARCHAR(1)
    ,AdmitState NVARCHAR(45)
    ,AdmitDateTime NVARCHAR(45)
    ,DischargeDateTime NVARCHAR(45)
    ,SSN NVARCHAR(12)
    ,PatientType NVARCHAR(45)
    ,HospitalService NVARCHAR(45)
    ,Location NVARCHAR(45)
    ,DischargeDisposition NVARCHAR(45)
    )

DELETE
FROM PatientVisits
OUTPUT
    DELETED.PatientVisitID
    ,DELETED.VisitNumber
    ,DELETED.LastName
    ,DELETED.FirstName
    ,DELETED.MiddleName
    ,DELETED.NamePrefix
    ,DELETED.NameSuffix
    ,DELETED.BirthDate
    ,DELETED.MedicalRecordNumber
    ,DELETED.Gender
    ,DELETED.AdmitState
    ,DELETED.AdmitDateTime
    ,DELETED.DischargeDateTime
    ,DELETED.SSN
    ,DELETED.PatientType
    ,DELETED.HospitalService
    ,DELETED.Location
    ,DELETED.DischargeDisposition
INTO @Output
WHERE
    CURRENT_TIMESTAMP - 33 > cast(convert(varchar, AdmitDateTime, 101) as DATETIME)
    AND PatientType NOT IN ('01','12')

SELECT * FROM @Output
You have something awry with your data and/or your query.
Consider the following simplified demo
IF NOT EXISTS
(
    SELECT *
    FROM
        sys.schemas AS S
        INNER JOIN sys.tables AS T
            ON S.schema_id = T.schema_id
    WHERE
        S.name = 'dbo'
        AND T.name = 'so_36868244'
)
BEGIN
    CREATE TABLE dbo.so_36868244
    (
        SSN nvarchar(12) NOT NULL
    );
END

INSERT INTO
    dbo.so_36868244
(
    SSN
)
SELECT
    D.SSN
FROM
(
    VALUES
      (N'111-22-3333')
    , (N'222-33-4444')
    , (N'222-33-4445')
    , (N'222-33-4446')
) D(SSN)
    LEFT OUTER JOIN dbo.so_36868244 AS S
        ON S.SSN = D.SSN
WHERE
    S.SSN IS NULL;
We now have a table with a single column and 4 rows of data.
I used the following query, which uses the OUTPUT clause to push the DELETED data into a table variable and then selects from it:
DECLARE
    @output table
    (
        SSN nvarchar(12) NOT NULL
    );

DELETE TOP (2) S
OUTPUT
    Deleted.SSN
INTO
    @output ( SSN )
FROM
    dbo.so_36868244 AS S;

SELECT O.SSN FROM @output AS O;
Run that 3 times and you'll end up with 2 rows, 2 rows and no rows. No problem, rerun the first query and you have 4 rows again - hooray for idempotent operations.
I used that query as the source for an OLE DB Source and then wrote the data to a flat file.
Reproduction
Biml, the business intelligence markup language, allows me to use a simplified XML dialect to describe an SSIS package. The following Biml, when fed through the Biml engine, will be translated into an SSIS package for whichever version of SQL Server you are working with.
Sound good? Go grab BimlExpress (it's free) and install it for your version of SSIS.
Once installed, under the BimlExpress menu select "Add New Biml File". Paste the following:
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="tempdb" ConnectionString="Data Source=localhost\dev2014;Initial Catalog=tempdb;Provider=SQLNCLI11.0;Integrated Security=SSPI;"/>
        <FlatFileConnection FilePath="C:\ssisdata\so\output\so_36868244.txt" FileFormat="FFF so_36868244" Name="FFCM" />
    </Connections>
    <FileFormats>
        <FlatFileFormat Name="FFF so_36868244" IsUnicode="false" ColumnNamesInFirstDataRow="true" FlatFileType="Delimited">
            <Columns>
                <Column Name="SSN" DataType="String" Length="12" Delimiter="CRLF" />
            </Columns>
        </FlatFileFormat>
    </FileFormats>
    <Packages>
        <Package Name="so_36868244">
            <Tasks>
                <Dataflow Name="DFT Stuff">
                    <Transformations>
                        <OleDbSource ConnectionName="tempdb" Name="SQL Stuff">
                            <DirectInput><![CDATA[DECLARE
    @output table
    (
        SSN nvarchar(12) NOT NULL
    );

DELETE TOP (2) S
OUTPUT
    Deleted.SSN
INTO
    @output ( SSN )
FROM
    dbo.so_36868244 AS S;

SELECT O.SSN FROM @output AS O;]]></DirectInput>
                        </OleDbSource>
                        <DerivedColumns Name="DER Placeholder"></DerivedColumns>
                        <FlatFileDestination ConnectionName="FFCM" Name="FFDST Extract" Overwrite="true" />
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>
</Biml>
Edit lines 3 and 4 to be a valid database connection string (mine is using tempdb on a named instance, DEV2014) and to point at a valid path on disk (mine is using C:\ssisdata\so\output).
Right click on the bimlscript.biml file and out pops a package named so_36868244, which should be able to run immediately and generate a flat file with contents like
SSN
111-22-3333
222-33-4444
What's wrong with your example
Without access to your systems and/or sample data, it's very hard to say.
I will give you unsolicited advice, though, that will improve your development career. You should avoid shorthand notation like CURRENT_TIMESTAMP - 33. It's unclear what the result will be, and it saves a negligible amount of keystrokes compared to DATEADD(DAY, -33, CURRENT_TIMESTAMP).
The same goes for cast(convert(varchar, AdmitDateTime, 101) as DATETIME): there are far more graceful mechanisms for dropping the time portion of a date than this.
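For instance, the filter could be written as something like this (a sketch of both suggestions, assuming AdmitDateTime holds values that cast cleanly to a date):

-- explicit date arithmetic plus CAST(... AS DATE) to drop the time portion
WHERE CAST(AdmitDateTime AS DATE) < CAST(DATEADD(DAY, -33, CURRENT_TIMESTAMP) AS DATE)
  AND PatientType NOT IN ('01', '12')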
You may try with temp tables.
Put this statement in an Execute SQL task, and set the connection manager's RetainSameConnection property to True (this will make sure the temp table is visible in other tasks):
IF OBJECT_ID('tempdb..##DeletedRows') IS NOT NULL
    DROP TABLE ##DeletedRows

CREATE TABLE ##DeletedRows(EmpId TINYINT, EmpName VARCHAR(10))

DELETE
FROM dbo.Emp
OUTPUT
    DELETED.EmpId,
    DELETED.EmpName
INTO ##DeletedRows
Next, use a Data Flow task and set its DelayValidation property to True. Drop in an OLE DB Source and a Flat File Destination.
The first time, run this statement in the database:
CREATE TABLE ##DeletedRows(EmpId TINYINT, EmpName VARCHAR(10))
In the OLE DB Source, use the SQL statement
SELECT * FROM ##DeletedRows
and then map the columns to your flat file. We created the temp table in the database because we initially need to map the columns from the OLE DB Source to the flat file; since DelayValidation is set to True, from the next run onwards we don't need to create the temp table manually.
You would need to make it a real (permanent) table. Table variables and temp tables created in one Execute SQL task are not available in other Execute SQL tasks.
You can always drop the permanent table when you are done with it, as in the sketch below.
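A sketch of that approach (the table and column names reuse the temp-table example above; the permanent table is created once, populated by the Execute SQL task, read by the data flow, then dropped):

-- Permanent staging table, visible to every task and to design-time validation
CREATE TABLE dbo.DeletedRowsStage (EmpId TINYINT, EmpName VARCHAR(10));

-- Execute SQL task:
DELETE FROM dbo.Emp
OUTPUT DELETED.EmpId, DELETED.EmpName INTO dbo.DeletedRowsStage;

-- Data flow: OLE DB Source = SELECT * FROM dbo.DeletedRowsStage -> Flat File Destination
-- Final Execute SQL task, once the file has been written:
DROP TABLE dbo.DeletedRowsStage;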

XML column attributes manipulation in SQL Server

I have a table named User in my database. In it there is an XML column named XmlText which contains lots of attributes.
<userDetails>
    <MobileVerified>True</MobileVerified>
    <EmailVerified>True</EmailVerified>
    <IPAddress>122.160.65.232</IPAddress>
    <Gender>Male</Gender>
    <DateOfBirth>1970-03-22T00:00:00</DateOfBirth>
    <DealingInProperties>residential_apartment_flat</DealingInProperties>
    <DealingInProperties>residential_villa_bungalow</DealingInProperties>
    <DealingInProperties>residential_farm_house</DealingInProperties>
</userDetails>
What I need to do is merge every 'residential_villa_bungalow' value into 'residential_apartment_flat': if 'residential_apartment_flat' already exists in the XmlText column, the 'residential_villa_bungalow' node must be removed; otherwise 'residential_villa_bungalow' must simply be renamed to 'residential_apartment_flat'. There are approx. 700,000 records in the database, so keep in mind which technique should be used: a normal update vs. a cursor.
Fire the query with the following columns: UserID, XmlText.
Probable logic would be something like this:

if ('residential_villa_bungalow') exists
(
    if ('residential_apartment_flat') exists
        remove the 'residential_villa_bungalow' node as there must be only one 'residential_apartment_flat' node
    else
        update 'residential_villa_bungalow' into 'residential_apartment_flat'
)
XML Data Modification Language (XML DML)
-- Delete bungalow where a flat already exists
update YourTable
set XMLText.modify('delete /userDetails/DealingInProperties[. = "residential_villa_bungalow"]')
where XMLText.exist('/userDetails[DealingInProperties = "residential_apartment_flat"]') = 1
  and XMLText.exist('/userDetails[DealingInProperties = "residential_villa_bungalow"]') = 1

-- Change value from bungalow to flat
update YourTable
set XMLText.modify('replace value of (/userDetails/DealingInProperties[. = "residential_villa_bungalow"]/text())[1]
                    with "residential_apartment_flat"')
where XMLText.exist('/userDetails[DealingInProperties = "residential_villa_bungalow"]') = 1
