Pandas insert into SQL Server

I've read an Excel file with 5 columns into a dataframe (using Pandas) and I'm trying to write it to an existing, empty SQL Server table using this code:
for index, row in df.iterrows():
    PRCcrsr.execute("Insert into table([Field1], [Field2], [Field3], [Field4], [Field5]) VALUES(?,?,?,?,?)",
                    row['dfcolumn1'], row['dfcolumn2'], row['dfcolumn3'], row['dfcolumn4'], row['dfcolumn5'])
I get the following error message:
TypeError: execute() takes from 2 to 5 positional arguments but 7 were given
df.shape says I have 5 columns, but when I print the df to the screen it includes the RowNumber. Also, one of the columns is city_state, which includes a comma. Is this the reason it thinks I'm providing 7 arguments (5 actual columns + the row number + the comma issue)? Is there a way to deal with the comma and row-index columns in the dataframe before writing to SQL Server? If shape says 5 columns, why am I getting this error?

The error above indicates that 7 parameters were being passed to the cursor's execute command, and only between 2 and 5 are permissible. I was actually passing 7 parameters (the Insert Into clause, the Values clause, and row['dfcolumn1'] through row['dfcolumn5'], 7 in all). The fix was to convert each row to a tuple using this code:
new_tuple = [tuple(r) for r in df.values.tolist()]
then I rewrote the for loop as follows:
for row_tuple in new_tuple:
    PRCcrsr.execute("Insert into table([Field1], [Field2], [Field3], [Field4], [Field5]) VALUES(?,?,?,?,?)", row_tuple)
This delivered the fields as a tuple and inserted correctly.

Handle incorrect values in Snowflake

Hi, I have a question about Snowflake: how do I handle non-ASCII values in Snowflake?
table : emp
Empno|Empname
1 |ravÉi
2 |banu raju
3 |raḠu kumar
Based on the above data I want output like below:
Empno|Empname
1 |ravEi
2 |banu raju
3 |raGu kumar
I have tried the following:
select empno, uncode(empname, ecoding='utf-8') lname from emp
but the above query throws an error:
SQL compilation error: error line 1 at position 27: invalid identifier encoding
Can you please tell me how to write a query to achieve this task in Snowflake?
Did you try it? It should work; the UNICODE function returns the Unicode code point for the first Unicode character in a string.
select empno, empname from emp;
select unicode('ravÉi') lname from dual; will return the value.
Given your sample input and output, what you really want is to replace Unicode characters with their closest ASCII equivalent. You can solve this with a JS UDF in Snowflake:
CREATE OR REPLACE FUNCTION normalize_js(S string)
RETURNS string
LANGUAGE JAVASCRIPT
AS 'return S.normalize("NFD").replace(/\p{Diacritic}/gu, "")'
;
select normalize_js('áéÉña');
-- 'aeEna'
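Applied to the emp table from the question, the UDF would presumably be used like this (a sketch; it assumes the function above has been created in the current schema):
select empno, normalize_js(empname) as empname from emp;
-- 1 | ravEi
-- 2 | banu raju
-- 3 | raGu kumar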
See https://stackoverflow.com/a/66606937/132438.

Split data from strings into columns

I have a column with a long string. The data needs to be split into columns, the strings have variable lengths, and there is not always the same number of columns. I'm not exactly sure how to do this, so I was looking for some advice here.
Let's say I have this string:
VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|
And in some cases the string might not have all the medical conditions, just some of them.
I need to split this into columns, where the column name is the text between the tildes (i.e. MedCond1) and the value is the text to the right of that tilde but before the pipe, and end up like this:
MedCond1 MedCond2 MedCond3 MedCond4 MedCond5 MedCond6 MedCond7 MedCond8
======== ======== ======== ======== ======== ======== ======== ========
35.4     16       155      70       100      64       21       98
I need to do this for a lot of rows within a large table, and as I said not all the columns are always present, but the names will not differ: one row might have MedCond1 through MedCond8, while another set has only MedCond3, 4, and 7.
Here is a query I created that is kind of what I want, but it is not dynamic, so it picks up the values with some extra bits of the string:
select MainCol,
       case when charindex('MedCond1', MainCol) > 0
            then substring(MainCol, charindex('MedCond1', MainCol) + 9, 4)
       end as [MedCond1]
from MedTable
This will return:
MedCond1
========
35.3
40.2
33.6
33|V <--- Problem
As you can see, the numeric value is sometimes picked up along with an additional part of the string, due to the hard-coded offset and length in the SUBSTRING call. The value is sometimes 4 characters long with a decimal place, sometimes 2 long with no decimal place. I would like to make this dynamic: the pipe defines the end of the data I need, and the start is defined by the tilde at the end of the column name.
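Something like this one-column sketch is what I mean by dynamic (rough and untested; it derives the start from the tilde after the column name and the end from the next pipe):
select MainCol,
       substring(MainCol,
                 charindex('MedCond1~', MainCol) + len('MedCond1~'),
                 charindex('|', MainCol, charindex('MedCond1~', MainCol) + len('MedCond1~'))
                     - (charindex('MedCond1~', MainCol) + len('MedCond1~'))) as MedCond1
from MedTable
where charindex('MedCond1~', MainCol) > 0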
Thanks for any thoughts on making this dynamic
Andrew
This data looks like a table itself. It could have been stored in SQL Server as XML; SQL Server supports xml fields and allows querying them. In fact, one could convert this string to XML and then query it:
declare @medTable table (item nvarchar(2000))

insert into @medTable
values ('VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|');

-- Step 1: Replace `|` with <item> tags and `~` with <tag> tags.
-- This will return an xml value for each medTable row
with items as (
    select xmlField = cast('<item><tag>'
                           + replace(
                                 replace(item, '|', '</tag></item><item><tag>'),
                                 '~', '</tag><tag>')
                           + '</tag></item>' as xml)
    from @medTable
)
-- Step 2: Select the different tags and display them as fields
select
    y.item.value('(tag/text())[1]', 'nvarchar(20)'),
    y.item.value('(tag/text())[2]', 'nvarchar(20)'),
    y.item.value('(tag/text())[3]', 'nvarchar(20)')
from items outer apply xmlField.nodes('item') as y(item)
The result is:
-------------------- -------------------- -------
VS5 MedCond1 35.4
VS4 MedCond2 16
VS1 MedCond3 155
VS2 MedCond4 70
SPO2 MedCond5 100
VS3 MedCond6 64
FiO2 MedCond7 21
MAP MedCond8 98
NULL NULL NULL
It would be better to perform this conversion when loading the data, though. It's easier, for example, to make the replacements in C# or SSIS and store a complete xml value in the database.
You can also modify this query to generate the xml value and store it in the database:
declare @medTable2 table (xmlField xml)

with items as (
    select xmlField = cast('<item><tag>' + replace(replace(item, '|', '</tag></item><item><tag>'), '~', '</tag><tag>') + '</tag></item>' as xml)
    from @medTable
)
insert into @medTable2
select items.xmlField
from items

-- Query the new table from now on
select
    y.item.value('(tag/text())[1]', 'nvarchar(20)'),
    y.item.value('(tag/text())[2]', 'nvarchar(20)'),
    y.item.value('(tag/text())[3]', 'nvarchar(20)')
from @medTable2 outer apply xmlField.nodes('item') as y(item)
OK, let me take a stab at this. The solution I'm outlining is not purely SQL Server; it uses a round trip via a text file.
The approach uses the following steps:
Unpivot the data delimited by the pipe symbols (to create more than one line of output for each line of input)
Round-trip the data from SQL Server to a text file and back
Separate the data into columns on the tilde ~ symbol delimiter
Pivot the data back into columns
The key benefit of this approach is the unpivot operation, which allows you to handle missing columns like MedCond2 naturally by the absence of an equivalent row. It also eliminates nearly all string manipulation, save for the one REPLACE function in step 1 below.
Given a single row's contents like the following:
VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|
Step 1 (Unpivot): Find and replace all instances of the pipe symbol with a newline character. So, REPLACE(column, '|', CHAR(13)) will give you the following lines of text (i.e. multiple lines of text in a single database row) for a single input row:
VS5~MedCond1~35.4
VS4~MedCond2~16
VS1~MedCond3~155
VS2~MedCond4~70
SPO2~MedCond5~100
VS3~MedCond6~64
FiO2~MedCond7~21
MAP~MedCond8~98
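In T-SQL, this replacement might look like the following (a sketch, assuming the column is named MainCol as in the question's query):
select REPLACE(MainCol, '|', CHAR(13)) as UnpivotedText
from MedTable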
Step 2 (Round-trip): Write the above output to a text file, using your tool of choice (SSIS, SQLCMD, etc.) and ensure that the newline character defined is the same as that used in the REPLACE command in step 1.
The purpose of this step is to flatten the output, so that the multiple lines embedded within a single row become rows of their own alongside the lines from every other row.
Note that step 1 can be eliminated by defining the row delimiter for steps 2 & 3 as the pipe symbol. I've put in the additional step 1 using newlines only to make it easier to understand and debug.
Step 3 (Separate columns): Import the text file back into SQL Server using the same tool, and define the column delimiter as the tilde ~ symbol, row delimiter same as in steps 1/2.
ColA MedCondTitle MedCondValue
------ ------------- -------------
VS5 MedCond1 35.4
VS4 MedCond2 16
VS1 MedCond3 155
VS2 MedCond4 70
SPO2 MedCond5 100
VS3 MedCond6 64
FiO2 MedCond7 21
MAP MedCond8 98
Step 4 (Pivot): Now you'd have a trivially simple step of pivoting rows back into columns, which can be achieved with a statement of the form:
SUM(CASE WHEN MedCondTitle='MedCond1' THEN MedCondValue ELSE 0 END) as MedCond1
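For completeness, the full pivot might look something like this (a sketch only; the table name MedImport and the SourceRowId key column are hypothetical placeholders for whatever the import in step 3 produces):
SELECT
    SourceRowId,
    MAX(CASE WHEN MedCondTitle = 'MedCond1' THEN MedCondValue END) AS MedCond1,
    MAX(CASE WHEN MedCondTitle = 'MedCond2' THEN MedCondValue END) AS MedCond2,
    MAX(CASE WHEN MedCondTitle = 'MedCond3' THEN MedCondValue END) AS MedCond3
    -- ...and so on through MedCond8; absent conditions simply yield NULL
FROM MedImport
GROUP BY SourceRowId;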

How to output an R lm() object into a SQL database?

I've been tinkering with running R commands on a SQL server by calling a procedure which runs an OLS regression, using the R lm() function, on a few made-up data points in the SQL table "my_schema.data", and then outputs the object to a SQL table.
My strategy is to first create an empty SQL table named "my_schema.ols_model_db", which will then be populated with the values in the ols_model object after it has been transformed into a data.frame.
I'm almost there, but can't quite figure out how to convert the ols_model object into an R data.frame, nor do I know what the column headers will be (which we need to know in advance in order to define the empty SQL table my_schema.ols_model_db).
Which code should be inserted into "???" in the program below?
my_schema.data
y x
1 5
2 9
3 17
4 26
CREATE COLUMN TABLE "my_schema"."my_schema.ols_model_db"(???);
CREATE PROCEDURE my_schema.proc_Rcode( IN train my_schema.data, OUT ols_model_db my_schema.ols_model_db )
LANGUAGE RLANG AS BEGIN
ols_model <- lm(y ~ x, data=train)
ols_model_db <- data.frame(g=I(lapply(ols_model, function(x) {serialize(x, NULL)})))
???
END
CALL my_schema.proc_Rcode( my_schema.data, my_schema.ols_model_db )
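For what it's worth, since the data.frame built in the procedure has a single column g holding serialized objects, the empty output table would presumably need just one binary column; something along these lines might stand in for the first "???" (a hedged sketch only, since the thread leaves this open):
CREATE COLUMN TABLE "my_schema"."my_schema.ols_model_db" ("G" BLOB);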

BULK INSERT - header and data rows with different delimiters

I'm using the following BULK INSERT command
BULK INSERT dbo.A
FROM 'd:\AData.csv'
WITH (FIELDTERMINATOR = ',',ROWTERMINATOR = ',\n',FIRSTROW = 2)
to process the data shown. My import skips the first row but also skips the second row. I believe this is because my header and data rows have different delimiters: the data rows have a trailing comma.
DATASET 1
Trial,Timestep,Column1       - line 1
1,0,0,                       - line 2
1,1,0.00687237750794734,     - line 3
1,2,-0.00190074803257245,    - line 4
The import works with this data (note the trailing comma on line 1):
DATASET 2
Trial,Timestep,Column1,      - line 1
1,0,0,                       - line 2
1,1,0.00687237750794734,     - line 3
1,2,-0.00190074803257245,    - line 4
Is there a way to tweak the parameters of the BULK INSERT command to handle DATASET1 without using a custom formatting file?
Delete the header row from your file and you should be good to go.
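In other words, something like this (a sketch, assuming the header line has been stripped from d:\AData.csv, so FIRSTROW goes back to 1):
BULK INSERT dbo.A
FROM 'd:\AData.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = ',\n', FIRSTROW = 1)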
Your data rows have a comma at the end, but your header row doesn't. Get rid of the last commas in the data rows and try again.

Loop 5 records at a time and assign them to a variable

I have a table of 811 records. I want to get five records at a time and assign them to a variable. The next time the Foreach Loop task runs in SSIS, it should pick up another five records and overwrite the variable. I have tried doing this with a cursor but couldn't find a solution. Any help will be highly appreciated. I have a table like this, for example:
ServerId ServerName
1 Abc11
2 Cde22
3 Fgh33
4 Ijk44
5 Lmn55
6 Opq66
7 Rst77
. .
. .
. .
I want the query to take the first five names, as follows, and assign them to the variable:
ServerId ServerName
1 Abc11
2 Cde22
3 Fgh33
4 Ijk44
5 Lmn55
Then the next loop takes another five names and overwrites the variable's value, and so on until the last record is consumed.
Taking ltn's answer into consideration, this is how you can limit the rows in SSIS.
The overall design consists of the following steps:
Step 1: Create the variables
Name DataType
Count int
Initial int
Final int
Step 2: In the first Execute SQL Task, write the SQL to store the count:
Select count(*) from YourTable
In the General tab of this task, set the ResultSet to Single Row.
In the ResultSet tab, map the result to the variable:
ResultName VariableName
0 User::Count
Step 3: In the For Loop container, enter the loop expressions.
Step 4: Inside the For Loop, drag in an Execute SQL Task and write its SQL statement.
In the Parameter Mapping tab, map the Initial variable:
VariableName Direction DataType ParameterName ParameterSize
User::Initial Input NUMERIC 0 -1
Result Set tab
Result Name Variable Name
0 User::Final
Inside the DFT (Data Flow Task) you can write the SQL to get the particular rows.
Click on Parameters and select the variables INITIAL and FINAL.
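The source query inside the DFT might look something like this (a guess at the intended design; it assumes ServerId is the paging key and that the two ? placeholders are mapped to INITIAL and FINAL):
SELECT ServerId, ServerName
FROM YourTable
WHERE ServerId BETWEEN ? AND ?
ORDER BY ServerId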
If your data will not be updated between paging cycles and the sort order is always the same, then you could try an approach similar to:
CREATE PROCEDURE TEST
(
    @StartNumber INT,
    @TakeNumber INT
)
AS
SELECT TOP(@TakeNumber)
    *
FROM (
    SELECT
        RowNumber = ROW_NUMBER() OVER(ORDER BY IDField DESC),
        NameField
    FROM
        TableName
) AS X
WHERE RowNumber >= @StartNumber
ORDER BY RowNumber  -- keep the page order deterministic
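Called page by page, it would look like this (for example, pages of five to match the question):
EXEC TEST @StartNumber = 1, @TakeNumber = 5;   -- rows 1-5
EXEC TEST @StartNumber = 6, @TakeNumber = 5;   -- rows 6-10
-- ...continue until all 811 rows have been consumed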
