How to output an R lm() object into a SQL database? - sql-server

I've been tinkering with running R commands on a SQL server by calling a procedure that runs an OLS regression with the R lm() function on a few made-up data points in the SQL table "my_schema.data", and then writes the resulting object out to a SQL table.
My strategy is to first create an empty SQL table named "my_schema.ols_model_db", which will then be populated with the values of the ols_model object after it has been converted to a data.frame.
I'm almost there, but I can't figure out how to convert the ols_model object into an R data.frame, nor do I know what the column headers will be (which we need to know in advance in order to define the empty table my_schema.ols_model_db).
Which code should replace the "???" placeholders in the program below?
my_schema.data
y x
1 5
2 9
3 17
4 26
CREATE COLUMN TABLE "my_schema"."my_schema.ols_model_db"(???);
CREATE PROCEDURE my_schema.proc_Rcode( IN train my_schema.data, OUT ols_model_db my_schema.ols_model_db )
LANGUAGE RLANG AS BEGIN
ols_model <- lm(y ~ x, data=train)
ols_model_db <- data.frame(g=I(lapply(ols_model, function(x) {serialize(x, NULL)})))
???
END
CALL my_schema.proc_Rcode( my_schema.data, my_schema.ols_model_db )
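One possible layout for the output table is "long" format: one row per coefficient, with a fixed pair of columns. That way the column headers are known before the model is fitted. The sketch below illustrates the idea in Python, using the standard-library sqlite3 module as a stand-in for the real database. The column names term and estimate are hypothetical, and the fit is a hand-written least-squares line rather than R's lm(). In R, the same two columns could presumably be built from names(coef(ols_model)) and coef(ols_model).

```python
import sqlite3

# Data points from the question's my_schema.data table.
x = [5.0, 9.0, 17.0, 26.0]
y = [1.0, 2.0, 3.0, 4.0]


def fit_ols(x_values, y_values):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x_values, y_values))
    sxx = sum((xi - mean_x) ** 2 for xi in x_values)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_ols(x, y)

# One row per coefficient, so the schema does not depend on the model.
coef_rows = [("(Intercept)", intercept), ("x", slope)]

# sqlite3 is only a stand-in here; the column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ols_model_db (term TEXT, estimate REAL)")
conn.executemany("INSERT INTO ols_model_db VALUES (?, ?)", coef_rows)

stored = conn.execute(
    "SELECT term, estimate FROM ols_model_db ORDER BY rowid"
).fetchall()
print(stored)
```

A table shaped like this keeps the same two columns no matter how many predictors the model has, which is why the empty table can be created ahead of time.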

Related

Pandas insert into SQL Server

I've read an Excel file with 5 columns into a dataframe (using Pandas) and I'm trying to write it to an existing empty SQL Server table using this code:
for index, row in df.iterrows():
    PRCcrsr.execute("Insert into table([Field1], [Field2], [Field3], [Field4], [Field5]) VALUES(?,?,?,?,?)"
        , row['dfcolumn1'],row['dfcolumn2'], row['dfcolumn3'], row['dfcolumn4'], row['dfcolumn5'])
I get the following error message:
TypeError: execute() takes from 2 to 5 positional arguments but 7 were given
df.shape says I have 5 columns, but when I print the df to the screen it includes the RowNumber. Also, one of the columns is city_state, which includes a comma. Is this the reason it thinks I'm providing 7 arguments (5 actual columns + row number + the comma issue)? Is there a way to deal with the comma and row index columns in the dataframe before writing into SQL Server? If shape says 5 columns, why am I getting this error?
The error says that 7 positional arguments reached the cursor's execute() method while it accepts only 2 to 5. I really was passing 7: the cursor itself, the SQL string, and the five row values. The fix was to pass each row's values as a single tuple. I first built the tuples with this code
new_tuple = [tuple(r) for r in df.values.tolist()]
then I rewrote the for loop as follows:
for row_values in new_tuple:
    PRCcrsr.execute("Insert into table([Field1], [Field2], [Field3], [Field4], [Field5]) VALUES(?,?,?,?,?)", row_values)
This delivered the fields as a tuple and inserted correctly.
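The pattern can be tried without a SQL Server connection. The following is a minimal sketch that uses the standard-library sqlite3 module as a stand-in driver and a plain list of tuples in place of the DataFrame. The table name target and the sample values are made up. It shows that a value containing a comma is still a single parameter, so the comma in city_state was not the cause of the error. With a real DataFrame, df.itertuples(index=False, name=None) should yield the same kind of plain tuples without the row index.

```python
import sqlite3

# Stand-in for the DataFrame rows; the second field contains a comma.
rows = [
    ("a1", "Austin, TX", 1, 2.5, "x"),
    ("b2", "Boise, ID", 3, 4.5, "y"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE target "
    "(Field1 TEXT, Field2 TEXT, Field3 INTEGER, Field4 REAL, Field5 TEXT)"
)

insert_sql = (
    "INSERT INTO target (Field1, Field2, Field3, Field4, Field5) "
    "VALUES (?, ?, ?, ?, ?)"
)

# Each call passes exactly two arguments: the SQL text and ONE sequence
# holding all five parameter values.
for row_values in rows:
    conn.execute(insert_sql, row_values)

inserted = conn.execute("SELECT * FROM target ORDER BY rowid").fetchall()
print(inserted)
```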

SSIS foreach loop to group all unique customers in a table and write them to their own file

I have a table which stores all of my customers and their invoices (less than 5k total). I want to use a foreach loop container to write each of these customers to their own file, listing their own invoices.
I have used a foreach loop container to read/load/write files before, so I understand that part, but how do I apply the foreach loop on the AccountNumber as the enumerator?
For each file, I only want that customer's info.
My table:
AccountNumber InvoiceNumber OriginalCharge
A255 2017-11 225.00
A255 2017-12 13.50
A255 2018-01 25.00
D870 2017-09 7.25
D870 2017-10 10.00
R400 2016-12 100.00
R400 2017-03 5.00
R400 2017-04 7.00
R400 2017-09 82.00
So this would produce 3 files and would include the invoices/original charge for the given customers.
File 1 = Customer A255
File 2 = Customer D870
File 3 = Customer R400
Or should I approach this differently?
Environment: SQL Server 2014
SSIS-2012
Thanks!
You'll need to apply a few different recipes to make this work.
Dynamic file name
Source query parameterization
Shredding record set
Assumptions
You have three SSIS Variables:
CurrentAccountNumber String (initial value of A255)
rsAccountNumbers Object
FileNameOutput String EvaluateAsExpression = True "C:\\ssisdata\\output\\" + @[User::CurrentAccountNumber] + ".txt"
The package would look something like
[Execute SQL Task] -> [Foreach (Ado.net) Enumerator] -> [Data Flow Task]
Execute SQL Task
Set the ResultSet type to Full result set
Your source query would be SELECT DISTINCT AccountNumber FROM dbo.Invoices;
In the Result Set tab, assuming an OLE DB Connection Manager, click the Add button, use a "name" of 0, and set the variable to User::rsAccountNumbers
Foreach (Ado.net) Enumerator
Set your enumerator type as Ado.NET and single table. Use the variable User::rsAccountNumbers and assign the zeroth element to our variable CurrentAccountNumber
Run the package as is to verify the Execute SQL Task is returning a result set that the Foreach can shred. Observe that each loop in the enumerator results in the value of our variable FileNameOutput changing (C:\ssisdata\output\A255.txt, C:\ssisdata\output\D870.txt, etc.)
Data flow task
This is a simple flow
[OLE DB Source] -> [Flat File Destination]
Configure your OLE DB Source to be a Query SELECT * FROM dbo.Invoices AS D WHERE D.AccountNumber = ?;
Click the Parameters button. Configure the name 0 to be @[User::CurrentAccountNumber]
Flat File Destination - Connect the Source to the destination, create a new Flat File Connection Manager and connect the columns.
Dynamic file name
The final piece will be to edit the Flat File Connection Manager created above to use the variable FileNameOutput instead of the hard-coded value you indicated. Right click on the Flat File Connection Manager and select Properties. In the resulting properties window, find the Expressions property and click the ellipsis (...). In the left-hand window, find ConnectionString and in the right-hand window, use @[User::FileNameOutput]
Press F5 and the package should fire up and generate an output file per account number.
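For readers who want to see the control flow outside the SSIS designer, here is a rough Python sketch of the same three steps: fetch the distinct account numbers, loop over them, and run a parameterized query whose result goes to a file named after the current account. It is not SSIS code. The standard-library sqlite3 module stands in for SQL Server, the sample rows come from the question, and the temporary output directory is an assumption.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Invoices "
    "(AccountNumber TEXT, InvoiceNumber TEXT, OriginalCharge REAL)"
)
conn.executemany(
    "INSERT INTO Invoices VALUES (?, ?, ?)",
    [
        ("A255", "2017-11", 225.00), ("A255", "2017-12", 13.50),
        ("A255", "2018-01", 25.00), ("D870", "2017-09", 7.25),
        ("D870", "2017-10", 10.00), ("R400", "2016-12", 100.00),
        ("R400", "2017-03", 5.00), ("R400", "2017-04", 7.00),
        ("R400", "2017-09", 82.00),
    ],
)

output_dir = Path(tempfile.mkdtemp())  # assumed output location

# Step 1 (Execute SQL Task): the list of accounts to iterate over.
account_numbers = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT AccountNumber FROM Invoices ORDER BY AccountNumber"
    )
]

written = {}
# Step 2 (Foreach enumerator): one iteration per account number.
for account in account_numbers:
    # Dynamic file name, the counterpart of the FileNameOutput expression.
    file_path = output_dir / f"{account}.txt"
    # Step 3 (Data Flow): parameterized source query for this account only.
    invoices = conn.execute(
        "SELECT AccountNumber, InvoiceNumber, OriginalCharge "
        "FROM Invoices WHERE AccountNumber = ? ORDER BY InvoiceNumber",
        (account,),
    ).fetchall()
    with open(file_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["AccountNumber", "InvoiceNumber", "OriginalCharge"])
        writer.writerows(invoices)
    written[account] = len(invoices)

print(written)
```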

R Converting SQL Server Query with Geometry Datatype to spatialpolygonsdataframe

I am trying to plot geometry (binary) polygon data from an SQL Server data source. What I want to do is use the Geometry Data Type from the SQL query for the polygons, and also the rest of the columns in the query as the @data attribute table within the SpatialPolygonsDataFrame class.
This is my code so far, to get the SQL query data into a simple data.frame and convert the binary datatype using wkb::readWKB().
From this stage, I do not know how to create the SpatialPolygonsDataFrame dataframe.
library(RODBC)
library(maptools)
library(rgdal)
library(ggplot2)
dbhandle <- odbcDriverConnect("connection string",rows_at_time = 1)
sqlStatement <- "SELECT ID
, shape.STAsBinary() as shape
, meshblock_number
, areaunit_code
, dpz_code
, catchment_id
FROM [primary_parcels] hp "
sqlStatement <- gsub("[\r\n]", "", sqlStatement)
parcelData <- sqlQuery(dbhandle,sqlStatement )
odbcClose(dbhandle)
parcelData$shape <- wkb::readWKB(parcelData$shape)
This might be too late, but with the help of a friend I found a solution. I am working on a similar set of data and had the same problem. Please remember that it is difficult to replicate your approach, as you did not provide a reproducible example. You also need to adjust the projection. Please note that it is easier to set up the ODBC connection once and reuse it, rather than writing out the connection string each time.
library(RODBC)   # odbcConnect() and sqlQuery()
library(rgeos)   # readWKT()
library(mapview)
library(raster)  # bind()
library(dplyr)
library(sp)
#You may ignore this
odbcCh <- odbcConnect("Rtest")
# readWKT() parses WKT text, so the geometry is requested with STAsText() here
sqlStatement <- 'SELECT ID , shape.STAsText() as shape, meshblock_number , areaunit_code, dpz_code, catchment_id FROM [primary_parcels] hp'
parcelData <- sqlQuery(odbcCh, sqlStatement)
#until here
# Parse each WKT string into its own SpatialPolygons object
things <- vector("list", length(parcelData$shape))
z <- 0
for (line in parcelData$shape)
{
  things[[z + 1]] <- readWKT(line)
  z <- z + 1
}
# Combine the individual polygons, then attach the attribute columns
Things <- do.call(bind, things)
Things.df <- SpatialPolygonsDataFrame(Things, data.frame(parcelData$ID, parcelData$catchment_id))
plot(Things.df)
#you may not need the rest
Things.df@proj4string= CRS("+proj=nzmg +lat=-41.0 +lon=173.0 +x_0=2510000.0 +y_0=6023150.0 +ellps=intl +units=m")
mapview(Things.df)

Referencing SQL temporary table in R Studio

I have run a query creating a temporary table in SQL that I am now trying to analyze in R Studio. I don't think I am properly pulling the data into R Studio, and as a result, I cannot perform calculations on it. I am first running:
cn <- odbcDriverConnect(connection="Driver={SQLServer};
server=servername;database=databasename;trusted_connection=yes;")
new_data<-sqlQuery(cn,"SELECT TOP 1000 * FROM #TempDatabaseName")
After I run this code, new_data appears under Values in the R Environment, not under Data. Then, when I run:
new_reg<-lm(yvar~xvar,data=new_data)
I get the following error:
Error in eval(predvars, data, env) : invalid 'envir' argument of type 'character'
Can anyone help me out?
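Your lm function is looking for a data.frame. Try wrapping your query in as.data.frame(). If new_data is still a character vector afterwards, print it: sqlQuery returns the error message as a character vector when the query fails, and a local #temp table is only visible to the session that created it.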
cn <- odbcDriverConnect(connection="Driver={SQLServer};
server=servername;database=databasename;trusted_connection=yes;")
new_data<-as.data.frame(sqlQuery(cn,"SELECT TOP 1000 * FROM
#TempDatabaseName"))
new_reg<-lm(yvar~xvar,data=new_data)
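One thing that may be worth ruling out is temp-table scope: a temporary table normally belongs to the connection that created it, so a second connection cannot query it. The sketch below demonstrates that behaviour with the standard-library sqlite3 module. This is an analogy only, since SQLite and SQL Server are different engines, and the file name demo.db and the table name temp_data are made up.

```python
import os
import sqlite3
import tempfile

# A file-backed database so that two separate connections share it.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Session A creates and fills a temporary table.
session_a = sqlite3.connect(db_path)
session_a.execute("CREATE TEMP TABLE temp_data (xvar REAL, yvar REAL)")
session_a.execute("INSERT INTO temp_data VALUES (1.0, 2.0)")
rows_seen_by_a = session_a.execute("SELECT * FROM temp_data").fetchall()

# Session B is a new connection, so the temp table is not visible to it.
session_b = sqlite3.connect(db_path)
try:
    session_b.execute("SELECT * FROM temp_data").fetchall()
    error_seen_by_b = None
except sqlite3.OperationalError as exc:
    error_seen_by_b = str(exc)

print(rows_seen_by_a)
print(error_seen_by_b)
```

If the same thing is happening here, creating the temp table and selecting from it over the same connection, or using a permanent table, would be the things to try.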

get multiple table result of SQL Server in R [duplicate]

I am using RODBC to query data from SQL Server.
How can I get both tables when the result contains two tables?
Currently my code is as follows:
library(RODBC)
channel <- odbcDriverConnect("driver={SQL Server};server=xxxx;database=xx;uid=xx;pwd=xxx")
initdata<- sqlQuery(channel,paste("select * from roles;select * from seat"))
odbcClose(channel)
initdata contains the result only from the roles table.
In fact, my query is a stored procedure call like "exec XXX", and the stored procedure returns multiple tables. I wonder if there is a way to get all the result tables.
Why don't you use two result sets or data frames?
dataframes:
initroles <- sqlFetch(channel, "roles")
initseats <- sqlFetch(channel, "seat")
resultsets:
initroles <- sqlQuery(channel, "select * from roles")
initseats <- sqlQuery(channel, "select * from seat")
My syntax might be a little off, but I would try
SELECT * FROM roles JOIN seat ON roles.id = seat.id
Where roles.id and seat.id are the ID variables that link roles and seat.
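The "one query per table" suggestion can be sketched as follows. This is a minimal illustration in Python with the standard-library sqlite3 module standing in for SQL Server. The column names and sample rows are invented, and it does not cover the stored-procedure case where a single call returns several result sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE roles (id INTEGER, role_name TEXT);
    CREATE TABLE seat (id INTEGER, seat_label TEXT);
    INSERT INTO roles VALUES (1, 'admin'), (2, 'guest');
    INSERT INTO seat VALUES (1, 'A1'), (2, 'B7'), (3, 'C3');
    """
)

# One statement per table, each result kept under its own name.
queries = {
    "roles": "SELECT * FROM roles ORDER BY id",
    "seat": "SELECT * FROM seat ORDER BY id",
}
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}

print(results["roles"])
print(results["seat"])
```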
