T-SQL Temp Tables and Stored Procedures from R using RevoScaleR

In the example below, I was able to get the query to work with one exception: when I use q in place of source.query in the RxSqlServerData step, I get the error "rxCompleteClusterJob Execution halted".
The first goal is to use a stored procedure in place of a longer query. Is this possible?
The second goal is to create and use a #TEMPORARY table within the stored procedure. Is that possible as well?
library(RODBC)
library(RevoScaleR)
sqlConnString <- "Driver=SQL Server;Server=SAMPLE_SERVER; Database=SAMPLE_DATABASE;Trusted_Connection=True"
sqlWait <- TRUE
sqlConsoleOutput <- FALSE
sql_share_directory <- paste("D:\\RWork\\AllShare\\", Sys.getenv("USERNAME"), sep = "")
sqlCompute <- RxInSqlServer(connectionString = sqlConnString, wait = sqlWait, consoleOutput = sqlConsoleOutput)
rxSetComputeContext(sqlCompute)
#This Sample Query Works
source.query <- paste("SELECT CASE WHEN [Order Date Key] = [Picked Date Key]",
"THEN 1 ELSE 0 END AS SameDayFulfillment,",
"[City Key] AS city, [STOCK ITEM KEY] AS item,",
"[PICKER KEY] AS picker, [QUANTITY] AS quantity",
"FROM [WideWorldImportersDW].[FACT].[ORDER]",
"WHERE [WWI ORDER ID] >= 63968")
#This Query Does Not
q <- "EXEC [dbo].[SAMPLE_STORED_PROCEDURE]"
inDataSource <- RxSqlServerData(sqlQuery=q, connectionString=sqlConnString, rowsPerRead=500)
order.logit.rx <- rxLogit(SameDayFulfillment ~ city + item + picker + quantity, data = inDataSource)
order.logit.rx

Currently, RxSqlServerData accepts only a T-SQL SELECT statement as its input data set (the sqlQuery argument); stored procedure calls such as EXEC are not supported.
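One possible workaround, sketched under assumptions: run the procedure yourself first, land its result set in a permanent staging table, and then give RxSqlServerData a plain SELECT over that table. A permanent table is used because a local #temp table created inside the procedure is dropped when the procedure ends, and one created on your own connection is not visible to the separate connection RxSqlServerData opens; #temp tables used purely inside the procedure are still fine. The helper stage_proc_sql() and the staging table name are hypothetical, and the staging table must already exist with columns matching the procedure's output.

```r
# Hypothetical helper, not part of RevoScaleR: builds the two T-SQL strings
# for the staging-table workaround. Names passed in are placeholders.
stage_proc_sql <- function(proc_name, staging_table) {
  list(
    # Step 1: run on your own connection. Assumes staging_table already exists
    # with columns matching the procedure's result set.
    load = paste0("TRUNCATE TABLE ", staging_table, "; ",
                  "INSERT INTO ", staging_table, " EXEC ", proc_name, ";"),
    # Step 2: a plain SELECT, which RxSqlServerData(sqlQuery = ...) accepts.
    select = paste0("SELECT * FROM ", staging_table)
  )
}

# Usage sketch (requires RODBC, RevoScaleR and the question's sqlConnString):
# sql <- stage_proc_sql("[dbo].[SAMPLE_STORED_PROCEDURE]", "[dbo].[SAMPLE_STAGING]")
# ch <- RODBC::odbcDriverConnect(sqlConnString)
# RODBC::sqlQuery(ch, sql$load)
# RODBC::odbcClose(ch)
# inDataSource <- RxSqlServerData(sqlQuery = sql$select,
#                                 connectionString = sqlConnString, rowsPerRead = 500)
```

The procedure still runs to completion in step 1, so any #temp tables it creates internally behave normally; only its final result set needs to outlive it.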

Related

SQL Server : nested looping over two Selects

I have the following two queries that produce the results I need. I would normally produce the final output in Python after the results are returned, but unfortunately only SQL can be used.
Query A:
SELECT *
FROM openquery(PROD, 'SELECT `status`, computer_name, device_type
FROM assets
WHERE (device_type="SERVER")
AND (status="ACTIVE")')
Query B:
SELECT *
FROM openquery(AppMap, 'SELECT `t1`.`uaid` AS `uaid`, `t3`.`computer_name`
FROM ((`applications` `t1`
JOIN `app_infrastructure` `t2` ON (((`t1`.`uaid` = `t2`.`uaid`))))
JOIN `infrastructure` `t3` ON ((`t2`.`infrastructure_id` = `t3`.`infrastructure_id`)));')
How I would want to process the results:
if a computer_name is in both A and B:
final_row = ['computer_name', 1]
elseif a computer_name is in A but not B:
final_row = ['computer_name', 0]
elseif a computer_name is in B but not A:
final_row = ['computer_name', 2]
So my final query results need to look like those rows. Does that make sense?
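In a stored procedure, use each query to load a table variable.
Then write a FULL OUTER JOIN query that joins the two table variables on computer_name. Because either side of a FULL OUTER JOIN can be NULL, take the name from COALESCE of the two computer_name columns, and use a CASE expression (in both, A only, B only) to compute the final_row value for each computer name.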
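To check the CASE branches before writing the procedure, the same logic can be tried in R on plain vectors of names. merge(..., all = TRUE) behaves like a FULL OUTER JOIN: names from either side are kept, and the side without a match is NA. classify_computers() is a hypothetical helper, and its in_a/in_b flag columns stand in for the two table variables.

```r
# Hypothetical helper: mimics FULL OUTER JOIN + CASE on two vectors of names.
classify_computers <- function(a_names, b_names) {
  a_names <- unique(a_names)
  b_names <- unique(b_names)
  a <- data.frame(computer_name = a_names, in_a = rep(TRUE, length(a_names)),
                  stringsAsFactors = FALSE)
  b <- data.frame(computer_name = b_names, in_b = rep(TRUE, length(b_names)),
                  stringsAsFactors = FALSE)
  # all = TRUE keeps names from either side; the unmatched side becomes NA.
  joined <- merge(a, b, by = "computer_name", all = TRUE)
  in_a <- !is.na(joined$in_a)
  in_b <- !is.na(joined$in_b)
  # Same branches as the CASE expression: both -> 1, only A -> 0, only B -> 2.
  joined$final_value <- ifelse(in_a & in_b, 1L, ifelse(in_a, 0L, 2L))
  joined[, c("computer_name", "final_value")]
}

classify_computers(c("srv1", "srv2"), c("srv2", "srv3"))
```

Here srv1 (A only) gets 0, srv2 (both) gets 1 and srv3 (B only) gets 2, which is the mapping the question asks for.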

Fetching Data from SQL Server Through Shiny Application on Date selection bases

Hi everyone, I am developing a Shiny application that extracts data from SQL Server through an ODBC connection, based on a from/to date range selected in the app. I can't identify where the issue is: if I execute the code on its own in RStudio, I can extract the data from SQL Server, but when the same code runs in the Shiny environment, no data comes back. The code is below. Kindly guide me on this. Thank you.
# ---------------------ui Code -----------------------------
library(shiny)
shinyUI(pageWithSidebar(
headerPanel("Time Analytics"),
sidebarPanel(
dateRangeInput(inputId = "dateRange",
label = "Date range",
start = "2007-09-17",
max = Sys.Date()
)
),#sidebar Panel Ends
# 09-Main Panel ----
mainPanel(
tabsetPanel(id ="theTabs",
tabPanel("Summary", dataTableOutput("tabi"),textOutput("tabii"))
)
)#Main Panel Ends
))
#------------------Server ----------------------------------
library(shiny);library(sqldf)
library(plyr);library(RODBC)
library(ggplot2)
#Creating the connection
shinyServer(function(input, output, session){ # pass in a session argument
# prep data once and then pass around the program
passData <- reactive({
ch = odbcConnect("Test")
#qry <- "SELECT * FROM Nifty50"
#qry <- cat("SELECT * FROM Nifty50 WHERE Date >= ",as.date(input$dateRange[1])," AND Date <= ",input$dateRange[2])
qry <- paste("SELECT * FROM Nifty50 WHERE Date >= ",input$dateRange[1]," AND Date <= ",input$dateRange[2])
#paste("SELECT * FROM Nifty50 WHERE Date >= ",input$dateRange[1]," AND Date <= ",input$dateRange[2])
subset_Table <- sqlQuery(ch,qry)
odbcClose(ch)
subset_Table <- as.data.frame(subset_Table)
return(subset_Table)
})
output$tabi <- renderDataTable({
d<- as.data.frame(passData())
d
})
output$tabii <- renderText({
paste("Minimium Data :",input$dateRange[1], "Max Date:",input$dateRange[2])
})
# ----------------------------------------------------End
})
The task is to fetch data from the selected table based on the from/to date criteria, so that the result is the subset of rows within the dates selected in the Shiny app.
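The dates are pasted into the SQL without quotes, so SQL Server reads 2007-09-17 as the arithmetic expression 2007 - 9 - 17 instead of a date. Wrap each date in single quotes by modifying qry as follows: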
qry <- paste("SELECT * FROM Nifty50 WHERE Date >= '", input$dateRange[1], "' AND Date <= '", input$dateRange[2], "'", sep = "")
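A slightly more defensive variant, as a sketch: convert both dateRangeInput values with as.Date() before formatting them, so that only well-formed ISO dates can reach the SQL text. build_date_query() is a hypothetical helper; the table and column names are taken from the question. If you move from RODBC to DBI/odbc, a parameterized query with ? placeholders avoids building the string at all.

```r
# Hypothetical helper: validates the two dates, then builds the quoted query.
build_date_query <- function(table_name, from, to) {
  from <- as.Date(from)  # errors on strings as.Date cannot parse
  to <- as.Date(to)
  if (anyNA(c(from, to))) stop("both dates must be valid")
  if (from > to) stop("'from' must not be after 'to'")
  # Single quotes make SQL Server treat the values as date literals.
  sprintf("SELECT * FROM %s WHERE Date >= '%s' AND Date <= '%s'",
          table_name, format(from, "%Y-%m-%d"), format(to, "%Y-%m-%d"))
}

build_date_query("Nifty50", as.Date("2007-09-17"), "2007-10-01")
```

Inside passData(), the qry line would then read qry <- build_date_query("Nifty50", input$dateRange[1], input$dateRange[2]).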

SQL Server and R, data mining

I'm working in Microsoft SQL Server Management Studio 2016, using the feature that lets me embed an R script in T-SQL code.
My goal is to build a procedure that runs the Apriori algorithm and returns the data in the shape I want, i.e. a table with x as the first object and y as the second object.
I am stuck here, and I think there is some problem with the data. This is the error:
A 'R' script error occurred during execution of
'sp_execute_external_script' with HRESULT 0x80004004.
An external script error occurred: Error in eval(expr, envir, enclos)
: bad allocation Calls: source -> withVisible -> eval -> eval -> .Call
Here is my code.
The source data are a table of two columns like this:
A B
a f
f a
b c
...
y z
And here is the code:
GO
create procedure dbo.apriorialgorithm as
-- delete old table
IF OBJECT_ID('Data') IS NOT NULL
DROP TABLE Data
-- create a table that stores the query result.
CREATE TABLE Data ( art1 nvarchar(100), art2 nvarchar(100));
-- store the query
INSERT INTO Data ( art1, art2)
select
firstfield as art1,
secondfield as art2
from allthefields
;
IF OBJECT_ID('output') IS NOT NULL
DROP TABLE output
-- create table of the results of the analysis.
CREATE TABLE output (x nvarchar(100), y nvarchar(100));
INSERT INTO output (x, y)
-- R script.
EXECUTE sp_execute_external_script
@language = N'R'
, @script = N'
Now the R script. The data I get from the query are numeric, but apriori needs factors, so first I convert the data to factors:
df<-data.frame(x=as.factor("art1"),y=as.factor("art2"))
Then I can apply apriori:
library("arules");
library("arulesViz");
rules = apriori(df,parameter=list(minlen=2,support=0.05, confidence=0.05));
I need the data without the rule formatting, just the objects:
ruledf <- data.frame(
lhs <- labels(lhs(rules)),
rhs <- labels(rhs(rules)),
rules@quality)
a<-substr(ruledf$lhs,7,nchar(as.character( ruledf$lhs))-1)
b<-substr(ruledf$rhs,7,nchar(as.character( ruledf$rhs))-1)
ruledf2<-data.frame(a,b)
'
And the last part:
, @input_data_1 = N'SELECT * from Data'
, @output_data_1_name = N'ruledf2'
, @input_data_1_name = N'ruledf2';
GO
I do not know where it is failing, because when I do the same thing in plain R, using RODBC to fetch the data from the database, everything works.
Could you help me? Thanks in advance!
The problem was in the R script, which works better written this way:
EXECUTE sp_execute_external_script
@language = N'R'
, @script = N'
library("arules");
rules = apriori(df[, c("art1", "art2")], parameter=list(minlen=2, support=0.0005, confidence=0.0005));
ruledf <- data.frame(
lhs = labels(lhs(rules)),
rhs = labels(rhs(rules)),
rules@quality)
ruledf2 <- data.frame(
lhs2 = substr(ruledf$lhs, 7, nchar(as.character(ruledf$lhs)) - 1),
rhs2 = substr(ruledf$rhs, 7, nchar(as.character(ruledf$rhs)) - 1)
)
colnames(ruledf2) <- c("a", "b") '
Then it needs to have the right input and output:
, @input_data_1 = N'SELECT * from Data'
, @input_data_1_name = N'df'
, @output_data_1_name = N'ruledf2'
So the result is a table named output like this:
x y
artA artB
artB artA
...
artY artZ
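The substr(..., 7, ...) calls work here only because both column names (art1, art2) are four characters long, so every label starts with the six-character prefix "{art1=" or "{art2=". A sketch of a less position-dependent alternative, using a hypothetical strip_item_label() helper and assuming each rule side holds a single item (true here, since each transaction has only two items):

```r
# Hypothetical helper: turns arules item labels such as "{art1=a}" into "a".
# Drops the opening brace, everything up to the first "=", and the closing
# brace, regardless of how long the column name is.
strip_item_label <- function(labels) {
  sub("^\\{[^=]*=(.*)\\}$", "\\1", labels)
}

strip_item_label(c("{art1=a}", "{art2=artB}"))
```

In the script, lhs2 and rhs2 would then be strip_item_label(as.character(ruledf$lhs)) and strip_item_label(as.character(ruledf$rhs)).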
This was very helpful.

How to pass data.frame for UPDATE with R DBI

With RODBC, there were functions like sqlUpdate(channel, dat, ...) that allowed you pass dat = data.frame(...) instead of having to construct your own SQL string.
However, with R's DBI, all I see are functions like dbSendQuery(conn, statement, ...), which only take a string statement and give no opportunity to specify a data.frame directly.
So how do I UPDATE a table from a data.frame with DBI?
This answer is really late, but maybe still helpful...
There is no single function in the DBI/odbc packages that I know of, but you can replicate the update behavior with a prepared UPDATE statement (which should be faster than RODBC's sqlUpdate, since it sends the parameter values to SQL Server as a batch):
library(DBI)
library(odbc)
con <- dbConnect(odbc::odbc(), driver="{SQL Server Native Client 11.0}", server="dbserver.domain.com\\default,1234", Trusted_Connection = "yes", database = "test") # assumes Microsoft SQL Server
dbWriteTable(con, "iris", iris, row.names = TRUE) # create and populate a table (adding the row names as a separate column used as row ID)
update <- dbSendStatement(con, 'update iris set "Sepal.Length"=?, "Sepal.Width"=?, "Petal.Length"=?, "Petal.Width"=?, "Species"=? WHERE row_names=?') # dbSendStatement is DBI's call for statements that return no rows
# create a modified version of `iris`
iris2 <- iris
iris2$Sepal.Length <- 5
iris2$Petal.Width[2] <- 1
iris2$row_names <- rownames(iris) # use the row names as unique row ID
dbBind(update, iris2) # send the updated data
dbClearResult(update) # release the prepared statement
# now read the modified data - you will see the updates did work
data1 <- dbReadTable(con, "iris")
dbDisconnect(con)
This works only if you have a unique row identifier (such as a primary key). In the example above I created one from the row names, which are unique numbers incremented by one for each row.
For more information about the odbc package used in the dbConnect statement above, see: https://github.com/rstats-db/odbc
Building on R Yoda's answer, I made myself the helper function below. This allows using a dataframe to specify update conditions.
While I built this to run transactional updates (i.e. single rows), in theory it can update multiple rows if the key condition matches more than one row. However, that's not the same as updating multiple rows from an input dataframe. Maybe somebody else can build on this...
dbUpdateCustom = function(x, key_cols, con, schema_name, table_name) {
if (nrow(x) != 1) stop("Input dataframe must be exactly 1 row")
if (!all(key_cols %in% colnames(x))) stop("All columns specified in 'key_cols' must be present in 'x'")
# Build the update string --------------------------------------------------
df_key <- dplyr::select(x, one_of(key_cols))
df_upt <- dplyr::select(x, -one_of(key_cols))
set_str <- purrr::map_chr(colnames(df_upt), ~glue::glue_sql('{`.x`} = {x[[.x]]}', .con = con))
set_str <- paste(set_str, collapse = ", ")
where_str <- purrr::map_chr(colnames(df_key), ~glue::glue_sql("{`.x`} = {x[[.x]]}", .con = con))
where_str <- paste(where_str, collapse = " AND ")
update_str <- glue::glue('UPDATE {schema_name}.{table_name} SET {set_str} WHERE {where_str}')
# Execute ------------------------------------------------------------------
query_res <- DBI::dbSendQuery(con, update_str)
DBI::dbClearResult(query_res)
return (invisible(TRUE))
}
Where
x: 1-row dataframe that contains 1+ key columns, and 1+ update columns.
key_cols: character vector, of 1 or more column names that are the keys (i.e. used in the WHERE clause)
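As a sketch of the multi-row case, the prepared-statement idea from the first answer can be generalized: a hypothetical prepare_update() helper builds the positional UPDATE statement from a data.frame's column names and returns the parameters in placeholder order (SET columns first, then the key columns). It assumes an unqualified table name and ANSI double-quoted identifiers; with a live connection, DBI::dbQuoteIdentifier() is the safer way to quote.

```r
# Hypothetical helper: builds a positional ("?") UPDATE statement for a
# data.frame, plus the parameter list in matching placeholder order.
prepare_update <- function(table_name, df, key_cols) {
  stopifnot(all(key_cols %in% names(df)))
  set_cols <- setdiff(names(df), key_cols)
  if (length(set_cols) == 0) stop("df must contain at least one non-key column")
  # ANSI-style identifier quoting; embedded double quotes are doubled.
  quote_id <- function(x) paste0('"', gsub('"', '""', x), '"')
  set_clause <- paste0(quote_id(set_cols), " = ?", collapse = ", ")
  where_clause <- paste0(quote_id(key_cols), " = ?", collapse = " AND ")
  sql <- paste0("UPDATE ", quote_id(table_name), " SET ", set_clause,
                " WHERE ", where_clause)
  # Placeholders are positional: SET columns first, then WHERE key columns.
  list(sql = sql, params = unname(as.list(df[c(set_cols, key_cols)])))
}

# With an open DBI connection `con` (not created here):
# u <- prepare_update("iris", iris2, "row_names")
# res <- DBI::dbSendStatement(con, u$sql)
# DBI::dbBind(res, u$params)
# DBI::dbClearResult(res)
```

Every row of the data.frame becomes one execution of the prepared statement, so this updates as many rows as the input has, each matched by its key.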
Here is a little helper function I put together that uses REPLACE INTO (MySQL/MariaDB syntax, not SQL Server) to update a table through DBI, replacing old duplicate entries with the new values. It's basic and built for my own needs, but should be easy to modify. All you need to pass to the function is the connection, the table name, and the dataframe. Note that the table must have a PRIMARY KEY column. I've also included a simple working example.
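row_to_list <- function(Y) split(Y, seq_len(nrow(Y))) # one 1-row data.frame per row
sql_val <- function(y){
if(is.na(y)){
return("NULL") # NA in any column type becomes SQL NULL
}
if(is.numeric(y)){
return(as.character(y))
}
return(paste0("'", gsub("'", "''", as.character(y)), "'")) # quote text, doubling embedded single quotes
}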
to_sql_row <- function(x) paste0("(",paste(do.call("c", lapply(x, FUN = sql_val)), collapse = ", "),")")
bracket <- function(x) paste0("`",x,"`")
to_sql_string <- function(x) paste0("(",paste(sapply(x, FUN = bracket), collapse = ", "),")")
replace_into_table <- function(con, table_name, new_data){
#new_data <- data.table(new_data)
cols <- to_sql_string(names(new_data))
vals <- paste(lapply(row_to_list(new_data), FUN = to_sql_row), collapse = ", ")
query <- paste("REPLACE INTO", table_name, cols, "VALUES", vals)
rs <- dbExecute(con, query)
return(rs)
}
# assumes con is an open DBI connection to a MySQL/MariaDB database
tb <- data.frame("id" = letters[1:20], "A" = 1:20, "B" = seq(.1,2,.1)) # sample data
dbWriteTable(con, "test_table", tb) # create table
dbExecute(con, "ALTER TABLE test_table ADD PRIMARY KEY (id)") # set primary key
new_data <- data.frame("id" = letters[19:23], "A" = 1:5, "B" = seq(101,105)) # new data
new_data[4,2] <- NA # add some NA values
new_data[5,3] <- NA
table_name <- "test_table"
replace_into_table(con, "test_table", new_data)
result <- dbReadTable(con, "test_table")

Converting complex sql stored proc into linq

I'm using LINQ to SQL and have a stored proc that won't generate a class. The stored proc draws data from multiple tables into a flat result set.
The amount of data returned must be as small as possible, the number of round trips to SQL Server needs to be limited, and the amount of server-side processing must be limited, as this is for an ASP.NET MVC project.
So I'm trying to write a LINQ to SQL query, but I'm struggling both to replicate the logic and to limit the data returned.
Here's the stored proc that I'm trying to convert:
SELECT AdShops.shop_id as ID, Users.image_url_75x75, AdShops.Advertised,
Shops.shop_name, Shops.title, Shops.num_favorers as hearts, Users.transaction_sold_count as sold,
(select sum(L4.num_favorers) from Listings as L4 where L4.shop_id = L.shop_id) as listings_hearts,
(select sum(L4.views) from Listings as L4 where L4.shop_id = L.shop_id) as listings_views,
L.title AS listing_title, L.price as price, L.listing_id AS listing_id, L.tags, L.materials, L.currency_code,
L.url_170x135 as listing_image_url_170x135, L.url AS listing_url, l.views as listing_views, l.num_favorers as listing_hearts
FROM AdShops INNER JOIN
Shops ON AdShops.shop_id = Shops.shop_id INNER JOIN
Users ON Shops.user_id = Users.user_id INNER JOIN
Listings AS L ON Shops.shop_id = L.shop_id
WHERE (Shops.is_vacation = 0 AND
L.listing_id IN
(
SELECT listing_id
FROM (SELECT l2.user_id , l2.listing_id, RowNumber = ROW_NUMBER() OVER (PARTITION BY l2.user_id ORDER BY NEWID())
FROM Listings l2
INNER JOIN (
SELECT user_id
FROM Listings
GROUP BY
user_id
HAVING COUNT(*) >= 3
) cnt ON cnt.user_id = l2.user_id
) l2
WHERE l2.RowNumber <= 3 and L2.user_id = L.user_id
)
)
ORDER BY Shops.shop_name
So far I can return a flat result set but am not able to limit the number of listings. Here's where I'm stuck:
Dim query As IEnumerable = From x In db.AdShops
Join y In (From y1 In db.Shops
Where y1.Shop_name Like _Search + "*" AndAlso y1.Is_vacation = False
Order By y1.Shop_name
Select y1) On y.Shop_id Equals x.shop_id
Join z In db.Users On x.user_id Equals z.User_id
Join l In db.Listings On l.Shop_id Equals y.Shop_id
Select New With {
.shop_id = y.Shop_id,
.user_id = z.user_id,
.listing_id = l.Listing_id
} Take 24 ' Fields omitted for brevity...
I assume that to select a random set of 3 listings per shop I'd need to use a lambda expression, but I'm not sure how to do this. I also need to add consolidated totals of the listing fields for each individual shop somewhere...
Anyone have any thoughts?
UPDATE:
Here's the current solution that I'm looking at:
Result class wrapper:
Public Class NewShops
Public Property Shop_id As Integer
Public Property listing_id As Integer
Public Property tl_listing_hearts As Integer?
Public Property tl_listing_views As Integer?
Public Property listing_creation As Date
End Class
Linq + code:
Using db As New Ads.DB(Ads.DB.Conn)
Dim query As IEnumerable(Of IGrouping(Of Integer, NewShops)) =
(From x In db.AdShops
Join y In (From y1 In db.Shops
Where (y1.Shop_name Like _Search + "*" AndAlso y1.Is_vacation = False)
Select y1
Skip ((_Paging.CurrentPage - 1) * _Paging.ItemsPerPage)
Take (_Paging.ItemsPerPage))
On y.Shop_id Equals x.shop_id
Join z In db.Users On x.user_id Equals z.User_id
Join l In db.Listings On l.Shop_id Equals y.Shop_id
Join lt In (From l2 In db.Listings _
Group By id = l2.Shop_id Into Hearts = Sum(l2.Num_favorers), Views = Sum(l2.Views), Count() _
Select New NewShops With {.tl_listing_views = Views,
.tl_listing_hearts = Hearts,
.Shop_id = id})
On lt.Shop_id Equals y.Shop_id
Select New NewShops With {.Shop_id = y.Shop_id,
.tl_listing_views = lt.tl_listing_views,
.tl_listing_hearts = lt.tl_listing_hearts,
.listing_creation = l.Creation,
.listing_id = l.Listing_id
}).GroupBy(Function(s) s.Shop_id).OrderByDescending(Function(s) s(0).tl_listing_views)
Dim Shops As New Dictionary(Of Integer, List(Of NewShops)) ' keyed by Shop_id: NewShops has no shop_name property
For Each item As IEnumerable(Of NewShops) In query
Shops.Add(item(0).Shop_id, (From i As NewShops In item
Order By i.listing_creation Descending
Select i Take 3).ToList)
Next
End Using
Anyone have any other suggestions?
From the looks of that SQL and code, I wouldn't turn it into LINQ queries. It'll just obscure the logic and will probably take you days to get right.
If SQLMetal doesn't generate it properly, have you considered using the ExecuteQuery method of the DataContext to return a list of the items you're after?
Assuming the sproc you're trying to convert is called sp_complicated and takes one parameter, something like the following should do the trick:
Protected Class TheResults
Public Property ID as Integer
Public Property image_url_75x75 as String
'... and so on and so forth for all the returned columns. Be careful with nulls
End Class
'then, when you want to use it
Using db As New Ads.DB(Ads.DB.Conn)
Dim results = db.ExecuteQuery(Of TheResults)("exec sp_complicated {0}", _Search).ToList() ' materialize before the DataContext is disposed
End Using
Before you freak out: that's not susceptible to SQL injection. LINQ to SQL turns the {0} placeholders into proper SqlParameters, as long as you use the curly-brace placeholders and don't concatenate the strings yourself.
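If you do keep the logic in the stored procedure, it can help to be clear about what its nested subquery selects. As a rough sketch in R (the helper name is hypothetical, and it covers only the subquery, not the joins or the is_vacation filter): split listings by user_id, keep users with at least three listings (the HAVING COUNT(*) >= 3 derived table), and pick three of each user's listings at random, which is roughly what ROW_NUMBER() OVER (PARTITION BY l2.user_id ORDER BY NEWID()) combined with RowNumber <= 3 does.

```r
# Hypothetical helper mirroring the stored procedure's nested subquery.
random_listings_per_user <- function(listings, n = 3, min_n = 3) {
  # PARTITION BY user_id
  groups <- split(listings, listings$user_id)
  # HAVING COUNT(*) >= min_n (the "cnt" derived table)
  groups <- groups[vapply(groups, nrow, integer(1)) >= min_n]
  if (length(groups) == 0) return(listings[0, , drop = FALSE])
  # ORDER BY NEWID() ... WHERE RowNumber <= n: a random subset of n rows
  picked <- lapply(groups, function(g) {
    g[sample(nrow(g), min(n, nrow(g))), , drop = FALSE]
  })
  result <- do.call(rbind, picked)
  rownames(result) <- NULL
  result
}

listings <- data.frame(user_id = c(rep(1L, 5), rep(2L, 2), rep(3L, 3)),
                       listing_id = 1:10)
random_listings_per_user(listings)
```

User 2 has only two listings and is dropped; users 1 and 3 each contribute three rows, and which three of user 1's five listings appear changes from run to run, just as NEWID() does.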
