SQL Server and R, data mining - sql-server

I'm working in Microsoft SQL Server Management Studio 2016, using the feature that lets me embed an R script in SQL code.
My goal is to build a procedure that runs the apriori algorithm and returns the data in the shape I want, i.e. a table with x, the first object, and y, the second object.
I am stuck because I think there is some problem with my data. The error is this:
A 'R' script error occurred during execution of
'sp_execute_external_script' with HRESULT 0x80004004.
An external script error occurred: Error in eval(expr, envir, enclos)
: bad allocation Calls: source -> withVisible -> eval -> eval -> .Call
The source data is a two-column table like this:
A B
a f
f a
b c
...
y z
And here is the code:
GO
create procedure dbo.apriorialgorithm as
-- delete old table
IF OBJECT_ID('Data') IS NOT NULL
DROP TABLE Data
-- create a table that store the query result.
CREATE TABLE Data ( art1 nvarchar(100), art2 nvarchar(100));
-- store the query
INSERT INTO Data ( art1, art2)
select
firstfield as art1,
secondfield as art2
from allthefields
;
IF OBJECT_ID('output') IS NOT NULL
DROP TABLE output
-- create table of the results of the analysis.
CREATE TABLE output (x nvarchar(100), y nvarchar(100));
INSERT INTO output (x, y)
-- R script.
EXECUTE sp_execute_external_script
@language = N'R'
, @script = N'
Now the R script. The data I get from the query is numeric, but apriori needs factors, so first I convert the data to factors:
df<-data.frame(x=as.factor("art1"),y=as.factor("art2"))
Then, I can apply the apriori:
library("arules");
library("arulesViz");
rules = apriori(df,parameter=list(minlen=2,support=0.05, confidence=0.05));
I need the results without the rule formatting, just the objects:
ruledf <- data.frame(
lhs <- labels(lhs(rules)),
rhs <- labels(rhs(rules)),
rules@quality)
a<-substr(ruledf$lhs,7,nchar(as.character( ruledf$lhs))-1)
b<-substr(ruledf$rhs,7,nchar(as.character( ruledf$rhs))-1)
ruledf2<-data.frame(a,b)
'
And the last part:
, @input_data_1 = N'SELECT * from Data'
, @output_data_1_name = N'ruledf2'
, @input_data_1_name = N'ruledf2';
GO
I do not know where it is failing, because when I do the same thing in R, using RODBC to fetch the data from the database, everything works.
Could you help me? Thanks in advance!

The problem was in the R script, which works better this way:
EXECUTE sp_execute_external_script
@language = N'R'
, @script = N'
library("arules");
rules = apriori(df[, c("art1", "art2")], parameter=list(minlen=2,support=0.0005, confidence=0.0005));
ruledf <- data.frame(
lhs <- labels(lhs(rules)),
rhs <- labels(rhs(rules)),
rules@quality)
ruledf2<-data.frame(
lhs2<-substr(ruledf$lhs,7,nchar(as.character( ruledf$lhs))-1),
rhs2<-substr(ruledf$rhs,7,nchar(as.character( ruledf$rhs))-1)
)
colnames(ruledf2)<-c("a","b") '
Then it needs the right input and output names:
, @input_data_1 = N'SELECT * from Data'
, @input_data_1_name = N'df'
, @output_data_1_name = N'ruledf2'
So the result is going to be a table named output, like this:
x y
artA artB
artB artA
...
artY artZ
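The two substr calls in the script trim each rule label down to the bare item. A minimal sketch of the same trimming in Python (outside SQL Server), assuming the labels look like "{art1=artA}", so that the first six characters and the closing brace are wrapping. The function names are hypothetical, and the second variant is an alternative that does not depend on the column name being four characters long:

```python
def strip_rule_label(label: str, prefix_len: int = 6) -> str:
    """Mirror substr(x, 7, nchar(x) - 1): drop a fixed-length prefix and the final '}'."""
    return label[prefix_len:-1]


def strip_rule_label_by_separator(label: str) -> str:
    """Alternative: keep whatever follows the first '=' and drop the final '}'."""
    return label.split("=", 1)[1].rstrip("}")


item_fixed = strip_rule_label("{art1=artA}")
item_split = strip_rule_label_by_separator("{art2=artB}")
print(item_fixed, item_split)
```

The fixed-offset version breaks silently if a column is renamed to something longer or shorter, which is why splitting on the separator may be the safer choice.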
This was very helpful.

Related

How to Declare Input Variables in Stored Procedure with R and TSQL?

I have integrated an R model into a stored procedure using the R Tools for Visual Studio guidance. The syntax is as follows:
ALTER PROCEDURE [dbo].[spRegressionPeak]
AS
BEGIN
EXEC sp_execute_external_script @language = N'R'
, @script = N'
# @InputDataSet: input data frame, result of SQL query execution
# @OutputDataSet: data frame to pass back to SQL
# Test code
#library(RODBC)
# channel <- odbcDriverConnect(dbConnection)
# InputDataSet <- sqlQuery(channel,iconv(paste(readLines(''~/visual studio 2017/prod360/regressionpeak.query.sql'', encoding = ''UTF-8'', warn = FALSE), collapse=''\n''), from = ''UTF-8'', to = ''ASCII'', sub = '''') )
# odbcClose(channel)
#'' Regression Peaks
#''
#'' Runs polynomial regressions on a data table with one model for each
#'' user ID - independent variable pair. Note that independent variables
#'' are identified as all columns matching the following pattern: the
#'' letter "c" followed by a one-or-more digit number. The dependent
#'' variable is identified by its name "dv". The user ID is identified by
#'' its name "id". Also note that the regressors are the means of the
#'' original observations, grouped by \code{code}.
#''
#'' @param x Table to run regressions on
#'' @param c_means Code means table
#'' @importFrom rlang .data
#''
#'' @return Summary table where each distinct \code{code} value is
#'' represented by one row with columns for the respective standard
#'' deviations of each independent variable.
#'' @export
regression_peak <- function(x, c_means) {
df <-
dplyr::select(x, .data$id, .data$code, .data$dv) %>%
dplyr::left_join(c_means, by = "code")
id <- unique(df$id)
iv <- names(df)[stringr::str_detect(names(df), "c\\d+")]
grid <- tidyr::crossing(id, iv)
peaks <- purrr::map2_df(grid$id, grid$iv, function(i_id, i_iv) {
x <-
dplyr::filter(df, .data$id == i_id) %>%
dplyr::select_at(dplyr::vars(.data$dv, i_iv)) %>%
dplyr::rename(iv = !!i_iv)
fit <- stats::lm(dv ~ iv + I(iv ^ 2), data = x)
coef_a <- stats::coef(fit)["iv"]
coef_b <- stats::coef(fit)["I(iv^2)"]
extr <-
tibble::tibble(
type = c(2, 1, 1),
iv = c(unname(-coef_a / (2 * coef_b)), min(x$iv), max(x$iv))
) %>%
dplyr::mutate(y = stats::predict(fit, newdata = ., type = "response"))
t_max <- extr$iv[extr$type == 2]
tibble::tibble(
id = i_id,
iv = i_iv,
max = dplyr::case_when(
min(x$iv) < t_max & t_max < max(x$iv) ~ extr$iv[which.max(extr$y)],
TRUE ~ extr$iv[which.max(extr$y[extr$type == 1])]
) # should be x value
)
})
tidyr::spread(peaks, .data$iv, .data$max) %>%
dplyr::select(.data$id, iv) %>%
dplyr::filter_at(dplyr::vars(dplyr::matches("c\\d+")),
dplyr::any_vars(!is.na(.)))
}
OutputDataSet <- InputDataSet
'
, @input_data_1 = N'-- Place SQL query retrieving data for the R stored procedure here
DECLARE @cols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX),
@StudyID int,
@sStudyID VARCHAR(50)
Select a.StudyId, a.RespID, p.ProductNumber, p.ProductSequence,
CONVERT(varchar(50),a.DateAdded,101) as StudyDate,
CONVERT(VARCHAR(15),CAST((a.DateAdded)AS TIME),100) as
StudyTime,DATENAME(dw,a.DateAdded) as [DayOfWeek],
p.A_Value as A,p.B_Value as B,p.C_Value as C,p.D_Value AS D,p.E_Value AS
E,p.F_Value AS F, q.QuestionNumber
from answers a
inner join Products p on a.ProductID = p.ProductID
inner join Questions q on a.QuestionID = q.QuestionID
where a.StudyID = @sStudyID'
--- Edit this line to handle the output data frame.
WITH RESULT SETS (([StudyID] int, [RespID] int, [ProductNumber] int,
[ProductSequence] int, [StudyDate] date, [StudyTime] time, [DayOfWeek]
VARCHAR(10),[QuestionNumber] int, [A] int, [B] int, [C] int, [D] int, [E]
int, [F] INT));
END;
When I execute the stored procedure in SQL Server Management Studio, the Return Value = 0 and no data is output. I'm not sure that the variables are being appropriately declared as I'm not prompted for them when I execute the stored procedure.
How do I modify the stored procedure to return the intended data? Can I call this from ASP.NET by providing the appropriate study ID?
Your code is very hard to read, but to me, it looks like you are not declaring the parameters you want to get out. Below is an example of how you can do it:
DECLARE @out_val float;
exec sp_execute_external_script
@language = N'R',
@script = N'
iris_dataset <- iris
setosa <- iris[iris$Species == "setosa",]
menSepWidth <- mean(setosa$Sepal.Width)
iris_dataset$Sepal.Length <- iris_dataset$Sepal.Length * multiplier
OutputDataSet <- data.frame(iris_dataset$Sepal.Length)
',
@params = N'@multiplier float, @menSepWidth float OUTPUT',
@multiplier = 5,
@menSepWidth = @out_val OUTPUT
WITH RESULT SETS ((SepalLength float));
SELECT @out_val AS MeanSepWidth
Have a look at this blog post where I talk about how to handle parameters etc. when you use sp_execute_external_script.
Hope this helps!

SQL Server float comparison in stored procedure

Unfortunately, I have to compare float values between two tables. I've read up on the usual approaches (casts, converts, comparing against a small difference) and tried them all.
The strange part is, this only fails when I'm executing a stored procedure. If I cut-and-paste the body of the stored procedure into a SSMS window, it works just great.
Sample SQL:
set @newEnvRiskLevel = -1
select
@newEnvRiskLevel = rl.RiskLevelId
from
LookupTypes lt
inner join
RiskLevels rl on lt.LookupTypeId = rl.RiskLevelTypeFk
where
lt.Code = 'RISK_LEVEL_ENVIRONMENTAL'
and convert(numeric(1, 0), rl.RiskFactor) = @newEnvScore
set @errorCode = @@ERROR
if (@newEnvRiskLevel = -1 or @errorCode != 0)
begin
print 'newEnvScore = ' + cast(@newEnvScore as varchar) + ' and risk level = ' + cast(isnull(@newEnvRiskLevel, -1) as varchar)
print 'ERROR finding environmental risk level for code ' + @itemCode + ', skipping record'
set @recordsErrored = @recordsErrored + 1
goto NEXTREC
end
My @newEnvScore variable is also a float converted to numeric(1, 0). I've verified that there are only 0, 1, 2, and 3 for values in the RiskFactor column, and (via debug) that @newEnvScore has a value of 2. I've also verified that my query has a row with code = 'RISK_LEVEL_ENVIRONMENTAL' and RiskFactor = 2.
I've verified via debug that failure is due to @newEnvRiskLevel staying at -1 and that @errorCode is 0.
I've also tried cast to both decimal and int, convert to int, and "rl.RiskFactor - @newEnvScore < 1" in my where clause, none of which set newEnvRiskLevel.
As I say, it's only when running this as a stored procedure that failure happens, which is the part I really don't understand. I'd expect SQL Server to be deterministic, whether the SQL is running the body of a stored procedure, or running the exact same SQL in a SSMS tab.
It is unfortunate that you posted neither your stored procedure nor a complete script. It is difficult to diagnose a problem without a useful demonstration. But I see the use of "goto", which is concerning in many ways. I also see the use of a select statement to assign a local variable, which is often a problem because the developer might be assuming an assignment always occurs. To demonstrate, with a bonus at the end:
set nocount on;
declare @risk smallint;
declare @risklevels table (risklevel float primary key, code varchar(10));
insert @risklevels(risklevel, code) values (1, 'test'), (2, 'test'), (-5, 'test');
-- here is your assignment logic. Notice that @risk is
-- never changed because there are no matching rows.
set @risk = 0;
select @risk = risklevel from @risklevels where code = 'zork';
select @risk;
-- here is a better IMO way to make the assignment. Note that
-- @risk is set to NULL when there are no matching rows.
set @risk = -1;
set @risk = (select risklevel from @risklevels where code = 'zork');
select @risk;
-- and a last misconception. What value is @risk set to? and why?
set @risk = -1;
select @risk = risklevel from @risklevels where code = 'test';
select @risk;
Whether this is the source of your problem (or contributes to it) I can't say. But it is a possibility. And storing integers in a floating point datatype is just a problem generally. Even if you cannot change your table, you can change your local variables and force the use of a more appropriate datatype. So perhaps that is another change you should consider.
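On the float side of the question, a minimal sketch in Python (not T-SQL) of why exact equality on floating-point values is fragile, and why a "difference < 1" test like the one tried in the question needs an absolute value to behave as a tolerance check. The helper name and the tolerance value are assumptions chosen for illustration:

```python
def close_enough(a: float, b: float, tol: float = 1e-9) -> bool:
    """Two-sided tolerance test: the absolute difference must be small."""
    return abs(a - b) <= tol


computed = 0.1 + 0.2  # not exactly 0.3 in binary floating point

exact_match = computed == 0.3                # False
tolerant_match = close_enough(computed, 0.3)  # True

# A one-sided difference accepts any smaller value, close or not.
one_sided = (0.0 - 2.0) < 1                  # True, although 0 and 2 differ by 2
two_sided = close_enough(0.0, 2.0)           # False

print(exact_match, tolerant_match, one_sided, two_sided)
```

In T-SQL the two-sided form would use ABS() around the difference. Storing the values in an integer or exact numeric type, as suggested above, avoids the issue altogether.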

R scripts in SQL Server 2016 corrupted with Â character

I have found a strange behaviour of SQL Server 2016 when handling broken pipes inside R scripts. See the T-SQL code below:
DECLARE
@r nvarchar(100)
/* Create a data frame with a broken pipe as one of its fields and a simple ASCII encoded string in another. */
SET @r = N'
df <- data.frame(
a = "¦",
b = "a,b,c"
)';
/* Print @r to detect the inclusion of any unwanted characters. */
PRINT @r;
/* Execute and retrieve the output. */
EXECUTE sp_execute_external_script
@language = N'R',
@script = @r,
@output_data_1_name = N'df'
WITH RESULT SETS ((
BadEncodingColumn varchar(2),
GoodEncodingColumn varchar(5)
));
The PRINT command returns this in the Messages tab:
df <- data.frame(
a = "¦",
b = "a,b,c"
)
However, the final Results tab looks like this:
BadEncodingColumn GoodEncodingColumn
Â¦ a,b,c
This behaviour seems to emerge at the EXECUTE sp_execute_external_script phase of the script, and I have seen this character (Â) when dealing with other encoding issues with Excel, R and other versions of SQL Server.
Any solutions to this behaviour? And bonus points, what is 'special' about the Â character?
Edit: I have tried tinkering with data types inside SQL Server and R to no avail.
The issue appears to be with the encoding of non-ASCII characters in the R script (the broken pipe is outside the 128 ASCII characters). You can work around it by explicitly overriding the encoding to Unicode (UTF-8) with the Encoding function. For instance, your script can be updated as follows:
DECLARE
@r nvarchar(100)
/* Create a data frame with a broken pipe as one of its fields and a simple ASCII encoded string in another. */
SET @r = N'
df <- data.frame(
a = "¦",
b = "a,b,c"
)
Encoding(levels(df$a)) <- "UTF-8" ###### Encoding override'
/* Print @r to detect the inclusion of any unwanted characters. */
PRINT @r;
/* Execute and retrieve the output. */
EXECUTE sp_execute_external_script
@language = N'R',
@script = @r,
@output_data_1_name = N'df'
WITH RESULT SETS ((
BadEncodingColumn varchar(2),
GoodEncodingColumn varchar(5)
));
This produces the following results:
BadEncodingColumn GoodEncodingColumn
¦ a,b,c
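As for what is special about Â: a small Python sketch of the usual mechanism behind this kind of corruption. It assumes, without confirmation for SQL Server's R integration, that the broken pipe is written out as UTF-8 bytes and then read back as a single-byte Latin-1 style code page:

```python
broken_pipe = "\u00a6"  # the broken pipe character from the question

# In UTF-8 this single character becomes two bytes: 0xC2 0xA6.
utf8_bytes = broken_pipe.encode("utf-8")

# Read those two bytes as Latin-1 and each byte turns into its own character:
# 0xC2 is A-circumflex and 0xA6 is the broken pipe itself.
misread = utf8_bytes.decode("latin-1")

print(utf8_bytes)  # b'\xc2\xa6'
print(misread)     # Â¦
```

Every character in the range U+0080 to U+00BF has 0xC2 as its UTF-8 lead byte, which is why Â keeps turning up in front of symbols such as the broken pipe, the pound sign or the non-breaking space whenever UTF-8 text is decoded as a single-byte code page.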

T-SQL Temp Tables and Stored Procedures from R using RevoScaleR

In the example below, I was able to get the query to work with one exception. When I use q in place of source.query during the RxSqlServerData step, I get the error rxCompleteClusterJob Execution halted.
The first goal is to use a stored procedure in place of a longer query. Is this possible?
The second goal would be to create and call upon a #TEMPORARY table within the stored procedure. I'm wondering if that is possible, as well?
library (RODBC)
library (RevoScaleR)
sqlConnString <- "Driver=SQL Server;Server=SAMPLE_SERVER; Database=SAMPLE_DATABASE;Trusted_Connection=True"
sqlWait <- TRUE
sqlConsoleOutput <- FALSE
sql_share_directory <- paste("D:\\RWork\\AllShare\\", Sys.getenv("USERNAME"), sep = "")
sqlCompute <- RxInSqlServer(connectionString = sqlConnString, wait = sqlWait, consoleOutput = sqlConsoleOutput)
rxSetComputeContext(sqlCompute)
#This Sample Query Works
source.query <- paste("SELECT CASE WHEN [Order Date Key] = [Picked Date Key]",
"THEN 1 ELSE 0 END AS SameDayFulfillment,",
"[City Key] AS city, [STOCK ITEM KEY] AS item,",
"[PICKER KEY] AS picker, [QUANTITY] AS quantity",
"FROM [WideWorldImportersDW].[FACT].[ORDER]",
"WHERE [WWI ORDER ID] >= 63968")
#This Query Does Not
q <- paste("EXEC [dbo].[SAMPLE_STORED_PROCEDURE]")
inDataSource <- RxSqlServerData(sqlQuery=q, connectionString=sqlConnString, rowsPerRead=500)
order.logit.rx <- rxLogit(SameDayFulfillment ~ city + item + picker + quantity, data = inDataSource)
order.logit.rx
Currently, only T-SQL SELECT statements are allowed as input data-set, not stored procedures.

Simple Firebird Query

I am trying to write a while loop in Firebird that returns all the values from a stored procedure, using the FlameRobin tool. However, it is not working. Any suggestions?
declare i int = 0;
while ( i <= 2 ) do BEGIN
SELECT p.SOD_AUTO_KEY, p.CURRENCY_CODE, p.SO_CATEGORY_CODE, p.SO_NUMBER, p.INVC_NUMBER, p.ENTRY_DATE, p.SHIP_DATE, p.NEXT_SHIP_DATE, p.CONDITION_CODE, p.QTY_ORDERED, p.QTY_PENDING_INVOICE, p.QTY_INVOICED, p.UNIT_PRICE, p.EXCHANGE_RATE, p.UNIT_COST, p.ITEM_NUMBER, p.CONSIGNMENT_CODE, p.NOTES, p.STOCK_LINE, p.STM_AUTO_KEY, p.SERIAL_NUMBER, p.REMARKS, p.PN, p.PNM_AUTO_KEY, p.GR_CODE, p.CUSTOMER_PRICE, p.OPEN_FLAG, p.ROUTE_CODE, p.ROUTE_DESC, p.COMPANY_CODE, p.SITE_CODE, p.COMPANY_NAME, p.COMPANY_REF_NUMBER, p.CUST_REF, p.HOT_PART
FROM SPB_SALESHISTORY(i) p
i = i + 1;
end
Error Message I get:
Preparing query: declare i int = 0
Error: *** IBPP::SQLException ***
Context: Statement::Prepare( declare i int = 0 )
Message: isc_dsql_prepare failed
SQL Message : -104
can't format message 13:896 -- message file C:\Windows\firebird.msg not found
Engine Code : 335544569
Engine Message :
Dynamic SQL Error
SQL error code = -104
Token unknown - line 1, column 9
i
Total execution time: 0.004s
This is what I tried but it only says "Script Execution Finished" and does not return any results:
set term !!
EXECUTE BLOCK returns(p) AS
declare i integer = 0
BEGIN
while ( i <= 1000 ) do BEGIN
SELECT p.SOD_AUTO_KEY, p.CURRENCY_CODE, p.SO_CATEGORY_CODE, p.SO_NUMBER, p.INVC_NUMBER, p.ENTRY_DATE, p.SHIP_DATE, p.NEXT_SHIP_DATE, p.CONDITION_CODE, p.QTY_ORDERED,p.QTY_PENDING_INVOICE, p.QTY_INVOICED, p.UNIT_PRICE, p.EXCHANGE_RATE, p.UNIT_COST, p.ITEM_NUMBER, p.CONSIGNMENT_CODE, p.NOTES, p.STOCK_LINE, p.STM_AUTO_KEY, p.SERIAL_NUMBER, p.REMARKS, p.PN, p.PNM_AUTO_KEY, p.GR_CODE, p.CUSTOMER_PRICE, p.OPEN_FLAG, p.ROUTE_CODE, p.ROUTE_DESC, p.COMPANY_CODE, p.SITE_CODE, p.COMPANY_NAME, p.COMPANY_REF_NUMBER, p.CUST_REF, p.HOT_PART
FROM SPB_SALESHISTORY(i) p
i = i + 1
end
END !!
Mark,
I tried your suggestion however I got the following error:
set term!!;
EXECUTE BLOCK RETURNS (
SOD_AUTO_KEY Integer,
CURRENCY_CODE Char(3),
SO_CATEGORY_CODE Char(10),
SO_NUMBER Char(12),
INVC_NUMBER Char(12),
ENTRY_DATE Timestamp,
SHIP_DATE Timestamp,
NEXT_SHIP_DATE Timestamp,
CONDITION_CODE Varchar(10),
QTY_ORDERED Double precision,
QTY_PENDING_INVOICE Double precision,
QTY_INVOICED Double precision,
UNIT_PRICE Double precision,
EXCHANGE_RATE Double precision,
UNIT_COST Double precision,
ITEM_NUMBER Integer,
CONSIGNMENT_CODE Char(10),
NOTES Blob sub_type 1,
STOCK_LINE Integer,
STM_AUTO_KEY Integer,
SERIAL_NUMBER Varchar(40),
REMARKS Varchar(50),
PN Varchar(40),
PNM_AUTO_KEY Integer,
GR_CODE Varchar(10),
CUSTOMER_PRICE Double precision,
OPEN_FLAG Char(1),
ROUTE_CODE Char(1),
ROUTE_DESC Varchar(20),
COMPANY_CODE Varchar(10),
SITE_CODE Varchar(10),
COMPANY_NAME Varchar(50),
COMPANY_REF_NUMBER Varchar(30),
CUST_REF Varchar(15),
HOT_PART Char(1)
)
AS
declare i integer;
BEGIN
i=0;
while ( i <= 2 ) do
BEGIN
for SELECT SOD_AUTO_KEY,CURRENCY_CODE,SO_CATEGORY_CODE, SO_NUMBER,INVC_NUMBER,ENTRY_DATE, SHIP_DATE, NEXT_SHIP_DATE, CONDITION_CODE, QTY_ORDERED,QTY_PENDING_INVOICE, QTY_INVOICED, UNIT_PRICE, EXCHANGE_RATE, UNIT_COST,ITEM_NUMBER, CONSIGNMENT_CODE, NOTES, STOCK_LINE, STM_AUTO_KEY, SERIAL_NUMBER,REMARKS, PN, PNM_AUTO_KEY, GR_CODE, CUSTOMER_PRICE, OPEN_FLAG, ROUTE_CODE,ROUTE_DESC, COMPANY_CODE, SITE_CODE, COMPANY_NAME, COMPANY_REF_NUMBER, CUST_REF, HOT_PART
FROM SPB_SALESHISTORY (i)
into :SOD_AUTO_KEY, :CURRENCY_CODE, :SO_CATEGORY_CODE, :SO_NUMBER, :INVC_NUMBER,
:ENTRY_DATE, :SHIP_DATE, :NEXT_SHIP_DATE, :CONDITION_CODE, :QTY_ORDERED,:QTY_PENDING_INVOICE,
:QTY_INVOICED, :UNIT_PRICE, :EXCHANGE_RATE, :UNIT_COST, :ITEM_NUMBER, :CONSIGNMENT_CODE, :NOTES, :STOCK_LINE,
:STM_AUTO_KEY, :SERIAL_NUMBER, :REMARKS, :PN, :PNM_AUTO_KEY, :GR_CODE, :CUSTOMER_PRICE, :OPEN_FLAG, :ROUTE_CODE,:ROUTE_DESC,
:COMPANY_CODE, :SITE_CODE, :COMPANY_NAME, :COMPANY_REF_NUMBER, :CUST_REF,:HOT_PART
DO
suspend;
i = i + 1;
end
END!!
SET TERM;!!
Error:
Message: isc_dsql_prepare failed
SQL Message : -206
can't format message 13:794 -- message file C:\Windows\firebird.msg not found
Engine Code : 335544569
Engine Message :
Dynamic SQL Error
SQL error code = -206
Column unknown
I
At line 46, column 27
Total execution time: 0.005s
Based on your comments on the answer of Ain, it looks like you also want to return the selected values from the EXECUTE BLOCK. Your RETURNS (p) is invalid and will not work. You need to explicitly declare all columns you want to return, and you need to SUSPEND each row.
In addition you are also forgetting several statement terminators (;), and you can't declare the variable and its value together. The resulting execute block would be something like:
set term !!;
EXECUTE BLOCK returns (
SOD_AUTO_KEY INTEGER,
/* ... */
HOT_PART VARCHAR(255)
) AS
declare i integer;
BEGIN
i = 0;
while ( i <= 1000 ) do
BEGIN
FOR SELECT SOD_AUTO_KEY, /* ... */ HOT_PART
FROM SPB_SALESHISTORY(i)
INTO :SOD_AUTO_KEY, /* ... */ :HOT_PART
DO
SUSPEND;
i = i + 1;
end
END!!
SET TERM ;!!
I have left out some of the columns for brevity and guessed at their types.
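The control flow of that block can be pictured with a Python generator. This is only an analogy, not Firebird code: sales_history is a hypothetical stand-in for SPB_SALESHISTORY with made-up rows, and yield plays the role that SUSPEND plays in the block, handing one row back to the caller each time it runs:

```python
from typing import Iterator, List, Tuple

Row = Tuple[int, str]


def sales_history(i: int) -> List[Row]:
    """Hypothetical stand-in for the selectable procedure; returns two fake rows per call."""
    return [(i * 10 + n, f"row {n} for i={i}") for n in range(2)]


def execute_block(max_i: int) -> Iterator[Row]:
    i = 0
    while i <= max_i:                 # WHILE (i <= max_i) DO
        for row in sales_history(i):  # FOR SELECT ... INTO ...
            yield row                 # DO SUSPEND;
        i = i + 1


rows = list(execute_block(2))
print(len(rows))
```

Without the yield, the loop would still run but the caller would receive nothing, which matches the "Script Execution Finished" outcome with no result rows described in the question.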
No, you can't execute such scripts directly in FlameRobin's query window. I think the simplest way would be to wrap your script in a stored procedure, which you would drop after you're done with the result. To create the temporary SP, right-click on the Procedures node in FlameRobin's database tree and select Create new. This creates the SP skeleton for you, where you can insert your code.
You will need to wrap your stored-procedure-like code in an EXECUTE BLOCK statement.
Your sql script may be corrupted
