R scripts in SQL Server 2016 corrupted with Â character - sql-server

I have found a strange behaviour of SQL Server 2016 when handling broken pipes inside R scripts. See the T-SQL code below:
DECLARE
    @r nvarchar(100)
/* Create a data frame with a broken pipe as one of its fields and a simple ASCII encoded string in another. */
SET @r = N'
df <- data.frame(
a = "¦",
b = "a,b,c"
)';
/* Print @r to detect the inclusion of any unwanted characters. */
PRINT @r;
/* Execute and retrieve the output. */
EXECUTE sp_execute_external_script
    @language = N'R',
    @script = @r,
    @output_data_1_name = N'df'
WITH RESULT SETS ((
    BadEncodingColumn varchar(2),
    GoodEncodingColumn varchar(5)
));
The PRINT command returns this in the Messages tab:
df <- data.frame(
a = "¦",
b = "a,b,c"
)
However, the final Results tab looks like this:
BadEncodingColumn GoodEncodingColumn
Â¦ a,b,c
This behaviour seems to emerge at the EXECUTE sp_execute_external_script phase of the script, and I have seen this character (Â) when dealing with other encoding issues with Excel, R and other versions of SQL Server.
Are there any solutions to this behaviour? And for bonus points, what is 'special' about the Â character?
Edit: I have tried tinkering with data types inside SQL Server and R to no avail.

The issue appears to be the encoding of non-ASCII characters in the R script (the broken pipe lies outside the 128 ASCII characters). As a workaround, you can explicitly override the encoding to Unicode (UTF-8) with the Encoding function. For instance, your script can be updated as follows:
DECLARE
    @r nvarchar(100)
/* Create a data frame with a broken pipe as one of its fields and a simple ASCII encoded string in another. */
SET @r = N'
df <- data.frame(
a = "¦",
b = "a,b,c"
)
Encoding(levels(df$a)) <- "UTF-8" ###### Encoding override'
/* Print @r to detect the inclusion of any unwanted characters. */
PRINT @r;
/* Execute and retrieve the output. */
EXECUTE sp_execute_external_script
    @language = N'R',
    @script = @r,
    @output_data_1_name = N'df'
WITH RESULT SETS ((
    BadEncodingColumn varchar(2),
    GoodEncodingColumn varchar(5)
));
This produces the following results:
BadEncodingColumn GoodEncodingColumn
¦ a,b,c
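On the bonus question, one likely explanation (an assumption that fits the symptoms, not something the answer above confirms) is that the broken bar travels as UTF-8 and is then read back as a single-byte code page. The Python sketch below shows that U+00A6 becomes the two bytes C2 A6 in UTF-8, and that reading those bytes as Latin-1 yields Â followed by ¦.

```python
# Sketch: reproduce the "Â¦" effect outside SQL Server.
# Assumption: the bytes are written as UTF-8 and read as Latin-1.
broken_bar = "\u00a6"  # the broken bar character

# UTF-8 needs two bytes for any code point from U+0080 to U+07FF.
utf8_bytes = broken_bar.encode("utf-8")
print(" ".join(f"{b:02x}" for b in utf8_bytes))  # c2 a6

# A single-byte code page maps each byte to its own character,
# so the lead byte 0xC2 shows up as a separate "Â".
misread = utf8_bytes.decode("latin-1")
print(misread)  # Â¦

# Reversing the wrong decode recovers the original character.
repaired = misread.encode("latin-1").decode("utf-8")
print(repaired == broken_bar)  # True
```

Every character from U+0080 to U+00BF has 0xC2 as its UTF-8 lead byte, which would explain why Â turns up so often in this kind of encoding mix-up.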

Related

Replace Unicode apostrophe in SQL Server

I am trying to replace Modifier Letter Apostrophes - nchar(700) with an empty string for a legacy system that cannot handle Unicode, but my replace is not working.
I am doing this in SQL Server Express 2019.
I have been using a function to do the replaces, but to make sure it wasn't just the function (which is working for other replaces of unicode characters), I used the following code fragment:
print replace('canʼt',nchar(700),'X')
and the result I am getting is: can't (which, interestingly enough, contains the ASCII ' and not the Unicode Modifier Letter Apostrophe).
I have confirmed that the string contains the modifier apostrophe by looking at it in the hex editor feature of Notepad++ (the value is CA BC).
Any ideas why this isn't working in SQL?
Thanks,
David
WORKAROUND:
I got the replace to work with the following code, but this is not very "pretty":
DECLARE @ReplaceString nvarchar(100)
DECLARE @position INT, @nstring NCHAR(12);
SET @ReplaceString = N'canʼt'
SET @position = 1;
WHILE @position <= LEN(@ReplaceString)
BEGIN
    IF UNICODE(SUBSTRING(@ReplaceString,@position,1)) = 700
        IF @position < len(@ReplaceString)
            SET @ReplaceString = SUBSTRING(@ReplaceString,1,@position - 1) +
                SUBSTRING(@ReplaceString,@position + 1,len(@ReplaceString))
        ELSE
            SET @ReplaceString = SUBSTRING(@ReplaceString,1,@position - 1)
    SET @position = @position + 1;
END
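The hex editor reading fits the character being U+02BC: CA BC is the UTF-8 encoding of code point 700. The Python sketch below checks that and mirrors the workaround's logic of dropping every character with that code point; the helper name is hypothetical. A possible cause of the original failure, offered as an assumption and not verified on the poster's server, is that the literal 'canʼt' in the print statement has no N prefix, so it may be converted to the database code page (turning ʼ into ') before REPLACE ever sees it.

```python
# Sketch: confirm the byte value seen in the hex editor and mirror the
# character-by-character removal done by the T-SQL workaround.
MODIFIER_APOSTROPHE = chr(700)  # U+02BC, the same code point as nchar(700)

# UTF-8 bytes of U+02BC, as reported by the hex editor.
print(MODIFIER_APOSTROPHE.encode("utf-8").hex())  # cabc


def strip_code_point(text, code_point=700):
    """Return text without any character whose code point matches."""
    return "".join(ch for ch in text if ord(ch) != code_point)


cleaned = strip_code_point("can\u02bct")
print(cleaned)  # cant
```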

SQL Server float comparison in stored procedure

Unfortunately, I have two tables to compare float datatypes between. I've read up on trying casts, converts, using a small difference and tried them all.
The strange part is, this only fails when I'm executing a stored procedure. If I cut-and-paste the body of the stored procedure into a SSMS window, it works just great.
Sample SQL:
set @newEnvRiskLevel = -1
select
    @newEnvRiskLevel = rl.RiskLevelId
from
    LookupTypes lt
inner join
    RiskLevels rl on lt.LookupTypeId = rl.RiskLevelTypeFk
where
    lt.Code = 'RISK_LEVEL_ENVIRONMENTAL'
    and convert(numeric(1, 0), rl.RiskFactor) = @newEnvScore
set @errorCode = @@ERROR
if (@newEnvRiskLevel = -1 or @errorCode != 0)
begin
    print 'newEnvScore = ' + cast(@newEnvScore as varchar) + ' and risk level = ' + cast(isnull(@newEnvRiskLevel, -1) as varchar)
    print 'ERROR finding environmental risk level for code ' + @itemCode + ', skipping record'
    set @recordsErrored = @recordsErrored + 1
    goto NEXTREC
end
My @newEnvScore variable is also a float converted to numeric(1, 0). I've verified that there are only 0, 1, 2, and 3 for values in the RiskFactor column, and (via debug) that @newEnvScore has a value of 2. I've also verified that my query has a row with code = 'RISK_LEVEL_ENVIRONMENTAL' and RiskFactor = 2.
I've verified via debug that failure is due to @newEnvRiskLevel staying at -1 and that @errorCode is 0.
I've also tried cast to both decimal and int, convert to int, and "rl.RiskFactor - @newEnvScore < 1" in my where clause, none of which set @newEnvRiskLevel.
As I say, it's only when running this as a stored procedure that failure happens, which is the part I really don't understand. I'd expect SQL Server to be deterministic, whether the SQL is running the body of a stored procedure, or running the exact same SQL in a SSMS tab.
It is unfortunate that you posted neither your stored procedure nor a complete script. It is difficult to diagnose a problem without a useful demonstration. But I see the use of "goto", which is concerning in many ways. I also see the use of a select statement to assign a local variable, which is often a problem because the developer might be assuming an assignment always occurs. To demonstrate, with a bonus at the end:
set nocount on;
declare @risk smallint;
declare @risklevels table (risklevel float primary key, code varchar(10));
insert @risklevels(risklevel, code) values (1, 'test'), (2, 'test'), (-5, 'test');
-- here is your assignment logic. Notice that @risk is
-- never changed because there are no matching rows.
set @risk = 0;
select @risk = risklevel from @risklevels where code = 'zork';
select @risk;
-- here is a better IMO way to make the assignment. Note that
-- @risk is set to NULL when there are no matching rows.
set @risk = -1;
set @risk = (select risklevel from @risklevels where code = 'zork');
select @risk;
-- and a last misconception. What value is @risk set to? and why?
set @risk = -1;
select @risk = risklevel from @risklevels where code = 'test';
select @risk;
Whether this is the source of your problem (or contributes to it) I can't say. But it is a possibility. And storing integers in a floating point datatype is just a problem generally. Even if you cannot change your table, you can change your local variables and force the use of a more appropriate datatype. So perhaps that is another change you should consider.
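On the "small difference" approach mentioned in the question: the test as quoted, rl.RiskFactor - @newEnvScore < 1, has no absolute value, so it also accepts every RiskFactor below the score. The Python sketch below contrasts that one-sided test with a symmetric tolerance check. The function name and the tolerance values are illustrative assumptions, not part of the original procedure, and this is not claimed to be the cause of the failure.

```python
# Sketch: one-sided difference test versus a symmetric tolerance test.
def matches_score(risk_factor, score, tolerance=0.5):
    """True when the two floats differ by less than the tolerance."""
    return abs(risk_factor - score) < tolerance


risk_factors = [0.0, 1.0, 2.0, 3.0]  # the values said to be in RiskFactor
score = 2.0

# The quoted predicate: passes for every value below score + 1.
one_sided = [rf for rf in risk_factors if rf - score < 1]
print(one_sided)  # [0.0, 1.0, 2.0]

# With an absolute value only the intended row passes.
symmetric = [rf for rf in risk_factors if matches_score(rf, score)]
print(symmetric)  # [2.0]

# Why exact float equality is fragile in general.
print(0.1 + 0.2 == 0.3)                      # False
print(matches_score(0.1 + 0.2, 0.3, 1e-9))   # True
```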

SQL Server 2014 takes off leading zeroes when making Excel file. . . but.

This sp_send_dbmail script works in one of our processes. It attaches an Excel file filled with whatever the query is. It knows to do this because of the extension on the file's name (.xls).
However, it changes a varchar(50) field into a number field, and removes the leading zeroes. This is a known annoyance dealt with in a million ways that won't work for my process.
EXEC msdb.dbo.sp_send_dbmail
    @profile_name = @profileName
    ,@recipients = @emailRecipientList
    ,@subject = @subject
    ,@importance = @importance
    ,@body = @emailMsg
    ,@body_format = 'html'
    ,@query = @QuerySQL
    ,@execute_query_database = @QueryDB
    ,@attach_query_result_as_file = 1
    ,@query_attachment_filename = @QueryExcelFileName
    ,@query_result_header = 1
    ,@query_result_width = @QueryWidth
    ,@query_result_separator = @QuerySep
    ,@query_result_no_padding = 1
Examples of problem below: this simple query changes the StringNumber column from varchar to number in Excel, and removes the zeroes.
SELECT [RowID],[Verbage], StringNumber FROM [dbo].[tblTestStringNumber]
In SQL Server (desired format):
After in Excel (leading zeroes missing):
Now, there might be a way. I only say this because in SQL Server 2016 results pane, if you right click in upper left hand corner, it gives the option of "Open in Excel"
And. . . . drum roll . . . the dataset opens in Excel and the leading zeroes are still there!
If you start a number with a single quote (') in Excel, it will interpret it as a string, so a common solution is to change the query to add one in:
SELECT [RowID]
,[Verbage]
, StringNumber = '''' + [StringNumber]
FROM [dbo].[tblTestStringNumber]
And Excel will usually not display the single quote because it knows that it's a way to cast to type string.
@JustJohn I think this will work fine:
SELECT [RowID]
,[Verbage]
, '="' + [StringNumber]+ '"' StringNumber
FROM [dbo].[tblTestStringNumber]
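Both suggestions above are plain string wrappers applied to the value before it reaches Excel. As a rough Python illustration of what the exported text looks like (the helper names and sample rows are hypothetical, and a tab separator is assumed):

```python
# Sketch: the two wrappers suggested above, applied to sample values.
def with_leading_apostrophe(value):
    """Mirror of '''' + [StringNumber]: prefix a single quote."""
    return "'" + value


def as_text_formula(value):
    """Mirror of '="' + [StringNumber] + '"': wrap as a formula string."""
    return '="' + value + '"'


# Hypothetical rows shaped like RowID, Verbage, StringNumber.
rows = [("1", "first", "00123"), ("2", "second", "000045")]

for row_id, verbage, string_number in rows:
    print("\t".join([row_id, verbage, as_text_formula(string_number)]))

print(with_leading_apostrophe("00123"))  # '00123
print(as_text_formula("00123"))          # ="00123"
```

In both cases the leading zeroes are still in the text that Excel receives; the wrapper only changes how Excel decides to interpret the cell.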

Sql server and R, data mining

I'm working in Microsoft SQL Server Management Studio 2016, using the feature that lets me embed an R script in the SQL code.
My goal is to build a procedure that runs the apriori algorithm and returns the data in the shape I want, i.e. a table with x as the first object and y as the second object.
I am stuck here because, in my opinion, there is some problem with the data. The error is this:
A 'R' script error occurred during execution of
'sp_execute_external_script' with HRESULT 0x80004004.
An external script error occurred: Error in eval(expr, envir, enclos)
: bad allocation Calls: source -> withVisible -> eval -> eval -> .Call
Here is my code.
The source data is a table of two columns like this:
A B
a f
f a
b c
...
y z
And here the code:
GO
create procedure dbo.apriorialgorithm as
-- delete old table
IF OBJECT_ID('Data') IS NOT NULL
DROP TABLE Data
-- create a table that store the query result.
CREATE TABLE Data ( art1 nvarchar(100), art2 nvarchar(100));
-- store the query
INSERT INTO Data ( art1, art2)
select
firstfield as art1,
secondfield as art2
from allthefields
;
IF OBJECT_ID('output') IS NOT NULL
DROP TABLE output
-- create table of the results of the analysis.
CREATE TABLE output (x nvarchar(100), y nvarchar(100));
INSERT INTO output (x, y)
-- R script.
EXECUTE sp_execute_external_script
@language = N'R'
, @script = N'
Now the R script. The data that I get from the query are numeric, but for apriori I need factors, so first I convert the data to factors:
df<-data.frame(x=as.factor("art1"),y=as.factor("art2"))
Then, I can apply the apriori:
library("arules");
library("arulesViz");
rules = apriori(df,parameter=list(minlen=2,support=0.05, confidence=0.05));
I need the data without the format of the rules, but simply the objects:
ruledf <- data.frame(
lhs <- labels(lhs(rules)),
rhs <- labels(rhs(rules)),
rules@quality)
a<-substr(ruledf$lhs,7,nchar(as.character( ruledf$lhs))-1)
b<-substr(ruledf$rhs,7,nchar(as.character( ruledf$rhs))-1)
ruledf2<-data.frame(a,b)
'
And the last part:
, @input_data_1 = N'SELECT * from Data'
, @output_data_1_name = N'ruledf2'
, @input_data_1_name = N'ruledf2';
GO
I do not know where I am failing, because doing the same things in R using RODBC to catch the db data, everything is ok.
Could you help me? Thanks in advance!
The problem was here, the R script is better this way:
EXECUTE sp_execute_external_script
@language = N'R'
, @script = N'
library("arules");
rules = apriori(df[, c("art1", "art2")], parameter=list(minlen=2,support=0.0005, confidence=0.0005));
ruledf <- data.frame(
lhs <- labels(lhs(rules)),
rhs <- labels(rhs(rules)),
rules@quality)
ruledf2<-data.frame(
lhs2<-substr(ruledf$lhs,7,nchar(as.character( ruledf$lhs))-1),
rhs2<-substr(ruledf$rhs,7,nchar(as.character( ruledf$rhs))-1)
)
colnames(ruledf2)<-c("a","b") '
Then it needs to have the right input and output:
, @input_data_1 = N'SELECT * from Data'
, @input_data_1_name = N'df'
, @output_data_1_name = N'ruledf2'
So the result is going to be a table named output like this
x y
artA artB
artB artA
...
artY artZ
Very helpful this.

How do i format a sql numeric type with commas on Sybase SQLAnywhere?

I came across the following solution but it does not work on Sybase
SELECT CONVERT(varchar, CAST(987654321 AS money), 1)
I have read the Convert Sybase information but still i receive the same number without the commas.
Have you tried giving varchar an explicit length, for example varchar(20)? Something like:
SELECT CONVERT(varchar(20), CAST(987654321 AS money), 1)
In SQL Anywhere, the money data type is a domain implemented as NUMERIC(19,4).
For the CAST function: if you do not indicate a length for character string types, the database server chooses an appropriate length. If neither precision nor scale is specified for a DECIMAL conversion, the database server selects appropriate values.
So maybe this is what's causing the issue. What do you get as output? Do you get 987654321.00, or just 987654321?
Update:
My last suggestion would be to use the insertstr() function and loop through the character value of your number, inserting a comma every 3 digits. This is not the cleanest or easiest way, but apparently SQL Anywhere treats the money data type as a normal NUMERIC data type.
insertstr() documentation is here.
I would give you a code sample but I don't have SQLAnywhere installed to test it ...
Here is the function I created based on F Karam's suggestion.
CREATE FUNCTION "DBA"."formattednumber"( in @number numeric)
returns char(60)
begin
    declare @returnnumber char(60);
    declare @workingnumber char(60);
    declare @n_ind char(1);
    declare @decimalnumber char(10);
    declare @tempnumber char(60);
    declare @decimalpos integer;
    if isnull(@number,'') = '' then
        return null
    end if;
    -- remember the sign, then work on the absolute value
    if @number < 0 then set @n_ind = 'Y'
    else set @n_ind = 'N'
    end if;
    set @workingnumber = convert(char(60),ABS(@number));
    -- split off the decimal part (the point plus up to 3 digits)
    set @decimalpos = patindex('%.%',@workingnumber);
    if @decimalpos > 0 then
        set @decimalnumber = substr(@workingnumber,@decimalpos);
        set @decimalnumber = "left"(@decimalnumber,3+1);
        set @workingnumber = "left"(@workingnumber,@decimalpos-1)
    end if;
    -- peel off groups of three digits from the right, prefixing each with a comma
    set @returnnumber = '';
    while length(@workingnumber) > 3 loop
        set @tempnumber = "right"(@workingnumber,3);
        set @returnnumber = insertstr(0,@returnnumber,@tempnumber);
        set @workingnumber = "left"(@workingnumber,length(@workingnumber)-3);
        if length(@workingnumber) > 0 then
            set @returnnumber = insertstr(0,@returnnumber,',')
        end if
    end loop;
    if length(@workingnumber) > 0 then
        set @returnnumber = insertstr(0,@returnnumber,@workingnumber)
    end if;
    if length(@decimalnumber) > 0 then
        set @returnnumber = @returnnumber+@decimalnumber
    end if;
    if @n_ind = 'Y' then set @returnnumber = '-' || @returnnumber
    end if;
    return(@returnnumber)
end;
You need to distinguish between server-side and client-side formatting. When you use the 'isql' client for example (the TDS client), then the result will be this:
1> select convert(money, 9876543210)
2> go
9876543210
------------------------
9,876,543,210.00
(1 row affected)
But this is purely because the client application happens to format 'money' values this way. Also, this is actually not specific for SQLA, since isql is originally the client tool for ASE (a different Sybase database).
When you run the same conversion at the SQLA server (i.e. as part of an expression in a SQL statement), those commas will not be there since SQLA doesn't have such a built-in formatting style.
If you want this, you should write a SQL function that formats the number as you desire.
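The grouping logic that such a function needs can be sketched outside the database. The Python below is a minimal illustration, not SQL Anywhere code: it follows the same idea as the formattednumber function above (peel three digits at a time off the right of the whole part, then reattach the sign and decimals). The function name and the default of two decimals are assumptions made for the example.

```python
# Sketch: group the whole part of a number into thousands with commas.
def format_with_commas(number, decimals=2):
    """Return number as text with commas between groups of three digits."""
    sign = "-" if number < 0 else ""
    # Render the absolute value with a fixed number of decimals, then split.
    whole, _, fraction = f"{abs(number):.{decimals}f}".partition(".")

    groups = []
    while len(whole) > 3:
        groups.insert(0, whole[-3:])  # take the rightmost three digits
        whole = whole[:-3]
    groups.insert(0, whole)           # whatever is left (1 to 3 digits)

    result = sign + ",".join(groups)
    return result + "." + fraction if fraction else result


print(format_with_commas(987654321))        # 987,654,321.00
print(format_with_commas(-1234.5))          # -1,234.50
print(format_with_commas(999, decimals=0))  # 999
```

Python itself offers the same result through its format specification (for example format(987654321, ",.2f")); the explicit loop is shown only because it maps onto what a hand-written SQL function has to do.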

Resources