Referencing a SQL temporary table in RStudio - sql-server

I have run a query creating a temporary table in SQL that I am now trying to analyze in RStudio. I don't think I am properly pulling the data into RStudio, and as a result, I cannot perform calculations on it. I am first running:
library(RODBC)  # provides odbcDriverConnect() and sqlQuery()
cn <- odbcDriverConnect(connection="Driver={SQL Server};
server=servername;database=databasename;trusted_connection=yes;")
new_data <- sqlQuery(cn, "SELECT TOP 1000 * FROM #TempDatabaseName")
After I run this code, new_data appears under Values in the R Environment, not under Data. Then, when I run:
new_reg <- lm(yvar ~ xvar, data = new_data)
I get the following error:
Error in eval(predvars, data, env) : invalid 'envir' argument of type 'character'
Can anyone help me out?

Your lm() call expects data to be a data.frame, and new_data is not one (that is why it shows up under Values rather than Data in the Environment pane). Try wrapping the query result in as.data.frame():
cn <- odbcDriverConnect(connection="Driver={SQL Server};
server=servername;database=databasename;trusted_connection=yes;")
new_data <- as.data.frame(sqlQuery(cn, "SELECT TOP 1000 * FROM #TempDatabaseName"))
new_reg <- lm(yvar ~ xvar, data = new_data)
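If new_data still lands under Values, it is worth checking what sqlQuery() actually returned: on failure, RODBC returns the ODBC error text as a character vector instead of a data.frame, which also explains the invalid 'envir' argument of type 'character' error. A quick sanity check (a sketch, using the names from the question):
str(new_data)                      # should report 'data.frame' with your columns
if (!is.data.frame(new_data)) {
  stop(new_data)                   # surface the ODBC error message, if any
}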

Related

R: problem with the dplyr::tbl() function due to restricted permissions

I work with large databases that need to be stored on a server.
To work with them in RStudio, I open a connection to my Microsoft SQL Server with the dbConnect function:
conn <- dbConnect(odbc(),"myconnection",uid="***",pwd="***",schema="dbo",access="readonly")
and in order to use dplyr, I have to create data references with the tbl function:
data <- tbl(conn, "data")
But one of the remote tables contains a column that I can't read because I don't have access to it, though I can read everything else.
The SQL query behind the tbl() function is:
SELECT * FROM data
and this is my problem.
Even when I try to select a specific column it doesn't work (see below), so I can't create my references and I can't work.
select(tbl(conn, "data"), "columnX")
which is equivalent to:
SELECT columnX FROM data
I think it is the tbl() function and its "SELECT *" call that blocks me.
Do you know what I can do? Are there similar functions that could resolve my problem?
If you know which columns you have access to, then one option is to bypass the default SELECT * FROM ... with your own SQL query.
A remote table is defined by two components:
The database connection
The query to the database
When you connect with the default approach, tbl(conn, 'data'), it defaults to the query SELECT * FROM data.
But here is another approach:
custom_query <- 'SELECT columnX FROM data'
remote_table <- tbl(conn, dbplyr::sql(custom_query))
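Assuming you have read permission on columnX, the remote table should then behave like any other lazy tbl; a small usage sketch (column name taken from the question):
library(dplyr)
remote_table %>%
  head(10) %>%
  collect()   # the query is only sent to the server at collect()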

How to output an R lm() object into a SQL database?

I've been tinkering with running R commands on a SQL server by calling a procedure which runs an OLS regression using the R lm() function on a few made-up data points in the SQL table "my_schema.data", and then writes the object out to a SQL table.
My strategy is to first create an empty SQL table named "my_schema.ols_model_db" which will then be populated with the values in the ols_model object after it has been converted to class data.frame.
I'm almost there, but can't quite figure out how to convert the ols_model object into an R data.frame, nor do I know what the column headers will be (which we need to know in advance in order to populate the empty table my_schema.ols_model_db).
What code should be inserted at "???" in the program below?
my_schema.data
y x
1 5
2 9
3 17
4 26
CREATE COLUMN TABLE "my_schema"."my_schema.ols_model_db"(???);
CREATE PROCEDURE my_schema.proc_Rcode( IN train my_schema.data, OUT ols_model_db my_schema.ols_model_db )
LANGUAGE RLANG AS BEGIN
ols_model <- lm(y ~ x, data=train)
ols_model_db <- data.frame(g=I(lapply(ols_model, function(x) {serialize(x, NULL)})))
???
END
CALL my_schema.proc_Rcode( my_schema.data, my_schema.ols_model_db )
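Not HANA-specific, but here is a minimal local R sketch of the serialize-into-a-data.frame step (it stores the whole lm object as a single blob column rather than one blob per component as in the lapply() above; the column name model is made up, and the data match the table above):
# fit the model and serialize it into one raw blob
ols_model <- lm(y ~ x, data = data.frame(y = 1:4, x = c(5, 9, 17, 26)))
blob <- serialize(ols_model, NULL)                 # raw vector
ols_model_db <- data.frame(model = I(list(blob)))  # one row, one blob column
# round trip: pull the blob back out and restore the model
restored <- unserialize(ols_model_db$model[[1]])
coef(restored)   # identical coefficients to ols_model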

SSIS Foreach Loop failure

I have created a lookup for a list of IDs and a subsequent Foreach loop to run a SQL statement for each ID.
My variable for catching the list of IDs is called MissingRecordIDs and is of type Object. In the Foreach container I map each value to a variable called RecordID of type Int32. No fancy scripts - I followed these instructions: https://www.simple-talk.com/sql/ssis/implementing-foreach-looping-logic-in-ssis-/ (without the file-loading part - I am just running a SQL statement).
It runs fine from within SSIS, but when I deploy it to my Integration Services Catalogue in MSSQL it fails.
This is the error I get when running from SQL Management Studio:
[error screenshot]
I thought I could just put a Precedence Constraint after MissingRecordIDs gets filled, to check for NULL and skip the Foreach loop if necessary - but I can't figure out how to check for NULL in an Object variable.
Here is the Variable declaration and the Object being enumerated:
[screenshot]
And here is the Variable mapping:
[screenshot]
The SQL statement in 'Lookup missing Orders':
select distinct cast(od.order_id as int) as order_id
from invman_staging.staging.invman_OrderDetails_cdc od
LEFT OUTER JOIN invman_staging.staging.invman_Orders_cdc o
on o.order_id = od.order_id and o.BatchID = ?
where od.BatchID = ?
and o.order_id is null
and od.order_id is not null
In the current environment this query returns nothing - there are no missing Orders, so I don't want to go into the 'Foreach Order Loop' at all.
This is a known issue Microsoft is aware of: https://connect.microsoft.com/SQLServer/feedback/details/742282/ssis-2012-wont-assign-null-values-from-sql-queries-to-variables-of-type-string-inside-a-foreach-loop
I would suggest adding an ISNULL(RecordID, 0) to the query, as well as setting an expression on the component "Load missing Orders" so that it is enabled only when RecordID != 0.
In my case it wasn't NULL causing the problem. The ID value I loaded from the database was stored as nvarchar(50), even though it was an integer; when I tried to use it as an integer in SSIS it kept giving me the same error message. This worked for me:
SELECT CAST(id as INT) FROM dbo.Table

FreeTDS / SQL Server UPDATE Query Hangs Indefinitely

I'm trying to run the following UPDATE query from a Python script (note I've removed the database info):
import pyodbc

print 'Connecting to db for update query...'
db = pyodbc.connect('DRIVER={FreeTDS};SERVER=<removed>;DATABASE=<removed>;UID=<removed>;PWD=<removed>')
cursor = db.cursor()
print '  Executing SQL queries...'
for i in range(len(data)):
    # values are interpolated directly into the SQL string
    sql = '''
    UPDATE product.sanction
    SET action_summary = '{action_summary}'
    WHERE sanction_id = {sanction_id};
    '''.format(sanction_id=data[i][0], action_summary=data[i][1])
    cursor.execute(sql)
cursor.close()
db.commit()
db.close()
However, it hangs indefinitely, no error.
I'm new to pyodbc, but it should be set up correctly considering I'm having no problems performing SELECT queries. I did have to CAST in the SELECT queries (sanction_id AS INT [an int identity on the database] and action_summary AS TEXT [an nvarchar on the database]) to properly populate data, so perhaps the problem lies somewhere there, but I don't know where to start debugging. Converting the text to NVARCHAR didn't do anything either.
Here's an example of one of the rows in data:
(2861357, 'Exclusion Program: NonProcurement; Excluding Agency: HHS; CT Code: Z; Exclusion Type: Prohibition/Restriction; SAM Number: S4MR3Q9FL;')
I was unable to find my issue, but I ended up using QuerySets rather than running an UPDATE query.

Accessing SQLite Database in R

I want to access and manipulate a large data set in R. Since it's a large CSV file (~ 0.5 GB), I plan to import it into SQLite and then access it from R. I know the sqldf and RSQLite packages can do this, but I went over their manuals and they are not helpful. Being a newbie to SQL doesn't help either.
Do I have to set the R working directory to SQLite's and then go from there? How do I then read the database into R?
Heck, if you know how to access the DB from R without using SQL, please tell me.
Thanks!
It really is rather easy -- the path and filename of the SQLite db file are passed as the 'dbname' parameter. Here is what CRANberries does:
library(RSQLite)   # provides dbConnect(), dbGetQuery() and the SQLite driver

databasefile <- "/home/edd/cranberries/cranberries.sqlite"
## ...
## main worker function
dailyUpdate <- function() {
    stopifnot(all.equal(system("fping cran.r-project.org", intern=TRUE),
                        "cran.r-project.org is alive"))
    setwd("/home/edd/cranberries")
    dbcon <- dbConnect(dbDriver("SQLite"), dbname = databasefile)
    repos <- dbGetQuery(dbcon,
                        paste("select max(id) as id, desc, url",
                              "from repos where desc!='omegahat' group by desc"))
    # ...
}
That's really all there is. Of course, there are other queries later on...
You can easily test all SQL queries in the sqlite client before trying them from R, or try them directly from R.
Edit: As the above was apparently too terse, here is an example straight from the documentation:
library(RSQLite)
con <- dbConnect(SQLite(), ":memory:")   ## in-memory; replace with a file name
data(USArrests)
dbWriteTable(con, "arrests", USArrests)  ## copy an R data.frame into the db
res <- dbSendQuery(con, "SELECT * from arrests")
data <- fetch(res, n = 2)                ## fetch the first two rows
data
dbClearResult(res)
dbGetQuery(con, "SELECT * from arrests limit 3")
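And for the CSV-import step the question starts from, a minimal sketch under assumed names ("mydata.csv", "mydata.sqlite", table mydata; note read.csv() does hold the CSV in memory once during the import):
library(RSQLite)
con <- dbConnect(SQLite(), dbname = "mydata.sqlite")  # creates the file if absent
dbWriteTable(con, "mydata", read.csv("mydata.csv"))   # one-time import of the CSV
dbGetQuery(con, "SELECT count(*) AS n FROM mydata")   # later queries skip the CSV
dbDisconnect(con)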
