How to get started using the RevoScaleR library in R?

I've installed Microsoft R Client. However, when I run
require("RevoScaleR")
and something like
conn <- "Driver=SQL Server; Server=CZPHADDWH01/DEV; Database=DWH_Staging; trusted_connection=true"
sqlWait <- TRUE;
sqlConsoleOutput <- FALSE;
cc <- RxInSqlServer(connectionString = conn, wait = sqlWait)
rxSetComputeContext(cc)
it returns
Loading required package: RevoScaleR
Error in RxInSqlServer(connectionString = conn, wait = sqlWait) :
  could not find function "RxInSqlServer"
In addition: Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
  there is no package called 'RevoScaleR'
Any help would be appreciated.

Related

PostgreSQL terminating abnormally

I am trying to update a PostgreSQL table with psycopg2 (a Python package), but it sometimes fails with the error below.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Here is the code:
import time
from datetime import datetime
import psycopg2
from psycopg2 import pool
now = datetime.now()
logoff_time = datetime(now.year, now.month, now.day, 15, 0, 0)
while True:
    time.sleep(1)
    try:
        status = 'EXECUTED'
        exec_type1 = 'CANCELLED'
        exec_type2 = 'COMPLETED'
        try:
            postgreSQL_pool = pool.SimpleConnectionPool(1, 20, host = db_host,
                                                        database = db_name,
                                                        port = db_port,
                                                        user = db_user,
                                                        password = db_pwd)
            if postgreSQL_pool:
                print("Connection pool created successfully")
            conn = postgreSQL_pool.getconn()
        except (Exception, psycopg2.DatabaseError) as error:
            print(error)
        sql = """ UPDATE orders SET status = %s, executed_type = %s WHERE order_id = %s"""
        updated_rows = 0
        try:
            cur = conn.cursor()
            cur.execute(sql, (status, exec_type1, order_id,))
            conn.commit()
            updated_rows = cur.rowcount
            cur.close()
            break
        except (Exception, psycopg2.DatabaseError) as error:
            print(error)
        print(updated_rows)
    except Exception as e:
        print(e)
psycopg2 version: '2.8.6 (dt dec pq3 ext lo64)'
Postgres: PostgreSQL 12.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
It is a fairly simple task, but I am facing challenges. Any suggestions, please?
The server is crashing for some reason that you might be able to read in the server's logs.
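Independently of what the server logs show, the loop in the question builds a new connection pool on every iteration and never hands the connection back. A minimal sketch of one alternative, assuming the update is safe to repeat: create the pool once, borrow a connection per attempt, and always return it. The helper name run_update_with_retry and its get_conn/put_conn parameters are hypothetical; with psycopg2 they would presumably be the pool's getconn and putconn methods. The FakeCursor and FakeConn classes and the order id 42 are stand-ins so that the sketch runs without a database.

```python
import time


def run_update_with_retry(get_conn, put_conn, sql, params, attempts=3, delay=0.0):
    """Run one UPDATE, retrying on failure; return the affected row count.

    get_conn / put_conn are hypothetical callables that borrow a connection
    from a pool created once, outside this function, and hand it back.
    """
    last_error = None
    for _ in range(attempts):
        conn = get_conn()
        try:
            cur = conn.cursor()
            try:
                cur.execute(sql, params)
                conn.commit()
                return cur.rowcount
            finally:
                cur.close()
        except Exception as error:  # with psycopg2 this could be OperationalError
            last_error = error
            time.sleep(delay)
        finally:
            put_conn(conn)  # always hand the connection back
    raise last_error


# Stand-ins so the sketch runs without a database (these are not psycopg2 objects).
class FakeCursor:
    rowcount = 1

    def __init__(self, fail):
        self.fail = fail

    def execute(self, sql, params):
        if self.fail:
            raise ConnectionError("server closed the connection unexpectedly")

    def close(self):
        pass


class FakeConn:
    def __init__(self, fail):
        self.fail = fail

    def cursor(self):
        return FakeCursor(self.fail)

    def commit(self):
        pass


# The first borrowed connection fails, the second one works.
available = [FakeConn(fail=True), FakeConn(fail=False)]
returned = []
sql = "UPDATE orders SET status = %s, executed_type = %s WHERE order_id = %s"
rows = run_update_with_retry(
    get_conn=lambda: available.pop(0),
    put_conn=returned.append,
    sql=sql,
    params=("EXECUTED", "CANCELLED", 42),
)
print(rows, len(returned))
```

Retrying only hides the symptom; if the server really is crashing, the logs remain the place to find the cause. With a real psycopg2 pool, a connection that failed this way would probably need to be discarded rather than reused, which this sketch does not handle.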

Cannot connect to Amazon Keyspaces with cqlsh

I am having trouble connecting to Amazon Keyspaces, both with my application code and cqlsh:
cqlsh cassandra.eu-west-2.amazonaws.com 9142 -u "xxxxxxxxxxxxxxx" -p "xxxxxxxxxxxxxxxxxxxxxx" --ssl
Connection error: ('Unable to connect to any servers', {'3.10.201.209': error(1, u"Tried connecting to [('3.10.201.209', 9142)]. Last error: [SSL] internal error (_ssl.c:727)")})
What is particularly confusing is that my setup worked in the past.
My cqlshrc:
[connection]
port = 9142
factory = cqlshlib.ssl.ssl_transport_factory
[ssl]
validate = true
certfile = /home/abc/.cassandra/AmazonRootCA1.pem
I fetched the certificate like this:
wget -c https://www.amazontrust.com/repository/AmazonRootCA1.pem
DNS seems fine:
nslookup cassandra.eu-west-2.amazonaws.com
Server: 8.8.8.8
Address: 8.8.8.8#53
Non-authoritative answer:
Name: cassandra.eu-west-2.amazonaws.com
Address: 3.10.201.209
I recently upgraded to Ubuntu 20.04 from 18.04, which may be causing issues.
Update: Yes, it probably changed the default SSL protocol
I figured it out for cqlsh; you need to set the SSL version:
[connection]
port = 9142
factory = cqlshlib.ssl.ssl_transport_factory
[cql]
version = 3.4.4
[ssl]
validate = true
certfile = /home/abc/.cassandra/AmazonRootCA1.pem
version = TLSv1_2
The fix for a .NET solution is similar; you must set the SslProtocols correctly.
Here is an F# script that works:
#load "../.paket/load/netcoreapp3.1/CassandraCSharpDriver.fsx"
open System
open System.Net.Security
open System.Security
open System.Security.Authentication
open System.Security.Cryptography
open System.Security.Cryptography.X509Certificates
open Cassandra
let private getEnvVar (name : string) =
    let x = Environment.GetEnvironmentVariable name
    if String.IsNullOrWhiteSpace x
    then
        failwithf "The environment variable %s must be set" name
    else
        x
let region = getEnvVar "AWS_REGION"
let keyspace = getEnvVar "AWS_KEYSPACES_KEYSPACE"
let keyspacesUsername = getEnvVar "AWS_KEYSPACES_USERNAME"
let keyspacesPassword = getEnvVar "AWS_KEYSPACES_PASSWORD"
async {
    let certCollection = X509Certificate2Collection ()
    use cert = new X509Certificate2 (@"./AmazonRootCA1.pem", "amazon")
    certCollection.Add (cert) |> ignore
    let sslOptions =
        SSLOptions
            (
                SslProtocols.Tls12,
                true,
                (fun sender certificate chain sslPolicyErrors ->
                    if sslPolicyErrors = SslPolicyErrors.None
                    then
                        true
                    else
                        printfn "Cassandra node SSL certificate validation error(s): {%A}" sslPolicyErrors
                        false)
            )
        |> (fun x -> x.SetCertificateCollection(certCollection))
    let contactPoints = [| sprintf "cassandra.%s.amazonaws.com" region |]
    let cluster =
        Cluster.Builder()
            .AddContactPoints(contactPoints)
            .WithPort(9142)
            .WithAuthProvider(PlainTextAuthProvider (keyspacesUsername, keyspacesPassword))
            .WithSSL(sslOptions)
            .Build()
    use! cassandra =
        cluster.ConnectAsync keyspace
        |> Async.AwaitTask
    printfn "Connected. "
}
|> Async.RunSynchronously
It should be easy to translate to C# :)
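For Python application code, the same idea (require TLS 1.2 and trust the Amazon root certificate) can be sketched with the standard ssl module. This is a minimal sketch, not a tested Keyspaces client: the helper name make_tls12_context is hypothetical, and it sets TLS 1.2 as the minimum version rather than pinning exactly that version. With the DataStax Python driver, such a context would presumably be passed to Cluster through its ssl_context argument, which is an assumption not exercised here.

```python
import ssl


def make_tls12_context(ca_file=None):
    """Build a client-side SSL context that refuses anything older than TLS 1.2."""
    # PROTOCOL_TLS_CLIENT enables certificate and hostname verification by default.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca_file is not None:
        # For example, the AmazonRootCA1.pem file downloaded in the question.
        context.load_verify_locations(cafile=ca_file)
    return context


context = make_tls12_context()
print(context.minimum_version, context.verify_mode, context.check_hostname)
```

Leaving ca_file unset keeps the sketch runnable without the certificate file; a real connection would pass the path to the downloaded PEM file.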

Asyncio Run Loop Errors in Python 3.7

I am trying to use the asyncio package to execute concurrent calls from one SQL Server to another in order to extract data. I'm hitting an issue at myLoop.run_until_complete(cors), where it tells me that the event loop is already running. I will admit that I am new to this package and may be overlooking something simple.
import pyodbc
import sqlalchemy
import pandas
import asyncio
import time
async def getEngine(startString):
    sourceList = str.split(startString,'=')
    server = str.split(sourceList[1],';')[0]
    database = str.split(sourceList[2],';')[0]
    user = str.split(sourceList[3],';')[0]
    password = str.split(sourceList[4],';')[0]
    returnEngine = sqlalchemy.create_engine("mssql+pyodbc://"+user+":"+password+"@"+server+"/"+database+"?driver=SQL+Server+Native+Client+11.0")
    return returnEngine
async def getConnString(startString):
    sourceList = str.split(startString,'=')
    server = str.split(sourceList[1],';')[0]
    database = str.split(sourceList[2],';')[0]
    user = str.split(sourceList[3],';')[0]
    password = str.split(sourceList[4],';')[0]
    return "Driver={SQL Server Native Client 11.0};Server="+server+";Database="+database+";Uid="+user+";Pwd="+password+";"
async def executePackage(source,destination,query,sourceTable,destTable,lastmodifiedDate,basedOnStation):
    sourceConnString = getConnString(source)
    destEngine = getEngine(destination)
    sourceConn = pyodbc.connect(sourceConnString)
    newQuery = str.replace(query,'dateTest',str(lastmodifiedDate))
    df = pandas.read_sql(newQuery,sourceConn)
    print('Started '+sourceTable+'->'+destTable)
    tic = time.perf_counter()
    await df.to_sql(destTable,destEngine,index=False,if_exists="append")
    toc = time.perf_counter()
    secondsToFinish = toc - tic
    print('Finished '+sourceTable+'->'+destTable+' in '+ str(secondsToFinish) +' seconds')
async def main():
    connString = "Driver={SQL Server Native Client 11.0};Server=myServer;Trusted_Connection=yes;"
    myConn = pyodbc.connect(connString)
    cursor = myConn.cursor()
    df = pandas.read_sql('exec mySql_stored_proc',myConn)
    if len(df.index) > 0:
        tasks = [executePackage(df.iloc[i,10],df.iloc[i,11],df.iloc[i,7],df.iloc[i,8],df.iloc[i,9],df.iloc[i,5],df.iloc[i,17])for i in range(len(df))]
        myLoop = asyncio.get_event_loop()
        cors = asyncio.wait(tasks)
        myLoop.run_until_complete(cors)
if __name__ =="__main__":
    asyncio.run(main())
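The error arises because asyncio.run(main()) already starts an event loop, so calling run_until_complete() inside main() tries to start it a second time. A minimal sketch of one way around this, assuming the database work stays blocking (pandas.read_sql and DataFrame.to_sql are ordinary synchronous calls and cannot be awaited directly): push each blocking call onto a worker thread and await the results with asyncio.gather. The function names, table names, and sleep durations below are hypothetical stand-ins, not the code from the question.

```python
import asyncio
import time


def blocking_copy(name, seconds):
    """Stand-in for a blocking call such as pandas.read_sql or DataFrame.to_sql."""
    time.sleep(seconds)
    return name


async def copy_table(name, seconds):
    loop = asyncio.get_running_loop()
    # Run the blocking work in the default thread pool so other tasks can proceed.
    return await loop.run_in_executor(None, blocking_copy, name, seconds)


async def main():
    jobs = [("orders", 0.2), ("customers", 0.1), ("items", 0.1)]
    # We are already inside the running loop here, so await the coroutines
    # instead of calling run_until_complete() again.
    return await asyncio.gather(*(copy_table(name, secs) for name, secs in jobs))


results = asyncio.run(main())
print(results)
```

asyncio.gather returns results in the order the coroutines were passed in, regardless of which one finishes first. In the question's code, getConnString and getEngine are also declared async but called without await, so they return coroutine objects rather than strings or engines; making them plain functions would avoid that.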

How to connect to SQL Server with R?

I want to create a model in R using a connection to data stored in a SQL Server data warehouse.
I tried to use the RevoScaleR library, which returned
package RevoScaleR is not available (for R version 3.4.1)
So I edited the connection string (shown in the code below) for the ODBC library:
install.packages("RevoScaleR")
#require("RevoScaleR")
if (!require("RODBC"))
  install.packages("RODBC")
conn <- odbcDriverConnect(connection="Driver={SQL Server Native Client 11.0}; Server=CZPHADDWH01/DEV; Database=DWH_Staging; trusted_connection=true")
sqlWait <- TRUE;
sqlConsoleOutput <- FALSE;
cc <- RxInSqlServer(connectionString = conn, wait = sqlWait)
rxSetComputeContext(cc)
train_query <- "SELECT TOP(10000) * FROM dim.Contract"
formula <- as.formula("Cosi ~ ContractID + ApprovedLoanAmount + ApprovedLoadDuration")
forest_model <- rxDForest(formula = formula,
                          data = train_query,
                          nTree = 20,
                          maxDepth = 32,
                          mTry = 3,
                          seed = 5,
                          verbose = 1,
                          reportProgress = 1)
rxDForest_model <- as.raw(serialize(forest_model, connection = conn))
length(rxDForest_model)
However:
package 'RODBC' successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\sjirak\AppData\Local\Temp\Rtmpqa9iKN\downloaded_packages
Error in odbcDriverConnect(connection = "Driver={SQL Server Native Client 11.0}; Server=CZPHADDWH01/DEV; Database=DWH_Staging; trusted_connection=true") :
  could not find function "odbcDriverConnect"
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
  there is no package called 'RODBC'
Any help would be appreciated.
Looking at the documentation of the odbc package, I see the following functions:
odbc-package
dbConnect,OdbcDriver-method
dbUnQuoteIdentifier
odbc
odbc-tables
OdbcConnection
odbcConnectionActions
odbcConnectionIcon
odbcDataType
OdbcDriver
odbcListColumns
odbcListDataSources
odbcListDrivers
odbcListObjects
odbcListObjectTypes
odbcPreviewObject
OdbcResult
odbcSetTransactionIsolationLevel
test_roundtrip
I don't see your function in this list, which could be the reason for the error.
Check the documentation for the proper function.

Execute Microsoft SQL query on R Shiny

I am writing an R Shiny app. Can someone tell me how to execute a Microsoft SQL query in R Shiny?
This is what I have done so far:
data <- reactive({
  conn <- reactive ({ databaseOpen(serverName="[serverName]", databaseName=[dbName])})
  qr <- reactive ({ SELECT * from myTable })
  res <- reactive ({databaseQuery(conn = conn,query = qr)})
  close(conn)
  View(res)
})
Any help is appreciated!
I was able to call a query by creating a function outside of the server and ui functions (in other words, in a global.r). Then the server function could call that query function using one of the inputs in the function.
Here is my code:
queryfunction <- function(zipper){
  odbcChannel <- odbcConnect("myconnection")
  querydoc <- paste0("
    SELECT distinct *
    FROM mydb
    where substring(nppes_provider_zip,1,2) = '43'
    and [provider_type] = 'General Practice'
  ")
  pricetable <- sqlQuery(odbcChannel, querydoc)
  close(odbcChannel)
  pricetable[which(substring(pricetable$nppes_provider_zip,1,5)==zipper),]
}
server <- shinyServer(function(input, output) {
  output$mytable1 <- renderDataTable(data.table(queryfunction(input$zip)))
})
I figured it out. It can be done as:
server.r
serverfun <- function(input, output) {
  # Storing values in myData variable
  myData <- reactive({
    # Opening database connection
    conn <- databaseOpen(serverName = "myServer", databaseName = "myDB")
    # Sample query which uses some input
    qr <- paste("SELECT name FROM Genes g WHERE Id = ", input$myId, " ORDER BY name")
    # Storing results
    res <- databaseQuery(conn = conn, query = qr)
    # Closing database
    databaseClose(conn)
    # Returning results
    res
  })
  output$tbTable <- renderTable({
    # Checking if myData is not null
    if (is.null(myData())) {
      return(NULL)
    }
    # Return myData
    myData()
  })
}
ui.r
library("shiny")
shinyUI(
  pageWithSidebar(
    headerPanel("Hide Side Bar example"),
    sidebarPanel(
      # The input ID must match input$myId used in server.r
      textInput("myId", "Enter ID below", "1234")
    ),
    mainPanel(
      tabsetPanel(
        tabPanel("Data", tableOutput("tbTable"))
      )
    )
  )
)
