KeyError while trying to connect to database using pymssql - sql-server

The code below tries to connect to an MSSQL database using pymssql. I have a CSV file and am trying to push all of its rows into a single table in that database. I get a KeyError when I execute the code after opening the CSV file.
import csv
import pymssql

conn = pymssql.connect(host="host name",
                       database="dbname",
                       user="username",
                       password="password")
cursor = conn.cursor()
if conn:
    print("True")
else:
    print("False")

with open('path to csv file', 'r') as f:
    reader = csv.reader(f)
    columns = next(reader)
    query = "INSERT INTO Marketing({'URL', 'Domain_name', 'Downloadables', 'Text_without_javascript', 'Downloadable_Link'}) VALUES ({%s,%s,%s,%s,%s})"
    query = query.format(','.join('[' + x + ']' for x in columns), ','.join('?' * len(columns)))
    cursor = conn.cursor()
    for data in reader:
        cursor.execute(query, tuple(data))
    cursor.commit()
Below is the error that I get:
KeyError: "'URL', 'Domain_name', 'Downloadables', 'Text_without_javascript', 'Downloadable_Link'"
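The KeyError comes from str.format(), not from pymssql: everything inside {...} is treated as a replacement-field name, so the quoted column list becomes a lookup key that format() cannot resolve, which is exactly the key shown in the traceback. A minimal sketch of a template that formats cleanly, keeping the same column list and using %s markers since that is the paramstyle pymssql expects:
# The braces must be empty placeholders; the column list and the
# parameter markers are then substituted by str.format().
query = "INSERT INTO Marketing({}) VALUES ({})"
query = query.format(','.join('[' + x + ']' for x in columns),
                     ','.join(['%s'] * len(columns)))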
Using to_sql
import pandas as pd
from sqlalchemy import create_engine

file_path = "path to csv"
engine = create_engine("mssql://user:password@host/database")
df = pd.read_csv(file_path, encoding='latin')
df.to_sql(name='Marketing', con=engine, if_exists='append')
Output:
InterfaceError: (pyodbc.InterfaceError) ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)')
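An mssql:// URL with no DBAPI named makes SQLAlchemy fall back to pyodbc, which then looks for an ODBC DSN or driver and fails with the IM002 error above. A minimal sketch of two connection URLs that avoid that; host, credentials, and driver name are placeholders, and the driver string must match one installed on the machine:
from sqlalchemy import create_engine

# Option 1: go through pymssql, no ODBC driver needed
engine = create_engine("mssql+pymssql://user:password@host/database")

# Option 2: keep pyodbc, but name an installed ODBC driver explicitly
engine = create_engine(
    "mssql+pyodbc://user:password@host/database"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)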

I tried everything, from converting the parameters being passed into a tuple to passing them as-is, but nothing helped. Below is the code that fixed the issue for me:
with open('path to csv file', 'r') as f:
    reader = csv.reader(f)
    columns = next(reader)  # skip the header row
    cursor = conn.cursor()
    query = ("INSERT INTO Marketing(URL, Domain_name, Downloadables, "
             "Text_without_javascript, Downloadable_Link) "
             "VALUES (%s,%s,%s,%s,%s)")
    for data in reader:
        cursor.execute(query, tuple(data))
    conn.commit()
Note: the database connection code remains the same as in the question.
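Since the same statement is executed for every row, the loop can also be collapsed into a single batch call. A minimal sketch, assuming the same open connection, cursor, and table as above (pymssql cursors support executemany):
with open('path to csv file', 'r') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    query = ("INSERT INTO Marketing(URL, Domain_name, Downloadables, "
             "Text_without_javascript, Downloadable_Link) "
             "VALUES (%s,%s,%s,%s,%s)")
    cursor.executemany(query, [tuple(row) for row in reader])
    conn.commit()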

Related

NIFI - upload binary.zip to SQL Server as varbinary

I am trying to upload a binary.zip to SQL Server as the content of a varbinary column.
Target Table:
CREATE TABLE myTable ( zipFile varbinary(MAX) );
My NiFi flow is very simple:
-> GetFile:
   filter: binary.zip
-> UpdateAttribute:
   sql.args.1.type = -3     # varbinary, according to the JDBC types enumeration
   sql.args.1.value = ???   # I don't know what to put here! (I've tried everything!)
   sql.args.1.format = ???  # Is it required? I tried 'hex'
-> PutSQL:
   SQL statement = INSERT INTO myTable (zip_file) VALUES (?);
What should I put in sql.args.1.value?
I think it should be the flowfile payload, but would that work as part of the INSERT in PutSQL? So far it has not!
Thanks!
SOLUTION UPDATE:
Based on https://issues.apache.org/jira/browse/NIFI-8052
(Note that I'm sending some data as attribute parameters.)
import java.nio.charset.StandardCharsets
import org.apache.nifi.controller.ControllerService
import groovy.sql.Sql

def flowFile = session.get()
if (!flowFile) return

def lookup = context.controllerServiceLookup
def dbServiceName = flowFile.getAttribute('DatabaseConnectionPoolName')
def tableName = flowFile.getAttribute('table_name')
def fieldName = flowFile.getAttribute('field_name')
def dbcpServiceId = lookup.getControllerServiceIdentifiers(ControllerService)
        .find { cs -> lookup.getControllerServiceName(cs) == dbServiceName }

def conn = lookup.getControllerService(dbcpServiceId)?.getConnection()
def sql = new Sql(conn)

flowFile.read { rawIn ->
    def parms = [rawIn]
    sql.executeInsert "INSERT INTO " + tableName + " (date, " + fieldName + ") VALUES (CAST( GETDATE() AS Date ) , ?) ", parms
}

conn?.close()
session.transfer(flowFile, REL_SUCCESS)
session.commit()
There may be a NiFi-native way to insert a blob, but you could use ExecuteGroovyScript instead of UpdateAttribute and PutSQL:
Add an SQL.mydb parameter at the processor level and link it to the required DBCP pool.
Use the following script body:
def ff = session.get()
if (!ff) return

def statement = "INSERT INTO myTable (zip_file) VALUES (:p_zip_file)"
def params = [
    p_zip_file: SQL.mydb.BLOB(ff.read())  // cast flow file content as the BLOB sql type
]
SQL.mydb.executeInsert(params, statement)  // committed automatically on flow file success

// transfer to success without changes
REL_SUCCESS << ff
Inside the script, SQL.mydb is a reference to a groovy.sql.Sql object.

Python parameterized query and insert into SQL Server

I'm using the pyodbc connector to store data and an image in SQL Server. The storing function takes parameterized arguments whose values are supplied by global variables from other functions.
With hard-coded values I am able to insert into the DB without any issue, but I have no luck when trying to insert using the variable values.
What is the right method to execute this transaction in Python? Any help/advice is highly appreciated!
def convertToBinaryData(filename):
    # Convert digital data to binary format
    with open(filename, 'rb') as file:
        binaryData = file.read()
    return binaryData

def saveRecord1(self, DocumentType, FileName, DocumentContent, DocumentText, LastUpdate, UpdatedBy):
    print("Inserting into database")
    conn = pyodbc.connect('Driver={SQL Server};'
                          'Server=localhost;'
                          'Database=testDB;'
                          'uid=test;'
                          'pwd=test01;'
                          'Trusted_Connection=No;')
    cursor = conn.cursor(prepared=True)
    sql_insert_blob_query = """INSERT INTO testDB.dbo.OCRDocuments (DocumentType, FileName, DocumentContent, DocumentText, LastUpdate, UpdatedBy) VALUES (?,?,?,?,?,?)"""
    pics = convertToBinaryData(DocumentContent)
    insert_blob_tuple = (DocumentType, FileName, pics, DocumentText, LastUpdate, UpdatedBy)
    result = cursor.execute(sql_insert_blob_query, insert_blob_tuple)
    QtGui.QMessageBox.warning(self, 'Status', 'Successfully saved!',
                              QtGui.QMessageBox.Cancel, QtGui.QMessageBox.Ok)
    conn.commit()
    conn.close()

# saveRecord('k1', 'imgFileType', "output.png", '2020-10-27 11:20:47.000', '2020-10-27 11:20:47.000', '1000273868')
saveRecord1(self, docType, imgFileType, output, docNum, datetime, userID)
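For reference, a minimal standalone sketch of a parameterized pyodbc insert with a binary column; the connection string, table, and values are placeholders based on the question. pyodbc binds a Python bytes object to a varbinary column directly, and note that its cursor() method does not take a prepared=True argument (that keyword comes from mysql.connector):
import pyodbc

conn = pyodbc.connect('Driver={SQL Server};Server=localhost;Database=testDB;'
                      'uid=test;pwd=test01;Trusted_Connection=No;')
cursor = conn.cursor()

with open("output.png", 'rb') as f:   # read the image in binary mode
    image_bytes = f.read()

cursor.execute(
    "INSERT INTO dbo.OCRDocuments "
    "(DocumentType, FileName, DocumentContent, DocumentText, LastUpdate, UpdatedBy) "
    "VALUES (?,?,?,?,?,?)",
    ('k1', 'output.png', image_bytes, 'document text',
     '2020-10-27 11:20:47.000', '1000273868'),
)
conn.commit()
conn.close()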

Read error with spark.read against SQL Server table (via JDBC Connection)

I have a problem in Zeppelin when I try to create a dataframe by reading directly from a SQL Server table. The problem is that I don't know how to read a SQL column with the geography type.
SQL table
This is the code that I am using, and the error that I get.
Create JDBC connection
import org.apache.spark.sql.SaveMode
import java.util.Properties
val jdbcHostname = "XX.XX.XX.XX"
val jdbcDatabase = "databasename"
val jdbcUsername = "user"
val jdbcPassword = "XXXXXXXX"
// Create the JDBC URL without passing in the user and password parameters.
val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname};database=${jdbcDatabase}"
// Create a Properties() object to hold the parameters.
val connectionProperties = new Properties()
connectionProperties.put("user", s"${jdbcUsername}")
connectionProperties.put("password", s"${jdbcPassword}")
connectionProperties.setProperty("Driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
Read from SQL
import spark.implicits._
val table = "tablename"
val postcode_polygons = spark.
  read.
  jdbc(jdbcUrl, table, connectionProperties)
Error
import spark.implicits._
table: String = Lookup.Postcode50m_Lookup
java.sql.SQLException: Unsupported type -158
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:233)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:290)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:290)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:289)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:64)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:114)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:193)
Adding to thebluephantom's answer: have you tried changing the type to string as below and then loading the table?
val jdbcDF = spark.read.format("jdbc")
  .option("dbtable", "(select toString(SData) as s_sdata, toString(CentroidSData) as s_centroidSdata from table) t")
  .option("user", "user_name")
  // ... other options ...
  .load()
This is the final solution in my case. The idea from moasifk is correct, but in my code I cannot use the "toString" function, so I applied the same idea with a different syntax.
import spark.implicits._
val tablename = "Lookup.Postcode50m_Lookup"
val postcode_polygons = spark.
  read.
  jdbc(jdbcUrl, table = s"(select PostcodeNoSpaces, cast(SData as nvarchar(4000)) as SData from $tablename) as postcode_table", connectionProperties)
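For anyone reading the same kind of table from PySpark rather than Scala, a rough sketch of the same cast-inside-a-subquery idea looks like this; the URL, credentials, and driver name are placeholders:
# Cast the geography column to nvarchar inside a subquery so the JDBC
# reader never sees the unsupported type (-158).
postcode_polygons = spark.read.jdbc(
    url="jdbc:sqlserver://host;database=databasename",
    table=("(select PostcodeNoSpaces, cast(SData as nvarchar(4000)) as SData "
           "from Lookup.Postcode50m_Lookup) as postcode_table"),
    properties={
        "user": "user",
        "password": "password",
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    },
)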

revoscaler sqlServerData rxImport uniqueidentifier column failed

I'm trying to import data from SQL Server, but I'm having issues importing a table that contains a uniqueidentifier column.
I'm using R Client 3.3.2.0 to query the database.
Database table:
Code:
sqlConnString = "DRIVER=ODBC Driver 11 for SQL Server;SERVER=JDIMKO;DATABASE=Test;UID=sa;PWD=***;"
colClasses = c("id" = "integer", "ui" = "character")
sqlServerData <- RxSqlServerData(
    sqlQuery = "select * from tbl1",
    connectionString = sqlConnString, colClasses = colClasses)
custData = rxImport(sqlServerData)
Error:
Unhandled SQL data type!!!
Unhandled SQL data type!!!
Could not open data source.
Error in doTryCatch(return(expr), name, parentenv, handler) :
Could not open data source.
RxSqlServerData does not support the UNIQUEIDENTIFIER data type. You should convert it to varchar:
sqlServerData <- RxSqlServerData(
    sqlQuery = "select id, CONVERT(VARCHAR(36), ui) ui from tbl1",
    connectionString = sqlConnString, colClasses = colClasses)

How to specify a destination DB while exporting data frame to mssql

I would like to export a data frame to an MSSQL table. I used the code below, but I would like to set the destination database, not only the server and table name. I have a few DBs on the server; how can I save the table in one of them?
df<-read.csv(file.choose(),header = T,sep= T)
DB= odbcConnect(dsn ='R_BISRV',uid = 'XXXX', pwd = 'XXX')
sqlSave(DB, df, tablename = 'Tanya', rownames = F,append = T)
close(DB)
I figured it out: the database name should go in odbcDriverConnect() and the table name in sqlSave().
# Client systems use TCP port 1433 to connect to the database engine
channel <- odbcDriverConnect('driver={SQL Server};server=YYY;database=YY;port=1433;uid=XX;pwd=XXX')
sqlSave(channel = channel, dat = df, rownames = TRUE, tablename = "Tanya")
