Clojure: how to insert a blob in a database?

How do I insert a blob into a database using clojure.contrib.sql?
I've tried the following, reading from a file, but I'm getting this exception:
SQLException:
Message: Invalid column type
SQLState: 99999
Error Code: 17004
java.lang.Exception: transaction rolled back: Invalid column type (repl-1:125)
(clojure.contrib.sql/with-connection
  db
  (clojure.contrib.sql/transaction
    (clojure.contrib.sql/insert-values :test_blob [:blob_id :a_blob] [3 (FileInputStream. "c:/somefile.xls")])))
Thanks.

I was able to solve this by converting the FileInputStream into a byte array (to-byte-array comes from clojure.contrib.io):
(clojure.contrib.sql/with-connection
  db
  (clojure.contrib.sql/transaction
    (clojure.contrib.sql/insert-values :test_blob [:blob_id :a_blob] [3 (to-byte-array (FileInputStream. "c:/somefile.xls"))])))

In theory, you can use any of the clojure.contrib.sql/insert-* functions to insert a blob, passing the blob as a byte array, a java.sql.Blob or a java.io.InputStream. In practice, it is driver-dependent.
For many JDBC implementations, all of the above work as expected, but if you're using sqlitejdbc 0.5.6 from Clojars, you'll find your blob coerced to a string via toString(). All the clojure.contrib.sql/insert-* commands are issued via clojure.contrib.sql/do-prepared, which calls setObject() on a java.sql.PreparedStatement. The sqlitejdbc implementation does not handle setObject() for any of the blob data types, but defaults to coercing them to a string. Here's a workaround that enables you to store blobs in SQLite by calling setBytes() for byte arrays instead:
(use '[clojure.contrib.io :only (input-stream to-byte-array)])
(require '[clojure.contrib.sql :as sql])

(defn my-do-prepared
  "Executes an (optionally parameterized) SQL prepared statement on the
  open database connection. Each param-group is a seq of values for all of
  the parameters. This is a modified version of clojure.contrib.sql/do-prepared
  with special handling of byte arrays."
  [sql & param-groups]
  (with-open [stmt (.prepareStatement (sql/connection) sql)]
    (doseq [param-group param-groups]
      (doseq [[index value] (map vector (iterate inc 1) param-group)]
        (if (= (class value) (class (to-byte-array "")))
          (.setBytes stmt index value)
          (.setObject stmt index value)))
      (.addBatch stmt))
    (sql/transaction
      (seq (.executeBatch stmt)))))

(defn my-load-blob [filename]
  (let [blob (to-byte-array (input-stream filename))]
    (sql/with-connection db
      (my-do-prepared "insert into mytable (blob_column) values (?)" [blob]))))

A more recent reply to this thread, using clojure.java.jdbc, with code to read the data back as well:
(ns com.my-items.polypheme.db.demo
  (:require
    [clojure.java.io :as io]
    [clojure.java.jdbc :as sql]))

(def db {:dbtype "postgresql"
         :dbname "my_db_name"
         :host "my_server"
         :user "user",
         :password "user"})

(defn file->bytes [file]
  (with-open [xin (io/input-stream file)
              xout (java.io.ByteArrayOutputStream.)]
    (io/copy xin xout)
    (.toByteArray xout)))

(defn insert-image [db-config file]
  (let [bytes (file->bytes file)]
    (sql/with-db-transaction [conn db-config]
      (sql/insert! conn :image {:name (.getName file) :data bytes}))))

(insert-image db (io/file "resources" "my_nice_picture.JPG"))

;;read data
(defn read-image [db-config id]
  (-> (sql/get-by-id db-config :image id)
      :data
      (#(new java.io.ByteArrayInputStream %))))

I believe it's just the same way you'd insert any other value: use one of insert-records, insert-rows or insert-values. E.g.:
(insert-values :mytable [:id :blobcolumn] [42 blob])
More examples: http://github.com/richhickey/clojure-contrib/blob/master/src/test/clojure/clojure/contrib/test_sql.clj

Related

Trying to convert string values to array in pandas dataframe

I have a dataframe that looks like this
Company  CompanyDetails
A        [{"companyId": 1482, "companyAddress": 'sampleaddress1', "numOfEmployees": 500}]
B        [{"companyId": 1437, "companyAddress": 'sampleaddress2', "numOfEmployees": 50}]
C        [{"companyId": 1452, "companyAddress": 'sampleaddress3', "numOfEmployees": 10000}]
When I execute df.dtypes I find that both the Company and CompanyDetails columns are objects.
df[['CompanyDetails']].iloc[0, :] would return '{["companyId": 1482, "companyAddress": 'sampleaddress1', numOfEmployees: 500]}' (that is, the value is a string: there are quotes ' ' around my array).
I am trying to extract the details within the dictionary in the CompanyDetails column so that I can add new columns to my dataframe to look like this:
Company  CompanyId  CompanyAddress    numOfEmployees
A        1482       'sampleaddress1'  500
B        1437       'sampleaddress2'  50
C        1452       'sampleaddress3'  10000
I tried something like this as I was trying to convert the CompanyDetails column to contain arrays for all my values so I can easily extract each property in the object.
import ast
df['CompanyDetails'] = df['CompanyDetails'].apply(ast.literal_eval)
However, the above code caused this error
ValueError: malformed node or string: <ast.Name object at 0x000002D73D0C13A0>
Would appreciate any help on this, thanks!
You're getting the error because it's actually not valid JSON. numOfEmployees is not quoted, and JSON requires ALL keys to be double-quoted. (ast.literal_eval rejects the bare name too; that is the ast.Name object in the error message.)
The easiest, safest (in terms of likelihood of breaking) way I can think of to fix this would be to repair the JSON using a regular expression replace:
df['CompanyDetails'] = df['CompanyDetails'].str.replace(r',\s*(\w+)\s*:', r', "\1":', regex=True)
Then do your other stuff:
import ast
import pandas as pd
df['CompanyDetails'] = df['CompanyDetails'].apply(ast.literal_eval)
df = pd.concat([df.drop('CompanyDetails', axis=1), pd.json_normalize(df['CompanyDetails'].explode())], axis=1)
...or whatever you have in mind.
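As a minimal sketch of why the repair works, the snippet below applies the same regular expression to a single string using only the standard library (no pandas). The sample string `raw` is an assumption reconstructed from the question, with `numOfEmployees` left unquoted, and the helper name `repair_keys` is hypothetical.

```python
import ast
import re

# Assumed sample cell value: the last key is a bare name, which is what
# makes ast.literal_eval fail with "malformed node or string".
raw = "[{\"companyId\": 1482, \"companyAddress\": 'sampleaddress1', numOfEmployees: 500}]"


def repair_keys(text):
    """Double-quote bare keys that follow a comma (hypothetical helper)."""
    return re.sub(r',\s*(\w+)\s*:', r', "\1":', text)


# Without the repair, the bare name is rejected.
try:
    ast.literal_eval(raw)
    parsed_without_repair = True
except ValueError:
    parsed_without_repair = False

# With the repair, the string evaluates to a list holding one dict.
details = ast.literal_eval(repair_keys(raw))
record = details[0]
print(record["numOfEmployees"])
```

Note that the pattern only quotes a bare key that follows a comma, so an unquoted first key (right after the opening brace) would still need separate handling.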
You can use
import pandas as pd
# Test dataframe
df = pd.DataFrame({'Company':['A'], 'CompanyDetails':[[{"companyId": 1482, "companyAddress": 'sampleaddress1', "numOfEmployees": 500}]]})
df['CompanyDetails'] = df['CompanyDetails'].str[0]
df = pd.concat([df.drop(['CompanyDetails'], axis=1), df['CompanyDetails'].apply(pd.Series)], axis=1)
# => >>> df
# Company companyId companyAddress numOfEmployees
# 0 A 1482 sampleaddress1 500
Note:
df['CompanyDetails'] = df['CompanyDetails'].str[0] gets the first item from each list since each of them only contains one item
pd.concat([df.drop(['CompanyDetails'], axis=1), df['CompanyDetails'].apply(pd.Series)], axis=1) does the actual expansion and merging with the current dataframe.

AttributeError: 'NoneType' object has no attribute 'fetchall' in Prefect

I have been using the following code in production for the last couple of months,
@task
def sql_run_procs():
    """This is a delete and update..."""
    # Get our logger
    logger = prefect.utilities.logging.get_logger()  # type: ignore
    conn = connect_db(prefect.config.kv.p.prod_db_constring, logger)  ## wrapper around create_engine()
    with conn.connect() as con:
        try:
            r = con.execute(
                f"EXECUTE fs.spETL_MyProc '{prefect.config.kv.p.staging_db_name}'"
            ).fetchall()
            for q in r[0]:
                if q == 1:
                    logger.info(f"Query {q} has failed")
                    raise signals.FAIL()
        except:
            raise SQLAlchemyError("Error in SQL Script")
So like any good coder I copy and pasted the code into another script
@task
def sql_run_procs():
    """This is a clean, truncate and insert"""
    # Get our logger
    logger = prefect.utilities.logging.get_logger()  # type: ignore
    conn = connect_db(prefect.config.kv.p.prod_db_constring, logger)
    with conn.connect() as con:
        try:
            r = con.execute(
                f"EXECUTE forms.spETL_MyOtherProc '{prefect.config.kv.p.staging_db_name}'"
            ).fetchall()
            for q in r[0]:
                if q == 1:
                    logger.info(f"Query {q} has failed")
                    raise signals.FAIL()
        except:
            raise SQLAlchemyError("Error in SQL Script")
And got the following error:
AttributeError: 'NoneType' object has no attribute 'fetchall'
The only difference other than the name of the stored procedure is they're in different Prefect projects. I've searched this site and others for a possible solution but had no success. I know that it's probably something staring me right in the face but after an hour and a half... you know. Thanks in advance.

Macros that generate code from a for-loop

This example is a little contrived. The goal is to create a macro that loops over some values and programmatically generates some code.
A common pattern in Python is to initialize the properties of an object at calling time as follows:
(defclass hair [foo bar]
  (defn __init__ [self]
    (setv self.foo foo)
    (setv self.bar bar)))
This correctly translates with hy2py to
class hair(foo, bar):
    def __init__(self):
        self.foo = foo
        self.bar = bar
        return None
I know there are Python approaches to this problem including attr.ib and dataclasses. But as a simplified learning exercise I wanted to approach this with a macro.
This is my non-working example:
(defmacro self-set [&rest args]
  (for [[name val] args]
    `(setv (. self (read-str ~name)) ~val)))
(defn fur [foo bar]
  (defn __init__ [self]
    (self-set [["foo" foo] ["bar" bar]])))
But this doesn't expand to the original pattern. hy2py shows:
from hy.core.language import name
from hy import HyExpression, HySymbol
import hy
def _hy_anon_var_1(hyx_XampersandXname, *args):
    for [name, val] in args:
        HyExpression([] + [HySymbol('setv')] + [HyExpression([] + [HySymbol
            ('.')] + [HySymbol('self')] + [HyExpression([] + [HySymbol(
            'read-str')] + [name])])] + [val])
hy.macros.macro('self-set')(_hy_anon_var_1)
def fur(foo, bar):
    def __init__(self, foo, bar):
        return None
What am I doing wrong?
for forms always return None. So, your loop is constructing the (setv ...) forms you request and then throwing them away. Instead, try lfor, which returns a list of results, or gfor, which returns a generator. Note also in the below example that I use do to group the generated forms together, and I've moved a ~ so that the read-str happens at compile-time, as it must in order for . to work.
(defmacro self-set [&rest args]
  `(do ~@(gfor
           [name val] args
           `(setv (. self ~(read-str name)) ~val))))
(defclass hair []
  (defn __init__ [self]
    (self-set ["foo" 1] ["bar" 2])))
(setv h (hair))
(print h.bar) ; 2
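Since Hy compiles to Python, the difference between for and lfor can be sketched with the plain-Python analogues: a for statement versus a list comprehension. This is only an analogy, not what the macro expands to; the function names are hypothetical and the strings stand in for the generated (setv ...) forms.

```python
def forms_with_for(pairs):
    # Each string is built and immediately discarded; the function then
    # falls off the end and returns None, like the original macro body.
    for name, val in pairs:
        f"self.{name} = {val!r}"


def forms_with_comprehension(pairs):
    # The comprehension collects every generated form, like lfor does.
    return [f"self.{name} = {val!r}" for name, val in pairs]


pairs = [("foo", 1), ("bar", 2)]
discarded = forms_with_for(pairs)
collected = forms_with_comprehension(pairs)
print(discarded)
print(collected)
```

The first function mirrors the non-working macro: the work is done, but nothing is handed back to the caller, so there is nothing to splice into the expansion.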

Clojure nested doseq loop

I'm new to Clojure and I have a question regarding nested doseq loops.
I would like to iterate through a sequence and get a subsequence, and then get some keys to apply a function over all the sequence elements.
The given sequence has a structure more or less like this, but with hundreds of books, shelves and many libraries:
([:state/libraries {6 #:library {:name "MUNICIPAL LIBRARY OF X" :id 6
:shelves {3 #:shelf {:name "GREEN SHELF" :id 3 :books
{45 #:book {:id 45 :name "NECRONOMICON" :pages {...},
{89 #:book {:id 89 :name "HOLY BIBLE" :pages {...}}}}}}}}])
Here is my code:
(defn my-function []
  (let [conn (d/connect (-> my-system :config :datomic-uri))]
    (doseq [library-seq (read-string (slurp "given-sequence.edn"))]
      (doseq [shelves-seq (val library-seq)]
        (library/create-shelf conn {:id   (:shelf/id (val shelves-seq))
                                    :name (:shelf/name (val shelves-seq))})
        (doseq [books-seq (:shelf/books (val shelves-seq))]
          (library/create-book conn (:shelf/id (val shelves-seq)) {:id   (:book/id (val books-seq))
                                                                   :name (:book/name (val books-seq))}))))))
The thing is that I want to get rid of that nested doseq mess but I don't know what would be the best approach, since in each iteration keys change. Using recur? reduce? Maybe I am thinking about this completely the wrong way?
Like Carcigenicate says in the comments, presuming that the library/... functions are only side effecting, you can just write this in a single doseq.
(defn my-function []
  (let [conn (d/connect (-> my-system :config :datomic-uri))]
    (doseq [library-seq (read-string (slurp "given-sequence.edn"))
            shelves-seq (val library-seq)
            :let [_ (library/create-shelf conn
                                          {:id (:shelf/id (val shelves-seq))
                                           :name (:shelf/name (val shelves-seq))})]
            books-seq (:shelf/books (val shelves-seq))]
      (library/create-book conn
                           (:shelf/id (val shelves-seq))
                           {:id (:book/id (val books-seq))
                            :name (:book/name (val books-seq))}))))
I would separate "connecting to the db" from "slurping a file" from "writing to the db" though. Together with some destructuring I'd end up with something more like:
(defn write-to-the-db [conn given-sequence]
  (doseq [library-seq given-sequence
          shelves-seq (val library-seq)
          :let [{shelf-id :shelf/id,
                 shelf-name :shelf/name
                 books :shelf/books} (val shelves-seq)
                _ (library/create-shelf conn {:id shelf-id, :name shelf-name})]
          ;; books maps id -> book, so destructure the value of each map entry
          [_ {book-id :book/id, book-name :book/name}] books]
    (library/create-book conn shelf-id {:id book-id, :name book-name})))

Loading data into R with rsqlserver package

I've just installed rsqlserver like so (no errors)
install_github('rsqlserver', 'agstudy',args = '--no-multiarch')
And created a connection to my database:
> library(rClr)
> library(rsqlserver)
Warning message:
multiple methods tables found for ‘dbCallProc’
> drv <- dbDriver("SqlServer")
> conn <- dbConnect(drv, url = "Server=MyServer;Database=MyDB;Trusted_Connection=True;")
>
Now when I try to get data using dbGetQuery, I get this error:
> df <- dbGetQuery(conn, "select top 100 * from public2013.dim_Date")
Error in clrCall(sqlDataHelper, "GetConnectionProperty", conn, prop) :
Type: System.MissingMethodException
Message: Method not found: 'System.Object System.Reflection.PropertyInfo.GetValue(System.Object)'.
Method: System.Object GetConnectionProperty(System.Data.SqlClient.SqlConnection, System.String)
Stack trace:
at rsqlserver.net.SqlDataHelper.GetConnectionProperty(SqlConnection _conn, String prop)
>
When I try to fetch results using dbSendQuery, I also get an error.
> res <- dbSendQuery(conn, "select top 100 * from public2013.dim_Date")
> df <- fetch(res, n = -1)
Error in clrCall(sqlDataHelper, "Fetch", stride) :
Type: System.InvalidCastException
Message: Object cannot be stored in an array of this type.
Method: Void InternalSetValue(Void*, System.Object)
Stack trace:
at System.Array.InternalSetValue(Void* target, Object value)
at System.Array.SetValue(Object value, Int32 index)
at rsqlserver.net.SqlDataHelper.Fetch(Int32 capacity) in c:\projects\R\rsqlserver\src\rsqlserver.net\src\SqlDataHelper.cs:line 116
Strangely, the file c:\projects\R\rsqlserver\src\rsqlserver.net\src\SqlDataHelper.cs doesn't actually exist on my computer.
Am I doing something wrong?
I am agstudy, the creator of the rsqlserver package. Sorry for the late reply, but I finally got some time to fix this bug (actually it was a not-yet-implemented feature). I demonstrate here how you can read/write a data.frame with missing values in SQL Server.
First I create a data.frame with missing values. It is important to distinguish the difference between numeric and character variables.
library(rsqlserver)
url = "Server=localhost;Database=TEST_RSQLSERVER;Trusted_Connection=True;"
conn <- dbConnect('SqlServer',url=url)
## create a table with some missing value
dat <- data.frame(txt=c('a',NA,'b',NA),
value =c(1L,NA,NA,2))
My input looks like this :
# txt value
# 1 a 1
# 2 <NA> NA
# 3 b NA
# 4 <NA> 2
I insert dat into my database with the handy function dbWriteTable:
dbWriteTable(conn,name='T_TABLE_WITH_MISSINGS',
dat,row.names=FALSE,overwrite=TRUE)
Then I will read it using 2 methods:
dbSendQuery
res = dbSendQuery(conn,'SELECT *
FROM T_TABLE_WITH_MISSINGS')
fetch(res,n=-1)
dbDisconnect(conn)
txt value
1 a 1
2 <NA> NaN
3 b NaN
4 <NA> 2
dbReadTable:
rsqlserver is DBI compliant and implements many convenient functions so that you deal with SQL as little as possible.
conn <- dbConnect('SqlServer',url=url)
dbReadTable(conn,name='T_TABLE_WITH_MISSINGS')
dbDisconnect(conn)
txt value
1 a 1
2 <NA> NaN
3 b NaN
4 <NA> 2
(EDIT: I had missed something in your post (call to fetch). I can now reproduce the issue too.)
Short story is: do you have a NULL value in your database? this may be the cause.
Longer story, for a full repro:
I've used a sample DB reproducible by following the instructions at http://www.codeproject.com/Tips/326527/Create-a-Sample-SQL-Database-in-Less-Than-2-Minute
EDIT:
I can reproduce your issue with:
library(rClr)
library(rsqlserver)
drv <- dbDriver("SqlServer")
conn <- dbConnect(drv, url = "Server=Localhost\\somename;Database=Fabrics;Trusted_Connection=True;")
res <- dbSendQuery(conn, "SELECT TOP 100 * FROM [Fabrics].[dbo].[Client]")
str(res)
## Formal class 'SqlServerResult' [package "rsqlserver"] with 1 slots
  ..@ Id:<externalptr>
> df <- fetch(res, n = -1)
Error in clrCall(sqlDataHelper, "Fetch", stride) :
Type: System.InvalidCastException
Message: Object cannot be stored in an array of this type.
Method: Void InternalSetValue(Void*, System.Object)
Stack trace:
at System.Array.InternalSetValue(Void* target, Object value)
at System.Array.SetValue(Object value, Int32 index)
at rsqlserver.net.SqlDataHelper.Fetch(Int32 capacity) in c:\projects\R\rsqlserver\src\rsqlserver.net\src\SqlDataHelper.cs:line 116
The following suggests things work as expected when using other commands.
> dbExistsTable(conn, name='Client')
Error in sqlServerExecScalar(conn, statement, ...) :
Message: There is already an open DataReader associated with this Command which must be closed first.
> dbClearResult(res)
[1] TRUE
> dbExistsTable(conn, name='Client')
[1] TRUE
> dbExistsTable(conn, name='SomeIncorrectColumn')
[1] FALSE
Note that I cannot reproduce the very odd one about MissingMethodException
df <- dbGetQuery(conn, "SELECT TOP 100 * FROM [Fabrics].[dbo].[Client]")
Error in clrCall(sqlDataHelper, "Fetch", stride) :
Type: System.InvalidCastException
Message: Object cannot be stored in an array of this type.
Method: Void InternalSetValue(Void*, System.Object)
Stack trace:
at System.Array.InternalSetValue(Void* target, Object value)
at System.Array.SetValue(Object value, Int32 index)
at rsqlserver.net.SqlDataHelper.Fetch(Int32 capacity) in c:\projects\R\rsqlserver\src\rsqlserver.net\src\SqlDataHelper.cs:line 116
Since the debug symbols seem present, I can debug it further through visual studio. It bombs in SqlDataHelper.Fetch at
_resultSet[_cnames[i]].SetValue(_reader.GetValue(i), cnt);
and the variable watch gives me:
i 11 int
_cnames[i] "Street2" string
_reader.GetValue(i) {} object {System.DBNull}
_reader.GetValue(i-1) "806 West Sir Francis Drake St" object {string}
_reader.GetValue(i+1) "Spokane" object {string}
The entry for Street2 is indeed a NULL:
ClientId FirstName MiddleName LastName Gender DateOfBirth CreditRating XCode OccupationId TelephoneNumber Street1 Street2 City ZipCode Longitude Latitude Notes
1 Nicholas Pat Kane M 1975-10-07 00:00:00.000 3 ZU8 5ML 4 (279) 459 - 2707 2870 North Cherry Blvd. NULL Carlsbad 64906 32.7608137325835 117.112738329071
For information, sessionInfo() output includes:
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
other attached packages:
[1] rsqlserver_1.0 rClr_0.5-2
loaded via a namespace (and not attached):
[1] DBI_0.2-7 tools_3.0.2
Hope this helps.
