Generate SQL string using schema.CreateTable fails with postgresql ARRAY - arrays

I'd like to generate the verbatim CREATE TABLE SQL string from a SQLAlchemy class containing a postgresql ARRAY.
The following works fine without the ARRAY column:
from sqlalchemy.dialects.postgresql import ARRAY
from sqlalchemy import *
from geoalchemy import *
from sqlalchemy.ext.declarative import declarative_base

metadata = MetaData(schema='refineries')
Base = declarative_base(metadata)

class woodUsers(Base):
    __tablename__ = 'gquery_wood'
    id = Column('id', Integer, primary_key=True)
    name = Column('name', String)
    addr = Column('address', String)
    jsn = Column('json', String)
    geom = GeometryColumn('geom', Point(2))
This works just as I'd like it to:
In [1]: from sqlalchemy.schema import CreateTable
In [3]: tab=woodUsers()
In [4]: str(CreateTable(tab.metadata.tables['gquery_wood']))
Out[4]: '\nCREATE TABLE gquery_wood (\n\tid INTEGER NOT NULL, \n\tname VARCHAR, \n\taddress VARCHAR, \n\tjson VARCHAR, \n\tgeom POINT, \n\tPRIMARY KEY (id)\n)\n\n'
However, when I add a postgresql ARRAY column, it fails:
class woodUsers(Base):
    __tablename__ = 'gquery_wood'
    id = Column('id', Integer, primary_key=True)
    name = Column('name', String)
    addr = Column('address', String)
    types = Column('type', ARRAY(String))
    jsn = Column('json', String)
    geom = GeometryColumn('geom', Point(2))
The same commands as above result in a long traceback ending in:
/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/visitors.pyc in _compiler_dispatch(self, visitor, **kw)
70 getter = operator.attrgetter("visit_%s" % visit_name)
71 def _compiler_dispatch(self, visitor, **kw):
---> 72 return getter(visitor)(self, **kw)
73 else:
74 # The optimization opportunity is lost for this case because the
AttributeError: 'GenericTypeCompiler' object has no attribute 'visit_ARRAY'
If the full traceback is useful, let me know and I will post.
I think this has to do with specifying a dialect for the compiler (?) but I'm not sure. I'd really like to be able to generate the SQL without having to create an engine. I'm not sure if this is possible though. Thanks in advance.

There's probably a complicated solution that involves digging in sqlalchemy.dialects.
You should first try it with an engine though. Fill in a bogus connection url and just don't call connect().
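A minimal sketch of both routes, using the model from the question (the connection URL below is a placeholder and is never actually connected; the generic type compiler has no visit_ARRAY, but the postgresql dialect's type compiler does):

from sqlalchemy import create_engine
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateTable

# Route 1: an engine built from a bogus URL, used only to supply the dialect.
engine = create_engine('postgresql://user:pass@localhost/dbname')
print(CreateTable(woodUsers.__table__).compile(bind=engine))

# Route 2: no engine at all; hand the dialect straight to compile().
print(CreateTable(woodUsers.__table__).compile(dialect=postgresql.dialect()))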

Related

Apache Spark: Type conversion problem on write using JDBC driver to SQL Server / Azure DWH for column of BINARY type

My initial goal is to save UUID values to SQL Server/Azure DWH into a column of BINARY(16) type.
For example, I have demo table:
CREATE TABLE [Events] ([EventId] [binary](16) NOT NULL)
I want to write data to it using Spark like this:
import java.util.UUID
import java.nio.{ByteBuffer, ByteOrder}

val uuid = UUID.randomUUID()
val uuidBytes = Array.ofDim[Byte](16)
ByteBuffer.wrap(uuidBytes)
  .order(ByteOrder.BIG_ENDIAN)
  .putLong(uuid.getMostSignificantBits())
  .putLong(uuid.getLeastSignificantBits())
val schema = StructType(
  List(
    StructField("EventId", BinaryType, false)
  )
)
val data = Seq((uuidBytes)).toDF("EventId").rdd
val df = spark.createDataFrame(data, schema)
df.write
  .format("jdbc")
  .option("url", "<DATABASE_CONNECTION_URL>")
  .option("dbTable", "Events")
  .mode(org.apache.spark.sql.SaveMode.Append)
  .save()
This code returns an error:
java.sql.BatchUpdateException: Conversion from variable or parameter type VARBINARY to target column type BINARY is not supported.
My question is how to cope with this situation and insert a UUID value into a BINARY(16) column?
My investigation:
Spark uses the concept of JdbcDialects and has a mapping from each Catalyst type to a database type and vice versa. For example, MsSqlServerDialect is the dialect used when we work against SQL Server or Azure DWH. In the method getJDBCType you can see the mapping:
case BinaryType => Some(JdbcType("VARBINARY(MAX)", java.sql.Types.VARBINARY))
and this, I think, is the root of my problem.
So, I decided to implement my own JdbcDialect to override this behavior:
class SqlServerDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:sqlserver")
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case BinaryType => Option(JdbcType("BINARY(16)", java.sql.Types.BINARY))
    case _ => None
  }
}
val dialect = new SqlServerDialect
JdbcDialects.registerDialect(dialect)
With this modification I still get exactly the same error. It looks like Spark does not use the mapping from my custom dialect, even though I checked that the dialect is registered. So it is a strange situation.

Integrate django model with legacy db

I understand that Django Model is pretty fussy when it comes to the absence of a primary key.
I have a legacy database (SQL Server) that I am connecting to (it is not the default one), and in it there is a database view which I am supposed to access. However, the issue is that the view does not have any primary key.
How do I enable Django to query that view without being able to modify the schema?
Here is what I did:
I created a ReadOnlyModel and had my other Django model subclass it. This was done because I wanted to bypass Django's need for a PK (evidently it did work, but threw another error - see below):
class ReadOnlyModel(models.Model):
    def save(self, *args, **kwargs):
        pass

    def delete(self, *args, **kwargs):
        pass
I created an ActiveObjectManager so Django knows which database to point to. The reason I used this over a router is that a router would work best if I were creating two separate applications in the same repo; however, I am using one application:
class Db2ActiveObjectManager(models.Manager):
    def get_queryset(self):
        qs = super(Db2ActiveObjectManager, self).get_queryset()
        if hasattr(self.model, 'use_db'):
            qs = qs.using(self.model.use_db)
        return qs
Below is a sample model from the Db2:
class modelA(ReadOnlyModel):
    use_db = 'db2'
    objects = Db2ActiveObjectManager()
    col1 = models.CharField(primary_key=True, max_length=256)
    col2 = models.CharField(primary_key=True, max_length=1024)
    col3 = models.CharField(primary_key=True, max_length=256)
    col4 = models.CharField(primary_key=True, max_length=100)
    col5 = models.CharField(max_length=256)
    col6 = models.CharField(primary_key=True, max_length=50)

    class Meta:
        managed = False
        db_table = 'db2.someTable'

    @classmethod
    def methodA(cls):
        try:
            filter_condition = {}
            return cls.objects.filter(**filter_condition)
        except Exception as e:
            raise ValueError("Invalid input")

    @classmethod
    def methodB(cls):
        try:
            return cls.objects.raw('SQL Query Here')
            # cursor = connection.cursor()
            # cursor.execute('SQL Query Here')
            # field_names = [field[0].lower() for field in cursor.description]
            # nt_result = namedtuple('Result', field_names)
            # return [nt_result(*row) for row in cursor.fetchall()]
        except Exception as e:
            raise
So far my current setup is throwing the following error:
django.db.utils.ProgrammingError: ('42S02', "[42S02] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Invalid object name 'api_readonlymodel'. (208) (SQLExecDirectW)")
I have seen many developers on Stack Overflow reiterating that we need to add a PK to the table/view, but I cannot make any changes to that view. I am pretty sure that there is a workaround, but nothing I have tried so far has worked for me yet.
Either I get the above error, or I get an error saying that no valid column called id is found, or I get an error stating that I need a PK...
Thanks for the help
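For what it's worth, the Invalid object name 'api_readonlymodel' error usually means the base model is concrete, so Django sets up multi-table inheritance and queries a base table that does not exist. A rough sketch of the commonly suggested workaround, assuming col1 (or some other single column of the view) is unique enough to stand in as the primary key:

from django.db import models

class ReadOnlyModel(models.Model):
    class Meta:
        abstract = True  # no api_readonlymodel table is created or queried

    def save(self, *args, **kwargs):
        pass

    def delete(self, *args, **kwargs):
        pass

class ModelA(ReadOnlyModel):
    use_db = 'db2'
    objects = Db2ActiveObjectManager()
    # Django supports only one primary key column, so mark a single
    # (assumed unique) column instead of several primary_key=True fields.
    col1 = models.CharField(primary_key=True, max_length=256)
    col2 = models.CharField(max_length=1024)

    class Meta:
        managed = False
        db_table = 'db2.someTable'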

IMDB to SQLite: How to create table and insert imdb data into sqlite database?

I'm working on a project where I want to load selected IMDB files (the files are in *.list format) that I downloaded from here into a sqlite database. Unfortunately, I'm not able to solve this issue. I'm able to create a database but can't populate the table with IMDB data.
The documentation I've been following is here. So far, I can create a sqlite table, but the script will not populate it.
import sqlite3
from sqlite3 import Error


def create_connection(db_file):
    """ create a database connection to the SQLite database
        specified by db_file
    :param db_file: database file
    :return: Connection object or None
    """
    try:
        conn = sqlite3.connect(db_file)
        return conn
    except Error as e:
        print(e)
    return None


def create_table(conn, create_table_sql):
    """ create a table from the create_table_sql statement
    :param conn: Connection object
    :param create_table_sql: a CREATE TABLE statement
    :return:
    """
    try:
        c = conn.cursor()
        c.execute(create_table_sql)
    except Error as e:
        print(e)


def main():
    database = "/Users/Erudition/Desktop/imdb_database/sqldatabase.db"

    sql_create_tile_akas = """ CREATE TABLE IF NOT EXISTS title (
                                    titleid text PRIMARY KEY,
                                    ordering integer NOT NULL,
                                    title text,
                                    region text,
                                    language text NOT NULL,
                                    types text NOT NULL,
                                    attributes text NOT NULL,
                                    isOriginalTitle integer NOT NULL
                                ); """

    conn = create_connection(database)
    if conn is not None:
        # create projects table
        create_table(conn, sql_create_tile_akas)
    else:
        print("Error! cannot create the database connection.")


if __name__ == '__main__':
    main()
In the terminal, I enter
imdbpy2sql.py -d /Users/Erudition/Desktop/imdb_database/aka-titles.list/
    -u sqlite:///sqldatabase.db
The output I expect is a sqlite table with all the rows filled. Instead, I get several sqlite tables with nothing filled out.
The terminal output is:
WARNING The file will be skipped, and the contained
WARNING information will NOT be stored in the database.
WARNING Complete error: [Errno 20] Not a directory:
'/Users/Erudition/Desktop/imdb_database/aka-titles.list/complete-
cast.list.gz'
WARNING WARNING WARNING
WARNING unable to read the "/Users/Erudition/Desktop/imdb_database/aka-
titles.list/complete-crew.list.gz" file.
WARNING The file will be skipped, and the contained
WARNING information will NOT be stored in the database.
WARNING Complete error: [Errno 20] Not a directory:
'/Users/Erudition/Desktop/imdb_database/aka-titles.list/complete-
crew.list.gz'
I found the solution!
pip install imdb-sqlite
Then
imdb-sqlite
Here's the link
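If you would rather keep the hand-rolled sqlite3 script, note that nothing in it ever inserts rows; a rough sketch of the missing step, assuming the .list data has already been parsed into tuples (the sample row and the parsing step are made up for illustration):

import sqlite3

# Hypothetical: one tuple per row, matching the eight columns of the title table.
parsed_rows = [
    ('tt0000001', 1, 'Carmencita', 'US', 'en', 'imdbDisplay', '', 0),
]

conn = sqlite3.connect('/Users/Erudition/Desktop/imdb_database/sqldatabase.db')
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT OR REPLACE INTO title "
        "(titleid, ordering, title, region, language, types, attributes, isOriginalTitle) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        parsed_rows)
conn.close()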

sqlalchemy: '<' not supported between instances of 'str' and 'int' for table schema attribute (pyodbc)

I am trying to retrieve a table from an MSSQL server (using a DSN)
I have this code:
import sqlalchemy

engine = sqlalchemy.create_engine('mssql+pyodbc://MATRIX')
md = sqlalchemy.MetaData()
tsk = sqlalchemy.Table('MATRIX_SMSIMA', md, autoload=True,
                       autoload_with=engine, schema='USER')
Initially, I tried it without the schema attribute and got an error message that pyodbc doesn't support a default schema.
When adding the schema attribute I get an error message that seems to stem from the guts of the Table function:
in _compile(element, compiler, **kw)
39 def _compile(element, compiler, **kw):
40 from . import base
---> 41 if compiler.dialect.server_version_info < base.MS_2005_VERSION:
42 return compiler.process(element.bindvalue, **kw)
43 else:
TypeError: '<' not supported between instances of 'str' and 'int'
Is there a way around this?
OK, I have just found that the application I'm connecting to is running on InterSystems Cache SQL and not on MS SQL, so this is a false alarm.
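For anyone hitting the same symptom, the backend behind a DSN can be checked directly with pyodbc before involving SQLAlchemy; a small sketch using the DSN from the question:

import pyodbc

cnxn = pyodbc.connect('DSN=MATRIX')
# A non-SQL-Server backend (e.g. InterSystems Cache) reporting an unexpected
# version string is what makes the mssql dialect's version comparison fail.
print(cnxn.getinfo(pyodbc.SQL_DBMS_NAME))
print(cnxn.getinfo(pyodbc.SQL_DBMS_VER))
cnxn.close()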

How to pass a string literal parameter to a peewee fn call

I've got an issue passing a string literal parameter to a SQL function using peewee's fn construct. I've got a model defined as:
class User(BaseModel):
    computingID = CharField()
    firstName = CharField()
    lastName = CharField()
    role = ForeignKeyField(Role)
    lastLogin = DateTimeField()

    class Meta:
        database = database
I'm attempting to use the mySQL timestampdiff function in a select to get the number of days since the last login. The query should look something like this:
SELECT t1.`id`, t1.`computingID`, t1.`firstName`, t1.`lastName`, t1.`role_id`, t1.`lastLogin`, timestampdiff(day, t1.`lastLogin`, now()) AS daysSinceLastLogin FROM `user` AS t1
Here's the python peewee code I'm trying to use:
bob = User.select(User, fn.timestampdiff('day', User.lastLogin, fn.now()).alias('daysSinceLastLogin'))
result = bob[0].daysSinceLastLogin
But when I execute this code, I get an error:
ProgrammingError: (1064, u"You have an error in your SQL syntax; check
the manual that corresponds to your MySQL server version for the right
syntax to use near ''day', t1.lastLogin, now()) AS
daysSinceLastLogin FROM user AS t1' at line 1")
Judging from this message, it looks like the quote marks around the 'day' parameter are being retained in the SQL that peewee is generating, and MySQL doesn't like quotes around that parameter. I obviously can't leave off the quotes in the Python code, so can someone tell me what I'm doing wrong, please?
Update: I have my query working as intended by using the SQL() peewee command to add the DAY parameter, sans quote marks:
User.select(User, fn.timestampdiff(SQL('day'), User.lastLogin, fn.now()).alias('daysSinceLastLogin'))
But I'm not sure why I had to use SQL() in this situation. Am I missing anything, or is this the right answer?
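It seems peewee parameterises any plain Python string, so 'day' is sent as a quoted bind value, while SQL() injects the fragment verbatim. The generated query can be inspected to confirm this (peewee's .sql() returns the query string together with its parameters):

query = User.select(
    User,
    fn.timestampdiff(SQL('day'), User.lastLogin, fn.now()).alias('daysSinceLastLogin'))
print(query.sql())  # 'day' now appears bare in the SQL, not as a bound parameter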
Is there a reason you need to use an SQL function to do this?
In part because I'm not very comfortable with SQL functions, I would probably do something like this:
import datetime as dt

bob = User.get(User.firstName == "Bob")  # or however you want to get the User instance
daysSinceLastLogin = (dt.datetime.now() - bob.lastLogin).days
