sqlalchemy.orm.exc.UnmappedInstanceError: Class 'builtins.dict' is not mapped, while using marshmallow-sqlalchemy (SQL Server)

I don't get it. I'm trying to start a brand new table in MS SQL Server 2012 with the following:
In SQL Server:
CREATE TABLE [dbo].[Inventory](
    [Index_No] [bigint] IDENTITY(1,1) NOT NULL,
    [Part_No] [varchar](150) NOT NULL,
    [Shelf] [int] NOT NULL,
    [Bin] [int] NOT NULL,
    PRIMARY KEY CLUSTERED
    (
        [Index_No] ASC
    ),
    UNIQUE NONCLUSTERED
    (
        [Part_No] ASC
    )
)
GO
NOTE: This is a BRAND NEW TABLE! There is no data in it at all
Next, this is the Database.py file:
import pymssql
from sqlalchemy import (create_engine, Table, MetaData, select, Column, Integer, Float,
                        String, text, func, desc, and_, or_, Date, insert)
from sqlalchemy.orm import Session
from sqlalchemy.ext.declarative import declarative_base
from marshmallow_sqlalchemy import SQLAlchemyAutoSchema

Base = declarative_base()

USERNAME = "name"
PSSWD = "none_of_your_business"
SERVERNAME = "MYSERVER"
INSTANCENAME = "\\SQLSERVER2012"
DB = "Inventory"

engine = create_engine(f"mssql+pymssql://{USERNAME}:{PSSWD}@{SERVERNAME}{INSTANCENAME}/{DB}")

class Inventory(Base):
    __tablename__ = "Inventory"

    Index_No = Column('Index_No', Integer, primary_key=True, autoincrement=True)
    Part_No = Column("Part_No", String, unique=True)
    Shelf = Column("Shelf", Integer)
    Bin = Column("Bin", Integer)

    def __repr__(self):
        return f'Inventory(Index_No={self.Index_No!r}, Part_No={self.Part_No!r}, Shelf={self.Shelf!r}, ' \
               f'Bin={self.Bin!r})'

class InventorySchema(SQLAlchemyAutoSchema):
    class Meta:
        model = Inventory
        load_instance = True
It's also worth noting that I'm using SQLAlchemy 1.4.3, if that helps.
And in main.py:
import Database as db
db.Base.metadata.create_all(db.engine)
data_list = [{'Part_No': '123A', 'Shelf': 1, 'Bin': 5},
             {'Part_No': '456B', 'Shelf': 1, 'Bin': 7},
             {'Part_No': '789C', 'Shelf': 2, 'Bin': 1}]

with db.Session(db.engine, future=True) as session:
    try:
        session.add_all(data_list)  # <--- FAILS HERE AND THROWS AN EXCEPTION
        session.commit()
    except Exception as e:
        session.rollback()
        print(f"Error! {e!r}")
        raise
    finally:
        session.close()
Now, most of what I've googled on "Class 'builtins.dict' is not mapped" points me to the marshmallow-sqlalchemy package, which I've tried, but I'm still getting the same error. So I've tried moving Base.metadata.create_all(engine) from Database.py into main.py. I also tried implementing an __init__ method in the Inventory class and calling super().__init__(), which didn't work either.
So what's going on? Why is it failing, and is there a better solution to this problem?

session.add_all() expects mapped instances, not plain dicts, so try creating Inventory objects instead:
data_list = [
    Inventory(Part_No='123A', Shelf=1, Bin=5),
    Inventory(Part_No='456B', Shelf=1, Bin=7),
    Inventory(Part_No='789C', Shelf=2, Bin=1)
]
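Alternatively, since the question already defines an InventorySchema with load_instance=True, the dicts can be deserialized into mapped Inventory instances via marshmallow-sqlalchemy. A minimal sketch, assuming the Database.py module shown above:
import Database as db
from sqlalchemy.orm import Session

raw_rows = [{'Part_No': '123A', 'Shelf': 1, 'Bin': 5},
            {'Part_No': '456B', 'Shelf': 1, 'Bin': 7},
            {'Part_No': '789C', 'Shelf': 2, 'Bin': 1}]

schema = db.InventorySchema()

with Session(db.engine, future=True) as session:
    # load_instance=True makes schema.load() return Inventory objects,
    # which is what session.add_all() expects (plain dicts are not mapped)
    objects = [schema.load(row, session=session) for row in raw_rows]
    session.add_all(objects)
    session.commit()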

Related

SqlAlchemy - How to query a table based on a key property saved as NestedMutableJson

Suppose a Postgres user table contains a property of type NestedMutableJson:
first_name character varying (120)
last_name character varying (120)
country_info NestedMutableJson
...
from sqlalchemy_json import NestedMutableJson
country_info = db.Column(NestedMutableJson, nullable=True)
country_info = {"name": "UK", "code": "11"}
How can the user table be queried based on a country_info key?
POSTGRES Query
SELECT * FROM user WHERE country_info ->> 'name' = 'UK'
Is there a SQLAlchemy way to get the same query result?
I tried several ways, example:
Way 1:
User.query.filter(User.country_info['name'].astext == 'UK').all()
Error:
Operator 'getitem' is not supported on this expression
Way 2:
User.query.filter(User.country_info.op('->>')('name') == 'UK').all()
Issue:
Always getting an empty response
I'm wondering if the issue is caused by the column definition db.Column(NestedMutableJson, nullable=True).
I'd like to avoid db.session.execute("SELECT * FROM user WHERE country_info ->> 'name' = 'UK'").fetchall() and am looking for something else.
Simply use text(), which allows you to write a plain-text filter inside the ORM query; it works like a dict's get() and can also handle None fields.
User.query.filter(text("COALESCE(user.country_info ->> 'name', '') = 'UK'")).all()
Note that the table name used here should be the real name of the table in the database.
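A small variation on that filter (a sketch, not from the original answer, assuming the Flask-SQLAlchemy-style User model from the question): the same condition with the value bound as a parameter instead of inlined in the string.
from sqlalchemy import text

# bind the comparison value instead of embedding the literal in the SQL text
stmt = text("COALESCE(user.country_info ->> 'name', '') = :name")
users = User.query.filter(stmt.bindparams(name='UK')).all()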
Take a look at the JSON datatype documentation.
You should be able to use a filter clause like this:
select(CountryInfo).filter(CountryInfo.country_info['name'].astext == "UK")
You can use .op('->>') to use the PostgreSQL operator ->> in the following way:
from sqlalchemy import Column, Integer, create_engine, String, select
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session
from sqlalchemy.dialects.postgresql import JSON
from sqlalchemy_json import NestedMutableJson

dburl = 'postgresql://...'

Base = declarative_base()

class CountryInfo(Base):
    __tablename__ = 'country_info'

    id = Column(Integer, unique=True, nullable=False, primary_key=True)
    name = Column(String)
    country_info = Column(NestedMutableJson, nullable=True)

    def __repr__(self):
        return f'CountryInfo({self.name!r}, {self.country_info!r})'

engine = create_engine(dburl, future=True, echo=True)
Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)

with Session(engine) as session:
    test = CountryInfo(name='test', country_info={"name": "UK", "code": "11"})
    test2 = CountryInfo(name='test2', country_info={"name": "NL", "code": "12"})
    test3 = CountryInfo(name='test3', country_info={"name": "UK", "code": "13"})
    session.add(test)
    session.add(test2)
    session.add(test3)
    session.commit()

    stmt = select(CountryInfo).filter(CountryInfo.country_info.op('->>')('name') == "UK")
    query = session.execute(stmt).all()
    for row in query:
        print(row)
This results in the following SQL:
SELECT country_info.id, country_info.name, country_info.country_info
FROM country_info
WHERE (country_info.country_info ->> %(country_info_1)s) = %(param_1)s
with {'country_info_1': 'name', 'param_1': 'UK'}
Which results in:
(CountryInfo('test', {'name': 'UK', 'code': '11'}),)
(CountryInfo('test3', {'name': 'UK', 'code': '13'}),)

How to redefine tables with the same name in SQLAlchemy using Classical Mapping

I am using SQLAlchemy classical mapping to define a table with the same name but different columns depending on the database. I have mapped the class as explained in the docs, but I get errors every time I try to redefine the class for another database. For instance:
from sqlalchemy import (Table, MetaData, String, Column, create_engine)
from sqlalchemy.orm import mapper, sessionmaker

class MyTable(object):
    def __init__(self, *args, **kwargs):
        [setattr(self, k, v) for k, v in kwargs.items()]

default_cols = (
    Column('column1', String(20), primary_key=True),
    Column('column2', String(20))
)

def myfunc1():
    engine = create_engine('connection_to_database1')
    session = sessionmaker(bind=engine)()
    metadata = MetaData()
    mytable = Table('mytable', metadata, *default_cols)
    mapper(MyTable, mytable)
    metadata.create_all(bind=engine)

def myfunc2():
    engine = create_engine('connection_to_database2')
    session = sessionmaker(bind=engine)()
    metadata = MetaData()
    columns = list(default_cols) + [Column('column3', String(20))]
    mytable = Table('mytable', metadata, *columns)
    mapper(MyTable, mytable)
    metadata.create_all(bind=engine)

myfunc1()
myfunc2()
The error I get:
Column object 'column1' already assigned to Table 'mytable'
How is this happening if I am using completely different instances of MetaData and engines? Is there a way to achieve this?
Using the default_cols variable was actually the problem; it seems this kind of setup doesn't work unless the columns are defined individually in each function:
def myfunc1():
    engine = create_engine('connection_to_database1')
    session = sessionmaker(bind=engine)()
    metadata = MetaData()
    mytable = Table('mytable', metadata,
                    Column('column1', String(20), primary_key=True),
                    Column('column2', String(20))
                    )
    mapper(MyTable, mytable)
    metadata.create_all(bind=engine)

def myfunc2():
    engine = create_engine('connection_to_database2')
    session = sessionmaker(bind=engine)()
    metadata = MetaData()
    columns = [
        Column('column1', String(20), primary_key=True),
        Column('column2', String(20)),
        Column('column3', String(20))
    ]
    mytable = Table('mytable', metadata, *columns)
    mapper(MyTable, mytable)
    metadata.create_all(bind=engine)
Otherwise it will raise the Exception:
Column object 'column1' already assigned to Table 'mytable'
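The underlying reason is that a Column object can be attached to only one Table, so reusing the same instances in a second Table fails. A possible workaround (a sketch, not from the original answer): turn default_cols into a factory that returns fresh Column objects on every call:
from sqlalchemy import Column, String

def default_cols():
    # build brand-new Column objects each time, since a Column instance
    # can only belong to a single Table
    return (
        Column('column1', String(20), primary_key=True),
        Column('column2', String(20)),
    )

# inside each function:
# mytable = Table('mytable', metadata, *default_cols())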
I couldn't reproduce your error. To get the code to work I had to swap the order of the Mapper arguments and add primary keys to the table definitions. More significantly perhaps, I had to set one of the mappers as non-primary after getting this error:
sqlalchemy.exc.ArgumentError: Class '<class '__main__.MyTable'>' already has a primary
mapper defined. Use non_primary=True to create a non primary Mapper. clear_mappers()
will remove *all* current mappers from all classes.
from sqlalchemy import Table, MetaData, String, Column, create_engine, Integer
from sqlalchemy.orm import mapper

class MyTable(object):
    def __init__(self, *args, **kwargs):
        [setattr(self, k, v) for k, v in kwargs.items()]

def myfunc1():
    engine = create_engine("mysql+pymysql:///test")
    metadata = MetaData()
    mytable = Table(
        "mytable111",
        metadata,
        Column("id", Integer, primary_key=True),
        Column("column1", String(20)),
        Column("column2", String(20)),
    )
    mapper(MyTable, mytable)
    metadata.create_all(bind=engine)

def myfunc2():
    engine = create_engine("postgresql+psycopg2:///test")
    metadata = MetaData()
    mytable = Table(
        "mytable111",
        metadata,
        Column("id", Integer, primary_key=True),
        Column("column1", String(20)),
        Column("column2", String(20)),
        Column("column3", String(20)),
    )
    mapper(MyTable, mytable, non_primary=True)
    metadata.create_all(bind=engine)

myfunc1()
myfunc2()
Using Python3.8, SQLAlchemy 1.3.10.
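Another arrangement worth noting (a sketch, not taken from either answer): give each database its own mapped class, derived from a shared unmapped base, so that each class receives its own primary mapper and the conflict never arises.
from sqlalchemy import Table, MetaData, String, Column, Integer
from sqlalchemy.orm import mapper

class _MyTableBase:                       # unmapped shared behaviour
    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)

class MyTableDB1(_MyTableBase): pass      # hypothetical per-database classes
class MyTableDB2(_MyTableBase): pass

metadata1, metadata2 = MetaData(), MetaData()

table1 = Table("mytable", metadata1,
               Column("id", Integer, primary_key=True),
               Column("column1", String(20)),
               Column("column2", String(20)))

table2 = Table("mytable", metadata2,
               Column("id", Integer, primary_key=True),
               Column("column1", String(20)),
               Column("column2", String(20)),
               Column("column3", String(20)))

# each class gets its own primary mapper, so non_primary=True is not needed
mapper(MyTableDB1, table1)
mapper(MyTableDB2, table2)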

Cannot Insert into SQL using PySpark, but works in SQL

I have created a table below in SQL using the following:
CREATE TABLE [dbo].[Validation](
[RuleId] [int] IDENTITY(1,1) NOT NULL,
[AppId] [varchar](255) NOT NULL,
[Date] [date] NOT NULL,
[RuleName] [varchar](255) NOT NULL,
[Value] [nvarchar](4000) NOT NULL
)
NOTE the identity key (RuleId)
When inserting values into the table as below in SQL it works:
Note: the primary key is not inserted; it autofills (starting from 1 if the table is empty) and increments automatically.
INSERT INTO dbo.Validation VALUES ('TestApp','2020-05-15','MemoryUsageAnomaly','2300MB')
However, when I create a temp view on Databricks and run the same insert through PySpark as below:
%python
driver = <Driver>
url = "jdbc:sqlserver:<URL>"
database = "<db>"
table = "dbo.Validation"
user = "<user>"
password = "<pass>"
#import the data
remote_table = spark.read.format("jdbc")\
.option("driver", driver)\
.option("url", url)\
.option("database", database)\
.option("dbtable", table)\
.option("user", user)\
.option("password", password)\
.load()
remote_table.createOrReplaceTempView("YOUR_TEMP_VIEW_NAMES")
sqlcontext.sql("INSERT INTO YOUR_TEMP_VIEW_NAMES VALUES ('TestApp','2020-05-15','MemoryUsageAnomaly','2300MB')")
I get the error below:
AnalysisException: 'unknown requires that the data to be inserted have the same number of columns as the target table: target table has 5 column(s) but the inserted data has 4 column(s), including 0 partition column(s) having constant value(s).;'
Why does it work on SQL but not when passing the query through databricks? How can I insert through pyspark without getting this error?
The most straightforward solution here is to use JDBC from a Scala cell, e.g.:
%scala
import java.util.Properties
import java.sql.DriverManager
val jdbcUsername = dbutils.secrets.get(scope = "kv", key = "sqluser")
val jdbcPassword = dbutils.secrets.get(scope = "kv", key = "sqlpassword")
val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
// Create the JDBC URL without passing in the user and password parameters.
val jdbcUrl = s"jdbc:sqlserver://xxxx.database.windows.net:1433;database=AdventureWorks;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"
// Create a Properties() object to hold the parameters.
val connectionProperties = new Properties()
connectionProperties.put("user", s"${jdbcUsername}")
connectionProperties.put("password", s"${jdbcPassword}")
connectionProperties.setProperty("Driver", driverClass)
val connection = DriverManager.getConnection(jdbcUrl, jdbcUsername, jdbcPassword)
val stmt = connection.createStatement()
val sql = "INSERT INTO dbo.Validation VALUES ('TestApp','2020-05-15','MemoryUsageAnomaly','2300MB')"
stmt.execute(sql)
connection.close()
You could use pyodbc too, but the SQL Server ODBC drivers aren't installed by default, and the JDBC drivers are.
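For completeness, a pyodbc sketch (an assumption, not part of the original answer; the driver name "ODBC Driver 17 for SQL Server" is a guess and the ODBC driver must be installed on the cluster first):
import pyodbc

sql_user = dbutils.secrets.get(scope="kv", key="sqluser")
sql_password = dbutils.secrets.get(scope="kv", key="sqlpassword")

# open a direct ODBC connection and run the same INSERT as the SQL example
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=xxxx.database.windows.net,1433;"
    "DATABASE=AdventureWorks;"
    f"UID={sql_user};PWD={sql_password}"
)
cursor = conn.cursor()
cursor.execute("INSERT INTO dbo.Validation VALUES ('TestApp','2020-05-15','MemoryUsageAnomaly','2300MB')")
conn.commit()
conn.close()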
A Spark solution would be to create a view in SQL Server that omits the identity column and insert against that, so the column counts match, e.g.:
create view Validation2 as
select AppId,Date,RuleName,Value
from Validation
then
tableName = "Validation2"
df = spark.read.jdbc(url=jdbcUrl, table=tableName, properties=connectionProperties)
df.createOrReplaceTempView(tableName)
sqlContext.sql("INSERT INTO Validation2 VALUES ('TestApp','2020-05-15','MemoryUsageAnomaly','2300MB')")
If you want to encapsulate the Scala code and call it from another language (like Python), you can use a Scala package cell, e.g.:
%scala
package example

import java.util.Properties
import java.sql.DriverManager

object JDBCFacade
{
  def runStatement(url: String, sql: String, userName: String, password: String): Unit =
  {
    val connection = DriverManager.getConnection(url, userName, password)
    val stmt = connection.createStatement()
    try
    {
      stmt.execute(sql)
    }
    finally
    {
      connection.close()
    }
  }
}
and then you can call it like this:
jdbcUsername = dbutils.secrets.get(scope = "kv", key = "sqluser")
jdbcPassword = dbutils.secrets.get(scope = "kv", key = "sqlpassword")
jdbcUrl = "jdbc:sqlserver://xxxx.database.windows.net:1433;database=AdventureWorks;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"
sql = "select 1 a into #foo from sys.objects"
sc._jvm.example.JDBCFacade.runStatement(jdbcUrl,sql, jdbcUsername, jdbcPassword)

Read error with spark.read against SQL Server table (via JDBC Connection)

I have a problem in Zeppelin when I try to create a DataFrame by reading directly from a SQL Server table: I don't know how to read a SQL column of the geography type.
SQL table
This is the code that I am using, and the error that I obtain.
Create JDBC connection
import org.apache.spark.sql.SaveMode
import java.util.Properties
val jdbcHostname = "XX.XX.XX.XX"
val jdbcDatabase = "databasename"
val jdbcUsername = "user"
val jdbcPassword = "XXXXXXXX"
// Create the JDBC URL without passing in the user and password parameters.
val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname};database=${jdbcDatabase}"
// Create a Properties() object to hold the parameters.
val connectionProperties = new Properties()
connectionProperties.put("user", s"${jdbcUsername}")
connectionProperties.put("password", s"${jdbcPassword}")
connectionProperties.setProperty("Driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
Read from SQL
import spark.implicits._
val table = "tablename"
val postcode_polygons = spark.
read.
jdbc(jdbcUrl, table, connectionProperties)
Error
import spark.implicits._
table: String = Lookup.Postcode50m_Lookup
java.sql.SQLException: Unsupported type -158
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:233)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:290)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:290)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:289)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:64)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:114)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:193)
Adding to thebluephantom's answer: have you tried changing the type to string as below and then loading the table?
val jdbcDF = spark.read.format("jdbc")
  .option("dbtable", "(select toString(SData) as s_sdata, toString(CentroidSData) as s_centroidSdata from table) t")
  .option("user", "user_name")
  .option("other options")
  .load()
This is the final solution in my case. The idea from moasifk is correct, but in my code I cannot use the "toString" function, so I applied the same idea with different syntax.
import spark.implicits._
val tablename = "Lookup.Postcode50m_Lookup"
val postcode_polygons = spark.
read.
jdbc(jdbcUrl, table=s"(select PostcodeNoSpaces, cast(SData as nvarchar(4000)) as SData from $tablename) as postcode_table", connectionProperties)
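For reference, the same cast-in-a-subquery idea written in PySpark (a sketch, assuming the jdbcUrl and credentials defined earlier in this question are redefined in the Python cell):
tablename = "Lookup.Postcode50m_Lookup"
query = (f"(select PostcodeNoSpaces, cast(SData as nvarchar(4000)) as SData "
         f"from {tablename}) as postcode_table")

postcode_polygons = (spark.read.format("jdbc")
    .option("url", jdbcUrl)
    .option("dbtable", query)              # push the cast down so geography arrives as text
    .option("user", jdbcUsername)
    .option("password", jdbcPassword)
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load())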

SQLAlchemy Declarative - schemas in SQL Server and foreign/primary keys

I'm struggling to create tables that belong to a schema in a SQL Server database and to ensure that primary/foreign keys work correctly.
I'm looking for some example code to illustrate how this is done.
The ingredients needed for this are __table_args__ and the use of the schema prefix on the ForeignKey:
from sqlalchemy import Column, Integer, String, ForeignKey
from sqlalchemy.orm import relationship, sessionmaker
from sqlalchemy.ext.declarative import declarative_base

DBSession = sessionmaker(bind=engine)  # engine assumed to be created elsewhere
session = DBSession()

Base = declarative_base()

class Table1(Base):
    __tablename__ = 'table1'
    __table_args__ = {"schema": 'my_schema'}

    id = Column(Integer, primary_key=True)
    col1 = Column(String(150))
    col2 = Column(String(100))
    reviews = relationship("Table2", cascade="delete")

class Table2(Base):
    __tablename__ = 'table2'
    __table_args__ = {"schema": 'my_schema'}

    id = Column(Integer, primary_key=True)
    col2 = Column(String(100))
    key = Column(Integer, ForeignKey("my_schema.table1.id"), index=True)
    premise = relationship("Table1")

Base.metadata.create_all(bind=engine)
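A brief usage sketch (an addition, assuming the engine and session defined above): once the ForeignKey carries the my_schema. prefix, joins across the schema-qualified tables work as usual.
# hypothetical usage: join Table2 back to Table1 through the schema-qualified foreign key
rows = (
    session.query(Table2)
    .join(Table1, Table2.key == Table1.id)
    .all()
)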
