Joining tables across database boundaries using the Snowflake SQLAlchemy connector - snowflake-cloud-data-platform

Using snowflake-sqlalchemy, is there a way to use a declarative base to join tables across database boundaries? e.g.
from sqlalchemy import MetaData
from sqlalchemy.orm import declarative_base

# This table is in database1
meta = MetaData(schema="Schema1")
Base = declarative_base(metadata=meta)

class Table1(Base):
    __tablename__ = 'Table1'
    ...

# This table is in database2
meta = MetaData(schema="Schema2")
Base = declarative_base(metadata=meta)

class Table2(Base):
    __tablename__ = 'Table2'
    ...

# I want to do this...
session.query(Table1).join(Table2).filter(Table1.id > 1).all()

# The engine specifies database1 as the default database, so the query builder
# assumes Table2 is also in database1.
The account specified in the engine connection params has access to both databases. I would prefer not to use raw SQL for this, for reasons.
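Not a confirmed fix, but one approach worth trying, since the connecting account can see both databases: give each MetaData a fully qualified "<database>.<schema>" string, so the compiled SQL names every table by its full path and the join can cross the database boundary. The sketch below makes some assumptions: the join column table1_id and the connection parameters are hypothetical, the join needs an explicit ON clause because no foreign key spans the two metadatas, and whether the dialect splits the dotted schema correctly is worth verifying against your snowflake-sqlalchemy version.
from sqlalchemy import Column, Integer, MetaData, create_engine
from sqlalchemy.orm import Session, declarative_base
from snowflake.sqlalchemy import URL

# Qualify each schema as "<database>.<schema>" so the compiled SQL uses the full path.
Base1 = declarative_base(metadata=MetaData(schema="DATABASE1.SCHEMA1"))
Base2 = declarative_base(metadata=MetaData(schema="DATABASE2.SCHEMA2"))

class Table1(Base1):
    __tablename__ = 'Table1'
    id = Column(Integer, primary_key=True)

class Table2(Base2):
    __tablename__ = 'Table2'
    id = Column(Integer, primary_key=True)
    table1_id = Column(Integer)  # hypothetical join column

engine = create_engine(URL(
    account="<account>", user="<user>", password="<password>",
    database="DATABASE1", warehouse="<warehouse>",  # database1 stays the default
))

with Session(engine) as session:
    # No foreign key crosses the databases, so pass join() an explicit ON clause.
    rows = (session.query(Table1)
                   .join(Table2, Table2.table1_id == Table1.id)
                   .filter(Table1.id > 1)
                   .all())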

Related

How to change column type in SQLAlchemy table?

I'm copying a table from SQL Server to Firebird. I have a column of type BIT in SQL Server, but Firebird doesn't have this type. How can I change the column's type so the table can be created in my Firebird database?
from sqlalchemy import create_engine, Column, MetaData, Table
from sqlalchemy.orm import Session

# Reflect the table from SQL Server
source_engine = create_engine(connection_url_source)
dest_engine = create_engine(connection_url_dest)

metadata = MetaData()
table = Table('my_table', metadata, autoload=True, autoload_with=source_engine, schema='my_schema')

session = Session(source_engine)
query = session.query(table)

# Create the table in the Firebird database
new_metadata = MetaData(bind=dest_engine)
columns = [Column(desc['name'], desc['type']) for desc in query.column_descriptions]
column_names = [desc['name'] for desc in query.column_descriptions]
table_new = Table("my_table", new_metadata, *columns)
table_new.create(dest_engine)
Here I receive the error:
sqlalchemy.exc.CompileError: (in table 'my_table', column 'my_column'): Compiler <sqlalchemy_firebird.base.FBTypeCompiler object at 0x00000061ADAC8D60> can't render element of type BIT
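The question doesn't include an accepted fix, but a common workaround is to substitute a type Firebird understands before building the destination table: Firebird 3+ has a native BOOLEAN, and on older versions SMALLINT is a usual stand-in for BIT. A minimal sketch along those lines, reusing query, new_metadata and dest_engine from the code above (the SMALLINT choice is an assumption):
from sqlalchemy import Column, SmallInteger, Table
from sqlalchemy.dialects.mssql import BIT

# Swap SQL Server BIT columns for SMALLINT so the Firebird type compiler can render them.
def to_firebird_type(sa_type):
    return SmallInteger() if isinstance(sa_type, BIT) else sa_type

columns = [Column(desc['name'], to_firebird_type(desc['type']))
           for desc in query.column_descriptions]
table_new = Table("my_table", new_metadata, *columns)
table_new.create(dest_engine)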

How to set the CONNECTION_OPTIONS = 'ApplicationIntent=ReadOnly' for elastic queries on Azure SQL?

I am accessing the other database using elastic queries. The data source was created like this:
CREATE EXTERNAL DATA SOURCE TheCompanyQueryDataSrc WITH (
    TYPE = RDBMS,
    --CONNECTION_OPTIONS = 'ApplicationIntent=ReadOnly',
    CREDENTIAL = ElasticDBQueryCred,
    LOCATION = 'thecompanysql.database.windows.net',
    DATABASE_NAME = 'TheCompanyProd'
);
To reduce the database load, a read-only replica was created and should be used. As far as I understand it, I should add CONNECTION_OPTIONS = 'ApplicationIntent=ReadOnly' (commented out in the code above). However, all I get is the error Incorrect syntax near 'CONNECTION_OPTIONS'.
Both databases (the one that defines the connection and external tables, and the other one that is to be read-only) are on the same server (thecompanysql.database.windows.net). Both are set to compatibility level SQL Server 2019 (150).
What else should I set to make it work?
The CREATE EXTERNAL DATA SOURCE syntax doesn't support the option CONNECTION_OPTIONS = 'ApplicationIntent=ReadOnly', so we can't use it in these statements.
If you want to achieve that read-only behavior, the way to do it is to use a user account that only has read-only (db_datareader) permission to log in to the external database.
For example:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>';

CREATE DATABASE SCOPED CREDENTIAL SQL_Credential
WITH
    IDENTITY = '<username>',  -- read-only user account
    SECRET = '<password>';

CREATE EXTERNAL DATA SOURCE MyElasticDBQueryDataSrc
WITH (
    TYPE = RDBMS,
    LOCATION = '<server_name>.database.windows.net',
    DATABASE_NAME = 'Customers',
    CREDENTIAL = SQL_Credential
);
Since the option is not supported, we can't use it with elastic query. The only way to connect to the Azure SQL data with ApplicationIntent=ReadOnly is from the client, e.g. SSMS: [screenshot of the SSMS connection dialog omitted]
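Not part of the original answer, but for illustration: ApplicationIntent is an ordinary client connection-string keyword, so a Python client can request the read-only intent directly. The driver name and credentials below are placeholders/assumptions:
import pyodbc

# Illustration only: ApplicationIntent is set on the client connection string,
# not on the external data source definition.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=thecompanysql.database.windows.net;"
    "DATABASE=TheCompanyProd;"
    "UID=<readonly-user>;PWD=<password>;"
    "ApplicationIntent=ReadOnly;"
)
print(conn.execute("SELECT @@VERSION").fetchone()[0])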
HTH.

How to deploy PostgreSQL schema to Google Cloud SQL database using Pulumi?

I'm trying to initialize a managed PostgreSQL database using Pulumi. The PostgreSQL server itself is hosted and managed by Google Cloud SQL, but I set it up using Pulumi.
I can successfully create the database, but I'm stumped how to actually initialize it with my schemas, users, tables, etc. Does anyone know how to achieve this?
I believe I need to use the Postgres provider, similar to what they do for MySQL in this tutorial or this example. The below code shows what I have so far:
# Create database resource on Google Cloud
instance = sql.DatabaseInstance(  # This works
    "db-instance",
    name="db-instance",
    database_version="POSTGRES_12",
    region="europe-west4",
    project=project,
    settings=sql.DatabaseInstanceSettingsArgs(
        tier="db-g1-small",  # Or: db-n1-standard-4
        activation_policy="ALWAYS",
        availability_type="REGIONAL",
        backup_configuration={
            "enabled": True,
        },
    ),
    deletion_protection=False,
)

database = sql.Database(  # This works as well
    "db",
    name="db",
    instance=instance.name,
    project=project,
    charset="UTF-8",
)

# The below should create a table such as
# CREATE TABLE users (id uuid, email varchar(255), api_key varchar(255));
# How to tell it to use this SQL script?
# How to connect it to the above created PostgreSQL resource?
postgres = pg.Database(  # This doesn't work
    "users",
    name="users",
    is_template=False,
)
Here is sample code with an explanation of how we set everything up, including creating/deleting tables with Pulumi.
The code will look like this:
# Postgres: https://www.pulumi.com/docs/reference/pkg/postgresql/
# Provider: https://www.pulumi.com/docs/reference/pkg/postgresql/provider/
import pulumi
import pulumi_postgresql as postgres
import pg8000.native

# myinstance, users and the postgres_sql_* values are defined elsewhere
# in __main__.py / the Pulumi config.
postgres_provider = postgres.Provider("postgres-provider",
    host=myinstance.public_ip_address,
    username=users.name,
    password=users.password,
    port=5432,
    superuser=True)

# Creates a database on the instance in Google Cloud with the provider we created
mydatabase = postgres.Database("pulumi-votes-database",
    encoding="UTF8",
    opts=pulumi.ResourceOptions(provider=postgres_provider)
)

# Table creation/deletion is via pg8000: https://github.com/tlocke/pg8000
def tablecreation(mytable_name):
    print("tablecreation with:", mytable_name)
    create_first_part = "CREATE TABLE IF NOT EXISTS"
    create_sql_query = "(id serial PRIMARY KEY, email VARCHAR ( 255 ) UNIQUE NOT NULL, api_key VARCHAR ( 255 ) NOT NULL)"
    create_combined = f'{create_first_part} {mytable_name}{create_sql_query}'
    print("tablecreation create_combined_sql:", create_combined)
    myconnection = pg8000.native.Connection(
        host=postgres_sql_instance_public_ip_address,
        port=5432,
        user=postgres_sql_user_username,
        password=postgres_sql_user_password,
        database=postgres_sql_database_name
    )
    print("tablecreation starting")
    cursor = myconnection.run(create_combined)
    print("Table Created:", mytable_name)
    selectversion = 'SELECT version();'
    cursor2 = myconnection.run(selectversion)
    print("SELECT Version:", cursor2)

def droptable(table_to_drop):
    first_part_of_drop = "DROP TABLE IF EXISTS "
    last_part_of_drop = ' CASCADE'
    combinedstring = f'{first_part_of_drop} {table_to_drop} {last_part_of_drop}'
    conn = pg8000.native.Connection(
        host=postgres_sql_instance_public_ip_address,
        port=5432,
        user=postgres_sql_user_username,
        password=postgres_sql_user_password,
        database=postgres_sql_database_name
    )
    print("droptable delete_combined_sql ", combinedstring)
    cursor = conn.run(combinedstring)
    print("droptable completed ", cursor)
After bringing the infrastructure up for the first time via pulumi up -y, you can uncomment the following code block in __main__.py, add the configs for the PostgreSQL server via the CLI, and then run pulumi up -y again:
create_table1 = "votertable"
creating_table = tablecreation(create_table1)
print("")
create_table2 = "regionals"
creating_table = tablecreation(create_table2)
print("")
drop_table = "table2"
deleting_table = droptable(drop_table)
The settings for the table are in the Pulumi.dev.yaml file and are set via pulumi config set.
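For completeness, a minimal sketch (not from the original answer, and the config key names are hypothetical) of how the values set via pulumi config set can be read back in __main__.py so the pg8000 helpers above can use them:
import pulumi

# Hypothetical key names; use whatever keys were passed to `pulumi config set`.
config = pulumi.Config()
postgres_sql_user_username = config.require("db_username")
# Secrets can be stored with `pulumi config set --secret` and read with
# config.require_secret(), which returns a Pulumi Output rather than a plain string.
postgres_sql_user_password = config.require("db_password")
postgres_sql_database_name = config.require("db_name")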

How to insert data into a table with a SELECT query in Databricks using a Spark temp table

I would like to insert the results of a Spark table into a new SQL Synapse table using SQL within Azure Databricks.
I have tried the following explanation [https://learn.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-table-datasource] but I'm having no luck.
The Synapse table must be created as the result of a SELECT statement. The source should be a Spark / Databricks temporary view or a Parquet source.
e.g. Temp Table
# Load Taxi Location Data from Azure Synapse Analytics
jdbcUrl = "jdbc:sqlserver://synapsesqldbexample.database.windows.net:number;
database=SynapseDW" #Replace "suffix" with your own
connectionProperties = {
"user" : "usernmae1",
"password" : "password2",
"driver" : "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}
pushdown_query = '(select * from NYC.TaxiLocationLookup) as t'
dfLookupLocation = spark.read.jdbc(url=jdbcUrl, table=pushdown_query, properties=connectionProperties)
dfLookupLocation.createOrReplaceTempView('NYCTaxiLocation')
display(dfLookupLocation)
e.g. Source Synapse DW
Server: synapsesqldbexample.database.windows.net
Database: [SynapseDW]
Schema: [NYC]
Table: [TaxiLocationLookup]
Sink / Destination Table (not yet in existence):
Server: synapsesqldbexample.database.windows.net
Database: [SynapseDW]
Schema: [NYC]
New Table: [TEST_NYCTaxiData]
SQL Statement I tried:
%sql
CREATE TABLE if not exists TEST_NYCTaxiLocation
select *
from NYCTaxiLocation
limit 100
If you use the com.databricks.spark.sqldw driver, you will need an Azure Storage Account and a Container already set up. Once those are in place, it is actually very easy to achieve this:
Configure your Blob credentials in Azure Databricks (I go with the in-notebook approach)
Create your JDBC connection string and Blob path
Read your SELECT statement into an RDD/DataFrame
Push the DataFrame down to Azure Synapse using the .write function
CONFIGURE BLOB CREDENTIALS
spark.conf.set(
    "fs.azure.account.key.<storage-account>.blob.core.windows.net",
    "<storage-account-access-key>")
CONFIGURE JDBC AND BLOB PATH
jdbc = "jdbc:sqlserver://.database.windows.net:1433;database=;user=#;password=;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"
blob = "wasbs://#.blob.core.windows.net/"
READ DATA FROM SYNAPSE INTO DATAFRAME
df = (spark.read
    .format("com.databricks.spark.sqldw")
    .option("url", jdbc)
    .option("tempDir", blob)
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("Query", "SELECT TOP 1000 * FROM <your-table> ORDER BY NEWID()")
    .load())
WRITE DATA FROM DATAFRAME BACK TO AZURE SYNAPSE
(df.write
    .format("com.databricks.spark.sqldw")
    .option("url", jdbc)
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "YOURTABLENAME")
    .option("tempDir", blob)
    .mode("overwrite")
    .save())
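A quick sanity check (not in the original answer) is to read the new table back through the same connector, reusing the jdbc and blob variables above; replace YOURTABLENAME with the table you just wrote:
check = (spark.read
    .format("com.databricks.spark.sqldw")
    .option("url", jdbc)
    .option("tempDir", blob)
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "YOURTABLENAME")
    .load())
display(check)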
Another option besides @JPVoogt's solution is to use CTAS in the Synapse pool after you've created your Parquet files in the storage account. You could use either the COPY command or external tables (a sketch of the Parquet export step follows the references below).
Some references:
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-cetas
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/quickstart-bulk-load-copy-tsql
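For completeness, a minimal sketch of the Parquet half of that approach (the storage account, container and folder names are placeholders): write the temp view out as Parquet from Databricks, then run CETAS or COPY INTO inside the Synapse pool against that location.
# Export the Spark temp view as Parquet; Synapse then ingests it via CETAS or COPY INTO.
(spark.table("NYCTaxiLocation")
      .write
      .mode("overwrite")
      .parquet("abfss://<container>@<storage-account>.dfs.core.windows.net/nyc/taxilocation/"))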

May I have a join between tables in different databases while using Sequel?

I'm building integration software between two different systems. I have Clients in one database and Groups in another. One Client can be in multiple Groups, and one Group can have multiple Clients, so I created an intermediate table named clients_groups to represent this relation.
My models look like this:
DB1 = Sequel.connect(adapter: 'postgresql'...)
DB2 = Sequel.connect(adapter: 'tinytds'...)

class Client < Sequel::Model
  set_dataset DB1[:clients]
  many_to_many :groups, left_key: :client_id, right_key: :group_id, join_table: :clients_groups, left_primary_key_column: :id
end

class Group < Sequel::Model
  set_dataset DB2[:groups]
  many_to_many :clients, left_key: :group_id, right_key: :client_id, join_table: :clients_groups, right_primary_key_column: :id
end
This statement works:
Client.last.groups
While this throws an error:
Group.last.clients # => Sequel::DatabaseError: TinyTds::Error: Invalid object name 'CLIENTS_GROUPS'
What am I doing wrong?
