Loading data into Snowflake from shared locations automatically - snowflake-cloud-data-platform

Is it possible to load data into Snowflake from shared locations automatically, on a schedule, rather than having to load the data in manually?

It sounds like you should be looking at Snowpipe for that. There's no need for schedules: it loads new files automatically as soon as they become available in cloud storage.
See also the continuous loading guide in Snowflake's documentation.
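For illustration, here is a minimal sketch of setting up an auto-ingest Snowpipe through the Snowflake Python connector. The connection parameters and the my_stage, my_table, and my_pipe names are placeholders, and it assumes an external stage whose cloud-storage event notifications are already configured:

import snowflake.connector

# Placeholder credentials; the stage, table, and pipe names below are made up.
conn = snowflake.connector.connect(
    user='username',
    password='password',
    account='accountname',
    warehouse='warehouse',
    database='database',
    schema='public',
)
cur = conn.cursor()
try:
    # AUTO_INGEST = TRUE makes Snowpipe load new files as soon as the storage
    # event notifications arrive, so no schedule is needed.
    cur.execute("""
        CREATE PIPE IF NOT EXISTS my_pipe
          AUTO_INGEST = TRUE
        AS
          COPY INTO my_table
          FROM @my_stage
          FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """)
finally:
    cur.close()
    conn.close()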

First, you should process your Excel file using Python, then load it into Snowflake.
Use the code below:
from sqlalchemy import create_engine
import pandas as pd

snowflake_username = 'username'
snowflake_password = 'password'
snowflake_account = 'accountname'
snowflake_warehouse = 'warehouse'
snowflake_database = 'database'
snowflake_schema = 'public'

# Requires the snowflake-sqlalchemy package; note the '@' between credentials and account.
engine = create_engine(
    'snowflake://{user}:{password}@{account}/{db}/{schema}?warehouse={warehouse}'.format(
        user=snowflake_username,
        password=snowflake_password,
        account=snowflake_account,
        db=snowflake_database,
        schema=snowflake_schema,
        warehouse=snowflake_warehouse,
    ),
    echo_pool=True, pool_size=10, max_overflow=20
)

try:
    connection = engine.connect()
    # df_sensor is the DataFrame read from the Excel file (see the read_excel sketch below).
    df_sensor.columns = map(str.upper, df_sensor.columns)
    df_sensor.to_sql('tb_equipments'.lower(), con=connection,
                     schema='public', index=False, if_exists='append', chunksize=16000)
    results = connection.execute('select count(1) from tb_equipments').fetchone()
    print('\nTotal rows inserted: ', results[0], '\n')
finally:
    connection.close()
    engine.dispose()
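For completeness, df_sensor above is assumed to be the DataFrame produced from the Excel file; a minimal sketch of loading it, where the file path and sheet name are placeholders I made up:

import pandas as pd

# Placeholder path and sheet name; reading .xlsx files requires the openpyxl package.
df_sensor = pd.read_excel('/shared/location/equipment.xlsx', sheet_name='Sheet1')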

Related

Unable to create external table in snowflake

I am trying to create an external table in Snowflake and it fails with the error below:
SQL compilation error: invalid property 'auto_refresh' for 'different storage type from cloud provider'
Here are the queries I am trying:
CREATE OR REPLACE EXTERNAL TABLE DEV_EXT_TABLE
  WITH LOCATION = @XXX/dev1/metadata/
  FILE_FORMAT = (TYPE = PARQUET SKIP_HEADER = 3);
and
CREATE OR REPLACE EXTERNAL TABLE DEV_EXT_TABLE
  AUTO_REFRESH = TRUE
  WITH LOCATION = @XXX/dev1/metadata/
  FILE_FORMAT = (TYPE = PARQUET SKIP_HEADER = 3);
My account is in AWS whereas the stage is in Google Cloud Platform, and this seems to be supported:
https://docs.snowflake.com/en/user-guide/tables-external-auto.html
Also, does Snowflake support auto-refresh in cross-cloud deployments?
Regarding the following:
=> My account is in AWS whereas the stage is in Google Cloud Platform and this seems to be supported.
https://docs.snowflake.com/en/user-guide/tables-external-auto.html
The parameter controlling this feature has not been fully rolled out yet; however, the documentation implies that the feature is GA, which causes confusion. Please open a case with Snowflake Support to have the parameter enabled so that cross-cloud auto-refresh is allowed.
Additionally, SKIP_HEADER is a CSV-only option; it is not available for PARQUET.
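As a sketch only, and not something stated in the answer above: until Support enables the cross-cloud parameter, one workaround to consider is creating the table with AUTO_REFRESH = FALSE, dropping the CSV-only SKIP_HEADER option, and refreshing the metadata manually. Shown here through the Snowflake Python connector with placeholder credentials; @XXX is the question's own stage placeholder:

import snowflake.connector

# Placeholder credentials, following the same connector pattern as the Snowpipe sketch above.
conn = snowflake.connector.connect(user='username', password='password', account='accountname')
cur = conn.cursor()
try:
    # SKIP_HEADER removed (it applies to CSV only); AUTO_REFRESH disabled until
    # Support enables cross-cloud auto-refresh for the account.
    cur.execute("""
        CREATE OR REPLACE EXTERNAL TABLE DEV_EXT_TABLE
          WITH LOCATION = @XXX/dev1/metadata/
          AUTO_REFRESH = FALSE
          FILE_FORMAT = (TYPE = PARQUET)
    """)
    # Manually register the files currently under the stage path.
    cur.execute("ALTER EXTERNAL TABLE DEV_EXT_TABLE REFRESH")
finally:
    cur.close()
    conn.close()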

How to set the CONNECTION_OPTIONS = 'ApplicationIntent=ReadOnly' for elastic queries on Azure SQL?

I am accessing the other database using elastic queries. The data source was created like this:
CREATE EXTERNAL DATA SOURCE TheCompanyQueryDataSrc WITH (
TYPE = RDBMS,
--CONNECTION_OPTIONS = 'ApplicationIntent=ReadOnly',
CREDENTIAL = ElasticDBQueryCred,
LOCATION = 'thecompanysql.database.windows.net',
DATABASE_NAME = 'TheCompanyProd'
);
To reduce the database load, a read-only replica was created and should be used. As far as I understand it, I should add CONNECTION_OPTIONS = 'ApplicationIntent=ReadOnly' (commented out in the code above). However, all I get is Incorrect syntax near 'CONNECTION_OPTIONS'.
Both databases (the one that defines the connection and external tables, and the other one that is to be read-only) are on the same server (thecompanysql.database.windows.net). Both are set to compatibility level SQL Server 2019 (150).
What else should I set to make it work?
The CREATE EXTERNAL DATA SOURCE syntax doesn't support the option CONNECTION_OPTIONS = 'ApplicationIntent=ReadOnly', so we can't use it in these statements.
If you want to achieve read-only access, use a credential whose user account has only read-only (db_datareader) permission to log in to the external database.
For example:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>';

CREATE DATABASE SCOPED CREDENTIAL SQL_Credential
WITH
    IDENTITY = '<username>',   -- read-only user account
    SECRET = '<password>';

CREATE EXTERNAL DATA SOURCE MyElasticDBQueryDataSrc
WITH
(   TYPE = RDBMS,
    LOCATION = '<server_name>.database.windows.net',
    DATABASE_NAME = 'Customers',
    CREDENTIAL = SQL_Credential
);
Since the option is not supported, we can't use it with elastic query. When connecting to the Azure SQL database with SSMS, ApplicationIntent=ReadOnly can instead be set in the connection dialog's Additional Connection Parameters tab.
HTH.

DolphinDB error: SegmentedTable does not support direct access. Please use sql query to retrieve data

dbDir = '/tests/dolphindb/valueDB'
devDir = '/tests/dolphindb/dev.csv'
db = database(dbDir)
dev = db.loadTable(`dev)
saveText(dev, devDir)
I want to export table "dev" as a CSV file, but I encountered this error message:
Execution was completed with exception
SegmentedTable does not support direct access. Please use sql query to retrieve data
I wonder if I have to load all the data into memory to export it as a CSV file.
Yes, the input table for saveText must be a non-partitioned table, so retrieve the data with a SQL query first (as the error message suggests).
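A minimal sketch under assumptions, using the DolphinDB Python API (the dolphindb package): the host, port, and credentials are placeholders, and the whole result set is pulled into client memory before being written out with pandas, so this only suits tables that fit in memory:

import dolphindb as ddb

# Placeholder connection details for the DolphinDB server.
s = ddb.session()
s.connect("localhost", 8848, "admin", "123456")

# Retrieve the partitioned table with a SQL query (as the error suggests);
# s.run returns the result as a pandas DataFrame.
df = s.run("select * from loadTable('/tests/dolphindb/valueDB', `dev)")
df.to_csv("/tests/dolphindb/dev.csv", index=False)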

pandas to_sql for MS SQL

I'm trying to save a dataframe to MS SQL using Windows authentication. I've tried engine, engine.connect(), and engine.raw_connection(), and they all throw errors:
'Engine' object has no attribute 'cursor', 'Connection' object has no attribute 'cursor', and Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': ..., respectively.
import urllib.parse
import pandas as pd
from sqlalchemy import create_engine

params = urllib.parse.quote('DRIVER={ODBC Driver 13 for SQL Server};'
                            'SERVER=server;'
                            'DATABASE=db;'
                            'TRUSTED_CONNECTION=Yes;')
engine = create_engine('mssql+pyodbc:///?odbc_connect=%s' % params)
# df and table_name are my DataFrame and target table name
df.to_sql(table_name, engine, index=False)
This will do exactly what you want.
# Insert from dataframe to table in SQL Server
import time
import pandas as pd
import pyodbc

# create timer
start_time = time.time()

df = pd.read_csv("C:\\your_path\\CSV1.csv")

conn_str = (
    r'DRIVER={SQL Server Native Client 11.0};'
    r'SERVER=name_of_your_server;'
    r'DATABASE=name_of_your_database;'
    r'Trusted_Connection=yes;'
)
cnxn = pyodbc.connect(conn_str)
cursor = cnxn.cursor()

# insert the dataframe row by row (old-school INSERT INTO)
for index, row in df.iterrows():
    cursor.execute('INSERT INTO dbo.Table_1([Name],[Address],[Age],[Work]) values (?,?,?,?)',
                   row['Name'],
                   row['Address'],
                   row['Age'],
                   row['Work'])
cnxn.commit()
cursor.close()
cnxn.close()

# see total time to do insert
print("%s seconds ---" % (time.time() - start_time))
Here is an update to my original answer. Basically, the code above is the old-school way of doing things (INSERT INTO). I recently stumbled upon a super-easy, scalable, and controllable way of pushing data from Python to SQL Server. Try the sample code and post back if you have additional questions.
import pandas as pd
from sqlalchemy import create_engine

# Requires pyodbc to be installed; spaces in the driver name are escaped with '+'
# and additional options are joined with '&'.
engine = create_engine(
    "mssql+pyodbc://your_server_name/your_database_name"
    "?driver=SQL+Server+Native+Client+11.0&trusted_connection=yes"
)

# ... dataframe here ...

dataframe.to_sql(x, engine, if_exists='append', index=True)

dataframe is pretty self-explanatory.
x = the name you want your table to have in SQL Server.
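As a side note that is not part of the answer above: with SQLAlchemy 1.3+ and pyodbc, passing fast_executemany=True to create_engine usually speeds up to_sql considerably for large DataFrames. A minimal sketch with placeholder server, database, and table names:

import pandas as pd
from sqlalchemy import create_engine

# fast_executemany batches the parameterized INSERTs on the pyodbc side,
# which is much faster than row-by-row inserts for large frames.
engine = create_engine(
    "mssql+pyodbc://your_server_name/your_database_name"
    "?driver=ODBC+Driver+17+for+SQL+Server&trusted_connection=yes",
    fast_executemany=True,
)

df = pd.DataFrame({"Name": ["a", "b"], "Age": [30, 40]})  # toy example
df.to_sql("Table_1", engine, schema="dbo", if_exists="append", index=False)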

whoosh search after database migration

I migrated my database from sqlite3 to postgresql with the following three steps
I created a SQLite dump file with
sqlite3 app.db .dump > app_dump.sql
I created a PostgreSQL database and initialized it from my models.py file with
python manage.py db init
where manage.py is
SQLALCHEMY_DATABASE_URI = 'postgresql+psycopg2://appuser:777@localhost/app_db'

migrate = Migrate(app, db)
manager = Manager(app)
manager.add_command('db', MigrateCommand)

if __name__ == '__main__':
    manager.run()
I used only the insert lines in the dump file to populate the new database.
This worked well, however I do want to have a whoosh search in my User table.
My models.py looks like this
if sys.version_info >= (3, 0):
    enable_search = False
else:
    enable_search = True
    import flask.ext.whooshalchemy as whooshalchemy

class User(db.Model):
    __searchable__ = ['username', 'id']
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(100), index=True)

if enable_search:
    whooshalchemy.whoosh_index(app, User)
However, the search does not seem to work at all. I fear that the table has not been properly indexed. Does anybody know how I can fix this?
thanks
carl
EDIT:
Solved it... have a look at
https://gist.github.com/davb5/21fbffd7a7990f5e066c
By default, flask_whooshalchemy indexes a record when the SQLAlchemy session commits. So if you want to index all of the existing, not-yet-indexed data, you can try my fork, flask_whooshalchemyplus; see its manual indexing introduction.
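If I recall the package correctly, flask_whooshalchemyplus exposes an index_all helper for exactly this kind of bulk reindex; treat the snippet below as a sketch, verify the helper against the package's README, and note that the myapp import path is a placeholder:

# Sketch only: bulk-index existing rows with flask_whooshalchemyplus.
from flask_whooshalchemyplus import index_all

from myapp import app  # placeholder: your Flask application object

index_all(app)  # indexes existing rows for all models that declare __searchable__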
