I am trying to connect to Snowflake using the Python connector's pandas support.
I use the Anaconda distribution on Windows. I uninstalled the existing connector and pyarrow and reinstalled them using the instructions on this page: https://docs.snowflake.com/en/user-guide/python-connector-pandas.html
I have the following versions:
pandas 1.0.4 py37h47e9c7a_0
pip 20.1.1 py37_1
pyarrow 0.17.1 pypi_0 pypi
python 3.7.7 h81c818b_4
snowflake-connector-python 2.2.7 pypi_0 pypi
When running step 2 of this document: https://docs.snowflake.com/en/user-guide/python-connector-install.html, I get: 4.21.2
On attempting to use fetch_pandas_all() I get an error: NotSupportedError: Unknown error
The code I am using is as follows:
import snowflake.connector
import pandas as pd

SNOWFLAKE_DATA_SOURCE = '<DB>.<Schema>.<VIEW>'

query = '''
select *
from table(%s)
LIMIT 10;
'''

def create_snowflake_connection():
    conn = snowflake.connector.connect(
        user='MYUSERNAME',
        account='MYACCOUNT',
        authenticator='externalbrowser',
        warehouse='<WH>',
        database='<DB>',
        role='<ROLE>',
        schema='<SCHEMA>'
    )
    return conn

con = create_snowflake_connection()
cur = con.cursor()
temp = cur.execute(query, (SNOWFLAKE_DATA_SOURCE,)).fetch_pandas_all()
cur.close()
I am wondering what else I need to install/upgrade/check in order to get fetch_pandas_all() to work?
Edit: After posting an answer below, I have realised that the issue is with SSO (single sign-on) via authenticator='externalbrowser'. When using a stand-alone account I can fetch.
I found a workaround that avoids the SSO error by relying on fetchall() instead of fetch_pandas_all():
try:
    cur.execute(sql)
    all_rows = cur.fetchall()
    num_fields = len(cur.description)
    field_names = [i[0] for i in cur.description]
finally:
    cur.close()
    con.close()

df = pd.DataFrame(all_rows)
df.columns = field_names
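Equivalently (a minor variation, not part of the original workaround), the last two lines can be combined by passing the column names straight to the DataFrame constructor:

# Build the DataFrame with named columns in one step
df = pd.DataFrame(all_rows, columns=field_names)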
The reason is that snowflake-connector-python does not install "pyarrow", which you need to work with pandas.
Either install and import pyarrow yourself, or run:
pip install "snowflake-connector-python[pandas]"
and try again.
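A quick sanity check (a sketch, not part of the original instructions) is to confirm that pyarrow imports alongside the connector and to print its version:

import pyarrow
import snowflake.connector

# If this import fails, or the pyarrow version is outside the range your
# connector release supports, fetch_pandas_all() will not work.
print(pyarrow.__version__)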
What happens when you run this code?
from snowflake import connector
import time
import logging

for logger_name in ['snowflake.connector', 'botocore', 'boto3']:
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.FileHandler('test.log')
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
    logger.addHandler(ch)

from snowflake.connector.cursor import CAN_USE_ARROW_RESULT
import pyarrow
import pandas as pd

print('CAN_USE_ARROW_RESULT', CAN_USE_ARROW_RESULT)
This will output whether CAN_USE_ARROW_RESULT is True; if it is not, the pandas fetch methods won't work. When you did the pip install, which of these did you run?
pip install snowflake-connector-python
pip install snowflake-connector-python[pandas]
Also, what OS are you running on?
I have this working now, but am not sure which part helped. The following steps were taken:
Based on a comment by @Kirby, I tried pip3 install --upgrade snowflake-connector-python (this is based on a historic screenshot; I should have had [pandas] in brackets, i.e. pip3 install --upgrade snowflake-connector-python[pandas]). Regardless, I got the following error message:
Error: Microsoft Visual C++ 14.0 is required. Get it with "Build Tools for Visual Studio": https://visualstudio.microsoft.com/downloads
I therefore downloaded (exact filename: vs_buildtools__121011638.1587963829.exe) and installed VS Build Tools.
This is the tricky part: I subsequently got admin access to my machine (so I am hoping it was the Visual Studio Build Tools that helped, and not the admin access).
I then followed the Snowflake Documentation Python Connector API instructions originally referred to:
a. Anaconda Prompt (opened as admin): pip install snowflake-connector-python[pandas]
b. Python:
import snowflake.connector
import pandas as pd

ctx = snowflake.connector.connect(
    user=user,
    account=account,
    password='password',
    warehouse=warehouse,
    database=database,
    role=role,
    schema=schema)

# Create a cursor object.
cur = ctx.cursor()

# Execute a statement that will generate a result set.
sql = "select * from t"
cur.execute(sql)

# Fetch the result set from the cursor and deliver it as the Pandas DataFrame.
df = cur.fetch_pandas_all()
Edit: I have since realised that I still get the error when executing df = cur.fetch_pandas_all() with my Okta (single sign-on) account, i.e. when I use my username and authenticator='externalbrowser'. When I use a different account (with a password), I no longer get the error.
Note that I am still able to connect with externalbrowser (and I can see the query executed successfully in Snowflake history); I am just not able to fetch.
Using
python -m pip install "snowflake-connector-python[pandas]"
as in the docs did not fetch the correct version of pyarrow for me (the docs say you need 3.0.x).
With my conda environment (using Python 3.8) I had to manually update pyarrow to a specific version:
python -m pip install pyarrow==6.0
Related
I'm using the SageMaker Python SDK's SKLearn class, which has a parameter called framework_version. Its default is 0.20.0.
I want to use a different version of Scikit-learn (0.21.0 or higher). Is it possible? How do I know which versions are supported?
I actually couldn't find a nice link that showed this, but if you enter an incorrect framework version, the error message will print out the available versions.
estimator = SKLearn(
    entry_point="train.py",
    source_dir=source_dir,
    instance_type='ml.m4.xlarge',
    role=role,
    framework_version='0.30.0',
    py_version='py3'
)
ValueError: Unsupported sklearn version: 0.30.0. You may need to upgrade your SDK version (pip install -U sagemaker) for newer sklearn versions.
Supported sklearn version(s): 0.20.0, 0.23-1.
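So in this case, constructing the estimator with one of the listed versions should succeed, for example (a sketch; train.py, source_dir and role are the same placeholders as above):

from sagemaker.sklearn.estimator import SKLearn

estimator = SKLearn(
    entry_point="train.py",
    source_dir=source_dir,
    instance_type='ml.m4.xlarge',
    role=role,
    framework_version='0.23-1',   # one of the versions from the error message
    py_version='py3'
)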
I am trying to update to pandas==0.25.1 on my MS SQL Server 2019 instance, using sqlmlutils:
import sqlmlutils

connection = sqlmlutils.ConnectionInfo(server=SERVER_NAME, database=DATABASE_NAME)
sqlmlutils.SQLPackageManager(connection).install('pandas', True, '0.25.1')
which successfully installs and updates pandas:
>>> Installing dependencies...
>>> Done with dependencies, installing main package...
>>> Installing pandas version: 0.25.1
However, when I execute a Python script with the sp_execute_external_script command
EXEC sp_execute_external_script @language = N'Python',
@script = N'
import pandas as pd
print(pd.__version__)
'
I get the following output:
>>> STDOUT message(s) from external script:
>>> 0.23.4
i.e., that the instance is using pandas==0.23.4 rather than pandas==0.25.1.
Why is this? Is there a method for using pandas==0.25.1 within MS SQL Server 2019?
Open cmd as administrator (I also stopped the Launchpad service in SQL Server Configuration Manager, though that may not be needed).
Navigate to C:\Program Files\Microsoft SQL Server\xyz\PYTHON_SERVICES\condabin
and type: conda install pandas=0.25.1
After the package download and validation you'll be asked for [y/n] confirmation before installation.
If you get an SSL error, you'll need to install OpenSSL for Windows.
I had the same issue. I tried using sqlmlutils, pip, and conda to install (both online and offline), but pandas stayed at version 0.23.4, even though each install reported success.
One thing I noticed: you can install a new package, but you cannot upgrade an existing one.
In my case it was seaborn. I had installed it at version 0.9 and was trying to upgrade to 0.10, but could not.
It seems SQL Server ML Services does not allow upgrading packages.
I'm trying to use the code below.
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

// Acquire a DataFrame collection (val collection)
val config = Config(Map(
  "url" -> "mysqlserver.database.windows.net",
  "databaseName" -> "MyDatabase",
  "dbTable" -> "dbo.Clients",
  "user" -> "username",
  "password" -> "*********"
))

import org.apache.spark.sql.SaveMode
collection.write.mode(SaveMode.Append).sqlDB(config)
The script is from this link.
https://github.com/Azure/azure-sqldb-spark
I'm running this in a Databricks environment. I'm getting these errors:
command-836397363127942:5: error: object sqlDB is not a member of package com.microsoft.azure
import com.microsoft.azure.sqlDB.spark.connect._
^
command-836397363127942:4: error: object sqlDB is not a member of package com.microsoft.azure
import com.microsoft.azure.sqlDB.spark.config.Config
^
command-836397363127942:7: error: not found: value Config
val bulkCopyConfig = Config(Map(
^
command-836397363127942:18: error: value sqlDB is not a member of org.apache.spark.sql.DataFrameWriter[org.apache.spark.sql.Row]
df.write.mode(SaveMode.Append).sqlDB(bulkCopyConfig)
I'm guessing that some kind of library is not installed correctly. I Googled for an answer, but didn't find anything useful. Any idea how to make this work? Thanks.
If you are getting the sqldb error, it means all the other supporting libraries are already imported into your notebook and only the latest JAR with dependencies is missing.
When I reproduced the issue, I got the same error message as shown above.
After a bit of research, I found that you experience this error when the JAR with dependencies is missing.
To resolve this issue, you need to download the JAR file from here: https://search.maven.org/artifact/com.microsoft.azure/azure-sqldb-spark/1.0.2/jar
After downloading the JAR file, upload it to the cluster as a library and install it.
Note: After installing the libraries, make sure to restart the cluster.
Now, you will be able to run the command successfully.
I think you are missing the library.
If you're using a Maven build, add the following dependency to pom.xml:
<dependency>
    <groupId>com.microsoft.azure</groupId>
    <artifactId>azure-sqldb-spark</artifactId>
    <version>1.0.2</version>
</dependency>
If you're using an SBT build, add the following to build.sbt:
libraryDependencies += "com.microsoft.azure" % "azure-sqldb-spark" % "1.0.2"
Have you imported and installed the library in Databricks?
I found it easiest to import the library using Maven. See this answer: How to install a library on a databricks cluster using some command in the notebook?
Note: You need to install the Library on your cluster and then restart the cluster before you can use it.
I wanted to do a real-time deployment of my model on Azure, so I planned to create an image which first queries an ID in an Azure SQL DB to get the required features, then predicts using my model and returns the predictions. The error I get from the pyodbc library is that the drivers are not installed.
I tried to establish the connection from an Azure ML Jupyter notebook and found that no drivers are installed in the environment itself. After some research I found that I should create a Docker image and deploy it there, but I still got the same result.
import pyodbc

driver = '{ODBC Driver 13 for SQL Server}'
cnxn = pyodbc.connect('DRIVER=' + driver + ';SERVER=' + server + ';PORT=1433;DATABASE=' + database + ';UID=' + username + ';PWD=' + password + ';Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;')
('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC
Driver 13 for SQL Server' : file not found (0) (SQLDriverConnect)")
I want a result from the query; instead I get this message.
Alternatively, you could use pymssql==2.1.1 if you add the following Docker steps in the deployment configuration (using either Environments or ContainerImages; Environments are preferred):
from azureml.core import Environment
from azureml.core.environment import CondaDependencies
conda_dep = CondaDependencies()
conda_dep.add_pip_package('pymssql==2.1.1')
myenv = Environment(name="mssqlenv")
myenv.python.conda_dependencies=conda_dep
myenv.docker.enabled = True
myenv.docker.base_dockerfile = 'FROM mcr.microsoft.com/azureml/base:latest\nRUN apt-get update && apt-get -y install freetds-dev freetds-bin vim gcc'
myenv.docker.base_image = None
Or, if you're using the ContainerImage class, you could add these Docker Steps
from azureml.core.image import Image, ContainerImage
image_config = ContainerImage.image_configuration(runtime= "python", execution_script="score.py", conda_file="myenv.yml", docker_file="Dockerfile.steps")
# Assuming this :
# RUN apt-get update && apt-get -y install freetds-dev freetds-bin vim gcc
# is in a file called Dockerfile.steps, it should produce the same result.
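Either way, once freetds and pymssql are baked into the image, the scoring script can talk to the database with pymssql instead of pyodbc. A minimal sketch (server, credentials and table name are placeholders, not values from the question):

import pymssql

some_id = 42  # hypothetical ID to look up

# Connect via FreeTDS/pymssql, so no ODBC driver is needed in the image
conn = pymssql.connect(server='yourserver.database.windows.net',
                       user='username@yourserver',
                       password='password',
                       database='yourdatabase')
cursor = conn.cursor()
# dbo.features is a placeholder table name
cursor.execute('SELECT * FROM dbo.features WHERE id = %s', (some_id,))
row = cursor.fetchone()
conn.close()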
See this answer for more details on how I've done it using an Estimator Step and a custom Docker container. You could use this Dockerfile to locally create a Docker container for that Estimator step (no need to do that if you're just using an Estimator run outside of a pipeline):
FROM continuumio/miniconda3:4.4.10
RUN apt-get update && apt-get -y install freetds-dev freetds-bin gcc
RUN pip install Cython
For more details see this posting: using estimator in pipeline with custom docker images. Hope that helps!
In my experience, the comment from @DavidBrowne-Microsoft is right.
There is a similar SO thread, "I am getting an error while connecting to an sql DB in Jupyter Notebook", answered by me, which I think will help you install the latest msodbcsql driver for Linux on an Azure Notebook or in Docker.
Meanwhile, there is a detail about the connection string for Azure SQL Database that you need to note carefully: you should use {ODBC Driver 17 for SQL Server} instead of {ODBC Driver 13 for SQL Server} if your Azure SQL Database was created recently (ignore the connection string shown in the Azure portal).
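Applied to the connection string from the question, the only change is the driver name (a sketch; the server, database and credential values below are placeholders):

import pyodbc

# Placeholders; substitute your own values
server = 'yourserver.database.windows.net'
database = 'yourdatabase'
username = 'youruser'
password = 'yourpassword'

driver = '{ODBC Driver 17 for SQL Server}'  # note: 17, not 13
cnxn = pyodbc.connect('DRIVER=' + driver + ';SERVER=' + server + ';PORT=1433;DATABASE=' + database +
                      ';UID=' + username + ';PWD=' + password +
                      ';Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;')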
You can use Azure ML's built-in Dataset support to connect to your SQL server.
To do so, first create an azure_sql_database datastore (reference here).
Then create a dataset by passing in the datastore you created and the query you want to run (reference here).
Sample code:
from azureml.core import Dataset, Datastore, Workspace

workspace = Workspace.from_config()

sql_datastore = Datastore.register_azure_sql_database(
    workspace=workspace,
    datastore_name='sql_dstore',
    server_name='your SQL server name',
    database_name='your SQL database name',
    tenant_id='your directory ID/tenant ID of the service principal',
    client_id='the Client ID/Application ID of the service principal',
    client_secret='the secret of the service principal')

sql_dataset = Dataset.Tabular.from_sql_query((sql_datastore, 'SELECT * FROM my_table'))
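From there, assuming the compute has network access to the database, the dataset can be materialised as a pandas DataFrame:

# Pull the query results into pandas for feature lookup/scoring
df = sql_dataset.to_pandas_dataframe()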
You can also do it via the UI at ml.azure.com, where you can register an Azure SQL datastore using your username and password.
I want to deploy my Flask app, but the problem I am facing is with databases. I am using a MySQL database, and I want to use an online MySQL database, for which I am using the website www.freemysqlhosting.net. I have created the tables, but now I do not understand how to use that server's credentials in my Flask app.
Kindly please help.
You need a MySQL database connector for Python:
import MySQLdb

db = MySQLdb.connect(
    host="hostname",    # your host
    user="username",    # your username
    passwd="password",  # your password
    db="testdb")        # database name
You can also use the flask-sqlalchemy package:
SQLALCHEMY_DATABASE_URI = 'mysql+mysqldb://username:password@host/database_name'
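A minimal sketch of wiring that URI into Flask-SQLAlchemy (the host and credentials come from your freemysqlhosting.net account):

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
# Use the hostname, username, password and database name from your hosting provider
app.config['SQLALCHEMY_DATABASE_URI'] = 'mysql+mysqldb://username:password@host/database_name'
db = SQLAlchemy(app)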
Before the code, you need to install a MySQL connector for Python/Flask and mysqldb:
Use: pip install flask
Use: pip install flask_mysqldb
Use: pip install flask-mysql
Use: pip install MySQL-python
Use: pip install mysql-connector-python
from flask import Flask, request, redirect, url_for, render_template
import mysql.connector

app = Flask(__name__)

# CONNECTION
DB = mysql.connector.connect(
    host="localhost",
    port="3306",
    user="username",
    password="password",
    database="database",
    auth_plugin='mysql_native_password'
)

# EXAMPLE INSERT INTO table users
conexao = DB.cursor()
SQL_COMMAND = "INSERT INTO users(name, email) VALUES (%s, %s)"
VAL = [
    ('MAURICIO', 'P'),
    ('NETO', 'N'),
    ('ARIEL', 'A')
]
conexao.executemany(SQL_COMMAND, VAL)
DB.commit()

conexao.close()
DB.close()

if __name__ == "__main__":
    app.run(debug=True)