How to configure Apache Ignite in Redash - database

I need to configure Apache Ignite in Redash for a BI dashboard, but I couldn't figure out how to do it since there is no direct support for Ignite in Redash.

It is possible using the Python query runner, which is available for stand-alone installs only. It allows you to run arbitrary Python code, in which you can query Apache Ignite via JDBC.
First, add the redash.query_runner.python query runner to settings.py:
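For reference, a minimal sketch of what that change could look like; the exact structure of settings.py varies between Redash versions, so the default_query_runners list shown here is an assumption:
# settings.py (sketch): make sure the Python query runner is in the enabled list.
default_query_runners = [
    'redash.query_runner.pg',
    'redash.query_runner.mysql',
    # ... other built-in runners ...
    'redash.query_runner.python',  # the Python query runner used below
]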
and install the Python JDBC bridge module together with its dependencies:
sudo apt-get install python-setuptools
sudo apt-get install python-jpype
sudo easy_install JayDeBeApi
Then, after a VM restart, you should add a Python data source (you might need to tweak the module path).
And then you can actually run the query (you will need to provide the Apache Ignite core JAR and the JDBC connection string as well):
import jaydebeapi

# Connect via the Ignite thin JDBC driver; the last argument is the path to the Ignite core JAR.
conn = jaydebeapi.connect('org.apache.ignite.IgniteJdbcThinDriver',
                          'jdbc:ignite:thin://localhost',
                          {},
                          '/home/ubuntu/.m2/repository/org/apache/ignite/ignite-core/2.3.2/ignite-core-2.3.2.jar')
curs = conn.cursor()
curs.execute("select c.Id, c.CreDtTm from TABLE.Table c")
data = curs.fetchall()

# Build the result structure expected by Redash's Python query runner.
result = {"columns": [], "rows": []}
add_result_column(result, "Id", "Identifier", "string")
add_result_column(result, "CreDtTm", "Create Date-Time", "integer")
for d in data:
    add_result_row(result, {"Id": d[0], "CreDtTm": d[1]})

curs.close()
conn.close()
Unfortunately there's no direct support for JDBC in Redash (that I'm aware of) so all that boilerplate is needed.

Related

Snowflake Python Pandas Connector - Unknown error using fetch_pandas_all

I am trying to connect to Snowflake using the Python pandas connector.
I use the Anaconda distribution on Windows, but uninstalled the existing connector and pyarrow and reinstalled using the instructions on this page: https://docs.snowflake.com/en/user-guide/python-connector-pandas.html
I have the following versions:
pandas 1.0.4 py37h47e9c7a_0
pip 20.1.1 py37_1
pyarrow 0.17.1 pypi_0 pypi
python 3.7.7 h81c818b_4
snowflake-connector-python 2.2.7 pypi_0 pypi
When running step 2 of this document: https://docs.snowflake.com/en/user-guide/python-connector-install.html, I get: 4.21.2
On attempting to use fetch_pandas_all() I get an error: NotSupportedError: Unknown error
The code I am using is as follows:
import snowflake.connector
import pandas as pd

SNOWFLAKE_DATA_SOURCE = '<DB>.<Schema>.<VIEW>'

query = '''
select *
from table(%s)
LIMIT 10;
'''

def create_snowflake_connection():
    conn = snowflake.connector.connect(
        user='MYUSERNAME',
        account='MYACCOUNT',
        authenticator='externalbrowser',
        warehouse='<WH>',
        database='<DB>',
        role='<ROLE>',
        schema='<SCHEMA>'
    )
    return conn

con = create_snowflake_connection()
cur = con.cursor()
temp = cur.execute(query, (SNOWFLAKE_DATA_SOURCE,)).fetch_pandas_all()  # trailing comma makes this a one-element tuple
cur.close()
I am wondering what else I need to install/upgrade/check in order to get fetch_pandas_all() to work?
Edit: After posting an answer below, I have realised that the issue is with the SSO (single sign on) with authenticator='externalbrowser'. When using a stand-alone account I can fetch.
I found a workaround that avoids the SSO error by relying on fetchall() instead of fetch_pandas_all():
try:
    cur.execute(sql)
    all_rows = cur.fetchall()
    num_fields = len(cur.description)
    field_names = [i[0] for i in cur.description]
finally:
    cur.close()
    con.close()

df = pd.DataFrame(all_rows)
df.columns = field_names
The reason is that snowflake-connector-python does not install pyarrow, which you need in order to work with pandas.
Either install and import pyarrow yourself, or do:
pip install "snowflake-connector-python[pandas]"
and try again.
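For example, a quick sanity check (a minimal sketch; compare the printed version against the range your connector release supports, which varies by version):
# Check that pyarrow is importable in the same environment as the connector.
try:
    import pyarrow
    print("pyarrow version:", pyarrow.__version__)
except ImportError:
    print("pyarrow is not installed in this environment")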
What happens when you run this code?
from snowflake import connector
import time
import logging

for logger_name in ['snowflake.connector', 'botocore', 'boto3']:
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.FileHandler('test.log')
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
    logger.addHandler(ch)

from snowflake.connector.cursor import CAN_USE_ARROW_RESULT
import pyarrow
import pandas as pd

print('CAN_USE_ARROW_RESULT', CAN_USE_ARROW_RESULT)
This will output whether CAN_USE_ARROW_RESULT is true; if it's not, then the pandas fetch won't work. When you did the pip install, which of these did you run?
pip install snowflake-connector-python
pip install snowflake-connector-python[pandas]
Also, what OS are you running on?
I have this working now, but am not sure which part helped; the following steps were taken:
Based on a comment by @Kirby, I tried pip3 install --upgrade snowflake-connector-python (this was based on a historic screenshot; I should have had [pandas] in brackets, i.e. pip3 install --upgrade snowflake-connector-python[pandas]), but regardless, I got the following error message:
Error: Microsoft Visual C++ 14.0 is required. Get it with "Build Tools for Visual Studio": https://visualstudio.microsoft.com/downloads
I therefore downloaded (exact filename: vs_buildtools__121011638.1587963829.exe) and installed VS Build Tools.
This is the tricky part: I subsequently got admin access to my machine (so I am hoping it was the Visual Studio Build Tools that helped, and not the admin access).
I then followed the Snowflake Documentation Python Connector API instructions originally referred to:
a. Anaconda Prompt (opened as admin): pip install snowflake-connector-python[pandas]
b. Python:
import snowflake.connector
import pandas as pd

ctx = snowflake.connector.connect(
    user=user,
    account=account,
    password='password',
    warehouse=warehouse,
    database=database,
    role=role,
    schema=schema)

# Create a cursor object.
cur = ctx.cursor()

# Execute a statement that will generate a result set.
sql = "select * from t"
cur.execute(sql)

# Fetch the result set from the cursor and deliver it as the Pandas DataFrame.
df = cur.fetch_pandas_all()
Edit: I have since realised that I still get the error when executing df = cur.fetch_pandas_all() with my Okta (single sign-on) account, i.e. when I use my username and authenticator='externalbrowser'. When I use a different account (with a password), I no longer get the error.
NOTE: I am still able to connect with externalbrowser (and I can see the query executed successfully in Snowflake history); I am just not able to fetch.
Using
python -m pip install "snowflake-connector-python[pandas]"
as in the docs did not fetch the correct version of pyarrow for me (the docs say you need 3.0.x).
With my conda environment (using Python 3.8) I had to manually update pyarrow to a specific version:
python -m pip install pyarrow==6.0

Connecting Presto and Apache SuperSet

I have Presto and Apache Superset hosted on GCP.
The Presto server is hosted at http://14.22.122.12:8088/ui/
But when I try to connect Presto to Superset, it gives me this error:
Could not load database driver: presto
I already installed the Presto driver using pip install pyhive.
Not sure what's wrong here? My SQLAlchemy URI is presto://14.22.122.12:8088/catalog_name
This error usually appears when the driver isn't available (so the connection string format isn't recognized).
I would confirm that the pip install happened correctly. If you're running everything within Docker, make sure you rebuilt the image properly. You can enter the container, fire up a Python shell, and try to import the Presto driver (you can also try pip freeze inside the container); see the sketch below.
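For example, a minimal check from a Python shell inside the container (the pyhive.presto module is what the presto:// URI relies on; anything beyond that is not part of the original answer):
# Confirm the Presto driver is importable in the environment Superset runs in.
try:
    from pyhive import presto  # backs the presto:// SQLAlchemy URI
    print("pyhive Presto driver found")
except ImportError as e:
    print("Presto driver not available:", e)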

Trying to query Azure SQL Database with Azure ML / Docker Image

I wanted to do a real-time deployment of my model on Azure, so I planned to create an image which first queries an ID in an Azure SQL DB to get the required features, then predicts using my model and returns the predictions. The error I get from the pyodbc library is that the drivers are not installed.
I tried it in the Azure ML Jupyter notebook to establish the connection and found that no drivers are installed in the environment itself. After some research I found that I should create a Docker image and deploy it there, but I still got the same result.
driver= '{ODBC Driver 13 for SQL Server}'
cnxn = pyodbc.connect('DRIVER='+driver+';SERVER='+server+';PORT=1433;DATABASE='+database+';UID='+username+';PWD='+ password+';Encrypt=yes'+';TrustServerCertificate=no'+';Connection Timeout=30;')
('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 13 for SQL Server' : file not found (0) (SQLDriverConnect)")
I want a result from the query; instead I get this message.
Alternatively, you could use pymssql==2.1.1 if you add the following Docker steps to the deployment configuration (using either Environments or ContainerImages; Environments is preferred):
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

conda_dep = CondaDependencies()
conda_dep.add_pip_package('pymssql==2.1.1')

myenv = Environment(name="mssqlenv")
myenv.python.conda_dependencies = conda_dep
myenv.docker.enabled = True
myenv.docker.base_dockerfile = 'FROM mcr.microsoft.com/azureml/base:latest\nRUN apt-get update && apt-get -y install freetds-dev freetds-bin vim gcc'
myenv.docker.base_image = None
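For context, a minimal sketch of how such an environment might then be attached to a web service deployment; the score.py entry script, the service name, and the registered model name are assumptions, not part of the original answer:
from azureml.core import Workspace
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()
model = Model(ws, name="my-model")  # assumed: a previously registered model

# Tie the scoring script to the environment (myenv) defined above.
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Model.deploy(ws, "mssql-scoring-service", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)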
Or, if you're using the ContainerImage class, you could add these Docker Steps
from azureml.core.image import Image, ContainerImage
image_config = ContainerImage.image_configuration(runtime= "python", execution_script="score.py", conda_file="myenv.yml", docker_file="Dockerfile.steps")
# Assuming this :
# RUN apt-get update && apt-get -y install freetds-dev freetds-bin vim gcc
# is in a file called Dockerfile.steps, it should produce the same result.
See this answer for more details on how I've done it using an Estimator Step and a custom docker container. You could use this Dockerfile to locally create a Docker container for that Estimator step (no need to do that if you're just using an Estimator run outside of a pipeline) :
FROM continuumio/miniconda3:4.4.10
RUN apt-get update && apt-get -y install freetds-dev freetds-bin gcc
RUN pip install Cython
For more details see this posting: using estimator in pipeline with custom docker images. Hope that helps!
In my experience, the comment from @DavidBrowne-Microsoft is right.
There is a similar SO thread, "I am getting an error while connecting to an sql DB in Jupyter Notebook", which I answered; I think it will help you install the latest msodbcsql driver for Linux on a Microsoft Azure Notebook or in Docker.
Meanwhile, there is a detail about the connection string for Azure SQL Database that you need to note carefully: you should use {ODBC Driver 17 for SQL Server} instead of {ODBC Driver 13 for SQL Server} if your Azure SQL Database was created recently (ignore the connection string shown in the Azure portal); see the sketch below.
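A minimal sketch of a connection string with the newer driver; the server, database, and credential placeholders are assumptions, and pyodbc.drivers() is just a quick way to list which ODBC drivers the environment actually has:
import pyodbc

print(pyodbc.drivers())  # shows the ODBC drivers installed in this environment

driver = '{ODBC Driver 17 for SQL Server}'
cnxn = pyodbc.connect(
    'DRIVER=' + driver +
    ';SERVER=<your-server>.database.windows.net'
    ';PORT=1433;DATABASE=<your-db>'
    ';UID=<user>;PWD=<password>'
    ';Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;'
)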
You can use the Azure ML built-in Dataset/Datastore support to connect to your SQL server.
To do so, first create an azure_sql_database datastore (see the Azure ML datastore reference).
Then create a dataset by passing the datastore you created and the query you want to run (see the Azure ML dataset reference).
Sample code:
from azureml.core import Dataset, Datastore, Workspace

workspace = Workspace.from_config()

sql_datastore = Datastore.register_azure_sql_database(workspace=workspace,
                                                      datastore_name='sql_dstore',
                                                      server_name='your SQL server name',
                                                      database_name='your SQL database name',
                                                      tenant_id='your directory ID/tenant ID of the service principal',
                                                      client_id='the Client ID/Application ID of the service principal',
                                                      client_secret='the secret of the service principal')

sql_dataset = Dataset.Tabular.from_sql_query((sql_datastore, 'SELECT * FROM my_table'))
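To actually pull the query results into pandas you would then materialize the dataset, for example (a minimal sketch; to_pandas_dataframe() is the standard TabularDataset method, but check it against your azureml-core version):
df = sql_dataset.to_pandas_dataframe()  # runs the query and returns a pandas DataFrame
print(df.head())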
You can also do it via the UI at ml.azure.com, where you can register an Azure SQL datastore using your user name and password.

I installed Apache2 but there are no Apache applications in sudo ufw app list

I installed apache2 with the command
sudo apt install apache2
and when I type sudo ufw app list
there are no applications related to Apache, like Apache Full, etc.
It shows this:
Available applications:
CUPS
OpenSSH
Postfix
Postfix SMTPS
Postfix Submission
Can anyone explain the exact reason for this?
Thanks in advance.
For some reason the Apache2 UFW profile is missing from /etc/ufw/applications.d, so you have to create one yourself. No panic, it's easy; just create a new text file using this command:
sudo gedit /etc/ufw/applications.d/apache2-utils.ufw.profile
then copy the following into it and save.
[Apache]
title=Web Server
description=Apache v2 is the next generation of the omnipresent Apache web server.
ports=80/tcp
[Apache Secure]
title=Web Server (HTTPS)
description=Apache v2 is the next generation of the omnipresent Apache web server.
ports=443/tcp
[Apache Full]
title=Web Server (HTTP,HTTPS)
description=Apache v2 is the next generation of the omnipresent Apache web server.
ports=80,443/tcp
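After saving the file, sudo ufw app list should pick up the new profiles (Apache, Apache Secure, Apache Full), and you can then enable one with, for example, sudo ufw allow 'Apache Full'.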

Apache Zeppelin 0.7.0-SNAPSHOT not working with external Spark

I am trying to use Zeppelin (a 0.7.0-SNAPSHOT compiled with mvn clean package -Pcassandra-spark-1.6 -Dscala-2.11 -DskipTests)
with an external, standalone Spark of version 1.6.1.
I have tried to set this up by entering export MASTER=spark://mysparkurl:7077 in /zeppelin/conf/zeppelin-env.sh,
and under the %spark interpreter settings in the Zeppelin GUI I have also tried to set the master parameter to spark://mysparkurl:7077.
So far, attempts to connect to Spark have been unsuccessful. Here is a piece of code I have used for testing Zeppelin with external Spark, and the error I get with it:
%spark
val data = Array(1,2,3,4,5)
val distData = sc.parallelize(data)
val distData2 = distData.map(i => (i,1))
distData2.first
data: Array[Int] = Array(1, 2, 3, 4, 5)
java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
Zeppelin is running in a docker container, and Spark is running on host.
Am I missing something here? Is there something else that needs to be configured in order for Zeppelin to work with an external, standalone Spark?
As Cedric H. mentions, at that time you had to compile Apache Zeppelin with -Dscala-2.10.
A few bugs have been fixed since September, and Scala 2.11 support should be working well now; if not, please file an issue in the official project JIRA.
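In other words, the build command from the question would become something like mvn clean package -Pcassandra-spark-1.6 -Dscala-2.10 -DskipTests.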
