Can Presto server connect to Snowflake directly or only via Starburst?

I am trying to find a Snowflake connector for Presto but am unable to find one in the Presto documentation. All searches show connections via Starburst only.

Currently, Starburst only. But there is an ongoing PR:
https://github.com/prestosql/presto/pull/2551
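Until that PR lands there is no connector in upstream Presto. For a rough idea of what configuration might eventually look like, here is a purely hypothetical catalog file following the usual pattern of Presto's JDBC-based connectors (the connector name and the JDBC URL are assumptions based on that pattern, not a released API):
etc/catalog/snowflake.properties:
connector.name=snowflake
connection-url=jdbc:snowflake://<account>.snowflakecomputing.com
connection-user=<user>
connection-password=<password>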

Related

Debezium SQL Server Connector - "Couldn't obtain database name"

I'm trying to set up a Debezium SQL Server Connector against a SQL Server instance that is controlled by DBAs at my workplace. I've been able to start up Zookeeper and Kafka Server without issue, and Kafka Connect itself works with sample Connectors, but when attempting to start a Debezium SQL Server Connector instance I've been getting the error "Couldn't obtain database name".
[2022-07-12 16:36:04,269] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:117)
java.util.concurrent.ExecutionException: org.apache.kafka.connect.runtime.rest.errors.BadRequestException: Connector configuration is invalid and contains the following 1 error(s):
Unable to connect. Check this and other connection properties. Error: Couldn't obtain database name
Here is my Debezium config:
name=Dbz-SqlServer-connector
connector.class=io.debezium.connector.sqlserver.SqlServerConnector
database.hostname=MyDbHost
database.port=1433
database.user=MyUsername
database.password=MyPassword
database.dbname=MyDatabase
database.server.name=MyDbHost
table.include.list=dbo.CdcTest
database.history.kafka.bootstrap.servers=localhost:9092
database.history.kafka.topic=dbhistory.CdcTest
I've tried this in a .properties file passed to a standalone Connect instance, and as a JSON POST to a distributed Connect instance. I have tried all of the same steps on both my local Windows machine and a Linux VM, with the same results.
Confluent and Docker are not options for me in this situation.
For SQL Server login credentials, I am using a local account on the SQL Server instance that does have access to the database in question. I found the source code for Debezium's connectors on their GitHub and was able to find that specific error message within the code:
private static final String GET_DATABASE_NAME = "SELECT name FROM sys.databases WHERE name = ?";
...
public String retrieveRealDatabaseName(String databaseName) {
    try {
        return prepareQueryAndMap(GET_DATABASE_NAME,
                ps -> ps.setString(1, databaseName),
                singleResultMapper(rs -> rs.getString(1), "Could not retrieve exactly one database name"));
    }
    catch (SQLException e) {
        throw new RuntimeException("Couldn't obtain database name", e);
    }
}
I'm not completely familiar with Java, but it appears that something goes wrong when the connector tries to run "SELECT name FROM sys.databases WHERE name = 'MyDatabase'". When I run this query against the database myself, logged in with the same account, it works just fine, so I'm really not sure where to go from here. To be fair, since I'm not in full control of the SQL Server environment I'm using, there may be permissions issues I'm not aware of, but from what I'm able to test it seems like it should be working.
I would greatly appreciate any help at all, whether just suggestions on settings/configs to check or a full-blown solution.
Thank you!
Update: I've built a simple console app that runs the sys.databases query against MyDbHost (master database) as the relevant account, and it works just fine, which I feel confirms that my connection info and account permissions are correct. It seems like this is an issue within the Debezium connector.
It turned out that my problem was a mistake in the connector's config settings. I misunderstood which specific pieces of data to put into database.hostname and database.server.name, and once I corrected those fields the connector worked.
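For anyone hitting the same confusion: per the Debezium documentation, database.hostname must be the actual network address of the SQL Server machine, while database.server.name is only a logical name that uniquely identifies the connector and prefixes its Kafka topic names. A sketch of the two corrected lines, with placeholder values:
database.hostname=actual-sqlserver-host.example.com
database.server.name=my-logical-server-name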

How to get Snowflake host and port number to create a connection in SAP Analytics Cloud?

The SAP Analytics Cloud Snowflake Connector needs host and port details for setting up a Snowflake connection.
How can I get these details from Snowflake?
I'm trying to follow this guide.
It appears that you're attempting to configure SAP Analytics Cloud's Snowflake Connector.
The host and port of your Snowflake account (also known as its deployment URL) can be taken from the URL you use to connect to Snowflake's web UI. For example, for the URL https://mzf0194.us-west-2.snowflakecomputing.com, the input in the Server field of the form will be mzf0194.us-west-2.snowflakecomputing.com:443 (443 is the default HTTPS port that Snowflake serves on).
Alternatively, if you have access to any other Snowflake-connected application (such as SnowSQL) that lets you run a SQL query, run the following to extract it:
select t.value:host || ':443' snowflake
from table(flatten(parse_json(system$whitelist()))) t
where t.value:type = 'SNOWFLAKE_DEPLOYMENT';
An example output that carries the host/port:
+---------------------------------------------+
| SNOWFLAKE |
|---------------------------------------------|
| p7b41m.eu-west-1.snowflakecomputing.com:443 |
+---------------------------------------------+
If you're uncertain about what these all mean, you'll need to speak to other, current Snowflake users or administrators in your organization.

Azure SQL Serverless Database

I created a serverless SQL Server database in Azure and tried to access it using SQL Server Management Studio on my local machine, but I couldn't get it to work.
It always fails with the same error message.
I also tried whitelisting my IP in Azure, but I still get the same result.
Is there a possible way to make it connect?
Is the database currently online or paused?
I'll repeat the text from David Browne's link:
If a serverless database is paused, then the first login will resume the database and return an error stating that the database is unavailable with error code 40613. Once the database is resumed, the login must be retried to establish connectivity. Database clients with connection retry logic should not need to be modified.
So:
Assuming the database is paused, this is normal operation.
Please read the docs.
You need to retry after the database starts, OR manually pre-start it using the PowerShell provided in the link below (a Java sketch of such retry logic follows the link).
https://learn.microsoft.com/en-us/azure/sql-database/sql-database-serverless#connectivity
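If your client lacks retry logic, here is a minimal sketch in Java/JDBC, assuming the Microsoft SQL Server JDBC driver is on the classpath and using placeholder connection details; it retries only on error 40613, the "database unavailable" code quoted above:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ServerlessRetry {
    // Placeholder connection string; substitute your own server, database and credentials.
    private static final String URL =
            "jdbc:sqlserver://myserver.database.windows.net;databaseName=mydb;user=me;password=secret";

    public static Connection connectWithRetry(int maxAttempts) throws SQLException, InterruptedException {
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return DriverManager.getConnection(URL);
            } catch (SQLException e) {
                if (e.getErrorCode() != 40613) {
                    throw e; // a different failure; do not retry
                }
                last = e; // 40613: database is resuming; wait and retry
                Thread.sleep(10_000L * attempt);
            }
        }
        throw last;
    }
}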
And yes, you also need to whitelist your IP address as you have already done.
Obviously this flavour of SQL Database is unsuitable for some types of applications; there is more information in the link, and I suggest you read the whole thing.

Google Data Studio MySQL data source "connection does not exist" error

Platform: Google Data Studio
Data Source: MySQL
The connection was working before, meaning there are no issues with credentials.
All of a sudden, I am getting the below error:
All IPs from the Google Data Studio list have been whitelisted.
The only thing that comes to mind is a limitation of GDS processing the data: the data source table has around 200K+ rows.
I'm not sure what the limit is for GDS with MySQL; there's no indication anywhere.
If anyone out there can help solve this or provide some info, it would be appreciated.
Thanks
If you use a firewall, be sure to double-check the Google IP addresses. They may have added new IPs (in my case, the last one was missing).
Check them here!
After doing so, I had to change the host name of the database connection to a URL alias (www.yourserver.com, a URL pointing at your server), and then change it back to the IP to make it work.
Sounds like the connector cannot establish a new connection.
Cloud SQL Connector:
At the time of writing this, the connector seems unable to establish a new connection once the existing one has timed out, and modifying the JDBC URL to include query parameters gives you an error when authenticating.
This is probably due to the connector appending its own parameters.
(There seems to be a possible bug here when a connection no longer exists.)
MySQL Connector (with IP Address):
This connector allows you to add query parameters to the JDBC URL. Enable SSL and append useSSL=true to the URL, e.g.:
jdbc:mysql://<ip>/<database>?useSSL=true
This worked as expected and established new connections when required.
Example Source Setup
I'm suffering from this issue too; my experience is that using the MySQL connector instead of the Cloud SQL Connector provides better stability, in combination with setting wait_timeout to a value above 12 hours.
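For example, to inspect and raise the timeout on the MySQL server (46800 seconds = 13 hours, comfortably above the 12-hour figure mentioned; changing a GLOBAL variable requires admin privileges and only affects new connections):

SHOW VARIABLES LIKE 'wait_timeout';
SET GLOBAL wait_timeout = 46800;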
This issue has been reported on the official Google Data Studio bug tracker. Please vote them up if you are also suffering from this!
🐛 130205306 MySQL connection does not exist Apr 9, 2019 04:36PM
🐛 118470083 Data source password not stored for MySQL sources. Oct 26, 2018 01:24PM

How to start Titan graph server and connect with Gremlin?

I've been playing with the Titan graph server for a while now, and my feeling is that, despite extensive documentation, there is a lack of a getting-started-from-scratch tutorial.
My final goal is to have Titan running on Cassandra and query it with StartTheShift/thunderdome.
I have seen few ways of starting Titan:
Using Rexster
From this link, I was able to run a Titan server with the following steps:
download rexster-server 2.3
download titan 0.3.0
copy all files from titan-all-0.3.0/libs to rexster-server-2.3.0/ext/titan
edit rexster-server-2.3.0/rexster.xml and add (between the <graphs> tags):
<graph>
    <graph-name>geograph</graph-name>
    <graph-type>com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration</graph-type>
    <graph-read-only>false</graph-read-only>
    <graph-location>/Users/vallette/projects/DATA/gdb</graph-location>
    <properties>
        <storage.backend>local</storage.backend>
        <storage.directory>/Users/vallette/projects/DATA/gdb</storage.directory>
        <buffer-size>100</buffer-size>
    </properties>
    <extensions>
        <allows>
            <allow>tp:gremlin</allow>
        </allows>
    </extensions>
</graph>
for a BerkeleyDB, or:
<graph>
    <graph-name>geograph</graph-name>
    <graph-type>com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration</graph-type>
    <graph-location></graph-location>
    <graph-read-only>false</graph-read-only>
    <properties>
        <storage.backend>cassandra</storage.backend>
        <storage.hostname>77.77.77.77</storage.hostname>
    </properties>
    <extensions>
        <allows>
            <allow>tp:gremlin</allow>
        </allows>
    </extensions>
</graph>
for a Cassandra DB.
launch the server with ./bin/rexster.sh -s -c rexster.xml
download the Rexster console and run it with bin/rexster-console.sh
you can now connect to your graph with g = rexster.getGraph("geograph")
The problem with this method is that you are connected via Rexster and not Gremlin, so you do not have autocompletion. The advantage is that you can name your database (here geograph).
Using Titan server with Cassandra
start the server with ./bin/titan.sh config/titan-server-rexster.xml config/titan-server-cassandra.properties
create a file called cassandra.local with
storage.backend=cassandrathrift
storage.hostname=127.0.0.1
start the Titan Gremlin console and connect with g = TitanFactory.open("cassandra.local")
this works fine.
Using Titan server with BerkeleyDB
From this link:
download titan 0.3.0
start the server with ./bin/titan.sh config/titan-server-rexster.xml config/titan-server-berkeleydb.properties
launch titan gremlin: ./bin/gremlin.sh
but once I try to connect to the database (graph) in Gremlin with g = TitanFactory.open('graph'), it creates a new database called graph in the current directory. If I execute this from the directory where my (already populated) database lives, I get:
Could not instantiate implementation: com.thinkaurelius.titan.diskstorage.berkeleyje.BerkeleyJEStoreManager
Could someone clarify this process and tell me what I'm doing wrong?
Thanks
According to the documentation, TitanFactory.open() takes either the name of a config file or the name of a directory to open or create a database in.
If what Steven says is true, there would be two ways to connect to the database with a BerkeleyDB backend:
Start the database through bin/titan.sh. Connect to the database through the rexster console.
DON'T start the database using bin/titan.sh. Use the Gremlin console instead: TitanFactory.open("database-location"). This will open the database, but without a Rexster server, so nothing but the Gremlin console will be able to access the database (see the sketch below).
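A minimal Gremlin-console sketch of the second option, assuming the BerkeleyDB graph lives in /tmp/titan (the path is a placeholder):
g = TitanFactory.open('/tmp/titan')  // opens (or creates) the BerkeleyDB-backed graph in that directory
// ... run Gremlin queries and mutations against g ...
g.shutdown()                         // releases BerkeleyDB's exclusive lock so another process can open it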
With Titan Server/BerkeleyDB, you should attempt to connect via RexPro or REST (Thunderdome should connect over REST). You can't open another Titan-based connection to BerkeleyDB because Titan Server already owns it.
This is different from Titan Server/Cassandra, where connectivity occurs over RexPro or REST, but also through embedded Cassandra, which enables connectivity over Thrift via TitanFactory.open('graph').
It's also possible to access Titan from Python using these two libraries:
https://github.com/StartTheShift/thunderdome
and
https://github.com/espeed/bulbs.
Thunderdome is currently Titan-specific, and bulbs is generic. A (possibly biased) comparison between Thunderdome and Bulbs is given on Thunderdome's wiki: https://github.com/StartTheShift/thunderdome/wiki/Bulbs-VS-thunderdome
If you need autocompletion, you can use iPython and enable autocompletion in your iPython config.
