Snowflake JDBC ResultSet with more than 1000 rows not reaching Client - snowflake-cloud-data-platform

Our application fetches data from a Snowflake PrivateLink account using JDBC queries. The app runs behind a restricted firewall and proxy. When we run SnowCD it shows that many URLs are blocked, but if we pass proxy information to SnowCD, it successfully passes all tests.
When we run our app to connect to Snowflake and execute queries, the queries that return small result sets execute fine, but those that return large result sets (3000+ rows) hang, and after a long wait a timeout error appears.
The same queries work when the data is small.
net.snowflake.client.jdbc.SnowflakeChunkDownloader : Timeout waiting for the download of #chunk0
From this Stack Overflow discussion I learned that when the Snowflake JDBC driver executes a query with a small result set, the response comes back directly; for a large result set, a separate request goes to the internal stage (AWS S3), and that URL is different from the Snowflake account URL. If a proxy sits in between, this can cause problems: the PrivateLink URL doesn't need proxy parameters, but the stage URLs do.
But when I try proxy properties in the JDBC URL, and at the Tomcat level as well, there is no difference; it still doesn't work.
I couldn't find any proper Snowflake documentation explaining this large result set vs. small result set behavior.
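For what it's worth, the Snowflake JDBC driver documents proxy connection parameters (useProxy, proxyHost, proxyPort, nonProxyHosts) that are independent of JVM-level proxy settings. A minimal sketch, with placeholder account, proxy, and credential values, that routes the stage (S3) chunk downloads through the proxy while keeping the PrivateLink hostname direct:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class SnowflakeProxySketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("user", "MY_USER");            // placeholders
        props.put("password", "MY_PASSWORD");
        // Proxy parameters documented for the Snowflake JDBC driver:
        props.put("useProxy", "true");
        props.put("proxyHost", "proxy.mycorp.example");   // your proxy host (assumption)
        props.put("proxyPort", "8080");
        // Bypass the proxy for the PrivateLink endpoint itself, so only the
        // internal-stage (S3) chunk downloads are routed through the proxy.
        props.put("nonProxyHosts", "*.privatelink.snowflakecomputing.com");

        String url = "jdbc:snowflake://myaccount.privatelink.snowflakecomputing.com";
        try (Connection conn = DriverManager.getConnection(url, props)) {
            // Large result sets are fetched in chunks by SnowflakeChunkDownloader;
            // once the proxy is reachable the chunk downloads should stop timing out.
            conn.createStatement().executeQuery("SELECT * FROM big_table");
        }
    }
}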

Related

Query triangulation

I have the following usage pattern and I'm wondering if there's a known way to deal with it.
Let's say I have a website where a user can build a query to run against a remote database. The remote database is secure and the user will not have access to it. Therefore the query, which will be something like SELECT * FROM myTable, will be sent to our web server, and our web server will query the remote DB on another server, receive the results, and pass them back in the HTTP response. So the flow is:
Location1 (Europe): User/browser submits HTTP POST containing the SQL Query.
Location2 (US): HTTP Server receives request, runs SQL against database:
Location3 (Asia): Database runs query, returns data
Location2 (US): HTTP Server receives SQL resultset back. Sends response.
Location1 (Europe): User/browser receives the data back in the rendered webpage.
Supposing that I don't have control of the three locations, we can see that there may be a lot of data transfer latency if the size of the resultset is large. I was wondering if there is any way to do something like the following instead, and if so how it could be done:
Location1 (Europe): User/browser submits HTTP POST containing the SQL Query.
Location2 (US): HTTP Server receives request, sends back QueryID immediately, runs SQL against database, asynchronously.
Location3 (Asia): Database runs query
Location1 (Europe): User/browser receives response from database. (How? It cannot pull directly from DB)
To summarize, if we imagine the resultset is 50MB in size, in the first case, the 50MB would go from:
Asia (DB) -> US (Server) -> Europe (Client)
and in the second case it would go from:
Asia (DB) -> Europe (Client)
You can decouple authentication from authorization to allow more flexible connections between all three entities: browser, HTTP server, and DB.
To make your second example work you could do:
The HTTP server (US) asynchronously submits the query to the DB (Asia) and requests an auth token for it.
The HTTP server (US) sends the auth token back to the browser (Europe), while the query is now running.
The browser (Europe) now initiates a second HTTP call against the DB (Asia) using the auth token, and maybe the queryID as well.
The DB will probably need to implement a simple token auth protocol. It should:
Authenticate the incoming auth token.
Retrieve the session.
Start streaming the query result set back to the caller.
For the DB server, there are plenty of slim out-of-the-box Docker images you can spin up in seconds that implement an authorization server and can listen to the browser behind nginx.
As you can see, the architecture can be worked out. However, the DB server in Asia will need to be revamped to implement some kind of token authorization. The simplest and most widespread strategy is OAuth2, which is all the rage nowadays. A sketch of this token flow follows.
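A minimal sketch of the DB-side token endpoint described above, using the JDK's built-in HttpServer; token issuance and result storage are reduced to in-memory maps here, which a real deployment would replace with OAuth2 and the actual query runner:

import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ResultStreamServer {
    // queryId -> finished result bytes, filled in by the async query runner (assumed)
    static final Map<String, byte[]> results = new ConcurrentHashMap<>();
    // one-time auth tokens handed out to the HTTP server in the US (assumed)
    static final Map<String, String> tokenToQueryId = new ConcurrentHashMap<>();

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8443), 0);
        server.createContext("/results", exchange -> {
            // 1. Authenticate the incoming auth token.
            String token = exchange.getRequestHeaders().getFirst("Authorization");
            String queryId = (token == null) ? null : tokenToQueryId.remove(token);
            byte[] body = (queryId == null) ? null : results.get(queryId);
            if (body == null) {                      // unknown token, or result not ready
                exchange.sendResponseHeaders(401, -1);
                return;
            }
            // 2. Session retrieved via queryId; 3. stream the result set back.
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
    }
}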
Building on @TheImpaler's answer:
How about adding another table to your remote DB that is just for retrieving query results?
When the client asks the backend service for a database query, the backend service generates a UUID or other secure token and tells the DB to run the query and store the result under that UUID. The backend service also returns the UUID to the client, who can then retrieve the associated data from the DB directly.
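A sketch of that hand-off, assuming a PostgreSQL-style results table (query_results) that the client is allowed to read; the table name, the json_agg aggregation, and the direct interpolation of the user's query are all illustrative simplifications:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.UUID;

public class QueryHandoff {
    // Runs the user's query on the remote DB and stores the result under a fresh
    // UUID in a results table the client may read directly (hypothetical schema:
    // CREATE TABLE query_results(token VARCHAR PRIMARY KEY, payload TEXT)).
    public static String submit(Connection db, String userQuery) throws Exception {
        String token = UUID.randomUUID().toString();
        // NOTE: interpolating userQuery is tolerable only because the premise is
        // that the user composes arbitrary queries anyway; otherwise it's injection.
        String sql = "INSERT INTO query_results(token, payload) "
                   + "SELECT ?, json_agg(t)::text FROM (" + userQuery + ") t";
        try (PreparedStatement ps = db.prepareStatement(sql)) {
            ps.setString(1, token);
            ps.executeUpdate();
        }
        return token; // handed back to the client, who fetches the row from the DB
    }
}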
TLDR:
Europe (Client) -> US (Server) -> Asia (Server) -> Asia (DB)
Open an HTTP server in Asia (if you don't have access to the same DC/server, rent a different one), then redirect the request from the US HTTP server to the Asia one, which connects to the local DB & streams the response.
The redirect can either be a public one (302, sketched below) or private proxying over VPN, if you care about latency & have that option.
The frontend talking to the DB directly is not a very good pattern, because you can't do any of the middleware operations you'll need in the long term (breaking changes, analytics, authorization, redirects, rate-limiting, scalability...).
If your SQL is very heavy & you can't do sync requests over long-lasting TCP connections, set up streaming over a websocket server (also in Asia).
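For completeness, the 302 variant on the US side is tiny; a sketch with the JDK HttpServer, where the Asia hostname is a placeholder:

import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;

public class UsRedirector {
    public static void main(String[] args) throws Exception {
        HttpServer us = HttpServer.create(new InetSocketAddress(8080), 0);
        us.createContext("/query", exchange -> {
            // Hand the client off to the server co-located with the DB in Asia.
            exchange.getResponseHeaders().add("Location", "https://asia.example.com/query");
            exchange.sendResponseHeaders(302, -1);   // -1 = no response body
        });
        us.start();
    }
}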

Filtering SQL profiler web services entries

Is there a way to filter SQL Profiler data to show only data from the current user/session?
I tried using the LoginName or SessionLoginName filters, but the problem is that most of the calls are made by the application's web service, and I see no indication of who called the service.
SQL Server does not have the context of the end client when multiple tiers are involved, so there is no trace column you can filter on to identify requests originating from a specific end-client session. The easiest method is to trace in an isolated test environment with a single client.
If the web service has an end client session context identifier, the service could specify the client session id as the Application Name in the connection string so that you can filter on a specific client session. However, that should generally be done only in a test environment since a separate connection pool is created for each unique connection string.
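As a sketch of that approach in JDBC terms (host, database, and credentials are placeholders), the Microsoft JDBC driver exposes the Application Name via the applicationName connection property:

import java.sql.Connection;
import java.sql.DriverManager;

public class TaggedConnection {
    // Opens a connection whose Application Name carries the end-client session id,
    // so Profiler's ApplicationName column can be filtered per client session.
    // Test environments only: every distinct string gets its own connection pool.
    public static Connection open(String clientSessionId) throws Exception {
        String url = "jdbc:sqlserver://dbhost;databaseName=AppDb;"
                   + "applicationName=websvc-" + clientSessionId;
        return DriverManager.getConnection(url, "svc_user", "svc_password");
    }
}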

No response from server when calling stored procedure in entity framework remotely

I have a Silverlight web app hosted in IIS on an Azure VM. I have another VM hosting SQL Server along with other applications that the web app interacts with. I have set up a LAN between these two VMs using Azure Virtual Network. The VM that runs IIS and hosts the web app is also my domain controller, and I am using Windows authentication to authenticate users in this web app. The application uses Entity Framework to execute some stored procedures.
Overall everything seems to work fine when the web app is accessed both locally and remotely, but there is one particular stored procedure, taking around 25 minutes to execute, that does not behave properly when executed remotely via the web app. Keep in mind that 25 minutes is the expected time, because this stored procedure deals with millions of records.
So here is the problem. When this stored procedure is executed from SQL Server Management Studio, it completes in around 25 minutes. When I execute it from the web app locally, using the internal IP of the server hosting the app, it completes in around 25 minutes, the server sends the response back, and the app updates its status. But when I execute it via the web app remotely, using the server's public IP, the stored procedure executes and completes within 25 minutes on the server, yet the app never gets a response back and stays in busy status.
I know the stored procedure executes because I am tracking it directly on the database server, and I am also using SQL Profiler to track any open connections from Entity Framework; the connection is still there even after the stored procedure completes.
I am also using Fiddler to track all HTTP traffic, and here is what I see.
While executing locally via web app using internal IP
While executing remotely via web app using public IP
All the other stored procedures take a few seconds to execute, and they work both locally and remotely. I am not sure if Azure's endpoint mapping has anything to do with this. When executing this stored procedure, I set the CommandTimeout property to 0.
Any help would be much appreciated!
image links:
local execution: http://i.imgur.com/NGNre3T.png
remote execution: http://i.imgur.com/haB3fwm.png
This could be an idle timeout by the Azure load balancer. Have a look over this page to see if it sheds any light on the problem. Using the latest version of Azure PowerShell, you can check the current timeout setting with:
Get-AzureVM -ServiceName "your-service-name" -Name "your-vm-name" | Get-AzureEndpoint
Depending on your setup, you might be able to set this up using:
Set-AzurePublicIP -PublicIPName "the-ip-name" -VM "your-vm-name" -IdleTimeoutInMinutes 15

Web Application retrieving data from SQL Server 2005

I have the following (and bizarre) situation:
My web application loads some data (by executing one proc) in 8 seconds.
The same proc, when executed directly on SQL Server, runs in 1 second.
I'm pretty sure there is no looping in either.
My question is:
Could a bad IIS configuration cause this?
Thanks.
You might want to add some detail about how you're connecting to your database from your web application. Of course you get slower results from the web application, because it has to open a connection to the DB, read the context, transfer it, and close the connection, whereas in SQL Server you're already connected and the data is just there.
But in your case it is kind of strange to have an 800% difference between your application and direct SQL.
By the way, does your application run on the same PC as SQL Server, or is it on some remote server?
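If the gap really is per-request connection setup and teardown, the usual remedy is a connection pool, so the web application reuses connections instead of opening and closing one per page load. A sketch in Java with HikariCP (URL and credentials are placeholders; the same idea applies to whatever stack the app actually uses):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.ResultSet;

public class PooledAccess {
    private static final HikariDataSource POOL;
    static {
        HikariConfig cfg = new HikariConfig();
        cfg.setJdbcUrl("jdbc:sqlserver://dbhost;databaseName=AppDb");
        cfg.setUsername("app_user");
        cfg.setPassword("app_password");
        cfg.setMaximumPoolSize(10);      // connections are reused, not re-opened
        POOL = new HikariDataSource(cfg);
    }

    public static void runProc() throws Exception {
        try (Connection c = POOL.getConnection();   // cheap: borrowed from the pool
             ResultSet rs = c.prepareCall("{call MyProc}").executeQuery()) {
            while (rs.next()) { /* read rows */ }
        }
    }
}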

How to check if a JPA/Hibernate database is up with second-level caching

I have a JSP/Spring application using Hibernate/JPA connected to a database. I have an external program that checks whether the web server is up every 5 minutes.
The program calls a specific URL to check whether the web server is still running. The server returns "SUCCESS". Obviously if the server is down, nothing is returned, the request times out, and an alert is raised to inform the sysadmin that something is wrong...
I would like to add another layer to this process: I would like the server to return "ERROR" if the database server is down. Is there a way, using Hibernate, to check if the database server is alive and well?
What I thought to do was to take an object and try to save it. That would work, but it's probably too much for what I want. I could also read (load) an object from the database, but since we use second-level caching for all our objects, the object would be loaded from the cache and not from the database.
What I'm looking for is something like:
HibernateUtils.checkDatabase()
Does such a function exist in Hibernate?
You could use a native query, e.g.
Query query = sess.createNativeQuery("select count(*) from mytable").setCacheable(false);
BigDecimal val = (BigDecimal) query.getSingleResult();
That should force Hibernate to use a connection from the pool. If the DB is down, I don't know exactly what error Hibernate will return when it can't get a connection from the pool; if the DB is up, you get the result of your query. (Note that the concrete numeric type returned for count(*) depends on the database driver.)
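If you're on a Hibernate version with doReturningWork (4.x and later), here is an alternative sketch that wraps this as the checkDatabase() helper the question asks for, bypassing the second-level cache by asking the raw JDBC connection directly:

import org.hibernate.Session;

public final class HibernateUtils {
    private HibernateUtils() {}

    // Returns true if a pooled JDBC connection reports the database as reachable.
    public static boolean checkDatabase(Session session) {
        try {
            // doReturningWork hands us the underlying JDBC connection, skipping
            // the caches entirely; isValid(5) pings the DB with a 5-second timeout.
            return session.doReturningWork(connection -> connection.isValid(5));
        } catch (Exception e) {
            return false; // couldn't even obtain a connection: treat the DB as down
        }
    }
}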
