OK, not sure what is going on here. I have runaway queries that won't cancel. One of them selects all rows from a table that has only 250 rows and is 1.5 KB in size. It's been running for 30 minutes now, when it should only take a few milliseconds.
I've tried canceling by hitting the abort button on the worksheet; going into History, selecting the query, and hitting abort; aborting by query ID via SQL; and aborting by session ID via SQL.
Ironically, whenever I try to abort via SQL, it reports that the queries have been terminated, yet they still show as running. If I wait a few minutes and re-run the abort, it again reports them as terminated, but they are still running.
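For reference, the abort-via-SQL attempts used the standard system functions, along these lines (the IDs below are placeholders):

    -- Cancel a single query by its query ID (placeholder ID)
    SELECT SYSTEM$CANCEL_QUERY('01a2b3c4-0000-0000-0000-000000000000');

    -- Abort the whole session by its session ID (placeholder ID)
    SELECT SYSTEM$ABORT_SESSION(1234567890);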
I also tried logging out and logging back in again, and am seeing all kinds of weird errors:
Internal Error: Unable to retrieve the current roles.
Error
Problem with your MFA Enrollment: There was an issue with your enrollment
process. Please try again.
Worksheet Not Loaded
I have no idea what is going on but it seems like everywhere I turn there is an issue. Any assistance would be greatly appreciated.
Try logging out completely, closing the browser, rebooting your machine, and starting from there. Here's my guess:
Sometimes the query history (which I assume is where you were seeing things still running) needs a browser refresh, but based on the MFA errors, refreshing your browser appears to have logged you out of your SAML/MFA process.
Once you successfully log in, you'll likely see that the query had already completed before you even tried to cancel it.
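If you'd rather confirm from SQL than trust the UI, something like this (the column list and limit are arbitrary) shows the real execution status of recent queries:

    -- Pull recent query states straight from the metadata layer
    SELECT query_id, query_text, execution_status, start_time, end_time
    FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(RESULT_LIMIT => 100))
    ORDER BY start_time DESC;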
If that isn't the case and you are still seeing issues, then we'd probably need more information, or a quick call to Snowflake Support can walk you through things. My guess is this is all a display issue in your browser/UI rather than something going wonky in Snowflake itself.
Related
We've been using Snowflake+DBT with key pair authentication for a long time now, and we've never had any issues.
Recently, we started getting random connection errors on some models:
250001 (08001): Failed to connect to DB: account.region.snowflakecomputing.com:443. JWT token is invalid.
Most of the models will work, but some of them might fail. This can happen to any model, and it's never the same one -- sometimes it's the very first one, sometimes the very last one, and sometimes a bunch of them. It happens in runs with lots of models and in runs with a single model.
It's very inconsistent, and there doesn't seem to be any kind of pattern to it. Sometimes it works, and sometimes it doesn't.
We've also tried with 1, 4, or 8 threads, and it happens regardless.
Obviously, there's nothing wrong with the credentials or configuration; otherwise, nothing would run at all. So I assume there must be something wrong with how DBT is handling the connection(s).
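(For anyone wanting to rule out a key mismatch anyway, a cheap check is to compare the fingerprint Snowflake has on file against the local public key; the user name below is made up:)

    -- Look at the RSA_PUBLIC_KEY_FP property in the output and compare it
    -- with the SHA-256 fingerprint of the local public key (user name is made up)
    DESC USER dbt_user;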
Interestingly, the error only happens locally (so far). We haven't seen it in DBT Cloud runs.
DBT version is 0.20.2 in both cases. We tried other versions (0.21.0, 0.20.0, and 0.19.1), and the issue persists. I don't know why we're only encountering this now, as we've used those versions previously without any issues.
It's similar to this question, except in our case it doesn't happen consistently at all. We tried connecting "without a region" (using Snowflake Organizations), but it doesn't make any difference:
250001 (08001): Failed to connect to DB: organization-account.snowflakecomputing.com:443. JWT token is invalid.
Is there anything we can do to resolve this?
EDIT: When the error happens, the model hangs for 60 seconds until the error appears.
EDIT 2: I think the error might have started happening when we started using the DBT-provided Docker images. Not sure exactly what might be wrong with them, but we'll try going back to our own custom images and see if that works.
This has since stopped happening. 🤷‍♂️
Possibly because of this change from Snowflake:
"An improvement has been added to Snowflake's Cloud Services layer to relax the validity restrictions."
Also, we are now using different Docker containers, so that might have something to do with it as well:
First, we switched to xemuliam/dbt, from Docker Hub.
But since that is no longer maintained, we are now using the official DBT Docker images from GitHub (which were not available at the time the question was posted):
dbt-core
dbt-postgres
dbt-snowflake
We have a strange situation at a client site, and I wonder if anyone's seen this happen and can shed some light.
A bunch of SSRS reports disappeared.
The Reporting Server logs actually showed a DeleteItemAction entry, but it was performed by the service account that runs SSRS, so there was no one we could go and ask, "Did you delete this?"
Users discovered the missing reports the next morning, and we could see repeated attempts to access them. The strange bits:
Some attempts ended in Permission Denied. That was a surprise, but it could simply have been people trying different accounts because they couldn't see the reports under their usual account.
Most attempts ended in Report Not Found. That's more what we expected to see, and it seemed to confirm that the reports really weren't there anymore.
Then mid-day, the log showed Reporting Services shutting down and then starting up. Shortly after that, we see users accessing the reports successfully. All is well now, but we are left with a mystery as to what happened. Was there malicious action, or could this somehow have happened spontaneously?
We were unable to find any events indicating that the reports were restored, created, or recreated between when they were gone and when they were suddenly back.
How could this happen?
We have a Practice Management system that is about 15 years old. I've been working on it for about 12, and I'd never encountered this problem until just recently; we can't figure it out.
It is written in VB6 and uses ADO/Jet to access an Access .mdb file on the network. The application opens the connection when it starts, keeps it open while it runs, and closes it when it exits. It does a LOT of stuff with the DB: the system deals with patient accounts, charges, payments, appointment scheduling, and about a million other things. We have dozens of clients that use this program, each with their own DB. Most of them are 'offsite', with their own server, some number of workstations, and between 1 and 20 users pounding away at the system 24 hours a day, 7 days a week. Except for the occasional DB field bug or having to compact/repair DBs, it's pretty stable.
About 3 weeks ago, we started seeing a problem that we hadn't seen before.
We have an 'in-house' system set up for us to use: the DB is on our server, which houses maybe 10 other DBs, and only a couple of people connect to this system.
We started noticing that if someone logged into the system, went straight to our Scheduler screen, and then sat idle for about 5-10 minutes, they might get "Disk or Network Error" or "Cannot find the input table or query X on database Y".
What's strange is that it SEEMS to happen only when 2 or more people are logged into that DB from different computers, and then one of them will get the error (randomly?) while the other users are fine.
There is a Timer on the 'main' MDI parent form of the system which wakes up about every minute (some things change the interval to a shorter one, and some things disable the timer, but we don't think either is happening in this situation). It performs a pretty basic SQL query on the DB: SELECT loggedin FROM Users WHERE UserId = 'DBUPDATER'. THIS is the query that always seems to be the one triggering one of those errors.
There is also a timer that runs about every 2 minutes while a user is logged in, checking for emails and a few other things; it would also be running during this time.
And since they are on the Scheduler screen, there's a timer there that runs every 30 seconds or so, checking the DB for scheduler changes to decide whether the screen needs to refresh.
There are some other strange things:
The DB and the Users table seem to be perfectly fine. When the person who gets the error logged into the system - usually only 5 minutes earlier - the system HAD to read the Users table to mark them as logged in, and I'm almost 100% sure the query it's dying on had already run at least once - probably 4-5 times - before it died.
Once a user gets this error, if they leave the error's message box on the screen (rather than quitting the .exe), any attempt to access that DB file in any way gives "Disk or Network Error" - including trying to open the DB in Access. HOWEVER, they can still reach the network just fine, and can even open other .mdb files IN THE SAME FOLDER as the one they can't open. OTHER PEOPLE on other computers can open that .mdb without any errors. Once the user acknowledges the error and allows the .exe to close, they can open the DB file again just fine.
I'm told that this is not in any way a network issue. Our IT guy says he's run pings, traces, and all sorts of tests to ensure that the network connection is not getting dropped.
I've also run some checks on the DB to make sure it's not corrupted, and it seems to be fine - and we can get the error to happen on other DBs as well.
If anyone has seen anything like this and knows a possible fix, I would greatly appreciate it! Thanks!
New Information (9/5/13 - 10:30am)
We noticed that at the same time we get this error, the Windows Event Log on the computer that gets the error shows a warning event that says:
{Delayed Write Failed} Windows was unable to save all the data for the file \\MEDTECHSERVER\MEDTECH DATA\VB\SCHEDTEST2\MEDTECH.MDB; the data has been lost. This error was returned by the server on which the file exists. Please try to save this file elsewhere.
I am experiencing a performance bottleneck on this website: http://oceanosdecolor.es/ and I'm not able to find it. If you try it, you'll see that any page (for example, the homepage) takes a long time to load.
The first time you request a page, the site reloads to detect the client device, but that happens only once; after that it remembers the client device and doesn't reload again.
I log execution traces to a database, but I don't get useful information: according to the log, the whole homepage executes within the same second, yet I can see that the homepage takes longer than that to load.
The IIS log (when trying locally) doesn't help either, as it also reports times in seconds, not milliseconds, and again it says everything happens within the same second; in any case, running locally is much faster than on the server.
So I'm asking for help with any tool that can monitor performance more accurately, or any technique I could use.
Thank you
I think your answer might not lie with IIS or SQL Server. According to the Developer Tools in Chrome, your actual page execution and sending of the HTML takes 400 ms on first load from my location. The problem is that you have a tangle of CSS files (many of which are not being found, causing extremely long delays). You also have a lot of requests.
I would install Yahoo's YSlow for your favourite browser. This will give you a whole bunch of recommendations for what is running slow on your site from an end-user perspective.
To use the Developer Tools in Chrome: right-click on your page, hit "Inspect Element", go to the "Network" tab, and then hard-refresh your browser (Shift+F5).
A few of the problems I see are: commun.css (2.5 seconds and failed), layout.css (400ms and failed), jquery-ui-1.8.10.custom.min.js (800ms and failed).
Find the reason these are failing and fix it, and I'm sure your site will load faster. Also try using CSS image sprites wherever possible to cut down on the number of requests.
I know how to set up a job to alert when it's running.
But I'm writing a job that is meant to run many times a day, and I don't want to be bombarded with emails; rather, I'd like a solution where I get an alert when the job hasn't executed for X minutes.
This could be achieved by setting the job to alert on every execution, and then setting up some process which checks for these alerts and warns when no such alert has been seen for X minutes.
I'm wondering if anyone's already implemented such a thing (or equivalent).
Supporting multiple jobs with different X values would be great.
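For illustration, here's a rough sketch of the kind of watchdog I have in mind, run on its own schedule (the job name, threshold, and recipient are all made up; msdb.dbo.agent_datetime just converts Agent's integer date/time columns):

    -- Alert if the watched job has no successful completion in the last X minutes
    DECLARE @JobName sysname = N'MyFrequentJob';   -- made-up job name
    DECLARE @ThresholdMinutes int = 30;

    IF NOT EXISTS (
        SELECT 1
        FROM msdb.dbo.sysjobs AS j
        JOIN msdb.dbo.sysjobhistory AS h ON h.job_id = j.job_id
        WHERE j.name = @JobName
          AND h.step_id = 0     -- step 0 rows hold the overall job outcome
          AND h.run_status = 1  -- 1 = succeeded
          AND msdb.dbo.agent_datetime(h.run_date, h.run_time)
                >= DATEADD(MINUTE, -@ThresholdMinutes, GETDATE())
    )
    BEGIN
        EXEC msdb.dbo.sp_send_dbmail
            @recipients = N'dba@example.com',
            @subject    = N'Job overdue',
            @body       = N'No successful completion within the threshold.';
    END;

Supporting different X values per job would just mean driving @JobName and @ThresholdMinutes from a small configuration table instead of hard-coding them.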
The danger of this approach is this: suppose you set it up, and one day you receive no emails. What does that mean?
It could mean
the supposed-to-be-running job is running successfully (and silently), and so the absence-of-running monitor job has nothing to say
or alternatively
the supposed-to-be-running job is NOT running successfully, but the absence-of-running monitor job has ALSO failed
or even
your server has caught fire, and can't send any emails even if it wants to
Don't seek to avoid receiving success messages - instead, devise a strategy for coping with them. The only way to know that a job is running successfully is to get a message which says precisely that.