We have a SQL cluster in an Azure environment that experienced a failover/recovery incident about a week ago. Since shortly after that, this appears every 30 seconds in the Event Viewer on the primary database node:
Event 60605, Microsoft SQL Server Server Status Reporting
[Error] ConnectivityReportTcpPortUnknown: Could not determine sqlPort for MSSQLSERVER
I'm not 100% certain that it is related to the failover, but it seems so. I've searched and can't find anything on this particular error code. It most certainly reeks of a monitor or related event, as it's pretty consistent in its frequency.
I've researched the Azure logs, which report nothing relating to this event (and nothing about our failover event either, which was network-connectivity related!).
I've also disabled all third-party monitoring that we have on that node.
I figured that, with the low response, this must be a bug. After talking with Microsoft, it turns out that it is in fact a bug in the Microsoft SQL Server IaaS Agent that runs on Azure VMs. The agent handles some of the new Automatic Patching and Automatic Backup features on Azure but, unfortunately, does not support SQL installations that listen on multiple ports (as ours does).
Two oddities: this only started a week ago (and the service was updated recently), and even in Manual startup mode, the service restarts itself when it's stopped.
Microsoft has confirmed that this indeed is a bug.
I can connect to my database server (Azure hosted) with SQL Server Management Studio, however I am unable to enumerate its databases.
On the Azure portal, I am seeing a similar issue:
The fact is, these databases are up and running. My application continues to function. I am also able to issue queries via the SSMS query window. The main problem here is the inability to enumerate databases on the server. If you have any idea how I might solve this, thank you very much for any advice you can provide!
This was an issue with the Azure platform. Here is what we received from our hosting provider in response to our inquiry:
SQL and Open-Source Database Service Management Issues - East US - Mitigated (Tracking ID 8K76-LZ8)
Summary of impact: Between approximately 13:30 and 16:30 UTC on 19 May 2020, a subset of customers in East US may have intermittently experienced timeouts and latency issues when processing service management operations - such as create, update, delete - for SQL resources hosted in this region, including Azure SQL Database and open-source databases such as Azure Database for PostgreSQL, Azure Database for MySQL, and Azure Database for MariaDB. Some customers may have also encountered issues or experienced latency when loading database management tools or expanding database resources in SQL Server Management Studio (SSMS). Retries may have been successful.
Preliminary root cause: After an initial investigation, engineers identified that an increased volume of requests had consumed a large number of available private endpoint connections for the SQL control plane in East US, thus leading to failures for subsequent control plane operations in the region.
Mitigation: Once the preliminary root cause was identified, engineering teams terminated connections from the source of the increased load.
Next steps: We apologize for the impact to affected customers. Engineers will continue to investigate the underlying cause and take steps to prevent future occurrences, which includes:
• Further investigating and understanding the source of increased workload on the SQL platform
• Increasing the resiliency of the SQL platform to better account for this type of work volume
[End of Report]
Anyway, I am posting this so that anyone who encounters this issue in the future should know - it may not be your issue. It may be an issue with the Azure hosting platform in which case you will just need to wait until engineers resolve the issue at their end.
I have a number of VMs on Windows Azure (IaaS) hosting a website. There are a number of load-balanced front-end VMs, all connecting to a single VM with SQL Express. It works well.
However!
I'm getting random restarts across all the VMs. As for the front-end VMs (with IIS), since they are load balanced, the site is not affected and the load balancer adjusts accordingly. But when the VM hosting the database is restarted, the site is down until the DB is up again. It takes < 3min to boot up, but that's still unacceptable if it happens frequently enough. Although the restarts are relatively rare (2 a month per VM), sometimes we get a week with 4 restarts per VM, which gets frustratingly annoying. Not all VMs restart as frequently and I cannot figure out a pattern. Restarts are also unexpected (pull-the-power-cable type of restarts, and not shutdowns). Datacenter is West Europe.
Microsoft emphasises that the SLA only covers 2 VMs in an availability set, which I can't have for the database VM (and the enterprise SQL edition costs an arm and three legs). Also, SQL Azure isn't an option as the application is very chatty, and the SQL Azure database was being throttled during peak times (though it works super smooth with SQL Express on a Medium VM!).
My question(s):
Is it normal to have so many restarts? Are there other people having the same problem? What is your experience with such an environment on Azure? What can I do to minimise this downtime?
Thanks all!
Is it normal to have so many restarts?
Yes, this can happen in a given month. You need to stand up SQL Server in high availability mode to really get this to work.
Yes it does cost an arm and leg. ;(
What is your experience with such an environment on Azure?
Some months are really good, some months are bad; it depends on your cluster and which datacenter you are in. MS have a mixed range of hardware out in their datacenters. That does not mean they are running on old laptops in some datacenters, but it does mean that, in my experience, the newer datacenters tend to have better kit in them and thus fewer restarts. For example, we use USA East.
What can I do to minimise this downtime?
High availability with a witness is the only way to give you availability on VMs, and yes, it costs an arm and a leg.
Other serious options:
Cache. You should use a local (on-machine) cache and Azure Cache, and try to minimize your calls to the database. This might make your app less chatty and allow you to step back to SQL Azure, or it might at least buy you enough time for the failover to recover.
Queues. Queues would help your application recover and let you show your users a "we are working on it" message.
Use SQL Azure as a failover. Sync data from on-premises to SQL Azure using SQL Azure Sync (not sure this works with Express), and write code in your app to pick up the connection error and fail over.
Look at using other parts of Azure for parts of your app to reduce the number of calls coming into SQL. For example, can you move stuff to table storage?
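To make the "pick up the connection error and fail over" idea a bit more concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the connector callables are hypothetical stand-ins for whatever your driver uses to open a connection (one for the SQL Express VM, one for SQL Azure), and it assumes driver errors are surfaced as ConnectionError.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def connect_with_failover(
    connectors: Sequence[Callable[[], T]],
    retries_per_target: int = 2,
    delay_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each connector in order and return the first connection that opens.

    `connectors` are hypothetical zero-argument callables, listed in order of
    preference (primary first, fallback last).
    """
    last_error: Optional[Exception] = None
    for connect in connectors:
        for attempt in range(retries_per_target):
            try:
                return connect()
            except ConnectionError as exc:  # assumption: driver errors map to this
                last_error = exc
                # Pause before retrying the same target, but not before moving on.
                if attempt < retries_per_target - 1:
                    sleep(delay_seconds)
    raise ConnectionError("all database targets failed") from last_error
```

Retrying the primary a couple of times before switching avoids failing over on a single transient blip; whether that trade-off suits a 3-minute VM reboot depends on how stale the synced copy is allowed to be.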
Hope this gives you some ideas.
Windows Azure Infrastructure Services (IaaS) has only been in General Availability (GA, or production) about 3 weeks, since April 16 (see announcement here). Prior to GA, there was no SLA and you would have seen more frequent OS restarts as various patches were still being applied to the Host OS. Are you saying that this pattern has continued at the same velocity since April 16?
Now that IaaS is GA, I wouldn't expect 4 restarts in a week. That said: there are several reasons you'd see a restart:
Host hardware failure (this takes down all Guest OSs running on that host)
Host software update (and only if requiring a restart of the Host OS). Host OS reboots shouldn't be happening at the frequency you're seeing.
Guest OS issues. Here's where things depart from PaaS (web/worker role Cloud Services). In IaaS, there's no Guest OS maintenance done by Azure; this is all in your hands. It's possible to get reboots if installing Windows Updates automatically. Possibly you could be running into an application-level issue causing the box to become unresponsive for a long period of time, resulting in the Azure fabric controller rebooting your box as it thinks it's unhealthy. And... your app could be somehow crashing the box.
If you've ruled out application error and are sure the VMs are in good health at the time they're rebooting, you may need to open a support ticket with Microsoft to help diagnose the issue further.
I am trying to get our DBAs to enable DTC on a SQL Server 2005 cluster. Unfortunately they keep refusing. Their argument is that they would need to set up a dedicated host for DTC (could take months!!), as it is not a matter of ticking a few boxes. Is this true? How intrusive is DTC in a shared environment such as a SQL farm? Do I have an argument against this?
Thanks
Had to tone down the original response your 'DBA' team deserve!
In response to your questions:
Dedicated server - Not at all. Everywhere I've worked with clusters, the DTC service is installed when the cluster is commissioned. Typically it sits in its own resource group or within the cluster group. If in its own group, it usually sits on whichever server is hosting the cluster group.
Intrusive? - Absolutely not. It should be installed when the cluster is created, as per MS best practice.
Do you have an argument? - You most certainly do. The links below should cover the why and how for getting it installed:
MSDTC and SQL on a Cluster
Clustered SQL Server do's, dont's and basic warnings
DTC needs to be enabled and running on both sides of the connection. In my organization, it took some research to figure out which four boxes to check and then some hand-holding to get those boxes checked on all db servers, all app servers and most laptops. There's still a couple of hold-out developer laptops... but they're ok as long as they don't write. :)
You should have some driving scenario (such as an atomic multiple database write) to hit the DBAs over the head with. Give them some time to guess at alternatives... then let them know that DTC is the only hammer for this kind of nail.
I'm unsure of the implications of DTC on a SQL farm. I imagine the whole farm could get involved in the transaction if it involves enough data... which can't be a good thing.
I'm running SQL Server Management Studio 2008 on a decent machine. Even if it is the only thing open, with no other connections to the database, anything that has to do with the Database Diagram or simple schema changes in a designer takes up to 10 minutes to complete, and SQL Server Management Studio is unresponsive during that time. The same SQL code takes less than a second. This entirely defeats the purpose of the designers and diagrammers.
------------------
System Information
------------------
Operating System: Windows Vista™ Ultimate (6.0, Build 6001) Service Pack 1 (6001.vistasp1_gdr.080917-1612)
Processor: Intel(R) Core(TM)2 Quad CPU Q6700 @ 2.66GHz (4 CPUs), ~2.7GHz
Memory: 6142MB RAM
Please tell me this isn't a WOW64 problem; if it is, I love MS, but step up your 64-bit support in development tools.
Is there anything I can do to get the performance anywhere near acceptable?
Edit:
I've got version 10.0.1600.22 of SQL Server Management Studio installed. Is this not the latest release? I'm sure I installed it from an MSDN CD and I pretty much rely on Windows Update these days. Is there any place I can quickly see what the latest release version number is for tools like this?
Edit:
Every time I go to open a database diagram I get the message "This database does not have one or more of the support objects required to use database diagramming. Do you wish to create them?" I say yes every time. Is this part of the problem? Also, if I press the copy icon, I get the message "Current thread must be set to single thread apartment (STA) mode before OLE calls can be made." Database corruption?
I'm running in a similar environment and not having that problem.
As with any performance problem, you'll have to analyze it a bit - just saying "it takes 10 minutes" gives no information on the reason it takes so long, so no information you can use to solve the problem.
Here are some tools to play around with. I'd have mentioned them originally, but "play around" is all I've learned to do with them. I'd recommend you try learning a little about them, which I have not done. http://technet.microsoft.com is a good source on performance issues.
Start with Task Manager, believe it or not. It's been enhanced in Vista and Server 2008, and now has a better Performance tab, and a Services tab. Be sure to click "Show processes from all users", or you'll miss nasty things done by services.
The bottom of the Performance tab has a "Resource Monitor" button. Click it, watch it, learn what it can do for you.
The Resource Monitor is actually part of a larger "Reliability and Performance Monitor" tool in Administrative Tools. Try it. It even includes the new version of perfmon, which will be more useful when you have a better idea what counters to look at.
I will also suggest the Process Explorer and Process Monitor tools from Sysinternals. See http://technet.microsoft.com/en-us/sysinternals/default.aspx.
Do your simple schema changes possibly mean that you're reordering the columns of a table?
In that case, what SQL Management Studio does behind the scenes is create a new table, move all the data from the old table to the newly created table, and then drop the old table.
Thus, if you reorder columns on a table with lots of data, lots of indices or both, you CAN incur a massive amount of "reorganization" work without really realizing it.
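To see why that gets expensive, here is a small Python sketch of the same copy, drop and rename pattern. It uses SQLite only because it ships with Python; it is not the script Management Studio generates, and the `person` table and `reorder_columns` helper are made up for illustration. Note that this shortcut also loses indexes and constraints, which a real rebuild would have to recreate.

```python
import sqlite3


def reorder_columns(conn, table, new_order):
    """Rebuild `table` with its columns in `new_order`.

    Sketch only: names are interpolated directly, so pass trusted identifiers.
    CREATE TABLE ... AS SELECT does not carry over indexes or constraints.
    """
    cols = ", ".join(new_order)
    tmp = f"{table}_tmp"
    # Every row is physically copied into the new table here.
    conn.execute(f"CREATE TABLE {tmp} AS SELECT {cols} FROM {table}")
    conn.execute(f"DROP TABLE {table}")
    conn.execute(f"ALTER TABLE {tmp} RENAME TO {table}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Ann", 30), (2, "Bob", 41)],
)
reorder_columns(conn, "person", ["id", "age", "name"])
columns = [row[1] for row in conn.execute("PRAGMA table_info(person)")]
rows = conn.execute("SELECT * FROM person ORDER BY id").fetchall()
```

The work grows with the number of rows, which is why a designer change that looks trivial can run for minutes on a large table.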
Marc
Can you try connecting your SQL Management Studio to a different instance of SQL Server or, better, an instance on a remote machine (and try to make similar changes)?
Are there any entries in the System or Application Event Logs (or SQL logs for that matter)? Have you tried uninstalling and reinstalling SQL Server on your machine? What version of SQL Server (database) are you running?
Lastly, can you open the Activity Monitor successfully? Right click on the server (machine name) - top of the three in the object explorer window - and click on 'Activity Monitor'.
Do you have problems with other software on your machine or only with SQL Server & Management Studio?
When you open SSMS it attempts to validate itself with Microsoft. You can speed this process by performing the second of the recommendations at the following link.
http://www.sql-server-performance.com/faq/sql_server_management_studio_load_time_p1.aspx
Also, are you using the registered servers feature? If so SSMS will attempt to validate all of these.
It seems as though it was a network configuration problem. Never trust a developer (myself) to set up a haphazard domain at his office.
The DNS server setting on my computer was pointed at my ISP's instead of my own DNS server here. That is the default, because the ISP-provided wireless router we're using doesn't let me override the DNS server with my own, so I have to remember to configure it manually on each computer, and I forgot to do so on this particular one.
I only discovered it when I tried to connect for the first time to a remote SQL Server instance from this PC. It was trying to resolve to an actual sub-domain of mycompany.com instead of my DNS server's authority of COMPUTERNAME.corp.mycompany.com
I can't say why this was an issue for the designers in SQL Server but not anything else, but my only hypothesis is that when I established a connection to my own computer locally using the computer name instead of "." or "localhost", SQL queries executed immediately, knowing it was local, but the designers still waited for a timeout from the external IP address before trying the local one.
Whatever the explanation is, changing my DNS server for my network card on the local machine to my DNS server's IP made it all work very quickly.
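If you want to check for the same thing on your own machine, a rough way is to time how long a name lookup takes. This is a minimal Python sketch, not something I used at the time; `time_lookup` is a made-up helper, and the `resolver` parameter exists only so the function can be exercised without a network.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, addresses) for resolving `hostname`.

    `addresses` is a sorted list of IP strings, or None if resolution failed.
    """
    start = time.perf_counter()
    try:
        results = resolver(hostname, None)
        # Each getaddrinfo entry ends with a sockaddr tuple; item 0 is the IP.
        addresses = sorted({info[4][0] for info in results})
    except socket.gaierror:
        addresses = None
    return time.perf_counter() - start, addresses
```

Calling it with your own computer name and again with "localhost" should show whether one of them is waiting on an external DNS server: a lookup that takes seconds, or comes back with an unexpected external address, points at the kind of misconfiguration described above.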
I had a similar issue with mine. Turned out to be some interference with the biometrics login service running on my laptop. Disabled the service and now it works.
After having a period of log shipping failures going unnoticed (due to a stopped SQL Agent on the secondary server), I'm looking at configuring some monitoring.
Having seen the ability to specify a "Monitor Server instance" in the SQL Server 2008 log shipping setup and the relevant MSDN docs (http://msdn.microsoft.com/en-us/library/bb510705.aspx), I'm keen to set up a "Monitor Server". However, having trawled MSDN and Google, I actually can't find any information on how to configure a Monitor Server.
At this point I don't even know what a monitor server is (i.e. is it just another SQL Server instance?)! Also, is this separate from the monitoring modules in SQL Server Management Studio?
Any guidance or pointers to documentation would be much appreciated.
The log shipping monitor is a third server that watches the primary and standby servers and keeps a log of the backups, copies and restores so you can find out what's going on. The benefit of using a monitor is that you still know your status even if one or both of the participating servers goes down. For instance, if the primary server goes down you know how much data you will lose if you fail over to the standby server.
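As a rough illustration of the arithmetic the monitor's history lets you do, here is a small Python sketch. The function names and timestamps are hypothetical; in practice the times would come from the history the monitor keeps, and the threshold is whatever your setup considers too long without a restore.

```python
from datetime import datetime, timedelta


def data_loss_window(primary_failed_at, last_restored_backup_at):
    """Span of changes that would be lost by failing over to the standby."""
    if last_restored_backup_at > primary_failed_at:
        raise ValueError("restore timestamp is later than the failure time")
    return primary_failed_at - last_restored_backup_at


def restore_is_overdue(last_restored_backup_at, now, threshold):
    """True when no restore has happened within `threshold` (worth an alert)."""
    return now - last_restored_backup_at > threshold


# Hypothetical timestamps for illustration only.
failed_at = datetime(2010, 3, 1, 12, 0)
last_restore = datetime(2010, 3, 1, 11, 40)
window = data_loss_window(failed_at, last_restore)
overdue = restore_is_overdue(last_restore, failed_at, timedelta(minutes=15))
```

A check like `restore_is_overdue` is also the kind of test that would have caught restores silently stopping on the secondary.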
I'm pretty sure you can add a monitor to your existing log shipping setup. This article should help you learn more:
http://technet.microsoft.com/en-us/library/ms190640.aspx
Well, I don't know if I'm right, but as far as I can see the monitor in SQL Server is not an instance, it is a module. This post explains what happened with the monitor in the last release; I hope this information helps you. As for how to configure it, I only found information for the Activity Monitor option.
See you