How to fix Maintenance Plans backup error "BackupIoRequest::ReportIoError: read failure on backup device ..." when backing up to a shared folder in MSSQL - sql-server

My backup job in Maintenance Plans has been failing for three days straight (see the errors below).
The job is configured as follows:
(Step 1): Back up the databases to a shared folder on Server A.
(Step 2): If Step 1 succeeds, copy the backups from Server A to the shared folder on Server B.
According to Event Viewer, the job fails at Step 1. I still don't know what causes the error, and, oddly, the backups created on Server A all appear to be complete.
Here is what I have checked so far:
The shared folders on Server A and Server B are both accessible (I also tested this by logging in to Windows with the account used by the SQL services).
Here are the errors found in Event Viewer:
1. "BackupIoRequest::ReportIoError: read failure on backup device '\\Server_A\Shared Folder\Database.bak'. Operating system error 64(The specified network name is no longer available.)."
2. "Package "Backup Databases Directly to Server_A Shared Folder" failed."
3. "SQL Server Scheduled Job 'Backup Databases Directly to Server_A Shared Folder.Subplan_1' (0xD20650B4C8A10F41A130C734D053DB63) - Status: Failed - Invoked on: 2019-08-24 00:00:00 - Message: The job failed. The Job was invoked by Schedule 89 (Backup Databases Directly to Server_A Shared Folder.Subplan_1). The last step to run was step 1 (Subplan_1)."

"As per error from event viewer the error stopped at Step 1. I still don't know where the error came from"
Usually, when you check the SQL Agent job history, you can expand the job and select step 1 to see what caused the error; the details are printed in the message section at the bottom:
"BackupIoRequest::ReportIoError: read failure on backup device '\\Server_A\Shared Folder\Database.bak'. Operating system error 64(The specified network name is no longer available.)"
In this case, most likely either the path is wrong (misspelled), or the SQL Server service account (not only the SQL Agent service account) doesn't have permissions on the shared folder.
When you say the shared folder is accessible from Server A and Server B, you probably verified that by logging in to the server as a particular user, so it is that logged-in user who has permissions on the shared folder. Only if that same user is the SQL Server service account can we assume that SQL Server really has access to the shared folder.
Hopefully, fixing the permission issue will resolve the error.
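One way to confirm which Windows account actually needs access to the share is to ask SQL Server itself. The query below is a minimal sketch, assuming SQL Server 2008 R2 SP1 or later (where the sys.dm_server_services view is available) and a login with VIEW SERVER STATE permission:

```sql
-- List the Windows account behind each SQL Server service on this instance.
-- Assumption: sys.dm_server_services exists (SQL Server 2008 R2 SP1 and later).
SELECT servicename,
       service_account,
       status_desc
FROM sys.dm_server_services;
```

The service_account shown for the Database Engine row (not the Agent row) is the account that writes the backup file, so that is the one to check against the share and folder permissions.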
However, is there anything preventing you from creating the backup directly on the Server B shared folder? The copy looks like an unnecessary extra step to me.
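If backing up straight to Server B is an option, a plain T-SQL backup is enough. This is only a sketch: the database name [Database] is inferred from the Database.bak file in the error message, and the share path on Server B is a placeholder because the question does not name it.

```sql
-- Back up directly to the share on Server B (the UNC path is a placeholder).
BACKUP DATABASE [Database]
TO DISK = N'\\Server_B\Shared Folder\Database.bak'
WITH INIT,       -- overwrite any existing backup sets in this file
     CHECKSUM,   -- compute and store a backup checksum
     STATS = 10; -- report progress every 10 percent

-- Read the file back to confirm the backup set is complete and readable.
RESTORE VERIFYONLY
FROM DISK = N'\\Server_B\Shared Folder\Database.bak'
WITH CHECKSUM;
```

RESTORE VERIFYONLY reads the backup file back over the network, so an unstable connection to the share would probably surface here as well.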
P.S.: I personally recommend using a custom script for maintenance tasks, especially for backups. My answer at DBA.SE might be helpful in your case for a better backup strategy.

Related

MSSQL error: "Script level upgrade for database 'master' failed ... upgrade step 'msdb110_upgrade.sql' encountered error 200, state 7, severity 25."

All of a sudden one day (on my DEV PC) my Microsoft SQL Server 2012 instance (installed as instance name "SQL2012") would not start (all my other installed instances did). Trying to start it manually under Services failed. I don't recall making any recent changes prior to this. The cause of the failure was a mystery.
On inspecting Event Viewer, under System it showed a rather amusing error message [emphasis mine]:
The SQL Server (SQL2012) service terminated with the following service-specific error:
WARNING: You have until SQL Server (SQL2012) to logoff. If you have not logged off at this time, your session will be disconnected, and any open files or devices you have open may lose data.
Checking under the Application event log, I found these two error messages (preceded by a number of MSSQL$SQL2012 informational messages):
Script level upgrade for database 'master' failed because upgrade step 'msdb110_upgrade.sql' encountered error 200, state 7, severity 25. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the 'master' database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.
followed by:
Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online.
Fearing having lost my system databases (and not having a backup of them to restore - who makes backups of their system dbs anyway??) and needing to access the instance, and attached databases - I was willing to try anything. Even the possible restore of the system databases: Restoring the SQL Server Master Database Even Without a Backup - but that looked quite complex.
Fortunately, I was eventually able to start the instance (thank you to this answer: https://stackoverflow.com/a/59676743/4993856 which I trusted, because Pinal Dave also mentions that particular switch in: SQL SERVER – Script level upgrade for database ‘master’ failed because upgrade step msdb110_upgrade.sql encountered error 926, state 1, severity 25) if I ran:
net start MSSQL$SQL2012 /T902
This pointed to some issue with the upgrade script... (Remember SQL is installed with instance name: SQL2012, hence the MSSQL$SQL2012 service name used above for the named instance.)
After some more searching I discovered this post: Installing service pack / cumulative update on SQL Server 2016 / 2017 breaks database engine (not exactly the same SQL version as mine) which pointed to the following possible Region Settings setting (Control Panel [when viewed by 'icons'] > All Control Panel Items > Region > Administrative > "Change system locale..."):
"Beta: Use Unicode UTF-8 for worldwide language support" in Region Settings
THAT WAS IT!!! After de-selecting that option (and possibly restarting my computer), the MSSQL Server 2012 Instance started up without any issue, and I was able to access all my previously attached databases.
I assume the pending upgrade scripts ran successfully. Thinking back about it now, it is possible that I agreed to installing a SQL Update, and never bothered to test access to the instance afterwards.
I also don't recall exactly why I chose to enable that specific setting under Region Settings, possibly due to some Linux compatibility, but it looks like it has become defaulted 'on' in recent Windows builds.
I had the same problem with SQL Server 2017 after applying the Windows patch Hotfix 3391 (KB5001228).
After restarting the server, MSSQL failed to start and Event Viewer showed the error below:
Script level upgrade for database 'master' failed because upgrade step 'msdb110_upgrade.sql' encountered error 200, state 7, severity 25. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the 'master' database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.
Solution
Fix it by deselecting "Beta: Use Unicode UTF-8 for worldwide language support" in the Region Settings.
This requires a server restart. After the restart, MSSQL starts as normal.
The problem is the msdb110_upgrade.sql upgrade script. The script is a bit of a mess, with mixed tabs and spaces (wtf?).
On startup of SQL Server it tries to run a couple of procedures that fail. They fail when the code page is 65001 (usually because the beta UTF-8 code page option has been selected), and so SQL Server fails to start.
This appears to happen any time a SQL Server update is installed. I have only experienced this error with SQL Server 2017, not 2019.
Why?
Don't know? The script is a mess.
Solution
1. Deselect the "use UTF-8" code page option
2. Restart the machine
3. Start SQL Server and let it run the script
4. (optional) Reselect the "use UTF-8" code page option
5. Restart the machine and SQL Server again
6. (optional but recommended) Uninstall Windows, install a Unix and run Postgres

Error 0xC0011008 the package failed to load

I'm facing a strange behavior of SQL Server Agent when executing SSIS packages.
I have a job that includes many steps (mainly SSIS packages). Some steps fail almost every day, even though the configuration is the same for all the steps.
I tried to delete and recreate the job, and to delete and recreate the SQL Server Agent proxy, but with no success.
I can't find any difference between the steps that fail and the ones that succeed.
This is the error returned by SQL Server Agent :
The package failed to load due to error 0xC0011008 "Error loading from XML. No further detailed error information can be specified for this problem because no Events object was passed where detailed error information
SQL Server version : 2014
SSIS version : 2014
EDIT :
In the Event Log I found an Information Message from User Profile Service that says :
Windows detected your registry file is still in use by other applications or services. The file will be unloaded now. The applications or services that hold your registry file may not function properly afterwards
Process 5924 (\Device\HarddiskVolume2\Program Files\Microsoft SQL Server\120\DTS\Binn\DTExec.exe) has opened key \REGISTRY\USER\S-X-X-XX-XXXXXXXXXX-XXXXXXXXXX-XXXXXXXXXX-XXXX\Control Panel\International
Process 5924 (\Device\HarddiskVolume2\Program Files\Microsoft SQL Server\120\DTS\Binn\DTExec.exe) has opened key \REGISTRY\USER\S-X-X-XX-XXXXXXXXXX-XXXXXXXXXX-XXXXXXXXXX-XXXX\Software\Microsoft\Windows\CurrentVersion
The SID corresponds to the proxy user used to execute the SQL job steps, and the timestamp matches the time at which the error occurs in SQL Agent.
I think this is what causes the steps to fail.
Can we prevent Windows from unloading this registry?
The error was indeed caused by the fact that the User Profile Service forces the unloading of the Registry.
The solution that worked for me was to change the policy setting Do not forcefully unload the user registry at user logoff from "Not Configured" to Enabled.
1. Start the Local Group Policy Editor (gpedit.msc)
2. Go to Computer Configuration > Administrative Templates > System > User Profiles
3. Set "Do not forcefully unload the user registry at user logoff" to Enabled
4. Run the gpupdate command
Details can be found here : https://support.microsoft.com/en-us/help/2287297/a-com-application-may-stop-working-on-windows-server-2008-when-a-user

SQL Maintenance Cleanup task not deleting any files, SQL installed on a DC

The generic problem is the one described in "SQL Maintenance Cleanup Task Working but Not Deleting", but none of the solutions there apply. Environment: Windows Server 2012 R2, AD DS (with policies, of course), RDSH/TS Licensing, 1C server. The primary problem is that SQL Server generates an insane number of events per backup plan run, recording a pair of 18456 and 17052 errors for each file to delete. The errors are as follows:
17052: [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'DOMAIN\mssql_srv'
18456: Reason: Could not find a login matching the name provided. [CLIENT: 192.168.x.x] (matches localhost)
Given that each pair of errors appears once per file to delete (there are about 6000 files already!), the algorithm looks like this:
First, backup plan task runs xp_delete_file, it enumerates all the files in target folder;
Second, each file is deleted by creating a separate connection to machine with service's credentials;
Each connection fails due to whatever restrictions default DC policy applies, generating the pair of events. Of course the file remains in place.
The workaround, of course, is to hand the file deletion to something else, for example a local script run as SYSTEM, but the actual reason why SQL Server fails to delete a file remains unknown. Permissions have been checked: both the SQL Server Agent and SQL Server service accounts have full control of the folder.
It turned out that the missing "login" is not a Windows login but a SQL Server login, which did not exist for the service account. So I needed to create a "DOMAIN\mssql_srv" login in SSMS and give it "public" access rights, and voila, files started to get deleted properly. The reason is explained in a comment:
If it's T-SQL step and job owner is member of sysadmin server role, the step is executed under service account.
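The same fix can be scripted instead of being done through the SSMS dialogs. This is a minimal sketch using the DOMAIN\mssql_srv account name from the error message; run it as a sysadmin and substitute the real service account.

```sql
-- Create a SQL Server login for the Windows service account if it is missing.
-- A new login is a member of the public server role by default.
IF NOT EXISTS (SELECT 1
               FROM sys.server_principals
               WHERE name = N'DOMAIN\mssql_srv')
BEGIN
    CREATE LOGIN [DOMAIN\mssql_srv] FROM WINDOWS;
END;
```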

What state is my SQL server database in when msdeploy fails on user creation?

I am using msdeploy (version 2) to transfer a database from machine A to machine B.
In the database on machine A there are some users that do not exist on machine B, so the transfer (partially) fails with the message:
Error Code: ERROR_SQL_EXECUTION_FAILURE
More Information: An error occurred during execution of the database script.
The error occurred between the following lines of the script: "3" and "5".
The verbose log might have more information about the error.
The command started with the following: "CREATE USER [someDomain\someUser] FOR LOGIN [someDomain"
Windows NT user or group 'someDomain\someUser' not found.
Check the name again. http://go.microsoft.com/fwlink/?LinkId=178587
The database seems to be transferred, except for the user creation. Does anyone know what state the database is in after this failure?
Is there any way I can transfer the database without the users (or better without specific users) using msdeploy?
Web Deploy uses SMO (SQL Management Objects) to script out and apply the scripts for SQL databases, and exposes most of the SMO settings with the dbfullsql provider (so, most of these options: http://msdn.microsoft.com/en-us/library/microsoft.sqlserver.management.smo.transfer_properties.aspx). If you want to skip the users due to this kind of login-not-exists or user-not-found error, you should be able to do this by adding the scripting option: copyAllUsers=false to the source of the sync. For example:
msdeploy.exe -verb:sync -source:dbfullsql="Data Source=.\SQLExpress;Initial Catalog=MySourceDb;User Id=localUser;Password=LocalPass",copyAllUsers=false -dest:dbfullsql="Data Source=RemoteSQLServer;Initial Catalog=MyDestDb;User Id=remoteUser;Password=RemotePass"
Incidentally, I am surprised you note the db appears to have been sync'd - I would expect this is not actually the case. If you have the permissions for it, Web Deploy will create the database if it did not already exist when it initially tries to make the connection, but your failure occurred very early in the script execution, and I believe Web Deploy dbfullsql syncs are transacted by default (the db creation is separate from the script execution and is not transacted). Thus the db may exist where it did not pre-sync, but I wouldn't expect the data to be present in it.
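Rather than guessing, the state of the destination can be inspected directly after the failed sync. The queries below are a sketch, assuming the destination database is called MyDestDb as in the example command (a placeholder name) and that the destination is SQL Server 2005 or later, where these catalog views exist.

```sql
-- Is the database there, and is it online?
SELECT name, state_desc, create_date
FROM sys.databases
WHERE name = N'MyDestDb';

USE [MyDestDb];

-- Which user tables exist, and roughly how many rows does each hold?
SELECT SCHEMA_NAME(t.schema_id) AS schema_name,
       t.name                   AS table_name,
       SUM(p.rows)              AS approx_rows
FROM sys.tables AS t
JOIN sys.partitions AS p
  ON p.object_id = t.object_id
 AND p.index_id IN (0, 1)   -- heap or clustered index only
GROUP BY t.schema_id, t.name
ORDER BY schema_name, table_name;

-- Which database users exist (SQL users, Windows users, Windows groups)?
SELECT name, type_desc
FROM sys.database_principals
WHERE type IN ('S', 'U', 'G');
```

An empty table list alongside an online database would be consistent with the database having been created but the script rolled back.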

transactional replication error due to OS error 3

I am for the first time trying to setup transactional replication. This is from an sql 2000 server sp3a to an sql 2005 server which I believe should work.
I did a quick test on my local machine (sql 2005) using it as both the publisher and subscriber and had no trouble setting it up. I repeated a similar process for the real servers using enterprise manager for the 2000 publisher parts and management studio to setup the PULL subscriber. This all seemed to work and the publisher logs seem to be indicating it was preparing the initial data however I am not getting anything coming over as of yet. I checked the logs and am getting an os error 3. I have included the two log sections I think are important below.
2009-07-21 21:37:42.043 The process could not read file 'D:\Program Files\Microsoft SQL Server\MSSQL\ReplData\unc\DOMINO_qlsdat_DOMINO qlsdat to PONGOSQL\20090721164816\enbhostname_1.sch' due to OS error 3.
Message
The replication agent encountered an error and is set to restart within the job step retry interval.
See the previous job step history message or Replication Monitor for more information.
It looks to me like I need to grant share permissions on the replication data. Should I have set up the replication data to go to a share using a UNC path?
OS error 3 is not a permission problem; it is a path problem: Error code: (Win32) 0x3 (3) - The system cannot find the path specified. A permission problem would be error 5: Error code: (Win32) 0x5 (5) - Access is denied.
The path D:\Program Files\Microsoft SQL Server\MSSQL\ReplData\unc\DOMINO_qlsdat_DOMINO qlsdat to PONGOSQL\20090721164816\enbhostname_1.sch is not valid on the server that tries to read it. Usually one has to use UNC paths in replication. I can't know for sure that this is the problem, but it likely is.
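To see which snapshot folder the Distributor is configured with, and to point it at a share instead of a local drive path, something like the following could be run on the Distributor. This is a sketch under assumptions: the publisher name DOMINO is inferred from the snapshot folder name, and \\DOMINO\ReplData is a hypothetical share that would have to exist and be readable by the account the pull subscription's Distribution Agent runs under.

```sql
-- Show the Distributor settings; the "directory" column is the snapshot folder.
EXEC sp_helpdistributor;

-- Switch the snapshot working directory to a UNC share (placeholder path).
EXEC sp_changedistpublisher
     @publisher = N'DOMINO',
     @property  = N'working_directory',
     @value     = N'\\DOMINO\ReplData';
```

A snapshot generated before the change would probably need to be regenerated so that its files land in the new location.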

Resources