Polybase: Can't connect to Azure Blob from SQL Server - sql-server

I am trying out the new PolyBase feature in SQL Server by connecting to a CSV file. However, I cannot connect to Azure Blob Storage:
CREATE EXTERNAL DATA SOURCE AzureBlob WITH (
TYPE = HADOOP,
LOCATION = 'wasbs://myfolder@myblob.blob.core.windows.net',
CREDENTIAL = mycredential
);
GO
I always get an error saying:
Incorrect syntax near 'HADOOP'
My SQL Server runs on an Azure VM; however, I am not sure which services are supposed to be running:
I also checked that TCP/IP is enabled.
I also tried using SSDT and .dsql files as suggested in this post, but the error doesn't go away.

However I do not manage to connect to the Azure Blob Storage
Shouldn't it be TYPE = BLOB_STORAGE?
CREATE EXTERNAL DATA SOURCE AzureBlob WITH (
TYPE = BLOB_STORAGE,
LOCATION = 'wasbs://myfolder@myblob.blob.core.windows.net',
CREDENTIAL = mycredential
);
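For context, a hedged sketch of how a BLOB_STORAGE-type source is typically used: it takes an https:// URL rather than wasbs://, and it serves BULK INSERT / OPENROWSET rather than external tables. All names below (container, table, file) are placeholders, not from the question:

```sql
-- Sketch only: BLOB_STORAGE sources take an https:// URL, not wasbs://,
-- and are consumed by BULK INSERT / OPENROWSET, not CREATE EXTERNAL TABLE.
CREATE EXTERNAL DATA SOURCE AzureBlob WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://myblob.blob.core.windows.net/myfolder',
    CREDENTIAL = mycredential
);

-- Hypothetical target table and file name.
BULK INSERT dbo.MyTable
FROM 'data.csv'
WITH (DATA_SOURCE = 'AzureBlob', FORMAT = 'CSV', FIRSTROW = 2);
```

This split between the two TYPE values is why the follow-up below found that BLOB_STORAGE does not support external tables.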
Update 2020-02-18:
I encountered the same famous message recently:
Incorrect syntax near 'HADOOP'
It can be fixed by running:
exec sp_configure 'polybase enabled', 1;
GO
RECONFIGURE
Microsoft built a nice page: Configure PolyBase to access external data in Azure Blob Storage. However, they didn't include that important command.
I suspect this could also have been the cause of the initial issue.

While I accepted Alexander's answer, it turns out that the BLOB_STORAGE option doesn't allow creating external tables. The HADOOP option was the correct one for me. There were three steps I needed to take to make the HADOOP option work:
Re-install Java Runtime Environment
Repair the SQL Server Installation
Restart the Virtual Machine
Then the SQL statement from my question worked.
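Putting the pieces together, the working HADOOP-type sequence looked roughly like the following. This is a hedged sketch: the storage key, container, file path, and column names are all placeholders, not values from the question:

```sql
-- Sketch of the full HADOOP-type setup; all names/secrets are placeholders.
-- A database master key must already exist for the scoped credential.
CREATE DATABASE SCOPED CREDENTIAL mycredential
WITH IDENTITY = 'any string',                 -- identity is not checked for wasbs
     SECRET   = '<storage account access key>';

CREATE EXTERNAL DATA SOURCE AzureBlob WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://myfolder@myblob.blob.core.windows.net',
    CREDENTIAL = mycredential
);

CREATE EXTERNAL FILE FORMAT csvformat WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);

-- Hypothetical external table over a hypothetical CSV in the container.
CREATE EXTERNAL TABLE dbo.MyCsv (
    Id   INT,
    Name NVARCHAR(100)
) WITH (
    LOCATION = '/data.csv',
    DATA_SOURCE = AzureBlob,
    FILE_FORMAT = csvformat
);
```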

Related

SQL Server: accessing CSV files as linked server with Windows authentication

Wish you all a happy new year!
I periodically import data from CSV files downloaded from a web application. I have created the linked server LS_Text for the local path with the CSV files. I import the file content with INSERT INTO ... SELECT queries in a stored procedure.
This works fine as long as I log in to SSMS as SA. But if I log in with my Windows authentication, I get the error
Cannot initialize the data source object of OLE DB provider
"Microsoft.ACE.OLEDB.12.0" for linked server "LS_Text".
I plan to import the files by an application that calls the stored procedure. So Windows authentication has to work.
What does the error message mean? The same error results if the file path does not exist. So it looks as if SQL Server or the OLE DB provider can't see the folder with the CSV files. But I saved the files myself with my credentials.
I have created the linked server with the following batch:
EXEC sp_addlinkedserver @server = N'LS_Text', @srvproduct=N'CSVFLATFILE',
@provider=N'Microsoft.ACE.OLEDB.12.0', @datasrc=N'C:\Trans', @provstr=N'Text;HDR=Yes';
EXEC sp_addlinkedsrvlogin @rmtsrvname = N'LS_Text', @useself = 'false',
@locallogin = NULL, @rmtuser = NULL, @rmtpassword = NULL;
EXEC sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'AllowInProcess', 1;
As far as I understand, @useself = 'false', @rmtuser = NULL, @rmtpassword = NULL means that the linked server can be accessed without a login and password. I have tried all sorts of other combinations, but without success.
Articles found on Google for this error message deal with OPENROWSET rather than linked servers, or with ACE driver configuration. But that is not the issue, since it works as SA.
So how can I query the CSV files with Windows authentication? Any hint is appreciated.
Perhaps this is not a full answer, but it should hopefully help you debug this kind of issue yourself. Since, as you mention, it works as sa, the likely problem is related to the user/login mapping. There's an example in the sp_addlinkedsrvlogin documentation describing how to map a specific Windows login. That might be worth trying for your credentials to see if it works.
Second, there are ways to delve into what is happening in the server's code path when it loads and uses the provider. A reasonable blog post can be found here; it is about talking to Oracle, but the important content is about how to set up trace events for linked servers and see what happens once you start executing your query. (Linked server vs. OPENROWSET to a linked server should not matter, but note that the term OPENROWSET was overloaded in SQL to allow different code paths, including some that don't go through OLE DB at all, or not through this specific OLE DB provider, as David mentions in the comments to your question.) Tracing the actions before the error may point out a spot where things fail differently in your Windows login case vs. the sa/admin path.
Finally, since the Jet (now ACE) provider is fundamentally a DLL that gets loaded into the SQL Server process and then performs file system operations to load a file, it may be valuable to use Procmon to monitor the process and see if some operation is failing (such as reading a registry key or opening a file inside the provider). It doesn't seem to be the most likely problem given that sa works for you, but it may be a useful tool.
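A hedged sketch of that first suggestion, mapping one specific Windows login explicitly (DOMAIN\SomeUser is a placeholder for your own login):

```sql
-- Sketch: map a specific Windows login to the linked server.
-- With @useself = 'true', SQL Server impersonates that Windows login
-- when it calls into the provider, instead of the NULL/NULL mapping.
EXEC sp_addlinkedsrvlogin
    @rmtsrvname = N'LS_Text',
    @useself    = 'true',
    @locallogin = N'DOMAIN\SomeUser';
```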
You also asked about the error message; I'll try to explain. (I wrote the original Jet OLE DB provider that was later renamed to ACE, after I changed teams.) In OLE DB, there are COM interfaces that conceptually "live" on four main internal classes; this is documented in the OLE DB programmer's guide. The Data Source object is the top-most object, and it means somewhat different things to different data sources. The second-level concept is a "session". In Jet/ACE these two concepts are not really distinct, as you just have a connection to a file, but in SQL Server and other server products the data source object is a connection to a server and the session is an individual connection to that server. The error you are getting says that the initial connection/authentication to the provider is failing. It could mean one of several things, but I'd start by examining "the mapping from SQL's login to the login for Jet/ACE is not working properly".
Net-net: if you can load CSVs through the normal paths (OPENROWSET(BULK ...) with the CSV format options), your life is probably going to be better in the long run, as David suggests.
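A minimal sketch of that bulk path, using the closely related BULK INSERT statement (SQL Server 2017+ for FORMAT = 'CSV'); the file path, staging table, and column layout are assumptions, not from the question:

```sql
-- Sketch: load the CSV directly into a staging table without a linked
-- server. Path and table are hypothetical; the file must be readable
-- by whichever security context the statement runs under.
BULK INSERT dbo.Trans_Staging
FROM 'C:\Trans\data.csv'
WITH (
    FORMAT   = 'CSV',   -- RFC 4180-style parsing, SQL Server 2017+
    FIRSTROW = 2        -- skip the header row (HDR=Yes equivalent)
);
```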
Best of luck debugging your problem whatever path you pick.

Unable to run SQL Server stored procedure query in SSMS after adding credential parameter

I am new to using stored procedures and Azure storage accounts. I am following this guide:
https://www.sqlshack.com/how-to-connect-and-perform-a-sql-server-database-restore-from-azure-blob-storage/
and have created a credential in my database 'Security' > 'Credential' folder in SSMS.
Query that I ran in SSMS:
--using the url and the key
CREATE CREDENTIAL [Credential_BLOB]
WITH IDENTITY= 'https://<account>.blob.core.windows.net/',
SECRET = '<storage account key -> which I enter my Access Key 1>';
After which I proceed to run the following stored procedure where I want to restore the backup from BLOB storage:
RESTORE DATABASE Database_Name FROM URL = 'https://<account>.blob.core.windows.net/Container/SampleDatabase.bak'
WITH CREDENTIAL = 'Credential_BLOB';
And I get this error:
Msg 41901, Level 16, State 2, Line 3
One or more of the options (credential) are not supported for this statement in SQL Database Managed Instance. Review the documentation for supported options.
However, from the guide which I input the link above, they were able to run the query:
I tried to google the syntax of the RESTORE statement in the Microsoft Docs library and looked for others who may have encountered a similar issue, but I did not find anything effective. I would appreciate your help if you have encountered something similar and would like to share your solution. Thank you!
From the error you have shared, it is clear that you are using Azure SQL Managed Instance. But the link you shared doesn't mention anywhere which flavour of SQL Server it is using. The approach in that link might not work in your case because of differences between SQL Server flavours and statement compatibility.
Then I tried the steps given in the Microsoft official documentation (link shared by @Nick.McDermaid in the comment section). It works fine without any issue.
Please follow the steps below to achieve the requirement (applicable for SQL Server 2016 (13.x) and later, Azure SQL Managed Instance only).
Use the GUI in SQL Server Management Studio to create the credential by following the steps below.
Connect with your SQL Server 2016 (13.x) and later or Azure SQL Managed Instance
Right-click your database name, hover over Tasks and then select Back up to launch the Back Up Database wizard.
Select URL from the Back up to destination drop-down, and then select Add to launch the Select Backup Destination dialog box.
Select New container on the Select Backup Destination dialog box to launch the Connect to a Microsoft Subscription window.
Sign in to the Azure portal by selecting Sign In and then proceed through the sign-in process. Select your subscription from the drop-down.
Select your storage account from the drop-down. Select the container you created already from the drop-down. Select Create Credential to generate your Shared Access Signature (SAS). Save this value as you'll need it for the restore.
I also tried to restore the database using the newly created credential and it is working fine.
To create the credential using T-SQL, please follow the steps provided in this link.
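For reference, the T-SQL equivalent of the SAS-based credential the wizard creates looks roughly like this (account, container, and token are placeholders). Note that with a SAS credential, the credential name must be the container URL, and the RESTORE statement no longer takes a WITH CREDENTIAL option, which is what triggered error 41901 above:

```sql
-- Sketch: SAS-based credential for Managed Instance. The credential
-- name is the container URL; SECRET is the SAS token without the
-- leading '?'. All values are placeholders.
CREATE CREDENTIAL [https://<account>.blob.core.windows.net/<container>]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET   = '<SAS token without leading ?>';

-- No WITH CREDENTIAL clause: the URL is matched to the credential above.
RESTORE DATABASE Database_Name
FROM URL = 'https://<account>.blob.core.windows.net/<container>/SampleDatabase.bak';
```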

Bulk Insert fails when executed from remote client but succeeds locally

Please find the diagram as below for my issue:
I have 3 servers in the same domain. There is a SQL Server instance A (its Windows service runs under domain\User1). In this instance we have a stored procedure that uses BULK INSERT to load a text file from a network shared folder on server C; domain\User1 has full permissions on this folder.
My issue is: the stored procedure runs OK (green arrow) when connecting via SSMS on its own machine (server A). But it fails when I switch to SSMS on server B (logged in with the same domain\User1 to the same instance A). The error is "Access denied" to the text file (red arrow). Does the client play a role in this? I would think the client does not matter, since the file read is done from the server (by the user that runs the instance A service).
Note: if I connect to instance A from SSMS on B with a SQL logon user (not a Windows account), the stored procedure works fine.
Could anyone give me some advice? And sorry for my bad English.
This is just a link answer but hopefully it helps.
BTW I commend you for taking the time to analyse the issue to the extent of drawing a diagram. This is far higher quality than most questions on here.
I believe you are running into a double hop issue. I searched everywhere for the BULK INSERT permission model and finally found this https://dba.stackexchange.com/questions/189676/why-is-bulk-insert-considered-dangerous
which says this about using BULK INSERT:
When accessing SQL Server via a Windows Login, that Windows account
will be impersonated (even if you switch the security context using
EXECUTE AS LOGIN='...') for doing the file system access
and this
when accessing SQL Server via a SQL Server Login, then the external
access is done in the context of the SQL Server service account
When you have issues with Windows authentication, three servers, and impersonation, it's often a double hop issue.
This may help you with that:
https://dba.stackexchange.com/questions/44524/bulk-insert-through-network
Which in turn references this:
https://thesqldude.com/2011/12/30/how-to-sql-server-bulk-insert-with-constrained-delegation-access-is-denied/
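To illustrate the security-context difference the quotes describe (the share path and target table below are placeholders): the very same statement succeeds under a SQL login, because file access then happens as the service account, while a Windows login is impersonated and cannot make the second hop without delegation:

```sql
-- Placeholder sketch. Run under a SQL login, the file on \\ServerC is
-- read as the service account (domain\User1), which has access.
-- Run under a Windows login, SQL Server impersonates that login, and
-- without (constrained) delegation its credentials cannot hop to Server C.
BULK INSERT dbo.ImportTarget
FROM '\\ServerC\Share\data.txt'
WITH (
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2
);
```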

Azure SQL Serverless Database

I created a SQL Server database in Azure which is serverless and tried to access it using SQL Server Management Studio on my local machine, but I couldn't get it to work.
It always gives me this message:
I also tried whitelisting my IP in Azure, but I still get the same result.
Is there a possible way to make it connect?
Is the database currently online or paused?
I'll repeat the text from @David Browne's link:
If a serverless database is paused, then the first login will resume the database and return an error stating that the database is unavailable with error code 40613. Once the database is resumed, the login must be retried to establish connectivity. Database clients with connection retry logic should not need to be modified.
So;
Assuming the database is paused, this is normal operation
Please read docs
You need to retry after the database starts OR manually pre-start it using the Powershell provided in the link below
https://learn.microsoft.com/en-us/azure/sql-database/sql-database-serverless#connectivity
And yes, you also need to whitelist your IP address as you have already done.
Obviously this flavour of SQL is unsuitable for some types of applications; there is more information in the link. I suggest you read the whole thing.

Polybase doesn't create external file format

I get the good old
Incorrect syntax near 'EXTERNAL'.
error. I am doing exactly what this answer describes, but SQL Server returns the aforementioned error when I get to this code chunk:
CREATE EXTERNAL FILE FORMAT csvformat
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (FIELD_TERMINATOR =',')
);
GO
What am I doing wrong?
What I tried
Java runtime environment is installed (Java 8 Update 201)
PolyBase is installed with "PolyBase Query Service for External Data"
I enabled PolyBase with EXEC sp_configure 'hadoop connectivity', 4;. I also set that option to 1 and 7 - I still get that error
Using EXEC sp_configure, I also set 'polybase enabled' to 1
I checked SELECT SERVERPROPERTY ('IsPolybaseInstalled') AS IsPolybaseInstalled; - it returns 1
My TCP is enabled
My PolyBase is running:
Setup: SQL Server 2019 on a Virtual Machine (Azure), no Azure SQL Server or Azure DWH.
Perhaps too simple an answer, but can you restart the entire virtual machine and try again?
Update: a reboot of the server/service after installation of PolyBase is not stated in the documentation, nor requested by the installer; however, plenty of messages on user boards say it is required to make PolyBase work.
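After the reboot, the PolyBase state can be double-checked before retrying the failing statement. This is a hedged sketch using only the checks already mentioned in the question:

```sql
-- Verify PolyBase is installed and enabled before retrying
-- the CREATE EXTERNAL FILE FORMAT statement.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;  -- expect 1

EXEC sp_configure 'polybase enabled';     -- run_value should be 1
EXEC sp_configure 'polybase enabled', 1;  -- enable it if not
RECONFIGURE;
```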
