Querying a HIVE Table from SQL Server 2016 or later - sql-server

I'm trying to query my Hortonworks cluster Hive tables from SQL Server. My scenario below:
HDP 2.6, Ambari, HiveServer2
SQL Server 2016 Enterprise
Kerberos configuration for secure logins in HDP
I was reading about the PolyBase service in SQL Server 2016 and I suppose later versions. However, I realize that according to the documentation what this service is going to perform in SQL Server is a bridge to reach out my HDFS and recreate external tables based in this data source.
Otherwise what I'm expecting is query Hive objects like these would be SQL Server objects as well, such as a linked server.
Someone has an example or knows if this is possible within SQL Server and Hive?
Thanks so much

Hive acts more as a job compiler than a database. This means every SQL statement you are writing will be translated into a job for Hadoop, sent over to the cluster and become executed there. From the user perspective it looks like querying a table.
The already mentioned approach by reading the HDFS data source and re-create it in SQL Server is the correct one. Since both, Hive and database server are different technologies, something like a linked server seems to technically not possible for me.
Hive provides nowadays a JDBC interface which could be used to connect to it. But even with Hive JDBC, every query will end up as cluster job for distributed computing, running over the files in HDFS, create a result set and present that to you.

If you want to querying Hive from SQL server, you can download ODBC driver (Microsoft or Hortonsworks) and create a Data Source Name (DSN) for Hive. In Advanced option check Use Native Query. Then just create new linked server in the SQL server with the same name of datasource as Data Source Name in ODBC driver.
Write openquery something like:
select top 100 * from
openquery(HadoopLinkedServer,
'column1, column2 from databaseInHadoop.tableInHadoop')

Related

Move data between different servers

I'm working on a project where I need to automatically run an insert statement to insert a result set - problem is that I need it to go from a SQL Server over to a DB2 server. I can't create a file or script and then import it or run it on the other side. I need to insert or update the DB2 side from the SQL Server side.
Is this possible? I need this to run all by itself as part of a stored procedure in SQL Server.
You're looking for the linked server feature.
Typically linked servers are configured to enable the Database Engine to execute a Transact-SQL statement that includes tables in another instance of SQL Server, or another database product such as Oracle. Many types OLE DB data sources can be configured as linked servers, including Microsoft Access and Excel. Linked servers offer the following advantages:
The ability to access data from outside of SQL Server.
The ability to issue distributed queries, updates, commands, and transactions on heterogeneous data sources across the enterprise.
The ability to address diverse data sources similarly.
(I believe most of the major RDBMSs have a similar feature)
For the most part, this essentially allows you to treat tables or sources in the other database as if they were part of the SQL Server instance - an INSERT statement should just work "normally".
As mentioned you can use a linked server on the SQL Server side to perform operations between two servers. I haven't done much with running DML on DB2 from SQL Server, but from my experience SSIS performs far better than linked servers for transactions pulling data from DB2 to SQL Server using an OLE DB connection. You can read more about OLE DB connections in SSIS here and you'll want to reference the DB2 documentation for the specific DB2 type (Mainframe, LUW, etc.) that's used for details on setting up the connection there. If you setup the SSIS catalog you can run packages using SQL Server stored procedures, which you can either use directly or execute from an existing user stored procedures.

Oracle ODBC connection to SQL Server: Table Name Length Issues

I'm running into a problem when accessing a SQL Server table from an Oracle setup via ODBC.
I can access 90% of the tables absolutely fine, but there's a few tables that have a name that's longer than 30 characters. Whenever I try to interact with the table (describes, selects, etc) Oracle throws an "identifier too long" error and gives up.
Is there a way to coax Oracle into playing nice with the SQL Server tables?
Assuming that we are talking about an Oracle database that has a database link created to a SQL Server database via Heterogeneous Services, you would need to write code using the DBMS_HS_PASSTHROUGH package to interact with the tables in question. You'd also need to use this package if you have tables where there are column names that are not valid Oracle identifiers.

Access as the front end and sql server as the backend

I have some Access tables with many number of fields. I have migrated each access table to 6 or 7 sql server tables. I am using sql server 2008. Now I want to use Access as the front end so that I can enter the data in access but it will be stored in sql server. I know that I have to make a ODBC connection. But I am not sure of how to create a access form to use it as a front-end. I am sorry if it's a basic question...
You will probably want to start with an empty Access database (since the table structures and any existing forms and reports will not match what you created in SQL server).
First step is to establish an ODBC connection to your SQL Server database. Then you will "link" the tables in SQL Server to your Access database.
Now, you have an Access database with all the tables that you linked from SQL Server. Those tables still "live" in SQL Server and when you edit them in Access the data will be stored in SQL Server.
You can then build Access forms and reports using these tables just as if the tables were native to Access.
The most versatile way is to use ODBC links to your SQL Server tables and views. That approach allows you the flexibility to link to other ODBC data sources, tables in other Jet/ACE database files, create Jet/ACE tables locally in your database, link to Excel spreadsheets, and so forth. You can incorporate a broad range of data sources.
If you choose ADP, you will be limited to OLE DB connection to a single SQL Server instance. And you will be essentially locked in to SQL Server. You would not be able to switch the application to a different client-server database without a major re-development effort.
Regarding deployment overhead with ODBC, although you may find it convenient to use a DSN during development, you should convert your ODBC links to DSN-less connections before deployment. That way your user's won't each require the DSN. See Doug Steele's page: Using DSN-Less Connections
Well you can create an ODBC connection. You can also create an ADODB connection as well. If your objective is to update or modify a SQL database, both connections will do the trick.
Now, I guess you have to get familiar with the corresponding objects. These should be tables, queries, commands, etc .., that will allow you, for example, to build recordsets out of SQL queries ... Once you are clear with that, you can, for example, assign a recordset to a form through the Set myForm.recordset = myRecordset.open ... method.

How do you pull data from SQL Server to Oracle?

I'm wanting to take data from a SQL Server table and populate a Oracle table. Right now, my solution is to dump the data into a Excel table, write a macro to create a sql file that I can load into Oracle. The problem with this is I want to automate this process and I'm not sure I can automate this.
Is there an easy way to automate populating a Oracle table with data from a SQL Server table?
Thanks in advance
I suppose it depends on your definition of "easy".
The most robust approach would be to either use heterogeneous connectivity in Oracle to create a database link to the SQL Server database and then pull the data from SQL Server or to create a linked server in SQL Server that connects to Oracle and then push the data from SQL Server to Oracle.
Yes. Take a look at MS SQL's SSIS which stands for SQL Server Integration Services. SSIS allows all sorts of advanced capabilities, including automated with Sql Server Jobs, for moving data between disparate data sources. In your case, connecting to Oracle can be achieved a variety of ways.
There are three ways to automate this:
1) You can do as Paul suggested and created an SSIS package that will do this and it can be scheduled via SQL Agent,
2) If you don't want to deal with SSIS, you can download the free SQL# (SQLsharp) CLR Library from http://www.SQLsharp.com/ and use the DB_BulkCopy Stored Procedure to do this in a T-SQL Stored Proc which can also be scheduled via SQL Agent. [note: I am the author of SQL#]
3) You can also set up a Linked Server from SQL Server to Oracle, but this has the draw-back of being a potential security hole. Of course, you could use an Oracle Login that only has write-access to that single table (or something similar to that).
There are lots and lots of ways to do it. Which you choose depends on your requirements.
Using Excel is fine if it's a one time thing.
If it's a once-in-a-while thing, then you could write a simple .NET app that uses a single DataSet and multiple DataAdapters to do the data dump. C# code example here.
if it's a regular thing, then you could put the above in a Schtasks task, or you could use SSIS. I think SSIS is an extra-cost option.
if the requirement is for "online access", then a linked database is probably appropriate.

How can I copy data records between two instances of an SQLServer database

I need to copy some records from our SQLServer 2005 test server to our live server. It's a flat lookup table, so no foreign keys or other referential integrity to worry about.
I could key-in the records again on the live server, but this is tiresome. I could export the test server records and table data in its entirety into an SQL script and run that, but I don't want to overwrite the records present on the live system, only add to them.
How can I select just the records I want and get them transferred or otherwise into the live server? We don't have Sharepoint, which I understand would allow me to copy them directly between the two instances.
If your production SQL server and test SQL server can talk, you could just do in with a SQL insert statement.
first run the following on your test server:
Execute sp_addlinkedserver PRODUCTION_SERVER_NAME
Then just create the insert statement:
INSERT INTO [PRODUCTION_SERVER_NAME].DATABASE_NAME.dbo.TABLE_NAME (Names_of_Columns_to_be_inserted)
SELECT Names_of_Columns_to_be_inserted
FROM TABLE_NAME
I use SQL Server Management Studio and do an Export Task by right-clicking the database and going to Task>Export. I think it works across servers as well as databases but I'm not sure.
An SSIS package would be best suited to do the transfer, it would take literally seconds to setup!
I would just script to sql and run on the other server for quick and dirty transferring. If this is something that you will be doing often and you need to set up a mechanism, SQL Server Integration Services (SSIS) which is similar to the older Data Transformation Services (DTS) are designed for this sort of thing. You develop the solution in a mini-Visual Studio environment and can build very complex solutions for moving and transforming data.

Resources