Write Hive CREATE TABLE schema by fetching table schema from SQL Server - sql-server

I have tables in SQL Server that need to be cloned as Hive tables. Since the CREATE TABLE schema will be different in the two systems,
I cannot simply reuse the CREATE TABLE statement from SQL Server (or any other RDBMS) in Hive.
I am trying to create a configurable script where one can provide the table schema as input and it would produce the Hive CREATE TABLE statements, or an .hql file containing them.
Has anyone tried something similar?
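A minimal sketch of such a script, assuming pyodbc, a DSN named my_sqlserver, and a deliberately small SQL Server-to-Hive type map; the DSN, table name, mapping, and PARQUET storage clause are all illustrative, not from the question:

import pyodbc

# Deliberately incomplete SQL Server -> Hive type map; extend as needed.
TYPE_MAP = {
    "int": "INT", "bigint": "BIGINT", "smallint": "SMALLINT",
    "bit": "BOOLEAN", "float": "DOUBLE", "real": "FLOAT",
    "date": "DATE", "datetime": "TIMESTAMP", "datetime2": "TIMESTAMP",
    "varchar": "STRING", "nvarchar": "STRING", "char": "STRING",
}

def hive_ddl(conn, table):
    """Build a Hive CREATE TABLE statement from INFORMATION_SCHEMA.COLUMNS."""
    rows = conn.cursor().execute(
        "SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS "
        "WHERE TABLE_NAME = ? ORDER BY ORDINAL_POSITION", table).fetchall()
    cols = ",\n  ".join(
        f"`{name}` {TYPE_MAP.get(dtype, 'STRING')}" for name, dtype in rows)
    return f"CREATE TABLE `{table}` (\n  {cols}\n) STORED AS PARQUET;"

conn = pyodbc.connect("DSN=my_sqlserver")  # hypothetical DSN
print(hive_ddl(conn, "my_table"))          # or write the string to a .hql file

Writing the returned string to a .hql file gives the configurable output described above.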

Related

Databricks SQL doesn't seem to support SQL Server

I've created some Hive tables using JDBC in a Python notebook on Databricks. This was in the Data Science and Engineering UI. I'm able to query the tables in a Databricks notebook and use direct SQL with the %sql magic command.
When switching to the Databricks SQL UI, I'm still able to see the tables in the Hive metastore explorer. However, I'm not able to read the data: a very clear message says that only csv, parquet, and the like are supported.
I find this surprising: since I can use the data in the Data Science and Engineering UI, why is that not the case in Databricks SQL? Is there any solution to overcome this?
Yes, it's a known limitation that Databricks SQL currently supports only file-based formats. As I remember, it's related to the security model, plus the fact that DBSQL uses Photon under the hood, where JDBC integration might not be very performant. You can reach out to your solution architect or customer success engineer for information on whether it will be supported in the future.
The current workaround is to have a job that periodically reads all data from the database via JDBC and dumps it into a Delta table; querying the Delta table could even be more performant than JDBC, and the only issue is the freshness of the data.
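A minimal sketch of that workaround, assuming a scheduled Databricks notebook job where spark and dbutils are predefined; the JDBC host, source table, secret names, and target table are placeholders:

jdbc_url = "jdbc:sqlserver://<host>:1433;database=<db>"

# Read the SQL Server table over JDBC.
df = (spark.read.format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "dbo.my_table")  # hypothetical source table
      .option("user", dbutils.secrets.get("scope", "sql-user"))
      .option("password", dbutils.secrets.get("scope", "sql-password"))
      .load())

# Overwrite keeps the Delta table a full snapshot; freshness then depends
# only on how often this job is scheduled.
(df.write.format("delta")
   .mode("overwrite")
   .saveAsTable("default.my_table_delta"))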
You can import a Hive table from cloud storage into Databricks using an external table and query it using Databricks SQL.
Step 1: Show the CREATE TABLE statement
Issue a SHOW CREATE TABLE <tablename> command on your Hive command line to see the statement that created the table.
Refer to the example below:
hive> SHOW CREATE TABLE wikicc;
OK
CREATE TABLE `wikicc`(
  `country` string,
  `count` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/user/hive/warehouse/wikicc'
TBLPROPERTIES (
  'totalSize'='2335',
  'numRows'='240',
  'rawDataSize'='2095',
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='1',
  'transient_lastDdlTime'='1418173653')
Step 2: Issue a CREATE EXTERNAL TABLE statement
If the statement that is returned uses a CREATE TABLE command, copy the statement and replace CREATE TABLE with CREATE EXTERNAL TABLE.
EXTERNAL ensures that Spark SQL does not delete your data if you drop the table.
You can omit the TBLPROPERTIES field.
DROP TABLE wikicc
CREATE EXTERNAL TABLE `wikicc`(
  `country` string,
  `count` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/user/hive/warehouse/wikicc'
Step 3: Issue SQL commands on your data
SELECT * FROM wikicc
Source: https://docs.databricks.com/data/data-sources/hive-tables.html

SSIS, query Oracle table using IDs from SQL Server?

Here's the basic idea of what I want to do in SSIS:
I have a large query against a production Oracle database, and I need a where clause that brings in a long list of IDs from SQL Server. From there, the results are sent elsewhere.
select ...
from Oracle_table(s) --multi-join
where id in ([select distinct id from SQL_SERVER_table])
Alternatively, I could write the query this way:
select ...
from Oracle_table(s) --multi-join
...
join SQL_SERVER_table sst on sst.ID = Oracle_table.ID
Here are my limitations:
The Oracle query is large and cannot be run without the where id in (...) clause.
This means I cannot run the Oracle query and then join it against the IDs in another step. I tried this, and the DBAs killed the temp table after it grew to 3 TB in size.
I have 160k IDs.
This means it is not practical to iterate through the IDs one by one. In the past, I have run against ~1000 IDs using a comma-separated list; it runs relatively fast, in a few minutes.
The main query is in Oracle, but the IDs are in SQL Server.
I do not have the ability to write to Oracle.
I've found many questions like this, but none of the answers I have found addresses my limitations.
Similar question:
Query a database based on result of query from another database
To prevent loading all rows from the Oracle table, the only way is to apply the filter in the Oracle database engine. I don't think this can be achieved using SSIS alone, since you have more than 160,000 IDs in the SQL Server table, which cannot be efficiently loaded and passed to the Oracle SQL command:
Using Lookups and Merge Join would require loading all data from the Oracle database.
Retrieving the IDs from SQL Server, building a comma-separated string, and passing it to the Oracle SQL command is not feasible with that many IDs (160K).
The same issue arises with a Script Task.
Creating a Linked Server in SQL Server and joining both tables would load all data from the Oracle database.
To solve your problem, you should look for a way to create a link to the SQL Server database from the Oracle engine.
Oracle Heterogeneous Services
I don't have much experience with Oracle databases. Still, after some research, I found that Oracle has an equivalent to SQL Server's "Linked Servers", called "heterogeneous connectivity".
The query syntax should look like this:
select *
from Oracle_table
where id in (select distinct id from SQL_SERVER_table@sqlserverdsn)
You can refer to the following step-by-step guides to read more on how to connect to SQL Server tables from Oracle:
What is Oracle equivalent for Linked Server and can you join with SQL Server?
Making a Connection from Oracle to SQL Server - 1
Making a Connection from Oracle to SQL Server - 2
Heterogeneous Database connections - Oracle to SQL Server
Importing Data from SQL Server to a staging table in Oracle
Another approach is to use a Data Flow Task that imports the IDs from SQL Server into a staging table in Oracle, then use the staging table in your Oracle query. It would be better to create an index on the staging table. (If you do not have permission to write to the Oracle database, try to get permission to a separate staging database.)
Example of exporting data from SQL Server to Oracle:
Export SQL Server Data to Oracle using SSIS
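Outside SSIS, the same staging idea can be sketched in Python, assuming the pyodbc and python-oracledb drivers and a pre-created Oracle staging table stage_ids(id NUMBER); every connection detail below is a placeholder:

import pyodbc    # SQL Server side
import oracledb  # Oracle side (python-oracledb driver)

sql_conn = pyodbc.connect("DSN=my_sqlserver")               # hypothetical DSN
ora_conn = oracledb.connect(user="stage_user", password="...",
                            dsn="orahost/orclpdb")          # hypothetical DSN

ids = [row[0] for row in sql_conn.cursor().execute(
    "SELECT DISTINCT id FROM SQL_SERVER_table")]

with ora_conn.cursor() as cur:
    cur.execute("TRUNCATE TABLE stage_ids")
    # executemany batches the ~160k inserts instead of one round trip each
    cur.executemany("INSERT INTO stage_ids (id) VALUES (:1)",
                    [(i,) for i in ids])
ora_conn.commit()

After the load, the Oracle query can join against stage_ids entirely inside the Oracle engine.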
Minimizing the data load from the Oracle table
If none of the solutions above solves your issue, you can try to minimize the data loaded from the Oracle database as much as possible.
As an example, you can get the minimum and maximum IDs from the SQL Server table and store both values in two variables. Then you can use both variables in the SQL command that loads the data from the Oracle table, like the following:
SELECT * FROM Oracle_Table WHERE ID >= #MinID AND ID <= #MaxID
This will remove a bunch of useless data from your operation. If your ID column is a string, you can use other measures to filter the data, such as the string length or the first character.
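A small sketch of that Min/Max step, assuming pyodbc on the SQL Server side; in SSIS the two values would sit in package variables instead:

import pyodbc

sql_conn = pyodbc.connect("DSN=my_sqlserver")  # hypothetical DSN
min_id, max_id = sql_conn.cursor().execute(
    "SELECT MIN(id), MAX(id) FROM SQL_SERVER_table").fetchone()

# The range filter trims the Oracle scan up front; exact ID matching still
# happens later on the much smaller result set.
oracle_query = (f"SELECT * FROM Oracle_Table "
                f"WHERE ID >= {min_id} AND ID <= {max_id}")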

Using a SQL Server openquery to query a linked DB2 server table using conditions based on a SQL Server table

I have a result set I need to pull from a linked DB2 server table into SQL Server. The table is huge, and I don't want or need to pull the whole thing; I only need the records for a handful of users. The problem is that the User IDs are stored in a SQL Server table, not in the DB2 table. While I have select privileges on the DB2 server, I cannot create a table there, so as far as I'm aware I cannot upload the table of User IDs to the DB2 server. Is there a way to limit the result set pulled from the DB2 server based on the User IDs stored in the SQL Server table?

Is there a way to preserve indexes and keys of SQL table when performing a copy activity using azure data factory

Trying to perform a copy activity from on-prem SQL to Azure SQL.
The source database table has a few indexes and keys, and when I perform the Copy activity to Azure SQL with Auto-generate new table, the indexes and keys are missing on the destination table.
Based on the parameter statements in this official document, there is no guarantee that indexes and keys will be transferred by the ADF copy activity.
As you mentioned in your comment, you have to create them yourself, for example in a stored procedure that can be executed by the ADF copy activity.
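The same post-copy fix can also be sketched outside ADF, for example in Python with pyodbc against the Azure SQL target (in ADF itself the equivalent DDL would go in the stored procedure); every object name below is illustrative:

import pyodbc

conn = pyodbc.connect("DSN=azure_sql_target")  # hypothetical DSN
cur = conn.cursor()

# Recreate the key and index that "Auto-generate new table" did not carry
# over; table, constraint, and column names are placeholders.
cur.execute("ALTER TABLE dbo.my_table "
            "ADD CONSTRAINT PK_my_table PRIMARY KEY (id)")
cur.execute("CREATE NONCLUSTERED INDEX IX_my_table_name "
            "ON dbo.my_table (name)")
conn.commit()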
For more clues, please refer to these threads:
1.https://www.sqlserverlogexplorer.com/copy-table-one-database-another-database/
2.How to copy indexes from one table to another in SQL Server

Exporting procedure result from Oracle to Postgres

I have an Oracle database that has some tables. I need to migrate the table entries from Oracle to a Postgres database. My goal is to write a procedure in PL/SQL that takes an ID as an input parameter. For that ID, it would generate an SQL script that adds the required columns (from the Oracle DB) into the Postgres DB. How can I export the SQL code from Oracle to Postgres?
