Limit rows while fetching records using QueryDatabaseTable in NiFi

I am trying to use the QueryDatabaseTable processor in Apache NiFi.
Is there any way I can limit the records, something like: "select * from table limit 100"?
Is there any other processor in NiFi which supports this operation?

Use the ExecuteSQL processor for this case.
Configure and enable a DBCP Connection Pool controller service.
In the SQL select query property, set your select query:
select * from table limit 100
The processor then runs the configured SQL select query and outputs the results as a FlowFile in Avro format.
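Note that the row-limiting syntax belongs to the database, not to NiFi, so it has to match whatever engine sits behind the DBCP pool. The query above is MySQL/PostgreSQL-flavored; roughly equivalent forms in other engines (the table name is a placeholder) are:
select * from table limit 100                        -- MySQL, PostgreSQL
select top 100 * from table                          -- SQL Server
select * from table fetch first 100 rows only        -- DB2, Oracle 12c and later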

Related

Golang SQL: More Efficient to Query Two Tables at Once or Separate Queries/Connection Pools?

I have a connection pool for database A and database B. I am moving some Node.js code over to Go (I'm using SQL Server, if that matters), and some of the queries do this:
db.A.Query(`
select ... from some_table;
select ... from B..other_table;
`)
Is it better to do it that way, or like:
db.A.Query(...)
db.B.Query(...)
I read this line:
create one sql.DB object for each distinct datastore you need to access
from here. And only now do I realize I read 'datastore' as 'database', so now I'm not even sure if it's efficient to have these two database connection pools!
Thank you for any help!
For most scenarios and SQL Server client programs, sending multiple SELECT queries in one batch is not materially more efficient. Perhaps if the queries returned very small result sets and you ran them at very high frequency, you could see a material difference. But in the typical case, whether you send the queries in one batch or two won't matter much.
It won't matter to SQL Server at all, so the only difference will be in the client/server network traffic.
SSMS will let you compare the client statistics between running the queries as a one-batch script and as a multi-batch script, e.g. running
select top 10 * from sys.objects
select top 5 * from sys.columns
and then
select top 10 * from sys.objects
GO
select top 5 * from sys.columns
in SSMS shows the client statistics for each run side by side.
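If you'd rather measure this from client code than from SSMS, the same one-batch versus two-batch comparison is easy to script. Below is a minimal sketch using pyodbc (a stand-in for the Go code in the question; the connection string is a placeholder). Go's database/sql exposes the same multi-result-set mechanics through Rows.NextResultSet:

import pyodbc

con = pyodbc.connect('sql_server_odbc_connection_string...')  # placeholder DSN
crs = con.cursor()

# one round trip, two result sets
crs.execute('select top 10 * from sys.objects; select top 5 * from sys.columns;')
objects = crs.fetchall()
crs.nextset()  # advance to the second result set
columns = crs.fetchall()

# two round trips, one result set each
objects = crs.execute('select top 10 * from sys.objects').fetchall()
columns = crs.execute('select top 5 * from sys.columns').fetchall()

Timing the two variants under your real query mix is more telling than any general rule.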

Select on External table running very very slow on Azure SQL

We have an external table created, and we need to run a select that returns all of its records. The select runs very slowly; it does not complete even after 30 minutes, and the table contains around 2 million records.
We also need to query this table from another DB, and that runs very slowly as well, not returning even after 30 minutes.
Select is of the form:
select col1, col2,...col3 from ext_table;
Need help with:
1. Any suggestions on reducing the time taken for execution?
Note: we need to select the entire content of the table, so a where condition cannot be used.
Thanks in advance.
If you are not using a WHERE clause to push predicates down to the remote database, then there is no way to optimize the performance of the query: you are returning the whole table.
My suggestion is to use SQL Data Sync to keep a local copy of the table on this SQL Database that synchronizes with the remote Azure SQL Database at a set interval.
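If Data Sync is more machinery than you need, a scheduled job that materializes the external table into a local one gives a similar result. A minimal sketch, using the hypothetical names from the question (SELECT ... INTO requires that the target table not already exist, so a real job would drop or truncate it first):
SELECT col1, col2, col3
INTO ext_table_local_copy
FROM ext_table;
Local queries then read ext_table_local_copy instead of paying the remote-fetch cost on every select.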

Quicker ways to migrate data from DB2 to SQL Server?

I'm in the process of migrating data from DB2 to SQL Server using a linked server and OPENQUERY, like below:
--SET STATISTICS IO on
-- Number of records are: 18176484
select * INTO [DBName].[DBO].Table1
FROM OPENQUERY(DB2,
'Select * From OPERATIONS.Table1')
This query takes 9 hrs and 17 mins to insert the 18,176,484 records.
Is there any other way to insert the records more quickly? Can I use the OPENROWSET function to do a bulk insert? Or would an SSIS package improve performance and take less time? Please help.
You probably want to export the data to a CSV file, as in this answer on Stack Overflow:
EXPORT TO result.csv OF DEL MODIFIED BY NOCHARDEL SELECT col1, col2, coln FROM testtable;
(Exporting result of select statement to CSV format in DB2)
Once it's a CSV file, you can import it into SQL Server using either BCP or SSIS, both of which are extremely fast, especially if you use a table lock on the target table.
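For the BCP route, the import can be a one-liner along these lines (the server name and file path are assumptions; -h "TABLOCK" requests the table-level lock mentioned above, and -b sets the batch size):
bcp DBName.dbo.Table1 in result.csv -S your_server -T -c -t, -h "TABLOCK" -b 10000
The -c and -t, switches match the comma-delimited character format that the DB2 EXPORT above produces.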

Table hint on linked server query

We have a DB2 database on an AS400. We added a linked server and all went well; however, the table is occasionally locked, even when we are only running select queries. Thinking about table hints in SQL Server: does a linked server query (e.g. select * from ...) support table hints?
Doubtful, but I don't know for sure.
Are you using openquery(), or 4-part names?
A query using 4-part names like so:
select * from LNKSVRNAME.IBMINAME.MYSCHEMA.MYTABLE where somecolumn = '00335';
Pulls back all the rows from MYTABLE and does the WHERE filtering on MS SQL Server.
In contrast, using openquery() like so:
select * from openquery(LNKSVRNAME, 'select * from MYSCHEMA.MYTABLE where somecolumn = ''00335''');
Sends the query to the IBM i, and only the matching rows from MYTABLE are pulled back into MS SQL Server.
If the table is being locked exclusively, there's not much you can do. However, if you're running into row locks, you may want to look at the following DB2 for IBM i clauses:
FOR READ ONLY
SKIP LOCKED DATA or USE CURRENTLY COMMITTED or WAIT FOR OUTCOME
So something like this:
select * from openquery(LNKSVRNAME, 'select * from MYSCHEMA.MYTABLE where somecolumn = ''00335'' FOR READ ONLY USE CURRENTLY COMMITTED');
Note: if you are actually talking to an AS/400, FOR READ ONLY is all you'll have available. But if you're talking to a relatively recent IBM POWER system running a relatively recent version of IBM i, then the concurrent-access-resolution clauses shown above should be available.

Run query on SQL Server through Teradata and store result in Teradata

I have one table in SQL Server and 5 tables in Teradata. I want to join those 5 tables in Teradata with the SQL Server table and store the result in a Teradata table.
I have the SQL Server name, but I don't know how to run a query against both SQL Server and Teradata at the same time.
I want to do this:
SQL Server table query:
Select distinct store
from store_Desc
Teradata tables:
select cmp_id,state,sde
from xyz
where store in (
select distinct store
from sql server table)
You can create a table (or a volatile table if you do not have write privileges) to do this. Export the result from SQL Server as text or into the language of your choice.
CREATE VOLATILE TABLE store_table (
column_1 datatype_1,
column_2 datatype_2,
...
column_n datatype_n);
You may need to add ON COMMIT PRESERVE ROWS before the ; above, depending on your transaction settings.
From a language, you can loop the insert below or do an executemany.
INSERT INTO store_table VALUES(value_1, value_2, ..., value_n);
Or you can import from a text file using Teradata SQL Assistant by going to File and selecting Import. Then execute the below and navigate to your file.
INSERT INTO store_table VALUES(?, ?, ..., n);
Once you have inserted your data you can query it by simply referencing the table name.
SELECT cmp_id,state,sde
FROM xyz
WHERE store IN(
SELECT store
FROM store_table)
The DISTINCT is most easily done on export from SQL Server, to minimize the rows you need to upload.
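For that export step, a bcp queryout one-liner can produce the deduplicated file directly (the server name, database name, and output path are placeholders):
bcp "select distinct store from YourDB.dbo.store_Desc" queryout stores.txt -S your_server -T -c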
EDIT:
If you are doing this many times, you can script it; here is a very simple example in Python:
import pyodbc

con_ss = pyodbc.connect('sql_server_odbc_connection_string...')
crs_ss = con_ss.cursor()

con_td = pyodbc.connect('teradata_odbc_connection_string...')
crs_td = con_td.cursor()

# pull the data from SQL Server
data_ss = crs_ss.execute('''
    SELECT DISTINCT store AS store
    FROM store_Desc
''').fetchall()

# create the volatile table in Teradata
crs_td.execute('''
    CREATE VOLATILE TABLE store_table (
        store DEC(4, 0)
    ) PRIMARY INDEX (store)
    ON COMMIT PRESERVE ROWS;''')
con_td.commit()

# insert values; you can also use executemany, but this is easier to read...
for row in data_ss:
    crs_td.execute('''INSERT INTO store_table VALUES(?)''', row)
con_td.commit()

# get the final data
data_td = crs_td.execute('''
    SELECT cmp_id, state, sde
    FROM xyz
    WHERE store IN (
        SELECT store
        FROM store_table);''').fetchall()

# from here, write to a file or whatever you would like.
Is fetching the data from SQL Server through ODBC an option?
The best option may be to use Teradata Parallel Transporter (TPT) to fetch the data from SQL Server with its ODBC operator (as the producer), combined with the Load or Update operator as the consumer, to insert it into an intermediate table on Teradata. You must then perform the rest of the operations on Teradata; for those, you can use BTEQ/SQLA to store the results in the final Teradata table. You can also put the same SQL in TPT's DDL operator instead of BTEQ/SQLA and get it all done in a single job script.
To allow tables residing in separate DB environments (in your case SQL Server and Teradata) to be used in a single select statement, Teradata has recently released Teradata QueryGrid. But I'm not sure about the exact level of support for SQL Server, and it will involve licensing hassle and quite a learning curve for this simple job.
