How to have multiple INSERT statements within a single PyFlink SQL job? - apache-flink

Is it possible to have more than one INSERT INTO ... SELECT ... statement within a single PyFlink job (on Flink 1.13.6)?
I have a number of output tables that I create, and I am trying to write to these within a single job. The example Python & SQL looks like this (assume there is an input table called 'input'):
sql1 = "INSERT INTO out1 (col1, col2) SELECT col1, col2 FROM input"
sql2 = "INSERT INTO out2 (col3, col4) SELECT col3, col4 FROM input"
env.execute_sql(sql1)
env.execute_sql(sql2)
When this is run in a Flink cluster on Kinesis on AWS, I get a failure:
Cannot have more than one execute() or executeAsync() call in a single
environment.
When I look at the Flink web UI, I can see that there is one job called insert-into_default_catalog.default_database.out1. Does Flink separate out each INSERT statement into a separate job? It looks like it tries to create one job for the first query and then fails to create a second job for the second query.
Is there any way of getting it to run as a single job using SQL, without having to move away from SQL and the Table API?

If you want to run multiple INSERTs in one job, you need to wrap them in a statement set; Flink optimizes all statements in the set together and submits them as a single job:
stmt_set = table_env.create_statement_set()
# only a single INSERT statement can be passed to the `add_insert_sql` method
stmt_set.add_insert_sql(sql1)
stmt_set.add_insert_sql(sql2)
# execute all statements together
table_result = stmt_set.execute()
# get job status through TableResult
print(table_result.get_job_client().get_job_status())
See the docs for more info.

Related

TVP stored procedure taking longer to load than usual

Our table-valued stored procedures are taking longer to load data than usual. Please find the details below:
Number of rows: 1.3M
Time taken to load (old): 3 min
Time taken to load (new): 25-40 min.
Nothing has changed in the server. Can you please give me pointers as to what needs to be looked into?
Application: we have deployed an SSIS package in SQL Server, which extracts data from different Teradata sources in a loop, creates a DataTable, and executes a stored procedure using a Script Component.
Regards,
Swathi S
I have tried putting the insert within a transaction:
INSERT INTO <target table> (col1, col2, ...)  -- target table name omitted in the post
SELECT col1, col2, ...
FROM #tvpvariable
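For reference, a minimal sketch of what that insert wrapped in an explicit transaction might look like; the target table and column names are placeholders, not from the original post:
SET NOCOUNT ON;
BEGIN TRANSACTION;
    -- hypothetical target table; replace with the real one
    INSERT INTO dbo.TargetTable (col1, col2)
    SELECT col1, col2
    FROM #tvpvariable;
COMMIT TRANSACTION;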

Excel - SQL Query - ## Temp Table

I am trying to create a global temp table using the results from one query, which can then be selected as a table and manipulated further several times without having to reprocess the data over and over.
This works perfectly in SQL Server Management Studio, but when I try to add the table through an Excel query, the table can be referenced at that time, but it is not created in Temporary Tables in the tempdb database.
I have broken it down into a simple example.
If I run this in SQL Server Management Studio, the result of 1 is returned as expected, and the table ##testtable1 is created in Temporary Tables:
set nocount on;
select 1 as 'Val1', 2 as 'Val2' into ##testtable1
select Val1 from ##testtable1
I can then run another select on this table, even in a different session, as you'd expect. E.g.
Select Val2 from ##testtable1
If I don't drop ##testtable1, running the below in a query in Excel returns the result of 2 as you'd expect.
Select Val2 from ##testtable1
However, if I run the same Select... into ##testtable1 query directly in Excel, it correctly returns the result of 1, but the temp table is not created.
If I then try to run
Select Val2 from ##testtable1
As a separate query, it errors saying "Invalid object name '##testtable1'".
The table is not listed within Temporary Tables in SQL Server Management Studio.
It is as if it is performing a drop on the table after the query has finished executing, even though I am not calling a drop.
How can I resolve this?
Read up on global temp tables (GTT). They persist as long as there is a session referencing them. In SSMS, if you close the session that created the GTT prior to using it in another session, the GTT is discarded. This is what is happening in Excel: Excel creates a connection, executes, and disconnects. Since there are no sessions using the GTT when Excel disconnects, the GTT is discarded.
I would highly recommend you create a normal table rather than use a GTT. Because of their temporary nature and dependence on an active session, you may get inconsistent results when using a GTT. If you create a normal table instead, you can be certain it will still exist when you try to use it later.
The code to create/clean the table is pretty simple.
IF OBJECT_ID('db.schema.tablename') IS NOT NULL
    TRUNCATE TABLE [tablename]
ELSE
    CREATE TABLE [tablename] (...)  -- column definitions omitted
GO
You can change the truncate to a delete to clean up a specific set of data and place it at the start of each one of your queries.
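For example, a minimal sketch of that pattern, assuming a hypothetical permanent staging table dbo.ExcelStaging keyed by a ReportDate column (all names and columns are placeholders):
-- create the permanent staging table once
IF OBJECT_ID('dbo.ExcelStaging') IS NULL
    CREATE TABLE dbo.ExcelStaging (ReportDate date NOT NULL, Val1 int, Val2 int);
GO
-- at the start of each query: clean out only the slice being refreshed, then reload it
DELETE FROM dbo.ExcelStaging WHERE ReportDate = CAST(GETDATE() AS date);
INSERT INTO dbo.ExcelStaging (ReportDate, Val1, Val2)
SELECT CAST(GETDATE() AS date), 1, 2;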
Is it possible you could use a view? Assuming that you are connecting to 5 DBs on the same server, you can union the data together in a view:
CREATE VIEW [dbo].[testView]
AS
SELECT *
FROM database1.dbo.myTable
UNION
SELECT *
FROM database2.dbo.myTable
Then in Excel:
Data > New Query > From Database > From SQL Server Database
enter the DB server
select the view from the appropriate DB - done :)
Or call the view however you are doing it (e.g. VBA etc.)
Equally, you could use a stored procedure and call that from VBA... basically anything that moves more of the complexity to the server side to make your life easier :D
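A minimal sketch of that stored-procedure variant, assuming the testView from above (the procedure name is a placeholder); Excel, or VBA via ADO, can then use EXEC dbo.GetCombinedData as the query text:
CREATE PROCEDURE dbo.GetCombinedData
AS
BEGIN
    SET NOCOUNT ON;
    -- or inline the UNION query here instead of selecting from the view
    SELECT *
    FROM [dbo].[testView];
END
GO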
You can absolutely do this. Notice how I'm building a temp table from a SQL string called 'TmpSql'... this could be any query you want. Then I assign it to recordset 1. Then I create another recordset, 2, that goes and gets the temp table data.
Imagine if you were looping on the first cn.Execute where TmpSql is changing. This allows you to build a temporary table coming from many sources or changing variables. This is a powerful solution.
Dim cn As New ADODB.Connection, rs1 As ADODB.Recordset, rs2 As New ADODB.Recordset

cn.Open "Provider= ..."                        ' connection string elided in the original
sql = "Select t.* Into #TTable From (" & TmpSql & ") t "
Set rs1 = cn.Execute(sql)                      ' builds the temp table on this connection

GetTmp = "Select * From #TTable"
rs2.Open GetTmp, cn, adOpenDynamic, adLockBatchOptimistic
If Not rs2.EOF Then Call Sheets("Data").Range("A2").CopyFromRecordset(rs2)

rs2.Close
rs1.Close
cn.Close

Bulk Insert with database connector with different payload and queries

I am using the Mule database connector to insert and update records in a database. I have different queries, such as insert and update on different tables, and the payload for each will be different as well. How can I achieve bulk operations for this? Can I save the queries in a flow variable as a list, save the corresponding values in another list, and pass both to the database flow? Will that work?
I want to generate raw SQL queries, save them to a file, and then use bulk execute on that. Does Mule provide any toString method to convert a query with placeholders into the actual raw query?
For example, to turn the query
update mytable set column1 = #[payload.column1], column2 = #[payload.id]
into
update mytable set column1 = 'stringvalue', column2 = 1234 ;
Mule's database connector does support bulk operations. You can select Bulk Execute as the operation; the implementation is described in the connector once you select it.
With regard to making the query dynamic, you can pass the values in from variables or property files, whichever is more convenient.
You can have a stored procedure for insert and update that accepts its input parameters as an array. Send the records in blocks inside a for loop by setting a batch size; this results in fewer round trips. A rough sketch of such a procedure follows the link below.
Below is a link to an article with all the details:
https://dzone.com/articles/passing-java-arrays-in-oracle-stored-procedure-fro
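As an illustration only: the linked article targets Oracle, while this hypothetical sketch uses SQL Server table-valued parameters instead, and every object name here is made up. A procedure that accepts a whole batch of rows in one call could look like:
CREATE TYPE dbo.RecordBatch AS TABLE (id INT PRIMARY KEY, column1 VARCHAR(200));
GO
CREATE PROCEDURE dbo.UpsertRecords
    @rows dbo.RecordBatch READONLY   -- the whole batch arrives in a single round trip
AS
BEGIN
    SET NOCOUNT ON;

    -- update rows that already exist
    UPDATE t
    SET    column1 = r.column1
    FROM   dbo.mytable t
    JOIN   @rows r ON r.id = t.id;

    -- insert rows that do not exist yet
    INSERT INTO dbo.mytable (id, column1)
    SELECT r.id, r.column1
    FROM   @rows r
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.mytable t WHERE t.id = r.id);
END
GO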

SQL Query Task in SSIS

I have added an Execute SQL Task to my project, with the following SQL query in it:
Insert into M1
select * from M4
But the problem is that table M1 is in the AAA database and table M4 is in the DDD database, and it is showing an error. How can I fix this?
If both databases are on the same server then fully qualify the table names:
insert into AAA.dbo.M1 (col1, col2, ...)
select col1, col2, ...
from DDD.dbo.M4
Of course, if your objects are not in the dbo schema then you need to put the correct one. You should never use SELECT * by the way, it can lead to problems if you ever change the table structure (or someone else does). Instead, always specify the column names.
An alternative would be to use a data flow to copy the data, but that's probably unnecessary here.
You can use a Data Flow Task: add an OLE DB Source and an OLE DB Destination, then configure the source and destination as required.
Take a look here.

Error when inserting into a linked server

I want to insert some data from the local server into a remote server, and used the following SQL:
select * into linkservername.mydbname.dbo.test from localdbname.dbo.test
But it throws the following error
The object name 'linkservername.mydbname.dbo.test' contains more than the maximum number of prefixes. The maximum is 2.
How can I do that?
I don't think a new table created with the INTO clause supports four-part names.
You would need to create the table first, then use INSERT..SELECT to populate it.
(See note in Arguments section on MSDN: reference)
The SELECT...INTO [new_table_name] statement supports a maximum of 2 prefixes: [database].[schema].[table]
NOTE: it is more performant to pull the data across the link using SELECT INTO vs. pushing it across using INSERT INTO:
SELECT INTO is minimally logged.
SELECT INTO does not implicitly start a distributed transaction, typically.
I say typically, in point #2, because in most scenarios a distributed transaction is not created implicitly when using SELECT INTO. If a profiler trace tells you SQL Server is still implicitly creating a distributed transaction, you can SELECT INTO a temp table first, to prevent the implicit distributed transaction, then move the data into your target table from the temp table.
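A minimal sketch of that temp-table workaround, with placeholder server and table names:
-- pull across the link into a local temp table (no distributed transaction, minimally logged)
SELECT *
INTO #staging
FROM [server_a].[database].[schema].[table];

-- then move the rows into the real target table locally
INSERT INTO [database].[schema].[target_table]
SELECT * FROM #staging;

DROP TABLE #staging;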
Push vs. Pull Example
In this example we are copying data from [server_a] to [server_b] across a link. This example assumes query execution is possible from both servers:
Push
Instead of connecting to [server_a] and pushing the data to [server_b]:
INSERT INTO [server_b].[database].[schema].[table]
SELECT * FROM [database].[schema].[table]
Pull
Connect to [server_b] and pull the data from [server_a]:
SELECT * INTO [database].[schema].[table]
FROM [server_a].[database].[schema].[table]
I've been struggling with this for the last hour.
I now realise that using the syntax
SELECT orderid, orderdate, empid, custid
INTO [linkedserver].[database].[dbo].[table]
FROM Sales.Orders;
does not work with linked servers. You have to go onto your linked server and manually create the table first, then use the following syntax:
INSERT INTO [linkedserver].[database].[dbo].[table]
SELECT orderid, orderdate, empid, custid
FROM Sales.Orders
WHERE shipcountry = 'UK';
I've experienced the same issue and I've performed the following workaround:
If you are able to log on to the remote server where you want to insert the data (with MSSQL or sqlcmd), rebuild your query the other way around:
so from:
SELECT * INTO linkservername.mydbname.dbo.test
FROM localdbname.dbo.test
to the following:
SELECT * INTO localdbname.dbo.test
FROM linkservername.mydbname.dbo.test
In my situation it works well.
@2Toad: For sure INSERT INTO is better / more efficient. However, for small queries and quick operations SELECT * INTO is more flexible, because it creates the table on the fly and inserts your data immediately, whereas INSERT INTO requires creating the table (auto-identity options and so on) before you carry out your insert operation.
I may be late to the party, but this was the first post I saw when I searched for the 4 part table name insert issue to a linked server. After reading this and a few more posts, I was able to accomplish this by using EXEC with the "AT" argument (for SQL2008+) so that the query is run from the linked server. For example, I had to insert 4M records to a pseudo-temp table on another server, and doing an INSERT-SELECT FROM statement took 10+ minutes. But changing it to the following SELECT-INTO statement, which allows the 4 part table name in the FROM clause, does it in mere seconds (less than 10 seconds in my case).
EXEC ('USE MyDatabase;
BEGIN TRY DROP TABLE TempID3 END TRY BEGIN CATCH END CATCH;
SELECT Field1, Field2, Field3
INTO TempID3
FROM SourceServer.SourceDatabase.dbo.SourceTable;') AT [DestinationServer]
GO
The query is run on DestinationServer, changes to the right database, ensures the table does not already exist, and selects from SourceServer. Minimally logged, and no fuss. This information may already be out there somewhere, but I hope it helps anyone searching for similar issues.
