How to get insert fields from sql? - apache-flink

I am using Flink SQL to parse the lineage of SQL statements.
I use the Flink planner to parse a statement such as:
insert into target_table(dest_f1, dest_f2) select source_f1, source_f2 from source_table
Obviously, source_f1 is the source of dest_f1.
When I get a CatalogSinkModifyOperation from the Flink planner, it doesn't contain any information about the insert columns, that is, no dest_f1 or dest_f2.
How can I get the names of the insert columns of my target_table?

You can use the following code to get the column information of the target table:
List<String> targetColumnList = tableEnv.from(sinkTable)
.getResolvedSchema()
.getColumnNames();
or
relNode.getRowType().getFieldNames()
If you want to parse the field-level lineage of Flink SQL, you can refer to this open source project:
https://github.com/HamaWhiteGG/flink-sql-lineage
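Once both name lists are available, the field mapping for a simple INSERT ... SELECT like the one in the question is positional. The following is only a sketch in Python, not Flink code: the two lists are hard-coded stand-ins for what the calls above would return, pair_lineage is a hypothetical helper, and it assumes the statement lists every target column in the same order as the select list.

```python
# Stand-ins for the values returned by the two Flink calls above (assumed).
target_columns = ["dest_f1", "dest_f2"]      # sink table column names
source_fields = ["source_f1", "source_f2"]   # field names of the query's row type


def pair_lineage(targets, sources):
    """Pair each target column with the select-list field at the same position."""
    if len(targets) != len(sources):
        raise ValueError("target and source column counts differ")
    return dict(zip(targets, sources))


lineage = pair_lineage(target_columns, source_fields)
print(lineage)
```

If the INSERT names only a subset of the target columns, or reorders them, a positional pairing like this is not enough and the column list has to come from the parsed statement itself.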

Related

How to avoid re-inserting data (duplicates) into SQL Server table while re-running SSIS package that loads data?

I have created a package in SSIS. It works fine for the first insertion, but when I run the package through a SQL Server Agent job, duplicates are inserted each time the scheduled job loads data.
I don't have any idea how to stop it from inserting duplicate records.
I want to prevent duplicates from being inserted when the deployed package runs through SQL Server Agent jobs.
There are 2 approaches to do that:
(1) using SQL Command
This option can be used if the source and destination are on the same server.
Since you are using an ADO.NET source, you can change the Data Access mode to SQL Command and select only the data that does not exist in the destination:
SELECT *
FROM SourceTable
WHERE NOT EXISTS(
SELECT 1
FROM DestinationTable
WHERE SourceTable.ID = DestinationTable.ID)
(2) using Lookup Transformation
You can use a Lookup transformation to get the non-matching rows between Source and destination and ignore duplicates:
UNDERSTAND SSIS LOOKUP TRANSFORMATION WITH AN EXAMPLE STEP BY STEP
SSIS - only insert rows that do not exists
SSIS import data or insert data if no match
Implementing Lookup Logic in SQL Server Integration Services
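Approach (1) can be tried out locally before wiring it into a package. This is a minimal sketch that uses Python's built-in sqlite3 module instead of SQL Server; the Name column, the sample rows, and the load_new_rows helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SourceTable (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE DestinationTable (ID INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO SourceTable VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO DestinationTable VALUES (1, 'a');
""")

# Same shape as the query above: keep only source rows missing from the destination.
NEW_ROWS_SQL = """
    SELECT *
    FROM SourceTable
    WHERE NOT EXISTS (
        SELECT 1
        FROM DestinationTable
        WHERE SourceTable.ID = DestinationTable.ID)
    ORDER BY ID
"""


def load_new_rows(connection):
    """Copy rows that are not yet in the destination and return what was copied."""
    rows = connection.execute(NEW_ROWS_SQL).fetchall()
    connection.executemany("INSERT INTO DestinationTable VALUES (?, ?)", rows)
    connection.commit()
    return rows


first_run = load_new_rows(conn)   # copies the two missing rows
second_run = load_new_rows(conn)  # re-running copies nothing
```

Running the load twice is the point of the check: the second run finds nothing new, which is the behaviour wanted from a scheduled job.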
To remove duplicates, use an Execute SQL Task with the following query (assuming that you are not extracting millions of rows and that you want to remove duplicates from the extracted data, not from the destination):
with cte as (
select field1, field2, row_number() over(partition by allfieldsfromPK order by allfieldsfromPK) as rownum
from ExtractedTable -- placeholder name: the table holding the extracted rows
)
delete from cte where rownum > 1
Then use a Data Flow Task and insert clean data into destination table.
If you just want to avoid inserting duplicates, a very good option is the MERGE statement, a more performant alternative.
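The ROW_NUMBER de-duplication step can be tried locally. This is a sketch using Python's sqlite3 module, not SQL Server: SQLite cannot delete through a CTE, so the ranked rows are matched back through SQLite's implicit rowid instead. The table name extracted, its columns, and the sample rows are assumptions, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE extracted (pk INTEGER, field1 TEXT);
    INSERT INTO extracted VALUES (1, 'a'), (1, 'a'), (2, 'b'), (2, 'b'), (3, 'c');
""")

# Rank rows within each key, then delete every row ranked after the first.
conn.execute("""
    DELETE FROM extracted
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY pk ORDER BY rowid) AS rownum
            FROM extracted)
        WHERE rownum > 1)
""")
conn.commit()

remaining = conn.execute(
    "SELECT pk, field1 FROM extracted ORDER BY pk").fetchall()
```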

Bulk Insert with database connector with different payload and queries

I am using the Mule database connector to insert and update data in a database. I have different queries, such as inserts and updates against different tables, and the payload for each will be different as well. How can I achieve bulk operations here? Can I save the queries in a flow variable as a list, save the corresponding values in another list, and pass both to the database flow? Will that work?
So I want to generate raw SQL queries, save them to a file, and then use Bulk Execute on that file. Does Mule provide any toString method to convert a query with placeholders into the actual raw query?
For example, I want to go from the query
update table mytable set column1 = #[payload.column1], column2 = #[payload.id]
to
update table mytable set column1 = 'stringvalue', column2 = 1234 ;
Mule's database component does support bulk operations: select Bulk Execute as the operation. The connector describes the implementation when you select that operation.
To make the query dynamic, you can pass the values in from variables or property files, whichever is more convenient.
You can have a stored procedure for insert and update that accepts its input parameters as an array. Send the records in blocks inside a for loop by setting a batch size. This results in fewer round trips.
The article linked below has all the details:
https://dzone.com/articles/passing-java-arrays-in-oracle-stored-procedure-fro
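The idea of sending records in blocks with bound parameters, rather than rendering raw SQL strings, can be illustrated outside Mule. The sketch below is plain Python with sqlite3, not Mule configuration; the chunks helper, the batch size of 3, and the sample records are all assumptions.

```python
import sqlite3
from itertools import islice


def chunks(rows, size):
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        block = list(islice(iterator, size))
        if not block:
            return
        yield block


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, column1 TEXT)")

records = [(i, f"value-{i}") for i in range(1, 8)]  # 7 sample records

batches_sent = 0
for block in chunks(records, 3):
    # One call per block; the driver binds the values, so no raw SQL is built.
    conn.executemany("INSERT INTO mytable (id, column1) VALUES (?, ?)", block)
    conn.commit()
    batches_sent += 1

row_count = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]
```

Keeping the placeholders and binding values per batch also avoids the quoting and injection problems that come with writing literal values into the statement text.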

SSIS OLEDB Command transformation (Insert if not exists)

OK, so according to the Microsoft docs, the OLE DB Command transformation in SSIS does this:
The OLE DB Command transformation runs an SQL statement for each row in a data flow. For example, you can run an SQL statement that inserts, updates, or deletes rows in a database table.
So I want to write some SQL to insert rows into one of my tables only if the record doesn't already exist.
I tried this, but the control keeps complaining of bad syntax:
IF NOT EXISTS
(SELECT * FROM M_Employee_Login WHERE
Column1=?
AND Column2=?
AND Column3=?)
INSERT INTO [M_Employee_Login]
([Column1]
,[Column2]
,[Column3])
VALUES
(?,?,?)
However, if I remove the IF NOT EXISTS section (leaving only the insert), the control says my code is OK. What am I doing wrong?
Is there an easier solution?
Update: BTW My source is a Flat File (csv file)
Update since answer: Just to let people know, I ended up using the OLE DB Command transformation as I had planned, because it suits this operation better than the OLE DB Destination. The difference is that I used the Lookup component to filter out all the already existing records (as the answer suggested), then used the OLE DB Command transformation with the insert SQL that I had in the question, and it worked as expected. Hope it helps.
OLEDB Command object is not the same as the OLE DB Destination
Rather than doing it as you describe, instead use a Lookup Component. Your data flow becomes Flat File Source -> Lookup Component -> OLE DB Destination
In your lookup, write the query SELECT Column1, Column2, Column3 FROM M_Employee_Login and configure it to redirect rows with no match to the no match output instead of failing (depending on your version, 2005 vs. not 2005, this will be the default).
After the lookup, the output of No Match will contain the values that didn't find a corresponding match in the target table.
Finally, configure your OLEDB Destination to perform the fast load option.
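Conceptually, the Lookup step loads the existing key combinations once and then routes each incoming row to a match or no match output. A rough Python sketch of that routing, where split_by_lookup and the sample rows are hypothetical and the set stands in for the lookup's reference data:

```python
def split_by_lookup(incoming_rows, existing_keys):
    """Route each row to `match` or `no_match` based on the reference set."""
    match, no_match = [], []
    for row in incoming_rows:
        (match if row in existing_keys else no_match).append(row)
    return match, no_match


# Stand-in for: SELECT Column1, Column2, Column3 FROM M_Employee_Login
existing = {("a", "b", "c"), ("d", "e", "f")}

# Stand-in for the rows read from the flat file source
flat_file_rows = [("a", "b", "c"), ("x", "y", "z")]

matched, to_insert = split_by_lookup(flat_file_rows, existing)
```

Only the to_insert rows would continue on to the destination, which is what the no match output provides in the data flow.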
Using the Lookup component in SSIS is the best approach for avoiding duplicates, but if you are looking for a query instead, you can simply insert all the data into a temp/staging table in your database and run the following query.
INSERT INTO M_Employee_Login(Column1, Column2, Column3)
SELECT vAL1,vAL2,vAL3 from Staging_Table
EXCEPT
SELECT Column1, Column2, Column3 FROM M_Employee_Login
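The staging-table query can be checked for repeatability with a small local test. This sketch uses Python's sqlite3 module rather than SQL Server (SQLite also supports EXCEPT); the sample rows and the TEXT column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE M_Employee_Login (Column1 TEXT, Column2 TEXT, Column3 TEXT);
    CREATE TABLE Staging_Table (Val1 TEXT, Val2 TEXT, Val3 TEXT);
    INSERT INTO M_Employee_Login VALUES ('a', 'b', 'c');
    INSERT INTO Staging_Table VALUES ('a', 'b', 'c'), ('x', 'y', 'z');
""")

INSERT_NEW = """
    INSERT INTO M_Employee_Login (Column1, Column2, Column3)
    SELECT Val1, Val2, Val3 FROM Staging_Table
    EXCEPT
    SELECT Column1, Column2, Column3 FROM M_Employee_Login
"""

conn.execute(INSERT_NEW)  # inserts only ('x', 'y', 'z')
conn.execute(INSERT_NEW)  # second run inserts nothing
conn.commit()

rows = conn.execute(
    "SELECT Column1, Column2, Column3 FROM M_Employee_Login ORDER BY Column1"
).fetchall()
```

EXCEPT also collapses duplicate rows inside the staging table itself, so repeated rows in the file are inserted only once.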

Bulk import of XML data into SQL Server

I have a set of XML files whose data I want to parse and import into a SQL Server 2012 database. The provided XML files will be validated against a schema.
I am looking for the best method of doing this. I have found this: http://msdn.microsoft.com/en-us/library/ms171878.aspx
I am wondering whether this is the best way or whether there are others.
You have several options:
SSIS XML Source. This does not validate against the schema. If you want to detect and properly handle invalid XML files, create a script task to validate the schema in C#.
Parse the XML in a stored procedure.
Insert the entire XML file in one column. Depending on your schema validation requirements, you can use an untyped or typed XML column. (Or both)
Parse the XML using XPath functions. This is actually very fast.
INSERT INTO SomeTable (Column1, Column2, Column3, Column4)
SELECT
YourXmlColumn.value('(/root/col1)[1]','int'),
YourXmlColumn.value('(/root/col2)[1]','nvarchar(10)'),
YourXmlColumn.value('(/root/col3)[1]','nvarchar(2000)'),
YourXmlColumn.value('(/root/col4)[1]','datetime2(0)')
FROM YourXmlTable
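The same shredding of one XML document into typed column values can be sketched outside the database. The example below uses Python's standard xml.etree.ElementTree module; the sample document and the shred helper are assumptions, with element names mirroring the paths in the query above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical document shaped like the /root/colN paths used above.
doc = (
    "<root>"
    "<col1>42</col1>"
    "<col2>abc</col2>"
    "<col3>long text</col3>"
    "<col4>2012-05-01T10:00:00</col4>"
    "</root>"
)


def shred(xml_text):
    """Return the first col1..col4 values of a document as typed Python values."""
    root = ET.fromstring(xml_text)
    return (
        int(root.findtext("col1")),
        root.findtext("col2"),
        root.findtext("col3"),
        datetime.fromisoformat(root.findtext("col4")),
    )


row = shred(doc)
```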

SQL Server Insert failure due to XML Schema validation error

I have an XML column in a table, and it is typed by a schema. I am trying to insert values into this table using INSERT INTO tbl1 SELECT * FROM tbl FOR XML, but this fails with a schema validation error for one of the records. I want to insert at least the records that pass validation and capture the others later. Can someone help me with this?
SQL Server validates the whole data set, not individual rows. If you want to validate row by row using SQL Server tools, the options are:
SQLCLR (fastest)
SSIS (easy to create): use a Foreach Loop to try inserting each row into the table; all failed rows are redirected to another table.
T-SQL TRY/CATCH block: insert the XML from a single row into a schema-validated variable. This is the slowest option.
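All three options share the same row-by-row pattern: attempt each row separately, keep the ones that pass, and set the failures aside. A minimal Python sketch of that pattern follows. The validate function is a hand-written stand-in for real XML schema validation (the standard library has no XSD validator), and the sample rows are assumptions.

```python
import xml.etree.ElementTree as ET


def validate(xml_text):
    """Stand-in check: well-formed, root element <root>, integer <col1>."""
    root = ET.fromstring(xml_text)            # ET.ParseError if malformed
    if root.tag != "root" or root.find("col1") is None:
        raise ValueError("missing required element")
    int(root.findtext("col1"))                # ValueError if not an integer
    return xml_text


def load_row_by_row(rows):
    """Try each row on its own; collect accepted rows and rejected row errors."""
    accepted, rejected = [], []
    for row_id, xml_text in rows:
        try:
            accepted.append((row_id, validate(xml_text)))
        except (ET.ParseError, ValueError) as exc:
            rejected.append((row_id, str(exc)))
    return accepted, rejected


rows = [
    (1, "<root><col1>1</col1></root>"),
    (2, "<root><col1>not a number</col1></root>"),  # fails the type check
    (3, "<root><col1>3</col1>"),                     # malformed XML
    (4, "<root><col1>4</col1></root>"),
]

accepted, rejected = load_row_by_row(rows)
```

The rejected list plays the role of the error table: it keeps the row identifier and the failure reason so the bad records can be fixed and retried later.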
