I want to merge 2 databases with Talend Open Studio

My job is to merge 2 databases: one is on SQL Server and the other is defined as metadata. I used tMap to give the second database the same schema as the first one.
The 2 databases have IDs in common. I want a final database that has all the IDs with no redundancy.
Please, I need help as soon as possible.
I don't know if I should use tMap or something else to merge the 2 databases.

Using tMap, you can configure a join on the common ID if you want no redundancy. Choose Inner Join as the join type (join model) and catch the inner join reject output.
Then add a second output in the tMap that doesn't catch the inner join reject, and use the tUnite component to combine the two data flows.
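The join/reject/unite idea can be sketched outside Talend in plain Python. This is only an illustration, not Talend code: the function names, the "id" and "name" columns and the sample rows are all hypothetical. It also makes one assumption about the flow: the second database is the main input, the first is the lookup, and the flows that get united are the first database plus the rejects of the second, so that IDs present in only one database survive.

```python
def split_inner_join(main_rows, lookup_rows, key="id"):
    """Mimic a tMap inner join: return (matched, rejects) for the main flow."""
    lookup_by_key = {row[key]: row for row in lookup_rows}
    matched, rejects = [], []
    for row in main_rows:
        if row[key] in lookup_by_key:
            # Matched rows are merged with the lookup row; main values win here.
            matched.append({**lookup_by_key[row[key]], **row})
        else:
            # Main rows without a lookup match play the role of the reject output.
            rejects.append(row)
    return matched, rejects


def unite(*flows):
    """Mimic tUnite: concatenate flows that share one schema."""
    return [row for flow in flows for row in flow]


# Hypothetical sample data; both sides already share the same schema.
first_db = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
second_db = [{"id": 2, "name": "bob"}, {"id": 3, "name": "carol"}]

matched, rejects = split_inner_join(second_db, first_db)
merged = unite(first_db, rejects)  # every ID appears exactly once
```

Note that the reject output only contains rows of the main flow, so which database is the main input and which is the lookup decides which rows end up in the united result.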

Related

Access Error "This Recordset is not Updatable" having SQL Server backend

I have a form that displays a list of clients. The form contains 3 combo boxes; the other controls are text boxes. When I add the tables that contain the data for the combo boxes to the query, it stops being updatable, but when I remove those tables the query becomes updatable again. I have attached the images of both queries.
Updatable Query
SELECT [1-01_Clients_tbl].CNR,
[FNC] & " " & [SNC] AS [Service User Full Name],
[1-01_Clients_tbl].PT AS [Physiotherapist name (adjust)],
[1-01_Clients_tbl].[PDS Score (txt)] AS [Score (txt)],
[1-01_Clients_tbl].[PDS Score (nmbr)] AS [Score (No)],
[1-01_Clients_tbl].[Date for WLI],
[1-01_Clients_tbl].DOB AS [Date of Birth],
[1-01_Clients_tbl].TASCode,
[1-01_Clients_tbl].PTact AS [Active Status Physio Details (adjust)],
[_Circular_Temp].ActiveStatus AS [Active Status APH]
FROM ([1-01_Clients_tbl]
LEFT JOIN [1-01-Clients_TransferXtra_tbl] ON [1-01_Clients_tbl].CNR = [1-01-Clients_TransferXtra_tbl].CNR)
LEFT JOIN _Circular_Temp ON [1-01_Clients_tbl].CNR = [_Circular_Temp].CNR
WHERE ((([1-01_Clients_tbl].CNR)<>1 Or ([1-01_Clients_tbl].CNR)=2)
AND (([_Circular_Temp].ActiveStatus)="yes"));
Non-Updatable Query
SELECT [1-01_Clients_tbl].CNR,
[FNC] & " " & [SNC] AS [Service User Full Name],
[1-01_Clients_tbl].PT AS [Physiotherapist name (adjust)],
[1-01_Clients_tbl].[PDS Score (txt)] AS [Score (txt)],
[1-01_Clients_tbl].[PDS Score (nmbr)] AS [Score (No)],
[1-01_Clients_tbl].[Date for WLI],
[1-01_Clients_tbl].DOB AS [Date of Birth],
[1-01_Clients_tbl].TASCode,
[1-01_Clients_tbl].PTact AS [Active Status Physio Details (adjust)],
[_Circular_Temp].ActiveStatus AS [Active Status APH],
[5-10_TeamActiveStatus_Codes_tbl].TextVisible,
[2-01_TeamIDNormalized_tbl].CTeamID
FROM (((([1-01_Clients_tbl]
LEFT JOIN [1-01-Clients_TransferXtra_tbl] ON [1-01_Clients_tbl].CNR = [1-01-Clients_TransferXtra_tbl].CNR)
LEFT JOIN _Circular_Temp ON [1-01_Clients_tbl].CNR = [_Circular_Temp].CNR)
INNER JOIN [5-10_TeamActiveStatus_tbl] ON [1-01_Clients_tbl].CNR = [5-10_TeamActiveStatus_tbl].CNR)
LEFT JOIN [2-01_TeamIDNormalized_tbl] ON [1-01_Clients_tbl].CNR = [2-01_TeamIDNormalized_tbl].CNR)
INNER JOIN [5-10_TeamActiveStatus_Codes_tbl] ON [5-10_TeamActiveStatus_tbl].TeamActiveStatusCode = [5-10_TeamActiveStatus_Codes_tbl].TAScodeID
WHERE ((([1-01_Clients_tbl].CNR)<>1 Or ([1-01_Clients_tbl].CNR)=2)
AND (([_Circular_Temp].ActiveStatus)="yes"));
A) Images of Updatable Query
1) The Query
2) Datasheet view of the Query
3) Design view of the form associated with the above query
4) The Actual working form
Note that CTeamID (the Team column in the form) and TextVisible (the Team Active Status column in the form) are missing, because this data comes from two different tables, and that is where the issue starts.
B) Images of Non-Updatable Query
1) The Query
2) Datasheet view of the Query
3) The Form
So here, once I added the tables that supply the CTeamID (the Team column in the form) and TextVisible (the Team Active Status column in the form) data, the query is no longer updatable.
Any ideas or suggestions on how to make it work, or how to improve the query to make it updatable? Thank you in advance.

Well, it looks like the introduction of the "inner join" is what blows up this query.
However, I would consider building the query in SQL Server and linking it as a view.
However, DO KEEP in mind:
Access based tables allow joins and MULTIPLE tables to be updated.
SQL server:
You can have a query (or better, a view) that has multiple tables, but ONLY ONE of the tables can be changed.
In other words, if you edit columns that belong to MORE than one table, this is NOT allowed with SQL Server tables.
Again:
The view can have multiple tables, but if you edit columns from MORE than one table, the update does not work with SQL Server based tables (it does work with Access based ones).
So, unless you can "disable" some controls on that continuous form/datasheet to "limit" or to "ensure" that ONLY ONE BASE table behind will become "dirty", then you can't use that interface. This is a limitation of SQL Server, and one that does not (did not) exist when using Access tables as the back end.
SQL Server, due to transactions and being ATOMIC, does NOT allow a query to have more than one table AND ALSO then update them in one shot/command.
MS-access tables do allow this!
You can in some cases "kluge" this by using the After Update event of the text boxes and doing a me.dirty = false (which forces a write of the record). Since you would be "forcing" a write of the data, NEVER does more than "one" table in the query get dirty.
So, this is a difference in SQL tables vs Access tables. (only ONE table in the view/query behind can EVER become dirty).
So, by forcing the write of the row data in the text box after update, then you can make this work.
I would thus use a continuous form (and not a datasheet) for this purpose (and by continuous form, I do mean what is called a multiple-items form).
So, while you can probably get that 2nd query to work (you have to change the inner join to a left join), the NEW issue will be that if you allow editing in that row that results in more than ONE table becoming dirty, then the update will not work.
So, you either:
Have to be sure ONLY one table behind in that query is ever changed.
Or, do the kluge of adding me.dirty = false in the After Update event of each text box.
Or, consider having a button that pops up a form that only has the columns from one base table behind that query for allowed edits.
So, say that after using up a pot of coffee you REALLY do get that query to become updatable.
Keep in mind that rule about ONLY ONE of the tables behind can become dirty at any given time. This means/suggests that while editing could occur in that row, only columns from ONE of the tables behind at a time can become dirty.
So, as noted, I would consider converting that client side query into a view, since that's going to help a WHOLE LOT in operation of that query.
And thus in sql manager, you can then right click on the view, and choose edit. Test it. That way you don't have to go back into ms-access when trying to determine if the query can/does allow updates.
Do keep in mind that you MUST answer the prompt when linking the view to enter the row PK id for access. Keep in mind that if you re-link (point the front end) to a different back end from SQL server? Then the PK row value of the linked view can and will be lost. This issue can be dealt with in a separate question/post, but you do need to keep this additional information in mind.

How to systematically manage a Big List of Queries and Tables in SQL Server?

Suppose someone has to work on a lot of different SQL Server Databases which have got a lot of Tables and Queries / Views inside them.
After a period of time, it becomes very difficult to remember exactly what kind of columns are present within a given Table and View.
Please suggest some method by which one can keep a systematic list of all the Tables and Views that are present within a SQL Server Database, along with the columns that are present within them.
Are there any Add-on products or services etc. available that helps in making this type of work systematic?
Currently I add comments to each query inside SQL Server to remind me of what the query is doing, but this method is not great. I am looking for better and more efficient methods.
Please share any ideas that you might have in this direction.
Thanks a lot

You may find the following useful for each database.
select s.name, s.type, c.name , s.refdate
from syscolumns c
inner join sysobjects s on s.id = c.id
where s.xtype in('U','V')
order by s.refdate --use refdate for manual quick looks
-- use s.name for file output and long term analysis
I output this to text files with the exact same format and check them into source control for each database. I even make comments about fields as things change. This is not part of the formal process, it is just sanity big picture version tracking independent of the formal deployments.
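A minimal sketch of the same "dump the schema to a stable text format" idea, written in Python against SQLite's built-in catalog so that it runs without a server. It is a stand-in, not the SQL Server query above: sqlite_master and PRAGMA table_info replace sysobjects and syscolumns, and the function name, tables and tab-separated line format are made up for illustration.

```python
import sqlite3


def schema_listing(conn):
    """Return one 'object<TAB>type<TAB>column' line per column, sorted by object name."""
    objects = conn.execute(
        "SELECT name, type FROM sqlite_master "
        "WHERE type IN ('table', 'view') ORDER BY name"
    ).fetchall()
    lines = []
    for name, kind in objects:
        # PRAGMA table_info also works for views; field 1 is the column name.
        for col in conn.execute(f'PRAGMA table_info("{name}")'):
            lines.append(f"{name}\t{kind}\t{col[1]}")
    return lines


# Hypothetical database with one table and one view.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE VIEW client_names AS SELECT name FROM clients")

listing = schema_listing(conn)
text_for_source_control = "\n".join(listing) + "\n"
```

Sorting by object name keeps the output order stable, so the text file produces clean diffs when it is checked into source control.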

optimizing a lookup task in ssis

I have a question about this kind of query. I am migrating an ETL from Access to SSIS. One query involves an Inner Join with a table in an Oracle Database:
SELECT
SQL_TABLE.COLUMN1,
SQL_TABLE.COLUMN2,
ORACLE_TABLE.COLUMN5,
ORACLE_TABLE.COLUMN6
FROM
SQL_TABLE INNER JOIN ORACLE_TABLE ON
SQL_TABLE.ID_PPAL = ORACLE_TABLE.IDENTIF
WHERE
(((ORACLE_TABLE.COLUMN6) Is Not Null));
The issue is that the Oracle table has more than 18 million rows and the SQL table has fewer than 300 records. The Inner Join should give something like 2500 records as a result.
First I tried using a Merge Join task, as you can see in the picture, but this is not efficient at all because of the characteristics of the tables. While I was looking for a possible solution, someone suggested using a Lookup task, but this only gives me one record for every match it finds. That is not useful for me, because I cannot lose any record.
I wonder if there is another way to perform this query, because I cannot believe that Access would be more efficient than SSIS in this respect.

In my experience SQL Server will not optimize queries involving Oracle. The fastest approach I found was 1) Use Oracle drivers to access the data from SSIS. 2) Use fast load (with table lock) to load the Oracle table (with a where condition if appropriate) into a SQL Server work table. 3) Create a clustered index on the table. 4) Do the join. If you are going to reuse the package, you will want to truncate the work table and drop the index as the first two steps of the package.
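The sequence of steps can be sketched in Python with an in-memory SQLite database standing in for SQL Server. This is only an outline under stated assumptions: the work table name work_oracle, the index name and the sample rows are hypothetical, the Oracle extract is faked with a Python list, and SQLite has no clustered indexes, so a plain index takes that role here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sql_table (id_ppal INTEGER, column1 TEXT);
    CREATE TABLE work_oracle (identif INTEGER, column6 TEXT);
""")
conn.executemany("INSERT INTO sql_table VALUES (?, ?)", [(1, "a"), (2, "b")])

# Hypothetical rows standing in for the 18-million-row Oracle table.
oracle_rows = [(1, "x"), (1, "y"), (2, None), (3, "z")]

# First two steps of a reusable package: empty the work table, drop the index.
conn.execute("DELETE FROM work_oracle")
conn.execute("DROP INDEX IF EXISTS ix_work_oracle_identif")

# Load the work table, applying the WHERE condition (COLUMN6 IS NOT NULL) up front.
conn.executemany(
    "INSERT INTO work_oracle VALUES (?, ?)",
    [row for row in oracle_rows if row[1] is not None],
)

# Index the join column after the load, then do the join locally.
conn.execute("CREATE INDEX ix_work_oracle_identif ON work_oracle (identif)")
result = conn.execute("""
    SELECT s.id_ppal, s.column1, w.column6
    FROM sql_table AS s
    INNER JOIN work_oracle AS w ON s.id_ppal = w.identif
    ORDER BY s.id_ppal, w.column6
""").fetchall()
```

Because the final step is an ordinary join, every matching row is returned, including several matches for the same key, which is what the question needs.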

You should check whether any filters can be applied, or try to do the joins in the Oracle database, so that the data is filtered down a little. If the result is incorrect, try using variables to store data and create scripts.
This may help:
http://www.bidn.com/blogs/ShawnHarrison/ssis/4579/looping-through-variable-values-with-a-foreach-loop-container

Left join in influx DB

I am new to InfluxDB. I need to migrate a MySQL database to InfluxDB. I chose InfluxDB because it supports SQL-like queries, but I could not find a left join in it. I have a series called statistics which contains browser_id, and another series that contains the browser list. How can I join these 2 series the way tables are joined in a relational database?
I wrote this query, but it does not return any result.
select * from statistics as s inner join browsers as b where s.browser_type_id = b.id

You cannot join series in InfluxDB using arbitrary columns. InfluxDB only supports joining time series based on the time column. This is a special type of join unlike the one you're used to in relational databases. Time join in InfluxDB tries to correlate points from different time series that happened at approximately the same time. You can read more about joins in InfluxDB in the docs

It seems this is now possible. Check the documentation again: https://docs.influxdata.com/influxdb/v0.8/api/query_language/#joining-series
select hosta.value + hostb.value
from cpu_load as hosta
inner join cpu_load as hostb
where hosta.host = 'hosta.influxdb.org' and hostb.host = 'hostb.influxdb.org';

retrieve data from multiple tables and store them in one table confronted the duplicates

I'm working on some critical report tasks that retrieve data, and I have run into difficulties.
The data belongs to the medical area and is distributed across several tables, and I can't
change the design of the database tables. To finish my report, I need the following steps:
1- Divide the whole report into several parts and, for each part, retrieve the data using
several joins. For example, part A can be retrieved with:
select a1.field1, a2.field2 from a1 left join a2 on a1.fieldA = a2.fieldA
which gives me all the data for part A.
2- The same happens for part B:
select b1.field1, b2.field2 from b1 left join b2 on b1.fieldB = b2.fieldB
which gives me all the data for part B.
3- The same goes for part C, part D, and so on.
The reason I divide them is that each part needs more than 8 joins (medical data is always complex), so I can't do all of it in a single query (it would take more than 50 joins, which is impossible to finish).
After that, I run my Spring Batch program to insert the data from part A, part B, part C, and so on into one table, my final report table. The problem is that the parts do not all have the same number of rows: part A may return 10 rows while part B may return 20. The time condition for each part is the same (1 day) and can't be changed, so I am wondering how I can store all these parts in one table with minimum overhead. I don't want to have too many duplicates. Thanks for the help.
Lei

It looks to me like what you need are joins over the "data from part A", "data from part B" and "data from part C". Let's call them da, db and dc.
It's perfectly all right that the number of rows in da/db/dc differs. But as you're trying to put them all in a single table at the end, obviously there is some relation between them. Without a better description of that relation, it's not possible to provide a more concrete answer. So I'll just write my thoughts, which you might already know, but anyway...
The simplest way is to join the results of your 3 [inner] queries in a higher-level [outer] query.
select j.x, j.y, j.z
from (
-- da join db join dc (the join conditions depend on how the parts relate)
) j;
If this is not possible (due to way too many joins, as you said), then try one of these:
Create 3 separate materialized views (one each for da, db and dc) and perform the join on these views. Materializing is optional (i.e. you can use "normal" views too), but it should improve performance greatly if available in your DB.
First run the queries for da/db/dc, fetch the data and put it in intermediate tables. Then run a join on those tables.
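The intermediate-table option could look roughly like the sketch below, written in Python with an in-memory SQLite database so it can be run as is. Everything specific in it is an assumption, since the real relationship between the parts is not given: it supposes each part shares a hypothetical patient_id key that is unique within a part, and the columns x, y and z are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Intermediate tables, one per report part (filled by the per-part queries).
    CREATE TABLE da (patient_id INTEGER PRIMARY KEY, x TEXT);
    CREATE TABLE db (patient_id INTEGER PRIMARY KEY, y TEXT);
    CREATE TABLE dc (patient_id INTEGER PRIMARY KEY, z TEXT);
    INSERT INTO da VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO db VALUES (1, 'b1'), (2, 'b2'), (3, 'b3');
    INSERT INTO dc VALUES (2, 'c2');
""")

# Collect every key once, then left join each part so that parts with
# fewer rows leave NULLs instead of dropping or duplicating report rows.
report = conn.execute("""
    WITH all_keys AS (
        SELECT patient_id FROM da
        UNION SELECT patient_id FROM db
        UNION SELECT patient_id FROM dc
    )
    SELECT k.patient_id, da.x, db.y, dc.z
    FROM all_keys AS k
    LEFT JOIN da ON da.patient_id = k.patient_id
    LEFT JOIN db ON db.patient_id = k.patient_id
    LEFT JOIN dc ON dc.patient_id = k.patient_id
    ORDER BY k.patient_id
""").fetchall()
```

If a part can hold several rows per key, this join would multiply rows, so the shape of the final table would have to be decided from the real relationship first.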
PS: If you want to run reports (many/frequent/large) on some data, then that data should be designed appropriately, or else you'll run into a heap of trouble in the future.
If you want something more concrete, please post the relationship between da/b/c.
