DataStage recursion - database

I have to find a leader of a group and update employee's leader. I am not sure how to proceed with this in DataStage.
I have an employee table as shown below
Emp_id mgr_id leader_id
1 100 400
101 201 500
3 202 600
I get a file to update employee table when an employee changes group. Change code = CHG means it is a job/group change.
I do an equi join between file and employee table and can update manager id. At the same time, I need to find a leader. I need to get all the employees who report to that top level leader and use as the leader id's for every employee.
File:
emp_id mgr_id chg_cd
1 102 CHG
101 301 CHG
File Row 1: There is change in manager for emp_id = 1; need to update mgr_id, leader_id in employee table
File Row 2: There is change in manager for emp_id = 102, need to change mgr_id and leader_id for in employee table
Can you please suggest me on how to proceed with this in DataStage?

ok this problem requires a solution with recursion. As DataStage has no way to do it (if the levels between managers and leaders are variable).
So load the data into a database table and use recursive SQL to query it - this will provide you the solution you are asking for.
Example:
Extract all leaders with their business units they manage inculding different levels) with the recursive SQL statement and use this data in da DataStage lookup to enrich the file data.

Related

Add values to a table from a second table that only match a third table allowing for duplicates

I have been tasked to match a payment file from a bank that has invoices/payments listed on a text file I have imported into a table called Bank. I need to match the invoices to the project/projects that are associated with the invoices - call this table Invoices - which contains every invoice and project we have every had. I want to match the invoices (from Bank) to the project or multiple projects (from Invoices) to another table - called Report - so I can reconcile the payment file. I can get the correct results from Bank and Invoices with the following query
SELECT invoice
FROM Invoices INV
INNER JOIN Bank as BANK
ON INV.Invoice = BANK.Invoice_Number
The Bank file has 100 invoices and I get 169 invoices on this query. But when I try and do and update or insert
Update Report
set Invoice_Num =
(SELECT invoice
FROM Invoices INV
INNER JOIN Bank as BANK
ON INV.Invoice = BANK.Invoice_Number)
I get 0 rows updated.
I have tried to copy the Bank table to the Report table with
Insert into Report(Invoice_Num)
select Invoice_Number from Bank
but can't figure out how to account for the projects that have duplicated invoices when they are found. Of course I might be going at this entirely wrong and someone has a better way entirely.
Thanks!
Does your Report table have anything in to start with? If not, your UPDATE statement will update 0 rows, because there are 0 rows there to update. (Also, with your code as it stands, note that it would update every entry there to have the same, indeterminate value; I don't think that's what you intend.)
If you just want to copy the invoice numbers from Bank to Report, but leave out any duplicates, then your final bit of SQL just needs a DISTINCT added to do that:
Insert into Report(Invoice_Num)
select DISTINCT Invoice_Number from Bank
If you're trying to put in only invoice numbers from Bank that also match Invoice, then your first bit of code almost works, but needs to be an INSERT, not an UPDATE:
INSERT INTO report (invoice_number)
SELECT invoice
FROM Invoices INV
INNER JOIN Bank as BANK
ON INV.Invoice = BANK.Invoice_Number
Again, if you're also dealing with potential duplicates invoice numbers you only want in Report once, make that a SELECT DISTINCT to avoid them.

Editing MS SQL DB with 20 tables to 1 table without data lost

Hello I don't know how to do changes of my MS SQL DB to integrate it to work with new software.
The case is related with software limitations.
Our old software can write, read and work with MS SQL DB with multiple tables but the new software understand only from one table or one view table.
My question is how can I edit my MS SQL DB form 20 tables to do it to be one DB with one table with all data from 20 tables and columns without data lost?
And one last question is true about View Tables in MS SQL that they are read only for applications and software?
Why is it not a good idea to put data from all your 20 tables into one table ?
I will try to explain with an example, since I do not know your database I just think of some tables here
suppose you have a table Clients
ClientID Name Street City
1 John ChuchStreet Denver
2 Anna FlowerStreet Boston
and a table Products
ProductID Name Price
1 Mouse 10
2 Keyboard 30
3 Usb Cable 10
and table Orders
OrderID OrderNumber CLientID TotalAmount
1 123 1 10
2 345 1 20
3 678 2 30
and finally table OrderDetail
OrderDetailID OrderID ProductID Quantity
1 1 1 1
2 2 1 1
3 2 3 1
4 3 2 1
Now to put this into one table, you could do this
ID ClientName ClientStreet ClientCity OrderNumber TotalAmount ProductName ProductPrice ProductQuantity
1 John ChurchStreet Denver 123 10 Mouse 10 1
1 John ChurchStreet Denver 345 20 Mouse 10 1
2 John ChurchStreet Denver 345 20 Usb Cable 10 1
3 Anna FlowerStreet Boston 678 30 Keyboard 30 1
Now you can already see the redundancy,
you need to repeat the address of each customer, time and time again in your table
you need to repeat the ordernumber and total amount time and time again
you need to repeat the productname and price time and time again
Now suppose that John moves to another address, now you have to search for John in every row in the table, and adjust the address
Now suppose a productname changes, again you have to search all rows and update
That is lots of work, very inefficient, and guaranteed to go wrong at some point
Now I only used 4 tables in this example, can you image what will happen if you would merge 20 tables into 1 ?
And the redundancy is not your only problem, what if you want to look at a client, what row should you use ?
What if you want to look at an order, what row should you use ?
What if you want to look at a product, what row should you use ?
In this one table design, you cannot identify a single row for customer, or order anymore. That is because each row contains everyting, there is no distinct row anymore for a customer, or a product, or an order...
Merging all tables into one big table is simply not possible to maintain
Hi thank you for your answers!
I am trying to integrate old ms sql db to new IDFLOW Software.
This software understand from ms sql db but only from one table but my old db contains 18 tables....
IDFLOW understand from views, but it is not good to work with views, they are okay for card design in IDFLOW, but views are not okay to write new data in db from IDFLOW interface , because in views it is not possible to write new data in db!
And now I started to think to create one db with one table with all columns, and I completed it successfully!
Now in my sql server I am with two db, old db and new db.
Now I don't know how to export data from old db and import it to new db?
I am talking about export column2 from table1 from db1 and import it in column2 from table1 from db2 ....
and so and so ........... column2 from table2 from db1 and import it in column3 from table1 from db2 ............
and so and so .... column 5 from table18 from db1 and import it in column10 form table1 from db2.
Is it possible to do that?
Okay, but IDFLOW works only with one table...
And I asked Jolly and they say utilize your DB for one table.
IDFLOW is a software for ID CARDS .
I think it is possible to use one db with one table and all columns for employee information.
The goal is.. in IDFLOW enter all data for new employee, when you enter data from IDFLOW interface they go into MS SQL DB and all fields from DB are in card design configured, and when you select records for one employee from IDFLOW interface (from db) you can print ID CARD with all data for employee.
The question is how to migrate old records from old db in new.
It is not big database, it is only 6 GB from 2006.
And we use it only for printing id cards.

Same table doing multiple operation in SQL Server

Table emp with columns:
id | name | sal | deptno | location
One user is inserting/updating 20 million records into that table.
Another user is retrieving data using select statement from the same emp table.
insert / update statement will take 2 hours and at the same time another user retrieve data in the mid of insertion/update process
How many records will come for execute the select statement in SQL?
How to check or what steps we need to follow to achieve this task in SQL Server?

table relationships, SQL 2005

Ok I have a question and it is probably very easy but I can not find the solution.
I have 3 tables plus one main tbl.
tbl_1 - tbl_1Name_id
tbl_2- tbl_2Name_id
tbl_3 - tbl_3Name_id
I want to connect the Name_id fields to the main tbl fields below.
main_tbl
___________
tbl_1Name_id
tbl_2Name_id
tbl_3Name_id
Main tbl has a Unique Key for these fields and in the other table, fields they are normal fields NOT NULL.
What I would like to do is that any time when the record is entered in tbl_1, tbl_2 or tbl_3, the value from the main table shows in that field, or other way.
Also I have the relationship Many to one, one being the main tbl of course.
I have a feeling this should be very simple but can not get it to work.
Take a look at SQL Server triggers. This will allow you to perform an action when a record is inserted into any one of those tables.
If you provide some more information like:
An example of an insert
The resulting change you would like
to see as a result of that insert
I can try and give you some more details.
UPDATE
Based on your new comments I suspect that you are working with a denormalized database schema. Below is how I would suggest you structure your tables in the Employee-Medical visit scenario you discussed:
Employee
--------
EmployeeId
fName
lName
EmployeeMedicalVisit
--------------------
VisitId
EmployeeId
Date
Cost
Some important things:
Note that I am not entering the
employees name into the
EmployeeMedicalVisit table, just the EmployeeId. This
helps to maintain data integrity and
complies with First Normal Form
You should read up on 1st, 2nd and
3rd normal forms. Database
normalization is a very imporant
subject and it will make your life
easier if you can grasp them.
With the above structure, when an employee visited a medical office you would insert a record into EmployeeMedicalVisit. To select all medical visits for an employee you would use the query below:
SELECT e.fName, e.lName
FROM Employee e
INNER JOIN EmployeeMedicalVisit as emv
ON e.EployeeId = emv.EmployeeId
Hope this helps!
Here is a sample trigger that may show you waht you need to have:
Create trigger mytabletrigger ON mytable
For INSERT
AS
INSERT MYOTHERTABLE (MytableId, insertdate)
select mytableid, getdate() from inserted
In a trigger you have two psuedotables available, inserted and deleted. The inserted table constains the data that is being inserted into the table you have the trigger on including any autogenerated id. That is how you get the data to the other table assuming you don't need other data at the same time. YOu can get other data from system stored procuders or joins to other tables but not from a form in the application.
If you do need other data that isn't available in a trigger (such as other values from a form, then you need to write a sttored procedure to insert to one table and return the id value through an output clause or using scope_identity() and then use that data to build the insert for the next table.

MS Access row number, specify an index

Is there a way in MS access to return a dataset between a specific index?
So lets say my dataset is:
rank | first_name | age
1 Max 23
2 Bob 40
3 Sid 25
4 Billy 18
5 Sally 19
But I only want to return those records between 'rank' 2 and 4, so my results set is Bob, Sid and Billy? However, Rank is not part of the table, and this should be generated when the query is run. Why don't I use an autogenerated number, because if a record is deleted, this will be inconsistent, and what if I wanted the results in reverse!
This obviously very simple, and the reason I ask is because I am working on a product catalogue and I am looking for a more efficient way of paging through the returned dataset, so if I only return 1 page worth of data from the database this is obviously going to be quicker then return a complete set of 3000 records and then having to subselect from that set!
Thanks R.
Original suggestion:
SELECT * from table where rank BETWEEN 2 and 4;
Modified after comment, that rank is not existing in structure:
Select top 100 * from table;
And if you want to choose subsequent results, you can choose the ID of the last record from the first query, say it was ID 101, and use a WHERE clause to get the next 100;
Select top 100 * from table where ID > 100;
But these won't give you what you're looking for either, I bet.
How are you calculating rank? I assume you are basing it on some data in another dataset somewhere. If so, create a function, do a table join, or do something that can calculate rank based on values in other table(s), then you can do queries based on the rank() function.
For example:
select *
from table
where rank() between 2 and 4
If you are not calculating rank based on some data somewhere, there really isn't a way to write this query, and you might as well be returning three random rows from the table.
I think you need to use a correlated subquery to calculate the rank on the fly e.g. I'm guessing the rank is based on name:
SELECT T1.first_name, T1.age,
(
SELECT COUNT(*) + 1
FROM MyTable AS T2
WHERE T1.first_name > T2.first_name
) AS rank
FROM MyTable AS T1;
The bad news is the Access data engine is poorly optimized for this kind of query; in my experience, performace will start to noticeably degrade beyond a few hundred rows.
If it is not possible to maintain the rank on the db side of the house (e.g. high insertion environment) consider doing the paging on the client side. For example, an ADO classic recordset object has properties to support paging (PageCount, PageSize, AbsolutePage, etc), something for which DAO recordsets (being of an older vintage) have no support.
As always, you'll have to perform your own timings but I suspect that when there are, say, 10K rows you will find it faster to take on the overhead of fetching all the rows to an ADO recordset then finding the page (then perhaps fabricate smaller ADO recordset consisting of just that page's worth of rows) than it is to perform a correlated subquery to only fetch the number of rows for the page.
Unfortunately the LIMIT keyword isn't available in MS Access -- that's what is used in MySQL for a multi-page presentation. If you can write an order key into the results table, then you can use it something like this:
SELECT TOP 25 MyOrder, Etc FROM Table1 WHERE MyOrder in
(SELECT TOP 55 MyOrder FROM Table1 ORDER BY MyOrder DESC)
ORDER BY MyOrder ASCENDING
If I understand you correctly, there is ionly first_name and age columns in your table. If this is the case, then there is no way to return Bob, Sid, and Billy with a single query. Unless you do something like
SELECT * FROM Table
WHERE FirstName = 'Bob'
OR FirstName = 'Sid'
OR FirstName = 'Billy'
But I think that this is not what you are looking for.
This is because SQL databases make no guarantee as to the order that the data will come out of the database unless you specify an ORDER BY clause. It will usually come out in the same order it was added, but there are no guarantees, and once you get a lot of rows in your table, there's a reasonably high probability that they won't come out in the order you put them in.
As a side note, you should probably add a "rank" column (this column is usually called id) to your table, and make it an auto incrementing integer (see Access documentation), so that you can do the query mentioned by Sev. It's also important to have a primary key so that you can be certain which rows are being updated when you are running an update query, or which rows are being deleted when you run a delete query. For example, if you had 2 people named Max, and they were both 23, how you delete 1 row without deleting the other. If you had another auto incrementing unique column in there, you could specify the unique ID in your query to delete only one.
[ADDITION]
Upon reading your comment, If you add an autoincrement field, and want to read 3 rows, and you know the ID of the first row you want to read, then you can use "TOP" to read 3 rows.
Assuming your data looks like this
ID | first_name | age
1 Max 23
2 Bob 40
6 Sid 25
8 Billy 18
15 Sally 19
You can wuery Bob, Sid and Billy with the following QUERY.
SELECT TOP 3 FirstName, Age
From Table
WHERE ID >= 2
ORDER BY ID

Resources