I have a question about adding data to a particular column of a table. I posted yesterday, and a user guided me (thanks for that) and said an UPDATE was the way to go for what I need, but I still can't achieve my goal.
I have two tables: the table the information will be copied from and the table it will be copied to. Here is an example:
source_table (has only a column called "name_expedient_reviser" that is nvarchar(50))
name_expedient_reviser
kim
randy
phil
cathy
josh
etc.
On the other hand I have the destination table. It has two columns, one with the ids and the other where the names will be inserted. The values in that column are currently NULL, and the existing ids are the ones to be used.
This is what the other table looks like:
dbo_expedient_reviser has 2 columns: unique_reviser_code (numeric, PK, not auto-increment) and name_expedient_reviser (nvarchar(50)), which holds the users who check expedients. This is the table as it is now:
dbo_expedient_reviser
unique_reviser_code | name_expedient_reviser
1 | NULL
2 | NULL
3 | NULL
4 | NULL
5 | NULL
6 | NULL
What I need is for the information from source_table to be inserted into the column name_expedient_reviser, so the result should look like this:
dbo_expedient_reviser
unique_reviser_code | name_expedient_reviser
1 | kim
2 | randy
3 | phil
4 | cathy
5 | josh
6 | etc.
How can I pass the information into this table? What do I have to do?
EDIT
The query I saw that should have worked doesn't update anything. It is this one:
UPDATE dbo_expedient_reviser
SET dbo_expedient_reviser.name_expedient_reviser = source_table.name_expedient_reviser
FROM source_table
JOIN dbo_expedient_reviser ON source_table.name_expedient_reviser = dbo_expedient_reviser.name_expedient_reviser
WHERE dbo_expedient_reviser.name_expedient_reviser IS NULL
The query was supposed to copy the information into the table from source_table wherever name_expedient_reviser is NULL, which it is in every row, but it doesn't work.
Since the names do not have an Id associated with them, I would just use ROW_NUMBER and join on ROW_NUMBER = unique_reviser_code. The only problem is knowing which rows are NULL. From what I see, they all appear to be NULL. Is that the case in your data, or are there names sporadically in the table, like at 5, 17, 29, etc.? If name_expedient_reviser is empty in dbo_expedient_reviser, you could also truncate the table and insert the values directly. Hopefully unique_reviser_code isn't already linked to other things.
WITH CTE (name_expedient_reviser, unique_reviser_code)
AS
(
SELECT name_expedient_reviser
,ROW_NUMBER() OVER (ORDER BY name_expedient_reviser)
FROM source_table
)
UPDATE er
SET er.name_expedient_reviser = cte.name_expedient_reviser
FROM dbo_expedient_reviser er
JOIN CTE on cte.unique_reviser_code = er.unique_reviser_code
Or Truncate:
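TRUNCATE TABLE dbo_expedient_reviser;
-- The column list must be in the same order as the SELECT list.
INSERT INTO dbo_expedient_reviser (unique_reviser_code, name_expedient_reviser)
SELECT
ROW_NUMBER() OVER (ORDER BY name_expedient_reviser)
,name_expedient_reviser
-- Deduplicate before numbering; DISTINCT applied after ROW_NUMBER removes nothing.
FROM (SELECT DISTINCT name_expedient_reviser FROM source_table) AS s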
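To see why the original UPDATE touched no rows, and what the row-number pairing does instead, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The three sample names are taken from the question; the numbering is done with `enumerate` rather than ROW_NUMBER, which is an assumption made only to keep the sketch portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_table (name_expedient_reviser TEXT);
CREATE TABLE dbo_expedient_reviser (
    unique_reviser_code INTEGER PRIMARY KEY,
    name_expedient_reviser TEXT
);
INSERT INTO source_table VALUES ('kim'), ('randy'), ('phil');
INSERT INTO dbo_expedient_reviser VALUES (1, NULL), (2, NULL), (3, NULL);
""")

# The original join compares a name with NULL. NULL = 'kim' is never true,
# so the join produces no rows and the UPDATE has nothing to change.
matches = conn.execute("""
    SELECT COUNT(*)
    FROM source_table AS s
    JOIN dbo_expedient_reviser AS d
      ON s.name_expedient_reviser = d.name_expedient_reviser
""").fetchone()[0]

# Pair rows by position instead: number the names (alphabetically here,
# mirroring ORDER BY name_expedient_reviser) and match number to code.
names = [row[0] for row in conn.execute(
    "SELECT name_expedient_reviser FROM source_table "
    "ORDER BY name_expedient_reviser")]
conn.executemany(
    "UPDATE dbo_expedient_reviser SET name_expedient_reviser = ? "
    "WHERE unique_reviser_code = ? AND name_expedient_reviser IS NULL",
    [(name, number) for number, name in enumerate(names, start=1)])

result = conn.execute(
    "SELECT unique_reviser_code, name_expedient_reviser "
    "FROM dbo_expedient_reviser ORDER BY unique_reviser_code").fetchall()
print(matches, result)
```

Note that ordering by the name assigns the codes alphabetically, so the pairing will not necessarily match the order shown in the question (kim, randy, phil, ...). source_table has no column that records that order, so there is nothing else to sort on.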
It is not possible to INSERT data into a single column of existing rows; an UPDATE that copies the data you want is the only way to go in that case.
Related
I am trying to use a SQL query to create some client-side reporting for my company. There are three tables that I would like to join together. One of the tables may require a CTE, as I need to recursively go through a table and return a row. Here is how the tables are structured (simplified).
I want an output table that, for each WorkOrder, displays the most recently completed task in DataCollection (including its time) and the next Op in the TaskListing. I figured a CTE may be the only way to recursively go through each row and determine what task is next (by checking whether the completed Op exists in the PreOp column). If the completed cell doesn't exist as a PreOp, it should default to MAX(Op) (the last task).
CREATE TABLE [dbo].[WorkOrder](
[WorkOrderID][int] NOT NULL PRIMARY KEY,
[Column1] [nvarchar](20),
[Column2] [nvarchar](20)
)
INSERT INTO WorkOrder VALUES(1,'x','y');
INSERT INTO WorkOrder VALUES(2,'x','y');
INSERT INTO WorkOrder VALUES(3,'x2','y2');
CREATE TABLE [dbo].[DataCollection](
[DataCollection][int] NOT NULL PRIMARY KEY,
[WorkOrderID][int] NOT NULL FOREIGN KEY REFERENCES WorkOrder(WorkOrderID),
[CellTask] [nvarchar](20),
[TimeCompleted] [DateTime]
)
INSERT INTO DataCollection VALUES(1,1,'cella','2016-08-09 00:00:00');
INSERT INTO DataCollection VALUES(2,1,'cellb','2016-08-10 00:00:00');
INSERT INTO DataCollection VALUES(3,1,'cellc','2016-08-11 00:00:00');
INSERT INTO DataCollection VALUES(4,2,'cella','2016-08-09 00:00:00');
INSERT INTO DataCollection VALUES(5,2,'cellb','2016-08-10 00:00:00');
CREATE TABLE [dbo].[TaskListing](
[TaskListingID][int] NOT NULL PRIMARY KEY,
[WorkOrderID][int] NOT NULL FOREIGN KEY REFERENCES WorkOrder(WorkOrderID),
[Op][nvarchar](20) NOT NULL,
[preOP][nvarchar](20),
[CellTask][nvarchar](20) NOT NULL,
[Completed][bit] NOT NULL
)
INSERT INTO TaskListing VALUES(1,1,'10',NULL,'cella',0);
INSERT INTO TaskListing VALUES(2,1,'20','10','cellb',0);
INSERT INTO TaskListing VALUES(3,1,'30',NULL,'cellc',1);
INSERT INTO TaskListing VALUES(4,1,'40','10,30','celld',0);
INSERT INTO TaskListing VALUES(5,2,'10',NULL,'cella',1);
INSERT INTO TaskListing VALUES(6,2,'20','10','cellb',1);
INSERT INTO TaskListing VALUES(7,2,'30','20','cellc',0);
The output table will show, for each WorkOrder, the most recently completed cell (from the DataCollection table, TimeCompleted column) and the next cell in the work flow (found by looking at the rows of the TaskListing table for the given WorkOrderID for a row that contains the completed task as a 'PreOp'). If the completed task is not a PreOp of any other row, it should default to the last task.
The part of the Query I'm having most trouble with is filling in the NextTaskCell column. I need to write a query that looks at all the tasks for a given WorkOrderID (In the TaskListing Table) and based on the completed task, determine what is the next task. I'm finding it difficult to feed in both a WorkOrderID & CellTask then find an instance of itself in the PreOp column.
Output Table
+-------------+-------------------+---------------------+--------------+
| WorkOrderId | LastCompletedCell | CompletedOn | NextTaskCell |
|(WorkOrder) | (DataCollection) | (DataCollection) |(TaskListing) |
+-------------+-------------------+---------------------+--------------+
| 1 | cellc | 2016-08-11 00:00:00 | celld |
| 2 | cellb | 2016-08-10 00:00:00 | cellc |
+-------------+-------------------+---------------------+--------------+
I thank you in advance for your time. If there are any other questions, please let me know and I'll try to answer them.
Link to SQL Fiddle
The following query gives you the expected output you have in your question. You should test this query against a larger dataset to make sure it is correct in all cases.
;WITH
mtc AS ( -- most recent completion date/time for a work order
SELECT
dc.WorkOrderID,
TimeCompleted=MAX(dc.TimeCompleted)
FROM
DataCollection AS dc
GROUP BY
dc.WorkOrderID
),
lop AS ( -- last operation for work order
SELECT
tl.WorkOrderID,
LastOp=MAX(CAST(tl.Op AS INT))
FROM
TaskListing AS tl
GROUP BY
tl.WorkOrderID
)
SELECT
mtc.WorkOrderID,
LastCompletedCell=dc.CellTask,
CompletedOn=dc.TimeCompleted,
NextTaskCell=ISNULL(tl_next.CellTask,tl_last.CellTask)
FROM
mtc
INNER JOIN DataCollection AS dc ON -- the last completed CellTask
dc.WorkOrderID=mtc.WorkOrderID AND
dc.TimeCompleted=mtc.TimeCompleted
INNER JOIN TaskListing AS tl ON -- Op for CellTask
tl.WorkOrderID=mtc.WorkOrderID AND
tl.CellTask=dc.CellTask
INNER JOIN lop ON
lop.WorkOrderID=mtc.WorkOrderID
INNER JOIN TaskListing AS tl_last ON -- CellTask for last Op
tl_last.WorkOrderID=mtc.WorkOrderID AND
tl_last.Op=lop.LastOp
LEFT JOIN TaskListing AS tl_next ON -- Look for next CellTask where Op is a PreOp of another CellTask
tl_next.WorkOrderID=mtc.WorkOrderID AND
','+tl_next.preOP+',' LIKE '%,'+tl.Op+',%'
ORDER BY
mtc.WorkOrderId;
Note: It is a bad idea to store PreOps as a comma-separated string. This is not how you should store data in relational databases. When you do, you will have to resort to more complex and less efficient queries. To wit, see the join condition in tl_next.
Instead you should have a table to store PreOps as separate rows, linking to the parent Op that depends on it.
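As a rough illustration of that suggestion, the sketch below stores each prerequisite as its own row and looks up the next task with a plain equality join instead of the LIKE on a comma-separated string. It uses Python's sqlite3 module as a stand-in for SQL Server; the `TaskPreOp` table and the `next_cells` helper are hypothetical names, and the rows are the work order 1 rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TaskListing (WorkOrderID INTEGER, Op TEXT, CellTask TEXT);
-- Hypothetical table: one row per (task, prerequisite) pair.
CREATE TABLE TaskPreOp (WorkOrderID INTEGER, Op TEXT, PreOp TEXT);

INSERT INTO TaskListing VALUES
    (1, '10', 'cella'), (1, '20', 'cellb'),
    (1, '30', 'cellc'), (1, '40', 'celld');
-- '10,30' for Op 40 becomes two rows.
INSERT INTO TaskPreOp VALUES
    (1, '20', '10'), (1, '40', '10'), (1, '40', '30');
""")

def next_cells(work_order_id, completed_op):
    """Return the cells of all tasks that list completed_op as a PreOp."""
    rows = conn.execute("""
        SELECT tl.CellTask
        FROM TaskPreOp AS p
        JOIN TaskListing AS tl
          ON tl.WorkOrderID = p.WorkOrderID AND tl.Op = p.Op
        WHERE p.WorkOrderID = ? AND p.PreOp = ?
        ORDER BY tl.Op
    """, (work_order_id, completed_op)).fetchall()
    return [row[0] for row in rows]

print(next_cells(1, '30'))
print(next_cells(1, '10'))
```

An Op that is a prerequisite of several tasks returns several rows, so the report query would still need a rule for picking one of them; the fallback to the last task when nothing is returned stays the same as in the query above.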
The short version is I'm trying to map from a flat table to a new set of tables with a stored procedure.
The long version: I want to SELECT records from an existing table, and then for each record INSERT into a new set of tables (most columns will go into one table, but some will go to others and be related back to this new table).
I'm a little new to stored procedures and T-SQL. I haven't been able to find anything particularly clear on this subject.
It would appear I want to do something along the lines of
INSERT INTO [dbo].[MyNewTable] (col1, col2, col3)
SELECT
OldCol1, OldCol2, OldCol3
FROM
[dbo].[MyOldTable]
But I'm uncertain how to get that to save related records since I'm splitting it into multiple tables. I'll also need to manipulate some of the data from the old columns before it will fit into the new columns.
Thanks
Example data
MyOldTable
Id | Year | Make | Model | Customer Name
572 | 2001 | Ford | Focus | Bobby Smith
782 | 2015 | Ford | Mustang | Bobby Smith
Into (with no worries about duplicate customers or retaining old Ids):
MyNewCarTable
Id | Year | Make | Model
1 | 2001 | Ford | Focus
2 | 2015 | Ford | Mustang
MyNewCustomerTable
Id | FirstName | LastName | CarId
1 | Bobby | Smith | 1
2 | Bobby | Smith | 2
I would keep the OldTable Id in the new table until you have finished processing the data.
I assume you create an identity column Id on MyNewCarTable.
INSERT INTO MyNewCarTable (OldId, Year, Make, Model)
SELECT Id, Year, Make, Model FROM MyOldTable
Then join the old table to the new table to insert into your second table. I assume your MyNewCustomerTable also has an Id column with identity enabled.
INSERT INTO MyNewCustomerTable (CustomerName, CarId)
SELECT CustomerName, new.Id
FROM MyOldTable old
JOIN MyNewCarTable new ON old.Id = new.OldId
Note: I have not split Customer Name into First Name and Last Name, as I was unsure about the existing data.
If you don't want OldId in MyNewCarTable, you can drop it afterwards:
ALTER TABLE MyNewCarTable DROP COLUMN OldId
You are missing a step in your normalization. You do not need to duplicate your customer information per vehicle. You need three tables for 4th Normal form. This will reduce storage size and more importantly allow an update to the customer data to take place in one location.
Customer
CustomerID
FirstName
LastName
Car
CarID
Make
Model
Year
CustomerCar
CustomerCarID
CarID
CustomerID
DatePurchased
This way you can have multiple owners per car, multiple cars per owner and only one record needs to be updated per car and or customer...4th Normal Form.
If I am reading this correctly, you want to take each row from table 1, and create a new record into table A using some of that row data, and then data from the same original row into Table B, Table C but referencing back to Table A again?
If that's the case, you will create TableA with an identity column and make that the PK.
Insert the required column data into that table and use @@IDENTITY (or SCOPE_IDENTITY(), which is limited to the current scope) to retrieve the last identity value. Then insert the remaining data from the original table into the other tables, TableB, TableC, etc., using the identity you retrieved from TableA as the FK in those tables.
By Example:
Table 1 has columns col1, col2, col3, col4, col5
Table A has TabAID, col1, col2
Table B has TabBID, TabAID, col3
TableC has TabCID, TabAID, col4
When the first row is read, the values for col1 & col2 are inserted into TableA.
The Identity is captured from that row inserted, and then value for col3 AND the identity are entered into TableB, and then value for col4 AND the identity are entered into TableC.
This is a standard data migration technique for normalizing data.
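The row-by-row flow described above can be sketched as follows. This is only an illustration in Python with the built-in sqlite3 module: `cursor.lastrowid` plays the role that the identity value plays in SQL Server, the table and column names follow the example above (col5 is left out for brevity), and the two sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT);
CREATE TABLE TableA (TabAID INTEGER PRIMARY KEY AUTOINCREMENT,
                     col1 TEXT, col2 TEXT);
CREATE TABLE TableB (TabBID INTEGER PRIMARY KEY AUTOINCREMENT,
                     TabAID INTEGER REFERENCES TableA(TabAID), col3 TEXT);
CREATE TABLE TableC (TabCID INTEGER PRIMARY KEY AUTOINCREMENT,
                     TabAID INTEGER REFERENCES TableA(TabAID), col4 TEXT);
-- Made-up sample rows.
INSERT INTO Table1 VALUES ('a1', 'a2', 'a3', 'a4'), ('b1', 'b2', 'b3', 'b4');
""")

old_rows = conn.execute(
    "SELECT col1, col2, col3, col4 FROM Table1 ORDER BY rowid").fetchall()

for col1, col2, col3, col4 in old_rows:
    # Insert the parent row first and capture the key it was given.
    cur = conn.execute(
        "INSERT INTO TableA (col1, col2) VALUES (?, ?)", (col1, col2))
    tab_a_id = cur.lastrowid
    # Use the captured key as the foreign key in the child tables.
    conn.execute(
        "INSERT INTO TableB (TabAID, col3) VALUES (?, ?)", (tab_a_id, col3))
    conn.execute(
        "INSERT INTO TableC (TabAID, col4) VALUES (?, ?)", (tab_a_id, col4))
conn.commit()

table_b = conn.execute(
    "SELECT TabAID, col3 FROM TableB ORDER BY TabBID").fetchall()
table_c = conn.execute(
    "SELECT TabAID, col4 FROM TableC ORDER BY TabCID").fetchall()
print(table_b, table_c)
```

Each child row ends up pointing at the TableA row created from the same original row, which is the property the migration needs.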
Hope this assists,
Let's say that I have to store the following information in my database,
Now my database tables will be designed and structured like this,
At a later date, if I had to add another subcategory level, how could I do this without having to change the database structure at all?
I have heard of defining the columns as row data in a table and using pivots to extract the details later on...Is that the proper way to achieve this?
Can someone please enlighten me or guide me in the proper direction? Thanks in advance...
:)
It would be difficult to add more columns to your table when new levels are to be generated. The best way is to use a Hierarchy table to maintain Parent-Child relationship.
Table : Items
x----x------------------x------------x
| ID | Items | CategoryId |
|----x------------------x------------x
| 1 | Pepsi | 3 |
| 2 | Coke | 3 |
| 3 | Wine | 4 |
| 4 | Beer | 4 |
| 5 | Meals | 2 |
| 6 | Fried Rice | 2 |
| 7 | Black Forest | 7 |
| 8 | XMas Cake | 7 |
| 9 | Pinapple Juice | 8 |
| 10 | Apple Juice | 8 |
x----x------------------x------------x
Table : Category
In category table, you can add categories to n levels. In Items table, you can store the lowest level category. For example, take the case of Pepsi - its categoryId is 3. In Category table, you can find its parent using JOINs and find parent's parents using Hierarchy queries.
In the Category table, the categories whose ParentId is NULL (that is, with no parent) are the main categories, and the others, which have a ParentId, are subcategories.
EDIT :
In any case you need to alter the tables, because with your current schema you cannot keep adding columns to the first table as the number of subcategories changes. Even if you create a table as in Rhys Jones's answer, you have to join two tables on a string. The problem with joining on a string is that when a subcategory or main category name has to change, you have to change it in every table, which will cause trouble in the future and is not good database design. So I suggest you follow the pattern below.
Here is the query that get the parents for child items.
DECLARE @ITEM VARCHAR(30) = 'Black Forest'
;WITH CTE AS
(
-- Finds the original parent for an ITEM ie, Black Forest
SELECT I.ID,I.ITEMS,C.CategoryId,C.Category,ParentId,0 [LEVEL]
FROM #ITEMS I
JOIN #Category C ON I.CategoryId=C.CategoryId
WHERE ITEMS = @ITEM
UNION ALL
-- Now it finds the parents with hierarchy level for ITEM
-- ie, Black Forest. This is called Recursive query, which works like loop
SELECT I.ID,I.ITEMS,C.CategoryId,C.Category,C.ParentId,[LEVEL] + 1
FROM CTE I
JOIN #Category C ON C.CategoryId=I.ParentId
)
-- Here we keep a column to show header for pivoting ie, CATEGORY0,CATEGORY1 etc
-- and keep these records in a temporary table #NEWTABLE
SELECT ID,ITEMS,CATEGORYID,CATEGORY,PARENTID,
'CATEGORY'+CAST(ROW_NUMBER() OVER(PARTITION BY ITEMS ORDER BY [LEVEL] DESC)-1 AS VARCHAR(4)) COLS,
ROW_NUMBER() OVER(PARTITION BY ITEMS ORDER BY [LEVEL] DESC)-1 [LEVEL]
INTO #NEWTABLE
FROM CTE
ORDER BY ITEMS,[LEVEL]
OPTION(MAXRECURSION 0)
Here is the result from the above query
Explanation
Black Forest comes under Cake.
Cake comes under Bakery.
Bakery comes under Food.
In this way you can create children or parents for any number of levels. If you now want to add a parent to Food and Beverage, e.g. Food Industry, just add Food Industry to the Category table and set Food Industry's Id as the ParentId of Food and Beverage. That's all.
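The parent lookup can be tried out with a small self-contained sketch. It uses Python's sqlite3 module and a recursive CTE in place of the T-SQL above. The category ids are assumptions (only CategoryId 7 for Black Forest comes from the Items table), and `category_path` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (CategoryId INTEGER PRIMARY KEY,
                       Category TEXT, ParentId INTEGER);
CREATE TABLE Items (ID INTEGER PRIMARY KEY, Items TEXT, CategoryId INTEGER);
-- Assumed ids for illustration; a NULL ParentId marks a main category.
INSERT INTO Category VALUES (1, 'Food', NULL), (6, 'Bakery', 1), (7, 'Cake', 6);
INSERT INTO Items VALUES (7, 'Black Forest', 7);
""")

def category_path(item_name):
    """Return the item's categories from the top-level parent downwards."""
    rows = conn.execute("""
        WITH RECURSIVE chain(CategoryId, Category, ParentId, lvl) AS (
            -- Anchor: the category the item is stored under.
            SELECT c.CategoryId, c.Category, c.ParentId, 0
            FROM Items AS i
            JOIN Category AS c ON c.CategoryId = i.CategoryId
            WHERE i.Items = ?
            UNION ALL
            -- Recursive step: move up to the parent category.
            SELECT c.CategoryId, c.Category, c.ParentId, chain.lvl + 1
            FROM chain
            JOIN Category AS c ON c.CategoryId = chain.ParentId
        )
        SELECT Category FROM chain ORDER BY lvl DESC
    """, (item_name,)).fetchall()
    return [row[0] for row in rows]

print(category_path('Black Forest'))
```

Adding a new level above Food would only take one new Category row plus an update of Food's ParentId; neither the table structure nor the query changes.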
If you want to pivot, you can follow the steps below.
1. Get values from column to show those values as column in pivot
DECLARE @cols NVARCHAR (MAX)
SELECT @cols = COALESCE (@cols + ',[' + COLS + ']', '[' + COLS + ']')
FROM (SELECT DISTINCT COLS,[LEVEL] FROM #NEWTABLE) PV
ORDER BY [LEVEL]
2. Now use the below PIVOT query
DECLARE @query NVARCHAR(MAX)
SET @query = 'SELECT * FROM
(
SELECT ITEMS, CATEGORY, COLS
FROM #NEWTABLE
) x
PIVOT
(
MIN(CATEGORY)
FOR COLS IN (' + @cols + ')
) p
ORDER BY ITEMS;'
EXEC SP_EXECUTESQL @query
You will get the below result after the pivot
NOTE
If you want all the records irrespective of an item, remove the WHERE clause inside the CTE.
I have ordered the pivot columns with DESC, i.e. it shows the top-level parent first and the item's parent last. If you want the item's parent first, followed by the next level, with the top-level parent last, change DESC inside ROW_NUMBER() to ASC.
According to your schema there's no relationship between 'main category' and 'sub category', but your sample data suggests there would be a relationship, i.e. Alcohol IS A Beverage etc. This sounds like a hierarchy of categories, in which case you could use a single self-referencing Category table instead:
create table dbo.Category (
CategoryID int not null constraint PK_Category primary key clustered (CategoryID),
ParentCategoryID int not null,
CategoryName varchar(100) not null
)
alter table dbo.Category add constraint FK_Category_Category foreign key(ParentCategoryID) references dbo.Category (CategoryID)
insert dbo.Category values (1, 1, 'Beverages')
insert dbo.Category values (2, 1, 'Soft Drink')
insert dbo.Category values (3, 1, 'Alcohol')
This way you can create as many levels of category as you want. Any category where ParentCategoryID = CategoryID is a top level category.
Hope this helps,
Rhys
In order to add a new sub category, you should add the category to the table "ItemSubCategory1" after that you can easily add it to the "Drinks" table.
For Example:
If there is a new category name "Hot Drinks" and a new item "Coffee" which comes in Beverages main category (let CatId=1, MainCatText='Beverages' in ItemMainCategory table) then
INSERT INTO ItemSubCategory1(CatId,SubCatText) VALUES(4,'Hot Drinks')
INSERT INTO Drinks(ItemId,ItemName,ItemMainCategory,ItemSubCategory)
VALUES(5,'Coffee',1,4)
I am using MS SQL to create a report to merge 2 tables. The problem is that I need 2 different headers and 1 column needs to have values from 2 different fields from the 2 tables.
Sample
Material | Plant
-------------------------------------------------
Component | Quantity
XXX - Material | ABC--Plant
--------------------------------------------------
YYYY-Component | 3000- Quantity
Is this even possible?
select *
from (select *, rn = row_number() over (order by column1) from table1) x
join (select *, rn1 = row_number() over (order by column2) from table2) y
on x.rn = y.rn1
First give each table an extra row-number column (do the same for table2); then you can join the two on those row numbers.
I have a table that looks similar to this:
session_id | sku
------------|-----
a | 1
a | 2
a | 3
a | 4
b | 2
b | 3
c | 3
I want to pivot this into a table similar to this:
sku1 | sku2 | score
------|------|------
1 | 2 | 1
1 | 3 | 1
1 | 4 | 1
2 | 3 | 2
2 | 4 | 1
3 | 4 | 1
The idea is to store a denormalised table that allows one to look up for a given sku, what other skus are related to sessions it has been related to, and how many times both skus are related to the same session.
What algorithms, patterns or strategies could you suggest for implementing this in PostgreSQL or other technologies?
I realise that this kind of lookup can be done on the original table using counts, or using a facetting search engine. However, I want to make the reads more performant, and just want to keep the overall statistics. The idea is that I will be performing this pivot regularly on the newest few thousand rows in the first table, then storing the result in the second. I'm only concerned with approximate statistics for the second table.
I've got some SQL that works, but VERY slowly. Also looking into the potential for using a graph database of some sort, but wanted to avoid adding another technology for a small part of the app.
Update: The SQL below seems performant enough. I can convert 1.2 million rows in the first table (tags) into 250k rows in the second table (product_relations) with around 2-3k variations of sku in about 5 minutes on my iMac. I will realistically be denormalising only up to 10k rows per day. Question is whether this is actually the best approach. Seems a little dirty to me.
BEGIN;
CREATE
TEMPORARY TABLE working_tags(tag_id int, session_id varchar, sku varchar) ON COMMIT DROP;
INSERT INTO working_tags
SELECT id,
session_id,
sku
FROM tags
WHERE time < now() - interval '12 hours'
AND processed_product_relation IS NULL
AND sku IS NOT NULL LIMIT 200000;
CREATE
TEMPORARY TABLE working_relations (sku1 varchar, sku2 varchar, score int) ON COMMIT DROP;
INSERT INTO working_relations
SELECT a.sku AS sku1,
b.sku AS sku2,
count(DISTINCT a.session_id) AS score
FROM working_tags AS a
INNER JOIN working_tags AS b ON a.session_id = b.session_id
AND a.sku < b.sku
WHERE a.sku IS NOT NULL
AND b.sku IS NOT NULL
GROUP BY a.sku,
b.sku;
UPDATE product_relations
SET score = working_relations.score+product_relations.score
FROM working_relations
WHERE working_relations.sku1 = product_relations.sku1
AND working_relations.sku2 = product_relations.sku2;
INSERT INTO product_relations (sku1, sku2, score)
SELECT working_relations.sku1,
working_relations.sku2,
working_relations.score
FROM working_relations
LEFT OUTER JOIN product_relations ON (working_relations.sku1 = product_relations.sku1
AND working_relations.sku2 = product_relations.sku2)
WHERE product_relations.sku1 IS NULL;
UPDATE tags
SET processed_product_relation = TRUE
WHERE id IN
(SELECT tag_id
FROM working_tags);
COMMIT;
If I've interpreted your intention correctly (per comments) this should do it:
SELECT
s1.sku AS sku1,
s2.sku AS sku2,
count(session_id)
FROM session s1
INNER JOIN session s2 USING (session_id)
WHERE s1.sku < s2.sku
GROUP BY s1.sku, s2.sku
ORDER BY 1,2;
See: http://sqlfiddle.com/#!15/2e0b2/1
In other words: Self-join session, then find all pairings of SKUs for each session ID, excluding ones where the left is greater than or equal to the right in order to avoid repeating pairings - if we have (1,2,count) we don't want (2,1,count) as well. Then group by the SKU pairings and count how many rows are found for each pairing.
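The same pairing logic can be checked outside the database with a short Python sketch. It runs on the sample rows from the question; `pair_scores` is a hypothetical helper, and because it collects each session's SKUs into a set it corresponds to the count(distinct session_id) variant.

```python
from collections import Counter
from itertools import combinations

# (session_id, sku) rows from the question's sample table.
rows = [("a", 1), ("a", 2), ("a", 3), ("a", 4),
        ("b", 2), ("b", 3),
        ("c", 3)]

def pair_scores(rows):
    """Count, for each SKU pair (sku1 < sku2), the sessions containing both."""
    skus_by_session = {}
    for session_id, sku in rows:
        skus_by_session.setdefault(session_id, set()).add(sku)

    scores = Counter()
    for skus in skus_by_session.values():
        # combinations() over the sorted SKUs yields each pair exactly once,
        # smaller SKU first, which matches the s1.sku < s2.sku filter.
        scores.update(combinations(sorted(skus), 2))
    return scores

scores = pair_scores(rows)
for (sku1, sku2), score in sorted(scores.items()):
    print(sku1, sku2, score)
```

Session c contains a single SKU, so it contributes no pairs, and only the pair (2, 3) appears in two sessions.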
You may want to count(distinct session_id) instead, if your SKU pairings can repeat and you want to exclude duplicates. There will probably be more efficient ways to do that, but that's the simplest.
An index on at least session_id will be very useful. You may also want to mess with planner cost parameters to make sure it chooses a good plan - in particular, make sure effective_cache_size is accurate and random_page_cost vs seq_page_cost reflects your caching and I/O costs. Finally, throw as much work_mem at it as you can afford.
If you're creating a materialized view, just CREATE UNLOGGED TABLE whatever AS SELECT .... That way you minimise the number of writes/rewrites/overwrites.