SQL Server: rearrange and flatten out table of values? - sql-server

I am working on a Stored Procedure that retrieves certain values from my database.
I am able to get the values I need, but I'm having a hard time figuring out how to order them the way I need.
Below is what I'm trying to achieve. I already have table properties (left), and need to create table newProperties by running a SELECT on properties.
Please note:
the field valueTypeID will ALWAYS be either 68 or 80.
the field value will never be the same. Each value will be a long string of chars that changes for each value (I have simplified for my question)

I think this should be the starting point of your SELECT statement:
select
p.genericID,
p68.Value as ValueTypeId68,
p80.Value as ValueTypeId80
from partials p
join partialsProps p68
on p.genericId = p68.genericId
and p68.valueTypeID = 68
join partialsProps p80
on p.genericId = p80.genericId
and p80.valueTypeID = 80
You can add the @cycle and @clientAccountID conditions on top of it.
SQL Fiddle: http://www.sqlfiddle.com/#!6/472de/1
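An alternative that avoids the two self-joins is conditional aggregation. The following is only a sketch, not the query from the fiddle: it assumes the same genericID / valueTypeID / Value columns as above, and @props is a made-up table variable with illustrative rows so the snippet can run on its own.

```sql
-- Assumption: @props stands in for partialsProps; name and sample rows are illustrative only.
DECLARE @props TABLE (genericID int, valueTypeID int, Value varchar(50));

INSERT INTO @props (genericID, valueTypeID, Value)
VALUES (11, 68, 'A'), (11, 80, '1'),
       (12, 68, 'Z'), (12, 80, '2');

-- One output row per genericID; each CASE keeps only the value of the matching type.
SELECT genericID,
       MAX(CASE WHEN valueTypeID = 68 THEN Value END) AS ValueTypeId68,
       MAX(CASE WHEN valueTypeID = 80 THEN Value END) AS ValueTypeId80
FROM @props
GROUP BY genericID
ORDER BY genericID;
```

Unlike the inner joins, this form also keeps a genericID that has only one of the two value types, with NULL in the missing column.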

Are 68 and 80 the only possible values for the column valueTypeID? If so, the answer provided by w0lf should work for you, though I won't say it is the only solution.
However, if there can be other values for valueTypeID apart from 68 and 80, then I would like to propose a solution here.
I am assuming that the column value holds both numeric and non-numeric values, that they currently come back together in a single column, and that you want to separate them into two columns. This is the scenario I imagine.
NOTE: If this does not give you a satisfactory solution, please describe the scenario in detail.
GO to SQLFiddle
-- Create table as you have provided in the question
CREATE TABLE #partials
([genericID] int);
INSERT INTO #partials ([genericID])
VALUES (11),(12),(13),(14);
CREATE TABLE #partialsProps
([genericID] int, [valueTypeID] int, [Value] varchar(1));
-- Insert values similar
INSERT INTO #partialsProps
([genericID], [valueTypeID], [Value])
VALUES
(11, 68, 'A'),
(11, 80, '1'),
(12, 68, 'Z'),
(12, 80, '2'),
(13, 68, 'B'),
(13, 80, '3'),
(14, 68, 'Y'),
(14, 80, '4')
;
select r1.value as Val1, r2.Value as Val2
from (
select pp.genericID, value, valueTypeID
from #partialsProps pp
join #partials p on p.genericID = pp.genericID
where ISNUMERIC(Value) = 0
group by valueTypeID, value, pp.genericID
) r1
join (
select pp.genericID, value, valueTypeID
from #partialsProps pp
join #partials p on p.genericID = pp.genericID
where ISNUMERIC(Value) = 1
group by valueTypeID, value, pp.genericID
) r2
on r1.genericID = r2.genericID;
drop table #partials;
drop table #partialsProps;

Related

Snowflake: How do I update a column with values taken at random from another table?

I've been struggling with this for a while now. Imagine I have these two tables:
CREATE TEMPORARY TABLE tmp_target AS (
SELECT * FROM VALUES
('John', 43, 'm', 17363)
, ('Mark', 21, 'm', 16354)
, ('Jean', 25, 'f', 74615)
, ('Sara', 63, 'f', 26531)
, ('Alyx', 32, 'f', 42365)
AS target (name, age, gender, zip)
);
and
CREATE TEMPORARY TABLE tmp_source AS (
SELECT * FROM VALUES
('Cory', 42, 'm', 15156)
, ('Fred', 51, 'm', 71451)
, ('Mimi', 22, 'f', 45624)
, ('Matt', 61, 'm', 12734)
, ('Olga', 19, 'f', 52462)
, ('Cleo', 29, 'f', 23352)
, ('Simm', 31, 'm', 62445)
, ('Mona', 37, 'f', 23261)
, ('Feng', 44, 'f', 64335)
, ('King', 57, 'm', 12225)
AS source (name, age, gender, zip)
);
I would like to update the tmp_target table by taking 5 rows at random from the tmp_source table for the column(s) I'm interested in. For example, maybe I want to replace all the names with 5 random names from tmp_source, or maybe I want to replace the names and the ages.
My first attempt was this:
UPDATE tmp_target t SET t.name = s.name FROM tmp_source s;
However, when I examine the target table, I notice that quite a few of the names are duplicated, usually in pairs. As well, Snowflake gives me number of rows updated: 5 as well as number of multi-joined rows updated: 5. I believe this is due to the non-deterministic nature of what's happening, possibly as noted in the Snowflake documentation on updates. Not to mention I get the nagging feeling that this is somehow horribly inefficient if the tables had many records.
Then I tried something to grab 5 random rows from the source table:
UPDATE tmp_target t SET t.name = cte.name
FROM (
WITH upd AS (SELECT name FROM tmp_source SAMPLE ROW (5 ROWS))
SELECT name FROM upd
) AS cte;
But I seem to run into the exact same issue, both when I examine the target table, and as reported by the number of multi-joined rows.
I was wondering if I can use row numbering somehow, but while I can generate row numbers in the subquery, I don't know how to do that in the SET part of the outside query.
I want to add that neither table has any identifiers or indexes that can be used, and I'm looking for a solution that wouldn't require any.
I would very much appreciate it if anyone can provide solutions or ideas that are as clean and tidy as possible, with some consideration given to efficiency (imagine a target table of 100K rows and a source table of 10M rows).
Thank you!
I like the two answers already provided, but let me give you a simple answer to solve the simple case:
UPDATE tmp_target t
SET t.name = (
select array_agg(s.name) possible_names
from tmp_source s
)[uniform(0, 9, random())]
;
The secret of this solution is building an array of possible values, and choosing one at random for each updated row.
Update: Now with a JavaScript UDF that will help us choose each name from source only once
create or replace function incremental_thing()
returns float
language javascript
as
$$
if (typeof(inc) === "undefined") inc = 0;
return inc++;
$$
;
UPDATE tmp_target t
SET t.name = (
select array_agg(s.name) within group (order by random())
from tmp_source s
)[incremental_thing()::integer]
;
Note that the JS UDF returns an incremental value each time it’s called, and that helps me choose the next value from a sorted array to use on an update.
Since the value is incremented inside the JS UDF, this will work as long as there's only one JS env involved. To force single-node processing and avoid parallelism, choose an XS warehouse and test.
Two examples follow: the first uses a temporary table to hold the data joined by a rownum, and the second includes everything in one query. Note that I used UPPER and lower case strings to make sure the records were being updated the way I wanted.
CREATE OR REPLACE TEMPORARY TABLE tmp_target AS (
SELECT * FROM VALUES
('John', 43, 'm', 17363)
, ('Mark', 21, 'm', 16354)
, ('Jean', 25, 'f', 74615)
, ('Sara', 63, 'f', 26531)
, ('Alyx', 32, 'f', 42365)
AS target (name, age, gender, zip)
);
CREATE OR REPLACE TEMPORARY TABLE tmp_source AS (
SELECT * FROM VALUES
('CORY', 42, 'M', 15156)
, ('FRED', 51, 'M', 71451)
, ('MIMI', 22, 'F', 45624)
, ('MATT', 61, 'M', 12734)
, ('OLGA', 19, 'F', 52462)
, ('CLEO', 29, 'F', 23352)
, ('SIMM', 31, 'M', 62445)
, ('MONA', 37, 'F', 23261)
, ('FENG', 44, 'F', 64335)
, ('KING', 57, 'M', 12225)
AS source (name, age, gender, zip)
);
CREATE OR REPLACE TEMPORARY TABLE t1 as (
with src as (
SELECT tmp_source.*, row_number() over (order by 1) tmp_id
FROM tmp_source SAMPLE ROW (5 ROWS)),
tgt as (
SELECT tmp_target.*, row_number() over (order by 1) tmp_id
FROM tmp_target SAMPLE ROW (5 ROWS))
SELECT src.name as src_name,
src.age as src_age,
src.gender as src_gender,
src.zip as src_zip,
src.tmp_id as tmp_id,
tgt.name as tgt_name,
tgt.age as tgt_age,
tgt.gender as tgt_gender,
tgt.zip as tgt_zip
FROM src, tgt
WHERE src.tmp_id = tgt.tmp_id);
UPDATE tmp_target a
SET a.name = b.src_name,
a.gender = b.src_gender
FROM (SELECT * FROM t1) b
WHERE a.name = b.tgt_name
AND a.age = b.tgt_age
AND a.gender = b.tgt_gender
AND a.zip = b.tgt_zip;
UPDATE tmp_target a
SET a.name = b.src_name,
a.gender = b.src_gender
FROM (
with src as (
SELECT tmp_source.*, row_number() over (order by 1) tmp_id
FROM tmp_source SAMPLE ROW (5 ROWS)),
tgt as (
SELECT tmp_target.*, row_number() over (order by 1) tmp_id
FROM tmp_target SAMPLE ROW (5 ROWS))
SELECT src.name as src_name,
src.age as src_age,
src.gender as src_gender,
src.zip as src_zip,
src.tmp_id as tmp_id,
tgt.name as tgt_name,
tgt.age as tgt_age,
tgt.gender as tgt_gender,
tgt.zip as tgt_zip
FROM src, tgt
WHERE src.tmp_id = tgt.tmp_id) b
WHERE a.name = b.tgt_name
AND a.age = b.tgt_age
AND a.gender = b.tgt_gender
AND a.zip = b.tgt_zip;
At a first pass, this is all that came to mind. I'm not sure if it suits your example perfectly, since it involves reloading the table.
It should be comparably performant to any other solution that uses a generated rownum. At least to my knowledge, in Snowflake, an update is no more performant than an insert (at least in this case where you're touching every record, and every micropartition, regardless).
INSERT OVERWRITE INTO tmp_target
with target as (
select
age,
gender,
zip,
row_number() over (order by 1) rownum
from tmp_target
)
,source as (
select
name,
row_number() over (order by 1) rownum
from tmp_source
SAMPLE ROW (5 ROWS)
)
SELECT
s.name,
t.age,
t.gender,
t.zip
from target t
join source s on t.rownum = s.rownum;

Union of masked values

I've got a problem making a union of 2 tables whose values are masked using the random function. Unless a user has permission to read all data, they should see random values between (-1000,2000).
Making separate views for each table generates the values correctly; however, there's a problem when I try to make a union of those 2 tables or views. Instead of seeing random values, I see 0 for everything.
Let's say there's table A defined as:
ID INT IDENTITY (1, 1) NOT NULL,
Value MONEY MASKED WITH (FUNCTION = 'random(-1000, 20000)') NOT NULL
and table B as:
ID INT IDENTITY (1, 1) NOT NULL,
Value DECIMAL (18, 6) MASKED WITH (FUNCTION = 'random(-1000.000000, 20000.000000)') NULL
in table A:
ID Value
1 12
2 21
3 34
in table B:
ID Value
7 17.12
8 23.01
9 2.56
A view on each table shows that table's IDs together with masked values for a user without permissions, and the values as stored in the table for a user with permissions, which is correct.
However, a UNION of both tables should show the IDs and masked values, but instead it shows values = 0.000000.
I'm confused about how to make those values appear as masked random values in the union.
You can UNION / UNION ALL a select of the same table without problems. But using two different tables gives 0 for all masked values.
You can use the following as a workaround using temporary tables:
SELECT * INTO #t1 FROM TableA
SELECT * INTO #t2 FROM TableB
SELECT * FROM #t1
UNION ALL
SELECT * FROM #t2
demo on dbfiddle.uk

sql server - showing values as different values

I have the following situation in Sql Server:
In a table we have a column that we fill with three possible values 1, 2 and 3. Let's suppose that 1 means yes, 2 means no and 3 means maybe.
Is there a way to select this column showing the values as yes, no and maybe instead of 1, 2 and 3?
Yes, you can use a CASE expression to do that:
select
case value
when 1 then 'yes'
when 2 then 'no'
when 3 then 'maybe'
end
from
table
You can use Case for this as shown below
SELECT
CASE test_column
WHEN 1 THEN 'YES'
WHEN 2 THEN 'NO'
ELSE 'MAY BE'
END as test_op
FROM table1
Yes it is possible, you can use case which would be something like this
select
case when field = 1 then 'YES'
when field = 2 then 'NO' else 'MAYBE' end FieldName
from table
Note: any value other than 1 or 2 would be maybe, you can add another case for the number 3.
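As a sketch of the variant mentioned in the note, the query below maps 3 explicitly and reserves ELSE for unexpected values. The table variable @answers, its columns, and the 'UNKNOWN' label are assumptions made up for this example.

```sql
-- Assumption: @answers and its column names are placeholders for the real table.
DECLARE @answers TABLE (id int, field int);

INSERT INTO @answers (id, field)
VALUES (1, 1), (2, 2), (3, 3), (4, 7);

SELECT id,
       CASE field
           WHEN 1 THEN 'YES'
           WHEN 2 THEN 'NO'
           WHEN 3 THEN 'MAYBE'
           ELSE 'UNKNOWN'   -- anything outside 1-3 is flagged instead of silently becoming MAYBE
       END AS FieldName
FROM @answers
ORDER BY id;
```

With this form, a stray value such as 7 shows up as UNKNOWN, which makes bad data easier to spot.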
An alternative to the case/when statements seen in other answers is to create a reference table that contains the description of the values (1/2/3).
The biggest advantage of doing it this way is that if this is used in multiple places, you can update all of them at once.
The dbo.curValDesc table is the reference table you'd need to create/populate. Then your consuming queries can look like the one at the bottom.
create table dbo.existingData
(
rowID int
, curVal tinyint --the column with values of 1/2/3
)
insert into dbo.existingData
values (1, 1)
, (2, 2)
, (3, 3)
, (4, 3)
, (5, 2)
, (6, 1)
, (7, 1)
create table dbo.curValDesc
(
curVal tinyint
, curValDesc varchar(10)
)
insert into dbo.curValDesc
values (1, 'Yes')
, (2, 'No')
, (3, 'Maybe')
select ed.rowID
, cvd.curValDesc
from dbo.existingData as ed
inner join dbo.curValDesc as cvd on ed.curVal = cvd.curVal
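One thing to watch with the reference-table approach is that an inner join drops any row whose value has no description yet. A hedged sketch of a more forgiving variant follows; the table variables are stand-ins for the dbo tables above, and the 'Unknown' fallback is an assumption, not part of the original answer.

```sql
-- Assumption: @existingData and @curValDesc are stand-ins for the dbo tables above.
DECLARE @existingData TABLE (rowID int, curVal tinyint);
DECLARE @curValDesc TABLE (curVal tinyint, curValDesc varchar(10));

INSERT INTO @existingData (rowID, curVal)
VALUES (1, 1), (2, 2), (3, 3), (4, 4);

INSERT INTO @curValDesc (curVal, curValDesc)
VALUES (1, 'Yes'), (2, 'No'), (3, 'Maybe');

-- LEFT JOIN keeps rowID 4 even though curVal 4 has no description row.
SELECT ed.rowID,
       COALESCE(cvd.curValDesc, 'Unknown') AS curValDesc
FROM @existingData AS ed
LEFT JOIN @curValDesc AS cvd
       ON ed.curVal = cvd.curVal
ORDER BY ed.rowID;
```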

SQL Update NULL value from select statement query

I'm new to posting on this site, but been using it for a while to get assistance to SQL queries.
I have an issue that I'm trying to resolve. I have 2 columns in a query which are machine and ID, for some machines the ID will be NULL, but for others they will have an ID value as set out below.
Machine ID
test1 3
test12 NULL
test3 4
test4 NULL
As the ID's will be present in the table, I need to update the NULL values, if the machine name is like the one which has a value, for example test 1 and test12 both should have ID 3, but test12 is showing NULL. What I want to be able to do is to replace the NULL for test12 with ID = 3, as the machine names are similar.
I have tried COALESCE, ISNULL and CASE, which will all update the values, but for those I need to know the value in advance, and I won't know it until I have run the select statement.
Any ideas on how to resolve this please?
As noted in the comments you will have to work out your match formula and specify it in the join condition. I believe the query you are looking for is:
Create Table Machines (Name Varchar(8000), ID Int)
Insert Into Machines Values ('test1', 3)
Insert Into Machines Values ('test12', Null)
Insert Into Machines Values ('test3', 4)
Insert Into Machines Values ('test4', Null)
Insert Into Machines Values ('test89', Null)
Insert Into Machines Values ('test8', 5)
Insert Into Machines Values ('test64', Null)
Update M1 Set M1.ID = M2.ID
From Machines M1
Join (Select Left(Name,5) NamePrefix, Max(ID) ID
From Machines
Where ID Is Not Null
Group By Left(Name,5)) M2
On Left(M1.Name, 5) = M2.NamePrefix
Where M1.ID Is Null
Select * From Machines
Note that I used a group by in the joined query in case multiple rows match and we only want one value returned. You can use window functions or other logic instead of the group by if you want to pick specifically which match is chosen.
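To illustrate the window-function option, here is a minimal sketch that uses ROW_NUMBER to choose which match wins. The rule "lowest non-NULL ID in the prefix group wins" is an arbitrary assumption for the example, and @Machines is a table variable invented so the snippet runs on its own.

```sql
-- Assumption: @Machines mirrors the Machines table above; the tie-break rule is illustrative only.
DECLARE @Machines TABLE (Name varchar(50), ID int);

INSERT INTO @Machines (Name, ID)
VALUES ('test1', 3), ('test12', NULL), ('test15', 9), ('test3', 4), ('test4', NULL);

WITH Ranked AS (
    -- Number the candidate IDs inside each 5-character prefix group.
    SELECT LEFT(Name, 5) AS NamePrefix,
           ID,
           ROW_NUMBER() OVER (PARTITION BY LEFT(Name, 5) ORDER BY ID) AS rn
    FROM @Machines
    WHERE ID IS NOT NULL
)
UPDATE M1
SET M1.ID = R.ID
FROM @Machines AS M1
JOIN Ranked AS R
  ON LEFT(M1.Name, 5) = R.NamePrefix
 AND R.rn = 1            -- only the top-ranked candidate per prefix
WHERE M1.ID IS NULL;

SELECT Name, ID FROM @Machines ORDER BY Name;
```

Here test12 shares the prefix test1 with two candidates (3 and 9) and receives 3, while test4 has no candidate and stays NULL. Changing the ORDER BY inside the OVER clause changes which match is chosen.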
Based on the requirements in your last comment:
There will be a number of records in the table, they are grouped by the letter before and after the '-' i.e. AB-CDE-L111, AB-CDE-L112, AB-CDE-L113, AB-CDE-L124, AB-CDE-L116 all of these should have in the query an ID of 45. The next set of machines will be AB-CCC-L111, AB-CCC-L112, AB-CCC-L115 all of these should have in the query an ID of 47 and finally there will be the last set of machine, AB-BBB-L113, AB-BBB-L144, AB-BBB-L115, AB-BBB-L120 all of these should have in the query an ID of 50. In the query, a machine returns a NULL ID then I need to update the query results, not the table.
So a SELECT query to get you your results would be:
declare @machine table (Machine varchar(30) not null, ID int null)
insert into @machine
values ('AB-CDE-L111', NULL),
('AB-CDE-L112', NULL),
('AB-CDE-L113', 45),
('AB-CDE-L124', NULL),
('AB-CDE-L116', NULL),
('AB-CCC-L111', NULL),
('AB-CCC-L112', NULL),
('AB-CCC-L113', 47),
('AB-CCC-L124', NULL),
('AB-CCC-L116', NULL),
('AB-BBB-L111', NULL),
('AB-BBB-L112', NULL),
('AB-BBB-L113', 50),
('AB-BBB-L124', NULL),
('AB-BBB-L116', NULL)
select m1.Machine, m2.ID
from @machine m1
inner join @machine m2
on m2.ID is not null
and left(m2.Machine, 6) = left(m1.Machine, 6)
order by m1.Machine
This assumes that:
1) There is always the same amount of characters making up the prefix to the machine code.
2) That there is only one Machine in each group that has been assigned an ID.
If either of these assumptions is wrong then you may need to do extra string manipulation in the case of 1) and use some kind of function (ROW_NUMBER etc.) in the case of 2) to avoid duplicate rows (although you could just use SELECT DISTINCT if the IDs would be the same).
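If both assumptions fail, one possible shape for the query is sketched below. It assumes the group is everything before the last '-' in the machine name, and it uses a windowed MAX so that several rows with an ID in the same group do not produce duplicates. The sample rows and the GroupKey name are invented for the example.

```sql
-- Assumption: the group key is the part of the name before the last '-'.
DECLARE @machine TABLE (Machine varchar(30) NOT NULL, ID int NULL);

INSERT INTO @machine (Machine, ID)
VALUES ('AB-CDE-L111', NULL), ('AB-CDE-L113', 45),
       ('AB-CCCC-L112', NULL), ('AB-CCCC-L113', 47), ('AB-CCCC-L115', 47);

WITH Keyed AS (
    SELECT Machine,
           ID,
           -- CHARINDEX on the reversed string finds the last '-';
           -- LEFT then keeps everything before it.
           LEFT(Machine, LEN(Machine) - CHARINDEX('-', REVERSE(Machine))) AS GroupKey
    FROM @machine
)
SELECT Machine,
       MAX(ID) OVER (PARTITION BY GroupKey) AS ID   -- one row per machine, no self-join
FROM Keyed
ORDER BY Machine;
```

MAX is only a reasonable choice when every assigned ID in a group is the same; if they can differ, a ROW_NUMBER-based rule would be needed to decide which one applies.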

allocating content to fixed size buckets without looping in SQL Server

I am working in SQL Server 2008 R2 with a priority ordered set of content that must be assigned to a set of buckets to achieve a content specified value. Each item in the content list is related to nodes within a ragged tree hierarchy (the buckets). Each bucket has a value assigned to it and can hold a fixed quantity of content.
I am trying to allocate content in priority order to the buckets that they relate to (or any parent/grandparent up the tree from related content). I must start with the highest bucket value (with empty spaces) and stop only when the bucket values match or exceed my content value.
Hopefully my crude example will help. Assuming the B’s are buckets that can each hold 2 pieces of content and C’s are content. The bracketed numbers are the bucket value and required content value.
C1 would be allocated to B1 (highest value in B1’s tree) and B4 to give it a total value of 7. Both B1 and B4 now only have one slot remaining.
C2 would be allocated B1 and B5 leaving no slots in B1 and 1 slot in B2.
C3 would not be able to use B1 as there are no slots available, so would result in B2, B5 and B9 leaving no slots in B5 and one slot in B2 / B5.
And so on...
I can see how to achieve this iteratively by creating a list of all buckets and their relationship with all child / grandchild buckets, then looping through the content one item at a time, assigning its buckets and reducing the remaining bucket spaces. The reason I feel that it needs to be a loop is the unknown number of spaces remaining in each bucket, which depends on processing all higher priority content first.
But looping through content one at a time feels intrinsically wrong and there must be a more efficient way to solve this allocation problem – ideally in one pass…
Example SQL Server code (to match the above diagram)
--core table/fields
CREATE TABLE Bucket
(
Id int,
Name varchar(3),
BucketValue int,
SlotRemaining int --only required for my solution to hold number of slots left to fill
)
CREATE TABLE BucketParent
(
ChildBucketId int,
ParentBucketId int
)
CREATE TABLE Content
(
Id int,
Name varchar(3),
ContentValue int,
AllocationState int, --only required for my solution to identify content that still needs processing
--1=unprocessed, 2=Complete
Priority int --order to work through content 1=most important
)
CREATE TABLE ContentBucket
(
ContentId int,
BucketId int
)
Go
CREATE TABLE ContentPriorityBucket -- table to record my allocation of content to the most valuable bucket
(
ContentId int,
BucketId int
)
Go
--test data to match example (wish I'd made it smaller now :)
INSERT INTO Bucket Values (1,'B1', 4, null)
INSERT INTO Bucket Values (2,'B2', 5, null)
INSERT INTO Bucket Values (3,'B3', 4, null)
INSERT INTO Bucket Values (4,'B4', 3, null)
INSERT INTO Bucket Values (5,'B5', 3, null)
INSERT INTO Bucket Values (6,'B6', 3, null)
INSERT INTO Bucket Values (7,'B7', 4, null)
INSERT INTO Bucket Values (8,'B8', 2, null)
INSERT INTO Bucket Values (9,'B9', 1, null)
INSERT INTO Bucket Values (10,'B10', 2, null)
INSERT INTO Bucket Values (11,'B11', 1, null)
INSERT INTO BucketParent Values (8, 4)
INSERT INTO BucketParent Values (4, 1)
INSERT INTO BucketParent Values (9, 5)
INSERT INTO BucketParent Values (5, 1)
INSERT INTO BucketParent Values (5, 2)
INSERT INTO BucketParent Values (10, 5)
INSERT INTO BucketParent Values (10, 6)
INSERT INTO BucketParent Values (6, 2)
INSERT INTO BucketParent Values (6, 3)
INSERT INTO BucketParent Values (11, 6)
INSERT INTO BucketParent Values (11, 7)
INSERT INTO BucketParent Values (7, 3)
INSERT INTO Content Values (1,'C1', 5, null, 1)
INSERT INTO Content Values (2,'C2', 8, null, 2)
INSERT INTO Content Values (3,'C3', 9, null, 3)
INSERT INTO Content Values (4,'C4', 10, null, 4)
INSERT INTO ContentBucket Values (1,8)
INSERT INTO ContentBucket Values (1,4)
INSERT INTO ContentBucket Values (2,9)
INSERT INTO ContentBucket Values (3,9)
INSERT INTO ContentBucket Values (4,10)
INSERT INTO ContentBucket Values (4,7)
GO
--Iterative solution that I am trying to improve on
UPDATE Bucket
SET SlotRemaining = 2 --clear previous run and allocate maximum bucket size
UPDATE Content
SET AllocationState = 1 --set state to unprocessed
--Clear last run
TRUNCATE Table ContentPriorityBucket
GO
DECLARE @ContentToProcess int = 0
DECLARE @CurrentContent int
DECLARE @CurrentContentValue int
SELECT @ContentToProcess = COUNT(id) FROM Content WHERE AllocationState =1
WHILE (@ContentToProcess > 0)
BEGIN
-- get next content to process
SELECT Top(1) @CurrentContent = ID,
@CurrentContentValue = ContentValue
FROM Content
WHERE AllocationState =1
ORDER BY Priority;
WITH BucketList (Id, BucketValue, SlotRemaining)
as
(
-- list buckets related to content
SELECT b.Id
,b.BucketValue
,b.SlotRemaining
FROM ContentBucket cb
INNER JOIN Bucket b on cb.BucketId = b.Id
WHERE cb.ContentId = @CurrentContent
-- need to pull back all buckets (even those that are full as they may have empty parents)
UNION ALL
SELECT b.Id
,b.BucketValue
,b.SlotRemaining
FROM BucketList bl
INNER JOIN BucketParent bp on bl.Id = bp.ChildBucketId
INNER JOIN Bucket b on bp.ParentBucketId = b.Id
),
DistinctBucketList (Id, BucketValue, SlotRemaining)
as
(
--dedupe buckets
SELECT distinct Id
, BucketValue
, SlotRemaining
FROM BucketList
),
BucketListOrdered (Id, BucketValue, RowOrder)
as
(
--order buckets
SELECT Id
,BucketValue
,ROW_NUMBER() OVER (ORDER BY BucketValue desc, Id)-- added id to get consistent result if two buckets have same value
FROM DistinctBucketList
WHERE SlotRemaining >0
),
CulmativeBucketListWithinRequiredValue (Id, RowOrder, CulmativeBucketValue, RequiredBucket)
as
(
-- this will mark all buckets up to the bucket value, but will be 1 bucket short
SELECT blo.Id
,blo.RowOrder
,SUM(blc.BucketValue) CulmativeBucketValue
,CASE
WHEN SUM(blc.BucketValue) <=@CurrentContentValue THEN 1
ELSE 0
END RequiredBucket
FROM BucketListOrdered blo
LEFT JOIN BucketListOrdered blc ON blc.RowOrder <= blo.RowOrder
GROUP BY blo.Id, blo.RowOrder
)
-- this will identify all buckets required to top content value
INSERT INTO ContentPriorityBucket
SELECT @CurrentContent
,b.Id
FROM CulmativeBucketListWithinRequiredValue b
WHERE b.RowOrder <= (SELECT Max(RowOrder) + 1 FROM CulmativeBucketListWithinRequiredValue WHERE RequiredBucket =1)
--reduce all used bucket sizes by 1 (could alternatively determine this from ContentPriorityBucket)
UPDATE Bucket
SET SlotRemaining = SlotRemaining -1
WHERE id in (SELECT BucketId FROM ContentPriorityBucket WHERE ContentId = @CurrentContent)
-- update processed content
UPDATE Content
SET AllocationState = 2
WHERE @CurrentContent = Id
SELECT @ContentToProcess = COUNT(id) FROM Content WHERE AllocationState =1
END
SELECT ContentId, BucketId FROM ContentPriorityBucket
/*
DROP TABLE Bucket
DROP TABLE BucketParent
DROP TABLE Content
DROP TABLE ContentBucket
DROP TABLE ContentPriorityBucket
*/
There are a couple points to make about this problem.
First, generalized bin-packing is an NP-complete problem, and therefore cannot be solved in general in a single pass. This specific bin-packing, since it is an ordered packing, may be different, but the issue of the complexity of the problem remains; it's certainly not O(1), so it may need a loop no matter what.
1-pass non-looping solutions for this seem like they should not be possible; it looks like a problem that isn't made for set-based solutions. You could create a table-valued CLR function, which could find the bucket that each item fits into. Otherwise, keeping the looping solution would be fine. (If you post the code, it might be easier to see if there are improvements possible.)

Resources