Joining 2nd Table with Random Row to each record - sql-server

I need to join Table B to Table A, where Table B's records are assigned, or joined, randomly. Most of the queries I have found rely on a key and join conditions between the two tables, whereas I just want to join records randomly without a key.
I'm not sure where to start, as none of the queries I've found do this. I assume a nested join could help, but how can I randomly assign the records in the join?
**Table A**
| Associate ID| Statement|
|:----: |:------:|
| 33691| John is |
| 82451| Susie is |
| 25485| Sam is|
| 26582| Lonnie is|
| 52548| Carl is|
**Table B**
| RowID | List|
|:----: |:------:|
| 1| admirable|
| 2| astounding|
| 3| excellent|
| 4| awesome|
| 5| first class|
The result would be something like this, where items from the list are assigned randomly rather than in order:
**Result Table**
| Associate ID| Statement| List|
|:----: |:------:|:------:|
| 33691| John is |astounding|
| 82451| Susie is |first class|
| 25485| Sam is|admirable|
| 26582| Lonnie is|excellent|
| 52548| Carl is|awesome|
These are some of the queries I've tried:
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/aeb83251-e132-435a-8630-e5b842a69368/random-join-between-tables?forum=sqldataaccess
- This seems to loop through the values from Table B in order, not randomly.
https://www.daveperrett.com/articles/2009/08/11/mysql-select-random-row-with-join
- This relies on a common key between the two tables and returns one of the records with that key, which I do not have.
SQL Join help when selecting random row
- I'll be honest, I don't understand this one, but it doesn't seem to assign a random value to each row from Table A; it looks more like a single overall selection, like the link above.
Join One Table To Get Random Rows from 2nd Table
- This seems to be specific to a key, rather than fully random.

Using two CTEs, we generate a row number for each table based on a random order and then join on that row number.
If A has more rows than B, first use a CTE to repeat the records in B N times, as described here:
Repeat Rows N Times According to Column Value (not included below). To get N, count the rows in A and in B, divide the count for A by the count for B, and add 1.
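A minimal sketch of the N arithmetic, written in Python rather than T-SQL so it runs on its own; `repeat_count` and `repeated_b` are hypothetical helper names, and a Python list stands in for table B. Integer division plus one guarantees that N copies of B contain at least as many rows as A. When the counts divide evenly it produces one spare copy, which is harmless because the extra B rows simply go unmatched.

```python
def repeat_count(rows_a: int, rows_b: int) -> int:
    """Number of copies of table B needed so B has at least as many rows as A."""
    if rows_b <= 0:
        raise ValueError("table B must not be empty")
    # Integer division, then add 1, as described in the answer.
    return rows_a // rows_b + 1


def repeated_b(list_b, rows_a):
    """Repeat B's values N times, mirroring the 'repeat rows N times' CTE."""
    return list_b * repeat_count(rows_a, len(list_b))


print(repeat_count(5, 2))  # 3
print(repeated_b(["admirable", "astounding"], 5))
```

With 5 rows in A and 2 in B, N is 3, giving 6 candidate rows from B for the 5 rows in A.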
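Assuming even distribution:
-- The CTEs need names that differ from the base tables: a CTE named A that
-- selects FROM A is treated by SQL Server as a recursive reference and fails.
WITH RandA AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY NEWID()) AS RN
FROM A),
RandB AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY NEWID()) AS RN
FROM B)
SELECT *
FROM RandA
INNER JOIN RandB
ON RandA.RN = RandB.RN;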
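To experiment with the random row-number pairing without a SQL Server instance, the sketch below runs the same query shape against an in-memory SQLite database from Python. It is a port under assumptions, not the answer's original code: SQLite's `random()` replaces `NEWID()`, window functions require SQLite 3.25 or later, and only the sample columns from the question are created.

```python
import sqlite3


def random_pairing(conn):
    """Pair rows of A and B by independently shuffled row numbers (SQLite port)."""
    sql = """
    WITH RandA AS (
        SELECT AssociateID, Statement,
               ROW_NUMBER() OVER (ORDER BY random()) AS RN
        FROM A),
    RandB AS (
        SELECT List,
               ROW_NUMBER() OVER (ORDER BY random()) AS RN
        FROM B)
    SELECT RandA.AssociateID, RandA.Statement, RandB.List
    FROM RandA
    INNER JOIN RandB ON RandA.RN = RandB.RN
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (AssociateID INTEGER, Statement TEXT)")
conn.execute("CREATE TABLE B (RowID INTEGER, List TEXT)")
conn.executemany("INSERT INTO A VALUES (?, ?)", [
    (33691, "John is"), (82451, "Susie is"), (25485, "Sam is"),
    (26582, "Lonnie is"), (52548, "Carl is")])
conn.executemany("INSERT INTO B VALUES (?, ?)", [
    (1, "admirable"), (2, "astounding"), (3, "excellent"),
    (4, "awesome"), (5, "first class")])

rows = random_pairing(conn)
for row in rows:
    print(row)
```

Each run pairs every Associate ID with a different List value, and the pairing changes from run to run.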
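Or, assuming uneven distribution, pick a random row of B for each row of A:
SELECT *
FROM A
CROSS APPLY (SELECT TOP 1 * FROM B ORDER BY NEWID()) Z
Because this subquery does not reference A, SQL Server may evaluate it only once and give every row of A the same value from B; correlating the subquery with A (for example, by referencing a column of A inside it) is a common workaround.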

This method assumes you know in advance which table is the smaller one.
First, it numbers the rows of the smaller table in ascending order from 1; this numbering does not have to be random.
Then, for each row in the larger table, it uses the modulus operator to compute a random row number in that range to join onto.
WITH Small
AS (SELECT *,
ROW_NUMBER() OVER ( ORDER BY (SELECT 0)) AS RN
FROM SmallTable),
Large
AS (SELECT *,
1 + CRYPT_GEN_RANDOM(3) % (SELECT COUNT(*) FROM SmallTable) AS RND
FROM LargeTable
ORDER BY RND
OFFSET 0 ROWS)
SELECT *
FROM Large
INNER JOIN Small
ON Small.RN = Large.RND
The ORDER BY RND OFFSET 0 ROWS is to get the random numbers materialized in advance.
This will allow a MERGE join on the smaller table. It also avoids an issue that can sometimes happen where the CRYPT_GEN_RANDOM is moved around in the plan and only evaluated once rather than once per row as required.
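A small Python/SQLite sketch of the modulus approach, with the table names `SmallTable` and `LargeTable` taken from the answer and their columns invented for the example. SQLite has neither `CRYPT_GEN_RANDOM` nor the `OFFSET 0 ROWS` trick, so this version assumes `random()` as the random source and materializes the random numbers into a temporary table first, which serves the same purpose of fixing each large row's random number before the join uses it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SmallTable (Word TEXT)")
conn.execute("CREATE TABLE LargeTable (ID INTEGER)")
conn.executemany("INSERT INTO SmallTable VALUES (?)",
                 [("admirable",), ("astounding",), ("excellent",)])
conn.executemany("INSERT INTO LargeTable VALUES (?)", [(i,) for i in range(1, 11)])

# Step 1: materialize one random number per large row, in the range 1..COUNT(small).
# random() returns a signed 64-bit integer, so the remainder is shifted into 0..n-1.
conn.execute("""
    CREATE TEMP TABLE LargeRnd AS
    SELECT ID, 1 + ((random() % c.n) + c.n) % c.n AS RND
    FROM LargeTable, (SELECT COUNT(*) AS n FROM SmallTable) AS c
""")

# Step 2: number the small table 1..n (this order does not need to be random)
# and join each large row to the small row with the matching number.
rows = conn.execute("""
    WITH Small AS (
        SELECT Word, ROW_NUMBER() OVER (ORDER BY rowid) AS RN
        FROM SmallTable)
    SELECT LargeRnd.ID, LargeRnd.RND, Small.Word
    FROM LargeRnd
    INNER JOIN Small ON Small.RN = LargeRnd.RND
    ORDER BY LargeRnd.ID
""").fetchall()
print(rows)
```

Every large row gets exactly one small row, and small rows can repeat, which is what the modulus approach is meant to allow.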

Related

T-SQL Verify each character of each value in a column against values of another table

I have a table in a database with a column whose values look like XX-xx-cccc-ff-gg. Let's assume this is table ABC and the column is called ABC_FORMAT_STR. In another table, ABC_FORMAT_ELEMENTS, I have a column called CHARS with values like A, B, C, D... X, Y, Z, a, d, f, g, x, y, z, etc. (please don't assume I have all ASCII values there; it's mainly some letters and numbers plus some special characters like *, ;, -, & etc.).
I need to add a constraint on the [ABC].[ABC_FORMAT_STR] column so that every character of every value in that column exists in [ABC_FORMAT_ELEMENTS].[CHARS].
Is this possible? Can someone help me with this?
Thank you very much in advance.
This is an example with simple names, keeping the object names from above for clarity:
Example
SELECT [ABC_FORMAT_STR] FROM [ABC]
Nick
George
Adam
SELECT [CHARS] FROM [ABC_FORMAT_ELEMENTS]
A
G
N
a
c
e
g
i
k
o
r
After the constraint:
SELECT [ABC_FORMAT_STR] FROM [ABC]
Nick
George
Note on the result:
"Adam" cannot be included because the characters "d" and "m" are not in the [ABC_FORMAT_ELEMENTS] table.
Here is a simple and natural solution based on the TRANSLATE() function.
It will work starting from SQL Server 2017 onwards.
SQL
-- DDL and sample data population, start
-- Table variables use the @ prefix (#name is for temporary tables).
DECLARE @ABC TABLE (ABC_FORMAT_STR VARCHAR(50));
INSERT INTO @ABC VALUES
('Nick'),
('George'),
('Adam');
DECLARE @ABC_FORMAT_ELEMENTS TABLE (CHARS CHAR(1));
INSERT INTO @ABC_FORMAT_ELEMENTS VALUES
('A'), ('G'), ('N'),('a'), ('c'), ('e'), ('g'),
('i'), ('k'), ('o'), ('r');
-- DDL and sample data population, end
SELECT a.*
, t1.legitChars
, t2.badChars
FROM @ABC AS a
CROSS APPLY (SELECT STRING_AGG(CHARS, '') FROM @ABC_FORMAT_ELEMENTS) AS t1(legitChars)
CROSS APPLY (SELECT TRANSLATE(a.ABC_FORMAT_STR, t1.legitChars, SPACE(LEN(t1.legitChars)))) AS t2(badChars)
WHERE TRIM(t2.badChars) = '';
Output
+----------------+-------------+----------+
| ABC_FORMAT_STR | legitChars | badChars |
+----------------+-------------+----------+
| Nick | AGNacegikor | |
| George | AGNacegikor | |
+----------------+-------------+----------+
Output with WHERE clause commented out
Just to see why the row with the 'Adam' value was filtered out.
+----------------+-------------+----------+
| ABC_FORMAT_STR | legitChars | badChars |
+----------------+-------------+----------+
| Nick | AGNacegikor | |
| George | AGNacegikor | |
| Adam | AGNacegikor | d m |
+----------------+-------------+----------+
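The same idea, removing every allowed character and checking whether anything is left over, can be sketched in Python with `str.translate`. This illustrates the logic rather than replacing the T-SQL; `bad_chars` is a hypothetical helper, and Python's character matching is always case-sensitive.

```python
def bad_chars(value: str, allowed: str) -> str:
    """Return the characters of value that are not in allowed, in order."""
    # Deleting every allowed character leaves only the offending ones,
    # much like mapping the allowed characters to spaces and trimming.
    return value.translate(str.maketrans("", "", allowed))


allowed = "".join(["A", "G", "N", "a", "c", "e", "g", "i", "k", "o", "r"])
values = ["Nick", "George", "Adam"]
valid = [v for v in values if bad_chars(v, allowed) == ""]
print(valid)                       # ['Nick', 'George']
print(bad_chars("Adam", allowed))  # dm
```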
Based on your sample data, here's one method to identify valid and invalid rows in ABC. You could easily adapt it into a trigger that checks the inserted or updated rows in the inserted pseudo-table and rolls back if any row violates the criteria.
This uses a tally (numbers) table, which is very often used for splitting strings. It defines one with a CTE, but a permanent solution would use a permanent numbers table that can be reused.
The logic is to split each string into one row per character, count the characters that exist in the lookup table, and reject any string whose count is less than its length.
with
numbers (n) as (select top 100 Row_Number() over (order by (select null)) from sys.messages ),
strings as (
select a.ABC_FORMAT_STR, Count(*) over(partition by a.ABC_FORMAT_STR) n
from abc a cross join numbers n
where n.n<=Len(a.ABC_FORMAT_STR)
and exists (select * from ABC_FORMAT_ELEMENTS e where e.chars=Substring(a.ABC_FORMAT_STR,n.n,1))
)
select ABC_FORMAT_STR
from strings
where Len(ABC_FORMAT_STR)=n
group by ABC_FORMAT_STR
/* change to where Len(ABC_FORMAT_STR) <> n to find rows that aren't allowed */
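Below is a sketch of the tally-table logic run against SQLite from Python, assuming a recursive CTE in place of the `sys.messages` trick (which only exists in SQL Server). It keeps the strings whose count of allowed characters equals their length, and groups by `rowid` as well so duplicate strings would be counted separately, a small deviation from the original. SQLite compares text case-sensitively by default, which matches the question's requirement that `a` and `A` are different characters; in SQL Server the result of `e.chars = Substring(...)` depends on the column collation, and a case-insensitive collation would treat them as equal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ABC (ABC_FORMAT_STR TEXT)")
conn.execute("CREATE TABLE ABC_FORMAT_ELEMENTS (CHARS TEXT)")
conn.executemany("INSERT INTO ABC VALUES (?)", [("Nick",), ("George",), ("Adam",)])
conn.executemany("INSERT INTO ABC_FORMAT_ELEMENTS VALUES (?)",
                 [(c,) for c in "AGNacegikor"])

# Numbers 1..100 from a recursive CTE stand in for the tally table; each string
# is split into one row per character position, and only positions whose
# character exists in ABC_FORMAT_ELEMENTS survive the EXISTS filter.
VALID_SQL = """
WITH RECURSIVE numbers(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < 100
)
SELECT a.ABC_FORMAT_STR
FROM ABC AS a
JOIN numbers AS num ON num.n <= length(a.ABC_FORMAT_STR)
WHERE EXISTS (
    SELECT 1 FROM ABC_FORMAT_ELEMENTS AS e
    WHERE e.CHARS = substr(a.ABC_FORMAT_STR, num.n, 1)
)
GROUP BY a.rowid, a.ABC_FORMAT_STR
HAVING COUNT(*) = length(a.ABC_FORMAT_STR)
ORDER BY a.rowid
"""
valid = [row[0] for row in conn.execute(VALID_SQL)]
print(valid)  # ['Nick', 'George']
```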

Finding invoices without matching credits

The simplified table looks like this:
BillID|ProductID|CustomerID|Price|TypeID
------+---------+----------+-----+-------
111111|Product1 |Customer1 | 100| I
111112|Product1 |Customer1 | -100| C
111113|Product1 |Customer1 | 100| I
111114|Product1 |Customer1 | -100| C
111115|Product1 |Customer1 | 100| I
I need to find invoices (I) that have matching credits (C), excluding the "odd" invoices without a matching credit (the last record), or the other way around: only the unmatched invoices without a corresponding credit.
So far I've got this:
SELECT Invoices.billid, Credits.billid
FROM
(SELECT B1.billid, B1.customerid, B1.productid, B1.price
FROM billing B1
WHERE B1.typeid='I') Invoices
INNER JOIN
(SELECT B2.billid, B2.customerid, B2.productid, B2.price
FROM billing B2
WHERE B2.typeid='C') Credits
ON Invoices.customerid = Credits.customerid
AND Invoices.productid = Credits.productid
AND Invoices.price = -(Credits.price)
But it obviously doesn't work, as it returns something looking like:
billid | billid2
-------+ -------
111111 | 111112
111113 | 111114
111115 | 111114
What I would like to get is a list of unmatched invoices:
billid |
-------+
111115 |
Or, alternatively, only the matching invoices:
billid | billid2
-------+ -------
111111 | 111112
111113 | 111114
The invoice numbers (BillID) will not necessarily be consecutive of course, it's just a simplified view.
Any help would be appreciated.
This should work. I tested it by adding a few consecutive invoices before a credit. The query below shows all invoices with a matching credit, and shows NULL for the aliased "bar" part of the query if no match exists.
SELECT * FROM (
SELECT
ROW_NUMBER() OVER(Partition By TypeID, CustomerID, ProductID, Price ORDER BY BillID ASC) AS rownumber,
*
FROM Billing
) AS foo
LEFT JOIN
(SELECT
ROW_NUMBER() OVER(Partition By TypeID, CustomerID, ProductID, Price ORDER BY BillID ASC) AS rownumber,
*
FROM Billing
) AS bar
on foo.CustomerID = bar.CustomerID and
foo.ProductID = bar.ProductID and
foo.rownumber = bar.rownumber and
foo.Price = -1*bar.Price
where foo.Price > 0 -- keep only invoices (positive prices) on the left side
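As a quick way to check the row-numbering idea, the following Python sketch loads the question's sample rows into SQLite (3.25+ for `ROW_NUMBER`) and runs the same query shape, using a CTE instead of the two identical derived tables and `foo.Price > 0` to keep only invoices on the left. The k-th invoice in each customer/product/price group pairs with the k-th credit of the opposite price, and an invoice without a partner shows up with NULL (`None` in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Billing (
    BillID INTEGER, ProductID TEXT, CustomerID TEXT, Price INTEGER, TypeID TEXT)""")
conn.executemany("INSERT INTO Billing VALUES (?, ?, ?, ?, ?)", [
    (111111, "Product1", "Customer1", 100, "I"),
    (111112, "Product1", "Customer1", -100, "C"),
    (111113, "Product1", "Customer1", 100, "I"),
    (111114, "Product1", "Customer1", -100, "C"),
    (111115, "Product1", "Customer1", 100, "I"),
])

# Number invoices and credits separately within each group, then pair the
# k-th invoice with the k-th credit of the opposite price.
MATCH_SQL = """
WITH numbered AS (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY TypeID, CustomerID, ProductID, Price
        ORDER BY BillID) AS rownumber
    FROM Billing
)
SELECT foo.BillID, bar.BillID
FROM numbered AS foo
LEFT JOIN numbered AS bar
    ON foo.CustomerID = bar.CustomerID
   AND foo.ProductID = bar.ProductID
   AND foo.rownumber = bar.rownumber
   AND foo.Price = -1 * bar.Price
WHERE foo.Price > 0
ORDER BY foo.BillID
"""
pairs = conn.execute(MATCH_SQL).fetchall()
matched = [p for p in pairs if p[1] is not None]
unmatched = [p[0] for p in pairs if p[1] is None]
print(matched)    # [(111111, 111112), (111113, 111114)]
print(unmatched)  # [111115]
```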
I wrote this a long time ago, so there may be better ways to solve it now. I've also attempted to adapt it to your table structure, so apologies if it's not 100% there. I also assume that your BillID is sequential in date order, i.e. larger numbers were entered later. I've also assumed that invoices are always positive and credit notes always negative, so I don't bother checking the type.
Essentially the query filters out any matched items.
Anyway here goes:
select *
from billing X
/* If we are inside the number of unmatched entries then show it. e.g. if there are 3 unmatched entries, and we are in the top 3 then display */
where (
/* Number of entries from this one onwards that match this entry on Price/Product/Customer */
select count(*)
from billing Z
where Z.CustomerID = X.CustomerID and Z.ProductID = X.ProductID
and Z.Price = X.Price
and Z.BillID >= X.BillId
) <=
(
/* Number of unmatched entries for this Price/Product/Customer there are, and whether they are negative or positive. */
select abs(Y.Number)
from (
-- Works out how many unmatched billing entries for this Price/Product/Customer there are, and whether they are negative or positive
select ProductID, CustomerID, abs(Price) Price, sum(case when Price < 0 then -1 else +1 end) Number
from billing
group by ProductID, CustomerID, abs(Price)
having sum(Price) <> 0
) as Y
where X.ProductID = Y.ProductID
and X.CustomerID = Y.CustomerID
and X.Price = case when Y.Number < 0 then -1*Y.Price else Y.Price end
)
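A Python/SQLite run of the count-based query on the question's sample data, assuming the column names `CustomerID` and `Price` throughout. The derived column is renamed `AbsPrice` here only to keep it visually distinct from the base `Price` column. Only the latest invoice that has no credit to offset it should survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE billing (
    BillID INTEGER, ProductID TEXT, CustomerID TEXT, Price INTEGER, TypeID TEXT)""")
conn.executemany("INSERT INTO billing VALUES (?, ?, ?, ?, ?)", [
    (111111, "Product1", "Customer1", 100, "I"),
    (111112, "Product1", "Customer1", -100, "C"),
    (111113, "Product1", "Customer1", 100, "I"),
    (111114, "Product1", "Customer1", -100, "C"),
    (111115, "Product1", "Customer1", 100, "I"),
])

UNMATCHED_SQL = """
SELECT X.BillID
FROM billing AS X
WHERE (
    -- entries from this one onwards with the same customer/product/price
    SELECT COUNT(*) FROM billing AS Z
    WHERE Z.CustomerID = X.CustomerID AND Z.ProductID = X.ProductID
      AND Z.Price = X.Price AND Z.BillID >= X.BillID
) <= (
    -- net number of unmatched entries for this customer/product/|price|
    SELECT abs(Y.Number)
    FROM (
        SELECT ProductID, CustomerID, abs(Price) AS AbsPrice,
               SUM(CASE WHEN Price < 0 THEN -1 ELSE 1 END) AS Number
        FROM billing
        GROUP BY ProductID, CustomerID, abs(Price)
        HAVING SUM(Price) <> 0
    ) AS Y
    WHERE X.ProductID = Y.ProductID AND X.CustomerID = Y.CustomerID
      AND X.Price = CASE WHEN Y.Number < 0 THEN -Y.AbsPrice ELSE Y.AbsPrice END
)
ORDER BY X.BillID
"""
unmatched = [row[0] for row in conn.execute(UNMATCHED_SQL)]
print(unmatched)  # [111115]
```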
The odd/even pattern concerns me a bit. But assuming BillID is an incrementing key and your business logic guarantees that each credit immediately follows its invoice, try including this logic in the WHERE clause or the join predicate, or implementing it with a LEAD/LAG function.
SELECT DISTINCT
Invoices.billid
,Credits.billid
FROM
(SELECT B1.billid, B1.customerid, B1.productid, B1.price
FROM billing B1
WHERE B1.typeid='I') Invoices
INNER JOIN (SELECT B2.billid, B2.customerid, B2.productid, B2.price
FROM billing B2
WHERE B2.typeid='C') Credits
ON Invoices.customerid = Credits.customerid
AND Invoices.productid = Credits.productid
AND Invoices.price = -(Credits.price)
AND (Invoices.Billid + 1) = Credits.Billid
Note: This is using your INNER JOIN, so we will get the cases where the invoices have a corresponding credit. You could also do a FULL OUTER JOIN instead, then include a WHERE CLAUSE that specifies WHERE Invoices.Billid IS NULL OR Credits.Billid IS NULL. That scenario would give you the trailing case where you don't have a match.

SQL GROUP BY with columns which contain mirrored values

Sorry for the bad title. I couldn't think of a better way to describe my issue.
I have the following table:
Category | A | B
A | 1 | 2
A | 2 | 1
B | 3 | 4
B | 4 | 3
I would like to group the data by Category, return only 1 line per category, but provide both values of columns A and B.
So the result should look like this:
category | resultA | resultB
A | 1 | 2
B | 4 | 3
How can this be achieved?
I tried this statement:
SELECT category, a, b
FROM table
GROUP BY category
but obviously, I get the following errors:
Column 'a' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Column 'b' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
How can I achieve the desired result?
Try this:
SELECT category, MIN(a) AS resultA, MAX(a) AS resultB
FROM table
GROUP BY category
If the values are mirrored then you can get both values using MIN, MAX applied on a single column like a.
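A minimal check of the MIN/MAX idea in SQLite via Python, with a hypothetical table name `mytable`. Because both values come from column `a`, each pair comes back in ascending order, so category B returns 3, 4 rather than the 4, 3 shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Category TEXT, A INTEGER, B INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                 [("A", 1, 2), ("A", 2, 1), ("B", 3, 4), ("B", 4, 3)])

# One row per category; the mirrored pair is recovered from column A alone.
rows = conn.execute("""
    SELECT Category, MIN(A) AS resultA, MAX(A) AS resultB
    FROM mytable
    GROUP BY Category
    ORDER BY Category
""").fetchall()
print(rows)  # [('A', 1, 2), ('B', 3, 4)]
```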
It seems you don't really want to aggregate per category, but rather to remove duplicate rows from your result (or rather, rows that you consider duplicates).
You consider a pair (x,y) equal to the pair (y,x). To find duplicates, you can put the lower value in the first place and the greater in the second and then apply DISTINCT on the rows:
select distinct
category,
case when a < b then a else b end as attr1,
case when a < b then b else a end as attr2
from mytable;
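The DISTINCT-based normalization can be tried the same way. This sketch assumes the same hypothetical `mytable` in SQLite and runs the answer's query unchanged apart from the table setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Category TEXT, A INTEGER, B INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                 [("A", 1, 2), ("A", 2, 1), ("B", 3, 4), ("B", 4, 3)])

# Put the smaller value first and the larger second, so (x, y) and (y, x)
# become identical rows that DISTINCT can collapse.
rows = conn.execute("""
    SELECT DISTINCT
        Category,
        CASE WHEN A < B THEN A ELSE B END AS attr1,
        CASE WHEN A < B THEN B ELSE A END AS attr2
    FROM mytable
    ORDER BY Category
""").fetchall()
print(rows)  # [('A', 1, 2), ('B', 3, 4)]
```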
Assuming you want one arbitrary record from each set of duplicates per category, here is one trick using a table value constructor and the ROW_NUMBER window function:
;with cte as
(
SELECT *,
(SELECT Min(min_val) FROM (VALUES (a),(b))tc(min_val)) min_val,
(SELECT Max(max_val) FROM (VALUES (a),(b))tc(max_val)) max_val
FROM (VALUES ('A',1,2),
('A',2,1),
('B',3,4),
('B',4,3)) tc(Category, A, B)
)
select Category,A,B from
(
Select Row_Number()Over(Partition by category,min_val,max_val order by (select NULL)) as Rn,*
From cte
) A
Where Rn = 1
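A SQLite/Python sketch of the ROW_NUMBER approach (window functions need SQLite 3.25+). As assumptions of this port, the smaller and larger values are computed with CASE expressions instead of the table value constructor, and `ORDER BY random()` replaces `ORDER BY (SELECT NULL)` so the choice between mirrored rows is actually random rather than merely arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Category TEXT, A INTEGER, B INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                 [("A", 1, 2), ("A", 2, 1), ("B", 3, 4), ("B", 4, 3)])

rows = conn.execute("""
    SELECT Category, A, B
    FROM (
        SELECT Category, A, B,
               ROW_NUMBER() OVER (
                   -- partition on the normalized pair: smaller value, larger value
                   PARTITION BY Category,
                                CASE WHEN A < B THEN A ELSE B END,
                                CASE WHEN A < B THEN B ELSE A END
                   ORDER BY random()) AS Rn
        FROM mytable
    )
    WHERE Rn = 1
    ORDER BY Category
""").fetchall()
# One (Category, A, B) row per category; which mirrored row is kept varies per run.
print(rows)
```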

Find one record that exists as two records in another vendor database

I have two vendor databases that have become horribly out-of-sync over the years that I'm trying to correct. A single customer can have multiple id_numbers, and these IDs exist in both vendor databases. All of the IDs for a single customer are correctly attached to one customer record in the Vendor1 database (meaning they belong to the same customer_code). The problem, however, is that those same IDs might be split amongst multiple customers in the Vendor2 database, which is incorrect. I will need to merge those multiple customers together in the Vendor2 database.
I'm trying to identify which customers are represented as two or more customers in the second vendor database. So far I have joined the two together, but I can't figure out how to find only the customers that have two or more distinct MemberInternalKeys for the same customer_code.
Here's what I have so far:
select top 10
c.customer_code,
i.id_number,
cc.MemberInternalKey
from Vendor1.dbo.customer_info as c
join Vendor1.dbo.customer_ids as i
on c.customer_code = i.customer_code
join Vendor2.dbo.Clubcard as cc
on (i.id_number collate Latin1_General_CI_AS_KS) = cc.ClubCardId
where i.id_code = 'PS'
In the example below, I would expect to only get back the last two rows in the table. The first two rows should not be included in the results because they have the same MemberInternalKey for both records and belong to the same customer_code. The third row should also not be included since there is a 1-1 match between both vendor databases.
customer_code | id_number | MemberInternalKey
--------------|-----------|------------------
5549032 | 4000 | 4926877
5549032 | 4001 | 4926877
5031101 | 4007 | 2379218
2831779 | 4029 | 1763760
2831779 | 4062 | 4950922
Any help is greatly appreciated.
If I understand correctly, you can use window functions for this logic:
select c.*
from (select c.customer_code, i.id_number, cc.MemberInternalKey,
min(cc.MemberInternalKey) over (partition by c.customer_code) as minmik,
max(cc.MemberInternalKey) over (partition by c.customer_code) as maxmik
from Vendor1.dbo.customer_info c join
Vendor1.dbo.customer_ids i
on c.customer_code = i.customer_code join
Vendor2.dbo.Clubcard as cc
on (i.id_number collate Latin1_General_CI_AS_KS) = cc.ClubCardId
where i.id_code = 'PS'
) c
where minmik <> maxmik;
This calculates the minimum and maximum MemberInternalKey for each customer_code. The outer where then returns only rows where these are different.
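To try the window-function filter in isolation, the sketch below replaces the three-table join with a single hypothetical table `joined` holding the sample rows from the question, and runs the same MIN/MAX OVER (PARTITION BY ...) comparison in SQLite (3.25+) from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the customer_info / customer_ids / Clubcard join.
conn.execute("CREATE TABLE joined "
             "(customer_code INTEGER, id_number INTEGER, MemberInternalKey INTEGER)")
conn.executemany("INSERT INTO joined VALUES (?, ?, ?)", [
    (5549032, 4000, 4926877),
    (5549032, 4001, 4926877),
    (5031101, 4007, 2379218),
    (2831779, 4029, 1763760),
    (2831779, 4062, 4950922),
])

rows = conn.execute("""
    SELECT customer_code, id_number, MemberInternalKey
    FROM (
        SELECT j.*,
               MIN(MemberInternalKey) OVER (PARTITION BY customer_code) AS minmik,
               MAX(MemberInternalKey) OVER (PARTITION BY customer_code) AS maxmik
        FROM joined AS j
    )
    -- keep customers whose IDs map to more than one MemberInternalKey
    WHERE minmik <> maxmik
    ORDER BY customer_code, id_number
""").fetchall()
print(rows)  # [(2831779, 4029, 1763760), (2831779, 4062, 4950922)]
```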
Another option is
Declare @YourTable table (customer_code int, id_number int, MemberInternalKey int)
Insert Into @YourTable values
(5549032,4000,4926877),
(5549032,4001,4926877),
(5031101,4007,2379218),
(2831779,4029,1763760),
(2831779,4062,4950922)
Select A.*
From @YourTable A
Join (
Select customer_code
From @YourTable
Group By customer_code
Having min(MemberInternalKey)<>max(MemberInternalKey)
) B on A.customer_code=B.customer_code
Returns
customer_code id_number MemberInternalKey
2831779 4029 1763760
2831779 4062 4950922

SQL Server : Group by breaks my program

I have the following query. The idea is to inner join the records and group them in order to get one record (the latest one) from each group.
If I add the GROUP BY (as in the example below), it doesn't work.
If I remove the GROUP BY, the query works but displays duplicated data.
If I group by all the fields that I selected before the inner join, it works but not as intended: it displays all records.
Any suggestions?
SELECT
Calibrations.Cert_No,
Calibrations.Cust_Ref,
Calibrations.Rec_Date,
Instruments.Inst_ID,
Instruments.Description,
Instruments.Model_no,
Instruments.Manufacturer,
Instruments.Serial_no,
Instruments.Status,
Instruments.Cust_Acc_No
FROM
Instruments
INNER JOIN
Calibrations ON Instruments.Inst_ID = Calibrations.Inst_ID
WHERE
Instruments.Cust_Name = '" & Session("MM_Username") & "'
AND Instruments.Cust_Acc_No = '" & Session("MM_Password") & "'
AND Instruments.Cust_Acc_No = '" & Replace(rsDue__MMColParam, "'", "''") & "'
AND Instruments.Status IN ('N')
GROUP BY
Instruments.Inst_ID
ORDER BY
Calibrations.Rec_Date DESC
You cannot have columns in the SELECT part of your query that do not appear in the GROUP BY part of the query, unless they are inside an aggregate function such as MIN(), MAX(), SUM(), etc.
Think about it this way: Say you have a table that looks like this:
+----------+------+--------+
| Col1 | Col2 | NumCol |
+----------+------+--------+
| Value 1a | ABC | 123 |
| Value 1a | DEF | 234 |
| Value 1b | GHI | 345 |
| Value 1b | JKL | 456 |
+----------+------+--------+
This query would not work:
SELECT Col1, Col2, NumCol FROM Table
GROUP BY Col1 ORDER BY NumCol
Why? Because you are only grouping by Col1, and since this column only contains two distinct values, the query engine doesn't know which of the values it should display in the Col2 or NumCol columns (since these contain 4 distinct values).
To fix this, you should either remove the columns from your SELECT statement like this:
SELECT Col1 FROM Table
GROUP BY Col1
...or aggregate the columns somehow. For example like this:
SELECT Col1, MAX(Col2) AS Col2, SUM(NumCol) AS NumCol FROM Table
GROUP BY Col1 ORDER BY NumCol
However, this is not the same as getting the "latest record", or for example the record with the largest NumCol for each distinct value of Col1. To do that, you should consider using the ROW_NUMBER() windowed function like this:
SELECT Col1, Col2, NumCol FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY NumCol DESC) AS N
FROM Table
) AS T
WHERE T.N = 1
How this works is a topic of its own, but basically, ROW_NUMBER assigns a running value to each row, resetting the value each time it encounters a new value in Col1. The ordering makes sure that the running value starts with 1 for the record that has the largest NumCol value. In the outer SELECT statement, you then filter on this running value to get only the first record for each distinct Col1 value, that is, the record with the largest NumCol value.
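The latest-record-per-group pattern can be checked against the sample table from this answer. Here it runs in SQLite (3.25+) from Python, with the hypothetical table name `sample_table` because `Table` is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_table (Col1 TEXT, Col2 TEXT, NumCol INTEGER)")
conn.executemany("INSERT INTO sample_table VALUES (?, ?, ?)", [
    ("Value 1a", "ABC", 123),
    ("Value 1a", "DEF", 234),
    ("Value 1b", "GHI", 345),
    ("Value 1b", "JKL", 456),
])

# Number the rows within each Col1 group, largest NumCol first, and keep row 1.
rows = conn.execute("""
    SELECT Col1, Col2, NumCol FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY NumCol DESC) AS N
        FROM sample_table
    ) AS T
    WHERE T.N = 1
    ORDER BY Col1
""").fetchall()
print(rows)  # [('Value 1a', 'DEF', 234), ('Value 1b', 'JKL', 456)]
```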
When you are grouping in a SQL query, every selected column must either be listed in the GROUP BY clause or be wrapped in an aggregate function; a column that is neither aggregated nor in the GROUP BY list is not allowed.
You did not provide any information about your specific goal, but you can either get the values with aggregate functions (MIN, MAX, AVG, etc.), use one subquery to retrieve the distinct list and another to retrieve the specific data for each item, or use analytic functions (FIRST_VALUE, LAST_VALUE, etc.) together with DISTINCT.
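As a sketch of the analytic-function option: FIRST_VALUE over each Col1 partition, ordered by NumCol descending and combined with DISTINCT, returns one row per group. This assumes the same sample data as the previous answer, a hypothetical `sample_table` name, and SQLite 3.25+ for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_table (Col1 TEXT, Col2 TEXT, NumCol INTEGER)")
conn.executemany("INSERT INTO sample_table VALUES (?, ?, ?)", [
    ("Value 1a", "ABC", 123),
    ("Value 1a", "DEF", 234),
    ("Value 1b", "GHI", 345),
    ("Value 1b", "JKL", 456),
])

# Every row in a Col1 group gets the same FIRST_VALUE results, so DISTINCT
# collapses each group to a single row.
rows = conn.execute("""
    SELECT DISTINCT
        Col1,
        FIRST_VALUE(Col2) OVER (PARTITION BY Col1 ORDER BY NumCol DESC) AS LatestCol2,
        FIRST_VALUE(NumCol) OVER (PARTITION BY Col1 ORDER BY NumCol DESC) AS LatestNumCol
    FROM sample_table
    ORDER BY Col1
""").fetchall()
print(rows)  # [('Value 1a', 'DEF', 234), ('Value 1b', 'JKL', 456)]
```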
