Full Outer Join Mismatching - sql-server

I'm new to this type of request in Microsoft SQL Server. I'm using a full outer join on two tables, looking for records in the right table (New) that do not match the left table (Old). I'm trying to find the new scores from the new table so I can update a production table with the most recent scores, while still holding onto old scores that have not been updated yet. This is my setup:
Select
    a.customer
    ,a.date
    ,a.[Cat score]
    ,a.[Dog score]
    ,a.[Mouse score]
    ,b.customer
    ,b.date
    ,b.[Cat score]
    ,b.[Dog score]
    ,b.[Mouse score]
From [Old Table] a
Full Outer Join [New Table] b
    ON a.customer = b.customer
    AND a.date = b.date
    AND a.[Cat score] = Cast(b.[Cat score] as Varchar)
    AND a.[Dog score] = Cast(b.[Dog score] as Varchar)
    AND a.[Mouse score] = Cast(b.[Mouse score] as Varchar)
Note: I have to cast the scores as Varchar or else I could not get the join to work: "Conversion failed when converting the varchar value '9.0000' to data type int."
Results:
Both lists are 100% different without any matches
This can't be true because I can search the records in both tables manually and find the exact same result in both tables. Maybe there is a better way to do this type of update?

Your problem is that the strings '9' and '9.0000' are not equal, so they do not join. These table variables will be used to demo this:
DECLARE @TableA TABLE
(
    CatScore INT
);
DECLARE @TableB TABLE
(
    CatScore VARCHAR(10)
);
INSERT INTO @TableA (CatScore) VALUES (9);
INSERT INTO @TableB (CatScore) VALUES ('9.0000');
The first example highlights the mismatch.
Mismatching Join Example
SELECT
    *
FROM
    @TableA AS a
    FULL OUTER JOIN @TableB AS b ON b.CatScore = CAST(a.CatScore AS VARCHAR(50))
Returned Value
CatScore CatScore
9 NULL
NULL 9.0000
What you need to do is match the data types and then the values. This example assumes:
Table A stores the cat score as an integer.
Table B stores the same as a varchar.
Table B always includes 4 zeros after a full stop.
Matching Example
SELECT
    *
FROM
    @TableA AS a
    FULL OUTER JOIN @TableB AS b ON b.CatScore = CAST(a.CatScore AS VARCHAR(50)) + '.0000'
Returns
CatScore CatScore
9 9.0000
Here the integer 9 has been cast into a varchar. The full stop and trailing zeros have then been added. It's not a decimal place, as these aren't really numbers.
The lesson to take away from this exercise is: always use the correct data type. Storing numbers in strings will cause problems further down the line.
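To see one of those problems (this snippet is mine, not part of the original answer): string comparison works character by character, so numbers stored as strings sort and compare in surprising ways:
-- Hypothetical illustration: '10' compares as less than '9' when both are strings,
-- because the comparison looks at '1' vs '9' first.
SELECT CASE WHEN '10' < '9' THEN 'strings: ''10'' < ''9''' ELSE 'strings compare numerically' END AS StringComparison,
       CASE WHEN 10 < 9 THEN 'numbers: 10 < 9' ELSE 'numbers: 10 >= 9' END AS NumberComparison;
-- Returns: strings: '10' < '9'    numbers: 10 >= 9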
UPDATE
It would make more sense to CAST both fields into a DECIMAL. Both integers and varchars containing numeric-like data can be converted into decimals. When casting fields for matching, you want to find the smallest data type that will hold all input from both source fields.
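A minimal sketch of that approach against the demo table variables above (the DECIMAL(10,4) precision is my assumption; TRY_CAST needs SQL Server 2012 or later and returns NULL for values that cannot be converted):
SELECT
    *
FROM
    @TableA AS a
    FULL OUTER JOIN @TableB AS b ON CAST(a.CatScore AS DECIMAL(10,4)) = TRY_CAST(b.CatScore AS DECIMAL(10,4))
-- Returns: 9    9.0000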

Related

Why is TRY_PARSE so slow?

I have this query that basically returns (right now) only 10 rows as results:
select *
FROM Table1 as o
inner join Table2 as t on t.Field1 = o.Field2
where Code = 123456 and t.FakeData is not null
Now, if I want to parse the field FakeData (which, unfortunately, can contain different types of data, from DateTime values to surnames, etc.; it is an nvarchar(70)) for display and/or filtering:
select *, TRY_PARSE(t.FakeData as date USING 'en-GB') as RealDate
FROM Table1 as o
inner join Table2 as t on t.Field1 = o.Field2
where Code = 123456 and t.FakeData is not null
The query then takes about 10x longer to execute.
Where am I going wrong? How can I speed it up?
I can't edit the database; I'm just a customer who reads the data.
The T-SQL documentation for TRY_PARSE makes the following observation:
Keep in mind that there is a certain performance overhead in parsing the string value.
NB: I am assuming your typical date format would be dd/mm/yyyy.
The following is something of a shot in the dark that might help. By progressively assessing whether the nvarchar column is a candidate date, it is possible to reduce the number of calls to that function. Note that a data point established in one APPLY can then be referenced in a subsequent APPLY:
CREATE TABLE mytable(
FakeData NVARCHAR(60) NOT NULL
);
INSERT INTO mytable(FakeData) VALUES (N'oiwsuhd ouhw dcouhw oduch woidhc owihdc oiwhd cowihc');
INSERT INTO mytable(FakeData) VALUES (N'9603200-0297r2-0--824');
INSERT INTO mytable(FakeData) VALUES (N'12/03/1967');
INSERT INTO mytable(FakeData) VALUES (N'12/3/2012');
INSERT INTO mytable(FakeData) VALUES (N'3/3/1812');
INSERT INTO mytable(FakeData) VALUES (N'ohsw dciuh iuh pswiuh piwsuh cpiuwhs dcpiuhws ipdcu wsiu');
select
    t.FakeData, oa3.RealDate
from mytable as t
outer apply (
    select len(FakeData) as fd_len
) oa1
outer apply (
    select case when oa1.fd_len > 10 then 0
                when len(replace(FakeData,'/','')) + 2 = oa1.fd_len then 1
                else 0
           end as is_candidate
) oa2
outer apply (
    select case when oa2.is_candidate = 1 then TRY_PARSE(t.FakeData as date USING 'en-GB') end as RealDate
) oa3
FakeData                                                  RealDate
oiwsuhd ouhw dcouhw oduch woidhc owihdc oiwhd cowihc      null
9603200-0297r2-0--824                                     null
12/03/1967                                                1967-03-12
12/3/2012                                                 2012-03-12
3/3/1812                                                  1812-03-03
ohsw dciuh iuh pswiuh piwsuh cpiuwhs dcpiuhws ipdcu wsiu  null
db<>fiddle here
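If that is still too slow, another option worth benchmarking (my suggestion, not something from the original answer) is TRY_CONVERT with the explicit British date style, since TRY_PARSE relies on the .NET runtime while TRY_CONVERT does not:
select
    t.FakeData,
    TRY_CONVERT(date, t.FakeData, 103) as RealDate  -- style 103 = dd/mm/yyyy; non-dates come back as NULL
from mytable as t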

SQL- use an attribute to group activities and use the group as parameter

I have a table that looks like this:
ActivityID  Time Used  Activity Type  Activity Category ID  Activity Category
123456      30         A              1                     X
765432      120        B              2                     Y
876462      65         C              3                     Z
h52635      76         D              3                     Z
hsgs62      187        E              1                     X
I would like to use the Activity Category as a parameter (@ActivityCategory) to filter my report later, meaning the filter should be X;Y;Z.
When I choose one Activity Category, the sum of "Time used" should appear.
My question is: how should I build the query so that I can group the activities with the same Activity Category together and use the categories X, Y, and Z as a parameter?
Something like this perhaps:
-- Sample data
DECLARE @table TABLE (ActivityId INT, TimeUsed INT, ActivityCategory CHAR(1));
INSERT @table VALUES(123,20,'X'), (129,50,'Y'), (254,30,'Y'), (991,10,'Z');
-- Parameter
DECLARE @ActivityCategory VARCHAR(100) = 'X,Y';
SELECT t.ActivityCategory, TimeUsed = SUM(t.TimeUsed)
FROM @table AS t
CROSS APPLY STRING_SPLIT(@ActivityCategory,',') AS s -- You will need a string splitter function if STRING_SPLIT (SQL Server 2016+) is not available
WHERE t.ActivityCategory = s.value
GROUP BY t.ActivityCategory;
Returns:
ActivityCategory TimeUsed
---------------- -----------
X 20
Y 80
Alan's answer is good, but I'd personally use a temp table and a join for performance reasons. The table being queried might be very large, in which case a join to a temp table would be more performant than CROSS APPLY.
The easiest way to pass multi-value parameters into and out of your query is a comma-separated list. Indeed, if you are using Report Server / SSRS, that is how the "Multiple Value" box in the user interface will deliver the users' selections into a varchar parameter.
--Declare and set parameter
DECLARE @ActivityCategories varchar(MAX)
SET @ActivityCategories = 'X,Y,Z'
--Convert individual parameter values to a temp table
DROP TABLE IF EXISTS #ParameterValues
CREATE TABLE #ParameterValues (ActivityCategory varchar(10) NOT NULL PRIMARY KEY CLUSTERED)
INSERT INTO #ParameterValues WITH(TABLOCK)
SELECT value
FROM STRING_SPLIT(@ActivityCategories,',')
GROUP BY value
ORDER BY value
--Join on temp table to filter by parameter values
SELECT a.ActivityID,
    a.TimeUsed,
    a.ActivityType,
    a.ActivityCategoryID,
    a.ActivityCategory
FROM dbo.YourTable a
INNER JOIN #ParameterValues b ON a.ActivityCategory = b.ActivityCategory

Error converting data type varchar to numeric on join clause

Please help with this query. I am getting an
"error converting data type varchar to numeric" on my join clause.
select
MARKETING
,HOME_TEL
,BUS_TEL
,CEL_TEL
,EMAIL
,FAX
,VALID_MAIL
,VALID_PHONE
,VALID_SMS
,VALID_EMAIL
into [storagedb - baw].dbo.Geyser_Glynis
from [storagedb - Mariana].dbo.HOC_Geyser_v2 as a
left join [IIIDB].[dbo].[EEE_BASE_201901] as b
on a.gcustomer_Number = b.Dedupe_Static
Execute these two queries to find the values that you are trying to join together but that can't be cast to numeric:
SELECT
T.gcustomer_Number
FROM
[storagedb - Mariana].dbo.HOC_Geyser_v2 AS T
WHERE
TRY_CAST(T.gcustomer_Number AS NUMERIC) IS NULL
SELECT
T.Dedupe_Static
FROM
[IIIDB].[dbo].[EEE_BASE_201901] AS T
WHERE
TRY_CAST(T.Dedupe_Static AS NUMERIC) IS NULL
You will have to delete or update those values so they can be converted correctly to a number, or, if one of the columns legitimately holds non-numeric values, cast the numeric column to VARCHAR instead, for example:
from
[storagedb - Mariana].dbo.HOC_Geyser_v2 as a
left join [IIIDB].[dbo].[EEE_BASE_201901] as b on
CONVERT(VARCHAR(100), a.gcustomer_Number) = b.Dedupe_Static
Please note that applying conversions or functions to indexed columns will make the index unusable and will most likely result in a full table scan.
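If the indexed column happens to be the numeric one (a.gcustomer_Number here; which column is actually indexed is an assumption on my part), a hedged alternative is to convert the other side with TRY_CAST so the indexed column keeps its native type. The NUMERIC(18,0) precision is also an assumption and would need to match the real column:
from [storagedb - Mariana].dbo.HOC_Geyser_v2 as a
left join [IIIDB].[dbo].[EEE_BASE_201901] as b
on a.gcustomer_Number = TRY_CAST(b.Dedupe_Static AS NUMERIC(18,0)) -- non-numeric Dedupe_Static values become NULL and simply fail to match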

How do I compare two rows from a SQL database table based on DateTime within 3 seconds?

I have a table of DetailRecords containing records that seem to be "duplicates" of other records, but they have a unique primary key [ID]. I would like to delete these "duplicates" from the DetailRecords table and keep the record with the longest/highest Duration. I can tell that they are linked records because their DateTime field is within 3 seconds of another row's DateTime field and the Duration is within 2 seconds of one another. Other data in the row will also be duplicated exactly, such as Number, Rate, or AccountID, but this could be the same for the data that is not "duplicate" or related.
CREATE TABLE #DetailRecords (
[AccountID] INT NOT NULL,
[ID] VARCHAR(100) NULL,
[DateTime] VARCHAR(100) NULL,
[Duration] INT NULL,
[Number] VARCHAR(200) NULL,
[Rate] DECIMAL(8,6) NULL
);
I know that I will most likely have to perform a self join on the table, but how can I find two rows that are similar within a DateTime range of plus or minus 3 seconds, instead of just exactly the same?
I am having the same trouble with the Duration within a range of plus or minus 2 seconds.
The key is taking the absolute value of the difference between the dates and durations. I don't know SQL Server, but here's how I'd do it in SQLite. The technique should be the same; only the specific function names will be different.
SELECT a.id, b.id
FROM DetailRecords a
JOIN DetailRecords b
    ON a.id > b.id
WHERE abs(strftime('%s', a.DateTime) - strftime('%s', b.DateTime)) <= 3
    AND abs(a.duration - b.duration) <= 2
Taking the absolute value of the difference covers the "plus or minus" part of the range. The self join is on a.id > b.id because a.id = b.id would duplicate every pair.
Given the entries...
ID|DateTime |Duration
1 |2014-01-26T12:00:00|5
2 |2014-01-26T12:00:01|6
3 |2014-01-26T12:00:06|6
4 |2014-01-26T12:00:03|11
5 |2014-01-26T12:00:02|10
6 |2014-01-26T12:00:01|6
I get the pairs...
5|4
2|1
6|1
6|2
And you should really store those dates as DateTime types if you can.
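Since the question is about SQL Server, a rough T-SQL translation of the same idea might look like the sketch below (my translation, not the original answer's code; it assumes the varchar DateTime values can be converted to datetime):
SELECT a.ID, b.ID
FROM DetailRecords AS a
JOIN DetailRecords AS b
    ON a.ID > b.ID
WHERE ABS(DATEDIFF(SECOND, CAST(a.[DateTime] AS DATETIME), CAST(b.[DateTime] AS DATETIME))) <= 3
    AND ABS(a.Duration - b.Duration) <= 2;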
You could use a self-referential CTE and compare the DateTime fields.
;WITH CTE AS (
SELECT AccountID,
ID,
DateTime,
rn = ROW_NUMBER() OVER (PARTITION BY AccountID, ID, <insert any other matching keys> ORDER BY AccountID)
FROM table
)
SELECT earliestAccountID = c1.AccountID,
earliestDateTime = c1.DateTime,
recentDateTime = c2.DateTime,
recentAccountID = c2.AccountID
FROM cte c1
INNER JOIN cte c2
ON c1.rn = 1 AND c2.rn = 2 AND c1.DateTime <> c2.DateTime
Edit
I made several assumptions about the data set, so this may not be as relevant as you need. If you're simply looking for difference between possible duplicates, specifically DateTime differences, this will work. However, this does not constrain to your date range, nor does it automatically assume what the DateTime column is used for or how it is set.

Maintain ordering of characters if there is no id (SQL Server 2005)

I have the following
Chars
A
C
W
B
J
M
How can I assign some sequential numbers so that, after the numbers are inserted, the order of the characters does not change?
I mean, if I use ROW_NUMBER(), the character order in the output changes, like this:
select
ROW_NUMBER() over(order by chars) as id,
t.* from #t t
Output:
id chars
1 A
2 B
3 C
4 J
5 M
6 W
My desired expectation is
id chars
1 A
2 C
3 W
4 B
5 J
6 M
Also, I cannot use an identity field like id int identity because I am in the middle of a query and I need to maintain an inner join to achieve something.
I hope I do make myself clear.
Please help.
Thanks in advance
There is no implicit ordering of rows in SQL. If some ordering is desired, be it the order in which items were inserted or any other order, it must be supported by a user-defined column.
In other words, the SQL standard doesn't require the SQL implementations to maintain any order. On the other hand the ORDER BY clause in a SELECT statement can be used to specify the desired order, but such ordering is supported by the values in a particular (again, user defined) column.
This user-defined column may well be an auto-incremented column to which SQL assigns incremental (or otherwise generated) values, and this may be what you need.
Maybe something like...
CREATE TABLE myTable
(
InsertID smallint IDENTITY(1,1),
OneChar CHAR(1),
SomeOtherField VARCHAR(20)
-- ... etc.
)
INSERT INTO myTable (OneChar, SomeOtherField) VALUES ('A', 'Alpha')
INSERT INTO myTable (OneChar, SomeOtherField) VALUES ('W', 'Whiskey')
INSERT INTO myTable (OneChar, SomeOtherField) VALUES ('B', 'Bravo')
-- ... etc.
SELECT OneChar
FROM myTable
ORDER BY InsertId
'A'
'W'
'B'
--...
