Comparing SQL tables

Comparing SQL tables - sql-server

I am still new to SQL. I have two tables in SQL Server and I would like to check whether one specific column in table 1 equals the corresponding column in table 2. I have had some success, but I would also like to see the rows from table 1 that have no match in table 2 (e.g. returned with NULL values). The example code below may help to explain what I mean:
select tb1.models, tb1.year, tb1.series, tb2.model, tb2.price
from tb1, tb2
where tb1.year = '2014' and tb1.models = tb2.model
and here is where I have tried all kinds of combinations, such as <>, but unfortunately haven't found a solution. Table 1 holds a certain number of models, and table 2 holds a much larger list that sometimes does not include the ones from table 1. I therefore want to see exactly which ones do not match, so I can check and analyse them.
The example above returns only the rows that match. I can see, for example, that there are 30 more models in table 1 that are not in table 2, but I have no visibility into which ones exactly.
Thank you in advance!

Btw: Do not use '2014' if this value (and the column tb1.year) is numeric (probably INT). Use tb1.year=2014 instead. Implicit casts are expensive and can have various side effects...
This sounds like a plain join:
select tb1.models
, tb1.year
, tb1.series
, tb2.model
, tb2.price
from tb1
INNER JOIN tb2 ON tb1.models = tb2.model
where tb1.year = '2014'
But your models vs. model might point to trouble with non-normalized data... If this does not help, please provide sample data and expected output!
UPDATE
Use LEFT JOIN to get all rows from tb1 (rows without a corresponding row in tb2 get NULLs).
Use RIGHT JOIN for the opposite.
Use FULL OUTER JOIN to return all rows of both tables, with NULLs on either side where there is no corresponding row.
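As a rough illustration of the LEFT JOIN suggestion, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. The sample rows are invented, and the tb1/tb2 column types are assumptions; the SELECT statements themselves should carry over to T-SQL unchanged.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; all sample rows are made up.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tb1 (models TEXT, year INTEGER, series TEXT);
    CREATE TABLE tb2 (model TEXT, price INTEGER);
    INSERT INTO tb1 VALUES ('A100', 2014, 'S1'), ('B200', 2014, 'S2'), ('C300', 2013, 'S3');
    INSERT INTO tb2 VALUES ('A100', 500), ('Z900', 900);
""")

# LEFT JOIN keeps every 2014 row of tb1; tb2 columns are NULL when no model matches.
all_rows = con.execute("""
    SELECT tb1.models, tb1.year, tb1.series, tb2.model, tb2.price
    FROM tb1
    LEFT JOIN tb2 ON tb1.models = tb2.model
    WHERE tb1.year = 2014
    ORDER BY tb1.models
""").fetchall()

# Adding "tb2.model IS NULL" narrows the result to the tb1 models missing from tb2.
missing = con.execute("""
    SELECT tb1.models
    FROM tb1
    LEFT JOIN tb2 ON tb1.models = tb2.model
    WHERE tb1.year = 2014 AND tb2.model IS NULL
""").fetchall()

print(all_rows)
print(missing)
```

The second query is probably the one the question is after: it lists only the models from table 1 that have no counterpart in table 2.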

Related

Using subquery for filtering with IN throws wrong data [duplicate]

As always, there will be a reasonable explanation for my surprise, but till then....
I have this query
delete from Photo where hs_id in (select hs_id from HotelSupplier where id = 142)
which executes just fine (later I found out that the entire Photo table was empty)
but the strange thing: there is no field hs_id in HotelSupplier, it is called hs_key!
So when i execute the last part
select hs_id from HotelSupplier where id = 142
separately (select that part of the query with the mouse and hit F5), I get an error, but when I use it in the IN clause, I don't!
I wonder if this is normal behaviour?

It is taking the value of hs_id from the outer query.
It is perfectly valid to have a query that doesn't project any columns from the selected table in its select list.
For example
select 10 from HotelSupplier where id = 142
would return a result set with as many rows as matched the where clause and the value 10 for all rows.
Unqualified column references are resolved from the closest scope outwards so this just gets treated as a correlated sub query.
The result of this query will be to delete all rows from Photo where hs_id is not null as long as HotelSupplier has at least one row where id = 142 (and so the subquery returns at least one row)
It might be a bit clearer if you consider what the effect of this is
delete from Photo where Photo.hs_id in (select Photo.hs_id)
This is of course equivalent to
delete from Photo where Photo.hs_id = Photo.hs_id
By the way this is far and away the most common "bug" that I personally have seen erroneously reported on Microsoft Connect. Erland Sommarskog includes it in his wishlist for SET STRICT_CHECKS ON
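The behaviour described above can be reproduced in a small sketch. It uses Python's sqlite3 module rather than SQL Server (an assumption made so the example is runnable anywhere); SQLite resolves unqualified column names outwards in the same way. The table contents are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Photo (id INTEGER, hs_id INTEGER);
    CREATE TABLE HotelSupplier (id INTEGER, hs_key INTEGER);
    INSERT INTO Photo VALUES (1, 10), (2, 20), (3, NULL);
    INSERT INTO HotelSupplier VALUES (142, 10);
""")

# Run on its own, the inner query fails: HotelSupplier has no hs_id column.
try:
    con.execute("SELECT hs_id FROM HotelSupplier WHERE id = 142")
    standalone_error = None
except sqlite3.OperationalError as exc:
    standalone_error = str(exc)

# Inside IN (...), the unqualified hs_id resolves to Photo.hs_id from the outer
# query, so every Photo row with a non-NULL hs_id is deleted.
con.execute(
    "DELETE FROM Photo WHERE hs_id IN (SELECT hs_id FROM HotelSupplier WHERE id = 142)"
)
remaining = con.execute("SELECT id, hs_id FROM Photo").fetchall()

# Qualifying the column with a table alias turns the silent bug back into an error.
try:
    con.execute(
        "DELETE FROM Photo WHERE hs_id IN "
        "(SELECT hs.hs_id FROM HotelSupplier hs WHERE hs.id = 142)"
    )
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

print(standalone_error, remaining, qualified_error)
```

Prefixing every column in a subquery with its table alias is a cheap habit that guards against this class of mistake.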

It's a strong argument for keeping column names consistent between tables. As @Martin says, the SQL syntax allows column names to be resolved from the outer query when there's no match in the inner query. This is a boon when writing correlated subqueries, but can trip you up sometimes (as here).

Joining Tables based on data sets with different column names

How do I join multiple tables when Table A has column CX_String_4 and Table B has column Details? The Details column has a string that includes a number that will match what is in column CX_
I've tried a full join, and my result doesn't yield anything. It's a blank screen.
Full Outer Join PVXMIHS ON PVXME.CX_STRING_4=SUBSTRING(Convert(varchar(318),PVXMIHS.DETAILS),78,10)

"Details column has a string that includes a number that will match what is in column CX_"
What you are proposing is a wild card or fuzzy match join if you don't know the exact position of your value in the details column. For that, you'll want to use LIKE which you can read about in the docs.
FULL OUTER JOIN PVXMIHS ON PVXMIHS.DETAILS LIKE '%' + PVXME.CX_STRING_4 + '%'
This would match when the CX_STRING_4 value is anywhere in the DETAILS column. If you are certain where the value is located within the DETAILS column, then your SUBSTRING method would work (assuming you used the right start position and length).
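A minimal sketch of the LIKE-based join, again run through Python's sqlite3 module as a stand-in for SQL Server. The sample values are invented. SQLite concatenates with || where T-SQL uses +, and a LEFT JOIN is used instead of FULL OUTER JOIN only because older SQLite versions lack the latter.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE PVXME (CX_STRING_4 TEXT);
    CREATE TABLE PVXMIHS (DETAILS TEXT);
    INSERT INTO PVXME VALUES ('0012345678'), ('0099999999');
    INSERT INTO PVXMIHS VALUES ('order ref 0012345678 shipped'), ('no reference here');
""")

# The join condition matches when CX_STRING_4 appears anywhere inside DETAILS.
rows = con.execute("""
    SELECT e.CX_STRING_4, d.DETAILS
    FROM PVXME e
    LEFT JOIN PVXMIHS d ON d.DETAILS LIKE '%' || e.CX_STRING_4 || '%'
    ORDER BY e.CX_STRING_4
""").fetchall()

for row in rows:
    print(row)
```

A leading wildcard generally prevents an index seek, so on large tables this kind of join can be slow; the SUBSTRING form is preferable when the position is fixed.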

Sql Server aggregate concatenate CLR returning different sequence of strings based on number of records

I have a CLR aggregate concatenation function, similar to https://gist.github.com/FilipDeVos/5b7b4addea1812067b09. When the number of rows is small, the sequence of concatenated strings follows the input data set. When the number of rows is larger (dozens and more), the sequence seems indeterminate. There is a difference in the execution plan, but I'm not that familiar with the optimizer and what hints to apply (I've tried MAXDOP 1, without success). From a different test than the example below, with similar results, here's what seems to be the difference in the plan: the separate sorts, then a merge join. The row count where it tipped over here was 60.
[Image: yielded expected results]
[Image: yielded unexpected results]
Below is the query that demonstrates the issue in the AdventureWorks2014 sample database with the above clr (renamed to TestConcatenate). The intended result is a dataset with a row for each order and a column with a delimited list of products for that order in quantity sequence.
;with cte_ordered_steps AS (
SELECT top 100000 sd.SalesOrderID, SalesOrderDetailID, OrderQty
FROM [Sales].[SalesOrderDetail] sd
--WHERE sd.SalesOrderID IN (53598, 53595)
ORDER BY sd.SalesOrderID, OrderQty
)
select
sd.SalesOrderID,
dbo.TestConcatenate(' QTY: ' + CAST(sd.OrderQty AS VARCHAR(9)) + ': ' + IsNull(p.Name, ''))
FROM [Sales].[SalesOrderDetail] sd
JOIN [Production].[Product] p ON p.ProductID = sd.ProductID
JOIN cte_ordered_steps r ON r.SalesOrderID = sd.SalesOrderID AND r.SalesOrderDetailID = sd.SalesOrderDetailID
where sd.SalesOrderID IN (53598, 53595)
GROUP BY sd.SalesOrderID
When the SalesOrderID is constrained in the CTE to 53598, 53595, the sequence is correct (top set); when it's constrained in the main select to 53598, 53595, the sequence is not (bottom set).
So what's my question? How can I build the query, with hints or other changes to return consistent (and correct) sequenced concatenated values independent of the number of rows.

Just like a normal query, if there isn't an order by clause, return order isn't guaranteed. If I recall correctly, the SQL 92 spec allows for an order by clause to be passed in to an aggregate via an over clause, but SQL Server doesn't implement it. So there's no way to guarantee ordering in your CLR function, unless you implement it yourself by collecting everything in the Accumulate and Merge methods into some sort of collection and then sorting the list in the Terminate method before returning it. But you'll pay a cost in terms of memory grants, as you now need to serialize the collection.
As to why you're seeing different behavior based on the size of your result set, I notice that a different join operator is being used between the two. A loop join and a merge join walk through the two sets being joined differently and so that might account for the difference you're seeing.
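The collect-then-sort idea can be sketched outside the CLR. The example below is not the asker's aggregate; it is a hypothetical Python analogue built on sqlite3's user-defined aggregates, where step plays the role of Accumulate and finalize the role of Terminate. The class name, the function name sorted_concat, and the sample rows are all made up for illustration.

```python
import sqlite3


class SortedConcat:
    """Collect (sort_key, text) pairs per group and sort them only at the end."""

    def __init__(self):
        self.items = []

    def step(self, sort_key, text):
        # Plays the role of Accumulate: arrival order is not trusted, just stored.
        self.items.append((sort_key, text))

    def finalize(self):
        # Plays the role of Terminate: impose the order here, then concatenate.
        return ", ".join(text for _, text in sorted(self.items))


con = sqlite3.connect(":memory:")
con.create_aggregate("sorted_concat", 2, SortedConcat)
con.executescript("""
    CREATE TABLE detail (order_id INTEGER, qty INTEGER, name TEXT);
    INSERT INTO detail VALUES
        (1, 3, 'Helmet'), (1, 1, 'Gloves'), (1, 2, 'Jersey'), (2, 5, 'Cap');
""")

result = con.execute("""
    SELECT order_id, sorted_concat(qty, name)
    FROM detail
    GROUP BY order_id
    ORDER BY order_id
""").fetchall()

print(result)
```

Because the sort key travels with each value, the output no longer depends on which join operator the optimizer picks or on the order in which rows reach the aggregate.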

Why not try the aggregate dbo.GROUP_CONCAT_S available at http://groupconcat.codeplex.com. The S is for Sorted output. It does exactly what you want.

While this answer doesn't have a solution, the additional information that Ben and Orlando provided (thanks!) has given me what I need to move on. I'll take the approach that Orlando pointed to, which was also my plan B, i.e. sorting in the CLR.

Can SQL Server index a text string by delimiter?

I need to store content keyed by strings, so a database table of key/value pairs, essentially. The keys, however, will be of a hierarchical format, like this:
foo.bar.baz
They'll have multiple categories, delimited by dots. The above value is in a category called "baz" which is in a parent category called "bar" which is in a parent category called "foo."
How can I index this in such a way that it's rapidly searchable for different permutations of the key/dot combo? For example, I want to be able to very quickly find everything that starts with
foo
Or
foo.bar
Yes, I could do a LIKE query, but I never need to find anything like:
fo
So that seems like a waste to me.
Is there any way that SQL would index all permutation of a string delimited by the dots? So, in the above case we have:
foo
foo.bar
foo.bar.baz
Is there any type of index that would facilitate searching like that?
Edit
I will never need to search backwards or from the middle. My searches will always begin from the front of the string:
foo.bar
Never:
bar.baz

SQL Server can't really index substrings, no. If you only ever want to search on the first string, this will work fine, and will perform an index seek (depending on other query semantics of course):
WHERE col LIKE 'foo.%';
-- or
WHERE col LIKE 'foo.bar.%';
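To make the prefix pattern concrete, here is a small sketch using Python's sqlite3 module as a stand-in for SQL Server. The table name content, the columns k and v, and the helper under are hypothetical. It only demonstrates which keys the pattern selects, not SQL Server's index usage.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE content (k TEXT PRIMARY KEY, v TEXT);
    INSERT INTO content VALUES
        ('foo', 'a'), ('foo.bar', 'b'), ('foo.bar.baz', 'c'),
        ('foobar.x', 'd'), ('fo', 'e');
""")


def under(prefix):
    """Return the key itself plus every key nested below it."""
    # 'prefix.%' skips siblings such as 'foobar.x'; the equality test picks up
    # the category itself, which the LIKE pattern alone would miss.
    cursor = con.execute(
        "SELECT k FROM content WHERE k = ? OR k LIKE ? ORDER BY k",
        (prefix, prefix + ".%"),
    )
    return [row[0] for row in cursor]


print(under("foo"))
print(under("foo.bar"))
```

Including the trailing dot in the pattern is what keeps 'fo' and 'foobar.x' out of the result. Keys containing % or _ would need escaping, which this sketch does not handle.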
However when you start needing to search for bar or baz following any leading string, you will need to search on the substring:
WHERE col LIKE '%.bar.%';
-- or
WHERE PATINDEX('%.bar.%', col) > 0;
This won't work well with regular B-tree indexes, and I don't think Full-Text Search will be much help either, because of the special characters (periods) - but you should try it out if this is a requirement.
In general, storing data this way smells wrong to me. Seems to me that you should either have separate columns instead of jamming all the data into one column, or using a more relational EAV design.

This looks like a job for a CTE!
create table TableA(
id int identity,
parentid int null,
name varchar(50)
)
For a (fixed) two-level hierarchy it's easy:
select t2.name, t1.name
from tableA t1
join tableA t2 on t2.id = t1.parentid
where t2.name = 'father'
To find that kind of hierarchical value in the general case you will need some kind of recursion on the self-joined table, using a CTE.
http://msdn.microsoft.com/pt-br/library/ms175972.aspx
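A minimal sketch of such a recursive CTE over the TableA layout above, run through Python's sqlite3 module as a stand-in for SQL Server. The sample rows and the CTE name tree are invented. SQLite spells it WITH RECURSIVE, whereas T-SQL uses plain WITH.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE TableA (id INTEGER PRIMARY KEY, parentid INTEGER NULL, name TEXT);
    INSERT INTO TableA VALUES
        (1, NULL, 'father'), (2, 1, 'son'), (3, 2, 'grandson'), (4, NULL, 'other');
""")

rows = con.execute("""
    WITH RECURSIVE tree(id, name, depth) AS (
        -- anchor member: the starting node
        SELECT id, name, 0 FROM TableA WHERE name = 'father'
        UNION ALL
        -- recursive member: children of rows already in the result
        SELECT c.id, c.name, tree.depth + 1
        FROM TableA c
        JOIN tree ON c.parentid = tree.id
    )
    SELECT name, depth FROM tree ORDER BY depth
""").fetchall()

print(rows)
```

The recursion follows parentid links to any depth, so it is not limited to the two-level case handled by the plain self-join.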

Ordering numbers that are stored as strings in the database

I have a bunch of records in several tables in a database that have a "process number" field, that's basically a number, but I have to store it as a string both because of some legacy data that has stuff like "89a" as a number and some numbering system that requires that process numbers be represented as number/year.
The problem arises when I try to order the processes by number. I get stuff like:
1
10
11
12
And the other problem is when I need to add a new process. The new process' number should be the biggest existing number incremented by one, and for that I would need a way to order the existing records by number.
Any suggestions?

Maybe this will help.
Essentially:
SELECT process_order FROM your_table ORDER BY process_order + 0 ASC

Can you store the numbers as zero padded values? That is, 01, 10, 11, 12?

I would suggest to create a new numeric field used only for ordering and update it from a trigger.

Can you split the data into two fields?
Store the 'process number' as an int and the 'process subtype' as a string.
That way:
• you can easily get the MAX processNumber and increment it when you need to generate a new number
• you can ORDER BY processNumber ASC, processSubtype ASC to get the correct order, even if multiple records have the same base number with different years/letters appended
• when you need the 'full' number you can just concatenate the two fields
Would that do what you need?
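The split can be sketched in a few lines of Python. The function name and the sample values are hypothetical, and the sketch assumes every process number starts with digits, as in '89a' or '12/2009'.

```python
import re


def split_process_number(value):
    """Split '89a' into (89, 'a') and '12/2009' into (12, '/2009')."""
    match = re.match(r"(\d+)(.*)", value)
    if match is None:
        raise ValueError(f"no leading digits in {value!r}")
    return int(match.group(1)), match.group(2)


numbers = ["10", "1", "89a", "12/2009", "11", "89"]

# Sorting by (number, subtype) gives numeric order with suffixes as tie-breakers.
ordered = sorted(numbers, key=split_process_number)

# The next process number is the largest numeric part plus one.
next_number = max(split_process_number(n)[0] for n in numbers) + 1

print(ordered)
print(next_number)
```

Storing the two parts in separate columns lets the database do the same ordering and MAX lookup without parsing strings at query time.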

Given that your process numbers don't seem to follow any fixed patterns (from your question and comments), can you construct/maintain a process number table that has two fields:
create table process_ordering ( processNumber varchar(N), processOrder int )
Then select all the process numbers from your tables and insert into the process number table. Set the ordering however you want based on the (varying) process number formats. Join on this table, order by processOrder and select all fields from the other table. Index this table on processNumber to make the join fast.
select my_processes.*
from my_processes
inner join process_ordering on my_processes.processNumber = process_ordering.processNumber
order by process_ordering.processOrder

It seems to me that you have two tasks here.
• Convert the strings to numbers by legacy format/strip off the junk
• Order the numbers
If you have a practical way of introducing string-parsing regular expressions into your process (and your issue has enough volume to be worth the effort), then I'd
• Create a reference table such as
CREATE TABLE tblLegacyFormatRegularExpressionMaster(
LegacyFormatId int,
LegacyFormatName varchar(50),
RegularExpression varchar(max)
)
• Then, with a way of invoking the regular expressions, such as the CLR integration in SQL Server 2005 and above (the .NET Common Language Runtime integration, which allows calls to compiled .NET methods from within SQL Server as ordinary (Microsoft extended) T-SQL), you should be able to solve your problem.
• See
http://www.codeproject.com/KB/string/SqlRegEx.aspx
I apologize if this is way too much overhead for your problem at hand.

Suggestion:
• Make your column a fixed width text (i.e. CHAR rather than VARCHAR).
• Pad the existing values with enough leading zeros to fill each column and a trailing space(s) where the values do not end in 'a' (or whatever).
• Add a CHECK constraint (or equivalent) to ensure new values conform to the pattern e.g. something like
CHECK (process_number LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][ab ]')
• In your insert/update stored procedures (or equivalent), pad any incoming values to fit the pattern.
• Remove the leading/trailing zeros/spaces as appropriate when displaying the values to humans.
Another advantage of this approach is that the incoming values '1', '01', '001', etc would all be considered to be the same value and could be covered by a simple unique constraint in the DBMS.
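The pad and unpad steps might look like the sketch below. The function names, the width of six digits, and the 'a'/'b' suffix set are assumptions taken from the CHECK pattern above, not from the asker's real data.

```python
import re

WIDTH = 6  # number of digit positions, matching the CHECK pattern above


def pad(value):
    """Normalize '1', '01' or '89a' to fixed width: '000001 ' or '000089a'."""
    match = re.fullmatch(r"([0-9]{1,6})([ab]?)", value)
    if match is None:
        raise ValueError(f"unsupported process number: {value!r}")
    digits, suffix = match.groups()
    # A missing suffix becomes a trailing space so every value has the same length.
    return digits.zfill(WIDTH) + (suffix or " ")


def display(stored):
    """Strip the padding again before showing the value to a human."""
    return stored.strip().lstrip("0") or "0"


stored = sorted(pad(v) for v in ["10", "89a", "1", "89"])
print(stored)
print([display(s) for s in stored])
```

Once every value has the same width, plain string comparison yields numeric order, and '1', '01' and '001' collapse to a single stored value.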
BTW I like the idea of splitting the trailing 'a' (or whatever) into a separate column, however I got the impression the data element in question is an identifier and therefore would not be appropriate to split it.

You need to cast your field as you're selecting. I'm basing this syntax on MySQL - but the idea's the same:
select * from table order by cast(field AS UNSIGNED);
Of course UNSIGNED could be SIGNED if required.
