Fetching results from inner query having comma separated values - SQL Server

I have a temp table having two columns - key and value:
temp_tbl:
key | value
----|------
k1  | a','b
Below is the insert script with which I am storing the value in temp_tbl:
insert into temp_tbl values ('k1', 'a'+char(39)+char(44)+char(39)+'b');
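-- char(39) is a single quote and char(44) is a comma, so this stores the literal string a','b as a single value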
Now, I am trying to fetch records from another table (actual_tbl) like this:
select * from actual_tbl where field_value in
(select value from temp_tbl where [key] = 'k1'); --query 1
But this is not returning anything.
I want the above query to behave like the following one:
select * from actual_tbl where field_value in
('a','b');--query 2
Where am I going wrong in query 1?
I am using SQL Server.

Where am I going wrong in query 1?
Where you are going wrong is in failing to understand the way the IN keyword works with a subquery vs a hard-coded list.
When an IN clause is followed by a list, each item in the list is a discrete value:
IN ('I am a value', 'I am another value', 'I am yet another value')
When it's followed by a sub-query, each row generates a single value. Your temp table only has one row, so the IN clause is only considering a single value. No matter how you try to "trick" the parser with commas and single-quotes, it won't work. The SQL Server parser is too smart to be tricked. It will know that a single value of 'a','b' is still just a single value, and it will look for that single value. It won't treat them as two separate values like you are trying to do.
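If you can change what you store, the cleanest fix is to keep a plain delimited list (or better, one row per value) and split it at query time. A minimal sketch, assuming SQL Server 2016+ (for STRING_SPLIT) and that the value is stored as a plain comma-separated list rather than with embedded quotes:
insert into temp_tbl values ('k1', 'a,b');

select *
from actual_tbl
where field_value in
      (select s.value
       from temp_tbl t
       cross apply string_split(t.[value], ',') as s
       where t.[key] = 'k1'); -- now behaves like IN ('a','b')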

Related

Need to Add Values to Certain Items

I have a table where I need to add the same values to a whole bunch of items
(in a nutshell: if an item doesn't have a UNIT of "CTN", I want to add the same values I have listed to all of them).
I thought the following would work, but it doesn't :(
Any idea what I am doing wrong?
INSERT INTO ICUNIT
(UNIT,AUDTDATE,AUDTTIME,AUDTUSER,AUDTORG,CONVERSION)
VALUES ('CTN','20220509','22513927','ADMIN','AU','1')
WHERE ITEMNO In '0','etc','etc','etc'
If I understand correctly, you might want to use INSERT INTO ... SELECT from the original table with your condition (INSERT ... VALUES cannot take a WHERE clause):
INSERT INTO ICUNIT (UNIT,AUDTDATE,AUDTTIME,AUDTUSER,AUDTORG,CONVERSION)
SELECT 'CTN','20220509','22513927','ADMIN','AU','1'
FROM ICUNIT
WHERE ITEMNO In ('0','etc','etc','etc')
The query you need starts by selecting the filtered items. So it seems something like the query below is your starting point:
select <?> from dbo.ICUNIT as icu where icu.UNIT <> 'CTN' order by ...;
Notice the use of the schema name, statement terminators, and table aliases - all best practices. I will guess that a given "item" can have multiple rows in this table, so long as UNIT is unique within ITEMNO. Correct? If so, the above query won't work. So let's try slightly more complicated filtering.
select distinct icu.ITEMNO
from dbo.ICUNIT as icu
where not exists (select *
                  from dbo.ICUNIT as ctns
                  where ctns.ITEMNO = icu.ITEMNO -- correlating the subquery
                    and ctns.UNIT = 'CTN')
order by ...;
There are other ways to do that above but that is one common way. That query will produce a resultset of all ITEMNO values in your table that do not already have a row where UNIT is "CTN". If you need to filter that for specific ITEMNO values you simply adjust the WHERE clause. If that works correctly, you can use that with your insert statement to then insert the desired rows.
insert into dbo.ICUNIT (...)
select distinct icu.ITEMNO, 'CTN', '20220509', '22513927', 'ADMIN', 'AU', '1'
from ...
;
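Putting the two pieces together, the full statement might look like this (a sketch; the column list is assumed from the INSERT in the question, with ITEMNO added):
insert into dbo.ICUNIT (ITEMNO, UNIT, AUDTDATE, AUDTTIME, AUDTUSER, AUDTORG, CONVERSION)
select distinct icu.ITEMNO, 'CTN', '20220509', '22513927', 'ADMIN', 'AU', '1'
from dbo.ICUNIT as icu
where not exists (select *
                  from dbo.ICUNIT as ctns
                  where ctns.ITEMNO = icu.ITEMNO
                    and ctns.UNIT = 'CTN');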

Select Value of Parameter in Array with Denodo (VQL)

I am trying to do something that seems simple but cannot find the right syntax for Denodo's VQL (Virtual Query Language). I have a string like this: XXXX-YYYY-ZZZZ-AAAA-BBBB in a column called "location" that varies in length, and I want to get the value of the fourth set (i.e. AAAA in this example). I am using the Denodo split function like this:
SELECT SPLIT("-",location)[3] AS my_variable FROM my_table
However, the [3] doesn't work. I've tried a bunch of variations:
SELECT SPLIT("-",location)[3].value AS my_variable FROM my_table
SELECT SPLIT("-",location).column3 AS my_variable FROM my_table
etc.
Can someone please help me figure out the right syntax to return a single parameter from an array? Thank you!
SELECT field_1[3].string
FROM (SELECT split('-', 'XXXX-YYYY-ZZZZ-AAAA-BBBB') as field_1)
You have to do it using a subquery because the syntax to access an element of an array (that is, [<number>]) can only be used with field names. You cannot use something like [3] directly on the result of an expression.
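Applying the same pattern to the question's table might look like this (an untested sketch; my_table and location are the names from the question):
SELECT field_1[3].string AS my_variable
FROM (SELECT split('-', location) AS field_1 FROM my_table)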
This question helps: https://community.denodo.com/answers/question/details?questionId=90670000000CcQPAA0
I got it working by creating a view that saves the array as a field:
CREATE OR REPLACE VIEW p_sample_data FOLDER = '/stack_overflow'
AS SELECT bv_sample_data.location AS location
, bv_sample_data.id AS id
, split('-', location) AS location_array
FROM bv_sample_data;
Notice I created a column called location_array?
Now you can use a select statement on top of your view to extract the information you want:
SELECT location, id, location_array[2].string
FROM p_sample_data
location_array[2] is the 3rd element, and the .string tells Denodo you want the string value (I think that's what it does... you'd have to read more about Compound Values in the documentation: https://community.denodo.com/docs/html/browse/6.0/vdp/vql/advanced_characteristics/management_of_compound_values/management_of_compound_values)
Another way you could probably do it is by creating a view with the array, and then flattening the array, although I haven't tried that option.
Update: I tried creating a view that flattens the array, and then using an analytic (or "window") function to get row_number() OVER (PARTITION BY id ORDER BY id ASC), but analytic/window functions don't work against flat-file sources.
So if you go the "flatten" route and your source system doesn't work with analytic functions, you could just go with a straight rownum() function, but you'd have to offset the value by the column number you want, and then use remainder division to pull out the data you want.
Like this:
--My view with the array is called p_sample_data
CREATE OR REPLACE VIEW f_p_sample_data FOLDER = '/stack_overflow' AS
SELECT location AS location
, id AS id
, string AS string
, rownum(2) AS rownumber
FROM FLATTEN p_sample_data AS v ( v.location_array);
Now, with the rownum() function (and an offset of 2), I can use remainder division in my where clause to get the data I want:
SELECT location, id, string, rownumber
FROM f_p_sample_data
WHERE rownumber % 5 = 0
Frankly, I think the easier way is to just leave your location data in the array and extract the nth element with the location_array[2].string syntax, where the index is zero-based (so [2] is the 3rd element).

Return rows where ID is in semicolon separated string from subquery MSSQL

I'm trying to query my SQL database to return all the rows where the ID is contained in a separate table's column. The list of project IDs is kept in the Feedback table in the Project_ID column, with datatype varchar. I am trying to return the rows from the Projects table, where the IDs are kept in the Project_ID column with datatype varchar.
I am doing this using the query
SELECT * FROM Projects WHERE Project_ID IN (
SELECT Project_ID FROM Feedback WHERE ID = 268 and Project_ID IS NOT NULL
)
When I run this query I am returned with the message:
Conversion failed when converting the varchar value '36;10;59' to data type int
This is yet another example of the importance of normalizing your data.
Keeping multiple data points in a single column is almost never the correct design, and by almost never I mean about 99.9999%.
If you can't normalize your database, you can use a workaround like this:
SELECT *
FROM Projects p
WHERE EXISTS (
    SELECT Project_ID
    FROM Feedback F
    WHERE ID = 268
      AND Project_ID IS NOT NULL
      AND ';' + F.Project_ID + ';' LIKE '%;' + CAST(p.Project_ID AS varchar) + ';%'
)
You can't use the IN operator here: when it is fed by a subquery, it compares against each returned row as a single value, so your semicolon-delimited string '36;10;59' is treated as one value, not three. Even if the values in Project_ID were delimited by a comma instead, it would still not work.
The reason I've added the ; on each side of the Project_ID in both tables is that this way the LIKE operator will return true wherever it finds Projects.Project_ID inside Feedback.Project_ID. You must add the ; around Projects.Project_ID to prevent LIKE from returning true when you are looking for a number that is a partial match of a number in the delimited string. Consider looking for 12 in a string containing 1;112;455 - without adding the delimiters to the search value (12 in this example), the LIKE operator would return true.
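On SQL Server 2016 or later, another workaround is to split the delimited string into rows with STRING_SPLIT, so no LIKE pattern is needed. A sketch, assuming the table and column names from the question:
SELECT p.*
FROM Projects p
WHERE EXISTS (
    SELECT 1
    FROM Feedback f
    CROSS APPLY STRING_SPLIT(f.Project_ID, ';') AS s
    WHERE f.ID = 268
      AND f.Project_ID IS NOT NULL
      AND s.value = CAST(p.Project_ID AS varchar(20)) -- compare as strings to avoid the int conversion error
)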

Convert Date Stored as VARCHAR into INT to compare to Date Stored as INT

I'm using SQL Server 2014. My request I believe is rather simple. I have one table containing a field holding a date value that is stored as VARCHAR, and another table containing a field holding a date value that is stored as INT.
The date value in the VARCHAR field is stored like this: 2015M01
The date value in the INT field is stored like this: 201501
I need to compare these tables against each other using EXCEPT. My thought process was to somehow extract or TRIM the "M" out of the VARCHAR value and see if it would let me compare the two. If anyone has a better idea such as using CAST to change the date formats or something feel free to suggest that as well.
I am also concerned that even extracting the "M" out of the VARCHAR may still prevent the comparison since one will still remain VARCHAR and the other is INT. If possible through a T-SQL query to convert on the fly that would be great advice as well. :)
REPLACE the string and then CONVERT to integer
SELECT A.*, B.*
FROM TableA A
INNER JOIN (SELECT intField
            FROM TableB) AS B
    ON CONVERT(INT, REPLACE(A.varcharField, 'M', '')) = B.intField
Since you say you already have the query and are using EXCEPT, you can simply change the definition of that one "date" field in the query containing the VARCHAR value so that it matches the INT format of the other query. For example:
SELECT Field1, CONVERT(INT, REPLACE(VarcharDateField, 'M', '')) AS [DateField], Field3
FROM TableA
EXCEPT
SELECT Field1, IntDateField, Field3
FROM TableB
HOWEVER, while I realize that this might not be feasible, your best option, if you can make this happen, would be to change how the data in the table with the VARCHAR field is stored so that it is actually an INT in the same format as the table with the data already stored as an INT. Then you wouldn't have to worry about situations like this one.
Meaning:
Add an INT field to the table with the VARCHAR field.
Do an UPDATE of that table, setting the INT field to the string value with the M removed (these first two steps are sketched below).
Update any INSERT and/or UPDATE stored procedures used by external services (app, ETL, etc) to do that same M removal logic on the way in. Then you don't have to change any app code that does INSERTs and UPDATEs. You don't even need to tell anyone you did this.
Update any "get" / SELECT stored procedures used by external services (app, ETL, etc) to do the opposite logic: convert the INT to VARCHAR and add the M on the way out. Then you don't have to change any app code that gets data from the DB. You don't even need to tell anyone you did this.
This is one of many reasons that having a Stored Procedure API to your DB is quite handy. I suppose an ORM can just be rebuilt, but you still need to recompile, even if all of the code references are automatically updated. But making a datatype change (or even moving a field to a different table, or even replacing a field with a simple CASE statement) "behind the scenes", and masking it so that any code outside of your control doesn't know that a change happened, is not nearly as difficult as most people might think. I have done all of these operations (datatype change, move a field to a different table, replace a field with simple logic, etc.) and it buys you a lot of time until the app code can be updated. That might be another team who handles that. Maybe their schedule won't allow for making any changes in that area (plus testing) for 3 months. OK. It will be there waiting for them when they are ready. And if there are several areas to update, they can be done one at a time. You can even create new stored procedures to run in parallel so that any updated app code can pass the proper INT datatype as the input parameter. And once all references to the VARCHAR value are gone, you can delete the original versions of those stored procedures.
If you want everything in the first table that is not in the second, you might consider something like this:
select t1.*
from t1
where not exists (select 1
                  from t2
                  where cast(replace(t1.varcharfield, 'M', '') as int) = t2.intfield
                 );
This should be close enough to except for your purposes.
I should add that you might need to include other columns in the where statement. However, the question only mentions one column, so I don't know what those are.
You could create a persisted view on the table with the char column, with a calculated column where the M is removed. Then you could JOIN the view to the table containing the INT column.
CREATE VIEW dbo.PersistedView
WITH SCHEMABINDING
AS
SELECT ConvertedDateCol = CONVERT(INT, REPLACE(VarcharCol, 'M', ''))
     --, other columns including the PK, etc
FROM dbo.TablewithCharColumn;

CREATE UNIQUE CLUSTERED INDEX IX_PersistedView
ON dbo.PersistedView(<the PK column>);
SELECT *
FROM dbo.PersistedView pv
INNER JOIN dbo.TableWithIntColumn ic ON pv.ConvertedDateCol = ic.IntDateCol;
If you provide the actual details of both tables, I will edit my answer to make it clearer.
A persisted view with a computed column will perform far better on the SELECT statement where you join the two columns compared with doing the CONVERT and REPLACE every time you run the SELECT statement.
However, a persisted view will slightly slow down inserts into the underlying table(s), and will prevent you from making DDL changes to the underlying tables.
If you're looking to not persist the values via a schema-bound view, you could create a non-persisted computed column on the table itself, then create a non-clustered index on that column. If you are using the computed column in WHERE or JOIN clauses, you may see some benefit.
By way of example:
CREATE TABLE dbo.PCT
(
    PCT_ID INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_PCT
        PRIMARY KEY CLUSTERED
    , SomeChar VARCHAR(50) NOT NULL
    , SomeCharToInt AS CONVERT(INT, REPLACE(SomeChar, 'M', ''))
);
CREATE INDEX IX_PCT_SomeCharToInt
ON dbo.PCT(SomeCharToInt);
INSERT INTO dbo.PCT(SomeChar)
VALUES ('2015M08');
SELECT SomeCharToInt
FROM dbo.PCT;
Results:

SomeCharToInt
-------------
201508

How does sql server choose values in an update statement where there are multiple options?

I have an update statement in SQL server where there are four possible values that can be assigned based on the join. It appears that SQL has an algorithm for choosing one value over another, and I'm not sure how that algorithm works.
As an example, say there is a table called Source with two columns (Match and Data) structured as below:
(The match column contains only 1's, the Data column increments by 1 for every row)
Match | Data
------|-----
1     | 1
1     | 2
1     | 3
1     | 4
That table will update another table called Destination with the same two columns structured as below:
Match | Data
------|-----
1     | NULL
If you want to update the Data field in Destination in the following way:
UPDATE Destination
SET Data = Source.Data
FROM Destination
INNER JOIN Source
    ON Destination.Match = Source.Match
there will be four possible values that Destination.Data could be set to after this query is run. I've found that messing with the indexes on Source has an impact on what Destination is set to, and it appears that SQL Server just updates the Destination table with the first value it finds that matches.
Is that accurate? Is it possible that SQL Server is updating the Destination with every possible value sequentially, and I just end up with the same kind of result as if it were updating with the first value it finds? It seems potentially problematic that it seemingly randomly chooses one row to update, as opposed to throwing an error when presented with this situation.
Thank you.
P.S. I apologize for the poor formatting. Hopefully, the intent is clear.
It sets the column to all of the matching results. Which one you end up with after the query depends on the order in which the results are returned (whichever one it sets last).
Since there's no ORDER BY clause, you're left with whatever order SQL Server comes up with. That will normally follow the physical order of the records on disk, and that in turn typically follows the clustered index for a table. But this order isn't set in stone, particularly when joins are involved. If a join matches on a column with an index other than the clustered index, it may well order the results based on that index instead. In the end, unless you give it an ORDER BY clause, SQL Server will return the results in whatever order it thinks it can produce fastest.
You can play with this by turning your update query into a SELECT query, so you can see the results. Notice which record comes first and which comes last in the source table for each record of the destination table. Compare that with the results of your update query. Then play with your indexes again and check the results once more to see what you get.
Of course, it can be tricky here because UPDATE statements are not allowed to use an ORDER BY clause, so regardless of what you find, you should really write the join so it matches the destination table 1:1. You may find the APPLY operator useful in achieving this goal, as sketched below: you can use it to effectively JOIN to another table while guaranteeing the join matches at most one record.
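For example, a TOP (1) subquery inside APPLY pins the update to exactly one source row per destination row (a sketch using the question's Source and Destination tables; the ORDER BY is what makes the choice deterministic):
UPDATE d
SET Data = s.Data
FROM Destination d
CROSS APPLY (SELECT TOP (1) Data
             FROM Source
             WHERE Source.Match = d.Match
             ORDER BY Data DESC -- pick the highest Data; any unique ordering works
            ) AS s;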
The choice is not deterministic and it can be any of the source rows.
You can try
DECLARE @Source TABLE(Match INT, Data INT);
INSERT INTO @Source
VALUES
(1, 1),
(1, 2),
(1, 3),
(1, 4);

DECLARE @Destination TABLE(Match INT, Data INT);
INSERT INTO @Destination
VALUES
(1, NULL);

UPDATE Destination
SET Data = Source.Data
FROM @Destination Destination
INNER JOIN @Source Source
    ON Destination.Match = Source.Match;

SELECT *
FROM @Destination;
And look at the actual execution plan. I see the following.
The output columns from @Destination are Bmk1000 and Match. Bmk1000 is an internal row identifier (used here due to the lack of a clustered index in this example) and would be different for each row emitted from @Destination (if there were more than one).
The single row is then joined onto the four matching rows in @Source, and the resulting four rows are passed into a stream aggregate.
The stream aggregate groups by Bmk1000 and collapses the multiple matching rows down to one. The operation performed by this aggregate is ANY(@Source.[Data]).
The ANY aggregate is an internal aggregate function not available in TSQL itself. No guarantees are made about which of the four source rows will be chosen.
Finally the single row per group feeds into the UPDATE operator to update the row with whatever value the ANY aggregate returned.
If you want deterministic results then you can use an aggregate function yourself...
WITH GroupedSource AS
(
    SELECT Match,
           MAX(Data) AS Data
    FROM @Source
    GROUP BY Match
)
UPDATE Destination
SET Data = Source.Data
FROM @Destination Destination
INNER JOIN GroupedSource Source
    ON Destination.Match = Source.Match;
Or use ROW_NUMBER...
WITH RankedSource AS
(
    SELECT Match,
           Data,
           ROW_NUMBER() OVER (PARTITION BY Match ORDER BY Data DESC) AS RN
    FROM @Source
)
UPDATE Destination
SET Data = Source.Data
FROM @Destination Destination
INNER JOIN RankedSource Source
    ON Destination.Match = Source.Match
WHERE RN = 1;
The latter form is generally more useful: in the event you need to set multiple columns, it ensures that all values used come from the same source row. In order to be deterministic, the combination of PARTITION BY and ORDER BY columns should be unique.
