Query from multiple XML columns in SQL Server - sql-server

Environment: SQL Server 2016; abstracted example of "real" data.
From a first query to a table containing XML data I have a SQL result set that has the following columns:
ID (int)
Names (XML)
Times (XML)
Values (XML)
Columns 2-4 contain multiple values in XML format, e.g.
Names:
Row 1: <name>TestR1</name><name>TestR2</name>...
Row 2: <name>TestS1</name><name>TestS2</name>...
Times:
Row 1: <time>0.1</time><time>0.2</time>...
Row 2: <time>-0.1</time><time>-0.2</time>...
Values:
Row 1: <value>1.1</value><value>1.2</value>...
Row 2: <value>-1.1</value><value>-1.2</value>...
All XML columns contain exactly the same number of elements.
What I want now is to create a select that has the following output:
| ID | Name | Time | Value |
+----+--------+------+-------+
| 1 | TestR1 | 0.1 | 1.1 |
| 1 | TestR2 | 0.2 | 1.2 |
| .. | ...... | .... | ..... |
| 2 | TestS1 | -0.1 | -1.1 |
| 2 | TestS2 | -0.2 | -1.2 |
| .. | ...... | .... | ..... |
For a single column CROSS APPLY works fine:
SELECT ID, N.value('.', 'nvarchar(50)') AS ExtractedName
FROM <source>
CROSS APPLY <source>.nodes('/name') AS T(N)
Applying multiple CROSS APPLY statements makes no sense to me here.
I guess it would work if I wrote a select for each column, producing individual result sets, and then selected over all of those result sets,
but that is very likely not the best solution, as I would be duplicating a select for each additional column.
Any suggestions on how to design a query like that would be highly appreciated!

I'd suggest this approach:
First I create a declared table variable and fill it with your sample data to simulate your issue. This is called an "MCVE"; please try to provide one yourself in your next question.
DECLARE @tbl TABLE(ID INT, Names XML,Times XML,[Values] XML);
INSERT INTO @tbl VALUES
(1,'<name>TestR1</name><name>TestR2</name>','<time>0.1</time><time>0.2</time>','<value>1.1</value><value>1.2</value>')
,(2,'<name>TestS1</name><name>TestS2</name>','<time>0.3</time><time>0.4</time>','<value>2.1</value><value>2.2</value>');
--The query
SELECT t.ID
,t.Names.value('(/name[sql:column("tally.Nmbr")])[1]','nvarchar(max)') AS [Name]
,t.Times.value('(/time[sql:column("tally.Nmbr")])[1]','decimal(10,4)') AS [Time]
,t.[Values].value('(/value[sql:column("tally.Nmbr")])[1]','decimal(10,4)') AS [Value]
FROM @tbl t
CROSS APPLY
(
SELECT TOP(t.Names.value('count(*)','int')) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) Nmbr FROM master..spt_values
) tally;
The idea in short:
We create a tally on the fly by using APPLY to create a list of numbers.
The TOP-clause will limit this list to the count of <name> elements in the given row.
In this case I take master..spt_values as a source for many rows. We do not need the content, just a sufficiently long list of rows to create a tally. That said, if there is a physical numbers table in your database, that would be even better.
Finally we can pick the content by the element's position using sql:column() to introduce the tally's value into the XQuery predicate.
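To make the tally step visible on its own, here is a minimal sketch that returns the generated number next to the element picked at that position. It assumes a reduced table variable with only the Names column; the third name in row 2 (TestS3) is made up to show that the row count follows the element count of each row, and count(/name) is used here to count only <name> elements.

```sql
-- Reduced sample: only the Names column; TestS3 is a hypothetical extra element.
DECLARE @tbl TABLE(ID INT, Names XML);
INSERT INTO @tbl VALUES
 (1,'<name>TestR1</name><name>TestR2</name>')
,(2,'<name>TestS1</name><name>TestS2</name><name>TestS3</name>');

SELECT t.ID
      ,tally.Nmbr  -- running number 1..n, where n = count of <name> in this row
      ,t.Names.value('(/name[sql:column("tally.Nmbr")])[1]','nvarchar(50)') AS [Name]
FROM @tbl t
CROSS APPLY
(
    -- TOP limits the numbers to the element count of the current row
    SELECT TOP(t.Names.value('count(/name)','int'))
           ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Nmbr
    FROM master..spt_values
) tally
ORDER BY t.ID, tally.Nmbr;
```

This should return two rows for ID 1 and three rows for ID 2, one per <name> element.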

Related

Break up the data in a database column of a record into multiple records

Azure SQL Server - we have a table like this:
MyTable:
ID Source ArticleText
-- ------ -----------
1 100 <nvarchar(max) field with unstructured text from media articles>
2 145 "
3 866 "
4 232 "
ID column is the primary key and auto-increments on INSERTS.
I run this query to find the records with the largest data size in the ArticleText column:
SELECT TOP 500
ID, Source, DATALENGTH(ArticleText)/1048576 AS Size_in_MB
FROM
MyTable
ORDER BY
DATALENGTH(ArticleText) DESC
We are finding that for many reasons both technical and practical, the data in the ArticleText column is just too big in certain records. The above query allows me to look at a range of sizes for our largest records, which I'll need to know for what I'm trying to formulate here.
What I need to accomplish is this: for every existing record in this table whose ArticleText DATALENGTH is greater than X, break that record into N records, where each resulting record contains the same value in the Source column but the ArticleText data is split across those records in smaller chunks.
How would one achieve this if the exact requirement was say, take all records whose ArticleText DATALENGTH is greater than 10MB, and break each into 3 records where the resulting records' Source column value is the same across the 3 records, but the ArticleText data is separated into three chunks.
In essence, we would need to divide the DATALENGTH by 3 and apply the first 1/3 of the text data to the first record, 2nd 1/3 to the 2nd record, and the 3rd 1/3 to the third record.
Is this even possible in SQL Server?
You can use the following code to create a side table with the needed data:
CREATE TABLE #mockup (ID INT IDENTITY, [Source] INT, ArticleText NVARCHAR(MAX));
INSERT INTO #mockup([Source],ArticleText) VALUES
(100,'This is a very long text with many many words and it is still longer and longer and longer, and even longer and longer and longer')
,(200,'A short text')
,(300,'A medium text, just long enough to need a second part');
DECLARE @partSize INT=50;
WITH recCTE AS
(
SELECT ID,[Source]
,1 AS FragmentIndex
,A.Pos
,CASE WHEN A.Pos>0 THEN LEFT(ArticleText,A.Pos) ELSE ArticleText END AS Fragment
,CASE WHEN A.Pos>0 THEN SUBSTRING(ArticleText,A.Pos+2,DATALENGTH(ArticleText)/2) END AS RestString
FROM #mockup
CROSS APPLY(SELECT CASE WHEN DATALENGTH(ArticleText)/2 > @partSize
THEN @partSize - CHARINDEX(' ',REVERSE(LEFT(ArticleText,@partSize)))
ELSE -1 END AS Pos) A
UNION ALL
SELECT r.ID,r.[Source]
,r.FragmentIndex+1
,A.Pos
,CASE WHEN A.Pos>0 THEN LEFT(r.RestString,A.Pos) ELSE r.RestString END
,CASE WHEN A.Pos>0 THEN SUBSTRING(r.RestString,A.Pos+2,DATALENGTH(r.RestString)/2) END AS RestString
FROM recCTE r
CROSS APPLY(SELECT CASE WHEN DATALENGTH(r.RestString)/2 > @partSize
THEN @partSize - CHARINDEX(' ',REVERSE(LEFT(r.RestString,@partSize)))
ELSE -1 END AS Pos) A
WHERE DATALENGTH(r.RestString)>0
)
SELECT ID,[Source],FragmentIndex,Fragment
FROM recCTE
ORDER BY [Source],FragmentIndex;
GO
DROP TABLE #mockup
The result
+----+--------+---------------+---------------------------------------------------+
| ID | Source | FragmentIndex | Fragment |
+----+--------+---------------+---------------------------------------------------+
| 1 | 100 | 1 | This is a very long text with many many words and |
+----+--------+---------------+---------------------------------------------------+
| 1 | 100 | 2 | it is still longer and longer and longer, and |
+----+--------+---------------+---------------------------------------------------+
| 1 | 100 | 3 | even longer and longer and longer |
+----+--------+---------------+---------------------------------------------------+
| 2 | 200 | 1 | A short text |
+----+--------+---------------+---------------------------------------------------+
| 3 | 300 | 1 | A medium text, just long enough to need a second |
+----+--------+---------------+---------------------------------------------------+
| 3 | 300 | 2 | part |
+----+--------+---------------+---------------------------------------------------+
Now you have to update the existing row with the value at FragmentIndex=1 and insert the values with FragmentIndex>1 as new rows. Do this sorted by FragmentIndex and your IDENTITY ID column will reflect the correct order.
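The update-then-insert step could look roughly like the sketch below. It assumes the output of the splitting query has first been saved into a staging table; the name #fragments and the shortened sample texts are hypothetical and only serve to keep the example self-contained.

```sql
-- Stand-in for the real table (same shape as the mockup in the answer).
CREATE TABLE #mockup (ID INT IDENTITY, [Source] INT, ArticleText NVARCHAR(MAX));
INSERT INTO #mockup([Source],ArticleText) VALUES
 (100,N'first part second part')
,(200,N'A short text');

-- Hypothetical staging table holding the result of the splitting query.
CREATE TABLE #fragments (ID INT, [Source] INT, FragmentIndex INT, Fragment NVARCHAR(MAX));
INSERT INTO #fragments VALUES
 (1,100,1,N'first part')
,(1,100,2,N'second part')
,(2,200,1,N'A short text');

BEGIN TRANSACTION;

-- Overwrite each original row with its first fragment.
UPDATE m
SET    m.ArticleText = f.Fragment
FROM   #mockup m
JOIN   #fragments f ON f.ID = m.ID AND f.FragmentIndex = 1;

-- Append the remaining fragments as new rows; the ORDER BY determines
-- the order in which the IDENTITY values are assigned.
INSERT INTO #mockup ([Source], ArticleText)
SELECT f.[Source], f.Fragment
FROM   #fragments f
WHERE  f.FragmentIndex > 1
ORDER BY f.ID, f.FragmentIndex;

COMMIT TRANSACTION;

SELECT ID, [Source], ArticleText FROM #mockup ORDER BY ID;

DROP TABLE #fragments;
DROP TABLE #mockup;
```

Wrapping both statements in one transaction keeps the table from being left half-migrated if the insert fails.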

MSSQL-Column calculated with two tables, is it possible?

I have a question: is it possible to create a calculated column using two tables?
Table 1:
---------------------
id | Value1 |
---------------------
1 | 25 |
Table 2
---------------------
id | Value2 |
---------------------
1 | 5 |
Now, in a 3rd table I want a calculated column of values 1 and 2. Is that possible?
Table 3
---------------------
id | Sumvalues |
---------------------
1 | ? |
or is there another method that can be used for "sumvalues" to self-adjust with the change of the other fields related to it?
The best option is to create a view in my opinion:
create view vMyView as
select T1.id,
T1.Value1 + T2.Value2 as Sumvalues
from [Table1] T1 join [Table2] T2 on T1.id = T2.id
This way, every time you query the view, you get the current data.
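A small end-to-end sketch of the view approach, showing that the sum follows changes in the base tables. The object names (dbo.Table1, dbo.Table2, dbo.vSumValues) are assumptions made for the demo; CREATE VIEW has to be the first statement of its batch, hence the GO separators.

```sql
CREATE TABLE dbo.Table1 (id INT PRIMARY KEY, Value1 INT);
CREATE TABLE dbo.Table2 (id INT PRIMARY KEY, Value2 INT);
INSERT INTO dbo.Table1 VALUES (1, 25);
INSERT INTO dbo.Table2 VALUES (1, 5);
GO
-- Every column of a view needs a name, so the sum gets an alias.
CREATE VIEW dbo.vSumValues AS
SELECT T1.id, T1.Value1 + T2.Value2 AS Sumvalues
FROM dbo.Table1 T1
JOIN dbo.Table2 T2 ON T1.id = T2.id;
GO
SELECT id, Sumvalues FROM dbo.vSumValues;        -- 25 + 5
UPDATE dbo.Table2 SET Value2 = 10 WHERE id = 1;
SELECT id, Sumvalues FROM dbo.vSumValues;        -- now 25 + 10, no extra work needed
GO
-- Clean up the demo objects.
DROP VIEW dbo.vSumValues;
DROP TABLE dbo.Table1;
DROP TABLE dbo.Table2;
```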
To add value1 to value2, use:
DECLARE @value1 INT, @value2 INT, @value3 INT;
-- each subquery must return exactly one row
SET @value1 = (SELECT value1 FROM TABLE1);
SET @value2 = (SELECT value2 FROM TABLE2);
SET @value3 = @value1+@value2;
INSERT INTO TABLE3 (value3) VALUES (@value3);
This may contain typos since I'm writing from a cell phone.

TSQL Conditional Where or Group By?

I have a table like the following:
id | type | duedate
-------------------------
1 | original | 01/01/2017
1 | revised | 02/01/2017
2 | original | 03/01/2017
3 | original | 10/01/2017
3 | revised | 09/01/2017
Where there may be either one or two rows for each id. If there are two rows with same id, there would be one with type='original' and one with type='revised'. If there is one row for the id, type will always be 'original'.
What I want as a result are all the rows where type='revised', but if there is only one row for a particular id (thus type='original') then I want to include that row too. So desired output for the above would be:
id | type | duedate
1 | revised | 02/01/2017
2 | original | 03/01/2017
3 | revised | 09/01/2017
I do not know how to construct a WHERE clause that conditionally checks whether there are 1 or 2 rows for a given id, nor am I sure how to use GROUP BY, because the revised date could be greater than or less than the original date, so the aggregate functions MAX or MIN don't work. I thought about using CASE somehow, but I also do not know how to construct a conditional that chooses between two different rows of data (if there are two rows) and displays one of them rather than the other.
Any suggested approaches would be appreciated.
Thanks!
You can use ROW_NUMBER() for this. Sorting by Type in descending order puts 'revised' ahead of 'original' within each id:
WITH T AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Type DESC) AS RN
FROM YourTable
)
SELECT *
FROM T
WHERE RN = 1
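For reference, here is a runnable sketch of the same idea against the sample data from the question. Instead of relying on the alphabetical order of the type values, it ranks 'revised' first with an explicit CASE expression; that is a variation on the answer, not part of it. The table variable @YourTable and the ISO date literals (reading the question's dates as month/day/year) are assumptions.

```sql
DECLARE @YourTable TABLE (id INT, [type] VARCHAR(10), duedate DATE);
INSERT INTO @YourTable VALUES
 (1,'original','20170101')
,(1,'revised' ,'20170201')
,(2,'original','20170301')
,(3,'original','20171001')
,(3,'revised' ,'20170901');

WITH T AS
(
    SELECT id, [type], duedate
           -- 'revised' sorts first; any other type comes after it
          ,ROW_NUMBER() OVER (PARTITION BY id
                              ORDER BY CASE WHEN [type] = 'revised' THEN 0 ELSE 1 END) AS RN
    FROM @YourTable
)
SELECT id, [type], duedate
FROM T
WHERE RN = 1
ORDER BY id;
```

With this data the query keeps the revised row for ids 1 and 3 and the original row for id 2.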
Is something like this sufficient?
SELECT *
FROM mytable m1
WHERE type='revised'
or 1=(SELECT COUNT(*) FROM mytable m2 WHERE m2.id=m1.id)
You could use a subquery to take the MAX([type]). In this case it works for [type] since alphabetically we want revised first, then original and "r" comes after "o" in the alphabet. We can then INNER JOIN back on the same table with the matching conditions.
SELECT T2.*
FROM (
SELECT id, MAX([type]) AS [MAXtype]
FROM myTABLE
GROUP BY id
) AS dT INNER JOIN myTable T2 ON dT.id = T2.id AND dT.[MAXtype] = T2.[type]
ORDER BY T2.[id]
Gives output:
id type duedate
1 revised 2017-02-01
2 original 2017-03-01
3 revised 2017-09-01
Here is the sqlfiddle: http://sqlfiddle.com/#!6/14121f/6/0

Finding the best match with "fuzzy" ranking logic

I need help with grouping results of the below temp table using a 'rank' column.
The temp table (MS SQL) is as follows:
student_address | school_address | student_st| school_st| district | districtID | rank
---------------------------------------------------------------------------------------
123 some street | 12 apple way | CT | CT | 322 | 322 | 0.2
123 some street | 33 pear street| CT | NJ | 039 | 039 | 0.1
333 another st. | NULL | VT | NULL | 111 | 111 | 0.0
I populated the #temp table as such:
SELECT st.student_address, sc.school_address, st.student_st, sc.district, st.districtID, '0.0' as rank
FROM students st
LEFT OUTER JOIN schools sc
ON st.[District ID] = sc.District
ORDER BY st.[District ID] asc;
I followed the results of my temp table by a series of updates that changed the 'rank' column based on certain rules (e.g. no match between school and student = 0.0, only a district match = 0.1, a district match & a state match = 0.2 and so on). The end result is that highly ranked rows are more likely to show the student's actual school vs. lesser ranked rows.
Where I need help is the final query. I essentially want to return all student info (all rows from the original students table) and the most likely corresponding school (determined by rank).
Something like (pseudo code)
select student_address, student_st, student_etc, school_address
from #temp
where rank = max(rank)
group by student_address
I know the above isn't correct SQL, but I hope it gives you an idea what I am trying to achieve?
Thanks for any guidance.
You can try this out:
select student_address, student_st, student_etc, school_address,RANK
from #temp t1
group by student_address, student_st, student_etc, school_address,RANK having
RANK=(select MAX(RANK) from #temp t2 where t1.student_address=t2.student_address)
I think you're close. You probably need a subquery that is correlated on the student, like:
SELECT t.student_address, t.student_st, t.student_etc, t.school_address
FROM #temp t
WHERE t.rank = (SELECT MAX(t2.rank) FROM #temp t2 WHERE t2.student_address = t.student_address)
...though I'm missing where student_street is coming from. The above, however looks like the pattern you're looking for.
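Another way to pick the best-ranked school per student is ROW_NUMBER(), sketched below. The #temp layout is a reduced guess at the asker's table (the district columns are omitted and the column types are assumptions), and ties in rank are broken arbitrarily.

```sql
-- Reduced, assumed version of the asker's #temp table.
CREATE TABLE #temp
(
    student_address VARCHAR(100),
    school_address  VARCHAR(100),
    student_st      CHAR(2),
    school_st       CHAR(2),
    [rank]          DECIMAL(3,1)
);
INSERT INTO #temp VALUES
 ('123 some street','12 apple way'  ,'CT','CT',0.2)
,('123 some street','33 pear street','CT','NJ',0.1)
,('333 another st.',NULL            ,'VT',NULL,0.0);

WITH ranked AS
(
    SELECT student_address, student_st, school_address, [rank]
           -- number the candidate schools of each student, best rank first
          ,ROW_NUMBER() OVER (PARTITION BY student_address ORDER BY [rank] DESC) AS rn
    FROM #temp
)
SELECT student_address, student_st, school_address, [rank]
FROM ranked
WHERE rn = 1;

DROP TABLE #temp;
```

Students without any matching school still appear, because their single row with rank 0.0 is the best one in its partition.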

Find "regional" relationships in SQL data using a query, or SSIS

Edit for clarification: I am compiling data weekly, based on Zip_Code, but some Zip_Codes are redundant. I know I should be able to compile a small amount of data, and derive the redundant zip_codes if I can establish relationships.
I want to define a zip code's region by the unique set of items and values that appear in that zip code, in order to create a "Region Table"
I am looking to find relationships by zip code with certain data. Ultimately, I have tables which include similar values for many zip codes.
I have data similar to:
ItemCode |Value | Zip_Code
-----------|-------|-------
1 |10 | 1
2 |15 | 1
3 |5 | 1
1 |10 | 2
2 |15 | 2
3 |5 | 2
1 |10 | 3
2 |10 | 3
3 |15 | 3
Or to simplify the idea, I could even concatenate ItemCode + Value into unique values:
ItemCode+
Value | Zip_Code
A | 1
B | 1
C | 1
A | 2
B | 2
C | 2
A | 3
D | 3
E | 3
As you can see, Zip_Code 1 and 2 have the same distinct ItemCode and Value. Zip_Code 3 however, has different values for certain ItemCodes.
I need to create a table that establishes a relationship between Zip_Codes that contain the same data.
The final table will look something like:
Zip_Code | Region
1 | 1
2 | 1
3 | 2
4 | 2
5 | 1
6 | 3
...etc
This will allow me to collect data only once for each unique Region, and derive the zip_code appropriately.
Things I'm doing now:
I am currently using a query, similar to a join, that compares two Zip_Codes, along the lines of:
SELECT a.ItemCode
,a.value
,a.zip_code
,b.ItemCode
,b.value
,b.zip_code
FROM mytable as a, mytable as b -- select from table twice, similar to a join
WHERE a.zip_code = 1 -- left table will have all ItemCode and Value from zip 1
AND b.zip_code = 2 -- right table will have all ItemCode and Value from zip 2
AND a.ItemCode = b.ItemCode -- matches rows on ItemCode
AND a.Value != b.Value
ORDER BY a.ItemCode
This returns nothing if the two zip codes have exactly the same ItemCode and Value pairs, and returns a slew of rows if there are differences between the two zip codes.
This needs to move from a manual process to an automated process however, as I am now working with more than 100 zip_codes.
I do not have much programming experience in specific languages, so tools in SSIS are somewhat limited to me. I have some experience using the Fuzzy tools, and feel like there might be something in Fuzzy Grouping that might shine a light on apparent regions, but can't figure out how to set it up.
Does anyone have any suggestions? I have access to SQLServ and its related tools, and Visual Studio. I am trying to avoid writing a program to automate this, as my c# skills are relatively nooby, but will figure it out if necessary.
Sorry for being so verbose: This is my first Question, and the page I agreed to in order to ask a question suggested to explain in detail, and talk about what I've tried...
Thanks in advance for any help I might receive.
Give this a shot (I used the simplified example, but this can easily be expanded). I think the real interesting part of this code is the recursive CTE...
;with matches as (
--Find all pairs of zip_codes that have matching values.
select d1.ZipCode zc1, d2.ZipCode zc2
from data d1
join data d2 on d1.Val=d2.Val
group by d1.ZipCode, d2.ZipCode
having count(*) = (select count(distinct Val) from data where zipcode = d1.Zipcode)
), cte as (
--Trace each zip_code to its "smallest" matching zip_code id.
select zc1 tempRegionID, zc2 ZipCode
from matches
where zc1<=zc2
UNION ALL
select c.tempRegionID, m.zc2
from cte c
join matches m on c.ZipCode=m.zc1
and c.ZipCode!=m.zc2
where m.zc1<=m.zc2
)
--For each zip_code, use its smallest matching zip_code as its region.
select zipCode, min(tempRegionID) as regionID
from cte
group by ZipCode
Demonstrating that there's a use for everything, though normally it makes me cringe: concatenate the values for each zip code into a single field. Store ZipCode and ConcatenatedValues in a lookup table (PK on the one, UQ on the other). Now you can assess which zip codes are in the same region by grouping on ConcatenatedValues.
Here's a simple function to concatenate text data:
CREATE TYPE dbo.List AS TABLE
(
Item VARCHAR(1000)
)
GO
CREATE FUNCTION dbo.Implode (@List dbo.List READONLY, @Separator VARCHAR(10) = ',') RETURNS VARCHAR(MAX)
AS BEGIN
DECLARE @Concat VARCHAR(MAX)
SELECT @Concat = CASE WHEN Item IS NULL THEN @Concat ELSE COALESCE(@Concat + @Separator, '') + Item END FROM @List
RETURN @Concat
END
GO
DECLARE @List AS dbo.List
INSERT INTO @List (Item) VALUES ('A'), ('B'), ('C'), ('D')
SELECT dbo.Implode(@List, ',')
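The grouping step described above can be sketched without the helper function by building the concatenated value with FOR XML PATH, which is a swapped-in technique and not the function from this answer. The table variable @data and its columns Val and ZipCode follow the simplified example from the question; the sketch assumes each (ZipCode, Val) pair occurs only once.

```sql
DECLARE @data TABLE (Val CHAR(1), ZipCode INT);
INSERT INTO @data VALUES
 ('A',1),('B',1),('C',1)
,('A',2),('B',2),('C',2)
,('A',3),('D',3),('E',3);

WITH zips AS
(
    SELECT DISTINCT ZipCode FROM @data
),
signatures AS
(
    -- One row per zip code with its values concatenated in a fixed order.
    SELECT z.ZipCode
          ,(SELECT ',' + d.Val
            FROM @data d
            WHERE d.ZipCode = z.ZipCode
            ORDER BY d.Val
            FOR XML PATH('')) AS ConcatenatedValues
    FROM zips z
)
-- Zip codes with the same concatenated value get the same region number.
SELECT ZipCode
      ,DENSE_RANK() OVER (ORDER BY ConcatenatedValues) AS Region
FROM signatures
ORDER BY ZipCode;
```

With this sample, zip codes 1 and 2 share a region and zip code 3 gets its own. The region numbers depend on the sort order of the concatenated strings, so they are only stable as long as the underlying data does not change.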
