TSQL Distinct within subquery using xml - sql-server

I have the following table structure:
Table1 - List of Possible Region/Market Options
Region  | Market
--------+----------------
EMEA    | London
NA      | Omaha
EMEA    | Another City
Table2 - Holds the Markets that were selected as impacts
RequestID | Market
----------+----------------
123       | London
123       | Omaha
456       | Another City
Within my stored procedure, I am trying to create a distinct list of Region/Markets that are impacted based on Table2. The end result will be a distinct list of regions with all of the markets within them that are impacted.
In this case, there are two impacts from the EMEA region but I wouldn't want EMEA to show up twice.
When I was doing this for a single request, I was able to create a temp table, insert the data into it, and then accomplish what I needed. However, this query pulls all of the results at once, so I need to handle the deduplication within the query itself, and I am not sure how...
The code below works fine when I am not selecting the data with DISTINCT. With DISTINCT, it throws the error:
The xml data type cannot be selected as DISTINCT because it is not comparable.
Is there another way I can accomplish this within a sub-query/sub-select?
...
(SELECT DISTINCT
    region,
    (SELECT m.market
     FROM dbo.bs_ToolRequests_MarketOptions AS m
     INNER JOIN dbo.BS_ToolRequests_ImpactedMarkets AS ma ON ma.market = m.market
     WHERE m.region = mo.region
         AND ma.requestID = t.requestID
     FOR XML PATH ('options'), TYPE, ELEMENTS, ROOT ('markets'))
 FROM
     dbo.BS_ToolRequests_MarketOptions AS mo
 FOR XML PATH ('regions'), TYPE, ELEMENTS, ROOT ('impactedMarkets')),
Expected result:
<impactedMarkets>
  <regions>
    <region>EMEA</region>
    <markets>
      <options>
        <market>London</market>
      </options>
      <options>
        <market>Another City</market>
      </options>
    </markets>
  </regions>
</impactedMarkets>

You probably need to use the DISTINCT keyword in the SELECT statement of a derived table:
...
(SELECT region,
    (SELECT m.market
     FROM dbo.bs_ToolRequests_MarketOptions AS m
     INNER JOIN dbo.BS_ToolRequests_ImpactedMarkets AS ma ON ma.market = m.market
     WHERE m.region = mo.region
         AND ma.requestID = t.requestID
     FOR XML PATH ('options'), TYPE, ELEMENTS, ROOT ('markets'))
 FROM
     (SELECT DISTINCT region FROM dbo.BS_ToolRequests_MarketOptions) AS mo
 FOR XML PATH ('regions'), TYPE, ELEMENTS, ROOT ('impactedMarkets')),
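If you want to verify the pattern in isolation, here is a minimal, self-contained sketch (the temp-table and column names are stand-ins, not the question's real tables): the XML column can't appear under SELECT DISTINCT, so the regions are deduplicated in the derived table first and the XML is built once per distinct region.

-- Stand-in data (illustrative names only)
CREATE TABLE #MarketOptions (region varchar(10), market varchar(50));
INSERT INTO #MarketOptions VALUES
    ('EMEA', 'London'),
    ('NA',   'Omaha'),
    ('EMEA', 'Another City');

-- Deduplicate the regions first, then build the market XML per distinct region
SELECT mo.region,
    (SELECT m.market
     FROM #MarketOptions AS m
     WHERE m.region = mo.region
     FOR XML PATH ('options'), TYPE, ELEMENTS, ROOT ('markets'))
FROM (SELECT DISTINCT region FROM #MarketOptions) AS mo
FOR XML PATH ('regions'), TYPE, ELEMENTS, ROOT ('impactedMarkets');

DROP TABLE #MarketOptions;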

Related

SQL Server 2008 - False error for "Msg 8120"?

I am writing a query in SQL Server 2008 (Express I believe?). I am currently getting this error:
Msg 8120, Level 16, State 1, Line 16
Column 'AIM.dbo.AggTicket.TotDirectHrs' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I am trying to do a historical analysis of our production WIP (Work In Process).
I have created a standalone calendar table (actually located in a separate database called BAS on the same server to not interfere with the ERP that operates the AIM database). I've been overwhelmed for days with some of the examples for creating running total queries/views/tables, so for now I'll just plan on taking care of that part inside of Crystal Reports 2016. My thinking was that I wanted to return records for each order each day of my calendar table (to be narrowed down in the future to only days that match records in the AIM database). The values I think I will need are:
Record Date (not unique)
Order Number (unique for each day)
Estimated hours for the job
The total number of hours worked on the job current as of today's date (in case the estimated hours were drastically underbudgeted)
The SUM of the direct labor hours charged to the job on said record date
The COUNT of the number of employees in attendance on said record date.
The SUM of the hours attended by employees on said record date.
The tables I use are as follows:
BAS Database:
dbo.DateDimension - Used for complete calendar of dates from 1/1/1987 to 12/31/2036
AIM Database:
dbo.AggAttend - Contains one or more records for each employee's attendance duration on a given date (i.e. One record for each punch-in / punch-out. Should be equal to indirect + direct labor)
dbo.AggTicket - Contains one or more records for each employee's direct labor duration charged to a particular order number
dbo.ModOrders - Contains one record for each order including the estimated hours, start date, and end date (I will worry about using the start and end dates later for figuring out how many available hours there were on each date)
Here is the code I'm using in my query:
;WITH OrderTots AS
(
SELECT
AggTicket.OrderNo,
SUM(AggTicket.TotDirectHrs) AS TotActHrs
FROM
AIM.dbo.AggTicket
GROUP BY
AggTicket.OrderNo
)
SELECT
d.Date,
t.OrderNo,
o.EstHrs,
OrderTots.TotActHrs,
SUM(t.TotDirectHrs) OVER (PARTITION BY t.TicketDate) AS DaysDirectHrs,
COUNT(a.EmplCode) AS NumEmployees,
SUM(a.TotHrs) AS DaysAttendHrs
FROM
BAS.dbo.DateDimension d
INNER JOIN
AIM.dbo.AggAttend a ON d.Date = a.TicketDate
LEFT OUTER JOIN
AIM.dbo.AggTicket t ON d.Date = t.TicketDate
LEFT OUTER JOIN
AIM.dbo.ModOrders o ON t.OrderNo = o.OrderNo
LEFT OUTER JOIN
OrderTots ON t.OrderNo = OrderTots.OrderNo
GROUP BY
d.Date, t.TicketDate, t.OrderNo, o.EstHrs,
OrderTots.TotActHrs
ORDER BY
d.Date
When I run that query in SQL Server Management Studio 2017, I get the above error.
These are my questions for the community:
Does this error message correctly describe an error in my code?
If so, why is that error an error? (To the best of my knowledge, everything is already contained in either an aggregate function or in the GROUP BY clause...smh)
What is a better way to write this query so that it will function?
Much appreciation to everyone in advance!
I am writing a query in SQL Server 2008 (Express I believe?).
SELECT @@VERSION will let you know what version you are on.
Column 'AIM.dbo.AggTicket.TotDirectHrs' is invalid in the select list
because it is not contained in either an aggregate function or the
GROUP BY clause.
The problem is with your SUM OVER() statement:
SUM(t.TotDirectHrs) OVER (PARTITION BY t.TicketDate) AS DaysDirectHrs
Here, because the query also uses GROUP BY, the column referenced by the window function must itself appear in the GROUP BY. The OVER clause determines the partitioning and ordering of a row-set for a window function. So, while you are using an aggregate with SUM, you are doing so in a window function. Window functions belong to a class of functions known as 'set functions', meaning functions that apply to a set of rows. The word 'window' refers to the set of rows that the function operates on.
Thus, add t.TotDirectHrs to the GROUP BY
GROUP BY
d.Date, t.TicketDate, t.OrderNo, o.EstHrs,
OrderTots.TotActHrs, t.TotDirectHrs
If this narrows your results into a grouping that you don't want, then you can wrap it in another CTE or use a correlated sub-query. Potentially like the below:
(SELECT DISTINCT SUM(t2.TotDirectHrs) OVER (PARTITION BY t2.TicketDate) FROM AIM.dbo.AggTicket t2 WHERE t2.TicketDate = t.TicketDate) AS DaysDirectHrs,
EXAMPLE
if object_id('tempdb..#test') is not null
    drop table #test

create table #test (id int identity(1,1), letter char(1))

insert into #test
values
    ('a'),
    ('b'),
    ('b'),
    ('c'),
    ('c'),
    ('c')
Given the data set above, suppose we wanted to get a count of all rows. That's simple right?
select
TheCount = count(*)
from
#test
+----------+
| TheCount |
+----------+
| 6 |
+----------+
Here, no GROUP BY is needed: because the SELECT list contains only an aggregate and no other columns, the whole table is implicitly treated as a single group. Remember, GROUP BY groups the SELECT statement results according to the values in a list of one or more column expressions. If aggregate functions are included in the SELECT list, GROUP BY calculates a summary value for each group. These are known as vector aggregates. [MSDN]
Now, suppose we wanted to count each letter in the table. We could do that in at least two ways: using COUNT(*) with the letter column in the select list, or using COUNT(letter) with the letter column in the select list. However, in order to attribute each count to its letter, we need to return the letter column. Thus, we must include letter in the GROUP BY to tell SQL Server what to apply the summary value to.
select
letter
,TheCount = count(*)
from
#test
group by
letter
+--------+----------+
| letter | TheCount |
+--------+----------+
| a | 1 |
| b | 2 |
| c | 3 |
+--------+----------+
Now, what if we wanted to return this same count, but also return all rows? This is where window functions come in. The window function works similarly to GROUP BY in this case, by telling SQL Server the set of rows to apply the aggregate to. Its value is then returned for every row in this window / partition. Thus, it returns a column which is applied to every row, making it just like any other column or calculated column returned from the select list.
select
letter
,TheCountOfTheLetter = count(*) over (partition by letter)
from
#test
+--------+---------------------+
| letter | TheCountOfTheLetter |
+--------+---------------------+
| a | 1 |
| b | 2 |
| b | 2 |
| c | 3 |
| c | 3 |
| c | 3 |
+--------+---------------------+
Now we get to your case, where you want to use an aggregate and an aggregate in a window function. Remember that the return of the window function is treated like any other column, and thus would have to appear in the GROUP BY. The pseudo-code would look something like this, but window functions aren't allowed in the GROUP BY clause:
select
letter
,TheCount = count(*)
,TheCountOfTheLetter = count(*) over (partition by letter)
from
#test
group by
letter
,count(*) over (partition by letter)
--returns an error
Thus, we must use a correlated sub-query, a CTE, or some other method.
select
t.letter
,TheCount = count(*)
,TheCountOfTheLetter = (select distinct count(*) over (partition by letter) from #test t2 where t2.letter = t.letter)
from
#test t
group by
t.letter
+--------+----------+---------------------+
| letter | TheCount | TheCountOfTheLetter |
+--------+----------+---------------------+
| a | 1 | 1 |
| b | 2 | 2 |
| c | 3 | 3 |
+--------+----------+---------------------+
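For completeness, the CTE route mentioned earlier might look like the sketch below, against the same #test table (a sketch, not tested against the asker's data). Because the windowed count is constant within each letter group, wrapping it in MAX simply picks out that constant.

;with LetterCounts as
(
    select
        letter
        ,TheCountOfTheLetter = count(*) over (partition by letter)
    from
        #test
)
select
    letter
    ,TheCount = count(*)
    ,TheCountOfTheLetter = max(TheCountOfTheLetter)
from
    LetterCounts
group by
    letter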

Are Views or Functions faster in SQL?

I have a table with customer receipts. I'm trying to generate a report based on the user's name, address, and purchases total based by department. The desired output should look like
|Customer      |Address                | Clothing | Electronics | Hardware | Household |
|Homer Simpson | 724 Evergreen Terr    | $42      | $20         | $500     | $24       |
|Walter White  | 308 Negra Arroyo Lane | $120     | $80         | $52      | $2400     |
The receipts table is part of a temporal model. So, the code looks like:
Select c.customername, a.address, r.receiptno, ri.department, ri.total
from customer c
inner join customer_address_lnk cal on cal.customerid = c.id
inner join address a on cal.addressid = a.id
inner join customer_receipts_lnk crl on crl.customerid = c.id
inner join receipts r on crl.receiptid = r.id
inner join receipts_receiptitem_lnk rrl on rrl.receiptid = r.id
inner join receiptitem ri on ri.id = rrl.receiptitemid
The lnk tables are linking tables.
The receiptitem table has the following columns: ID, Department, Amount, CreatedDate, UpdatedDate
The idea is that if the receipt is updated, the updated amount can be adjusted for returns, price adjustments, and so forth.
The goal is to get the query under 5 sec. Since we have over 125 million rows in the receiptitems table alone, it takes SQL 20+ minutes to calculate the report.
I've tried CTE's on views without success. I've tried different JOIN orders. I've used LEFT Joins. Even Pivot didn't slow it down. I still can't get it under 20 minutes.
Before I start down the path of creating a Function to get it under the 5 second goal, I'm open to any suggestions. I have limited ability to alter indices at this time.
Any thoughts?
Well, obviously views and SQL functions are different things.
Try to use a function where it needs to be clear to a future user (maybe yourself!) that the data returned requires certain parameters--where the data does not make sense without those parameters. It's sort of like forcing the user to include a WHERE clause.
In your example, you may want to force the user to filter by CustomerId or ReceiptId.
HOWEVER....
In this case, the view approach would probably be better.
Functions, by design, cannot use temporary tables; multi-statement functions use table variables instead, and table variables are much slower than temp tables for large row counts.
The query you've included is really straight forward with no surprises. The view would be the simplest and best approach here.
For 125M rows, I suggest either checking the execution plan during processing (include a WHERE clause for this) or dumping the data into a summary table that is updated periodically. Or both. Check indexes all along the way.
Here is more (better) discussion: Test SQL Queries
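To make the summary-table suggestion concrete, a sketch might look like the following (the summary table and its columns are illustrative, and it assumes the Amount column described for receiptitem; refresh it on whatever schedule the report tolerates, e.g. a nightly SQL Agent job):

-- Illustrative summary table; the report joins to this instead of 125M detail rows
CREATE TABLE dbo.CustomerDeptTotals
(
    CustomerId int NOT NULL,
    Department varchar(50) NOT NULL,
    Total money NOT NULL,
    CONSTRAINT PK_CustomerDeptTotals PRIMARY KEY (CustomerId, Department)
);

-- Periodic refresh: aggregate the detail once, off-hours
TRUNCATE TABLE dbo.CustomerDeptTotals;

INSERT INTO dbo.CustomerDeptTotals (CustomerId, Department, Total)
SELECT crl.customerid, ri.department, SUM(ri.amount)
FROM customer_receipts_lnk crl
INNER JOIN receipts r ON crl.receiptid = r.id
INNER JOIN receipts_receiptitem_lnk rrl ON rrl.receiptid = r.id
INNER JOIN receiptitem ri ON ri.id = rrl.receiptitemid
GROUP BY crl.customerid, ri.department;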

Metadata database design

I am trying to store metadata about a document in SQL Server. The documents themselves are stored in a document archive, which returns an identifier, so I can retrieve a document later by asking the archive for it by identifier.
Our users would like to be able to search for these documents based on different metadata. The metadata could be 1 attribute or 5 depending on the document type, and the users should be able to create new document types from an admin site.
I can see two solutions here. One is that each documenttype gets its own metadata table, where all metadata attributes are predefined; if one should be added, a new column needs to be created, and if a new documenttype is created, a new metadata table needs to be created. Our DBA will freak out with a solution like this, and I also see a problem with indexes: if the documenttype has 5 different metadata attributes, it needs to be searchable with 1 or 4 of them specified in the search, so I would need to write indexes for all the different combinations of possible searches.
Here is an example (fictitious):
|documentId | Name | InsertDate | CustomerId | City     |
| 1         | John | 2014-01-01 | 2          | London   |
| 2         | John | 2014-01-20 | 5          | New York |
| 3         | Able | 2014-01-01 | 10         | Paris    |
Here I could say:
Give me all documents where Name = 'John'
Give me all documents where Name = 'John' and CustomerId = 5
Give me all documents where InsertDate = '2014-01-01' and City = 'London'
That would already be 3 different indexes, and I still haven't covered all possible combinations. This isn't practical.
So I am looking into the evil 'EAV' (anti)pattern.
So instead of having the metadata as columns, I can have them as rows.
|documentId | MetaAttribute | MetaValue  |
| 1         | Name          | John       |
| 1         | InsertDate    | 2014-01-01 |
| 1         | CustomerId    | 2          |
| 1         | City          | London     |
| 2         | Name          | John       |
| 2         | InsertDate    | 2014-01-20 |
| 2         | CustomerId    | 5          |
| 2         | City          | New York   |
| 3         | Name          | Able       |
| 3         | InsertDate    | 2014-01-01 |
| 3         | CustomerId    | 10         |
| 3         | City          | Paris      |
Here it's simple to create one index on MetaAttribute and MetaValue, and the searches are covered. If a new documenttype is created, new metadata attributes can be registered for that documenttype in a MetaAttribute table (which contains all MetaAttributes for the different documenttypes). So there is no need to create new tables or columns when a new documenttype is added or a new attribute is added to a documenttype. Instead, all MetaValues must be strings :( and the SQL query to find the document id is a bit more complicated.
This is what I figured out. (In this example MetaAttribute is a string, but it would really be an ID into the MetaAttribute table.)
SELECT * FROM [Document]
WHERE ID IN (SELECT documentId FROM [MetaData]
             WHERE ((MetaAttribute = 'Name' AND MetaValue = 'John')
                 OR (MetaAttribute = 'CustomerId' AND MetaValue = '5'))
             GROUP BY [documentId]
             HAVING COUNT(1) = 2)
Here I need to ask whether Name = 'John' and CustomerId = 5. I do that by finding all records that have Name = 'John' or CustomerId = '5', grouping them on documentId, and counting the items in each group. If I get 2, then both Name = 'John' and CustomerId = '5' are true for this search. I return the documentId and use it to retrieve information about the document, like the document archive storage id.
There should be a better SQL statement for this, shouldn't there?
So my question is: is there a better approach than these two? Or is the EAV pattern so bad that I should stick with the first approach and live with a freaked-out DBA and "ten millions of indexes"?
We are talking about a system that will get around 10-20 million new records each month and hold data for at least 3 years... so the tables will be pretty big, and good indexes are necessary for performance.
Best Regards
Magnus
The EAV model is appealing if you have unbounded attributes--that is, anyone can set up anything as an attribute. However, it sounds from your description that this is not the case--the possible document attributes come from a known and fairly limited set. If this is the case, routine normalization suggests the following:
-- One per document
CREATE TABLE Document
(
    DocumentId -- primary key
    ,DocumentType
    ,<etc>
)

-- One per "type" of document
CREATE TABLE DocumentType
(
    DocumentTypeId -- primary key
    ,Name
)

-- One per possible document attribute.
-- Note that multiple document types can reference the same attribute
CREATE TABLE DocumentAttributes
(
    AttributeId -- primary key
    ,Name
)

-- This lists which attributes are used by a given type
CREATE TABLE DocumentTypeAttributes
(
    DocumentTypeId
    ,AttributeId
    -- compound primary key on both columns
    -- foreign keys on both columns
)

-- This contains the final association of document and attributes
CREATE TABLE DocumentAttributeValues
(
    DocumentId
    ,AttributeId
    ,Value
    -- compound primary key on DocumentId, AttributeId
    -- foreign keys on both columns to their respective parent tables
)
A tighter model with more robust keys could be implemented to ensure at the database level that an attribute cannot be assigned to a document with an “inappropriate” type.
Queries have to use joins, but (presumably) only the Document and DocumentAttributeValues tables will ever be large. An index on (AttributeId + Value) facilitates lookups by attribute type, and depending on cardinality, an index on (Value + AttributeId) could make searches for specific attribute values quite efficient.
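As a concrete sketch, those two indexes might be declared like this (the index names are illustrative; INCLUDE (DocumentId) lets the lookup return the document key without touching the base table):

CREATE INDEX IX_DAV_Attribute_Value
    ON DocumentAttributeValues (AttributeId, Value)
    INCLUDE (DocumentId);

CREATE INDEX IX_DAV_Value_Attribute
    ON DocumentAttributeValues (Value, AttributeId)
    INCLUDE (DocumentId);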
(Edit)
Ooh, clever, I created two tables with the same name. I've renamed the last one to DocumentAttributeValues. (Free advice is clearly worth what you paid for it!)
This shows how ugly these systems can get in SQL, as you have to “look up” both attributes separately. On the plus side you don’t have to worry about “does this type go with this document”, as those rules have (better had) been applied when the data was loaded. Two examples:
This one spells everything out in joins, and as such I think it might perform worse than the next:
-- Top-down
SELECT do.DocumentId
from Documents do
inner join DocumentAttributes da1
    on da1.Name = 'Name'
inner join DocumentAttributeValues dav1
    on dav1.AttributeId = da1.AttributeId
    and dav1.DocumentId = do.DocumentId
    and dav1.Value = 'John'
inner join DocumentAttributes da2
    on da2.Name = 'CustomerId'
inner join DocumentAttributeValues dav2
    on dav2.AttributeId = da2.AttributeId
    and dav2.DocumentId = do.DocumentId
    and dav2.Value = '5'
This one picks out the attributes, then finds which documents have all of them. It might perform better, as there’s one less table to process:
-- Bottom-up
SELECT xx.DocumentId
from (-- All documents with name "John"
select dav.DocumentId
from DocumentAttributes da
inner join DocumentAttributeValues dav
on dav.AttributeId = da.AttributeId
where da.Name = 'Name'
and dav.Value = 'John'
-- This combines the two sets, with "all" keeping any duplicate entries
union all
-- All documents with CustomerId = "5"
select dav.DocumentId
from DocumentAttributes da
inner join DocumentAttributeValues dav
on dav.AttributeId = da.AttributeId
where da.Name = 'CustomerId'
and dav.Value = '5') xx -- Have to give the subquery an alias
group by xx.DocumentId
having count(*) = 2
While further refinements might be possible, the more attributes you’re filtering on, the uglier the queries will be. Five attributes max might work OK in SQL, but if you’ve got tons of attributes, a NoSQL solution might be what you’re looking for.
(Please note that, as with my original post, I have not tested this code, so there may be typos or subtle--or not so subtle--errors in here.)
SQL Server 2008+ offers three related features for dealing with such cases:
Sparse Columns which allow you to define hundreds of columns even if only a subset are used at a time
Column Sets allow you to group these columns and treat them as a group
Filtered indexes can index only the rows that actually have values in them.
These features allow you to work with more-or-less normal SQL statements to handle all metadata columns.
These features were specifically added to address the EAV/metadata scenario.
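A minimal sketch of how these pieces fit together (the schema is assumed for illustration; the column set exposes all sparse columns as a single XML value, and the filtered index only covers rows where the attribute is present):

CREATE TABLE DocumentMeta
(
    DocumentId int PRIMARY KEY,
    CustomerId int SPARSE NULL,                         -- only some documents have this
    City varchar(100) SPARSE NULL,                      -- only some documents have this
    AllAttributes xml COLUMN_SET FOR ALL_SPARSE_COLUMNS -- all sparse columns as one XML value
);

-- Index only the rows that actually carry a City value
CREATE INDEX IX_DocumentMeta_City
    ON DocumentMeta (City)
    WHERE City IS NOT NULL;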
EDIT
If you have a limited set of attributes that are always filled, there is no need for Sparse Columns or the EAV anti-pattern either.
You can create your tables as you normally would and add indexes to optimize the real workload you encounter. Certain types of queries will occur far more often than others and SQL Server's Index tuning advisor can propose the indexes and statistics to use based on a trace captured using SQL Server's Profiler.
It's quite possible that only a subset of the columns will accelerate searches and the rest can be added as include columns in the index.
Full Text Search
A more powerful option is to use SQL Server's Full Text Search. This will allow you to execute queries using arbitrary attributes. This is another technique used by document/content management systems, ERPs and CRMs to handle arbitrary attributes.
With FTS you simply specify the columns to include in one FTS index and don't have to create separate indexes for each attribute.
You can use FTS predicates in SELECT queries like this:
SELECT Name, ListPrice
FROM Production.Product
WHERE ListPrice = 80.99
AND CONTAINS(Name, 'Mountain')
This can result in much simpler queries (you just write a modified select) and simpler administration (no worries about column order in indexes, and only one FTS index to manage).
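For reference, CONTAINS requires a one-time full-text setup along these lines (the catalog name is illustrative, and PK_Product_ProductID is assumed to be the table's unique key index):

CREATE FULLTEXT CATALOG ProductCatalog;

CREATE FULLTEXT INDEX ON Production.Product (Name)
    KEY INDEX PK_Product_ProductID
    ON ProductCatalog;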

Join two rows together if they share the same value?

I've shifted through views and other points and I've gotten to here. Take example below
Name      | Quantity | Billed |
----------+----------+--------+
PC Tablet | 0        | 100    |
PC Tablet | 100      | -2345  |
Monitor   | 9873     | 0      |
Keyboard  | 200      | -300   |
So basically, in the select I would do off this view, I want it to bring in the data ordered by Name so it's in nice alphabetical order. Also, for a few reasons, some of the records appear more than once (I think the most is 4 times). If you add up the rows with duplicates, the true 'quantity' and 'billed' would be correct.
NOTE: The actual query is very long, but I broke it down into a simple example to explain the problem. The idea is the same, but there are A LOT MORE columns that need to be added together... So I'm looking for a query that combines rows that share the same name. I've tried a bunch of different queries without success: either it rolls ALL the rows into one, or it won't work and I get a bunch of null errors / "name column is invalid in the select list/GROUP BY because it's not an aggregate function" errors.
Is this even possible?
Try:
SELECT A.Name, A.TotalQty, B.TotalBilled
FROM (
SELECT Name, SUM(Quantity) as TotalQty
FROM YourTableHere
GROUP BY Name
) A
INNER JOIN
(
SELECT Name, SUM(Billed) as TotalBilled
FROM YourTableHere
GROUP BY Name
) B
ON A.Name = B.Name
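For what it's worth, if every numeric column simply needs to be summed per Name, a single GROUP BY should return the same result without the self-join, and ORDER BY supplies the alphabetical ordering asked for (YourTableHere is the same placeholder as above):

SELECT Name,
    SUM(Quantity) AS TotalQty,
    SUM(Billed) AS TotalBilled
FROM YourTableHere
GROUP BY Name
ORDER BY Name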

SQL View where rows not mapped

Basically what I'm trying to figure out is,
Say I have
table 1, tbl1:
ID | Name
and table 2, tbl2:
ID | Name
Then I have a mapping table, mt:
ID | tbl1ID | tbl2ID
Data really isn't important here, and these tables are examples.
How do I make a view that grabs all the items in tbl1 that aren't mapped in mt?
I'm using Microsoft SQL Server 2008, by the way.
CREATE VIEW v_unmapped
AS
SELECT *
FROM tbl1
WHERE id NOT IN
(
SELECT tbl1Id
FROM mt
)
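One caveat: if mt.tbl1ID is nullable, NOT IN matches no rows at all once the subquery returns a NULL. A NOT EXISTS version of the same view avoids that edge case (named differently here so it can coexist with the view above):

-- Same idea with NOT EXISTS, which is NULL-safe
CREATE VIEW v_unmapped_notexists
AS
SELECT t.*
FROM tbl1 t
WHERE NOT EXISTS
(
    SELECT 1
    FROM mt
    WHERE mt.tbl1ID = t.id
)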
