SQL Server 2008 - False error for "Msg 8120"? - sql-server

I am writing a query in SQL Server 2008 (Express I believe?). I am currently getting this error:
Msg 8120, Level 16, State 1, Line 16
Column 'AIM.dbo.AggTicket.TotDirectHrs' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I am trying to do a historical analysis of our production WIP (Work In Process).
I have created a standalone calendar table (actually located in a separate database called BAS on the same server, so as not to interfere with the ERP that operates the AIM database). I've been overwhelmed for days by some of the examples for creating running-total queries/views/tables, so for now I'll just plan on taking care of that part inside of Crystal Reports 2016. My thinking was that I wanted to return records for each order on each day of my calendar table (to be narrowed down in the future to only days that match records in the AIM database). The values I think I will need are:
Record Date (not unique)
Order Number (unique for each day)
Estimated hours for the job
The total number of hours worked on the job current as of today's date (in case the estimated hours were drastically underbudgeted)
The SUM of the direct labor hours charged to the job on said record date
The COUNT of the number of employees in attendance on said record date.
The SUM of the hours attended by employees on said record date.
The tables I use are as follows:
BAS Database:
dbo.DateDimension - Used for complete calendar of dates from 1/1/1987 to 12/31/2036
AIM Database:
dbo.AggAttend - Contains one or more records for each employee's attendance duration on a given date (i.e. One record for each punch-in / punch-out. Should be equal to indirect + direct labor)
dbo.AggTicket - Contains one or more records for each employee's direct labor duration charged to a particular order number
dbo.ModOrders - Contains one record for each order including the estimated hours, start date, and end date (I will worry about using the start and end dates later for figuring out how many available hours there were on each date)
Here is the code I'm using in my query:
;WITH OrderTots AS
(
    SELECT
        AggTicket.OrderNo,
        SUM(AggTicket.TotDirectHrs) AS TotActHrs
    FROM
        AIM.dbo.AggTicket
    GROUP BY
        AggTicket.OrderNo
)
SELECT
    d.Date,
    t.OrderNo,
    o.EstHrs,
    OrderTots.TotActHrs,
    SUM(t.TotDirectHrs) OVER (PARTITION BY t.TicketDate) AS DaysDirectHrs,
    COUNT(a.EmplCode) AS NumEmployees,
    SUM(a.TotHrs) AS DaysAttendHrs
FROM
    BAS.dbo.DateDimension d
INNER JOIN
    AIM.dbo.AggAttend a ON d.Date = a.TicketDate
LEFT OUTER JOIN
    AIM.dbo.AggTicket t ON d.Date = t.TicketDate
LEFT OUTER JOIN
    AIM.dbo.ModOrders o ON t.OrderNo = o.OrderNo
LEFT OUTER JOIN
    OrderTots ON t.OrderNo = OrderTots.OrderNo
GROUP BY
    d.Date, t.TicketDate, t.OrderNo, o.EstHrs,
    OrderTots.TotActHrs
ORDER BY
    d.Date
When I run that query in SQL Server Management Studio 2017, I get the above error.
These are my questions for the community:
Does this error message correctly describe an error in my code?
If so, why is that error an error? (To the best of my knowledge, everything is already contained in either an aggregate function or in the GROUP BY clause...smh)
What is a better way to write this query so that it will function?
Much appreciation to everyone in advance!

I am writing a query in SQL Server 2008 (Express I believe?).
SELECT @@VERSION will let you know what version you are on.
Column 'AIM.dbo.AggTicket.TotDirectHrs' is invalid in the select list
because it is not contained in either an aggregate function or the
GROUP BY clause.
The problem is with your SUM OVER() statement:
SUM(t.TotDirectHrs) OVER (PARTITION BY t.TicketDate) AS DaysDirectHrs
Here, since you are using the OVER clause alongside grouped aggregates, the column it references must be included in the GROUP BY. The OVER clause determines the partitioning and ordering of a row set for a window function. So, while you are using an aggregate with SUM, you are doing so in a window function. Window functions belong to a type of function known as a 'set function', which means a function that applies to a set of rows. The word 'window' refers to the set of rows that the function works on.
Thus, add t.TotDirectHrs to the GROUP BY
GROUP BY
d.Date, t.TicketDate, t.OrderNo, o.EstHrs,
OrderTots.TotActHrs, t.TotDirectHrs
If this narrows your results into a grouping that you don't want, then you can wrap it in another CTE or use a correlated sub-query. Potentially like the below:
(SELECT DISTINCT SUM(t2.TotDirectHrs) OVER (PARTITION BY t2.TicketDate)
 FROM AIM.dbo.AggTicket t2
 WHERE t2.TicketDate = t.TicketDate) AS DaysDirectHrs,
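Alternatively, here is a rough sketch of the CTE route applied to the original query (untested, assuming the same table and column names as above, and using a hypothetical DayTots CTE): pre-aggregate the per-day direct hours once and join the result in like any other column, so no window function is needed at all.
;WITH OrderTots AS
(
    SELECT OrderNo, SUM(TotDirectHrs) AS TotActHrs
    FROM AIM.dbo.AggTicket
    GROUP BY OrderNo
),
DayTots AS
(
    -- hypothetical helper CTE: total direct hours charged per ticket date
    SELECT TicketDate, SUM(TotDirectHrs) AS DaysDirectHrs
    FROM AIM.dbo.AggTicket
    GROUP BY TicketDate
)
SELECT
    d.Date,
    t.OrderNo,
    o.EstHrs,
    OrderTots.TotActHrs,
    DayTots.DaysDirectHrs,
    COUNT(a.EmplCode) AS NumEmployees,
    SUM(a.TotHrs) AS DaysAttendHrs
FROM
    BAS.dbo.DateDimension d
INNER JOIN
    AIM.dbo.AggAttend a ON d.Date = a.TicketDate
LEFT OUTER JOIN
    AIM.dbo.AggTicket t ON d.Date = t.TicketDate
LEFT OUTER JOIN
    AIM.dbo.ModOrders o ON t.OrderNo = o.OrderNo
LEFT OUTER JOIN
    OrderTots ON t.OrderNo = OrderTots.OrderNo
LEFT OUTER JOIN
    DayTots ON d.Date = DayTots.TicketDate
GROUP BY
    d.Date, t.OrderNo, o.EstHrs, OrderTots.TotActHrs, DayTots.DaysDirectHrs
ORDER BY
    d.Date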
EXAMPLE
if object_id('tempdb..#test') is not null
drop table #test
create table #test(id int identity(1,1), letter char(1))
insert into #test
values
('a'),
('b'),
('b'),
('c'),
('c'),
('c')
Given the data set above, suppose we wanted to get a count of all rows. That's simple right?
select
TheCount = count(*)
from
#test
+----------+
| TheCount |
+----------+
| 6 |
+----------+
Here, no GROUP BY is needed because the aggregate applies to the whole table as a single group, since no other columns appear in the SELECT list. Remember, GROUP BY groups the SELECT statement results according to the values in a list of one or more column expressions. If aggregate functions are included in the SELECT list, GROUP BY calculates a summary value for each group. These are known as vector aggregates. [MSDN]
Now, suppose we wanted to count each letter in the table. We could do that in at least two ways: using COUNT(*) with the letter column in the select list, or using COUNT(letter) with the letter column in the select list. However, in order for us to attribute the count to the letter, we need to return the letter column. Thus, we must include letter in the GROUP BY to tell SQL Server what to apply the summary value to.
select
letter
,TheCount = count(*)
from
#test
group by
letter
+--------+----------+
| letter | TheCount |
+--------+----------+
| a | 1 |
| b | 2 |
| c | 3 |
+--------+----------+
Now, what if we wanted to return this same count, but we wanted to return all rows as well? This is where window functions come in. The window function works similarly to GROUP BY in this case by telling SQL Server the set of rows to apply the aggregate to. Then, its value is returned for every row in this window / partition. Thus, it returns a column which is applied to every row, making it just like any other column or calculated column returned from the select list.
select
letter
,TheCountOfTheLetter = count(*) over (partition by letter)
from
#test
+--------+---------------------+
| letter | TheCountOfTheLetter |
+--------+---------------------+
| a | 1 |
| b | 2 |
| b | 2 |
| c | 3 |
| c | 3 |
| c | 3 |
+--------+---------------------+
Now we get to your case, where you want to use a regular aggregate and an aggregate in a window function together. Remember that the return of the window function is treated like any other column, and thus would have to appear in the GROUP BY. The pseudo-code would look something like this, but window functions aren't allowed in the GROUP BY clause.
select
letter
,TheCount = count(*)
,TheCountOfTheLetter = count(*) over (partition by letter)
from
#test
group by
letter
,count(*) over (partition by letter)
--returns an error
Thus, we must use a correlated sub-query, a CTE, or some other method.
select
t.letter
,TheCount = count(*)
,TheCountOfTheLetter = (select distinct count(*) over (partition by letter) from #test t2 where t2.letter = t.letter)
from
#test t
group by
t.letter
+--------+----------+---------------------+
| letter | TheCount | TheCountOfTheLetter |
+--------+----------+---------------------+
| a | 1 | 1 |
| b | 2 | 2 |
| c | 3 | 3 |
+--------+----------+---------------------+

Related

How to find the last 100k rows from 10000K table in oracle?

When I am using this query it is taking more than 5 minutes; please give me some other suggestion.
SELECT * FROM
( SELECT id,name,rownum AS RN$$_RowNumber FROM MILLION_1) INNER_TABLE where
RN$$_RowNumber > (V_total_count - V_no_of_rows)
ORDER BY RN$$_RowNumber DESC;
Try the offset clause.
I have a table with about 16M records in it. If I just want the last 100,000 rows, I ORDER them via the ORDER BY clause, and then I use the OFFSET clause, which basically says: read past this many rows before you return any data.
select *
from SHERI; -- 15,691,544 Rows
select *
from SHERI
order by COLUMN4 asc
offset 15591444 rows; -- my math was bad, should have offset 15591544 rows to get just the last 100,000
The FETCH FIRST and OFFSET clauses are new for 12c (docs)
If we look at the plan under this query, we can see how the database makes it work:
PLAN_TABLE_OUTPUT
SQL_ID 7wd4ra8pfu1vb, child number 0
-------------------------------------
select * from SHERI order by COLUMN4 asc offset 15591444 rows
Plan hash value: 3535161482
----------------------------------------------
| Id | Operation | Name | E-Rows |
----------------------------------------------
| 0 | SELECT STATEMENT | | |
|* 1 | VIEW | | 15M|
| 2 | WINDOW SORT | | 15M|
| 3 | TABLE ACCESS FULL| SHERI | 15M|
----------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber">15591444)
Note
-----
- Warning: basic plan statistics not available. These are only collected when:
* hint 'gather_plan_statistics' is used for the statement or
* parameter 'statistics_level' is set to 'ALL', at session or system level
'window sort' basically translates to an analytic function.
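For intuition, writing that analytic function by hand gives roughly the same shape as the plan above (a sketch only, using the same table and column names as the example; the real rewrite is done internally by the optimizer):
select *
from (
    select s.*,
           row_number() over (order by COLUMN4 asc) as rn  -- the 'window sort'
    from SHERI s
)
where rn > 15591444;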
There are some very thorough answers at this similar question, but I'll try to make them specific to your case.
First, when you say "last 100k rows", what do you mean? It looks like you just want to pull the last 100k rows from an unsorted query, but that doesn't make a lot of sense. If you want the 100k most recent rows, Oracle doesn't guarantee that they'll be at the end of your unsorted query. So you want to order by something which will have the most recent ones at the end.
Also, part of the reason your query is slow is that you're sorting/filtering on the rownum pseudo-column, which can't be indexed. Sorting on a column that has an index would drastically speed this up. So I'd guess you want to order by the id column, which is probably a unique/primary key.
So this is the old (11g and earlier) way to do this.
select id, name
from (select id, name
from MILLION_1
order by id desc)
where rownum <= 100000;
If you're on 12c or later, there's a newer way to do it.
select id, name
from MILLION_1
order by id desc
fetch first 100000 rows only;

SQLite delete rows based on multiple columns

I'm pretty new to SQLite, hence asking this question!
I need to remove rows in a table so that I have the earliest occurrence of each unique value in column X (colour) based on column Y (time).
Basically I have this:
test | colour | time(s)
one | Yellow | 8
one | Red | 6
one | Yellow | 10
two | Red | 4
I want to remove rows so that it looks like:
test | colour | time(s)
one | Yellow | 8
two | Red | 4
Thanks in advance!
EDIT: To be clearer, I need to retain the earliest occurrence in time of each colour, regardless of the test.
EDIT: I can select the rows I want to keep by doing this:
select * from ( select * from COL_TABLE order by time desc) x group by colour;
which produces the desired result, but I want to remove whatever is not in the result of the select.
EDIT: The following worked, thanks to @JimmyB:
DELETE FROM COL_TABLE
WHERE EXISTS (
    SELECT *
    FROM COL_TABLE t2
    WHERE COL_TABLE.colour = t2.colour
      AND COL_TABLE.test = t2.test
      AND COL_TABLE.time < t2.time
)
You can include subqueries (EXISTS/NOT EXISTS) in the WHERE clause of a DELETE statement.
Like subqueries in SELECTs, these can refer to the table in the outer statement to create matches.
In your case, try this:
DELETE FROM my_table
WHERE EXISTS (
SELECT *
FROM my_table t2
WHERE my_table.colour = t2.colour
AND my_table.test = t2.test
AND my_table.time < t2.time
)
This statement uses three noteworthy constructs:
Subquery in DELETE
Self-join
Emulation of a MIN(...), via self-join
The subquery with EXISTS is mentioned above.
The self-join is required whenever one row of a table must be compared against other rows of the same table. Finding the minimum value of some column is exactly that.
Normally, you'd use the MIN(...) function to find the minimum. The minimum can be defined as the single value for which no lower value exists, and that's what we're using here because we're not actually interested in the actual value but only want to identify the record which contains that value.
(Since we're deleting, our SELECT yields all the non-minimum rows, which we want to delete to keep only the minimums.)
So, what the statement says is:
Delete all records from my_table for which there is at least one record in my_table with the same colour and the same test but a lower time.
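For comparison, the same delete written with an explicit MIN(...) (a sketch, untested, using the same placeholder table name as above) keeps only the row holding the minimum time per colour/test pair:
DELETE FROM my_table
WHERE time > (SELECT MIN(t2.time)
              FROM my_table t2
              WHERE t2.colour = my_table.colour
                AND t2.test = my_table.test);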

Efficiently counting strength of relationship between rows in Postgres

I have a table that looks similar to this:
session_id | sku
------------|-----
a | 1
a | 2
a | 3
a | 4
b | 2
b | 3
c | 3
I want to pivot this into a table similar to this:
sku1 | sku2 | score
------|------|------
1 | 2 | 1
1 | 3 | 1
1 | 4 | 1
2 | 3 | 2
2 | 4 | 1
3 | 4 | 1
The idea is to store a denormalised table that allows one to look up for a given sku, what other skus are related to sessions it has been related to, and how many times both skus are related to the same session.
What algorithms, patterns or strategies could you suggest for implementing this in PostgreSQL or other technologies?
I realise that this kind of lookup can be done on the original table using counts, or using a facetting search engine. However, I want to make the reads more performant, and just want to keep the overall statistics. The idea is that I will be performing this pivot regularly on the newest few thousand rows in the first table, then storing the result in the second. I'm only concerned with approximate statistics for the second table.
I've got some SQL that works, but VERY slowly. Also looking into the potential for using a graph database of some sort, but wanted to avoid adding another technology for a small part of the app.
Update: The SQL below seems performant enough. I can convert 1.2 million rows in the first table (tags) into 250k rows in the second table (product_relations) with around 2-3k variations of sku in about 5 minutes on my iMac. I will realistically be denormalising only up to 10k rows per day. Question is whether this is actually the best approach. Seems a little dirty to me.
BEGIN;
CREATE TEMPORARY TABLE working_tags(tag_id int, session_id varchar, sku varchar) ON COMMIT DROP;
INSERT INTO working_tags
SELECT id,
session_id,
sku
FROM tags
WHERE time < now() - interval '12 hours'
AND processed_product_relation IS NULL
AND sku IS NOT NULL LIMIT 200000;
CREATE TEMPORARY TABLE working_relations (sku1 varchar, sku2 varchar, score int) ON COMMIT DROP;
INSERT INTO working_relations
SELECT a.sku AS sku1,
b.sku AS sku2,
count(DISTINCT a.session_id) AS score
FROM working_tags AS a
INNER JOIN working_tags AS b ON a.session_id = b.session_id
AND a.sku < b.sku
WHERE a.sku IS NOT NULL
AND b.sku IS NOT NULL
GROUP BY a.sku,
b.sku;
UPDATE product_relations
SET score = working_relations.score+product_relations.score
FROM working_relations
WHERE working_relations.sku1 = product_relations.sku1
AND working_relations.sku2 = product_relations.sku2;
INSERT INTO product_relations (sku1, sku2, score)
SELECT working_relations.sku1,
working_relations.sku2,
working_relations.score
FROM working_relations
LEFT OUTER JOIN product_relations ON (working_relations.sku1 = product_relations.sku1
AND working_relations.sku2 = product_relations.sku2)
WHERE product_relations.sku1 IS NULL;
UPDATE tags
SET processed_product_relation = TRUE
WHERE id IN
(SELECT tag_id
FROM working_tags);
COMMIT;
If I've interpreted your intention correctly (per comments) this should do it:
SELECT
s1.sku AS sku1,
s2.sku AS sku2,
count(session_id)
FROM session s1
INNER JOIN session s2 USING (session_id)
WHERE s1.sku < s2.sku
GROUP BY s1.sku, s2.sku
ORDER BY 1,2;
See: http://sqlfiddle.com/#!15/2e0b2/1
In other words: Self-join session, then find all pairings of SKUs for each session ID, excluding ones where the left is greater than or equal to the right in order to avoid repeating pairings - if we have (1,2,count) we don't want (2,1,count) as well. Then group by the SKU pairings and count how many rows are found for each pairing.
You may want to count(distinct session_id) instead, if your SKU pairings can repeat and you want to exclude duplicates. There will probably be more efficient ways to do that, but that's the simplest.
An index on at least session_id will be very useful. You may also want to mess with planner cost parameters to make sure it chooses a good plan - in particular, make sure effective_cache_size is accurate and random_page_cost vs seq_page_cost reflects your caching and I/O costs. Finally, throw as much work_mem at it as you can afford.
If you're creating a materialized view, just CREATE UNLOGGED TABLE whatever AS SELECT ... . That way you minimise the number of writes/rewrites/overwrites.
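A minimal sketch of that approach, assuming the table from the query above and a hypothetical target table name:
CREATE UNLOGGED TABLE product_relations_snapshot AS  -- hypothetical name
SELECT
    s1.sku AS sku1,
    s2.sku AS sku2,
    count(DISTINCT session_id) AS score
FROM session s1
INNER JOIN session s2 USING (session_id)
WHERE s1.sku < s2.sku
GROUP BY s1.sku, s2.sku;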

Join two rows together if they share the same value?

I've sifted through views and other points and I've gotten to here. Take the example below:
Name      | Quantity | Billed
----------|----------|-------
PC Tablet | 0        | 100
PC Tablet | 100      | -2345
Monitor   | 9873     | 0
Keyboard  | 200      | -300
So basically, regarding the select I would do off this view: I would want it to bring in the data, but ordered by the Name first so it's in nice alphabetical order. Also, for a few reasons, some of the records appear more than once (I think the most is 4 times). If you add up the rows with duplicates, the true 'quantity' and 'billed' would be correct.
NOTE: The actual query is very long, but I broke it down to a simple example to explain the problem. The idea is the same, but there are A LOT MORE columns that need to be added together... So I'm looking for a query that would bring rows together if they contain the same name. I've tried a bunch of different queries with no success: either it rolls ALL the rows into one, or it won't work and I get a bunch of null errors / "name column is invalid in the select list/group by because it's not an aggregate function" errors.
Is this even possible?
Try:
SELECT A.Name, A.TotalQty, B.TotalBilled
FROM (
SELECT Name, SUM(Quantity) as TotalQty
FROM YourTableHere
GROUP BY Name
) A
INNER JOIN
(
SELECT Name, SUM(Billed) as TotalBilled
FROM YourTableHere
GROUP BY Name
) B
ON A.Name = B.Name
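Since both derived tables group on the same column, the same result can usually be produced in a single pass; a sketch (same placeholder table name as above), which also gives the alphabetical ordering asked for:
SELECT Name,
       SUM(Quantity) AS TotalQty,
       SUM(Billed) AS TotalBilled
FROM YourTableHere
GROUP BY Name
ORDER BY Name;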

Detecting Correlated Columns in Data

Suppose I have the following data:
OrderNumber | CustomerName | CustomerAddress | CustomerCode
1 | Chris | 1234 Test Drive | 123
2 | Chris | 1234 Test Drive | 123
How can I detect that the columns "CustomerName", "CustomerAddress", and "CustomerCode" all correlate perfectly? I'm thinking that Sql Server data mining is probably the right tool for the job, but I don't have too much experience with that.
Thanks in advance.
UPDATE:
By "correlate", I mean in the statistics sense, that whenever column a is x, column b will be y. In the above data, The last three columns correlate with each other, and the first column does not.
The input of the operation would be the name of the table, and the output would be something like :
Column 1 | Column 2 | Certainty
CustomerName | CustomerAddress | 100%
CustomerAddress | CustomerCode | 100%
There is a 'functional dependency' test built in to the SQL Server Data Profiling component (which is an SSIS component that ships with SQL Server 2008). It is described pretty well on this blog post:
http://blogs.conchango.com/jamiethomson/archive/2008/03/03/ssis-data-profiling-task-part-7-functional-dependency.aspx
I have played a little bit with accessing the data profiler output via some (under-documented) .NET APIs and it seems doable. However, since my requirement dealt with the distribution of column values, I ended up going with something much simpler based on the output of DBCC SHOW_STATISTICS. I was quite impressed by what I saw of the profiler component and the output viewer.
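For reference, that kind of statistics output can be pulled with a call like the following (a sketch; the table and statistics/index names are hypothetical placeholders):
DBCC SHOW_STATISTICS ('dbo.Orders', IX_Orders_CustomerName);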
What do you mean by correlate? Do you just want to see if they're equal? You can do that in T-SQL by joining the table to itself:
select distinct
case when a.OrderNumber < b.OrderNumber then a.OrderNumber
else b.OrderNumber
end as FirstOrderNumber,
case when a.OrderNumber < b.OrderNumber then b.OrderNumber
else a.OrderNumber
end as SecondOrderNumber
from
MyTable a
inner join MyTable b on
a.CustomerName = b.CustomerName
and a.CustomerAddress = b.CustomerAddress
and a.CustomerCode = b.CustomerCode
This would return you:
FirstOrderNumber | SecondOrderNumber
1 | 2
Correlation is defined on metric spaces, and your values are not metric.
This will give you the fraction of customers whose customerAddress is not uniquely defined by their customerName:
SELECT AVG(1.0 * perfect) -- 1.0 * avoids integer averaging
FROM (
SELECT
customerName,
CASE
WHEN COUNT(customerAddress) = COUNT(DISTINCT customerAddress)
THEN 0
ELSE 1
END AS perfect
FROM orders
GROUP BY
customerName
) q
Substitute other columns instead of customerAddress and customerName into this query to find discrepancies between them.
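For instance, to check whether customerCode is uniquely determined by customerName, the same query shape applies (a sketch, substituting customerCode for customerAddress):
SELECT AVG(1.0 * perfect)
FROM (
    SELECT
        customerName,
        CASE
            WHEN COUNT(customerCode) = COUNT(DISTINCT customerCode)
            THEN 0
            ELSE 1
        END AS perfect
    FROM orders
    GROUP BY customerName
) q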
