Query help, not getting right rows - sql-server

Write a SELECT statement that returns InvoiceNumber, VendorName and InvoiceDate
from the Vendors and Invoices tables. Use the following correlation names for each table:
Vendors v
Invoices i
Filter the results to return only rows where a balance is due
I should be getting back 11 rows but I am getting 114 rows.
Select InvoiceNumber, VendorName, InvoiceDate
From Vendors as v
join Invoices as i on v.VendorId = i.VendorID
where InvoiceTotal > 0

The query above looks good.
If you get an unexpected number of rows, it means (as @Dale-K mentioned in a comment) that the condition is wrong. InvoiceTotal does not look like the amount left to pay.
Look for a column like InvoiceDue or InvoiceBalance.
Or there may be a column with a due date that should be added to the condition (to find overdue invoices).
Or there may be another table holding payments, which should be joined too.
Without the table structure and sample data we can only guess.
However, this looks like homework, so look for the column holding the current balance of the invoice and use it.
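As a rough sketch of the "balance due" idea, assume the Invoices table also carries PaymentTotal and CreditTotal columns. Both names are hypothetical here, since the question does not show the schema. The filter then compares what was invoiced against what has already been settled, instead of testing InvoiceTotal alone. The example uses Python's built-in sqlite3 module only so that it runs anywhere; the SELECT itself is plain SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Vendors (VendorID INTEGER PRIMARY KEY, VendorName TEXT);
CREATE TABLE Invoices (
    InvoiceNumber TEXT,
    VendorID      INTEGER,
    InvoiceDate   TEXT,
    InvoiceTotal  REAL,
    PaymentTotal  REAL,  -- hypothetical column, not confirmed by the question
    CreditTotal   REAL   -- hypothetical column, not confirmed by the question
);
INSERT INTO Vendors VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO Invoices VALUES
    ('A-1', 1, '2023-01-05', 100.0, 100.0,  0.0),  -- fully paid
    ('A-2', 1, '2023-01-09', 250.0, 200.0,  0.0),  -- 50 still due
    ('G-1', 2, '2023-01-11',  80.0,  50.0, 30.0);  -- payment + credit = total
""")

# Keep only invoices whose remaining balance is greater than zero.
balance_due = conn.execute("""
    SELECT i.InvoiceNumber, v.VendorName, i.InvoiceDate
    FROM Vendors AS v
    JOIN Invoices AS i ON v.VendorID = i.VendorID
    WHERE i.InvoiceTotal - i.PaymentTotal - i.CreditTotal > 0
""").fetchall()

print(balance_due)  # [('A-2', 'Acme', '2023-01-09')]
```

With `WHERE InvoiceTotal > 0` all three sample invoices would be returned, which mirrors the "too many rows" symptom in the question.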

Related

MSSQL, how compare records with dataframe table?

let's say we have a table with invoices and employees ID
and another table for vacation which also contains an employee ID.
In this case only employee1 had vacation in these date ranges.
Now we have to check whether the invoice date falls within, or exactly on the boundaries of, the date ranges in the vacation table.
The aim: if the employee is on vacation on the invoice date, calc_flag = 0; otherwise it is always 1. E.g. employee1 was on vacation from 2023/01/05 to 2023/01/12.
So all his invoices must have calc_flag = 0 for this period.
How do I write the SQL query for this in MSSQL?
Thanks for any advice.
I already tried to check the date ranges, but if there are several entries in the vacation table, I'm not sure how to handle it.
One way to do this is a LEFT JOIN to check for existence. You may need to change < and > to <= and >= if you would like to include the Begin and/or End date.
SELECT i.InvNo,
i.emp_ID,
i.InvDate,
i.calc_flag ,
CASE WHEN v.Emp_ID IS NOT NULL THEN 0 ELSE 1 END AS calc_flage
FROM dbo.invoices AS i
LEFT JOIN dbo.vacation AS v ON v.Emp_ID = i.emp_ID AND v.[Begin] < i.InvDate AND v.[End] > i.InvDate
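One caveat with the LEFT JOIN: if an employee has several vacation rows that overlap the same invoice date, the invoice is returned once per matching row. A possible variation, sketched below under the assumption that the boundaries should be inclusive, uses EXISTS so each invoice appears exactly once. The sample data is made up, and Python's sqlite3 module is used only to make the sketch runnable; the same CASE WHEN EXISTS pattern is valid in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (InvNo INTEGER, emp_ID INTEGER, InvDate TEXT);
CREATE TABLE vacation (Emp_ID INTEGER, "Begin" TEXT, "End" TEXT);
INSERT INTO invoices VALUES
    (1, 1, '2023-01-04'),
    (2, 1, '2023-01-05'),
    (3, 1, '2023-01-10'),
    (4, 2, '2023-01-10');
-- Two overlapping vacation rows for employee 1 (made-up sample data).
INSERT INTO vacation VALUES
    (1, '2023-01-05', '2023-01-12'),
    (1, '2023-01-08', '2023-01-15');
""")

# EXISTS yields one row per invoice, however many vacation rows match.
rows = conn.execute("""
    SELECT i.InvNo,
           CASE WHEN EXISTS (
                    SELECT 1
                    FROM vacation AS v
                    WHERE v.Emp_ID = i.emp_ID
                      AND i.InvDate BETWEEN v."Begin" AND v."End"
                ) THEN 0 ELSE 1 END AS calc_flag
    FROM invoices AS i
    ORDER BY i.InvNo
""").fetchall()

print(rows)  # [(1, 1), (2, 0), (3, 0), (4, 1)]
```

Invoice 3 matches both vacation rows but still appears only once, which is the case the question was unsure about.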

Add values to a table from a second table that only match a third table allowing for duplicates

I have been tasked with matching a payment file from a bank, which lists invoices/payments in a text file that I have imported into a table called Bank. I need to match the invoices to the project or projects associated with them. Call this table Invoices; it contains every invoice and project we have ever had. I want to write the matched invoices (from Bank) and their project or projects (from Invoices) to another table, called Report, so I can reconcile the payment file. I can get the correct results from Bank and Invoices with the following query:
SELECT invoice
FROM Invoices INV
INNER JOIN Bank as BANK
ON INV.Invoice = BANK.Invoice_Number
The Bank file has 100 invoices and I get 169 invoices from this query. But when I try to do an update or insert
Update Report
set Invoice_Num =
(SELECT invoice
FROM Invoices INV
INNER JOIN Bank as BANK
ON INV.Invoice = BANK.Invoice_Number)
I get 0 rows updated.
I have tried to copy the Bank table to the Report table with
Insert into Report(Invoice_Num)
select Invoice_Number from Bank
but can't figure out how to account for the projects that have duplicated invoices when they are found. Of course I might be going at this entirely wrong and someone has a better way entirely.
Thanks!
Does your Report table have anything in it to start with? If not, your UPDATE statement will update 0 rows, because there are 0 rows there to update. (Also note that, as written, the subquery returns many rows and is not correlated to any Report row, so depending on the database it would either raise an error or set every row to the same, indeterminate value; I don't think that's what you intend.)
If you just want to copy the invoice numbers from Bank to Report, but leave out any duplicates, then your final bit of SQL just needs a DISTINCT added to do that:
Insert into Report(Invoice_Num)
select DISTINCT Invoice_Number from Bank
If you're trying to put in only invoice numbers from Bank that also match Invoices, then your first bit of code almost works, but needs to be an INSERT, not an UPDATE:
INSERT INTO Report (Invoice_Num)
SELECT invoice
FROM Invoices INV
INNER JOIN Bank as BANK
ON INV.Invoice = BANK.Invoice_Number
Again, if there are duplicate invoice numbers that you only want in Report once, make that a SELECT DISTINCT to avoid them.
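To make the difference concrete, here is a small sketch of the INSERT ... SELECT approach that keeps one Report row per invoice/project pair. The Project columns on Invoices and Report are assumptions (the question does not list the columns), and the data is invented. It runs on Python's built-in sqlite3 module purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bank (Invoice_Number TEXT);
CREATE TABLE Invoices (Invoice TEXT, Project TEXT);     -- Project is an assumed column
CREATE TABLE Report (Invoice_Num TEXT, Project TEXT);   -- Project is an assumed column

INSERT INTO Bank VALUES ('INV1'), ('INV2'), ('INV9');
INSERT INTO Invoices VALUES
    ('INV1', 'P-A'), ('INV1', 'P-B'),  -- one invoice spread over two projects
    ('INV2', 'P-C'),
    ('INV3', 'P-D');                   -- not in the bank file

-- One Report row per matching invoice/project pair.
INSERT INTO Report (Invoice_Num, Project)
SELECT INV.Invoice, INV.Project
FROM Invoices AS INV
INNER JOIN Bank AS BANK ON INV.Invoice = BANK.Invoice_Number;
""")

report = conn.execute(
    "SELECT Invoice_Num, Project FROM Report ORDER BY Invoice_Num, Project"
).fetchall()
distinct_invoices = conn.execute(
    "SELECT DISTINCT Invoice_Num FROM Report ORDER BY Invoice_Num"
).fetchall()

print(report)             # [('INV1', 'P-A'), ('INV1', 'P-B'), ('INV2', 'P-C')]
print(distinct_invoices)  # [('INV1',), ('INV2',)]
```

Carrying the project into Report explains why the join returns more rows than the bank file contains, while DISTINCT on the invoice number alone collapses them again.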

GROUP BY or Aggregation Function error message [duplicate]

I got an error -
Column 'Employee.EmpID' is invalid in the select list because it is
not contained in either an aggregate function or the GROUP BY clause.
select loc.LocationID, emp.EmpID
from Employee as emp full join Location as loc
on emp.LocationID = loc.LocationID
group by loc.LocationID
This situation fits into the answer given by Bill Karwin.
correction for above, fits into answer by ExactaBox -
select loc.LocationID, count(emp.EmpID) -- not count(*), don't want to count nulls
from Employee as emp full join Location as loc
on emp.LocationID = loc.LocationID
group by loc.LocationID
ORIGINAL QUESTION -
For the SQL query -
select *
from Employee as emp full join Location as loc
on emp.LocationID = loc.LocationID
group by (loc.LocationID)
I don't understand why I get this error. All I want to do is join the tables and then group all the employees in a particular location together.
I think I have a partial explanation for my own question. Tell me if it's OK -
To group all employees that work in the same location, we first have to mention the LocationID.
Then, we cannot/do not mention each employee ID next to it. Rather, we mention the total number of employees in that location, i.e. we should SUM() the employees working in that location. Why we do it the latter way, I am not sure.
So, this explains the "it is not contained in either an aggregate function" part of the error.
What is the explanation for the GROUP BY clause part of the error ?
Suppose I have the following table T:
a b
--------
1 abc
1 def
1 ghi
2 jkl
2 mno
2 pqr
And I do the following query:
SELECT a, b
FROM T
GROUP BY a
The output should have two rows, one row where a=1 and a second row where a=2.
But what should the value of b show on each of these two rows? There are three possibilities in each case, and nothing in the query makes it clear which value to choose for b in each group. It's ambiguous.
This demonstrates the single-value rule, which prohibits the undefined results you get when you run a GROUP BY query, and you include any columns in the select-list that are neither part of the grouping criteria, nor appear in aggregate functions (SUM, MIN, MAX, etc.).
Fixing it might look like this:
SELECT a, MAX(b) AS x
FROM T
GROUP BY a
Now it's clear that you want the following result:
a x
--------
1 ghi
2 pqr
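The fixed query can be tried out with a few lines of Python and the built-in sqlite3 module. This is only a sketch of the example above; the extra COUNT(*) column is added here to show a second aggregate. Note that SQLite itself tolerates bare columns in a GROUP BY query, so the sketch runs only the corrected form rather than trying to reproduce the error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (a INTEGER, b TEXT);
INSERT INTO T VALUES
    (1, 'abc'), (1, 'def'), (1, 'ghi'),
    (2, 'jkl'), (2, 'mno'), (2, 'pqr');
""")

# Every selected column is either grouped (a) or aggregated (MAX, COUNT),
# so each group produces exactly one well-defined row.
result = conn.execute("""
    SELECT a, MAX(b) AS x, COUNT(*) AS n
    FROM T
    GROUP BY a
    ORDER BY a
""").fetchall()

print(result)  # [(1, 'ghi', 3), (2, 'pqr', 3)]
```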
Your query will work in MySQL if the ONLY_FULL_GROUP_BY SQL mode is disabled (it was disabled by default before MySQL 5.7.5 and has been enabled by default since then). But in this case, you are using a different RDBMS. So to make your query work, add all non-aggregated columns to your GROUP BY clause, e.g.
SELECT col1, col2, SUM(col3) totalSUM
FROM tableName
GROUP BY col1, col2
A non-aggregated column is one that is not passed into an aggregate function such as SUM, MAX, COUNT, etc.
Basically, what this error is saying is that if you use a GROUP BY clause, your result is a relation/table with one row per group. So in your SELECT list you can only select the columns you are grouping by, plus aggregate functions applied to the other columns, because the other columns have no single value per group in the resulting table.
"All I want to do is join the tables and then group all the employees
in a particular location together."
It sounds like what you want is for the output of the SQL statement to list every employee in the company, but first all the people in the Anaheim office, then the people in the Buffalo office, then the people in the Cleveland office (A, B, C, get it, obviously I don't know what locations you have).
In that case, drop the GROUP BY clause. All you need is ORDER BY loc.LocationID

CakePHP - How to use calculations from associated data in query

I'm trying to figure out the best way to do something - basically I'm looking for advice before I do it the long/hard way!
I have the following model associations:
Seller hasMany Invoices
Invoice hasOne Supplier
Supplier belongsTo SupplierType
Each invoice is for a certain amount and is from a certain date. I want to be able to retrieve Sellers who have spent within a certain amount in the past 'full' month for which we have data. So, I need to get the date 1 month before the most recent invoice, find the total on all invoices for that Seller since that date, and then retrieve only those where the total lies between, say, £10000 and £27000 (or whatever range the user has set).
Secondly, I want to be able to do the same thing, but with the SupplierType included. So, the user may say that they want Sellers who have spent between £1000 & £5000 from Equipment Suppliers, and between £1000 & £7000 from Meat Suppliers.
My plan here is to do an initial search for the appropriate supplier type id, and then I can filter the invoices based on whether each one is from a supplier of an appropriate type.
I'm mainly not sure whether there is a way to work out the monthly total and then filter on it in one step. Or am I going to have to do it in several steps? I looked at Virtual Fields, but I don't think they do what I need - they seem to be mainly used to combine fields from the same record - is that correct?
(Posted on behalf of the question author).
I'm posting the eventual solution here in case it helps anyone else:
SELECT seller_id FROM
(SELECT i.seller_id, SUM(i.price_paid) AS totalamount FROM invoices i
JOIN
(SELECT seller_id, MAX(invoice_date) AS maxdate FROM invoices GROUP BY seller_id) sm
ON i.seller_id = sm.seller_id
WHERE i.invoice_date > (sm.maxdate - 30) GROUP BY i.seller_id) t
WHERE t.totalamount BETWEEN 0 AND 1000
This can be done in a single query that will look something like:
select * from (
select i.seller, sum(i.amount) as totalamount
from invoices i join
(select seller, max(invoicedate) as maxdate from invoices group by seller) sm
on i.seller=sm.seller
and i.invoicedate>(sm.maxdate-30)
group by i.seller
) t where t.totalamount between 1000 and 50000
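Below is a runnable sketch of the same idea on made-up data, using Python's built-in sqlite3 module. Date arithmetic differs between databases, so `maxdate - 30` is replaced here with SQLite's `date(maxdate, '-30 days')`; that substitution, the sample rows and the bound values are assumptions for illustration, not part of the original solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (seller_id INTEGER, invoice_date TEXT, price_paid REAL);
INSERT INTO invoices VALUES
    (1, '2023-03-31',  600.0),
    (1, '2023-03-10',  500.0),
    (1, '2023-01-15', 9000.0),  -- outside seller 1's last 30 days
    (2, '2023-03-20',  200.0),
    (2, '2023-03-01',  100.0);
""")

low, high = 1000, 50000  # user-supplied range (example values)

# Inner query sm: latest invoice date per seller.
# Middle query t: total spend per seller in the 30 days before that date.
# Outer query: keep only sellers whose total lies in the requested range.
sellers = conn.execute("""
    SELECT t.seller_id, t.totalamount
    FROM (
        SELECT i.seller_id, SUM(i.price_paid) AS totalamount
        FROM invoices AS i
        JOIN (
            SELECT seller_id, MAX(invoice_date) AS maxdate
            FROM invoices
            GROUP BY seller_id
        ) AS sm ON i.seller_id = sm.seller_id
        WHERE i.invoice_date > date(sm.maxdate, '-30 days')
        GROUP BY i.seller_id
    ) AS t
    WHERE t.totalamount BETWEEN ? AND ?
    ORDER BY t.seller_id
""", (low, high)).fetchall()

print(sellers)  # [(1, 1100.0)]
```

Seller 1's old 9000.0 invoice is ignored because it falls outside the 30-day window, and seller 2's total of 300.0 is below the range, so only seller 1 is returned.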

How to fetch an object graph at once?

I'm reading a book in which the author talks about fetching a row plus all linked parent rows in one step, like fetching an order and all its items at once. Okay, sounds nice, but really: I've never seen a way in SQL to ask for, let's say, one order + 100 items. What would this record set look like? Would I get 101 rows with merged fields of both the order and the item table, where 100 rows have a lot of NULL values for the order fields, while one row has a lot of NULL values for the item fields? Is that the way to go? Or is there something much cooler? I mean... I've never heard of fetching arrays into a field.
A simple JOIN would do the trick:
SELECT o.*
, i.*
FROM orders o
INNER JOIN order_items i
ON o.id = i.order_id
This will return one row for each row in order_items. The returned rows consist of all fields from the orders table and, concatenated to that, all fields from the order_items table (quite literally, the records from the tables are joined, that is, they are combined by record concatenation).
So if orders has (id, order_date, customer_id) and order_items has (order_id, product_id, price) the result of the statement above will consist of records with (id, order_date, customer_id, order_id, product_id, price)
One thing you need to be aware of is that this approach breaks down whenever there are two distinct 'detail' tables for one 'master'. Let me explain.
In the orders/order_items example, orders is the master and order_items is the detail: each row in order_items belongs to, or is dependent on exactly one row in orders. The reverse is not true: one row in the orders table can have zero or more related rows in the order_items table. The join condition
ON o.id = i.order_id
ensures that only related rows are combined and returned (leaving out the condition would return all possible combinations of rows from the two tables, assuming the database allows you to omit the join condition).
Now, suppose you have one master with two details, for example, customers as master, customer_orders as detail 1 and customer_phone_numbers as detail 2. Suppose you want to retrieve a particular customer along with all its orders and all its phone numbers. You might be tempted to write:
SELECT c.*, o.*, p.*
FROM customers c
INNER JOIN customer_orders o
ON c.id = o.customer_id
INNER JOIN customer_phone_numbers p
ON c.id = p.customer_id
This is valid SQL, and it will execute (assuming the tables and column names are in place).
But the problem is that it will give you a rubbish result. Assuming you have one customer with two orders (1, 2) and two phone numbers (A, B), you get these records:
customer-data | order 1 | phone A
customer-data | order 2 | phone A
customer-data | order 1 | phone B
customer-data | order 2 | phone B
This is rubbish, as it suggests there is some relationship between order 1 and phone numbers A and B and order 2 and phone numbers A and B.
What's worse is that these results can completely explode in numbers of records, much to the detriment of database performance.
So, JOIN is excellent for "flattening" a hierarchy of items of known depth (customer -> orders -> order_items) into one big table which only duplicates the master items for each detail item. But it is awful for extracting a true graph of related items. This is a direct consequence of the way SQL is designed: it can only output normalized tables without repeating groups. This is why object-relational mappers exist, to allow object definitions that can have multiple dependent collections of subordinate objects to be stored and retrieved from a relational database without losing your sanity as a programmer.
This is normally done through a JOIN clause. This will not result in many NULL values, but many repeated values for the parent row.
Another option, if your database and programming language support it, is to return both result sets over one connection: one select for the parent row, another for the related rows.
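Both points can be seen in a small sketch using Python's built-in sqlite3 module. The table layouts and the `load_customer` helper are hypothetical, chosen only to mirror the customer / orders / phone numbers example: the double join multiplies rows, whereas one query per detail table rebuilds the graph cleanly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer_orders (id INTEGER, customer_id INTEGER);
CREATE TABLE customer_phone_numbers (phone TEXT, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Ann');
INSERT INTO customer_orders VALUES (1, 1), (2, 1);
INSERT INTO customer_phone_numbers VALUES ('A', 1), ('B', 1);
""")

# Joining both detail tables pairs every order with every phone number.
joined = conn.execute("""
    SELECT c.name, o.id, p.phone
    FROM customers AS c
    INNER JOIN customer_orders AS o ON c.id = o.customer_id
    INNER JOIN customer_phone_numbers AS p ON c.id = p.customer_id
""").fetchall()


def load_customer(conn, customer_id):
    """Hypothetical helper: one query per table, assembled into one object."""
    name = conn.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()[0]
    orders = [row[0] for row in conn.execute(
        "SELECT id FROM customer_orders WHERE customer_id = ? ORDER BY id",
        (customer_id,),
    )]
    phones = [row[0] for row in conn.execute(
        "SELECT phone FROM customer_phone_numbers WHERE customer_id = ? ORDER BY phone",
        (customer_id,),
    )]
    return {"name": name, "orders": orders, "phones": phones}


customer = load_customer(conn, 1)
print(len(joined))  # 4 rows for 2 orders and 2 phone numbers
print(customer)     # {'name': 'Ann', 'orders': [1, 2], 'phones': ['A', 'B']}
```

This is roughly what an object-relational mapper does behind the scenes when it loads several collections for one parent object.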
