Subquery for calculated field giving "Invalid argument to function" error

I have a table with a list of stores and attributes that dictate the age of each store in weeks and its order volume. The second table lists the UPLH goals based on age and volume. I want to return the stores listed in the first table along with their associated UPLH goals. The following works correctly:
SELECT store, weeksOpen, totalItems,
    (
        SELECT max(UPLH)
        FROM uplhGoals AS b
        WHERE b.weeks <= a.weeksOpen AND 17000 BETWEEN b.vMin AND b.vMax
    ) AS UPLHGoal
FROM weekSpecificUPLH AS a
But this query, which replaces the hard-coded value with the totalItems field from the first table, gives me the "Invalid argument to function" error.
SELECT store, weeksOpen, totalItems,
    (
        SELECT max(UPLH)
        FROM uplhGoals AS b
        WHERE b.weeks <= a.weeksOpen AND a.totalItems BETWEEN b.vMin AND b.vMax
    ) AS UPLHGoal
FROM weekSpecificUPLH AS a
Any ideas why this doesn't work? Are there any other options? I could easily use DMax() and cycle through every record to create a new table, but that seems the long way around something a query should be able to produce.
SQLFiddle: http://sqlfiddle.com/#!9/e123a8/1
It appears that the SQLFiddle output (below) was what I was looking for, even though Access gives the error.
| store | weeksOpen | totalItems | UPLHGoal |
|-------|-----------|------------|----------|
| 1     | 15        | 13000      | 30       |
| 2     | 37        | 4000       | 20       |
| 3     | 60        | 10000      | 30       |
EDIT:
weekSpecificUPLH is a query, not a table. If I create a new test table in Access with identical fields, it works. This indicates to me that it has something to do with the [totalItems] field, which is actually a calculated result. So instead I replaced that field with [a.IPO * a.OPW]. Same error. It's as if it's not treating it as the correct type of number.
I've tried:
SELECT store, weeksOpen, (opw * ipo) AS totalItems,
    (
        SELECT max(UPLH)
        FROM uplhGoals AS b
        WHERE 17000 BETWEEN b.vMin AND b.vMax AND b.weeks <= a.weeksOpen
    ) AS UPLHGoal
FROM weekSpecificUPLH AS a
which works, but replace the '17000' with 'totalItems' and I get the same error. I even tried using Val(totalItems), to no avail.

Try turning it into:
b.vMin < a.totalItems AND b.vMax > a.totalItems
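A sketch of the full query with that comparison substituted (same names as in the question):
SELECT store, weeksOpen, totalItems,
    (
        SELECT max(UPLH)
        FROM uplhGoals AS b
        WHERE b.weeks <= a.weeksOpen
          AND b.vMin < a.totalItems AND b.vMax > a.totalItems
    ) AS UPLHGoal
FROM weekSpecificUPLH AS a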
Although, there are questions about your DB design.
For future questions, it would be very helpful if you revealed your DB structure.
For example, it seems the records in the weekSpecificUPLH table are not related to the records in the uplhGoals table, are they?
Or, more generally: these tables are not related in any way except by the rules described in the data itself in the goals table (which is "external" to the DB model).
Thus, when you call them "associated" you may have confused yourself and others, I presume, because everyone immediately starts thinking of the classical relation in terms of the relational model.

Something was changing the type of the totalItems value. To solve it, I:
1. Copied the weekSpecificUPLH query results to a new table, tempUPLH.
2. Used that table in place of the query, which correctly pulled the UPLHGoal from the uplhGoals table.
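In Access, that copy step can be done with a simple make-table query (a sketch, using the names above):
-- Materialize the query results into a table
SELECT * INTO tempUPLH FROM weekSpecificUPLH;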

Related

How to design a good database that keeps data sorted when inserting values in the middle or at the top

I have a lot of data, and I need to keep it sorted when a value is inserted in the middle or at the top.
For example, a table with an increasing sort column (group_order) is shown below.
code | group_id | group_order | depth
-----+----------+-------------+------
c1   | Group1   | 1           | 1
c6   | Group1   | 2           | 1     <- newly inserted
c2   | Group1   | 2           | 1
c3   | Group1   | 3           | 1
c4   | Group1   | 4           | 1
c5   | Group1   | 5           | 1
As the table above shows, I inserted a row (c6) with group_order = 2 into the second position and tried to increase the group_order of the rows below it (c2, c3, c4, c5) by 1.
Of course, it runs well, but as I said before, the update takes a lot of time because I have a lot of data.
When I insert data into the desired location, the values should be sorted in that order.
Please help me.
The database I use is PostgreSQL.
Thank you.
demo:db<>fiddle
I personally think it is never a good idea to manipulate the stored data for reasons such as sort order. The final order is a matter for the view, not for the model, so in my opinion you should calculate the order when you need it, not in general.
Another problem could be insert performance, which I am not quite sure about but which should be investigated: since you update all your data, the records will be locked by the transaction, and during that time you will not be able to insert another record. Even without a blocking transaction, you should think about race conditions (what happens if two records are inserted simultaneously?).
I would introduce a second sort column, such as an insert number, which stores the position at which the record was inserted. It could simply be a serial/auto-increment sequence.
Then you could simply order by group_order ASC followed by insert_nr DESC. Benefit: you don't need to manipulate your original data.
If you still need a correct order number, you could create a (materialized) view and add the order number using the row_number() window function.
Introducing an (auto-)incrementing column insert_nr:
CREATE TABLE mytable (
    code text,
    group_order int,
    insert_nr serial
);
Using it within a view:
CREATE VIEW v_myview AS
SELECT
    code, group_order,
    row_number() OVER (ORDER BY group_order, insert_nr DESC)
FROM mytable;
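A quick sketch of how this behaves, assuming the table and view above:
-- Insert c1..c3, then add c6 "between" c1 and c2 by reusing group_order = 2.
INSERT INTO mytable (code, group_order)
VALUES ('c1', 1), ('c2', 2), ('c3', 3);

INSERT INTO mytable (code, group_order) VALUES ('c6', 2);

-- The view breaks the group_order tie with insert_nr DESC, so the newest
-- row sorts first within its group_order: c1, c6, c2, c3.
SELECT * FROM v_myview;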

How to log or notify when a column is truncated using a LEFT()

As part of our OLAP modeling workflow, we often truncate fields because upstream data sources have no restrictions or defined data types. A field which should be a 10-character string can sometimes be 50 or 100 characters long if it is free-form user input. I've been told this can cause problems with downstream processes which involve uploads to external sources.
I've been asked to find a way to identify instances in which one or more of these fields is truncated.
How we handle these fields now is something like this:
SELECT
    LEFT(FreeResponseField, 10) AS Comment
INTO
    dbo.ModeledTable
FROM
    dbo.SourceTable
Essentially if the field is greater than 10 characters, who cares, we only take the first 10.
If dbo.SourceTable.FreeResponseField has a length greater than 10, we now want to know somehow (be it a warning/error message or an insertion into a log table). We have a lot of tables with a lot of fields, so the above example is a simplification. Identifying the field in which this occurs and/or the offending tuple would help us see where these issues happen.
Is something like this possible? You can't just compare the data types of the source table with the target table, as the source table sets everything to essentially VARCHAR(MAX). The naive approach is to check the length of every single value of every tuple against the defined length of the target column.
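For a single field, a minimal sketch of that naive check might read the target column's defined length from INFORMATION_SCHEMA.COLUMNS (table and column names as in the example above):
-- Sketch: report source values longer than the target column allows.
-- Assumes dbo.ModeledTable was created by the SELECT ... INTO above,
-- so the modeled column is named Comment.
SELECT s.FreeResponseField
FROM dbo.SourceTable AS s
WHERE LEN(s.FreeResponseField) >
      (SELECT c.CHARACTER_MAXIMUM_LENGTH
       FROM INFORMATION_SCHEMA.COLUMNS AS c
       WHERE c.TABLE_SCHEMA = 'dbo'
         AND c.TABLE_NAME = 'ModeledTable'
         AND c.COLUMN_NAME = 'Comment');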
The original specifications weren't descriptive, but I've figured out a solution and thought I'd share in case anyone stumbles across this for some reason.
Imagine we have a SourceTable which we are pulling into our model. We have defined zip codes as being of length 5 and addresses as being of length 25. Say we have the following two records:
CustomerID | ZipCode | Address
-----------+---------+----------------
1          | 90210   | 123 Fake Street
2          | 902106  | 546 Fake Street
Based on our model definitions, there is an error with ZipCode for the record where CustomerID equals 2. We would like to identify both ZipCode as being the problem field and the record where CustomerID equals 2. The following query with a CROSS APPLY does that:
WITH CTE AS (
    SELECT
        CustomerID,
        ZipCodeFlag = IIF(LEN(ZipCode) > 5, 1, 0),
        AddressFlag = IIF(LEN(Address) > 25, 1, 0),
        ZipCode,
        Address
    FROM
        SourceTable
)
SELECT
    CustomerID,
    TruncatedField,
    RawValue
FROM
    CTE
CROSS APPLY (
    VALUES ('ZipCode', ZipCodeFlag, ZipCode),
           ('Address', AddressFlag, Address)
) CA(TruncatedField, TruncatedFlag, RawValue)
WHERE
    TruncatedFlag = 1
ORDER BY
    CustomerID
With the following output:
CustomerID | TruncatedField | RawValue
-----------+----------------+---------
2          | ZipCode        | 902106

How to label result tables in multiple SELECT output

I wrote a simple dummy procedure to check the data that is saved in the database. When I run my procedure, it outputs the data as below.
I want to label the tables so that even a QA person can identify the data in the result. How can I do it?
**Update:** This procedure is run manually through Management Studio. It has nothing to do with my application; all I want to check is whether the data has been inserted/updated properly.
For better clarity, I want to show the table names above the tables as labels.
Add another column to the result, and name it so whoever reads it can distinguish the tables :)
SELECT 'Employee' AS TABLE_NAME, * FROM Employee
Output will look like this:
| TABLE_NAME | ID | Number | ...
--------------------------------
| Employee   | 1  | 123    | ...
Or you can call the column 'Employee':
SELECT 'Employee' AS 'Employee', * FROM employee
The output will look like this:
| Employee | ID | Number | ...
------------------------------
| Employee | 1  | 123    | ...
Add an extra column whose name (not value!) is the label.
SELECT 'Employee' AS "Employee", e.* FROM employee e
The output will look like this:
| Employee | ID | Number | ...
------------------------------
| Employee | 1  | 123    | ...
By doing so, you will see the label, even if the result does not contain rows.
I like to stick a whole extra result set that looks like a label or title between the result sets with real data.
SELECT 0 AS [Our Employees:]
WHERE 1 = 0
-- Your first "Employees" query goes here
SELECT 0 AS [Our Departments:]
WHERE 1 = 0
-- Now your second real "Departments" query goes here
-- ...and so on...
It ends up looking like a little empty result set whose only column header is the label, sitting above each real result set.
It's a bit looser-formatted, with more whitespace than I'd like, but it's the best I've come up with so far.
Unfortunately, there is no way to label SELECT query output in SQL Server or SSMS. I needed something very similar a few years ago, and we settled for a workaround:
Adding another result set which contains the list of table names.
Here is what we did: we prepended the data set with a table listing the tables that follow. So the first table looks as follows:
Name
Employee
Department
Courses
Class
Attendance
In C#, while reading the tables, you can iterate through this first result set and assign a TableName to each subsequent table in the DataSet.
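On the SQL side, the first result set can be produced with a simple sketch like this (table names taken from the list above):
-- First result set: the names of the tables that follow, in order.
SELECT 'Employee' AS Name
UNION ALL SELECT 'Department'
UNION ALL SELECT 'Courses'
UNION ALL SELECT 'Class'
UNION ALL SELECT 'Attendance';

-- Then the real result sets, in the same order.
SELECT * FROM Employee;
SELECT * FROM Department;
-- ...and so on...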
This is best done using Reporting Services and creating a simple report. You can then email this report daily if you wish.

Possible to query a database into Excel on a cell-by-cell basis? Or another solution?

I have various large views/stored procedures that basically churn out a lot of data into an Excel spreadsheet. There was a problem where some of the company amounts weren't flowing through. I narrowed it down to a piece of code in a stored procedure (note this is cut down for simplicity):
LEFT OUTER JOIN view_creditrating internalrating
    ON creditparty.creditparty = internalrating.company
LEFT OUTER JOIN (SELECT company, contract, SUM(amount) AS amount
                 FROM COMMON_OBJ.amount
                 WHERE status = 'Active'
                 GROUP BY company, contract) col
    ON vd.contract = col.contract
Table with issue:
company | contract | amount
--------+----------+--------
TVC     | NULL     | 1006
KS      | 10070    | -2345
NYC-G   | 10060    | 334000
NYC-G   | 100216   | 4000
UECR    | NULL     | 0
SP      | 10090    | 84356
Basically, some of the contracts are NULL, so when there is a LEFT OUTER JOIN on contract, the rows with NULL contracts drop out and don't flow through. So I decided to join on company instead.
This also causes problems, because a company appears in the table more than once in order to show different contracts. With this change the query becomes ambiguous: it won't know whether I want contract 10060's amount or contract 100216's amount, and more often than not it gives me the incorrect amount. I thought about leaving the final ON clause as company = company, which causes the fewest issues, and then somehow directly querying for each cell value that is inconsistent, since only a few cells are affected. I've searched, though, and I don't think that is possible.
Is it possible? Or is there another way to fix this on the database end?
As you've worked out, the problem is in the ON clause, and its use of NULL.
One way to alter the NULL to be a value you can match against is to use COALESCE, which would alter the clause to:
ON coalesce(vd.contract,'No Contract') = coalesce(col.contract,'No Contract')
This will turn all NULLs into 'No Contract', which changes the NULL = NULL test (which would return NULL) into 'No Contract' = 'No Contract', which returns True.
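Applied to the join from the question, that looks something like this (a sketch, keeping the original aliases):
LEFT OUTER JOIN (SELECT company, contract, SUM(amount) AS amount
                 FROM COMMON_OBJ.amount
                 WHERE status = 'Active'
                 GROUP BY company, contract) col
    ON COALESCE(vd.contract, 'No Contract') = COALESCE(col.contract, 'No Contract')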

SQL Server : separating data from one column into two based on other column data

I have SQL Server 2012 and want to create a view that does the following:
My table has:
MovType | Qty
--------+-----
In      | 200
Out     | 10
Now I want to create a view that goes through the whole table and, depending on whether the MovType is In or Out, assigns the quantity to a dedicated column.
So basically I need:
InQty | OutQty
------+-------
200   | 0
0     | 10
I know this can be done with CASE, but I'm not sure of the code. Secondly, speed is an important factor; is there any particular way to do this with the least overhead possible?
Thanks to all in advance!
Yes, you can use CASE, and it should be efficient. A CASE expression does not need extra reads from disk the way a JOIN would; it's a very simple construct that is unlikely to cause performance issues.
SELECT CASE WHEN MovType = 'In'  THEN Qty ELSE 0 END AS InQty,
       CASE WHEN MovType = 'Out' THEN Qty ELSE 0 END AS OutQty
FROM dbo.Table1
http://sqlfiddle.com/#!6/d507f/3/0
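Since the question asks for a view, the SELECT can be wrapped directly; a minimal sketch (the view name v_InOutQty is made up):
CREATE VIEW dbo.v_InOutQty AS
SELECT CASE WHEN MovType = 'In'  THEN Qty ELSE 0 END AS InQty,
       CASE WHEN MovType = 'Out' THEN Qty ELSE 0 END AS OutQty
FROM dbo.Table1;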
