Let's say I have items A, B, and C in Table1.
They all have attribute f1. However, A and B have f2, which does not apply to C.
Table1 would be designed as:
itemName   f1    f2
------------------------------------
A          100   50
A          43    90
B          66    10
C          23
There would be another table, Table2, which contains all the possible values of f2:
itemName   f2 (possible value)
------------------------------------
A          50
A          90
A          77
B          10
Let's say I now want to add a record with the highest value of f2 into Table1, depending on the itemName. Things work fine for A and B. But in the case of C, when I loop through Table2, since there is no record of C in Table2, I cannot distinguish whether the table is corrupted or whether C simply does not have attribute f2.
The only two ways I can think of to solve this issue are:
1. Adding a constraint in the code, like:
if (itemName == C)
    do not search Table2
else
    search Table2
    if (no record found)
        return "Corrupted Table"
Or
2. Adding another bool field "has_f2" in Table1 to help identify that f2 does not apply to C.
The above is just an example of the question of where to put such business logic constraints: in the DB or in the code.
Can you give me more opinions on the tradeoff between these two ideologies? In other words, which one makes more sense?
Since this is basically field validation ("can MyModel have property f2 set to NULL (nonexistent)?"), I would say you must do that in a validator of your model.
Only if that is impossible, add some columns to model tables.
The rule I use is the following: the database is used to store model data. You should try to store nothing else except data, if possible. In your case, has_f2 is not data, but a business rule.
Of course, there are exceptions to this rule. For example, sometimes business logic must be controlled by the user and in this case it is perfectly ok to store it in the database.
Regarding your second proposal: you can typically also just query for a NULL value in the table, which would be the same as adding and setting a boolean attribute (and better, considering redundancy). This would also be the way to detect whether the table is "corrupt". However, you can also start your query by collecting all "itemName" entries from Table2, building an intersection with Table1, and inserting the cases of interest into Table1:
1.) Intersect the "itemName" entries from Table1 and Table2 => Table3
2.) Join Table3 and Table2 on "itemName", "f2" => insert each tuple into Table1
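A minimal sketch of those two steps in SQL, assuming the example's names (Table1, Table2, itemName, f2), taking the highest f2 per item and leaving f1 NULL in the inserted rows:
-- Hypothetical sketch: insert the highest f2 per itemName from Table2,
-- restricted to items that actually appear in Table1 (the intersection).
INSERT INTO Table1 (itemName, f2)
SELECT t2.itemName, MAX(t2.f2)
FROM Table2 AS t2
WHERE t2.itemName IN (SELECT itemName FROM Table1)  -- step 1: intersect
GROUP BY t2.itemName;                               -- step 2: one row per item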
Alternatively, you can also split Table1 into two tables, { "itemName", "f1" } and { "itemName", "f2" }, which would eliminate your problem.
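For illustration, a minimal DDL sketch of that split (table names assumed). An item without f2 simply has no row in the second table, so a missing row is by design rather than a sign of corruption:
-- Hypothetical sketch: split Table1 so f2 only exists where it applies.
CREATE TABLE Table1_f1 (itemName VARCHAR(10), f1 INT);
CREATE TABLE Table1_f2 (itemName VARCHAR(10), f2 INT);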
Related
I need advice on the following topic:
I am developing a DW/BI solution in SQL Server and reports are published in Power BI.
The main part of my question starts here: I have a large table which collects measurement data on products, for multiple attributes. Products can be of multiple types, recognisable by the item number in this table; measurements can be done multiple times and are identified by measurement date. Usually we refer to the latest dates. If it makes things complicated, I can filter the data to the latest dates only. This is a dense table (multi-million rows), with an attribute count of about 200.
I want to include specifications for these attributes, most likely in a dimension table, and there may be tens of such specifications. The intention is that the user shall select any one specification name in the report, and he would like to see each product with attributes passing/failing, as well as the products that pass on all of the specification's attributes.
I currently have this measurement table and a dim table with test names; I can add a table for specification if needed. A specification can define a few or all test names, with lower/upper spec limits:
Sample measurement table: (table not shown)
Sample dim table for test names: (table not shown)
I can add a table for specification as below, and the user will select any one of them: (table not shown)
e.g. with ID_spec = 1 selected, the measurement table may look like: (table not shown)
Some specs may contain all attributes and some only a few.
Please suggest a strategy for designing a spec table that stays efficient at such large table sizes. Please let me know if any further details are needed.
Later, I will have to do further work to calculate the % of passing products, counting only products that have been tested for all tests required by the selected specification.
For large tables, the best thing to do is choose the right key. That means dumping the "Id" column (nothing more than a row identifier) and replacing it with something that:
Guarantees uniqueness
Facilitates searches
That often means composite keys, which are fine.
It also means dumping the whole "fact/dimension" mindset and just focusing on the relations. This is also fine.
Based on your description, this is a first draft of a data model for your warehouse (the IDEF1X diagram is not reproduced here; if you are unfamiliar with IDEF1X diagrams, it is worth reading up on them).
I've added a unique constraint to SpecCd so you could specify the value directly instead of having to check both the ProductId and SpecCd to return a result.
ProductTest exists so you can provide integrity for ProductTestCriteria and ensure tests are limited to only those products that can be measured by them. If all products are subject to all tests, this can be removed and Test can relate directly to ProductMeasurement and ProductTestCriteria.
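For reference, a minimal DDL sketch of the two tables the queries below touch (column types are assumptions; per the model described above, Product, Test, Spec, and ProductTest would exist alongside them):
-- Hypothetical sketch: composite natural keys, no surrogate "Id" column.
CREATE TABLE ProductMeasurement (
    ProductId varchar(20)   NOT NULL,
    TestCd    varchar(20)   NOT NULL,
    TestDt    date          NOT NULL,
    Value     decimal(18,4) NOT NULL,
    CONSTRAINT PK_ProductMeasurement PRIMARY KEY (ProductId, TestCd, TestDt)
);
CREATE TABLE ProductTestCriteria (
    ProductId  varchar(20)   NOT NULL,
    TestCd     varchar(20)   NOT NULL,
    SpecCd     varchar(20)   NOT NULL,
    LowerValue decimal(18,4) NOT NULL,
    UpperValue decimal(18,4) NOT NULL,
    CONSTRAINT PK_ProductTestCriteria PRIMARY KEY (ProductId, TestCd, SpecCd)
);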
If you want to subject the latest test of "Product A" to "Spec S" your query would look like:
SELECT
Measurement.ProductId
,Measurement.TestCd
,Measurement.TestDt
,Criteria.SpecCd
,Measurement.Value
,CASE
WHEN Measurement.Value BETWEEN Criteria.LowerValue AND Criteria.UpperValue THEN 'Pass'
ELSE 'Fail'
END AS Result
FROM
ProductMeasurement Measurement
INNER JOIN
ProductTestCriteria Criteria
ON Criteria.ProductId = Measurement.ProductId
AND Criteria.TestCd = Measurement.TestCd
WHERE
Measurement.ProductId = 'A'
AND Criteria.SpecCd = 'S'
AND Measurement.TestDt =
(
SELECT
MAX(TestDt)
FROM
ProductMeasurement
WHERE
ProductId = Measurement.ProductId
)
You could remove the filters for ProductId and SpecCd and roll the query into a view - users could then specify the products and specifications they want.
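A sketch of what that view could look like (hypothetical name; the body is the query above minus the two filters):
CREATE VIEW LatestTestResult AS
SELECT
    Measurement.ProductId
    ,Measurement.TestCd
    ,Measurement.TestDt
    ,Criteria.SpecCd
    ,Measurement.Value
    ,CASE
        WHEN Measurement.Value BETWEEN Criteria.LowerValue AND Criteria.UpperValue THEN 'Pass'
        ELSE 'Fail'
     END AS Result
FROM ProductMeasurement Measurement
INNER JOIN ProductTestCriteria Criteria
    ON Criteria.ProductId = Measurement.ProductId
    AND Criteria.TestCd = Measurement.TestCd
WHERE Measurement.TestDt =
(
    SELECT MAX(TestDt)
    FROM ProductMeasurement
    WHERE ProductId = Measurement.ProductId
);
Users would then simply run SELECT * FROM LatestTestResult WHERE ProductId = 'A' AND SpecCd = 'S'.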
If you want results as of a given date, the query is easily modified as follows, or incorporated into a TVF:
SELECT
Measurement.ProductId
,Measurement.TestCd
,Measurement.TestDt
,Criteria.SpecCd
,Measurement.Value
,CASE
WHEN Measurement.Value BETWEEN Criteria.LowerValue AND Criteria.UpperValue THEN 'Pass'
ELSE 'Fail'
END AS Result
FROM
ProductMeasurement Measurement
INNER JOIN
ProductTestCriteria Criteria
ON Criteria.ProductId = Measurement.ProductId
AND Criteria.TestCd = Measurement.TestCd
WHERE
Measurement.ProductId = 'A'
AND Criteria.SpecCd = 'S'
AND Measurement.TestDt =
(
SELECT
MAX(TestDt)
FROM
ProductMeasurement
WHERE
ProductId = Measurement.ProductId
AND TestDt <= <Your Date>
)
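And a sketch of the TVF variant (hypothetical name), so the as-of date becomes a parameter:
-- Hypothetical inline table-valued function: results as of @AsOfDate.
CREATE FUNCTION dbo.TestResultAsOf (@AsOfDate date)
RETURNS TABLE
AS
RETURN
(
    SELECT
        Measurement.ProductId
        ,Measurement.TestCd
        ,Measurement.TestDt
        ,Criteria.SpecCd
        ,Measurement.Value
        ,CASE
            WHEN Measurement.Value BETWEEN Criteria.LowerValue AND Criteria.UpperValue THEN 'Pass'
            ELSE 'Fail'
         END AS Result
    FROM ProductMeasurement Measurement
    INNER JOIN ProductTestCriteria Criteria
        ON Criteria.ProductId = Measurement.ProductId
        AND Criteria.TestCd = Measurement.TestCd
    WHERE Measurement.TestDt =
    (
        SELECT MAX(TestDt)
        FROM ProductMeasurement
        WHERE ProductId = Measurement.ProductId
            AND TestDt <= @AsOfDate
    )
);
Usage would then be along the lines of SELECT * FROM dbo.TestResultAsOf(<Your Date>) WHERE ProductId = 'A' AND SpecCd = 'S'.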
Related
Summary: I need any combination of [Field_1] and [Field_2] to be unique, and for that uniqueness to be enforced. Note: this is about unordered combinations, not permutations - and that's the difficulty.
In Depth:
I'm trying to track contacts for vendor software. I've set my DB up in the time-honoured fashion such that a Vendor record may have many contacts. The trick is that contacts may be related to each other and may not be related to the parent vendor record. An example:
1. SuperBrokenSoftware is a tool whose vendor I need to contact all the time.
2. WeMakeBadSoftware is the vendor.
3. Fred works for WeMakeBadSoftware.
4. Gale works for WeHelpPeopleWhenOthersWont.
Let's say Gale is the appropriate contact to fix my issue with SuperBrokenSoftware.
There is no way, using the current hierarchy, to track Gale's relationship to SuperBrokenSoftware.
My solution is to keep track of these relationships in a table like so:
Field1   Field2   Field3
Fred     Gale     Gale handles specific issues for Fred
However, given this solution, Field_1 and Field_2 must be unique in combination. That is to say, the records:
Field1   Field2   Field3
Fred     Gale     "Gale handles specific issues for Fred"
Gale     Fred     "Gale is awesome - Fred sucks"
should be viewed as the same. Record 2 should not be allowed in the database because it is not unique.
What I have Tried:
Using the bijective Szudzik pairing function: a >= b ? a * a + a + b : a + b * b; where a, b >= 0,
I can calculate a unique identifier for every combination - but Access cannot enforce uniqueness on a calculated field.
What is the best way to enforce a combination in Access?
Thanks in advance!!!
Create a new field for the unique identifier with a unique index, and create a Before Change data macro which inserts/updates the calculated identifier in the new field.
The unique key can be just the sorted concatenation of Field1 and Field2.
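A minimal sketch of the idea in SQL Server terms (table and column names assumed; in Access itself, the Before Change data macro would populate PairKey with the same expression, since Access cannot index a calculated field directly):
-- Hypothetical sketch: a persisted sorted-pair key makes (Fred, Gale)
-- and (Gale, Fred) collide, so the unique index rejects the duplicate.
ALTER TABLE ContactRelation ADD PairKey AS
    (CASE WHEN Field1 <= Field2
          THEN Field1 + '|' + Field2
          ELSE Field2 + '|' + Field1 END) PERSISTED;
CREATE UNIQUE INDEX UX_ContactRelation_PairKey ON ContactRelation (PairKey);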
Related
Let's say I have a Product table in a shopping site's database that keeps the description, price, etc. of the store's products. What is the most efficient way to let my client re-order these products?
I created an Order column (integer) to use for sorting records, but that gives me some headaches regarding performance, due to the primitive method I use: changing the order of every record after the one I actually need to change. An example:
Id    Order
5     3
8     1
26    2
32    5
120   4
Now what can I do to change the order of the record with ID=26 to 3?
What I did was create a procedure which checks whether there is a record in the target order (3) and updates the order of the row (ID=26) if not. If there is a record in the target order, the procedure executes itself, passing that row's ID with target order + 1 as parameters.
That causes an update of every single record after the one I want to change, to make room:
Id    Order
5     4
8     1
26    3
32    6
120   5
So what would a smarter person do?
I use SQL Server 2008 R2.
Edit:
I need the Order column of an item to be enough for sorting, with no secondary keys involved. The Order column alone must specify a unique place for its record.
In addition to all this, I wonder if I could implement something like a linked list: a 'Next' column instead of an 'Order' column, holding the next item's ID. But I have no idea how to write the query that retrieves the records in the correct order. If anyone has an idea about this approach as well, please share.
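For what it's worth, a minimal sketch of how such a linked list could be read back in order with a recursive CTE (available in SQL Server 2008 R2; the NextId column name is an assumption):
-- Hypothetical sketch: walk the NextId pointers from the head of the list.
;WITH Ordered AS (
    SELECT Id, NextId, 1 AS Position
    FROM Product
    WHERE Id NOT IN (SELECT NextId FROM Product WHERE NextId IS NOT NULL)  -- the head: no row points to it
    UNION ALL
    SELECT p.Id, p.NextId, o.Position + 1
    FROM Product p
    INNER JOIN Ordered o ON p.Id = o.NextId
)
SELECT Id, Position
FROM Ordered
ORDER BY Position
OPTION (MAXRECURSION 0);  -- allow lists longer than the default 100 rows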
UPDATE Product SET [Order] = [Order] + 1 WHERE [Order] >= @NewOrderVal
Though over time you'll get larger and larger "spaces" in your order, it will still "sort".
This will add 1 to the value being changed and to every value after it in one statement, but the above caveat still holds: larger and larger "spaces" will form in your order, possibly to the point of exceeding the range of an INT.
Alternate solution, given the desire for no spaces:
Imagine a procedure UpdateSortOrder with parameters @NewOrderVal, @IDToChange, @OriginalOrderVal.
It is a two-step process, depending on whether the new order moves the row up or down the sort:
IF @NewOrderVal < @OriginalOrderVal -- moving down the chain
BEGIN
    -- Create space for the movement; no point in changing the original
    UPDATE Product SET [Order] = [Order] + 1
    WHERE [Order] BETWEEN @NewOrderVal AND @OriginalOrderVal - 1;
END
IF @NewOrderVal > @OriginalOrderVal -- moving up the chain
BEGIN
    -- Create space for the movement; no point in changing the original
    UPDATE Product SET [Order] = [Order] - 1
    WHERE [Order] BETWEEN @OriginalOrderVal + 1 AND @NewOrderVal;
END
-- Finally update the one we moved to the correct value
UPDATE Product SET [Order] = @NewOrderVal WHERE Id = @IDToChange;
Regarding best practice: most environments I've been in typically want something grouped by category and sorted alphabetically or based on "popularity on sale", thus negating the need to provide a user-defined sort.
Use the old trick that BASIC programs (amongst other places) used: jump the numbers in the order column by 10 or some other convenient increment. You can then insert a single row (indeed, up to 9 rows, if you're lucky) between two existing numbers (that are 10 apart). Or you can move row 370 to 565 without having to change any of the rows from 570 upwards.
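A minimal sketch of the periodic renumbering this implies (table and column names follow the question; re-run it whenever the gaps close up):
-- Hypothetical sketch: rewrite the order column in steps of 10 so future
-- moves and inserts can land in the gaps without touching other rows.
;WITH Numbered AS (
    SELECT Id, ROW_NUMBER() OVER (ORDER BY [Order]) AS rn
    FROM Product
)
UPDATE p
SET p.[Order] = n.rn * 10
FROM Product p
INNER JOIN Numbered n ON p.Id = n.Id;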
Here is an alternative approach using a common table expression (CTE).
This approach respects a unique index on the SortOrder column, and will close any gaps in the sort order sequence that may have been left over from earlier DELETE operations.
/* For example, move Product with Id = 26 into position 3 */
DECLARE @id int = 26;
DECLARE @sortOrder int = 3;

;WITH Sorted AS (
    SELECT Id,
           ROW_NUMBER() OVER (ORDER BY SortOrder) AS RowNumber
    FROM Product
    WHERE Id <> @id
)
UPDATE p
SET p.SortOrder =
    (CASE
        WHEN p.Id = @id THEN @sortOrder
        WHEN s.RowNumber >= @sortOrder THEN s.RowNumber + 1
        ELSE s.RowNumber
     END)
FROM Product p
LEFT JOIN Sorted s ON p.Id = s.Id;
It is very simple: you need to have a "cardinality hole".
Structure: you need to have 2 columns:
pk = 32-bit int
order = 64-bit bigint (BIGINT, NOT DOUBLE!!!)
Insert/Update:
When you insert the first new record, you must set order = round(max_bigint / 2).
If you insert at the beginning of the table, you must set order = round(order of first record / 2).
If you insert at the end of the table, you must set order = round((max_bigint + order of last record) / 2).
If you insert in the middle, you must set order = round((order of record before + order of record after) / 2).
This method has very high cardinality. If you get a constraint error, or if you think the cardinality has become too small, you can rebuild (normalize) the order column.
Even in the worst case, after normalization, this structure still leaves you a "cardinality hole" of 32 bits.
It is very simple and fast!
Remember: NO DOUBLE!!! Only integers - order must be an exact value!
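A small sketch of the midpoint arithmetic (names and values assumed; note the overflow-safe form of the midpoint):
-- Hypothetical sketch: insert a row between two neighbours by taking the
-- midpoint of their BIGINT order values.
DECLARE @prev bigint = 1000000000000;  -- order of the record before
DECLARE @next bigint = 2000000000000;  -- order of the record after
INSERT INTO Product (Id, [Order])
VALUES (200, @prev + (@next - @prev) / 2);  -- @prev + half the gap avoids (@prev + @next) overflowing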
One solution I have used in the past, with some success, is to use a 'weight' instead of an 'order'. Weight works as you would expect: the heavier an item (i.e. the lower the number), the further it sinks to the bottom; the lighter (the higher the number), the further it rises to the top.
In the event I have multiple items with the same weight, I assume they are of the same importance and order them alphabetically.
This means your SQL will look something like this:
ORDER BY weight, itemName
hope that helps.
I am currently developing a database with a tree structure that needs to be ordered. I use a linked-list kind of method that is ordered on the client (not in the database). Ordering could also be done in the database via a recursive query, but that is not necessary for this project.
I made this document that describes how we are going to implement storage of the sort order, including an example in postgresql. Please feel free to comment!
https://docs.google.com/document/d/14WuVyGk6ffYyrTzuypY38aIXZIs8H-HbA81st-syFFI/edit?usp=sharing
Related
This might be a nightmare.
Let's say I have two rows of data in two different tables, each row containing one character. A is Row1 and B is Row2 in Table1, and it's reversed in Table2: B is Row1 and A is Row2.
I also have a third table that contains three columns. The first two are the columns to be joined on, and the third is the resulting value, depending on what was joined in the first two columns.
A,A = 1
A,B = .8
A,C = .2
B,A = .8
B,B = 1
B,C = .6
C,A = .2
C,B = .6
C,C = 1
What I'm trying to do, in essence, is find the highest-rated pairs from Table1 and Table2 by using the associated values within Table3. I expect to get:
A,A = 1
B,B = 1
because of the matching A's in Table1+2 and the matching B's in Table1+2. Instead, by just aimlessly joining the tables, I get this:
A,A = 1
A,B = .8
B,A = .8
B,B = 1
However, I'm getting ALL possible pairs, and that won't work. And the problem here is that I cannot do a direct JOIN between Table1 and Table2, because a value within Table1 might not match up with Table2. For instance:
Row1 and Row2 in Table1 are A,B and Row1 and Row2 in Table2 are B,C. If I do a direct JOIN, values A and C won't line up with each other, leaving me only with the pair of B's.
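For illustration, a sketch of the "aimless" join described above (column names are assumptions), which returns every combination and its rating instead of a one-to-one matching:
-- Hypothetical sketch: the cross join pairs every Table1 value with every
-- Table2 value, then looks up each pair's rating in Table3.
SELECT t1.Val AS Val1, t2.Val AS Val2, t3.Rating
FROM Table1 AS t1, Table2 AS t2, Table3 AS t3
WHERE t3.Val1 = t1.Val
  AND t3.Val2 = t2.Val;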
I thought of one more problem with this, though: in trying to use a subquery, the subquery would be re-run constantly, meaning that previously selected rows would be up for grabs again the next time, leading to incorrect values.
For instance, with A,B and B,C I would expect to get this returned via subqueries:
A,B = .8
B,B = 1
Unless, of course, there's a way of disqualifying a row from being used again.
Any suggestions or ideas? I'm using Access, but I'm sure the concepts apply to any database solution.
Related
Consider this query:
SELECT F1,F2 FROM TABLE GROUP BY F1
Selecting F1 is valid, but selecting F2 seems incorrect (after all, it can change from row to row). However, SQL Server does not check any of the logic involved here - F2 could, for example, be functionally dependent on F1 (because of a JOIN clause, say).
I know the workarounds, but my question here is:
How to RELAX this "group by" restriction (directly)?
Something like:
RELAX_GROUPBY
SELECT F1,F2 ....
begin of edit 1
So it would be something similar to MySQL's ability to read data from a grouped dataset without any workarounds.
Example of data:
F1 | F2
1 | 2
1 | 2
Output (after executing the query given above):
F1 | F2
1 | 2
end of edit 1
Remark: yes, I do know the workarounds - aggregate functions, creating a view, on-the-fly tables, and others (depending on the scenario). I am not interested in another workaround. If you know the solution to the question, please answer; thank you very much.
Assuming F2 is the same for every F1 (which is when your query makes sense), the easiest way is to do something like:
SELECT F1, MAX(F2) AS F2
FROM TABLE
GROUP BY F1
assuming F2 is a field that can have aggregate functions applied to it, of course.
There's no way to relax the GROUP BY in the way you describe, short of rewriting the whole query. I know MySQL does something a bit different (you can group by one field and SELECT all the others), but it's inconsistent with other implementations.
If you are so sure that F2 is dependent on F1, just add it to the GROUP BY (how difficult is that?):
SELECT F1,F2 FROM TABLE GROUP BY F1, F2
The "do what I mean and not what I code" portion of SQL Server will never be good enough to read your mind, tell it how to group the columns and it will do it. There is no facility within SQL Server to "relax" the group by restrictions, and I'm glad.
If you use GROUP BY, you need an aggregate function on the non-grouped columns; only then will it work.
e.g:
SELECT F1, COUNT(F2) FROM TABLE GROUP BY F1