Task from exam for subject Database Systems:
I have the following schema:
Excavator(EID, Type) - EID is a key
Company(Name, HQLocation) - Name is a key
Work(Name, EID, Site, Date) - All columns together form a key
I have to write this query in relational algebra:
"Which company was digging on exactly one site on 1st of May?"
I don't know how to express it without aggregate functions (count). I know that people add these functions to relational algebra but we were forbidden to do it during this exam.
You can use standard set operations, division, projection, selection, join, and Cartesian product.
I have forgotten the proper relational algebra syntax, but you can do:
(Worked on >= 1 site on 1st May)
minus (Worked on > 1 site on 1st May)
--------------------------------------
equals (Worked on 1 site on 1st May)
A SQL solution using only the operators mentioned in the comments (and assuming rename) is below.
SELECT Name
FROM Work
WHERE Date = '1st May' /*Worked on at least one site on 1st May */
EXCEPT
SELECT W1.Name /*Worked more than one site on 1st May */
FROM Work W1
CROSS JOIN Work W2
WHERE W1.Name = W2.Name
AND W1.Date = '1st May'
AND W2.Date = '1st May'
AND W1.Site <> W2.Site
I assume this will be relatively straightforward to translate into relational algebra.
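For what it's worth, here is one possible translation, sketched in the usual textbook notation (σ selection, π projection, ρ rename, × Cartesian product, − set difference). The intermediate names M, AtLeastOne, MoreThanOne and ExactlyOne are only labels I made up for readability, and the date literal is written the same way as in the SQL:

```latex
\begin{aligned}
M &= \sigma_{\mathit{Date} = \text{'1st May'}}(\mathit{Work}) \\
\mathit{AtLeastOne} &= \pi_{\mathit{Name}}(M) \\
\mathit{MoreThanOne} &= \pi_{W1.\mathit{Name}}\Bigl(
    \sigma_{W1.\mathit{Name} = W2.\mathit{Name} \,\land\, W1.\mathit{Site} \neq W2.\mathit{Site}}
    \bigl(\rho_{W1}(M) \times \rho_{W2}(M)\bigr)\Bigr) \\
\mathit{ExactlyOne} &= \mathit{AtLeastOne} - \mathit{MoreThanOne}
\end{aligned}
```

The self-product pairs up every two rows of the same company on that date; a pair with two different sites can only exist if the company dug on more than one site, so removing those companies leaves the ones with exactly one site.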
Control table:
ControlID, Date1, Date2, Date3
Sale table:
ID, ControlID, SaleDate
I want to get the sales from Date1 up to whichever date is earlier, Date2 or Date3.
SELECT *
FROM SALE S
JOIN CONTROL C ON S.CONTROLID=C.CONTROLID
WHERE S.SALEDATE>=C.DATE1 AND S.SALEDATE<EARLIER(DATE2, DATE3)
What is the correct way to write the EARLIER(DATE2, DATE3) logic? For example - implement this as a new scalar function?
Or maybe:
AND S.SALEDATE<C.DATE2 AND S.SALEDATE<C.DATE3
LEAST may be available to you (SQL Server 2022 and later):
SELECT *
FROM SALE S
JOIN CONTROL C ON S.CONTROLID=C.CONTROLID
WHERE S.SALEDATE>=C.DATE1 AND S.SALEDATE<LEAST(DATE2, DATE3)
otherwise try
WHERE S.SALEDATE>=C.DATE1 AND S.SALEDATE< DATE2 AND S.SALEDATE< DATE3
To keep the predicate on saledate SARGable (usable for an index seek), I'd use a CASE expression:
WHERE S.SALEDATE>=C.DATE1
AND S.SALEDATE< CASE WHEN DATE2<DATE3 THEN DATE2 ELSE DATE3 END
An alternative which is probably less performant but arguably easier to read (and capable of accepting more values) is to make a "mini" unpivoted subquery:
WHERE S.SALEDATE>=C.DATE1
AND S.SALEDATE< (select min(datex) from (values (date2),(date3)) as t1(datex))
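To convince yourself that the CASE form and the two-AND form select the same rows, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The sample rows and the ISO date strings are made up for illustration, the join uses the ControlID column from the table description, and all dates are assumed to be non-NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Control (ControlID INTEGER PRIMARY KEY, Date1 TEXT, Date2 TEXT, Date3 TEXT);
CREATE TABLE Sale (ID INTEGER PRIMARY KEY, ControlID INTEGER, SaleDate TEXT);
-- Hypothetical data: Date3 is the earlier of Date2 and Date3.
INSERT INTO Control VALUES (1, '2024-01-05', '2024-01-20', '2024-01-17');
INSERT INTO Sale VALUES
  (1, 1, '2024-01-04'), (2, 1, '2024-01-05'), (3, 1, '2024-01-16'),
  (4, 1, '2024-01-17'), (5, 1, '2024-01-19');
""")

# Upper bound computed once per Control row with CASE.
case_sql = """
SELECT S.ID FROM Sale S JOIN Control C ON S.ControlID = C.ControlID
WHERE S.SaleDate >= C.Date1
  AND S.SaleDate < CASE WHEN C.Date2 < C.Date3 THEN C.Date2 ELSE C.Date3 END
ORDER BY S.ID
"""

# Upper bound expressed as two separate comparisons.
and_sql = """
SELECT S.ID FROM Sale S JOIN Control C ON S.ControlID = C.ControlID
WHERE S.SaleDate >= C.Date1
  AND S.SaleDate < C.Date2 AND S.SaleDate < C.Date3
ORDER BY S.ID
"""

case_ids = [row[0] for row in conn.execute(case_sql)]
and_ids = [row[0] for row in conn.execute(and_sql)]
print(case_ids, and_ids)
```

Both lists contain the sales dated on or after Date1 and strictly before the earlier of Date2 and Date3. This only checks that the results agree; it says nothing about which plan SQL Server would pick.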
Post comment addendum:
If you have one less_than condition and one greater_than condition, an index can be used once to satisfy both. For ease's sake, let's say your dates are integers (they actually are anyway). Let's say date1=5, date2=20, date3=17.
If you use my case solution (or you are lucky enough to be able to use earlier), then the engine will:
Calculate once that starting point is date1=5 and ending point= case/earlier(20,17)=17
If there is an index on sale.saledate, it will index seek to 5 and 17. This could be very fast if [sale] is a large table.
Now that it quickly found the starting and ending point, it returns all possible rows on the output/next operator
If you use AND S.SALEDATE<C.DATE2 AND S.SALEDATE<C.DATE3, what will probably happen is that it seeks quickly to 5 as before, but then creates an expression (aliased something like expr01) that combines both conditions. It will then evaluate expr01 row by row, beginning at 5 and stopping not at 17, but at the end of the table.
This does have some speculation on my part, that's why it would be helpful for you to run both and then pastetheplan.com.
Note: It is highly probable that either 1) such an index does not exist, or 2) the optimizer wouldn't use it, or 3) your query is fast anyway, which makes all this analysis partly a waste of time, much like XKCD suggests.
However, even in these cases, understanding the points and creating a good programming habit of SARGable queries is just good business.
After using Power BI for a few months, we (the user group) have encountered an issue that is not really clear to us.
We use Power BI with a remote SQL Server data source, which we access through DirectQuery.
Suppose we have two tables, as below:
Table name: Issue
Column:
ResolutionTime(Date/Time)
IssueID(Unique Numbers)
Table Name: WorkItem
Column:
start (Date/Time)
end (Date/Time)
IssueID (Unique Numbers, Foreign Key to "Issue" table)
Table WorkItem also contains a calculated column "WorkTime", which uses the following DAX expression:
WorkTime = WorkItem[end] - WorkItem[start]
The two tables are configured in Power BI with a two-way 1:n relationship that can be queried to collect all "WorkItem" entries assigned to an "Issue" entry, using "IssueID" as the correlation column.
To compute the aggregated work time for each "Issue", we use a new calculated table with the following DAX expression, which sums the time invested across all of an issue's "WorkItem" entries:
SumWork =
SUMMARIZE(
WorkItem, WorkItem[IssueID], "All work per item", SUM(WorkItem[WorkTime])
)
The above table computes the total invested work time for a particular issue, grouping/summarizing results by the "IssueID" foreign key. This new calculated table is also configured to have a relationship with the "Issue" table, this time a 1:1 relationship, using IssueID as the correlation column.
Now we want a calculated column inside "Issue" that adds the time the issue was worked on to the resolution time, but this does not work:
ResolutionAndWorkTime = Issue[ResolutionTime] + SumWork["All work per item"]
The above DAX expression fails to compile: it always reports that it returns "more than one result" rather than a single value. That is surprising, as the two tables ("Issue" and "SumWork") are related to each other with a 1:1 relationship.
Tables:
Issues
IssueID  ResolutionTime  ResolutionAndWorkTime
1        03:20:20        ???
2        01:20:20        ???
3        00:20:20        ???
WorkItem
IssueID  start             end               WorkTime
1        1-2-2020 3:20:20  1-2-2020 3:25:20  00:05:00
1        2-2-2020 6:20:20  2-2-2020 7:20:20  01:00:00
3        1-3-2020 3:20:20  1-3-2020 3:29:20  00:09:00
Any ideas what to look for? Data-types? Table-definition? Table-relationships? We checked other Stackoverflow questions/answers, but no good ideas retrieved so far.
NOTE that a lot of join/merge features of Power BI are not available if direct-query is used and thus joining the tables is not really an option (we think).
You need the following expression for your new calculated column.
See the DAX documentation for RELATED to learn more.
ResolutionAndWorkTime = Issues[ResolutionTime] + RELATED(SumWork[All work per item])
Based on input provided by "mkRabbani" (see other answer), we investigated why "RELATED" does not function as expected. The problem originates in the access to the database. As suspected earlier, the function delivers the expected results once the database access is switched to "import" instead of "direct-query".
As a workaround, we now join the data inside the SQL Server by using traditional database views. Of course, this only works for scenarios where the database is under the control of the data analytics team.
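A minimal sketch of what such a view could look like, run here against SQLite through Python's sqlite3 module rather than against SQL Server. The view name IssueWithWork, the columns StartTime, EndTime and ResolutionSeconds, and the ISO-formatted timestamps are assumptions made for this illustration; durations are kept in seconds to sidestep time formatting:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Issue (IssueID INTEGER PRIMARY KEY, ResolutionSeconds INTEGER);
CREATE TABLE WorkItem (
    IssueID INTEGER REFERENCES Issue(IssueID),
    StartTime TEXT,
    EndTime TEXT
);
-- Sample rows from the question: 03:20:20 = 12020 s, 01:20:20 = 4820 s, 00:20:20 = 1220 s.
INSERT INTO Issue VALUES (1, 12020), (2, 4820), (3, 1220);
INSERT INTO WorkItem VALUES
  (1, '2020-02-01 03:20:20', '2020-02-01 03:25:20'),
  (1, '2020-02-02 06:20:20', '2020-02-02 07:20:20'),
  (3, '2020-03-01 03:20:20', '2020-03-01 03:29:20');

-- One row per issue; issues without work items get 0 seconds of work.
CREATE VIEW IssueWithWork AS
SELECT i.IssueID,
       i.ResolutionSeconds,
       COALESCE(SUM(CAST(strftime('%s', w.EndTime) AS INTEGER)
                  - CAST(strftime('%s', w.StartTime) AS INTEGER)), 0) AS WorkSeconds
FROM Issue i
LEFT JOIN WorkItem w ON w.IssueID = i.IssueID
GROUP BY i.IssueID, i.ResolutionSeconds;
""")

rows = conn.execute(
    "SELECT IssueID, ResolutionSeconds + WorkSeconds "
    "FROM IssueWithWork ORDER BY IssueID"
).fetchall()
print(rows)
```

Power BI can then read the view like any other table, so no RELATED lookup is needed on the report side. In SQL Server itself the duration would more likely be computed with DATEDIFF; the strftime arithmetic is specific to SQLite.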
I need advice on the following topic:
I am developing a DW/BI solution in SQL Server and reports are published in Power BI.
The main part of my question starts here: I have a large table that collects product measurement data for multiple attributes. Products can be of multiple types, which can be recognised by the item number in this table. Measurements can be taken multiple times and are identified by measurement date; usually we refer to the latest dates. If that makes things too complicated, I can filter the data to the latest dates only. The table is dense and has many rows (multiple millions). There are about 200 attributes.
I want to include specifications for these attributes, most likely in a dimension table, and there may be tens of such specifications. The intention is that the user selects any one specification name in the report and sees, for each product, which attributes pass or fail, as well as which products pass overall, i.e. those that pass all of the specification's attributes.
I currently have this measurement table and a dim table with test names, I can add a table for specification if needed. Specification can define few or all test names with lower/upper spec limits:
Sample measurement table:
Sample dim table for test names:
I can add a table for specifications as below, and the user will select any one of them:
e.g. if the user selects ID_spec = 1, the measurement table may look like:
Some specs may contain all attributes and some only a few.
Please suggest a strategy for designing a spec table that is efficient for tables of this size. Please let me know if any further details are needed.
Later, I will also have to calculate the percentage of products that pass, provided they have been tested for all the tests required by the selected specification.
For large tables, the best thing to do is choose the right key. That means dumping the "Id" column (nothing more than a row identifier) and replacing it with something that:
Guarantees uniqueness
Facilitates searches
That often means composite keys, which are fine.
It also means dumping the whole "fact/dimension" mindset and just focusing on the relations. This is also fine.
Based on your description, this is the first draft of a data model for your warehouse. If you are unfamiliar with IDEF1X diagrams, please read this.
I've added a unique constraint to SpecCd so you could specify the value directly instead of having to check both the ProductId and SpecCd to return a result.
ProductTest exists so you can provide integrity for ProductTestCriteria and ensure tests are limited to only those products that can be measured by them. If all products are subject to all tests, this can be removed and Test can relate directly to ProductMeasurement and ProductTestCriteria.
If you want to subject the latest test of "Product A" to "Spec S" your query would look like:
SELECT
Measurement.ProductId
,Measurement.TestCd
,Measurement.TestDt
,Criteria.SpecCd
,Measurement.Value
,CASE
WHEN Measurement.Value BETWEEN Criteria.LowerValue AND Criteria.UpperValue THEN 'Pass'
ELSE 'Fail'
END AS Result
FROM
ProductMeasurement Measurement
INNER JOIN
ProductTestCriteria Criteria
ON Criteria.ProductId = Measurement.ProductId
AND Criteria.TestCd = Measurement.TestCd
WHERE
Measurement.ProductId = 'A'
AND Criteria.SpecCd = 'S'
AND Measurement.TestDt =
(
SELECT
MAX(TestDt)
FROM
ProductMeasurement
WHERE
ProductId = Measurement.ProductId
)
You could remove the filters for ProductId and SpecCd and roll the query into a view; users could then filter on the products and specifications they want.
If you want results as of a given date, the query is easily modified as follows, or incorporated into a table-valued function (TVF):
SELECT
Measurement.ProductId
,Measurement.TestCd
,Measurement.TestDt
,Criteria.SpecCd
,Measurement.Value
,CASE
WHEN Measurement.Value BETWEEN Criteria.LowerValue AND Criteria.UpperValue THEN 'Pass'
ELSE 'Fail'
END AS Result
FROM
ProductMeasurement Measurement
INNER JOIN
ProductTestCriteria Criteria
ON Criteria.ProductId = Measurement.ProductId
AND Criteria.TestCd = Measurement.TestCd
WHERE
Measurement.ProductId = 'A'
AND Criteria.SpecCd = 'S'
AND Measurement.TestDt =
(
SELECT
MAX(TestDt)
FROM
ProductMeasurement
WHERE
ProductId = Measurement.ProductId
AND TestDt <= <Your Date>
)
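To try the "as of a given date" query on a small data set, here is a minimal sketch using Python's sqlite3 module in place of SQL Server. The table and column names follow the query above, but the primary keys, the sample rows, and the helper results_as_of are assumptions made for the illustration, not part of the actual model:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Composite keys are assumed here; adjust them to the real model.
CREATE TABLE ProductMeasurement (
    ProductId TEXT, TestCd TEXT, TestDt TEXT, Value REAL,
    PRIMARY KEY (ProductId, TestCd, TestDt)
);
CREATE TABLE ProductTestCriteria (
    ProductId TEXT, TestCd TEXT, SpecCd TEXT, LowerValue REAL, UpperValue REAL,
    PRIMARY KEY (ProductId, TestCd, SpecCd)
);
-- Hypothetical measurements for product A on two dates.
INSERT INTO ProductMeasurement VALUES
  ('A', 'T1', '2024-01-01', 3), ('A', 'T2', '2024-01-01', 25),
  ('A', 'T1', '2024-02-01', 6), ('A', 'T2', '2024-02-01', 15);
-- Spec S allows T1 in [1, 5] and T2 in [10, 20].
INSERT INTO ProductTestCriteria VALUES
  ('A', 'T1', 'S', 1, 5), ('A', 'T2', 'S', 10, 20);
""")

QUERY = """
SELECT m.TestCd, m.TestDt,
       CASE WHEN m.Value BETWEEN c.LowerValue AND c.UpperValue
            THEN 'Pass' ELSE 'Fail' END AS Result
FROM ProductMeasurement m
JOIN ProductTestCriteria c
  ON c.ProductId = m.ProductId AND c.TestCd = m.TestCd
WHERE m.ProductId = ? AND c.SpecCd = ?
  AND m.TestDt = (SELECT MAX(TestDt) FROM ProductMeasurement
                  WHERE ProductId = m.ProductId AND TestDt <= ?)
ORDER BY m.TestCd
"""

def results_as_of(product_id, spec_cd, as_of):
    """Pass/fail per test for the latest measurement date on or before as_of."""
    return conn.execute(QUERY, (product_id, spec_cd, as_of)).fetchall()

january = results_as_of("A", "S", "2024-01-15")
march = results_as_of("A", "S", "2024-03-01")
print(january)
print(march)
```

With these rows, the January call picks up the 2024-01-01 measurements and the March call the 2024-02-01 ones. The three placeholders play the role of the parameters a TVF would take.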
I have a small database for an exercise; please see the ER diagram below.
I want to write a query that lists the last names, first names, and majors of students who had at least one high grade (>= 3.5) in at least one course offered in fall of 2012.
My code below:
select s.StdNo,s.StdFirstName,s.StdLastName,s.StdMajor,e.EnrGrade,o.OfferNo,o.OffYear
from Enrollment e
join Offering o on e.OfferNo=o.OfferNo
join Student s on s.StdNo=e.StdNo
where e.EnrGrade >=3.5 and o.OffYear="2010";
But I got an SQL Error
[207] [S0001]: Invalid column name '2010'
I am confused about the error: the value "2010" is NOT a column name; OffYear is the column. So why did this happen?
The basic query is not that hard, but I am stuck on (multiple)nested query.
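OffYear is shown as a number, so you should compare against the number 2010, not the text "2010". The error mentions a column name because SQL Server treats double quotes as identifier delimiters (with the default QUOTED_IDENTIFIER ON setting), so "2010" is read as a column named 2010; string literals take single quotes.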
[...] and Offyear = 2010
I have a database containing names of certain blacklisted companies and individuals.
The details of every transaction created need to be scanned against these blacklisted names. The names in a transaction may not be spelled correctly; for example, "Wilson" may be written as "Vilson" or "Veelson". The fuzzy search logic or utility should match such names against "Wilson" in the blacklist database and, based on the required accuracy percentage set by the user, show the matching names that fall within that percentage.
The transactions will be sent in batches or in real time to be checked against the blacklisted names.
I would appreciate it if users who have had a similar requirement and implemented it could share their views and implementations.
T-SQL leaves a lot to be desired in the realm of fuzzy search. Your best options are third-party libraries, but if you don't want to mess with that, your best bet is the DIFFERENCE function built into SQL Server. For example:
SELECT * FROM tblUsers U WHERE DIFFERENCE(U.Name, @nameEntered) >= 3
DIFFERENCE compares the SOUNDEX values of two strings and returns an integer from 0 to 4; a higher value indicates a closer match. A drawback is that the algorithm favors words that sound alike, which may not be the characteristic you want.
This next example shows how to get the best match out of a table:
DECLARE @users TABLE (Name VARCHAR(255))
INSERT INTO @users VALUES ('Dylan'), ('Bob'), ('Tester'), ('Dude')
SELECT *, MAX(DIFFERENCE(Name, 'Dillon')) AS SCORE FROM @users GROUP BY Name ORDER BY SCORE DESC
It returns:
Name   | Score
Dylan  | 4
Dude   | 3
Bob    | 2
Tester | 0
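If the matching can happen outside the database, a user-set accuracy percentage is easier to get from a string-similarity ratio than from the 0 to 4 DIFFERENCE scale. Here is a minimal sketch using Python's standard difflib module; note that this swaps in a different technique (character-sequence similarity, not SOUNDEX), and that the function name fuzzy_matches, the sample blacklist, and the 80% threshold are made up for the illustration:

```python
from difflib import SequenceMatcher

def fuzzy_matches(name, blacklist, threshold):
    """Return (entry, percent) pairs whose similarity to name is >= threshold.

    threshold is a fraction between 0 and 1 (0.8 means 80%).
    """
    results = []
    for entry in blacklist:
        # ratio() is 1.0 for identical strings and 0.0 for nothing in common.
        score = SequenceMatcher(None, name.lower(), entry.lower()).ratio()
        if score >= threshold:
            results.append((entry, round(score * 100, 1)))
    # Best match first.
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Hypothetical blacklist entries.
blacklist = ["Wilson", "Acme Trading", "Johnson"]

hits = fuzzy_matches("Vilson", blacklist, 0.8)
misses = fuzzy_matches("Smith", blacklist, 0.8)
print(hits)
print(misses)
```

Scanning every blacklist entry per transaction is fine for a small list; for large lists or real-time checks you would probably want a dedicated fuzzy-matching library or an index-backed approach instead.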