SPARQL UNION - Result set incomplete

I have two queries:
query 1:
SELECT DISTINCT ?o COUNT(?o)
WHERE
{ ?s1 ?somep1 <predicate_one-uri>. ?s1 ?p ?o}
query 2:
SELECT DISTINCT ?o COUNT(?o)
WHERE
{?s2 ?somep2 <predicate_two-uri>.?s2 ?p ?o.}
Each query gives me a different result set (as expected). I need the union of these two sets; from what I understand, the query below should give me the set I want:
SELECT DISTINCT ?o COUNT(?o)
WHERE
{
{ ?s1 ?somep1 <predicate_one-uri>.?s1 ?p1 ?o}
UNION
{?s2 ?somep2 <predicate_two-uri>.?s2 ?p2 ?o.}
}
The problem is that some results from query 1 are missing from the union set, and likewise for query 2. The union is not working properly, as it does not incorporate all results of query 1 and query 2. Please advise on the proper structure of the SPARQL query for achieving the desired result set.
Though if I make the following query (simply remove the COUNT function):
SELECT DISTINCT ?o
WHERE
{
{ ?s1 ?somep1 <predicate_one-uri>.?s1 ?p ?o}
UNION {?s2 ?somep2 <predicate_two-uri>.?s2 ?p ?o.}
}
I get the appropriate result set. But I also need to have the frequency of the variable ?o.

I think it will work if you remove the DISTINCT, and add GROUP BY ?o to the end of the query.
DISTINCT is really just for removing duplicates. It's not for grouping and counting.
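As a rough illustration of the DISTINCT versus GROUP BY point, here is a sketch in Python that uses the standard-library sqlite3 module as a stand-in for a SPARQL engine. The tables q1 and q2 and their values are made up; each one plays the role of the ?o bindings produced by one side of the UNION. In SPARQL 1.1 the corresponding query would drop DISTINCT, select ?o (COUNT(?o) AS ?count), and end with GROUP BY ?o.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE q1 (o TEXT);  -- hypothetical ?o bindings from the first pattern
    CREATE TABLE q2 (o TEXT);  -- hypothetical ?o bindings from the second pattern
    INSERT INTO q1 VALUES ('a'), ('a'), ('b');
    INSERT INTO q2 VALUES ('b'), ('c');
""")

# UNION ALL keeps duplicates, as SPARQL UNION does;
# GROUP BY then yields one row per distinct value together with its frequency.
counts = conn.execute("""
    SELECT o, COUNT(o) AS freq
    FROM (SELECT o FROM q1 UNION ALL SELECT o FROM q2)
    GROUP BY o
    ORDER BY o
""").fetchall()

for value, freq in counts:
    print(value, freq)
```

Every value from either side appears exactly once in the output, with a count that adds up the occurrences from both sides.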

I'm not entirely sure here, but I have a theory, which may be entirely wrong.
Your query confuses me slightly, as it seems to imply some grouping: in theory, at least, a SPARQL engine should not let you select both a variable and an aggregate on that variable in the same query without an explicit GROUP BY. So the results may depend on which SPARQL engine or triplestore you are using.
If an implicit grouping is happening, you may not get as many results as you expect, because the grouping will combine results from both sides of the union. For example, say query 1 gives you 10 results and query 2 gives you 5 results: the maximum number of results you can get from the union is 15, but it may be fewer, as the grouping may merge results from the two sides. To avoid this, you should use completely different variable names on both sides of the union, for example:
SELECT * WHERE { {?s ?p ?o} UNION {?x ?y ?z}}
Which would give you a results table which had a pattern like the following:
?s | ?p | ?o | ?x | ?y | ?z
---+----+----+----+----+---
a  | b  | c  |    |    |
   |    |    | a  | b  | c
I'm not sure whether any of that is relevant or useful to you. If you can provide more details about the environment you are executing the query in (triplestore, SPARQL engine, API/library, etc.), then I or someone else may be able to provide a better answer.

How to find the last 100k rows from 10000K table in oracle?

When I use this query, it takes more than 5 minutes. Please suggest an alternative.
SELECT * FROM
( SELECT id,name,rownum AS RN$$_RowNumber FROM MILLION_1) INNER_TABLE where
RN$$_RowNumber > (V_total_count - V_no_of_rows)
ORDER BY RN$$_RowNumber DESC;
Try the offset clause.
I have a table with about 16M records in it. If I just want the last 100,000 rows, I order them with the ORDER BY clause and then use the OFFSET clause, which basically says: read this many rows first, before you return any data.
select *
from SHERI; -- 15,691,544 Rows
select *
from SHERI
order by COLUMN4 asc
offset 15591444 rows; -- my math was bad, should have offset 15591544 rows to get just the last 100,000
The FETCH FIRST and OFFSET clauses are new in Oracle 12c.
If we look at the plan under this query, we can see how the database makes it work:
PLAN_TABLE_OUTPUT
SQL_ID 7wd4ra8pfu1vb, child number 0
-------------------------------------
select * from SHERI order by COLUMN4 asc offset 15591444 rows
Plan hash value: 3535161482
----------------------------------------------
| Id | Operation | Name | E-Rows |
----------------------------------------------
| 0 | SELECT STATEMENT | | |
|* 1 | VIEW | | 15M|
| 2 | WINDOW SORT | | 15M|
| 3 | TABLE ACCESS FULL| SHERI | 15M|
----------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber">15591444)
Note
-----
- Warning: basic plan statistics not available. These are only collected when:
* hint 'gather_plan_statistics' is used for the statement or
* parameter 'statistics_level' is set to 'ALL', at session or system level
'WINDOW SORT' in the plan basically means that an analytic (window) function is being used.
There are some very thorough answers at this similar question, but I'll try to make them specific to your case.
First, when you say "last 100k rows", what do you mean? It looks like you just want to pull the last 100k rows from an unsorted query, but that doesn't make a lot of sense. If you want the 100k most recent rows, Oracle doesn't guarantee that they'll be at the end of your unsorted query. So you want to order by something which will have the most recent ones at the end.
Also, part of the reason your query is slow is that you're sorting/filtering on the rownum pseudo-column, which can't be indexed. Sorting on a column that has an index would drastically speed this up. So I'd guess you want to order by the id column, which is probably a unique/primary key.
So this is the old (11g and earlier) way to do this.
select id, name
from (select id, name
      from MILLION_1
      order by id desc)
where rownum <= 100000; -- <= rather than <, otherwise only 99,999 rows come back
If you're on 12c or later, there's a newer way to do it.
select id, name
from MILLION_1
order by id desc
fetch first 100000 rows only;
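The same idea can be tried out on a small scale. The following is only a sketch in Python with the standard-library sqlite3 module, not Oracle: LIMIT stands in for FETCH FIRST n ROWS ONLY, the table holds 1,000 made-up rows instead of millions, and `wanted` is a hypothetical variable standing in for the 100,000 rows in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE million_1 (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO million_1 (id, name) VALUES (?, ?)",
    [(i, f"name{i}") for i in range(1, 1001)],
)

wanted = 100  # stands in for the 100,000 rows in the question

# Sort on the indexed key in descending order and stop after `wanted` rows.
# LIMIT is SQLite's counterpart of Oracle's FETCH FIRST n ROWS ONLY.
last_rows = conn.execute(
    "SELECT id, name FROM million_1 ORDER BY id DESC LIMIT ?", (wanted,)
).fetchall()

print(len(last_rows), last_rows[0], last_rows[-1])
```

Because the ordering column is the primary key, the engine can walk the index backwards and stop early instead of numbering every row first.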

SQL Server 2008 - False error for "Msg 8120"?

I am writing a query in SQL Server 2008 (Express I believe?). I am currently getting this error:
Msg 8120, Level 16, State 1, Line 16
Column 'AIM.dbo.AggTicket.TotDirectHrs' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I am trying to do a historical analysis of our production WIP (Work In Process).
I have created a standalone calendar table (actually located in a separate database called BAS on the same server to not interfere with the ERP that operates the AIM database). I've been overwhelmed for days with some of the examples for creating running total queries/views/tables, so for now I'll just plan on taking care of that part inside of Crystal Reports 2016. My thinking was that I wanted to return records for each order each day of my calendar table (to be narrowed down in the future to only days that match records in the AIM database). The values I think I will need are:
Record Date (not unique)
Order Number (unique for each day)
Estimated hours for the job
The total number of hours worked on the job current as of today's date (in case the estimated hours were drastically underbudgeted)
The SUM of the direct labor hours charged to the job on said record date
The COUNT of the number of employees in attendance on said record date.
The SUM of the hours attended by employees on said record date.
The tables I use are as follows:
BAS Database:
dbo.DateDimension - Used for complete calendar of dates from 1/1/1987 to 12/31/2036
AIM Database:
dbo.AggAttend - Contains one or more records for each employee's attendance duration on a given date (i.e. One record for each punch-in / punch-out. Should be equal to indirect + direct labor)
dbo.AggTicket - Contains one or more records for each employee's direct labor duration charged to a particular order number
dbo.ModOrders - Contains one record for each order including the estimated hours, start date, and end date (I will worry about using the start and end dates later for figuring out how many available hours there were on each date)
Here is the code I'm using in my query:
;WITH OrderTots AS
(
SELECT
AggTicket.OrderNo,
SUM(AggTicket.TotDirectHrs) AS TotActHrs
FROM
AIM.dbo.AggTicket
GROUP BY
AggTicket.OrderNo
)
SELECT
d.Date,
t.OrderNo,
o.EstHrs,
OrderTots.TotActHrs,
SUM(t.TotDirectHrs) OVER (PARTITION BY t.TicketDate) AS DaysDirectHrs,
COUNT(a.EmplCode) AS NumEmployees,
SUM(a.TotHrs) AS DaysAttendHrs
FROM
BAS.dbo.DateDimension d
INNER JOIN
AIM.dbo.AggAttend a ON d.Date = a.TicketDate
LEFT OUTER JOIN
AIM.dbo.AggTicket t ON d.Date = t.TicketDate
LEFT OUTER JOIN
AIM.dbo.ModOrders o ON t.OrderNo = o.OrderNo
LEFT OUTER JOIN
OrderTots ON t.OrderNo = OrderTots.OrderNo
GROUP BY
d.Date, t.TicketDate, t.OrderNo, o.EstHrs,
OrderTots.TotActHrs
ORDER BY
d.Date
When I run that query in SQL Server Management Studio 2017, I get the above error.
These are my questions for the community:
Does this error message correctly describe an error in my code?
If so, why is that error an error? (To the best of my knowledge, everything is already contained in either an aggregate function or in the GROUP BY clause...smh)
What is a better way to write this query so that it will function?
Much appreciation to everyone in advance!
I am writing a query in SQL Server 2008 (Express I believe?).
SELECT @@VERSION will let you know what version you are on.
Column 'AIM.dbo.AggTicket.TotDirectHrs' is invalid in the select list
because it is not contained in either an aggregate function or the
GROUP BY clause.
The problem is with your SUM OVER() statement:
SUM(t.TotDirectHrs) OVER (PARTITION BY t.TicketDate) AS DaysDirectHrs
Here, because SUM is used with an OVER clause, it acts as a window function rather than as a grouping aggregate, so t.TotDirectHrs does not count as being contained in an aggregate and must itself appear in the GROUP BY. The OVER clause is used to determine the partitioning and order of a row-set for a window function. So, while you are using an aggregate with SUM, you are doing this in a window function. Window functions belong to a type of function known as a 'set function', which means a function that applies to a set of rows. The word 'window' is used to refer to the set of rows that the function works on.
Thus, add t.TotDirectHrs to the GROUP BY
GROUP BY
d.Date, t.TicketDate, t.OrderNo, o.EstHrs,
OrderTots.TotActHrs, t.TotDirectHrs
If this narrows your results into a grouping that you don't want, then you can wrap it in another CTE or use a correlated sub-query. Potentially like the below:
(SELECT SUM(t2.TotDirectHrs) FROM AIM.dbo.AggTicket t2 WHERE t2.TicketDate = t.TicketDate) AS DaysDirectHrs,
EXAMPLE
if object_id('tempdb..#test') is not null
drop table #test
create table #test(id int identity(1,1), letter char(1))
insert into #test
values
('a'),
('b'),
('b'),
('c'),
('c'),
('c')
Given the data set above, suppose we wanted to get a count of all rows. That's simple right?
select
TheCount = count(*)
from
#test
+----------+
| TheCount |
+----------+
| 6 |
+----------+
Here, no GROUP BY is needed: with an aggregate and no other columns in the SELECT list, the whole table is treated as a single group. Remember, GROUP BY groups the SELECT statement results according to the values in a list of one or more column expressions. If aggregate functions are included in the SELECT list, GROUP BY calculates a summary value for each group. These are known as vector aggregates [MSDN].
Now, suppose we wanted to count each letter in the table. We could do that in at least two ways: using COUNT(*) with the letter column in the select list, or using COUNT(letter) with the letter column in the select list. However, in order for us to attribute the count to the letter, we need to return the letter column. Thus, we must include letter in the GROUP BY to tell SQL Server what to apply the summary value to.
select
letter
,TheCount = count(*)
from
#test
group by
letter
+--------+----------+
| letter | TheCount |
+--------+----------+
| a | 1 |
| b | 2 |
| c | 3 |
+--------+----------+
Now, what if we wanted to return this same count, but we wanted to return all rows as well? This is where window functions come in. The window function works similarly to GROUP BY in this case by telling SQL Server the set of rows to apply the aggregate to. Then, its value is returned for every row in this window / partition. Thus, it returns a column which is applied to every row, making it just like any column or calculated column which is returned from the select list.
select
letter
,TheCountOfTheLetter = count(*) over (partition by letter)
from
#test
+--------+---------------------+
| letter | TheCountOfTheLetter |
+--------+---------------------+
| a | 1 |
| b | 2 |
| b | 2 |
| c | 3 |
| c | 3 |
| c | 3 |
+--------+---------------------+
Now we get to your case, where you want to use an aggregate and an aggregate in a window function. Remember that the return value of the window function is treated like any other column, and thus would have to appear in the GROUP BY. Pseudocode would look something like this, but window functions aren't allowed in the GROUP BY clause.
select
letter
,TheCount = count(*)
,TheCountOfTheLetter = count(*) over (partition by letter)
from
#test
group by
letter
,count(*) over (partition by letter)
--returns an error
Thus, we must use a correlated sub-query, a CTE, or some other method.
select
t.letter
,TheCount = count(*)
,TheCountOfTheLetter = (select distinct count(*) over (partition by letter) from #test t2 where t2.letter = t.letter)
from
#test t
group by
t.letter
+--------+----------+---------------------+
| letter | TheCount | TheCountOfTheLetter |
+--------+----------+---------------------+
| a | 1 | 1 |
| b | 2 | 2 |
| c | 3 | 3 |
+--------+----------+---------------------+
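For anyone without a SQL Server instance at hand, the GROUP BY versus window-function contrast from the walkthrough above can be reproduced with a small Python sketch using the standard-library sqlite3 module. This is an assumption-laden stand-in: it needs an SQLite build of version 3.25 or later (the first with window functions), and the table is a plain `test` table rather than the #test temp table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, letter TEXT)")
conn.executemany("INSERT INTO test (letter) VALUES (?)", [(c,) for c in "abbccc"])

# GROUP BY collapses the rows: one output row per letter.
grouped = conn.execute(
    "SELECT letter, COUNT(*) FROM test GROUP BY letter ORDER BY letter"
).fetchall()

# The window function keeps every row and repeats the per-letter count on each.
windowed = conn.execute(
    "SELECT letter, COUNT(*) OVER (PARTITION BY letter) "
    "FROM test ORDER BY letter, id"
).fetchall()

print(grouped)
print(windowed)
```

The grouped query returns three rows and the windowed query returns all six, which matches the two result tables shown earlier in this answer.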

What is the conventional way of checking whether something Exists in another table in Application Insights Analytics Query Language?

I want to limit requests to those that have a specific associated dependency, identified by name. I tried using a leftsemi join, but that didn't seem to work as I expected, because it gave me the same results as my inner join.
requests
| where timestamp >= ago(24h)
| join kind=leftsemi (
dependencies
| where name contains "MYDATABASENAME"
) on operation_Id
| summarize count() by tostring(parseurl(url).Path)
| order by count_ desc
I'm looking at the where-in statement next, but I'm still unsure whether this is the expected way to do what would typically be an EXISTS statement in T-SQL.
You should be able to use let statement to achieve this.
Actually, in order to get the where-in semantics you should use inner join. From the documentation of join (at the kind=inner section):
There's a row in the output for every combination of matching rows from left and right.
In addition, since there's a limit on the size of the returned table, you might want to limit the right side of the join like this:
requests
| where timestamp >= ago(24h)
| join kind=inner (
dependencies
| where name contains "MYDATABASENAME"
| project operation_Id
) on operation_Id
| summarize count() by tostring(parseurl(url).Path)
| order by count_ desc
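To make the quoted join behaviour concrete, here is a small sketch in plain Python (not Kusto) that mimics the two join kinds on made-up data. The request and dependency rows are hypothetical, and the substring test is case-sensitive here, unlike Kusto's `contains`. It shows that the row counts differ only when one request has several matching dependencies.

```python
# Hypothetical telemetry rows; only the fields used by the join are included.
requests = [
    {"operation_Id": "op1", "url": "/home"},
    {"operation_Id": "op2", "url": "/cart"},
    {"operation_Id": "op3", "url": "/about"},
]
dependencies = [
    {"operation_Id": "op1", "name": "MYDATABASENAME query A"},
    {"operation_Id": "op1", "name": "MYDATABASENAME query B"},
    {"operation_Id": "op3", "name": "other service"},
]

matching = [d for d in dependencies if "MYDATABASENAME" in d["name"]]

# Inner join: one output row per matching (request, dependency) pair.
inner = [
    (r["url"], d["name"])
    for r in requests
    for d in matching
    if r["operation_Id"] == d["operation_Id"]
]

# Semi join (EXISTS-style): each request at most once, left-side columns only.
matching_ids = {d["operation_Id"] for d in matching}
semi = [r["url"] for r in requests if r["operation_Id"] in matching_ids]

print(len(inner), len(semi))
```

With this data the inner join yields two rows for the same request while the semi join yields one, so a later `summarize count()` would differ between the two.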

Postgresql inner select with distinct

I'm using PostgreSQL 9.2 and have a simple students table as follows:
id | proj_id | mark | name | test_date
I have 2 queries, which are shown below:
select * from (
  select distinct on (proj_id) proj_id, mark, name, test_date
  from students
) t
where t.mark <= 1000
VS
select distinct on (proj_id) proj_id, mark, name, test_date
from students
where mark <= 1000
When I run them against more than 10000 records, the two queries return different results, most notably different row counts, although for fewer than 3000 records the results are the same.
Is this a PostgreSQL 9.2 bug, or am I missing something?
Your queries are producing two different sets of results because they are applying the logic differently.
The first query is getting a distinct set of results, and then applying the 'mark' filter.
The second query is applying the 'mark' filter, and then getting a distinct set of results.
As you don't have any ordering applied, DISTINCT ON keeps an arbitrary row for each proj_id, so the first query could potentially return a different number of rows each time it is run: the mark in the row that is kept could be any of the mark values that belong to that proj_id.
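The order-of-operations difference can be sketched in plain Python. This is only a model of the behaviour, not PostgreSQL itself: the rows are made up, and the hypothetical helper `distinct_on_proj_id` simply keeps the first row it sees per proj_id, which is one of the arbitrary choices the database is free to make without an ORDER BY.

```python
students = [
    # (proj_id, mark, name) -- hypothetical rows
    (1, 1500, "Ann"),
    (1, 900, "Bob"),
    (2, 800, "Cid"),
    (3, 2000, "Dee"),
]

def distinct_on_proj_id(rows):
    """Keep the first row seen for each proj_id (one arbitrary row per group)."""
    seen = {}
    for row in rows:
        seen.setdefault(row[0], row)
    return list(seen.values())

# Query 1: DISTINCT ON first, then filter on mark.
distinct_then_filter = [r for r in distinct_on_proj_id(students) if r[1] <= 1000]

# Query 2: filter on mark first, then DISTINCT ON.
filter_then_distinct = distinct_on_proj_id([r for r in students if r[1] <= 1000])

print(distinct_then_filter)
print(filter_then_distinct)
```

In the first variant proj_id 1 is represented by a row with mark 1500, which the filter then throws away, so that project disappears; in the second variant the filter runs first and proj_id 1 survives through its other row.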

SQL NOT LIKE and LIKE

I'm having a problem with understanding the LIKE and NOT LIKE operators in SQL. This is a query that I've executed:
select serial_number from UNIT U
group by serial_number
order by serial_number
which yields 2000 results.
When I execute this query, I get 1950 results:
select serial_number from UNIT U
WHERE op_name LIKE 'Assembly'
group by serial_number
order by serial_number
So when I execute this query, I expect to get 50 results, but instead I get 2000:
select serial_number from UNIT U
WHERE op_name NOT LIKE 'Assembly'
group by serial_number
order by serial_number
Any explanations? Thanks a bunch.
The group you're doing makes it not really valid to do the sort of count comparison you're attempting.
Suppose you have 10 unique serial numbers, and for each of those serial numbers there are two rows (so 20 rows total), one with op_name "Xyz", and another with op_name "Assembly". Your first query would return 10 rows. Your second query would return 10 rows. Your third query would return 10 rows. Because of the group, LIKE "Assembly" and NOT LIKE "Assembly" are not mutually exclusive.
NULL is neither LIKE nor NOT LIKE anything.
Actually, re-reading your numbers more carefully, there may be another cause. (My earlier point is true, but this is more likely.)
Suppose you have the following data:
serial_number | op_name
--------------+---------
1 | Assembly
1 | Not
1 will be returned by both queries.
Without the wildcards % your LIKE operates just like an =.
Check and see what the values are...most likely if you are using LIKE, you also want to use wildcards for example:
select serial_number from UNIT U WHERE op_name NOT LIKE '%Assembly%' group by serial_number order by serial_number
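The points made in these answers can be checked on a tiny data set. The sketch below uses Python's standard-library sqlite3 module with made-up rows, and the helper `serials` is hypothetical. The last query is one possible reading of what the question was after (serial numbers with no 'Assembly' row at all), which is an assumption rather than something stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unit (serial_number INTEGER, op_name TEXT)")
conn.executemany(
    "INSERT INTO unit VALUES (?, ?)",
    [(1, "Assembly"), (1, "Test"), (2, "Assembly"), (3, "Test"), (4, None)],
)

def serials(where):
    """Run the question's query shape with an optional WHERE clause."""
    sql = (
        f"SELECT serial_number FROM unit {where} "
        "GROUP BY serial_number ORDER BY serial_number"
    )
    return [row[0] for row in conn.execute(sql)]

all_serials = serials("")
like = serials("WHERE op_name LIKE 'Assembly'")
not_like = serials("WHERE op_name NOT LIKE 'Assembly'")

# Serial numbers that have no 'Assembly' row at all (assumed goal).
never_assembled = [row[0] for row in conn.execute("""
    SELECT serial_number FROM unit
    GROUP BY serial_number
    HAVING SUM(CASE WHEN op_name = 'Assembly' THEN 1 ELSE 0 END) = 0
    ORDER BY serial_number
""")]

print(all_serials, like, not_like, never_assembled)
```

Serial number 1 shows up under both LIKE and NOT LIKE because it has one row of each kind, and serial number 4 shows up under neither because its op_name is NULL, so the two counts do not have to add up to the total.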
