TSQL Least number of appearances - sql-server

I want to find the "Balie" (check-in desk) with the fewest "Maatschappijen" (airlines) booked on it. So far I have this query, which displays all "Balies" and the "Maatschappijen" linked to them. The desired result is a single "balienummer" record: the one with the fewest "maatschappijen" booked on it.
Query
SELECT [Balie].[balienummer], [IncheckenBijMaatschappij].[balienummer], [IncheckenBijMaatschappij].[maatschappijcode]
FROM [Balie]
JOIN [IncheckenBijMaatschappij]
ON [Balie].[balienummer] = [IncheckenBijMaatschappij].[balienummer]
Query result
balienummer balienummer maatschappijcode
1 1 BA
1 1 TR
2 2 AF
2 2 NZ
3 3 KL
4 4 KL
LRS: https://www.dropbox.com/s/f2l9a874d5witpt/LRS_CasusGelreAirport.pdf

SELECT [Balie].[balienummer], count([IncheckenBijMaatschappij].[maatschappijcode])
FROM [Balie]
JOIN [IncheckenBijMaatschappij]
ON [Balie].[balienummer] = [IncheckenBijMaatschappij].[balienummer]
GROUP BY [Balie].[balienummer]
ORDER BY count([IncheckenBijMaatschappij].[maatschappijcode])
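The first record should be your answer. To return only that record, write SELECT TOP 1 in place of SELECT. Note that with the sample data balies 3 and 4 tie at one maatschappij each, so TOP 1 returns just one of them; TOP 1 WITH TIES returns both.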
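For anyone who wants to try the grouping without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module and the sample rows from the question. It is only an illustration: SQLite has no TOP, so LIMIT 1 stands in for it, and the alias aantal and the tie-breaking sort on balienummer are my own assumptions.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Balie (balienummer INTEGER PRIMARY KEY);
    CREATE TABLE IncheckenBijMaatschappij (balienummer INTEGER, maatschappijcode TEXT);
    INSERT INTO Balie VALUES (1), (2), (3), (4);
    INSERT INTO IncheckenBijMaatschappij VALUES
        (1, 'BA'), (1, 'TR'), (2, 'AF'), (2, 'NZ'), (3, 'KL'), (4, 'KL');
""")

# Count maatschappijen per balie, sort ascending, keep only the first row.
# LIMIT 1 is SQLite's counterpart of T-SQL's TOP 1; the second sort key
# (balienummer) is an assumption that makes ties deterministic.
least_booked = conn.execute("""
    SELECT b.balienummer, COUNT(i.maatschappijcode) AS aantal
    FROM Balie AS b
    JOIN IncheckenBijMaatschappij AS i ON b.balienummer = i.balienummer
    GROUP BY b.balienummer
    ORDER BY aantal, b.balienummer
    LIMIT 1
""").fetchone()

print(least_booked)  # (balienummer, number of maatschappijen)
```

Because of the inner join, a balie with no bookings at all would not appear in the result; a LEFT JOIN would be needed to include it with a count of 0.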

Related

Alternative solutions to an array search in PostgreSQL

I am not sure if my database design is good for this tricky case, and I would also like help with what the query for it could look like.
I plan a query with the following table:
search_array | value | id
-----------------------+-------+----
{XYa,YZb,WQb} | b | 1
{XYa,YZb,WQb,RSc,QZa} | a | 2
{XYc,YZa} | c | 3
{XYb} | a | 4
{RSa} | c | 5
There are 5 main elements in the search_array: XY, YZ, WQ, RS, QZ, and 3 values: a, b, c, one of which is concatenated to each element.
Each row also has one value: a, b or c.
My aim is to find all rows that fit a specific row in this sense: first, it should be checked whether they have any main elements in common in their search_arrays (marked yellow in the example).
For example:
Row id 4 and row id 5 wouldn't match because XY != RS.
Row id 1, 2 and 3 would match two times because they all have XY and YZ.
Row id 1 and 2 would even match three times because they also have WQ in common.
Second: if there is a main element match, it should be 'cross-checked' whether the lowercase letter after the main element fits the value of the other row.
For example: the only match for row id 1 in the table would be row id 4, because they both search for XY and the lowercase letter after each element matches the value of the other row.
Another match would be row id 2 and 5 with RS: search c to value c and search a to value a (marked green and orange).
My idea was to cut the search_array elements in the query into two parts with the RIGHT and LEFT string functions. But I don't know how to combine the subqueries for this search.
Or would a completely different solution be faster? Like splitting the search array into another table with the columns 'foreign key' to the main table, 'main element' and 'searched_value'. I am not sure if this is the best solution, because the program would constantly have to go back to the main table to find two rows out of 3 million and compare their searched_values to the values.
Thank you very much for your answers and your time!
You'll have to represent the data in a normalized fashion. I'll do it in a WITH clause, but it would be better to store the data in this fashion to begin with.
WITH unravel AS (
SELECT t.id, t.value,
substr(u.val, 1, 2) AS arr_main,
substr(u.val, 3, 1) AS arr_val
FROM mytable AS t
CROSS JOIN LATERAL unnest(t.search_array) AS u(val)
)
SELECT a.id AS first_id,
a.value AS first_value,
b.id AS second_id,
b.value AS second_value,
a.arr_main AS main_element
FROM unravel AS a
JOIN unravel AS b
ON a.arr_main = b.arr_main
AND a.arr_val = b.value
AND b.arr_val = a.value
AND a.id <> b.id; -- a row should not match itself (e.g. row 2 searches XYa and has value a)
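To make the logic of the unravel-and-self-join easier to follow, here is a rough sketch of the same idea in plain Python over the five sample rows. It is not PostgreSQL and not meant for 3 million rows; the variable names are made up for illustration, and skipping pairs where a row meets itself is an assumption on my part.

```python
# Sample rows from the question: (id, value, search_array).
rows = [
    (1, "b", ["XYa", "YZb", "WQb"]),
    (2, "a", ["XYa", "YZb", "WQb", "RSc", "QZa"]),
    (3, "c", ["XYc", "YZa"]),
    (4, "a", ["XYb"]),
    (5, "c", ["RSa"]),
]

# "Unravel": one record per array element, split into the two-letter
# main element and the lowercase searched value that follows it.
unravel = [
    (row_id, value, item[:2], item[2:])
    for row_id, value, items in rows
    for item in items
]

# Self-join: same main element, and each side searches for the other
# side's value. Excluding a_id == b_id is an assumption (no self-matches).
matches = sorted(
    (a_id, b_id, a_main)
    for a_id, a_value, a_main, a_val in unravel
    for b_id, b_value, b_main, b_val in unravel
    if a_main == b_main
    and a_val == b_value
    and b_val == a_value
    and a_id != b_id
)

print(matches)
```

Each pair shows up in both directions, so filtering on the first id gives all matches for one specific row.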

Need to return the first instance found, instead of all

I have a query that I have made do with over the last few years, and which leans heavily on users entering data in a somewhat correct order (I know, first mistake).
The idea is that users enter an item with the category CRT in the first 7 lines, and only one. Sometimes there may be other items with this category, but I only want the first one. Now, users sometimes enter the item after the first 7 rows, which leads to another problem. Here is my original query:
Select distinct
d.Job_Number as Rig,
case d.Customer_Name
when 'case' then 'diff case'
else '******'
end as Parent_Customer,
d.Customer_Name,
case d.Reference_Location2
when 'case' then 'diff case'
else '******'
end as Size,
case d.Office_code
when 'case' then 'diff case'
else '****'
end as Office,
d.Rental_Ticket,
case d.job_type
when 'case' then 'diff case'
else '0'
end as Billable,
d.Reference_Location5 as TT
from HP_View_DEL_Ticket_Header_Master as d
left join CSView_INVC_Header_Master as i --left join to report jobs that have not been invoiced
on d.Rental_Ticket = i.Rental_Ticket_or_Tag_Number
join CSView_DEL_Ticket_Lines_Master as l
on d.Rental_Ticket = l.Rental_Ticket
where (d.Ticket_Month between 7 and 7 and d.Ticket_Year = 2017)-- or (d.Ticket_Month between 12 and 12 and d.Ticket_Year = 2016))
and d.Rental_Ticket not in (select dticket from deltick_void)
and d.Ticket_Type = 'D'
and d.Posted_Flag = -1
and l.Row_Counter between 1 and 7 --Currently how I get the correct record, problem is its sometimes not entered in the first 7 rows.
and l.Category_Code = 'CRT'
order by Parent_Customer
I need one ticket number mapped to one CRT item (the first CRT item). I've tried doing something in the WHERE clause to find the top CRT items, but it didn't work the way I figured it would.
Any help would be great.
Thanks
BD
EDIT:
Expected OUTPUT
Rig | Parent_Customer | Customer_Name | Size | Office | Rental_Ticket | Billable | TT
RIG 642 | A | A | 5 to 5.5 | City | 12627 | 0 | YES
RIG 525 | B | B | 5 1/2 | City 2 | 12628 | 0 | YES
RIG 603 | C | C | 9 5/8 | City 3 | 12634 | 0 | NO
Incorrect OUTPUT:
Rig | Parent_Customer | Customer_Name | Size | Office | Rental_Ticket | Billable | TT
RIG 642 | A | A | 5 to 5.5 | City | 12627 | 0 | YES
RIG 642 | A | A | 5 to 5.5 | City | 12627 | 0 | YES
Because the query finds more than one 'CRT' on the Rental Ticket, it returns two records that are identical in the columns I have selected. I want to use ONLY the first one it finds.
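One possible direction, sketched here with Python's built-in sqlite3 module rather than SQL Server: take the lowest Row_Counter among the CRT lines of each ticket and join back to it, instead of hard-coding "between 1 and 7". The table ticket_lines, the Item column and all rows below are hypothetical stand-ins for CSView_DEL_Ticket_Lines_Master; only Rental_Ticket, Row_Counter and Category_Code come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the ticket lines view.
    CREATE TABLE ticket_lines (
        Rental_Ticket INTEGER, Row_Counter INTEGER, Category_Code TEXT, Item TEXT
    );
    INSERT INTO ticket_lines VALUES
        (12627, 1, 'PIPE', 'item-a'),
        (12627, 3, 'CRT',  'item-b'),
        (12627, 9, 'CRT',  'item-c'),   -- second CRT on the same ticket
        (12628, 2, 'PIPE', 'item-d'),
        (12628, 8, 'CRT',  'item-e');   -- first CRT entered after row 7
""")

# For each ticket, find the smallest Row_Counter among its CRT lines,
# then join back to fetch exactly that one line.
first_crt = conn.execute("""
    SELECT l.Rental_Ticket, l.Row_Counter, l.Item
    FROM ticket_lines AS l
    JOIN (
        SELECT Rental_Ticket, MIN(Row_Counter) AS first_row
        FROM ticket_lines
        WHERE Category_Code = 'CRT'
        GROUP BY Rental_Ticket
    ) AS f
      ON f.Rental_Ticket = l.Rental_Ticket
     AND f.first_row = l.Row_Counter
    ORDER BY l.Rental_Ticket
""").fetchall()

print(first_crt)
```

This returns one CRT line per ticket no matter where in the ticket it was entered, assuming Row_Counter is unique within a ticket.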

Performing COUNT() on a computed Column from a VIEW

So all I want to do is have a view that shows how many kids between the ages of 5 and 18 (inclusive) are in each family. I AM USING SQL SERVER.
The view I have written to get the family members' ages is
CREATE VIEW VActiveMembers
AS
SELECT
TM.intMemberID AS intMemberID,
TM.strFirstName AS strFirstName,
TM.strLastName AS strLastName,
TM.strEmailAddress AS strEmailAddress,
TM.dtmDateOfBirth AS dtmDateOfBirth,
FLOOR(DATEDIFF(DAY, dtmDateOfBirth, GETDATE()) / 365.25) AS intMemberAge
FROM
TMembers AS TM
WHERE
TM.intStatusFlagID = 1
intStatusFlagID = 1 is just a flag that means the member is active.
Now I have tried for about 3 hours to figure this out, but I cannot. Here is an attempt where, instead of trying to get the solution in one fell swoop, I tried to build it up stepwise, but I still didn't get the result I wanted.
As you can see, I didn't use the age calculated in the view, because of the error "Multi-part identifier could not be bound". I have seen that error before, but I couldn't get it to go away in this case. Ideally I would like the count to be performed on the VIEW instead of recalculating the ages all over again.
CREATE VIEW VActiveFamilyMembersK12Count
AS
SELECT
TF.intParishFamilyID,
COUNT(DATEDIFF(DAY, dtmDateOfBirth, GETDATE()) / 365) AS intMemberAgeCount
FROM
TFamilies AS TF
INNER JOIN
TFamilyMembers AS TFM
INNER JOIN
VActiveMembers AS vAM ON (TFM.intMemberID = vAM.intMemberID)
ON (TFM.intParishFamilyID = TF.intParishFamilyID)
WHERE
TF.intStatusFlagID = 1
GROUP BY
TF.intParishFamilyID
I just wanted to get a count using the age calculation to see if I could get a correct count of members in a family; then I could build on that to get a count of members of a certain age. The result I get back is 2, but there are guaranteed to be 3 members in each family.
The result I am looking for is this
Family_ID | K12Count
-----------------------------
1001 | 2
1002 | 0
1003 | 1
1004 | 0
Here is a list of resources I looked up trying to figure this out, maybe one of them is in fact the answer and I just don't see it, but I am at a loss at the moment.
SQL Select Count from below a certain age
How to get count of people based on age groups using SQL query in Oracle database?
Count number of user in a certain age's range base on date of birth
Conditional Count on a field
http://timmurphy.org/2010/10/10/conditional-count-in-sql/
*** EDIT ***
CREATE VIEW VActiveFamilyMembersK12Count
AS
SELECT
TF.intParishFamilyID,
SUM(CASE WHEN intMemberAge >= 5 AND intMemberAge <= 18 THEN 1 ELSE 0 END) AS intK12Count
FROM
TFamilies AS TF
INNER JOIN TFamilyMembers AS TFM
INNER JOIN VActiveMembers AS vAM
ON (TFM.intMemberID = vAM.intMemberID)
ON (TFM.intParishFamilyID = TF.intParishFamilyID)
WHERE
TF.intStatusFlagID = 1
GROUP BY
TF.intParishFamilyID
GO
THIS IS THE SOLUTION ABOVE.
Conditional count is the way to go.
Something like:
SELECT TF.intParishFamilyID,
COUNT(CASE WHEN intMemberAge >= 5 AND intMemberAge <= 18 THEN 1 END) AS intK12Count -- no ELSE branch: COUNT ignores the resulting NULLs
FROM
TFamilies AS TF
INNER JOIN TFamilyMembers AS TFM
INNER JOIN VActiveMembers AS vAM
ON (TFM.intMemberID = vAM.intMemberID)
ON (TFM.intParishFamilyID = TF.intParishFamilyID)
WHERE
TF.intStatusFlagID = 1
GROUP BY
TF.intParishFamilyID
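The difference between SUM(CASE ...) and COUNT(CASE ...) is easy to get wrong, so here is a small sketch using Python's built-in sqlite3 module. The members table and its ages are made up for illustration (three members per family, chosen so the K12 counts come out as 2, 0, 1, 0 like the desired result); the aggregate behaviour shown is standard SQL and applies to SQL Server as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, already-flattened data: one row per active member.
    CREATE TABLE members (family_id INTEGER, age INTEGER);
    INSERT INTO members VALUES
        (1001, 40), (1001, 12), (1001, 7),
        (1002, 35), (1002, 33), (1002, 2),
        (1003, 50), (1003, 18), (1003, 21),
        (1004, 60), (1004, 58), (1004, 4);
""")

rows = conn.execute("""
    SELECT family_id,
           -- adds 1 for each member aged 5 to 18, 0 otherwise
           SUM(CASE WHEN age BETWEEN 5 AND 18 THEN 1 ELSE 0 END)   AS sum_case,
           -- CASE yields NULL outside the range, and COUNT skips NULLs
           COUNT(CASE WHEN age BETWEEN 5 AND 18 THEN 1 END)        AS count_no_else,
           -- 0 is not NULL, so this counts every member
           COUNT(CASE WHEN age BETWEEN 5 AND 18 THEN 1 ELSE 0 END) AS count_else_0
    FROM members
    GROUP BY family_id
    ORDER BY family_id
""").fetchall()

for row in rows:
    print(row)
```

The first two columns agree; the third is always the family size, because COUNT counts any non-NULL value, including 0.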

Using multiple aggregate functions - sum and count

I've tried several of the solutions to my question on the site but could not find one that worked. Please help!
Other than taking some liberties with the report_names, the data is realistic and is just a small portion of what I am up against: roughly 97K rows with the same kind of repetition of branch, file_count and report_name. The file numbers are unique and insignificant; they are included for informational purposes and explain why the amounts are unique, since each amount is tied to the file_name.
I am looking for one row per report_name with the sums of the two amounts.
Here are the current results to my query:
branch file_count file_volume net_profit report_name file_number
Northeast 1 $200,000.00 $200,000.00 bogart.hump.new 12345
Northeast 1 $195,000.00 $197,837.00 bogart.hump.new 23456
Northeast 1 $111,500.00 $113,172.00 bogart.hump.new 34567
Northwest 1 $66,000.00 -$1,500.18 jolie.angela.new 45678
Northwest 1 $159,856.00 -$2,745.58 jolie.angela.new 56789
Northwest 1 $140,998.00 -$2,421.69 jolie.angela.new 67890
Southwest 1 $74,000.00 $73,904.00 Man.bat.net 78901
Southwest 1 $186,245.00 -$4,231.25 Man.bat.net 89012
Southwest 1 $72,375.00 $73,641.00 Man.bat.net 90123
Southeast 1 $79,575.00 -$1,821.76 zep.led.new 1234A
Southeast 1 $268,600.00 $268,600.00 zep.led.new 2345A
Southeast 1 $77,103.00 -$1,751.68 zep.led.new 3456A
This is what I am looking for:
branch file_count file_volume net_profit report_name file_number
Northeast 3 $506,500.00 $511,009.00 bogart.hump.new
Northwest 3 $366,854.00 -$6,667.45 jolie.angela.new
Southwest 3 $332,620.00 $143,313.75 Man.bat.net
Southeast 3 $425,278.00 $265,026.56 zep.led.new
My query:
SELECT
branch,
count(filenumber) AS file_count,
sum(fileAmount) AS file_amount,
sum(netprofit*-1) AS net_profit,
concat(d2.lastname,'.',d2.firstname,'.','new') AS report_name,
FROM user.summary u
inner join user.db1 d1 ON d1.loaname = u.loaname
inner join user.db2 d2 ON d2.cn = u.loaname
WHERE d2.filedate = '2015-09-01'
AND filenumber is not null
GROUP BY branch,concat(d2.lastname,'.',d2.firstname,'.','new')
The only issue I see with your current query is the comma at the end of this line, which would give you a syntax error:
concat(d2.lastname,'.',d2.firstname,'.','new') AS report_name,
If you want the blank field file_number as shown in your desired result set though, you could leave the comma and follow it with the blank field by adding to it:
concat(d2.lastname,'.',d2.firstname,'.','new') AS report_name,
'' file_number
I figured it out, but could not have done it without airing it out in this forum. In my actual query I included the "file_name" column, so I had both the "count(file_name)" and "file_name" columns, but in my query example I only had the "count(file_name)" column. When I removed the "file_name" column from my actual query, it worked. Side note: it was obvious that I had left a key component out of my question. On any future query questions I will include the complete query, but substitute the actual column names with col1, col2, db1, db2, etc. Thanks very much for responding to my question.
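For later readers, the effect described here can be reproduced with a small sketch using Python's built-in sqlite3 module and the Northeast and Northwest rows from the question. The flat table files and its column names are assumptions made for the example; the point is only that selecting and grouping by the unique file number puts every file in its own group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flat table built from the sample rows in the question.
    CREATE TABLE files (branch TEXT, report_name TEXT, file_number TEXT, file_volume INTEGER);
    INSERT INTO files VALUES
        ('Northeast', 'bogart.hump.new',  '12345', 200000),
        ('Northeast', 'bogart.hump.new',  '23456', 195000),
        ('Northeast', 'bogart.hump.new',  '34567', 111500),
        ('Northwest', 'jolie.angela.new', '45678',  66000),
        ('Northwest', 'jolie.angela.new', '56789', 159856),
        ('Northwest', 'jolie.angela.new', '67890', 140998);
""")

# Grouping only by branch and report_name collapses the files into one row each.
summed = conn.execute("""
    SELECT branch, report_name, COUNT(file_number) AS file_count, SUM(file_volume) AS file_volume
    FROM files
    GROUP BY branch, report_name
    ORDER BY branch
""").fetchall()

# Adding the unique file_number to the grouping gives every file its own group,
# so every count is 1 and nothing is summed.
per_file = conn.execute("""
    SELECT branch, report_name, file_number, COUNT(file_number) AS file_count
    FROM files
    GROUP BY branch, report_name, file_number
""").fetchall()

print(summed)
print(len(per_file), "rows when file_number is part of the grouping")
```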

Fill secondly data from Q KDB+

I have a csv file with some high-frequency stock price data, and I'd like to get second-by-second price data from the table.
In each file, there are columns named date, time, symbol, price, volume, etc.
There is no trading in some seconds, so data is missing for those seconds.
How can I fill in the missing data in q to get complete second-by-second data from 9:30 to 16:00? If a price is missing, just use the most recent price as the price for that second.
I'm considering writing a loop, but I don't know exactly how to do that.
Simplifying a little, I'll assume you have some random timestamps in your dataset like this:
time price
--------------------------------------
2015.01.20D22:42:34.776607000 7
2015.01.20D22:42:34.886607000 3
2015.01.20D22:42:36.776607000 4
2015.01.20D22:42:37.776607000 8
2015.01.20D22:42:37.886607000 7
2015.01.20D22:42:39.776607000 9
2015.01.20D22:42:40.776607000 4
2015.01.20D22:42:41.776607000 9
so there are some missing seconds there. I'm going to call this table t. So if you do a by-second type of query, obviously the seconds that are missing are still missing:
q)select max price by time.second from t
second | price
--------| -----
22:42:34| 7
22:42:36| 4
22:42:37| 8
22:42:39| 9
22:42:40| 4
22:42:41| 9
To get missing seconds, you have to join a list of nulls. In this case we know the data goes from 22:42:34 to 22:42:41, but in reality you'll have to find the min/max time and use that to create a temporary "null" table to join against:
q)([] second:22:42:34 + til 1+`int$22:42:41-22:42:34 ; price:(1+`int$22:42:41-22:42:34)#0N)
second price
--------------
22:42:34
22:42:35
22:42:36
22:42:37
22:42:38
22:42:39
22:42:40
22:42:41
Then left join:
q)([] second:22:42:34 + til 1+`int$22:42:41-22:42:34 ; price:(1+`int$22:42:41-22:42:34)#0N) lj select max price by time.second from t
second price
--------------
22:42:34 7
22:42:35
22:42:36 4
22:42:37 8
22:42:38
22:42:39 9
22:42:40 4
22:42:41 9
You can use fills or whatever your favourite filling heuristic is after that.
q)fills `second xasc asc ([] second:22:42:34 + til 1+`int$22:42:41-22:42:34 ; price:(1+`int$22:42:41-22:42:34)#0N) lj select max price by time.second from t
second price
--------------
22:42:34 7
22:42:35 7
22:42:36 4
22:42:37 8
22:42:38 8
22:42:39 9
22:42:40 4
22:42:41 9
(Note the sort on second before fills!)
By the way for larger tables this will be much faster than a loop. Loops in q are generally a bad idea.
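If it helps to see the three steps (aggregate per second, join onto a full grid of seconds, forward-fill) outside of q, here is a rough sketch in plain Python using the sample ticks from above. It is purely illustrative: it uses exactly the kind of loop that should be avoided in q, and the variable names are made up.

```python
from datetime import datetime, timedelta

# Sample ticks from the table above: (time of day, price).
ticks = [
    ("22:42:34.776607", 7), ("22:42:34.886607", 3), ("22:42:36.776607", 4),
    ("22:42:37.776607", 8), ("22:42:37.886607", 7), ("22:42:39.776607", 9),
    ("22:42:40.776607", 4), ("22:42:41.776607", 9),
]

# Step 1: max price per whole second (like "select max price by time.second").
by_second = {}
for stamp, price in ticks:
    second = datetime.strptime(stamp, "%H:%M:%S.%f").replace(microsecond=0)
    by_second[second] = max(price, by_second.get(second, price))

# Steps 2 and 3: walk every second from the first to the last one seen
# (the "null" grid) and carry the last known price forward (like fills).
filled = []
last_price = None
second, end = min(by_second), max(by_second)
while second <= end:
    last_price = by_second.get(second, last_price)
    filled.append((second.strftime("%H:%M:%S"), last_price))
    second += timedelta(seconds=1)

for row in filled:
    print(row)
```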
EDIT
You could use a comma join too; both tables need to be keyed on the second column:
t,t1
(where t1 is the null-filled table keyed on second)
I haven't tested it, but I suspect it would be slightly faster than the lj version.
Using aj, which is one of the most powerful features of kdb+:
q)data
sym time price size
----------------------------
MS 10:24:04 93.35974 8
MS 10:10:47 4.586986 1
APPL 10:50:23 0.7831685 1
GOOG 10:19:52 49.17305 0
The in-memory table needs to be sorted by sym and time, with the g# attribute applied to the sym column:
q)data:update `g#sym from `sym`time xasc data
q)meta data
c | t f a
-----| -----
sym | s g
time | v
price| f
size | j
Creating a rack table with one row per second per sym:
q)rack: `sym`time xasc (select distinct sym from data) cross ([] time:{x[0]+til `int$x[1]-x[0]}(min;max)@\:data`time)
Using aj to join the data:
q)aj[`sym`time; rack; data]
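For readers unfamiliar with as-of joins, the idea is roughly this: for every (sym, time) row in the rack, pick the most recent trade of that sym at or before that time. Below is a small sketch of that behaviour in plain Python. The trades are hypothetical, times are simplified to integer seconds, and, unlike the til expression above, the rack here includes the final second.

```python
from bisect import bisect_right

# Hypothetical trades per sym, each list sorted by time: (time in seconds, price).
trades_by_sym = {
    "GOOG": [(1, 49.0)],
    "MS": [(0, 4.5), (3, 4.75)],
}

# Rack: every sym crossed with every second between the overall min and max time.
all_times = [t for rows in trades_by_sym.values() for t, _ in rows]
rack = [
    (sym, t)
    for sym in sorted(trades_by_sym)
    for t in range(min(all_times), max(all_times) + 1)
]

def asof_join(rack, trades_by_sym):
    """For each (sym, time), attach the last price at or before that time."""
    out = []
    for sym, t in rack:
        rows = trades_by_sym[sym]
        times = [time for time, _ in rows]
        i = bisect_right(times, t)            # number of trades with time <= t
        price = rows[i - 1][1] if i else None  # None when no trade has happened yet
        out.append((sym, t, price))
    return out

joined = asof_join(rack, trades_by_sym)
for row in joined:
    print(row)
```

Seconds before the first trade of a sym stay empty, just as aj leaves nulls where there is no earlier row to match.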
