TSQL: Count appearances grouped by values in another table - sql-server

I need to group some data to show in a graph but... it is too difficult for me :-(
In one table I have customer info, including Name, Kgs and yearly turnover:
CustomerA 8 415.86
CustomerB 145846 6815.80
..............
CustomerZC 25160 25690.30
and I need to COUNT how many customers bought less than 50 Kgs, how many bought from 51 to 100, from 101 to 1,000, from 1,001 to 30,000 and so on.
Since the group limits are not uniform, the boundaries of each range are stored in another table, which looks like this:
Group0 0-50
Group1 51-100
Group2 101-1000
.....
Group15 1000001-5000000
Group16 5000001-9999999999
but I can modify it if that helps.
My target is to get a result like this:
0-50 14217
51-100 6425
101-1000 841
....
1000001-5000000 43
Right now I achieve this result by running 15 different queries, but I would like a single, general approach that can adapt to a variable number of groups.
Thanks

This one is similar; take a look at the second option, which joins to a range table.
In your case, it would look something like this:
select r.boundary_name, count(c.kgs) as cnt
from ranges r
left join customers c
  on c.kgs between r.low_range and r.high_range
group by r.boundary_name;
Naturally you'd need to tweak the join if you're looking for exclusive ranges vs. inclusive, and the ranges table will need a low and high bound column. Counting c.kgs rather than count(*) means ranges with no matching customers correctly show 0 instead of 1.
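For completeness, a minimal sketch of what such a ranges table could look like (the names ranges, boundary_name, low_range and high_range are assumptions matching the query above, not anything from your schema):
-- Hypothetical ranges table; adjust names and types to your actual schema.
CREATE TABLE ranges (
    boundary_name varchar(30)   NOT NULL, -- label shown in the result, e.g. '0-50'
    low_range     decimal(18,2) NOT NULL, -- inclusive lower bound in Kgs
    high_range    decimal(18,2) NOT NULL  -- inclusive upper bound in Kgs
);
INSERT INTO ranges (boundary_name, low_range, high_range) VALUES
('0-50',                     0,         50),
('51-100',                  51,        100),
('101-1000',               101,       1000),
-- ... remaining groups ...
('5000001-9999999999', 5000001, 9999999999);
Adding or removing rows in this table is all it takes to change the number of groups; the query itself stays the same.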

Related

Extracting all records using a conditional SQL Server query?

I have a long database of observations for individuals. There are multiple observations for each individual, each assigned a different medcodeid.
I want to extract all records of individuals with certain medcodeids assigned, but only if they have at some point been assigned one of a smaller list of specific codes.
This is an example of what I start with:
long dataset, multiple observations
and these are the records I'd like to extract:
multiple observations, but patients 3 and 5 are not extracted, as they never had a medcode 12
Would this be an additional WHERE clause? I am struggling because that would then only extract rows matching the second medcodeid list, but I want to extract all of an individual's rows if they have had one of these few specific codes at some point. I hope that makes sense. I am unfamiliar with the IF command, and cannot see how CASE WHEN would work either.
Thank you very much in advance!
You definitely don't want to filter out all the rows, so you're right that an additional condition alone won't help. WHERE only lets you look at the current row, while you're trying to make a decision based on all the rows belonging to the patient.
This query uses a common table expression and an analytic count() that tags each row with the number of matches; a window function lets you look outside the current row, which is just what you need.
-- my additions to your query are in lowercase
with data as (
    SELECT obs.patid, yob, obsdate, medcodeid,
           count(case when medcodeid IN (<list of mandatory codes>) then 1 end)
               over (partition by obs.patid) as medcode_count
    -- assuming the relationship looks something like this
    from obs inner join medcode on medcode.patid = obs.patid
    WHERE medcodeid IN (<list of codes>)
      AND obsdate BETWEEN '2004-12-31' AND GETDATE()
      AND patienttypeid = 3 AND acceptable = 1 AND gender = 2
      AND YEAR(obsdate) - yob > 15 AND YEAR(obsdate) - yob < 45
)
select * from data where medcode_count > 0;
At first I thought you were requiring that at least five of the codes from the full set were found. Now that you've edited the question I believe that you want to require that at least one code from a smaller subset is present. Either way this approach will work.
If I'm understanding what you're asking, I think what you need is an additional WHERE clause with a subquery. This could be done with an EXISTS or a join, but I find an IN query easier to work with.
You left the FROM out of your query, so I had to guess at it, but try this:
SELECT
obs.patid,
yob,
obsdate,
medcodeid
FROM
obs
WHERE
medcodeid IN (list of 20 codes)
AND (obsdate BETWEEN '2004-12-31' AND GETDATE())
AND patienttypeid = 3
AND acceptable = 1
AND gender = 2
AND ((YEAR(obsdate))-yob) > 15
AND ((YEAR(obsdate)) - yob) < 45
AND obs.patid IN (
SELECT
obs.patid
FROM
obs
WHERE
medcodeid IN (5 of the 20 codes)
);
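For reference, the same filter written with the EXISTS form mentioned above would look roughly like this; it keeps the same placeholders as the query above, so treat it as a sketch:
SELECT
    obs.patid,
    yob,
    obsdate,
    medcodeid
FROM
    obs
WHERE
    medcodeid IN (list of 20 codes)
    AND (obsdate BETWEEN '2004-12-31' AND GETDATE())
    AND patienttypeid = 3
    AND acceptable = 1
    AND gender = 2
    AND ((YEAR(obsdate)) - yob) > 15
    AND ((YEAR(obsdate)) - yob) < 45
    -- EXISTS checks for at least one qualifying row for the same patient
    AND EXISTS (
        SELECT 1
        FROM obs AS obs2
        WHERE obs2.patid = obs.patid
          AND obs2.medcodeid IN (5 of the 20 codes)
    );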

Find Whole Duplicated Invoices in SQL

I'm trying to write some SQL to find possible duplicated invoices, i.e. invoices that have the same items with the same quantities and so may have been issued twice.
Invoices average around 300 items each.
There are around 2,500 invoices to review.
The following is a sample of invoices with only one item or so, but in the real data the average is 300 items:
Inv_ID Item_Code Item_Q
A-800 101010 24
A-801 101010 24
A-802 202020 9
A-803 101010 18
A-804 202020 9
A-805 202020 9
A-806 101010 18
The expected result would be:
A-800, A-801
A-802, A-804, A-805
A-803, A-806
But each invoice has around 200 items, and duplicated invoices have to have the same items with exactly the same quantities.
It's SQL Server.
The result needs to match whole invoices: for example, if invoice A has 300 different item lines, each with quantity 2, the result should list every invoice that has exactly the same 300 items with exactly the same quantities.
The supplier has issued multiple duplicated invoices to our accounting department by mistake over 4 years. This was discovered by chance, so we need to find the duplicated invoices and remove them from the payment schedule.
Issued invoices need to have exactly the same items with exactly the same quantities to be considered duplicates.
You didn't specify your DBMS product, so this answer is for Postgres.
select string_agg(inv_id, ',' order by inv_id) as inv_ids
from the_table
group by item_code, item_q
order by inv_ids;
Other DBMS products have similar functions to do the aggregation: in Oracle you would use listagg(), in SQL Server string_agg() (2017 and later), and in MySQL group_concat(). The individual syntax is slightly different (check the manual), but the idea is the same.
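Since the question turned out to be SQL Server and the goal is to match entire invoices (every item with its exact quantity), one possible sketch, assuming SQL Server 2017+ and a table named invoice_items with the columns shown in the sample, is to build an ordered per-invoice fingerprint and group on it:
-- Build one ordered 'item:quantity' fingerprint per invoice, then report
-- fingerprints shared by two or more invoices. Casting to varchar(max)
-- avoids the 8000-byte STRING_AGG limit for very large invoices.
WITH fingerprints AS (
    SELECT Inv_ID,
           STRING_AGG(CONVERT(varchar(max), CONCAT(Item_Code, ':', Item_Q)), '|')
               WITHIN GROUP (ORDER BY Item_Code) AS fingerprint
    FROM invoice_items
    GROUP BY Inv_ID
)
SELECT STRING_AGG(Inv_ID, ', ') WITHIN GROUP (ORDER BY Inv_ID) AS duplicated_invoices
FROM fingerprints
GROUP BY fingerprint
HAVING COUNT(*) > 1;   -- only fingerprints shared by 2+ invoices
On the sample data this would return 'A-800, A-801', 'A-802, A-804, A-805' and 'A-803, A-806'.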

What is the most efficient way to store 2-D timeseries in a database (sqlite3)

I am performing large scale wind simulations to produce hourly wind patterns over a city. The result is a time series of 2-dimensional contours. Currently I am storing the results in SQLite3 database tables with the following structure:
Table: CFD
id, timestamp, velocity, cell_id
1 , 2010-01-01 08:00:00, 3.345, 1
2 , 2010-01-01 08:00:00, 2.355, 2
3 , 2010-01-01 08:00:00, 2.111, 3
4 , 2010-01-01 08:00:00, 6.432, 4
.., ..................., ....., .
1000 , 2010-01-01 09:00:00, 3.345, 1
1001 , 2010-01-01 10:00:00, 2.355, 2
1002 , 2010-01-01 11:00:00, 2.111, 3
1003 , 2010-01-01 12:00:00, 6.432, 4
.., ..................., ....., .
Actual create statement:
CREATE TABLE cfd(id INTEGER PRIMARY KEY, time DATETIME, u, cell_id integer)
CREATE INDEX idx_cell_id_cfd on cfd(cell_id)
CREATE INDEX idx_time_cfd on cfd(time)
(There are three of these tables, each for a different result variable)
where cell_id is a reference to the cell in the domain representing a location in the city. See this picture to have an idea of what it looks like at a specific timestep.
The typical query performs some kind of aggregation on the time dimension and group by on cell_id. For example, if I want to know the average local wind speed in each cell during a specific time interval, I would execute
select sum(time in ('2010-01-01 08:00:00','2010-01-01 13:00:00','2010-01-01 14:00:00', ...................., ,'2010-12-30 18:00:00','2010-12-30 19:00:00','2010-12-30 20:00:00','2010-12-30 21:00:00') and u > 5.0) from cfd group by cell_id
The number of timestamps can vary from 100 to 8,000.
This is fine for small databases, but it gets much slower for larger ones. For example, my last database was 60GB, 3 tables and each table had 222,000,000 rows.
Is there a better way to store the data? For example:
would it make sense to create a different table for each day?
would be better to use a separate table for the timesteps and then use a join?
is there a better way of indexing?
I have already adopted all the recommendations in this question to maximise the performance.
This particular query is hard to optimize because the sum() must be computed over all table rows. It is a better idea to filter rows with WHERE:
SELECT cell_id, count(*)
FROM cfd
WHERE time IN (...)
  AND u > 5
GROUP BY cell_id;
If possible, use a simpler expression to filter times, such as time BETWEEN a AND b.
It might be worthwhile to use a covering index, or in this case, when all queries filter on the time, a clustered index (without additional indexes):
CREATE TABLE cfd (
cell_id INTEGER,
time DATETIME,
u,
PRIMARY KEY (cell_id, time)
) WITHOUT ROWID;
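As a rough illustration, a typical aggregation against that layout might look like this; because the rows are stored ordered by (cell_id, time), each cell's rows are read contiguously without any additional index:
-- Average wind speed per cell over a time window; where possible,
-- BETWEEN replaces the long IN (...) list of timestamps.
SELECT cell_id, AVG(u) AS avg_u
FROM cfd
WHERE time BETWEEN '2010-01-01 08:00:00' AND '2010-12-30 21:00:00'
GROUP BY cell_id;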

How to merge rows of SQL data on column-based logic?

I'm writing a margin report on our General Ledger and I've got the basics working, but I need to merge the rows based on specific logic and I don't know how...
My data looks like this:
value1 value2 location date category debitamount creditamount
2029 390 ACT 2012-07-29 COSTS - Widgets and Gadgets 0.000 3.385
3029 390 ACT 2012-07-24 SALES - Widgets and Gadgets 1.170 0.000
And my report needs to display the two columns together like so:
plant date category debitamount creditamount
ACT 2012-07-29 Widgets and Gadgets 1.170 3.385
The logic to join them is contained in the value1 and value2 columns. Where the last 3 digits of value1 and all three digits of value2 are the same, the rows should be combined. Also, the first digit of value1 will always be 2 for sales and 3 for costs (not sure if that matters).
I.e. 2029-390 is money coming in for Widgets and Gadgets sold to customers, while 3029-390 is money being spent to buy the Widgets and Gadgets from suppliers.
How can I do this programmatically in my stored procedure? (SQL Server 2008 R2)
Edit: Would I load the 3000s into one table variable and the 2000s into another, then join the two on value2 and right(value1, 3)? Or something like that?
Try this:
SELECT RIGHT(LTRIM(RTRIM(value1)), 3), value2, MAX(location),
       MAX(date), MAX(category), SUM(debitamount), SUM(creditamount)
FROM table1
GROUP BY RIGHT(LTRIM(RTRIM(value1)), 3), value2
It will sum the debit and credit amounts. For the other columns it takes the maximum string value; assuming those columns are always the same whenever value2 and the last 3 digits of value1 match, this shouldn't matter.
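Alternatively, here is a rough sketch of the self-join described in the edit, assuming value1 is stored as a character column (as the trimming above suggests) and that every 2xxx row has exactly one matching 3xxx row; table1 is the same placeholder name used above:
-- Pair each 2xxx row with its 3xxx counterpart and combine the amounts.
SELECT s.location AS plant,
       s.date,
       s.category,                              -- strip the 'SALES - '/'COSTS - ' prefix here if needed
       s.debitamount  + c.debitamount  AS debitamount,
       s.creditamount + c.creditamount AS creditamount
FROM table1 s
JOIN table1 c
  ON  c.value2 = s.value2
  AND RIGHT(LTRIM(RTRIM(c.value1)), 3) = RIGHT(LTRIM(RTRIM(s.value1)), 3)
WHERE LEFT(LTRIM(s.value1), 1) = '2'   -- one row of each pair
  AND LEFT(LTRIM(c.value1), 1) = '3';  -- its counterpart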

GROUP_CONCAT and DISTINCT are great, but how do i get rid of these duplicates i still have?

I have a MySQL table set up like so:
id uid keywords
-- --- ---
1 20 corporate
2 20 corporate,business,strategy
3 20 corporate,bowser
4 20 flowers
5 20 battleship,corporate,dungeon
What I WANT my output to look like is:
20 corporate,business,strategy,bowser,flowers,battleship,dungeon
but the closest I've gotten is:
SELECT DISTINCT uid, GROUP_CONCAT(DISTINCT keywords ORDER BY keywords DESC) AS keywords
FROM mytable
WHERE uid !=0
GROUP BY uid
which outputs:
20 corporate,corporate,business,strategy,corporate,bowser,flowers,battleship,corporate,dungeon
Does anyone have a solution? Thanks a ton in advance!
What you're doing isn't possible with pure SQL the way you have your data structured.
No SQL implementation is going to look at "Corporate" and "Corporate, Business" and see them as equal strings. Therefore, distinct won't work.
If you can control the database, the first thing I would do is change the data setup to be:
id uid keyword   <- note: keyword, not keywords - ONE value in this column, not a comma-delimited list
1 20 corporate
2 20 corporate
2 20 business
2 20 strategy
Better yet would be
id uid keywordId
1 20 1
2 20 1
2 20 2
2 20 3
with a separate table for keywords:
KeywordID KeywordText
1 Corporate
2 Business
Otherwise you'll need to massage the data in code.
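With a structure like that in place, the de-duplication becomes straightforward; the table names below (mytable_keywords for the link table and keywords for the lookup) are assumptions:
-- One row per (uid, keyword id); DISTINCT now works because each keyword is its own value.
SELECT mk.uid,
       GROUP_CONCAT(DISTINCT k.KeywordText ORDER BY k.KeywordText) AS keywords
FROM mytable_keywords mk
JOIN keywords k ON k.KeywordID = mk.keywordId
WHERE mk.uid != 0
GROUP BY mk.uid;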
Mmm, your keywords need to be in their own table (one record per keyword). Then you'll be able to do it, because the keywords will then GROUP properly.
Not sure if MySQL has this, but SQL Server has RANK() OVER (PARTITION BY ...) that you can use to assign each result a rank; doing so would allow you to select only those with rank 1 and discard the rest.
You have two options as I see it.
Option 1:
Change the way you store your data (keywords in their own table, joined to the existing table with a many-to-many relationship). This will allow you to use DISTINCT. DISTINCT doesn't work currently because the query sees "corporate" and "corporate,business,strategy" as two different values.
Option 2:
Write some 'interesting' SQL to split up the keywords strings. I don't know what the limits are in MySQL, but SQL in general is not designed for this.
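For what it's worth, one sketch of that 'interesting' SQL in MySQL splits the comma lists with SUBSTRING_INDEX against a small derived numbers table (extend it beyond 5 if any row can hold more keywords) and then de-duplicates:
-- Split each comma-delimited keywords value into one row per keyword,
-- then GROUP_CONCAT the distinct keywords per uid.
SELECT uid,
       GROUP_CONCAT(DISTINCT keyword ORDER BY keyword) AS keywords
FROM (
    SELECT t.uid,
           SUBSTRING_INDEX(SUBSTRING_INDEX(t.keywords, ',', n.n), ',', -1) AS keyword
    FROM mytable t
    JOIN (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3
          UNION ALL SELECT 4 UNION ALL SELECT 5) AS n
      ON n.n <= 1 + LENGTH(t.keywords) - LENGTH(REPLACE(t.keywords, ',', ''))
    WHERE t.uid != 0
) AS split
GROUP BY uid;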
