How to merge rows of SQL data on column-based logic? - sql-server

I'm writing a margin report on our General Ledger and I've got the basics working, but I need to merge the rows based on specific logic and I don't know how...
My data looks like this:
value1  value2  location  date        category                     debitamount  creditamount
2029    390     ACT       2012-07-29  COSTS - Widgets and Gadgets  0.000        3.385
3029    390     ACT       2012-07-24  SALES - Widgets and Gadgets  1.170        0.000
And my report needs to display the two columns together like so:
plant  date        category             debitamount  creditamount
ACT    2012-07-29  Widgets and Gadgets  1.170        3.385
The logic to join them is contained in the value1 and value2 columns. Where the last 3 digits of value1 and all three digits of value2 are the same, the rows should be combined. Also, the 1st digit of value1 will always be 2 for sales and 3 for costs (not sure if that matters).
I.e. 2029-390 is money coming in for Widgets and Gadgets sold to customers, while 3029-390 is money being spent to buy the Widgets and Gadgets from suppliers.
How can I do this programmatically in my stored procedure? (SQL Server 2008 R2)
Edit: Would I load the 3000's into one variable table and the 2000's into another, then join the two on value2 and right(value1, 3)? Or something like that?

Try this:
SELECT RIGHT(LTRIM(RTRIM(value1)), 3), value2, MAX(location),
       MAX(date), MAX(category), SUM(debitamount), SUM(creditamount)
FROM table1
GROUP BY RIGHT(LTRIM(RTRIM(value1)), 3), value2
It will sum the credit amount and debit amount. It will choose the maximum string value in the other columns; assuming those columns are always the same when value2 and the last 3 digits of value1 match, that shouldn't matter.
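If you prefer the approach from your edit (splitting the 2000's and 3000's and joining them), a rough sketch along those lines would be the following. It assumes value1/value2 are stored as character data and that there is exactly one row on each side per account pair; if there can be several, aggregate each side first:
SELECT a.location AS plant, a.[date], a.category,
       a.debitamount + b.debitamount   AS debitamount,
       a.creditamount + b.creditamount AS creditamount
FROM table1 a
JOIN table1 b
  ON  b.value2 = a.value2
  AND RIGHT(LTRIM(RTRIM(b.value1)), 3) = RIGHT(LTRIM(RTRIM(a.value1)), 3)
WHERE LEFT(LTRIM(a.value1), 1) = '2'   -- one side of each account pair
  AND LEFT(LTRIM(b.value1), 1) = '3'   -- the other side
Which side's date and category you keep (and whether you strip the "SALES - "/"COSTS - " prefix) is up to you.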

Related

Find Whole Duplicated Invoices in SQL

I'm trying to write some SQL that finds possible duplicated invoices, i.e. invoices that contain the same items with the same quantities and may therefore have been issued twice.
Invoices contain around 300 items each on average.
There are around 2,500 invoices to review.
The following is a sample of invoices with only one item or so each, but in the real data the average is 300 items:
Inv_ID Item_Code Item_Q
A-800 101010 24
A-801 101010 24
A-802 202020 9
A-803 101010 18
A-804 202020 9
A-805 202020 9
A-806 101010 18
Hoping the expected result will be:
A-800, A-801
A-802, A-804, A-805
A-803, A-806
But each invoice has around 200 items, and invoices only count as duplicates when they contain the same items with exactly the same quantities.
It's SQL Server.
The result needs to match whole invoices.
For example, if invoice A has 300 different item lines, each with quantity 2,
the result should only list invoices that have exactly the same 300 items with exactly the same quantities.
The supplier has issued multiple duplicated invoices to our accounting
department by mistake over 4 years. It was discovered by chance, so
we need to find the duplicated invoices to remove them from the payment
schedule.
The issued invoices need to have exactly the same items with exactly the same quantities to be considered duplicates.
You didn't specify your DBMS product, so this answer is for Postgres.
select string_agg(inv_id, ',' order by inv_id) as inv_ids
from the_table
group by item_code, item_q
order by inv_ids;
Online example
Other DBMS products have similar functions to do the aggregation. In Oracle you would use listagg(), in SQL Server string_agg() (SQL Server 2017 and later), and in MySQL group_concat(). The individual syntax is slightly different (you have to check the manual), but the idea is the same.
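Since the question mentions SQL Server, a rough whole-invoice sketch for SQL Server 2008+ could look like the following (the table name invoice_items is an assumption). It builds one signature string per invoice from all of its item/quantity pairs in a fixed order, then groups invoices whose signatures are identical, so only invoices whose complete item lists match are reported:
WITH sig AS (
    -- one row per invoice: all item/quantity pairs concatenated in a fixed order
    SELECT i.Inv_ID,
           STUFF((SELECT '|' + CAST(x.Item_Code AS varchar(20))
                             + ':' + CAST(x.Item_Q AS varchar(20))
                  FROM invoice_items x
                  WHERE x.Inv_ID = i.Inv_ID
                  ORDER BY x.Item_Code
                  FOR XML PATH('')), 1, 1, '') AS signature
    FROM invoice_items i
    GROUP BY i.Inv_ID
)
SELECT STUFF((SELECT ', ' + s2.Inv_ID
              FROM sig s2
              WHERE s2.signature = s.signature
              ORDER BY s2.Inv_ID
              FOR XML PATH('')), 1, 2, '') AS duplicate_invoices
FROM sig s
GROUP BY s.signature
HAVING COUNT(*) > 1;
On SQL Server 2017 or later, string_agg() can replace the FOR XML PATH trick.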

What is the most efficient way to store 2-D timeseries in a database (sqlite3)

I am performing large-scale wind simulations to produce hourly wind patterns over a city. The result is a time series of 2-dimensional contours. Currently I am storing the results in SQLite3 database tables with the following structure:
Table: CFD
id, timestamp, velocity, cell_id
1 , 2010-01-01 08:00:00, 3.345, 1
2 , 2010-01-01 08:00:00, 2.355, 2
3 , 2010-01-01 08:00:00, 2.111, 3
4 , 2010-01-01 08:00:00, 6.432, 4
.., ..................., ....., .
1000 , 2010-01-01 09:00:00, 3.345, 1
1001 , 2010-01-01 10:00:00, 2.355, 2
1002 , 2010-01-01 11:00:00, 2.111, 3
1003 , 2010-01-01 12:00:00, 6.432, 4
.., ..................., ....., .
Actual create statement:
CREATE TABLE cfd(id INTEGER PRIMARY KEY, time DATETIME, u, cell_id integer)
CREATE INDEX idx_cell_id_cfd on cfd(cell_id)
CREATE INDEX idx_time_cfd on cfd(time)
(There are three of these tables, each for a different result variable)
where cell_id is a reference to the cell in the domain representing a location in the city. See this picture to have an idea of what it looks like at a specific timestep.
The typical query performs some kind of aggregation on the time dimension and group by on cell_id. For example, if I want to know the average local wind speed in each cell during a specific time interval, I would execute
select sum(time in ('2010-01-01 08:00:00','2010-01-01 13:00:00','2010-01-01 14:00:00', ...................., ,'2010-12-30 18:00:00','2010-12-30 19:00:00','2010-12-30 20:00:00','2010-12-30 21:00:00') and u > 5.0) from cfd group by cell_id
The number of timestamps can vary from 100 to 8,000.
This is fine for small databases, but it gets much slower for larger ones. For example, my last database was 60GB, 3 tables and each table had 222,000,000 rows.
Is there a better way to store the data? For example:
would it make sense to create a different table for each day?
would be better to use a separate table for the timesteps and then use a join?
is there a better way of indexing?
I have already adopted all the recommendations in this question to maximise the performance.
This particular query is hard to optimize because the sum() must be computed over all table rows. It is a better idea to filter rows with WHERE:
SELECT count(*)
FROM cfd
WHERE time IN (...)
  AND u > 5
GROUP BY cell_id;
If possible, use a simpler expression to filter times, such as time BETWEEN a AND b.
It might be worthwhile to use a covering index, or in this case, when all queries filter on the time, a clustered index (without additional indexes):
CREATE TABLE cfd (
cell_id INTEGER,
time DATETIME,
u,
PRIMARY KEY (cell_id, time)
) WITHOUT ROWID;
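If you prefer to keep the existing table layout instead, a covering index is another option worth trying (a sketch; how much it helps depends on your data distribution). It lets the query above be answered from the index alone, without touching the table rows:
CREATE INDEX idx_cfd_cover ON cfd(cell_id, time, u);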

SSAS cube with date range records

I have to build a cube based on date range records, and not sure about the best way to proceed.
Imagine, say, a cube of cars and warranty periods. Each car has a warranty start date and end date, and there may also be extended warranty periods, so imagine:
CAR REG  TYPE      WARRANTY START  WARRANTY END
CAR A    PURCHASE  01/01/2016      31/01/2016
CAR A    EXTENDED  01/01/2017      30/06/2017
CAR A    EXTENDED  01/08/2017      30/01/2018  -- note, gap here
CAR B    PURCHASE  01/01/2016      31/01/2016
CAR B    EXTENDED  01/01/2017      30/06/2017
CAR B    EXTENDED  01/08/2017      30/01/2018  -- note, gap here
So multiple items, with multiple date ranges. There is a main table (CARS) with car details (colour, model, etc).
Now I want to build a cube, which is reportable at month level, cars under warranty/warranty type, etc.
So Plan 1 was to build a view which explodes the above out via a join to a date table, reports by month, and feeds this into a cube. But the number of cars multiplied by the months covered leads to many hundreds of millions of rows, which means the server sometimes runs out of TempDB space, and when it does run, the cube takes hours to build.
Is there a better way, such as a view for the car details and then another view on the warranty table (how do I get SSAS to deal with months in a date range)? Will the join in SSAS be more efficient than a join in a view in SQL?
Thanks all.
You can connect the start and end columns to the time dimension, and on the report you can use the ":" operator to build a date range report.
You will find more details here: http://www.purplefrogsystems.com/blog/2013/04/mdx-between-start-date-and-end-date/
One approach which will work with drag-and-drop client tools like Excel or Power BI would be a many-to-many Date dimension. Since car A and B match, let's pretend there's a car C which has a warranty from 2015-07-30 to 2015-12-31.
Create a DimWarrantyDateRange dimension whose surrogate key, WarrantyDateRangeKey, represents a unique combination of dates during which a warranty is active. Certainly the ETL that builds this table will be a bit expensive, but given the size of your data I think it's a worthwhile investment that will produce better query performance than if your m2m bridge table has one row per active day per car.
Each car should be assigned one WarrantyDateRangeKey. Add the WarrantyDateRangeKey column to your fact tables...
CAR REG WarrantyDateRangeKey
A 1
B 1
C 2
m2mWarrantyDateRange
WarrantyDateRangeKey DateKey
1 20160101
1 20160102
1 ...
1 20170629
1 20170630
1 20170801
1 20170802
1 ...
1 20180129
1 20180130
2 20150701
2 20150702
2 ...
2 20151230
2 20151231
The tables relate together as follows...
FactTable -> DimWarrantyDateRange <- m2mWarrantyDateRange -> DimDate
Then in the cube, DimWarrantyDateRange should be a dimension, m2mWarrantyDateRange should be a measure group with a count measure, and DimDate should be a dimension. You should then relate DimDate to FactTable as a many-to-many (m2m) dimension using m2mWarrantyDateRange as the intermediate measure group.
Now in Excel or Power BI you should be able to filter to a particular date and it will filter down to the cars which had an active warranty on that day.
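The ETL mentioned above could look roughly like the following T-SQL sketch (the table and column names Warranty, CarReg, WarrantyStart, WarrantyEnd and DimDate.FullDate are assumptions, not from your model): expand each car's warranty rows against the date dimension, build one signature per car from its complete set of active dates, and give every distinct signature a key.
-- 1) expand each warranty row against the date dimension
SELECT w.CarReg, d.DateKey
INTO #car_dates
FROM Warranty w
JOIN DimDate d
  ON d.FullDate BETWEEN w.WarrantyStart AND w.WarrantyEnd;

-- 2) build one "signature" per car from its complete, ordered set of active DateKeys
SELECT c.CarReg,
       STUFF((SELECT ',' + CAST(cd.DateKey AS varchar(8))
              FROM #car_dates cd
              WHERE cd.CarReg = c.CarReg
              ORDER BY cd.DateKey
              FOR XML PATH('')), 1, 1, '') AS signature
INTO #car_sig
FROM #car_dates c
GROUP BY c.CarReg;

-- 3) one DimWarrantyDateRange row per distinct signature
--    (WarrantyDateRangeKey assumed to be an IDENTITY column on that dimension)
INSERT INTO DimWarrantyDateRange (signature)
SELECT DISTINCT signature FROM #car_sig;

-- 4) key per car (goes to the fact table); joining this back to #car_dates and taking
--    DISTINCT WarrantyDateRangeKey, DateKey pairs gives the m2mWarrantyDateRange bridge
SELECT s.CarReg, r.WarrantyDateRangeKey
FROM #car_sig s
JOIN DimWarrantyDateRange r ON r.signature = s.signature;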

TSQL Count appearance Group by values in other table

I need to group some data to show in a graph but... it is too difficult for me :-(
In one table I have customer info including, among other things, Name, Kgs and yearly turnover:
CustomerA 8 415.86
CustomerB 145846 6815.80
..............
CustomerZC 25160 25690.30
and I need to COUNT how many customers bought less than 50 Kgs, how many bought from 51 to 100, from 100 to 1,000, from 1,000 to 30,000 and so on,
but since the group limits are not uniform, the boundaries of each range are stored in another table, which looks like:
Group0 0-50
Group1 51-100
Group2 101-1000
.....
Group15 1000001-5000000
Group16 5000001-9999999999
but I can modify it if that helps.
My Target is to have result like this:
0-50 14217
51-100 6425
101-1000 841
....
1000001-5000000 43
Right now I achieve this result by running 15 different queries, but I would like a single approach that can adapt to a variable number of groups.
Thanks
This one is similar, take a look at the second option that joins to a range table.
In your case, it would look something like this:
select r.boundary_name, count(c.kgs) as cnt  -- count only matched customers so empty ranges show 0
from ranges r
left join customers c
  on c.kgs between r.low_range and r.high_range
group by r.boundary_name;
Naturally you'd need to tweak the join if you're looking for exclusive ranges vs. inclusive, and the ranges table will need a low and high bound column.
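For completeness, the ranges table the query above assumes could be set up roughly like this (names are illustrative):
CREATE TABLE ranges (
    boundary_name varchar(30),
    low_range     decimal(18,2),
    high_range    decimal(18,2)
);
INSERT INTO ranges VALUES
    ('0-50',                     0,         50),
    ('51-100',                  51,        100),
    ('101-1000',               101,       1000),
    ('1000001-5000000',    1000001,    5000000),
    ('5000001-9999999999', 5000001, 9999999999);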

Is it possible to create an SQL query that displays results like this?

Background
I have a database that holds records of all assets in an office. Each asset has a condition, a category name and an age.
A ConditionID can be:
In use
Spare
In Circulation
A CategoryID can be:
Phone
PC
Laptop
and Age is just a field called AquiredDate which holds values like:
2009-04-24 15:07:51.257
Example
I've created an example of the inputs of the query to explain better what I need if possible.
NB.
Inputs are in Orange in the above example.
I've split the example into two separate queries.
Count would be the output
Question
Is this type of query and result set possible using SQL alone? And if so, where do I start? Would it be easier to also use MS Excel?
Yes, it is possible. For your orange fields you can just use, e.g.:
where CategoryID ='Phone' and ConditionID in ('In use', 'In Circulation')
For the yellow one you could take the DATEDIFF in days from the acquired date to now, divide it by 365 and floor the value. To get the last one (the 6+ years category) you take the minimum of 5 and the calculated value, so you get 0 for everything between 0 and 1 year old and so on, up to 5, which holds everything older.
When you group by that calculated column and additionally select the count, you get what you desire.
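A hedged T-SQL sketch of that idea (the Assets table name is an assumption; AquiredDate, CategoryID and ConditionID are from the question):
SELECT age_years, COUNT(*) AS cnt
FROM (
    SELECT CASE WHEN DATEDIFF(day, AquiredDate, GETDATE()) / 365 >= 5 THEN 5
                ELSE DATEDIFF(day, AquiredDate, GETDATE()) / 365
           END AS age_years   -- 0 = less than a year old, ..., 5 = everything older
    FROM Assets
    WHERE CategoryID = 'Phone'
      AND ConditionID IN ('In use', 'In Circulation')
) a
GROUP BY age_years
ORDER BY age_years;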
