Group rows whose arrays have different values but share one common value

I have an issue: we have a Trac PostgreSQL database (version 8.4), and we need to get the worklog based on the tags (keywords) in tickets and group the time spent on those tickets.
This is my query:
select round(sum(wl.endtime - wl.starttime) / 3600.0, 2) AS sum_of_hours,
       string_to_array(t.keywords, ',') AS keywords
from work_log AS wl
JOIN ticket AS t ON t.id = wl.ticket
where t.keywords SIMILAR TO '%SWA.IMPLEMENTATION%'
GROUP BY t.keywords
HAVING string_to_array(t.keywords, ',') @> ARRAY['SWA.IMPLEMENTATION'];
The output is:
 sum_of_hours | keywords
--------------+------------------------------
       950.08 | {Running,SWA.IMPLEMENTATION}
        11.00 | {SWA.IMPLEMENTATION,Done}
       341.63 | {SWA.IMPLEMENTATION}
        49.25 | {SWA.IMPLEMENTATION,Running}
(4 rows)
My goal is to group all hours where "SWA.IMPLEMENTATION" is present, so all four of those rows should be grouped together.
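A minimal sketch of one way to do this: test for the keyword in the WHERE clause and drop the per-keywords grouping entirely, so every matching row falls into a single aggregate. This assumes the keywords are comma-separated with no stray spaces:

-- Hedged sketch: total all work-log hours for tickets tagged SWA.IMPLEMENTATION.
-- = ANY tests membership in the split array, so {Running,SWA.IMPLEMENTATION} etc. all match.
SELECT round(sum(wl.endtime - wl.starttime) / 3600.0, 2) AS sum_of_hours
FROM work_log AS wl
JOIN ticket AS t ON t.id = wl.ticket
WHERE 'SWA.IMPLEMENTATION' = ANY (string_to_array(t.keywords, ','));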


Joining 2nd Table with Random Row to each record

I need to join Table B to Table A, where Table B's records are randomly assigned to each row. Most of the queries out there rely on a key and join conditions between the tables, whereas I just want to randomly join records without a key.
I'm not sure where to start, as none of the queries I've found do this. I assume a nested join could be helpful, but how can I randomly assign the records on join?
**Table A**
| Associate ID| Statement|
|:----: |:------:|
| 33691| John is |
| 82451| Susie is |
| 25485| Sam is|
| 26582| Lonnie is|
| 52548| Carl is|
**Table B**
| RowID | List|
|:----: |:------:|
| 1| admirable|
| 2| astounding|
| 3| excellent|
| 4| awesome|
| 5| first class|
The result would be something like this, where items from the list are assigned not in order, but randomly:
**Result Table**
| Associate ID| Statement| List|
|:----: |:------:|:------:|
| 33691| John is |astounding|
| 82451| Susie is |first class|
| 25485| Sam is|admirable|
| 26582| Lonnie is|excellent|
| 52548| Carl is|awesome|
These are some of the queries I've tried:
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/aeb83251-e132-435a-8630-e5b842a69368/random-join-between-tables?forum=sqldataaccess
- This seems to loop through the values from Table B in order, not randomly.
https://www.daveperrett.com/articles/2009/08/11/mysql-select-random-row-with-join
- This is based on a common key between the two tables and returns one of the records for that key, which I don't have.
SQL Join help when selecting random row
- I'll be honest, I don't understand this one, but it doesn't seem to assign a random row to each row from Table A; it's more of an overall selection, like the link above.
Join One Table To Get Random Rows from 2nd Table
- This seems to be specific to a key, and not overall randomness.
Using two CTEs, we generate a SELECT that assigns a row number to each table based on a random order, and then join on that row number.
Use a CTE to get N times the records in B, as described here:
Repeat Rows N Times According to Column Value (not included below). Note: to get the "N", take the counts of A and B, divide one by the other, and add 1, as in the sketch below.
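A rough illustration of that note, assuming integer division and the table names used below:

-- Hypothetical helper: N = (rows in A / rows in B) + 1, i.e. how many
-- times B must be repeated to cover every row of A.
SELECT (SELECT COUNT(*) FROM A) / (SELECT COUNT(*) FROM B) + 1 AS N;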
Assuming Even Distribution
WITH A AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY NEWID()) AS RN
    FROM A
),
B AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY NEWID()) AS RN
    FROM B
)
SELECT *
FROM A
INNER JOIN B
    ON A.RN = B.RN
Or use this (assuming an uneven distribution):
SELECT *
FROM A
CROSS APPLY (SELECT TOP 1 * FROM B ORDER BY NewID()) Z
This method assumes you know in advance which is the smaller table.
First it assigns an ascending row numbering from 1 to the smaller table; this numbering does not need to be randomized.
Then, for each row in the larger table, it uses the modulus operator to randomly calculate a row number in that range to join onto.
WITH Small AS (
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS RN
    FROM SmallTable
),
Large AS (
    SELECT *,
           1 + CRYPT_GEN_RANDOM(3) % (SELECT COUNT(*) FROM SmallTable) AS RND
    FROM LargeTable
    ORDER BY RND
    OFFSET 0 ROWS
)
SELECT *
FROM Large
INNER JOIN Small
    ON Small.RN = Large.RND
The ORDER BY RND OFFSET 0 ROWS is there to get the random numbers materialized in advance.
This allows a MERGE join on the smaller table. It also avoids an issue that can sometimes happen where CRYPT_GEN_RANDOM is moved around in the plan and evaluated only once, rather than once per row as required.

SQL - Return first non-empty value for previous days

I'm currently working with an exchange rates table in SQL that has these fields:
| Country | ExchangeRateDt | ExchangeRateValue |
|:-------:|:--------------:|:-----------------:|
| DK | 20200601 | 0.2 |
| DK | 20200603 | 0.21 |
| HR | 20200601 | 0.10 |
| HR | 20200602 | 0.12 |
For each currency I don't have a value for every day of the year, because of bank holidays or simply weekends.
I need to join it with an order table, and since some orders are placed on weekends, on a specific day I might not have an exchange rate to calculate taxes.
I need to take the first non-missing value from the previous days (so, in the example, an order placed on 2020-06-02 in Denmark should be converted using the rate 0.2).
I thought about using a calendar table but I can't manage to get the job done.
Can someone help me?
Thanks in advance,
R
To get the most recent value less than or equal to the current day:
SELECT
    <whatever columns you need from the order table>,
    exchange.ExchangeRateValue
FROM
    <order table> o
LEFT JOIN
    <exchange rate table> exchange
    ON exchange.Country = o.Country
    AND exchange.ExchangeRateDt =
    (
        SELECT MAX(ExchangeRateDt)
        FROM <exchange rate table>
        WHERE Country = o.Country
          AND ExchangeRateDt <= o.OrderDt
    )
Ensure the clustered index on the exchange rate table is (Country, ExchangeRateDt).
I have written this as a LEFT JOIN so you still get order rows even if the exchange rate information is somehow missing; refer to your business rules on how to proceed when no exchange rate is available.
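A sketch of that index, assuming the exchange rate table is literally named ExchangeRate (the answer above uses a placeholder):

-- Hypothetical table name; adjust to your schema. Keying the clustered index on
-- (Country, ExchangeRateDt) lets the MAX() subquery seek straight to the latest applicable rate.
CREATE CLUSTERED INDEX IX_ExchangeRate_Country_Dt
    ON ExchangeRate (Country, ExchangeRateDt);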
You would typically create a calendar table that stores all the days you are interested in, say dates, with each date on a separate row.
You would also probably have a table that lists the countries; I assumed countries.
Then, one option is a lateral join:
select c.country, d.date, t.ExchangeRateValue
from dates d
cross join countries c
outer apply (
    select top (1) t.*
    from mytable t
    where t.country = c.country and t.ExchangeRateDt <= d.date
    order by t.ExchangeRateDt desc
) t
If you don't have these two tables, or can't create them, then one option is a recursive query to generate the dates and a subquery to list the countries. For example, this would generate the data for the month of June:
with dates as (
    select cast('20200601' as date) as date
    union all
    select dateadd(day, 1, date) from dates where date < '20200701'
)
select c.country, d.date, t.ExchangeRateValue
from dates d
cross join (select distinct country from mytable) c
outer apply (
    select top (1) t.*
    from mytable t
    where t.country = c.country and t.ExchangeRateDt <= d.date
    order by t.ExchangeRateDt desc
) t
You should be able to do the mapping between the transaction date and the exchange rate date with this query:
select TAB.primary_key, TAB.TransactionDate, max(EXR.ExchangeRateDt)
from yourtable TAB
inner join exchangerate EXR
    on TAB.Country = EXR.Country and TAB.TransactionDate >= EXR.ExchangeRateDt
group by TAB.primary_key, TAB.TransactionDate
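That gives you the applicable rate date per transaction but not the rate itself; a hedged follow-up sketch (using the same assumed names as the answer above) joins the result back to pull the value:

-- Sketch only: yourtable, exchangerate and TransactionDate follow the answer above.
SELECT m.primary_key, m.TransactionDate, r.ExchangeRateValue
FROM (
    SELECT TAB.primary_key, TAB.Country, TAB.TransactionDate,
           MAX(EXR.ExchangeRateDt) AS RateDt
    FROM yourtable TAB
    INNER JOIN exchangerate EXR
        ON TAB.Country = EXR.Country
       AND TAB.TransactionDate >= EXR.ExchangeRateDt
    GROUP BY TAB.primary_key, TAB.Country, TAB.TransactionDate
) m
INNER JOIN exchangerate r
    ON r.Country = m.Country
   AND r.ExchangeRateDt = m.RateDt;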

SQL Server: count ProductID to get total times sold

I'm having trouble with what seems like a simple query. I'm trying to get the number of times each product has sold by counting and grouping on ProductID. Everywhere I've looked online says to just add a simple COUNT, but when I do, it still outputs the same number of rows.
So if I don't use COUNT it outputs (for example) 1,000 rows, and if I DO use COUNT it still outputs 1,000 rows and doesn't give me the correct times sold. They are all listed as "1" instead of being grouped and counted. I'm guessing it has something to do with my joins, but I can't figure it out.
Here's an example below of what I'm seeing after using COUNT (I've removed brand and date_added to make it easier to read). ProductIDs show up more than once even though they should be grouped together and counted.
times_sold | ProductID | title
---------- | --------- | ---------
1 | 17998 | title 2
1 | 13670 | title 3
1 | 17956 | title 4
1 | 4569 | title 5
1 | 12598 | title 1
1 | 12598 | title 1
1 | 17998 | title 2
And here's the query I'm running:
SELECT TOP (100) PERCENT
    COUNT(s.ProductID) AS times_sold,
    s.ProductID, p.title, p.brandname, p.date_added
FROM
    dbo.TBL_OrderSummary AS s
INNER JOIN
    dbo.jewelry AS p ON s.ProductID = p.ProductID
INNER JOIN
    dbo.sent_items AS i ON s.InvoiceID = i.ID
GROUP BY
    s.ProductID, p.title, p.brandname, p.flare_type, p.date_added,
    i.date_order_placed, i.ship_code, p.jewelry
HAVING
    (p.title LIKE '%stone%')
    AND (i.date_order_placed > CONVERT(DATETIME, '2016-01-01 00:00:00', 102))
    AND (i.ship_code = N'paid')
    AND (p.flare_type = 'Single flare')
    AND (p.jewelry LIKE '%plugs%')
Thanks for any help!
The reason the counts don't look right is that the records aren't identical all the way across the row. If one row has product name Widget 2 with year made 2015, and another has product name Widget with year made 2016, each will only count 1, because each distinct combination of grouped columns appears only once. You need to trim your GROUP BY down to get an accurate count:
GROUP BY s.ProductID, p.title
This should give you an accurate count; the original GROUP BY covered too many columns for any two rows to be treated as duplicates. You will also have to cut down your SELECT list to match the GROUP BY: keep COUNT(s.ProductID), s.ProductID, and p.title. Hope this helps.
Unless you are filtering on your aggregate function (i.e. HAVING COUNT(s.ProductID) > 2), you can move all of your selection criteria to the WHERE clause.
So you could try:
select count(s.ProductID) times_sold, s.ProductID, p.title
from dbo.TBL_OrderSummary s
inner join dbo.jewelry p on s.ProductID = p.ProductID
inner join dbo.sent_items i on s.InvoiceID = i.ID
where p.title like '%stone%'
  and i.date_order_placed > CONVERT(DATETIME, '2016-01-01 00:00:00', 102)
  and i.ship_code = N'paid'
  and p.flare_type = 'Single flare'
  and p.jewelry like '%plugs%'
group by s.ProductID, p.title

How to use group by in SQL Server query?

I have a problem with GROUP BY in SQL Server.
I have this simple SQL statement:
select *
from Factors
group by moshtari_ID
and I get this error:
Msg 8120, Level 16, State 1, Line 1
Column 'Factors.ID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Where is my problem?
In general, once you start GROUPing, every column listed in your SELECT must be either a column in your GROUP or some aggregate thereof. Let's say you have a table like this:
| ID | Name | City |
|:--:|:-------:|:-----------:|
| 1 | Foo bar | San Jose |
| 2 | Bar foo | San Jose |
| 3 | Baz Foo | Santa Clara |
If you wanted to get a list of all the cities in your database, and tried:
SELECT * FROM table GROUP BY City
...that would fail, because you're asking for columns (ID and Name) that aren't in the GROUP BY clause. You could instead:
SELECT City, count(City) as Cnt FROM table GROUP BY City
...and that would get you:
| City | Cnt |
|:-----------:|:---:|
| San Jose | 2 |
| Santa Clara | 1 |
...but would NOT get you ID or Name. You can do more complicated things with e.g. subselects or self-joins (see the sketch just below), but basically what you're trying to do isn't possible as stated. Break down your problem further (what do you want the data to look like?) and go from there.
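For instance, a hedged sketch of the self-join idea (MyTable stands in for the example table above):

-- Join each row back to its city's aggregated count, so ID and Name survive the grouping.
SELECT t.ID, t.Name, t.City, g.Cnt
FROM MyTable t
INNER JOIN (
    SELECT City, COUNT(*) AS Cnt
    FROM MyTable
    GROUP BY City
) g ON g.City = t.City;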
Good luck!
When you group, you can select only the columns you group by; other columns need to be aggregated. This can be done with functions like min(), avg(), count(), and so on.
Why is this? Because GROUP BY collapses multiple records into one, and for the columns that aren't unique within a group the database needs a rule for how to display them: aggregation.
You need to apply an aggregate function such as max(), avg(), or count() when using GROUP BY.
For example, this query will sum totalAmount for each moshtari_ID:
select moshtari_ID, sum(totalAmount) from Factors group by moshtari_ID;
The output will be:
| moshtari_ID | SUM |
|:-----------:|:------:|
| 2 | 120000 |
| 1 | 200000 |
Try this:
select *
from Factors
group by ID, date, time, factorNo, trackingNo, totalAmount, createAt, updateAt, bark_ID, moshtari_ID
If you are using a GROUP BY clause, then the select list can contain only the grouped columns and aggregate functions.
syntax:
SELECT expression1, expression2, ... expression_n,
aggregate_function (aggregate_expression)
FROM tables
[WHERE conditions]
GROUP BY expression1, expression2, ... expression_n
[ORDER BY expression [ ASC | DESC ]];
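A concrete instance of that syntax against the asker's Factors table (columns as they appear in the answers above):

-- Count rows per moshtari_ID and show the largest totalAmount for each.
SELECT moshtari_ID, COUNT(*) AS factor_count, MAX(totalAmount) AS max_total
FROM Factors
WHERE totalAmount > 0
GROUP BY moshtari_ID
ORDER BY factor_count DESC;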

How to perform statistical computations in a query?

I have a table which is filled with float values. I need to calculate the number of results grouped by their distribution around the mean value (Gaussian distribution). Basically, it would be calculated like this:
SELECT COUNT(*), FloatColumn - AVG(FloatColumn) - STDEV(FloatColumn)
FROM Data
GROUP BY FloatColumn - AVG(FloatColumn) - STDEV(FloatColumn)
But for obvious reasons, SQL Server gives this error: Cannot use an aggregate or a subquery in an expression used for the group by list of a GROUP BY clause.
My question is: can I somehow leave this computation to SQL Server, or do I have to do it the old-fashioned way and retrieve all the data, then do the calculation myself?
To get the aggregate of the whole set you can use an empty OVER clause
WITH T(Result) AS (
    SELECT FloatColumn - Avg(FloatColumn) OVER () - Stdev(FloatColumn) OVER ()
    FROM Data
)
SELECT Count(*),
       Result
FROM T
GROUP BY Result
You can perform a pre-aggregation of the data, and join back to the table.
Schema Setup:
create table data(floatcolumn float);
insert data values
(1234.56),
(134.56),
(134.56),
(234.56),
(1349),
(900);
Query 1:
SELECT COUNT(*) C, D.FloatColumn - A
FROM
(
    SELECT AVG(FloatColumn) + STDEV(FloatColumn) A
    FROM Data
) preagg
CROSS JOIN Data D
GROUP BY FloatColumn - A;
Results:
| C | COLUMN_1 |
--------------------------
| 2 | -1196.876067819572 |
| 1 | -1096.876067819572 |
| 1 | -431.436067819572 |
| 1 | -96.876067819572 |
| 1 | 17.563932180428 |
