I have a table like this in PostgreSQL:
Column | Type | Modifiers
---------------+-----------------------------+-----------
id | smallint | not null
merchant_id | smallint | not null
batch_no | smallint | not null
I have a query like this:
select merchant_id , max(batch_no) from batch group by merchant_id
It returns a result like this:
merchant_id | max
-------------------+------
14 | 593
45 | 1
34 | 3
46 | 1
25 | 326
27 | 61
17 | 4
How can I get the id of each of these rows? What query can I use to return, for each result above, the id of the row it came from?
This query works with any version of PostgreSQL, even before there were window functions (PostgreSQL 8.3 or earlier):
SELECT b.id, b.merchant_id, b.batch_no
FROM batch b
JOIN (
SELECT merchant_id, max(batch_no) AS batch_no
FROM batch
GROUP BY merchant_id
) bmax USING (merchant_id, batch_no)
If batch_no is not unique per merchant_id, you may get multiple rows per merchant_id.
With PostgreSQL 8.4 or later you can use the window function first_value():
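-- No GROUP BY here: batch_no and id are not grouped columns, so
-- grouping by merchant_id alone would raise an error. DISTINCT
-- collapses the rows after the window function has been applied.
SELECT DISTINCT
merchant_id
, first_value(batch_no) OVER w AS batch_no
, first_value(id) OVER w AS id
FROM batch
WINDOW w AS (PARTITION BY merchant_id ORDER BY batch_no DESC, id)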
This yields exactly one row per merchant_id even if batch_no is not unique. In that case the smallest id (for the biggest batch_no per merchant_id) is selected, because the window is additionally sorted by id.
I use DISTINCT here, because it is applied after the window function (as opposed to GROUP BY).
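The join-on-max query above can be tried without a PostgreSQL server. The following is a minimal sketch that runs it against an in-memory SQLite database from Python; the use of SQLite, the sample ids and the extra rows are assumptions made for illustration, since the question does not show the id values.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the PostgreSQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE batch ("
    " id INTEGER PRIMARY KEY,"
    " merchant_id INTEGER NOT NULL,"
    " batch_no INTEGER NOT NULL)"
)

# Hypothetical rows: the ids are invented for this example.
conn.executemany(
    "INSERT INTO batch VALUES (?, ?, ?)",
    [(1, 14, 592), (2, 14, 593), (3, 45, 1), (4, 34, 2), (5, 34, 3)],
)

# Join each row to the per-merchant maximum to recover the matching id.
query = """
SELECT b.id, b.merchant_id, b.batch_no
FROM batch AS b
JOIN (
    SELECT merchant_id, max(batch_no) AS batch_no
    FROM batch
    GROUP BY merchant_id
) AS bmax
  ON bmax.merchant_id = b.merchant_id
 AND bmax.batch_no = b.batch_no
ORDER BY b.merchant_id
"""
result = conn.execute(query).fetchall()
print(result)
```

Each tuple is (id, merchant_id, batch_no) for the row holding the highest batch_no of its merchant.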
I am having trouble ranking top customers by month. I created a new Rank column, but how do I break it up by month? Any help is appreciated. Code and tables below:
The logic for ranking is to select the top two customers per month from the table. Also wrapped into the code (or attempted, at least) is renaming the date field and setting it to the end-of-month date only.
SELECT * FROM table1;
UPDATE table1
SET DATE=EOMONTH(DATE) AS MO_END;
ALTER TABLE table1
ADD COLUMN RANK INT AFTER SALES;
UPDATE table1
SET RANK=
RANK() OVER(PARTITION BY cust ORDER BY sales DESC);
LIMIT 2
Starting with
+------+----------+-------+--+
| CUST | DATE | SALES | |
+------+----------+-------+--+
| 36 | 3-5-2018 | 50 | |
| 37 | 3-15-18 | 100 | |
| 38 | 3-25-18 | 65 | |
| 37 | 4-5-18 | 95 | |
| 39 | 4-21-18 | 500 | |
| 40 | 4-45-18 | 199 | |
+------+----------+-------+--+
desired end result
+------+---------+-------+------+--+
| CUST | MO_END | SALES | RANK | |
+------+---------+-------+------+--+
| 37 | 3-31-18 | 100 | 1 | |
| 38 | 3-25-18 | 65 | 2 | |
| 39 | 4-30-18 | 500 | 1 | |
| 40 | 4-45-18 | 199 | 2 | |
+------+---------+-------+------+--+
As a simple selection:
select *
from (
select
table1.*
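-- partition by month only: the goal is the top two customers per month
, DENSE_RANK() OVER(PARTITION BY EOMONTH(DATE) ORDER BY sales DESC) as ranking
from table1
) as ranked -- SQL Server requires an alias for a derived table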
where ranking < 3
;
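If storing the rank is important: I would not use [rank] as a column name, as I avoid any words that are used in SQL; maybe [sales_rank] or similar.
-- assumes an int column sales_rank has already been added to table1
with cte as (
select
cust
, sales_rank -- the column being updated must be exposed by the CTE
, DENSE_RANK() OVER(PARTITION BY EOMONTH(DATE) ORDER BY sales DESC) as ranking
from table1
)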
update cte
set sales_rank = ranking
where ranking < 3
;
There is really no reason to store the end of month, just use that function within the partition of the over() clause.
LIMIT 2 is not something that can be used in SQL Server, by the way, and it certainly can't be applied "per grouping". When you use a window function such as rank() or dense_rank(), you can use its output in the where clause of the next "layer": put the function in a subquery (or CTE), then filter rows by the calculated value in the outer query.
Also note I used dense_rank() to guarantee that no rank numbers are skipped, so that the subsequent where clause will be effective.
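To see the "rank in a subquery, filter in the outer query" pattern in action, here is a minimal sketch run against an in-memory SQLite database from Python. SQLite has no EOMONTH(), so the month end is computed with date modifiers; that substitution, the column name sale_date, the ISO date values and the requirement of SQLite 3.25 or later (for window functions) are all assumptions of this example. It partitions by month only, since the goal is the top two customers per month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (cust INTEGER, sale_date TEXT, sales INTEGER)")

# Sample rows from the question, with dates rewritten as valid ISO dates.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (36, "2018-03-05", 50),
        (37, "2018-03-15", 100),
        (38, "2018-03-25", 65),
        (37, "2018-04-05", 95),
        (39, "2018-04-21", 500),
        (40, "2018-04-25", 199),
    ],
)

query = """
SELECT cust, month_end, sales, ranking
FROM (
    SELECT cust,
           -- last day of the month, standing in for EOMONTH()
           date(sale_date, 'start of month', '+1 month', '-1 day') AS month_end,
           sales,
           DENSE_RANK() OVER (
               PARTITION BY strftime('%Y-%m', sale_date)
               ORDER BY sales DESC
           ) AS ranking
    FROM table1
) AS ranked
WHERE ranking < 3          -- the window result is filtered one layer up
ORDER BY month_end, ranking
"""
top_two = conn.execute(query).fetchall()
for row in top_two:
    print(row)
```

Nothing is stored here: both the month end and the rank are derived at query time, which is the approach recommended above.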
I'm trying to add rank by sales and also change the date column to a 'month end' field that would have one month end date per month - if that makes sense?
Would you alter table and add column or could you just rename the date field and use set and case to make all March dates = 3-31-18 and all April 4-30-18?
I got this far:
UPDATE table1
SET DATE=EOMONTH(DATE) AS MONTH_END;
ALTER TABLE table1
ADD COLUMN RANK INT AFTER sales;
UPDATE table1
SET RANK=
RANK() OVER(PARTITION BY cust ORDER BY sales DESC);
LIMIT 2
Can I do two SETs in a row like that without adding another UPDATE? I'm looking for the top 2 within each month; would this work? I feel like this is the right and most efficient query, but it's not working. Any help appreciated!
Original table
+------+----------+-------+--+
| CUST | DATE | SALES | |
+------+----------+-------+--+
| 36 | 3-5-2018 | 50 | |
| 37 | 3-15-18 | 100 | |
| 38 | 3-25-18 | 65 | |
| 37 | 4-5-18 | 95 | |
| 39 | 4-21-18 | 500 | |
| 40 | 4-45-18 | 199 | |
+------+----------+-------+--+
desired output
+------+-----------+-------+------+
| CUST | Month End | SALES | Rank |
+------+-----------+-------+------+
| 37 | 3-31-18 | 100 | 1 |
| 38 | 3-31-18 | 65 | 2 |
| 39 | 4-30-18 | 500 | 1 |
| 40 | 4-30-18 | 199 | 2 |
+------+-----------+-------+------+
Based on your expected output I think this may work as well.
create table Salesdate (Cust int, Dates date, Sales int)
insert into Salesdate values
(36 , '2018-03-05' , 50 )
,(37 , '2018-03-15' , 100 )
,(38 , '2018-03-25' , 65 )
,(37 , '2018-04-05' , 95 )
,(40 , '2018-04-25' , 199 )
,(39 , '2018-04-21' , 500 )
Update the Dates column in place to the last day of the month (EOMONTH returns the last day of the month). You can add a separate column instead if you prefer.
Update Salesdate
set Dates = eomonth(Dates)
Add a column called rank in the table.
Alter table Salesdate
add rank int
Update the column rank which was just added.
update Salesdate
set Salesdate.[rank] = tbl.Ranked from
(select Cust, Sales, Dates , rank() over (Partition by Dates order by Sales Desc)
Ranked from Salesdate ) tbl
where tbl.Cust = salesdate.Cust
and tbl.Sales = salesdate.Sales
and tbl.dates = salesdate.Dates
-- Not sure if this step is necessary. If you want your final table to hold only ranks 1 and 2, you can delete the other rows; alternatively, filter them out in the select. Please note that rank() may skip numbers when sales amounts are tied within a month.
;With cte as (
select * from Salesdate)
delete from cte
where [RANK] > 2
select * from Salesdate
order by dates, [RANK]
Output
Cust Dates Sales rank
37 2018-03-31 100 1
38 2018-03-31 65 2
39 2018-04-30 500 1
40 2018-04-30 199 2
I have a Postgres database with duplicated entries in one of the tables. I would like to show the created_on column.
Table1
id | number
1 | 123
2 | 124
3 | 125
4 | 126
Table2
id | number | created_on
1 | 123 | 3/29
2 | 123 | 4/3
3 | 124 | 3/31
4 | 124 | 4/1
In Table2 the numbers are duplicated. I would like to form a single query that lists the following:
id | number | created_on
1 | 123 | 4/3
2 | 124 | 4/1
For duplicated entries only the latest entry will be included. How could I form that SQL query?
SELECT DISTINCT ON (Table1.number) Table1.id, Table2.number, Table2.created_on FROM Table1
JOIN Table2 ON Table1.number=Table2.number
ORDER BY Table2.created_on;
Actually, when I tried putting DISTINCT ON and ORDER BY in a single query (with JOIN), it gave me the following error:
SELECT DISTINCT ON expressions must match initial ORDER BY expressions
The expressions in DISTINCT ON () have to be the leftmost expressions in the ORDER BY clause. Also, if you want the latest created_on date, you should order by created_on DESC:
SELECT DISTINCT ON (Table1.number) Table1.id, Table2.number, Table2.created_on
FROM Table1
JOIN Table2
ON Table1.number=Table2.number
ORDER BY Table1.number,Table2.created_on DESC;
http://sqlfiddle.com/#!12/5538a/2
As you said in the comment: created_on=date_trunc('day', now()), so the data type of the field created_on is timestamp. Here is what you can do:
SELECT table_1.id, table_1.number, max(created_on) as created_on
FROM table_1
inner join table_2 using(number)
group by table_1.id, table_1.number
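A minimal sketch of the max() / GROUP BY approach, run against an in-memory SQLite database from Python so it can be tried without a PostgreSQL server. The choice of SQLite and the ISO dates with a made-up year (standing in for the 3/29-style values in the question) are assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (id INTEGER PRIMARY KEY, number INTEGER)")
conn.execute(
    "CREATE TABLE table_2 (id INTEGER PRIMARY KEY, number INTEGER, created_on TEXT)"
)
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?)",
    [(1, 123), (2, 124), (3, 125), (4, 126)],
)
# Hypothetical year; ISO strings sort chronologically, so max() picks the latest.
conn.executemany(
    "INSERT INTO table_2 VALUES (?, ?, ?)",
    [
        (1, 123, "2013-03-29"),
        (2, 123, "2013-04-03"),
        (3, 124, "2013-03-31"),
        (4, 124, "2013-04-01"),
    ],
)

query = """
SELECT table_1.id, table_1.number, max(created_on) AS created_on
FROM table_1
INNER JOIN table_2 USING (number)
GROUP BY table_1.id, table_1.number
ORDER BY table_1.id
"""
latest = conn.execute(query).fetchall()
print(latest)
```

Numbers 125 and 126 have no rows in table_2, so the inner join leaves them out, matching the desired output in the question.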
I have a table with some duplicate rows in it. I want to delete only one duplicate row.
For example, if I have 9 duplicate rows, only one row should be deleted and the remaining 8 rows should be shown.
Example:
date calling called duration timestamp
2012-06-19 10:22:45.000 165 218 155 1.9 121
2012-06-19 10:22:45.000 165 218 155 1.9 121
2012-06-19 10:22:45.000 165 218 155 1.9 121
2012-06-19 10:22:45.000 165 218 155 1.9 121
From the rows above, only one row should be deleted, leaving 3 rows.
2012-06-19 10:22:45.000 165 218 155 1.9 100
2012-06-19 10:22:45.000 165 218 155 1.9 100
2012-06-19 10:22:45.000 165 218 155 1.9 100
From the rows above, only one row should be deleted, leaving 2 rows.
How can I do this?
This solution allows you to delete one row from each set of duplicates (rather than just handling a single block of duplicates at a time):
;WITH x AS
(
SELECT [date], rn = ROW_NUMBER() OVER (PARTITION BY
[date], calling, called, duration, [timestamp]
ORDER BY [date])
FROM dbo.UnspecifiedTableName
)
DELETE x WHERE rn = 2;
As an aside, both [date] and [timestamp] are terrible choices for column names...
For SQL Server 2005+ you can do the following:
;WITH CTE AS
(
SELECT *,
-- ORDER BY 1 is rejected inside OVER(); (SELECT NULL) gives an arbitrary order
ROW_NUMBER() OVER(PARTITION BY [date], calling, called, duration, [timestamp] ORDER BY (SELECT NULL)) RN
FROM YourTable
)
DELETE FROM CTE
WHERE RN = 2
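The ROW_NUMBER() idea from the two answers above can be tried out with a small sketch against an in-memory SQLite database from Python. SQLite cannot delete through a CTE the way SQL Server can, so this version targets rows by their rowid instead; that substitution, the table name calls, the column names call_date and ts, and the need for SQLite 3.25 or later are assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls ("
    " call_date TEXT, calling INTEGER, called INTEGER,"
    " duration INTEGER, ts INTEGER)"
)

# Two sets of exact duplicates: four rows with ts=121, three with ts=100.
base = ("2012-06-19 10:22:45.000", 165, 218, 155)
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?, ?)",
    [base + (121,)] * 4 + [base + (100,)] * 3,
)

# Number the rows inside each duplicate set, then delete the row numbered 2.
# Sets with a single row have no row 2 and are left untouched.
conn.execute(
    """
    DELETE FROM calls
    WHERE rowid IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY call_date, calling, called, duration, ts
                       ORDER BY rowid
                   ) AS rn
            FROM calls
        )
        WHERE rn = 2
    )
    """
)

counts = conn.execute(
    "SELECT ts, COUNT(*) FROM calls GROUP BY ts ORDER BY ts"
).fetchall()
print(counts)
```

Filtering on rn = 2 rather than rn = 1 is what keeps unique rows safe: every row has a row number 1, but only duplicated rows have a row number 2.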
Do you have a primary key on the table?
What makes a row a duplicate? Same time? same date? all columns being the same?
If you have a primary key, you can use the TOP clause to select only one record and delete that one row:
Delete from [tablename] where id in (select top 1 id from [tablename] where [clause])
If you don't mind which of these rows is removed, there is a command for this in SQL Server:
DELETE TOP (numberOfRowsToDelete) FROM db.tablename WHERE {condition for ex id = 5};
Since I don't have the schema, here is a possible solution in steps:
1) Apply a row number to a select of all columns
2) Group by those columns and delete the min(rownumber) in each group
Edit:
The row number is computed in an inner query and increments across all rows. In the outer query I group the inner query's rows by those columns and select the min(rownumber) for each group (restricted to groups with more than one row, so that unique rows are left alone). Since each such group consists of duplicated rows, I then delete the row with the min(rownumber) in each group.
In MySQL, using LIMIT 1 will delete only one row that matches your DELETE query:
DELETE FROM `table_name` WHERE `column_name`='value' LIMIT 1;
BEFORE:
+----+-------------+
| id | column_name |
+----+-------------+
| 1  | value       |
| 2  | value       |
| 3  | value       |
| 4  | value       |
+----+-------------+
AFTER:
+----+-------------+
| id | column_name |
+----+-------------+
| 1  | value       |
| 2  | value       |
| 3  | value       |
+----+-------------+
I have a PresentationSlide table:
PresentationSlide
PresentationSlideId
PresentationId
Content
Order
and example rows:
+---------------------+----------------+---------+-------+
| PresentationSlideId | PresentationId | Content | Order |
+---------------------+----------------+---------+-------+
| 123 | 3 | "bla" | 1 |
| 23 | 3 | "bla2" | 2 |
| 22 | 3 | "bla3" | 3 |
| 100 | 3 | "bla4" | 4 |
| 150 | 3 | "bla5" | 5 |
+---------------------+----------------+---------+-------+
I want to maintain an unbroken sequence of numbers (1,2,3,4,...) in the Order column after a DELETE operation.
For example, if I delete the third row (PresentationSlideId = 22), the values in the Order column will be (1,2,4,5). I want to update Order this way:
PresentationSlideId = 100: update order from 4 to 3
PresentationSlideId = 150: update order from 5 to 4
What is the most efficient way to do this kind of update? Is there any way to do it using only one UPDATE statement? I could do this using a cursor and a loop, but that doesn't seem efficient.
1) Order is a very poor name for a column, since it's an SQL Keyword
2) It would be a lot better if you could cope with gaps in the order (and possibly switch to using a float, so you can insert fractional values), because in your current model, every insert, update or delete is potentially going to affect the entire table. This doesn't scale well. Computing an order using ROW_NUMBER() during selects would generally be better.
3) If you do need to renumber, it can be done set-based with ROW_NUMBER() after the delete:
create table #PresentationSlide (
PresentationSlideID int not null,
PresentationId int not null,
Content varchar(10) not null,
[Order] int not null
)
insert into #PresentationSlide (PresentationSlideId , PresentationId , Content , [Order])
select 123,3,'bla',1 union all
select 23,3,'bla2',2 union all
select 22,3,'bla3',3 union all
select 100,3,'bla4',4 union all
select 150,3,'bla5',5
delete from #PresentationSlide where PresentationSlideId = 22
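-- Renumber within each presentation, so slides of different presentations are numbered independently
;With Reorder as (select PresentationSlideId,ROW_NUMBER() OVER (PARTITION BY PresentationId ORDER BY [Order]) as NewOrder from #PresentationSlide)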
update ps set [Order] = NewOrder
from #PresentationSlide ps inner join Reorder r on ps.PresentationSlideId = r.PresentationSlideId
select * from #PresentationSlide order by [Order]
drop table #PresentationSlide
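Point 2 above suggests tolerating gaps in the stored value and computing the visible order with ROW_NUMBER() at select time. A minimal sketch of that idea, run against an in-memory SQLite database from Python: the table name slide, the column names and the use of SQLite (3.25 or later for window functions) are assumptions of this example, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE slide ("
    " slide_id INTEGER PRIMARY KEY,"
    " presentation_id INTEGER NOT NULL,"
    " content TEXT NOT NULL,"
    " sort_key INTEGER NOT NULL)"  # stands in for the [Order] column
)
conn.executemany(
    "INSERT INTO slide VALUES (?, ?, ?, ?)",
    [
        (123, 3, "bla", 1),
        (23, 3, "bla2", 2),
        (22, 3, "bla3", 3),
        (100, 3, "bla4", 4),
        (150, 3, "bla5", 5),
    ],
)

# Deleting a slide leaves a gap in sort_key (1, 2, 4, 5); no UPDATE is issued.
conn.execute("DELETE FROM slide WHERE slide_id = 22")

# The gap-free position is derived when reading, per presentation.
query = """
SELECT slide_id,
       content,
       ROW_NUMBER() OVER (
           PARTITION BY presentation_id
           ORDER BY sort_key
       ) AS display_order
FROM slide
ORDER BY presentation_id, display_order
"""
ordered = conn.execute(query).fetchall()
for row in ordered:
    print(row)
```

With this design a delete touches only the deleted row, and the stored sort_key only needs to preserve relative order, not be contiguous.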
;with C as
(
select [Order],
row_number() over(partition by PresentationId order by [Order]) as rn
from PresentationSlide
)
update C set
[Order] = rn