Best way to count distinct occurrences across 3 columns - sql-server

I am looking for the best way to count the number of unique users that have interacted with an item using SQL, but I'm not sure of the best way to go about this.
To start with, here is an example of my data.
ItemID State1 State2 State3 (DesiredResult)
--------------------------------------------------------
1 User1 User1 User1 1
2 User1 User1 User2 2
3 User1 User2 User3 3
4 User1 User2 User1 2
To explain: as an item progresses from state to state, it can be progressed by any user. What I would like to do, for each item, is get the number of unique users that have interacted with the item by progressing its status at some point (for reference, I've added the desired output to the data above).
Now I know that this would be possible using a CASE expression, checking for each condition using something like this:
SELECT
ItemID
,CASE WHEN State1 = State2 AND State1 = State3 THEN 1
WHEN State1 = State2 AND State1 <> State3 THEN 2
WHEN State1 <> State2 AND State1 = State3 THEN 2
WHEN State1 <> State2 AND State1 <> State3 AND State2 = State3 THEN 2
WHEN State1 <> State2 AND State1 <> State3 AND State2 <> State3 THEN 3
END AS UserCount
However, that seems a little cumbersome, so I'm just wondering whether there is another, more efficient / streamlined way to achieve what I'm after.
Any advice would be appreciated.

The shape of this table may be nice for reporting but makes both recording and querying state transitions harder. In the relational model data flows along rows, not columns.
You can UNPIVOT the data into an ItemID, State, User shape and then count the distinct Users.
WITH unpvt AS
(
    -- USER is a reserved word in T-SQL, so the column names are bracketed
    SELECT ItemID, [State], [User]
    FROM
        (SELECT ItemID, State1, State2, State3
         FROM someTable) p
    UNPIVOT
        ([User] FOR [State] IN (State1, State2, State3)) AS u
)
SELECT ItemID, COUNT(DISTINCT [User]) AS UserCount
FROM unpvt
GROUP BY ItemID
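Where UNPIVOT is not available, the same reshaping can be sketched with UNION ALL. The following is only an illustration under stated assumptions: it runs against an in-memory SQLite database from Python rather than against SQL Server, the table name someTable is taken from the answer above, and the helper count_distinct_users and the column alias UserName are hypothetical names introduced here.

```python
import sqlite3

# Stack the three state columns into one column, then count distinct values.
# UNION ALL is used here as a portable stand-in for T-SQL's UNPIVOT.
QUERY = """
WITH unpvt AS (
    SELECT ItemID, State1 AS UserName FROM someTable
    UNION ALL
    SELECT ItemID, State2 FROM someTable
    UNION ALL
    SELECT ItemID, State3 FROM someTable
)
SELECT ItemID, COUNT(DISTINCT UserName) AS UserCount
FROM unpvt
GROUP BY ItemID
ORDER BY ItemID
"""


def count_distinct_users(rows):
    """rows: iterable of (ItemID, State1, State2, State3) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE someTable "
        "(ItemID INTEGER, State1 TEXT, State2 TEXT, State3 TEXT)"
    )
    conn.executemany("INSERT INTO someTable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# The sample data from the question.
sample = [
    (1, "User1", "User1", "User1"),
    (2, "User1", "User1", "User2"),
    (3, "User1", "User2", "User3"),
    (4, "User1", "User2", "User1"),
]

if __name__ == "__main__":
    for item_id, user_count in count_distinct_users(sample):
        print(item_id, user_count)
```

The UNION ALL form also works in SQL Server, at the cost of listing the table once per column.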

Related

SQL spread out one group proportionally to other groups

So, for example, we have a number of users with different group IDs. Some of them don't have a group:
userID groupID
-------------
user1 group1
user2 group1
user3 group2
user4 group1
user5 NULL
user6 NULL
user7 NULL
user8 NULL
We need to group users by their groupID. And we want users without a group (groupID is NULL) to be assigned to one of the existing groups (group1 or group2 in this example). But we want to distribute them in proportion to the number of users already assigned to those groups. In our example group1 has 3 users and group2 has only 1 user. So 75% (3/4) of the new users should be counted as members of group1 and the other 25% (1/4) should be "added" to group2. The end result should look like this:
groupID numOfUsers
-------------
group1 6
group2 2
This is a simplified example.
Basically we just can't figure out how users without a group can be divided between groups in a certain proportion, not just evenly distributed between them.
We can have any number of groups and users, so we can't just hardcode percentages.
Any help is appreciated.
Edit:
I tried to use NTILE(), but it gives an even distribution, not one proportional to the number of users in each group:
SELECT userID ,
NTILE(2) OVER( ) gr
from(
select DISTINCT userID
from test_task
WHERE groupID IS NULL ) AS abc
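here is one way:
select
    groupID
    -- multiply by 1.0 to avoid integer division, and round the share of
    -- unassigned users rather than the proportion itself
    , count(*)
      + cast(round(count(*) * 1.0 / sum(count(*)) over ()
                   * (select count(*) from test_task where groupID is null), 0) as int) as numOfUsers
from test_task
where groupID is not null
group by groupID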
We can use an updatable CTE to do this:
First, we take all existing data, group it by groupID, and calculate a running sum of the number of rows, as well as the total number of rows over the whole set.
Then we take the rows we want to update and add a row number (subtracting 1 so the calculations work).
Finally, we join the two on the condition that the row number modulo the total number of existing rows falls between the previous running sum and the current running sum.
Note that this only distributes the rows in exact proportion when the number of rows to update is a multiple of the number of existing rows, e.g. 4 or 8 new rows for 4 existing rows.
WITH Groups AS (
SELECT
groupID,
perGroup = COUNT(*),
total = SUM(COUNT(*)) OVER (),
runningSum = SUM(COUNT(*)) OVER (ORDER BY groupID ROWS UNBOUNDED PRECEDING)
FROM test_task
WHERE groupID IS NOT NULL
GROUP BY groupID
),
ToUpdate AS (
SELECT
groupID,
userID,
rn = ROW_NUMBER() OVER (ORDER BY userID) - 1
FROM test_task tt
WHERE groupID IS NULL
)
UPDATE u
SET groupID = g.groupID
FROM ToUpdate u
JOIN Groups g
ON u.rn % (g.total) >= g.runningSum - g.perGroup
AND u.rn % (g.total) < g.runningSum;
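When the number of unassigned users is not a multiple of the number of existing users, the shares no longer come out as whole numbers. One common way to handle that is the largest remainder method, which is a swapped-in technique and not what either answer above does. The sketch below shows it in plain Python; the function name distribute_proportionally and its tie-breaking rule (by group name) are assumptions made for illustration.

```python
def distribute_proportionally(group_sizes, unassigned):
    """Return the new size of each group after sharing out `unassigned`
    users in proportion to the current sizes (largest remainder method).

    group_sizes: dict mapping group id -> current number of users
    unassigned:  number of users that have no group yet
    """
    total = sum(group_sizes.values())
    if total == 0:
        raise ValueError("at least one group must already have users")

    shares = {}
    remainders = []
    for group, size in group_sizes.items():
        # Whole part of the exact share size * unassigned / total,
        # kept in integer arithmetic to avoid floating point error.
        whole, remainder = divmod(size * unassigned, total)
        shares[group] = whole
        remainders.append((remainder, group))

    # Users still unplaced after handing out the whole parts.
    leftover = unassigned - sum(shares.values())

    # Hand them out one by one to the groups with the largest remainders;
    # ties are broken by group name (an arbitrary, assumed rule).
    ranked = sorted(remainders, key=lambda pair: (-pair[0], pair[1]))
    for _, group in ranked[:leftover]:
        shares[group] += 1

    return {group: group_sizes[group] + shares[group] for group in group_sizes}


if __name__ == "__main__":
    # The example from the question: 3 + 1 existing users, 4 without a group.
    print(distribute_proportionally({"group1": 3, "group2": 1}, 4))
```

The totals always add up to the existing users plus the unassigned users, which plain rounding of each share does not guarantee.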

sql server query to assign 2 different column values for each half of rows

I have the following table with three columns:
tb1
userID itemID rating
This table contains information about the ratings given by users to different items.
A user can give ratings to multiple items, and an item can receive ratings from multiple users.
I need to update the rating values for this table, so that half the items in tb1 are assigned a rating of 5 and the other half are assigned a rating of 1.
Note: This means that while a user can give different ratings to different items, an item can have either all its ratings as 1 or all ratings as 5.
Initially, the rating values are NULL for all pairs of users and items.
This task could be performed using two separate queries.
UPDATE tb1
SET rating = 5
WHERE itemID IN
(SELECT top(50) percent itemID
FROM tb1
GROUP BY itemID
ORDER BY newid());
UPDATE tb1
SET rating = 1
WHERE rating IS NULL
Is there a way to combine both these queries into a single query?
You don't state whether it matters which half gets 1 and which half gets 5, only that each rating should go to 50% of the items.
If it doesn't matter then you can do something like this:
UPDATE tb1
SET rating =
(CASE
WHEN itemId <=
(SELECT MAX(itemID)
FROM
(SELECT TOP (50) percent itemID
FROM tb1
GROUP BY itemID
ORDER BY itemID
) x
) THEN 5
ELSE 1
END)
Or if your records don't have any deleted items or you're not strictly concerned about being exactly 50% then you could simply do something like this:
UPDATE tb1
SET rating = CASE
WHEN (itemID % 2) = 1 THEN 1
ELSE 5
END
The benefit of this approach is that you can do things like this:
UPDATE tb1
SET rating =
CASE (itemID % 5)
WHEN 1 THEN 1
WHEN 2 THEN 7
WHEN 3 THEN 10
WHEN 4 THEN 40
ELSE 5
END
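Another way to split the items into two halves in a single statement is to number the distinct items with NTILE(2), which does not depend on the itemID values being contiguous. The sketch below is an illustration only: it runs on an in-memory SQLite database from Python (window functions need SQLite 3.25 or later), not on SQL Server, and the helper name assign_half_ratings is hypothetical.

```python
import sqlite3

# One UPDATE: items in the first NTILE bucket get 5, all others get 1.
# NTILE(2) splits the distinct items into two buckets ordered by itemID.
UPDATE_SQL = """
UPDATE tb1
SET rating = CASE
    WHEN itemID IN (
        SELECT halves.itemID
        FROM (
            SELECT items.itemID,
                   NTILE(2) OVER (ORDER BY items.itemID) AS half
            FROM (SELECT DISTINCT itemID FROM tb1) AS items
        ) AS halves
        WHERE halves.half = 1
    ) THEN 5
    ELSE 1
END
"""


def assign_half_ratings(pairs):
    """pairs: iterable of (userID, itemID).
    Returns the distinct (itemID, rating) rows after the update."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tb1 (userID INTEGER, itemID INTEGER, rating INTEGER)"
    )
    conn.executemany("INSERT INTO tb1 (userID, itemID) VALUES (?, ?)", pairs)
    conn.execute(UPDATE_SQL)
    result = conn.execute(
        "SELECT DISTINCT itemID, rating FROM tb1 ORDER BY itemID"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    ratings = [(1, 10), (1, 20), (2, 10), (2, 30), (3, 40), (3, 20)]
    for item_id, rating in assign_half_ratings(ratings):
        print(item_id, rating)
```

Because the buckets are built over distinct itemID values, every rating row of a given item receives the same value, as the question requires.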

Get the Currently logged user and their last login time in SQL Server

Here is my user log table
ID USERID TIME TYPE
1 6 12:48:45 OUT
2 11 12:08:46 IN
3 6 12:18:45 IN
4 6 12:08:45 IN
5 9 12:06:44 IN
6 11 11:08:46 IN
I need to get the currently logged-in users and their last login time in SQL Server. The output should look like this:
ID USERID TIME TYPE
2 11 12:08:46 IN
5 9 12:06:44 IN
Using your test data..
DECLARE @logintable TABLE (id int, userid int, time datetime, type varchar(3))
INSERT INTO @logintable VALUES (1, 6, '12:48:45', 'OUT')
INSERT INTO @logintable VALUES (2, 11, '12:08:46', 'IN')
INSERT INTO @logintable VALUES (3, 6, '12:18:45', 'IN')
INSERT INTO @logintable VALUES (4, 6, '12:08:45', 'IN')
INSERT INTO @logintable VALUES (5, 9, '12:06:44', 'IN')
INSERT INTO @logintable VALUES (6, 11, '11:08:46', 'IN')
..this query should get you the result set you are after
SELECT
    a.*
FROM
    @logintable a
INNER JOIN
    (SELECT userid, MAX(time) AS time FROM @logintable b GROUP BY userid) b
ON
    a.userid = b.userid AND a.time = b.time
WHERE
    a.type = 'IN'
ORDER BY
    a.id
The subquery finds the most recent entry for each user and compares it with the record we're currently looking at (that must be an 'IN' record). By doing this we automatically find logged in users.
Note that I have joined on MAX(TIME) here where it would be nicer to join on MAX(ID) - I would guess that in production your ID column would be sequential (ideally an identity column) and so the bigger the value of id the bigger the value of time.
The above query gives these results
ID UserID Time Type
2 11 1900-01-01 12:08:46.000 IN
5 9 1900-01-01 12:06:44.000 IN
This solution uses two CTEs: the first holds the last login time of each user and the second holds the last logout time of each user. The final select then takes those users who have never logged out, or whose last login is later than their last logout.
WITH loggedIn
AS
(
    SELECT userId, MAX([time]) LastLogIn
    FROM logtable
    WHERE [type] = 'IN'
    GROUP BY userId
),
loggedOut
AS
(
    SELECT userId, MAX([time]) LastLogOut
    FROM logtable
    WHERE [type] = 'OUT'
    GROUP BY userId
)
SELECT loggedIn.userId, loggedIn.LastLogIn
FROM loggedIn
LEFT JOIN loggedOut
    ON loggedOut.userId = loggedIn.userId
WHERE
    (loggedOut.userId IS NULL
     OR loggedIn.LastLogIn > loggedOut.LastLogOut)
--AND loggedIn.LastLogIn > @currentTime
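A case worth checking with either approach is a user who logs out and later logs in again. The sketch below looks only at the latest event per user, so such a user counts as logged in. It is an illustration under assumptions: it runs on an in-memory SQLite database from Python rather than SQL Server, times are stored as HH:MM:SS text so they compare correctly as strings, and the helper name currently_logged_in is hypothetical.

```python
import sqlite3

# Keep a row only if it is the user's most recent event and that event is IN.
QUERY = """
SELECT l.id, l.userid, l.time, l.type
FROM logtable AS l
JOIN (
    SELECT userid, MAX(time) AS time
    FROM logtable
    GROUP BY userid
) AS latest
  ON latest.userid = l.userid
 AND latest.time = l.time
WHERE l.type = 'IN'
ORDER BY l.id
"""


def currently_logged_in(rows):
    """rows: iterable of (id, userid, time, type) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logtable "
        "(id INTEGER, userid INTEGER, time TEXT, type TEXT)"
    )
    conn.executemany("INSERT INTO logtable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# The sample data from the question.
log_rows = [
    (1, 6, "12:48:45", "OUT"),
    (2, 11, "12:08:46", "IN"),
    (3, 6, "12:18:45", "IN"),
    (4, 6, "12:08:45", "IN"),
    (5, 9, "12:06:44", "IN"),
    (6, 11, "11:08:46", "IN"),
]

if __name__ == "__main__":
    for row in currently_logged_in(log_rows):
        print(row)
```

Adding a later IN row for user 6 to the sample data makes that user appear in the result again, which a plain "has ever logged out" check would miss.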

Printing the current value and previous value between the date range

I have a sample data like this
ID DATE TIME STATUS
---------------------------------------------
A 01-01-2000 0900 ACTIVE
A 05-02-2000 1000 INACTIVE
A 01-07-2000 1300 ACTIVE
B 01-05-2005 1000 ACTIVE
B 01-08-2007 1050 ACTIVE
C 01-01-2010 0900 ACTIVE
C 01-07-2010 1900 INACTIVE
From the above data set, if we only focus on ID='A' we note that A was initially active, then became inactive on 05-02-2000, and stayed inactive until 01-07-2000.
Which means that A was inactive from 05-Feb-2000 to 01-July-2000.
My questions are:
if I execute a query with (ID=A, Date=01-04-2000) it should give me
A 05-02-2000 1000 INACTIVE
because that date is not in the data set, so the query should fall back to the previous record and print that.
Also, if my condition is (ID=A, Date=01-07-2000) it should not only print the value which is present in the table, but also print a previous value
A 05-02-2000 1000 INACTIVE
A 01-07-2000 1300 ACTIVE
I would really appreciate it if anyone could help me solve this query. I am trying my best to solve this.
Thank you every one.
Any take on this?
Afaq
Something like the following should work:
SELECT ID, Date, Time, Status
from (select ID, Date, Time, Status
             -- rank newest first so that Ranking 1 and 2 are the two latest rows
             ,row_number() over (order by Date desc) Ranking
      from MyTable
      where ID = @SearchId
        and Date <= @SearchDate) xx
where Ranking < 3
order by Date, Time
This will return at most two rows. It's not clear whether you are using Date and Time datatyped columns, or whether you are actually using reserved words as column names, so you'll have to fuss with that. (I left Time out of the ranking, but you could easily add it to the various orderings and filterings.)
Given the revised criteria, it gets a bit trickier, as the inclusion or exclusion of a row depends upon the value returned in a different row. Here, the “second” row, if there are two or more rows, is included only if the “first” row equals a particular value. The standard way to do this is to query the data to get the max value, then query it again while referencing the result of the first set.
However, you can do a lot of screwy things with row_number. Work on this:
SELECT ID, Date, Time, Status
from (select
          ID, Date, Time, Status
          -- one partition for the exact search date, one for earlier dates;
          -- newest first so that Ranking = 1 is the latest row in each
          ,row_number() over (partition by case when Date = @SearchDate then 0 else 1 end
                              order by case when Date = @SearchDate then 0 else 1 end
                                      ,Date desc) Ranking
      from MyTable
      where ID = @SearchId
        and Date <= @SearchDate) xx
where Ranking = 1
order by Date, Time
You'll have to resolve the date/time issue, since this only works against dates.
Basically you need to pull a row if, for the specified date, it is:
1) the last record, or
2) the last inactive record.
And the two conditions may match the same row as well as two distinct rows.
Here's how this logic could be implemented in SQL Server 2005+:
WITH ranked AS (
    SELECT
        ID,
        Date,
        Time,
        Status,
        RankOverall = ROW_NUMBER() OVER ( ORDER BY Date DESC),
        RankByStatus = ROW_NUMBER() OVER (PARTITION BY Status ORDER BY Date DESC)
    FROM Activity
    WHERE ID = @ID
      AND Date <= @Date
)
SELECT
    ID,
    Date,
    Time,
    Status
FROM ranked
WHERE RankOverall = 1
   OR (Status = 'INACTIVE' AND RankByStatus = 1)
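To check the two ranking conditions against the sample data, the query can be run on a small in-memory database. This is a sketch under assumptions, not the answer's exact code: it uses SQLite from Python (window functions need SQLite 3.25 or later), the columns are renamed to the hypothetical EventDate and EventTime, and dates are stored as ISO yyyy-mm-dd text so that string comparison matches date order.

```python
import sqlite3

# Last record up to the search date, plus the last INACTIVE record up to it.
QUERY = """
WITH ranked AS (
    SELECT ID, EventDate, EventTime, Status,
           ROW_NUMBER() OVER (ORDER BY EventDate DESC) AS RankOverall,
           ROW_NUMBER() OVER (PARTITION BY Status
                              ORDER BY EventDate DESC) AS RankByStatus
    FROM Activity
    WHERE ID = :id
      AND EventDate <= :date
)
SELECT ID, EventDate, EventTime, Status
FROM ranked
WHERE RankOverall = 1
   OR (Status = 'INACTIVE' AND RankByStatus = 1)
ORDER BY EventDate
"""

# The sample data from the question, with dates converted to ISO format.
ACTIVITY_ROWS = [
    ("A", "2000-01-01", "0900", "ACTIVE"),
    ("A", "2000-02-05", "1000", "INACTIVE"),
    ("A", "2000-07-01", "1300", "ACTIVE"),
    ("B", "2005-05-01", "1000", "ACTIVE"),
    ("B", "2007-08-01", "1050", "ACTIVE"),
    ("C", "2010-01-01", "0900", "ACTIVE"),
    ("C", "2010-07-01", "1900", "INACTIVE"),
]


def status_as_of(search_id, search_date):
    """Return the rows selected for one ID and one ISO search date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Activity "
        "(ID TEXT, EventDate TEXT, EventTime TEXT, Status TEXT)"
    )
    conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)", ACTIVITY_ROWS)
    result = conn.execute(
        QUERY, {"id": search_id, "date": search_date}
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(status_as_of("A", "2000-04-01"))
    print(status_as_of("A", "2000-07-01"))
```

For the first call both conditions match the same row, so one row comes back; for the second call they match two different rows.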

how to create a mssql view for getting last state information

lets say, i have two tables, one for object records and one for activity records about these objects.
i'm inserting a new record in this activity table every time an object is inserted or updated.
for telling it in a simple way, assume i have four fields in activity table; objectId, type, status and date.
when an object is about to be updated, i'm planning to get the last state for the object and look for the changes. if there is a difference between the updating value and the previous value, i'll set the value with new input, otherwise i'll set it null. so for example in an update process, user only changes the status value of the object but leaves the type value as the same, so i'll insert a new row with a null value for type and a new value for the status.
SELECT * FROM Activity;
oid type status date
-----------------------------------------
1 0 1 2009.03.05 17:58:07
1 null 2 2009.03.06 07:00:00
1 1 null 2009.03.07 20:18:07
1 3 null 2009.03.08 07:00:00
so i have to create a view that tells me the current state of my object, like:
SELECT * FROM ObjectStateView Where oid = 1;
oid type status date
-----------------------------------------
1 3 2 2009.03.08 07:00:00
how do i achieve this?
Assuming date can be used to find latest record:
CREATE VIEW foo
AS
SELECT
    A.oid,
    -- latest non-null value of each column for this object
    (SELECT TOP 1 type FROM Activity At WHERE At.oid = A.oid AND At.type IS NOT NULL ORDER BY At.date DESC) AS type,
    (SELECT TOP 1 status FROM Activity Ast WHERE Ast.oid = A.oid AND Ast.status IS NOT NULL ORDER BY Ast.date DESC) AS status,
    MAX(A.date) AS date
FROM
    Activity A
GROUP BY
    A.oid
GO
Edit: if you want a JOIN (untested)
CREATE VIEW foo
AS
SELECT TOP 1
A.oid,
At.type,
Ast.status,
A.date
FROM
Activity A
LEFT JOIN
(SELECT TOP 1 oid, date, type FROM Activity WHERE type IS NOT NULL ORDER BY date DESC) At ON A.OID = At.oid
LEFT JOIN
(SELECT TOP 1 oid, date, status FROM Activity WHERE status IS NOT NULL ORDER BY date DESC) Ast ON A.OID = Ast.oid
ORDER BY date DESC
GO
Should have added this earlier:
It will scale exponentially because you have to touch the table 11 different times.
A better solution would be to maintain a "current" table and maintain it via a trigger on activity.
Have you considered using MAX function?
select oid, type, status, MAX(date) as max_date
from ObjectStateView
where oid = 1
group by oid, type, status
Not really sure why you'd want the nulls in there. You can track what's changed between inputs by comparing the latest entry to the previous. Then the current state of the object is the latest entry in the table. You can determine if an object has changed by creating a hash of the parts of the object that you want to track changes to and storing that as an extra column.
Historical values:
Since you track changes, you may want to see the status of the object historically:
SELECT a.oid,
a.date,
a_type.type,
a_status.status
FROM Activity a
LEFT JOIN Activity a_type
ON a_type.oid = a.oid
AND a_type.date = (SELECT TOP 1 date FROM Activity WHERE oid = a.oid AND date <= a.date AND type IS NOT NULL ORDER BY date DESC)
LEFT JOIN Activity a_status
ON a_status.oid = a.oid
AND a_status.date = (SELECT TOP 1 date FROM Activity where oid = a.oid AND date <= a.date AND status IS NOT NULL ORDER BY date DESC)
which will return:
oid date type status
----------- ---------- ----------- -----------
1 2009-03-05 0 1
1 2009-03-06 0 2
1 2009-03-07 1 2
1 2009-03-08 3 2
Performance consideration:
On the other hand, if you have more than just a few fields and the table is big, performance would become an issue. In this case it would make sense to also store/cache the complete values in another table, MyDataHistory, which would contain data like that in the table shown above. Selecting the current (latest) version is then trivial using a SQL view that filters the latest row (1 row only) by oid and date.
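The "latest non-null value per column" idea behind the view can be tried out on the question's sample rows. The sketch below is an illustration under assumptions: it uses an in-memory SQLite database from Python, so LIMIT 1 stands in for T-SQL's TOP 1, dates are stored as sortable text, and the helper name current_state is hypothetical. The view name ObjectStateView is the one used in the question.

```python
import sqlite3

# For each object, pick the most recent non-null type and status separately,
# together with the date of the most recent activity row.
CREATE_VIEW = """
CREATE VIEW ObjectStateView AS
SELECT a.oid,
       (SELECT t.type FROM Activity AS t
         WHERE t.oid = a.oid AND t.type IS NOT NULL
         ORDER BY t.date DESC LIMIT 1) AS type,
       (SELECT s.status FROM Activity AS s
         WHERE s.oid = a.oid AND s.status IS NOT NULL
         ORDER BY s.date DESC LIMIT 1) AS status,
       MAX(a.date) AS date
FROM Activity AS a
GROUP BY a.oid
"""

# The sample data from the question; None is stored as NULL.
ACTIVITY = [
    (1, 0, 1, "2009-03-05 17:58:07"),
    (1, None, 2, "2009-03-06 07:00:00"),
    (1, 1, None, "2009-03-07 20:18:07"),
    (1, 3, None, "2009-03-08 07:00:00"),
]


def current_state(rows, oid):
    """Return the view rows for one object id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Activity "
        "(oid INTEGER, type INTEGER, status INTEGER, date TEXT)"
    )
    conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)", rows)
    conn.execute(CREATE_VIEW)
    result = conn.execute(
        "SELECT oid, type, status, date FROM ObjectStateView WHERE oid = ?",
        (oid,),
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(current_state(ACTIVITY, 1))
```

Each tracked column costs one extra correlated lookup, which is the scaling concern raised in the answers above.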

Resources