Create an array of the dates of the last X times something happened, including duplicates [closed]

Create an array of the dates of the last X times something happened, including duplicates [closed] - arrays

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 7 months ago.
Improve this question
I'm a very advanced Excel user and have used dynamic arrays and complex lambdas to solve some pretty tough problems, but this one has me stumped. I have a column (or array) of invoice dates, that are non-unique. I have a corresponding column of the count of a specific widget sold on that invoice. It could be 0, 1, or multiples. And a given day could have 3 invoices with no widgets, 2 invoices with one widget, and 1 invoice with 4 widgets.
I am not trying to get the count, that's trivial. I need to create an array that is the dates of the last 6 widget sales, including duplicated dates. So if on 7/1/22 3x widgets were on one invoice, and and on 7/5 there were 2x on 1 invoice and 1x on a second invoice, my resulting array would have 3x 7/1 entries and 3x 7/5 entries. It's a list of the date that each of the last 6 widgets were sold.
I can get it to one single intermediate table, but I can't find a way to do it using a single formula and dynamic arrays. In this application, I can't resort to VBA, which would be easier.
Adding example source data:
Date
Ct
6/5
1
6/7
1
6/7
2
6/10
0
6/10
1
6/25
1
6/26
0
6/28
1
Example result: the last 6 items ordered were ordered on what date?
Rslt
6/7
6/7
6/7
6/10
6/25
6/28

With Invoice Date and Widget Sales in A1:A10 and B1:B10 respectively, and C1 containing your chosen value for X, e.g. 6:
=LET(ζ,A1:B10,ω,C1,ξ,SORT(ζ,1,-1),λ,XLOOKUP(SEQUENCE(ω),SCAN(0,INDEX(ξ,,2),LAMBDA(α,β,α+β)),INDEX(ξ,,1),,1),FILTER(λ,1-ISNA(λ)))
The final FILTER clause is included so that, if less than X sales are present, dates corresponding to the actual number of sales are returned.

Related

Finding unique customers in a selected day range

I have a simple table as follows:
day order_id customer_id
1 1 1
1 2 1
1 3 2
2 4 1
2 5 1
I want to find a number of unique customers from Day 1 to Day 2. And the answer is 2.
But my size of the table is huge and querying takes long. So I want to store an aggregated data in another table to reduce the data size and query faster. I have created a new table out of the above table.
day uniq_customer
1 2
2 1
Now if I want to find a unique customer from Day 1 to Day 2, I am getting 2 + 1 = 3, whereas the answer is 2.
Is there any way to find a work around without having to query the old table.
PS: I am using Druid as a data source.

This depends on the trends on your data. For example, if you have a small number of distinct customers and days, you can keep the customers in a bit vector per day. At the end, just or the bitvectors of the days in the query and the result will be the sum of the bits. May be boring to implement.
If you have a large number of distinct customers and days, chunk them per customer and sort by date. Then for each customer, get the index of first row where day is greater than or equal to the query start and get the index of the first row where day is less than or equal to the query end using binary search. Difference between those two indices plus 1 gives you the number-of-query-fitting-days for the customer. Complexity becomes #customers x 2 x O(log #customerRecords).

Apache Druid supports the use of approximations for this kind of query. Take a look at the tutorial on using approximations in Druid: https://druid.apache.org/docs/latest/tutorials/tutorial-sketches-theta.html
In Druid you can also partially aggregate into Theta Sketches at ingestion time and aggregated them over time or over other grouping dimensions at query time. This is specifically designed to deal with large data volumes and you can control the accuracy of the approximations.

Excel: Need to Generate IDs based on multiple criteria with repeating IDs

Looking to create pricing groups bases on multiple criteria. Each group could have multiple items within the group. I'm struggling with the autocreation the naming of each group. I estimate there should be about 6.5K pricing groups out of 14K items.
Below is the criteria -
QTY per case - is the number of bottles in a case
Size - size of the bottle
Family Brand - contains a group of like items
Code - CS1 - This is my unique code for each group that contains each of the above and lowest possible case price.
enter image description here
The "Thinking" column is how I want each group to look, but how do I do this with 14K items quickly?

If I understood correctly your pricing group name consists of two parts: a simple combination of columns and a "special" column, that should be counted.
Part 1 is simple: =C2&"-"&B2&"-"&A1&"-"
To make Part 2 easier you could sort, sorting fields Part 1, CODE-CS1.
After have done this you could use helping columns. If Part 1 is in column x and code-CS1 in column y you could find a formula for
Part 2 (column z): ="T"&IF(X1=X2;IF(Y1=Y2;Z1;Z1+1);1)
That means: If Part 1 is changing your counter starts with T1, if not so if your CODE CS1 changes, it counts, if not, so it keeps last number.
the result code would be =X2&Z2
It is untested and I use german excel, maybe the code doesn't work without any adaption, but in general it should work

Count across columns starting from unique column for each row in SAS

I have a dataset containing a number of persons who have been involved in an accident. Each person have been in an accident at a different time and I have coded a variable start_week which indicates what week number after a certain date (january 1st 2011), the accident occurred.
For each individual I also have a a variable for each week after january 1st 2011, that shows whether or not this individual has been hospitalized. I now need to count how many weeks a person has been hospitalized XX weeks after the accident.
The desired results should be a column like sum_week that sums number of weeks after the accident depending on the value shown in the variable start_week.
Id
start_week
week_1
week_2
week_3
week_4
sum_week
1
2
1
0
1
1
2
2
3
1
0
0
1
1
I think this can be done using an array, but I have no idea how. If it isn't possible to count across columns based on the variable start_week, I am planning on transposing my data. I would however prefer if this could be done without having to transpose my data.
Any help is much appreciated!

Just use the START_WEEK as the initial value in the DO loop you use to check the array.
data want;
set have ;
array week_[4];
sum_week=0;
do index=start_week to dim(week_);
sum_week+week_[index];
end;
drop index;
run;

Need an optimized way to handle combination of entities to improve performance

So, I am working on a feature in a web application. The problem is like this-
I have four different entities. Let's say those are - Item1, Item2, Item3, Item4. There's two phase of the feature. Let's say the first phase is - Choose entities. In the first phase, User will have option to choose multiple items for each entity and for every combination from that choosing, I need to do some calculation. Then in the second phase(let's say Relocate phase) - based on the calculation done in the first phase, for each combination I would have to let user choose another combination where the value of the first combination would get removed to the row of the second combination.
Here's the data model for further clarification -
EntityCombinationTable
(
Id
Item1_Id,
Item2_Id,
Item3_Id,
Item4_Id
)
ValuesTable
(
Combination_Id,
Value
)
So suppose I have following values in both values -
EntityCombinationTable
Id -- Item1_Id -- Item2_Id -- Item3_Id -- Item4_Id
1 1 1 1 1
2 1 2 1 1
3 2 1 1 1
4 2 2 1 1
ValuesTable
Combination_Id -- Value
1 10
2 0
3 0
4 20
So if in the first phase - I choose (1,2) for Item1, (1,2) for Item_2 and 1 for both Item_3 and Item4, then total combination would be 2*2*1*1 = 4.
Then in the second phase, for each of the combination that has value greater than zero, I would have to let the user choose different combination where the values would get relocated.
For example - As only combination with Id 1 and 2 has value greater than zero, only two relocation combination would need to be shown in the second dialog. So if the user choose (3,3,3,3) and (4,4,4,4) as relocation combination in the second phase, then new row will need to be inserted in
EntityCombinationTable for (3,3,3,3) and (4,4,4,4). And values of (1,1,1,1) and (2,2,1,1) will be relocated respectively to rows corresponding to (3,3,3,3) and (4,4,4,4) in the ValuesTable.
So the problem is - each of the entity can have items upto 100 or even more. So in worst case the total number of combinations can be 10^8 which would lead to a very heavy load in database(inserting and updating a huge number rows in the table) and also generating all the combination in the code level would require a substantial time.
I have thought about an alternative approach to not keep the items as combination. Rather keep separate table for each entity. and then make the combination in the runtime. Which also would cause performance issue. As there's a lot more different stages where I might need the combination. So every time I would need to generate all the combinations.
I have also thought about creating key-value pair type table, where I would keep the combination as a string. But in this approach I am not actually reducing number of rows to be inserted rather number of columns is getting reduced.
So my question is - Is there any better approach this kind of situation where I can keep track of combination and manipulate in an optimized way?
Note - I am not sure if this would help or not, but a lot of the rows in the values table will probably have zero as value. So in the second phase we would need to show a lot less rows than the actual number of possible combinations

How many trucks came empty but bought something

I think this is the hardest to date I have had to crack - so hard I had a hard time finding a good headline.
So we have a site where trucks come and buy say Gravel, or sand or other building materials.
Sometimes they also unload demolition waste first.
I need to find out a couple of things
how many trucks (and from what companys) came empty
if they came empty what did they buy from us.
what companys are sending full trucks and what are sending empty trucks.
a tope 10 of materials they will drive to us from to buy even when coming empty to our facility.
a list of all the order numbers that they drove to us til fill and came with empty trucks. ( I have distances linked to order numbers, so now I can estimate the value of our products)
The data I have available:
I have a full data set of when what customer buys what and / or pay to deliver.
E.G.:
I can see the parts I need to split the data into I think it should be something like this
find all unique licence plates
somehow map if they bought materials within 30 minutes of
offloading demolition waste (most trucks will come between 2 and 10
times per day)
Present all this data (on a normal day we have about 800 trucks = 2000 lines since they weigh in, weigh out, and then some buy something = 2 more weigh lines)
I can easily find unique licence plates per day (either by formula or by Excel function Data/delete doublets,
but after that I have no clue where to start.
I think I need some sheets in between, where I somehow mark if a material was bought from an "empty truck" and I need a counter for that .. somehow...
Any help on how to get started is appreciated.

It seems like the best way to start is with a helper column (in the following exampes, I have chosen "Column M") to flag whether the truck arrived empty.
In the helper column, you can use something similar to the following formula.
{=IF(ISBLANK(B2),0,IF(C2="In",0,IF(B2=$B$2:$B$13,IF($C$2:$C$13="In",IF($A$2:$A$13>(A2-TIME(0,30,0)),0,1),1),1)))}
This is an array formula, which means you have to press ctrl+shift+enter after pasting it in the cell. Then you can copy that cell down the column.
Just to explain, the first if statement knows the truck is not arriving empty if Column C is 'In'. The second if statement creates an array and tests to see if other the same truck appears in other rows. The third if statement checks to see if the same truck checked 'In' in the matching rows, and the fourth if statement verifies if the time they checked in was less than thirty minutes ago. You can adjust the length by editing the TIME(0,30,0) function. The format is TIME(hours,minuites,seconds). Unless the truck matches all three of the second, third and fourth if statements, it is marked as coming empty.
Once you have this helper column, just about all of your tasks are quite simple.
1a: How many trucks came empty? Sum Column M
1b: How many trucks from what company? Create a unique list of companies. Then create a COUNTIFS formula based on Column M = 1 and Column K = Company. For example, if C32 had Company B then the formula =COUNTIFS($M$2:$M$13,1,$K$2:$K$13,C32) would return 2
1c: How many times did a truck come empty? Similar to 1b, create a unique list of License Plates, then use a COUNTIFS based on Column M = 1 and Column B = License Plate.
2: Similar to 1b, just use a unique list of products tested against Column F
3: Similar to 1b, just create a second column, next to the first that uses =COUNTIFS($M$2:$M$13,0,$K$2:$K$13,C53,$C$2:$C$13,"In") Which tests that Column M reports the truck did not come empty, that matches the company in Column K and that the truck came 'In' so you don't double count the same truck when it goes 'out'
4: Just sort list created by number 2. You can highlight the range, right-click and select "Sort" > "Custom Sort", then select the column you want to sort on and largest to smallest.
5: There are a couple of different ways, you could do this. The formula
{=TEXTJOIN(", ",TRUE,IF($M$2:$M$13=1,$J$2:$J$13,""))}
(again, entered as an array formula)
would create a comma separated list of order numbers. An alternative if you want a column of order numbers (but would only work if they are actually numbers), is to paste the formula {=MAX(IF($M$2:$M$13=1,$J$2:$J$13,))} in the first row of the column (in my example, its O2) and then {=MAX(IF($M$2:$M$13=1,IF($J$2:$J$13<O2,$J$2:$J$13,)))} in the row below (change the reference to O2 if you pasted it in a different spot)(again, note that both of these are array formulas). Then copy and paste the second formula down the column. When order numbers of trucks that came in empty are exhausted, the formula will report 0.