(Excel) Lookup where lookup value isn't exactly in the array - arrays

Background: I usually do "full reviews" on select categories during months 5, 8, 9, and 12, where an excerpt of this schedule can be seen in the image below (F1:I13). For example, for category 101 I usually do a full review on months 5, 9, and 12 (as seen in cells F8:F10; Note that the values in column F are of the format [Category]-[Month]. E.g., Cell F2 is the lookup key for category 085 during December/12).
Example of normal process: If doing a full review on 5/2015 for category 101, I would input "5" and "2015" in cells B2:B3, then cells B5:B6 would properly VLOOKUP the first and second prior dates of full reviews for the category. So in this case,
B5.Value() = 12.2014
B6.Value() = 9.2014
Problem: I have begun a process where I intend on doing a partial review during all the months I don't do a full review for each category, but the process of pulling in the first and second prior full review dates would no longer work. For example, as seen below, if I want to do a partial review in 6/2015, I want to pull in "5.2015" and "12.2014" in cells B5:B6.
The issue I run into is that the array on the right-side is produced by an outside source and I can't really do anything to edit it (it's also MUCH larger than the excerpt I have shown and too many things depend on its current format; the array gets updated at the start of each year).
Is there a clever way to figure out how to lookup the proper prior full review dates during partial reviews, where the partial review dates aren't clearly defined in the first column of the lookup array (Lookup Key)?

=INDEX(H2:H13,MATCH("*"&B1&"-"&"*",G2:G13,0))
This should do the trick. Just create another row for specifically finding partial review dates.

Related

Splitting Multiple Ranges from a Single Range in Google Sheets

I am building a scheduling tool for my company. The structure of my Google Sheets document is a summary page with the entire schedule laid out for each employee in each department. Then, each employee gets their own sheet. In each employee sheets I have a section for Sunday, Monday, and Tuesday (the days each employee works). In each section by day I have a column indicating the hours they are on the clock, and then a column for each department. I have placed a checkbox into each cell and can activate the check box to indicate that this employee will be working in that department at that corresponding time. However, there are times when an employee is in a department more than once per day, meaning that the column with the checkboxes has, for example, checks in the cells corresponding to 7am-10am and 2pm-5pm, with unchecked boxes in the cells corresponding to 11am-1pm.
I have query functions that can pull the start and end times in that department if the employee is only in that department once per day. The output, after some concatenation, is something like "7AM-2PM".
=QUERY(A4:C17, "Select MIN(A) where (C=TRUE)")
=QUERY(A4:C17, "Select MAX(A) where (C=TRUE)")
However, I cannot think of a way to discriminate multiple start and end times. Using the above example, I'd like my output to be "7AM-10AM" and "2PM-5PM". These can be in separate cells or the same cell, doesn't make a difference to me. There can also be formulas in several cells if a subsequent formula needs to operate off a previous one.
I hope this makes sense in the way I have described it. I have been struggling for weeks trying to come up with something and am running out of time. Thanks for any and all help!
try:
=ARRAYFORMULA(IFNA(TEXTJOIN(CHAR(10), 1,
TEXT(FILTER($A4:$A17, IF((B4:B17=TRUE)*({FALSE; B4:B16}=FALSE), 1, )=1), "hh:mm\ - ")&
TEXT(FILTER($A4:$A17, IF((B4:B17=TRUE)*({B5:B17; FALSE}=FALSE), 1, )=1), "hh:mm"))))

iterating over multindex - a groupby.value_counts() object is only through values and not through original date index

i want to know the percent of males in the ER (emergency room) during days that i defined as over crowded days.
i have a DF named eda with rows repesenting each entry to the ER. a certain column states if the entry occurred in an over crowded day (1 means over crowded) and a certain column states the gender of the person who entered.
so far i managed to get a series of over crowded days as index and a sub-index representing gender and the number of entries in that gender.
i used this code :
eda[eda.over_crowd==1].groupby(eda[eda.over_crowd==1].index.date).gender.value_counts()
and got the following result:
my question is, what is the most 'pandas-ian' way to get the percent of males\females in general. or, how to continue from the point i stopped?
as can be shown in the bottom of the screenshot, when i iterate over the elements, each value is the male of female consecutively. i want to iterate over dates so i could somehow write a more clean loop that will produce another column of male percentage.
i found a pretty elegant solution. i'm sure there are more, but maybe it can help someone else.
so i defined a multi-index series with all dates and counts of females and males. then used .loc to operate on each count of all dates to get percentage of males at each day. finally i just extract only the days that apply for over_crowd==1.
temp=eda.groupby(eda.index.date).gender.value_counts()
crowding['male_percent']=np.divide(100*temp.loc[:,1],temp.loc[:,2]+temp.loc[:,1])
crowding.male_percent[crowding.over_crowd==1]

Remove duplicate values based on timestamp

I would need your help with and SQL query that has to remove duplicate entries from a table, mostly using the datestamp column as a criteria in two passes.
Microsoft SQL DBMS is in question.
Here is a little more details:
Terminology: Module is basically a group of single machine workplaces onto which users operate.
Table:
ModNam column is fixed, there are 15 modules from M A01 to M A15, then goes the B row M B01 ... M B15 and so on until row F.
Pos column is irrelevant at the moment.
MdCod column represents a code of the machine being added to the position in the certain module. It can be replaced by another machine at any given time.
I have one query that will be inserting data into this table by copying entries from another table, every time a new machine is added to one of the positions.
Tricky part for me is a second query that should be comparing records in two phases and if:
1) Inside same module (first pass of the query represented with red color in the example pic attached):
ModNam value is the same, MdCod matches between the entries then the most recent datestamp decides the single one to stay and others duplicates get deleted
2) Inside other module (second pass of the query represented with purple color in the example pic attached):
ModNam values are different and MdCod matches between the entries then the most recent datestamp decides the single one to stay and others duplicates get deleted.
Please help and advise.
Example pic (updated):
Thank you all in advance.

Creating custom rollups with SSAS

I am currently working on a requirement as follows and would appreciate some help in figuring out a way to configure the aggregation of my measure:
I have a fact table that contains the following Item ID, DateID,StoreID, ReceivedComments. The way received comments work is that on a daily basis a new record is created that adds to the value of received comments (for example if Item 5 in Store 5 on 1 Jan had 23 Received Comments and it received 5 comments the following day, the row for Jan 2 would be Item 5, Store 5, Jan 2, 28)
We created a measure using MAX and it works fine whenever Item ID is used in the query. When we start moving to a higher level the max produces wrong results. Our requirement is to setup the measure to be as follows:
If the member selected is on the Item Level then MAX, if it's on any other level (Date or Store) then the measure should aggregate the Max of all Items under this date or store.
Due to the business rules and structure of the database Store and Item are different dimensions so I can not include them in 1 Hierarchy.
We have been playing around with Custom RollUps but so far haven't been able to get it to work.
Thanks
I would solve this by using a more traditional approach to your fact table. Instead of keeping a cumulative count in the ReceivedComments column, I would keep only the number of comments received THAT DAY.
That way, instead of using MAX, you can create your measure using SUM, and it will automatically rollup when you go to higher levels.
The only disadvantage I can see to this approach is that you will need to use a range of dates, instead of only the most recent date, to get a full total of all the comments for a given item/store/date. But that's a very small change to your MDX.
Someone suggested using ISLEAF to determine the level, Instead of using ISLeaf i went with AS CASE WHEN [Item].[ItemID].CURRENTMEMBER.LEVEL IS [Item].[ItemID].[(All)] so I don't have to account for other dimensions such as Date, Store, etc as I have several other dimensions that all behave the same way.
And then I went with this formula to determine the Sum of the Max of the items in a particular store like this:
SUM({[Item].[Item ID].children},[Measures].[ReceivedComments]), Now I expect some performance issues with this measure but we are currently running some tests to see if it's gonna be reliable to work with it on actual data.

couchdb map / reduce multiple keys filtering by date

I have a view setup with a map reduce. Right now this code works great:
function(doc) {
if (doc.type == 'test'){
if(doc.trash != 1){
for (var id in doc.items) {
emit([id,doc.items[id].name], 1);
}
}
}
}
function(keys,prices){
return (keys, sum(prices));
}
I get a return and when using the group parameter, it condenses everything just fine.
My issue/question, I want to add a third key.... DATE, so I may only reduce records from certain dates. So for example:
function(doc) {
if (doc.type == 'test'){
if(doc.trash != 1){
for (var id in doc.items) {
emit([date,id,doc.items[id].name], 1);
}
}
}
}
My issue is that since date is at the beginning of the array, the reduce groups by date, id etc. I know I use group_level and say just take the first key from the array or the first 2 keys, but that doesn't help either because afaik, group_level goes from left to right in the array. I could put the date on the end of the emit array, but that doesn't help either because I need to have values at the beginning of my startkey and endkey to search on.
Here is an example of the output of data:
{"key":["2012-03-13","356752b8a5f6871f3","Apple"],"value":1},
{"key":["2012-03-20","123752b8a76986857","Pear"],"value":1},
{"key":["2012-04-12","3013531de05871194","Grapefruit"],"value":1},
{"key":["2012-04-12","356752b8a5f6871f3","Apple"],"value":1},
I want APPLE to be added up in one row, here it's adding up apples by date first. I was able to successfully just add up all the apples if I remove DATE as the first key in the array, but then I can't search by date range.
Any ideas on how to accomplish this?
If I correctly understand what you want to do, then you'd want to put the date as the first element of your array, and use group_level as well as start_key and end_key.
Eg. startkey=[1, "someid"] endkey=[1,"someid",{}] group_level=2
Will get you all items from date 1 (obviously choose your own format here), with id "someid" and any name. It seems funny that you emit id's before names, and without having more information about what you're actually trying to accomplish, it's hard to advise your general data model. If ID is a "type" id meaning that many items share the same ID then this makes sense. If ID is a unique per item ID, then it does not. In that case, you'd want to emit "name" before ID...
Edit 1
As per your comment, to do a range of dates you do this:
startkey=[1] endkey=[5,{}] group_level=2
You will get everything from date 1 to date 5 grouped by id ie. apples, oranges etc. I use this exact technique in a very large scale production application. I actually formatted the dates as an easily human readable integers of the format yyyymmdd, so 20140624 would sort to the top. If I want everything from the start of the month till now grouped by my group ids, I call
startkey=[20140601] endkey=[20140624,{}] group_level=2
It works perfectly and as far as I can tell that's what you're looking to do. I also have a third key layer "detail" which allows me to provide a deeper level of grouping for items that need it. I can then call
startkey=[20140601, "someid"] endkey=[20140624, "someid",{}] group_level=3
To drill to the detail level for a particular id, or just use the previous query with group_level=3 if I want the details for every id. I'm certain you can make this work - I've solved this exact problem in a production application using the techniques described.
Edit 2
If you want to group all apples regardless of date, then you'll need to let apples be the first element in the key. You can then get all apples over all time as a single row in the view result using group_level=1, and Apples over a date range using group_level=2. The difference here is that you'll only be able to do the group_level=2 query on a single item type at a time. If you want the best of both worlds, you unfortunately just need to make 2 views. That's just how key ordering works... If you need fast response times for both types of queries, all item types over a date range, and all of a particular item not grouped by date, I believe 2 views is the only way to achieve that.
Note
Another thing to note is about your reduce function. Wherever possible it is highly recommended that you use the built in reduce functions. They're implemented in erlang and are highly optimized compared to custom javascript reduce functions.
In your case, just replace your reduce function with this
_sum
Easy hey?
If you post more info about your application, data model etc. then I'd be happy to help out more with your database design.

Resources