Split in Google Sheets or Excel: based on condition - arrays

I have this Google Spreadsheet:
- The product name column is already in the right format: full words delimited by "+".
- The second column flags whether "Sugar" (or part of the word, e.g. "sug") appears in the product name.
- The third column contains some products reformatted into the right format (full names delimited by "+").
Product name | Sugar_word_in_product_name | Product reformated
Choco + sug,oil | 1 | Choco + Sugar + oil
Tablets + Sofa | 0 |
Television + table | 0 |
sugar,oil,ingred | 1 | Sugar + oil + ingredients
I want the result returned in this format, split into columns Ingredients1, Ingredients2, ...
Product name | Sugar_word_in_product_name | Product reformated | Ingredients1 | Ingredients2 | Ingredients3
Choco + sug,oil | 1 | Choco + Sugar + oil | Choco | Sugar | oil
Tablets + Sofa | 0 | | Tablets | Sofa |
Television + table | 0 | | Television | Table |
sugar,oil,ingred | 1 | Sugar + oil + ingredients | Sugar | Oil | Ingredients
So basically, I want to split from "Product reformated" when column "Sugar_word_in_product_name" is "1",
and to split from "Product name" into columns when it is "0".
This works for one condition: =IF(G2=0,SPLIT(F2,"+")).
Below is the formula I started, but I am not sure how to make it work and return blank cells when there are no values:
=IF(G2=0,SPLIT(F2,"+",IF(G2=1,SPLIT(H2,"+"))))

try:
=ARRAYFORMULA(IFERROR(IF(B2:B=1; SPLIT(C2:C; "+"); SPLIT(A2:A; "+"))))
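The answer above uses placeholder ranges (the 0/1 flag in B, the reformatted names in C, the raw product names in A). Assuming the layout implied by the question's own formulas (product name in F, the flag in G, the reformatted name in H), the same idea would presumably read:
=ARRAYFORMULA(IFERROR(IF(G2:G=1; SPLIT(H2:H; "+"); SPLIT(F2:F; "+"))))
(swap ; for , depending on your locale). The IFERROR wrapper is what should return blank cells where there is nothing to split.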

Related

Google Spreadsheet MAX + Join of other cells

I have a spreadsheet like this, and I can't figure out how to search it dynamically.
I want to find the MAX score value and output the related name
(no worries, in another column I've got the score calculated as clean numbers without letters).
(QUERY doesn't really work here, because once I change a score, the output doesn't update.)
use:
=SORTN(SORT({x2:x, y2:y}, 2, 0), 9^9, 2, 1, 1)
where:
x2:x - column with names
y2:y - column with values
2 - sort by the values column (column 2)
0 - in descending order
9^9 - return all rows
2 - 2nd mode of SORTN (e.g. group by)
1 - on the names column (column 1)
1 - in ascending order
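For example, assuming the names are in A2:A and the scores in B2:B, the call would read =SORTN(SORT({A2:A, B2:B}, 2, 0), 9^9, 2, 1, 1): the inner SORT orders the name/score pairs by score in descending order, and SORTN, in the group-by mode described above, then keeps one row per name, which after the descending sort is the row carrying that name's highest score.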

SQL Server aggregating data that may contain multiple copies

I am working on some software where I need to do large aggregations of data using SQL Server. The software helps people play poker better. The query I am using at the moment looks like this:
Select H, sum(WinHighCard + ChopHighCard + WinPair + ChopPair + Win2Pair + Chop2Pair + Win3OfAKind + Chop3OfAKind + WinStraight + ChopStraight + WinFlush + ChopFlush + WinFullHouse + ChopFullHouse + WinQuads + ChopQuads + WinStraightFlush + ChopStraightFlush) / Count(WinStraightFlush) as ResultTotal
from [FlopLookup].[dbo].[_5c3c3d]
inner join [FlopLookup].[dbo].[Lookup5c3c3d] on [FlopLookup].[dbo].[Lookup5c3c3d].[id] = [FlopLookup].[dbo].[_5c3c3d].[id]
where (H IN (1164, 1165, 1166) ) AND
(V IN (1260, 1311))
Group by H;
This works fine and is the fastest way I have found to do what I am trying to achieve. The problem is that I need to enhance the functionality so that the aggregation can include multiple instances of V. For example, instead of including the data for 1260 and 1311 just once, the query above may need to include 1260 twice and 1311 three times. But obviously just saying
V IN (1260, 1260, 1311, 1311, 1311)
won't work because each unique value is only counted once in an IN clause.
I have come up with a solution to this problem which works but seems rather clunky. I created another lookup table that takes the values between 0 and 1325 and assigns them to a field called V1, and for each V1 there are 100 V2 values; e.g. for V1 = 1260 the V2 values range from 126000 through 126099. Then in the main query I join to this table and do the lookup like this:
Select H, sum(WinHighCard + ChopHighCard + WinPair + ChopPair + Win2Pair + Chop2Pair + Win3OfAKind + Chop3OfAKind + WinStraight + ChopStraight + WinFlush + ChopFlush + WinFullHouse + ChopFullHouse + WinQuads + ChopQuads + WinStraightFlush + ChopStraightFlush) / Count(WinStraightFlush) as ResultTotal
from [FlopLookup].[dbo].[_5c3c3d]
inner join [FlopLookup].[dbo].[Lookup5c3c3d] on [FlopLookup].[dbo].[Lookup5c3c3d].[id] = [FlopLookup].[dbo].[_5c3c3d].[id]
inner join [FlopLookup].[dbo].[VillainJoinTable] on [FlopLookup].[dbo].[VillainJoinTable].[V1] = [FlopLookup].[dbo].[_5c3c3d].[V]
where (H IN (1164, 1165, 1166) ) AND
(V2 IN (126000, 126001, 131100, 131101, 131102) )
Group by H;
So although it works, it is quite slow. It feels inefficient because it adds the data multiple times, when what would probably be more appropriate is a way of doing this using multiplication, i.e. instead of passing in 126000, 126001, 126002, 126003, 126004, 126005, 126006, 126007 I would pass in 1260 in the original query and then multiply it by 8. But I have not been able to work out a way to do this.
Any help would be appreciated. Thanks.
EDIT - Added more information at the request of Livius in the comments
H stands for "Hero" and is in the table _5c3c3d as a smallint representing the two cards the player is holding (e.g. AcKd, Js4h etc.). V stands for "Villain" and is similar to Hero but represents the cards the opponent is holding similarly encoded. The encoding and decoding takes place in the code. These two fields form the clustered index for the _5c3c3d table. The remaining field in this table is Id which is another smallint which is used to join with the table Lookup5c3c3d which contains all the equity information for the hero's hand against the villain's hand for the flop 5c3c3d.
V2 is just a field in a table I created to try to resolve the problem described above. That table, VillainJoinTable, has V1 (which maps directly to V in _5c3c3d via a join) and V2, which can contain up to 100 numbers per V1 (e.g. when V1 is 1260, V2 can hold 126000, 126001, ..., 126099). This is what lets me build an IN clause that effectively looks up the equity information for the same V multiple times.
Here are some screenshots:
Structure of the three tables
Some data from _5c3c3d
Some data from Lookup5c3c3d
Some data from VillainJoinTable
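For what it's worth, the multiplication idea described above could be sketched roughly like this: join the main table to a small list of (V, weight) pairs and weight both the sum and the row count, instead of duplicating rows via V2. This is only a sketch, not a tested answer; the weights (1260 counted twice, 1311 three times) are the illustrative ones from the question, and it assumes the equity columns live in Lookup5c3c3d as described in the edit.
Select H, sum(w.Weight * (WinHighCard + ChopHighCard + WinPair + ChopPair + Win2Pair + Chop2Pair + Win3OfAKind + Chop3OfAKind + WinStraight + ChopStraight + WinFlush + ChopFlush + WinFullHouse + ChopFullHouse + WinQuads + ChopQuads + WinStraightFlush + ChopStraightFlush)) / sum(w.Weight) as ResultTotal
from [FlopLookup].[dbo].[_5c3c3d]
inner join [FlopLookup].[dbo].[Lookup5c3c3d] on [FlopLookup].[dbo].[Lookup5c3c3d].[id] = [FlopLookup].[dbo].[_5c3c3d].[id]
-- hypothetical weight list: each V paired with how many times it should count
inner join (values (1260, 2), (1311, 3)) as w(V, Weight) on w.V = [FlopLookup].[dbo].[_5c3c3d].[V]
where H IN (1164, 1165, 1166)
Group by H;
With every weight set to 1, this should reduce to the original query, since sum(w.Weight) then matches the row count used in Count(WinStraightFlush).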

Sum array of data within date range and other = text

I have a dataset with two tabs: one with the monthly goal (target) and another with sales and order data. I'm trying to summarize sales data from the order tab into the target tab using several parameters with INDEX(MATCH) and SUMIFS.
My Attempt:
=SUMIFS(INDEX(OrderBreakdown!$A$2:$T$8048,,MATCH(C2,OrderBreakdown!$G$2:$G$8048)),OrderBreakdown!$I$2:$I$8048,">="&A2,OrderBreakdown!$I$2:$I$8048,"<="&B2)
OrderBreakdown is the other sheet. Column D in the OrderBreakdown sheet is what I want to sum, if OrderBreakdown Category (Col G) = Col C, and if OrderBreakdown Order Date (Col I) >= Start Date (Col A) and OrderBreakdown Order Date (Col I) <= End Date (Col B).
My answer should be much more in line with Col D, but instead I'm getting figures in the millions ($MM).
Here's a sample of the dataset I'm pulling from:
Ok, I am not sure why your range to sum runs from A through T; that is probably where you went wrong. Also, I did not find the INDEX method necessary. This should work for you:
=SUMIFS(OrderBreakdown!$D$2:$D$8048,OrderBreakdown!$I$2:$I$8048, ">=" & A2,OrderBreakdown!$I$2:$I$8048, "<=" & B2, OrderBreakdown!$G$2:$G$8048, C2)
Here is my sample data. The first sheet, starting on row 2 (A = Start Date, B = End Date, C = Category):
1/1/2011 | 1/30/2011 | Office Supplies
Then the OrderBreakdown tab, starting on column C:
Discount | Sales | Profit | Quantity | Category | sub-category | OrderDate
0.5 | $45.00 | ($26.00) | 3 | Office Supplies | Paper | 1/1/2011
0 | $854.00 | $290.00 | 7 | Furniture | BookCases | 1/2/2011
0 | $854.00 | $290.00 | 7 | Furniture | BookCases | 12/32/2010
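With this sample and the target row above (1/1/2011 through 1/30/2011, Office Supplies), the formula should return $45.00: only the first order row matches both the category and the date range, and the Furniture rows are excluded by the category condition.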

Using multiple aggregate functions - sum and count

I've tried several of the solutions to similar questions on this site but could not find one that worked. Please help!
Other than taking some liberties with the report_names, the data is realistic of what I am trying to accomplish, and it is just a small portion of what I am up against: roughly 97K rows of data with the same kind of repetition of branch, file_count and report_name. The file numbers are unique and insignificant; they are shown for informational purposes only and explain why the amounts are unique (they are tied to the file_name).
I am looking for one row per report_name with the sums of the two amounts.
Here are the current results of my query:
branch | file_count | file_volume | net_profit | report_name | file_number
Northeast | 1 | $200,000.00 | $200,000.00 | bogart.hump.new | 12345
Northeast | 1 | $195,000.00 | $197,837.00 | bogart.hump.new | 23456
Northeast | 1 | $111,500.00 | $113,172.00 | bogart.hump.new | 34567
Northwest | 1 | $66,000.00 | -$1,500.18 | jolie.angela.new | 45678
Northwest | 1 | $159,856.00 | -$2,745.58 | jolie.angela.new | 56789
Northwest | 1 | $140,998.00 | -$2,421.69 | jolie.angela.new | 67890
Southwest | 1 | $74,000.00 | $73,904.00 | Man.bat.net | 78901
Southwest | 1 | $186,245.00 | -$4,231.25 | Man.bat.net | 89012
Southwest | 1 | $72,375.00 | $73,641.00 | Man.bat.net | 90123
Southeast | 1 | $79,575.00 | -$1,821.76 | zep.led.new | 1234A
Southeast | 1 | $268,600.00 | $268,600.00 | zep.led.new | 2345A
Southeast | 1 | $77,103.00 | -$1,751.68 | zep.led.new | 3456A
This is what I am looking for:
branch | file_count | file_volume | net_profit | report_name | file_number
Northeast | 3 | $506,500.00 | $511,009.00 | bogart.hump.new |
Northwest | 3 | $366,854.00 | -$6,667.45 | jolie.angela.new |
Southwest | 3 | $332,620.00 | $143,313.75 | Man.bat.net |
Southeast | 3 | $425,278.00 | $265,026.56 | zep.led.new |
My query:
SELECT
branch,
count(filenumber) AS file_count,
sum(fileAmount) AS file_amount,
sum(netprofit*-1) AS net_profit,
concat(d2.lastname,'.',d2.firstname,'.','new') AS report_name,
FROM user.summary u
inner join user.db1 d1 ON d1.loaname = u.loaname
inner join user.db2 d2 ON d2.cn = u.loaname
WHERE d2.filedate = '2015-09-01'
AND filenumber is not null
GROUP BY branch,concat(d2.lastname,'.',d2.firstname,'.','new')
The only issue I see with your current query is that you have a comma at the end of this line, which would give you a syntax error:
concat(d2.lastname,'.',d2.firstname,'.','new') AS report_name,
If you want the blank file_number field shown in your desired result set, though, you could keep the comma and follow it with a blank field by adding:
concat(d2.lastname,'.',d2.firstname,'.','new') AS report_name,
'' file_number
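Putting that together, the full corrected query would presumably read as follows (a sketch that simply mirrors the query above with the answer's fix applied; nothing else is changed):
SELECT
branch,
count(filenumber) AS file_count,
sum(fileAmount) AS file_amount,
sum(netprofit*-1) AS net_profit,
concat(d2.lastname,'.',d2.firstname,'.','new') AS report_name,
'' AS file_number
FROM user.summary u
inner join user.db1 d1 ON d1.loaname = u.loaname
inner join user.db2 d2 ON d2.cn = u.loaname
WHERE d2.filedate = '2015-09-01'
AND filenumber is not null
GROUP BY branch, concat(d2.lastname,'.',d2.firstname,'.','new')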
I figured it out, but could not have done it without airing it out in this forum. In my actual query I had included the file_name column, so I had both count(file_name) and file_name in the select list, whereas in my example query here I only had count(file_name). When I removed the file_name column from my actual query, it worked. Side note: it is now obvious that I excluded a key component from my posted query. On any future questions I will include the complete query but substitute the actual column names with col1, col2, db1, db2, etc. Thanks very much for responding to my question.

Loops of count command in Stata

Software: Stata
I have two datasets: one of company CEOs (dataset 1) and one of business agreements signed (dataset 2).
Dataset 1 is the following format, sorted by company:
company 1, CEO name, start date, end date, etc.
company 1, CEO name, start date, end date, etc.
...
company 2, CEO name, start date, end date, etc.
Dataset 2 is the following format, sorted by agreement (each with 2-150 parties):
agreement 1, party 1, party 1 accession date, party 2, party 2 accession date.
agreement 2, party 1, party 1 accession date, party 2, party 2 accession date.
I want to write code that, for each individual CEO, counts the number of agreements signed by the CEO's company during his/her tenure as CEO.
So far I have created a CEO-day dataset with expand.
gen duration = enddate - startdate
expand duration -1
sort id startdate
by id: gen n = _n -1
gen day = startdate + n
Ideally I would proceed with code like this:
collapse (count) agreement, by(id)
However, Dataset 2 lists the different parties as different variables. Company 1 is not always "party 1"; sometimes it may be "party 150". Also, each party may have a different accession date. I need something that "scans" Dataset 2 for agreements to which company 1 acceded as one of the parties, with an accession date falling within the period in which CEO 1 was CEO of company 1.
What should I do? Do I need to create a loop?
A loop is not strictly necessary. You can try using reshape and joinby:
clear
set more off
*----- example data -----
// ceo data set
input ///
firm str15(ceo startd endd)
1 "pete" "01/04/1999" "05/12/2010"
1 "bill" "06/12/2010" "12/01/2011"
1 "lisa" "13/01/2011" "15/06/2014"
2 "mary" "01/04/1999" "05/12/2010"
2 "hank" "06/12/2010" "12/01/2011"
2 "mary" "13/01/2011" "15/06/2014"
3 "bob" "01/04/1999" "05/12/2010"
3 "john" "06/12/2010" "12/01/2011"
end
gen double startd2 = date(startd, "DMY")
gen double endd2 = date(endd, "DMY")
format %td startd2 endd2
drop startd endd
tempfile ceo
save "`ceo'"
clear
// agreement data set
input ///
agree party1 str15 p1acc party2 str15 p2acc
1 2 "09/12/2010" 3 "10/01/2011"
2 1 "05/06/1999" 2 "17/01/2011"
3 1 "06/06/1999" 3 "05/04/1999"
4 2 "07/01/2011" . ""
5 2 "08/01/2011" . ""
end
gen double p1accn = date(p1acc, "DMY")
gen double p2accn = date(p2acc, "DMY")
format %td p?accn
drop p?acc
*----- what you want -----
// reshape
gen i = _n
reshape long party p@accn, i(i)
rename (party paccn) (firm date)
order firm agree date
sort firm agree
drop i _j
// joinby
joinby firm using "`ceo'"
// find under which ceo, agreement was signed
gen tag = inrange(date, startd2, endd2)
list, sepby(firm)
// count
keep if tag
collapse (count) agreenum=tag, by(ceo firm)
list
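The final collapse should leave one observation per CEO/firm pair, with agreenum counting the agreements tagged as signed during that CEO's tenure, which, as I read your question, is the number you are after.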
A potential pitfall is joinby creating so many observations that you run out of memory.
See help datetime if you have no experience with dates in Stata.
(Notice how I set up example data for your problem. Providing it helps others help you.)
