Google Sheets REGEXEXTRACT from Instagram - arrays

I'm facing a particular challenge with a regular expression.
I am populating a Google Sheet with Google search results of Instagram post captions that follow this repetitive pattern:
A | B | C | D
--------------------------------------------------------------------------------
1 | 10.7k Likes, 1.7k Comments - #kristiannairn on Instagram... | |
2 | 4219 Likes, 176 Comments - #djiglobal on Instagram... | |
3 | 1.1m Likes, 209k Comments - #kristiannairn on Instagram... | |
I'm trying, unsuccessfully, to find the right REGEXEXTRACT formula to extract the number of Likes, including decimals and the k/m designator when present, to populate Column C. I then need a second REGEXEXTRACT formula that does the same for the number of Comments, to populate Column D.
So far I was able to come up with this formula for Column C to extract the Likes:
=REGEXEXTRACT(B1,"(\.?\d*)\W?(?:Likes)")
However, it does not recognize decimals and does not fetch the k/m designators.
I have the same problem with the Column D comments formula I found:
=REGEXEXTRACT(B1,"(\.?\d*)\W?(?:Comments)")
Same here... it does not recognize decimals and does not fetch the k/m designators.

all you need is:
=ARRAYFORMULA(IFNA({
REGEXEXTRACT(B2:B, "(.*) Like"),
REGEXEXTRACT(B2:B, ", (.*) Comm")}))

For Likes, you may use
(\d+(?:\.\d+)?[km]?)\W*Likes\b
and for Comments,
(\d+(?:\.\d+)?[km]?)\W*Comments\b
See the regex demo #1 and regex demo #2.
Details
(\d+(?:\.\d+)?[km]?) - Group 1: 1+ digits followed by an optional sequence of . and 1+ digits and then an optional k or m
\W* - 0+ non-word chars
Likes - a word Likes
\b - a word boundary.
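To check the two patterns outside of Sheets, here is a minimal Python sketch that applies them to the sample captions from the question. The function name extract_counts is hypothetical, and Python's re module is used only as a stand-in for the regex engine behind REGEXEXTRACT, so treat it as an illustration rather than a guarantee of identical behaviour.

```python
import re

# The two patterns from the answer above, unchanged.
LIKES_RE = re.compile(r"(\d+(?:\.\d+)?[km]?)\W*Likes\b")
COMMENTS_RE = re.compile(r"(\d+(?:\.\d+)?[km]?)\W*Comments\b")


def extract_counts(caption):
    """Return (likes, comments) as strings; None where a pattern finds nothing."""
    likes = LIKES_RE.search(caption)
    comments = COMMENTS_RE.search(caption)
    return (
        likes.group(1) if likes else None,
        comments.group(1) if comments else None,
    )


captions = [
    "10.7k Likes, 1.7k Comments - #kristiannairn on Instagram...",
    "4219 Likes, 176 Comments - #djiglobal on Instagram...",
    "1.1m Likes, 209k Comments - #kristiannairn on Instagram...",
]

for caption in captions:
    print(extract_counts(caption))
```

The captured values stay as text (for example "10.7k"), just as they would in Columns C and D.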

Related

Google Sheets SumIfs with left formula

I want to use the sumifs formula, but the sum interval range has text in it.
Example:
|Criteria|Sum Interval|
|--------|------------|
| A | 1 - Good |
| A | 2 - Regular|
| C | 3 - Bad |
So, I want to check the criteria field and, when met, sum the first character of the Sum Interval. I tried something like this:
= sumifs( arrayformula(left(suminterval, 1)) , criteria, 'A')
In this case, the formula should return 3 (1 + 2)
arrayformula(left(suminterval, 1)) = interval with only first character
This works when used alone, but when I use it as an argument, I receive a message saying that the argument must be a range.
PS: The whole solution has to be in a single formula.
try:
=INDEX(QUERY({A2:A, REGEXEXTRACT(B2:B, "\d+")*1}, "select sum(Col2) where Col1 = 'A'"), 2)
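The logic of that formula (pull the first run of digits out of each text cell, then sum the ones whose criteria match) can be sketched in Python as follows. The function name sum_leading_numbers is hypothetical and the rows are the three sample rows from the question.

```python
import re


def sum_leading_numbers(rows, wanted):
    """Sum the first run of digits in each text cell whose criteria equals `wanted`."""
    total = 0
    for criteria, text in rows:
        match = re.search(r"\d+", text)  # same \d+ pattern as in the formula
        if criteria == wanted and match:
            total += int(match.group())
    return total


rows = [("A", "1 - Good"), ("A", "2 - Regular"), ("C", "3 - Bad")]
print(sum_leading_numbers(rows, "A"))  # prints 3 (1 + 2)
```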

Using cells as output table of while loop in octave

So I'm implementing a while loop in my code that just does some simple calculations. The thing is that I want an output that shows not only the final values but all of them from each step. The best I could do was using cell arrays with the following code:
i=1; p=(a+b)/2;
valores=cell(n, 3);
while (i<=n && f(p)!=0)
  if f(a)*f(p)<0
    a=a; b=p;
  else
    a=p; b=b;
  endif
  i=i+1; p=(a+b)/2;
  valores(i, :)={i-1 p f(p)}; fprintf('%d %d %d \n', valores{i, :});
endwhile
An example output would be:
1 1.25 -1.40998
2 1.125 -0.60908
3 1.0625 -0.266982
4 1.03125 -0.111148
5 1.01562 -0.0370029
But I have two main issues with this method. The first is that I couldn't find a way to get some text as a title in the first line, so I have to explain what each column is in a sentence later. The second is that I don't know how to make all the columns stay at the same distance from each other, instead of each text staying at the same distance from the previous one. I assume this last issue has something to do with the way I used the fprintf line, since I'm not too familiar with it.
In case it helps to understand what I want to get from this algorithm, I'm trying to calculate the root of a function with the bisection method. And sorry if this was too long or unclear, feel free to give me advice, I'm kinda new here :)
An open-source package called Tablicious can take care of cell, row, and column alignment. Using print statements and whitespace gets tedious and leads to unmaintainable code.
Tablicious is a package for GNU Octave that provides relational data structures for Octave. It includes implementations of table arrays, datetime, string, categorical, and some other related stuff. You can think of it as “pandas for Octave”.
Installation
pkg install https://github.com/apjanke/octave-tablicious/releases/download/v0.3.6/tablicious-0.3.6.tar.gz
Example
pkg load tablicious
Forename = {"Tom"; "Dick"; "Harry"};
Age = [21; 63; 38];
Salary = {"$1"; "$2"; "$3"};
tab = table(Forename, Age, Salary);
prettyprint (tab)
Result
-------------------------------
| Forename | Age | Salary |
-------------------------------
| Tom | 21 | $1 |
| Dick | 63 | $2 |
| Harry | 38 | $3 |
-------------------------------
Documentation can be found here.
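Independently of Tablicious, the two things the question asks for (a title row and columns that stay aligned) come down to printing every field with a fixed width. Below is a minimal Python sketch of that idea for a bisection table. The function name bisection_table, the test function x^3 - x - 2 and the interval [1, 2] are assumptions for illustration only, not the asker's data. Octave's fprintf accepts the same kind of width specifiers (for example '%4d %12.6f %12.6f\n'), so the approach should carry over.

```python
def bisection_table(f, a, b, n):
    """Run up to n bisection steps on [a, b]; return header and rows as strings."""
    # Header row: each title is right-aligned to the same width as its column.
    lines = [f"{'step':>4} {'p':>12} {'f(p)':>12}"]
    for step in range(1, n + 1):
        p = (a + b) / 2
        fp = f(p)
        # Fixed widths keep the columns aligned whatever the digits are.
        lines.append(f"{step:>4d} {p:>12.6f} {fp:>12.6f}")
        if fp == 0:
            break
        if f(a) * fp < 0:
            b = p  # root lies in [a, p]
        else:
            a = p  # root lies in [p, b]
    return lines


# Hypothetical example function and interval.
for line in bisection_table(lambda x: x**3 - x - 2, 1.0, 2.0, 5):
    print(line)
```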

Use excel to summarise data from a column by identifier

I have a spreadsheet with a column called MRN (the identifier) and the drugs administered next to them. There are duplicates of the MRN in column A that correspond to different courses of drugs. What I'm hoping to do is to summarise all the drugs administered associated with one MRN in one line, removing all duplicates. It looks something like this.
| | A | B |
| 1 | MRN | Item |
| 2 | 1 | cefoTAXime |
| 3 | 1 | ampicillin |
| 4 | 1 | cefoTAXime |
| 5 | 1 | vancomycin |
| 6 | 1 | cefTRIaxone |
| 7 | 2 | ampicillin |
| 8 | 2 | vancomycin |
| 9 | 2 | vancomycin |
I have 3 different formulas. The first is to produce a list of MRNs that are all unique. The second is to pull all drugs by MRN and list them in one line. The third is to remove duplicates from this list. They are below (in order).
{=IFERROR(INDEX($A$2:$A$2885, MATCH(0,COUNTIF(D$1:$D1, $A$2:$A$2885),0 )),"")}
{=INDEX($A$2:$B$2885,SMALL(IF($A$2:$A$2885=$D2,ROW($A$2:$A$2885)),COLUMN(D:D))-4,2)}
{=IFERROR(INDEX($E$2:$AE$2, MATCH(0,COUNTIF(D$3:$D3, $E$2:$AE$2),0 )),"")}
*I know that I can edit the second one by adding IF(ISERROR ...) to remove NA and print blanks if drug not found, but want to keep the formulas as simple as possible at this time.
My problem is that the second formula isn't pulling all the drugs by MRN. In an ideal world I would be able to combine the second and third formulas into one, but I am not sure how to. Here is a link to a test file that shows my issue and the formulas in action.
https://1drv.ms/x/s!ApoCMYBhswHzhooXnumW2iV7yx-JaA
I appreciate that there may be a better way to do this using python/R, and if that's possible then I'm more than happy to try, but I couldn't make any headway. Thanks for your help and suggestions.
If you could deal with a count of the number of courses per drug per MRN, you can do this with Power Query (aka Get & Transform in Excel 2016).
Starting with the data you provided on your worksheet, the results would look like:
M-Code
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"MRN", Int64.Type}, {"Item", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"MRN"}, {{"Count", each _, type table}}),
    #"Expanded Count" = Table.ExpandTableColumn(#"Grouped Rows", "Count", {"MRN", "Item"}, {"Count.MRN", "Count.Item"}),
    #"Pivoted Column" = Table.Pivot(#"Expanded Count", List.Distinct(#"Expanded Count"[Count.Item]), "Count.Item", "Count.MRN", List.NonNullCount)
in
    #"Pivoted Column"
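Since the question mentions being open to Python, here is a minimal sketch of the same grouping idea using only the standard library: count the courses per drug per MRN, which also gives the de-duplicated drug list for each MRN. The function name courses_per_mrn is hypothetical and the rows are the sample data from the question, not the linked workbook.

```python
from collections import Counter, defaultdict


def courses_per_mrn(rows):
    """Map each MRN to a Counter of drug -> number of courses."""
    counts = defaultdict(Counter)
    for mrn, item in rows:
        counts[mrn][item] += 1
    return counts


rows = [
    (1, "cefoTAXime"), (1, "ampicillin"), (1, "cefoTAXime"),
    (1, "vancomycin"), (1, "cefTRIaxone"),
    (2, "ampicillin"), (2, "vancomycin"), (2, "vancomycin"),
]

counts = courses_per_mrn(rows)
for mrn, drugs in counts.items():
    # Iterating a Counter yields each drug once, in first-seen order.
    print(mrn, ", ".join(drugs))
```

Each MRN ends up on one line with its unique drugs, and counts[mrn][drug] holds the number of courses if that is needed as well.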

Excel Lookup IP addresses in multiple ranges

I am trying to find a formula for column A that will check an IP address in column B and find if it falls into a range (or between) 2 addresses in two other columns C and D.
E.G.
A B C D
+---------+-------------+-------------+------------+
| valid? | address | start | end |
+---------+-------------+-------------+------------+
| yes | 10.1.1.5 | 10.1.1.0 | 10.1.1.31 |
| Yes | 10.1.3.13 | 10.1.2.16 | 10.1.2.31 |
| no | 10.1.2.7 | 10.1.1.128 | 10.1.1.223 |
| no | 10.1.1.62 | 10.1.3.0 | 10.1.3.127 |
| yes | 10.1.1.9 | 10.1.4.0 | 10.1.4.255 |
| no | 10.1.1.50 | … | … |
| yes | 10.1.1.200 | | |
+---------+-------------+-------------+------------+
This is supposed to represent an Excel table with 4 columns, a heading row, and 7 rows of data as an example.
I can do a lateral check with
=IF(AND((B3>C3),(B3 < D3)),"yes","no")
which only checks 1 address against the range next to it.
I need something that will check the 1 IP address against all of the ranges. i.e. rows 1 to 100.
This is checking access list rules against routes to see if I can eliminate redundant rules... but has other uses if I can get it going.
To make it extra special I can not use VBA macros to get it done.
I'm thinking some kind of index match to look it up in an array but not sure how to apply it. I don't know if it can even be done. Good luck.
Ok, so I've been tracking this problem since my initial comment, but have not taken the time to answer because just like Lana B:
I like a good puzzle, but it's not a good use of time if i have to keep guessing
+1 to Lana for her patience and effort on this question.
However, IP addressing is something I deal with regularly, so I decided to tackle this one for my own benefit. Also, no offense, but getting the MIN of the start and the MAX of the end is wrong. This will not account for gaps in the IP white-list. As I mentioned, this required 15 helper columns and my result is simply 1 or 0 corresponding to In or Out respectively. Here is a screenshot (with formulas shown below each column):
The formulas in F2:J2 are:
=NUMBERVALUE(MID(B2,1,FIND(".",B2)-1))
=NUMBERVALUE(MID(B2,FIND(".",B2)+1,FIND(".",B2,FIND(".",B2)+1)-1-FIND(".",B2)))
=NUMBERVALUE(MID(B2,FIND(".",B2,FIND(".",B2)+1)+1,FIND(".",B2,FIND(".",B2,FIND(".",B2)+1)+1)-1-FIND(".",B2,FIND(".",B2)+1)))
=NUMBERVALUE(MID(B2,FIND(".",B2,FIND(".",B2,FIND(".",B2)+1)+1)+1,LEN(B2)))
=F2*256^3+G2*256^2+H2*256+I2
Yes, I used formulas instead of "Text to Columns" to automate the process of adding more information to a "living" worksheet.
The formulas in L2:P2 are the same, but replace B2 with C2.
The formulas in R2:V2 are also the same, but replace B2 with D2.
The formula for X2 is
=SUMPRODUCT(--($P$2:$P$8<=J2)*--($V$2:$V$8>=J2))
I also copied your original "valid" set in column A, which you'll see matches my result.
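The idea behind the helper columns (turn each dotted address into a single integer, then compare it against every start/end pair) can be sketched in Python like this. The function names ip_to_int and in_any_range are hypothetical. The ranges and addresses are the sample rows from the question.

```python
def ip_to_int(address):
    """Convert 'a.b.c.d' to a*256^3 + b*256^2 + c*256 + d, like column J."""
    a, b, c, d = (int(part) for part in address.split("."))
    return a * 256**3 + b * 256**2 + c * 256 + d


def in_any_range(address, ranges):
    """True if the address falls inside at least one (start, end) pair, inclusive."""
    value = ip_to_int(address)
    return any(ip_to_int(start) <= value <= ip_to_int(end) for start, end in ranges)


ranges = [
    ("10.1.1.0", "10.1.1.31"),
    ("10.1.2.16", "10.1.2.31"),
    ("10.1.1.128", "10.1.1.223"),
    ("10.1.3.0", "10.1.3.127"),
    ("10.1.4.0", "10.1.4.255"),
]

for address in ["10.1.1.5", "10.1.3.13", "10.1.2.7", "10.1.1.62",
                "10.1.1.9", "10.1.1.50", "10.1.1.200"]:
    print(address, "yes" if in_any_range(address, ranges) else "no")
```

Unlike the MIN/MAX approach, this checks every range separately, so gaps between ranges are respected.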
You will need helper columns.
Organise your data as outlined in the picture.
Split address, start and end into columns using the dot as the delimiter (ribbon menu Data=>Text To Columns).
Above the start/end parts, calculate MIN FOR START and MAX FOR END for all split text parts (e.g. MIN(K5:K1000)).
FORMULAS:
VALIDITY formula - copy into cell D5, and drag down:
=IF(AND(B6>$I$1,B6<$O$1),"In",
IF(OR(B6<$I$1,B6>$O$1),"Out",
IF(B6=$I$1,
IF(C6<$J$1, "Out",
IF( C6>$J$1, "In",
IF( D6<$K$1, "Out",
IF( D6>$K$1, "In",
IF(E6>=$L$1, "In", "Out"))))),
IF(B6=$O$1,
IF(C6>$P$1, "Out",
IF( C6<$P$1, "In",
IF( D6>$Q$1, "Out",
IF( D6<$Q$1, "In",
IF(E6<=$R$1, "In", "Out") )))) )
)))

TSQL search exact match into a string

I am stumbling on an issue with string parsing. What I'm trying to achieve is to substitute a marker string with a value, but the string match needs to be exact.
Keep in mind that before the comparison I split the entire string into a table (rowID int, segment nvarchar(max)) wherever I find a space, so something like 'The deltaT_s is §deltaT_s' will look like:
rowID | segment
1 | the
2 | deltaT_s
3 | is
4 | §deltaT_s
After this I cycle through each row with my table of "replacements" (idString nvarchar(max), val float); example:
Marker string (#segment): '§deltaT_s'
String to replace (#idString): '§deltaT_s'
The instruction I am using (since "like" is a lost cause as far as I can see):
SELECT STUFF(#segment, PATINDEX('%'+#idString+'[^a-z]%', #segment), LEN(#idString), CAST(#val AS NVARCHAR(MAX)))
with #val being the number to substitute, taken from the "replacements" table.
Now, in my table of "replacements" I have two delta-like markers:
1) §deltaT_s
2) §deltaT
My issue is that when the cycle starts comparing the segments with the markers and §deltaT comes up, it substitutes the first part of the string like this:
'§deltaT_s' -> '10_s'
I don't understand what I am doing wrong with the REGEX. Can anyone give me a hand with this?
I am available if more info is required.
Thank you,
F.
If possible, you should change the marking style by putting a paragraph symbol (§) on both sides of the token, so that one of the examples in your comment becomes
the deltaT_s is §deltaT_s§, see ya!
With that change the sentence will be split as
rowID | segment
--------------------
1 | the
2 | deltaT_s
3 | is
4 | §deltaT_s§,
5 | see
6 | ya!
If the replacement values are stored in a fact table you will have something like
token | value
------------------
§deltaT§ | foo
§deltaT_s§ | 10
or you can fake it by putting the symbol at the end of the token in your query.
Then it's possible to search for the substitution with a LIKE and a LEFT JOIN between the two tables
SELECT COALESCE(REPLACE(segment, t.token, t.value), segment) Replaced
FROM Sentence s
LEFT JOIN Token t ON s.segment LIKE '%' + t.token + '%'
SQLFiddle demo
If you cannot change the fact table, you can fake the change by appending the symbol to the token in the query, in both the LIKE and the REPLACE, so that the trailing § is removed as well
SELECT COALESCE(REPLACE(segment, t.token + '§', t.value), segment) Replaced
FROM Sentence s
LEFT JOIN Token t ON s.segment LIKE '%' + t.token + '§%'
This may not be an option for you, but it helped me once.
If you can use regex in SQL or create CLR functions, look at the last two options at this link: http://www.sqllion.com/2010/12/pattern-matching-regex-in-t-sql/
For you, the best choice will be the last one, using a CLR function.
Then you can do something like this:
Text: the deltaT_s is §delta, see ya!
Regex: (?<=[^a-z])§delta(?![a-z_]) - here (?<=[^a-z]) requires that the preceding character is not a letter in a-z, without including it in the match, and (?![a-z_]) requires that the match is not followed by a letter or an underscore.
Replace with: 10
I also tried the regex \b§delta\b (\b: start or end of a word), but it does not work with §, because § is not a word character and so there is no word boundary between a space and §.
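To show the lookahead idea end to end, here is a minimal Python sketch that replaces every marker in one pass. It is not T-SQL and not the CLR function from the link. The function name replace_markers and the two replacement values are hypothetical. Trying the longest marker first and requiring that no letter, digit or underscore follows the match keeps §deltaT from matching inside §deltaT_s.

```python
import re


def replace_markers(text, replacements):
    """Replace each marker with its value, matching whole markers only."""
    # Longest markers first, so "§deltaT_s" is tried before "§deltaT".
    markers = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        "(?:" + "|".join(re.escape(m) for m in markers) + ")"
        # Negative lookahead: the marker must not continue with a word character.
        + r"(?![A-Za-z0-9_])"
    )
    return pattern.sub(lambda m: str(replacements[m.group(0)]), text)


# Hypothetical replacement values.
replacements = {"§deltaT": 10, "§deltaT_s": 0.5}
print(replace_markers("the deltaT_s is §deltaT_s, see ya!", replacements))
# prints: the deltaT_s is 0.5, see ya!
```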
