REGEXEXTRACT number just before specific text in Google Sheets - arrays

I need to extract the number that comes right before the text Positions.
Example string:
Medical Specialist (Anaestheologist) (4 Positions) at Ministry
Valid output should be 4
Example string 2 (if the text Positions doesn't exist):
Medical Specialist (Anaestheologist) (4) at Ministry
Valid output:
4
I tried:
=REGEXEXTRACT(A24,"\(.*Positions.*\)") but it did not work.

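Try:
=ARRAYFORMULA(REGEXEXTRACT(A1:A2; "(\d+)(?: Positions)?"))
(The ; argument separator is locale-dependent; use , instead if your spreadsheet locale expects commas.)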
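For readers who want to check the pattern outside Sheets, here is a minimal Python sketch of the same idea. It is only an illustration: the function name extract_count is made up for this example, and Python's re module stands in for the engine Sheets uses. As in the formula, the pattern simply captures the first run of digits in the string, so a string with an earlier number would return that number instead.

```python
import re

# Same pattern as the Sheets formula: capture the first run of digits,
# optionally followed by " Positions".
PATTERN = re.compile(r"(\d+)(?: Positions)?")


def extract_count(text):
    """Return the first number found in `text` as a string, or None."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


samples = [
    "Medical Specialist (Anaestheologist) (4 Positions) at Ministry",
    "Medical Specialist (Anaestheologist) (4) at Ministry",
]
for sample in samples:
    print(extract_count(sample))
```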

Display all the distinct values in one SSRS letter

Hi, I am creating an SSRS report in a letter format. If an owner has the highest sales for 3 products in the same store, I need those products to be listed in the same letter instead of in 3 different letters. I have written an expression for the body of the letter, but that expression takes all the products that have the highest sales and generates 3 different letters. How can I embed the product names within the body of the letter?
Sample of what I need:
Hi Doc,
It is really great that you are making a huge impact on our product sales.
In our recent survey, we found that your store had the highest sales for following products:
Comforters
Cosmetics
Toys
Please continue this xyz.
NA Dept.

sentiment140 dataset doesn't contain label 2 (i.e. neutral sentences) when loading it from HuggingFace

I want to work with the sentiment140 dataset for a sentiment analysis task. As far as I could see, it should contain the following labels:
0 and 4 for neg and pos sentences
2 for neutral sentences
which I found when looking at the dataset on their website:
https://huggingface.co/datasets/sentiment140#data-fields
But after importing it in my notebook, it turns out to contain just two labels:
0 for neg
4 for pos
So how do I get the full dataset with all three labels?
You are correct that the HuggingFace sentiment140 dataset only contains 2 labels in the training set (0 and 4); however, the test set contains all 3 labels (0, 2 and 4).
You could open a discussion here to ask the authors if this is normal.
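One way to check this yourself is to count the label values per split. The sketch below is only illustrative: it assumes each row is a dict-like object whose label sits under a "sentiment" key (an assumption based on the dataset card), and the toy rows are made-up stand-ins for the real train and test splits.

```python
from collections import Counter


def label_counts(rows, label_key="sentiment"):
    """Count how often each label value occurs in an iterable of dict-like rows."""
    return Counter(row[label_key] for row in rows)


# Hypothetical stand-in rows, not real data. With the real dataset, each split
# would be passed to label_counts in place of these lists.
toy_train = [{"sentiment": 0}, {"sentiment": 4}, {"sentiment": 4}]
toy_test = [{"sentiment": 0}, {"sentiment": 2}, {"sentiment": 4}]

print(sorted(label_counts(toy_train)))
print(sorted(label_counts(toy_test)))
```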

Highlight the dominating number in Excel, most repeated for each keyword

Is this possible using Excel formulas? I want to find each keyword and its numbers, then match and color the highest number for that specific keyword, e.g. below.
This is the list, with keywords in column A and numbers in column B:
shoes 9
shoes 5
shoes 3
furniture 2
furniture 4
furniture 5
beauty 6
beauty 8
health 35
health 4
health 2
grocery 3
grocery 2
computers 9
computers 7
laptop 2
laptop 11
laptop 2
laptop 6
pets 9
pets 3
books 5
books 5
Desired result:
shoes 9 Highlight this number
shoes 5
shoes 3
furniture 2
furniture 4
furniture 5 Highlight this number
beauty 6
beauty 8 Highlight this number
health 35 Highlight this number
health 4
health 2
grocery 3 Highlight this number
grocery 2
computers 9 Highlight this number
computers 7
laptop 2
laptop 11 Highlight this number
laptop 2
laptop 6
pets 9 Highlight this number
pets 3
books 5 ignore if its equal
books 5
You can use conditional formatting, choosing "Use a formula..." with a formula such as =B1=MAXIFS($B$1:$B$100,$A$1:$A$100,A1). Be mindful of absolute vs. relative references to ensure that you're tracking the right ranges.
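To make the rule concrete, here is a small Python sketch of the same per-keyword maximum logic. It is not Excel code, and the function name rows_to_highlight is invented for this illustration. Unlike the plain MAXIFS comparison, which would mark both rows of a tie, the sketch adds a check that skips tied maxima, matching the "ignore if its equal" requirement in the question.

```python
from collections import defaultdict


def rows_to_highlight(rows):
    """Return the indices of rows holding the unique maximum for their keyword."""
    values_by_keyword = defaultdict(list)
    for keyword, number in rows:
        values_by_keyword[keyword].append(number)

    highlighted = []
    for index, (keyword, number) in enumerate(rows):
        values = values_by_keyword[keyword]
        top = max(values)
        # Skip keywords whose maximum is tied (the "books 5 / books 5" case).
        if number == top and values.count(top) == 1:
            highlighted.append(index)
    return highlighted


rows = [
    ("shoes", 9), ("shoes", 5), ("shoes", 3),
    ("books", 5), ("books", 5),
    ("laptop", 2), ("laptop", 11), ("laptop", 2), ("laptop", 6),
]
print(rows_to_highlight(rows))
```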
In particular when tagged vba, you should be showing what you have tried. The macros usage guide specifically states "DO NOT USE for VBA / MS-Office languages" and the excel wiki states "Questions tagged with excel should be version-agnostic." However, this is possible with a formula in versions earlier than those with MAXIFS (i.e. not Excel for Office 365, Excel for Office 365 for Mac, Excel 2016, Excel 2016 for Mac, Excel Online, Excel for iPad, Excel for iPhone, Excel for Android tablets, Excel for Android phones or Excel Mobile), if in a more long-winded way:
Assuming you have 11 in B18, add a column (say I), populate I1 with 0 and fill enough of it from I2 downwards with:
=IF(A1<>A2,I1+1,I1)
Once that is copied down, sort your data on column I Smallest to Largest, then by column B Largest to Smallest (sorting on column I first preserves the order of the values in column A).
Then select B2 down to as far as required, clear any existing CF rules from it and go to HOME > Styles - Conditional Formatting, New Rule..., Use a formula to determine which cells to format, and under Format values where this formula is true: enter:
=AND(A1<>A2,B2<>B3)
Format..., select choice of formatting, OK.
The above should not, as specified, highlight the values for books, though if working I suspect @nutsch's current answer might.
Sorry, I forgot to adjust my guess for what was where, once I realised a header row would make things easier.
This does though still have a problem: text that changes from one row to the next while the quantity stays the same will not trigger highlighting, so a more complex formula may be required.
Based on @pnuts's idea, I found a simpler way to do it.
Sort column B from Z to A, then sort column A from A to Z, choosing "Expand the selection" for both.
Next, write a formula to highlight duplicates in column A, excluding the first one, and drag the formula down; it highlights all the correct ones.
Thank you.

Vector Space Model query - set of documents search

I'm trying to write code for VSM search in C. Using a collection of documents, I built a hashtable (inverted index) in which each slot holds a word along with its df and a pointer to a list; each node of that list holds the name of a document in which the word appears at least once, along with the tf (how many times it appears in that document). The user types a question (and also chooses a weighting scheme qqq.ddd and a comparison method, but that doesn't matter for my question) and I have to print the documents that are relevant to it, from the most relevant to the least relevant. The examples I've seen show the steps for only one document. For example, we have a collection of 1,000,000 documents (N = 1,000,000) and we want to compare
1 document: car insurance auto insurance
with the question: best car insurance
So the example creates an array like this:
Term      | Query tf | Document tf
auto      | 0        | 1
best      | 1        | 0
car       | 1        | 1
insurance | 1        | 2
The example also gives the df for each term, so using these clues and the weighting and comparison methods it's easy to compare them by turning them into vectors, finding the 4 coordinates (1 for each word in the array).
So in this example there are 1,000,000 documents, and to see how relevant the document is to the query we use each of the words that appear in the query or in the document once (4 words). So we have to find 4 coordinates and then compare.
In what I'm trying to do there are about 8000 documents, each of them having from 3 to 50 words. So how am I supposed to compare how relevant a query is to each document? If I have
a query: ping pong
document 1: this is ping kong
document 2: i am ping tongue
To compare query and document 1, I will use the words: this is ping kong pong (so 5 coordinates), and to compare query and document 2, I will use the words: i am ping tongue is kong (6 coordinates). Then, since I use the same comparison method, is the one with the highest score the most relevant? Or do I have to use, for both, the words: this is ping kong am tongue kong (7 coordinates)? So my question is: which is the right way to compare all these 8000 documents with the question? I hope I succeeded in making my question easy to understand. Thank you for your time!
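No answer was recorded for this question, so the following is only a hedged sketch of one common approach, written in Python rather than C for brevity. It assumes tf-idf weights with idf = log10(N/df) and cosine similarity, which may differ from the weighting the asker has to support, and doc3 is a hypothetical extra document added so that the scores are not all zero. The point it illustrates: coordinates for terms that appear in neither the query nor a given document are zero on both sides, so they change neither the dot product nor the vector lengths, and every document can be scored against the query over one shared vocabulary by walking only the postings lists of the query terms.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to {doc_name: tf}, similar to the hashtable described above."""
    index = defaultdict(dict)
    for name, text in docs.items():
        for term, tf in Counter(text.split()).items():
            index[term][name] = tf
    return index


def rank(query, docs):
    """Return (doc_name, cosine score) pairs, most relevant first."""
    index = build_index(docs)
    n_docs = len(docs)
    idf = {term: math.log10(n_docs / len(postings)) for term, postings in index.items()}

    # Squared document vector lengths use every term of the document.
    norms = defaultdict(float)
    for term, postings in index.items():
        for name, tf in postings.items():
            norms[name] += (tf * idf[term]) ** 2

    # Dot products only need the query terms: every other coordinate
    # of the query vector is 0.
    scores = defaultdict(float)
    query_tf = Counter(query.split())
    for term, q_tf in query_tf.items():
        for name, d_tf in index.get(term, {}).items():
            scores[name] += (q_tf * idf[term]) * (d_tf * idf[term])

    q_norm = math.sqrt(sum((tf * idf.get(term, 0.0)) ** 2 for term, tf in query_tf.items()))
    ranked = []
    for name in docs:
        denom = q_norm * math.sqrt(norms[name])
        ranked.append((name, scores[name] / denom if denom else 0.0))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


docs = {
    "doc1": "this is ping kong",
    "doc2": "i am ping tongue",
    "doc3": "ping pong is fun",  # hypothetical document, not from the question
}
ranked = rank("ping pong", docs)
for name, score in ranked:
    print(name, round(score, 3))
```

Note that with this idf definition a term occurring in every document (here "ping") gets weight 0, so only "pong" separates the documents in this toy collection.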

Import a text file by converting repeating groups of lines into columns

My data file looks like
1234567 7654321
TEXT ABOUT STUFF
ON MULTIPLE LINES
NOT SURE HOW MANY
1234567 7654321
TEXT ABOUT STUFF
ON MULTIPLE LINES
NOT SURE HOW MANY
The only thing for certain is a new record starts with 2 sets of numbers that are 7 characters long. The numbers are also on a new line and appear as my sample data above.
I am using SQL Server Express on Windows 8.
Ultimately I need the first group of numbers in a column, 2nd group in another column and the remainder of the text in the 3rd column.
This is the realm of ETL. The SQL Server way of doing this is to use SSIS.
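As an alternative to SSIS (a different technique from the one suggested above), the grouping step can be sketched as a small pre-processing script. The Python sketch below is only an illustration under stated assumptions: a record header is any line consisting of exactly two 7-digit numbers separated by whitespace, the text lines of a record are joined with single spaces, and the name parse_records is made up for this example. The resulting three-field tuples could then be loaded into a table with whatever import method is available.

```python
import re

# A record header: two 7-digit numbers on a line of their own.
HEADER = re.compile(r"^(\d{7})\s+(\d{7})$")


def parse_records(lines):
    """Group lines into (first_number, second_number, text) tuples."""
    records = []
    for raw in lines:
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # Start a new record with an empty list of text lines.
            records.append([match.group(1), match.group(2), []])
        elif records and line:
            # Any other non-empty line belongs to the most recent record.
            records[-1][2].append(line)
    return [(first, second, " ".join(text)) for first, second, text in records]


sample = """1234567 7654321
TEXT ABOUT STUFF
ON MULTIPLE LINES
NOT SURE HOW MANY
1234567 7654321
TEXT ABOUT STUFF
ON MULTIPLE LINES
NOT SURE HOW MANY
""".splitlines()

for record in parse_records(sample):
    print(record)
```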
