sentiment140 dataset doesn't contain label 2 (i.e. neutral sentences) when loading it from HuggingFace

I want to work with the sentiment140 dataset for a sentiment analysis task, as I saw that it normally contains the following labels:
0 and 4 for negative and positive sentences
2 for neutral sentences
which I found when looking at the dataset on their website:
https://huggingface.co/datasets/sentiment140#data-fields
But after loading it in my notebook, it tells me that it contains just two labels:
0 for negative
4 for positive
So how do I get the full dataset with all three labels?

You are correct that the HuggingFace sentiment140 dataset only contains 2 labels in the training set (0 and 4); however, the test set contains all 3 labels (0, 2 and 4).
You could open a discussion here to ask the authors if this is normal.
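One way to confirm this yourself is to count the label values in each split. The sketch below assumes the label column is named `sentiment` (as listed on the dataset card) and uses made-up toy rows so that it runs offline. The commented lines show how it might be applied to the real dataset with the `datasets` library; `label_counts` is a hypothetical helper name.

```python
from collections import Counter

def label_counts(rows, label_key="sentiment"):
    """Count how often each label value occurs in an iterable of dict-like rows."""
    return Counter(row[label_key] for row in rows)

# Toy rows standing in for the real splits (hypothetical data).
toy_train = [
    {"text": "bad day", "sentiment": 0},
    {"text": "great day", "sentiment": 4},
]
toy_test = [
    {"text": "bad day", "sentiment": 0},
    {"text": "it is a day", "sentiment": 2},
    {"text": "great day", "sentiment": 4},
]

print(sorted(label_counts(toy_train)))  # label values present in the toy train split
print(sorted(label_counts(toy_test)))   # label values present in the toy test split

# With the real dataset (needs the `datasets` package and network access):
# from datasets import load_dataset
# ds = load_dataset("sentiment140")
# for split_name, split in ds.items():
#     print(split_name, label_counts(split))
```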

Related

SageMaker ClientError: Detected non integer labels in the dataset

I have created a SageMaker training job to train on a toy, tabular, multiclass(3) dataset which has failed with the following error:
ClientError: Detected non integer labels in the dataset. For classification tasks, the labels should be integers between 0 to (num_classes-1), exit code: 2
It sounds like they're saying that for the classes (labels) they want to see values between 0 and 2 in this case, as I have 3 classes.
I have set num_classes to 3 and have validated that I only have 3 unique values in the rightmost column of my dataset: 0, 1, and 2
I've set feature_dim to 3. I've removed the headers from my dataset. My raw data looks like 5,000 lines of this:
csv snapshot
Can anyone guess as to what might be causing this error?
I wanted to answer this because, at the time of this writing, the error message I received returns 0 hits on Google.
It turns out that the issue was that SageMaker expects the class labels to appear in the first column, by default. This is different from how datasets are typically structured. So when I got this error message, SageMaker was looking at my first column which had all sorts of float values. I fixed it by moving my labels to the first column.
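If the labels sit in the last column, the reordering can be done before uploading. This is a minimal sketch using only the standard library; the sample data and the helper name `move_label_first` are assumptions for illustration, not part of the original dataset or of any SageMaker API.

```python
import csv
import io

def move_label_first(rows):
    """Yield each row with its last field (the label) moved to the front."""
    for row in rows:
        yield [row[-1]] + row[:-1]

# Hypothetical headerless CSV: three float features, then an integer label.
raw = "0.5,1.2,3.3,0\n0.1,0.7,2.9,2\n"

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(move_label_first(csv.reader(io.StringIO(raw))))
fixed = out.getvalue()
print(fixed)  # label is now the first field on every line
```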

Google Sheets: find the next numeric value from the columns

I have two columns with SKUs, one from one database and the other from a second database. As you can see in the visual example, SKUs with values 1, 2, 4 & 5 are present in both databases.
https://i.stack.imgur.com/KlTCr.png
I start with number 1 and I need a formula that would bring up the next valid number (in this case number 2) that is present in both columns.
I would need a formula that would do the following:
If I lookup 1 it should bring 2
If I lookup 2 it should bring 4
If I lookup 4 it should bring 5
Thank you in advance
try:
={FILTER(B2:B16, COUNTIF(E2:E16, B2:B16)),
{QUERY(FILTER(B2:B16, COUNTIF(E2:E16, B2:B16)), "offset 1", ); ""}}
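The formula builds two side-by-side columns: the values of B that also occur in E, and the same list shifted up by one row, so each common value sits next to its successor. The same idea can be sketched in Python; the column contents below are made up, and `next_common` is a hypothetical helper name.

```python
def next_common(lookup, first, second):
    """Return the value that follows `lookup` among the items of `first`
    that also occur in `second`, or "" if there is no such value."""
    other = set(second)
    # Same role as FILTER(B2:B16, COUNTIF(E2:E16, B2:B16)) in the formula.
    common = [v for v in first if v in other]
    if lookup not in common:
        return ""
    i = common.index(lookup)
    return common[i + 1] if i + 1 < len(common) else ""

# Hypothetical SKU columns; 1, 2, 4 and 5 appear in both.
col_b = [1, 2, 3, 4, 5]
col_e = [1, 2, 4, 5, 6]
print([next_common(v, col_b, col_e) for v in (1, 2, 4, 5)])
```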

Regexextract number just before specific text Google sheets

I need to extract the number that comes right before the text Positions
Example String:
Medical Specialist (Anaestheologist) (4 Positions) at Ministry
Valid Output should be 4
Example String 2 (If text Positions doesn't exist)
Medical Specialist (Anaestheologist) (4) at Ministry
Valid Output
4
I tried:
=REGEXEXTRACT(A24,"\(.*Positions.*\)") but it did not work.
try:
=ARRAYFORMULA(REGEXEXTRACT(A1:A2; "(\d+)(?: Positions)?"))
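The same pattern can be checked outside of Sheets. The sketch below uses Python's `re` module with the two example strings from the question; `first_number` is a hypothetical helper name. Note that the pattern captures the first run of digits in the string, so it relies on no other number appearing earlier in the text.

```python
import re

def first_number(text):
    """Return the first run of digits in `text`, optionally followed by ' Positions'."""
    match = re.search(r"(\d+)(?: Positions)?", text)
    return match.group(1) if match else None

examples = [
    "Medical Specialist (Anaestheologist) (4 Positions) at Ministry",
    "Medical Specialist (Anaestheologist) (4) at Ministry",
]
for s in examples:
    print(first_number(s))
```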

Google sheets using Filter and Sort together

This is my first question here. I hope it's ok.
I'm a bit of a newbie using google sheets but I'm slowly progressing.
I'm trying to build a sheet with all my data in sheet 1.
On sheet 2 I would like to filter all the data from sheet 1 that is marked with the number "1" in column D.
For that purpose, I'm using
=FILTER('Ark1'!A2:C999; 'Ark1'!D2:D999=1)
So far so good. It works.
Then I would like to sort this sheet based on the value in column E.
For that purpose, I'm using
=sort(FILTER('Ark1'!A2:C999; 'Ark1'!D2:D999=1);'Ark1'!E2:E999; SAND)
I get an #I/T (#N/A) error. In Danish it says:
SORT har forskellige intervalstørrelser. Forventede 2 rækker og 1 kolonner, men indeholder 998 rækker og 1 kolonner.
Google translated to:
SORT has different range sizes. Expected 2 rows and 1 columns, but contains 998 rows and 1 columns.
I have a copy of the sheet here which is editable for your help.
https://docs.google.com/spreadsheets/d/1Eh8aBnXH-SbqHyuvvmaCMc9eoNwZvoAtulxeJXB5-bE/edit?usp=sharing
Any help is much appreciated.
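The sort column must have the same number of rows as the filtered range, and the filtered range A:C has no fifth column to sort by, so filter column E with the same condition. For descending order use:
=SORT(FILTER('Ark1'!A2:C999; 'Ark1'!D2:D999=1); FILTER('Ark1'!E2:E999; 'Ark1'!D2:D999=1); 0)
or, for ascending order:
=SORT(FILTER('Ark1'!A2:C999; 'Ark1'!D2:D999=1); FILTER('Ark1'!E2:E999; 'Ark1'!D2:D999=1); 1)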
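The intended result (keep the rows where column D is 1, order them by column E, and show only columns A to C) can be sketched in Python to make the logic explicit. The rows are made-up sample data and `filter_and_sort` is a hypothetical helper name.

```python
def filter_and_sort(rows, descending=True):
    """rows: tuples (A, B, C, D, E).
    Keep rows with D == 1, sort them by E, and return only (A, B, C)."""
    kept = [r for r in rows if r[3] == 1]
    kept.sort(key=lambda r: r[4], reverse=descending)
    return [r[:3] for r in kept]

# Hypothetical sheet contents.
rows = [
    ("a", "x", 10, 1, 3),
    ("b", "y", 20, 0, 9),
    ("c", "z", 30, 1, 7),
]
print(filter_and_sort(rows))                    # sorted by E, largest first
print(filter_and_sort(rows, descending=False))  # sorted by E, smallest first
```

The filter and the sort key are applied to the same set of rows, which is the condition SORT complains about when the sort range still has 998 rows.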

SSRS chart labels

I need some help figuring out how to accurately display the labels in a stacked chart in SSRS; I need a single representation of the upper stack in the chart below.
The chart itself has two states, it can either be based on red or green data, both are in the same data source.
At the moment the chart looks like this (this is based on green data):
As you can clearly see, both the labels inside the chart and the legend are absolutely cluttered. The idea is to have a legend with two items (Late issues and Not finished issues): one that displays any non-finished issues and one that displays any non-finished issues that are past the estimated due date.
For reference: The above chart should have 1 non-outstanding issue and 5 outstanding issues (3 Ongoing and 2 Open, see below for further info about stages).
Inside the chart we want a numerical representation of the above requirements: one number representing the outstanding issues and one showing any non-finished issues.
This is what a red representation looks like:
At this point I'm not sure what could be wrong anymore. As mentioned they both run on the same dataset, but with slightly different values.
The red tracker has a simple True/False value that it draws most of its data from, whereas the green tracker has a numerical representation of three values (5-7). The data they represent is: 5 - Open, 6 - Ongoing, 7 - Closed.
I've attempted to get the green data both when the series is any of the three numbers mentioned above and when it only includes anything that is not closed (5 and 6, but not 7).
This is the code to set the labels for the maroon part of the chart (it's only a workable snippet.):
IIF(Count(IIF(Fields!Outstanding.Value = 1 AND
Fields!TRK_TrackerStatus_LKID2.Value <> 7, 1, Nothing)) = 0, "",
Count(IIF(Fields!Outstanding.Value = 1 AND
Fields!TRK_TrackerStatus_LKID2.Value <> 7, 1, Nothing)))
Basically, it checks whether there are more than 0 items that are outstanding and not finished (status is not 7). If there are more than 0, it sets the label to that count. If there are 0 counted items, the label should be the empty string.
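To make the intended label logic easy to check outside the report, here is a rough Python equivalent of the expression above. The field names are taken from the snippet; the sample rows and the helper name `outstanding_label` are assumptions for illustration.

```python
CLOSED = 7  # status codes from the question: 5 = Open, 6 = Ongoing, 7 = Closed

def outstanding_label(rows):
    """Return the count of outstanding, non-closed rows as a string,
    or an empty string when there are none."""
    count = sum(
        1
        for r in rows
        if r["Outstanding"] == 1 and r["TRK_TrackerStatus_LKID2"] != CLOSED
    )
    return "" if count == 0 else str(count)

# Hypothetical rows: 3 Ongoing and 2 Open outstanding issues, plus 1 non-outstanding issue.
sample = (
    [{"Outstanding": 1, "TRK_TrackerStatus_LKID2": 6}] * 3
    + [{"Outstanding": 1, "TRK_TrackerStatus_LKID2": 5}] * 2
    + [{"Outstanding": 0, "TRK_TrackerStatus_LKID2": 5}]
)
print(outstanding_label(sample))
```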
I think what you are wanting to do is group your Series data based on a status number. You can either do this as a case statement in your dataset query or you can use an expression in your Series Group:
The expression I have used there is as follows:
=switch(Fields!Status.Value = 5, "Group 1", Fields!Status.Value = 6, "Group 2", Fields!Status.Value = 7, "Group 2", TRUE, "Group 3")
This essentially assigns a grouping value to your data based on the values in the field. In this case, a Status of 5 becomes Group 1, a Status of 6 or 7 becomes Group 2, and all other values become Group 3, just to ensure bad data is obvious on the report.
This takes the chart as it would be displayed with the raw data (on the left) and turns it into what I think you want to see (on the right):
You will need to apply the same logic to your chart labels as well. For this reason I would recommend you add a column to your original SQL script that does this grouping for you, so you only have to make changes once.
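The grouping rule in the Switch expression can also be written as a small Python function, which may help when checking the mapping against sample data before changing the SQL or the report. The function name `status_group` is hypothetical; the group names are the ones used in the expression.

```python
def status_group(status):
    """Map a status code to its series group, mirroring the Switch expression."""
    if status == 5:
        return "Group 1"
    if status in (6, 7):
        return "Group 2"
    return "Group 3"  # catch-all so unexpected values stand out on the report

for code in (5, 6, 7, 99):
    print(code, status_group(code))
```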

Resources