Data sets for emotion detection in text [closed] - database

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
I'm implementing a system that could detect the human emotion in text. Are there any manually annotated data sets available for supervised learning and testing?
Here are some interesting datasets:
https://dataturks.com/projects/trending

The field of textual emotion detection is still very new and the literature is fragmented across many journals from different fields, so it's really hard to get a good overview of what's out there.
Note that there are several emotion theories in psychology. Hence there are different ways of modeling/representing emotions in computing. Most of the time, "emotion" refers to a phenomenon such as anger, fear, or joy. Other theories state that all emotions can be represented in a multi-dimensional space (so there is an infinite number of them).
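To make the two families of theories concrete, here is a small Python sketch of both representations. The emotion labels and the VAD values are purely illustrative, not taken from any particular data set:

```python
from dataclasses import dataclass

# Categorical view: an emotion is one of a fixed set of basic categories.
BASIC_EMOTIONS = {"anger", "fear", "joy", "sadness", "disgust", "surprise"}

# Dimensional view: an emotion is a point in Valence-Arousal-Dominance
# (VAD) space, so in principle there are infinitely many emotions.
@dataclass
class VAD:
    valence: float    # unpleasant (low) .. pleasant (high)
    arousal: float    # calm .. excited
    dominance: float  # being controlled .. being in control

# A hypothetical annotation for a joyful sentence.
joy_point = VAD(valence=0.9, arousal=0.7, dominance=0.6)
print(joy_point.valence > 0.5)  # a pleasant emotion
```

Data sets like EmoBank use the dimensional scheme, while most of the others below use categorical labels.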
Here are some (publicly available) data sets I know of (updated):
EmoBank. 10k sentences annotated with Valence, Arousal and Dominance values (disclosure: I am one of the authors). https://github.com/JULIELab/EmoBank
The "Emotion Intensity in Tweets" data set from the WASSA 2017 shared task. http://saifmohammad.com/WebPages/EmotionIntensity-SharedTask.html
The Valence and Arousal in Facebook Posts data set by Preotiuc-Pietro and others:
http://wwbp.org/downloads/public_data/dataset-fb-valence-arousal-anon.csv
The Affect data by Cecilia Ovesdotter Alm:
http://people.rc.rit.edu/~coagla/affectdata/index.html
The Emotion in Text data set by CrowdFlower
https://www.crowdflower.com/wp-content/uploads/2016/07/text_emotion.csv
ISEAR:
http://emotion-research.net/toolbox/toolboxdatabase.2006-10-13.2581092615
Test Corpus of SemEval 2007 (Task on Affective Text)
http://web.eecs.umich.edu/~mihalcea/downloads.html
A reannotation of the SemEval Stance data with emotions:
http://www.ims.uni-stuttgart.de/data/ssec
If you want to go deeper into the topic, here are some surveys I recommend (disclosure: I authored the first one).
Buechel, S., & Hahn, U. (2016). Emotion Analysis as a Regression Problem — Dimensional Models and Their Implications on Emotion Representation and Metrical Evaluation. In ECAI 2016: 22nd European Conference on Artificial Intelligence (pp. 1114–1122). The Hague, Netherlands (available: http://ebooks.iospress.nl/volumearticle/44864).
Canales, L., & Martínez-Barco, P. (2014). Emotion Detection from Text: A Survey. In Proceedings of the 5th Information Systems Research Working Days (JISIC 2014), 37 (available: http://www.aclweb.org/anthology/W14-6905).


Custom training Extract PDF into table [closed]

I have a PDF file that includes a table and I want to convert it into table structured data.
My PDF file includes a pretty complex table, which makes most tools insufficient. For example,
I tried the following tools and they didn't extract it well: AWS Textract, Google Document AI, Google Vision, Microsoft Text Recognition.
Actually, Google Document AI managed about 70% accuracy, but that is not good enough.
So I searched for a way to train a custom model so that this table gets extracted properly. I tried Power Apps AI Builder and Google AutoML Entity Extraction, but neither of them helped (by the way, I wasn't sure what AutoML's purpose is: is it only for prediction, or can it also be customized for table extraction?).
I would like to know which tools are good for my use case, and whether there is any (AI) tool I can train on this kind of table so that the text extraction gets better.
Most text extractors should preserve that structure if the source is rendered crisply enough, but layout recovery can be fickle.
Here it correctly picked up the misspelling of "reaar" but failed on "05.05.1983" in the first line.
On an identical second pass the failures are different:
3 29.06.1983 Part of Ground Floor of 05.05.1983 GM315727
2 (part of) Conavon Court 25 years from
1.3.1983
4 31.01.1984 Part of Third Floor Conavon 30.12.1983 GM335793
4 (part of) Court 25 years from
12.8.1983
5 19.04.1984 I?art of Basement Floor of 23.01.1984 GM342693
l (part of), 2 Conavon C:ourt 25 years from
(part of), 3 20.01.1984
(part Of ) , 4
(part of)
NOTE: The Lease also grants a right of way for the purpose only of
loading and unloading and reserves a right of way in case of emergency
only from the boiler house adjacent hereto
6 14.06.1984 Part of Third Floor Conavon 31.10.1983 GM347623
3 (part of) Court 25 years from
31.10.1983
7 14.06.1984 Part of the Third Floor 31.10.1983 GM347623
3 (part: of}, 4 Conavon Court 25 years from
(part of) 31.10.1983
8 01.10.1984 "The Italian Stallion'' 17.08.1984 GM357142
4 (part of) Conavon Court (Basement) 25 years from
20.1.1984
NOTE: The Lease also grants a right of way for the purpose only of
loading and unloading and a right of access through the security door
at the reaar of the building
9 06.07.2016 3rd floor 14-16 Blackfriars 28.06.2016
4 (part of}, 5 Streec 5 years from
(part of) 25/06/2016
That's the beauty of OCR: every run can give a different pass rate per character, so experience says use a best-of-three estimate. Run the OCR three different ways and, comparing character by character, keep the characters that agree.
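A minimal sketch of that best-of-three vote, assuming the three runs have already been aligned to the same length (real OCR output usually needs an edit-distance alignment first, e.g. with difflib, before a per-character vote is meaningful):

```python
from collections import Counter

def majority_vote(runs):
    """Character-wise majority vote across equal-length OCR runs.

    Keeps a character when at least two runs agree on it; positions
    where all runs disagree are flagged with '?' for manual review.
    """
    if len(set(map(len, runs))) != 1:
        raise ValueError("runs must be aligned to the same length")
    result = []
    for chars in zip(*runs):
        ch, count = Counter(chars).most_common(1)[0]
        result.append(ch if count >= 2 else "?")
    return "".join(result)

# Three hypothetical passes over the date the extractor got wrong:
print(majority_vote(["05.05.1983", "O5.05.1983", "05.05.l983"]))
# → 05.05.1983
```

With more runs you can raise the agreement threshold, trading recall for confidence.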

Open-source system/service/database alternative to a search algorithm [closed]

I have a custom algorithm (written in Java) that searches a list of strings (A) for the string that is the longest substring of another string (B). (The longest common substring algorithm isn't suited to this case because the list of strings is big, 100k+ entries.)
EX:
B -> "sadsaf dsfsc adsa 4 sad3 dfa fgs adsafd"
A -> ["fdsdf dsa", "adsa", "4 sad3", "cdsdfds dsa", "cx d45"]
And the result is "4 sad3", since it's a substring of B and is also longer than "adsa", which is also a substring of B.
I'm trying to find an alternative to a search algorithm, using a system/service/database, in order to externalize this algorithm.
What I've tried so far:
MySQL, but it's pretty slow and it requires at least an SSD and a powerful CPU
Elasticsearch using a percolate query, which I haven't benchmarked yet but which seems promising
Redis, but I didn't find a way to replicate the algorithm using its syntax
So any suggestion for a system/service/database that can do this with reasonably decent performance is appreciated, since the more options I have, the better (faster) the solution will be.
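For reference, the multi-pattern matching the question describes can be sketched in-process with a trie over the candidate list, which is the same idea Elasticsearch's percolate query exploits. This is a Python illustration of the algorithm, not a drop-in replacement for an external service; a full Aho-Corasick automaton (with failure links) would reduce it further to a single linear pass over B:

```python
def longest_contained(candidates, text):
    """Return the longest string from `candidates` that occurs in `text`.

    Builds a trie over the candidates once, then scans each start
    position of `text`, so there is no separate substring check per
    candidate (which matters when the candidate list is 100k+ entries).
    """
    END = object()  # sentinel marking a complete candidate in the trie
    trie = {}
    for word in candidates:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word

    best = ""
    for start in range(len(text)):
        node = trie
        for ch in text[start:]:
            if ch not in node:
                break
            node = node[ch]
            if END in node and len(node[END]) > len(best):
                best = node[END]
    return best

A = ["fdsdf dsa", "adsa", "4 sad3", "cdsdfds dsa", "cx d45"]
B = "sadsaf dsfsc adsa 4 sad3 dfa fgs adsafd"
print(longest_contained(A, B))  # → 4 sad3
```

The trie is built once and reused across queries, so only the scan over each new B is paid per lookup.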

Database which has categorized the english words into matching emotion [closed]

Is there a database or API that has categorized English words into matching emotions?
e.g.: http://www.psychpage.com/learning/library/assess/feelings.html
One useful resource is the NRC Word-Emotion Association Lexicon compiled by Saif Mohammad. It lists the sentiments (positive, negative) and emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust) for around 14,000 English words.
I would take a look into the topic of sentiment analysis: http://en.wikipedia.org/wiki/Sentiment_analysis. If you are comfortable doing it in Python, take a look at this demo, which uses NLTK to tell whether a sentence is positive or negative: http://text-processing.com/demo/sentiment/
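To give an idea of how a word-emotion lexicon like NRC's is used in code, here is a sketch that parses the tab-separated word/emotion/flag layout the lexicon file is distributed in (the sample entries below are made up for illustration):

```python
from collections import defaultdict

# Hypothetical entries in the NRC-style layout: word \t emotion \t 0-or-1.
SAMPLE = (
    "abandon\tfear\t1\n"
    "abandon\tjoy\t0\n"
    "abandon\tsadness\t1\n"
    "cheer\tjoy\t1\n"
)

def load_lexicon(text):
    """Map each word to the set of emotions flagged 1 for it."""
    lexicon = defaultdict(set)
    for line in text.splitlines():
        word, emotion, flag = line.split("\t")
        if flag == "1":
            lexicon[word].add(emotion)
    return lexicon

lex = load_lexicon(SAMPLE)
print(sorted(lex["abandon"]))  # → ['fear', 'sadness']
```

Summing these per-word emotion sets over the tokens of a sentence gives a crude but serviceable emotion profile of a text.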

Count a 2D array within a range [closed]

I have a map in which I want to separately count the patterns of different numbers.
Without VB, I want to be able to create a dynamic counter that counts the patterns of numbers.
For example:
I want to count how many times this pattern occurs in the map, even when occurrences overlap:
2 2
2 2
Counting by hand I can see the pattern occurs six times, but I'm struggling to create a simple array formula that can do so.
I've been told of success with an IF function with nested AND functions, so I know it can be done without VB.
Use the formula
=COUNTIFS(A1:E15,2,B1:F15,2)
Notice how the two ranges are adjacent, offset from each other by one column.
You can extend this to find two-by-two regions:
=COUNTIFS(A1:E14,2,B1:F14,2,A2:E15,2,B2:F15,2)
Just be very careful about how the different ranges are offset.
An alternative way to write this, which I suspect will be more efficient for large ranges, is:
=SUMPRODUCT((A1:E14=2)*(B1:F14=2)*(A2:E15=2)*(B2:F15=2))
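The logic those formulas express is an overlapping sliding-window count; a short Python sketch of the same idea (the sample grid here is made up, not the asker's map):

```python
def count_pattern(grid, pattern):
    """Count (possibly overlapping) occurrences of `pattern` in `grid`.

    Slides the pattern over every position where it fits, the same
    thing the offset-range COUNTIFS/SUMPRODUCT formulas do in Excel.
    """
    gh, gw = len(grid), len(grid[0])
    ph, pw = len(pattern), len(pattern[0])
    count = 0
    for r in range(gh - ph + 1):
        for c in range(gw - pw + 1):
            if all(grid[r + i][c + j] == pattern[i][j]
                   for i in range(ph) for j in range(pw)):
                count += 1
    return count

grid = [
    [2, 2, 0],
    [2, 2, 2],
    [0, 2, 2],
]
print(count_pattern(grid, [[2, 2], [2, 2]]))  # → 2
```

Each Excel range in the SUMPRODUCT version corresponds to one (i, j) offset of the pattern, which is why the ranges must all be shifted consistently.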

Why is a database always represented with a cylinder? [closed]

This question came up today and I couldn't find any historical answer as to why a database is always represented as a cylinder. I am hoping someone in the stack world would know why and have a link or something backing it up.
I'm reasonably certain that it predates disk drives and goes back to a considerably older technology: drum memory.
Another possibility (or maybe the choice was based on both) is a still older technology: mercury delay-line memory.
You may have seen the symbol oriented horizontally instead of vertically, but horizontal drums were common as well.
You asked for more pics. I took these at the computer history museum in Mountain View, CA in May 2016.
Description for the above image says:
UNIVAC I mercury memory tank, Remington Rand, US, 1951
For memory, the UNIVAC used seven mercury delay line tanks. Eighteen pairs of crystal transducers in each tank transmitted and received data as waves in mercury held at a constant 149°F
Gift of William Agee X976.89
Description for the above image says:
Williams-Kilburn tube - Manchester Mark I, Manchester University, UK, ca 1950
This was the memory in the Manchester Mark I, the successor to the "Baby." It stored only 128 40-bit words. Each bit was an electric charge that created a spot of light on the face of a "TV tube."
Gift of Manchester University Computer Science Department, X67.82
It's because people view a DB as simple storage, much like a disk. And disk storage has always been represented by a cylinder due to, well, the physical properties of spinning magnetic disks.
I always assumed it represented the round edges of a hard drive platter. The average consumer might not necessarily have known what a physical hard drive component looked like, so it was represented as a cylinder.
