Watson Explorer Advanced Edition - ibm-watson

Is it possible to change the way that WEX processes a word in the "Linguistic Analysis Annotator" section of the pipeline? In my case I have a Spanish word that I need treated as a noun, not as an adjective.
"Seguro" (a noun meaning "insurance", or an adjective meaning "safe" or "sure") is tagged as an adjective with a positive sentiment. I also defined "muy lento" ("very slow") as a negative expression in the "Sentiment Analysis" section in WEX, so the final result is a neutral phrase.
Is it possible to change the way that the "Linguistic Analysis Annotator" handles the word "Seguro"?

Related

SQL Server: problem with carriage returns when splitting a string

-- Local variable holding the HTML-formatted description text.
DECLARE @Description2 VARCHAR(MAX);
SELECT @Description2 = '1. Each week there will be a philosophical question to address. For example, you will address questions such as do we have freewill or are we determined?, what does quantum physics tell us about the nature of reality?, and what are the philosophical implications of Darwinian evolution? Utilizing the readings for the week prepare a written essay response. There will be a minimum of ten mini-essays for the term and each must be a minimum of 200 words.
2. Develop an argument on the topic of ontology, focusing specifically on the question "Are we just the brain?" Argue either the materialist position (we are just the brain) or the non-materialist position (we are not just the brain), drawing from the primary writings of the philosophers. Be sure to explain both positions in your essay and then make the case for the position you are supporting. This argumentative essay needs to be at least 750 words in length. We will then conduct an in class debate and you will need to argue your point in a debate setting.
3. Develop an argument in the area of ethics, arguing for or against animal rights. Make sure to utilize primary writings in the construction of your argument. This argumentative essay needs to be at least 750 words in length. Students will present their position to the class in a ten minute oral presentation.
4. Having read the writings of Epictetus and Sartre compare and contrast Stoicism and Existentialism. Write a 750 word essay highlighting the key differences and similarities.
5. Analyze the primary readings of Nietzsche in journal form. Choose 10 separate passages to analyze and include the following: a) a summary of the passage; b) an interpretation or analysis of the argument; c) a comparison and/or contrast to the ideas of another philosopher or philosophy; d) personal insight into the writing by applying the ideas to you or to the world at large (its meaning on a deeper, more personal level). This journal will be at least 1000 words in length. '
-- Strip HTML markup. The innermost REPLACE runs first, so '<br />.<br />' is
-- removed before '<br />'; in the other order it could never match.
SET @Description2 = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@Description2, '<br />.<br />', ''), '<p>', ''), '</p>', ''), '<br />', ''), '&nbsp;', ''), '<div>', ''), '</div>', '');
-- Replace each CR/LF pair with a one-character delimiter that does not occur in the
-- text ('?' appears in the questions themselves); DelimitedSplit8K_LEAD is assumed
-- to take a single-character (CHAR(1)) delimiter.
SET @Description2 = CONCAT('|', REPLACE(@Description2, CHAR(13) + CHAR(10), '|'));
-- Split on the delimiter and discard the empty items.
SELECT LTRIM(s.Item)
FROM dbo.DelimitedSplit8K_LEAD(@Description2, '|') s
WHERE LTRIM(s.Item) <> '';
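The query's approach (strip HTML fragments, split on CR/LF, drop blank items) can be sketched in Python to experiment with the splitting logic outside SQL Server. This is a minimal sketch: the tag list is an assumption based on the query (with `&nbsp;` as the entity being stripped), and `split_description` is a hypothetical helper, not part of any library.

```python
# HTML fragments stripped before splitting (assumed list, based on the query).
# "<br />.<br />" is removed before "<br />"; otherwise it could never match.
TAGS = ["<br />.<br />", "<p>", "</p>", "<br />", "&nbsp;", "<div>", "</div>"]


def split_description(text: str) -> list[str]:
    """Hypothetical helper: strip tags, split on CR/LF, drop blank lines."""
    for tag in TAGS:
        text = text.replace(tag, "")
    # CHAR(13) + CHAR(10) in T-SQL is "\r\n" in Python.
    items = text.split("\r\n")
    # Mirrors SELECT LTRIM(s.Item) ... WHERE LTRIM(s.Item) <> '': LTRIM removes
    # leading spaces only, so lstrip(" ") is used rather than a bare lstrip().
    return [item.lstrip(" ") for item in items if item.lstrip(" ") != ""]


sample = "<p>1. First task.</p>\r\n\r\n  2. Second task.<br />\r\n3. Third task."
for line in split_description(sample):
    print(line)
```

As in the query, a bare LF (or CR) without its partner is not treated as a line break, so text copied from a Unix-style source may need an extra `"\n"` split.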
So what do you actually want to happen? And what are you seeing that is different from what you want?
In SSMS, if you go to Tools > Options > Query Results > SQL Server > Results to Grid, there is a checkbox labelled "Retain CR/LF on copy or save", which determines how varchar data is treated when you copy it from the output grid.
If you check the checkbox, the carriage return/linefeed will be retained. I.e., if you run:
select 'a
b'
then copy the result from the grid and paste it into, say, Notepad, you will get 2 lines of text in Notepad.
If you don't check the checkbox, you will get only a single line in Notepad.
Be aware that if you change the setting of this checkbox, I believe you will need to open a new query window to see the new behaviour.
Note that this only controls the behaviour of SSMS. It has nothing to do with how your data is "really" stored, for example, if you insert it into a table. If you have an application reading data from a table, the way it formats the output is up to the application.

How to tell Watson Conversation not to recognize strings as numbers

I'm facing a strange issue with IBM Watson Conversation when capturing numbers in Spanish:
In Spanish, when you write (or say) "please give me an answer" (por favor, dame una respuesta) or "I want to talk with a professional" (quiero hablar con un profesional), Watson recognizes the words "una" and "un" as a number. Yes, they can be a number (the number 1), but in these phrases they do not have that meaning; they work as articles.
Do you know how to tell Watson not to recognize strings as numbers? I have been thinking about patterns, but the numbers can have different lengths.
According to the official documentation, the @sys-number system entity detects numbers that are written using either numerals or words. In either case, a numeric value is returned.
When you enable the @sys-number system entity, it always tries to detect whether the user typed a number. These are the recognized formats:
21
twenty one (in your case, this also covers un, una, etc.)
3.13
The documentation also includes a table with further examples of how to use this entity.
So Watson will recognize these values (un, una) as the number one, and there is currently no exception list or configuration option to stop it from recognizing a particular word typed by the user, as in your example.
If you want to send the user the literal text that was matched (for example, 'una' or 'un'), add this to your conversation response:
The number is @sys-number.literal
And the bot will return:
The number is un
See more about the @sys-number system entity.
See more about system entities.
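Because the sys-number entity offers no exception list, one possible workaround is to filter the detected entities in your own application code before acting on them. The sketch below assumes each entity arrives as a dict shaped like the message API's JSON output, with `entity` (the name without the `@`) and `location` (start and end character offsets into the input text); it makes no Watson SDK calls, and `ARTICLES` and `real_numbers` are hypothetical names.

```python
# Words that sys-number may report as the number 1 even when they are
# used as Spanish indefinite articles (assumed list; adjust as needed).
ARTICLES = {"un", "una"}


def real_numbers(text: str, entities: list[dict]) -> list[dict]:
    """Drop sys-number matches whose literal text is just an article.

    Assumes each entity has "entity" (e.g. "sys-number") and "location"
    ([start, end] character offsets into text).
    """
    kept = []
    for ent in entities:
        if ent.get("entity") == "sys-number":
            start, end = ent["location"]
            if text[start:end].strip().lower() in ARTICLES:
                continue  # article, not a quantity: skip it
        kept.append(ent)
    return kept


text = "quiero hablar con un profesional"
entities = [{"entity": "sys-number", "value": "1", "location": [18, 20]}]
print(real_numbers(text, entities))  # []
```

The trade-off is that genuine uses such as "un café" (one coffee) are discarded too, so a filter like this fits best in places where an article reading is the expected one.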

Strange SQL Server full-text match

I stumbled upon strange full-text index behavior in SQL Server 2008 R2 (my word-breaker language is German).
I have this text indexed:
[...] Java Editorerstellung in Eclipse eines Modellierungseditors(UML) mit den Eclipse Technologien [...]
I triple-checked: the only occurrence of the string "edi" is in this short snippet, where it appears only as part of Editorerstellung and Modellierungseditors.
But SQL Server still has edi as a single word in its full-text index (occurrence: 1) and therefore returns it in CONTAINSTABLE(...) searches. Why is it recognized as a single word?
Does anybody have an explanation for this behavior? Thanks.
German compound words require special parsing in natural-language word breakers. For example, "Editorerstellung" is parsed and stored as three separate terms: "editor", "erstellung" and "editorerstellung". Extensive research has been done on analyzing German compound words, and while the techniques are improving, the process is not perfect.
It is likely that the behavior you are seeing is due to heuristics used in the word breaker. I cannot reproduce your issue using the above snippet and the SQL Server 2012 word breaker, so either Microsoft's improvements to the German word breaker between SQL Server 2008 R2 and SQL Server 2012 solved the problem, or some text you didn't include is the source of "edi" in the full-text index.
You can use sys.dm_fts_index_keywords_by_document() to see what terms are in the index. Using a binary search pattern, you should be able to narrow it down to the specific text that is generating the "edi" term.
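The binary-search narrowing can be sketched as follows. Indexing a chunk and checking `sys.dm_fts_index_keywords_by_document()` is abstracted as a caller-supplied predicate, `produces_term`, which is hypothetical: in practice it would load the chunk into a test table, repopulate the full-text index and look for "edi" among the keywords. The sketch splits on whole words so no word is cut in half.

```python
from typing import Callable, List


def narrow_down(words: List[str], produces_term: Callable[[str], bool]) -> List[str]:
    """Binary-search a word list for a short run that still yields the term.

    produces_term(chunk) is a hypothetical callback: it should index chunk
    and report whether the unwanted term (e.g. "edi") shows up.
    """
    if not produces_term(" ".join(words)):
        raise ValueError("the full text does not produce the term")
    while len(words) > 1:
        mid = len(words) // 2
        left, right = words[:mid], words[mid:]
        if produces_term(" ".join(left)):
            words = left
        elif produces_term(" ".join(right)):
            words = right
        else:
            break  # the term needs words from both halves; stop here
    return words


# Fake predicate for demonstration only: pretends the word breaker emits
# "edi" whenever the word "Editorerstellung" is present.
snippet = ("Java Editorerstellung in Eclipse eines Modellierungseditors(UML) "
           "mit den Eclipse Technologien").split()
print(narrow_down(snippet, lambda chunk: "Editorerstellung" in chunk.split()))
# prints ['Editorerstellung']
```

Each round costs at most two re-indexing passes, so a text of n words needs roughly 2·log2(n) of them. Rejoining words with single spaces drops the original punctuation and line breaks, which could in principle change what the word breaker emits.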

Regular Expression in Visual Studio Find & Replace - multiple spaces between search terms

I require a regular expression for the Visual Studio Search and Replace functionality, as follows:
Search for the following term: sectorkey in (
There could be multiple spaces between each of the above 3 search terms, or even multiple line breaks/carriage returns.
The search is for SQL statements that have hard-coded SectorKey values inside a SQL IN clause. These need to be replaced with a SQL JOIN, which will be done manually.
The little arrow to the right of the Find What box is your friend and can help you with the vagaries of the MS regex syntax.
Newline is represented by \n, so you can just do sectorkey( |\n)+in( |\n)+\( (You need to escape the open paren in your search expression, since that's used in grouping.)
I believe :Wh+ is what you want. The Visual Studio regex flavor is very strange; you'll tend to get better results if you consult the official reference. Expertise with "mainstream" regexes tends to be more of a handicap than a help when it comes to VS.
You can use \s+ to search for one or more adjacent whitespace characters (including tab, CR, LF etc), so your regex would presumably end up looking something like sectorkey\s+in\s+\(.
Edit...
As Joe points out in his comment, Visual Studio's Find/Replace doesn't seem to support \s (this applies to versions before Visual Studio 2012, which switched to .NET regular expressions), in which case you'll probably need to use something like [\n:b] instead. The regex would then become sectorkey[\n:b]+in[\n:b]+\(.
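To check what the `\s+` version matches, here is the same pattern in Python's `re` syntax, where `\s` covers spaces, tabs, CR and LF. This only illustrates the pattern logic, not Visual Studio's own regex flavor; the sample SQL strings are made up, and `re.IGNORECASE` stands in for a search with "Match case" turned off.

```python
import re

# sectorkey, one or more whitespace chars (spaces, tabs, CR, LF), in,
# more whitespace, then a literal "(" which must be escaped.
PATTERN = re.compile(r"sectorkey\s+in\s+\(", re.IGNORECASE)

samples = [
    "WHERE SectorKey IN (1, 2, 3)",                # matched
    "WHERE sectorkey   in\r\n    (4, 5)",          # matched across CR/LF
    "WHERE sectorkey\n\nIN\t(6)",                  # matched across blank line and tab
    "WHERE sectorkey IN(7)",                       # not matched: no space before "("
    "JOIN Sector s ON s.SectorKey = t.SectorKey",  # not matched
]

for sql in samples:
    print(bool(PATTERN.search(sql)), repr(sql))
```

If the parenthesis may follow IN directly, `\s*` before `\(` allows zero whitespace there.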

Make SQL Server index small numbers

We're using SQL Server 2005 in a project. The users of the system have the ability to search some objects by using 'keywords'. The way we implement this is by creating a full-text catalog for the significant columns in each table that may contain these 'keywords' and then using CONTAINS to search for the keywords the user inputs in the search box in that index.
So, for example, let's say you have the Movie object and you want to let the user search for keywords in the title and plot of the movie; then we'd index both the Title and Plot columns, and then do something like:
SELECT * FROM Movies WHERE CONTAINS(Title, keywords) OR CONTAINS(Plot, keywords)
(It's actually a bit more advanced than that, but nothing terribly complex)
Some users are adding numbers to their search, so for example they want to find 'Terminator 2'. The problem here is that, as far as I know, by default SQL Server won't index short words, thus doing a search like this:
SELECT * FROM Movies WHERE CONTAINS(Title, '"Terminator 2"')
is actually equivalent to doing this:
SELECT * FROM Movies WHERE CONTAINS(Title, '"Terminator"') -- notice the missing '2'
and we are getting a plethora of spurious results.
Is there a way to force SQL Server to index small words? Ideally, I'd index only numbers like 1, 2, 21, etc. I don't know where to define the indexing criteria, or even if it's possible to be as specific as that.
Well, I did that, removed the "noise-words" from the list, and now the behaviour is a bit different, but still not what you'd expect.
A search for "Terminator 2" (I'm just making this up; my employer might not be happy if I disclosed what we are doing, so the terms are a bit different but the principle is the same) returns nothing, even though I know there are objects containing the two words.
Maybe I'm doing something wrong? I removed all numbers 1 ... 9 from my noise configuration for ENG, ENU and NEU (neutral), regenerated the indexes, and tried the search.
These "small words" are considered "noise words" by the full text index. You can customize the list of noise words. This blog post provides more details. You need to repopulate your full text index when you change the noise words file.
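A toy model shows why the noise-word list matters: if the indexer drops every token found in the list, a phrase query for "Terminator 2" behaves like a query for "Terminator". This is a simplified sketch, not SQL Server's actual word breaker; the `NOISE` set and the `index_tokens` and `phrase_matches` helpers are hypothetical.

```python
import re

# Hypothetical noise list: single digits, as in the question's noise files.
NOISE = {str(d) for d in range(10)}


def index_tokens(text: str, noise: set[str]) -> list[str]:
    """Toy word breaker: lower-case word tokens, minus noise words."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in noise]


def phrase_matches(title: str, phrase: str, noise: set[str]) -> bool:
    """True if the phrase's non-noise tokens appear consecutively in the title."""
    want = index_tokens(phrase, noise)
    have = index_tokens(title, noise)
    n = len(want)
    return any(have[i:i + n] == want for i in range(len(have) - n + 1))


titles = ["Terminator", "Terminator 2", "Terminator 3"]
# With "2" treated as noise the phrase collapses to "terminator": all three match.
print([t for t in titles if phrase_matches(t, "Terminator 2", NOISE)])
# With digits removed from the noise list, only the intended title matches.
print([t for t in titles if phrase_matches(t, "Terminator 2", set())])
```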
I knew about the noise words file, but I'm not sure why your "Terminator 2" example is still giving you issues. You might want to try asking this on the MSDN Database Engine forum, where people who specialize in this sort of thing hang out.
You can combine CONTAINS (or CONTAINSTABLE) with simple where conditions:
SELECT * FROM Movies WHERE CONTAINS(Title, '"Terminator 2"') and Title like '%Terminator 2%'
While CONTAINS finds every Terminator title, the LIKE condition in the WHERE clause eliminates 'Terminator 1'.
Of course, the engine is smart enough to start with the CONTAINS rather than the LIKE condition.
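The two-stage filter can be sketched outside the database: a coarse match standing in for CONTAINS when the digit is a noise word, followed by a substring check standing in for `LIKE '%Terminator 2%'`. The sample titles, the `NOISE` set and the helper names are made up for illustration, and the substring check assumes a case-insensitive collation.

```python
NOISE = {str(d) for d in range(10)}  # hypothetical noise list


def coarse_match(title: str, phrase: str) -> bool:
    """Rough stand-in for CONTAINS when digits are noise words: every
    non-noise word of the phrase must appear as a word in the title."""
    words = title.lower().split()
    return all(w in words for w in phrase.lower().split() if w not in NOISE)


def like_contains(title: str, fragment: str) -> bool:
    """Stand-in for Title LIKE '%fragment%' under a case-insensitive collation
    (LIKE wildcards inside fragment are not interpreted)."""
    return fragment.lower() in title.lower()


def search(titles: list[str], phrase: str) -> list[str]:
    # Coarse full-text-style filter first, exact substring check second.
    return [t for t in titles if coarse_match(t, phrase) and like_contains(t, phrase)]


titles = ["Terminator", "Terminator 2", "Terminator 21", "Terminator 3"]
print(search(titles, "Terminator 2"))  # ['Terminator 2', 'Terminator 21']
```

Note that `LIKE '%Terminator 2%'` also accepts a title such as "Terminator 21"; excluding it would need a word-boundary check on the character after the "2".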
