Watson Discovery Service Issue - ibm-watson

Right way - it's working
Wrong way - it isn't working the way it should
I'd like your help with an issue. I'm using Watson Discovery Service (WDS): I created a collection and uploaded several pieces of a manual into it. After that, on the Conversation service I also created, I added some descriptions to the intents that Discovery should use. Now, when I try to match these descriptions against the Discovery service, it doesn't recognize them unless I type exactly the same text. Any suggestion on what I can do to fix this?
e.g. I uploaded a metadata txt file with the following fields:
+---------------------+------------+-------------+-----------------------+---------+------+
| Document            | DocumentID | Chapter     | Session               | Title   | Page |
+---------------------+------------+-------------+-----------------------+---------+------+
| Instructions Manual | BR_1       | Maintenance | Long Period of Disuse | Chassis | 237  |
+---------------------+------------+-------------+-----------------------+---------+------+
Now, when I search in Discovery, I need to use exactly the word I put in the intent's description (Chassis). Otherwise Discovery doesn't find anything with a query like the one below:
metadata.Title:chas*|metadata.Chapter:chas*|metadata.Session:chas*
Any idea??

Please check whether the syntax is right by trying it in the Discovery query tooling.
Sometimes the values need quotation marks escaped with a backslash.
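For reference, the same query string can also be exercised outside the Conversation dialog with the Python ibm-watson SDK; a minimal sketch, where the version date, service URL, API key and IDs are placeholders to substitute with your own:

from ibm_watson import DiscoveryV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholder credentials and IDs - replace with your own.
discovery = DiscoveryV1(version='2019-04-30',
                        authenticator=IAMAuthenticator('your-apikey'))
discovery.set_service_url('https://api.us-south.discovery.watson.cloud.ibm.com')

# The wildcard query from the question.
query = 'metadata.Title:chas*|metadata.Chapter:chas*|metadata.Session:chas*'
# When the query is embedded in JSON (e.g. a dialog node), the inner quotes
# need backslash escaping, as mentioned above: metadata.Title:\"Chassis\"
result = discovery.query(environment_id='your-environment-id',
                         collection_id='your-collection-id',
                         query=query).get_result()
print(result['matching_results'])

Running the query this way makes it easier to see whether the wildcard itself matches, independently of how the dialog escapes it.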

Related

Too many languages in solr config

We have a Solr configuration based on Apache Solr 8.5.2.
We use the installation from the TYPO3 extension ext:solr 10.0.3.
In this way we have multiple (39) languages and multiple cores.
As we do not need most of the languages (we definitely need one, maybe two more), I tried to remove them by deleting (moving to another folder) all the configurations I identified as other languages, leaving only these folders and files in the Solr folders:
server/
+-solr/
| +-configsets/
| | +-ext_solr_10_0_0/
| | +-conf/
| | | +-english/
| | | +-_schema_analysis_stopwords_english.json
| | | +-admin-extra.html
| | | :
| | | +-solrconfig.xml
| | +-typo3lib
| | +-solr-typo3-plugin-4.0.0.jar
| +-cores/
| | +-english/
| | +-core.properties
| +-data/
| | +-english/
: : :
I thought that after restarting the server it would only present one language and one core. This was correct.
But on startup it reported all the other languages as missing, like:
core_es: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not load conf for core core_es: Error loading schema resource spanish/schema.xml
Where does solr get this information about all these languages I don't need?
How can I avoid this long list of warnings?
First of all, it does not hurt to have those cores. As long as they are empty and not loaded, they do not take much RAM or CPU.
But if you still want to get rid of them, you need to do it properly. Just moving a core's data directory does not delete it, because the Solr server also needs to adjust its config files. The best way is to use curl, like this:
curl 'http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core_es&deleteInstanceDir=true'
That would remove the core and all its data.
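If there are many cores to drop, the same CoreAdmin UNLOAD call can be scripted; a rough Python sketch, where the core names are just examples, and which assumes the original config folders are still in place so the cores can actually be loaded and unloaded:

import requests

SOLR = 'http://localhost:8983/solr'
# Example core names only - list whichever cores you actually want to remove.
unwanted_cores = ['core_es', 'core_fr', 'core_it']

for core in unwanted_cores:
    resp = requests.get(f'{SOLR}/admin/cores', params={
        'action': 'UNLOAD',
        'core': core,
        'deleteInstanceDir': 'true',  # also deletes the instance directory on disk
    })
    resp.raise_for_status()
    print(core, 'unloaded')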

Behat test doesn't send the same information to the database with a JSON array

In my features, when I execute my Behat test, I send the data like this:
And the following user:
| id | array |
| ID1 | [{"key1":"value1","key2":"value2"}] |
But in my database I receive this information:
| id | array (DC2Type:json_array) |
| ID1 | ["[{\"key1\":\"value1\"","\"key2\":\"value2\"}]"] |
So I can't use this information in my array.
Do you have any idea which expression I should use to get the same information as in the test entries?
For reference, I work with Symfony 3.4.15, API Platform and phpMyAdmin.
Thank you!
Feature files are flexible, but we should avoid adding too much detail.
I would hide any unnecessary info that doesn't bring value to the scenario.
You could keep this information out of the feature file and create a method that sets the user details based on a key/parameter identifier:
/**
 * @Then /^I have an (.*) user$/
 */
public function iHaveAUser($user) {
    // Generate or fetch the user data here, in whatever format you need,
    // so the raw JSON never has to be written out in the feature file table.
    $dataINeed = generateUser($user);
}
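The scenario then only needs a short step like "And I have an admin user", and the JSON details live in the context class instead of travelling through the table, where they get re-split and re-encoded.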

Designing a caching layer in front of a DB with minimal number of queries

I have multiple jobs that work on some key. The jobs run asynchronously and are written to a write-behind cache. Conceptually it looks like this:
+-----+-----------+----------+----------+----------------+
| key | job1      | job2     | job3     | resolution     |
+-----+-----------+----------+----------+----------------+
| 123 | job1_res  | job2_res | job3_res | resolution_val |
+-----+-----------+----------+----------+----------------+
The key point is that I don't know in advance how many jobs are running. Instead, when it's time to write the record, we add our "resolution" (based on the job results we have so far) and write all values to the DB (MongoDB, if that matters).
I also have a load() function that runs on a cache miss. It fetches the record from the database, or creates a new (and empty) one if the record wasn't found.
Now, there's a time window in which the record is neither in the cache nor in the database. During that window, a "slow worker" might write its result, and unluckily the load() function will create a new record.
When evicted from the cache, the record will look like this:
+-----+----------+-------------------------------+
| key | job4     | resolution                    |
+-----+----------+-------------------------------+
| 123 | job4_val | resolution_based_only_on_job4 |
+-----+----------+-------------------------------+
I can think of two ways to mitigate this problem:
1. Configure the write-behind mechanism to wait long enough for all jobs to complete.
2. On a write event, first query the DB for the existing record and merge the results.
Problems with these solutions:
1. Hard to calibrate.
2. Requires an extra query per write operation.
What's the most natural solution to my problem?
Do I have to implement solution #2 in order to guarantee a resolution on all job results?
EDIT:
Theoretically speaking, I think that even implementing solution #2 doesn't guarantee that the resolution will be based on all job results.
EDIT2:
If the write-behind mechanism guarantees order of operations then solution #2 is ok. This can be achieved by limiting the write-behind to one thread.
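For what it's worth, solution #2 boils down to a read-merge-write on each flush; a rough pymongo sketch, where the database, collection and field names are made up for illustration and resolve() is a placeholder for the real resolution logic:

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017')
records = client['jobs_db']['records']   # hypothetical DB/collection names

def resolve(record):
    # Placeholder: combine whatever job results are present in the record.
    return [v for k, v in record.items() if k.startswith('job')]

def flush(key, cached_fields):
    """Write-behind flush for solution #2: merge with whatever is already stored."""
    existing = records.find_one({'_id': key}) or {'_id': key}
    merged = {**existing, **cached_fields}   # cache wins on conflicting job fields
    merged['resolution'] = resolve(merged)   # recompute resolution over the union
    records.replace_one({'_id': key}, merged, upsert=True)

As noted in EDIT2, this is only safe if flushes for the same key are serialized (e.g. a single write-behind thread), since the read-merge-write itself is not atomic.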

Technique to map any number of varying data schemas into a normalised repository

I'm looking for some guidance on the best way to transform data from A to B. I.e., if each client has different data but is essentially recording the same details, what is the best way to transform it into a common schema?
Simple example showing different schemas
Client A | Id | Email | PricePaid | Date |
Client B | Id | Email | Price | DateOfTransaction |
Result schema
| Id | Email | Price | Order_Timestamp |
This simple example shows two clients recording essentially the same things but with different names for the price and the date of the order. Is there a good technique to automate this for any future schema purely through configuration? I.e., maybe something with XML/XSD perhaps?
Many suggestions welcome.
Thanks,
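One common approach is a per-client column-mapping configuration applied by a generic loader; a rough Python sketch, where the mapping dictionary, client names and file name are purely illustrative (in practice the mapping could live in an XML/JSON/YAML config validated against a schema such as XSD):

import csv

# Per-client mapping from source column -> canonical column.
MAPPINGS = {
    'client_a': {'Id': 'Id', 'Email': 'Email', 'PricePaid': 'Price', 'Date': 'Order_Timestamp'},
    'client_b': {'Id': 'Id', 'Email': 'Email', 'Price': 'Price', 'DateOfTransaction': 'Order_Timestamp'},
}

def normalise(client, rows):
    """Rename each row's columns into the canonical schema, dropping unmapped ones."""
    mapping = MAPPINGS[client]
    for row in rows:
        yield {target: row[source] for source, target in mapping.items() if source in row}

# Usage: read a client's CSV export and emit rows in the result schema.
with open('client_a_orders.csv', newline='') as f:   # hypothetical file name
    for record in normalise('client_a', csv.DictReader(f)):
        print(record)

Onboarding a new client then only requires adding a mapping entry (plus any type/format conversions), not new code.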

Best way to apply FIR filter to data stored in a database

I have a PostgreSQL database with a few tables that store several million data points from different sensors. The data is stored in one column of each row, like:
| ID | Data | Comment |
| 1  | 19   | Sunny   |
| 2  | 315  | Sunny   |
| 3  | 127  | Sunny   |
| 4  | 26   | Sunny   |
| 5  | 82   | Rainy   |
I want to apply an FIR filter to the data and store the result in another table so I can work with it, but because of the amount of data I'm not sure of the best way to do it. So far I've computed the coefficients in Octave and worked with some extracts of the data: basically I export the Data column to a CSV, import the CSV in Octave to get it into an array, and filter it. The problem is that this method doesn't let me work with more than a few thousand data points at a time.
Things I've been looking at so far:
PostgreSQL: I've been looking for a way to do it directly in the database, but I haven't found one so far.
Java: Another possibility is a small program that extracts chunks of data at a time, recalculates them using the coefficients, and stores the result back in another table of the database.
C/C++: I've seen some questions and answers on Stack Overflow about how to implement the filter (here, here or here), but they seem to be aimed at real-time data rather than taking advantage of already having all the data.
I think the best way would be to do it directly in PostgreSQL, and that Java or C/C++ would be too slow, but I don't have much experience working with this much data, so I may well be wrong. I just need to know why, and where to point myself.
What's the best way to apply a FIR filter to data stored on a database, and why?
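If doing it inside PostgreSQL turns out to be awkward, the chunked approach described under the Java option works from any client language; a rough Python sketch using psycopg2 and scipy, where the connection details, table/column names and coefficients file are made up for illustration. It carries the filter state across chunks so the result matches filtering the whole series in one pass:

import numpy as np
import psycopg2
from scipy.signal import lfilter

# Hypothetical connection details, tables and file name.
conn = psycopg2.connect(dbname='sensors', user='postgres')
b = np.loadtxt('fir_coefficients.csv', delimiter=',')  # coefficients exported from Octave
zi = np.zeros(len(b) - 1)                               # filter state carried across chunks

read = conn.cursor(name='stream')                       # server-side cursor: streams rows
read.execute('SELECT id, data FROM sensor_data ORDER BY id')
write = conn.cursor()

while True:
    rows = read.fetchmany(100_000)                      # process the table in chunks
    if not rows:
        break
    ids = [r[0] for r in rows]
    x = np.array([r[1] for r in rows], dtype=float)
    y, zi = lfilter(b, [1.0], x, zi=zi)                 # FIR filter, preserving state
    write.executemany('INSERT INTO sensor_data_filtered (id, data) VALUES (%s, %s)',
                      list(zip(ids, y.tolist())))

conn.commit()

Because the rows stream through a server-side cursor and the filter state (zi) is reused, memory stays bounded no matter how many millions of rows the table holds.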
