Query tensor in vespa.ai

Imitating: https://blog.vespa.ai/billion-scale-knn/
Command line:
curl -s -d '{"yql":"select * from user where {\"targetHits\":10}nearestNeighbor(approximate, q_binary_code);","ranking.features.query(q_binary_code)":[1,2,3,4,5,6,7,8,9,10],"hits":10}' -H "Content-Type: application/json" -X POST http://localhost:8080/search/ | jq .
Error message:
{
"root": {
"id": "toplevel",
"relevance": 1,
"fields": {
"totalCount": 0
},
"errors": [
{
"code": 4,
"summary": "Invalid query parameter",
"source": "content",
"message": "Expected a tensor value of 'query(q_binary_code)' but has [1,2,3,4,5,6,7,8,9,10]"
}
]
}
}
Question: How do I pass q_binary_code?

With recent Vespa versions, you can define the query tensor's type in the schema. It must be declared as an input of the rank profile:
schema code {
document code {
field id type int {..}
field binary_code type tensor<int8>(b[16]) {..}
}
rank-profile coarse-ranking {
inputs {
query(q_binary_code) tensor<int8>(b[16])
}
num-threads-per-search:12
first-phase { expression { closeness(field,binary_code) } }
}
}
You must also select that rank profile in the query request:
curl -s -d '{"yql":"select * from user where {\"targetHits\":10}nearestNeighbor(binary_code, q_binary_code);","ranking.features.query(q_binary_code)":[1,2,3,4,5,6,7,8,9,10],"hits":10, "ranking": "coarse-ranking"}' -H "Content-Type: application/json" -X POST http://localhost:8080/search/ | jq .
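Besides the dense short form [1,2,...] used above, Vespa also accepts the verbose tensor literal form, where each cell address is spelled out. The sketch below builds the same request body that way and only validates it locally before sending; the field, rank-profile, and endpoint names are the ones from the question and answer, and a running Vespa instance on localhost:8080 is assumed.

```shell
# Build the request body using the verbose tensor literal form
# ({cell-address}:value) instead of the dense short form [1,2,...].
body='{
  "yql": "select * from user where {\"targetHits\":10}nearestNeighbor(binary_code, q_binary_code);",
  "ranking": "coarse-ranking",
  "ranking.features.query(q_binary_code)": "{ {b:0}:1, {b:1}:2, {b:2}:3, {b:3}:4, {b:4}:5, {b:5}:6, {b:6}:7, {b:7}:8, {b:8}:9, {b:9}:10 }",
  "hits": 10
}'
# Sanity-check that the body is valid JSON before sending it
echo "$body" | python3 -m json.tool > /dev/null && echo "body OK"
# Then post it as before:
# curl -s -d "$body" -H "Content-Type: application/json" -X POST http://localhost:8080/search/ | jq .
```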

Related

How to get PubSub message data in an API?

I want to get this kind of JSON message data from Pub/Sub:
{
"message": {
"data": {
"from": "no-reply@example.com",
"to": "user@example.com",
"subject": "test",
"body": "test"
}
}
}
And parse its data for use by another service.
private parseMessage(message: Message) {
try {
const decoded = Buffer.from(message.data.toString(), 'base64').toString().trim();
return JSON.parse(decoded);
} catch (err) {
throw new BadRequestException(
'Parse Error: ' + message,
);
}
}
But when I run the API, I get this error:
SyntaxError: Unexpected token � in JSON at position 0
at JSON.parse (<anonymous>)
at EventController.parseMessage (../myapp/src/api/posts/posts.controller.ts:44:18)
response: {
statusCode: 400,
message: 'Parse Error: [object Object]',
error: 'Bad Request'
},
status: 400
It seems this post isn't right:
curl -X 'POST' \
'http://localhost:3000/posts' \
-H 'Content-Type: application/json' \
-d '{
"message": {
"data": {
"from": "no-reply@example.com",
"to": "user@example.com",
"subject": "test",
"body": "test"
}
}
}'
So how do I make fake Pub/Sub message data?
I think you need to encode your data into Base64.
Base64 encoding exists so that binary data can be stored and transferred over media designed to handle ASCII text; it ensures the data remains intact during transport.
You can also refer to this GCP public documentation.
E.g., from the doc:
# 'world' base64-encoded is 'd29ybGQ='
curl localhost:8080 \
-X POST \
-H "Content-Type: application/json" \
-d '{
"context": {
"eventId":"1144231683168617",
"timestamp":"2020-05-06T07:33:34.556Z",
"eventType":"google.pubsub.topic.publish",
"resource":{
"service":"pubsub.googleapis.com",
"name":"projects/sample-project/topics/gcf-test",
"type":"type.googleapis.com/google.pubsub.v1.PubsubMessage"
}
},
"data": {
"@type": "type.googleapis.com/google.pubsub.v1.PubsubMessage",
"attributes": {
"attr1":"attr1-value"
},
"data": "d29ybGQ="
}
}'
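Applying this to the payload in the question: the JSON just needs to be base64-encoded and placed in message.data as a string. A sketch (the /posts endpoint and port 3000 are taken from the question):

```shell
# Base64-encode the question's JSON payload; a Pub/Sub push delivery
# carries it in message.data as a base64 string.
payload='{"from":"no-reply@example.com","to":"user@example.com","subject":"test","body":"test"}'
encoded=$(printf '%s' "$payload" | base64 | tr -d '\n')
echo "$encoded"
# Post the fake message (endpoint from the question):
# curl -X POST http://localhost:3000/posts \
#   -H 'Content-Type: application/json' \
#   -d "{\"message\":{\"data\":\"$encoded\"}}"
```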

curl post array data

My JSON block:
{
"type": "abcd",
"data": {
"id": [
"efgh"
],
"model": "ijkl"
}
}
I tried updating 'type' using the following curl post, but it doesn't like it.
curl -X PUT -d '{"id":["ABCD"]}'
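For a PUT like this to be accepted, the request usually also needs the resource URL and a Content-Type header, and many APIs expect the complete updated object rather than just the changed field. A sketch, where the endpoint URL is hypothetical (the question doesn't show it):

```shell
# Full updated object, with the new value in the "id" array
body='{"type":"abcd","data":{"id":["ABCD"],"model":"ijkl"}}'
# Sanity-check the body locally before sending
echo "$body" | python3 -m json.tool > /dev/null && echo "body OK"
# Hypothetical endpoint; replace with the real resource URL:
# curl -X PUT 'http://localhost:8080/api/items/123' \
#   -H 'Content-Type: application/json' \
#   -d "$body"
```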

Exact dismax search in multivalue field with spaces

I'm trying to add tags to user-provided text in order to automagically classify the article.
It works pretty well except for words with spaces.
For instance, I want to add the "clothes" tag when the user types the following words in that order: "tee shirt" or "tee shirts".
The sentence "my tee shirt is blue" should bring a result, since "tee shirt" is written in that order, but neither "tee my shirt" nor "my shirt" should return a result.
I have a dedicated "tags" core to do that.
I create an empty core with
/opt/solr/bin/solr create -c "tags"
and update the core schema using curl
curl -X POST -H 'Content-type:application/json' --data-binary '{ "add-field-type" : { "name":"myShingleTextField", "class":"solr.TextField", "positionIncrementGap":"100", "analyzer" : { "tokenizer":{ "class":"solr.StandardTokenizerFactory" }, "filters":[ { "class":"solr.LowerCaseFilterFactory" }, { "class":"solr.ShingleFilterFactory", "maxShingleSize":"3", "outputUnigrams":"true" } ]}} }' http://localhost:8983/solr/tags/schema
curl -X POST -H 'Content-type:application/json' --data-binary '{ "add-field": { "name":"keywords", "type":"myShingleTextField", "multiValued":true, "indexed":true, "stored":true, "required":true, "docValues":false } }' http://localhost:8983/solr/tags/schema
curl -X POST -H 'Content-type:application/json' --data-binary '{ "add-field": { "name":"results", "type":"string", "multiValued":true, "indexed":true, "stored":true, "required":true, "docValues":true } }' http://localhost:8983/solr/tags/schema
I then /update it with the following (simplified) document:
{ "add": { "doc": { "keywords": ["tee shirt", "tee shirts"], "results": ["clothes"] } }, "commit": { } }
I finally do my query:
/select?defType=dismax&q=tee%20my%20shirt&qf=keywords
It returns a document, which I don't want ("my" sits between "tee" and "shirt").
Maybe it's a tokenizer issue or maybe dismax query is not what I need.
I tried escaping quotes and spaces, setting the mm parameter to 2 (which kind of works but prevents single words from matching), and other tweaks that didn't work.
Any idea?
Thanks to @MatsLindth's hints, I came up with a solution.
First, I needed to replace the spaces in multi-word tags with '_' and use a KeywordTokenizerFactory tokenizer to index them verbatim.
On the query side, I had to set the "tokenSeparator" parameter to '_'.
So my custom field type definition command is now:
curl -X POST -H 'Content-type:application/json' --data-binary '{ "add-field-type" : { "name":"myShingleTextField", "class":"solr.TextField", "positionIncrementGap":"100", "indexAnalyzer": { "tokenizer":{ "class":"solr.KeywordTokenizerFactory" } }, "queryAnalyzer" : { "tokenizer":{ "class":"solr.StandardTokenizerFactory" }, "filters":[ { "class":"solr.LowerCaseFilterFactory" }, { "class":"solr.ShingleFilterFactory", "maxShingleSize":"3", "outputUnigrams":"true", "tokenSeparator":"_" } ]}} }' http://localhost:8983/solr/tags/schema
My update command is now:
{ "add": { "doc": { "keywords": ["tee_shirt", "tee_shirts"], "results": ["clothes"] } }, "commit": { } }
So when I query:
/select?defType=dismax&q=tee%20my%20shirt&qf=keywords
we have
"tee my shirt"
"tee" "my" "shirt" (StandardTokenizerFactory + LowerCaseFilterFactory)
"tee" "tee_my" "tee_my_shirt" "my" "my_shirt" "shirt" (ShingleFilterFactory)
Nothing matches, as expected, but querying "my tee shirt" produces the shingle "tee_shirt", which obviously matches the indexed "tee_shirt", yay!
Thanks again to @MatsLindth and the Solr analysis page!
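The analysis walk-through above can be simulated outside Solr. This toy sketch (not Solr's actual analyzer chain) lowercases, splits on whitespace, and emits shingles up to size 3 joined with '_', reproducing the token list for "tee my shirt":

```shell
# Toy re-implementation of the query-side chain:
# lowercase + whitespace tokenize + shingles (max size 3, '_' separator).
query="tee my shirt"
shingles=$(printf '%s\n' "$query" | tr '[:upper:]' '[:lower:]' | awk '{
  for (i = 1; i <= NF; i++)
    for (j = i; j <= NF && j < i + 3; j++) {
      s = $i
      for (k = i + 1; k <= j; k++) s = s "_" $k
      print s
    }
}')
echo "$shingles"
# tee
# tee_my
# tee_my_shirt
# my
# my_shirt
# shirt
# "tee_shirt" is absent, so the indexed "tee_shirt" cannot match.
```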

equiv: "Could not add an item of type WORD_ALTERNATIVES"

Using equiv() on an empty table throws a strange error in vespa.ai 7.99.22:
Could not add an item of type WORD_ALTERNATIVES: Equiv can only have word/int/phrase as children
Definition:
search post {
document post {
field description type string {
indexing: summary | index
stemming: multiple
}
}
fieldset text {
fields: description
}
}
Query (no rows in table post):
curl -s -H "Content-Type: application/json" \
--data '{"yql" : "select * from post where text contains equiv(\"Q123\",\"Q456\");"}' \
http://localhost:8080/search/ | jq .
Result:
{
"root": {
"id": "toplevel",
"relevance": 1,
"fields": {
"totalCount": 0
},
"errors": [
{
"code": 4,
"summary": "Invalid query parameter",
"source": "content",
"message": "Could not add an item of type WORD_ALTERNATIVES: Equiv can only have word/int/phrase as children"
}
]
}
}
What is the issue?
Using stemming: multiple leads to a WordAlternativesItem, which is not a permitted child of EquivItem, so this combination is not supported.
However, we believe this is unnecessarily restrictive. I'll lift this restriction now; please try again in the next version, which should be out on Monday (2019-09-16) if the winds are favourable.
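Until that fix ships, one possible workaround (an untested sketch, not an official recommendation) is to drop multiple stemming on the field, so that query terms stay plain words, which equiv() does accept:

```
search post {
    document post {
        field description type string {
            indexing: summary | index
            # 'multiple' expands each term into WORD_ALTERNATIVES;
            # a single-variant setting keeps plain word items
            stemming: shortest
        }
    }
    fieldset text {
        fields: description
    }
}
```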

Migrating existing data in Cloudant

Say I have a data structure like this in Cloudant, where this is one record:
{
"UserId": "0014807347",
"Conq": {
"reqs": "Testing",
"tag": "ARRANGEMENT"
},
"Outcome": {
"tag": "ARRANGEMENT",
"rating": 0
},
"id": "cdc11dc55a0006bb544d235e7dc1540a"
}
How could I transform each record of a particular table to add new fields?
Do a PUT with the id, including the current revision in the updated JSON body:
curl https://$USERNAME:$PASSWORD@$USERNAME.cloudant.com/$DATABASE/cdc11dc55a0006bb544d235e7dc1540a \
-X PUT \
-H "Content-Type: application/json" \
-d "$JSON"
where $JSON is the updated document:
{
"_id": "cdc11dc55a0006bb544d235e7dc1540a",
"_rev": "1-THE_CURRENT_REV_ID_HERE",
"UserId": "0014807347",
"Conq": {
"reqs": "Testing",
"tag": "ARRANGEMENT"
},
"Outcome": {
"tag": "ARRANGEMENT",
"rating": 0
},
"my_new_data_field": "My New Content Goes Here"
}
You should get a response of the form:
{
"ok":true,
"id":"cdc11dc55a0006bb544d235e7dc1540a",
"rev":"2-9176459034"
}
The current revision (indicated by 1-THE_CURRENT_REV_ID_HERE above) should be the revision you got when the document was last written.
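The read-modify-write cycle can be sketched end to end. Here the fetched document is stood in for by a local string, and the GET and final PUT are shown commented out with the answer's $USERNAME/$PASSWORD/$DATABASE placeholders; to migrate a whole database you could page through _all_docs?include_docs=true and repeat this per document.

```shell
# 1) Fetch the current document (commented out; needs real credentials):
# doc=$(curl -s "https://$USERNAME:$PASSWORD@$USERNAME.cloudant.com/$DATABASE/cdc11dc55a0006bb544d235e7dc1540a")
# Local stand-in for the fetched document:
doc='{"_id":"cdc11dc55a0006bb544d235e7dc1540a","_rev":"1-THE_CURRENT_REV_ID_HERE","UserId":"0014807347"}'
# 2) Add the new field, keeping _id and _rev intact:
updated=$(printf '%s' "$doc" | python3 -c 'import json, sys
d = json.load(sys.stdin)
d["my_new_data_field"] = "My New Content Goes Here"
print(json.dumps(d))')
echo "$updated"
# 3) Write it back (commented out):
# curl -X PUT -H "Content-Type: application/json" -d "$updated" \
#   "https://$USERNAME:$PASSWORD@$USERNAME.cloudant.com/$DATABASE/cdc11dc55a0006bb544d235e7dc1540a"
```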
