Sorting result set by date attribute in 'vespa.ai' - vespa

How to sort a result set by date (unixtime)?
search post {
    document post {
        field created type long {
            indexing: summary | attribute
        }
        field description type string {
            indexing: summary | index
        }
    }
    rank-profile date inherits default {
        first-phase {
            expression: attribute(created)
        }
    }
}
curl:
curl -s -H "Content-Type: application/json" --data \
'{"yql" : "select * from post where description contains \"computer\";","ranking":"date"}' \
http://localhost:8080/search/ | jq .
The result set is not sorted by 'created', and 'relevance' is always zero:
{
"id": "id:post:post::1",
"relevance": 0,
"source": "content",
"fields": {...}
}

For straightforward sorting on long attributes, it is more efficient to use the sorting/ordering functionality than the more powerful, but also more expensive, ranking framework.
As mentioned in the sorting documentation, it is recommended to use the built-in unranked rank profile for queries with sorting/ordering. Also, I'm not sure the ranking alias is allowed in the JSON query language; I believe you would need to use the full ranking.profile parameter, which is nested in JSON.
Your curl would then look something like:
curl -s -H "Content-Type: application/json" --data \
'{"yql" : "select * from post where description contains \"computer\" order by created desc;","ranking": { "profile" : "unranked" } }' \
http://localhost:8080/search/ | jq .
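If I remember the Query API correctly, the same ordering can also be requested outside of YQL via the ranking.sorting sort specification (a leading minus meaning descending); treat this as a sketch and verify against the Query API reference:
# Sketch: sort spec passed alongside the rank profile instead of in YQL
curl -s -H "Content-Type: application/json" --data \
'{"yql" : "select * from post where description contains \"computer\";","ranking": { "profile" : "unranked", "sorting" : "-created" } }' \
http://localhost:8080/search/ | jq .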

Related

How to check for a non-existent field in a document in Vespa

curl --location --request POST 'http://xxxxxx/search/' \
--header 'Content-Type: application/json' \
--data-raw '{
"offset": 0,
"hits": 60,
"ranking.softtimeout.factor": 0.7,
"ranking.profile": "default",
"yql": "select id, origin_id from mt_challenge where best_score = \"null\";",
"timeout": "1500ms",
"request_timeout": 5
}'
The best_score field is of type float; how do I find all documents where best_score is not set?
Sorry, but NaN/null values are not searchable, so this is currently not supported. Using a default value at ingestion time that does not conflict with your real data can work around this.
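A minimal sketch of that workaround, assuming a sentinel like -1.0 can never occur as a real score (the host and document id below are placeholders):
# Feed documents with an explicit sentinel instead of leaving best_score unset
curl --location --request POST 'http://xxxxxx/document/v1/mt_challenge/mt_challenge/docid/<docid>' \
--header 'Content-Type: application/json' \
--data-raw '{"fields": {"best_score": -1.0}}'
# "Field not set" then becomes an ordinary comparison
curl --location --request POST 'http://xxxxxx/search/' \
--header 'Content-Type: application/json' \
--data-raw '{"yql": "select id, origin_id from mt_challenge where best_score = -1.0;"}'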

Must the primary key not be a string?

Define a schema like this:
field id type string {
    indexing: summary | index
}
then post some data:
curl --location --request POST \
'http://xxxxxxx:xxx/document/v1/mt_challenge/mt_challenge/docid/1680092807002114?create=true' \
--header 'Content-Type: application/json' \
--data-raw '{"fields": {"id": {"assign": "1680092807002114"}}}'
and it returns:
{"errors":[{"description":"UNSPECIFIED Has created a composite field value the reader does not know how to handle: com.yahoo.document.datatypes.StringFieldValue This is a bug. token = START_OBJECT","id":-15}],"id":"id:mt_challenge:mt_challenge::1680092807002114","pathId":"/document/v1/mt_challenge/mt_challenge/docid/1680092807002114"}
POST is used to create a document, PUT to update. Here you use POST while using assign, which generates the error.
Correct syntax for POST (create):
curl --location --request POST \
'http://xxxxxxx:xxx/document/v1/mt_challenge/mt_challenge/docid/1680092807002114?create=true' \
--header 'Content-Type: application/json' \
--data-raw '{"fields": {"id": "1680092807002114"}}'
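A partial update of an existing document would instead use PUT with the update ("assign") syntax, along these lines:
curl --location --request PUT \
'http://xxxxxxx:xxx/document/v1/mt_challenge/mt_challenge/docid/1680092807002114' \
--header 'Content-Type: application/json' \
--data-raw '{"fields": {"id": {"assign": "1680092807002114"}}}'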

LDAP - How to filter users with specific attributes

We have an Active Directory but don't have direct access to the machine hosting this AD, so I'm using a Linux box to connect to it.
We are able to log in successfully using:
ldapsearch -x -h ldapmd.ad.test.com -p 3268 \
-D "cn=test\, test1,ou=users,ou=Australia,ou=asia,OU=Sites,DC=ad,DC=test,DC=com" -W \
-b "OU=Access-Groups,OU=OrgResources,DC=ad,DC=test,DC=com"
Is there a filter that I could add to get all users with the following attributes:
Common Name
email
sAMAccountName
Country
You can use this filter to grab only users: (|(objectCategory=person)(objectClass=user))
For the attribute list, refer to this mapping:
Friendly Name | Attribute Name
--------------+---------------------------------------
Common Name   | cn
email         | mail
Username      | sAMAccountName
Country       | co (open string value)
Country Name  | c (ISO-3166 2-letter string value)
Country Code  | countryCode (ISO-3166 numeric value)
So you can try something like (using placeholders for readability):
ldapsearch -x -H ldap://<host>:<port> -D <binddn> -W -b <base> <filter> cn mail sAMAccountName co
If you want to grab only users for which some or all of these attributes are set, just extend the filter with a presence (=*) filter. For example, if email and country are required, the filter string would be:
(&(|(objectCategory=person)(objectClass=user))(mail=*)(co=*))
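Putting it together with the host, bind DN, and search base from the question, the full command would look something like:
ldapsearch -x -H ldap://ldapmd.ad.test.com:3268 \
-D "cn=test\, test1,ou=users,ou=Australia,ou=asia,OU=Sites,DC=ad,DC=test,DC=com" -W \
-b "OU=Access-Groups,OU=OrgResources,DC=ad,DC=test,DC=com" \
"(&(|(objectCategory=person)(objectClass=user))(mail=*)(co=*))" \
cn mail sAMAccountName co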

How to extract data from specific fields in a NESTED JSON using AWS Athena - Presto?

I have JSONs in the format below in an S3 bucket, and I'm trying to extract only the "id", "label" and "value" from the "fields" key using Athena. I tried ARRAY-MAP but wasn't successful. Also, for the "value" field, I want the content captured as simple text, ignoring any lists/dictionaries in it.
I also don't want to create any Hive schema for these JSONs, and I'm looking for a Presto SQL solution if possible.
{
"reports":{
"client":{
"pdf":"https://reports.s3-accelerate.amazonaws.com/looks/123/reports/client.pdf",
"html":"https://api.com/looks/123/reports/client.html"
},
"public":{
"pdf":"https://s3.amazonaws.com/reports.com/looks/123/reports/public.pdf",
"html":"https://api.look.com/looks/123/reports/public.html"
}
},
"actors":{
"looker":{
"firstName":"Rosa",
"lastName":"Mart"
},
"client":{
"email":"XXX.XXX#XXXXXX.com",
"firstName":"XXX",
"lastName":"XXX"
}
},
"_id":"123",
"fields":[
{
"id":"fence_condition_missing_sections",
"context":[
"Fence Condition"
],
"label":"Missing Sections",
"type":"choice",
"value":"None"
},
{
"id":"photos_landscaped_area",
"context":[
"Landscaping Photos"
],
"label":"Landscaped Area",
"type":"photo-with-description",
"value":[
{
"description":"Front",
"photo":"https://reports-wegolook-com.s3-accelerate.amazonaws.com/looks/123/looker/1.jpg"
},
{
"description":"Front entrance ",
"photo":"https://reports-wegolook-com.s3-accelerate.amazonaws.com/looks/123/looker/2.jpg"
}
]
}
],
"jobNumber":"xxx",
"createdAt":"2018-10-11T22:39:37.223Z",
"completedAt":"2018-01-27T20:13:49.937Z",
"inspectedAt":"2018-01-21T23:33:48.718Z",
"type":"ZZZ-commercial",
"name":"Commercial"
}
Expected output:
| ID                               | LABEL            | VALUE                         |
|----------------------------------|------------------|-------------------------------|
| photos_landscaped_area           | Landscaped Area  | [{"description":"Front",...}] |
| fence_condition_missing_sections | Missing Sections | None                          |
I'm going to assume your data is in a one-document-per-line format and that you provided a formatted example for readability's sake. If this is incorrect, please see the question Multi-line JSON file querying in hive.
When the schema of a JSON document is not entirely regular, you can map the irregular part as a string column and use the JSON_* functions to extract values out of it.
First you need to create a table for the raw data:
CREATE EXTERNAL TABLE data (
  fields array<struct<id:string,label:string,value:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://…'
(if you're not interested in the other fields in the JSON documents you can just ignore those when creating the table)
Then you create a view that flattens the data:
CREATE VIEW flat_data AS
SELECT
  id,
  label,
  value
FROM data
CROSS JOIN UNNEST(fields) AS f(id, label, value)
Selecting from this view should give you the results you are looking for.
I suspect you are also looking for how to extract properties from the value structures, which is what I alluded to above:
SELECT
  label,
  JSON_EXTRACT(value, '$[0].photo') AS first_photo_url
FROM flat_data
WHERE id = 'photos_landscaped_area'
Look in the Presto documentation for all available JSON functions.
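Since value is itself a JSON array for the photo field, here is a sketch of pulling all photo URLs out at once; it assumes the SerDe delivers the array as a JSON string (transform, JSON_PARSE, and the JSON casts are standard Presto, but verify against your Athena engine version):
SELECT
  label,
  -- parse the string column, cast to an array of JSON objects, and map over it
  transform(
    CAST(JSON_PARSE(value) AS ARRAY(JSON)),
    photo -> JSON_EXTRACT_SCALAR(photo, '$.photo')
  ) AS photo_urls
FROM flat_data
WHERE id = 'photos_landscaped_area'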

How to partially update multiple documents at once in Solr?

I have the following command, which I was expecting to work:
curl http://localhost/solr/collection1/update?commit=true -H 'Content-type:application/json' -d '
[
  {
    "meta.something": "78c93c7d-2a9d-4cee-8cbc-1a8bba544678",
    "meta.type": "newsletter",
    "meta.type": { "set": "report" }
  }
]'
But it fails with:
"error":{"msg":"Document is missing mandatory uniqueKey field: _id","code":400}}
So it seems it is not possible to do this without specifying the primary key. But is there some way I can update everything that matches those criteria, with some script or something?
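For reference, an atomic update that Solr does accept always carries the uniqueKey, so a single-document version would look like this (assuming _id is the uniqueKey, as the error message suggests; the id value is a placeholder):
curl http://localhost/solr/collection1/update?commit=true -H 'Content-type:application/json' -d '
[
  {
    "_id": "<document-id>",
    "meta.type": { "set": "report" }
  }
]'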
