FQL Fauna Function - Query Indexed Document Data Given Conditions - database

I have a collection of shifts for employees, data (trimmed out some details, but this is the structure for start/end times) looks like this:
{
"ref": Ref(Collection("shifts"), "123451234512345123"),
"ts": 1234567891012345,
"data": {
"id": 1,
"start": {
"time": 1659279600000
},
"end": {
"time": 1659283200000
},
"location": "12341234-abcd-1234-cdef-123412341234"
}
}
I have an index that will query return an array of shifts_by_location in this format: ["id", "startTime", "endTime"] ...
Now I want to create a user-defined-function to filter these results "start" and "end" times to fall in between given dayStart and dayEnd times to get shifts by date, hoping to get some FQL assistance here, thanks!
Here's my broken attempt:
Query(
Lambda(
["location_id", "dayStart", "dayEnd"], // example: ["124-abd-134", 165996000, 165922000]
Map(
Paginate(Match(Index("shifts_by_location"), Var("location_id"))),
Lambda(["id", "startTime", "endTime"],
If(
And(
GTE(Var("startTime"), Var("dayStart")), // GOAL -> shift starts after 8am on given day
LTE(Var("endTime"), Var("dayEnd")) // GOAL -> shift ends before 5pm on given day
),
Get(Var("shift")) // GOAL -> return shift for given day
)
)
)
)
)

Found a working solution with this query, the biggest fix was really just to use a filter over the map, which seems obvious in hindsight:
Query(
Lambda(
["location_id", "dayStart", "dayEnd"],
Filter(
Paginate(Match(Index("shifts_by_location"), Var("location_id"))),
Lambda(
["start", "end", "id"],
And(GTE(Var("start"), Var("dayStart")), LTE(Var("end"), Var("dayEnd")))
)
)
)
)

Related

How to perform exact nearest neighbors search in Vespa?

I have such schema
schema embeddings {
document embeddings {
field id type int {}
field text_embedding type tensor<double>(d0[960]) {
indexing: attribute | index
attribute {
distance-metric: euclidean
}
}
}
rank-profile distance {
num-threads-per-search:1
inputs {
query(query_embedding) tensor<double>(d0[960])
}
first-phase {
expression: distance(field, text_embedding)
}
}
}
and such query body:
body = {
'yql': 'select * from embeddings where ({approximate:false, targetHits:10} nearestNeighbor(text_embedding, query_embedding));',
"hits":10,
'input': {
'query(query_embedding)': [...],
},
'ranking': {
'profile': 'distance',
},
}
The thing is the output of this query returns different results depending on targetHits parameter. For example, the top-1 distance for targetHits: 10 is 2.847000, and the top-1 distance for targetHits: 200 is 3.028079.
More of that, if I perform the same query using vespa cli:
vespa query -t http://query "select * from embeddings where ([{\"targetHits\":10}] nearestNeighbor(text_embedding, query_embedding));" \
"approximate=false" \
"ranking.profile=distance" \
"ranking.features.query(query_embedding)=[...]"
I'm receiving the third result:
{
"root": {
"id": "toplevel",
"relevance": 1.0,
"fields": {
"totalCount": 10
},
"coverage": {
"coverage": 100,
"documents": 1000000,
"full": true,
"nodes": 1,
"results": 1,
"resultsFull": 1
},
"children": [
{
"id": "id:embeddings:embeddings::926288",
"relevance": 0.8158006540357854,
...
where as we can see top-1 distance is 0.8158
So, how can I perform the exact and not approximate nearest neighbors search, which results do not depend on any parameters?
Vespa sorts results by descending relevance score. When you use the distance rank-feature instead of closeness as the relevance score (your first-phase ranking expression), you end up inverting the order, so that more distant (worse) neighbors are ranked higher. As you increase targetHits you get even worse neighbors.
The correct query syntax for exact search is to set approximate:false:
select * from embeddings where ({approximate:false, targetHits:10} nearestNeighbor(text_embedding, query_embedding));
But you want to use closeness(field, text_embedding) in your first-phase ranking expression.
From https://docs.vespa.ai/en/nearest-neighbor-search.html
The closeness(field, image_embedding) is a rank-feature calculated by the nearestNeighbor query operator. The closeness(field, tensor) rank feature calculates a score in the range [0, 1], where 0 is infinite distance, and 1 is zero distance. This is convenient because Vespa sorts hits by decreasing relevancy score, and one usually want the closest hits to be ranked highest.
The first-phase is part of Vespa’s phased ranking support. In this example the closeness feature is re-used and documents are not re-ordered.

I would like to search by filtering by date in mongodb

I would like to have the total of names that do not have "SENT" as status and I would also like to filter this number according to a date of beginning and end, for the moment only the first part walks it marks the filtering by the dated. my field is called "date".
sorry my code did not indent ):
db.name.aggregate([{ $group: {_id:"$id", sent:
{$max: {$cond: {if: {
$eq: [ "$status", "SENT" ] }, then: 1, else: 0}}} } },
{ $match: { sent:
0 } }, {$count: "total"}])
You can rewrite your query to add the $match as first stage and include date and status filter followed by $count stage to count the matched documents.
Something like
db.name.aggregate([
{"$match": {"status":"SENT","date":{"$gte":input date1, "$lte":input date2} }
{"$count": "total"}
])
You don't need aggregation for the request. You can use regular queries.
db.name.count({"status":"SENT","date":{"$gte":input date1, "$lte":input date2} })

Attribute Syntax for JSON query in check_json.pl

So, I'm trying to set up check_json.pl in NagiosXI to monitor some statistics. https://github.com/c-kr/check_json
I'm using the code with the modification I submitted in pull request #32, so line numbers reflect that code.
The json query returns something like this:
[
{
"total_bytes": 123456,
"customer_name": "customer1",
"customer_id": "1",
"indices": [
{
"total_bytes": 12345,
"index": "filename1"
},
{
"total_bytes": 45678,
"index": "filename2"
},
],
"total": "765.43gb"
},
{
"total_bytes": 123456,
"customer_name": "customer2",
"customer_id": "2",
"indices": [
{
"total_bytes": 12345,
"index": "filename1"
},
{
"total_bytes": 45678,
"index": "filename2"
},
],
"total": "765.43gb"
}
]
I'm trying to monitor the sized of specific files. so a check should look something like:
/path/to/check_json.pl -u https://path/to/my/json -a "SOMETHING" -p "SOMETHING"
...where I'm trying to figure out the SOMETHINGs so that I can monitor the total_bytes of filename1 in customer2 where I know the customer_id and index but not their position in the respective arrays.
I can monitor customer1's total bytes by using the string "[0]->{'total_bytes'}" but I need to be able to specify which customer and dig deeper into file name (known) and file size (stat to monitor) AND the working query only gives me the status (OK,WARNING, or CRITICAL). Adding -p all I get are errors....
The error with -p no matter how I've been able to phrase it is always:
Not a HASH reference at ./check_json.pl line 235.
Even when I can get a valid OK from the example "[0]->{'total_bytes'}", using that in -p still gives the same error.
Links pointing to documentation on the format to use would be very helpful. Examples in the README for the script or in the -h output are failing me here. Any ideas?
I really have no idea what your question is. I'm sure I'm not alone, hence the downvotes.
Once you have the decoded json, if you have a customer_id to search for, you can do:
my ($customer_info) = grep {$_->{customer_id} eq $customer_id} #$json_response;
Regarding the error on line 235, this looks odd:
foreach my $key ($np->opts->perfvars eq '*' ? map { "{$_}"} sort keys %$json_response : split(',', $np->opts->perfvars)) {
# ....................................... ^^^^^^^^^^^^^
$perf_value = $json_response->{$key};
if perfvars eq "*", you appear to be looking for $json_reponse->{"{total}"} for example. You might want to validate the user's input:
die "no such key in json data: '$key'\n" unless exists $json_response->{$key};
This entire business of stringifying the hash ref lookups just smells bad.
A better question would look like:
I have this JSON data. How do I get the sum of total_bytes for the customer with id 1?
See https://stackoverflow.com/help/mcve

Why can't I retrieve the master appointment of a series via `AppointmentCalendar.FindAppointmentsAsync`?

I'm retrieving multiple appointments via AppointmentCalendar.FindAppointmentsAsync. I'm evaluating the Recurrence.RecurrenceType and noticed an unexpected value of 1 for master appointments of a series. I expect the Recurrence.RecurrenceType to be 0 (Master) but instead it is 1 (Instance).
(Note: I added AppointmentProperties.Recurrence to FindAppointmentsOptions.FetchProperties that is passed to GetAppointmentsAsync, so the Recurrence data should be fetched propertly.)
To double check I retrieved the respective master appointment via GetAppointmentAsync (instead of FindAppointmentsAsync) using its LocalId - and here the RecurrenceType is correctly set to 0.
Here is demo output for a test appointment series:
Data gotten by FindAppointmentsAsync (Instance??):
"Recurrence": {
"Unit": 0,
"Occurrences": 16,
"Month": 1,
"Interval": 1,
"DaysOfWeek": 0,
"Day": 1,
"WeekOfMonth": 0,
"Until": "2016-09-29T02:00:00+02:00",
"TimeZone": "Europe/Budapest",
"RecurrenceType": 1,
"CalendarIdentifier": "GregorianCalendar"
},
"StartTime": "2016-09-14T19:00:00+02:00",
"OriginalStartTime": "2016-09-14T19:00:00+02:00",
Data gotten by GetAppointmentAsync for the same appointment (Master):
"Recurrence": {
"Unit": 0,
"Occurrences": 16,
"Month": 1,
"Interval": 1,
"DaysOfWeek": 0,
"Day": 1,
"WeekOfMonth": 0,
"Until": "2016-09-29T02:00:00+02:00",
"TimeZone": "Europe/Budapest",
"RecurrenceType": 0,
"CalendarIdentifier": "GregorianCalendar"
},
"StartTime": "2016-09-14T19:00:00+02:00",
"OriginalStartTime": null,
Notice the difference in RecurrenceType. Also note that OriginalStartTime is set to null for the master gotten by GetAppointmentAsync but has a value for the appointment gotten by FindAppointmentsAsync.
You can also see that the StartTime for the master appointment is the start time set for the alleged Instance (which in reality is the master).
Shouldn't FindAppointmentsAsync return a master as the first element of a series, instead of an instance?
(SDK: 10.0.14393.0, Anniversary)
Code to explicitly find such a master/instance situation for a given calendar:
var appointmentsCurrent = await calendar.FindAppointmentsAsync(DateTimeOffset.Now, TimeSpan.FromDays(365), findAppointmentOptions);
foreach(var a in appointmentsCurrent)
{
var a2 = await calendar.GetAppointmentAsync(a.LocalId);
if (a2.Recurrence?.RecurrenceType == RecurrenceType.Master &&
a2.StartTime == a.StartTime &&
a.Recurrence?.RecurrenceType == RecurrenceType.Instance &&
a.OriginalStartTime == a2.StartTime)
{
Debug.WriteLine("Gotcha!");
}
}
I tested above code on my side. If you get the count of the appointments which are got from FindAppontmentsAsync by the following code:var count=appointmentsCurrent.Count;, you will find it does return the count of the appointment instances, not the count of master appointments. So the FindAppontmentsAsync method got all instances of the appointments not master appointments. This is the reason why the RecurrenceType is instance.
It seems like we can get one master appointment by method GetAppointmentAsync as you mentioned above, so I suppose this may not block you.
If you think this is not a good design for this API or you require a API for finding all the master appointments in one calendar, you can submit your ideas to the windows 10 feedback tool or the user voice site.

lucene solr - how to know numCount of each word in query

i have a query string with 5 words. for exmple "cat dog fish bird animals".
i need to know how many matches each word has.
at this point i create 5 queries:
/q=name:cat&rows=0&facet=true
/q=name:dog&rows=0&facet=true
/q=name:fish&rows=0&facet=true
/q=name:bird&rows=0&facet=true
/q=name:animals&rows=0&facet=true
and get matches count of each word from each query.
but this method takes too many time.
so is there a way to check get numCount of each word with one query?
any help appriciated!
In this case, functionQueries are your friends. In particular:
termfreq(field,term) returns the number of times the term appears in the field for that document. Example Syntax:
termfreq(text,'memory')
totaltermfreq(field,term) returns the number of times the term appears in the field in the entire index. ttf is an alias of
totaltermfreq. Example Syntax: ttf(text,'memory')
The following query for instance:
q=*%3A*&fl=cntOnSummary%3Atermfreq(summary%2C%27hello%27)+cntOnTitle%3Atermfreq(title%2C%27entry%27)+cntOnSource%3Atermfreq(source%2C%27activities%27)&wt=json&indent=true
returns the following results:
"docs": [
{
"id": [
"id-1"
],
"source": [
"activities",
"activities"
],
"title": "Ajones3 Activity Entry 1",
"summary": "hello hello",
"cntOnSummary": 2,
"cntOnTitle": 1,
"cntOnSource": 1,
"score": 1
},
{
"id": [
"id-2"
],
"source": [
"activities",
"activities"
],
"title": "Common activity",
"cntOnSummary": 0,
"cntOnTitle": 0,
"cntOnSource": 1,
"score": 1
}
}
]
Please notice that while it's working well on single value field, it seems that for multivalued fields, the functions consider just the first entry, for instance in the example above, termfreq(source%2C%27activities%27) returns 1 instead of 2.

Resources