Skill output mapping to Edm.DateTimeOffset field - azure-cognitive-search

Is there a way in Azure Cognitive search to map output of skill to DateTimeOffset field?
I'm getting an error. The skill returns:
{
  "values": [
    {
      "recordId": "0",
      "data": {
        "date": "2020-09-25T04:00:00.0000000Z"
      },
      "errors": null,
      "warnings": null
    }
  ]
}
The indexer maps the skill output:
"outputFieldMappings": [
  {
    "sourceFieldName": "/document/message_date",
    "targetFieldName": "message_date"
  }
]
where message_date is defined as:
{
  "name": "message_date",
  "type": "Edm.DateTimeOffset",
  "sortable": true,
  "searchable": false,
  "filterable": true,
  "facetable": false
}
The indexer reports this error:
The data field 'message_date' in the document with key 'NA_0138373324' has an invalid value of type 'Edm.String' (JSON String maps to Edm.String). The expected type was 'Edm.DateTimeOffset'
How can I force the indexer to convert it to a date? There is no mapping function for that.

Date and time values represented in the OData V4 format: yyyy-MM-ddTHH:mm:ss.fffZ or yyyy-MM-ddTHH:mm:ss.fff[+|-]HH:mm. Precision of DateTimeOffset fields is limited to milliseconds. If you upload DateTimeOffset values with sub-millisecond precision, the value returned will be rounded up to milliseconds (for example, 2015-04-15T10:30:09.7552052Z will be returned as 2015-04-15T10:30:09.7550000Z). When you upload DateTimeOffset values with time zone information to your index, Azure Cognitive Search normalizes these values to UTC. For example, 2017-01-13T14:03:00-08:00 will be stored as 2017-01-13T22:03:00Z. If you need to store time zone information, you will need to add an extra field to your index.
You can change your datetime format to:
yyyy-MM-ddTHH:mm:ssZ
Example: "date": "2020-09-25T04:00:00Z"
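If your skill is a custom one (e.g. a WebApi skill you control), you can normalize the value before emitting it. A minimal Python sketch, assuming you have a timezone-aware datetime in hand; the function name is illustrative:

```python
from datetime import datetime, timezone

def to_odata_v4(dt: datetime) -> str:
    """Format a timezone-aware datetime as an Edm.DateTimeOffset literal.

    Azure Cognitive Search normalizes stored values to UTC, so we convert
    to UTC first and emit the trailing 'Z' explicitly.
    """
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(to_odata_v4(datetime(2020, 9, 25, 4, 0, 0, tzinfo=timezone.utc)))
# 2020-09-25T04:00:00Z
```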

Related

Representing JSON data in relational database table

I have a problem where I need to convert a JSON payload into SQL tables while maintaining the relationships established in the payload, so that I can later query the tables and recreate the JSON payload structure.
For example:
{
  "batchId": "batch1",
  "payees": [
    {
      "payeeId": "payee1",
      "payments": [
        {
          "paymentId": "paymentId1",
          "amount": 200,
          "currency": "USD"
        },
        {
          "paymentId": "paymentId2",
          "amount": 200,
          "currency": "YEN"
        },
        {
          "paymentId": "paymentId2",
          "amount": 200,
          "currency": "EURO"
        }
      ]
    }
  ]
}
For the above payload, I have a batch with payments grouped by payees. At its core it all boils down to a batch and its payments. But in that you can have groupings, for example above, it's grouped by payees.
One thing to note is that the payload may not necessarily always follow the above structure. Instead of grouping by payees, it could be by something else like currency for example. Or even no grouping at all, just a root level batch and an array of payments.
I want to know if there are conventions/rules I can follow to approach representing such data in relational tables. Thanks.
edit:
I am primarily looking to use Postgres and have looked into the jsonb feature that it provides for storing json data. However, I'm still struggling to figure out how/where (in terms of which table) to best store the grouping info.
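One conventional approach is to factor the variable grouping into its own table, so the batch/payment core stays fixed while the grouping dimension (payee, currency, or none) is just data. A minimal sketch, using sqlite3 for a self-contained demo rather than Postgres; all table and column names are illustrative:

```python
import json
import sqlite3

# Three tables: batch, a generic "grouping" table recording how payments were
# grouped (by payee, by currency, or NULL for no grouping), and payment.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE batch    (batch_id TEXT PRIMARY KEY);
CREATE TABLE grouping (grouping_id INTEGER PRIMARY KEY,
                       batch_id    TEXT REFERENCES batch(batch_id),
                       group_type  TEXT,   -- e.g. 'payee', 'currency', or NULL
                       group_key   TEXT);  -- e.g. 'payee1'
CREATE TABLE payment  (payment_id  TEXT,
                       grouping_id INTEGER REFERENCES grouping(grouping_id),
                       amount      NUMERIC,
                       currency    TEXT);
""")

payload = json.loads("""{"batchId": "batch1", "payees": [
  {"payeeId": "payee1", "payments": [
    {"paymentId": "paymentId1", "amount": 200, "currency": "USD"},
    {"paymentId": "paymentId2", "amount": 200, "currency": "YEN"}]}]}""")

conn.execute("INSERT INTO batch VALUES (?)", (payload["batchId"],))
for payee in payload["payees"]:
    cur = conn.execute(
        "INSERT INTO grouping (batch_id, group_type, group_key) VALUES (?, ?, ?)",
        (payload["batchId"], "payee", payee["payeeId"]))
    for p in payee["payments"]:
        conn.execute("INSERT INTO payment VALUES (?, ?, ?, ?)",
                     (p["paymentId"], cur.lastrowid, p["amount"], p["currency"]))

rows = conn.execute("""SELECT g.group_key, p.payment_id, p.currency
                       FROM payment p JOIN grouping g USING (grouping_id)
                       ORDER BY p.payment_id""").fetchall()
print(rows)  # [('payee1', 'paymentId1', 'USD'), ('payee1', 'paymentId2', 'YEN')]
```

Reconstructing the original JSON is then a matter of grouping payments by `group_type`/`group_key` per batch. The trade-off versus jsonb is that you can query and constrain individual payments, at the cost of a join.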

Why is Snowflake query result via curl of DATE type returned as days from 1970-01-01?

The query is via curl call in php to https://<account>.snowflakecomputing.com/queries/<token>/result
The 'rowtype' of the field is
{
  "name": "SNAPSHOT_DATE",
  "database": <database>,
  "schema": <schema>,
  "table": "DAILY_USER_SNAPSHOT",
  "byteLength": null,
  "type": "date",
  "scale": null,
  "precision": null,
  "nullable": true,
  "collation": null,
  "length": null
}
The 'rowset' of the field is "18350", which is the number of days since 1970-01-01 (Unix epoch date).
In the web UI Worksheet, the field is returned as expected in the DATE_OUTPUT_FORMAT of YYYY-MM-DD (verified with SHOW PARAMETERS IN ACCOUNT;), like this:
2020-03-29
Why isn't the 'rowset' of the field via the curl call returned as "2020-03-29" instead of as "18350"?
I'm not sure, but it is probably just the standard format in which Snowflake returns date/time data types via API calls. It's up to the client application to determine how the value gets rendered on screen. The web UI is itself one of these API clients, and it (probably) uses the session output-format parameters shown here to determine how to display values to users.
Why are you using curl / rest API to get this data and not one of the drivers provided by Snowflake?
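If you do stay on the raw REST API, the conversion on the client side is a one-liner: interpret the value as days since the Unix epoch. A minimal sketch (the function name is illustrative):

```python
from datetime import date, timedelta

def snowflake_date(days_since_epoch: str) -> str:
    """Convert a raw DATE value from the Snowflake REST API rowset
    (days since 1970-01-01, returned as a string) to ISO YYYY-MM-DD."""
    return (date(1970, 1, 1) + timedelta(days=int(days_since_epoch))).isoformat()

print(snowflake_date("18350"))  # 2020-03-29
```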

Azure search is returning Invalid Expression with Date Filter

I am running into an issue with Azure Search: a query that was working before is now
returning Invalid Expression. Am I missing something?
Data type of the filter field:
{
  "name": "ModifiedDateTime",
  "type": "Edm.DateTimeOffset",
  "searchable": false,
  "filterable": true,
  "facetable": true,
  "sortable": true
}
Api-version=2016-09-01-Preview
Request:
{"queryType":"full","searchMode":"all","filter":"ModifiedDateTime ge 2018-12-12","search":null,"searchFields":null,"count":true}
Error:
{
  "error": {
    "code": "",
    "message": "Invalid expression: Literal '2018-12-12' of unsupported data type 'Date' was found. Please use a literal that matches the type of the field in the expression.\r\nParameter name: $filter"
  }
}
This error was caused by a regression that has since been fixed. Only Search services in West Central US were affected.
We had missing test coverage for this case, which we actually never intended to support. Although we have fixed this to avoid breaking backward compatibility, we may remove the ability to use Edm.Date literals in filters in a future API version.
You should always include the time and offset portions as well when comparing with dates. Otherwise, how do you decide when one day starts and the next begins? We assume midnight UTC for plain dates, but this assumption may not be valid for your users.
We recommend writing filters on Edm.DateTimeOffset fields like this instead:
ModifiedDateTime ge 2018-12-12T00:00:00Z
The Z is for UTC, to which Azure Search normalizes all DateTimeOffset values.
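If you build filter strings programmatically, it's easy to guarantee the full literal. A minimal Python sketch that expands a plain date to midnight UTC; the helper name is illustrative:

```python
from datetime import date, datetime, time, timezone

def date_filter(field: str, op: str, d: date) -> str:
    """Build an Azure Search $filter clause with a full Edm.DateTimeOffset
    literal (midnight UTC) instead of a bare date literal."""
    dt = datetime.combine(d, time.min, tzinfo=timezone.utc)
    return f"{field} {op} {dt.strftime('%Y-%m-%dT%H:%M:%SZ')}"

print(date_filter("ModifiedDateTime", "ge", date(2018, 12, 12)))
# ModifiedDateTime ge 2018-12-12T00:00:00Z
```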

WKS - Training model to identify entities on tables

Browser type and version: GoogleChrome 67.0.3396.99
We are trying to train our model to identify values from multiple types of tables that contain different numbers of rows and columns. A text row was extracted to begin the training: first we configured our system types, then marked the entities and the relation "AllInOne". We are able to train 10 relations in a training set, but when the model is tested we only see 8 relations, even after creating other document sets for training and testing the model multiple times. Is there another way to associate the column value with the row values in a single relation, considering there isn't a standard for the types of tables we are analyzing with the Discovery service?
We are expecting the discovery service response as the following:
"relations": [
  {
    "type": "AllInOne",
    "sentence": "…",
    "arguments": [
      {
        "entities": [
          { "text": "", "type": "entity1" }
        ]
      },
      {
        "entities": [
          { "text": "", "type": "entity2" }
        ]
      },
      {
        "entities": [
          { "text": "", "type": "entity..n" }
        ]
      },
      { "..." }
    ]
  }
]
The machine learning model that is trained in Watson Knowledge Studio targets unstructured natural language text. It may not be suitable for (semi-) structured format like table, especially for relations.

Optimizing seemingly simple couchbase query for "items whose children satisfy"

I'm developing a system to store our translations using couchbase.
I have about 15,000 entries in my bucket that look like this:
{
  "classifications": [
    {
      "documentPath": "Test Vendor/Test Project/Ordered",
      "position": 1
    }
  ],
  "id": "message-Test Vendor/Test Project:first",
  "key": "first",
  "projectId": "project-Test Vendor/Test Project",
  "translations": {
    "en-US": [
      {
        "default": {
          "owner": "414d6352-c26b-493e-835e-3f0cf37f1f3c",
          "text": "first"
        }
      }
    ]
  },
  "type": "message",
  "vendorId": "vendor-Test Vendor"
}
And I want, as an example, to find all messages that are classified with a "documentPath" of "Test Vendor/Test Project/Ordered".
I use this query:
SELECT message.*
FROM couchlate message UNNEST message.classifications classification
WHERE classification.documentPath = "Test Vendor/Test Project/Ordered"
AND message.type="message"
ORDER BY classification.position
But I'm very surprised that the query takes 2 seconds to execute!
Looking at the query execution plan, it seems that couchbase is looping over all the messages and then filtering on "documentPath".
I'd like it to first filter on "documentPath" (because there are in reality only 2 documentPaths matching my query) and then find the messages.
I've tried to create an index on "classifications" but it did not change anything.
Is there something wrong with my index setup, or should I structure my data differently to get fast results?
I'm using couchbase 4.5 beta if that matters.
Your query filters on the documentPath field, so an index on classifications doesn't actually help. You need to create an array index on the documentPath field itself using the new array index syntax on Couchbase 4.5:
CREATE INDEX ix_documentPath ON myBucket ( DISTINCT ARRAY c.documentPath FOR c IN classifications END ) ;
Then you can query on documentPath with a query like this:
SELECT * FROM myBucket WHERE ANY c IN classifications SATISFIES c.documentPath = "your path here" END ;
Add EXPLAIN to the start of the query to see the execution plan and confirm that it is indeed using the index ix_documentPath.
More details and examples here: http://developer.couchbase.com/documentation/server/4.5-dp/indexing-arrays.html
