Querying CosmosDB based on timestamps - sql-server

I am working with a CosmosDB instance set up by one of my colleagues, connecting to it using a connection string. The database contains several JSON documents with the following schema:
{
"period": "Raw",
"source": "Traffic",
"batchId": "ee737270-0b72-49b7-a2f1-201f642e9c81",
"periodName": "Raw",
"sourceName": "Traffic",
"groupKey": "gc4151_a",
"partitionKey": "traffic-gc4151_a-raw-raw",
"time": "2021-08-05T23:55:10",
"minute": 55,
"hour": 23,
"day": 05,
"month": 08,
"quarter": 3,
"year": 2021,
"minEventTime": "2021-08-05T23:55:09",
"maxEventTime": "2021-08-05T23:55:11",
"meta": {
"siteId": "GC4151_A",
"from": {
"lat": "55.860894822588506",
"long": "-4.284365958508686"
},
"to": {
"lat": "55.86038667864348",
"long": "-4.2826901232101795"
}
},
"measurements": {
"flow": [
{
"calculation": "Raw",
"name": "flow",
"calculationName": "Raw",
"value": 0
}
],
"concentration": [
{
"calculation": "Raw",
"name": "concentration",
"calculationName": "Raw",
"value": 0
}
]
},
"added": "2021-08-05T12:21:32.000819Z",
"updated": "2021-08-05T12:21:32.000819Z",
"id": "d4346f50-543e-4c4d-82cf-835b480914c2",
"_rid": "4RRTAIYVA1AIAAAAAAAAAA==",
"_self": "dbs/4RRTAA==/colls/4RRTAIYVA1A=/docs/4RRTAIYVA1AIAAAAAAAAAA==/",
"_etag": "\"1c0015a1-0000-1100-0000-5f3fbc4c0000\"",
"_attachments": "attachments/",
"_ts": 1598012492
}
I am trying to write a SQL query to select all the records that fall between the current date-time and one week earlier, so I can use these to perform future calculations.
I have attempted to use both of the following:
SELECT *
FROM c
WHERE c.time > date_sub(now(), interval 1 week);
and
SELECT *
FROM c
WHERE c.time >= DATE_ADD(CURDATE(), INTERVAL -7 DAY);
However, both of these return the following error:
Gateway Failed to Retrieve Query Plan: Message: {"errors":[{"severity":"Error","location":{"start":124,"end":125},"code":"SC1001","message":"Syntax error, incorrect syntax near '1'."}]}
ActivityId: 51c3b6f7-e760-4062-bd80-8cc9f8de5352, Microsoft.Azure.Documents.Common/2.14.0, Microsoft.Azure.Documents.Common/2.14.0
My question is what is the issue with my code, and how can I fix it?

You may use DateTimeAdd and GetCurrentDateTime() to achieve this. DATE_SUB/NOW() and DATE_ADD/CURDATE() are MySQL functions, and Cosmos DB's SQL dialect does not support them, which is why the parser reports a syntax error. E.g.
SELECT *
FROM c
WHERE c.time > DateTimeAdd("day",-7,GetCurrentDateTime() )
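GetCurrentDateTime() returns the current UTC time as an ISO 8601 string, and DateTimeAdd() returns an ISO 8601 string too, so the comparison against your c.time values works as long as they are UTC in the same format. Since you connect with a connection string, here is a minimal sketch of running this from Python with the azure-cosmos SDK (the connection string, database, and container names are placeholders):

from azure.cosmos import CosmosClient

# Placeholders: substitute your own connection string and names.
client = CosmosClient.from_connection_string("<connection-string>")
container = client.get_database_client("<database>").get_container_client("<container>")

query = """
SELECT *
FROM c
WHERE c.time > DateTimeAdd("day", -7, GetCurrentDateTime())
"""

# The filter is not on the partition key, so allow a cross-partition query.
for doc in container.query_items(query=query, enable_cross_partition_query=True):
    print(doc["id"], doc["time"])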
Let me know if this works for you.

Related

Array within Element within Array in Variant

How can I get the data out of this array stored in a variant column in Snowflake? I don't care whether it's a new table, a view, or a query. There is a second column of type varchar(256) that contains a unique ID.
If you can just help me read the "confirmed" data and the "editorIds" data, I can probably take it from there. Many thanks!
Output example would be
UniqueID   ConfirmationID   EditorID
u3kd9      xxxx-436a-a2d7   nupd
u3kd9      xxxx-436a-a2d7   9l34c
R3nDo      xxxx-436a-a3e4   5rnj
yP48a      xxxx-436a-a477   jTpz8
yP48a      xxxx-436a-a477   nupd
[
{
"confirmed": {
"Confirmation": "Entry ID=xxxx-436a-a2d7-3525158332f0: Confirmed order submitted.",
"ConfirmationID": "xxxx-436a-a2d7-3525158332f0",
"ConfirmedOrders": 1,
"Received": "8/29/2019 4:31:11 PM Central Time"
},
"editorIds": [
"xxsJYgWDENLoX",
"JR9bWcGwbaymm3a8v",
"JxncJrdpeFJeWsTbT"
] ,
"id": "xxxxx5AvGgeSHy8Ms6Ytyc-1",
"messages": [],
"orderJson": {
"EntryID": "xxxxx5AvGgeSHy8Ms6Ytyc-1",
"Orders": [
{
"DropShipFlag": 1,
"FromAddressValue": 1,
"OrderAttributes": [
{
"AttributeUID": 548
},
{
"AttributeUID": 553
},
{
"AttributeUID": 2418
}
],
"OrderItems": [
{
"EditorId": "aC3f5HsJYgWDENLoX",
"ItemAssets": [
{
"AssetPath": "https://xxxx573043eac521.png",
"DP2NodeID": "10000",
"ImageHash": "000000000000000FFFFFFFFFFFFFFFFF",
"ImageRotation": 0,
"OffsetX": 50,
"OffsetY": 50,
"PrintedFileName": "aC3f5HsJYgWDENLoX-10000",
"X": 50,
"Y": 52.03909266409266,
"ZoomX": 100,
"ZoomY": 93.75
}
],
"ItemAttributes": [
{
"AttributeUID": 2105
},
{
"AttributeUID": 125
}
],
"ItemBookAttribute": null,
"ProductUID": 52,
"Quantity": 1
}
],
"SendNotificationEmailToAccount": true,
"SequenceNumber": 1,
"ShipToAddress": {
"Addr1": "Addr1",
"Addr2": "0",
"City": "City",
"Country": "US",
"Name": "Name",
"State": "ST",
"Zip": "00000"
}
}
]
},
"orderNumber": null,
"status": "order_placed",
"submitted": {
"Account": "350000",
"ConfirmationID": "xxxxx-436a-a2d7-3525158332f0",
"EntryID": "xxxxx-5AvGgeSHy8Ms6Ytyc-1",
"Key": "D83590AFF0CC0000B54B",
"NumberOfOrders": 1,
"Orders": [
{
"LineItems": [],
"Note": "",
"Products": [
{
"Price": "00.30",
"ProductDescription": "xxxxxint 8x10",
"Quantity": 1
},
{
"Price": "00.40",
"ProductDescription": "xxxxxut Black 8x10",
"Quantity": 1
},
{
"Price": "00.50",
"ProductDescription": "xxxxx"
},
{
"Price": "00.50",
"ProductDescription": "xxxscount",
"Quantity": 1
}
],
"SequenceNumber": "1",
"SubTotal": "00.70",
"Tax": "1.01",
"Total": "00.71"
}
],
"Received": "8/29/2019 4:31:10 PM Central Time"
},
"tracking": null,
"updatedOn": 1.598736670503000e+12
}
]
So, this is how I'd query that exact JSON assuming the data is in column var in table x:
SELECT x.var[0]:confirmed:ConfirmationID::varchar as ConfirmationID,
f.value::varchar as EditorID
FROM x,
LATERAL FLATTEN(input => var[0]:editorIds) f
;
Since your sample output doesn't match the JSON that you provided, I will assume that this is what you need.
Also, as a note, your JSON includes outer [ ], which indicates that the entire JSON string is inside an array; this is the reason for var[0] in my query. If you have multiple records inside that array, then var[0] will only read the first one. In general, you should exclude those outer brackets and instead load each record into the table separately. I wasn't sure whether you could make that change, so I just wanted to make note.
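As a quick sanity check of what the LATERAL FLATTEN produces, here is a hypothetical Python mirror of the same extraction run against the raw JSON (the file name is a placeholder):

import json

# Placeholder path to the JSON shown above.
with open("sample.json") as f:
    records = json.load(f)  # the outer [ ] parses to a Python list

for rec in records:
    confirmation_id = rec["confirmed"]["ConfirmationID"]
    # FLATTEN(input => var[0]:editorIds) yields one output row per array element.
    for editor_id in rec["editorIds"]:
        print(confirmation_id, editor_id)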

Group array doc by sequence: MongoDB groupby or mapreduce?

In MongoDB, I have a collection of documents, each containing an array of records that I want to group by consecutive tag, preserving the natural order:
{
"day": "2019-01-07",
"records": [
{
"tag": "ch",
"unixTime": ISODate("2019-01-07T09:06:56Z"),
"score": 1
},
{
"tag": "u",
"unixTime": ISODate("2019-01-07T09:07:06Z"),
"score": 0
},
{
"tag": "ou",
"unixTime": ISODate("2019-01-07T09:07:06Z"),
"score": 0
},
{
"tag": "u",
"unixTime": ISODate("2019-01-07T09:07:20Z"),
"score": 0
},
{
"tag": "u",
"unixTime": ISODate("2019-01-07T09:07:37Z"),
"score": 1
}
]
}
I want to group (and aggregate) the records by consecutive runs of the same tag, NOT simply by unique tag values.
Desired output:
{
"day": "2019-01-07",
"records": [
{
"tag": "ch",
"unixTime": [ISODate("2019-01-07T09:06:56Z")],
"score": 1,
"nbRecords": 1
},
{
"tag": "u",
"unixTime": [ISODate("2019-01-07T09:07:06Z")],
"score": 0,
"nbRecords": 1
},
{
"tag": "ou",
"unixTime": [ISODate("2019-01-07T09:07:06Z")],
"score": 0,
"nbRecords": 1
},
{
"tag": "u",
"unixTime": [ISODate("2019-01-07T09:07:20Z"), ISODate("2019-01-07T09:07:37Z")],
"score": 1,
"nbRecords": 2
}
]
}
Groupby
It seems that the $group aggregation stage in MongoDB behaves as if it first sorted the array, grouping by the unique field values:
db.coll.aggregate(
[
{"$unwind":"$records"},
{"$group":
{
"_id":{
"tag":"$records.tag",
"day":"$day"
},
...
}
}
]
)
Returns
{
"day": "2019-01-07",
"records": [
{
"tag": "ch",
"unixTime": [ISODate("2019-01-07T09:06:56Z")],
"score": 1,
"nbRecords": 1
},
{
"tag": "u",
"unixTime": [ISODate("2019-01-07T09:07:06Z"), ISODate("2019-01-07T09:07:20Z"), ISODate("2019-01-07T09:07:37Z")],
"score": 1,
"nbRecords": 3
},
{
"tag": "ou",
"unixTime": [ISODate("2019-01-07T09:07:06Z")],
"score": 0,
"nbRecords": 1
}
]
}
Map/reduce
As I'm currently using the pymongo driver, I implemented the solution back in Python using itertools.groupby, which (as a generator) performs the grouping while respecting the natural order, but I'm confronted with a server timeout problem (cursor.NotFound error) due to an insane processing time.
Any idea how to use MongoDB's mapReduce function directly to perform the equivalent of itertools.groupby() in Python?
Help would be very appreciated: I'm using pymongo driver 3.8 and MongoDB 4.0.
Run through the array of records adding a new integer index that increments whenever the group-by target changes, then use the MongoDB group operation on that index; see the sketch below.
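A minimal two-pass sketch of that idea with pymongo (the groupIdx field name is invented here for illustration):

for doc in db.coll.find():
    # Pass 1: tag each record with a run index that increments whenever
    # the tag changes, so consecutive identical tags share an index.
    idx, prev = -1, None
    for rec in doc["records"]:
        if rec["tag"] != prev:
            idx += 1
            prev = rec["tag"]
        rec["groupIdx"] = idx
    db.coll.update_one({"_id": doc["_id"]}, {"$set": {"records": doc["records"]}})

# Pass 2: grouping on (day, groupIdx, tag) now preserves consecutive runs.
pipeline = [
    {"$unwind": "$records"},
    {"$group": {
        "_id": {"day": "$day", "grp": "$records.groupIdx", "tag": "$records.tag"},
        "unixTime": {"$push": "$records.unixTime"},
        "score": {"$sum": "$records.score"},
        "nbRecords": {"$sum": 1},
    }},
    {"$sort": {"_id.day": 1, "_id.grp": 1}},
]
results = list(db.coll.aggregate(pipeline))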
Following @Ale's recommendation, and without any tips on how to do that in MongoDB, I switched back to a Python implementation, solving the cursor.NotFound problem. I imagine it could be done inside MongoDB, but this is working out for me:
import itertools

for r in db.coll.find():
    sessions = []
    # groupby emits one group per consecutive run of identical tags
    for tag, time_score in itertools.groupby(r["records"], key=lambda x: x["tag"]):
        time_score = list(time_score)
        sessions.append({
            "tag": tag,
            "start": time_score[0]["unixTime"],
            "end": time_score[-1]["unixTime"],
            "ca": sum(n["score"] for n in time_score),
            "nb_records": len(time_score),
        })
    db.coll.update_one(
        {"_id": r["_id"]},
        {
            "$unset": {"records": ""},
            "$set": {"sessions": sessions},
        },
    )
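If the plain find() loop still hits cursor timeouts on a very large collection, pymongo's find(no_cursor_timeout=True) or an explicit batch_size() on the cursor are the usual workarounds.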

Update by query with a clause in ElasticSearch

I have an Elasticsearch index where some records have a @timestamp of 1st Feb. There is also a field in _source called process_time, which is different for each record with the same @timestamp.
{
"_index": "elasticsearch-index",
"_type": "doc",
"_id": "qByop2gBw60PM5VYP0aG",
"_score": 1,
"_source": {
"task": "INFO",
"#timestamp": "2019-02-01T06:04:08.365Z",
"num_of_batches": 0,
"batch_size": 1000,
"process_time": "2019-02-04 06:04:04,489"
}
},
{
"_index": "elasticsearch-index",
"_type": "doc",
"_id": "qByop2gBw60PM5VYP0aG",
"_score": 1,
"_source": {
"task": "INFO",
"#timestamp": "2019-02-01T06:04:08.365Z",
"num_of_batches": 0,
"batch_size": 1000,
"process_time": "2019-02-05 06:04:04,489"
}
}
I want to update the @timestamp of all records having a @timestamp of 1st Feb to whatever the process_time in that record is.
How can I do this?
EDIT:
After @mysterion's answer, I made the following changes:
{
"script": {
"source": """ctx._source['#timestamp'] = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS").parse(ctx._source.process_time.replaceAll(",","."))""",
"lang": "painless"
},
"query": {
"term": {
"#timestamp": "2019-02-01"
}
}
}
But I got the following exception:
"type": "class_cast_exception",
"reason": "Cannot cast java.lang.String to java.util.function.Function"
You could utilize the Update By Query API to update the needed documents in Elasticsearch:
POST index_name/_update_by_query?conflicts=proceed
{
  "script": {
    "source": "ctx._source['@timestamp'] = new SimpleDateFormat('yyyy-MM-dd HH:mm:ss,SSS').parse(ctx._source.process_time)",
    "lang": "painless"
  },
  "query": {
    "term": {
      "@timestamp": "2019-01-01T00:00:00Z"
    }
  }
}
You're using the replaceAll method expecting that it's the plain Java String method; in fact, it's not. In Painless its signature is
String replaceAll(Pattern, Function)
which is why the cast to java.util.function.Function fails. More information is in the Painless API reference: https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-api-reference.html
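For what it's worth, here is a minimal sketch of the same update through the official elasticsearch-py client (the host and index name are placeholders). The ss,SSS pattern parses the comma in process_time directly, so no replaceAll is needed at all, and I've swapped the term filter for a range covering the whole day, since a term query on a date field matches only one exact instant:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder host

body = {
    "script": {
        # The ss,SSS pattern consumes the comma in process_time directly,
        # so replaceAll is unnecessary.
        "source": "ctx._source['@timestamp'] = new SimpleDateFormat('yyyy-MM-dd HH:mm:ss,SSS').parse(ctx._source.process_time)",
        "lang": "painless",
    },
    # Match the whole day rather than one exact millisecond.
    "query": {"range": {"@timestamp": {"gte": "2019-02-01", "lt": "2019-02-02"}}},
}

es.update_by_query(index="elasticsearch-index", body=body, conflicts="proceed")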

Retrieve faceted results with a non-default count value (default 10) from Azure Search

I am using an Azure Search index for searching. My objective is to retrieve unique records based on some unique parameter, say System_ID, and I started using the facets feature for this. However, I am unable to retrieve more than 10 unique facet values, despite setting the count value to 20 in the query.
Below is the summary:
I am able to retrieve only 10 unique records even though more than 10 unique records are in the index.
When I modify the count property of the facet to 20, I still get only 10 records.
Can you please help me modify the query so that I get more than 10 records? Any help will be appreciated.
Default query:
$filter=(systemID ne null) and (ownerSalesforceRecordID eq 'a0h5B000000gJKfQAM')&facet=machineTagSystemID,sort:value&queryType=full
Default Results:
{"machineTagSystemID": [
{
"count": 9,
"value": "ABCS test machines-111-test - change|*1XA78RUGV23PVPN"
},
{
"count": 6,
"value": "Ajit Machine testing1jjcdxxxxxxxxxxxxxx|*1L693D439H5ZNG9"
},
{
"count": 19,
"value": "Anvesh test111dsaa|*13SSNP5AJ3L96C5"
},
{
"count": 3,
"value": "Dead End cross 2|*1NK7KNNLFVTM4QC"
},
{
"count": 3,
"value": "hehehe|*1NDC32TDNXT5RAH"
},
{
"count": 14,
"value": "high2 Machine12345678ppjk fvrf|*1T2F3VQEJ58ZLQL"
},
{
"count": 31,
"value": "prashant dev machine 213|*12L343TZTFGH3M6"
},
{
"count": 1,
"value": "ryansjcilaptop465986543|*1E2PG9V3BMEYDM7"
},
{
"count": 12,
"value": "snehali DEV June|*1QXEDL8E2V8MGBY"
},
{
"count": 27,
"value": "tarun Machine-dev|*1YRPHS3J7NGUVA8"
}
]}
Facet with count:
$filter=(systemID ne null) and (ownerSalesforceRecordID eq 'a0h5B000000gJKfQAM')&facet=machineTagSystemID,sort:value,count:20&queryType=full
But I get the same results:
{"machineTagSystemID": [
{
"count": 9,
"value": "ABCS test machines-111-test - change|*1XA78RUGV23PVPN"
},
{
"count": 6,
"value": "Ajit Machine testing1jjcdxxxxxxxxxxxxxx|*1L693D439H5ZNG9"
},
{
"count": 19,
"value": "Anvesh test111dsaa|*13SSNP5AJ3L96C5"
},
{
"count": 3,
"value": "Dead End cross 2|*1NK7KNNLFVTM4QC"
},
{
"count": 3,
"value": "hehehe|*1NDC32TDNXT5RAH"
},
{
"count": 14,
"value": "high2 Machine12345678ppjk fvrf|*1T2F3VQEJ58ZLQL"
},
{
"count": 30,
"value": "prashant dev machine 213|*12L343TZTFGH3M6"
},
{
"count": 1,
"value": "ryansjcilaptop465986543|*1E2PG9V3BMEYDM7"
},
{
"count": 12,
"value": "snehali DEV June|*1QXEDL8E2V8MGBY"
},
{
"count": 27,
"value": "tarun Machine-dev|*1YRPHS3J7NGUVA8"
}
]}
This is based on the documentation: https://learn.microsoft.com/en-us/azure/search/search-faceted-navigation
Facets honor the filter specified in a query. It's possible this is why you are only seeing 10 unique facet values for this field. Generally speaking, your query looks fine. If there were more than 10 unique values in this field for the query specified, I would expect them to show up.
How many total results are returned by this query? I see 125 total values in the facets you provided and I'm wondering if that count aligns with your results.
Mike
This is old but I hit the same issue - there is a default limit of 10 values returned in a facet. You can extend this to return more facet values by adding a count to a given facet, e.g.:
facet=Month,count:12&search=something
This can also be done in the C# SDK by just adding the count to the facet name:
var options = new SearchOptions();
options.Facets.Add("Month,count:12");
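And a hypothetical equivalent with the azure-search-documents Python SDK (endpoint, index name, and API key are placeholders):

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient(
    endpoint="https://<service>.search.windows.net",
    index_name="<index-name>",
    credential=AzureKeyCredential("<api-key>"),
)

# count:20 raises the per-facet value limit from the default of 10.
results = client.search(
    search_text="*",
    filter="(systemID ne null) and (ownerSalesforceRecordID eq 'a0h5B000000gJKfQAM')",
    facets=["machineTagSystemID,sort:value,count:20"],
)

for value in results.get_facets()["machineTagSystemID"]:
    print(value["value"], value["count"])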

Mongoose: query an array of objects within an array using lodash or underscore.js

Here is an example of my data:
[
{
"_id": "54ff1f21592a15378825aa33",
"timeline": "54fb49274e3e0c17271205d9",
"name": "Ade Idowu",
"first_name": "Ade",
"last_name": "Idowu",
"cohort": {
"name": "Class III",
"color": "#308cea"
},
"__v": 3,
"tasks": [
{
"_id": "54ff2eb0a6299969a08f8797",
"personName": "Ade Idowu",
"projectName": "yec",
"startDate": "2015-03-19T23:00:00.000Z",
"endDate": "2015-03-06T23:00:00.000Z",
}
]
},
{
"_id": "54ff1f21592a15378825aa33",
"timeline": "54fb49274e3e0c17271205d9",
"name": "Bola Idowu",
"first_name": "Bola",
"last_name": "Idowu",
"cohort": {
"name": "Class III",
"color": "#308cea"
},
"__v": 3,
"tasks": []
}
]
I want to query person.tasks to display the names of those whose tasks array is empty and those whose task endDate has expired. I would appreciate a solution using JavaScript, preferably lodash.
Use lodash's _.filter(collection, predicate):
Iterates over elements of collection, returning an array of all elements predicate returns truthy for.
The predicate checks whether the item's tasks array is empty, or whether any of its tasks has an endDate earlier than the current time; since tasks is an array, _.some is used to test its elements.
.getTime() returns the millisecond value of the date object so the two dates can be compared. This could be adjusted if you would like to filter by another granularity (e.g. hours or days).
var now = new Date();
var matches = _.filter(dataObject, function(item) {
    // keep people with no tasks, or with at least one task past its endDate
    return item.tasks.length === 0 || _.some(item.tasks, function(task) {
        return new Date(task.endDate).getTime() < now.getTime();
    });
});
var names = _.map(matches, 'name'); // just the names, as requested
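Underscore.js provides _.filter, _.some, and _.map with the same signatures, so the snippet above should work unchanged with either library.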
