Cloudant: group by and count the number of times a value appears

This is my first time working with a NoSQL DB, and I'm having trouble writing a query that counts how many times each value of one key appears, grouped by another key.
For instance if my DB contains
{
"person": "user1",
"status": "good"
},
{
"person": "user1",
"status": "good"
},
{
"person": "user1",
"status": "bad"
},
{
"person": "user2",
"status": "good"
}
I would like to know that user1 was good 2 times and bad 1 time, and that user2 was good only 1 time.
In SQL I would do
select person, status, count(*)
from mydb
group by person, status
or, to get it for one user in the DB,
select person, status, count(*)
from mydb
where person = 'user1'
group by person, status

You can achieve this with Cloudant's MapReduce views and suitably chosen query parameters. I created a view where the map is
function (doc) {
emit([doc.person, doc.status], null);
}
and the reduce is the built-in _count. That gives us an index where the key is a vector (a JSON array), and we can then group at different levels. Using group=true with group_level=2 gives us the desired result:
curl 'https://A.cloudant.com/D/_design/so/_view/by-status?group=true&group_level=2'
{
"rows": [
{
"key": [
"user1",
"bad"
],
"value": 1
},
{
"key": [
"user1",
"good"
],
"value": 2
},
{
"key": [
"user2",
"good"
],
"value": 1
}
]
}
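To see what the view computes, the map and the _count reduce can be imitated locally. The following is only a rough sketch in plain Python, not Cloudant code: the names map_doc and count_view are made up for illustration, and the sample documents are the ones from the question.

```python
from collections import Counter

# Sample documents from the question.
docs = [
    {"person": "user1", "status": "good"},
    {"person": "user1", "status": "good"},
    {"person": "user1", "status": "bad"},
    {"person": "user2", "status": "good"},
]


def map_doc(doc):
    """Hypothetical mirror of the map function: one (person, status) key per document."""
    yield (doc["person"], doc["status"])


def count_view(docs, group_level):
    """Imitate _count with group_level: truncate each key, then count identical keys."""
    counts = Counter(key[:group_level] for doc in docs for key in map_doc(doc))
    # A real view returns rows in collation order; sorted() is close enough for plain strings.
    return [{"key": list(key), "value": n} for key, n in sorted(counts.items())]


rows_by_person_status = count_view(docs, 2)
rows_by_person = count_view(docs, 1)
```

With group_level=1 the same index yields one total per person. To restrict the real view to a single person (the "where person = ..." case), a key range such as startkey=["user1"] and endkey=["user1",{}] is the usual approach.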

Related

Solr's Complex Query for Nested Documents is not Working

The DataSet I am working on Solr.
[
{
"id": "doc_1",
"name": "Harpreet Chaggar",
"_childDocuments_": [
{ "id": "child_doc_a", "number": 22,"created_at":"2020-03-20T00:00:00Z" },
{ "id": "child_doc_b", "number": 10, "created_at":"2021-05-28T00:00:00Z" }
]
},
{
"id": "doc_2",
"name": "Hardik Deshmukh",
"_childDocuments_": [
{ "id": "child_doc_1", "number": 67,"created_at":"2022-03-20T00:00:00Z" },
{ "id": "child_doc_2", "number": 78, "created_at":"2022-05-28T00:00:00Z" }
]
}
]
My objective is to write an exclusion query on a nested date field, combined with some parent conditions, and to get the parent document back in every case.
I am trying to fetch "id": "doc_2", "name": "Hardik Deshmukh" with the following query. Note: I need the parent document in return.
q = {!parent which='(name:("Hardik" OR "Harpreet") AND id:"doc_1")'}-created_at:[2020-01-17T00:00:00Z TO 2021-12-17T00:00:00Z]
But I am not getting any results.
To check that the date query works on its own, I executed the query below.
q = -created_at:[2020-01-17T00:00:00Z TO 2021-12-17T00:00:00Z]
It worked:
"response":{"numFound":4,"start":0,"numFoundExact":true,"docs":[
{
"id":"doc_1",
"name":["Harpreet Chaggar"],
"_version_":1746310602768252928},
{
"id":"child_doc_1",
"number":["67"],
"created_at":"2022-03-20T00:00:00Z",
"_version_":1746310602791321600},
{
"id":"child_doc_2",
"number":["78"],
"created_at":"2022-05-28T00:00:00Z",
"_version_":1746310602791321600},
{
"id":"doc_2",
"name":["Hardik Deshmukh"],
"_version_":1746310602791321600}]
}}
Field types:
For created_at
Field-Type:org.apache.solr.schema.DatePointField
For name
Field-Type:org.apache.solr.schema.TextField
And if I want to fetch "id": "doc_1", I am able to get it by executing the following query.
{!parent which='(name:("Hardik" OR "Harpreet") AND id:"doc_1")'} ( created_at:[2020-01-17T00:00:00Z TO 2021-12-17T00:00:00Z] )
It fetches the desired result.
"response":{"numFound":1,"start":0,"numFoundExact":true,"docs":[
{
"id":"doc_1",
"name":["Harpreet Chaggar"],
"_version_":1746310602768252928}]
}}

Is it possible to get key-value pairs from the Snowflake API instead of rowType?

I'm working with an API from Snowflake, and to deal with the JSON data I need to receive rows as key-value pairs instead of rowType.
I've been searching for a solution but haven't found one.
E.g., a table user with name and email attributes:
Name | Email
Kelly | kelly#email.com
Fisher | fisher#email.com
I would request this body:
{
"statement": "SELECT * FROM user",
"timeout": 60,
"database": "DEV",
"schema": "PLACE",
"warehouse": "WH",
"role": "DEV_READER",
"bindings": {
"1": {
"type": "FIXED",
"value": "123"
}
}
}
The results come back like this:
{
"resultSetMetaData": {
...
"rowType": [
{ "name": "Name",
...},
{ "name": "Email",
...}
],
},
"data": [
[
"Kelly",
"kelly#email.com"
],
[
"Fisher",
"fisher#email.com"
]
]
}
And the results needed would be:
{
"resultSetMetaData": {
...
"data": [
[
"Name":"Kelly",
"Email":"kelly#email.com"
],
[
"Name":"Fisher",
"Email":"fisher#email.com"
]
]
}
Thank you for any input.
The output you ask for is not valid JSON (an array cannot hold "key": value pairs directly), but the result can arrive in a slightly different format:
{
"resultSetMetaData": {
...
"data":
[
{
"Name": "Kelly",
"Email": "kelly#email.com"
},
{
"Name": "Fisher",
"Email": "fisher#email.com"
}
]
}
}
To get the API to send it that way, you can change the SQL from select * to:
select object_construct(*) as KVP from "USER";
You can also specify the names of the keys using:
select object_construct('NAME', "NAME", 'EMAIL', EMAIL) from "USER";
The object_construct function takes an arbitrary number of parameters, as long as the count is even (alternating keys and values), so:
object_construct('KEY1', VALUE1, 'KEY2', VALUE2, <'KEY_N'>, <VALUE_N>)
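If changing the SQL is not an option, the same reshaping can be done on the client. The sketch below assumes only the response shape shown in the question (column names under resultSetMetaData.rowType, rows under data); the function name rows_to_dicts is made up for illustration.

```python
def rows_to_dicts(response):
    """Pair each value in a row with the column name at the same position."""
    names = [col["name"] for col in response["resultSetMetaData"]["rowType"]]
    return [dict(zip(names, row)) for row in response["data"]]


# Trimmed-down copy of the response from the question.
response = {
    "resultSetMetaData": {
        "rowType": [{"name": "Name"}, {"name": "Email"}],
    },
    "data": [
        ["Kelly", "kelly#email.com"],
        ["Fisher", "fisher#email.com"],
    ],
}

records = rows_to_dicts(response)
```

This keeps the query untouched and works for any column list, at the cost of doing the conversion in every client.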

How to write a SQL query in CosmosDB for a JSON document which has nested/multiple arrays

I need to write a SQL query in the CosmosDB query editor that fetches results from JSON documents stored in a collection, as per my requirement shown below.
The example JSON
{
"id": "abcdabcd-1234-1234-1234-abcdabcdabcd",
"source": "Example",
"data": [
{
"Laptop": {
"New": "yes",
"Used": "no",
"backlight": "yes",
"warranty": "yes"
}
},
{
"Mobile": [
{
"order": 1,
"quantity": 2,
"price": 350,
"color": "Black",
"date": "07202019"
},
{
"order": 2,
"quantity": 1,
"price": 600,
"color": "White",
"date": "07202019"
}
]
},
{
"Accessories": [
{
"covers": "yes",
"cables": "few"
}
]
}
]
}
Requirement:
SELECT 'warranty' (Laptop), 'quantity' (Mobile), 'color' (Mobile), 'cables' (Accessories) for a specific 'date' (for eg: 07202019)
I've tried the following query
SELECT
c.data[0].Laptop.warranty,
c.data[1].Mobile[0].quantity,
c.data[1].Mobile[0].color,
c.data[2].Accessories[0].cables
FROM c
WHERE ARRAY_CONTAINS(c.data[1].Mobile, {date : '07202019'}, true)
Original Output from above query:
[
{
"warranty": "yes",
"quantity": 2,
"color": "Black",
"cables": "few"
}
]
But how can I get this Expected Output, that has all order details in the array 'Mobile':
[
{
"warranty": "yes",
"quantity": 2,
"color": "Black",
"cables": "few"
},
{
"warranty": "yes",
"quantity": 1,
"color": "White",
"cables": "few"
}
]
Since I hard-coded Mobile[0] in c.data[1].Mobile[0].quantity, only the first entry is returned, but I want every entry in the array to be listed.
Please consider using the JOIN operator in your SQL:
SELECT DISTINCT
c.data[0].Laptop.warranty,
mobile.quantity,
mobile.color,
c.data[2].Accessories[0].cables
FROM c
JOIN data in c.data
JOIN mobile in data.Mobile
WHERE ARRAY_CONTAINS(data.Mobile, {date : '07202019'}, true)
Updated answer:
Your SQL:
SELECT DISTINCT c.data[0].Laptop.warranty, mobile.quantity, mobile.color, accessories.cables FROM c
JOIN data in c.data JOIN mobile in data.Mobile
JOIN accessories in data.Accessories
WHERE ARRAY_CONTAINS(data.Mobile, {date : '07202019'}, true)
My advice:
A Cosmos DB JOIN is limited to the scope of a single document: you can join a parent object with its child objects within the same document, but cross-document joins are not supported. Your SQL, however, tries to perform multiple parallel joins. Accessories and Mobile sit side by side in different elements of data; neither is nested inside the other, so no single data element has both properties.
I suggest using a stored procedure to execute the two queries and then combine the results. Alternatively, you could do the same in application code.
Please see this related case: CosmosDB Join (SQL API)
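As a rough illustration of doing the combination in application code, the sketch below works on a document that has already been fetched. It is plain Python with no Cosmos DB SDK calls; the function name flatten_orders is made up, and it filters each Mobile entry on date individually, which is slightly stricter than the ARRAY_CONTAINS check above.

```python
# The example document from the question (id and source omitted).
doc = {
    "data": [
        {"Laptop": {"New": "yes", "Used": "no", "backlight": "yes", "warranty": "yes"}},
        {"Mobile": [
            {"order": 1, "quantity": 2, "price": 350, "color": "Black", "date": "07202019"},
            {"order": 2, "quantity": 1, "price": 600, "color": "White", "date": "07202019"},
        ]},
        {"Accessories": [{"covers": "yes", "cables": "few"}]},
    ]
}


def flatten_orders(doc, date):
    """Return one row per Mobile entry on the given date, with the shared fields repeated."""
    # Each element of "data" is an object with a single key; merge them into one lookup.
    sections = {}
    for element in doc["data"]:
        sections.update(element)
    warranty = sections["Laptop"]["warranty"]
    cables = sections["Accessories"][0]["cables"]
    return [
        {"warranty": warranty, "quantity": m["quantity"], "color": m["color"], "cables": cables}
        for m in sections["Mobile"]
        if m["date"] == date
    ]


rows = flatten_orders(doc, "07202019")
```

Looking sections up by name rather than by position (data[0], data[2]) also avoids depending on the order of the elements in data.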

How to do a NoSql linked query

I have a NoSQL (Cloudant) database.
- Within the database we have documents where one of the document fields represents the “table” (type of document).
- Within the documents we have fields that represent links to other documents within the database.
For example:
{_id: 111, table:main, user_id:222, field1:value1, other1_id: 333}
{_id: 222, table:user, first:john, other2_id: 444}
{_id: 333, table:other1, field2:value2}
{_id: 444, table:other2, field3:value3}
We want a way of searching for _id:111
and getting back one document with data from the linked tables:
{_id:111, user_id:222, field1:value1, other1_id: 333, first:john, other2_id: 444, field2:value2, field3:value3}
Is there a way to do this?
There is flexibility on the structure of how we store or get the data back—any suggestions on how to better structure the data to make this possible?
The first thing to say is that there are no joins in Cloudant. If your schema relies on lots of joining, then you're working against the grain of Cloudant, which may mean extra complication or performance hits for you.
There is a way to de-reference other documents' ids in a MapReduce view. Here's how it works:
create a MapReduce view to emit the main document's body and its linked document's ids in the form { _id: 'linkedid'}
query the view with include_docs=true to pull back the document AND the de-referenced ids in one go
In your case, a map function like this:
function(doc) {
if (doc.table === 'main') {
emit(doc._id, doc);
if (doc.user_id) {
emit(doc._id + ':user', { _id: doc.user_id });
}
}
}
would allow you to pull back the main document and its linked user document in one API call by hitting the GET /mydatabase/_design/mydesigndoc/_view/myview?startkey="111"&endkey="111z"&include_docs=true endpoint:
{
"total_rows": 2,
"offset": 0,
"rows": [
{
"id": "111",
"key": "111",
"value": {
"_id": "111",
"_rev": "1-5791203eaa68b4bd1ce930565c7b008e",
"table": "main",
"user_id": "222",
"field1": "value1",
"other1_id": "333"
},
"doc": {
"_id": "111",
"_rev": "1-5791203eaa68b4bd1ce930565c7b008e",
"table": "main",
"user_id": "222",
"field1": "value1",
"other1_id": "333"
}
},
{
"id": "111",
"key": "111:user",
"value": {
"_id": "222"
},
"doc": {
"_id": "222",
"_rev": "1-6a277581235ca01b11dfc0367e1fc8ca",
"table": "user",
"first": "john",
"other2_id": "444"
}
}
]
}
Notice how we get two rows back, the first is the main document body, the second the linked user.
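The two rows still have to be combined on the client to get the single document asked for. A minimal sketch of that step, in plain Python, assuming the response shape shown above; the function name merge_linked and the choice to drop _rev and table are illustrative assumptions, not Cloudant behaviour.

```python
def merge_linked(rows):
    """Fold the main document and its de-referenced documents into one dict."""
    merged = {}
    for row in rows:
        doc = row["doc"]
        is_main = row["key"] == row["id"]  # linked rows carry a suffixed key such as "111:user"
        for field, value in doc.items():
            if field in ("_rev", "table"):
                continue  # bookkeeping fields, not wanted in the combined result
            if field == "_id" and not is_main:
                continue  # keep only the main document's _id
            merged[field] = value
    return merged


# Rows from the response above, with the "value" members left out.
rows = [
    {"id": "111", "key": "111",
     "doc": {"_id": "111", "_rev": "1-5791203eaa68b4bd1ce930565c7b008e", "table": "main",
             "user_id": "222", "field1": "value1", "other1_id": "333"}},
    {"id": "111", "key": "111:user",
     "doc": {"_id": "222", "_rev": "1-6a277581235ca01b11dfc0367e1fc8ca", "table": "user",
             "first": "john", "other2_id": "444"}},
]

combined = merge_linked(rows)
```

Fields from other1 would need a further emit in the map function for other1_id. The other2 link lives on the user document rather than on the main one, so it is not reachable from this map function and would likely need a second request.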

MongoDB Array Query Performance

I'm trying to figure out the best schema for a dating-site-like app. Users have a listing (possibly many), and they can view other users' listings to 'like' and 'dislike' them.
Currently I'm just storing the other person's listing id in a likedBy and dislikedBy array. When a user 'likes' a listing, their listing id is put into that listing's likedBy array. However, I would now like to track the timestamp at which a user likes a listing. This would be used for a user's 'history list' or for data analysis.
I would need to do two separate queries:
find all active listings that this user has not liked or disliked before
and for a user's history of 'liked'/'disliked' choices
find all the listings user X has liked in chronological order
My current schema is:
listings
_id: 'sdf3f'
likedBy: ['12ac', 'as3vd', 'sadf3']
dislikedBy: ['asdf', 'sdsdf', 'asdfas']
active: bool
Could I do something like this?
listings
_id: 'sdf3f'
likedBy: [{'12ac', date: Date}, {'ds3d', date: Date}]
dislikedBy: [{'s12ac', date: Date}, {'6fs3d', date: Date}]
active: bool
I was also thinking of making a new collection for choices.
choices
Id
userId // id of current user making the choice
userlistId // listing of the user making the choice
listingChoseId // the listing they chose yes/no
type
date
I'm not sure of the performance implications of having these choices in another collection when doing the find all active listings that this user has not liked or disliked before.
Any insight would be greatly appreciated!
Well, you evidently thought it was a good idea to embed these in the "listings" documents so that the usage patterns you have in addition to the cases presented here work properly. With that in mind, there is no reason to throw that away.
To clarify though, the structure you seem to want is something like this:
{
"_id": "sdf3f",
"likedBy": [
{ "userId": "12ac", "date": ISODate("2014-04-09T07:30:47.091Z") },
{ "userId": "as3vd", "date": ISODate("2014-04-09T07:30:47.091Z") },
{ "userId": "sadf3", "date": ISODate("2014-04-09T07:30:47.091Z") }
],
"dislikedBy": [
{ "userId": "asdf", "date": ISODate("2014-04-09T07:30:47.091Z") },
{ "userId": "sdsdf", "date": ISODate("2014-04-09T07:30:47.091Z") },
{ "userId": "asdfas", "date": ISODate("2014-04-09T07:30:47.091Z") }
],
"active": true
}
Which is all well and fine, except for one catch. Because this content sits in two array fields, you would not be able to create a single index over both of them: only one array (multikey) field can be included in a compound index.
So to solve the obvious problem with your first query not being able to use an index, you would structure like this instead:
{
"_id": "sdf3f",
"votes": [
{
"userId": "12ac",
"type": "like",
"date": ISODate("2014-04-09T07:30:47.091Z")
},
{
"userId": "as3vd",
"type": "like",
"date": ISODate("2014-04-09T07:30:47.091Z")
},
{
"userId": "sadf3",
"type": "like",
"date": ISODate("2014-04-09T07:30:47.091Z")
},
{
"userId": "asdf",
"type": "dislike",
"date": ISODate("2014-04-09T07:30:47.091Z")
},
{
"userId": "sdsdf",
"type": "dislike",
"date": ISODate("2014-04-09T07:30:47.091Z")
},
{
"userId": "asdfas",
"type": "dislike",
"date": ISODate("2014-04-09T07:30:47.091Z")
}
],
"active": true
}
This allows an index that covers this form:
db.post.createIndex({
"active": 1,
"votes.userId": 1,
"votes.date": 1,
"votes.type": 1
})
Actually you will probably want a few indexes to suit your usage patterns, but the point is that you can now have indexes you can use.
Covering the first case you have this form of query:
db.post.find({ "active": true, "votes.userId": { "$ne": "12ac" } })
That makes sense, considering that a user is clearly not going to have both a like and a dislike on the same listing. Given the order of that index, at least active can be used to filter, because the negating condition has to scan everything else. There is no way around that with any structure.
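To make the semantics of that query concrete: on an array field, $ne matches a document only when no element of the array has the given value. The sketch below imitates that in plain Python on in-memory dictionaries. It is not MongoDB code, and the function name and sample listings are made up for illustration.

```python
def active_not_voted_by(listings, user_id):
    """Imitate { active: true, "votes.userId": { $ne: user_id } } on plain dicts."""
    return [
        listing
        for listing in listings
        if listing["active"]
        # No vote on this listing may belong to the user; a missing array also passes.
        and all(vote["userId"] != user_id for vote in listing.get("votes", []))
    ]


# Hypothetical sample data in the "votes" structure shown above (dates omitted).
listings = [
    {"_id": "sdf3f", "active": True,
     "votes": [{"userId": "12ac", "type": "like"}, {"userId": "asdf", "type": "dislike"}]},
    {"_id": "aaa11", "active": True,
     "votes": [{"userId": "asdf", "type": "like"}]},
    {"_id": "bbb22", "active": False, "votes": []},
    {"_id": "ccc33", "active": True, "votes": []},
]

unseen_ids = [listing["_id"] for listing in active_not_voted_by(listings, "12ac")]
```

Because like and dislike now share one array, a single condition covers both kinds of vote.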
For the other case you probably want an index with userId as the first element, ahead of the date. Then your query is quite simple:
db.post.find({ "votes.userId": "12ac" })
.sort({ "votes.userId": 1, "votes.date": 1 })
You may be wondering whether you have lost something: getting the count of "likes" and "dislikes" used to be as easy as testing the size of the array, and now it is a little different. That is nothing that cannot be solved with aggregate:
db.post.aggregate([
{ "$unwind": "$votes" },
{ "$group": {
"_id": {
"_id": "$_id",
"active": "$active"
},
"likes": { "$sum": { "$cond": [
{ "$eq": [ "$votes.type", "like" ] },
1,
0
]}},
"dislikes": { "$sum": { "$cond": [
{ "$eq": [ "$votes.type", "dislike" ] },
1,
0
]}}
}}
])
So whatever your actual usage, you can keep any important parts of the document in the grouping _id and then count the "likes" and "dislikes" easily.
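What the $unwind and $group stages compute can be checked against a small in-memory imitation. This is only a sketch in plain Python on made-up sample data, not a replacement for the pipeline; the function name count_votes is hypothetical.

```python
def count_votes(listings):
    """Imitate $unwind on votes followed by $group with conditional sums."""
    results = []
    for listing in listings:
        votes = listing.get("votes", [])
        if not votes:
            # By default $unwind drops documents whose array is empty or missing.
            continue
        results.append({
            "_id": {"_id": listing["_id"], "active": listing["active"]},
            "likes": sum(1 for vote in votes if vote["type"] == "like"),
            "dislikes": sum(1 for vote in votes if vote["type"] == "dislike"),
        })
    return results


# Hypothetical listings in the "votes" structure (dates omitted).
listings = [
    {"_id": "sdf3f", "active": True, "votes": [
        {"userId": "12ac", "type": "like"},
        {"userId": "as3vd", "type": "like"},
        {"userId": "sadf3", "type": "like"},
        {"userId": "asdf", "type": "dislike"},
    ]},
    {"_id": "empty1", "active": True, "votes": []},
]

totals = count_votes(listings)
```

Note that listings without any votes do not appear in the result at all, which mirrors the default behaviour of $unwind.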
You may also note that changing an entry from like to dislike can be done in a single atomic update.
There is much more you can do, but I would prefer this structure for the reasons as given.
