How to count multiple fields grouped by another field in Solr

I have Solr documents like the ones below.

agentId : 100
emailDeliveredDate : 2018-02-08
emailSentDate : 2018-02-07

agentId : 100
emailSentDate : 2018-02-06

agentId : 101
emailDeliveredDate : 2018-02-08
emailSentDate : 2018-02-07
I need a result like below.

agentId : 100
emailDeliveredDate : 1
emailSentDate : 2

agentId : 101
emailDeliveredDate : 1
emailSentDate : 1
In MySQL it would be:
select agentId, count(emailDeliveredDate), count(emailSentDate) from <table> group by agentId;
I need help in solr for this.

I could not find a way to do this directly in Solr, so I used a facet with pivot, which gave me half of the result; I computed the other half in Java.
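For reference, Solr (5.3 and later) can compute both counts server-side with the JSON Facet API, avoiding the Java post-processing step. This is a sketch, not the pivot approach the poster used; the field names come from the question, while the facet names (`byAgent`, `delivered`, `sent`) are made up for illustration:

```python
import json

# A terms facet on agentId with two nested query sub-facets; each sub-facet's
# "count" is the number of documents in that bucket where the field exists,
# i.e. the non-null count per agent.
facet = {
    "byAgent": {
        "type": "terms",
        "field": "agentId",
        "facet": {
            "delivered": {"type": "query", "q": "emailDeliveredDate:[* TO *]"},
            "sent": {"type": "query", "q": "emailSentDate:[* TO *]"},
        },
    }
}
params = {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}
# An HTTP client would GET http://<solr>/<collection>/select with these
# params; each returned bucket carries delivered.count and sent.count.
```

Each bucket in the response then corresponds to one agentId, with the two counts alongside it.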

Related

CAST string as DATETIME

I have the following query :
SELECT '31/12/1999 00:00:00' AS ODSUpdatedate
I want to add a derived column ODSUpdatedate, but I am getting an error.
Change the expression to :
(DT_DATE) "31/12/1999 00:00:00"
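For comparison, the same day-first timestamp parsed outside SSIS needs the format spelled out explicitly; a quick Python illustration (not part of the original answer):

```python
from datetime import datetime

# "31/12/1999" is day-first (dd/MM/yyyy); an explicit format string
# avoids any locale ambiguity when converting the string to a date.
parsed = datetime.strptime("31/12/1999 00:00:00", "%d/%m/%Y %H:%M:%S")
# parsed == datetime(1999, 12, 31, 0, 0)
```

The SSIS `(DT_DATE)` cast above relies on the package locale interpreting the string the same way.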

Nested IF ELSE in a derived column

I have the following logic to store the date in BI_StartDate as below:
If UpdatedDate is not null, then BI_StartDate = UpdatedDate;
else BI_StartDate takes the EntryDate value; if EntryDate is null,
then BI_StartDate = CreatedDate;
and if CreatedDate is null, then BI_StartDate = GETDATE().
I am using a derived column as seen below:
ISNULL(UpdatedDateODS) ? EntryDateODS : (ISNULL(EntryDateODS) ? CreatedDateODS :
(ISNULL(CreatedDateODS) ? GETDATE() ))
I am getting this error:
The expression "ISNULL(UpdatedDateODS) ? EntryDateODS :
(ISNULL(EntryDateODS) ? CreatedDateODS :(ISNULL(CreatedDateODS) ?
GETDATE() ))" on "Derived Column.Outputs[Derived Column
Output].Columns[Derived Column 1]" is not valid.
You are looking for the first non-null value, which is a coalesce, and that doesn't exist in the SSIS Data Flow (Derived Column).
I'd suggest a very simple script component:
Row.BIStartDate = Row.UpdateDate ?? Row.EntryDate ?? Row.CreatedDate ?? DateTime.Now;
This is the Input Columns Screen:
This is the Inputs and Outputs:
And then you add the above code to Row Processing section:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
// Coalesce: take the first non-null date, falling back to the current time.
Row.BIStartDate = Row.UpdateDate ?? Row.EntryDate ?? Row.CreatedDate ?? DateTime.Now;
}
From a syntax perspective, the nested if-else condition is not written well: you have to make sure that all possible outputs have the same data type, and you did not provide the last "else" branch:
ISNULL(UpdatedDateODS) ? EntryDateODS : (ISNULL(EntryDateODS) ? CreatedDateODS :
(ISNULL(CreatedDateODS) ? GETDATE() : **<missing>** ))
From a logical perspective, the expression may throw an exception, since you use EntryDateODS when ISNULL(UpdatedDateODS) is true; you should check that a column is not null before using it. I suggest the following expression:
!ISNULL(UpdatedDateODS) ? UpdatedDateODS : (!ISNULL(EntryDateODS) ? EntryDateODS :
(!ISNULL(CreatedDateODS) ? CreatedDateODS : GETDATE() ))
As mentioned above, if UpdatedDateODS, EntryDateODS, CreatedDateODS and GETDATE() do not have the same data type, you should cast them to a unified data type, for example:
!ISNULL(UpdatedDateODS) ? (DT_DATE)UpdatedDateODS : (!ISNULL(EntryDateODS) ? (DT_DATE)EntryDateODS :
(!ISNULL(CreatedDateODS) ? (DT_DATE)CreatedDateODS : (DT_DATE)GETDATE() ))

MongoDb groupby count query

I have a MongoDB collection whose documents have fields like Id, EmployeeID, SiteID and EmployeeAddress.
An employee can be present at a site more than once.
I want a group-by query, along with a count, that gives a result set like
EmployeeID SiteID Count EmployeeAddress
i.e. how many times an employee is present at a site.
I am using this query but am not getting the desired data.
db.pnr_dashboard.aggregate(
[
    {
        "$group" : {
            "_id" : { "siteId" : "$siteId", "employeeId" : "$employeeId" },
            "count" : { "$sum" : 1 }
        }
    }
]
);
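One likely reason the output falls short of the desired result set: a `$group` stage only emits `_id` and the declared accumulators, so `EmployeeAddress` never appears. A sketch of the pipeline with a `$first` accumulator carrying one address per group, written here in pymongo form (`collection` stands in for `db.pnr_dashboard`; untested against the poster's data):

```python
# Hypothetical pymongo version of the pipeline; field names taken from the
# question's query.
pipeline = [
    {
        "$group": {
            "_id": {"siteId": "$siteId", "employeeId": "$employeeId"},
            "count": {"$sum": 1},
            # $first keeps one address per (site, employee) group so it
            # shows up in the result set alongside the count.
            "employeeAddress": {"$first": "$employeeAddress"},
        }
    }
]
# results = collection.aggregate(pipeline)
```

If the address can differ between an employee's visits, `$first` simply picks one of them; include it in the `_id` instead if each distinct address should form its own group.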

Solr demote all documents with condition

I want to demote all documents that have inv=0 (possible values from 0 to 1000) to the end of the result set. I also have other sorting options, such as name asc, as part of the query.
For example below are my solr documents
Doc1 : name=apple , Inv=2
Doc2 : name=ball , Inv=1
Doc3 : name=cat , Inv=0
Doc4 : name=dog , Inv=0
Doc5 : name=fish , Inv=4
Doc6 : name=Goat , Inv=5
I want to achieve the sorting below: push all documents with inv=0 down to the bottom and then apply "name asc" sorting.
Doc1
Doc2
Doc5
Doc6
Doc3
Doc4
My Solr request is like
bq: "(*:* AND -inv:"0")^999.0" & defType: "edismax"
Here 999 is the boost that I gave to demote results.
This boosting query works fine; it moves all documents with inv=0 down to the bottom.
But when I add &sort=name asc to the Solr query, it prioritizes "sort" over bq, and I see the results below with "name asc":
Doc1 : name=apple , Inv=2
Doc2 : name=ball , Inv=1
Doc3 : name=cat , Inv=0
Doc4 : name=dog , Inv=0
Doc5 : name=fish , Inv=4
Doc6 : name=Goat , Inv=5
Can anyone please help me out?
The easy solution: you can just sort by inv first, then by the other values. This requires that inv only holds true (1) or false (0) values, which I guess is not the case, so:
You can sort by a function query - and you can use the function if to return different values based on whether the value is set or not:
sort=if(inv, 1, 0) desc, name asc
If Solr fails to resolve inv by itself, you can use field(inv), but it shouldn't be necessary.
Another option is to use the function min to get either 1 or 0 for the sort field, depending on whether it's in inventory or not:
sort=min(1, inv) desc, name asc
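The effect of that two-key sort can be checked offline with plain Python on the sample documents from the question (this only models the sort, not Solr itself):

```python
docs = [
    {"name": "apple", "inv": 2},
    {"name": "ball", "inv": 1},
    {"name": "cat", "inv": 0},
    {"name": "dog", "inv": 0},
    {"name": "fish", "inv": 4},
    {"name": "Goat", "inv": 5},
]

# Primary key: min(1, inv) descending, so in-stock docs (key 1) come before
# inv=0 docs (key 0). Secondary key: name ascending, lower-cased here so
# "Goat" sorts among the lowercase names (Solr would handle case via the
# field's analysis or a lowercased copy field).
ranked = sorted(docs, key=lambda d: (-min(1, d["inv"]), d["name"].lower()))
names = [d["name"] for d in ranked]
# names == ["apple", "ball", "fish", "Goat", "cat", "dog"]
```

This reproduces the desired order from the question: Doc1, Doc2, Doc5, Doc6, then the inv=0 documents Doc3 and Doc4 at the bottom.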

NoSQL store re-orderable list of elements

I have a NoSQL setup as follows:
UserId : ContentId : OrderId
User1 : Content1 : 0
User1 : Content2 : 1
User2 : Content3 : 0
User2 : Content4 : 1
User2 : Content5 : 2
User2 : Content6 : 3
User2 : Content7 : 4
I get the list of User2's items sorted by order with
SELECT * FROM table WHERE UserId = 'User2' ORDER BY OrderId ASC
which results in
UserId : ContentId : OrderId
User2 : Content3 : 0
User2 : Content4 : 1
User2 : Content5 : 2
User2 : Content6 : 3
User2 : Content7 : 4
Great! Now I want to move an item so that the table looks like this:
UserId : ContentId : OrderId
User2 : Content3 : 0
User2 : Content6 : 3
User2 : Content4 : 1
User2 : Content5 : 2
User2 : Content7 : 4
So I move Content6 to after Content3 and before Content4. The drawback is that, to update the OrderId values, I have to update every row after Content3, resulting in multiple writes to the datastore.
What is a better way of doing this in a NoSQL database?
You can solve this with a more sophisticated scheme: create a big gap between the keys, so you can move an item in between two other keys without touching the rest.
After a while some gaps may run out, so the scheme occasionally has to normalize the table and even out the gaps between the keys again. This is a one-time procedure that is a bit heavier on the database; it can be done periodically, or on demand when you detect that you are running out of space.
So the original table would look like:
Before
UserId : ContentId : OrderId
User2 : Content3 : 0
User2 : Content4 : 1000
User2 : Content5 : 2000
User2 : Content6 : 3000
User2 : Content7 : 4000
After
UserId : ContentId : OrderId
User2 : Content3 : 0
User2 : Content6 : 500
User2 : Content4 : 1000
User2 : Content5 : 2000
User2 : Content7 : 4000
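The gap scheme above can be sketched in a few lines of Python (the gap size of 1000 mirrors the example tables; the function names are mine):

```python
GAP = 1000  # initial spacing between OrderIds, as in the example above

def order_between(lo, hi):
    """New OrderId strictly between lo and hi, or None if the gap is exhausted."""
    mid = (lo + hi) // 2
    return mid if lo < mid < hi else None

def renormalize(contents):
    """The occasional heavier pass: re-space every row's OrderId evenly again."""
    return [(content, i * GAP) for i, content in enumerate(contents)]

# Moving Content6 between Content3 (OrderId 0) and Content4 (OrderId 1000)
# costs a single write instead of rewriting every following row:
new_order = order_between(0, 1000)  # 500, matching the "After" table
# When order_between returns None the gap is full, and renormalize runs
# once over the user's rows before the move is retried.
```

The common case is one write per move; only the rare renormalization touches every row for a user.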
There is no problem with massive updates in a good NoSQL solution, as under the hood an update is an append to a file, commonly called a write-ahead log. For example, if you issue 1,000,000 updates and each of them is, say, 32 bytes in size, that amounts to writing just 32 MB to a file, which even magnetic disks can do in less than 1 second. Moreover, if all of those updates are in one transaction, this should be exactly one write/writev syscall with a large buffer.
