Solr count group by

I would like to "convert" a SQL query into a Solr command.
I have two SQL tables, "jobs" and "companies".
Currently, to count the number of jobs published by each company, I run this query:
select c.id, count(j.id)
from companies c
left join jobs j on j.client_id = c.id
group by c.id;
On the other hand, I have two Solr collections, "jobs" and "companies", with the same fields.
How can I "convert" the query above into Solr?
I saw it's possible to make a join with a parent collection, but I don't want to create a hierarchy between job and company (it doesn't make sense).
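To pin down the numbers a Solr version has to reproduce, here is a minimal sketch that runs the query above against an in-memory SQLite database from Python. The rows are hypothetical, and SQLite only stands in for whatever database the question actually uses.

```python
import sqlite3

# In-memory stand-ins for the "companies" and "jobs" tables (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (id INTEGER PRIMARY KEY);
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, client_id INTEGER);
    INSERT INTO companies (id) VALUES (1), (2), (3);
    INSERT INTO jobs (id, client_id) VALUES (10, 1), (11, 1), (12, 2);
""")

# Same query as in the question: COUNT(j.id) ignores the NULLs produced by the
# LEFT JOIN, so a company with no jobs gets 0 rather than 1.
rows = conn.execute("""
    SELECT c.id, COUNT(j.id)
    FROM companies c
    LEFT JOIN jobs j ON j.client_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()
print(rows)  # [(1, 2), (2, 1), (3, 0)]
```

Company 3 has no jobs and still appears with a count of 0; that is the "left join part" any Solr-based answer has to handle.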

You can't translate the query directly, because Solr doesn't support joins the way SQL does. A possible solution is to facet on client_id in the jobs collection, which gives you a count for each company id, and then look up values from that result when you display the counts for each company (if you really need the left join part). If a company's id doesn't appear in the facet result, its count is 0. Something like facet=true&facet.field=client_id should work.
If you only need counts for companies that have jobs in the index, I'd index the company name together with the id in each job document, so that you can facet on the name directly.
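A minimal sketch of the facet approach in Python, assuming the jobs collection has a client_id field whose facet values come back as strings. It builds the request parameters and then does the "left join" on the client, defaulting to 0 for companies without jobs. rows=0 skips returning documents, and facet.limit=-1 matters because Solr returns at most 100 facet values per field by default. The helper counts_per_company and the response below are hypothetical; no Solr call is made.

```python
from urllib.parse import urlencode

# Facet request against the "jobs" collection.
# rows=0: return no documents, only facet counts.
# facet.limit=-1: lift the default cap of 100 facet values.
params = {
    "q": "*:*",
    "rows": 0,
    "facet": "true",
    "facet.field": "client_id",
    "facet.limit": -1,
}
query_string = urlencode(params)


def counts_per_company(company_ids, solr_response):
    """Hypothetical helper: client-side 'left join' of company ids with facet counts."""
    # With the default json.nl=flat, facet_fields holds [value, count, value, count, ...].
    flat = solr_response["facet_counts"]["facet_fields"]["client_id"]
    counts = dict(zip(flat[0::2], flat[1::2]))
    # Companies missing from the facet result have no jobs, so they get 0.
    return {cid: counts.get(cid, 0) for cid in company_ids}


# Hypothetical response, shaped like Solr's JSON output for the request above.
fake_response = {"facet_counts": {"facet_fields": {"client_id": ["1", 2, "2", 1]}}}
print(counts_per_company(["1", "2", "3"], fake_response))  # {'1': 2, '2': 1, '3': 0}
```

The company ids would come from the companies collection (or wherever the list of companies lives); only the jobs side needs faceting.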

Related

How to perform a LEFT JOIN in SOQL

I have a query in SQL that I want to convert to SOQL.
I know that a LEFT JOIN is not possible in SOQL, but I don't know how to write this in SOQL.
I want to find Cases that have no Decision__c. There is a lookup relationship between Case (Id) and Decision__c (Case__c).
In SQL, that would be:
Select Case.Id
FROM Case
LEFT JOIN Decision__c D
on D.Case__c = Case.Id
WHERE D.Case__c IS NULL
I exported all Cases (Case) and all Decisions (Decision__c) to Excel and used a VLOOKUP to connect each Case with its Decision; an error meant there was no linked Decision.
I also loaded the objects into Power Query and performed a left join to merge the two queries. The Cases with no Decision were then easy to filter (null value).
So I got my list of Cases without Decision, but I want to know if I can get this list with a SOQL query, instead of these extra steps.
Put simply, you select the Cases that have no Decision__c, so the query should look like this:
SELECT Id FROM Case WHERE Id NOT IN (SELECT Case__c FROM Decision__c)
Although SOQL doesn't support JOINs, you can use subqueries (semi-joins and anti-joins) to filter records.
Refer to the following link:
https://developer.salesforce.com/docs/atlas.en-us.soql_sosl.meta/soql_sosl/sforce_api_calls_soql_select.htm
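For comparison, the spreadsheet workflow in the question is the same anti-join done by hand. A minimal Python sketch over hypothetical exported Ids (it does not call any Salesforce API):

```python
# Hypothetical exported data: Case Ids, and the Case__c lookup value of each Decision__c.
case_ids = ["case-1", "case-2", "case-3"]
decision_case_refs = ["case-1", "case-1", "case-3"]

# Anti-join, the same idea as "Id NOT IN (SELECT Case__c FROM Decision__c)":
# keep the Cases whose Id never appears as a Case__c value.
linked = set(decision_case_refs)
cases_without_decision = [cid for cid in case_ids if cid not in linked]
print(cases_without_decision)  # ['case-2']
```

The set lookup plays the role of the VLOOKUP; the SOQL subquery does the same filtering on the server, so no export is needed.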

SQL Server Performance: LEFT JOIN vs NOT IN

The output of these two queries is the same. I'm wondering whether there's a performance difference between them, if the requirement is to list all the customers who have not purchased anything. The database is the Northwind sample database, and I'm using T-SQL.
select companyname
from Customers c
left join Orders o on c.customerid = o.customerid
where o.OrderID is null;
select companyname
from Customers c
where Customerid not in (select customerid from orders);
If you want to find out empirically, I'd try them on a database with one customer who placed 1,000,000 orders.
And even then, keep in mind that the results you see are valid only for the particular optimiser you're using (which comes with a particular version of a particular DBMS) and for the particular physical design you're using (different sets of indexes, or different detailed properties of an index, might yield different performance characteristics).
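Following that suggestion, a minimal timing harness might look like this. It assumes SQLite (via Python's sqlite3) as the DBMS, a hypothetical two-customer schema with an index on Orders.CustomerID, and 100,000 orders rather than 1,000,000 to keep it quick. As the caveat above says, whatever timings it prints apply only to SQLite with this physical design, not to SQL Server.

```python
import sqlite3
import time


def build_db(n_orders):
    """Hypothetical Northwind-like schema: one busy customer, one with no orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT);
        CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT);
        CREATE INDEX ix_orders_customer ON Orders (CustomerID);
        INSERT INTO Customers VALUES ('BUSY', 'Busy Ltd'), ('IDLE', 'Idle Ltd');
    """)
    # All orders belong to the same customer, as suggested above.
    conn.executemany("INSERT INTO Orders (CustomerID) VALUES (?)",
                     (("BUSY",) for _ in range(n_orders)))
    return conn


LEFT_JOIN = """SELECT c.CompanyName FROM Customers c
               LEFT JOIN Orders o ON c.CustomerID = o.CustomerID
               WHERE o.OrderID IS NULL"""
NOT_IN = """SELECT c.CompanyName FROM Customers c
            WHERE c.CustomerID NOT IN (SELECT CustomerID FROM Orders)"""


def time_query(conn, sql, repeat=5):
    """Run sql several times; return its rows and the best wall-clock time."""
    best = float("inf")
    rows = []
    for _ in range(repeat):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return rows, best


conn = build_db(100_000)  # raise to 1_000_000 to match the suggestion above
for name, sql in [("LEFT JOIN", LEFT_JOIN), ("NOT IN", NOT_IN)]:
    rows, seconds = time_query(conn, sql)
    print(f"{name}: {rows} in {seconds:.4f}s")
```

Taking the best of several runs reduces noise from caching; checking the query plan (EXPLAIN QUERY PLAN in SQLite, the execution plan in SQL Server) tells you more than the timings alone.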
The second is potentially faster if the tables are indexed: if Orders has an index on CustomerID, NOT IN means you aren't bringing back the entire Orders table.
But as Erwin said, a lot depends on how things are set up. I'd tend to go for the second option, as I don't like bringing in tables unless I need data from them.
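One caveat before choosing NOT IN: the two queries are only equivalent while Orders.CustomerID contains no NULLs. Under standard SQL three-valued logic, x NOT IN (subquery) can never be true once the subquery returns a NULL, so the NOT IN version silently returns no rows; SQL Server behaves the same way with its default ANSI_NULLS setting. The sketch uses SQLite via Python with hypothetical data, and adds NOT EXISTS, a common NULL-safe alternative, for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT);
    INSERT INTO Customers VALUES ('A', 'Alpha'), ('B', 'Beta');
    -- One order for A, plus one order whose CustomerID is NULL.
    INSERT INTO Orders (CustomerID) VALUES ('A'), (NULL);
""")

queries = {
    "LEFT JOIN / IS NULL": """SELECT c.CompanyName FROM Customers c
        LEFT JOIN Orders o ON c.CustomerID = o.CustomerID
        WHERE o.OrderID IS NULL""",
    "NOT IN": """SELECT c.CompanyName FROM Customers c
        WHERE c.CustomerID NOT IN (SELECT CustomerID FROM Orders)""",
    "NOT EXISTS": """SELECT c.CompanyName FROM Customers c
        WHERE NOT EXISTS (SELECT 1 FROM Orders o WHERE o.CustomerID = c.CustomerID)""",
}
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
# LEFT JOIN / IS NULL [('Beta',)]
# NOT IN []
# NOT EXISTS [('Beta',)]
```

Beta has no orders, yet NOT IN drops it: 'B' <> NULL is unknown, so the whole NOT IN predicate is unknown rather than true.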

Summary reports using solr

I need to build reports using Solr (even though Solr is a search tool). Is it possible to get results equivalent to the following queries from Solr using stats, group by and pivot, or do I need a NoSQL store such as MongoDB?
select field1,field2,count(*) from TABLE1 group by field1,field2
select field1,max(field2),min(field2),count(field1),max(field3),sum(field4) from TABLE1 group by field1
I can get per-group stats when grouping by a single field, but I can't achieve the same when I group by more than one field.
Thanks in advance
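One possible direction, sketched under assumptions: in a Solr version that has the JSON Facet API (the json.facet request parameter), nested terms facets can play the role of GROUP BY field1, field2, and aggregation functions such as max(), min() and sum() can cover the second query. The field names come from the SQL; treating field2, field3 and field4 as numeric, the facet labels (by_field1, max_field2 and so on) and the helper functions are hypothetical. The code only builds request parameters and flattens a hypothetical response; it does not contact Solr.

```python
import json

# GROUP BY field1, field2 with COUNT(*): nested terms facets; every bucket has a "count".
by_two_fields = {
    "by_field1": {
        "type": "terms", "field": "field1", "limit": -1,
        "facet": {
            "by_field2": {"type": "terms", "field": "field2", "limit": -1},
        },
    },
}

# GROUP BY field1 with aggregates: one terms facet plus aggregation functions.
# The bucket's own "count" plays the role of count(field1).
by_one_field_stats = {
    "by_field1": {
        "type": "terms", "field": "field1", "limit": -1,
        "facet": {
            "max_field2": "max(field2)",
            "min_field2": "min(field2)",
            "max_field3": "max(field3)",
            "sum_field4": "sum(field4)",
        },
    },
}


def request_params(facet):
    """Hypothetical helper: parameters for a facet-only request (no documents)."""
    return {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}


def flatten_two_level(response):
    """Hypothetical helper: nested buckets -> (field1, field2, count) rows, like the SQL."""
    rows = []
    for outer in response["facets"]["by_field1"]["buckets"]:
        for inner in outer["by_field2"]["buckets"]:
            rows.append((outer["val"], inner["val"], inner["count"]))
    return rows


# Hypothetical response, shaped like JSON Facet output for by_two_fields.
fake = {"facets": {"count": 3, "by_field1": {"buckets": [
    {"val": "a", "count": 3, "by_field2": {"buckets": [
        {"val": 1, "count": 2}, {"val": 2, "count": 1}]}},
]}}}
print(flatten_two_level(fake))  # [('a', 1, 2), ('a', 2, 1)]
```

For the counts alone, classic pivot faceting (facet.pivot=field1,field2) is another option; limit=-1 asks for all buckets instead of the default top ones.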

Solr join returns zero results

I am using Solr 6.4.2. I have defined 2 cores:
companies, with the fields 'Id, Town, Name, Type, ManagerId'
users, with the fields 'Id, Login, ManagerId, Email'
In the users core, the field ManagerId expresses a parent-child relation (ManagerId->Id).
Companies and users are related by companies.ManagerId->users.Id
I am trying to build a very simple join query:
{!join from=ManagerId to=Id fromIndex=users}Login:someuser1
url looks like:
?q=*:*&fq={!join%20from=ManagerId%20to=Id%20fromIndex=users}Login:someuser1
Nothing works; I always get zero results. I just want to understand how Solr joins work. It seems to me that Solr joins and SQL joins work quite differently.
What I actually want are queries like:
get all docs from users by company Types
get companies by user managers
Right now I always get zero results no matter how I write the join query.
Try this on the companies core (assuming you want to get all companies run by a certain manager login):
Login:someuser1 is the filter condition you put on the child core (users); it should be the login of the manager you are looking for.
from must be the id field on the child core, i.e. from=Id; from=ManagerId is wrong.
to must be the field on the parent core (companies) that relates to the child core, i.e. to=ManagerId; to=Id is wrong.
fromIndex is the child core (users); this is correct.
{!join from=Id to=ManagerId fromIndex=users}Login:someuser1
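As a rough illustration of why the direction of from and to matters, the sketch below models the join in plain Python: collect the from field of the fromIndex documents that match the query, then keep the current core's documents whose to field holds one of those values. The documents and the join helper are hypothetical, and this is a simplified model, not how Solr executes joins internally.

```python
# Hypothetical documents mirroring the two cores in the question.
users = [
    {"Id": "u1", "Login": "someuser1", "ManagerId": "u0", "Email": "a@example.com"},
    {"Id": "u2", "Login": "someuser2", "ManagerId": "u1", "Email": "b@example.com"},
]
companies = [
    {"Id": "c1", "Name": "Acme", "ManagerId": "u1"},
    {"Id": "c2", "Name": "Globex", "ManagerId": "u2"},
]


def join(from_docs, from_field, to_docs, to_field, predicate):
    """Simplified model of {!join from=... to=... fromIndex=...}query."""
    # Step 1: run the query on the fromIndex docs and collect their from_field values.
    keys = {d[from_field] for d in from_docs if predicate(d) and from_field in d}
    # Step 2: return the current core's docs whose to_field matches one of them.
    return [d for d in to_docs if d.get(to_field) in keys]


def is_someuser1(doc):
    return doc.get("Login") == "someuser1"


# from=ManagerId to=Id, run on companies: collects "u0" and matches company Ids.
print(join(users, "ManagerId", companies, "Id", is_someuser1))  # []
# from=Id to=ManagerId: collects "u1" and matches company ManagerId.
print([c["Name"] for c in join(users, "Id", companies, "ManagerId", is_someuser1)])  # ['Acme']
```

The first form finds the manager of someuser1 and then looks for a company whose Id equals that user id, which is why it returns nothing.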

Query Database Vs Query Datatable

I have a situation where I want to return a count of members in a database by category. I have 6 categories in all and approximately 15,000 members.
Is it better to query the database 6 separate times using something like "select count(*)", or to return all records (only the category column) and then query the resulting DataTable for each of the 6 categories to get a count?
The second method limits the database queries to one, but returns more data, which has to be processed further.
The first method queries the database six times, but returns less data and needs no further processing.
I guess what I'm asking is: is the database engine quicker, or is .NET? I'm using SQL Server 2008 with .NET 4.
Are there any best practices, or reasons people know of, why I should use one method over the other?
Thanks
As I understand it, you only need Category and Count, so you can do it with a single query as follows:
SELECT CATEGORY, COUNT(CATEGORY) TOTAL_COUNT
FROM TABLE
GROUP BY CATEGORY
It doesn't seem like a good idea to query the DB 6 times when you have GROUP BY, not to mention if you need to join the category to a category table. It also has the drawback that you'll have to either hardcode the categories or query the tables JUST to get the categories (if there is no separate category lookup table). If you query the database to get the 6 categories dynamically, how would you do it? With a SELECT DISTINCT? With a GROUP BY?
In any case, just getting the categories present across all rows is a heavy query. So, if you're going to perform a heavy query, at least do it in the simplest way:
select category, count(*) CategoryCount from table
group by category
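A minimal sketch comparing the three approaches discussed, using SQLite via Python's sqlite3 as a stand-in for SQL Server and .NET, with a hypothetical members table (15,000 rows, 6 categories). It doesn't measure speed; it only shows that all three produce the same counts and how many queries each needs.

```python
import sqlite3
from collections import Counter

# Hypothetical members table: 15,000 rows spread evenly over 6 categories.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, category TEXT)")
conn.executemany("INSERT INTO members (category) VALUES (?)",
                 ((f"cat{i % 6}",) for i in range(15_000)))

# One query: the database does the grouping and returns 6 rows.
grouped = dict(conn.execute(
    "SELECT category, COUNT(*) FROM members GROUP BY category").fetchall())

# Six COUNT(*) queries, plus one more just to discover the categories.
categories = [row[0] for row in conn.execute("SELECT DISTINCT category FROM members")]
per_category = {
    c: conn.execute("SELECT COUNT(*) FROM members WHERE category = ?", (c,)).fetchone()[0]
    for c in categories
}

# One query that ships all 15,000 category values to the client, counted in memory.
client_side = Counter(row[0] for row in conn.execute("SELECT category FROM members"))

# All three agree; the GROUP BY version needs one query and no client-side work.
assert grouped == per_category == dict(client_side)
print(grouped)
```

The per-category variant needs the extra SELECT DISTINCT to find the categories dynamically, which is exactly the overhead the answer above points out.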
