SolrNet/Solr - Large set of range queries causing 400 Bad Request

Running Solr on Tomcat 7 on Win 2008 Server.
I am looping through a number of variables and building a set of range queries, which results in a single query containing more than 500 clauses.
List<ISolrQuery> queryList = new List<ISolrQuery>();
// This is for var 1; I have 6 sets of vars like this...
for (int n = 0; n < N; n++)
{
queryList.Add(new SolrQueryByRange<double>("VAR1_" + n, val1[n] * lowerbound, val1[n] * upperBound));
}
//...var 2
for (int n = 0; n < N; n++)
{
queryList.Add(new SolrQueryByRange<double>("VAR2_" + n, val2[n] * lowerbound, val2[n] * upperBound));
}
//...var 3... and so on...
var results = solr.Query(new SolrMultipleCriteriaQuery(queryList.ToArray<ISolrQuery>(),"OR"), new QueryOptions
{
Rows = 100,
Fields = new[] { "FileName, ID,score" },
Facet = new FacetParameters
{
Queries = new[]
{
new SolrFacetFieldQuery("Extension"),
new SolrFacetFieldQuery("FileName"),
}
}
});
I am getting a 400 Bad Request back from Solr. The query works fine when I run just 1 var. I am assuming this is some boolean query limitation in Solr. I did raise maxBoolClauseCount (from 1024) to 9999, but the error persists.
Any ideas?

Could it be that the request is running into Jetty's default size limit for GET parameters?
Please refer to the answer to "Solr search query returning full head exception".
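To get a feel for whether a request-size limit is plausible here, the length of the URL-encoded query can be estimated. The sketch below is only an illustration: the clause format, the placeholder bounds 0.95 and 1.05, the value of 84 clauses per variable set, and the 8192-byte threshold are all assumptions, so check the actual limit configured in your servlet container. If the encoded query does exceed that limit, sending the query in a POST body is the usual way around it.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class RangeQueryLength {
    // Assumed limit in bytes; check the actual setting of your servlet container.
    static final int ASSUMED_LIMIT = 8192;

    // Builds "VAR1_0:[0.95 TO 1.05] OR VAR1_1:[0.95 TO 1.05] OR ..." for the
    // given number of variable sets and clauses per set (placeholder bounds).
    static String buildQuery(int sets, int perSet) {
        List<String> clauses = new ArrayList<>();
        for (int s = 1; s <= sets; s++) {
            for (int n = 0; n < perSet; n++) {
                clauses.add("VAR" + s + "_" + n + ":[0.95 TO 1.05]");
            }
        }
        return String.join(" OR ", clauses);
    }

    // Length of the query once URL-encoded for use as a GET parameter value.
    static int encodedLength(String query) {
        return URLEncoder.encode(query, StandardCharsets.UTF_8).length();
    }

    public static void main(String[] args) {
        int one = encodedLength(buildQuery(1, 84)); // a single variable set
        int six = encodedLength(buildQuery(6, 84)); // all six sets, 504 clauses
        System.out.println("1 set:  " + one + " chars, over limit: " + (one > ASSUMED_LIMIT));
        System.out.println("6 sets: " + six + " chars, over limit: " + (six > ASSUMED_LIMIT));
    }
}
```

With these assumed numbers a single set stays well under the threshold while six sets exceed it, which would match the symptom described in the question.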

How to fill in the fields of a list that is stored in a map

I have a map
Map<Id, List<ExpenseController.MonthRow>> monthsPerKeeper = new Map<Id, List<ExpenseController.MonthRow>>();
This is how it looks:
Right now I'm only showing the months ("monthNumber") that have data.
I need to add every missing "monthNumber", with the "amount" field for that month set to 0.
It should look like this:
I tried to get the value (that is, the List) from the map by Id,
but I don't know how to refer to the fields of the rows in this List in order to fill them in.
for (Id c : monthsPerKeeper.keySet()) {
if(monthsPerKeeper.containsKey(c)) {
for (ExpenseController.MonthRow a : monthsPerKeeper.get(c)) {
List<ExpenseController.MonthRow> monthsListAdd = new List<ExpenseController.MonthRow>();
for (Integer i = 1; i < 13; i++) {
if (!monthsPerKeeper.get(c).monthNumber.containsKey(i)) {
ExpenseController.MonthRow row = new ExpenseController.MonthRow();
row.monthName = ExpenseController.monthNumbers.get(i);
row.amount = 0;
row.monthNumber = i;
monthsListAdd.add(row);
}
}
}
}
}
So you have monthsListAdd with, say, 5 dummy months and want to add it to the original list of 7 real months? There are many ways to do it, seeing how the list in your screenshot is unsorted and the gaps (months with no expenses) can be anywhere. I'd probably reverse the whole thing: start with 12 placeholders with 0 amounts and just update them as I go through the list.
Anyway:
After the for (ExpenseController.MonthRow a : monthsPerKeeper.get(c)) { loop ends, add this:
monthsPerKeeper.get(c).addAll(monthsListAdd);
I still think you'll end up with the rows in the wrong order, so maybe read up on "implements Comparable" to sort your list of helper objects after they have all been added.
Edit:
To make it cleaner I'd use a Map where the key is the month number and the value is... well, if all you need is the amount, then Map<Integer, Decimal> would probably do. If you want to display more, probably Map<Integer, ExpenseController.MonthRow>. Or, you know, use the fact that a List could have month numbers as indexes ;)
Consider this query (it works in my sandbox and returns results, but with gaps, and that's OK):
SELECT CALENDAR_MONTH(CloseDate), SUM(Amount)
FROM Opportunity
WHERE CloseDate = THIS_YEAR AND IsWon = true AND Owner.Profile.Name = 'System Administrator' AND Amount != null
GROUP BY CALENDAR_MONTH(CloseDate)
It's not even sorted - but I don't care.
Map<Integer, Decimal> myMap = new Map<Integer, Decimal>{
1 => 0,
2 => 0,
3 => 0,
4 => 0,
5 => 0,
6 => 0,
7 => 0,
8 => 0,
9 => 0,
10 => 0,
11 => 0,
12 => 0
};
for(AggregateResult ar : [SELECT CALENDAR_MONTH(CloseDate) m, SUM(Amount) a
FROM Opportunity
WHERE CloseDate = THIS_YEAR AND IsWon = true AND Owner.Profile.Name = 'System Administrator' AND Amount != null
GROUP BY CALENDAR_MONTH(CloseDate)]){
myMap.put((Integer) ar.get('m'), (Decimal) ar.get('a'));
}
for(Integer i = 1; i < 13; ++i){
System.debug(i + ' ' + DateTime.newInstance(2022,i,1).format('MMMM') + ': ' + myMap.get(i));
// instead of debug you'd have your someList.add(new ExpenseController.MonthRow(...) or whatever
}
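The same "start with 12 zero placeholders, then overwrite the months that have data" idea can be sketched outside Apex as well. The Java version below is only an illustration of the pattern; the class name MonthFill, the method fillMonths and the sample amounts are hypothetical and are not part of the ExpenseController code from the question.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;

final class MonthFill {
    // Returns a map with an entry for every month 1..12, in calendar order.
    // Months missing from the sparse input get an amount of zero.
    static Map<Integer, BigDecimal> fillMonths(Map<Integer, BigDecimal> sparse) {
        Map<Integer, BigDecimal> full = new LinkedHashMap<>();
        for (int month = 1; month <= 12; month++) {
            full.put(month, sparse.getOrDefault(month, BigDecimal.ZERO));
        }
        return full;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: only March and July have expenses.
        Map<Integer, BigDecimal> sparse = Map.of(
                3, new BigDecimal("120.50"),
                7, new BigDecimal("40.00"));
        fillMonths(sparse).forEach((month, amount) ->
                System.out.println(month + ": " + amount));
    }
}
```

Because the result is built in month order, no separate sorting step is needed afterwards.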

Create array of "deep" struct (scalar) fields

How can I collapse the values of "deep" struct fields into arrays by just indexing?
In the example below, I can only do it for the "top-most" level, and for "deeper" levels I get the error:
"Expected one output from a curly brace or dot indexing expression, but there were XXX results."
The only workaround I found so far is to unfold the operation into several steps, but the deeper the structure the uglier this gets...
clc; clear variables;
% Dummy data
my_struc.points(1).fieldA = 100;
my_struc.points(2).fieldA = 200;
my_struc.points(3).fieldA = 300;
my_struc.points(1).fieldB.subfieldM = 10;
my_struc.points(2).fieldB.subfieldM = 20;
my_struc.points(3).fieldB.subfieldM = 30;
my_struc.points(1).fieldC.subfieldN.subsubfieldZ = 1;
my_struc.points(2).fieldC.subfieldN.subsubfieldZ = 2;
my_struc.points(3).fieldC.subfieldN.subsubfieldZ = 3;
my_struc.info = 'Note my_struc has other fields besides "points"';
% Get all fieldA values by just indexing (this works):
all_fieldA_values = [my_struc.points(:).fieldA]
% Get all subfieldM values by just indexing (doesn't work):
% all_subfieldM_values = [my_struc.points(:).fieldB.subfieldM]
% Ugly workaround:
temp_array_of_structs = [my_struc.points(:).fieldB];
all_subfieldM_values = [temp_array_of_structs.subfieldM]
% Get all subsubfieldZ values by just indexing (doesn't work):
% all_subsubfieldZ_values = [my_struc.points(:).fieldC.subfieldN.subsubfieldZ]
% Ugly workaround:
temp_array_of_structs1 = [my_struc.points(:).fieldC];
temp_array_of_structs2 = [temp_array_of_structs1.subfieldN];
all_subsubfieldZ_values = [temp_array_of_structs2.subsubfieldZ]
Output:
all_fieldA_values =
100 200 300
all_subfieldM_values =
10 20 30
all_subsubfieldZ_values =
1 2 3
Thanks for any help!
You can use arrayfun to access each individual 'point', and then access its data. This will return an array with the same dimensions as my_struc.points:
all_subfieldM_values = arrayfun(@(in) in.fieldB.subfieldM, my_struc.points)
all_subsubfieldZ_values = arrayfun(@(in) in.fieldC.subfieldN.subsubfieldZ, my_struc.points)
Not optimal, but at least it's one line.

Solr 6.0.0 - SolrCloud java example

I have Solr installed on my localhost.
I started the standard SolrCloud example with embedded ZooKeeper.
collection: gettingstarted
shards: 2
replication: 2
Processing 500 records/docs took 115 seconds (localhost testing).
Why does it take this much time to process just 500 records?
Is there a way to bring this down to milliseconds/nanoseconds?
NOTE:
I have tested the same against a Solr instance on a remote machine, with the client on localhost and the index on the remote Solr (commented out in the Java code).
I started my Solr myCloudData collection with an ensemble with a single ZooKeeper:
2 Solr nodes,
1 standalone ZooKeeper ensemble,
collection: myCloudData,
shards: 2,
replication: 2
SolrCloud Java code:
package com.test.solr.basic;
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
public class SolrjPopulatorCloudClient2 {
public static void main(String[] args) throws IOException,SolrServerException {
//String zkHosts = "64.101.49.57:2181/solr";
String zkHosts = "localhost:9983";
CloudSolrClient solrCloudClient = new CloudSolrClient(zkHosts, true);
//solrCloudClient.setDefaultCollection("myCloudData");
solrCloudClient.setDefaultCollection("gettingstarted");
/*
// Thread Safe
solrClient = new ConcurrentUpdateSolrClient(urlString, queueSize, threadCount);
*/
// Deprecated client
//HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
long start = System.nanoTime();
for (int i = 0; i < 500; ++i) {
SolrInputDocument doc = new SolrInputDocument();
doc.addField("cat", "book");
doc.addField("id", "book-" + i);
doc.addField("name", "The Legend of the Hobbit part " + i);
solrCloudClient.add(doc);
if (i % 100 == 0)
System.out.println(" Every 100 records flush it");
solrCloudClient.commit(); // periodically flush
}
solrCloudClient.commit();
solrCloudClient.close();
long end = System.nanoTime();
long seconds = TimeUnit.NANOSECONDS.toSeconds(end - start);
System.out.println(" All records are indexed, took " + seconds + " seconds");
}
}
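You are committing after every new document, which is not necessary. Because the if (i % 100 == 0) has no braces, only the println belongs to it, and commit() runs on every iteration. It will run a lot faster if you change the block to read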
if (i % 100 == 0) {
System.out.println(" Every 100 records flush it");
solrCloudClient.commit(); // periodically flush
}
On my machine, this indexes your 500 records in 14 seconds. If I remove the commit() call from the for loop, it indexes in 7 seconds.
Alternatively, you can add a commitWithinMs parameter to the solrCloudClient.add() call:
solrCloudClient.add(doc, 15000);
This will guarantee your records are committed within 15 seconds, and also increase your indexing speed.
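The effect of the missing braces can be checked without a Solr instance by counting how often the commit statement would run. This is a minimal sketch: the class CommitCount and its counters are hypothetical stand-ins for the println and commit() calls in the question's loop.

```java
final class CommitCount {
    // Same shape as the loop in the question: no braces after the if,
    // so only the first statement is conditional. The indentation is
    // deliberately misleading, as in the original.
    static int commitsWithoutBraces(int docs) {
        int commits = 0;
        int messages = 0;
        for (int i = 0; i < docs; ++i) {
            if (i % 100 == 0)
                messages++; // stands in for System.out.println(...)
                commits++;  // stands in for commit(): runs on every iteration
        }
        return commits;
    }

    // The corrected loop: the commit only happens on every 100th document.
    static int commitsWithBraces(int docs) {
        int commits = 0;
        for (int i = 0; i < docs; ++i) {
            if (i % 100 == 0) {
                commits++; // stands in for commit()
            }
        }
        return commits;
    }

    public static void main(String[] args) {
        System.out.println("without braces: " + commitsWithoutBraces(500)); // prints 500
        System.out.println("with braces:    " + commitsWithBraces(500));    // prints 5
    }
}
```

For 500 documents that is 500 commits in the original loop versus 5 in the corrected one, not counting the final commit after the loop.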

Text mining clustering analysis in R - Error: two-dimensional array

I'm trying to follow a document that has some code on text mining clustering analysis.
I'm fairly new to R and to the concept of text mining/clustering, so please bear with me if I sound illiterate.
I create a simple matrix called dtm and then run kmeans to produce 3 clusters. The code I'm having issues with is where a function is defined to get the "five most common words of the documents in the cluster":
dtm0.75 = as.matrix(dt0.75)
dim(dtm0.75)
kmeans.result = kmeans(dtm0.75, 3)
perClusterCounts = function(df, clusters, n)
{
v = sort(colSums(df[clusters == n, ]),
decreasing = TRUE)
d = data.frame(word = names(v), freq = v)
d[1:5, ]
}
perClusterCounts(dtm0.75, kmeans.result$cluster, 1)
Upon running this code I get the following error:
Error in colSums(df[clusters == n, ]) :
'x' must be an array of at least two dimensions
Could someone help me fix this please?
Thank you.
I can't reproduce your error; it works fine for me. Update your question with a reproducible example and you might get a more useful answer. Perhaps your input data object is empty: what do you get with dim(dtm0.75)?
Here it is working fine on the data that comes with the tm package:
library(tm)
data(crude)
dt0.75 <- DocumentTermMatrix(crude)
dtm0.75 = as.matrix(dt0.75)
dim(dtm0.75)
kmeans.result = kmeans(dtm0.75, 3)
perClusterCounts = function(df, clusters, n)
{
v = sort(colSums(df[clusters == n, ]),
decreasing = TRUE)
d = data.frame(word = names(v), freq = v)
d[1:5, ]
}
perClusterCounts(dtm0.75, kmeans.result$cluster, 1)
word freq
the the 69
and and 25
for for 12
government government 11
oil oil 10

MOM file creation (5 product limit)

Ok, I realize this is a very niche issue, but I'm hoping the process is straightforward enough...
I'm tasked with creating a data file out of Customer/Order information. Problem is, the datafile has a 5 product max limit.
Basically, I get my data, group by cust_id, create the file structure, within that loop, group by product_id, rewrite the fields in previous file_struct with new product info. That's worked all well and good until a user exceeded that max.
A brief example.. (keep in mind, the structure of the array is set by another process, this CANNOT change)
orderArray = ArrayNew(2);
set order = 1;
loop over cust_id;
field[order][1] = "field(1)"; // cust_id
field[order][2] = "field(2)"; // name
field[order][3] = "field(3)"; // phone
field[order][4] = ""; // product_1
field[order][5] = ""; // quantity_1
field[order][6] = ""; // product_2
field[order][7] = ""; // quantity_2
field[order][8] = ""; // product_3
field[order][9] = ""; // quantity_3
field[order][10] = ""; // product_4
field[order][11] = ""; // quantity_4
field[order][12] = ""; // product_5
field[order][13] = ""; // quantity_5
field[order][14] = "field(4)"; // trx_id
field[order][15] = "field(5)"; // total_cost
counter = 0;
loop over product_id
field[order][4+counter] = productCode;
field[order][5+counter] = quantity;
counter = counter + 2;
end inner loop;
order = order + 1;
end outer loop;
Like I said, this worked fine until I had a user who ordered more than 5 products.
What I basically want to do is check the number of products for each user and, if that number is greater than 5, start a new line in the text file, but I'm stuck on how to get there.
I've tried numerous fixes, but nothing gives the results I need.
I can send the entire file if it can help, but I don't want to post it all here.
You need to move the insertion of the header and footer fields (e.g. the cust_id and trx_id fields) into the product loop.
Here's a rough idea of one way you can go about this, based on the pseudocode you provided. I'm sure that there are more elegant ways that you could code this.
set order = 0;
loop over cust_id;
    counter = 1;
    order = order + 1;
    loop over product_id
        if (counter == 1 || counter == 6) {
            if (counter == 6) {
                // the current row is full: continue on a new row
                counter = 1;
                order = order + 1;
            }
            field[order][1] = "field(1)"; // cust_id
            field[order][2] = "field(2)"; // name
            field[order][3] = "field(3)"; // phone
        }
        field[order][2*counter+2] = productCode; // product_<counter>: columns 4, 6, ..., 12
        field[order][2*counter+3] = quantity; // quantity_<counter>: columns 5, 7, ..., 13
        counter = counter + 1;
        if (counter == 6) {
            field[order][14] = "field(4)"; // trx_id
            field[order][15] = "field(5)"; // total_cost
        }
    end inner loop;
    if (counter < 6) {
        // loop here to insert blank columns and the totals fields to fill out the partial row.
    }
end outer loop;
One thing does concern me. If you start a new line every five products, then your transaction id and total cost are going to be entered into the file more than once. You know the receiving system; it may be a non-issue.
Hope this helps
As you put the data into the row, you need to check whether there are more than 5 products and, if so, create an additional line.
loop over product_id
if (counter mod 10 == 0 and counter > 0) {
// create the new row, and mark it as a continuation of the previous order
counter = 0;
order = order + 1;
field[order][1] = "";
...
field[order][15] = "";
}
field[order][4+counter] = productCode;
field[order][5+counter] = quantity;
counter = counter + 2;
end inner loop;
I've actually done the export from an ecommerce system to MOM, but that code has since been lost. I have samples of code in classic ASP.
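The row-splitting idea from both answers can be sketched as a small chunking routine. This is a hedged illustration in Java, not the original export code: the class OrderRows, the buildRows signature and the sample values are hypothetical, the 15-column layout follows the question (indexes are 0-based here), and the sketch assumes the receiving system accepts trx_id and total_cost repeated on every continuation row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OrderRows {
    static final int MAX_PRODUCTS = 5; // product slots per row
    static final int COLUMNS = 15;     // fixed row width from the question

    // products: each entry is {productCode, quantity}.
    // Returns one row per group of up to 5 products; a customer with no
    // products yields no rows in this sketch.
    static List<String[]> buildRows(String custId, String name, String phone,
                                    String trxId, String totalCost,
                                    List<String[]> products) {
        List<String[]> rows = new ArrayList<>();
        for (int start = 0; start < products.size(); start += MAX_PRODUCTS) {
            String[] row = new String[COLUMNS];
            Arrays.fill(row, ""); // unused product slots stay blank
            row[0] = custId;
            row[1] = name;
            row[2] = phone;
            int end = Math.min(start + MAX_PRODUCTS, products.size());
            for (int p = start; p < end; p++) {
                int slot = p - start; // 0..4 within the current row
                row[3 + 2 * slot] = products.get(p)[0]; // product code
                row[4 + 2 * slot] = products.get(p)[1]; // quantity
            }
            row[13] = trxId;     // repeated on every row (assumption)
            row[14] = totalCost; // repeated on every row (assumption)
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> products = new ArrayList<>();
        for (int i = 1; i <= 7; i++) {
            products.add(new String[] {"P" + i, "1"});
        }
        for (String[] row : buildRows("C1", "Jane", "555-0100", "T1", "70.00", products)) {
            System.out.println(String.join("|", row));
        }
    }
}
```

With seven products this produces two rows: the first holds products 1 to 5 and the second holds products 6 and 7 followed by blank slots.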
