Solr 6.0.0 - SolrCloud Java example

I have Solr installed on my localhost.
I started the standard SolrCloud example with the embedded ZooKeeper.
collection: gettingstarted
shards: 2
replication: 2
Processing 500 records/docs took 115 seconds [localhost testing].
Why is it taking this much time to process just 500 records?
Is there a way to improve this to some milliseconds/nanoseconds?
NOTE:
I have tested the same against a remote Solr instance, with localhost indexing data onto the remote Solr [commented out in the Java code below].
I started my myCloudData collection as an ensemble with a single ZooKeeper:
2 Solr nodes,
1 standalone ZooKeeper ensemble
collection: myCloudData,
shards: 2,
replication: 2
SolrCloud Java code:
package com.test.solr.basic;

import java.io.IOException;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class SolrjPopulatorCloudClient2 {

    public static void main(String[] args) throws IOException, SolrServerException {
        //String zkHosts = "64.101.49.57:2181/solr";
        String zkHosts = "localhost:9983";
        CloudSolrClient solrCloudClient = new CloudSolrClient(zkHosts, true);
        //solrCloudClient.setDefaultCollection("myCloudData");
        solrCloudClient.setDefaultCollection("gettingstarted");
        /*
        // Thread safe
        solrClient = new ConcurrentUpdateSolrClient(urlString, queueSize, threadCount);
        */
        // Deprecated - client
        //HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        long start = System.nanoTime();
        for (int i = 0; i < 500; ++i) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("cat", "book");
            doc.addField("id", "book-" + i);
            doc.addField("name", "The Legend of the Hobbit part " + i);
            solrCloudClient.add(doc);
            if (i % 100 == 0)
                System.out.println(" Every 100 records flush it");
                solrCloudClient.commit(); // periodically flush
        }
        solrCloudClient.commit();
        solrCloudClient.close();
        long end = System.nanoTime();
        long seconds = TimeUnit.NANOSECONDS.toSeconds(end - start);
        System.out.println(" All records are indexed, took " + seconds + " seconds");
    }
}

You are committing after every new document, which is not necessary. It will run a lot faster if you change the if (i % 100 == 0) block to read:
if (i % 100 == 0) {
    System.out.println(" Every 100 records flush it");
    solrCloudClient.commit(); // periodically flush
}
On my machine, this indexes your 500 records in 14 seconds. If I remove the commit() call from the for loop, it indexes in 7 seconds.
Alternatively, you can add a commitWithinMs parameter to the solrCloudClient.add() call:
solrCloudClient.add(doc, 15000);
This will guarantee your records are committed within 15 seconds, and also increase your indexing speed.
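Going further (not part of the original answers): each add() call above is its own HTTP round trip, so if you can restructure the loop, batching documents into a single add() call usually helps even more. A minimal sketch against the same solrCloudClient, with an arbitrary batch size of 100 (imports of java.util.ArrayList and java.util.List assumed):
List<SolrInputDocument> batch = new ArrayList<>();
for (int i = 0; i < 500; ++i) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("cat", "book");
    doc.addField("id", "book-" + i);
    doc.addField("name", "The Legend of the Hobbit part " + i);
    batch.add(doc);
    if (batch.size() == 100) {
        solrCloudClient.add(batch, 15000); // one round trip; commit within 15s
        batch.clear();
    }
}
if (!batch.isEmpty()) {
    solrCloudClient.add(batch, 15000); // flush the remainder
}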

Related

My H2/C3P0/Hibernate setup does not seem to be preserving prepared statements?

I am finding my database is the bottleneck in my application, and as part of this it looks like prepared statements are not being reused.
For example, here is a method I use:
public static CoverImage findCoverImageBySource(Session session, String src)
{
    try
    {
        Query q = session.createQuery("from CoverImage t1 where t1.source=:source");
        q.setParameter("source", src, StandardBasicTypes.STRING);
        CoverImage result = (CoverImage) q.setMaxResults(1).uniqueResult();
        return result;
    }
    catch (Exception ex)
    {
        MainWindow.logger.log(Level.SEVERE, ex.getMessage(), ex);
    }
    return null;
}
But the YourKit profiler says:
com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeQuery() Count 511
com.mchange.v2.c3p0.impl.NewProxyConnection.prepareStatement() Count 511
and I assume the count for the prepareStatement() call should be lower, as it looks like we create a new prepared statement every time instead of reusing one.
https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html
I am using c3p0 connection pooling, which complicates things a little, but as I understand it I have it configured correctly:
public static Configuration getInitializedConfiguration()
{
    //See https://www.mchange.com/projects/c3p0/#hibernate-specific
    Configuration config = new Configuration();
    config.setProperty(Environment.DRIVER, "org.h2.Driver");
    config.setProperty(Environment.URL, "jdbc:h2:" + Db.DBFOLDER + "/" + Db.DBNAME + ";FILE_LOCK=SOCKET;MVCC=TRUE;DB_CLOSE_ON_EXIT=FALSE;CACHE_SIZE=50000");
    config.setProperty(Environment.DIALECT, "org.hibernate.dialect.H2Dialect");
    System.setProperty("h2.bindAddress", InetAddress.getLoopbackAddress().getHostAddress());
    config.setProperty("hibernate.connection.username", "jaikoz");
    config.setProperty("hibernate.connection.password", "jaikoz");
    config.setProperty("hibernate.c3p0.numHelperThreads", "10");
    config.setProperty("hibernate.c3p0.min_size", "1");
    //Consider that if we have lots of busy threads waiting on next stages we could possibly have a lot of active
    //connections.
    config.setProperty("hibernate.c3p0.max_size", "200");
    config.setProperty("hibernate.c3p0.max_statements", "5000");
    config.setProperty("hibernate.c3p0.timeout", "2000");
    config.setProperty("hibernate.c3p0.maxStatementsPerConnection", "50");
    config.setProperty("hibernate.c3p0.idle_test_period", "3000");
    config.setProperty("hibernate.c3p0.acquireRetryAttempts", "10");
    //Cancel any connection that is more than 30 minutes old.
    //config.setProperty("hibernate.c3p0.unreturnedConnectionTimeout","3000");
    //config.setProperty("hibernate.show_sql","true");
    //config.setProperty("org.hibernate.envers.audit_strategy", "org.hibernate.envers.strategy.ValidityAuditStrategy");
    //config.setProperty("hibernate.format_sql","true");
    config.setProperty("hibernate.generate_statistics", "true");
    //config.setProperty("hibernate.cache.region.factory_class", "org.hibernate.cache.ehcache.SingletonEhCacheRegionFactory");
    //config.setProperty("hibernate.cache.use_second_level_cache", "true");
    //config.setProperty("hibernate.cache.use_query_cache", "true");
    addEntitiesToConfig(config);
    return config;
}
Using H2 1.3.172, Hibernate 4.3.11 and the corresponding c3p0 for that Hibernate version.
With a reproducible test case we have:
HibernateStats
HibernateStatistics.getQueryExecutionCount() 28
HibernateStatistics.getEntityInsertCount() 119
HibernateStatistics.getEntityUpdateCount() 39
HibernateStatistics.getPrepareStatementCount() 189
Profiler method counts:
GooGooStatementCache.acquireStatement() 35
GooGooStatementCache.checkinStatement() 189
GooGooStatementCache.checkoutStatement() 189
NewProxyPreparedStatement.init() 189
I don't know what I should be counting as creation of a prepared statement rather than reuse of an existing prepared statement.
I also tried enabling c3p0 logging by adding a c3p0 logger and making it use the same log file in my LogProperties, but it had no effect:
String logFileName = Platform.getPlatformLogFolderInLogfileFormat() + "songkong_debug%u-%g.log";
FileHandler fe = new FileHandler(logFileName, LOG_SIZE_IN_BYTES, 10, true);
fe.setEncoding(StandardCharsets.UTF_8.name());
fe.setFormatter(new com.jthink.songkong.logging.LogFormatter());
fe.setLevel(Level.FINEST);
MainWindow.logger.addHandler(fe);
Logger c3p0Logger = Logger.getLogger("com.mchange.v2.c3p0");
c3p0Logger.setLevel(Level.FINEST);
c3p0Logger.addHandler(fe);
Now that I have eventually got c3p0-based logging working, I can confirm that the suggestion of @SteveWaldman is correct.
If you enable
public static Logger c3p0ConnectionLogger = Logger.getLogger("com.mchange.v2.c3p0.stmt");
c3p0ConnectionLogger.setLevel(Level.FINEST);
c3p0ConnectionLogger.setUseParentHandlers(false);
Then you get log output of the form
24/08/2019 10.20.12:BST:FINEST: com.mchange.v2.c3p0.stmt.DoubleMaxStatementCache ----> CACHE HIT
24/08/2019 10.20.12:BST:FINEST: checkoutStatement: com.mchange.v2.c3p0.stmt.DoubleMaxStatementCache stats -- total size: 347; checked out: 1; num connections: 13; num keys: 347
24/08/2019 10.20.12:BST:FINEST: checkinStatement(): com.mchange.v2.c3p0.stmt.DoubleMaxStatementCache stats -- total size: 347; checked out: 0; num connections: 13; num keys: 347
making it clear when you get a cache hit. When there is no cache hit you don't get the first line, but you do get the other two lines.
This is using c3p0 0.9.2.1.

How to make sure that a Flink job has finished executing and then perform some tasks

I want to perform some tasks after a Flink job is completed. I am not having any issues when I run the code in IntelliJ, but there are issues when I run the Flink jar from a shell script. I am using the lines below to make sure that execution of the Flink program is complete:
//start the execution
JobExecutionResult jobExecutionResult = environment.execute(" Started the execution ");
is_job_finished = jobExecutionResult.isJobExecutionResult();
I am not sure if the above check is correct or not.
Then I am using the above variable in the method below to perform some tasks:
if (print_mode && is_job_finished) {
    System.out.println(" \n \n -- System related variables -- \n");
    System.out.println(" Stream_join Window length = " + WindowLength_join__ms + " milliseconds");
    System.out.println(" Input rate for stream RR = " + input_rate_rr_S + " events/second");
    System.out.println("Stream RR Runtime = " + Stream_RR_RunTime_S + " seconds");
    System.out.println(" # raw events in stream RR = " + Total_Number_Of_Events_in_RR + "\n");
}
Any suggestions?
You can register a job listener on the execution environment.
For example:
env.registerJobListener(new JobListener {
  //Callback on job submission.
  override def onJobSubmitted(jobClient: JobClient, throwable: Throwable): Unit = {
    if (throwable == null) {
      log.info("SUBMIT SUCCESS")
    } else {
      log.info("FAIL")
    }
  }

  //Callback on job execution finished, successfully or unsuccessfully.
  override def onJobExecuted(jobExecutionResult: JobExecutionResult, throwable: Throwable): Unit = {
    if (throwable == null) {
      log.info("SUCCESS")
    } else {
      log.info("FAIL")
    }
  }
})
Register a JobListener to your StreamExecutionEnvironment.
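Since the question's code is Java, here is the same idea as a minimal Java sketch, assuming Flink 1.10 or later (where StreamExecutionEnvironment.registerJobListener and the org.apache.flink.core.execution.JobListener interface are available):
import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.core.execution.JobClient;
import org.apache.flink.core.execution.JobListener;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class JobListenerExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.registerJobListener(new JobListener() {
            @Override
            public void onJobSubmitted(JobClient jobClient, Throwable throwable) {
                // Callback on job submission.
                System.out.println(throwable == null ? "SUBMIT SUCCESS" : "SUBMIT FAIL");
            }

            @Override
            public void onJobExecuted(JobExecutionResult jobExecutionResult, Throwable throwable) {
                // Callback when execute() finishes; put your post-job tasks here.
                System.out.println(throwable == null ? "SUCCESS" : "FAIL");
            }
        });
        env.fromElements(1, 2, 3).print(); // trivial pipeline so execute() has work to do
        env.execute("listener demo");
    }
}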
JobListener works great if you are not using the SQL API.
If you use the SQL API, onJobExecuted will never be called. I have an idea you can refer to: the source is Kafka, and the sink can be of any type.
Let me explain it:
EndSign: a sentinel record that follows the last piece of data. When your Flink job has consumed it, the rest of the partition is empty.
Close logic:
When your Flink job processes an EndSign, the job needs to call a JobController, which increments a counter.
When the JobController counter equals the partition count, the JobController checks the consumer group lag to ensure the Flink job got all the data.
Now we know the job is finished; a sketch of such a controller follows below.
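To make the idea concrete, here is a minimal, hypothetical Java sketch of such a JobController. The class and method names are invented for illustration, and the consumer-group-lag check is stubbed out:
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: counts EndSign records seen per partition and decides
// when the whole job can be considered finished.
public class JobController {
    private final int partitionCount;
    private final AtomicInteger endSignsSeen = new AtomicInteger(0);

    public JobController(int partitionCount) {
        this.partitionCount = partitionCount;
    }

    /** Called by the Flink job each time it processes an EndSign record. */
    public void onEndSign() {
        if (endSignsSeen.incrementAndGet() == partitionCount) {
            // All partitions have drained past their sentinel record.
            if (consumerGroupLagIsZero()) {
                System.out.println("Job finished: all partitions drained, lag is zero");
                // ...trigger your post-job tasks here...
            }
        }
    }

    private boolean consumerGroupLagIsZero() {
        // Stub: in a real setup you would query Kafka (e.g. via its AdminClient)
        // for the consumer group's lag on every partition.
        return true;
    }
}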

Getting SocketTimeoutException after only four seconds?

I'm opening a URL and getting a SocketTimeoutException:
long now = System.currentTimeMillis();
try {
    URL url = new URL("https://example.com");
    InputStreamReader is = new InputStreamReader(url.openStream());
    ..
}
catch (SocketTimeoutException ex) {
    long diff = System.currentTimeMillis() - now;
    System.err.println("Timeout!: " + diff + "ms"); // ~4 seconds
}
java.net.SocketTimeoutException: Timeout while fetching URL: https://example.com
at com.google.appengine.api.urlfetch.URLFetchServiceImpl.convertApplicationException(URLFetchServiceImpl.java:142)
but the elapsed time is only 4 seconds. This code of mine hasn't changed since February, and the same goes for the "example.com" URL it's hitting (which is also under my control).
Could something have been changed at a lower level by the App Engine team to reduce the length of time before a timeout exception is thrown?
Thanks
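For reference, the standard java.net timeout setters can raise the deadline explicitly; on App Engine the URL Fetch bridge is generally expected to honor them as the fetch deadline, though that mapping is an assumption here. A minimal sketch:
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class TimeoutExample {
    public static void main(String[] args) throws Exception {
        URL url = new URL("https://example.com");
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(10_000); // 10s to establish the connection
        conn.setReadTimeout(10_000);    // 10s for the read to complete
        try (InputStreamReader is = new InputStreamReader(conn.getInputStream())) {
            // ...read the response...
        }
    }
}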

Is Execution/Min in SQL Server Management Studio's Expensive Queries the number of executions or just an estimate?

We are occasionally seeing a query come up as over 11,000 in Activity Monitor under Expensive Queries.
I see that in the code a query is being executed in a loop, which I realize is not the best approach (I didn't write it but might need to fix it).
I do not think the loop is creating 11,000 iterations, more like 20 at a time. So my question is: if code executes 20 queries in, say, 1/550 of a second, would that appear as 11,000 executions per minute? Or does Activity Monitor really mean the query is executed 11,000 times?
DataTable JobsDT = new DataTable();
DataTable oqDT = new DataTable();
DataTable ePickupDT = new DataTable();
DataTable upDT = new DataTable();

JobsDT = Q.SelectRecords("SELECT [Quote]... etc etc etc" + ((Filters.Length > 0) ? Filters : "") + ") ORDER BY " + SortBy + " " + SortDirection);
oqDT = ...;
ePickupDT = ...;
upDT = ...;

//Merge the datatables
DT.Merge(JobsDT);
DT.Merge(oqDT);
DT.Merge(ePickupDT);
DT.Merge(upDT);

//Build cart header
Cart += "<table id='CurrentOrdersDiv' style='font-family: Arial; font-size: small;' width='100%' cellpadding='2' cellspacing='0'>";

//Build cart body
for (int row = 0; row < DT.Rows.Count; row++)
{
    try { Adjustment = (Q.SelectRecords("SELECT [PriceAdjustment] FROM [Media] WHERE [PriceAdjustment] > 0 AND [Quote] = " + Convert.ToInt32(DT.Rows[row]["Quote"])).Rows.Count > 0) ? true : false; } catch { }
    //Create flags
}
I don't know this for sure, but I don't think Execution/Min "extrapolates". So in your example, if 20 queries occurred in a fraction of a second (and didn't occur again during that minute), I think Execution/Min would only average out to 20. You could check sys.dm_exec_query_stats.execution_count to see whether the 11,000 number makes sense.
Perhaps your code only executes around 20 queries per run, but is run very frequently?

SolrNet/Solr - Large set of range queries causing 400 bad request

Running Solr on Tomcat 7 on Win 2008 Server.
I am looping through a number of variables and creating a set of range queries to create a query containing more than 500 clauses.
List<ISolrQuery> queryList = new List<ISolrQuery>();

//This is for var 1; I have 6 sets of vars like this...
for (int n = 0; n < N; n++)
{
    queryList.Add(new SolrQueryByRange<double>("VAR1_" + n, val1[n] * lowerbound, val1[n] * upperBound));
}

//...var 2
for (int n = 0; n < N; n++)
{
    queryList.Add(new SolrQueryByRange<double>("VAR2_" + n, val2[n] * lowerbound, val2[n] * upperBound));
}

//...var 3... and so on...

var results = solr.Query(new SolrMultipleCriteriaQuery(queryList.ToArray<ISolrQuery>(), "OR"), new QueryOptions
{
    Rows = 100,
    Fields = new[] { "FileName, ID,score" },
    Facet = new FacetParameters
    {
        Queries = new[]
        {
            new SolrFacetFieldQuery("Extension"),
            new SolrFacetFieldQuery("FileName"),
        }
    }
});
I am getting a 400 Bad Request back from Solr. The query works fine when I run just one var. I am assuming this is some boolean query limitation in Solr. I did raise maxBooleanClauses (from 1024) to 9999, but the error persists.
Any ideas?
Could it be because it is running into the default GET parameter size limit of Jetty?
Please refer to this answer: Solr search query returning full head exception.
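The SolrNet-side fix will depend on its API, but the general remedy is to send the query as a POST so the clause list travels in the request body instead of the URL. For illustration, in SolrJ (the Java client used elsewhere on this page) that looks like the sketch below; the endpoint and query string are hypothetical:
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PostQueryExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; substitute your own host and collection.
        HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/myCollection").build();
        SolrQuery query = new SolrQuery("VAR1_0:[0.5 TO 1.5] OR VAR1_1:[0.5 TO 1.5]"); // ...hundreds of clauses
        // METHOD.POST puts the (potentially very long) query in the request body,
        // sidestepping the servlet container's GET URL length limit.
        QueryResponse rsp = client.query(query, SolrRequest.METHOD.POST);
        System.out.println("Found " + rsp.getResults().getNumFound() + " docs");
        client.close();
    }
}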
