What is the cause of [alert] HTTP: unable to determine chunk size - benchmarking

I'm getting this error with siege, and I'm not sure what's causing it:
[alert] HTTP: unable to determine chunk size
It seems to occur randomly, but when it does happen, it occurs multiple times in a row.
Would it impact the results of my benchmarking at all?

Since I have the same error and siege is a load-testing tool:
I assume you got blocked (or throttled) by the server due to excessive requests, which would leave responses truncated so that siege can no longer parse the chunked transfer encoding.
And yes, it will impact your results, since requests that fail this way are not clean measurements.
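One way to check that hypothesis is to replay a burst of plain GET requests outside of siege and watch for throttling responses or aborted connections. A minimal sketch, assuming Python with the requests package (the URL and request count are placeholders, not taken from your siege run):

import requests

URL = "http://example.com/"  # placeholder: the URL you were benchmarking

for i in range(100):
    try:
        r = requests.get(URL, timeout=5)
        # 429/503 responses suggest server-side throttling.
        if r.status_code in (429, 503):
            print(f"request {i}: throttled with HTTP {r.status_code}")
    except requests.exceptions.RequestException as e:
        # Truncated or aborted responses surface here; a malformed chunked
        # body is what siege reports as "unable to determine chunk size".
        print(f"request {i}: connection error: {e}")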


Index on array element attribute extremely slow

I'm new to mongodb but not new to databases. I created a collection of documents that look like this:
{
  _id: ObjectId('5e0d86e06a24490c4041bd7e'),
  ...,
  match: [{
    _id: ObjectId('5e0c35606a24490c4041bd71'),
    ts: 1234456,
    ...
  }]
}
So there is an array of objects in each document, and within the array there may be many objects with the same _id field. I have a handful of documents in this collection, and my query that selects documents by match._id is horribly slow. I mean unnaturally slow.
The query is simply this: {match: {$elemMatch: {_id: match._id}}} and it literally hangs the system for about 15 seconds while returning 15 matching documents out of 25 total!
I put an index on the collection like this:
collection.createIndex({"match._id" : 1}) but that didn't help.
Explain says the execution time is 0 and that it's using the index, but the query still takes 15 seconds or longer to complete.
I'm seeing the same slowness in Node.js and in Compass.
Explain Output:
{"explainVersion":"1","queryPlanner":{"namespace":"hp-test-39282b3a-9c0f-4e1f-b953-0a14e00ec2ef.lead","indexFilterSet":false,"parsedQuery":{"match":{"$elemMatch":{"_id":{"$eq":"5e0c3560e5a9e0cbd994fa52"}}}},"maxIndexedOrSolutionsReached":false,"maxIndexedAndSolutionsReached":false,"maxScansToExplodeReached":false,"winningPlan":{"stage":"FETCH","filter":{"match":{"$elemMatch":{"_id":{"$eq":"5e0c3560e5a9e0cbd994fa52"}}}},"inputStage":{"stage":"IXSCAN","keyPattern":{"match._id":1},"indexName":"match._id_1","isMultiKey":true,"multiKeyPaths":{"match._id":["match"]},"isUnique":false,"isSparse":false,"isPartial":false,"indexVersion":2,"direction":"forward","indexBounds":{"match._id":["[ObjectId('5e0c3560e5a9e0cbd994fa52'), ObjectId('5e0c3560e5a9e0cbd994fa52')]"]}}},"rejectedPlans":[]},"executionStats":{"executionSuccess":true,"nReturned":15,"executionTimeMillis":0,"totalKeysExamined":15,"totalDocsExamined":15,"executionStages":{"stage":"FETCH","filter":{"match":{"$elemMatch":{"_id":{"$eq":"5e0c3560e5a9e0cbd994fa52"}}}},"nReturned":15,"executionTimeMillisEstimate":0,"works":16,"advanced":15,"needTime":0,"needYield":0,"saveState":0,"restoreState":0,"isEOF":1,"docsExamined":15,"alreadyHasObj":0,"inputStage":{"stage":"IXSCAN","nReturned":15,"executionTimeMillisEstimate":0,"works":16,"advanced":15,"needTime":0,"needYield":0,"saveState":0,"restoreState":0,"isEOF":1,"keyPattern":{"match._id":1},"indexName":"match._id_1","isMultiKey":true,"multiKeyPaths":{"match._id":["match"]},"isUnique":false,"isSparse":false,"isPartial":false,"indexVersion":2,"direction":"forward","indexBounds":{"match._id":["[ObjectId('5e0c3560e5a9e0cbd994fa52'), ObjectId('5e0c3560e5a9e0cbd994fa52')]"]},"keysExamined":15,"seeks":1,"dupsTested":15,"dupsDropped":0}},"allPlansExecution":[]},"command":{"find":"lead","filter":{"match":{"$elemMatch":{"_id":"5e0c3560e5a9e0cbd994fa52"}}},"skip":0,"limit":0,"maxTimeMS":60000,"$db":"hp-test-39282b3a-9c0f-4e1f-b953-0a14e00ec2ef"},"serverInfo":{"host":"Dans-MacBook-Pro.local","port":27017,"version":"5.0.9","gitVersion":"6f7dae919422dcd7f4892c10ff20cdc721ad00e6"},"serverParameters":{"internalQueryFacetBufferSizeBytes":104857600,"internalQueryFacetMaxOutputDocSizeBytes":104857600,"internalLookupStageIntermediateDocumentMaxSizeBytes":104857600,"internalDocumentSourceGroupMaxMemoryBytes":104857600,"internalQueryMaxBlockingSortMemoryUsageBytes":104857600,"internalQueryProhibitBlockingMergeOnMongoS":0,"internalQueryMaxAddToSetBytes":104857600,"internalDocumentSourceSetWindowFieldsMaxMemoryBytes":104857600},"ok":1}
The explain output confirms that the operation that was explained is perfectly efficient. In particular we see:
The expected index being used with tight indexBounds
Efficient access of the data (totalKeysExamined == totalDocsExamined == nReturned)
No meaningful duration ("executionTimeMillis":0 which implies that the operation took less than 0.5ms for the database to execute)
Therefore the slowness that you're experiencing for that particular operation is not related to the efficiency of the plan itself. This doesn't always rule out the database (or its underlying server) as the source of the slowness completely, but it is usually a pretty strong indicator that either the problem is elsewhere or that there are multiple factors at play.
I would suggest the following as potential next steps:
Check the mongod log file (you can confirm its location by running db.adminCommand("getCmdLineOpts") via the shell connected to the instance). By default, any operation slower than 100ms is captured. This will help in a variety of ways:
If there is a log entry (with a meaningful duration) then it confirms that the slowness is being introduced while the database is processing the operation. It could also give some helpful hints as to why that might be the case (waiting for locks or server resources such as storage for example).
If an associated entry cannot be found, then that would be significantly stronger evidence that we are looking in the wrong place for the source of the slowness.
Is the operation that you gathered the explain for the exact one that the application and Compass observe as being slow? Were you connected to the same server and namespace? Is the explained operation simplified in some way, such as the original operation containing a sort, projection, collation, etc.?
As a relevant example that combines these two, I notice that there are skip and limit parameters applied to the command explained on a mongod seemingly running on a laptop. Are those parameters non-zero when running the application and does the application run against a different database with a larger data set?
The explain command doesn't include everything that an application would. Notably absent is the actual time it takes to send the results across the network. If you had particularly large documents that could be a factor, though it seems unlikely to be the culprit in this particular situation.
How exactly are you measuring the full execution time? Does it potentially include the time to connect to the database? (The sketch after this list shows one way to separate client-side time from server-side time.) In this case you mentioned that Compass itself also demonstrates the slowness, so that may rule out most of this.
What else is running on the server hosting the database? Is there a container or VM involved? Would the database or the underlying server be experiencing resource contention due to concurrency?
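As a minimal sketch of that separation, assuming the pymongo driver (the connection string is a placeholder; the namespace and ObjectId come from the explain output above), you can time the full client round trip and compare it with the server-reported execution time:

import time

from bson import ObjectId
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
db = client["hp-test-39282b3a-9c0f-4e1f-b953-0a14e00ec2ef"]  # from the explain output
coll = db["lead"]

query = {"match": {"$elemMatch": {"_id": ObjectId("5e0c3560e5a9e0cbd994fa52")}}}

# Full client-side round trip, including network transfer of the results.
start = time.perf_counter()
docs = list(coll.find(query))
client_ms = (time.perf_counter() - start) * 1000

# Server-side execution time as reported by the explain command.
explain = db.command("explain", {"find": "lead", "filter": query},
                     verbosity="executionStats")
server_ms = explain["executionStats"]["executionTimeMillis"]

print(f"client round trip: {client_ms:.1f} ms for {len(docs)} documents")
print(f"server execution:  {server_ms} ms")
# A large gap between the two numbers points at the network, the driver, or
# connection setup rather than at the query plan itself.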
Two additional minor asides:
25 total documents in a collection is extremely small. I would expect even the smallest hardware to be able to process such a request without an index unless there was some complicating factor.
Assuming that match is always an array then the $elemMatch operator is not strictly necessary for this particular query. You can read more about that here. I would not expect this to have a performance impact for your situation.
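As a sketch (reusing the placeholder client from above), the dot-notation form matches the same documents:

# Equivalent to the $elemMatch version when testing a single array field:
docs = list(coll.find({"match._id": ObjectId("5e0c3560e5a9e0cbd994fa52")}))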

Interpretation of Steps (1/2) failed in Snowflake

I'm running a SQL query on Snowflake which has been running for more than 24 hours.
When I go to 'Profile Overview', I see Steps (1/2, Failed).
However, under the 'Details' tab, I see that the status of the query is still 'Running'.
Can someone please explain whether the query is still running or whether there has been some error?
The query is most likely retrying after failing on a first attempt. For example, it might have failed on a memory error; Snowflake retries internally twice (three tries in total) before either completing successfully or failing with an error (if all tries fail).
You may reach out to Snowflake Support to look into this.
Your query is likely resource (memory) intensive. Things to check in your query include:
- a large sort, or a large number of sorts?
- a large GROUP BY (or many aggregates)?
- many window functions with ORDER BY?
For a query that has been running for many hours, it would likely be best either to dig deeper with the help of Support (open a case) or to look for opportunities to rework the query pattern.
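If you want to check the query's server-side status outside the UI, the INFORMATION_SCHEMA.QUERY_HISTORY table function reports the execution status and elapsed time. A minimal sketch using the snowflake-connector-python package (credentials and the query ID are placeholders):

import snowflake.connector

conn = snowflake.connector.connect(
    user="USER", password="PASSWORD", account="ACCOUNT"  # placeholders
)
cur = conn.cursor()
cur.execute(
    """
    select query_id, execution_status, error_code, error_message,
           total_elapsed_time
    from table(information_schema.query_history())
    where query_id = %s
    """,
    ("01a2b3c4-0000-0000-0000-000000000000",),  # query ID from the Details tab
)
for row in cur.fetchall():
    # execution_status is e.g. RUNNING or FAILED_WITH_ERROR;
    # total_elapsed_time is reported in milliseconds.
    print(row)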

GMail API: Queuing tasks so they don't get rate limited

I am trying to fetch the emails of a bunch of users on our service. I first get a list of messages, and if a message is not in the Datastore, we fetch it. I'm using the deferred library to avoid the DeadlineExceeded error. The current algorithm is:
Put each user task on a queue
For each user, get the list of messages
For every 10 messages from this list, enqueue a task that fetches those 10 messages.
However, I realized that this also exceeds the rate limit, since I could be doing more than 10 queries/sec. When I tried fetching only 1 message at a time instead of 10, and included getting the list of messages (which makes 1 network request per page of emails), I got an error saying I was using too much memory, and my process was shut down.
What is the best algorithm so I can ensure I am always under 10 qps to GMail and yet not run out of memory?
I don't think hitting the rate limit is a big deal; just make sure you handle the error and slow down a little in that case (a minimal sketch of this is below). Fetching messages in batches of 10 seems fine.
If you run out of memory in the scenario that you described, that means you have a memory leak or an infinite loop in your code. 10 queries can be easily processed on the smallest instance possible.
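As a sketch of that error handling, assuming service is an already-authorized google-api-python-client Gmail client (the retry limit is illustrative):

import random
import time

from googleapiclient.errors import HttpError


def fetch_message(service, user_id, msg_id, max_retries=5):
    """Fetch one message, backing off exponentially when rate limited."""
    for attempt in range(max_retries):
        try:
            return service.users().messages().get(
                userId=user_id, id=msg_id).execute()
        except HttpError as e:
            # 403/429 are what the Gmail API returns when you are rate limited.
            if e.resp.status in (403, 429):
                # Exponential backoff with a little jitter.
                time.sleep((2 ** attempt) + random.random())
            else:
                raise
    raise RuntimeError("gave up on message %s after %d tries" % (msg_id, max_retries))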

Solr indexing - Master/Slave replication, how to handle huge index and high traffic?

I'm currently facing an issue with Solr (more exactly with slave replication), and after having spent quite a bit of time reading online, I find myself having to ask for some enlightenment.
- Does Solr have a size limitation for its index?
When dealing with a single master, when is the right moment to decide to use multiple cores or multiple indexes?
Are there any indications that, once an index reaches a certain size, partitioning is recommended?
- Is there a maximum size when replicating segments from master to slave?
When replicating, is there a segment size limit beyond which the slave won't be able to download and index the content? What is the threshold at which a slave can no longer replicate, when there is a lot of traffic retrieving info and lots of new documents to replicate?
To be more factual, here is the context that led me to these questions:
We want to index a fair amount of documents, but when the amount reaches more than a dozen million, the slaves can't handle it and start failing to replicate with a SnapPull error.
The documents are composed of a few text fields (name, type, description, ... about 10 other fields of, say, 20 characters max).
We have one master, and 2 slaves which replicate data from the master.
This is my first time working with Solr (I usually work on webapps using Spring, Hibernate... but without Solr), so I'm not sure how to tackle this issue.
Our idea, for the moment, is to add multiple cores to the master and have a slave replicating from each of these cores.
Is it the right way to go?
If it is, how can we determine the number of cores needed? Right now we're just going to try it, see how it behaves, and adjust if necessary, but I was wondering if there are any best practices or benchmarks on this specific topic, along the lines of: "for this amount of documents with this average size, x cores or indexes are needed".
Thanks for any help with handling a huge amount of documents of average size!
Here is a copy of the error that is being thrown when a slave is trying to replicate:
ERROR [org.apache.solr.handler.ReplicationHandler] - <SnapPull failed >
org.apache.solr.common.SolrException: Index fetch failed :
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:329)
at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:264)
at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:280)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:135)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:65)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:142)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:166)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)
Caused by: java.lang.RuntimeException: java.io.IOException: read past EOF
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:418)
at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:467)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:319)
... 11 more
Caused by: java.io.IOException: read past EOF
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:151)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:70)
at org.apache.lucene.index.SegmentInfos$2.doBody(SegmentInfos.java:410)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:704)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:538)
at org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:402)
at org.apache.lucene.index.DirectoryReader.isCurrent(DirectoryReader.java:791)
at org.apache.lucene.index.DirectoryReader.doReopen(DirectoryReader.java:404)
at org.apache.lucene.index.DirectoryReader.reopen(DirectoryReader.java:352)
at org.apache.solr.search.SolrIndexReader.reopen(SolrIndexReader.java:413)
at org.apache.solr.search.SolrIndexReader.reopen(SolrIndexReader.java:424)
at org.apache.solr.search.SolrIndexReader.reopen(SolrIndexReader.java:35)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1049)
... 14 more
EDIT:
After Mauricio's answer, the Solr libraries were updated to 1.4.1, but this error was still raised.
I then increased commitReserveDuration, and even though the "SnapPull failed" error seems to have disappeared, another one started being raised. I'm not sure why, as I can't find much about it on the web:
ERROR [org.apache.solr.servlet.SolrDispatchFilter] - <ClientAbortException: java.io.IOException
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:370)
at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:323)
at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:396)
at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:385)
at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89)
at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:183)
at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:89)
at org.apache.solr.request.BinaryResponseWriter.write(BinaryResponseWriter.java:48)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:322)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:837)
at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:640)
at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1286)
at java.lang.Thread.run(Thread.java:595)
Caused by: java.io.IOException
at org.apache.coyote.http11.InternalAprOutputBuffer.flushBuffer(InternalAprOutputBuffer.java:703)
at org.apache.coyote.http11.InternalAprOutputBuffer$SocketOutputBuffer.doWrite(InternalAprOutputBuffer.java:733)
at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:124)
at org.apache.coyote.http11.InternalAprOutputBuffer.doWrite(InternalAprOutputBuffer.java:539)
at org.apache.coyote.Response.doWrite(Response.java:560)
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:365)
... 22 more
>
ERROR [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/].[SolrServer]] - <Servlet.service() for servlet SolrServer threw exception>
java.lang.IllegalStateException
at org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:405)
at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:362)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:837)
at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:640)
at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1286)
at java.lang.Thread.run(Thread.java:595)
I still wonder what the best practices are for handling a big index (more than 20 GB) containing a lot of documents with Solr. Am I missing some obvious links somewhere? Tutorials, documentation?
Cores are a tool primarily used to host different schemas in a single Solr instance; they are also used as on-deck indexes. Sharding and replication are orthogonal issues.
You mention "a lot of traffic". That's a highly subjective measure. Instead, try to determine how many QPS (queries per second) you need from Solr. Also, does a single Solr instance answer your queries fast enough? Only then can you determine whether you need to scale out. A single Solr instance can handle a lot of traffic; maybe you don't need to scale at all.
Make sure you run Solr on a server with plenty of memory (and make sure Java has access to it). Solr is quite memory-hungry; if you put it on a memory-constrained server, performance will suffer.
As the Solr wiki explains, use sharding if a single query takes too long to run, and replication if a single Solr instance can't handle the traffic. "Too long" and "traffic" depend on your particular application. Measure them.
Solr has lots of settings that affect performance: cache auto-warming, stored fields, merge factor, etc. Check out SolrPerformanceFactors.
There are no hard rules here. Every application has different search needs. Simulate and measure for your particular scenario.
About the replication error, make sure you're running 1.4.1 since 1.4.0 had a bug with replication.
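Since you mentioned raising commitReserveDuration, here is a sketch of the relevant solrconfig.xml sections (the host name, port, and durations are illustrative, not taken from your setup). On the master, commitReserveDuration controls how long a finished commit point is kept available, which gives slow slaves more time to complete their download:

<!-- On the master: -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <!-- keep commit points around so slaves can finish pulling them -->
    <str name="commitReserveDuration">00:01:00</str>
  </lst>
</requestHandler>

<!-- On each slave: -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>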

What does the sys.dm_exec_query_optimizer_info "timeout" record indicate?

During an investigation of some client machines losing their connection with SQL Server 2005, I ran into the following line of code on the web:
Select * FROM sys.dm_exec_query_optimizer_info WHERE counter = 'timeout'
When I run this query on our server - we are getting the following results:
counter - occurrence - value
timeout - 9100 - 1
As far as I can determine, this means that the query optimizer is timing out while trying to optimize queries run against our server – 9100 times. We are however, not seeing any timeout errors in the SQL Server error log, and our end-users have not reported any timeout specific errors.
Can anyone tell me what this number of “occurrences” means? Is this an issue we should be concerned about?
This counter has nothing to do with your connection issues.
SQL Server won't spend forever trying to compile the best possible plan (at least not without using trace flags).
It calculates two values at the beginning of the optimisation process.
Cost of a good enough plan
Maximum time to spend on query optimisation (this is measured in number of transformation tasks carried out rather than clock time).
If a plan with a cost lower than the threshold is found, the optimiser needn't continue. If it exceeds the number of tasks budgeted, optimisation also ends, and the best plan found so far is returned.
The reason that optimisation finished early shows up in the execution plan in the StatementOptmEarlyAbortReason attribute. There are actually three possible values.
Good enough plan found
Timeout
Memory Limit Exceeded.
A timeout will increment the counter you ask about in sys.dm_exec_query_optimizer_info.
Further Reading
Reason for Early Termination of Statement
Microsoft SQL Server 2014 Query Tuning & Optimization
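A quick way to watch this counter move is to snapshot it before and after compiling the statement you suspect. This is a sketch: the statement under test is a placeholder, and note that DBCC FREEPROCCACHE clears the entire plan cache, so only run it on a test server:

-- Snapshot the optimizer timeout counter.
SELECT occurrence, value
FROM sys.dm_exec_query_optimizer_info
WHERE counter = 'timeout';

-- Force a fresh compilation (WARNING: clears the whole plan cache).
DBCC FREEPROCCACHE;
GO

-- ... run the complex query under test here ...
GO

-- Snapshot again: if occurrence increased, that compilation hit the
-- optimizer's task budget and aborted early ("Timeout").
SELECT occurrence, value
FROM sys.dm_exec_query_optimizer_info
WHERE counter = 'timeout';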
The occurrence column tells you the number of times that counter has been incremented; for the value column, sorry, the documentation says it is internal only (see here).
Based on the other link, I suspect this covers internal engine timeouts (e.g. SET QUERY_GOVERNOR_COST_LIMIT).
A client timeout will also not be logged by SQL Server, because the client aborts the batch, thus stopping SQL processing.
Do you have more details?
