I discovered a troubling anomaly this morning while reviewing our monthly Sitecore Analytics reports. Our average 'time on site' this month reach an average of about 9 minutes. This is up from an average of around 1-2 minutes for the previous month.
My first reaction was "great, looks like we're doing better this month", but after further investigation, it appears as though each and every visit to the site is recording a 20-25 minute 'time on site' statistic - even for single page visits.
Has anyone experienced this before? It appears as though the addition of a SessionEnd processor causes Sitecore to keep each and every Session alive for the default duration of 20 minutes. If that's true, how does one add a custom SessionEnd pipeline processor without affecting the 'time on site' statistic for every visit?
Sitecore version: 6.4.1 Update 1
UPDATE
Unfortunately, site traffic is still being recorded above 20 minutes for each visit... and this is with the custom SessionEnd processor completely removed. I am currently investigating other possible causes.
UPDATE 2
We are seeing many Analytics warning message appear in our logs that look like the following:
Analystics: Max size of insert queue reached. Dropped 3826.
I now believe this is somehow related...
UPDATE 3
I discovered that the 'time on site' statistics would go back to normal after restarting the Sitecore application. From there, the average time on site would gradually climb at a rate of about 1 minute every 10 minutes or so until leveling out around 20 minutes. I believe that's around the same time we start seeing the 'Max size of insert queue reached' warnings in our logs.
I also discovered that the actual 'time on site' figure is calculated from the average time-span between the [Session].[Timestamp] and [Session].[LastPageTimestamp] columns in the [Sessions] table. What's interesting here is that the newest records entering the Session table appear to have a [LastPageTimestamp] of the actual time they are inserted into the table. It's as if the INSERT statement uses GETDATE() to stamp each record as they are inserted into the database. If that's true, then I think I found the culprit. I believe I have a performance issue on my hands, and to make matters worse, the queued Sessions are being inserting into the database incorrectly.
Don't know the answer to this question... but the first thing I would do is take apart the existing sessionEnd pipeline code with Reflector and see if it's doing something tricky that you are essentially undoing by adding another processor.
In my web.config, the only processor seems to be Sitecore.Pipelines.SessionEndSaveRecentDocuments.
Related
What does it mean there is a longer time for COMPILATION_TIME, QUEUED_PROVISIONING_TIME or both more than usual?
I have a query runs every couple of minutes and it usually takes less than 200 milliseconds for compilation and 0 for provisioning. There are 2 instances in the last couple of days the values are more than 4000 for compilation and more than 100000 for provisioning.
Is that mean warehouse was being resumed and there was a hiccup?
COMPILATION_TIME:
The SQL is parsed and simplified, and the tables meta data is loaded. Thus a compile for select a,b,c from table_name will be fractally faster than select * from table_name because the meta data is not needed from every partition to know the final shape.
Super fragmented tables, can give poor compile performance as there is more meta data to load. Fragmentation comes from many small writes/deletes/updates.
Doing very large INSERT statements can give horrible compile performance. We did a lift-and-shift and did all data loading via INSERT, just avoid..
PRIOVISIONING_TIME is the amount of time to setup the hardware, this occurs for two main reasons ,you are turning on 3X, 4X, 5X, 6X servers and it can take minutes just to allocate those volume of servers.
Or there is failure, sometime around releases there can be a little instability, where a query fails on the "new" release, and query is rolled back to older instances, which you would see in the profile as 1, 1001. But sometimes there has been problems in the provisioning infrastructure (I not seen it for a few years, but am not monitoring for it presently).
But I would think you will mostly see this on a on going basis for the first reason.
The compilation process involves query parsing, semantic checks, query rewrite components, reading object metadata, table pruning, evaluating certain heuristics such as filter push-downs, plan generations based upon the cost-based optimization, etc., which totally accounts for the COMPILATION_TIME.
QUEUED_PROVISIONING_TIME refers to Time (in milliseconds) spent in the warehouse queue, waiting for the warehouse compute resources to provision, due to warehouse creation, resume, or resize.
https://docs.snowflake.com/en/sql-reference/functions/query_history.html
To understand the reason behind the query taking long time recently in detail, the query ID needs to be analysed. You can raise a support case to Snowflake support with the problematic query ID to have the details checked.
When my cluster runs for a certain time(maybe a day, maybe two days), some queries may become very slow, about 2~10min to finish, when this happens, I need to restart the whole cluster, and the query become normal, but after some time, very slow queries happen again
The query response time depends on multiple factor including
1. Table size, if table size grows with time then response time will also increase
2. If it is open source version then time spent in GC pauses, which in turn will depend on number of objects/grabage present in the JVM Heap
3. Number of Concurrent queries being run
4. Amount of data overflown to disk
You will need to describe in detail your usage pattern of snappydata. Only then It would be possible to characterise the issue.
Some of the questions that should be answered are like
1. What is cluster size?
2. What are the table sizes?
3. Are writes happening continuously on the tables or only queries are getting executed?
You can engage us at slack channel to provide us informations related to your clusters.
I'm currently using the level2 orderbook channel via the GDAX WebSocket API. Quite recently a "time" field started appearing on the l2update JSON messages and this doesn't appear to be documented on the API reference pages. Some questions:
What does this time field represent and is it reliable enough to use? Is it message sending time from GDAX?
If it is sending time, I am occassionally seeing latencies of up to two minutes - is this expected?
Thanks!
I am playing with the L2 api's right now and have the same question. I'm seeing a range of timestamps from 4000ms delayed to -300ms (in the future).
The negative number makes me feel that the time can't be trusted. I attempted connecting from 2 different datacenters and from home and I can replicate both sides of the problem.
I've been using this field reliably for a couple of months assuming it is the time the order was received, and it was typically coming out as a pretty consistent 0.05 second lag in relation to my system time; however, in the last few days, it's been increasing - over 1 second yesterday and 2.02 seconds right now. I note (https://status.gdax.com/) maintenance was carried out on May 9th, but it was fine for me after that for a few days.
To answer the question more directly, no 2 minute latencies are not expected. I would check your system time is correct. Quick google brings up https://time.is/.
We have millions of documents in mongo we are looking to index on solr. Obviously when we do this the first time we need to index all the documents.
But after that, we should only need to index the documents as they change. What is the best way to do this? Should we call addDocument and then in cron call commit()? What does addDocument vs commit vs optimize do (I am using Apache_Solr_Service)
If you're using Solr 3.x you can forget the optimize, which merges all segments into one big segment. The commit makes changes visible to new IndexReaders; it's expensive, I wouldn't call it for each document you add. Instead of calling it through a cron, I'd use the autocommit in solrconfig.xml. You can tune the value depending on how much time you can wait to get new documents while searching.
The document won't actually be added to the index until you do commit() - it could be rolled back. optimize() will (ostensibly; I've not had particularly good luck with it) reduce the size of the index (documents that have been deleted still take up room unless the index is optimized).
If you set autocommit for your database, then you can be sure that any documents added to the database via update, have been committed, when the autocommit interval has passed. I have used a 5-minute interval and it works fine even when a few thousand updates happen within the 5 minutes. After a full reindex is complete, I wait 5 minutes and then tell people that it is done. In fact, when people ask how quickly updates get into the db, I will tell them that we poll for changes every minute, but that there are variables (such as a sudden big batch) and it is best to not expect things to be updated for 5 or 6 minutes. So far, nobody has really claimed a business need to have it update faster than that.
This is with a 350,000 record db totalling roughly 10G in RAM.
I'm developing a high score web service for my game, and it's running on Google App Engine.
My game has 5 difficulties, so I originally had 5 boards with entries for each (player_login, score and time). If the player submitted a lower score than the previously scored, it got dismissed, so only the highest score is kept for each player.
But to add more fun into this, I'd decided to include daily/weekly/monthly/yearly high score tables. So I've created 5 boards for each difficulty, making it 25 boards. When a score is submitted, it's saved into each board, and the boards are supposed to be cleared on every day/week/month/year.
This happens by a cron job that is invoked and deletes all entries from a specific board.
Here comes the problem: it looks like deleting entries from the datastore is slow. From my test daily cleanups it looks like deleting a single entry takes around 200 ms.
In the worst-case scenario, if the game would be quite popular and would have, say, 100 000 players, and each of them would have an entry in the yearly board, it would take 100 000 * 0.012 seconds = 12 000 seconds (3 hours!!) to clear that board. I think we are allowed to have jobs of up to 30 seconds in App Engine, so this wouldn't work.
I'm deleting with following code (thanks to Nick Johnson):
q = Score.all(keys_only=True).filter('b = ',boardToClear)
results = q.fetch(500)
while results:
self.response.out.write("deleting one batch;")
db.delete(results)
q = Score.all(keys_only=True).filter('b = ',boardToClear).with_cursor(q.cursor())
results = q.fetch(500)
What do you recommend me to do with this problem?
One approach that comes to my mind is to use a task queue and delete older scores than that are permitted in each board, i.e. which have expired, but in smaller quantities. This way I wouldn't hit the CPU limit for one task, but the cleanup would not be (nearly) instantaneous, so my 12 000 seconds long cleanup would be split into 1 200 tasks, each roughly 10 seconds long.
But I think that there is something that I'm doing wrong, this kind of operation would be a lot faster when done in relational database. Possibly something is wrong with my approach to the datastore and scoring, because being locked in RDBMS mindset.
First, a couple of small suggestions:
Does deletion take 200ms per item even when you delete items in a batch process? The fastest way to delete should be to do a keys_only query and then call db.delete() on an entire list of keys at once.
The 30-second limit was recently relaxed to 10 minutes for background work (like the cron jobs or queue tasks that you're contemplating) as of 1.4.0.
These may not fundamentally address your problem, though. I think there's no way to get around the fact that deleting a large number of records (hundreds of thousands, say), will take some time. I'm not sure that this is as big a problem for your use case though, as I can see a couple of techniques that would help.
As you suggest, use a task queue to split up a long-running tasks into several smaller tasks. Your use case (deleting a huge number of items that match a particular query) is ideal for a map-reduce task. Nick Johnson's blog post on the Mapper API may be very helpful for you (so that you don't have to write all of that task management code on your own).
Do you need to delete all the out-of-date board entries immediately? If you had a field that listed which week, month, or year that a particular entry counted for, you could index on that field and then only display entries from the current month on the visible leaderboard. (Disk space is cheap, after all.) And then if you wanted to slowly (over hours, say, instead of milliseconds) remove the out-of-date data, you could do that in the background without ever having incorrect data on your leaderboards.
Delete entities in batches. Although a single delete takes a noticeable amount of time (though 200ms seems very high), batch deletes take no longer, as they delete all the entities in parallel. Task Queue and cron jobs can now run for up to 10 minutes, so timeouts should not be an issue.