Plone site update runs for ages now - database

I updated a Plone instance from 4.0.3 to 4.3.11 and now the site update has been running for about 16 hours. Sure, the webserver timed out after an hour or so, but the process is still running. strace says:
select(12, [4 11], [], [4 11], {25, 609847}) = 0 (Timeout)
futex(0x1d01d30, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x1d01d30, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x1d01d30, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x1d01d30, FUTEX_WAKE_PRIVATE, 1) = 1
select(12, [4 11], [], [4 11], {30, 0}
while this line
select(12, [4 11], [], [4 11], {30, 0}
repeats very often, and sometimes this occurs:
futex(0x1d01d30, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
iostat tells me that the disk (a new SSD) is utilized 10% at most, but mostly it is idle. It is also the system disk, so I don't see Plone's disk I/O exclusively.
The database for the site contains about 60,000 objects, mostly of the same type. They are very small objects with nothing fancy.
The machine has 16 GB of memory and 8 cores, yet only one core is performing the actual Plone upgrade (why?)
Does it really take this long to upgrade the ZEO DB with 60,000 objects? How can I know that it is really doing something? (strace is not very telling here.)

The machine has 16 GB of memory and 8 cores, yet only one core is performing the actual Plone upgrade (why?)
Because only one thread (so one CPU) is running the upgrade.
Does it really take this long to upgrade the ZEO DB with 60,000 objects?
It is not normal. Maybe you have some custom code doing strange things. Are you connecting to other services (Solr, other databases, ...)? Are you generating document previews? How big is your Data.fs and how many blobs do you have?
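If you are unsure about those numbers, a quick sketch like the following can answer them; the paths are the usual buildout defaults and may differ in your deployment:
import os

data_fs = "var/filestorage/Data.fs"   # usual buildout default, adjust if needed
blob_dir = "var/blobstorage"          # usual buildout default, adjust if needed

# Report the Data.fs size in MB and count the files under blobstorage.
print("Data.fs size: %.1f MB" % (os.path.getsize(data_fs) / (1024.0 * 1024.0)))
print("blob files:   %d" % sum(len(files) for _, _, files in os.walk(blob_dir)))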
How can I know that it is really doing something?
The first step in debugging it is to know what is happening. Try installing https://pypi.python.org/pypi/Products.LongRequestLogger (or a similar add-on).
This will point out where you are stuck.
If you have the instance running in the foreground, you can also get a traceback by sending the USR1 signal. See:
What's the modern way to solve Plone deadlock issues?
for a more complete insight.
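If your instance does not react to USR1 out of the box (it may need an add-on for that), the idea can be reproduced in plain Python. This is a generic illustration, not the exact mechanism Zope uses: the faulthandler module (stdlib in Python 3, available as a backport for the Python 2.7 that Plone 4.3 runs on) dumps every thread's stack when the signal arrives:
import faulthandler
import signal

# After this call, `kill -USR1 <pid>` prints all thread tracebacks to stderr.
faulthandler.register(signal.SIGUSR1, all_threads=True)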
the webserver timed out after an hour or so
This also sounds strange. If the webserver is Apache or nginx, the timeout should be in the minute range.
If you call the instance port directly, you should not have any timeout at all.
I suggest you do so.
The instance logs (usually under $BUILDOUT_DIRECTORY/var/log/) should also give you a hint about the status of your upgrades.

Related

Apache Flink and Apache Pulsar

I am using Flink to read data from Apache Pulsar.
I have a partitioned topic in pulsar with 8 partitions.
I produced 1000 messages in this topic, distributed across the 8 partitions.
I have 8 cores in my laptop, so I have 8 sub-tasks (by default parallelism = # of cores).
I opened the Flink UI after executing the code from Eclipse and found that some sub-tasks are not receiving any records (idle).
I am expecting that all the 8 sub-tasks will be utilized (I am expecting that each sub-task will be mapped to one partition in my topic).
After restarting the job, I found that sometimes 3 sub-tasks are utilized and sometimes 4, while the remaining sub-tasks stay idle.
Please help me clarify this scenario.
Also, how can I know whether there is a shuffle between sub-tasks or not?
My Code:
// Pulsar consumer configuration passed to the connector's source builder
ConsumerConfigurationData<String> consumerConfigurationData = new ConsumerConfigurationData<>();
Set<String> topicsSet = new HashSet<>();
topicsSet.add("flink-08");
consumerConfigurationData.setTopicNames(topicsSet);
consumerConfigurationData.setSubscriptionName("my-sub0111");
consumerConfigurationData.setSubscriptionType(SubscriptionType.Key_Shared);
consumerConfigurationData.setConsumerName("consumer-01");
consumerConfigurationData.setSubscriptionInitialPosition(SubscriptionInitialPosition.Earliest);

// Build the Pulsar source and attach it to the streaming environment
PulsarSourceBuilder<String> builder = PulsarSourceBuilder
        .builder(new SimpleStringSchema())
        .pulsarAllConsumerConf(consumerConfigurationData)
        .serviceUrl("pulsar://localhost:6650");
SourceFunction<String> src = builder.build();
DataStream<String> stream = env.addSource(src);
stream.print(" >>> ");
For the Pulsar question, I don't know enough to help. I recommend setting up a larger test and seeing how that turns out. Usually, you'd have more partitions than slots, and some slots would consume several partitions in a somewhat random fashion.
Also, how can I know whether there is a shuffle between sub-tasks or not?
The easiest way is to look at the topology in the Flink Web UI. There you should see the number of tasks and the channel types. You could post a screenshot if you want more details, but in this case there is nothing that will be shuffled, since you only have a source and a sink.

How does Raft deal with delayed replies in AppendEntries RPC?

I came up with a question when reading the Raft paper. The scenario is as follows. There are 3 newly started Raft instances, R1, R2, R3. R1 is elected as leader, with nextIndex {1, 1, 1}, matchIndex {0, 0, 0} and term 1. Now it receives command 1 from the client, and the logs of the instances are as follows:
R1: [index 0, command 0], [index 1, command 1]
R2: [index 0, command 0]
R3: [index 0, command 0]
What if the network is not reliable? If R1 is sending this log to R2 but the AppendEntries RPC times out, the leader R1 has to resend [index 1, command 1] again. Then it may receive the reply {term: 1, success: true} twice.
The paper says:
If last log index ≥ nextIndex for a follower: send AppendEntries RPC with log entries starting at nextIndex
• If successful: update nextIndex and matchIndex for follower (§5.3)
• If AppendEntries fails because of log inconsistency: decrement nextIndex and retry (§5.3)
So the leader R1 will increase nextIndex and matchIndex twice: nextIndex {1, 3, 1}, matchIndex {0, 2, 0}, which is not correct. When the leader sends the next AppendEntries RPC, i.e., a heartbeat or log replication, it can fix the nextIndex, but the matchIndex will never have a chance to be fixed.
My solution is to add a sequence number to both the AppendEntries arguments and results for every single RPC call. However, I was wondering whether there is a way to solve this problem using only the arguments given in the paper, that is, without the sequence number.
Any advice will be appreciated and thank you in advance.
The protocol assumes that there’s some context with respect to which AppendEntries RPC a follower is responding to. So, at some level there does need to be a sequence number (or more accurately a correlation ID), whether that be at the protocol layer, messaging layer, or in the application itself. The leader has to have some way to correlate the request with the response to determine which index a follower is acknowledging.
But there’s actually an alternative to this that’s not often discussed. Some modifications of the Raft protocol have the followers send their last log index in responses. You could also use that last log index to determine which indexes have been persisted on the follower.
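Here is a minimal leader-side sketch of that alternative (class and field names such as Leader and reply.last_log_index are hypothetical, not from the paper). Because the follower reports an absolute last log index, the update is idempotent: processing a duplicated or delayed reply cannot push matchIndex past what the follower really holds.
class Leader:
    def __init__(self, peers, log_length):
        # Volatile leader state from Figure 2 of the paper.
        self.next_index = {p: log_length for p in peers}
        self.match_index = {p: 0 for p in peers}

    def on_append_entries_reply(self, follower, reply):
        if not reply.success:
            # Log inconsistency: back off and retry from an earlier index.
            self.next_index[follower] = max(1, self.next_index[follower] - 1)
            return
        # Use the absolute index reported by the follower instead of
        # incrementing; max() keeps the state monotonic, so handling
        # the same reply twice changes nothing.
        self.match_index[follower] = max(self.match_index[follower],
                                         reply.last_log_index)
        self.next_index[follower] = self.match_index[follower] + 1
With a correlation ID instead, you get the same idempotence by setting matchIndex to prevLogIndex plus the number of entries carried by the request the reply belongs to, rather than by incrementing.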

What could be the reason for a high "SQL Server parse and compile time"?

A little background first:
I have a legacy application with security rules inside the app.
To reuse the database model for an additional app on top of it, with the security model integrated inside the database, I decided to use views with the security rules inside the view SQL. The logic works well, but performance was not really good (high I/O caused by scans of some/many tables). So I used indexed views instead of standard views for the basic columns I need for the security checks, and added another view with the security rules on top of each indexed view. This works perfectly as far as I/O is concerned, but now I have poor parse and compile times.
When I clear all buffers, a simple SQL statement against the top view delivers these timings:
"SQL Server-Analyse- und Kompilierzeit:
, CPU-Zeit = 723 ms, verstrichene Zeit = 723 ms.
-----------
7
(1 Zeile(n) betroffen)
#A7F38F33-Tabelle. Scananzahl 1, logische Lesevorgänge 7, physische Lesevorgänge 0, Read-Ahead-Lesevorgänge 0, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.
xADSDocu-Tabelle. Scananzahl 1, logische Lesevorgänge 2, physische Lesevorgänge 0, Read-Ahead-Lesevorgänge 0, logische LOB-Lesevorgänge 0, physische LOB-Lesevorgänge 0, Read-Ahead-LOB-Lesevorgänge 0.
SQL Server-Ausführungszeiten:
, CPU-Zeit = 0 ms, verstrichene Zeit = 0 ms.
When I execute the same statement again, the parse time is of course zero.
In the past I sometimes saw a very long parse time (>1 s) again when I re-executed the same statement later (no DML was done during this time!). Since I deactivated all automatic statistics updates, I have never seen these long parse times again.
But what could be the reason for such a long initial parse and compile time? It is huge and causes very bad performance in the app itself with this solution.
Is there a way to look deeper into the parse time to find its root cause?
The reason for the poor compile time is the number of indexed views.
“The query optimizer may use indexed views to speed up the query execution. The view does not have to be referenced in the query for the optimizer to consider that view for a substitution.”
https://msdn.microsoft.com/en-us/library/ms191432(v=sql.120).aspx
This means that, when parsing a SQL statement, the optimizer may check the indexes of all(!) indexed views.
I have a sample in my database where I can see this behaviour: a simple SQL statement on base tables uses the index from an indexed view.
So far so good, but when you reach a limit of roughly 500 indexed views, the system escalates and the optimizer needs at least 10 times more CPU and memory to calculate the plan. This behaviour is nearly the same from version 2008 to 2014.

How to debug long-running mapreduce jobs with millions of writes?

I am using the simple control.start_map() function of the appengine-mapreduce library to start a mapreduce job. This job successfully completes and shows ~43M mapper-calls on the resulting /mapreduce/detail?mapreduce_id=<my_id> page. However, this page makes no mention of the reduce step or any of the underlying appengine-pipeline processes that I believe are still running. Is there some way to return the pipeline ID that this call makes so I can look at the underlying pipelines to help debug this long-running job? I would like to retrieve enough information to pull up this page: /mapreduce/pipeline/status?root=<guid>
Here is an example of the code I am using to start up my mapreduce job originally:
from third_party.mapreduce import control

mapreduce_id = control.start_map(
    name="Backfill",
    handler_spec="mark_tos_accepted",
    reader_spec=(
        "third_party.mapreduce.input_readers.DatastoreInputReader"),
    mapper_parameters={
        "input_reader": {
            "entity_kind": "ModelX"
        },
    },
    shard_count=64,
    queue_name="backfill-mapreduce-queue",
)
Here is the mapping function:
# This is where we keep our copy of appengine-mapreduce
from third_party.mapreduce import operation as op

def mark_tos_accepted(modelx):
    # Skip users who have already been marked
    if (not modelx
            or modelx.tos_accepted == myglobals.LAST_MATERIAL_CHANGE_TO_TOS):
        return
    modelx.tos_accepted = user_models.LAST_MATERIAL_CHANGE_TO_TOS
    yield op.db.Put(modelx)
Here are the relevant portions of the ModelX:
class BackupModel(db.Model):
    backup_timestamp = db.DateTimeProperty(indexed=True, auto_now=True)

class ModelX(BackupModel):
    tos_accepted = db.IntegerProperty(indexed=False, default=0)
For more context, I am trying to debug a problem I am seeing with writes showing up in our data warehouse.
On 3/23/2013, we launched a MapReduce job (let's call it A) over a db.Model (let's call it ModelX) with ~43M entities. 7 hours later, the job "finished" and the /mapreduce/detail page showed that we had successfully mapped over all of the entities, as shown below.
mapper-calls: 43613334 (1747.47/sec avg.)
On 3/31/2013, we launched another MapReduce job (let's call it B) over ModelX. 12 hours later, the job finished with status Success and the /mapreduce/detail page showed that we had successfully mapped over all of the entities, as shown below.
mapper-calls: 43803632 (964.24/sec avg.)
I know that MR job A wrote to all ModelX entities, since we introduced a new property that none of the entities contained before. ModelX contains an auto_now property like so:
backup_timestamp = ndb.DateTimeProperty(indexed=True, auto_now=True)
Our data warehousing process runs a query over ModelX to find those entities that changed on a certain day and then downloads those entities and stores them in a separate (AWS) database so that we can run analysis over them. An example of this query is:
db.GqlQuery('select * from ModelX where backup_timestamp >= DATETIME(2013, 4, 10, 0, 0, 0) and backup_timestamp < DATETIME(2013, 4, 11, 0, 0, 0) order by backup_timestamp')
I would expect that our data warehouse would have ~43M entities on each of the days that the MR jobs completed, but it is actually more like ~3M, with each subsequent day showing an increase, as shown in this progression:
3/16/13 230751
3/17/13 193316
3/18/13 344114
3/19/13 437790
3/20/13 443850
3/21/13 640560
3/22/13 612143
3/23/13 547817
3/24/13 2317784 // Why isn't this ~43M ?
3/25/13 3701792 // Why didn't this go down to ~500K again?
3/26/13 4166678
3/27/13 3513732
3/28/13 3652571
This makes me think that the op.db.Put() calls issued by the mapreduce job are still running in some pipeline or queue, causing this trickle effect.
Furthermore, if I query for entities with an old backup_timestamp, I can go back pretty far and still get plenty of entities, but I would expect all of these queries to return 0:
In [4]: ModelX.all().filter('backup_timestamp <', 'DATETIME(2013,2,23,1,1,1)').count()
Out[4]: 1000L
In [5]: ModelX.all().filter('backup_timestamp <', 'DATETIME(2013,1,23,1,1,1)').count()
Out[5]: 1000L
In [6]: ModelX.all().filter('backup_timestamp <', 'DATETIME(2012,1,23,1,1,1)').count()
Out[6]: 1000L
However, there is this strange behavior where the query returns entities that it should not:
In [8]: old = ModelX.all().filter('backup_timestamp <', 'DATETIME(2012,1,1,1,1,1)')
In [9]: paste
for o in old[1:100]:
    print o.backup_timestamp
## -- End pasted text --
2013-03-22 22:56:03.877840
2013-03-22 22:56:18.149020
2013-03-22 22:56:19.288400
2013-03-22 22:56:31.412290
2013-03-22 22:58:37.710790
2013-03-22 22:59:14.144200
2013-03-22 22:59:41.396550
2013-03-22 22:59:46.482890
2013-03-22 22:59:46.703210
2013-03-22 22:59:57.525220
2013-03-22 23:00:03.864200
2013-03-22 23:00:18.040840
2013-03-22 23:00:39.636020
Which makes me think that the index is just taking a long time to be updated.
I have also graphed the number of entities that our data warehousing process downloads and am noticing some cliff-like drops that make me think there is some behind-the-scenes throttling going on somewhere that I cannot see with any of the diagnostic tools exposed on the appengine dashboard. For example, this graph shows a fairly large spike on 3/23, when we started the mapreduce job, but then a dramatic fall shortly thereafter.
This graph shows the count of entities returned by the BackupTimestamp GqlQuery for each 10-minute interval for each day. Note that the purple line shows a huge spike as the MapReduce job spins up, and then a dramatic fall ~1hr later as the throttling kicks in. This graph also shows that there seems to be some time-based throttling going on.
I don't think you'll have any reducer functions there, because all you've done is start a mapper. To do a complete mapreduce, you have to explicitly instantiate a MapReducePipeline and call start on it. As a bonus, that answers your question, as it returns the pipeline ID which you can then use in the status URL.
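As a rough sketch of what that looks like, assuming the stock mapreduce_pipeline module shipped with appengine-mapreduce (adjust the import to your third_party copy; the reducer name below is a placeholder, and your mapper would need to emit key/value pairs rather than op.db.Put for a real reduce step):
from third_party.mapreduce import mapreduce_pipeline

pipeline = mapreduce_pipeline.MapreducePipeline(
    "Backfill",
    mapper_spec="mark_tos_accepted",
    reducer_spec="my_reduce_function",  # hypothetical reducer
    input_reader_spec=(
        "third_party.mapreduce.input_readers.DatastoreInputReader"),
    mapper_params={"input_reader": {"entity_kind": "ModelX"}},
    shards=64)
pipeline.start(queue_name="backfill-mapreduce-queue")

# The pipeline ID is what the status page expects:
#   /mapreduce/pipeline/status?root=<pipeline.pipeline_id>
print pipeline.pipeline_id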
Just trying to understand the specific problem. Is it that you are expecting a bigger number of entities in your AWS database? I would suspect that the problem lies with the process that downloads your old ModelX entities into an AWS database, that it's somehow not catching all the updated entities.
Is the AWS-downloading process modifying ModelX in any way? If not, then why would you be surprised at finding entities with an old modified time stamp? modified would only be updated on writes, not on read operations.
Kind of unrelated - with respect to throttling I've usually found a throttled task queue to be the problem, so maybe check how old your tasks in there are or if your app is being throttled due to a large amount of errors incurred somewhere else.
control.start_map doesn't use pipeline and has no shuffle/reduce step. When the mapreduce status page shows it's finished, all mapreduce-related taskqueue tasks should have finished. You can examine your queue or even pause it.
I suspect there are problems related to old indexes for the old Model or to eventual consistency. To debug MR, it is useful to filter your warnings/errors log and search by the mr id. To help with your particular case, it might be useful to see your Map handler.

Profiling Mnesia Queries

Our Mnesia DB is running slowly and we think it should be somewhat faster.
So we need to profile it and work out what is happening.
There are a number of options that suggest themselves:
run fprof and see where the time is going
run cprof and see which functions are called a lot
However these are both fairly standard performance monitoring style tools. The question is how do I actually do query profiling - which queries are taking the longest times. If we were an Oracle or MySQL shop we would just run a query profiler which would return the sorts of queries that were taking a long time to run. This is not a tool that appears to be available for Mnesia.
So the question is:
what techniques exist to profile Mnesia
what tools exist to profile Mnesia - none I think, but prove me wrong :)
how did you profile your queries and optimise your mnesia database installation
Expanded In The Light Of Discussion
One of the problems with fprof as a profiling tool is that it only tells you about the particular query you are looking at. So fprof tells me that X is slow and I tweak it to speed it up. Then, lo and behold, operation Y (which was fast enough) is now dog slow. So I profile Y and realise that the way to make Y quick is to make X slow. So I end up making a series of bilateral trade-offs...
What I actually need is a way to manage multilateral trade-offs. I now have 2 metric shed-loads of actual user activities logged which I can replay. These logs represent what I would like to optimize.
A 'proper' query analyser on an SQL database would be able to profile the structure of SQL statements, eg all statements with the form:
SELECT [fieldset] FROM [table] WHERE {field = *parameter*}, {field = *parameter*}
and say 285 queries of this form took on average 0.37ms to run
The magic answers are when it says: 17 queries of this form took 6.34 s to run and did a full table scan on table X; you should put an index on field Y.
When I have a result set like this over a representative set of user-activities I can then start to reason about trade-offs in the round - and design a test pattern.
The test pattern would be something like:
activity X would make queries A and C faster but queries E and F slower
test and measure
then approve/disapprove
I have been using Erlang long enough to 'know' that there is no query analyser like this; what I would like to know is how other people (who must have had this problem) 'reason' about Mnesia optimisation.
I hung back because I don't know much about either Erlang or Mnesia, but I know a lot about performance tuning, and from the discussion so far it sounds pretty typical.
These tools fprof etc. sound like most tools that get their fundamental approach from gprof, namely instrumenting functions, counting invocations, sampling the program counter, etc. Few people have examined the foundations of that practice for a long time. Your frustrations sound typical for users of tools like that.
There's a method that is less-known that you might consider, outlined here. It is based on taking a small number (10-20) of samples of the state of the program at random times, and understanding each one, rather than summarizing. Typically, this means examining the call stack, but you may want to examine other information as well. There are different ways to do this, but I just use the pause button in a debugger. I'm not trying to get precise timing or invocation counts. Those are indirect clues at best. Instead I ask of each sample "What is it doing and why?" If I find that it is doing some particular activity, such as performing the X query where it's looking for y type answer for the purpose z, and it's doing it on more than one sample, then the fraction of samples it's doing it on is a rough but reliable estimate of what fraction of the time it is doing that. Chances are good that it is something I can do something about, and get a good speedup.
Here's a case study of the use of the method.
Since Mnesia queries are just Erlang functions, I would imagine you can profile them the same way you would profile your own Erlang code. http://www.erlang.org/doc/efficiency_guide/profiling.html#id2266192 has more information on the Erlang profiling tools available.
Update: As a test, I ran this at home on a test Mnesia instance, and using fprof to trace an Mnesia QLC query returned output, a sample of which I'm including below. So it definitely includes more information than just the query call.
....
{[{{erl_lint,pack_errors,1}, 2, 0.004, 0.004}],
{ {lists,map,2}, 2, 0.004, 0.004}, %
[ ]}.
{[{{mnesia_tm,arrange,3}, 1, 0.004, 0.004}],
{ {ets,first,1}, 1, 0.004, 0.004}, %
[ ]}.
{[{{erl_lint,check_remote_function,5}, 2, 0.004, 0.004}],
{ {erl_lint,check_qlc_hrl,5}, 2, 0.004, 0.004}, %
[ ]}.
{[{{mnesia_tm,multi_commit,4}, 1, 0.003, 0.003}],
{ {mnesia_locker,release_tid,1}, 1, 0.003, 0.003}, %
[ ]}.
{[{{mnesia,add_written_match,4}, 1, 0.003, 0.003}],
{ {mnesia,add_match,3}, 1, 0.003, 0.003}, %
[ ]}.
{[{{mnesia_tm,execute_transaction,5}, 1, 0.003, 0.003}],
{ {erlang,erase,1}, 1, 0.003, 0.003}, %
[ ]}.
{[{{mnesia_tm,intercept_friends,2}, 1, 0.002, 0.002}],
{ {mnesia_tm,intercept_best_friend,2}, 1, 0.002, 0.002}, %
[ ]}.
{[{{mnesia_tm,execute_transaction,5}, 1, 0.002, 0.002}],
{ {mnesia_tm,flush_downs,0}, 1, 0.002, 0.002}, %
[ ]}.
{[{{mnesia_locker,rlock_get_reply,4}, 1, 0.002, 0.002}],
{ {mnesia_locker,opt_lookup_in_client,3}, 1, 0.002, 0.002}, %
[ ]}.
{[ ],
{ undefined, 0, 0.000, 0.000}, %
[{{shell,eval_exprs,6}, 0, 18.531, 0.000},
{{shell,exprs,6}, 0, 0.102, 0.024},
{{fprof,just_call,2}, 0, 0.034, 0.027}]}.
Mike Dunlavey's suggestion reminds me of redbug, which allows you to sample calls in production systems. Think of it as an easy-to-use erlang:trace that doesn't give you enough rope to hang your production system.
Using something like this call should give you lots of stack traces to identify where your mnesia transactions are called from:
redbug:start(10000,100,{mnesia,transaction,[stack]}).
It's not possible to get call durations for these traces, though.
If you have organized all Mnesia lookups into modules that export an API to perform them, you could also use redbug to get a call frequency for specific queries only.

Resources