Error: this disconnected the server from the cluster and stopped the distributed connector services. Please help.
[2021-12-08 14:07:20,484] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2021-12-08 14:17:20,484] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2021-12-08 14:24:23,198] INFO [GroupCoordinator 0]: Member connect-1-37b915a1-36f0-47df-81f0-f5b67985f278 in group connect-cluster has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2021-12-08 14:24:23,200] INFO [GroupCoordinator 0]: Preparing to rebalance group connect-cluster in state PreparingRebalance with old generation 18 (__consumer_offsets-13) (reason: removing member connect-1-37b915a1-36f0-47df-81f0-f5b67985f278 on heartbeat expiration) (kafka.coordinator.group.GroupCoordinator)
[2021-12-08 14:24:25,680] INFO [GroupCoordinator 0]: Stabilized group connect-cluster generation 19 (__consumer_offsets-13) (kafka.coordinator.group.GroupCoordinator)
[2021-12-08 14:24:25,717] INFO [GroupCoordinator 0]: Assignment received from leader for group connect-cluster for generation 19 (kafka.coordinator.group.GroupCoordinator)
[2021-12-08 14:27:20,484] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2021-12-08 14:37:20,484] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2021-12-08 14:47:20,484] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2021-12-08 14:57:20,484] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
One of my production machines (SQL Server Express 2012) has not been performing too well. I started running the WhoIsActive script and I've been getting a lot of these wait types:
(10871ms)PREEMPTIVE_OS_AUTHZINITIALIZECON
They always occur when calling a function that checks certain user privileges. If I understand this correctly, the function had to wait almost 11 seconds for the Windows function AuthzInitializeContextFromSid (see https://www.sqlskills.com/help/waits/preemptive_os_authzinitializecontextfromsid/).
Am I correct in my assumption? (full output below)
I couldn't find any info online about this wait type going haywire. What could be causing this?
Full output:
00 00:00:10.876 75
<?query --
select #RetValue = ([dbo].[Users_IsMember]('some_role_name', #windowsUserName)
| is_srvrolemember('SysAdmin', #windowsUserName))
--?>
<?query --
MyDB.dbo.StoredProcName;1
--?>
DOMAIN\User (10871ms)PREEMPTIVE_OS_AUTHZINITIALIZECON master: 0 (0 kB),tempdb: 0 (0 kB),MyDB: 0 (0 kB) 10,875 0 0 NULL 93 0 0 <ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.5" Build="11.0.7001.0"><BatchSequence><Batch><Statements><StmtSimple StatementText="select #RetValue = ([dbo].[Users_IsMember]('some_role_name', #windowsUserName)
| is_srvrolemember('SysAdmin', #windowsUserName))
" StatementId="1" StatementCompId="49" StatementType="ASSIGN WITH QUERY" RetrievedFromCache="true" /></Statements></Batch></BatchSequence></ShowPlanXML> 3 runnable NULL 0 NULL ServerName AppName .Net SqlClient Data Provider 2018-12-17 09:29:35.413 2018-12-17 09:29:35.413 0 2018-12-17 09:29:46.447
In my experience the PREEMPTIVE_OS wait stats are related to an imbalance between how much memory is allocated to SQL Server vs. the Windows OS itself.
In this case, the OS is being starved for memory resources. You might try either adding more total memory to the box, or ensuring that SQL Server is configured to only use 80 percent of the total memory installed on the instance. Or both.
Note: this is not a blanket statement on how to configure memory for SQL Server, but rather a good place to start when tuning for PREEMPTIVE_OS-related wait types.
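As a rough worked example of the 80 percent starting point, the sketch below computes a `max server memory (MB)` value from the installed RAM and formats the matching `sp_configure` batch. The helper names and the 80 percent default are assumptions for illustration only. `max server memory (MB)` is the standard SQL Server setting, but it is an advanced option, so `show advanced options` has to be enabled first; review the generated batch before running it.

```python
def max_server_memory_mb(total_ram_mb: int, percent: int = 80) -> int:
    """Return the share of installed RAM (in MB) to give SQL Server.

    Integer arithmetic avoids float rounding; the remainder is left to Windows.
    """
    if total_ram_mb <= 0 or not (0 < percent <= 100):
        raise ValueError("total_ram_mb must be positive and percent in (0, 100]")
    return total_ram_mb * percent // 100


def sp_configure_script(total_ram_mb: int, percent: int = 80) -> str:
    """Build (but do not run) the T-SQL batch that applies the memory cap."""
    limit = max_server_memory_mb(total_ram_mb, percent)
    return "\n".join([
        "EXEC sp_configure 'show advanced options', 1;",
        "RECONFIGURE;",
        f"EXEC sp_configure 'max server memory (MB)', {limit};",
        "RECONFIGURE;",
    ])


if __name__ == "__main__":
    # Example: a box with 16 GB (16384 MB) of RAM gets a 13107 MB cap.
    print(sp_configure_script(16384))
```

Integer division rounds the cap down, which errs on the side of leaving a little more memory for the OS.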
I am using Solr 4.10.4, and I am having trouble understanding the log.
In the log for path=/update, what do the two numbers printed at the end of the line mean?
Example:
2018-12-03 20:06:30.969; org.apache.solr.update.processor.LogUpdateProcessor; [sample_index] webapp=/solr path=/update params={version=2.2} {commit=} 0 13500
What does '0 13500' mean?
The first number is the status of the request: 0 means that everything went as planned. The second number is the QTime, i.e. how long the request took, in milliseconds. In this case the commit took 13500 ms, or 13.5 seconds.
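A small sketch that pulls the two trailing numbers out of a `LogUpdateProcessor` line. It assumes, based only on the single example above, that the line always ends with `<status> <QTime>` and that QTime is in milliseconds; the regular expression is not derived from any documented Solr log format.

```python
import re

# Assumed layout: "... {commit=} <status> <qtime_ms>" at the very end of the line.
TRAILER = re.compile(r"\s(\d+)\s+(\d+)\s*$")


def parse_update_trailer(line: str) -> tuple[int, float]:
    """Return (status, seconds) from the end of a Solr /update log line."""
    match = TRAILER.search(line)
    if match is None:
        raise ValueError("line does not end with '<status> <QTime>'")
    status, qtime_ms = int(match.group(1)), int(match.group(2))
    return status, qtime_ms / 1000.0  # QTime is reported in milliseconds


line = ("2018-12-03 20:06:30.969; org.apache.solr.update.processor.LogUpdateProcessor; "
        "[sample_index] webapp=/solr path=/update params={version=2.2} {commit=} 0 13500")
status, seconds = parse_update_trailer(line)
print(f"status={status} ({'OK' if status == 0 else 'error'}), took {seconds:.1f} s")
```

For the example line this reports status 0 and 13.5 seconds, matching the explanation above.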
I'm running DSE 4.8.10 with 3 DSE Search nodes in my cluster, RF=3. I'm seeing messages in system.log like the ones below, and they always seem to come after a compaction. Is there a problem with the Solr indexes, or is there at least an explanation for these messages?
INFO [CompactionExecutor:12] 2016-11-14 23:09:31,243 CompactionTask.java:274 - Compacted 4 sstables to [/data/lib/cassandra/data/system/local/system-local-ka-13314,]. 1,564 bytes to 1,378 (~88% of original) in 17ms = 0.077304MB/s. 4 total partitions merged to 1. Partition merge counts were {4:1, }
INFO [Solr TTL scheduler-0] 2016-11-14 23:09:36,008 AbstractSolrSecondaryIndex.java:1689 - Found 200 rows with expired columns.
INFO [Solr TTL scheduler-0] 2016-11-14 23:09:36,053 AbstractSolrSecondaryIndex.java:1689 - Found 200 rows with expired columns.
INFO [Solr TTL scheduler-0] 2016-11-14 23:09:36,144 AbstractSolrSecondaryIndex.java:1689 - Found 200 rows with expired columns.
INFO [Solr TTL scheduler-0] 2016-11-14 23:09:36,187 AbstractSolrSecondaryIndex.java:1689 - Found 200 rows with expired columns.
INFO [Solr TTL scheduler-0] 2016-11-14 23:09:36,230 AbstractSolrSecondaryIndex.java:1689 - Found 200 rows with expired columns.
INFO [Solr TTL scheduler-0] 2016-11-14 23:09:36,270 AbstractSolrSecondaryIndex.java:1689 - Found 200 rows with expired columns.
INFO [Solr TTL scheduler-0] 2016-11-14 23:09:36,311 AbstractSolrSecondaryIndex.java:1689 - Found 200 rows with expired columns.
INFO [Solr TTL scheduler-0] 2016-11-14 23:09:36,353 AbstractSolrSecondaryIndex.java:1689 - Found 200 rows with expired columns.
INFO [Solr TTL scheduler-0] 2016-11-14 23:09:36,395 AbstractSolrSecondaryIndex.java:1689 - Found 200 rows with expired columns.
INFO [Solr TTL scheduler-0] 2016-11-14 23:09:36,436 AbstractSolrSecondaryIndex.java:1689 - Found 200 rows with expired columns.
INFO [Solr TTL scheduler-0] 2016-11-14 23:09:36,478 AbstractSolrSecondaryIndex.java:1689 - Found 200 rows with expired columns.
INFO [Solr TTL scheduler-0] 2016-11-14 23:09:36,519 AbstractSolrSecondaryIndex.java:1689 - Found 200 rows with expired columns.
INFO [Solr TTL scheduler-0] 2016-11-14 23:09:36,559 AbstractSolrSecondaryIndex.java:1689 - Found 200 rows with expired columns.
INFO [Solr TTL scheduler-0] 2016-11-14 23:09:36,600 AbstractSolrSecondaryIndex.java:1689 - Found 200 rows with expired columns.
INFO [Solr TTL scheduler-0] 2016-11-14 23:09:36,640 AbstractSolrSecondaryIndex.java:1689 - Found 200 rows with expired columns.
INFO [Solr TTL scheduler-0] 2016-11-14 23:09:36,681 AbstractSolrSecondaryIndex.java:1689 - Found 31 rows with expired columns.
I am assuming you have some TTL set on your data.
If you want to expire data in Cassandra, you don’t have much choice: you need a periodic task that somehow finds expired data and removes it. With lots of data, keeping this efficient can be a challenge. Cassandra already has a natural place for that kind of job: compaction. Compaction goes through your data periodically anyway, throwing away old versions of it, so it is easy and cheap to use it for data expiration too.
This might be the reason why you see those messages only after compaction.
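To make the idea concrete, here is a deliberately simplified toy model (not Cassandra's actual implementation) of a compaction pass that merges several SSTables, keeps only the newest version of each column, and drops columns whose TTL has elapsed. Real Cassandra first turns expired cells into tombstones and purges them only after `gc_grace_seconds`; that step, and the flat `Cell` layout used here, are simplifications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    key: str                    # partition/column identifier (simplified)
    value: str
    write_time: int             # seconds; the newer write wins
    ttl: Optional[int] = None   # seconds to live; None means it never expires

    def expired(self, now: int) -> bool:
        return self.ttl is not None and self.write_time + self.ttl <= now


def compact(sstables: list[list[Cell]], now: int) -> list[Cell]:
    """Toy compaction: merge tables, keep the newest cell per key, drop expired ones."""
    newest: dict[str, Cell] = {}
    for table in sstables:
        for cell in table:
            current = newest.get(cell.key)
            if current is None or cell.write_time > current.write_time:
                newest[cell.key] = cell
    # Expiration piggybacks on the merge pass that had to happen anyway.
    return sorted((c for c in newest.values() if not c.expired(now)), key=lambda c: c.key)


old = [Cell("a", "v1", write_time=0), Cell("b", "temp", write_time=0, ttl=60)]
new = [Cell("a", "v2", write_time=100)]
print(compact([old, new], now=120))  # "a" keeps v2; "b" has expired and is gone
```

The point of the sketch is only the cost argument: finding expired data needs a full pass over the data, and compaction already makes that pass.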
You can read more here:
http://www.datastax.com/dev/blog/whats-new-cassandra-07-expiring-columns
I am using Hive (version 0.11.0) and trying to join two tables.
One has 26,286,629 records and the other one has 931 records.
This is the query I am trying to run.
SELECT *
FROM table_1 hapmap a tabl_2 b
WHERE a.chrom = b.chrom
AND a.start_pos >= b.start_pos
AND a.end_pos <= b.end_pos
LIMIT 10
;
It looks fine for the first few minutes, but once both the map and reduce tasks reach 100%, the reduce phase starts again from 0%.
2014-02-20 20:38:35,531 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 399.23 sec
2014-02-20 20:38:36,533 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 399.23 sec
2014-02-20 20:38:37,536 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 399.23 sec
2014-02-20 20:38:38,539 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 399.23 sec
2014-02-20 20:38:39,541 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 192.49 sec
2014-02-20 20:38:40,544 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 192.49 sec
2014-02-20 20:38:41,547 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 192.49 sec
2014-02-20 20:38:42,550 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 192.49 sec
2014-02-20 20:38:43,554 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 192.49 sec
Here are the last 4 KB of the map and reduce task logs.
Map Task Log (Last 4KB)
.hadoop.mapred.MapTask: bufstart = 99180857; bufend = 16035989; bufvoid = 99614720
2014-02-20 19:57:03,008 INFO org.apache.hadoop.mapred.MapTask: kvstart = 196599; kvend = 131062; length = 327680
2014-02-20 19:57:03,180 INFO org.apache.hadoop.mapred.MapTask: Finished spill 12
2014-02-20 19:57:04,244 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2014-02-20 19:57:04,244 INFO org.apache.hadoop.mapred.MapTask: bufstart = 16035989; bufend = 32544041; bufvoid = 99614720
2014-02-20 19:57:04,244 INFO org.apache.hadoop.mapred.MapTask: kvstart = 131062; kvend = 65525; length = 327680
2014-02-20 19:57:04,399 INFO org.apache.hadoop.mapred.MapTask: Finished spill 13
2014-02-20 19:57:05,440 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2014-02-20 19:57:05,440 INFO org.apache.hadoop.mapred.MapTask: bufstart = 32544041; bufend = 48952648; bufvoid = 99614720
2014-02-20 19:57:05,440 INFO org.apache.hadoop.mapred.MapTask: kvstart = 65525; kvend = 327669; length = 327680
2014-02-20 19:57:05,598 INFO org.apache.hadoop.mapred.MapTask: Finished spill 14
2014-02-20 19:57:05,767 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 9 forwarding 4000000 rows
2014-02-20 19:57:05,767 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 4000000 rows
2014-02-20 19:57:05,767 INFO ExecMapper: ExecMapper: processing 4000000 rows: used memory = 123701072
2014-02-20 19:57:06,562 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 9 finished. closing...
2014-02-20 19:57:06,574 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 9 forwarded 4182243 rows
2014-02-20 19:57:06,575 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 2 finished. closing...
2014-02-20 19:57:06,575 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 2 forwarded 0 rows
2014-02-20 19:57:06,575 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 3 finished. closing...
2014-02-20 19:57:06,575 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 3 forwarded 0 rows
2014-02-20 19:57:06,575 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 2 Close done
2014-02-20 19:57:06,575 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
2014-02-20 19:57:06,575 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing...
2014-02-20 19:57:06,575 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarded 4182243 rows
2014-02-20 19:57:06,575 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 1 finished. closing...
2014-02-20 19:57:06,575 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 1 forwarded 0 rows
2014-02-20 19:57:06,575 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2014-02-20 19:57:06,575 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 9 Close done
2014-02-20 19:57:06,575 INFO ExecMapper: ExecMapper: processed 4182243 rows: used memory = 128772720
2014-02-20 19:57:06,577 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2014-02-20 19:57:06,713 INFO org.apache.hadoop.mapred.MapTask: Finished spill 15
2014-02-20 19:57:06,720 INFO org.apache.hadoop.mapred.Merger: Merging 16 sorted segments
2014-02-20 19:57:06,730 INFO org.apache.hadoop.mapred.Merger: Merging 7 intermediate segments out of a total of 16
2014-02-20 19:57:08,308 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10 segments left of total size: 272242546 bytes
2014-02-20 19:57:11,762 INFO org.apache.hadoop.mapred.Task: Task:attempt_201402201604_0005_m_000000_0 is done. And is in the process of commiting
2014-02-20 19:57:11,834 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201402201604_0005_m_000000_0' done.
2014-02-20 19:57:11,837 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2014-02-20 19:57:11,868 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2014-02-20 19:57:11,869 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName root for UID 0 from the native implementation
Reduce Task Log (Last 4KB)
x.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
at org.apache.hadoop.ipc.Client.call(Client.java:1113)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3720)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3580)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2783)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3023)
2014-02-20 20:11:32,743 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for null bad datanode[0] nodes == null
2014-02-20 20:11:32,743 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/tmp/hive-root/hive_2014-02-20_19-56-40_541_484791785779427461/_task_tmp.-ext-10001/_tmp.000000_0" - Aborting...
2014-02-20 20:11:32,744 ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close file /tmp/hive-root/hive_2014-02-20_19-56-40_541_484791785779427461/_task_tmp.-ext-10001/_tmp.000000_0
org.apache.hadoop.ipc.RemoteException:org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /tmp/hive-root/hive_2014-02-20_19-56-40_541_484791785779427461/_task_tmp.-ext-10001/_tmp.000000_0 File does not exist. Holder DFSClient_attempt_201402201604_0005_r_000000_0_636140892_1 does not have any open files
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1999)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1990)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1899)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
at org.apache.hadoop.ipc.Client.call(Client.java:1113)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3720)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3580)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2783)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3023)
NameNode log around 2014-02-20 20:11:32:
2014-02-20 20:01:28,769 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 93 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 46 SyncTimes(ms): 29
2014-02-20 20:11:32,984 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 94 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 47 SyncTimes(ms): 30
2014-02-20 20:11:32,987 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root cause:org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /tmp/hive-root/hive_2014-02-20_19-56-40_541_484791785779427461/_task_tmp.-ext-10001/_tmp.000000_0 File does not exist. Holder DFSClient_attempt_201402201604_0005_r_000000_0_636140892_1 does not have any open files
2014-02-20 20:11:32,987 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9000, call addBlock(/tmp/hive-root/hive_2014-02-20_19-56-40_541_484791785779427461/_task_tmp.-ext-10001/_tmp.000000_0, DFSClient_attempt_201402201604_0005_r_000000_0_636140892_1, null) from 172.27.250.92:55640: error: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /tmp/hive-root/hive_2014-02-20_19-56-40_541_484791785779427461/_task_tmp.-ext-10001/_tmp.000000_0 File does not exist. Holder DFSClient_attempt_201402201604_0005_r_000000_0_636140892_1 does not have any open files
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /tmp/hive-root/hive_2014-02-20_19-56-40_541_484791785779427461/_task_tmp.-ext-10001/_tmp.000000_0 File does not exist. Holder DFSClient_attempt_201402201604_0005_r_000000_0_636140892_1 does not have any open files
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1999)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1990)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1899)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
2014-02-20 20:14:48,995 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 96 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 48 SyncTimes(ms): 31
2014-02-20 20:18:28,063 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 172.27.114.218
2014-02-20 20:18:28,064 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 96 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 49 SyncTimes(ms): 32
Can anyone help me?
Thanks in advance.
Hive 0.11 is not able to infer JOIN conditions from the WHERE clause. The query as you wrote it will be executed as a cross-product join and only then filtered by the WHERE conditions; with 26,286,629 × 931 rows, that is roughly 24.5 billion candidate pairs. This is extremely expensive, so you should use this instead:
SELECT *
FROM table_1 a JOIN tabl_2 b ON a.chrom = b.chrom
WHERE
a.start_pos >= b.start_pos
AND a.end_pos <= b.end_pos
LIMIT 10;
Because the second table has only 931 records, this should be executed as a map join and will be much faster.
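Why a map join helps here: the 931-row table is small enough to hold in memory on every mapper, so each row of the large table is compared only against small-table rows with the same `chrom`, and no reduce-side join is needed. Below is a minimal in-memory sketch of that broadcast hash join idea with the range predicates applied afterwards. It is an illustration, not Hive's implementation, and the row layout (dicts with `chrom`, `start_pos`, `end_pos`) is assumed from the query.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_side_join(big: Iterable[dict], small: list[dict],
                  limit: int = 10) -> Iterator[tuple[dict, dict]]:
    """Broadcast hash join on chrom, then the range filters from the WHERE clause."""
    # Build phase: hash the small table by the equi-join key.
    by_chrom: dict[str, list[dict]] = defaultdict(list)
    for b in small:
        by_chrom[b["chrom"]].append(b)

    # Probe phase: stream the big table once; only same-chrom rows are compared.
    emitted = 0
    for a in big:
        for b in by_chrom.get(a["chrom"], ()):
            if a["start_pos"] >= b["start_pos"] and a["end_pos"] <= b["end_pos"]:
                yield a, b
                emitted += 1
                if emitted >= limit:  # mirrors LIMIT 10
                    return


small = [{"chrom": "chr1", "start_pos": 100, "end_pos": 200}]
big = [
    {"chrom": "chr1", "start_pos": 120, "end_pos": 150},  # inside the range
    {"chrom": "chr1", "start_pos": 90, "end_pos": 150},   # starts too early
    {"chrom": "chr2", "start_pos": 120, "end_pos": 150},  # different chrom
]
print(list(map_side_join(big, small)))  # only the first big row matches
```

A cross product would instead test every big row against all 931 small rows before any filtering happens.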
Single-threaded version description:
Program gathers a list of questions.
For each question, get model answers, and run each one through a scoring module.
Scoring module makes a number of (read-only) database queries.
Serial processing, single database connection.
I decided to multi-thread the program described above by splitting the question list into chunks and creating a thread for each one.
Each thread opens its own database connection and works on its own list of questions (about 95 questions on each of 6 threads). The application waits for all threads to finish, then aggregates the results for display.
To my surprise, the multi-threaded version ran in approximately the same time, taking about 16 seconds instead of 17.
Questions:
Why am I not seeing the kind of gain in performance I would expect from executing queries concurrently on separate threads with separate connections? Machine has 8 processors.
Will SQL Server process queries concurrently when they come from a single application, or might it (or .NET itself) be serializing them?
Might there be something misconfigured, that would make it go faster, or might I just be pushing SQL Server to its computational limits?
Current configuration:
Microsoft SQL Server Developer Edition 9.0.1406 RTM
OS: Windows Server 2003 Standard
Processors: 8
RAM: 4GB
This is just a shot in the dark, but I bet you are not seeing the performance gain because the queries serialize themselves in the database due to locking of shared resources (records). Now for the small print.
I assume your C# code is actually correct and that you really do start separate threads and issue each query in parallel. No offense, but I've seen many people make that claim when the client code was actually serial, for various reasons. You should validate this by monitoring the server (via Profiler, or by querying sys.dm_exec_requests and sys.dm_exec_sessions).
I also assume that your queries are of similar weight, i.e. you do not have one thread that lasts 15 seconds and five that last 100 ms.
In the absence of more details, the symptoms you describe suggest that you have a write operation at the beginning of each thread that takes an X (exclusive) lock on some resource. The first thread starts and locks the resource while the other five wait. When the first thread is done, it releases the resource, the next one grabs it, and the other four wait. So the last thread has to wait for the execution of all the other five. This would be extremely easy to troubleshoot by looking at sys.dm_exec_requests and monitoring what blocks the requests.
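A quick way to see why one exclusive lock erases the benefit of parallel threads: in the sketch below, six threads each do 50 ms of "work" (`time.sleep`, standing in for a query), once while all of them need the same `threading.Lock` and once without it. The lock is only an analogy for the X lock described above, not a model of SQL Server's lock manager, and the thread count and timings are arbitrary.

```python
import threading
import time


def run(threads: int, work_s: float, shared_lock: bool) -> float:
    """Start `threads` workers that each 'work' for work_s seconds; return wall time."""
    lock = threading.Lock()

    def worker() -> None:
        if shared_lock:
            with lock:              # every worker needs the same exclusive resource
                time.sleep(work_s)
        else:
            time.sleep(work_s)      # independent work, free to overlap

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    start = time.perf_counter()
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"with shared lock: {run(6, 0.05, True):.2f} s")   # at least 6 x 0.05 s
    print(f"without lock:     {run(6, 0.05, False):.2f} s")  # roughly 0.05 s
```

With the shared lock the wall time can never drop below the sum of all the work, which is the "last thread waits for the other five" pattern.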
By the way, you should consider setting Asynchronous Processing=true in the connection string and relying on async methods like BeginExecuteReader to run your commands in parallel without the overhead of client-side threads.
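`BeginExecuteReader` is the .NET API the answer refers to; the sketch below shows the same idea with Python's `asyncio` instead (a swapped-in technique, not ADO.NET): several queries are started from one thread and awaited together, so their waits overlap without one OS thread per query. `fake_query` is a hypothetical stand-in that only sleeps; a real version would await an async database driver.

```python
import asyncio
import time


async def fake_query(question_id: int, latency_s: float) -> tuple[int, str]:
    """Hypothetical stand-in for a read-only query; the await is where I/O would wait."""
    await asyncio.sleep(latency_s)
    return question_id, f"answers for question {question_id}"


async def run_all(question_ids: list[int], latency_s: float = 0.05) -> list[tuple[int, str]]:
    # All coroutines are scheduled at once on a single thread; gather keeps input order.
    return await asyncio.gather(*(fake_query(q, latency_s) for q in question_ids))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(run_all(list(range(6))))
    print(len(results), f"queries in {time.perf_counter() - start:.2f} s")
```

As with client-side threads, this only helps if the server is actually able to run the queries concurrently.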
You can simply check Task Manager while the process is running. If it shows 100% CPU usage, then it's CPU-bound; otherwise, it's most likely I/O-bound.
With hyperthreading, 50% CPU usage is roughly equal to 100% usage, since each physical core shows up as two logical processors!
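If you would rather measure than eyeball Task Manager, one rough in-process check is to compare the CPU time consumed with the wall-clock time elapsed. The helper below is a standard-library sketch (`time.process_time` and `time.perf_counter`): a ratio near 1 per busy thread suggests CPU-bound work, while a ratio near 0 means the process mostly waited, for example on disk, network, or the database. It measures only the process it runs in, not SQL Server itself, and `busy`/`waiting` are illustrative stand-ins.

```python
import time
from typing import Callable


def cpu_ratio(fn: Callable[[], object]) -> float:
    """Return CPU seconds used by this process divided by wall seconds while fn runs."""
    cpu0, wall0 = time.process_time(), time.perf_counter()
    fn()
    cpu = time.process_time() - cpu0
    wall = time.perf_counter() - wall0
    return cpu / wall if wall > 0 else 0.0


def busy() -> None:
    # Pure computation: keeps one core busy for about 0.2 s.
    end = time.perf_counter() + 0.2
    while time.perf_counter() < end:
        pass


def waiting() -> None:
    # Stand-in for blocking I/O: the process sleeps and uses almost no CPU.
    time.sleep(0.2)


if __name__ == "__main__":
    print(f"busy loop: {cpu_ratio(busy):.2f}")     # typically close to 1.0
    print(f"sleeping:  {cpu_ratio(waiting):.2f}")  # typically close to 0.0
```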
Wow, I didn't realize how old this thread was. I guess it's always good to leave the response for others looking.
How large is your database?
How fast are your HDDs / RAID / other storage?
Perhaps your DB is I/O bound?
My first inclination is that you're trying to solve an I/O problem with threads, which almost never works. I/O is I/O, and more threads don't widen the pipe. You'd be better off downloading all questions and their answers in one batch and processing the batch locally with multiple threads.
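A minimal sketch of the "one batch, then process locally" approach. It uses an in-memory `sqlite3` database only so the example is self-contained (the question uses SQL Server), and the `model_answers` table and the `score` function are hypothetical placeholders for the real schema and scoring module. One query pulls every model answer; the scoring is then fanned out over a thread pool with no further database round-trips. In CPython, threads only speed up CPU-heavy scoring if it releases the GIL; for pure-Python scoring, `ProcessPoolExecutor` is the usual swap.

```python
import sqlite3
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def load_all_answers(conn: sqlite3.Connection) -> dict[int, list[str]]:
    """One round-trip: fetch every model answer and group it by question id."""
    grouped: dict[int, list[str]] = defaultdict(list)
    for question_id, answer in conn.execute(
        "SELECT question_id, answer FROM model_answers ORDER BY question_id"
    ):
        grouped[question_id].append(answer)
    return dict(grouped)


def score(item: tuple[int, list[str]]) -> tuple[int, int]:
    """Hypothetical scoring module: here just the total answer length per question."""
    question_id, answers = item
    return question_id, sum(len(a) for a in answers)


def main() -> dict[int, int]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE model_answers (question_id INTEGER, answer TEXT)")
    conn.executemany(
        "INSERT INTO model_answers VALUES (?, ?)",
        [(1, "alpha"), (1, "beta"), (2, "gamma")],
    )
    answers = load_all_answers(conn)
    conn.close()  # no more database I/O from here on

    # Fan the local work out; the worker threads never touch the database.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return dict(pool.map(score, answers.items()))


if __name__ == "__main__":
    print(main())  # {1: 9, 2: 5}
```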
Having said that, you're probably experiencing some db locking that is causing slowness. Since you're talking about read-only queries, try using the with (nolock) hint on your queries to see if that helps.
Regarding SQL Server processing, it is my understanding that SQL Server will try to process as many connections concurrently as possible (one statement at a time per connection), up to the maximum number of connections allowed by the configuration. The kind of issue you're seeing is almost never a threading issue and almost always a locking or I/O problem.
Is it possible that the threads share a connection? Did you verify that multiple SPIDs are created when this runs (sp_who)?
I ran a join query across sys.dm_os_workers, sys.dm_os_tasks, and sys.dm_exec_requests on task_address, and here are the results (some uninteresting/zero-valued fields excluded, others prefixed with ex or os to resolve ambiguities):
-COL_NAME- -Thread_1- -Thread_2- -Thread_3- -Thread_4-
task_state SUSPENDED SUSPENDED SUSPENDED SUSPENDED
context_switches_count 2 2 2 2
worker_address 0x3F87A0E8 0x5993E0E8 0x496C00E8 0x366FA0E8
is_in_polling_io_completion_routine 0 0 0 0
pending_io_count 0 0 0 0
pending_io_byte_count 0 0 0 0
pending_io_byte_average 0 0 0 0
wait_started_ms_ticks 1926478171 1926478187 1926478171 1926478187
wait_resumed_ms_ticks 1926478171 1926478187 1926478171 1926478187
task_bound_ms_ticks 1926478171 1926478171 1926478156 1926478171
worker_created_ms_ticks 1926137937 1923739218 1921736640 1926137890
locale 1033 1033 1033 1033
affinity 1 4 8 32
state SUSPENDED SUSPENDED SUSPENDED SUSPENDED
start_quantum 3074730327955210 3074730349757920 3074730321989030 3074730355017750
end_quantum 3074730334339210 3074730356141920 3074730328373030 3074730361401750
quantum_used 6725 11177 11336 6284
max_quantum 4 15 5 20
boost_count 999 999 999 999
tasks_processed_count 765 1939 1424 314
os.task_address 0x006E8A78 0x00AF12E8 0x00B84C58 0x00D2CB68
memory_object_address 0x3F87A040 0x5993E040 0x496C0040 0x366FA040
thread_address 0x7FF08E38 0x7FF8CE38 0x7FF0FE38 0x7FF92E38
signal_worker_address 0x4D7DC0E8 0x571360E8 0x2F8560E8 0x4A9B40E8
scheduler_address 0x006EC040 0x00AF4040 0x00B88040 0x00E40040
os.request_id 0 0 0 0
start_time 2009-05-26 19:39 39:43.2 39:43.2 39:43.2
ex.status suspended suspended suspended suspended
command SELECT SELECT SELECT SELECT
sql_handle 0x020000009355F1004BDC90A51664F9174D245A966E276C61 0x020000009355F1004D8095D234D39F77117E1BBBF8108B26 0x020000009355F100FC902C84A97133874FBE4CA6614C80E5 0x020000009355F100FC902C84A97133874FBE4CA6614C80E5
statement_start_offset 94 94 94 94
statement_end_offset -1 -1 -1 -1
plan_handle 0x060007009355F100B821C414000000000000000000000000 0x060007009355F100B8811331000000000000000000000000 0x060007009355F100B801B259000000000000000000000000 0x060007009355F100B801B259000000000000000000000000
database_id 7 7 7 7
user_id 1 1 1 1
connection_id BABF5455-409B-4F4C-9BA5-B53B35B11062 A2BBCACF-D227-466A-AB08-6EBB56F34FF2 D330EDFE-D49B-4148-B7C5-8D26FE276D30 649F0EC5-CB97-4B37-8D4E-85761847B403
blocking_session_id 0 0 0 0
wait_type CXPACKET CXPACKET CXPACKET CXPACKET
wait_time 46 31 46 31
ex.last_wait_type CXPACKET CXPACKET CXPACKET CXPACKET
wait_resource
open_transaction_count 0 0 0 0
open_resultset_count 1 1 1 1
transaction_id 3052202 3052211 3052196 3052216
context_info 0x 0x 0x 0x
percent_complete 0 0 0 0
estimated_completion_time 0 0 0 0
cpu_time 0 0 0 0
total_elapsed_time 54 41 65 39
reads 0 0 0 0
writes 0 0 0 0
logical_reads 78745 123090 78672 111966
text_size 2147483647 2147483647 2147483647 2147483647
arithabort 0 0 0 0
transaction_isolation_level 2 2 2 2
lock_timeout -1 -1 -1 -1
deadlock_priority 0 0 0 0
row_count 6 0 1 1
prev_error 0 0 0 0
nest_level 2 2 2 2
granted_query_memory 512 512 512 512
The estimated query plan for all queries shows a couple of nodes: 0% for the SELECT and 100% for a clustered index seek.
Edit: The fields and values I left out were (the same for all 4 threads, except for context_switch_count): exec_context_id(0), host_address(0x00000000), status(0), is_preemptive(0), is_fiber(0), is_sick(0), is_in_cc_exception(0), is_fatal_exception(0), is_inside_catch(0), context_switch_count(3-89078), exception_num(0), exception_Severity(0), exception_address(0x00000000), return_code(0), fiber_address(NULL), language(us_english), date_format(mdy), date_first(7), quoted_identifier(1), ansi_defaults(0), ansi_warnings(1), ansi_padding(1), ansi_nulls(1), concat_null_yields_null(1), executing_managed_code(0)