SymmetricDS Could Not Find Batch to Acknowledge as OK

I'm doing a bi-directional push across three tiers of nodes.
Why do the 1st and 2nd tier nodes keep spamming errors like these?
The 1st tier node logs:
"IncomingBatchService - Skipping batch x"
"DataLoaderService - x data and x batches loaded during push request from 2nd tier. There were x batches in error."
The 2nd tier node logs:
"PushService - Push data sent to 3rd tier"
"AcknowledgeService - Could not find batch to acknowledge as OK"
"PushService - Pushed data to 3rd tier. x data and x batches were processed"
After checking the DBs:
On the 2nd tier node, the batch points to the 3rd tier node with LD status on the reload channel. There is no batch with the same ID pointing to the 1st tier node.
On the 1st tier node, the batch points to the 2nd tier node with OK status on the reload channel.
Help, thank you.

There must be logs on the target nodes with exceptions thrown by the data loader as it tries to load the batches in error. Find them and they'll tell you what's wrong.
There's a mistake in the 3rd tier node's configuration. sync.url should be http://<3rd_tier_node_IP>/sync/<engine.name>
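
To make the expected shape of sync.url concrete, here is a minimal Python sketch that builds the value from its parts. The function name build_sync_url and the optional port argument are my own additions for illustration, not part of SymmetricDS; the pattern itself is the one given in the answer above.

```python
from typing import Optional


def build_sync_url(host: str, engine_name: str, port: Optional[int] = None) -> str:
    """Build a sync.url of the form http://<host>[:<port>]/sync/<engine.name>.

    Hypothetical helper: it only assembles the string, it does not talk to
    SymmetricDS or validate that the node is reachable.
    """
    if not host or not engine_name:
        raise ValueError("host and engine_name must be non-empty")
    # The port is optional because the answer's pattern shows none.
    authority = host if port is None else f"{host}:{port}"
    return f"http://{authority}/sync/{engine_name}"


if __name__ == "__main__":
    # Example values are placeholders, not taken from the question.
    print(build_sync_url("10.0.0.3", "tier3"))
```

Comparing the generated string with the sync.url actually registered for the 3rd tier node should show quickly whether the host or the engine name part is off.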


TDengine database 2.6 cluster failed to build

The first node already had data. I am now trying to add a second node, tdengine-server-b, but it reports "status not received" and cannot be added.
How can I add the second node, tdengine-server-b?

batch query is not allowed to request data from "".""

I'm getting started with Kapacitor and have been trying to run the first guide in the Kapacitor documentation, but with data I already have. I managed to define a task, but I can neither enable it nor can I run a backfill. I came across this question, which is similar to my problem, but the answer there didn't help. In contrast to the error message there I get empty strings for database, retention policy, and/or measurement.
In Kapacitor config I set an InfluxDB connection to the local host instance with the name localhost (which has a database mydb and the measurements weather.current.clouds and weather.current.visibility with default retention policy autogen) and created the following weathertest.tick script:
dbrp "mydb"."autogen"

var clouds = batch
    |query('select mean(value) / 100.0 as val from "mydb"."autogen"."weather.current.clouds"')
        .period(1h)
        .every(1h)
        .groupBy(time(1m), *)
        .fill(0)

var vis = batch
    |query('select mean(value) / 10000.0 as val from "mydb"."autogen"."weather.current.visibility"')
        .period(1h)
        .every(1h)
        .groupBy(time(1m), *)
        .fill(0)

clouds
    |join(vis)
        .as('c', 'v')
    |eval(lambda: 100 * (1 - "c.val") * "v.val")
        .as('pcent')
    |influxDBOut()
        .cluster('localhost')
        .database('mydb')
        .retentionPolicy('autogen')
        .measurement('testmetric')
        .tag('host', 'myhost.local')
        .tag('key', 'weather.current.lightidx')
This is what I came up with after hours of trial and (especially) error. As given in the title, when I try to enable my task with kapacitor enable weathertest, I get the error message enabling task weathertest: batch query is not allowed to request data from ""."". The same thing happens when I try to record as in the "Backfill" example. Also, that example uses a start and a stop date to limit the time frame, but the time format given there is wrong and is not understood by Kapacitor. Instead of e.g. 2015-10-01, I have to write 2015-10-01T00:00Z to at least get past the time format error.
In the Kapacitor logs there is not a single line regarding these errors, only when I try to remove a record, I get something like remove /var/lib/kapacitor/replay/1f5...750.brpl: no such file or directory and this can be found in the logs. There are lots of info lines in the logs showing successful POSTs to/from InfluxDB for the _internal database with HTTP response result 204.
Does anyone have an idea what I may be doing wrong?
OK, after the weekend I tried again. Without any change on my side, the previously failing steps now accepted my script; however, I was now able to find error messages in the log. The node mentioned there was the eval node, and the messages pointed towards a type mismatch. When I changed the line
|eval(lambda: 100 * (1 - "c.val") * "v.val")
to
|eval(lambda: 100.0 * (1.0 - "c.val") * "v.val")
the error messages were gone and the command kapacitor show weathertest showed a rather sane content now.
Furthermore, I redefined, recorded, replayed and deleted the tasks and recordings during my tests over and over again and I may have forgotten to redefine tasks after making changes to the tick script (not really sure). After changing the above, redefining the task and replaying it I finally found the expected data in the InfluxDB instance.
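
On the time-format point from the question: a small Python sketch that turns a plain date into the timestamp form the question reports as accepted (2015-10-01T00:00Z). The function name to_kapacitor_time is made up for illustration, and treating the date as midnight UTC is an assumption on my part.

```python
from datetime import datetime, timezone


def to_kapacitor_time(day: str) -> str:
    """Convert 'YYYY-MM-DD' into 'YYYY-MM-DDT00:00Z' (midnight, UTC assumed).

    Hypothetical helper; the output format mirrors the one quoted in the
    question, it is not taken from the Kapacitor documentation.
    """
    # strptime raises ValueError for anything that is not YYYY-MM-DD.
    parsed = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return parsed.strftime("%Y-%m-%dT%H:%MZ")


if __name__ == "__main__":
    print(to_kapacitor_time("2015-10-01"))
```

The result can be pasted as the start or stop value when recording, in place of the bare date from the guide.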

Solr recovery mode

I am running Solr cluster 7.4 with 2 nodes and 9 shards and 2 replicas for each shard.
When one of the servers crashes, I see this message (Skipping download for _3nap.fnm because it already exists) in logs:
2019-04-16 09:20:21.333 INFO (recoveryExecutor-4-thread-36-processing-n:192.168.1.2:4239_solr x:telegram_channel_post_archive_shard5_replica_n53 c:telegram_channel_post_archive s:shard5 r:core_node54) [c:telegram_channel_post_archive s:shard5 r:core_node54 x:telegram_channel_post_archive_shard5_replica_n53] o.a.s.h.IndexFetcher Skipping download for _3nap.fnm because it already exists
2019-04-16 09:20:35.265 INFO (recoveryExecutor-4-thread-36-processing-n:192.168.1.2:4239_solr x:telegram_channel_post_archive_shard5_replica_n53 c:telegram_channel_post_archive s:shard5 r:core_node54) [c:telegram_channel_post_archive s:shard5 r:core_node54 x:telegram_channel_post_archive_shard5_replica_n53] o.a.s.h.IndexFetcher Skipping download for _3nap.dim because it already exists
2019-04-16 09:20:51.437 INFO (recoveryExecutor-4-thread-36-processing-n:192.168.1.2:4239_solr x:telegram_channel_post_archive_shard5_replica_n53 c:telegram_channel_post_archive s:shard5 r:core_node54) [c:telegram_channel_post_archive s:shard5 r:core_node54 x:telegram_channel_post_archive_shard5_replica_n53] o.a.s.h.IndexFetcher Skipping download for _3nap.si because it already exists
2019-04-16 09:21:00.528 INFO (qtp1543148593-32) [c:telegram_channel_post_archive s:shard20 r:core_node41 x:telegram_channel_post_archive_shard20_replica_n38] o.a.s.u.p.LogUpdateProcessorFactory [telegram_channel_post_archive_shard20_replica_n38] webapp=/solr path=/update params={update.distrib=FROMLEADER&update.chain=dedupe&distrib.from=http://192.168.1.1:4239/solr/telegram_channel_post_archive_shard20_replica_n83/&min_rf=2&wt=javabin&version=2}{add=[9734588300_4723 (1630961769251864576), 9734588300_4693 (1630961769253961728), 9734588300_4670 (1630961769255010304), 9734588300_4656 (1630961769255010305)]} 0 80197
How does recovery work in Solr?
Does it transfer all the documents of the shard, or only the broken parts?
I found this note in the documentation:
If a leader goes down, it may have sent requests to some replicas and not others. So when a new potential leader is identified, it runs a synch process against the other replicas. If this is successful, everything should be consistent, the leader registers as active, and normal actions proceed. If a replica is too far out of sync, the system asks for a full replication/replay-based recovery.
but I don't understand this part. What does it mean?
If a replica is too far out of sync
The note just says that Solr will attempt to sync as little as possible, but if that's not possible, i.e. the replica is so far behind that the transaction log isn't usable any longer, the complete set of files in the index will be replicated to the replica. This takes longer than regular replication.
The message you're getting says that the file in question has already been replicated, so it doesn't have to be sent to the replica again.
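
As a rough illustration of the two ideas above (sync a little if you can, copy everything if you can't, and skip files the replica already has), here is a toy Python model. It is not Solr's code: the function names, the 100-update window, and the file name _3nb0.cfs are assumptions chosen for the example, and the real fetcher compares more than file names.

```python
def choose_recovery(missed_updates: int, tlog_window: int = 100) -> str:
    """Pick a recovery strategy for a replica (toy model).

    tlog_window is a hypothetical limit on how many missed updates can
    still be replayed from the transaction log.
    """
    if missed_updates <= tlog_window:
        return "peer-sync"          # replay only the missing updates
    return "full-replication"       # too far out of sync: copy index files


def files_to_fetch(leader_files, local_files):
    """Return the leader's index files the replica does not have yet.

    Simplification: files are compared by name only.
    """
    return sorted(set(leader_files) - set(local_files))


if __name__ == "__main__":
    leader = ["_3nap.fnm", "_3nap.dim", "_3nap.si", "_3nb0.cfs"]
    local = ["_3nap.fnm", "_3nap.dim", "_3nap.si"]
    print(choose_recovery(500))           # far behind, so a full copy
    print(files_to_fetch(leader, local))  # files already present are skipped
```

In this model the three _3nap files are skipped because they already exist locally, which is the situation the "Skipping download" log lines describe.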

WSO2 Message Broker Error while adding Queue - Invalid Object Name

I have just set up a WSO2 Message Broker 3.0.0 connecting to a SQL Server DB.
The DB for the Carbon MB component has been created successfully as well.
The DB for the Message Broker Data store is created and contains the table MB_QUEUE_MAPPING.
However when adding a Queue via the MB UI I see the following error in the stack trace:
[2015-12-16 15:00:41,472] ERROR {org.wso2.andes.store.rdbms.RDBMSMessageStoreImpl} - Error occurred while retrieving destination queue id for destination queue TestQ
java.sql.SQLException: Invalid object name 'MB_QUEUE_MAPPING'.
at net.sourceforge.jtds.jdbc.SQLDiagnostic.addDiagnostic(SQLDiagnostic.java:372)
at net.sourceforge.jtds.jdbc.TdsCore.tdsErrorToken(TdsCore.java:2988)
at net.sourceforge.jtds.jdbc.TdsCore.nextToken(TdsCore.java:2421)
at net.sourceforge.jtds.jdbc.TdsCore.getMoreResults(TdsCore.java:671)
at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQLQuery(JtdsStatement.java:505)
at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.executeQuery(JtdsPreparedStatement.java:1029)
at org.wso2.andes.store.rdbms.RDBMSMessageStoreImpl.getQueueID(RDBMSMessageStoreImpl.java:1324)
at org.wso2.andes.store.rdbms.RDBMSMessageStoreImpl.getCachedQueueID(RDBMSMessageStoreImpl.java:1298)
at org.wso2.andes.store.rdbms.RDBMSMessageStoreImpl.addQueue(RDBMSMessageStoreImpl.java:1634)
at org.wso2.andes.store.FailureObservingMessageStore.addQueue(FailureObservingMessageStore.java:445)
at org.wso2.andes.kernel.AMQPConstructStore.addQueue(AMQPConstructStore.java:116)
at org.wso2.andes.kernel.AndesContextInformationManager.createQueue(AndesContextInformationManager.java:154)
at org.wso2.andes.kernel.disruptor.inbound.InboundQueueEvent.updateState(InboundQueueEvent.java:151)
at org.wso2.andes.kernel.disruptor.inbound.InboundEventContainer.updateState(InboundEventContainer.java:167)
at org.wso2.andes.kernel.disruptor.inbound.StateEventHandler.onEvent(StateEventHandler.java:67)
at org.wso2.andes.kernel.disruptor.inbound.StateEventHandler.onEvent(StateEventHandler.java:41)
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
The "Add Queue" screen does not go away; however, the Queue does get added to the MB_QUEUE table just fine in the DB. Both tables MB_QUEUE_MAPPING and MB_QUEUE_COUNTER are empty.
The "List Queues" screen comes up blank despite a number of Queues in the MB_QUEUE table. The stack trace also shows errors there, but I have not included them as they are not relevant to the error above.
I can create a Topic just fine, however.
I want to know why MB would say the table MB_QUEUE_MAPPING is an invalid object name when the table clearly exists.
I suspect the way you have configured the MySQL database is incorrect. To narrow the issue down, try one of the two scenarios below:
1) start the server for the first time with the -Dsetup parameter, or
2) refer to the "Configuring MySQL" documentation (https://docs.wso2.com/display/MB300/Configuring+MySQL) and follow the step-by-step instructions in order.
I tried the second scenario and did not get any exception when adding a queue. The document I mentioned needs to be updated as shown below.
You can see this command in step 3:
mysql -u <db_user_name> -p -D<database_name> < '<WSO2MB_HOME>/dbscripts/mb-store/mysql-mb.sql'
db_user_name - the username of the DB user.
database_name - the name of the database you created in step 1.
WSO2MB_HOME - the home directory path of MB.
I hope this helps you resolve the issue.
It seems the user connecting to the MSSQL database does not have the correct permissions, most probably the SELECT permission. The reason I say this is that when you add a queue, it does get added, which means the user has INSERT permission. Once the queue is added, the page redirects to the Queue List page, and the user must have SELECT permission to retrieve the queue list. Topics are not added to the database; they are kept in the registry. You can verify which user connects to MSSQL from the configuration in wso2mb-3.0.0/repository/conf/datasources/master-datasources.xml, like below.
<datasource>
    <name>WSO2_MB_STORE_DB</name>
    <jndiConfig>
        <name>WSO2MBStoreDB</name>
    </jndiConfig>
    <definition type="RDBMS">
        <configuration>
            <url>jdbc:jtds:sqlserver://localhost:1433/wso2_mb</url>
            <username>sa</username>
            <password>sa</password>
            <driverClassName>net.sourceforge.jtds.jdbc.Driver</driverClassName>
            <maxActive>200</maxActive>
            <maxWait>60000</maxWait>
            <minIdle>5</minIdle>
            <testOnBorrow>true</testOnBorrow>
            <validationQuery>SELECT 1</validationQuery>
            <validationInterval>30000</validationInterval>
            <defaultAutoCommit>false</defaultAutoCommit>
        </configuration>
    </definition>
</datasource>
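
If you would rather not read the file by eye, a short Python sketch like the following can pull the JDBC URL and username out of a datasource entry. The function name datasource_login and the trimmed SAMPLE string are mine, for illustration only; the element names are the ones shown in the configuration above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the datasource entry above, used as sample input.
SAMPLE = """<datasource>
    <name>WSO2_MB_STORE_DB</name>
    <jndiConfig><name>WSO2MBStoreDB</name></jndiConfig>
    <definition type="RDBMS">
        <configuration>
            <url>jdbc:jtds:sqlserver://localhost:1433/wso2_mb</url>
            <username>sa</username>
        </configuration>
    </definition>
</datasource>"""


def datasource_login(xml_text: str, datasource_name: str) -> dict:
    """Return the url and username configured for one named datasource.

    Hypothetical helper: it only reads the XML, it does not connect to
    the database or check any permissions.
    """
    root = ET.fromstring(xml_text)
    # iter() also yields the root itself, so a bare <datasource> works too.
    for ds in root.iter("datasource"):
        if (ds.findtext("name") or "").strip() == datasource_name:
            cfg = ds.find("definition/configuration")
            if cfg is None:
                raise ValueError("datasource has no <configuration> element")
            return {
                "url": (cfg.findtext("url") or "").strip(),
                "username": (cfg.findtext("username") or "").strip(),
            }
    raise KeyError(datasource_name)


if __name__ == "__main__":
    print(datasource_login(SAMPLE, "WSO2_MB_STORE_DB"))
```

The username it prints is the login whose permissions on MB_QUEUE_MAPPING you would then check on the SQL Server side.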

New Solr node in "Active - Joining" state for several days

We are trying to add a new Solr node to our cluster:
DC Cassandra
    Cassandra node 1
DC Solr
    Solr node 1 <-- new node (actually, a replacement for an old node; we followed the steps for "replacing a dead node")
    Solr node 2
    Solr node 3
    Solr node 4
    Solr node 5
Our Cassandra data is approximately 962 GB. The replication factor is 1 for both DCs. Is it normal for the new node to be in the "Active - Joining" state for several days? Is there a way to see the progress?
Last week, there was a time when we had to kill and restart the DSE process because it began throwing "too many open files" exception. Right now, the system log is full of messages about completed compaction/flushing tasks (no errors so far).
EDIT:
The node is still in the "Active - Joining" state as of this moment. It's been exactly a week since we restarted the DSE process on that node. I started monitoring the size of the solr.data directory yesterday and so far I haven't seen an increase. The system.log is still filled with compacting/flushing messages.
One thing that bothers me is that in OpsCenter Nodes screen (ring/list view), the node is shown under the "Cassandra" DC even though the node is a Solr node. In nodetool status, nodetool ring, and dsetool ring, the node is listed under the correct DC.
EDIT:
We decided to restart the bootstrap process from scratch by deleting the data and commitlog directories. Unfortunately, during the subsequent bootstrap attempt:
The stream from node 3 to node 1 (the new node) failed with an exception: ERROR [STREAM-OUT-/] 2014-04-01 01:14:40,887 CassandraDaemon.java (line 196) Exception in thread Thread[STREAM-OUT-/,5,main]
The stream from node 4 to node 1 never started. The last relevant line in node 4's system.log is: StreamResultFuture.java (line 116) Received streaming plan for Bootstrap. It should have been followed by: Prepare completed. Receiving 0 files(0 bytes), sending x files(y bytes)
How can I force those streams to be retried?
