Unable to set up replication in ClickHouse using ZooKeeper - database

I've spent the past two days trying to set up replication in ClickHouse, but whatever configuration I try, I end up with the same behavior.
I'm able to create a ReplicatedMergeTree table on the first node and insert data into it. Then I create a replica on the second node. The data gets replicated, and I can see it when querying the second node. But when I insert data into the second node, the weird behavior starts: the data is not copied to the first node, which logs the following error:
2017.11.14 11:16:43.464565 [ 30 ] <Error> DB::StorageReplicatedMergeTree::queueTask()::<lambda(DB::StorageReplicatedMergeTree::LogEntryPtr&)>: Code: 33, e.displayText() = DB::Exception: Cannot read all data, e.what() = DB::Exception,
It is very similar to this issue on GitHub.
When I restart the first node, it is able to load the new data inserted into the second node and seems to be working. However, inserting more data brings the same error back.
The most recent setup I tried:
Following the tutorial, I have a three node Zookeeper cluster with the following config:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zoo2/data
clientPort=12181
server.1=10.201.1.4:2888:3888
server.2=0.0.0.0:12888:13888
server.3=10.201.1.4:22888:23888
The ZooKeeper config for ClickHouse looks like this:
<?xml version="1.0"?>
<yandex>
    <zookeeper>
        <node>
            <host>10.201.1.4</host>
            <port>2181</port>
        </node>
        <node>
            <host>10.201.1.4</host>
            <port>12181</port>
        </node>
        <node>
            <host>10.201.1.4</host>
            <port>22181</port>
        </node>
    </zookeeper>
</yandex>
I create all tables like this:
CREATE TABLE t_r (
    id UInt32,
    d Date
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/t_r', '03', d, (d, id), 8192);
The only difference across the replicas is the replica id '03', which is set accordingly.
Thanks for any advice!

Actually, I figured out the issue myself. Thanks to #egorlitvinenko, I went through all the configs again and noticed that I had set the same interserver_http_port on all three nodes. That would not be a problem if the nodes were running on separate machines, but in my test scenario they run side by side on the same OS.
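A quick way to catch this kind of mistake is to compare the interserver_http_port values of all nodes that share a host. The following is only a minimal sketch: the three inline configs and the node names are made up for illustration, and it assumes the setting sits directly under the <yandex> root element as in the config shown above.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def read_interserver_port(config_xml: str) -> int:
    """Extract <interserver_http_port> from a ClickHouse server config string."""
    root = ET.fromstring(config_xml)
    node = root.find("interserver_http_port")  # direct child of the root element
    if node is None or not (node.text or "").strip():
        raise ValueError("interserver_http_port is not set")
    return int(node.text.strip())


def find_port_collisions(configs: dict) -> dict:
    """Map each port used by more than one node to the sorted node names."""
    by_port = defaultdict(list)
    for name, xml_text in configs.items():
        by_port[read_interserver_port(xml_text)].append(name)
    return {port: sorted(names) for port, names in by_port.items() if len(names) > 1}


# Hypothetical configs for three nodes running on the same machine.
configs = {
    "node1": "<yandex><interserver_http_port>9009</interserver_http_port></yandex>",
    "node2": "<yandex><interserver_http_port>9009</interserver_http_port></yandex>",
    "node3": "<yandex><interserver_http_port>9011</interserver_http_port></yandex>",
}
collisions = find_port_collisions(configs)
print(collisions)
```

Any port reported here is shared by several nodes on one host, which is the situation described above.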

ReplicatedMergeTree('/clickhouse/tables/t_r', '03', d, (d, id), 8192);
You should configure a unique replica id in ZooKeeper for each replica. Currently you use '03', which is not correct.
In the tutorial, {replica} is a macro, which is configured in the ClickHouse config file on each node.
See https://clickhouse.yandex/docs/en/table_engines/replication.html#replicatedmergetree
P.S. For further help, please provide the config of all nodes.
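To illustrate the point about unique replica ids, here is a rough sketch that builds the CREATE TABLE statement for each replica from one template, keeping the ZooKeeper path identical and varying only the replica name. The replica names '01', '02', '03' and the helper names are assumptions for the example, not something taken from the question.

```python
# Same table definition as in the question, with the ZooKeeper path and
# the replica name left as placeholders.
DDL_TEMPLATE = (
    "CREATE TABLE t_r (id UInt32, d Date) "
    "ENGINE = ReplicatedMergeTree('{zk_path}', '{replica}', d, (d, id), 8192);"
)


def build_ddl(zk_path: str, replica: str) -> str:
    """Render the statement for a single replica."""
    return DDL_TEMPLATE.format(zk_path=zk_path, replica=replica)


def build_all(zk_path: str, replicas: list) -> dict:
    """Render one statement per replica, refusing duplicate replica names."""
    if len(set(replicas)) != len(replicas):
        raise ValueError("replica names must be unique")
    return {replica: build_ddl(zk_path, replica) for replica in replicas}


# Hypothetical replica names, one per node.
statements = build_all("/clickhouse/tables/t_r", ["01", "02", "03"])
for replica, ddl in statements.items():
    print(replica, ddl)
```

Each node would run only its own statement; the shared path is what ties the replicas together.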


DataflowTemplates debezium connector issue

I am following this document to stream data from MySQL (installed on my local machine) to Pub/Sub using the Debezium connector.
My properties file looks like this:
databaseName=testdb
databaseUsername=root
databaseAddress=localhost
databasePort=3306
gcpProject=GCP_project_name
databasePassword=password
whitelistedTables=instance-name.testdb.testtab
singleTopicMode=true
gcpPubsubTopicPrefix=debeziumTest
databaseManagementSystem=mysql
I have already created a topic in Pub/Sub named "debeziumTest".
The issue is that when I run
sudo mvn exec:java -pl cdc-embedded-connector -Dexec.args="/path/to/properties-file"
it runs without any error, but no data is uploaded to Pub/Sub.
Based on the documentation, table updates are pushed to a topic whose name matches the format ${PREFIX}${DB_INSTANCE}.${DATABASE}.${TABLE}
In your case, I believe you should create a topic named "debeziumTestinstance-name.testdb.testtab".
This may not be the only problem, based on the warnings I see in the logs you shared.
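As a sanity check, the topic name implied by that format can be computed from the properties file values. This is just a sketch of the string format quoted above; the function names are mine, and it does not account for how singleTopicMode might change the naming.

```python
def split_whitelist_entry(entry: str) -> tuple:
    """Split 'instance.database.table' into its three parts."""
    instance, database, table = entry.split(".", 2)
    return instance, database, table


def expected_topic(prefix: str, instance: str, database: str, table: str) -> str:
    """Apply the ${PREFIX}${DB_INSTANCE}.${DATABASE}.${TABLE} format."""
    return f"{prefix}{instance}.{database}.{table}"


# Values taken from the properties file in the question.
parts = split_whitelist_entry("instance-name.testdb.testtab")
topic = expected_topic("debeziumTest", *parts)
print(topic)
```

Note that the prefix is concatenated directly with the instance name, without a separator.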
The problem seems to be with your whitelistedTables.
According to the documentation, you should use ${instance}.${database}.${table}; for your example it should be whitelistedTables=testdb.databaseName.testTab (if testTab is your table name).

Debezium error, schema isn't known to this connector

I have a project using Debezium, mostly based on this example, which is then connected to an Apache Pulsar.
I have changed a few configurations. The file now looks like this:
database.history=io.debezium.relational.history.MemoryDatabaseHistory
connector.class=io.debezium.connector.mysql.MySqlConnector
offset.storage=org.apache.kafka.connect.storage.FileOffsetBackingStore
offset.storage.file.filename=offset.dat
offset.flush.interval.ms=5000
name=mysql-dbz-connector
database.hostname={ip}
database.port=3308
database.user={user}
database.password={pass}
database.dbname=database
database.server.name=test
table.whitelist=database.history_table,database.project_table
snapshot.mode=schema_only
schemas.enable=false
include.schema.changes=false
pulsar.topic=persistent://public/default/{0}
pulsar.broker.address=pulsar://{ip}:6650
database.history=io.debezium.relational.history.MemoryDatabaseHistory
What I'm trying to do is monitor modifications to history_table and project_table in the database and then write the payloads to Apache Pulsar.
My problem is as follows. Whichever snapshot mode I use, once an offset has been written, I can't restart Debezium without getting this error on the next database update:
Encountered change event for table database.history_table whose schema isn't known to this connector
It only happens with an existing offset.dat file. I think this is because the schema is null within the offset.dat file. Take this one for example:
¨Ìsrjava.util.HashMap⁄¡√`—F
loadFactorI thresholdxp?#wur[B¨Û¯T‡xpG{"schema":null,"payload":["mysql-dbz-connector",{"server":"test"}]}uq~U{"ts_sec":1563802215,"file":"database-bin.000005","pos":79574,"server_id":1,"event":1}x
I first suspected the schemas.enable=false or the include.schema.changes=false parameters that I used to make the JSON more concise, but their values don't change anything in the offset.dat file.
The problem lies in the line database.history=io.debezium.relational.history.MemoryDatabaseHistory. An in-memory history does not survive a restart, so after restarting, the connector no longer knows the table schemas. You should use FileDatabaseHistory instead.
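For illustration, here is a small sketch that reads such a properties file and swaps the in-memory history for the file-based one. The history file name dbhistory.dat is a placeholder I picked, and the sketch assumes the file-based class takes its path from the database.history.file.filename property.

```python
MEMORY_HISTORY = "io.debezium.relational.history.MemoryDatabaseHistory"
FILE_HISTORY = "io.debezium.relational.history.FileDatabaseHistory"


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def use_file_history(props: dict, history_file: str) -> dict:
    """Return a copy of props with a persistent history if it was in-memory."""
    fixed = dict(props)
    if fixed.get("database.history") == MEMORY_HISTORY:
        fixed["database.history"] = FILE_HISTORY
        # Assumed property name for the location of the history file.
        fixed["database.history.file.filename"] = history_file
    return fixed


SAMPLE = (
    "name=mysql-dbz-connector\n"
    "database.history=io.debezium.relational.history.MemoryDatabaseHistory\n"
    "offset.storage.file.filename=offset.dat\n"
)
fixed = use_file_history(parse_properties(SAMPLE), "dbhistory.dat")  # placeholder path
for key, value in fixed.items():
    print(f"{key}={value}")
```

The history file should be kept alongside offset.dat, since a stored offset without the matching schema history leads back to the same error.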

WSO2 Message Broker Error while adding Queue - Invalid Object Name

I have just set up a WSO2 Message Broker 3.0.0 connecting to a SQL Server DB.
The DB for the Carbon MB component has been created successfully as well.
The DB for the Message Broker Data store is created and contains the table MB_QUEUE_MAPPING.
However when adding a Queue via the MB UI I see the following error in the stack trace:
[2015-12-16 15:00:41,472] ERROR {org.wso2.andes.store.rdbms.RDBMSMessageStoreImpl} - Error occurred while retrieving destination queue id for destination queue TestQ
java.sql.SQLException: Invalid object name 'MB_QUEUE_MAPPING'.
at net.sourceforge.jtds.jdbc.SQLDiagnostic.addDiagnostic(SQLDiagnostic.java:372)
at net.sourceforge.jtds.jdbc.TdsCore.tdsErrorToken(TdsCore.java:2988)
at net.sourceforge.jtds.jdbc.TdsCore.nextToken(TdsCore.java:2421)
at net.sourceforge.jtds.jdbc.TdsCore.getMoreResults(TdsCore.java:671)
at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQLQuery(JtdsStatement.java:505)
at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.executeQuery(JtdsPreparedStatement.java:1029)
at org.wso2.andes.store.rdbms.RDBMSMessageStoreImpl.getQueueID(RDBMSMessageStoreImpl.java:1324)
at org.wso2.andes.store.rdbms.RDBMSMessageStoreImpl.getCachedQueueID(RDBMSMessageStoreImpl.java:1298)
at org.wso2.andes.store.rdbms.RDBMSMessageStoreImpl.addQueue(RDBMSMessageStoreImpl.java:1634)
at org.wso2.andes.store.FailureObservingMessageStore.addQueue(FailureObservingMessageStore.java:445)
at org.wso2.andes.kernel.AMQPConstructStore.addQueue(AMQPConstructStore.java:116)
at org.wso2.andes.kernel.AndesContextInformationManager.createQueue(AndesContextInformationManager.java:154)
at org.wso2.andes.kernel.disruptor.inbound.InboundQueueEvent.updateState(InboundQueueEvent.java:151)
at org.wso2.andes.kernel.disruptor.inbound.InboundEventContainer.updateState(InboundEventContainer.java:167)
at org.wso2.andes.kernel.disruptor.inbound.StateEventHandler.onEvent(StateEventHandler.java:67)
at org.wso2.andes.kernel.disruptor.inbound.StateEventHandler.onEvent(StateEventHandler.java:41)
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
The "Add Queue" screen does not go away; however, the queue does get added to the MB_QUEUE table in the DB just fine. Both the MB_QUEUE_MAPPING and MB_QUEUE_COUNTER tables are empty.
The "List Queues" screen is blank despite a number of queues in the MB_QUEUE table. The stack trace shows errors there as well, but I have not included them as they are not relevant to the error above.
I can create a Topic just fine, however.
I want to know why MB would say that MB_QUEUE_MAPPING is an invalid object name when the table clearly exists.
I suspect the way you have configured the MySQL database is incorrect. To check, you can try one of the two scenarios below:
1) start the server for the first time with the -Dsetup parameter, or
2) refer to the "Configuring MySQL" documentation (https://docs.wso2.com/display/MB300/Configuring+MySQL) and follow the step-by-step instructions in order.
I tried the second scenario and did not get any exception when adding a queue. The document I mentioned needs to be updated as shown below.
You can see this command in step 3:
mysql -u <db_user_name> -p -D<database_name> < '<WSO2MB_HOME>/dbscripts/mb-store/mysql-mb.sql ';
db_user_name - username of db.
database_name - database name that you have created in the step 1.
WSO2MB_HOME - home directory path for MB.
Hope this helps you resolve the issue.
It seems the user connecting to the MSSQL database does not have the correct permissions, most probably the SELECT permission. The reason I say this is that when you add a queue, it does get added, which means the user has the INSERT permission. Once the queue is added, the page redirects to the Queue List page, and the user must have the SELECT permission to retrieve the queue list. Topics are not stored in the database; they are kept in the registry. You can check which user connects to MSSQL in wso2mb-3.0.0/repository/conf/datasources/master-datasources.xml, in a configuration like the one below.
<datasource>
    <name>WSO2_MB_STORE_DB</name>
    <jndiConfig>
        <name>WSO2MBStoreDB</name>
    </jndiConfig>
    <definition type="RDBMS">
        <configuration>
            <url>jdbc:jtds:sqlserver://localhost:1433/wso2_mb</url>
            <username>sa</username>
            <password>sa</password>
            <driverClassName>net.sourceforge.jtds.jdbc.Driver</driverClassName>
            <maxActive>200</maxActive>
            <maxWait>60000</maxWait>
            <minIdle>5</minIdle>
            <testOnBorrow>true</testOnBorrow>
            <validationQuery>SELECT 1</validationQuery>
            <validationInterval>30000</validationInterval>
            <defaultAutoCommit>false</defaultAutoCommit>
        </configuration>
    </definition>
</datasource>
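If you want to check this without opening the file by hand, a few lines of Python can pull the connecting user and JDBC URL out of a datasource entry. This is only a sketch: it parses a trimmed inline copy of the entry above rather than the real master-datasources.xml, and the function name is made up.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the datasource entry shown above.
DATASOURCE_XML = """<datasource>
    <name>WSO2_MB_STORE_DB</name>
    <definition type="RDBMS">
        <configuration>
            <url>jdbc:jtds:sqlserver://localhost:1433/wso2_mb</url>
            <username>sa</username>
        </configuration>
    </definition>
</datasource>"""


def datasource_login(xml_text: str) -> dict:
    """Return the datasource name, JDBC URL and username of one entry."""
    root = ET.fromstring(xml_text)
    cfg = root.find("definition/configuration")
    if cfg is None:
        raise ValueError("no <configuration> element found")
    return {
        "name": root.findtext("name"),
        "url": cfg.findtext("url"),
        "username": cfg.findtext("username"),
    }


login = datasource_login(DATASOURCE_XML)
print(login)
```

The username printed here is the account whose permissions on the MB tables you would then verify on the SQL Server side.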

App Engine no longer updating index.yaml

The index.yaml file of my GAE app is no longer updated by the development server.
I have recently added a new kind to my app and a handler that queries this kind like so:
from google.appengine.ext import ndb
class MyKind(ndb.Model):
    thing = ndb.TextProperty()
    timestamp = ndb.DateTimeProperty(auto_now_add=True)
and in the handler I have a query
query = MyKind.query()
query.order(-MyKind.timestamp)
logging.info(query.iter().index_list())
entities = query.fetch(100)
for entity in entities:
    pass  # do something
AFAIK, the development server should create an index for this query and update index.yaml accordingly. However, it doesn't. It just looks like this:
indexes:
# AUTOGENERATED
The logging.info(query.iter().index_list()) call should output the index used for the query, but it just logs 'None'. Also, the SDK console says 'Datastore contains no indexes.'
Running the query returns the entities unsorted. I have two questions:
Is there a syntax error in my code that causes the query results to be unsorted, or is it the missing index?
If it's the missing index, is there a way to manually force the dev server to update index.yaml? Other suggestions?
Thank you
Your call to order() returns a new query:
query = MyKind.query()
query = query.order(-MyKind.timestamp)
To clarify: query.order(-MyKind.timestamp) does not change the query; it returns a new one, so you need to use the query returned by that method. As written, query.order(-MyKind.timestamp) in your code does nothing.
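The same pitfall can be reproduced outside App Engine. The class below is a hypothetical stand-in written for this example (it is not the ndb API); it only mimics the one property that matters here: order() returns a new object and leaves the original untouched.

```python
class FakeQuery:
    """Hypothetical immutable query object, used only to show the pattern."""

    def __init__(self, orders=()):
        self.orders = tuple(orders)

    def order(self, *keys):
        # Build and return a new object; self is never modified.
        return FakeQuery(self.orders + keys)


# Mistake from the question: the return value is thrown away.
discarded = FakeQuery()
discarded.order("-timestamp")

# Fix from the answer: keep the query that order() returns.
kept = FakeQuery()
kept = kept.order("-timestamp")

print(discarded.orders)
print(kept.orders)
```

The first object still has no ordering attached, which matches the unsorted results described in the question.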

How to use indexed properties of NodeModels in cypher queries of Neo4django?

I'm a newbie to Django as well as neo4j. I'm using Django 1.4.5, neo4j 1.9.2 and neo4django 0.1.8
I've created a NodeModel for a person node and indexed it on the 'owner' and 'name' properties. Here is my models.py:
from neo4django.db import models as models2
class person_conns(models2.NodeModel):
    owner = models2.StringProperty(max_length=30,indexed=True)
    name = models2.StringProperty(max_length=30,indexed=True)
    gender = models2.StringProperty(max_length=1)
    parent = models2.Relationship('self',rel_type='parent_of',related_name='parents')
    child = models2.Relationship('self',rel_type='child_of',related_name='children')
    def __unicode__(self):
        return self.name
Before connecting to the Neo4j server, I set auto indexing to true and gave the indexable keys in the conf/neo4j.properties file as follows:
# Autoindexing
# Enable auto-indexing for nodes, default is false
node_auto_indexing=true
# The node property keys to be auto-indexed, if enabled
node_keys_indexable=owner,name
# Enable auto-indexing for relationships, default is false
relationship_auto_indexing=true
# The relationship property keys to be auto-indexed, if enabled
relationship_keys_indexable=child_of,parent_of
I followed "Neo4j: Step by Step to create an automatic index" to update the above file and manually create node_auto_index on the Neo4j server.
Below are the indexes present on the Neo4j server after running Django's syncdb against the Neo4j database and manually creating the auto indexes:
graph-person_conns lucene
{"to_lower_case":"true", "_blueprints:type":"MANUAL","type":"fulltext"}
node_auto_index lucene
{"_blueprints:type":"MANUAL", "type":"exact"}
As suggested in https://github.com/scholrly/neo4django/issues/123 I used connection.cypher(queries) to query the neo4j database
For Example:
listpar = connection.cypher("START no=node(*) RETURN no.owner?, no.name?",raw=True)
Above returns the owner and name of all nodes correctly. But when I try to query on indexed properties instead of 'number' or '*', as in case of:
listpar = connection.cypher("START no=node:node_auto_index(name='s2') RETURN no.owner?, no.name?",raw=True)
Above gives 0 rows.
listpar = connection.cypher("START no=node:graph-person_conns(name='s2') RETURN no.owner?, no.name?",raw=True)
Above gives
Exception Value:
Error [400]: Bad Request. Bad request syntax or unsupported method.
Invalid data sent: `(' expected but `-' found after graph
I tried other strings like name and person_conns instead of graph-person_conns, but each time I get an error saying that the particular index does not exist. Am I making a mistake while adding the indexes?
My project mainly depends on filtering the nodes based on properties, so this part is really essential. Any pointers or suggestions would be appreciated. Thank you.
This is my first post on stackoverflow. So in case of any missing information or confusing statements please be patient. Thank you.
UPDATE:
Thank you for the help. For the benefit of others, I would like to give an example of how to use Cypher queries to find the shortest path between two nodes.
from neo4django.db import connection
results = connection.cypher("START source=node:`graph-person_conns`(person_name='s2sp1'),dest=node:`graph-person_conns`(person_name='s2c1') MATCH p=ShortestPath(source-[*]->dest) RETURN extract(i in nodes(p) : i.person_name), extract(j in rels(p) : type(j))")
This is to find shortest path between nodes named s2sp1 and s2c1 on the graph. Cypher queries are really cool and help traverse nodes limiting the hops, types of relations etc.
Can someone comment on the performance of this method? Also please suggest if there are any other efficient methods to access Neo4j from Django. Thank You :)
Hm, why are you using Cypher? neo4django QuerySets work just fine for the above if you set the properties to indexed=True (or not, it'll just be slower for those).
people = person_conns.objects.filter(name='n2')
The neo4django docs have some other querying examples, as do the Django docs. Neo4django executes those queries as Cypher on the backend- you really shouldn't need to drop down to writing the Cypher yourself unless you have a very particular traversal pattern or a performance issue.
Anyway, to more directly tackle your question- the last example you used needs backticks to escape the index name, like
listpar = connection.cypher("START no=node:`graph-person_conns`(name='s2') RETURN no.owner?, no.name?",raw=True)
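If you build such queries in code, a tiny helper can add the backticks only when the index name needs them. This is a rough sketch with made-up helper names; it only builds the query string and does not talk to Neo4j, and the value is interpolated naively, so it is meant for trusted input only.

```python
import re

# Names made of letters, digits and underscores need no escaping.
_PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_index_name(name: str) -> str:
    """Wrap an index name in backticks unless it is a plain identifier."""
    if _PLAIN_IDENTIFIER.fullmatch(name):
        return name
    if "`" in name:
        raise ValueError("index name must not contain a backtick")
    return f"`{name}`"


def start_clause(var: str, index: str, key: str, value: str) -> str:
    """Build a START clause for an exact index lookup (trusted input only)."""
    return f"START {var}=node:{quote_index_name(index)}({key}='{value}')"


clause = start_clause("no", "graph-person_conns", "name", "s2")
print(clause + " RETURN no.owner?, no.name?")
```

With node_auto_index the helper leaves the name as it is, since it contains no characters that need escaping.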
The first example should work. One thought- did you flip the autoindexing on before or after saving the nodes you're searching for? If after, note that you'll have to manually reindex the nodes either using the Java API or by re-setting properties on the node, since it won't have been autoindexed.
HTH, and welcome to StackOverflow!
