DataflowTemplates debezium connector issue - google-cloud-pubsub

I am following this document to stream data from MySQL (installed on my local machine) to Pub/Sub using the Debezium connector.
My properties file looks like this:
databaseName=testdb
databaseUsername=root
databaseAddress=localhost
databasePort=3306
gcpProject=GCP_project_name
databasePassword=password
whitelistedTables=instance-name.testdb.testtab
singleTopicMode=true
gcpPubsubTopicPrefix=debeziumTest
databaseManagementSystem=mysql
I have already created a topic in Pub/Sub named "debeziumTest".
The issue is that when I run
sudo mvn exec:java -pl cdc-embedded-connector -Dexec.args="/path/to/properties-file"
it runs without any error,
but no data is published to Pub/Sub.

Based on the documentation, table updates are pushed to a topic whose name matches this format: ${PREFIX}${DB_INSTANCE}.${DATABASE}.${TABLE}
In your case, I believe you should create a topic named "debeziumTestinstance-name.testdb.testtab".
This may not be the only problem based on the warnings I see in the logs you shared.
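As a rough illustration of the naming rule quoted above, the following Python sketch builds the expected topic names from the prefix and the whitelistedTables value. The helper names are hypothetical, and it assumes that whitelistedTables is a comma-separated list of instance.database.table entries; it is not taken from the DataflowTemplates code.

```python
from typing import List


def topic_name(prefix: str, instance: str, database: str, table: str) -> str:
    """Build a name in the ${PREFIX}${DB_INSTANCE}.${DATABASE}.${TABLE} format."""
    return f"{prefix}{instance}.{database}.{table}"


def topics_for_whitelist(prefix: str, whitelisted_tables: str) -> List[str]:
    """Return one expected topic name per whitelisted table.

    Assumption: entries are comma-separated and each has exactly three
    dot-separated parts (instance, database, table).
    """
    topics = []
    for entry in whitelisted_tables.split(","):
        entry = entry.strip()
        if not entry:
            continue
        parts = entry.split(".")
        if len(parts) != 3:
            raise ValueError(f"expected instance.database.table, got {entry!r}")
        topics.append(topic_name(prefix, *parts))
    return topics


if __name__ == "__main__":
    # Values taken from the properties file in the question.
    for name in topics_for_whitelist("debeziumTest", "instance-name.testdb.testtab"):
        print(name)
```

Printing the names this way makes it easy to compare them against the topics that actually exist in the project.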

The problem seems to be with your whitelistedTables setting.
According to the documentation, you should use the format ${instance}.${database}.${table}; for your example it should be whitelistedTables=testdb.databaseName.testTab (if testTab is your table name).

Related

Debezium error, schema isn't known to this connector

I have a project using Debezium, mostly based on this example, which is then connected to an Apache Pulsar.
I have changed a few configurations. The file now looks like this:
database.history=io.debezium.relational.history.MemoryDatabaseHistory
connector.class=io.debezium.connector.mysql.MySqlConnector
offset.storage=org.apache.kafka.connect.storage.FileOffsetBackingStore
offset.storage.file.filename=offset.dat
offset.flush.interval.ms=5000
name=mysql-dbz-connector
database.hostname={ip}
database.port=3308
database.user={user}
database.password={pass}
database.dbname=database
database.server.name=test
table.whitelist=database.history_table,database.project_table
snapshot.mode=schema_only
schemas.enable=false
include.schema.changes=false
pulsar.topic=persistent://public/default/{0}
pulsar.broker.address=pulsar://{ip}:6650
database.history=io.debezium.relational.history.MemoryDatabaseHistory
As you may understand, what I'm trying to do is to monitor the history_table and the project_table modifications from the database and then write payloads onto an Apache Pulsar.
My problem is as follows. Whatever snapshot mode I use, once an offset has been written, I can't restart Debezium without getting this error on the next database update:
Encountered change event for table database.history_table whose schema isn't known to this connector
It only happens with an existing offset.dat file. I think this is because the schema is null within the offset.dat file. Take this one for example:
¨Ìsrjava.util.HashMap⁄¡√`—F
loadFactorI thresholdxp?#wur[B¨Û¯T‡xpG{"schema":null,"payload":["mysql-dbz-connector",{"server":"test"}]}uq~U{"ts_sec":1563802215,"file":"database-bin.000005","pos":79574,"server_id":1,"event":1}x
I first suspected the schemas.enable=false or the include.schema.changes=false parameters that I used to make the JSON more concise, but their values don't change anything in the offset.dat file.
The problem lies in the line database.history=io.debezium.relational.history.MemoryDatabaseHistory. An in-memory history does not survive a restart, so after restarting the connector has an offset but no table schemas. You should use FileDatabaseHistory instead.
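A small Python sketch of a pre-flight check for this kind of configuration is shown below. The function names are hypothetical. It assumes that the file-based history class reads its location from the database.history.file.filename property, which is how I understand the Debezium versions that use database.history; verify this against the version in use.

```python
def parse_properties(text):
    """Parse simple key=value lines; later duplicates overwrite earlier ones."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def history_warnings(props):
    """Return warnings about a database history that cannot survive a restart."""
    warnings = []
    history = props.get("database.history", "")
    if history.endswith("MemoryDatabaseHistory"):
        warnings.append(
            "database.history is in-memory: schemas are lost on restart "
            "while the offset file is kept"
        )
    # Assumed property name for the file-based history; check your Debezium docs.
    if history.endswith("FileDatabaseHistory") and "database.history.file.filename" not in props:
        warnings.append("FileDatabaseHistory is set but no history file name is configured")
    return warnings


if __name__ == "__main__":
    config = """
    database.history=io.debezium.relational.history.MemoryDatabaseHistory
    offset.storage.file.filename=offset.dat
    """
    for warning in history_warnings(parse_properties(config)):
        print("WARNING:", warning)
```

The point of the check is that the offset store and the schema history must both be durable; persisting only one of them produces the mismatch described in the question.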

How do I relate the metrics in Datadog with execution plan operators in Flink?

In my scenario, Flink is sending metrics to Datadog. The Datadog host map is shown below (I have no idea why it is showing latency here).
Flink metrics are sent to localhost.
My flink-conf.yaml configuration is as follows:
# adding metrics
metrics.reporters: stsd , dghttp
metrics.reporter.stsd.class: org.apache.flink.metrics.statsd.StatsDReporter
metrics.reporter.stsd.host: localhost
metrics.reporter.stsd.port: 8125
# for datadog
metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
metrics.reporter.dghttp.apikey: xxx
metrics.reporter.dghttp.tags: host:localhost, job_id : jobA , tm_id : task1 , operator_name : operator1
metrics.scope.operator: numRecordsIn
metrics.scope.operator : numRecordsInPerSecond
metrics.scope.operator : numRecordsOut
metrics.scope.operator : numRecordsOutPerSecond
metrics.scope.operator : latency
The issue is that Datadog shows 163 metrics, which I don't understand, as I explain below.
I don't understand the metric format in Datadog; it shows me metrics like this:
As shown in the image above:
Latency is expressed as a time
Number of events per second is events/sec
Count is some value
So my question is: which metric is this?
Also, the execution plan of my job looks like this:
How do I relate the metrics in Datadog to the execution plan operators in Flink?
I have read in the Flink 1.3.2 API documentation that I can use tags. I have tried to use them in the flink-conf.yaml file, but I don't fully understand what they mean here.
My ultimate goal is to find the latency and the number of records in and out per second for each operator.
There are a variety of issues here.
1. You've misconfigured the scope formats. (metrics.scope.operator)
For one, the configuration doesn't make sense, since you specify "metrics.scope.operator" multiple times; only the last config entry is honored.
Second, and more importantly, you have misunderstood what scope formats are used for.
Scope formats configure which context information (like the ID of the task) is included in the reported metric's name.
By setting it to a constant ("latency"), you've told Flink not to include anything. As a result, the numRecordsIn metric of every operator is reported as "latency.numRecordsIn".
I suggest simply removing your scope configuration.
2. You've misconfigured the Datadog Tags
I do not understand what you were trying to do with your tags configuration.
The tags configuration option can only be used to provide global tags, i.e. tags that are attached to every single metric, like "Flink".
By default, every metric that the Datadog reporter sends has a tag attached for every available scope variable.
So, if you have an operator named A, the numRecordsIn metric will be reported with the tag "operator_name:A".
Again, I would suggest simply removing your configuration.
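To make the two points above concrete, here is a minimal Python model of the behaviour described. It is not Flink code: the function names are hypothetical, and the default operator scope format string is quoted from memory of the Flink documentation, so treat it as an assumption.

```python
import re

# Assumed default for metrics.scope.operator (check the Flink docs for your version).
DEFAULT_OPERATOR_SCOPE = (
    "<host>.taskmanager.<tm_id>.<job_name>.<operator_name>.<subtask_index>"
)


def last_value_wins(config_lines, key):
    """Model repeated config keys: only the last entry for a key is kept."""
    value = None
    for line in config_lines:
        name, _, rest = line.partition(":")
        if name.strip() == key:
            value = rest.strip()
    return value


def metric_identifier(scope_format, variables, metric_name):
    """Fill <variable> placeholders in the scope format and append the metric name."""
    scope = re.sub(r"<(\w+)>", lambda m: str(variables[m.group(1)]), scope_format)
    return f"{scope}.{metric_name}"


if __name__ == "__main__":
    lines = [
        "metrics.scope.operator: numRecordsIn",
        "metrics.scope.operator : numRecordsOut",
        "metrics.scope.operator : latency",
    ]
    scope = last_value_wins(lines, "metrics.scope.operator")
    op_a = {"host": "localhost", "tm_id": "tm1", "job_name": "jobA",
            "operator_name": "A", "subtask_index": 0}
    op_b = dict(op_a, operator_name="B")
    # With a constant scope, both operators collapse to the same name.
    print(metric_identifier(scope, op_a, "numRecordsIn"))
    print(metric_identifier(scope, op_b, "numRecordsIn"))
    # With a format containing variables, the names stay distinct.
    print(metric_identifier(DEFAULT_OPERATOR_SCOPE, op_a, "numRecordsIn"))
    print(metric_identifier(DEFAULT_OPERATOR_SCOPE, op_b, "numRecordsIn"))
```

With a constant scope, metrics from different operators share one name, so the per-operator tags are the only remaining way to tell them apart in Datadog.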

WSO2 Message Broker Error while adding Queue - Invalid Object Name

I have just set up a WSO2 Message Broker 3.0.0 connecting to a SQL Server DB.
The DB for the Carbon MB component has been created successfully as well.
The DB for the Message Broker Data store is created and contains the table MB_QUEUE_MAPPING.
However when adding a Queue via the MB UI I see the following error in the stack trace:
[2015-12-16 15:00:41,472] ERROR {org.wso2.andes.store.rdbms.RDBMSMessageStoreImpl} - Error occurred while retrieving destination queue id for destination queue TestQ
java.sql.SQLException: Invalid object name 'MB_QUEUE_MAPPING'.
at net.sourceforge.jtds.jdbc.SQLDiagnostic.addDiagnostic(SQLDiagnostic.java:372)
at net.sourceforge.jtds.jdbc.TdsCore.tdsErrorToken(TdsCore.java:2988)
at net.sourceforge.jtds.jdbc.TdsCore.nextToken(TdsCore.java:2421)
at net.sourceforge.jtds.jdbc.TdsCore.getMoreResults(TdsCore.java:671)
at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQLQuery(JtdsStatement.java:505)
at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.executeQuery(JtdsPreparedStatement.java:1029)
at org.wso2.andes.store.rdbms.RDBMSMessageStoreImpl.getQueueID(RDBMSMessageStoreImpl.java:1324)
at org.wso2.andes.store.rdbms.RDBMSMessageStoreImpl.getCachedQueueID(RDBMSMessageStoreImpl.java:1298)
at org.wso2.andes.store.rdbms.RDBMSMessageStoreImpl.addQueue(RDBMSMessageStoreImpl.java:1634)
at org.wso2.andes.store.FailureObservingMessageStore.addQueue(FailureObservingMessageStore.java:445)
at org.wso2.andes.kernel.AMQPConstructStore.addQueue(AMQPConstructStore.java:116)
at org.wso2.andes.kernel.AndesContextInformationManager.createQueue(AndesContextInformationManager.java:154)
at org.wso2.andes.kernel.disruptor.inbound.InboundQueueEvent.updateState(InboundQueueEvent.java:151)
at org.wso2.andes.kernel.disruptor.inbound.InboundEventContainer.updateState(InboundEventContainer.java:167)
at org.wso2.andes.kernel.disruptor.inbound.StateEventHandler.onEvent(StateEventHandler.java:67)
at org.wso2.andes.kernel.disruptor.inbound.StateEventHandler.onEvent(StateEventHandler.java:41)
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
The "Add Queue" screen does not go away; however, the queue does get added to the MB_QUEUE table in the DB just fine. Both tables MB_QUEUE_MAPPING and MB_QUEUE_COUNTER are empty.
The "List Queues" screen is blank despite a number of queues in the MB_QUEUE table. The stack trace also shows errors there, but I have not included them as they are not relevant to the error above.
I can create a topic just fine, however.
I want to know why MB would say the table MB_QUEUE_MAPPING is an invalid object name when the table clearly exists.
I suspect the way you have configured the MySQL database is incorrect. To check this, try one of the two scenarios below:
1) start the server for the first time with the -Dsetup parameter, or
2) refer to the "Configuring MySQL" documentation (https://docs.wso2.com/display/MB300/Configuring+MySQL) and follow the step-by-step instructions in order.
I tried the second scenario and did not get any exception when adding a queue. The document I mentioned needs to be updated as below.
You can see this command in step 3:
mysql -u <db_user_name> -p -D<database_name> < '<WSO2MB_HOME>/dbscripts/mb-store/mysql-mb.sql';
db_user_name - username of db.
database_name - database name that you have created in the step 1.
WSO2MB_HOME - home directory path for MB.
Hope this could help you to resolve this issue.
It seems the user connecting to the MSSQL database does not have the correct permissions, most probably the SELECT permission. The reason I say this is that when you add a queue, it does get added, which means the user has the INSERT permission. Once the queue is added, the page redirects to the queue list page, and the user must have the SELECT permission to retrieve the queue list. Topics are not added to the database; they are kept in the registry. You can check which user connects to MSSQL in the configuration in wso2mb-3.0.0/repository/conf/datasources/master-datasources.xml, as below.
<datasource>
   <name>WSO2_MB_STORE_DB</name>
   <jndiConfig>
       <name>WSO2MBStoreDB</name>
   </jndiConfig>
   <definition type="RDBMS">
         <configuration>
                    <url>jdbc:jtds:sqlserver://localhost:1433/wso2_mb</url>
                    <username>sa</username>
                    <password>sa</password>
                    <driverClassName>net.sourceforge.jtds.jdbc.Driver</driverClassName>
                    <maxActive>200</maxActive>
                    <maxWait>60000</maxWait>
                    <minIdle>5</minIdle>
                    <testOnBorrow>true</testOnBorrow>
                    <validationQuery>SELECT 1</validationQuery>
                    <validationInterval>30000</validationInterval>
                    <defaultAutoCommit>false</defaultAutoCommit>
         </configuration>
     </definition>
</datasource>

Error on creating database in phppgadmin

I have PostgreSQL 9.2.0. On clicking "Create database", it shows the following error:
SQL error:
ERROR: column "spclocation" does not exist
LINE 1: ...pg_catalog.pg_get_userbyid(spcowner) AS spcowner, spclocatio...
^
In statement:
SELECT spcname, pg_catalog.pg_get_userbyid(spcowner) AS spcowner, spclocation,
(SELECT description FROM pg_catalog.pg_shdescription pd WHERE pg_tablespace.oid=pd.objoid) AS spccomment
FROM pg_catalog.pg_tablespace WHERE spcname NOT LIKE $$pg\_%$$ ORDER BY spcname
Quick Fix (worked with my version (5.0.3) / pg 9.2.3):
1. Change to the /classes/database directory.
2. Copy Postgres84.php to Postgres92.php.
3. Open Connection.php.
4. Add the line case '9.2': return 'Postgres92'; break; to the switch under // Detect version and choose appropriate database driver.
5. Open Postgres.php and copy the functions getTablespaces and getTablespace.
6. Open Postgres92.php and paste the functions into the class.
7. Replace ", spclocation," with ", pg_tablespace_location(oid) as spclocation," in both functions.
8. In Postgres92.php, change the class name to Postgres92.
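The text substitution in the quick fix can be checked outside phpPgAdmin with a few lines of Python. This is only a sketch of the replacement applied to the failing statement from the question; the function name is hypothetical, and the real change still has to be made in the PHP functions.

```python
OLD_FRAGMENT = ", spclocation,"
NEW_FRAGMENT = ", pg_tablespace_location(oid) as spclocation,"

# The statement reported in the error message of the question.
FAILING_QUERY = (
    "SELECT spcname, pg_catalog.pg_get_userbyid(spcowner) AS spcowner, spclocation,\n"
    "(SELECT description FROM pg_catalog.pg_shdescription pd "
    "WHERE pg_tablespace.oid=pd.objoid) AS spccomment\n"
    "FROM pg_catalog.pg_tablespace WHERE spcname NOT LIKE $$pg\\_%$$ ORDER BY spcname"
)


def patch_tablespace_query(sql: str) -> str:
    """Replace the removed spclocation column with pg_tablespace_location(oid)."""
    return sql.replace(OLD_FRAGMENT, NEW_FRAGMENT)


if __name__ == "__main__":
    print(patch_tablespace_query(FAILING_QUERY))
```

Running the patched statement directly in psql against a 9.2 server is a quick way to confirm the query itself is valid before editing the PHP files.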
I updated to Mountain Lion on my Mac Mini Server, which runs PostgreSQL 9.2.1, on the evening of 12/12/2012. I had the same problem, which is how I found this question. When I searched for the problem, I found the following bug tracker entry on the matter.
http://sourceforge.net/tracker/?func=detail&aid=3570272&group_id=37132&atid=418980
One of the comments suggests downloading the development branch from GitHub, at https://github.com/phppgadmin/phppgadmin/zipball/master, to get around this. I did this, copied it to /Library/Server/Web/Data/Sites/Default, and renamed the folder. I changed $conf['servers'][0]['host'] to 127.0.0.1 in /conf/config.inc.php. I think I had to copy config.inc.php-dist. I have been successful in creating databases. As of 12/12 they had not released a stable version for 9.2. Hopefully they will soon.
In short: table pg_tablespace does not have that column in 9.2.
It seems that information should be obtained from other means now, as mentioned in the mailing list.
Also notice how in the official phpPgAdmin page, the latest PostgreSQL supported version is 9.0.
And for 'Quick Fix', add one more step after (2):
Change the class name in Postgres92.php from Postgres84 to Postgres92.

How to aggregate files in Mule ESB CE

I need to aggregate a number of inbound CSV files in memory, resequencing them if necessary, on Mule ESB CE 3.2.1.
How could I implement this kind of logic?
I tried message-chunking-aggregator-router, but it fails on startup because the XSD schema does not allow such a configuration:
<message-chunking-aggregator-router timeout="20000" failOnTimeout="false" >
<expression-message-info-mapping correlationIdExpression="#[header:correlation]"/>
</message-chunking-aggregator-router>
I've also tried attaching my own correlation IDs to inbound messages and then processing them with a custom-aggregator, but I've found that Mule internally uses a key made up of:
Serializable key=event.getId()+event.getMessage().getCorrelationSequence();//EventGroup:264
The internal ID is different every time (even if the correlation sequence is correct), so Mule does not key on the correlation sequence alone as I expected, and the same message is processed many times.
Finally, I could write a custom aggregator from scratch, but I would prefer a more established technique.
Thanks in advance,
Gabriele
UPDATE
I've tried message-chunk-aggregator, but it doesn't meet my requirement, as it admits duplicates.
Let me detail the scenario I need to cover:
1. Mule polls (on an SFTP location).
2. File 1, "FIXEDPREFIX_1_of_2.zip", is detected and kept in memory somewhere (as an open SFTPStream, that's fine). Some correlation info is maintained for grouping: group, sequence, group size.
3. File 1, "FIXEDPREFIX_1_of_2.zip", is detected again, but must not be inserted because it would be a duplicate.
4. File 2, "FIXEDPREFIX_2_of_2.zip", is detected and correctly added.
5. Since the group size has been reached, Mule routes a MessageCollection with the correct set of messages.
About point 2, I'm lucky enough to be able to get this info from the filename and put it into the MuleMessage::correlation* properties, so that subsequent components can use it.
I did that, but duplicates are still processed all the same.
Thanks again
Gabriele
Here is the right router to use with Mule 3: http://www.mulesoft.org/documentation/display/MULE3USER/Routing+Message+Processors#RoutingMessageProcessors-MessageChunkAggregator
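Whichever router ends up being used, the grouping rule from the question's update (key on group and sequence, ignore repeats, release once the group size is reached) can be sketched in plain Python. This is not Mule code. The class name is hypothetical, and the filename pattern is an assumption based on the two example names in the question.

```python
import re

# Assumed naming scheme: <group>_<sequence>_of_<size>.zip
FILENAME_RE = re.compile(r"^(?P<group>.+)_(?P<seq>\d+)_of_(?P<size>\d+)\.zip$")


class DedupAggregator:
    """Collect parts per group, drop duplicates, release in sequence order."""

    def __init__(self):
        self._groups = {}  # group name -> {sequence number: payload}

    def add(self, filename, payload):
        """Return the ordered payload list when a group completes, else None."""
        match = FILENAME_RE.match(filename)
        if match is None:
            raise ValueError(f"unexpected file name: {filename!r}")
        group = match.group("group")
        seq = int(match.group("seq"))
        size = int(match.group("size"))

        parts = self._groups.setdefault(group, {})
        if seq in parts:
            return None  # duplicate: keyed on (group, sequence), so it is ignored
        parts[seq] = payload
        if len(parts) < size:
            return None
        del self._groups[group]
        return [parts[i] for i in sorted(parts)]  # resequenced output


if __name__ == "__main__":
    agg = DedupAggregator()
    print(agg.add("FIXEDPREFIX_2_of_2.zip", "second"))
    print(agg.add("FIXEDPREFIX_2_of_2.zip", "second again"))
    print(agg.add("FIXEDPREFIX_1_of_2.zip", "first"))
```

The key design choice is that the duplicate check uses only the group and the sequence number, not a per-event ID, which is the difference from the EventGroup key quoted in the question.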
