Logging the number of records queried from each shard

PreparedStatement sm = conn.prepareStatement(
        SQL,
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_READ_ONLY
);
ResultSet rs = sm.executeQuery();
Given a query issued over JDBC against a sharded table managed by ShardingSphere, how can I tell how many records came from which physical table and data source?
I'd like to log something like:
*** records extracted from table *** of datasource ***
*** records extracted from table *** of datasource ***
*** records extracted from table *** of datasource ***
...

Which release are you using? Generally, you can set sql-show = true to log the routing info (all the actual SQLs executed on the physical data sources for each logic SQL), for example:
props:
  sql-show: true
For more details, see
https://shardingsphere.apache.org/document/5.2.1/en/user-manual/shardingsphere-jdbc/yaml-config/rules/sharding/
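If you build the data source programmatically rather than from YAML, the same property can be passed in a Properties object. A minimal sketch, assuming ShardingSphere-JDBC 5.x; dataSourceMap and shardingRuleConfig below are placeholders for your own physical data sources and sharding rule configuration, not something from the question:
import java.util.Collections;
import java.util.Properties;
import javax.sql.DataSource;
import org.apache.shardingsphere.driver.api.ShardingSphereDataSourceFactory;

// Enable SQL logging: each logic SQL is then logged together with the actual
// SQLs routed to the physical data sources and tables.
Properties props = new Properties();
props.setProperty("sql-show", "true");

// dataSourceMap and shardingRuleConfig are placeholders (assumptions),
// standing in for your own configuration.
DataSource dataSource = ShardingSphereDataSourceFactory.createDataSource(
        dataSourceMap, Collections.singleton(shardingRuleConfig), props);
The routed SQL lines won't give you record counts by themselves, but they tell you which physical table on which data source each query was sent to, which is usually enough to attribute the extracted records to the right shard.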

Related

MariaDB replication is not working when no database is selected

I'm using MariaDB 10.6.8 with one master DB and two slave DBs, set up for replication.
When I execute an INSERT or UPDATE query without selecting a database, replication doesn't seem to work: the master DB's data is changed, but the slave DBs' data remains intact.
/* no database is selected */
MariaDB [(none)]> show master status \G
*************************** 1. row ***************************
File: maria-bin.000007
Position: 52259873
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.000 sec)
MariaDB [(none)]> UPDATE some_database.some_tables SET some_datetime_column = now() WHERE primary_key_column = 1;
Query OK, 1 row affected (0.002 sec)
Rows matched: 1 Changed: 1 Warnings: 0
MariaDB [(none)]> show master status \G
*************************** 1. row ***************************
File: maria-bin.000007
Position: 52260068
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.000 sec)
/* only the master database's record changes, even though the replication position has advanced */
However, after selecting the database, replication works fine.
/* but, after selecting the database */
MariaDB [(none)]> USE some_database;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
MariaDB [some_database]> UPDATE some_tables SET some_datetime_column = now() WHERE primary_key_column = 1;
Query OK, 1 row affected (0.002 sec)
Rows matched: 1 Changed: 1 Warnings: 0
/* now both the master's and the slaves' records change */
Can anyone tell me what could be the cause of this situation?
Regardless of the binary log format (MIXED, STATEMENT, ROW), all DML commands are written to the binary log file as soon as the transaction is committed.
When using ROW format, a TABLE_MAP event is logged first, which contains a unique ID, the database name, and the table name. The ROW_EVENT (Delete/Insert/Update) refers to one or more table IDs to identify the tables used.
The STATEMENT format logs a query event, which contains the default database name, a timestamp, and the SQL statement. If there is no default database, the statement itself will contain the database name.
Binlog dump example for STATEMENT format (I removed the non-relevant parts, such as timestamps and user variables, from the output).
Without a default database:
#230210 4:42:41 server id 1 end_log_pos 474 CRC32 0x1fa4fa55 Query thread_id=5 exec_time=0 error_code=0 xid=0
insert into test.t1 values (1),(2)
/*!*/;
# at 474
#230210 4:42:41 server id 1 end_log_pos 505 CRC32 0xfecc5d48 Xid = 28
COMMIT/*!*/;
# at 505
With a default database:
#230210 4:44:35 server id 1 end_log_pos 639 CRC32 0xfc862172 Query thread_id=5 exec_time=0 error_code=0 xid=0
use `test`/*!*/;
insert into t1 values (1),(2)
/*!*/;
# at 639
#230210 4:44:35 server id 1 end_log_pos 670 CRC32 0xca70b57f Xid = 56
COMMIT/*!*/;
If a session doesn't use a default database on the source server, the statement may not be replicated when a binary log filter such as replicate_do_db is specified on the replica, since the replica doesn't parse the statement but only checks whether the default database name matches the filter.
To avoid inconsistent data on your replicas, I would recommend using ROW format instead.
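To make that concrete from a client application's point of view, here is a small JDBC sketch (an illustration only; the host, credentials, and object names are placeholders, not taken from the question) that selects a default database for the session before issuing DML, which is what statement-based filters like replicate_do_db match against:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class DefaultDatabaseDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder host and credentials.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mariadb://master-host:3306/", "user", "password")) {
            // Select a default database for the session (equivalent to USE some_database),
            // so a STATEMENT-format event carries a default database the replica can filter on.
            conn.setCatalog("some_database");
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE some_tables SET some_datetime_column = now() "
                            + "WHERE primary_key_column = ?")) {
                ps.setInt(1, 1);
                ps.executeUpdate();
            }
        }
    }
}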

Finding missing records from 2 data sources with Flink

I have two data sources: an S3 bucket and a Postgres database table. Both sources have records in the same format, with a unique identifier of type uuid. Some of the records present in the S3 bucket are not part of the Postgres table, and the intent is to find those missing records. The data is bounded, as it is partitioned by day in the S3 bucket.
Reading the s3-source (I believe this operation reads the data in batch mode, since I am not providing the monitorContinuously() argument) -
final FileSource<GenericRecord> source = FileSource.forRecordStreamFormat(
        AvroParquetReaders.forGenericRecord(schema), path).build();
final DataStream<GenericRecord> avroStream = env.fromSource(
        source, WatermarkStrategy.noWatermarks(), "s3-source");
DataStream<Row> s3Stream = avroStream.map(x -> Row.of(x.get("uuid").toString()))
        .returns(Types.ROW_NAMED(new String[] {"uuid"}, Types.STRING));
Table s3table = tableEnv.fromDataStream(s3Stream);
tableEnv.createTemporaryView("s3table", s3table);
For reading from Postgres, I created a Postgres catalog -
PostgresCatalog postgresCatalog = (PostgresCatalog) JdbcCatalogUtils.createCatalog(
        catalogName,
        defaultDatabase,
        username,
        pwd,
        baseUrl);
tableEnv.registerCatalog(postgresCatalog.getName(), postgresCatalog);
tableEnv.useCatalog(postgresCatalog.getName());
Table dbtable = tableEnv.sqlQuery("select cast(uuid as varchar) from `localschema.table`");
tableEnv.createTemporaryView("dbtable", dbtable);
My intention was simply to perform a left join and find the missing records from dbtable. Something like this -
Table resultTable = tableEnv.sqlQuery("SELECT * FROM s3table LEFT JOIN dbtable ON s3table.uuid = dbtable.uuid where dbtable.uuid is null");
DataStream<Row> resultStream = tableEnv.toDataStream(resultTable);
resultStream.print();
However, it seems that the UUID column type is not supported just yet because I get the following exception.
Caused by: java.lang.UnsupportedOperationException: Doesn't support Postgres type 'uuid' yet
at org.apache.flink.connector.jdbc.dialect.psql.PostgresTypeMapper.mapping(PostgresTypeMapper.java:171)
As an alternative, I tried to read the database table as follows -
TypeInformation<?>[] fieldTypes = new TypeInformation<?>[] {
        BasicTypeInfo.of(String.class)
};
RowTypeInfo rowTypeInfo = new RowTypeInfo(fieldTypes);
JdbcInputFormat jdbcInputFormat = JdbcInputFormat.buildJdbcInputFormat()
        .setDrivername("org.postgresql.Driver")
        .setDBUrl("jdbc:postgresql://127.0.0.1:5432/localdatabase")
        .setQuery("select cast(uuid as varchar) from localschema.table")
        .setUsername("postgres")
        .setPassword("postgres")
        .setRowTypeInfo(rowTypeInfo)
        .finish();
DataStream<Row> dbStream = env.createInput(jdbcInputFormat);
Table dbtable = tableEnv.fromDataStream(dbStream).as("uuid");
tableEnv.createTemporaryView("dbtable", dbtable);
Only this time, I get the following exception on performing the left join (as above) -
Exception in thread "main" org.apache.flink.table.api.TableException: Table sink '*anonymous_datastream_sink$3*' doesn't support consuming update and delete changes which is produced by node Join(joinType=[LeftOuterJoin]
It works if I tweak the resultStream to publish a changelog stream -
Table resultTable = tableEnv.sqlQuery("SELECT * FROM s3table LEFT JOIN dbtable ON s3table.uuid = dbtable.uuid where dbtable.uuid is null");
DataStream<Row> resultStream = tableEnv.toChangelogStream(resultTable);
resultStream.print();
Sample output:
+I[9cc38226-bcce-47ce-befc-3576195a0933, null]
+I[a24bf933-1bb7-425f-b1a7-588fb175fa11, null]
+I[da6f57c8-3ad1-4df5-9636-c6b36df2695f, null]
+I[2f3845c1-6444-44b6-b1e8-c694eee63403, null]
-D[9cc38226-bcce-47ce-befc-3576195a0933, null]
-D[a24bf933-1bb7-425f-b1a7-588fb175fa11, null]
However, I do not want the sink to receive the inserts and deletes separately; I just want the final list of missing uuids. I guess this happens because my Postgres source created with DataStream<Row> dbStream = env.createInput(jdbcInputFormat); is a streaming source. If I try to execute the whole application in BATCH mode, I get the following exception -
org.apache.flink.table.api.ValidationException: Querying an unbounded table '*anonymous_datastream_source$2*' in batch mode is not allowed. The table source is unbounded.
Is it possible to have a bounded JDBC source? If not, how can I achieve this using the streaming API? (I am using Flink 1.15.2.)
I believe this kind of case would be a common use case that can be implemented with Flink, but clearly I'm missing something. Any leads would be appreciated.
For now, a common approach would be to sink the resultStream to a table: schedule a job that truncates the table, then runs the Apache Flink job, and finally reads the results from that table.
I also noticed that Apache Flink Table Store 0.3.0 was just released, and materialized views are on the roadmap for 0.4.0. This might be a solution too. Very exciting, imho.
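A minimal sketch of that sink-to-table idea, assuming the Flink JDBC connector is on the classpath; missing_uuids, the URL, and the credentials are placeholders, and s3table / dbtable are the temporary views from the question. Declaring a primary key makes the JDBC connector an upsert sink, so it can absorb the +I/-D changelog produced by the streaming left join:
// Hypothetical results table; the primary key turns the JDBC connector into an
// upsert sink that can consume updates and deletes from the join.
tableEnv.executeSql(
        "CREATE TABLE missing_uuids ("
                + "  uuid STRING,"
                + "  PRIMARY KEY (uuid) NOT ENFORCED"
                + ") WITH ("
                + "  'connector' = 'jdbc',"
                + "  'url' = 'jdbc:postgresql://127.0.0.1:5432/localdatabase',"
                + "  'table-name' = 'localschema.missing_uuids',"
                + "  'username' = 'postgres',"
                + "  'password' = 'postgres'"
                + ")");

// Anti-join: uuids present in the S3 data but absent from the Postgres table.
tableEnv.executeSql(
        "INSERT INTO missing_uuids "
                + "SELECT s3table.uuid FROM s3table "
                + "LEFT JOIN dbtable ON s3table.uuid = dbtable.uuid "
                + "WHERE dbtable.uuid IS NULL");
Since both sources are finite, the job eventually finishes and missing_uuids is left holding only the uuids that never found a match; the scheduled truncate-and-rerun then keeps it current.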

Relation IDs mismatch - Mapping OWL to Oracle DB with Ontop

As part of my little app, I am trying to map data between my ontology and an Oracle DB with Ontop, but my first mapping is not accepted by the reasoner and it's not clear why.
As my first attempt I use the following target:
:KIS/P_PVPAT_PATIENT/{PPVPAT_PATNR} a :Patient .
and the following source:
select * from P_PVPAT_PATIENT
Here KIS is the schema, P_PVPAT_PATIENT the table, and PPVPAT_PATNR the key. Ontop rejects the mapping with the following exception:
Caused by: it.unibz.inf.ontop.exception.InvalidMappingSourceQueriesException:
Error: Relation IDs mismatch: P_PVPAT_PATIENT v "KIS"."P_PVPAT_PATIENT" P_PVPAT_PATIENT
Problem location: source query of triplesMap
[id: MAP_PATIENT
target atoms: triple(s,p,o) with
s/RDF(http://www.semanticweb.org/grossmann/ontologies/kis-ontology#KIS/P_PVPAT_PATIENT/{}(TmpToVARCHAR2(PPVPAT_PATNR)),IRI), p/<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, o/<http://www.semanticweb.org/grossmann/ontologies/kis-ontology#Patient>
source query: select * from P_PVPAT_PATIENT]
As the error says, my source query was wrong because I forgot to include the schema in my SQL.
The correct SQL is:
select * from kis.P_PVPAT_PATIENT

How to log the failed records while inserting data in bulk mode

I am inserting data in bulk mode, copying records from one DB table to another, and I am using the Scatter-Gather message processor. The source DB table has 10 records; the second record has some invalid data (its firstname is null), while the remaining 9 records are valid. In my target DB table the firstname column is NOT NULL, so while inserting these 10 records into the target DB it throws an error saying firstname cannot be null. How can I identify which particular record has invalid data using exception handling in Mule? I am new to Mule ESB. Can anyone help with this scenario?
%dw 1.0
%output application/java
---
payload map {
    id : $.Id,
    customerid : $.Customerid,
    address : $.Address,
    dob : $.Dob,
    firstname : $.Firstname,
    lastname : $.LastName,
    middlename : $.Middlename,
    phoneno : $.Phoneno,
    batch : $.Batch,
    recorddate : $.RecordDate
}
Kindly post the exception message you are getting along with your XML flow.
For now, I can offer the following suggestion:
Use a collection splitter to split the payload and process each record individually.
Catch the exception in the error-handling block using the expression context #[exception.causedBy(your exception class)].
After this, configure your strategy for what to do when this exception happens.
In your case, log a column value or record ID that is unique for every message. This will help you see which particular record your exception occurred on.
Thanks!

DB2: selecting certain columns not working after SET INTEGRITY

I'm learning DB2 and ran into a problem while testing some options in my DB.
I have 2 tables like this:
Country
=========
IdCountry -- PK
Name
State
=========
IdState -- PK
IdCountry -- FK to Country.IdCountry
Name
Code
And I am using queries like:
SELECT IdState, Name
FROM Tables.State
WHERE IdCountry = ?
Where ? is any working IdCountry, and everything worked fine.
Then I ran SET INTEGRITY from the DB2 Control Center using the default options; the process was successful, but now my query isn't returning any results.
I tried using:
SELECT *
FROM Tables.State
Where IdCountry = ?
and it gives me back results.
While testing the table, I tried adding new States, and they do appear in the query that lists the columns by name, but the old records are still missing.
I have no clue what's happening; does anyone have an idea?
Thanks in advance, and sorry for my poor English.
I'm assuming here that you're on Linux/Unix/Windows DB2, since z/OS doesn't have a SET INTEGRITY command, and I couldn't find anything about it with a quick search on the iSeries Info Center.
It's possible that your table is still in "set integrity pending" state (previously known as CHECK PENDING). You could test this theory by checking SYSCAT.TABLES using this query:
SELECT TRIM(TABSCHEMA) || '.' || TRIM(TABNAME) AS tbl,
       CASE STATUS
           WHEN 'C' THEN 'Integrity Check Pending'
           WHEN 'N' THEN 'Normal'
           WHEN 'X' THEN 'Inoperative'
       END AS TblStatus
FROM SYSCAT.TABLES
WHERE TABSCHEMA NOT LIKE 'SYS%'
If your table shows up, you will need to use the SET INTEGRITY command to bring your table into checked status:
SET INTEGRITY FOR Tables.State IMMEDIATE CHECKED
