Spark: Open a database session on one node, broadcast it, retrieve it on another node... does the session still work?

Suppose we have a Spark cluster with multiple nodes. In the driver program, I open a Cassandra session (Cluster.builder().addContactPoint(...).build().connect(keyspace)). What happens if I pass this session in a broadcast variable and retrieve it on another node? Is the session still usable? Is a database session (once opened) tied to the machine that opened it?
It's a bit difficult to try, as I don't have a cluster of multiple nodes...

No. If you are trying to share a session, you will need to re-establish it on the other nodes (it is not serializable). But if you would like to pool sessions on a particular node, you can use the Spark Cassandra Connector.
The Spark Cassandra Connector has an object, CassandraConnector(SparkConf), which pools sessions and makes it easy to share a C* connection between tasks on the same machine.
For example:
rdd.map( item => CassandraConnector(conf).withSessionDo ( session => ... ) )
will use only one C* cluster connection per executor JVM.
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/1_connecting.md
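If you cannot use the connector, the usual alternative is to (re)create the session inside each partition on the executor, since only serializable data (such as the contact-point string) can travel from the driver. A minimal sketch in Java, assuming the DataStax Java driver 2.x/3.x is on the classpath; the keyspace "ks", table "kv", and host are placeholders:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import org.apache.spark.api.java.JavaRDD;

public class WritePartitions {
    static void save(JavaRDD<String> rdd, String cassandraHost) {
        rdd.foreachPartition(rows -> {
            // The session is opened here, on the executor. A Session is not
            // Serializable, so it can never travel from the driver inside a
            // broadcast variable; only the host string is shipped.
            try (Cluster cluster = Cluster.builder()
                                          .addContactPoint(cassandraHost)
                                          .build();
                 Session session = cluster.connect("ks")) {
                while (rows.hasNext()) {
                    session.execute("INSERT INTO kv (k) VALUES (?)", rows.next());
                }
            }
        });
    }
}
```

Opening one session per partition (rather than per element) keeps the connection overhead bounded; the connector's CassandraConnector goes one step further by pooling per executor JVM.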

Related

How to start a new Logstash instance when another goes down

We have a requirement where we send DB events to one Logstash instance. If that Logstash instance goes down, another Logstash instance should start automatically. Both Logstash instances will be deployed on the same machine with different node names, for example an active node and an optional standby node.
Please let us know how to handle Logstash clustering for DB events.
Use LVS + keepalived: the two hosts share a VIP (virtual IP). Clients send events to the VIP, and keepalived fails the VIP over to the standby host when the active one goes down.
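The VIP side of this can be sketched with a keepalived configuration like the one below; the instance name, interface, and addresses are hypothetical, and the standby host runs the same block with state BACKUP and a lower priority:

```
# /etc/keepalived/keepalived.conf on the active host (example values)
vrrp_instance logstash_vip {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.168.1.100    # clients send DB events to this VIP
    }
}
```

If the active host stops answering VRRP advertisements, the BACKUP host promotes itself and takes over the VIP, so senders need no reconfiguration.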

Setup VoltDB Cluster IP

I am trying out a VoltDB cluster: I created a cluster of 2 nodes with k=1.
Cluster initialization was successful and both nodes are up.
Now, how do I connect to this cluster? I could not find any documentation on setting up a single IP for the cluster.
Will the client connect to a particular node's IP or to a cluster IP?
I am using the VoltDB Community Edition.
In general, you can connect to one node or to multiple nodes. For simple usage, one node is fine. For a client application where you want lower latency and higher throughput, you should connect to all of the nodes in the cluster. See Connecting to the VoltDB Database for the Java client, and in particular section 6.1.2 on using the auto-connecting client, which lets you connect to only one node and have the client automatically connect to all of the other nodes.
For command-line access, see the sqlcmd reference:
--servers=server-id[,...]
Specifies the network address of one or more nodes in the database cluster. By default, sqlcmd attempts to connect to a database on localhost.
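For instance, to point sqlcmd at both nodes of the two-node cluster (addresses are hypothetical, and this assumes VoltDB's default client port):

```shell
sqlcmd --servers=192.168.1.5,192.168.1.6
```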
Disclosure: I work at VoltDB.
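The auto-connecting client mentioned above can be sketched roughly as follows, assuming the VoltDB Java client jar is on the classpath; the seed address is a placeholder:

```java
import org.voltdb.client.Client;
import org.voltdb.client.ClientConfig;
import org.voltdb.client.ClientFactory;

public class ClusterConnect {
    public static void main(String[] args) throws Exception {
        ClientConfig config = new ClientConfig();
        // Ask the client to track the cluster topology and keep connections
        // to all nodes as members join or leave.
        config.setTopologyChangeAware(true);
        Client client = ClientFactory.createClient(config);
        client.createConnection("192.168.1.5"); // any one node works as a seed
        // ... call procedures here, then:
        client.close();
    }
}
```

With this setting, you only need one reachable seed node; there is no separate "cluster IP" to configure.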
If you wish to connect to a single node, use
jdbc:voltdb://192.168.1.5:<port>
as the connection URL; if you wish to connect to the cluster, use
jdbc:voltdb://192.168.1.5:<port>,192.168.1.6:<port>,<any additional nodes you might have in your cluster>
as the connection URL.
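To keep the URL construction in one place, a small helper can assemble the cluster URL from a node list. This is a minimal sketch of my own; the class name and host addresses are made up, and the assumption that VoltDB's default client port is 21212 is mine, not from the answer above:

```java
import java.util.List;

public class VoltUrl {
    // Build a VoltDB JDBC URL from a list of host:port pairs; the driver
    // accepts a comma-separated list of nodes after the scheme.
    static String clusterUrl(List<String> hostPorts) {
        return "jdbc:voltdb://" + String.join(",", hostPorts);
    }

    public static void main(String[] args) {
        String url = clusterUrl(List.of("192.168.1.5:21212", "192.168.1.6:21212"));
        System.out.println(url); // jdbc:voltdb://192.168.1.5:21212,192.168.1.6:21212
        // With the VoltDB JDBC driver on the classpath you would then do:
        // Class.forName("org.voltdb.jdbc.Driver");
        // Connection conn = DriverManager.getConnection(url);
    }
}
```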

SymmetricDS: sync client nodes to each other

I have symmetricDS configured so that there is one master node in the cloud, and then two "store" (client) nodes in remote locations.
If I insert data in the cloud, it is synced to both clients. If I insert data in a client, it is synced to the cloud.
However, data added on client1 never makes it to client2, and data added on client2 never makes it to client1...
Any ideas on this?
Thanks
Yes, you would want a second set of triggers (perhaps prefixed cloud_*) with the additional flag sym_trigger.sync_on_incoming_batch=1 turned on. This causes changes that arrive as part of replication from client 1..n to be captured and re-sent to all the other clients.
This can be more efficient than a client-to-client group link solution, because usually the clients do not all have network access to sync with each other directly. The change syncs to the cloud and is then redistributed to the other clients.
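In SQL, the second trigger set might look like the sketch below for a hypothetical table item; the trigger id, router id ('cloud_2_store'), and channel are made-up examples, not values from the question:

```sql
-- A second trigger on "item" that also captures rows arriving via
-- replication, so a client's change relayed through the cloud is re-sent.
insert into sym_trigger
  (trigger_id, source_table_name, channel_id, sync_on_incoming_batch,
   last_update_time, create_time)
values
  ('cloud_item', 'item', 'default', 1, current_timestamp, current_timestamp);

-- Link the new trigger to the cloud-to-store router so the re-captured
-- change fans out to the other clients.
insert into sym_trigger_router
  (trigger_id, router_id, initial_load_order, last_update_time, create_time)
values
  ('cloud_item', 'cloud_2_store', 1, current_timestamp, current_timestamp);
```

Without sync_on_incoming_batch=1, SymmetricDS deliberately does not re-capture incoming replicated rows, which is exactly why client1's data stops at the cloud.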

Why, in a 3-node PostgreSQL cluster, does one node show sync_state=sync and the next node sync_state=async?

I want the PostgreSQL synchronous streaming replication status to be sync.
I deployed a PostgreSQL cluster with 3 nodes and configured the sync type as synchronous. But when I check with SELECT * FROM pg_stat_replication; I get the first node with sync_state=sync and the other with async. Why are there two different states?
With synchronous streaming replication in PostgreSQL, the commit on the primary is delayed until one of the standby servers has received the corresponding WAL information (the exact meaning of this is configurable with synchronous_commit).
The standby server that first confirms reception of the WAL information is the one with sync_state 'sync'; the other will be 'async'.
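Which standbys are eligible to be synchronous is governed by synchronous_standby_names on the primary. A hedged postgresql.conf sketch; the standby names are hypothetical application_name values, and the FIRST n syntax requires PostgreSQL 9.6 or later:

```
# postgresql.conf on the primary (example standby names)

# One synchronous standby chosen from the list; the other listed standby
# shows sync_state = 'potential', and a standby not listed at all shows
# 'async' -- which matches seeing one sync and one async node.
synchronous_standby_names = 'FIRST 1 (standby1, standby2)'

# To require confirmation from both standbys (both show 'sync'):
# synchronous_standby_names = 'FIRST 2 (standby1, standby2)'
```

Note that requiring more synchronous standbys increases commit latency and means commits block if a required standby is unreachable.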

AlwaysON SQL Server 2014 Application exception: Failed to update database because database is readonly

We have a two-node availability group, the two nodes being SQL cluster1 - node1 and SQL cluster2 - node2, plus an availability group listener. The Java application connects to this listener, and everything works fine initially, i.e. the application can perform both reads and writes on the database, until we do a failover.
The connection string is driverURL=jdbc:jtds:sqlserver://[Listener DNS Name]:[Port]/[Database]
Say initially the node1 was primary and node2 was the secondary.
After the failover, node1 becomes the secondary and node2 becomes the primary. The application is still able to connect to the database, but it can only perform reads; it throws the exception mentioned in the title if we try to do inserts on that DB.
Basically, I need the application to be able to perform reads and writes all the time, irrespective of which node is the primary. Any ideas?
There should be no reason why you get a read-only database when the connection string points to the listener. That's the point of the availability group listener: to direct traffic to the read/write (primary) replica. Ping the DNS name and check that it resolves to the listener (before and after an AG failover). Unfortunately I don't use Java, so I can't help you any further. Cheers, Mark.
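One quick check from the application side is to ask the server which replica you actually reached and whether the database is writable. A small T-SQL sketch, run over the same listener connection the application uses; the database name is a placeholder:

```sql
SELECT @@SERVERNAME                                        AS replica_reached,
       DATABASEPROPERTYEX('MyDatabase', 'Updateability')   AS updateability;
-- Expect READ_WRITE when the listener has routed you to the primary;
-- READ_ONLY means you are still on a secondary replica.
```

Running this before and after a failover shows whether the connection is genuinely following the listener or is stuck on a cached address for the old primary.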
