Access a distributed mnesia database from different nodes

I have a mnesia database containing several tables.
I want to be able to access the tables from different Linux terminals.
I have a function called add_record, which takes a few parameters, say name and id. I want to be able to call add_record on node1 and add_record on node2, but update the same table from both locations.
The only thing I found in the sources I read was that I should use net_adm:ping(node2), but somehow I still can't access the data in the table.

I assume you probably meant a replicated table. Suppose your mnesia table lives on the node nodea@127.0.0.1, started with -setcookie mycookie. Whether or not the table is replicated on another node, to access its records from another terminal you have to run Erlang in that terminal as well: create a node, connect it to the node holding the table (making sure they all share the same cookie), and then call a function on the remote node. Say you want to call a function add_record in the module mydatabase.erl on nodea@127.0.0.1, the node holding the mnesia table. Open a Linux terminal and enter the following:
$ erl -name remote@127.0.0.1 -setcookie mycookie
Eshell V5.8.4 (abort with ^G)
1> N = 'nodea@127.0.0.1'.
'nodea@127.0.0.1'
2> net_adm:ping(N).
pong
3> rpc:call(N, mydatabase, add_record, [RECORD]).
{atomic,ok}
4>
With the rpc module you can call any function on a remote node, as long as the two nodes are connected and share the same cookie. Start by calling this on the remote node:
rpc:call('nodea@127.0.0.1', mnesia, info, []).
It should print mnesia's status in your remote terminal. I suggest you first work through a tutorial on Distributed Erlang Programming; then you will be able to see how replicated mnesia tables are managed.
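The question doesn't show mydatabase.erl itself, but since add_record is described as taking a name and an id, a minimal sketch of such a module might look like this (the table and record layout are assumptions, not taken from the question):

-module(mydatabase).
-export([add_record/2]).

%% Assumed record/table layout; adjust to your actual schema.
-record(person, {id, name}).

%% Write one row inside a transaction; mnesia:transaction/1 returns
%% {atomic, ok} on success, matching the shell session above.
add_record(Id, Name) ->
    F = fun() -> mnesia:write(#person{id = Id, name = Name}) end,
    mnesia:transaction(F).

With that, the remote call becomes, e.g., rpc:call(N, mydatabase, add_record, [1, "john"]).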

Related

SingleStore Change Master Aggregator

I've created a test environment with 2 physical nodes, using Docker.
Each node runs one leaf and one aggregator; the aggregator on the first node is the master aggregator.
When I deploy the DB and create some tables and rows, everything is OK.
When I shut down the second node, where I have the child aggregator and the second leaf, after about 30 seconds I am able to access the DB again and can create, update, or select tables.
After bringing the second node back up, the DB syncs fine.
But when I shut down the first node, with the master aggregator and the first leaf, I can't do anything with my DB on the child aggregator.
For example, when I type (select * from table1;) I get an error (ERROR 1735 (HY000): Cannot connect to node #192.168.99.91:3307 with user distributed using password: YES [2004] Cannot connect to '192.168.99.91':3307. Errno=113 (No route to host)), where 192.168.99.91:3307 is the leaf on the first node.
I saw there is an information_schema database in MemSQL.
When I type select * from LEAVES; on the child aggregator to check the leaves, and select * from AGGREGATORS; to check the aggregators, the leaf and the master aggregator both show an online status.
After some searching I understand that only the master aggregator can make changes in information_schema.
To change the child to master, I typed the command AGGREGATOR SET AS MASTER on the child aggregator.
Then I checked information_schema on the second node again and saw that the status had changed: the child aggregator is now a master aggregator and my test DB became available again.
Question:
How can I automate this child-to-master change?
And how can I automate it so that when the first master comes back up, it changes itself to a leaf?
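No automation for this was posted, but as a hedged sketch: the promotion itself is just the command the asker already found, so automating it comes down to an external watchdog that polls the surviving child aggregator and promotes it when the master becomes unreachable (the watchdog logic is an assumption, not a built-in SingleStore failover feature):

-- Poll aggregator state from the child; the Master_Aggregator column
-- marks which node is currently master (column name per SingleStore
-- docs, verify against your version):
SHOW AGGREGATORS;

-- If the master stays unreachable, promote the child:
AGGREGATOR SET AS MASTER;

Demoting the old master when it comes back would be the same idea in reverse: on startup, check whether another node already reports itself as master before accepting writes.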

Citus "... is a metadata node, but is out of sync HINT: If the node is up, wait until metadata gets synced to it and try again."

I've got a Citus (v10.1) sharded PostgreSQL (v13) cluster with 4 nodes. The master node's address is 10.0.0.2 and the rest run through 10.0.0.5. When trying to manage my sharded table, I get this error:
ERROR: 10.0.0.5:5432 is a metadata node, but is out of sync
HINT: If the node is up, wait until metadata gets synced to it and try again.
I've been waiting. After 30 minutes or more, I literally did drop schema ... cascade; drop extension citus cascade; and after re-importing the data and creating a shard I got the same error message once more and can't get past it.
Some additional info:
Another thing that might be an actual hint: I cannot distribute my function through create_distributed_function(), because it says it is in a deadlock state and the transaction cannot be committed.
I've checked idle processes; nothing out of the ordinary.
I created the shards like this:
SELECT create_distributed_table('test_table', 'id');
SELECT alter_distributed_table('test_table', shard_count:=128, cascade_to_colocated:=true);
There are no search results at all on this subject.
EDIT 1:
I did bombard my shard (20k-200k hits per second) with a huge number of requests for a function that does an insert/update, or a delete if a specific argument is set.
This is a rather strange error. You might have hit the issue in https://github.com/citusdata/citus/issues/4721
Do you have column defaults that are generated by sequences? If so, consider using bigserial for these columns.
If that does not work, you can disable metadata syncing with SELECT stop_metadata_sync_to_node('10.0.0.5', 5432); optionally followed by SELECT start_metadata_sync_to_node('10.0.0.5', 5432); to stop waiting for metadata syncing and (optionally) retry metadata creation from scratch.
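As a hedged illustration of the sequence advice (the table and column names are invented), declaring the column bigserial instead of attaching a plain sequence default looks like this:

-- The linked issue involves defaults drawn from plain sequences, e.g.:
--   id integer DEFAULT nextval('test_table_id_seq')
-- Declaring the column bigserial instead lets Citus handle it:
CREATE TABLE test_table (
    id bigserial PRIMARY KEY,
    payload text
);
SELECT create_distributed_table('test_table', 'id');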

DB2 temp table concurrent-session issue

We are working with a .NET application with DB2 as the database. I am using a temp table in my stored procedure. Sometimes it throws the error "table is in use".
Declare Global Temporary Table TRNDETAILS (
    USERID INT,
    Name VARCHAR(25)
) WITH REPLACE;
According to the document below, temp tables are specific to the session. Then why is it showing "table is in use", and how can I resolve it?
https://www.ibm.com/support/knowledgecenter/en/SS6NHC/com.ibm.swg.im.dashdb.sql.ref.doc/doc/r0003272.html
SQL0913N is either a lock timeout or a deadlock.
The conflict might not be on your session table, unless your .NET app is multithreading SQL on a single connection.
Check DSPRCDLCK, WRKOBJLCK, and related tools. You need to track down the conflicting SQL statement(s) and take action depending on the cause; sometimes this involves changing the isolation level in your application.
Examine the Db2 for i diagnostics to get more information, i.e. whether it is a lock timeout or a deadlock, which connections are involved, and which objects are involved.
QTEMP is unique for every job/connection...
I assume your app only creates the temporary table in one place, and that creating it is one of the first things the app does.
I also suspect you're using connection pooling in .NET, so the connection isn't actually being closed; it's left open in the pool.
Somewhere in your app you're not properly disposing of a result set and/or committing changes to the temp table, which leaves rows in it locked when the connection is returned to the pool.
You should probably drop the temporary table before your app closes the connection and returns it to the pool.
That should prevent the error, but it would still be a good idea to track down the bug that leaves the rows locked in the first place.
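A minimal sketch of that advice, using the table from the question (declared temp tables live in the implicit SESSION schema in Db2); run this before handing the connection back to the pool:

-- Release any locks the session still holds, then drop the temp table.
COMMIT;
DROP TABLE SESSION.TRNDETAILS;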

Perl SQL and file create race condition

How do I handle a "race condition" between instances of a script that is scheduled to run every minute and performs the following tasks for every file in a directory:
Connect to the SQL database and check the last element (filename) in a table
Create several files (in multiple folders) with the next available filename
Insert into SQL a new record with the filename and the created files' information
Because the process runs every minute, it's possible that 2 instances overlap and work on the same files. I can prevent that by file locking and by skipping already-opened files; however, the issue persists with:
Checking the next available filename in the database (2 processes want to use the same filename)
Creating files with this filename
Process A takes inputA.jpg and finds image_01 as the next available filename.
Process B takes inputB.jpg and finds image_01 as the next available filename.
And so the chaos begins...
Unfortunately, I can't insert a placeholder record into the SQL table to show that the next filename is being processed.
Pseudo-code of the loop:
foreach my $file (@files) {
    my $name  = findFileNameInSql($file);
    my $path1 = createFile($name, $settings1);
    my $path2 = createFile($name, $settings2);
    my $path3 = createFile($name, $settings3);
    addToSql($file, $name, $path1, $path2, $path3);
}
The actual code is a bit more complicated, including file modifications and a transactional insert into 2 SQL tables. If createFile() fails, the application rolls back all previously created files. This obviously causes trouble when one instance of the app is creating file "abc" and a second instance errors out because file "abc" already exists.
EDIT :
Sure, limiting the script to a single instance could be a solution, but I was hoping to find a way to run them in parallel. If there's no way to do that, we can close this as a duplicate.
You need to make the code that returns the next available filename atomic in the database, so that the database can't return the same filename twice. This is really a database problem rather than a Perl problem, per se.
You don't say which database you're using, but there are several ways to do it.
A naive and brutish way in MySQL is for the Perl script to run LOCK TABLES ... WRITE on the table holding the filenames while it calculates a new one and does its work. Once the table is updated with the new filename, you release the lock. Table locks don't play nicely with transactions, though.
Or you could do something rather more elegant, like implementing a stored procedure, with appropriate locking inside the database itself, that returns the new filename.
Or use an AUTO_INCREMENT column, so that each time you add a row to the table you get a new number (and hence a new filename); see the sketch after this answer.
This can all get quite complicated, though; if you have multiple simultaneous transactions, how the database resolves them is usually configurable, so I can't tell you what will happen.
Given that your code mainly reorganises data on disk, there is not much advantage to having multiple jobs running at the same time; this code is probably I/O-bound anyway. In that case it's much simpler to make the code changes others have suggested and run only one copy at a time.
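A minimal sketch of the AUTO_INCREMENT approach in MySQL (the table and column names are invented for illustration): insert first, let the database hand out a unique id, then derive the filename from that id, so two processes can never claim the same name.

-- One row per generated file set; ids are allocated atomically by MySQL.
CREATE TABLE filenames (
    id     BIGINT AUTO_INCREMENT PRIMARY KEY,
    source VARCHAR(255) NOT NULL
);

-- Each process registers its source file first...
INSERT INTO filenames (source) VALUES ('inputA.jpg');

-- ...then reads back its own id and builds the name (e.g. image_000042).
-- LAST_INSERT_ID() is per-connection, so this is race-free.
SELECT LAST_INSERT_ID();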

'tail -f' a database table

Is it possible to effectively tail a database table such that when a new row is added an application is immediately notified with the new row? Any database can be used.
Use an ON INSERT trigger.
You will need to check your database's specifics for how to call an external application with the values contained in the inserted record, or you can write your 'application' as a SQL procedure and have it run inside the database.
It sounds like you should brush up on databases in general before you paint yourself into a corner with command-line approaches.
Yes, if the database is a flat text file and appends are done at the end.
Yes, if the database supports this feature in some other way; check the relevant manual.
Otherwise, no. Databases tend to be binary files.
I am not sure, but while this might work for primitive flat-file databases, as far as I understand (and I could be wrong) modern database files are binary, not plain text, so reading a newly added row would not work with that command.
I would imagine most databases allow write triggers, and you could have a script fire on write that tells you some of what happened. What information is available would depend on the individual database.
There are a few options here, some of which others have noted:
Periodically poll for new rows. Given how MVCC works, though, it's possible to miss a row if there were two INSERTs in mid-transaction when you last queried.
Define a trigger function that does some work for you on each insert. (In Postgres you can call a NOTIFY command that other processes can LISTEN to; a sketch follows below.) You could combine the trigger with writes to an unpublished_row_ids table to ensure that your tailing process doesn't miss anything. (The tailing process would then delete IDs from unpublished_row_ids as it processes them.)
Hook into the database's replication functionality, if it provides any. This should have a means of guaranteeing that rows aren't missed.
I've blogged in more detail about how to do all of these with Postgres at http://btubbs.com/streaming-updates-from-postgres.html.
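A minimal sketch of the trigger/NOTIFY option in Postgres (the table, channel, and function names are invented for illustration):

-- Notify listeners on the 'new_rows' channel whenever a row is inserted.
CREATE FUNCTION notify_new_row() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('new_rows', NEW.id::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tailed_table_insert
    AFTER INSERT ON tailed_table
    FOR EACH ROW EXECUTE FUNCTION notify_new_row();

-- A tailing client then simply runs LISTEN new_rows; and waits for
-- notifications carrying each new row's id.

(EXECUTE FUNCTION needs PostgreSQL 11+; older versions use EXECUTE PROCEDURE.)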
tail on Linux appears to use inotify to tell when a file changes (it probably uses similar filesystem-notification frameworks on other operating systems), so it does detect file modifications.
That said, tail performs an fstat() call after each detected change and will not output anything unless the size of the file has increased. Modern DB systems use random file access and reuse DB pages, so it's quite possible that an inserted row will not change the backing file's size.
You're better off using inotify (or similar) directly, and better still using DB triggers or whatever mechanism your DBMS offers to watch for updates, since not all file updates are necessarily row insertions.
I was just in the middle of posting the same response as glowcoder, plus another idea:
The low-tech way is to have a timestamp field and have a program run a query every n minutes looking for records whose timestamp is greater than that of the last run (sketched below). The same concept works by storing the last key seen if you use a sequence, or even by adding a boolean "processed" field.
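For example, a hedged sketch of that polling query (table and column names are invented; the poller stores the high-water mark between runs):

-- Fetch only rows added since the previous run.
SELECT *
FROM tailed_table
WHERE created_at > :last_seen   -- high-water mark saved by the poller
ORDER BY created_at;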
With Oracle you can select a pseudo-column called rowid that gives a unique identifier for the row in the table, and rowids are ordinal: new rows get assigned rowids greater than any existing rowid.
So, first: select max(rowid) from table_name
I assume that one cause for the question is that there are many, many rows in the table, so this first step will tax the DB a little and take some time.
Then: select * from table_name where rowid > 'whatever_that_rowid_string_was'
You still have to run the query periodically, but it is now a quick and inexpensive query.
