Questions about Datomic


Quick questions.
1) If I perform a Transaction, when the result of that transaction returns, are all peers updated? Or will the other peers eventually reflect the change of the transaction?
2) How do peers update their state? From what I understand, peers have direct access to storage and act as their own cache, so when they don't have what you're asking for, I guess they just retrieve it from storage. How do peers know when their information is out of date?
3) How much does the choice of storage backend matter? Since databases like MySQL were optimized for a client/server relationship, wouldn't it be possible to create a Datomic-optimized storage solution? Or would it not be worth it?

1) The only peer guaranteed to have been updated is the peer that requested the transaction. The others will eventually see the change. If your application requires coordination between peers, use the blocking (sync connection t) function. For more information, see the Datomic documentation.
2) The transactor broadcasts new changes to all connected peers. A peer's state is composed of the novelty coming from the transactor and the data segments it retrieves directly from storage. Again, the sync function can communicate with the transactor to ensure that a peer blocks until all transactions written before the call to sync are available. Most other functions neither communicate with the transactor nor block.
3) If your system doesn't require huge write scalability and your application data tends to fit in memory, then the choice of a particular storage service is generally irrelevant, except, of course, for its operational capabilities (backups, admin tools, etc.), which have nothing to do with Datomic. See "How does the storage backend influence Datomic?" for more details.
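The propagation behaviour in points 1 and 2 can be illustrated with a toy Python model. Everything in it (`Peer`, `Transactor`, `basis_t`, the broadcast delay) is a hypothetical stand-in and not Datomic's API: the requesting peer sees its own transaction before `transact` returns, the other peers receive the novelty asynchronously, and `sync(t)` blocks until a peer has caught up to transaction `t`.

```python
import threading


class Peer:
    """Toy peer that only tracks the latest transaction t it has applied."""

    def __init__(self, name):
        self.name = name
        self.basis_t = 0
        self._cond = threading.Condition()

    def receive(self, t):
        # Invoked when the transactor's broadcast of transaction t reaches this peer.
        with self._cond:
            self.basis_t = max(self.basis_t, t)
            self._cond.notify_all()

    def sync(self, t, timeout=None):
        # Block until this peer has seen transaction t; returns False on timeout.
        with self._cond:
            return self._cond.wait_for(lambda: self.basis_t >= t, timeout)


class Transactor:
    """Toy transactor: numbers transactions and broadcasts them to every peer."""

    def __init__(self, peers, delay=0.05):
        self.peers = peers
        self.delay = delay  # simulated network latency for the broadcast
        self.t = 0
        self._lock = threading.Lock()

    def transact(self, requester):
        with self._lock:
            self.t += 1
            t = self.t
        requester.receive(t)  # the requesting peer sees its own write before transact returns
        for peer in self.peers:
            if peer is not requester:
                # Everyone else gets the novelty later, on a timer thread.
                threading.Timer(self.delay, peer.receive, args=(t,)).start()
        return t


a, b = Peer("a"), Peer("b")
tx = Transactor([a, b])
t = tx.transact(a)
print(a.basis_t >= t)          # True: the requester is already up to date
print(b.sync(t, timeout=1.0))  # True once the delayed broadcast arrives
```

Peer `b` may or may not have seen `t` immediately after `transact` returns; only the explicit `sync` call makes that guarantee, which is the coordination point the answer describes.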

Related

How does etcd handle reads/writes during a network partition?

I am looking for something to use as a simple service registry and am considering etcd. For this use case, availability is more important than consistency: clients must be able to read and write keys on any of the nodes even when the cluster is split. Can etcd be used in this way? It doesn't matter if some of the writes are lost when things come back together, as they will quickly be refreshed by the services' "I am alive" heartbeat timers.

I'm also new to etcd. What I have noticed is that when a network partition happens, reads still work on the nodes outside the main quorum, but they will see inconsistent (stale) data.
As for writes, they fail with "Raft internal error".
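etcd replicates writes with the Raft consensus protocol, and Raft commits an entry only after a majority of the voting members has acknowledged it. That is why writes fail on the minority side of a partition. The quorum arithmetic, as a small Python sketch (the function names are illustrative, not part of etcd):

```python
def quorum_size(cluster_size):
    """Smallest majority of a Raft cluster with cluster_size voting members."""
    return cluster_size // 2 + 1


def can_commit_writes(reachable_members, cluster_size):
    """A side of a partition can commit writes only if it still holds a majority."""
    return reachable_members >= quorum_size(cluster_size)


# A 5-member cluster split 3/2: only the 3-member side keeps accepting writes.
print(quorum_size(5))           # 3
print(can_commit_writes(3, 5))  # True
print(can_commit_writes(2, 5))  # False
# A 4-member cluster split 2/2: neither side has a majority, so no writes anywhere.
print(can_commit_writes(2, 4))  # False
```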

Peer to peer replication of local databases

I have a program in C that monitors traffic and records the URLs visited by the user. Currently, I am maintaining this in a hash table: the key is the source IP address, and the value is a data structure holding a linked list of URLs. I am currently maintaining 50k to 100k records in the hash table. When the user logs out, the record can be deleted.
The program runs independently on an active-standby pair. I want to replicate this database to the other machine so that, if my primary machine crashes (the two systems act as client and server), the standby can continue recording data associated with the user.
The hard way is to write code that sends this information to the peer, plus code on the peer system to receive and store it. The issue is that this will add lots of code (and bugs!). For data replication and storage, here are a few prerequisites:
I want data-record replication between these machines. I am NOT looking at adding another machine/cluster unless required.
Prefer a library so that queries are quick; failing that, another process on the same machine that I can talk to over IPC.
Add, update and delete operations should be supported.
An in-memory database is a must.
Support multiple such databases with different keys.
Something that has publish/subscribe.
Resync capability if the backup dies and comes back again.
The interface should be in C.
Possible options I looked at were ZooKeeper, Redis, memcached, SQLite, and Berkeley DB.
ZooKeeper - Needs an odd number of systems for tie-breaking. Not suitable for a 1-to-1 setup.
Redis - Looks to fit my requirements, with hiredis for the C interface. It is a separate process, though.
Memcached - I don't have any caching requirements.
SQLite - Embedded database with a C interface.
Berkeley DB - Embedded database, for better scale.
So Redis, SQLite, and Berkeley DB look like my options going forward. I'd appreciate any help or thoughts on which of these DBs I should research further for my requirements, or whether there are other DBs I should look at. I apologize if my question is very generic. If the question does not belong here, please point me to the right forum.
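To make "the hard way" concrete, here is a Python sketch (the question's program is in C, but the shape carries over). Every add or delete is applied locally and then shipped to the standby as a small message, and a full snapshot covers the "resync if the backup dies and comes back" requirement. `UrlTable`, `Primary`, the JSON message format, and the `send` transport are all hypothetical; this is the bookkeeping that a replicated store would otherwise take off your hands.

```python
import json


class UrlTable:
    """In-memory table: source IP -> list of visited URLs, like the question's hash table."""

    def __init__(self):
        self.records = {}

    def apply(self, op):
        # op is a dict: {"kind": "add" | "delete", "ip": ..., "url": ...}
        kind, ip = op["kind"], op["ip"]
        if kind == "add":
            self.records.setdefault(ip, []).append(op["url"])
        elif kind == "delete":
            self.records.pop(ip, None)  # e.g. the user logged out
        else:
            raise ValueError(f"unknown op kind: {kind}")

    def snapshot(self):
        # Full copy used to resync a standby that died and came back.
        return json.dumps(self.records)

    def load(self, snap):
        self.records = json.loads(snap)


class Primary:
    def __init__(self, send):
        self.table = UrlTable()
        self.send = send  # hypothetical transport: delivers one encoded op to the standby

    def record(self, kind, ip, url=None):
        op = {"kind": kind, "ip": ip, "url": url}
        self.table.apply(op)       # update local state first
        self.send(json.dumps(op))  # then publish the same operation to the peer


standby = UrlTable()
primary = Primary(send=lambda msg: standby.apply(json.loads(msg)))
primary.record("add", "10.0.0.1", "http://example.com/a")
primary.record("add", "10.0.0.2", "http://example.com/b")
primary.record("delete", "10.0.0.1")
print(standby.records)  # {'10.0.0.2': ['http://example.com/b']}

# Resync: a restarted standby starts from a snapshot, then resumes applying ops.
restarted = UrlTable()
restarted.load(primary.table.snapshot())
print(restarted.records == primary.table.records)  # True
```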

Best approach to handle database transactions and business logic

I wonder if it is right to handle the transactions of the database as follows:
**locate database service**
**open connection**
**begin transaction**
get objects from relational database
call business logic
**commit transaction**
**close connection**
**release**
The steps in asterisks are going to be injected via IoC.
This way the business logic is not affected by data-access code. Is this implementation correct, and what consequences might it have?
Thank you!

Usually you don't want to keep a transaction open while dealing with business logic. Your application may perform lengthy computations, send data over the network, call remote services, etc. Keeping a database transaction open during all of this can and will cause many problems, among them deadlocks, exhaustion of the RDBMS connection pool, lock escalation, and lost updates.
In general, a Repository module is responsible for loading and persisting objects, including transaction management. Business logic doesn't have to worry about transactions; all it needs to know is how to call the right Repository method. Also, don't forget that storing data may fail for a number of reasons, so make sure you handle that properly. For example:
1. Read objects from external storage (transaction management, if any, is hidden inside the Repository).
2. Manipulate the objects according to business logic.
3. Store the result of the manipulation (assuming your storage is an RDBMS that supports transactions: begin a transaction, save the data, commit on success, roll back on error).
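A minimal Python sketch of those three steps, using the standard-library `sqlite3` module as the RDBMS. The `account` table, `AccountRepository`, and `apply_interest` are hypothetical names chosen for illustration; the point is that only the repository touches connections and transactions, and the business logic runs while no transaction is open.

```python
import sqlite3


class AccountRepository:
    """Owns the connection and all transaction handling (hypothetical example schema)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS account ("
                "id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
            )

    def load(self, account_id):
        # Step 1: a plain read; no transaction is held open afterwards.
        row = self.conn.execute(
            "SELECT id, balance FROM account WHERE id = ?", (account_id,)
        ).fetchone()
        return None if row is None else {"id": row[0], "balance": row[1]}

    def save(self, account):
        # Step 3: a short write transaction. Used as a context manager, the
        # sqlite3 connection commits on success and rolls back on an exception,
        # which then propagates so the caller can handle the failure.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO account (id, balance) VALUES (?, ?)",
                (account["id"], account["balance"]),
            )


def apply_interest(account, rate):
    """Step 2: pure business logic, with no connection or transaction in sight."""
    return {**account, "balance": round(account["balance"] * (1 + rate))}


repo = AccountRepository()
repo.save({"id": 1, "balance": 1000})
account = repo.load(1)
account = apply_interest(account, 0.05)
try:
    repo.save(account)
except sqlite3.Error as exc:
    print(f"could not store account: {exc}")  # storage can fail; handle it explicitly
print(repo.load(1))  # {'id': 1, 'balance': 1050}
```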

Locating a database service and opening a connection carry enough overhead that you normally want to keep the connection open for reuse. A connection pool can do this for you if managing it yourself is inconvenient in the application.
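The reuse idea in Python: a fixed-size pool built on `queue.Queue`, with `sqlite3` standing in for a database server whose connections are expensive to open. `ConnectionPool` and its parameters are illustrative; production code would normally use the pool that comes with its driver or framework.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Minimal fixed-size pool: each connection is opened once and then reused."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # pay the connection setup cost up front, once per slot

    @contextmanager
    def connection(self, timeout=None):
        # Blocks while every connection is checked out; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back instead of closing it


pool = ConnectionPool(
    lambda: sqlite3.connect(":memory:", check_same_thread=False), size=2
)
with pool.connection() as conn:
    print(conn.execute("SELECT 1").fetchone())  # (1,)
```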

Why use Singleton to manage db connection?

I know this has been asked before, here, there and everywhere, but I can't get a clear explanation, so I'm going to pitch it again. So what is all the fuss about using a singleton to control the DB connection in your web app? Some like it, some hate it; I don't understand it. From what I've read, "it's to ensure that there is always only one active connection to your DB". Why is that a good thing? One active DB connection on a data-driven web app processing multiple requests per second spells trouble, doesn't it? For whatever reason, nobody can properly explain this. I've been all over the web. I know I'm thick.

Assuming Java here, but is relevant to most other technologies as well.
I'm not sure whether you've confused the use of a plain singleton with a service locator. Both of them are design patterns. The service locator pattern is used by applications to ensure that there is a single class entrusted with the responsibility of obtaining and providing access to databases, files, JMS queues, etc.
Most service locators are implemented as singletons, since there is no need for multiple service locators to do the same job. Besides, it is useful to cache information obtained from the first lookup that can be later used by other clients of the service locator.
By the way, the argument that "it's to ensure that there is always only one active connection to your DB" is false and misleading. A connection can be closed or reclaimed if it is left inactive for a long period of time, so caching a connection to the database is frowned upon. There is one exception to this argument: "re-using" a connection obtained from the connection pool is encouraged, as long as you do so within the same context, i.e. within the same HTTP request or user request (whichever is applicable). This is done for performance, since establishing new connections can be an expensive operation.

High-performance (or even medium-performance) web apps use database connection pooling, so one DB connection can be shared among many web requests. The singleton is usually the object which manages this pool. I think the motivation for using a singleton is to idiot-proof against maintenance programmers that might otherwise instantiate many of these objects needlessly.
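A Python sketch of the pattern this answer describes: one shared object owns the pool, and callers go through `Database.instance()` rather than building their own. `Database`, `instance`, `acquire`, and `release` are hypothetical names, and `sqlite3` stands in for the real database. Note that Python cannot stop someone from calling `Database()` directly; in Java, a private constructor is what enforces the "only one" part.

```python
import queue
import sqlite3
import threading


class Database:
    """Process-wide singleton that owns the connection pool (illustrative names)."""

    _instance = None
    _instance_lock = threading.Lock()

    def __init__(self, pool_size=4):
        self._pool = queue.Queue()
        for _ in range(pool_size):
            self._pool.put(sqlite3.connect(":memory:", check_same_thread=False))

    @classmethod
    def instance(cls):
        # Double-checked locking: concurrent first calls still build only one pool.
        if cls._instance is None:
            with cls._instance_lock:
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance

    def acquire(self):
        return self._pool.get()  # blocks until a pooled connection is free

    def release(self, conn):
        self._pool.put(conn)


db1 = Database.instance()
db2 = Database.instance()
print(db1 is db2)  # True: every caller shares the same pool
```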

"It's to ensure that there is always only one active connection to your DB." I think that would be better stated as: to ensure each CLIENT has only one active connection to your DB. This is incredibly important because you want to prevent deadlocks. If I have TWO open database connections (as a client), I might update a row on one connection and then try to update the same row on the other connection. This can cause a deadlock that the database cannot detect. So the idea of the singleton is basically to make sure there is ONE object in charge of handing out database connections to each client. You don't HAVE to use a singleton for this, but most people will tell you it just makes sense for the system to have only one.

You're right--usually this isn't what you want.
However, there are plenty of cases where you need to throttle yourself down to a single connection. By serializing your access to the database through a singleton, you can address other issues or constraints like load, bandwidth, etc.
I've done something similar in the past for a bulk processing app. Instead, though, I used a semaphore to synchronize access to the database so I could allow n concurrent db operations.
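The semaphore approach from this answer, sketched in Python with `threading.BoundedSemaphore`. `MAX_CONCURRENT_DB_OPS = 3` is an arbitrary stand-in for "n", and the `time.sleep` call stands in for the real database work; the counters exist only to show that no more than n operations ever run at once.

```python
import threading
import time

MAX_CONCURRENT_DB_OPS = 3  # the "n" from the answer; the value is an assumption
db_slots = threading.BoundedSemaphore(MAX_CONCURRENT_DB_OPS)

active = 0  # operations currently inside the guarded section
peak = 0    # highest concurrency observed
counter_lock = threading.Lock()


def run_db_operation():
    global active, peak
    with db_slots:  # at most n threads get past this line at once
        with counter_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stand-in for the real database call
        with counter_lock:
            active -= 1


threads = [threading.Thread(target=run_db_operation) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak <= MAX_CONCURRENT_DB_OPS)  # True
```

Setting `MAX_CONCURRENT_DB_OPS = 1` gives the fully serialized, single-connection behaviour described in the first part of the answer.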

One might want to use a singleton due to database server constraints, for example, a server might limit the number of connections.
My main conscious reason is that you know which connections can be managed, closed, etc.; it just makes things a bit more organised when you don't have unnecessary, redundant connections.

I don't think it's a simple answer. For instance, on ASP.NET the platform implements connection pooling by default, so it automatically adjusts a "pool" of connections and re-uses them, so you're not constantly creating and destroying expensive objects.
However, let's say you were writing a data collection application that monitored 200 separate input sources. Every time one of those inputs changed, you fire off a thread that records the event to the database. I would say that could be a bad design if there's a chance that even a fraction of those could fire off at the same time. Suddenly having 20 or 40 active database connections is inefficient. It might be better to queue the updates, and as long as there are updates left in the queue, a singleton connection picks them off the queue and executes them on the server. It's more efficient because you only have to negotiate the connection and authentication once. Once there's no activity for a while you could choose to close down the connection. This kind of behavior would be hard to implement without a central resource manager like a singleton.
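A Python sketch of that queue-plus-single-connection design. `QueuedWriter`, the `event` table, and the timeout values are illustrative assumptions, and `sqlite3` with a temporary file stands in for the real server; the 200 threads play the role of the 200 input sources, and none of them ever opens a connection.

```python
import os
import queue
import sqlite3
import tempfile
import threading


class QueuedWriter:
    """A single background thread owns the only connection and drains queued updates."""

    def __init__(self, path, idle_timeout=5.0):
        self.path = path
        self.idle_timeout = idle_timeout  # seconds without work before the connection is closed
        self.updates = queue.Queue()
        self.errors = []
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def submit(self, sql, params=()):
        # Called from any of the many monitoring threads; they never touch the database.
        self.updates.put((sql, params))

    def close(self):
        self.updates.put(None)  # sentinel: finish the queued work, then stop
        self.worker.join()

    def _run(self):
        conn = None
        while True:
            try:
                item = self.updates.get(timeout=self.idle_timeout)
            except queue.Empty:
                if conn is not None:  # idle for a while: give the connection back
                    conn.close()
                    conn = None
                continue
            if item is None:
                break
            if conn is None:  # connect (and authenticate) once per busy period
                conn = sqlite3.connect(self.path)
            sql, params = item
            try:
                with conn:  # commit each update; roll it back if it fails
                    conn.execute(sql, params)
            except sqlite3.Error as exc:
                self.errors.append((sql, exc))  # keep the writer alive on bad updates
        if conn is not None:
            conn.close()


path = os.path.join(tempfile.mkdtemp(), "events.db")
writer = QueuedWriter(path, idle_timeout=0.5)
writer.submit("CREATE TABLE event (source INTEGER, value TEXT)")
sources = [
    threading.Thread(target=writer.submit,
                     args=("INSERT INTO event VALUES (?, ?)", (i, "changed")))
    for i in range(200)
]
for s in sources:
    s.start()
for s in sources:
    s.join()
writer.close()
check = sqlite3.connect(path)
print(check.execute("SELECT COUNT(*) FROM event").fetchone())  # (200,)
check.close()
```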

"Only one active connection" is a very narrow statement used for illustration. It could just as well be a singleton managing a pool of connections. The point of a singleton for database connections is that you don't want every consumer making its own connection or set of connections.

I think you might want to be more specific about "using a singleton to control the db connection in your web app." Typically, a java.sql.Connection object is not thread-safe, but your javax.sql.DataSource may pool connections, so you should go through a single instance of it to share the pooling.

You are really looking for one connection per request, not one connection for the entire application. You can still control access to it through a singleton, though (by storing the connection in the HttpContext.Items collection).

It guarantees that each client using your site only gets one connection to the db.
You really do not want a new connection to be made every time a user performs an action that creates a DB query, not only for performance reasons (the connection handshake is costly) but also to decrease the load on the DB server.
DB connections are a precious commodity, and this technique helps minimize the amount used at any given time.

.NET CF mobile device application - best methodology to handle potential offline-ness?

I'm building a mobile application in VB.NET (Compact Framework), and I'm wondering what the best way is to approach potential offline interactions on the device. Basically, the devices have cellular and 802.11, but may still be offline (where there's poor reception, etc.). A driver will scan boxes as they leave his truck, and I want to update their new location: immediately if there's a network signal, or queued and handled later if the device is offline. It made me think, though, about how to handle offline-ness in general.
Do I cache as much data to the device as I can so that I use it if it's offline - Essentially, each device would have a copy of the (relevant) production data on it? Or is it better to disable certain functionality when it's offline, so as to avoid the headache of synchronization later? I know this is a pretty specific question that depends on my app, but I'm curious to see if others have taken this route.
Do I build the application itself to act as though it's always offline, submitting everything to a local queue of sorts that's owned by a local class (essentially abstracting away the online/offline thing), and then have the class submit things to the server as it can? What about data lookups - how can those be handled in a "Semi-live" fashion?
Or should I have the application attempt to submit requests to the server directly, in real time, and handle it if a request fails? I can see a potential problem of making the user wait for the timeout, but is this the most reliable way to do it?
I'm not looking for a specific solution, but really just stories of how developers accomplish this with the smoothest user experience possible, with a link to a how-to or heres-what-to-consider or something like that. Thanks for your pointers on this!

We can't give you a definitive answer because there is no "right" answer that fits all usage scenarios. For example if you're using SQL Server on the back end and SQL CE locally, you could always set up merge replication and have the data engine handle all of this for you. That's pretty clean. Using the offline application block might solve it. Using store and forward might be an option.
You could store locally and then roll your own synchronization over a direct connection, a web service, or a WCF service, used whenever a network is detected. You could use MSMQ for delivery.
What you have to think about is not what the "right" way is, but how your implementation will affect application usability. If you disable features due to lack of connectivity, is the app still usable? If you have stale data, is that a problem? Maybe some critical data needs to be transferred when you have GSM/GPRS (which typically isn't free) and more would be done when you have 802.11. Maybe you can run all day with lookup tables pulled down in the morning and upload only transactions, with the device tracking what changes it's made.
Basically it really depends on how it's used, the nature of the data, the importance of data transactions between fielded devices, the effect of data latency, and probably other factors I can't think of offhand.
So the first step is to determine how the app needs to be used, then determine the infrastructure and architecture to provide the connectivity and data access required.

I haven't used it myself, but have you looked into the "store and forward" capabilities of the CF? It may suit your needs. I believe it uses an Exchange mailbox as a message queue to send SOAP packets to and from the device.

The best way to approach this is to always work offline, then use message queues to handle sending changes to and from the device. When the driver marks something as delivered, for example, update the item as delivered in your local store and also place a message in an outgoing queue to tell the server it's been delivered. When the connection is up, send any queued items back to the server and get any messages that have been queued up from the server.
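A Python sketch of the outgoing half of that design (device to server). `OfflineFirstClient`, `fake_send`, and the message fields are hypothetical; in the asker's setup the transport might be a web service, MSMQ, or store and forward, as the other answers suggest. The incoming direction (pulling messages queued on the server) would follow the same pattern in reverse.

```python
from collections import deque


class OfflineFirstClient:
    """Always writes to the local store first; queued messages go out when a link is up."""

    def __init__(self, send):
        self.local_store = {}  # item id -> status: the device's own copy of the data
        self.outbox = deque()  # changes the server has not accepted yet
        self.send = send       # hypothetical transport; returns True if the server accepted the message

    def mark_delivered(self, item_id):
        self.local_store[item_id] = "delivered"                       # 1. update locally, no waiting
        self.outbox.append({"item": item_id, "status": "delivered"})  # 2. queue it for the server

    def flush(self):
        # Call whenever connectivity is detected. Messages leave in order; on the
        # first failure, stop and keep the rest queued for the next attempt.
        while self.outbox:
            if not self.send(self.outbox[0]):
                return False
            self.outbox.popleft()
        return True


server_log = []
link = {"up": False}


def fake_send(message):
    # Stand-in for the real web service or message queue call.
    if link["up"]:
        server_log.append(message)
    return link["up"]


client = OfflineFirstClient(fake_send)
client.mark_delivered("box-17")  # works with no signal
print(client.flush())            # False: still offline, message stays queued
link["up"] = True
print(client.flush())            # True: queued message delivered
print(len(client.outbox))        # 0
```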
