I am unable to come up with fundamental differences between the two. I could only come up with examples:
BitTorrent is a distributed P2P system, but Tor is decentralized.
Web services are decentralized, but I cannot think of a distributed web service. (Maybe Diaspora?)
Distributed systems can be of two types: centralized and decentralized. Centralized distributed systems include client-server systems and multi-tier applications. Decentralized distributed systems contain no central server; peer-to-peer communication is an example.
If architecturally there is a single point of failure within a system, it is not a distributed system. A decentralized system could be a hierarchical structure where there is still a notion of the root of the hierarchy, which is a point of failure. A traffic roundabout is a distributed version of a centralized system in which traffic lights (in the traditional sense) are used.
Related
I was reading about the difference between the tiers of architecture (2 and 3). I learned that the latter is safer than the former; a website said the 2-tier architecture poses security risks. I am unable to understand what security risks the 2-tier architecture could pose.
I took the example of ticketing software that used to have a 2-tier design. Now, if multiple clients are sending queries, can one client access another client's information? Can the responses get mixed up, sending the wrong information to each client?
I am unable to think of security issues which could exist. It would be great if anybody could drop in an answer.
In a two-tier system, clients access a database directly. An improperly secured database could grant too much access to a client. Securing a database for public access takes quite a bit of work: databases are general-purpose execution systems and are generally not designed with fine-grained security controls (exceptions do exist).
Three-tier systems generally do not expose general-purpose execution systems to clients. They expose specific methods, and securing the middle tier is generally much more straightforward.
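To make that concrete, here is a minimal sketch of a middle-tier facade for the ticketing example (the table, column, and class names are made up for illustration). In a two-tier design the client holds a database connection itself and can issue any SQL it likes; here it can only invoke this one narrowly scoped, parameterized operation:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Middle-tier facade: clients can only call this specific method; they never
// see a database connection, so they cannot run arbitrary SQL against it.
public class TicketService {

    private final String jdbcUrl;  // hypothetical connection string

    public TicketService(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;
    }

    public int seatsLeft(int eventId) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT seats_left FROM events WHERE id = ?")) {
            ps.setInt(1, eventId);              // parameterized: no SQL injection
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getInt(1) : 0;
            }
        }
    }
}
```

The middle tier also answers the cross-client worry from the question: each request runs in its own connection and statement, so responses cannot get mixed up between clients.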
More specifically, what use cases does Hazelcast Jet solve that Flink does not solve (equally well), and vice versa?
NOTE: I belong to Hazelcast Jet's core engineering team.
I'd say the main advantage of Hazelcast Jet isn't in offering a brand-new computing model, but in bringing the same level of convenience that Hazelcast is known for to the realm of DAG-based distributed computing.
If you currently have a Java application running in a cluster, adding Jet will be a snap: add the Maven dependency and write one line of code to start a Jet instance on the local member. The instances will self-discover to form their own cluster, and you can now submit your job to it.
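As a minimal sketch (assuming the hazelcast-jet Maven artifact is already on the classpath), that "one line of code" looks like this:

```java
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;

public class StartJetMember {
    public static void main(String[] args) {
        // Starts a Jet member inside this JVM; members on the same network
        // discover each other and form a cluster automatically.
        JetInstance jet = Jet.newJetInstance();
    }
}
```

Connecting from outside the cluster is analogous, via Jet.newJetClient().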
If you want a dedicated distributed computing cluster, you can download the distribution ZIP to the cluster machines. Jet has native support for the most popular cloud environments, allowing the nodes you start to self-discover. You can then connect to the cluster using a Jet client.
Needless to say, Jet makes it very convenient to use a Hazelcast IMap or IList as a data source. A Jet cluster can host Hazelcast structures directly; then you benefit from data locality and get the data with no network traffic. On the other hand, the choice of data source is completely unconstrained, and there is a public API dedicated to implementing fast, arbitrarily partitioned, custom data sources.
Jet addresses the concerns of infinite stream processing, like aggregating over time-based windows, dealing with reordered events, and resilience to changes in the cluster topology (e.g., failure of individual Jet nodes), while maintaining the exactly-once processing guarantee.
Jet's main programming paradigm is the Pipeline API, which is quite similar to the java.util.stream API but adapted to the specifics of distributed computing (lambda serialization and other concerns).
The Pipeline API builds upon a lower-level DAG-based model that is also exposed as a public API.
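For illustration, here is a rough word-count sketch against the Pipeline API (written for the Jet 4.x API surface; the map names and data are made up). Note how close it reads to a java.util.stream pipeline, except that the stages run distributed across the cluster:

```java
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.Traversers;
import com.hazelcast.jet.aggregate.AggregateOperations;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class WordCount {
    public static void main(String[] args) {
        JetInstance jet = Jet.newJetInstance();
        try {
            // A Hazelcast IMap hosted by the Jet cluster serves as the source.
            jet.getMap("lines").put(0L, "hello world hello jet");

            Pipeline p = Pipeline.create();
            p.readFrom(Sources.<Long, String>map("lines"))
             .flatMap(e -> Traversers.traverseArray(
                     e.getValue().toLowerCase().split("\\W+")))
             .filter(word -> !word.isEmpty())
             .groupingKey(word -> word)
             .aggregate(AggregateOperations.counting())
             .writeTo(Sinks.map("counts"));   // {hello=2, world=1, jet=1}

            jet.newJob(p).join();
        } finally {
            Jet.shutdownAll();
        }
    }
}
```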
In my opinion, Flink seems to offer some very useful streaming features which are not yet offered by Hazelcast Jet:
Different flexible window operators, which can also handle out-of-order and late items (see the sketch below).
Fault tolerance on the cluster and delivery guarantees.
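To illustrate those windowing features, here is a rough sketch against the Flink 1.11+ DataStream API (the data and names are made up): a bounded-out-of-orderness watermark handles reordered events, and allowedLateness keeps windows open for stragglers.

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class LateEventsDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // (key, eventTimeMillis, value); in a real job this would come from Kafka etc.
        DataStream<Tuple3<String, Long, Integer>> events = env.fromElements(
                Tuple3.of("sensor-1", 1_000L, 5),
                Tuple3.of("sensor-1", 61_000L, 7),
                Tuple3.of("sensor-1", 2_000L, 3));   // arrives out of order

        events
            // tolerate events up to 5 seconds out of order
            .assignTimestampsAndWatermarks(
                WatermarkStrategy.<Tuple3<String, Long, Integer>>forBoundedOutOfOrderness(
                        Duration.ofSeconds(5))
                    .withTimestampAssigner((e, ts) -> e.f1))
            .keyBy(e -> e.f0)
            .window(TumblingEventTimeWindows.of(Time.minutes(1)))
            // re-emit updated results for stragglers up to 30 seconds late
            .allowedLateness(Time.seconds(30))
            .sum(2)
            .print();

        env.execute("late-events-demo");
    }
}
```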
Besides this, Flink also seems to be more stable and better known at the moment.
For example, you can use it as a runtime for Apache Beam and then migrate easily between Google Cloud Dataflow in the cloud and your own deployment.
So I would currently use Flink.
Best
Multi-tier and/or distributed apps: do they have the same meaning?
When we talk about layers in these apps, do we mean physical layers (database, browser, web server, ...) or logical layers (data access layer, business layer, ...)?
Maybe these two sentences do convey intuitively the distinction between distributed and multi-tier:
Distributed: You replicate the processing amongst nodes
Multi-tier: You split the processing amongst tiers
In one case, the same processing is replicated over several nodes. In the other case, each tier has a distinct responsibility and the processing running on each tier differ.
Both notions are not exclusive: you can have non-distributed multi-tier apps (if there is no form of redundancy/replication), distributed apps which are not multi-tier, but also multi-tier apps which are distributed (if they have some form of redundancy).
There would be a lot more to say about the distinction, but the difference (to me) is essentially there.
ewernli has the mostly correct answer here. The only missing piece from the original question concerns physical and logical layers.
From a distributed and/or multi-tier perspective, whether the layers are physically separate or only logically so is immaterial. You can create multi-tier and even distributed applications which reside entirely on the same machine instance.
That said, it is more common to separate the tiers into different machines built specifically for that type of load. For example, a web server and a database server. Or even a web server, several web services machines, and one or more database servers.
All of these features, distributed, multi-tier, and/or load balanced with logical and/or physical layers are just features of the application design.
Further, in today's world of virtual machines, it's entirely possible (and even likely) to set up a multi-tier, distributed, and load-balanced application within the confines of a single real machine. However, I'd never recommend that course of action, because the point of load balancing and of distributing services is usually to increase availability or throughput.
Multi-tier means that your application is spread over multiple machines with different tasks (database, web application, ...). Distributed means that your application runs on multiple machines at the same time; for example, your website could be hosted on 3 different servers.
For multi-tier applications we generally speak about physical layers. But within every application you can/should also have different logical layers.
I'm building a web application that from day one will be on the limits of what a single server can handle. So I'm considering to adopt a distributed architecture with several identical nodes. The goal is to provide scalability (add servers to accommodate more users) and fault tolerance. The nodes need to share some state between them, therefore some communication between them is required. I believe I have the following alternatives to implement this communication in Java:
Implement it using sockets and a custom protocol.
Use RMI
Use web services (each node can send and receive/parse HTTP requests).
Use JMS
Use another high-level framework like Terracotta or Hazelcast (see the sketch below).
I would like to know how these technologies compare to each other:
When the number of nodes increases
When the amount of communication between the nodes increases (thousands of messages per second and/or messages up to 100 KB, etc.)
On a practical level (e.g., ease of implementation, available documentation, licensing issues, etc.)
I'm also interested to know what technologies are people using in real production projects (as opposed to experimental or academic ones).
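For what it's worth, here is a minimal sketch of what option 5 looks like with Hazelcast (the map and key names are invented): each node starts an embedded instance, the instances form a cluster, and a distributed map gives them shared state without any custom wire protocol.

```java
import java.util.Map;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class SharedState {
    public static void main(String[] args) {
        // Each node runs this; instances discover each other over the
        // network (multicast by default) and form a cluster.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // A distributed map, partitioned and replicated across the cluster:
        // any node can read what another node wrote.
        Map<String, String> sessions = hz.getMap("sessions");
        sessions.put("user-42", "owned by " + hz.getCluster().getLocalMember().getUuid());
    }
}
```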
Don't forget Jini.
It gives you automatic service discovery, service leasing, and downloadable proxies so that the actual client/server communication protocol is up to you and not enforced by the framework (e.g. you can choose HTTP/RMI/whatever).
The framework is built around acknowledgement of the Eight Fallacies of Distributed Computing and recovery-oriented computing; i.e., you will have network problems, and the architecture is built to help you recover and maintain a service.
If you also use JavaSpaces, it's trivial to implement workflows and producer-consumer architectures. Producers write into the JavaSpace, and one or more consumers take that work from the space (under a transaction) and process it. You scale simply by adding more consumers.
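A minimal sketch of that producer-consumer pattern (assuming you have already obtained a JavaSpace reference via Jini lookup, which is omitted here; the Task class is made up):

```java
import net.jini.core.entry.Entry;
import net.jini.core.lease.Lease;
import net.jini.space.JavaSpace;

public class SpaceWorkQueue {

    // Entries are plain objects with public fields; null fields act as wildcards.
    public static class Task implements Entry {
        public String payload;
        public Task() { }                       // public no-arg constructor is required
        public Task(String payload) { this.payload = payload; }
    }

    // Producer: drop work into the space for any consumer to pick up.
    static void produce(JavaSpace space, String payload) throws Exception {
        space.write(new Task(payload), null, Lease.FOREVER);
    }

    // Consumer: block until a matching entry appears. take() removes it
    // atomically, so two consumers never receive the same task.
    static String consume(JavaSpace space) throws Exception {
        Task taken = (Task) space.take(new Task(), null, Long.MAX_VALUE);
        return taken.payload;
    }
}
```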
I'm facing the following challenge:
I have a bunch of databases in different geographical locations where the network may fail a lot (I'm using a cellular network). I need to keep all the databases synchronized, but it does not need to happen in real time. I'm using Java, but I have the freedom to choose any free database.
How can I achieve this?
This is a problem with a well-established corpus of research, of which people are apparently unaware. I suggest not reinventing a poor, defective wheel unless absolutely necessary (for example, unless your requirements are so unusual that they admit a trivial solution).
Some keywords: replication, mobile DBMSs, distributed disconnected DBMSs.
These research papers are also relevant, as a sample of the field:
Distributed Disconnected Databases
The Dangers of Replication and a Solution
Improving Data Consistency in Mobile Computing Using Isolation-Only Transactions
Dealing with Server Corruption in Weakly Consistent, Replicated Data Systems
Rumor: Mobile Data Access Through Optimistic Peer-to-Peer Replication
The Case for Non-transparent Replication: Examples from Bayou
Bayou: Replicated Database Services for World-wide Applications
Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System
Two-Level Client Caching and Disconnected Operation of Notebook Computers in Distributed Systems
Replicated Document Management in a Group Communication System
... and so on.
I am not aware of any database that will give you this functionality out of the box; there is a lot of complexity here due to the need for eventual consistency and conflict resolution (e.g., what happens if the network gets split into two halves, and you update something to the value 123 while I update it to 321 on the other half, and then the networks reconnect?).
You may have to roll your own.
For some ideas on how to do this, check out the design of Yahoo's PNUTS system: http://research.yahoo.com/node/2304 and Amazon's Dynamo: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
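To get a feel for the 123-versus-321 problem above, here is a toy sketch (not production code) of version vectors, the mechanism the Dynamo paper uses to detect that two updates were concurrent and must be reconciled:

```java
import java.util.HashMap;
import java.util.Map;

// Toy version vector: one counter per replica. If neither vector dominates
// the other, the two updates were concurrent and the application must
// reconcile them (the 123-vs-321 case above).
public class VersionVector {
    private final Map<String, Long> counters = new HashMap<>();

    public void increment(String replicaId) {
        counters.merge(replicaId, 1L, Long::sum);
    }

    // true if this vector has seen everything the other vector has seen
    public boolean dominates(VersionVector other) {
        for (Map.Entry<String, Long> e : other.counters.entrySet()) {
            if (counters.getOrDefault(e.getKey(), 0L) < e.getValue()) {
                return false;
            }
        }
        return true;
    }

    public boolean concurrentWith(VersionVector other) {
        return !this.dominates(other) && !other.dominates(this);
    }

    public static void main(String[] args) {
        VersionVector a = new VersionVector();  // replica A writes 123
        VersionVector b = new VersionVector();  // replica B writes 321
        a.increment("A");
        b.increment("B");
        System.out.println(a.concurrentWith(b)); // true -> conflict to resolve
    }
}
```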
Check out SymmetricDS. SymmetricDS is web-enabled, database independent, data synchronization/replication software. It uses web and database technologies to replicate tables between relational databases in near real time. The software was designed to scale for a large number of databases, work across low-bandwidth connections, and withstand periods of network outage.
I don't know your requirements or your apps, but this isn't a quick-answer type of question. I'm very interested to see what others have to say. However, I have a suggestion that may or may not work for you, depending on your requirements and situation. In particular, this will not help if your users need to use the app even when the network is unavailable (offline access).
Keeping a bunch of small databases synchronized is a fairly complex task to do correctly. Is there any possibility of just having one centralized database, and either having the client applications connect directly to it or (my preferred solution) write some web services to handle accessing/updating data rather than having a bunch of client databases?
I realize this limits offline access, but there are various caching strategies you can use. (Which, of course, leads you back to your original question.)