Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I'm a little bit confused by this concept.
I hear the term "distributed system" a lot, but I'm not really sure whether what I work on counts as one.
Basically, we have a master server (a very big one) as the front-line production server.
Then, in order to reduce the load on the master server (so it isn't crushed by a ton of tasks), we put all kinds of jobs onto different small servers.
These small servers communicate with the master server, pulling and pushing processed data between each other.
But whenever I hear "distributed system" I get a bit intimidated; it feels so big to me, and I don't really know whether my job is related to it or not.
From your short description it sounds like you have a bona fide distributed system.
From our good friend Wikipedia:
A distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages.[1] The components interact with each other in order to achieve a common goal. Three significant characteristics of distributed systems are: concurrency of components, lack of a global clock, and independent failure of components.[1] Examples of distributed systems vary from SOA-based systems to massively multiplayer online games to peer-to-peer applications.
I think you fit the description because:
1) A common goal is being pursued by different servers.
2) The servers are communicating with each other by passing messages.
That second one is pretty important for multiple reasons. Besides the benefits you get from having these servers communicating with each other, it also means that as an engineer you are tackling the traditional problems that people in the distributed systems field handle. It exposes you to these problems and while you might not feel like you are in the field or you might not use the same jargon, you will be presented with the same problems and solutions.
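To make the "passing messages" part concrete, here is a minimal sketch of the pattern you describe: a master hands jobs to small workers, and the workers push processed results back. This is only an illustration under assumed names (process_job, the queues and the job format are all made up); in your setup the queues would be network calls between the master server and the small servers rather than in-process queues.

# Minimal sketch of master/worker message passing (illustration only).
# In a real deployment the queues would be replaced by network messages
# (HTTP calls, a message broker, etc.) between separate servers.
from multiprocessing import Process, Queue

def process_job(job):
    # Placeholder for whatever work a small server actually does.
    return {"job_id": job["job_id"], "result": job["payload"] * 2}

def worker(job_queue, result_queue):
    while True:
        job = job_queue.get()                 # "pull" a job from the master
        if job is None:                       # sentinel: no more work
            break
        result_queue.put(process_job(job))    # "push" processed data back

if __name__ == "__main__":
    job_queue, result_queue = Queue(), Queue()
    workers = [Process(target=worker, args=(job_queue, result_queue)) for _ in range(3)]
    for w in workers:
        w.start()
    for i in range(10):                       # master distributes jobs
        job_queue.put({"job_id": i, "payload": i})
    for _ in workers:                         # one sentinel per worker
        job_queue.put(None)
    results = [result_queue.get() for _ in range(10)]
    for w in workers:
        w.join()
    print(results)

Independent failure of any of those workers, and the coordination needed to handle it, is exactly what turns the real networked version of this into a distributed systems problem.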
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 months ago.
I'm developing a web backend with two modules. One handles a relatively small amount of data that doesn't change often. The other handles real-time data that's constantly being dumped into the database and never gets changed or deleted. I'm not sure whether to have separate databases for each module or just one.
The data between the modules is interconnected quite a bit, so it's a lot more convenient to have it in a single database.
But if anything fails, I need the first database to be available for reads as soon as possible; the second one can wait.
Also, I'm not sure how much of a performance impact the constantly growing large database would have on the first one.
I'd like to make dumps of the data available to the public, and I don't want users downloading gigabytes that they don't need.
And if I decide to use a single one, how easy is it to separate them later? I use Postgres, btw.
Sounds like you have a website with its content being the first DB, and some kind of analytics being the second DB.
It makes sense to separate those physically (as in, on different servers), especially if one of them is required to be available as much as possible. Separating mission-critical parts from something less important is good design. Also, a smaller DB means shorter recovery times from a backup, should the need arise.
For the data that is interconnected, if you need remote lookup from one DB into another, Foreign Data Wrappers may help.
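For example, a rough sketch of wiring that up with postgres_fdw (all server, database, table and credential names here are hypothetical; psycopg2 is just used to run the DDL from Python):

# Sketch: expose a table from a second Postgres database ("analytics_db")
# inside the first one via postgres_fdw. All names below are hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=main_db user=app")   # the "first" database
conn.autocommit = True
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS postgres_fdw")
cur.execute("""
    CREATE SERVER IF NOT EXISTS analytics_srv
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'analytics-host', dbname 'analytics_db')
""")
cur.execute("""
    CREATE USER MAPPING IF NOT EXISTS FOR app SERVER analytics_srv
        OPTIONS (user 'app', password 'secret')
""")
cur.execute("""
    CREATE FOREIGN TABLE IF NOT EXISTS measurements (
        id        bigint,
        device_id bigint,
        recorded  timestamptz,
        value     double precision
    ) SERVER analytics_srv
      OPTIONS (schema_name 'public', table_name 'measurements')
""")

# The foreign table can now be queried (and joined with local tables):
cur.execute("SELECT count(*) FROM measurements")
print(cur.fetchone())

Keep in mind that every query against a foreign table goes over the network to the other server, so it is best reserved for occasional lookups rather than hot-path joins.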
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am preparing for the system design interview, and since I have little experience with this topic, I bought the "Grokking the system design interview" course from educative.io, which was recommended by several websites.
However, having read it, I think I did not manage to understand several things, so if someone could answer my questions, that would be helpful.
Since I have no experience with NoSQL, I find it difficult to choose the proper DB system. Several times the course simply does not give any reasoning for why it chose one DB over another. For example, in the chapter "Designing Youtube or Netflix" the editors chose MySQL for the DB role with no explanation. In the same chapter we have the following non-functional requirements:
"The system should be highly available. Consistency can take a hit (in
the interest of availability); if a user doesn’t see a video for a
while, it should be fine."
Following the above hint, taking into account the size of the system, and applying the material in the "CAP theorem" chapter, it seems to me that Cassandra or CouchDB would be a better choice. What am I missing here?
The same question goes for "Designing Facebook’s Newsfeed".
Is the CAP theorem still applicable?
What I mean is: according to the "CAP theorem" chapter, HBase is good at consistency and partition tolerance, but according to the HBase documentation it also supports high availability since version 2.x. So it seems to me that it is a one-size-fits-all / universal solution for DB storage, which goes against the CAP theorem, unless they sacrificed something for HA. What am I missing here?
The numbers given throughout the course for how much RAM/storage/bandwidth a computer can handle are somewhat inconsistent, and I guess they are outdated. What are the current numbers for (a) regular computers and (b) modern servers?
Almost every chapter has a part called "Capacity Estimation and Constraints", but what is calculated there changes from chapter to chapter. Sometimes only storage is calculated, often bandwidth too, sometimes QPS is added, and sometimes there are task-specific metrics. How do I know what I should calculate for a specific task?
Thanks in advance!
Each database is different and fulfills different requirements. I recommend you read the Dynamo paper, familiarize yourself with the terminology used in it (two-phase locking, leader/follower, multi-leader, async/sync replication, quorums), and know what guarantees the different databases provide. Now to the questions:
MySQL can be configured to prioritize Availability at the cost of Consistency with its asynchronous replication model (the leader doesn't wait for acknowledgement from its followers before committing a write; if a leader crashes before the data gets propagated to the followers, the data is lost), so it can be one of the suitable solutions here.
According to the HBase documentation, HBase guarantees strong consistency, even at the cost of availability.
The promise of high availability is for reads, not for writes i.e. for reading stale data while the rest of the system recovers from failure and can accept additional writes.
because of this single homing of the reads to a single location, if the server becomes unavailable, the regions of the table that were hosted in the region server become unavailable for some time.
Since all writes still have to go through the primary region, the writes are not highly-available (meaning they might block for some time if the region becomes unavailable).
The numbers used are estimates by the candidate i.e. you decide what are the specs of a single hypothetical server, and how many servers you would need in order to scale and accommodate the storage/throughput requirement.
You don't know in advance (you can make a guess based on the requirements, e.g. whether it's a data storage system, a streaming service, etc., but I still wouldn't recommend relying on that). Instead, you should ask the interviewer what area they are interested in and make estimates for it. The interview, especially the system design part, is a discussion; don't follow a template to the letter. Recognize the different areas of the system you can tackle, and approach them based on the interviewer's interest.
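To make that concrete, here is a tiny back-of-the-envelope sketch of the kind of math a "Capacity Estimation and Constraints" section expects, for a hypothetical video-streaming service. Every input constant below is an assumption you would state out loud (and adjust with the interviewer), not a number from the course.

# Back-of-the-envelope capacity estimation for a hypothetical video service.
# Every constant is an assumption to be stated explicitly in the interview.
total_users          = 1_000_000_000   # assumed registered users
daily_active_ratio   = 0.30            # assumed fraction active per day
views_per_user_day   = 5               # assumed average views per active user
avg_video_minutes    = 10              # assumed average video length
mb_per_minute        = 5               # assumed storage per minute of video
upload_hours_per_min = 500             # assumed hours of video uploaded per minute

SECONDS_PER_DAY = 24 * 60 * 60

# Traffic (read path)
daily_views = total_users * daily_active_ratio * views_per_user_day
read_qps = daily_views / SECONDS_PER_DAY

# Storage (write path): hours/day -> minutes/day -> MB/day -> TB/day
storage_per_day_tb = upload_hours_per_min * 60 * 24 * 60 * mb_per_minute / 1_000_000

# Bandwidth (egress for streaming): MB/day -> Mb/day -> Gb/day -> Gbps
egress_gbps = daily_views * avg_video_minutes * mb_per_minute * 8 / 1000 / SECONDS_PER_DAY

print(f"read QPS        ~ {read_qps:,.0f}")               # ~17,000 requests/second
print(f"new storage/day ~ {storage_per_day_tb:,.0f} TB")  # ~216 TB/day
print(f"egress          ~ {egress_gbps:,.0f} Gbps")       # ~7,000 Gbps

Which of these you actually compute (storage, bandwidth, QPS, or something task-specific like concurrent connections for a chat system) is exactly the part you should agree on with the interviewer first.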
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 3 years ago.
We have a web service that needs a somewhat POSIX-compatible shared filesystem for the application servers (multiple redundant systems running in parallel behind redundant load balancers). We're currently running GlusterFS as the shared filesystem for the application servers, but I'm not happy with the performance of the system. Compared to the actual raw performance of the storage servers running GlusterFS, it starts to look more sensible to run DRBD and a single NFS server, with all the other GlusterFS servers (currently 3 servers) waiting in a hot-standby role.
Our workload is highly read-oriented and usually deals with small files, and I'd be happy to use an "eventually consistent" system as long as a client can request a sync for a single file when needed (that is, the client is prepared to wait until the file has been successfully stored in the backend storage). I'd even accept a system where such a "sync" requires querying the state of the file via some means other than POSIX fdatasync(). File metadata such as modification times is not important; only the filename and the contents matter.
I'm currently aware of the following possible candidates and the problems each one has:
GlusterFS: overall performance is pretty poor in practice, and performance goes down as new servers/bricks are added.
Ceph: highly complex to configure/administrate; as far as I know, POSIX compatibility sacrifices a lot of performance.
MooseFS: partially obfuscated open source (huge dumps of internally written code published only seldom, with the patch history intentionally lost); the documentation leaves a lot to be desired.
SeaweedFS: pretty simple design and supposedly high performance, but the future of the project is unclear because pretty much all the code is written and maintained by Chris Lu - what happens if he no longer writes any code? It is also unclear whether the "Filer" component can run without a single point of failure.
I know that CAP theorem prevents ever having truly consistent and always available system. Is there any good system for distributed file system where writes must be durable, but read performance is really good and the system has no single point of failure?
I am Chris Lu, working on SeaweedFS. There are plans to commercialize it (by adding more advanced features).
The Filer does not have a single point of failure; you can have multiple Filer instances. The Filer store can be any key-value store. If you need no SPOF, you can use Cassandra, Redis Cluster, CockroachDB, TiDB, or etcd. Or you can add your own key-value store option, which is pretty easy.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 1 year ago.
Our company has five million users. We store users' code files. Users can edit and add their files, much like in a web IDE, and the web IDE lists each user's files. We use PHP functions such as readdir, file_get_contents and file_put_contents to implement these operations. We use MooseFS, but when we read files from the program, the loading speed is particularly slow.
So we need to replace the file system. I hope someone can give me some advice: we have a huge number of small files, so which distributed file system should we use?
Five million entries is small for a relational database. I'd wonder why you feel the need to store these in a file system.
Does every user require that all files be loaded on startup? If yes, I'd wonder about the design of the system. That operation is O(N) no matter how you design it.
If you put those five million small files into a relational or NoSQL database, and then let each user connect to it and query for the particular ones they want, then you eliminate the need to load them repeatedly on startup. Problem solved.
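A minimal sketch of that approach, using SQLite from the Python standard library for brevity (the same schema and queries would work on Postgres or a NoSQL store; the table and column names are made up for the example):

# Sketch: store many small "files" as rows and fetch only what a user asks for.
# SQLite is used here for brevity; a server database works the same way.
import sqlite3

conn = sqlite3.connect("files.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS user_files (
        user_id  INTEGER NOT NULL,
        path     TEXT    NOT NULL,
        content  BLOB    NOT NULL,
        PRIMARY KEY (user_id, path)
    )
""")

def save_file(user_id, path, content):
    # Rough equivalent of file_put_contents(): upsert one small file.
    conn.execute(
        "INSERT OR REPLACE INTO user_files (user_id, path, content) VALUES (?, ?, ?)",
        (user_id, path, content),
    )
    conn.commit()

def load_file(user_id, path):
    # Rough equivalent of file_get_contents(): fetch a single file on demand.
    row = conn.execute(
        "SELECT content FROM user_files WHERE user_id = ? AND path = ?",
        (user_id, path),
    ).fetchone()
    return row[0] if row else None

def list_files(user_id):
    # Rough equivalent of readdir(): list a user's files without loading contents.
    rows = conn.execute(
        "SELECT path FROM user_files WHERE user_id = ?", (user_id,)
    ).fetchall()
    return [r[0] for r in rows]

save_file(42, "src/main.py", b"print('hello')")
print(list_files(42))
print(load_file(42, "src/main.py"))

With an index on (user_id, path) - the primary key here - listing and loading stay fast no matter how many other users' files are in the table.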
In any distributed filesystem, one of the most crucial aspects for operations on small files is network latency - it should be as small as possible (like 0.1 ms) between the distributed filesystem components. The best way to achieve this is to use a reliable switch and connect all machines to the same switch.
Also, the great thing about distributed filesystems (especially MooseFS) is scalability - the more nodes you have (and the more your calculations are distributed, i.e. done simultaneously on more than one mount), the faster the cluster is.
If you use MooseFS, please check out MooseFS 3.0, because operations on small files have been improved since version 3.0. This is an easy step for now, because you don't have to make a "revolution" (before the upgrade, remember to back up /var/lib/mfs on the Master Server - i.e. the metadata). MooseFS can handle small files well, so maybe there's a problem in the configuration?
Additionally, still considering small-file operations, one of the most important things in MooseFS is to have a high CPU clock (e.g. 3.7 GHz) with a small number of CPU cores, and energy-saving options disabled in the BIOS, for the Master Server (because the Master Server is a single-threaded process). For Chunkservers and Clients the situation is different - they are multi-threaded, so you'll get better results with multicore CPUs.
Additionally, as stated in MooseFS Best practices in paragraph 4. "Virtual Machines and MooseFS":
[...] we do not recommend running MooseFS components (especially Master Server(s)) on Virtual Machines.
So if you run MFS on VMs, you may in fact get poor results.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 7 years ago.
I'm trying to optimize the backend of an information system for high availability, which involves splitting off the part needed for time-critical client requests (front office) from the rest (back office).
Front office will have redundant application servers with load balancing for maximum performance and will use a database with pre-computed data. Back office will periodically prepare data for the front office based on client statistics and some external data.
A part of the data schema will be shared between both back and front office, but not the whole databases, only parts of some tables. The data will not need to correspond all the time; it will be synchronized between the two databases periodically. Continuous synchronization is also viable, but there is no real-time consistency requirement, and it seems that batch-style synchronization would be better in terms of control, debugging and backup possibilities. I expect no need for conflict resolution because the data will mostly grow and change only on one side.
The solution should allow defining corresponding tables and columns, and then insert/update new/changed rows. It should ideally use the data model defined in Groovy classes (probably through annotations?), as both applications run on Grails. The synchronization may use the existing Grails web applications or run externally, maybe even on the database server alone (PostgreSQL).
There are systems for replicating whole mirrored databases, but I wasn't able to find any solution suiting my needs. Do you know of any existing framework that could help with that, or is making my own the only possibility?
I ended up using Londiste from SkyTools. The project page on the pgFoundry site lists quite old binaries (and is currently down), so you'd better build it from source.
It's one-directional (master-slave) only, so one has to set up two synchronization instances for bidirectional sync. Note that each instance consists of two Londiste binaries (master and slave worker) and a ticker daemon that pushes the changes.
To reduce synchronization traffic, you can extend the polling period (1 second by default) in the configuration file, or even turn it off completely by stopping the ticker and then triggering the sync manually by running the SQL function pgq.ticker on the master.
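For reference, a minimal sketch of that manual trigger (the connection details are hypothetical; pgq.ticker is the PgQ function mentioned above):

# Sketch: with the ticker daemon stopped, force a PgQ tick on the master
# so Londiste picks up pending changes. Connection details are hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=master_db user=londiste")
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("SELECT pgq.ticker()")   # ask PgQ to create ticks now
    print(cur.fetchone())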
I solved the issue of partial column replication by writing a simple custom handler (a londiste.handler.TableHandler subclass) with the column mapping configured in the database. The mapping configuration is not model-driven (yet) as I originally planned, but I only need to replicate common columns, so this solution is sufficient for now.