I'm struggling to find a solution for the following issue:
Let's say I have a 'server' process in the kernel that centralizes readiness reports from all 'client' processes that have signed up with it. This 'server' process will notify the other CPUs in the system when all 'client' processes have finished booting and are ready.
A 'client' in my example is any process in my system that wishes to register itself as one the 'server' must wait for until it has finished booting.
My problem is that the registration above must be established at build time, because otherwise I am vulnerable to races such as the following:
Let's say my 'server' process has finished its initial boot and is ready, and it was the first process to boot in the system. In that case, if another CPU queries it, it will respond that all 'clients' are ready (even though none has registered yet). By the time the other 'clients' boot and register with it, it will be too late.
I want to build a generic solution, so that once I have finished building my environment, the 'server' process will 'know' how many 'clients' should sign up during system boot.
Any ideas here?
Thank you all
Here is what I have understood:
you want to build a service that will report whether other clients are up or not
you want the list of clients to be dynamic - i.e. a client can register or unregister at will
you want the list of clients to be persistent - the service should know the current list of clients immediately after each boot
A common way to meet that kind of requirement is to use a persistent database where each client can register (add one line) or unregister (delete its own line). The service then only has to read the database at boot time or on each request.
You can then decide:
whether you want to use a simple file, a lightweight database (SQLite) or a full database (PostgreSQL, MariaDB, ...)
whether you want to read the database on each and every query or have the server cache the current state
in case of caching, whether you can accept slightly stale responses and simply refresh the state when it is older than n seconds, or whether you need immediate synchronization (the database is read at boot, but registrations go through the service, which writes the database back to persistent storage) - the latter is more accurate, but registration is only possible while the service is up
Depending on your actual requirements, you can then devise more clever solutions, but the above should help you get started - see the sketch below.
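For illustration, a minimal sketch of that registration table using SQLite (the table layout, file path and function names are invented for the example, not an existing API): each client inserts or deletes its own row, and the service reads the same table at boot or on each query.

    import sqlite3
    from contextlib import closing

    DB_PATH = "clients.db"  # illustrative path; in practice this lives on persistent storage

    def _connect() -> sqlite3.Connection:
        conn = sqlite3.connect(DB_PATH)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS clients ("
            " name  TEXT PRIMARY KEY,"
            " ready INTEGER NOT NULL DEFAULT 0)"
        )
        return conn

    def register(name: str) -> None:
        """A client adds (or keeps) its own line when it signs up."""
        with closing(_connect()) as conn, conn:
            conn.execute("INSERT OR IGNORE INTO clients (name) VALUES (?)", (name,))

    def unregister(name: str) -> None:
        """A client deletes its own line when it no longer wants to be waited for."""
        with closing(_connect()) as conn, conn:
            conn.execute("DELETE FROM clients WHERE name = ?", (name,))

    def mark_ready(name: str) -> None:
        """A client reports that it has finished booting."""
        with closing(_connect()) as conn, conn:
            conn.execute("UPDATE clients SET ready = 1 WHERE name = ?", (name,))

    def all_clients_ready() -> bool:
        """What the service answers when it is queried after boot."""
        with closing(_connect()) as conn:
            (pending,) = conn.execute(
                "SELECT COUNT(*) FROM clients WHERE ready = 0"
            ).fetchone()
            return pending == 0

Because the table survives reboots, the service knows immediately after boot which clients it still has to wait for, so a query that arrives before every registered client has reported ready correctly gets a 'not ready yet' answer.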
Let's say we are running some services in a Kubernetes cluster and one of them requires a PostgreSQL instance, expected to persist data reliably. Should the DB live in the cluster or be configured separately?
Imagine that the DB is deployed in the cluster. This probably means one of the following:
We need a process for migrating the data to another node in case the current one goes down. This sounds like a non-trivial task. Or:
The node where the DB lives has to be treated in a special way. Horizontal scaling must be constrained to the other nodes and the cluster ceases to be homogeneous. This might be seen as a design flaw, going against the spirit of maintaining disposable, replaceable containers.
Point (1) applies only to self-managed clusters where all the storage at our disposal is tied to the machines where the nodes run. If we are using a managed cloud, we can use persistent volume claims and a new instance can pick up the data automatically. Still, this means that if the node with the DB is removed, we will suffer database downtime until a new instance comes up. So point (2) also remains valid for managed K8s offerings.
Therefore I can well understand the argument for keeping the DB outside of Kubernetes. What would some counterarguments look like? There are a lot of official Helm charts for various DBs, which suggests that people keep their DBs in Kubernetes clusters after all.
Happy to learn some critical thoughts!
This is not an anti-pattern. It is just difficult to implement and manage.
Point 1
In a self-hosted cluster you can also have persistent volume storage provisioned through GlusterFS or Ceph, so you don't always have to use ephemeral storage. Point 1 is therefore not fully valid.
DBs are generally created as StatefulSets, where every instance gets its own copy of the data.
Point 2
When your DB cluster scales horizontally, the 'init' container of the new DB pod, or a CRD provided by the DB, needs to register the 'secondary' DB pod so it becomes part of your DB cluster.
A StatefulSet also needs to run behind a headless service so the IP of each endpoint is known at all times, for cluster health checks, primary->secondary data sync, and electing a new primary in case the primary node goes down.
So, as long as the new pods register themselves with the DB cluster, you will be fine running your DB workload inside a Kubernetes cluster.
Further reading: https://devopscube.com/deploy-postgresql-statefulset/
I want to write a server program that accepts incoming connections and processes received requests. The first idea that came to mind was to use non-blocking sockets with epoll() or select().
For instance, when epoll() returns, it gives me an array of sockets with available I/O events. I then have to loop over that array to send and receive data, and once a buffer is entirely received or sent, a callback function is executed. This is also the technique most discussed on the internet.
However, if I use this technique, my program will keep all the other sockets waiting while it is dealing with one client connection. Isn't that an inefficient way to go if a client's request is time-consuming?
The documents I've found say that such a single-thread/single-process design can easily handle hundreds of connections simultaneously, while multithreaded designs are heavily criticized for their complexity, system overhead, etc.
Thus, my question is: How to design an efficient server program if it has to handle heavy workload?
Thanks.
A million-dollar question with a million different trade-offs. For those that get Monty Python...
https://www.youtube.com/watch?v=pWS8Mg-JWSg
Back to reality... Apache2 can handle heavy workloads, nginx can handle heavy workloads, and so can Node, Tomcat, Jetty, JBoss, Netty... In fact, any of the well-known application servers in use today, and quite a few less well known ones, can handle heavy workloads, and they all use various combinations of threads, events and processes to do it. Some languages, e.g. Erlang or Go, allow you to easily spin up a high-performance application server in a few hundred lines of code.
Although out of date now, the following page has some great information on why this is not a simple problem...
http://www.kegel.com/c10k.html
Rather than worrying about performance now, get something working, benchmark it, then ask how to make it faster... If you've been smart and made sure you have a modular design, swapping out parts of it will be relatively easy - look at what Apache did with MPM, a pluggable engine with completely different performance characteristics.
Once you have your server outperforming any of the above in benchmarks, your answer to this question would likely be accepted.
'Heavy workload' is a misleading term, and in the end it doesn't really dictate how you should design your system. The main issue here is responsiveness and its requirements. If processing a single request takes a long time and you do not want to starve the other clients (which you probably don't), then a single-threaded design will obviously not do. You should at least have a worker thread (or one per client) that handles responding to the request in some manner, even if only to notify the client that the request is being processed - a rough sketch of that idea follows.
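As an illustration only (Python's selectors module and a thread pool stand in here; the same structure applies to a C epoll() loop), the event loop stays responsive because time-consuming requests are handed off to worker threads:

    import selectors
    import socket
    from concurrent.futures import Future, ThreadPoolExecutor

    sel = selectors.DefaultSelector()            # epoll on Linux, kqueue on BSD, ...
    workers = ThreadPoolExecutor(max_workers=8)  # slow request processing happens here

    def handle_request(data: bytes) -> bytes:
        # Stand-in for the time-consuming part of processing a request.
        return b"processed: " + data

    def on_done(conn: socket.socket, fut: Future) -> None:
        # Runs in a worker thread; switching back to blocking mode keeps the
        # sketch simple -- a real server would hand the result back to the loop.
        conn.setblocking(True)
        conn.sendall(fut.result())
        conn.close()

    def on_readable(conn: socket.socket) -> None:
        data = conn.recv(4096)                   # assume a request fits in one recv() for brevity
        if not data:
            sel.unregister(conn)
            conn.close()
            return
        sel.unregister(conn)                     # stop watching it while it is processed
        fut = workers.submit(handle_request, data)
        fut.add_done_callback(lambda f, c=conn: on_done(c, f))

    def on_accept(server: socket.socket) -> None:
        conn, _ = server.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, on_readable)

    def serve(port: int = 8080) -> None:
        server = socket.socket()
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind(("", port))
        server.listen()
        server.setblocking(False)
        sel.register(server, selectors.EVENT_READ, on_accept)
        while True:                              # the loop itself never blocks on a client
            for key, _ in sel.select():
                key.data(key.fileobj)

    if __name__ == "__main__":
        serve()

The key point is that the loop thread never runs handle_request itself; it only dispatches ready sockets, so slow requests cannot starve the other connections.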
I have an MVC application hosted as an Azure web role, and I also have a worker role which checks some data and updates records in the database. The worker role checks the data every 15 minutes.
Yesterday I ran into big trouble because a lot of changes made via the MVC application were simply reverted.
I will try to give an example:
The user made changes to one entity yesterday (this is tracked by the event log)
In the meantime, the worker role updated that entity
Today, the user updated the entity multiple times
In the end, the entity has data from yesterday, not from today
The MVC application uses a simple SaveChanges call, while the worker role uses BeginTransaction with SaveChanges.
I suspect locking and the isolation level, but it is strange that a lock would last almost 24 hours.
I hope that someone will understand this and help me.
Thanks
If you're keeping a persistent EF database context in your worker role, it's possible you're seeing the effects of EF objects being cached.
The worker role loads an entity and does something with it. Since you're not creating and disposing of the EF context each time, the entity stays cached.
The user saves the entity and the database gets updated with their changes.
The worker role queries for the entity again, but since it's cached, the context returns the outdated, cached version. It then does some sort of save operation, overwriting the user's edits with the cached values.
See Entity Framework and Connection Pooling, specifically,
When you use EF it by default loads each entity only once per context. The first query creates the entity instance and stores it internally. Any subsequent query which requires an entity with the same key returns this stored instance. If values in the data store changed you still receive the entity with values from the initial query.
Bottom line, you should never persist an EF database context for long periods of time. You may think of it as just an open database connection, but it is much more than that, and "optimizing" things by keeping it around is a false economy that will cause bad things to happen. It's meant to be used in a unit-of-work (UoW) pattern where you create it, do the operations that need to be done, and then dispose of it ASAP.
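EF is .NET-specific, but the same stale-cache failure mode, and the same unit-of-work fix, exists in other ORMs. Purely as an illustration of the discipline being recommended (SQLAlchemy stands in for EF here; the model and connection string are invented):

    from sqlalchemy import create_engine, select
    from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column

    class Base(DeclarativeBase):
        pass

    class Record(Base):
        __tablename__ = "records"
        id: Mapped[int] = mapped_column(primary_key=True)
        status: Mapped[str] = mapped_column(default="pending")

    engine = create_engine("sqlite:///worker.db")   # invented connection string
    Base.metadata.create_all(engine)

    def run_worker_iteration() -> None:
        """One 15-minute pass: open a fresh context, do the work, dispose of it."""
        with Session(engine) as session, session.begin():
            for record in session.scalars(select(Record).where(Record.status == "pending")):
                record.status = "processed"
        # The session (and its identity map) is gone here, so the next iteration
        # re-reads current values written by the web application in the meantime,
        # instead of saving back entities cached during an earlier run.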
I have been doing R&D on SQL Server Service Broker as a replacement for our current messaging solution, MSMQ. I want to know the disadvantages of SQL Server Service Broker compared to MSMQ for the following criteria.
Development
Troubleshooting
Performance (let's say we need to process 100,000 messages daily, with an average size of around 25 KB)
Scalability
I've used Service Broker in my current project, having previously used MSMQ (brokered by MassTransit) with great success. I was initially dubious about using Service Broker, but I have to admit it has performed very well.
If you're using the publish/subscribe model, then I would use message queueing every time (though I would use RabbitMQ over MSMQ if the project allowed), but when you just want to chew through a stack of data and persist it to SQL Server, Service Broker is a great solution: the fact that it's so 'close to the metal' is a big advantage.
Development
Service Broker requires a lot of boilerplate, which is a pain, but unless you're planning on having lots of different queues it's manageable. SQL Server projects in Visual Studio take a lot of the deployment pain away.
Troubleshooting
Service Broker is a black box - messages go in, and they usually come out, but if they don't then troubleshooting can be problematic, and all you can do is query the system views - and sometimes you simply can't find out what has gone wrong. This is annoying, but MSMQ has the same kind of issues.
Performance
Service Broker performance is excellent for our use case (see comment section below for discussion). We are processing a lot more than 100,000 messages per day, more than 30,000 per hour at our SLA load, and our message sizes are large. I would estimate we process close to 100,000 messages per hour during heavy load testing.
For best performance I would advise you to use a Dialog Pool like this one [1], as creating a Service Broker dialog can be an expensive operation.
You will also want to use the error-handling procedures detailed by Remus Rusanu. (If you do use Service Broker, you might as well read everything Remus has written on the subject before you start, as you'll end up reading it eventually!)
Scalability
You can certainly use more than one server to scale up if required, though we haven't had to do so, and from the load size you mention I don't think you would need to either.
I don't think I have really managed to answer your question, as I haven't highlighted enough disadvantages of Service Broker queues. I would say the impenetrable nature of its internal workings is the thing that most annoys me - when it works, it works very well, but when it stops working it can be very hard to figure out why. Also, if you have a lot of messages in a queue, using ALTER QUEUE takes a very long time to complete.
Not knowing how you are using MSMQ also makes it difficult to fairly compare the two technologies.
[1] Recreated in a gist as the original URL is now "disabled" and the page isn't in the Internet Archive. Eventually found a copy here.
I solved one of the problems I came across and need help tuning it further to improve performance.
There will be an Apache module that receives requests over HTTP and responds within a few milliseconds. It processes the input parameters and puts them into a cache layer as JSON records. (Currently, text files are used as the cache.)
Another asynchronous module reads the JSON records, cooks them, and pushes them into MongoDB. This avoids latency in the HTTP response that might otherwise be added by MongoDB performance degradation.
There will be multiple machines (currently 12) behind a load balancer, and we expect 100 M requests per day, which would be ~10-15 GB per day measured when the JSON records are written to text files.
Basically I am searching for a better data ingestion solution.
Regarding using ZeroMQ or RabbitMQ in a producer-consumer architecture:
Message queues are in-memory queues, and they might not be able to hold this much data in memory.
The data being consumed into MongoDB is important, and we cannot afford to lose it in case the message queue goes down or crashes for some reason.
Please suggest.
Looks like in both cases each thread is coupled with its matching thread in the other module (either by a fixed file or offset) - this means that you still get bottlenecked if one of the modules is inherently slower than the other, as the threads of the faster module will become underutilized.
Instead, you can use a task queue (or a similar solution to the multiple-producers/multiple-consumers problem) and let each thread pick up whatever task is available once it becomes free, as sketched below. This gives you greater freedom in balancing the number of threads in each module: if the front-end HTTP module is, for example, 2x faster than the backend processing, you can spawn 2x more backend threads.
The "price" is that you'll need to maintain the shared queue safely (locking, etc.) and make sure it's done efficiently and without deadlocks.
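A rough sketch of that task-queue idea, using Python's thread-safe queue.Queue purely for illustration (thread counts, task contents and function names are invented):

    import queue
    import threading

    tasks: "queue.Queue" = queue.Queue(maxsize=10_000)   # bounded: back-pressure instead of unbounded memory use

    def frontend_receiver(n_requests: int) -> None:
        """Front-end thread: parses a request and enqueues one task per request."""
        for i in range(n_requests):
            tasks.put({"record_id": i, "payload": "..."})  # blocks if the queue is full

    def backend_worker() -> None:
        """Back-end thread: picks up whichever task is available next, regardless of producer."""
        while True:
            task = tasks.get()
            if task is None:              # sentinel: shut this worker down
                tasks.task_done()
                return
            # ... cook the JSON record and push it to storage here ...
            tasks.task_done()

    # The backend is assumed to be ~2x slower, so it gets 2x more threads.
    receivers = [threading.Thread(target=frontend_receiver, args=(1000,)) for _ in range(2)]
    workers = [threading.Thread(target=backend_worker) for _ in range(4)]
    for t in receivers + workers:
        t.start()
    for t in receivers:
        t.join()
    tasks.join()                          # wait until every queued task has been processed
    for _ in workers:
        tasks.put(None)                   # one sentinel per worker
    for t in workers:
        t.join()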
You can use something like RabbitMQ or ZeroMQ for this kind of thing, together with bulk data inserts into your database, or scale out to other servers as needed.
You create HTTP receiver threads to receive the data to be recorded, pushing the incoming data onto a receive queue in memory. You create a single database-writer thread that does nothing but take all the received data queued in memory (by the receiver threads), convert it into a single multi-row database INSERT, send it in a single transaction, and, after the transaction is done, go back to the queue for the next batch of received data (the queue collected data while the previous transaction was in progress). Sending multiple INSERT transactions in parallel to a single database table results in higher throughput only in special cases, which is why a single writer thread is usually a good choice. Batching multiple INSERTs into a single transaction makes much better use of the limited I/O capabilities of HDDs.
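A hedged sketch of that receive-queue / single-writer structure (SQLite and the events table stand in for the real database; all names are illustrative):

    import queue
    import sqlite3
    import threading

    received: "queue.Queue" = queue.Queue()
    STOP = object()                                  # sentinel to shut the writer down

    def http_receiver(n: int) -> None:
        """Stands in for an HTTP receiver thread: it only queues rows, never touches the DB."""
        for i in range(n):
            received.put((i, "payload-%d" % i))

    def db_writer(db_path: str = "ingest.db") -> None:
        """The single writer: drain whatever is queued and insert it as one multi-row transaction."""
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER, payload TEXT)")
        running = True
        while running:
            batch = [received.get()]                 # block until at least one row arrives
            while not received.empty():              # then take everything queued meanwhile
                batch.append(received.get())
            if STOP in batch:
                batch = [row for row in batch if row is not STOP]
                running = False
            if batch:
                with conn:                           # one transaction for the whole batch
                    conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", batch)
        conn.close()

    receivers = [threading.Thread(target=http_receiver, args=(1000,)) for _ in range(4)]
    writer = threading.Thread(target=db_writer)
    writer.start()
    for t in receivers:
        t.start()
    for t in receivers:
        t.join()
    received.put(STOP)
    writer.join()

The batch size adapts automatically: when the database is slow, more rows accumulate in the queue and the next transaction simply carries more of them.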