Some time ago we implemented a warehouse management app that keeps track of the quantity of each product we have in the store. We solved the problem of concurrent access to data with database locks (select for update), but this approach led to poor performance when many clients try to consume product quantities from the same store. Note that we manage only a small set of product types (fewer than 10), so contention on each row can be heavy (also, we don't care about stock refills). We thought of splitting each resource quantity into smaller "buckets", but this approach could lead to starvation for clients that try to consume a quantity bigger than any single bucket's capacity: we would have to manage bucket merging and so on...
My question is: are there any broadly accepted solutions to this problem? I also looked for academic articles, but the topic seems too wide.
P.S. 1:
our application runs in a clustered environment, so we cannot rely on the application concurrency control. The question aims to find an algorithm that structures and manages the data in a different way than a single row, but keeping all the advantages that a db transaction (using locks or not) has.
P.S. 2: for your info, we manage a wide number of similar warehouses, the example focuses on a single one, but we keep all the data in one db (prices are all the same, etc).
Edit: The setup below will still work on a cluster if you use a queueing program that can coordinate among multiple processes / servers, e.g. RabbitMQ.
You can also use a simpler queueing algorithm that only uses the database, with the downside that it requires polling (whereas a system like RabbitMQ allows threads to block until a message is available). Create a Requests table with a column for unique requestIds (e.g. a random UUID) that acts as the primary key, a timestamp column, a resourceType column, and an integer requestedQuantity column. You'll also need a Logs table with a unique requestId column that acts as the primary key, a timestamp column, a resourceType column, an integer requestedQuantity column, and a boolean/tinyint/whatever success column.
When a client requests a quantity of ResourceX it generates a random UUID and adds a row to the Requests table using the UUID as the requestId, and then polls the Logs table for the requestId. If the success column is true then the request succeeded, else it failed.
The server with the database assigns one thread or process to each resource, e.g. ProcessX is in charge of ResourceX. ProcessX retrieves all rows from the Requests table where resourceType = ResourceX, sorted by timestamp, and then deletes them from Requests; it then processes each request in order, decrementing an in-memory counter for each successful request, and at the end of processing the requests it updates the quantity of ResourceX on the Resources table. It then writes each request and its success status to the Logs table. It then retrieves all of the requests from Requests where resourceType = ResourceX again, etc.
It may be slightly more efficient to use an autoincrement integer as the Requests primary key, and to have ProcessX sort by primary key instead of by timestamp.
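As a rough sketch of the processing step described above, the following shows only the in-memory part of ProcessX: ordering a batch by primary key, applying it to the counter, and producing the rows that would go to Logs. The names StockRequest, LogEntry and BatchProcessor are hypothetical, and the database reads and writes are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory stand-in for one row of the Requests table.
final class StockRequest {
    final long id;
    final String resourceType;
    final int requestedQuantity;

    StockRequest(long id, String resourceType, int requestedQuantity) {
        this.id = id;
        this.resourceType = resourceType;
        this.requestedQuantity = requestedQuantity;
    }
}

// Hypothetical in-memory stand-in for one row of the Logs table.
final class LogEntry {
    final long id;
    final String resourceType;
    final int requestedQuantity;
    final boolean success;

    LogEntry(long id, String resourceType, int requestedQuantity, boolean success) {
        this.id = id;
        this.resourceType = resourceType;
        this.requestedQuantity = requestedQuantity;
        this.success = success;
    }
}

// Owns the in-memory counter for a single resource (the role of ProcessX).
final class BatchProcessor {
    private int quantity;

    BatchProcessor(int initialQuantity) {
        this.quantity = initialQuantity;
    }

    int quantity() {
        return quantity;
    }

    // Processes one batch in primary-key order and returns the rows to write to Logs.
    List<LogEntry> process(List<StockRequest> batch) {
        List<StockRequest> ordered = new ArrayList<>(batch);
        ordered.sort(Comparator.comparingLong((StockRequest r) -> r.id));
        List<LogEntry> logs = new ArrayList<>();
        for (StockRequest r : ordered) {
            boolean ok = r.requestedQuantity > 0 && r.requestedQuantity <= quantity;
            if (ok) {
                quantity -= r.requestedQuantity;
            }
            logs.add(new LogEntry(r.id, r.resourceType, r.requestedQuantity, ok));
        }
        return logs;
    }
}
```

After each batch, the caller would write quantity() to the Resources table and the returned entries to Logs, ideally in one transaction so that a crash cannot leave the two out of step.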
One option is to assign one DAOThread per resource - this thread is the only thing that accesses that resource's database table so that there's no locking at the database level. Workers (e.g. web sessions) request resource quantities using a concurrent queue - the example below uses a Java BlockingQueue, but most languages will have some sort of concurrent queue implementation you can use.
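public class Request {
    final int value;
    final BlockingQueue<ReturnMessage> queue; // where the reply should be sent
    public Request(int value, BlockingQueue<ReturnMessage> queue) {
        this.value = value;
        this.queue = queue;
    }
}
public class ReturnMessage {
    final int value;
    final String resourceType;
    final boolean isSuccess;
    public ReturnMessage(int value, String resourceType, boolean isSuccess) {
        this.value = value;
        this.resourceType = resourceType;
        this.isSuccess = isSuccess;
    }
}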
public class DAOThread implements Runnable {
    private static final int MAX_CHANGES = 10;
    // Package-private so that a BufferThread can read them.
    final String resourceType;
    final BlockingQueue<Request> queue;
    private final DBTable table; // placeholder for your data-access layer
    private int quantity;
    private int changeCount = 0;
    public DAOThread(DBTable table, BlockingQueue<Request> queue) {
        this.table = table;
        this.resourceType = table.select("resource_type");
        this.quantity = table.select("quantity");
        this.queue = queue;
    }
    @Override
    public void run() {
        try {
            while (true) {
                Request request = queue.take();
                boolean success = request.value <= quantity;
                if (success) {
                    quantity -= request.value;
                    // Flush to the database only once every MAX_CHANGES successful requests.
                    if (++changeCount >= MAX_CHANGES) {
                        changeCount = 0;
                        table.update("quantity", quantity);
                    }
                }
                // put() rather than offer(): offer() on a SynchronousQueue drops the
                // reply if the requester is not yet waiting in take().
                request.queue.put(new ReturnMessage(request.value, resourceType, success));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop cleanly when interrupted
        }
    }
}
public class Worker {
    final Map<String, BlockingQueue<Request>> dbMap;
    final SynchronousQueue<ReturnMessage> queue = new SynchronousQueue<>();
    public Worker(Map<String, BlockingQueue<Request>> dbMap) {
        this.dbMap = dbMap;
    }
    // Blocks until the DAOThread replies; returns true if the quantity was reserved.
    public boolean request(String resourceType, int value) throws InterruptedException {
        dbMap.get(resourceType).put(new Request(value, queue));
        return queue.take().isSuccess;
    }
}
The Workers send resource requests to the appropriate DAOThread's queue; the DAOThread processes these requests in order, either updating the local resource quantity if the request's value doesn't exceed the quantity and returning a Success, else leaving the quantity unchanged and returning a Failure. The database is only updated after ten updates to reduce the amount of IO; the larger MAX_CHANGES is, the more complicated it will be to recover from system failure. You can also have a dedicated IOThread that does all of the database writes - this way you don't need to duplicate any logging or timing (e.g. there ought to be a Timer that flushes the current quantity to the database after every few seconds).
The Worker uses a SynchronousQueue to wait for a response from the DAOThread (a SynchronousQueue is a BlockingQueue that hands each item directly from producer to consumer and has no internal capacity); if the Worker is running in its own thread then you may want to replace this with a standard multi-item BlockingQueue so that the Worker can process the ReturnMessages in any order.
There are some databases, e.g. Riak, that have native support for counters, so this might improve your IO throughput and reduce or eliminate the need for MAX_CHANGES.
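One detail worth checking when replies travel over a SynchronousQueue: the non-blocking offer() only succeeds if a consumer is already waiting, whereas put() blocks until the hand-off happens. The small sketch below illustrates the difference; SynchronousQueueDemo is a hypothetical name used only for this illustration.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

final class SynchronousQueueDemo {
    // Returns false: no thread is waiting in take() or poll(), so the item is not accepted.
    static boolean offerWithoutTaker() {
        SynchronousQueue<String> queue = new SynchronousQueue<>();
        return queue.offer("reply");
    }

    // Hands one reply from a producer thread to the calling thread using the blocking put().
    // Returns null if the hand-off does not happen within the timeout.
    static String putThenTake() {
        SynchronousQueue<String> queue = new SynchronousQueue<>();
        Thread producer = new Thread(() -> {
            try {
                queue.put("reply"); // blocks until the consumer arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();
        try {
            String reply = queue.poll(5, TimeUnit.SECONDS); // consumer side, bounded wait
            producer.join(1000);
            return reply;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

If a reply is sent with offer() before the requester has reached take(), the reply is lost and the requester waits forever, so a blocking put() (or a queue with capacity) is the safer choice for the return path.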
You can further increase throughput by introducing BufferThreads to buffer the requests to the DAOThreads.
public class BufferThread implements Runnable {
    private static final int BUFFERSIZE = 10;
    final SynchronousQueue<ReturnMessage> returnQueue = new SynchronousQueue<>();
    private final DAOThread daoThread;
    private final BlockingQueue<Request> queue;
    private final ArrayList<Request> buffer = new ArrayList<>(BUFFERSIZE);
    private int tempTotal = 0;
    public BufferThread(DAOThread daoThread, BlockingQueue<Request> queue) {
        this.daoThread = daoThread;
        this.queue = queue;
    }
    @Override
    public void run() {
        try {
            while (true) {
                Request request = queue.poll(100, TimeUnit.MILLISECONDS);
                if (request != null) {
                    tempTotal += request.value;
                    buffer.add(request);
                }
                // Flush when the buffer is full or the poll timed out, but never when it is empty.
                boolean shouldFlush = buffer.size() == BUFFERSIZE || request == null;
                if (shouldFlush && !buffer.isEmpty()) {
                    daoThread.queue.put(new Request(tempTotal, returnQueue));
                    ReturnMessage message = returnQueue.take();
                    if (message.isSuccess) {
                        // The combined total was granted, so every buffered request succeeds.
                        for (Request buffered : buffer) {
                            buffered.queue.put(new ReturnMessage(buffered.value, daoThread.resourceType, true));
                        }
                    } else {
                        // send unbuffered requests to DAOThread to see if any can be satisfied
                        for (Request buffered : buffer) {
                            daoThread.queue.put(buffered);
                        }
                    }
                    buffer.clear();
                    tempTotal = 0;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop cleanly when interrupted
        }
    }
}
The Workers send their requests to the BufferThreads, who then wait until they've buffered BUFFERSIZE requests or have waited for 100ms for a request to come through the buffer (Request request = queue.poll(100, TimeUnit.MILLISECONDS)), at which point they forward the buffered message to the DAOThread. You can have multiple buffers per DAOThread - rather than sending a Map<String, BlockingQueue<Request>> to the Workers you instead send a Map<String, ArrayList<BlockingQueue<Request>>>, one queue per BufferThread, with the Worker either using a counter or a random number generator to determine which BufferThread to send a request to. Note that if BUFFERSIZE is too large and/or if you have too many BufferThreads then Workers will suffer from long pause times as they wait for the buffer to fill up.
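The counter-based choice of BufferThread mentioned above could look roughly like this. RoundRobinSelector is a hypothetical helper, not part of the classes shown earlier; a Worker would hold one selector per resource type, built over that resource's list of BufferThread queues.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Picks targets in rotation; safe to share between Worker threads.
final class RoundRobinSelector<T> {
    private final List<T> targets;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSelector(List<T> targets) {
        if (targets.isEmpty()) {
            throw new IllegalArgumentException("at least one target is required");
        }
        this.targets = targets;
    }

    T pick() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), targets.size());
        return targets.get(index);
    }
}
```

A random pick (e.g. with ThreadLocalRandom) avoids the shared counter entirely, at the cost of a less even spread over short bursts.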
I am new to Apache Camel.
I need to split a file line by line and do some operation on each line.
At the end I need a footer line with information from the previous lines (the number of lines and the sum of the values of a column).
My understanding is that I should be using an aggregation strategy, so I tried something like this:
.split(body().tokenize("\r\n|\n"), sumAggregationStrategy)
.process("fileProcessor")
In my aggregation strategy I just set two headers with the incremented values:
newExchange.getIn().setHeader("sum", sum);
newExchange.getIn().setHeader("numberOfLines", numberOfLines);
And in the processor I try to access those headers:
int sum = inMessage.getIn().getHeader("sum", Integer.class);
int numberOfLines = inMessage.getIn().getHeader("numberOfLines", Integer.class);
There are two problems.
First of all, the aggregation strategy seems to be called after the first iteration of the processor.
Second, my headers don't exist in the processor, so I can't access the information I need when I am at the last line of the file. The headers do exist in the oldExchange of the aggregator, though.
I think I can still do it, but I would have to create a new processor just for the purpose of producing the last line of the file.
Is there something I'm missing with the aggregation strategies? Is there a better way to do this?
An aggregator will be called for every iteration of the split. This is how they are supposed to work.
The reason you don't see the headers within the processor is that headers live and die with the message and are not visible outside it. You need to set 'sum' and 'numberOfLines' as exchange properties instead. Because every iteration within a split results in a new exchange, you need to get the properties from the old exchange and set them again on the new exchange to pass them to subsequent components in the route.
This is how you could do it:
AggregationStrategy:
public class SumAggregationStrategy implements AggregationStrategy {
    @Override
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        long sum = 0;
        long numberOfLines = 0;
        // oldExchange is null on the first invocation, so only touch it when present.
        if (oldExchange != null) {
            sum = oldExchange.getProperty("sum", Long.class);
            numberOfLines = oldExchange.getProperty("numberOfLines", Long.class);
            oldExchange.setProperty("CamelSplitComplete", newExchange.getProperty("CamelSplitComplete")); // This is for the completion predicate
        }
        sum = sum + ((Line) newExchange.getIn().getBody()).getColumnValue();
        numberOfLines++;
        newExchange.setProperty("sum", sum);
        newExchange.setProperty("numberOfLines", numberOfLines);
        return newExchange;
    }
}
Route:
.split(body().tokenize("\r\n|\n"),sumAggregationStrategy)
.completionPredicate(simple("${exchangeProperty.CamelSplitComplete} == true"))
.process("fileProcessor").to("file:your_file_name?fileExist=Append");
Processor:
public class FileProcessor implements Processor {
    @Override
    public void process(Exchange exchange) throws Exception {
        // getProperty(name) returns Object, so ask for the type explicitly.
        long sum = exchange.getProperty("sum", Long.class);
        long numberOfLines = exchange.getProperty("numberOfLines", Long.class);
        String footer = "Your Footer String";
        exchange.getIn().setBody(footer);
    }
}
Using a custom aggregator like Srini suggested is a good idea. It might also support streaming large files better.
However, if you want to keep things simple and avoid split and aggregation, you could just use .tokenize("\r\n|\n") and convertBodyTo(List.class) to convert the string to a list of strings.
from("direct:addFooter")
.routeId("addFooter")
.setBody().tokenize("\r\n|\n")
.convertBodyTo(List.class)
.process(exchange -> {
List<String> rows = exchange.getMessage().getBody(List.class);
int sum = 0;
for (int i = 0; i < rows.size(); i++) {
sum += Integer.parseInt(rows.get(i));
}
int numberOfLines = rows.size();
exchange.getMessage().setHeader("numberOfLines", numberOfLines);
exchange.getMessage().setHeader("sum", sum);
})
// Write data to file using file or stream component
// you could also use Velocity, FreeMarker or Mustache templates to format the
// result before writing it to file.
;
After reading the Throttling documentation https://docs.developer.amazonservices.com/en_US/products/Products_Throttling.html and https://docs.developer.amazonservices.com/en_US/dev_guide/DG_Throttling.html , I've started honoring the quotaRemaining and the quotaResetsAt response headers so that I don't go beyond the quota limit. However, whenever I fire a few requests in quick succession, I get the exception below.
The documentation doesn't mention anything about burst limits. It talks about a maximum request quota, but I don't know how that applies to my case. I'm invoking the ListMatchingProducts API.
Caused by: com.amazonservices.mws.client.MwsException: Request is throttled
at com.amazonservices.mws.client.MwsAQCall.invoke(MwsAQCall.java:312)
at com.amazonservices.mws.client.MwsConnection.call(MwsConnection.java:422)
... 19 more
I guess I figured it out.
ListMatchingProducts mentions that the Maximum Request Quota is 20. In practice this means that you can fire at most 20 requests in quick succession, but after that you must wait until the Restore Rate "replenishes" your request "credits" (i.e. in my case, 1 request every 5 seconds).
The Restore Rate then refills the quota at one request every 5 seconds, up to a maximum of 20 requests. The following code worked for me...
class Client {
private final int maxRequestQuota = 19
private Semaphore maximumRequestQuotaSemaphore = new Semaphore(maxRequestQuota)
private volatile boolean done = false
Client() {
new EveryFiveSecondRefiller().start()
}
ListMatchingProductsResponse fetch(String searchString) {
maximumRequestQuotaSemaphore.acquire()
// .....
}
class EveryFiveSecondRefiller extends Thread {
@Override
void run() {
while (!done()) {
int availablePermits = maximumRequestQuotaSemaphore.availablePermits()
if (availablePermits == maxRequestQuota) {
log.debug("Max permits reached. Waiting for 5 seconds")
sleep(5000)
continue
}
log.debug("Releasing a single permit. Current available permits are $availablePermits")
maximumRequestQuotaSemaphore.release()
sleep(5000)
}
}
boolean done() {
done
}
}
void close() {
done = true
}
}
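For comparison, the same quota-and-restore-rate idea can be written as a token bucket that needs no background thread. This is only a sketch under assumptions: TokenBucket is a hypothetical class, the caller passes in the current time (which makes the logic easy to test), and the exact accounting Amazon applies on the server side is not something this code can confirm.

```java
// Client-side approximation of a request quota that restores over time.
final class TokenBucket {
    private final int capacity;               // e.g. the Maximum Request Quota
    private final long restoreIntervalMillis; // e.g. 5000 for one request every 5 seconds
    private int tokens;
    private long lastRefillMillis;

    TokenBucket(int capacity, long restoreIntervalMillis, long nowMillis) {
        this.capacity = capacity;
        this.restoreIntervalMillis = restoreIntervalMillis;
        this.tokens = capacity;
        this.lastRefillMillis = nowMillis;
    }

    // Returns true and consumes one token if a request may be sent at nowMillis.
    synchronized boolean tryAcquire(long nowMillis) {
        long restored = (nowMillis - lastRefillMillis) / restoreIntervalMillis;
        if (restored > 0) {
            tokens = (int) Math.min((long) capacity, tokens + restored);
            lastRefillMillis += restored * restoreIntervalMillis;
        }
        if (tokens == capacity) {
            // A full bucket does not bank partial intervals (a conservative assumption).
            lastRefillMillis = nowMillis;
        }
        if (tokens == 0) {
            return false;
        }
        tokens--;
        return true;
    }
}
```

In real use the caller would pass System.currentTimeMillis() and, when tryAcquire returns false, sleep briefly before retrying instead of sending the request.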
This is more of a general "what's the best practice" question...
I have a few processes where the consumer template is used to read a directory (or an MQ queue) for whatever is available and then stop itself; the entire route set it calls is created programmatically based on a few parameters.
So, using the consumer template method below, is there a way to assign:
A filter operation programmatically? If I want to filter out certain files, that's easy in a standard route (through .filter), but at the moment I have no predefined beans, so adding #filter=filter to the EIP is not really an option.
An aggregation function from inside my while loop (while still using the template)?
@Override
public void process(Exchange exchange) throws Exception {
getConsumer().start();
int exchangeCount = 0;
while (true) {
String consumerEp = "file:d://directory?delete=true&sendEmptyMessageWhenIdle=true&idempotent=false";
Exchange fileExchange = getConsumer().receive(consumerEp);
if (fileExchange == null || fileExchange.getIn()==null || fileExchange.getIn().getHeader(CAMEL_FILE_NAME)==null) {
break;
}
exchangeCount++;
Boolean batchStatus = (Boolean) fileExchange.getProperty(PROP_CAMEL_BATCH_COMPLETE);
LOG.info("---PROCESSING : " + fileExchange.getIn().getHeader(CAMEL_FILE_NAME));
getProducer().send("direct:some-other-process", fileExchange);
//Get the CamelBatchComplete Property to establish the end of the batch, and not cycle through twice.
if(batchStatus!=null && batchStatus==true){
break;
}
}
// Stop the consumer service
getConsumer().stop();
LOG.info("End Group Operation : Total Exchanges=" + exchangeCount);
}
Is Hazelcast always blocking in case initial.min.cluster.size is not reached? If not, under which situations is it not?
Details:
I use the following code to initialize hazelcast:
Config cfg = new Config();
cfg.setProperty("hazelcast.initial.min.cluster.size",Integer.
toString(minimumInitialMembersInHazelCluster)); //2 in this case
cfg.getGroupConfig().setName(clusterName);
NetworkConfig network = cfg.getNetworkConfig();
JoinConfig join = network.getJoin();
join.getMulticastConfig().setEnabled(false);
join.getTcpIpConfig().addMember("192.168.0.1").addMember("192.168.0.2").
addMember("192.168.0.3").addMember("192.168.0.4").
addMember("192.168.0.5").addMember("192.168.0.6").
addMember("192.168.0.7").setRequiredMember(null).setEnabled(true);
network.getInterfaces().setEnabled(true).addInterface("192.168.0.*");
join.getMulticastConfig().setMulticastTimeoutSeconds(MCSOCK_TIMEOUT/100);
hazelInst = Hazelcast.newHazelcastInstance(cfg);
distrDischargedTTGs = hazelInst.getList(clusterName);
and get log messages like
debug: starting Hazel pullExternal from Hazelcluster with 1 members.
Does that definitely mean there was another member that has joined and left already? It does not look like that would be the case from the log files of the other instance. Hence I wonder whether there are situations where hazelInst = Hazelcast.newHazelcastInstance(cfg); does not block even though it is the only instance in the hazelcast cluster.
The newHazelcastInstance call blocks until the cluster has the required number of members.
See the code below for how it is implemented:
private static void awaitMinimalClusterSize(HazelcastInstanceImpl hazelcastInstance, Node node, boolean firstMember)
throws InterruptedException {
final int initialMinClusterSize = node.groupProperties.INITIAL_MIN_CLUSTER_SIZE.getInteger();
while (node.getClusterService().getSize() < initialMinClusterSize) {
try {
hazelcastInstance.logger.info("HazelcastInstance waiting for cluster size of " + initialMinClusterSize);
//noinspection BusyWait
Thread.sleep(TimeUnit.SECONDS.toMillis(1));
} catch (InterruptedException ignored) {
}
}
if (initialMinClusterSize > 1) {
if (firstMember) {
node.partitionService.firstArrangement();
} else {
Thread.sleep(TimeUnit.SECONDS.toMillis(3));
}
hazelcastInstance.logger.info("HazelcastInstance starting after waiting for cluster size of "
+ initialMinClusterSize);
}
}
If you set the logging to debug, you may be able to see better what is happening. Members joining and leaving should already be visible at info level.
We are testing an application that is supposed to display real-time data for multiple users, refreshed every second. The server application inserts 128 new rows into an SQL database every second, and all users then have to query them along with another 128 older reference rows.
We tested the query time and it didn't exceed 30 milliseconds; the interface function that invokes the query also didn't take more than 50 milliseconds, including processing the data.
We developed a testing application that creates a thread and an SQL connection for each user. Each user issues 7 queries every second. Everything starts fine, and no user takes more than 300 milliseconds for the 7 data series (queries). However, after 10 minutes, the latency exceeds 1 second and keeps increasing. We don't know whether the problem comes from SQL Server 2008 handling multiple requests at the same time, or how to overcome such a problem.
Here's our testing client, in case it helps. Note that the client and server run on the same machine with 8 CPUs and 8 GB of RAM. Now we're questioning whether a database is the optimal solution for us.
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Enter Number of threads");
int threads = int.Parse(Console.ReadLine());
ArrayList l = new ArrayList();
for (int i = 0; i < threads; i++)
{
User u = new User();
Thread th = new Thread(u.Start);
th.IsBackground = true;
th.Start();
l.Add(u);
l.Add(th);
}
Thread.CurrentThread.Join();
GC.KeepAlive(l);
}
}
class User
{
BusinessServer client ; // the data base interface dll
public static int usernumber =0 ;
static TextWriter log;
public User()
{
client = new BusinessServer(); // creates an SQL connection in the constructor
Interlocked.Increment(ref usernumber);
}
public static void SetLog(int processnumber)
{
log = TextWriter.Synchronized(new StreamWriter(processnumber + ".txt"));
}
public void Start()
{
Dictionary<short, symbolStruct> companiesdic = client.getSymbolData();
short [] symbolids=companiesdic.Keys.ToArray();
Stopwatch sw = new Stopwatch();
while (true)
{
int current;
sw.Start();
current = client.getMaxCurrentBarTime();
for (int j = 0; j < 7; j++)
{
client.getValueAverage(dataType.mv, symbolids,
action.Add, actionType.Buy,
calculationType.type1,
weightType.freeFloatingShares, null, 10, current, functionBehaviour.difference); // this is the function that has the queries
}
sw.Stop();
Console.WriteLine(DateTime.Now.ToString("hh:mm:ss") + "\t" + sw.ElapsedMilliseconds);
if (sw.ElapsedMilliseconds > 1000)
{
Console.WriteLine("warning");
}
sw.Reset();
long diff = 0;//(1000 - sw.ElapsedMilliseconds);
long sleep = diff > 0 ? diff : 1000;
Thread.Sleep((int)sleep);
}
}
}
Warning: this answer is based on knowledge of MSSQL 2000 - not sure if it is still correct.
If you do a lot of inserts, the indexes will eventually get out of date and the server will automatically switch to table scans until the indexes are rebuilt. Some of this is done automatically, but you may want to force reindexing periodically if this kind of performance is critical.
I would suspect the query itself. While it may not take much time on an empty database, as the amount of data grows it may require more and more time depending on how the look up is done. Have you examined the query plan to make sure that it is doing index lookups instead of table scans to find the data? If not, perhaps introducing some indexes would help.