Database time acces in Heroku with Play Framework - database

I am having a problem and I need your help.
I am working with Play Framework v1.2.4 in java, and my server is uploaded in the Heroku servers.
All works fine, I can access to my databases and all is ok, but I am experiment troubles when I do a couple of saves to the database.
I have a method who store data many times in the database and return a notification to a mobile phone. My problem is that the notification arrives before the database finish to save the data, because when it arrives I request for the update data to the server, and it returns the data without the last update. After a few seconds I have trying to update again, and the data shows correctly, therefore I think there is a time-access problem.
The idea would be that when the databases end to save the data, the server send the notification.
I dont know if this is caused because I am using the free version of the Heroku Servers, but I want to be sure before purchasing it.

In general all requests to cloud databases are always slower than the same working on your local machine. Even simply query that on your computer needs just 0.0001 sec can be as slow as 0.5 sec in the cloud. Reason is simple clouds providers uses shared databases + (geo) replications, which just... cannot be compared to the database accessed only by one program on the same machine.
Also keep in mind that free Heroku DB plans doesn't offer ANY database cache, which means that every query is fetched from the cloud directly.
As we don't know your application it's hard to say what is the bottleneck anyway almost for sure you have at least 3 ways to solve your problem. They are not an alternatives, probably you will need to use (or at least check) all of them.
You need to risk some basic plan and see how things changed with paid version, maybe it will be good enough for you, maybe not.
Redesign your application to make less queries. For an example instead sending 10 queries to select 10 different rows, you will need to send one query, which selects all 10 records at once.
Use Play's cache API to avoid repeating selecting the same set of data again and again. For an example, if you have some categories, which changes rarely, but you need category tree for each article, you don't need to fetch categories from DB every time, instead you can store a List of categories in cache, so you will need to use only one request to fetch article's content (which can be cached for some short time as well...)

Related

How to sync two postgresql databases in real time

I have two postgresql databases configured for replication. Master node is sending new data in real time to secondary node and it works just fine. But there is one disadvantage - secondary node is read-only, so no write requests are accepted.
What I need is to be able to perform write/read operations on both of the databases and still be able to have them in perfect sync, so they are identical.
What is the best solution for such requirements?
Just for the context of this question:
I have two web app instances that are deployed in two different locations in the world (very high delay in sending requests is the reason I decided to deploy one instance locally in each location). Both instances are fetching the same data but they are also able to generate some data and input it into the DB. It is impossible to have only one DB due to too big delay when fetching data.
Maybe my solution is not perfect, I'm open for any suggestion really because I'm out of ideas how to make it work smoothly and maybe I'm lacking some knowledge.
Thanks

big data load in salesforce

I came across weird constraint, want to hear if anyone has resolved this issue.
Problem statement: load data in salesforce from outside. volume of data is 1 million record in a burst, every 3 hrs.
my source orchestration tool (NiFi) is capable of making this many REST API, but salesforce has asked not to use REST with this much throughput. I am not sure if its a limit of salesforce or product team has created a artificial ceiling.
they have suggested use dataloader, which seems to be a batch loader for salesforce, but it is not that fast either. also it has different issues. I cant trigger dataloader, when i get the data, so not that helpful either.
Long time back i have used Informatica to connect to salesforce, and we used to pass similar amount of data, and with no issue. Can someone answer how informatica connector has solved this bottleneck issue ?what does it use underneath?
also any other way to push this much data to salesforce?
Short answer: rethink your use case. Rewrite your app to use different mechanism of connecting to SF.
Long answer: Standard Salesforce API (SOAP or REST, doesn't matter) is synchronous. Request-response, job done. It's limited to 200 records max in one API call. Your volumes are better suited for bulk API. That one is REST-only (although it can accept XML, JSON or CSV), up to 10K records in one API call. The key difference is that it's asynchronous. You submit the job, you get back the job's id, you can check it (every 10 seconds? every minute?) "is it done yet? if it is - give me back my success/failure results". But every of these checks will of course consume 1 API call too. In meantime SF received a bunch of zipped files from you and will work on unzipping and processing them as fast as resources allow.
So (ignoring the initial login call) let's talk about limits. In sandboxes the 24h rolling limit of API calls is 5 million calls. Massive. In production it's 15K API calls + 1K per every full license user you have (sales cloud, service cloud) + you can buy more capacity... Or just go to Setup -> Company Information and check your limit.
let's say you have 5 users so 20K calls/day in production. In 24h at max capacity you'll be able to push 10K * 20K = 200M inserts/updates. Well, bit less because of login calls and checking the status and pulling down the results file but still - pretty good. If that's not enough - you have bigger problems ;) Using standard API would let you go 200 * 20K = mere 4M records.
SF support told you to use Data Loader because in DL it's just ticking a checkbox to use bulk API. You don't care that backend mechanism is different. You could even script Data Loader to run from commandline (https://resources.docs.salesforce.com/216/latest/en-us/sfdc/pdf/salesforce_data_loader.pdf chapter 4). Or if it's a Java application - just reuse the JAR file on top of which DL UI is built.
These might help too:
https://trailhead.salesforce.com/en/content/learn/modules/large-data-volumes/load-your-data
https://trailhead.salesforce.com/en/content/learn/modules/api_basics/api_basics_bulk

Handling large volume of data using Web API

We have a long running DB query that populates a temporary table (we are not supposed to change this behavior) which results 6 to 10 million records, around 4 to 6 GB data.
I need to use .NET Web API for fetching data from SQL DB and the API is hosted on IIS. When a request comes from the client to API, query runs minimum 5 minutes based on amount of data in different joining tables and populates temp table. Then API has to read data from DB temp table and send it to client.
Without blocking client, without loosing DB temp table, without blocking IIS, how can we achieve this requirement?
Just thinking, if I use async API, will I be able to achieve this?
there are things you need to consider and things you can do.
if you kick off the query execution as the result of an API call, what happens if you get 10 calls to that endpoint, at the same time? Dead API, that's what going to happen.
You might be able to find a different trigger for the execution of the query, so you can run this query once per day for example or once every 4 hours and then store the result in a permanent table. The APIs job then only becomes to look at this table, not wait for anything and return some data.
The second thing you can do is to return only the data you need for the screen you are displaying. You are not going to show 4-6 gb worth of data in one go, I suspect you have some pagination there and you can rejig the code a little to only return one page of data in one go.
You don't say what kind of data you have, but if it something which doesn't require you to run that query very often then you can definitely make some improvements.
<---- edited after report clarification ---->
ok, since it's a report, here's another idea.
the aim is to make sure that the pressure is not on the api itself which needs to be responsive and quick. Let the API receive the request with the parameters needed. Offload the actual report generation activity to another service.
Keep track of what this service is doing so you can report on the status of the activity : has it started, is it finished, whatever else you need. You can use a queue for that, or simply keep track of jobs in the database.
generate the report file and store it somewhere.
email the user with the file attached or email a link so the user can download it. Another option is to provide a link to the report somewhere in the UI.

How often should I have my server sync to the database?

I am developing a web-app right now, where clients will frequently (every few seconds), send read/write requests on certain data. As of right now, I have my server immediately write to the database when a user changes something, and immediately read from the database when they want to view something. This is working fine for me, but I am guessing that it would be quite slow if there were thousands of users online.
Would it be more efficient to save write requests in an object on the server side, then do a bulk update at a certain time interval? This would help in situations where the same data is edited multiple times, since it would now only require one db insert. It would also mean that I would read from the object for any data that hasn't yet been synced, which could mean increased efficiency by avoiding db reads. At the same time though, I feel like this would be a liability for two reasons: 1. A server crash would erase all data that hasn't yet been synced. 2. A bulk insert has the possibility of creating sudden spikes of lag due to mass database calls.
How should I approach this? Is my current approach ok, or should I queue inserts for a later time?
If a user makes a change to data and takes an action that (s)he expects will save the data, you should do everything you can to ensure the data is actually saved. Example: Let's say you delay the write for a while. The user is in a hurry, makes a change then closes the browser. If you don't save right when they take an action that they expect saves the data, there would be a data loss.
Web stacks generally scale horizontally. Don't start to optimize this kind of thing unless there's evidence that you really have to.

displaying # views on a page without hitting database all the time

More and more sites are displaying the number of views (and clicks like on dzone.com) certain pages receive. What is the best practice for keeping track of view #'s without hitting the database every load?
I have a bunch of potential ideas on how to do this in my head but none of them seem viable.
Thanks,
first time user.
I would try the database approach first - returning the value of an autoincrement counter should be a fairly cheap operation so you might be surprised. Even keeping a table of many items on which to record the hit count should be fairly performant.
But the question was how to avoid hitting the db every call. I'd suggest loading the table into the webapp and incrementing it there, only backing it up to the db periodically or on webapp shutdown.
One cheap trick would be to simply cache the value for a few minutes.
The exact number of views doesn't matter much anyway since, on a busy site, in the time a visitor goes through the page, a whole batch of new views is already in.
One way is to use memcached as a counter. You could modify this rate limit implementation to instead act as general counter. The key could be in yyyymmddhhmm format with an expiration of 15 or 30 minutes (depending on what you consider to be concurrent visitors) and then simply get those keys when displaying the page.
Nice libraries for communicating with the memcache server are available in many languages.
You could set up a flat file that has the number of hits in it. This would have issues scaling, but it could work.
If you don't care about displaying the number of page views, you could use something like google analytics or piwik. Both make requests after the page is already loaded, so it won't impact load times. There might be a way to make a ajax request to the analytics server, but I don't know for sure. Piwik is opensource, so you can probably hack something together.
If you are using server side scripting, increment it in a variable. It's likely to get reset if you restart the services so not such a good idea if accuracy is needed.

Resources