I want to do the following:
when a user connects to their personal cabinet, they get all of their data (an array of ~5000 rows). This data is stored in the state 'allOrders' and is updated (every minute) only when new orders have been added (I use websockets). Is it normal practice to store 'big' data in state like this, or is there a better approach?
I've found that once you get into the thousands of items, working with data in the browser can be slow. Even if you optimize the rendering, you will likely want to filter and sort to better visualize the data, and simply iterating through 5k items with .filter and the like will noticeably affect the responsiveness of your UI and feel sluggish.
Your alternative is to work with the data server side, which of course introduces network latency and also tends to impact performance; basically, it's unlikely that you will be able to work with a dataset this large without making the user wait for certain operations. Working with the data in the browser, however, can make the browser appear to 'hang' (i.e. not respond to user actions) during large operations, which is worse than waiting on network latency, since latency does not lock up the browser... so there is that.
I've had success working with https://github.com/bvaughn/react-virtualized when rendering large lists. It's similar to the lib you mentioned, in that it only renders what is in view. You definitely do not want to try to render 5k things.
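For what it's worth, here is a minimal sketch of what a windowed list with react-virtualized can look like; the `Order` shape, the field names, and the dimensions are my own assumptions, not taken from your app:

```tsx
// A minimal sketch of windowed rendering with react-virtualized's List.
// Only the rows currently scrolled into view are mounted in the DOM,
// so holding ~5k items in state stays cheap to render.
import * as React from "react";
import { List } from "react-virtualized";

interface Order {
  id: string;
  customer: string;
  total: number;
}

export function OrderList({ orders }: { orders: Order[] }) {
  const rowRenderer = ({ key, index, style }: {
    key: string;
    index: number;
    style: React.CSSProperties;
  }) => (
    <div key={key} style={style}>
      {orders[index].customer}: {orders[index].total}
    </div>
  );

  return (
    <List
      width={600}
      height={400}
      rowCount={orders.length}
      rowHeight={32}
      rowRenderer={rowRenderer}
    />
  );
}
```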
Related
I have a list of 20k employees to display in a React table. When the admin user changes one, I want the change reflected in the table - even if she does a reload - but I don't want to re-fetch all 20k including the unchanged 19 999.
(The table is of course paged and shows max N at once but I still need all 20k to support search and filtering, which is impractical to do server side for various reasons)
The solution I can think of is to set caching headers for /api/employees so that it is cached for e.g. one hour, and to have another endpoint, /api/employees?changedSince=, and somehow ensure that the server knows which employees have changed. But I am sure somebody has already implemented a solution for this...
Thank you!
A timestamp solution would be the best, and simplest, way to implement it. It would only require a small amount of extra data to be stored and would provide the most maintainable and expandable solution.
All you would need to do is update the timestamp when an item in the list is updated. Then, when the page loads for the first time, request /api/employees; after that, periodically request /api/employees?changedSince= to return only the changed rows, which React can then merge into the table.
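A rough client-side sketch of that flow, assuming a hypothetical /api/employees?changedSince=<timestamp> endpoint and an updatedAt field maintained by the server:

```typescript
// A sketch of the timestamp approach described above. The endpoint shape and
// the Employee fields are assumptions for illustration.
interface Employee {
  id: number;
  name: string;
  updatedAt: string; // ISO timestamp maintained by the server
}

const employees = new Map<number, Employee>();
let lastSync = new Date(0).toISOString();

async function fullLoad(): Promise<void> {
  const rows: Employee[] = await fetch("/api/employees").then(r => r.json());
  for (const e of rows) employees.set(e.id, e);
  lastSync = new Date().toISOString();
}

async function syncChanges(): Promise<void> {
  // Only the rows updated since the last sync come over the wire.
  const changed: Employee[] = await fetch(
    `/api/employees?changedSince=${encodeURIComponent(lastSync)}`
  ).then(r => r.json());
  for (const e of changed) employees.set(e.id, e);
  lastSync = new Date().toISOString();
}

// Load everything once, then poll for deltas and let React re-render from the map.
fullLoad().then(() => setInterval(syncChanges, 60_000));
```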
In terms of caching the main /api/employees endpoint, I’m not sure how much benefit you would gain from doing that, but it depends on how often the data is updated.
Since you say you are in control of the frontend's backend, IMHO this backend should cache all of the upstream data in its own (SQL or whatever) database. The backend can then expose a proper API (with pagination and search).
The backend can also implement some logic to identify which rows have changed.
If the frontend needs live updates about changes, you can use a technology that allows bi-directional communication (SignalR if your backend is .NET based, something like socket.io if you have a Node backend, or even plain WebSockets).
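For the socket.io option, a minimal client sketch might look like this; the server URL, event name, and payload shape are illustrative assumptions, not an existing API:

```typescript
// A sketch of receiving live row updates over socket.io. The "employee:changed"
// event and the Employee payload are made up for this example.
import { io } from "socket.io-client";

interface Employee {
  id: number;
  name: string;
}

const store = new Map<number, Employee>();
const socket = io("https://backend.example.com");

socket.on("employee:changed", (employee: Employee) => {
  // Only the changed row crosses the wire; merge it into the local store
  // that the table component reads from.
  store.set(employee.id, employee);
});
```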
I am currently using the cache in my project, but I'm not sure if it is the right thing to do.
I need to retrieve a lot of data from a web API (nodes that can be a picture, node, folder, gallery...). Those nodes change very often, so I need fast access (loading up to 300-400 elements at once). Currently I store them in the cache (with the md5 of node_id as the key, so they are easy to retrieve and update).
It is working great so far, but if I clear the cache it takes up to a minute to rebuild it all.
Should I use a database to store those nodes? Will it be quicker / slower / the same?
Your question is very broad and thus hard to answer. Saving 300-400 elements under a cache key sounds problematic to me. You can run into issues where serializing on store and deserializing on retrieval becomes a cost in itself, and whenever your cache service is down your app will be practically unusable.
If you already run into problems when clearing/updating the cache, you might want to look for an alternative. This might be a database or Elasticsearch. Advanced cache features like tagged caching could keep you from having to clear the whole cache when part of the information changes. You might also want to use something like the chain provider to store things in multiple caches, to prevent the aforementioned problem of an unreachable cache "breaking" your app. You could also look into a pattern common with CQRS called a read model.
There are a lot of variables that come into play. If you want to know which option will yield the best results, i.e. which one is quicker, you should do frequent performance tests with realistic data using Symfony's debug toolbar & profilers or a 3rd-party service like blackfire.io or tideways. You might also want to do capacity tests with a tool like JMeter to ensure those results still hold true when there are multiple simultaneous users.
I use App Engine, but the following problem could very well occur in any server application:
My application uses memcache to cache both large (~50 KB) and small (~0.5 KB) JSON documents which aggregate information which is expensive to refresh from the datastore. These JSON documents can change often, but the changes are sparse in the document (i.e., one item out of hundreds may change at a time). Currently, the application invalidates an entire document if something changes, and then will lazily re-create it later when it needs it. However, I want to move to a more efficient design which updates whatever particular value changed in the JSON document directly from the cache.
One particular concern is contention from multiple tasks / request handlers updating the same document, but I have ways to detect this issue and mitigate it. However, my main concern is that it's possible that there could be rapid changes to a set of documents within a small period of time coming from different request handlers, and I don't want to have to edit the JSON document in the cache separately for each one. For example, it's possible that 10 small changes affecting the same set of 20 documents of 50 KB each could be triggered in less than a minute.
So this is my problem: What would be an effective solution to combine these changes together? In my old solution, although it is expensive to re-create an entire document when a small item changes, the benefit at least is that it does it lazily when it needs it (which could be a while later). However, to update the JSON document with a small change seems to require that it be done immediately (not lazily). That is, unless I come up with a complex solution that lazily applies a set of changes to the document later on. I'm hoping for something efficient but not too complicated.
Thanks.
Pull queue. Everyone using GAE should watch this video:
http://www.youtube.com/watch?v=AM0ZPO7-lcE
When a call comes in, update memcache and do an async_add to your task pull queue. You could likely run a process that handles thousands of updates each minute without a lot of overhead (i.e. instance issues). You still have an issue should memcache get purged prior to your updates, but that is not too hard to work around. HTH. -stevep
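A rough, platform-neutral sketch of coalescing the small changes before rewriting each cached document; the cache is stood in for by a Map, and the GAE memcache and task queue APIs are not shown:

```typescript
// A sketch of batching sparse changes so each 50 KB document is rewritten
// once per flush, no matter how many small updates arrived for it.
type Patch = { docKey: string; field: string; value: unknown };

const pending: Patch[] = [];

function enqueuePatch(patch: Patch): void {
  pending.push(patch); // cheap append on the request path
}

function flushPatches(cache: Map<string, Record<string, unknown>>): void {
  // Group the queued patches by document key.
  const byDoc = new Map<string, Patch[]>();
  for (const p of pending.splice(0, pending.length)) {
    const list = byDoc.get(p.docKey) ?? [];
    list.push(p);
    byDoc.set(p.docKey, list);
  }
  // Apply all patches for a document in one read-modify-write.
  for (const [docKey, patches] of byDoc) {
    const doc = cache.get(docKey) ?? {};
    for (const p of patches) doc[p.field] = p.value;
    cache.set(docKey, doc);
  }
}

// The worker that drains the pull queue would call flushPatches periodically.
```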
Since we suffer from creeping performance degradation in our web application, we decided to monitor our application's performance and measure individual actions.
For example, we will measure the duration of each request and the duration of individual actions like editing a customer, creating an appointment, or searching for a contract.
In most cases the database is the bottleneck for these actions.
I expect that the accumulated data will be quite large, since we will gather 1-5 individual actions per request.
Of course it would be nonsense to insert each and every measurement into the database, since this would slow down every request even more.
What is a good strategy for storing and evaluating this per-request data?
I thought about having a global queue object that is appended to, and a separate thread that empties the queue and handles the persistent storage (database or file). But where should such data be stored? Are there any prebuilt tools for such visualisation?
We use Java, Spring, mixed Hibernate + JDBC + PL/SQL, and Oracle.
The question should be language-agnostic, though.
Edit: the measurements will be taken in production over a long period of time.
It seems like your archive strategy will be at least partially dependent on the scope of your tests:
How long do you intend to collect performance data?
What are you trying to demonstrate? Performance improvements over time? Improvements associated with specific changes? (Like perf issues for a specific set of releases)
As for visualization tools, I've found Excel to be pretty useful for small to moderate amounts of data.
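On the collection side, the queue-and-flush idea from the question is sound. Here is a language-agnostic sketch (in TypeScript here, though the original stack is Java/Spring); the names and intervals are illustrative:

```typescript
// A sketch of the global-queue idea: measurements are appended with no I/O on
// the request path and written out in batches by a background task.
interface Measurement {
  action: string;     // e.g. "editCustomer", "searchContract"
  durationMs: number;
  at: number;         // epoch millis
}

const queue: Measurement[] = [];

function record(action: string, durationMs: number): void {
  queue.push({ action, durationMs, at: Date.now() }); // O(1), no blocking
}

function persistBatch(batch: Measurement[]): void {
  // Stand-in for a single bulk INSERT or an append to a log file.
  console.log(`persisting ${batch.length} measurements`);
}

// Drain the queue every 10 seconds and write one batch
// instead of one row per request.
setInterval(() => {
  const batch = queue.splice(0, queue.length);
  if (batch.length > 0) persistBatch(batch);
}, 10_000);
```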
I am developing an application which involves multiple user interactivity in real time. It basically involves lots of AJAX POST/GET requests from each user to the server - which in turn translates to database reads and writes. The real time result returned from the server is used to update the client side front end.
I know optimisation is quite a tricky, specialised area, but what advice would you give me to get maximum speed of operation here? Speed is of paramount importance, yet currently some of these POST requests take 20-30 seconds to return.
One way I have thought about optimising it is to batch POST requests and send them to the server in groups of 8-10, instead of firing individual requests. I am not currently using caching on the database side, and don't really know much about it or whether it would be beneficial in this case.
Also, do the AJAX POST and GET requests incur the same overhead in terms of speed?
Rather than continuously hitting the database, cache frequently used data items (with an expiry time based upon how infrequently the data changes).
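A minimal sketch of such an expiry-based cache; the TTL value and the data shape are assumptions:

```typescript
// A sketch of a simple TTL cache: check it first, hit the database only on a miss.
interface Entry<T> {
  value: T;
  expiresAt: number; // epoch millis
}

class TtlCache<T> {
  private entries = new Map<string, Entry<T>>();

  constructor(private ttlMs: number) {}

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key); // expired: fall through to the database
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// Usage: a one-minute TTL for customer rows that change infrequently.
const customers = new TtlCache<{ id: number; name: string }>(60_000);
```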
Can you reduce your communication with the server by caching some data client side?
The purpose of GET is as its name implies: to GET information. It is intended to be used when you are reading information to display on the page. Browsers will cache the result from a GET request, and if the same GET request is made again then they will display the cached result rather than rerunning the entire request. This is not a flaw in the browser processing but is deliberately designed to work that way, so as to make GET calls more efficient when the calls are used for their intended purpose. A GET call is retrieving data to display in the page; the data is not expected to be changed on the server by such a call, so re-requesting the same data should be expected to obtain the same result.
The POST method is intended to be used where you are updating information on the server. Such a call is expected to make changes to the data stored on the server, and the results returned from two identical POST calls may very well be completely different from one another, since the initial values before the second POST call will be different from the initial values before the first call, because the first call will have updated at least some of those values. A POST call will therefore always obtain the response from the server rather than keeping a cached copy of the prior response.
Ref.
The optimization tricks you'd use are generally the same tricks you'd use for a normal website, just with a faster turnaround time. Some things you can look into doing are:
Prefetch GET requests that have high odds of being loaded by the user (a small prefetch sketch follows this list)
Use a caching layer in between as Mitch Wheat suggests. Depending on your technology platform, you can look into memcache, it's quite common and there are libraries for just about everything
Look at denormalizing data that is going to be queried at a very high frequency. Assuming that reads are more common than writes, you should get a decent performance boost if you move the workload to the write portion of the data access (as opposed to adding database load via joins)
Use delayed inserts to give priority to writes and let the database server optimize the batching
Make sure you have intelligent indexes on the table and figure out what benefit they're providing. If you're rebuilding the indexes very frequently due to a high write:read ratio, you may want to scale back the queries
Look at retrieving data in more general queries and filtering the data when it makes it to the business layer of the application. MySQL (for instance) uses a very specific query cache that matches against a specific query. It might make sense to pull all results for a given set, even if you're only going to be displaying x%.
For writes, look at running asynchronous queries to the database if it's possible within your system. Data synchronization doesn't have to be instantaneous; it just needs to appear that way (most of the time)
Cache common pages on disk/memory in a fully formatted state so that the server doesn't have to do much processing of them
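To make the prefetching bullet concrete, here is a small sketch; the endpoint and the hover trigger are assumptions:

```typescript
// A sketch of prefetching a GET the user is likely to need next and reusing
// the in-flight promise when the real request arrives.
const prefetched = new Map<string, Promise<unknown>>();

function prefetch(url: string): void {
  if (!prefetched.has(url)) {
    prefetched.set(url, fetch(url).then(r => r.json()));
  }
}

async function getJson(url: string): Promise<unknown> {
  // Reuse the prefetched (possibly still pending) result if one exists.
  return prefetched.get(url) ?? fetch(url).then(r => r.json());
}

// e.g. warm the cache when the user hovers a link to the detail view.
document.querySelector("#order-link")?.addEventListener("mouseenter", () =>
  prefetch("/api/orders/42")
);
```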
All in all, there are lots of things you can do (and they generally come down to general development practices on a more bite sized scale).
The common tuning tricks would be:
- use more indexing
- use less indexing
- use more or less caching on filesystem, database, application, or content
- provide more bandwidth or more cpu power or more memory on any of your components
- minimize the overhead in any kind of communication
Of course an alternative would be to:
0. Develop a set of tests, preferably automated, that can determine whether your application works correctly.
1. Measure the 'speed' of your application.
2. Determine how fast it has to become.
3. Identify the source of the performance problems
   (typical culprits: network throughput, file I/O, latency, locking issues, insufficient memory, CPU).
4. Fix the problem.
5. Make sure it is actually faster.
6. Make sure it still works correctly (hence the tests above).
7. Return to 1.
Have you tried profiling your app?
Not sure what framework you're using (if any), but frankly from your questions I doubt you have the technical skill yet to just eyeball this and figure out where things are slowing down.
Bluntly put, you should not be messing around with complicated ways to try to solve your problem, because you don't really understand what the problem is. You're more likely to make it worse than better by doing so.
What I would recommend you do is time every step. Most likely you'll find that either
you've got one or two really long running bits or
you're running a shitton of queries because of an N+1 problem or the like
When you find what's going wrong, fix it. If you don't know how, post again. ;-)
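To make "time every step" concrete, a small sketch of a timing wrapper; the labels and the example call are illustrative:

```typescript
// A sketch of wrapping each suspect step so the slow ones stand out in the logs.
async function timed<T>(label: string, fn: () => Promise<T>): Promise<T> {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    console.log(`${label} took ${(performance.now() - start).toFixed(1)} ms`);
  }
}

// Usage: wrap the AJAX call and each server-facing step separately,
// then compare where the 20-30 seconds actually go.
async function loadOrders() {
  return timed("load-orders", () => fetch("/api/orders").then(r => r.json()));
}
```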