Making Large IndexedDB Persistent in Browser

Making Large IndexedDB Persistent in Browser - sql-server

I am looking at making a LOB html5 web application. The main part we are trying to accomplish is to make the application offline capable. This will mean taking a large chunk of SQL data from the server and storing it in the browser. This will need to be living in the browser for quite a while, dont want to have to continuously refresh it everytime the browser is closed and reopened.
We are looking at storing the data inside the client in indexedDB but I read that indexedDB is stored in temporary storage so the lifetime of it cannot be relied on. Does anyone know of any strategies on prolonging its lifetime? Also we will be pulling down massive chunks of data so 1-5mb storage might not suffice what we require.
My current thought is to somewhat store it down to the browser storage using html5 storage API's and hydrate it into the indexedDb as it's required. Just need to make sure we can grow the storage limit to whatever we need.
Any advice on how we approach this?

We are looking at storing the data inside the client in indexedDB but I read that indexedDB is stored in temporary storage so the lifetime of it cannot be relied on.
That is technically true but in practice I've never seen the browser actually delete data. More common if you're storing a lot of data, you will hit quota limits which are annoying and sometimes inconsistent/buggy.
Regardless, you shouldn't rely on data in IndexedDB always being there forever, because users can always delete data, have their computers break without backups, etc.

If you create a Chrome extension, you can enable unlimited storage. I've successfully stored several thousand large text documents persistently in indexedDB using this approach.

This might be a silly question to add, but can you access the storage area outside of the browser? For instance, if I did not want to have a huge lag when my app starts up and loads a bunch of data, could I make an external app to "refresh" the local data so that when the browser starts, it is ready to rock and roll?
I assume the answer here will be no, but I had to ask.
Has anyone worked around this for large data sets? For instance loading in one tab and working in another? Chrome extension to load, but access via the app?

Related

Global storage in React/Nextjs application

I have an application that fetches data, but due to limits on the server, I cannot make infinite requests (429 error).
Therefore I want to fetch data every night and store it in a global storage for all users to use so that the application seems faster and we do not get "too many requests".
I was thinking about Redis, but was thinking NextJs might have something similar. Any suggestions?

it's depand on number of data, the best practice is to use redis the fastest way

Where to save this kind of data?

I am developing a website in React that extracts data from packages that users request around the world, I have a searcher that fetch the data from my backend, I want to save the lastest Packages and save it, for the next time that users search for another packages, something like "result of last search" and keep it saved, even if customer navigate to other route, the questions is I dont know where to save, if I have to use, Redux or Local Storage or something else?

If you want the results to be saved over multiple loads of the page, Redux won't be enough, since Redux only saves data in the current running script's memory, which is not persistent.
Local Storage is an option, but it has a relatively low size limit. If you have a lot of data, it may reach the limit quickly and become inoperable.
IndexedDB is like Local Storage, but its size limit is much higher, and it has a more complicated interface. If you have a lot of data, that's what I'd recommend. You may wish to use a library like localForage to make things easier.
Both IndexedDB and Local Storage save the data to the user's local machine. So, for example, if you want to save data for users regardless of what machine they log on from, or to cache data from different users' requests, the local machine options won't be enough - you'll need to save the data in a database on your server instead.

What is considered "Too much data" in react state

I'm currently building an app with a dashboard that lists a ton of statistics. The data is loaded via an API and stored in my component's state (currently not using redux — just plain react). I'm loading more than 100.000 (small) rows of data, and storing them as an array in state. My question is: at what point does the size of my state become a problem? Surely there will be memory limitations at some point? is 100.000 entries in an array a problem, is 1.000.000? And if yes, what are alternative solutions to handling this amount of data? Is this where redux can help?

For the most part, it does not matter Where you store this data as much as how much of data you are storing. All of the data that you store, regardless of whether in a store or in a static variable, is stored in RAM. Because of this, your application might crash the browser because of taking too much resources.
A much better solution for storage (If you absolutely have to store data client-side) is to use something called IndexedDB. IndexedDB stores data in your hard disk instead of RAM
In most use-cases however, it is recommended to store the data in the backend, paginate it, and then send only the individual pages to the client as needed. This ensures that
The client doesn't have to load a massive chunk of data before the application works.
The client does not have to store large amounts of data in RAM.

How to persist large amounts of data through app closure?

I have noticed that apps like instagram keep some data persistent through app closures. Even if all internet connection is removed (perhaps via airplane mode) and the app is closed, reopening it still shows the last loaded data despite the fact that the app cannot call any loading functions from the database. I am curious as to how this is achieved? I would like to implement a similar process into my app (Xcode and swift 4), but I do not know which method is best. I know that NSUserDefaults can persist app data, but I have seen that this is for small and uncomplicated data, of which mine would not be. I know that I can store some of the data in an internal SQL db, via FMDB, but some of the data I would like to persist is image data, which I am not sure exactly how to save into SQL. I also know of Core Data but after reading through some of the documentation I have become a bit confused as to whether or not it fits my purpose. Which of these (or others?) would be best?
As an additional question, regardless of which persistence method I choose, I feel as though every time the data is actually loaded from the DB (when internet connection is available), which is in the viewDidLoad, I would need to be updating the data in the persistent storage in case the internet connection drops. I am concerned that this doubling of my writing procedures will slow the app down? Is there any validity to this concern? Or is it unavoidable anyway?

When to use a certain type of persistence in Google App Engine?

First of all I'll explain the question. By persistence, I mean storing data beyond the execution of a single request. It might not be the best question title, so feel free to edit it.
The way I see it, there are three types of persistence in GAE, each one "closer" to the request itself:
The datastore
This is where all data is most likely to be based. It may go into the higher layers of persistence temporarily, but in the end, this is where the data really is. Unfortunately, querying the datastore repeatedly is slow and uses a lot of resources.
Use when...
storing data that should be stored for an indefinite amount of time.
Avoid using when...
getting data that is queried often but rarely updated.
memcache
This is a highly complex caching engine that stores the data in memory and makes sure all users read from/write to the same cache. It's a much faster way to get/set data on a key→value basis than using the datastore. Unfortunately, data can only stay in the memory for so long, and there is no guarantee that it will stay for as long as you tell it to; the data may disappear at any time if memory is needed elsewhere.
Use when...
you need to get data more often than you need to update it. Even when data needs to be updated often, it can have its uses (if a few missed updates are considered okay), by setting up a task queue to persist data from the memcache to the datastore.
Avoid using when...
data needs to be updated often and has to be up-to-date when fetched.
Global variables
This isn't an official method of persisting data, but it works. However, it's the least reliable method, and since it has no data synchronization across servers, persisted data may show up differently for different users (but from what I've found, the server rarely changes for the same user.) Theoretically, this should be the method that has the least overhead in getting/setting values, however, and could have its uses.
Use when...
hell freezes over? I don't know... I haven't enough knowledge about what goes on behind the scenes to actually rely on this method. Discuss!
Avoid using when...
you rely on the data being the same across servers.
Cookies
If the data is user-specific, it can be efficient to store it as a cookie in the user's browser. There are some pitfalls to watch out for though:
Security – the user can meddle with cookies, and malicious people could potentially do the same. To make sure that the contents are unreadable and unchangeable to all, the cookie can be encrypted using the PyCrypto library which is available on GAE.
Performance – since cookies are sent with every request (even images), it can add to the bandwidth being used, and slow down requests. One solution is to use another domain for static content, so the browser won't send the cookie for that content.
When should the different types of persistence be used? How can they be combined to reduce/even out the amount of resources being spent?

Datastore
Use the datastore to hold any long living information. The datastore should be used like you would use a normal database to hold data that will be used in your site/application.
MemCache
Use this to access data a lot quicker than trying to access the datastore. MemCache can return data really quickly and can be used for any data that needs to span multiple calls from users. It is normally data that was originally in the datastore and then moved to the memcache.
def get_data():
data = memcache.get("key")
if data is not None:
return data
else:
data = self.query_for_data() #get data from the datastore
memcache.add("key", data, 60)
return data
The memcache will flush itself when the item is out of date. You set this in the last param of the add shown above.
Global Variables
I wouldn't use these at all since they can't span instances. In GAE a request creates a new instance, well in python it does. If you want to use Global variables I would store the data needed in the memcache.

Your post is a good summary of the 3 major options. You mostly have answered the question already. However, if you are currently building an app and stressing over whether or not you should memcache something, try this:
Write your app using the datastore for everything that needs to outlive more than one request.
Once your app (or some usable subset) is working, run some functional tests or simulations to see where the slow spots (or high quota usage) are.
Find the most slow or inefficient request path, and figure out how to make that faster (either by using memcache, or altering your datastructures so you can do gets instead of queries, or possibly storing something in a global instance variable*)
goto 2 until you're satisfied.
*Things that might be good for a "global" variable would be something that is relatively expensive to create/fetch, that a substantial portion of your requests will use, and that does not need to be consistent across requests/users.

I use global variable to speed up json conversion. Before I convert my data structure to json, I hash it and check if the json if already available. For my app this gives quite a speedup as the pure python implementation is quite slow.

Global variables
To complement AutomatedTester's answer, and also reply his further question about how to share information between GETs without memcache or datastore, below a quick illustration of how to use global variables:
if 'i' not in globals():
i = 0
def main():
global i
i += 1
print 'Status: 200'
print 'Content-type: text/plain\n'
print i
if __name__ == '__main__':
main()
Calling this script multiple times will give you 1, 2, 3... Of course as mentioned earlier by Blixt you should not count on this trick too much ('i' can sometimes switch back to zero) but it can be useful to store user-specific information in a dictionary, session data for instance.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight