Is it allowed to "copy" an API database?

I'm in need of a certain web API, but the ones I have found have rate limits that won't be enough for me.
I can set up my own database and store the results I get from the API, to reduce the number of calls. This will eventually lead to a point where I have, more or less, copied the API database.
Is this okay? It might be morally questionable, but is there anything that actually prevents me from doing it?
Thank you.

It depends on the terms of service of the API vendor.
For example, from the Google Maps terms:
10.1 Restrictions on How You May Use the Maps API(s). Except as explicitly permitted in Section 8 (Licenses from Google to You) or the Maps APIs Documentation, you must not (nor may you permit anyone else to) do any of the following:

10.1.3 Restrictions against Data Export or Copying. (b) No Pre-Fetching, Caching, or Storage of Content. You must not pre-fetch, cache, or store any Content, except that you may store: (i) limited amounts of Content for the purpose of improving the performance of your Maps API Implementation if you do so temporarily, securely, and in a manner that does not permit use of the Content outside of the Service; and (ii) any content identifier or key that the Maps APIs Documentation specifically permits you to store. For example, you must not use the Content to create an independent database of “places.”

Ultimately it will depend on the terms and conditions set out by the API provider. You may find, though, that you run into issues where your copy is not as up to date as the live version, and you will need to find a way to refresh your local copy without going over the call limits. It also won't necessarily solve any throttling issues you have with write operations, which must still update the master database(s).
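If the terms do allow caching, the usual pattern is cache-aside with a freshness window, so each key costs at most one upstream call per TTL period. Below is a minimal TypeScript sketch; the fetchFromApi function, the example URL, and the 24-hour TTL are illustrative assumptions, and a real setup would persist the cache in a database rather than an in-memory Map.

```typescript
// Minimal cache-aside sketch. fetchFromApi, the URL, and CACHE_TTL_MS
// are illustrative assumptions, not part of any real API.

type CacheEntry<T> = { value: T; fetchedAt: number };

const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // refresh entries older than a day
const cache = new Map<string, CacheEntry<unknown>>();

// Hypothetical upstream call -- replace with the real API client.
async function fetchFromApi<T>(key: string): Promise<T> {
  const res = await fetch(`https://api.example.com/items/${key}`);
  if (!res.ok) throw new Error(`API error: ${res.status}`);
  return res.json() as Promise<T>;
}

async function getCached<T>(key: string): Promise<T> {
  const hit = cache.get(key) as CacheEntry<T> | undefined;
  if (hit && Date.now() - hit.fetchedAt < CACHE_TTL_MS) {
    return hit.value; // fresh enough: no API call spent
  }
  const value = await fetchFromApi<T>(key);
  cache.set(key, { value, fetchedAt: Date.now() });
  return value;
}
```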

Related

How to cache batches of IDs "locally" in a serverless environment?

Traditionally, in a non-serverless environment, I would have the following system. Say I have a custom ID generation protocol for all my models. Say I also have 20 servers scattered around. I give each server a slice of IDs to work with off the whole stack of IDs. When they are done or the server goes down, it returns the IDs back to the system so they don't get wasted. The reason for sending each server a batch of IDs is so that every time a new record is created you don't need to fetch from a central ID server to get the next ID. Instead they have a local set they can work with freely.
How would you do this sort of thing in a serverless system? I am deploying to Vercel and wondering what the appropriate architecture might be for such an ID-batching system. There are other use cases for needing a persistent copy of data on a local server, so if you don't like the ID example, just imagine another sort of system. How do you solve this optimization problem in a serverless environment?
Serverless is an approach. Like all such things (solutions), it should be matched to the problem, not the other way around. Is this simply a case where serverless is a good solution choice for dealing with 80% of your problem, and all you need to do is choose something appropriate to deal with the other 20%?
Assuming you have the freedom to do this, can't you just have the serverless parts of the solution consume non-serverless services - e.g. an ID Service?
Separately to this, caching comes to mind - just the general idea of having some data close by which might be mastered somewhere else. Caching patterns like Write Behind would allow you to work with local copies (i.e. immediate consumption) whilst farming out the cache-master communication.
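To make the ID Service idea concrete, here is a hedged TypeScript sketch of how a serverless function might lease batches of IDs from a non-serverless service. The /lease endpoint, the batch shape, and the batch size of 100 are all hypothetical; a real service would back the counter with durable storage such as a database row.

```typescript
// Sketch of the ID-batch idea for serverless functions. The /lease
// endpoint and the IdLease shape are assumptions for illustration.

type IdLease = { start: number; count: number };

let lease: IdLease | null = null;
let nextOffset = 0;

// Ask the (non-serverless) ID service for a fresh block of IDs.
async function leaseBatch(size: number): Promise<IdLease> {
  const res = await fetch("https://id-service.internal/lease", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ size }),
  });
  if (!res.ok) throw new Error(`lease failed: ${res.status}`);
  return res.json() as Promise<IdLease>;
}

// Hand out IDs from the local block; lease a new one when exhausted.
// Serverless instances are disposable, so unused IDs in a dropped
// instance are simply wasted -- keep batches small to bound that loss.
export async function nextId(): Promise<number> {
  if (!lease || nextOffset >= lease.count) {
    lease = await leaseBatch(100);
    nextOffset = 0;
  }
  return lease.start + nextOffset++;
}
```

Note the trade-off against the original design: instead of returning unused IDs when an instance goes away, a serverless version typically accepts losing the remainder of a batch, which is why the batch size matters.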

Seeing multiple properties by type on local datastore console

I am writing entities using golang in my local datastore. When I generate the statistics, I see some of the properties repeated per data type. For example, I have a "status int8" property with noindex, but I see both "Integer_status_entityname" and "INT64_status_entityname", and the INT64 property has builtin_index_count > 0 while the Integer property has 0. The same is happening for strings (STRING vs Text). Is this a problem with the statistics generation, or is the internal storage itself somehow duplicated by internal data type?
Don't worry too much, the internals of the local development server emulation and of the real datastore differ quite significantly.
The local emulation will also automatically create some internal entities needed for the emulation itself - those won't be seen in the real datastore.
The official documentation, which describes the real datastore, should be followed when inconsistencies are seen with the local emulation.
When in doubt, just double-check your assumptions and/or the needed functionality when deployed on GAE (there may be some small surprises now and then), adjust if/as needed and move on with your work.
I don't know if it's worth filing issues for small SDK inconsistencies - they would be unnecessary noise while a long list of more important open issues with the SDK and/or GAE itself is still pending.

Cross-platform, queryable, local data storage using HTML5?

I've done quite a bit of research on HTML5 now, but I am still left wondering what my best bet would be to implement local data storage that is truly cross-platform (i.e., runs on all important mobile platforms, and possibly on desktop) and can easily be queried.
I want an HTML5 web application (to reach all mobile/(desktop) platforms, and for its independence from third-party frameworks/libraries), but one that uses local/offline storage to mimic the performance of native applications (and does not necessarily require connectivity). It creates/alters/manages certain records for a user (up to a couple of hundred records per year). Apart from data storage, as the app doesn't need any other access to the device, I think HTML5 would be a good option.
Some requirements on the data I want to store:
the best format would be some lightweight database like SQLite (due to performance reasons, and the ability to update single records without having to write a whole file (as in the case of XML))
disadvantage: I don't see any technology available across all platforms; WebSQL is deprecated, and IndexedDB is not available in too many browsers yet
the data records shall be easily exportable/downloadable in XML format (so that the user can read/modify it on his own)
therefore, XML would be a good way to go; I assume the data size to be reasonably low for this option; 2 concerns though:
disadvantage 1: I need a query language that allows me to easily select/sort/alter specific records (something like XQuery, but available in all browsers and running locally on the client)
disadvantage 2: as far as I have seen, HTML5 FileWriter API support is nowhere near mature - therefore, how would I be able to alter/save the XML data locally on the client? (OK, I have seen examples where the whole XML file is saved as a single key/value pair in local storage; but disadvantage 1 would still apply...)
What options do I have? Is HTML5 mature enough to do what I am longing for?
If not, what alternatives would meet my requirements? A couple of loose thoughts: third-party libraries (jQuery(?), JSON(?)) or cross-platform frameworks (a la PhoneGap - which I wanted to avoid in the first place, due to their limitations), or some server-side storage (that is synced with local storage)?
I don't know which limitations of PhoneGap you are talking about,
but I would suggest your application needs to be a hybrid one.
According to your requirements, you need to use native SQLite on the different operating systems.
For that you need to use PhoneGap, where you can write your own plugin, in which JavaScript acts as the interface and the implementation is in native code.
Otherwise you can always check out lawnchair
http://westcoastlogic.com/lawnchair/
Thanks
Gaurav Gupta
Paxcel
For others who happen upon this post: I am currently searching for a solution as well and ran into localForage, which seems like a pretty good choice.
https://github.com/mozilla/localForage
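As a rough illustration of how localForage could cover the store-and-query requirement from the question, here is a short TypeScript sketch. The UserRecord shape and the "records" instance name are made up for the example; localForage itself falls back from IndexedDB to WebSQL to localStorage behind one async API, which is what makes it attractive for the cross-platform case.

```typescript
import localforage from "localforage";

// Illustrative record shape -- an assumption, not from the question.
interface UserRecord { id: string; createdAt: string; note: string }

const store = localforage.createInstance({ name: "records" });

async function saveRecord(rec: UserRecord): Promise<void> {
  await store.setItem(rec.id, rec); // update a single record in place
}

// "Querying" is a client-side scan; fine for a few hundred records a year.
async function recordsSince(iso: string): Promise<UserRecord[]> {
  const out: UserRecord[] = [];
  await store.iterate<UserRecord, void>((rec) => {
    if (rec.createdAt >= iso) out.push(rec);
  });
  return out.sort((a, b) => a.createdAt.localeCompare(b.createdAt));
}
```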

Does anyone else think instance variables are problematic in database-backed applications?

It occurs to me that state control in languages like C# is not well supported.
By this I mean that it is left up to the programmer to manage the state of in-memory objects. A common use case is that instance variables in the domain model are copies of information residing in persistent storage (i.e. the database). Clearly this violates the single-point-of-authority principle, and "synchronisation" has to be managed by the developer.
I envisage a system where instead of instance variables, we have simple public access/mutator methods marked with attributes that link them to the database, and where reads and writes are mediated by a framework that decides whether to hit the database. Does such a system exist?
Am I completely missing the point, or is there some truth to this idea?
If I understand correctly what you want: any OR-mapper with lazy loading works this way. For example, I use Genome, where every entity is a pure proxy and you have quite a lot of influence over how the OR-mapper caches the fields.
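For readers who want to see the shape of the idea, here is a minimal TypeScript sketch of the mediated-accessor pattern the question envisages. The fetchField helper is a hypothetical stand-in for the OR-mapper's database read; real frameworks generate this plumbing from attributes or mapping metadata rather than making you write it by hand.

```typescript
// Sketch of accessors mediated by a framework instead of raw instance
// variables. fetchField is a hypothetical stand-in for a database read.

function fetchField(table: string, id: number, field: string): string {
  // ... SELECT {field} FROM {table} WHERE id = {id} ...
  return `db value of ${table}.${field} for #${id}`;
}

class LazyEntity {
  private cache = new Map<string, string>();
  constructor(private table: string, private id: number) {}

  // Reads go through the framework: hit the database once, then cache.
  get(field: string): string {
    let v = this.cache.get(field);
    if (v === undefined) {
      v = fetchField(this.table, this.id, field);
      this.cache.set(field, v);
    }
    return v;
  }

  // Writes go straight through, so the database stays the single
  // point of authority (write-through; write-behind is the alternative).
  set(field: string, value: string): void {
    // ... UPDATE {table} SET {field} = ? WHERE id = {id} ...
    this.cache.set(field, value);
  }
}
```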
Actually there's the concept of data prevalence (as implemented by prevayler in Java) where the in-memory objects are the single point of authority (SPA) for the data.
Also, some object databases (as db4o) blur lines a bit between the object representation and the "store" representation.
On the other hand, by bringing the SPA for the data inside the application, you need to handle transactions and/or data persistence by yourself. There is some work done on transactional memory systems such as JVSTM (currently in use by the information system of my old college) but it's not in widespread use.
On the other hand, if the data lives in a database, you can just commit the data when everything is good (or use the transaction support built into the database) and be sure that data isn't corrupted or lost. You trade in the SPA principle for better data reliability and transactions (and the other advantages of using a separate data store).

Distributed Key-Value Data Store with Offline Access (Static Partitioning)

I need to be able to set up server(s) that replicate all information, as a master data store that has all the data.
I also need servers that specifically store/replicate certain data, available on local LANs, so that when the internet connection goes down, clients can still access their local data. Under normal circumstances, the clients will access most of their data from the local LAN, and may use other servers when the local LAN server goes down.
This is wanted alongside the benefits of a distributed data store, such as failure resistance and speed.
Which Distributed Key-Value Data Store or other data storage method would be most suited for this?
Try out CouchDB. Your use case reads like it was built for it. Granted, CouchDB is much more than a key/value store, but on the other hand it is no less suitable for one.
Add replication and, as a bonus, you get fault tolerance, conflict detection (and resolution), and an easy HTTP API.
Let me know if you have any other questions.
Of course you must remember that replication is something completely different from backup, because one system's programmatic failure in handling the data can quickly replicate to other nodes, resulting in total mayhem.
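For a concrete starting point, CouchDB exposes replication through its HTTP API; the TypeScript sketch below asks a local node to continuously pull a database from the master, which matches the "local LAN copy that survives a WAN outage" requirement. The continuous and create_target options are real /_replicate parameters, but the host names, database name, and lack of authentication are placeholders.

```typescript
// Sketch: ask a local CouchDB node to continuously pull a database from
// the master. Host names and the database name are placeholders.

async function startPullReplication(): Promise<void> {
  const res = await fetch("http://localhost:5984/_replicate", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      source: "http://master.example.com:5984/appdata",
      target: "appdata",   // local copy used when the WAN is down
      continuous: true,    // keep syncing as changes arrive
      create_target: true, // create the local database if missing
    }),
  });
  if (!res.ok) throw new Error(`replication request failed: ${res.status}`);
}
```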
Maybe using the Hadoop File System (HDFS) or OpenAFS would be a good solution here?
I haven't used either of those systems in real-life scenarios; I only took an interest in them during my research on peer-to-peer and distributed storage solutions, but I think they're worth a try.
Have you checked out Microsoft's new Velocity? http://msdn.microsoft.com/en-us/data/cc655792.aspx. Unlike many other cloud services, you can run the setup (for Velocity) on your own premises.
