I'm part of a team developing an e-commerce site with the ordercloud.io API. I'm trying to figure out how much information I can store in the xp key of a JSON object. The project I'm working on requires some fairly custom configuration and data storage, and I want to make sure the process I come up with will scale well.
Thanks!
The xp value in OrderCloud is limited to a maximum string size of 8000 characters. However, you really shouldn't try to reach that limit, because it can hurt the performance of your API requests (that's a lot of data to send over the wire, especially when you are getting a list of objects).
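A minimal sketch of checking an xp object against the limit before saving it. It assumes the 8000-character figure from the answer applies to the compact JSON serialization of the xp object. Exactly how OrderCloud measures it is an assumption here, so leave some headroom. The function names are hypothetical.

```python
import json

# Limit quoted in the answer; treat it as an assumption and confirm it
# against the current OrderCloud documentation.
XP_MAX_CHARS = 8000


def xp_size(xp: dict) -> int:
    """Length of the xp object serialized as compact JSON."""
    return len(json.dumps(xp, separators=(",", ":"), ensure_ascii=False))


def check_xp(xp: dict, limit: int = XP_MAX_CHARS) -> dict:
    """Return xp unchanged, or raise ValueError if it would exceed the limit."""
    size = xp_size(xp)
    if size > limit:
        raise ValueError(f"xp is {size} characters; the limit is {limit}")
    return xp
```

For example, `check_xp({"giftWrap": True})` passes, while an xp carrying a large embedded document raises before any request is sent.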
Make sure you review the existing features to see whether one of them already covers the data you want to store or the behavior you want to create. Otherwise, consider storing the data in a separate location (your own database, or an integration that helps fulfill the feature).
Traditionally, in a non-serverless environment, I would use the following system. Say I have a custom ID generation protocol for all my models, and 20 servers scattered around. I give each server a slice of IDs from the overall pool to work with. When a server is done, or goes down, it returns its unused IDs to the system so they don't get wasted. The reason for sending each server a batch of IDs is that you don't need to fetch the next ID from a central ID server every time a new record is created; instead, each server has a local set it can work with freely.
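The batching scheme described above can be sketched as a small per-server allocator. It assumes a hypothetical `reserve_block(n)` call to the central ID server that returns the first ID of a freshly reserved range of n IDs. This illustrates the pattern and is not any particular library's API.

```python
import threading
from typing import Callable


class BlockIdAllocator:
    """Hands out IDs from a locally held block and asks the central
    server for a new block only when the current one is used up."""

    def __init__(self, reserve_block: Callable[[int], int], block_size: int = 1000) -> None:
        # reserve_block(n) is hypothetical: it returns the first ID of a
        # newly reserved range of n consecutive IDs.
        self._reserve_block = reserve_block
        self._block_size = block_size
        self._next = 0
        self._end = 0  # exclusive upper bound of the current block
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            if self._next >= self._end:
                start = self._reserve_block(self._block_size)
                self._next, self._end = start, start + self._block_size
            value = self._next
            self._next += 1
            return value

    def unused_range(self) -> range:
        """IDs not yet handed out; a server could return these on shutdown."""
        with self._lock:
            return range(self._next, self._end)
```

Each server contacts the central server only once per `block_size` IDs. The trade-off is that IDs are no longer ordered by creation time across servers.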
How would you do this sort of thing in a serverless system? I am deploying to Vercel and wondering what the appropriate architecture for such an ID batching system might be. There are other use cases for needing a persistent copy of data on a local server, so if you don't like the ID example, just imagine another sort of system. How do you solve this optimization problem in a serverless environment?
Serverless is an approach. Like all such things (solutions), it should be matched to the problem - not the other way around. Is this simply a case where serverless is a good choice for 80% of your problem, and all you need to do is choose something appropriate for the other 20%?
Assuming you have the freedom to do this, can't you just have the serverless parts of the solution consume non-serverless services - e.g. an ID Service?
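As an illustration of such an ID service, here is a minimal sketch of the server side. SQLite stands in for whatever central store you would actually use, and the table and function names are hypothetical. A serverless function would call something like `reserve_block` over the network and keep the returned range in module-level state, which lasts only as long as that warm instance does.

```python
import sqlite3


def open_id_store(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are controlled explicitly below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS id_counter ("
        " name TEXT PRIMARY KEY, next_id INTEGER NOT NULL)"
    )
    return conn


def reserve_block(conn: sqlite3.Connection, name: str, size: int) -> range:
    """Atomically reserve `size` consecutive IDs for the counter `name`."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT next_id FROM id_counter WHERE name = ?", (name,)
        ).fetchone()
        start = row[0] if row else 1
        conn.execute(
            "INSERT OR REPLACE INTO id_counter (name, next_id) VALUES (?, ?)",
            (name, start + size),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return range(start, start + size)
```

A serverless instance can be frozen or recycled without running shutdown code, so it is probably safer to accept gaps in the ID sequence than to rely on returning unused IDs.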
Separately from this, caching comes to mind: the general idea of keeping some data close by that might be mastered somewhere else. Caching patterns like write-behind let you work with local copies (i.e. immediate consumption) while the communication with the cache master happens in the background.
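A minimal write-behind sketch. `load` and `store_batch` are hypothetical callbacks to the master store (a database, an API, etc.). In a serverless function you would call `flush()` at the end of each invocation or from a scheduled job, because the instance may be frozen afterwards. Anything written but not yet flushed can be lost, which is the inherent trade-off of write-behind.

```python
import threading
from typing import Any, Callable, Dict, Set


class WriteBehindCache:
    """Reads and writes hit a local dict immediately; changed keys are
    written to the master store later, in one batch, by flush()."""

    def __init__(self, load: Callable[[str], Any],
                 store_batch: Callable[[Dict[str, Any]], None]) -> None:
        self._load = load                # fetch one value from the master
        self._store_batch = store_batch  # persist many values at once
        self._data: Dict[str, Any] = {}
        self._dirty: Set[str] = set()
        self._lock = threading.Lock()

    def get(self, key: str) -> Any:
        with self._lock:
            if key not in self._data:
                self._data[key] = self._load(key)
            return self._data[key]

    def put(self, key: str, value: Any) -> None:
        with self._lock:
            self._data[key] = value
            self._dirty.add(key)

    def flush(self) -> int:
        """Send all dirty entries to the master; returns how many were sent."""
        with self._lock:
            batch = {k: self._data[k] for k in self._dirty}
            self._dirty.clear()
        if not batch:
            return 0
        try:
            self._store_batch(batch)
        except Exception:
            with self._lock:
                self._dirty.update(batch)  # keep them for the next flush
            raise
        return len(batch)
```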
I'm developing an App Engine app that lets users keep a diary.
I noticed that I can view all the data in the Datastore through the Developers Console.
For a diary app, that's bad for privacy.
So I want to know how to make the Datastore private, so that even I can't read users' data.
Please help me.
This is a little tricky. Your code can read the data in the Datastore, so by definition anyone who can update the running code can also read that data. There are, however, ways to at least make it harder to examine the data inadvertently (though accessing it will still be technically possible for you or any of the other owners). The simplest way is to encrypt the data before storing it in the Datastore model objects, and decrypt it when you read it back. Note that indexed fields will no longer work if you do this, so you will need to decide whether that content really needs to be indexable, or whether it is worthwhile to add manual indexing.
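If you go the encryption route, one common form of manual indexing is a "blind index": you store a keyed HMAC of each value you need to look up next to its ciphertext. This technique is swapped in here as an illustration and is not something the answer prescribes. The sketch shows only the indexing half. The encryption itself should come from a vetted library such as the `cryptography` package, and the `tag_idx` property and key handling are hypothetical. Keep the index key out of the Datastore. Also note that a blind index supports only exact matches and reveals which entities share a value.

```python
import hashlib
import hmac
from typing import Dict, List


def blind_index(index_key: bytes, value: str) -> str:
    """Keyed hash of a normalized field value. Store it in an indexed
    property next to the encrypted field; equal inputs give equal hashes."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(index_key, normalized, hashlib.sha256).hexdigest()


def find_by_tag(entities: List[Dict[str, str]], index_key: bytes,
                tag: str) -> List[Dict[str, str]]:
    """Exact-match lookup without decrypting anything. In the Datastore this
    would be an equality filter on the hypothetical 'tag_idx' property."""
    wanted = blind_index(index_key, tag)
    return [e for e in entities if e["tag_idx"] == wanted]
```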
If you want data to not be readable by you at all, then you will need to encrypt/decrypt the data with a key that is only available to your application while the user is interacting with it (e.g. encrypting the data in the client that communicates with your server); however, you need to be aware that this will make any sort of indexing or background processing of the data impossible.
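One way to get a key that exists only while the user is interacting with the app is to derive it on the client from a passphrase that only the user knows. The sketch uses PBKDF2 from Python's standard library purely to illustrate the idea. A browser client would do the equivalent with its own crypto APIs. The iteration count is an illustrative assumption, and the actual encryption would pass the derived key to a vetted authenticated cipher. If the user forgets the passphrase, the data cannot be recovered.

```python
import hashlib
import os


def new_salt() -> bytes:
    """Random per-user salt; it is not secret and can be stored on the server."""
    return os.urandom(16)


def derive_key(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 256-bit key from the user's passphrase. Run this on the
    client only, so neither the passphrase nor the key reaches the server."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt,
                               iterations, dklen=32)
```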
The only way to prevent you from viewing data in the Datastore is to remove you from the app's developers. A developer can always extract data if they want to, either by looking at it directly in the Datastore viewer or by writing code that reads or forwards the data.
I'm trying to build a very simple wiki-like system in Clojure, serving HTTP with Ring.
Instead of a regular database, I was thinking about using just an atom and serializing it to a file whenever it changes. Something like https://github.com/alandipert/enduro, just with a delayed write.
Having the data in-mem in vectors and maps will surely make the service faster and the code simpler/more intuitive to write?
Will that work with a multithreaded Jetty/Ring server?
The contents of the atom will surely fit in memory for now, but that might not hold true in the future. Any ideas on how I can structure the code to make it easier to switch to an alternative storage backend later?
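The design in the question can be sketched in Python as an analogy; in Clojure itself, the atom plays the role of the locked state here. Handlers depend only on a small `PageStore` interface, and the file-backed implementation keeps the state in memory and debounces writes to disk. All names are hypothetical. Switching to a real database later means writing another class with the same two methods. The copy in `update` is shallow, so `fn` should build new nested values rather than mutate existing ones. Clojure's persistent data structures give you that for free.

```python
import json
import os
import tempfile
import threading
from typing import Any, Callable, Dict, Optional, Protocol

State = Dict[str, Any]


class PageStore(Protocol):
    """The only interface the wiki handlers depend on."""
    def get(self, title: str) -> Optional[Any]: ...
    def update(self, fn: Callable[[State], State]) -> State: ...


class AtomFileStore:
    """In-memory state with a delayed (debounced) write to a JSON file."""

    def __init__(self, path: str, delay: float = 1.0) -> None:
        self._path = path
        self._delay = delay
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None
        self._state: State = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self._state = json.load(f)

    def get(self, title: str) -> Optional[Any]:
        # The state dict is replaced on update, never mutated in place,
        # so readers see a complete version without taking the lock.
        return self._state.get(title)

    def update(self, fn: Callable[[State], State]) -> State:
        # Like swap!: compute the new state from a copy of the old one.
        # A lock stands in for the atom's compare-and-set retry loop.
        with self._lock:
            self._state = fn(dict(self._state))
            self._schedule_write()
            return self._state

    def _schedule_write(self) -> None:
        # Restart the timer on every change, so a burst causes one write.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self._delay, self.flush)
        self._timer.daemon = True
        self._timer.start()

    def flush(self) -> None:
        with self._lock:
            directory = os.path.dirname(os.path.abspath(self._path))
            fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(self._state, f)
            # Atomic rename: the target file is never half-written.
            os.replace(tmp_path, self._path)
```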
This is the best guide for keeping data in memory and storing it to a single file: http://www.brandonbloom.name/blog/2013/06/26/slurp-and-spit/
Datomic would give you a few options.
You could use the in-memory db which would give you query power and thread safety. It would also be very easy to switch to a persistent datastore if/when the time comes. However, I'm not sure about serialization of the in-memory db.
Or you could use Datomic just for Datalog, which can be used for querying data structures. In that case, you could use an atom and then serialize as planned. Moving to a persistent datastore would be more work than the first case, but still not much. In either case, most of your code wouldn't need to change.
In my opinion, you'd be better off just starting with the free version of Datomic, which uses the file system as a datastore. I don't think using an atom simplifies your code very much.
I second the recommendation for Datomic.
I've been using it on a "real" project for a few weeks now, and the more I use it, the more I realize that it would be a solid foundation for handling your data in any non-trivial project. Even if you never plan to use a "real" database in the future, just having a fact-based data model, powerful querying, and even full-text search built in is a huge win over just using an atom to store some huge map.
I checked and the free version does give you local storage as well as the in-memory database, so that would solve your storage needs perfectly (it uses an H2 database behind the scenes). And if you ever find yourself needing to scale to something bigger, you're already set.
I am building a mobile app using AngularJS and PhoneGap. The app allows the user to access a large number of data-items. These data-items ship with the app as a number of .json files.
One use-case is that a user can favorite any of those data-items.
Currently, I store the (ids of) the items that have been favorited in localStorage. It works and it's great and very simple.
But now I would like to create an online backend for the app. By this I mean that the (ids of the) items that have been favorited should also be stored in some form of database on a server.
Now my question is:
How do I best do this?
How do I keep the localStorage data and online-backend data in synch?
In particular, the user might not have an internet connection at the time they favorite a data-item. Additionally, if the user favorites x data-items in a row, I would need to make x update calls to the server database, which clearly isn't great.
So, how do people do it?
Does Angular have anything built-in for this?
Is there any plugin?
Any other framework?
This very much seems like a common problem that must have a well-known solution?
I think you've almost got the entire solution. All you need to do is sync periodically. On app start, load the data from the service if it's available; otherwise use the current localStorage data. Then, while connected, send the JSON on a timer and on app close to a service (I generally prefer PHP, but Python, Java, Ruby, Perl, whatever floats your boat) that puts it in a database. If you're concerned with merging synchronization changes, you'll need timestamps on the data in both localStorage and the database to make the right call on what should be inserted versus updated.
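A minimal sketch of that approach, written in Python for brevity; the same logic would live in an Angular service backed by localStorage. Repeated changes collapse into one pending entry per item. All pending changes go out in a single batched request, and results merge with last-write-wins timestamps. `send` and the data shapes are hypothetical. Device clocks can drift, so a server-assigned version number is a more robust alternative to client timestamps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Favorite:
    item_id: str
    favorited: bool    # False is kept as a "tombstone" so un-favoriting syncs too
    updated_at: float  # time of the change (assumes reasonably accurate clocks)


def merge(local: Dict[str, Favorite], remote: Dict[str, Favorite]) -> Dict[str, Favorite]:
    """Last-write-wins merge per item, using the timestamps."""
    merged = dict(remote)
    for item_id, fav in local.items():
        other = merged.get(item_id)
        if other is None or fav.updated_at > other.updated_at:
            merged[item_id] = fav
    return merged


class FavoritesSync:
    def __init__(self) -> None:
        self.local: Dict[str, Favorite] = {}    # what localStorage would hold
        self.pending: Dict[str, Favorite] = {}  # changes not yet on the server

    def set_favorite(self, item_id: str, favorited: bool, now: float) -> None:
        fav = Favorite(item_id, favorited, now)
        self.local[item_id] = fav
        self.pending[item_id] = fav  # repeated changes to one item collapse

    def sync(self, send: Callable[[List[Favorite]], Dict[str, Favorite]]) -> int:
        """Send all pending changes in ONE request. `send` is hypothetical:
        it raises when offline and returns the server's full state."""
        if not self.pending:
            return 0
        batch = list(self.pending.values())
        remote = send(batch)  # if this raises, pending stays intact for retry
        for fav in batch:
            if self.pending.get(fav.item_id) is fav:
                del self.pending[fav.item_id]
        self.local = merge(self.local, remote)
        return len(batch)
```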
I don't think there's a one-size-fits-all solution to this problem. Someone may well have written a library that handles the various scenarios, but configuring it may be as complicated as just writing the logic yourself.
Question:
Should I write my application to access a database image repository directly, or write a middleware piece to handle document requests?
Background:
I have a custom Document Imaging and Workflow application that currently stores about 15 million documents/document images (90%+ single page, group 4 tiffs, the rest PDF, Word and Excel documents). The image repository is a commercial, 3rd party application that is very expensive and frankly has too much overhead. I just need a system to store and retrieve document images.
I'm considering moving the imaging directly into a SQL Server 2005 database. The indexing information is very limited - basically 2 index fields. It's a life insurance policy administration system so I index images with a policy number and a system wide unique id number. There are other index values, but they're stored and maintained separately from the image data. Those index values give me the ability to look-up the unique id value for individual image retrieval.
The database server is a dual quad-core Windows 2003 box with SAN drives hosting the DB files. The current image repository size is about 650GB. I haven't done any testing to see how large the converted database will be. I'm not really asking about the database design; I'm working with our DBAs on that aspect. If that changes, I'll be back :-)
The current system to be replaced is obviously a middleware application, but it's a very heavyweight system spread across 3 windows servers. If I go this route, it would be a single server system.
My primary concerns are scalability and performance, heavily weighted toward performance. I have about 100 users, and usage growth will probably be slow for the next few years.
Most users are primarily readers; they don't add images to the system very often. We have a department that handles scanning and otherwise adding images to the repository. We also have a few other applications that receive documents (via FTP) and insert them into the repository automatically as they arrive, either with full index information or as "batches" that a user reviews and indexes.
Most (90%+) of the documents/images are very small (< 100K, probably < 50K), so I believe storing the images in the database file will be more efficient than upgrading to SQL Server 2008 and using FILESTREAM.
Scalability and performance are often married to each other, in the sense that six months from now management comes back and says, "Function Y in Application X is running unacceptably slowly; how do we speed it up?" All too often the answer is to upgrade the back end. And when it comes to upgrading back ends, it's almost always going to be less expensive to scale out than to scale up in terms of hardware.
So, long story short, I would recommend building a middleware app that specifically handles incoming requests from the user app and then routes them to the appropriate destination. This will sufficiently abstract your front-end user app from the back end storage solution so that when scalability does become an issue only the middleware app will need to be updated.
This is straightforward. Write the application to an interface, use some kind of factory mechanism to supply that interface, and implement that interface however you want.
Once you're happy with your interface, then the application is (mostly) isolated from the implementation, whether it's talking straight to a DB or to some other component.
Thinking ahead a bit on your interface design, while doing bone-stupid "it's simple, it works here, it works now" implementations, offers a good balance between future-proofing the system and over-engineering it.
It's easy to argue you don't even need an interface at this juncture, rather just a simple class that you instantiate. But if your contract is well defined (i.e. the interface or class signature), that is what protects you from change (such as redoing the back end implementation). You can always replace the class with an interface later if you find it necessary.
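A minimal sketch of that advice. SQLite stands in for SQL Server 2005, and the class names and config keys are hypothetical; the two index fields (policy number and unique id) come from the question. The application only ever sees `ImageRepository` and `make_repository`, so a file-system or middleware-backed implementation can be added later as another subclass.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class ImageRepository(ABC):
    """The contract the application is written against."""

    @abstractmethod
    def put(self, unique_id: str, policy_number: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, unique_id: str) -> Optional[bytes]: ...


class SqlImageRepository(ImageRepository):
    """A 'simple, works now' implementation: images as BLOBs in one table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS images ("
            " unique_id TEXT PRIMARY KEY,"
            " policy_number TEXT NOT NULL,"
            " data BLOB NOT NULL)"
        )
        conn.execute(
            "CREATE INDEX IF NOT EXISTS ix_images_policy ON images (policy_number)"
        )

    def put(self, unique_id: str, policy_number: str, data: bytes) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT INTO images (unique_id, policy_number, data) VALUES (?, ?, ?)",
                (unique_id, policy_number, data),
            )

    def get(self, unique_id: str) -> Optional[bytes]:
        row = self._conn.execute(
            "SELECT data FROM images WHERE unique_id = ?", (unique_id,)
        ).fetchone()
        return row[0] if row else None


def make_repository(config: dict) -> ImageRepository:
    """Factory: the one place that decides which implementation is used."""
    kind = config.get("kind", "sql")
    if kind == "sql":
        return SqlImageRepository(sqlite3.connect(config.get("dsn", ":memory:")))
    raise ValueError(f"unknown repository kind: {kind!r}")
```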
As far as scalability goes, test it. Then you know not only whether you may need to scale, but perhaps when as well: "Works great for 100 users, problematic for 200; if we hit 150 we might want to take another look at the back end, but it's good for now."
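A rough way to do that testing is to run the same workload at increasing numbers of concurrent users and watch where latency starts to climb. In the sketch, `request` is any zero-argument call; `fetch_image` in the usage line below is a hypothetical stand-in for whatever call your application makes. Threads are adequate here because the work is I/O-bound. For a serious test, a dedicated load-testing tool is a better fit.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def load_test(request: Callable[[], None], users: int,
              requests_per_user: int = 20) -> Dict[str, float]:
    """Simulate `users` concurrent users, each making `requests_per_user`
    sequential calls to `request`, and report latencies in seconds."""

    def one_user(_user: int) -> List[float]:
        timings = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            request()
            timings.append(time.perf_counter() - start)
        return timings

    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))

    latencies = sorted(t for timings in per_user for t in timings)
    # Nearest-rank 95th percentile.
    p95 = latencies[math.ceil(0.95 * len(latencies)) - 1]
    return {"requests": len(latencies),
            "median": statistics.median(latencies),
            "p95": p95}
```

For example: `for users in (100, 150, 200): print(users, load_test(fetch_image, users))`.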
That's due diligence and a responsible design tactic, IMHO.
I agree with gabriel1836. An added benefit is that you could run a hybrid system for a while, since you aren't going to convert 14 million documents from your proprietary system to your home-grown system overnight.
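A hybrid setup like that could be sketched as a wrapper in the middleware. It assumes a hypothetical adapter around the legacy system that exposes the same `get`/`put` shape as the new store. Reads are served from the new store when possible, and each document is copied across lazily the first time it is requested. A background bulk migration would normally run alongside to move the rest.

```python
from typing import Dict, Optional, Protocol


class Repository(Protocol):
    def get(self, unique_id: str) -> Optional[bytes]: ...
    def put(self, unique_id: str, data: bytes) -> None: ...


class HybridRepository:
    """Serve documents from the new store, falling back to the legacy
    system and copying each document across on its first read."""

    def __init__(self, new: Repository, legacy: Repository,
                 migrate_on_read: bool = True) -> None:
        self._new = new
        self._legacy = legacy
        self._migrate_on_read = migrate_on_read

    def get(self, unique_id: str) -> Optional[bytes]:
        data = self._new.get(unique_id)
        if data is None:
            data = self._legacy.get(unique_id)
            if data is not None and self._migrate_on_read:
                self._new.put(unique_id, data)  # lazy migration
        return data

    def put(self, unique_id: str, data: bytes) -> None:
        self._new.put(unique_id, data)  # new documents go only to the new store


class DictRepository:
    """In-memory stand-in used to demonstrate the wrapper."""

    def __init__(self, items: Optional[Dict[str, bytes]] = None) -> None:
        self.items: Dict[str, bytes] = dict(items or {})

    def get(self, unique_id: str) -> Optional[bytes]:
        return self.items.get(unique_id)

    def put(self, unique_id: str, data: bytes) -> None:
        self.items[unique_id] = data
```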
Also, I would strongly encourage you to store the documents outside the database. Store them on a file system (local, SAN, or NAS; it doesn't matter) and store pointers to the documents in the database.
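A minimal sketch of the files-plus-pointers layout. It assumes `unique_id` values are safe to use in file names (the question describes them as numbers), and SQLite stands in for the real database. The root directory, the table and column names, and the hashed two-level folder fan-out are illustrative choices, not something the answer specifies.

```python
import hashlib
import os
import sqlite3
from typing import List, Optional


class FileSystemImageStore:
    """Image bytes live on disk (a local, SAN, or NAS path); the database
    holds only the index fields and a relative path to each file."""

    def __init__(self, root: str, conn: sqlite3.Connection) -> None:
        self._root = root
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS documents ("
            " unique_id TEXT PRIMARY KEY,"
            " policy_number TEXT NOT NULL,"
            " rel_path TEXT NOT NULL)"
        )

    def _rel_path(self, unique_id: str, ext: str) -> str:
        # Fan out into subfolders so no single directory holds millions of files.
        digest = hashlib.sha1(unique_id.encode("utf-8")).hexdigest()
        return os.path.join(digest[:2], digest[2:4], f"{unique_id}{ext}")

    def put(self, unique_id: str, policy_number: str, data: bytes,
            ext: str = ".tif") -> None:
        rel = self._rel_path(unique_id, ext)
        full = os.path.join(self._root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        # File first, then row: a failure leaves an orphan file (easy to
        # sweep up) rather than a pointer to a missing file.
        with open(full, "wb") as f:
            f.write(data)
        with self._conn:
            self._conn.execute(
                "INSERT INTO documents (unique_id, policy_number, rel_path)"
                " VALUES (?, ?, ?)",
                (unique_id, policy_number, rel),
            )

    def get(self, unique_id: str) -> Optional[bytes]:
        row = self._conn.execute(
            "SELECT rel_path FROM documents WHERE unique_id = ?", (unique_id,)
        ).fetchone()
        if row is None:
            return None
        with open(os.path.join(self._root, row[0]), "rb") as f:
            return f.read()

    def ids_for_policy(self, policy_number: str) -> List[str]:
        rows = self._conn.execute(
            "SELECT unique_id FROM documents WHERE policy_number = ?"
            " ORDER BY unique_id", (policy_number,)
        )
        return [r[0] for r in rows]
```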
I'd love to know what document management system you are using now.
Also, don't underestimate the effort of replacing the capture (scanning and importing) provided by the proprietary system.