Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
I'm starting my first PhoneGap project, using AngularJS. It's a database driven app, using a REST API as the backend. To start with, I'm not going to store data locally at all, so it won't do much without Internet.
However, I would eventually like to have it store data locally, and sync when Internet is available, since I know I personally disable the Internet connections on my phone at times (airplanes, low battery), or have no bars. I was wondering if you could point me toward some good resources for this type of syncing. Some recommended libraries? Or perhaps some discussions of the pitfalls and how to avoid them. I've Googled a bit, but I think right now, I don't know the questions to ask.
Also, my intent to build it Internet-dependent first, and then add syncing.... Is that a good idea, or am I shooting myself in the foot? Do I need to build it syncing from the start?
I had someone suggest building the app as local-only first, rather than the Internet-only part first, which has a certain logic to it. The remote storage is kind of important to me. I know the decision there has a lot to do with my goals for the app, but from the standpoint of building this, with the eventual goal being local storage + Internet storage, and two-way syncing, what's going to be easier? Or does it even make a difference?
To start with, I'm thinking of using UUIDs, rather than sequential integer primary keys. I've also thought about assigning each device an ID that is prefixed on any keys it generates, but that seems delicate. Anyone used either technique? Thoughts?
I guess I need a good system to tell what data's been synced. On the client side, I guess any records that get created/edited can be flagged for syncing. But on the server side, you have multiple clients, so that wouldn't work. I guess you could have a last_updated timestamp, and sync everything updated since the last successful sync.
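The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a complete protocol: `ServerStore`, `Client`, `changesSince`, `dirty` and `lastSyncedAt` are names invented for the example, the server is an in-memory stand-in for the REST API, and the "timestamp" is a counter assigned by the server so that client clocks never matter. Records edited on both sides are simply pushed from the client here; real conflict handling is a separate problem.

```javascript
// In-memory stand-in for the REST backend (hypothetical). The server assigns
// every last_updated value, so client clocks never take part in comparisons.
class ServerStore {
  constructor() {
    this.records = new Map(); // id -> { id, data, last_updated }
    this.clock = 0;           // logical counter standing in for a timestamp
  }
  upsert(id, data) {
    this.clock += 1;
    const record = { id, data, last_updated: this.clock };
    this.records.set(id, record);
    return record;
  }
  changesSince(since) {
    return [...this.records.values()].filter((r) => r.last_updated > since);
  }
}

class Client {
  constructor() {
    this.records = new Map(); // id -> { id, data, dirty }
    this.lastSyncedAt = 0;    // server counter value at the last successful sync
  }
  edit(id, data) {
    // Any local create/edit is flagged for the next sync.
    this.records.set(id, { id, data, dirty: true });
  }
  sync(server) {
    // 1. Pull everything the server changed since the last successful sync.
    const pulled = server.changesSince(this.lastSyncedAt);
    for (const remote of pulled) {
      const local = this.records.get(remote.id);
      if (local && local.dirty) continue; // edited on both sides: not resolved in this sketch
      this.records.set(remote.id, { id: remote.id, data: remote.data, dirty: false });
    }
    // 2. Push every record flagged as dirty and clear the flag.
    for (const local of this.records.values()) {
      if (!local.dirty) continue;
      server.upsert(local.id, local.data);
      local.dirty = false;
    }
    // 3. Remember how far this client has seen. A real API would return this
    //    cursor from the server so that pull and push cannot race.
    this.lastSyncedAt = server.clock;
    return pulled.length;
  }
}

// Two devices sharing one server.
const server = new ServerStore();
const phone = new Client();
const tablet = new Client();
phone.edit("r1", { title: "written on the phone" });
phone.sync(server);
const pulledByTablet = tablet.sync(server);
```

The dirty flag answers "what do I still have to send?" on each client, while the server-side counter answers "what have you not seen yet?" for any number of clients.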
What about records edited in multiple places? If two clients edit, and then want to sync, you have some ambiguity about merging, like when merging branches in git or other version control systems. How do you handle that? I guess git does it by storing diffs of every commit. I guess you could store diffs? The more I think about this, the more complicated it sounds. Am I over-thinking it or under-thinking it?
What about client side storage? I've thought about SQLite, or the PhoneGap local storage thing (http://docs.phonegap.com/en/1.2.0/phonegap_storage_storage.md.html). Recommendations? The syncing will be over a REST API, exchanging JSON, so I was thinking something that actually stores the data as JSON, or something JSON-like that's easy to convert, would be nice. On the other hand, if I'm going to have to exchange some sort of data diff format, maybe that's what I need to be storing?
I will answer based on my experience with the sync part. I don’t have enough experience with PhoneGap, so I will skip the question about PhoneGap local storage vs. SQLite.
I was wondering if you could point me toward some good resources for this type of syncing. Some recommended libraries?
There are a number of open-source projects for syncing a PhoneGap app with a remote server, but you will probably have to adjust them to your own needs or implement your own sync functionality. I have listed some of them below; you are probably already aware of them if you have searched the net.
PhoneGap sync plugin
Simple Offline Data Synchronization for Mobile Web and PhoneGap Applications
Synchronize a local WebSQL Db to a server
Couchbase Lite PhoneGap plugin
Additionally, you might consider other options, depending on your server side:
Microsoft Sync Framework Toolkit (Html5 sample is available)
OpenSync Framework - platform independent, general purpose synchronization engine
Also, my intent to build it Internet-dependent first, and then add syncing.... Is that a good idea, or am I shooting myself in the foot? Do I need to build it syncing from the start?
I believe the sync functionality is more like an additional module and shouldn’t be tightly coupled with the rest of your business logic. Once you start thinking about testing strategy for your sync you’ll realise it will be easier to test that if your sync facility is decoupled from the main code.
I think you can launch your app as soon as possible with the minimum required functionality without sync. But you’d better think about your architecture and the way you add the sync facility in advance.
To start with, I'm thinking of using UUIDs, rather than sequential integer primary keys. I've also thought about assigning each device an ID that is prefixed on any keys it generates, but that seems delicate. Anyone used either technique? Thoughts?
That depends on your project specifications and specifically your server side. For example, Azure mobile services allow only integer type for the primary keys. That said, unique identifiers as primary keys are pretty handy in distributed systems, although they have some disadvantages as well.
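To make the UUID option concrete, here is a small sketch of generating record keys on the device. `newRecordId` is a name made up for this example. It prefers the platform's `crypto.randomUUID()` where that exists and otherwise falls back to a `Math.random`-based version-4 UUID, which is an assumption made for older WebViews and is weaker than a proper generator.

```javascript
// Returns a version-4 (random) UUID string.
function newRecordId() {
  // Prefer the platform generator where it exists (modern browsers, recent Node.js).
  if (typeof crypto !== "undefined" && typeof crypto.randomUUID === "function") {
    return crypto.randomUUID();
  }
  // Fallback for older environments. Math.random is NOT cryptographically
  // strong, so collisions are more likely than with a proper generator.
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, (c) => {
    const r = Math.floor(Math.random() * 16);
    const v = c === "x" ? r : (r & 0x3) | 0x8; // "y" carries the variant bits
    return v.toString(16);
  });
}

// A record can be created offline with its final key; no server round trip is
// needed to learn the ID, and two devices will not hand out the same key.
const draft = { id: newRecordId(), title: "created while offline" };
```

The trade-off against sequential integers is mostly size and index locality on the server, which is one of the disadvantages hinted at above.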
Related to assigning a device ID – I am not sure I understand the point although I don’t know your project specifics. Have a look at the sync algorithm that is used in our system (bidirectional sync using REST between multiple Android clients and central SQL Server).
What about records edited in multiple places? If two clients edit, and then want to sync, you have some ambiguity about merging, like when merging branches in git or other version control systems. How do you handle that? I guess git does it by storing diffs of every commit. I guess you could store diffs? The more I think about this, the more complicated it sounds. Am I over-thinking it or under-thinking it?
This is where you need to think about how to handle the conflict resolution in your system.
If the probability of conflicts in your system is high, e.g. users will often be changing the same records, then you had better track which fields (columns) of each record have been modified, and once a conflict is detected during sync:
Iterate through each modified field of the server-side record in conflict.
Compare each modified field of the server record with the corresponding field of the client record.
If the client field was not modified, there is no conflict, so just overwrite it with the server value.
Otherwise there is a conflict, so save both fields’ contents in a temporary place for the report.
At the end of the sync, produce a report of the records in conflict.
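The steps above can be sketched as a single merge function. This is an illustration only: `mergeRecord` and its parameters are hypothetical names, the lists of modified fields are assumed to be tracked by the application already, and treating "both sides changed the field to the same value" as a non-conflict is an extra assumption that the steps do not mention.

```javascript
// serverRecord / clientRecord: plain objects with the current field values.
// serverModified / clientModified: names of the fields changed since the last
// sync (how these are tracked is up to the application; assumed given here).
function mergeRecord(serverRecord, serverModified, clientRecord, clientModified) {
  const merged = { ...clientRecord };
  const conflicts = [];
  const clientChanged = new Set(clientModified);

  for (const field of serverModified) {
    if (!clientChanged.has(field)) {
      // Only the server touched this field: take the server value.
      merged[field] = serverRecord[field];
    } else if (serverRecord[field] !== clientRecord[field]) {
      // Both sides changed it to different values: keep both for the report.
      // (Identical values on both sides are not reported; an assumption.)
      conflicts.push({ field, server: serverRecord[field], client: clientRecord[field] });
    }
  }
  return { merged, conflicts };
}

// The server changed phone and city; the client changed only phone.
const result = mergeRecord(
  { id: 7, name: "Ann", phone: "555-0100", city: "Oslo" }, ["phone", "city"],
  { id: 7, name: "Ann", phone: "555-0199", city: "Bergen" }, ["phone"]
);
```

Here `city` is taken from the server without fuss, while `phone` ends up in `result.conflicts` with both values, ready to be collected into the end-of-sync report.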
I'm looking for a portable database solution I can use with a website that is designed to handle service outages. I need to retrieve a list of users from SQL Server nightly and upsert their details into a portable database. It's roughly 250,000 users (and growing), and each one has probably 25 fields that are required. Of those fields, I'd say fewer than 5 need to be searched on. The rest just need retrieving.
The idea is, in times of a service outage, we can use a website that's designed to work from the portable database rather than SQL Server. Our long term goal, is to move to the cloud and handle things in an entirely different way, but for the short term this is our aim.
The website is going to be a .NET Core web API, so it will be accessed by multiple users on multiple threads. The website will only ever need read access; it will not be updating these details whatsoever.
To keep the portable database up to date, I'm thinking of having another application that just runs nightly to update the data. Our business is 24 hours (albeit quieter overnight), so there is a chance this updater is running while the website is in use. While a service outage would suggest the SQL Server is down, this may not be the case. There are other factors in play that could cause what we would describe as outages. This will be the only piece of software updating the database.
I've tried using LiteDB, but I couldn't get it working in a way that met my concurrency requirements. It did seem to do some of the job, and was easy to get running. However, I'd often run into locked files due to the nature of web API. I did work out a solution for that, but then the updater app couldn't access the database file.
Does anyone have any recommendations I can look into?
Given the description of the problem (1 table, 250k rows with, I assume, a relatively fast growth rate) and the requirements, I don't think a relational database is what you are looking for.
I think NoSQL databases or, more specifically, document-oriented databases are a better fit for your requirements. There are many NoSQL options: MongoDB, Cassandra, CouchDB, ... the choice is yours.
Personally I have some experience with Elasticsearch (https://www.elastic.co/elasticsearch), which is quite easy to learn, is portable (runs on Linux, Windows, containers, etc.), is scalable, and is fast. I mean really, really fast: you can get results in 10-20 milliseconds (sometimes even less).
The NEST NuGet package acts as a high-level client for working with Elasticsearch (https://www.elastic.co/guide/en/elasticsearch/client/net-api/7.x/nest-getting-started.html).
I want to store some data in a database. Then, using that data, I want to answer the user's queries through Dialogflow.
Any ideas on how to implement this?
You will need to use a webhook to do fulfillment. In your webhook, you can make the database queries you want.
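As a rough sketch of what such a webhook does, the function below takes the JSON body that Dialogflow ES posts to a fulfillment URL, looks something up, and returns a response with a `fulfillmentText`. The intent name `GetPrice`, the parameter `product` and the in-memory `priceTable` are all hypothetical; the table stands in for a real database query.

```javascript
// Stand-in for a real database: product name -> price (hypothetical data).
const priceTable = new Map([["espresso", 2.5], ["latte", 3.2]]);

// Builds the webhook response for one Dialogflow ES request body.
function handleFulfillment(requestBody, table) {
  const queryResult = requestBody.queryResult || {};
  const intent = queryResult.intent ? queryResult.intent.displayName : "";
  const params = queryResult.parameters || {};

  if (intent === "GetPrice") {
    const product = String(params.product || "").toLowerCase();
    if (table.has(product)) {
      // In a real webhook this lookup would be a query against your database.
      return { fulfillmentText: `A ${product} costs ${table.get(product).toFixed(2)}.` };
    }
    return { fulfillmentText: `Sorry, I could not find "${product}".` };
  }
  return { fulfillmentText: "Sorry, I cannot answer that yet." };
}

// Example request as the webhook would receive it (trimmed to the used fields).
const reply = handleFulfillment(
  { queryResult: { intent: { displayName: "GetPrice" }, parameters: { product: "Latte" } } },
  priceTable
);
```

In a deployed webhook this function would sit behind an HTTPS POST endpoint that parses the request body and sends the returned object back as JSON.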
You may want to use an NLIDB (natural language interface to database). An NLIDB maps natural language questions over the database schema into SQL, solves such SQL queries and returns answers. Additional misconception and ambiguity resolution steps may be included.
NLIDBs are in contrast to dialog management systems (such as DialogFlow) which use interactive dialog to fill in slots for specific question types, and then execute these questions in specialized code. This specialized code may very well interact with a database, but it is relative to a specific question type so it is fairly straightforward to implement.
The advantage of NLIDBs, however, is that if the mapping tool is robust, a practically infinite number of questions may be answered over a complex database schema. The disadvantage is that the mapping tools are often less than robust. But this is an area under active R&D.
There are several companies currently offering NLIDB systems.
See for example: https://friendlydata.io/, http://c-phrase.com and http://kueri.me/.
AWS might be of help. I have some answers where I detail how to use API Gateway, for example, as a pseudo back end so you can run all of this from a front-end (or static) page. Doing this, my hack would be to just write a JSON file, or create a variable that's imported (key/values), which would include your database info. I once created a React page where I put a long list of database data (SQL) in a JSON file and imported it. It worked great.
Of course, if you have experience building a back end, you can figure all this out. If not, I would recommend looking into Wix. They have a great platform in which you can use JavaScript, and it also has a Node back end with access to Node modules. They also have fully functional built-in databases. Good luck!
I know this is a very generic and subjective question, so feel free to vote to close it if it does not meet the StackOverflow netiquette.. but for me, it's worth trying ;)
I've never built a high-traffic application until now, so I'm not aware (except for some reading on the web) of scaling practices.
How can I design a database so that, when scaling is needed, I don't have to refactor the database structure or the application code?
I know that development (and optimization) should come step by step, optimizing bottlenecks as they happen, and that it is nearly impossible to design the perfect structure when you don't know how many users you'll have or how they will use the database (e.g. read/write ratio). I'm just looking for a good base to start from.
What are the best practices for making a structure almost ready to be scaled with partitioning and sharding, and what hacks must be absolutely avoided?
Edit: some details about my application:
The application will run in a multisite fashion
I'll have a database for each application version (db_0_0_1, db_0_0_2, etc.)*
Every 'site' will have a schema inside a database* and a role that can access only its own schema
Application code will be mostly PHP and few things (daemons and maintenance things) in Python
Web server will probably be Nginx and lighttpd or node.js as support for long-polling tasks (e.g. chat)
Caching will be done with memcached (plus APC for things strictly related to the PHP code, as it cannot be used outside PHP)
The question is really generic, but here are a few tips:
Do not use any session variables (pg_backend_pid(), inet_client_addr()) or per-session control (SET ROLE, SET SESSION) in application code.
Do not use explicit transaction control (BEGIN/COMMIT/SET TRANSACTION) in application code. All such logic should be wrapped in UDFs. This enables stateless, statement-mode pooling which enables fastest possible DB pooling. (see pgbouncer docs, and pg wiki for more info)
Encapsulate all App<->Db communication in well defined DB API of UDFs - this will let you use PL/Proxy. If doing this with all SELECTs is too hard, do it at least for all data writes (INSERT/UPDATE/DELETE). Example: instead of INSERT INTO users(name) VALUES('Joe') you need SELECT create_user('Joe').
Check your DB schema - is it easy to separate all data belonging to a given user? (Most probably this will be the partitioning key.) All that's left is common, shared data, which will need to be replicated to all nodes.
Think of caching before you need it. What will be the caching key? What will be the cache timeout? Will you use memcached?
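From the application's side, the "DB API of UDFs" tip might look roughly like the sketch below. `makeUserApi` is a made-up name, `query(sql, params)` is a hypothetical function standing in for whatever PostgreSQL driver or pooler client is used, and `rename_user` is an invented UDF added next to the `create_user` example from the tip.

```javascript
// `query(sql, params)` is a hypothetical function supplied by whatever
// PostgreSQL client is in use; it is injected so that this sketch does not
// depend on a specific library.
function makeUserApi(query) {
  return {
    // Each call is one self-contained statement: no BEGIN/COMMIT and no
    // session state, which is what statement-mode pooling requires.
    createUser: (name) => query("SELECT create_user($1)", [name]),
    // rename_user is an invented UDF, shown only to illustrate the pattern.
    renameUser: (id, name) => query("SELECT rename_user($1, $2)", [id, name]),
  };
}

// Demo with a fake `query` that only records what would be sent to the server.
const sent = [];
const api = makeUserApi((sql, params) => {
  sent.push({ sql, params });
  return Promise.resolve({ rows: [] });
});
api.createUser("Joe");
```

Because the application never names tables directly, the functions behind these calls can later be moved, partitioned or proxied without touching the application code.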
Closed 5 years ago.
I'm creating a web site that I think must have a client side database. The other option would be to stick everything on the server at the expense of increased complexity and decreased scalability. What options do I have? Must I build a plugin? Must I wait until everybody's HTML5 compliant?
Update: There have been a lot of comments about why I would actually need this. Here are my thoughts. Tell me if I'm being silly:
The clients will have a large and complex state that will require something like a database to provide the data interaction that I need. Therefore (I think) cookies are out of the picture.
This data is transient, so the client won't care if it gets erased as soon as they close a session. However they will need to keep the data if they go to a different web page and then come back. Therefore (I think) somehow storing the data in some sort of a javascript SQL implementation will not work.
I can certainly do everything that I want to do on the server, and servers can scale to manage the load (Facebook). But (I think) I'd rather build a plugin than pay for the infrastructure to support this load. This is for a bare bones startup. (The richer the startup is, the barer my bones will be.)
Indexed Database (Can I use)
Web SQL (Can I use)
localStorage
I'm about 5 years late in answering this, but given that there are errors and outdated data in some of the existing answers, and unaddressed points in the original question, I figured I'd throw in my two cents.
First, contrary to what others have implied on here, localStorage is not a database. It is (or should be perceived as) a persistent, string-based key-value store...
...which may be perfectly fine for your needs (and brings me to my second point).
Do you need explicit or implicit relationships between your data items?
How about the ability to query over said items?
Or more than 5 MB in space?
If you answered "no" to all of the above, go with localStorage and save yourself from the headaches that are the WebSQL and IndexedDB APIs. Well, maybe just the latter headache, since the former has been deprecated.
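Since localStorage only stores strings, structured data has to be serialized on the way in and parsed on the way out. A minimal sketch of that, with hypothetical helper names (`createJsonStore`, `createMemoryStorage`): the in-memory object only mimics the `getItem`/`setItem`/`removeItem` part of the Web Storage API so the example can run outside a browser.

```javascript
// Minimal in-memory object with the same getItem/setItem/removeItem shape as
// the Web Storage API, for environments without a real localStorage.
function createMemoryStorage() {
  const items = new Map();
  return {
    getItem: (key) => (items.has(key) ? items.get(key) : null),
    setItem: (key, value) => { items.set(key, String(value)); },
    removeItem: (key) => { items.delete(key); },
  };
}

// Wraps a Storage-like object so callers can save and load plain objects.
function createJsonStore(storage) {
  return {
    save(key, value) {
      storage.setItem(key, JSON.stringify(value));
    },
    load(key, fallback = null) {
      const raw = storage.getItem(key);
      if (raw === null) return fallback;
      try {
        return JSON.parse(raw);
      } catch (err) {
        return fallback; // stored value was not valid JSON
      }
    },
  };
}

// In a browser, pass window.localStorage instead of the in-memory stand-in.
const backing = createMemoryStorage();
const store = createJsonStore(backing);
store.save("settings", { theme: "dark", fontSize: 14 });
const settings = store.load("settings");
```

Everything beyond this (querying, relationships, indexes) is what the three questions above are probing for; if any of them matters, a key-value store stops being enough.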
There are also several other client-side storage facilities (native and non-native) you may want to look into, some of which are deprecated* but still see support from some browsers:
userData*
The rest of webStorage (sessionStorage and globalStorage*)
HTML5 File System*
Flash Locally Shared Objects
Silverlight Isolated Storage
Check out BakedGoods if you want to utilize any of these facilities, and more, without having to write low-level storage operation code. With it, placing data in one (or more) of them, for example, is as simple as:
bakedGoods.set({
    data: [{key: "key1", value: "val1"}, {key: "key2", value: "val2"}],
    storageTypes: ["silverlight", "fileSystem", "localStorage"],
    options: optionsObj,
    complete: function(byStorageTypeStoredKeysObj, byStorageTypeErrorObj){}
});
Oh, and for the sake of complete transparency, BakedGoods is maintained by this guy right here :) .
Use PouchDB.
PouchDB is an open-source JavaScript database inspired by Apache CouchDB that is designed to run well within the browser.
It helps build applications that work offline as well as online.
Basically, it stores the last fetched data in the in-browser database (uses IndexedDB, WebSQL under the hood) and then syncs again when the network gets active.
I came across a JavaScript database, http://www.taffydb.com/. I'm still trying it out myself, but I hope this helps.
If you are looking for a NoSQL-style db on the client you can check out http://www.forerunnerdb.com. It supports the same query language as MongoDB and has a data-binding module if you want your DOM to reflect changes to your data automatically.
It is also open source, is constantly being updated with new features and the community around it is growing rapidly.
Disclaimer, I'm the lead developer of the project.
If you feel like you need it then use it for the clients that support it and implement a server-side fallback for clients that don't.
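A small sketch of that "use it where supported, otherwise fall back to the server" decision. `chooseStorage` and the returned labels are names invented for this example, and the global object is passed in as `env` so the function can also be exercised outside a browser.

```javascript
// `env` is the global object to inspect (window in a browser).
function chooseStorage(env) {
  try {
    if (env && env.indexedDB) return "indexedDB";
    if (env && env.localStorage) return "localStorage";
  } catch (err) {
    // Some browsers throw when storage access is blocked by privacy settings.
  }
  return "server"; // no client-side option: keep the state on the server
}

// In a browser this inspects window; elsewhere it falls through to "server".
const choice = chooseStorage(typeof window !== "undefined" ? window : {});
```

The rest of the application can then branch on the returned label once, instead of scattering feature checks through the code.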
An alternative is you can use Flash and Local Shared Objects which can store a lot more information than a cookie, will work in all browsers with Flash (which is pretty much all browsers), and store typed data. You don't have to do the whole app in Flash, you can just write a tiny utility to read/write LSO data. This can be done using straight ActionScript projects without any framework and will give you a tiny 5-15kb swf.
There are two APIs you'll primarily need: SharedObject.getLocal() to get access to an LSO and read/write its data, and ExternalInterface.addCallback, which you can use to register an AS3 method as a callback to call your read/write LSO method.
SharedObject
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/net/SharedObject.html?filter_flex=4.1&filter_flashplayer=10.1&filter_air=2
ExternalInterface
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/external/ExternalInterface.html
These links are to Flex references but for this you can just create an ActionScript project with no need for the Flex framework and therefore greatly reduced swf size. There are a number of good IDEs including free open-source ones like FlashDevelop.
FlashDevelop
http://www.flashdevelop.org/
Check out HTML5 Local Storage:
http://people.w3.org/mike/localstorage.html
You may also find this helpful:
HTML5 database storage (SQL lite) - few questions
When Windows 98 first came out, there were a lot of us still stuck on MS-DOS 6.22. Naturally, there were really cool features on the new operating system that wouldn't run in MS-DOS.
There comes a time when some things must be left behind to make room for innovation. If your application is really innovative and will offer cool new functionality that uses the latest and greatest technologies, then some older browsers will naturally need to be left behind.
The advantage that you have is that, unlike upgrading an operating system, upgrading from IE7 to Chrome 8 or Firefox 3.6 is a more reachable goal for the average user of your app, especially if you provide a link and upgrade instructions.
I would try Mozilla's localForage. https://localforage.github.io/localForage/
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
This might be more of a serverfault.com question but a) it doesn't exist yet and b) I need more rep for when it does :~)
My employer has a few hundred servers (all *NIX) spread across several locations. As I suspect is common we don't really know how many servers we have: more than once I've been surprised to find a server that's been up for 5 years, apparently doing nothing but elevating the earth's temperature slightly. We have a number of databases that store bits of server information -- Puppet, Cobbler, Nagios, Cacti, our load balancers, DNS, various internal spreadsheets and so on but it's all very disparate, incomplete and overlapping. Maintaining this mess costs time and money.
So, I'd like to come up with a single database which holds details of what each server is (hardware specs, role, etc) and replaces (or at least supplies data for) the databases mentioned above. The database and web interface are likely to be a Rails app as this is what I have most experience with. I'm more of a sysadmin than a coder.
Has this problem already been solved? I can't find any open source software that really fits the bill and I'm generally not too keen on bloaty, GUI vendor-supplied solutions.
How should I implement the device information collection bit? For instance, it'd be great to have the database update device records when disks are added or removed, or when the server serial number changes because HP replace the board. This information comes from many different sources: dmidecode, command-line disk tools, SNMP against the server or its onboard lights-out card, and so on. I could expose all this through custom scripts and net-snmp, or I could run a local poller that reported the information back to the central DB (maybe via a RESTful interface or something). It must be easily extensible.
Have you done this? How? Tell me your experiences, discoveries, mistakes and recommendations!
This sounds like a great LDAP problem looking for a solution. LDAP is designed for this kind of thing: a catalog of items that is optimized for data searches and retrieval (but not necessarily writes). There are many LDAP servers to choose from (OpenLDAP, Sun's OpenDS, Microsoft Active Directory, just to name a few ...), and I've seen LDAP used to catalog servers before. LDAP is very standardized and a "database" of information that is usually searched or read, but not frequently updated, is the strong-suit of LDAP.
My team have been dumping all our systems into RDF for a month or two now. We have the systems implementation people create the initial data in Excel, which is then transformed to N3 (RDF) using Perl.
We view the data in Gruff (http://www.franz.com/downloads.lhtml) and keep the resulting RDF in Allegro (a triple store from the same guys that do Gruff).
It's incredibly simple and flexible - no schema means we simply augment the data on the fly, and with a wide variety of RDF viewers and reasoning engines the presentation options are endless.
The best part for me? no coding, just create triples and throw them in the store then view them as graphs.
The collection of detailed machine information is a very frustrating problem (many vendors want to keep it this way). Even if you can spend a large amount of money, you probably will not find a simple solution to this problem. IBM and HP offer products that achieve what you are seeking, but they are very, very, expensive, and will leave a bad taste in your mouth once you realize that probably all you needed was 40-50% of the functionality they offer.
You say that you need to monitor *Nix servers...most (if not all) unices support RFC 1514 (windows also supports this RFC as of windows 2000). The Host MIB support defined by RFC 1514 has its drawbacks however. Since it is SNMP based, it requires that SNMP be enabled on the machine, which is typically not the default for unix and windows machines. The reason for this is that SNMP was created before the entire world was using the Internet, and thus the old, crusty nature of its security is of concern. In many environs, this may not be acceptable for security reasons. However, if you are only dealing with machines behind the firewall, this might not be an issue (I suspect this is true in your case).
Several years ago, I was working on a product that monitored hundreds of unix and windows machines. At the time, I did extensive research into the mechanics of how to acquire detailed information from each machine such as disk info, running processes, installed software, up-time, memory pressure, CPU and IO load (Including Network) without running a custom client on each machine. This info can be collected in a centralized fashion. As of three or four years ago, the RFC-1514 Host MIB spec was the only "standard" for acquiring detailed real-time machine info without resorting to OS-specific software. Sun and Microsoft announced a WebService based initiative many years ago to address some of this, but I suspect it never received any traction since I cannot at the moment even remember its marketing name.
I should mention that RFC 1514 is certainly no panacea. You are at the mercy of the OS-provided SNMP service, unless you have the luxury of deploying a custom info-collecting client to each machine. The RFC-1514 spec dictates that several parameters are optional, and if your target OS does not implement it, then you are back to custom code to provide the information.
I'm contemplating how to go about this myself, and I think this is one of the key pieces of infrastructure that not having around keeps us in the dark ages. Hopefully this will be a popular question on serverfault.com. :)
It's not just that you could install a single tool to collect this data, because that's not possible cheaply, but ideally you want everything from the hardware up to the applications on the network feeding into this thing.
I think the only approach that makes sense is a modular one. The range of devices and types of information is too disparate to come under a single tool. Also the collection of data needs to be as passive and asynchronous as possible - the reality of running infrastructure means that there will be interruptions and you can't rely on being able to get the data at all times.
I think the tools you've pointed out form something of an ecosystem that could work together - Cobbler can install from bare-metal and hand over to Puppet, which has support for generating Nagios configs, and storing configs in a database; for me only Cacti is a bit opaque in terms of programmatically inserting new devices, templates etc. but I know this is possible.
Ultimately you have to sit down and work out which pieces of information are important for the business you work for, and design a db schema around that. Then, work out how to get the information you need into the db, whether it's from Facter, Nagios, Cacti, or direct snmp calls.
Since you asked about collection of data, I think if you have quite disparate kit (Dell, HP etc.) then it makes sense to create a library to abstract away as much as possible the differences between them, so your scripts just make standard calls such as "checkdiskhealth". When you add new hardware you can add to the library rather than having to write a completely new script.
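A sketch of such a library, assuming a simple registry of per-vendor probes. All names and return shapes here are hypothetical (`checkDiskHealth` is the camel-cased "checkdiskhealth" from the paragraph above), and the probe bodies are placeholders for real vendor tools or SNMP queries.

```javascript
// Vendor-specific probes. The bodies are placeholders: real ones would shell
// out to vendor tools or query SNMP. Names and return shapes are hypothetical.
const diskHealthProbes = new Map([
  ["dell", (host) => ({ host, healthy: true, source: "dell-probe" })],
  ["hp", (host) => ({ host, healthy: true, source: "hp-probe" })],
]);

// The single call that collection scripts use, whatever the hardware.
function checkDiskHealth(server) {
  const probe = diskHealthProbes.get(server.vendor);
  if (!probe) {
    throw new Error(`No disk health probe registered for vendor "${server.vendor}"`);
  }
  return probe(server.host);
}

// Supporting new hardware means registering one more probe,
// not writing a completely new script.
diskHealthProbes.set("supermicro", (host) => ({ host, healthy: true, source: "supermicro-probe" }));

const report = checkDiskHealth({ host: "db01", vendor: "hp" });
```

Scripts that call `checkDiskHealth` never need to know which vendor they are talking to, which is the point of the abstraction.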
Sounds like a common problem that larger organizations would have. I know our (50 person company) sysadmin has a little access database of information about every server, license, and piece of hardware installed. He's very meticulous, but when it comes time to replace or repair hardware, he knows everything about it from his little db.
You and your organization could sponsor an open source project to get you what you need, and give back to the community so that additional features (that you may not need now) can be developed at no cost to you.
Maybe a simple web service? Just something that accepts a machine name or IP address. When the service gets input, it sticks it in a queue and kicks off a task to collect the data from the machine that notified it. The nature of the task (SNMP interrogation, remote call to a Perl script, whatever) could be stored as part of the machine information in the database. If the task fails, the machine ID stays in the queue and the machine is periodically re-polled until the information is collected. Of course, you also have to have some kind of monitor running on your servers to notice that something has changed and send the notification; hopefully this is easily accomplished with whatever server monitoring software you've already got in place.
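The notify/queue/retry loop described above might look roughly like this. `CollectionQueue`, `notify` and `pollOnce` are invented names, and the collector is kept synchronous for brevity; a real one (SNMP, a remote script) would be asynchronous and driven by a timer.

```javascript
class CollectionQueue {
  constructor(collect) {
    this.collect = collect;     // (machineId) => info; throws on failure
    this.pending = new Set();   // machine IDs still waiting for a successful collection
    this.inventory = new Map(); // machineId -> last collected info
  }
  // Called by the web service when a machine reports that something changed.
  notify(machineId) {
    this.pending.add(machineId);
  }
  // Called periodically. Machines whose collection fails stay in the queue.
  pollOnce() {
    for (const machineId of [...this.pending]) {
      try {
        this.inventory.set(machineId, this.collect(machineId));
        this.pending.delete(machineId);
      } catch (err) {
        // Leave the ID queued; it is retried on the next poll.
      }
    }
  }
}

// Demo collector that is unreachable on the first attempt only.
let attempts = 0;
const queue = new CollectionQueue((machineId) => {
  attempts += 1;
  if (attempts === 1) throw new Error("machine unreachable");
  return { machineId, disks: 2 };
});
queue.notify("db01");
queue.pollOnce(); // fails, "db01" stays queued
const stillQueued = queue.pending.has("db01");
queue.pollOnce(); // succeeds, "db01" leaves the queue
```

Using a set for the queue means repeated notifications from the same machine collapse into a single pending collection.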
There are some solutions from the big vendors for managing monstrous sets of machines - such as some of the Tivoli stuff from IBM. That is probably, however, overkill for mere hundreds of machines.
There are some free software server database solutions but I do not know if they provide hooks to update information automatically from the machines with dmidecode or SNMP. One I heard about (but no personal experience, sorry), is GLPI.
I believe you are looking for Zabbix. It's open source, easy to install and use.
I installed it for a client a few years ago, and if I remember right it has a client application that connects to the Zabbix server to update it with the requested information.
I really recommend it: http://www.zabbix.com
Check out Machdb. It's an open-source solution to the problem you are describing.