Store about and contacts in database - database

I want to know if it would make sense to store the website about text and contacts in a database instead of just writing it in the html, and how would I do that?
I've been thinking just one table for each with the information and no id (since I only want to store one version of everything), but that doesn't make a lot of sense, I think.
The about table should contain the text and the contacts should contain an address, coordinates, email and phone number.
Any suggestions?

The question might get flagged as off topic, but I have some thoughts. I would only put stuff in a database that would be updated frequently. Simply putting text in a DB that never really changes doesn't get you any benefit and just adds complexity and burden on your DB. If you find yourself copying the same web text to different pages of your site, make it text you include.
User information like logins,etc. that is added/updated/deleted should clearly live in a DB. But unless your About Us kind of data is going to be changed constantly or you make a tool that allows somebody else to update it, don't put that in the DB.

Related

How to handle a single data point table in your database

This is sort of a random question but I am building a backend in express and mongodb and I need to store data for a settings page. This would contain random one-off global settings an admin user would input to use. However, it needs to be saved in the DB so it is constant across all users.
Right now I have a single collection/table that just has one record stored and I just updated that specific record whenever the settings are updated.
Just feels a little goofy to do that a create a full schema and collection for one piece of data but I can't think of any other way to do it. Is this the normal way of doing this?
Thanks

How to merge user data after login?

It doesn't matter if you're building an eshop or any other application which uses session to store some data between requests.
If you don't want to annoy the user by requiring him to register, you need to allow him to do certain tasks anonymously when possible (user really have to have a reason for registering).
There comes a problem - if user decides to login with his existing profile, he may already have some data in his "anonymous" session.
What are the best practices of merging these data? I'm guessing the application should merge it automatically where possible or let the user decide where not possible.
But what I'm asking more is if there are any resources about how to do the magic in database (where the session data are usually stored) effectively.
I have two basic solutions in my mind:
To keep anonymous session data and just add another "relation" saying what's actually used where and how it's merged
To physically merge these data
We could say that the first solution will probably be more effective, because the information about any relation will probably mean less data than data about the user. But it will also mean more effort when reading the data (as we firstly need to read the relation to get to actual user data).
Are there any articles/resources for designing data structures for this particular use case (anonymous + user data)?
An excellent question that any app developer using user data should ask, and, sadly very few do :(
In fact, there are two completely independent questions here:
Q1 - At what stage require user to sign in/up?
Q2 - Data concurrency and conflict resolution (see below).
And here some analysis for each of the questions. Please excuse my extra passion coming from my own "frustrated user" experience. :)
Q1 is a pure usability question. To which the answer is actually obvious:
Avoid or delay to force the user sign in as much as possible!
Even the need to save state is not enough a reason by itself. If I am as user not interested in saving that state, then don't force me to sign! Please!
The only reason for you (as website) to justify forcing me to sign is when I (as user) want to save my data for later use. Here I speak as user having wasted time on signing only to find the site useless. If you want to get rid of too many users, that is the right way. In any other case - please, delay it as much as possible!
Why so many sites completely disregard such an obvious rule? The possible reasons I can see:
R1- developer friendly vs user friendly. Yes, it is developer friendly to require sign in right away, so we don't need to bother with concurrency (Q2). So we can save developer costs, time etc. But every saving comes at a cost! Which in this case is called User Experience. Which is not necessarily where you would like to look for saving. Especially, since the solution should not be that hard (see below).
R2 - Designer or Manager making the decision is an "indoor enthusiast" :) She lives happy life surrounded by super-fast computers with super-fast internet connection and can't imagine singing up can be that hard for any user. So why is it such a big deal? So many reasons:
It breaks the application flow. Sites living in previous century still replace the whole screen with sometimes rather lengthy form. Some forms are badly designed, some have erratic instructions, some simply don't work. Some have submit buttons that are for some reason disabled in the browser used.
Some form designers have genius idea to lock certain fields with barely noticeable change or colour. Then don't show me the field if you don't want me to fill it!
If the site is serious about user's data, it must request Email and must verity it! Why? How else shall I get back to user who forgot all other credentials? Why verify? What if user mistyped the email? If I don't verify it, next time the user tries to recover password with her correct email, the recovery fails and all data are lost! Obvious, yet there are still sites out there not doing it. Then I need to wait till the verification email is received and click on, hopefully, well-formatted and uniquely identifiable link that does not break in my browser, nor get some funny characters due to broken encoding detection, making the whole link unusable.
The internet connection can be slow or broken, making every additional step a piece of pain. Even with good connection, it happens here and there that page suddenly takes much longer to load. Also the email may not arrive right away. Then impatient user starts furiously clicking the "resend verification" link. In which case 90% of sites resend their link with new token but also disable all previous tokens. Then several emails arrive in unpredictable order and poor user has to guess in vain, which one (and only one) is still valid. Now why those sites find it so hard to keep several tokens active, just for this case, is beyond my understanding.
Finally there is still this so hard to unlearn habit for sites to insist on the so-called "username". So now, beside my email, I have to think hard to come up with this unique username, different from any previous user! Thank you so much for making it sweet and easy! My own way of dealing with it is to use my email as username. Sadly, there are still sites not accepting it! Then what if some fun type used my email as his username? Not so unrealistic if your email is bill#gates.com. But why simply not use Email and Password to avoid all this mess?
Here some possible guidelines to relieve user's pain:
Only force me to sign in/up if you absolutely need and give me a chance to choose not to!
Make it one page form, so I know what I am up to and, needless to say, use as few input fields as possible. Ideally only Email and Password (possibly twice), no Username!
Show your sign in form as small window on top of your page without reloading, and allow me to get rid of it with single click away from that window. Don't force me to look for "close" button or, even worse, icon I could confuse for something else!
Account for user to click back/forth and reload buttons. Don't clear the form upon reload! Don't put clear button at all! It is too easy to click by accident. The data you are ask me to fill should not be so long in first place, that I could not re-enter it without the need of "assistance" to clear.
Now to question Q2. Here we have well known problem of conflict resolution that occurs any time two data need to be merged. For instance, the anonymous data and the registered user data, but also whenever two users modify the same data, or the same user modifies it from different devices at different times, or locally stored data conflict with server data, and so on.
But whatever the source is, the problem is always the same. We have two data, say two objects $obj1 and $obj2 and you need to produce your single merged object $obj3. The logic can be as simple as the rule that server's object always wins, or that the last modified object always wins, or the last modified object keys always win or any more complicated logic. This really depends on the nature of your application. But in each case, all you need to do is to write your logic function with arguments $obj1, $obj2 that returns $obj3.
A solution that will possibly work in many cases is to store timestamp on each object attribute (key) and let the latest changed key win at the moment of synchronisation. That accounts e.g. for the situation when the same user modifies different attributes when being anonymous from different devices.
Imagine I had modified keys A and B on device AA yesterday, then logged today from device BB to enter another B and saved it to the server, then switched back to my device AA, where I am anonymous, to enter yet another A without changing the old B from yesterday, and then realised I want to log in and synchronise. Then my local B is obviously old and should clearly not overwrite the value of B that I changed more recently on device BB. In this seemingly complicated case, the above solutions works seamlessly and effectively. In contrast, putting the timestamp only on whole objects would be wrong.
Now in some cases, it could make sense to keep both objects, and, e.g. distinguish them by adding extra properties, like in case 1 suggested in Radek's question. For instance, Dropbox adds something like "conflicted copy by user X" to the end of the file. Like in Dropbox case, this is sensible in case of collaboration apps, where users like to have some version control.
However, in those cases, you as developer simply save two copies and let the users deal with that problem.
If on the other hand, you have to write a complicated logic based on user's data, having two different copies hanging around can be a nightmare. In that case, I would split data into two groups (e.g. create two objects out of one). The first group has data representing the state of the application as a whole, that is important to be unique. For that data I would use the conflict resolution as above or similar. Then the second group is user-specific, where I would store both data as two separate entries in the database, properly mark them (like Dropbox does), and then let users deal with the list of two (or more) entries of their project.
Finally, if that additional complication of database management makes the developer uneasy, and since Radek asked to give a resource reference, I want to "kill two flies with one shot" by mentioning the blog entry StackMob offline Sync, whose solution provides both database and user management functionality and so relieves the developer from that pain. Surely there is a lot more info to be found when searching for data concurrence, conflict resolution and the likes.
To conclude, I have to add the obligatory disclaimer, that all written here are merely my own thoughts and suggestions, that everyone should use at own risk and don't hold me responsible if you suddenly get too many happy users making your system crash :)
As I am myself working on an app, where I am implementing all those aspects, I am certainly very interested to hear other opinions and what else folks have to say on the subject.
From my experience - both as a user of sites that require a login, and as a developer working with logged in users - I don't think I've ever seen a site behave this way.
The common pattern is to let a user be anonymous and the first time they do something that would require saving state, they are prompted to login. Then the previous action is remembered and the user can continue. For example, if they try to add something to their shopping cart, they are prompted to login and then after login, the item is in their cart.
I suppose some places would allow you to fill a cart and then login at which point the cart is associated with a concrete user.
I would create a SessionUser object that has the state of the site interaction and one field called UserId that is used to retrieve other things like name, address, etc.
With anonymous users, I would create the SessionUser object with an empty reference for UserId. This means we can't resolve a name or an address, but we can still save state. The actions they are performing, the pages they're viewing, etc.
Once they login we don't have to merge two objects, we just populate the UserId field in SessionUser and now we can traverse an object graph to get name, email, address or whatever else.

Should I add a field like MarkedAsDeleted to my table?

I'm building an ASP.NET MVC application with SQL Server. I would like to know what will be a good practice for record deletion operations. I mean, when an item is deleted via web application, I would like to mark it as deleted, and then from an admin console, I will purge them if needed.
Is this a good practice? Should I use or avoid?
Thank you.
There is nothing wrong with this approach if you have no issues with storage space. Typically, we will use this pattern if the object being deleted is tied to other object (for instance, if you were tracking changes by user id, then you would not want to delete the user because you would not be able to pull info for that user later on). Simply mark a bit field showing the record has been deleted and filter those out when you query.
Again, it really depends on what makes sense for you and your application. Will you ever need this object again? Is it tied to other items in the database? Do you want to offer the user the option to 'undelete'? If the answer to any of those questions is yes, then you should probably keep the record. If the answer to all of those is no, then I would ask, why would you not just delete the record at the time it was requested?

Designing a generic unstructured data store

The project I have been given is to store and retrieve unstructured data from a third-party. This could be HR information – User, Pictures, CV, Voice mail etc or factory related stuff – Work items, parts lists, time sheets etc. Basically almost any type of data.
Some of these items may be linked so a User many have a picture for example. I don’t need to examine the content of the data as my storage solution will receive the data as XML and send it out as XML. It’s down to the recipient to convert the XML back into a picture or sound file etc. The recipient may request all Users so I need to be able to find User records and their related “child” items such as pictures etc, or the recipient may just want pictures etc.
My database is MS SQL and I have to stick with that. My question is, are there any patterns or existing solutions for handling unstructured data in this way.
I’ve done a bit of Googling and have found some sites that talk about this kind of problem but they are more interested in drilling into the data to allow searches on their content. I don’t need to know the content just what type it is (picture, User, Job Sheet etc).
To those who have given their comments:
The problem I face is the linking of objects together. A User object may be added to the data store then at a later date the users picture may be added. When the User is requested I will need to return the both the User object and it associated Picture. The user may update their picture so you can see I need to keep relationships between objects. That is what I was trying to get across in the second paragraph. The problem I have is that my solution must be very generic as I should be able to store anything and link these objects by the end users requirements. EG: User, Pictures and emails or Work items, Parts list etc. I see that Microsoft has developed ZEntity which looks like it may be useful but I don’t need to drill into the data contents so it’s probably over kill for what I need.
I have been using Microsoft Zentity since version 1, and whilst it is excellent a storing huge amounts of structured data and allowing (relatively) simple access to the data, if your data structure is likely to change then recreating the 'data model' (and the regression testing) would probably remove the benefits of using such a system.
Another point worth noting is that Zentity requires filestream storage so you would need to have the correct version of SQL Server installed (2008 I think) and filestream storage enabled.
Since you deal with XML, it's not an unstructured data. Microsoft SQL Server 2005 or later has XML column type that you can use.
Now, if you don't need to access XML nodes and you think you will never need to, go with the plain varbinary(max). For your information, storing XML content in an XML-type column let you not only to retrieve XML nodes directly through database queries, but also validate XML data against schemas, which may be useful to ensure that the content you store is valid.
Don't forget to use FILESTREAMs (SQL Server 2008 or later), if your XML data grows in size (2MB+). This is probably your case, since voice-mail or pictures can easily be larger than 2 MB, especially when they are Base64-encoded inside an XML file.
Since your data is quite freeform and changable, your best bet is to put it on a plain old file system not a relational database. By all means store some meta-information in SQL where it makes sense to search through structed data relationships but if your main data content is not structured with data relationships then you're doing yourself a disservice using an SQL database.
The filesystem is blindingly fast to lookup files and stream them, especially if this is an intranet application. All you need to do is share a folder and apply sensible file permissions and a large chunk of unnecessary development disappears. If you need to deliver this over the web, consider using WebDAV with IIS.
A reasonably clever file and directory naming convension with a small piece of software you write to help people get to the right path will hands down, always beat any SQL database for both access speed and sequential data streaming. Filesystem paths and file names will always beat any clever SQL index for data location speed. And plain old files are the ultimate unstructured, flexible data store.
Use SQL for what it's good for. Use files for what they are good for. Best tools for the job and all that...
You don't really need any pattern for this implementation. Store all your data in a BLOB entry. Read from it when required and then send it out again.
Yo would probably need to investigate other infrastructure aspects like periodically cleaning up the db to remove expired entries.
Maybe i'm not understanding the problem clearly.
So am I right if I say that all you need to store is a blob of xml with whatever binary information contained within? Why can't you have a users table and then a linked(foreign key) table with userobjects in, linked by userId?

How do you handle small sets of data?

With really small sets of data, the policy where I work is generally to stick them into text files, but in my experience this can be a development headache. Data generally comes from the database and when it doesn't, the process involved in setting it/storing it is generally hidden in the code. With the database you can generally see all the data available to you and the ways with which it relates to other data.
Sometimes for really small sets of data I just store them in an internal data structure in the code (like A Perl hash) but then when a change is needed, it's in the hands of a developer.
So how do you handle small sets of infrequently changed data? Do you have set criteria of when to use a database table or a text file or..?
I'm tempted to just use a database table for absolutely everything but I'm not sure if there are any implications to this.
Edit: For context:
I've been asked to put a new contact form on the website for a handful of companies, with more to be added occasionally in the future. Except, companies don't have contact email addresses.. the users inside these companies do (as they post jobs through their own accounts). Now though, we want a "speculative application" type functionality and the form needs an email address to send these applications to. But we also don't want to put an email address as a property in the form or else spammers can just use it as an open email gateway. So clearly, we need an ID -> contact_email type relationship with companies.
SO, I can either add a column to a table with millions of rows which will be used, literally, about 20 times OR create a new table that at most is going to hold about 20 rows. Typically how we handle this in the past is just to create a nasty text file and read it from there. But this creates maintenance nightmares and these text files are frequently looked over when data that they depend on changes. Perhaps this is a fault with the process, but I'm just interested in hearing views on this.
Put it in the database. If it changes infrequently, cache it in your middle tier.
The example that springs to mind immediately is what is appropriate to have stored as an enumeration and what is appropriate to have stored in a "lookup" database table.
I tend to "draw the line" with the rule that if it will result in a column in the database containing a "magic number" that maps to an enumeration value, then the enumeration should really exist as a lookup table. If it's unrelated to the data stored in the database (eg. Application configuration data rather than user generated data), then it's an enumeration all the way.
Surely it depends on the user of the software tool you've developed to consume the set of data, regardless of size?
It might just be that they know Excel, so your tool would have to parse a .csv file that they create.
If it's written for the developers, then who cares what you use. I'm not a fan of cluttering databases with minor or transient data however.
We have a standard config file format (key:value) and a class to handle it. We just use that on all projects. Mostly we're just setting persistent properties for our applications (mobile phone development) so that's an appropriate thing to do. YMMV
In cases where the program accesses a database, I'll store everything in there: easier for backup and moving data around.
For small programs without database access I store my data in the .net settings, which are stored in an xml file - of course this is a feature of c#, so it might not apply to you.
Anyway, I make sure to store all data in one place. Usually a database.
Have you considered sqlite ? It's file-based, which addresses your feeling that "just a file might do" (zero configuration), but it's a perfectly good database and scales remarkably well. It supports a number of APIs and there are numerous front ends for administering it.
If these are small config-like data, i use some simple and common format. ini, json and yaml are usually ok. Java and .NET fans also like XML. in short, use something that you can easily read to an in-memory object and forget about it.
I would add it to the database in the main table:
Backup and recovery (you do want to recover this text file, right?)
Adhoc querying (since you can do it will a SQL tool and join it to the other database data)
If the database column is empty the store requirements for it should be minimal (nothing if it's a NULL column at the end of the table in Oracle)
It will be easier if you want to have multiple application servers as you will not need to keep multiple copies of some extra config file around
Putting it into a little child table only complicates the design without giving any real benefits
You may well already be going to that same row in the database as part of your processing anyway, so performance is not likely to be a problem. If you are not, you could cache it in memory.

Resources