Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I am building a website that will hold articles, policies, laws and other text. I am storing all the data (in some cases articles of over 8000 characters) in an MS SQL Server 2008 database. I have read some articles saying that text data should not be stored in databases. Where should it be stored instead? In .txt files or something similar? I also want to search through the data. If it is stored in the DB, I can use stored procedures etc. If it is stored in documents, I would need to use a tool like Lucene. Am I right? Is my approach of using a DB wrong for this project? Please enlighten me.
You will be using a DB of some description for this project no matter how you look at it, whether it be:
1) An old fashioned flat file database (txt documents, not recommended for large scale projects imho)
2) A traditional text storing database
3) A database of documents
The choice between a DB of text and a DB of documents depends on which skills/knowledge you possess or are likely to get access to (or assistance with). It sounds to me like you are more comfortable with a DB of text, and in my opinion there is nothing wrong with that. Worst case, if there ends up being a genuine need for documents rather than straight text storage in the long run, you should be able to generate the documents automatically from a text database. I suspect doing the reverse (converting a load of proprietary documents to text for storage and insertion) would be a lot trickier. Generating a plain text file from a text database is trivial, and most vendor document formats support importing plain text documents for subsequent formatting.
For a large project like this you really need to spend some time considering what your documents are likely to be used for and by whom, and which storage method best matches those uses. If you are providing a database for people who rely heavily on MS Word and want to download your data, you probably need to consider a document DB. If you only want to provide the information itself (plus web-based tools), consider how you want to manipulate your own data.
This is all opinion, obviously, but my last piece of advice would be: make sure you use UTF-8 text from the outset if you go down the text route (bitter experience).
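To make the "DB of text" route a bit more concrete, here is a minimal sketch of storing articles as text, searching them, and generating a plain UTF-8 text file from a row. It uses Python's built-in sqlite3 purely as a stand-in for MS SQL Server; the table name articles and the helper names are hypothetical, and a real site would more likely use the database's own full-text search than a LIKE scan.

```python
import sqlite3
import tempfile
from pathlib import Path


def make_db():
    """Create an in-memory database with a hypothetical articles table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def add_article(conn, title, body):
    """Insert one article and return its id."""
    cur = conn.execute(
        "INSERT INTO articles (title, body) VALUES (?, ?)", (title, body)
    )
    conn.commit()
    return cur.lastrowid


def search(conn, term):
    """Naive substring search over title and body.

    Note: % and _ inside term act as LIKE wildcards in this sketch.
    """
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT id, title FROM articles "
        "WHERE title LIKE ? OR body LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return rows.fetchall()


def export_article(conn, article_id, directory):
    """Write one article to a UTF-8 .txt file and return the path."""
    title, body = conn.execute(
        "SELECT title, body FROM articles WHERE id = ?", (article_id,)
    ).fetchone()
    path = Path(directory) / f"article_{article_id}.txt"
    path.write_text(f"{title}\n\n{body}\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    conn = make_db()
    first = add_article(conn, "Data Protection Policy",
                        "Personal data must be stored securely.")
    add_article(conn, "Travel Policy", "Book travel through the agency.")
    print(search(conn, "securely"))
    with tempfile.TemporaryDirectory() as tmp:
        print(export_article(conn, first, tmp).read_text(encoding="utf-8"))
```

Going from rows to files is a few lines, which is the point made above; the reverse direction (parsing proprietary documents back into text) is where the work would be.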
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
Basically, my application needs to dump data into a database daily. Once written, the data never needs to be updated.
Hence, is appending to a CSV or JSON file sufficient for the purpose? Or would it be more computationally efficient to write to a standard SQL database?
Edit
Use-Case Update
I expect to store one entry per day for each activity's count. There are about 6-8 activities.
It is exactly like a log in some sense. I would like to perform some analysis on it, for example on the trend of each activity. There are no relations between different activities, though.
If there might be a need for updates in some cases, would that imply that a proper database is more suitable than a text file?
It depends on the nature of the data, but there may be another style of database other than an SQL one which could be suitable, like MongoDB which essentially stores JSON objects.
SQL is great when you need entities to have relationships to each other, or if you can take advantage of the type of select queries it can provide you with.
Database systems do have some overhead and could have some gotchas you might not expect, like loading up a heap of crap into memory so it's ready to be searched.
But storing text files can have drawbacks, like it might become difficult to manage your data in the future.
It basically sounds like your use-case is similar to logging, in which case dumping it into a file is fine.
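For what it's worth, here is a rough sketch of the "dump it into a file" approach for daily activity counts, using only Python's standard library. The file layout (day, activity, count) and the function names are my own assumptions, not something from the question.

```python
import csv
import tempfile
from collections import defaultdict
from datetime import date
from pathlib import Path

# Assumed column layout for the log file.
FIELDS = ["day", "activity", "count"]


def append_counts(path, day, counts):
    """Append one row per activity; write the header only for a new file."""
    path = Path(path)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        for activity, count in counts.items():
            writer.writerow(
                {"day": day.isoformat(), "activity": activity, "count": count}
            )


def load_trends(path):
    """Read the file back as {activity: [(day, count), ...]} in file order."""
    trends = defaultdict(list)
    with Path(path).open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            trends[row["activity"]].append((row["day"], int(row["count"])))
    return dict(trends)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "activity.csv"
        append_counts(log, date(2024, 1, 1), {"login": 5, "upload": 2})
        append_counts(log, date(2024, 1, 2), {"login": 7, "upload": 1})
        print(load_trends(log))
```

Appending is cheap and the whole file is easy to read back for trend analysis. The weak spot is updates: changing one past row means rewriting the file, which is roughly where a real database starts to pay off.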
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I am old school and new to MVC. I like that MVC is more focused on actions. But MS has bundled it with EF, so it's hard not to use EF nowadays.
This is what I thought:
1 - RDBMS stored procedures/packages are supposed to perform better than LINQ. For instance, SQL Server Transact-SQL supports paging now, and compared with LINQ, T-SQL definitely performs better. So, in terms of performance, LINQ is only good for people who don't know T-SQL or PL/SQL.
2 - I've tried using LINQ with stored procedures. It works, but has many limitations. For instance, .dbml files are strongly typed, which prevents any attempt at reformatting the data, such as adding an anchor to a displayed field. Well, one might say you're not supposed to do that. Let me give an example: the business wants to make a column in a grid clickable. There are a number of ways to implement this; one of the quickest is to embed an anchor in the column returned from a stored procedure, with very little change to the UI. QA then only needs to test a few things. But with EF as the foundation, anything based on the changed model/class must go through QA again.
3 - Model-first or code-first won't produce a nicely normalized large-scale database implementation, because a developer who doesn't know T-SQL won't be good at RDBMS design.
4 - This is the most important issue: in an enterprise environment, we developers can NOT dictate schema and table definitions. Even with the DB-first approach, sometimes we don't even know where the schema comes from. "But that's what EF is good at, right?" you might say. Imagine EF detects the whole schema and what the stored procedures return, then builds the whole data layer/classes for me. Great, but then there is a need for a real-time median price which is not in the database at all, so we add it with some customization code. That code will be gone if another scan-and-detect is needed because our client requests something that causes a tiny change in the database. How do we avoid the hassle of losing customization code?
5 - Sometimes we need to run the "update-database" command in the Package Manager Console so EF can work. It's almost impossible to explain to Operations and the DBAs that this is harmless during a release.
However, as EF is getting popular, there must be a new-school way to make it work. Can some experts please educate an old-schooler?
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I've done a lot of research into which architectural database approach is the best, and in the end, I'd prefer the separate database approach. However, most hosting providers are not happy with this (take Azure, with a 150 DB limitation).
My idea now is to just start with a single database/single schema, use a tenant ID column in each table to separate data, and then, when it gets too big/slow, look for scaling options.
Is this a bad idea? Should I keep data separated from the start? I feel like security-wise it doesn't matter much, as long as I verify that the data I'm calling/retrieving belongs to the calling customer.
Also, isn't scaling later on going to be easier with a single big database, as opposed to having 5000 small databases?
Thanks!
For cloud hosting I think a single multi-tenant database is the way to go.
I had the same problem some time ago and opted for one database per tenant, since our clients wanted to keep the option of hosting the database on their own server. Since we had one code base and many databases on several servers, we had to roll our own synchronization solution to ensure that all the schemas stayed the same.
We also had some business logic in stored procedures and had to figure out a way to distinguish the procedures that held global logic from those whose logic was specific to one database.
It worked, but it was awkward, and I wish we could have used a single database.
Anyway, as said before, each way has pluses and minuses; you just have to decide what is most important to you and work around the minuses.
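As an illustration of the single-database route, here is a minimal sketch in which every query is forced through a tenant filter, so the "does this row belong to the calling customer" check cannot be forgotten. SQLite stands in for whatever the real host offers, and the invoices table and TenantRepository class are hypothetical names of mine.

```python
import sqlite3


def make_db():
    """Single shared schema: every row carries the owning tenant's id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices ("
        "id INTEGER PRIMARY KEY, "
        "tenant_id INTEGER NOT NULL, "
        "description TEXT NOT NULL)"
    )
    # Index the tenant column, since every query filters on it.
    conn.execute("CREATE INDEX idx_invoices_tenant ON invoices (tenant_id)")
    return conn


class TenantRepository:
    """Data access bound to one tenant; callers never pass a tenant id."""

    def __init__(self, conn, tenant_id):
        self._conn = conn
        self._tenant_id = tenant_id

    def add_invoice(self, description):
        cur = self._conn.execute(
            "INSERT INTO invoices (tenant_id, description) VALUES (?, ?)",
            (self._tenant_id, description),
        )
        self._conn.commit()
        return cur.lastrowid

    def list_invoices(self):
        return self._conn.execute(
            "SELECT id, description FROM invoices "
            "WHERE tenant_id = ? ORDER BY id",
            (self._tenant_id,),
        ).fetchall()

    def get_invoice(self, invoice_id):
        # Returns None when the id exists but belongs to another tenant.
        return self._conn.execute(
            "SELECT id, description FROM invoices "
            "WHERE id = ? AND tenant_id = ?",
            (invoice_id, self._tenant_id),
        ).fetchone()


if __name__ == "__main__":
    conn = make_db()
    acme = TenantRepository(conn, tenant_id=1)
    globex = TenantRepository(conn, tenant_id=2)
    invoice_id = acme.add_invoice("Acme hosting")
    print(acme.get_invoice(invoice_id))
    print(globex.get_invoice(invoice_id))
```

Putting the filter in one place is a design choice, not a guarantee: any code that bypasses the repository and queries the table directly can still leak data across tenants.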
Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
A colleague of mine uses Excel to merge and analyse datasets (~10k lines).
Her spreadsheets are mazes of vlookup and nested if formulas.
How can I convince her to take a look at databases?
What would be a good way to start? I'm an sqlite fan, but wonder whether the entry threshold to Access is lower?
Are there any books that you'd recommend to get started? I checked the SO question "What's a good book for introduction to databases for web developers". Any additions to the list there?
Thanks,
Simone
re: How can I convince her to take a look at databases?
Show her why your way is better.
Redo what she did in Excel with your preferred tool and the same input data, and see if you can find differences in the output.
Also, after both systems are set up, run them side by side for a while, noting performance and maintenance differences. If she agrees your way is better, she might decide to use it.
Not a direct answer to your question, but as a developer who has done extensive data analysis work in Excel, here are a few observations.
If the primary goal is data analysis then using Excel might be good enough.
Especially if the different data sets (you mentioned merging) are provided as CSV files as and when required, going through the 'hassle' of first importing the data into a SQL database and then running queries to extract it for the analysis step might be too much.
Excel gives you the flexibility of playing around with your data, very easily trying different things, charting, pivot tables etc. If the reports that your friend needs are more or less static with only the data varying, then maybe a simple Access/SQL database with a small application on top would be a better solution. But then again, if this is the case, your friend probably has an Excel sheet with all the relevant formulas where only the data needs to be plugged in.
For most of my data-analysis in Excel the only real thing I have missed is the ability to gather data using foreign keys. Once you have that covered with vlookup, the rest of the analysis is usually quicker/easier in Excel.
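For comparison, this is roughly what the vlookup-style merge looks like as a foreign-key join. It is only a sketch with made-up customers/orders tables, using SQLite through Python since sqlite was mentioned in the question.

```python
import sqlite3


def merge_orders(customers, orders):
    """Look up each order's customer name, the way VLOOKUP would.

    customers: iterable of (id, name)
    orders:    iterable of (id, customer_id, amount)
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customers(id), "
        "amount REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    # LEFT JOIN keeps orders whose customer is missing; the name comes back
    # as None, which plays the role of #N/A in a spreadsheet lookup.
    rows = conn.execute(
        "SELECT o.id, c.name, o.amount "
        "FROM orders AS o "
        "LEFT JOIN customers AS c ON c.id = o.customer_id "
        "ORDER BY o.id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    customers = [(1, "Ada"), (2, "Grace")]
    orders = [(10, 1, 9.5), (11, 2, 3.0), (12, 3, 1.0)]
    for row in merge_orders(customers, orders):
        print(row)
```

Whether this beats a sheet full of vlookups depends on the workflow described above: the join is one statement, but the data has to be imported first.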
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
How, if you have a database per client of a web application instead of one database used by all clients, do you go about providing updates and enhancements to all databases efficiently?
How do you roll out changes to schema and code in such a scenario?
It's kinda difficult for us. We have a custom program that writes a lot of the SQL code for the different databases for us. Essentially it writes the code once and then copies it over and over again, inserting the change-database commands etc. as it goes. It also makes sure that the primary key identities etc. are in sync when they need to be. Beyond that I would look at Red Gate's products. They have saved us more than once here. With them you can easily compare the DBs and see what is different. A must when dealing with multiple copies.
Use a code generator / scripting language to implement the original schema and updates to it over time.
I've used Red Gate's SQL Packager for this in the past. The beauty of this tool is that it creates a C# project for you that actually does the work so if you need to you can extend the functionality of the default package to do other things like insert default values into new columns that have been added to the db etc. In the end you have a nice tool that you can hand to a technician and all they have to do to upgrade multiple DBs is point it to the database and click a button.
Red Gate also has a product called SQL multi-script that allows you to run scripts against multiple servers/dbs at the same time. I've never used this tool but I imagine if you're looking for something to use internally that doesn't need to be packaged up you'd want to look at that.
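If you end up scripting this yourself rather than buying a tool, the core idea can be sketched in a few lines: keep an ordered list of schema changes, record in each database how many have been applied, and run only the missing ones against every tenant database. This is not how the Red Gate tools work internally (I don't know that); it is a minimal illustration using SQLite's PRAGMA user_version, and the MIGRATIONS list and function names are made up.

```python
import sqlite3

# Ordered schema changes; entry N brings a database to version N + 1.
# The statements themselves are hypothetical examples.
MIGRATIONS = [
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE customers ADD COLUMN email TEXT NOT NULL DEFAULT ''",
]


def upgrade(conn):
    """Apply every migration this database has not seen yet."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    pending = MIGRATIONS[current:]
    for version, statement in enumerate(pending, start=current + 1):
        conn.execute(statement)
        # PRAGMA does not accept bound parameters; version is an int we made.
        conn.execute(f"PRAGMA user_version = {version}")
        conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


def upgrade_all(connections):
    """Upgrade each tenant database; return {tenant name: final version}."""
    return {name: upgrade(conn) for name, conn in connections.items()}


if __name__ == "__main__":
    tenants = {
        "tenant_a": sqlite3.connect(":memory:"),
        "tenant_b": sqlite3.connect(":memory:"),
    }
    print(upgrade_all(tenants))
    # Running it again is a no-op because every database is already current.
    print(upgrade_all(tenants))
```

A production version would also need error handling per database, so that one failed upgrade does not leave the rest of the fleet in an unknown state.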