Should databases be separated based on size and load? [closed] - database

I'm developing a web backend with two modules. One handles a relatively small amount of data that doesn't change often. The other handles real-time data that's constantly being dumped into the database and never gets changed or deleted. I'm not sure whether to have separate databases for each module or just one.
The data between the modules is interconnected quite a bit, so it's a lot more convenient to have it in a single database.
But if anything fails, I need the first database to be available for reads as soon as possible; the second one can wait.
Also I'm not sure how much performance impact the constantly growing large database would have on the first one.
I'd like to make dumps of the data available to the public, and I don't want users downloading gigabytes that they don't need.
And if I decide to use a single one, how easy is it to separate them later? I use Postgres, btw.

Sounds like you have a website with its content being the first DB, and some kind of analytics being the second DB.
It makes sense to separate those physically (as in, on different servers), especially if one of them is required to be available as much as possible. Separating mission-critical parts from something less important is good design. Also, a smaller DB means shorter recovery times from a backup, should the need arise.
For the data that is interconnected, if you need remote lookup from one DB into another, Foreign Data Wrappers may help.
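For illustration, here is a minimal sketch of exposing tables from the real-time database inside the main one with postgres_fdw; the server, role, schema, and password values are invented placeholders, not anything from the question:

    -- Assumed names: a "content" database pulling tables from a separate "realtime" database.
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    -- Describe how to reach the other database.
    CREATE SERVER realtime_srv
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'realtime-host', dbname 'realtime', port '5432');

    -- Map the local role to credentials on the remote side.
    CREATE USER MAPPING FOR app_user
        SERVER realtime_srv
        OPTIONS (user 'app_user', password 'secret');

    -- Pull the remote tables in as foreign tables under a local schema,
    -- so joins against local tables work transparently.
    CREATE SCHEMA realtime;
    IMPORT FOREIGN SCHEMA public
        FROM SERVER realtime_srv
        INTO realtime;

Queries can then join local tables with the imported foreign tables as if they were local, though each remote access pays a network round trip.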

Related

Strategies to building a database of 30m images [closed]

Summary
I am facing the task of building a searchable database of about 30 million images (of different sizes) associated with their metadata. I have no real experience with databases so far.
Requirements
There will be only a few users, the database will be almost read-only (if things get written, it will be by a controlled automatic process), and downtime for maintenance should be no big issue. We will probably perform more or less complex queries on the metadata.
My Thoughts
My current idea is to save the images in a folder structure and build a relational database on the side that contains the metadata as well as links to the images themselves. I have read about document-based databases. I am sure they are reliable, but presumably the images would only be accessible through a database query; is that true? In that case I am worried that future users of the data might be faced with the problem of learning how to query the database before actually getting things done.
Question
What database could/should I use?
Storing big fields that are not used in queries outside the "lookup table" is recommended practice for some database systems, so it does not seem unusual to store the 30M images in the file system.
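As a rough sketch of that layout (all table and column names here are invented for illustration), the database side can be as small as one table whose rows point at files on disk:

    -- Minimal metadata table; the images themselves live in the file system.
    CREATE TABLE image (
        id          bigserial PRIMARY KEY,
        file_path   text NOT NULL UNIQUE,  -- e.g. 'ab/cd/abcd1234.jpg' relative to a root directory
        width_px    integer,
        height_px   integer,
        captured_at timestamp,
        camera      text,
        tags        text[]                 -- PostgreSQL array type; other engines would use a join table
    );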
As to "which database", that depends on the frameworks you intend to work with, how complicated your queries usually are, and what resources you have available.
I had some complicated queries that ran for minutes on MySQL and finished in seconds on PostgreSQL, and vice versa. I didn't run the tests on SQL Server, which is the third RDBMS that I have readily available.
One thing I can tell you: whatever you can do in the DB, do it in the DB. You won't come anywhere near the same performance if you pull all the data out of the database and then do the matching in the framework code.
A second thing I can tell you: Indexes, indexes, indexes!
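To make both points concrete against the hypothetical image table sketched above: push the matching into SQL and index the columns it filters on (the GIN index on the array column is PostgreSQL-specific).

    -- Let the database do the matching instead of pulling 30M rows into application code.
    SELECT file_path
    FROM image
    WHERE captured_at >= DATE '2015-01-01'
      AND tags @> ARRAY['aerial'];

    -- Indexes to support that query.
    CREATE INDEX image_captured_at_idx ON image (captured_at);
    CREATE INDEX image_tags_idx ON image USING gin (tags);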
It doesn't sound like the data is very relational, so a non-relational DBMS like MongoDB might be the way to go. With any DBMS you will have to use queries to get information out of it. However, if you're worried about future users, you could put a software layer between the user and the DB that makes querying easier.
Storing images in the filesystem and metadata in the DB is a much better idea than storing large BLOBs in the DB (IMHO). I would also note that filesystem performance will be better if you spread the files over many folders and subfolders rather than keeping 30M images in one big folder (citation needed).

Log file not stored [closed]

I need to do research about log files that are not stored in the database. I do not know much about database systems, so I need someone to give me at least some ideas about it. What I was told is that some of the log files were not written to a bank's database. The log files come from various sources such as ATMs, the website, etc. For example, the reason could be a high rate of data flow causing some data to be left out.
The question is: what are the reasons behind this, and what could be the solutions?
I would really appreciate it if you could share some articles about it.
Sorry if I could not explain it well. Thanks in advance.
Edit: What I meant was not that there is a system intentionally not writing some of the log files to the database. What I meant is that some of the log files are not written to the database and the reason is not known; my intention is to identify the possible reasons and solutions. The database belongs to a bank and, as you can imagine, lots of data flows into the database every second.
Well, the question is not very clear, so let me rephrase it:
What are the reasons why application logs are not stored in a database?
It depends on the context, and there are different reasons.
The first question is why you might store logs in a database at all. Usually you do it because they contain data that is relevant to you and that you want to manipulate.
So why not always store this data:
you are not interested in the logs except when something goes wrong, and then it's more about debugging than storing logs
you don't want to mix business data (users, transactions, etc.) with less important / less relevant data
the volume of logs is too large for your current system, and putting them in a database might overwhelm it completely
you might want to use another system to dig into the logs, with a different type of storage (Hadoop, big data, NoSQL)
when you do a database backup, you usually back up the whole database; logs are not 'as important' as other critical data, they are bigger, and they would take up too much space
there is no need to always put logs in a database; using plain text and some other tools (web server logs, for instance) is usually more than enough
So it is for these reasons that logs are generally not stored in the same database as the application.

I need an Access database that will be used by about 6-10 people, but not on a share drive [closed]

I work on a project that has very well defined lines of responsibility. There are about six to ten of us and we currently do all of our work in Excel, building a single spreadsheet with maintenance requirements for ships. A couple of times during the project process we stop all work and compile all of the individual spreadsheets into one spreadsheet. Since each person had a well defined area, we don't have to worry about one person overwriting another person's work. It only takes an hour, so it isn't that huge of a deal. Less than optimal, sure, but it gets the job done.
But each person fills out their data differently. I think moving to a database would serve us well by making the data more regimented with validation rules. But the problem is, we do not have any type of share drive or database server where we can host the database, and that won't change. I was wondering if there was a simple solution similar to the way we were handling the Excel spreadsheet. I envisioned a process where I would wipe the old data and then import the new data. But I suspect that will bring up other problems.
I am pretty comfortable building small databases and using VBA and whatnot. This project would probably have about six tables, and probably three that would have the majority of the data for any given project (the others would be reference tables and slow-to-change data). Bottom line is, I am wondering if it is worth it, or should I stick with Excel?
Access 2007 onwards has an option for "Collecting email replies", which can organise flat data, but only a single query can be populated, so it might be a bit limiting.
The only solution I can think of that's easier than what you currently use is to create the DB with some VBA modules that export all new/updated data to an XML/CSV file and attach it to an email. You'd then have to create a VBA module that imports the data from these files into the current tables.
It's a fair amount of work to get set up but once working might be fairly quick and robust.
Edit, just to add: I have solved a similar problem, but I solved it with VB.NET and XML files rather than Access.
You can link Access databases to other databases (or import from them). So you can distribute a template database for users to add records to and then email back. When you receive them back, you would either import them into or link them to a master database and do whatever you need to do with the combined data.

Best way to archive sql data and show in web application whenever required [closed]

I have around 10 tables containing millions of rows. Now I want to archive 40% of the data due to size and performance problems.
What would be the best way to archive the old data while keeping the web application running? And what if, in the near future, I need to show the old data along with the existing data?
Thanks in advance.
There is no single solution that fits every case. It depends very much on your data structure and application requirements. The most general cases seem to be as follows:
If your application can't be redesigned and instant access is required to all your data, you need a more powerful hardware/software solution.
If your application can't be redesigned but some of your data can be considered obsolete because it is requested relatively rarely, you can split the data and configure two applications to access the different data sets.
If your application can't be redesigned but some of your data can be treated as non-critical and minimized (consolidated, packed, etc.), you can perform some data transformation while keeping the full data in another place for special requests.
If it's possible to redesign your application, there are many ways to solve the problem. In general you will implement some kind of archive subsystem, and in general it's a complex problem, especially if not only your data but also your data structure changes over time.
If it's possible to redesign your application, you can optimize your data structure using new supporting tables, indexes, and other database objects and algorithms.
Create an archive database, and if possible keep it on a different archive server: this data won't be needed often but still has to be kept for future purposes, so moving it out reduces load and storage pressure on the main server.
Move all of the old table data to that location (a sketch follows below). Later you can retrieve it in a number of ways:
changing the connection path of the application,
or updating the live tables from the archive tables.
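As a minimal sketch of that move, assuming PostgreSQL and an invented orders table with a created_at column (neither comes from the question), and keeping the archive in the same database for simplicity; putting it on a separate server would add a dump/restore or foreign-data-wrapper step on top:

    -- One-off archive table with the same structure as the live table.
    CREATE TABLE IF NOT EXISTS orders_archive (LIKE orders INCLUDING ALL);

    -- Move rows older than the cutoff in a single transaction.
    BEGIN;
    WITH moved AS (
        DELETE FROM orders
        WHERE created_at < now() - interval '2 years'
        RETURNING *
    )
    INSERT INTO orders_archive
    SELECT * FROM moved;
    COMMIT;

    -- If the application occasionally needs the combined history,
    -- a view can present live and archived rows together.
    CREATE VIEW orders_all AS
    SELECT * FROM orders
    UNION ALL
    SELECT * FROM orders_archive;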

DataModel for Workflow/Business Process Application [closed]

What should the data model for a workflow application be? Currently we are using an Entity-Attribute-Value (EAV) based model in SQL Server 2000, with users able to create dynamic forms (on ASP.NET), but as the data grows, performance degrades, reports are hard to generate, and it gets worse when too many users query the data concurrently.
As you have probably realized, the problem with an EAV model is that tables grow very large and queries grow very complex very quickly. For example, EAV-based queries typically require lots of subqueries just to get at the same data that would be trivial to select if you were using more traditionally-structured tables.
Unfortunately, it is quite difficult to move to a traditionally-structured relational model while simultaneously leaving old forms open to modification.
Thus, my suggestion: consider closing changes on well-established forms and moving their data to standard, normalized tables. For example, if you have a set of shipping forms that are not likely to change (or whose change you could manage by changing the app because it happens so rarely), then you could create a fixed table and then copy the existing data out of your EAV table(s). This would A) improve your ability to do reporting, B) reduce the amount of data in your existing EAV table(s) and C) improve your ability to support concurrent users / improve performance because you could build more appropriate indices into your data.
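Here is a hedged sketch of that "create a fixed table and copy the existing data out of your EAV table(s)" step, with invented table, column, and attribute names; the conditional-aggregation pivot works on SQL Server 2000 and most other engines:

    -- Fixed, normalized target for a form type that no longer changes.
    CREATE TABLE shipping_form (
        form_id     int PRIMARY KEY,
        ship_date   datetime,
        destination varchar(200),
        weight_kg   decimal(10, 2)
    );

    -- Pivot the existing EAV rows (entity_id, attribute, value) into it.
    INSERT INTO shipping_form (form_id, ship_date, destination, weight_kg)
    SELECT e.entity_id,
           CAST(MAX(CASE WHEN e.attribute = 'ship_date'   THEN e.value END) AS datetime),
           MAX(CASE WHEN e.attribute = 'destination' THEN e.value END),
           CAST(MAX(CASE WHEN e.attribute = 'weight_kg'   THEN e.value END) AS decimal(10, 2))
    FROM eav_values e
    WHERE e.form_type = 'shipping'
    GROUP BY e.entity_id;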
In short, think of the dynamic EAV-based system as a way to collect users' needs (they tell you by building their forms) and NOT as the permanent storage. As the forms evolve into their final form, you transition to fixed tables in order to gain the benefits discussed above.
One last thing. If all of this isn't possible, have you considered segmenting your EAV table into multiple, category-specific tables? For example, have all of your shipping forms in one table, personnel forms in a second, etc. It won't solve the querying structure problem (needing subqueries) but it will help shrink your tables and improve performance.
I hope this helps - I do sympathize with your plight as I've been in a similar situation myself!
Typically, when your database schema becomes very large and multiple users are trying to access the same information in many different ways, data warehousing is applied in order to reduce the major load on the database server. Unlike your traditional schema, where you are more than likely using normalization to maintain data integrity, data warehousing is optimized for speed, and multiple copies of your data are stored.
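For example (names invented, T-SQL flavored to match the SQL Server 2000 setup), a denormalized summary table refreshed on a schedule keeps report queries off the transactional EAV tables entirely:

    -- Denormalized monthly summary, rebuilt on a schedule (e.g. nightly) for reporting.
    CREATE TABLE report_form_counts (
        form_type       varchar(100),
        submit_year     int,
        submit_month    int,
        forms_submitted int
    );

    INSERT INTO report_form_counts (form_type, submit_year, submit_month, forms_submitted)
    SELECT f.form_type,
           YEAR(f.submitted_at),
           MONTH(f.submitted_at),
           COUNT(*)
    FROM forms f
    GROUP BY f.form_type, YEAR(f.submitted_at), MONTH(f.submitted_at);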
Try using the relational model of data. It works.
