I need to do research on log files that are not stored in the database. I do not know much about database systems, so I need someone to give me at least some ideas about it. What I was told is that some of the log files were not written to a bank's database. The log files come from various sources such as ATMs, the website, and so on. For example, the reason could be a high rate of data flow causing some data to be dropped.
The question is: what are the reasons behind this, and what could the solutions be?
I would really appreciate it if you could share some articles about it.
Sorry if I could not explain it well. Thanks in advance.
Edit: I did not mean that there is a system that intentionally skips writing some of the log files to the database. What I meant is that some of the log files are not written to the database, the reason is not known, and my goal is to identify the possible reasons and their solutions. The database belongs to a bank and, as you can imagine, a lot of data flows into it every second.
Well, the question is not very clear, so let me rephrase it:
What are the reasons why application logs are not stored in a database?
It depends on the context, and there are different reasons.
The first question is why you would store logs in a database at all. Usually you do it because they contain data that is relevant to you and that you want to manipulate.
So why not always store this data?
You are not interested in the logs except when something goes wrong, and then it's more about debugging than about storing logs.
You don't want to mix business data (users, transactions, etc.) with less important / less relevant data.
The volume of logs is too large for your current system, and putting them in the database might bring it down completely.
You might want to use another system to dig into the logs, with a different type of storage (Hadoop, big data, NoSQL).
When you back up the database, you usually back up all of it. Logs are not as important as other critical data, they are bigger, and they would take up too much space.
There is no need to always put logs in a database. Plain text files and some other tools (web server logs, for instance) are usually more than enough.
So those are the reasons why logs are in general not stored in the same database as the application.
I'm developing a web backend with two modules. One handles a relatively small amount of data that doesn't change often. The other handles real-time data that's constantly being dumped into the database and never gets changed or deleted. I'm not sure whether to have separate databases for each module or just one.
The data between the modules is interconnected quite a bit, so it's a lot more convenient to have it in a single database.
But if anything fails, I need the first database to be available for reads as soon as possible; the second one can wait.
Also, I'm not sure how much of a performance impact the constantly growing large data set would have on the first one.
I'd like to make dumps of the data available to the public, and I don't want users downloading gigabytes that they don't need.
And if I decide to use a single one, how easy is it to separate them later? I use Postgres, by the way.
Sounds like you have a website with its content being the first DB, and some kind of analytics being the second DB.
It makes sense to separate those physically (as in, on different servers), especially if one of them is required to be available as much as possible. Separating mission-critical parts from less important ones is good design. Also, a smaller DB means shorter recovery times from a backup, should the need arise.
For the data that is interconnected, if you need remote lookup from one DB into another, Foreign Data Wrappers may help.
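As a minimal sketch of what such a remote lookup could look like with postgres_fdw (the server name, credentials, and table definitions below are made-up placeholders, not anything from the question):

```sql
-- On the "content" database: make a table from the "analytics" database queryable.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER analytics_srv
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'analytics-host', dbname 'analytics', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER analytics_srv
    OPTIONS (user 'readonly_user', password 'secret');

-- Expose a single remote table locally; IMPORT FOREIGN SCHEMA also works.
CREATE FOREIGN TABLE events (
    id         bigint,
    page_id    bigint,
    created_at timestamptz
)
SERVER analytics_srv
OPTIONS (schema_name 'public', table_name 'events');

-- Joins across the two databases then look like ordinary SQL.
SELECT p.title, count(*)
FROM pages p
JOIN events e ON e.page_id = p.id
GROUP BY p.title;
```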
Summary
I am facing the task of building a searchable database of about 30 million images (of different sizes) associated with their metadata. I have no real experience with databases so far.
Requirements
There will be only a few users, and the database will be almost read-only (if anything is written, it will be by a controlled automatic process). Downtime for maintenance should be no big issue. We will probably perform more or less complex queries on the metadata.
My Thoughts
My current idea is to save the images in a folder structure and build a relational database on the side that contains the metadata as well as links to the images themselves. I have read about document-based databases. I am sure they are reliable, but presumably the images would only be accessible through a database query - is that true? In that case I am worried that future users of the data might face the problem of having to learn how to query the database before actually getting anything done.
Question
What database could/should I use?
Storing big fields that are not used in queries outside the "lookup table" is recommended for certain database systems, so it does not seem unusual to store the 30m images in the file system.
As to "which database", that depends on the frameworks you intend to work with, how complicated your queries usually are, and what resources you have available.
I had some complicated queries run for minutes on MySQL that were done in seconds on PostgreSQL and vice versa. Didn't do the tests with SQL Server, which is the third RDBMS that I have readily available.
One thing I can tell you: whatever you can do in the DB, do it in the DB. You won't get anywhere near the same performance if you pull all the data out of the database and then do the matching in the framework code.
A second thing I can tell you: Indexes, indexes, indexes!
It doesn't sound like the data is very relational, so a non-relational DBMS like MongoDB might be the way to go. With any DBMS you will have to use queries to get information out of it. However, if you're worried about future users, you could put a software layer between the user and the DB that makes querying easier.
Storing images in the filesystem and metadata in the DB is a much better idea than storing large BLOBs in the DB (IMHO). I would also note that filesystem performance will be better if you have many folders and subfolders rather than 30M images in one big folder (citation needed).
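As a rough PostgreSQL-flavoured sketch of the metadata-plus-path approach (the table and column names are hypothetical): the images stay on disk, spread over subfolders, and only their relative paths plus queryable metadata go into the database.

```sql
-- Hypothetical metadata table; images live in the filesystem, only their relative path is stored.
CREATE TABLE image_metadata (
    id            BIGSERIAL PRIMARY KEY,
    -- e.g. 'ab/cd/abcd1234.jpg': spreading files over subfolders keeps directories small
    relative_path TEXT NOT NULL UNIQUE,
    width_px      INTEGER,
    height_px     INTEGER,
    captured_at   TIMESTAMP,
    camera_model  TEXT
);

-- Index the columns your metadata queries actually filter on.
CREATE INDEX idx_image_metadata_captured_at  ON image_metadata (captured_at);
CREATE INDEX idx_image_metadata_camera_model ON image_metadata (camera_model);
```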
Recently I came across a scenario in a question:
There are n websites with n pages each and n users visiting the sites. Each user's visit has to be saved, along with the pages he/she visited (it is not mentioned whether in a database or in log files, so it's up to the developer).
I decided to go ahead and do something with data structures, but when I discussed it with a friend of mine, he said we could save it in a database, and that sounded logically correct too.
So in general we have three ways of storing anything: log files, data structures, and databases.
Now I am really confused: when should one go with data structures, databases, or simply log files, not only for this particular scenario but in general?
What's the real difference?
I understand that this question is primarily opinion-based, but I couldn't find a concrete answer while browsing!
Log files are often / usually output-only - these files will rarely, if ever, get read, possibly only read manually. Some types of files may have random access, allowing you to fairly efficiently find a given record by a single index (through binary search), but you can't (easily) have multiple indices on the data in a single file, which is a trivial task for a database. If you just want to log something for manual processing later, a log file can work fine (even if a database can work too).
Databases are the industry standard, in that they provide you with persistence, efficient reading and writing, a standard interface, and redundancy (but of course they need to be set up correctly).
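For the scenario in the question, a minimal relational sketch might look like the following (the table and column names are invented, and BIGSERIAL is PostgreSQL syntax). The point is that several indexes over the same data, which a flat log file can't easily give you, make both "which pages did this user visit" and "who visited this page" cheap to answer:

```sql
CREATE TABLE visits (
    id         BIGSERIAL PRIMARY KEY,            -- PostgreSQL syntax; adjust for your RDBMS
    user_id    BIGINT    NOT NULL,
    site_id    BIGINT    NOT NULL,
    page_id    BIGINT    NOT NULL,
    visited_at TIMESTAMP NOT NULL DEFAULT now()
);

-- Multiple indexes on the same data: trivial in a database, awkward in a flat log file.
CREATE INDEX idx_visits_user ON visits (user_id, visited_at);
CREATE INDEX idx_visits_page ON visits (site_id, page_id, visited_at);
```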
A pure data structure solution typically doesn't consider persistent storage, as in making sure your data is kept when the program stops running for some reason. If you do want to write to and read from persistent storage, doing that efficiently and reliably often comes with a fair bit of complexity, and multiple / complex indices are a hassle to cater for. That's not to say data structures can't be used with persistent storage - databases are built using data structures, and some data structures are specifically designed for disk reads and writes. But you don't want to be figuring this out at a low level - it's best to just let a database take care of it if you need persistence.
You could also combine data structures and databases, using the database as persistent storage and the data structure as a cache, so you only do the (slower) writes against the database and the (faster) reads from the data structure. This is not uncommon in large systems with external databases. However, anything more complex than a standard map data structure is probably overcomplicating your cache and may indicate a bigger problem with your design.
What you have there sounds like an interview question, for which they may be expecting a data structure solution and simply saying "use a database" may be frowned upon. However, if it's a system design question, you'd almost certainly need to include some sort of a database in your design instead of concerning yourself with data structures.
Over the past several months I've seen quite a few unexpected bugs popping up in a legacy application, most of which are related to inconsistencies between the application code (classic ASP) and the underlying SQL Server database.
For example, a user reported a 500 error on a page last week that has been working correctly for five years. I discovered that the page in question was looking for a column in a result set named "AllowEditDatasheets", while the real column name was "AllowDatasheetEdit".
Normally I'd attribute this to publishing untested code but, as I said, the page has been working correctly for a very long time.
I've run across this quite a few times recently - pages that never should have worked but have been working.
I'm starting to suspect that another employee is making subtle changes to the database, such as renaming columns. Unfortunately, there are several applications that use a common login that was granted SA rights, and removing those rights would break a lot of code (Yes, I know this is poor design - don't get me started), so simply altering account permissions isn't a viable solution.
I'm looking for a way to track schema changes. Ideally, I'd be able to capture the IP address of the machine that makes these sorts of changes, as well as the change that was made and the date/time when it occurred.
I know I can create a scheduled process that scripts the database and commits the scripts to our source control system, which will let me know when these changes occur, but that doesn't really help me find the source.
Any suggestions?
The default trace already tracks schema changes.
In Management Studio you can right-click the node of the database of interest and, from the Reports menu, view the "Schema Changes History" report, which pulls its data from there.
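If you would rather query the default trace directly than use the report, a sketch along these lines should work (the event class IDs 46, 47, and 164 correspond to Object:Created, Object:Deleted, and Object:Altered):

```sql
-- Read schema-change events from the default trace (SQL Server).
DECLARE @trace_path NVARCHAR(260);
SELECT @trace_path = path FROM sys.traces WHERE is_default = 1;

SELECT te.name AS event_name,
       t.DatabaseName,
       t.ObjectName,
       t.HostName,
       t.LoginName,
       t.StartTime
FROM sys.fn_trace_gettable(@trace_path, DEFAULT) AS t
JOIN sys.trace_events AS te ON t.EventClass = te.trace_event_id
WHERE t.EventClass IN (46, 47, 164)   -- Object:Created, Object:Deleted, Object:Altered
ORDER BY t.StartTime DESC;
```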
If the information recorded there is not sufficient, you can add a DDL trigger to perform your own logging (e.g. recording HOST_NAME(), though that can be spoofed).
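A minimal sketch of such a DDL trigger (the log table and trigger names are made up); EVENTDATA() captures the full details of the change, including the T-SQL statement:

```sql
-- Hypothetical audit table for schema changes.
CREATE TABLE dbo.SchemaChangeLog (
    LogId     INT IDENTITY(1,1) PRIMARY KEY,
    EventTime DATETIME      NOT NULL DEFAULT GETDATE(),
    LoginName SYSNAME       NOT NULL,
    HostName  NVARCHAR(128) NULL,       -- HOST_NAME() can be spoofed by the client
    EventData XML           NOT NULL    -- full details: object, T-SQL statement, etc.
);
GO

CREATE TRIGGER trg_LogSchemaChanges
ON DATABASE
FOR CREATE_TABLE, ALTER_TABLE, DROP_TABLE, RENAME
AS
BEGIN
    INSERT INTO dbo.SchemaChangeLog (LoginName, HostName, EventData)
    VALUES (ORIGINAL_LOGIN(), HOST_NAME(), EVENTDATA());
END;
GO
```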
If you are using SQL Server 2008 and above, you can use SQL Server Audit.
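A rough sketch of setting that up (the audit names and file path are placeholders); SCHEMA_OBJECT_CHANGE_GROUP covers CREATE, ALTER, and DROP of schema objects:

```sql
-- Server-level audit destination (path is a placeholder).
CREATE SERVER AUDIT SchemaChangeAudit
TO FILE (FILEPATH = 'C:\AuditLogs\');
ALTER SERVER AUDIT SchemaChangeAudit WITH (STATE = ON);

-- Database-level specification: record schema object changes in this database.
CREATE DATABASE AUDIT SPECIFICATION SchemaChangeAuditSpec
FOR SERVER AUDIT SchemaChangeAudit
ADD (SCHEMA_OBJECT_CHANGE_GROUP)
WITH (STATE = ON);

-- Read the collected events back.
SELECT event_time, server_principal_name, object_name, statement
FROM sys.fn_get_audit_file('C:\AuditLogs\*.sqlaudit', DEFAULT, DEFAULT);
```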
With earlier versions you generally cannot attach triggers to the system tables that hold the schema information; a database-level DDL trigger (available from SQL Server 2005) that logs schema change events is the usual alternative.
That's just about as bad as GRANT DBA TO PUBLIC. You had best rewrite the code and restrict the SA privilege to one or a few DBAs - column renaming is not the only havoc they could wreak. Having a common login ID is also a bad idea, because you have no way of pinpointing exactly who did what.
I have around 10 tables containing millions of rows. Now I want to archive 40% of the data because of size and performance problems.
What would be the best way to archive the old data while keeping the web application running? And what if, in the near future, I need to show the old data alongside the existing data?
Thanks in advance.
There is no single solution that fits every case. It depends a lot on your data structure and application requirements. The most common cases are as follows:
If your application can't be redesigned and instant access is required to all your data, you need a more powerful hardware/software solution.
If your application can't be redesigned but some of your data can be considered obsolete because it's requested relatively rarely, you can split the data and configure two applications to access the different data sets.
If your application can't be redesigned but some of your data can be considered non-sensitive and can be minimized (consolidated, packed, etc.), you can perform some data transformation while keeping the full data in another place for special requests.
If it's possible to redesign your application, there are many ways to solve the problem. In general you will implement some kind of archive subsystem, and in general it's a complex problem, especially if not only your data changes over time but the data structure changes too.
If it's possible to redesign your application, you can optimize your data structure using new supporting tables, indexes, and other database objects and algorithms.
Create an archive database and, if possible, maintain it on a separate archive server; this data won't be needed often but still has to be kept for future purposes, so this reduces the load and space used on the main server.
Move the old table data to that location. Later you can retrieve it in a number of ways:
changing the application's data path to point at the archive,
or updating the live table from the archive table.
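As a rough illustration of the move-and-retrieve step (the database, table, and cutoff column names below are all placeholders, written in SQL Server-style cross-database syntax, and the archive table is assumed to already exist with the same columns):

```sql
-- Copy rows older than the cutoff into the archive database, then remove them from the live table.
INSERT INTO ArchiveDb.dbo.Orders_Archive
SELECT *
FROM   LiveDb.dbo.Orders
WHERE  CreatedAt < '2020-01-01';

DELETE FROM LiveDb.dbo.Orders
WHERE  CreatedAt < '2020-01-01';
GO

-- If old and new data ever need to be shown together again, a view can union them back:
CREATE VIEW dbo.Orders_All AS
SELECT * FROM LiveDb.dbo.Orders
UNION ALL
SELECT * FROM ArchiveDb.dbo.Orders_Archive;
GO
```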