I just want to ask about transaction logs in SQL Server. We can back up those log files in .bak format to any location on our system.
The problem is extracting the SQL statement/query from a transaction log backup file. We can read the backup using the fn_dump_dblog function, but what we want is to extract the query, or the data the transaction changed, from the logs.
I want to do this manually, the same way the "Apex" tool does for SQL Server, and without using any third-party tool.
Right now I am able to extract the table name and operation type from the logs, but I am still searching for a way to extract the SQL statement.
Decoding the contents of the transaction log is exceptionally tricky - there is a reason Apex gets to charge money for the tool that does it - it's a lot of work to get it right.
The transaction log itself is a record of the changes that occurred, not a record of what the query was that was executed to make the change. In your question you mention extracting the query - that is not possible, only the data change can be extracted.
For simple insert / delete transactions it is possible to decode them, but the complexity of just doing that is too large to reproduce here in detail. The simpler scenario of decoding the online log using fn_dblog is analogous, and the complexity of that alone should give you an idea of how difficult it is. You can extract the operation type plus the hex data in the RowLogContents column - depending on the type of operation, the RowLogContents can be 'relatively' simple to decode, since it is the same format as a row at a binary / hex level on the page.
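As a starting point, here is a minimal sketch (assuming a SQL Server instance reachable via pyodbc; the connection string and the filter on LOP_INSERT_ROWS / LOP_DELETE_ROWS are illustrative) that pulls the operation type, allocation unit name and raw row bytes out of fn_dblog:

```python
import pyodbc

# Assumption: adjust the server / database / authentication for your environment.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=MyDb;Trusted_Connection=yes;"
)

# fn_dblog(NULL, NULL) reads the active portion of the online transaction log.
# Operation and AllocUnitName identify what happened and to which table/index;
# [RowLog Contents 0] holds the raw row image that has to be decoded by hand.
sql = """
SELECT [Current LSN], Operation, AllocUnitName, [Transaction ID],
       [RowLog Contents 0]
FROM   fn_dblog(NULL, NULL)
WHERE  Operation IN ('LOP_INSERT_ROWS', 'LOP_DELETE_ROWS');
"""

for lsn, op, alloc_unit, tx_id, row_bytes in conn.execute(sql):
    # row_bytes is a binary row image; decoding it requires knowing the table's
    # column layout (fixed-length columns, null bitmap, variable-length offsets),
    # which is what the articles linked below walk through.
    print(lsn, op, alloc_unit, tx_id, row_bytes.hex() if row_bytes else None)
```

Decoding the row image back into column values is where the real work is; a query like the one above only gets you the raw material.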
I am loath to use a link as an example / answer, but the length of the explanation for even a simple scenario is non-trivial. My only redemption for the link answer is that it is my own article - so that's also the disclaimer. The length and complexity really make this question unanswerable with a positive answer!
https://sqlfascination.com/2010/02/03/how-do-you-decode-a-simple-entry-in-the-transaction-log-part-1/
https://sqlfascination.com/2010/02/05/how-do-you-decode-a-simple-entry-in-the-transaction-log-part-2/
There have been further articles published that build on this and try to automate the logic in T-SQL itself.
https://raresql.com/2012/02/01/how-to-recover-modified-records-from-sql-server-part-1/
The time and effort you will spend attempting to write your own decoder is high enough, compared to the cost of a license, that I would never recommend writing your own software for this unless you plan on selling it.
You may also wish to consider alternative tracing mechanisms placed in-line with the running application code, rather than something you try to reverse engineer out of a backup.
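For example (this is a named alternative, not something the question asked about): if the goal is simply to capture what changed, Change Data Capture in SQL Server records data changes as they happen. A minimal sketch of enabling it, where dbo.Orders is a placeholder table:

```python
import pyodbc

# Assumption: the account used here has the permissions needed to enable CDC,
# and the edition of SQL Server in use supports it.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=MyDb;Trusted_Connection=yes;",
    autocommit=True,
)

# Enable Change Data Capture at the database level, then for one table.
conn.execute("EXEC sys.sp_cdc_enable_db;")
conn.execute("""
    EXEC sys.sp_cdc_enable_table
         @source_schema = N'dbo',
         @source_name   = N'Orders',   -- placeholder table
         @role_name     = NULL;
""")

# Changes then become queryable from the generated change table
# (cdc.dbo_Orders_CT) or the cdc.fn_cdc_get_all_changes_dbo_Orders function.
```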
Problem
Every day we receive a new set of data files from our back-office application. This application is not able to produce an incremental changeset, so all it can do is dump everything to a large file.
Currently, every morning we drop our old MySQL tables and load the data into our database.
One of the problems we have here is that we are unable to act on specific changes in the data; also, we are using CQRS and would benefit quite a bit from having an incremental list.
File format is currently CSV
Data size per file is up to 10GB
Number of rows per file is up to 40 million
Approximately 30 data files
On average less than 1% of rows is changed each day
Most files have either no primary key or a composite primary key. For many, the full row is the only thing that makes them unique.
The order of data is not fixed. Rows may switch positions
Desired situation
When we receive the new data, we calculate the difference and push a message to Kafka for each changed (if a row identifier exists), added or removed row.
Technology
We use AWS and are able to use all technologies AWS offers
We are not limited to a certain amount of hardware. We can just start up some new servers in AWS
Cost is only a very limited factor. We have quite a large budget and the ability to have an incremental set offers us quite a lot of value.
We have a running Kubernetes cluster
Question
So the main question is: what would be the best way to compare these two large files and create an incremental set? We need it to be fast, preferably within an hour or close to that.
Are there database types that have this natively or are there technologies that can do this for us?
"...The order of data is not fixed. Rows may switch positions..." That is the one that makes it hard. If the rows did not change a git diff or text file comparison tool would work.
Spitballing here, but here is one approach (a rough Python sketch follows below):
For each row create a SHA hash
Use the hash as a unique ID
Store each UNIQUE hash and associated data into a DB Table.
After processing the file, dump the table into a text file (CSV/SQL/etc.)
Commit file changes to source control
When you receive a new data set, check if the hash exists
If no: append the hash to the end of the table
If yes: ignore
Dump the table into a text file (CSV/SQL/etc)
'git diff' commits to see change sets.
Might be able to do this with AWS Glue...
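As a rough sketch of the hashing and diffing step (the file names are placeholders, and the Kafka push is only indicated in a comment, e.g. via kafka-python or confluent-kafka):

```python
import csv
import hashlib

def row_hashes(path):
    """Map a SHA-256 hash of each row to the raw row of a CSV dump."""
    hashes = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            # Join with a unit separator so "a,bc" and "ab,c" hash differently.
            key = hashlib.sha256("\x1f".join(row).encode("utf-8")).hexdigest()
            hashes[key] = row
    return hashes

old = row_hashes("dump_yesterday.csv")  # placeholder file names
new = row_hashes("dump_today.csv")

added = [new[h] for h in new.keys() - old.keys()]
removed = [old[h] for h in old.keys() - new.keys()]

# In the real pipeline these would be pushed to Kafka (e.g. with a producer
# from kafka-python or confluent-kafka) instead of printed.
for row in added:
    print("ADDED", row)
for row in removed:
    print("REMOVED", row)
```

For 40-million-row files you would likely keep only the hashes (or spill them to a table or sorted files) rather than hold full rows in memory, but the diffing logic stays the same.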
Bonus:
To make it even easier, create a location the back-office app can upload the file to, and set up a cron job to process the report at a given time.
This process is a typical ETL (Extract-Transform-Load) task. You are extracting data from one source/format, changing it, and loading/inserting it into a different source/format.
Let me know if any of this was helpful.
We're developing a new product at work and it will require the use of a lightweight database. My coworkers and I, however, got into a debate over the conventions for database creation. They were of the mindset that we should just build a quick outline of the database and then go in and indiscriminately add and delete tables and such until it looks like what we want. I told them the proper way to do it was to make a script that follows a format similar to this:
Drop database;
Create Tables;
Insert Initial Data;
I said this was better than randomly changing tables. You should only make changes to the script and re-run the script every time you want to update the design of the database. They said it was pointless and that their way was faster (which holds a bit of weight since the database is kind of small, but I still feel it is a bad way of doing things). Their BIGGEST concern with this was that I was dropping the database; they were upset that I was going to delete the random data they had put in there for testing purposes. That's when I clarified that you include inserts as part of the script to act as initial data. They were still unconvinced. They told me that in all of their time with databases they had NEVER heard of such a thing. The truth is we all need more experience with databases, but I am CERTAIN that this is the proper way to develop a script and create a database. Does anyone have any online resources that clearly explain this method and can back me up? If I am wrong about this, then please feel free to correct me.
Well, I don't know the details of your project, but I think it's pretty safe to assume you're right on this one, for a number of very good reasons.
If you don't have a script that dictates how the database is structured, how will you create new instances of it? What happens when you deploy to production, or it gets accidentally deleted, or the server crashes? Having a script means you don't have to remember all the little details of how it was set up (which is pretty unlikely even for small databases).
It's way faster in the long run. I don't know about you, but in my projects I'm constantly bringing new databases online for things like unit testing, new branches, and deployments. If I had to recreate the database by hand every time, it would take forever. Yes, it takes a little extra time to maintain a database script, but it will almost always save you time over the life of the project.
It's not hard to do. I don't know what database you're using, but many of them support exporting your schema as a DDL script. You can just start with that and modify it from then on. No need to type it all up. If your database won't do that, it's worth a quick search to see if a third-party tool that works with your database will do it for you.
Make sure you check your scripts into your source control system. They're just as important as any other part of your source code.
I think having a data seeding script like you mentioned is a good idea. But keep it as a separate script from the database creation script. This way you can have a developer seed script, a unit testing seed script, a production seed script, etc.
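As a minimal sketch of that split (using SQLite purely as a stand-in for whatever lightweight database you end up with; the table and the seed rows are made up), one script rebuilds the schema and a separate seed script fills it:

```python
import sqlite3

SCHEMA = """
DROP TABLE IF EXISTS users;          -- the 'drop and recreate' step
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE
);
"""

DEV_SEED = """
INSERT INTO users (name, email) VALUES
    ('Test User', 'test@example.com'),
    ('Second User', 'second@example.com');
"""

def rebuild(db_path="app.db", seed_sql=None):
    """Recreate the schema from scratch, optionally applying a seed script."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    if seed_sql:
        conn.executescript(seed_sql)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    # Re-run this whenever the design changes; keep SCHEMA and the various
    # seed scripts (dev / test / production) in source control with the code.
    rebuild(seed_sql=DEV_SEED)
```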
I need to do research about log files that are not stored in the database. I do not know much about database systems, so I need someone to give me at least some ideas about it. What I was told is that some of the log files were not written to a bank's database. The log files come from various sources such as ATMs, the website, etc. For example, the reason could be a high rate of data flow causing some data to be left out.
The question is: what are the reasons behind this, and what could be the solutions?
I would really appreciate it if you could share some articles about it.
Sorry if I could not explain it well. Thanks in advance.
Edit: I did not mean that there is a system that intentionally does not write some of the log files to the database. What I meant is that some of the log files are not written to the database, the reason is not known, and my intention is to identify the possible reasons and solutions. The database belongs to a bank and, as you can imagine, a lot of data flows into it every second.
Well, the question is not very clear, so let me rephrase it:
What are the reasons why application logs are not stored in a database?
It depends on the context, and there are different reasons.
The first question is: why might you store logs in a database at all? Usually you do it because they contain data that is relevant to you and that you want to manipulate.
So why not always store this data:
you are not interested in the logs except when something goes wrong, and then it's more about debugging than about storing logs
you don't want to mix business data (users, transactions, etc.) with less important / less relevant data
the volume of logs is too large for your current system, and putting them in a database might bring it down completely
you might want to use another system to dig into the logs, with a different type of storage (Hadoop, big data, NoSQL)
when you back up the database, you usually back up the whole database; logs are not 'as important' as other critical data, they are bigger, and they would take up too much space
there is no need to always put logs in a database; plain text and some other tools (web server logs, for instance) are usually more than enough
So it is for these reasons that logs are in general not stored in the same database as the application.
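To illustrate the plain-text option, here is a minimal sketch of file-based logging with rotation (standard library only; the file name and size limits are arbitrary choices):

```python
import logging
from logging.handlers import RotatingFileHandler

# Plain-text, size-capped log files: no database involved, and ordinary tools
# (grep, tail, log shippers) can work with them directly.
handler = RotatingFileHandler("app.log", maxBytes=50 * 1024 * 1024, backupCount=10)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
)

logger = logging.getLogger("payments")       # placeholder logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("transaction processed")         # example log line
```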
Over the past several months I've seen quite a few unexpected bugs popping up in a legacy application, most of which are related to inconsistencies between the application code (classic ASP) and the underlying SQL Server database.
For example, a user reported a 500 error on a page last week that has been working correctly for five years. I discovered that the page in question was looking for a column in a result set named "AllowEditDatasheets", while the real column name was "AllowDatasheetEdit".
Normally I'd attribute this to publishing untested code but, as I said, the page has been working correctly for a very long time.
I've run across this quite a few times recently - pages that never should have worked but have been working.
I'm starting to suspect that another employee is making subtle changes to the database, such as renaming columns. Unfortunately, there are several applications that use a common login that was granted SA rights, and removing those rights would break a lot of code (Yes, I know this is poor design - don't get me started), so simply altering account permissions isn't a viable solution.
I'm looking for a way to track schema changes. Ideally, I'd be able to capture the IP address of the machine that makes these sorts of changes, as well as the change that was made and the date/time when it occurred.
I know I can create a scheduled process that will script the database and commit the scripts to our source control system, which will let me know when these changes occur, but that doesn't really help me find the source.
Any suggestions?
The default trace already tracks schema changes.
In Management Studio you can right-click the node of the database of interest and, from the Reports menu, view the "Schema Changes History" report, which pulls its data from there.
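If you would rather query it directly, a sketch along these lines reads the default trace file (event classes 46, 47 and 164 are Object:Created, Object:Deleted and Object:Altered; the connection string is a placeholder):

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=master;Trusted_Connection=yes;"
)

# Find the default trace file, then pull schema-change events out of it.
# 46 = Object:Created, 47 = Object:Deleted, 164 = Object:Altered.
sql = """
SET NOCOUNT ON;
DECLARE @path NVARCHAR(260);
SELECT @path = path FROM sys.traces WHERE is_default = 1;

SELECT t.StartTime, t.HostName, t.LoginName, t.DatabaseName,
       t.ObjectName, t.EventClass
FROM   sys.fn_trace_gettable(@path, DEFAULT) AS t
WHERE  t.EventClass IN (46, 47, 164)
ORDER BY t.StartTime DESC;
"""

for row in conn.execute(sql):
    print(row)
```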
If the information recorded there is not sufficient, you can add a DDL trigger to perform your own logging (e.g. recording HOST_NAME(), though that can be spoofed).
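A minimal sketch of such a trigger (the audit table name and the captured columns are illustrative):

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=MyDb;Trusted_Connection=yes;",
    autocommit=True,
)

# Audit table for schema changes (name and columns are just an example).
conn.execute("""
CREATE TABLE dbo.SchemaChangeLog (
    EventTime DATETIME NOT NULL DEFAULT GETDATE(),
    LoginName SYSNAME  NOT NULL,
    HostName  SYSNAME  NULL,
    EventData XML      NOT NULL
);
""")

# DDL trigger: fires on any database-level DDL and records who, from which
# host, and the full event XML (which includes the statement text).
conn.execute("""
CREATE TRIGGER trg_LogSchemaChanges
ON DATABASE
FOR DDL_DATABASE_LEVEL_EVENTS
AS
BEGIN
    INSERT INTO dbo.SchemaChangeLog (LoginName, HostName, EventData)
    VALUES (ORIGINAL_LOGIN(), HOST_NAME(), EVENTDATA());
END;
""")
```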
If you are using SQL Server 2008 and above, you can use SQL Server Audit.
With earlier versions, you may be able to add triggers to the system tables that hold schema information and log changes to those.
That's just as bad as GRANT DBA TO PUBLIC! You'd best rewrite the code and restrict the SA privilege to one or a few DBAs. Column renaming is not the only thing they could wreak havoc on. Having a common login ID is also not a good idea, because you have no way of pinpointing exactly who did what.
I have the same problem as somebody described in another post. My application's log files are huge (~1 GB), and grep is tedious to use for correlating information across them. Right now I use the "less" tool, but it is also slower than I would like.
I am thinking of speeding up the search. There are the following ways to do this: first, generate logs in XML and use some XML search tool. I am not sure how much speedup will be obtained using XML search (not much I guess, since non-indexed file search will still take ages).
Second, use an XML database. This would be better, but I don't have much background here.
Third, use a (non-XML) database. This would be somewhat tedious since the table schema has to be written (does that have to be done for the second option above too?). I also foresee the schema changing a lot at the start to include common use cases. Ideally, I would like something lighter than a full-fledged database for storing the logs.
Fourth, use Lucene. It seems to fit the purpose, but is there a simple way to specify the indexes for the current use case? For example, I want to say "index whenever you see the word 'iteration'".
What is your opinion?
The problem is that using XML will make your log files even bigger.
I would suggest either splitting up your log files by date or by number of lines,
or otherwise using a file-based database engine such as SQLite.
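As a sketch of the SQLite route (using the FTS5 full-text extension, which ships with most SQLite builds; the log file name is a placeholder):

```python
import sqlite3

conn = sqlite3.connect("logs.db")
# FTS5 virtual table: gives indexed full-text search over log lines.
conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS logs USING fts5(line)")

# Load the log once (placeholder file name); searches are fast afterwards.
with open("application.log", encoding="utf-8", errors="replace") as f:
    conn.executemany(
        "INSERT INTO logs (line) VALUES (?)",
        ((line.rstrip("\n"),) for line in f),
    )
conn.commit()

# Full-text query, e.g. every line mentioning 'iteration'.
for (line,) in conn.execute("SELECT line FROM logs WHERE logs MATCH ?", ("iteration",)):
    print(line)
```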
A gigabyte isn't that big, really. What kind of "correlation" are you trying to do with these log files? I've often found it's simpler to write a custom program (or script) to handle a log file in a particular way than it is to try to come up with a database schema to handle everything you'll ever want to do with it. Of course, if your log files are hard to parse for whatever reason, it may well be worth trying to fix that aspect.
(I agree with kuoson, by the way - XML is almost certainly not the way to go.)
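In that spirit, here is a tiny sketch of a purpose-built script; the regex and the idea of grouping lines by a request ID are assumptions about what the correlation needs to look like:

```python
import re
import sys
from collections import defaultdict

# Group log lines by a request/session ID so related entries can be read
# together; the pattern is a placeholder for whatever your log format uses.
REQUEST_ID = re.compile(r"request_id=(\w+)")

def correlate(path):
    groups = defaultdict(list)
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:  # streams the file, roughly constant memory
            match = REQUEST_ID.search(line)
            if match:
                groups[match.group(1)].append(line.rstrip("\n"))
    return groups

if __name__ == "__main__":
    for request_id, lines in correlate(sys.argv[1]).items():
        print(f"--- {request_id} ({len(lines)} lines)")
        for line in lines:
            print("   ", line)
```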
If you can check your logs on Windows, or using Wine, LogParser is a great tool to mine data out of logs. It practically allows you to run SQL queries on any log, with no need to change any code or log formats, and it can even be used to generate quick HTML or Excel reports.
Also, a few years ago, when XML was all the hype, I was using XML logs and XSLT stylesheets to produce views. It was actually kind of nice, but it used way too much memory and it would choke on large files, so you probably DON'T want to use XML.
The trouble with working on log files is that each one has to be queried individually; you'll get a much faster response if you create an index of the log files and search/query that instead. Lucene would be my next port of call, then Solr.
Maybe you could load your log into Emacs (provided you have sufficient memory) and use the various Emacs features such as incremental search and Alt-X occur.
Disclaimer: I haven't tried this on files > 100MB.