Ghost data rows added into Firebird database table? - database

I faced today strange case when receiving customer database for investigation.
System settings:
Firebird server v 2.5.9.26074
Firebird client v 2.6.5
Database file is accessed directly by the application, i.e., it is NOT registered via aliases.conf.
When I first looked into database, everything seemed to be pretty consistent. However, during the first startup there are two rows added in certain table without any detected SQL execution. I have confirmed with debugger that the application is not adding these rows. I also used Audit and Trace inferface (fbtracemgr) and saw in log file that there are not such rows added to the database.
There is one hint that something is wrong in the original database. The table that contains the problem is using INSERT trigger to set the table row's ID column value from generator. Now the generator value seem to be one too high in the original database. This leads me to think that the "ghost data" has already been entered in the file in some sort of cache as the generator is already increment by one.
The result is that after these the two ghost rows are added, the next real addition to the table leads into exception:
FirebirdSql.Data.FirebirdClient.FbException (0x80004005): violation of
PRIMARY or UNIQUE KEY constraint "INTEG_275" on table "DATALOG" --->
violation of PRIMARY or UNIQUE KEY constraint "INTEG_275" on table
"DATALOG"
as there already exist row with equal ID that the generator suggests.
Is there persistent "unsaved data cache" that could contain row data entered during the previous application runs? What could lead to this situation? Power break during database writing or backuping?
Any thoughts?

Firebird server v 2.5.9.26074
There is no such version released.
Firebird-2.5.8.27089
http://www.firebirdsql.org/en/firebird-2-5/
Basically u seem to use some destabilized FB developers internal build, which can have any number of strange averse effects.
So I would advice to use standard released verison or if using snapshot builds is required for some untold reasons - to ask developers in firebird-support mail list - http://www.firebirdsql.org/en/support/
Though don't hold your breath for much of support over exotic Firebird builds.
UPD. Thanks to Mark, here it is: https://www.firebirdsql.org/en/firebird-2-5-0/
2.5.0 - was the first release after a significant reworking of the engine. Not the most stable, obviously. For example there was an issue with indices right in the next 2.5.1 version.
if the behavior would be repeated on standard 2.5.8 Firebird, then i would suggest exporting all the database (at least all the meta-data, but maybe the data as well) into a long text file, SQL script, and then searching for the said table name in it. For example there might be on-database-connect triggers adding some data. Or stored procedures. Or views made on triggers. Or something yet else. For example - though malpractice - even UDF function may make it's own database connection and do things, though this should be shown in FBTrace.
However, during the first startup there are two rows added in certain table
startup of what ?
will those rows still be added if you use standard tools like iSQL/FlameRobin/IBExpert/etc just to connect and then disconnect from the database?
as there already exist row with equal ID that the generator suggests
Generator can not suggest things like that. It can only suggest that once such a number was reserved for possibly being added to one or another table. It does not mean the row was actually inserted, was inserted into that table, was not deleted later.
You may try to search with indices prohibited, in case index corruption could occur, something like
select id+0, count(*) from tableName group by 1
Also http://www.firebirdfaq.org/faq324/
when receiving customer database for investigation
BTW, how exactly did they created a copy of the database to give you?
Did they made back-up (FBK) ? If not, did they stopped Firebird server before making copies?

Related

Set Null works in Sqlite but not works in Sql Server when relating table itself

The problem is that "DeleteBehavior.SetNull" works only in Sqlite and doesn't work at all in Sql Server, is this some limitation of Sql Server with SET NULL?
I have the "User" model:
User.Id
User.Name
And I also have the "Partner" model:
Partner.Id
Partner.Title
Partner.ParentId
Partner.Parent (virtual)
Scenario:
I create Partner 1
I create Partner 2 and define that the ParentId is Partner 1 (1 is the father of 2)
I try to delete Partner 1 (I try to delete the parent)
At that moment, Sqlite defines NULL in the ParentId of Partner 2, that's correct, that's the behavior I want, but in SQL Server I can't do that at all, I tried innumerable ways and I fall into some errors, follow below:
Errors:
Delete Error:
Microsoft.Data.SqlClient.SqlException (0x80131904): The DELETE statement conflicted with the SAME TABLE REFERENCE constraint "FK_Partners_Partners_ParentId". The conflict occurred in database "master", table "dbo.Partners", column 'ParentId'.
Migrations Error:
Introducing FOREIGN KEY constraint 'FK_Partners_Partners_ParentId' on table 'Partners' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.
Could not create constraint or index. See previous errors.
I even found some old texts saying that this is a Sql Server limitation, but it's already 2023 and this limitation still exists? Is it possible to get around this in some way that is easy and affects every table in the database?
I already tried all the DefaultBehavior and none works like Sqlite, I was programming 100% in Sqlite and I managed to develop a system and everything is working, however when generating the migration and trying to use Sql Server I came across this problem.
The same thing is asked at dba.stackexchange.com. The answers explain in detail why this isn't so easy to implement. Relational databases operate on sets of rows at a time, not individual rows. Deleting or updating rows one by one is the slowest way possible.
While SQLite is built to handle a few thousand rows for a single application running inside a watch, SQL Server has to handle thousands of concurrent operations to the same table that may contain several millions of rows spread across multiple partitions. The self-referencing ON DELETE SET NULL has to work reliably and predictably when deleting 1 row in an 1000 row table and when deleting 10K rows in a 50M row table.
As Mikael Eriksson explains in the first answer, ON DELETE SET NULL converts a DELETE operation on a table to an UPDATE operation on the same table.
This DBA question on cascading DELETEs shows what's involved in the easy case :
In this picture the server :
Finds the rows that need to be deleted in the first table,
Removes them from the parent table. That means marking rows and pages for deletion, writing records to the transaction log
Spool the deleted keys so they can be used on the related table
Repeat 1-2 on the child table
When all that finishes, commit the transaction by committing all changes in the data pages and the transaction log.
And that's a single operation. ON DELETE SET NULL on the other hand converts the DELETE operation to an DELETE and an UPDATE on the same table. The database would have to both DELETE and UPDATE index rows on the ParentID index to get this to happen. Different kinds of locks would have to be taken, and some of them could be taken
There's a similar statement that does multiple operations at once, MERGE. Aaron Bertrand's Use Caution with SQL Server's MERGE Statement shows a list of 30 bugs for that statement alone. MERGE isn't even atomic and the UPDATE/DELETE/INSERT operations are executed separately, which is the cause of some of the bugs.
I'd rather not have ON DELETE SET NULL than have a slow or unreliable one
While trying to reproduce this I found an SQLite limitation - foreign keys aren't enforced by default for compatibility with the way it worked over a decade ago. The docs warn this can change in the future:
Foreign key constraints are disabled by default (for backwards compatibility), so must be enabled separately for each database connection. (Note, however, that future releases of SQLite might change so that foreign key constraints enabled by default. Careful developers will not make any assumptions about whether or not foreign keys are enabled by default but will instead enable or disable them as necessary.) The application can also use a PRAGMA foreign_keys statement to determine if foreign keys are currently enabled.
This can seem like an illogical restriction in 2023 until one remembers that SQLite was built to run on the weakest possible devices (microcontrollers, not even processors) where the very fact of checking constraints can cause significant problems. Those devices can easily be inside a car or other hardware device with a lifetime of decades.

How can I refer to a database deployed from the same DACPAC file, but with a different db name?

Background
I have a multi-tenant scenario and a unique Sql Server project that will be deployed into multiple databases instances on the same server. There will be one db for each tenant, plus one "model" db.
The "model" database serves three purposes:
Force some "system" data to be always present in each tenant database
Serves as an access point for users with a special permission to edit system data (which will be punctually synced to all tenants)
When creating a new tenant, the database will be copied and attached with a new name representing the tenant
There are triggers that checks if the modified / deleted data within tenant db corresponds to "system" data inside the "model" db. If it does, an error is raised saying that system data cannot be altered.
Issue
So here's a part of the trigger that checks if deletion can be allowed:
IF DB_NAME() <> 'ModelTenant' AND EXISTS
(
SELECT
[deleted].*
FROM
[deleted]
INNER JOIN [---MODEL DB NAME??? ---].[MySchema].[MyTable] [ModelTable]
ON [deleted].[Guid] = [ModelTable].[Guid]
)
BEGIN;
THROW 50000, 'The DELETE operation on table MyTable cannot be performed. At least one targeted record is reserved by the system and cannot be removed.', 1
END
I can't seem to find what should take the place of --- MODEL DB NAME??? --- in the above code that would allow the project to compile properly. When refering to a completely different project I know what to do: use a reference to the project that's represented with an SQLCMD variable. But in this scenario the project reference is essentially the same project; only on a different database. I can't seem to be able to add a self-reference in this manner.
What can I do? Does SSDT offers some kind of support for such a scenario?
Have you tried setting up a Database Variable? You can read under "Reference aware statements" here. You could then say:
SELECT * FROM [$(MyModelDb)][MySchema].[MyTable] [ModelTable]
If you don't have a specific project for $(MyModelDb) you can choose the option to "suppress errors by unresolved references...". It's been forever since I've used SSDT projects, but I think that should work.
TIP: If you need to reference 1-table 100-times, you may find it better to create a SYNONM that uses the database variable, then point to the SYNONM in your SPROCs/TRIGGERs. Why? Because that way you don't need to deploy your SPROCs/TRIGGERs to get the variable replaced with the actual value and that can make development smoother.
I'm not quite sure if SSDT is particularly well-suited to projects of any decent amount of complexity. I can think of one or two ways to most likely accomplish this (especially depending on exactly how you do the publishing / deployment), but I think you would actually lose more than you gain. What I mean by that is: you could add steps to get this to work (i.e. win the battle), but you would be creating a more complex system in order to get SSDT to publish a system that is more complex (and slower) than it needs to be (i.e. lose the war).
Before worrying about SSDT, let's look at why you need/want SSDT to do this in the first place. You have system data intermixed with tenant data, and you need to validate UPDATE and DELETE operations to ensure that the system data does not get modified, and the only way to identify data that is "system" data is by matching it to a home-of-record -- ModelDB -- based on GUID PKs.
That theory on identifying what data belongs to the "system" and not to a tenant is your main problem, not SSDT. You are definitely on the right track for a multi-tentant system by having the "model" database, but using it for data validation is a poor design choice: on top of the performance degradation already incurred from using GUIDs as PKs, you are further slowing down all of these UPDATE and DELETE operations by funneling them through a single point of contention since all client DBS need to check this common source.
You would be far better off to include a BIT field in each of these tables that mixes system and tenant data, denoting whether the row was "system" or not. Just look at the system catalog views within SQL Server:
sys.objects has an is_ms_shipped column
sys.assemblies went the other direction and has an is_user_defined column.
So, if you were to add an [IsSystemData] BIT NOT NULL column to these tables, your Trigger logic would become:
IF DB_NAME() <> N'ModelTenant' AND EXISTS
(
SELECT del.*
FROM [deleted] del
WHERE del.[IsSystemData] = 1
)
BEGIN
;THROW 50000, 'The DELETE operation on table MyTable cannot be performed. At least one targeted record is reserved by the system and cannot be removed.', 1;
END;
Benefits:
No more SSDT issue (at least for from this part ;-)
Faster UPDATE and DELETE operations
Less contention on the shared resource (i.e. ModelDB)
Less code complexity
As an alternative to referencing another database project, you can produce a dacpac, then reference the dacpac as a database reference in "same server, different database" mode.

Does Oracle (RDB in general?) take a snapshot of the table affected by DML?

Objective
To understand the mechanism/implementation when processing DMLs against a table. Does a database (I work on Oracle 11G R2) take snapshots (for each DML) of the table to apply the DMLs?
Background
I run a SQL to update the AID field of the target table containing old values with the new values from the source table.
UPDATE CASES1 t
SET t.AID=(
SELECT DISTINCT NID
FROM REF1
WHERE
oid=t.aid
)
WHERE EXISTS (
SELECT 1
FROM REF1
WHERE oid=t.aid
);
I thought the 'OLD01' could be updated twice (OLD01 -> NEW01 -> SCREWED).
However, it did not happen.
Question
For each DML, does a database take a snapshot of table X (call it X+1) for a DML (1st) and then keep taking snapshot (call it X+2) of the result (X+1) for the next DML (2nd) on the table, and so on for each DML that are successibly executed? Is this also used as a mechanism to implement Rollback/Commit?
Is it an expected behaviour specified as a standard somewhere? If so, kindly suggest relevant references. I Googled but not sure what the key words should be to get the right result.
Thanks in advance for your help.
Update
Started reading Oracle Core (ISBN 9781430239543) by Jonathan Lewis and saw the diagram. So current understanding is the UNDO records are created in the UNDO tablespace for each update and the original data is reconstructed from there, which I initially thought as snapshots.
In Oracle, if you ran that update twice in a row in the same session, with the data as you've shown, I believe you should get the results that you expected. I think you must have gone off track somewhere. (For example, if you executed the update once, then without committing you opened a second session and executed the same update again, then your result would make sense.)
Conceptually, I think the answer to your question is yes (speaking specifically about Oracle, that is). A SQL statement effectively operates on a snapshot of the tables as of the point in time that the statement starts executing. The proper term for this in Oracle is read-consistency. The mechanism for it, however, does not involve taking a snapshot of the entire table before changes are made. It is more the reverse - records of the changes are kept in undo segments, and used to revert blocks of the table to the appropriate point in time as needed.
The documentation you ought to look at to understand this in some depth is in the Oracle Concepts manual: http://docs.oracle.com/cd/E11882_01/server.112/e40540/part_txn.htm#CHDJIGBH

Add DATE column to store when last read

We want to know what rows in a certain table is used frequently, and which are never used. We could add an extra column for this, but then we'd get an UPDATE for every SELECT, which sounds expensive? (The table contains 80k+ rows, some of which are used very often.)
Is there a better and perhaps faster way to do this? We're using some old version of Microsoft's SQL Server.
This kind of logging/tracking is the classical application server's task. If you want to realize your own architecture (there tracking architecture) do it on your own layer.
And in any case you will need application server there. You are not going to update tracking field it in the same transaction with select, isn't it? what about rollbacks? so you have some manager who first run select than write track information. And what is the point to save tracking information together with entity info sending it back to DB? Save it into application server file.
You could either update the column in the table as you suggested, but if it was me I'd log the event to another table, i.e. id of the record, datetime, userid (maybe ip address etc, browser version etc), just about anything else I could capture and that was even possibly relevant. (For example, 6 months from now your manager decides not only does s/he want to know which records were used the most, s/he wants to know which users are using the most records, or what time of day that usage pattern is etc).
This type of information can be useful for things you've never even thought of down the road, and if it starts to grow large you can always roll-up and prune the table to a smaller one if performance becomes an issue. When possible, I log everything I can. You may never use some of this information, but you'll never wish you didn't have it available down the road and will be impossible to re-create historically.
In terms of making sure the application doesn't slow down, you may want to 'select' the data from within a stored procedure, that also issues the logging command, so that the client is not doing two roundtrips (one for the select, one for the update/insert).
Alternatively, if this is a web application, you could use an async ajax call to issue the logging action which wouldn't slow down the users experience at all.
Adding new column to track SELECT is not a practice, because it may affect database performance, and the database performance is one of major critical issue as per Database Server Administration.
So here you can use one very good feature of database called Auditing, this is very easy and put less stress on Database.
Find more info: Here or From Here
Or Search for Database Auditing For Select Statement
Use another table as a key/value pair with two columns(e.g. id_selected, times) for storing the ids of the records you select in your standard table, and increment the times value by 1 every time the records are selected.
To do this you'd have to do a mass insert/update of the selected ids from your select query in the counting table. E.g. as a quick example:
SELECT id, stuff1, stuff2 FROM myTable WHERE stuff1='somevalue';
INSERT INTO countTable(id_selected, times)
SELECT id, 1 FROM myTable mt WHERE mt.stuff1='somevalue' # or just build a list of ids as values from your last result
ON DUPLICATE KEY
UPDATE times=times+1
The ON DUPLICATE KEY is right from the top of my head in MySQL. For conditionally inserting or updating in MSSQL you would need to use MERGE instead

'tail -f' a database table

Is it possible to effectively tail a database table such that when a new row is added an application is immediately notified with the new row? Any database can be used.
Use an ON INSERT trigger.
you will need to check for specifics on how to call external applications with the values contained in the inserted record, or you will write your 'application' as a SQL procedure and have it run inside the database.
it sounds like you will want to brush up on databases in general before you paint yourself into a corner with your command line approaches.
Yes, if the database is a flat text file and appends are done at the end.
Yes, if the database supports this feature in some other way; check the relevant manual.
Otherwise, no. Databases tend to be binary files.
I am not sure but this might work for primitive / flat file databases but as far as i understand (and i could be wrong) the modern database files are encrypted. Hence reading a newly added row would not work with that command.
I would imagine most databases allow for write triggers, and you could have a script that triggers on write that tells you some of what happened. I don't know what information would be available, as it would depend on the individual database.
There are a few options here, some of which others have noted:
Periodically poll for new rows. With the way MVCC works though, it's possible to miss a row if there were two INSERTS in mid-transaction when you last queried.
Define a trigger function that will do some work for you on each insert. (In Postgres you can call a NOTIFY command that other processes can LISTEN to.) You could combine a trigger with writes to an unpublished_row_ids table to ensure that your tailing process doesn't miss anything. (The tailing process would then delete IDs from the unpublished_row_ids table as it processed them.)
Hook into the database's replication functionality, if it provides any. This should have a means of guaranteeing that rows aren't missed.
I've blogged in more detail about how to do all these options with Postgres at http://btubbs.com/streaming-updates-from-postgres.html.
tail on Linux appears to be using inotify to tell when a file changes - it probably uses similar filesystem notifications frameworks on other operating systems. Therefore it does detect file modifications.
That said, tail performs an fstat() call after each detected change and will not output anything unless the size of the file increases. Modern DB systems use random file access and reuse DB pages, so it's very possible that an inserted row will not cause the backing file size to change.
You're better off using inotify (or similar) directly, and even better off if you use DB triggers or whatever mechanism your DBMS offers to watch for DB updates, since not all file updates are necessarily row insertions.
I was just in the middle of posting the same exact response as glowcoder, plus another idea:
The low-tech way to do it is to have a timestamp field, and have a program run a query every n minutes looking for records where the timestamp is greater than that of the last run. The same concept can be done by storing the last key seen if you use a sequence, or even adding a boolean field "processed".
With oracle you can select an psuedo-column called 'rowid' that gives a unique identifier for the row in the table and rowid's are ordinal... new rows get assigned rowids that are greater than any existing rowid's.
So, first select max(rowid) from table_name
I assume that one cause for the raised question is that there are many, many rows in the table... so this first step will be taxing the db a little and take some time.
Then, select * from table_name where rowid > 'whatever_that_rowid_string_was'
you still have to periodically run the query, but it is now just a quick and inexpensive query

Resources