BaseX: database size is not being updated when deleting nodes - basex

I am running baseX 8.4.4 on my server and using the web-app for administration.
Now, when i delete nodes (via JAVA), the database-size shown in the webapp is not updating.
Is this just a bug of the GUI or is baseX somehow storing invisible backups? (as no backups are visible (see figure). Or are there other files needing this memory? All the Logs have been deleted.

Depending on the exact kind of update operation you are performing, the database may only shrink after an optimize call.

Related

How transparent should losing access to a Postgres-XL datanode be?

I have set-up a testing Postgres-XL cluster with the following architecture:
gtm - vm00
coord1+datanode1 - vm01
coord2+datanode2 - vm02
I created a new database, which contains a table that is distributed by replication. This means that I should have the exact copy of that table in each and every single datanode.
Doing operations on the table works great, I can see the changes replicated when connecting to all coordinator nodes.
However, when I simulate one of the datanodes going down, while I can still read the data in the table just fine, I cannot add or modify anything, and I receive the following error:
ERROR: Failed to get pooled connections
I am considering deploying Postgres-XL as a highly available database backend for a fair number of applications, and I cannot control how those applications interact with the database (it might be big a problem if those applications couldn't write to the database while one datanode is down).
To my understanding, Postgres-XL should achieve high availability for replicated tables in a very transparent way and should be able to support losing one or more datanodes (as long as at least one is still available - again, this is just for replicated tables), but this does not seem the case.
Is this the intended behaviour? What can be done in order to be able to withstand having one or more datanodes down?
So as it turns out not transparent at all. To my jaw dropping surprise at it turns out Postgres-XL has no build in high availably support or recovery. Meaning if you lose one node the database fails. And if you are using the round robbin or hash DISTRIBUTED BY options if you lose a disk in a node you have lost the entire database. I could not believe it, but that is the case.
They do have a "stand by" server option which is just a mirrored node for each node you have, but even this requires manually setting it to recover and doubles the number of nodes you need. For data protection you will have to use the REPLICATION DISTRIBUTED BY option which is MUCH slower and again has no fail over support so you will have to manually restart it and reconfigure it not to use the failing node.
https://sourceforge.net/p/postgres-xl/mailman/message/32776225/
https://sourceforge.net/p/postgres-xl/mailman/message/35456205/

How to use CHECKPOINT efficently in a application that uses FILESTREAM

I use FILESTREAM to store BLOBS in my client server application.
In past i had from time to time to clear all BLOBS by executing a command like:
UPDATE BLOBTABLE set BLOBFIELD = NULL
This clears the blobs, i did this to make the DB backup smaller.
But to make the blobs "disappear from disk" i then need to run
CHECKPOINT
Note: this was done as DBA activity, not as part of the software.
Now I realize that in my application I never call CHECKPOINT.
May be i should every time i delete a blob, should i?
Just to experss my self better i make an example of my real case:
My app allows to store files (like pdf documents).
those pdf are saved as blobs in a filestream field.
As the user deletes them (from the ui) I run a DELETE commmand.
I do not call CEHCKPOINT after it, so the garbage collection does not
run.
By considering this i realize that i do not have the full thing under control.
So my question is simply: do i need to run CHECKPOINT every time i delete one of those files? is there any drawback in doing this?
Thanks!
A database performs checkpoints in different moments, one of those is when backup is performed.
Since the checkpoint triggers the garbage collection it is not needed (exceptions could be huge or complex scenarios) to call CHECKPOINT in an application because the risk is to reduce the performance.
Better decide to use CHECKPOINT as a maintenance activity if needed, but keeping in mind that a database backup (or even stopping the sql service) has a checkpoint as consequence.

On the app engine development server why does the "Applying all pending transactions and saving the datastore" step take several seconds?

It seems like something that should take place in a flash, at least locally. Is there some time emulation of the production server that causes it?
It depends how much data is saved in your local datastore and how fast your disk is. I've noticed if you have a lot of deleted data in your datastore, the file still ends up being big.
If you want to clear the database, it's better to delete the whole file than delete all the individual entities.

What does a database log keep track of?

I'm quite new to SQL Server and was wondering what the difference between the SQL Server log is and a custom log (in my case, using log4net)? I guess there's more choice on what to log using log4net, but what things are automatically logged by the database? For example, if a user signs up to my site, would I have to manually log that transaction, or would that be recorded in the database's log automatically? I'm currently starting a project and would like to figure out exactly what I should bother logging.
Thanks
Apples and Oranges.
Log4net and other custom 'logging' is just a way to capture events an application is reporting. 'Log' in this context reffers to whatever store is used by this infrastucture to persist information about these events.
The database log on the other hand is something compeltely different. In order to maintain consistency and atomicity databases use a so called Write-Ahead-Log protocol. In WAL all changes are first durable written into a journal, or log, before being applied to the data. This allows recovery to replay the log (the journal) and get the data back into a consistent state, by rolling back any uncommited work.
Database logs have absolutely nothing to do with your application code. Any database update will be automatically logged by the engine, simply because this is how any data is updated in a database. You cannot modify that, nor do you have any access to what's written in the log (strictly speaking you can look into the log, but you won't find any usefull information for your application).
SQL log handles tansaction logging for rolling back or comiting data. They are usually only dealt with by someone who knows what they are doing restoring backups or shipping the logs to use for backups.
The log4net and other logging framweworks handle in code logging of exceptions, warning, or debug level info that you would like to output for your own info. They can be sent to a table in a database, command window, flat file or web service. Common logging scenarios are catching unhandled exceptions at the application level to help track down bugs, or in any try catch statements writing out the stack trace.
It keeps track of the transactions so it can roll them back or replay in case of a crash. Quite more involved than simple logging.
The two are almost completely unrelated.
A database log is used to rollback transactions, recover from crashes, etc. All good things to ensure database consistency. It has updates/inserts/deletes in it--not really anything about intent or what your app is trying to do unless it directly affects data in the database.
The application log on the other hand (with Log4Net) can be extremely useful when building and debugging your application. It is driven by you and should contain information that traces what your app is doing. This is something that can safely be turned off or reduced (by toggling the log level) when you no longer need it.
The SQL Server log file is actually used for maintaining it's own stability, but it's not terribly useful for normal developers. It's not what you think (and I what I thought), a list of SQL statements that have been run. It's a propriety format designed to help SQL recover from a crash or roll back transactions.
If you need to track what's going on in the system, the SQL transaction log won't be helpful, and it would be very difficult to get that information back out. Instead, I would suggest adding triggers on your tables that write information off to another table, or add some code in your data layer that saves off a log of what's going on. It could be as simple as wrapping the SQL command object with your own implementation, which saved SQL statements off to log4net in addition to whatever normal code it was executing.
It is the mechanism by which the RMDBS can assure atomicity and consistency, see ACID.

Memory leak using SQL FileStream

I have an application that uses a SQL FILESTREAM to store images. I insert a LOT of images (several millions images per days).
After a while, the machine stops responding and seem to be out of memory... Looking at the memory usage of the PC, we don't see any process taking a lot of memory (neither SQL or our application). We tried to kill our process and it didn't restore our machine... We then kill the SQL services and it didn't not restore to system. As a last resort, we even killed all processes (except the system ones) and the memory still remained high (we are looking in the task manager's performance tab). Only a reboot does the job at that point. We have tried on Win7, WinXP, Win2K3 server with always the same results.
Unfortunately, this isn't a one-shot deal, it happens every time.
Has anybody seen that kind of behaviour before? Are we doing something wrong using the SQL FILESTREAMS?
You say you insert a lot of images per day. What else do you do with the images? Do you update them, many reads?
Is your file system optimized for FILESTREAMs?
How do you read out the images?
If you do a lot of updates, remember that SQL Server will not modify the filestream object but create a new one and mark the old for deletion by the garbage collector. At some time the GC will trigger and start cleaning up the old mess. The problem with FILESTREAM is that it doesn't log a lot to the transaction log and thus the GC can be seriously delayed. If this is the problem it might be solved by forcing GC more often to maintain responsiveness. This can be done using the CHECKPOINT statement.
UPDATE: You shouldn't use FILESTREAM for small files (less than 1 MB). Millions of small files will cause problems for the filesystem and the Master File Table. Use varbinary in stead. See also Designing and implementing FILESTREAM storage
UPDATE 2: If you still insist on using the FILESTREAM for storage (you shouldn't for large amounts of small files), you must at least configure the file system accordingly.
Optimize the file system for large amount of small files (use these as tips and make sure you understand what they do before you apply)
Change the Master File Table
reservation to maximum in registry (FSUTIL.exe behavior set mftzone 4)
Disable 8.3 file names (fsutil.exe behavior set disable8dot3 1)
Disable last access update(fsutil.exe behavior set disablelastaccess 1)
Reboot and create a new partition
Format the storage volumes using a
block size that will fit most of the
files (2k or 4k depending on you
image files).

Resources