We want to add a load of records to an AS400 database file using COBOL several times a day. This file is also continually being updated and added to by 30 users through an online cobol screen, (albeit different records). My initial thought on this is one of horror, but is the File Sharing on the AS400 robust enough to handle this kind of multi threading, or does one task lock the file and release it when it has finished.
I'm an RPG programmer. I routinely have several hundred jobs adding, changing and deleting records from the same table all day long, for decades.
IBM i file sharing works very well - so well I never even think about it. There are a few tasks which require exclusive access to a file - backup & restore, for instance - but the sort of I/O that application programs perform works quite well with the typical 'shared update' access.
The COBOL application might lock records it has read until they are updated or released, or if processing under commitment control, until a commit or rollback.
Related
Question : Is keeping a lock on a record for a long period of time common practice with modern database systems ?
My understanding is locking records in a database (optimistic or pessimistic) is usually for very short period of time during a transaction.
The software I'm working with right now keeps locks on records for long periods of time :
A lock is kept on the record of the logged in user (in the ACTIVE_USERS' table) for the whole time the user is logged in the software.
Let say USER A is working on a file. The record corresponding to the file is locked until USER A saves the file or exit the file. So if a colleague, USER B tries to work on the same file, a popup shows up saying 'You can't work on this file because USER A is working on it right now'.
The company I'm working for to implement compatibility with Microsoft SQL Server wants the changes to be minimal : so I need to implement such a locking mechanism. I've hacked something that is working on a minimal test project but I'm not sure it is up to the industry and MSSQL's standards ...
This is a bit long for a comment.
Using the database locking mechanism for this application-level locking seems unusual. Database locks could be on the row, page, or table level, and they also affect indexes, so there could be unexpected side effects. Obviously, a proliferation of locks also makes deadlocks much more likely.
Normally, application locks would be handled on the record level. Using flags (of some sort) in the record, the application would ensure that only one row would have access to the file.
I would say, it might work. But I would never design a system that way and I'd be wary of unexpected consequences.
I have a big SQLite database to process, so I would like to use MPI for parallelization to accelerate the speed. What I want to do is sending a database from root to every slave, and sending the modified databases to root after slave add some table into it. I want to use MPI_Type_create_struct to create a datatype to store database, but the database is too complicated. IS there any other way to handle this situation? Thank you in advance!
I recently dealt with a similar problem - I have a large MPI application that uses SQLite as a configuration store. Handling multi-process writes is a challenge with an embedded SQL database. My experience with this involves a massively parallel application (running up to 65,535 ranks) with a shared filesystem.
Based on the FAQ from SQLite and some experience with database engines, there are a few ways to approach this problem. I am making the assumption that you are operating with a shared distributed file system, and multiple separate computers (a standard HPC cluster setup).
Since SQLite will block when multiple processes write to the database (but not read), reads will most likely not be an issue. Each process can run multiple SELECT commands at the same time without issue.
The challenge will be in the writing. Disk I/O is several orders of magnitude slower than computation, so generally this will be the bottleneck. Having said that, network communication may also be a significant slowdown, so how you approach the problem really depends on where the weakest link of your running environment will be.
If you have a fast network and slow disk speed, or if you want to implement this in the most straightforward way possible, your best bet is to have a single MPI rank in charge of writing to the database. Your compute processes would independently run SELECT commands until computation was complete, then send the new data to the MPI database process. The database control process would then write the new data to disk. I would not try to send the structure of the database across the network, rather I would send the data that should be written, along with (possibly) a flag that would identify what table/insert query the data should be written with. This technique is sort of similar to how a RDBMS works - while RDBMS servers do support concurrent writes, there is a "central" process in control of the ordering of write operations.
One thing to note is that if a process writes to the SQLite database, the file is locked for all processes that are trying to read or write to it. You will need to either handle the SQLITE_BUSY return code in your worker processes, register a callback to handle this, change the busy behavior, or use an alternate technique. In my application, I found that loading the database as an in-memory database, (https://www.sqlite.org/inmemorydb.html) for the readers provided a good workaround. Readers access the in-memory database, but sent results to the controlling process for writes. The downside is that you will have multiple copies of the database in memory.
Another option that might be less network intensive is to do the reads concurrently and have each worker process write out to their own file. You could write out to separate SQLite database files, or even export something like CSV (depending on the complexity of the data). When writes are complete, you would then have a single process merge the individual files into a single result database file - see How can I merge many SQLite databases?. This method has its own issues, but depending on where your bottlenecks are and how the system as a whole is laid out, this technique may work.
Finally, you might consider reading from the SQLite database and saving the data to a proper distributed file format, such as HDF5 (or using MPI IO). Once the computation is done, it would be pretty straightforward to write a script that would create a new SQLite database from this foreign file format.
I am trying to write a program which will be run in batch on AS400. This program is going to write a record into a file to reflect its processing status, say, when it is just submitted it adds a record saying it is currently running, and when it is done it updates the same record saying it has finished. If I want to submit this program into batch for multiple times, what is the best way to cope with this type of simultaneous file access to increase the efficiency? I don't want a job to lock the whole file and stops others from updating it in the same time. It can lock the record it needs and leave the rest to others. How to achieve this? RPGLE or QMQRY? Or any other methods?
RPG will not lock the entire file, only the record.
Personally, I'd recommend SQL for (pretty much) all file access, even through RPG. IBM hasn't been updating their Native I/O for a while, just concentrating on the SQL side of things.
Because, during normal use, record locks in RPG are released once the write or update has been performed, you should probably just have your SQL run WITH NC (no commit). You need a way to tie a processing job with the data it's processing anyways (assuming stuff is long-running enough that things are in files outside of QTEMP) - you want to be able to pick up where you left off, if your job dies (so, you can't rely on holding the lock as a control mechanism). So don't forget that you're going to need some sort of monitor job (that can at least report the status, if not resubmit things - look at the QUSRJOBI API).
If you're doing this because you're using all Native I/O, and processing huge sets of data (not huge, processor intensive calculations), consider re-writing everything to SQL. Seriously. You can get way better performance - we've taken a process that used to run for 25+ hours, to something that runs in 2.5ish.
I have an application that uses a SQL FILESTREAM to store images. I insert a LOT of images (several millions images per days).
After a while, the machine stops responding and seem to be out of memory... Looking at the memory usage of the PC, we don't see any process taking a lot of memory (neither SQL or our application). We tried to kill our process and it didn't restore our machine... We then kill the SQL services and it didn't not restore to system. As a last resort, we even killed all processes (except the system ones) and the memory still remained high (we are looking in the task manager's performance tab). Only a reboot does the job at that point. We have tried on Win7, WinXP, Win2K3 server with always the same results.
Unfortunately, this isn't a one-shot deal, it happens every time.
Has anybody seen that kind of behaviour before? Are we doing something wrong using the SQL FILESTREAMS?
You say you insert a lot of images per day. What else do you do with the images? Do you update them, many reads?
Is your file system optimized for FILESTREAMs?
How do you read out the images?
If you do a lot of updates, remember that SQL Server will not modify the filestream object but create a new one and mark the old for deletion by the garbage collector. At some time the GC will trigger and start cleaning up the old mess. The problem with FILESTREAM is that it doesn't log a lot to the transaction log and thus the GC can be seriously delayed. If this is the problem it might be solved by forcing GC more often to maintain responsiveness. This can be done using the CHECKPOINT statement.
UPDATE: You shouldn't use FILESTREAM for small files (less than 1 MB). Millions of small files will cause problems for the filesystem and the Master File Table. Use varbinary in stead. See also Designing and implementing FILESTREAM storage
UPDATE 2: If you still insist on using the FILESTREAM for storage (you shouldn't for large amounts of small files), you must at least configure the file system accordingly.
Optimize the file system for large amount of small files (use these as tips and make sure you understand what they do before you apply)
Change the Master File Table
reservation to maximum in registry (FSUTIL.exe behavior set mftzone 4)
Disable 8.3 file names (fsutil.exe behavior set disable8dot3 1)
Disable last access update(fsutil.exe behavior set disablelastaccess 1)
Reboot and create a new partition
Format the storage volumes using a
block size that will fit most of the
files (2k or 4k depending on you
image files).
I have a project right now where I'd like to be able to pull rows out of an Access database that a 3rd party product uses to store its information. There will likely be a small number of users hitting this database at the same time my "export" process does, so I'm a little concerned about data integrity and concurrent access.
Will I likely run into problems with my .NET import process (using LINQ/ADO.NET/?) when it is trying to pull data out of the MDB at the same time someone else is saving a row? How does Access's locking work?
There should no problem. Problems can occur only on concurrent write operations. The locking from MS Access based on file locks in the ldb file. The locks occur only on pages and not on the completely file. Because the locks are in the ldb file and not in the mdb file that there are no problems with parallel reading.
In previous workings with Access (back when I was using 2003 for things) the only thing I ran into was that occasionally a read would lock rows just above and below the current read. However, I believe this may have been an isolated issue with our application.
When you open the database, do not attempt to open in read-only mode (although you might think it makes sense). When you are the first user in, Access opens the mdb file in read-only mode and does not create an ldb, forcing all subsequent users to be in read-only mode as well.