Let's say NTFS journaling is enabled, but I don't want change records for some of my files to be added to the journal. Is this possible? And if not, is there any way that, even if a change related to a particular file is added to the USN journal, I can delete only the record related to that particular file? From what I have read so far, you can delete the whole journal in one go using the defragmentation API or the fsutil tool, but not individual records.
Any help would be appreciated.
It's true: while the journal exists, you cannot hide file changes, and you cannot delete single USN records the regular way. As Xearinox pointed out, the only way to manipulate that data is through direct disk write operations.
If you are interested in that, this is what you want to read:
Keeping an Eye on Your NTFS Drives: the Windows 2000 Change Journal Explained
Keeping an Eye on Your NTFS Drives, Part II: Building a Change Journal Application
In short: the USN journal is a non-fragmented series of USN records, and the Update Sequence Number is actually just an offset, so the whole structure is pretty straightforward.
The Change Journal always writes new records to the end of the file, so the implementors chose to use the file offset of a record as its USN
Source: Keeping an Eye on Your NTFS Drives: the Windows 2000 Change Journal Explained
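For illustration, here is a rough Python sketch of walking a buffer of such records. It assumes USN_RECORD_V2 structures laid out back to back (for example, the bytes that follow the leading 8-byte "next USN" value returned by FSCTL_READ_USN_JOURNAL); obtaining that buffer is outside the scope of the sketch.

import struct

# Fixed 60-byte header of USN_RECORD_V2: RecordLength, MajorVersion, MinorVersion,
# FileReferenceNumber, ParentFileReferenceNumber, Usn, TimeStamp, Reason,
# SourceInfo, SecurityId, FileAttributes, FileNameLength, FileNameOffset.
USN_RECORD_V2_HEADER = struct.Struct("<IHHQQqqIIIIHH")

def iter_usn_records(buf):
    """Yield (usn, reason, file_name) for each record in a raw journal buffer."""
    offset = 0
    while offset + USN_RECORD_V2_HEADER.size <= len(buf):
        (record_length, _major, _minor, _frn, _parent_frn, usn, _timestamp,
         reason, _source, _security, _attrs,
         name_length, name_offset) = USN_RECORD_V2_HEADER.unpack_from(buf, offset)
        if record_length == 0:
            break
        name_bytes = buf[offset + name_offset : offset + name_offset + name_length]
        yield usn, reason, name_bytes.decode("utf-16-le")
        offset += record_length  # records are contiguous, which is why a USN is effectively an offset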
I have loaded my Salesforce object data into Azure SQL. Now, if one or more records in Salesforce get deleted, I want to be able to retrieve those records using the REST API.
Is there any way to create a REST API call that returns those records for a particular object?
"Yes, but".
By default SF soft-deletes records; they can still be seen in the UI in the Recycle Bin and undeleted from there. (There's also a hard delete call that skips the Recycle Bin.)
Records stay in there for 15 days max, and the bin's capacity depends on your org's data storage; see https://help.salesforce.com/articleView?id=home_delete.htm&type=5. So if you mass-deleted a lot of data there's a chance the bin will overflow.
To retrieve these you need to call the /queryAll service instead of /query, and filter by the isDeleted column, which doesn't show up in Setup but is on pretty much every object. See https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/dome_queryall.htm
/services/data/v49.0/queryAll/?q=SELECT+Name+from+Account+WHERE+isDeleted+=+TRUE
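As a rough illustration, here is a minimal Python sketch of that call using the requests library; the instance URL and access token are placeholders you would obtain from your own authentication flow.

import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"  # placeholder
ACCESS_TOKEN = "REPLACE_WITH_ACCESS_TOKEN"               # placeholder

def fetch_deleted_accounts():
    """Query soft-deleted Account records via the queryAll endpoint."""
    soql = "SELECT Id, Name FROM Account WHERE IsDeleted = TRUE"
    resp = requests.get(
        INSTANCE_URL + "/services/data/v49.0/queryAll/",
        params={"q": soql},
        headers={"Authorization": "Bearer " + ACCESS_TOKEN},
    )
    resp.raise_for_status()
    return resp.json()["records"]

for rec in fetch_deleted_accounts():
    print(rec["Id"], rec["Name"])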
If this is not good enough for you (if you risk the Bin overflowing, or the operation was a hard delete), you could make your own soft delete (move records to some special owner outside the role hierarchy so they become invisible to everybody except sysadmins?) or change strategy and push info from SF instead of pulling it: send a platform event on delete, either manually or with Change Data Capture. (I think CDC doesn't generate events on hard delete though; you'd have to read up.)
So, for example, let's say I wanted to set up a SQLite database that contains some data on invoices. Let's say each invoice has a date, invoice number, and company associated with it, for simplicity. Is there a good way for the database to be able to access or store a PDF file (~300-700 KB/file) of the specified invoice? If this wouldn't work, any alternative ideas on what might work well?
Any help is greatly appreciated
You could store the data (each file) as a BLOB, which is a byte array/stream, so the file can basically be stored as-is within a BLOB.
However, it may be more efficient (see the linked article below) to just store the path to the file, or perhaps just the file name (depending upon your standards), and then use that to retrieve and view the invoice.
Up to around 100 KB it can be more efficient to store files as BLOBs. You may find this document useful to read: SQLite 35% Faster Than The Filesystem
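For example, here is a minimal Python sketch of the BLOB approach; the table layout, column names and file paths are only illustrative.

import sqlite3

conn = sqlite3.connect("invoices.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS invoices (
        id         INTEGER PRIMARY KEY,
        invoice_no TEXT NOT NULL,
        company    TEXT NOT NULL,
        inv_date   TEXT NOT NULL,
        pdf        BLOB            -- the raw PDF bytes
    )
""")

def store_invoice(invoice_no, company, inv_date, pdf_path):
    """Read the PDF from disk and store its bytes as a BLOB."""
    with open(pdf_path, "rb") as f:
        pdf_bytes = f.read()
    conn.execute(
        "INSERT INTO invoices (invoice_no, company, inv_date, pdf) VALUES (?, ?, ?, ?)",
        (invoice_no, company, inv_date, pdf_bytes),
    )
    conn.commit()

def load_invoice_pdf(invoice_no, out_path):
    """Write the stored PDF back out to a file."""
    row = conn.execute(
        "SELECT pdf FROM invoices WHERE invoice_no = ?", (invoice_no,)
    ).fetchone()
    with open(out_path, "wb") as f:
        f.write(row[0])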
SQLite does support a BLOB data type, which stores data exactly as it is entered. From the documentation:
The current implementation will only support a string or BLOB length up to 231-1 or 2147483647
This limit is much larger than your expected need of 300-700 KB per file, so what you want should be possible. The other thing to consider is the size of your database. Unless you expect to have well north of around 100 TB, then the database size limit also should not pose a problem.
On my web server, I have two folders showcase and thumbnail to store images and their thumbnails, respectively. A database fetches these images to display them on a page.
The table column in the showcase table is s_image which stores something like /showcase/urlcode.jpg.
I heard that after around 10-20k files in a folder, it starts to slow down. So should I be creating a second folder, showcase2, once it's filled up? Is there some kind of automatic creation that can do this for me?
I appreciate your input.
The filesystem you're using matters when you put tens of thousands of files in a single directory. ext4 on Linux scales up better than NTFS on Windows.
Windows has a compatibility mode for 8.3 file names (the old-timey DOS file name standard). This causes every file name longer than abcdefgh.ext to have an alias created for it, something like abcd~123.ext. This is slow, and gets very slow when you have lots of files in a single directory. You can turn off this ancient compatibility behavior; see here: https://support.microsoft.com/en-us/kb/121007. If you do turn it off, it's a quick fix for an immediate performance problem.
But, 20,000 files in one directory is a large number. Your best bet, on any sort of file system, is automatically creating subdirectories in your file system based on something that changes. One strategy is to create subdirectories based on year / month, for example
/showcase/2015/08/image1.jpg (for images uploaded this month)
/showcase/2015/09/image7.jpg (for images next month)
It's obviously no problem to store those longer file names in your s_image column in your table.
Or, if you have some system to the naming of the images, exploit it to create subdirectories. For example, if your images are named
cat0001.jpg
cat0002.jpg
...
cat0456.jpg
...
cat0987.jpg
You can create subdirectories based on, say, the first five characters of the names:
/showcase/cat00/cat0001.jpg
/showcase/cat00/cat0002.jpg
...
/showcase/cat04/cat0456.jpg
...
/showcase/cat09/cat0987.jpg
If you do this, it's much better to keep the image names intact rather than make them shorter (for example, don't do this: /showcase/cat09/87.jpg), because if you have to search for a particular image by name you want the full name there.
As far as I know, there's nothing automatic in a file system to do this for you. But it's not hard to do in your program.
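For instance, here is a small Python sketch of the year/month strategy; the root path is illustrative and you would adapt the naming to your own setup.

import os
from datetime import date

SHOWCASE_ROOT = "/var/www/showcase"  # illustrative web root folder

def showcase_path(filename, when=None):
    """Build a /showcase/YYYY/MM/filename path, creating the directories as needed."""
    when = when or date.today()
    subdir = os.path.join(SHOWCASE_ROOT, "%04d" % when.year, "%02d" % when.month)
    os.makedirs(subdir, exist_ok=True)
    return os.path.join(subdir, filename)

# e.g. store the part after the web root ("/showcase/2015/08/image1.jpg")
# in the s_image column
print(showcase_path("image1.jpg"))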
I'm working on a piece of software that stores files in a file system, as well as references to those files in a database. Querying the uploaded files can thus be done in the database without having to access the file system. From what I've read in other posts, most people say it's better to use a file system for file storage rather than storing binary data directly in a database as a BLOB.
So now I'm trying to understand the best way to set this up so that both the database and file system stay in sync, and I don't end up with references to files that don't exist, or files taking up space in the file system that aren't referenced. Here are a couple of options that I'm considering.
Option 1: Add File Reference First
//Adds a reference to a file in the database
database.AddFileRef("newfile.txt");
//Stores the file in the file system
fileStorage.SaveFile("newfile.txt",dataStream);
This option would be problematic because the reference to the file is added before the actual file, so another user may end up trying to download a file before it is actually stored in the system. Although, since the reference to the file is created beforehand, the primary key value could be used when storing the file.
Option 2: Store File First
//Stores the file
fileStorage.SaveFile("newfile.txt",dataStream);
//Adds a reference to the file in the database
//fails if the referenced file does not exist in the file system
database.AddFileRef("newfile.txt");
This option is better, but would make it possible for someone to upload a file to the system that is never referenced. Although this could be remedied with a "Purge" or "CleanUpFileSystem" function that deletes any unreferenced files. This option also wouldn't allow the file to be stored using the primary key value from the database.
Option 3: Pending Status
//Adds a pending file reference to database
//pending files would be ignored by others
database.AddFileRef("newfile.txt");
//Stores the file, fails if there is no
//matching pending file reference in the database
fileStorage.SaveFile("newfile.txt",dataStream);
//marks the file reference as committed after file is uploaded
database.CommitFileRef("newfile.txt");
This option allows the primary key to be created before the file is uploaded, but also prevents other users from obtaining a reference to a file before it is uploaded. Although, it would be possible for a file to never be uploaded, and a file reference to be stuck pending. Yet, it would also be fairly trivial to purge pending references from the database.
I'm leaning toward option 2, because it's simple, and I don't have to worry about users trying to request files before they are uploaded. Storage is cheap, so it's not the end of the world if I end up with some unreferenced files taking up space. But this also seems like a common problem, and I'd like to hear how others have solved it or other considerations I should be making.
I want to propose another option. Make the filename always equal to the hash of its contents. Then you can safely write any content at all times provided that you do it before you add a reference to it elsewhere.
As contents never change there is never a synchronization problem.
This gives you deduplication for free. Deletes become harder though. I recommend a nightly garbage collection process.
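Here is a minimal Python sketch of that idea, with an illustrative storage directory; the hash becomes both the file name and the value you store as the reference in the database.

import hashlib
import os

STORAGE_DIR = "filestore"  # illustrative

def save_content_addressed(data):
    """Write data under a name derived from its SHA-256 hash and return that name."""
    name = hashlib.sha256(data).hexdigest()
    path = os.path.join(STORAGE_DIR, name)
    if not os.path.exists(path):       # identical content is deduplicated for free
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, path)          # atomic rename, readers never see partial files
    return name                        # store this as the reference in the database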
What is the real use of the database? If it's just a list of files, I don't think you need it at all, and not having it saves you the hassle of synchronising.
If you are convinced you need it, then options 1 and 2 are completely identical from a technical point of view: the two resources can be out of sync and you need a regular process to consolidate them again. So here you should choose the option that suits the application best.
Option 3 has no advantage whatsoever, but uses more resources.
Note that using hashes, as suggested by usr, bears a theoretical risk of collision. And you'd also need a periodical consolidation process, as for options 1 and 2.
Another question is how you deal with partial uploads and uploads in progress. Here option 2 could be of use, but you could also use a second "flag" file that is created before the upload starts and deleted when the upload is done. This would help you determine which uploads have been aborted.
To remedy the drawback you mentioned for option 1, I use something like fileStorage.FileExists("newfile.txt") and filter out the results for which it returns a negative.
In Python lingo:
import os

# Keep only the references whose backing file actually exists on disk
# (database.AllRefs() and ref.path() are whatever your own API provides).
valid_refs = [ref for ref in database.AllRefs() if os.path.exists(ref.path())]
I am using a text file to store my data records. The data is stored in the following format:
Antony|9876543210
Azar|9753186420
Branda|1234567890
David|1357924680
John|6767676767
Thousands of records are stored in that file. I want to delete a particular record, say "David|1357924680". I am using C; how can I delete that particular record efficiently? Currently I use a temporary file: I copy the records to that temp file, omitting the record I want to delete, and after copying I copy the contents of the temp file back to the original file after truncating all of the original's contents. I don't think I am doing this efficiently. Help me.
Add a column to your data indicating whether it is a valid (1) or deleted (0) row:
Antony|9876543210|1
Azar|9753186420|1
Branda|1234567890|1
David|1357924680|1
John|6767676767|1
When you want to delete a record, overwrite the single byte:
Antony|9876543210|1
Azar|9753186420|1
Branda|1234567890|0
David|1357924680|1
John|6767676767|1
Branda is now deleted.
Then add a data file compaction function which can be used to rewrite the file excluding deleted rows. This could be done during times of low or no usage so it doesn't interfere with regular operations.
Edit
The validity column should probably be the first column so you can skip deleted rows more easily.
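Here is a rough Python sketch of the in-place flag flip (the same logic translates to C with fopen, ftell, fseek and fwrite). It assumes the flag is the first column, as suggested in the edit, and that the file uses plain '\n' line endings; the file and record names are illustrative.

def mark_deleted(path, name):
    """Flip the validity flag of the matching record from '1' to '0' in place."""
    with open(path, "r+b") as f:
        while True:
            offset = f.tell()            # byte offset where this line starts
            line = f.readline()
            if not line:                 # end of file, record not found
                return False
            fields = line.rstrip(b"\n").split(b"|")
            if len(fields) == 3 and fields[0] == b"1" and fields[1] == name:
                f.seek(offset)           # the flag is the first byte of the line
                f.write(b"0")
                return True

mark_deleted("records.txt", b"Branda")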
I think your approach is a little bit off. If you really want to do it efficiently, use a database, for example SQLite. It is a simple-to-use database stored in a single file, but it offers a lot of the power of SQL and is very efficient, so adding and deleting entries won't be a problem (searching will be easy too). So check it out: http://www.sqlite.org/
Here is a three-minute tutorial which will explain by example how to do everything you are trying to accomplish here: http://www.sqlite.org/quickstart.html
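For example, here is a minimal Python sketch of the same name-and-number data kept in SQLite; the database file and table names are illustrative.

import sqlite3

conn = sqlite3.connect("contacts.db")
conn.execute("CREATE TABLE IF NOT EXISTS contacts (name TEXT, number TEXT)")
conn.executemany("INSERT INTO contacts VALUES (?, ?)",
                 [("Antony", "9876543210"), ("David", "1357924680")])
conn.commit()

# Deleting a particular record is a single statement, no file rewriting needed.
conn.execute("DELETE FROM contacts WHERE name = ? AND number = ?",
             ("David", "1357924680"))
conn.commit()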
Some simple ideas to improve efficiency a little bit:
Don't copy the temp file back into the original; instead, delete the original and rename the new file to the original's name (supposing they are in the same directory). See the sketch after this list.
Use an in-memory data structure instead of a temp file to hold the records while copying (but if you do, you may need to limit its size and use it only as a buffer).
Mark some records as deleted without removing them from the file; then, after a certain number of delete operations, physically remove the records marked this way (but you will have to rewrite your other operations on the file so they ignore the marked records).
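Here is a small Python sketch of the first idea above (write the surviving records to a temp file in the same directory, then rename it over the original so nothing is copied back); file and record values are illustrative.

import os

def delete_record(path, unwanted):
    tmp = path + ".tmp"                        # same directory as the original
    with open(path, "r") as src, open(tmp, "w") as dst:
        for line in src:
            if line.rstrip("\n") != unwanted:  # skip the record being deleted
                dst.write(line)
    os.replace(tmp, path)                      # the rename replaces the original in one step

delete_record("records.txt", "David|1357924680")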
I would suggest a solution similar to the one "Robert S. Barnes" gave.
I would modify David|1357924680 to |--------------- (an equal number of bytes).
No need for extra bytes (though that is not much of a benefit).
The data is really deleted, which is useful when security requirements demand it.
Sometime later (daily, weekly, ...), do the same as (or something similar to) what you do now.
Three suggestions:
1. Do it the way you describe, but instead of copying the temporary file back to the original, just delete the original and rename the temporary file. This should run twice as fast.
2. Overwrite the record with 'XXXXXXX' or whatever. This is very fast, but it may not be suitable for your project.
3. Use a balanced binary tree. This is the 'professional' solution. If possible, avoid programming it from scratch!
Since direct in-place editing of a file (removing bytes from the middle) isn't possible, you have to resort to a method similar to what you are doing now.
As mentioned by a few others, maintaining a proper data structure and only writing back at intervals would improve efficiency.