Does the maxTransactionSize set in the NetworkParameters include attachments?

I have a situation where I need to send a large file as an attachment and I want to make sure it's less than the maxTransactionSize enforced by the network params.

maxTransactionSize does include attachments when calculated.
If you want to take a look for yourself, the calculation is done in WireTransaction.checkTransactionSize: https://github.com/corda/corda/blob/master/core/src/main/kotlin/net/corda/core/transactions/WireTransaction.kt
Unfortunately, maxTransactionSize is hard-coded at 10MB for the time being:
@property maxMessageSize This is currently ignored. However, it will be wired up in a future release.
This is an interesting implementation because the file is NOT sent in the initial transaction. In fact, only a hash of the file is included in the transaction itself. Files are only sent to the requesting node when needed.
Still, the issue with a large maxTransactionSize is that every node must read the entire attachment into RAM to verify the transaction, so nodes with insufficient RAM will fail to verify massive transactions.
Work is being done to chunk files and enable attachments of arbitrary size!

Related

How does Flash Translation Layer store mapping data, unusable block and super block?

Does the FTL have private storage space that is not flash?
If not, how does the FTL store that metadata without defeating wear leveling?
Actually, I don't know whether there is a super block in an FTL, but to locate the mapping data and the list of unusable blocks, whose physical locations change frequently, some fixed physical address may be needed. The content at that fixed address must then change frequently, so how do you avoid wearing out that address?
There are many possible solutions to this problem and it's very intertwined with the data representation that the drive uses to store its data, so I'm sure it differs a lot based on the drive / manufacturer. I'll just outline a general approach that could work.
Let's say you design an FTL that maintains several fixed-size, append-only "logs", and for simplicity we always have one "active" log that all writes are appended to. If the user is issuing random writes, the order of LBAs in the active log will be random too. When the active log fills all the space allocated to it, it gets "frozen" and we switch the active log to some empty log elsewhere in the flash. As the data in the frozen log becomes stale, we will eventually need to garbage collect it by copying any still-referenced blocks to a different log before erasing the original so that it can be reused for new writes.
Now, for each write to a log, nothing in our interface so far requires that the blocks be exactly 4KiB (or whatever), so you could append a small header to the data that tells you what its LBA is, and perhaps some other metadata -- write sequence number so you can tell if it's the most recent copy of a block, and maybe a checksum for read integrity checking. When a write finishes, you update an in-RAM copy of the map with the new location for the LBAs that were updated (RAM inside the SSD, not RAM for the main CPU of the computer obviously).
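To make that concrete, here is a minimal C sketch of what such a per-block header and the in-RAM map update could look like. The field widths, the map layout, and the sizes are assumptions for illustration, not how any particular drive actually does it.

    #include <stdint.h>

    #define MAP_ENTRIES (1u << 20)   /* toy device: 2^20 logical blocks */

    /* Hypothetical header stored alongside each block written into a log. */
    struct block_header {
        uint64_t lba;        /* logical block address the data belongs to   */
        uint64_t seq;        /* global write sequence number (monotonic)    */
        uint32_t crc;        /* checksum over header + data for read checks */
        uint32_t reserved;
    };

    /* Physical location of the newest copy of an LBA. */
    struct phys_loc {
        uint32_t log_id;     /* which log the block currently lives in      */
        uint32_t offset;     /* block index within that log                 */
    };

    /* In-RAM map (RAM inside the SSD controller), indexed by LBA. */
    static struct phys_loc map[MAP_ENTRIES];

    static void on_write_completed(const struct block_header *h,
                                   uint32_t log_id, uint32_t offset)
    {
        if (h->lba >= MAP_ENTRIES)
            return;                    /* out of range for this toy device */
        /* A newer write to the same LBA simply overwrites the map entry;
           the old physical copy becomes garbage to be collected later. */
        map[h->lba].log_id = log_id;
        map[h->lba].offset = offset;
    }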
If the FTL crashes or loses power, you can reconstruct the map by reading all headers from all the logs. The downside is that scanning the logs will scale O(number of logs * number of blocks per log), so you optimize that somehow:
you could write the headers to a separate part of the disk by themselves so that you can scan them without also reading the user data (same big-O runtime but a lot faster in practice)
you could periodically flush the in-RAM copy of the map to flash somewhere, along with the latest IO sequence number, so that you only have to read the parts of the logs that were written since the latest map flush
How do you find the portion of the log to start scanning from? Do a binary search on the IO sequence numbers in the log headers (see the sketch after these points). So the boot runtime is now O(number of logs * (log_2(number of blocks per log) + number of blocks that need to be scanned)).
How do you know when to stop scanning? Either you recognize that all the data in the block you read is 1's because that part of the log hasn't been written to yet, or you recognize that the checksum and data don't match.
Minor optimization: during a clean shutdown, always write the map to flash, so that this binary search + scanning only needs to happen if there's a crash or unclean shutdown.
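As a rough sketch of that binary search (in C, continuing the same toy model): it assumes a helper read_block_seq() that returns the sequence number from a block's header, or a SEQ_UNWRITTEN sentinel when the block reads as erased or fails its checksum; both the helper and the sentinel are invented for the example.

    #include <stdbool.h>
    #include <stdint.h>

    #define SEQ_UNWRITTEN 0u   /* assumed sentinel: erased/corrupt block */

    /* Assumed helper: sequence number recorded in the header of block 'idx'
       of 'log_id', or SEQ_UNWRITTEN if the block reads as erased (all 1s)
       or its checksum does not match. */
    uint64_t read_block_seq(uint32_t log_id, uint32_t idx);

    /* Find the first block in this log written after the last map flush
       (checkpoint_seq). Everything from that index onward must be replayed
       into the in-RAM map; everything before it is already covered. */
    uint32_t find_replay_start(uint32_t log_id, uint32_t blocks_per_log,
                               uint64_t checkpoint_seq)
    {
        uint32_t lo = 0, hi = blocks_per_log;   /* search in [lo, hi) */

        while (lo < hi) {
            uint32_t mid = lo + (hi - lo) / 2;
            uint64_t seq = read_block_seq(log_id, mid);
            bool past = (seq == SEQ_UNWRITTEN) || (seq > checkpoint_seq);

            if (past)
                hi = mid;        /* answer is at mid or earlier */
            else
                lo = mid + 1;    /* still covered by the checkpoint */
        }
        return lo;   /* == blocks_per_log if nothing here needs replay */
    }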
So far, this lowers how often you need to write the map by a lot, but it's still probably too often to overwrite it to a fixed location for a drive with a very long lifetime. To resolve that, we have to cycle where we write the map:
The simplest solution would be to designate a small set of X special logs to store all map data and write to them like a circular buffer, where X is chosen to make the map updates last the expected lifetime of the device. To find the most recent map in the log on boot, you'd do binary search within those logs to find the last one that was written. So boot = O(X * log_2(number of maps per log) + runtime to scan the other logs if unclean shutdown).
Probably a more optimal solution (but one that might be more complicated) would be to include the map writes directly in the logs where the updates are happening. Then you need some way to find where the maps are at boot time -- the most obvious way to do that would be to write the map into the beginning of each active log, or you could allow arbitrary map writes by adding backpointers into the block headers that point back to the latest map in their log.
Another aspect of this is that full map flushes could be expensive, which would add tail latency if it ever interferes with the performance of user IOs -- would it be better to allow incremental updates? That's when you start looking at using something like a log-structured merge (LSM) tree to store your map, so that each incremental write is pretty small and you can amortize the full map write cost.
Obviously there are a bunch of tiny details that this explanation leaves out, but hopefully that's enough to get you started. :-)

btrfs can't mount after broken disk removed

I want to use btrfs as the filesystem on my server, and I am still researching how it behaves in various worst-case conditions.
Currently I want to test RAID failure scenarios. The conditions I want to test are:
if a disk breaks, how do I replace it
if I can't replace it, how do I save my data
if I (or my team) accidentally format one of the disks, how do I fix it
if one of my disks is stolen (I think this case is unlikely, it's just a worst-case condition), how do I replace it
Of the questions I wrote above, I can only answer two myself.
Answer number one: I can use the replace method before unplugging the broken disk.
Answer number two: I can plug in an external hard drive, mount it, and use the restore method to save my data.
For the other questions, my tests failed.
For questions 3 and 4 (if I replace the disk with another one), I tried to use mount -o degraded, but I can't mount it; it shows the error wrong fs type, bad option, bad superblock on /dev/sdb. I tried to rebalance it with the balance method, but I still can't mount it.
Please, I need answers to questions 3 and 4.
The replace option needs to be done before the disk completely dies or else the replace operation won't work (and will likely screw up the array). If the disk is already unreadable, then yank it and mount with the degraded option. Add a new disk into the array and tell it to delete missing devices and it should sort it all out.
If your array has redundancy on both data and metadata, a single failed disk shouldn't cost you any of your data. If, for some reason, the array is corrupted and won't accept a replacement disk, you can use btrfs restore to copy as much as is recoverable out of the array and into a different storage system. Then rebuild the array.
Formatting a disk is no different from having one go bad except you don't actually need a new physical disk. If your array is redundant, mount degraded, add the formatted disk back in, and delete missing. It should automatically rebalance the affected data. Running a scrub when you're done might also be wise.
A stolen disk is the same as having one go bad. Mount degraded, add in a new one, and delete missing.
Your bad superblock issue is most likely caused by attempting to mount the disk that was formatted/replaced. Formatting removes the btrfs filesystem identifiers, so the system won't be able to detect the other drives in the array. Use one of the devices that's still part of the array in the mount command and it should be able to detect the rest. If it doesn't, then your array was probably not in a consistent state before you removed/formatted the disk and there is insufficient redundancy to repair it. btrfs restore may be your only option at that point. Depending on circumstances you may need to run btrfs device scan to re-detect which devices are and are not part of the array.

Conflicts in writing/reading a file

I'm developing a small program in C that reads and writes messages on a notice-board. Every message is a .txt file named with a progressive number.
The software is multithreading, with many users that can do concurrent operations.
The operations that a user can do are:
Read the whole notice-board (concatenation of all the .txt file contents)
Add a message (add a file named "id_max++.txt")
Remove a message. When a message is removed there will be a hole in that number (e.g, "1.txt", "2.txt", "4.txt") that will never be filled up.
Now, I'd like to know if there is some I/O problem (*) that I should manage (and how) or the OS (Unix-like) does it all by itself.
(*) such as 2 users that want to read and delete the same file
As you are on a Unix-like system, the OS will take care of deleting a file while it is open by another thread: the directory entry is removed immediately, and the file itself (the inode) is deleted on last close.
The only problem I can see is between the directory scan and the opening of a file: a race condition could mean the file has already been deleted.
IMHO you simply must treat a "file does not exist" error as normal, and simply go on to the next file.
What you describe is not really bad, since it is analogous to MH folders for mail, which can be accessed by many different processes, even if locking is involved. But depending on the load and on the size of the messages, you could consider using a database. Rule of thumb (my opinion):
few concurrent accesses and big files: keep using the file system
many accesses and small files (a few KB max): use a database
Of course, you must use a mutex-protected routine to find the next number when creating a new message (credit should go to @merlin2011 for noticing the problem).
You said in a comment that your specs do not allow a database. By analogy with mail handling, you could also use a single file (like the traditional mail format):
one single file
each message is preceded with a fixed size header saying whether it is active or deleted
read access need not be synchronized
write accesses must be synchronized
It would be a poor man's database where all synchronization is done by hand, but you have only one file descriptor per thread and save all the open and close operations. It makes sense where there are many reads and few writes or deletes.
A possible improvement would be (still like mail readers do) to build an index with the offset and status of each message. The index could be on disk or in memory depending on your requirements.
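A minimal C sketch of that single-file layout, with a fixed-size per-message header and an in-memory index entry; the field names and sizes are assumptions for illustration, and the mutex around writers is assumed rather than shown.

    #include <stdint.h>
    #include <stdio.h>

    #define MSG_ACTIVE  1u
    #define MSG_DELETED 0u

    /* Fixed-size header written in front of every message in the single file. */
    struct msg_header {
        uint32_t id;       /* progressive message id                         */
        uint32_t status;   /* MSG_ACTIVE or MSG_DELETED                      */
        uint32_t length;   /* length of the message body that follows, bytes */
    };

    /* In-memory index entry, built by scanning the file once at startup. */
    struct msg_index {
        uint32_t id;
        uint32_t status;
        long     offset;   /* offset of the header within the file           */
    };

    /* "Delete" a message by flipping its status flag in place; the body is
       left where it is, just like the holes in the one-file-per-message
       scheme. The board file is assumed to be open in "r+b" mode and the
       caller is assumed to hold the writers' mutex. */
    static int mark_deleted(FILE *board, const struct msg_index *e)
    {
        struct msg_header h;

        if (fseek(board, e->offset, SEEK_SET) != 0) return -1;
        if (fread(&h, sizeof h, 1, board) != 1)     return -1;
        h.status = MSG_DELETED;
        if (fseek(board, e->offset, SEEK_SET) != 0) return -1;
        if (fwrite(&h, sizeof h, 1, board) != 1)    return -1;
        return fflush(board);
    }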
The easier solution is to use a database like SQLite or MySQL, both of which provide transactions that you can use to achieve consistency. If you still want to go down the file-based route, read on.
The issue is not an I/O problem; it's a concurrency problem if you do not implement proper monitors. Consider the following scenario (it is not the only problematic one, but it is one example).
User 1 reads the maximum id and stores it in a local variable.
Meanwhile, User 2 reads the same maximum id and stores it in a local variable also.
User 1 writes first, and then User 2 overwrites what User 1 just wrote, because it had the same idea of what the maximum id was.
This particular scenario can be solved by keeping the current maximum id as a variable that is initialized when the program is initialized, and protecting the get_and_increment operation with a lock. However, this is not the only problematic scenario that you will need to reason through if you go with this approach.
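A minimal sketch of that lock-protected get_and_increment using POSIX threads; the names are only illustrative, and max_id is assumed to have been initialized by scanning the existing files at startup.

    #include <pthread.h>
    #include <stdio.h>

    /* Highest message id seen so far; initialized once at program start by
       scanning the existing *.txt files, protected by id_lock afterwards. */
    static unsigned int max_id;
    static pthread_mutex_t id_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Atomically reserve the next message id and build its file name. */
    static unsigned int reserve_next_id(char *name, size_t name_len)
    {
        pthread_mutex_lock(&id_lock);
        unsigned int id = ++max_id;      /* get_and_increment under the lock */
        pthread_mutex_unlock(&id_lock);

        snprintf(name, name_len, "%u.txt", id);
        return id;
    }

Because the id is reserved under the lock before the file is created, two threads can never pick the same file name, even if the slower one actually creates its file later.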

Removing bytes from File in (C) without creating new File

I have a file, let's say log. I need to remove some bytes, say n bytes, from the start of the file only. The issue is that this file is referenced by file pointers held by other programs, and those pointers may write to log at any time. I can't re-create it as a new file, otherwise those file pointers would malfunction (I am not sure about that either).
I tried to google it, but all the suggestions are only about rewriting to a new file.
Is there any solution for it?
I can suggest two options:
Ring buffer: Use a memory mapped file as your logging medium, and use it as a ring buffer (a minimal sketch follows after these two options). You will need to manually manage where the last written byte is, and wrap around your ring appropriately as you step over the end of the ring. This way, your logging file stays a constant size, but you can't tail it like a regular file. Instead, you will need to write a special program that knows how to walk the ring buffer when you want to display the log.
Multiple small log files: Use some number of smaller log files that you log to, and remove the oldest file as the collection of files grows beyond the amount of log data you want to maintain. If the most recent log file is always named the same, you can use the standard tail -F utility to follow the log contents perpetually. To avoid issues of multiple programs manipulating the same file, your logging code can send logs as messages to a single logging daemon.
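If you go with the ring buffer, a bare-bones sketch of the memory-mapped variant might look like this. It assumes a POSIX environment, a fixed RING_SIZE, and that recovering the write cursor after a restart is handled elsewhere; all of those are assumptions, not requirements from the question.

    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define RING_SIZE (1u << 20)   /* assumed fixed size of the log file: 1 MiB */

    struct ring_log {
        unsigned char *base;   /* mmap'ed file contents                  */
        size_t         head;   /* next byte to write, wraps at RING_SIZE */
    };

    /* Map (and size) the log file once at startup. Returns 0 on success. */
    static int ring_open(struct ring_log *r, const char *path)
    {
        int fd = open(path, O_RDWR | O_CREAT, 0644);
        if (fd < 0) return -1;
        if (ftruncate(fd, RING_SIZE) != 0) { close(fd); return -1; }

        r->base = mmap(NULL, RING_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);                 /* the mapping stays valid after close */
        r->head = 0;               /* real code would recover the cursor  */
        return r->base == MAP_FAILED ? -1 : 0;
    }

    /* Append a record, wrapping around the end of the ring as needed. */
    static void ring_append(struct ring_log *r, const void *buf, size_t len)
    {
        const unsigned char *p = buf;
        while (len > 0) {
            size_t chunk = RING_SIZE - r->head;
            if (chunk > len) chunk = len;
            memcpy(r->base + r->head, p, chunk);
            r->head = (r->head + chunk) % RING_SIZE;
            p += chunk;
            len -= chunk;
        }
    }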
So... you want to change the file, but you cannot. The reason you cannot is that other programs are using the file. In general terms, you appear to need to:
stop all the other programs messing with the file while you change it -- to chop now unwanted stuff off the front;
inform the other programs that you have changed it -- so they can re-establish their file-pointers.
I guess there must be a mechanism that allows the other programs to change the file without tripping over each other... so perhaps you can extend that? [If all the other programs are children of the main program, and the children all open with O_APPEND, you have a fighting chance of doing this, perhaps with the help of a file lock or a semaphore (which may already exist?). But if the programs are this intimately related, then @jxh has other, probably better, suggestions.]
But, if you cannot change the other programs in any way, you appear to be stuck, except...
...perhaps you could try 'sparse' files? On (recent-ish) Linux (at least) you can fallocate() with FALLOC_FL_PUNCH_HOLE to remove the stuff you don't want without affecting the other programs' file-pointers. Of course, sooner or later the other programs may overflow the file-pointer, but that may be a more theoretical than practical issue.
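A hedged sketch of what that could look like; FALLOC_FL_PUNCH_HOLE has to be combined with FALLOC_FL_KEEP_SIZE, so the file size and the other programs' offsets stay the same and only the underlying blocks are released. The n value and the file name are placeholders.

    #define _GNU_SOURCE        /* fallocate() and FALLOC_FL_* are Linux-specific */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        off_t n = 4096;                    /* bytes to drop from the front (placeholder) */
        int fd = open("log", O_WRONLY);    /* the shared log file from the question */
        if (fd < 0) { perror("open"); return 1; }

        /* Deallocate [0, n): reads of that range now return zeros, the file
           size is unchanged, and other programs' file pointers stay valid. */
        if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, n) != 0)
            perror("fallocate");

        close(fd);
        return 0;
    }

Note that the punched range then reads back as zeros, so whatever later parses the log needs to tolerate (or skip) a zero-filled prefix.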

Alter a file (without fseek or + modes) or concatenate two files with minimal copying

I am writing an audio file to an SD/MMC storage card in real time, in WAVE format, working on an ARM board. Said card is (and must remain) in FAT32 format. I can write a valid WAVE file just fine, provided I know how much I'm going to write beforehand.
I want to be able to put placeholder data in the Chunk Data Size field of the RIFF and data chunks, write my audio data, and then go back and update the Chunk Data Size field in those two chunks so that they have correct values, but...
I have a working filesystem and some stdio functions, with some caveats:
fopen() supports the r, w, and a modes, but not any of the + modes.
fseek() does not work in write mode.
I did not write the implementations of the above functions (I am using ARM's RL-FlashFS), and I am not certain what the justification for the restrictions/partial implementations is. Adding the missing functionality myself is probably an option, but I would like to avoid it if possible (I have no other need of those features, do not foresee any, and can't really afford to spend too much time on it). Switching to a different implementation is also not an option here.
I have very limited memory available, and I do not know how much audio data will be received, except that it will almost certainly be more than I can keep in memory at any one time.
I could write a file with the raw interleaved audio data in it while keeping track of how many bytes I write, close it, then open it again for reading, open a second file for writing, write the header into the second file, and copy the audio data over. That is, I could post-process it into a properly formatted valid WAVE file. I have done this and it works fine. But I want to avoid post-processing large amounts of data if at all possible.
Perhaps I could somehow concatenate two files in place? (I.e. write the data, then write the chunks to a separate file, then join them in the filesystem, avoiding much of the time spent copying potentially vast amounts of data.) My understanding of that is that, if possible, it would still involve some copying due to block orientation of the storage.
Suggestions?
EDIT:
I really should have mentioned this, but there is no OS running here. I have some stdio functions running on top of a hardware abstraction layer, and that's about it.
This should be possible, but it involves writing a set of FAT table manipulation routines.
The concept of FAT is simple: A file is stored in a chain of "clusters" - fixed size blocks. The clusters do not have to be contiguous on the disk. The Directory entry for a file includes the ID of the first cluster. The FAT contains one value for each cluster, which is either the ID of the next cluster in the chain, or an "End-Of-Chain" (EOC) marker.
So you can concatenate files together by altering the first file's EOC marker to point to the head cluster of the second file.
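As a sketch of that relink (using the names from the recipe below: W1 is the first, header-carrying cluster of the new wav file, T1 and T2 are the first and second clusters of the temp file), the in-memory edit itself is tiny. The flat array and the EOC constant are simplifying assumptions; real FAT32 entries are 28-bit values that you would have to read from and write back to the FAT sectors on the card.

    #include <stdint.h>

    #define FAT32_EOC 0x0FFFFFFFu   /* assumed end-of-chain marker value */

    /* fat[] is the FAT loaded into RAM as one 32-bit entry per cluster.
       Graft the temp file's tail (starting at cluster t2) onto the wav
       file's head (whose chain currently ends at cluster w1), and cut the
       temp file's chain so it now ends at its first cluster t1. */
    static void graft_chains(uint32_t *fat, uint32_t w1, uint32_t t1, uint32_t t2)
    {
        fat[w1] = t2;          /* wav header cluster now continues into the data */
        fat[t1] = FAT32_EOC;   /* temp file keeps only its first cluster */
    }

Those two assignments correspond to the "Change the FAT entry" steps in the recipe; the rest is ordinary file I/O plus writing the modified FAT and Directory back to the card.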
For your application you could write all the data, rewrite the first cluster (with the correct header) into a new file, then do FAT surgery to graft the new head onto the old tail:
Determine the FAT cluster size (S)
Determine the size of the WAV header up to the first data byte (F)
Write an F-byte placeholder header followed by the incoming data to a temp file. Close it when the stream ends.
Create a new file with the desired name.
Open the temp file for reading, and copy the header to the new file while filling in the size field(s) correctly (as you have done before).
Copy min(S-F, bytes_remaining) more bytes from the temp file to the new file.
Close the new file.
If there are no bytes remaining, you are done,
else,
Read the FAT and Directory into memory.
Read the Directory to get:
the first cluster of the temp file (T1) (with all the data),
the first cluster of the wav file (W1) (with the correct header).
Read the FAT entry for T1 to find the second temp cluster (T2).
Change the FAT entry for W1 from "EOC" to T2.
Change the FAT entry for T1 from T2 to "EOC".
Swap the FileSize entries for the two files in the Directory.
Write the FAT and Directory back to disk.
Delete the Temp file.
Of course, by the time you do this, you will probably understand the file system well enough to implement fseek(fp,0,SEEK_SET), which should give you enough functionality to do the header fixup through standard library calls.
We are working with exactly the same scenario as you in our project, a recorder application. Since the length of the file is unknown, we write a RIFF header with 0 length at the beginning (to reserve space) and, on closing, go back to position 0 (with fseek) and write the correct header. Thus, I think you have to debug why fseek doesn't work in write mode, otherwise you will not be able to perform this task efficiently.
By the way, you'd better stay away from filesystem-internal workarounds like concatenating blocks, etc.; this is hardly possible, will not be portable, and can bring you new problems. Use standard and proven methods instead.
Update
(After finding out that your FS is ARM's RL-FlashFS) why not use rewind (http://www.keil.com/support/man/docs/rlarm/rlarm_rewind.htm) instead of fseek?
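Putting the two answers together, the intended flow looks roughly like the C sketch below: write a 44-byte canonical PCM WAV header with zeroed size fields, stream the audio while counting bytes, then go back to offset 0 (rewind, per the RL-FlashFS docs linked above) and rewrite the header with the real sizes. Whether RL-FlashFS actually lets you overwrite after rewinding a file opened for writing is exactly the open question here, so treat this as a sketch of the approach rather than verified code; the sample-format values are placeholders, and little-endian byte order is assumed (as is typical for these ARM targets).

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Write a canonical 44-byte PCM WAV header for the given amount of audio data. */
    static void write_wav_header(FILE *fp, uint32_t data_bytes,
                                 uint16_t channels, uint32_t sample_rate,
                                 uint16_t bits_per_sample)
    {
        uint8_t  h[44];
        uint32_t byte_rate   = sample_rate * channels * (bits_per_sample / 8);
        uint16_t block_align = (uint16_t)(channels * (bits_per_sample / 8));
        uint32_t riff_size   = 36 + data_bytes;
        uint32_t fmt_size    = 16;   /* PCM "fmt " chunk size */
        uint16_t pcm         = 1;    /* PCM format tag */

        memcpy(h,      "RIFF", 4);  memcpy(h + 4,  &riff_size, 4);
        memcpy(h + 8,  "WAVE", 4);
        memcpy(h + 12, "fmt ", 4);  memcpy(h + 16, &fmt_size, 4);
        memcpy(h + 20, &pcm, 2);    memcpy(h + 22, &channels, 2);
        memcpy(h + 24, &sample_rate, 4);
        memcpy(h + 28, &byte_rate, 4);
        memcpy(h + 32, &block_align, 2);
        memcpy(h + 34, &bits_per_sample, 2);
        memcpy(h + 36, "data", 4);  memcpy(h + 40, &data_bytes, 4);

        fwrite(h, sizeof h, 1, fp);
    }

    /* Intended flow: placeholder header, stream audio, rewind, patch header. */
    void record(FILE *fp)
    {
        uint32_t total = 0;
        write_wav_header(fp, 0, 1, 16000, 16);       /* sizes not yet known */

        /* ... for each captured buffer: total += fwrite(buf, 1, len, fp); ... */

        rewind(fp);                                  /* back to offset 0 */
        write_wav_header(fp, total, 1, 16000, 16);   /* now with real sizes */
    }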

Resources