Pharo FileSystem: setUp of an SUnit test which uses a file

I want to write a SUnit test which uses a file with the Pharo 4.0 FileSystem.
I want to write a file and then later read it.
Something like this:
fname := 'TabularTestExport1.xlsx'.
(FileLocator temp / fname) delete.
TabularXSLXExport workbook: myWorkbook fileName: (FileLocator temp / fname).
Questions
temp directory: What is the method to use for a temporary file in a platform-independent way? Neither FileLocator temp nor FileLocator tempDirectory is implemented.
deleting an existing test file: How do I ensure that a file is deleted? I.e., how do I avoid a walkback in case the file does not exist?
Alternatively, everything could be done in memory: 1. create the test file, 2. export it, 3. import it back.

For tests, unless you have a really big archive, it is better to do things in memory.
FileSystem provides a way to do this; you just need to do:
fs := FileSystem memory.
It gives you a compatible API, so you can run your tests against it.
If you want a file and not a directory, you can do:
file := FileSystem memory / 'myFile'.
EDIT: I forgot a couple of things:
FileLocator temp is implemented and should work fine for you. Why do you say it is not implemented? Are you perhaps not finding it for some reason?
myFileReference ensureDelete will... well, ensure your file is deleted :)
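For comparison only (the question itself is about Pharo), the same two setUp concerns, a platform-independent temporary location and deleting a file without an error when it is absent, look like this in Python's standard library; the file name is just the one from the question:
import tempfile
from pathlib import Path

# platform-independent temp directory, the analogue of FileLocator temp
target = Path(tempfile.gettempdir()) / 'TabularTestExport1.xlsx'

# delete without raising if the file does not exist,
# the analogue of ensureDelete (missing_ok needs Python 3.8+)
target.unlink(missing_ok=True)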


(Lua 5.2) Cannot create file because io.open returns nil file handle

I am trying to create a file in Lua.
The following threads all say the same thing:
https://forum.rainmeter.net/viewtopic.php?t=10024
Create a new file in Lua/LuaFileSystem
Sifteo: how to create a file.txt with LUA script?
Creating new files with Lua I/O functions
They say to use:
local file = io.open("test.txt", "w")
file:write("Hello World")
file:close()
I have implemented this like so:
local ADDRESSES = "capi/addresses.cfg"
local file, err = io.open(ADDRESSES, "w")
local data = "<DATA>"
file:write(data)
file:close()
This, however, results in the error Attempt to index local 'file' (a nil value). This is, of course, because the file does not exist yet; I am, after all, trying to create it. The examples seem to say that the file should be automatically created when I try to write to it, but how can I do that if the handle is nil!?
It is correct that io.open(filename, "w") will create the file if it doesn't already exist.
However, there are at least three prerequisites common to file operations:
You must have sufficient permissions to create the file in the desired path (e.g. write permission in the folder)
All folders along the path must already exist. Lua will not create folders for you.
There must be sufficient space (your file system must not be full)
You are presumably not meeting one of the prerequisites. To find out which, simply wrap your call to io.open with assert:
local file = assert(io.open(ADDRESSES, "w"))
Now if opening/creating the file fails, Lua will throw an error that tells you which prerequisite you failed to meet.
In your case, I would consider it most probable that the capi directory doesn't exist yet. On my Linux system, the corresponding error message is capi/addresses.cfg: No such file or directory.
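The fix is the same in any language: make sure the parent directory exists before opening the file for writing. In Lua you would reach for LuaFileSystem's lfs.mkdir; purely as an illustration of the pattern, the equivalent check in Python looks like this (ADDRESSES is the path from the question):
import os

ADDRESSES = "capi/addresses.cfg"

# create the capi directory (and any missing parents) before opening the file
os.makedirs(os.path.dirname(ADDRESSES), exist_ok=True)

with open(ADDRESSES, "w") as f:
    f.write("<DATA>")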

Append data to file and make sure it doesn't get corrupted

I have an existing file and I'd like to append data to it and make sure it can never (or almost never) get corrupted, even if something fails during writing of the appended data.
One method for ensuring files won't get corrupted is to write the data to a temp file, and then rename/mv the temp file to the original file.
But doing so with append is more tricky.
I have the whole file content in memory (it's not a huge file), so I have two options in mind:
Copy the original file to a temp file, append the data to the temp file and then mv/rename the temp file to the original file
Write the whole content of the file (including the data I want to append) to a temp file and then mv/rename the temp file to the original file
The downside of both options is that they're slower than just appending the data to the original file. Are there better ways to do this?
If not, which option is faster?
I need this to work on Windows, Linux and MacOS.
I'm not sure if the programming language I'm using is relevant, but I'm using Rust to write the data.
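Option 2 is the classic crash-safe pattern: write the complete new contents to a temporary file on the same file system, flush it to disk, then atomically rename it over the original. The question uses Rust, but the pattern is language-agnostic; here is a minimal sketch in Python with illustrative names (os.replace uses an atomic rename on POSIX, and replaces the destination in one step on Windows):
import os

def safe_append(path, new_data):
    # the question says the whole file fits in memory
    with open(path, 'rb') as f:
        contents = f.read()

    tmp_path = path + '.tmp'      # must be on the same file system as path
    with open(tmp_path, 'wb') as f:
        f.write(contents + new_data)
        f.flush()
        os.fsync(f.fileno())      # make sure the bytes actually reach the disk

    # readers see either the old file or the new one, never a partial write
    os.replace(tmp_path, path)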

How to make atomic operation with both file system and database in Postgres?

I think the following should be a pretty common pattern:
A database is used to store file paths
The files themselves are stored in the file system
Issues may occur when, say, we want to modify a file path: we need to both modify the database file path and move the file in the file system. It is important that this is done "atomically". Indeed, while we are doing the modification, another process may attempt to read the file path in the database and then try to access the file in the file system. We should make sure that the tuple
("file path", "actual file location")
remains consistent at all times.
Is there a canonical/simple way to achieve this with Postgres/Linux?
One of the major features of the database is that processes see it consistently. That also means that different clients may see different states of the database at the same time.
This means that when you correct a file path in the database and commit the change, any transaction that started before the commit can still see the old path for some time after the commit.
So to make sure nobody tries to read the old file path, you actually have to wait until all transactions from before the commit have ended. That can take milliseconds or, in extreme situations, days: if you have a long-running transaction open, the old path stays visible for its entire duration.
I'd try to implement the following scheme (pseudocode). The hard link makes the file reachable under both the old and the new path during the transition, and deleting the old path is deferred until no transaction that could still see it remains:
sql("begin")
os.hardlink(old_path, new_path)
sql("update files set path=? where path=?, new_path, old_path)
sql("insert into files_to_clean values (?, txid_current())", old_path)
sql("commit")
if random()<CLEANUP_PROBABILITY:
sql("begin")
for delete_path in sql("
delete from files_to_clean
where txid<txid_snapshot_xmin(txid_current_snapshot())
returning path skip locked
"):
os.delete(delete_path)
sql("commit")

Deleting specific files from a directory

I am trying to delete all files from a directory apart from two (which will be erased, then re-written). One of these files, not to be deleted, contains the names of all files in the folder/directory (the other contains the number of files in the directory).
I believe there (possibly?) are 2 solutions:
Read the names of each file from the un-deleted file and delete them individually until there is only the final 2 files remaining,
or...
Because all other files end in .txt I could use some sort of filter which would only delete files with this ending.
Which of these 2 would be most efficient and how could it be done?
Any help would be appreciated.
You are going to end up deleting files one by one, regardless of which method you use. Any optimizations you make are going to be very minuscule. Without actually timing your algorithms, I'd say they'd both take about the same amount of time (and this would vary from one computer to the next, based on CPU speed, HDD type, etc.). So, instead of debating that, I'll provide code for both of the ways you've mentioned:
Method 1:
import os

def deleteAll(infilepath):
    with open(infilepath) as infile:
        for line in infile:
            # strip the trailing newline, or the name won't match any file
            os.remove(line.rstrip('\n'))
Method 2:
import os

def deleteAll():
    # the two files that must survive; the names are placeholders
    blacklist = set(['names/of/files/to/be/deleted', 'number/of/files'])
    for fname in (f for f in os.listdir() if f not in blacklist):
        os.remove(fname)
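Since the question notes that every deletable file ends in .txt, a third possibility (a sketch; it assumes, as the question implies, that the two surviving files do not carry the .txt extension) is to filter on the suffix:
import os

def deleteAllTxt(dirpath='.'):
    # remove every .txt file; the two files to keep are left untouched
    for fname in os.listdir(dirpath):
        if fname.endswith('.txt'):
            os.remove(os.path.join(dirpath, fname))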

Safely writing to and reading from the same file with multiple processes on Linux and Mac OS X

I have three processes designed to run constantly in both Linux and Mac OS X environments. One process (the Downloader) downloads and stores a local copy of a large XML file every 30 seconds. Two other processes (the Workers) use the stored XML file for input. Each Worker starts and runs at random times. Since the XML file is big, it takes a long time to download. The Workers also take a long time to read and parse it.
What is the safest way to setup the processes so the Downloader doesn't clobber the stored file while the Workers are trying to read it?
For Linux and Mac OS X machines that use inode-based file systems, use temporary files to store the data while it's being downloaded (and is in an incomplete state). Once the download is complete, move the temporary file into its final location with an atomic action.
For a little more detail, there are two main things to watch out for when one process (e.g. Downloader) writes a file that's actively read by other processes (e.g. Workers):
Make sure the Workers don't try to read the file before the Downloader has finished writing it.
Make sure the Downloader doesn't alter the file while the Workers are reading it.
Using temporary files accommodates both of these points.
For a more specific example, when the Downloader is actively pulling the XML file, have it write to a temporary location (e.g. 'data-storage.tmp') on the same device/disk* where the final file will be stored. Once the file is completely downloaded and written, have the Downloader move it to its final location (e.g. 'data-storage.xml') via an atomic (aka linearizable) rename, such as the mv command, which uses the rename system call when source and destination are on the same file system.
* Note that the reason the temporary file needs to be on the same device as the final file location is that a rename is only atomic within a single file system; across devices, a move degrades to a copy followed by a delete.
This methodology ensures that while the file is being downloaded/written the Workers won't see it, since it's in the .tmp location. Because of the way renaming works with inodes, it also makes sure that any Worker that already opened the file continues to see the old content even if a new version of the data-storage file is put in place.
The Downloader points 'data-storage.xml' at a new inode when it does the rename, but a Worker that already opened the file continues to read from the previous inode and thereby keeps working with the file in that state. At the same time, any Worker that opens a fresh copy of 'data-storage.xml' after the rename sees the contents of the new inode, since that is what the file name now references directly in the file system. So two Workers can be reading from the same filename (data-storage.xml), but each sees a different (and complete) version of the contents, depending on which inode the filename pointed to when the file was first opened.
To see this in action, I created a simple set of example scripts that demonstrate this functionality on github. They can also be used to test/verify that using a temporary file solution works in your environment.
An important note is that it's the file system on the particular device that matters. If you are using a Linux or Mac machine but writing to a FAT file system (for example, a USB thumb drive), this method won't work.
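A minimal sketch of the whole pattern in Python, with placeholder names for the file locations and the download/parse steps; os.replace performs an atomic rename when source and destination are on the same file system:
import os

FINAL_PATH = 'data-storage.xml'
TMP_PATH = 'data-storage.tmp'   # must be on the same device as FINAL_PATH

def downloader_cycle(fetch_xml):
    # write the complete download to the temporary location first
    with open(TMP_PATH, 'wb') as f:
        f.write(fetch_xml())
        f.flush()
        os.fsync(f.fileno())
    # atomically swap it in; Workers holding the old file keep their inode
    os.replace(TMP_PATH, FINAL_PATH)

def worker_cycle(parse):
    # opening the file pins its current inode; a later rename by the
    # Downloader does not affect this already-open handle
    with open(FINAL_PATH, 'rb') as f:
        parse(f.read())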
