Naming a file by a url - file

I am generating images of webpages from urls and writing them to my local filesystem, and I want to name the image by its url. However, if I do this, since there are /'s in the url, it will create new directories instead of calling the image by the entire url. Is there a way to avoid this?

I don't believe it's possible to place a / or \ in a file name. The best thing I can think to recommend is replacing all / or \ with alternate characters not normally found in URLs, such as $ ^ , !. If you can't find a single character to replace the slash, you can use a short string that's unlikely to be in the URL, such as [slash] or a combination of symbols.
The problem to watch out for when using the short string is the length of the file names. Although most OSes won't have a problem, you may not be able to zip them with their full name.

Related

Apache camel file with doneFileName

I am just starting to look at apache camel (Using blueprint routes) and I am already stuck.
I need to process a set of csv files with different formats. I get 5 files with foo_X_20160110.csv were X specifies the type of csv file and the files have a date stamp . These files can be quite large so a 'done' file is written once all files are written. The done file is named foo_trigger_20160110.csv.
I've seen the doneFileName option on file but that only supports a static name (I have a date in the filename) or it expects a done file for each input file.
The files have to be proceeded in a fixed order but it is not guaranteed in which order they are written to the input directory. Hence I need to wait for the done file.
Any idea how this can be done with Camel?
Any suggestions for good Camel books?
Here is an example from the documentation
http://camel.apache.org/file2.html
from("file:C:/temp/input.txt?doneFileName=done");
As you can see the doneFileName has a static value "done". But you can use standard java to write dynamic names i.e. for current dateformat or anything else and just use string operation to construct the URI. Hope that helps.
Update:
By the way, as mentioned in the documentation there is the option of dynamic placeholders for the doneFileName.
However its more common to have one done file per target file. This
means there is a 1:1 correlation. To do this you must use dynamic
placeholders in the doneFileName option. Currently Camel supports the
following two dynamic tokens: file:name and file:name.noext which must
be enclosed in ${ }. The consumer only supports the static part of the
done file name as either prefix or suffix (not both).
from("file:bar?doneFileName=${file:name}.done");
You can also use a prefix for the done file, such as:
from("file:bar?doneFileName=ready-${file:name}");

Tcl determine file name from browser upload

I have run into a problem in one of my Tcl scripts where I am uploading a file from a Windows computer to a Unix server. I would like to get just the original file name from the Windows file and save the new file with the same name. The problem is that [file tail windows_file_name] does not work, it returns the whole file name like "c:\temp\dog.jpg" instead of just "dog.jpg". File tail works correctly on a Unix file name "/usr/tmp/dog.jpg", so for some reason it is not detecting that the file is in Windows format. However Tcl on my Windows computer works correctly for either name format. I am using Tcl 8.4.18, so maybe it is too old? Is there another trick to get it to split correctly?
Thanks
The problem here is that on Windows, both \ and / are valid path separators so long Windows API is concerned (even though only \ is deemed to be "official" on Windows). On the other hand, in POSIX, the only valid path separator is /, and the only two bytes which can't appear in a pathname component are / and \0 (a byte with value 0).
Hence, on a POSIX system, "C:\foo\bar.baz" is a perfectly valid short filename, and running
file normalize {C:\foo\bar.baz}
would yield /path/to/current/dir/C:\foo\bar.baz. By the same logic, [file tail $short_filename] is the same as $short_filename.
The solution is to either do what Glenn Jackman proposed or to somehow pass the short name from the browser via some other means (some JS bound to an appropriate file entry?). Also you could attempt to detect the user's OS from the User-Agent header.
To make Glenn's idea more agnostic to user's platform, you could go like this:
Scan the file name for "/".
If none found, do set fname [string map {\\ /} $fname] then go to the next step.
Use [file tail $fn] to extract the tail name.
It's not very bullet-proof, but supposedly better than nothing.
You could always do [lindex [split $windows_file_name \\] end]

What corner cases must we consider when parsing $PATH on Linux?

I'm working on a C application that has to walk $PATH to find full pathnames for binaries, and the only allowed dependency is glibc (i.e. no calling external programs like which). In the normal case, this just entails splitting getenv("PATH") by colons and checking each directory one by one, but I want to be sure I cover all of the possible corner cases. What gotchas should I look out for? In particular, are relative paths, paths starting with ~ meant to be expanded to $HOME, or paths containing the : char allowed?
One thing that once surprised me is that the empty string in PATH means the current directory. Two adjacent colons or a colon at the end or beginning of PATH means the current directory is included. This is documented in man bash for instance.
It also is in the POSIX specification.
So
PATH=:/bin
PATH=/bin:
PATH=/bin::/usr/bin
All mean the current directory is in PATH
I'm not sure this is a problem with Linux in general, but make sure that your code works if PATH has some funky (like, UTF-8) encoding to deal with directories with fancy letters. I suspect this might depend on the filesystem encoding.
I remember working on a bug report of some russian guy who had fancy letters in his user name (and hence, his home directory name which appeared in PATH).
This is minor but I'll added it since it hasn't already been mentioned. $PATH can include both absolute and relative paths. If your crawling the paths list by chdir(2)ing into each directory, you need to keep track of the original working directory (getcwd(3)) and chdir(2) back to it at each iteration of the crawl.
The existing answers cover most of it, but it's worth covering parts of the question that wasn't answered yet:
$ and ~ are not special in the value of $PATH.
If $PATH is not set at all, execvp() will use a default value.

Spaces in Directory Names

Is putting a space in a directory name still a big deal? I've been doing some reading, but all the articles are from the early 2000s. Is it a problem now?
For those who don't get what I mean: public_html/space directory/index.html
If this is still an issue, why shouldn't I use spaces when naming files and directories?
Spaces in URLs are still special characters that need to be escaped or encoded (either a + or %20).
Well, I am still crossing fingers when executing external processes (from ant or Java's ProcessBuilder for example). If you just pass this dir to the external process within the command - it may break apart in two arguments which is clearly not what you want.
Some quoting and minding the spaces is still required in some usecases.

C - Reading multiple files

just had a general question about how to approach a certain problem I'm facing. I'm fairly new to C so bear with me here. Say I have a folder with 1000+ text files, the files are not named in any kind of numbered order, but they are alphabetical. For my problem I have files of stock data, each file is named after the company's respective ticker. I want to write a program that will open each file, read the data find the historical low and compare it to the current price and calculate the percent change, and then print it. Searching and calculating are not a problem, the problem is getting the program to go through and open each file. The only way I can see to attack this is to create a text file containing all of the ticker symbols, having the program read that into an array and then run a loop that first opens the first filename in the array, perform the calculations, print the output, close the file, then loop back around moving to the second element (the next ticker symbol) in the array. This would be fairly simple to set up (I think) but I'd really like to avoid typing out over a thousand file names into a text file. Is there a better way to approach this? Not really asking for code ( unless there is some amazing function in c that will do this for me ;) ), just some advice from more experienced C programmers.
Thanks :)
Edit: This is on Linux, sorry I forgot to metion that!
Under Linux/Unix (BSD, OS X, POSIX, etc.) you can use opendir / readdir to go through the directory structure. No need to generate static files that need to be updated, when the file system has the information you want. If you only want a sub-set of stocks at a given time, then using glob would be quicker, there is also scandir.
I don't know what Win32 (Windows / Platform SDK) functions are called, if you are developing using Visual C++ as your C compiler. Searching MSDN Library should help you.
Assuming you're running on linux...
ls /path/to/text/files > names.txt
is exactly what you want.
opendir(); on linux.
http://linux.die.net/man/3/opendir
Exemple :
http://snippets.dzone.com/posts/show/5734
In pseudo code it would look like this, I cannot define the code as I'm not 100% sure if this is the correct approach...
for each directory entry
scan the filename
extract the ticker name from the filename
open the file
read the data
create a record consisting of the filename, data.....
close the file
add the record to a list/array...
> sort the list/array into alphabetical order based on
the ticker name in the filename...
You could vary it slightly if you wish, scan the filenames in the directory entries and sort them first by building a record with the filenames first, then go back to the start of the list/array and open each one individually reading the data and putting it into the record then....
Hope this helps,
best regards,
Tom.
There are no functions in standard C that have any notion of a "directory". You will need to use some kind of platform-specific function to do this. For some examples, take a look at this post from Cprogrammnig.com.
Personally, I prefer using the opendir()/readdir() approach as shown in the second example. It works natively under Linux and also on Windows if you are using Cygwin.
Approach 1) I would just have a specific directory in which I have ONLY these files containing the ticker data and nothing else. I would then use the C readdir API to list all files in the directory and iterate over each one performing the data processing that you require. Which ticker the file applies to is determined only by the filename.
Pros: Easy to code
Cons: It really depends where the files are stored and where they come from.
Approach 2) Change the file format so the ticker files start with a magic code identifying that this is a ticker file, and a string containing the name. As before use readdir to iterate through all files in the folder and open each file, ensure that the magic number is set and read the ticker name from the file, and process the data as before
Pros: More flexible than before. Filename needn't reflect name of ticker
Cons: Harder to code, file format may be fixed.
but I'd really like to avoid typing out over a thousand file names into a text file. Is there a better way to approach this?
I have solved the exact same problem a while back, albeit for personal uses :)
What I did was to use the OS shell commands to generate a list of those files and redirected the output to a text file and had my program run through them.
On UNIX, there's the handy glob function:
glob_t results;
memset(&results, 0, sizeof(results));
glob("*.txt", 0, NULL, &results);
for (i = 0; i < results.gl_pathc; i++)
printf("%s\n", results.gl_pathv[i]);
globfree(&results);
On Linux or a related system, you could use the fts library. It's designed for traversing file hierarchies: man fts,
or even something as simple as readdir
If on Windows, you can use their Directory Management API's. More specifically, the FindFirstFile function, used with wildcards, in conjunction with FindNextFile

Resources