I need to search a drive (C:, D:, etc.) for a particular file type (extension like .xml, .csv, .xls). How do I perform a recursive search to loop through all directories and inner directories and return the full path of where the file(s) are? Or where can I get information on this?
VB.NET or C#
Thanks
Edit ~ I am running into some errors, like "unable to access system volume", "access denied", etc. Does anyone know where I can see some sample code on implementing a file search? I just need to search a selected drive and return the full path of the file type for all the files found.
System.IO.Directory.GetFiles(@"c:\", "*.xml", SearchOption.AllDirectories);
How about this? It avoids the exception often thrown by the in-built recursive search (i.e. you get access-denied to a single folder, and your whole search dies), and is lazily evaluated (i.e. it returns results as soon as it finds them, rather than buffering 2000 results). The lazy behaviour lets you build responsive UIs etc, and also works well with LINQ (especially First(), Take(), etc).
using System;
using System.Collections.Generic;
using System.IO;

static class Program { // formatted for vertical space
    static void Main() {
        foreach (string match in Search("c:\\", "*.xml")) {
            Console.WriteLine(match);
        }
    }
    static IEnumerable<string> Search(string root, string searchPattern) {
        Queue<string> dirs = new Queue<string>();
        dirs.Enqueue(root);
        while (dirs.Count > 0) {
            string dir = dirs.Dequeue();
            // files
            string[] paths = null;
            try {
                paths = Directory.GetFiles(dir, searchPattern);
            } catch { } // swallow
            if (paths != null && paths.Length > 0) {
                foreach (string file in paths) {
                    yield return file;
                }
            }
            // sub-directories
            paths = null;
            try {
                paths = Directory.GetDirectories(dir);
            } catch { } // swallow
            if (paths != null && paths.Length > 0) {
                foreach (string subDir in paths) {
                    dirs.Enqueue(subDir);
                }
            }
        }
    }
}
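Because Search is lazily evaluated, LINQ operators can short-circuit the directory walk. A small illustrative usage, assuming a using System.Linq; directive and that it sits in the same class as Search:

// First() stops walking the tree at the first hit:
string firstMatch = Search("c:\\", "*.xml").First();
// Take(20) yields at most 20 results without scanning the whole drive:
foreach (string path in Search("c:\\", "*.xml").Take(20)) {
    Console.WriteLine(path);
}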
It looks like the recls library (short for "recursive ls") now has a pure .NET implementation; I just read about it in Dr. Dobb's.
It would be used as:
using Recls;
using System;
static class Program { // formatted for vertical space
    static void Main() {
        foreach (IEntry e in FileSearcher.Search(@"c:\", "*.xml|*.csv|*.xls")) {
            Console.WriteLine(e.Path);
        }
    }
}
I am writing an Eclipse plugin which requires me to get the full path of any kind of file open in the Workspace.
I am able to get the full path of any file which is part of an Eclipse project. Here is the code to get the open/active editor file from the workspace:
public static String getActiveFilename(IWorkbenchWindow window) {
    IWorkbenchPage activePage = window.getActivePage();
    IEditorInput input = activePage.getActiveEditor().getEditorInput();
    String name = activePage.getActiveEditor().getEditorInput().getName();
    PluginUtils.log(activePage.getActiveEditor().getClass() + " Editor.");
    IPath path = input instanceof FileEditorInput ? ((FileEditorInput) input).getPath() : null;
    if (path != null) {
        return path.toPortableString();
    }
    return name;
}
However, this does not work if a file is drag-dropped into the Workspace or opened using File -> Open File. For instance, I opened a file from /Users/mac/log.txt via File -> Open File, and my plugin is not able to find the location of this file.
After a couple of days of searching, I found the answer by looking at the source code of the Eclipse IDE.
In IDE.class, Eclipse tries to find a suitable editor input depending on whether the file is a workspace file or an external file. Eclipse handles files in the workspace using FileEditorInput and external files using FileStoreEditorInput. Code snippet below:
/**
 * Create the Editor Input appropriate for the given <code>IFileStore</code>.
 * The result is a normal file editor input if the file exists in the
 * workspace and, if not, we create a wrapper capable of managing an
 * 'external' file using its <code>IFileStore</code>.
 *
 * @param fileStore
 *            The file store to provide the editor input for
 * @return The editor input associated with the given file store
 * @since 3.3
 */
private static IEditorInput getEditorInput(IFileStore fileStore) {
    IFile workspaceFile = getWorkspaceFile(fileStore);
    if (workspaceFile != null)
        return new FileEditorInput(workspaceFile);
    return new FileStoreEditorInput(fileStore);
}
I have modified the code posted in the question to handle both files in the workspace and external files:
public static String getActiveEditorFilepath(IWorkbenchWindow window) {
    IWorkbenchPage activePage = window.getActivePage();
    IEditorInput input = activePage.getActiveEditor().getEditorInput();
    String name = activePage.getActiveEditor().getEditorInput().getName();
    // Path of files in the workspace.
    IPath path = input instanceof FileEditorInput ? ((FileEditorInput) input).getPath() : null;
    if (path != null) {
        return path.toPortableString();
    }
    // Path of the externally opened files in Editor context.
    try {
        URI urlPath = input instanceof FileStoreEditorInput ? ((FileStoreEditorInput) input).getURI() : null;
        if (urlPath != null) {
            return new File(urlPath.toURL().getPath()).getAbsolutePath();
        }
    } catch (MalformedURLException e) {
        e.printStackTrace();
    }
    // Fallback option to get at least name
    return name;
}
I'm writing a method to output to several output streams at once. The way I have it set up right now is that I have a LogController, a LogFile and a LogConsole; the latter two are implementations of the Log interface.
What I'm trying to do right now is add a method to the LogController that attaches any implementation of the Log interface.
How I want to do this is as follows: in the LogController I have an associative array, in which I store pointers to Log objects. When the writeOut method of the LogController is called, I want it to run over the elements of the array and call their writeOut methods too. The latter I can do, but the former is proving to be difficult.
Mage/Utility/LogController.d
module Mage.Utility.LogController;

import std.stdio;

interface Log {
    public void writeOut(string s);
}

class LogController {
    private Log*[string] m_Logs;

    public this() {
    }

    public void attach(string name, ref Log l) {
        foreach (string key; m_Logs.keys) {
            if (name is key) return;
        }
        m_Logs[name] = &l;
    }

    public void writeOut(string s) {
        foreach (Log* log; m_Logs) {
            log.writeOut(s);
        }
    }
}
Mage/Utility/LogFile.d
module Mage.Utility.LogFile;

import std.stdio;
import std.datetime;
import Mage.Utility.LogController;

class LogFile : Log {
    private File fp;
    private string path;

    public this(string path) {
        this.fp = File(path, "a+");
        this.path = path;
    }

    public void writeOut(string s) {
        this.fp.writefln("[%s] %s", this.timestamp(), s);
    }

    private string timestamp() {
        return Clock.currTime().toISOExtString();
    }
}
I've already tried multiple things with the attach function, and none of them work. The build fails with the following error:
Mage\Root.d(0,0): Error: function Mage.Utility.LogController.LogController.attach (string name, ref Log l) is not callable using argument types (string, LogFile)
This is the incriminating function:
public void initialise(string logfile = DEFAULT_LOG_FILENAME) {
    m_Log = new LogController();
    LogFile lf = new LogFile(logfile);
    m_Log.attach("Log File", lf);
}
Can anyone tell me where I'm going wrong here? I'm stumped and I haven't been able to find the answer anywhere. I've tried a multitude of different solutions and none of them work.
Classes and interfaces in D are reference types, so Log* is redundant - remove the *. Similarly, there is no need to use ref in ref Log l - that's like taking a pointer by reference in C++.
This is the cause of the error message you posted - variables passed by reference must match in type exactly. Removing the ref should solve the error.
I am trying to write a spell corrector using the Lucene spellchecker. I want to give it a single text file with blog text content. The problem is that it works only when I give it one sentence/word per line in my dictionary file. Also, the suggest API returns results without giving any weight to the number of occurrences. Following is the source code:
public class SpellCorrector {
    SpellChecker spellChecker = null;

    public SpellCorrector() {
        try {
            File file = new File("/home/ubuntu/spellCheckIndex");
            Directory directory = FSDirectory.open(file);
            spellChecker = new SpellChecker(directory);
            StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
            IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, analyzer);
            spellChecker.indexDictionary(
                    new PlainTextDictionary(new File("/home/ubuntu/main.dictionary")), config, true);
            // Should I format this file with one sentence/word per line?
        } catch (IOException e) {
        }
    }

    public String correct(String query) {
        if (spellChecker != null) {
            try {
                String[] suggestions = spellChecker.suggestSimilar(query, 5);
                // This returns the suggestion not based on occurrence but based on when it occurred
                if (suggestions != null) {
                    if (suggestions.length != 0) {
                        return suggestions[0];
                    }
                }
            } catch (IOException e) {
                return null;
            }
        }
        return null;
    }
}
Do I need to make some changes?
Regarding your first issue, that sounds like the expected, documented dictionary format; see the PlainTextDictionary API. If you want to pass arbitrary text in, you might want to index it and use a LuceneDictionary instead, or possibly a HighFrequencyDictionary, depending on your needs.
The SpellChecker suggests replacements based on the similarity between the words (Levenshtein distance), before any other concern. If you want it to only recommend more popular terms as suggestions, you should pass a SuggestMode to SpellChecker.suggestSimilar. This ensures that the suggested matches are at least as strong, popularity-wise, as the word they are intended to replace.
If you must override how Lucene decides on the best matches, you can do that with SpellChecker.setComparator, creating your own Comparator on SuggestWord. Since SuggestWord exposes freq to you, it should be easy to order found matches by popularity.
Has anyone EVER managed to use a Windows 8 app to copy files from a UNC directory to a local directory?
According to the official documentation here, it is possible to connect to a UNC path.
I am using the standard file access sample and have changed one line of code to read as below.
I have added all the capabilities.
Added .txt as a file type.
The UNC path is read/write to everyone and is located on the same machine.
But I keep getting Access Denied errors.
Can anyone possibly provide me with a working example?
This is driving me mad and really makes me question the whole point of Win 8 dev for LOB apps.
TIA
private async void Initialize()
{
    try
    {
        //sampleFile = await Windows.Storage.KnownFolders.DocumentsLibrary.GetFileAsync(filename);
        string myfile = @"\\ALL387\Temp\testfile.txt";
        sampleFile = await Windows.Storage.StorageFile.GetFileFromPathAsync(myfile);
    }
    catch (FileNotFoundException)
    {
        // sample file doesn't exist so scenario one must be run
    }
    catch (Exception e)
    {
        var fred = e.Message;
    }
}
I have sorted this out, and the way I found best to do it was to create a folder object, enumerate over the files in the folder object, and copy the files one at a time to the local folder, then access them.
It seems that you can't open the files, but you can copy them (which was what I was trying to achieve in the first place).
Hope this helps
private async void Initialize()
{
    try
    {
        var myfldr = await Windows.Storage.StorageFolder.GetFolderFromPathAsync(@"\\ALL387\Temp");
        var myfiles = await myfldr.GetFilesAsync();
        foreach (StorageFile myfile in myfiles)
        {
            StorageFile fileCopy = await myfile.CopyAsync(KnownFolders.DocumentsLibrary, myfile.Name, NameCollisionOption.ReplaceExisting);
        }
        var dsd = await Windows.Storage.KnownFolders.PicturesLibrary.GetFilesAsync();
        foreach (var file in dsd)
        {
            StorageFile sampleFile = await Windows.Storage.StorageFile.GetFileFromPathAsync(file.Path);
        }
    }
    catch (FileNotFoundException)
    {
        // sample file doesn't exist so scenario one must be run
    }
    catch (Exception e)
    {
        var fred = e.Message;
    }
}
I'm trying to figure out the best way to store user-uploaded files in a file system. The files range from personal files to wiki files. Of course, the DB will point to those files in some way, which I have yet to figure out.
Basic Requirements:
Fairly decent security so people can't guess filenames (Picture001.jpg, Picture002.jpg, Music001.mp3 is a big no-no)
Easily backed up & mirrorable (I prefer a way where I don't have to copy the entire HDD every single time I want to back up. I like the idea of backing up just the newest items, but I'm flexible with the options here.)
Scalable to millions of files on multiple servers if needed.
One technique is to store the data in files named after the SHA1 hash of their contents. This is not easily guessable, any backup program should be able to handle it, and it is easily sharded (by storing hashes starting with 0 on one machine, hashes starting with 1 on the next, etc.).
The database would contain a mapping between the user's assigned name and the SHA1 hash of the contents.
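A minimal sketch of the hashing step (the class and method names here are mine, using .NET's built-in SHA1 support):

using System;
using System.IO;
using System.Security.Cryptography;

static class ContentHash
{
    // Returns the SHA1 of a file's contents as a lower-case hex string,
    // which then serves as the unguessable storage name.
    public static string Sha1Hex(string filePath)
    {
        using (SHA1 sha1 = SHA1.Create())
        using (FileStream fs = File.OpenRead(filePath))
        {
            byte[] hash = sha1.ComputeHash(fs);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}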
Guids for filenames, automatically expanding folder hierarchy with no more than a couple of thousand files/folders in each folder. Backing up new files is done by backing up new folders.
You haven't indicated what environment and/or programming language you are using, but here's a C# / .net / Windows example:
using System;
using System.IO;
using System.Xml.Serialization;

/// <summary>
/// Class for generating storage structure and file names for document storage.
/// Copyright (c) 2008, Huagati Systems Co.,Ltd.
/// </summary>
public class DocumentStorage
{
    private static StorageDirectory _StorageDirectory = null;

    public static string GetNewUNCPath()
    {
        string storageDirectory = GetStorageDirectory();
        if (!storageDirectory.EndsWith("\\"))
        {
            storageDirectory += "\\";
        }
        return storageDirectory + GuidEx.NewSeqGuid().ToString() + ".data";
    }

    public static void SaveDocumentInfo(string documentPath, Document documentInfo)
    {
        //the filestream object don't like NTFS streams so this is disabled for now...
        return;
        //stores a document object in a separate "docinfo" stream attached to the file it belongs to
        //XmlSerializer ser = new XmlSerializer(typeof(Document));
        //string infoStream = documentPath + ":docinfo";
        //FileStream fs = new FileStream(infoStream, FileMode.Create);
        //ser.Serialize(fs, documentInfo);
        //fs.Flush();
        //fs.Close();
    }

    private static string GetStorageDirectory()
    {
        string storageRoot = ConfigSettings.DocumentStorageRoot;
        if (!storageRoot.EndsWith("\\"))
        {
            storageRoot += "\\";
        }
        //get storage directory if not set
        if (_StorageDirectory == null)
        {
            _StorageDirectory = new StorageDirectory();
            lock (_StorageDirectory)
            {
                string path = ConfigSettings.ReadSettingString("CurrentDocumentStoragePath");
                if (path == null)
                {
                    //no storage tree created yet, create first set of subfolders
                    path = CreateStorageDirectory(storageRoot, 1);
                    _StorageDirectory.FullPath = path.Substring(storageRoot.Length);
                    ConfigSettings.WriteSettingString("CurrentDocumentStoragePath", _StorageDirectory.FullPath);
                }
                else
                {
                    _StorageDirectory.FullPath = path;
                }
            }
        }
        int fileCount = (new DirectoryInfo(storageRoot + _StorageDirectory.FullPath)).GetFiles().Length;
        if (fileCount > ConfigSettings.FolderContentLimitFiles)
        {
            //if the directory has exceeded number of files per directory, create a new one...
            lock (_StorageDirectory)
            {
                string path = GetNewStorageFolder(storageRoot + _StorageDirectory.FullPath, ConfigSettings.DocumentStorageDepth);
                _StorageDirectory.FullPath = path.Substring(storageRoot.Length);
                ConfigSettings.WriteSettingString("CurrentDocumentStoragePath", _StorageDirectory.FullPath);
            }
        }
        return storageRoot + _StorageDirectory.FullPath;
    }

    private static string GetNewStorageFolder(string currentPath, int currentDepth)
    {
        string parentFolder = currentPath.Substring(0, currentPath.LastIndexOf("\\"));
        int parentFolderFolderCount = (new DirectoryInfo(parentFolder)).GetDirectories().Length;
        if (parentFolderFolderCount < ConfigSettings.FolderContentLimitFolders)
        {
            return CreateStorageDirectory(parentFolder, currentDepth);
        }
        else
        {
            return GetNewStorageFolder(parentFolder, currentDepth - 1);
        }
    }

    private static string CreateStorageDirectory(string currentDir, int currentDepth)
    {
        string storageDirectory = null;
        string directoryName = GuidEx.NewSeqGuid().ToString();
        if (!currentDir.EndsWith("\\"))
        {
            currentDir += "\\";
        }
        Directory.CreateDirectory(currentDir + directoryName);
        if (currentDepth < ConfigSettings.DocumentStorageDepth)
        {
            storageDirectory = CreateStorageDirectory(currentDir + directoryName, currentDepth + 1);
        }
        else
        {
            storageDirectory = currentDir + directoryName;
        }
        return storageDirectory;
    }

    private class StorageDirectory
    {
        public string DirectoryName { get; set; }
        public StorageDirectory ParentDirectory { get; set; }

        public string FullPath
        {
            get
            {
                if (ParentDirectory != null)
                {
                    return ParentDirectory.FullPath + "\\" + DirectoryName;
                }
                else
                {
                    return DirectoryName;
                }
            }
            set
            {
                if (value.Contains("\\"))
                {
                    DirectoryName = value.Substring(value.LastIndexOf("\\") + 1);
                    ParentDirectory = new StorageDirectory { FullPath = value.Substring(0, value.LastIndexOf("\\")) };
                }
                else
                {
                    DirectoryName = value;
                }
            }
        }
    }
}
SHA1 hash of the filename + a salt (or, if you want, of the file contents; that makes detecting duplicate files easier, but also puts a LOT more stress on the server). This may need some tweaking to be unique (e.g. add the uploading user's ID or a timestamp), and the salt is there to make it not guessable.
The folder structure is then derived from parts of the hash.
For example, if the hash is "2fd4e1c67a2d28fced849ee1bb76e7391b93eb12" then the folders could be:
/2
/2/2f/
/2/2f/2fd/
/2/2f/2fd/2fd4e1c67a2d28fced849ee1bb76e7391b93eb12
This is to prevent large folders (some operating systems have trouble enumerating folders with a million files), hence the few levels of subfolders named after parts of the hash. How many levels? That depends on how many files you expect, but 2 or 3 is usually reasonable.
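A small sketch of how such a path could be built (the helper name and the three-level depth are illustrative choices):

using System.IO;

static class ShardedPaths
{
    // Builds e.g. "<root>\2\2f\2fd\2fd4e1c6..." from a hex hash,
    // mirroring the layout shown above.
    public static string For(string storageRoot, string hash)
    {
        return Path.Combine(storageRoot,
            hash.Substring(0, 1),
            hash.Substring(0, 2),
            hash.Substring(0, 3),
            hash);
    }
}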
Just in terms of one aspect of your question (security): the best way to safely store uploaded files in a filesystem is to ensure the uploaded files are out of the webroot (i.e., you can't access them directly via a URL - you have to go through a script).
This gives you complete control over what people can download (security) and allows for things such as logging. Of course, you have to ensure the script itself is secure, but it means only the people you allow will be able to download certain files.
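For example, a rough ASP.NET handler along these lines (all names are illustrative; the DB lookup and permission check are stubs you would replace with real code):

using System.Security.Principal;
using System.Web;

public class DownloadHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // Map an opaque id to the on-disk path via the DB; never accept a raw path from the client.
        string id = context.Request.QueryString["id"];
        string physicalPath = LookUpPath(id);
        if (physicalPath == null || !MayDownload(context.User, id))
        {
            context.Response.StatusCode = 403;
            return;
        }
        // The file itself lives outside the webroot, so this handler is the only way to reach it.
        context.Response.ContentType = "application/octet-stream";
        context.Response.TransmitFile(physicalPath);
    }

    // Stubs standing in for the real DB/auth code.
    private static string LookUpPath(string id) { return null; }
    private static bool MayDownload(IPrincipal user, string id) { return false; }
}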
Expanding on Phill Sacre's answer, another aspect of security is to use a separate domain name for uploaded files (for instance, Wikipedia uses upload.wikimedia.org), and to make sure that domain cannot read any of your site's cookies. This prevents people from uploading an HTML file with a script to steal your users' session cookies (simply setting the Content-Type header isn't enough, because some browsers are known to ignore it and guess based on the file's contents; a script can also be embedded in other kinds of files, so it's not trivial to check for HTML and disallow it).