Importing Excel file with dynamic name into SQL table via SSIS? - sql-server

I've done a few searches here, and while some issues are similar, they don't seem to be exactly what I need.
What I'm trying to do is import an Excel file into a SQL table via SSIS, but the problem is that I will never know the exact filename. We get files at no steady interval, and the file usually has a date/month in the name. For instance, our current file is "Census Data - May 2013.xls". We will only ever load ONE file at a time, so I don't need to loop through a directory for multiple Excel files.
My concept is that I can take this file, copy it to a "Loading" directory, and load it from there. At the start of the package, I will first clear out the loading directory, then scan the original directory for an Excel file, copy it to the loading directory and then load it into SQL. I suppose I may have to store the file names somewhere so I don't copy the same file into the loading directory in subsequent months, but I'm not really sure of the best way to handle that.
I've pretty much got everything down except the part that scans the directory for the Excel file and copies it to the loading directory. I've taken the majority of my info from this page, which (again) is close to what I want to do but not quite exactly the solution I need.
Can anyone get me over the finish line? I can't seem to get the Excel Connection Manager right (this is my first time using variables), and I can't figure out how to get the file into the Loading directory.

Problem statement
How do I dynamically identify a file name?
You will require some mechanism to inspect the contents of a folder and see what exists. Specifically, you are looking for an Excel file in your "Loading" directory. You know the file extension and that is it.
Resolution A
Use a ForEach File Enumerator.
Configure the Enumerator with an Expression on FileSpec of *.xls or *.xlsx depending on which flavor of Excel you're dealing with.
Add another Expression on Directory to be your Loading directory.
I typically create SSIS Variables named FolderInput and FileMask and assign those in the Enumerator.
Now when you run your package, the Enumerator is going to look in the Directory and find all the files that match the FileSpec.
Something needs to be done with what is found; you need to use the file name that the Enumerator returns. That's done through the Variable Mappings tab. I created a third Variable called CurrentFileName and assigned it the result of the enumerator.
If you put a Script Task inside the ForEach Enumerator, you should be able to see that the value in the "Locals" window for @[User::CurrentFileName] has updated from the design-time value to the "real" file name.
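As a rough sketch of the whole Resolution A configuration (this is just a summary of the steps above, using the Variable names I suggested):
Collection tab: Enumerator = Foreach File Enumerator
Collection Expressions: Directory = @[User::FolderInput], FileSpec = @[User::FileMask]
Variable Mappings tab: Index 0 -> User::CurrentFileName (receives each matching file name)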
Resolution B
Use a Script Task.
You will still need to create a Variable to hold the current file name and it probably won't hurt to also have the FolderInput and FileMask Variables available. Set the former as ReadWrite and the latter as ReadOnly variables.
Choose the .NET language of your choice; I'm using C#. The method System.IO.Directory.EnumerateFiles does the heavy lifting.
using System;
using System.Data;
using System.IO;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_fe2ea536a97842b1a760b271f190721e
{
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        public void Main()
        {
            string folderInput = Dts.Variables["User::FolderInput"].Value.ToString();
            string fileMask = Dts.Variables["User::FileMask"].Value.ToString();

            try
            {
                // Enumerate everything under the folder that matches the mask
                var files = Directory.EnumerateFiles(folderInput, fileMask, SearchOption.AllDirectories);

                // Only the first match is kept, since we load a single file at a time
                foreach (string currentFile in files)
                {
                    Dts.Variables["User::CurrentFileName"].Value = currentFile;
                    break;
                }
            }
            catch (Exception e)
            {
                Dts.Events.FireError(0, "Script overkill", e.ToString(), string.Empty, 0);
            }

            Dts.TaskResult = (int)ScriptResults.Success;
        }

        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
    }
}
Decision tree
Given the two resolutions to the above problem, how do you choose? Normally, people say "It Depends", but the only time it would really depend is if the process should stop/error out when more than one file exists in the Loading folder. That is a case where the ForEach Enumerator would be more cumbersome than a Script Task. Otherwise, as I stated in my original response, the scripting approach adds cost to your project for development, testing and maintenance for no appreciable gain.
Bits and bobs
Further addressing nuances in the question: Configuring Excel - you'll need to be more specific about what isn't working. Both Siva's SO answer and the linked blogspot article show how to use the value of the Variable I call CurrentFileName to ensure the Excel file is pointing to the "right" file.
You will need to set DelayValidation to True on both the Connection Manager and the Data Flow Task, as the design-time value of the Variable will not be valid when the package begins execution. See this answer for a longer explanation, but again, Siva called that out in their SO answer.
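For example (a sketch under assumptions - the provider and Extended Properties shown here depend on which Excel version you are loading), you can add a Property Expression on the Excel Connection Manager, either on ExcelFilePath:
@[User::CurrentFileName]
or on ConnectionString:
"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + @[User::CurrentFileName] + ";Extended Properties=\"Excel 12.0;HDR=YES\";"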

Related

Is there a file object to get path or name of a file in Nim?

Let's say, I would like to use a single object to represent a file and I'd like to get the filename (or path) of it so that I can use the name to remove the file or for other standard library procedures. I'd like to have a single abstraction which can be used with all available file-related standard library procedures.
I've found FileInfo but in my research I didn't find a get-file-name-procedure. File and FileHandle are pretty useless from a software engineering point of view because they provide no convenient abstraction and don't have members.
Is there a file abstraction (object) in Nim, which provides fast access to FileInfo as well as the file name so that a file doesn't need more than one procedure parameter?
There is no such abstraction in Nim, or any other language, simply because you are asking for something that is impossible to do with most filesystems. Consider the FileInfo structure and its linkCount field, which tells you the number of hard links the file object has. There is no way to get a filename from one or all of those links, short of building and maintaining yourself a database of the whole filesystem.
While most filesystems allow access to files through paths, there is rarely a filesystem that gives you paths from files, because they actually don't need one! An example would be a Unix filesystem where one process opens a file through a path, then removes the path without closing the file. While the process holding the file open is alive, that file won't actually disappear, so you would have the case of a file without a path.
The issue of handling paths, especially considering cross platform applications, involves its own can of worms: if you store paths as strings, what is the path separator and how do you escape it? Does your filesystem support volumes that require special case handling? What string encoding do paths use to satisfy all users? Just the encoding issue requires tons of tables and conversions which would bog down every other API wishing to get just a file like handle to read or write bytes.
A FileInfo is just a snapshot of the state of the file at a given time, a file handle is the live file object you can operate on, and a path (or many paths if your filesystem supports hard links) is just a convenience name for end users.
These are all very different things, which is why they are separate. Your app may need a more complex abstraction than other programmers are willing to tolerate, so create your own abstraction which holds together all the individual pieces you need. For instance, consider the following structure:
import os

type
  AppFileInfo = object
    fileInfo: FileInfo
    file: File
    oneOfMany: string

proc changeFileExt(appFileInfo: AppFileInfo, ext: string): string =
  changeFileExt(appFileInfo.oneOfMany, ext)

proc readAll(appFileInfo: AppFileInfo): string =
  readAll(appFileInfo.file)
Those procs simply mimic the respective standard library APIs but use your more complex structure as inputs and transform it as needed. If you are worried about this abstraction not being optimised due to the extra proc call you could use a template instead.
If you follow this route, however, at some point you will have to ask yourself what is the lifetime of an AppFileInfo object: do you create it with a path? Do you create it from a file handle? Is it safe to access the file field in parts of your code or has it not been initialised properly? Do you return errors or throw exceptions when something goes wrong? Maybe when you start to ask yourself these questions you'll realise they are very app specific and are very difficult to generalise for every use case. Therefore such a complex object doesn't make much sense in the language standard library.
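For instance, a minimal sketch of one possible answer to the lifetime question (the initAppFileInfo name and the choice to construct from a path are my own assumptions, not part of the answer above):

proc initAppFileInfo(path: string, mode = fmRead): AppFileInfo =
  ## Hypothetical constructor: open the file and snapshot its metadata.
  result.file = open(path, mode)
  result.fileInfo = getFileInfo(path)
  result.oneOfMany = path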
I created the missing solution myself. I basically extended the File type using a global encapsulated table. Extending Types like this could be a useful idiom in Nim because of UFCS.
import tables

type FileObject = object
  file: File
  mode: FileMode
  path: string

proc initFileObject(name: string; mode: FileMode; bufsize = -1): FileObject =
  result.file = open(name, mode, bufsize)
  result.path = name
  result.mode = mode

var g_fileObjects = initTable[File, FileObject]()

template get(this: File): var FileObject = g_fileObjects[this]

proc openFile*(filepath: string; mode: FileMode = fmRead; bufsize = -1): File =
  var fileObject = initFileObject(filepath, mode, bufsize)
  result = fileObject.file
  g_fileObjects[result] = fileObject

proc filePath*(this: File): string {.raises: [KeyError].} =
  return this.get.path

proc fileMode*(this: File): FileMode {.raises: [KeyError].} =
  return this.get.mode

from os import tryRemoveFile

proc closeOrDeleteFile[delete = false](this: File): bool =
  result = g_fileObjects.hasKey(this)
  if result:
    when delete:
      result = this.filePath.tryRemoveFile()
    g_fileObjects.del(this)
    this.close()

proc closeFile*(this: File): bool = this.closeOrDeleteFile[:false]
proc deleteFile*(this: File): bool = this.closeOrDeleteFile[:true]
Now you can write
var f = openFile("myFile.txt", fmWrite)
var g = openFile("hello.txt", fmWrite)
echo f.filePath
echo f.deleteFile()
g.writeLine(g.filePath)
echo g.closeFile()

How to read a text file from resources without javaClass

I need to read a text file with readLines() and I've already found this question, but the code in the answers always uses some variation of javaClass; it seems to work only inside a class, while I'm using just a simple Kotlin file with no declared classes. Writing it like this is correct syntax-wise but it looks really ugly and it always returns null, so it must be wrong:
val lines = object {}.javaClass.getResource("file.txt")?.toURI()?.toPath()?.readLines()
Of course I could just specify the raw path like this, but I wonder if there's a better way:
val lines = File("src/main/resources/file.txt").readLines()
Thanks to this answer for providing the correct way to read the file. Currently, reading files from resources without using javaClass or similar constructs doesn't seem to be possible.
// use this if you're inside a class
val lines = this::class.java.getResourceAsStream("file.txt")?.bufferedReader()?.readLines()
// use this otherwise
val lines = object {}.javaClass.getResourceAsStream("file.txt")?.bufferedReader()?.readLines()
According to other similar questions I've found, the second way might also work within a lambda, but I haven't tested it. Notice the ?. operator and the lines?.let {} syntax needed from this point onward, because getResourceAsStream() returns null if no resource is found with the given name.
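For example, a minimal sketch of that null-safe handling (file.txt is just the placeholder resource name used above):

val lines = object {}.javaClass.getResourceAsStream("file.txt")
    ?.bufferedReader()
    ?.readLines()
lines?.let { println(it.size) } // only runs if the resource was found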
Kotlin doesn't have its own means of getting a resource, so you have to use Java's method Class.getResource. You should not assume that the resource is a file (i.e. don't use toPath) as it could well be an entry in a jar, and not a file on the file system. To read a resource, it is easier to get the resource as an InputStream and then read lines from it:
val lines = this::class.java.getResourceAsStream("file.txt").bufferedReader().readLines()
I'm not sure if my response answers your exact question, but perhaps you could do something like this:
I'm guessing that in the final use case the file names would be dynamic, not statically declared. In that case, if you have access to or know the path to the folder, you could do something like this:
// Create an extension function on the String class to retrieve a list of
// files available within a folder. Though I have not added a check here
// to validate this, a condition can be added to assert if the extension
// called is executed on a folder or not
fun String.getFilesInFolder(): Array<out File>? = with(File(this)) { return listFiles() }
// Call the extension function on the String folder path wherever required
fun retrieveFiles(): Array<out File>? = [PATH TO FOLDER].getFilesInFolder()
Once you have a reference to the Array<out File> object, you could do something like this:
// Create an extension function to read a file's contents
fun File.retrieveContent() = readLines()
// You can further expand this use case to conditionally return
// readLines() or the entire file data using a buffered reader, or convert the file
// content to a data class through GSON/whatever.
// You can use generic constraints for that;
// refer to this article for the possibilities:
// https://kotlinlang.org/docs/generics.html#generic-constraints
// Then simply call this extension function after retrieving the files in the folder
// (listOfFiles below is the Array<out File>? returned by retrieveFiles()).
listOfFiles?.forEach { singleFile -> println(singleFile.retrieveContent()) }
In order to have the same URL work both from a Jar and locally, the URL (or path) needs to be a relative path from the repository root.
It is not the location of your file or folder relative to the src folder, such as "/main/resources/your-folder/" or "/client/notes/somefile.md".
It must be the path from the repository root, such as "src/main/resources/your-folder/" or "src/client/notes/somefile.md".
Now you get the drill, and luckily for IntelliJ IDEA users, you can get the correct path with a right-click on the folder or file -> Copy Path/Reference... -> Path From Repository Root (that's the one).
Last, paste it and do your thing.

Need to create a copy of a Database to a location not under Data

I want to create a copy of a database to a folder that is not under Data, either locally or on a server. I have code that looks like this:
var arcName:String = "C:\Archive\MyArchives\SomeName.nsf"
var arcDB:NotesDatabase = appDB.createCopy("", arcName);
When the action finishes (it does not generate any errors) I can't find the database anywhere. If I change the arcName to "Archives\Myarchives\SomeName.nsf" the process works correctly, but I don't want these archives under Data.
Using the full path does not seem to make it move out from under the Data folder.
This may be a case of string escaping - in SSJS, as in most C-lineage languages, \ is the escape character. Give it a shot with \\ in place of each single backslash. In my testing, database.createCopy("", "C:\\Archive\\MyArchives\\SomeName.nsf") works.
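Applied to the snippet from the question, that would be:
var arcName:String = "C:\\Archive\\MyArchives\\SomeName.nsf";
var arcDB:NotesDatabase = appDB.createCopy("", arcName);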

boost log every hour

I'm using Boost.Log and I want to implement a basic logging policy: a new error log at the beginning of each hour (if an error exists), named like "file_%Y%m%d%H.log".
I have 2 problems with this boost library:
1. How do I rotate the file at the beginning of each hour?
This isn't possible with the rotation_at_time_interval parameter, because it creates the new file relative to the first record written to the file, so the hour in the file name doesn't match that rule. Is it possible to have multiple rotation_at_time_point values for one sink, or is there some other solution?
2. When the file exceeds some size I want it to start a new file, and in that case it should append an index to the file name. Adding the rotation_size parameter and %N to the file name increments N the whole time the application is running. I want N to be reset at the beginning of each hour, just as my file name changes. Does anybody have any idea how to do that with the Boost.Log library?
This is a basic principle of creating log files in industry. I really don't understand how this can't be done with a library that is dedicated to creating log files.
The library itself doesn't provide a way to rotate the file at the beginning of every hour, but I had the same problem, so I used a function wrapper which returns true at the beginning of every hour.
I find this way better for me, because I can control the efficiency of the code.
Adapted from boost.org:
namespace sinks = boost::log::sinks;       // assumes <boost/log/sinks/text_file_backend.hpp> is included
namespace keywords = boost::log::keywords;

// User-supplied predicate the backend calls to decide whether to rotate
bool is_it_time_to_rotate();

void init_logging()
{
    boost::shared_ptr< sinks::text_file_backend > backend =
        boost::make_shared< sinks::text_file_backend >(
            keywords::file_name = "file_%5N.log",
            keywords::time_based_rotation = &is_it_time_to_rotate
        );
}
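The predicate itself isn't shown above; a minimal sketch of what it could look like (my own illustration, not code from the original answer or the Boost documentation) is to remember the last hour seen and return true whenever it changes:

#include <ctime>

// Rotation predicate: returns true the first time a record arrives in a new hour.
// Note: it also returns true on the very first call, and the static is not thread-safe.
bool is_it_time_to_rotate()
{
    static int last_hour = -1;
    std::time_t now = std::time(nullptr);
    std::tm local = {};
#ifdef _WIN32
    localtime_s(&local, &now);
#else
    localtime_r(&now, &local);
#endif
    bool rotate = (local.tm_hour != last_hour);
    last_hour = local.tm_hour;
    return rotate;
}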
As for the second question, I really don't understand it well.

Minimizing disk accesses when getting attributes of files in a directory

As the title suggests, I'm looking for a way to get attributes of a large number of files in a directory, but without adding the cost of an additional disk access for each file.
For example, if I get the Name attribute of FileInfo objects in a collection, then there is no additional disk access. However if I get the LastWriteTimeUtc, then an additional disk access is made.
My code:
DirectoryInfo di = new DirectoryInfo(myDir);
FileInfo[] allFiles = di.GetFiles("*.*", SearchOption.TopDirectoryOnly);

foreach (FileInfo fInfo in allFiles)
{
    string name = fInfo.Name;                    // no additional disk access made
    DateTime lastMod = fInfo.LastWriteTimeUtc;   // further disk access made!!!
}
Does anyone know of a way I can get this information in one round trip? I would have hoped that DirectoryInfo.GetFiles() does this but no luck.
Thanks in advance.
If you really care about this, you should probably write this in C using FindFirstFile/GetFileTime, etc.
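For illustration, a minimal sketch of that approach (my own example, with a hard-coded directory as an assumption): FindFirstFile/FindNextFile hand back the last-write time inside the WIN32_FIND_DATA structure, so no extra per-file call is needed.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    WIN32_FIND_DATAA fd;
    HANDLE h = FindFirstFileA("C:\\myDir\\*.*", &fd);   /* directory path is an assumption */
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    do
    {
        SYSTEMTIME st;
        /* ftLastWriteTime comes back with the directory entry itself */
        FileTimeToSystemTime(&fd.ftLastWriteTime, &st);
        printf("%s  %04d-%02d-%02d %02d:%02d UTC\n",
               fd.cFileName, st.wYear, st.wMonth, st.wDay, st.wHour, st.wMinute);
    } while (FindNextFileA(h, &fd));

    FindClose(h);
    return 0;
}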
So, this happens by design: LastWriteTimeUtc is lazily loaded, so there is nothing to do other than write my own component.
