Would anyone have an idea how to read a text file (e.g. a log file that is being populated continuously) from SQL Server and import it continuously into a SQL Server table?
I would like to use only T-SQL, within a stored procedure for instance.
I have not found any option in BULK INSERT or OPENROWSET other than reading the whole file at once. I could do that repeatedly and look for rows that have not yet been imported.
That's a possibility, but not very efficient if the file gets large.
Is it possible to read only the latest rows at each run?
Thanks !
Philippe
You could use a FileSystemWatcher in order to get notified when the log file changes:
FileSystemWatcher watcher = new FileSystemWatcher();
watcher.Path = @"C:\PathOfTheLogfile";
watcher.Filter = "MyLogfile.txt"; // You can also use wildcards here.
watcher.NotifyFilter = NotifyFilters.LastWrite;
watcher.Changed += new FileSystemEventHandler(Watcher_Changed);
watcher.Created += new FileSystemEventHandler(Watcher_Changed);
watcher.EnableRaisingEvents = true; // Start watching.
...
private static void Watcher_Changed(object source, FileSystemEventArgs e)
{
    if (e.ChangeType == WatcherChangeTypes.Created) {
        //TODO: Read log from beginning
    } else {
        //TODO: Read log from last position
    }
    //TODO: write to MSSQL
    //TODO: remember last log file position
}
Sometimes the FileSystemWatcher events fire while the file is still being written to. You might have to add a delay before reading the log file.
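For illustration, here is a minimal sketch of how the handler body could be filled in, assuming a line-oriented log, a placeholder staging table dbo.LogLines, and a placeholder connection string (none of these names come from the original answer):
using System.Data.SqlClient;
using System.IO;

// Placeholders: adjust the connection string and staging table to your environment.
private static long _lastPosition = 0;
private static readonly string _connString = "Server=.;Database=LogDb;Integrated Security=true";

private static void Watcher_Changed(object source, FileSystemEventArgs e)
{
    if (e.ChangeType == WatcherChangeTypes.Created)
        _lastPosition = 0; // new file: read from the beginning

    // Open without blocking the process that is still writing the log.
    using (var fs = new FileStream(e.FullPath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    {
        fs.Seek(_lastPosition, SeekOrigin.Begin); // skip rows already imported

        using (var reader = new StreamReader(fs))
        using (var conn = new SqlConnection(_connString))
        {
            conn.Open();
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Insert each new line into the staging table (placeholder name).
                using (var cmd = new SqlCommand("INSERT INTO dbo.LogLines (LineText) VALUES (@line)", conn))
                {
                    cmd.Parameters.AddWithValue("@line", line);
                    cmd.ExecuteNonQuery();
                }
            }
            _lastPosition = fs.Position; // remember where we stopped
        }
    }
}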
You might be able to use a linked server to the text file if your file is parseable by the Jet provider.
I created a Bro script with the objective of extracting all files for all possible protocols from a pcap file, but I don't want to write all the logs. Bro creates a log file for each protocol, for example 'http.log', 'smtp.log', etc. Even a 'weird.log' is generated. My pcap files are large (20 GB), so each log file contains over 30 MB of information. This log generation reduces the performance of the file extraction.
I can disable 'conn.log' with the line Log::disable_stream(Conn::LOG), but what about all the other protocol logs?
This is my script:
@load base/files/extract
event bro_init()
{
    Log::disable_stream(Conn::LOG);
}
event file_sniff(f: fa_file, meta: fa_metadata)
{
    local ext = "";
    if ( meta?$mime_type )
        ext = split_string(meta$mime_type, /\//)[1];
    local fname = fmt("%s-%s.%s", f$source, f$id, ext);
    Files::add_analyzer(f, Files::ANALYZER_EXTRACT, [$extract_filename=fname]);
}
You can use the none writer like this:
bro -r packets.pcap Log::default_writer=Log::WRITER_NONE
I'm not totally convinced that writing these logs is harming your performance in any real way though. Typically, writing the files to disk is what causes the biggest overhead.
Here's a way to turn off whatever logging's been turned on (prior to bro_init), without having to know which stream IDs are relevant:
event bro_init()
{
    # We don't want any output other than from this script.
    for (id in Log::active_streams)
        Log::disable_stream(id);
}
This construct makes me twitch a little, since it modifies a table while iterating over it, but it seems to work, and I can't actually find any way to peek at one key from a table without doing an iteration. I suppose one could write:
event bro_init()
{
    while (|Log::active_streams| > 0) {
        for (id in Log::active_streams) {
            Log::disable_stream(id);
            break;
        }
    }
}
but that's hideous and I'm not going to use it unless I discover that I have to.
I achieved this with this line of code in main.bro:
Log::remove_filter(Conn::LOG, "default");
I want to write entries to a log file stored in Azure file storage. I currently have this:
var log = "My log entry";
var client = _storageAccount.CreateCloudFileClient();
var share = client.GetShareReference(Config.LogShare);
share.CreateIfNotExists();
var root = share.GetRootDirectoryReference();
var logfile = root.GetFileReference("log.txt");
if (!logfile.Exists()) logfile.Create(0);
// What goes here to append to the file...?
I can see plenty of examples of how to do this with Blobs, or how to upload an entire file, but how do I just append to an existing file?
I have tried this:
var buffer = Encoding.GetEncoding("UTF-8").GetBytes(log.ToCharArray());
using (var fileStream = logfile.OpenWrite(0)) {
    fileStream.Write(buffer, (int)logfile.Properties.Length, buffer.Length);
}
But then I get this error:
The remote server returned an error: (416) The range specified is invalid for the current size of the resource..
I managed to work this out myself. You just need to increase the size of the file by the number of new bytes you want to write to it, and then write the new data to that new empty space at the end of the file, like this:
var client = _storageAccount.CreateCloudFileClient();
var share = client.GetShareReference(Config.LogShare);
share.CreateIfNotExists();
var root = share.GetRootDirectoryReference();
var logfile = root.GetFileReference("log.txt");
if (!logfile.Exists()) logfile.Create(0);
var buffer = Encoding.UTF8.GetBytes($"{log}\r\n");
logfile.Resize(logfile.Properties.Length + buffer.Length);
using (var fileStream = logfile.OpenWrite(null)) {
    fileStream.Seek(buffer.Length * -1, SeekOrigin.End);
    fileStream.Write(buffer, 0, buffer.Length);
}
You can do this with append blobs: https://blogs.msdn.microsoft.com/windowsazurestorage/2015/04/13/introducing-azure-storage-append-blob/
Shame it doesn't work with files too.
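For reference, here is a minimal sketch of the append-blob approach from that link, assuming the WindowsAzure.Storage SDK and an illustrative container name of "logs" (the container name is not from the original code):
using Microsoft.WindowsAzure.Storage.Blob;

var blobClient = _storageAccount.CreateCloudBlobClient();
var container = blobClient.GetContainerReference("logs"); // illustrative container name
container.CreateIfNotExists();

var appendBlob = container.GetAppendBlobReference("log.txt");
if (!appendBlob.Exists())
    appendBlob.CreateOrReplace();

// Append blobs add data to the end without rewriting the rest of the blob.
appendBlob.AppendText($"{log}\r\n");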
Azure file storage REST API doesn't support appending to an existing file. To achieve this, please mount the file share to your machine as a drive, and append to the file just like simple local files.
Actually, I don't think you really need appending functionality per your code above. You can specify the file size in CloudFile.OpenWrite() / CloudFile.Create(), or try CloudFile.UploadFromStream() instead of CloudFile.OpenWrite().
This error could also be due to multi-threaded access.
I bet that if you lock the file before accessing it, you will not face this problem.
There are many ways to update the file.
Since you already managed to get the share, the root directory, and the file, here is a portion of my code that worked for me:
if (!fileLock.IsWriteLockHeld) fileLock.EnterWriteLock();
try
{
    using (var stream = new MemoryStream(content, false))
    {
        file.UploadFromStream(stream, null, options);
    }
}
catch (Exception ex)
{
    File.AppendAllText(FileName, ex.ToString());
}
finally
{
    if (fileLock.IsWriteLockHeld)
        fileLock.ExitWriteLock();
}
Where fileLock is declared as:
protected ReaderWriterLockSlim fileLock = new ReaderWriterLockSlim();
Having said that, I am not saying this is the best way to do it. There are two things I would like you to keep in mind:
1. Lock any resource that is likely to be accessed by more than one thread (that is very common in Azure).
2. Get familiar with the asynchronous methods that Azure provides, and use them where they fit well.
Coming back to your original problem about appending to the existing file: all of the CloudFile methods will overwrite the existing file. Cloud files are not meant for frequent writes; they do impact performance if you keep writing to them frequently, and once you add the locking overhead on top, performance will be horrible.
Cloud files are meant to store a big bulk of data once and for all. If you want to add another bulk, you can create another file.
Keep your data on the client until it reaches some size, then create an algorithm to select the file name and upload it all at once.
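For example, here is a rough sketch of that buffering idea, assuming an in-memory StringBuilder, an arbitrary flush threshold, and a timestamp-based file name (all of these choices are illustrative, not part of the original answer):
using System;
using System.IO;
using System.Text;

private readonly StringBuilder _buffer = new StringBuilder();
private const int FlushThreshold = 1024 * 1024; // arbitrary threshold (~1 MB of characters)

public void AppendLogEntry(string log)
{
    _buffer.AppendLine(log);

    // Only talk to Azure once enough data has accumulated.
    if (_buffer.Length >= FlushThreshold)
        Flush();
}

private void Flush()
{
    var client = _storageAccount.CreateCloudFileClient();
    var share = client.GetShareReference(Config.LogShare);
    share.CreateIfNotExists();
    var root = share.GetRootDirectoryReference();

    // New file per flush, named by timestamp, so nothing is overwritten.
    var logfile = root.GetFileReference($"log-{DateTime.UtcNow:yyyyMMddHHmmss}.txt");

    var bytes = Encoding.UTF8.GetBytes(_buffer.ToString());
    using (var stream = new MemoryStream(bytes, false))
    {
        logfile.UploadFromStream(stream);
    }
    _buffer.Clear();
}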
Hi, I have a specific question regarding scheduling an SSIS import:
I have a data source which sends a scheduled Excel sheet to my email inbox on a daily basis. The expectation is to find a solution that will pull this daily Excel attachment into SSIS and schedule the import into SQL on its own.
Is it possible at all? If anyone could provide some useful links, or point me to where I should start looking, it would be much appreciated.
Thank you
Since there was no answer covering the mail client yet, I am just going to throw this out there.
This will work with Gmail and is configured for it.
First things first, you have to make sure that you enable POP (this will allow the process to read your inbox). It is suggested that you select "enable POP from now on", as that only allows viewing of items from that point forward.
Once you have done that, you need to get the NuGet package for OpenPop.NET.
Now the fun part. Please keep in mind this is not proper coding practice, and you are responsible for adding the necessary security precautions and error handling. This is purely a proof of concept.
using OpenPop.Pop3;
using OpenPop.Mime;
using OpenPop.Mime.Header;
using System.Collections.Generic; // for List<MessagePart>
using System.IO;                  // for FileStream
// create the client to be used
Pop3Client client = new Pop3Client();
// connect to the client via host server port and use ssl bool
client.Connect("pop.gmail.com", 995, true);
// log into the specific account to read
client.Authenticate("username", "password");
// generate count of emails in the inbox
int msgcount = client.GetMessageCount();
//loop thru available message numbers via message count,
//decremented at the end of the while loop
while (msgcount > 0)
{
    // gets the message header info: to, from, subject, etc.
    MessageHeader header = client.GetMessageHeaders(msgcount);
    //read the subject line
    string subject = header.Subject;
    //compare subject to identify the correct email
    if (subject.ToLower() == "subject to match")
    {
        // gets message info based on message number from msgcount
        var message = client.GetMessage(msgcount);
        // creates list of the attachments available in the message
        List<MessagePart> attachments = message.FindAllAttachments();
        //loops thru attachments
        foreach (var file in attachments)
        {
            //assigns filename as string for stream
            string filename = file.FileName;
            //create a stream to download the file
            var stream = new FileStream(@"destination path" + filename, FileMode.Create, FileAccess.ReadWrite);
            // downloads file
            file.Save(stream);
            // closes stream to protect system and hung files
            stream.Close();
        }
        // optional and must be configured to be allowed in your
        // email client.
        client.DeleteMessage(msgcount);
    }
    // decrement message number
    msgcount--;
}
// this is extremely important if deleting or manipulating files in inbox
//the above deletemessage command only marked the message to be deleted.
// you must commit the change to have it take effect.
// this command commits the changes you have made.
// this also closes your client connection so that no connections are left open.
client.Dispose();
This can be added to a Script Task in your import; it downloads the Excel file so you can then import it just as you would have if you had manually pulled the file and placed it on a hard disk or network drive.
I have developed an ETL which consumes flat files. The size of the flat files varies from 250 MB to 300 MB.
It works absolutely fine when the file is fully present in the folder, but it fails when the file is still being generated.
Example: the ETL package runs from 8 AM to 10 AM, checking whether the file is present in the folder. Now, if at some instant (say 9 AM) the file has started being generated and is so far only 10 MB, the ETL starts processing the file, then just hangs and fails after 4-5 minutes (it hangs at the script task which checks whether the file is present in the folder).
What is the best way to trigger the SSIS package only when the file generation is completely done?
Note: I have no control over the file generation.
Add a For Loop Container with a Boolean variable bFileAccessible:
The Init expression is @bFileAccessible = False
The Eval expression is @bFileAccessible == False
Inside the For Loop Container, add a Script Task with a ReadWriteVariable User::bFileAccessible and the following C# script (showing only the Main() method):
public void Main()
{
    try
    {
        using (Stream stream = new FileStream(@"Path\to\your\file", FileMode.Open))
        {
            Dts.Variables["bFileAccessible"].Value = true;
        }
    }
    catch
    {
        Dts.Variables["bFileAccessible"].Value = false;
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}
You should also use a variable for the filename and maybe a little wait interval. For more information about the script see here.
Check the file's modified time on each run and compare it with the previous value.
It's not great logic, but it's a workable idea if there is no better alternative.
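A rough sketch of that idea for a Script Task, assuming a placeholder path and an arbitrary 10-second poll interval (both are just illustrative): the file is treated as complete once its last-write time and size stop changing between two polls.
using System;
using System.IO;
using System.Threading;

static void WaitUntilFileIsStable(string path)
{
    DateTime lastWrite = DateTime.MinValue;
    long lastSize = -1;

    while (true)
    {
        var info = new FileInfo(path);

        // If nothing changed since the previous poll, assume generation is done.
        if (info.LastWriteTimeUtc == lastWrite && info.Length == lastSize)
            break;

        lastWrite = info.LastWriteTimeUtc;
        lastSize = info.Length;
        Thread.Sleep(TimeSpan.FromSeconds(10)); // arbitrary poll interval
    }
}

// Example usage: WaitUntilFileIsStable(@"\\server\share\incoming\data.csv");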
I have a Firebird 1.0 data file weighing approximately 25 GB that I am working with. It has a table which stores documents and their pictures as blobs. So I am asking: is it possible to open such a big data file using FIB datasets? I first tried to open the dataset at runtime, with no success, as the grid was empty; another attempt was to set it active at design time, which also failed: the Active property is set to true, but no data is fetched into the grid.
Do you have any idea how to make it work? Do I have to set any blob cache options?
Or is it not possible at all?
For now I am developing on my laptop (Win 7 x64, 4 GB RAM); later it will be deployed to my server machine.
I've fixed it!
So my other question is about loading blob data into a TImage component using a stream.
I am doing it like this, but it pops up an Access Violation.
Here is my code, which you may look at:
DM->stImage->Active = true;
try {
    TMemoryStream *ms = new TMemoryStream();
    TStream *ps = DM->stImage->CreateBlobStream(DM->stImage->FieldByName("PHOTO"), bmRead);
    ms->Position = 0;
    ms->CopyFrom(ps, ps->Size);
    ms->SaveToFile("c:\\1.jpg");
    // imgPass->Picture->LoadFromStream(ms);
    imgPass->Picture->Graphic->LoadFromStream(ps);
    delete ms;
    delete ps;
}
catch (Exception &e) {
    ShowMessage(e.ToString());
}
It can save the file, but imgPass->Picture->Graphic->LoadFromStream(ps); does not work!
What could be the problem?
To avoid the AV you need to reset the stream position, which was moved forward during the call to the CopyFrom function.
So your code should look like this (only the relevant lines):
ms->CopyFrom(ps,ps->Size);
ms->SaveToFile("c:\\1.jpg");
ps->Position = 0; //<<<<<<<<<< here we reset the stream position
imgPass->Picture->Graphic->LoadFromStream(ps);
//imgPass->Picture->Bitmap->LoadFromStream(ps); // <<< if a bitmap and not JPEG
Hope this helps you.
P.S.: this question should be tagged C++ (or C++Builder) because it is not only a database subject.