I have a file with the last line like this:
[End of File]
Can I read all the lines before it and skip this one without getting an error?
You would be better off creating a script to remove that line.
The only way for that line not to cause issues would be to set the bcp in batch size to 1 (-b1) Depending on how much data you are working with, using a batch size of 1 will take a long time to finish.
Related
I am running a simulation and want to add an option to continue evolving from the last iteration of a previous run. In order to do so, I need to read the last 2 lines of data from a file. Is there any way to do this without using fscanf to scan from the beginning of the file?
Have the previous run record in another file the ftell() values of the last few lines and other info to note the meta data of the file (e.g. date-time-modified).
A subsequent run can use this info to begin where the prior run left off.
If this side file is missing or does not agree with the current state of things, walk the files with fgetc(), fgets(), etc. to find where to begin again.
I have many files that are extracted into .txt with a batch file. But they don't have the headers. I've read that a possible solution from here that is to add to a .txt with the headers the exported rows.
With this:
echo. >> titles.txt
type data.txt >> titles.txt
This takes a lot of time and is not efficient, since it is adding the big file to the file with the text.
Another possible solution is to add to the SQL query the titles hardcoded, but this will change the type of the columns (is they are numeric they will be changed to varchar).
Is there a way to insert in the first row of the data txt the headers and not doing vice-versa?
I might be wrong, but as far as I am informed (and as far as I know from earlier experiments in doing as described): No, it is not possible! The mentioned Tasks are acting on the file sequentially. You can either open a file for reading, writing or appending. If you open the titles.txt file for writing, it is overwritten - and with this empty. If you open it for appending, it can only append to the end of the file - so you can only write the data after the Header... the only way it might work - but which is pretty nasty - is to append the title to the end of the file and during later processing (e.g. xls or whatever) Resort the rows and put the last one to the beginning. But as mentioned: nasty and not really the way to go.
If the number of files to process is a bigger problem than any individual file size, switching from bcp to sqlcmd might help.
I try to do a bulk copy from a file into a table in sqlserver 2008. Its only a part of the file that I want to copy (in the BCP command example bellow it is only line number 3 I will copy).
Sometimes I receive the "Unexpected EOF encountered" error message.
The command is:
BCP tblBulkCopyTest in d:\bc_file.txt -F3 -L3 -c -t, -S(local) -Utest -Ptest
When the file looks like the following, the copy works fine (the line number 3 is inserted into my table):
44261233,207,0,0,2168
44261234,207,0,0,2570
44261235,207,0,0,2572
When the file looks like the following, I receive "Unexpected EOF encountered" error message:
Test
44261233,207,0,0,2168
44261234,207,0,0,2570
44261235,207,0,0,2572
It seems like when the file starts with something not in the correct format, the BCP command fails, even though it is not the line I want to copy (in the bcp command I have specified line number 3).
When the file looks like the following, the copy works fine:
44261233,207,0,0,2168
44261234,207,0,0,2570
44261235,207,0,0,2572
Test
So it is only when the file has some incorrect data "before" the lines I want to copy, that I receive the error.
Any suggestions how the BCP command will ignore the lines, not in question?
The way I solve that kind of errors every day:
1) Create a table with single column tblRawBulk(RawRow varchar(1000)).
2) Insert all of the rows there.
3) Remove unnecessary unformatted rows (e.g. 'test' in your example) with WHERE-parameter. Or even remove all unnecessary rows (except of rows, that you need to load) to simplify point 5.
4) Upload this table with BCP to some workfolder.
5) Download this new file.
It's no exact what you want, but may help if you'll write corresponding stored procedure, that can automatically do all this things.
I am working with a text file, which contains a list of processes under my programs control, along with relevant data.
At some point, one of the processes will finish, and thus will need to be removed from the file (as its no longer under control).
Here is a sample of the file contents (which has enteries added "randomly"):
PID=25729 IDLE=0.200000 BUSY=0.300000 USER=-10.000000
PID=26416 IDLE=0.100000 BUSY=0.800000 USER=-20.000000
PID=26522 IDLE=0.400000 BUSY=0.700000 USER=-30.000000
So for example, if I wanted to remove the line that says PID=26416.... how could I do that, without writing the file over again?
I can use external unix commands, however I am not very familiar with them so please if that is your suggestion, give an example.
Thanks!
Either you keep the contents of the file in temporary memory and then rewrite the file. Or you could have a file for each of the PIDs with the relevant information in them. Then you simply delete the file when it's no longer running. Or you could use a database for this instead.
As others have already pointed out, your only real choice is to rewrite the file.
The obvious way to do that with "external UNIX commands" would be grep -v "PID=26416" (or whatever PID you want to remove, obviously).
Edit: It is probably worth mentioning that if the lines are all the same length (as you've shown here) and order doesn't matter, you could delete a line more efficiently by copying the last line into the space being vacated, then shorten the file so eliminate what had been the last line. This will only work if they really are all the same length though (e.g., if you got a PID of '1', you'd need to pad it to the same length as the others in the file).
The only way is by copying each character that comes after the deleted line down over the characters that are deleted.
It is far more efficient to simply rewrite the file.
how could I do that, without writing the file over again?
You cannot. Filesystems (perhaps besides more esoteric record based ones) does not support insertion or deletion.
So you'll have to write the lines to a temporary file up till the line you want to delete, skip over that line, and write the rest of the lines to the file. When done, rename/copy the temp file to the original filename
Why are you maintaining these in a text file? That's not the best model for such a task. But, if you're stuck with it ... if these lines are guaranteed to all be the same length (it appears that way from the sample), and if the order of the lines in the file doesn't matter, then you can write the last line over the line for the process that has died and then shorten the file by one line with the (f)truncate() call if you're on a POSIX system: see Jonathan Leffler's answer in How to truncate a file in C?
But note carefully netrom's answer, which gives three different better ways to maintain this info.
Also, if you stick with a text file (preferably written from scratch each time from data structures you maintain, as per netrom's first suggestion), and you want to be sure that the file is always well formed, then write the new data into a temp file on the same device (putting it in the same directory is easiest) and then do a rename() call, which is an atomic operation.
You can use sed:
sed -i.bak -e '/PID=26416/d' test
-i is for editing in place. It also creates a back-up file with the new extension .bak
-e is for specifying the pattern. The /d indicates all lines matching the pattern should be deleted.
test is the filename
The unix command for it is:
grep -v "PID=26416" myfile > myfile.tmp
mv myfile.tmp myfile
The grep -v part outputs the file without the rows with the search term.
The > myfile.tmp part creates a new temp file for this output.
The mv part renames the temp file to the original file.
Note that we are rewriting the file here, and moreover, we can lose data if someone write something to file between the two commands.
I've got a service which runs all the time and also keeps a log file. It basically adds new lines to the log file every few seconds. I'm written a small file which reads these lines and then parses them to various actions. The question I have is how can I delete the lines which I have already parsed from the log file without disrupting the writing of the log file by the service?
Usually when I need to delete a line in a file then I open the original one and a temporary one and then I just write all the lines to the temp file except the original which I want to delete. Obviously this method will not word here.
So how do I go about deleting them ?
In most commonly used file systems you can't delete a line from the beginning of a file without rewriting the entire file. I'd suggest instead of one large file, use lots of small files and rotate them for example once per day. The old files are deleted when you no longer need them.
Can't be done, unfortunately, without rewriting the file, either in-place or as a separate file.
One thing you may want to look at is to maintain a pointer in another file, specifying the position of the first unprocessed line.
Then your process simply opens the file and seeks to that location, processes some lines, then updates the pointer.
You'll still need to roll over the files at some point lest they continue to grow forever.
I'm not sure, but I'm thinking in this way:
New Line is a char, so you must delete chars for that line + New Line char
By the way, "moving" all characters back (to overwrite the old line), is like copying each character in a different position, and removing them from their old position
So no, I don't think you can just delete a line, you should rewrite all the file.
You can't, that just isn't how files work.
It sounds like you need some sort of message logging service / library that your program could connect to in order to log messages, which could then hide the underlying details of file opening / closing etc.
If each log line has a unique identifier (or even just line number), you could simply store in your log-parsing the identifier until which you got parsing. That way you don't have to change anything in the log file.
If the log file then starts to get too big, you could switch to a new one each day (for example).