ASSEMBLY 8086 remove specific line from file - file

I am new to assembly.
I am trying to remove a specific line from a text file.
For example: remove the third line of the file.
I tried a lot of things but I didn't manage to do it.
Can someone please help me?
Thanks,
Yam

A file is a stream of bytes. What counts as a "text file" depends on how you define the task and on the encoding used.
If this is a school project (emu8086 makes me think so), then you are probably dealing with a simple raw ASCII text file, i.e. one byte = one character. In that case the text probably uses DOS line endings (<EOL> = "end of line"), which are two bytes: 13, 10. If you are skilled, you can also support the Unix single byte 10, the old Mac single byte 13, and even mistaken [10, 13] pairs.
So to remove the third line you need to open the source file, open the target file for writing, and copy every byte up to and including the second <EOL> (or finish when <EOF> = "end of file" is detected in the source file before the third line starts). Then you keep reading the source file, without writing anything, until you have passed the third <EOL> sequence of bytes, and finally you copy the remaining bytes from the source to the target file.
For example, imagine a source file of 5 empty lines. Viewed in a hex viewer, it contains these bytes:
0D 0A 0D 0A 0D 0A 0D 0A 0D 0A
;           ^^^^^ this is the third line (here only its <EOL>), which will be removed
After removing the third line, the new file will contain these bytes:
0D 0A 0D 0A 0D 0A 0D 0A
= only 4 empty lines.
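The copy/skip/copy idea above can be sketched in a few lines of Python before porting it to 8086 assembly. This is only an illustration under stated assumptions: the whole file is held in memory instead of being streamed through DOS file handles, only DOS line endings (0D 0A) are recognised, and the name remove_line is made up for this sketch.

```python
def remove_line(data: bytes, line_no: int, eol: bytes = b"\r\n") -> bytes:
    """Return data without line number line_no (1-based), EOL included."""
    if line_no < 1:
        raise ValueError("line_no must be 1 or greater")
    start = 0
    # Step 1: walk past the first (line_no - 1) EOL sequences; these bytes are kept.
    for _ in range(line_no - 1):
        pos = data.find(eol, start)
        if pos < 0:
            return data  # EOF reached before the target line: nothing to remove
        start = pos + len(eol)
    # Step 2: skip the target line up to and including its own EOL.
    end = data.find(eol, start)
    if end < 0:
        return data[:start]  # target line is the last one and has no EOL
    # Step 3: keep everything after the removed line.
    return data[:start] + data[end + len(eol):]


if __name__ == "__main__":
    five_empty_lines = b"\r\n" * 5
    print(remove_line(five_empty_lines, 3).hex(" ").upper())
```

In assembly the same three steps become three loops over a read buffer: copy while counting EOLs, read without writing until the next EOL, then copy until EOF.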

Related

If the "del" command is executed from a batch file it won't delete files whose names contain odd characters. But it will work from a CMD shell [closed]

Problem:
The "del" command won't delete certain files when it is executed from a batch file but it will delete those files if it is executed from the CMD shell command line interpreter. The problem appears to be related to the fact that the filenames contain an ellipsis character.
I am not new to batch files; I have been writing and using batch files since 1988 with MSDOS 3.2, through Windows 7. Now I am using Windows 10 Pro 21H1.
What follows is a description of the problem.
A directory list dir shows that the folder contains the following five files:
Volume in drive E has no label.
Volume Serial Number is 3D74-5A3F
Directory of E:\Backups\Magpie\2016-12-14-P\mike\Correspondence\Friends\Berg\In
10/21/2021 03:27 PM <DIR> .
10/21/2021 03:27 PM <DIR> ..
08/16/2021 03:15 AM 40,888 20160114-Fwd_ [New post] Biologist's comments to save White Mountain & Little Colorado wild horses (Today is the last day YOU can comment)-29324528.eml
08/16/2021 03:15 AM 21,095 20160229-More.-13257179.eml
08/16/2021 03:15 AM 47,526 20160229-Re_More…-13334902.eml
08/16/2021 03:15 AM 11,256,759 20160819-(Duplicate) Whew. Re_ Good thurs am God bless your first day back to work-10.eml
10/21/2021 02:48 PM 182 DeleteDuplicatesDC.bat
10/20/2021 06:19 PM <DIR> Photos
10/21/2021 02:00 PM <DIR> Temp
6 File(s) 11,366,450 bytes
4 Dir(s) 199,814,967,296 bytes free
I want to delete the third file whose name is 20160229-Re_More…-13334902.eml
Note the three dots following the word More in the filename. It is a single ellipsis character. It is NOT three separate dots.
The batch file was generated by a macro I wrote to assemble instructions to delete hundreds or thousands of individual files that meet certain criteria.
My batch file has the following commands:
Echo 271
CD "E:\Backups\Magpie\2016-12-14-P\mike\Correspondence\Friends\Berg\In\"
del "20160229-Re_More…-13334902.eml"
Echo 271
The Echo commands simply write a number to the screen to show the progress of the batch file.
The batch file contains hundreds of similar lines which delete hundreds of files in various directories.
When I ran the batch command it deleted most of the files it was supposed to delete but it would not delete the file:
20160229-Re_More…-13334902.eml
In order to check if the commands were valid and correct I ran them one at a time from a command line using the following steps.
In a CMD shell CLI (command-line interpreter) I copy-pasted the command
CD "E:\Backups\Magpie\2016-12-14-P\mike\Correspondence\Friends\Berg\In\"
from the batch file into a CMD command line shell and executed it. It took me to the correct directory which was
E:\Backups\Magpie\2016-12-14-P\mike\Correspondence\Friends\Berg\In\.
I then Copy-Pasted the following command from batch file into the CMD shell to delete the file:
del "20160229-Re_More…-13334902.eml"
and executed it. The del command deleted the file successfully (as it was supposed to do). The file also disappeared from a File Explorer window that I had open to monitor progress.
This test showed that the two commands did what they were supposed to do, which was to move to a certain directory (folder) and delete a certain file.
But the del command did NOT delete the file with the ellipsis character when I executed it from the batch file.
I don't understand why these commands cd and del work when I execute them individually from a CMD shell CLI but the del command did not delete the file I wanted to delete when I ran it in a batch file?
Does anyone have the answer to this problem?
Thank you
Michael
The file deletion does not work because the macro creates the batch file with a character encoding that differs from the one used by the Windows command processor cmd.exe when it processes the batch file.
In general, two character encodings are used on Windows that need just one byte per character and can therefore encode only 256 characters. Such single-byte encodings rely on code pages, which define which code value (binary byte value) represents which character. The code page that Windows uses by default for a single-byte encoding depends on:
The country, region and language set for the account in use. It makes a difference whether Germany, Russia, Brazil or China is configured for an account.
The execution environment in which the byte stream representing a text is interpreted. Windows GUI applications like text editors use by default the so-called ANSI code page for the configured country, while the Windows command processor cmd.exe uses an OEM code page for the configured country.
The ANSI code page is Windows-1252 for a North American and a Western European country. The ellipsis character is encoded with decimal code value 133 (hexadecimal 85) with this code page.
The OEM code page is 437 for a North American country and 850 for a Western European country. The two code pages do not contain the ellipsis character at all. The decimal code value 133 (hexadecimal 85) represents in those two code pages the character à.
So if a batch file is created in a text editor (or using a macro) which uses a single byte per character encoding with the code page Windows-1252, the command line del "20160229-Re_More…-13334902.eml" results in the byte stream:
64 65 6C 20 22 32 30 31 36 30 32 32 39 2D 52 65
5F 4D 6F 72 65 85 2D 31 33 33 33 34 39 30 32 2E
65 6D 6C 22
This byte stream is interpreted using code page 437 or 850 as:
del "20160229-Re_Moreà-13334902.eml"
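The mismatch can be reproduced outside of cmd.exe. The following Python sketch is only an illustration and not part of the batch solution: it encodes the command line with Windows-1252, as the macro is assumed to do, and decodes the same bytes with an OEM code page, as cmd.exe is assumed to do. The function name as_seen_by_console is made up for this example.

```python
# "\u2026" is the single ellipsis character from the file name.
COMMAND = 'del "20160229-Re_More\u2026-13334902.eml"'


def as_seen_by_console(text: str, file_codec: str = "cp1252",
                       console_codec: str = "cp437") -> str:
    """Encode text as the batch file stores it, then decode it as the console reads it."""
    raw = text.encode(file_codec)
    return raw.decode(console_codec)


if __name__ == "__main__":
    print(COMMAND.encode("cp1252").hex(" ").upper())  # bytes stored in the batch file
    print(as_seen_by_console(COMMAND, console_codec="cp437"))
    print(as_seen_by_console(COMMAND, console_codec="cp850"))
```

Both OEM code pages turn the byte 85 into à, so del looks for a file name that does not exist.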
The Unicode encoding UTF-16 Little Endian uses two bytes per character for characters of the Basic Multilingual Plane (and four bytes for characters of Supplementary Planes). UTF-16 LE with byte order mark (BOM) is used by WMIC for every output.
The command line del "20160229-Re_More…-13334902.eml" as a byte stream with UTF-16 LE encoding with BOM (FF FE as the first two bytes) would be:
FF FE 64 00 65 00 6C 00 20 00 22 00 32 00 30 00
31 00 36 00 30 00 32 00 32 00 39 00 2D 00 52 00
65 00 5F 00 4D 00 6F 00 72 00 65 00 26 20 2D 00
31 00 33 00 33 00 33 00 34 00 39 00 30 00 32 00
2E 00 65 00 6D 00 6C 00 22 00
The ellipsis character is encoded in this case with two bytes with the hexadecimal values 26 20 and all other characters also with two bytes whereby the second byte has the value 00.
But the Windows command processor cmd.exe does not support UTF-16 LE on processing a batch file. So it is of no help to save the batch file with this Unicode encoding.
Another Unicode encoding is UTF-8 which uses a variable number of bytes per character depending on the character. The command line del "20160229-Re_More…-13334902.eml" is encoded with UTF-8 without BOM with the byte stream:
64 65 6C 20 22 32 30 31 36 30 32 32 39 2D 52 65
5F 4D 6F 72 65 E2 80 A6 2D 31 33 33 33 34 39 30
32 2E 65 6D 6C 22
The ellipsis character is encoded in this case with three bytes with the hexadecimal values E2 80 A6 while all other characters are encoded with just one byte.
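The byte values quoted above for the ellipsis character can be checked with a short Python sketch. It is only a verification aid under the assumption that Python's codec names cp437, cp850, cp1252, utf-8 and utf-16-le correspond to the encodings discussed here; hex_dump and encodable are names invented for this example.

```python
ELLIPSIS = "\u2026"  # the single ellipsis character


def hex_dump(text: str, codec: str, bom: bytes = b"") -> str:
    """Return the encoded bytes of text as an upper-case hex string."""
    return (bom + text.encode(codec)).hex(" ").upper()


def encodable(text: str, codec: str) -> bool:
    """Report whether every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for codec in ("cp1252", "utf-8", "utf-16-le"):
        print(codec, hex_dump(ELLIPSIS, codec))
    for codec in ("cp437", "cp850"):
        print(codec, "contains the ellipsis:", encodable(ELLIPSIS, codec))
```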
So what can be done to delete the file named 20160229-Re_More…-13334902.eml from a batch file?
One option is to open a command prompt window and run the command chcp, which displays the code page used by cmd.exe according to the country configured for the account in use. The batch file should be written using the same code page. But that does not help if the OEM code page is 437 or 850, as … is not available at all in those code pages.
A working solution is to encode the batch file with Windows-1252 and use the following two command lines:
%SystemRoot%\System32\chcp.com 1252
del "20160229-Re_More…-13334902.eml"
The first line changes the code page to Windows-1252. The Windows command processor therefore interprets the next command line with this code page, and the file deletion works, although the console window nevertheless displays del "20160229-Re_Moreà-13334902.eml".
Another solution is to encode the batch file with UTF-8 and use the following command lines:
%SystemRoot%\System32\chcp.com 65001
del "20160229-Re_More…-13334902.eml"
The value 65001 is the code page number which Microsoft defined for UTF-8 encoding. It is not really a code page number as UTF-8 is a Unicode encoding and not a code page.
The deletion of the file works, although the following may be displayed on execution:
delThe system cannot write to the specified device.
The reason for the strange error message output after the command del, instead of the space and the file name in double quotes, is described in my answer to "Using another language (code page) in a batch file made for others" and in the comments below that answer written by Eryk Sun.
One more solution is to use the wildcard character ? in place of the non-ASCII character in the file name.
del "20160229-Re_More?-13334902.eml"
The disadvantage of this solution is that other files matched by the wildcard pattern, like 20160229-Re_More1-13334902.eml, could be deleted by chance too, and not only the file with the ellipsis character in its name.
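The risk of the wildcard approach can be illustrated with Python's fnmatch module. This is only an approximation: fnmatch is not the matcher used by cmd.exe, which is case-insensitive and has additional legacy rules, and the file list below is hypothetical.

```python
from fnmatch import fnmatchcase

PATTERN = "20160229-Re_More?-13334902.eml"


def matches(names, pattern=PATTERN):
    """Return the names that the single-character wildcard pattern would select."""
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    # Hypothetical directory content, used only to show the collision.
    candidates = [
        "20160229-Re_More\u2026-13334902.eml",  # the intended file
        "20160229-Re_More1-13334902.eml",       # an unintended match
        "20160229-More.-13257179.eml",          # not matched
    ]
    for name in matches(candidates):
        print(name)
```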

NTFS MFT datarun

I am trying to parse a Data Run in an MFT Record and I'm comparing my results to Active Disk Editor. The data run is as follows:
.... 42 0F 01 FD 83 90 D9 0C (second attribute starts here)
If I understand correctly: this is how it should be parsed:
number of bytes to parse the cluster count: 2
number of bytes to parse cluster location: 4
Parse cluster count: 0F 01 (in little endian) => 271
Parse first cluster location: 0xD99083FD => 3,650,126,845
Expecting a 00 instead of 0C to mark the end of the data run list
However, in Active Disk Editor:
the cluster location is 9,470,973, which is 0x9083FD (the D9 is ignored). It turns out that this location is the correct one.
If I try to change the number of bytes representing the cluster location (the 4 in 42), here is what happens:
If I change it to 4 or 5, the cluster location remains the same (9470973)
If I change it to 3, the cluster location becomes negative
No value change on D9 0C seems to affect the outcome
Can anyone let me know what I'm doing wrong?
There is a small problem in your comment:
overwrites the last two sectors in each used sector
"sectors" should be "bytes".
This is a common stumbling block for people new to NTFS.
All records (index, file record, RCRD) must be read only after the update sequence fixups have been applied.
After some additional research, I accidentally read about NTFS fixups. For those that might encounter the same issue in the future, the idea is as follows:
Update Sequence Number (USN) is a 2-byte entity that overwrites the last two bytes in each used sector. It is done for verification purposes.
Update Sequence Array (USA) holds the original two bytes that were overwritten at the end of each sector.
Reading the structure without accounting for USN and USA is problematic. It can mess up file names, data runs, etc. I encountered this info on:
https://www.taksati.org/ntfs-fix-ups/
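Applying the fixups can be sketched as follows. This is a minimal illustration, assuming the commonly documented record header layout (offset of the update sequence at byte 4, number of 2-byte entries at byte 6, both little endian, with the USN stored first and the saved original bytes after it) and 512-byte sectors. The name apply_fixups is made up for this sketch.

```python
import struct


def apply_fixups(record: bytes, sector_size: int = 512) -> bytes:
    """Restore the original last two bytes of every sector in an NTFS record."""
    buf = bytearray(record)
    # Assumed header layout: u16 offset of the update sequence, u16 entry count.
    usa_offset, usa_count = struct.unpack_from("<HH", buf, 4)
    usn = bytes(buf[usa_offset:usa_offset + 2])
    # Entry 0 is the USN itself; entries 1..count-1 are the saved original bytes.
    for i in range(1, usa_count):
        end = i * sector_size
        if end > len(buf):
            break
        if buf[end - 2:end] != usn:
            raise ValueError(f"sector {i} does not end with the USN; record may be damaged")
        saved = usa_offset + 2 * i
        buf[end - 2:end] = buf[saved:saved + 2]
    return bytes(buf)
```

Only after this step do the bytes at the end of each sector (such as the D9 0C in the data run above) show their real values.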
Long story short, once I accounted for the fixups, the first cluster location became:
0x009083FD
because the data run list became: 42 0F 01 FD 83 90 00 00.
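For reference, parsing the corrected run list can be sketched like this. It assumes the usual data run layout, which matches the reading of 42 in the question: the low nibble of the header byte is the size of the length field, the high nibble is the size of the offset field, the offset is a signed little-endian value relative to the previous run, and a 00 header ends the list. The function name parse_data_runs is hypothetical.

```python
def parse_data_runs(raw: bytes):
    """Return a list of (cluster_count, first_cluster) tuples; sparse runs get None."""
    runs = []
    pos = 0
    lcn = 0  # offsets are relative to the previous run's first cluster
    while pos < len(raw) and raw[pos] != 0:
        header = raw[pos]
        len_size = header & 0x0F   # low nibble: bytes used for the cluster count
        off_size = header >> 4     # high nibble: bytes used for the cluster offset
        pos += 1
        count = int.from_bytes(raw[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            runs.append((count, None))  # sparse run: no clusters allocated
        else:
            delta = int.from_bytes(raw[pos:pos + off_size], "little", signed=True)
            lcn += delta
            runs.append((count, lcn))
        pos += off_size
    return runs


if __name__ == "__main__":
    fixed = bytes.fromhex("42 0F 01 FD 83 90 00 00")
    print(parse_data_runs(fixed))
```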

C convert char array (string) UTF8 format to CP1252 (ASCII) format

I have two C source files:
First file is saved in UTF-8 format
Second file is saved in CP1252 format.
My example message is: char mybuffer[] = "lé\r\n";
In the UTF-8 source file, the string has been encoded using 5 bytes:
6C C3 A9 0D 0A
In the CP1252 source file, the string has been encoded using 4 bytes:
6C E9 0D 0A
I know that both results are correct, because each one displays properly when the output is read with the matching encoding.
But I need to convert the UTF-8 array variable into CP1252 format.
I use only C language.
If the files are source code files then you must tell the compiler what the source "charset" is for each. You are possibly doing this by default. If the two files have different encodings, you would probably have to run the compiler separately on each one. If the difference is showing up then you are doing that wrong.
Alternatively, convert source file encodings so they are all the same encoding, to make your project simpler. But, this doesn't get around the requirement to tell the compiler the correct encoding(s). This applies to opening any text file with any program or communicating it to someone else.
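If the goal is to convert the bytes at run time rather than to fix the compiler settings, the conversion itself is two steps: decode the UTF-8 bytes to code points, then encode those code points with CP1252. The sketch below shows the steps in Python for brevity, using the example bytes from the question. It is not C, and in a C program the same two steps would need a conversion library or a hand-written mapping table. The name utf8_to_cp1252 is made up for this example.

```python
def utf8_to_cp1252(data: bytes, errors: str = "strict") -> bytes:
    """Decode UTF-8 bytes and re-encode them as Windows-1252 bytes."""
    text = data.decode("utf-8")           # step 1: bytes -> Unicode code points
    return text.encode("cp1252", errors)  # step 2: code points -> CP1252 bytes


if __name__ == "__main__":
    utf8_bytes = bytes.fromhex("6C C3 A9 0D 0A")  # "l\u00e9\r\n" stored as UTF-8
    print(utf8_to_cp1252(utf8_bytes).hex(" ").upper())
```

Characters that do not exist in CP1252 raise an error with the default "strict" setting; passing "replace" substitutes a question mark instead.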

Why are Danish characters not displayed as in text editor on executing batch file?

I made a simple batch file, but the Windows command processor cmd.exe does not display Danish characters correctly when I execute the batch file. It shows weird characters like ├ª├©├Ñ instead of æøå. If I type echo æøå directly in the cmd window, it shows æøå.
Is there something wrong with my computer?
Use chcp to manage your code page.
Like Mofi said, specifying the following would help your case:
chcp 1252
Use this line of code before you print echo æøå.
Everything on a computer is stored with a sequence of zeros and ones including characters. Which sequence of zeros and ones is displayed as æøå depends on rules.
The first rule is that a file with the file extension bat or cmd contains text data interpreted by Windows command processor cmd.exe while a file with extension png contains image data according to PNG specification interpreted by image viewers/editors and so on.
The second rule is that a batch file contains text data being encoded with one byte (= 8 bits) per character and not two bytes as UTF-16 text encoding uses (for the mainly used characters, four bytes for rarely used symbols) or one to four bytes as UTF-8 text encoding uses (since November 2003).
The problem with one byte per character is that just 2^8 = 256 characters can be encoded, but humans use many more characters than that.
The solution is using a code page. A code page defines which character is represented for example by a byte with the value
decimal: 248
hexadecimal: F8
binary: 1111 1000
The command CHCP (change code page), executed in a console window without any parameter, outputs the code page that the Windows command processor uses to interpret bytes as characters and to output them.
The code page depends on Windows Region and Language settings set for the user account used for running a batch file in a console window.
The default code page on console is OEM 850 for Western European countries and OEM 865 for Nordic languages like Danish except Icelandic which uses OEM 861.
But the default code page for non Unicode encoded text files is Windows-1252 in GUI applications for Western European countries including Denmark.
How can the line echo æøå be encoded in a *.bat file?
Using code page Windows-1252 and one byte per character.
hexadecimal: 65 63 68 6F 20 E6 F8 E5
Using code page OEM 865 or OEM 850 and one byte per character.
hexadecimal: 65 63 68 6F 20 91 9B 86
Using UTF-8 encoding without byte order mark (BOM) with one or two bytes per character.
hexadecimal: 65 63 68 6F 20 C3 A6 C3 B8 C3 A5
Using UTF-16 little endian encoding with byte order mark (BOM) with two bytes per character.
hexadecimal: FF FE 65 00 63 00 68 00 6F 00 20 00 E6 00 F8 00 E5 00
And many others.
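The byte sequences listed above can be reproduced with a short Python sketch. It is only a checking aid, assuming that Python's codec names cp1252, cp850, cp865, utf-8 and utf-16-le correspond to the encodings named in the list; encoded_forms is a name invented for this example.

```python
LINE = "echo \u00e6\u00f8\u00e5"  # the command line: echo followed by ae, o-slash, a-ring


def encoded_forms(text: str) -> dict:
    """Return the bytes of text for each encoding discussed above."""
    return {
        "cp1252": text.encode("cp1252"),
        "cp850": text.encode("cp850"),
        "cp865": text.encode("cp865"),
        "utf-8": text.encode("utf-8"),
        # The BOM is prepended by hand so the result does not depend on the platform.
        "utf-16-le+bom": b"\xff\xfe" + text.encode("utf-16-le"),
    }


if __name__ == "__main__":
    for name, raw in encoded_forms(LINE).items():
        print(f"{name:14} {raw.hex(' ').upper()}")
```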
Output of ├ª├©├Ñ on running the batch file indicates that the batch file is UTF-8 encoded, as those six characters, interpreted with OEM 850, have the code values C3 A6 C3 B8 C3 A5.
So the batch file first needs to be converted from Unicode with UTF-8 encoding to ANSI. The term ANSI is used here, although Windows-1252 is not a standard defined by ANSI (American National Standards Institute), because on Windows the term ANSI is used for the one byte per character encoding. The result is a batch file with E6 F8 E5 for the three Danish characters.
The Windows-1252 encoded batch file displays on execution µ°Õ.
So the batch file needs to be converted a second time from ANSI to OEM, i.e. from Windows-1252 to OEM 865 or OEM 850. The three Danish characters are now encoded in the text file with 91 9B 86, but are displayed as ‘›† in a graphical user interface application (GUI text editor) that uses code page Windows-1252.
However, now the batch file prints on execution æøå into the console window on my computer using code page 850 for console because of German configured in Windows Region and Language settings.
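The conversion chain and the intermediate displays described above can be traced with the sketch below. It assumes a console using OEM 850, as on the answerer's machine, and a GUI editor using Windows-1252; reencode and shown_as are hypothetical helper names.

```python
def reencode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from one single- or multi-byte encoding to another."""
    return data.decode(src).encode(dst)


def shown_as(data: bytes, codec: str) -> str:
    """Return the text a program displays when it reads data with the given codec."""
    return data.decode(codec)


if __name__ == "__main__":
    utf8 = bytes.fromhex("C3 A6 C3 B8 C3 A5")   # the three Danish letters in UTF-8
    print(shown_as(utf8, "cp850"))               # what the console shows for the UTF-8 file
    ansi = reencode(utf8, "utf-8", "cp1252")     # first conversion: UTF-8 -> ANSI
    print(shown_as(ansi, "cp850"))               # console output for the ANSI file
    oem = reencode(ansi, "cp1252", "cp850")      # second conversion: ANSI -> OEM
    print(shown_as(oem, "cp1252"))               # what a GUI editor shows for the OEM file
    print(shown_as(oem, "cp850"))                # console output for the OEM file
```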
Another solution is encoding the batch file in Windows-1252 and use in batch file the following command line before the text is output with ECHO:
%SystemRoot%\System32\chcp.com 1252 >nul
But this solution does not work if the font selected in the properties of the console window does not support Windows-1252. For example, if Raster Fonts is selected on the Font tab of the console window's Properties dialog and Windows (7, Vista, XP) has chosen Terminal as the raster font for the console, changing the code page to 1252 has no effect: the font still displays µ°Õ when the Windows-1252 encoded echo æøå is executed, although the active code page is 1252. In other words, the font selected for the console window must also support the active code page for the output text to be displayed correctly.
The Microsoft developers are aware of the issues caused by not really supporting Unicode and are working on improvements of the Windows console, see the developer blog Windows Command-Line: Unicode and UTF-8 Output Text Buffer written by Rich Turner on December 10, 2018.

How to treat the first line of a file differently in COBOL?

In COBOL I want to read a line sequential file. The first line occurs one time. The second and the third line can be repeated multiple (unknown) times. I really don't know how to do it.
I think the file description is something like this:
01 DBGEGEVENS PIC X(200).
01 PROJECT. (occurs unknown times)
03 PROJECTCODE PIC X(10).
03 CSVPAD PIC X(200).
It depends on the file format.
If you want a VB (variable-length record) file format, then:
FILE-CONTROL.
SELECT In-File ASSIGN .....
DATA DIVISION.
FILE SECTION.
FD In-File.
01 DBGEGEVENS PIC X(200).
01 PROJECT.
03 PROJECTCODE PIC X(10).
03 CSVPAD PIC X(200).
with
Read In-File
Read In-File
Read In-File
You would use DBGEGEVENS for the first record and PROJECT for the second and subsequent records.
For a fixed-width file format:
FILE-CONTROL.
SELECT In-File ASSIGN .....
DATA DIVISION.
FILE SECTION.
FD In-File.
01 input-record PIC X(210).
WORKING-STORAGE SECTION.
01 DBGEGEVENS PIC X(200).
01 PROJECT.
03 PROJECTCODE PIC X(10).
03 CSVPAD PIC X(200).
with
Read In-File into DBGEGEVENS
Read In-File into PROJECT.
Read In-File into PROJECT.
Either should work, depending on which file format you use
The code given indicates a VB file - record one is 200 bytes, while the other records are 210 bytes. There should be an indicator on the records that describes what they are and their purpose. Ultimately, you'd be best served by reading them into WORKING-STORAGE - and I'd ask whomever is passing you the file what indicators are available. If, however, you know for a fact that record one is the only 200 byte record in the file, that would be treated as a header read - read once into its definition - while the remaining 210 byte records (and I want to emphasize the definition provided describes 210 bytes) would be read into a WORKING-STORAGE area fitting their definition.
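The control flow both answers describe (read the first record once into the header layout, then read every remaining record into the project layout) can be sketched in Python. This is not COBOL and is meant only to show the shape of the loop; the field widths of 10 and 200 come from the record description in the question, while the function name and the sample data are made up.

```python
import io


def read_header_and_projects(stream):
    """Read one header line, then split each following line into (code, csv_path)."""
    header = stream.readline().rstrip("\r\n")      # first record: DBGEGEVENS
    projects = []
    for line in stream:                            # remaining records: PROJECT
        line = line.rstrip("\r\n")
        if not line:
            continue                               # ignore blank lines
        project_code = line[:10].rstrip()          # PROJECTCODE PIC X(10)
        csv_path = line[10:210].rstrip()           # CSVPAD PIC X(200)
        projects.append((project_code, csv_path))
    return header, projects


if __name__ == "__main__":
    # Hypothetical file content used only for demonstration.
    sample = io.StringIO("HEADER DATA\nPROJ000001data/a.csv\nPROJ000002data/b.csv\n")
    print(read_header_and_projects(sample))
```

In COBOL the same shape is one READ before the loop followed by a PERFORM UNTIL end-of-file around the second READ.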
