ARMv7 Word Patch (CBNZ) - arm

I have an iPhone app that I am disassembling.
My understanding is that CBNZ is "Compare and Branch on Non-Zero" and CBZ is "Compare and Branch on Zero".
I cannot find anything online to confirm this, but it seems to me that CBNZ is represented by B9 in a halfword like "0x B9 DC" and CBZ by B3, as in "0x B3 DC".
The full byte sequence is:
DC B9 53 48 03 99 78 44 00 68 BF F1 74 EE 51 49
I am modifying it to:
DC B3 53 48 03 99 78 44 00 68 BF F1 74 EE 51 49
Previously I had patched this same check in ARMv6, where it was represented by a BNE "0x D1 32" that I patched to a B "0x E0 32".
This:
32 D1 5B 48 5C 49 78 44 79 44 00 68 09 68 AC F1
To:
32 E0 5B 48 5C 49 78 44 79 44 00 68 09 68 AC F1
This behaved exactly as I expected, taking the branch and continuing on as I wanted. Normally the branch is taken only if a check passes.
I figured patching a CBNZ to a CBZ would have similar results, but it seems not.
Hope someone can help me understand. Sorry if this is not a forum where I should post questions like this though it seems like a good place to ask. If you need more info I will be happy to provide.

To understand the assembly, you need to go to bit level. If you don't want to spend time to understand the ARM encoding, get a disassembler (e.g. otool -tV) and an assembler (e.g. as) and they will figure out the instruction encoding/decoding for you.
The encoding of the CBZ/CBNZ instructions is
 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0   <-- bit
  1  0  1  1 op  0  i  1 [    imm5    ] [  Rn  ]   <-- meaning
where op = 1 means CBNZ, op = 0 means CBZ, 'i:imm5:0' is the relative address to jump to, and Rn is the register to check (see ARMv7-ARM §A8.6.27).
Therefore, the word B9DC, in binary,
 (1  0  1  1 op  0  i  1 [    imm5    ] [  Rn  ])
  1  0  1  1  1  0  0  1 [1  1  0  1  1 ] [1  0  0]
means
op = 1
i = 0
imm5 = 11011
Rn = 100
means
CBNZ R4, (PC+54) ; 54 = 0b0110110
while B3DC, in binary,
 (1  0  1  1 op  0  i  1 [    imm5    ] [  Rn  ])
  1  0  1  1  0  0  1  1 [1  1  0  1  1 ] [1  0  0]
means
op = 0
i = 1
imm5 = 11011
Rn = 100
means
CBZ R4, (PC+118) ; 118 = 0b1110110
Note that your patch B9 → B3 changed the i bit as well, which changed the address it should jump to. You should only change the op bit, meaning you should patch the byte as B1.
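The bit-level decoding above can be checked with a few lines of Python. This is only a sketch: the function names decode_cb, flip_op and halfword_from_bytes are made up for illustration, and it handles just the 16-bit Thumb CBZ/CBNZ encoding shown above.

```python
def halfword_from_bytes(two_bytes: bytes) -> int:
    """Thumb halfwords are stored little-endian: bytes DC B9 are the halfword 0xB9DC."""
    return int.from_bytes(two_bytes, "little")


def decode_cb(halfword: int):
    """Decode a CBZ/CBNZ halfword into (mnemonic, Rn, offset added to PC)."""
    # Fixed bits of the encoding: bits 15..12 = 1011, bit 10 = 0, bit 8 = 1.
    if halfword & 0xF500 != 0xB100:
        raise ValueError(f"0x{halfword:04X} is not a CBZ/CBNZ encoding")
    op = (halfword >> 11) & 1      # 1 = CBNZ, 0 = CBZ
    i = (halfword >> 9) & 1        # top bit of the offset
    imm5 = (halfword >> 3) & 0x1F  # middle five bits of the offset
    rn = halfword & 0x7            # register to test
    offset = (i << 6) | (imm5 << 1)  # i:imm5:0
    return ("CBNZ" if op else "CBZ", rn, offset)


def flip_op(halfword: int) -> int:
    """Swap CBZ <-> CBNZ by toggling only bit 11, leaving the offset untouched."""
    return halfword ^ 0x0800


original = halfword_from_bytes(bytes.fromhex("DCB9"))
print(decode_cb(original))                         # ('CBNZ', 4, 54)
print(decode_cb(0xB3DC))                           # ('CBZ', 4, 118)
print(flip_op(original).to_bytes(2, "little").hex(" "))  # dc b1
```

Toggling a single bit with XOR avoids the mistake in the question, where changing the whole nibble from 9 to 3 also set the i bit.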


Unable to boot: Buffer I/O error on /dev/sda4, logical block 0

I am unable to boot into Ubuntu 22.04 and am shown the message "Buffer I/O error on /dev/sda4". sda4 is the HDD partition where my home directory is mounted.
After referring to a few similar questions, I have run some of the suggested commands and attached their output:
$ fsck -a /dev/sda4
fsck from util-linux 2.38.1
/dev/sda4:
The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem. If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
or
e2fsck -b 32768 <device>
Running e2fsck gave me similar errors.
$ mount /dev/sda4 /mnt/temp
mount: /mnt/temp: can't read superblock on /dev/sda4.
dmesg(1) may have more information after failed mount system call.
$ smartctl -a /dev/sda4
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.0.7-301.fc37.x86_64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Mobile HDD
Device Model: ST1000LM035-1RK172
Serial Number: WQ9AWWCF
LU WWN Device Id: 5 000c50 0d4515054
Firmware Version: LFM3
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 2.5 inches
TRIM Command: Available
Device is: In smartctl database 7.3/5319
ATA Version is: ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Feb 12 04:03:09 2023 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x71) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 169) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x3035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 036 033 034 Pre-fail Always In_the_past 165182416
3 Spin_Up_Time 0x0003 100 100 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 094 094 020 Old_age Always - 6158
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 8
7 Seek_Error_Rate 0x000f 069 060 045 Pre-fail Always - 17210113963
9 Power_On_Hours 0x0032 095 095 000 Old_age Always - 5092 (225 1 0)
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 097 097 020 Old_age Always - 3985
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 18151
188 Command_Timeout 0x0032 100 096 000 Old_age Always - 4295491592
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 067 047 040 Old_age Always - 33 (Min/Max 30/39)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 98
192 Power-Off_Retract_Count 0x0032 099 099 000 Old_age Always - 3234
193 Load_Cycle_Count 0x0032 083 083 000 Old_age Always - 34491
194 Temperature_Celsius 0x0022 033 053 000 Old_age Always - 33 (0 24 0 0 0)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 8
197 Current_Pending_Sector 0x0012 099 098 000 Old_age Always - 96
198 Offline_Uncorrectable 0x0010 099 098 000 Old_age Offline - 96
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x000f 096 096 030 Pre-fail Always - 3831 (52 158 0)
254 Free_Fall_Sensor 0x0032 100 100 000 Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 18120 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 18120 occurred at disk power-on lifetime: 5091 hours (212 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 02:36:35.548 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 02:36:35.538 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 e0 00 02:36:35.512 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
ec 00 00 00 00 00 a0 00 02:36:35.511 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 00 02:36:35.498 SET FEATURES [Set transfer mode]
Error 18119 occurred at disk power-on lifetime: 5091 hours (212 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 02:36:33.822 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:36:33.822 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:36:33.821 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 02:36:33.802 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 e0 00 02:36:33.775 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
Error 18118 occurred at disk power-on lifetime: 5091 hours (212 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 02:36:32.075 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 02:36:32.065 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 e0 00 02:36:32.039 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
ec 00 00 00 00 00 a0 00 02:36:32.037 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 00 02:36:32.025 SET FEATURES [Set transfer mode]
Error 18117 occurred at disk power-on lifetime: 5091 hours (212 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 02:36:30.350 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 02:36:30.340 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 e0 00 02:36:30.313 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
ec 00 00 00 00 00 a0 00 02:36:30.312 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 00 02:36:30.300 SET FEATURES [Set transfer mode]
Error 18116 occurred at disk power-on lifetime: 5091 hours (212 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 20 ff ff ff 4f 00 02:36:28.246 READ FPDMA QUEUED
ea 00 00 00 00 00 a0 00 02:36:28.243 FLUSH CACHE EXT
ef 10 02 00 00 00 a0 00 02:35:23.920 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 e0 00 02:35:23.893 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
ec 00 00 00 00 00 a0 00 02:35:23.892 IDENTIFY DEVICE
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 5091 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
$ fdisk -l /dev/sda4
fdisk: cannot open /dev/sda4: Input/output error
$ lsblk
loop0 7:0 0 1.8G 1 loop
loop1 7:1 0 7.6G 1 loop
├─live-rw 253:0 0 7.6G 0 dm /
└─live-base 253:1 0 7.6G 1 dm
loop2 7:2 0 32G 0 loop
└─live-rw 253:0 0 7.6G 0 dm /
sda 8:0 0 931.5G 0 disk
├─sda1 8:1 0 128M 0 part
├─sda2 8:2 0 540.8G 0 part
├─sda3 8:3 0 7.6G 0 part
├─sda4 8:4 0 279.4G 0 part
└─sda5 8:5 0 103.6G 0 part
sdb 8:16 1 28.7G 0 disk
└─sdb1 8:17 1 28.7G 0 part /run/initramfs/live
zram0 252:0 0 7.4G 0 disk [SWAP]
nvme0n1 259:0 0 119.2G 0 disk
├─nvme0n1p1 259:1 0 260M 0 part
├─nvme0n1p2 259:2 0 16M 0 part
├─nvme0n1p3 259:3 0 97.8G 0 part
├─nvme0n1p4 259:4 0 1000M 0 part
└─nvme0n1p5 259:5 0 20.1G 0 part
$ sudo pvs; vgs; lvs
Error reading device /dev/sda4 at 0 length 4096.
WARNING: Running as a non-root user. Functionality may be unavailable.
/run/lock/lvm/P_global:aux: open failed: Permission denied
WARNING: Running as a non-root user. Functionality may be unavailable.
/run/lock/lvm/P_global:aux: open failed: Permission denied
I am able to boot into Windows, whose D: drive lies on the same HDD as the unencrypted sda4 partition. Although Windows warned me about some kind of disk failure affecting the D: drive, I was still able to access and back up all of the files on it safely. I expected to be able to do the same with my ext4 Linux files, but cannot find a way to read them.

How the Number of Expected bytes is calculated in Modbus RTU with Function Code 2

I am trying to find out how the number of expected bytes is calculated for Function Code 2 in Modbus RTU.
I am querying inputs starting at address 0 with a quantity of 71 (0x47), but the response reports the expected byte count as 9.
Below are the query and the response.
query : 33 02 00 00 00 47 3C 2A
resp : 33 02 09 00 08 00 FE FF FF FF FF 03 FA 68
You queried for 71 bits (quantity 0x0047). The response packs them 8 bits per byte, so it needs ceil(71 / 8) = 9 bytes; the unused high bits of the last byte are padding and are ignored.
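As a rough illustration of that arithmetic, here is a small Python sketch. The helper names are invented for this example, and the CRC at the end of the frame is deliberately not verified.

```python
def expected_byte_count(quantity: int) -> int:
    """Number of data bytes needed to carry `quantity` one-bit inputs."""
    return (quantity + 7) // 8  # integer form of ceil(quantity / 8)


def unpack_inputs(frame: bytes, quantity: int):
    """Extract the input states from a Function Code 2 response frame.

    Frame layout: address, function, byte count, data..., CRC low, CRC high.
    The first input is the least significant bit of the first data byte.
    """
    count = frame[2]
    if count != expected_byte_count(quantity):
        raise ValueError("byte count does not match the requested quantity")
    if len(frame) != 3 + count + 2:
        raise ValueError("frame length does not match the byte count")
    data = frame[3:3 + count]
    # Padding bits above `quantity` in the last byte are simply never read.
    return [bool((data[k // 8] >> (k % 8)) & 1) for k in range(quantity)]


response = bytes.fromhex("33 02 09 00 08 00 FE FF FF FF FF 03 FA 68")
print(expected_byte_count(0x47))             # 9
print(len(unpack_inputs(response, 0x47)))    # 71
```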

algorithm for 2D array that represents the topographic map of geographic surface

I'm having difficulty figuring out what to do with the following problem
The input will be a 2D array A[n][n] of numbers, representing the topographic map of the geographic surface. Also among the input will be a starting location (r,c), referring to entry A[r][c].
If you were planning hiking trails you would be bound by the differences in elevation between neighboring entries. A person could traverse between 2 adjacent locations if their elevations differ by no more than 2. Adjacency follows just the 4 standard compass directions (so I assume no diagonals). Therefore, a point on the map is considered reachable if it is traversable from A[r][c] through any sequence of adjacent entries.
Write an algorithm that computes all of the reachable locations. The output will be another 2D array R[n][n] with true/false values. (I assume true means reachable, false means unreachable.)
If I understand this question correctly, I can create the following matrix.
(suppose A[10][10] looks like this from A[0][0]:)
50 51 54 58 60 60 60 63 68 71
48 52 51 59 60 60 63 63 69 70
44 48 52 55 58 61 64 64 66 69
44 46 53 52 57 60 60 61 65 68
42 45 50 54 59 61 63 63 66 70
38 42 46 56 56 63 64 61 64 62
36 40 44 50 58 60 66 65 62 61
36 39 42 49 56 62 67 66 65 60
30 36 40 47 50 64 64 63 62 60
50 50 50 50 50 50 50 50 50 50
Both south and east are traversable from A[0][0] so reachable entries would be:
50 51 54 58 60 60 60 63 68 71
48 52 51 59 60 60 63 63 69 70
44 48 52 55 58 61 64 64 66 69
44 46 53 52 57 60 60 61 65 68
42 45 50 54 59 61 63 63 66 70
38 42 46 56 56 63 64 61 64 62
36 40 44 50 58 60 66 65 62 61
36 39 42 49 56 62 67 66 65 60
30 36 40 47 50 64 64 63 62 60
50 50 50 50 50 50 50 50 50 50
so I can conclude that my resulting array should be
1 1 0 0 0 0 0 1 0 0
1 1 1 0 0 0 1 1 0 0
0 0 1 0 0 0 1 1 1 0
0 0 1 1 0 0 0 0 1 0
0 0 0 1 0 0 0 0 1 0
0 0 0 1 1 0 0 0 1 1
0 0 0 0 1 1 0 0 1 1
0 0 0 0 1 1 0 0 0 1
0 0 0 0 0 1 1 1 1 1
0 0 0 0 0 0 0 0 0 0
I want to implement this in C, but I think it's improper to ask for the code here. My plan is to write it in pseudocode first and then implement it in C, which I will try to do by myself =). I'm not sure where to start with my pseudocode. Can anyone please clarify this?
Thank you very much!
p.s just edited my matrix
Have a look at Dijkstra or A-Star, which are commonly used for path-finding problems like this. It also helps to look at graph theory basics in order to create an appropriate representation of your matrix.
Additionally, you may need the Manhattan distance, which can be used as a heuristic for A-Star in your case.
There are many other algorithms if you dive deeper into the topic of graph theory and search algorithms.
EDIT due to comment:
You can also use Depth First Search (DFS) or Breadth First Search (BFS). These algorithms are simpler to implement, especially when you are starting out.
First you need to create an appropriate data structure that represents the height map. This structure could look like this:
struct Vertex {
    int x;                        // x coordinate
    int y;                        // y coordinate
    int height;                   // elevation at this location
    struct Vertex *neighbors[4];  // adjacent vertices (N, E, S, W); NULL at the map edge
};
After that you can use the following pseudocode as a starting point, taken from "Breadth first search and depth first search". It already accounts for cycles within the graph, which would otherwise lead to infinite loops.
dfs(vertex v) {
    visit(v);
    for each neighbor w of v
        if w is unvisited and reachable   // reachable according to your height differences
        {
            dfs(w);                  // recursive call to the dfs
            add edge vw to tree T    // the tree holds the result; in your
                                     // case, the second matrix
        }
}
Some steps are missing from the pseudocode, for example the condition for abandoning the DFS once the goal has been visited.
Some additional notes:
DFS will find a solution to your problem, but not necessarily the shortest one.
Dijkstra and A-Star will find the shortest (optimal) solution for the problem (the shortest path from start to goal, taking into account the height of your vertices).
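Since the question only asks which cells are reachable, not for a shortest path, a plain flood fill is enough. The following is a minimal sketch in Python using breadth-first search on the example matrix from the question; the function name reachable and the max_diff parameter are illustrative choices, not part of the original problem statement.

```python
from collections import deque

A = [
    [50, 51, 54, 58, 60, 60, 60, 63, 68, 71],
    [48, 52, 51, 59, 60, 60, 63, 63, 69, 70],
    [44, 48, 52, 55, 58, 61, 64, 64, 66, 69],
    [44, 46, 53, 52, 57, 60, 60, 61, 65, 68],
    [42, 45, 50, 54, 59, 61, 63, 63, 66, 70],
    [38, 42, 46, 56, 56, 63, 64, 61, 64, 62],
    [36, 40, 44, 50, 58, 60, 66, 65, 62, 61],
    [36, 39, 42, 49, 56, 62, 67, 66, 65, 60],
    [30, 36, 40, 47, 50, 64, 64, 63, 62, 60],
    [50, 50, 50, 50, 50, 50, 50, 50, 50, 50],
]


def reachable(grid, r, c, max_diff=2):
    """Return a boolean matrix marking every cell reachable from (r, c)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    seen[r][c] = True            # the start is reachable by definition
    queue = deque([(r, c)])
    while queue:
        i, j = queue.popleft()
        # Only the four compass directions count as adjacent.
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if not (0 <= ni < rows and 0 <= nj < cols):
                continue         # off the map
            if seen[ni][nj]:
                continue         # already visited, so cycles cannot loop forever
            if abs(grid[ni][nj] - grid[i][j]) <= max_diff:
                seen[ni][nj] = True
                queue.append((ni, nj))
    return seen


for row in reachable(A, 0, 0):
    print(" ".join("1" if cell else "0" for cell in row))
```

Marking a cell as seen when it is queued, rather than when it is dequeued, keeps each cell in the queue at most once, so the whole fill is linear in the number of cells. An explicit queue also avoids the deep recursion a recursive DFS could hit on a large map.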

Extra bytes between clusters in FAT12

I am currently investigating a disk image with a FAT12 file system for data recovery purposes/researching for file carving. For this investigation, I have the actual files that need to be carved/recovered from the disk image so that I can validate my results obtained from the carving process/recovery.
While comparing and analysing the recovered files, I noticed that after exactly 2 clusters (each of size 16384 bytes/32 sectors) of file data there are 4 extra, embedded bytes. These 4 bytes recur after every 2 clusters, differ each time, and are not found in the corresponding actual files. I think these bytes are used somehow by the file system; is this right? What is their purpose, and how can they be identified during the recovery process?
Hex dump:
Actual File that needs to be recovered from disk (hex between 2 clusters):
Offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
00016336 BC 55 B4 F8 A5 E1 69 82 2A DD 4A 5D DC 46 B9 80 ¼U´ø¥ái‚*ÝJ]ÜF¹€
00016352 E1 33 D3 F9 76 AE 8A 79 2E 22 0F 58 EE 67 FD AD á3Óùv®Šy." Xîgý­
00016368 49 E9 7B 76 45 99 3E 25 69 36 F2 00 8B 71 70 C0 Ié{vE™>%i6ò ‹qpÀ
00016384 FC BB 6D 65 E9 DC F2 30 7E BD 6A B4 BF 17 52 0B ü»meéÜò0~½j´¿ R
00016400 64 9A 2D 13 58 B8 0E FB 13 65 9B 1E 87 93 F9 00 dš- X¸ û e› ‡“ù
00016416 7F 11 55 4F 21 AD A7 3A 51 D7 B9 CF 3C DE 35 25 UO!­§:Q×¹Ï<Þ5%
Disk Image:
Offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
00132880 BC 55 B4 F8 A5 E1 69 82 2A DD 4A 5D DC 46 B9 80 ¼U´ø¥ái‚*ÝJ]ÜF¹€
00132896 E1 33 D3 F9 76 AE 8A 79 2E 22 0F 58 EE 67 FD AD á3Óùv®Šy." Xîgý­
00132912 49 E9 7B 76 45 99 3E 25 69 36 F2 00 8B 71 70 C0 Ié{vE™>%i6ò ‹qpÀ
00132928 **08 B5 A9 88** FC BB 6D 65 E9 DC F2 30 7E BD 6A B4 µ©ˆü»meéÜò0~½j´
00132944 BF 17 52 0B 64 9A 2D 13 58 B8 0E FB 13 65 9B 1E ¿ R dš- X¸ û e›
00132960 87 93 F9 00 7F 11 55 4F 21 AD A7 3A 51 D7 B9 CF ‡“ù UO!­§:Q×¹Ï
00132976 3C DE 35 25 <Þ5%
From the above hex dump it can be seen that 08 B5 A9 88 sits exactly between the two clusters, whereas in the actual file those 4 bytes are absent.
The extra 4 bytes encountered between the two clusters were CRCs embedded by the EnCase disk image file format for integrity checking; they are not part of the FAT12 file system. You can refer to the following link for more detail.
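For carving, those trailers have to be dropped before comparing against the real files. A hedged sketch in Python follows. It assumes a contiguous, uncompressed run of data in which every 32768-byte block (2 clusters, as observed in the question) is followed by 4 checksum bytes, including a short final block. The optional verification further assumes the checksum is Adler-32 stored little-endian, which is how EnCase images are commonly described but is not confirmed by this thread. The function name is made up for the example.

```python
import struct
import zlib

BLOCK_SIZE = 32768   # 2 clusters of 16384 bytes, as observed in the question
TRAILER_SIZE = 4     # the extra bytes after each block


def strip_block_checksums(raw: bytes, block_size: int = BLOCK_SIZE,
                          verify: bool = False) -> bytes:
    """Remove the 4-byte trailer that follows every block of data."""
    out = bytearray()
    pos = 0
    while pos < len(raw):
        remaining = len(raw) - pos
        data_len = min(block_size, remaining - TRAILER_SIZE)
        if data_len <= 0:
            raise ValueError("truncated input: no room for data plus trailer")
        block = raw[pos:pos + data_len]
        trailer = raw[pos + data_len:pos + data_len + TRAILER_SIZE]
        if verify:
            # ASSUMPTION: Adler-32, little-endian. Disable if your image differs.
            (stored,) = struct.unpack("<I", trailer)
            if stored != zlib.adler32(block) & 0xFFFFFFFF:
                raise ValueError(f"checksum mismatch in block starting at {pos}")
        out += block
        pos += data_len + TRAILER_SIZE
    return bytes(out)
```

A real image also contains headers and possibly compressed blocks, so in practice a library that understands the container format is a safer route than stripping bytes by hand.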

Understanding the `ctags -e` file format (ctags for emacs)

I am using "ExuberantCtags", also known as "ctags -e" or just "etags", and I am trying to understand the TAGS file format generated by the etags command. In particular, I want to understand line #2 of the TAGS file.
Wikipedia says that line #2 is described like this:
{src_file},{size_of_tag_definition_data_in_bytes}
In practical terms though TAGS file line:2 for "foo.c" looks like this
foo.c,1683
My quandary is how exactly it arrives at this number: 1683.
I know it is the size of the "tag_definition", so what I want to know is: what is the "tag_definition"?
I have tried looking through the ctags source code, but perhaps someone better at C than me will have more success figuring this out.
Thanks!
EDIT #2:
^L^J
hello.c,79^J
float foo (float x) {^?foo^A3,20^J
float bar () {^?bar^A7,59^J
int main() {^?main^A11,91^J
Alright, so if I understand correctly, "79" refers to the number of bytes in the TAGS file from after 79 down to and including "91^J".
Makes perfect sense.
Now, Wikipedia says the numbers 20, 59 and 91 in this example are the {byte_offset}.
What is the {byte_offset} offset from?
Thanks for all the help Ken!
It's the number of bytes of tag data following the newline after the number.
Edit: It also doesn't include the ^L character between file tag data. Remember etags comes from a time long ago where reading a 500KB file was an expensive operation. ;)
Here's a complete tags file. I'm showing it two ways, the first with control characters as ^X and no invisible characters. The end-of-line characters implicit in your example are ^J here:
^L^J
hello.cc,45^J
int main(^?5,41^J
int foo(^?9,92^J
int bar(^?13,121^J
^L^J
hello.h,15^J
#define X ^?2,1^J
Here's the same file displayed in hex:
0000000 0c 0a 68 65 6c 6c 6f 2e 63 63 2c 34 35 0a 69 6e
ff nl h e l l o . c c , 4 5 nl i n
0000020 74 20 6d 61 69 6e 28 7f 35 2c 34 31 0a 69 6e 74
t sp m a i n ( del 5 , 4 1 nl i n t
0000040 20 66 6f 6f 28 7f 39 2c 39 32 0a 69 6e 74 20 62
sp f o o ( del 9 , 9 2 nl i n t sp b
0000060 61 72 28 7f 31 33 2c 31 32 31 0a 0c 0a 68 65 6c
a r ( del 1 3 , 1 2 1 nl ff nl h e l
0000100 6c 6f 2e 68 2c 31 35 0a 23 64 65 66 69 6e 65 20
l o . h , 1 5 nl # d e f i n e sp
0000120 58 20 7f 32 2c 31 0a
X sp del 2 , 1 nl
There are two sets of tag data in this example: 45 bytes of data for hello.cc and 15 bytes for hello.h.
The hello.cc data starts on the line following "hello.cc,45^J" and runs for 45 bytes--this also happens to be complete lines. The reason why bytes are given is so code reading the file can just allocate room for a 45 byte string and read 45 bytes. The "^L^J" line is after the 45 bytes of tag data. You use this as a marker that there are more files remaining and also to verify that the file is properly formatted.
The hello.h data starts on the line following "hello.h,15^J" and runs for 15 bytes.
The {byte_offset} for a tag entry is the number of bytes from the start of the file the function is defined in. The number before the byte offset is the line number. In your example:
hello.c,79^J
float foo (float x) {^?foo^A3,20^J
the foo function begins 20 bytes from the start of hello.c. You can verify that with a text editor that shows your cursor position in the file. You can also use the Unix tail command to display a file a number of bytes in:
tail -c +20 hello.c
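To make the layout concrete, here is a small Python sketch that walks a TAGS file the way the description above suggests: read the header line, take exactly that many bytes of tag data, then expect either another ^L^J or the end of the file. The function name parse_tags and the tuple layout are invented for this illustration; it is not how etags or Emacs actually implement it.

```python
def parse_tags(data: bytes):
    """Return {source_file: [(pattern_text, tag_name, line_number, byte_offset), ...]}."""
    sections = {}
    pos = 0
    while pos < len(data):
        # Every section starts with a form feed and a newline (^L^J).
        if data[pos:pos + 2] != b"\x0c\n":
            raise ValueError(f"expected ^L^J at byte {pos}")
        pos += 2
        # Header line: "<source file>,<size of tag data in bytes>".
        end = data.index(b"\n", pos)
        filename, _, size_text = data[pos:end].rpartition(b",")
        size = int(size_text)
        pos = end + 1
        # The size counts the bytes after the header's newline.
        block = data[pos:pos + size]
        if len(block) != size:
            raise ValueError(f"{filename!r}: tag data is shorter than {size} bytes")
        pos += size
        entries = []
        for line in block.split(b"\n"):
            if not line:
                continue
            text, _, rest = line.partition(b"\x7f")      # ^? ends the pattern text
            name, sep, where = rest.rpartition(b"\x01")  # ^A ends the optional tag name
            line_no, _, offset = where.partition(b",")
            entries.append((text.decode(), name.decode() if sep else None,
                            int(line_no), int(offset)))
        sections[filename.decode()] = entries
    return sections


example = (b"\x0c\nhello.c,79\n"
           b"float foo (float x) {\x7ffoo\x013,20\n"
           b"float bar () {\x7fbar\x017,59\n"
           b"int main() {\x7fmain\x0111,91\n")
for entry in parse_tags(example)["hello.c"]:
    print(entry)
```

If the size in a header is wrong, the next read does not land on ^L^J and the parser raises, which is the format check mentioned above.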
