Understanding the `ctags -e` file format (ctags for emacs) - c

I am using Exuberant Ctags in its etags mode ("ctags -e", which behaves like "etags"),
and I am trying to understand the TAGS file format it generates. In particular, I want to understand line 2 of the TAGS file.
Wikipedia says that line #2 is described like this:
{src_file},{size_of_tag_definition_data_in_bytes}
In practice, though, line 2 of the TAGS file for "foo.c" looks like this:
foo.c,1683
My question is how exactly this number, 1683, is computed.
I know it is the size of the "tag_definition", so what I want to know is: what is the "tag_definition"?
I have tried looking through the ctags source code, but perhaps someone better at C than me will have more success figuring this out.
Thanks!
EDIT #2:
^L^J
hello.c,79^J
float foo (float x) {^?foo^A3,20^J
float bar () {^?bar^A7,59^J
int main() {^?main^A11,91^J
Alright, so if I understand correctly, "79" is the number of bytes in the TAGS file from just after "79^J" down to and including "91^J".
Makes perfect sense.
Now, Wikipedia says the numbers 20, 59 and 91 in this example are the {byte_offset}.
What is the {byte_offset} an offset from?
Thanks for all the help Ken!

It's the number of bytes of tag data following the newline after the number.
Edit: It also doesn't include the ^L^J marker line between one file's tag data and the next. Remember, etags comes from a time long ago when reading a 500KB file was an expensive operation. ;)
Here's a complete tags file. I'm showing it two ways. In the first, control characters are written as ^X so that nothing is invisible; the end-of-line characters implicit in your example appear as ^J here:
^L^J
hello.cc,45^J
int main(^?5,41^J
int foo(^?9,92^J
int bar(^?13,121^J
^L^J
hello.h,15^J
#define X ^?2,1^J
Here's the same file as a hex dump (byte values are in hex; the offsets in the left column are in octal):
0000000  0c  0a  68  65  6c  6c  6f  2e  63  63  2c  34  35  0a  69  6e
         ff  nl   h   e   l   l   o   .   c   c   ,   4   5  nl   i   n
0000020  74  20  6d  61  69  6e  28  7f  35  2c  34  31  0a  69  6e  74
          t  sp   m   a   i   n   ( del   5   ,   4   1  nl   i   n   t
0000040  20  66  6f  6f  28  7f  39  2c  39  32  0a  69  6e  74  20  62
         sp   f   o   o   ( del   9   ,   9   2  nl   i   n   t  sp   b
0000060  61  72  28  7f  31  33  2c  31  32  31  0a  0c  0a  68  65  6c
          a   r   ( del   1   3   ,   1   2   1  nl  ff  nl   h   e   l
0000100  6c  6f  2e  68  2c  31  35  0a  23  64  65  66  69  6e  65  20
          l   o   .   h   ,   1   5  nl   #   d   e   f   i   n   e  sp
0000120  58  20  7f  32  2c  31  0a
          X  sp del   2   ,   1  nl
There are two sets of tag data in this example: 45 bytes of data for hello.cc and 15 bytes for hello.h.
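If you want to check those byte counts yourself, here is a minimal Python sketch. It rebuilds each tag entry byte for byte (^? is 0x7f, ^A is 0x01, ^J is 0x0a) and adds up the lengths. The helper names tag_line and section_size are made up for this illustration; they are not part of ctags or etags.

```python
# Control characters used in the TAGS listings above.
DEL = b"\x7f"  # shown as ^?; ends the copied source text
SOH = b"\x01"  # shown as ^A; follows an explicit tag name, if there is one
NL = b"\n"     # shown as ^J


def tag_line(text, line_no, byte_offset, name=None):
    """Build one tag entry exactly as it would be stored in the TAGS file."""
    entry = text.encode("ascii") + DEL
    if name is not None:
        entry += name.encode("ascii") + SOH
    entry += f"{line_no},{byte_offset}".encode("ascii") + NL
    return entry


def section_size(entries):
    """Total number of bytes of tag data for one source file."""
    return sum(len(entry) for entry in entries)


hello_cc = [
    tag_line("int main(", 5, 41),
    tag_line("int foo(", 9, 92),
    tag_line("int bar(", 13, 121),
]
hello_h = [tag_line("#define X ", 2, 1)]
# The entries from the question, which also carry explicit tag names.
hello_c = [
    tag_line("float foo (float x) {", 3, 20, name="foo"),
    tag_line("float bar () {", 7, 59, name="bar"),
    tag_line("int main() {", 11, 91, name="main"),
]

print(section_size(hello_cc), section_size(hello_h), section_size(hello_c))
# prints: 45 15 79
```

The third number matches the "hello.c,79" header from the question.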
The hello.cc data starts on the line following "hello.cc,45^J" and runs for 45 bytes, which is also a whole number of complete lines. The size is given in bytes so that code reading the file can simply allocate room for a 45-byte string and read 45 bytes. The "^L^J" line comes after the 45 bytes of tag data. You can use it as a marker that more files follow and also to verify that the file is properly formatted.
The hello.h data starts on the line following "hello.h,15^J" and runs for 15 bytes.
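To make the "read the header, then read that many bytes" idea concrete, here is a rough Python sketch of a reader. It is only an illustration of the layout described above, not how Emacs or ctags actually parse the file; parse_tags is a made-up name, and the sketch assumes every header line has the plain {src_file},{size} form.

```python
def parse_tags(data):
    """Split the bytes of a TAGS file into (filename, tag_data) pairs."""
    sections = []
    pos = 0
    while pos < len(data):
        # Every section is introduced by the marker line ^L^J.
        if data[pos:pos + 2] != b"\x0c\n":
            raise ValueError(f"expected ^L^J at byte {pos}")
        pos += 2
        # Header line: {src_file},{size}. Split on the last comma so a
        # comma inside the file name does not break the parse.
        end = data.index(b"\n", pos)
        name, _, size_text = data[pos:end].rpartition(b",")
        size = int(size_text.decode("ascii"))
        pos = end + 1
        # The next `size` bytes are the tag data for this file.
        sections.append((name.decode(), data[pos:pos + size]))
        pos += size
    return sections


sample = (
    b"\x0c\nhello.cc,45\n"
    b"int main(\x7f5,41\n"
    b"int foo(\x7f9,92\n"
    b"int bar(\x7f13,121\n"
    b"\x0c\nhello.h,15\n"
    b"#define X \x7f2,1\n"
)

for filename, tag_data in parse_tags(sample):
    print(filename, len(tag_data))
# prints:
# hello.cc 45
# hello.h 15
```

If a size in a header were wrong, the reader would not land on a ^L^J marker and would raise an error, which is the formatting check mentioned above.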

The {byte_offset} for a tag entry is the number of bytes from the start of the source file to the start of the line on which the tag is defined. The number before the byte offset is the line number. In your example:
hello.c,79^J
float foo (float x) {^?foo^A3,20^J
the line defining foo is line 3, and it begins 20 bytes from the start of hello.c. You can verify that with a text editor that shows your cursor position in the file. You can also use the Unix tail command to display a file starting a number of bytes in. Note that tail -c +N starts at byte N counting from 1, so skipping the first 20 bytes means asking for byte 21:
tail -c +21 hello.c
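Another way to see where these numbers come from is to compute the starting offset of every line yourself. The Python sketch below does that, counting offsets from 0. The contents of hello.h are not shown anywhere in this thread, so the bytes used here are purely hypothetical, chosen only so that line 2 starts at byte 1 like the "#define X ^?2,1" entry; line_start_offsets is likewise a made-up helper name.

```python
def line_start_offsets(data):
    """Return (line_number, byte_offset) for every line in data.

    Line numbers start at 1. Byte offsets start at 0 and point at the
    first byte of the line.
    """
    result = []
    offset = 0
    for line_no, line in enumerate(data.splitlines(keepends=True), start=1):
        result.append((line_no, offset))
        offset += len(line)
    return result


# Hypothetical header contents: an empty first line, then the #define.
hello_h = b"\n#define X 1\n"
print(line_start_offsets(hello_h))
# prints: [(1, 0), (2, 1)]
```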

Related

How to determine the size of a file from the strings

I have a test file called text.txt. Its contents:
as
bq
df
But the file size of text.txt is 12 bytes. Why is it 12 bytes? The first line is 3 bytes: as\n. The second line is 3 bytes: bq\n. The third line is 1 byte: \n. The fourth line is 3 bytes: dfEOF.
3 + 3 + 1 + 3 = 10 bytes
But when I check the size of the file, it says 12 bytes. If I have just a single character in the file, it says 1 byte. So I am confused as to how I get 12 bytes.
A GIF of me pressing the right arrow key in Notepad++, showing that there are no extra spaces or other whitespace:
https://gyazo.com/82717bd0e339188adae3d72dc243ba37
My hex: 61 73 0d 0a 62 71 0d 0a 0d 0a 64 66
Given that the contents are
61 73 0d 0a 62 71 0d 0a 0d 0a 64 66
your 12 bytes are
61 73 <- this is 'as'
0d 0a <- CR-LF newline characters
62 71 <- this is 'bq'
0d 0a <- CR-LF
0d 0a <- CR-LF for empty line
64 66 <- this is 'df'
That's 12. Note that your last line does not have a CR-LF pair.
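The same accounting can be done in a few lines of Python. This is just a sketch using the hex bytes from the question; describe_bytes is a made-up helper, and it only distinguishes CR-LF pairs from lone LF characters. It also shows that no end-of-file byte is stored: with Unix-style line endings the same text would be 9 bytes, not 10.

```python
def describe_bytes(data):
    """Count content bytes and line-ending bytes in data."""
    crlf = data.count(b"\r\n")
    lone_lf = data.count(b"\n") - crlf
    ending_bytes = 2 * crlf + lone_lf
    return {
        "total": len(data),
        "crlf": crlf,
        "lf": lone_lf,
        "content": len(data) - ending_bytes,
    }


windows = bytes.fromhex("61 73 0d 0a 62 71 0d 0a 0d 0a 64 66")
unix = windows.replace(b"\r\n", b"\n")

print(describe_bytes(windows))
# prints: {'total': 12, 'crlf': 3, 'lf': 0, 'content': 6}
print(describe_bytes(unix))
# prints: {'total': 9, 'crlf': 0, 'lf': 3, 'content': 6}
```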

When using AES 128 CBC encryption is it possible to get NULL byte in the middle of encrypted message?

As in the subject. I'm wondering if it is possible to get encrypted bytes like below when using AES128 CBC:
7b 22 63 6d 64 22 3a 22 73 65 74 41 70 22 2c 22
73 63 6f 22 2c 22 70 61 00 73 22 3a 22 70 61 73
73 77 6f 72 64 22 7d 00 00 00 00 00 00 00 00 00
Note the NULL byte in the second row.
EDIT: A bit of background behind this question.
I have a C function that takes my buffer and plain text (UTF-8). After calling it, I need to know how much of the buffer was filled up.
Yes, any byte value is possible including 0.
The implied question here is "can I use string handling functions on encrypted data". You cannot because 0 is a valid value. You need to keep track of the number of bytes in the encrypted data.
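You can see the problem without any crypto library. The Python sketch below performs no AES at all; it simply reuses the bytes from the question as a stand-in for the filled buffer and compares the real length with what a strlen-style scan would report. c_strlen is a made-up helper that imitates C's strlen.

```python
def c_strlen(buf):
    """Imitate C's strlen: count bytes up to, not including, the first 0."""
    end = buf.find(b"\x00")
    return len(buf) if end == -1 else end


# The three rows of bytes shown in the question.
buffer_bytes = bytes.fromhex(
    "7b 22 63 6d 64 22 3a 22 73 65 74 41 70 22 2c 22 "
    "73 63 6f 22 2c 22 70 61 00 73 22 3a 22 70 61 73 "
    "73 77 6f 72 64 22 7d 00 00 00 00 00 00 00 00 00"
)

print(len(buffer_bytes), c_strlen(buffer_bytes))
# prints: 48 24
```

A string function stops at byte 24 and silently loses the second half of the data, which is why the length has to be tracked separately.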

Strange .C file

Don't ask where and why I got it, but I have a lot of lines like these in a .c file:
0005080: 3465 3434 2035 6635 6620 2064 6c65 2e5f 4e44 5f5f dle._
0005090: 5f44 544f 525f 454e 445f 5f0a 3030 3031 _DTOR_END__.0001
00050a0: 3334 303a 2030 3035 6620 3566 3663 2036 340: 005f 5f6c 6
00050b0: 3936 3220 3633 3566 2036 3337 3320 3735 962 635f 6373 75
00050c0: 3566 2036 3936 6520 3639 3734 2020 2e5f 5f 696e 6974 ._
00050d0: 5f6c 6962 635f 6373 755f 696e 6974 0a30 _libc_csu_init.0
What can I do with it? Is this a program?
That's not a C file. That's not a C file at all!
What appears to have happened here is that someone mixed up the arguments while trying to compile a file, with something like gcc -o my_file.c my_file.c.
If you're on Linux, you can run the file utility to figure out what it is.
Note:
This might well also be a piece of malware: The enterprising would-be attacker sent you the file, hoping you would double-click it in the file manager, causing it to execute and do something nasty.
Edit:
Also, is that the literal content of the file, or the file as seen through xxd? If it's the former, it's more likely a mistake of some kind; but if it's the latter: Beware.
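One way to find out what such a dump encodes is to turn the hex columns back into bytes (xxd -r does this for a whole file). Below is a rough Python sketch of the same idea for single lines. decode_xxd_line is a made-up helper, and it assumes lines shaped like the ones in the question: an offset, a colon, then at most 8 groups of hex digits before the text column.

```python
import re

# One group in the hex column: two or four hex digits.
HEX_GROUP = re.compile(r"^[0-9a-fA-F]{2}(?:[0-9a-fA-F]{2})?$")


def decode_xxd_line(line, groups_per_line=8):
    """Decode the hex columns of one xxd-style line into bytes."""
    _offset, _, rest = line.partition(":")
    decoded = bytearray()
    for token in rest.split()[:groups_per_line]:
        if not HEX_GROUP.match(token):
            break  # reached the text column early (short final line)
        decoded += bytes.fromhex(token)
    return bytes(decoded)


lines = [
    "0005080: 3465 3434 2035 6635 6620 2064 6c65 2e5f 4e44 5f5f dle._",
    "00050d0: 5f6c 6962 635f 6373 755f 696e 6974 0a30 _libc_csu_init.0",
]
for line in lines:
    print(decode_xxd_line(line))
# prints:
# b'4e44 5f5f  dle._'
# b'_libc_csu_init\n0'
```

Note that the first decoded line itself looks like a fragment of a hex dump, so the lines in the question may well be a dump of a dump rather than a dump of a binary.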
This looks like the output of the hexdump command.
If you have a file temp.c with the following code:
#include<stdio.h>
int main()
{
	printf("Hello World!\n");
	return 0;
}
Then hexdump -C temp.c produces this output:
00000000 23 69 6e 63 6c 75 64 65 3c 73 74 64 69 6f 2e 68 |#include<stdio.h|
00000010 3e 0a 69 6e 74 20 6d 61 69 6e 28 29 0a 7b 0a 09 |>.int main().{..|
00000020 70 72 69 6e 74 66 28 22 48 65 6c 6c 6f 20 57 6f |printf("Hello Wo|
00000030 72 6c 64 21 5c 6e 22 29 3b 0a 09 72 65 74 75 72 |rld!\n");..retur|
00000040 6e 20 30 3b 0a 7d 0a |n 0;.}.|
00000047
The last few lines of the compiled output file (usually a.out) for the above program read:
\00__data_start\00__gmon_start__\00__dso_handle\00_IO_stdin_used\00__libc_csu_init\00_end\00_start\00__bss_start\00main\00_Jv_RegisterClasses\00__TMC_END__\00_ITM_registerTMCloneTable\00_init\00
In your case, it looks like hexdump (or a similar command) was run on a compiled binary such as a.out, and those are the last few lines of its output.
Good luck!

Data conversion from known data

I am trying to add a function to a piece of software (not written by me, no source available) in which a byte string seems to be converted somehow:
I am reading an ID from an RFID token as bytes:
52 55 48 48 69 54 52 65 54 65. This ID is written to the database as 57770346.
Not only are the values different, the length is too. Maybe the first entries are cut off.
What could the function look like? How could I find it?
Here is some additional data:
52 55 48 48 69 54 52 65 54 65 -> 57770346
52 55 48 48 69 53 68 67 51 49 -> 57742129
48 49 48 52 68 70 57 69 65 68 -> 76731309
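Found it myself:
BIN to ASCII: read the bytes as ASCII characters, e.g. 52 55 48 48 69 54 52 65 54 65 is "4700E64A6A"
ASCII as HEX to INT: read that string as a hexadecimal number and convert it to a decimal integer, here 304957770346
(the DB stores only the last 8 digits, 57770346, so the first 4 digits are missing)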
I'm embarrassed...
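For anyone who wants to reproduce this, here is a small Python sketch of the conversion, checked against the three samples from the question. The function name rfid_to_db_id is made up, and keeping the last 8 decimal digits is an assumption that happens to fit all three samples; the real software may truncate the number in some other way.

```python
def rfid_to_db_id(raw, keep_digits=8):
    """Convert the raw token bytes to the number stored in the DB."""
    ascii_hex = bytes(raw).decode("ascii")  # e.g. "4700E64A6A"
    value = int(ascii_hex, 16)              # e.g. 304957770346
    # Assumption: only the last `keep_digits` decimal digits are kept.
    return value % 10 ** keep_digits


samples = {
    (52, 55, 48, 48, 69, 54, 52, 65, 54, 65): 57770346,
    (52, 55, 48, 48, 69, 53, 68, 67, 51, 49): 57742129,
    (48, 49, 48, 52, 68, 70, 57, 69, 65, 68): 76731309,
}

for raw, expected in samples.items():
    print(bytes(raw).decode("ascii"), rfid_to_db_id(raw), expected)
# prints:
# 4700E64A6A 57770346 57770346
# 4700E5DC31 57742129 57742129
# 0104DF9EAD 76731309 76731309
```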

How can i export file from wireshark to display not in hex format

I'm trying to export a file from Wireshark so that I can search in it.
So far, none of the options I have tried gives a simple raw format like the raw view I get when I follow a TCP stream.
All I get is a hex view of the packets, and in this format the strings are broken across lines and can't be searched. I want to export to a searchable format.
Can it be done?
This is what I'm getting now:
0000 48 54 54 50 2f 31 2e 31 20 35 30 30 20 49 6e 74 HTTP/1.1 500 Int
0010 65 72 6e 61 6c 20 53 65 72 76 65 72 20 45 72 72 ernal Server Err
0020 6f 72 0d 0a 44 61 74 65 3a 20 54 68 75 2c 20 31 or..Date: Thu, 1
0030 30 20 4e 6f 76 20 32 30 31 31 20 31 36 3a 33 32 0 Nov 2011 16:32
0040 3a 35 37 20 47 4d 54 0d 0a 50 72 61 67 6d 61 3a :57 GMT..Pragma:
0050 20 6e 6f 2d 63 61 63 68 65 0d 0a 43 6f 6e 74 65 no-cache..Conte
What about using TShark, sed and tr?
tshark -r Clmt_04.pcap -x -R "frame.number<40" | sed -E 's/^.{56}//' | tr -d '\n' > Clmt-04.txt
tshark -x: add output of hex and ASCII dump (Packet Bytes)
sed -E 's/^.{56}//': remove the first 56 characters of each line (-E is needed so that {56} is treated as a repetition count)
tr -d '\n': remove newline characters
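If you already have the hex view saved as text, another option is to rebuild the original bytes from the hex columns instead of cutting out the ASCII column, which also keeps the real line breaks (0d 0a). Here is a rough Python sketch; dump_to_bytes is a made-up helper, and it assumes lines shaped like the ones in the question: an offset followed by up to 16 two-digit hex values.

```python
import re

HEX_BYTE = re.compile(r"[0-9a-fA-F]{2}")


def dump_to_bytes(dump_text):
    """Recover raw bytes from 'offset, hex bytes, ascii' dump lines."""
    out = bytearray()
    for line in dump_text.splitlines():
        tokens = line.split()
        # tokens[0] is the offset; up to 16 hex byte values follow it.
        for token in tokens[1:17]:
            if not HEX_BYTE.fullmatch(token):
                break  # reached the ASCII column on a short line
            out.append(int(token, 16))
    return bytes(out)


dump = """\
0000 48 54 54 50 2f 31 2e 31 20 35 30 30 20 49 6e 74 HTTP/1.1 500 Int
0010 65 72 6e 61 6c 20 53 65 72 76 65 72 20 45 72 72 ernal Server Err
0020 6f 72 0d 0a 44 61 74 65 3a 20 54 68 75 2c 20 31 or..Date: Thu, 1
"""

print(dump_to_bytes(dump))
# prints: b'HTTP/1.1 500 Internal Server Error\r\nDate: Thu, 1'
```

One limitation: on a short final line, a two-character piece of the ASCII column that happens to look like hex would be misread as a byte.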
Hope this helps
After you identify the TCP stream, you can use the following command with tshark:
tshark -nr <file>.pcapng -q -z follow,tcp,ascii,XXXX
where XXXX is the index of the TCP stream (the tcp.stream value).
