asprintf - how to get string input in C [closed] - c

I am reading the book "21st Century C" (first edition) and found an interesting program that uses asprintf
to get a string without using malloc or sizeof for the string length or space allocation. Please see the attached image from the same book for context. The following program is also from the book. The program compiles and runs, but it does NOT take string input from the keyboard; instead I get the output shown below. My question is: why doesn't the program take string input from the keyboard, and why does it show this long (unusual) message instead?
#define _GNU_SOURCE // stdio.h to include asprintf
#include <stdlib.h>
#include <stdio.h>
void get_strings(char const *in) {
    char *cmd;
    asprintf(&cmd, "strings %s", in);
    if (system(cmd))
        fprintf(stderr, "Something went Wrong %s.\n", cmd);
    free(cmd);
}
int main(int argc, char **argv) {
    get_strings(argv[0]);
    //return 0;
}
When I run the program, the output is:
/lib64/ld-linux-x86-64.so.2
libc.so.6
__stack_chk_fail
asprintf
stderr
system
fprintf
__libc_start_main
free
__gmon_start__
GLIBC_2.4
GLIBC_2.2.5
UH-X
AWAVA
AUATL
[]A\A]A^A_
strings %s
Something went Wrong %s.
;*3$"
GCC: (Ubuntu 5.3.1-14ubuntu2) 5.3.1 20160413
crtstuff.c
__JCR_LIST__
deregister_tm_clones
__do_global_dtors_aux
completed.7585
__do_global_dtors_aux_fini_array_entry
frame_dummy
__frame_dummy_init_array_entry
get_strings.c
__FRAME_END__
__JCR_END__
__init_array_end
_DYNAMIC
__init_array_start
__GNU_EH_FRAME_HDR
_GLOBAL_OFFSET_TABLE_
__libc_csu_fini
free@@GLIBC_2.2.5
_ITM_deregisterTMCloneTable
_edata
__stack_chk_fail@@GLIBC_2.4
system@@GLIBC_2.2.5
get_strings
__libc_start_main@@GLIBC_2.2.5
__data_start
fprintf@@GLIBC_2.2.5
__gmon_start__
__dso_handle
_IO_stdin_used
__libc_csu_init
__bss_start
asprintf@@GLIBC_2.2.5
main
_Jv_RegisterClasses
__TMC_END__
_ITM_registerTMCloneTable
stderr@@GLIBC_2.2.5
.symtab
.strtab
.shstrtab
.interp
.note.ABI-tag
.note.gnu.build-id
.gnu.hash
.dynsym
.dynstr
.gnu.version
.gnu.version_r
.rela.dyn
.rela.plt
.init
.plt.got
.text
.fini
.rodata
.eh_frame_hdr
.eh_frame
.init_array
.fini_array
.jcr
.dynamic
.got.plt
.data
.bss
.comment
------------------
(program exited with code: 0)
Press return to continue
I am running it on Linux Mint 18 with GCC version 5.3.1.
Build setting: gcc -Wall -c "%f"
Compile: gcc -Wall -o "%e" "%f"

The purpose of the program is not to get input from the user: it uses the system() function to run the strings program with its own name as the only argument.
If you are running in a Unix environment, the strings program scans files for printable strings. The output you observe is more or less expected: your program executable as produced by gcc contains many printable strings:
you can spot the string literal present in the source code:
Something went Wrong %s.
numerous symbol names to be resolved dynamically at load time
debugging information, such as the name of the source file: crtstuff.c
section names starting with .
there are also some random items ([]A\A]A^A_, ;*3$"...) that are just sequences of printable characters present in the executable's code or binary data, mistakenly reported by strings as text because they form a long enough run of printable bytes followed by a non-printable byte such as a null byte.

There is no place in your program where it reads from standard input/keyboard. And the system("strings ...") is passing a filename to the strings command, so strings reads from that file and not from keyboard.
If you intend to read from the files whose names are passed to your program, keep in mind that argv[0] is the program name. You need to look at argv[1], argv[2] and so on.
for (int i = 1; i < argc; ++i)
    get_strings(argv[i]);
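If the goal really is to type a file name at the keyboard, the program has to read standard input itself. The following is a minimal sketch, not the book's code: read_line and make_strings_cmd are hypothetical helper names, and two snprintf calls stand in for asprintf so that the example does not depend on the GNU extension.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one line from `in` into buf (capacity cap) and strip the trailing
   newline. Returns 1 on success, 0 on end-of-file or error. */
int read_line(FILE *in, char *buf, size_t cap) {
    if (cap == 0 || fgets(buf, (int)cap, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}

/* Build "strings <path>" in freshly allocated memory. The first snprintf
   call only measures the length. Returns NULL on failure; caller frees. */
char *make_strings_cmd(const char *path) {
    int n = snprintf(NULL, 0, "strings %s", path);
    if (n < 0)
        return NULL;
    char *cmd = malloc((size_t)n + 1);
    if (cmd != NULL)
        snprintf(cmd, (size_t)n + 1, "strings %s", path);
    return cmd;
}
```

In main one would call read_line(stdin, name, sizeof name), pass the result to make_strings_cmd, hand the command to system() and then free it. Passing unchecked user input to system() is only acceptable in a toy program like this one.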

Related

Is it possible to make a hardcoding with the help of the command objcopy

I'm working on Linux and I've just learned about the objcopy command. I've found the corresponding command on my x86_64 PC: x86_64-linux-gnu-objcopy.
With its help, I can convert a file into an obj file: x86_64-linux-gnu-objcopy -I binary -O elf64-x86-64 custom.config custom.config.o
The file custom.config is a human-readable file. It contains two lines:
name titi
password 123
Now I can execute objdump -x -s custom.config.o to check its information.
custom.config.o: file format elf64-little
custom.config.o
architecture: UNKNOWN!, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 00000017 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g .data 0000000000000000 _binary_custom_config_start
0000000000000017 g .data 0000000000000000 _binary_custom_config_end
0000000000000017 g *ABS* 0000000000000000 _binary_custom_config_size
Contents of section .data:
0000 6e616d65 20746974 690a7061 7373776f name titi.passwo
0010 72642031 32330a rd 123.
As we all know, we can open, read or write a file such as custom.config in any C/C++ project. Now I'm wondering whether it's possible to use the obj file custom.config.o directly in a C/C++ project. For example, is it possible to read the content of custom.config.o directly, without calling I/O functions such as open, read or write? If so, I think this might become a kind of hardcoding that avoids calling the I/O functions.
Although I tried this on Win10 with MinGW (MinGW-W64 project, GCC 8.1.0), it should work for you with only minor adaptations.
As you can see from the info objdump gave you, the file's contents are placed in the .data section, which is the common section for non-constant variables.
And some symbols were defined for it. You can declare these symbols in your C source.
The absolute value _binary_custom_config_size is special, because it is marked *ABS*. Currently I know no other way to obtain its value than to declare a variable of any type and take its address.
This is my show_config.c:
#include <stdio.h>
#include <string.h>
extern const char _binary_custom_config_start[];
extern const char _binary_custom_config_size;
int main(void) {
    size_t size = (size_t)&_binary_custom_config_size;
    char config[size + 1];
    strncpy(config, _binary_custom_config_start, size);
    config[size] = '\0';
    printf("config = \"%s\"\n", config);
    return 0;
}
Because the "binary" file (actually a text) has no final '\0' character, you need to append one to get a correctly terminated C string.
You could as well declare _binary_custom_config_end and use it to calculate the size, or as a limit.
Building everything goes like this (I used the -g option to debug):
$ objcopy -I binary -O elf64-x86-64 -B i386 custom.config custom.config.o
$ gcc -Wall -Wextra -pedantic -g show_config.c custom.config.o -o show_config
And the output shows the success:
$ show_config.exe
config = "name titi
password 123"
If you need the file's contents in another section, add the option that renames the section to the objcopy call. Add any flags you need; the example shows .rodata, which is used for read-only data:
--rename-section .data=.rodata,alloc,load,readonly,data,contents
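The alternative mentioned above, computing the size from the start and end symbols instead of the absolute size symbol, could look roughly like the following. This is a sketch under assumptions: blob_to_cstring is a hypothetical helper name, and the extern declarations are shown only in a comment so that the helper compiles even without the objcopy-generated object.

```c
#include <stdlib.h>
#include <string.h>

/* Return a NUL-terminated heap copy of the bytes in [start, end),
   or NULL if allocation fails. The caller frees the result. */
char *blob_to_cstring(const char *start, const char *end) {
    size_t size = (size_t)(end - start);
    char *copy = malloc(size + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, start, size);
    copy[size] = '\0';
    return copy;
}

/* With custom.config.o linked in, the call would be:
 *
 *   extern const char _binary_custom_config_start[];
 *   extern const char _binary_custom_config_end[];
 *   char *config = blob_to_cstring(_binary_custom_config_start,
 *                                  _binary_custom_config_end);
 */
```

Subtracting two addresses avoids taking the address of the *ABS* size symbol, and copying to the heap avoids a variable-length array on the stack.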

[]A\A]A^A_ and ;*3$" in compiled C binary

I'm on an Ubuntu 18.04 laptop coding C with VSCode and compiling it with GNU's gcc.
I'm doing some basic engineering on my own C code and I noticed a few interesting details, one of which is the pair []A\A]A^A_ and ;*3$" that seems to appear in every one of my compiled C binaries. Between them are usually (or always) the strings that I hard-code for printf() calls.
An example is this short piece of code here:
#include <stdio.h>
#include <stdbool.h>
int f(int i);
int main()
{
    int x = 5;
    int o = f(x);
    printf("The factorial of %d is: %d\n", x, o);
    return 0;
}
int f(int i)
{
    if(i == 0)
    {
        return 1; /* 0! is 1; returning i here would make every result 0 */
    }
    else
    {
        return i*f(i-1);
    }
}
This is then compiled using gcc test.c -o test.
When I run strings test, the following is output:
/lib64/ld-linux-x86-64.so.2
0HSn(
libc.so.6
printf
__cxa_finalize
__libc_start_main
GLIBC_2.2.5
_ITM_deregisterTMCloneTable
__gmon_start__
_ITM_registerTMCloneTable
AWAVI
AUATL
[]A\A]A^A_
The factorial of %d is: %d
;*3$"
GCC: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
crtstuff.c
deregister_tm_clones
__do_global_dtors_aux
completed.7697
__do_global_dtors_aux_fini_array_entry
frame_dummy
__frame_dummy_init_array_entry
test.c
__FRAME_END__
__init_array_end
_DYNAMIC
__init_array_start
__GNU_EH_FRAME_HDR
_GLOBAL_OFFSET_TABLE_
__libc_csu_fini
_ITM_deregisterTMCloneTable
_edata
printf@@GLIBC_2.2.5
__libc_start_main@@GLIBC_2.2.5
__data_start
__gmon_start__
__dso_handle
_IO_stdin_used
__libc_csu_init
__bss_start
main
__TMC_END__
_ITM_registerTMCloneTable
__cxa_finalize@@GLIBC_2.2.5
.symtab
.strtab
.shstrtab
.interp
.note.ABI-tag
.note.gnu.build-id
.gnu.hash
.dynsym
.dynstr
.gnu.version
.gnu.version_r
.rela.dyn
.rela.plt
.init
.plt.got
.text
.fini
.rodata
.eh_frame_hdr
.eh_frame
.init_array
.fini_array
.dynamic
.data
.bss
.comment
As with other programs I've written, the two pieces []A\A]A^A_ and ;*3$" always show up, one before the strings used with printf and one right after.
I'm curious: what exactly do those strings mean? I'm guessing they mainly mark the beginning and end of the hard-coded output strings.
Our digital computers work on bits, most commonly clustered in bytes containing 8 bits each. The meaning of such a combination depends on the context and the interpretation.
A non-exhaustive list of possible interpretations:
ASCII characters with the eighth bit ignored or accepted only if 0;
signed or unsigned 8-bit integer;
operation code (or part of it) of one specific machine language, each processor (family) has its own different set.
For example, the hex value 0x43 can be seen as:
ASCII character 'C';
Unsigned 8-bit integer 67 (signed is the same if 2's complement is used);
Operation code "LD B,E" for a Z80 CPU (see, I'm really old and learned that processor in depth);
Operation code "EORS ari" for an ARM CPU.
Now strings simply (not to say "primitively") scans through the given file and tries to interpret the bytes as sequences of printable ASCII characters. By default a sequence has to have at least 4 characters and the bytes are interpreted as 7-bit ASCII. BTW, the file does not have to be an executable. You can scan any file, but if you give it an object file, some versions of strings scan by default only the sections that are loaded into memory.
So what you see are sequences of bytes which by chance are at least 4 printable characters in a row. And because some patterns are always present in an executable, it just looks as if they have a special meaning. Actually they do have one, but it doesn't have to relate to your program's strings.
You can use strings to quickly peek into a file to find, well, strings which might help you with whatever you're trying to accomplish.
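The scanning rule described above is simple enough to sketch in a few lines of C. This is not the source of the real strings tool, only an illustration of the idea; print_runs is a hypothetical name, printable is taken to mean the ASCII range 0x20 to 0x7e, and min_len is assumed to be at least 1.

```c
#include <stddef.h>
#include <stdio.h>

/* Print every run of at least min_len printable ASCII bytes found in
   buf[0..len), one per line, and return how many runs were printed. */
size_t print_runs(const unsigned char *buf, size_t len, size_t min_len, FILE *out) {
    size_t count = 0, start = 0;
    for (size_t i = 0; i <= len; i++) {
        /* A run ends at a non-printable byte or at the end of the buffer. */
        int printable = i < len && buf[i] >= 0x20 && buf[i] <= 0x7e;
        if (printable)
            continue;
        if (i - start >= min_len) {
            fwrite(buf + start, 1, i - start, out);
            fputc('\n', out);
            count++;
        }
        start = i + 1;
    }
    return count;
}
```

Fed the bytes of an executable with min_len set to 4, a scanner like this reports machine code or table data that happens to fall in the printable range just as readily as genuine string literals.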
What you're seeing is an ASCII representation of a particular bit pattern that happens to be common in executable programs generated by that particular compiler. The pattern might correspond to a particular sequence of machine language instructions which the compiler is fond of emitting. Or it might correspond to a particular data structure which the compiler or linker uses to mark the various other pieces of data stored in the executable.
Given enough work, it would probably be possible to work out precisely what the bit patterns behind []A\A]A^A_ and ;*3$" correspond to, for your C code and your particular version of your particular compiler. But I don't do much machine-language programming any more, so I'm not going to try, and the answers probably wouldn't be too interesting in the end, anyway.
But it reminds me of a little quirk which I have noticed and can explain. Suppose you wrote the very simple program
int i = 12345;
If you compiled that program and ran strings on it, and if you told it to look for strings as short as two characters, you'd probably see (among lots of other short, meaningless strings), the string
90
and that bit pattern would, in fact, correspond to your variable! What's up with that?
Well, 12345 in hexadecimal is 0x3039, and most machines these days are little-endian, so those two bytes in memory are stored in the other order as
39 30
and in ASCII, 0x39 is '9', while 0x30 is '0'.
And if this is interesting to you, you can try compiling the program fragment
int i = 12345;
long int a = 1936287860;
long int b = 1629516649;
long int c = 1953719668;
long long int x = 48857072035144;
long long int y = 36715199885175;
and running strings -2 on it, and see what else you get.
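The byte-order argument above can be checked with a small helper. This is only a sketch: le_bytes_as_text is a hypothetical name, and it extracts the bytes with shifts so that the result does not depend on the endianness of the machine running it.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the bytes of v, least significant first (the order in which a
   little-endian machine stores them), into out as text. Stops at the
   first byte that is not printable ASCII. Returns the text length. */
size_t le_bytes_as_text(uint64_t v, char out[9]) {
    size_t n = 0;
    for (int i = 0; i < 8; i++) {
        unsigned char b = (unsigned char)(v >> (8 * i));
        if (b < 0x20 || b > 0x7e)
            break;
        out[n++] = (char)b;
    }
    out[n] = '\0';
    return n;
}
```

Calling it with 12345 fills out with the two characters 9 and 0, matching the explanation; the other constants can be passed through the same function.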

How do I add contents of text file as a section in an ELF file?

I have a NASM assembly file that I am assembling and linking (on Intel-64 Linux).
There is a text file, and I want the contents of the text file to appear in the resulting binary (as a string, basically). The binary is an ELF executable.
My plan is to create a new readonly data section in the ELF file (equivalent to the conventional .rodata section).
Ideally, there would be a tool to add a file verbatim as a new section in an elf file, or a linker option to include a file verbatim.
Is this possible?
This is possible and most easily done using OBJCOPY found in BINUTILS. You effectively take the data file as binary input and then output it to an object file format that can be linked to your program.
OBJCOPY will even produce a start and end symbol as well as the size of the data area so that you can reference them in your code. The basic idea is that you will want to tell it your input file is binary (even if it is text); that you will be targeting an x86-64 object file; specify the input file name and the output file name.
Assume we have an input file called myfile.txt with the contents:
the
quick
brown
fox
jumps
over
the
lazy
dog
Something like this would be a starting point:
objcopy --input binary \
--output elf64-x86-64 \
--binary-architecture i386:x86-64 \
myfile.txt myfile.o
If you wanted to generate 32-bit objects you could use:
objcopy --input binary \
--output elf32-i386 \
--binary-architecture i386 \
myfile.txt myfile.o
The output would be an object file called myfile.o. If we were to review the headers of the object file using OBJDUMP and a command like objdump -x myfile.o we would see something like this:
myfile.o: file format elf64-x86-64
myfile.o
architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 0000002c 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g .data 0000000000000000 _binary_myfile_txt_start
000000000000002c g .data 0000000000000000 _binary_myfile_txt_end
000000000000002c g *ABS* 0000000000000000 _binary_myfile_txt_size
By default it creates a .data section with contents of the file and it creates a number of symbols that can be used to reference the data.
_binary_myfile_txt_start
_binary_myfile_txt_end
_binary_myfile_txt_size
These are effectively the address of the first byte, the address just past the last byte, and the size of the data that was placed into the object from the file myfile.txt. OBJCOPY bases the symbol names on the input file name: myfile.txt is mangled into myfile_txt and used to create the symbols.
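The name mangling can be mimicked in C to predict which symbols to declare. The sketch below rests on an assumption about objcopy's behaviour rather than on its source: every character of the input file name, as given on the command line, that is not a letter or digit is taken to become an underscore. binary_symbol_name is a hypothetical helper.

```c
#include <ctype.h>
#include <stdio.h>

/* Build "_binary_<name>_<suffix>" into out, replacing every character of
   name that is not a letter or digit with '_'. Returns 0 on success,
   -1 if out (capacity cap) is too small. */
int binary_symbol_name(const char *name, const char *suffix, char *out, size_t cap) {
    int n = snprintf(out, cap, "_binary_%s_%s", name, suffix);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    /* Only the file-name part, which starts after "_binary_", is mangled. */
    size_t i = 8;
    for (const char *p = name; *p != '\0'; p++, i++) {
        if (!isalnum((unsigned char)*p))
            out[i] = '_';
    }
    return 0;
}
```

For the name myfile.txt and the suffix start this produces _binary_myfile_txt_start, the first symbol in the objdump listing above.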
One problem is that a .data section is created which is read/write/data as seen here:
Idx Name Size VMA LMA File off Algn
0 .data 0000002c 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
You specifically are requesting a .rodata section that would also have the READONLY flag specified. You can use the --rename-section option to change .data to .rodata and specify the needed flags. You could add this to the command line:
--rename-section .data=.rodata,CONTENTS,ALLOC,LOAD,READONLY,DATA
Of course if you want to call the section something other than .rodata with the same flags as a read only section you can change .rodata in the line above to the name you want to use for the section.
The final version of the command that should generate the type of object you want is:
objcopy --input binary \
--output elf64-x86-64 \
--binary-architecture i386:x86-64 \
--rename-section .data=.rodata,CONTENTS,ALLOC,LOAD,READONLY,DATA \
myfile.txt myfile.o
Now that you have an object file, how can you use it in C code (as an example)? The symbols generated are a bit unusual and there is a reasonable explanation on the OS Dev Wiki:
A common problem is getting garbage data when trying to use a value defined in a linker script. This is usually because they're dereferencing the symbol. A symbol defined in a linker script (e.g. _ebss = .;) is only a symbol, not a variable. If you access the symbol using extern uint32_t _ebss; and then try to use _ebss the code will try to read a 32-bit integer from the address indicated by _ebss.
The solution to this is to take the address of _ebss either by using it as &_ebss or by defining it as an unsized array (extern char _ebss[];) and casting to an integer. (The array notation prevents accidental reads from _ebss as arrays must be explicitly dereferenced)
Keeping this in mind we could create this C file called main.c:
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
/* These are external references to the symbols created by OBJCOPY */
extern char _binary_myfile_txt_start[];
extern char _binary_myfile_txt_end[];
extern char _binary_myfile_txt_size[];
int main(void)
{
    char *data_start = _binary_myfile_txt_start;
    char *data_end = _binary_myfile_txt_end;
    size_t data_size = (size_t)_binary_myfile_txt_size;
    /* Print out the pointers and size (%p expects a void pointer) */
    printf ("data_start %p\n", (void *)data_start);
    printf ("data_end %p\n", (void *)data_end);
    printf ("data_size %zu\n", data_size);
    /* Print out each byte until we reach the end */
    while (data_start < data_end)
        printf ("%c", *data_start++);
    return 0;
}
You can compile and link with:
gcc -O3 main.c myfile.o
The output should look something like:
data_start 0x4006a2
data_end 0x4006ce
data_size 44
the
quick
brown
fox
jumps
over
the
lazy
dog
A NASM example of usage is similar in nature to the C code. The following assembly program called nmain.asm writes the same string to standard output using Linux x86-64 System Calls:
bits 64
global _start
extern _binary_myfile_txt_start
extern _binary_myfile_txt_end
extern _binary_myfile_txt_size
section .text
_start:
    mov eax, 1                          ; SYS_Write system call
    mov edi, eax                        ; Standard output FD = 1
    mov rsi, _binary_myfile_txt_start   ; Address to start of string
    mov rdx, _binary_myfile_txt_size    ; Length of string
    syscall
    xor edi, edi                        ; Return value = 0
    mov eax, 60                         ; SYS_Exit system call
    syscall
This can be assembled and linked with:
nasm -f elf64 -o nmain.o nmain.asm
gcc -m64 -nostdlib nmain.o myfile.o
The output should appear as:
the
quick
brown
fox
jumps
over
the
lazy
dog

How to load library defined symbols to a specified location?

The test is on Ubuntu 12.04, 32-bit, with gcc 4.6.3.
Basically I am doing some binary manipulation work on ELF binaries, and what I have to do now is to assemble an assembly program and guarantee that the libc symbols are loaded at an address predefined by me.
Let me elaborate with a simple example.
Suppose the original code uses the libc symbol stdout@GLIBC_2.0.
#include <stdio.h>
int main() {
    FILE* fout = stdout;
    fprintf( fout, "hello\n" );
}
When I compile it and check the symbol address using these commands:
gcc main.c
readelf -s a.out | grep stdout
I got this:
0804a020 4 OBJECT GLOBAL DEFAULT 25 stdout@GLIBC_2.0 (2)
0804a020 4 OBJECT GLOBAL DEFAULT 25 stdout@@GLIBC_2.0
and the .bss section is like this:
readelf -S a.out | grep bss
[25] .bss NOBITS 0804a020 001014 00000c 00 WA 0 0 32
Now what I am trying to do is to load the stdout symbol in a predefined address, so I did this:
echo "stdout = 0x804a024;" > symbolfile
gcc -Wl,--just-symbols=symbolfile main.c
Then when I check the .bss section and symbol stdout, I got this:
[25] .bss NOBITS 0804a014 001014 000008 00 WA 0 0 4
4: 0804a024 0 NOTYPE GLOBAL DEFAULT ABS stdout
49: 0804a024 0 NOTYPE GLOBAL DEFAULT ABS stdout
It seems that I didn't successfully load the symbol stdout@@GLIBC_2.0, but just a weird stdout. (I tried to write stdout@@GLIBC_2.0 in symbolfile, but then it doesn't compile...)
It also seems that, because this didn't work, the start address of the .bss section has changed, which leaves the address of the stdout symbol outside any section. At run time, the program throws a segmentation fault when loading from 0x804a024.
Could anyone help me on how to successfully load the library symbol at a predefined address? Thanks!

How do I see the memory locations of static variables within .bss?

Supposing I have a static variable declared in gps_antenova_m10478.c as follows:
static app_timer_id_t m_gps_response_timeout_timer_id;
I have some sort of buffer overrun bug in my code and at some point a write to the variable right before m_gps_response_timeout_timer_id in memory is overwriting it.
I can find out where m_gps_response_timeout_timer_id is in memory using the 'Expressions' view in Eclipse's GDB client. Just enter &m_gps_response_timeout_timer_id. But how do I tell which variable is immediately before it in memory?
Is there a way to get this info into the .map file that ld produces? At the moment I only see source files:
.bss 0x000000002000011c 0x0 _build/debug_leds.o
.bss 0x000000002000011c 0x11f8 _build/gps_antenova_m10478.o
.bss 0x0000000020001314 0x161c _build/gsm_ublox_sara.o
I'll be honest, I don't know enough about Eclipse to give an easy way within Eclipse to get this. The tool you're probably looking for is either objdump or nm. An example with objdump is to simply run objdump -x <myELF>. This will then return all symbols in the file, which section they're in, and their addresses. You'll then have to manually search for the variable in which you're interested based on the addresses.
objdump -x <ELFfile> will give output along the lines of the following:
000120d8 g F .text 0000033c bit_string_copy
00015ea4 g O .bss 00000004 overflow_bit
00015e24 g .bss 00000000 __bss_start
00011ce4 g F .text 0000003c main
00014b6c g F .text 0000008c integer_and
The first column is the address; after the flag characters come the section and then the size of the symbol, followed by its name.
nm <ELFfile> gives the following:
00015ea8 B __bss_end
00015e24 B __bss_start
0000c000 T _start
00015e20 D zero_constant
00015e24 b zero_constant_itself
The first column is the address and the second the section. D/d is data, B/b is BSS and T/t is text. The rest can be found in the manpage. nm also accepts the -n flag to sort the lines by their numeric address.
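Once the listing is sorted by address, finding the neighbour of a variable is a matter of remembering the previous line. The following sketch assumes the listing text has already been read into memory in the three-column nm -n layout shown above; symbol_before is a hypothetical helper, not part of any tool.

```c
#include <stdio.h>
#include <string.h>

/* Scan an address-sorted "address type name" listing and copy the name of
   the symbol listed immediately before `target` into prev.
   Returns 1 if such a symbol was found, 0 otherwise. */
int symbol_before(const char *listing, const char *target, char prev[128]) {
    char name[128], last[128] = "";
    unsigned long long addr;
    char type;
    const char *line = listing;
    while (line != NULL && *line != '\0') {
        /* Lines without an address (undefined symbols) fail to parse
           and are skipped. */
        if (sscanf(line, "%llx %c %127s", &addr, &type, name) == 3) {
            if (strcmp(name, target) == 0 && last[0] != '\0') {
                strcpy(prev, last);
                return 1;
            }
            strcpy(last, name);
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return 0;
}
```

A static variable such as m_gps_response_timeout_timer_id would appear with a lowercase b in the type column. Several symbols can share one address (for example a section start marker), so the reported neighbour is worth checking against the addresses by eye.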
