What is the time complexity of the crypt function in Linux?

The crypt function is declared as follows in Unix, for authentication:
char *crypt(const char *key, const char *salt);
Assume that I have the key (of length n), and the salt (of length m), what is the time complexity (order of algorithm) of calling this function?

From the man page of crypt:
salt is a two-character string chosen from the set [a-zA-Z0-9./]. This string is used to perturb the algorithm in one of 4096 different ways.
and
By taking the lowest 7 bits of each of the first eight characters of the key, a 56-bit key is obtained.
The key thus obtained is then used to encrypt a constant string (using a tweaked DES algorithm), which takes constant time. Therefore, the function has constant run time for any valid arguments. Note that this truncation leads to very weak passwords.
As commented by melpomene, some implementations provide an extension to the crypt function that allows selecting a more secure mode. For the following, I will assume you are using the crypt function from the GNU C library. The manual says:
For the MD5-based algorithm, the salt should consist of the string $1$, followed by up to 8 characters, terminated by either another $ or the end of the string. The result of crypt will be the salt, followed by a $ if the salt didn't end with one, followed by 22 characters from the alphabet ./0-9A-Za-z, up to 34 characters total. Every character in the key is significant.
Since the length of the salt is fixed by a constant and the cryptographic hash function has time complexity linear in the length of the input, the overall time complexity of the crypt function will be linear in the length of the key.
My version of glibc also supports the more secure SHA-256 (selected via $5$) and SHA-512 (selected via $6$) cryptographic hash functions in addition to MD5. These have linear time complexity in the length of their input, too.
Since I cannot make sense out of the task I'm actually supposed to do right now, I have timed the various crypt methods to support the above analysis. Here are the results.
Plotted are the execution times spent in the crypt function against the length of the key string. Each data series is overlaid with a linear regression, except for DES, where the average value is plotted instead. I am surprised that SHA-512 is actually faster than SHA-256.
The code used for the benchmarks is here (benchmark.c).
#define _GNU_SOURCE /* crypt */
#include <errno.h> /* errno, strerror */
#include <stdio.h> /* FILE, fopen, fclose, fprintf */
#include <stdlib.h> /* EXIT_{SUCCESS,FAILURE}, malloc, free, [s]rand */
#include <string.h> /* size_t, strlen */
#include <assert.h> /* assert */
#include <time.h> /* CLOCKS_PER_SEC, clock_t, clock */
#include <unistd.h> /* crypt */
/* Barrier to stop the compiler from re-ordering instructions. */
#define COMPILER_BARRIER asm volatile("" ::: "memory")
/* First character in the printable ASCII range. */
static const char ascii_first = ' ';
/* Last character in the printable ASCII range. */
static const char ascii_last = '~';
/*
Benchmark the time it takes to crypt(3) a key of length *keylen* with salt
*salt*. The result is written to the stream *ostr* so its computation cannot
be optimized away.
*/
static clock_t
measure_crypt(const size_t keylen, const char *const salt, FILE *const ostr)
{
char * key;
const char * passwd;
clock_t t1;
clock_t t2;
size_t i;
key = malloc(keylen + 1);
if (key == NULL)
return ((clock_t) -1);
/*
Generate a random key. The randomness is extremely poor; never do this in
cryptographic applications!
*/
for (i = 0; i < keylen; ++i)
key[i] = ascii_first + rand() % (ascii_last - ascii_first + 1);
key[keylen] = '\0';
assert(strlen(key) == keylen);
COMPILER_BARRIER;
t1 = clock();
COMPILER_BARRIER;
passwd = crypt(key, salt);
COMPILER_BARRIER;
t2 = clock();
COMPILER_BARRIER;
fprintf(ostr, "%s\n", passwd);
free(key);
return t2 - t1;
}
/*
The program can be called with zero or one arguments. The argument, if
given, will be used as salt.
*/
int
main(const int argc, const char *const *const argv)
{
const size_t keymax = 2000;
const size_t keystep = 100;
const char * salt = ".."; /* default salt */
FILE * devnull = NULL; /* redirect noise to black hole */
int status = EXIT_SUCCESS;
size_t keylen;
if (argc > 1)
salt = argv[1];
devnull = fopen("/dev/null", "w");
if (devnull == NULL)
goto label_catch;
srand((unsigned) clock());
for (keylen = 0; keylen <= keymax; keylen += keystep)
{
clock_t ticks;
double millis;
ticks = measure_crypt(keylen, salt, devnull);
if (ticks < 0)
goto label_catch;
millis = 1.0E3 * ticks / CLOCKS_PER_SEC;
fprintf(stdout, "%16zu %e\n", keylen, millis);
}
goto label_finally;
label_catch:
status = EXIT_FAILURE;
fprintf(stderr, "error: %s\n", strerror(errno));
label_finally:
if (devnull != NULL)
fclose(devnull);
return status;
}
The Gnuplot script used for the regression and plotting is here (plot.gplt).
set terminal 'svg'
set output 'timings.svg'
set xrange [0 : *]
set yrange [0 : *]
set key top left
set title 'crypt(3) benchmarks'
set xlabel 'key length / bytes'
set ylabel 'computation time / milliseconds'
des(x) = a_des
md5(x) = a_md5 + b_md5 * x
sha256(x) = a_sha256 + b_sha256 * x
sha512(x) = a_sha512 + b_sha512 * x
fit des(x) 'timings.des' via a_des
fit md5(x) 'timings.md5' via a_md5, b_md5
fit sha256(x) 'timings.sha256' via a_sha256, b_sha256
fit sha512(x) 'timings.sha512' via a_sha512, b_sha512
plot des(x) w l notitle lc '#75507b' lt 1 lw 2.5, \
'timings.des' w p t 'DES' lc '#5c3566' pt 7 ps 0.8, \
md5(x) w l notitle lc '#cc0000' lt 1 lw 2.5, \
'timings.md5' w p t 'MD5' lc '#a40000' pt 7 ps 0.8, \
sha256(x) w l notitle lc '#73d216' lt 1 lw 2.5, \
'timings.sha256' w p t 'SHA-256' lc '#4e9a06' pt 7 ps 0.8, \
sha512(x) w l notitle lc '#3465a4' lt 1 lw 2.5, \
'timings.sha512' w p t 'SHA-512' lc '#204a87' pt 7 ps 0.8
Finally the Makefile used to hook everything together (GNUmakefile).
CC := gcc
CPPFLAGS :=
CFLAGS := -Wall -O2
LDFLAGS :=
LIBS := -lcrypt
all: benchmark timings.svg timings.png
benchmark: benchmark.o
${CC} -o $@ ${CFLAGS} $^ ${LDFLAGS} ${LIBS}
benchmark.o: benchmark.c
${CC} -c ${CPPFLAGS} ${CFLAGS} $<
timings.svg: plot.gplt timings.des timings.md5 timings.sha256 timings.sha512
gnuplot $<
timings.png: timings.svg
convert $< $@
timings.des: benchmark
./$< '$(shell pwgen -ncs 2)' > $@
timings.md5: benchmark
./$< '$$1$$$(shell pwgen -ncs 8)' > $@
timings.sha256: benchmark
./$< '$$5$$$(shell pwgen -ncs 16)' > $@
timings.sha512: benchmark
./$< '$$6$$$(shell pwgen -ncs 16)' > $@
clean:
rm -f benchmark benchmark.o fit.log $(wildcard *.o timings.*)
.PHONY: all clean

Related

FPGA Analog-to-Digital Conversion DE1-SOC

I have a DE1-SoC (Rev. C) running Linux. I am having problems accessing the onboard ADC. The input to all 8 channels is a 3V pk-pk sinusoidal signal. The onboard ADC is an AD7928 12-bit 8-channel ADC. The datasheet says that the ADC can handle bipolar signals, and gives the following circuit diagram:
AD7928 Bipolar Circuit Diagram
All eight channels need to be sampled continually, and the DE1-SoC datasheet specifies setting the channel 1 register to 1, which activates the automatic update option on the ADC. Here's my first attempt at the code. It compiles and runs, but the values aren't correct; I can tell because the same signal that's being fed into the ADC is also being measured by my oscilloscope.
#include <inttypes.h>
#include <time.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>
#include <sys/mman.h>
/* FPGA HPS BRIDGE BASE */
#define LW_BRIDGE_BASE (0xFF200000)
#define HW_REGS_BASE (0xFF200000)
#define HW_REGS_SPAN (0x00200000)
#define HW_REGS_MASK ( HW_REGS_SPAN - 1 )
/* HPS-2-FPGA AXI Bridge */
#define ALT_AXI_FPGASLVS_OFST (0xC0000000) // axi_master
#define HW_FPGA_AXI_SPAN (0x40000000) // Bridge span 1GB
#define HW_FPGA_AXI_MASK ( HW_FPGA_AXI_SPAN - 1 )
/* ADC REGISTER SPAN */
#define ADC_BASE (0x00004000)
/* ADC CHANNEL & UPDATE REGISTERS */
#define ADC_CH0_UPDATE (LW_BRIDGE_BASE+ADC_BASE)
#define ADC_CH1_AUTO_UPDATE (LW_BRIDGE_BASE+ADC_BASE+4) // Write 1 for continual ADC request
#define ADC_CH2 (LW_BRIDGE_BASE+ADC_BASE+8)
#define ADC_CH3 (LW_BRIDGE_BASE+ADC_BASE+12)
#define ADC_CH4 (LW_BRIDGE_BASE+ADC_BASE+16)
#define ADC_CH5 (LW_BRIDGE_BASE+ADC_BASE+20)
#define ADC_CH6 (LW_BRIDGE_BASE+ADC_BASE+24)
#define ADC_CH7 (LW_BRIDGE_BASE+ADC_BASE+28)
/* ADC REGISTER END */
#define ADC_END (0x0000001F)
int main() {
// Defining variables
void *virtual_base;
int fd;
volatile int *h2p_lw_adc_addr;
int i;
//Defining pointer for register
if((fd = open( "/dev/mem",(O_RDWR | O_SYNC ))) == -1) {
printf("ERROR: could not open \"/dev/mem\"...\n");
return(1);
}
virtual_base = mmap(NULL,HW_REGS_SPAN,(PROT_READ | PROT_WRITE),MAP_SHARED,fd,HW_REGS_BASE);
if(virtual_base == MAP_FAILED) {
printf("ERROR: mmap() failed...\n");
close(fd);
return(1);
}
h2p_lw_adc_addr = virtual_base + ((int)(LW_BRIDGE_BASE + ADC_BASE)&(int)(HW_REGS_MASK));
float Vref = 5.0;
float stepSize = Vref/4096.0;
/* Heading & Calculating Step Size/Resolution */
printf("*____________________________________*\n");
printf("* Setting up the AD7928 ADC *\n");
printf("*____________________________________*\n");
printf("Resolution for 5V Vref: %f[mV]\n", stepSize*1000);
// Setting up the ADC for bipolar signal
// ...
// Auto-update all channels continuously
*(int *)(h2p_lw_adc_addr + 4) = 1;
// Sample a single channel
// ...
/* Data Collection Attempt #1 */
int num = 5; // Number of samples?
unsigned int samples[num];
int channel = 16; // channel 4
for (i = 0; i < num; i++){
samples[i] = *(int *)(h2p_lw_adc_addr + channel);
}
if(munmap(virtual_base, HW_REGS_SPAN) != 0) {
printf("ERROR: munmap() failed...\n");
close(fd);
return(1);
}
close(fd);
return 0;
}
It gets cross-compiled using this Makefile:
C_SRC := adc.c
CFLAGS := -g -O0 -Wall
LDFLAGS := -lm
CROSS_COMPILE := arm-linux-gnueabihf-
CC := $(CROSS_COMPILE)gcc
NM := $(CROSS_COMPILE)nm
ifeq ($(or $(COMSPEC),$(ComSpec)),)
RM := rm -rf
else
RM := cs-rm -rf
endif
ELF ?= adc
OBJ := $(patsubst %.c,%.o,$(C_SRC))
.c.o:
$(CC) $(CFLAGS) -c $< -o $@
.PHONY: all
all: $(ELF)
.PHONY:
clean:
$(RM) $(ELF) $(OBJ) $(OBJS) *.map *.objdump
$(ELF): $(OBJ) $(OBJS)
$(CC) $(CFLAGS) $(OBJ) $(OBJS) -o $@ $(LDFLAGS)
$(NM) $@ > $@.map
I'm a newbie when it comes to ADCs and DSP, but ideally I would like to be able to continually measure all eight channels, recording the pk-pk amplitude of the incoming sine wave on each one, which will eventually be used for post-processing.
As of right now, the output for the five samples is always 0, except when I sample channel 1, then all five samples are 1, like so:
Samples [0]: 1
Samples [1]: 1
Samples [2]: 1
Samples [3]: 1
Samples [4]: 1
Even when I increase the number of samples, it's always 1 for Channel 1 and 0 for all the other channels.
I think my problem is probably a combination of my code and maybe missing buffering circuitry. (I'm not handling the bipolar input only because I can set the DC offset on my signal generator, so the input is an all-positive 3V pk-pk.)
Vref on the ADC is being fed an even 5V DC. I'm pretty lost right now, so any help or pointers would be greatly appreciated.
I bet that your problem is in the following lines:
> volatile int *h2p_lw_adc_addr;
>
> *(int *)(h2p_lw_adc_addr + 4) = 1;
>
> samples[i] = *(int *)(h2p_lw_adc_addr + channel);
Because h2p_lw_adc_addr is a pointer to int, you will get wrong addresses from the latter two lines.
When you add a number N to an int pointer, the resulting pointer is N * sizeof(int) bytes beyond the original pointer.
Change the type of h2p_lw_adc_addr to a char pointer for a quick fix:
volatile char *h2p_lw_adc_addr;
Or alternatively, you can change the offsets:
*(int *)(h2p_lw_adc_addr + 1) = 1;
int channel = 4; // channel 4
But in that case I propose using int32_t or uint32_t instead of int:
volatile uint32_t *h2p_lw_adc_addr;

How to translate neon intrinsics to llvm-IR using llvm-clang on x86

Using clang, we can generate LLVM IR by compiling a C program:
clang -S -emit-llvm hello.c -o hello.ll
I would like to translate NEON intrinsics to LLVM IR, with code like this:
/* neon_example.c - Neon intrinsics example program */
#include <stdint.h>
#include <stdio.h>
#include <assert.h>
#include <arm_neon.h>
/* fill array with increasing integers beginning with 0 */
void fill_array(int16_t *array, int size)
{ int i;
for (i = 0; i < size; i++)
{
array[i] = i;
}
}
/* return the sum of all elements in an array. This works by calculating 4 totals (one for each lane) and adding those at the end to get the final total */
int sum_array(int16_t *array, int size)
{
/* initialize the accumulator vector to zero */
int16x4_t acc = vdup_n_s16(0);
int32x2_t acc1;
int64x1_t acc2;
/* this implementation assumes the size of the array is a multiple of 4 */
assert((size % 4) == 0);
/* counting backwards gives better code */
for (; size != 0; size -= 4)
{
int16x4_t vec;
/* load 4 values in parallel from the array */
vec = vld1_s16(array);
/* increment the array pointer to the next element */
array += 4;
/* add the vector to the accumulator vector */
acc = vadd_s16(acc, vec);
}
/* calculate the total */
acc1 = vpaddl_s16(acc);
acc2 = vpaddl_s32(acc1);
/* return the total as an integer */
return (int)vget_lane_s64(acc2, 0);
}
/* main function */
int main()
{
int16_t my_array[100];
fill_array(my_array, 100);
printf("Sum was %d\n", sum_array(my_array, 100));
return 0;
}
But it doesn't support the NEON intrinsics, and prints error messages like this:
/home/user/llvm-proj/build/bin/../lib/clang/4.0.0/include/arm_neon.h:65:24: error:
'neon_vector_type' attribute is not supported for this target
typedef __attribute__((neon_vector_type(8))) float16_t float16x8_t;
^
I think the reason is that my host is x86 while the target is ARM.
I have no idea how to cross-compile using Clang to translate to LLVM IR (the clang version is 4.0, on Ubuntu 14.04).
Are there any target options or other tools that would help?
And is there any difference between SSE and NEON LLVM IR?
Using ELLCC, a pre-packaged clang-based toolchain (http://ellcc.org), I was able to compile and run your program by adding -mfpu=neon:
rich@dev:~$ ~/ellcc/bin/ecc -target arm32v7-linux -mfpu=neon neon.c
rich@dev:~$ ./a.
a.exe a.out
rich@dev:~$ file a.out
a.out: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=613c22f6bbc277a8d577dab7bb27cd64443eb390, not stripped
rich@dev:~$ ./a.out
Sum was 4950
rich@dev:~$
It was compiled on an x86 and I ran it using QEMU.
Using normal clang, you'll also need the appropriate -target option for ARM. ELLCC uses slightly different -target options.

Issues with make and #include

Somewhere between my headers and my Makefile I'm not doing the dependencies correctly, and it's not compiling. This really only concerns the first few lines of each file, but I posted all the code for reference.
I'm trying to split a who clone up into 3 parts. Here is the original for reference. The exercise is to build it with utmp, so you also need utmplib.
So I've split it up into 3 files, the first one being show.h
#include <stdio.h>
#include <sys/types.h>
#include <utmp.h>
#include <fcntl.h>
#include <time.h>
#include <stdlib.h>
#define SHOWHOST
void show_info(struct utmp *);
void showtime(time_t);
then I have show.c
#include "show.h"
/*
 * show_info()
 * displays the contents of the utmp struct
 * in human readable form
 * displays nothing if record has no user name
 */
void show_info( struct utmp *utbufp )
{
if ( utbufp->ut_type != USER_PROCESS )
return;
printf("%-8.8s", utbufp->ut_name); /* the logname */
printf(" "); /* a space */
printf("%-8.8s", utbufp->ut_line); /* the tty */
printf(" "); /* a space */
showtime( utbufp->ut_time ); /* display time */
#ifdef SHOWHOST
if ( utbufp->ut_host[0] != '\0' )
printf(" (%s)", utbufp->ut_host); /* the host */
#endif
printf("\n"); /* newline */
}
void showtime( time_t timeval )
/*
 * displays time in a format fit for human consumption
 * uses ctime to build a string then picks parts out of it
 * Note: %12.12s prints a string 12 chars wide and LIMITS
 * it to 12 chars.
 */
{
char *ctime(); /* convert long to ascii */
char *cp; /* to hold address of time */
cp = ctime( &timeval ); /* convert time to string */
/* string looks like */
/* Mon Feb 4 00:46:40 EST 1991 */
/* 0123456789012345. */
printf("%12.12s", cp+4 ); /* pick 12 chars from pos 4 */
}
and finally, `who3.c'
/* who3.c - who with buffered reads
* - suppresses empty records
* - formats time nicely
* - buffers input (using utmplib)
*/
#include "show.h"
int main()
{
struct utmp *utbufp, /* holds pointer to next rec */
*utmp_next(); /* returns pointer to next */
if ( utmp_open( UTMP_FILE ) == -1 ){
perror(UTMP_FILE);
exit(1);
}
while ( ( utbufp = utmp_next() ) != ((struct utmp *) NULL) )
show_info( utbufp );
utmp_close( );
return 0;
}
So I created my Makefile:
who3:who3.o utmplib.o
gcc -o who who3.o utmplib.o
who3.o:who3.c show.c
gcc -c who3.c show.o
show.o:show.c
gcc -c show.c show.h
utmplib.o:utmplib.c
gcc -c utmplib.c
clean:
rm -f *.o
Unfortunately there's an error when I do make:
gcc -o who who3.o utmplib.o
who3.o: In function `main':
who3.c:(.text+0x38): undefined reference to `show_info'
collect2: error: ld returned 1 exit status
make: *** [who3] Error 1
As I said earlier, I haven't done my dependencies correctly, and I'm not sure what I did wrong. How do I do my dependencies correctly?
It looks like you are missing show.o from the dependencies and from the list of object files of the command for building who3 in your makefile.
Also, the command for who3.o looks wrong. You are only compiling (-c), but you are passing an object file (show.o) as input. You should remove show.o from that rule, and show.c doesn't belong in the dependency list of who3.o either.
Also, the command for show.o looks wrong. You shouldn't be passing header files (show.h) to the compiler; they only need to be referenced as #include in the source files.
Also, you are inconsistent about what your default target is actually called. You say it is who3 in the rule (who3: ...), but the command actually builds a file called who (gcc -o who ...).

Modifying Linker Script to make the .text section writable, errors

I am trying to make the .text section writable for a C program. I looked through the options provided in this SO question and zeroed in on modifying the linker script to achieve this.
For this I created a writable memory region using
MEMORY { rwx (wx) : ORIGIN = 0x400000, LENGTH = 256K}
and at the section .text added:
.text :
{
*(.text.unlikely .text.*_unlikely)
*(.text.exit .text.exit.*)
*(.text.startup .text.startup.*)
*(.text.hot .text.hot.*)
*(.text .stub .text.* .gnu.linkonce.t.*)
/* .gnu.warning sections are handled specially by elf32.em. */
*(.gnu.warning)
} >rwx
On compiling the code with gcc flag -T and giving my linker file as an argument I am getting an error:
error: no memory region specified for loadable section '.interp'
I am only trying to change the memory permissions for the .text region. Working on Ubuntu x86_64 architecture.
Is there a better way to do this?
Any help is highly appreciated.
Thanks
The Linker Script
Linker Script on pastie.org
In Linux, you can use mprotect() to enable/disable text section write protection from the runtime code; see the Notes section in man 2 mprotect.
Here is a real-world example. First, however, a caveat:
I consider this just a proof of concept implementation, and not something I'd ever use in a real world application. It may look enticing for use in a high-performance library of some sort, but in my experience, changing the API (or the paradigm/approach) of the library usually yields much better results -- and fewer hard-to-debug bugs.
Consider the following six files:
foo1.c:
int foo1(const int a, const int b) { return a*a - 2*a*b + b*b; }
foo2.c:
int foo2(const int a, const int b) { return a*a + b*b; }
foo.h.header:
#ifndef FOO_H
#define FOO_H
extern int foo1(const int a, const int b);
extern int foo2(const int a, const int b);
foo.h.footer:
#endif /* FOO_H */
main.c:
#include <unistd.h>
#include <sys/mman.h>
#include <errno.h>
#include <string.h>
#include <stdio.h>
#include "foo.h"
int text_copy(const void *const target,
const void *const source,
const size_t length)
{
const long page = sysconf(_SC_PAGESIZE);
void *start = (char *)target - ((long)target % page);
size_t bytes = length + (size_t)((long)target % page);
/* Verify sane page size. */
if (page < 1L)
return errno = ENOTSUP;
/* Although length should not need to be a multiple of page size,
* adjust it up if need be. */
if (bytes % (size_t)page)
bytes = bytes + (size_t)page - (bytes % (size_t)page);
/* Disable write protect on target pages. */
if (mprotect(start, bytes, PROT_READ | PROT_WRITE | PROT_EXEC))
return errno;
/* Copy code.
* Note: if the target code is being executed, we're in trouble;
* this offers no atomicity guarantees, so other threads may
* end up executing some combination of old/new code.
*/
memcpy((void *)target, (const void *)source, length);
/* Re-enable write protect on target pages. */
if (mprotect(start, bytes, PROT_READ | PROT_EXEC))
return errno;
/* Success. */
return 0;
}
int main(void)
{
printf("foo1(): %d bytes at %p\n", foo1_SIZE, foo1_ADDR);
printf("foo2(): %d bytes at %p\n", foo2_SIZE, foo2_ADDR);
printf("foo1(3, 5): %d\n", foo1(3, 5));
printf("foo2(3, 5): %d\n", foo2(3, 5));
if (foo2_SIZE < foo1_SIZE) {
printf("Replacing foo1() with foo2(): ");
if (text_copy(foo1_ADDR, foo2_ADDR, foo2_SIZE)) {
printf("%s.\n", strerror(errno));
return 1;
}
printf("Done.\n");
} else {
printf("Replacing foo2() with foo1(): ");
if (text_copy(foo2_ADDR, foo1_ADDR, foo1_SIZE)) {
printf("%s.\n", strerror(errno));
return 1;
}
printf("Done.\n");
}
printf("foo1(3, 5): %d\n", foo1(3, 5));
printf("foo2(3, 5): %d\n", foo2(3, 5));
return 0;
}
function-info.bash:
#!/bin/bash
addr_prefix=""
addr_suffix="_ADDR"
size_prefix=""
size_suffix="_SIZE"
export LANG=C
export LC_ALL=C
nm -S "$@" | while read addr size kind name dummy ; do
[ -n "$addr" ] || continue
[ -n "$size" ] || continue
[ -z "$dummy" ] || continue
[ "$kind" = "T" ] || continue
[ "$name" != "${name#[A-Za-z]}" ] || continue
printf '#define %s ((void *)0x%sL)\n' "$addr_prefix$name$addr_suffix" "$addr"
printf '#define %s %d\n' "$size_prefix$name$size_suffix" "0x$size"
done || exit $?
Remember to make it executable using chmod u+x ./function-info.bash
First, compile the sources using valid sizes but invalid addresses:
gcc -W -Wall -O3 -c foo1.c
gcc -W -Wall -O3 -c foo2.c
( cat foo.h.header ; ./function-info.bash foo1.o foo2.o ; cat foo.h.footer) > foo.h
gcc -W -Wall -O3 -c main.c
The sizes are correct but the addresses are not, because the code is yet to be linked. Relative to the final binary, the object file contents are usually relocated at link time. So, link the sources to get example executable, example:
gcc -W -Wall -O3 main.o foo1.o foo2.o -o example
Extract the correct (sizes and) addresses:
( cat foo.h.header ; ./function-info.bash example ; cat foo.h.footer) > foo.h
Recompile and link,
gcc -W -Wall -O3 -c main.c
gcc -W -Wall -O3 foo1.o foo2.o main.o -o example
and verify that the constants now do match:
mv -f foo.h foo.h.used
( cat foo.h.header ; ./function-info.bash example ; cat foo.h.footer) > foo.h
cmp -s foo.h foo.h.used && echo "Done." || echo "Recompile and relink."
Due to high optimization (-O3) the code that utilizes the constants may change size, requiring a yet another recompile-relink. If the last line outputs "Recompile and relink", just repeat the last two steps, i.e. five lines.
(Note that since foo1.c and foo2.c do not use the constants in foo.h, they obviously do not need to be recompiled.)
On x86_64 (GCC-4.6.3-1ubuntu5), running ./example outputs
foo1(): 21 bytes at 0x400820
foo2(): 10 bytes at 0x400840
foo1(3, 5): 4
foo2(3, 5): 34
Replacing foo1() with foo2(): Done.
foo1(3, 5): 34
foo2(3, 5): 34
which shows that the foo1() function indeed was replaced. Note that the longer function is always replaced with the shorter one, because we must not overwrite any code outside the two functions.
You can modify the two functions to verify this; just remember to repeat the entire procedure (so that you use the correct _SIZE and _ADDR constants in main()).
Just for giggles, here is the generated foo.h for the above:
#ifndef FOO_H
#define FOO_H
extern int foo1(const int a, const int b);
extern int foo2(const int a, const int b);
#define foo1_ADDR ((void *)0x0000000000400820L)
#define foo1_SIZE 21
#define foo2_ADDR ((void *)0x0000000000400840L)
#define foo2_SIZE 10
#define main_ADDR ((void *)0x0000000000400610L)
#define main_SIZE 291
#define text_copy_ADDR ((void *)0x0000000000400850L)
#define text_copy_SIZE 226
#endif /* FOO_H */
You might wish to use a smarter scriptlet, say an awk one that uses nm -S to obtain all function names, addresses, and sizes, and in the header file replaces only the values of existing definitions, to generate your header file. I'd use a Makefile and some helper scripts.
Further notes:
The function code is copied as-is, no relocation etc. is done. (This means that if the machine code of the replacement function contains absolute jumps, the execution continues in the original code. These example functions were chosen, because they're unlikely to have absolute jumps in them. Run objdump -d foo1.o foo2.o to verify from the assembly.)
That is irrelevant if you use the example just to investigate how to modify executable code within the running process. However, if you build runtime-function-replacing schemes on top of this example, you may need to use position independent code for the replaced code (see the GCC manual for relevant options for your architecture) or do your own relocation.
If another thread or signal handler executes the code being modified, you're in serious trouble. You get undefined results. Unfortunately, some libraries start extra threads, which may not block all possible signals, so be extra careful when modifying code that might be run by a signal handler.
Do not assume the compiler compiles the code in a specific way or uses a specific organization. My example uses separate compilation units, to avoid the cases where the compiler might share code between similar functions.
Also, it examines the final executable binary directly, to obtain the sizes and addresses to be modified to modify an entire function implementation. All verifications should be done on the object files or final executable, and disassembly, instead of just looking at the C code.
Putting any code that relies on the address and size constants into a separate compilation unit makes it easier and faster to recompile and relink the binary. (You only need to recompile the code that uses the constants directly, and you can even use less optimization for that code, to eliminate extra recompile-relink cycles, without impacting the overall code quality.)
In my main.c, both the address and length supplied to mprotect() are page-aligned (based on the user parameters). The documents say only the address has to be. Since protections are page-granular, making sure the length is a multiple of the page size does not hurt.
You can read and parse /proc/self/maps (which is a kernel-generated pseudofile; see man 5 proc, /proc/[pid]/maps section, for further info) to obtain the existing mappings and their protections for the current process.
In any case, if you have any questions, I'd be happy to try and clarify the above.
Addendum:
It turns out that using the GNU extension dl_iterate_phdr() you can enable/disable write protection on all text sections trivially:
#define _GNU_SOURCE
#include <unistd.h>
#include <dlfcn.h>
#include <sys/mman.h>
#include <link.h>
#include <errno.h>
static int do_write_protect_text(struct dl_phdr_info *info, size_t size, void *data)
{
const int protect = (data) ? PROT_READ | PROT_EXEC : PROT_READ | PROT_WRITE | PROT_EXEC;
size_t page;
size_t i;
page = sysconf(_SC_PAGESIZE);
if (size < sizeof (struct dl_phdr_info))
return ENOTSUP;
/* Ignore libraries. */
if (info->dlpi_name && info->dlpi_name[0] != '\0')
return 0;
/* Loop over each header. */
for (i = 0; i < (size_t)info->dlpi_phnum; i++)
if ((info->dlpi_phdr[i].p_flags & PF_X)) {
size_t ptr = (size_t)info->dlpi_phdr[i].p_vaddr;
size_t len = (size_t)info->dlpi_phdr[i].p_memsz;
/* Start at the beginning of the relevant page, */
if (ptr % page) {
len += ptr % page;
ptr -= ptr % page;
}
/* and use full pages. */
if (len % page)
len += page - (len % page);
/* Change protections. Ignore unmapped sections. */
if (mprotect((void *)ptr, len, protect))
if (errno != ENOMEM)
return errno;
}
return 0;
}
int write_protect_text(int protect)
{
int result;
result = dl_iterate_phdr(do_write_protect_text, (void *)(long)protect);
if (result)
errno = result;
return result;
}
Here is an example program you can use to test the above write_protect_text() function:
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
int dump_smaps(void)
{
FILE *in;
char *line = NULL;
size_t size = 0;
in = fopen("/proc/self/smaps", "r");
if (!in)
return errno;
while (getline(&line, &size, in) > (ssize_t)0)
if ((line[0] >= '0' && line[0] <= '9') ||
(line[0] >= 'a' && line[0] <= 'f'))
fputs(line, stdout);
free(line);
if (!feof(in) || ferror(in)) {
fclose(in);
return errno = EIO;
}
if (fclose(in))
return errno = EIO;
return 0;
}
int main(void)
{
printf("Initial mappings:\n");
dump_smaps();
if (write_protect_text(0)) {
fprintf(stderr, "Cannot disable write protection on text sections: %s.\n", strerror(errno));
return EXIT_FAILURE;
}
printf("\nMappings with write protect disabled:\n");
dump_smaps();
if (write_protect_text(1)) {
fprintf(stderr, "Cannot enable write protection on text sections: %s.\n", strerror(errno));
return EXIT_FAILURE;
}
printf("\nMappings with write protect enabled:\n");
dump_smaps();
return EXIT_SUCCESS;
}
The example program dumps /proc/self/smaps before and after changing the text section write protection, showing that it indeed does enable/disable write protection on all text sections (program code). It does not try to alter write protection on dynamically loaded libraries. This was tested to work on x86-64 using an Ubuntu 3.8.0-35-generic kernel.
If you just want to have one executable with a writable .text, you can just link with -N.
At least for me (binutils 2.22), ld -N objectfile.o
will produce a binary that I can happily write around in.
Reading the gcc pages, you can pass the linker option through gcc with: gcc -Wl,-N source

Where is the implementation of strlen() in GCC?

Can anyone point me to the definition of strlen() in GCC? I've been grepping release 4.4.2 for about half an hour now (while Googling like crazy) and I can't seem to find where strlen() is actually implemented.
You should be looking in glibc, not GCC -- it seems to be defined in strlen.c -- here's a link to strlen.c for glibc version 2.7... And here is a link to the glibc SVN repository online for strlen.c.
The reason you should be looking at glibc and not gcc is:
The GNU C library is used as the C library in the GNU system and most systems with the Linux kernel.
Here's the BSD implementation:
size_t
strlen(const char *str)
{
const char *s;
for (s = str; *s; ++s)
;
return (s - str);
}
I realize this question is 4 years old, but gcc will often include its own copy of strlen if you do not #include <string.h>, and none of the answers (including the accepted one) account for that. If you forget, you will get a warning:
file_name:line_number: warning: incompatible implicit declaration of built-in function 'strlen'
and gcc will inline its copy, which on x86 is the repnz scasb asm variant, unless you pass -Werror or -fno-builtin. The files related to this are in gcc/config/<platform>/<platform>.{c,md}.
It is also controlled by gcc/builtins.c. In case you wondered if and how a strlen() was optimized to a constant, see the function defined as tree c_strlen(tree src, int only_value) in this file. It also controls how strlen (amongst others) is expanded and folded (based on the previously mentioned config/platform)
It is defined in glibc/string/strlen.c:
#include <string.h>
#include <stdlib.h>
#undef strlen
#ifndef STRLEN
# define STRLEN strlen
#endif
/* Return the length of the null-terminated string STR. Scan for
the null terminator quickly by testing four bytes at a time. */
size_t
STRLEN (const char *str)
{
const char *char_ptr;
const unsigned long int *longword_ptr;
unsigned long int longword, himagic, lomagic;
/* Handle the first few characters by reading one character at a time.
Do this until CHAR_PTR is aligned on a longword boundary. */
for (char_ptr = str; ((unsigned long int) char_ptr
& (sizeof (longword) - 1)) != 0;
++char_ptr)
if (*char_ptr == '\0')
return char_ptr - str;
/* All these elucidatory comments refer to 4-byte longwords,
but the theory applies equally well to 8-byte longwords. */
longword_ptr = (unsigned long int *) char_ptr;
/* Bits 31, 24, 16, and 8 of this number are zero. Call these bits
the "holes." Note that there is a hole just to the left of
each byte, with an extra at the end:
bits: 01111110 11111110 11111110 11111111
bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD
The 1-bits make sure that carries propagate to the next 0-bit.
The 0-bits provide holes for carries to fall into. */
himagic = 0x80808080L;
lomagic = 0x01010101L;
if (sizeof (longword) > 4)
{
/* 64-bit version of the magic. */
/* Do the shift in two steps to avoid a warning if long has 32 bits. */
himagic = ((himagic << 16) << 16) | himagic;
lomagic = ((lomagic << 16) << 16) | lomagic;
}
if (sizeof (longword) > 8)
abort ();
/* Instead of the traditional loop which tests each character,
we will test a longword at a time. The tricky part is testing
if *any of the four* bytes in the longword in question are zero. */
for (;;)
{
longword = *longword_ptr++;
if (((longword - lomagic) & ~longword & himagic) != 0)
{
/* Which of the bytes was the zero? If none of them were, it was
a misfire; continue the search. */
const char *cp = (const char *) (longword_ptr - 1);
if (cp[0] == 0)
return cp - str;
if (cp[1] == 0)
return cp - str + 1;
if (cp[2] == 0)
return cp - str + 2;
if (cp[3] == 0)
return cp - str + 3;
if (sizeof (longword) > 4)
{
if (cp[4] == 0)
return cp - str + 4;
if (cp[5] == 0)
return cp - str + 5;
if (cp[6] == 0)
return cp - str + 6;
if (cp[7] == 0)
return cp - str + 7;
}
}
}
}
libc_hidden_builtin_def (strlen)
glibc 2.26 has several hand-optimized assembly implementations of strlen
As of glibc-2.26, a quick:
git ls-files | grep strlen.S
in the glibc tree shows a dozen hand-optimized assembly implementations for all major archs and variants.
In particular, x86_64 alone has 3 variations:
sysdeps/x86_64/multiarch/strlen-avx2.S
sysdeps/x86_64/multiarch/strlen-sse2.S
sysdeps/x86_64/strlen.S
A quick and dirty way to determine which one is used, is to step debug a test program:
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

int main(void) {
    size_t size = 0x80000000, i, result;
    char *s = malloc(size);
    for (i = 0; i < size; ++i)
        s[i] = 'a';
    s[size - 1] = '\0';
    result = strlen(s);
    assert(result == size - 1);
    return EXIT_SUCCESS;
}
compiled with:
gcc -ggdb3 -std=c99 -O0 a.c
Off the bat:
disass main
contains:
callq 0x555555554590 <strlen@plt>
so the libc version is being called.
After a few si instruction level steps into that, GDB reaches:
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:52
52 ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
which tells me that strlen-avx2.S was used.
Then, I further confirm with:
disass __strlen_avx2
and compare the disassembly with the glibc source.
It is not surprising that the AVX2 version was used: I have an i7-7820HQ CPU (launched Q1 2017) with AVX2 support, and AVX2 (Q2 2013) is the most advanced of the assembly implementations, while SSE2 is much more ancient, dating from 2004.
This is where a great part of the hardcoreness of glibc comes from: it has a lot of arch optimized hand written assembly code.
Tested in Ubuntu 17.10, gcc 7.2.0, glibc 2.26.
-O3
TODO: with -O3, gcc does not use glibc's strlen, it just generates inline assembly, which is mentioned at: https://stackoverflow.com/a/19885891/895245
Is it because it can optimize even better? But its output does not contain AVX2 instructions, so I feel that this is not the case.
https://www.gnu.org/software/gcc/projects/optimize.html mentions:
Deficiencies of GCC's optimizer
glibc has inline assembler versions of various string functions; GCC has some, but not necessarily the same ones on the same architectures. Additional optab entries, like the ones for ffs and strlen, could be provided for several more functions including memset, strchr, strcpy and strrchr.
My simple tests show that the -O3 version is actually faster, so GCC made the right choice.
Asked at: https://www.quora.com/unanswered/How-does-GCC-know-that-its-builtin-implementation-of-strlen-is-faster-than-glibcs-when-using-optimization-level-O3
Although the original poster may not have known or been looking for this, gcc internally inlines a number of so-called "builtin" C functions that it defines on its own, including some of the mem*() functions and (depending on the gcc version) strlen. In such cases, the library version is essentially never used, and pointing the person at the version in glibc is not, strictly speaking, correct. It does this for performance reasons: in addition to the improvement that inlining itself produces, gcc "knows" certain things about these functions when it provides them, such as that strlen is a pure function (so it can optimize away repeated calls), or, in the case of the mem*() functions, that no aliasing is taking place.
For more information on this, see http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
Is this what you are looking for? strlen() source. See the git repository for more information. The glibc resources page has links to the git repositories if you want to grab them rather than looking at the web view.
Google Code Search is a good starting point for questions like that. They usually point to various different sources and implementations of a function.
In your particular case: GoogleCodeSearch(strlen)
Google Code Search was completely shut down in March 2013.
I realize that this is an old question. You can find the Linux kernel sources at GitHub, and the 32-bit implementation of strlen() (for the Tile architecture, which supplies the __insn_* intrinsics used below) can be found in strlen_32.c on GitHub. That file has this implementation:
#include <linux/types.h>
#include <linux/string.h>
#include <linux/module.h>

size_t strlen(const char *s)
{
        /* Get an aligned pointer. */
        const uintptr_t s_int = (uintptr_t) s;
        const uint32_t *p = (const uint32_t *)(s_int & -4);

        /* Read the first word, but force bytes before the string to be nonzero.
         * This expression works because we know shift counts are taken mod 32.
         */
        uint32_t v = *p | ((1 << (s_int << 3)) - 1);

        uint32_t bits;
        while ((bits = __insn_seqb(v, 0)) == 0)
                v = *++p;

        return ((const char *)p) + (__insn_ctz(bits) >> 3) - s;
}
EXPORT_SYMBOL(strlen);
You can use this code; the simpler, the better!
size_t Strlen ( const char * _str )
{
    size_t i = 0;
    while (_str[i])
        ++i;
    return i;
}
