Convert C language to IA32 assembly language

I wrote some C code that converts an integer to a string, inserting a comma every 3 digits. Can anyone give me a hint on how to convert it to IA32 assembly language?
I just want to convert it into assembly language, plainly and simply. I can't use any other library calls!
#include <stdio.h>

char *my_itoa(int n, char *buf)
{
    int i, j, k = 0, l = 0;
    char tmp[32] = {0};

    i = n;
    do {
        j = i % 10;                     /* next (least significant) digit */
        i = i / 10;
        sprintf(tmp + k, "%d", j);
        k++;
        l++;
        if (i != 0 && l % 3 == 0) {     /* comma after every 3rd digit */
            sprintf(tmp + k, ",");
            k++;
            l = 0;
        }
    } while (i);

    /* digits were produced in reverse order; copy them back reversed */
    for (k--, i = 0; i <= k; i++) {
        buf[i] = tmp[k - i];
    }
    return buf;
}
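A minimal sketch of the same idea without the sprintf calls: every step is plain integer arithmetic and byte stores, which map almost one-to-one onto IA32 instructions (idivl for the divide/remainder, movb for the stores). Like the version above it only handles non-negative n; unlike it, it also null-terminates buf.
char *my_itoa_nolib(int n, char *buf)
{
    char tmp[32];
    int i, k = 0, digits = 0;

    do {
        tmp[k++] = (char)('0' + n % 10);   /* least significant digit first */
        n /= 10;
        digits++;
        if (n != 0 && digits % 3 == 0)     /* comma after every 3rd digit */
            tmp[k++] = ',';
    } while (n != 0);

    for (i = 0; i < k; i++)                /* reverse into the caller's buffer */
        buf[i] = tmp[k - 1 - i];
    buf[k] = '\0';
    return buf;
}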

If your compiler is gcc, this answer suggests a nice way to produce a combined C/assembly listing that is easier to read than plain assembly. It uses c++, but just replace c++ with gcc:
# create assembler code:
gcc -S -fverbose-asm -g original.c -o assembly.s
# create asm interlaced with source lines:
as -alhnd assembly.s > listing.lst

C compilers conforming to SUSv2 (the Single Unix Specification version 2, published in 1997) can do what you want much more simply, using the ' (apostrophe) conversion flag, which groups digits according to the current locale. Here's example C code that works with gcc:
#include <stdio.h>
#include <locale.h>

int main()
{
    int num = 12345678;
    setlocale(LC_NUMERIC, "en_US");
    printf("%'d\n", num);
    return 0;
}
This prints 12,345,678. Note, too, that this works correctly with negative numbers while your code does not. Because the code is simply a single library call, it's probably not worth implementing in assembly language, but if you still cared to, you could use this:
In the .rodata section include this string constant:
.fmt:
.string "%'d\n"
The code is extremely simple because all the hard work is done within printf. (Note that this snippet follows the x86-64 System V calling convention, where the first two arguments travel in %rdi/%rsi and %al carries the number of vector-register arguments; on IA32 you would push the arguments onto the stack before the call instead.)
# number to be converted is in %esi
movl $.fmt, %edi
movl $0, %eax
call printf

Related

Undefined behaviour after multiple calls to a print function

I'm writing a basic compiler and the code generated does not work as intended.
I'm using a naive graph coloring algorithm to allocate variables in registers based on their liveness.
The problem is that the generated assembly code seems perfectly fine, but, at some point, it produces undefined behaviour.
If, instead of using registers to store variables, I just use the stack, everything works fine.
I also discovered that I can't use the %edx register around an imull instruction and I wondered if something similar is happening right now with %ebx and %ecx.
I compile the code using gcc -m32 "test.s" runtime.c -o test, where runtime.c is a helper C file containing the print and input functions.
I've also tried removing parts of the program (every print except the last one), and then the last print works.
If I call even a single print function before the last call, it doesn't work.
The runtime.c file:
#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

int input() {
    int num;
    char term;
    scanf("%d%c", &num, &term);
    return num;
}

void print_int_nl(int i) {
    printf("%d\n", i);
}
Source file:
a = 10
b = input()
c = - 10
d = -input()
print a
print b
print c
print d
Generated assembly code:
https://pastebin.com/ChSRbWgt
After compiling the .s file and running it in the console (./test), it asks for 2 inputs (as intended).
I give it 1 and 2.
Then the output is:
10
1
-10
1415880
instead of
10
1
-10
-2
You need to observe the calling convention (see e.g. Calling conventions for different C++ compilers and operating systems by Agner Fog).
Namely, there are caller-save and callee-save registers in the convention for your C compiler.
Your generated code needs to preserve the callee-save registers in order to be able to return to its C caller.
Similarly, printf() will preserve the callee-save registers, but it can trash the caller-save registers, meaning that if your generated code calls printf(), it will need to preserve the caller-save registers across the calls to printf() or any other C function.
You need to clear the input buffer: after your first input, the buffer still contains the newline character. I'd recommend this:
int input() {
    int num;
    char term;
    scanf("%d %c", &num, &term);
    return num;
}
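An alternative sketch that discards the whole rest of the line, which also copes with stray characters left in the buffer:
#include <stdio.h>

/* Sketch: read an int, then drop the rest of the line so the next
 * scanf/getchar starts on a fresh line. */
int input(void)
{
    int num = 0;
    int c;
    if (scanf("%d", &num) != 1)
        num = 0;                        /* crude fallback for bad input */
    while ((c = getchar()) != '\n' && c != EOF)
        ;                               /* discard the remainder of the line */
    return num;
}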

I'm implementing printf function in C/Linux

Program:
#ifndef PRINTF_H
#define PRINTF_H
#include "my_put_char.h"
int my_printf(char *str, ...);
#endif
This is my Header file for my function.
#include <stdio.h>
#include "my_put_char.h"

void my_put_char(char c)
{
    fwrite(&c, sizeof(char), 1, stdout);
}
This is my putchar implementation (my_put_char.c).
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include "printf.h"

int my_printf(char *str, ...)
{
    if (str == NULL)
        return 0;

    int i;
    char a;
    va_list print;

    va_start(print, str);
    for (i = 0; str[i]; i++)
    {
        if (str[i] == '%')
        {
            i++;
            switch (str[i])
            {
                case 'c':
                    a = va_arg(print, char);
                    my_put_char(a);
                    break;
            }
        }
    }
    va_end(print);
    return 0;
}
Finally, this is part of my printf implementation.
I'm testing with %c to display a character.
When I do my_printf("%c", 'd'); from main.c, it compiles and displays d.
But when I do my_printf("%c", "hi");, it still compiles and displays a number.
Question:
After (or before) writing a = va_arg(print, char);, is there a way to check whether my input is a different data type?
I'm trying to display an error if my input is a different data type.
I'm on this subject for 2 days and couldn't find any answer.
Thank you so much for your time!
when I do my_printf("%c", "hi");, it still compiles and displays a number
You've got some undefined behavior, so be scared. Your my_printf would call va_arg with an argument of the bad type (expected char promoted to int, got char*).
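As an aside: even the working call passes 'd' as an int, because of the default argument promotions, so the portable pattern is to fetch it with va_arg(print, int) and narrow it back down. A minimal self-contained sketch (the names here are invented):
#include <stdarg.h>
#include <stdio.h>

/* Sketch only: fetch a %c argument portably. A char argument is promoted
 * to int in a variadic call, so va_arg must ask for int, not char. */
static void put_one_char(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    char c = (char) va_arg(ap, int);   /* not va_arg(ap, char) */
    va_end(ap);
    fwrite(&c, sizeof(char), 1, stdout);
}

int main(void)
{
    put_one_char("%c", 'd');           /* prints: d */
    return 0;
}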
To explain what is happening you should dive into implementation details (look into the assembler code, e.g. with gcc -Wall -fverbose-asm -O -S; study your processor, its instruction set architecture, its application binary interface and calling conventions). You don't want to do that, it could take years and is not reproducible.
Absolutely read Lattner's blog on UB, right now!
Then download the C11 specification n1570...
You could also, with gcc, use some function attributes. Don't forget to compile with all warnings and debug info (gcc -Wall -Wextra -g).
after writing a = va_arg(print, char); Is there a way to check whether my input is a different data type?
No, not really, and not always. But the format function attribute could help. And you could also spend months customizing GCC with your own plugin or some GCC MELT extension (that is not worth your time). Be aware of the Halting Problem and Rice's Theorem (each makes static source-code analysis challenging in general). Look also into source-analyzing tools like Frama-C.
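For example, a sketch of the format attribute applied to the prototype from the header above; with this declaration visible, gcc -Wall type-checks the variadic arguments at every call site:
/* printf.h -- same prototype, now telling gcc to check calls like printf's */
int my_printf(char *str, ...) __attribute__((format(printf, 1, 2)));

/* With that declaration visible, gcc -Wall flags the bad call
 *   my_printf("%c", "hi");
 * with a -Wformat warning along the lines of
 *   format '%c' expects argument of type 'int', but argument 2 has type 'char *'
 * (exact wording varies by gcc version). */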
I'm implementing printf function
BTW studying the source code of existing free software implementations of the C standard library (such as GNU glibc and musl-libc) could be inspirational; they are based upon syscalls(2).

C compiler optimize loop by running it

Can a C compiler ever optimize a loop by running it?
For example:
int num[] = {1, 2, 3, 4, 5}, i;

for (i = 0; i < sizeof(num)/sizeof(num[0]); i++) {
    if (num[i] > 6) {
        printf("Error in data\n");
        exit(1);
    }
}
Instead of running this each time the program is executed, can the compiler simply run this and optimize it away?
Let's have a look… (This really is the only way to tell.)
First, I've converted your snippet into something we can actually compile and run, and saved it in a file named main.c.
#include <stdio.h>

static int
f()
{
    const int num[] = {1, 2, 3, 4, 5};
    int i;
    for (i = 0; i < sizeof(num) / sizeof(num[0]); i++)
    {
        if (num[i] > 6)
        {
            printf("Error in data\n");
            return 1;
        }
    }
    return 0;
}

int
main()
{
    return f();
}
Running gcc -S -O3 main.c produces the following assembly file (in main.s).
.file "main.c"
.section .text.unlikely,"ax",#progbits
.LCOLDB0:
.section .text.startup,"ax",#progbits
.LHOTB0:
.p2align 4,,15
.globl main
.type main, #function
main:
.LFB22:
.cfi_startproc
xorl %eax, %eax
ret
.cfi_endproc
.LFE22:
.size main, .-main
.section .text.unlikely
.LCOLDE0:
.section .text.startup
.LHOTE0:
.ident "GCC: (GNU) 5.1.0"
.section .note.GNU-stack,"",#progbits
Even if you don't know assembly, you'll notice that the string "Error in data\n" is not present in the file so, apparently, some kind of optimization must have taken place.
If we look closer at the machine instructions generated for the main function,
xorl %eax, %eax
ret
we can see that all it does is XOR the EAX register with itself (which always yields zero) and then return. The EAX register holds the return value, so main simply returns 0: the f function, loop and all, was completely optimized away.
Yes. The C compiler unrolls loops automatically with options -O3 and -Otime.
You didn't specify the compiler, but with gcc at -O3, and perhaps with the size calculation hoisted out of the for loop, it may well do a little adjustment like that.
Compilers can do even better than that. Not only can compilers examine the effect of running code "forward", but the Standard even allows them to work code logic in reverse in situations involving potential Undefined Behavior. For example, given:
#include <stdio.h>

int main(void)
{
    int ch = getchar();
    int q;
    if (ch == 'Z')
        q = 5;
    printf("You typed %c and the magic value is %d", ch, q);
    return 0;
}
a compiler would be entitled to assume that the program will never receive any input which would cause the printf to be reached without q having received a value; since the only input character which would cause q to receive a value would be 'Z', a compiler could thus legitimately replace the code with:
int main(void)
{
    getchar();
    printf("You typed Z and the magic value is 5");
}
If the user types Z, the behavior of the original program will be well-defined, and the behavior of the latter will match it. If the user types anything else, the original program will invoke Undefined Behavior and, as a consequence, the Standard will impose no requirements on what the compiler may do. A compiler will be entitled to do anything it likes, including producing the same result as would be produced by typing Z.

Compile code in C to MIPS assembly

I wrote a C program and I need to see it in MIPS assembly code.
How do I install or use software that takes a *.c file and produces a *.txt or *.something_else so I can see its MIPS assembly code?
My OS is Linux.
Thanks a lot!
BTW my code is:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 128

int main ()
{
    char mychar, string[SIZE];
    int i;
    int count = 0;

    printf("Please enter your string: \n\n");
    fgets(string, SIZE, stdin);

    printf("Please enter char to find: ");
    mychar = getchar();

    for (i = 0; string[i] != '\0'; i++)
        if (string[i] == mychar)
            count++;

    printf("The char %c appears %d times\n", mychar, count);
    return 0;
}
What you want is a MIPS cross-compiler, unless you're running Linux on a MIPS box, in which case you want a regular MIPS compiler. Here is a howto for setting up a cross compiler on Linux.
Once you've compiled it, you'll want to see the MIPS disassembly. objdump can help you there.
You need to install a MIPS cross-compiler, and then pass -S (along with whichever -march=mips* option you need) to that gcc to get the assembly output.
You can use Dan Kegel's excellent and easy-to-use crosstool to build your own MIPS cross compiler.
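For example (the package and binary names below are assumptions matching the Debian/Ubuntu mips-linux-gnu toolchain; adjust them for your distribution), once a cross compiler is installed, producing the assembly is the same -S invocation shown for the IA32 question above:
# hypothetical example: install a MIPS cross toolchain, then emit assembly
sudo apt-get install gcc-mips-linux-gnu
mips-linux-gnu-gcc -S -O2 -fverbose-asm yourprogram.c -o yourprogram.s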

gcc, strict-aliasing, and horror stories [closed]

In gcc-strict-aliasing-and-casting-through-a-union I asked whether anyone had encountered problems with union punning through pointers. So far, the answer seems to be No.
This question is broader: Do you have any horror stories about gcc and strict-aliasing?
Background: Quoting from AndreyT's answer in c99-strict-aliasing-rules-in-c-gcc:
"Strict aliasing rules are rooted in parts of the standard that were present in C and C++ since the beginning of [standardized] times. The clause that prohibits accessing object of one type through a lvalue of another type is present in C89/90 (6.3) as well as in C++98 (3.10/15). ... It is just that not all compilers wanted (or dared) to enforce it or rely on it."
Well, gcc is now daring to do so, with its -fstrict-aliasing switch. And this has caused some problems. See, for example, the excellent article http://davmac.wordpress.com/2009/10/ about a Mysql bug, and the equally excellent discussion in http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html.
Some other less-relevant links:
performance-impact-of-fno-strict-aliasing
strict-aliasing
when-is-char-safe-for-strict-pointer-aliasing
how-to-detect-strict-aliasing-at-compile-time
So to repeat, do you have a horror story of your own? Problems not indicated by -Wstrict-aliasing would, of course, be preferred. And other C compilers are also welcome.
Added June 2nd: The first link in Michael Burr's answer, which does indeed qualify as a horror story, is perhaps a bit dated (from 2003). I did a quick test, but the problem has apparently gone away.
Source:
#include <string.h>

struct iw_event {    /* dummy! */
    int len;
};

char *iwe_stream_add_event(
    char *stream,            /* Stream of events */
    char *ends,              /* End of stream */
    struct iw_event *iwe,    /* Payload */
    int event_len)           /* Real size of payload */
{
    /* Check if it's possible */
    if ((stream + event_len) < ends) {
        iwe->len = event_len;
        memcpy(stream, (char *) iwe, event_len);
        stream += event_len;
    }
    return stream;
}
The specific complaint is:
Some users have complained that when the [above] code is compiled without the -fno-strict-aliasing, the order of the write and memcpy is inverted (which means a bogus len is mem-copied into the stream).
Compiled code, using gcc 4.3.4 on Cygwin with -O3 (please correct me if I am wrong -- my assembler is a bit rusty!):
_iwe_stream_add_event:
pushl %ebp
movl %esp, %ebp
pushl %ebx
subl $20, %esp
movl 8(%ebp), %eax # stream --> %eax
movl 20(%ebp), %edx # event_len --> %edx
leal (%eax,%edx), %ebx # sum --> %ebx
cmpl 12(%ebp), %ebx # compare sum with ends
jae L2
movl 16(%ebp), %ecx # iwe --> %ecx
movl %edx, (%ecx) # event_len --> iwe->len (!!)
movl %edx, 8(%esp) # event_len --> stack
movl %ecx, 4(%esp) # iwe --> stack
movl %eax, (%esp) # stream --> stack
call _memcpy
movl %ebx, %eax # sum --> retval
L2:
addl $20, %esp
popl %ebx
leave
ret
And for the second link in Michael's answer,
*(unsigned short *)&a = 4;
gcc will usually (always?) give a warning. But I believe a valid solution to this (for gcc) is to use:
#define CAST(type, x) (((union {typeof(x) src; type dst;}*)&(x))->dst)
// ...
CAST(unsigned short, a) = 4;
I've asked SO whether this is OK in gcc-strict-aliasing-and-casting-through-a-union, but so far nobody disagrees.
No horror story of my own, but here are some quotes from Linus Torvalds (sorry if these are already in one of the linked references in the question):
http://lkml.org/lkml/2003/2/26/158:
Date Wed, 26 Feb 2003 09:22:15 -0800
Subject Re: Invalid compilation without -fno-strict-aliasing
From Jean Tourrilhes <>
On Wed, Feb 26, 2003 at 04:38:10PM +0100, Horst von Brand wrote:
Jean Tourrilhes <> said:
It looks like a compiler bug to me...
Some users have complained that when the following code is
compiled without the -fno-strict-aliasing, the order of the write and
memcpy is inverted (which mean a bogus len is mem-copied into the
stream).
Code (from linux/include/net/iw_handler.h) :
static inline char *
iwe_stream_add_event(char * stream, /* Stream of events */
char * ends, /* End of stream */
struct iw_event *iwe, /* Payload */
int event_len) /* Real size of payload */
{
/* Check if it's possible */
if((stream + event_len) < ends) {
iwe->len = event_len;
memcpy(stream, (char *) iwe, event_len);
stream += event_len;
}
return stream;
}
IMHO, the compiler should have enough context to know that the
reordering is dangerous. Any suggestion to make this simple code more
bullet proof is welcomed.
The compiler is free to assume char *stream and struct iw_event *iwe point
to separate areas of memory, due to strict aliasing.
Which is true and which is not the problem I'm complaining about.
(Note with hindsight: this code is fine, but Linux's implementation of memcpy was a macro that cast to long * to copy in larger chunks. With a correctly-defined memcpy, gcc -fstrict-aliasing isn't allowed to break this code. But it means you need inline asm to define a kernel memcpy if your compiler doesn't know how to turn a byte-copy loop into efficient asm, which was the case for gcc before gcc7.)
And Linus Torvalds' comment on the above:
Jean Tourrilhes wrote:
>
It looks like a compiler bug to me...
Why do you think the kernel uses "-fno-strict-aliasing"?
The gcc people are more interested in trying to find out what can be
allowed by the c99 specs than about making things actually work. The
aliasing code in particular is not even worth enabling, it's just not
possible to sanely tell gcc when some things can alias.
Some users have complained that when the following code is
compiled without the -fno-strict-aliasing, the order of the write and
memcpy is inverted (which mean a bogus len is mem-copied into the
stream).
The "problem" is that we inline the memcpy(), at which point gcc won't
care about the fact that it can alias, so they'll just re-order
everything and claim it's out own fault. Even though there is no sane
way for us to even tell gcc about it.
I tried to get a sane way a few years ago, and the gcc developers really
didn't care about the real world in this area. I'd be surprised if that
had changed, judging by the replies I have already seen.
I'm not going to bother to fight it.
Linus
http://www.mail-archive.com/linux-btrfs#vger.kernel.org/msg01647.html:
Type-based aliasing is stupid. It's so incredibly stupid that it's not even funny. It's broken. And gcc took the broken notion, and made it more so by making it a "by-the-letter-of-the-law" thing that makes no sense.
...
I know for a fact that gcc would re-order write accesses that were clearly to (statically) the same address. Gcc would suddenly think that
unsigned long a;
a = 5;
*(unsigned short *)&a = 4;
could be re-ordered to set it to 4 first (because clearly they don't alias - by reading the standard), and then because now the assignment of 'a=5' was later, the assignment of 4 could be elided entirely! And if somebody complains that the compiler is insane, the compiler people would say "nyaah, nyaah, the standards people said we can do this", with absolutely no introspection to ask whether it made any SENSE.
SWIG generates code that depends on strict aliasing being off, which can cause all sorts of problems.
SWIGEXPORT jlong JNICALL Java_com_mylibJNI_make_1mystruct_1_1SWIG_12(
    JNIEnv *jenv, jclass jcls, jint jarg1, jint jarg2) {
    jlong jresult = 0 ;
    int arg1 ;
    int arg2 ;
    my_struct_t *result = 0 ;

    (void)jenv;
    (void)jcls;
    arg1 = (int)jarg1;
    arg2 = (int)jarg2;
    result = (my_struct_t *)make_my_struct(arg1,arg2);
    *(my_struct_t **)&jresult = result;    /* <<<< horror */
    return jresult;
}
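A hedged sketch of the usual repair (the types here are stand-ins so the snippet compiles on its own): copy the pointer's bytes into the wider integer with memcpy instead of writing through a my_struct_t ** view of jresult; memcpy is always allowed to alias, so -fstrict-aliasing cannot reorder or break it.
#include <string.h>

/* Stand-in types for the sketch; the real ones come from JNI and the library. */
typedef long long jlong;
typedef struct my_struct my_struct_t;

static jlong wrap_pointer(my_struct_t *result)
{
    jlong jresult = 0;
    /* copy the pointer's bit pattern instead of type-punning the store */
    memcpy(&jresult, &result, sizeof result);
    return jresult;
}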
gcc, aliasing, and 2-D variable-length arrays: The following sample code copies a 2x2 matrix:
#include <stdio.h>

static void copy(int n, int a[][n], int b[][n]) {
    int i, j;
    for (i = 0; i < 2; i++)        // 'n' not used in this example
        for (j = 0; j < 2; j++)    // 'n' hard-coded to 2 for simplicity
            b[i][j] = a[i][j];
}

int main(int argc, char *argv[]) {
    int a[2][2] = {{1, 2},{3, 4}};
    int b[2][2];
    copy(2, a, b);
    printf("%d %d %d %d\n", b[0][0], b[0][1], b[1][0], b[1][1]);
    return 0;
}
With gcc 4.1.2 on CentOS, I get:
$ gcc -O1 test.c && a.out
1 2 3 4
$ gcc -O2 test.c && a.out
10235717 -1075970308 -1075970456 11452404 (random)
I don't know whether this is generally known, and I don't know whether this a bug or a feature. I can't duplicate the problem with gcc 4.3.4 on Cygwin, so it may have been fixed. Some work-arounds:
Use __attribute__((noinline)) for copy().
Use the gcc switch -fno-strict-aliasing.
Change the third parameter of copy() from b[][n] to b[][2].
Don't use -O2 or -O3.
Further notes:
This is an answer, after a year and a day, to my own question (and I'm a bit surprised there are only two other answers).
I lost several hours with this on my actual code, a Kalman filter. Seemingly small changes would have drastic effects, perhaps because of changing gcc's automatic inlining (this is a guess; I'm still uncertain). But it probably doesn't qualify as a horror story.
Yes, I know you wouldn't write copy() like this. (And, as an aside, I was slightly surprised to see gcc did not unroll the double-loop.)
No gcc warning switches, including -Wstrict-aliasing=, did anything here.
1-D variable-length arrays seem to be OK.
Update: The above does not really answer the OP's question, since he (i.e. I) was asking about cases where strict aliasing 'legitimately' broke your code, whereas the above just seems to be a garden-variety compiler bug.
I reported it to GCC Bugzilla, but they weren't interested in the old 4.1.2, even though (I believe) it is the key to the $1-billion RHEL5. The bug doesn't occur in 4.2.4 and up.
And I have a slightly simpler example of a similar bug, with only one matrix. The code:
#include <stdio.h>

static void zero(int n, int a[][n]) {
    int i, j;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            a[i][j] = 0;
}

int main(void) {
    int a[2][2] = {{1, 2},{3, 4}};
    zero(2, a);
    printf("%d\n", a[1][1]);
    return 0;
}
produces the results:
gcc -O1 test.c && a.out
0
gcc -O1 -fstrict-aliasing test.c && a.out
4
It seems to be the combination of -fstrict-aliasing with -finline that causes the bug.
Here is mine:
http://forum.openscad.org/CGAL-3-6-1-causing-errors-but-CGAL-3-6-0-OK-tt2050.html
It caused certain shapes in a CAD program to be drawn incorrectly. Thank goodness for the project leaders' work on creating a regression test suite.
The bug only manifested itself on certain platforms, with older versions of GCC and older versions of certain libraries, and then only with -O2 turned on. -fno-strict-aliasing solved it.
The Common Initial Sequence rule of C used to be interpreted as making it possible to write a function which could work on the leading portion of a wide variety of structure types, provided they start with elements of matching types. Under C99, the rule was changed so that it only applied if the structure types involved were members of the same union whose complete declaration was visible at the point of use.
The authors of gcc insist that the language in question is only applicable if the accesses are performed through the union type, notwithstanding the facts that:
1. There would be no reason to specify that the complete declaration must be visible if accesses had to be performed through the union type.
2. Although the CIS rule was described in terms of unions, its primary usefulness lay in what it implied about the way in which structs were laid out and accessed. If S1 and S2 were structures that shared a CIS, there would be no way that a function that accepted a pointer to an S1 and an S2 from an outside source could comply with C89's CIS rules without allowing the same behavior to be useful with pointers to structures that weren't actually inside a union object; specifying CIS support for structures would thus have been redundant given that it was already specified for unions.
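A minimal illustration of the pattern being argued about (the struct names are invented for this sketch): two structs that share a common initial sequence, and a function that only touches the shared leading member.
#include <stdio.h>

/* Two hypothetical structs sharing a common initial sequence (an int tag). */
struct msg_a { int tag; double payload; };
struct msg_b { int tag; char name[16]; };

/* Reads only the shared leading member. Under the older reading of the CIS
 * rule this is usable with either struct type; under gcc's reading it is
 * only guaranteed when the access happens through a union of the two types
 * whose declaration is visible here. */
static int get_tag(const struct msg_a *p)
{
    return p->tag;
}

int main(void)
{
    struct msg_b b = { 42, "example" };
    printf("%d\n", get_tag((const struct msg_a *) &b));  /* prints 42 in practice */
    return 0;
}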
The following code returns 10 under gcc 4.4.4. Is anything wrong with the union method, or with gcc 4.4.4?
int main()
{
    int v = 10;
    union vv {
        int v;
        short q;
    } *s = (union vv *)&v;
    s->v = 1;
    return v;
}
