31-bit limit on bit operations in R (C)

I am trying to get around the 31-bit limit for bit operations in R. I can do this in pure R, but my issue is about implementing this in C for use in R.
Example
For example I have the data
> x = c(2147028898, 2147515013)
where each element is at most 32 bits, unsigned, and on which I'd like to do bit operations such as (but not limited to) (x >> 20) & 0xFFF. The end goal would be using many of these kinds of operations in a single function.
The two numbers are of different bit lengths.
> log2(x)
[1] 30.99969446331090239255 31.00002107107989246515
Normal bitwise operations in R yield the following results, i.e. NAs are introduced for the larger of the two numbers.
> bitwShiftR(x,20)
[1] 2047 NA
Warning message:
In bitwShiftR(x, 20) : NAs introduced by coercion
> bitwAnd(x,20)
[1] 0 NA
Warning message:
In bitwAnd(x, 20) : NAs introduced by coercion
Workaround with R package 'bitops'
The bitops package does what I want, but my end goal is something more advanced, and I want to be able to use C; see below.
> library(bitops)
> bitShiftR(x,20)
[1] 2047 2048
I have looked at the C code for this package, but I don't really understand it. Does it have to be that complicated, or is that just for optimization for vectorized inputs and outputs?
Workaround in C (the issue)
My code is as follows, only a simple expression so far. I have tried different types in C, but to no avail.
#include <R.h>

void myBitOp(int *x, int *result) {
    *result = (*x >> 20) & 0xFFF;
}
which I then compile with R CMD SHLIB myBitOp.c on a 64 bit machine.
$ uname -a
Linux xxxxxxxxx 3.0.74-0.6.8-xen #1 SMP Wed May 15 07:26:33 UTC 2013 (5e244d7) x86_64 x86_64 x86_64 GNU/Linux
In R I load this with
> dyn.load("myBitOp.so")
> myBitOp <- function(x) .C("myBitOp", as.integer(x), as.integer(0))[[2]]
When I run the function I get back
> myBitOp(x[1])
[1] 2047
> myBitOp(x[2])
Error in myBitOp(x[2]) : NAs in foreign function call (arg 1)
In addition: Warning message:
In myBitOp(x[2]) : NAs introduced by coercion
So the question is, why do I get these NAs with this C code, and how do I fix it? The return value will always be much less than 31 bits btw.
Thank you!
Update
After studying the bitops code a bit more and going through this presentation, among other links, I came up with the following code (with vectorization as a bonus):
#include <R.h>
#include <Rdefines.h>
SEXP myBitOp(SEXP x) {
    PROTECT(x = AS_NUMERIC(x));
    double *xx = NUMERIC_POINTER(x);
    SEXP result = PROTECT(NEW_NUMERIC(length(x)));
    double *xresult = NUMERIC_POINTER(result);
    for (int i = 0; i < length(x); i++) {
        xresult[i] = (double) ((((unsigned int) xx[i]) >> 20) & 0xFFF);
    }
    UNPROTECT(2);
    return result;
}
Compile with R CMD SHLIB myBitOp.c
And in R:
> dyn.load("myBitOp.so")
> myBitOp <- function(x) .Call("myBitOp", x)
> myBitOp(x)
[1] 2047 2048
I don't fully understand why or how yet, but it works, or at least it seems to work for this example.

The second element of as.integer(x) will be NA because it's larger than .Machine$integer.max. NAOK = FALSE by default in your call to .C, so that NA in your input results in an error. Your call to .C will "succeed" if you set NAOK = TRUE (because, in this case, NA is technically NA_integer_, which is a special int value in C).
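For example (a sketch of the same wrapper with NAOK set; the C side then sees NA as NA_integer_, i.e. INT_MIN, so the result for that element is meaningless):
> myBitOp <- function(x) .C("myBitOp", as.integer(x), as.integer(0), NAOK = TRUE)[[2]]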
You'll have to be creative to get around this. You could try splitting values > 2^31-1 into two values, pass both of them to C, convert them to unsigned integers, sum them, convert the result to a signed integer, then pass back to R.
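A minimal sketch of that idea (myBitOpSplit is my own hypothetical helper, not code from the question): the R side splits the value into 16-bit halves, each of which fits comfortably in a signed 31-bit integer, and the C side reassembles the full 32-bit pattern before operating on it.
#include <R.h>

/* Sketch: reassemble a 32-bit value from two halves that each fit in
   R's signed integers, then apply the bit operation. */
void myBitOpSplit(int *lo16, int *hi16, int *result) {
    unsigned int x = ((unsigned int) *hi16 << 16) | (unsigned int) *lo16;
    *result = (int) ((x >> 20) & 0xFFF);  /* fits in far fewer than 31 bits */
}
Called from R with something like
> myBitOpSplit <- function(x) .C("myBitOpSplit", as.integer(x %% 65536), as.integer(x %/% 65536), as.integer(0))[[3]]
which should return 2048 for x[2] above.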

Related

Frama-C does not recognize valid memory access from bitwise-ANDed index

I am right-shifting an unsigned integer then &ing it with 0b111, so the resulting value must be in the range [0, 7].
When I use that value as an index into an array of length 8, Frama-C is not able to verify the associated rte: mem_access assertion.
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
/*@
  requires \valid_read(bitRuleBuf + (0 .. 7));
  assigns \nothing;
  ensures \forall uint8_t c; (0 <= c < 8) ==> (
    ((\result >> c) & 0b1) <==> bitRuleBuf[(state >> c) & 0b111]
  );
*/
uint8_t GetNewOctet(
    const uint16_t state,
    const bool bitRuleBuf[const static 8])
{
    uint8_t result = 0;
    /*@
      loop invariant 0 <= b <= 8;
      loop invariant \forall uint8_t c; (0 <= c < b) ==> (
        ((result >> c) & 0b1) <==> bitRuleBuf[(state >> c) & 0b111]
      );
      loop assigns b, result;
      loop variant 8 - b;
    */
    for (uint8_t b = 0; b < 8; b += 1) {
        result |= ((uint8_t)bitRuleBuf[(state >> b) & 0b111]) << b;
        // Still seeing an issue if I break apart the steps:
        /*
        const uint16_t shifted = state >> b;
        const uint16_t anded = shifted & 0b111;
        // "assert rte: mem_access" is not successful here.
        const uint8_t value = bitRuleBuf[anded];
        const uint8_t shifted2 = value << b;
        result |= shifted2;
        */
    }
    return result;
}
/*@
  assigns \nothing;
*/
int main(void) {
    // Empty cells with both neighbors empty become alive.
    // All other cells become empty.
    const bool bitRuleBuf[] = {
        1, // 0b000
        0, // 0b001
        0, // 0b010
        0, // 0b011
        0, // 0b100
        0, // 0b101
        0, // 0b110
        0  // 0b111
    };
    const uint8_t newOctet = GetNewOctet(0b0010000100, bitRuleBuf);
    //assert(newOctet == 0b00011000); // Can be uncommented to verify.
    //@ assert newOctet == 0b00011000;
    return 0;
}
The failed assertion happens for line 27: result |= ((uint8_t)bitRuleBuf[(state >> b) & 0b111]) << b;.
Changing the & 0b111 to % 8 does not resolve the issue.
I have tried many variations of the code and ACSL involved, but have not been successful.
I'm guessing integer promotion might be involved in the issue.
How can the code/ACSL be modified so that verification is successful?
$ frama-c --version
24.0 (Chromium)
I am running frama-c and frama-c-gui with arguments -wp and -wp-rte.
For background of the included block of code, the least significant 10 bits of the state argument are the state of 10 cells of 1-dimensional cellular automata. The function returns the next state of the middle 8 cells of those 10.
Edit: Alt-Ergo 2.4.1 is used:
$ why3 config detect
Found prover Alt-Ergo version 2.4.1, OK.
1 prover(s) added
Save config to /home/user/.why3.conf
First, a small note: if you're using WP, it is also important to state which solvers you have configured: each of them has strengths and weaknesses, so that it might be easier to complete a proof with the appropriate solver. In particular, my answer is based on the use of Z3 (4.8.14), known as z3-ce by why3 config detect (note that you have to run this command each time you change the set of solvers you use).
EDIT: As mentioned in comments below, the Mod-Mask tactic is not available in Frama-C 24.0, but only in the development version (https://git.frama-c.com/pub/frama-c). As far as I can tell, for a Frama-C 24.0 solution, you need to resort to a Coq script as mentioned at the end of this answer.
Z3 alone is not sufficient to complete the proof, but you can use a WP tactic (see section 2.2 of the WP manual about the interactive proof editor in Frama-C's GUI). Namely, if you select land(7, to_sint32(x)) in the proof editor, the tactics panel will show you a Mod-Mask tactic, which converts bitmasks to modulo operations and vice-versa (see image below). If you apply it, Z3 will complete the two resulting proof obligations, completing the proof of the assertion.
After that, you can save the script in order to be able to replay it later: use e.g. -wp-prover script,z3-ce,alt-ergo to let WP take advantage of existing scripts in addition to automated solvers. The scripts are searched for (and saved to) the script subdirectory of the WP session directory, which defaults to ./.frama-c/wp and can be set with -wp-session.
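For instance, a replay invocation might look like this (file.c is my own placeholder for your source file; the flags are the ones discussed above):
$ frama-c -wp -wp-rte -wp-prover script,z3-ce,alt-ergo file.c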
Another possibility is to use a Coq script to complete the proof. This supposes you have Coq, CoqIDE and the Why3 Coq libraries installed (they are available as opam packages coq, coqide and why3-coq respectively). Once this is done and you have configured Why3 to use Coq (why3 config detect should tell you that it has found Coq), you can use it through the GUI of Frama-C to complete proofs that the built-in interactive prover can't take care of.
For that, you may need to configure Frama-C to display a Coq column in the WP Goals panel: click on the Provers button in this panel, and make sure that Coq is ON, as shown below:
Once this is done, you can double-click in the cell of this column that corresponds to the proof obligation you want to discharge with Coq. This will open a CoqIDE session where you have to complete the proof script at the end of the opened file. Once this is done, save the file and quit CoqIDE. The Coq script will then be saved as part of the WP session, and can be played again if coq is among the arguments given to -wp-prover. For what it's worth, the following script seems to do the trick (Coq 8.13.2, Why3 1.4.1, and of course Frama-C 24.0):
intros alloc mem_int b buf state0 state1 x a1 hpos hval hbmax hbmax1 htyp hlink huint1 huint2 huint3 hvalid htyp1 htyp2 hdef.
assert (0<=land 7 (to_sint32 x) <= 7)%Z.
apply uint_land_range; auto with zarith.
unfold valid_rd.
unfold valid_rd in hvalid.
unfold a1.
unfold shift; simpl.
unfold shift in hvalid; simpl in hvalid.
assert (to_sint32 (land 7 (to_sint32 x)) = land 7 (to_sint32 x))%Z.
- apply id_sint32; unfold is_sint32; auto with zarith.
- intros _; repeat split; auto with zarith.

Loading ARM CPSR into C and formatting?

Whilst being given a document teaching ARM assembly, the document now tells me to load the CPSR into C and format the data into a friendly format, such as:
Flags: N Z IRQ FIQ
State: ARM
Mode: Supervisor
Now I've loaded the CPSR into a variable within my program, but I'm struggling to understand what format the CPSR is in. I've seen things using hex to reset flags, etc., along with which bytes are the control, field, status and extension masks.
I put my CPSR into an int just to see what the data shows and I'm given 1610612752. I'm assuming I shouldn't be loading it into an int but into something else in order for it to be much clearer.
Any hints pushing me to the right direction would be most appreciated.
From this wiki page (http://www.heyrick.co.uk/armwiki/The_Status_register) we get the bit layout of the CPSR (and SPSR):
31 30 29 28 27 - 24 - 19 … 16 - 9 8 7 6 5 4 … 0
N Z C V Q - J - GE[3:0] - E A I F T M[4:0]
Declare some flags (or just compute these):
int armflag_N = (Cpsr>>31)&1;
int armflag_Z = (Cpsr>>30)&1;
int armflag_C = (Cpsr>>29)&1;
int armflag_V = (Cpsr>>28)&1;
int armflag_Q = (Cpsr>>27)&1;
int armflag_J = (Cpsr>>24)&1;
int armflag_GE = (Cpsr>>16)&15;
int armflag_E = (Cpsr>>9)&1;
int armflag_A = (Cpsr>>8)&1;
int armflag_I = (Cpsr>>7)&1;
int armflag_F = (Cpsr>>6)&1;
int armflag_T = (Cpsr>>5)&1;
int armflag_M = (Cpsr>>0)&31;
(The ">>" means to right-shift by the specified number of bits, and "&" is the bitwise AND operator, so "(val>>num)&mask" means right-shift val by num bits, then extract the bits under the mask.)
Now you have variables with flags. Here is how you could conditionally print a flag:
printf("Flags: ");
printf("%s ", armflag_N ? "N" : "-" );
...
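Building on those variables, the State and Mode lines can be produced the same way. Here is a small sketch; armModeName is my own helper, and the M[4:0] encodings are the standard ones from the ARM Architecture Reference Manual rather than anything in the question:
const char *armModeName(unsigned m) {
    switch (m & 0x1F) {       /* M[4:0] */
        case 0x10: return "User";
        case 0x11: return "FIQ";
        case 0x12: return "IRQ";
        case 0x13: return "Supervisor";
        case 0x17: return "Abort";
        case 0x1B: return "Undefined";
        case 0x1F: return "System";
        default:   return "?";
    }
}
...
printf("State: %s\n", armflag_T ? "Thumb" : "ARM");
printf("Mode: %s\n", armModeName(armflag_M));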

How to control size of basic arithmetic types in Frama-C?

In Frama-C, is it possible to freely specify the sizes of the basic types?
My target, the TMS320F2808 DSP, has 16-bit bytes. The char, short and int types are all one byte, and the long type is two.
As yet, I cannot see how (if it is possible at all) I can specify these sizes to Frama-C.
You may already have discovered the option -machdep. The command frama-c -machdep help shows a list:
$ frama-c -machdep help
[kernel] supported machines are x86_64 x86_32 ppc_32 x86_16.
Unfortunately, the value of CHAR_BIT is not one of the machdep parameters. Instead, the value 8 is hard-coded in many places in Frama-C for CHAR_BIT.
Adding support for larger values than 8 of CHAR_BIT is a trivial but repetitive programming task: one must simply identify all these places and modify them to use Bit_utils.sizeofchar() instead. In fact, someone has already done this, so it is definitely doable, but that change was never contributed back to the Frama-C development (welcome to the world of open-source software).
Once you have done the above, creating a new architecture with CHAR_BIT == 16, sizeof(int) == 1 and sizeof(long) == 2 will be a comparatively simple operation.
How to do the changes
I get a first list of potential change sites with the command below. This finds all occurrences of the number 8:
$ grep -rI \\W8\\W src/*/*.ml
src/ai/base.ml: 8 (* FIXME: CHAR_BIT *), (String.length s)
src/aorai/aorai_register.ml: (* Step 8 : clearing tables whose information has been
src/aorai/ltllexer.ml: | 8 ->
src/aorai/promelalexer.ml: | 8 ->
src/aorai/promelalexer_withexps.ml: | 8 ->
src/aorai/yalexer.ml: | 8 ->
src/gui/design.ml: height * 8 / 5 (* 16/10 ratio *)
src/gui/gtk_form.ml: val table = GPack.table ~rows:2 ~col_spacings:8 ~packing ()
src/gui/gtk_helper.ml: ~fallback:"#neither UTF-8 nor locale nor ISO-8859-15#"
src/gui/gtk_helper.ml: ~to_codeset:"UTF-8"
src/gui/source_manager.ml:(* Try to convert a source file either as UTF-8 or as locale. *)
src/kernel/stmts_graph.ml: | Block _ -> [`Shape `Box; `Fontsize 8]
src/lib/binary_cache.ml:let cache_size () = 1 lsl (8 + MemoryFootprint.get ())
src/lib/bitvector.ml: if b-a [I 8]
src/logic/description.ml: | IPPredicate(kind,kf,ki,_) -> [I 8;F kf;K ki] @ kind_order kind
src/logic/property.ml: Hashtbl.hash (8, Kf.hash f, Kinstr.hash ki, hash_bhv_loop b)
src/logic/property_status.ml: | Never_tried -> [`Style `Bold; `Width 0.8 ]
src/memory_state/offsetmap.ml: let char_width = 8 in
src/misc/bit_utils.ml: Int_Base.inject (Int.of_int (warn_if_zero ty (bitsSizeOf ty) / 8))
src/pdg/ctrlDpds.ml: (2) if (c) (3) y = 3; (4) goto L; else (5) z = 8;
src/pdg/ctrlDpds.ml: (8) L : return x;
src/pdg/ctrlDpds.ml: (1) -> (2) -> (6) -> (8)
src/printer/cil_printer.ml: Integer.pred (Integer.of_int (8 * (Cil.bytesSizeOfInt k)))
src/printer/cil_printer.ml: CompoundInit (_, il) when List.length il >= 8 -> true
src/project/state_builder.ml: debug ~level:8 "updating" p;
src/value/builtins_nonfree.ml: Value_parameters.debug "find_ival(8) on %a returns %a"
src/value/builtins_nonfree.ml:let int_hrange = Int.two_power_of_int (8 * Cil.theMachine.Cil.theMachine.sizeof_int -1)
src/value/builtins_nonfree_print_c.ml: let step = if iso then 1 else (Integer.to_int modu) / 8 in
src/value/builtins_nonfree_print_c.ml: let start = ref ((Integer.to_int bk) / 8) in
src/value/builtins_nonfree_print_c.ml: let ek = ek / 8 in
src/value/eval_exprs.ml: let offs_bytes = fst (Cil.bitsOffset typ_exp offs) / 8 in
src/value/eval_terms.ml: [i * 8 * sizeof( *tlv)] *)
src/value/value_parameters.ml: (defaults to 8; experimental)"
src/wp/Cint.ml: in let hsb p = let n = p lsr 8 in if n = 0 then hsb.(p) else 8 + hsb.(n)
src/wp/GuiPanel.ml: let options = GPack.hbox ~spacing:8 ~packing () in
src/wp/GuiPanel.ml: let control = GPack.table ~columns:4 ~col_spacings:8 ~rows:2 ~packing () in
src/wp/Matrix.ml: let buffer = Buffer.create 8 in
src/wp/cil2cfg.ml: | VblkIn (Bloop s,_) -> (8, s.sid)
src/wp/ctypes.ml: | 8 -> if signed then SInt64 else UInt64
src/wp/ctypes.ml: | 8 -> Float64
src/wp/ctypes.ml: | size -> WpLog.not_yet_implemented "%d-bits floats" (8*size)
src/wp/ctypes.ml: let m = Array.create 8 None in
src/wp/ctypes.ml: (Cil.bitsSizeOf ctype / 8)
src/wp/ctypes.ml: (Cil.bitsSizeOf ctype / 8)
src/wp/driver.ml: | 8 ->
src/wp/rformat.ml: | 8 ->
src/wp/script.ml: | 8 ->
The first one is obviously a true positive, and the second one obviously a false positive.
In the first case, the context expects a value of type int. The simplest change is:
Index: src/ai/base.ml
===================================================================
--- src/ai/base.ml (revision 24517)
+++ src/ai/base.ml (working copy)
@@ -116,7 +116,7 @@
let u, l =
match s with
| CSString s ->
- 8 (* FIXME: CHAR_BIT *), (String.length s)
+ bitsSizeOf charType, (String.length s)
| CSWstring s ->
bitsSizeOf theMachine.wcharType, (List.length s)
in
In the above list, the pattern Cil.bitsSizeOf … / 8 is a sure sign that the 8 represents CHAR_BIT, but in other instances, it requires looking at the source code and understanding the intent.
The difficulty comes from the different forms the constant 8 may take. You may also encounter 8L, the same constant but of type int64. When that constant represents the width of a char, it can be replaced with Int64.of_int (bitsSizeOf charType). There is one in src/ai/base.ml:
Index: src/ai/base.ml
===================================================================
--- src/ai/base.ml (revision 24517)
+++ src/ai/base.ml (working copy)
@@ -156,12 +156,12 @@
(fun _ x ->
try Scanf.sscanf x "%Li-%Li"
(fun min max ->
- let mul8 = Int64.mul 8L in
+ let mul_CHAR_BIT = Int64.mul (Int64.of_int (bitsSizeOf charType)) in
MinValidAbsoluteAddress.set
- (Abstract_interp.Int.of_int64 (mul8 min));
+ (Abstract_interp.Int.of_int64 (mul_CHAR_BIT min));
MaxValidAbsoluteAddress.set
(Abstract_interp.Int.of_int64
- (Int64.pred (mul8 (Int64.succ max)))))
+ (Int64.pred (mul_CHAR_BIT (Int64.succ max)))))
with End_of_file | Scanf.Scan_failure _ | Failure _ as e ->
Kernel.abort "Invalid -absolute-valid-range integer-integer: each integer may be in decimal, hexadecimal (0x, 0X), octal (0o) or binary (0b) notation and has to hold in 64 bits. A correct example is -absolute-valid-range 1-0xFFFFFF0.@\nError was %S@."
(Printexc.to_string e))
However, effecting this last change causes Frama-C to crash when the commandline option -absolute-valid-range is used, because of the order things are currently initialized (the front-end is not ready to answer questions about the size of char at the time the commandline arguments are interpreted). So this particular change has to be postponed, and a note has to be made that the option will continue to assume 8-bit chars until Frama-C is re-architectured a bit.
Apart from int and int64, Frama-C also uses multi-precision (allocated) integers. The constant 8 of that type is usually found as Int.eight. This one can be replaced with a call to Bit_utils.sizeofchar, because this function returns a multi-precision integer. The code should also be inspected for shifts by 3.
Frama-C uses a notion of machdep that describes the underlying hardware architecture. No suitable machdep is provided by default for your case; sometimes you can craft your own and use it for your analyses, but unfortunately this is not possible here, as you cannot change the size of char.
The remainder of this answer won't help with the original question, as the size of char is not currently customizable in Frama-C. It is left for people who would like to configure Frama-C for architectures that are exotic in other respects, but on which the size of char is 8 bits.
For such semi-vanilla architectures, for which the default machdeps are not sufficient, you could have created a file machdep_custom.ml with the following contents:
module Mach = struct
  (* Contents of e.g. file cil/src/machdep_x86_32.ml properly modified for your
     architecture. The MSVC configuration is never used by Frama-C, no need
     to edit it (but it must be included). *)
  open Cil_types

  let gcc = {
    version_major = 1;
    version_minor = 0;
    version = "custom machdep";
    (* All types but char and long long are 16 bits *)
    sizeof_short = 2;
    sizeof_int = 2;
    sizeof_long = 2;
    sizeof_longlong = 4;
    (* [...] *)
  }
end

let () = File.new_machdep "custom" (module Mach: Cil.Machdeps)
This registers your own machdep. All your analyses must be started by adding -load-script machdep_custom.ml -machdep custom to your command line.
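For example (file.c standing in for your own sources):
$ frama-c -load-script machdep_custom.ml -machdep custom file.c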
For technical reasons, within Frama-C, at least one type must be 32 bits. In this example, you could not have had sizeof(long long)=2.

Can you perform fixed-length bit reversal in #defines / preprocessor directives?

I am writing C code (not C++) for a target with very limited ROM, but I want the code to be easy to customize for other similar targets with #defines. I have #defines used to specify the address and other values of the device, but as a code-saving technique, these values need to be bitwise reversed. I can enter these by first manually reversing them, but this would be confusing for future use. Can I define some sort of macro that performs a bitwise reversal?
As seen here (Best Algorithm for Bit Reversal ( from MSB->LSB to LSB->MSB) in C), there is no single operation to switch the order in C. Because of this, if you were to create a #define macro to perform the operation, it would actually perform quite a bit of work on each use (as well as significantly increasing the size of your binary if used often). I would recommend manually creating the other ordered constants and just using clear documentation to ensure the information about them is not lost.
I think something like this ought to work:
#define REV2(x) ((((x)&1)<<1) | (((x)>>1)&1))
#define REV4(x) ((REV2(x)<<2) | (REV2((x)>>2)))
#define REV8(x) ((REV4(x)<<4) | (REV4((x)>>4)))
#define REV16(x) ((REV8(x)<<8) | (REV8((x)>>8)))
#define REV32(x) ((REV16(x)<<16) | (REV16((x)>>16)))
It uses only simple operations which are all safe for constant expressions, and it's very likely that the compiler will evaluate these at compile time.
You can ensure that they're evaluated at compile time by using them in a context which requires a constant expression. For example, you could initialize a static variable or declare an enum:
enum {
VAL_A = SOME_NUMBER,
LAV_A = REV32(VAL_A),
};
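A quick sanity check of those macros (my own example values; the expected constants are just the inputs written out bit-reversed, and the macros are repeated from above so the snippet stands alone). Note the arguments are unsigned so the shifts stay well defined:
#include <assert.h>
#include <stdint.h>

#define REV2(x) ((((x)&1)<<1) | (((x)>>1)&1))
#define REV4(x) ((REV2(x)<<2) | (REV2((x)>>2)))
#define REV8(x) ((REV4(x)<<4) | (REV4((x)>>4)))
#define REV16(x) ((REV8(x)<<8) | (REV8((x)>>8)))
#define REV32(x) ((REV16(x)<<16) | (REV16((x)>>16)))

int main(void) {
    /* The lowest set bit moves to the top, and a full-width pattern reverses. */
    assert(REV32(UINT32_C(0x00000001)) == UINT32_C(0x80000000));
    assert(REV32(UINT32_C(0xDEADBEEF)) == UINT32_C(0xF77DB57B));
    return 0;
}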
For the sake of readable code I'd not recommend it, but you could do something like
#include <stdio.h>

#define NUMBER 2
#define BIT_0(number_) ((number_ & (1<<0)) >> 0)
#define BIT_1(number_) ((number_ & (1<<1)) >> 1)
#define REVERSE_BITS(number_) ((BIT_1(number_) << 0) + (BIT_0(number_) << 1))

int main() {
    printf("%d --> %d\n", NUMBER, REVERSE_BITS(NUMBER));
    return 0;
}
There are techniques for this kind of operation (see the Boost Preprocessor library, for example), but most of the time the easiest solution is to use an external preprocessor written in some language in which bit manipulation is easier.
For example, here is a little python script which will replace all instances of #REV(xxxx)# where xxxx is a hexadecimal string with the bit-reversed constant of the same length:
#!/bin/python
import re
import sys

reg = re.compile("""#REV\(([0-9a-fA-F]+)\)#""")

def revbits(s):
    return "0X%x" % int(bin(int(s, base=16))[-1:1:-1].ljust(4*len(s), '0'), base=2)

for l in sys.stdin:
    sys.stdout.write(reg.sub(lambda m: revbits(m.group(1)), l))
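For example (assuming the script is saved as rev.py; the input line is my own):
$ echo 'addr = #REV(DEADBEEF)#' | python rev.py
addr = 0Xf77db57b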
And here is a version in awk:
awk 'BEGIN{R["0"]="0";R["1"]="8";R["2"]="4";R["3"]="C";
R["4"]="2";R["5"]="A";R["6"]="6";R["7"]="E";
R["8"]="1";R["9"]="9";R["A"]="5";R["B"]="D";
R["C"]="3";R["D"]="B";R["E"]="7";R["F"]="F";
R["a"]="5";R["b"]="D";R["c"]="3";R["d"]="B";
R["e"]="7";R["f"]="F";}
function bitrev(x, i, r) {
r = ""
for (i = length(x); i; --i)
r = r R[substr(x,i,1)]
return r
}
{while (match($0, /#REV\([[:xdigit:]]+\)#/))
$0 = substr($0, 1, RSTART-1) "0X" bitrev(substr($0, RSTART+5, RLENGTH-7)) substr($0, RSTART+RLENGTH)
}1' \
<<<"foo #REV(23)# yy #REV(9)# #REV(DEADBEEF)#"
foo 0XC4 yy 0X9 0XF77DB57B

Comparing speed of Haskell and C for the computation of primes

I initially wrote this (brute force and inefficient) method of calculating primes with the intent of making sure that there was no difference in speed between using "if-then-else" versus guards in Haskell (and there is no difference!). But then I decided to write a C program to compare and I got the following (Haskell slower by just over 25%) :
(Note: I got the ideas of using rem instead of mod and also the -O3 option in the compiler invocation from the following post: On improving Haskell's performance compared to C in fibonacci micro-benchmark)
Haskell : Forum.hs
divisibleRec :: Int -> Int -> Bool
divisibleRec i j
| j == 1 = False
| i `rem` j == 0 = True
| otherwise = divisibleRec i (j-1)
divisible::Int -> Bool
divisible i = divisibleRec i (i-1)
r = [ x | x <- [2..200000], divisible x == False]
main :: IO()
main = print(length(r))
C : main.cpp
#include <stdio.h>
bool divisibleRec(int i, int j){
    if(j==1){ return false; }
    else if(i%j==0){ return true; }
    else{ return divisibleRec(i,j-1); }
}

bool divisible(int i){ return divisibleRec(i, i-1); }

int main(void){
    int i, count = 0;
    for(i=2; i<200000; ++i){
        if(divisible(i)==false){
            count = count+1;
        }
    }
    printf("number of primes = %d\n",count);
    return 0;
}
The results I got were as follows :
Compilation times
time (ghc -O3 -o runProg Forum.hs)
real 0m0.355s
user 0m0.252s
sys 0m0.040s
time (gcc -O3 -o runProg main.cpp)
real 0m0.070s
user 0m0.036s
sys 0m0.008s
and the following running times :
Running times on Ubuntu 32 bit
Haskell
17984
real 0m54.498s
user 0m51.363s
sys 0m0.140s
C++
number of primes = 17984
real 0m41.739s
user 0m39.642s
sys 0m0.080s
I was quite impressed with the running times of Haskell. However my question is this: can I do anything to speed up the Haskell program without:
Changing the underlying algorithm (it is clear that massive speedups can be gained by changing the algorithm; but I just want to understand what I can do on the language/compiler side to improve performance)
Invoking the LLVM compiler (because I don't have this installed)
[EDIT : Memory usage]
After a comment by Alan I noticed that the C program uses a constant amount of memory where as the Haskell program slowly grows in memory size. At first I thought this had something to do with recursion, but gspr explains below why this is happening and provides a solution. Will Ness provides an alternative solution which (like gspr's solution) also ensures that the memory remains static.
[EDIT : Summary of bigger runs]
max number tested : 200,000:
(54.498s/41.739s) = Haskell 30.5% slower
max number tested : 400,000:
3m31.372s/2m45.076s = 211.37s/165s = Haskell 28.1% slower
max number tested : 800,000:
14m3.266s/11m6.024s = 843.27s/666.02s = Haskell 26.6% slower
[EDIT : Code for Alan]
This was the code that I had written earlier, which does not have recursion and which I had tested on 200,000:
#include <stdio.h>
bool divisibleRec(int i, int j){
    while(j>0){
        if(j==1){ return false; }
        else if(i%j==0){ return true; }
        else{ j -= 1; }
    }
    return false; /* never reached for i >= 2, but avoids falling off the end */
}

bool divisible(int i){ return divisibleRec(i, i-1); }

int main(void){
    int i, count = 0;
    for(i=2; i<8000000; ++i){
        if(divisible(i)==false){
            count = count+1;
        }
    }
    printf("number of primes = %d\n",count);
    return 0;
}
The results for the C code with and without recursion are as follows (for 800,000) :
With recursion : 11m6.024s
Without recursion : 11m5.328s
Note that the executable seems to take up 60 kB (as seen in the System Monitor) irrespective of the maximum number, and therefore I suspect that the compiler is optimizing the tail recursion into a loop.
This isn't really answering your question, but rather what you asked in a comment regarding growing memory usage when the number 200000 grows.
When that number grows, so does the list r. Your code needs all of r at the very end, to compute its length. The C code, on the other hand, just increments a counter. You'll have to do something similar in Haskell too if you want constant memory usage. The code will still be very Haskelly, and in general it's a sensible proposition: you don't really need the list of numbers for which divisible is False, you just need to know how many there are.
You can try with
main :: IO ()
main = print $ foldl' (\s x -> if divisible x then s else s+1) 0 [2..200000]
(foldl' is a stricter foldl from Data.List that avoids thunks being built up; you'll need import Data.List (foldl') at the top of the file).
Well, bang patterns give you a very small win (as does LLVM, but you seem to have expected that):
{-# LANGUAGE BangPatterns #-}
divisibleRec !i !j | j == 1 = False
And on my x86-64 I get a very big win by switching to smaller representations, such as Word32 (from Data.Word):
divisibleRec :: Word32 -> Word32 -> Bool
...
divisible :: Word32 -> Bool
My timings:
$ time ./so -- Int
2262
real 0m2.332s
$ time ./so -- Word32
2262
real 0m1.424s
This is a closer match to your C program, which is only using int. It still doesn't match performance-wise; I suspect we'd have to look at the Core to figure out why.
EDIT: and the memory use, as I see was already noted, is down to the named list r. I just inlined r, made it output a 1 for each non-divisible value and took the sum:
main = print $ sum $ [ 1 | x <- [2..800000], not (divisible x) ]
Another way to write down your algorithm is
main = print $ length [()|x<-[2..200000], and [rem x d>0|d<-[x-1,x-2..2]]]
Unfortunately, it runs slower. Using all ((>0).rem x) [x-1,x-2..2] as a test, it runs slower still. But maybe you'd test it on your setup nevertheless.
Replacing your code with explicit loop with bang patterns made no difference whatsoever:
{-# OPTIONS_GHC -XBangPatterns #-}
r4 :: Int -> Int
r4 n = go 0 2 where
  go !c i | i>n = c
          | True = go (if not(divisible i) then (c+1) else c) (i+1)

divisibleRec :: Int -> Int -> Bool
divisibleRec i !j | j == 1 = False
                  | i `rem` j == 0 = True
                  | otherwise = divisibleRec i (j-1)
When I started programming in Haskell I was also impressed about its speed. You may be interested in reading point 5 "The speed of Haskell" of this article.
