Blowfish encryption function (BF_encrypt) in C

I am trying to understand the Blowfish encryption algorithm. I am having some problems, which are described below in full detail:
Here are functions:
void BF_encrypt (data, encrypt)
BF_LONG *data;  // data is an array of two members, i.e. ti[0] and ti[1];
                // BF_LONG is defined as unsigned long in the header file
int encrypt;    // encrypt is defined as 1
{
    register BF_LONG l, r, *p, *s;
    p = key_P;        // key_P is declared as BF_LONG key_P[16 + 2];
    s = &(key_S[0]);  // key_S is declared as BF_LONG key_S[4 * 256];
    l = data[0];      // data[0] = ti[0]
    r = data[1];      // data[1] = ti[1]
    l ^= p[0];        // bitwise XOR of l with p[0]
    BF_ENC (r, l, s, p[1]);
}
And here is BF_ENC function:
#define BF_ENC(LL,R,S,P) \
LL^=P; \
LL^=((( S[ (R>>24L) ] + \
S[0x0100+((R>>16L)&0xff)])^ \
S[0x0200+((R>> 8L)&0xff)])+ \
S[0x0300+((R )&0xff)])&0xffffffff;
Here are my questions:
What does "\" mean in every ending line in BF_ENC?
R>>24L, I can get it that it is shifting to right. L is long here but I am not getting its presence here. What is the effect of L on data?
Last question is that what is BF_ENC function doing over all, just a overview not in depth.
If you need any more information, I am here.
Thanks!

BF_ENC is a C-style macro. Macros are normally delimited by the end-of-line character; the \ escape character indicates that the next line should be considered part of the same line. So everything up to 0xffffffff; is part of the macro.
The intended effect of the L is that in many languages the primitive type of a result may depend on both operands. The L makes 24 (and 16 and 8) literals of type long, which would make sure that the result is of a primitive type large enough (at least 32 bits) to store the 32-bit information.
For shifts this doesn't make much sense (you wouldn't expect a shift by more than 64 bits, so the result could simply take the primitive type of the left operand, and in ISO C it does), but language designers like symmetry, as with operators such as + and *, and dislike special cases.
Looking at a previous question I found:
yes, this was my thought as well. Afair, this really worked in K&R (1 << 1L was of type long like 1 + 1L is in ISO C), which is why I asked where the code is from...
so make sure you verify that this macro works correctly in your environment. To understand Blowfish you may be better off looking at more modern code that doesn't contain macros at all.
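As a quick check for your own environment, here is a minimal C11 sketch; both assertions hold on any conforming implementation, because ISO C gives a shift result the promoted type of its left operand:

/* C11 static assertions; both pass on any conforming compiler. */
_Static_assert(sizeof(1 << 1L) == sizeof(int), "shift result takes the left operand's type");
_Static_assert(sizeof(1 + 1L) == sizeof(long), "addition uses the usual arithmetic conversions");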
BF_ENC implements Blowfish's F function, which is the round function (Feistel function) of Blowfish.
It's on the upper right of the Wikipedia page about Blowfish and is described as:
The diagram to the upper right shows Blowfish's F-function. The function splits the 32-bit input into four eight-bit quarters, and uses the quarters as input to the S-boxes. The outputs are added modulo 2^32 and XORed to produce the final 32-bit output.
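For reference, here is the same F function written as a plain C function rather than a macro; a minimal sketch assuming a flat 4 x 256-entry S-box array laid out like the question's key_S:

#include <stdint.h>

/* Blowfish's F function: split the 32-bit input into four bytes, use each
 * byte to index one of the four S-boxes, then combine the lookups with
 * addition and XOR (uint32_t arithmetic is already modulo 2^32). */
static uint32_t blowfish_f(const uint32_t *s, uint32_t x)
{
    uint32_t a = s[          (x >> 24)        ];
    uint32_t b = s[0x0100 + ((x >> 16) & 0xff)];
    uint32_t c = s[0x0200 + ((x >>  8) & 0xff)];
    uint32_t d = s[0x0300 + ( x        & 0xff)];
    return ((a + b) ^ c) + d;
}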

Related

FIFO implementation in C

I am analysing an Internet guide where I found code like this. Can somebody explain to me the usage of the ~ and & operators?
Thanks in advance
uint8_t tx_fifo_put(tx_dataType data)
{
    /* Check if FIFO is full */
    if ((tx_put_itr - tx_get_itr) & ~(TXFIFOSIZE - 1))
    {
        /* FIFO full - return TXFAIL */
        return (TXFAIL);
    }
    /* Put data into FIFO */
    TX_FIFO[tx_put_itr & (TXFIFOSIZE - 1)] = data;
    /* Increment iterator */
    tx_put_itr++;
    return (TXSUCCESS);
}
The code is an obfuscated way of writing something that could be far more readable.
As a commenter wrote before me, TX_FIFO[tx_put_itr & (TXFIFOSIZE - 1)] = data; wraps the index around the buffer. As was also mentioned in the comments, the code requires TXFIFOSIZE to be a power of two.
I do not know why it is done this way; to me TX_FIFO[tx_put_itr % TXFIFOSIZE] = data does the same thing but is more readable. Also, one expects predicate checks to come before data access; at least that is my habit.
The (w - r) & ~(TXFIFOSIZE - 1) part is a way to check for (1) w < r and (2), as an edge case, w being equal to TXFIFOSIZE while r is zero. Semantically it means: "if the write pointer is at the boundary and the read pointer is at the start of the buffer, we suggest that, for our data structure, the next write could be an overflow."
Let us look at some numbers and their binary representations.
Let s = 8 - 1, which in binary is 00000111; negated, ~s is 11111000.
Let w = 0 and r = 1.
In binary, w = 00000000 and r = 00000001.
w - r = 11111111 (in 8-bit two's complement); bitwise-AND that with ~(8 - 1) and you get a value other than zero.
Continuing the logic for the w < r case: any negative difference will set some of the high bits, so the test in the OP's if statement is definitely true.
The w = r case contributes no bits to the boolean test, since the difference is zero.
And the last case:
let s = 8,
let w = 8,
let r = 0.
w - r = 00001000
~(8 - 1) = 11111000
(w - r) & ~7 = 00001000
All other cases where w > r (with w - r < 8) give zero.
Update
To my great grief, user UkropUkraine has deleted all his comments and his answer. There was some discussion there about the claim that one can use (w - r) >= mask in place of (w - r) & mask.
Here I present code and an explanation showing that this is not an optimization, or a mere syntactic choice, or whatever else came to the mind of the person who wrote the OP's code. It is intentional code. And it fails its purpose: to act as a FIFO or circular queue, or whatever that part of the code was meant to do.
First, take an example of usage, the part where user UkropUkraine had difficulties: the w pointer can be less than the r pointer, and the result of w - r will then be negative.
The common usage is to add a byte to the buffer and wrap the write pointer as soon as it reaches the end. Imagine a situation where the w pointer has already wrapped.
#include <stdio.h>

int main(void)
{
    unsigned char w = 0, r = 1;
    int res;

    res = (w - r) & 0xffffffff;
    printf("%d\n", res);
    return 0;
}
-1
I do not know what the common boolean result type is on microcontrollers. On a common x86 C machine it is int. So I expect the expression in if ((w - r) & ~(TXFIFOSIZE - 1)) to be evaluated as an int, and the result is negative. You cannot simply rewrite it with >=, >, or ==, as was claimed in the comments and in the other answer here.
More than that, the code fails its own semantics. It is meant to be a FIFO, or something like one. But in the situation above, the read pointer still has sensible data to read, and reading is possible because the write pointer, even though it has wrapped, has not yet overwritten the read portion of the buffer. Yet the code returns TXFAIL as if the buffer were full.
I thought about the read/write pointers moving in different directions, but that does not change anything. The code the OP gave fails to do what one would expect.
Maybe I am missing some insight here, as user UkropUkraine and the OP point out that they understand the code's semantics; the OP just did not get the ~ and & usage. Well, that is the answer: the & ~ combination is used to test for a negative difference and for the edge cases.
The two operators:
& is the bitwise AND operator
~ is the bitwise complement (NOT) operator
Now for the posted code it's important to notice that TXFIFOSIZE must have a value which is a power of 2, i.e. values like 2, 4, 8, 16, 32, ...
When that is true, the code:
TX_FIFO[tx_put_itr & (TXFIFOSIZE - 1)] = data;
is equivalent to:
TX_FIFO[tx_put_itr % TXFIFOSIZE] = data;
Notice that tx_put_itr is incremented in such a way that it takes values higher than TXFIFOSIZE. So in order to get a valid array index, the code must find the remainder of tx_put_itr with respect to TXFIFOSIZE.
So how does it work? Why are the above lines equivalent?
Let's take a value as an example.
Assume TXFIFOSIZE is 8 (2 to the power of 3)
So TXFIFOSIZE-1 is 7
7 is bitwise 00....00111
And when you do:
SOME_NUMBER & 00....00111
You keep the 3 least significant bits of SOME_NUMBER
And that is exactly the remainder when dividing by 8
So let's look at
if((tx_put_itr - tx_get_itr) & ~(TXFIFOSIZE-1))
It is equivalent to
if((tx_put_itr - tx_get_itr) >= TXFIFOSIZE)
So it checks for "FIFO full"
Again using an example it works like this:
Assume TXFIFOSIZE is 8 (2 to the power of 3)
So TXFIFOSIZE-1 is 7
7 is bitwise 00....00111
~7 is bitwise 11....11000
And when you do:
SOME_NUMBER & 11....11000
You clear the 3 least significant bits of SOME_NUMBER and keep the rest unchanged
So if the result is non-zero it means that the difference between
tx_put_itr and tx_get_itr is 8 (or more).
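A small self-contained program to check both equivalences (a sketch; the constant mirrors the question's code):

#include <stdio.h>

#define TXFIFOSIZE 8u   /* must be a power of two */

int main(void)
{
    /* Masking equals modulo for a power-of-two size. */
    for (unsigned i = 0; i < 20; i++)
        printf("%2u %% 8 = %u,  %2u & 7 = %u\n",
               i, i % TXFIFOSIZE, i, i & (TXFIFOSIZE - 1));

    /* The full test: non-zero exactly when 8 or more elements are unread. */
    unsigned put = 12, get = 4;   /* 8 unread elements */
    printf("full? %s\n", ((put - get) & ~(TXFIFOSIZE - 1)) ? "yes" : "no");
    return 0;
}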

Calculate a third-degree polynomial with C on an STM32 microcontroller

I'm trying to implement the following function to be calculated by my STM32
y=0.0006*x^3 - 0.054*x^2 + 2.9094*x - 2.3578
x is in range 0 to 215
To avoid the use of pow or any other functions, I have written the following code
static double tmp = 0;
static unsigned short c_m;
static unsigned short c_m_3;
static unsigned short c_m_2;
c_m_3 = c_m*c_m*c_m;
c_m_2 = c_m*c_m;
tmp = (0.0006*c_m_3) - (0.054*c_m_2) + (2.9094*c_m) - 2.3578;
dati->sch_g = tmp;
For some reason the calculation is totally wrong; for instance, if c_m = 1 I should get tmp = 0.4982, but instead I get 13.
Am I missing something?
As noted by Lundin in the comments, your microcontroller (ARM Cortex-M0) doesn't provide a floating point unit. As a consequence you cannot rely on native floating point math, but need to rely on a floating point software library, e.g. this one (note: I didn't evaluate it; it was just the very first I stumbled upon in a quick search!).
Alternatively – and likely preferably – you might want to do the calculations in plain integers; if you additionally convert your calculation from the pattern a*x*x*x + b*x*x + c*x + d to ((a*x + b)*x + c)*x + d, you even spare some multiplications:
int32_t c_m = ...;
c_m = ((6 * c_m - 540) * c_m + 29094) * c_m - 23578;
Note: unsigned short would be too small to hold the result on STM32, so you need to switch to 32 bit at least! Additionally you need a signed value to be able to hold the negative result arising from c_m == 0.
Your results will now be too large by a factor of 10000, of course. As the use case is unclear, it remains open how you want to deal with that: possibly rounding (c_m = (c_m + 5000) / 10000) or evaluating the fractional part by other means.
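Putting the pieces together, a minimal sketch of the scaled evaluation with rounding (the intermediate value stays well inside int32_t, peaking near 40.9 million at x = 215):

#include <stdint.h>
#include <stdio.h>

/* Evaluate y = 0.0006x^3 - 0.054x^2 + 2.9094x - 2.3578 as 10000*y in
 * integers, then round back to whole units. */
static int32_t poly_rounded(int32_t x)
{
    int32_t y10k = ((6 * x - 540) * x + 29094) * x - 23578;
    return (y10k + 5000) / 10000;   /* round to nearest, for y10k >= 0 */
}

int main(void)
{
    printf("%d\n", poly_rounded(100));   /* 3485822/10000 -> prints 349 */
    return 0;
}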
short is 16 bits on all STM32, so the value 215 * 215 * 215 = 9,938,375 will not fit inside one. c_m_3 = c_m*c_m*c_m; truncates the result modulo USHRT_MAX + 1 (65536):
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Use uint32_t instead.
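For instance (a small sketch):

#include <stdint.h>

/* With 32-bit arithmetic the cube fits: 215^3 = 9,938,375 < 2^31. */
uint32_t cube(uint32_t c_m)
{
    return c_m * c_m * c_m;   /* no modulo-65536 truncation */
}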
short is only 16 bits; the maximum value it can hold is 65535. Therefore the cube will overflow if the number you want to raise to the third power is over 40. This means that you must use a larger variable type like uint32_t.
You can also use if checks to detect overflow, which is good programming practice.
As another note, it's better to use "uint8_t" and "uint16_t" instead of "unsigned char" and "unsigned short" in embedded programming, because they're more descriptive of the data sizes.

Generating (very) large non-repeating integer sequence without pre-shuffling

Background
I have a simple media client/server I've written, and I want to generate a non-obvious time value I send with each command from the client to the server. The timestamps will have a fair bit of data in them (nano-second resolution, even if it's not truly accurate, due to limitations of timer sampling in modern operating systems), etc.
What I'm trying to do (on Linux, in C) is to generate a one-to-one sequence of n-bit values (let's assume the data is stored in 128-bit array-of-int elements for now) with no overlapping/colliding values. I would then take a pseudo-random 128-bit value/number as a "salt", apply it to the timestamp, and then start sending off commands to the server, incrementing the pre-salted/pre-hashed value.
The reason the timestamp size is so large is because the timestamp may have to accommodate a very large duration of time.
Question
How could I accomplish such a sequence (non-colliding) with an initial salt value? The best approach that sounds along the lines of my goal is from this post, which notes:
If option 1 isn't "random" enough for you, use the CRC-32 hash of said
global (32-bit) counter. There is a 1-to-1 mapping (bijection) between
N-bit integers and their CRC-N so uniqueness will still be guaranteed.
However, I do not know:
If that can (efficiently) be extended to 128-bit data.
If some sort of addition-to/multiplication-by salt-value to provide the initial seed for the sequence would disrupt it or introduce collisions.
Follow-up
I realize that I could use a 128-bit random hash from libssl or something similar, but I want the remote server, using the same salt value, to be able to convert the hashed timestamps back into their true values.
Thank you.
You could use a linear congruential generator. With the right parameters, it is guaranteed to produce non-repeating [unique] sequences with a full period (i.e. no collisions).
This is what random(3) uses in TYPE_0 mode. I adapted it for the full unsigned int range, and the seed can be any unsigned int (see my sample code below).
I believe it can be extended to 64 or 128 bits. Have a look at https://en.wikipedia.org/wiki/Linear_congruential_generator for the constraints on the parameters that prevent collisions and give good randomness.
Following the wiki page guidelines, you could produce one that can take any 128 bit value as the seed and will not repeat until all possible 128 bit numbers have been generated.
You may need to write a program to generate suitable parameter pairs and then test them for the "best" randomness. This would be a one time operation.
Once you've got them, just plug these parameters into your equation in your actual application.
Here's some code of mine that I had been playing with when I was looking for something similar:
// _prngstd -- get random number
static inline u32
_prngstd(prng_p prng)
{
    long rhs;
    u32 lhs;

    // NOTE: random is faster and has a _long_ period, but it _only_ produces
    // positive integers but jrand48 produces positive _and_ negative
#if 0
    rhs = jrand48(btc->btc_seed);
    lhs = rhs;
#endif

    // this has collisions
#if 0
    rhs = rand();
    PRNG_FLIP;
#endif

    // this has collisions because it defaults to TYPE_3
#if 0
    rhs = random();
    PRNG_FLIP;
#endif

    // this is random in TYPE_0 (linear congruential) mode
#if 0
    prng->prng_state = ((prng->prng_state * 1103515245) + 12345) & 0x7fffffff;
    rhs = prng->prng_state;
    PRNG_FLIP;
#endif

    // this is random in TYPE_0 (linear congruential) mode with the mask
    // removed to get full range numbers
    // this does _not_ produce overlaps
#if 1
    prng->prng_state = ((prng->prng_state * 1103515245) + 12345);
    rhs = prng->prng_state;
    lhs = rhs;
#endif

    return lhs;
}
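To sketch the 64-bit extension mentioned above, here is a full-period 64-bit LCG using Knuth's MMIX constants (my choice, not the parameters from the snippet above); they satisfy the Hull-Dobell conditions for m = 2^64, so every 64-bit state appears exactly once per period:

#include <stdint.h>

/* Full-period 64-bit LCG: c is odd and a - 1 is divisible by 4, so by the
 * Hull-Dobell theorem there are no collisions within 2^64 outputs. */
static uint64_t lcg_state = 1;   /* seed: any 64-bit value works */

static uint64_t lcg_next(void)
{
    lcg_state = lcg_state * 6364136223846793005ULL + 1442695040888963407ULL;
    return lcg_state;
}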
The short answer is encryption. Take your set of 128-bit values, feed them into AES, and you get a different set of 128-bit values out. Because encryption is reversible, the outputs are guaranteed unique for unique inputs under a fixed key.
Encryption is a reversible one-to-one mapping of the input values to the output values, each set is a full permutation of the other.
Since you are presumably not repeating your inputs, then ECB mode is probably sufficient, unless you want a greater degree of security. ECB mode is vulnerable if used repeatedly with identical inputs, which does not appear to be the case here.
For inputs shorter than 128 bits, use a fixed padding method to bring them to the right length. As long as the uniqueness of the inputs is not affected, the padding can be reasonably flexible. Zero padding, at either end (or at the beginning of internal fields), may well be sufficient.
I do not know your detailed requirements, so feel free to modify my advice.
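As an illustration of the padding point, a fixed zero-padding of a 64-bit timestamp into a 16-byte (AES-block-sized) buffer might look like this (a sketch; the little-endian layout is an arbitrary choice, just keep it consistent on both ends):

#include <stdint.h>
#include <string.h>

/* Zero-pad a 64-bit timestamp into a 16-byte AES block. Distinct
 * timestamps give distinct blocks, so uniqueness is preserved. */
void pad_timestamp(uint64_t ts, uint8_t block[16])
{
    memset(block, 0, 16);
    for (int i = 0; i < 8; i++)
        block[i] = (uint8_t)(ts >> (8 * i));   /* little-endian layout */
}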
Somewhere between linear congruential generators and encryption functions there are hashes that can convert linear counts into passable pseudorandom numbers.
If you happen to have 128-bit integer types handy (e.g., __int128 in GCC when building for a 64-bit target), or are willing to implement such long multiplies by hand, then you could build on the construction used in SplitMix64. I did a fairly superficial search and came up with the following parameters:
uint128_t mix(uint128_t x) {
    uint128_t m0 = (uint128_t)0xecfb1b9bc1f0564f << 64
                 | 0xc68dd22b9302d18d;
    uint128_t m1 = (uint128_t)0x4a4cf0348b717188 << 64
                 | 0xe2aead7d60f8a0df;
    x ^= x >> 59;
    x *= m0;
    x ^= x >> 60;
    x *= m1;
    x ^= x >> 84;
    return x;
}
and its inverse:
uint128_t unmix(uint128_t x) {
    uint128_t im0 = (uint128_t)0x367ce11aef44b547 << 64
                  | 0x424b0c012b51d945;
    uint128_t im1 = (uint128_t)0xef0323293e8f059d << 64
                  | 0x351690f213b31b1f;
    x ^= x >> 84;
    x *= im1;
    x ^= x >> 60 ^ x >> (2 * 60);
    x *= im0;
    x ^= x >> 59 ^ x >> (2 * 59);
    return x;
}
I'm not sure if you wanted just a random sequence or a way to obfuscate an arbitrary timestamp (since you said you wanted to decode the values, they must be more interesting than a linear counter), but one derives from the other simply enough:
uint128_t encode(uint128_t time, uint128_t salt) {
    return mix((time + 1) * salt);
}

uint128_t generate(uint128_t salt) {
    static uint128_t t = 0;
    return encode(t++, salt);
}
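/* Newton-Raphson inversion of an odd 128-bit value modulo 2^128: each
 * iteration doubles the number of correct low-order bits, so the loop
 * terminates within a handful of steps (the salt must be odd). */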
static uint128_t inv(uint128_t d) {
    uint128_t i = d;
    while (i * d != 1) {
        i *= 2 - i * d;
    }
    return i;
}

uint128_t decode(uint128_t etime, uint128_t salt) {
    return unmix(etime) * inv(salt) - 1;
}
Note that salt chooses one of 2^127 sequences of non-repeating 128-bit values (we lose one bit because salt must be odd), but there are (2^128)! possible sequences that could have been generated. Elsewhere I'm looking at extending the parameterisation so that more of these sequences can be visited, but I started goofing around with the above method for increasing the randomness of the sequence to hide any problems where the parameters could pick not-so-random (but provably distinct) sequences.
Obviously uint128_t isn't a standard type, and so my answer is not C, but you can use either a bignumber library or a compiler extension to make the arithmetic work. For clarity I relied on the compiler extension. All the operations rely on C-like unsigned overflow behaviour (take the low-order bits of the arbitrary-precision result).
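A quick round-trip check of the above (a sketch assuming the GCC/Clang extension, with the typedef placed before the functions above):

typedef unsigned __int128 uint128_t;   /* compiler extension, 64-bit targets */

int main(void)
{
    uint128_t salt = (((uint128_t)0x0123456789abcdefULL << 64)
                      | 0xfedcba9876543210ULL) | 1;   /* salt must be odd */
    uint128_t t = 42;
    uint128_t e = encode(t, salt);
    return decode(e, salt) == t ? 0 : 1;   /* exits 0 on success */
}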

Notation for fixed point representation

I'm looking for a commonly understandable notation to define a fixed point number representation.
The notation should be able to define both a power-of-two factor (using fractional bits) and a generic factor (sometimes I'm forced to use the latter, though it is less efficient). An optional offset should also be definable.
I already know some possible notations, but all of them seem to be constrained to specific applications.
For example the Simulink notation would perfectly fit my needs, but it's known only in the Simulink world. Furthermore the overloaded usage of the fixdt() function is not so readable.
TI defines a really compact Q format, but the sign is implicit, and it doesn't handle a generic factor (i.e. one that is not a power of two).
ASAM uses a generic 6-coefficient rational function with 2nd-degree numerator and denominator polynomials (COMPU_METHOD). Very generic, but not so friendly.
See also the Wikipedia discussion.
The question is only about the notation (not the efficiency of the representation nor fixed-point manipulation). So it's a matter of code readability, maintainability and testability.
Ah, yes. Having good naming annotations is absolutely critical to not introducing bugs with fixed point arithmetic. I use an explicit version of the Q notation which handles any split between M integer and N fractional bits, by appending _Q<M>_<N> to the name of the variable. This also makes it possible to include the signedness. There is no run-time performance penalty for this. Example:
uint8_t length_Q2_6; // unsigned, 2 bit integer, 6 bit fraction
int32_t sensor_calibration_Q10_21; // signed (1 bit), 10 bit integer, 21 bit fraction.
/*
* Calculations with the bc program (with '-l' argument):
*
* sqrt(3)
* 1.73205080756887729352
*
* obase=16
* sqrt(3)
* 1.BB67AE8584CAA73B0
*/
const uint32_t SQRT_3_Q7_25 = 1 << 25 | 0xBB67AE85U >> 7; /* Unsigned shift super important here! */
In case someone has not fully understood why such annotation is extremely important:
Can you spot the bug, if there is one, in the following two examples?
Example 1:
speed_fraction = fix32_udiv(25, speed_percent << 25, 100 << 25);
squared_speed = fix32_umul(25, speed_fraction, speed_fraction);
tmp1 = fix32_umul(25, squared_speed, SQRT_3);
tmp2 = fix32_umul(12, tmp1 >> (25-12), motor_volt << 12);
Example 2:
speed_fraction_Q7_25 = fix32_udiv(25, speed_percent << 25, 100 << 25);
squared_speed_Q7_25 = fix32_umul(25, speed_fraction_Q7_25, speed_fraction_Q7_25);
tmp1_Q7_25 = fix32_umul(25, squared_speed_Q7_25, SQRT_3_Q1_31);
tmp2_Q20_12 = fix32_umul(12, tmp1_Q7_25 >> (25-12), motor_volt << 12);
Imagine if one file contained #define SQRT_3 (1 << 25 | 0xBB67AE85U >> 7) and another file contained #define SQRT_3 (1 << 31 | 0xBB67AE85U >> 1) and code was moved between those files. In example 1 this has a high chance of going unnoticed, introducing the bug that is present in example 2. In example 2 the bug is deliberate, and the annotation makes it so visible that it has close to zero chance of being introduced accidentally.
Actually, Q format is the most used representation in commercial applications: you use it when you need to deal with fractional numbers fast and your processor does not have an FPU (floating point unit), i.e. it cannot use the float and double data types natively; it has to emulate instructions for them, which is very expensive.
Usually you use Q format to represent only the fractional part, though this is not a must; that way you get more precision. Here's what you need to consider:
the number of bits you use (Q15 uses 16-bit data types, usually short int)
the first bit is the sign bit (out of 16 bits you are left with 15 for the data value)
the rest of the bits are used to store the fractional part of your number.
since you are representing fractional numbers, your value is somewhere in [0, 1)
you can choose to use some bits for the integer part as well, but you would lose precision; e.g. if you wanted to represent 3.3 in Q format, you would need 1 bit for the sign, 2 bits for the integer part, and would be left with 13 bits for the fractional part (assuming a 16-bit representation). This format is called 2Q13.
Example: say you want to represent 0.3 in Q15 format; you apply the rule of three:
1 = 2^15 = 32768 = 0x8000
0.3 = X
-------------
X = 0.3 * 32768 = 9830 = 0x2666
You lost precision by doing this but at least the computation is fast now.
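The same rule of three, written as a minimal sketch in code:

#include <stdint.h>

/* Q15: 1.0 maps to 2^15 = 32768; representable values lie in [-1, 1). */
static int16_t float_to_q15(float x)   { return (int16_t)(x * 32768.0f); }
static float   q15_to_float(int16_t q) { return q / 32768.0f; }

/* float_to_q15(0.3f) == 9830 == 0x2666, matching the example above. */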
In C, you can't use a user-defined type like a builtin one. If you want to do that, you need to use C++. In that language you can define a class for your fixed point type and overload all the arithmetic operators (+, -, *, /, %, +=, -=, *=, /=, %=, --, ++, casts to other types), so that instances of this class really behave like the builtin types.
In C, you need to do what you want explicitly. There are two basic approaches.
Approach 1: Do the fixed point adjustments in the user code.
This is overhead-free, but you need to remember to do the correct adjustments. I believe it is easiest to just add the number of past-point bits to the end of the variable name, because the type system won't do you much good, even if you typedef'd all the point positions you use. Here is an example:
int64_t a_7 = (int64_t)(7.3*(1<<7)); //a variable with 7 past point bits
int64_t b_5 = (int64_t)(3.78*(1<<5)); //a variable with 5 past point bits
int64_t sum_7 = a_7 + (b_5 << 2); //to add those two variables, we need to adjust the point position in b
int64_t product_12 = a_7 * b_5; //the product produces a number with 12 past point bits
Of course, this is a lot of hassle, but at least you can easily check at every point whether the point adjustment is correct.
Approach 2: Define a struct for your fixed point numbers and encapsulate the arithmetic on it in a bunch of functions. Like this:
typedef struct FixedPoint {
    int64_t data;
    uint8_t pointPosition;
} FixedPoint;

FixedPoint fixed_add(FixedPoint a, FixedPoint b) {
    if (a.pointPosition >= b.pointPosition) {
        return (FixedPoint){
            .data = a.data + (b.data << (a.pointPosition - b.pointPosition)),
            .pointPosition = a.pointPosition
        };
    } else {
        return (FixedPoint){
            .data = (a.data << (b.pointPosition - a.pointPosition)) + b.data,
            .pointPosition = b.pointPosition
        };
    }
}
This approach is a bit cleaner in the usage, however, it introduces significant overhead. That overhead consists of:
The function calls.
The copying of the structs for parameter and result passing, or the pointer dereferences if you use pointers.
The need to calculate the point adjustments at runtime.
This is pretty much the overhead of a C++ class without templates. Using templates would move some decisions back to compile time, at the cost of losing flexibility.
This object based approach is probably the most flexible one, and it allows you to add support for non-binary point positions in a transparent way.
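For completeness, multiplication under the same struct is simpler than addition, since the raw values multiply and the point positions add (a sketch; rescaling and overflow handling are left to the caller):

FixedPoint fixed_mul(FixedPoint a, FixedPoint b) {
    return (FixedPoint){
        .data = a.data * b.data,   /* may overflow for large operands */
        .pointPosition = a.pointPosition + b.pointPosition
    };
}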

How does this unique file header 'special marker' tell eCryptfs that it's an eCryptfs file?

I am writing a program (using Free Pascal, not C) that parses the header of eCryptfs files.
One of the values in the header starts at byte 7 and ends at 15 (an 8 byte value). It is of use to me because it's a value that uniquely identifies the files as eCryptfs files. So I am trying to code my app to recognise it when it finds such values in files.
However, the marker itself is generated by XORing a randomly generated 4-byte value (X) with another static 4-byte hex value, 0x3c81b7f5 (Y). The generated value, Z, is 4 bytes. X + Z together form the 8 byte special marker. Y itself is not stored in the file's header. So, seeing as the value 0x3c81b7f5 (Y) is never stored in the header, I can't code my application to look for it, and seeing as the other 4 bytes are the XOR result of a static value with a random one, I can't work out how it's recognised.
Having asked at the eCryptfs Launchpad site (https://answers.launchpad.net/ecryptfs/+question/152821) how the eCryptfs program recognises this value as "an eCryptfs file", one of the community members referred me to the relevant C source code, which I have linked to below. However, I don't understand C well enough to work out how it recognises the special markers. Can anyone help me so I can code the same kind of recognition process into my own app? I don't want source code; I just want someone to explain how the C code works out "Ah yes, that's an eCryptfs file right there!" so I know what I need to code my app to do.
http://fxr.watson.org/fxr/source/fs/ecryptfs/crypto.c?v=linux-2.6;im=excerpts#L1029
What you're really interested in is this part here:
m_1 = get_unaligned_be32(data);
m_2 = get_unaligned_be32(data + 4);
if ((m_1 ^ MAGIC_ECRYPTFS_MARKER) == m_2)
return 1;
The get_unaligned_be32 function just converts four bytes from
data to an unsigned four byte integer with possible byte order
adjustments. The data + 4 in the second call to get_unaligned_be32
moves the address passed to get_unaligned_be32 up by four bytes:
[0][1][2][3][4][5][6][7]
 ^           ^
 data        data + 4
 m_1         m_2          /* after adjusting byte order */
So, the first two lines just pull two unsigned integers out of the
first eight bytes of data (possibly with byte order fixes).
Then we have this expression:
(m_1 ^ MAGIC_ECRYPTFS_MARKER) == m_2
The ^ is just the XOR operator and MAGIC_ECRYPTFS_MARKER is
0x3c81b7f5 so this test is just XORing m_1 and 0x3c81b7f5 and
seeing if it is equal to m_2; if this comparison is true then you
have the right type of file.
If by "X + Z together form the 8 byte special marker" you mean that they're concatenated, then you have the following:
Y = 0x3c81b7f5
X = some random value
Z = X ^ Y (also a random value, due to X being random)
Now, you have X and Z from the file header. If you XOR them together, the following relation holds:
X ^ Z = X ^ (X ^ Y) = Y
So you should end up with the value 0x3c81b7f5.
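In C, the whole recognition test condenses to a few lines; a sketch mirroring the kernel code, with get_unaligned_be32 replaced by a portable byte-wise read:

#include <stdint.h>

#define MAGIC_ECRYPTFS_MARKER 0x3c81b7f5u

/* Read four bytes as a big-endian 32-bit value. */
static uint32_t be32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16
         | (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

/* data points at the 8 marker bytes; returns 1 for an eCryptfs marker. */
int is_ecryptfs_marker(const uint8_t *data)
{
    uint32_t m_1 = be32(data);
    uint32_t m_2 = be32(data + 4);
    return (m_1 ^ MAGIC_ECRYPTFS_MARKER) == m_2;
}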
