#define in #define; what happens in the preprocessor?

If I have:
#define X 5
#define Y X
what happens in the preprocessor for that kind of thing?
Does it go through the whole file and change every X to 5,
then come back up to the next define and change every Y to 5 (because in the previous pass Y became 5)?

The C standard has a special terminology for how macros are expanded.
Macro names are, in effect, stored in a big table of "all macros defined at this time". Each table entry has the "macro name" on the left (and any arguments in the middle) and the "expansion token-stream" on the right.
When a macro is to be expanded (because it occurs in some non-preprocessor line, or in a position in a preprocessor line in which it must be expanded -- e.g., you can #define STDIO <stdio.h> and then #include STDIO), the table entry is "painted blue", and then the replacement token-stream is read (with argument expansion as also dictated by the standard).
If the replacement token-stream contains the original macro name, it no longer matches, because the "blue paint" covers up the name.
When the replacement token-stream has been fully handled, the "blue paint" is removed, re-exposing the name.
Thus:
#define X 5
adds X: (no arguments), 5 to the table.
Then:
#define Y X
adds: Y: (no arguments), X to the table.
Somewhere later in the file, you might have an occurrence of the token Y. Assuming neither of the above have been #undefed (removed from the table), the compiler must first "paint the table entry for Y blue" and replace the token Y with the token X.
Next, the compiler must "paint the table entry for X blue" and replace the token X with the token 5.
The token 5 is not a preprocessor macro name (it can't be by definition) so the token 5 passes beyond the reach of the preprocessing phase. Now the "blue paint" is removed from the X table entry, as that's done; and then the "blue paint" is removed from the Y entry, which is also done.
If you were to write instead:
#define Y Y, Y, Y, the letter is called Y!
then the sequence, on encountering a later token Y, would be:
paint entry for Y blue
drop in replacement token sequence: Y , Y , Y , the letter is called Y !
check each replacement token in the table -- since Y is painted blue those don't match, and , cannot match, so those are all passed on to the rest of the compiler; the, letter, is, and called must be checked but are probably not in the table and hence passed on; Y is still painted blue so does not match and is passed on, and ! cannot match and is passed on.
remove blue paint, restoring the expansion
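The behaviour described above can be checked with a small C sketch. The helper names (STR, STR_, ALIAS, LATE, depth and the functions) are made up for illustration and are not part of the question. It shows that Y resolves to 5 at the point of use, that definition order does not matter, and that a macro name inside its own replacement is left alone.

```c
#define X 5
#define Y X

/* Two-level stringification: STR expands its argument first, STR_ does not. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Y -> X -> 5, resolved where Y is used, not where Y is defined. */
int value_of_y(void) { return Y; }

/* The argument of STR is fully expanded before it is stringified. */
const char *y_rescanned(void) { return STR(Y); }

/* The operand of # is not expanded, so this yields the spelling "Y". */
const char *y_raw(void) { return STR_(Y); }

/* Definition order does not matter: LATE is defined after ALIAS. */
#define ALIAS LATE
#define LATE 7
int value_of_alias(void) { return ALIAS; }

/* Self-reference: inside its own expansion, `depth` is "painted blue"
   and is not expanded again, so it refers to the variable. */
static int depth = 10;
#define depth (depth + 1)
int self_reference(void) { return depth; } /* expands to (depth + 1) */
```

Here value_of_y() returns 5, y_rescanned() returns "5", y_raw() returns "Y", value_of_alias() returns 7 and self_reference() returns 11.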

proof or disproof adding two minimal values to a B tree and then delete them

I came across a question and I'm not sure about the right answer:
We insert two new minimal values w and z, with w > z, in a B tree --
first we insert w and then z. Right afterwards we delete them in the same order. Does the original B tree structure stay the same, or do we get a different order in the tree?
It is not guaranteed that the B-tree remains the same. It would be guaranteed if the deletions happened in the opposite order as the insertions, but if the order is:
Insert w
Insert z
Delete w
Delete z
...then it depends on implementation choices, notably how the deletion of a value that occurs in a non-leaf node is dealt with.
Here is a counter example 2-3 tree, i.e. a B-tree of order 3:
      [5 , -]
     /       |
 [4,-]     [6,7]
So we have a root with (separator) value 5 and an empty slot. There are two leaves: the left leaf is half full, with value 4, while the right leaf is completely occupied with values 6 and 7.
Now let w=2 and z=1.
After we insert 2, we get this tree -- nothing special happens:
      [5 , -]
     /       |
 [2,4]     [6,7]
Then, to insert 1, we must split the leftmost leaf, and move 2 up to the parent node as a separator value:
        [2 , 5]
       /   |   \
 [1,-]  [4,-]  [6,7]
Now we get to the critical part: the deletion of 2 gives us a choice. Wikipedia describes that choice as follows:
Choose a new separator (either the largest element in the left subtree or the smallest element in the right subtree), remove it from the leaf node it is in, and replace the element to be deleted with the new separator.
If we choose the second option, then that means we choose 4 as new separator value to replace the value 2. This gives us the following intermediate situation:
        [4 , 5]
       /   |   \
 [1,-]  [-,-]  [6,7]
The empty leaf in the middle is underflowing, so we try to rotate. We must perform a rotation with the right-sided neighbor, as the other one does not have enough values, and so we move the 6 up, and the 5 down:
        [4 , 6]
       /   |   \
 [1,-]  [5,-]  [7,-]
...and the tree is valid again, but it is not the original tree. Deleting 1 does not restore it either: with the usual merge rule, the emptied left leaf merges with its sibling [5,-] and the separator 4, leaving a root [6,-] with leaves [4,5] and [7,-].
So this one counterexample is enough to show that the claim is false.
If, however, we are told that the algorithm always takes the first alternative when deleting an internal value, then the claim seems to be true.
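The point of the counterexample can be checked mechanically. The sketch below is not a B-tree implementation. It only hard-codes two shapes: the original tree, and the tree that remains once 1 is deleted as well, assuming the usual merge rule (root [6,-] with leaves [4,5] and [7,-]). The type Tree23 and all function names are hypothetical. It confirms that both shapes are valid 2-3 trees holding the same keys, yet differ structurally.

```c
#include <stdbool.h>

/* Two-level 2-3 tree: a root with 1-2 keys and nkeys+1 leaves of 1-2 keys. */
typedef struct {
    int nkeys;      /* number of keys in the root (1 or 2) */
    int keys[2];    /* root (separator) keys */
    int leaf_n[3];  /* number of keys in each leaf */
    int leaf[3][2]; /* leaf keys */
} Tree23;

/* In-order traversal into out[]; returns the number of keys (at most 8). */
static int flatten(const Tree23 *t, int out[8])
{
    int n = 0;
    for (int c = 0; c <= t->nkeys; c++) {
        for (int k = 0; k < t->leaf_n[c]; k++)
            out[n++] = t->leaf[c][k];
        if (c < t->nkeys)
            out[n++] = t->keys[c];
    }
    return n;
}

/* Node sizes within 2-3 limits and keys strictly increasing in order. */
bool is_valid(const Tree23 *t)
{
    if (t->nkeys < 1 || t->nkeys > 2)
        return false;
    for (int c = 0; c <= t->nkeys; c++)
        if (t->leaf_n[c] < 1 || t->leaf_n[c] > 2)
            return false;
    int buf[8];
    int n = flatten(t, buf);
    for (int i = 1; i < n; i++)
        if (buf[i - 1] >= buf[i])
            return false;
    return true;
}

/* True if both trees are valid and store exactly the same keys. */
bool same_keys(const Tree23 *a, const Tree23 *b)
{
    if (!is_valid(a) || !is_valid(b))
        return false;
    int x[8], y[8];
    int n = flatten(a, x);
    if (n != flatten(b, y))
        return false;
    for (int i = 0; i < n; i++)
        if (x[i] != y[i])
            return false;
    return true;
}

/* True if root keys and every leaf match position by position. */
bool same_shape(const Tree23 *a, const Tree23 *b)
{
    if (a->nkeys != b->nkeys)
        return false;
    for (int i = 0; i < a->nkeys; i++)
        if (a->keys[i] != b->keys[i])
            return false;
    for (int c = 0; c <= a->nkeys; c++) {
        if (a->leaf_n[c] != b->leaf_n[c])
            return false;
        for (int k = 0; k < a->leaf_n[c]; k++)
            if (a->leaf[c][k] != b->leaf[c][k])
                return false;
    }
    return true;
}

/* Original tree: root [5,-], leaves [4,-] and [6,7]. */
const Tree23 original_tree = { 1, {5, 0}, {1, 2, 0}, {{4, 0}, {6, 7}, {0, 0}} };
/* After both deletions (assumed merge rule): root [6,-], leaves [4,5] and [7,-]. */
const Tree23 final_tree = { 1, {6, 0}, {2, 1, 0}, {{4, 5}, {7, 0}, {0, 0}} };
```

With these definitions, same_keys(&original_tree, &final_tree) holds while same_shape(&original_tree, &final_tree) does not.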

Compute if loop SPSS

Ultimately, I want to change scores of 0 to 1, scores of 1 to 2, and scores of 2 to 3. I thought one way to do that was using +1, but I realize I could also use a more complicated if then series.
Here is what I did so far:
I used the existing variable (x) to create a new variable (y=x+1) using SPSS syntax. I only want to do this for variables with values >=0 (this was my approach to excluding cells with missing data; the range for x is 0-2).
I can create x+1, but it overwrites the existing variables.
DO REPEAT x =var_1 TO var_86.
if (x>=0) x=(x+1).
end repeat.
exe.
I tried this modification, but it doesn't work:
DO REPEAT x = var_1 TO var_86 / y = var_1a TO var_86a.
IF (x >= 0) y=x +1.
END REPEAT.
EXE.
The error message is:
DO REPEAT The form VARX TO VARY to refer to a range of variables has
been used incorrectly. When using VARX TO VARY to create new
variables, X must be an integer less than or equal to the integer Y.
(Can't use A3 TO A1.)
I tried many other configurations including vectors and loops but haven't yet figured out how to do this computation across the range of variables without overwriting the existing ones. Thanks in advance for any recommendations.
The message you are getting is because SPSS doesn't understand the form var_1a TO var_86a.
For the x to y form to work the number has to be at the end of the name, so for example varA_1 to varA_86 should work.
While you're at it, here's a simple way to go about your task:
recode var_1 TO var_86 (0=1)(1=2)(2=3) into varA_1 TO varA_86.

How can zsh array elements be transformed in a single expansion?

Say you have a zsh array like:
a=("x y" "v w")
I want to take the first word of every element, say:
b=()
for e in $a; {
b=($b $e[(w)1])
}
So now I have what I need in b:
$ print ${(qq)b}
'x' 'v'
Is there a way to do this in a single expansion expression? (i.e. not needing a for loop for processing each array element and accumulating the result in a new array).
One way is to strip everything from the first whitespace character to the end of each array element, like this:
$ print ${(qq)a%% *}
'x' 'v'
Note that the %% expression (and some others) operates on each array element separately, as the manual explains:
In the following expressions, when name is an array and the substitution is not quoted, or if the ‘(#)’ flag or the name[#] syntax is used, matching and replacement is performed on each array element separately.
...
${name%pattern}
${name%%pattern}
If the pattern matches the end of the value of name, then substitute the value of name with the matched portion deleted; otherwise, just substitute the value of name. In the first form, the smallest matching pattern is preferred; in the second form, the largest matching pattern is preferred.
-- zshexpn(1): Expansion, Parameter Expansion

SPSS recoding variables data from multiple variables into boolean variables

I have 26 variables, each containing numbers ranging from 1 to 61. For each value from 1 to 61, I want a new variable that contains 1 if that value occurs in any of the 26 variables for that case, and 2 if it does not.
So 26 variables with data like:
1 15 28 39 46 1 12 etc.
And I want 61 variables with:
1 2 1 2 2 1 etc.
I have been reading about creating vectors, loops, do if's etc but I can't find the right way to code it. What I have done is just creating 61 variables and writing
do if V1=1 or V2=1 or (etc until V26).
recode newV1=1.
end if.
exe.
**repeat this for all 61 variables.
recode newV1 to newV61(missing=2).
So this is a lot of code and quite a detour from what I imagine it could be.
Anyone who can help me out with this one? Your help is much appreciated!
noumenal is correct, you could do it with two loops. Another way, though, is to use the original value as an index into a VECTOR, setting that element to 1 and recoding all the others to 2 afterwards.
To illustrate, first I make some fake data (with 4 original variables instead of 26) named X1 to X4.
*Fake Data.
SET SEED 10.
INPUT PROGRAM.
LOOP Id = 1 TO 20.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
VECTOR X(4,F2.0).
LOOP #i = 1 TO 4.
COMPUTE X(#i) = TRUNC(RV.UNIFORM(1,62)).
END LOOP.
EXECUTE.
The code below creates four vectors of 61 variables each, one per original variable, then uses DO REPEAT to index each vector by the value of its original variable. It finishes with RECODE, so that any element still system missing is coded 2.
VECTOR V1_ V2_ V3_ V4_ (61,F1.0).
DO REPEAT orig = X1 TO X4 /V = V1_ V2_ V3_ V4_.
COMPUTE V(orig) = 1.
END REPEAT.
RECODE V1_1 TO V4_61 (SYSMIS = 2).
It is a little painful, as for the original VECTOR command you need to write out all of the stubs, but then you can copy-paste that into the DO REPEAT subcommand (or make a macro to do it for you).
For a simpler illustration, if we have one original variable, say A, that can take on integer values from 1 to 61, and we want to expand it into 61 dummy variables, we make a vector and then access the matching location in that vector.
VECTOR DummyVec(61,F1.0).
COMPUTE DummyVec(A) = 1.
For a record with A = 10, DummyVec10 will equal 1, and all the other DummyVec variables will still be system missing by default. No need to use DO IF for 61 values.
The rest of the code is just extra to do it in one swoop for multiple original variables.
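For readers more at home outside SPSS, the direct-indexing idea can be sketched in C. This is only an illustration of the logic, not SPSS syntax. The names N_CODES and make_indicators are hypothetical, and the sketch assumes the asker's coding of 1 for "value present" and 2 for "value absent".

```c
#include <stddef.h>

#define N_CODES 61 /* possible values run from 1 to 61 */

/* For one case (row): values[] holds the original codes.
   flags[k - 1] becomes 1 if code k occurs at least once, otherwise 2.
   Values outside 1..N_CODES (for example missing data) are ignored. */
void make_indicators(const int *values, size_t n_values, int flags[N_CODES])
{
    for (size_t k = 0; k < N_CODES; k++)
        flags[k] = 2; /* default: value not present */

    for (size_t i = 0; i < n_values; i++) {
        int v = values[i];
        if (v >= 1 && v <= N_CODES)
            flags[v - 1] = 1; /* use the value itself as the index */
    }
}
```

For the sample row 1 15 28 39 46 1 12 from the question, the flags for 1, 12, 15, 28, 39 and 46 end up as 1 and all others as 2. No search over the 61 possible values is needed, which is the same shortcut the VECTOR approach takes.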
This should do it:
do repeat NewV=NewV1 to NewV61/vl=1 to 61.
compute NewV=any(vl,v1 to v26).
end repeat.
EXPLANATION:
This syntax will go through values 1 to 61, for each one checking whether any of the variables v1 to v26 has that value. If any of them do, the right NewV will receive the value of 1. If none of them do, the right NewV will receive the value of 0.
Just make sure v1 to v26 are consecutively ordered in the file. If not, change to:
compute NewV=any(vl,v1, v2, v3, v4 ..... v26).
You need a nested loop: two loops - one outer and one inner.

What is a regular expression for parsing key = value using C's regex.h lib

I need an expression which would match a key=value line.
Any amount of whitespace is allowed between the key, the "=" and the value,
so key = value is valid as well. But multiple values should not be allowed.
So something like this:
key = value1 value2
is not allowed.
I've already tried with
const char* regexCheckValidityForKeyValue = "([[:print:]]{1,})([:blank:]*)(\\=){1}([[:blank:]]*)([[:graph:]]*)";
But this does not really work.
Thank you for any help.
At least to me, it appears that using a regex for this is entirely unnecessary. Getting the code correct will be comparatively difficult, and reading it even more so.
I'd just use [sf]scanf:
if (2 == sscanf(input, "%s = %s %s", key, value, ignore))
// it's good: just `key = value`
else
// malformed
Basically, this attempts to read and convert a key, a value, and a second value. It then checks the return value to see how many of those were matched. If exactly two matched, you have "key = value". If fewer than 2 matched, some part of key = value is missing. If it matches more than that, you have key = value1 value2 (and possibly more after value2), so that's malformed as well.
As a bonus, this also gives you the actual strings that made up your key and value without any extra cruft.
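A slightly more defensive variant of the same idea might look like the sketch below. One caveat with a plain %s is that it reads up to the next whitespace, so an input such as key=value with no spaces would be taken as a single token. The sketch uses scansets that stop at '=' and whitespace, and adds field widths so the buffers cannot overflow. The function name parse_kv_scanf, the 64-byte buffers and the extra probe variable are assumptions for illustration.

```c
#include <stdio.h>

/* Returns 1 if input looks like exactly `key = value`, otherwise 0.
   key and value must each have room for at least 64 bytes. */
int parse_kv_scanf(const char *input, char key[64], char value[64])
{
    char extra[2]; /* receives one character of any unexpected third token */

    /* A space in the format skips any run of whitespace (including none).
       %63[^= \t\n] reads 1..63 characters that are not '=' or whitespace.
       %1s only succeeds if something non-blank follows the value. */
    int n = sscanf(input, " %63[^= \t\n] = %63[^= \t\n] %1s",
                   key, value, extra);

    return n == 2; /* exactly a key and a value, nothing after them */
}
```

The return value of sscanf is used exactly as in the answer above: 2 means key and value were found and nothing followed, fewer means a part is missing, and 3 means there was trailing content such as a second value.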
(Decided to put the comment as an answer instead...)
You're close - just put the first [:blank:] inside a character class as you've done with the rest of them. Then remove the escaping of the '=' i.e. : ([[:print:]]{1,})([[:blank:]]*)(=){1}([[:blank:]]*)([[:graph:]]*), or if your flavor supports shorthand - (\w+)\s*=\s*(\S+) which is a bit easier to read (or at least shorter ;)). Also captures only the key and the value, not the spaces and stuff that your regex does.
Regards
The [:print:] class covers 0x20-0x7e, which includes the space character, so the key is allowed to contain spaces and could even consist of nothing but spaces.
I assume this is extended POSIX.
There are problems in your description.
The only way to disallow a second value such as value2 is to anchor the pattern at the end of the line,
but the question does not say what may follow the value.
Edit: to keep the delimiters (whitespace and the equal sign) out of the key and the value, the pattern below spells out the allowed character ranges explicitly.
# ^[[:blank:]]*([[:alpha:]][!-<>-~]*)[[:blank:]]*=[[:blank:]]*([!-<>-~]+)[[:blank:]]*$
^ # BOS
[[:blank:]]* # space or tab
( # (1 start), KEY
[[:alpha:]] # start with alpha char
[!-<>-~]* # any chars not equal sign nor whitespace
) # (1 end)
[[:blank:]]* # spaces or tabs
= # equal sign
[[:blank:]]* # spaces or tabs
( [!-<>-~]+ ) # (2), VALUE any chars not equal sign nor whitespace
[[:blank:]]* # spaces or tabs
$ # EOS
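Since regex.h has no free-spacing mode, it is the one-line form of the pattern that would be passed to regcomp. A minimal sketch of how that might be wired up follows. The function name parse_key_value, its return convention and the KV_PATTERN macro are assumptions for illustration, and the character ranges in the pattern assume the default "C" locale.

```c
#include <regex.h>
#include <string.h>

/* The one-line pattern from above, written as a C string literal. */
#define KV_PATTERN \
    "^[[:blank:]]*([[:alpha:]][!-<>-~]*)[[:blank:]]*=" \
    "[[:blank:]]*([!-<>-~]+)[[:blank:]]*$"

/* Returns 1 and fills key/value on a match, 0 on no match (or if a part
   does not fit its buffer), -1 if the pattern fails to compile.
   Note: without REG_NEWLINE, `$` matches only at the end of the string,
   so strip a trailing newline before calling. */
int parse_key_value(const char *line,
                    char *key, size_t key_sz,
                    char *value, size_t value_sz)
{
    regex_t re;
    regmatch_t m[3]; /* m[0] whole match, m[1] key, m[2] value */

    if (regcomp(&re, KV_PATTERN, REG_EXTENDED) != 0)
        return -1;

    int rc = regexec(&re, line, 3, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;

    size_t klen = (size_t)(m[1].rm_eo - m[1].rm_so);
    size_t vlen = (size_t)(m[2].rm_eo - m[2].rm_so);
    if (klen >= key_sz || vlen >= value_sz)
        return 0;

    memcpy(key, line + m[1].rm_so, klen);
    key[klen] = '\0';
    memcpy(value, line + m[2].rm_so, vlen);
    value[vlen] = '\0';
    return 1;
}
```

The sketch compiles the pattern on every call for simplicity. Code that parses many lines would more likely compile it once and reuse the regex_t.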
