How to fix tapered eval taking more nodes

I've just implemented tapered eval, but I'm not sure I've done it correctly because it seems to hurt the move ordering.
I'm using this fen for test: r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1
This is the search info with tapered eval:
info depth 1 score cp 18.000000 nodes 65 time 116 pv e2a6
info depth 2 score cp 18.000000 nodes 165 time 402 pv e2a6 b4c3
info depth 3 score cp 18.000000 nodes 457 time 568 pv e2a6 b4c3 d2c3
info depth 4 score cp 18.000000 nodes 3833 time 1108 pv e2a6 b4c3 d2c3 h3g2
info depth 5 score cp 17.000000 nodes 12212 time 1875 pv e2a6 e6d5 c3d5
info depth 6 score cp 17.000000 nodes 77350 time 4348 pv e2a6 e6d5 c3d5
bestmove e2a6 ponder e6d5
And without tapered eval:
info depth 1 score cp 19.000000 nodes 75 time 66 pv e2a6
info depth 2 score cp 19.000000 nodes 175 time 182 pv e2a6 b4c3
info depth 3 score cp 19.000000 nodes 398 time 371 pv e2a6 b4c3 d2c3
info depth 4 score cp 19.000000 nodes 3650 time 947 pv e2a6 b4c3 d2c3 h3g2
info depth 5 score cp 18.000000 nodes 10995 time 1849 pv e2a6 e6d5 c3d5
info depth 6 score cp 18.000000 nodes 75881 time 4334 pv e2a6 e6d5 c3d5
bestmove e2a6 ponder e6d5
You can see that the search without tapered eval actually visits fewer nodes than the one with it. I'm wondering whether this is just to be expected, or whether I implemented it wrong.
My phase function:
int totalPhase = pawnPhase * 16 + knightPhase * 4 + bishopPhase * 4
               + rookPhase * 4 + queenPhase * 2;
int phase = totalPhase;

// subtract the phase weight of every piece still on the board
for (each piece in node) {
    if (piece is a pawn)        phase -= pawnPhase;
    else if (piece is a knight) phase -= knightPhase;
    else if (piece is a bishop) phase -= bishopPhase;
    else if (piece is a rook)   phase -= rookPhase;
    else if (piece is a queen)  phase -= queenPhase;
}

// scale to 0..256 (0 = all pieces present, 256 = bare kings), with rounding
return (phase * 256 + (totalPhase / 2)) / totalPhase;
And then I added the interpolation in the eval function:
for (each piece in node) {
    // ...accumulate material weights and piece-square scores separately
    // into mgEvaluation (middlegame) and egEvaluation (endgame)
}
evaluation = ((mgEvaluation * (256 - phase)) + (egEvaluation * phase)) / 256;
I got the formula from this site: Tapered Eval
If this is actually necessary, can someone give me tips to optimize this?

Tapered eval is very useful and should be used, since the way you play in the opening/middlegame is quite different from the endgame. You don't mention how you sort your moves, but since tapered eval gives you different numbers from the piece-square tables (PST) for a middlegame position, it is only natural that the move ordering will be slightly different than before. The results you are getting are very close to each other and look plausible.
Test the start position with tapered eval and check that it gives the same result as a normal eval using only the opening PST. Do the same with an endgame position against only the endgame PST, which should also give the same result.
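Here is a minimal, self-contained sketch of that kind of check in C. The interpolation is the same formula as in the question; the centipawn scores are made-up numbers, and phase 0 / 256 stand for "all material on the board" and "bare kings":
#include <assert.h>
#include <stdio.h>

/* Tapered interpolation exactly as in the question:
 * phase = 0   -> pure middlegame score
 * phase = 256 -> pure endgame score */
static int tapered(int mgEvaluation, int egEvaluation, int phase) {
    return (mgEvaluation * (256 - phase) + egEvaluation * phase) / 256;
}

int main(void) {
    int mg = 30, eg = -10;   /* made-up centipawn scores for one position */

    assert(tapered(mg, eg, 0) == mg);     /* phase 0: matches middlegame-only eval */
    assert(tapered(mg, eg, 256) == eg);   /* phase 256: matches endgame-only eval */
    printf("phase 128 blend: %d\n", tapered(mg, eg, 128));
    return 0;
}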

Related

ArangoDB: how to optimize FILTER on a subquery on big collections

I have a fairly simple data model (inspired by IMDB data) to link films to their directors or actors.
This includes 3 collections:
document collection Titles
document collection Persons
edge collection core_edge_values_links with an attribute field to determine the role of the person (director, actor...)
Titles are linked to Persons through a core_edge_values_links edge.
Now, I'm trying to find all Titles which have a link to any director.
Query is:
FOR r IN titles
  LET rlinkVal = FIRST(
    FOR rv, re IN 1 OUTBOUND r._id
      core_edge_values_links
      FILTER re.attribute == 'directors'
      RETURN re
  )
  FILTER rlinkVal != null
  LIMIT 0, 20
  RETURN r
This query takes about 2.5s (and up to almost 2 minutes with fullCount enabled) on a DB with 1M titles, about 200k persons and about 4M edges. I have about 600k matching titles.
To try to speed this up, I added an index on the core_edge_values_links collection on the fields _from and attribute. Unfortunately, the traversal won't take advantage of this index.
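For reference, an index like that can be created from arangosh roughly like this (the collection and field names are taken from the question; the exact call used in the original post is not shown there):
db.core_edge_values_links.ensureIndex({
  type: "persistent",
  fields: ["_from", "attribute"]
});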
I also tried to use a simple join instead of the traversal:
FOR r IN titles
  LET rlinkVal = FIRST(
    FOR e IN core_edge_values_links
      FILTER e._from == r._id AND e.attribute == 'directors'
      RETURN true
  )
  FILTER rlinkVal != null
  LIMIT 0, 20
  RETURN r
This time, the index is used, but query time is pretty much the same as with the traversal (about 2.5s, see profiling data below).
This makes me wonder whether the problem simply comes from the number of titles in the collection (about 1M), since they are all scanned in the initial FOR, but I don't see how I could index that. This is a fairly simple use case, and I don't feel it should take this long.
Here is the profiling data provided by ArangoDB.
With the traversal:
Query String (241 chars, cacheable: false):
FOR r IN titles
LET rlinkVal = FIRST(
FOR rv, re IN 1 OUTBOUND r._id
core_edge_values_links
FILTER re.attribute == 'directors'
RETURN re
)
FILTER rlinkVal != null
LIMIT 0, 20
RETURN r
Execution plan:
Id NodeType Calls Items Runtime [s] Comment
1 SingletonNode 1 1 0.00001 * ROOT
2 EnumerateCollectionNode 305 304000 0.08433 - FOR r IN titles /* full collection scan */
4 CalculationNode 305 304000 0.07084 - LET #6 = r.`_id` /* attribute expression */ /* collections used: r : titles */
16 SubqueryStartNode 609 608000 0.12038 - LET #3 = ( /* subquery begin */
5 TraversalNode 306 305000 1.93808 - FOR /* vertex optimized away */, re /* edge */ IN 1..1 /* min..maxPathDepth */ OUTBOUND #6 /* startnode */ core_edge_values_links
6 CalculationNode 306 305000 0.02857 - LET #10 = (re.`attribute` == "directors") /* simple expression */
7 FilterNode 305 304000 0.05040 - FILTER #10
15 LimitNode 305 304000 0.00927 - LIMIT 0, 1
17 SubqueryEndNode 304 303000 0.05520 - RETURN re ) /* subquery end */
10 CalculationNode 304 303000 0.02094 - LET rlinkVal = FIRST(#3) /* simple expression */
11 CalculationNode 304 303000 0.01909 - LET #12 = (rlinkVal != null) /* simple expression */
12 FilterNode 1 20 0.01508 - FILTER #12
13 LimitNode 1 20 0.00001 - LIMIT 0, 20
14 ReturnNode 1 20 0.00001 - RETURN r
Indexes used:
By Name Type Collection Unique Sparse Selectivity Fields Ranges
5 edge edge core_edge_values_links false false 34.87 % [ `_from` ] base OUTBOUND
Traversals on graphs:
Id Depth Vertex collections Edge collections Options Filter / Prune Conditions
5 1..1 core_edge_values_links uniqueVertices: none, uniqueEdges: path FILTER (re.`attribute` == "directors")
Optimization rules applied:
Id RuleName
1 move-calculations-up
2 optimize-subqueries
3 optimize-traversals
4 splice-subqueries
Query Statistics:
Writes Exec Writes Ign Scan Full Scan Index Filtered Peak Mem [b] Exec Time [s]
0 0 304000 0 302977 1605632 2.41360
Query Profile:
Query Stage Duration [s]
initializing 0.00000
parsing 0.00006
optimizing ast 0.00001
loading collections 0.00001
instantiating plan 0.00006
optimizing plan 0.00120
executing 2.41225
finalizing 0.00003
With the join:
Query String (232 chars, cacheable: false):
FOR r IN titles
LET rlinkVal = FIRST(
FOR e IN core_edge_values_links
FILTER e._from == r._id AND e.attribute == 'directors'
RETURN true
)
FILTER rlinkVal != null
LIMIT 0, 20
RETURN r
Execution plan:
Id NodeType Calls Items Runtime [s] Comment
1 SingletonNode 1 1 0.00001 * ROOT
7 CalculationNode 1 1 0.00000 - LET #7 = true /* json expression */ /* const assignment */
2 EnumerateCollectionNode 305 304000 0.09654 - FOR r IN titles /* full collection scan */
17 SubqueryStartNode 609 608000 0.13655 - LET #2 = ( /* subquery begin */
16 IndexNode 305 304000 2.29080 - FOR e IN core_edge_values_links /* persistent index scan, scan only */
15 LimitNode 305 304000 0.00992 - LIMIT 0, 1
18 SubqueryEndNode 304 303000 0.05765 - RETURN #7 ) /* subquery end */
10 CalculationNode 304 303000 0.02272 - LET rlinkVal = FIRST(#2) /* simple expression */
11 CalculationNode 304 303000 0.02060 - LET #9 = (rlinkVal != null) /* simple expression */
12 FilterNode 1 20 0.01656 - FILTER #9
13 LimitNode 1 20 0.00000 - LIMIT 0, 20
14 ReturnNode 1 20 0.00000 - RETURN r
Indexes used:
By Name Type Collection Unique Sparse Selectivity Fields Ranges
16 idx_1742224911374483456 persistent core_edge_values_links false false n/a [ `_from`, `attribute` ] ((e.`_from` == r.`_id`) && (e.`attribute` == "directors"))
Optimization rules applied:
Id RuleName
1 move-calculations-up
2 optimize-subqueries
3 use-indexes
4 remove-filter-covered-by-index
5 remove-unnecessary-calculations-2
6 splice-subqueries
Query Statistics:
Writes Exec Writes Ign Scan Full Scan Index Filtered Peak Mem [b] Exec Time [s]
0 0 304000 492 302977 1441792 2.65203
Query Profile:
Query Stage Duration [s]
initializing 0.00000
parsing 0.00006
optimizing ast 0.00001
loading collections 0.00001
instantiating plan 0.00003
optimizing plan 0.00056
executing 2.65136
finalizing 0.00003

Drop columns from a data frame but I keep getting this error below

[screenshots of the R code and the resulting error messages]
No matter how I try to code this in R, I still cannot drop my columns so that I can build my logistic regression model. I tried to run it in two different ways:
cols<-c("EmployeeCount","Over18","StandardHours")
Trainingmodel1 <- DAT_690_Attrition_Proj1EmpAttrTrain[-cols,]
Error in -cols : invalid argument to unary operator
cols<-c("EmployeeCount","Over18","StandardHours")
Trainingmodel1 <- DAT_690_Attrition_Proj1EmpAttrTrain[!cols,]
Error in !cols : invalid argument type
This may solve your problem:
Trainingmodel1 <- DAT_690_Attrition_Proj1EmpAttrTrain[ , !colnames(DAT_690_Attrition_Proj1EmpAttrTrain) %in% cols]
Please note that if you want to drop columns, you should put your code inside [ on the right side of the comma, not on the left side.
So [, your_code] not [your_code, ].
Here is an example of dropping columns using the code above.
cols <- c("cyl", "hp", "wt")
mtcars[, !colnames(mtcars) %in% cols]
# mpg disp drat qsec vs am gear carb
# Mazda RX4 21.0 160.0 3.90 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 160.0 3.90 17.02 0 1 4 4
# Datsun 710 22.8 108.0 3.85 18.61 1 1 4 1
# Hornet 4 Drive 21.4 258.0 3.08 19.44 1 0 3 1
# Hornet Sportabout 18.7 360.0 3.15 17.02 0 0 3 2
# Valiant 18.1 225.0 2.76 20.22 1 0 3 1
#...
Edit to Reproduce the Error
The error message you got indicates that there is a column with only a single, identical value in all rows.
To show this, let's try a logistic regression using a subset of the mtcars data that has a single, identical value in its cyl column, and then use that column as a predictor.
mtcars_cyl4 <- mtcars |> subset(cyl == 4)
mtcars_cyl4
# mpg cyl disp hp drat wt qsec vs am gear carb
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
# Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
# Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
# Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
# Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
# Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
# Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
# Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
# Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
glm(am ~ as.factor(cyl) + mpg + disp, data = mtcars_cyl4, family = "binomial")
#Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
# contrasts can be applied only to factors with 2 or more levels
Now, compare it with the same logistic regression using the full mtcars data, which has various values in the cyl column.
glm(am ~ as.factor(cyl) + mpg + disp, data = mtcars, family = "binomial")
# Call: glm(formula = am ~ as.factor(cyl) + mpg + disp, family = "binomial",
# data = mtcars)
#
# Coefficients:
# (Intercept) as.factor(cyl)6 as.factor(cyl)8 mpg disp
# -5.08552 2.40868 6.41638 0.37957 -0.02864
#
# Degrees of Freedom: 31 Total (i.e. Null); 27 Residual
# Null Deviance: 43.23
# Residual Deviance: 25.28 AIC: 35.28
It is likely that, even though you have dropped the three columns that hold a single identical value in all rows, there is another column in Trainingmodel1 that still has only one value. The identical values in that column probably resulted from filtering the data frame and splitting the data into training and test groups. It is better to check with summary(Trainingmodel1).
Further edit
I have checked the summary(Trainingmodel1) result, and it is clear that EmployeeNumber has a single identical value (called a "level" for a factor) in all rows. To run your regression properly, either drop it from your model, or, if EmployeeNumber has another level that you want to include, make sure the training data contains at least two levels. You can achieve that during splitting by repeating the random sampling until the randomly selected EmployeeNumber samples contain at least two levels, for example by looping with for, while, or repeat. It is possible, but I don't know how appropriate the repeated sampling is for your study.
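As a rough sketch of that idea (using mtcars and cyl as stand-ins for your data frame and the problematic column, with an arbitrary 70/30 split):
set.seed(1)
repeat {
  idx   <- sample(nrow(mtcars), size = floor(0.7 * nrow(mtcars)))
  train <- mtcars[idx, ]
  # keep resampling until the column of interest has at least two distinct values
  if (length(unique(train$cyl)) >= 2) break
}
table(train$cyl)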
As for your question about subsetting on more than one variable, you can use subset with conditionals. For example, if you want to get a subset of mtcars that has cyl == 4 and mpg > 20:
mtcars |> subset(cyl == 4 & mpg > 20 )
If you want a subset that has cyl == 4 or mpg > 20:
mtcars |> subset(cyl == 4 | mpg > 20 )
You can also subset by using more columns as subset criteria:
mtcars |> subset((cyl > 4 & cyl < 8) | (mpg > 20 & gear > 4))

Alternative for PSM package

Could anyone suggest an alternative to the PSM package in R for parametric survival models, since this package has been removed?
psm() is a function within the rms package; can you clarify which PSM package you mean?
the PSM package is here: https://rdrr.io/cran/PSM/
You can reproduce the results of the following paper with the code below:
Zhang Z. Parametric regression model for survival data: Weibull regression model as an example. Ann Transl Med 2016;4(24):484. doi:10.21037/atm.2016.08.45
> install.packages("rms")
> library(rms)
> library(survival)
> data(lung)
> psm.lung<-psm(Surv(time, status)~ph.ecog+sex*age+
+ ph.karno+pat.karno+meal.cal+
+ wt.loss,lung, dist='weibull')
> anova(psm.lung)
Wald Statistics Response: Surv(time, status)
Factor Chi-Square d.f. P
ph.ecog 13.86 1 0.0002
sex (Factor+Higher Order Factors) 10.24 2 0.0060
All Interactions 3.22 1 0.0728
age (Factor+Higher Order Factors) 3.75 2 0.1532
All Interactions 3.22 1 0.0728
ph.karno 5.86 1 0.0155
pat.karno 3.54 1 0.0601
meal.cal 0.00 1 0.9439
wt.loss 3.85 1 0.0498
sex * age (Factor+Higher Order Factors) 3.22 1 0.0728
TOTAL 33.18 8 0.0001

An Algorithm Comparing Peaks: are they in phase or not?

I am developing an algorithm for comparing two lists of numbers. The lists represent peaks discovered in a signal using a robust peak detection method. I wish to come up with some way of determining whether the peaks are in phase, out of phase, or neither (cannot be determined). For example:
These arrays would be considered in phase:
[ 94 185 278 373 469], [ 89 180 277 369 466]
But these arrays would be out of phase:
[51 146 242 349], [99 200 304 401]
There is no requirement that the arrays must be the same length. I have looked into measuring periodicity, however in this case I can assume the signal is already periodic.
Another idea I had was to divide all the array elements by their index (or their index+1) to see if they cluster around one or two points, but this is not robust and fails if a single peak is missing.
What approaches might be useful in solving this problem?
One approach would be to find the median distance from each peak in the first list to a peak in the second list.
If you divide this distance by the median distance between peaks in the first list, you will get a fraction where 0 means in phase, and 0.5 means out of phase.
For example:
[ 94 185 278 373 469], [ 89 180 277 369 466]
94->89 = 5
185->180 = 5
278->277 = 1
373->369 = 4
469->466 = 3
Score = median(5,5,1,4,3) / median distance between peaks
= 4 / 94 ≈ 4.3% => in phase
[51 146 242 349], [99 200 304 401]
51->99 = 48
146->99 = 47
242->200 = 42
349->304 = 45
score = median(48,47,42,45) / median distance between peaks
= 46 / 96
≈ 48% => out of phase
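A rough sketch of this scoring in Python (the lists are the ones from the question; what threshold counts as "in phase" is left to you):
import statistics

def phase_score(peaks_a, peaks_b):
    # median distance from each peak in the first list to its nearest peak in the second
    nearest = [min(abs(a - b) for b in peaks_b) for a in peaks_a]
    # median spacing between consecutive peaks in the first list
    spacing = [q - p for p, q in zip(peaks_a, peaks_a[1:])]
    return statistics.median(nearest) / statistics.median(spacing)

print(phase_score([94, 185, 278, 373, 469], [89, 180, 277, 369, 466]))  # ~0.04 -> in phase
print(phase_score([51, 146, 242, 349], [99, 200, 304, 401]))            # ~0.48 -> out of phase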
I would enter the peak locations, using them as index locations, into a much larger array (best if the length of the array is close to an integer multiple of the periodicity distance of your peaks), and then do either a complex Goertzel filter (if you know the frequency), or do a DFT or FFT (if you don't know the frequency) of the array. Then use atan2() on the complex result (at the peak magnitude frequency for the FFT) to measure phase relative to the array starts. Then compare unwrapped phases using some difference threshold.
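A rough numpy sketch of that idea, using the FFT route since the frequency is not known in advance (the array length of 1024 is an arbitrary choice):
import numpy as np

def spectrum_of_peaks(peaks, n=1024):
    sig = np.zeros(n)
    sig[np.asarray(peaks)] = 1.0              # impulses at the peak locations
    return np.fft.rfft(sig)

a = spectrum_of_peaks([94, 185, 278, 373, 469])
b = spectrum_of_peaks([89, 180, 277, 369, 466])

k = np.argmax(np.abs(a[1:])) + 1              # strongest non-DC frequency bin of the first signal
phase_diff = np.angle(a[k] * np.conj(b[k]))   # phase difference at that bin, in radians
print(k, phase_diff)                          # near 0 -> in phase, near +/-pi -> out of phase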

Breadth-first Search Exercise - AI

I'm a new student of AI and I'm trying to do some exercises before I start programming, to understand the logic. However, I'm having a hard time with the exercises, and I'd like to know if someone can help me with this one (any explanation, or pointers to material that can help, is welcome):
Consider Deep Blue can evaluate 200 million positions a
second. Assume at each move, a pawn can go to 2 possible
positions, a rook 14, a knight 8, a bishop 14, a queen 28,
and a king 8. Each side has 8 pawns, 2 rooks, 2 knights, 2
bishops, a queen and a king. Under standard regulations,
each side makes 40 moves within the first 2 hours (or 3
minutes a move on the average)
a) Using the breadth-first search algorithm, how many
levels can Deep Blue evaluate (visit) before each move
(in 3 minutes)?
b) To examine 20 levels in 3 minutes, how many positions Deep Blue needs to evaluate (visit) in a second?
I really appreciate any help. Thanks a lot guys.
Basically, you multiply the number of pieces by their individual potential mobility to get the theoretical branching factor for one side, that is, the number of possible moves at each search level.
Then you raise that figure to the power of the search depth to get the number of total positions to evaluate.
So if for the first search ply (half-move), the branching factor is N, then for a two-ply search the total number of positions is N*N, for three it's N*N*N, and so on.
I'll leave the rest up to you :)
I don't know if I'm right, but this was my answer for question b):
p = 2 x 8 = 16
r = 14 x 2 = 28
k = 8 x 2 = 16
b = 14 x 2 = 28
q = 28 x 1 = 28
k = 8 x 1 = 8
Total = 124 x 2 = 248, x 20 levels = 4960 positions per level
3 min = 60 x 3 = 180 seconds
4960/180 = 27.6~ => 28 per second
