Extendible hashing - cmu 15-445/645 - database

I'm trying to solve Question 3 c) from here: https://15445.courses.cs.cmu.edu/fall2021/files/hw2-clean.pdf. Solutions are available here: https://15445.courses.cs.cmu.edu/fall2021/files/hw2-sols.pdf (so this is not homework).
Link to image of question (can not embed images due to low reputation)
Extendible hashing question
Starting from the table in the image linked above, delete keys 10,12,7,24 & 8.
The global depth seems to start at 3 as there are 3 bits used.
There are 2 questions:
Which deletion causes the first reduction in local depth? That's the deletion of 7, I understand that one.
Which deletion causes the first reduction in global depth? The answer is supposed to be "24", but for the life of me, I don't see it. My answer would be "None of the above". Can somebody shed some light on this please?
Edit: I think I understand it now. After the deletion of 24, the global depth can decrease to 2: all directory indices ending with the same 2 bits point to the same bucket, so those entries can be merged and the directory halved. It was not clear to me because that rule is not described in the assignment; I guess it's a general rule the assignment creator assumed.
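The halving rule from the edit can be sketched as follows. This is only a minimal illustration, assuming the directory is indexed by the low-order bits of the hash (as the "ending with the same 2 bits" wording suggests). The function names and bucket labels are made up for the example and are not the table from the assignment.

```python
def can_halve(directory):
    """True if entry i and entry i + half point to the same bucket for every i."""
    half = len(directory) // 2
    return half >= 1 and all(
        directory[i] == directory[i + half] for i in range(half)
    )


def shrink(directory):
    """Halve the directory (global depth minus 1) for as long as the rule allows."""
    while can_halve(directory):
        directory = directory[: len(directory) // 2]
    return directory


def global_depth(directory):
    """A directory of 2**g entries has global depth g."""
    return len(directory).bit_length() - 1


# Hypothetical directory: indices 000..111, bucket labels are placeholders.
before = ["A", "B", "C", "D", "A", "B", "C", "D"]
after = shrink(before)
print(global_depth(before), "->", global_depth(after))
```

In this toy directory every pair of indices that agree in their last 2 bits shares a bucket, so the third bit carries no information and the directory can drop to depth 2.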


Find the Rotation Count in Rotated Sorted array

I got the following question:
Consider an array of distinct numbers sorted in increasing order. The array has been rotated (clockwise) k number of times. Given such an array, find the value of k.
I understand it's a well-known question (and that there are answers on the web for it), but my question is not about the solution to this problem.
I had this problem in a test, and I said that a solution can be to find the first number that is smaller than its predecessor in the array. The way I chose to do it is to make jumps that increase by powers of 2 from the start of the array. I will demonstrate:
Assume k=29.
Then I will check 2, 4, 8, 16, 32. The value in place 32 is smaller, but it is not the first such value. Then I will jump from place 17 (jumping as described above): 17, 19, 23, 31. The value in place 31 is smaller, but again it is not the first.
Next stage, check: 24, 26, 30. This time the value in place 30 is smaller and is the first, so return 30-1.
(I can write this as an algorithm, but I think the demonstration makes the thinking clear.)
My thinking is that this will work recursively and will certainly give me the answer.
My lecturer told me that I was wrong.
So my question is: am I really wrong?
Secondly, I struggled to find the running time of the algorithm. I know the first run is at most O(log k) and the other runs (on the sub-arrays) are shorter, but what is their sum in the worst case? To tell the truth, I thought at first it would still be O(log k), but now I'm not really sure about it.
Thank you.
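For comparison, here is a sketch of a related but different approach: a single round of doubling jumps to bracket the rotation point, followed by an ordinary binary search inside the bracket. It is not the restart-the-jumps procedure described above. It assumes 0-based indexing, that "rotated clockwise k times" means the last k elements moved to the front, and that 0 <= k < n. The function name is hypothetical.

```python
def rotation_count(a):
    """Index of the smallest element, which equals k for a right rotation by k."""
    n = len(a)
    if n < 2 or a[0] < a[-1]:
        return 0  # empty, single element, or not rotated at all

    # Doubling phase: find hi with a[hi] < a[0]; lo stays in the first run.
    lo, hi = 0, 1
    while a[hi] > a[0]:
        lo = hi
        hi = min(hi * 2, n - 1)  # a[n - 1] < a[0] here, so the loop must stop

    # Binary search phase: invariant a[lo] >= a[0] > a[hi].
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if a[mid] >= a[0]:
            lo = mid
        else:
            hi = mid
    return hi


# The k = 29 case from the question, on a hypothetical 40-element array.
example = list(range(11, 40)) + list(range(11))
print(rotation_count(example))
```

Both phases take O(log k) probes, because the doubling stops at the first power of two at or beyond k and the bracket it leaves is at most k wide.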

How to account for move order in chess board evaluation

I am programming a chess AI using an alpha-beta pruning algorithm that works at a fixed depth. I was quite surprised to see that when I set the AI to a higher depth, it played even worse. But I think I have figured out why.
It currently works this way: all moves from the current position are listed, and for each resulting position every possible move is listed, and so on, until the fixed depth is reached. There the board is evaluated by checking which pieces are present, using a fixed value for each piece type. Then the value bubbles up to the root using the minimax algorithm with alpha-beta.
But I need to account for the move order. For instance, if there are two options, a checkmate in 2 moves and another in 7 moves, then the first one has to be chosen. The same goes for taking a queen in either 3 or 6 moves.
But since I only evaluate the board at the deepest nodes, and the evaluation looks only at the board, it doesn't know what the previous moves were.
I'm sure there is a better way to evaluate the game that can account for the way the pieces moved through the search.
EDIT: I figured out why it was playing weirdly. When I searched for moves (depth 5), the search ended with an AI move (a MAX node level). Because of that, it counted moves such as taking a knight with a rook, even if doing so made the rook vulnerable (the algorithm cannot see that because it doesn't search any deeper).
So I changed that and set the depth to 6, so the search ends with a MIN node level.
Its moves now make more sense, as it actually retaliates when attacked (which it sometimes didn't do, playing a dumb move instead).
However, it is now more defensive than ever and does not really play: it moves its knight, then moves it back to where it was before, and therefore it ends up losing.
My evaluation is very standard: only the presence of pieces matters to the node value, so the AI is free to pick the strategy it wants without being forced to do things it doesn't need to.
Considering that, is this normal behaviour for my algorithm? Is it a sign that my alpha-beta algorithm is badly implemented, or is it perfectly normal with such an evaluation function?
If you want to select the shortest path to a win, you probably also want to select the longest path to a loss. If you were to try to account for this in the evaluation function, you would have to track the path length along with the score and have separate evaluation functions for min and max. It's a lot of complex and confusing overhead.
The standard way to solve this problem is with an iterative deepening approach to the evaluation. First you search deep enough for 1 move for all players, then you run the entire search again searching 2 moves for each player, etc until you run out of time. If you find a win in 2 moves, you stop searching and you'll never run into the 7 moves situation. This also solves your problem of searching odd depths and getting strange evaluations. It has many other benefits, like always having a move ready to go when you run out of time, and some significant algorithmic improvements because you won't need the overhead of tracking visited states.
As for the defensive play, that is a little bit of the horizon effect and a little bit of the evaluation function. If you have a perfect evaluation function, the algorithm only needs to see one move deep. If it's not perfect (and it's not), then you'll need to search much deeper. Last I checked, algorithms that can run on your laptop and see about 8 plies deep (a ply is one move by one player) can compete with strong humans.
In order to let the program choose the shortest checkmate, the standard approach is to give a higher value to mates that occur closer to the root. Of course, you must detect checkmates, and give them some score.
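A toy sketch of that idea: the mate score is reduced by the distance from the root, so a nearer mate scores higher. This is plain negamax without alpha-beta, run on a hand-built tree of dictionaries rather than a real chess position. The `MATE` constant, the `"mated"`, `"children"` and `"eval"` keys, and the function names are all assumptions made for the illustration.

```python
MATE = 100_000  # hypothetical score, larger than any material evaluation


def negamax(node, ply):
    """Score of `node` from the point of view of the side to move there."""
    if node.get("mated"):
        # The side to move is checkmated; a mate nearer the root is worse for it.
        return -(MATE - ply)
    children = node.get("children")
    if not children:
        return node.get("eval", 0)  # static evaluation at a leaf
    return max(-negamax(child, ply + 1) for child in children)


def best_move(root):
    """Index of the best root move and the scores of all root moves."""
    scores = [-negamax(child, 1) for child in root["children"]]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Move 0 mates on the third ply, move 1 mates immediately.
slow = {"children": [{"children": [{"mated": True}]}]}
quick = {"mated": True}
print(best_move({"children": [slow, quick]}))
```

Because the score depends on `ply`, the search prefers the immediate mate without the evaluation function needing to know anything about move history.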
Also, from what you describe, you need a quiescence search.
All of this (and much more) is explained in the chess programming wiki. You should check it out:
https://chessprogramming.wikispaces.com/Checkmate#MateScore
https://chessprogramming.wikispaces.com/Quiescence+Search

Cache Oblivious Search

Please forgive this stupid question, but I didn't find any hint by googling it.
If I have an array (contiguous memory), and I search sequentially for a given pattern (for example build the list of all even numbers), am I using a cache-oblivious algorithm? Yes it's quite stupid as an algorithm, but I'm trying to understand here :)
Yes, you are using a cache-oblivious algorithm: your running time is O(N/B) block transfers, which depends on the block size B, but the algorithm itself doesn't depend on any particular value of the block size. This also means that you are cache-efficient as well as cache-oblivious.
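As a small illustration of that point, the scan below never mentions a block size, while a separate helper counts how many blocks a sequential pass over N elements would touch for a given B. The helper is only a counting model, assuming the array starts on a block boundary and B is measured in elements. Both function names are made up for this sketch.

```python
def collect_evens(values):
    """Sequential scan; note that no block size appears anywhere in it."""
    return [v for v in values if v % 2 == 0]


def blocks_touched(n, block_size):
    """Number of distinct blocks a left-to-right scan of n elements reads."""
    return len({i // block_size for i in range(n)})


print(collect_evens([1, 2, 3, 4, 5, 6]))
for b in (4, 8, 64):
    print(b, blocks_touched(100, b))  # ceil(100 / b) in each case
```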

How does this sort function work?

As part of my job, I'm occasionally called upon to evaluate candidates for programming positions. A code snippet recently passed across my desk and my first thoughts were that I wasn't sure code like this would even compile any more. But compile it does, and it works as well.
Can anyone explain why and how this works? The mandate was to provide a function to sort five integer values.
void order5(arr) int *arr; {
int i,*a,*b,*c,*d,*e;
a=arr,b=arr+1,c=arr+2,d=arr+3,e=arr+4;
L1: if(*a >*b){*a^=*b;*b^=*a;*a^=*b;}
L2: if(*b >*c){*b^=*c;*c^=*b;*b^=*c;goto L1;}
L3: if(*c >*d){*c^=*d;*d^=*c;*c^=*d;goto L2;}
if(*d >*e){*d^=*e;*e^=*d;*d^=*e;goto L3;}
}
Now I can see the disadvantages of this approach (lack of readability and maintainability for anyone born after 1970) but can anyone think of any advantages? I'm hesitant to dismiss it out of hand but, before we decide whether or not to bring this person back in for round 2, I'd like to know if it has any redeeming features beyond job security for the author.
It's a fully unrolled bubble sort with the XOR-swap trick expressed inline. I compiled it with several different options hoping it produced some awesome compact code, but it's really not that impressive. I threw in some __restrict__ keywords so that the compiler would know that none of the *a could alias each other, which does help quite a bit. Overall though, I think the attempted cleverness has gone so far outside the norm that the compiler is really not optimizing the code very well at all.
I think the only advantage here is novelty. It certainly caught your eye! I would have been more impressed with abuses of more modern technology, like sorting with MMX/SSE or the GPU, or using 5 threads which all fight it out to try to insert their elements into the right place. Or perhaps an external merge sort, just in case the 5 element array can't fit in core.
The XOR trick just swaps two integers. The gotos imitate a loop. Advantages? None at all, except for showing off how obfuscated you can make your code. The parameter declaration after the function's () is a deprecated feature. And having an array on hand and having 5 distinct pointers pointing at each element of the array is just horrible. To sum it up: Yuck! :)
It's a screwy implementation of Gnome sort for five items.
Here is how a garden gnome sorts a line of flower pots. Basically, he looks at the flower pot next to him and the previous one; if they are in the right order he steps one pot forward, otherwise he swaps them and steps one pot backwards. Boundary conditions: if there is no previous pot, he steps forwards; if there is no pot next to him, he is done.
The "stepping one pot forward" is done by falling through to the next if. The goto immediately after each XOR-swap does the "stepping one pot backwards."
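For readers who want the same stepping logic with an index variable instead of labels, here is a rough Python sketch. It is not a translation of the candidate's C code, only the gnome sort idea it appears to implement, written for any length. The function name is hypothetical.

```python
def gnome_sort(items):
    """Return a sorted copy using forward/backward single steps."""
    a = list(items)
    i = 0
    while i < len(a) - 1:
        if a[i] <= a[i + 1]:
            i += 1  # pair in order: step one pot forward (fall through)
        else:
            a[i], a[i + 1] = a[i + 1], a[i]  # swap the pair
            if i > 0:
                i -= 1  # step one pot backwards (the goto)
    return a


print(gnome_sort([5, 4, 3, 2, 1]))
```

As in the C version, a swap at the very first pair does not step back, because there is no previous pot to go to.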
You can't dismiss someone out of hand for an answer like this. It might have been provided tongue-in-cheek.
The question is highly artificial, prompting contrived answers.
You need to find out how the candidate would solve more real-world problems.
lack of readability and maintainability for anyone born after 1970
Are people born before 1970 better at maintaining unreadable code then? If so, that's good because I was and it can only be a selling point.
before we decide whether or not to bring this person back in for round 2, I'd like to know if it has any redeeming features beyond job security for the author.
The code has no redeeming features. It bizarrely uses the XOR swap technique, whose only potential redeeming feature would be saving one integer's worth of stack space. However, even that is negated by the five pointers defined and the unused int. It also has a gratuitous use of the comma operator.
Normally, I'd also say "goto, yuck", but in this case, it has been used in quite an elegant way, once you understand the sort algorithm used. In fact, you could argue that it makes the gnome sort algorithm clearer than using an index variable (except it cannot be generalised to n elements). So there you have the redeeming feature, it makes goto look good :)
As for "do you bring the candidate back for the second interview". If the code fragment was accompanied by a detailed comment explaining how the algorithm worked and the writer's motivation for using it, I'd say definitely yes. If not, I'd probably ring him up and ask those questions.
NB, the code fragment uses K&R style parameter declarations. This means the author probably hasn't programmed in C for 10 to 15 years or he copied it off the Internet.

Alternate FizzBuzz Questions [closed]

Closed 11 years ago.
Anybody have any good FizzBuzz type questions that are not the FizzBuzz problem?
I am interviewing someone and FB is relatively well known and not that hard to memorize, so my first stop in a search for ideas is my new addiction SO.
I've seen a small list of relatively simple programming problems used to weed out candidates, just like FizzBuzz. Here are some of the problems I've seen, in order of increasing difficulty:
1. Reverse a string
2. Reverse a sentence ("bob likes dogs" -> "dogs likes bob")
3. Find the minimum value in a list
4. Find the maximum value in a list
5. Calculate a remainder (given a numerator and denominator)
6. Return distinct values from a list including duplicates (i.e. "1 3 5 3 7 3 1 1 5" -> "1 3 5 7")
7. Return distinct values and their counts (i.e. the list above becomes "1(3) 3(3) 5(2) 7(1)")
8. Given a string of expressions (only variables, +, and -) and a set of variable/value pairs (i.e. a=1, b=7, c=3, d=14) return the result of the expression ("a + b+c -d" would be -3).
These were for Java, and you could use the standard libraries so some of them can be extremely easy (like 6). But they work like FizzBuzz. If you have a clue about programming you should be able to do most pretty quickly. Even if you don't know the language well you should at least be able to give the idea behind how to do something.
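As an example of how small these can be, here is one possible sketch of the last problem in the list (variables joined by + and -). It is written in Python rather than Java for brevity, it assumes well-formed input, and the function name `evaluate` is made up.

```python
import re


def evaluate(expression, values):
    """Evaluate an expression made only of variable names, '+' and '-'."""
    total = 0
    sign = 1
    for token in re.findall(r"[+-]|\w+", expression):
        if token == "+":
            sign = 1
        elif token == "-":
            sign = -1
        else:
            total += sign * values[token]  # KeyError if the variable is unknown
    return total


print(evaluate("a + b+c -d", {"a": 1, "b": 7, "c": 3, "d": 14}))  # -3
```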
Using this test one of my previous bosses saw everything from people who aced it all pretty quick, to people who could do most pretty quick, to one guy who couldn't answer a single one after a half hour.
I should also note: he let people use his computer while they were given these tasks. They were specifically instructed that they could use Google and the like.
Perhaps this does not answer your question directly, but I am not certain you need to come up with another problem. Besides being "easy to memorize", the FizzBuzz question is just plain "easy", and that is the point. If the person you are interviewing is in the class of people to which FizzBuzz is "well-known", then they are in the class of people that a FizzBuzz-type question would not filter out. That does not mean that you hire them on the spot, but it does mean that they should be able to breeze through it and get on to the meat of the interview.
To put it another way, anybody who takes the time to read Coding Horror is worth interviewing further. Just have them write out the solution really quickly, discuss it briefly (e.g., How do you test this?), and then move on to the next question. And as the article says, "it is genuinely astonishing how many candidates are incapable of the simplest programming tasks."
Any of the early ones from Project Euler would probably be good.
For example:
Problem 25
The Fibonacci sequence is defined by the recurrence relation:
F_n = F_(n−1) + F_(n−2), where F_1 = 1 and F_2 = 1.
Hence the first 12 terms will be:
F_1 = 1
F_2 = 1
F_3 = 2
F_4 = 3
F_5 = 5
F_6 = 8
F_7 = 13
F_8 = 21
F_9 = 34
F_10 = 55
F_11 = 89
F_12 = 144
The 12th term, F_12, is the first term to contain three digits.
What is the index of the first term in the Fibonacci sequence to contain 1000 digits?
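A candidate's answer might look roughly like the sketch below, which leans on Python's arbitrary-precision integers; in a language with fixed-width integers the candidate would need a big-number type or a different idea. The function name is hypothetical, and the three-digit case from the problem statement is used as the check.

```python
def first_fib_index_with_digits(digits):
    """Index n of the first Fibonacci term F_n that has `digits` digits."""
    limit = 10 ** (digits - 1)  # smallest number with that many digits
    index, previous, current = 1, 0, 1  # current is F_1
    while current < limit:
        previous, current = current, previous + current
        index += 1
    return index


print(first_fib_index_with_digits(3))  # 12, matching the statement above
```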
I've found that checking whether a string is a palindrome is a pretty simple one that can be a decent weeder.
I wanted a FizzBuzz question that doesn't involve the modulo operator. Especially since I'm typically interviewing web developers for whom the modulo operator just doesn't come up that often. And if it's not something you run into regularly, it's one of those things you look up the few times you need it.
(Granted, it's a concept that, ideally, you should have encountered in a math course somewhere along the way, but that's a different topic.)
So, what I came up with is what I call, unimaginatively, Threes in Reverse. The instruction is:
Write a program that prints out, in reverse order, every multiple of 3 between 1 and 200.
Doing it in normal order is easy: multiply the loop index by 3 until you reach a number that exceeds 200, then quit. You don't have to worry about how many iterations to terminate after, you just keep going until you reach the first value that's too high.
But going backwards, you have to know where to start. Some might realize intuitively that 198 (3 * 66) is the highest multiple of 3, and as such, hard-code 66 into the loop. Others might use a mathematical operation (integer division or a floor() on a floating point division of 200 and 3) to figure out that number, and in doing so, provide something more generically applicable.
Essentially, it's the same sort of problem as FizzBuzz (looping through values and printing them out, with a twist). This one is a problem to solve that doesn't use anything quite as (relatively) esoteric as the modulo operation.
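One way a candidate might write it, using integer division to find the starting point instead of hard-coding 66. This is only a sketch; the function name and the `limit` parameter are made up, and it returns the values instead of printing them so they are easy to check.

```python
def threes_in_reverse(limit=200):
    """Multiples of 3 between 1 and limit, largest first."""
    start = (limit // 3) * 3  # largest multiple of 3 not above limit
    return list(range(start, 0, -3))


values = threes_in_reverse()
print(values[0], values[-1], len(values))  # 198 3 66
```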
For something really super-simple that can be done in 10 seconds, but would remove those people who literally can't program anything, try this one:
Ask: show me (on paper, but better on a whiteboard) how you would swap the values of two variables.
This wasn't my idea, but was posted in a comment by someone named Jacob on a blog post all about the original FizzBuzz question.
Jacob goes on to say:
If they don’t start with creating a third variable, you can pretty much write that person off. I’ve found that I can cut a third to half my (admittedly at that point unscreened) applicants with that question alone.
There is a further interesting discussion after that comment on the original blog post about ways to perform this variable swapping without requiring a third variable (adding/subtracting, xor etc.), and of course, if you're using a language that supports this in a single statement/operation, it may not be such a good test.
Although not my idea, I wanted to post this here as it's such an elegantly simple, easy question that can (and should) be answered within about 10 seconds by someone who has written even the simplest of programs. It also does not require the use of somewhat apparently obscure operators like the modulo operator, which lots of people, who are otherwise fairly decent programmers, are simply not familiar with (which I know from my own experience).
Fibonacci, reverse a string, count number of bits set in a byte are other common ones.
Project Euler also has a large collection of increasing difficulty.
Ask them to write an app to return the factors of a given number. It's easy to do and hard to do well in a short period of time. You can see their style and the way they think through problems in a small amount of time.
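A sketch of the "done reasonably well" end of that range: trial division only up to the square root, collecting each divisor together with its partner. It assumes "factors" means all positive divisors (not the prime factorisation) and that the input is a positive integer. The function name is made up.

```python
def factors(n):
    """All positive divisors of n in increasing order."""
    small, large = [], []
    i = 1
    while i * i <= n:
        if n % i == 0:
            small.append(i)
            if i != n // i:  # avoid listing a square root twice
                large.append(n // i)
        i += 1
    return small + large[::-1]


print(factors(36))
```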
Return the index of the first occurrence of string X within string Y
Implementing strstr() requires a basic understanding of the language while providing the opportunity for clever optimization.
If it is a C/C++ interview make sure the person knows about pointers.
General: a simple data structure (singly/doubly linked list). Ask about the complexity of insertion in each case (at the beginning, at the end, optimizations, ...).
(General) How do you find min and max from an array (N size) with just 3*N/2 comparisons?
C/C++: How would you optimize multiple "strcat"s to a buffer ?
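For the min/max question in the list above, the usual trick is to process elements in pairs: one comparison orders the pair, then the smaller is compared with the running minimum and the larger with the running maximum. The sketch below also counts comparisons so the bound can be checked. The function name and the returned counter are assumptions added for illustration.

```python
def min_max(values):
    """Return (minimum, maximum, comparisons) using about 3 * N / 2 comparisons."""
    n = len(values)
    if n == 0:
        raise ValueError("min_max() needs at least one value")
    comparisons = 0
    if n % 2:  # odd length: seed with the first element
        lo = hi = values[0]
        start = 1
    else:  # even length: seed with the first pair
        comparisons += 1
        if values[0] < values[1]:
            lo, hi = values[0], values[1]
        else:
            lo, hi = values[1], values[0]
        start = 2
    for i in range(start, n, 2):
        a, b = values[i], values[i + 1]
        comparisons += 3  # one within the pair, one against lo, one against hi
        small, big = (a, b) if a < b else (b, a)
        if small < lo:
            lo = small
        if big > hi:
            hi = big
    return lo, hi, comparisons


print(min_max([3, 1, 4, 1, 5, 9, 2, 6]))  # (1, 9, 10)
```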
Check out 6.14 from the C++ FAQ Lite:
http://www.parashift.com/c++-faq-lite/big-picture.html
Finding a list of primes is a fairly common question, but it still requires some thought, and there are varying degrees of answers people might give.
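One of the better answers on that scale is a Sieve of Eratosthenes, sketched below; a weaker answer would test each number by trial division. The function name is hypothetical.

```python
def primes_up_to(n):
    """All primes <= n using the Sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Multiples below p * p were already crossed out by smaller primes.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]


print(primes_up_to(30))
```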
You would also be surprised how many people struggle to implement a Map/Dictionary type data-structure.
I have asked my candidates to create a program to calculate the factorial of a given number in any pseudo language of their choice. It is a fairly easy problem to solve, and it lends itself well to the natural follow-up questions (that could often be asked) about recursion.
How about:
I want to use a single integer to store multiple values. Describe how that would work.
If they don't have a clue about bit masks and operations, they probably can't solve other problems.
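The kind of answer this is fishing for might look like the sketch below, which packs three small values into one integer with shifts and masks. The 8-bit field width and the function names are arbitrary choices for the example.

```python
FIELD_BITS = 8
FIELD_MASK = (1 << FIELD_BITS) - 1  # 0xFF


def pack(a, b, c):
    """Store three values in the range 0..255 in a single integer."""
    for value in (a, b, c):
        if not 0 <= value <= FIELD_MASK:
            raise ValueError("each value must fit in 8 bits")
    return (a << 2 * FIELD_BITS) | (b << FIELD_BITS) | c


def unpack(packed):
    """Recover the three values stored by pack()."""
    return (
        (packed >> 2 * FIELD_BITS) & FIELD_MASK,
        (packed >> FIELD_BITS) & FIELD_MASK,
        packed & FIELD_MASK,
    )


print(hex(pack(1, 7, 3)), unpack(pack(1, 7, 3)))
```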
