Snowflake Row count "G" meaning - snowflake-cloud-data-platform

Using Snowflake for the first time, I noticed that row counts in the billions (e.g. 288,010,550,524) are represented as 288.0G (for the TPCDS_SF100TCL schema in the SNOWFLAKE_SAMPLE_DATA database).
What exactly does "G" mean?
I've seen smaller row counts represented in K (thousands) and M (millions), but not B (billions).
Sounds like a simple and silly question, but I've never seen row counts represented with that quantifier and wanted to know what it was.
It's also not documented in the Snowflake documentation.
Thanks!

The prefixes/factors are:
prefix   name   factor   word
k        kilo   10^3     thousand
M        mega   10^6     million
G        giga   10^9     billion
Related: Metric prefix/Giga
SNOWFLAKE_SAMPLE_DATA -> Tables -> STORE_SALES -> Rows
<div data-qtip="<b>Row Count: 288.0G (288,010,550,524)</b>">288.0G</div>
So 288.0G means 288.0 * 10^9 rows.
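For illustration, here is a minimal Python sketch of how such an abbreviation can be produced; the helper name and the one-decimal formatting are assumptions for illustration, not Snowflake's actual implementation:

def si_abbreviate(count):
    # Abbreviate a row count with metric prefixes (hypothetical helper).
    for factor, prefix in ((10**9, 'G'), (10**6, 'M'), (10**3, 'K')):
        if count >= factor:
            return '%.1f%s' % (count / factor, prefix)
    return str(count)

print(si_abbreviate(288010550524))   # prints 288.0G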

Related

Constraint Programming CP Optimizer - How to model a cumulative capacity constraint

I am new to the CP Optimizer (from ILOG), and am trying to find the right keywords to get me on track to modelling cumulative capacity constraints.
For example, I have two tasks:
A, which lasts for 6 days and contributes 5 units per day to a cumulative capacity constraint
B, which lasts for 3 days and contributes 10 units per day to a cumulative capacity constraint
I would like to put in a cumulative capacity constraint over an interval. For example, a constraint from days 0,...,4 with a maximum cumulative value of 50.
Scheduling A and B to both start at day 0 would yield a cumulative contribution of:
A: 5 + 5 + 5 + 5 + 5 = 25 (with the remaining 5 outside of this interval not hitting this constraint)
B: 10 + 10 + 10 = 30
This would exceed the cumulative capacity of 50 over that timespan.
However, if I move the intervals such that A starts on day 1 (rather than 0) and B starts on day 0, then
A: 5 + 5 + 5 + 5 = 20 (with the remaining 5+5 outside of this interval not hitting this constraint)
B: 10 + 10 + 10 = 30
This schedule would satisfy this constraint.
Can someone please help me find the right functions / words to solve this? I realize this could be trivial for experts in CP. I don't necessarily need full code, just pointing me in the right direction would be extremely helpful!
I have tried to use StepAtEnd for A and B; however, this lets task A completely slide under the capacity of 50 units. StepAtStart will push A or B outside of the constraint window. I understand that an alternative approach would be to put a constraint on 10 units per day, but I am intentionally trying to model this as a cumulative constraint (in the bigger picture, letting daily constraints flex above 10, but ultimately staying under 50 cumulatively over a specific window).
For computing the integral of a cumul function over the period T, you can do as follows.
Let f be the cumul function (in your case pulse(A,5) + pulse(B,10), since A contributes 5 units per day and B contributes 10).
Let the period T = [t0, t0+k).
Let z_i be a fixed interval of size 1, starting at t0+i, for i in 0..k-1.
(the z_i's cover T)
Let g be a new cumul function
defined by: g=sum(pulse(z_i, 0, MAX))
and constrained by: g+f==MAX.
Then, F, the integral of f over the period T is:
F = k * MAX - Sum(heightAtStart(z_i, g))
And for your problem, you just need to constrain F to be less than or equal to 50.
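A minimal sketch of this construction in the docplex Python API, written in the same model-method style as the code in the other answer; MAX (an assumed upper bound on f over the window), the horizon, and the variable names are illustrative assumptions:

from docplex.cp.model import CpoModel

m = CpoModel()
MAX = 1000                       # assumed upper bound on f over the window
t0, k = 0, 5                     # the period T = [t0, t0 + k)

# Tasks from the question: A lasts 6 days at 5 units/day, B lasts 3 days at 10 units/day
a = m.interval_var(size=6, name='A')
b = m.interval_var(size=3, name='B')
f = m.pulse(a, 5) + m.pulse(b, 10)

# Fixed unit-size intervals z_0 .. z_{k-1} covering T, each carrying a variable-height pulse
z = [m.interval_var(start=t0 + i, size=1, name='z%d' % i) for i in range(k)]
g = sum(m.pulse(zi, 0, MAX) for zi in z)

# Constrain g + f == MAX at every point of T (expressed here with always_in)
m.add(m.always_in(f + g, (t0, t0 + k), MAX, MAX))

# F = k*MAX - (sum of the z_i pulse heights) = integral of f over T; cap it at 50
F = k * MAX - sum(m.height_at_start(zi, g) for zi in z)
m.add(F <= 50)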
In scheduling, the concept of a cumulative constraint is usually reserved for resources that have a maximum usage at each time index, as you talk about in the final sentence.
What you are talking about seems to be more of a sliding sum constraint, which limits the sum of variables in any contiguous subsequence of a certain length.
If you want to reason about the daily contributions, you can model each task as a chain of consecutive intervals, each a single day long and each contributing its units at the start (or end) of its day. E.g. using the Python API:
from docplex.cp.model import CpoModel

m = CpoModel()
cumul_contribs = []

# Task A
task_a = m.interval_var_list(6, start=(0, 100), end=(0, 100), size=1, name='a')
for (ivl1, ivl2) in zip(task_a, task_a[1:]):
    m.add(m.end_at_start(ivl1, ivl2))
for ivl in task_a:
    cumul_contribs.append(m.step_at_start(ivl, 5))

# Task B
task_b = m.interval_var_list(3, start=(0, 100), end=(0, 100), size=1, name='b')
for (ivl1, ivl2) in zip(task_b, task_b[1:]):
    m.add(m.end_at_start(ivl1, ivl2))
for ivl in task_b:
    cumul_contribs.append(m.step_at_start(ivl, 10))

# Constrain the cumulative resource
m.add(m.always_in(sum(cumul_contribs), (0, 5), 0, 50))

msol = m.solve()
msol.write()

Resizing a hash table: Using prime number vs Power of 2 (for array size)

I've looked into a lot of questions on this site about whether one should use a prime number or a power of 2 for the modulus, and thankfully I understood the idea.
However, my situation is different. I couldn't find any question that dealt with the right array size when resizing hash tables.
I am implementing a hash table that may need to grow and shrink as the number of stored keys varies. I have a hashcode function that uniformly hashes keys to positive 32-bit integers. The table itself will use a smaller array of approximate size M. So which of the following is best choice for the hash function that takes the hashcode to produce a value between 0 and P-1 where P is close to M?
a) Modulo P where P is a prime closest to M
b) Modulo P where P is the power of 2 closest to M
c) Either
d) Neither
I've been trying to figure this out for hours, but with no luck.
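For concreteness, the two candidates from the question look like this in Python (a sketch; the specific values of M and P below are assumptions):

M = 1000                 # desired approximate table size (assumed)
P_PRIME = 997            # option a: a prime close to M
P_POW2 = 1024            # option b: the power of 2 closest to M

def index_prime(hashcode):
    return hashcode % P_PRIME          # value in 0 .. P_PRIME - 1

def index_pow2(hashcode):
    return hashcode & (P_POW2 - 1)     # bit mask, value in 0 .. P_POW2 - 1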

Is it possible to select a number from every given interval without repetition in the selections? Solution in LINEAR TIME

I have been trying this question on HackerEarth practice, which requires the work described below.
PROBLEM
Given an integer n, which signifies the sequence of n numbers {0, 1, 2, 3, 4, 5, ..., n-2, n-1}.
We are given m ranges of the form (L, R) such that 0 <= L <= n-1 and 0 <= R <= n-1.
If L <= R, then (L, R) signifies the numbers {L, L+1, L+2, L+3, ..., R-1, R} from the above sequence;
else (L, R) signifies the numbers {L, L+1, L+2, ..., n-2, n-1} and {0, 1, 2, 3, ..., R-1, R}, i.e. the numbers wrap around.
example
n = 5 ie {0,1,2,3,4}
(0,3) signifies {0,1,2,3}
(3,0) signifies {3,4,0}
(3,2) signifies {3,4,0,1,2}
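A small Python helper that reproduces these examples (the function name is hypothetical, for illustration only):

def covered(n, L, R):
    # numbers denoted by the range (L, R), wrapping around when L > R
    if L <= R:
        return list(range(L, R + 1))
    return list(range(L, n)) + list(range(0, R + 1))

print(covered(5, 0, 3))   # [0, 1, 2, 3]
print(covered(5, 3, 0))   # [3, 4, 0]
print(covered(5, 3, 2))   # [3, 4, 0, 1, 2]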
Now we have to select ONE (only one) number from each range without repeating any selection. We have to determine whether it is possible to select one number from each (and every) range without repetition.
Example test case
n = 5 // numbers {0,1,2,3,4}
// the m ranges follow //
0 0 ie {0}
1 2 ie {1,2}
2 3 ie {2,3}
4 4 ie {4}
4 0 ie {4,0}
Answer is "NO" it's not possible.
Because we cannot select any number from the range 4 0: if we select 4 from it, we cannot select from the range 4 4, and if we select 0 from it, we cannot select from the range 0 0.
My approaches -
1) It can be done in O(N*M) using recursion, checking all possibilities of selection from each range while using a hash map to record our selections.
2) I was trying to do it in order N or M, i.e. in linear time. The problem lacks an editorial explanation; only code is given in the editorial, without comments or explanation. I am not able to understand that code (a linear-time solution by someone else that passes all test cases and got accepted).
I am not able to understand the logic/algo used in the code and why is it working?
Please suggest ANY linear method and the logic behind it, because the problem has these constraints:
1 <= N <= 10^9
1 <= M <= 10^5
0 <= L, R < N
which, I guess, demand a linear or n log n solution.
The code in the editorial can also be seen here http://ideone.com/5Xb6xw
Warning: after looking at the code, I found that it uses n and m interchangeably, so I would like to mention the input format for the problem.
INPUT FORMAT
The first line contains the number of test cases, tc. Each test case starts with two integers N and M: the first depicting the number of countries on the globe, the second depicting the number of ranges his girlfriend has given him. After which, the next M lines will each have two integers, X and Y, describing a range. If X <= Y, then the range covers countries [X, X+1, ..., Y]; else the range covers [X, X+1, ..., N-1, 0, 1, ..., Y].
Output Format
Print "YES" if it is possible to do so, print "NO", if it is not.
There are two components to the editorial solution.
Linear-time reduction to a problem on ordinary intervals
Assume to avoid trivial cases that the number of input intervals is less than n.
The first is to reduce the problem to one where the intervals don't wrap around as follows. Given an interval [L, R], if L ≤ R, then emit two intervals [L, R] and [L + n, R + n]; if L > R, emit [L, R + n]. The easy direction of the reduction is showing that, if the original problem has a solution, then the reduced problem has a solution. For [L, R] with L ≤ R assigned a number k, assign k to [L, R] and k + n to [L + n, R + n]. For [L, R] with L > R, assign whichever of k, k + n belongs to [L, R + n]. Except for the dual assignment of k and k + n for intervals [L, R] and [L + n, R + n] respectively, each interval gets its own residue class mod n, so the assignments do not conflict.
Conversely, the hard direction of the reduction (if the original problem has no solution, then the reduced problem has no solution) is proved using Hall's marriage theorem. By Hall's criterion, an unsolvable original problem has, for some k, a set of k input intervals whose union has size less than k. We argue first that there exists such a set of input intervals whose union is a (circular) interval (which by assumption isn't all of 0..n-1). Decompose the union into the set of maximal (circular) intervals that comprise it. Each input interval is contained in exactly one of these intervals. By an averaging argument, some maximal (circular) interval contains more input intervals than its size. We finish by "lifting" this counterexample to the reduced problem. Given the maximal (circular) interval [L*, R*], we lift it to the ordinary interval [L*, R*] if L* ≤ R*, or [L*, R* + n] if L* > R*. Do likewise with the circular intervals contained in this interval. It is tedious but straightforward to show that this lifted counterexample satisfies Hall's criterion, which implies that the reduced problem has no solution.
O(m log m)-time solution for ordinary intervals
This is a sweep-line algorithm. Sort the intervals by lower endpoint and scan them in that order. We imagine that the sweep line moves from lower endpoint to lower endpoint. Maintain the set of intervals that intersect the sweep line and have not been assigned a number, sorted by upper endpoint. When the sweep line is about to move, assign the numbers between the old and new positions to the intervals in the set, preferentially to the ones whose upper endpoint is the lowest. The correctness of this strategy should be clear: the intervals that could be assigned a number but are passed over have at least as many options (in the sense of being a superset) as the intervals that are assigned, so we never make a choice that we have cause to regret.
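A Python sketch of the whole approach (the reduction followed by the sweep); the function name and the heap-based bookkeeping are my own, not the editorial's code:

import heapq

def can_pick_distinct(n, ranges):
    # Step 1: reduce circular ranges on 0..n-1 to ordinary intervals on 0..2n-1.
    if len(ranges) > n:                    # pigeonhole: more ranges than numbers
        return False
    intervals = []
    for L, R in ranges:
        if L <= R:
            intervals.append((L, R))
            intervals.append((L + n, R + n))
        else:                              # wrap-around range
            intervals.append((L, R + n))
    # Step 2: sweep by lower endpoint, always serving the smallest upper endpoint first.
    intervals.sort()
    heap = []                              # upper endpoints of active, unassigned intervals
    cur = 0                                # next number the sweep line will hand out
    i = 0
    while i < len(intervals) or heap:
        if not heap:                       # jump the sweep line to the next interval
            cur = max(cur, intervals[i][0])
        while i < len(intervals) and intervals[i][0] <= cur:
            heapq.heappush(heap, intervals[i][1])
            i += 1
        if heapq.heappop(heap) < cur:      # tightest interval already ran out of numbers
            return False
        cur += 1
    return True

print(can_pick_distinct(5, [(0, 0), (1, 2), (2, 3), (4, 4), (4, 0)]))   # False ("NO")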

Hello, I have a computational question regarding combinations/permutations

A brief intro: I am creating a medical software application, and I have forgotten some of the combination/permutation theorems from college. Let's say I have five nerves: median, ulnar, radial, tibial, peroneal. I can choose one, two, three, four, or all five of them in any combination. What is the equation to find the maximum number of combinations I can make?
For example;
median
median + ulnar
median + ulnar + radial
etc etc
Ulnar + median = median + ulnar, so those would be repetitive. Thank you for your help. I know this isn't directly programming related, but I thought you guys would be familiar with it.
The comment that says it is (2^n)-1 is correct. 2^n is the number of possible subsets you can form from a set of n objects (in this case you have 5 objects), and then in your case, you don't want to count the empty set, so you subtract out 1.
I'm sure you can do the math, but for the sake of completeness, for 5 nerves, there would be 2^5 - 1 = 32 - 1 = 31 possible combinations you could end up with.
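As a quick sanity check, a short Python enumeration of the non-empty subsets gives the same count:

from itertools import combinations

nerves = ['median', 'ulnar', 'radial', 'tibial', 'peroneal']

# every non-empty, order-independent selection of nerves
subsets = [c for r in range(1, len(nerves) + 1)
           for c in combinations(nerves, r)]

print(len(subsets))            # 31
print(2 ** len(nerves) - 1)    # 31, the closed-form count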

Information Gain and Entropy

I recently read this question regarding information gain and entropy. I think I have a semi-decent grasp on the main idea, but I'm curious what to do in situations such as the following:
If we have a bag of 7 coins, 1 of which is heavier than the others, and 1 of which is lighter than the others, and we know the heavier coin + the lighter coin is the same as 2 normal coins, what is the information gain associated with picking two random coins and weighing them against each other?
Our goal here is to identify the two odd coins. I've been thinking this problem over for a while, and can't frame it correctly in a decision tree, or any other way for that matter. Any help?
EDIT: I understand the formula for entropy and the formula for information gain. What I don't understand is how to frame this problem in a decision tree format.
EDIT 2: Here is where I'm at so far:
Assuming we pick two coins and they both end up weighing the same, we can assume our new chances of picking H+L come out to 1/5 * 1/4 = 1/20, easy enough.
Assuming we pick two coins and the left side is heavier. There are three different cases where this can occur:
HM: Which gives us 1/2 chance of picking H and a 1/4 chance of picking L: 1/8
HL: 1/2 chance of picking high, 1/1 chance of picking low: 1/1
ML: 1/2 chance of picking low, 1/4 chance of picking high: 1/8
However, the odds of us picking HM are 1/7 * 5/6 which is 5/42
The odds of us picking HL are 1/7 * 1/6 which is 1/42
And the odds of us picking ML are 1/7 * 5/6 which is 5/42
If we weight the overall probabilities with these odds, we are given:
(1/8) * (5/42) + (1/1) * (1/42) + (1/8) * (5/42) = 3/56.
The same holds true for option B.
option A = 3/56
option B = 3/56
option C = 1/20
However, option C should be weighted heavier because there is a 5/7 * 4/6 chance to pick two mediums. So I'm assuming from here I weight THOSE odds.
I am pretty sure I've messed up somewhere along the way, but I think I'm on the right path!
EDIT 3: More stuff.
Assuming the scale is unbalanced, the odds are (10/11) that only one of the coins is the H or L coin, and (1/11) that both coins are H/L
Therefore we can conclude:
(10 / 11) * (1/2 * 1/5) and
(1 / 11) * (1/2)
EDIT 4: Going to go ahead and say that it is a total 4/42 increase.
You can construct a decision tree from information-gain considerations, but that's not the question you posted, which is only to compute the information gain (presumably the expected information gain;-) from one "information extraction move" -- picking two random coins and weighing them against each other. To construct the decision tree, you need to know what moves are available from the initial state (presumably the general rule is: you can pick two sets of N coins, N < 4, and weigh them against each other -- and that's the only kind of move, parametric over N), and the expected information gain from each; that gives you the first leg of the decision tree (the move with highest expected information gain). Then you do the same process for each of the possible results of that move, and so on down.
So do you need help computing that expected information gain for each of the three allowable values of N, only for N==1, or can you try doing it yourself? If the third possibility obtains, then that would maximize the amount of learning you get from the exercise -- which after all IS the key purpose of homework. So why don't you try, edit your question to show how you proceeded and what you got, and we'll be happy to confirm you got it right, or try and help correct any misunderstanding your procedure might reveal!
Edit: trying to give some hints rather than serving the OP the ready-cooked solution on a platter;-). Call the coins H (for heavy), L (for light), and M (for medium -- five of those). When you pick 2 coins at random you can get (out of 7 * 6 == 42 possibilities including order) HL, LH (one each), HM, MH, LM, ML (5 each), MM (5 * 4 == 20 cases) -- 2 plus 20 plus 20 is 42, check. In the weighing you get 3 possible results, call them A (left heavier), B (right heavier), C (equal weight). HL, HM, and ML, 11 cases, will be A; LH, MH, and LM, 11 cases, will be B; MM, 20 cases, will be C. So A and B aren't really distinguishable (which one is left and which one is right is basically arbitrary!), so we have 22 cases where the weights will differ and 20 where they will be equal -- it's a good sign that the cases giving each result come in pretty close numbers!
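A quick Python enumeration that reproduces these counts (the numeric weights below are illustrative assumptions; all that matters is H > M > L with H + L = 2*M):

from itertools import permutations

coins = ['H', 'L', 'M', 'M', 'M', 'M', 'M']
weight = {'H': 11, 'M': 10, 'L': 9}        # any values with H > M > L and H + L = 2*M

counts = {'A': 0, 'B': 0, 'C': 0}          # A: left heavier, B: right heavier, C: equal
for left, right in permutations(range(7), 2):     # 7 * 6 = 42 ordered picks
    d = weight[coins[left]] - weight[coins[right]]
    counts['A' if d > 0 else 'B' if d < 0 else 'C'] += 1

print(counts)   # {'A': 11, 'B': 11, 'C': 20}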
So now consider how many (equiprobable) possibilities existed a priori, and how many a posteriori, for each of the experiment's results. You're tasked to pick the H and L coins. If you did it at random before the experiment, what would your chances be? 1 in 7 for the random pick of the H; given that that succeeds, 1 in 6 for the pick of the L -- overall 1 in 42.
After the experiment, how are you doing? If C, you can rule out those two coins and you're left with a mystery H, a mystery L, and three Ms -- so if you picked at random you'd have 1 in 5 to pick H and, if successful, 1 in 4 to pick L, overall 1 in 20 -- your success chances have slightly more than doubled. It's trickier to see "what next" for the A (and equivalently B) cases because they're several, as listed above (and, less obviously, not equiprobable...), but obviously you won't pick the known-lighter coin for H (and vice versa), and if you pick one of the 5 unweighed coins for H (or L), only one of the weighed coins is a candidate for the other role (L or H respectively). Ignoring for simplicity the "non equiprobable" issue (which is really kind of tricky), can you compute what your chances of guessing (with a random pick not inconsistent with the experiment's result) would be...?
