My question is about creating a table with q and using foreign keys. I know how to do it the following way:
q)T1:([id:1 2 3 4 5]d1:"acbde")
q)T2:([id:1 2 3 4 5]f1:`T1$2 2 2 4 4)
But now let's say I want to create the table with the ! operator, flipping a dictionary, like this:
q)T3:1!flip ((`id`f1 )!((1 2 3 4 5);(2 2 2 4 4)))
How can I set a foreign key to table T1's primary key when creating a table this way?
Update
Well, I thought my example above would be enough for me to solve my actual problem, but unfortunately it's not.
What if I have lists of lists A and B laid out like this:
q)A:enlist 1 2 3 4 5
q)B:(enlist "abcde"), (enlist `v`w`x`y`z)
q)flip (`id`v1`v2)!(B,A)
How can I make the list A a foreign key to table T1?
Update 2
And how would I implement it if A comes from somewhere else and I don't initialize it myself? Do I have to make a copy of the list?
You can apply the same T1$ enumeration syntax to the list of column values:
q)T3:1!flip ((`id`f1 )!((1 2 3 4 5);(`T1$2 2 2 4 4)))
q)T3~T2
1b
Update:
Again, for this case we can use the same syntax on the list:
q)A:enlist`T1$1 2 3 4 5
q)meta flip (`id`v1`v2)!(B,A)
c | t f a
--| ------
id| c
v1| s
v2| j T1
Update 2:
Same syntax applied to the variable name:
q)A:1 2 3 4 5
q)meta flip (`id`v1`v2)!(B,enlist`T1$A)
c | t f a
--| ------
id| c
v1| s
v2| j T1
I have a vector of nondecreasing data. Here is a sample:
1
1
1
2
2
2
2
2
2
2
2
3
3
4
4
6
Clearly there are duplicates and missing numbers. I can remove the duplicates using unique, so my unique values are:
uniqueVals = unique(sortedData);
So far, so good. Now I want to change the data so that each value in sortedData is replaced with its index in uniqueVals. For instance, the first 5 elements of uniqueVals would be 1, 2, 3, 4, 6, with indices 1, 2, 3, 4, 5. I want to change sortedData so that 1 maps to 1, 2 maps to 2, 3 to 3, 4 to 4, 6 to 5, and so on.
I know I can create a "map" object, but that seems to just be used to map uniqueVals to its index. How do I apply that mapping so that the entries in sortedData are changed?
I have no need for this to be a particularly fast operation. sortedData contains only a few hundred thousand rows and it only needs to be done once.
You can use the third output of unique, which gives, for each element of sortedData, the index of its value in uniqueVals:
[uniqueVals,~,yourOutput] = unique(sortedData);
yourOutput =
1
1
1
2
2
2
2
2
2
2
2
3
3
4
4
5
You can also use g = findgroups(sortedData);, which gives you the group index, with one group per unique value. The second output tells you the value of each group:
[g, gValue] = findgroups( sortedData );
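To make the mapping concrete outside MATLAB, here is a Python sketch of what the third output of unique computes for this data: each value is replaced by its 1-based position in the sorted list of unique values. It is only an illustration of the idea; unique_with_index is a hypothetical helper, not a MATLAB or library function.

```python
from bisect import bisect_left


def unique_with_index(sorted_data):
    """Hypothetical helper: return (unique_vals, idx), where idx[i] is the
    1-based position of sorted_data[i] in unique_vals, mirroring the idea
    behind the third output of MATLAB's unique."""
    unique_vals = sorted(set(sorted_data))
    # bisect_left finds the 0-based position; add 1 for MATLAB-style indexing.
    idx = [bisect_left(unique_vals, v) + 1 for v in sorted_data]
    return unique_vals, idx


sorted_data = [1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 6]
unique_vals, your_output = unique_with_index(sorted_data)
print(unique_vals)   # [1, 2, 3, 4, 6]
print(your_output)   # [1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 5]
```

Because it uses a binary search against the unique values, the sketch does not actually require the input to be sorted.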
I have been working with country-level survey data in Stata that I needed to reshape. I ended up exporting the .dta to a .csv and making a pivot table in Excel, but I am curious to know how to do this in Stata, as I couldn't figure it out.
Suppose we have the following data:
country response
A 1
A 1
A 2
A 2
A 1
B 1
B 2
B 2
B 1
B 1
A 2
A 2
A 1
I would like the data to be reformatted like this:
country sum_1 sum_2
A 4 4
B 3 2
First I tried a simple reshape wide command, but got the error "values of variable response not unique within country", and then realized that reshape alone wouldn't work anyway.
Then I tried generating new variables conditional on the value of response and using reshape after that, but the whole thing turned into a mess, so I just used Excel.
Just curious if there is a more intuitive way of doing that transformation.
If you just want a table, then just ask for one:
clear
input str1 country response
A 1
A 1
A 2
A 2
A 1
B 1
B 2
B 2
B 1
B 1
A 2
A 2
A 1
end
tabulate country response
| response
country | 1 2 | Total
-----------+----------------------+----------
A | 4 4 | 8
B | 3 2 | 5
-----------+----------------------+----------
Total | 7 6 | 13
If you want the dataset itself changed to this layout, reshape is part of the answer, but you should contract first. collapse is more versatile in several ways, but your "sum" is really a count or frequency, so contract is more direct.
contract country response, freq(sum_)
reshape wide sum_, i(country) j(response)
list
+-------------------------+
| country sum_1 sum_2 |
|-------------------------|
1. | A 4 4 |
2. | B 3 2 |
+-------------------------+
In Stata 16 and later, help frames introduces frames as a way to work with multiple datasets in the same session.
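For comparison outside Stata, the same two steps (count each country/response combination, then spread the counts into one column per response value) can be sketched in Python with only the standard library. This is an illustration of what contract followed by reshape wide computes, not Stata code; the data is the example from the question and the variable names are illustrative.

```python
from collections import Counter

# Example data from the question: (country, response) pairs.
rows = [("A", 1), ("A", 1), ("A", 2), ("A", 2), ("A", 1),
        ("B", 1), ("B", 2), ("B", 2), ("B", 1), ("B", 1),
        ("A", 2), ("A", 2), ("A", 1)]

# Step 1 (like contract): one frequency per distinct (country, response) pair.
freq = Counter(rows)

# Step 2 (like reshape wide): one row per country, one sum_<response> column
# per response value.
responses = sorted({r for _, r in rows})
wide = {}
for country in sorted({c for c, _ in rows}):
    wide[country] = {f"sum_{r}": freq[(country, r)] for r in responses}

for country, counts in wide.items():
    print(country, counts)
# A {'sum_1': 4, 'sum_2': 4}
# B {'sum_1': 3, 'sum_2': 2}
```

One difference: where a country never gives a particular response, this sketch reports 0, whereas Stata's reshape wide leaves a missing value there.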
I am studying data structures right now, specifically hash tables. I came across the following question:
Imagine that we have placed the following keys in an initially empty hash table of length 7 with linear probing, using the following table of hash values:
key: A B C D E F G
hash: 3 1 4 1 5 2 5
Which of the following arrays could be the linear-probing array?
1.
0 1 2 3 4 5 6
G B D F A C E
2.
0 1 2 3 4 5 6
B G D F A C E
3.
0 1 2 3 4 5 6
E G F A B C D
When I create the linear-probing array I get this:
0 1 2 3 4 5 6
G B D A C E F
Could somebody please tell me why I am wrong and what's the right answer?
Notice that the question doesn't specify the order in which the keys are inserted. Your answer is correct only if the keys are inserted in the order A-B-C-D-E-F-G, but since the question doesn't state the order, you need to dig deeper.
What you do know, however, is that one of the keys is inserted first, and since the hash table is initially empty, it goes to its designated slot as given by the key-to-hash table. This immediately rules out choice 2, because none of its keys are in their designated array entry, leaving choices 1 and 3.
In table 1, B is in slot 1, which corresponds to its hash value; in table 3, keys F and A are in their initial hash-value slots.
It's simple to prove that no sequence of key inserts after F and A will yield table 3. G in slot 1 (hash 5) requires slots 5, 6 and 0 to be filled first, so C (slot 5) must precede G; C in slot 5 (hash 4) requires slot 4 to be filled first, so B (slot 4) must precede C; and B in slot 4 (hash 1) requires slots 1, 2 and 3 to be filled first, so G must precede B. That ordering is circular, so it can never happen. It's likewise easy to check that the insertion sequence B-D-F-A-C-E-G results in table 1.
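The argument can also be checked mechanically. The following Python sketch simulates linear probing with the question's hash values and a table of length 7, then tries all 7! = 5040 insertion orders to see which of the three arrays can actually occur. insert_all is a hypothetical helper written for this illustration.

```python
from itertools import permutations

# Hash values from the question; the table has length 7.
HASH = {"A": 3, "B": 1, "C": 4, "D": 1, "E": 5, "F": 2, "G": 5}
SIZE = 7


def insert_all(order):
    """Insert keys in the given order using linear probing; return the array as a string."""
    table = [None] * SIZE
    for key in order:
        slot = HASH[key]
        # Linear probing: move to the next slot (wrapping around) until one is free.
        while table[slot] is not None:
            slot = (slot + 1) % SIZE
        table[slot] = key
    return "".join(table)


print(insert_all("ABCDEFG"))   # GBDACEF, the array from the question
print(insert_all("BDFACEG"))   # GBDFACE, option 1

# Brute force: which options can be produced by some insertion order?
reachable = {insert_all(p) for p in permutations(HASH)}
for option in ["GBDFACE", "BGDFACE", "EGFABCD"]:
    print(option, option in reachable)   # only option 1 prints True
```

Brute force is fine here because there are only 5040 orders; the hand argument above is what generalizes.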
Although this is a question about hash tables, I honestly don't consider it a good way to assess your knowledge of linear probing; it is more of a puzzle, as @gnasher729 mentioned.
I can't seem to find something quite like this problem...
I have an array table where each row contains a random assortment of numbers 1-N
On another sheet, I have a table with column and row headers numbered 1-N
I want to count how many rows in the array contain both the column header and the row header for a given cell in the table. Since COUNTIFS only evaluates one cell of the specified range at a time, it doesn't seem to work in this scenario.
Example array:
A B C D
1 3 5 7
1 2 3 4
2 3 4 5
2 4 6 8
...
Table results (symmetrical about the diagonal):
A B C D E F
. 1 2 3 4 5 ...
1 - 1 2 1 1
2 1 - 2 2 1
3 2 2 - 2 2
4 1 2 2 - 1
5 1 1 2 1 -
Would using nested countifs work?
I don't agree with your result for 4/2 (and, symmetrically, 2/4), which surely should be 3, not 2. Assuming the array table is in Sheet1 A1:D4 and the results table is in Sheet2 A1:F6, this formula, placed in cell B2 of the latter, should work:
=IF($A2=B$1,"-",SUMPRODUCT(N(MMULT(N(COUNTIF(OFFSET(Sheet1!$A$1:$D$1,ROW(Sheet1!$A$1:$D$4)-MIN(ROW(Sheet1!$A$1:$D$4)),),CHOOSE({1,2},B$1,$A2))>0),{1;1})=2)))
Copy across and down as required.
Note: if your actual table is much larger than the one given, it is probably worth adding a simple clause to the formula so that roughly half of the cells take their results from their symmetric counterparts instead of recalculating this construction, which saves resources.
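To cross-check the expected counts, including the 4/2 cell, here is a small Python sketch (not an Excel formula) that counts, for every pair of numbers, how many rows of the example array contain both. It assumes the numbers run from 1 to 5 as in the results table; pair_counts is a hypothetical helper name. In the spirit of the note above, it computes each pair once and mirrors it across the diagonal.

```python
from itertools import combinations

# Example array from the question; each inner list is one row of the array table.
rows = [
    [1, 3, 5, 7],
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [2, 4, 6, 8],
]


def pair_counts(rows, n):
    """Return an n x n list where [a-1][b-1] is the number of rows containing both a and b."""
    counts = [[0] * n for _ in range(n)]
    for row in rows:
        # Keep each value once per row and ignore values outside 1..n.
        present = sorted({v for v in row if 1 <= v <= n})
        for a, b in combinations(present, 2):
            counts[a - 1][b - 1] += 1
            counts[b - 1][a - 1] += 1  # mirror across the diagonal
    return counts


table = pair_counts(rows, 5)
for i, line in enumerate(table, start=1):
    # Show "-" on the diagonal, as in the results table.
    print(i, ["-" if i == j else c for j, c in enumerate(line, start=1)])
```

Row 4 comes out as [1, 3, 2, '-', 1]: rows 2, 3 and 4 of the array all contain both 2 and 4.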
I am trying to condense a table which contains multiple rows per event to a smaller table which contains counts of key sub-events within each event. Events are defined based on unique combinations across columns.
As a specific example, say I have the following data involving customer visits to various stores on different dates with different items purchased:
cust date store item_type
a 1 Main St 1
a 1 Main St 2
a 1 Main St 2
a 1 Main St 2
b 1 Main St 1
b 1 Main St 2
b 1 Main St 2
c 1 Main St 1
d 2 Elm St 1
d 2 Elm St 3
e 2 Main St 1
e 2 Main St 1
a 3 Main St 1
a 3 Main St 2
I would like to restructure the data to a table that contains a single line per customer visit on a given day, with appropriate counts. I am trying to understand how to use SQLite to condense this to:
Index cust date store n_items item1 item2 item3 item4
1 a 1 Main St 4 1 3 0 0
2 b 1 Main St 3 1 2 0 0
3 c 1 Main St 1 1 0 0 0
4 d 2 Elm St 2 1 0 1 0
5 e 2 Main St 2 2 0 0 0
6 a 3 Main St 2 1 1 0 0
I can do this in Excel for this trivial example (begin with SUMPRODUCT(customer * date) as suggested here, followed by a cumulative sum on this column to generate Index, then COUNTIF and COUNTIFS to generate the desired counts).
Excel is poorly suited to doing this for thousands of rows, so I am looking for a solution using SQLite.
Sadly, my SQLite kung-fu is weak.
I think this is the closest I have found, but I am having trouble understanding exactly how to adapt it.
When I tried a more basic approach, starting by generating a unique index:
CREATE UNIQUE INDEX ui ON t(cust, date);
I get:
Error: indexed columns are not unique
I would greatly appreciate any help with where to start. Many thanks in advance!
To create one result record for each unique combination of column values, use GROUP BY.
The number of records in the group is available with COUNT.
To count specific item types, use a boolean expression like item_type=x, which returns 0 or 1, and sum this over all records in the group:
SELECT cust,
date,
store,
COUNT(*) AS n_items,
SUM(item_type = 1) AS item1,
SUM(item_type = 2) AS item2,
SUM(item_type = 3) AS item3,
SUM(item_type = 4) AS item4
FROM t
GROUP BY cust,
date,
store
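A self-contained way to try the query is Python's built-in sqlite3 module with an in-memory database loaded with the example rows. The ORDER BY MIN(rowid) clause is an addition for this sketch, not part of the query above: without an ORDER BY, SQLite does not guarantee the output order, and this one lists visits in the order they first appear, matching the Index column in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (cust TEXT, date INTEGER, store TEXT, item_type INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [("a", 1, "Main St", 1), ("a", 1, "Main St", 2), ("a", 1, "Main St", 2),
     ("a", 1, "Main St", 2), ("b", 1, "Main St", 1), ("b", 1, "Main St", 2),
     ("b", 1, "Main St", 2), ("c", 1, "Main St", 1), ("d", 2, "Elm St", 1),
     ("d", 2, "Elm St", 3), ("e", 2, "Main St", 1), ("e", 2, "Main St", 1),
     ("a", 3, "Main St", 1), ("a", 3, "Main St", 2)],
)

query = """
SELECT cust, date, store,
       COUNT(*) AS n_items,
       SUM(item_type = 1) AS item1,   -- each comparison is 0 or 1
       SUM(item_type = 2) AS item2,
       SUM(item_type = 3) AS item3,
       SUM(item_type = 4) AS item4
FROM t
GROUP BY cust, date, store
ORDER BY MIN(rowid)                   -- added: order of first appearance
"""
rows = conn.execute(query).fetchall()

# enumerate supplies the Index column from the question.
for index, row in enumerate(rows, start=1):
    print(index, *row)
```

The first line printed is "1 a 1 Main St 4 1 3 0 0", and the six lines together match the desired table in the question.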