Reshape data from long to wide without an identifier in Stata

How can I reshape the dataset below from long to wide in Stata?
a1 a2
NAME Jane
SEX female
PHONE 234
SCORE 9
NAME John
SEX male
PHONE 444
SCORE 10
NAME Baba
SEX male
PHONE 777
SCORE 5
I've tried using gen i = tag(a1) to generate an id. However, this does not uniquely identify each set of repeating data.

You're correct that you need an identifier, indeed two identifiers: one for each record and one for each attribute within a record. But gen i = tag(a1) is illegal (presumably you mean egen) and, more to the point, would not help. egen, tag() depends on identifiers already existing and only creates a (0, 1) indicator variable, which is not what you need here.
This works for me. Please note the use of Stata code to create a data example: there is much more on this at the Stata tag wiki.
clear
input str6 (a1 a2)
NAME Jane
SEX female
PHONE 234
SCORE 9
NAME John
SEX male
PHONE 444
SCORE 10
NAME Baba
SEX male
PHONE 777
SCORE 5
end
drop a1
egen id = seq(), block(4)
egen j = seq(), to(4)
reshape wide a2, i(id) j(j)
rename (a2*) (name sex phone score)
destring, replace
list
     +------------------------------------+
     | id   name      sex   phone   score |
     |------------------------------------|
  1. |  1   Jane   female     234       9 |
  2. |  2   John     male     444      10 |
  3. |  3   Baba     male     777       5 |
     +------------------------------------+
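For readers who want to check the indexing logic outside Stata, here is a rough Python analogue. It is a sketch, not Stata's implementation, and it assumes (as the Stata code does) that the rows are in order and that every record occupies exactly 4 consecutive lines. reshape_wide and destring are hypothetical helper names, and destring here is simplified to non-negative integer strings.

```python
# Long data: (attribute, value) pairs in file order, as in the Stata example.
long_rows = [
    ("NAME", "Jane"), ("SEX", "female"), ("PHONE", "234"), ("SCORE", "9"),
    ("NAME", "John"), ("SEX", "male"), ("PHONE", "444"), ("SCORE", "10"),
    ("NAME", "Baba"), ("SEX", "male"), ("PHONE", "777"), ("SCORE", "5"),
]
NAMES = ("name", "sex", "phone", "score")  # plays the role of the rename step


def reshape_wide(rows, names):
    """Group each run of len(names) consecutive rows into one record."""
    block = len(names)
    if len(rows) % block != 0:
        raise ValueError("number of rows is not a multiple of the block size")
    records = {}
    for n, (_attr, value) in enumerate(rows):
        rec_id = n // block + 1  # like: egen id = seq(), block(4) -> 1,1,1,1,2,...
        j = n % block + 1        # like: egen j = seq(), to(4)    -> 1,2,3,4,1,...
        records.setdefault(rec_id, {"id": rec_id})[names[j - 1]] = value
    return [records[k] for k in sorted(records)]


def destring(records):
    """Simplified destring: convert a column only if every value is a digit string."""
    for col in list(records[0]):
        values = [r[col] for r in records]
        if all(isinstance(v, str) and v.isdigit() for v in values):
            for r in records:
                r[col] = int(r[col])
    return records


wide = destring(reshape_wide(long_rows, NAMES))
for record in wide:
    print(record)
```

Integer division gives the block number (id) and the remainder gives the position within the block (j), which is what the two egen calls compute before reshape wide pairs them up.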

Related

Find a string within a string in SQL Server 2016

I am trying to write a query to find a string within a string. In Oracle I used REGEXP_LIKE, but I don't see such a function in SQL Server 2016. I have a table with an address column such as:
ADDRESS
--------
345 E 149 ST NY NY
345 EAST 149 STREET NY NY
444 CHEST AVE NY NY
444 CHEST AVENUE NY NY
I want to write a query that searches for 345 followed by E or EAST, followed by 149. In the case of 444 CHEST, I want a query that searches for 444 CHEST followed by AVE or AVENUE.
I have something like the following, which doesn't work:
select * from table1 where address like '345[EEAST] 149%'
Basically, I want the query to return any address that starts with 345 E or 345 EAST followed by 149.
Can someone help me write a query for this? I know I can use an OR clause with the two different addresses, but if I have many addresses with different patterns, an OR clause won't be an efficient method. I'm looking into using some kind of regular expression.
Unfortunately, pattern matching with LIKE is quite limited in SQL Server (there is no regular-expression support). Your requirement can be satisfied by the queries below:
DECLARE @table TABLE (address VARCHAR(1000))
INSERT INTO @table
VALUES
('345 E 149 ST NY NY')
,('345 EAST 149 STREET NY NY')
,('444 CHEST AVE NY NY')
,('444 CHEST AVENUE NY NY')
SELECT * FROM @table WHERE address LIKE '345 E%149 ST% NY NY'
SELECT * FROM @table WHERE address LIKE '444 CHEST AVE% NY NY'
Result Set
+---------------------------+
| address |
+---------------------------+
| 345 E 149 ST NY NY |
| 345 EAST 149 STREET NY NY |
+---------------------------+
+------------------------+
| address |
+------------------------+
| 444 CHEST AVE NY NY |
| 444 CHEST AVENUE NY NY |
+------------------------+
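To try the LIKE patterns without a SQL Server instance, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in. Only the % wildcard is used, which behaves the same way in both systems; SQLite does not support SQL Server's [ ] character classes, and its LIKE is case-insensitive for ASCII, whereas SQL Server's case sensitivity depends on the collation. The select helper is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (address TEXT)")
conn.executemany(
    "INSERT INTO table1 (address) VALUES (?)",
    [("345 E 149 ST NY NY",), ("345 EAST 149 STREET NY NY",),
     ("444 CHEST AVE NY NY",), ("444 CHEST AVENUE NY NY",)],
)


def select(where_sql, params=()):
    """Return matching addresses in insertion order."""
    sql = f"SELECT address FROM table1 WHERE {where_sql} ORDER BY rowid"
    return [row[0] for row in conn.execute(sql, params)]


# Single LIKE patterns from the first answer (only the % wildcard is used).
east_149 = select("address LIKE ?", ("345 E%149 ST% NY NY",))
chest_ave = select("address LIKE ?", ("444 CHEST AVE% NY NY",))

# Separate conditions joined with OR, as in the next answer.
or_149 = select("address LIKE ? OR address LIKE ?", ("345 E 149 %", "345 EAST 149 %"))

print(east_149)
print(chest_ave)
print(or_149)
```

Both approaches return the same rows here, but they are not equally strict: % matches any run of characters, so '345 E%149 ST% NY NY' would also match an address such as '345 ELM 149 ST NY NY', while the explicit OR conditions would not.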
Use the LIKE operator, with separate conditions for each match:
SELECT *
FROM table1
WHERE ADDRESS LIKE '345 E 149 %' OR ADDRESS LIKE '345 EAST 149 %';
You can also try this in the WHERE clause:
SELECT *
FROM #T
WHERE (
1 = CASE
WHEN ADDRESS LIKE '%345_E_149%' THEN 1
WHEN ADDRESS LIKE '%345_EAST_149%' THEN 1
WHEN ADDRESS LIKE '%444_CHEST_AVE%' THEN 1
WHEN ADDRESS LIKE '%444_CHEST_AVENUE%' THEN 1
END
)

Query or Scan DynamoDB for all rows which have an attribute that matches a list of options

I have a DynamoDB table that has the following format
{
id: "1234" (Primary Key, String)
billNumber: "01" (Sort Key, String)
month: 1 (Number)
product: "Apple" (String)
itemLocation: "Aisle 1" (String)
}
Each product is written to the table separately, so products can't be written to the same item, and an existing item can't be updated to append to its product field.
I want to know how to query or scan this DDB table to find all itemLocations that id "1234" purchased in month 1 where the product field matches one of a given list of products.
I also have a global secondary index on id-month which I can use to find all rows purchased in a month by a user.
For example, if the table looked like this:
id | billNumber | month | product | itemLocation
1234 | 01 | 1 | Apple | Aisle 1
1234 | 02 | 1 | Banana | Aisle 2
1234 | 03 | 1 | Cherry | Aisle 3
1234 | 04 | 1 | Coke | Aisle 4
and I wanted the itemLocations that id "1234" bought in month "1" where the product was one of {"Apple", "Banana", "Cherry"}, I would expect every row except the Coke row to be returned, as below:
id | billNumber | month | product | itemLocation
1234 | 01 | 1 | Apple | Aisle 1
1234 | 02 | 1 | Banana | Aisle 2
1234 | 03 | 1 | Cherry | Aisle 3
Is this possible in a single query or scan, without querying for each product separately? I believe I could solve the problem in my example with 3 queries, each using the same id and month but a different product.
The closest I've come to something that might work is a DDB condition expression, but that seems to apply to CRUD operations rather than to queries.
You can't have a DDB table that looks like that...
With DDB, when using a composite primary key (hash + sort) the combination must be unique.
So you can't have
id | month | product | itemLocation
1234 | 1 | Apple | Aisle 1
1234 | 1 | Banana | Aisle 2
1234 | 1 | Cherry | Aisle 3
1234 | 1 | Coke | Aisle 4
in which 4 records have the same hash key (ID) and sort key (month).
EDIT
Ok so now you've got a valid DDB table...
But you can't use Query() to find the items that id "1234" bought in month "1" where the product is one of {"Apple", "Banana", "Cherry"}.
Scan() would work, but very inefficiently...
To do this efficiently, you'll want a local secondary index with a sort key of month#product (for example 1#Apple).
Then, depending on how many products there are, you can query once per product (3 times here):
Query(table, hk='1234', sk='1#Apple')
Query(table, hk='1234', sk='1#Banana')
Query(table, hk='1234', sk='1#Cherry')
Or you can query only once and filter the products server side or client side (note that filtering server side doesn't save any read capacity units):
Query(table, hk='1234', sk BEGINS WITH '1#')
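The two strategies above can be simulated in plain Python to see what each returns. This is an in-memory sketch, not the DynamoDB API: items is a list of dicts, monthProduct is a hypothetical name for the index's month#product sort key, and query_eq / query_prefix are hypothetical stand-ins for Query with an equality or BEGINS WITH key condition (results sorted by sort key, as Query returns them by default).

```python
# Hypothetical in-memory copy of the example items (not the DynamoDB API).
items = [
    {"id": "1234", "billNumber": "01", "month": 1, "product": "Apple", "itemLocation": "Aisle 1"},
    {"id": "1234", "billNumber": "02", "month": 1, "product": "Banana", "itemLocation": "Aisle 2"},
    {"id": "1234", "billNumber": "03", "month": 1, "product": "Cherry", "itemLocation": "Aisle 3"},
    {"id": "1234", "billNumber": "04", "month": 1, "product": "Coke", "itemLocation": "Aisle 4"},
]

# Assumed index sort key: month and product joined with '#', e.g. "1#Apple".
for item in items:
    item["monthProduct"] = f"{item['month']}#{item['product']}"


def query_eq(hash_key, sort_key):
    """Stand-in for Query(table, hk=hash_key, sk=sort_key) on the index."""
    return [i for i in items if i["id"] == hash_key and i["monthProduct"] == sort_key]


def query_prefix(hash_key, prefix):
    """Stand-in for Query(table, hk=hash_key, sk BEGINS WITH prefix), in sort-key order."""
    hits = [i for i in items if i["id"] == hash_key and i["monthProduct"].startswith(prefix)]
    return sorted(hits, key=lambda i: i["monthProduct"])


def locations_one_query_per_product(hash_key, month, products):
    found = []
    for product in products:
        found += [i["itemLocation"] for i in query_eq(hash_key, f"{month}#{product}")]
    return found


def locations_single_query_then_filter(hash_key, month, products):
    wanted = set(products)
    # Every item under the prefix is read; the filter only trims what is returned.
    return [i["itemLocation"] for i in query_prefix(hash_key, f"{month}#") if i["product"] in wanted]


print(locations_one_query_per_product("1234", 1, ["Apple", "Banana", "Cherry"]))
print(locations_single_query_then_filter("1234", 1, ["Apple", "Banana", "Cherry"]))
```

The # separator matters for the single-query variant: with a prefix of 1#, items from month 11 (sort keys starting 11#) are not picked up.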
Adding a newer answer here for anyone else who comes across this problem later:
You can transform your list of products into a String Set (type SS) and query your DDB using your id and contains(StringSet, product) as a filter expression.

Making an employee hours table

I was wondering how to make a table in Python 3 that has 7 columns, one for each day of the week, and a user-defined number of rows. I need to input the hours for each employee and be able to add them up later. This problem is really throwing me for a loop.
For example:
employeeAmount = int(input("Enter the amount of employees here"))
Then the table should have employeeAmount rows.
I need something like this:
Employee 1 | 7 | 9 | 8 | 8 | 0 | 9 | 7 |
Employee 2 etc
Employee 3 etc
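No answer to this question appears in the thread, so what follows is only a minimal sketch of one common approach: a list of lists, with one inner list of 7 daily hours per employee. The helper names (make_table, read_table, weekly_totals, format_table) are hypothetical, and hours are read as whole numbers for simplicity.

```python
DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def make_table(employee_amount):
    # One independent list per employee; [[0] * 7] * n would share a single row.
    return [[0] * len(DAYS) for _ in range(employee_amount)]


def read_table():
    """Ask for the number of employees, then their hours for each day."""
    employee_amount = int(input("Enter the amount of employees here: "))
    table = make_table(employee_amount)
    for e in range(employee_amount):
        for d, day in enumerate(DAYS):
            table[e][d] = int(input(f"Employee {e + 1}, hours on {day}: "))
    return table


def weekly_totals(table):
    """Total hours per employee (sum across each row)."""
    return [sum(row) for row in table]


def format_table(table):
    lines = []
    for e, row in enumerate(table, start=1):
        lines.append(f"Employee {e} | " + " | ".join(str(h) for h in row) + " |")
    return "\n".join(lines)


# Example with fixed hours; call read_table() instead to type them in.
table = make_table(2)
table[0] = [7, 9, 8, 8, 0, 9, 7]
print(format_table(table))
print("Totals:", weekly_totals(table))
```

Keeping the hours as numbers in the table (rather than as formatted strings) is what makes adding them up later a one-line sum.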

Transforming to 3NF

I need to normalize a model database to 3NF. However, one column in the data is very ambiguous.
The database has the following columns (apologies for the formatting, I did try):
Employer ID | ContractNo | Hours | emp Name | workNo | workLocation
--
123 | A1 | 10 | J Smith | W36 | New York
124 | A1 | 7 | P Jones | W36 | New York
125 | A2 | 9 | R Lewis | W37 | Los Angeles
123 | A2 | 9 | J Smith | W37 | Los Angeles
Each employee has a unique ID, an employee can work at more than 1 location and each location has a unique workNo. I'm just a bit stuck on where to include the ContractNo. There is no indication in the question of what it actually is for.
My first step was to split it into one table with EmployerID, employee name and hours, and a second table with WorkNo and WorkLocation. But what do I make of that bloody ContractNo?
I expect the contract is likely a separate entity, capturing the nature of the relationship between contractor and contractee.
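One way to make that concrete is sketched below with Python's built-in sqlite3. It rests on assumptions read from the four sample rows, not on anything the assignment states: each contract belongs to one work site (A1 is always W36, A2 always W37), and hours depend on the employee and the contract together (employee 123 has 10 hours under A1 but 9 under A2, so hours cannot live in an employee table). Table and column names are hypothetical.

```python
import sqlite3

# Original flat rows: (employee id, contract no, hours, name, work no, location)
FLAT = [
    (123, "A1", 10, "J Smith", "W36", "New York"),
    (124, "A1", 7, "P Jones", "W36", "New York"),
    (125, "A2", 9, "R Lewis", "W37", "Los Angeles"),
    (123, "A2", 9, "J Smith", "W37", "Los Angeles"),
]

SCHEMA = """
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT NOT NULL);
CREATE TABLE work_site (work_no TEXT PRIMARY KEY, work_location TEXT NOT NULL);
-- Assumption: each contract belongs to exactly one work site.
CREATE TABLE contract (
    contract_no TEXT PRIMARY KEY,
    work_no TEXT NOT NULL REFERENCES work_site(work_no)
);
-- Assumption: hours depend on the employee AND the contract.
CREATE TABLE contract_hours (
    emp_id INTEGER NOT NULL REFERENCES employee(emp_id),
    contract_no TEXT NOT NULL REFERENCES contract(contract_no),
    hours INTEGER NOT NULL,
    PRIMARY KEY (emp_id, contract_no)
);
"""


def load(conn, rows):
    conn.executescript(SCHEMA)
    for emp_id, contract_no, hours, name, work_no, location in rows:
        # INSERT OR IGNORE skips keys already present, assuming repeats carry the same values.
        conn.execute("INSERT OR IGNORE INTO employee VALUES (?, ?)", (emp_id, name))
        conn.execute("INSERT OR IGNORE INTO work_site VALUES (?, ?)", (work_no, location))
        conn.execute("INSERT OR IGNORE INTO contract VALUES (?, ?)", (contract_no, work_no))
        conn.execute("INSERT INTO contract_hours VALUES (?, ?, ?)", (emp_id, contract_no, hours))


def rebuild(conn):
    """Join the normalized tables back into the original flat shape."""
    return conn.execute("""
        SELECT h.emp_id, h.contract_no, h.hours, e.emp_name, c.work_no, w.work_location
        FROM contract_hours h
        JOIN employee e ON e.emp_id = h.emp_id
        JOIN contract c ON c.contract_no = h.contract_no
        JOIN work_site w ON w.work_no = c.work_no
        ORDER BY h.rowid
    """).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on
load(conn, FLAT)
print(rebuild(conn))
```

Joining the four tables reproduces the original rows exactly, which is a quick check that the decomposition is lossless for this sample. If ContractNo turned out not to determine workNo, work_no would move from contract into contract_hours instead.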
Image from QuickDBD, where I work.

Dimension Member as Calculated Measure in MDX

I need to get a dimension member returned as a calculated measure.
Given:
Dimensions
Customer {ACME, EMCA, Universal Imports, Universal Exports}
Salesperson {Bob, Fred, Mary, Joe}
Credit Type {Director, Manager}
Measures
Credited Value
Value
Relationships
The Customer is a dimension of the facts that contain Value
The Customer, Salesperson and Credit Type are dimensions of the facts that contain Credited Value
I am trying to do the following:
Create calculated measures that return the Salesperson with the largest dollar amount credited in each role for a customer, e.g.:
| Customer | Director | Manager | Value |
|-------------------|----------|---------|-------|
| ACME | Bob | Fred | 500 |
| EMCA | Bob | Fred | 540 |
| Universal Imports | Mary | Joe | 1000 |
| Universal Exports | Mary | Fred | 33 |
ACME has Bob credited with 490 as Director
ACME has Fred credited with 500 as Manager
ACME has Mary credited with 10 as Director
I would like to use these as calculated measures in any query where Customers are on the rows.
If I understand your problem correctly, something along this line should do the trick (of course you'd have to use the proper level, hierarchy and cube names):
with
member [Measures].[DirectorTemp] as topcount([Salesperson].[Salesperson].members,1,([Measures].[Credited Value],[Credit Type].[Director],[Customer].currentmember)).item(0).properties("Caption")
member [Measures].[Director] as iif([Measures].[DirectorTemp] = [Salesperson].UnknownMember.properties("caption"), null, [Measures].[DirectorTemp])
member [Measures].[ManagerTemp] as topcount([Salesperson].[Salesperson].members,1,([Measures].[Credited Value],[Credit Type].[Manager],[Customer].currentmember)).item(0).properties("Caption")
member [Measures].[Manager] as iif([Measures].[ManagerTemp] = [Salesperson].UnknownMember.properties("caption"), null, [Measures].[ManagerTemp])
select
{[Measures].[Director],[Measures].[Manager],[Measures].[Value]} on 0,
{[Customer].members} on 1
from MyCube
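To sanity-check what the TOPCOUNT-based measures should return, here is a plain Python analogue of the same logic (not MDX, and not how a cube evaluates it). It uses only the ACME credits listed in the question, since the credits for the other customers aren't given; top_salesperson and customer_row are hypothetical names, and ties are broken by whichever salesperson appears first, which may differ from MDX.

```python
from collections import defaultdict

# (customer, salesperson, credit type, credited value): the ACME credits listed in the question.
CREDITS = [
    ("ACME", "Bob", "Director", 490),
    ("ACME", "Fred", "Manager", 500),
    ("ACME", "Mary", "Director", 10),
]
VALUE = {"ACME": 500}  # the Value measure for ACME from the example table


def top_salesperson(customer, credit_type):
    """Analogue of TOPCOUNT(salespeople, 1, (Credited Value, role, customer)).item(0)."""
    totals = defaultdict(int)
    for cust, person, role, credited in CREDITS:
        if cust == customer and role == credit_type:
            totals[person] += credited
    if not totals:
        return None  # nobody credited in this role: report null, as the IIF wrapper aims to
    return max(totals, key=totals.get)  # ties go to the first salesperson seen


def customer_row(customer):
    return {
        "Customer": customer,
        "Director": top_salesperson(customer, "Director"),
        "Manager": top_salesperson(customer, "Manager"),
        "Value": VALUE.get(customer),
    }


print(customer_row("ACME"))
```

For ACME this gives Bob as Director (490 beats Mary's 10) and Fred as Manager, matching the first row of the desired output.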
