I have a database in Stata by country and the regions these belong to. I want to create graphs with one variable (varA) by year for each region. In each graph I want the line of varA of every country belonging to that region.
I want to do it automatically because there are a lot of countries (however only five regions).
Any help to do this?
The database is like this:
region varA
country 1 year 1 1
country 1 year 2 1
country 2 year 1 2
country 2 year 2 2
country 3 year 1 2
country 3 year 2 3
I tried using foreach command to loop the creation of the graphs but I can not figure out how to put in each region graph all the varA line series of each country in the region.
You don't show your code but I don't see that you need a loop here. The essence of what you want may be a line graph using by() to specify different regions.
This is an example you can run in your Stata. Some of the details won't necessarily apply to your dataset. company corresponds to country and group to region and invest to varA.
webuse grunfeld, clear
gen group = mod(company, 2)
xtset company year
line invest year, c(L) by(group, imargin(medsmall) note("")) ysc(log) yla(1 10 100 1000, ang(h)) xtitle("")
Related
all:
I am trying to design a shared worksheet that measures salespeople performance over a period of time. In addition to calculating # of units, sales price, and profit, I am trying to calculate how many new account were sold in the month (ideally, I'd like to be able to change the timeframe so I can calculate larger time periods like quarter, year etc').
In essence, I want to find out if a customer was sold to in the 12 months before the present month, and if not, that I will see the customer number and the salesperson who sold them.
So far, I was able to calculate that by adding three columns that each calculate a part of the process (see screenshot below):
Column H (SoldLastYear) - Shows customers that were sold in the year before this current month: =IF(AND(B2>=(TODAY()-365),B2<(TODAY()-DAY(TODAY())+1)),D2,"")
Column I (SoldNow) - Shows the customers that were sold this month, and if they are NOT found in column H, show "New Cust": =IFNA(IF(B2>TODAY()-DAY(TODAY()),VLOOKUP(D2,H:H,1,FALSE),""),"New Cust")
Column J (NewCust) - If Column I shows "New Cust", show me the customer number: =IF(I2="New Cust",D2,"")
Column K (SalesName) - if Column I shows "New Cust", show me the salesperson name: =IF(I2="New Cust",C2,"")
Does anyone have an idea how I can make this more efficient? Could an array formula work here or will it be stuck in a loop since its referring to other lines in the same column?
Any help would be appreciated!!
EDIT: Here is what Im trying to achieve:
Instead of:
having Column H showing me what was sold in the 12 months before the 1st day of the current month (for today's date: 8/1/19-7/31/20);
Having Column I showing me what was sold in August 2020; and
Column I searching column H to see if that customer was sold in the timeframe specified in Column H
I want to have one column that does all three: One column that flags all sales made for the last 12 months from the beginning of the current month (so, 8/1/19 to 8/27/20), then compares sales made in current month (august) with the sales made before it, and lets me know the first time a customer shows up in current month IF it doesn't appear in the 12 months prior --> finds the new customers after a dormant period of 12 months.
Im really just trying to find a way to make the formula better and less-resource consuming. With a large dataset, the three columns (copied a few times for different timeframes) really slow down Excel...
Here is an example of the end result:
Example of final product
I have a pivot table with week number (columns) versus product (rows). I want to see the top 10 products, based on sales, per week, a value of 1 would indicate the product is in the top 10. I'm using the expression below and I'm getting the Top 10 sales over all weeks. How do I get a weekly top 10 (each column would have ten values)?
=if(Aggr(Rank(sum([SALES])),PRODUCT)< 11,1,0)
I would say you are almost there, but you need to add another parameter to the Aggr function (like you've said you want the rank by Product and Week, and currently you are only taking Product into account).
Without knowing the details of your model, the expression would be something like :
=if(Aggr(Rank(sum([SALES])),PRODUCT,WEEK)< 11,1,0)
I'm quite new to BI designing DB, and here some point I do not understand well.
I'm trying to import french census data, where I got population for each city. For each city, I have population with different age classification, that can't really relate with each other.
For instance, let's say that one classification is 00 to 20 years old, 21 to 59, and 60+
And the other is way more precise : 00 to 02, 03 to 05, etc. but the bounds are never the same as the first one classification : I don't have 15 to 20, but 18 to 22, for example.
So those 2 classifications are incompatible. How can I use them in my fact table ? Should I use 2 fact tables and 2 cubes ? Should I use one fact table, and 2 dimensions for 1 cube ? But in this case, I will have double counted facts when I'll sum to have total population for a city, won't I ?
This is national census data, and national classifications, so changing that or estimating population to mix those classifications is not an option. And to be clear, one row doesn't relate to one person, but to one city. My facts are not individuals but cities' populations.
So this table is like :
Line 1 : One city - one amount of population - one code for dim age (ex. 00 to 19 yo) of this population - code (m/f) for the dim gender of that population - date of the census
Line 2 : Same city - one amount of population - one code for dim age (ex. 20 to 34) of this population - code (m/f) for the dim gender - date of the census
And so it goes for a lot of cities, both gender, and multiple years.
Same
I hope this question is clear enough, as english is not my native language and as I'm quite new in DB and BI !
Thanks for helping me with that.
One possible solution using a single fact table and two dimensions for the age ranges:
1 - Categorical range based on the broadest census, for example:
Young 0-20
Adult 21-59
Senior 60+
You could then link the other census to this dimension with approximate values, for example 18-22 could be Young.
2 -Original age range. This dimension could be used for precise age ranges when you report on a single city, it can also help you evaluate the impact of the overlapping bounds (e.g. how many rows are in the young / 18-22 range?)
you can crate one dimention as below
young 1-20
adult 21-59
senior 60+
Classification is
young city 1 : 1-20
young city 2 : 4-23
id field1 field2 field3 field4 .......
1 1 year young_city_1 other .......
2 2 year young_city_1 other .......
3 3 year young_city_1 other .......
4 4 year young_city_1 young_city_2 .......
Now you can report from any item and with any division
i hope it is help you
Background
I have a database that hold records of all assets in an office. Each asset have a condition, a category name and an age.
A ConditionID can be;
In use
Spare
In Circulation
CategoryID are;
Phone
PC
Laptop
and Age is just a field called AquiredDate which holds records like;
2009-04-24 15:07:51.257
Example
I've created an example of the inputs of the query to explain better what I need if possible.
NB.
Inputs are in Orange in the above example.
I've split the example into two separate queries.
Count would be the output
Question
Is this type of query and result set possible using SQL alone? And if so where do I start? Would it be easier to use Ms Excel also?
Yes it is possible, for your orange fields you can just e.g.
where CategoryID ='Phone' and ConditionID in ('In use', 'In Circulation')
For the yellow one you could do a datediff of days of accuired date to now and divide it by 365 and floor that value, to get the last one (6+ years category) you need to take the minimum of 5 and the calculated value so you get 0 for all between 0-1 year old etc. until 5 which has everything above 6 years.
When you group by that calculated column and select the additional the count you get what you desire.
I've got a mental block about what I'm sure is a common scenario:
I have some data in a csv file that I need to do some very basic reporting from.
The data is essentially a table with Resources as column headings and People as row headings, the rest of the table consists of Y/N flag, "Y" if the person has access to the resource, "N" if they don't. Both the resources and the people have unique names.
Sample data:
Res1 Res2 Res3
Bob Y Y N
Tom N N N
Jim Y N Y
The table is too large to simply view it as whole in Excel(say 300 resources and 600 people), so I need a way to easily query and display (A simple list would be ok) what resources a person has access to, given the person's name.
The person that will need to use this has MS Office, and not much else on their PC.
So, the question is: What is the best way to manipulate this data to get the report I need? My gut says MS Access would be the best, but I can't figure out to automatically import data like this into a normal relational database. If not Access, perhaps there are some functions in Excel that could help me out?
You should normalize your data. This will make it easier to query against. For example:
table users:
UserID UserName
1 Bob
2 Tim
3 Jim
table resources:
ResourceID ResourceDesc
1 Printer #1
2 Fax Machine
3 Bowling Ball Wax
table users_resources:
LinkID UserID ResourceID
1 1 1
2 1 2
3 3 1
4 3 3
SELECT ResourceID
FROM users_resources, users
WHERE users.UserName="Bob"