I have a list of names, each of which is either a company name or a person's name, and I have to decide which it is. I have looked at each row and used a CASE expression to decide: if it's a company, a 1 is placed; if it's a person, a 0 is placed. I also have to parse the individual names into first-name and last-name columns. There are 20 lines for each of the WHEN statements, and it seems that Pervasive can't handle all of them. Am I doing this wrong? Is there a simpler way?
Example
Own_name
Dollar Tree LLC
Jacob Smith
New Life Co
Johnson, Robert
Here's a sample of what the SQL looks like:
select
CASE
WHEN own_name LIKE '%co%'
OR own_name LIKE '%LLC%'
THEN 1
END AS is_compan, // if the own_name has Co or LLC, place a 1 in a column
CASE
WHEN own_name NOT LIKE '%co%'
AND own_name NOT LIKE '%LLC%'
THEN 0
END AS is_individual, // if the own_name has neither Co nor LLC, place a 0 in a column
CASE
WHEN locate(',', own_name)>0
AND own_name NOT LIKE '%co%'
AND own_name NOT LIKE '%LLC%'
THEN ltrim(left(own_name, locate(',', own_name)-1))
WHEN locate(' ', own_name)>0
AND own_name NOT LIKE '%co%'
AND own_name NOT LIKE '%LLC%'
THEN SUBSTRING(own_name, locate(' ', own_name) + 1, 8000) // if the own_name has a comma or a space and has neither Co nor LLC, parse out the last name
END AS LAST_NAME,
CASE
WHEN locate(',', own_name)>0
AND own_name NOT LIKE '%co%'
AND own_name NOT LIKE '%LLC%'
THEN ltrim(SUBSTRING(own_name, locate(',', own_name) + 1, 8000)) // "Last, First": the first name follows the comma
WHEN locate(',', own_name)=0
AND own_name NOT LIKE '%co%'
AND own_name NOT LIKE '%LLC%'
THEN Ltrim(SubString(own_name,1,Isnull(Nullif(locate(' ',own_name),0),1000))) // "First Last": the first name precedes the first space
END AS First_name
The result
Own_name         is_compan  is_individual  Last_name  First_name
Dollar Tree LLC  1
Jacob Smith                 0              Smith      Jacob
New Life Co      1
Johnson, Robert             0              Johnson    Robert
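One thing worth noting about the rule above: a substring pattern such as '%co%' also matches the "co" inside "Jacob" when the comparison is case-insensitive. A minimal sketch of the same classification done on whole words, written in Python purely for illustration, might look like the following. The marker list and the function name `classify` are assumptions, not part of the original query.

```python
import re

# Hypothetical list of whole-word company markers; extend as needed.
COMPANY_MARKERS = {"co", "llc", "corp"}

def classify(own_name):
    """Return (is_company, last_name, first_name) for one owner name."""
    # Compare whole words only, so "Jacob" is not mistaken for "Co".
    words = re.findall(r"[A-Za-z]+", own_name.lower())
    if any(w in COMPANY_MARKERS for w in words):
        return 1, None, None
    if "," in own_name:
        # "Last, First" layout
        last, _, first = own_name.partition(",")
    else:
        # "First Last" layout
        first, _, last = own_name.partition(" ")
    return 0, last.strip(), first.strip()

rows = ["Dollar Tree LLC", "Jacob Smith", "New Life Co", "Johnson, Robert"]
results = [classify(r) for r in rows]
for name, result in zip(rows, results):
    print(name, result)
```

The same idea could be carried back into SQL by matching ' Co' or ' LLC' at the end of the name rather than anywhere in it.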
I have this Google Spreadsheet.
-The product name column is already in a good format for some rows: full words, delimited by "+".
-The second column indicates whether the product name contains the word "Sugar" or part of it.
-The third column holds some of the products reformatted into the good format (full names and "+").
Product name        Sugar_word_in_product_name  Product reformated
Choco + sug,oil     1                           Choco + Sugar + oil
Tablets + Sofa      0
Television + table  0
sugar,oil,ingred    1                           Sugar + oil + ingredients
I want to return the data in this format, with columns Ingredients1, Ingredients2, and so on:
Product name        Sugar_word_in_product_name  Product reformated         Ingredients1  Ingredients2  Ingredients3
Choco + sug,oil     1                           Choco + Sugar + oil        Choco         Sugar         oil
Tablets + Sofa      0                                                      Tablets       Sofa
Television + table  0                                                      Television    Table
sugar,oil,ingred    1                           Sugar + oil + ingredients  Sugar         Oil           Ingredients
0
So basically, if the value in column "Sugar_word_in_product_name" is "1", I want to split "Product reformated" into columns.
If it is "0", I want to split "Product name" into columns.
This works for one condition: =IF(G2=0,SPLIT(F2,"+"))
This is the formula I started with, but I'm not sure how to make it work or how to return blank cells when there are no values:
=IF(G2=0,SPLIT(F2,"+",IF(G2=1,SPLIT(H2,"+"))))
try:
=ARRAYFORMULA(IFERROR(IF(B2:B=1; SPLIT(C2:C; "+"); SPLIT(A2:A; "+"))))
I've tried several of the solutions to my question on the site but could not find one that worked. Please help!
Apart from some liberties taken with the report_names, the data is representative of what I am trying to accomplish and is only a small portion of what I am up against: roughly 97K rows of data with the same kind of repetition of branch, file_count and report_name. The file numbers are unique and insignificant. They are included for informational purposes and explain why the amounts are unique: each amount is tied to the file_name.
I am looking for one report_name with the sum of the two amounts.
Here are the current results to my query:
branch file_count file_volume net_profit report_name file_number
Northeast 1 $200,000.00 $200,000.00 bogart.hump.new 12345
Northeast 1 $195,000.00 $197,837.00 bogart.hump.new 23456
Northeast 1 $111,500.00 $113,172.00 bogart.hump.new 34567
Northwest 1 $66,000.00 -$1,500.18 jolie.angela.new 45678
Northwest 1 $159,856.00 -$2,745.58 jolie.angela.new 56789
Northwest 1 $140,998.00 -$2,421.69 jolie.angela.new 67890
Southwest 1 $74,000.00 $73,904.00 Man.bat.net 78901
Southwest 1 $186,245.00 -$4,231.25 Man.bat.net 89012
Southwest 1 $72,375.00 $73,641.00 Man.bat.net 90123
Southeast 1 $79,575.00 -$1,821.76 zep.led.new 1234A
Southeast 1 $268,600.00 $268,600.00 zep.led.new 2345A
Southeast 1 $77,103.00 -$1,751.68 zep.led.new 3456A
This is what I am looking for:
branch file_count file_volume net_profit report_name file_number
Northeast 3 $506,500.00 $511,009.00 bogart.hump.new
Northwest 3 $366,854.00 -$6,667.45 jolie.angela.new
Southwest 3 $332,620.00 $143,313.75 Man.bat.net
Southeast 3 $425,278.00 $265,026.56 zep.led.new
My query:
SELECT
branch,
count(filenumber) AS file_count,
sum(fileAmount) AS file_amount,
sum(netprofit*-1) AS net_profit,
concat(d2.lastname,'.',d2.firstname,'.','new') AS report_name,
FROM user.summary u
inner join user.db1 d1 ON d1.loaname = u.loaname
inner join user.db2 d2 ON d2.cn = u.loaname
WHERE d2.filedate = '2015-09-01'
AND filenumber is not null
GROUP BY branch,concat(d2.lastname,'.',d2.firstname,'.','new')
The only issue I see with your current query is the comma at the end of this line, which would give you a syntax error:
concat(d2.lastname,'.',d2.firstname,'.','new') AS report_name,
If you want the blank field file_number as shown in your desired result set though, you could leave the comma and follow it with the blank field by adding to it:
concat(d2.lastname,'.',d2.firstname,'.','new') AS report_name,
'' file_number
I figured it out, but could not have done it without airing it out in this forum. In my actual query I included the "file_name" column, so I had both the "count(file_name)" and "file_name" columns, but in my query example I only had the "count(file_name)" column. When I removed the "file_name" column from my actual query, it worked. Side note: it was obvious that I had left a key component out of my example. In any future query questions I will include the complete query but substitute the actual column names with col1, col2, db1, db2, etc. Thanks very much for responding to my question.
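The effect described above can be reproduced in a small sketch. It uses Python's built-in sqlite3 module and a single hypothetical table named summary (the real query joins three tables), and it assumes the unique file number was part of the grouping: with it, every file forms its own group; without it, the rows collapse to one per report_name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the joined tables in the question.
conn.execute(
    "CREATE TABLE summary "
    "(branch TEXT, report_name TEXT, file_number TEXT, file_amount REAL)"
)
conn.executemany(
    "INSERT INTO summary VALUES (?, ?, ?, ?)",
    [
        ("Northeast", "bogart.hump.new", "12345", 200000.0),
        ("Northeast", "bogart.hump.new", "23456", 195000.0),
        ("Northeast", "bogart.hump.new", "34567", 111500.0),
        ("Northwest", "jolie.angela.new", "45678", 66000.0),
    ],
)

# Grouping by the unique file_number as well: one output row per file.
per_file = conn.execute(
    "SELECT branch, report_name, file_number, "
    "COUNT(file_number), SUM(file_amount) "
    "FROM summary GROUP BY branch, report_name, file_number"
).fetchall()

# Grouping only by branch and report_name: one output row per report.
per_report = conn.execute(
    "SELECT branch, report_name, COUNT(file_number), SUM(file_amount) "
    "FROM summary GROUP BY branch, report_name ORDER BY branch"
).fetchall()

print(len(per_file), "rows when file_number is part of the grouping")
print(per_report)
```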
I have a large dataset and want to filter it, for each month, down to the rows whose date is closest to the last day of that month. There can be multiple entries for that closest day.
So for instance:
original Dataset
date price name
05-01-1995 1,2 abc
06-01-1995 1,5 def
07-01-1995 1,8 ghi
07-01-1995 1,7 mmm
04-02-1995 1,9 jkl
27-02-1995 2,1 mno
goal:
date price name
07-01-1995 1,8 ghi
07-01-1995 1,7 mmm
27-02-1995 2,1 mno
I had two ideas, but I am failing to implement them within a loop (for traversing the months) in SAS.
Idea 1: create a new column which indicates the last day of the current month (intnx() function), then filter for all entries that are closest to the last day of their month:
date price name last_day_of_month
05-01-1995 1,2 abc 31-01-1995
06-01-1995 1,5 def 31-01-1995
07-01-1995 1,8 ghi 31-01-1995
04-02-1995 1,9 jkl 28-02-1995
27-02-1995 2,1 mno 28-02-1995
Idea 2: simply filter, for each month, the entries with the highest date (maybe using a max function?).
I would be very glad if you could help me, as I am used to ordinary programming languages and have just started with SAS for research purposes.
proc sql is one way to solve this kind of problem. I'll break down your original requirements and explain how to express them in SQL.
Since you want to group your observations by month, you can use the having clause to keep only the rows whose date equals the maximum date of their month.
data work.have;
input date DDMMYY10. price name $;
format date date9.;
datalines;
05-01-1995 1.2 abc
07-01-1995 1.8 ghi
06-01-1995 1.5 def
07-01-1995 1.7 mmm
04-02-1995 1.9 jkl
27-02-1995 2.1 mno
;
data work.want;
input date DDMMYY10. price name $;
format date date9.;
datalines;
07-01-1995 1.8 ghi
07-01-1995 1.7 mmm
27-02-1995 2.1 mno
;
proc sql ;
create table work.solution as
select *
/*, max(date) as max_date format=date9.*/
/*, intnx('month',date,0,'end') as monthend format=date9.*/
from work.have
group by intnx('month',date,0,'end')
having max(date) = date
order by date, name
;
If you uncomment the comments, the actual filters used are shown in the output table.
Comparing the requirements against the solution:
proc compare base=work.want compare=work.solution;
run;
results in
NOTE: No unequal values were found. All values compared are exactly equal.
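SAS proc sql remerges the group maximum back onto every row, which is why `having max(date) = date` works there. In SQL engines without that behaviour, the same filter is usually written as an explicit join against the per-month maximum. A minimal sketch using Python's built-in sqlite3 module, assuming the dates are stored as ISO text (yyyy-mm-dd) so that MAX compares them correctly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (date TEXT, price REAL, name TEXT)")
conn.executemany(
    "INSERT INTO have VALUES (?, ?, ?)",
    [
        ("1995-01-05", 1.2, "abc"),
        ("1995-01-06", 1.5, "def"),
        ("1995-01-07", 1.8, "ghi"),
        ("1995-01-07", 1.7, "mmm"),
        ("1995-02-04", 1.9, "jkl"),
        ("1995-02-27", 2.1, "mno"),
    ],
)

# Join each row to the latest date of its month and keep only the matches.
want = conn.execute(
    """
    SELECT h.date, h.price, h.name
    FROM have AS h
    JOIN (SELECT strftime('%Y-%m', date) AS ym, MAX(date) AS max_date
          FROM have
          GROUP BY strftime('%Y-%m', date)) AS m
      ON strftime('%Y-%m', h.date) = m.ym
     AND h.date = m.max_date
    ORDER BY h.date, h.name
    """
).fetchall()

for row in want:
    print(row)
```

Rows that share the latest date of a month are all kept, which matches the requirement of multiple entries for the closest day.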
1) create a new variable periode = put(date,yymmn6.) /* gives you yyyymm*/
2) sort the table on periode and date
3) now last.periode logic will select the last record per periode (only one row is kept, even when several rows share the latest date).
Something like...
data tab2;
set your_table;
periode = put(date,yymmn6.);
run;
proc sort data= tab2;
by periode date;
run;
data tab3;
set tab2;
by periode;
if last.periode then output;
run;
You can use two SAS functions called intnx and intck to do this with proc sql:
proc sql ;
create table want as
select *, put(date,yymmn6.) as month, intck('days',date,intnx('month',date,0,'end')) as DaysToEnd
from have
group by month
having (DaysToEnd=min(DaysToEnd))
;quit ;
intnx() adjusts dates by intervals. In the case above, the four parameters used are:
- the interval, i.e. the size of the 'step' to add or subtract
- the date being referenced
- how many interval steps to make
- how to align the result (e.g. to the start, middle or end of the resulting interval)
intck() simply counts the interval steps between two dates.
This will give you all records that fall on the day closest to the end of the month.
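For readers more used to general-purpose languages, the DaysToEnd logic can be sketched in Python with the standard library. This is only an illustration of the idea, not a translation of the SAS functions; the helper name `days_to_month_end` is an assumption.

```python
import calendar
from datetime import date

def days_to_month_end(d):
    """Number of days between d and the last day of its month."""
    last_day = calendar.monthrange(d.year, d.month)[1]
    return last_day - d.day

rows = [
    (date(1995, 1, 5), 1.2, "abc"),
    (date(1995, 1, 6), 1.5, "def"),
    (date(1995, 1, 7), 1.8, "ghi"),
    (date(1995, 1, 7), 1.7, "mmm"),
    (date(1995, 2, 4), 1.9, "jkl"),
    (date(1995, 2, 27), 2.1, "mno"),
]

# Smallest distance to month end seen in each (year, month) group.
closest = {}
for d, _, _ in rows:
    key = (d.year, d.month)
    closest[key] = min(closest.get(key, 31), days_to_month_end(d))

# Keep every row that sits at that smallest distance, ties included.
want = [
    r for r in rows
    if days_to_month_end(r[0]) == closest[(r[0].year, r[0].month)]
]

for row in want:
    print(row)
```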
Another approach is to use proc rank:
data mid;
retain yrmth date;
set have;
format date yymmddn8.;
yrmth = put(date,yymmn6.);
run;
proc sort data = mid;
by yrmth descending date;
run;
proc rank data = mid out = want descending ties=low;
by yrmth;
var date;
ranks rankdt;
run;
data want1;
set want;
where rankdt = 1;
run;
HTH
I have a record set that contains course attendance data in rows. I want to display it in columns based on the last letter of the Course_Code, and haven't been able to find a method for this.
The Course_Code field contains the city followed by a sequence letter denoting the order in which the modules are to be taken. A must be first, followed by B, then C, etc.
The data looks like this:
Course_Code Student_ID
MadridA 123
ParisB 123
NewYorkC 123
HamburgD 123
HamburgA 456
ParisB 456
HamburgC 456
HamburgD 456
HamburgA 789
ParisB 789
HamburgC 789
MadridD 789
What I need the result to look like is:
Student_ID CourseA CourseB CourseC CourseD
123 MadridA ParisB NewYorkC HamburgD
456 HamburgA ParisB HamburgC HamburgD
789 HamburgA ParisB HamburgC MadridD
I've been looking into PIVOT as a likely solution but can't find any example that doesn't involve SUM or AVG on data values. I don't need to change the data just move to the appropriate column.
Is PIVOT going to do what I need or am I in the wrong creek with a broken paddle on that?
You can use the PIVOT function to get the result, but you will need to use either the max or min aggregate function since your data is a string.
You should be able to use the following:
select student_id,
CourseA, CourseB,
CourseC, CourseD
from
(
select course_code, student_id,
-- append the course letter A, etc to Course to get the new column names
col = 'Course'+right(course_code, 1)
from yourtable
) d
pivot
(
max(course_code)
for col in (CourseA, CourseB, CourseC, CourseD)
) piv;
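Where PIVOT is not available, the same "max over a string" idea can be expressed with conditional aggregation. The sketch below uses Python's built-in sqlite3 module rather than SQL Server, so the syntax differs from the query above; the table name yourtable is carried over from it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (course_code TEXT, student_id INTEGER)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    [
        ("MadridA", 123), ("ParisB", 123), ("NewYorkC", 123), ("HamburgD", 123),
        ("HamburgA", 456), ("ParisB", 456), ("HamburgC", 456), ("HamburgD", 456),
        ("HamburgA", 789), ("ParisB", 789), ("HamburgC", 789), ("MadridD", 789),
    ],
)

# MAX ignores NULLs, so each CASE picks the single course whose code
# ends in the wanted letter; nothing is summed or averaged.
pivoted = conn.execute(
    """
    SELECT student_id,
           MAX(CASE WHEN substr(course_code, -1) = 'A' THEN course_code END) AS CourseA,
           MAX(CASE WHEN substr(course_code, -1) = 'B' THEN course_code END) AS CourseB,
           MAX(CASE WHEN substr(course_code, -1) = 'C' THEN course_code END) AS CourseC,
           MAX(CASE WHEN substr(course_code, -1) = 'D' THEN course_code END) AS CourseD
    FROM yourtable
    GROUP BY student_id
    ORDER BY student_id
    """
).fetchall()

for row in pivoted:
    print(row)
```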
I typed the word "boat", so I need the records that start with "boat" as well as those that contain "boat", but the ones starting with "boat" must appear first.
I tried the following:
Select AsiccCodeId,AsiccDescription
FROM AsiccCodeMaster c
WHERE c.AsiccDescription like 'boat%' or c.AsiccDescription like '%boat%'
and
select a.* from
(
Select AsiccCodeId,AsiccDescription
FROM AsiccCodeMaster c
WHERE c.IsActive = 1 and (GoodFor = 'M' or GoodFor = 'B')
and c.AsiccDescription like 'Unmilled%'
UNION
Select AsiccCodeId,AsiccDescription
FROM AsiccCodeMaster c
WHERE c.IsActive = 1 and (GoodFor = 'M' or GoodFor = 'B')
and c.AsiccDescription like '%Unmilled%'
)a
but it gives me
4137 Combustion Boats
6360 Boat, Fibre
6361 Boat, Rubber - Motorized
6362 Boat, Wooden Canal Boats
6363 Boat, Wooden With Engine
6370 Wooden Boats Body Building
6374 Boat, Rowing / Sports
6375 Boat, Rubber - Nonmotorized
6376 Boat, Wooden Without Engine-Others
6379 Parts Of Ships, Boats Etc., N.E.C
6391 Ships, Boats & Other Vessels, N.E.C
6394 Ships, Boats & Other Vessels, N.E.C
I need the records that start with "boat" first, and then the records that contain "boat".
Use a CASE expression in an ORDER BY clause:
SELECT AsiccCodeId,AsiccDescription
FROM AsiccCodeMaster c
WHERE c.AsiccDescription like '%boat%'
ORDER BY CASE WHEN c.AsiccDescription like 'boat%' THEN 0 ELSE 1 END, c.AsiccDescription
Since you want titles that start with 'boat' to appear first, the CASE statement will prioritize those first. It looks at each record and, if the description starts with 'boat', assigns it a sort value of 0, otherwise it assigns it a sort value of 1. ORDER BY sorts ascending by default, so it will put all the 0s (the ones that start with 'boat') before all the 1s (the remaining records)
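A quick way to check the ordering is to run the same query against a few of the rows from the question. The sketch below uses Python's built-in sqlite3 module, whose LIKE is case-insensitive for ASCII by default; the table is reduced to the two columns that matter here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AsiccCodeMaster (AsiccCodeId INTEGER, AsiccDescription TEXT)"
)
conn.executemany(
    "INSERT INTO AsiccCodeMaster VALUES (?, ?)",
    [
        (4137, "Combustion Boats"),
        (6360, "Boat, Fibre"),
        (6370, "Wooden Boats Body Building"),
        (6374, "Boat, Rowing / Sports"),
    ],
)

# Rows starting with 'boat' get sort key 0, all other matches get 1.
ordered = conn.execute(
    """
    SELECT AsiccCodeId, AsiccDescription
    FROM AsiccCodeMaster c
    WHERE c.AsiccDescription LIKE '%boat%'
    ORDER BY CASE WHEN c.AsiccDescription LIKE 'boat%' THEN 0 ELSE 1 END,
             c.AsiccDescription
    """
).fetchall()

for row in ordered:
    print(row)
```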
You need to give your results some ordering!
SELECT a.*
FROM
(
SELECT
AsiccCodeId, AsiccDescription, Sequence = 1
FROM AsiccCodeMaster c
WHERE c.IsActive = 1 AND (GoodFor = 'M' or GoodFor = 'B')
AND c.AsiccDescription LIKE 'Boat%'
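UNION ALL
SELECT
AsiccCodeId, AsiccDescription, Sequence = 2
FROM AsiccCodeMaster c
WHERE c.IsActive = 1 AND (GoodFor = 'M' or GoodFor = 'B')
AND c.AsiccDescription LIKE '%Boat%'
AND c.AsiccDescription NOT LIKE 'Boat%' -- skip rows already returned by the first branch
) a
ORDER BY a.Sequence, a.AsiccDescription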