Grab string from the last cell that isn't missing in SAS - arrays

HAVE is a dataset of authors on pubs:
^note that auth50 is the last variable in the dataset.
WANT is a dataset where the first and last authors for each observation ofHAVE are their own variables:
I do the following in a data step to create lastauth...
array auths {*} auth1:auth50;
do i=1 to dim(auths);
if auths{1} NE " " then firstauth = auths{1};
if auths{i} = " " then lastauth = auths{i-1};
else lastauth = auths{i};
end;
...which I thought would iteratively write over lastauth until encountering the last authX variable, but the code does not retrieve the last non-missing value. Any thoughts on what I'm misunderstanding? Thanks!

Check the logic carefully. You want to set the value of FIRST when FIRST is missing and set the value of LAST when the current is NOT missing.
array auths auth1-auth50;
firstauth=auths(1);
do i=1 to dim(auths);
if missing(firstauth) then firstauth = auths{i};
if not missing(auths{i}) then lastauth = auths{i};
end;
Note: The extra assignment before the DO loop is to force SAS to define the new variable since otherwise the first use will be in the IF condition. If you already have a LENGTH or other statement that defines FIRSTAUTH then the extra assignment statement is not needed.
Or skip the array and just use the COALESCEC() function to find the first value. And just reverse the order of the variables to also use it to find the last value.
firstauth = coalescec(of auth1-auth50);
lastauth = coalescec(of auth50-auth1);

Related

SORTC always moves first element of array to the end

I've got a character variable which holds a delimited list of strings, like so:
data lists;
format list_val $75.;
list_val = "PDC; QRS; OLN; ABC";
run;
I need to alphabetize the elements of each list (so the desired result when applied to the above string is "ABC; OLN; PDC; QRS;").
I adapted the solution here for my purposes as follows:
data lists_sorted;
set lists;
array_size = count(list_val,";") + 1; /* Cannot be used as array length must be specified at creation */
array t(50) $ 8 _TEMPORARY_;
call missing(of t(*));
do _n_=1 to array_size;
t(_n_)=scan(list_val,_n_,";");
end;
call sortc(of t(*));
new_list_val =catx("; ", of t(*));
put "original: " list_val " new: " new_list_val;
run;
When I run this code I get the following output:
original: PDC; QRS; OLN; ABC new: ABC; OLN; QRS; PDC
Which was not expected or desired. In general, the result of the above code applied to any list is a new list which is sorted alphabetically, except that the first element of the original list becomes the last element of the new list, regardless of its alphabetical ordering.
I can't find anything in the documentation of sortc which would explain this behavior, so I'm wondering if the issue is somehow the way I've set up the temporary array (I don't have much experience with these).
Does anyone know why sortc behaves this way? Side question: is there anyway I can dynamically determine the size of the array, rather than hard-coding a value such as 50?
It is because you included the leading spaces when assigning the values to the array elements. Remove those.
t[_n_]=left(scan(list_val,_n_,";"));
If you want to know what the minimum size array you could use for your data step you would need to process the dataset twice.
proc sql ;
select max(count(list_val,";") + 1) into :max_size trimmed from have;
quit;
....
array t[&max_size] $ 8 _temporary_;
But there is probably not much harm in just using some large constant value.

Finding specific instance in a list when the list starts with a comma

I'm uploading a spreadsheet and mapping the spreadsheet column headings to those in my database. The email column is the only one that is required. In StringB below, the ,,, simply indicates that a column was skipped/ignored.
The meat of my question is this:
I have a string of text (StringA) comes from a spreadsheet that I need to find in another string of text (StringB) which matches my database (this is not the real values, just made it simple to illustrate my problem so hopefully this is clear).
StringA: YR,MNTH,ANNIVERSARIES,FIRSTNAME,LASTNAME,EMAIL,NOTES
StringB: ,YEAR,,MONTH,LastName,Email,Comments <-- this list is dynamic
MNTH and MONTH are intentionally different;
excelColumnList = 'YR,MNTH,ANNIV,FIRST NAME,LAST NAME,EMAIL,NOTES';
mappedColumnList= ',YEAR,,MONTH,,First Name,Last Name,Email,COMMENTS';
mappedColumn= 'Last Name';
local.index = ListFindNoCase(mappedColumnList, mappedColumn,',', true);
local.returnValue = "";
if ( local.index > 0 )
local.returnValue = ListGetAt(excelColumnList, local.index);
writedump(local.returnValue); // dumps "EMAIL" which is wrong
The problem I'm having is the index returned when StringB starts with a , returns the wrong index value which affects the mapping later. If StringB starts with a word, the process works perfectly. Is there a better way to to get the index when StringB starts with a ,?
I also tried using listtoarray and then arraytolist to clean it up but the index is still off and I cannot reliably just add +1 to the index to identify the correct item in the list.
On the other hand, I was considering this mappedColumnList = right(mappedColumnList,len(mappedColumnList)-1) to remove the leading , which still throws my index values off BUT I could account for that by adding 1 to the index and this appears to be reliably at first glance. Just concerned this is a sort of hack.
Any advice?
https://cfdocs.org/listfindnocase
Here is a cfgist: https://trycf.com/gist/4b087b40ae4cb4499c2b0ddf0727541b/lucee5?theme=monokai
UPDATED
I accepted the answer using EDIT #1. I also added a comment here: Finding specific instance in a list when the list starts with a comma
Identify and strip the "," off the list if it is the first character.
EDIT: Changed to a while loop to identify multiple leading ","s.
Try:
while(left(mappedColumnList,1) == ",") {
mappedColumnList = right( mappedColumnList,(len(mappedColumnList)-1) ) ;
}
https://trycf.com/gist/64287c72d5f54e1da294cc2c10b5ad86/acf2016?theme=monokai
EDIT 2: Or even better, if you don't mind dropping back into Java (and a little Regex), you can skip the loop completely. Super efficient.
mappedColumnList = mappedColumnList.replaceall("^(,*)","") ;
And then drop the while loop completely.
https://trycf.com/gist/346a005cdb72b844a83ca21eacb85035/acf2016?theme=monokai
<cfscript>
excelColumnList = 'YR,MNTH,ANNIV,FIRST NAME,LAST NAME,EMAIL,NOTES';
mappedColumnList= ',,,YEAR,MONTH,,First Name,Last Name,Email,COMMENTS';
mappedColumn= 'Last Name';
mappedColumnList = mappedColumnList.replaceall("^(,*)","") ;
local.index = ListFindNoCase(mappedColumnList, mappedColumn,',', true);
local.returnValue = ListGetAt(excelColumnList,local.index,",",true) ;
writeDump(local.returnValue);
</cfscript>
Explanation of the Regex ^(,*):
^ = Start at the beginning of the string.
() = Capture this group of characters
,* = A literal comma and all consecutive repeats.
So ^(,*) says, start at the beginning of the string and capture all consecutive commas until reaching the next non-matched character. Then the replaceall() just replaces that set of matched characters with an empty string.
EDIT 3: I fixed a typo in my original answer. I was only using one list.
writeOutput(arraytoList(listtoArray(mappedColumnList))) will get rid of your leading commas, but this is because it will drop empty elements before it becomes an array. This throws your indexing off because you have one empty element in your original mappedColumnList string. The later string functions will both read and index that empty element. So, to keep your indexes working like you see to, you'll either need to make sure that your Excel and db columns are always in the same order or you'll have to create some sort of mapping for each of the column names and then perform the ListGetAt() on the string you need to use.
By default many CF list functions ignore empty elements. A flag was added to these function so that you could disable this behavior. If you have string ,,1,2,3 by default listToArray would consider that 3 elements but listToArray(listVar, ",", true) will return 5 with first two as empty strings. ListGetAt has the same "includeEmptyValues" flag so your code should work consistently when that is set to true.

VBA nested for loop won't Run

Hi I'm working on a macro in VBA for excel. I have a nested for loop in my code as seen below. The second loop does not run; even a MsgBox() command within the second loop doesn't result in any action and the program just seems to skip over the nested loop with no reported errors.
In plain English this nested loop should:
1) Take the string from the i entry in the array categories_string (first for loop).
2) iterate through the 300+ rows in column "AE" in the excel file (second for loop, "length" is the length of the column of data in "AE")
3) look for string matches and add 1 to the corresponding entry in the categories_value array which is populated with variables set to 0 (the if statement).
For i = LBound(categories_string) To UBound(categories_string)
For p = 1 To p = length
If Worksheets("sheet1").Cells(p + 2, "AE").Value = categories_string(i) Then
categories_value(i) = categories_value(i) + 1
End If
Next
Next
Change
For p = 1 To p = length
to
For p = 1 To length
Should fix you right up.
You may want to consider Scott Cranner's comment as well, especially if your categories_string is large and length is large. Or, just do it for the practice

Subscript out of range when trying to get last item of an Array

For example, I have a string A-456-BC-123;DEF-456;GHI-789. And I need to search for second part:DEF-456 with keyword 456. The potential problem here is that the first part A-456-BC-123 also has the keyword 456. Currently my logic is that, split the string first using ;, split each of it again using -, get the last item of this Array, search keyword 456. Another thing is that I don't want to do a full keyword match like DEF-456, I only want to use 456 as keyword to locate DEF-456, in other words, 456 should be the last segment of the string I want.
Here are my codes:
FirstSplit= split("A-456-BC-123;DEF-456;GHI-789",";")
For each code in FirstSplit
SecondSplit = split(FirstSplit,"-")
'get Array Count
Count = Ubound(SecondSplit)
'get the last item in Array
If SecondSplit(Count-1) = "456" Then
'doing something
End if
Next
Currently, an error will generate at SecondSplit(Count-1), saying that "Subscript out of range: '[number: -1]'"
Could someone tell me how to fix it?
The real problem is here:
SecondSplit = Split(FirstSplit, "-")
You should be splitting your element which you've stored in variable code from your For Each loop. By trying to split an array you should be getting a Type Mismatch error, but perhaps vbscript is forgiving enough to attempt and return a 0 element array back or something. Anyway:
SecondSplit = Split(Code, "-")
Also, to look at the last element just use:
If secondSplit(Ubound(SecondSplit)) = "456" Then
Subtracting 1 from the ubound would get you the second to last element.
You don't need to split it twice - what you're looking for is Right():
FirstSplit = split("A-456-BC-123;DEF-456;GHI-789",";")
For each code in FirstSplit
If Right(code, 3) = "456" Then
' do something
End If
Next
Though this would also match an entry like ABC-1456. If the - symbol is a required delimiter, you'd have to say If Right(code, 4) = "-456".

Pulling out specific column when sometimes, the array is empty

I am trying to pull out a column from inside a cell. However, sometimes, the cell is empty.
For example, if in this line, I try to pull out the last column inside PM25_win{i}, it sometimes has an array inside of size nx28. However, sometimes, the array is zero.
for i = 1:length(years)-1
PM25 = table2array(PM25_win{i}(:,end));
end
When the array is empty, the code stops and I get the error
Subscript indices must either be real positive integers or logicals.
How can I account for both cases so that the code will simply create the PM25 variable as an empty array if PM25_win{i} is empty?
You could simply add an if-else statement in the for loop.
for i = 1:length(years)-1
if isempty(PM25_win{i}(:,end))
PM25 = [];
else
PM25 = table2array(PM25_win{i}(:,end));
end
end

Resources