easier use of loops and vectors in spss to combine variables - loops

I have a student who has gathered data in a survey online whereby each response was given a variable, rather than the variable having whatever the response was. We need a scoring algorithm which reads the statements and integrates. I can do this with IF statements per item, e.g.,
if Q1_1=1 var1=1.
if Q1_2=1 var1=2.
if Q1_3=1 var1=3.
if Q1_4=1 var1=4.
Doing this for a 200 item survey (now more like 1000) will be a drag and subject to many typos unless automated. I have no experience of vectors and loops in SPSS, but some reading suggests this is the way to approach the problem.
I would like to run if statements as something like (pseudocode):
for items=1 1 to 30
for responses=1 to 4
if Q1_2_1=1 a=1.
if Q1_2=1 a=2.
if Q1_3=1 a=3.
if Q1_4=1 a=4.
compute newitem(items)=a.
next response.
next item.
Which I would hope would produce a new variable (newitem1 to 30) which has one of the 4 responses for it's original corresponding 4 variable information.
Never written serious spss code before: please advise!

This will do the Job:
* creating some sample data.
data list free (",")/Item1_1 to Item1_4 Item2_1 to Item2_4 Item3_1 to Item3_4.
begin data
1,,,,,1,,,,,1,,
,1,,,1,,,,1,,,,
,,,1,,,1,,,,,1,
end data.
* now looping over the items and constructing the "NewItems".
do repeat Item1=Item1_1 to Item1_4
/Item2=Item2_1 to Item2_4
/Item3=Item3_1 to Item3_4
/Val=1 to 4.
if Item1=1 NewItem1=Val.
if Item2=1 NewItem2=Val.
if Item3=1 NewItem3=Val.
end repeat.
execute.
In this way you run all you loops simultaneously.
Note that "ItemX_1 to ItemX_4" will only work if these four variables are consecutive in the dataset. If they aren't, you have to name each of them separately - "ItemX_1 ItemX_2 ItemX_3 itemX_4".
Now if you have many such item sets, all named regularly as in the example, the following macro can shorten the process:
define !DoItems (ItemList=!cmdend)
!do !Item !in (!ItemList)
do repeat !Item=!concat(!Item,"_1") !concat(!Item,"_2") !concat(!Item,"_3") !concat(!Item,"_4")/Val=1 2 3 4.
if !item=1 !concat("New",!Item)=Val.
end repeat.
!doend
execute.
!enddefine.
* now you just have to call the macro and list all your Item names:
!DoItems ItemList=Item1 Item2 Item3.
The macro will work with any item name, as long as the variables are named ItemName_1, ItemName_2 etc'.

Related

Repeating syntax for a list of variables in SPSS

I have a dataset with 134 different variables, I have now one variable for where the syntax is working and this runs fine on my SPSS. However, I don't want to have to repeat this step by hand 133 times more, as it is a lot of work. I want to know the string term present in AE_term1 and create a new variable where it is classified as a number, called AE_term1num. However, I don't know how to write a syntax that automatically does this for all 134 AE_term. Could someone please help me?
The data looks like this:
AE_term1
AE_dur1
AE_term2
AE_dur2
etc
Cystitis
14
Heamaturia
3
uti
15
Sepsis
8
etc
etc
etc
etc
To identify the string variable, i have written the following syntax:
`compute AE_term1num=0.
do if (char.index(lower(AE_term1), "cystitis")).
compute AE_term1num=1.
else if (char.index(lower(AE_term1), "uti")).
compute AE_term1num = 2.
else.
compute AE_term1num=0.
end if.
execute.`
This syntax works for finding the correct values and returns what i want, however I do not know how to loop it over the remaining 133 AE_term variables, if this is even possible.
To start with, your original syntax can be streamlined like this:
compute AE_term1num=0.
if char.index(lower(AE_term1), "cystitis") AE_term1num=1.
if char.index(lower(AE_term1), "uti") AE_term1num = 2.
Now if the same conditions exactly apply for all the rest of the variables, you can use do repeat to automate the process:
do repeat AE_term = AE_term1 to AE_term134 / AE_termN = AE_termN1 to AE_termN134.
compute AE_termN=0.
if char.index(lower(AE_term), "cystitis") AE_termN=1.
if char.index(lower(AE_term), "uti") AE_termN= 2.
end repeat.
NOTE - using AE_term1 to AE_term134 - existing variables - requires that they be consecutive in the dataset. Using AE_termN1 to AE_termN134 to create new variables requires that the numbers be at the end of the name, so AE_term1num to AE_term134num wouldn't work.

Lookup project task list an assign to project with yes/no (array vlookup?)

I would like to know if any of my projects (A,B,C or D) have ever been associated with certain tasks from another table.
For instance have project A ever been associated with 'pear' in any of the 'project tasks'
I would assume this would require a array formula to look across a long list of project and check if any project 'A* is associated with pear in a 'contain' manner?
Firstly, it may be possible that your tasks list includes words within words (using your fruit analogy, egAppleand Crabapple
Second, your Project Tasks list needs to be consistant (one choice of seperator, no spaces). So either clean it up manually, or write a formula to do it. Your choice of separator must not appear in in the task list (if you had a task do this, or that then comma wouldn't work) I'll demo the formula
Add a columnJ with this formula
=","&SUBSTITUTE(SUBSTITUTE(I2,";",",")," ","")&","
Add more Substitutes if needed
Then use this formula for the results
=COUNTIFS($H:$H,$A2,$J:$J,"*," & B$1 & ",*") > 0

Talend avoid duplicate external ID with Salesforce Output

We are importing data on Salesforce through Talend and we have multiple items with the same internal id.
Such import fails with error "Duplicate external id specified" because of how upsert works in Salesforce. At the moment, we worked that around by using the commit size of the tSalesforceOutput to 1, but that works only for small amount of data or it would exhaust Salesforce API Limits.
Is there a known approach to it in Talend? For example, to ensure items that have same external ID ends up in different "commits" of tSalesforceOutput?
Here is the design for the solution I wish to propose:
tSetGlobalVar is here to initialize the variable "finish" to false.
tLoop starts a while loop with (Boolean)globalMap.get("finish") == false as an end condition.
tFileCopy is used to copy the initial file (A for example) to a new one (B).
tFileInputDelimited reads file B.
tUniqRow eliminates duplicates. Uniques records go to tLogRow you have to replace by tSalesforceOutput. Duplicates records if any go to tFileOutputDelimited called A (same name as the original file) with the option "Throw an error if the file already exist" unchecked.
OnComponent OK after tUniqRow activates the tJava which set the new value for the global finish with the following code:
if (((Integer)globalMap.get("tUniqRow_1_NB_DUPLICATES")) == 0) globalMap.put("finish", true);
Explaination with the following sample data:
line 1
line 2
line 3
line 2
line 4
line 2
line 5
line 3
On the 1st iteration, 5 uniques records are pushed into tLogRow, 3 duplicates are pushed into file A and "finish" is not changed as there is duplicates.
On the 2nd iteration, operations are repeated for 2 uniques records and 1 duplicate.
On the 3rd iteration, operations are repeated for 1 unique and as there not anymore duplicate, "finish" is set to true and the loop automatically finishes.
Here is the final result:
You can also decide to use an other global variable to set the salesforce commit level (using the syntax (Integer)globalMap.get("commitLevel")). This variable will be set to 200 by default and to 1 in the tJava if any duplicates. At the same time, set "finish" to true (without testing the number of duplicates) and you'll have a commit level to 200 for the 1st iteration and to 1 for the 2nd (and no need more than 2 iterations).
You'll decide the better choice depending on the number of potential duplicates, but you can notice that you can do it whitout any change to the job design.
I think it should solve your problem. Let me know.
Regards,
TRF
Do you mean you have the same record (the same account for example) twice or more in the input?If so, can't you try to eliminate the duplicates and keep only the record you need to push to Salesforce?Else, if each record has specific informations (so you need all the input records to have a complete one in Salesforce), consider to merge the records before to push the result into Salesforce.
And finally, if you can't do that, push the doublons in a temporary space, push the records but the doublons into Salesforce and iterate other this process until there is no more doublons.Personally, if you can't just eliminate the doublons, I prefer the 2nd approach as it's the solution to have less Salesforce API calls.
Hope this helps.
TRF

Merging partial duplicate cases without losing data

I have a question in regards to preparing my dataset for research.
I have a dataset in SPSS 20 in long format as I am researching on individual level over multiple years. However some individuals were added twice to my dataset because there were differences in some variables matched to those individuals (5000 individuals with 25 variables per individual). I would like to merge those duplicates so that I can run my analysis over time. For those variables that differ between the duplicates I would like spss to make additional variables when all the duplicates are merged.
Is this at all possible and if yes HOW?
I suggest following steps>
create auxiliary variable "PrimaryLast" with procedure Data->Identify Duplicate Cases by... , set "Define matching cases by" to your case ID
create 2 new auxiliary datasets with Data->Select Cases with condition "PrimaryLast = 0" and "PrimaryLast = 1" and selection "Copy selected cases to new dataset"
merge both auxiliary datasets with procedure Data -> Merge Files-> Add Variables, rename duplicated variable names in left box and move them in right box and select your case ID as key
don't forget to control if you made "full outer join", in case you lost non-duplicated cases and have only duplicated cases in your dataset, just merge datasets from step 2. in different order in step 3.
Try this:
sort cases by caseID otherVar.
compute ind=1.
if $casenum>1 and caseID=lag(caseID) ind=lag(ind)+1.
casestovars /id=caseID /index=ind.
If a caseID is repeated more then once, after restructure there will be only one line for that case, while all the variables will be repeated with indexes.
If the order of the caseID repeats, replace the otherVar in the sort command with the corresponding variable (e.g. date). This way your new variables will also be indexed accordingly.

Lua string library choices for finding and replacing text

I'm new to Lua programming, having come over from python to basically make a small addon for world of warcraft for a friend. I'm looking into various ways of finding a section of text from a rather large string of plain text. I need to extract the information from the text that I need and then process it in the usual way.
The string of text could be a number of anything, however the below is what we are looking to extract and process
-- GSL --
items = ["itemid":"qty" ,"itemid":"qty" ,"itemid":"qty" ,]
-- ENDGSL --
We want to strip the whole block of text from a potentially large block of text surrounding it, then remove the -- GSL -- and -- ENDGSL -- to be left with:
items = ["itemdid":"qty …
I've looked into various methods, and can't seem to get my head around any of them.
Anyone have any suggestions on the best method to tackle this problem?
EDIT: Additional problem,
Based on the accepted answer I've changed the code slightly to the following.
function GuildShoppingList:GUILDBANKFRAME_OPENED()
-- Actions to be taken when guild bank frame is opened.
if debug == "True" then self:Print("Debug mode on, guild bank frame opened") end
gslBankTab = GetCurrentGuildBankTab()
gslBankInfo = GetGuildBankText(gslBankTab)
p1 = gslBankInfo:match('%-%- GSL %-%-%s+(.*)%s+%-%- ENDGSL %-%-')
self:Print(p1)
end
The string has now changed slightly the information we are parsing is
{itemid:qty, itemid:qty, itemid:qty, itemid:qty}
Now, this is a string that's being called in p1. I need to update the s:match method to strip the { } also, and iterate over each item and its key seperated by, so I'm left with
itemid:qty
itemid:qty
itemid:qty
itemid:qty
Then I can identify each line individually and place it where it needs to go.
try
s=[[-- GSL --
items = ["itemid":"qty" ,"itemid":"qty" ,"itemid":"qty" ,]
-- ENDGSL --]]
print(s:match('%-%- GSL %-%-%s+(.*)%s+%-%- ENDGSL %-%-'))
The key probably is that - is a pattern modifier that needs quoting if you want a literal hyphen. More info on patterns in the Lua Reference Manual, chapter 5.4.1
Edit:
To the additional problem of looping through keys of what is almost an array, you could do 2 things:
Either loop over it as a string, assuming both key and quantity are integers:
p="{1:10, 2:20, 3:30}"
for id,qty in p:gmatch('(%d+):(%d+)') do
--do something with those keys:
print(id,qty)
end
Or slightly change the string, evaluate it as a Lua table:
p="{1:10, 2:20, 3:30}"
p=p:gsub('(%d+):','[%1]=') -- replace : by = and enclose keys with []
t=loadstring('return '..p)() -- at this point, the anonymous function
-- returned by loadstring get's executed
-- returning the wanted table
for k,v in pairs(t) do
print(k,v)
end
If the formats of keys or quantities is not simply integer, changing it in the patterns should be trivial.

Resources