I have a dataset of patient data recorded by the site at which each patient visited our mobile clinic. I have written a series of commands, such as FREQUENCIES and CROSSTABS, to produce the analyses I need; however, I would like these to be run for the patients at each site rather than for the dataset as a whole.
If I had only one site, a simple FILTER command on the variable that specifies a patient's site would suffice, but I have 19 sites, so I would like to find a way to loop through my code to produce these outputs for each site. That is to say, for i in 1 to 19:
1. Take the i th site
2. Compute a filter for this i th site
3. Run the tables using this filtered data of patients at ith site
Here is my first attempt using DO REPEAT. I also tried using LOOP earlier.
However, it does not work: I keep getting an error even though these are closed loops.
Is there a way to do this in SPSS syntax? Bear in mind that I do not know Python well enough to do this using that plugin.
*LOOP #ind= 1 TO 19 BY 1.
DO REPEAT #ind= 1 TO 20.
****8888888888888888888888888888888888888888888888888888888 Select the Site here.
COMPUTE filter_site=(RCDSITE=#ind).
USE ALL.
FILTER BY filter_site.
**********************Step 3: Apply the necessary code for tables
*********Participation in the wellness screening, we actually do not care about those who did FP as we are not reporting it.
COUNT BIO= CheckB (1).
* COUNT FPS=CheckF(1).
* COUNT BnF= CheckB CheckF(1).
VAL LABEL BIO
1 ' Has the Wellness screening'
0 'Does not have the wellness screening'.
*VAL LABEL FPS
1 'Has the First patient survey'.
* VAL LABEL BnF
1 'Has either Wellness or FPS'
2 'Has both surveys done'.
FREQ BIO.
*************************Use simple math to calcuate those who only did the Wellness/First Patient survey FUB= F+B -FnB.
*******************************************************Executive Summary.
***********Blood Pressure.
FREQ BP.
*******************BMI.
FREQ BMI.
******************Waist Circumference.
FREQ OBESITY.
******************Glucose.
FREQ GLUCOSE.
*******************Cholesterol.
FREQ TC.
************************ Heamoglobin.
FREQ HAEMOGLOBIN.
*********************HIV.
FREQ HIV.
******************************************************************************I Lifestyle and General Health.
MISSING VALUES Gender GroupDep B8 to B13 ('').
******************Graphs 3.1
Is it just frequencies you are producing? Try the SPLIT FILE command with the variable RCDSITE (the data must be sorted by RCDSITE first). That should be enough.
SPLIT FILE allows you to partition your data by up to eight variables. Each procedure will then automatically iterate over each group.
If you need to group the results at a higher level than the procedure, that is, to run a bunch of procedures for each group before moving on to the next one so that all the output for a group will be together, you can use the SPSSINC SPLIT DATASET and SPSSINC PROCESS FILES extension commands to do this.
These commands require the Python Essentials. Both the Essentials and the commands can be downloaded from the SPSS Community website (www.ibm.com/developerworks/spssdevcentral) if you have at least version 18.
HTH,
Jon Peck
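To make the difference in output order concrete, here is a minimal sketch in plain Python (not SPSS syntax) of the "all procedures for one group, then the next group" pattern described above. The records, the variable names and the `report` list are made up for illustration; only the idea of partitioning by site comes from the thread.

```python
from collections import Counter, defaultdict

# Hypothetical patient records; "site" plays the role of RCDSITE.
records = [
    {"site": 1, "BP": "high", "BMI": "normal"},
    {"site": 2, "BP": "normal", "BMI": "obese"},
    {"site": 1, "BP": "normal", "BMI": "normal"},
]
variables = ["BP", "BMI"]

# Partition the rows by site, the way a split on RCDSITE would.
groups = defaultdict(list)
for row in records:
    groups[row["site"]].append(row)

# Outer loop over sites, inner loop over "procedures", so that all
# tables for one site are produced together before the next site.
report = []
for site in sorted(groups):
    for var in variables:
        freq = Counter(row[var] for row in groups[site])
        report.append((site, var, dict(freq)))

for entry in report:
    print(entry)
```

With SPLIT FILE alone the loops are effectively the other way round: each procedure runs once and steps through every site inside its own output.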
A simple but perhaps not very elegant way is to select from the menu: Data/Select Cases/If condition, there you enter the filter for site 1 and press Paste, not OK.
This will give the used filter as syntax code.
So with some copy/paste/replace/repeat you can get the freqs and all other results based on the different sites.
Summary
I have a database of ~1500 entries. Some of them have more than one entry (up to 5) under the same identifier. I have a pre-set list of IDs I would like to check to see if they have multiple records, and if so, pull all of them, so I can compare them for differences.
Current solution
I currently have this working correctly, using a method to pull the 1st, 2nd, 3rd, 4th and 5th response respectively. The current formula is:
=INDEX('DATABASE'!$B:$B,SMALL(IF($A2='DATABASE'!$A:$A,ROW('DATABASE'!$A:$A)-ROW('DATABASE'!$A$1)+1),1))
for the 1st result,
=INDEX('DATABASE'!$B:$B,SMALL(IF($A2='DATABASE'!$A:$A,ROW('DATABASE'!$A:$A)-ROW('DATABASE'!$A$1)+1),2))
for the second, and so on.
In this case, the above code would check the value in A2 (ID2Check), and then do a match for it on the 'DATABASE' sheet in column A. If it matches, it will return the first value that matches from column B. The second piece of code above does the same, except it returns the second match from column B. ID2Check is populated by me, and the Results 1-5 are the code output. For example:
Database:
ID      Data Item
AAA     Lemons
111     Greenhouse
FOO     Computer
AAA     Monitor
CAT     Coffee
ORANGE  Pintglass
123     Birthday
FOO     Avengers
AAA     Plasters
FOO     NachoTaco
Code output:
ID2Check  Result 1  Result 2  Result 3   Result 4  Result 5
AAA       Lemons    Monitor
123       Birthday
FOO       Computer  Avengers  NachoTaco
Problem
This solution works correctly and pulls the expected values as required. The problem is that this formula needs to be entered 5 times per identifier, and with ~1500 records to check I need to flash fill 7500 formulas. Unsurprisingly, this leads to Excel crashing or becoming unresponsive for the best part of 30 minutes. I am looking for a solution that would be more efficient and stable when run en masse. Any assistance would be appreciated.
Final Notes / Info
I'm using MSO 365, version 2202 (I cannot update beyond this). This will be run in the Desktop version of Excel. I would prefer this is done exclusively using formulas, but I am open to using Visual Basic if it would be otherwise impossible or incredibly inefficient.
=UNIQUE(C5:C15)
=TRANSPOSE(FILTER($D$5:$D$15;$C$5:$C$15=F5))
If you have access to the HSTACK() function, you could try:
=HSTACK(UNIQUE(A2:A11),TEXTSPLIT(TEXTJOIN("/",FALSE,BYROW(UNIQUE(A2:A11),LAMBDA(x,TEXTJOIN("|",TRUE,FILTER(B2:B11,A2:A11=x))))),"|","/",,,""))
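For readers who want to see the logic of these answers outside Excel, here is a rough Python sketch of the same idea: gather every Data Item per ID in a single pass over the database, then lay the matches out in up to five result columns. The data is the example table from the question; the function name `collect_matches` and the `width` parameter are made up for illustration.

```python
from collections import defaultdict

# Example rows from the question's "Database" sheet (ID, Data Item).
database = [
    ("AAA", "Lemons"), ("111", "Greenhouse"), ("FOO", "Computer"),
    ("AAA", "Monitor"), ("CAT", "Coffee"), ("ORANGE", "Pintglass"),
    ("123", "Birthday"), ("FOO", "Avengers"), ("AAA", "Plasters"),
    ("FOO", "NachoTaco"),
]
ids_to_check = ["AAA", "123", "FOO"]


def collect_matches(rows, ids, width=5):
    """Return {id: [match 1, ..., match width]}, padded with empty strings."""
    by_id = defaultdict(list)
    for key, item in rows:  # one pass over the whole database
        by_id[key].append(item)
    return {key: (by_id.get(key, []) + [""] * width)[:width] for key in ids}


results = collect_matches(database, ids_to_check)
for key, row in results.items():
    print(key, row)
```

The point of the sketch is the shape of the work: one scan that groups by ID, instead of a separate whole-column array scan for each of the 7500 result cells.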
I have a 3rd party database that contains Invoice data I need to report on. The Quantity and Amount Fields are stored as Positive numbers regardless of whether the "invoice" is a Credit Memo or actual Invoice. There is a single character field that contains the Type "I" = Invoice, "R" = Credit.
In a report that is equating 1.4 million records, I need to sum this data, so that Credits subtract from the total and Invoices add to the total, and I need to do this for 8 different columns in the report (CurrentYear, PreviousYear, etc)
My problem is performance of the many different ways to achieve this.
The Best performing seems to be using a CASE statement within the equation like so:
Case WHEN ARH.AccountingYear - 2 = #iCurrentYear THEN ARL.ShipQuantity * (CASE WHEN InvoiceType = 'R' THEN -1 ELSE 1 END) ELSE 0 END as PPY_INVOICED_QTY
In terms of readability, though, this is very ugly, since I have to repeat it for 8 different columns. Performance is good: it runs against all 1.4M records in 16 seconds.
Using a Scalar UDF kills performance
Case WHEN ARH.AccountingYear - 2 = #iCurrentYear THEN ARL.ShipQuantity * dbo.fn_GetMultiplier(ARH.InvoiceType) ELSE 0 END as PPY_INVOICED_QTY
Takes almost 5 minutes. So can't do that.
Other options I can think of would be:
Multiple levels of Views, use a new view to add a Multiplier column, then SELECT from that and do the multiplication using the new column
Build a table that has 2 columns and 2 records, R, -1 and I, 1, and join it based on InvoiceType, but this seems excessive.
Any other ideas I am missing, or suggestions on best practice for this sort of thing? I cannot change the stored data, that is established by the 3rd party application.
I decided to go with the multiple views, as Igor suggested, actually using the nested version: even though readability is lower, maintenance is easier because there is only 1 named view instead of 2. Performance is similar to the 8 different CASE statements, running in just under 20 seconds overall.
Thanks for the insights.
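As a rough illustration of the nested approach, the sketch below computes the signed quantity once in an inner query and reuses it in every yearly column. It runs on SQLite through Python rather than on the original server, and the single `invoices` table, the sample rows and the column aliases are assumptions made for the example, not the third-party schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the invoice header/line tables.
conn.execute(
    "CREATE TABLE invoices ("
    "AccountingYear INTEGER, InvoiceType TEXT, ShipQuantity INTEGER)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(2023, "I", 10), (2023, "R", 3), (2022, "I", 5), (2022, "R", 1)],
)

# The inner query applies the sign once; the outer query only has to
# pick the year for each column.
query = """
SELECT
    SUM(CASE WHEN AccountingYear = :cy     THEN SignedQty ELSE 0 END) AS CY_INVOICED_QTY,
    SUM(CASE WHEN AccountingYear = :cy - 1 THEN SignedQty ELSE 0 END) AS PY_INVOICED_QTY
FROM (
    SELECT AccountingYear,
           ShipQuantity * CASE WHEN InvoiceType = 'R' THEN -1 ELSE 1 END AS SignedQty
    FROM invoices
) AS signed
"""
row = conn.execute(query, {"cy": 2023}).fetchone()
print(row)
```

The credit/invoice rule then lives in one place, so adding a ninth column does not mean repeating the inner CASE expression.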
Ok, from the title it seems to be impossible to understand, I'll try to be as clear as possible.
Basically, I have a table, let's call it 'records'. In this table I have some products, for which I store 'id', 'codex' (which is a unique identifier for a certain product in the whole database), 'price' and 'situation'. This last one is a string which tells me whether the product has just entered the store (in that case it is set to 'IN') or has already been sold ('OUT' in this case).
The database was not created by us; I HAVE to work with it although it is horribly structured... The guy who originally designed the database decided to register a product's situation changing from 'IN' to 'OUT' in the following way: instead of UPDATEing the corresponding value in the table, he would take the row of data with 'IN' as situation and DUPLICATE it, this time setting 'OUT' as situation.
Just to sum up: if a product has not been sold yet, it will have one row of dedicated data; otherwise those rows will be two, identical except for the 'situation' field.
What I need to do is: select a product if (and ONLY if) there is no duplicate for it. Basically, I can (and should) look for a 'codex', and if my Count(codex) ends up being >1, I do not select the row.
I hope the explanation of the process is clear enough...
I tried many alternatives (no, SELECT DISTINCT is not a solution): does anyone have an idea of how to do that? Because really, none of the three of us could come up with a good solution!
Here is the schema for the table, I hope it is sufficiently clear, and if not do not hesitate asking for more details.
Just as a reminder: the project is in (sigh...) VB.net, the database is in Microsoft Access (mdb).
I could not find a solution on StackOverFlow, I hope this is not a duplicate question! Thanks in advance for the help.
id codex price situation
1 1 2.50 IN
2 1 2.50 OUT
3 2 3.45 IN
4 3 21.50 IN
5 2 3.45 OUT
6 4 1.50 IN
To check that I understand your problem: in your example table you just want to get the rows with IDs 4 and 6, right?
If that is what you want, and you want only the ones not yet sold, try this command:
SELECT
*
FROM
records
WHERE
codex
not in
(
SELECT
codex
FROM
records
WHERE
situation ='OUT'
)
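To check the query against the example table from the question, here is a small sketch that runs it on SQLite through Python rather than on Access. The table and data are the ones posted above; only the `ORDER BY id` and the variable names are added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER, codex INTEGER, price REAL, situation TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        (1, 1, 2.50, "IN"),
        (2, 1, 2.50, "OUT"),
        (3, 2, 3.45, "IN"),
        (4, 3, 21.50, "IN"),
        (5, 2, 3.45, "OUT"),
        (6, 4, 1.50, "IN"),
    ],
)

# Keep only products whose codex never appears with situation 'OUT'.
rows = conn.execute(
    """
    SELECT * FROM records
    WHERE codex NOT IN (SELECT codex FROM records WHERE situation = 'OUT')
    ORDER BY id
    """
).fetchall()
unsold_ids = [r[0] for r in rows]
print(unsold_ids)
```

On this data the query returns the rows with id 4 and id 6, which matches the reading of the question given above.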
I have a table that contains test marks from different terms, ca1_percent, sa1_percent, ca2_percent and sa2_percent. These 4 fields reside in the Results table that contains results from the different terms.
I used a self-relationship linked on the match field overall_percent_match, which is calculated using year & " " & subject & " " & _kf_studentID. This relationship allows me to obtain the test results from past terms (of a year). For example, my term 3 results will contain results from term 1 and term 2 (of each subject). All works fine unless a new student joins midway through the year. If he joins in term 3, his ca2 results (done in term 3) will fall into his ca1_percent column (which is supposed to contain term 1 results), unlike the records of the students before him.
Image shows what I mean.
I could not figure out the solution. Can anyone help me?
This StackOverflow link contains more details of my work that was done related to this problem.
The underlying problem, per your prior query, is that you're pulling the values through:
GetNthRecord(SA1_Results_Match::mark_percent,2)
This statement assumes the existence of an N=1, N=2 and N=3. To make this work properly you could do any of the following:
Ensure that your Results table always has records from the prior semester, even if the student joins later in the semester. You could keep using GetNthRecord this way, but you will always need to ensure that the records are in order.
Use an ExecuteSQL statement to gather only the correct semester's results for the correct summary field.
Make four separate relationships, with separate Table Occurrences, to define ca1, sa1, ca2 and sa2 each separately. This looks like what you started out trying to do in the prior question.
The problem that I have is SQL Server Reporting Services does not like Sum(First()) notation. It will only allow either Sum() or First().
The Context
I am creating a reconciliation report, i.e. what stock we had at the start of a period, what was ordered and what stock we had at the end.
Dataset returns something like
Type,Product,Customer,Stock at Start(SAS), Ordered Qty, Stock At End (SAE)
Export,1,1,100,5,90
Export,1,2,100,5,90
Domestic,2,1,200,10,150
Domestic,2,2,200,20,150
Domestic,2,3,200,30,150
I group by Type, then Product and list the customers that bought that product.
I want to display the total for SAS, Ordered Qty, and SAE but if I do a Sum on the SAS or SAE I get a value of 200 and 600 for Product 1 and 2 respectively when it should have been 100 and 200 respectively.
I thought that I could do a Sum(First()), but SSRS complains that I cannot have an aggregate within an aggregate.
Ideally SSRS needs a Sum(Distinct())
Solutions So Far
1. Don't show the Stock at Start and Stock At End as part of the totals.
2. Write some code directly in the report to do the calc. tried this one - didn't work as I expected.
3. Write an assembly to do the calculation. (Have not tried this one)
Edit - Problem clarification
The problem stems from the fact that this is actually two reports merged into one (as I see it). A Production Report and a sales report.
The report tried to address these criteria
the market that we sold it to (export, domestic)
how much did we have in stock,
how much was produced,
how much was sold,
who did we sell it to,
how much do we have left over.
The complicating factor is the "who did we sell it to". Without that, it would have been relatively easy. But including it means that the other top-line figures (stock at start and stock at end) have nothing to do with what is sold, other than the particular product.
I had a similar issue and ended up using ROW_NUMBER in my query to provide an integer for the row value and then using SUM(IIF(myRowNumber = 1, myValue, 0)).
I'll edit this when I get to work and provide more data, but thought this might be enough to get you started. I'm curious about Adolf's solution too.
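Here is a minimal sketch of the ROW_NUMBER idea, run on SQLite through Python (it needs a SQLite build with window functions, 3.25 or later) instead of SQL Server and SSRS. The rows are the sample dataset from the question; the table name `stock`, the column names and the alias `rn` are made up for the example, and the final aggregation is done in SQL here, whereas in the report it would be the SUM(IIF(...)) expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock ("
    "market_type TEXT, product INTEGER, customer INTEGER, "
    "sas INTEGER, ordered_qty INTEGER, sae INTEGER)"
)
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Export", 1, 1, 100, 5, 90),
        ("Export", 1, 2, 100, 5, 90),
        ("Domestic", 2, 1, 200, 10, 150),
        ("Domestic", 2, 2, 200, 20, 150),
        ("Domestic", 2, 3, 200, 30, 150),
    ],
)

# Number the customer rows within each product, then count the repeated
# stock figures only on the first row of each product.
totals = conn.execute(
    """
    SELECT market_type, product,
           SUM(CASE WHEN rn = 1 THEN sas ELSE 0 END) AS total_sas,
           SUM(ordered_qty)                          AS total_ordered,
           SUM(CASE WHEN rn = 1 THEN sae ELSE 0 END) AS total_sae
    FROM (
        SELECT *, ROW_NUMBER() OVER (
                   PARTITION BY market_type, product ORDER BY customer) AS rn
        FROM stock
    )
    GROUP BY market_type, product
    ORDER BY product
    """
).fetchall()
print(totals)
```

The stock at start comes out as 100 and 200 rather than 200 and 600, while the ordered quantity is still summed over every customer row.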
Have you thought about using windowing/ranking functions in the SQL for this?
This allows you to aggregate data without losing detail
e.g. Imagine for a range of values, you want the Min and Max returning, but you also wish to return the initial data (no summary of data).
Group Value Min Max
A 3 2 9
A 7 2 9
A 9 2 9
A 2 2 9
B 5 5 7
B 7 5 7
C etc..
The syntax looks odd, but it's just
AggregateFunctionYouWant OVER (PARTITION BY WhatYouWantItGroupedBy ORDER BY WhatYouWantItOrderedBy) AS AggVal
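The Min/Max table above can be reproduced with a short sketch like the following, run on SQLite through Python (window functions need SQLite 3.25 or later). The values are the ones from the example table; the table name `samples` and the column names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (grp TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [("A", 3), ("A", 7), ("A", 9), ("A", 2), ("B", 5), ("B", 7)],
)

# Every detail row is kept; the aggregates are computed per group
# and repeated alongside each row.
windowed = conn.execute(
    """
    SELECT grp, value,
           MIN(value) OVER (PARTITION BY grp) AS min_value,
           MAX(value) OVER (PARTITION BY grp) AS max_value
    FROM samples
    ORDER BY rowid
    """
).fetchall()
for line in windowed:
    print(line)
```

No ORDER BY is needed inside OVER for a plain per-group minimum or maximum; it matters for ranking functions and running totals.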
your dataset is a little weird but i think i understand where you're going.
try making the dataset return in this order:
Type, Product, SAS, SAE, Customer, Ordered Qty
what i would do is create a report with a table control. i would set up the type, product, and customer as three separate groups. i would put the sas and sae data on the same group as the product, and the quantity on the customer group. this should resemble what i believe you are trying to go for. your sas and sae should be in a first()
Write a subquery.
Ideally SSRS needs a Sum(Distinct())
Re-write your query to do this correctly.
I suspect your problem is that you've written a query that gets you the wrong results, or you have poorly designed tables. Without knowing more about what you're trying to do, I can't tell you how to fix it, but it has a bad "smell".