ABAP - loop multiple tables in sync

I have run into some issues writing a mass upload program in ABAP. I have gathered all the data from the web into one SAP table, but then I have to move the contents into 3 different tables in order to pass them to other function modules. The problem is that if I loop through one table, the other ones do not loop through in sync.
DATA: T_HEADER TYPE TABLE OF ZCST5000 WITH HEADER LINE,
      T_DETAIL TYPE TABLE OF ZCSS5001 WITH HEADER LINE,
      T_TIME   TYPE TABLE OF ZCSS5002 WITH HEADER LINE.

LOOP AT T_HEADER.
  PERFORM XTY TABLES T_DETAIL
              USING  LS_HEADER_TMP.
  PERFORM ZCC TABLES T_TIME
              USING  LS_HEADER_TMP.
ENDLOOP.
This is example code. If I loop through T_HEADER, it only loops through T_HEADER and won't loop through T_DETAIL and T_TIME in sync. They all have the same row counts, since these structures originally come from 1 table. So while T_HEADER row 1 is being processed, row 1 of T_DETAIL and T_TIME should be picked up, and when T_HEADER moves to row 2, row 2 of T_DETAIL and T_TIME should be picked up. How do I tackle this? :(

You can use the classic syntax READ TABLE itab INDEX n INTO struct or the new syntax struct = itab[ n ] to read a specific line of a table into a structure by line-number.
So when you have three tables, and you can guarantee that they all have the same number of rows and that rows with the same row number correspond to each other, you can do it like this (modern syntax):
DO lines( t_header ) TIMES.
  DATA(header) = t_header[ sy-index ].
  DATA(detail) = t_detail[ sy-index ].
  DATA(time)   = t_time[ sy-index ].
  "You now have the corresponding lines in the structure variables header, detail and time
ENDDO.
Or like this (obsolete syntax, using tables with header lines as in the question):
DATA num_lines TYPE i.
DESCRIBE TABLE t_header LINES num_lines.

DO num_lines TIMES.
  READ TABLE t_header INDEX sy-index.
  READ TABLE t_detail INDEX sy-index.
  READ TABLE t_time INDEX sy-index.
  "You now have the corresponding lines in the header lines
ENDDO.
Note that these two snippets behave slightly differently in the error case where t_detail or t_time has fewer lines than t_header. The first snippet will throw an exception (CX_SY_ITAB_LINE_NOT_FOUND), which will abort the program unless caught. The second snippet will keep running with the header line left over from the previous loop iteration.
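For completeness, a third option that stays close to the original LOOP is to read the companion tables by the current loop index. A minimal sketch, assuming the same row-count guarantee as above:

LOOP AT t_header INTO DATA(header).
  DATA(idx) = sy-tabix. "save the loop index before READ TABLE overwrites sy-tabix
  READ TABLE t_detail INDEX idx INTO DATA(detail).
  READ TABLE t_time INDEX idx INTO DATA(time).
  "Process the three corresponding rows here.
ENDLOOP.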

Related

Random sample from another table's column

I am trying to figure out how to populate a "fake" column by choosing randomly from another table column.
So far this was easy using an array and the rantbl() function as there were not a lot of modalities.
data want;
    set have;
    array values[2] $10 _temporary_ ('NO','YES');
    value = values[rantbl(0, 0.5, 0.5)];
    array start_dates[4] _temporary_ (1735689600, 1780358400, 1798848000, 1798848000);
    START_DATE = start_dates[rantbl(0, 0.25, 0.25, 0.25, 0.25)];
    format START_DATE datetime20.;
run;
However, my question is: what happens if there are, for example, more than 150 modalities in the other table? Is there a way to put into an array all the modalities that are in another table? Or better, to populate the new "fake" column with modalities from another table's column according to the modalities' distribution in that table?
I'm not entirely sure, but here's how I interpret your request and how I would solve it.
You have a table one. You want to create a new data set want with an additional column. This column should have values that are sampled from a pool of values given in yet another data set two in column y. You want to simulate the new column in the want data set according to the distribution of y in the two data set.
So, in the example below, there should be a .5 chance of simulating y = 3 and .25 for 1 and 2 respectively.
I think the way to go is not using arrays at all. See if this helps you.
data one;
    do x = 1 to 1e4;
        output;
    end;
run;

data two;
    input y;
    datalines;
1
2
3
3
;

data want;
    set one;
    p = ceil(rand('uniform') * n);
    set two(keep = y) nobs = n point = p;
run;
To verify that the new column resembles the distribution from the two data set:
proc freq data = want;
    tables y / nocum;
run;
There are probably a dozen good ways to do this; which one is ideal depends on various details of your data, in particular how performance-sensitive this is.
The most SASsy way to do this, I would say, is to use PROC SURVEYSELECT. This generates a random sample of the size you want, which is then merged on. It is not the fastest way, but it is very easy to understand and is fast-ish as long as you aren't talking humongous data sizes.
data _null_;
    set sashelp.cars nobs=nobs_cars;
    call symputx('nobs_cars', nobs_cars);
    stop;
run;

proc surveyselect data=sashelp.class sampsize=&nobs_cars out=names(keep=name)
    seed=7 method=urs outhits outorder=random;
run;

data want;
    merge sashelp.cars names;
run;
In this example, we are taking the dataset sashelp.cars, and appending an owner's name to each car, which we choose at random from the dataset sashelp.class.
What we're doing here is first determining how many records we need - the number of observations in the to-be-merged-to dataset. This step can be skipped if you know that already, but it takes basically zero time no matter what the dataset size.
Second, we use proc surveyselect to generate the random list. We use method=urs to ask for unrestricted random sampling, i.e. sampling with replacement, meaning we take 428 (in this case) separate pulls, each time with every row equally likely to be chosen. We use outhits and outorder=random to get a dataset with one row per desired output dataset row, in a random order (without outhits it gives one row per input dataset row plus a number-of-times-sampled variable, and without outorder=random it gives the rows in sorted order). sampsize is given our created macro variable that stores the number of observations in the eventual output dataset.
Third, we do a side-by-side merge (with no by statement, intentionally). Please note that in some installations the mergenoby system option is set to give a warning or error for this particular usage; if so you may need to do this slightly differently, though it is easy to use two set statements (set sashelp.cars; set names;) to achieve identical results.
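For reference, the two-set-statement variant mentioned above would look like this:

data want;
    set sashelp.cars;
    set names;
run;

Each iteration reads the next row from both datasets, pairing them up row by row just like the by-less merge.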

SAS, array code, two indices, dropping records

I am looking through some code and wondering what it does. Below are the code comments. I'm still not sure what this code does even with the comments. I have used arrays but am not familiar with this pattern. It looks like this code dedupes by using two indices. Is that correct? So if a combination of CCS_DR_IDX and TXN_IDX has already been seen, it will delete those records?
Now handle cases where the dollar matches. If ccs_dr_idx has already been used then delete the record. Dropped txns here will be added back in with the claim data called missing.
PROC SORT DATA=OUT.REQ_1_9_F_AMT_MATCH;
    BY CCS_DR_IDX DATEDIF;
RUN;

DATA OUT.REQ_1_9_F_AMT_MATCH_V2;
    SET OUT.REQ_1_9_F_AMT_MATCH;
    ARRAY id_one{40000} id_one1-id_one40000;
    ARRAY id_two{40000} id_two1-id_two40000;
    RETAIN id_one1-id_one40000 id_two1-id_two40000;
    IF _n_ = 1 THEN i = 1;
    ELSE i + 1;
    DO j = 1 TO i;
        IF CCS_DR_IDX = id_one{j} THEN DELETE;
    END;
    DO k = 1 TO i;
        IF TXN_IDX = id_two{k} THEN DELETE;
    END;
    id_one{i} = CCS_DR_IDX;
    id_two{i} = TXN_IDX;
    DROP i j k id_one1-id_one40000 id_two1-id_two40000;
RUN;
The sort is
BY CCS_DR_IDX DATEDIF;
The filtering or selecting occurs when control reaches the bottom of the data step and the row is implicitly OUTPUT. That occurs only for a combination of CCS_DR_IDX and TXN_IDX where neither value has appeared previously.
Since the data are sorted by CCS_DR_IDX, there is implicit grouping, so at most one record per CCS_DR_IDX is output, and for the first group it must be the first record in the group. Each successive row in a CCS_DR_IDX group, after that output, will match an entry in id_one and be tossed away by DELETE.
When you start processing the next CCS_DR_IDX group, rows are processed until you reach the next TXN_IDX that is distinct with respect to those tracked in id_two. Because the sort had a second key DATEDIF, you can say the output is "a selection of the first-occurring combinations of unique pair items CCS_DR_IDX and TXN_IDX" (somewhat akin to pair sampling without repeats).
There could be a case where some CCS_DR_IDX is not in the output -- that would happen when the group contains only TXN_IDX values that occurred under prior CCS_DR_IDX values.
Without seeing the data model and the reason for the combinations (probably some sort of Cartesian join) it's hard to make a less vague statement of what is being selected.
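To make the selection rule concrete, here is a toy example with hypothetical data. Running the sorted data below through the step above keeps (1,10), (2,12) and (3,11): the row (1,11) is deleted because CCS_DR_IDX 1 was already used, and (2,10) because TXN_IDX 10 was already used.

data amt_match;
    input ccs_dr_idx txn_idx datedif;
    datalines;
1 10 0
1 11 1
2 10 0
2 12 1
3 11 0
;
run;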

SSRS - Print order issue

I have the following scenario: I have a matrix table with dynamic rows and columns. The rows have three groups (Product_Type, Manufacturer and Supplier) and the columns are stores that are generated dynamically. All the data comes from a single DataSet returned by a stored procedure in SQL Server.
The rows are too big to fit on a single page, and so are the columns. When this happens, the print order must be as follows: if the columns overflow, the overflowing columns must print on the following page, repeating all of the rows (all 3 groups). If the rows overflow but not the columns, the columns must repeat on the following page for the remaining rows. These two scenarios are quite straightforward and I already got them to work. If both the rows and the columns overflow, it must print as in the following picture:
I am struggling to get it right. I am a novice in SSRS and only know the fundamentals. I have struggled quite a while trying to figure it out and can't seem to get it right.
Any expert advice will be much appreciated.
One way to achieve this is to have two matrixes, one after the other.
In the first, you hide any column # > x (where x is the number of columns you can fit onto a page); in the second matrix, you hide any column # <= x.
Matrix 1
Matrix 2
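As a sketch of the hiding logic (assuming the dynamic column group is driven by a field called Store and that x = 10; adjust both to your report), the Hidden property of the column group can number the columns with RunningValue:

Matrix 1 column group Hidden: =RunningValue(Fields!Store.Value, CountDistinct, Nothing) > 10
Matrix 2 column group Hidden: =RunningValue(Fields!Store.Value, CountDistinct, Nothing) <= 10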

join columns of arrays in matlab

I have the following inputs
dataset 1 with tens of thousands of rows and 5 array columns
dataset 2 with tens of thousands of rows and 3 array columns
I want to join/merge (add) the 3rd column of dataset 1 into a new 4th array column of dataset 2 for the elements for which the ID is the same (same value in column 1 of dataset 1 and column 1 of dataset 2). Mathematically you can write it like this, I think:
dataset2(i,4)=dataset1(find(dataset1(:,1)==c(i,1)),3);
but how to put it in MATLAB?
None of the methods mentioned in the MATLAB help or elsewhere on the internet seem to work. I have already tried merge, join, ismember and vectors, but I can't solve the problem.
Does someone have any ideas? I know the problem can be solved with for loops, but I'm not allowed to use them, so I am searching for alternatives.
I believe this is what you want
% We keep the index of all the matching rows
% NOTICE: I changed c(i,1) to dataset2(:,1)
matches_in_col_1 = find(dataset1(:,1) == dataset2(:,1));
% EDIT: HOW TO COMPARE MORE THAN 2 COLUMNS
% To require a match across 4 datasets, combine the element-wise
% comparisons with & (a chained == does not do what you want):
% matches_in_col_1 = find(dataset1(:,1) == dataset2(:,1) & dataset1(:,1) == dataset3(:,1) & dataset1(:,1) == dataset4(:,1));
% now copy the values from those rows into the corresponding rows
% of dataset2
dataset2(matches_in_col_1, 4) = dataset1(matches_in_col_1, 3);
I'm not 100% sure this is what you need. Why is i present? Were you trying a loop implementation? My solution also assumes that c was supposed to be dataset2.
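Note that the element-wise == above only works when the two datasets have the same number of rows with the IDs in the same order. If the IDs are not aligned, ismember (which the question already mentions) gives the row mapping; a minimal sketch:

% For each row of dataset2, find the row of dataset1 with the same ID (column 1)
[tf, loc] = ismember(dataset2(:,1), dataset1(:,1));
% Copy column 3 of dataset1 into column 4 of dataset2 wherever a match exists
dataset2(tf, 4) = dataset1(loc(tf), 3);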

Finding arrays that contain a subset of another array without using @> with PostgreSQL

I have a table with 1.5 MM records. Each record has a row number and an array with between 1 and 1,000 elements in the array. I am trying to find all of the arrays that are a subset of the larger arrays.
When I use the code below, I get ERROR: statement requires more resources than resource queue allows (possibly because there are over a trillion possible combinations):
select
a.array as dup
from
table a
left join
table b
on
b.array @> a.array
and a.row_number <> b.row_number
Is there a more efficient way to identify which arrays are subsets of the other arrays and mark them for removal, other than using @>?
Your example code suggests that you are only interested in finding arrays that are subsets of any other array in another row of the table.
However, your query with a JOIN returns all combinations, possibly multiplying results.
Try an EXISTS semi-join instead, returning qualifying rows only once:
SELECT a.array as dup
FROM table a
WHERE EXISTS (
SELECT 1
FROM table b
WHERE a.array <@ b.array
AND a.row_number <> b.row_number
);
With this form, Postgres can stop iterating rows as soon as the first match is found. If this won't go through either, try partitioning your query: add a clause like
AND table_id BETWEEN 0 AND 10000
and iterate through the table. That should be valid for this case.
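Combining the two, one slice of such a partitioned run could look like this (table_id standing in for whatever numbering column you iterate over):

SELECT a.array AS dup
FROM   table a
WHERE  a.table_id BETWEEN 0 AND 10000
AND    EXISTS (
   SELECT 1
   FROM   table b
   WHERE  a.array <@ b.array
   AND    a.row_number <> b.row_number
   );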
Aside: it's a pity that your Postgres derivative (Greenplum) doesn't seem to support GIN indexes, which would make this operation much faster. (The index itself would be big, though.)
Well, I don't see how to do this efficiently in a single declarative SQL statement without appropriate support from an index. I don't know how well this would work with a GIN index, but using a GIN index would certainly avoid the need to compare every possible pair of rows.
The first thing I would do is to carefully investigate the types of indexes you do have at your disposal, and try creating one as necessary.
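For reference, on stock PostgreSQL (not Greenplum, per the aside above) a GIN index supporting the containment operators would be created like this, with tbl and array_col as stand-in names:

CREATE INDEX tbl_array_gin ON tbl USING GIN (array_col);

The default GIN operator class for arrays supports @> and <@, so the semi-join above could use the index instead of comparing every possible pair of rows.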
If that doesn't work, the first thing that comes to my mind, procedurally speaking, would be to sort all the arrays, then sort the rows into a graded lexicographic order on the arrays. Then start with the shortest arrays and work upwards as follows: e.g. for [1,4,9], check whether any of the arrays of length <= 3 that start with 1 are a subset, then the arrays of length <= 2 that start with 4, and then the arrays of length <= 1 that start with 9, removing any subsets you find from consideration as you go so that you don't keep re-checking the same rows over and over again.
I'm sure you could tune this algorithm a bit, especially depending on the particular nature of the data involved. I wouldn't be surprised if there was a much better algorithm; this is just the first thing that I thought of. You may be able to work from this algorithm backwards to the SQL you want, or you may have to dump the table for client-side processing, or some hybrid thereof.
