ABAP - Group employees by cost center and calculate sum - loops

I have an internal table with employees. Each employee is assigned to a cost center. In another column is the salary. I want to group the employees by cost center and get the total salary per cost center. How can I do it?
At first I have grouped them as follows:
Loop at itab assigning field-symbol(<c>)
group by <c>-kostl ascending.
Write: / <c>-kostl.
This gives me a list of all cost-centers. In the next step I would like to calculate the sum of the salaries per cost center (the sum for all employees with the same cost-center).
How can I do it? Can I use collect?
Update:
I have tried the following code, but I get the error "The syntax for a method specification is "objref->method" or "class=>method"" on the line lv_sum_salary = sum( <p>-salary ).
loop at i_download ASSIGNING FIELD-SYMBOL(<c>)
GROUP BY <c>-kostl ascending.
Write: / <c>-kostl, <c>-salary.
data: lv_sum_salary type p DECIMALS 2.
Loop at group <c> ASSIGNING FIELD-SYMBOL(<p>).
lv_sum_salary = sum( <p>-salary ).
Write: /' ',<p>-pernr, <p>-salary.
endloop.
Write: /' ', lv_sum_salary.
endloop.

I am not sure where you got the sum function from, but there is no such built-in function. If you want to calculate a sum in a group-by loop, then you have to do it yourself.
" make sure the sum is reset to 0 for each group
CLEAR lv_sum_salary.
" Do a loop over the members of this group
LOOP AT GROUP <c> ASSIGNING FIELD-SYMBOL(<p>).
" Add the salary of the current group-member to the sum
lv_sum_salary = lv_sum_salary + <p>-salary.
ENDLOOP.
" Now we have the sum of all members
WRITE |The sum of cost center { <c>-kostl } is { lv_sum_salary }.|.

Generally speaking, to group and sum, there are these 4 possibilities (code snippets provided below):
1. SQL with an internal table as source: SELECT ... SUM( ... ) ... FROM @itab ... GROUP BY ... (since ABAP 7.52, HANA database only); NB: beware the possible performance overhead.
2. The classic way, everything coded:
   • Sort by cost center
   • Loop at the lines
   • At each line, add the salary to the total
   • If the cost center is different in the next line, process the total
3. LOOP AT with GROUP BY, and LOOP AT GROUP
4. VALUE with FOR GROUPS and GROUP BY, and REDUCE and FOR ... IN GROUP for the sum
Note that only the option with the explicit sorting will sort by cost center, the other ones won't provide a result sorted by cost center.
All the below examples have in common these declarative and initialization parts:
TYPES: BEGIN OF ty_itab_line,
kostl TYPE c LENGTH 10,
pernr TYPE c LENGTH 10,
salary TYPE p LENGTH 8 DECIMALS 2,
END OF ty_itab_line,
tt_itab TYPE STANDARD TABLE OF ty_itab_line WITH EMPTY KEY,
BEGIN OF ty_total_salaries_by_kostl,
kostl TYPE c LENGTH 10,
total_salaries TYPE p LENGTH 10 DECIMALS 2,
END OF ty_total_salaries_by_kostl,
tt_total_salaries_by_kostl TYPE STANDARD TABLE OF ty_total_salaries_by_kostl WITH EMPTY KEY.
DATA(itab) = VALUE tt_itab( ( kostl = 'CC1' pernr = 'E1' salary = '4000.00' )
( kostl = 'CC1' pernr = 'E2' salary = '3100.00' )
( kostl = 'CC2' pernr = 'E3' salary = '2500.00' ) ).
DATA(total_salaries_by_kostl) = VALUE tt_total_salaries_by_kostl( ).
and the expected result will be:
ASSERT total_salaries_by_kostl = VALUE tt_total_salaries_by_kostl(
( kostl = 'CC1' total_salaries = '7100.00' )
( kostl = 'CC2' total_salaries = '2500.00' ) ).
Examples for each possibility:
SQL on internal table:
SELECT kostl, SUM( salary ) AS total_salaries
FROM @itab AS itab ##DB_FEATURE_MODE[ITABS_IN_FROM_CLAUSE]
GROUP BY kostl
INTO TABLE @total_salaries_by_kostl.
Classic way:
SORT itab BY kostl.
DATA(total_line) = VALUE ty_total_salaries_by_kostl( ).
LOOP AT itab REFERENCE INTO DATA(line).
DATA(next_kostl) = VALUE #( itab[ sy-tabix + 1 ]-kostl OPTIONAL ).
total_line-total_salaries = total_line-total_salaries + line->salary.
IF next_kostl <> line->kostl.
total_line-kostl = line->kostl.
APPEND total_line TO total_salaries_by_kostl.
CLEAR total_line.
ENDIF.
ENDLOOP.
EDIT: I don't cover AT NEW and AT END OF because I'm not a fan of them: they don't explicitly name the fields that define the group; they implicitly consider all the fields before the mentioned field, plus that field. I also ignore ON CHANGE OF, which is obsolete.
LOOP AT with GROUP BY:
LOOP AT itab REFERENCE INTO DATA(line)
GROUP BY ( kostl = line->kostl )
REFERENCE INTO DATA(kostl_group).
DATA(total_line) = VALUE ty_total_salaries_by_kostl(
kostl = kostl_group->kostl ).
LOOP AT GROUP kostl_group REFERENCE INTO line.
total_line-total_salaries = total_line-total_salaries + line->salary.
ENDLOOP.
APPEND total_line TO total_salaries_by_kostl.
ENDLOOP.
VALUE with FOR and GROUP BY, and REDUCE for the sum:
total_salaries_by_kostl = VALUE #(
FOR GROUPS <kostl_group> OF <line> IN itab
GROUP BY ( kostl = <line>-kostl )
( kostl = <kostl_group>-kostl
total_salaries = REDUCE #( INIT sum = 0
FOR <line_2> IN GROUP <kostl_group>
NEXT sum = sum + <line_2>-salary ) ) ).
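For readers who want to check the expected totals outside an ABAP system, the "sort, then total each run of equal cost centers" idea can be sketched in Python. This is only an illustration: the function name and the tuple layout (kostl, pernr, salary) are assumptions made for the sketch, not part of the ABAP code above.

```python
from decimal import Decimal
from itertools import groupby
from operator import itemgetter

# Same sample data as the ABAP examples: (kostl, pernr, salary)
ITAB = [
    ("CC1", "E1", Decimal("4000.00")),
    ("CC1", "E2", Decimal("3100.00")),
    ("CC2", "E3", Decimal("2500.00")),
]

def total_salaries_by_kostl(rows):
    """Sort by cost center, then sum the salaries of each run of equal keys."""
    ordered = sorted(rows, key=itemgetter(0))
    return [
        (kostl, sum((salary for _, _, salary in group), Decimal("0")))
        for kostl, group in groupby(ordered, key=itemgetter(0))
    ]

print(total_salaries_by_kostl(ITAB))
```

As with the classic ABAP variant, the explicit sort is what makes the result come out ordered by cost center.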


How to extract data from an Array looking JSON in BigQuery

I am trying to extract data from the variable metric_data that looks like an Array but it's a JSON.
This is an example:
[{"segmentName":"control","values":[[1588636800000.0,101],[1588723200000.0,546],[1588809600000.0,1195],[1591056000000.0,129]]},{"segmentName":"experiment","values":[[1588636800000.0,91],[1588723200000.0,680],[1588809600000.0,1214],[1588896000000.0,1269],.0,290],[1589760000000.0,248],[1589846400000.0,173],[1589932800000.0,167],[1590019200000.0,178],[1590105600000.0,131],[1590192000000.0,110]]}]
I am specifically trying to sum up the second element of the sub-arrays under the key "values", so that I get one row per segmentName with the sum of its values. I only got as far as transforming the string into an array.
SELECT
array(select
x
FROM UNNEST(JSON_EXTRACT_ARRAY(metric_data, '$')) x
) extracted
FROM temp
Based on my understanding, you would like to get the sum for each "segmentName". Two possibilities are to sum everything (both array elements) or to get the sum per element. If my understanding is wrong, please let me know so I can edit or delete my answer.
You can consider the queries below illustrating these two possibilities:
Sum of values
with sample_data as (
select '[{"segmentName":"control","values":[[1588636800000.0,101],[1588723200000.0,546],[1588809600000.0,1195],[1591056000000.0,129]]},{"segmentName":"experiment","values":[[1588636800000.0,91],[1588723200000.0,680],[1588809600000.0,1214],[1588896000000.0,1269],[1588896000000.0,290],[1589760000000.0,248],[1589846400000.0,173],[1589932800000.0,167],[1590019200000.0,178],[1590105600000.0,131],[1590192000000.0,110]]}]' as json_string
)
-- Sum all values
select
json_query(js,'$.segmentName') as segment_name,
sum(cast(arr_val as NUMERIC)) as sum_of_values
from sample_data
,unnest(json_query_array(json_string, '$')) js
,unnest(json_query_array(js,'$.values')) val
,unnest(json_query_array(val,'$')) arr_val
group by 1
Sum of values output: (output image not preserved)
Sum per element of values
with sample_data as (
select '[{"segmentName":"control","values":[[1588636800000.0,101],[1588723200000.0,546],[1588809600000.0,1195],[1591056000000.0,129]]},{"segmentName":"experiment","values":[[1588636800000.0,91],[1588723200000.0,680],[1588809600000.0,1214],[1588896000000.0,1269],[1588896000000.0,290],[1589760000000.0,248],[1589846400000.0,173],[1589932800000.0,167],[1590019200000.0,178],[1590105600000.0,131],[1590192000000.0,110]]}]' as json_string
)
,cte as (
select
json_query(js,'$.segmentName') as segment_name,
split(regexp_extract(val,r'\[(\d+\.?\d+?,\d+)\]'),',') as new_value
from sample_data
,unnest(json_query_array(json_string, '$')) js
,unnest(json_query_array(js,'$.values')) val
)
select
segment_name,
sum(cast(new_value[offset(0)] as numeric)) as elem1,
sum(cast(new_value[offset(1)] as numeric)) as elem2
from cte
group by segment_name
Sum per element of values output: (output image not preserved)
NOTE: Your JSON string is missing some brackets and values, hence I created a dummy value.
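To sanity-check the per-element sums outside BigQuery, the same aggregation can be sketched in Python with the standard json module. The function name is hypothetical and the sample below contains only the "control" segment from the question; it is a cross-check, not a replacement for the queries above.

```python
import json

def sum_per_element(metric_data):
    """Return {segmentName: (sum of first elements, sum of second elements)}."""
    totals = {}
    for segment in json.loads(metric_data):
        first = sum(pair[0] for pair in segment["values"])
        second = sum(pair[1] for pair in segment["values"])
        totals[segment["segmentName"]] = (first, second)
    return totals

sample = (
    '[{"segmentName":"control","values":[[1588636800000.0,101],'
    '[1588723200000.0,546],[1588809600000.0,1195],[1591056000000.0,129]]}]'
)
print(sum_per_element(sample))
```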

loop with condition ignored by the system in sap abap

I am trying to use a LOOP with a WHERE condition to sum up a field over the matching rows. The WHERE condition should be correct, but at runtime the program ignores the condition and sums up all rows. Any suggestions on how to fix this?
SELECT * FROM LIPS INTO CORRESPONDING FIELDS OF TABLE LT_LIPS
WHERE VGBEL = LT_BCODE_I-VGBEL "getDN number
AND VGPOS = LT_BCODE_I-VGPOS. " get vgpos = 01/02/03
LOOP AT LT_BCODE_I INTO LT_BCODE_I WHERE VGBEL = LT_LIPS-VGBEL AND VGPOS = LT_LIPS-VGPOS.
SUM.
LT_BCODE_I-MENGE = LT_BCODE_I-MENGE.
ENDLOOP.
Although you are asking about LOOP, I think the issue is more about how you use SUM.
The statement SUM can only be specified within a LOOP and is only respected within an AT-ENDAT control structure.
Here is an excerpt from the ABAP documentation, for "Calculation of a sum with SUM at AT LAST. All lines of the internal table are evaluated":
DATA:
BEGIN OF wa,
col TYPE i,
END OF wa,
itab LIKE TABLE OF wa WITH EMPTY KEY.
itab = VALUE #( FOR i = 1 UNTIL i > 10 ( col = i ) ).
LOOP AT itab INTO wa.
AT LAST.
SUM.
cl_demo_output=>display( wa ).
ENDAT.
ENDLOOP.

How do I nest if then statements with select statements including other if then statements?

I'm fairly new to SQL and can't figure out how to combine several if .. then statements.
What is the right syntax for this?
I'm using SQL Server Management Studio 2017.
I've tried to combine if... else if..statements and I tried using case statements, but I always get lost in the nesting of the statements.
I have several conditions that have to be met before I can perform a calculation.
It should be something like this:
If CalculationMethod = x
and if (Price * coefficient) < Amount
then CalculatedAmount = Amount
else CalculatedAmount = (Price * coefficient)
Where Amount has its own if statements:
Amount =
If Category = a and DistanceFrom <= Distance >= DistanceUntill then take amount from that particular cell
If Category = b and DistanceFrom <= Distance >= DistanceUntill then take amount from that particular cell
If Category = c and DistanceFrom <= Distance >= DistanceUntill then take amount from that particular cell
In this case, Amount is a cell in a table with columns DistanceFrom, DistanceUntill, a, b and c.
CalculationMethod and Coefficient are columns in another table.
Price is a column in third table.
In the end I want the CalculatedAmount based on the Amount, Price and Coefficient.
Does this make any sense? Does anyone have an idea on how to tackle this?
If you have an IF...THEN...ELSE type of scenario I think the right direction would be to use a CASE statement such as:
SELECT CASE WHEN CalculationMethod = x AND ((Price * coefficient) < Amount) THEN Amount
ELSE (Price * coefficient) END CalculatedAmount
You can read about it here: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/case-transact-sql?view=sql-server-2017
An IIF clause works very well when there is only one decision branch, but if you have multiple things to choose from the CASE statement is the way to go.
SELECT IIF(CalculationMethod = x and Price * coefficient < Amount, Amount, Price * coefficient) as CalculatedAmount
FROM aTable

T-SQL : Cannot perform an aggregate function on an expression containing an aggregate or a subquery

I am trying to total one set of amounts and subtract the total of another set from it, but I get an error.
Imagine something like this
First Subquery : 1 3 5 7
Second Subquery : 2 4 6
Total : (1+3+5+7) - (2+4+6) = 4
This is my query, which produces the error:
Select SUM ((
(select SUM (amount) FROM transfer tr1
where transfer_type = 'Positive' group by transfer_id)
EXCEPT
(SELECT SUM (amount) from transfer tr2
where transfer_type = 'Negative' group by transfer_id)))
How can I rewrite the query to avoid this error:
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
Many thanks in advance
You can construct a query in a way to turn additions into subtractions for 'Negative' values, like this:
SELECT
transfer_id
, SUM (
CASE transfer_type
WHEN 'Positive' THEN amount
WHEN 'Negative' THEN -amount
ELSE NULL
END
) AS total
FROM transfer
GROUP BY transfer_id
Now a single SUM is used, with the sign of the addition controlled by the CASE expression.
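The sign-flipping idea can be checked against the numbers in the question with a small Python sketch. The function name and the tuple layout (transfer_id, transfer_type, amount) are assumptions for illustration only.

```python
from collections import defaultdict

def net_totals(transfers):
    """transfers: iterable of (transfer_id, transfer_type, amount) tuples."""
    totals = defaultdict(int)
    for transfer_id, transfer_type, amount in transfers:
        if transfer_type == "Positive":
            totals[transfer_id] += amount
        elif transfer_type == "Negative":
            totals[transfer_id] -= amount
        # Any other type contributes nothing, like the ELSE NULL branch.
    return dict(totals)

# The example from the question: (1+3+5+7) - (2+4+6)
rows = [(1, "Positive", n) for n in (1, 3, 5, 7)]
rows += [(1, "Negative", n) for n in (2, 4, 6)]
print(net_totals(rows))  # {1: 4}
```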

The n-gram that is the most frequent one among all the words

I came across the following programming interview problem:
Challenge 1: N-grams
An N-gram is a sequence of N consecutive characters from a given word. For the word "pilot" there are three 3-grams: "pil", "ilo" and "lot".
For a given set of words and an n-gram length
Your task is to
• write a function that finds the n-gram that is the most frequent one among all the words
• print the result to the standard output (stdout)
• if there are multiple n-grams having the same maximum frequency please print the one that is the smallest lexicographically (the first one according to the dictionary sorting order)
Note that your function will receive the following arguments:
• text
○ which is a string containing words separated by whitespaces
• ngramLength
○ which is an integer value giving the length of the n-gram
Data constraints
• the length of the text string will not exceed 250,000 characters
• all words are alphanumeric (they contain only English letters a-z, A-Z and numbers 0-9)
Efficiency constraints
• your function is expected to print the result in less than 2 seconds
Example
Input
text: “aaaab a0a baaab c”
ngramLength: 3
Output
aaa
Explanation
For the input presented above the 3-grams sorted by frequency are:
• "aaa" with a frequency of 3
• "aab" with a frequency of 2
• "a0a" with a frequency of 1
• "baa" with a frequency of 1
If I have only one hour to solve the problem and I chose to use the C language to solve it: is it a good idea to implement a Hash Table to count the frequency of the N-grams with that amount of time? because in the C library there is no implementation of a Hash Table...
If yes, I was thinking to implement a Hash Table using separate chaining with ordered linked lists. Those implementations reduce the time that you have to solve the problem....
Is that the fastest option possible?
Thank you!!!
If implementation efficiency is what matters and you are using C, I would initialize an array of pointers to the starts of n-grams in the string, use qsort to sort the pointers according to the n-gram that they are part of, and then loop over that sorted array and figure out counts.
This should execute fast enough, and there is no need to code any fancy data structures.
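The sort-and-count idea is language independent, so here is a minimal Python sketch of it (not the C version with qsort and pointers described above). The function name is hypothetical, and ties are broken by plain character-code order, which is assumed to match the "dictionary order" of the task.

```python
def most_frequent_ngram(text, ngram_length):
    """Return (ngram, count) for the most frequent n-gram, smallest on ties."""
    ngrams = sorted(
        word[i:i + ngram_length]
        for word in text.split()
        for i in range(len(word) - ngram_length + 1)
    )
    best, best_count = None, 0
    run_start = 0
    for i in range(1, len(ngrams) + 1):
        # A run ends at the end of the list or when the n-gram changes.
        if i == len(ngrams) or ngrams[i] != ngrams[run_start]:
            count = i - run_start
            # Strict '>' keeps the earlier (smaller) n-gram on equal counts.
            if count > best_count:
                best, best_count = ngrams[run_start], count
            run_start = i
    return best, best_count

print(most_frequent_ngram("aaaab a0a baaab c", 3))  # ('aaa', 3)
```

Because equal n-grams end up adjacent after sorting, one linear pass is enough to count them, and the sorted order gives the lexicographic tie-break for free.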
Sorry for posting python but this is what I would do:
You might get some ideas for the algorithm. Notice this program solves an order of magnitude more words.
from itertools import groupby

someText = "thibbbs is a test and aaa it may haaave some abbba reptetitions "
someText *= 40000
print(len(someText))
n = 3
ngrams = []
# collect every n-gram of every word that is long enough
for word in filter(lambda x: len(x) >= n, someText.split(" ")):
    for i in range(len(word)-n+1):
        ngrams.append(word[i:i+n])
        # you could inline all logic here
        # add to an ordered list for which the frequency is the key for ordering and the payload the actual word
ngrams_freq = list([[len(list(group)), key] for key, group in groupby(sorted(ngrams, key=str.lower))])
ngrams_freq_sorted = sorted(ngrams_freq, reverse=True)
popular_ngrams = []
# keep only the n-grams that share the highest frequency
for freq in ngrams_freq_sorted:
    if freq[0] == ngrams_freq_sorted[0][0]:
        popular_ngrams.append(freq[1])
    else:
        break
print("Most popular ngram: " + sorted(popular_ngrams, key=str.lower)[0])
# > 2560000
# > Most popular ngram: aaa
# > [Finished in 1.3s]
So the basic recipe for this problem would be:
Find all n-grams in string
Map all duplicate entries into a new structure that has the n-gram and the number of times it occurs
You can find my c++ solution here: http://ideone.com/MNFSis
Given:
const unsigned int MAX_STR_LEN = 250000;
const unsigned short NGRAM = 3;
const unsigned int NGRAMS = MAX_STR_LEN-NGRAM;
//we will need a maximum of "the length of our string" - "the length of our n-gram"
//places to store our n-grams, and each ngram is specified by NGRAM+1 for '\0'
char ngrams[NGRAMS][NGRAM+1] = { 0 };
Then, for the first step - this is the code:
const char *ptr = str;
int idx = 0;
//notTerminated checks ptr[0] to ptr[NGRAM-1] are not '\0'
while (notTerminated(ptr)) {
    //noSpace checks ptr[0] to ptr[NGRAM-1] are isalpha()
    if (noSpace(ptr)) {
        //safely copy our current n-gram over to the ngrams array
        //we're iterating over ptr and because we're here we know ptr and the next NGRAM spaces
        //are valid letters
        for (int i=0; i<NGRAM; i++) {
            ngrams[idx][i] = ptr[i];
        }
        ngrams[idx][NGRAM] = '\0'; //important to zero-terminate
        idx++;
    }
    ptr++;
}
At this point, we have a list of all n-grams. Let's find the most popular one:
FreqNode head = { "HEAD", 0, 0, 0 }; //the start of our list
for (int i=0; i<NGRAMS; i++) {
    if (ngrams[i][0] == '\0') break;
    //insertFreqNode takes a start node, this is where we will start to search for duplicates
    //the simplest description is like this:
    // 1 we search from head down each child, if we find a node that has text equal to
    //   ngrams[i] then we update its frequency count
    // 2 if the freq is >= to the current winner we place this as head.next
    // 3 after program is complete, our most popular nodes will be the first nodes
    //   I have not implemented sorting of these - it's an exercise for the reader ;)
    insertFreqNode(&head, ngrams[i]);
}
//as the list is ordered, head.next will always be the most popular n-gram
cout << "Winner is: " << head.next->str << " " << " with " << head.next->freq << " occurrences" << endl;
Good luck to you!
Just for fun, I wrote a SQL version (SQL Server 2012):
if object_id('dbo.MaxNgram','IF') is not null
drop function dbo.MaxNgram;
go
create function dbo.MaxNgram(
@text varchar(max)
,@length int
) returns table with schemabinding as
return
with
Delimiter(c) as ( select ' '),
E1(N) as (
select 1 from (values
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
)T(N)
),
E2(N) as (
select 1 from E1 a cross join E1 b
),
E6(N) as (
select 1 from E2 a cross join E2 b cross join E2 c
),
tally(N) as (
select top(isnull(datalength(@text),0))
ROW_NUMBER() over (order by (select NULL))
from E6
),
cteStart(N1) as (
select 1 union all
select t.N+1 from tally t cross join delimiter
where substring(@text,t.N,1) = delimiter.c
),
cteLen(N1,L1) as (
select s.N1,
isnull(nullif(charindex(delimiter.c,@text,s.N1),0) - s.N1,8000)
from cteStart s
cross join delimiter
),
cteWords as (
select ItemNumber = row_number() over (order by l.N1),
Item = substring(@text, l.N1, l.L1)
from cteLen l
),
mask(N) as (
select top(@length) row_Number() over (order by (select NULL))
from E6
),
topItem as (
select top 1
substring(Item,m.N,@length) as Ngram
,count(*) as Length
from cteWords w
cross join mask m
where m.N <= datalength(w.Item) + 1 - @length
and @length <= datalength(w.Item)
group by
substring(Item,m.N,@length)
order by 2 desc, 1
)
select d.s
from (
select top 1 NGram,Length
from topItem
) t
cross apply (values (cast(NGram as varchar)),(cast(Length as varchar))) d(s)
;
go
which when invoked with the sample input provided by OP
set nocount on;
select s as [ ] from MaxNgram(
'aaaab a0a baaab c aab'
,3
);
go
yields as desired
------------------------------
aaa
3
If you're not bound to C: I wrote this Python script in about 10 minutes. It processes a 1.5 MB file containing more than 265,000 words, looking for 3-grams, in 0.4 s (not counting printing the values on the screen).
The text used for the test is Ulysses by James Joyce, which you can find for free here: https://www.gutenberg.org/ebooks/4300
Word separators here are both space and the newline character \n
import sys

text = open(sys.argv[1], 'r').read()
ngram_len = int(sys.argv[2])
text = text.replace('\n', ' ')
words = [word.lower() for word in text.split(' ')]
# count every n-gram of every word that is long enough
ngrams = {}
for word in words:
    word_len = len(word)
    if word_len < ngram_len:
        continue
    for i in range(0, (word_len - ngram_len) + 1):
        ngram = word[i:i+ngram_len]
        if ngram in ngrams:
            ngrams[ngram] += 1
        else:
            ngrams[ngram] = 1
# invert the mapping: frequency -> list of n-grams with that frequency
ngrams_by_freq = {}
for key, val in ngrams.items():
    if val not in ngrams_by_freq:
        ngrams_by_freq[val] = [key]
    else:
        ngrams_by_freq[val].append(key)
ngrams_by_freq = sorted(ngrams_by_freq.items())
for key in ngrams_by_freq:
    print('{} with frequency of {}'.format(key[1:], key[0]))
You can convert trigram into RADIX50 code.
See http://en.wikipedia.org/wiki/DEC_Radix-50
In radix50, output value for trigram fits into 16-bit unsigned int value.
Thereafter, you can use radix-encoded trigram as index in the array.
So, your code would be like:
uint16_t counters[1 << 16]; // 64K counters
bzero(counters, sizeof(counters));
for(const char *p = txt; p[2] != 0; p++)
counters[radix50(p)]++;
Thereafter, just search for max value in the array, and decode index into trigram back.
I used this trick, when implemented Wilbur-Khovayko algorithm for fuzzy search ~10 years ago.
You can download source here: http://itman.narod.ru/source/jwilbur1.tar.gz.
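To make the indexing trick concrete, here is a minimal Python sketch of packing a trigram into a base-40 number and using it as an array index. The character table below is an assumption (the PDP-11 flavour of RADIX-50: space, A-Z, '$', '.', '%', 0-9), the function names are hypothetical, and lower case is folded to upper case, so this variant counts case-insensitively.

```python
# Assumed 40-symbol table: space, A-Z, '$', '.', '%', 0-9.
RADIX50_CHARS = " ABCDEFGHIJKLMNOPQRSTUVWXYZ$.%0123456789"

def radix50(trigram):
    """Pack three characters into one integer in the range 0..63999."""
    code = 0
    for ch in trigram.upper():
        code = code * 40 + RADIX50_CHARS.index(ch)
    return code

def unradix50(code):
    """Inverse of radix50: recover the (upper-cased) trigram."""
    chars = []
    for _ in range(3):
        code, digit = divmod(code, 40)
        chars.append(RADIX50_CHARS[digit])
    return "".join(reversed(chars))

def most_frequent_trigram(text):
    counters = [0] * (40 ** 3)  # one counter per possible trigram
    for word in text.split():
        for i in range(len(word) - 2):
            counters[radix50(word[i:i + 3])] += 1
    # max() returns the first index with the highest count.
    best = max(range(len(counters)), key=counters.__getitem__)
    return unradix50(best), counters[best]

print(most_frequent_trigram("aaaab a0a baaab c"))  # ('AAA', 3)
```

Note that ties are resolved in the order of the table (letters before digits), which differs from ASCII order, so the tie-break would need extra care for the original task.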
You can solve this problem in O(nk) time where n is the number of words and k is the average number of n-grams per word.
You're correct in thinking that a hash table is a good solution to the problem.
However, since you have limited time to code a solution, I'd suggest using open addressing instead of separate chaining with linked lists. The implementation may be simpler: if you hit a collision, you just walk farther along the table until you find a free slot.
Also, be sure to allocate enough memory to your hash table: something about twice the size of the expected number of n-grams should be fine. Since the expected number of n-grams is <=250,000 a hash table of 500,000 should be more than sufficient.
In terms of coding speed, the small input length (250,000) makes sorting and counting a feasible option. The quickest way is probably to generate an array of pointers to each n-gram, sort the array using an appropriate comparator, and then walk along it, keeping track of which n-gram appeared the most.
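The open-addressing suggestion can be sketched as follows, using linear probing over two parallel arrays. This is written in Python only to show the probing logic (a C version would use fixed-size arrays in the same way); the function names are hypothetical, and the table is assumed to be larger than the number of distinct n-grams so that probing always finds a free slot.

```python
def count_ngrams_open_addressing(text, ngram_length, capacity=500_000):
    """Count n-grams in a fixed-size table using linear probing."""
    keys = [None] * capacity
    counts = [0] * capacity
    for word in text.split():
        for i in range(len(word) - ngram_length + 1):
            ngram = word[i:i + ngram_length]
            slot = hash(ngram) % capacity
            # On a collision, step to the next slot (wrapping around).
            while keys[slot] is not None and keys[slot] != ngram:
                slot = (slot + 1) % capacity
            keys[slot] = ngram
            counts[slot] += 1
    return keys, counts

def most_frequent(keys, counts):
    """Scan the table once; higher count wins, smaller n-gram wins ties."""
    best = None
    for key, count in zip(keys, counts):
        if key is None:
            continue
        if best is None or (-count, key) < (-best[1], best[0]):
            best = (key, count)
    return best

keys, counts = count_ngrams_open_addressing("aaaab a0a baaab c", 3)
print(most_frequent(keys, counts))  # ('aaa', 3)
```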
One simple python solution for this question
your_str = "aaaab a0a baaab c"
str_list = your_str.split(" ")
str_hash = {}
ngram_len = 3
for word in str_list:
    start = 0
    end = ngram_len
    len_word = len(word)
    # slide a window of ngram_len characters over the word
    for i in range(0, len_word):
        if end <= len_word:
            if str_hash.get(word[start:end]):
                str_hash[word[start:end]] = str_hash.get(word[start:end]) + 1
            else:
                str_hash[word[start:end]] = 1
            start = start + 1
            end = end + 1
        else:
            break
# sort by n-gram first so that equal frequencies stay in lexicographic order
keys_sorted = sorted(str_hash.items())
for ngram in sorted(keys_sorted, key=lambda x: x[1], reverse=True):
    print("\"%s\" with a frequency of %s" % (ngram[0], ngram[1]))
