What are the reasons for this benchmark result? - loops

Two functions that convert an RGB image to a grayscale image:
function rgb2gray_loop{T<:FloatingPoint}(A::Array{T,3})
    r,c = size(A)
    gray = similar(A,r,c)
    for i = 1:r
        for j = 1:c
            @inbounds gray[i,j] = 0.299*A[i,j,1] + 0.587*A[i,j,2] + 0.114*A[i,j,3]
        end
    end
    return gray
end
And:
function rgb2gray_vec{T<:FloatingPoint}(A::Array{T,3})
    gray = similar(A,size(A)[1:2]...)
    gray = 0.299*A[:,:,1] + 0.587*A[:,:,2] + 0.114*A[:,:,3]
    return gray
end
The first one uses loops, while the second one uses vectorized operations.
When benchmarking them (with the Benchmark package), I get the following results for different input image sizes (f1 is the loop version, f2 the vectorized version):
A = rand(50,50,3):
| Row | Function | Average (s) | Relative | Replications |
|-----|----------|-------------|----------|--------------|
| 1 | "f1" | 3.23746e-5 | 1.0 | 1000 |
| 2 | "f2" | 0.000160214 | 4.94875 | 1000 |
A = rand(500,500,3):
| Row | Function | Average (s) | Relative | Replications |
|-----|----------|------------|----------|--------------|
| 1 | "f1" | 0.00783007 | 1.0 | 100 |
| 2 | "f2" | 0.0153099 | 1.95527 | 100 |
A = rand(5000,5000,3):
| Row | Function | Average (s) | Relative | Replications |
|-----|----------|----------|----------|--------------|
| 1 | "f1" | 1.60534 | 2.56553 | 10 |
| 2 | "f2" | 0.625734 | 1.0 | 10 |
I expected one function to be faster than the other (maybe f1 because of the @inbounds macro).
But I can't explain why the vectorized version gets faster for larger images.
Why is that?

The explanation for these results is that multidimensional arrays in Julia are stored in column-major order. See the Julia documentation on memory order.
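To see the column-major layout directly, here is a minimal sketch (the matrix values are arbitrary): the linear index walks down each column before moving to the next, which is why the row index i belongs in the inner loop:
A = [1 3 5; 2 4 6]      # a 2x3 matrix
for k = 1:length(A)
    println(A[k])       # linear indexing follows memory order: prints 1,2,3,4,5,6
end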
Fixed loop version, respecting column-major order (inner and outer loop variables swapped):
function rgb2gray_loop{T<:FloatingPoint}(A::Array{T,3})
    r,c = size(A)
    gray = similar(A,r,c)
    for j = 1:c
        for i = 1:r
            @inbounds gray[i,j] = 0.299*A[i,j,1] + 0.587*A[i,j,2] + 0.114*A[i,j,3]
        end
    end
    return gray
end
New results for A = rand(5000,5000,3):
| Row | Function | Average (s) | Relative | Replications |
|-----|----------|----------|----------|--------------|
| 1 | "f1" | 0.107275 | 1.0 | 10 |
| 2 | "f2" | 0.646872 | 6.03004 | 10 |
And the results for smaller arrays:
A = rand(500,500,3):
| Row | Function | Average (s) | Relative | Replications |
|-----|----------|------------|----------|--------------|
| 1 | "f1" | 0.00236405 | 1.0 | 100 |
| 2 | "f2" | 0.0207249 | 8.76671 | 100 |
A = rand(50,50,3):
| Row | Function | Average (s) | Relative | Replications |
|-----|----------|-------------|----------|--------------|
| 1 | "f1" | 4.29321e-5 | 1.0 | 1000 |
| 2 | "f2" | 0.000224518 | 5.22961 | 1000 |

Just speculation, because I don't know Julia:
I think the statement gray = ... in the vectorized form creates a new array where all the calculated values are stored, while the old array is scrapped. In f1 the values are overwritten in place, so no new memory allocation is needed. Memory allocation is quite expensive, so the loop version with its in-place overwrites is faster for small inputs.
But memory allocation is usually a static overhead (allocating twice as much doesn't take twice as long), and the vectorized version computes faster (maybe in parallel?), so if the numbers get big enough the faster calculation makes more of a difference than the memory allocation.

I cannot reproduce your results.
See this IJulia notebook: http://nbviewer.ipython.org/urls/gist.githubusercontent.com/anonymous/24c17478ae0f5562c449/raw/8d5d32c13209a6443c6d72b31e2459d70607d21b/rgb2gray.ipynb
The numbers I get are:
In [5]:
@time rgb2gray_loop(rand(50,50,3));
@time rgb2gray_vec(rand(50,50,3));
elapsed time: 7.591e-5 seconds (80344 bytes allocated)
elapsed time: 0.000108785 seconds (241192 bytes allocated)
In [6]:
@time rgb2gray_loop(rand(500,500,3));
@time rgb2gray_vec(rand(500,500,3));
elapsed time: 0.021647914 seconds (8000344 bytes allocated)
elapsed time: 0.012364489 seconds (24001192 bytes allocated)
In [7]:
@time rgb2gray_loop(rand(5000,5000,3));
@time rgb2gray_vec(rand(5000,5000,3));
elapsed time: 0.902367223 seconds (800000440 bytes allocated)
elapsed time: 1.237281103 seconds (2400001592 bytes allocated, 7.61% gc time)
As expected, the looped version is faster for large inputs. Also note how the vectorized version allocated three times as much memory.
I also want to point out that the statement gray = similar(A,size(A)[1:2]...) is redundant and can be omitted.
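For illustration, a minimal sketch of rgb2gray_vec with that allocation removed (the right-hand side allocates the result array on its own):
function rgb2gray_vec{T<:FloatingPoint}(A::Array{T,3})
    return 0.299*A[:,:,1] + 0.587*A[:,:,2] + 0.114*A[:,:,3]
end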
Without this unnecessary allocation, the results for the largest problem are:
@time rgb2gray_loop(rand(5000,5000,3));
@time rgb2gray_vec(rand(5000,5000,3));
elapsed time: 0.953746863 seconds (800000488 bytes allocated, 3.06% gc time)
elapsed time: 1.203013639 seconds (2200001200 bytes allocated, 7.28% gc time)
So the memory usage went down, but the speed did not noticeably improve.

Related

Dynamic Excel array based on input

I am looking for a little help in making a formula-based dynamic array in Excel.
KPI | Tgt | number | Weight
FCR | 0% | 1 | 45%
FCR | 60% | 2 | 45%
FCR | 80% | 3 | 45%
Leads | 45% | 4 | 25%
Leads | 50% | 5 | 25%
Leads | 200% | 6 | 25%
Attrition | 8% | 7 | 10%
Attrition | 12% | 8 | 10%
Attrition | 100% | 9 | 10%
Abandon | 1% | 10 | 20%
Abandon | 5% | 11 | 20%
Abandon | 200% | 12 | 20%
So if I have a Leads score of 3% in cell E2, then I want the output in F2 to be number 4, since 3% is < the 45% target, hence 4.
PS: I have a spreadsheet but don't know how to attach it.
Try this in F2:
=INDEX(C:C,AGGREGATE(15,6,ROW(B:B)/((B:B>=E2)*(A:A="Leads")),1))
This will only return matches for rows with "Leads" in column A. The hard-coded "Leads" can be replaced with a cell reference if you want to drive it from another cell instead.
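For example, assuming (hypothetically) that the KPI name is typed into cell D2, the hard-coded "Leads" becomes a reference:
=INDEX(C:C,AGGREGATE(15,6,ROW(B:B)/((B:B>=E2)*(A:A=D2)),1))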
EDIT:
Based on your comment below, this formula works as well if entered as an array formula (CTRL + SHIFT + ENTER):
=INDEX(C:C,MATCH(1,(A:A="Leads")*(B:B>=E2),0))
EDIT 2:
We can cover our bases for an unsorted list in column B by combining the two solutions:
{=INDEX(C:C,MATCH(1,(A:A="Leads")*(B:B=AGGREGATE(15,6,B:B/((B:B>=E2)*(A:A="Leads")),1)),0))}

How to make a loop or "sum" formula in Microsoft Excel?

Say that I gain +5 coins from every room I complete. What I'm trying to do is to make a formula in Excel that gets the total coins I've gotten from the first room to the 100th room.
With C++, I guess it would be something like:
int totalCoins = 0;
int lastRoom = 100;    // rooms completed
while (lastRoom > 0)
{
    totalCoins += 5;   // +5 coins per completed room
    lastRoom--;
}
with totalCoins accumulating the rewards, so you can just output the sum at the end.
So how would you put this logic into Excel and get it to work? Or is there any other way to get the total coins?
There are infinitely many solutions.
One is to build a table like this:
+---+----------+---------------+
| | A | B |
+---+----------+---------------+
| 1 | UserID | RoomCompleted |
| 2 | User 001 | Room 1 |
| 3 | User 002 | Room 1 |
| 4 | User 002 | Room 2 |
| 5 | User 002 | Room 3 |
+---+----------+---------------+
then pivot the spreadsheet to get the following:
+---+----------+-----------------------+
| | A | B |
+---+----------+-----------------------+
| 1 | User | Total Rooms completed |
| 2 | User 001 | 1 |
| 3 | User 002 | 3 |
+---+----------+-----------------------+
where you have the number of completed rooms for each user. You can now multiply that number by 5 with a simple formula or (better) as a calculated field of the pivot.
If I understand you correctly, you shouldn't need any special code, just a formula:
=(C2-A2+1)*B2
where C2 = the Nth (last) room, A2 = the first room in the range, and B2 = the coin reward. You can change A2, B2, or C2, and the formula in D2 will output the result.
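For example, with A2 = 1 (first room), B2 = 5 (coin reward), and C2 = 100 (last room), the formula returns (100 - 1 + 1) * 5 = 500 coins.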
You can use the formula for the sum of the positive integers less than n, (n - 1)*(n / 2), then multiply it by the coin count, giving something like 5 * (n - 1)*(n / 2). Then you just hook it up to your table.
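For example, for rooms 1 through 100 take n = 101: 5 * (101 - 1) * (101 / 2) = 25,250. A hedged Excel equivalent (assuming the rooms are numbered 1 to 100) is =5*SUMPRODUCT(ROW(1:100)), since ROW(1:100) yields the array {1;2;...;100}, which SUMPRODUCT totals to 5050.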
Hope it helps

Why does Neo4j hit every indexed record when only returning a count?

I am using version 3.0.3, and running my queries in the shell.
I have ~58 million record nodes with 4 properties each, specifically an ID string, a epoch time integer, and lat/lon floats.
When I run a query like profile MATCH (r:record) RETURN count(r); I get a very quick response:
+----------+
| count(r) |
+----------+
| 58430739 |
+----------+
1 row
29 ms
Compiler CYPHER 3.0
Planner COST
Runtime INTERPRETED
+--------------------------+----------------+------+---------+-----------+--------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+--------------------------+----------------+------+---------+-----------+--------------------------------+
| +ProduceResults | 7644 | 1 | 0 | count(r) | count(r) |
| | +----------------+------+---------+-----------+--------------------------------+
| +NodeCountFromCountStore | 7644 | 1 | 0 | count(r) | count( (:record) ) AS count(r) |
+--------------------------+----------------+------+---------+-----------+--------------------------------+
Total database accesses: 0
The Total database accesses: 0 and NodeCountFromCountStore tell me that Neo4j uses a counting mechanism here that avoids iterating over all the nodes.
However, when I run profile MATCH (r:record) WHERE r.time < 10000000000 RETURN count(r);, I get a very slow response:
+----------+
| count(r) |
+----------+
| 58430739 |
+----------+
1 row
151278 ms
Compiler CYPHER 3.0
Planner COST
Runtime INTERPRETED
+-----------------------+----------------+----------+----------+-----------+------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-----------------------+----------------+----------+----------+-----------+------------------------------+
| +ProduceResults | 1324 | 1 | 0 | count(r) | count(r) |
| | +----------------+----------+----------+-----------+------------------------------+
| +EagerAggregation | 1324 | 1 | 0 | count(r) | |
| | +----------------+----------+----------+-----------+------------------------------+
| +NodeIndexSeekByRange | 1752922 | 58430739 | 58430740 | r | :record(time) < { AUTOINT0} |
+-----------------------+----------------+----------+----------+-----------+------------------------------+
Total database accesses: 58430740
The count is correct, as I chose a time value larger than all of my records. What surprises me here is that Neo4j is accessing EVERY single record. The profiler states that Neo4j is using the NodeIndexSeekByRange as an alternative method here.
My question is, why does Neo4j access EVERY record when all it is returning is a count? Are there no intelligent mechanisms inside the system to count a range of values after seeking the boundary/threshold value within the index?
I use Apache Solr for the same data, and returning a count after searching an index is extremely fast (about 5 seconds). If I recall correctly, both platforms are built on top of Apache Lucene. While I don't know much about that software internally, I would assume that the index support is fairly similar for both Neo4j and Solr.
I am working on a proxy service that will deliver results in a paginated form (using the SKIP n LIMIT m technique) by first getting a count, and then iterating over results in chunks. This works really well for Solr, but I am afraid that Neo4j may not perform well in this scenario.
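For example, one page of results might be fetched like this (the page size of 1000 and the ORDER BY are just for illustration):
MATCH (r:record) WHERE r.time < 10000000000
RETURN r
ORDER BY r.time
SKIP 0 LIMIT 1000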
Any thoughts?
The latter query does a NodeIndexSeekByRange operation. This goes through every node with the record label via the index, looks up the value of the time property, and compares it against 10000000000.
The query actually has to visit every matching node and read some data for the comparison, and that is the reason why it is so much slower.
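For contrast, here are the two queries from the question again, with the decisive plan operator noted as a comment:
// NodeCountFromCountStore: answered from the label count store, 0 DB hits
profile MATCH (r:record) RETURN count(r);
// NodeIndexSeekByRange: every index entry matching the range is visited
profile MATCH (r:record) WHERE r.time < 10000000000 RETURN count(r);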

Ordo - Calculating c & n₀

I have an assignment where I'm running into some problems; I have a hard time knowing what to do with the data I've collected.
The assignment is to calculate the constant c in the ordo (big-O) bound, as well as n₀.
We have an unknown program that we execute via the terminal, and we can choose how many elements it processes; the more elements, the longer the program takes to run.
At the end, the program reports how long it took to complete.
Here is the collected data:
Input | time (s)
--------+----------
1000 | 0.0015
1000 | 0.0016
1000 | 0.0015
2000 | 0.0063
2000 | 0.0063
3000 | 0.0063
4000 | 0.0281
4500 | 0.0344
5000 | 0.0453
6000 | 0.0672
7000 | 0.0953
8000 | 0.1265
9000 | 0.1656
10000 | 0.2078
11000 | 0.2547
12000 | 0.3062
15000 | 0.4875
20000 | 0.8953
25000 | 1.4125
30000 | 2.0390
35000 | 2.8750
40000 | 3.6641
50000 | 5.7641
50000 | 5.7438
70000 | 11.4781
75000 | 13.7312
80000 | 15.0828
85000 | 17.1156
90000 | 19.8610
100000 | 23.2328
110000 | 28.8032
130000 | 40.6344
The thing is: how do I move on from here? My guess from looking at the chart is that the complexity is O(n²).
Are there any tips on how to take the next step and calculate c and n₀?
In general it's not a good idea to "compute" the complexity of a function by measuring the time it takes to run on different inputs.
E.g. let's say the function spends 50% of its time on big integers and 50% on large strings, and you have a special compiler extension that speeds up big-integer arithmetic massively. Then it is possible to miss how the runtime scales with the integer part of the input.
If you have to get an idea of the complexity without the source code of the function, you can treat the measured run time as a function f(t) of the input size t. To show that the function is in O(n²) you only have to give a g(t), c and n₀ such that f(t) ≤ c⋅g(t) holds for all t ≥ n₀. It does not have to be exact.
In your case it would be ok to choose g(t) = t² + 1, c = 1 and n₀ = 0.
You could also use g(t) = 1/4⋅10⁻⁸⋅t² + 1 with c = 1 and n₀ = 0,
or g(t) = 1/4⋅10⁻⁸⋅t² with c = 1 and n₀ = 40000.
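As a quick check of that last choice against the measurements: at the largest input, 1/4⋅10⁻⁸⋅130000² = 42.25 ≥ 40.6344, and at the boundary n₀ = 40000, 1/4⋅10⁻⁸⋅40000² = 4.0 ≥ 3.6641; the same holds for every measured input in between.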
But notice: you cannot do this exactly. It is also possible that the result turns out to be wrong if you test 10¹⁰ as input. If you want the exact complexity you have to take a look at the code.

MySQL Import into Innodb table severely spikes at a certain point

I'm trying to migrate a 30GB database from one server to another.
The short story is that at a certain point in the process, the time it takes to import records spikes severely. The following is from using the SOURCE command to import a chunk of 500k records (out of ~25-30 million throughout the database) that was exported as an sql file and ssh-tunnelled over to the new server:
...
Query OK, 2871 rows affected (0.73 sec)
Records: 2871 Duplicates: 0 Warnings: 0
Query OK, 2870 rows affected (0.98 sec)
Records: 2870 Duplicates: 0 Warnings: 0
Query OK, 2865 rows affected (0.80 sec)
Records: 2865 Duplicates: 0 Warnings: 0
Query OK, 2871 rows affected (0.87 sec)
Records: 2871 Duplicates: 0 Warnings: 0
Query OK, 2864 rows affected (2.60 sec)
Records: 2864 Duplicates: 0 Warnings: 0
Query OK, 2866 rows affected (7.53 sec)
Records: 2866 Duplicates: 0 Warnings: 0
Query OK, 2879 rows affected (8.70 sec)
Records: 2879 Duplicates: 0 Warnings: 0
Query OK, 2864 rows affected (7.53 sec)
Records: 2864 Duplicates: 0 Warnings: 0
Query OK, 2873 rows affected (10.06 sec)
Records: 2873 Duplicates: 0 Warnings: 0
...
The spikes eventually average out to 16-18 seconds per ~2800 rows affected. Granted, I don't usually use SOURCE for a large import, but for the sake of showing legitimate output I used it to understand when the spikes happen. Using the mysql command or mysqlimport yields the same results. Even piping the results directly into the new database instead of going through an sql file produces these spikes.
As far as I can tell, this happens after a certain number of records have been inserted into a table. The first time I boot up a server and import a chunk that size, it goes through just fine, give or take the estimated amount it handles before the spikes begin. I can't pin that threshold down because I haven't replicated the issue consistently enough to conclude anything. There are ~20 tables with under 500,000 records each that all imported just fine when those 20 tables were imported through a single command, so this seems to only happen to tables with an excessive amount of data.
Granted, the solutions I've come across so far only seem to address the gradual slowdown that naturally occurs as an import runs over time. The expected behavior in my case would have been that, by the end of importing 500k records, each ~2800-row chunk takes 2-3 seconds, whereas those questions address cases where it shouldn't take that long even at the end. The data comes from a single SugarCRM table called 'campaign_log', which has ~9 million records. I was able to import it in chunks of 500k back onto the old server I'm migrating off of without these spikes occurring, so I assume this has to do with my new server configuration.
Another oddity: whenever these spikes occur, the table being imported into displays its record count strangely. I know InnoDB gives count estimates, but the number is not preceded by the ~ that indicates an estimate; it is usually accurate, and refreshing the table doesn't change the displayed amount (this is based on what PHPMyAdmin reports).
Here's the following commands/InnoDB system variables I have on the new server:
INNODB System Vars:
+---------------------------------+------------------------+
| Variable_name | Value |
+---------------------------------+------------------------+
| have_innodb | YES |
| ignore_builtin_innodb | OFF |
| innodb_adaptive_flushing | ON |
| innodb_adaptive_hash_index | ON |
| innodb_additional_mem_pool_size | 8388608 |
| innodb_autoextend_increment | 8 |
| innodb_autoinc_lock_mode | 1 |
| innodb_buffer_pool_instances | 1 |
| innodb_buffer_pool_size | 8589934592 |
| innodb_change_buffering | all |
| innodb_checksums | ON |
| innodb_commit_concurrency | 0 |
| innodb_concurrency_tickets | 500 |
| innodb_data_file_path | ibdata1:10M:autoextend |
| innodb_data_home_dir | |
| innodb_doublewrite | ON |
| innodb_fast_shutdown | 1 |
| innodb_file_format | Antelope |
| innodb_file_format_check | ON |
| innodb_file_format_max | Antelope |
| innodb_file_per_table | OFF |
| innodb_flush_log_at_trx_commit | 1 |
| innodb_flush_method | fsync |
| innodb_force_load_corrupted | OFF |
| innodb_force_recovery | 0 |
| innodb_io_capacity | 200 |
| innodb_large_prefix | OFF |
| innodb_lock_wait_timeout | 50 |
| innodb_locks_unsafe_for_binlog | OFF |
| innodb_log_buffer_size | 8388608 |
| innodb_log_file_size | 5242880 |
| innodb_log_files_in_group | 2 |
| innodb_log_group_home_dir | ./ |
| innodb_max_dirty_pages_pct | 75 |
| innodb_max_purge_lag | 0 |
| innodb_mirrored_log_groups | 1 |
| innodb_old_blocks_pct | 37 |
| innodb_old_blocks_time | 0 |
| innodb_open_files | 300 |
| innodb_print_all_deadlocks | OFF |
| innodb_purge_batch_size | 20 |
| innodb_purge_threads | 1 |
| innodb_random_read_ahead | OFF |
| innodb_read_ahead_threshold | 56 |
| innodb_read_io_threads | 8 |
| innodb_replication_delay | 0 |
| innodb_rollback_on_timeout | OFF |
| innodb_rollback_segments | 128 |
| innodb_spin_wait_delay | 6 |
| innodb_stats_method | nulls_equal |
| innodb_stats_on_metadata | ON |
| innodb_stats_sample_pages | 8 |
| innodb_strict_mode | OFF |
| innodb_support_xa | ON |
| innodb_sync_spin_loops | 30 |
| innodb_table_locks | ON |
| innodb_thread_concurrency | 0 |
| innodb_thread_sleep_delay | 10000 |
| innodb_use_native_aio | ON |
| innodb_use_sys_malloc | ON |
| innodb_version | 5.5.39 |
| innodb_write_io_threads | 8 |
+---------------------------------+------------------------+
System Specs:
Intel Xeon E5-2680 v2 (Ivy Bridge) 8 Processors
15GB Ram
2x80 SSDs
CMD to Export:
mysqldump -u <olduser> -p<oldpw> <olddb> <table> --verbose --disable-keys --opt | ssh -i <privatekey> <newserver> "cat > <nameoffile>"
Thank you for any assistance. Let me know if there's any other information I can provide.
I figured it out. I increased innodb_log_file_size from 5MB to 1024MB. Not only did it significantly increase the rate at which records imported (it never went above 1 second per ~3000 rows), it also fixed the spikes. There were only 2 in all the records I imported, and after each one the import immediately went back to taking under 1 second.
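For reference, a hedged sketch of the corresponding my.cnf change (the exact file location varies by distribution; on MySQL 5.5 you must shut the server down cleanly and move the old ib_logfile* files out of the way before restarting with a new size):
[mysqld]
innodb_log_files_in_group = 2
# a larger redo log absorbs bulk-insert bursts without stalling to flush
innodb_log_file_size = 1024M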
